Train just on qi (#15)

* QI updates
* update code to support training ML on QI features
* fix code so features that are not from QI should not be part of generalizations
and add description
* merging two branches, training on QI and on all data
* adding tests and asserts
This commit is contained in:
olasaadi 2022-01-12 17:01:27 +02:00 committed by GitHub
parent 2eb626c00c
commit a9a93c8a3a
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
4 changed files with 373 additions and 135 deletions

View file

@ -19,9 +19,8 @@ class Anonymize:
"""
:param k: The privacy parameter that determines the number of records that will be indistinguishable from each
other (when looking at the quasi identifiers). Should be at least 2.
:param quasi_identifiers: The indexes of the features that need to be anonymized (these should be the features
that may directly, indirectly or in combination with additional data, identify an
individual).
:param quasi_identifiers: The features that need to be minimized in case of pandas data, and indexes of features
in case of numpy data.
:param categorical_features: The list of categorical features (should only be supplied when passing data as a
pandas dataframe.
"""