Train just on qi (#15)

* QI updates
* update code to support training ML on QI features
* fix code so features that are not from QI should not be part of generalizations and add description
* merging two branches, training on QI and on all data
* adding tests and asserts
2026-06-20 15:38:05 +02:00 · 2022-01-12 17:01:27 +02:00 · 2022-01-12 17:01:27 +02:00 · a9a93c8a3a
commit a9a93c8a3a
parent 2eb626c00c
4 changed files with 373 additions and 135 deletions
--- a/apt/anonymization/anonymizer.py
+++ b/apt/anonymization/anonymizer.py
@@ -19,9 +19,8 @@ class Anonymize:
        """
        :param k: The privacy parameter that determines the number of records that will be indistinguishable from each
                  other (when looking at the quasi identifiers). Should be at least 2.
-        :param quasi_identifiers: The indexes of the features that need to be anonymized (these should be the features
-                                  that may directly, indirectly or in combination with additional data, identify an
-                                  individual).
+        :param quasi_identifiers: The features that need to be minimized in case of pandas data, and indexes of features
+                                  in case of numpy data.
        :param categorical_features: The list of categorical features (should only be supplied when passing data as a
                                     pandas dataframe.
        """