Add data minimization functionality to the ai-privacy-toolkit (#3)

* Fix directory issue when running tests for first time
* Initial version of data minimization
* Update version and documentation
* Fix documentation
2026-07-20 16:51:02 +02:00 · 2021-07-12 15:56:42 +03:00 · 2021-07-12 15:56:42 +03:00 · f2e1364b43
commit f2e1364b43
parent bcc3d67ba4
14 changed files with 920 additions and 34 deletions
--- a/apt/minimization/__init__.py
+++ b/apt/minimization/__init__.py
@@ -0,0 +1,19 @@
+"""
+Module providing data minimization for ML.
+
+This module implements a first-of-a-kind method to help reduce the amount of personal data needed to perform
+predictions with a machine learning model, by removing or generalizing some of the input features. For more information
+about the method, see: http://export.arxiv.org/pdf/2008.04113
+
+The main class, ``GeneralizeToRepresentative``, is a scikit-learn-compatible ``Transformer`` that receives an existing
+estimator and labeled training data, and learns the generalizations that can be applied to any newly collected data for
+analysis by the original model. The ``fit()`` method learns the generalizations and the ``transform()`` method applies
+them to new data.
+
+It is also possible to export the generalizations as feature ranges.
+
+The current implementation supports only numeric features, so any categorical features must be transformed to a numeric
+representation before using this class.
+
+"""
+from apt.minimization.minimizer import GeneralizeToRepresentative
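
The docstring above describes a fit/transform contract: ``fit()`` learns generalizations, ``transform()`` maps new numeric records onto them, and the generalizations can be exported as feature ranges. The following is only a minimal, standard-library sketch of that contract and not the toolkit's implementation. ``ToyRangeGeneralizer``, its ``cut_points`` argument, and its ``ranges()`` method are hypothetical names introduced for illustration. The sketch also assumes the range boundaries are supplied by hand, whereas the method in this commit learns them from an existing estimator and labeled training data.

```python
from bisect import bisect_right
from statistics import median


class ToyRangeGeneralizer:
    """Hypothetical stand-in; NOT the toolkit's GeneralizeToRepresentative."""

    def __init__(self, cut_points):
        # Assumption of this sketch: {feature_index: sorted thresholds},
        # given by the caller instead of being learned from a model.
        self.cut_points = cut_points

    def fit(self, X):
        # For every generalized feature, group the training values by the
        # range they fall into and keep the median as the representative.
        self.representatives_ = {}
        for j, cuts in self.cut_points.items():
            buckets = {}
            for row in X:
                buckets.setdefault(bisect_right(cuts, row[j]), []).append(row[j])
            self.representatives_[j] = {b: median(v) for b, v in buckets.items()}
        return self

    def transform(self, X):
        # Replace each generalized value with its range's representative.
        out = []
        for row in X:
            new_row = list(row)
            for j, cuts in self.cut_points.items():
                bucket = bisect_right(cuts, row[j])
                # Ranges never seen during fit() keep the raw value.
                new_row[j] = self.representatives_[j].get(bucket, row[j])
            out.append(new_row)
        return out

    def ranges(self):
        # Export the generalizations as per-feature range boundaries.
        return {j: list(cuts) for j, cuts in self.cut_points.items()}


# Feature 0 is an age-like numeric column; feature 1 is left untouched.
train = [[23, 1], [27, 0], [41, 1], [45, 0], [62, 1]]
gen = ToyRangeGeneralizer({0: [30, 50]}).fit(train)
generalized = gen.transform([[25, 1], [44, 0], [70, 1]])
```

In this sketch, every new record whose first feature falls in the same range receives the same representative value, so the exact value no longer has to be collected. Any categorical column would first need a numeric encoding, as the docstring notes.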