mirror of https://github.com/IBM/ai-privacy-toolkit.git synced 2026-07-23 17:01:03 +02:00



anonymization module

This module contains methods for anonymizing the training data of ML models, so that a model retrained on the anonymized data can itself be considered anonymous. This may help exempt the model from some of the obligations and restrictions set out in data protection regulations such as GDPR and CCPA.

The methods anonymize a training dataset in a manner that is tailored to, and guided by, an existing trained ML model. The existing model's predictions on the training data are used to train a second, anonymizer model, which then determines the generalizations that will be applied to the training data. For more information about the method, see: https://arxiv.org/abs/2007.13086
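The idea described above can be illustrated with a minimal sketch. This is not the toolkit's implementation or API: the function name `anonymize_sketch`, its parameters, the use of a scikit-learn decision tree as the anonymizer model, and the choice of the per-group median as the generalized value are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier


def anonymize_sketch(x, predictions, quasi_identifiers, k):
    """Generalize the quasi-identifier columns of x, guided by model predictions.

    Hypothetical helper for illustration only; not part of the toolkit.
    """
    # Anonymizer model (assumption: a decision tree) fit on the
    # quasi-identifier columns, with the existing model's predictions as
    # labels. min_samples_leaf=k forces every leaf to hold at least k
    # training records.
    tree = DecisionTreeClassifier(min_samples_leaf=k, random_state=0)
    tree.fit(x[:, quasi_identifiers], predictions)

    # Records that fall into the same leaf form one group.
    leaves = tree.apply(x[:, quasi_identifiers])

    anonymized = x.copy()
    for leaf in np.unique(leaves):
        members = np.where(leaves == leaf)[0]
        # Assumption: the per-column median represents the whole group.
        representative = np.median(x[np.ix_(members, quasi_identifiers)], axis=0)
        anonymized[np.ix_(members, quasi_identifiers)] = representative
    return anonymized


# Synthetic data and an "existing" trained model, for demonstration only.
x, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(x, y)

x_anon = anonymize_sketch(x, model.predict(x), quasi_identifiers=[0, 1, 2], k=10)
```

Because the tree is trained on the existing model's predictions, its leaves tend to group records that the model treats alike, so replacing their quasi-identifier values with a shared representative is guided by the model rather than by the data alone. Columns outside `quasi_identifiers` are left unchanged in this sketch.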

Once the anonymized training data is returned, it can be used to retrain the model.
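The retraining step might look like the following sketch. It assumes a scikit-learn model and assumes the original training labels are reused with the anonymized features. The rounding used to produce `x_train_anon` is only a placeholder standing in for the anonymizer's output and is not an anonymization method.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, for demonstration only.
x, y = make_classification(n_samples=400, n_features=5, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0
)

# The existing model, trained on the original data.
original_model = DecisionTreeClassifier(max_depth=4, random_state=0)
original_model.fit(x_train, y_train)

# Placeholder for the anonymized training data (assumption): coarsen the
# first three columns by rounding. A real run would use the data returned
# by the anonymization step instead.
x_train_anon = x_train.copy()
x_train_anon[:, :3] = np.round(x_train_anon[:, :3])

# clone() copies the hyperparameters but not the fitted state, so the
# retrained model has only seen the anonymized training data.
retrained_model = clone(original_model).fit(x_train_anon, y_train)

# Compare both models on the same untouched test set.
print("original accuracy: ", original_model.score(x_test, y_test))
print("retrained accuracy:", retrained_model.score(x_test, y_test))
```

Evaluating both models on the same held-out data gives a rough view of how much accuracy the anonymization costs.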

The following figure depicts the overall process:

[Figure: anonymization process]

Citation

Goldsteen A., Ezov G., Shmelkin R., Moffie M., Farkash A. (2022) Anonymizing Machine Learning Models. In: Garcia-Alfaro J., Muñoz-Tapia J.L., Navarro-Arribas G., Soriano M. (eds) Data Privacy Management, Cryptocurrencies and Blockchain Technology. DPM 2021, CBT 2021. Lecture Notes in Computer Science, vol 13140. Springer, Cham. https://doi.org/10.1007/978-3-030-93944-1_8