mirror of https://github.com/IBM/ai-privacy-toolkit.git synced 2026-06-14 15:25:12 +02:00


anonymization module

This module contains methods for anonymizing ML model training data, so that a model retrained on the anonymized data can itself be considered anonymous. This may help exempt the model from obligations and restrictions set out in data protection regulations such as GDPR and CCPA.

The methods anonymize a training dataset in a manner that is tailored to and guided by an existing, trained ML model. The existing model's predictions on the training data are used to train a second model, the anonymizer, which determines the generalizations that will be applied to the training data. For more information about the method, see: https://arxiv.org/abs/2007.13086

Once the anonymized training data is returned, it can be used to retrain the model.
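The flow described above can be illustrated with a minimal sketch. It is not this module's API: it uses scikit-learn directly, and the function name `anonymize_guided`, the synthetic data, the choice of quasi-identifier columns, and the use of the per-group median as the generalized value are all assumptions made for illustration. A decision tree stands in for the anonymizer model. It is trained on the quasi-identifiers with the original model's predictions as labels, and its `min_samples_leaf=k` setting keeps at least `k` records in every leaf.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier


def anonymize_guided(x, predictions, quasi_identifiers, k):
    """Hypothetical helper: generalize quasi-identifier columns in groups of >= k rows."""
    # Anonymizer model: a tree fitted to the ORIGINAL model's predictions,
    # using only the quasi-identifier columns. Every leaf holds >= k records.
    tree = DecisionTreeClassifier(min_samples_leaf=k, random_state=0)
    tree.fit(x[:, quasi_identifiers], predictions)
    leaves = tree.apply(x[:, quasi_identifiers])

    anonymized = x.copy()
    for leaf in np.unique(leaves):
        rows = np.where(leaves == leaf)[0]
        # Assumption of this sketch: the generalized value is the median
        # of each quasi-identifier over the records in the leaf.
        representative = np.median(x[np.ix_(rows, quasi_identifiers)], axis=0)
        anonymized[np.ix_(rows, quasi_identifiers)] = representative
    return anonymized


# Synthetic stand-in for real training data.
x, y = make_classification(n_samples=200, n_features=5, random_state=0)

# 1. An existing, trained model and its predictions on the training data.
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(x, y)
predictions = model.predict(x)

# 2. Anonymize the training data, guided by those predictions.
x_anon = anonymize_guided(x, predictions, quasi_identifiers=[0, 1, 2], k=10)

# 3. Retrain the model on the anonymized data.
retrained = RandomForestClassifier(n_estimators=20, random_state=0).fit(x_anon, y)
```

Because the tree is fitted to the model's predictions, records that the model treats alike tend to be grouped together, which is the sense in which the generalization is guided by the model.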

The following figure depicts the overall process:

[Figure: anonymization process]

Citation

Goldsteen A., Ezov G., Shmelkin R., Moffie M., Farkash A. (2022) Anonymizing Machine Learning Models. In: Garcia-Alfaro J., Muñoz-Tapia J.L., Navarro-Arribas G., Soriano M. (eds) Data Privacy Management, Cryptocurrencies and Blockchain Technology. DPM 2021, CBT 2021. Lecture Notes in Computer Science, vol 13140. Springer, Cham. https://doi.org/10.1007/978-3-030-93944-1_8