From 6b21d5af592add937f0e73749d3b65cb7117e6f2 Mon Sep 17 00:00:00 2001
From: abigailgold <57357634+abigailgold@users.noreply.github.com>
Date: Mon, 14 Jun 2021 16:15:26 +0300
Subject: [PATCH] Updated FAQ (markdown)

---
 FAQ.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/FAQ.md b/FAQ.md
index a1ba15c..98c8cd0 100644
--- a/FAQ.md
+++ b/FAQ.md
@@ -1,5 +1,5 @@
 ### 1. Why do ML models need privacy protection?
-Recent studies show that a malicious third party with access to a trained ML model, even without access to the training data itself, can still reveal sensitive, personal information about the people whose data was used to train the model. For example, it may be possible to reveal whether or not a person’s data is part of the model’s training set (membership inference), or even infer sensitive atributes about them, such as their salary (attribute inference). For more information see: https://github.com/IBM/ai-privacy-toolkit/wiki/Relevant-papers#membership-inference-attacks
+Recent studies show that a malicious third party with access to a trained ML model, even without access to the training data itself, can still reveal sensitive, personal information about the people whose data was used to train the model. For example, it may be possible to reveal whether or not a person’s data is part of the model’s training set (membership inference), or even infer sensitive atributes about them, such as their salary (attribute inference). For more information see: https://github.com/IBM/ai-privacy-toolkit/wiki/Relevant-papers#privacy-attacks-on-ml-models
 ### 2. What do you mean when you say anonymization?
 The ML-guided anonymization method implemented in the anonymization module of this toolkit is based on a long-known construct called k-anonymity, which was proposed by L. Sweeney in 2002 to address the problem of releasing personal data while preserving individual privacy.
 This is a method to reduce the likelihood of any single person being identified when the dataset is linked with other, external data sources. The approach is based on generalizing attributes and possibly deleting records until each record becomes indistinguishable from at least k − 1 other records. This generalization is applied only to those attributes that can be linked with other data sources containing identifiers, called quasi-identifiers (QI).