mirror of
https://github.com/IBM/ai-privacy-toolkit.git
synced 2026-06-08 15:05:13 +02:00
Updated FAQ (markdown)
parent 6b21d5af59
commit 761be56f4d
1 changed files with 7 additions and 2 deletions
9
FAQ.md
@@ -14,9 +14,14 @@ However, a deeper analysis of these cases reveals that this typically occurs whe
In addition, in the ML-guided anonymization paper (https://arxiv.org/abs/2007.13086) we show that models trained on data anonymized with this method are protected against membership inference attacks to a degree similar to that of more robust methods, such as those based on differential privacy. We also demonstrate a reduced risk of attribute inference when using this method.
### 4. Why not just use differential privacy instead?
Differential privacy has several major advantages. First and foremost, it provides a robust mathematical privacy guarantee, whereas k-anonymity is considered a syntactic privacy construct. Differential privacy is also future-proof: no matter which datasets become available in the future, the re-identification risk will not increase. Finally, it is suitable for high-dimensional and non-tabular data, such as images or text.

However, it also has a few drawbacks. It is much more invasive and complex to implement and use, and it requires the involvement of data scientists, since the original training algorithm must be replaced with a new one. Moreover, each type of ML model, and in some cases each architecture or other internal implementation detail, requires a different differentially private implementation. This makes it much harder to adopt in large organizations with many diverse types of models. Finally, differentially private training is typically much more resource-intensive than non-private, highly optimized training algorithms.

ML-guided anonymization, on the other hand, sits “outside” the training process, which does not need to be replaced or changed in any way, so the method is model-agnostic. The existing training algorithms, architectures and even hyper-parameters can be reused, which makes it much easier to integrate into existing ML pipelines. Because it does not modify the training process, it can be applied in a wide variety of use cases, including machine learning as a service. However, it only works for tabular, relatively low-dimensional data.
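
To make the contrast with differential privacy concrete: k-anonymity is called syntactic because it is a property of the released table itself and can be checked simply by counting records. The following is a minimal sketch of such a check. The function name, the column names and the toy records are illustrative assumptions and are not part of the toolkit.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values that
    occurs in `records` is shared by at least `k` records."""
    counts = Counter(
        tuple(record[name] for name in quasi_identifiers) for record in records
    )
    return all(count >= k for count in counts.values())


# Hypothetical toy table in which age and zip code are already generalized.
records = [
    {"age": "20-29", "zip": "100**", "diagnosis": "flu"},
    {"age": "20-29", "zip": "100**", "diagnosis": "asthma"},
    {"age": "30-39", "zip": "200**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "200**", "diagnosis": "diabetes"},
]

two_anonymous = is_k_anonymous(records, ["age", "zip"], k=2)
three_anonymous = is_k_anonymous(records, ["age", "zip"], k=3)
```

Note that the check only inspects the table. It says nothing about what an attacker with additional datasets could infer, which is the kind of guarantee differential privacy adds.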
### 5. Do I really need an already trained model to use ML-guided anonymization?
Working with an existing model enables the highest level of tailoring and will likely yield the most accurate results. The original model’s predictions are used to guide the anonymization process, i.e., the creation of the groups of k records that will be generalized together. The initial model used to generate these predictions may be a simple, representative model trained on a subset of the data, or a pre-trained model that performs a classification task similar to that of the target model. However, if no such model is available, the true class labels may be used instead of the model's predictions.
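
The idea of letting predictions (or labels) guide the grouping can be sketched in a few lines. The sketch below is a simplified illustration and not the toolkit's implementation. It assumes numeric quasi-identifiers, uses a small hand-written decision-tree-style partition in which every leaf must hold at least k records, and generalizes each group to min/max ranges. All function names and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, guide, k):
    """Find the threshold split with the lowest weighted Gini impurity
    that leaves at least k records on each side, or None if there is none."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [i for i in range(n) if rows[i][f] <= t]
            right = [i for i in range(n) if rows[i][f] > t]
            if len(left) < k or len(right) < k:
                continue
            score = (len(left) * gini([guide[i] for i in left])
                     + len(right) * gini([guide[i] for i in right])) / n
            if best is None or score < best[0]:
                best = (score, left, right)
    return best


def build_groups(rows, guide, k, idx=None):
    """Recursively partition record indices into groups of size >= k,
    trying to keep records with the same guide value together."""
    if idx is None:
        idx = list(range(len(rows)))
    if len({guide[i] for i in idx}) <= 1:
        return [idx]  # all guide values agree, no need to split further
    split = best_split([rows[i] for i in idx], [guide[i] for i in idx], k)
    if split is None:
        return [idx]  # no split keeps both sides at k records or more
    _, left, right = split
    return (build_groups(rows, guide, k, [idx[i] for i in left])
            + build_groups(rows, guide, k, [idx[i] for i in right]))


def generalize(rows, groups):
    """Replace each record by the (min, max) range of its group, per feature."""
    out = [None] * len(rows)
    for group in groups:
        ranges = tuple(
            (min(rows[i][f] for i in group), max(rows[i][f] for i in group))
            for f in range(len(rows[0]))
        )
        for i in group:
            out[i] = ranges
    return out


# Hypothetical quasi-identifiers (age, weekly hours) and guide values.
rows = [(23, 40), (25, 38), (31, 45), (35, 50),
        (47, 20), (52, 60), (58, 55), (61, 30)]
predictions = [0, 0, 0, 1, 1, 1, 1, 0]  # model output, or true labels

groups = build_groups(rows, predictions, k=2)
anonymized = generalize(rows, groups)
```

Here `predictions` stands in for the output of the initial model. If no such model is available, the true class labels can be passed in the same position, as described above.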
### 6. What kind of models does ML-guided anonymization work for?
This method is model-agnostic and does not require any changes to the training algorithm. So far, we have only tested it on classification models in the supervised learning domain. However, we believe it may be applicable to many more use cases, such as regression models and possibly even unsupervised learning models.