mirror of https://github.com/IBM/ai-privacy-toolkit.git
synced 2026-05-07 02:52:39 +02:00
Update readme's with paper citations (#21)
parent 3feebe8973
commit 9de078f937
2 changed files with 23 additions and 12 deletions

@ -19,4 +19,11 @@ The following figure depicts the overall process:
</p>
<br />

Citation
--------
Goldsteen A., Ezov G., Shmelkin R., Moffie M., Farkash A. (2022) Anonymizing Machine Learning Models. In: Garcia-Alfaro
J., Muñoz-Tapia J.L., Navarro-Arribas G., Soriano M. (eds) Data Privacy Management, Cryptocurrencies and Blockchain
Technology. DPM 2021, CBT 2021. Lecture Notes in Computer Science, vol 13140. Springer, Cham.
https://doi.org/10.1007/978-3-030-93944-1_8

@ -37,8 +37,7 @@ The current implementation supports numeric features and categorical features.
Start by training your machine learning model. In this example, we will use a ``DecisionTreeClassifier``, but any
scikit-learn model can be used. We will use the iris dataset in our example.

.. code:: python

```
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

@ -48,36 +47,37 @@ scikit-learn model can be used. We will use the iris dataset in our example.

base_est = DecisionTreeClassifier()
base_est.fit(X_train, y_train)
```

Now create the ``GeneralizeToRepresentative`` transformer and train it. Supply it with the original model and the
desired target accuracy. The training process may receive the original labeled training data or the model's predictions
on the data.

.. code:: python

```
predictions = base_est.predict(X_train)
gen = GeneralizeToRepresentative(base_est, target_accuracy=0.9)
gen.fit(X_train, predictions)
```

Now use the transformer to transform new data, for example the test data.

.. code:: python

```
transformed = gen.transform(X_test)
```

The transformed data has the same columns and formats as the original data, so it can be used directly to derive
predictions from the original model.

.. code:: python

```
new_predictions = base_est.predict(transformed)

```

To export the resulting generalizations, retrieve the ``Transformer``'s ``_generalize`` parameter.

.. code:: python

```
generalizations = gen._generalize

```

The returned object has the following structure::

{

@ -103,6 +103,10 @@ Where each value inside the range list represents a cutoff point. For example, f
this example are: ``<21.5, 21.5-39.0, 39.0-51.0, 51.0-70.5, >70.5``. The ``untouched`` list represents features that
were not generalized, i.e., their values should remain unchanged.
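
As an illustration of how a list of cutoff points maps to ranges, the following minimal sketch rebuilds the range labels quoted above and looks up the range for a single value. The helper names ``build_range_labels`` and ``find_range`` are hypothetical and are not part of the toolkit. The handling of a value that is exactly equal to a cutoff (assigned here to the upper range) is an assumption, since the text above does not specify it.

```python
from bisect import bisect_right

# Cutoff points taken from the example ranges quoted in the text above.
CUTOFFS = [21.5, 39.0, 51.0, 70.5]


def build_range_labels(cutoffs):
    """Return human-readable labels for the ranges defined by sorted cutoffs.

    Hypothetical helper for illustration; not part of the toolkit.
    """
    labels = [f"<{cutoffs[0]}"]
    for low, high in zip(cutoffs, cutoffs[1:]):
        labels.append(f"{low}-{high}")
    labels.append(f">{cutoffs[-1]}")
    return labels


def find_range(value, cutoffs):
    """Return the label of the range that contains ``value``.

    Assumption: a value equal to a cutoff is placed in the upper range.
    """
    labels = build_range_labels(cutoffs)
    return labels[bisect_right(cutoffs, value)]


labels = build_range_labels(CUTOFFS)
in_second_range = find_range(30, CUTOFFS)
above_last_cutoff = find_range(80, CUTOFFS)
```

With ``n`` cutoff points the feature is split into ``n + 1`` ranges, which is why four cutoffs produce the five ranges listed above.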

Citation
--------
Goldsteen, A., Ezov, G., Shmelkin, R. et al. Data minimization for GDPR compliance in machine learning models. AI Ethics
(2021). https://doi.org/10.1007/s43681-021-00095-8