Merge remote-tracking branch 'origin/main' into main

abigailt 2022-02-23 19:26:44 +02:00
commit 7e2ce7fe96
5 changed files with 671 additions and 13 deletions

@ -19,4 +19,11 @@ The following figure depicts the overall process:
</p>
<br />
Citation
--------
Goldsteen A., Ezov G., Shmelkin R., Moffie M., Farkash A. (2022) Anonymizing Machine Learning Models. In: Garcia-Alfaro
J., Muñoz-Tapia J.L., Navarro-Arribas G., Soriano M. (eds) Data Privacy Management, Cryptocurrencies and Blockchain
Technology. DPM 2021, CBT 2021. Lecture Notes in Computer Science, vol 13140. Springer, Cham.
https://doi.org/10.1007/978-3-030-93944-1_8

@ -37,8 +37,7 @@ The current implementation supports numeric features and categorical features.
Start by training your machine learning model. In this example, we will use a ``DecisionTreeClassifier``, but any
scikit-learn model can be used. We will use the iris dataset in our example.
.. code:: python
```
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
@ -48,36 +47,37 @@ scikit-learn model can be used. We will use the iris dataset in our example.
base_est = DecisionTreeClassifier()
base_est.fit(X_train, y_train)
```
Now create the ``GeneralizeToRepresentative`` transformer and train it. Supply it with the original model and the
desired target accuracy. The training process may receive the original labeled training data or the model's predictions
on the data.
.. code:: python
```
predictions = base_est.predict(X_train)
gen = GeneralizeToRepresentative(base_est, target_accuracy=0.9)
gen.fit(X_train, predictions)
```
Now use the transformer to transform new data, for example the test data.
.. code:: python
```
transformed = gen.transform(X_test)
```
The transformed data has the same columns and formats as the original data, so it can be used directly to derive
predictions from the original model.
.. code:: python
```
new_predictions = base_est.predict(transformed)
```
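Because the transformer in this example was fitted on the model's own predictions, one informal way to check the result is to measure how often the predictions on the transformed data agree with the predictions on the original data. The sketch below is illustrative only: ``agreement_rate`` is a hypothetical helper, not part of the toolkit, and the label lists are made-up values.

```python
def agreement_rate(original_predictions, new_predictions):
    """Return the fraction of samples on which two prediction sequences agree."""
    original = list(original_predictions)
    new = list(new_predictions)
    if len(original) != len(new):
        raise ValueError("prediction sequences must have the same length")
    if not original:
        raise ValueError("prediction sequences must not be empty")
    # Count positions where both sequences hold the same label.
    matches = sum(1 for a, b in zip(original, new) if a == b)
    return matches / len(original)


# Made-up labels for illustration; in practice one would pass
# base_est.predict(X_test) and base_est.predict(transformed).
rate = agreement_rate([0, 1, 2, 1, 0], [0, 1, 2, 2, 0])
```

A rate close to or above the chosen ``target_accuracy`` suggests the generalization did not noticeably change the model's behavior on this data.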
To export the resulting generalizations, retrieve the ``Transformer``'s ``_generalize`` parameter.
.. code:: python
```
generalizations = gen._generalize
```
The returned object has the following structure::
{
@ -103,6 +103,10 @@ Where each value inside the range list represents a cutoff point. For example, f
this example are: ``<21.5, 21.5-39.0, 39.0-51.0, 51.0-70.5, >70.5``. The ``untouched`` list represents features that
were not generalized, i.e., their values should remain unchanged.
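The mapping from cutoff points to ranges can be sketched as follows. This is a minimal illustration, not toolkit code: ``cutoffs_to_ranges`` is a hypothetical helper, and the cutoff list is inferred from the ranges quoted above.

```python
def cutoffs_to_ranges(cutoffs):
    """Turn a sorted list of cutoff points into human-readable range labels."""
    if not cutoffs:
        return []
    # Open-ended range below the first cutoff.
    labels = [f"<{cutoffs[0]}"]
    # One closed range between each pair of neighboring cutoffs.
    labels += [f"{low}-{high}" for low, high in zip(cutoffs, cutoffs[1:])]
    # Open-ended range above the last cutoff.
    labels.append(f">{cutoffs[-1]}")
    return labels


# Cutoff values inferred from the ranges listed in the text above.
ranges = cutoffs_to_ranges([21.5, 39.0, 51.0, 70.5])
```

A list of ``n`` cutoff points therefore yields ``n + 1`` ranges.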
Citation
--------
Goldsteen, A., Ezov, G., Shmelkin, R. et al. Data minimization for GDPR compliance in machine learning models. AI Ethics
(2021). https://doi.org/10.1007/s43681-021-00095-8