Update readme's with paper citations (#21)

abigailgold 2022-02-01 12:27:22 +02:00 committed by GitHub
parent 3feebe8973
commit 9de078f937
2 changed files with 23 additions and 12 deletions


@ -19,4 +19,11 @@ The following figure depicts the overall process:
</p>
<br />
Citation
--------
Goldsteen A., Ezov G., Shmelkin R., Moffie M., Farkash A. (2022) Anonymizing Machine Learning Models. In: Garcia-Alfaro
J., Muñoz-Tapia J.L., Navarro-Arribas G., Soriano M. (eds) Data Privacy Management, Cryptocurrencies and Blockchain
Technology. DPM 2021, CBT 2021. Lecture Notes in Computer Science, vol 13140. Springer, Cham.
https://doi.org/10.1007/978-3-030-93944-1_8


@ -37,8 +37,7 @@ The current implementation supports numeric features and categorical features.
Start by training your machine learning model. In this example, we will use a ``DecisionTreeClassifier``, but any
scikit-learn model can be used. We will use the iris dataset in our example.
.. code:: python
```
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
@ -48,36 +47,37 @@ scikit-learn model can be used. We will use the iris dataset in our example.
base_est = DecisionTreeClassifier()
base_est.fit(X_train, y_train)
```
Now create the ``GeneralizeToRepresentative`` transformer and train it. Supply it with the original model and the
desired target accuracy. The training process may receive the original labeled training data or the model's predictions
on the data.
.. code:: python
```
predictions = base_est.predict(X_train)
gen = GeneralizeToRepresentative(base_est, target_accuracy=0.9)
gen.fit(X_train, predictions)
```
Now use the transformer to transform new data, for example the test data.
.. code:: python
```
transformed = gen.transform(X_test)
```
The transformed data has the same columns and formats as the original data, so it can be used directly to derive
predictions from the original model.
.. code:: python
```
new_predictions = base_est.predict(transformed)
```
To export the resulting generalizations, retrieve the ``Transformer``'s ``_generalize`` parameter.
.. code:: python
```
generalizations = gen._generalize
```
The returned object has the following structure::
{
@ -103,6 +103,10 @@ Where each value inside the range list represents a cutoff point. For example, f
this example are: ``<21.5, 21.5-39.0, 39.0-51.0, 51.0-70.5, >70.5``. The ``untouched`` list represents features that
were not generalized, i.e., their values should remain unchanged.
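As an illustration of how such a cutoff list can be read, the following minimal sketch maps a raw value to the range it falls in. It is not part of the toolkit: the helper names ``range_labels`` and ``range_index`` are hypothetical, and assigning a value that equals a cutoff to the upper range is an assumption, since the text above does not specify boundary handling.

```python
from bisect import bisect_right


def range_labels(cutoffs):
    """Build readable labels for the intervals delimited by sorted cutoffs."""
    if not cutoffs:
        return ["any"]
    labels = [f"<{cutoffs[0]}"]
    labels += [f"{low}-{high}" for low, high in zip(cutoffs, cutoffs[1:])]
    labels.append(f">{cutoffs[-1]}")
    return labels


def range_index(value, cutoffs):
    """Return the index of the interval containing value.

    Assumption: a value equal to a cutoff is placed in the upper interval.
    """
    return bisect_right(cutoffs, value)


# Cutoff points taken from the example above.
cutoffs = [21.5, 39.0, 51.0, 70.5]
labels = range_labels(cutoffs)

for value in (18, 30, 45, 80):
    print(value, "->", labels[range_index(value, cutoffs)])
```

A list of four cutoffs therefore yields five ranges, matching the ranges listed above.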
Citation
--------
Goldsteen, A., Ezov, G., Shmelkin, R. et al. Data minimization for GDPR compliance in machine learning models. AI Ethics
(2021). https://doi.org/10.1007/s43681-021-00095-8