
25 commits

Author SHA1 Message Date
abigailgold
57e38ea4fa
Support for many new model output types (#93)
* General model wrappers and methods supporting multi-label classifiers
* Support for pytorch multi-label binary classifier
* New model output types + single implementation of score method that supports multiple output types. 
* Anonymization with pytorch multi-output binary model
* Support for multi-label binary models in minimizer. 
* Support for multi-label logits/probabilities
---------
Signed-off-by: abigailt <abigailt@il.ibm.com>
2024-07-03 09:04:59 -04:00
abigailgold
a8f5326572
Fix issue with computed ranges for one-hot encoded features (#90)
Signed-off-by: abigailt <abigailt@il.ibm.com>
2024-01-17 12:45:22 -05:00
abigailgold
6d81cd8ed4
Support for one-hot encoded features in minimization (#87)
* Initial version with first working test
* Make sure representative values in generalizations for 1-hot encoded features are consistent.
* Updated notebooks for one-hot encoded data
* Review comments

Signed-off-by: abigailt <abigailt@il.ibm.com>
2023-12-24 18:18:18 -05:00
abigailgold
5dce961092
Support 1-hot encoded features in anonymization + fixes related to encoding in minimization (#86)
* Support 1-hot encoded features in anonymization (#72)
* Fix anonymization adult notebook + new notebook to demonstrate anonymization on 1-hot encoded data

* Minimizer: No default encoder, if none provided data is supplied to the model as is. Fix data type of representative values. Fix and add more tests.

Signed-off-by: abigailt <abigailt@il.ibm.com>
2023-10-19 11:48:15 +03:00
abigailgold
26addd192f
Support pytorch models in data minimization (#85)
* Support pytorch models in data minimization

Signed-off-by: abigailt <abigailt@il.ibm.com>
2023-09-21 17:48:15 +03:00
abigailgold
13a0567183
Make data minimization more consistent and performant (#83)
* Update requirements

* Update incompatible scipy version

* Reduce runtime of dataset assessment tests

* ncp is now a class that holds 3 values: fit_score, transform_score and generalizations_score, so all calculated ncp scores are stored regardless of the order in which the different methods are called.
Generalizations can now be applied either from tree cells or from the global generalizations struct, depending on the value of generalize_using_transform. Representative values can also be computed from the global generalizations.
Removing a feature from the generalization can also be applied in either mode.

* Compute generalizations with test data when possible (for computing better representatives).

* Externalize common test code to methods.
2023-08-21 18:39:15 +03:00
abigailgold
d52fcd0041
Formatting (#68)
Fix most flake/lint errors and ignore a few others

Signed-off-by: abigailt <abigailt@il.ibm.com>
2022-12-25 15:13:57 +02:00
abigailt
a76c3d2714
Fix random state to make tests pass
Signed-off-by: abigailt <abigailt@il.ibm.com>
2022-12-21 09:51:49 +02:00
abigailt
ba88bc09ba
Add option for non-stratified split in minimizer
Signed-off-by: abigailt <abigailt@il.ibm.com>
2022-12-21 09:23:19 +02:00
abigailgold
dfa684da6b
Consistent one-hot-encoding (#38)
* Reuse code between generalize and transform methods

* Option to get encoder from user

* Consistent encoding for decision tree and generalizations (separate from target model encoding)
2022-05-22 18:02:33 +03:00
abigailt
7055d5ecf6
Fix bug in pruning loop + fix test
2022-05-19 18:07:03 +03:00
abigailt
186f11eaaf
Fix misclassification of categorical features with no generalizations (now appear under the 'untouched' category)
2022-05-19 16:42:31 +03:00
abigailgold
fe676fa426
New model wrappers (#32)
* keras wrapper + blackbox classifier wrapper (fix #7)

* fix error in NCP calculation

* Update notebooks

* Fix #25 (incorrect attack_feature indexes for social feature in notebook)

* Consistent naming of internal parameters
2022-05-12 15:44:29 +03:00
abigailgold
fd6be8e778
Documentation updates (#29)
* Bump version to 0.1.0 (breaking changes to some APIs)

* Update documentation

* Update requirements

* gitignore
2022-05-02 11:46:18 +03:00
abigailgold
2b2dab6bef
Data and Model wrappers (#26)
* Squashed commit of wrappers:

    Wrapper minimizer

    * apply dataset wrapper on minimizer
    * apply changes on minimization notebook
    * add black_box_access and unlimited_queries params

    Dataset wrapper anonymizer

    Add features_names to ArrayDataset
    and allow providing feature names, not just indexes, for QI and categorical features

    update notebooks

    categorical features and QI passed by indexes
    dataset includes feature names and is_pandas param

    add pytorch Dataset

    Remove redundant code.
    Use data wrappers in model wrapper APIs.

    add generic dataset components

    Create initial version of wrappers for models

* Fix handling of categorical features
2022-04-27 12:33:27 +03:00
abigailt
c47819a031
Update docs
2022-02-23 19:40:11 +02:00
abigailt
7e2ce7fe96
Merge remote-tracking branch 'origin/main' into main
2022-02-23 19:26:44 +02:00
abigailt
7fbd1e4b90
Update version and docs
2022-02-23 19:22:54 +02:00
abigailgold
9de078f937
Update readme's with paper citations (#21)
2022-02-01 12:27:22 +02:00
olasaadi
3feebe8973
Regression minimization (#20)
* support regression in minimization and add test

* fix #10
2022-01-27 15:57:55 +02:00
olasaadi
a9a93c8a3a
Train just on qi (#15)
* QI updates
* update code to support training ML on QI features
* fix code so that features that are not part of the QI are excluded from generalizations, and add description
* merging two branches, training on QI and on all data
* adding tests and asserts
2022-01-12 17:01:27 +02:00
olasaadi
2eb626c00c
Sup cat features (#14)
* support categorical features

* update the documentation and readme;
added a test for the case where cells are supplied as a param.

* add big tests (adult and iris)
and fix bugs

* update transform to return numpy if original data is numpy

* added nursery test

* break loop if there is an illegal level

* Stop pruning one step before passing accuracy threshold

* adding asserts and fix DecisionTreeClassifier init

* Fix tests

Co-authored-by: abigailt <abigailt@il.ibm.com>
2022-01-11 09:51:04 +02:00
abigailgold
43952e2332
Minimization fixes (#12)
* Fixes related to corner cases in calculating generalizations

* Fix print

* Fix corner cases in transform as well

* Improve prints + bug fixes in calculation of feature to remove

* Notebook demonstrating ai minimization
2021-08-17 21:19:48 +03:00
abigailt
c06e2180e9
Fix images
2021-07-12 16:06:32 +03:00
abigailgold
f2e1364b43
Add data minimization functionality to the ai-privacy-toolkit (#3)
* Fix directory issue when running tests for first time

* Initial version of data minimization

* Update version and documentation

* Fix documentation
2021-07-12 15:56:42 +03:00