Commit graph

67 commits

Author SHA1 Message Date
abigailgold
57e38ea4fa
Support for many new model output types (#93)
* General model wrappers and methods supporting multi-label classifiers
* Support for pytorch multi-label binary classifier
* New model output types + single implementation of score method that supports multiple output types. 
* Anonymization with pytorch multi-output binary model
* Support for multi-label binary models in minimizer. 
* Support for multi-label logits/probabilities
---------
Signed-off-by: abigailt <abigailt@il.ibm.com>
2024-07-03 09:04:59 -04:00
abigailgold
e00535d120
Fix error with pandas dataframes (#92)
* Fix error with pandas dataframes in _columns_different_distributions + add appropriate test
* Update documentation of classes to reflect that all data should be encoded and scaled.

---------

Signed-off-by: abigailt <abigailt@il.ibm.com>
2024-02-13 08:56:12 -05:00
abigailgold
a8f5326572
Fix issue with computed ranges for one-hot encoded features (#90)
Signed-off-by: abigailt <abigailt@il.ibm.com>
2024-01-17 12:45:22 -05:00
abigailgold
6d81cd8ed4
Support for one-hot encoded features in minimization (#87)
* Initial version with first working test
* Make sure representative values in generalizations for 1-hot encoded features are consistent.
* Updated notebooks for one-hot encoded data
* Review comments

Signed-off-by: abigailt <abigailt@il.ibm.com>
2023-12-24 18:18:18 -05:00
abigailgold
5dce961092
Support 1-hot encoded features in anonymization + fixes related to encoding in minimization (#86)
* Support 1-hot encoded features in anonymization (#72)
* Fix anonymization adult notebook + new notebook to demonstrate anonymization on 1-hot encoded data

* Minimizer: No default encoder; if none is provided, data is supplied to the model as is. Fix data type of representative values. Fix and add more tests.

Signed-off-by: abigailt <abigailt@il.ibm.com>
2023-10-19 11:48:15 +03:00
abigailgold
26addd192f
Support pytorch models in data minimization (#85)
* Support pytorch models in data minimization

Signed-off-by: abigailt <abigailt@il.ibm.com>
2023-09-21 17:48:15 +03:00
andersonm-ibm
a40484e0c9
Add column distribution comparison, and a third method for dataset assessment by membership classification (#84)
* Add column distribution comparison, and a third method for dataset assessment by membership classification

* Address review comments, add more distribution comparison tests, and make them externally configurable, along with alpha.

Signed-off-by: Maya Anderson <mayaa@il.ibm.com>
2023-09-21 16:43:19 +03:00
abigailgold
13a0567183
Make data minimization more consistent and performant (#83)
* Update requirements

* Update incompatible scipy version

* Reduce runtime of dataset assessment tests

* ncp is now a class that contains 3 values: fit_score, transform_score and generalizations_score, so that all calculated ncp scores are stored regardless of the order in which the different methods are called.
Generalizations can now be applied either from tree cells or from global generalizations struct depending on the value of generalize_using_transform. Representative values can also be computed from global generalizations.
Removing a feature from the generalization can also be applied in either mode.

* Compute generalizations with test data when possible (for computing better representatives).

* Externalize common test code to methods.
2023-08-21 18:39:15 +03:00
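The NCP change described in #83 can be pictured roughly as follows. This is a minimal sketch, not the toolkit's code: the class name `NCPScores` and the `record` helper are hypothetical, and only the three field names come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NCPScores:
    """Hypothetical container; only the three field names come from the commit message."""
    fit_score: Optional[float] = None
    transform_score: Optional[float] = None
    generalizations_score: Optional[float] = None


def record(scores: NCPScores, stage: str, value: float) -> NCPScores:
    """Store the score for one stage without touching the others (hypothetical helper)."""
    field = f"{stage}_score"
    if field not in ("fit_score", "transform_score", "generalizations_score"):
        raise ValueError(f"unknown stage: {stage}")
    setattr(scores, field, value)
    return scores


scores = NCPScores()
record(scores, "transform", 0.25)  # recording transform before fit loses nothing
record(scores, "fit", 0.5)
```

Because each stage writes to its own field, the order of the calls does not matter, and a stage that has not run yet simply stays `None`.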
andersonm-ibm
e9a225501f
Limit scikit-learn version because of API changes (#81)
* Limit scikit-learn versions between 0.22.2 and 1.1.3, remove deprecated load_boston().

* Set pytest configuration option to show test progress in detail.

* Change np.int to int according to DeprecationWarning

Signed-off-by: Maya Anderson <mayaa@il.ibm.com>
2023-05-14 08:52:06 +03:00
andersonm-ibm
3885ab9d9d
Change flake8 warnings back to errors. Fix tests so that they pass flake8. (#76)
Signed-off-by: Maya Anderson <mayaa@il.ibm.com>
2023-05-11 11:33:50 +03:00
abigailgold
8a9ef80146
Increase version to 0.2.0 (#74)
* Remove tensorflow dependency if not using keras model
* Remove xgboost dependency if not using xgboost model
* Documentation updates

Signed-off-by: abigailt <abigailt@il.ibm.com>
2023-05-08 12:50:55 +03:00
Maya Anderson
dbb958f791 Merge pull request #71 from IBM/dataset_assessment
Add AI privacy Dataset assessment module with two attack implementations.

Signed-off-by: Maya Anderson <mayaa@il.ibm.com>
2023-03-20 14:21:29 +02:00
abigailgold
d52fcd0041
Formatting (#68)
Fix most flake/lint errors and ignore a few others

Signed-off-by: abigailt <abigailt@il.ibm.com>
2022-12-25 15:13:57 +02:00
Maya Anderson
89bdcfc00e Prepare project for CI: cleanup dependencies, fix test data location, cleanup assert.
Signed-off-by: Maya Anderson <mayaa@il.ibm.com>
2022-12-20 16:00:36 +02:00
abigailt
64038f76f9 Merge with main 2022-08-01 18:12:03 +03:00
abigailt
dc5cc793ee Merge with main 2022-08-01 18:11:34 +03:00
abigailt
a9e2a35e18 Add support for xgboost XGBClassifier (#53) 2022-07-28 17:21:24 +03:00
olasaadi
74ce92acc4 fix 2022-07-26 18:37:44 +03:00
abigailt
a13415ad67 Externalize BlackboxClassifier dataset (x and predictions) 2022-07-25 16:31:45 +03:00
abigailt
fb534f7a0f BlackboxClassifier based on predictions to work with DatasetWithPredictions 2022-07-25 16:31:45 +03:00
abigailt
77a6e08c8e Keras regression support 2022-07-24 18:45:50 +03:00
Ron Shmelkin
c77e34e373
update pytorch wrapper to use torch loaders
fix tests
and dataset style
2022-07-24 14:31:47 +03:00
olasaadi
6f69f5557b fix bug 2022-07-20 18:29:48 +03:00
olasaadi
3bf26b67d2 fix 2022-07-20 17:36:00 +03:00
abigailt
a7d156660e Wrap predict method in BlackBoxClassifierPredictMethod to avoid exception in ART when supplied method returns scalars 2022-07-20 13:33:19 +03:00
abigailt
1cc73b3da1 Check for mismatch between model output type and actual output 2022-07-20 13:33:19 +03:00
abigailt
bc7ab0cc7f Add model type to blackbox classifier (#49) 2022-07-20 13:33:19 +03:00
olasaadi
4973fbebc6 fix 2022-07-19 21:16:39 +03:00
abigailgold
00f9c16863
Support additional use cases for data (#46)
* Make ART black box classifier not apply preprocessing to data
* Add option to store predictions (in addition to x,y) in Dataset and Data classes
2022-07-11 14:28:09 +03:00
Shlomit Shachor
e25e58b253
enhance calculation of nb classes + tests (#45)
* update get_nb_classes method to handle 1-hot and scalar input
2022-07-05 11:32:17 +03:00
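Handling both 1-hot and scalar labels when counting classes, as #45 describes, might look like the sketch below. The function name `count_classes` is hypothetical and this is not the toolkit's `get_nb_classes`; it also assumes scalar labels are integers numbered from 0.

```python
from typing import Sequence, Union

Label = Union[int, Sequence[int]]


def count_classes(y: Sequence[Label]) -> int:
    """Hypothetical helper: count classes from one-hot rows or 0-based scalar labels."""
    if len(y) == 0:
        return 0
    first = y[0]
    if isinstance(first, (list, tuple)):
        # One-hot input: one column per class, so the row width is the class count.
        return len(first)
    # Scalar input: assumes labels are integers 0..k-1, so the largest label + 1 is k.
    return int(max(y)) + 1
```

For example, `count_classes([[1, 0, 0], [0, 0, 1]])` and `count_classes([0, 2, 1, 2])` both describe a three-class problem.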
abigailgold
c6eb553a9f
Blackbox predict method (#43)
* Support output probabilities
* Support black box classifier with predict method
* Update requirements (security alert #1)
2022-06-30 18:23:53 +03:00
Shlomit Shachor
1c4b963add
Wrappers no train (#40)
1) Handle train None in Data
2) Update BB Classifier to handle None either for train or test (x or y)
2022-06-26 14:43:22 +03:00
olasaadi
21cba95a28 fix 2022-06-06 14:32:34 +03:00
olasaadi
c954f53ad7 fix 2022-06-06 14:02:40 +03:00
olasaadi
302d0c4b8c update 2022-06-02 15:25:07 +03:00
olasaadi
a3fb68fb56 update 2022-05-30 12:52:32 +03:00
olasaadi
023f8764da update 2022-05-30 11:51:22 +03:00
olasaadi
59d8b16bb4 fix 2022-05-23 12:49:38 +03:00
abigailgold
dfa684da6b
Consistent one-hot-encoding (#38)
* Reuse code between generalize and transform methods

* Option to get encoder from user

* Consistent encoding for decision tree and generalizations (separate from target model encoding)
2022-05-22 18:02:33 +03:00
abigailt
7055d5ecf6 Fix bug in pruning loop + fix test 2022-05-19 18:07:03 +03:00
abigailt
186f11eaaf Fix misclassification of categorical features with no generalizations (now appear under the 'untouched' category) 2022-05-19 16:42:31 +03:00
abigailgold
fe676fa426
New model wrappers (#32)
* keras wrapper + blackbox classifier wrapper (fix #7)

* fix error in NCP calculation

* Update notebooks

* Fix #25 (incorrect attack_feature indexes for social feature in notebook)

* Consistent naming of internal parameters
2022-05-12 15:44:29 +03:00
abigailgold
2b2dab6bef
Data and Model wrappers (#26)
* Squashed commit of wrappers:

    Wrapper minimizer

    * apply dataset wrapper on minimizer
    * apply changes on minimization notebook
    * add black_box_access and unlimited_queries params

    Dataset wrapper anonymizer

    Add features_names to ArrayDataset
    and allow providing features names in QI and Cat features not just indexes

    update notebooks

    categorical features and QI passed by indexes
    dataset include feature names and is_pandas param

    add pytorch Dataset

    Remove redundant code.
    Use data wrappers in model wrapper APIs.

    add generic dataset components 

    Create initial version of wrappers for models

* Fix handling of categorical features
2022-04-27 12:33:27 +03:00
abigailt
a37ff06df8 Squashed commit of the following:
commit d53818644e
Author: olasaadi <92303887+olasaadi@users.noreply.github.com>
Date:   Mon Mar 7 20:12:55 2022 +0200

    Build the dt on all features anon (#23)

    * add param to build the DT on all features and not just on QI
    * one-hot encoding only for categorical features

commit c47819a031
Author: abigailt <abigailt@il.ibm.com>
Date:   Wed Feb 23 19:40:11 2022 +0200

    Update docs

commit 7e2ce7fe96
Merge: 7fbd1e4 752871d
Author: abigailt <abigailt@il.ibm.com>
Date:   Wed Feb 23 19:26:44 2022 +0200

    Merge remote-tracking branch 'origin/main' into main

commit 7fbd1e4b90
Author: abigailt <abigailt@il.ibm.com>
Date:   Wed Feb 23 19:22:54 2022 +0200

    Update version and docs

commit 752871dd0c
Author: olasaadi <92303887+olasaadi@users.noreply.github.com>
Date:   Wed Feb 23 14:57:12 2022 +0200

    add minimization notebook (#22)

    * add german credit notebook to showcase new features (minimize only some features and categorical features)

    * add notebook to show minimization data on a regression problem
2022-04-25 17:39:30 +03:00
Ola Saadi
ac5d82aab6 Wrapper minimizer (#20)
* apply dataset wrapper on minimizer
* apply changes on minimization notebook
* add black_box_access and unlimited_queries params
2022-04-18 13:14:49 +03:00
ABIGAIL GOLDSTEEN
6b04fd5564 Remove failing assert
Regression scores do not necessarily have to be between 0 and 1 (as opposed to classification scores).
2022-04-05 14:51:02 +03:00
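To see why such an assert can fail, consider the coefficient of determination (R²). The commit does not say which regression score is used, so R² is an assumption here, and `r_squared` is a hypothetical helper written only for illustration.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Predictions that are worse than always predicting the mean give a negative score.
score = r_squared([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Here the residual sum of squares is 8 and the total sum of squares is 2, so the score is 1 - 8/2 = -3, well outside the [0, 1] range that holds for classification accuracy.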
Ola Saadi
5f6a258f8f Merge branch 'wrappers' into dataset_wrapper_anonimizer 2022-03-28 17:11:41 +03:00
olasaadi
b54f0a2382 fix tests 2022-03-24 19:35:26 +02:00
olasaadi
66c86dc595 fix notebook and add features_names to ArrayDataset
and allow providing features names in QI and Cat features not just indexes
2022-03-24 19:32:24 +02:00
olasaadi
312469212e fix docstring and fix assert in test 2022-03-22 13:59:28 +02:00