* General model wrappers and methods supporting multi-label classifiers
* Support for PyTorch multi-label binary classifier
* New model output types + single implementation of score method that supports multiple output types.
* Anonymization with PyTorch multi-output binary model
* Support for multi-label binary models in minimizer.
* Support for multi-label logits/probabilities
---------
Signed-off-by: abigailt <abigailt@il.ibm.com>
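
As an illustration of how a single score method can accept more than one output type, here is a minimal sketch in plain Python. The function name `multilabel_accuracy`, the `output_type` strings, the 0.5 threshold and the exact-match metric are all assumptions made for this example; they are not taken from the toolkit's code.

```python
import math


def _sigmoid(x):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def multilabel_accuracy(y_true, y_pred, output_type="probabilities", threshold=0.5):
    """Hypothetical helper: fraction of samples whose labels are ALL predicted correctly.

    y_true: list of rows of 0/1 labels, one column per label.
    y_pred: list of rows of model outputs (logits or probabilities).
    """
    if output_type not in ("logits", "probabilities"):
        raise ValueError(f"unsupported output type: {output_type}")

    correct = 0
    for true_row, pred_row in zip(y_true, y_pred):
        # Logits are squashed to probabilities first, so both output
        # types share the same thresholding step below.
        if output_type == "logits":
            pred_row = [_sigmoid(v) for v in pred_row]
        labels = [1 if p >= threshold else 0 for p in pred_row]
        if labels == list(true_row):
            correct += 1
    return correct / len(y_true)


y_true = [[1, 0], [0, 1]]
print(multilabel_accuracy(y_true, [[0.9, 0.1], [0.2, 0.8]]))                     # 1.0
print(multilabel_accuracy(y_true, [[2.0, -1.0], [-0.5, -3.0]], "logits"))        # 0.5
```

Converting logits to probabilities up front keeps one code path for the comparison itself, which is the point of having a single score implementation.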
* Fix error with pandas dataframes in _columns_different_distributions + add appropriate test
* Update documentation of classes to reflect that all data should be encoded and scaled.
---------
Signed-off-by: abigailt <abigailt@il.ibm.com>
* Initial version with first working test
* Make sure representative values in generalizations for 1-hot encoded features are consistent.
* Updated notebooks for one-hot encoded data
* Review comments
Signed-off-by: abigailt <abigailt@il.ibm.com>
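
One possible reading of "consistent" for 1-hot encoded features is that a representative value must have exactly one active column per encoded feature. The sketch below enforces that on a single representative row. The function name `make_one_hot_consistent` and the "largest value wins" rule are assumptions for illustration, not the toolkit's actual logic.

```python
def make_one_hot_consistent(representative, group):
    """Hypothetical helper: force exactly one 1.0 within a one-hot column group.

    representative: list of feature values for one representative row.
    group: indexes of the columns that together encode one categorical feature.
    """
    rep = list(representative)
    # Pick the column with the largest value; ties go to the first column.
    winner = max(group, key=lambda i: rep[i])
    for i in group:
        rep[i] = 1.0 if i == winner else 0.0
    return rep


# Column 0 is numeric; columns 1-3 encode one categorical feature.
print(make_one_hot_consistent([35.0, 0.4, 0.6, 0.0], [1, 2, 3]))  # [35.0, 0.0, 1.0, 0.0]
```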
* Support 1-hot encoded features in anonymization (#72)
* Fix anonymization adult notebook + new notebook to demonstrate anonymization on 1-hot encoded data
* Minimizer: no default encoder; if none is provided, data is supplied to the model as is. Fix data type of representative values. Fix and add more tests.
Signed-off-by: abigailt <abigailt@il.ibm.com>
* Add column distribution comparison, and a third method for dataset assessment by membership classification
* Address review comments, add more distribution comparison tests, and make them externally configurable, along with alpha.
Signed-off-by: Maya Anderson <mayaa@il.ibm.com>
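
To make the idea of a column distribution comparison with a configurable alpha concrete, here is a sketch using a two-sample Kolmogorov-Smirnov test with the standard large-sample critical value. The choice of the KS test, the function names `ks_statistic` and `columns_differ`, and the default alpha of 0.05 are assumptions for this example; the commit does not state which statistical tests are used.

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of two numeric samples."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def columns_differ(a, b, alpha=0.05):
    """Hypothetical check: True if the two columns look differently distributed.

    Uses the large-sample approximation of the KS critical value:
    c(alpha) * sqrt((n + m) / (n * m)), with c(alpha) = sqrt(-ln(alpha / 2) / 2).
    """
    n, m = len(a), len(b)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(a, b) > critical


original = list(range(100))
shifted = [x + 100 for x in original]
print(columns_differ(original, original))  # False
print(columns_differ(original, shifted))   # True
```

A smaller alpha raises the critical value, so fewer columns are flagged as different.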
* Update requirements
* Update incompatible scipy version
* Reduce runtime of dataset assessment tests
* ncp is now a class that holds three values: fit_score, transform_score and generalizations_score. All calculated ncp scores are stored, regardless of the order in which the different methods are called.
* Generalizations can now be applied either from tree cells or from the global generalizations struct, depending on the value of generalize_using_transform. Representative values can also be computed from global generalizations.
* Removing a feature from the generalization can also be applied in either mode.
* Compute generalizations with test data when possible (for computing better representatives).
* Externalize common test code to methods.
* Limit scikit-learn to versions between 0.22.2 and 1.1.3; remove use of the deprecated load_boston().
* Set pytest configuration option to show test progress in detail.
* Replace np.int with int, following NumPy's DeprecationWarning
Signed-off-by: Maya Anderson <mayaa@il.ibm.com>
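
The ncp change described in this commit can be pictured as a small container with one slot per score. The sketch below is an assumption about the shape of such a class: the three field names come from the commit message, while the class name `NCPScores` and the use of a dataclass with `None` defaults are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NCPScores:
    """Hypothetical container: each score stays None until it is computed."""
    fit_score: Optional[float] = None
    transform_score: Optional[float] = None
    generalizations_score: Optional[float] = None


scores = NCPScores()
# Scores can be filled in any order; setting one never overwrites another.
scores.transform_score = 0.3
scores.fit_score = 0.2
print(scores.generalizations_score)  # None
```

Keeping a separate field per method is what makes the call order irrelevant: a later call only fills in its own slot.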
* Remove tensorflow dependency if not using keras model
* Remove xgboost dependency if not using xgboost model
* Documentation updates
Signed-off-by: abigailt <abigailt@il.ibm.com>
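
A common way to drop a hard dependency is to import the heavy package only when the feature that needs it is actually used. The sketch below shows that pattern with the standard library's `importlib`. The helper name `load_optional` and the error message are assumptions for illustration and do not describe how the toolkit itself handles these imports.

```python
import importlib


def load_optional(module_name, feature):
    """Hypothetical helper: import a package lazily, with a clear error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{feature} requires the optional package '{module_name}'"
        ) from exc


# The import only happens when the feature is requested, so users who never
# wrap an xgboost or keras model do not need those packages installed.
json_module = load_optional("json", "JSON export")
print(json_module.dumps({"ok": True}))  # {"ok": true}
```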
* Reuse code between generalize and transform methods
* Option to get encoder from user
* Consistent encoding for decision tree and generalizations (separate from target model encoding)
* Squashed commit of wrappers:
Wrapper minimizer
* apply dataset wrapper on minimizer
* apply changes on minimization notebook
* add black_box_access and unlimited_queries params
Dataset wrapper anonymizer
Add features_names to ArrayDataset and allow providing feature names, not just indexes, for QI and categorical features
update notebooks
categorical features and QI passed by indexes
dataset includes feature names and is_pandas param
add pytorch Dataset
Remove redundant code.
Use data wrappers in model wrapper APIs.
add generic dataset components
Create initial version of wrappers for models
* Fix handling of categorical features
commit d53818644e
Author: olasaadi <92303887+olasaadi@users.noreply.github.com>
Date: Mon Mar 7 20:12:55 2022 +0200
Build the dt on all features anon (#23)
* add param to build the DT on all features and not just on QI
* one-hot encoding only for categorical features
commit c47819a031
Author: abigailt <abigailt@il.ibm.com>
Date: Wed Feb 23 19:40:11 2022 +0200
Update docs
commit 7e2ce7fe96
Merge: 7fbd1e4 752871d
Author: abigailt <abigailt@il.ibm.com>
Date: Wed Feb 23 19:26:44 2022 +0200
Merge remote-tracking branch 'origin/main' into main
commit 7fbd1e4b90
Author: abigailt <abigailt@il.ibm.com>
Date: Wed Feb 23 19:22:54 2022 +0200
Update version and docs
commit 752871dd0c
Author: olasaadi <92303887+olasaadi@users.noreply.github.com>
Date: Wed Feb 23 14:57:12 2022 +0200
add minimization notebook (#22)
* add German credit notebook to showcase new features (minimizing only some features, and categorical features)
* add notebook to show data minimization on a regression problem