* General model wrappers and methods supporting multi-label classifiers
* Support for PyTorch multi-label binary classifiers
* New model output types + single implementation of score method that supports multiple output types.
* Anonymization with a PyTorch multi-output binary model
* Support for multi-label binary models in minimizer.
* Support for multi-label logits/probabilities
---------
Signed-off-by: abigailt <abigailt@il.ibm.com>
* Initial version with first working test
* Make sure representative values in generalizations for 1-hot encoded features are consistent.
* Updated notebooks for one-hot encoded data
* Review comments
Signed-off-by: abigailt <abigailt@il.ibm.com>
* Support 1-hot encoded features in anonymization (#72)
* Fix anonymization adult notebook + new notebook to demonstrate anonymization on 1-hot encoded data
* Minimizer: no default encoder; if none is provided, data is supplied to the model as is. Fix data type of representative values. Fix and add more tests.
Signed-off-by: abigailt <abigailt@il.ibm.com>
* Update requirements
* Update incompatible scipy version
* Reduce runtime of dataset assessment tests
* ncp is now a class that contains three values: fit_score, transform_score and generalizations_score. All calculated ncp scores are stored, so the order in which the different methods are called does not matter.
Generalizations can now be applied either from tree cells or from the global generalizations struct, depending on the value of generalize_using_transform. Representative values can also be computed from the global generalizations.
Removing a feature from the generalization can also be applied in either mode.
* Compute generalizations with test data when possible (for computing better representatives).
* Externalize common test code to methods.
* Reuse code between generalize and transform methods
* Option to get encoder from user
* Consistent encoding for decision tree and generalizations (separate from target model encoding)
* Squashed commit of wrappers:
Wrapper minimizer
* apply dataset wrapper on minimizer
* apply changes on minimization notebook
* add black_box_access and unlimited_queries params
Dataset wrapper anonymizer
Add features_names to ArrayDataset
and allow providing feature names, not just indexes, for QI and categorical features
update notebooks
categorical features and QI passed by indexes
dataset includes feature names and is_pandas param
add PyTorch Dataset
Remove redundant code.
Use data wrappers in model wrapper APIs.
add generic dataset components
Create initial version of wrappers for models
* Fix handling of categorical features
* QI updates
* update code to support training ML on QI features
* fix code so that features outside the QI are not part of generalizations
and add description
* merging two branches, training on QI and on all data
* adding tests and asserts
* support categorical features
* update the documentation and readme
added a test for the case where cells are supplied as a param.
* add large tests (adult and iris)
and fix bugs
* update transform to return numpy if original data is numpy
* added nursery test
* break loop if there is an illegal level
* Stop pruning one step before passing accuracy threshold
* adding asserts and fix DecisionTreeClassifier init
* Fix tests
Co-authored-by: abigailt <abigailt@il.ibm.com>
* Fixes related to corner cases in calculating generalizations
* Fix print
* Fix corner cases in transform as well
* Improve prints + bug fixes in calculation of feature to remove
* Notebook demonstrating ai minimization
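
For reference, the ncp change described above (three scores kept in one object so that call order does not matter) could be sketched roughly as follows. This is only an illustration: the class name `NCPScores` and the numeric values are hypothetical, and only the three field names come from this changelog.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NCPScores:
    """Hypothetical container for the three scores named in the changelog."""
    fit_score: Optional[float] = None
    transform_score: Optional[float] = None
    generalizations_score: Optional[float] = None


# Each step writes only its own field, so earlier results are never
# overwritten, whatever order the steps run in.
ncp = NCPScores()
ncp.transform_score = 0.42  # arbitrary illustrative value
ncp.fit_score = 0.35        # arbitrary illustrative value
print(ncp)
```

A score that has not been computed yet simply stays `None`, which makes it easy to tell which methods have already run.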