Fix error with pandas dataframes (#92)

* Fix error with pandas dataframes in _columns_different_distributions + add appropriate test
* Update documentation of classes to reflect that all data should be encoded and scaled.
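
As an illustration of what "encoded and scaled" could mean for a mixed-type pandas DataFrame, here is a minimal sketch. The helper name `encode_and_scale`, the choice of one-hot encoding, and the choice of min-max scaling are assumptions made for this example and are not taken from the commit; the library may expect a different preprocessing pipeline.

```python
import pandas as pd


def encode_and_scale(df: pd.DataFrame, categorical: list[str]) -> pd.DataFrame:
    """Hypothetical helper: one-hot encode categorical columns and
    min-max scale the remaining (numeric) columns to [0, 1]."""
    numeric = [c for c in df.columns if c not in categorical]
    values = df[numeric].astype(float)
    span = values.max() - values.min()
    # Constant columns have zero span; divide by 1 instead so they map to 0.0.
    span = span.replace(0, 1.0)
    scaled = (values - values.min()) / span
    # One-hot encode the categorical columns as float indicator columns.
    encoded = pd.get_dummies(df[categorical], columns=categorical, dtype=float)
    return pd.concat([scaled, encoded], axis=1)


raw = pd.DataFrame({"age": [20, 30, 40], "city": ["a", "b", "a"]})
prepared = encode_and_scale(raw, categorical=["city"])
```

This sketch derives the scaling range from the frame it is given. When several datasets are compared (members, non-members, synthetic), the same fitted transformation would presumably need to be applied to all of them so that their columns stay comparable.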

---------

Signed-off-by: abigailt <abigailt@il.ibm.com>
abigailgold 2024-02-13 08:56:12 -05:00 committed by GitHub
parent cb70ca10e6
commit e00535d120
GPG key ID: B5690EEEBB952194
6 changed files with 28 additions and 30 deletions

@@ -47,7 +47,8 @@ class DatasetAssessmentManager:
synthetic_data: ArrayDataset, dataset_name: str = DEFAULT_DATASET_NAME, categorical_features: list = [])\
-> list[DatasetAttackScore]:
"""
- Do dataset privacy risk assessment by running dataset attacks, and return their scores.
+ Do dataset privacy risk assessment by running dataset attacks, and return their scores. All data is assumed
+ to be encoded and scaled.
:param original_data_members: A container for the training original samples and labels,
only samples are used in the assessment