Fix error with pandas dataframes (#92)

* Fix error with pandas dataframes in _columns_different_distributions + add appropriate test
* Update documentation of classes to reflect that all data should be encoded and scaled.

---------

Signed-off-by: abigailt <abigailt@il.ibm.com>
2026-04-26 13:26:21 +02:00 · 2024-02-13 08:56:12 -05:00
commit e00535d120
parent cb70ca10e6
6 changed files with 28 additions and 30 deletions
--- a/apt/risk/data_assessment/dataset_attack_membership_classification.py
+++ b/apt/risk/data_assessment/dataset_attack_membership_classification.py
@@ -71,9 +71,11 @@ class DatasetAttackMembershipClassification(DatasetAttackMembership):
                 config: DatasetAttackConfigMembershipClassification = DatasetAttackConfigMembershipClassification(),
                 dataset_name: str = DEFAULT_DATASET_NAME, categorical_features: list = None):
        """
-        :param original_data_members: A container for the training original samples and labels
-        :param original_data_non_members: A container for the holdout original samples and labels
-        :param synthetic_data: A container for the synthetic samples and labels
+        :param original_data_members: A container for the training original samples and labels. Should be encoded and
+                                      scaled.
+        :param original_data_non_members: A container for the holdout original samples and labels. Should be encoded
+                                          and scaled.
+        :param synthetic_data: A container for the synthetic samples and labels. Should be encoded and scaled.
        :param config: Configuration parameters to guide the attack, optional
        :param dataset_name: A name to identify this dataset, optional
        """