From 743c67aef8c6b0aed81d4334546a543cc2187832 Mon Sep 17 00:00:00 2001
From: Yizhou Chi
Date: Sat, 14 Sep 2024 17:34:17 +0800
Subject: [PATCH] change task type prompt to prevent unwanted label transformation

---
 metagpt/prompts/task_type.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/metagpt/prompts/task_type.py b/metagpt/prompts/task_type.py
index 116756edc..ca0aae572 100644
--- a/metagpt/prompts/task_type.py
+++ b/metagpt/prompts/task_type.py
@@ -11,7 +11,7 @@ The current task is about data preprocessing, please note the following:
 - Monitor data types per column, applying appropriate methods.
 - Ensure operations are on existing dataset columns.
 - Avoid writing processed data to files.
-- Avoid any change to label column, such as standardization, etc.
+- **ATTENTION** Do NOT make any changes to the label column, such as standardization, etc.
 - Prefer alternatives to one-hot encoding for categorical data.
 - Only encode or scale necessary columns to allow for potential feature-specific engineering tasks (like time_extract, binning, extraction, etc.) later.
 - Each step do data preprocessing to train, must do same for test separately at the same time.
@@ -26,7 +26,7 @@ The current task is about feature engineering. when performing it, please adhere
 - Avoid creating redundant or excessively numerous features in one step.
 - Exclude ID columns from feature generation and remove them.
 - Each feature engineering operation performed on the train set must also applies to the dev/test separately at the same time.
-- Avoid using the label column to create features, except for cat encoding.
+- **ATTENTION** Do NOT use the label column to create features or make any changes to the label column, except for cat encoding.
 - Use the data from previous task result if exist, do not mock or reload data yourself.
 - Always copy the DataFrame before processing it and use the copy to process.
 """