Resolve comment and modify readme

This commit is contained in:
didi 2024-10-22 11:41:45 +08:00
parent 5aa62b76ce
commit e575b629a1
4 changed files with 111 additions and 132 deletions

57
examples/aflow/README-.md Normal file
View file

@ -0,0 +1,57 @@
# AFlow: Automating Agentic Workflow Generation
AFlow is a framework for automatically generating and optimizing Agentic Workflows. It uses Monte Carlo tree search in a code-represented workflow space to find effective workflows, replacing manual development with machine effort. Our approach shows potential to outperform handcrafted workflows on various tasks.
[Read our paper on arXiv](https://arxiv.org/abs/2410.10762)
![Performance of AFlow](examples/aflow/img/performance_of_AFlow.png)
## Framework Components
- **Node**: Basic unit of LLM invocation. See `metagpt/actions/action_node.py` for a flexible interface to control LLM, temperature, format, and prompt.
- **Operator**: Predefined combinations of Nodes to enhance search efficiency. Encapsulates common operations like Generate, Format, Review, Revise, Ensemble, Test, and Programmer. See `metagpt/ext/aflow/operator.py` for details. You can customize your own Operator by referencing the implementations in this code.
- **Workflow**: A sequence of LLM-invoking nodes connected by edges. Can be represented as graphs, neural networks, or code to express various execution structures. See `metagpt/ext/aflow/workflow.py` for our implementation.
- **Optimizer**: Uses LLMs within a Monte Carlo Tree Search variant to explore and refine workflows. Iteratively selects, expands, evaluates, and updates workflows based on performance. See `metagpt/ext/aflow/scripts/optimizer.py` for details.
- **Evaluator**: Assesses workflow performance on given tasks. Provides feedback to guide the optimization process towards more effective workflows. See `metagpt/ext/aflow/scripts/evaluator.py` for details.
## Datasets
### Experimental Datasets
We conducted experiments on six datasets (HumanEval, MBPP, GSM8K, MATH, HotpotQA, DROP) and provide their evaluation code. The data can be found in this [datasets](https://drive.google.com/uc?export=download&id=1DNoegtZiUhWtvkd2xoIuElmIi4ah7k8e) link, or you can download them using `metagpt/ext/aflow/data/download_data.py`
### Custom Datasets
For custom tasks, you can reference the code in the metagpt/ext/aflow/benchmark folder. Inherit the `BaseBenchmark` class and implement `evaluate_problem`, `calculate_score`, and `get_result_columns` to add your custom dataset benchmark. Then, add your benchmark name in `metagpt/ext/aflow/scripts/evaluator.py` and `metagpt/ext/aflow/scripts/optimizer.py` to find effective workflows for your custom dataset.
## Quick Start
1. Configure your search in `optimize.py`:
- Open `examples/aflow/optimize.py`
- Set the following parameters:
```python
dataset = "HumanEval" # Choose from: "HumanEval", "MBPP", "GSM8K", "MATH", "HotpotQA", "DROP" or your custom dataset name
question_type = "code" # Choose from: "math", "code", "qa"
sample = 4 # Number of samples to use for optimization
check_convergence = True # Whether to check for convergence
optimized_path = "path/to/optimized/workflows" # Path to save optimized workflows, defaults to metagpt/ext/aflow/scripts/optimized
initial_round = 1 # Starting round number
max_rounds = 20 # Maximum number of optimization rounds
```
- Adjust these parameters according to your specific requirements and dataset
2. Set up parameters in `config/config2.yaml` (see `examples/aflow/config2.example.yaml` for reference)
3. Set the operator you want to use in `optimize.py` and in `optimized_path/template/operator.py`, `optimized_path/template/operator.json`. You can reference our implementation to add operators for specific datasets
4. When you first run, you can download the datasets and initial rounds by setting `download(["datasets", "initial_rounds"])` in `examples/aflow/optimize.py`
5. (Optional) Add your custom dataset and corresponding evaluation function following the [Custom Datasets](#custom-datasets) section
6. Run `python examples/aflow/optimize.py` to start the optimization process!
## Citation
If you use AFlow in your research, please cite our paper:
```
@article{zhang2024aflow,
title={AFlow: Automating Agentic Workflow Generation},
author={Zhang, Jiayi and Xiang, Jinyu and Yu, Zhaoyang and Teng, Fengwei and Chen, Xionghui and Chen, Jiaqi and Zhuge, Mingchen and Cheng, Xin and Hong, Sirui and Wang, Jinlin and others},
journal={arXiv preprint arXiv:2410.10762},
year={2024}
}
```

View file

@ -1,70 +0,0 @@
# AFlow: Automating Agentic Workflow Generation
AFlow is a framework for automatically generating and optimizing Agentic Workflows. It uses Monte Carlo tree search in a code-represented workflow space to find effective workflows, replacing manual development with machine effort. Our approach shows potential to outperform handcrafted workflows on various tasks.
[Read our paper on arXiv](https://arxiv.org/abs/2410.10762)
[Insert performance graph/image here]
## Framework Components
- **Node**: Basic unit of LLM invocation. See `action_node.py` for a flexible interface to control LLM, temperature, format, and prompt.
- **Operator**: Predefined combinations of Nodes to enhance search efficiency. Encapsulates common operations like Generate, Format, Review, Revise, Ensemble, Test, and Programmer.
- **Workflow**: A sequence of LLM-invoking nodes connected by edges. Can be represented as graphs, neural networks, or code to express various execution structures.
- **Optimizer**: Uses LLMs within a Monte Carlo Tree Search variant to explore and refine workflows. Iteratively selects, expands, evaluates, and updates workflows based on performance.
- **Evaluator**: Assesses workflow performance on given tasks. Provides feedback to guide the optimization process towards more effective workflows.
## Datasets
We provide implementations for [list datasets here].
Data is available at [link to data].
For custom tasks, [brief instructions or link to documentation].
## Quick Start
1. Configure your search in `optimize.py`:
- Open `metagpt/ext/aflow/scripts/optimize.py`
- Set the following parameters:
```python
dataset = "HumanEval" # Choose from: "HumanEval", "MBPP", "GSM8K", "MATH", "HotpotQA", "DROP" or your custom dataset name
question_type = "code" # Choose from: "math", "code", "qa"
sample = 5 # Number of samples to use for optimization
check_convergence = True # Whether to check for convergence
optimized_path = "path/to/optimized/workflows" # Path to save optimized workflows
initial_round = 1 # Starting round number
max_rounds = 20 # Maximum number of optimization rounds
```
- Adjust these parameters according to your specific requirements and dataset
2. Set up parameters in `config/config2.yaml` (see `metagpt/ext/aflow/config2.example.yaml` for reference)
3. Set the operator you want to use in `optimize.py` and in `xxxx`
4. Download the init round of six datasets and put them in `xxxxxx`
5. Add your custom dataset and corresponding evaluation function:
- Create a new Python file in the `metagpt/ext/aflow/benchmark/` directory, named `{custom_dataset_name}.py`
- Implement the following key functions in this new file:
- `load_data`: for loading the dataset
- `evaluate_problem`: for evaluating a single problem solution
- `evaluate_all_problems`: for evaluating all problems
- `save_results_to_csv`: for saving evaluation results
- `optimize_{custom_dataset_name}_evaluation`: main evaluation function that integrates the above functionalities
- Add your custom dataset name and config val_list in `metagpt/ext/aflow/scripts/evaluator.py`
## License
[License information]
## Citation
If you use AFlow in your research, please cite our paper:
```
@article{zhang2024aflow,
title={AFlow: Automating Agentic Workflow Generation},
author={Zhang, Jiayi and Xiang, Jinyu and Yu, Zhaoyang and Teng, Fengwei and Chen, Xionghui and Chen, Jiaqi and Zhuge, Mingchen and Cheng, Xin and Hong, Sirui and Wang, Jinlin and others},
journal={arXiv preprint arXiv:2410.10762},
year={2024}
}
```

View file

@ -1,60 +1,47 @@
# AFlow: Automating Agentic Workflow Generation
AFlow is a framework for automatically generating and optimizing Agentic Workflows. It uses Monte Carlo tree search in a code-represented workflow space to find effective workflows, replacing manual development with machine effort. Our approach shows potential to outperform handcrafted workflows on various tasks.
AFlow is a framework for automatically generating and optimizing Agentic Workflows. It uses Monte Carlo tree search in a code-represented workflow space to find effective workflows, replacing manual development with machine effort. Our approach shows potential to outperform handcrafted workflows on various tasks.
[Read our paper on arXiv](https://arxiv.org/abs/2410.10762)
[Insert performance graph/image here]
![Performance of AFlow](examples/aflow/img/performance_of_AFlow.png)
## Framework Components
- **Node**: Basic unit of LLM invocation. See `action_node.py` for a flexible interface to control LLM, temperature, format, and prompt.
- **Operator**: Predefined combinations of Nodes to enhance search efficiency. Encapsulates common operations like Generate, Format, Review, Revise, Ensemble, Test, and Programmer.
- **Workflow**: A sequence of LLM-invoking nodes connected by edges. Can be represented as graphs, neural networks, or code to express various execution structures.
- **Optimizer**: Uses LLMs within a Monte Carlo Tree Search variant to explore and refine workflows. Iteratively selects, expands, evaluates, and updates workflows based on performance.
- **Evaluator**: Assesses workflow performance on given tasks. Provides feedback to guide the optimization process towards more effective workflows.
- **Node**: Basic unit of LLM invocation. See `metagpt/actions/action_node.py` for a flexible interface to control LLM, temperature, format, and prompt.
- **Operator**: Predefined combinations of Nodes to enhance search efficiency. Encapsulates common operations like Generate, Format, Review, Revise, Ensemble, Test, and Programmer. See `metagpt/ext/aflow/operator.py` for details. You can customize your own Operator by referencing the implementations in this code.
- **Workflow**: A sequence of LLM-invoking nodes connected by edges. Can be represented as graphs, neural networks, or code to express various execution structures. See `metagpt/ext/aflow/workflow.py` for our implementation.
- **Optimizer**: Uses LLMs within a Monte Carlo Tree Search variant to explore and refine workflows. Iteratively selects, expands, evaluates, and updates workflows based on performance. See `metagpt/ext/aflow/scripts/optimizer.py` for details.
- **Evaluator**: Assesses workflow performance on given tasks. Provides feedback to guide the optimization process towards more effective workflows. See `metagpt/ext/aflow/scripts/evaluator.py` for details.
## Datasets
We provide implementations for [list datasets here].
### Experimental Datasets
We conducted experiments on six datasets (HumanEval, MBPP, GSM8K, MATH, HotpotQA, DROP) and provide their evaluation code. The data can be found in this [datasets](https://drive.google.com/uc?export=download&id=1DNoegtZiUhWtvkd2xoIuElmIi4ah7k8e) link, or you can download them using `metagpt/ext/aflow/data/download_data.py`
Data is available at [link to data].
For custom tasks, [brief instructions or link to documentation].
### Custom Datasets
For custom tasks, you can reference the code in the metagpt/ext/aflow/benchmark folder. Inherit the `BaseBenchmark` class and implement `evaluate_problem`, `calculate_score`, and `get_result_columns` to add your custom dataset benchmark. Then, add your benchmark name in `metagpt/ext/aflow/scripts/evaluator.py` and `metagpt/ext/aflow/scripts/optimizer.py` to find effective workflows for your custom dataset.
## Quick Start
1. Configure your search in `optimize.py`:
- Open `metagpt/ext/aflow/scripts/optimize.py`
- Open `examples/aflow/optimize.py`
- Set the following parameters:
```python
dataset = "HumanEval" # Choose from: "HumanEval", "MBPP", "GSM8K", "MATH", "HotpotQA", "DROP" or your custom dataset name
question_type = "code" # Choose from: "math", "code", "qa"
sample = 5 # Number of samples to use for optimization
sample = 4 # Number of samples to use for optimization
check_convergence = True # Whether to check for convergence
optimized_path = "path/to/optimized/workflows" # Path to save optimized workflows
optimized_path = "path/to/optimized/workflows" # Path to save optimized workflows, defaults to metagpt/ext/aflow/scripts/optimized
initial_round = 1 # Starting round number
max_rounds = 20 # Maximum number of optimization rounds
```
- Adjust these parameters according to your specific requirements and dataset
2. Set up parameters in `config/config2.yaml` (see `metagpt/ext/aflow/config2.example.yaml` for reference)
3. Set the operator you want to use in `optimize.py` and in `xxxx`
4. Download the init round of six datasets and put them in `xxxxxx`
5. Add your custom dataset and corresponding evaluation function:
- Create a new Python file in the `metagpt/ext/aflow/benchmark/` directory, named `{custom_dataset_name}.py`
- Implement the following key functions in this new file:
- `load_data`: for loading the dataset
- `evaluate_problem`: for evaluating a single problem solution
- `evaluate_all_problems`: for evaluating all problems
- `save_results_to_csv`: for saving evaluation results
- `optimize_{custom_dataset_name}_evaluation`: main evaluation function that integrates the above functionalities
- Add your custom dataset name and config val_list in `metagpt/ext/aflow/scripts/evaluator.py`
## License
[License information]
2. Set up parameters in `config/config2.yaml` (see `examples/aflow/config2.example.yaml` for reference)
3. Set the operator you want to use in `optimize.py` and in `optimized_path/template/operator.py`, `optimized_path/template/operator.json`. You can reference our implementation to add operators for specific datasets
4. When you first run, you can download the datasets and initial rounds by setting `download(["datasets", "initial_rounds"])` in `examples/aflow/optimize.py`
5. (Optional) Add your custom dataset and corresponding evaluation function following the [Custom Datasets](#custom-datasets) section
6. Run `python examples/aflow/optimize.py` to start the optimization process!
## Citation

View file

@ -17,26 +17,26 @@ class ConvergenceUtils:
def load_data(self, root_path):
"""
读取 JSON 文件如果不存在则创建一个新文件然后返回数据
Read JSON file, create a new file if it doesn't exist, then return the data.
"""
rounds_dir = os.path.join(root_path, "workflows")
result_file = os.path.join(rounds_dir, "results.json")
# 确保目录存在
# Ensure directory exists
os.makedirs(rounds_dir, exist_ok=True)
# 如果文件不存在,创建一个包含空列表的新文件
# If file doesn't exist, create a new one with an empty list
if not os.path.exists(result_file):
with open(result_file, 'w') as file:
json.dump([], file)
# 读取文件并返回数据
# Read file and return data
with open(result_file, 'r') as file:
return json.load(file)
def process_rounds(self):
"""
round 为单位组织数据返回按轮次的分数字典
Organize data by round, return a dictionary of scores by round.
"""
self.data = self.load_data(root_path=self.root_path)
rounds = {}
@ -50,7 +50,7 @@ class ConvergenceUtils:
def calculate_avg_and_std(self):
"""
计算每轮的平均分和标准差返回两个列表平均分和标准差
Calculate average score and standard deviation for each round, return two lists: average scores and standard deviations.
"""
self.rounds = self.process_rounds()
@ -64,61 +64,66 @@ class ConvergenceUtils:
def check_convergence(self, top_k=3, z=0, consecutive_rounds=5):
"""
检查收敛的函数z 为置信水平对应的 z 分数
consecutive_rounds 为连续轮次内满足停止条件的次数
Check for convergence. z is the z-score corresponding to the confidence level.
consecutive_rounds is the number of consecutive rounds that must meet the stop condition.
"""
# Calculate average score and standard deviation for each round
self.avg_scores, self.stds = self.calculate_avg_and_std()
# If total rounds are not enough to calculate top_k+1 rounds, return not converged
if len(self.avg_scores) < top_k + 1:
return False, None, None
convergence_count = 0
previous_Y = None
sigma_Y_previous = None
convergence_count = 0 # Convergence counter
previous_Y = None # Y value of the previous round (average of top_k scores)
sigma_Y_previous = None # Standard error of Y value from previous round
for i in range(len(self.avg_scores)):
# 动态选择当前轮次及之前所有轮次的 top_k
top_k_indices = np.argsort(self.avg_scores[:i + 1])[::-1][:top_k]
top_k_scores = [self.avg_scores[j] for j in top_k_indices]
top_k_stds = [self.stds[j] for j in top_k_indices]
# Dynamically select top_k from current round and all previous rounds
top_k_indices = np.argsort(self.avg_scores[:i + 1])[::-1][:top_k] # Select top k indices by descending average score
top_k_scores = [self.avg_scores[j] for j in top_k_indices] # Get list of top k scores
top_k_stds = [self.stds[j] for j in top_k_indices] # Get list of standard deviations corresponding to top k scores
# Calculate mean of top k scores for current round, i.e., Y_current
Y_current = np.mean(top_k_scores)
# Calculate standard error of Y_current (sigma_Y_current), representing score dispersion
sigma_Y_current = np.sqrt(np.sum([s ** 2 for s in top_k_stds]) / (top_k ** 2))
# If not the first round, calculate change in Y (Delta_Y) and corresponding standard error
if previous_Y is not None:
# Calculate Y difference between current round and previous round
Delta_Y = Y_current - previous_Y
# Calculate standard error of Y difference (sigma_Delta_Y)
sigma_Delta_Y = np.sqrt(sigma_Y_current ** 2 + sigma_Y_previous ** 2)
# Check if Y change is within acceptable confidence interval, i.e., convergence condition
if abs(Delta_Y) <= z * sigma_Delta_Y:
convergence_count += 1
# If consecutive converged rounds reach set value, return convergence information
if convergence_count >= consecutive_rounds:
return True, i - consecutive_rounds + 1, i
else:
# If change is large, reset convergence counter
convergence_count = 0
# Update Y value and standard error for previous round
previous_Y = Y_current
sigma_Y_previous = sigma_Y_current
# If convergence condition not met, return not converged
return False, None, None
def print_results(self):
"""
打印所有轮次的平均分和标准差
Print average score and standard deviation for all rounds.
"""
self.avg_scores, self.stds = self.calculate_avg_and_std()
for i, (avg_score, std) in enumerate(zip(self.avg_scores, self.stds), 1):
logger.info(f"轮次 {i}: 平均分 = {avg_score:.4f}, 标准差 = {std:.4f}")
logger.info(f"Round {i}: Average Score = {avg_score:.4f}, Standard Deviation = {std:.4f}")
if __name__ == "__main__":
# 使用该类,并指定 top_k
checker = ConvergenceUtils("path") # 例如设置 top_k=5
# Use this class and specify top_k
checker = ConvergenceUtils("path") # For example, set top_k=5
converged, convergence_round, final_round = checker.check_convergence()
if converged:
logger.info(f"检测到收敛,发生在第 {convergence_round} 轮,最终轮次为 {final_round}")
logger.info(f"Convergence detected, occurred at round {convergence_round}, final round is {final_round}")
else:
logger.info("在所有轮次内未检测到收敛")
logger.info("No convergence detected within all rounds")
# 打印每轮的平均分和标准差
# Print average score and standard deviation for each round
checker.print_results()