Merge pull request #1510 from didiforgithub/main

AFLOW
2026-07-02 16:01:04 +02:00 · 2024-10-29 20:33:17 +08:00 · 2024-10-29 20:33:17 +08:00 · dbfd37bb4d
commit dbfd37bb4d
parent 4dc1a385af d01051abc6
41 changed files with 3286 additions and 0 deletions
--- a/examples/aflow/README.md
+++ b/examples/aflow/README.md
@ -0,0 +1,89 @@
+# AFlow: Automating Agentic Workflow Generation
+
+AFlow is a framework for automatically generating and optimizing agentic workflows. It uses Monte Carlo Tree Search over a code-represented workflow space to find effective workflows, replacing manual development with machine effort. Our approach shows potential to outperform handcrafted workflows on various tasks.
+
+[Read our paper on arXiv](https://arxiv.org/abs/2410.10762)
+
+<p align="center">
+<a href=""><img src="../../docs/resources/aflow/AFLOW-performance.jpg" alt="Performance Of AFlow" title="Performance of AFlow<sub>1</sub>" width="80%"></a>
+</p>
+
+## Framework Components
+
+- **Node**: Basic unit of LLM invocation. See `metagpt/actions/action_node.py` for a flexible interface to control LLM, temperature, format, and prompt.
+- **Operator**: Predefined combinations of Nodes to enhance search efficiency. Encapsulates common operations like Generate, Format, Review, Revise, Ensemble, Test, and Programmer. See `metagpt/ext/aflow/operator.py` for details. You can customize your own Operator by referencing the implementations in this code.
+- **Workflow**: A sequence of LLM-invoking nodes connected by edges. Can be represented as graphs, neural networks, or code to express various execution structures. See `metagpt/ext/aflow/workflow.py` for our implementation.
+- **Optimizer**: Uses LLMs within a Monte Carlo Tree Search variant to explore and refine workflows. Iteratively selects, expands, evaluates, and updates workflows based on performance. See `metagpt/ext/aflow/scripts/optimizer.py` for details.
+- **Evaluator**: Assesses workflow performance on given tasks. Provides feedback to guide the optimization process towards more effective workflows. See `metagpt/ext/aflow/scripts/evaluator.py` for details.
+
+<p align="center">
+<a href=""><img src="../../docs/resources/aflow/AFLOW-method.jpg" alt="Framework of AFlow" title="Framework of AFlow <sub>1</sub>" width="80%"></a>
+</p>
+
+## Datasets
+
+### Experimental Datasets
+We conducted experiments on six datasets (HumanEval, MBPP, GSM8K, MATH, HotpotQA, DROP) and provide their evaluation code. The data is available at this [datasets link](https://drive.google.com/uc?export=download&id=1DNoegtZiUhWtvkd2xoIuElmIi4ah7k8e), or you can download it using `metagpt/ext/aflow/data/download_data.py`.
+
+<p align="center">
+<a href=""><img src="../../docs/resources/aflow/AFLOW-experiment.jpg" alt="Performance Of AFlow" title="Performance Of AFlow <sub>1</sub>" width="80%"></a>
+</p>
+
+### Custom Datasets
+For custom tasks, you can reference the code in the `metagpt/ext/aflow/benchmark` folder. Inherit the `BaseBenchmark` class and implement `evaluate_problem`, `calculate_score`, and `get_result_columns` to add your custom dataset benchmark. Then, add your benchmark name in `metagpt/ext/aflow/scripts/evaluator.py` and `metagpt/ext/aflow/scripts/optimizer.py` to find effective workflows for your custom dataset.
+
+## Quick Start
+
+1. Configure optimization parameters:
+   - Use command line arguments or modify default parameters in `examples/aflow/optimize.py`:
+     ```bash
+     --dataset MATH            # Dataset type (HumanEval/MBPP/GSM8K/MATH/HotpotQA/DROP)
+     --sample 4                # Sample count - number of workflows to be resampled
+     --question_type math      # Question type (math/code/qa)
+     --optimized_path PATH     # Optimized result save path
+     --initial_round 1         # Initial round
+     --max_rounds 20           # Max iteration rounds for AFlow
+     --check_convergence True  # Whether to enable early stop
+     --validation_rounds 5     # Validation rounds for AFlow
+     --if_first_optimize True  # Set True for first optimization, False afterwards
+     ```
+
+2. Configure LLM parameters in `config/config2.yaml` (see `examples/aflow/config2.example.yaml` for reference)
+
+3. Set up operators in `optimize.py`, `optimized_path/template/operator.py`, and `optimized_path/template/operator.json`. You can use our implementation as a reference when adding operators for specific datasets.
+
+4. For first-time use, download datasets and initial rounds by setting `download(["datasets", "initial_rounds"])` in `examples/aflow/optimize.py`
+
+5. (Optional) Add your custom dataset and corresponding evaluation function following the [Custom Datasets](#custom-datasets) section
+
+6. (Optional) To evaluate on only a portion of the validation data, set `va_list` in `metagpt/ext/aflow/scripts/evaluator.py`
+
+7. Run the optimization:
+   ```bash
+   # Using default parameters
+   python -m examples.aflow.optimize
+   
+   # Or with custom parameters
+   python -m examples.aflow.optimize --dataset MATH --sample 4 --question_type math
+   ```
+
+## Reproduce the Results in the Paper
+1. We provide the raw data obtained from our experiments in this [link](https://drive.google.com/uc?export=download&id=1Sr5wjgKf3bN8OC7G6cO3ynzJqD4w6_Dv), including the workflows and prompts generated in each iteration, as well as their trajectories on the validation dataset. We also provide the optimal workflow for each dataset and the corresponding data on the test dataset. You can download these data using `metagpt/ext/aflow/data/download_data.py`.
+2. You can directly reproduce our experimental results by running the scripts in `examples/aflow/experiments`.
+
+
+## Citation
+
+If you use AFlow in your research, please cite our paper:
+
+```
+@misc{zhang2024aflow,
+      title={AFlow: Automating Agentic Workflow Generation}, 
+      author={Jiayi Zhang and Jinyu Xiang and Zhaoyang Yu and Fengwei Teng and Xionghui Chen and Jiaqi Chen and Mingchen Zhuge and Xin Cheng and Sirui Hong and Jinlin Wang and Bingnan Zheng and Bang Liu and Yuyu Luo and Chenglin Wu},
+      year={2024},
+      eprint={2410.10762},
+      archivePrefix={arXiv},
+      primaryClass={cs.AI},
+      url={https://arxiv.org/abs/2410.10762}, 
+}
+```
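Note on the README's Custom Datasets section: it names three methods to implement on a `BaseBenchmark` subclass (`evaluate_problem`, `calculate_score`, `get_result_columns`) but does not show their shape. The sketch below is only an illustration. The `BaseBenchmark` defined here is a hypothetical stand-in, not the class from `metagpt/ext/aflow/benchmark`, and the method signatures, the `ExactMatchBenchmark` name, and the `echo_workflow` placeholder are all assumptions; check the real base class before copying anything.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Awaitable, Callable, List, Tuple


class BaseBenchmark(ABC):
    """Hypothetical stand-in for the real BaseBenchmark; signatures are assumed."""

    @abstractmethod
    async def evaluate_problem(self, problem: dict, graph: Callable[[str], Awaitable[str]]) -> Tuple[Any, ...]:
        """Run one problem through the workflow and return one result row."""

    @abstractmethod
    def calculate_score(self, expected: str, prediction: str) -> float:
        """Compare a prediction with the expected answer."""

    @abstractmethod
    def get_result_columns(self) -> List[str]:
        """Name the fields of the row returned by evaluate_problem."""


class ExactMatchBenchmark(BaseBenchmark):
    """Toy benchmark that scores a workflow by case-insensitive exact match."""

    async def evaluate_problem(self, problem, graph):
        prediction = await graph(problem["question"])
        score = self.calculate_score(problem["answer"], prediction)
        return problem["question"], prediction, problem["answer"], score

    def calculate_score(self, expected, prediction):
        return 1.0 if expected.strip().lower() == prediction.strip().lower() else 0.0

    def get_result_columns(self):
        return ["question", "prediction", "expected", "score"]


async def echo_workflow(question: str) -> str:
    """Placeholder workflow that always answers 'Paris'."""
    return "Paris"


def run_demo() -> Tuple[Any, ...]:
    """Evaluate a single hand-written problem with the placeholder workflow."""
    bench = ExactMatchBenchmark()
    problem = {"question": "Capital of France?", "answer": "paris"}
    return asyncio.run(bench.evaluate_problem(problem, echo_workflow))
```

Keeping the row returned by `evaluate_problem` in the same order as `get_result_columns` makes it straightforward to write results out as a table.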
--- a/examples/aflow/config2.example.yaml
+++ b/examples/aflow/config2.example.yaml
@ -0,0 +1,12 @@
+models:
+ "<model_name>": # model: "gpt-4-turbo"  # or gpt-3.5-turbo
+   api_type: "openai"  # or azure / ollama / groq etc.
+   base_url: "<your base url>" 
+   api_key: "<your api key>"
+   temperature: 0
+ "<model_name>":  
+   api_type: "openai"  
+   base_url: "<your base url>"
+   api_key: "<your api key>"
+   temperature: 0
+CALC_USAGE: True 
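Note on the config example: the entry scripts in this commit look up models by name (`"gpt-4o-mini"` and `"claude-3-5-sonnet-20240620"`), so the `<model_name>` keys under `models` presumably need to match those strings. The sketch below mimics that lookup with a plain dictionary and fails loudly when a name is missing. `MODELS` and `require_model` are hypothetical names for illustration, and the sketch makes no claim about how `ModelsConfig` itself behaves when a key is absent.

```python
from typing import Dict

# Mirrors the shape of the `models` mapping in config2.example.yaml.
# All values are placeholders, exactly as in the example file.
MODELS: Dict[str, dict] = {
    "gpt-4o-mini": {
        "api_type": "openai",
        "base_url": "<your base url>",
        "api_key": "<your api key>",
        "temperature": 0,
    },
    "claude-3-5-sonnet-20240620": {
        "api_type": "openai",
        "base_url": "<your base url>",
        "api_key": "<your api key>",
        "temperature": 0,
    },
}


def require_model(models: Dict[str, dict], name: str) -> dict:
    """Return the config for `name`, or raise a descriptive error if it is missing."""
    config = models.get(name)
    if config is None:
        raise KeyError(f"Model {name!r} is not defined under `models`; available: {sorted(models)}")
    return config
```

A check like this before constructing the optimizer surfaces a misspelled model name immediately instead of partway through a run.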
--- a/examples/aflow/experiments/optimize_drop.py
+++ b/examples/aflow/experiments/optimize_drop.py
@ -0,0 +1,53 @@
+# -*- coding: utf-8 -*-
+# @Date    : 8/23/2024 20:00 PM
+# @Author  : didi
+# @Desc    : Entrance of AFlow.
+
+import argparse
+
+from metagpt.configs.models_config import ModelsConfig
+from metagpt.ext.aflow.scripts.optimizer import Optimizer
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(description="AFlow Optimizer for DROP")
+    parser.add_argument("--dataset", type=str, default="DROP", help="Dataset type")
+    parser.add_argument("--sample", type=int, default=4, help="Sample count")
+    parser.add_argument("--question_type", type=str, default="qa", help="Question type")
+    parser.add_argument(
+        "--optimized_path", type=str, default="metagpt/ext/aflow/scripts/optimized", help="Optimized result save path"
+    )
+    parser.add_argument("--initial_round", type=int, default=1, help="Initial round")
+    parser.add_argument("--max_rounds", type=int, default=20, help="Max iteration rounds")
+    parser.add_argument(
+        "--check_convergence",
+        type=lambda s: s.lower() == "true",  # type=bool would treat "False" as True
+        default=True,
+        help="Whether to enable early stop",
+    )
+    parser.add_argument("--validation_rounds", type=int, default=5, help="Validation rounds")
+    return parser.parse_args()
+
+
+if __name__ == "__main__":
+    args = parse_args()
+
+    mini_llm_config = ModelsConfig.default().get("gpt-4o-mini")
+    claude_llm_config = ModelsConfig.default().get("claude-3-5-sonnet-20240620")
+
+    operators = [
+        "Custom",
+        "AnswerGenerate",
+        "ScEnsemble",
+    ]
+
+    optimizer = Optimizer(
+        dataset=args.dataset,
+        question_type=args.question_type,
+        opt_llm_config=claude_llm_config,
+        exec_llm_config=mini_llm_config,
+        check_convergence=args.check_convergence,
+        operators=operators,
+        optimized_path=args.optimized_path,
+        sample=args.sample,
+        initial_round=args.initial_round,
+        max_rounds=args.max_rounds,
+        validation_rounds=args.validation_rounds,
+    )
+
+    optimizer.optimize("Graph")
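Note on boolean command-line flags such as `--check_convergence`: argparse applies `type` to the raw string, and `bool("False")` is `True` because any non-empty string is truthy. The sketch below contrasts that pitfall with an explicit parser. `str2bool`, `build_parser`, and the `--naive` flag are hypothetical names used only for this demonstration.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse common textual booleans and reject anything else."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Boolean flag demo")
    # Pitfall: any non-empty string, including "False", converts to True.
    parser.add_argument("--naive", type=bool, default=True)
    # Explicit parsing yields the value the user typed.
    parser.add_argument("--check_convergence", type=str2bool, default=True)
    return parser


demo_args = build_parser().parse_args(["--naive", "False", "--check_convergence", "False"])
print(demo_args.naive)              # True, despite "False" on the command line
print(demo_args.check_convergence)  # False
```

On Python 3.9 and later, `action=argparse.BooleanOptionalAction` is another option; it generates a paired `--flag` / `--no-flag` instead of taking a value.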
--- a/examples/aflow/experiments/optimize_gsm8k.py
+++ b/examples/aflow/experiments/optimize_gsm8k.py
@ -0,0 +1,53 @@
+# -*- coding: utf-8 -*-
+# @Date    : 8/23/2024 20:00 PM
+# @Author  : didi
+# @Desc    : Entrance of AFlow.
+
+import argparse
+
+from metagpt.configs.models_config import ModelsConfig
+from metagpt.ext.aflow.scripts.optimizer import Optimizer
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(description="AFlow Optimizer for GSM8K")
+    parser.add_argument("--dataset", type=str, default="GSM8K", help="Dataset type")
+    parser.add_argument("--sample", type=int, default=4, help="Sample count")
+    parser.add_argument("--question_type", type=str, default="math", help="Question type")
+    parser.add_argument(
+        "--optimized_path", type=str, default="metagpt/ext/aflow/scripts/optimized", help="Optimized result save path"
+    )
+    parser.add_argument("--initial_round", type=int, default=1, help="Initial round")
+    parser.add_argument("--max_rounds", type=int, default=20, help="Max iteration rounds")
+    parser.add_argument(
+        "--check_convergence",
+        type=lambda s: s.lower() == "true",  # type=bool would treat "False" as True
+        default=True,
+        help="Whether to enable early stop",
+    )
+    parser.add_argument("--validation_rounds", type=int, default=5, help="Validation rounds")
+    return parser.parse_args()
+
+
+if __name__ == "__main__":
+    args = parse_args()
+
+    mini_llm_config = ModelsConfig.default().get("gpt-4o-mini")
+    claude_llm_config = ModelsConfig.default().get("claude-3-5-sonnet-20240620")
+
+    operators = [
+        "Custom",
+        "ScEnsemble",
+        "Programmer",
+    ]
+
+    optimizer = Optimizer(
+        dataset=args.dataset,
+        question_type=args.question_type,
+        opt_llm_config=claude_llm_config,
+        exec_llm_config=mini_llm_config,
+        check_convergence=args.check_convergence,
+        operators=operators,
+        optimized_path=args.optimized_path,
+        sample=args.sample,
+        initial_round=args.initial_round,
+        max_rounds=args.max_rounds,
+        validation_rounds=args.validation_rounds,
+    )
+
+    optimizer.optimize("Graph")
--- a/examples/aflow/experiments/optimize_hotpotqa.py
+++ b/examples/aflow/experiments/optimize_hotpotqa.py
@ -0,0 +1,53 @@
+# -*- coding: utf-8 -*-
+# @Date    : 8/23/2024 20:00 PM
+# @Author  : didi
+# @Desc    : Entrance of AFlow.
+
+import argparse
+
+from metagpt.configs.models_config import ModelsConfig
+from metagpt.ext.aflow.scripts.optimizer import Optimizer
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(description="AFlow Optimizer for HotpotQA")
+    parser.add_argument("--dataset", type=str, default="HotpotQA", help="Dataset type")
+    parser.add_argument("--sample", type=int, default=4, help="Sample count")
+    parser.add_argument("--question_type", type=str, default="qa", help="Question type")
+    parser.add_argument(
+        "--optimized_path", type=str, default="metagpt/ext/aflow/scripts/optimized", help="Optimized result save path"
+    )
+    parser.add_argument("--initial_round", type=int, default=1, help="Initial round")
+    parser.add_argument("--max_rounds", type=int, default=20, help="Max iteration rounds")
+    parser.add_argument(
+        "--check_convergence",
+        type=lambda s: s.lower() == "true",  # type=bool would treat "False" as True
+        default=True,
+        help="Whether to enable early stop",
+    )
+    parser.add_argument("--validation_rounds", type=int, default=5, help="Validation rounds")
+    return parser.parse_args()
+
+
+if __name__ == "__main__":
+    args = parse_args()
+
+    mini_llm_config = ModelsConfig.default().get("gpt-4o-mini")
+    claude_llm_config = ModelsConfig.default().get("claude-3-5-sonnet-20240620")
+
+    operators = [
+        "Custom",
+        "AnswerGenerate",
+        "ScEnsemble",
+    ]
+
+    optimizer = Optimizer(
+        dataset=args.dataset,
+        question_type=args.question_type,
+        opt_llm_config=claude_llm_config,
+        exec_llm_config=mini_llm_config,
+        check_convergence=args.check_convergence,
+        operators=operators,
+        optimized_path=args.optimized_path,
+        sample=args.sample,
+        initial_round=args.initial_round,
+        max_rounds=args.max_rounds,
+        validation_rounds=args.validation_rounds,
+    )
+
+    optimizer.optimize("Graph")
--- a/examples/aflow/experiments/optimize_humaneval.py
+++ b/examples/aflow/experiments/optimize_humaneval.py
@ -0,0 +1,54 @@
+# -*- coding: utf-8 -*-
+# @Date    : 8/23/2024 20:00 PM
+# @Author  : didi
+# @Desc    : Entrance of AFlow.
+
+import argparse
+
+from metagpt.configs.models_config import ModelsConfig
+from metagpt.ext.aflow.scripts.optimizer import Optimizer
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(description="AFlow Optimizer for HumanEval")
+    parser.add_argument("--dataset", type=str, default="HumanEval", help="Dataset type")
+    parser.add_argument("--sample", type=int, default=4, help="Sample count")
+    parser.add_argument("--question_type", type=str, default="code", help="Question type")
+    parser.add_argument(
+        "--optimized_path", type=str, default="metagpt/ext/aflow/scripts/optimized", help="Optimized result save path"
+    )
+    parser.add_argument("--initial_round", type=int, default=1, help="Initial round")
+    parser.add_argument("--max_rounds", type=int, default=20, help="Max iteration rounds")
+    parser.add_argument(
+        "--check_convergence",
+        type=lambda s: s.lower() == "true",  # type=bool would treat "False" as True
+        default=True,
+        help="Whether to enable early stop",
+    )
+    parser.add_argument("--validation_rounds", type=int, default=5, help="Validation rounds")
+    return parser.parse_args()
+
+
+if __name__ == "__main__":
+    args = parse_args()
+
+    mini_llm_config = ModelsConfig.default().get("gpt-4o-mini")
+    claude_llm_config = ModelsConfig.default().get("claude-3-5-sonnet-20240620")
+
+    operators = [
+        "Custom",
+        "CustomCodeGenerate",
+        "ScEnsemble",
+        "Test",
+    ]
+
+    optimizer = Optimizer(
+        dataset=args.dataset,
+        question_type=args.question_type,
+        opt_llm_config=claude_llm_config,
+        exec_llm_config=mini_llm_config,
+        check_convergence=args.check_convergence,
+        operators=operators,
+        optimized_path=args.optimized_path,
+        sample=args.sample,
+        initial_round=args.initial_round,
+        max_rounds=args.max_rounds,
+        validation_rounds=args.validation_rounds,
+    )
+
+    optimizer.optimize("Graph")
--- a/examples/aflow/experiments/optimize_math.py
+++ b/examples/aflow/experiments/optimize_math.py
@ -0,0 +1,53 @@
+# -*- coding: utf-8 -*-
+# @Date    : 8/23/2024 20:00 PM
+# @Author  : didi
+# @Desc    : Entrance of AFlow.
+
+import argparse
+
+from metagpt.configs.models_config import ModelsConfig
+from metagpt.ext.aflow.scripts.optimizer import Optimizer
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(description="AFlow Optimizer for MATH")
+    parser.add_argument("--dataset", type=str, default="MATH", help="Dataset type")
+    parser.add_argument("--sample", type=int, default=4, help="Sample count")
+    parser.add_argument("--question_type", type=str, default="math", help="Question type")
+    parser.add_argument(
+        "--optimized_path", type=str, default="metagpt/ext/aflow/scripts/optimized", help="Optimized result save path"
+    )
+    parser.add_argument("--initial_round", type=int, default=1, help="Initial round")
+    parser.add_argument("--max_rounds", type=int, default=20, help="Max iteration rounds")
+    parser.add_argument(
+        "--check_convergence",
+        type=lambda s: s.lower() == "true",  # type=bool would treat "False" as True
+        default=True,
+        help="Whether to enable early stop",
+    )
+    parser.add_argument("--validation_rounds", type=int, default=5, help="Validation rounds")
+    return parser.parse_args()
+
+
+if __name__ == "__main__":
+    args = parse_args()
+
+    mini_llm_config = ModelsConfig.default().get("gpt-4o-mini")
+    claude_llm_config = ModelsConfig.default().get("claude-3-5-sonnet-20240620")
+
+    operators = [
+        "Custom",
+        "ScEnsemble",
+        "Programmer",
+    ]
+
+    optimizer = Optimizer(
+        dataset=args.dataset,
+        question_type=args.question_type,
+        opt_llm_config=claude_llm_config,
+        exec_llm_config=mini_llm_config,
+        check_convergence=args.check_convergence,
+        operators=operators,
+        optimized_path=args.optimized_path,
+        sample=args.sample,
+        initial_round=args.initial_round,
+        max_rounds=args.max_rounds,
+        validation_rounds=args.validation_rounds,
+    )
+
+    optimizer.optimize("Graph")
--- a/examples/aflow/experiments/optimize_mbpp.py
+++ b/examples/aflow/experiments/optimize_mbpp.py
@ -0,0 +1,54 @@
+# -*- coding: utf-8 -*-
+# @Date    : 8/23/2024 20:00 PM
+# @Author  : didi
+# @Desc    : Entrance of AFlow.
+
+import argparse
+
+from metagpt.configs.models_config import ModelsConfig
+from metagpt.ext.aflow.scripts.optimizer import Optimizer
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(description="AFlow Optimizer for MBPP")
+    parser.add_argument("--dataset", type=str, default="MBPP", help="Dataset type")
+    parser.add_argument("--sample", type=int, default=4, help="Sample count")
+    parser.add_argument("--question_type", type=str, default="code", help="Question type")
+    parser.add_argument(
+        "--optimized_path", type=str, default="metagpt/ext/aflow/scripts/optimized", help="Optimized result save path"
+    )
+    parser.add_argument("--initial_round", type=int, default=1, help="Initial round")
+    parser.add_argument("--max_rounds", type=int, default=20, help="Max iteration rounds")
+    parser.add_argument(
+        "--check_convergence",
+        type=lambda s: s.lower() == "true",  # type=bool would treat "False" as True
+        default=True,
+        help="Whether to enable early stop",
+    )
+    parser.add_argument("--validation_rounds", type=int, default=5, help="Validation rounds")
+    return parser.parse_args()
+
+
+if __name__ == "__main__":
+    args = parse_args()
+
+    mini_llm_config = ModelsConfig.default().get("gpt-4o-mini")
+    claude_llm_config = ModelsConfig.default().get("claude-3-5-sonnet-20240620")
+
+    operators = [
+        "Custom",
+        "CustomCodeGenerate",
+        "ScEnsemble",
+        "Test",
+    ]
+
+    optimizer = Optimizer(
+        dataset=args.dataset,
+        question_type=args.question_type,
+        opt_llm_config=claude_llm_config,
+        exec_llm_config=mini_llm_config,
+        check_convergence=args.check_convergence,
+        operators=operators,
+        optimized_path=args.optimized_path,
+        sample=args.sample,
+        initial_round=args.initial_round,
+        max_rounds=args.max_rounds,
+        validation_rounds=args.validation_rounds,
+    )
+
+    optimizer.optimize("Graph")
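Note on the six experiment scripts: they differ only in the default dataset, the question type, and the operator list. The sketch below collects those three values, as they appear in the scripts above, into one lookup table. `Preset`, `PRESETS`, and `preset_for` are hypothetical names, and this is one possible way to reduce the duplication rather than how the repository organizes it.

```python
from typing import Dict, List, NamedTuple


class Preset(NamedTuple):
    """Per-dataset defaults taken from the experiment scripts."""

    question_type: str
    operators: List[str]


PRESETS: Dict[str, Preset] = {
    "DROP": Preset("qa", ["Custom", "AnswerGenerate", "ScEnsemble"]),
    "HotpotQA": Preset("qa", ["Custom", "AnswerGenerate", "ScEnsemble"]),
    "GSM8K": Preset("math", ["Custom", "ScEnsemble", "Programmer"]),
    "MATH": Preset("math", ["Custom", "ScEnsemble", "Programmer"]),
    "HumanEval": Preset("code", ["Custom", "CustomCodeGenerate", "ScEnsemble", "Test"]),
    "MBPP": Preset("code", ["Custom", "CustomCodeGenerate", "ScEnsemble", "Test"]),
}


def preset_for(dataset: str) -> Preset:
    """Look up the defaults for a dataset, with a readable error for unknown names."""
    try:
        return PRESETS[dataset]
    except KeyError:
        raise ValueError(f"Unknown dataset {dataset!r}; expected one of {sorted(PRESETS)}") from None
```

With a table like this, a single entry script could accept `--dataset` and derive `question_type` and `operators` from it, which would keep the three values from drifting apart.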
--- a/examples/aflow/optimize.py
+++ b/examples/aflow/optimize.py
@ -0,0 +1,71 @@
+# -*- coding: utf-8 -*-
+# @Date    : 8/23/2024 20:00 PM
+# @Author  : didi
+# @Desc    : Entrance of AFlow.
+
+import argparse
+
+from metagpt.configs.models_config import ModelsConfig
+from metagpt.ext.aflow.data.download_data import download
+from metagpt.ext.aflow.scripts.optimizer import Optimizer
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(description="AFlow Optimizer")
+    parser.add_argument(
+        "--dataset",
+        type=str,
+        default="MATH",
+        help="Dataset type, including HumanEval, MBPP, GSM8K, MATH, HotpotQA, DROP",
+    )
+    parser.add_argument("--sample", type=int, default=4, help="Sample count")
+    parser.add_argument("--question_type", type=str, default="math", help="Question type, including math, code, qa")
+    parser.add_argument(
+        "--optimized_path", type=str, default="metagpt/ext/aflow/scripts/optimized", help="Optimized result save path"
+    )
+    parser.add_argument("--initial_round", type=int, default=1, help="Initial round")
+    parser.add_argument("--max_rounds", type=int, default=20, help="Max iteration rounds")
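+    parser.add_argument(
+        "--check_convergence",
+        type=lambda s: s.lower() == "true",  # type=bool would treat "False" as True
+        default=True,
+        help="Whether to enable early stop",
+    )
+    parser.add_argument("--validation_rounds", type=int, default=5, help="Validation rounds")
+    parser.add_argument(
+        "--if_first_optimize",
+        type=lambda s: s.lower() == "true",  # type=bool would treat "False" as True
+        default=True,
+        help="Whether this is first optimization",
+    )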
+    return parser.parse_args()
+
+
+# Configure the LLMs; edit `config/config2.yaml` to use other models.
+mini_llm_config = ModelsConfig.default().get("gpt-4o-mini")
+claude_llm_config = ModelsConfig.default().get("claude-3-5-sonnet-20240620")
+
+# Config operators.
+operators = [
+    "Custom",  # Basic unit of a fixed node; the optimizer can modify its prompt to obtain various nodes.
+    # "AnswerGenerate",              # It's for qa
+    # "CustomCodeGenerate",         # It's for code
+    "ScEnsemble",  # It's for code, math and qa
+    # "Test",                       # It's for code
+    "Programmer",  # It's for math
+]
+
+if __name__ == "__main__":
+    args = parse_args()
+
+    # Create an optimizer instance
+    optimizer = Optimizer(
+        dataset=args.dataset,  # Config dataset
+        question_type=args.question_type,  # Config Question Type
+        opt_llm_config=claude_llm_config,  # Config Optimizer LLM
+        exec_llm_config=mini_llm_config,  # Config Execution LLM
+        check_convergence=args.check_convergence,  # Whether Early Stop
+        operators=operators,  # Config Operators you want to use
+        optimized_path=args.optimized_path,  # Config Optimized workflow's file path
+        sample=args.sample,  # Only Top(sample) rounds will be selected.
+        initial_round=args.initial_round,  # Optimize from initial round
+        max_rounds=args.max_rounds,  # The max iteration of AFLOW.
+        validation_rounds=args.validation_rounds,  # The validation rounds of AFLOW.
+    )
+
+    # On first use, download the datasets and initial rounds. To inspect our results, download the results as well.
+    download(["datasets", "initial_rounds"], if_first_download=args.if_first_optimize)
+    # Optimize workflow via setting the optimizer's mode to 'Graph'
+    optimizer.optimize("Graph")
+    # Test workflow via setting the optimizer's mode to 'Test'
+    # optimizer.optimize("Test")
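Note on what `optimizer.optimize("Graph")` is described as doing: the README says the Optimizer iteratively selects, expands, evaluates, and updates workflows. The sketch below is a heavily simplified illustration of that four-step loop, not the implementation in `metagpt/ext/aflow/scripts/optimizer.py`. The `Round` dataclass, the score-weighted selection rule, and the toy `expand` and `evaluate` functions are all assumptions; in AFlow the expansion is performed by an LLM and the evaluation runs the workflow on validation data.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Round:
    workflow: str  # code-represented workflow, reduced here to a string
    score: float   # validation score of that workflow


def optimize(
    initial: str,
    expand: Callable[[str], str],
    evaluate: Callable[[str], float],
    max_rounds: int = 20,
    sample: int = 4,
    seed: int = 0,
) -> Round:
    """Run a simplified select / expand / evaluate / update loop and return the best round."""
    rng = random.Random(seed)
    rounds: List[Round] = [Round(initial, evaluate(initial))]
    for _ in range(max_rounds):
        # Select: choose a parent among the top-`sample` rounds, weighted by score.
        top = sorted(rounds, key=lambda r: r.score, reverse=True)[:sample]
        weights = [r.score + 1e-6 for r in top]
        parent = rng.choices(top, weights=weights, k=1)[0]
        # Expand: derive a modified workflow from the parent.
        child = expand(parent.workflow)
        # Evaluate: score the new workflow.
        score = evaluate(child)
        # Update: record the result so later selections can use it.
        rounds.append(Round(child, score))
    return max(rounds, key=lambda r: r.score)


def toy_expand(workflow: str) -> str:
    """Toy expansion: append one more ensemble step."""
    return workflow + " -> ScEnsemble"


def toy_evaluate(workflow: str) -> float:
    """Toy score that grows with the number of ensemble steps, capped at 1.0."""
    return min(1.0, 0.5 + 0.1 * workflow.count("ScEnsemble"))


best = optimize("Custom", toy_expand, toy_evaluate, max_rounds=5, sample=2)
```

Restricting selection to the top `sample` rounds mirrors the comment on the `sample` argument above ("Only Top(sample) rounds will be selected"), while the weighting within that set is purely illustrative.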