Merge branch 'geekan:main' into main

2026-06-11 15:15:18 +02:00 · 2023-08-17 11:02:27 -05:00 · 2023-08-17 11:02:27 -05:00 · a0e6d20034
commit a0e6d20034
parent 21629f841b 625342199a
50 changed files with 1734 additions and 250 deletions
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@ -1,20 +1,9 @@
+default_stages: [ commit ]
+
 # Install
 # 1. pip install pre-commit
 # 2. pre-commit install(the first time you download the repo, it will be cached for future use)
 repos:
-  - repo: https://github.com/pycqa/flake8
-    rev: 4.0.1
-    hooks:
-      - id: flake8
-        args: [
-            "--show-source",
-            "--count",
-            "--statistics",
-            "--extend-ignore=E203,E402,C901,E501,E101,E266,E731,W291,F821,W191,E122,E125,E127,E128,W293",
-            "--per-file-ignores=__init__.py:F401",
-        ] # when necessary, ignore errors, https://flake8.pycqa.org/en/latest/user/error-codes.html
-        exclude: ^venv/ # exclude dir, e.g. (^foo/|^bar/)
-
  - repo: https://github.com/pycqa/isort
    rev: 5.11.5
    hooks:
@ -24,3 +13,15 @@ repos:
            (?x)^(
            .*__init__\.py$
            )
+
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    # Ruff version.
+    rev: v0.0.284
+    hooks:
+      - id: ruff
+
+  - repo: https://github.com/psf/black
+    rev: 23.3.0
+    hooks:
+      - id: black
+        args: ['--line-length', '120']
--- a/README.md
+++ b/README.md
@ -33,7 +33,7 @@ ## Examples (fully generated by GPT-4)

 ![Jinri Toutiao Recsys Data & API Design](docs/resources/workspace/content_rec_sys/resources/data_api_design.png)

-It requires around **$0.2** (GPT-4 api's costs) to generate one example with analysis and design, around **$2.0** to a full project.
+It costs approximately **$0.2** (in GPT-4 API fees) to generate one example with analysis and design, and around **$2.0** for a full project.

 ## Installation

@ -71,6 +71,8 @@ # Step 3: Clone the repository to your local machine, and install it.
    MMDC: "./node_modules/.bin/mmdc"
    ```

+- if `python setup.py install` fails with error `[Errno 13] Permission denied: '/usr/local/lib/python3.11/dist-packages/test-easy-install-13129.write-test'`, try instead running `python setup.py install --user`
+
 ### Installation by Docker

 ```bash
--- a/config/config.yaml
+++ b/config/config.yaml
@ -63,6 +63,10 @@ SD_T2I_API: "/sdapi/v1/txt2img"
 #PUPPETEER_CONFIG: "./config/puppeteer-config.json"
 #MMDC: "./node_modules/.bin/mmdc"

-### for update_costs & calc_usage
-UPDATE_COSTS: false
-CALC_USAGE: false
+
+### for calc_usage
+# CALC_USAGE: false
+
+### for Research
+MODEL_FOR_RESEARCHER_SUMMARY: gpt-3.5-turbo
+MODEL_FOR_RESEARCHER_REPORT: gpt-3.5-turbo-16k
--- a/docs/FAQ-EN.md
+++ b/docs/FAQ-EN.md
@ -0,0 +1,181 @@
+Our vision is to [extend human life](https://github.com/geekan/HowToLiveLonger) and [reduce working hours](https://github.com/geekan/MetaGPT/).
+
+1.  ### Convenient Link for Sharing this Document:
+
+```
+- MetaGPT-Index/FAQ https://deepwisdom.feishu.cn/wiki/MsGnwQBjiif9c3koSJNcYaoSnu4
+```
+
+2.  ### Link
+
+<!---->
+
+1.  Code：https://github.com/geekan/MetaGPT
+
+1.  Roadmap：https://github.com/geekan/MetaGPT/blob/main/docs/ROADMAP.md
+
+1.  EN
+
+    1.  Demo Video: [MetaGPT: Multi-Agent AI Programming Framework](https://www.youtube.com/watch?v=8RNzxZBTW8M)
+    1.  Tutorial: [MetaGPT: Deploy POWERFUL Autonomous Ai Agents BETTER Than SUPERAGI!](https://www.youtube.com/watch?v=q16Gi9pTG_M&t=659s)
+
+1.  CN
+
+    1.  Demo Video: [MetaGPT：一行代码搭建你的虚拟公司_哔哩哔哩_bilibili](https://www.bilibili.com/video/BV1NP411C7GW/?spm_id_from=333.999.0.0&vd_source=735773c218b47da1b4bd1b98a33c5c77)
+    1.  Tutorial: [一个提示词写游戏 Flappy bird, 比AutoGPT强10倍的MetaGPT，最接近AGI的AI项目](https://youtu.be/Bp95b8yIH5c)
+    1.  Author's thoughts video(CN): [MetaGPT作者深度解析直播回放_哔哩哔哩_bilibili](https://www.bilibili.com/video/BV1Ru411V7XL/?spm_id_from=333.337.search-card.all.click)
+
+<!---->
+
+3.  ### How to become a contributor?
+
+<!---->
+
+1.  Choose a task from the Roadmap (or you can propose one). By submitting a PR, you can become a contributor and join the dev team.
+1.  Current contributors come from backgrounds including: ByteDance AI Lab/DingDong/Didi/Xiaohongshu, Tencent/Baidu/MSRA/TikTok/BloomGPT Infra/Bilibili/CUHK/HKUST/CMU/UCB
+
+<!---->
+
+4.  ### Chief Evangelist (Monthly Rotation)
+
+MetaGPT Community - The position of Chief Evangelist rotates on a monthly basis. The primary responsibilities include:
+
+1.  Maintaining community FAQ documents, announcements, Github resources/READMEs.
+1.  Responding to, answering, and distributing community questions within an average of 30 minutes, including on platforms like Github Issues, Discord and WeChat.
+1.  Upholding a community atmosphere that is enthusiastic, genuine, and friendly.
+1.  Encouraging everyone to become contributors and participate in projects that are closely related to achieving AGI (Artificial General Intelligence).
+1.  (Optional) Organizing small-scale events, such as hackathons.
+
+<!---->
+
+5.  ### FAQ
+
+<!---->
+
+1.  Experience with the generated repo code:
+
+    1.  https://github.com/geekan/MetaGPT/releases/tag/v0.1.0
+
+1.  Code truncation/ Parsing failure:
+
+    1.  Check if it's due to exceeding length. Consider using the gpt-3.5-turbo-16k or other long token versions.
+
+1.  Success rate:
+
+    1.  There hasn't been a quantitative analysis yet, but the success rate of code generated by GPT-4 is significantly higher than that of gpt-3.5-turbo.
+
+1.  Support for incremental, differential updates (if you wish to continue a half-done task):
+
+    1.  Several prerequisite tasks are listed on the ROADMAP.
+
+1.  Can existing code be loaded?
+
+    1.  It's not on the ROADMAP yet, but there are plans in place. It just requires some time.
+
+1.  Support for multiple programming languages and natural languages?
+
+    1.  It's listed on ROADMAP.
+
+1.  Want to join the contributor team? How to proceed?
+
+    1.  Merging a PR will get you into the contributor's team. The main ongoing tasks are all listed on the ROADMAP.
+
+1.  PRD stuck / unable to access/ connection interrupted
+
+    1.  The official OPENAI_API_BASE address is `https://api.openai.com/v1`
+    1.  If the official OPENAI_API_BASE address is inaccessible in your environment (this can be verified with curl), it's recommended to configure using the reverse proxy OPENAI_API_BASE provided by libraries such as openai-forward. For instance, `OPENAI_API_BASE: "``https://api.openai-forward.com/v1``"`
+    1.  If the official OPENAI_API_BASE address is inaccessible in your environment (again, verifiable via curl), another option is to configure the OPENAI_PROXY parameter. This way, you can access the official OPENAI_API_BASE via a local proxy. If you don't need to access via a proxy, please do not enable this configuration; if accessing through a proxy is required, modify it to the correct proxy address. Note that when OPENAI_PROXY is enabled, don't set OPENAI_API_BASE.
+    1.  Note: OpenAI's default API design ends with a v1. An example of the correct configuration is: `OPENAI_API_BASE: "``https://api.openai.com/v1``"`
+
+1.  Absolutely! How can I assist you today?
+
+    1.  Did you use Chi or a similar service? These services are prone to errors, and it seems that the error rate is higher when consuming 3.5k-4k tokens in GPT-4
+
+1.  What does Max token mean?
+
+    1.  It's a configuration for OpenAI's maximum response length. If the response exceeds the max token, it will be truncated.
+
+1.  How to change the investment amount?
+
+    1.  You can view all commands by typing `python startup.py --help`
+
+1.  Which version of Python is more stable?
+
+    1.  python3.9 / python3.10
+
+1.  Can't use GPT-4, getting the error "The model gpt-4 does not exist."
+
+    1.  OpenAI's official requirement: You can use GPT-4 only after spending $1 on OpenAI.
+    1.  Tip: Run some data with gpt-3.5-turbo (consume the free quota and $1), and then you should be able to use gpt-4.
+
+1.  Can games whose code has never been seen before be written?
+
+    1.  Refer to the README. The recommendation system of Toutiao is one of the most complex systems in the world currently. Although it's not on GitHub, many discussions about it exist online. If it can visualize these, it suggests it can also summarize these discussions and convert them into code. The prompt would be something like "write a recommendation system similar to Toutiao". Note: this was approached in earlier versions of the software. The SOP of those versions was different; the current one adopts Elon Musk's five-step work method, emphasizing trimming down requirements as much as possible.
+
+1.  Under what circumstances would there typically be errors?
+
+    1.  More than 500 lines of code: some function implementations may be left blank.
+    1.  When using a database, it often gets the implementation wrong — since the SQL database initialization process is usually not in the code.
+    1.  With more lines of code, there's a higher chance of false impressions, leading to calls to non-existent APIs.
+
+1.  Instructions for using SD Skills/UI Role:
+
+    1.  Currently, there is a test script located in /tests/metagpt/roles. The file ui_role provides the corresponding code implementation. For testing, you can refer to the test_ui in the same directory.
+
+    1.  The UI role takes over from the product manager role, extending the output from the 【UI Design draft】 provided by the product manager role. The UI role has implemented the UIDesign Action. Within the run of UIDesign, it processes the respective context, and based on the set template, outputs the UI. The output from the UI role includes:
+
+        1.  UI Design Description：Describes the content to be designed and the design objectives.
+        1.  Selected Elements：Describes the elements in the design that need to be illustrated.
+        1.  HTML Layout：Outputs the HTML code for the page.
+        1.  CSS Styles (styles.css)：Outputs the CSS code for the page.
+
+    1.  Currently, the SD skill is a tool invoked by UIDesign. It instantiates the SDEngine, with specific code found in metagpt/tools/sd_engine.
+
+    1.  Configuration instructions for SD Skills: The SD interface is currently deployed based on *https://github.com/AUTOMATIC1111/stable-diffusion-webui* **For environmental configurations and model downloads, please refer to the aforementioned GitHub repository. To initiate the SD service that supports API calls, run the command specified in cmd with the parameter nowebui, i.e.,
+
+        1.  > python webui.py --enable-insecure-extension-access --port xxx --no-gradio-queue --nowebui
+        1.      Once it runs without errors, the interface will be accessible after approximately 1 minute when the model finishes loading.
+        1.  Configure SD_URL and SD_T2I_API in the config.yaml/key.yaml files.
+        1.  ![](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/065295a67b0b4feea665d1372722d49d~tplv-k3u1fbpfcp-zoom-1.image)
+        1.      SD_URL is the deployed server/machine IP, and Port is the specified port above, defaulting to 7860.
+        1.  > SD_URL: IP:Port
+
+1.  An error occurred during installation: "Another program is using this file...egg".
+
+    1.  Delete the file and try again.
+    1.  Or manually execute`pip install -r requirements.txt`
+
+1.  The origin of the name MetaGPT？
+
+    1.  The name was derived after iterating with GPT-4 over a dozen rounds. GPT-4 scored and suggested it.
+
+1.  Is there a more step-by-step installation tutorial?
+
+    1.  Youtube（CN）：[一个提示词写游戏 Flappy bird, 比AutoGPT强10倍的MetaGPT，最接近AGI的AI项目=一个软件公司产品经理+程序员](https://youtu.be/Bp95b8yIH5c)
+    1.  Youtube（EN）https://www.youtube.com/watch?v=q16Gi9pTG_M&t=659s
+
+1.  openai.error.RateLimitError: You exceeded your current quota, please check your plan and billing details
+
+    1.  If you haven't exhausted your free quota, set RPM to 3 or lower in the settings.
+    1.  If your free quota is used up, consider adding funds to your account.
+
+1.  What does "borg" mean in n_borg?
+
+    1.  https://en.wikipedia.org/wiki/Borg
+    1.  The Borg civilization operates based on a hive or collective mentality, known as "the Collective." Every Borg individual is connected to the collective via a sophisticated subspace network, ensuring continuous oversight and guidance for every member. This collective consciousness allows them to not only "share the same thoughts" but also to adapt swiftly to new strategies. While individual members of the collective rarely communicate, the collective "voice" sometimes transmits aboard ships.
+
+1.  How to use the Claude API？
+
+    1.  The full implementation of the Claude API is not provided in the current code.
+    1.  You can use the Claude API through third-party API conversion projects like: https://github.com/jtsang4/claude-to-chatgpt
+
+1.  Is Llama2 supported？
+
+    1.  On the day Llama2 was released, some of the community members began experiments and found that output can be generated based on MetaGPT's structure. However, Llama2's context is too short to generate a complete project. Before regularly using Llama2, it's necessary to expand the context window to at least 8k. If anyone has good recommendations for expansion models or methods, please leave a comment.
+
+1.  `mermaid-cli getElementsByTagName SyntaxError: Unexpected token '.'`
+
+    1.  Upgrade node to version 14.x or above:
+
+        1.  `npm install -g n`
+        1.  `n stable` to install the stable version of node（v18.x）
--- a/docs/README_CN.md
+++ b/docs/README_CN.md
@ -197,4 +197,4 @@ ## 加入微信讨论群

 添加运营小姐姐，拉你入群

-<img src="resources/20230808-220924.jpg" width = "30%" height = "30%" alt="MetaGPT WeChat Discuss Group" align=center />
+<img src="resources/20230811-214014.jpg" width = "30%" height = "30%" alt="MetaGPT WeChat Discuss Group" align=center />
--- a/docs/resources/20230808-002840.jpg
+++ b/docs/resources/20230808-002840.jpg
--- a/docs/resources/20230808-220924.jpg
+++ b/docs/resources/20230808-220924.jpg
--- a/docs/resources/20230811-214014.jpg
+++ b/docs/resources/20230811-214014.jpg
--- a/docs/resources/MetaGPT-WeChat-Group-Simple.png
+++ b/docs/resources/MetaGPT-WeChat-Group-Simple.png
--- a/docs/resources/MetaGPT-WeChat-Group.jpeg
+++ b/docs/resources/MetaGPT-WeChat-Group.jpeg
--- a/docs/resources/MetaGPT-WeChat-Group4.jpeg
+++ b/docs/resources/MetaGPT-WeChat-Group4.jpeg
--- a/docs/resources/MetaGPT-WeChat-Personal-new.jpg
+++ b/docs/resources/MetaGPT-WeChat-Personal-new.jpg
--- a/examples/research.py
+++ b/examples/research.py
@ -0,0 +1,16 @@
+#!/usr/bin/env python
+
+import asyncio
+
+from metagpt.roles.researcher import RESEARCH_PATH, Researcher
+
+
+async def main():
+    topic = "dataiku vs. datarobot"
+    role = Researcher(language="en-us")
+    await role.run(topic)
+    print(f"save report to {RESEARCH_PATH / f'{topic}.md'}.")
+
+
+if __name__ == '__main__':
+    asyncio.run(main())
--- a/metagpt/actions/research.py
+++ b/metagpt/actions/research.py
@ -0,0 +1,277 @@
+#!/usr/bin/env python
+
+from __future__ import annotations
+
+import asyncio
+import json
+from typing import Callable
+
+from pydantic import parse_obj_as
+
+from metagpt.actions import Action
+from metagpt.config import CONFIG
+from metagpt.logs import logger
+from metagpt.tools.search_engine import SearchEngine
+from metagpt.tools.web_browser_engine import WebBrowserEngine, WebBrowserEngineType
+from metagpt.utils.text import generate_prompt_chunk, reduce_message_length
+
+LANG_PROMPT = "Please respond in {language}."
+
+RESEARCH_BASE_SYSTEM = """You are an AI critical thinker research assistant. Your sole purpose is to write well \
+written, critically acclaimed, objective and structured reports on the given text."""
+
+RESEARCH_TOPIC_SYSTEM = "You are an AI researcher assistant, and your research topic is:\n#TOPIC#\n{topic}"
+
+SEARCH_TOPIC_PROMPT = """Please provide up to 2 necessary keywords related to your research topic for Google search. \
+Your response must be in JSON format, for example: ["keyword1", "keyword2"]."""
+
+SUMMARIZE_SEARCH_PROMPT = """### Requirements
+1. The keywords related to your research topic and the search results are shown in the "Search Result Information" section.
+2. Provide up to {decomposition_nums} queries related to your research topic base on the search results.
+3. Please respond in the following JSON format: ["query1", "query2", "query3", ...].
+
+### Search Result Information
+{search_results}
+"""
+
+COLLECT_AND_RANKURLS_PROMPT = """### Topic
+{topic}
+### Query
+{query}
+
+### The online search results
+{results}
+
+### Requirements
+Please remove irrelevant search results that are not related to the query or topic. Then, sort the remaining search results \
+based on the link credibility. If two results have equal credibility, prioritize them based on the relevance. Provide the
+ranked results' indices in JSON format, like [0, 1, 3, 4, ...], without including other words.
+"""
+
+WEB_BROWSE_AND_SUMMARIZE_PROMPT = '''### Requirements
+1. Utilize the text in the "Reference Information" section to respond to the question "{query}".
+2. If the question cannot be directly answered using the text, but the text is related to the research topic, please provide \
+a comprehensive summary of the text.
+3. If the text is entirely unrelated to the research topic, please reply with a simple text "Not relevant."
+4. Include all relevant factual information, numbers, statistics, etc., if available.
+
+### Reference Information
+{content}
+'''
+
+
+CONDUCT_RESEARCH_PROMPT = '''### Reference Information
+{content}
+
+### Requirements
+Please provide a detailed research report in response to the following topic: "{topic}", using the information provided \
+above. The report must meet the following requirements:
+
+- Focus on directly addressing the chosen topic.
+- Ensure a well-structured and in-depth presentation, incorporating relevant facts and figures where available.
+- Present data and findings in an intuitive manner, utilizing feature comparative tables, if applicable.
+- The report should have a minimum word count of 2,000 and be formatted with Markdown syntax following APA style guidelines.
+- Include all source URLs in APA format at the end of the report.
+'''
+
+
+class CollectLinks(Action):
+    """Action class to collect links from a search engine."""
+    def __init__(
+        self,
+        name: str = "",
+        *args,
+        rank_func: Callable[[list[str]], None] | None = None,
+        **kwargs,
+    ):
+        super().__init__(name, *args, **kwargs)
+        self.desc = "Collect links from a search engine."
+        self.search_engine = SearchEngine()
+        self.rank_func = rank_func
+
+    async def run(
+        self,
+        topic: str,
+        decomposition_nums: int = 4,
+        url_per_query: int = 4,
+        system_text: str | None = None,
+    ) -> dict[str, list[str]]:
+        """Run the action to collect links.
+
+        Args:
+            topic: The research topic.
+            decomposition_nums: The number of search questions to generate.
+            url_per_query: The number of URLs to collect per search question.
+            system_text: The system text.
+
+        Returns:
+            A dictionary containing the search questions as keys and the collected URLs as values.
+        """
+        system_text = system_text if system_text else RESEARCH_TOPIC_SYSTEM.format(topic=topic)
+        keywords = await self._aask(SEARCH_TOPIC_PROMPT, [system_text])
+        try:
+            keywords = json.loads(keywords)
+            keywords = parse_obj_as(list[str], keywords)
+        except Exception as e:
+            logger.exception(f"fail to get keywords related to the research topic \"{topic}\" for {e}")
+            keywords = [topic]
+        results = await asyncio.gather(*(self.search_engine.run(i, as_string=False) for i in keywords))
+
+        def gen_msg():
+            while True:
+                search_results = "\n".join(f"#### Keyword: {i}\n Search Result: {j}\n" for (i, j) in zip(keywords, results))
+                prompt = SUMMARIZE_SEARCH_PROMPT.format(decomposition_nums=decomposition_nums, search_results=search_results)
+                yield prompt
+                remove = max(results, key=len)
+                remove.pop()
+                if len(remove) == 0:
+                    break
+        prompt = reduce_message_length(gen_msg(), self.llm.model, system_text, CONFIG.max_tokens_rsp)
+        logger.debug(prompt)
+        queries = await self._aask(prompt, [system_text])
+        try:
+            queries = json.loads(queries)
+            queries = parse_obj_as(list[str], queries)
+        except Exception as e:
+            logger.exception(f"fail to break down the research question due to {e}")
+            queries = keywords
+        ret = {}
+        for query in queries:
+            ret[query] = await self._search_and_rank_urls(topic, query, url_per_query)
+        return ret
+
+    async def _search_and_rank_urls(self, topic: str, query: str, num_results: int = 4) -> list[str]:
+        """Search and rank URLs based on a query.
+
+        Args:
+            topic: The research topic.
+            query: The search query.
+            num_results: The number of URLs to collect.
+
+        Returns:
+            A list of ranked URLs.
+        """
+        max_results = max(num_results * 2, 6)
+        results = await self.search_engine.run(query, max_results=max_results, as_string=False)
+        _results = "\n".join(f"{i}: {j}" for i, j in zip(range(max_results), results))
+        prompt = COLLECT_AND_RANKURLS_PROMPT.format(topic=topic, query=query, results=_results)
+        logger.debug(prompt)
+        indices = await self._aask(prompt)
+        try:
+            indices = json.loads(indices)
+            assert all(isinstance(i, int) for i in indices)
+        except Exception as e:
+            logger.exception(f"fail to rank results for {e}")
+            indices = list(range(max_results))
+        results = [results[i] for i in indices]
+        if self.rank_func:
+            results = self.rank_func(results)
+        return [i["link"] for i in results[:num_results]]
+
+
+class WebBrowseAndSummarize(Action):
+    """Action class to explore the web and provide summaries of articles and webpages."""
+    def __init__(
+        self,
+        *args,
+        browse_func: Callable[[list[str]], None] | None = None,
+        **kwargs,
+    ):
+        super().__init__(*args, **kwargs)
+        if CONFIG.model_for_researcher_summary:
+            self.llm.model = CONFIG.model_for_researcher_summary
+        self.web_browser_engine = WebBrowserEngine(
+            engine=WebBrowserEngineType.CUSTOM if browse_func else None,
+            run_func=browse_func,
+        )
+        self.desc = "Explore the web and provide summaries of articles and webpages."
+
+    async def run(
+        self,
+        url: str,
+        *urls: str,
+        query: str,
+        system_text: str = RESEARCH_BASE_SYSTEM,
+    ) -> dict[str, str]:
+        """Run the action to browse the web and provide summaries.
+
+        Args:
+            url: The main URL to browse.
+            urls: Additional URLs to browse.
+            query: The research question.
+            system_text: The system text.
+
+        Returns:
+            A dictionary containing the URLs as keys and their summaries as values.
+        """
+        contents = await self.web_browser_engine.run(url, *urls)
+        if not urls:
+            contents = [contents]
+
+        summaries = {}
+        prompt_template = WEB_BROWSE_AND_SUMMARIZE_PROMPT.format(query=query, content="{}")
+        for u, content in zip([url, *urls], contents):
+            content = content.inner_text
+            chunk_summaries = []
+            for prompt in generate_prompt_chunk(content, prompt_template, self.llm.model, system_text, CONFIG.max_tokens_rsp):
+                logger.debug(prompt)
+                summary = await self._aask(prompt, [system_text])
+                if summary == "Not relevant.":
+                    continue
+                chunk_summaries.append(summary)
+
+            if not chunk_summaries:
+                summaries[u] = None
+                continue
+
+            if len(chunk_summaries) == 1:
+                summaries[u] = chunk_summaries[0]
+                continue
+
+            content = "\n".join(chunk_summaries)
+            prompt = WEB_BROWSE_AND_SUMMARIZE_PROMPT.format(query=query, content=content)
+            summary = await self._aask(prompt, [system_text])
+            summaries[u] = summary
+        return summaries
+
+
+class ConductResearch(Action):
+    """Action class to conduct research and generate a research report."""
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        if CONFIG.model_for_researcher_report:
+            self.llm.model = CONFIG.model_for_researcher_report
+
+    async def run(
+        self,
+        topic: str,
+        content: str,
+        system_text: str = RESEARCH_BASE_SYSTEM,
+    ) -> str:
+        """Run the action to conduct research and generate a research report.
+
+        Args:
+            topic: The research topic.
+            content: The content for research.
+            system_text: The system text.
+
+        Returns:
+            The generated research report.
+        """
+        prompt = CONDUCT_RESEARCH_PROMPT.format(topic=topic, content=content)
+        logger.debug(prompt)
+        self.llm.auto_max_tokens = True
+        return await self._aask(prompt, [system_text])
+
+
+def get_research_system_text(topic: str, language: str):
+    """Get the system text for conducting research.
+
+    Args:
+        topic: The research topic.
+        language: The language for the system text.
+
+    Returns:
+        The system text for conducting research.
+    """
+    return " ".join((RESEARCH_TOPIC_SYSTEM.format(topic=topic), LANG_PROMPT.format(language=language)))
--- a/metagpt/actions/run_code.py
+++ b/metagpt/actions/run_code.py
@ -5,13 +5,13 @@
@Author  : alexanderwu
@File    : run_code.py
 """
-import traceback
 import os
 import subprocess
-from typing import List, Tuple
+import traceback
+from typing import Tuple

-from metagpt.logs import logger
 from metagpt.actions.action import Action
+from metagpt.logs import logger

 PROMPT_TEMPLATE = """
 Role: You are a senior development and qa engineer, your role is summarize the code running result.
@ -27,7 +27,7 @@ Please summarize the cause of the errors and give correction instruction
 Determine the ONE file to rewrite in order to fix the error, for example, xyz.py, or test_xyz.py
 ## Status:
 Determine if all of the code works fine, if so write PASS, else FAIL,
-WRITE ONLY ONE WORD, PASS OR FAIL, IN THI SECTION
+WRITE ONLY ONE WORD, PASS OR FAIL, IN THIS SECTION
 ## Send To:
 Please write Engineer if the errors are due to problematic development codes, and QaEngineer to problematic test codes, and NoOne if there are no errors,
 WRITE ONLY ONE WORD, Engineer OR QaEngineer OR NoOne, IN THIS SECTION.
@ -55,6 +55,7 @@ standard output: {outs};
 standard errors: {errs};
 """

+
 class RunCode(Action):
    def __init__(self, name="RunCode", context=None, llm=None):
        super().__init__(name, context, llm)
@ -65,7 +66,7 @@ class RunCode(Action):
            # We will document_store the result in this dictionary
            namespace = {}
            exec(code, namespace)
-            return namespace.get('result', ""), ""
+            return namespace.get("result", ""), ""
        except Exception:
            # If there is an error in the code, return the error message
            return "", traceback.format_exc()
@ -81,10 +82,12 @@ class RunCode(Action):
        # Modify the PYTHONPATH environment variable
        additional_python_paths = [working_directory] + additional_python_paths
        additional_python_paths = ":".join(additional_python_paths)
-        env['PYTHONPATH'] = additional_python_paths + ':' + env.get('PYTHONPATH', '')
+        env["PYTHONPATH"] = additional_python_paths + ":" + env.get("PYTHONPATH", "")

        # Start the subprocess
-        process = subprocess.Popen(command, cwd=working_directory, stdout=subprocess.PIPE, stderr=subprocess.PIPE, env=env)
+        process = subprocess.Popen(
+            command, cwd=working_directory, stdout=subprocess.PIPE, stderr=subprocess.PIPE, env=env
+        )

        try:
            # Wait for the process to complete, with a timeout
@ -93,7 +96,7 @@ class RunCode(Action):
            logger.info("The command did not complete within the given timeout.")
            process.kill()  # Kill the process if it times out
            stdout, stderr = process.communicate()
-        return stdout.decode('utf-8'), stderr.decode('utf-8')
+        return stdout.decode("utf-8"), stderr.decode("utf-8")

    async def run(
        self, code, mode="script", code_file_name="", test_code="", test_file_name="", command=[], **kwargs
@ -108,11 +111,13 @@ class RunCode(Action):
        logger.info(f"{errs=}")

        context = CONTEXT.format(
-            code=code, code_file_name=code_file_name,
-            test_code=test_code, test_file_name=test_file_name,
+            code=code,
+            code_file_name=code_file_name,
+            test_code=test_code,
+            test_file_name=test_file_name,
            command=" ".join(command),
-            outs=outs[:500], # outs might be long but they are not important, truncate them to avoid token overflow
-            errs=errs[:10000] # truncate errors to avoid token overflow
+            outs=outs[:500],  # outs might be long but they are not important, truncate them to avoid token overflow
+            errs=errs[:10000],  # truncate errors to avoid token overflow
        )

        prompt = PROMPT_TEMPLATE.format(context=context)
--- a/metagpt/actions/write_test.py
+++ b/metagpt/actions/write_test.py
@ -5,7 +5,6 @@
@Author  : alexanderwu
@File    : environment.py
 """
-from metagpt.logs import logger
 from metagpt.actions.action import Action
 from metagpt.utils.common import CodeParser

@ -29,6 +28,7 @@ you should correctly import the necessary classes based on these file locations!
 ## {test_file_name}: Write test code with triple quoto. Do your best to implement THIS ONLY ONE FILE.
 """

+
 class WriteTest(Action):
    def __init__(self, name="WriteTest", context=None, llm=None):
        super().__init__(name, context, llm)
@ -43,7 +43,7 @@ class WriteTest(Action):
            code_to_test=code_to_test,
            test_file_name=test_file_name,
            source_file_path=source_file_path,
-            workspace=workspace
+            workspace=workspace,
        )
        code = await self.write_code(prompt)
        return code
--- a/metagpt/const.py
+++ b/metagpt/const.py
@ -32,5 +32,6 @@ UT_PY_PATH = UT_PATH / "files/ut/"
 API_QUESTIONS_PATH = UT_PATH / "files/question/"
 YAPI_URL = "http://yapi.deepwisdomai.com/"
 TMP = PROJECT_ROOT / 'tmp'
+RESEARCH_PATH = DATA_PATH / "research"

 MEM_TTL = 24 * 30 * 3600
--- a/metagpt/document_store/init.py
+++ b/metagpt/document_store/init.py
@ -7,3 +7,5 @@
 """

 from metagpt.document_store.faiss_store import FaissStore
+
+__all__ = ["FaissStore"]
--- a/metagpt/document_store/base_store.py
+++ b/metagpt/document_store/base_store.py
@ -15,7 +15,7 @@ class BaseStore(ABC):
    """FIXME: consider add_index, set_index and think about granularity."""

    @abstractmethod
-    def search(self, query, *args, **kwargs):
+    def search(self, *args, **kwargs):
        raise NotImplementedError

    @abstractmethod
--- a/metagpt/document_store/qdrant_store.py
+++ b/metagpt/document_store/qdrant_store.py
@ -0,0 +1,129 @@
+from dataclasses import dataclass
+from typing import List
+
+from qdrant_client import QdrantClient
+from qdrant_client.models import Filter, PointStruct, VectorParams
+
+from metagpt.document_store.base_store import BaseStore
+
+
+@dataclass
+class QdrantConnection:
+    """
+   Args:
+       url: qdrant url
+       host: qdrant host
+       port: qdrant port
+       memory: qdrant service use memory mode
+       api_key: qdrant cloud api_key
+   """
+    url: str = None
+    host: str = None
+    port: int = None
+    memory: bool = False
+    api_key: str = None
+
+
+class QdrantStore(BaseStore):
+    def __init__(self, connect: QdrantConnection):
+        if connect.memory:
+            self.client = QdrantClient(":memory:")
+        elif connect.url:
+            self.client = QdrantClient(url=connect.url, api_key=connect.api_key)
+        elif connect.host and connect.port:
+            self.client = QdrantClient(
+                host=connect.host, port=connect.port, api_key=connect.api_key
+            )
+        else:
+            raise Exception("please check QdrantConnection.")
+
+    def create_collection(
+        self,
+        collection_name: str,
+        vectors_config: VectorParams,
+        force_recreate=False,
+        **kwargs,
+    ):
+        """
+        create a collection
+        Args:
+            collection_name: collection name
+            vectors_config: VectorParams object,detail in https://github.com/qdrant/qdrant-client
+            force_recreate: default is False, if True, will delete exists collection,then create it
+            **kwargs:
+
+        Returns:
+
+        """
+        try:
+            self.client.get_collection(collection_name)
+            if force_recreate:
+                res = self.client.recreate_collection(
+                    collection_name, vectors_config=vectors_config, **kwargs
+                )
+                return res
+            return True
+        except:  # noqa: E722
+            return self.client.recreate_collection(
+                collection_name, vectors_config=vectors_config, **kwargs
+            )
+
+    def has_collection(self, collection_name: str):
+        try:
+            self.client.get_collection(collection_name)
+            return True
+        except:  # noqa: E722
+            return False
+
+    def delete_collection(self, collection_name: str, timeout=60):
+        res = self.client.delete_collection(collection_name, timeout=timeout)
+        if not res:
+            raise Exception(f"Delete collection {collection_name} failed.")
+
+    def add(self, collection_name: str, points: List[PointStruct]):
+        """
+        add some vector data to qdrant
+        Args:
+            collection_name: collection name
+            points: list of PointStruct object, about PointStruct detail in https://github.com/qdrant/qdrant-client
+
+        Returns: NoneX
+
+        """
+        # self.client.upload_records()
+        self.client.upsert(
+            collection_name,
+            points,
+        )
+
+    def search(
+        self,
+        collection_name: str,
+        query: List[float],
+        query_filter: Filter = None,
+        k=10,
+        return_vector=False,
+    ):
+        """
+        vector search
+        Args:
+            collection_name: qdrant collection name
+            query: input vector
+            query_filter: Filter object, detail in https://github.com/qdrant/qdrant-client
+            k: return the most similar k pieces of data
+            return_vector: whether return vector
+
+        Returns: list of dict
+
+        """
+        hits = self.client.search(
+            collection_name=collection_name,
+            query_vector=query,
+            query_filter=query_filter,
+            limit=k,
+            with_vectors=return_vector,
+        )
+        return [hit.__dict__ for hit in hits]
+
+    def write(self, *args, **kwargs):
+        pass
--- a/metagpt/memory/init.py
+++ b/metagpt/memory/init.py
@ -9,3 +9,8 @@
 from metagpt.memory.memory import Memory
 from metagpt.memory.longterm_memory import LongTermMemory

+
+__all__ = [
+    "Memory",
+    "LongTermMemory",
+]
--- a/metagpt/memory/longterm_memory.py
+++ b/metagpt/memory/longterm_memory.py
@ -2,12 +2,10 @@
 # -*- coding: utf-8 -*-
 # @Desc   : the implement of Long-term memory

-from typing import Iterable, Type
-
 from metagpt.logs import logger
-from metagpt.schema import Message
 from metagpt.memory import Memory
 from metagpt.memory.memory_storage import MemoryStorage
+from metagpt.schema import Message


 class LongTermMemory(Memory):
@ -27,10 +25,11 @@ class LongTermMemory(Memory):
        messages = self.memory_storage.recover_memory(role_id)
        self.rc = rc
        if not self.memory_storage.is_initialized:
-            logger.warning(f'It may the first time to run Agent {role_id}, the long-term memory is empty')
+            logger.warning(f"It may the first time to run Agent {role_id}, the long-term memory is empty")
        else:
-            logger.warning(f'Agent {role_id} has existed memory storage with {len(messages)} messages '
-                           f'and has recovered them.')
+            logger.warning(
+                f"Agent {role_id} has existed memory storage with {len(messages)} messages " f"and has recovered them."
+            )
        self.msg_from_recover = True
        self.add_batch(messages)
        self.msg_from_recover = False
--- a/metagpt/provider/init.py
+++ b/metagpt/provider/init.py
@ -7,3 +7,6 @@
 """

 from metagpt.provider.openai_api import OpenAIGPTAPI
+
+
+__all__ = ["OpenAIGPTAPI"]
--- a/metagpt/provider/openai_api.py
+++ b/metagpt/provider/openai_api.py
@ -122,6 +122,15 @@ See FAQ 5.8
    raise retry_state.outcome.exception()


+def log_and_reraise(retry_state):
+    logger.error(f"Retry attempts exhausted. Last exception: {retry_state.outcome.exception()}")
+    logger.warning("""
+Recommend going to https://deepwisdom.feishu.cn/wiki/MsGnwQBjiif9c3koSJNcYaoSnu4#part-XdatdVlhEojeAfxaaEZcMV3ZniQ
+See FAQ 5.8
+""")
+    raise retry_state.outcome.exception()
+
+
 class OpenAIGPTAPI(BaseGPTAPI, RateLimiter):
    """
    Check https://platform.openai.com/examples for examples
@ -223,11 +232,16 @@ class OpenAIGPTAPI(BaseGPTAPI, RateLimiter):
    def _calc_usage(self, messages: list[dict], rsp: str) -> dict:
        usage = {}
        if CONFIG.calc_usage:
-            prompt_tokens = count_message_tokens(messages, self.model)
-            completion_tokens = count_string_tokens(rsp, self.model)
-            usage['prompt_tokens'] = prompt_tokens
-            usage['completion_tokens'] = completion_tokens
-        return usage
+            try:
+                prompt_tokens = count_message_tokens(messages, self.model)
+                completion_tokens = count_string_tokens(rsp, self.model)
+                usage['prompt_tokens'] = prompt_tokens
+                usage['completion_tokens'] = completion_tokens
+                return usage
+            except Exception as e:
+                logger.error("usage calculation failed!", e)
+        else:
+            return usage

    async def acompletion_batch(self, batch: list[list[dict]]) -> list[dict]:
        """Return full JSON"""
@ -256,10 +270,13 @@ class OpenAIGPTAPI(BaseGPTAPI, RateLimiter):
        return results

    def _update_costs(self, usage: dict):
-        if CONFIG.update_costs:
-            prompt_tokens = int(usage['prompt_tokens'])
-            completion_tokens = int(usage['completion_tokens'])
-            self._cost_manager.update_cost(prompt_tokens, completion_tokens, self.model)
+        if CONFIG.calc_usage:
+            try:
+                prompt_tokens = int(usage['prompt_tokens'])
+                completion_tokens = int(usage['completion_tokens'])
+                self._cost_manager.update_cost(prompt_tokens, completion_tokens, self.model)
+            except Exception as e:
+                logger.error("updating costs failed!", e)

    def get_costs(self) -> Costs:
        return self._cost_manager.get_costs()
--- a/metagpt/roles/init.py
+++ b/metagpt/roles/init.py
@ -8,10 +8,23 @@

 from metagpt.roles.role import Role
 from metagpt.roles.architect import Architect
-from metagpt.roles.product_manager import ProductManager
 from metagpt.roles.project_manager import ProjectManager
+from metagpt.roles.product_manager import ProductManager
 from metagpt.roles.engineer import Engineer
 from metagpt.roles.qa_engineer import QaEngineer
 from metagpt.roles.seacher import Searcher
 from metagpt.roles.sales import Sales
 from metagpt.roles.customer_service import CustomerService
+
+
+__all__ = [
+    "Role",
+    "Architect",
+    "ProjectManager",
+    "ProductManager",
+    "Engineer",
+    "QaEngineer",
+    "Searcher",
+    "Sales",
+    "CustomerService",
+]
--- a/metagpt/roles/qa_engineer.py
+++ b/metagpt/roles/qa_engineer.py
@ -6,40 +6,44 @@
@File    : qa_engineer.py
 """
 import os
-import re
 from pathlib import Path
-from typing import Type

-from metagpt.actions import WriteTest, WriteCode, WriteDesign, RunCode, DebugError
+from metagpt.actions import DebugError, RunCode, WriteCode, WriteDesign, WriteTest
 from metagpt.const import WORKSPACE_ROOT
 from metagpt.logs import logger
 from metagpt.roles import Role
 from metagpt.schema import Message
-from metagpt.roles.engineer import Engineer
 from metagpt.utils.common import CodeParser, parse_recipient
-from metagpt.utils.special_tokens import MSG_SEP, FILENAME_CODE_SEP
+from metagpt.utils.special_tokens import FILENAME_CODE_SEP, MSG_SEP
+

 class QaEngineer(Role):
-    def __init__(self, name="Edward", profile="QaEngineer",
-                 goal="Write comprehensive and robust tests to ensure codes will work as expected without bugs",
-                 constraints="The test code you write should conform to code standard like PEP8, be modular, easy to read and maintain",
-                 test_round_allowed=5):
+    def __init__(
+        self,
+        name="Edward",
+        profile="QaEngineer",
+        goal="Write comprehensive and robust tests to ensure codes will work as expected without bugs",
+        constraints="The test code you write should conform to code standard like PEP8, be modular, easy to read and maintain",
+        test_round_allowed=5,
+    ):
        super().__init__(name, profile, goal, constraints)
-        self._init_actions([WriteTest]) # FIXME: a bit hack here, only init one action to circumvent _think() logic, will overwrite _think() in future updates
+        self._init_actions(
+            [WriteTest]
+        )  # FIXME: a bit hack here, only init one action to circumvent _think() logic, will overwrite _think() in future updates
        self._watch([WriteCode, WriteTest, RunCode, DebugError])
        self.test_round = 0
        self.test_round_allowed = test_round_allowed
-    
+
    @classmethod
    def parse_workspace(cls, system_design_msg: Message) -> str:
        if not system_design_msg.instruct_content:
            return system_design_msg.instruct_content.dict().get("Python package name")
        return CodeParser.parse_str(block="Python package name", text=system_design_msg.content)
-    
+
    def get_workspace(self, return_proj_dir=True) -> Path:
        msg = self._rc.memory.get_by_action(WriteDesign)[-1]
        if not msg:
-            return WORKSPACE_ROOT / 'src'
+            return WORKSPACE_ROOT / "src"
        workspace = self.parse_workspace(msg)
        # project directory: workspace/{package_name}, which contains package source code folder, tests folder, resources folder, etc.
        if return_proj_dir:
@ -48,49 +52,52 @@ class QaEngineer(Role):
        return WORKSPACE_ROOT / workspace / workspace

    def write_file(self, filename: str, code: str):
-        workspace = self.get_workspace() / 'tests'
+        workspace = self.get_workspace() / "tests"
        file = workspace / filename
        file.parent.mkdir(parents=True, exist_ok=True)
        file.write_text(code)

    async def _write_test(self, message: Message) -> None:
-
        code_msgs = message.content.split(MSG_SEP)
-        result_msg_all = []
+        # result_msg_all = []
        for code_msg in code_msgs:
-
            # write tests
            file_name, file_path = code_msg.split(FILENAME_CODE_SEP)
            code_to_test = open(file_path, "r").read()
            if "test" in file_name:
-                continue # Engineer might write some test files, skip testing a test file
+                continue  # Engineer might write some test files, skip testing a test file
            test_file_name = "test_" + file_name
            test_file_path = self.get_workspace() / "tests" / test_file_name
-            logger.info(f'Writing {test_file_name}..')
+            logger.info(f"Writing {test_file_name}..")
            test_code = await WriteTest().run(
                code_to_test=code_to_test,
                test_file_name=test_file_name,
                # source_file_name=file_name,
                source_file_path=file_path,
-                workspace=self.get_workspace()
+                workspace=self.get_workspace(),
            )
            self.write_file(test_file_name, test_code)

            # prepare context for run tests in next round
-            command = ['python', f'tests/{test_file_name}']
+            command = ["python", f"tests/{test_file_name}"]
            file_info = {
-                "file_name": file_name, "file_path": str(file_path),
-                "test_file_name": test_file_name, "test_file_path": str(test_file_path),
-                "command": command
+                "file_name": file_name,
+                "file_path": str(file_path),
+                "test_file_name": test_file_name,
+                "test_file_path": str(test_file_path),
+                "command": command,
            }
            msg = Message(
-                content=str(file_info), role=self.profile, cause_by=WriteTest,
-                sent_from=self.profile, send_to=self.profile
+                content=str(file_info),
+                role=self.profile,
+                cause_by=WriteTest,
+                sent_from=self.profile,
+                send_to=self.profile,
            )
            self._publish_message(msg)
-        
-        logger.info(f'Done {self.get_workspace()}/tests generating.')
-    
+
+        logger.info(f"Done {self.get_workspace()}/tests generating.")
+
    async def _run_code(self, msg):
        file_info = eval(msg.content)
        development_file_path = file_info["file_path"]
@ -110,17 +117,14 @@ class QaEngineer(Role):
            test_code=test_code,
            test_file_name=file_info["test_file_name"],
            command=file_info["command"],
-            working_directory=proj_dir, # workspace/package_name, will run tests/test_xxx.py here
-            additional_python_paths=[development_code_dir], # workspace/package_name/package_name,
-                                                            # import statement inside package code needs this
+            working_directory=proj_dir,  # workspace/package_name, will run tests/test_xxx.py here
+            additional_python_paths=[development_code_dir],  # workspace/package_name/package_name,
+            # import statement inside package code needs this
        )

-        recipient = parse_recipient(result_msg) # the recipient might be Engineer or myself
+        recipient = parse_recipient(result_msg)  # the recipient might be Engineer or myself
        content = str(file_info) + FILENAME_CODE_SEP + result_msg
-        msg = Message(
-            content=content, role=self.profile, cause_by=RunCode,
-            sent_from=self.profile, send_to=recipient
-        )
+        msg = Message(content=content, role=self.profile, cause_by=RunCode, sent_from=self.profile, send_to=recipient)
        self._publish_message(msg)

    async def _debug_error(self, msg):
@ -128,21 +132,27 @@ class QaEngineer(Role):
        file_name, code = await DebugError().run(context)
        if file_name:
            self.write_file(file_name, code)
-            recipient = msg.sent_from # send back to the one who ran the code for another run, might be one's self
-            msg = Message(content=file_info, role=self.profile, cause_by=DebugError, sent_from=self.profile, send_to=recipient)
+            recipient = msg.sent_from  # send back to the one who ran the code for another run, might be one's self
+            msg = Message(
+                content=file_info, role=self.profile, cause_by=DebugError, sent_from=self.profile, send_to=recipient
+            )
            self._publish_message(msg)
-    
+
    async def _observe(self) -> int:
        await super()._observe()
-        self._rc.news = [msg for msg in self._rc.news \
-            if msg.send_to == self.profile] # only relevant msgs count as observed news
+        self._rc.news = [
+            msg for msg in self._rc.news if msg.send_to == self.profile
+        ]  # only relevant msgs count as observed news
        return len(self._rc.news)

    async def _act(self) -> Message:
        if self.test_round > self.test_round_allowed:
            result_msg = Message(
                content=f"Exceeding {self.test_round_allowed} rounds of tests, skip (writing code counts as a round, too)",
-                role=self.profile, cause_by=WriteTest, sent_from=self.profile, send_to=""
+                role=self.profile,
+                cause_by=WriteTest,
+                sent_from=self.profile,
+                send_to="",
            )
            return result_msg

@ -161,6 +171,9 @@ class QaEngineer(Role):
        self.test_round += 1
        result_msg = Message(
            content=f"Round {self.test_round} of tests done",
-            role=self.profile, cause_by=WriteTest, sent_from=self.profile, send_to=""
+            role=self.profile,
+            cause_by=WriteTest,
+            sent_from=self.profile,
+            send_to="",
        )
        return result_msg
--- a/metagpt/roles/researcher.py
+++ b/metagpt/roles/researcher.py
@ -0,0 +1,93 @@
+#!/usr/bin/env python
+
+import asyncio
+
+from pydantic import BaseModel
+
+from metagpt.actions import CollectLinks, ConductResearch, WebBrowseAndSummarize
+from metagpt.actions.research import get_research_system_text
+from metagpt.const import RESEARCH_PATH
+from metagpt.logs import logger
+from metagpt.roles import Role
+from metagpt.schema import Message
+
+
+class Report(BaseModel):
+    topic: str
+    links: dict[str, list[str]] = None
+    summaries: list[tuple[str, str]] = None
+    content: str = ""
+
+
+class Researcher(Role):
+    def __init__(
+        self,
+        name: str = "David",
+        profile: str = "Researcher",
+        goal: str = "Gather information and conduct research",
+        constraints: str = "Ensure accuracy and relevance of information",
+        language: str = "en-us",
+        **kwargs,
+    ):
+        super().__init__(name, profile, goal, constraints, **kwargs)
+        self._init_actions([CollectLinks(name), WebBrowseAndSummarize(name), ConductResearch(name)])
+        self.language = language
+        if language not in ("en-us", "zh-cn"):
+            logger.warning(f"The language `{language}` has not been tested, it may not work.")
+
+    async def _think(self) -> None:
+        if self._rc.todo is None:
+            self._set_state(0)
+            return
+
+        if self._rc.state + 1 < len(self._states):
+            self._set_state(self._rc.state + 1)
+        else:
+            self._rc.todo = None
+
+    async def _act(self) -> Message:
+        logger.info(f"{self._setting}: ready to {self._rc.todo}")
+        todo = self._rc.todo
+        msg = self._rc.memory.get(k=1)[0]
+        if isinstance(msg.instruct_content, Report):
+            instruct_content = msg.instruct_content
+            topic = instruct_content.topic
+        else:
+            topic = msg.content
+
+        research_system_text = get_research_system_text(topic, self.language)
+        if isinstance(todo, CollectLinks):
+            links = await todo.run(topic, 4, 4)
+            ret = Message("", Report(topic=topic, links=links), role=self.profile, cause_by=type(todo))
+        elif isinstance(todo, WebBrowseAndSummarize):
+            links = instruct_content.links
+            todos = (todo.run(*url, query=query, system_text=research_system_text) for (query, url) in links.items())
+            summaries = await asyncio.gather(*todos)
+            summaries = list((url, summary) for i in summaries for (url, summary) in i.items() if summary)
+            ret = Message("", Report(topic=topic, summaries=summaries), role=self.profile, cause_by=type(todo))
+        else:
+            summaries = instruct_content.summaries
+            summary_text = "\n---\n".join(f"url: {url}\nsummary: {summary}" for (url, summary) in summaries)
+            content = await self._rc.todo.run(topic, summary_text, system_text=research_system_text)
+            ret = Message("", Report(topic=topic, content=content), role=self.profile, cause_by=type(self._rc.todo))
+        self._rc.memory.add(ret)
+        return ret
+
+    async def _react(self) -> Message:
+        while True:
+            await self._think()
+            if self._rc.todo is None:
+                break
+            msg = await self._act()
+        report = msg.instruct_content
+        self.write_report(report.topic, report.content)
+        return msg
+
+    def write_report(self, topic: str, content: str):
+        filepath = RESEARCH_PATH / f"{topic}.md"
+        filepath.write_text(content)
+
+
+if __name__ == "__main__":
+    role = Researcher(language="en-us")
+    asyncio.run(role.run("dataiku vs. datarobot"))
--- a/metagpt/tools/init.py
+++ b/metagpt/tools/init.py
@ -14,6 +14,7 @@ class SearchEngineType(Enum):
    SERPAPI_GOOGLE = auto()
    DIRECT_GOOGLE = auto()
    SERPER_GOOGLE = auto()
+    DUCK_DUCK_GO = auto()
    CUSTOM_ENGINE = auto()


--- a/metagpt/tools/search_engine_ddg.py
+++ b/metagpt/tools/search_engine_ddg.py
@ -0,0 +1,107 @@
+#!/usr/bin/env python
+
+from __future__ import annotations
+
+import asyncio
+import json
+from concurrent import futures
+from typing import Literal, overload
+
+from duckduckgo_search import DDGS
+from googleapiclient.errors import HttpError
+
+from metagpt.config import CONFIG
+from metagpt.logs import logger
+
+
+class DDGAPIWrapper:
+    """Wrapper around duckduckgo_search API.
+
+    To use this module, you should have the `duckduckgo_search` Python package installed.
+    """
+    def __init__(
+        self,
+        *,
+        loop: asyncio.AbstractEventLoop | None = None,
+        executor: futures.Executor | None = None,
+    ):
+        kwargs = {}
+        if CONFIG.global_proxy:
+            kwargs["proxies"] = CONFIG.global_proxy
+        self.loop = loop
+        self.executor = executor
+        self.ddgs = DDGS(**kwargs)
+
+    @overload
+    def run(
+        self,
+        query: str,
+        max_results: int = 8,
+        as_string: Literal[True] = True,
+        focus: list[str] | None = None,
+    ) -> str:
+        ...
+
+    @overload
+    def run(
+        self,
+        query: str,
+        max_results: int = 8,
+        as_string: Literal[False] = False,
+        focus: list[str] | None = None,
+    ) -> list[dict[str, str]]:
+        ...
+
+    async def run(
+        self,
+        query: str,
+        max_results: int = 8,
+        as_string: bool = True,
+    ) -> str | list[dict]:
+        """Return the results of a Google search using the official Google API
+
+        Args:
+            query: The search query.
+            max_results: The number of results to return.
+            as_string: A boolean flag to determine the return type of the results. If True, the function will
+                return a formatted string with the search results. If False, it will return a list of dictionaries
+                containing detailed information about each search result.
+
+        Returns:
+            The results of the search.
+        """
+        loop = self.loop or asyncio.get_event_loop()
+        future = loop.run_in_executor(
+            self.executor,
+            self._search_from_ddgs,
+            query,
+            max_results,
+        )
+        try:
+            search_results = await future
+            # Extract the search result items from the response
+
+        except HttpError as e:
+            # Handle errors in the API call
+            logger.exception(f"fail to search {query} for {e}")
+            search_results = []
+        
+        # Return the list of search result URLs
+        if as_string:
+            return json.dumps(search_results, ensure_ascii=False)
+        return search_results
+
+    def _search_from_ddgs(self, query: str, max_results: int):
+        return [
+            {
+                "link": i["href"],
+                "snippet": i["body"],
+                "title": i["title"]
+            } for (_, i) in zip(range(max_results), self.ddgs.text(query))
+        ]
+
+
+if __name__ == "__main__":
+    import fire
+
+    fire.Fire(DDGAPIWrapper().run)
--- a/metagpt/tools/search_engine_googleapi.py
+++ b/metagpt/tools/search_engine_googleapi.py
@ -0,0 +1,117 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+from __future__ import annotations
+
+import asyncio
+import json
+from concurrent import futures
+from urllib.parse import urlparse
+
+import httplib2
+from googleapiclient.discovery import build
+from googleapiclient.errors import HttpError
+
+from metagpt.config import CONFIG
+from metagpt.logs import logger
+
+
+class GoogleAPIWrapper:
+    """Wrapper around GoogleAPI.
+
+    To use this module, you should have the `google-api-python-client` Python package installed
+    and set property values for the configurations `GOOGLE_API_KEY` and `GOOGLE_CSE_ID`. See 
+    https://programmablesearchengine.google.com/controlpanel/all.
+    """
+    def __init__(
+        self,
+        *,
+        loop: asyncio.AbstractEventLoop | None = None,
+        executor: futures.Executor | None = None,
+    ):
+        build_kwargs = {"developerKey": CONFIG.google_api_key}
+        if CONFIG.global_proxy:
+            parse_result = urlparse(CONFIG.global_proxy)
+            proxy_type = parse_result.scheme
+            if proxy_type == "https":
+                proxy_type = "http"
+            build_kwargs["http"] = httplib2.Http(
+                proxy_info=httplib2.ProxyInfo(
+                    getattr(httplib2.socks, f"PROXY_TYPE_{proxy_type.upper()}"),
+                    parse_result.hostname,
+                    parse_result.port,
+                ),
+            )
+        service = build("customsearch", "v1", **build_kwargs)
+        self.google_api_client = service.cse()
+        self.custom_search_engine_id = CONFIG.google_cse_id
+        self.loop = loop
+        self.executor = executor
+
+    async def run(
+        self,
+        query: str,
+        max_results: int = 8,
+        as_string: bool = True,
+        focus: list[str] | None = None,
+    ) -> str | list[dict]:
+        """Return the results of a Google search using the official Google API.
+
+        Args:
+            query: The search query.
+            max_results: The number of results to return.
+            as_string: A boolean flag to determine the return type of the results. If True, the function will
+                return a formatted string with the search results. If False, it will return a list of dictionaries
+                containing detailed information about each search result.
+            focus: Specific information to be focused on from each search result.
+
+        Returns:
+            The results of the search.
+        """
+        loop = self.loop or asyncio.get_event_loop()
+        future = loop.run_in_executor(
+            self.executor,
+            self.google_api_client.list(
+                q=query,
+                num=max_results,
+                cx=self.custom_search_engine_id
+            ).execute
+        )
+        try:
+            result = await future
+            # Extract the search result items from the response
+            search_results = result.get("items", [])
+
+        except HttpError as e:
+            # Handle errors in the API call
+            logger.exception(f"fail to search {query} for {e}")
+            search_results = []
+        
+        focus = focus or ["snippet", "link", "title"]
+        details = [{i: j for i, j in item_dict.items() if i in focus} for item_dict in search_results]
+        # Return the list of search result URLs
+        if as_string:
+            return safe_google_results(details)
+        
+        return details
+
+
+def safe_google_results(results: str | list) -> str:
+    """Return the results of a google search in a safe format.
+
+    Args:
+        results: The search results.
+
+    Returns:
+        The results of the search.
+    """
+    if isinstance(results, list):
+        safe_message = json.dumps([result for result in results])
+    else:
+        safe_message = results.encode("utf-8", "ignore").decode("utf-8")
+    return safe_message
+
+
+if __name__ == "__main__":
+    import fire
+
+    fire.Fire(GoogleAPIWrapper().run)
--- a/metagpt/utils/init.py
+++ b/metagpt/utils/init.py
@ -13,3 +13,12 @@ from metagpt.utils.token_counter import (
    count_message_tokens,
    count_string_tokens,
 )
+
+
+__all__ = [
+    "read_docx",
+    "Singleton",
+    "TOKEN_COSTS",
+    "count_message_tokens",
+    "count_string_tokens",
+]
--- a/metagpt/utils/parse_html.py
+++ b/metagpt/utils/parse_html.py
@ -0,0 +1,57 @@
+#!/usr/bin/env python
+from __future__ import annotations
+
+from typing import Generator, Optional
+from urllib.parse import urljoin, urlparse
+
+from bs4 import BeautifulSoup
+from pydantic import BaseModel
+
+
+class WebPage(BaseModel):
+    inner_text: str
+    html: str
+    url: str
+
+    class Config:
+        underscore_attrs_are_private = True
+
+    _soup : Optional[BeautifulSoup] = None
+    _title: Optional[str] = None
+
+    @property
+    def soup(self) -> BeautifulSoup:
+        if self._soup is None:
+            self._soup = BeautifulSoup(self.html, "html.parser")
+        return self._soup
+    
+    @property
+    def title(self):
+        if self._title is None:
+            title_tag = self.soup.find("title")
+            self._title = title_tag.text.strip() if title_tag is not None else ""
+        return self._title
+
+    def get_links(self) -> Generator[str, None, None]:
+        for i in self.soup.find_all("a", href=True):
+            url = i["href"]
+            result = urlparse(url)
+            if not result.scheme and result.path:
+                yield urljoin(self.url, url)
+            elif url.startswith(("http://", "https://")):
+                yield urljoin(self.url, url)
+
+
+def get_html_content(page: str, base: str):
+    soup = _get_soup(page)
+
+    return soup.get_text(strip=True)
+
+
+def _get_soup(page: str):
+    soup = BeautifulSoup(page, "html.parser")
+    # https://stackoverflow.com/questions/1936466/how-to-scrape-only-visible-webpage-text-with-beautifulsoup
+    for s in soup(["style", "script", "[document]", "head", "title"]):
+        s.extract()
+
+    return soup
--- a/metagpt/utils/serialize.py
+++ b/metagpt/utils/serialize.py
@ -3,14 +3,11 @@
 # @Desc   : the implement of serialization and deserialization

 import copy
-from typing import Tuple, List, Type, Union, Dict
 import pickle
-from collections import defaultdict
-from pydantic import create_model
+from typing import Dict, List, Tuple

-from metagpt.schema import Message
-from metagpt.actions.action import Action
 from metagpt.actions.action_output import ActionOutput
+from metagpt.schema import Message


 def actionoutout_schema_to_mapping(schema: Dict) -> Dict:
@ -34,12 +31,12 @@ def actionoutout_schema_to_mapping(schema: Dict) -> Dict:
    ```
    """
    mapping = dict()
-    for field, property in schema['properties'].items():
-        if property['type'] == 'string':
+    for field, property in schema["properties"].items():
+        if property["type"] == "string":
            mapping[field] = (str, ...)
-        elif property['type'] == 'array' and property['items']['type'] == 'string':
+        elif property["type"] == "array" and property["items"]["type"] == "string":
            mapping[field] = (List[str], ...)
-        elif property['type'] == 'array' and property['items']['type'] == 'array':
+        elif property["type"] == "array" and property["items"]["type"] == "array":
            # here only consider the `Tuple[str, str]` situation
            mapping[field] = (List[Tuple[str, str]], ...)
    return mapping
@ -53,11 +50,7 @@ def serialize_message(message: Message):
        schema = ic.schema()
        mapping = actionoutout_schema_to_mapping(schema)

-        message_cp.instruct_content = {
-            'class': schema['title'],
-            'mapping': mapping,
-            'value': ic.dict()
-        }
+        message_cp.instruct_content = {"class": schema["title"], "mapping": mapping, "value": ic.dict()}
    msg_ser = pickle.dumps(message_cp)

    return msg_ser
@ -67,9 +60,8 @@ def deserialize_message(message_ser: str) -> Message:
    message = pickle.loads(message_ser)
    if message.instruct_content:
        ic = message.instruct_content
-        ic_obj = ActionOutput.create_model_class(class_name=ic['class'],
-                                                 mapping=ic['mapping'])
-        ic_new = ic_obj(**ic['value'])
+        ic_obj = ActionOutput.create_model_class(class_name=ic["class"], mapping=ic["mapping"])
+        ic_new = ic_obj(**ic["value"])
        message.instruct_content = ic_new

    return message
--- a/metagpt/utils/text.py
+++ b/metagpt/utils/text.py
@ -0,0 +1,124 @@
+from typing import Generator, Sequence
+
+from metagpt.utils.token_counter import TOKEN_MAX, count_string_tokens
+
+
+def reduce_message_length(msgs: Generator[str, None, None], model_name: str, system_text: str, reserved: int = 0,) -> str:
+    """Reduce the length of concatenated message segments to fit within the maximum token size.
+
+    Args:
+        msgs: A generator of strings representing progressively shorter valid prompts.
+        model_name: The name of the encoding to use. (e.g., "gpt-3.5-turbo")
+        system_text: The system prompts.
+        reserved: The number of reserved tokens.
+
+    Returns:
+        The concatenated message segments reduced to fit within the maximum token size.
+
+    Raises:
+        RuntimeError: If it fails to reduce the concatenated message length.
+    """
+    max_token = TOKEN_MAX.get(model_name, 2048) - count_string_tokens(system_text, model_name) - reserved
+    for msg in msgs:
+        if count_string_tokens(msg, model_name) < max_token:
+            return msg
+
+    raise RuntimeError("fail to reduce message length")
+
+
+def generate_prompt_chunk(
+    text: str,
+    prompt_template: str,
+    model_name: str,
+    system_text: str,
+    reserved: int = 0,
+) -> Generator[str, None, None]:
+    """Split the text into chunks of a maximum token size.
+
+    Args:
+        text: The text to split.
+        prompt_template: The template for the prompt, containing a single `{}` placeholder. For example, "### Reference\n{}".
+        model_name: The name of the encoding to use. (e.g., "gpt-3.5-turbo")
+        system_text: The system prompts.
+        reserved: The number of reserved tokens.
+
+    Yields:
+        The chunk of text.
+    """
+    paragraphs = text.splitlines(keepends=True)
+    current_token = 0
+    current_lines = []
+
+    reserved = reserved + count_string_tokens(prompt_template+system_text, model_name)
+    # 100 is a magic number to ensure the maximum context length is not exceeded
+    max_token = TOKEN_MAX.get(model_name, 2048) - reserved - 100  
+
+    while paragraphs:
+        paragraph = paragraphs.pop(0)
+        token = count_string_tokens(paragraph, model_name)
+        if current_token + token <= max_token:
+            current_lines.append(paragraph)
+            current_token += token
+        elif token > max_token:
+            paragraphs = split_paragraph(paragraph) + paragraphs
+            continue
+        else:
+            yield prompt_template.format("".join(current_lines))
+            current_lines = [paragraph]
+            current_token = token
+
+    if current_lines:
+        yield prompt_template.format("".join(current_lines))
+
+
+def split_paragraph(paragraph: str, sep: str = ".,", count: int = 2) -> list[str]:
+    """Split a paragraph into multiple parts.
+
+    Args:
+        paragraph: The paragraph to split.
+        sep: The separator character.
+        count: The number of parts to split the paragraph into.
+
+    Returns:
+        A list of split parts of the paragraph.
+    """
+    for i in sep:
+        sentences = list(_split_text_with_ends(paragraph, i))
+        if len(sentences) <= 1:
+            continue
+        ret = ["".join(j) for j in _split_by_count(sentences, count)]
+        return ret
+    return _split_by_count(paragraph, count)
+
+
+def decode_unicode_escape(text: str) -> str:
+    """Decode a text with unicode escape sequences.
+
+    Args:
+        text: The text to decode.
+
+    Returns:
+        The decoded text.
+    """
+    return text.encode("utf-8").decode("unicode_escape", "ignore")
+
+
+def _split_by_count(lst: Sequence , count: int):
+    avg = len(lst) // count
+    remainder = len(lst) % count
+    start = 0
+    for i in range(count):
+        end = start + avg + (1 if i < remainder else 0)
+        yield lst[start:end]
+        start = end
+
+
+def _split_text_with_ends(text: str, sep: str = "."):
+    parts = []
+    for i in text:
+        parts.append(i)
+        if i == sep:
+            yield "".join(parts)
+            parts = []
+    if parts:
+        yield "".join(parts)
--- a/metagpt/utils/token_counter.py
+++ b/metagpt/utils/token_counter.py
@ -25,6 +25,21 @@ TOKEN_COSTS = {
 }


+TOKEN_MAX = {
+    "gpt-3.5-turbo": 4096,
+    "gpt-3.5-turbo-0301": 4096,
+    "gpt-3.5-turbo-0613": 4096,
+    "gpt-3.5-turbo-16k": 16384,
+    "gpt-3.5-turbo-16k-0613": 16384,
+    "gpt-4-0314": 8192,
+    "gpt-4": 8192,
+    "gpt-4-32k": 32768,
+    "gpt-4-32k-0314": 32768,
+    "gpt-4-0613": 8192,
+    "text-embedding-ada-002": 8192,
+}
+
+
 def count_message_tokens(messages, model="gpt-3.5-turbo-0613"):
    """Return the number of tokens used by a list of messages."""
    try:
@ -39,7 +54,7 @@ def count_message_tokens(messages, model="gpt-3.5-turbo-0613"):
        "gpt-4-32k-0314",
        "gpt-4-0613",
        "gpt-4-32k-0613",
-        }:
+    }:
        tokens_per_message = 3
        tokens_per_name = 1
    elif model == "gpt-3.5-turbo-0301":
@ -79,3 +94,18 @@ def count_string_tokens(string: str, model_name: str) -> int:
    """
    encoding = tiktoken.encoding_for_model(model_name)
    return len(encoding.encode(string))
+
+
+def get_max_completion_tokens(messages: list[dict], model: str, default: int) -> int: 
+    """Calculate the maximum number of completion tokens for a given model and list of messages.
+
+    Args:
+        messages: A list of messages.
+        model: The model name.
+
+    Returns:
+        The maximum number of completion tokens.
+    """
+    if model not in TOKEN_MAX:
+        return default
+    return TOKEN_MAX[model] - count_message_tokens(messages)
--- a/requirements.txt
+++ b/requirements.txt
@ -16,8 +16,9 @@ meilisearch==0.21.0
 numpy==1.24.3
 openai==0.27.8
 openpyxl
-pandas==1.4.1
-pydantic==1.10.7
+beautifulsoup4==4.12.2
+pandas==2.0.3
+pydantic==1.10.8
 #pygame==2.1.3
 #pymilvus==2.2.8
 pytest==7.2.2
@ -36,3 +37,4 @@ anthropic==0.3.6
 typing-inspect==0.8.0
 typing_extensions==4.5.0
 libcst==1.0.1
+qdrant-client==1.4.0
--- a/ruff.toml
+++ b/ruff.toml
@ -0,0 +1,40 @@
+select = ["E", "F"]
+ignore = ["E501", "E712", "E722", "F821", "E731"]
+
+# Allow autofix for all enabled rules (when `--fix`) is provided.
+fixable = ["A", "B", "C", "D", "E", "F", "G", "I", "N", "Q", "S", "T", "W", "ANN", "ARG", "BLE", "COM", "DJ", "DTZ", "EM", "ERA", "EXE", "FBT", "ICN", "INP", "ISC", "NPY", "PD", "PGH", "PIE", "PL", "PT", "PTH", "PYI", "RET", "RSE", "RUF", "SIM", "SLF", "TCH", "TID", "TRY", "UP", "YTT"]
+unfixable = []
+
+# Exclude a variety of commonly ignored directories.
+exclude = [
+    ".bzr",
+    ".direnv",
+    ".eggs",
+    ".git",
+    ".git-rewrite",
+    ".hg",
+    ".mypy_cache",
+    ".nox",
+    ".pants.d",
+    ".pytype",
+    ".ruff_cache",
+    ".svn",
+    ".tox",
+    ".venv",
+    "__pypackages__",
+    "_build",
+    "buck-out",
+    "build",
+    "dist",
+    "node_modules",
+    "venv",
+]
+
+# Same as Black.
+line-length = 119
+
+# Allow unused variables when underscore-prefixed.
+dummy-variable-rgx = "^(_+|(_+[a-zA-Z0-9_]*[a-zA-Z0-9]+?))$"
+
+# Assume Python 3.9
+target-version = "py39"
--- a/setup.py
+++ b/setup.py
@ -44,7 +44,7 @@ setup(
    install_requires=requirements,
    extras_require={
        "playwright": ["playwright>=1.26", "beautifulsoup4"],
-        "selenium": ["selenium>4", "webdriver_manager<3.9", "beautifulsoup4"],
+        "selenium": ["selenium>4", "webdriver_manager", "beautifulsoup4"],
    },
    cmdclass={
        "install_mermaid": InstallMermaidCLI,
--- a/tests/metagpt/actions/test_run_code.py
+++ b/tests/metagpt/actions/test_run_code.py
@ -6,24 +6,23 @@
@File    : test_run_code.py
 """
 import pytest
-import asyncio
+
 from metagpt.actions.run_code import RunCode

+
@pytest.mark.asyncio
 async def test_run_text():
-    action = RunCode()
-    result, errs = await RunCode.run_text('result = 1 + 1')
+    result, errs = await RunCode.run_text("result = 1 + 1")
    assert result == 2
    assert errs == ""

-    result, errs = await RunCode.run_text('result = 1 / 0')
+    result, errs = await RunCode.run_text("result = 1 / 0")
    assert result == ""
    assert "ZeroDivisionError" in errs

+
@pytest.mark.asyncio
 async def test_run_script():
-    action = RunCode()
-    
    # Successful command
    out, err = await RunCode.run_script(".", command=["echo", "Hello World"])
    assert out.strip() == "Hello World"
@ -33,6 +32,7 @@ async def test_run_script():
    out, err = await RunCode.run_script(".", command=["python", "-c", "print(1/0)"])
    assert "ZeroDivisionError" in err

+
@pytest.mark.asyncio
 async def test_run():
    action = RunCode()
@ -47,10 +47,11 @@ async def test_run():
        test_file_name="",
        command=["echo", "Hello World"],
        working_directory=".",
-        additional_python_paths=[]
+        additional_python_paths=[],
    )
    assert "PASS" in result

+
@pytest.mark.asyncio
 async def test_run_failure():
    action = RunCode()
@ -65,6 +66,6 @@ async def test_run_failure():
        test_file_name="",
        command=["python", "-c", "print(1/0)"],
        working_directory=".",
-        additional_python_paths=[]
+        additional_python_paths=[],
    )
-    assert "FAIL" in result
+    assert "FAIL" in result
--- a/tests/metagpt/actions/test_write_code_review.py
+++ b/tests/metagpt/actions/test_write_code_review.py
@ -8,8 +8,6 @@
 import pytest

 from metagpt.actions.write_code_review import WriteCodeReview
-from metagpt.logs import logger
-from tests.metagpt.actions.mock import SEARCH_CODE_SAMPLE


@pytest.mark.asyncio
@ -20,11 +18,7 @@ def add(a, b):
 """
    # write_code_review = WriteCodeReview("write_code_review")

-    code = await WriteCodeReview().run(
-        context="编写一个从a加b的函数，返回a+b",
-        code=code,
-        filename="math.py"
-    )
+    code = await WriteCodeReview().run(context="编写一个从a加b的函数，返回a+b", code=code, filename="math.py")

    # 我们不能精确地预测生成的代码评审，但我们可以检查返回的是否为字符串
    assert isinstance(code, str)
@ -33,6 +27,7 @@ def add(a, b):
    captured = capfd.readouterr()
    print(f"输出内容: {captured.out}")

+
 # @pytest.mark.asyncio
 # async def test_write_code_review_directly():
 #     code = SEARCH_CODE_SAMPLE
--- a/tests/metagpt/document_store/test_qdrant_store.py
+++ b/tests/metagpt/document_store/test_qdrant_store.py
@ -0,0 +1,77 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+"""
+@Time    : 2023/6/11 21:08
+@Author  : hezhaozhao
+@File    : test_qdrant_store.py
+"""
+import random
+
+from qdrant_client.models import (
+    Distance,
+    FieldCondition,
+    Filter,
+    PointStruct,
+    Range,
+    VectorParams,
+)
+
+from metagpt.document_store.qdrant_store import QdrantConnection, QdrantStore
+
+seed_value = 42
+random.seed(seed_value)
+
+vectors = [[random.random() for _ in range(2)] for _ in range(10)]
+
+points = [
+    PointStruct(
+        id=idx, vector=vector, payload={"color": "red", "rand_number": idx % 10}
+    )
+    for idx, vector in enumerate(vectors)
+]
+
+
+def test_milvus_store():
+    qdrant_connection = QdrantConnection(memory=True)
+    vectors_config = VectorParams(size=2, distance=Distance.COSINE)
+    qdrant_store = QdrantStore(qdrant_connection)
+    qdrant_store.create_collection("Book", vectors_config, force_recreate=True)
+    assert qdrant_store.has_collection("Book") is True
+    qdrant_store.delete_collection("Book")
+    assert qdrant_store.has_collection("Book") is False
+    qdrant_store.create_collection("Book", vectors_config)
+    assert qdrant_store.has_collection("Book") is True
+    qdrant_store.add("Book", points)
+    results = qdrant_store.search("Book", query=[1.0, 1.0])
+    assert results[0]["id"] == 2
+    assert results[0]["score"] == 0.999106722578389
+    assert results[1]["score"] == 7
+    assert results[1]["score"] == 0.9961650411397226
+    results = qdrant_store.search("Book", query=[1.0, 1.0], return_vector=True)
+    assert results[0]["id"] == 2
+    assert results[0]["score"] == 0.999106722578389
+    assert results[0]["vector"] == [0.7363563179969788, 0.6765939593315125]
+    assert results[1]["score"] == 7
+    assert results[1]["score"] == 0.9961650411397226
+    assert results[1]["vector"] == [0.7662628889083862, 0.6425272226333618]
+    results = qdrant_store.search(
+        "Book",
+        query=[1.0, 1.0],
+        query_filter=Filter(
+            must=[FieldCondition(key="rand_number", range=Range(gte=8))]
+        ),
+    )
+    assert results[0]["id"] == 8
+    assert results[0]["score"] == 0.9100373450784073
+    assert results[1]["id"] == 9
+    assert results[1]["score"] == 0.7127610621127889
+    results = qdrant_store.search(
+        "Book",
+        query=[1.0, 1.0],
+        query_filter=Filter(
+            must=[FieldCondition(key="rand_number", range=Range(gte=8))]
+        ),
+        return_vector=True,
+    )
+    assert results[0]["vector"] == [0.35037919878959656, 0.9366079568862915]
+    assert results[1]["vector"] == [0.9999677538871765, 0.00802854634821415]
--- a/tests/metagpt/roles/test_researcher.py
+++ b/tests/metagpt/roles/test_researcher.py
@ -0,0 +1,32 @@
+from pathlib import Path
+from random import random
+from tempfile import TemporaryDirectory
+
+import pytest
+
+from metagpt.roles import researcher
+
+
+async def mock_llm_ask(self, prompt: str, system_msgs):
+    if "Please provide up to 2 necessary keywords" in prompt:
+        return '["dataiku", "datarobot"]'
+    elif "Provide up to 4 queries related to your research topic" in prompt:
+        return '["Dataiku machine learning platform", "DataRobot AI platform comparison", ' \
+            '"Dataiku vs DataRobot features", "Dataiku and DataRobot use cases"]'
+    elif "sort the remaining search results" in prompt:
+        return '[1,2]'
+    elif "Not relevant." in prompt:
+        return "Not relevant" if random() > 0.5 else prompt[-100:]
+    elif "provide a detailed research report" in prompt:
+        return f"# Research Report\n## Introduction\n{prompt}"
+    return ""
+
+
+@pytest.mark.asyncio
+async def test_researcher(mocker):
+    with TemporaryDirectory() as dirname:
+        topic = "dataiku vs. datarobot"
+        mocker.patch("metagpt.provider.base_gpt_api.BaseGPTAPI.aask", mock_llm_ask)
+        researcher.RESEARCH_PATH = Path(dirname)
+        await researcher.Researcher().run(topic)
+        assert (researcher.RESEARCH_PATH / f"{topic}.md").read_text().startswith("# Research Report")
--- a/tests/metagpt/roles/ui_role.py
+++ b/tests/metagpt/roles/ui_role.py
@ -2,22 +2,19 @@
 # @Date    : 2023/7/15 16:40
 # @Author  : stellahong (stellahong@fuzhi.ai)
 # @Desc    :
-import re
 import os
-from importlib import import_module
+import re
 from functools import wraps
+from importlib import import_module

-from metagpt.logs import logger
-from metagpt.actions import Action, ActionOutput
-from metagpt.roles import ProductManager, Role
-from metagpt.schema import Message
+from metagpt.actions import Action, ActionOutput, WritePRD
 from metagpt.const import WORKSPACE_ROOT
-
-from metagpt.actions import WritePRD
-from metagpt.software_company import SoftwareCompany
+from metagpt.logs import logger
+from metagpt.roles import Role
+from metagpt.schema import Message
 from metagpt.tools.sd_engine import SDEngine

-PROMPT_TEMPLATE = '''
+PROMPT_TEMPLATE = """
 # Context
 {context}

@ -34,9 +31,9 @@ Attention: Use '##' to split sections, not '#', and '## <SECTION_NAME>' SHOULD W
 ## CSS Styles (styles.css):Provide as Plain text,use standard css code
 ## Anything UNCLEAR:Provide as Plain text. Make clear here.

-'''
+"""

-FORMAT_EXAMPLE = '''
+FORMAT_EXAMPLE = """

 ## UI Design Description
 ```Snake games are classic and addictive games with simple yet engaging elements. Here are the main elements commonly found in snake games ```
@ -126,7 +123,7 @@ body {
 ## Anything UNCLEAR
 There are no unclear points.

-'''
+"""

 OUTPUT_MAPPING = {
    "UI Design Description": (str, ...),
@ -139,25 +136,25 @@ OUTPUT_MAPPING = {

 def load_engine(func):
    """Decorator to load an engine by file name and engine name."""
-    
+
    @wraps(func)
    def wrapper(*args, **kwargs):
        file_name, engine_name = func(*args, **kwargs)
-        engine_file = import_module(file_name, package='metagpt')
+        engine_file = import_module(file_name, package="metagpt")
        ip_module_cls = getattr(engine_file, engine_name)
        try:
            engine = ip_module_cls()
        except:
            engine = None
-        
+
        return engine
-    
+
    return wrapper


 def parse(func):
    """Decorator to parse information using regex pattern."""
-    
+
    @wraps(func)
    def wrapper(*args, **kwargs):
        context, pattern = func(*args, **kwargs)
@ -168,30 +165,30 @@ def parse(func):
        else:
            text_info = context
            logger.info("未找到匹配的内容")
-        
+
        return text_info
-    
+
    return wrapper


 class UIDesign(Action):
    """Class representing the UI Design action."""
-    
+
    def __init__(self, name, context=None, llm=None):
        super().__init__(name, context, llm)  # 需要调用LLM进一步丰富UI设计的prompt
-    
+
    @parse
    def parse_requirement(self, context: str):
        """Parse UI Design draft from the context using regex."""
        pattern = r"## UI Design draft.*?\n(.*?)## Anything UNCLEAR"
        return context, pattern
-    
+
    @parse
    def parse_ui_elements(self, context: str):
        """Parse Selected Elements from the context using regex."""
        pattern = r"## Selected Elements.*?\n(.*?)## HTML Layout"
        return context, pattern
-    
+
    @parse
    def parse_css_code(self, context: str):
        pattern = r"```css.*?\n(.*?)## Anything UNCLEAR"
@ -201,7 +198,7 @@ class UIDesign(Action):
    def parse_html_code(self, context: str):
        pattern = r"```html.*?\n(.*?)```"
        return context, pattern
-    
+
    async def draw_icons(self, context, *args, **kwargs):
        """Draw icons using SDEngine."""
        engine = SDEngine()
@ -215,20 +212,20 @@ class UIDesign(Action):
            prompts_batch.append(prompt)
        await engine.run_t2i(prompts_batch)
        logger.info("Finish icon design using StableDiffusion API")
-    
+
    async def _save(self, css_content, html_content):
-        save_dir = WORKSPACE_ROOT / "resources" / 'codes'
+        save_dir = WORKSPACE_ROOT / "resources" / "codes"
        if not os.path.exists(save_dir):
            os.makedirs(save_dir, exist_ok=True)
        # Save CSS and HTML content to files
-        css_file_path = save_dir / f"ui_design.css"
-        html_file_path = save_dir / f"ui_design.html"
-        
-        with open(css_file_path, 'w') as css_file:
+        css_file_path = save_dir / "ui_design.css"
+        html_file_path = save_dir / "ui_design.html"
+
+        with open(css_file_path, "w") as css_file:
            css_file.write(css_content)
-        with open(html_file_path, 'w') as html_file:
+        with open(html_file_path, "w") as html_file:
            html_file.write(html_content)
-    
+
    async def run(self, requirements: list[Message], *args, **kwargs) -> ActionOutput:
        """Run the UI Design action."""
        # fixme: update prompt (根据需求细化prompt）
@ -249,23 +246,27 @@ class UIDesign(Action):

 class UI(Role):
    """Class representing the UI Role."""
-    
-    def __init__(self, name="Catherine", profile="UI Design",
-                 goal="Finish a workable and good User Interface design based on a product design",
-                 constraints="Give clear layout description and use standard icons to finish the design",
-                 skills=["SD"]):
+
+    def __init__(
+        self,
+        name="Catherine",
+        profile="UI Design",
+        goal="Finish a workable and good User Interface design based on a product design",
+        constraints="Give clear layout description and use standard icons to finish the design",
+        skills=["SD"],
+    ):
        super().__init__(name, profile, goal, constraints)
        self.load_skills(skills)
        self._init_actions([UIDesign])
        self._watch([WritePRD])
-    
+
    @load_engine
    def load_sd_engine(self):
        """Load the SDEngine."""
        file_name = ".tools.sd_engine"
        engine_name = "SDEngine"
        return file_name, engine_name
-    
+
    def load_skills(self, skills):
        """Load skills for the UI Role."""
        # todo: 添加其他出图engine
@ -273,4 +274,3 @@ class UI(Role):
            if skill == "SD":
                self.sd_engine = self.load_sd_engine()
                logger.info(f"load skill engine {self.sd_engine}")
-    
--- a/tests/metagpt/tools/test_search_engine.py
+++ b/tests/metagpt/tools/test_search_engine.py
@ -5,24 +5,44 @@
@Author  : alexanderwu
@File    : test_search_engine.py
 """
+from __future__ import annotations

 import pytest

 from metagpt.logs import logger
+from metagpt.tools import SearchEngineType
 from metagpt.tools.search_engine import SearchEngine


+class MockSearchEnine:
+    async def run(self, query: str, max_results: int = 8, as_string: bool = True) -> str | list[dict[str, str]]:
+        rets = [{"url": "https://metagpt.com/mock/{i}", "title": query, "snippet": query * i} for i in range(max_results)]
+        return "\n".join(rets) if as_string else rets
+
+
@pytest.mark.asyncio
-@pytest.mark.usefixtures("llm_api")
-async def test_search_engine(llm_api):
-    search_engine = SearchEngine()
-    poetries = [
-        # ("北京美食", "北京"),
-        ("屈臣氏", "屈臣氏")
-    ]
-    for i, j in poetries:
-        rsp = await search_engine.run(i)
-        # rsp = context.llm.ask_batch([prompt])
-        logger.info(rsp)
-        # assert any(j in k['body'] for k in rsp)
-        assert len(rsp) > 0
+@pytest.mark.parametrize(
+    ("search_engine_typpe", "run_func", "max_results", "as_string"),
+    [
+        (SearchEngineType.SERPAPI_GOOGLE, None, 8, True),
+        (SearchEngineType.SERPAPI_GOOGLE, None, 4, False),
+        (SearchEngineType.DIRECT_GOOGLE, None, 8, True),
+        (SearchEngineType.DIRECT_GOOGLE, None, 6, False),
+        (SearchEngineType.SERPER_GOOGLE, None, 8, True),
+        (SearchEngineType.SERPER_GOOGLE, None, 6, False),
+        (SearchEngineType.DUCK_DUCK_GO, None, 8, True),
+        (SearchEngineType.DUCK_DUCK_GO, None, 6, False),
+        (SearchEngineType.CUSTOM_ENGINE, MockSearchEnine().run, 8, False),
+        (SearchEngineType.CUSTOM_ENGINE, MockSearchEnine().run, 6, False),
+        
+    ],
+)
+async def test_search_engine(search_engine_typpe, run_func, max_results, as_string, ):
+    search_engine = SearchEngine(search_engine_typpe, run_func)
+    rsp = await search_engine.run("metagpt", max_results=max_results, as_string=as_string)
+    logger.info(rsp)
+    if as_string:
+        assert isinstance(rsp, str)
+    else:
+        assert isinstance(rsp, list)
+        assert len(rsp) == max_results
--- a/tests/metagpt/tools/test_web_browser_engine.py
+++ b/tests/metagpt/tools/test_web_browser_engine.py
@ -1,6 +1,6 @@
 import pytest
-from metagpt.config import Config
-from metagpt.tools import web_browser_engine, WebBrowserEngineType
+
+from metagpt.tools import WebBrowserEngineType, web_browser_engine


@pytest.mark.asyncio
--- a/tests/metagpt/tools/test_web_browser_engine_playwright.py
+++ b/tests/metagpt/tools/test_web_browser_engine_playwright.py
@ -1,4 +1,5 @@
 import pytest
+
 from metagpt.config import CONFIG
 from metagpt.tools import web_browser_engine_playwright

@ -20,6 +21,7 @@ async def test_scrape_web_page(browser_type, use_proxy, kwagrs, url, urls, proxy
            CONFIG.global_proxy = proxy
        browser = web_browser_engine_playwright.PlaywrightWrapper(browser_type, **kwagrs)
        result = await browser.run(url)
+        result = result.inner_text
        assert isinstance(result, str)
        assert "Deepwisdom" in result

--- a/tests/metagpt/tools/test_web_browser_engine_selenium.py
+++ b/tests/metagpt/tools/test_web_browser_engine_selenium.py
@ -1,4 +1,5 @@
 import pytest
+
 from metagpt.config import CONFIG
 from metagpt.tools import web_browser_engine_selenium

@ -20,6 +21,7 @@ async def test_scrape_web_page(browser_type, use_proxy, url, urls, proxy, capfd)
            CONFIG.global_proxy = proxy
        browser = web_browser_engine_selenium.SeleniumWrapper(browser_type)
        result = await browser.run(url)
+        result = result.inner_text
        assert isinstance(result, str)
        assert "Deepwisdom" in result

@ -27,7 +29,7 @@ async def test_scrape_web_page(browser_type, use_proxy, url, urls, proxy, capfd)
            results = await browser.run(url, *urls)
            assert isinstance(results, list)
            assert len(results) == len(urls) + 1
-            assert all(("Deepwisdom" in i) for i in results)
+            assert all(("Deepwisdom" in i.inner_text) for i in results)
        if use_proxy:
            assert "Proxy:" in capfd.readouterr().out
    finally:
--- a/tests/metagpt/utils/test_parse_html.py
+++ b/tests/metagpt/utils/test_parse_html.py
@ -0,0 +1,68 @@
+from metagpt.utils import parse_html
+
+PAGE = """
+<!DOCTYPE html>
+<html>
+<head>
+    <title>Random HTML Example</title>
+</head>
+<body>
+    <h1>This is a Heading</h1>
+    <p>This is a paragraph with <a href="test">a link</a> and some <em>emphasized</em> text.</p>
+    <ul>
+        <li>Item 1</li>
+        <li>Item 2</li>
+        <li>Item 3</li>
+    </ul>
+    <ol>
+        <li>Numbered Item 1</li>
+        <li>Numbered Item 2</li>
+        <li>Numbered Item 3</li>
+    </ol>
+    <table>
+        <tr>
+            <th>Header 1</th>
+            <th>Header 2</th>
+        </tr>
+        <tr>
+            <td>Row 1, Cell 1</td>
+            <td>Row 1, Cell 2</td>
+        </tr>
+        <tr>
+            <td>Row 2, Cell 1</td>
+            <td>Row 2, Cell 2</td>
+        </tr>
+    </table>
+    <img src="image.jpg" alt="Sample Image">
+    <form action="/submit" method="post">
+        <label for="name">Name:</label>
+        <input type="text" id="name" name="name" required>
+        <label for="email">Email:</label>
+        <input type="email" id="email" name="email" required>
+        <button type="submit">Submit</button>
+    </form>
+    <div class="box">
+        <p>This is a div with a class "box".</p>
+        <p><a href="https://metagpt.com">a link</a></p>
+        <p><a href="#section2"></a></p>
+        <p><a href="ftp://192.168.1.1:8080"></a></p>
+        <p><a href="javascript:alert('Hello');"></a></p>
+    </div>
+</body>
+</html>
+"""
+
+CONTENT = 'This is a HeadingThis is a paragraph witha linkand someemphasizedtext.Item 1Item 2Item 3Numbered Item 1Numbered '\
+'Item 2Numbered Item 3Header 1Header 2Row 1, Cell 1Row 1, Cell 2Row 2, Cell 1Row 2, Cell 2Name:Email:SubmitThis is a div '\
+'with a class "box".a link'
+
+
+def test_web_page():
+    page = parse_html.WebPage(inner_text=CONTENT, html=PAGE, url="http://example.com")
+    assert page.title == "Random HTML Example"
+    assert list(page.get_links()) == ["http://example.com/test", "https://metagpt.com"]
+
+
+def test_get_page_content():
+    ret = parse_html.get_html_content(PAGE, "http://example.com")
+    assert ret == CONTENT
--- a/tests/metagpt/utils/test_serialize.py
+++ b/tests/metagpt/utils/test_serialize.py
@ -3,94 +3,64 @@
 # @Desc   : the unittest of serialize

 from typing import List, Tuple
-import pytest

-from pydantic import create_model
-
-from metagpt.actions.action_output import ActionOutput
 from metagpt.actions import WritePRD
+from metagpt.actions.action_output import ActionOutput
 from metagpt.schema import Message
-from metagpt.utils.serialize import actionoutout_schema_to_mapping, serialize_message, deserialize_message
+from metagpt.utils.serialize import (
+    actionoutout_schema_to_mapping,
+    deserialize_message,
+    serialize_message,
+)


 def test_actionoutout_schema_to_mapping():
-    schema = {
-        'title': 'test',
-        'type': 'object',
-        'properties': {
-            'field': {
-                'title': 'field',
-                'type': 'string'
-            }
-        }
-    }
+    schema = {"title": "test", "type": "object", "properties": {"field": {"title": "field", "type": "string"}}}
    mapping = actionoutout_schema_to_mapping(schema)
-    assert mapping['field'] == (str, ...)
+    assert mapping["field"] == (str, ...)

    schema = {
-        'title': 'test',
-        'type': 'object',
-        'properties': {
-            'field': {
-                'title': 'field',
-                'type': 'array',
-                'items': {
-                    'type': 'string'
-                }
-            }
-        }
+        "title": "test",
+        "type": "object",
+        "properties": {"field": {"title": "field", "type": "array", "items": {"type": "string"}}},
    }
    mapping = actionoutout_schema_to_mapping(schema)
-    assert mapping['field'] == (List[str], ...)
+    assert mapping["field"] == (List[str], ...)

    schema = {
-        'title': 'test',
-        'type': 'object',
-        'properties': {
-            'field': {
-                'title': 'field',
-                'type': 'array',
-                'items': {
-                    'type': 'array',
-                    'minItems': 2,
-                    'maxItems': 2,
-                    'items': [
-                        {
-                            'type': 'string'
-                        },
-                        {
-                            'type': 'string'
-                        }
-                    ]
-                }
+        "title": "test",
+        "type": "object",
+        "properties": {
+            "field": {
+                "title": "field",
+                "type": "array",
+                "items": {
+                    "type": "array",
+                    "minItems": 2,
+                    "maxItems": 2,
+                    "items": [{"type": "string"}, {"type": "string"}],
+                },
            }
-        }
+        },
    }
    mapping = actionoutout_schema_to_mapping(schema)
-    assert mapping['field'] == (List[Tuple[str, str]], ...)
+    assert mapping["field"] == (List[Tuple[str, str]], ...)

    assert True, True


 def test_serialize_and_deserialize_message():
-    out_mapping = {
-        'field1': (str, ...),
-        'field2': (List[str], ...)
-    }
-    out_data = {
-        'field1': 'field1 value',
-        'field2': ['field2 value1', 'field2 value2']
-    }
-    ic_obj = ActionOutput.create_model_class('prd', out_mapping)
+    out_mapping = {"field1": (str, ...), "field2": (List[str], ...)}
+    out_data = {"field1": "field1 value", "field2": ["field2 value1", "field2 value2"]}
+    ic_obj = ActionOutput.create_model_class("prd", out_mapping)

-    message = Message(content='prd demand',
-                      instruct_content=ic_obj(**out_data),
-                      role='user',
-                      cause_by=WritePRD)  # WritePRD as test action
+    message = Message(
+        content="prd demand", instruct_content=ic_obj(**out_data), role="user", cause_by=WritePRD
+    )  # WritePRD as test action

    message_ser = serialize_message(message)

    new_message = deserialize_message(message_ser)
    assert new_message.content == message.content
    assert new_message.cause_by == message.cause_by
-    assert new_message.instruct_content.field1 == out_data['field1']
+    assert new_message.instruct_content.field1 == out_data["field1"]
--- a/tests/metagpt/utils/test_text.py
+++ b/tests/metagpt/utils/test_text.py
@ -0,0 +1,77 @@
+import pytest
+
+from metagpt.utils.text import (
+    decode_unicode_escape,
+    generate_prompt_chunk,
+    reduce_message_length,
+    split_paragraph,
+)
+
+
+def _msgs():
+    length = 20
+    while length:
+        yield "Hello," * 1000 * length
+        length -= 1
+
+
+def _paragraphs(n):
+    return " ".join("Hello World." for _ in range(n))
+
+
+@pytest.mark.parametrize(
+    "msgs, model_name, system_text, reserved, expected",
+    [
+        (_msgs(), "gpt-3.5-turbo", "System", 1500, 1),
+        (_msgs(), "gpt-3.5-turbo-16k", "System", 3000, 6),
+        (_msgs(), "gpt-3.5-turbo-16k", "Hello," * 1000, 3000, 5),
+        (_msgs(), "gpt-4", "System", 2000, 3),
+        (_msgs(), "gpt-4", "Hello," * 1000, 2000, 2),
+        (_msgs(), "gpt-4-32k", "System", 4000, 14),
+        (_msgs(), "gpt-4-32k", "Hello," * 2000, 4000, 12),
+    ]
+)
+def test_reduce_message_length(msgs, model_name, system_text, reserved, expected):
+    assert len(reduce_message_length(msgs, model_name, system_text, reserved)) / (len("Hello,")) / 1000 == expected
+
+
+@pytest.mark.parametrize(
+    "text, prompt_template, model_name, system_text, reserved, expected",
+    [
+        (" ".join("Hello World." for _ in range(1000)), "Prompt: {}", "gpt-3.5-turbo", "System", 1500, 2),
+        (" ".join("Hello World." for _ in range(1000)), "Prompt: {}", "gpt-3.5-turbo-16k", "System", 3000, 1),
+        (" ".join("Hello World." for _ in range(4000)), "Prompt: {}", "gpt-4", "System", 2000, 2),
+        (" ".join("Hello World." for _ in range(8000)), "Prompt: {}", "gpt-4-32k", "System", 4000, 1),
+    ]
+)
+def test_generate_prompt_chunk(text, prompt_template, model_name, system_text, reserved, expected):
+    ret = list(generate_prompt_chunk(text, prompt_template, model_name, system_text, reserved))
+    assert len(ret) == expected
+
+
+@pytest.mark.parametrize(
+    "paragraph, sep, count, expected",
+    [
+        (_paragraphs(10), ".", 2, [_paragraphs(5), f" {_paragraphs(5)}"]),
+        (_paragraphs(10), ".", 3, [_paragraphs(4), f" {_paragraphs(3)}", f" {_paragraphs(3)}"]),
+        (f"{_paragraphs(5)}\n{_paragraphs(3)}", "\n.", 2, [f"{_paragraphs(5)}\n", _paragraphs(3)]),
+        ("......", ".", 2, ["...", "..."]),
+        ("......", ".", 3, ["..", "..", ".."]),
+        (".......", ".", 2, ["....", "..."]),
+    ]
+)
+def test_split_paragraph(paragraph, sep, count, expected):
+    ret = split_paragraph(paragraph, sep, count)
+    assert ret == expected
+
+
+@pytest.mark.parametrize(
+    "text, expected",
+    [
+        ("Hello\\nWorld", "Hello\nWorld"),
+        ("Hello\\tWorld", "Hello\tWorld"),
+        ("Hello\\u0020World", "Hello World"),
+    ]
+)
+def test_decode_unicode_escape(text, expected):
+    assert decode_unicode_escape(text) == expected