Merge remote-tracking branch 'upstream/dev' into improvement-agent-speed

Resolves:
- surfsense_backend/app/agents/new_chat/middleware/memory_injection.py: took both imports. Upstream moved MEMORY_HARD_LIMIT/SOFT_LIMIT to app.services.memory; kept our perf-logger import for timing.

Pulls in upstream changes:
- Memory document feature (services/memory refactor; removal of app.agents.new_chat.memory_extraction and of background extraction in stream_new_chat, so the agent now drives memory via the update_memory tool).
- BACKEND_URL env refactor across web tool-ui/editor/chat/dashboard/lib.
- GitHub Actions backend test workflow + pre-commit biome bump.
- Token-display polish in MessageInfoDropdown; save_memory no-update sentinel.

Verified: 1723 unit tests pass, ruff clean. No semantic regression in stream_new_chat (their memory-extraction deletion and our preflight removal touch different functions).
2026-05-27 19:25:15 +02:00 · 2026-05-20 21:23:48 +02:00 · 2026-05-20 21:23:48 +02:00 · 49da7a57df
commit 49da7a57df
parent d5ee8cc4cd 883ac81ce1
79 changed files with 1992 additions and 2296 deletions
--- a/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/memory_protocol/private.md
+++ b/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/memory_protocol/private.md
@@ -6,4 +6,10 @@ standing instructions?
 If yes, call `update_memory` **alongside** your normal response — don't
 defer it to a later turn. Skip ephemeral chat noise (one-off Q/A, greetings,
 session logistics). Stay within the budget shown in `<user_memory>`.
+
+Memory is heading-based markdown. New entries should be under `##` headings
+such as `## Facts`, `## Preferences`, or `## Instructions`, with bullets like
+`- YYYY-MM-DD: text`. If existing memory contains legacy
+`(YYYY-MM-DD) [fact|pref|instr]` markers, preserve the information but write
+new saves in the heading-based format.
 </memory_protocol>
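The migration rule above (keep the information from legacy `(YYYY-MM-DD) [fact|pref|instr]` bullets, write new saves under `##` headings) can be sketched as a small Python helper. This is a hypothetical illustration, not code from this commit: `migrate_legacy_memory` is an invented name, and mapping `fact`/`pref`/`instr` onto `## Facts`/`## Preferences`/`## Instructions` is an assumption based on the recommended headings. Lines that are not legacy bullets stay under the heading they appeared under.

```python
import re

# Legacy bullet shape from the old prompts: "- (YYYY-MM-DD) [fact|pref|instr] text"
LEGACY_BULLET = re.compile(r"^- \((\d{4}-\d{2}-\d{2})\) \[(fact|pref|instr)\] (.+)$")
HEADING = re.compile(r"^##[ \t]+(.+)$")
# Assumed mapping onto the recommended personal headings.
MARKER_TO_HEADING = {"fact": "Facts", "pref": "Preferences", "instr": "Instructions"}


def migrate_legacy_memory(memory: str) -> str:
    """Regroup legacy marker bullets as '- YYYY-MM-DD: text' under ## headings."""
    sections: dict[str, list[str]] = {}
    current = "Facts"  # assumed home for lines that appear before any heading
    for raw in memory.splitlines():
        line = raw.strip()
        if not line:
            continue
        heading = HEADING.match(line)
        if heading:
            current = heading.group(1).strip()
            continue
        legacy = LEGACY_BULLET.match(line)
        if legacy:
            day, marker, text = legacy.groups()
            target = MARKER_TO_HEADING[marker]
            sections.setdefault(target, []).append(f"- {day}: {text}")
        else:
            # Already-migrated bullets and prose stay under their original heading.
            sections.setdefault(current, []).append(line)
    body = "\n\n".join(
        f"## {name}\n" + "\n".join(lines) for name, lines in sections.items()
    )
    return body + "\n" if body else ""
```

Headings left empty after regrouping (such as the old `## Interests & background`) are dropped; their bullets survive under `## Facts` or the other marker headings.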
--- a/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/memory_protocol/team.md
+++ b/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/memory_protocol/team.md
@@ -6,4 +6,12 @@ key facts?
 If yes, call `update_memory` **alongside** your normal response — don't
 defer it to a later turn. Skip ephemeral chat noise (one-off Q/A, greetings,
 session logistics). Stay within the budget shown in `<team_memory>`.
+
+Team memory is heading-based markdown. New entries should be under `##`
+headings such as `## Product Decisions`, `## Engineering Conventions`,
+`## Project Facts`, or `## Open Questions`, with bullets like
+`- YYYY-MM-DD: text`. If existing memory contains legacy `(YYYY-MM-DD) [fact]`
+markers, preserve the information but write new saves in the heading-based
+format. Do not create personal headings such as `## Preferences` or
+`## Instructions`.
 </memory_protocol>
--- a/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/tools/update_memory/private/description.md
+++ b/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/tools/update_memory/private/description.md
@@ -9,7 +9,9 @@
  - Skip ephemeral chat noise (one-off Q/A, greetings, session logistics).
  - Args: `updated_memory` — FULL replacement markdown (merge and curate,
    don't only append).
-  - Formatting: bullets `- (YYYY-MM-DD) [marker] text` with markers `[fact]`,
-    `[pref]`, `[instr]` (priority when trimming: `instr > pref > fact`).
-    Group bullets under short `##` headings; stay under the limit shown in
-    `<user_memory>`.
+  - Formatting: heading-based markdown with entries under `##` headings.
+    Recommended headings are `## Facts`, `## Preferences`, `## Instructions`,
+    though clearer natural headings are allowed. New bullets should look like
+    `- YYYY-MM-DD: text`; stay under the limit shown in `<user_memory>`.
+  - If existing memory uses legacy `(YYYY-MM-DD) [fact|pref|instr]` markers,
+    preserve the information but write the updated document in the new format.
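A hypothetical checker for the new bullet shape `- YYYY-MM-DD: text`, analogous to the `_validate_bullet_format` helper that this commit removes from `update_memory.py` but targeting the new format. The name `check_bullets` and the calendar-date check are assumptions; nothing in the diff says the memory service validates bullets this way.

```python
import re
from datetime import date

# Bullet shape the updated prompt asks for: "- YYYY-MM-DD: text"
NEW_BULLET = re.compile(r"^- (\d{4}-\d{2}-\d{2}): (\S.*)$")


def check_bullets(memory: str) -> list[str]:
    """Return warnings for bullets that are not '- YYYY-MM-DD: text' with a real date."""
    warnings: list[str] = []
    for line in memory.splitlines():
        stripped = line.strip()
        if not stripped.startswith("- "):
            continue  # headings and prose are not checked
        match = NEW_BULLET.match(stripped)
        if match is None:
            warnings.append(f"Unexpected bullet format: {stripped[:80]}")
            continue
        try:
            date.fromisoformat(match.group(1))  # rejects e.g. 2025-02-30
        except ValueError:
            warnings.append(f"Invalid date: {match.group(1)}")
    return warnings
```

Legacy bullets are reported as unexpected, which matches the prompt's expectation that the rewritten document uses only the new format.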
--- a/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/tools/update_memory/private/example.md
+++ b/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/tools/update_memory/private/example.md
@@ -1,28 +1,28 @@
 <example>
 <user_name>Alex</user_name>, <user_memory> is empty.
 user: "I'm a space enthusiast, explain astrophage to me"
-→ update_memory(updated_memory="## Interests & background\n- (2025-03-15) [fact] Alex is a space enthusiast\n")
+→ update_memory(updated_memory="## Facts\n- 2025-03-15: Alex is a space enthusiast\n")
 (Casual durable fact; use first name, neutral heading.)
 </example>

 <example>
 user: "Remember that I prefer concise answers over detailed explanations"
-→ update_memory(updated_memory="## Interests & background\n- (2025-03-15) [fact] Alex is a space enthusiast\n\n## Response style\n- (2025-03-15) [pref] Alex prefers concise answers over detailed explanations\n")
+→ update_memory(updated_memory="## Facts\n- 2025-03-15: Alex is a space enthusiast\n\n## Preferences\n- 2025-03-15: Alex prefers concise answers over detailed explanations\n")
 (Durable preference; merge with existing memory.)
 </example>

 <example>
 user: "I actually moved to Tokyo last month"
-→ update_memory(updated_memory="...\n\n## Personal context\n- (2025-03-15) [fact] Alex lives in Tokyo (previously London)\n...")
+→ update_memory(updated_memory="...\n\n## Facts\n- 2025-03-15: Alex lives in Tokyo (previously London)\n...")
 (Updated fact; date reflects when recorded.)
 </example>

 <example>
 user: "I'm a freelance photographer working on a nature documentary"
-→ update_memory(updated_memory="...\n\n## Current focus\n- (2025-03-15) [fact] Alex is a freelance photographer\n- (2025-03-15) [fact] Alex is working on a nature documentary\n")
+→ update_memory(updated_memory="...\n\n## Current Focus\n- 2025-03-15: Alex is a freelance photographer\n- 2025-03-15: Alex is working on a nature documentary\n")
 </example>

 <example>
 user: "Always respond in bullet points"
-→ update_memory(updated_memory="...\n\n## Response style\n- (2025-03-15) [instr] Always respond to Alex in bullet points\n")
+→ update_memory(updated_memory="...\n\n## Instructions\n- 2025-03-15: Always respond to Alex in bullet points\n")
 </example>
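Because `updated_memory` is a FULL replacement document, even a single new entry means sending the whole document back. A minimal sketch of that append step, using a hypothetical `add_entry` helper that is not part of this commit, reproduces the first two examples above. It only appends; superseding an entry (as in the Tokyo example) still requires editing the old bullet.

```python
def add_entry(memory: str, heading: str, day: str, text: str) -> str:
    """Return the FULL memory document with one '- day: text' bullet under '## heading'."""
    bullet = f"- {day}: {text}"
    lines = memory.rstrip("\n").splitlines() if memory.strip() else []
    target = f"## {heading}"
    if target not in lines:
        # New section goes at the end, separated by a blank line.
        if lines:
            lines.append("")
        lines += [target, bullet]
        return "\n".join(lines) + "\n"
    # Existing section: insert after its last non-blank line.
    idx = lines.index(target) + 1
    insert_at = idx
    while idx < len(lines) and not lines[idx].startswith("## "):
        if lines[idx].strip():
            insert_at = idx + 1
        idx += 1
    lines.insert(insert_at, bullet)
    return "\n".join(lines) + "\n"


doc = add_entry("", "Facts", "2025-03-15", "Alex is a space enthusiast")
doc = add_entry(doc, "Preferences", "2025-03-15",
                "Alex prefers concise answers over detailed explanations")
```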
--- a/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/tools/update_memory/team/description.md
+++ b/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/tools/update_memory/team/description.md
@@ -9,8 +9,14 @@
  - Skip ephemeral chat noise (one-off Q/A, greetings, session logistics).
  - Args: `updated_memory` — FULL replacement markdown (merge and curate,
    don't only append).
-  - Formatting: bullets `- (YYYY-MM-DD) [fact] text`. Team memory uses ONLY
-    the `[fact]` marker (never `[pref]` or `[instr]`). Group bullets under
-    short `##` headings (2-3 words each); stay under the limit shown in
-    `<team_memory>`. When trimming, prioritise: decisions/conventions > key
-    facts > current priorities.
+  - Formatting: heading-based markdown with entries under `##` headings.
+    Recommended headings are `## Product Decisions`,
+    `## Engineering Conventions`, `## Project Facts`, and `## Open Questions`.
+    New bullets should look like `- YYYY-MM-DD: text`; stay under the limit
+    shown in `<team_memory>`.
+  - If existing memory uses legacy `(YYYY-MM-DD) [fact]` markers, preserve the
+    information but write the updated document in the new format.
+  - Do not create personal headings such as `## Preferences`,
+    `## Instructions`, `## Personal Notes`, or `## Personal Instructions`.
+    When trimming, prioritise: decisions/conventions > key facts > current
+    priorities.
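The removed `_validate_memory_scope` rejected `[pref]`/`[instr]` markers in team memory; with markers gone, the equivalent signal is a personal heading. The sketch below flags the four headings the team description forbids, compared case- and whitespace-insensitively as the old `_normalize_heading` did. Whether the new service enforces this at all is not shown in the diff; `personal_headings_in_team_memory` is a hypothetical name.

```python
import re

# Level-2 headings only; [ \t]+ avoids matching across a newline.
HEADING = re.compile(r"^##[ \t]+(.+)$", re.MULTILINE)
# Headings the team description forbids, in normalized form.
PERSONAL_HEADINGS = {"preferences", "instructions", "personal notes", "personal instructions"}


def normalize_heading(text: str) -> str:
    """Lowercase and collapse internal whitespace."""
    return re.sub(r"\s+", " ", text.strip().lower())


def personal_headings_in_team_memory(memory: str) -> list[str]:
    """Return the personal headings found in a team memory document."""
    return [
        heading.strip()
        for heading in HEADING.findall(memory)
        if normalize_heading(heading) in PERSONAL_HEADINGS
    ]
```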
--- a/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/tools/update_memory/team/example.md
+++ b/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/tools/update_memory/team/example.md
@@ -1,9 +1,9 @@
 <example>
 user: "Let's remember that we decided to do weekly standup meetings on Mondays"
-→ update_memory(updated_memory="...\n\n## Team rituals\n- (2025-03-15) [fact] Weekly standup meetings on Mondays\n...")
+→ update_memory(updated_memory="...\n\n## Product Decisions\n- 2025-03-15: Weekly standup meetings happen on Mondays\n...")
 </example>

 <example>
 user: "Our office is in downtown Seattle, 5th floor"
-→ update_memory(updated_memory="...\n\n## Workspace\n- (2025-03-15) [fact] Office location: downtown Seattle, 5th floor\n...")
+→ update_memory(updated_memory="...\n\n## Project Facts\n- 2025-03-15: Office location is downtown Seattle, 5th floor\n...")
 </example>
--- a/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/memory/system_prompt.md
+++ b/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/memory/system_prompt.md
@@ -18,6 +18,10 @@ Persist durable preferences/facts/instructions with `update_memory` while avoidi
 - Do not store transient chatter.
 - Do not store secrets unless explicitly instructed.
 - If memory intent is unclear, return `status=blocked` with the missing intent signal.
+- Persisted memory is heading-based markdown. New saved bullets should look like
+  `- YYYY-MM-DD: text` under `##` headings. If existing memory has legacy
+  `(YYYY-MM-DD) [fact|pref|instr]` markers, preserve the information but write
+  the updated document in the heading-based format.
 </tool_policy>

 <out_of_scope>
@@ -57,7 @@ Rules:
 - `status=success` -> `next_step=null`, `missing_fields=null`.
 - `status=partial|blocked|error` -> `next_step` must be non-null.
 - `status=blocked` due to missing required inputs -> `missing_fields` must be non-null.
+- `evidence.memory_category` is a semantic classification for supervisor logs
+  only. It is not the persisted storage format and must not force inline
+  `[fact|preference|instruction]` markers into saved memory.
 </output_contract>
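The three `status` rules visible in this hunk can be expressed as a check on the subagent's result dict. This is a sketch under assumptions: the full contract may contain further rules in lines the hunk elides, `contract_violations` is a hypothetical name, and because the result does not say *why* it is blocked, the caller passes `missing_inputs` explicitly.

```python
from typing import Any

# Assumed: only the statuses named in the visible rules.
NEEDS_NEXT_STEP = {"partial", "blocked", "error"}


def contract_violations(result: dict[str, Any], *, missing_inputs: bool = False) -> list[str]:
    """Check a subagent result against the output_contract rules shown in this hunk."""
    status = result.get("status")
    next_step = result.get("next_step")
    missing_fields = result.get("missing_fields")
    problems: list[str] = []
    if status == "success":
        if next_step is not None:
            problems.append("success requires next_step=null")
        if missing_fields is not None:
            problems.append("success requires missing_fields=null")
    elif status in NEEDS_NEXT_STEP:
        if next_step is None:
            problems.append(f"{status} requires a non-null next_step")
        if status == "blocked" and missing_inputs and missing_fields is None:
            problems.append("blocked on missing inputs requires missing_fields")
    else:
        problems.append(f"unknown status: {status!r}")
    return problems
```

`evidence.memory_category` is deliberately not checked: per the new rule it only feeds supervisor logs and has no bearing on the saved document.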
--- a/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/memory/tools/update_memory.py
+++ b/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/memory/tools/update_memory.py
@@ -1,280 +1,23 @@
-"""Overwrite one markdown memory document per user or team, with size and shrink guards."""
+"""Memory update tools backed by the canonical memory service."""

 from __future__ import annotations

 import logging
-import re
-from typing import Any, Literal
+from typing import Any
 from uuid import UUID

-from langchain_core.messages import HumanMessage
 from langchain_core.tools import tool
-from sqlalchemy import select
 from sqlalchemy.ext.asyncio import AsyncSession

-from app.db import SearchSpace, User
+from app.services.memory import (
+    MEMORY_HARD_LIMIT,
+    MEMORY_SOFT_LIMIT,
+    MemoryScope,
+    save_memory,
+)

 logger = logging.getLogger(__name__)

-MEMORY_SOFT_LIMIT = 18_000
-MEMORY_HARD_LIMIT = 25_000
-
-_SECTION_HEADING_RE = re.compile(r"^##\s+(.+)$", re.MULTILINE)
-_HEADING_NORMALIZE_RE = re.compile(r"\s+")
-
-_MARKER_RE = re.compile(r"\[(fact|pref|instr)\]")
-_BULLET_FORMAT_RE = re.compile(r"^- \(\d{4}-\d{2}-\d{2}\) \[(fact|pref|instr)\] .+$")
-_PERSONAL_ONLY_MARKERS = {"pref", "instr"}
-
-
-# ---------------------------------------------------------------------------
-# Diff validation
-# ---------------------------------------------------------------------------
-
-
-def _extract_headings(memory: str) -> set[str]:
-    """Return all ``## …`` heading texts (without the ``## `` prefix)."""
-    return set(_SECTION_HEADING_RE.findall(memory))
-
-
-def _normalize_heading(heading: str) -> str:
-    """Normalize heading text for robust scope checks."""
-    return _HEADING_NORMALIZE_RE.sub(" ", heading.strip().lower())
-
-
-def _validate_memory_scope(
-    content: str, scope: Literal["user", "team"]
-) -> dict[str, Any] | None:
-    """Reject personal-only markers ([pref], [instr]) in team memory."""
-    if scope != "team":
-        return None
-
-    markers = set(_MARKER_RE.findall(content))
-    leaked = sorted(markers & _PERSONAL_ONLY_MARKERS)
-    if leaked:
-        tags = ", ".join(f"[{m}]" for m in leaked)
-        return {
-            "status": "error",
-            "message": (
-                f"Team memory cannot include personal markers: {tags}. "
-                "Use [fact] only in team memory."
-            ),
-        }
-    return None
-
-
-def _validate_bullet_format(content: str) -> list[str]:
-    """Return warnings for bullet lines that don't match the required format.
-
-    Expected: ``- (YYYY-MM-DD) [fact|pref|instr] text``
-    """
-    warnings: list[str] = []
-    for line in content.splitlines():
-        stripped = line.strip()
-        if not stripped.startswith("- "):
-            continue
-        if not _BULLET_FORMAT_RE.match(stripped):
-            short = stripped[:80] + ("..." if len(stripped) > 80 else "")
-            warnings.append(f"Malformed bullet: {short}")
-    return warnings
-
-
-def _validate_diff(old_memory: str | None, new_memory: str) -> list[str]:
-    """Return a list of warning strings about suspicious changes."""
-    if not old_memory:
-        return []
-
-    warnings: list[str] = []
-    old_headings = _extract_headings(old_memory)
-    new_headings = _extract_headings(new_memory)
-    dropped = old_headings - new_headings
-    if dropped:
-        names = ", ".join(sorted(dropped))
-        warnings.append(
-            f"Sections removed: {names}. "
-            "If unintentional, the user can restore from the settings page."
-        )
-
-    old_len = len(old_memory)
-    new_len = len(new_memory)
-    if old_len > 0 and new_len < old_len * 0.4:
-        warnings.append(
-            f"Memory shrank significantly ({old_len:,} -> {new_len:,} chars). "
-            "Possible data loss."
-        )
-    return warnings
-
-
-# ---------------------------------------------------------------------------
-# Size validation & soft warning
-# ---------------------------------------------------------------------------
-
-
-def _validate_memory_size(content: str) -> dict[str, Any] | None:
-    """Return an error/warning dict if *content* is too large, else None."""
-    length = len(content)
-    if length > MEMORY_HARD_LIMIT:
-        return {
-            "status": "error",
-            "message": (
-                f"Memory exceeds {MEMORY_HARD_LIMIT:,} character limit "
-                f"({length:,} chars). Consolidate by merging related items, "
-                "removing outdated entries, and shortening descriptions. "
-                "Then call update_memory again."
-            ),
-        }
-    return None
-
-
-def _soft_warning(content: str) -> str | None:
-    """Return a warning string if content exceeds the soft limit."""
-    length = len(content)
-    if length > MEMORY_SOFT_LIMIT:
-        return (
-            f"Memory is at {length:,}/{MEMORY_HARD_LIMIT:,} characters. "
-            "Consolidate by merging related items and removing less important "
-            "entries on your next update."
-        )
-    return None
-
-
-# ---------------------------------------------------------------------------
-# Forced rewrite when memory exceeds the hard limit
-# ---------------------------------------------------------------------------
-
-_FORCED_REWRITE_PROMPT = """\
-You are a memory curator. The following memory document exceeds the character \
-limit and must be shortened.
-
-RULES:
-1. Rewrite the document to be under {target} characters.
-2. Preserve existing ## headings. Every entry must remain under a heading. You may merge
-   or rename headings to consolidate, but keep names personal and descriptive.
-3. Priority for keeping content: [instr] > [pref] > [fact].
-4. Merge duplicate entries, remove outdated entries, shorten verbose descriptions.
-5. Every bullet MUST have format: - (YYYY-MM-DD) [fact|pref|instr] text
-6. Preserve the user's first name in entries — do not replace it with "the user".
-7. Output ONLY the consolidated markdown — no explanations, no wrapping.
-
-<memory_document>
-{content}
-</memory_document>"""
-
-
-async def _forced_rewrite(content: str, llm: Any) -> str | None:
-    """Use a focused LLM call to compress *content* under the hard limit.
-
-    Returns the rewritten string, or ``None`` if the call fails.
-    """
-    try:
-        prompt = _FORCED_REWRITE_PROMPT.format(
-            target=MEMORY_HARD_LIMIT, content=content
-        )
-        response = await llm.ainvoke(
-            [HumanMessage(content=prompt)],
-            config={"tags": ["surfsense:internal"]},
-        )
-        text = (
-            response.content
-            if isinstance(response.content, str)
-            else str(response.content)
-        )
-        return text.strip()
-    except Exception:
-        logger.exception("Forced rewrite LLM call failed")
-        return None
-
-
-# ---------------------------------------------------------------------------
-# Shared save-and-respond logic
-# ---------------------------------------------------------------------------
-
-
-async def _save_memory(
-    *,
-    updated_memory: str,
-    old_memory: str | None,
-    llm: Any | None,
-    apply_fn,
-    commit_fn,
-    rollback_fn,
-    label: str,
-    scope: Literal["user", "team"],
-) -> dict[str, Any]:
-    """Validate, optionally force-rewrite if over the hard limit, save, and
-    return a response dict.
-
-    Parameters
-    ----------
-    updated_memory : str
-        The new document the agent submitted.
-    old_memory : str | None
-        The previously persisted document (for diff checks).
-    llm : Any | None
-        LLM instance for forced rewrite (may be ``None``).
-    apply_fn : callable(str) -> None
-        Callback that sets the new memory on the ORM object.
-    commit_fn : coroutine
-        ``session.commit``.
-    rollback_fn : coroutine
-        ``session.rollback``.
-    label : str
-        Human label for log messages (e.g. "user memory", "team memory").
-    """
-    content = updated_memory
-
-    # --- forced rewrite if over the hard limit ---
-    if len(content) > MEMORY_HARD_LIMIT and llm is not None:
-        rewritten = await _forced_rewrite(content, llm)
-        if rewritten is not None and len(rewritten) < len(content):
-            content = rewritten
-
-    # --- hard-limit gate (reject if still too large after rewrite) ---
-    size_err = _validate_memory_size(content)
-    if size_err:
-        return size_err
-
-    scope_err = _validate_memory_scope(content, scope)
-    if scope_err:
-        return scope_err
-
-    # --- persist ---
-    try:
-        apply_fn(content)
-        await commit_fn()
-    except Exception as e:
-        logger.exception("Failed to update %s: %s", label, e)
-        await rollback_fn()
-        return {"status": "error", "message": f"Failed to update {label}: {e}"}
-
-    # --- build response ---
-    resp: dict[str, Any] = {
-        "status": "saved",
-        "message": f"{label.capitalize()} updated.",
-    }
-
-    if content is not updated_memory:
-        resp["notice"] = "Memory was automatically rewritten to fit within limits."
-
-    diff_warnings = _validate_diff(old_memory, content)
-    if diff_warnings:
-        resp["diff_warnings"] = diff_warnings
-
-    format_warnings = _validate_bullet_format(content)
-    if format_warnings:
-        resp["format_warnings"] = format_warnings
-
-    warning = _soft_warning(content)
-    if warning:
-        resp["warning"] = warning
-
-    return resp
-
-
-# ---------------------------------------------------------------------------
-# Tool factories
-# ---------------------------------------------------------------------------
-

 def create_update_memory_tool(
    user_id: str | UUID,
@@ -287,40 +30,22 @@ def create_update_memory_tool(
    async def update_memory(updated_memory: str) -> dict[str, Any]:
        """Update the user's personal memory document.

-        Your current memory is shown in <user_memory> in the system prompt.
-        When the user shares important long-term information (preferences,
-        facts, instructions, context), rewrite the memory document to include
-        the new information.  Merge new facts with existing ones, update
-        contradictions, remove outdated entries, and keep it concise.
-
-        Args:
-            updated_memory: The FULL updated markdown document (not a diff).
+        The current memory is shown in <user_memory>. Pass the FULL updated
+        markdown document, not a diff.
        """
        try:
-            result = await db_session.execute(select(User).where(User.id == uid))
-            user = result.scalars().first()
-            if not user:
-                return {"status": "error", "message": "User not found."}
-
-            old_memory = user.memory_md
-
-            return await _save_memory(
-                updated_memory=updated_memory,
-                old_memory=old_memory,
+            result = await save_memory(
+                scope=MemoryScope.USER,
+                target_id=uid,
+                content=updated_memory,
+                session=db_session,
                llm=llm,
-                apply_fn=lambda content: setattr(user, "memory_md", content),
-                commit_fn=db_session.commit,
-                rollback_fn=db_session.rollback,
-                label="memory",
-                scope="user",
            )
+            return result.to_dict()
        except Exception as e:
            logger.exception("Failed to update user memory: %s", e)
            await db_session.rollback()
-            return {
-                "status": "error",
-                "message": f"Failed to update memory: {e}",
-            }
+            return {"status": "error", "message": f"Failed to update memory: {e}"}

    return update_memory

@@ -334,36 +59,18 @@ def create_update_team_memory_tool(
    async def update_memory(updated_memory: str) -> dict[str, Any]:
        """Update the team's shared memory document for this search space.

-        Your current team memory is shown in <team_memory> in the system
-        prompt.  When the team shares important long-term information
-        (decisions, conventions, key facts, priorities), rewrite the memory
-        document to include the new information.  Merge new facts with
-        existing ones, update contradictions, remove outdated entries, and
-        keep it concise.
-
-        Args:
-            updated_memory: The FULL updated markdown document (not a diff).
+        The current team memory is shown in <team_memory>. Pass the FULL updated
+        markdown document, not a diff.
        """
        try:
-            result = await db_session.execute(
-                select(SearchSpace).where(SearchSpace.id == search_space_id)
-            )
-            space = result.scalars().first()
-            if not space:
-                return {"status": "error", "message": "Search space not found."}
-
-            old_memory = space.shared_memory_md
-
-            return await _save_memory(
-                updated_memory=updated_memory,
-                old_memory=old_memory,
+            result = await save_memory(
+                scope=MemoryScope.TEAM,
+                target_id=search_space_id,
+                content=updated_memory,
+                session=db_session,
                llm=llm,
-                apply_fn=lambda content: setattr(space, "shared_memory_md", content),
-                commit_fn=db_session.commit,
-                rollback_fn=db_session.rollback,
-                label="team memory",
-                scope="team",
            )
+            return result.to_dict()
        except Exception as e:
            logger.exception("Failed to update team memory: %s", e)
            await db_session.rollback()
@@ -373,3 +80,11 @@ def create_update_team_memory_tool(
            }

    return update_memory
+
+
+__all__ = [
+    "MEMORY_HARD_LIMIT",
+    "MEMORY_SOFT_LIMIT",
+    "create_update_memory_tool",
+    "create_update_team_memory_tool",
+]
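The tools now depend only on `save_memory(...)` returning an object with `to_dict()`. `app.services.memory` itself is not part of this diff, so the stand-in below is entirely hypothetical: `MemoryScope`, `SaveResult` and `save_memory_stub` are illustrative, and the limits and messages are copied from the size guards this commit deletes (upstream values may differ). It is useful for exercising the tool wrappers without a database.

```python
import asyncio
from dataclasses import dataclass, field
from enum import Enum
from typing import Any

# Values copied from the constants removed in this diff; upstream may differ.
MEMORY_SOFT_LIMIT = 18_000
MEMORY_HARD_LIMIT = 25_000


class MemoryScope(str, Enum):  # hypothetical stand-in for the service enum
    USER = "user"
    TEAM = "team"


@dataclass
class SaveResult:  # hypothetical result type exposing the to_dict() the tools call
    status: str
    message: str
    warnings: list[str] = field(default_factory=list)

    def to_dict(self) -> dict[str, Any]:
        data: dict[str, Any] = {"status": self.status, "message": self.message}
        if self.warnings:
            data["warnings"] = list(self.warnings)
        return data


async def save_memory_stub(*, scope: MemoryScope, content: str) -> SaveResult:
    """Apply the removed hard/soft size guards; performs no database write."""
    length = len(content)
    if length > MEMORY_HARD_LIMIT:
        return SaveResult(
            "error",
            f"Memory exceeds {MEMORY_HARD_LIMIT:,} character limit ({length:,} chars).",
        )
    warnings = []
    if length > MEMORY_SOFT_LIMIT:
        warnings.append(f"Memory is at {length:,}/{MEMORY_HARD_LIMIT:,} characters.")
    label = "Memory" if scope is MemoryScope.USER else "Team memory"
    return SaveResult("saved", f"{label} updated.", warnings)


if __name__ == "__main__":
    print(asyncio.run(save_memory_stub(scope=MemoryScope.USER, content="## Facts\n")).to_dict())
```

Note that the real tools additionally pass `session`, `target_id` and `llm`, which suggests the service also owns the lookup, commit and forced-rewrite steps that the old `_save_memory` handled; the stub leaves those out.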