mirror of
https://github.com/MODSetter/SurfSense.git
synced 2026-06-08 20:25:19 +02:00
feat: update memory extraction and management protocols to enforce structured bullet formats, utilize user first names, and enhance validation for team and user memory entries
This commit is contained in:
parent
ad2a981a77
commit
b8e1c9801b
8 changed files with 215 additions and 101 deletions
|
|
@ -34,13 +34,12 @@ info, things that only matter for the current task.
|
|||
If the message contains memorizable information, output the FULL updated \
|
||||
memory document with the new facts merged into the existing content. Follow \
|
||||
these rules:
|
||||
- Use the same ## section structure as the existing memory.
|
||||
- Preserve any existing ## headings; create new ones if useful.
|
||||
- Keep entries as single concise bullet points (under 120 chars each).
|
||||
- Every bullet MUST start with a (YYYY-MM-DD) date prefix.
|
||||
- Every bullet MUST use format: - (YYYY-MM-DD) [fact|pref|instr] text
|
||||
[fact] = durable facts, [pref] = preferences, [instr] = standing instructions.
|
||||
- If a new fact contradicts an existing entry, update the existing entry.
|
||||
- Do not duplicate information that is already present.
|
||||
- Standard sections: \
|
||||
"## About the user", "## Preferences", "## Instructions"
|
||||
|
||||
If nothing is worth remembering, output exactly: NO_UPDATE
|
||||
|
||||
|
|
@ -77,16 +76,13 @@ NOT worth remembering:
|
|||
|
||||
If the message contains memorizable team information, output the FULL updated \
|
||||
team memory document with new facts merged into existing content. Follow rules:
|
||||
- Use the same ## section structure as the existing memory.
|
||||
- Preserve any existing ## headings; create new ones if useful.
|
||||
- Keep entries as single concise bullet points (under 120 chars each).
|
||||
- Every bullet MUST start with a (YYYY-MM-DD) date prefix.
|
||||
- Every bullet MUST use format: - (YYYY-MM-DD) [fact] text
|
||||
Team memory uses ONLY the [fact] marker. Never use [pref] or [instr].
|
||||
- If a new fact contradicts an existing entry, update the existing entry.
|
||||
- Do not duplicate existing information.
|
||||
- NEVER use personal sections like "## About the user", "## Preferences", \
|
||||
or "## Instructions".
|
||||
- Preserve neutral team phrasing; avoid person-specific memory unless role-anchored.
|
||||
- Standard sections: "## Team decisions", "## Team conventions", \
|
||||
"## Key facts", "## Current priorities"
|
||||
|
||||
If nothing is worth remembering, output exactly: NO_UPDATE
|
||||
|
||||
|
|
|
|||
|
|
@ -281,18 +281,16 @@ _MEMORY_TOOL_INSTRUCTIONS: dict[str, dict[str, str]] = {
|
|||
- updated_memory: The FULL updated markdown document (not a diff).
|
||||
Merge new facts with existing ones, update contradictions, remove outdated entries.
|
||||
Treat every update as a curation pass — consolidate, don't just append.
|
||||
- Every bullet MUST start with a (YYYY-MM-DD) date prefix indicating when it was recorded or last updated.
|
||||
- Every bullet MUST use this format: - (YYYY-MM-DD) [marker] text
|
||||
Markers:
|
||||
[fact] — durable facts (role, background, projects, tools, expertise)
|
||||
[pref] — preferences (response style, languages, formats, tools)
|
||||
[instr] — standing instructions (always/never do, response rules)
|
||||
- Keep it concise and well under the character limit shown in <user_memory>.
|
||||
- You MUST organize memory using these standard sections (add new `##` sections only if none of the standard ones fit):
|
||||
## About the user
|
||||
## Preferences
|
||||
## Instructions
|
||||
- Section guidance:
|
||||
* About the user: role, background, company, durable identity context
|
||||
* Preferences: languages, tools, frameworks, response style preferences
|
||||
* Instructions: standing instructions, things to always/never do
|
||||
- Use any `##` heading that fits. Headings are optional and freeform — organize
|
||||
however makes sense for the content (e.g. ## Work, ## Research, ## Personal).
|
||||
- Each entry MUST be a single bullet point. Keep entries concise (aim for under 120 chars each).
|
||||
- During consolidation, prioritize keeping: identity/instructions > preferences.
|
||||
- During consolidation, prioritize keeping: [instr] > [pref] > [fact].
|
||||
""",
|
||||
"shared": """
|
||||
- update_memory: Update the team's shared memory document for this search space.
|
||||
|
|
@ -311,18 +309,11 @@ _MEMORY_TOOL_INSTRUCTIONS: dict[str, dict[str, str]] = {
|
|||
- updated_memory: The FULL updated markdown document (not a diff).
|
||||
Merge new facts with existing ones, update contradictions, remove outdated entries.
|
||||
Treat every update as a curation pass — consolidate, don't just append.
|
||||
- Every bullet MUST start with a (YYYY-MM-DD) date prefix indicating when it was recorded or last updated.
|
||||
- Every bullet MUST use this format: - (YYYY-MM-DD) [fact] text
|
||||
Team memory uses ONLY the [fact] marker. Never use [pref] or [instr] in team memory.
|
||||
- Keep it concise and well under the character limit shown in <team_memory>.
|
||||
- You MUST organize memory using these standard sections (add new `##` sections only if none of the standard ones fit):
|
||||
## Team decisions
|
||||
## Conventions
|
||||
## Key facts
|
||||
## Current priorities
|
||||
- Section guidance:
|
||||
* Team decisions: agreed choices and durable technical/product decisions
|
||||
* Conventions: coding standards, tools, processes, naming patterns
|
||||
* Key facts: stable facts about org/team/system setup
|
||||
* Current priorities: active projects, near-term goals, important blockers
|
||||
- Use any `##` heading that fits. Headings are optional and freeform — organize
|
||||
however makes sense for the content (e.g. ## Decisions, ## Architecture, ## Process).
|
||||
- Each entry MUST be a single bullet point. Keep entries concise (aim for under 120 chars each).
|
||||
- During consolidation, prioritize keeping: decisions/conventions > key facts > current priorities.
|
||||
""",
|
||||
|
|
@ -334,24 +325,27 @@ _MEMORY_TOOL_EXAMPLES: dict[str, dict[str, str]] = {
|
|||
"private": """
|
||||
- <user_memory> is empty. User: "I'm a space enthusiast, explain astrophage to me"
|
||||
- The user casually shared a durable fact about themselves. Save it:
|
||||
update_memory(updated_memory="## About the user\\n- (2025-03-15) Space enthusiast\\n")
|
||||
update_memory(updated_memory="- (2025-03-15) [fact] Space enthusiast\\n")
|
||||
- User: "Remember that I prefer concise answers over detailed explanations"
|
||||
- Durable preference. You see the current <user_memory> and merge:
|
||||
update_memory(updated_memory="## About the user\\n- (2025-03-15) Space enthusiast\\n\\n## Preferences\\n- (2025-03-15) Prefers concise answers over detailed explanations\\n...")
|
||||
- Durable preference. Merge with existing memory:
|
||||
update_memory(updated_memory="- (2025-03-15) [fact] Space enthusiast\\n- (2025-03-15) [pref] Prefers concise answers over detailed explanations\\n")
|
||||
- User: "I actually moved to Tokyo last month"
|
||||
- Updated fact, date prefix reflects when recorded:
|
||||
update_memory(updated_memory="## About the user\\n- (2025-03-15) Lives in Tokyo (previously London)\\n...")
|
||||
update_memory(updated_memory="- (2025-03-15) [fact] Lives in Tokyo (previously London)\\n...")
|
||||
- User: "I'm a freelance photographer working on a nature documentary"
|
||||
- Durable background info. Save it under About the user:
|
||||
update_memory(updated_memory="## About the user\\n- (2025-03-15) Freelance photographer\\n- (2025-03-15) Working on a nature documentary\\n")
|
||||
- Durable background info:
|
||||
update_memory(updated_memory="- (2025-03-15) [fact] Freelance photographer\\n- (2025-03-15) [fact] Working on a nature documentary\\n")
|
||||
- User: "Always respond in bullet points"
|
||||
- Standing instruction:
|
||||
update_memory(updated_memory="...\\n- (2025-03-15) [instr] Always respond in bullet points\\n")
|
||||
""",
|
||||
"shared": """
|
||||
- User: "Let's remember that we decided to do weekly standup meetings on Mondays"
|
||||
- Durable team decision:
|
||||
update_memory(updated_memory="## Team decisions\\n- (2025-03-15) Weekly standup meetings on Mondays\\n...")
|
||||
update_memory(updated_memory="- (2025-03-15) [fact] Weekly standup meetings on Mondays\\n...")
|
||||
- User: "Our office is in downtown Seattle, 5th floor"
|
||||
- Durable team fact:
|
||||
update_memory(updated_memory="## Key facts\\n- (2025-03-15) Office location: downtown Seattle, 5th floor\\n...")
|
||||
update_memory(updated_memory="- (2025-03-15) [fact] Office location: downtown Seattle, 5th floor\\n...")
|
||||
""",
|
||||
},
|
||||
}
|
||||
|
|
|
|||
|
|
@ -36,13 +36,11 @@ MEMORY_HARD_LIMIT = 25_000
|
|||
_SECTION_HEADING_RE = re.compile(r"^##\s+(.+)$", re.MULTILINE)
|
||||
_HEADING_NORMALIZE_RE = re.compile(r"\s+")
|
||||
|
||||
_USER_ONLY_HEADINGS = {"about the user", "preferences", "instructions"}
|
||||
_TEAM_ONLY_HEADINGS = {
|
||||
"team decisions",
|
||||
"conventions",
|
||||
"key facts",
|
||||
"current priorities",
|
||||
}
|
||||
_MARKER_RE = re.compile(r"\[(fact|pref|instr)\]")
|
||||
_BULLET_FORMAT_RE = re.compile(
|
||||
r"^- \(\d{4}-\d{2}-\d{2}\) \[(fact|pref|instr)\] .+$"
|
||||
)
|
||||
_PERSONAL_ONLY_MARKERS = {"pref", "instr"}
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
|
|
@ -63,37 +61,40 @@ def _normalize_heading(heading: str) -> str:
|
|||
def _validate_memory_scope(
|
||||
content: str, scope: Literal["user", "team"]
|
||||
) -> dict[str, Any] | None:
|
||||
"""Reject cross-scope headings (user sections in team memory and vice versa)."""
|
||||
headings = {_normalize_heading(h) for h in _extract_headings(content)}
|
||||
if not headings:
|
||||
"""Reject personal-only markers ([pref], [instr]) in team memory."""
|
||||
if scope != "team":
|
||||
return None
|
||||
|
||||
if scope == "team":
|
||||
leaked = sorted(headings & _USER_ONLY_HEADINGS)
|
||||
if leaked:
|
||||
return {
|
||||
"status": "error",
|
||||
"message": (
|
||||
"Team memory cannot include personal sections: "
|
||||
+ ", ".join(leaked)
|
||||
+ ". Use team sections only."
|
||||
),
|
||||
}
|
||||
return None
|
||||
|
||||
leaked = sorted(headings & _TEAM_ONLY_HEADINGS)
|
||||
markers = set(_MARKER_RE.findall(content))
|
||||
leaked = sorted(markers & _PERSONAL_ONLY_MARKERS)
|
||||
if leaked:
|
||||
tags = ", ".join(f"[{m}]" for m in leaked)
|
||||
return {
|
||||
"status": "error",
|
||||
"message": (
|
||||
"User memory cannot include team sections: "
|
||||
+ ", ".join(leaked)
|
||||
+ ". Use personal sections only."
|
||||
f"Team memory cannot include personal markers: {tags}. "
|
||||
"Use [fact] only in team memory."
|
||||
),
|
||||
}
|
||||
return None
|
||||
|
||||
|
||||
def _validate_bullet_format(content: str) -> list[str]:
|
||||
"""Return warnings for bullet lines that don't match the required format.
|
||||
|
||||
Expected: ``- (YYYY-MM-DD) [fact|pref|instr] text``
|
||||
"""
|
||||
warnings: list[str] = []
|
||||
for line in content.splitlines():
|
||||
stripped = line.strip()
|
||||
if not stripped.startswith("- "):
|
||||
continue
|
||||
if not _BULLET_FORMAT_RE.match(stripped):
|
||||
short = stripped[:80] + ("..." if len(stripped) > 80 else "")
|
||||
warnings.append(f"Malformed bullet: {short}")
|
||||
return warnings
|
||||
|
||||
|
||||
def _validate_diff(old_memory: str | None, new_memory: str) -> list[str]:
|
||||
"""Return a list of warning strings about suspicious changes."""
|
||||
if not old_memory:
|
||||
|
|
@ -163,13 +164,11 @@ limit and must be shortened.
|
|||
|
||||
RULES:
|
||||
1. Rewrite the document to be under {target} characters.
|
||||
2. Preserve all ## section headings.
|
||||
3. Priority for keeping content: identity/instructions > preferences > \
|
||||
current context.
|
||||
2. Preserve any existing ## headings.
|
||||
3. Priority for keeping content: [instr] > [pref] > [fact].
|
||||
4. Merge duplicate entries, remove outdated entries, shorten verbose descriptions.
|
||||
5. Each entry must be a single bullet point.
|
||||
6. Every bullet MUST keep its (YYYY-MM-DD) date prefix.
|
||||
7. Output ONLY the consolidated markdown — no explanations, no wrapping.
|
||||
5. Every bullet MUST have format: - (YYYY-MM-DD) [fact|pref|instr] text
|
||||
6. Output ONLY the consolidated markdown — no explanations, no wrapping.
|
||||
|
||||
<memory_document>
|
||||
{content}
|
||||
|
|
@ -275,6 +274,10 @@ async def _save_memory(
|
|||
if diff_warnings:
|
||||
resp["diff_warnings"] = diff_warnings
|
||||
|
||||
format_warnings = _validate_bullet_format(content)
|
||||
if format_warnings:
|
||||
resp["format_warnings"] = format_warnings
|
||||
|
||||
warning = _soft_warning(content)
|
||||
if warning:
|
||||
resp["warning"] = warning
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue