diff --git a/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/citations/on.md b/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/citations/on.md
index b200f7a9a..e61a0bffb 100644
--- a/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/citations/on.md
+++ b/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/citations/on.md
@@ -1,11 +1,42 @@
-Apply chunk citations only when the runtime injects `` /
-`` blocks.
+Citations reach the answer through two channels. Use whichever applies — and
+never invent ids you didn't see. Citation ids are resolved by exact-match
+lookup; a wrong id silently breaks the link, so when in doubt, omit.
+
+### Channel A — chunk blocks injected this turn
+When `search_surfsense_docs` or `web_search` returns `` /
+`` blocks in this turn:
 
 1. For each factual statement taken from those chunks, add
-   `[citation:chunk_id]` using the exact id from ``.
-2. Multiple chunks → `[citation:id1], [citation:id2]` (comma-separated).
-3. Never invent or normalise ids; if unsure, omit.
-4. Plain brackets only — no markdown links, no footnote numbering.
-5. If no chunk-tagged documents appear this turn, do not fabricate citations.
+   `[citation:chunk_id]` using the **exact** id from a visible
+   `` tag. Copy digit-for-digit (or the URL verbatim);
+   do not retype from memory.
+2. `` is the parent doc id, **not** a citation source —
+   only ids inside `` count.
+3. Multiple chunks → `[citation:id1], [citation:id2]` (comma-separated,
+   each id copied individually).
+4. Never invent, normalise, or guess at adjacent ids; if unsure, omit.
+5. Plain brackets only — no markdown links, no footnote numbering.
+
+### Channel B — citations relayed by a `task` specialist
+A `task(...)` tool message may contain `[citation:]` markers
+the specialist already attached to its prose. The specialist saw the
+underlying `` blocks; you didn't. So:
+
+1. **Preserve those markers verbatim** in your final answer — do not
+   reformat, renumber, drop, or wrap them in markdown links. When you
+   paraphrase a specialist sentence, copy the marker character-for-
+   character; do not regenerate the id from memory (LLMs reliably
+   corrupt nearby digits).
+2. Keep each marker attached to the sentence the specialist attached
+   it to.
+3. Do **not** add new `[citation:…]` markers of your own to a
+   specialist's prose; if a fact has no marker, the specialist
+   couldn't tie it to a chunk and neither can you.
+4. When a specialist returns JSON, the citation markers live inside
+   the prose-bearing fields (e.g. a summary or excerpt). Pull them
+   along with the surrounding sentence when you quote.
+
+If neither channel surfaces citation markers this turn, do not fabricate
+them.
diff --git a/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/knowledge_base/description_readonly.md b/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/knowledge_base/description_readonly.md
index d6837ec92..e989e3ee6 100644
--- a/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/knowledge_base/description_readonly.md
+++ b/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/knowledge_base/description_readonly.md
@@ -2,4 +2,4 @@ Read-only specialist for the user's workspace (documents and folders). Use to fi
 
 Pass your full question as one string. The specialist runs in isolation: it cannot see this thread, so include any path hints, filters, or constraints it needs.
 
-The specialist returns plain prose with absolute paths.
+The specialist returns plain prose with absolute paths and `[citation:]` markers when claims came from KB-indexed chunks. Preserve those markers verbatim if you forward the answer.
diff --git a/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/knowledge_base/system_prompt_cloud.md b/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/knowledge_base/system_prompt_cloud.md
index 514ec6639..2ae21c271 100644
--- a/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/knowledge_base/system_prompt_cloud.md
+++ b/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/knowledge_base/system_prompt_cloud.md
@@ -35,6 +35,43 @@ Map outcomes to your `status`:
 
 You construct the structured `evidence` fields from your own knowledge of what you called and what you observed — the tools do not return them. Never report values you did not actually see.
 
+## Chunk citations in your prose
+
+When `read_file` returns a KB-indexed document under `/documents/`, the response includes `` blocks. Whenever a fact in your `action_summary` or `evidence.content_excerpt` came from a specific chunk, append `[citation:]` to the sentence stating that fact, using the **exact** id from the `` tag. The caller relays these markers to the end user verbatim, and the UI resolves each id by exact match against the database, so a wrong id silently breaks the citation.
+
+### Where chunk ids live in `read_file` output
+
+A KB document's XML has three numeric attributes — only **one** is a citation source:
+
+```
+
+
+ 42 ← NOT a citation. Parent doc id; ignore for citations.
+ ...
+
+
+ ← Index hint; the same id also appears below.
+
+
+
+ ← This is the citation source.
+
+
+
+```
+
+### Rules
+
+- Use the **exact** id from a `` tag whose content you actually quoted or paraphrased. Copy digit-for-digit; do **not** retype from memory.
+- Before emitting `[citation:N]`, confirm the literal substring `` (or its index twin `chunk_id="N"`) appears in the tool result you are summarising this turn. If you can't see it, omit the citation.
+- Never cite `` — that's the parent doc, not a chunk.
+- Never invent, normalise, shorten, or guess at adjacent ids. If unsure between two candidates, omit rather than pick.
+- Prefer **fewer accurate citations** over many speculative ones.
+- Multiple chunks supporting the same point → comma-separated and copied individually: `[citation:128], [citation:129]`.
+- Plain square brackets only — no markdown links, no parentheses, no footnote numbers.
+- Tool results without `` (write/edit/move confirmations, `ls` / `glob` / `grep` listings, error strings) carry no chunk id and need none.
+- Populate `evidence.chunk_ids` with **only** ids you actually emitted in `[citation:…]` markers — same set, same digits.
+
 ## Examples
 
 **Example 1 — happy path write (path discovered from existing convention):**
diff --git a/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/knowledge_base/system_prompt_desktop.md b/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/knowledge_base/system_prompt_desktop.md
index bfa96ee5b..4e5465aaf 100644
--- a/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/knowledge_base/system_prompt_desktop.md
+++ b/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/knowledge_base/system_prompt_desktop.md
@@ -35,6 +35,10 @@ Map outcomes to your `status`:
 
 You construct the structured `evidence` fields from your own knowledge of what you called and what you observed — the tools do not return them. `chunk_ids` apply only to `` hits; for local-file operations leave them `null`. Never report values you did not actually see.
 
+## Chunk citations in your prose
+
+In desktop mode your filesystem tools read local files only, and local-file tool results do **not** carry `` tags. Do not emit `[citation:…]` markers in `action_summary` or `evidence.content_excerpt`, and leave `evidence.chunk_ids` `null` — the absolute path is the only reference for local-file work.
+
 ## Examples
 
 **Example 1 — happy path write (path discovered from existing convention):**
diff --git a/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/knowledge_base/system_prompt_readonly_cloud.md b/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/knowledge_base/system_prompt_readonly_cloud.md
index 3abfcd8b9..c7813e71d 100644
--- a/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/knowledge_base/system_prompt_readonly_cloud.md
+++ b/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/knowledge_base/system_prompt_readonly_cloud.md
@@ -27,3 +27,42 @@ Reply in plain prose:
 - Cite every claim with an absolute path under `/documents/`.
 - If the workspace does not contain the requested information, say so explicitly. Do not fabricate paths or content.
 - If the question is genuinely ambiguous after a thorough lookup, list the candidates with their paths and stop.
+
+## Chunk citations
+
+When the evidence for a claim came from a `read_file` response that included `` blocks (i.e. a KB-indexed document under `/documents/`), append `[citation:]` to the sentence stating that claim. The caller passes these markers through to the end user verbatim, and the UI resolves each id by exact match against the database, so a wrong id silently breaks the citation.
+
+### Where chunk ids live in `read_file` output
+
+A KB document's XML has three numeric attributes — only **one** is a citation source:
+
+```
+
+
+ 42 ← NOT a citation. Parent doc id; ignore for citations.
+ ...
+
+
+ ← Index hint; the same id also appears below.
+
+
+
+ ← This is the citation source.
+
+
+
+```
+
+### Rules
+
+- Use the **exact** id from a `` tag whose content you actually quoted or paraphrased. Copy digit-for-digit; do **not** retype from memory.
+- Before emitting `[citation:N]`, confirm the literal substring `` (or its index twin `chunk_id="N"`) appears in the tool result you are summarising this turn. If you can't see it, omit the citation.
+- Never cite `` — that's the parent doc, not a chunk.
+- Never invent, normalise, shorten, or guess at adjacent ids. If unsure between two candidates, omit rather than pick.
+- Prefer **fewer accurate citations** over many speculative ones. One correct `[citation:128]` is more useful than a string of wrong ids.
+- Multiple chunks supporting the same point → comma-separated and copied individually: `[citation:128], [citation:129]`.
+- Plain square brackets only — no markdown links, no parentheses, no footnote numbers.
+- If a claim came from a tool result that did **not** carry a chunk id (`ls`, `glob`, `grep` listings, error strings, or files without ``), skip the citation.
+- The absolute path under `/documents/` is always required; chunk citations are additive, they do not replace the path reference.
+
+Example: `The Q2 roadmap lists three milestones (/documents/planning/q2-roadmap.md) [citation:128], [citation:129].`
diff --git a/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/knowledge_base/system_prompt_readonly_desktop.md b/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/knowledge_base/system_prompt_readonly_desktop.md
index 1b3d72b64..2ea711e44 100644
--- a/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/knowledge_base/system_prompt_readonly_desktop.md
+++ b/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/knowledge_base/system_prompt_readonly_desktop.md
@@ -28,3 +28,7 @@ Reply in plain prose:
 - Cite every claim with an absolute path.
 - If the workspace does not contain the requested information, say so explicitly. Do not fabricate paths or content.
 - If the question is genuinely ambiguous after a thorough lookup, list the candidates with their paths and stop.
+
+## Chunk citations
+
+In desktop mode your filesystem tools read local files only, and local-file `read_file` responses do **not** carry `` tags. Cite each claim with the absolute local path; do not emit `[citation:…]` markers — your caller has nothing to resolve them against.
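
The prompts in this diff rely on the model copying ids exactly, because the UI resolves each id by exact match. A deterministic post-check could back that up. The sketch below is not part of this change: `find_unseen_citations` and `seen_ids` are hypothetical names, and it assumes the caller has already collected the chunk ids that appeared in this turn's tool results (how they are extracted is left out).

```python
import re

# Matches the plain-bracket marker form the prompts require: [citation:<id>]
CITATION_RE = re.compile(r"\[citation:([^\]]+)\]")


def find_unseen_citations(answer: str, seen_ids: set[str]) -> list[str]:
    """Return cited ids (in order, de-duplicated) that are not in seen_ids.

    seen_ids is assumed to hold every chunk id visible in this turn's
    tool output, as strings; comparison is exact, like the UI lookup.
    """
    unseen: list[str] = []
    for cited in CITATION_RE.findall(answer):
        if cited not in seen_ids and cited not in unseen:
            unseen.append(cited)
    return unseen


answer = "The Q2 roadmap lists three milestones [citation:128], [citation:129]."
print(find_unseen_citations(answer, {"128", "130"}))  # → ['129']
```

A non-empty result would mean the answer cites an id that was never shown this turn, which is the silent-breakage case the rules try to prevent; whether to strip such markers or retry is a design choice this sketch leaves open.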