mirror of https://github.com/MODSetter/SurfSense.git
synced 2026-05-25 19:15:18 +02:00

fix: citations in agent responses

parent 1a4400c923
commit cacb27e007
6 changed files with 123 additions and 8 deletions
@@ -1,11 +1,42 @@
 <citations>
-Apply chunk citations only when the runtime injects `<document>` /
-`<chunk id='…'>` blocks.
+Citations reach the answer through two channels. Use whichever applies — and
+never invent ids you didn't see. Citation ids are resolved by exact-match
+lookup; a wrong id silently breaks the link, so when in doubt, omit.
+
+### Channel A — chunk blocks injected this turn
+When `search_surfsense_docs` or `web_search` returns `<document>` /
+`<chunk id='…'>` blocks in this turn:
 
 1. For each factual statement taken from those chunks, add
-`[citation:chunk_id]` using the exact id from `<chunk id='…'>`.
-2. Multiple chunks → `[citation:id1], [citation:id2]` (comma-separated).
-3. Never invent or normalise ids; if unsure, omit.
-4. Plain brackets only — no markdown links, no footnote numbering.
-5. If no chunk-tagged documents appear this turn, do not fabricate citations.
+`[citation:chunk_id]` using the **exact** id from a visible
+`<chunk id='…'>` tag. Copy digit-for-digit (or the URL verbatim);
+do not retype from memory.
+2. `<document_id>` is the parent doc id, **not** a citation source —
+only ids inside `<chunk id='…'>` count.
+3. Multiple chunks → `[citation:id1], [citation:id2]` (comma-separated,
+each id copied individually).
+4. Never invent, normalise, or guess at adjacent ids; if unsure, omit.
+5. Plain brackets only — no markdown links, no footnote numbering.
+
+### Channel B — citations relayed by a `task` specialist
+A `task(...)` tool message may contain `[citation:<chunk_id>]` markers
+the specialist already attached to its prose. The specialist saw the
+underlying `<chunk id='…'>` blocks; you didn't. So:
+
+1. **Preserve those markers verbatim** in your final answer — do not
+reformat, renumber, drop, or wrap them in markdown links. When you
+paraphrase a specialist sentence, copy the marker character-for-
+character; do not regenerate the id from memory (LLMs reliably
+corrupt nearby digits).
+2. Keep each marker attached to the sentence the specialist attached
+it to.
+3. Do **not** add new `[citation:…]` markers of your own to a
+specialist's prose; if a fact has no marker, the specialist
+couldn't tie it to a chunk and neither can you.
+4. When a specialist returns JSON, the citation markers live inside
+the prose-bearing fields (e.g. a summary or excerpt). Pull them
+along with the surrounding sentence when you quote.
+
+If neither channel surfaces citation markers this turn, do not fabricate
+them.
 </citations>
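
The Channel A rules amount to an exact-match check between the ids an answer cites and the ids visible in `<chunk id='…'>` tags. A minimal sketch of such a check follows. The function names and the sample tool output are hypothetical and not part of this commit, and the regular expressions assume the tag and marker shapes quoted in the prompt text.

```python
import re

# Assumed shapes, copied from the prompt text: <chunk id='…'> and [citation:…].
CHUNK_TAG = re.compile(r"<chunk id='([^']+)'>")
CITATION = re.compile(r"\[citation:([^\]]+)\]")


def visible_chunk_ids(tool_output: str) -> set[str]:
    """Collect ids that appear inside <chunk id='…'> tags only."""
    return set(CHUNK_TAG.findall(tool_output))


def unresolvable_citations(answer: str, tool_output: str) -> list[str]:
    """Return cited ids that have no exact-match <chunk id='…'> tag."""
    seen = visible_chunk_ids(tool_output)
    return [cid for cid in CITATION.findall(answer) if cid not in seen]


# Hypothetical tool output: document_id 42 is the parent doc, not a chunk.
tool_output = (
    "<document><document_metadata><document_id>42</document_id>"
    "</document_metadata><document_content>"
    "<chunk id='128'>alpha</chunk><chunk id='129'>beta</chunk>"
    "</document_content></document>"
)
good = "Alpha holds [citation:128], [citation:129]."
bad = "Alpha holds [citation:42] and beta [citation:130]."

print(unresolvable_citations(good, tool_output))  # []
print(unresolvable_citations(bad, tool_output))   # ['42', '130']
```

Because `<document_id>` never matches the chunk-tag pattern, citing the parent document id is reported the same way as a guessed adjacent id.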
@@ -2,4 +2,4 @@ Read-only specialist for the user's workspace (documents and folders). Use to fi
 
 Pass your full question as one string. The specialist runs in isolation: it cannot see this thread, so include any path hints, filters, or constraints it needs.
 
-The specialist returns plain prose with absolute paths.
+The specialist returns plain prose with absolute paths and `[citation:<chunk_id>]` markers when claims came from KB-indexed chunks. Preserve those markers verbatim if you forward the answer.
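
One way to test the "preserve verbatim" requirement is to compare the markers in the specialist's text with those in the forwarded answer. The sketch below is illustrative only: `marker_drift` and the sample strings are hypothetical, and it assumes markers have the `[citation:…]` shape described above.

```python
import re
from collections import Counter

# Assumed marker shape, taken from the prompt text.
CITATION = re.compile(r"\[citation:[^\]]+\]")


def marker_drift(specialist_text: str, final_answer: str) -> dict[str, list[str]]:
    """Compare citation markers as multisets and report any difference."""
    relayed = Counter(CITATION.findall(specialist_text))
    emitted = Counter(CITATION.findall(final_answer))
    return {
        # Markers the specialist attached that the answer no longer carries.
        "dropped": sorted((relayed - emitted).elements()),
        # Markers in the answer that the specialist never produced.
        "added": sorted((emitted - relayed).elements()),
    }


specialist = (
    "The roadmap lists three milestones [citation:128]. "
    "The budget is approved [citation:131]."
)
# The second id was retyped with transposed digits.
final = (
    "Three milestones are planned [citation:128]. "
    "The budget is approved [citation:113]."
)

print(marker_drift(specialist, final))
# {'dropped': ['[citation:131]'], 'added': ['[citation:113]']}
```

A corrupted digit shows up as one dropped marker plus one added marker, which is the failure the prompt warns about.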
@@ -35,6 +35,43 @@ Map outcomes to your `status`:
 
 You construct the structured `evidence` fields from your own knowledge of what you called and what you observed — the tools do not return them. Never report values you did not actually see.
 
+## Chunk citations in your prose
+
+When `read_file` returns a KB-indexed document under `/documents/`, the response includes `<chunk id='…'>` blocks. Whenever a fact in your `action_summary` or `evidence.content_excerpt` came from a specific chunk, append `[citation:<chunk_id>]` to the sentence stating that fact, using the **exact** id from the `<chunk id='…'>` tag. The caller relays these markers to the end user verbatim, and the UI resolves each id by exact match against the database, so a wrong id silently breaks the citation.
+
+### Where chunk ids live in `read_file` output
+
+A KB document's XML has three numeric attributes — only **one** is a citation source:
+
+```
+<document>
+<document_metadata>
+<document_id>42</document_id> ← NOT a citation. Parent doc id; ignore for citations.
+...
+</document_metadata>
+<chunk_index>
+<entry chunk_id="128" lines="14-22"/> ← Index hint; the same id also appears below.
+<entry chunk_id="129" lines="23-30" matched="true"/>
+</chunk_index>
+<document_content>
+<chunk id='128'><![CDATA[…]]></chunk> ← This is the citation source.
+<chunk id='129'><![CDATA[…]]></chunk>
+</document_content>
+</document>
+```
+
+### Rules
+
+- Use the **exact** id from a `<chunk id='…'>` tag whose content you actually quoted or paraphrased. Copy digit-for-digit; do **not** retype from memory.
+- Before emitting `[citation:N]`, confirm the literal substring `<chunk id='N'>` (or its index twin `chunk_id="N"`) appears in the tool result you are summarising this turn. If you can't see it, omit the citation.
+- Never cite `<document_id>` — that's the parent doc, not a chunk.
+- Never invent, normalise, shorten, or guess at adjacent ids. If unsure between two candidates, omit rather than pick.
+- Prefer **fewer accurate citations** over many speculative ones.
+- Multiple chunks supporting the same point → comma-separated and copied individually: `[citation:128], [citation:129]`.
+- Plain square brackets only — no markdown links, no parentheses, no footnote numbers.
+- Tool results without `<chunk id='…'>` (write/edit/move confirmations, `ls` / `glob` / `grep` listings, error strings) carry no chunk id and need none.
+- Populate `evidence.chunk_ids` with **only** ids you actually emitted in `[citation:…]` markers — same set, same digits.
+
 ## Examples
 
 **Example 1 — happy path write (path discovered from existing convention):**
@@ -35,6 +35,10 @@ Map outcomes to your `status`:
 
 You construct the structured `evidence` fields from your own knowledge of what you called and what you observed — the tools do not return them. `chunk_ids` apply only to `<priority_documents>` hits; for local-file operations leave them `null`. Never report values you did not actually see.
 
+## Chunk citations in your prose
+
+In desktop mode your filesystem tools read local files only, and local-file tool results do **not** carry `<chunk id='…'>` tags. Do not emit `[citation:…]` markers in `action_summary` or `evidence.content_excerpt`, and leave `evidence.chunk_ids` `null` — the absolute path is the only reference for local-file work.
+
 ## Examples
 
 **Example 1 — happy path write (path discovered from existing convention):**
@@ -27,3 +27,42 @@ Reply in plain prose:
 - Cite every claim with an absolute path under `/documents/`.
 - If the workspace does not contain the requested information, say so explicitly. Do not fabricate paths or content.
 - If the question is genuinely ambiguous after a thorough lookup, list the candidates with their paths and stop.
+
+## Chunk citations
+
+When the evidence for a claim came from a `read_file` response that included `<chunk id='…'>` blocks (i.e. a KB-indexed document under `/documents/`), append `[citation:<chunk_id>]` to the sentence stating that claim. The caller passes these markers through to the end user verbatim, and the UI resolves each id by exact match against the database, so a wrong id silently breaks the citation.
+
+### Where chunk ids live in `read_file` output
+
+A KB document's XML has three numeric attributes — only **one** is a citation source:
+
+```
+<document>
+<document_metadata>
+<document_id>42</document_id> ← NOT a citation. Parent doc id; ignore for citations.
+...
+</document_metadata>
+<chunk_index>
+<entry chunk_id="128" lines="14-22"/> ← Index hint; the same id also appears below.
+<entry chunk_id="129" lines="23-30" matched="true"/>
+</chunk_index>
+<document_content>
+<chunk id='128'><![CDATA[…]]></chunk> ← This is the citation source.
+<chunk id='129'><![CDATA[…]]></chunk>
+</document_content>
+</document>
+```
+
+### Rules
+
+- Use the **exact** id from a `<chunk id='…'>` tag whose content you actually quoted or paraphrased. Copy digit-for-digit; do **not** retype from memory.
+- Before emitting `[citation:N]`, confirm the literal substring `<chunk id='N'>` (or its index twin `chunk_id="N"`) appears in the tool result you are summarising this turn. If you can't see it, omit the citation.
+- Never cite `<document_id>` — that's the parent doc, not a chunk.
+- Never invent, normalise, shorten, or guess at adjacent ids. If unsure between two candidates, omit rather than pick.
+- Prefer **fewer accurate citations** over many speculative ones. One correct `[citation:128]` is more useful than a string of wrong ids.
+- Multiple chunks supporting the same point → comma-separated and copied individually: `[citation:128], [citation:129]`.
+- Plain square brackets only — no markdown links, no parentheses, no footnote numbers.
+- If a claim came from a tool result that did **not** carry a chunk id (`ls`, `glob`, `grep` listings, error strings, or files without `<chunk id='…'>`), skip the citation.
+- The absolute path under `/documents/` is always required; chunk citations are additive, they do not replace the path reference.
+
+Example: `The Q2 roadmap lists three milestones (/documents/planning/q2-roadmap.md) [citation:128], [citation:129].`
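
The output format in the example sentence (claim, then the required path, then optional comma-separated markers) can be written as a small formatter. This is a sketch only: `cite` is a hypothetical helper, not something the commit defines, and it assumes ids arrive as strings already copied from `<chunk id='…'>` tags.

```python
def cite(claim: str, path: str, chunk_ids: list[str]) -> str:
    """Format one cited sentence: claim, required path, optional markers."""
    if not path.startswith("/documents/"):
        # The prompt requires an absolute path under /documents/.
        raise ValueError(f"path must be under /documents/: {path!r}")
    body = f"{claim.rstrip('.')} ({path})"
    # Each id gets its own plain-bracket marker, joined by ", ".
    markers = ", ".join(f"[citation:{cid}]" for cid in chunk_ids)
    return f"{body} {markers}." if markers else f"{body}."


print(cite(
    "The Q2 roadmap lists three milestones",
    "/documents/planning/q2-roadmap.md",
    ["128", "129"],
))
# The Q2 roadmap lists three milestones (/documents/planning/q2-roadmap.md) [citation:128], [citation:129].
```

With an empty id list the helper still emits the path, matching the rule that chunk citations are additive and never replace the path reference.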
@@ -28,3 +28,7 @@ Reply in plain prose:
 - Cite every claim with an absolute path.
 - If the workspace does not contain the requested information, say so explicitly. Do not fabricate paths or content.
 - If the question is genuinely ambiguous after a thorough lookup, list the candidates with their paths and stop.
+
+## Chunk citations
+
+In desktop mode your filesystem tools read local files only, and local-file `read_file` responses do **not** carry `<chunk id='…'>` tags. Cite each claim with the absolute local path; do not emit `[citation:…]` markers — your caller has nothing to resolve them against.
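
Since desktop-mode replies should carry no markers at all, a caller could guard against stray ones after the fact. The sketch below shows one hypothetical way to do that. Neither helper exists in this commit, the sample path is invented, and the pattern assumes the `[citation:…]` shape used elsewhere in these prompts.

```python
import re

# Matches a marker plus the whitespace before it and an optional trailing comma.
CITATION = re.compile(r"\s*\[citation:[^\]]+\],?")


def has_citation_markers(text: str) -> bool:
    """True if the reply contains any [citation:…] marker."""
    return CITATION.search(text) is not None


def strip_citation_markers(text: str) -> str:
    """Remove every marker, leaving the path reference untouched."""
    return CITATION.sub("", text)


# Hypothetical desktop-mode reply that wrongly includes markers.
reply = (
    "The notes list two action items (/home/user/notes/todo.md) "
    "[citation:128], [citation:129]."
)

print(has_citation_markers(reply))    # True
print(strip_citation_markers(reply))
# The notes list two action items (/home/user/notes/todo.md).
```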