trustgraph/trustgraph-base/trustgraph/messaging/translators
Sunny 6c9a545a06
feat: add cross-encoder reranking to Document-RAG with two-limit control (#878) (#1011)
Wire the FlashRank reranker subsystem from #1005 into Document-RAG: after
vector retrieval, over-fetch a wider candidate pool, rerank with the
cross-encoder, and keep the top doc_limit chunks for synthesis.
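
The flow described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in document_rag.py: `select_chunks`, `retrieve`, and `score` are hypothetical names, and fetching only doc_limit chunks when no reranker is present is an assumption about what the unchanged path looks like.

```python
from typing import Callable, List, Optional

# Hypothetical callable shapes, used only for this sketch.
Retrieve = Callable[[str, int], List[str]]   # (query, limit) -> chunks
Score = Callable[[str, str], float]          # (query, chunk) -> relevance


def select_chunks(
    query: str,
    retrieve: Retrieve,
    doc_limit: int,
    fetch_limit: int,
    score: Optional[Score] = None,
) -> List[str]:
    """Over-fetch, rerank, then keep the top doc_limit chunks."""
    if score is None:
        # Assumption: with no reranker wired, fall back to plain retrieval
        # of doc_limit chunks, leaving their order untouched.
        return list(retrieve(query, doc_limit))

    # Pull the wider candidate pool from the vector store.
    candidates = list(retrieve(query, fetch_limit))

    # Rerank by the scorer, highest relevance first (sorted() is stable,
    # so ties keep their retrieval order).
    ranked = sorted(candidates, key=lambda c: score(query, c), reverse=True)

    # Narrow to the chunks that go into the synthesis prompt.
    return ranked[:doc_limit]
```

In the real subsystem the scorer would be the cross-encoder reached via the reranker topics; here any `(query, chunk) -> float` function stands in for it.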

Per maintainer review, the fetch and select sizes are two caller-controlled
limits rather than one internal heuristic:

- doc_limit:   chunks selected into the synthesis prompt (unchanged meaning).
- fetch_limit: candidate pool pulled from the vector store before reranking.
  0 = derive (OVERFETCH_FACTOR x doc_limit); values below doc_limit are
  raised to it. Lets the caller control how hard the reranker has to work.
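
A minimal sketch of the fetch_limit resolution rule stated above. The function name `resolve_fetch_limit` is hypothetical, and the value of OVERFETCH_FACTOR used here is a placeholder, since the commit message does not state it.

```python
# Placeholder value: the commit message names OVERFETCH_FACTOR but does not
# give its value.
OVERFETCH_FACTOR = 4


def resolve_fetch_limit(
    doc_limit: int,
    fetch_limit: int = 0,
    overfetch_factor: int = OVERFETCH_FACTOR,
) -> int:
    """Return the candidate-pool size to request before reranking."""
    if fetch_limit == 0:
        # 0 means "derive": use the heuristic default.
        return overfetch_factor * doc_limit
    # Explicit values below doc_limit are raised to doc_limit, so the pool
    # is never smaller than the number of chunks to select.
    return max(fetch_limit, doc_limit)
```

With these rules, an explicit fetch_limit equal to doc_limit gives the reranker a pool it can only reorder, while larger values widen the pool it has to score.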

Details:
- schema: DocumentRagQuery.fetch_limit (additive, backward compatible).
- document_rag.py / rag.py: fetch_limit resolved in the processor (mirrors
  doc_limit); the core applies the heuristic default and derives synthesis
  provenance from the chunk-selection focus when reranking ran.
- provenance: tg:ChunkSelection focus stage (mirrors tg:EdgeSelection).
- request translator + client SDKs + CLI: fetch-limit / --fetch-limit,
  threaded exactly like doc_limit and the GraphRAG limits.
- tests: no-op identity, over-fetch/narrow, explicit fetch_limit, heuristic
  default, floor-at-doc_limit, provenance lineage, cross-repo topic wiring.

When no reranker role is wired, reranking is skipped and the output is
byte-identical to the previous behaviour.
Requires the companion trustgraph-templates change wiring the reranker
topics into the document-rag flow (mirrors #279 for GraphRAG).
2026-07-02 09:50:13 +01:00
__init__.py feat: replace LLM edge scoring with cross-encoder reranker in GraphRAG (#1005) 2026-06-30 14:36:37 +01:00
agent.py feat: workspace-based multi-tenancy, replacing user as tenancy axis (#840) 2026-04-21 23:23:01 +01:00
base.py Pub/sub abstraction: decouple from Pulsar (#751) 2026-04-01 20:16:53 +01:00
collection.py Per-workspace queue routing for workspace-scoped services (#862) 2026-05-04 10:30:03 +01:00
config.py feat: workspace-based multi-tenancy, replacing user as tenancy axis (#840) 2026-04-21 23:23:01 +01:00
diagnosis.py Pub/sub abstraction: decouple from Pulsar (#751) 2026-04-01 20:16:53 +01:00
document_loading.py feat: workspace-based multi-tenancy, replacing user as tenancy axis (#840) 2026-04-21 23:23:01 +01:00
embeddings.py Pub/sub abstraction: decouple from Pulsar (#751) 2026-04-01 20:16:53 +01:00
embeddings_query.py feat: workspace-based multi-tenancy, replacing user as tenancy axis (#840) 2026-04-21 23:23:01 +01:00
flow.py Per-workspace queue routing for workspace-scoped services (#862) 2026-05-04 10:30:03 +01:00
iam.py feat: global usernames and rename workspace to default_workspace (#1001) 2026-06-25 16:34:31 +01:00
knowledge.py feat: complete knowledge core storage — named graphs, provenance, source material (#973) 2026-06-03 10:46:52 +01:00
library.py Per-workspace queue routing for workspace-scoped services (#862) 2026-05-04 10:30:03 +01:00
metadata.py Per-workspace queue routing for workspace-scoped services (#862) 2026-05-04 10:30:03 +01:00
nlp_query.py Pub/sub abstraction: decouple from Pulsar (#751) 2026-04-01 20:16:53 +01:00
primitives.py Pub/sub abstraction: decouple from Pulsar (#751) 2026-04-01 20:16:53 +01:00
prompt.py Expose LLM token usage across all service layers (#782) 2026-04-13 14:38:34 +01:00
reranker.py feat: replace LLM edge scoring with cross-encoder reranker in GraphRAG (#1005) 2026-06-30 14:36:37 +01:00
retrieval.py feat: add cross-encoder reranking to Document-RAG with two-limit control (#878) (#1011) 2026-07-02 09:50:13 +01:00
rows_query.py feat: workspace-based multi-tenancy, replacing user as tenancy axis (#840) 2026-04-21 23:23:01 +01:00
sparql_query.py feat: workspace-based multi-tenancy, replacing user as tenancy axis (#840) 2026-04-21 23:23:01 +01:00
structured_query.py feat: workspace-based multi-tenancy, replacing user as tenancy axis (#840) 2026-04-21 23:23:01 +01:00
text_completion.py Expose LLM token usage across all service layers (#782) 2026-04-13 14:38:34 +01:00
tool.py Pub/sub abstraction: decouple from Pulsar (#751) 2026-04-01 20:16:53 +01:00
triples.py feat: workspace-based multi-tenancy, replacing user as tenancy axis (#840) 2026-04-21 23:23:01 +01:00