feat: add cross-encoder reranking to Document-RAG with two-limit control (#878) (#1011)

Wire the FlashRank reranker subsystem from #1005 into Document-RAG: after
vector retrieval, over-fetch a wider candidate pool, rerank with the
cross-encoder, and keep the top doc_limit chunks for synthesis.

Per maintainer review, the fetch and select sizes are two caller-controlled
limits rather than one internal heuristic:

- doc_limit:   chunks selected into the synthesis prompt (unchanged meaning).
- fetch_limit: candidate pool pulled from the vector store before reranking.
  0 = derive (OVERFETCH_FACTOR x doc_limit); values below doc_limit are
  raised to it. Lets the caller control how hard the reranker has to work.
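
A minimal sketch of the fetch_limit resolution rule described above. The
value of OVERFETCH_FACTOR and the helper name resolve_fetch_limit are
assumptions for illustration; the commit does not state either.

```python
# Assumed value for illustration only; the real constant is not shown here.
OVERFETCH_FACTOR = 4


def resolve_fetch_limit(doc_limit: int, fetch_limit: int = 0) -> int:
    """Return the candidate-pool size to request from the vector store.

    Hypothetical helper mirroring the rule in the commit message:
    0 means "derive from doc_limit"; anything below doc_limit is raised to it.
    """
    if fetch_limit == 0:
        # Heuristic default: over-fetch a multiple of the selection size.
        return OVERFETCH_FACTOR * doc_limit
    # Floor at doc_limit so the reranker always has enough candidates.
    return max(fetch_limit, doc_limit)
```

Under these assumptions, a caller who leaves fetch_limit at 0 gets the
heuristic pool, while an explicit value only takes effect if it is at least
doc_limit.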

Details:
- schema: DocumentRagQuery.fetch_limit (additive, backward compatible).
- document_rag.py / rag.py: fetch_limit resolved in the processor (mirrors
  doc_limit); the core applies the heuristic default and derives synthesis
  provenance from the chunk-selection focus when reranking ran.
- provenance: tg:ChunkSelection focus stage (mirrors tg:EdgeSelection).
- request translator + client SDKs + CLI: fetch-limit / --fetch-limit,
  threaded exactly like doc_limit and the GraphRAG limits.
- tests: no-op identity, over-fetch/narrow, explicit fetch_limit, heuristic
  default, floor-at-doc_limit, provenance lineage, cross-repo topic wiring.

When no reranker role is wired, reranking is skipped and the output is
byte-identical to the pre-change behavior.
Requires the companion trustgraph-templates change wiring the reranker
topics into the document-rag flow (mirrors #279 for GraphRAG).
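
For orientation, the over-fetch / rerank / narrow flow can be sketched as
below. This is not the code in document_rag.py: select_chunks, the retrieve
and rerank callables, and the choice to fetch only doc_limit chunks when no
reranker is wired are all assumptions made for the example.

```python
from typing import Callable, Optional, Sequence

# Hypothetical signatures: retrieve(query, limit) -> chunks,
# rerank(query, chunks) -> one relevance score per chunk.
Retriever = Callable[[str, int], list]
Reranker = Callable[[str, Sequence[str]], list]


def select_chunks(
    query: str,
    retrieve: Retriever,
    doc_limit: int,
    fetch_limit: int,
    rerank: Optional[Reranker] = None,
) -> list:
    """Pick the chunks that go into the synthesis prompt."""
    if rerank is None:
        # Assumed no-op path: plain vector retrieval, no reranking step.
        return retrieve(query, doc_limit)

    # Over-fetch a wider candidate pool from the vector store.
    candidates = retrieve(query, fetch_limit)
    # Score every candidate against the query.
    scores = rerank(query, candidates)
    # Sort by score, highest first (stable for ties), then narrow.
    ranked = sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)
    return [chunk for _, chunk in ranked[:doc_limit]]
```

Passing rerank=None models the unwired case; with a reranker, the result
holds at most doc_limit chunks drawn from a pool of fetch_limit candidates.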
Sunny 2026-07-02 02:50:13 -06:00 committed by GitHub
parent f18d48dc39
commit 6c9a545a06
18 changed files with 853 additions and 26 deletions

@@ -101,27 +101,27 @@ class TestQuery:
         assert query.rag == mock_rag
         assert query.collection == "test_collection"
         assert query.verbose is False
-        assert query.doc_limit == 20 # Default value
+        assert query.fetch_limit == 20 # Default value

-    def test_query_initialization_with_custom_doc_limit(self):
-        """Test Query initialization with custom doc_limit"""
+    def test_query_initialization_with_custom_fetch_limit(self):
+        """Test Query initialization with custom fetch_limit"""
         # Create mock DocumentRag
         mock_rag = MagicMock()

-        # Initialize Query with custom doc_limit
+        # Initialize Query with custom fetch_limit
         query = Query(
             rag=mock_rag,
             workspace="test_workspace",
             collection="custom_collection",
             verbose=True,
-            doc_limit=50
+            fetch_limit=50
         )

         # Verify initialization
         assert query.rag == mock_rag
         assert query.collection == "custom_collection"
         assert query.verbose is True
-        assert query.doc_limit == 50
+        assert query.fetch_limit == 50

     @pytest.mark.asyncio
     async def test_extract_concepts(self):
@@ -224,7 +224,7 @@ class TestQuery:
             workspace="test_workspace",
             collection="test_collection",
             verbose=False,
-            doc_limit=15
+            fetch_limit=15
         )

         # Call get_docs with concepts list
@@ -377,7 +377,7 @@ class TestQuery:
             workspace="test_workspace",
             collection="test_collection",
             verbose=True,
-            doc_limit=5
+            fetch_limit=5
         )

         # Call get_docs with concepts
@@ -615,7 +615,7 @@ class TestQuery:
             workspace="test_workspace",
             collection="test_collection",
             verbose=False,
-            doc_limit=10
+            fetch_limit=10
         )

         docs, chunk_ids = await query.get_docs(["concept A", "concept B"])