feat: add cross-encoder reranking to Document-RAG with two-limit control (#878) (#1011)

Wire the FlashRank reranker subsystem from #1005 into Document-RAG: after vector retrieval, over-fetch a wider candidate pool, rerank with the cross-encoder, and keep the top doc_limit chunks for synthesis.

Per maintainer review, the fetch and select sizes are two caller-controlled limits rather than one internal heuristic:

- doc_limit: chunks selected into the synthesis prompt (unchanged meaning).
- fetch_limit: candidate pool pulled from the vector store before reranking. 0 = derive (OVERFETCH_FACTOR x doc_limit); values below doc_limit are raised to it. Lets the caller control how hard the reranker has to work.

Details:

- schema: DocumentRagQuery.fetch_limit (additive, backward compatible).
- document_rag.py / rag.py: fetch_limit resolved in the processor (mirrors doc_limit); the core applies the heuristic default and derives synthesis provenance from the chunk-selection focus when reranking ran.
- provenance: tg:ChunkSelection focus stage (mirrors tg:EdgeSelection).
- request translator + client SDKs + CLI: fetch-limit / --fetch-limit, threaded exactly like doc_limit and the GraphRAG limits.
- tests: no-op identity, over-fetch/narrow, explicit fetch_limit, heuristic default, floor-at-doc_limit, provenance lineage, cross-repo topic wiring.

Reranking is skipped byte-identically when no reranker role is wired.

Requires the companion trustgraph-templates change wiring the reranker topics into the document-rag flow (mirrors #279 for GraphRAG).
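
The two-limit rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the function names `resolve_fetch_limit` and `select_chunks` are hypothetical, the value of `OVERFETCH_FACTOR` is assumed (the message names the constant but not its value), and the scoring callable stands in for the cross-encoder.

```python
from typing import Callable, List, Sequence

# Assumed value for illustration only; the commit message does not state it.
OVERFETCH_FACTOR = 4


def resolve_fetch_limit(doc_limit: int, fetch_limit: int = 0) -> int:
    """Return the candidate-pool size to request from the vector store."""
    if fetch_limit == 0:
        # 0 means "derive": widen the pool by the over-fetch factor.
        return OVERFETCH_FACTOR * doc_limit
    # An explicit pool smaller than doc_limit is raised to doc_limit.
    return max(fetch_limit, doc_limit)


def select_chunks(
    candidates: Sequence[str],
    score: Callable[[str], float],
    doc_limit: int,
) -> List[str]:
    """Order candidates by descending score and keep the top doc_limit."""
    ranked = sorted(candidates, key=score, reverse=True)
    return ranked[:doc_limit]


if __name__ == "__main__":
    print(resolve_fetch_limit(doc_limit=20))                  # derived pool
    print(resolve_fetch_limit(doc_limit=20, fetch_limit=5))   # floored
    print(resolve_fetch_limit(doc_limit=20, fetch_limit=50))  # explicit
```

Under these assumptions, a caller who leaves fetch_limit at 0 gets a pool proportional to doc_limit, while an explicit fetch_limit trades reranker work against recall without changing how many chunks reach the synthesis prompt.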
2026-07-03 06:51:00 +02:00 · 2026-07-02 02:50:13 -06:00 · 6c9a545a06
commit 6c9a545a06
parent f18d48dc39
18 changed files with 853 additions and 26 deletions
--- a/tests/unit/test_retrieval/test_document_rag.py
+++ b/tests/unit/test_retrieval/test_document_rag.py
@@ -101,27 +101,27 @@ class TestQuery:
        assert query.rag == mock_rag
        assert query.collection == "test_collection"
        assert query.verbose is False
-        assert query.doc_limit == 20  # Default value
+        assert query.fetch_limit == 20  # Default value

-    def test_query_initialization_with_custom_doc_limit(self):
-        """Test Query initialization with custom doc_limit"""
+    def test_query_initialization_with_custom_fetch_limit(self):
+        """Test Query initialization with custom fetch_limit"""
        # Create mock DocumentRag
        mock_rag = MagicMock()

-        # Initialize Query with custom doc_limit
+        # Initialize Query with custom fetch_limit
        query = Query(
            rag=mock_rag,
            workspace="test_workspace",
            collection="custom_collection",
            verbose=True,
-            doc_limit=50
+            fetch_limit=50
        )

        # Verify initialization
        assert query.rag == mock_rag
        assert query.collection == "custom_collection"
        assert query.verbose is True
-        assert query.doc_limit == 50
+        assert query.fetch_limit == 50

    @pytest.mark.asyncio
    async def test_extract_concepts(self):
@@ -224,7 +224,7 @@ class TestQuery:
            workspace="test_workspace",
            collection="test_collection",
            verbose=False,
-            doc_limit=15
+            fetch_limit=15
        )

        # Call get_docs with concepts list
@@ -377,7 +377,7 @@ class TestQuery:
            workspace="test_workspace",
            collection="test_collection",
            verbose=True,
-            doc_limit=5
+            fetch_limit=5
        )

        # Call get_docs with concepts
@@ -615,7 +615,7 @@ class TestQuery:
            workspace="test_workspace",
            collection="test_collection",
            verbose=False,
-            doc_limit=10
+            fetch_limit=10
        )

        docs, chunk_ids = await query.get_docs(["concept A", "concept B"])