feat: filter and cap GraphRAG reranker input across full stack (#1021)

- Filter RDF/RDFS/OWL schema predicates (rdfs:domain, owl:inverseOf,
  etc.) out of hop traversal, keeping rdf:type because it carries data signal
- Skip edges where reranker-visible components are unlabeled IRIs, since
  the cross-encoder cannot meaningfully score raw URIs
- Add a max-reranker-input safety cap (default 350) to avoid overloading
  the reranker; the cap is applied after filtering so it keeps as many
  useful candidates as possible
- Expose max-reranker-input as per-request parameter through schema,
  translator, REST API, socket client, CLI, and OpenAPI spec
- Update tests
- Update tech spec
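The filtering and capping steps above can be sketched as one pass over the candidate edges. This is a minimal sketch under stated assumptions, not the project's code: the `Term`/`Edge` types, the `labels` lookup and `prepare_reranker_input` are hypothetical; `SCHEMA_PREDICATES` is an illustrative subset built from the two predicates the commit names plus a few other RDFS/OWL vocabulary terms; and subject, predicate and object are all assumed to be visible to the reranker.

```python
from dataclasses import dataclass
from typing import Iterable, List, Mapping

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

RDF_TYPE = RDF + "type"

# Hypothetical, illustrative subset of schema-level predicates to drop.
# rdf:type is deliberately absent: it is kept because it carries data signal.
SCHEMA_PREDICATES = {
    RDFS + "domain",
    RDFS + "range",
    RDFS + "subClassOf",
    RDFS + "subPropertyOf",
    OWL + "inverseOf",
    OWL + "equivalentClass",
    OWL + "equivalentProperty",
}

DEFAULT_MAX_RERANKER_INPUT = 350


@dataclass(frozen=True)
class Term:
    value: str
    is_iri: bool  # False means a literal, which the reranker can read as-is


@dataclass(frozen=True)
class Edge:
    s: Term
    p: Term
    o: Term


def is_schema_edge(edge: Edge) -> bool:
    """True if the edge's predicate is one of the schema predicates."""
    return edge.p.is_iri and edge.p.value in SCHEMA_PREDICATES


def has_unlabeled_iri(edge: Edge, labels: Mapping[str, str]) -> bool:
    """True if any component (assumed reranker-visible) is an IRI with no label."""
    return any(t.is_iri and t.value not in labels for t in (edge.s, edge.p, edge.o))


def prepare_reranker_input(
    edges: Iterable[Edge],
    labels: Mapping[str, str],
    max_reranker_input: int = DEFAULT_MAX_RERANKER_INPUT,
) -> List[Edge]:
    """Filter first, then cap, so dropped edges never use up cap slots."""
    if max_reranker_input < 1:
        raise ValueError("max_reranker_input must be at least 1")
    candidates = [
        e for e in edges
        if not is_schema_edge(e) and not has_unlabeled_iri(e, labels)
    ]
    # Keep the first N surviving edges in traversal order (an assumption).
    return candidates[:max_reranker_input]
```

Because the cap runs last, schema edges and edges with unlabeled IRIs never consume slots in the candidate budget. Note that in this sketch rdf:type itself needs a label entry, or the unlabeled-IRI check would drop the very edges the schema filter is meant to keep.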
cybermaggedon 2026-07-03 15:51:04 +01:00 committed by GitHub
parent 76c4763b9b
commit 68e816e65c
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
10 changed files with 198 additions and 43 deletions


@@ -42,6 +42,13 @@ properties:
     minimum: 1
     maximum: 5
     example: 3
+  max-reranker-input:
+    type: integer
+    description: Maximum candidate edges sent to the reranker per hop
+    default: 350
+    minimum: 1
+    maximum: 1000
+    example: 350
   streaming:
     type: boolean
     description: Enable streaming response delivery
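The schema fragment above constrains max-reranker-input to an integer between 1 and 1000, defaulting to 350. A minimal sketch of how a server could apply those constraints to a parsed JSON request body follows; `resolve_max_reranker_input` is a hypothetical name, and this is not the project's actual translator or REST handler.

```python
from typing import Any, Mapping

# Values taken from the OpenAPI schema fragment above.
DEFAULT = 350
MINIMUM = 1
MAXIMUM = 1000


def resolve_max_reranker_input(body: Mapping[str, Any]) -> int:
    """Return the per-request cap, falling back to the schema default."""
    value = body.get("max-reranker-input", DEFAULT)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("max-reranker-input must be an integer")
    if not MINIMUM <= value <= MAXIMUM:
        raise ValueError(
            f"max-reranker-input must be between {MINIMUM} and {MAXIMUM}, got {value}"
        )
    return value
```

Omitting the field yields the default of 350, while values outside the schema's minimum and maximum are rejected rather than silently clamped; clamping would be an equally reasonable design if the API preferred leniency.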