Commit graph

4 commits

Author SHA1 Message Date
cybermaggedon
6cbaf88fc6 fix: ontology extractor reads .objects, not .object, from PromptResult (#842)
The extract-with-ontologies prompt is a JSONL prompt, which means the
prompt service returns a PromptResult with response_type="jsonl" and
the parsed items in `.objects` (plural).  The ontology extractor was
reading `.object` (singular) — the field used for response_type="json"
— which is always None for JSONL prompts.

Effect: the parser received None on every chunk, hit its "Unexpected
response type: <class 'NoneType'>" branch, returned no ExtractionResult,
and extract_with_simplified_format returned []. Every extraction
silently produced zero triples.

Graphs populated only with the seed ontology schema (TBox) and
document/chunk provenance — no instance triples at all.  The e2e test
threshold of >=100 edges per collection was met by schema + provenance
alone, so the failure mode was invisible until RAG queries couldn't
find any content.

Regression introduced in v2.3 with the token-usage work (commit
56d700f3 / 14e49d83) when PromptClient.prompt() began returning a
PromptResult wrapper instead of the raw text/dict/list.  All other
call sites of .prompt() across retrieval/, agent/, orchestrator/ were
already reading the correct field for their prompt's response_type;
ontology extraction was the sole stranded caller.

Also adds tests/unit/test_extract/test_ontology/test_extract_with_simplified_format.py
covering:
  - happy path: populated .objects produces non-empty triples
  - production failure shape: .objects=None returns [] cleanly
  - empty .objects returns [] without raising
  - defensive: do not silently fall back to .object for a JSONL prompt
2026-04-22 12:10:42 +01:00
Het Patel
391b9076f3 feat: add domain and range validation to triple extraction in extract.py (#825) 2026-04-17 11:29:57 +01:00
cybermaggedon
cf0daedefa
Changed schema for Value -> Term, majorly breaking change (#622)
* Changed schema for Value -> Term, majorly breaking change

* Following the schema change, Value -> Term into all processing

* Updated Cassandra for g, p, s, o index patterns (7 indexes)

* Reviewed and updated all tests

* Neo4j, Memgraph and FalkorDB remain broken, will look at once settled down
2026-01-27 13:48:08 +00:00
cybermaggedon
6c85038c75
Ontology extraction tests (#560) 2025-11-13 20:02:12 +00:00