Fixes #826. Addresses all five points the maintainer called out in
the follow-up to #825.
Source change (trustgraph-flow/trustgraph/extract/kg/ontology/extract.py):
- Added `_is_subclass_of(cls, target, ontology_subset, max_depth=100)`
helper with visited-set cycle detection + a defensive depth cap.
LLM-generated ontologies may emit cycles (A subclass_of B,
B subclass_of A); the prior while-loop would loop forever on such input.
- Replaced the two near-identical domain and range subclass walks in
`is_valid_triple` with a single call to the new helper. Net effect:
-20 duplicated lines, +26 lines for the helper.
Tests (tests/unit/test_extract/test_ontology/test_prompt_and_extraction.py):
- test_is_valid_triple_subclass_is_accepted: domain expects Recipe,
actual type is Cake (subclass), validates.
- test_is_valid_triple_handles_subclass_cycle_without_infinite_loop:
A subclass_of B, B subclass_of A; call returns False within the
depth cap rather than hanging.
- test_parse_and_validate_triples_collects_entity_types_from_rdf_type:
end-to-end path: rdf:type triples build the entity_types dict,
subsequent domain-check triples validate against it.
- test_is_valid_triple_entity_types_none_default: the None default
path now has explicit coverage.
156 existing tests in tests/unit/test_extract/test_ontology still pass.
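The rdf:type first pass exercised by the third test can be sketched
roughly like this; `collect_entity_types` and the triple tuples are
illustrative stand-ins for whatever parse_and_validate_triples does
internally, not the real code:

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def collect_entity_types(triples):
    """Hypothetical first pass: gather rdf:type assertions so later
    domain/range checks can look up each subject's declared class."""
    entity_types = {}
    for s, p, o in triples:
        if p in (RDF_TYPE, "rdf:type"):
            entity_types.setdefault(s, set()).add(o)
    return entity_types

triples = [
    ("chocolate_cake", "rdf:type", "Cake"),
    ("chocolate_cake", "hasIngredient", "cocoa"),
]
# chocolate_cake maps to {"Cake"}; the hasIngredient triple is then
# domain-checked against that entry in the validation pass.
```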
The extract-with-ontologies prompt is a JSONL prompt, which means the
prompt service returns a PromptResult with response_type="jsonl" and
the parsed items in `.objects` (plural). The ontology extractor was
reading `.object` (singular) — the field used for response_type="json"
— which is always None for JSONL prompts.
Effect: the parser received None on every chunk, hit its "Unexpected
response type: <class 'NoneType'>" branch, returned no ExtractionResult,
and extract_with_simplified_format returned []. Every extraction
silently produced zero triples.
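A minimal sketch of the corrected field selection, assuming
PromptResult is a plain wrapper carrying `response_type` plus
`.object` (json) / `.objects` (jsonl) as described above; the
dataclass and helper name here are illustrative stand-ins:

```python
from dataclasses import dataclass
from typing import Any, List, Optional

@dataclass
class PromptResult:                     # stand-in for the real wrapper
    response_type: str
    object: Optional[Any] = None        # populated for "json" prompts
    objects: Optional[List[Any]] = None # populated for "jsonl" prompts

def items_from_prompt_result(result: PromptResult):
    """Read the field that matches the prompt's declared response_type."""
    if result.response_type == "jsonl":
        # JSONL prompts park parsed items in .objects; .object is None
        return result.objects if result.objects is not None else []
    return result.object
```

Reading `.object` unconditionally is exactly the bug: for a JSONL
prompt that field is always None, so the parser downstream saw None on
every chunk.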
Graphs were populated only with the seed ontology schema (TBox) and
document/chunk provenance, with no instance triples at all. The e2e
test threshold of >=100 edges per collection was met by schema and
provenance alone, so the failure mode stayed invisible until RAG
queries could not find any content.
Regression introduced in v2.3 with the token-usage work (commit
56d700f3 / 14e49d83) when PromptClient.prompt() began returning a
PromptResult wrapper instead of the raw text/dict/list. All other
call sites of .prompt() across retrieval/, agent/, orchestrator/ were
already reading the correct field for their prompt's response_type;
ontology extraction was the sole stranded caller.
Also adds tests/unit/test_extract/test_ontology/test_extract_with_simplified_format.py
covering:
- happy path: populated .objects produces non-empty triples
- production failure shape: .objects=None returns [] cleanly
- empty .objects returns [] without raising
- defensive: do not silently fall back to .object for a JSONL prompt
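The production-failure and defensive cases might look roughly like
this; `extract_with_simplified_format` here is a stand-in that mirrors
the documented behavior rather than the real trustgraph code, and the
mocked PromptResult fields are illustrative:

```python
from unittest.mock import MagicMock

def extract_with_simplified_format(prompt_client, chunk):
    """Stand-in mirroring the documented contract: read .objects for a
    JSONL prompt; return [] when it is None or empty, never fall back
    to .object."""
    result = prompt_client.prompt(chunk)
    return result.objects or []

def test_none_objects_returns_empty_and_no_fallback():
    client = MagicMock()
    # .object populated on purpose: the extractor must NOT read it
    client.prompt.return_value = MagicMock(objects=None, object=["leak"])
    assert extract_with_simplified_format(client, "chunk") == []
```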
* Changed schema from Value -> Term, a major breaking change
* Propagated the Value -> Term rename through all processing
* Updated Cassandra for g, p, s, o index patterns (7 indexes)
* Reviewed and updated all tests
* Neo4j, Memgraph and FalkorDB remain broken; will revisit once the schema change settles