The extract-with-ontologies prompt is a JSONL prompt, which means the
prompt service returns a PromptResult with response_type="jsonl" and
the parsed items in `.objects` (plural). The ontology extractor was
reading `.object` (singular) — the field used for response_type="json"
— which is always None for JSONL prompts.
Effect: the parser received None on every chunk, hit its "Unexpected
response type: <class 'NoneType'>" branch, returned no ExtractionResult,
and extract_with_simplified_format returned []. Every extraction
silently produced zero triples.
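A minimal sketch of the corrected field selection, assuming only the
PromptResult fields named above (response_type, object, objects); the
helper itself is hypothetical, not the actual extractor code:

```python
def items_from_prompt_result(result) -> list:
    # Hypothetical helper; only the PromptResult field names
    # (response_type, object, objects) come from this message.
    if result.response_type == "jsonl":
        # JSONL prompts populate .objects; .object stays None, which is
        # exactly the field the extractor was wrongly reading.
        return result.objects or []
    if result.response_type == "json":
        return [result.object] if result.object is not None else []
    raise ValueError(f"Unexpected response type: {type(result)}")
```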
Graphs were populated only with the seed ontology schema (TBox) and

document/chunk provenance — no instance triples at all. The e2e test
threshold of >=100 edges per collection was met by schema + provenance
alone, so the failure mode was invisible until RAG queries couldn't
find any content.
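One hypothetical way to tighten that check so schema and provenance
alone can no longer satisfy it is to count only instance edges. The
namespaces below are illustrative assumptions, not the project's
actual vocabulary:

```python
# Illustrative assertion: count edges whose predicate is neither
# schema (TBox) nor provenance vocabulary. Namespaces are assumptions;
# the real filter depends on the seed ontology and provenance scheme.
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
PROV = "http://www.w3.org/ns/prov#"

def count_instance_edges(triples):
    return sum(
        1 for s, p, o in triples
        if not p.startswith((RDFS, OWL, PROV))
    )

# assert count_instance_edges(collection_triples) >= 100
```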
Regression introduced in v2.3 with the token-usage work (commit
56d700f3 / 14e49d83) when PromptClient.prompt() began returning a
PromptResult wrapper instead of the raw text/dict/list. All other
call sites of .prompt() across retrieval/, agent/, orchestrator/ were
already reading the correct field for their prompt's response_type;
ontology extraction was the sole stranded caller.
Also adds tests/unit/test_extract/test_ontology/test_extract_with_simplified_format.py
covering (a test sketch follows this list):
- happy path: populated .objects produces non-empty triples
- production failure shape: .objects=None returns [] cleanly
- empty .objects returns [] without raising
- defensive: do not silently fall back to .object for a JSONL prompt
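A sketch of the failure-shape cases, assuming a pytest-asyncio setup;
the OntologyExtractor import path, constructor, and chunk argument are
assumptions beyond the names given above:

```python
from types import SimpleNamespace
from unittest.mock import AsyncMock

import pytest

# from ...extract.ontology import OntologyExtractor  # assumed path

def jsonl_result(objects):
    # Stand-in for PromptResult with the fields named above.
    return SimpleNamespace(
        response_type="jsonl", text=None, object=None, objects=objects,
    )

@pytest.mark.asyncio
async def test_objects_none_returns_empty_list():
    client = AsyncMock()
    client.prompt.return_value = jsonl_result(None)  # production shape
    extractor = OntologyExtractor(prompt_client=client)  # assumed ctor
    assert await extractor.extract_with_simplified_format("chunk") == []

@pytest.mark.asyncio
async def test_empty_objects_returns_empty_list():
    client = AsyncMock()
    client.prompt.return_value = jsonl_result([])
    extractor = OntologyExtractor(prompt_client=client)  # assumed ctor
    assert await extractor.extract_with_simplified_format("chunk") == []
```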
Expose LLM token usage (in_token, out_token, model) across all
service layers
Propagate token counts from LLM services through the prompt,
text-completion, graph-RAG, document-RAG, and agent orchestrator
pipelines to the API gateway and Python SDK. All fields are Optional;
None means "not available", as distinct from a real zero count.
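A minimal sketch of that None-vs-zero semantics; the TokenUsage
dataclass is illustrative, the real fields live on the response
schemas listed under Key changes:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TokenUsage:
    in_token: Optional[int] = None    # None: provider reported nothing
    out_token: Optional[int] = None
    model: Optional[str] = None

usage = TokenUsage(in_token=0, out_token=12, model="example-model")
assert usage.in_token == 0            # a real zero, not "unavailable"
assert TokenUsage().in_token is None  # unavailable, not zero
```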
Key changes:
- Schema: Add in_token/out_token/model to TextCompletionResponse,
PromptResponse, GraphRagResponse, DocumentRagResponse,
AgentResponse
- TextCompletionClient: New TextCompletionResult return type. Split
into text_completion() (non-streaming) and text_completion_stream()
(streaming with a per-chunk handler callback); usage is sketched
after this list
- PromptClient: New PromptResult with response_type
(text/json/jsonl), typed fields (text/object/objects), and token
usage. All callers updated.
- RAG services: Accumulate token usage across all prompt calls
(extract-concepts, edge-scoring, edge-reasoning, synthesis). The
non-streaming path sends a single combined response instead of
chunk + end_of_session.
- Agent orchestrator: UsageTracker accumulates tokens across
meta-router, pattern prompt calls, and react reasoning. Attached
to end_of_dialog; the accumulation pattern is sketched after this
list.
- Translators: Encode token fields only when present, using an
`is not None` check rather than truthiness so a real zero count is
still encoded
- Python SDK: RAG and text-completion methods return
TextCompletionResult (non-streaming) or RAGChunk/AgentAnswer with
token fields (streaming)
- CLI: --show-usage flag on tg-invoke-llm, tg-invoke-prompt,
tg-invoke-graph-rag, tg-invoke-document-rag, tg-invoke-agent
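A usage sketch of the split TextCompletionClient API from the list
above. The method and field names come from this message; the client
construction and the system/prompt/handler parameter names are
assumptions:

```python
async def demo(client):
    # Non-streaming: one TextCompletionResult carrying text + usage.
    result = await client.text_completion(system="be brief", prompt="hi")
    print(result.text, result.in_token, result.out_token, result.model)

    # Streaming: per-chunk handler callback, as described above.
    async def on_chunk(chunk: str):
        print(chunk, end="", flush=True)

    await client.text_completion_stream(
        system="be brief", prompt="hi", handler=on_chunk,
    )
```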
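And a sketch of the accumulation pattern shared by the RAG services
and the agent UsageTracker: sum counts across prompt calls while
keeping None ("not available") distinct from zero. Only the class
name comes from this message; the internals are assumptions.

```python
from typing import Optional

class UsageTracker:
    """Accumulates token usage across multiple prompt calls (sketch)."""

    def __init__(self) -> None:
        self.in_token: Optional[int] = None
        self.out_token: Optional[int] = None
        self.model: Optional[str] = None

    def add(self, in_token: Optional[int], out_token: Optional[int],
            model: Optional[str]) -> None:
        # None contributes nothing; a reported 0 still flips the total
        # from None to 0, preserving the None-vs-zero distinction.
        if in_token is not None:
            self.in_token = (self.in_token or 0) + in_token
        if out_token is not None:
            self.out_token = (self.out_token or 0) + out_token
        if model is not None:
            self.model = model  # last reporting call wins
```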
* Changed the schema from Value -> Term, a major breaking change
* Propagated the Value -> Term change through all processing
* Updated Cassandra with g, p, s, o index patterns (7 indexes)
* Reviewed and updated all tests
* Neo4j, Memgraph, and FalkorDB remain broken; will revisit once
things settle down