Tech spec

2026-04-25 00:16:23 +02:00 · 2026-04-14 20:31:42 +01:00 · 2026-04-14 20:31:42 +01:00 · 0f4c91b99a
commit 0f4c91b99a
parent ce3c8b421b
1 changed files with 354 additions and 0 deletions
--- a/docs/tech-specs/ontology-analytics-service.md
+++ b/docs/tech-specs/ontology-analytics-service.md
@ -0,0 +1,354 @@
 ---
 layout: default
 title: "Ontology Analytics Service Technical Specification"
 parent: "Tech Specs"
 ---
 # Ontology Analytics Service Technical Specification
 ## Overview
 This specification proposes extracting ontology analytics — the embedding,
 vector storage, and similarity-based selection of ontology elements — out of
 `kg-extract-ontology` and into a separate, reusable service. The goal is to
 make ontology analytics available to processors other than the extractor,
 preload the analytics state at ontology-load time rather than on-demand, and
 simplify the current per-flow duplication of vector stores.
 ## Problem Statement
 The current implementation in `kg-extract-ontology`
 (`trustgraph-flow/trustgraph/extract/kg/ontology/extract.py`) embeds
 ontology analytics inside a single processor. This has four concrete
 problems:
 1. **Analytics are locked inside one processor.** Other services that
   could benefit from ontology-similarity lookups (e.g. relationship
   extraction, agent tool selection, query routing, future features that
   don't exist yet) have no way to call into the analytics without
   duplicating the embedder/vector store/selector machinery. The
   extractor owns state that conceptually belongs to the platform.
 2. **Vector stores are built lazily, per-flow, on first message.** Each
   flow that uses `kg-extract-ontology` gets its own
   `{embedder, vector_store, selector}` triple, keyed by `id(flow)`, and
   the vector store is only populated the first time a message arrives
   on that flow. This has several consequences:
   - **Cold-start cost on the hot path.** The first document chunk
     processed by a flow pays the full cost of embedding every element
     of the ontology (hundreds to thousands of calls to the embeddings
     service) before extraction can even begin. Subsequent chunks are
     fast, but the first one can take seconds to minutes depending on
     ontology size.
   - **N copies of the same data.** If three flows all use the same
     embedding model and the same ontology, three identical in-memory
     vector stores are built and maintained. Memory scales with the
     number of flows, not the number of ontologies.
   - **Total loss on restart.** Vector stores are `InMemoryVectorStore`
     instances — nothing is persisted. A restart forces every flow to
     re-embed the entire ontology on its next message.
   - **Ontology updates re-embed everything.** When the ontology config
     changes, `flow_components` is cleared and the re-embedding happens
     lazily across every flow, again on the hot path.
   The per-flow split exists for a real reason: different flows may use
   different embedding models with different vector dimensions, and the
   current code detects the dimension by probing the flow's embeddings
   service with a test string. But in practice most deployments use one
   embedding model, and the per-flow split is paying a cost for
   flexibility that isn't being used.
 3. **Per-message selection is not batched.** For every document chunk
   processed, the selector calls `embedder.embed_text(segment.text)`
   once per segment (`ontology_selector.py:96`), each of which
   round-trips through the pub/sub layer to the embeddings service as a
   single-element batch. A chunk with 20 segments fires 20 sequential
   embeddings requests when it could fire one. Ontology ingest is
   already batched at 50 at a time; per-message lookups are not.
 4. **There is no way to scope ontologies to a flow.** All loaded
   ontologies are visible to all flows. A flow that only cares about,
   say, the FIBO ontology still pays for embedding and similarity search
   across every unrelated ontology in config. This is a usability and
   performance problem that gets worse as the number of loaded
   ontologies grows.
 Together these problems mean the current implementation is tightly
 coupled, duplicates work across flows, pays its worst costs on the
 hot path, and can't be reused. Each of the four goals below addresses
 one of these problems directly.
 ## Goals
 - **Eager ontology processing.** When an ontology is loaded (via config
  push), embed it and populate the vector store immediately, before any
  document processing messages arrive. First-chunk latency should not
  include ontology embedding cost.
 - **Simplify away from per-flow analytics.** Move the vector store out
  of per-flow scope. Revisit whether per-flow dimension detection is
  still necessary, or whether a shared analytics store (per embedding
  model, or globally) is sufficient for realistic deployments.
 - **Extract ontology analytics into a standalone service.** Expose
  embedding, similarity search, and ontology-subset selection as a
  service callable from any processor via the normal pub/sub
  request/response pattern. `kg-extract-ontology` becomes one consumer
  among many.
 - **(Stretch)** Allow flows to select which ontologies they use when
  the flow is started, so a flow can restrict itself to a named subset
  of the loaded ontologies rather than seeing all of them.
 ## Background
 The current ontology analytics live in four files under
 `trustgraph-flow/trustgraph/extract/kg/ontology/`:
 - `ontology_loader.py` — parses ontology definitions from config.
 - `ontology_embedder.py` — generates embeddings for ontology elements
  and stores them in an `InMemoryVectorStore`. Batches at 50 elements
  per call on ingest.
 - `vector_store.py` — `InMemoryVectorStore`, a numpy-backed dense vector
  store with similarity search.
 - `ontology_selector.py` — given a query (a document segment), returns
  the top-K most relevant ontology elements via similarity search.
 These are wired together inside `extract.py`, where `Processor` holds
 one `OntologyLoader` and one `TextProcessor` but maintains a
 `flow_components` dict mapping `id(flow) →
 {embedder, vector_store, selector}`. `initialize_flow_components` is
 called on the first message for each flow; ontology config changes
 trigger `self.flow_components.clear()` to force re-initialisation.
 The analytics stack is conceptually independent from the extractor —
 it's just a "given text, find relevant ontology elements" service —
 but because it's embedded in the extractor, no other processor can use
 it, and its lifecycle is coupled to message processing rather than to
 ontology loading.
 ## Technical Design
 ### Architecture
 A new processor, tentatively `ontology-analytics`, hosts the embedder,
 vector store, and selector. It is a `FlowProcessor` — flow-aware so
 future per-user state management can hook in at the flow boundary —
 but the analytics state it holds is **global to the process**, not
 per-flow. Shared state is the default; flow scope is just a handle
 available when needed.
 Crucially, the service does **not** call a separate embeddings service
 for its own embedding work. It loads and runs an embedding model
 directly, in-process, the same way `embeddings-fastembed` does today.
 The rationale: the embedding model used for ontology analytics is a
 deployment choice driven by the ontology's semantics, not by the
 user's document embedding model. Coupling the two would be an
 accidental constraint. Decoupling means:
 - The analytics service has no dependency on a flow's
  `embeddings-request` service. No routing, no external call, no
  async dependency on another processor being up.
 - Ontology ingest happens synchronously in the analytics process as
  soon as the ontology config is loaded. No round-trips.
 - Per-message selection (embedding a query segment and searching the
  vector store) is also local. Fast path, no pub/sub hop for
  embedding.
 - Most deployments will use a single embedding model configured on
  the analytics service. If a deployment needs two, run two
  `ontology-analytics` processor instances with different ids and
  have flows address the one they want.
 The flow configuration optionally specifies which analytics service
 id to use and which ontologies the flow cares about; otherwise the
 flow sees all loaded ontologies through the default analytics
 service.
 Components inside the new processor:
 1. **OntologyLoader** (moved from `kg-extract-ontology`)
   Parses ontologies from config-push messages. Owns the parsed
   ontology objects.
 2. **In-process embedding model**
   Loaded once at startup using the service's configured model name.
   Same pattern as `embeddings-fastembed` — direct library call,
   batched.
 3. **Vector store (global)**
   One `InMemoryVectorStore` per loaded ontology, not per flow.
   Populated eagerly when the ontology is loaded (or reloaded).
   Lives in process memory for the lifetime of the service.
 4. **Selector**
   Given a query text and a subset of ontology ids, returns the
   top-K most relevant ontology elements across the union of those
   stores. Batches embedding calls for multi-segment queries.
 5. **Request handler**
   Exposes a request/response service over the pub/sub layer for
   other processors (initially `kg-extract-ontology`, later
   others) to call.
 Module: `trustgraph-flow/trustgraph/ontology_analytics/` (new)
 ### Data Models
 #### Service request
 ```
 OntologyAnalyticsRequest {
    query_texts: list[str]         # batch of query segments
    ontology_ids: list[str] | None # optional subset; None = all
    top_k: int                     # default per-service
    similarity_threshold: float    # default per-service
 }
 ```
 #### Service response
 ```
 OntologyAnalyticsResponse {
    results: list[list[OntologyMatch]]  # one list per query_text
 }
 OntologyMatch {
    element_id: str        # e.g. "fibo:Bond"
    element_type: str      # class | property | individual
    ontology_id: str       # which ontology it came from
    text: str              # the text that was embedded
    score: float           # similarity score
 }
 ```
 The request is a batch by construction: the caller sends a list of
 segments, the service embeds them in one batched embedding call,
 searches the relevant stores, and returns a list of match-lists
 aligned with the input. This directly fixes the per-segment
 unbatched call in the current selector.
 #### Flow configuration
 Flow instance config gains an optional `ontology_analytics` block:
 ```
 ontology_analytics:
  service_id: ontology-analytics   # default
  ontologies: [fibo, geonames]     # default: all loaded
 ```
 If omitted, flows use the default analytics service id and see every
 loaded ontology.
 ### APIs
 New services:
 - `ontology-analytics` request/response service, schema as above.
 New flow service client:
 - `flow("ontology-analytics-request")` — standard flow-scoped client
  wrapper for calling the analytics service.
 Modified code paths:
 - `kg-extract-ontology` no longer owns the embedder, vector store,
  or selector. Its per-message logic calls
  `flow("ontology-analytics-request")` with the segments extracted
  from the chunk and gets back the same shape of data it used to
  compute locally.
 - The `on_ontology_config` handler in `kg-extract-ontology` goes
  away; config-push of ontology types is handled by the new
  service.
 ### Implementation Details
 - **Eager ingest.** When `on_ontology_config` fires on the analytics
  service, the service synchronously (well, in an asyncio task)
  re-embeds the changed ontologies and atomically swaps the new
  vector stores in. Per-ontology granularity — unchanged ontologies
  are not re-embedded.
 - **Ontology updates as cutover.** Once a new ontology version is
  swapped in, all subsequent requests see the new state. In-flight
  requests complete against whichever version they started reading.
  No version pinning; callers who care take responsibility.
 - **Shared vs flow state.** The processor keeps a flow dict for
  future per-user additions but the vector stores themselves are
  module-level (on `self`, not on per-flow state). The flow is only
  used to find the caller's `ontology_analytics` config block for
  this request.
 - **Batching.** Both ontology ingest and per-request query embedding
  use the in-process embedder's batched interface. No per-element
  round-trips anywhere on the hot path.
 ## Security Considerations
 *(To be filled in.)*
 ## Performance Considerations
 *(To be filled in — but note up front that batching per-message
 selector calls is an easy pre-existing win that doesn't require the
 new service, and should probably land separately first.)*
 ## Testing Strategy
 *(To be filled in.)*
 ## Migration Plan
 *(To be filled in — needs to cover how `kg-extract-ontology`
 transitions from owning the analytics to calling a new service, and
 what happens to in-flight flows during the rollover.)*
 ## Resolved Questions
 - **Persistence:** in-memory only. Ontology data is small, restart
  cost is acceptable, and adding a storage backend (Qdrant, local
  file, etc.) isn't justified.
 - **Per-embedding-model scoping:** the service has its own embedding
  model, loaded in-process, independent of any flow's embeddings
  service. Deployments needing multiple analytics embedding models
  run multiple `ontology-analytics` instances.
 - **Flow-scoped ontology selection:** configured in flow instance
  config under an `ontology_analytics` block, same place as other
  flow-scoped service selection (LLM, embedding model, etc.).
 - **Ontology-update semantics:** cutover. Once a new ontology version
  is swapped in, subsequent requests see the new state. No version
  pinning; users take responsibility for dangerous changes.
 - **Processor shape:** FlowProcessor. Keeps the flow boundary
  available as a hook for future per-user state management, even
  though the analytics state itself is global to the process.
 - **Embedding model configuration:** processor argument (e.g.
  `--embedding-model all-MiniLM-L6-v2`), not flow-class parameter.
  Default is minilm. Deployments needing a different model restart
  the service with a different argument; multi-model deployments
  run multiple service instances.
 - **`kg-extract-ontology` after the split:** still owns LLM
  invocation, prompt construction, triple building, and emission.
  The only change is that it calls
  `flow("ontology-analytics-request")` to get the relevant ontology
  subset for a chunk, instead of maintaining its own embedder,
  vector store, and selector. The local `OntologyLoader`,
  `OntologyEmbedder`, `OntologySelector`, `InMemoryVectorStore` and
  the per-flow `flow_components` dict all go away.
 ## Future Work
 - **Dynamic ontology learning.** A future flow may identify new
  potential ontology elements at runtime and extend the ontology in
  a learning/bootstrap fashion. That this is even feasible is a
  useful validation of the split: the analytics service becomes a
  reusable component with broader applications than the initial
  extractor use case. Several write-path shapes are viable —
  config-round-trip (learning flow writes back to config, normal
  config-push triggers re-embed, one source of truth, heavier per
  element) or a direct add API on the service (cheaper per element,
  but creates two sources of truth and loses state on restart). The
  config-round-trip feels like the right default but the decision
  is deferred until a concrete learning flow is designed.
 ## References
 - Current implementation:
  `trustgraph-flow/trustgraph/extract/kg/ontology/`
 - Related existing spec: `ontorag.md`