Commit graph

68 commits

Author SHA1 Message Date
cybermaggedon
d7745baab4
Add Kafka pub/sub backend (#830)
Third backend alongside Pulsar and RabbitMQ. Topics map 1:1 to Kafka
topics, subscriptions map to consumer groups. Response/notify uses
unique consumer groups with correlation ID filtering. Topic lifecycle
managed via AdminClient with class-based retention.

Initial code drop: Needs major integration testing
2026-04-18 11:18:34 +01:00
cybermaggedon
9f84891fcc
Flow service lifecycle management (#822)
feat: separate flow service from config service with explicit queue
lifecycle management

The flow service is now an independent service that owns the lifecycle
of flow and blueprint queues. System services own their own queues.
Consumers never create queues.

Flow service separation:
- New service at trustgraph-flow/trustgraph/flow/service/
- Uses async ConfigClient (RequestResponse pattern) to talk to config
  service
- Config service stripped of all flow handling

Queue lifecycle management:
- PubSubBackend protocol gains create_queue, delete_queue,
  queue_exists, ensure_queue — all async
- RabbitMQ: implements via pika with asyncio.to_thread internally
- Pulsar: stubs for future admin REST API implementation
- Consumer _connect() no longer creates queues (passive=True for named
  queues)
- System services call ensure_queue on startup
- Flow service creates queues on flow start, deletes on flow stop
- Flow service ensures queues for pre-existing flows on startup

Two-phase flow stop:
- Phase 1: set flow status to "stopping", delete processor config
  entries
- Phase 2: retry queue deletion, then delete flow record

Config restructure:
- active-flow config replaced with processor:{name} types
- Each processor has its own config type, each flow variant is a key
- Flow start/stop use batch put/delete — single config push per
  operation
- FlowProcessor subscribes to its own type only

Blueprint format:
- Processor entries split into topics and parameters dicts
- Flow interfaces use {"flow": "topic"} instead of bare strings
- Specs (ConsumerSpec, ProducerSpec, etc.) read from
  definition["topics"]

Tests updated
2026-04-16 17:19:39 +01:00
Alex Jenkins
8954fa3ad7 Feat: TrustGraph i18n & Documentation Translation Updates (#781)
Native CLI i18n: The TrustGraph CLI has built-in translation support
that dynamically loads language strings. You can test and use
different languages by simply passing the --lang flag (e.g., --lang
es for Spanish, --lang ru for Russian) or by configuring your
environment's LANG variable.

Automated Docs Translations: This PR introduces autonomously
translated Markdown documentation into several target languages,
including Spanish, Swahili, Portuguese, Turkish, Hindi, Hebrew,
Arabic, Simplified Chinese, and Russian.
2026-04-14 12:08:32 +01:00
cybermaggedon
1515dbaf08
Updated tech specs for agent & explainability changes (#796) 2026-04-13 17:29:24 +01:00
cybermaggedon
ddd4bd7790
Deliver explainability triples inline in retrieval response stream (#763)
Provenance triples are now included directly in explain messages from
GraphRAG, DocumentRAG, and Agent services, eliminating the need for
follow-up knowledge graph queries to retrieve explainability details.

Each explain message in the response stream now carries:
- explain_id: root URI for this provenance step (unchanged)
- explain_graph: named graph where triples are stored (unchanged)
- explain_triples: the actual provenance triples for this step (new)

Changes across the stack:
- Schema: added explain_triples field to GraphRagResponse,
  DocumentRagResponse, and AgentResponse
- Services: all explain message call sites pass triples through
  (graph_rag, document_rag, agent react, agent orchestrator)
- Translators: encode explain_triples via TripleTranslator for
  gateway wire format
- Python SDK: ProvenanceEvent now includes parsed ExplainEntity
  and raw triples; expanded event_type detection
- CLI: invoke_graph_rag, invoke_agent, invoke_document_rag use
  inline entity when available, fall back to graph query
- Tech specs updated

Additional explainability test
2026-04-07 12:19:05 +01:00
cybermaggedon
4acd853023
Config push notify pattern: replace stateful pub/sub with signal+ fetch (#760)
Replace the config push mechanism that broadcast the full config
blob on a 'state' class pub/sub queue with a lightweight notify
signal containing only the version number and affected config
types. Processors fetch the full config via request/response from
the config service when notified.

This eliminates the need for the pub/sub 'state' queue class and
stateful pub/sub services entirely. The config push queue moves
from 'state' to 'flow' class — a simple transient signal rather
than a retained message.  This solves the RabbitMQ
late-subscriber problem where restarting processes never received
the current config because their fresh queue had no historical
messages.

Key changes:
- ConfigPush schema: config dict replaced with types list
- Subscribe-then-fetch startup with retry: processors subscribe
  to notify queue, fetch config via request/response, then
  process buffered notifies with version comparison to avoid race
  conditions
- register_config_handler() accepts optional types parameter so
  handlers only fire when their config types change
- Short-lived config request/response clients to avoid subscriber
  contention on non-persistent response topics
- Config service passes affected types through put/delete/flow
  operations
- Gateway ConfigReceiver rewritten with same notify pattern and
  retry loop

Tests updated

New tests:
- register_config_handler: without types, with types, multiple
  types, multiple handlers
- on_config_notify: old/same version skipped, irrelevant types
  skipped (version still updated), relevant type triggers fetch,
  handler without types always called, mixed handler filtering,
  empty types invokes all, fetch failure handled gracefully
- fetch_config: returns config+version, raises on error response,
  stops client even on exception
- fetch_and_apply_config: applies to all handlers on startup,
  retries on failure
2026-04-06 16:57:27 +01:00
cybermaggedon
d9dc4cbab5
SPARQL query service (#754)
SPARQL 1.1 query service wrapping pub/sub triples interface

Add a backend-agnostic SPARQL query service that parses SPARQL
queries using rdflib, decomposes them into triple pattern lookups
via the existing TriplesClient pub/sub interface, and performs
in-memory joins, filters, and projections.

Includes:
- SPARQL parser, algebra evaluator, expression evaluator, solution
  sequence operations (BGP, JOIN, OPTIONAL, UNION, FILTER, BIND,
  VALUES, GROUP BY, ORDER BY, LIMIT/OFFSET, DISTINCT, aggregates)
- FlowProcessor service with TriplesClientSpec
- Gateway dispatcher, request/response translators, API spec
- Python SDK method (FlowInstance.sparql_query)
- CLI command (tg-invoke-sparql-query)
- Tech spec (docs/tech-specs/sparql-query.md)

New unit tests for SPARQL query
2026-04-02 17:21:39 +01:00
cybermaggedon
4fb0b4d8e8
Pub/sub abstraction: decouple from Pulsar (#751)
Remove Pulsar-specific concepts from application code so that
the pub/sub backend is swappable via configuration.

Rename translators:
- to_pulsar/from_pulsar → decode/encode across all translator
  classes, dispatch handlers, and tests (55+ files)
- from_response_with_completion → encode_with_completion
- Remove pulsar.schema.Record from translator base class

Queue naming (CLASS:TOPICSPACE:TOPIC):
- Replace topic() helper with queue() using new format:
  flow:tg:name, request:tg:name, response:tg:name, state:tg:name
- Queue class implies persistence/TTL (no QoS in names)
- Update Pulsar backend map_topic() to parse new format
- Librarian queues use flow class (persistent, for chunking)
- Config push uses state class (persistent, last-value)
- Remove 15 dead topic imports from schema files
- Update init_trustgraph.py namespace: config → state

Confine Pulsar to pulsar_backend.py:
- Delete legacy PulsarClient class from pubsub.py
- Move add_args to add_pubsub_args() with standalone flag
  for CLI tools (defaults to localhost)
- PulsarBackendConsumer.receive() catches _pulsar.Timeout,
  raises standard TimeoutError
- Remove Pulsar imports from: async_processor, flow_processor,
  log_level, all 11 client files, 4 storage writers, gateway
  service, gateway config receiver
- Remove log_level/LoggerLevel from client API
- Rewrite tg-monitor-prompts to use backend abstraction
- Update tg-dump-queues to use add_pubsub_args

Also: pubsub-abstraction.md tech spec covering problem statement,
design goals, as-is requirements, candidate broker assessment,
approach, and implementation order.
2026-04-01 20:16:53 +01:00
cybermaggedon
849987f0e6
Add multi-pattern orchestrator with plan-then-execute and supervisor (#739)
Introduce an agent orchestrator service that supports three
execution patterns (ReAct, plan-then-execute, supervisor) with
LLM-based meta-routing to select the appropriate pattern and task
type per request. Update the agent schema to support
orchestration fields (correlation, sub-agents, plan steps) and
remove legacy response fields (answer, thought, observation).
2026-03-31 00:32:49 +01:00
cybermaggedon
5c6fe90fe2
Add universal document decoder with multi-format support (#705)
Add universal document decoder with multi-format support
using 'unstructured'.

New universal decoder service powered by the unstructured
library, handling DOCX, XLSX, PPTX, HTML, Markdown, CSV, RTF,
ODT, EPUB and more through a single service. Tables are preserved
as HTML markup for better downstream extraction. Images are
stored in the librarian but excluded from the text
pipeline. Configurable section grouping strategies
(whole-document, heading, element-type, count, size) for non-page
formats. Page-based formats (PDF, PPTX, XLSX) are automatically
grouped by page.

All four decoders (PDF, Mistral OCR, Tesseract OCR, universal)
now share the "document-decoder" ident so they are
interchangeable.  PDF-only decoders fetch document metadata to
check MIME type and gracefully skip unsupported formats.

Librarian changes: removed MIME type whitelist validation so any
document format can be ingested. Simplified routing so text/plain
goes to text-load and everything else goes to document-load.
Removed dual inline/streaming data paths — documents always use
document_id for content retrieval.

New provenance entity types (tg:Section, tg:Image) and metadata
predicates (tg:elementTypes, tg:tableCount, tg:imageCount) for
richer explainability.

Universal decoder is in its own package (trustgraph-unstructured)
and container image (trustgraph-unstructured).
2026-03-23 12:56:35 +00:00
cybermaggedon
64e3f6bd0d
Subgraph provenance (#694)
Replace per-triple provenance reification with subgraph model

Extraction provenance previously created a full reification (statement
URI, activity, agent) for every single extracted triple, producing ~13
provenance triples per knowledge triple.  Since each chunk is processed
by a single LLM call, this was both redundant and semantically
inaccurate.

Now one subgraph object is created per chunk extraction, with
tg:contains linking to each extracted triple.  For 20 extractions from
a chunk this reduces provenance from ~260 triples to ~33.

- Rename tg:reifies -> tg:contains, stmt_uri -> subgraph_uri
- Replace triple_provenance_triples() with subgraph_provenance_triples()
- Refactor kg-extract-definitions and kg-extract-relationships to
  generate provenance once per chunk instead of per triple
- Add subgraph provenance to kg-extract-ontology and kg-extract-agent
  (previously had none)
- Update CLI tools and tech specs to match

Also rename tg-show-document-hierarchy to tg-show-extraction-provenance.

Added extra typing for extraction provenance, fixed extraction prov CLI
2026-03-13 11:37:59 +00:00
cybermaggedon
312174eb88
Adding explainability to the ReACT agent (#689)
* Added tech spec

* Add provenance recording to React agent loop

Enables agent sessions to be traced and debugged using the same
explainability infrastructure as GraphRAG. Agent traces record:
- Session start with query and timestamp
- Each iteration's thought, action, arguments, and observation
- Final answer with derivation chain

Changes:
- Add session_id and collection fields to AgentRequest schema
- Add agent predicates (TG_THOUGHT, TG_ACTION, etc.) to namespaces
- Create agent provenance triple generators in provenance/agent.py
- Register explainability producer in agent service
- Emit provenance triples during agent execution
- Update CLI tools to detect and render agent traces alongside GraphRAG

* Updated explainability taxonomy:

GraphRAG: tg:Question → tg:Exploration → tg:Focus → tg:Synthesis

Agent: tg:Question → tg:Analysis(s) → tg:Conclusion

All entities also have their PROV-O type (prov:Activity or prov:Entity).

Updated commit message:

Add provenance recording to React agent loop

Enables agent sessions to be traced and debugged using the same
explainability infrastructure as GraphRAG.

Entity types follow human reasoning patterns:
- tg:Question - the user's query (shared with GraphRAG)
- tg:Analysis - each think/act/observe cycle
- tg:Conclusion - the final answer

Also adds explicit TG types to GraphRAG entities:
- tg:Question, tg:Exploration, tg:Focus, tg:Synthesis

All types retain their PROV-O base types (prov:Activity, prov:Entity).

Changes:
- Add session_id and collection fields to AgentRequest schema
- Add explainability entity types to namespaces.py
- Create agent provenance triple generators
- Register explainability producer in agent service
- Emit provenance triples during agent execution
- Update CLI tools to detect and render both trace types

* Document RAG explainability is now complete. Here's a summary of the
changes made:

Schema Changes:
- trustgraph-base/trustgraph/schema/services/retrieval.py: Added
  explain_id and explain_graph fields to DocumentRagResponse
- trustgraph-base/trustgraph/messaging/translators/retrieval.py:
  Updated translator to handle explainability fields

Provenance Changes:
- trustgraph-base/trustgraph/provenance/namespaces.py: Added
  TG_CHUNK_COUNT and TG_SELECTED_CHUNK predicates
- trustgraph-base/trustgraph/provenance/uris.py: Added
  docrag_question_uri, docrag_exploration_uri, docrag_synthesis_uri
  generators
- trustgraph-base/trustgraph/provenance/triples.py: Added
  docrag_question_triples, docrag_exploration_triples,
  docrag_synthesis_triples builders
- trustgraph-base/trustgraph/provenance/__init__.py: Exported all
  new Document RAG functions and predicates

Service Changes:
- trustgraph-flow/trustgraph/retrieval/document_rag/document_rag.py:
  Added explainability callback support and triple emission at each
  phase (Question → Exploration → Synthesis)
- trustgraph-flow/trustgraph/retrieval/document_rag/rag.py:
  Registered explainability producer and wired up the callback

Documentation:
- docs/tech-specs/agent-explainability.md: Added Document RAG entity
  types and provenance model documentation

Document RAG Provenance Model:
Question (urn:trustgraph:docrag:{uuid})
    │
    │  tg:query, prov:startedAtTime
    │  rdf:type = prov:Activity, tg:Question
    │
    ↓ prov:wasGeneratedBy
    │
Exploration (urn:trustgraph:docrag:{uuid}/exploration)
    │
    │  tg:chunkCount, tg:selectedChunk (multiple)
    │  rdf:type = prov:Entity, tg:Exploration
    │
    ↓ prov:wasDerivedFrom
    │
Synthesis (urn:trustgraph:docrag:{uuid}/synthesis)
    │
    │  tg:content = "The answer..."
    │  rdf:type = prov:Entity, tg:Synthesis

* Specific subtype that makes the retrieval mechanism immediately
obvious:

System: GraphRAG
TG Types on Question: tg:Question, tg:GraphRagQuestion
URI Pattern: urn:trustgraph:question:{uuid}
────────────────────────────────────────
System: Document RAG
TG Types on Question: tg:Question, tg:DocRagQuestion
URI Pattern: urn:trustgraph:docrag:{uuid}
────────────────────────────────────────
System: Agent
TG Types on Question: tg:Question, tg:AgentQuestion
URI Pattern: urn:trustgraph:agent:{uuid}
Files modified:
- trustgraph-base/trustgraph/provenance/namespaces.py - Added
TG_GRAPH_RAG_QUESTION, TG_DOC_RAG_QUESTION, TG_AGENT_QUESTION
- trustgraph-base/trustgraph/provenance/triples.py - Added subtype to
question_triples and docrag_question_triples
- trustgraph-base/trustgraph/provenance/agent.py - Added subtype to
agent_session_triples
- trustgraph-base/trustgraph/provenance/__init__.py - Exported new types
- docs/tech-specs/agent-explainability.md - Documented the subtypes

This allows:
- Query all questions: ?q rdf:type tg:Question
- Query only GraphRAG: ?q rdf:type tg:GraphRagQuestion
- Query only Document RAG: ?q rdf:type tg:DocRagQuestion
- Query only Agent: ?q rdf:type tg:AgentQuestion

* Fixed tests
2026-03-11 15:28:15 +00:00
cybermaggedon
a53ed41da2
Add explainability CLI tools (#688)
Add explainability CLI tools for debugging provenance data
- tg-show-document-hierarchy: Display document → page → chunk → edge
  hierarchy by traversing prov:wasDerivedFrom relationships
- tg-list-explain-traces: List all GraphRAG sessions with questions
  and timestamps from the retrieval graph
- tg-show-explain-trace: Show full explainability cascade for a
  GraphRAG session (question → exploration → focus → synthesis)

These tools query the source and retrieval graphs to help debug
and explore provenance/explainability data stored during document
processing and GraphRAG queries.
2026-03-11 13:44:29 +00:00
cybermaggedon
aa4f5c6c00
Remove redundant metadata (#685)
The metadata field (list of triples) in the pipeline Metadata class
was redundant. Document metadata triples already flow directly from
librarian to triple-store via emit_document_provenance() - they don't
need to pass through the extraction pipeline.

Additionally, chunker and PDF decoder were overwriting metadata to []
anyway, so any metadata passed through the pipeline was being
discarded.

Changes:
- Remove metadata field from Metadata dataclass
  (schema/core/metadata.py)
- Update all Metadata instantiations to remove metadata=[]
  parameter
- Remove metadata handling from translators (document_loading,
  knowledge)
- Remove metadata consumption from extractors (ontology, agent)
- Update gateway serializers and import handlers
- Update all unit, integration, and contract tests
2026-03-11 10:51:39 +00:00
cybermaggedon
1837d73f34
Dataflow tech spec for extraction (#684) 2026-03-11 10:49:50 +00:00
cybermaggedon
84941ce645
Fix Cassandra schema and graph filter semantics (#680)
Schema fix (dtype/lang clustering key):
- Add dtype and lang to PRIMARY KEY in quads_by_entity table
- Add otype, dtype, lang to PRIMARY KEY in quads_by_collection table
- Fixes deduplication bug where literals with same value but different
  datatype or language tag were collapsed (e.g., "thing" vs "thing"@en)
- Update delete_collection to pass new clustering columns
- Update tech spec to reflect new schema

Graph filter semantics (simplified, no wildcard constant):
- g=None means all graphs (no filter)
- g="" means default graph only
- g="uri" means specific named graph
- Remove GRAPH_WILDCARD usage from EntityCentricKnowledgeGraph
- Fix service.py streaming and non-streaming paths
- Fix CLI to preserve empty string for -g '' argument
2026-03-10 12:52:51 +00:00
cybermaggedon
ec83775789
Update tech spec (#678) 2026-03-10 10:07:37 +00:00
cybermaggedon
0a2ce47a88
Batch embeddings (#668)
Base Service (trustgraph-base/trustgraph/base/embeddings_service.py):
- Changed on_request to use request.texts

FastEmbed Processor
(trustgraph-flow/trustgraph/embeddings/fastembed/processor.py):
- on_embeddings(texts, model=None) now processes full batch efficiently
- Returns [[v.tolist()] for v in vecs] - list of vector sets

Ollama Processor (trustgraph-flow/trustgraph/embeddings/ollama/processor.py):
- on_embeddings(texts, model=None) passes list directly to Ollama
- Returns [[embedding] for embedding in embeds.embeddings]

EmbeddingsClient (trustgraph-base/trustgraph/base/embeddings_client.py):
- embed(texts, timeout=300) accepts list of texts

Tests Updated:
- test_fastembed_dynamic_model.py - 4 tests updated for new interface
- test_ollama_dynamic_model.py - 4 tests updated for new interface

Updated CLI, SDK and APIs
2026-03-08 18:36:54 +00:00
cybermaggedon
24bbe94136
Document chunks not stored in vector store (#665)
- Schema - ChunkEmbeddings now uses chunk_id: str instead of chunk: bytes
- Schema - DocumentEmbeddingsResponse now returns chunk_ids: list[str]
  instead of chunks
- Translators - Updated to serialize/deserialize chunk_id
- Clients - DocumentEmbeddingsClient.query() returns chunk_ids
- SDK/API - flow.py, socket_client.py, bulk_client.py updated
- Document embeddings service - Stores chunk_id (document ID) instead
  of chunk text
- Storage writers - Qdrant, Milvus, Pinecone store chunk_id in payload
- Query services - Return chunk_id from vector store searches
- Gateway dispatchers - Serialize chunk_id in API responses
- Document RAG - Added librarian client to fetch chunk content from
  Garage using chunk_ids
- CLI tools - Updated all three tools:
  - invoke_document_embeddings.py - displays chunk_ids, removed
    max_chunk_length
  - save_doc_embeds.py - exports chunk_id
  - load_doc_embeds.py - imports chunk_id
2026-03-07 23:10:45 +00:00
cybermaggedon
cd5580be59
Extract-time provenance (#661)
1. Shared Provenance Module - URI generators, namespace constants,
   triple builders, vocabulary bootstrap
2. Librarian - Emits document metadata to graph on processing
   initiation (vocabulary bootstrap + PROV-O triples)
3. PDF Extractor - Saves pages as child documents, emits parent-child
   provenance edges, forwards page IDs
4. Chunker - Saves chunks as child documents, emits provenance edges,
   forwards chunk ID + content
5. Knowledge Extractors (both definitions and relationships):
   - Link entities to chunks via SUBJECT_OF (not top-level document)
   - Removed duplicate metadata emission (now handled by librarian)
   - Get chunk_doc_id and chunk_uri from incoming Chunk message
6. Embedding Provenance:
   - EntityContext schema has chunk_id field
   - EntityEmbeddings schema has chunk_id field
   - Definitions extractor sets chunk_id when creating EntityContext
   - Graph embeddings processor passes chunk_id through to
     EntityEmbeddings

Provenance Flow:
Document → Page (PDF) → Chunk → Extracted Facts/Embeddings
    ↓           ↓          ↓              ↓
  librarian  librarian  librarian    (chunk_id reference)
  + graph    + graph    + graph

Each artifact is stored in librarian with parent-child linking, and PROV-O
edges are emitted to the knowledge graph for full traceability from any
extracted fact back to its source document.

Also, updating tests
2026-03-05 18:36:10 +00:00
cybermaggedon
a630e143ef
Incremental / large document loading (#659)
Tech spec

BlobStore (trustgraph-flow/trustgraph/librarian/blob_store.py):
- get_stream() - yields document content in chunks for streaming retrieval
- create_multipart_upload() - initializes S3 multipart upload, returns
  upload_id
- upload_part() - uploads a single part, returns etag
- complete_multipart_upload() - finalizes upload with part etags
- abort_multipart_upload() - cancels and cleans up

Cassandra schema (trustgraph-flow/trustgraph/tables/library.py):
- New upload_session table with 24-hour TTL
- Index on user for listing sessions
- Prepared statements for all operations
- Methods: create_upload_session(), get_upload_session(),
  update_upload_session_chunk(), delete_upload_session(),
  list_upload_sessions()

- Schema extended with UploadSession, UploadProgress, and new
  request/response fields
- Librarian methods: begin_upload, upload_chunk, complete_upload,
  abort_upload, get_upload_status, list_uploads
- Service routing for all new operations
- Python SDK with transparent chunked upload:
  - add_document() auto-switches to chunked for files > 10MB
  - Progress callback support (on_progress)
  - get_pending_uploads(), get_upload_status(), abort_upload(),
    resume_upload()

- Document table: Added parent_id and document_type columns with index
- Document schema (knowledge/document.py): Added document_id field for
  streaming retrieval
- Librarian operations:
  - add-child-document for extracted PDF pages
  - list-children to get child documents
  - stream-document for chunked content retrieval
  - Cascade delete removes children when parent is deleted
  - list-documents filters children by default
- PDF decoder (decoding/pdf/pdf_decoder.py): Updated to stream large
  documents from librarian API to temp file
- Librarian service (librarian/service.py): Sends document_id instead of
  content for large PDFs (>2MB)
- Deprecated tools (load_pdf.py, load_text.py): Added deprecation
  warnings directing users to tg-add-library-document +
  tg-start-library-processing

Remove load_pdf and load_text utils

Move chunker/librarian comms to base class

Updating tests
2026-03-04 16:57:58 +00:00
cybermaggedon
a38ca9474f
Tool services - dynamically pluggable tool implementations for agent frameworks (#658)
* New schema

* Tool service implementation

* Base class

* Joke service, for testing

* Update unit tests for tool services
2026-03-04 14:51:32 +00:00
cybermaggedon
e19ea8667d
Tool services tech spec (#656) 2026-02-28 14:46:13 +00:00
cybermaggedon
4d31cd4c03
Agent explainability tech specs (#655)
* Query time provenance tech spec

* Extraction provenance placeholder
2026-02-28 14:44:18 +00:00
cybermaggedon
4bbc6d844f
Row embeddings APIs exposed (#646)
* Added row embeddings API and CLI support

* Updated protocol specs

* Row embeddings agent tool

* Add new agent tool to CLI
2026-02-23 21:52:56 +00:00
cybermaggedon
1809c1f56d
Structured data 2 (#645)
* Structured data refactor - multi-index tables, remove need for manual mods to the Cassandra tables

* Tech spec updated to track implementation
2026-02-23 15:56:29 +00:00
cybermaggedon
00c1ca681b
Entity-centric graph (#633)
* Tech spec for new entity-centric graph schema

* Graph implementation
2026-02-16 13:26:43 +00:00
cybermaggedon
cf0daedefa
Changed schema for Value -> Term, majorly breaking change (#622)
* Changed schema for Value -> Term, majorly breaking change

* Following the schema change, Value -> Term into all processing

* Updated Cassandra for g, p, s, o index patterns (7 indexes)

* Reviewed and updated all tests

* Neo4j, Memgraph and FalkorDB remain broken, will look at once settled down
2026-01-27 13:48:08 +00:00
cybermaggedon
e061f2c633
Graph contexts tech spec (#621) 2026-01-26 22:41:00 +00:00
cybermaggedon
e214eb4e02
Feature/prompts jsonl (#619)
* Tech spec

* JSONL implementation complete

* Updated prompt client users

* Fix tests
2026-01-26 17:38:00 +00:00
cybermaggedon
fce43ae035
REST API OpenAPI spec (#612)
* OpenAPI spec in specs/api.  Checked lint with redoc.
2026-01-15 11:04:37 +00:00
cybermaggedon
b08db761d7
Fix config inconsistency (#609)
* Plural/singular confusion in config key

* Flow class vs flow blueprint nomenclature change

* Update docs & CLI to reflect the above
2026-01-14 12:31:40 +00:00
cybermaggedon
ae13190093
Address legacy issues in storage management (#595)
* Removed legacy storage management cruft.  Tidied tech specs.

* Fix deletion of last collection

* Storage processor ignores data on the queue which is for a deleted collection

* Updated tests
2026-01-05 13:45:14 +00:00
cybermaggedon
25563bae3c
Change MinIO integration options in librarian to be more generic - to support a Garage integration (#594)
* Tweak object store parameters to be more generic for other S3-type store integration

* Update librarian to have region & SSL params

* Update MinIO migration tech spec
2025-12-27 18:01:51 +00:00
cybermaggedon
34eb083836
Messaging fabric plugins (#592)
* Plugin architecture for messaging fabric

* Schemas use a technology neutral expression

* Schemas strictness has uncovered some incorrect schema use which is fixed
2025-12-17 21:40:43 +00:00
cybermaggedon
f12fcc2652
Loki logging (#586)
* Consolidate logging into a single module

* Added Loki logging

* Update tech spec

* Add processor label

* Fix recursive log entries, logging Loki"s internals
2025-12-09 23:24:41 +00:00
cybermaggedon
7d07f802a8
Basic multitenant support (#583)
* Tech spec

* Address multi-tenant queue option problems in CLI

* Modified collection service to use config

* Changed storage management to use the config service definition
2025-12-05 21:45:30 +00:00
cybermaggedon
01aeede78b
Python API implements streaming interfaces (#577)
* Tech spec

* Python CLI utilities updated to use the API including streaming features

* Added type safety to Python API

* Completed missing auth token support in CLI
2025-12-04 17:38:57 +00:00
cybermaggedon
b957004db9
Feature/improve ontology extract (#576)
* Tech spec to change ontology extraction

* Ontology extract refactoring
2025-12-03 13:36:10 +00:00
cybermaggedon
1948edaa50
Streaming rag responses (#568)
* Tech spec for streaming RAG

* Support for streaming Graph/Doc RAG
2025-11-26 19:47:39 +00:00
cybermaggedon
310a2deb06
Feature/streaming llm phase 1 (#566)
* Tidy up duplicate tech specs in doc directory

* Streaming LLM text-completion service tech spec.

* text-completion and prompt interfaces

* streaming change applied to all LLMs, so far tested with VertexAI

* Skip Pinecone unit tests, upstream module issue is affecting things, tests are passing again

* Added agent streaming, not working and has broken tests
2025-11-26 09:59:10 +00:00
cybermaggedon
db4e842df3
Update tech spec (#558) 2025-11-13 16:29:20 +00:00
cybermaggedon
c69f5207a4
OntoRAG: Ontology-Based Knowledge Extraction and Query Technical Specification (#523)
* Onto-rag tech spec

* New processor kg-extract-ontology, use 'ontology' objects from config to guide triple extraction

* Also entity contexts

* Integrate with ontology extractor from workbench

This is first phase, the extraction is tested and working, also GraphRAG with the extracted knowledge works
2025-11-12 20:38:08 +00:00
cybermaggedon
4c3db4dbbe
MCP auth for the simple case (#557)
* MCP auth token header

* Mention limitations

* Fix AgentStep schema error by converting argument values to strings.

* Added tests for MCP auth and agent step parsing
2025-11-11 12:28:53 +00:00
cybermaggedon
6129bb68c1
Fix hard coded vector size (#555)
* Fixed hard-coded embeddings store size

* Vector store lazy-creates collections, different collections for
  different dimension lengths.

* Added tech spec for vector store lifecycle

* Fixed some tests for the new spec
2025-11-10 16:56:51 +00:00
cybermaggedon
51107008fd
master -> 1.5 (README updates) (#552) 2025-10-11 11:46:03 +01:00
cybermaggedon
52b133fc86
Collection delete pt. 3 (#542)
* Fixing collection deletion

* Fixing collection management param error

* Always test for collections

* Add Cassandra collection table

* Updated tech spec for explicit creation/deletion

* Remove implicit collection creation

* Fix up collection tracking in all processors
2025-09-30 16:02:33 +01:00
cybermaggedon
dc79b10552
Feaature/flow default params (#541)
* Flow creation uses parameter defaults in API and CLI

* Submit strings for flow parameters
2025-09-30 14:06:08 +01:00
cybermaggedon
8354ea1276
Update flow parameter tech spec for advanced params (#537)
* Add advanced mode to tech spec,  fix enum description in tech spec

* Updated tech-spec for controlled-by relationship between parameters

* Update tg-show-flows CLI

* Update tg-show-flows, tg-show-flow-classes, tg-start-flow CLI

* Add tg-show-parameter-types
2025-09-26 10:55:10 +01:00
cybermaggedon
aa8e422e8c
Flow configurable parameters (#532)
* Fix pyproject.toml missing requests dep

* parameters is now parameter-types

* Update flow parameters tech spec for recent changes (no impact on this repo)
2025-09-25 19:11:40 +01:00