Update docs for API/CLI changes in 1.0 (#420)

* Update some API basics for the 0.23/1.0 API change
This commit is contained in:
cybermaggedon 2025-07-03 14:58:29 +01:00 committed by GitHub
parent b1a546e4d2
commit cc224e97f6
69 changed files with 19981 additions and 407 deletions


@ -7,7 +7,7 @@
[![PyPI version](https://img.shields.io/pypi/v/trustgraph.svg)](https://pypi.org/project/trustgraph/) [![Discord](https://img.shields.io/discord/1251652173201149994)](https://discord.gg/sQMwkRz5GX)
📑 [Full Docs](https://docs.trustgraph.ai/docs/TrustGraph) 📺 [YouTube](https://www.youtube.com/@TrustGraphAI?sub_confirmation=1) 🔧 [Configuration Builder](https://config-ui.demo.trustgraph.ai/) ⚙️ [API Docs](docs/apis/README.md) 🧑‍💻 [CLI Docs](docs/cli/README.md) 💬 [Discord](https://discord.gg/sQMwkRz5GX) 📖 [Blog](https://blog.trustgraph.ai/subscribe)
</div>
@ -48,6 +48,9 @@ Deploying state-of-the-art AI requires managing a complex web of models, framewo
* **Component Flexibility:** Avoid component lock-in. TrustGraph integrates multiple options for all system components.
## 🚀 Getting Started
This is a very quick start. See [other installation options](docs/README.md).
- [Install the CLI](#install-the-trustgraph-cli)
- [Configuration Builder](#-configuration-builder)
- [Platform Restarts](#platform-restarts)


@ -1,18 +0,0 @@
podman-compose -f docker-compose.yaml up -d
tg-processor-state
tg-load-text --keyword cats animals home-life --name "Mark's cats" --description "This document describes Mark's cats" --copyright-notice 'Public domain' --publication-organization 'trustgraph.ai' --publication-date 2024-10-23 --copyright-holder 'trustgraph.ai' --copyright-year 2024 --publication-description 'Uploading to Github' --url https://example.com --id TG-000001 ../trustgraph/README.cats
tg-load-text --keyword nasa challenger space-shuttle shuttle orbiter --name 'Challenger Report Volume 1' --description 'The findings of the Presidential Commission regarding the circumstances surrounding the Challenger accident are reported and recommendations for corrective action are outlined' --copyright-notice 'Work of the US Gov. Public Use Permitted' --publication-organization 'NASA' --publication-date 1986-06-06 --copyright-holder 'US Government' --copyright-year 1986 --publication-description 'The findings of the Commission regarding the circumstances surrounding the Challenger accident are reported' --url https://ntrs.nasa.gov/citations/19860015255 --id AD-A171402 ../trustgraph/README.challenger
tg-graph-show
tg-query-graph-rag -q 'Tell me cat facts'
tg-invoke-agent -v -q "How many cats does Mark have? Calculate that number raised to 0.4 power. Is that number lower than the numeric part of the mission identifier of the Space Shuttle Challenger on its last mission? If so, give me an apple pie recipe, otherwise return a poem about cheese."

docs/README.md Normal file

@ -0,0 +1,59 @@
# TrustGraph Documentation Index
Welcome to the TrustGraph documentation. This directory contains comprehensive guides for using TrustGraph's APIs and command-line tools.
## Documentation Overview
### 📚 [API Documentation](apis/README.md)
Complete reference for TrustGraph's APIs, including REST, WebSocket, Pulsar, and Python SDK interfaces. Learn how to integrate TrustGraph services into your applications.
### 🖥️ [CLI Documentation](cli/README.md)
Comprehensive guide to TrustGraph's command-line interface. Includes detailed documentation for all CLI commands, from system administration to knowledge graph management.
### 🚀 [Quick Start Guide](README.quickstart-docker-compose.md)
Step-by-step guide to get TrustGraph running using Docker Compose. Perfect for first-time users who want to quickly deploy and test TrustGraph.
## Getting Started
If you're new to TrustGraph, we recommend starting with the
[Compose - Quick Start Guide](README.quickstart-docker-compose.md)
to get a working system up and running quickly.
For developers integrating TrustGraph into applications, check out the
[API Documentation](apis/README.md) to understand the available interfaces.
For system administrators and power users, the
[CLI Documentation](cli/README.md) provides detailed information about all
command-line tools.
## Ways to deploy
If you haven't deployed TrustGraph before, the 'compose' deployment
mentioned above requires the least commitment in setting things up:
See [Quick Start Guide](README.quickstart-docker-compose.md)
Other deployment mechanisms include:
- [Scaleway Kubernetes deployment using Pulumi](https://github.com/trustgraph-ai/pulumi-trustgraph-scaleway)
- [Intel Gaudi and GPU](https://github.com/trustgraph-ai/trustgraph-tiber-cloud) - tested on Intel Tiber cloud
- [Azure Kubernetes deployment using Pulumi](https://github.com/trustgraph-ai/pulumi-trustgraph-aks)
- [AWS EC2 single instance deployment using Pulumi](https://github.com/trustgraph-ai/pulumi-trustgraph-ec2)
- [GCP GKE cloud deployment using Pulumi](https://github.com/trustgraph-ai/pulumi-trustgraph-gke)
- [RKE Kubernetes on AWS deployment using Pulumi](https://github.com/trustgraph-ai/pulumi-trustgraph-aws-rke)
- It should be possible to deploy on AWS EKS, but we haven't been able to
script anything reliable so far.
## Support
For questions, issues, or contributions:
- **GitHub Issues**: Report bugs and feature requests
- **Documentation**: This documentation covers most use cases
- **Community**: Join discussions and share experiences
## Related Resources
- [TrustGraph GitHub Repository](https://github.com/trustgraph-ai/trustgraph)
- [Docker Hub Images](https://hub.docker.com/u/trustgraph)
- [Example Notebooks](https://github.com/trustgraph-ai/example-notebooks) -
shows example usage of various APIs.


@ -1,6 +1,8 @@
# Getting Started
## Preparation
> [!TIP]
> Before launching `TrustGraph`, be sure to have the `Docker Engine` or `Podman Machine` installed and running on the host machine.
>
@ -13,24 +15,29 @@
> [!TIP]
> If using `Podman`, the only change will be to substitute `podman` instead of `docker` in all commands.
All `TrustGraph` components are deployed through a `Docker Compose` file. There are **16** `Docker Compose` files to choose from, depending on the desired model deployment and choosing between the graph stores `Cassandra` or `Neo4j` or `FalkorDB`:
## Create the configuration
- `AzureAI` serverless endpoint for deployed models in Azure
- `Bedrock` API for models deployed in AWS Bedrock
- `Claude` through Anthropic's API
- `Cohere` through Cohere's API
- `Mix` for mixed model deployments
- `Ollama` for local model deployments
- `OpenAI` for OpenAI's API
- `VertexAI` for models deployed in Google Cloud
This guide talks you through the Compose file launch, which is the easiest
way to launch on a standalone machine, or a single cloud instance.
See [README](README.md) for links to other deployment mechanisms.
`Docker Compose` enables the following functions:
- Run the required components for full end-to-end `Graph RAG` knowledge pipeline
- Inspect processing logs
- Load text corpus and begin knowledge extraction
- Verify extracted Graph Edges
- Model agnostic, Graph RAG
To create the deployment configuration, go to the
[deployment portal](https://config-ui.demo.trustgraph.ai/) and follow the
instructions.
- Select Docker Compose or Podman Compose as the deployment
mechanism.
- Use Cassandra for the graph store; it's the easiest and most tested.
- Use Qdrant for the vector store; it's the easiest and most tested.
- Chunker: Recursive, with a chunk size of 1000 and an overlap of 50, should be fine.
- Pick your favourite LLM model:
- If you have enough horsepower in a local GPU, LMStudio is an easy
starting point for a local model deployment. Ollama is fairly easy.
- VertexAI on Google is relatively straightforward for a cloud
model-as-a-service LLM, and you can get some free credits.
- Max output tokens: as per the model, 2048 is safe.
- Under Customisation, check LLM Prompt Manager and Agent Tools.
- To finish deployment, select Generate and download the deployment bundle.
Read the extra deploy steps on that page.
## Preparing TrustGraph
@ -41,208 +48,31 @@ Below is a step-by-step guide to deploy `TrustGraph`, extract knowledge from a P
```
python3 -m venv env
. env/bin/activate
pip3 install pulsar-client
pip3 install cassandra-driver
export PYTHON_PATH=.
pip install trustgraph-cli
```
### Clone the GitHub Repo
```
git clone https://github.com/trustgraph-ai/trustgraph trustgraph
cd trustgraph
```
## TrustGraph as Docker Compose Files
Launching `TrustGraph` is as simple as running a single `Docker Compose` file. There are `Docker Compose` files for each possible model deployment and graph store configuration. Depending on your chosen model and graph store deployment, choose one of the following launch files:
| Model Deployment | Graph Store | Launch File |
| ---------------- | ------------ | ----------- |
| AWS Bedrock | Cassandra | `tg-launch-bedrock-cassandra.yaml` |
| AWS Bedrock | Neo4j | `tg-launch-bedrock-neo4j.yaml` |
| AzureAI Serverless Endpoint | Cassandra | `tg-launch-azure-cassandra.yaml` |
| AzureAI Serverless Endpoint | Neo4j | `tg-launch-azure-neo4j.yaml` |
| Anthropic API | Cassandra | `tg-launch-claude-cassandra.yaml` |
| Anthropic API | Neo4j | `tg-launch-claude-neo4j.yaml` |
| Cohere API | Cassandra | `tg-launch-cohere-cassandra.yaml` |
| Cohere API | Neo4j | `tg-launch-cohere-neo4j.yaml` |
| Mixed Deployment | Cassandra | `tg-launch-mix-cassandra.yaml` |
| Mixed Deployment | Neo4j | `tg-launch-mix-neo4j.yaml` |
| Ollama | Cassandra | `tg-launch-ollama-cassandra.yaml` |
| Ollama | Neo4j | `tg-launch-ollama-neo4j.yaml` |
| OpenAI | Cassandra | `tg-launch-openai-cassandra.yaml` |
| OpenAI | Neo4j | `tg-launch-openai-neo4j.yaml` |
| VertexAI | Cassandra | `tg-launch-vertexai-cassandra.yaml` |
| VertexAI | Neo4j | `tg-launch-vertexai-neo4j.yaml` |
> [!CAUTION]
> All tokens, paths, and authentication files must be set **PRIOR** to launching a `Docker Compose` file.
## Chunking
Extraction performance can vary significantly with chunk size. The default chunk size is `2000` characters using a recursive method. Decreasing the chunk size may increase the amount of extracted graph edges at the cost of taking longer to complete the extraction process. The chunking method and sizes can be adjusted in the selected `YAML` file: find the section for `chunker` and, under the commands list, modify the following parameters:
```
- "chunker-recursive" # recursive text splitter in characters
- "chunker-token" # recursive style token splitter
- "--chunk-size"
- "<number-of-characters/tokens-per-chunk>"
- "--chunk-overlap"
- "<number-of-characters/tokens-to-overlap-per-chunk>"
```
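As a rough illustration of what `--chunk-size` and `--chunk-overlap` control, here is a minimal character-based splitter. This is a sketch only: the real chunkers use a recursive strategy, and the function name is illustrative, not part of TrustGraph.

```python
def chunk_text(text, chunk_size=2000, chunk_overlap=100):
    """Illustrative splitter: fixed-size character windows with overlap.

    Each chunk is at most chunk_size characters, and consecutive chunks
    share chunk_overlap characters, so context spanning a boundary is
    still visible to the extractor in at least one chunk.
    """
    chunks = []
    step = max(1, chunk_size - chunk_overlap)   # guard against overlap >= size
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

Smaller `chunk_size` values produce more, shorter chunks, which is why extraction yields more edges but takes longer: each chunk is a separate LLM call.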
## Model Parameters
Most configurations allow adjusting some model parameters. For configurations with adjustable parameters, the `temperature` and `max_output` tokens can be set in the selected `YAML` file:
```
- "-x"
- <max_model_output_tokens>
- "-t"
- <model_temperature>
```
> [!TIP]
> The default `temperature` in `TrustGraph` is set to `0.0`. Even for models with long input contexts, the max output might only be 2048 (like some instances of Llama3.1). Make sure `max_output` is not set higher than allowed for a given model.
## Choose a TrustGraph Configuration
Choose one of the `Docker Compose` files that meets your preferred model and graph store deployments. Each deployment will require setting some `environment variables` and commands in the chosen `YAML` file. All variables and commands must be set prior to running the chosen `Docker Compose` file.
### AWS Bedrock API
```
export AWS_ACCESS_KEY_ID=<ID-KEY-HERE>
export AWS_SECRET_ACCESS_KEY=<TOKEN-GOES-HERE>
export AWS_DEFAULT_REGION=<REGION-HERE>
docker compose -f tg-launch-bedrock-cassandra.yaml up -d # Using Cassandra as the graph store
docker compose -f tg-launch-bedrock-neo4j.yaml up -d # Using Neo4j as the graph store
```
> [!NOTE]
> The current defaults for `AWS Bedrock` are `Mistral Large 2 (24.07)` in `US-West-2`.
To change the model and region, go to the sections for `text-completion` and `text-completion-rag` in the `tg-launch-bedrock.yaml` file. Add the following lines under the `command` section:
```
- "-r"
- "<us-east-1 or us-west-2>"
- "-m"
- "<bedrock-api-model-name-here>"
```
> [!TIP]
> Having two separate modules for `text-completion` and `text-completion-rag` allows for using one model for extraction and a different model for RAG.
### AzureAI Serverless Model Deployment
```
export AZURE_ENDPOINT=<https://ENDPOINT.HOST.GOES.HERE/>
export AZURE_TOKEN=<TOKEN-GOES-HERE>
docker compose -f tg-launch-azure-cassandra.yaml up -d # Using Cassandra as the graph store
docker compose -f tg-launch-azure-neo4j.yaml up -d # Using Neo4j as the graph store
```
### Claude through Anthropic API
```
export CLAUDE_KEY=<TOKEN-GOES-HERE>
docker compose -f tg-launch-claude-cassandra.yaml up -d # Using Cassandra as the graph store
docker compose -f tg-launch-claude-neo4j.yaml up -d # Using Neo4j as the graph store
```
### Cohere API
```
export COHERE_KEY=<TOKEN-GOES-HERE>
docker compose -f tg-launch-cohere-cassandra.yaml up -d # Using Cassandra as the graph store
docker compose -f tg-launch-cohere-neo4j.yaml up -d # Using Neo4j as the graph store
```
### Ollama Hosted Model Deployment
> [!TIP]
> The power of `Ollama` is the flexibility it provides in Language Model deployments. Being able to run LMs with `Ollama` enables fully secure AI `TrustGraph` pipelines that aren't relying on any external APIs. No data is leaving the host environment or network. More information on `Ollama` deployments can be found [here](https://trustgraph.ai/docs/deploy/localnetwork).
> [!NOTE]
> The current default model for an `Ollama` deployment is `Gemma2:9B`.
```
export OLLAMA_HOST=<hostname> # Set to location of machine running Ollama such as http://localhost:11434
docker compose -f tg-launch-ollama-cassandra.yaml up -d # Using Cassandra as the graph store
docker compose -f tg-launch-ollama-neo4j.yaml up -d # Using Neo4j as the graph store
```
> [!NOTE]
> On `MacOS`, if running `Ollama` locally set `OLLAMA_HOST=http://host.docker.internal:11434`.
To change the `Ollama` model, first make sure the desired model has been pulled and fully downloaded. In the `YAML` file, go to the sections for `text-completion` and `text-completion-rag`. Under `commands`, add the following two lines:
```
- "-m"
- "<model-name-here>"
```
### OpenAI API
```
export OPENAI_TOKEN=<TOKEN-GOES-HERE>
docker compose -f tg-launch-openai-cassandra.yaml up -d # Using Cassandra as the graph store
docker compose -f tg-launch-openai-neo4j.yaml up -d # Using Neo4j as the graph store
```
### VertexAI through GCP
```
mkdir -p vertexai
cp <your config> vertexai/private.json
docker compose -f tg-launch-vertexai-cassandra.yaml up -d # Using Cassandra as the graph store
docker compose -f tg-launch-vertexai-neo4j.yaml up -d # Using Neo4j as the graph store
```
> [!TIP]
> If you're running `SELinux` on Linux you may need to set the permissions on the VertexAI directory so that the key file can be mounted on a Docker container using the following command:
>
> ```
> chcon -Rt svirt_sandbox_file_t vertexai/
> ```
## Mixing Models
One of the most powerful features of `TrustGraph` is the ability to use one model deployment for the `Naive Extraction` process and a different model for `RAG`. Since the `Naive Extraction` can be a one-time process, it makes sense to use a more performant model to generate the most comprehensive set of graph edges and embeddings possible. With a high-quality extraction, it's possible to use a much smaller model for `RAG` and still achieve "big" model performance.
A "split" model deployment uses `tg-launch-mix.yaml`. There are two modules: `text-completion` and `text-completion-rag`. The `text-completion` module is called only for extraction while `text-completion-rag` is called only for RAG.
### Choosing Model Deployments
Before launching the `Docker Compose` file, the desired model deployments must be specified. The options are:
- `text-completion-azure`
- `text-completion-bedrock`
- `text-completion-claude`
- `text-completion-cohere`
- `text-completion-ollama`
- `text-completion-openai`
- `text-completion-vertexai`
For the `text-completion` and `text-completion-rag` modules in the `tg-launch-mix.yaml` file, choose one of the above deployment options and enter that line as the first line under `command` for each `text-completion` and `text-completion-rag` module. Depending on the model deployment, other variables such as endpoints, keys, and model names must be specified under the `command` section as well. Once all variables and commands have been set, the `mix` deployment can be launched with:
```
docker compose -f tg-launch-mix-cassandra.yaml up -d # Using Cassandra as the graph store
docker compose -f tg-launch-mix-neo4j.yaml up -d # Using Neo4j as the graph store
```
> [!TIP]
> Any of the `YAML` files can be modified for a "split" deployment by adding the `text-completion-rag` module.
## Running TrustGraph
```
docker-compose -f docker-compose.yaml up -d
```
After running the chosen `Docker Compose` file, all `TrustGraph` services will launch and be ready to run `Naive Extraction` jobs and provide `RAG` responses using the extracted knowledge.
### Verify TrustGraph Containers
On first running a `Docker Compose` file, it may take a while (depending on your network connection) to pull all the necessary components. Once all of the components have been pulled, a quick check that the TrustGraph processors have started:
```
tg-show-processor-state
```
Processors start quickly, but it can take a while (~60 seconds) for
Pulsar and Cassandra to start.
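If you prefer to script the wait rather than guess, a small retry helper can poll until a command succeeds. This is a sketch; `wait_for` is an illustrative helper, not part of the TrustGraph CLI.

```shell
# wait_for TIMEOUT CMD...: retry CMD every 2 seconds until it succeeds
# or TIMEOUT seconds have elapsed.
wait_for() {
  local timeout="$1"; shift
  local start=$(date +%s)
  until "$@"; do
    if [ $(( $(date +%s) - start )) -ge "$timeout" ]; then
      echo "timed out waiting for: $*" >&2
      return 1
    fi
    sleep 2
  done
}

# e.g. wait up to 120 seconds for the processors to report state:
# wait_for 120 tg-show-processor-state
```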
If you have any concerns,
check that the TrustGraph containers are running:
```
docker ps
@ -257,129 +87,60 @@ docker ps -a
> [!TIP]
> Before proceeding, allow the system to stabilize. A safe warm up period is `120 seconds`. If services seem to be "stuck", it could be because services did not have time to initialize correctly and are trying to restart. Waiting `120 seconds` before launching any scripts should provide much more reliable operation.
### Load a Text Corpus
### Everything running
Create a sources directory and get a test PDF file. To demonstrate the power of `TrustGraph`, the provided script loads a PDF of the public [Rogers Commission Report](https://sma.nasa.gov/SignificantIncidents/assets/rogers_commission_report.pdf) from the NASA Challenger disaster. This PDF includes complex formatting, unique terms and concepts, and information not commonly found in public knowledge sources.
An easy way to check that the main startup is complete:
```
mkdir sources
curl -o sources/Challenger-Report-Vol1.pdf https://sma.nasa.gov/SignificantIncidents/assets/rogers_commission_report.pdf
tg-show-flows
```
Load the file for knowledge extraction:
You should see a default flow. If you see an error, wait a moment and try again.
### Load some sample documents
```
scripts/load-pdf -f sources/Challenger-Report-Vol1.pdf
tg-load-sample-documents
```
> [!NOTE]
> To load a text file, use the following script:
>
> ```
> scripts/load-text -f sources/<txt-file.txt>
> ```
### Workbench
The console output `File loaded.` indicates the text corpus has been successfully loaded to the processing queues and extraction will begin.
A UI is launched on port 8888; see if you can see it at
[http://localhost:8888/](http://localhost:8888/)
### Processing Logs
Verify things are working:
- Go to the prompts page and check that you can see some prompts.
- Go to the library page and check that you can see the sample documents
  you just loaded.
### Load a document
At this point, many processing services are running concurrently. You can check the status of these processes with the following logs:
- On the library page, select a document by clicking on it. Beyond State
  Vigilance is a smallish doc to work with.
- Select Submit at the bottom of the screen on the action bar.
- Select a processing flow, use the default.
- Click submit.
`PDF Decoder`:
```
docker logs trustgraph-pdf-decoder-1
```
### Look in Grafana
The output should look like:
```
Decoding 1f7b7055...
Done.
```
`Chunker`:
```
docker logs trustgraph-chunker-1
```
The output should be similar to the output of the `PDF Decoder`, except it should be a sequence of many entries.
`Vectorizer`:
```
docker logs trustgraph-vectorize-1
```
Similar output to above processes, except many entries instead.
`Language Model Inference`:
```
docker logs trustgraph-text-completion-1
```
Output should be a sequence of entries:
```
Handling prompt fa1b98ae-70ef-452b-bcbe-21a867c5e8e2...
Send response...
Done.
```
`Knowledge Graph Definitions`:
```
docker logs trustgraph-kg-extract-definitions-1
```
Output should be an array of JSON objects with keys `entity` and `definition`:
```
Indexing 1f7b7055-p11-c1...
[
{
"entity": "Orbiter",
"definition": "A spacecraft designed for spaceflight."
},
{
"entity": "flight deck",
"definition": "The top level of the crew compartment, typically where flight controls are located."
},
{
"entity": "middeck",
"definition": "The lower level of the crew compartment, used for sleeping, working, and storing equipment."
}
]
Done.
```
`Knowledge Graph Relationships`:
```
docker logs trustgraph-kg-extract-relationships-1
```
Output should be an array of JSON objects with keys `subject`, `predicate`, `object`, and `object-entity`:
```
Indexing 1f7b7055-p11-c3...
[
{
"subject": "Space Shuttle",
"predicate": "carry",
"object": "16 tons of cargo",
"object-entity": false
},
{
"subject": "friction",
"predicate": "generated by",
"object": "atmosphere",
"object-entity": true
}
]
Done.
```
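Each extracted relationship maps directly onto a graph edge. A minimal sketch of turning that JSON output into subject/predicate/object triples (the helper name is illustrative, not part of TrustGraph):

```python
import json

def to_triples(chunk_json):
    """Convert extracted relationship objects into (subject, predicate, object)
    triples. The object-entity flag records whether the object is itself an
    entity node, so it is not part of the triple text."""
    return [
        (r["subject"], r["predicate"], r["object"])
        for r in json.loads(chunk_json)
    ]

# Using the second relationship from the log output above:
sample = '[{"subject": "friction", "predicate": "generated by", ' \
         '"object": "atmosphere", "object-entity": true}]'
print(to_triples(sample))
```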
Grafana is launched on port 3000; see if you can see it at
[http://localhost:3000/](http://localhost:3000/)
- Login as admin, password admin.
- Skip the password change screen / change the password.
- Verify things are working by selecting the TrustGraph dashboard
- After a short while, you should see the backlog rise to a few hundred
document chunks.
Once some chunks are loaded, you can start to work with the document.
### Graph Parsing
To check that the knowledge graph is successfully parsing data:
```
scripts/graph-show
tg-show-graph
```
The output should be a set of semantic triples in [N-Triples](https://www.w3.org/TR/rdf12-n-triples/) format.
@ -390,64 +151,25 @@ http://trustgraph.ai/e/enterprise http://www.w3.org/2000/01/rdf-schema#label Ent
http://trustgraph.ai/e/enterprise http://www.w3.org/2004/02/skos/core#definition A prototype space shuttle orbiter used for atmospheric flight testing.
```
### Number of Graph Edges
### Work with the document
N-Triples format is not particularly human readable. It's more useful to know how many graph edges have successfully been extracted from the text corpus:
```
scripts/graph-show | wc -l
```
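Since each N-Triples statement occupies one line, the same edge count can be computed in Python. A sketch; pipe the output of `tg-show-graph` into it:

```python
import sys

def count_edges(lines):
    """Count graph edges: one N-Triples statement per non-empty line."""
    return sum(1 for line in lines if line.strip())

if __name__ == "__main__":
    # e.g. tg-show-graph | python3 count_edges.py
    print(count_edges(sys.stdin))
```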
Back on the workbench, click on the 'Vector search' tab, and
search for something e.g. state. You should see some search results.
Click on results to start exploring the knowledge graph.
The Challenger report has a long introduction with quite a bit of administrative text commonly found in official reports. The first few hundred graph edges mostly capture this document formatting knowledge. To fully test the ability to extract complex knowledge, wait until at least `1000` graph edges have been extracted. The full extraction for this PDF will extract many thousand graph edges.
Click on Graph view on an explored page to visualize the graph.
### RAG Test
```
scripts/query-graph-rag -q 'Give me 20 facts about the space shuttle Challenger'
```
This script forms an LM prompt asking for 20 facts regarding the Challenger disaster. Depending on how many graph edges have been extracted, the response will be similar to:
### Queries over the document
```
Here are 20 facts from the provided knowledge graph about the Space Shuttle disaster:
1. **Space Shuttle Challenger was a Space Shuttle spacecraft.**
2. **The third Spacelab mission was carried by Orbiter Challenger.**
3. **Francis R. Scobee was the Commander of the Challenger crew.**
4. **Earth-to-orbit systems are designed to transport payloads and humans from Earth's surface into orbit.**
5. **The Space Shuttle program involved the Space Shuttle.**
6. **Orbiter Challenger flew on mission 41-B.**
7. **Orbiter Challenger was used on STS-7 and STS-8 missions.**
8. **Columbia completed the orbital test.**
9. **The Space Shuttle flew 24 successful missions.**
10. **One possibility for the Space Shuttle was a winged but unmanned recoverable liquid-fuel vehicle based on the Saturn 5 rocket.**
11. **A Commission was established to investigate the space shuttle Challenger accident.**
12. **Judith Arlene Resnik was Mission Specialist Two.**
13. **Mission 51-L was originally scheduled for December 1985 but was delayed until January 1986.**
14. **The Corporation's Space Transportation Systems Division was responsible for the design and development of the Space Shuttle Orbiter.**
15. **Michael John Smith was the Pilot of the Challenger crew.**
16. **The Space Shuttle is composed of two recoverable Solid Rocket Boosters.**
17. **The Space Shuttle provides for the broadest possible spectrum of civil/military missions.**
18. **Mission 51-L consisted of placing one satellite in orbit, deploying and retrieving Spartan, and conducting six experiments.**
19. **The Space Shuttle became the focus of NASA's near-term future.**
20. **The Commission focused its attention on safety aspects of future flights.**
```
For any errors with the `RAG` process, check the following log:
```
docker logs -f trustgraph-graph-rag-1
```
### Custom RAG Queries
At any point, a RAG request can be generated and run with the following script:
```
scripts/query-graph-rag -q "RAG request here"
```
On the workbench, click Graph RAG and enter a question, e.g.
"What is this document about?"
### Shutting Down TrustGraph
When shutting down `TrustGraph`, it's best to shut down all Docker containers and volumes. Run the `docker compose down` command that corresponds to your model and graph store deployment:
```
docker compose -f tg-launch-<model-deployment>-<graph-store>.yaml down -v
docker compose -f docker-compose.yaml down -v -t 0
```
> [!TIP]
@ -460,3 +182,4 @@ docker compose -f tg-launch-<model-deployment>-<graph-store>.yaml down -v
> ```
> docker volume ls
> ```


@ -3,8 +3,10 @@
## Overview
If you want to interact with TrustGraph through APIs, there are 3
forms of API which may be of interest to you:
If you want to interact with TrustGraph through APIs, there are 4
forms of API which may be of interest to you. All four mechanisms
invoke the same underlying TrustGraph functionality but are made
available for integration in different ways:
### Pulsar APIs
@ -56,6 +58,31 @@ Cons:
using a basic REST API, particularly if you want to cover all of the error
scenarios well
### Python SDK API
The `trustgraph-base` package provides a Python SDK that wraps the underlying
service invocations in a convenient Python API.
Pros:
- Native Python integration with type hints and documentation
- Simplified service invocation without manual message handling
- Built-in error handling and response parsing
- Convenient for Python-based applications and scripts
Cons:
- Python-specific, not available for other programming languages
- Requires Python environment and trustgraph-base package installation
- Less control over low-level message handling
## Flow-hosted APIs
There are two types of APIs: flow-hosted APIs, which need a flow to be
running to operate, and non-flow-hosted APIs, which are core to the system
and can be seen as 'global' - they are not dependent on a flow running.
The Knowledge, Librarian, Config and Flow APIs fall into the latter
category.
## See also
- [TrustGraph websocket overview](websocket.md)
@ -64,9 +91,19 @@ Cons:
- [Text completion](api-text-completion.md)
- [Prompt completion](api-prompt.md)
- [Graph RAG](api-graph-rag.md)
- [Document RAG](api-document-rag.md)
- [Agent](api-agent.md)
- [Embeddings](api-embeddings.md)
- [Graph embeddings](api-graph-embeddings.md)
- [Document embeddings](api-document-embeddings.md)
- [Entity contexts](api-entity-contexts.md)
- [Triples query](api-triples-query.md)
- [Document load](api-document-load.md)
- [Text load](api-text-load.md)
- [Config](api-config.md)
- [Flow](api-flow.md)
- [Librarian](api-librarian.md)
- [Knowledge](api-knowledge.md)
- [Metrics](api-metrics.md)
- [Core import/export](api-core-import-export.md)


@ -18,7 +18,7 @@ The request contains the following fields:
### Response
The response contains the following fields:
- `thought`: Optional, a string, provides an interim agent thought
- `observation`: Optional, a string, provides an interim agent observation
- `answer`: Optional, a string, provides the final answer
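Since a streamed agent message typically carries only one of these optional fields, a client can dispatch on whichever is set. A sketch; the helper name is illustrative, not part of the SDK:

```python
def classify_agent_response(response):
    """Return (field, value) for whichever optional field a streamed
    agent response carries: an interim thought, an interim observation,
    or the final answer."""
    for field in ("thought", "observation", "answer"):
        if field in response:
            return field, response[field]
    return None, None

print(classify_agent_response({"answer": "NASA stands for ..."}))
```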
@ -61,6 +61,7 @@ Request:
{
"id": "blrqotfefnmnh7de-20",
"service": "agent",
"flow": "default",
"request": {
"question": "What does NASA stand for?"
}

docs/apis/api-config.md Normal file

@ -0,0 +1,261 @@
# TrustGraph Config API
This API provides centralized configuration management for TrustGraph components.
Configuration data is organized hierarchically by type and key, with support for
persistent storage and push notifications.
## Request/response
### Request
The request contains the following fields:
- `operation`: The operation to perform (`get`, `list`, `getvalues`, `put`, `delete`, `config`)
- `keys`: Array of ConfigKey objects (for `get`, `delete` operations)
- `type`: Configuration type (for `list`, `getvalues` operations)
- `values`: Array of ConfigValue objects (for `put` operation)
### Response
The response contains the following fields:
- `version`: Version number for tracking changes
- `values`: Array of ConfigValue objects returned by operations
- `directory`: Array of key names returned by `list` operation
- `config`: Full configuration map returned by `config` operation
- `error`: Error information if operation fails
## Operations
### PUT - Store Configuration Values
Request:
```json
{
"operation": "put",
"values": [
{
"type": "test",
"key": "key1",
"value": "value1"
}
]
}
```
Response:
```json
{
"version": 123
}
```
### GET - Retrieve Configuration Values
Request:
```json
{
"operation": "get",
"keys": [
{
"type": "test",
"key": "key1"
}
]
}
```
Response:
```json
{
"version": 123,
"values": [
{
"type": "test",
"key": "key1",
"value": "value1"
}
]
}
```
### LIST - List Keys by Type
Request:
```json
{
"operation": "list",
"type": "test"
}
```
Response:
```json
{
"version": 123,
"directory": ["key1", "key2", "key3"]
}
```
### GETVALUES - Get All Values by Type
Request:
```json
{
"operation": "getvalues",
"type": "test"
}
```
Response:
```json
{
"version": 123,
"values": [
{
"type": "test",
"key": "key1",
"value": "value1"
},
{
"type": "test",
"key": "key2",
"value": "value2"
}
]
}
```
### CONFIG - Get Entire Configuration
Request:
```json
{
"operation": "config"
}
```
Response:
```json
{
"version": 123,
"config": {
"test": {
"key1": "value1",
"key2": "value2"
}
}
}
```
### DELETE - Remove Configuration Values
Request:
```json
{
"operation": "delete",
"keys": [
{
"type": "test",
"key": "key1"
}
]
}
```
Response:
```json
{
"version": 124
}
```
## REST service
The REST service is available at `/api/v1/config` and accepts the above request formats.
## Websocket
Requests have a `request` object containing the operation fields.
Responses have a `response` object containing the response fields.
Request:
```json
{
"id": "unique-request-id",
"service": "config",
"request": {
"operation": "get",
"keys": [
{
"type": "test",
"key": "key1"
}
]
}
}
```
Response:
```json
{
"id": "unique-request-id",
"response": {
"version": 123,
"values": [
{
"type": "test",
"key": "key1",
"value": "value1"
}
]
},
"complete": true
}
```
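A minimal sketch of working with the WebSocket envelope: the caller chooses the `id`, wraps the operation fields in a `request` object, and unwraps the `response` object when a message with the matching `id` arrives. The sample response below is illustrative, not captured from a live system.

```python
import json

# Wrap a Config "get" operation in the WebSocket envelope. The id is
# chosen by the caller and echoed back in the response.
envelope = json.dumps({
    "id": "unique-request-id",
    "service": "config",
    "request": {
        "operation": "get",
        "keys": [{"type": "test", "key": "key1"}],
    },
})

# Unwrap a response of the documented shape:
msg = json.loads(
    '{"id": "unique-request-id",'
    ' "response": {"version": 123, "values": []},'
    ' "complete": true}'
)
if msg["complete"]:
    print(msg["response"]["version"])
```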
## Pulsar
The Pulsar schema for the Config API is defined in Python code here:
https://github.com/trustgraph-ai/trustgraph/blob/master/trustgraph-base/trustgraph/schema/config.py
Default request queue:
`non-persistent://tg/request/config`
Default response queue:
`non-persistent://tg/response/config`
Request schema:
`trustgraph.schema.ConfigRequest`
Response schema:
`trustgraph.schema.ConfigResponse`
## Python SDK
The Python SDK provides convenient access to the Config API:
```python
from trustgraph.api.config import ConfigClient
client = ConfigClient()
# Put a value
await client.put("test", "key1", "value1")
# Get a value
value = await client.get("test", "key1")
# List keys
keys = await client.list("test")
# Get all values for a type
values = await client.get_values("test")
```
## Features
- **Hierarchical Organization**: Configuration organized by type and key
- **Versioning**: Each operation returns a version number for change tracking
- **Persistent Storage**: Data stored in Cassandra for persistence
- **Push Notifications**: Configuration changes pushed to subscribers
- **Multiple Access Methods**: Available via Pulsar, REST, WebSocket, and Python SDK

# TrustGraph Core Import/Export API
This API provides bulk import and export capabilities for TrustGraph knowledge cores.
It handles efficient transfer of both RDF triples and graph embeddings using MessagePack
binary format for high-performance data exchange.
## Overview
The Core Import/Export API enables:
- **Bulk Import**: Import large knowledge cores from binary streams
- **Bulk Export**: Export knowledge cores as binary streams
- **Efficient Format**: Uses MessagePack for compact, fast serialization
- **Dual Data Types**: Handles both RDF triples and graph embeddings
- **Streaming**: Supports streaming for large datasets
## Import Endpoint
**Endpoint:** `POST /api/v1/import-core`
**Query Parameters:**
- `id`: Knowledge core identifier
- `user`: User identifier
**Content-Type:** `application/octet-stream`
**Request Body:** MessagePack-encoded binary stream
### Import Process
1. **Stream Processing**: Reads binary data in 128KB chunks
2. **MessagePack Decoding**: Unpacks binary data into structured messages
3. **Knowledge Storage**: Stores data via Knowledge API
4. **Response**: Returns success/error status
### Import Data Format
The import stream contains MessagePack-encoded tuples with type indicators:
#### Triples Data
```python
("t", {
"m": { # metadata
"i": "core-id",
"m": [], # metadata triples
"u": "user",
"c": "collection"
},
"t": [ # triples array
{
            "s": {"value": "subject", "is_uri": True},
            "p": {"value": "predicate", "is_uri": True},
            "o": {"value": "object", "is_uri": False}
}
]
})
```
#### Graph Embeddings Data
```python
("ge", {
"m": { # metadata
"i": "core-id",
"m": [], # metadata triples
"u": "user",
"c": "collection"
},
"e": [ # entities array
{
            "e": {"value": "entity", "is_uri": True},
"v": [[0.1, 0.2, 0.3]] # vectors
}
]
})
```
## Export Endpoint
**Endpoint:** `GET /api/v1/export-core`
**Query Parameters:**
- `id`: Knowledge core identifier
- `user`: User identifier
**Content-Type:** `application/octet-stream`
**Response Body:** MessagePack-encoded binary stream
### Export Process
1. **Knowledge Retrieval**: Fetches data via Knowledge API
2. **MessagePack Encoding**: Encodes data into binary format
3. **Streaming Response**: Sends data as binary stream
4. **Type Identification**: Uses type prefixes for data classification
## Usage Examples
### Import Knowledge Core
```bash
# Import from file
curl -X POST \
-H "Authorization: Bearer your-token" \
-H "Content-Type: application/octet-stream" \
--data-binary @knowledge-core.msgpack \
"http://api-gateway:8080/api/v1/import-core?id=core-123&user=alice"
```
### Export Knowledge Core
```bash
# Export to file
curl -H "Authorization: Bearer your-token" \
"http://api-gateway:8080/api/v1/export-core?id=core-123&user=alice" \
-o knowledge-core.msgpack
```
## Python Integration
### Import Example
```python
import msgpack
import requests
def import_knowledge_core(core_id, user, triples_data, embeddings_data, token):
# Prepare data
messages = []
# Add triples
if triples_data:
messages.append(("t", {
"m": {
"i": core_id,
"m": [],
"u": user,
"c": "default"
},
"t": triples_data
}))
# Add embeddings
if embeddings_data:
messages.append(("ge", {
"m": {
"i": core_id,
"m": [],
"u": user,
"c": "default"
},
"e": embeddings_data
}))
# Pack data
binary_data = b''.join(msgpack.packb(msg) for msg in messages)
# Upload
response = requests.post(
f"http://api-gateway:8080/api/v1/import-core?id={core_id}&user={user}",
headers={
"Authorization": f"Bearer {token}",
"Content-Type": "application/octet-stream"
},
data=binary_data
)
return response.status_code == 200
# Usage
triples = [
{
"s": {"value": "Person1", "is_uri": True},
"p": {"value": "hasName", "is_uri": True},
"o": {"value": "John Doe", "is_uri": False}
}
]
embeddings = [
{
"e": {"value": "Person1", "is_uri": True},
"v": [[0.1, 0.2, 0.3, 0.4]]
}
]
success = import_knowledge_core("core-123", "alice", triples, embeddings, "your-token")
```
### Export Example
```python
import msgpack
import requests
def export_knowledge_core(core_id, user, token):
response = requests.get(
f"http://api-gateway:8080/api/v1/export-core?id={core_id}&user={user}",
headers={"Authorization": f"Bearer {token}"}
)
if response.status_code != 200:
return None
# Decode MessagePack stream
data = response.content
unpacker = msgpack.Unpacker()
unpacker.feed(data)
triples = []
embeddings = []
for unpacked in unpacker:
msg_type, msg_data = unpacked
if msg_type == "t":
triples.extend(msg_data["t"])
elif msg_type == "ge":
embeddings.extend(msg_data["e"])
return {
"triples": triples,
"embeddings": embeddings
}
# Usage
data = export_knowledge_core("core-123", "alice", "your-token")
if data:
print(f"Exported {len(data['triples'])} triples")
print(f"Exported {len(data['embeddings'])} embeddings")
```
## Data Format Specification
### MessagePack Tuples
Each message is a tuple: `(type_indicator, data_object)`
**Type Indicators:**
- `"t"`: RDF triples data
- `"ge"`: Graph embeddings data
### Metadata Structure
```python
{
"i": "core-identifier", # ID
"m": [...], # Metadata triples array
"u": "user-identifier", # User
"c": "collection-name" # Collection
}
```
### Triple Structure
```python
{
"s": {"value": "subject", "is_uri": boolean},
"p": {"value": "predicate", "is_uri": boolean},
"o": {"value": "object", "is_uri": boolean}
}
```
### Entity Embedding Structure
```python
{
"e": {"value": "entity", "is_uri": boolean},
"v": [[float, float, ...]] # Array of vectors
}
```
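Putting the pieces above together, an importer assembles each message as a `(type_indicator, data_object)` tuple and dispatches on the indicator; `msgpack.packb(msg)` would then serialize each tuple for the wire. The sketch below builds one message of each type using illustrative values.

```python
# Assemble one message of each documented type as a Python tuple.
# (msgpack.packb(msg) would serialize each tuple for transfer.)
triples_msg = ("t", {
    "m": {"i": "core-123", "m": [], "u": "alice", "c": "default"},
    "t": [{
        "s": {"value": "Person1", "is_uri": True},
        "p": {"value": "hasName", "is_uri": True},
        "o": {"value": "John Doe", "is_uri": False},
    }],
})
embeddings_msg = ("ge", {
    "m": {"i": "core-123", "m": [], "u": "alice", "c": "default"},
    "e": [{
        "e": {"value": "Person1", "is_uri": True},
        "v": [[0.1, 0.2, 0.3]],
    }],
})

# Dispatch on the type indicator, as an importer would:
for kind, payload in (triples_msg, embeddings_msg):
    if kind == "t":
        print("triples:", len(payload["t"]))
    elif kind == "ge":
        print("embeddings:", len(payload["e"]))
```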
## Performance Characteristics
### Import Performance
- **Streaming**: Processes data in 128KB chunks
- **Memory Efficient**: Incremental unpacking
- **Concurrent**: Multiple imports can run simultaneously
- **Error Handling**: Robust error recovery
### Export Performance
- **Direct Streaming**: Data streamed directly from knowledge store
- **Efficient Encoding**: MessagePack for minimal overhead
- **Large Dataset Support**: Handles cores of any size
## Error Handling
### Import Errors
- **Format Errors**: Invalid MessagePack data
- **Type Errors**: Unknown type indicators
- **Storage Errors**: Knowledge API failures
- **Authentication**: Invalid user credentials
### Export Errors
- **Not Found**: Core ID doesn't exist
- **Access Denied**: User lacks permissions
- **System Errors**: Knowledge API failures
### Error Responses
- **HTTP 400**: Bad request (invalid parameters)
- **HTTP 401**: Unauthorized access
- **HTTP 404**: Core not found
- **HTTP 500**: Internal server error
## Use Cases
### Data Migration
- **System Upgrades**: Export/import during system migrations
- **Environment Sync**: Copy cores between environments
- **Backup/Restore**: Full knowledge core backup operations
### Batch Processing
- **Bulk Loading**: Load large knowledge datasets efficiently
- **Data Integration**: Merge knowledge from multiple sources
- **ETL Pipelines**: Extract-Transform-Load operations
### Performance Optimization
- **Faster Than REST**: Binary format reduces transfer time
- **Atomic Operations**: Complete import/export as single operation
- **Resource Efficient**: Minimal memory footprint during transfer
## Security Considerations
- **Authentication Required**: Bearer token authentication
- **User Isolation**: Access restricted to user's own cores
- **Data Validation**: Input validation on import
- **Audit Logging**: Operations logged for security auditing

# TrustGraph Document Embeddings API
This API provides import, export, and query capabilities for document embeddings. It handles
document chunks with their vector embeddings and metadata, supporting both real-time WebSocket
operations and request/response patterns.
## Schema Overview
### DocumentEmbeddings Structure
- `metadata`: Document metadata (ID, user, collection, RDF triples)
- `chunks`: Array of document chunks with embeddings
### ChunkEmbeddings Structure
- `chunk`: Text chunk as bytes
- `vectors`: Array of vector embeddings (Array of Array of Double)
### DocumentEmbeddingsRequest Structure
- `vectors`: Query vector embeddings
- `limit`: Maximum number of results
- `user`: User identifier
- `collection`: Collection identifier
### DocumentEmbeddingsResponse Structure
- `error`: Error information if operation fails
- `documents`: Array of matching documents as bytes
## Import/Export Operations
### Import - WebSocket Endpoint
**Endpoint:** `/api/v1/flow/{flow}/import/document-embeddings`
**Method:** WebSocket connection
**Request Format:**
```json
{
"metadata": {
"id": "doc-123",
"user": "alice",
"collection": "research",
"metadata": [
{
"s": {"v": "doc-123", "e": true},
"p": {"v": "dc:title", "e": true},
"o": {"v": "Research Paper", "e": false}
}
]
},
"chunks": [
{
"chunk": "This is the first chunk of the document...",
"vectors": [
[0.1, 0.2, 0.3, 0.4],
[0.5, 0.6, 0.7, 0.8]
]
},
{
"chunk": "This is the second chunk...",
"vectors": [
[0.9, 0.8, 0.7, 0.6],
[0.5, 0.4, 0.3, 0.2]
]
}
]
}
```
**Response:** Import operations are fire-and-forget with no response payload.
### Export - WebSocket Endpoint
**Endpoint:** `/api/v1/flow/{flow}/export/document-embeddings`
**Method:** WebSocket connection
The export endpoint streams document embeddings data in real-time. Each message contains:
```json
{
"metadata": {
"id": "doc-123",
"user": "alice",
"collection": "research",
"metadata": [
{
"s": {"v": "doc-123", "e": true},
"p": {"v": "dc:title", "e": true},
"o": {"v": "Research Paper", "e": false}
}
]
},
"chunks": [
{
"chunk": "Decoded text content of chunk",
"vectors": [[0.1, 0.2, 0.3, 0.4]]
}
]
}
```
## Query Operations
### Query Document Embeddings
**Purpose:** Find documents similar to provided vector embeddings
**Request:**
```json
{
"vectors": [
[0.1, 0.2, 0.3, 0.4, 0.5],
[0.6, 0.7, 0.8, 0.9, 1.0]
],
"limit": 10,
"user": "alice",
"collection": "research"
}
```
**Response:**
```json
{
"documents": [
"base64-encoded-document-1",
"base64-encoded-document-2"
]
}
```
## WebSocket Usage Examples
### Importing Document Embeddings
```javascript
// Connect to import endpoint
const ws = new WebSocket('ws://api-gateway:8080/api/v1/flow/my-flow/import/document-embeddings');
// Send document embeddings
ws.send(JSON.stringify({
metadata: {
id: "doc-123",
user: "alice",
collection: "research"
},
chunks: [
{
chunk: "Document content chunk 1",
vectors: [[0.1, 0.2, 0.3]]
}
]
}));
```
### Exporting Document Embeddings
```javascript
// Connect to export endpoint
const ws = new WebSocket('ws://api-gateway:8080/api/v1/flow/my-flow/export/document-embeddings');
// Listen for exported data
ws.onmessage = (event) => {
const documentEmbeddings = JSON.parse(event.data);
console.log('Received document:', documentEmbeddings.metadata.id);
console.log('Chunks:', documentEmbeddings.chunks.length);
};
```
## Data Format Details
### Metadata Format
Each metadata triple contains:
- `s`: Subject (object with `v` for value and `e` for is_entity boolean)
- `p`: Predicate (object with `v` for value and `e` for is_entity boolean)
- `o`: Object (object with `v` for value and `e` for is_entity boolean)
### Vector Format
- Vectors are arrays of floating-point numbers
- Each chunk can have multiple vectors (different embedding models)
- Vectors should be consistently dimensioned within a collection
### Text Encoding
- Chunk text is handled as UTF-8 encoded bytes internally
- WebSocket API accepts/returns plain text strings
- Base64 encoding used for binary data in query responses
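To illustrate the last point, a client decodes the base64-encoded documents in a query response back to text. The sample response below is constructed for illustration, not captured from a live system.

```python
import base64
import json

# Construct a sample query response of the documented shape, with the
# document bytes base64-encoded as the API returns them.
raw = json.dumps({
    "documents": [base64.b64encode(b"First matching chunk").decode()]
})

# Decode each document back to UTF-8 text:
response = json.loads(raw)
texts = [base64.b64decode(d).decode("utf-8") for d in response["documents"]]
print(texts[0])
```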
## Python SDK
```python
from trustgraph.clients.document_embeddings_client import DocumentEmbeddingsClient
# Create client
client = DocumentEmbeddingsClient()
# Query similar documents
request = {
"vectors": [[0.1, 0.2, 0.3, 0.4]],
"limit": 5,
"user": "alice",
"collection": "research"
}
response = await client.query(request)
documents = response.documents
```
## Integration with TrustGraph
### Storage Integration
- Document embeddings are stored in vector databases
- Metadata is cross-referenced with knowledge graph
- Supports multi-tenant isolation by user and collection
### Processing Pipeline
1. **Document Ingestion**: Text documents loaded via text-load API
2. **Chunking**: Documents split into manageable chunks
3. **Embedding Generation**: Vector embeddings created for each chunk
4. **Storage**: Embeddings stored via import API
5. **Retrieval**: Similar documents found via query API
### Use Cases
- **Semantic Search**: Find documents similar to query embeddings
- **RAG Systems**: Retrieve relevant document chunks for question answering
- **Document Clustering**: Group similar documents using embeddings
- **Content Recommendations**: Suggest related documents to users
- **Knowledge Discovery**: Find connections between document collections
## Error Handling
Common error scenarios:
- Invalid vector dimensions
- Missing required metadata fields
- User/collection access restrictions
- WebSocket connection failures
- Malformed JSON data
Errors are returned in the response `error` field:
```json
{
"error": {
"type": "ValidationError",
"message": "Invalid vector dimensions"
}
}
```
## Performance Considerations
- **Batch Processing**: Import multiple documents in single WebSocket session
- **Vector Dimensions**: Consistent embedding dimensions improve performance
- **Collection Sizing**: Limit collections to reasonable sizes for query performance
- **Real-time vs Batch**: Choose appropriate method based on use case requirements

# TrustGraph Document RAG API
This API presents a prompt to the Document RAG service and retrieves the answer.
It makes use of a number of other APIs behind the scenes:
Embeddings, Document Embeddings, Prompt, TextCompletion, Triples Query.
## Request/response
### Request
The request contains the following fields:
- `query`: The question to answer
### Response
The response contains the following fields:
- `response`: LLM response
## REST service
The REST service accepts a request object containing the `query` field.
The response is a JSON object containing the `response` field.
e.g.
Request:
```
{
"query": "What does NASA stand for?"
}
```
Response:
```
{
"response": "National Aeronautics and Space Administration"
}
```
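A sketch of building and parsing these payloads in Python; a client would POST the body to the gateway's document-rag endpoint. The response value here is the documented example, not live output.

```python
import json

# Build the request body; a client would POST this to the flow's
# document-rag REST endpoint on the API gateway.
request_body = json.dumps({"query": "What does NASA stand for?"})

# Parse the answer out of a response of the documented shape:
response = json.loads(
    '{"response": "National Aeronautics and Space Administration"}'
)
answer = response["response"]
print(answer)
```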
## Websocket
Requests have a `request` object containing the `query` field.
Responses have a `response` object containing `response` field.
e.g.
Request:
```
{
"id": "blrqotfefnmnh7de-14",
"service": "document-rag",
"flow": "default",
"request": {
"query": "What does NASA stand for?"
}
}
```
Response:
```
{
"id": "blrqotfefnmnh7de-14",
"response": {
"response": "National Aeronautics and Space Administration"
},
"complete": true
}
```
## Pulsar
The Pulsar schema for the Document RAG API is defined in Python code here:
https://github.com/trustgraph-ai/trustgraph/blob/master/trustgraph-base/trustgraph/schema/retrieval.py
Default request queue:
`non-persistent://tg/request/document-rag`
Default response queue:
`non-persistent://tg/response/document-rag`
Request schema:
`trustgraph.schema.DocumentRagQuery`
Response schema:
`trustgraph.schema.DocumentRagResponse`
## Pulsar Python client
The client class is
`trustgraph.clients.DocumentRagClient`
https://github.com/trustgraph-ai/trustgraph/blob/master/trustgraph-base/trustgraph/clients/document_rag_client.py

@ -10,7 +10,7 @@ The request contains the following fields:
### Response
The request contains the following fields:
The response contains the following fields:
- `vectors`: Embeddings response, an array of arrays. An embedding is
an array of floating-point numbers. As multiple embeddings may be
returned, an array of embeddings is returned, hence an array
@ -51,6 +51,7 @@ Request:
{
"id": "qgzw1287vfjc8wsk-2",
"service": "embeddings",
"flow": "default",
"request": {
"text": "What is a cat?"
}

# TrustGraph Entity Contexts API
This API provides import and export capabilities for entity contexts data. Entity contexts
associate entities with their textual context information, commonly used for entity
descriptions, definitions, or explanatory text in knowledge graphs.
## Schema Overview
### EntityContext Structure
- `entity`: Entity identifier (Value object with value, is_uri, type)
- `context`: Textual context or description string
### EntityContexts Structure
- `metadata`: Metadata including ID, user, collection, and RDF triples
- `entities`: Array of EntityContext objects
### Value Structure
- `value`: The entity value as string
- `is_uri`: Boolean indicating if the value is a URI
- `type`: Data type of the value (optional)
## Import/Export Operations
### Import - WebSocket Endpoint
**Endpoint:** `/api/v1/flow/{flow}/import/entity-contexts`
**Method:** WebSocket connection
**Request Format:**
```json
{
"metadata": {
"id": "context-batch-123",
"user": "alice",
"collection": "research",
"metadata": [
{
"s": {"value": "source-doc", "is_uri": true},
"p": {"value": "dc:title", "is_uri": true},
"o": {"value": "Research Paper", "is_uri": false}
}
]
},
"entities": [
{
"entity": {
"v": "https://example.com/Person/JohnDoe",
"e": true
},
"context": "John Doe is a researcher at MIT specializing in artificial intelligence and machine learning."
},
{
"entity": {
"v": "https://example.com/Organization/MIT",
"e": true
},
"context": "Massachusetts Institute of Technology (MIT) is a private research university in Cambridge, Massachusetts."
},
{
"entity": {
"v": "machine learning",
"e": false
},
"context": "Machine learning is a method of data analysis that automates analytical model building using algorithms."
}
]
}
```
**Response:** Import operations are fire-and-forget with no response payload.
### Export - WebSocket Endpoint
**Endpoint:** `/api/v1/flow/{flow}/export/entity-contexts`
**Method:** WebSocket connection
The export endpoint streams entity contexts data in real-time. Each message contains:
```json
{
"metadata": {
"id": "context-batch-123",
"user": "alice",
"collection": "research",
"metadata": [
{
"s": {"value": "source-doc", "is_uri": true},
"p": {"value": "dc:title", "is_uri": true},
"o": {"value": "Research Paper", "is_uri": false}
}
]
},
"entities": [
{
"entity": {
"v": "https://example.com/Person/JohnDoe",
"e": true
},
"context": "John Doe is a researcher at MIT specializing in artificial intelligence."
}
]
}
```
## WebSocket Usage Examples
### Importing Entity Contexts
```javascript
// Connect to import endpoint
const ws = new WebSocket('ws://api-gateway:8080/api/v1/flow/my-flow/import/entity-contexts');
// Send entity contexts
ws.send(JSON.stringify({
metadata: {
id: "context-batch-1",
user: "alice",
collection: "research"
},
entities: [
{
entity: {
v: "Albert Einstein",
e: false
},
context: "Albert Einstein was a German-born theoretical physicist widely acknowledged to be one of the greatest physicists of all time."
}
]
}));
```
### Exporting Entity Contexts
```javascript
// Connect to export endpoint
const ws = new WebSocket('ws://api-gateway:8080/api/v1/flow/my-flow/export/entity-contexts');
// Listen for exported data
ws.onmessage = (event) => {
const entityContexts = JSON.parse(event.data);
console.log('Received contexts for', entityContexts.entities.length, 'entities');
entityContexts.entities.forEach(item => {
console.log('Entity:', item.entity.v);
console.log('Context:', item.context);
});
};
```
## Data Format Details
### Entity Format
The `entity` field uses the Value structure:
- `v`: The entity value (name, URI, identifier)
- `e`: Boolean indicating if it's a URI entity (true) or literal (false)
- `type`: Optional data type specification
### Context Format
- Plain text string providing description or context
- Can include definitions, explanations, or background information
- Supports multi-sentence descriptions and detailed context
### Metadata Format
Each metadata triple contains:
- `s`: Subject (object with `value` and `is_uri` fields)
- `p`: Predicate (object with `value` and `is_uri` fields)
- `o`: Object (object with `value` and `is_uri` fields)
## Integration with TrustGraph
### Storage Integration
- Entity contexts are stored in graph databases
- Links entities to their descriptive text
- Supports multi-tenant isolation by user and collection
### Processing Pipeline
1. **Text Analysis**: Extract entities from documents
2. **Context Extraction**: Identify descriptive text for entities
3. **Entity Linking**: Associate entities with their contexts
4. **Import**: Store entity-context pairs via import API
5. **Knowledge Enhancement**: Use contexts for better entity understanding
### Use Cases
- **Entity Disambiguation**: Provide context to distinguish similar entities
- **Knowledge Base Enhancement**: Add descriptive information to entities
- **Question Answering**: Use entity contexts to provide detailed answers
- **Entity Summarization**: Generate summaries based on collected contexts
- **Knowledge Graph Visualization**: Display rich entity information
## Authentication
Both import and export endpoints support authentication:
- API token authentication via Authorization header
- Flow-based access control
- User and collection isolation
## Error Handling
Common error scenarios:
- Invalid JSON format
- Missing required metadata fields
- User/collection access restrictions
- WebSocket connection failures
- Invalid entity value formats
Errors are typically handled at the WebSocket connection level with connection termination or error messages.
## Performance Considerations
- **Batch Processing**: Import multiple entity contexts in single messages
- **Context Length**: Balance detailed context with performance
- **Flow Capacity**: Ensure target flow can handle entity context volume
- **Real-time vs Batch**: Choose appropriate method based on use case
## Python Integration
There is no dedicated Python client for this API, but integration can be achieved with a WebSocket client library:
```python
import websocket
import json
# Connect to import endpoint
def import_entity_contexts(flow_id, contexts_data):
ws_url = f"ws://api-gateway:8080/api/v1/flow/{flow_id}/import/entity-contexts"
ws = websocket.create_connection(ws_url)
# Send data
ws.send(json.dumps(contexts_data))
ws.close()
# Usage example
contexts = {
"metadata": {
"id": "batch-1",
"user": "alice",
"collection": "research"
},
"entities": [
{
"entity": {"v": "Neural Networks", "e": False},
"context": "Neural networks are computing systems inspired by biological neural networks."
}
]
}
import_entity_contexts("my-flow", contexts)
```
## Features
- **Real-time Streaming**: WebSocket-based import/export for live data flow
- **Batch Operations**: Process multiple entity contexts efficiently
- **Rich Metadata**: Full metadata support with RDF triples
- **Entity Types**: Support for both URI entities and literal values
- **Flow Integration**: Direct integration with TrustGraph processing flows
- **Multi-tenant Support**: User and collection-based data isolation

docs/apis/api-flow.md
# TrustGraph Flow API
This API provides workflow management for TrustGraph components. It manages flow classes
(workflow templates) and flow instances (active running workflows) that orchestrate
complex data processing pipelines.
## Request/response
### Request
The request contains the following fields:
- `operation`: The operation to perform (see operations below)
- `class_name`: Flow class name (for class operations and start-flow)
- `class_definition`: Flow class definition JSON (for put-class)
- `description`: Flow description (for start-flow)
- `flow_id`: Flow instance ID (for flow instance operations)
### Response
The response contains the following fields:
- `class_names`: Array of flow class names (returned by list-classes)
- `flow_ids`: Array of active flow IDs (returned by list-flows)
- `class_definition`: Flow class definition JSON (returned by get-class)
- `flow`: Flow instance JSON (returned by get-flow)
- `description`: Flow description (returned by get-flow)
- `error`: Error information if operation fails
## Operations
### Flow Class Operations
#### LIST-CLASSES - List All Flow Classes
Request:
```json
{
"operation": "list-classes"
}
```
Response:
```json
{
"class_names": ["pdf-processor", "text-analyzer", "knowledge-extractor"]
}
```
#### GET-CLASS - Get Flow Class Definition
Request:
```json
{
"operation": "get-class",
"class_name": "pdf-processor"
}
```
Response:
```json
{
"class_definition": "{\"interfaces\": {\"text-completion\": {\"request\": \"persistent://tg/request/text-completion\", \"response\": \"persistent://tg/response/text-completion\"}}, \"description\": \"PDF processing workflow\"}"
}
```
#### PUT-CLASS - Create/Update Flow Class
Request:
```json
{
"operation": "put-class",
"class_name": "pdf-processor",
"class_definition": "{\"interfaces\": {\"text-completion\": {\"request\": \"persistent://tg/request/text-completion\", \"response\": \"persistent://tg/response/text-completion\"}}, \"description\": \"PDF processing workflow\"}"
}
```
Response:
```json
{}
```
#### DELETE-CLASS - Remove Flow Class
Request:
```json
{
"operation": "delete-class",
"class_name": "pdf-processor"
}
```
Response:
```json
{}
```
### Flow Instance Operations
#### LIST-FLOWS - List Active Flow Instances
Request:
```json
{
"operation": "list-flows"
}
```
Response:
```json
{
"flow_ids": ["flow-123", "flow-456", "flow-789"]
}
```
#### GET-FLOW - Get Flow Instance
Request:
```json
{
"operation": "get-flow",
"flow_id": "flow-123"
}
```
Response:
```json
{
"flow": "{\"interfaces\": {\"text-completion\": {\"request\": \"persistent://tg/request/text-completion-flow-123\", \"response\": \"persistent://tg/response/text-completion-flow-123\"}}}",
"description": "PDF processing workflow instance"
}
```
#### START-FLOW - Start Flow Instance
Request:
```json
{
"operation": "start-flow",
"class_name": "pdf-processor",
"flow_id": "flow-123",
"description": "Processing document batch 1"
}
```
Response:
```json
{}
```
#### STOP-FLOW - Stop Flow Instance
Request:
```json
{
"operation": "stop-flow",
"flow_id": "flow-123"
}
```
Response:
```json
{}
```
## REST service
The REST service is available at `/api/v1/flow` and accepts the above request formats.
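One detail worth noting from the examples above: `class_definition` is a JSON *string*, not a nested object, so the definition is encoded twice — once into the string, then as part of the surrounding request. A sketch, assuming the documented payload shapes:

```python
import json

# The flow class definition as a plain Python structure.
definition = {
    "interfaces": {
        "text-completion": {
            "request": "persistent://tg/request/text-completion",
            "response": "persistent://tg/response/text-completion",
        }
    },
    "description": "PDF processing workflow",
}

# Encode twice: the definition becomes a JSON string inside the request.
request = {
    "operation": "put-class",
    "class_name": "pdf-processor",
    "class_definition": json.dumps(definition),
}
body = json.dumps(request)

# Reading it back requires the inverse double-decode:
decoded = json.loads(json.loads(body)["class_definition"])
print(decoded["description"])
```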
## Websocket
Requests have a `request` object containing the operation fields.
Responses have a `response` object containing the response fields.
Request:
```json
{
"id": "unique-request-id",
"service": "flow",
"request": {
"operation": "list-classes"
}
}
```
Response:
```json
{
"id": "unique-request-id",
"response": {
"class_names": ["pdf-processor", "text-analyzer"]
},
"complete": true
}
```
## Pulsar
The Pulsar schema for the Flow API is defined in Python code here:
https://github.com/trustgraph-ai/trustgraph/blob/master/trustgraph-base/trustgraph/schema/flows.py
Default request queue:
`non-persistent://tg/request/flow`
Default response queue:
`non-persistent://tg/response/flow`
Request schema:
`trustgraph.schema.FlowRequest`
Response schema:
`trustgraph.schema.FlowResponse`
## Python SDK
The Python SDK provides convenient access to the Flow API:
```python
from trustgraph.api.flow import FlowClient
client = FlowClient()
# List all flow classes
classes = await client.list_classes()
# Get a flow class definition
definition = await client.get_class("pdf-processor")
# Start a flow instance
await client.start_flow("pdf-processor", "flow-123", "Processing batch 1")
# List active flows
flows = await client.list_flows()
# Stop a flow instance
await client.stop_flow("flow-123")
```
## Features
- **Flow Classes**: Templates that define workflow structure and interfaces
- **Flow Instances**: Active running workflows based on flow classes
- **Dynamic Management**: Flows can be started/stopped dynamically
- **Template Processing**: Uses template replacement for customizing flow instances
- **Integration**: Works with TrustGraph ecosystem for data processing pipelines
- **Persistent Storage**: Flow definitions and instances stored for reliability
## Use Cases
- **Document Processing**: Orchestrating PDF processing through chunking, extraction, and storage
- **Knowledge Extraction**: Managing workflows for relationship and definition extraction
- **Data Pipelines**: Coordinating complex multi-step data processing workflows
- **Resource Management**: Dynamically scaling processing flows based on demand

@ -17,7 +17,7 @@ The request contains the following fields:
### Response
The request contains the following fields:
The response contains the following fields:
- `entities`: An array of graph entities. The entity type is described here:
TrustGraph uses the same schema for knowledge graph elements:
@ -85,6 +85,7 @@ Request:
{
"id": "qgzw1287vfjc8wsk-3",
"service": "graph-embeddings-query",
"flow": "default",
"request": {
"vectors": [
[

@ -14,7 +14,7 @@ The request contains the following fields:
### Response
The request contains the following fields:
The response contains the following fields:
- `response`: LLM response
## REST service
@ -52,6 +52,7 @@ Request:
{
"id": "blrqotfefnmnh7de-14",
"service": "graph-rag",
"flow": "default",
"request": {
"query": "What does NASA stand for?"
}

docs/apis/api-knowledge.md
# TrustGraph Knowledge API
This API provides knowledge graph management for TrustGraph. It handles storage, retrieval,
and flow integration of knowledge cores containing RDF triples and graph embeddings with
multi-tenant support.
## Request/response
### Request
The request contains the following fields:
- `operation`: The operation to perform (see operations below)
- `user`: User identifier (for user-specific operations)
- `id`: Knowledge core identifier
- `flow`: Flow identifier (for load operations)
- `collection`: Collection identifier (for load operations)
- `triples`: RDF triples data (for put operations)
- `graph_embeddings`: Graph embeddings data (for put operations)
### Response
The response contains the following fields:
- `error`: Error information if operation fails
- `ids`: Array of knowledge core IDs (returned by list operation)
- `eos`: End of stream indicator for streaming responses
- `triples`: RDF triples data (returned by get operation)
- `graph_embeddings`: Graph embeddings data (returned by get operation)
## Operations
### PUT-KG-CORE - Store Knowledge Core
Request:
```json
{
"operation": "put-kg-core",
"user": "alice",
"id": "core-123",
"triples": {
"metadata": {
"id": "core-123",
"user": "alice",
"collection": "research"
},
"triples": [
{
"s": {"value": "Person1", "is_uri": true},
"p": {"value": "hasName", "is_uri": true},
"o": {"value": "John Doe", "is_uri": false}
},
{
"s": {"value": "Person1", "is_uri": true},
"p": {"value": "worksAt", "is_uri": true},
"o": {"value": "Company1", "is_uri": true}
}
]
},
"graph_embeddings": {
"metadata": {
"id": "core-123",
"user": "alice",
"collection": "research"
},
"entities": [
{
"entity": {"value": "Person1", "is_uri": true},
"vectors": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
}
]
}
}
```
Response:
```json
{}
```
### GET-KG-CORE - Retrieve Knowledge Core
Request:
```json
{
"operation": "get-kg-core",
"id": "core-123"
}
```
Response:
```json
{
"triples": {
"metadata": {
"id": "core-123",
"user": "alice",
"collection": "research"
},
"triples": [
{
"s": {"value": "Person1", "is_uri": true},
"p": {"value": "hasName", "is_uri": true},
"o": {"value": "John Doe", "is_uri": false}
}
]
},
"graph_embeddings": {
"metadata": {
"id": "core-123",
"user": "alice",
"collection": "research"
},
"entities": [
{
"entity": {"value": "Person1", "is_uri": true},
"vectors": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
}
]
}
}
```
### LIST-KG-CORES - List Knowledge Cores
Request:
```json
{
"operation": "list-kg-cores",
"user": "alice"
}
```
Response:
```json
{
"ids": ["core-123", "core-456", "core-789"]
}
```
### DELETE-KG-CORE - Delete Knowledge Core
Request:
```json
{
"operation": "delete-kg-core",
"user": "alice",
"id": "core-123"
}
```
Response:
```json
{}
```
### LOAD-KG-CORE - Load Knowledge Core into Flow
Request:
```json
{
"operation": "load-kg-core",
"id": "core-123",
"flow": "qa-flow",
"collection": "research"
}
```
Response:
```json
{}
```
### UNLOAD-KG-CORE - Unload Knowledge Core from Flow
Request:
```json
{
"operation": "unload-kg-core",
"id": "core-123"
}
```
Response:
```json
{}
```
## Data Structures
### Triple Structure
Each RDF triple contains:
- `s`: Subject (Value object)
- `p`: Predicate (Value object)
- `o`: Object (Value object)
### Value Structure
- `value`: The actual value as string
- `is_uri`: Boolean indicating if value is a URI
- `type`: Data type of the value (optional)
### Triples Structure
- `metadata`: Metadata including ID, user, collection
- `triples`: Array of Triple objects
### Graph Embeddings Structure
- `metadata`: Metadata including ID, user, collection
- `entities`: Array of EntityEmbeddings objects
### Entity Embeddings Structure
- `entity`: The entity being embedded (Value object)
- `vectors`: Array of vector embeddings (Array of Array of Double)
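The structures above can be assembled with a few small helpers. This is an illustrative sketch only (the `value`, `triple`, and `knowledge_core` helper names are not part of any TrustGraph SDK); the field layout follows the JSON examples earlier in this document:

```python
# Illustrative helpers for the data structures described above.

def value(v, is_uri=False):
    """Build a Value object."""
    return {"value": v, "is_uri": is_uri}

def triple(s, p, o):
    """Build a Triple from three Value objects."""
    return {"s": s, "p": p, "o": o}

def knowledge_core(core_id, user, collection, triples, entities):
    """Assemble the triples and graph-embeddings halves of a core,
    sharing one metadata block as in the PUT-KG-CORE example."""
    metadata = {"id": core_id, "user": user, "collection": collection}
    return {
        "triples": {"metadata": metadata, "triples": triples},
        "graph_embeddings": {"metadata": metadata, "entities": entities},
    }

core = knowledge_core(
    "core-123", "alice", "research",
    triples=[triple(value("Person1", True), value("hasName", True), value("John Doe"))],
    entities=[{"entity": value("Person1", True), "vectors": [[0.1, 0.2, 0.3]]}],
)
```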
## REST service
The REST service is available at `/api/v1/knowledge` and accepts the above request formats.
## Websocket
Requests have a `request` object containing the operation fields.
Responses have a `response` object containing the response fields.
Request:
```json
{
"id": "unique-request-id",
"service": "knowledge",
"request": {
"operation": "list-kg-cores",
"user": "alice"
}
}
```
Response:
```json
{
"id": "unique-request-id",
"response": {
"ids": ["core-123", "core-456"]
},
"complete": true
}
```
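As a sketch of how a client might use this envelope, the following builds a request and matches a response by `id`. The helper names here are hypothetical, and the transport itself (e.g. a websocket library) is omitted:

```python
import json
import uuid

def make_ws_request(service, request_fields):
    """Wrap operation fields in the websocket envelope shown above."""
    return {
        "id": str(uuid.uuid4()),   # unique request identifier
        "service": service,
        "request": request_fields,
    }

def match_response(request, message):
    """Return the response payload if this message answers the request."""
    msg = json.loads(message)
    if msg.get("id") == request["id"] and msg.get("complete"):
        return msg["response"]
    return None

req = make_ws_request("knowledge", {"operation": "list-kg-cores", "user": "alice"})
reply = json.dumps({"id": req["id"], "response": {"ids": ["core-123"]}, "complete": True})
print(match_response(req, reply))   # prints {'ids': ['core-123']}
```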
## Pulsar
The Pulsar schema for the Knowledge API is defined in Python code here:
https://github.com/trustgraph-ai/trustgraph/blob/master/trustgraph-base/trustgraph/schema/knowledge.py
Default request queue:
`non-persistent://tg/request/knowledge`
Default response queue:
`non-persistent://tg/response/knowledge`
Request schema:
`trustgraph.schema.KnowledgeRequest`
Response schema:
`trustgraph.schema.KnowledgeResponse`
## Python SDK
The Python SDK provides convenient access to the Knowledge API:
```python
from trustgraph.api.knowledge import KnowledgeClient
client = KnowledgeClient()
# List knowledge cores
cores = await client.list_kg_cores("alice")
# Get a knowledge core
core = await client.get_kg_core("core-123")
# Store a knowledge core
await client.put_kg_core(
user="alice",
id="core-123",
triples=triples_data,
graph_embeddings=embeddings_data
)
# Load core into flow
await client.load_kg_core("core-123", "qa-flow", "research")
# Delete a knowledge core
await client.delete_kg_core("alice", "core-123")
```
## Features
- **Knowledge Core Management**: Store, retrieve, list, and delete knowledge cores
- **Dual Data Types**: Support for both RDF triples and graph embeddings
- **Flow Integration**: Load knowledge cores into processing flows
- **Multi-tenant Support**: User-specific knowledge cores with isolation
- **Streaming Support**: Efficient transfer of large knowledge cores
- **Collection Organization**: Group knowledge cores by collection
- **Semantic Reasoning**: RDF triples enable symbolic reasoning
- **Vector Similarity**: Graph embeddings enable neural approaches
## Use Cases
- **Knowledge Base Construction**: Build semantic knowledge graphs from documents
- **Question Answering**: Load knowledge cores for graph-based QA systems
- **Semantic Search**: Use embeddings for similarity-based knowledge retrieval
- **Multi-domain Knowledge**: Organize knowledge by user and collection
- **Hybrid Reasoning**: Combine symbolic (triples) and neural (embeddings) approaches
- **Knowledge Transfer**: Export and import knowledge cores between systems

docs/apis/api-librarian.md Normal file

@@ -0,0 +1,360 @@
# TrustGraph Librarian API
This API provides document library management for TrustGraph. It handles document storage,
metadata management, and processing orchestration using hybrid storage (MinIO for content,
Cassandra for metadata) with multi-user support.
## Request/response
### Request
The request contains the following fields:
- `operation`: The operation to perform (see operations below)
- `document_id`: Document identifier (for document operations)
- `document_metadata`: Document metadata object (for add/update operations)
- `content`: Document content as base64-encoded bytes (for add operations)
- `processing_id`: Processing job identifier (for processing operations)
- `processing_metadata`: Processing metadata object (for add-processing)
- `user`: User identifier (required for most operations)
- `collection`: Collection filter (optional for list operations)
- `criteria`: Query criteria array (for filtering operations)
### Response
The response contains the following fields:
- `error`: Error information if operation fails
- `document_metadata`: Single document metadata (for get operations)
- `content`: Document content as base64-encoded bytes (for get-content)
- `document_metadatas`: Array of document metadata (for list operations)
- `processing_metadatas`: Array of processing metadata (for list-processing)
## Document Operations
### ADD-DOCUMENT - Add Document to Library
Request:
```json
{
"operation": "add-document",
"document_metadata": {
"id": "doc-123",
"time": 1640995200000,
"kind": "application/pdf",
"title": "Research Paper",
"comments": "Important research findings",
"user": "alice",
"tags": ["research", "ai", "machine-learning"],
"metadata": [
{
"subject": "doc-123",
"predicate": "dc:creator",
"object": "Dr. Smith"
}
]
},
"content": "JVBERi0xLjQKMSAwIG9iago8PAovVHlwZSAvQ2F0YWxvZwovUGFnZXMgMiAwIFIKPj4KZW5kb2JqCg=="
}
```
Response:
```json
{}
```
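As an illustration of how the `content` field is prepared, the sketch below base64-encodes raw document bytes and assembles an add-document request. The `add_document_request` helper is hypothetical, not part of the SDK:

```python
import base64
import time

def add_document_request(doc_id, title, raw_bytes, user, tags=None,
                         kind="application/pdf"):
    """Build an add-document request; content is base64-encoded bytes."""
    return {
        "operation": "add-document",
        "document_metadata": {
            "id": doc_id,
            "time": int(time.time() * 1000),   # milliseconds since epoch
            "kind": kind,
            "title": title,
            "user": user,
            "tags": tags or [],
            "metadata": [],
        },
        "content": base64.b64encode(raw_bytes).decode("ascii"),
    }

req = add_document_request("doc-123", "Research Paper",
                           b"%PDF-1.4 ...", "alice", ["research"])
```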
### GET-DOCUMENT-METADATA - Get Document Metadata
Request:
```json
{
"operation": "get-document-metadata",
"document_id": "doc-123",
"user": "alice"
}
```
Response:
```json
{
"document_metadata": {
"id": "doc-123",
"time": 1640995200000,
"kind": "application/pdf",
"title": "Research Paper",
"comments": "Important research findings",
"user": "alice",
"tags": ["research", "ai", "machine-learning"],
"metadata": [
{
"subject": "doc-123",
"predicate": "dc:creator",
"object": "Dr. Smith"
}
]
}
}
```
### GET-DOCUMENT-CONTENT - Get Document Content
Request:
```json
{
"operation": "get-document-content",
"document_id": "doc-123",
"user": "alice"
}
```
Response:
```json
{
"content": "JVBERi0xLjQKMSAwIG9iago8PAovVHlwZSAvQ2F0YWxvZwovUGFnZXMgMiAwIFIKPj4KZW5kb2JqCg=="
}
```
### LIST-DOCUMENTS - List User's Documents
Request:
```json
{
"operation": "list-documents",
"user": "alice",
"collection": "research"
}
```
Response:
```json
{
"document_metadatas": [
{
"id": "doc-123",
"time": 1640995200000,
"kind": "application/pdf",
"title": "Research Paper",
"comments": "Important research findings",
"user": "alice",
"tags": ["research", "ai"]
},
{
"id": "doc-124",
"time": 1640995300000,
"kind": "text/plain",
"title": "Meeting Notes",
"comments": "Team meeting discussion",
"user": "alice",
"tags": ["meeting", "notes"]
}
]
}
```
### UPDATE-DOCUMENT - Update Document Metadata
Request:
```json
{
"operation": "update-document",
"document_metadata": {
"id": "doc-123",
"title": "Updated Research Paper",
"comments": "Updated findings and conclusions",
"user": "alice",
"tags": ["research", "ai", "machine-learning", "updated"]
}
}
```
Response:
```json
{}
```
### REMOVE-DOCUMENT - Remove Document
Request:
```json
{
"operation": "remove-document",
"document_id": "doc-123",
"user": "alice"
}
```
Response:
```json
{}
```
## Processing Operations
### ADD-PROCESSING - Start Document Processing
Request:
```json
{
"operation": "add-processing",
"processing_metadata": {
"id": "proc-456",
"document_id": "doc-123",
"time": 1640995400000,
"flow": "pdf-extraction",
"user": "alice",
"collection": "research",
"tags": ["extraction", "nlp"]
}
}
```
Response:
```json
{}
```
### LIST-PROCESSING - List Processing Jobs
Request:
```json
{
"operation": "list-processing",
"user": "alice",
"collection": "research"
}
```
Response:
```json
{
"processing_metadatas": [
{
"id": "proc-456",
"document_id": "doc-123",
"time": 1640995400000,
"flow": "pdf-extraction",
"user": "alice",
"collection": "research",
"tags": ["extraction", "nlp"]
}
]
}
```
### REMOVE-PROCESSING - Stop Processing Job
Request:
```json
{
"operation": "remove-processing",
"processing_id": "proc-456",
"user": "alice"
}
```
Response:
```json
{}
```
## REST service
The REST service is available at `/api/v1/librarian` and accepts the above request formats.
## Websocket
Requests have a `request` object containing the operation fields.
Responses have a `response` object containing the response fields.
Request:
```json
{
"id": "unique-request-id",
"service": "librarian",
"request": {
"operation": "list-documents",
"user": "alice"
}
}
```
Response:
```json
{
"id": "unique-request-id",
"response": {
"document_metadatas": [...]
},
"complete": true
}
```
## Pulsar
The Pulsar schema for the Librarian API is defined in Python code here:
https://github.com/trustgraph-ai/trustgraph/blob/master/trustgraph-base/trustgraph/schema/library.py
Default request queue:
`non-persistent://tg/request/librarian`
Default response queue:
`non-persistent://tg/response/librarian`
Request schema:
`trustgraph.schema.LibrarianRequest`
Response schema:
`trustgraph.schema.LibrarianResponse`
## Python SDK
The Python SDK provides convenient access to the Librarian API:
```python
from trustgraph.api.library import LibrarianClient
client = LibrarianClient()
# Add a document
with open("document.pdf", "rb") as f:
content = f.read()
await client.add_document(
doc_id="doc-123",
title="Research Paper",
content=content,
user="alice",
tags=["research", "ai"]
)
# Get document metadata
metadata = await client.get_document_metadata("doc-123", "alice")
# List documents
documents = await client.list_documents("alice", collection="research")
# Start processing
await client.add_processing(
processing_id="proc-456",
document_id="doc-123",
flow="pdf-extraction",
user="alice"
)
```
## Features
- **Hybrid Storage**: MinIO for content, Cassandra for metadata
- **Multi-user Support**: User-based document ownership and access control
- **Rich Metadata**: RDF-style metadata triples and tagging system
- **Processing Integration**: Automatic triggering of document processing workflows
- **Content Types**: Support for multiple document formats (PDF, text, etc.)
- **Collection Management**: Optional document grouping by collection
- **Metadata Search**: Query documents by metadata criteria
## Use Cases
- **Document Management**: Store and organize documents with rich metadata
- **Knowledge Extraction**: Process documents to extract structured knowledge
- **Research Libraries**: Manage collections of research papers and documents
- **Content Processing**: Orchestrate document processing workflows
- **Multi-tenant Systems**: Support multiple users with isolated document libraries

docs/apis/api-metrics.md Normal file

@@ -0,0 +1,313 @@
# TrustGraph Metrics API
This API provides access to TrustGraph system metrics through a Prometheus proxy endpoint.
It allows authenticated access to monitoring and observability data from the TrustGraph
system components.
## Overview
The Metrics API is implemented as a proxy to a Prometheus metrics server, providing:
- System performance metrics
- Service health information
- Resource utilization data
- Request/response statistics
- Error rates and latency metrics
## Authentication
All metrics endpoints require Bearer token authentication:
```
Authorization: Bearer <your-api-token>
```
Unauthorized requests return HTTP 401.
## Endpoint
**Base Path:** `/api/metrics`
**Method:** GET
**Description:** Proxies requests to the underlying Prometheus API
## Usage Examples
### Query Current Metrics
```bash
# Get all available metrics
curl -H "Authorization: Bearer your-token" \
"http://api-gateway:8080/api/metrics/query?query=up"
# Get specific metric with time range
curl -H "Authorization: Bearer your-token" \
"http://api-gateway:8080/api/metrics/query_range?query=cpu_usage&start=1640995200&end=1640998800&step=60"
# Get metric metadata
curl -H "Authorization: Bearer your-token" \
"http://api-gateway:8080/api/metrics/metadata"
```
### Common Prometheus API Endpoints
The metrics API supports all standard Prometheus API endpoints:
#### Instant Queries
```
GET /api/metrics/query?query=<prometheus_query>
```
#### Range Queries
```
GET /api/metrics/query_range?query=<query>&start=<timestamp>&end=<timestamp>&step=<duration>
```
#### Metadata
```
GET /api/metrics/metadata
GET /api/metrics/metadata?metric=<metric_name>
```
#### Series
```
GET /api/metrics/series?match[]=<series_selector>
```
#### Label Values
```
GET /api/metrics/label/<label_name>/values
```
#### Targets
```
GET /api/metrics/targets
```
## Example Queries
### System Health
```bash
# Check if services are up
curl -H "Authorization: Bearer token" \
"http://api-gateway:8080/api/metrics/query?query=up"
# Get service uptime
curl -H "Authorization: Bearer token" \
"http://api-gateway:8080/api/metrics/query?query=time()-process_start_time_seconds"
```
### Performance Metrics
```bash
# CPU usage
curl -H "Authorization: Bearer token" \
"http://api-gateway:8080/api/metrics/query?query=rate(cpu_seconds_total[5m])"
# Memory usage
curl -H "Authorization: Bearer token" \
"http://api-gateway:8080/api/metrics/query?query=process_resident_memory_bytes"
# Request rate
curl -H "Authorization: Bearer token" \
"http://api-gateway:8080/api/metrics/query?query=rate(http_requests_total[5m])"
```
### TrustGraph-Specific Metrics
```bash
# Document processing rate
curl -H "Authorization: Bearer token" \
"http://api-gateway:8080/api/metrics/query?query=rate(trustgraph_documents_processed_total[5m])"
# Knowledge graph size
curl -H "Authorization: Bearer token" \
"http://api-gateway:8080/api/metrics/query?query=trustgraph_triples_count"
# Embedding generation rate
curl -H "Authorization: Bearer token" \
"http://api-gateway:8080/api/metrics/query?query=rate(trustgraph_embeddings_generated_total[5m])"
```
## Response Format
Responses follow the standard Prometheus API format:
### Successful Query Response
```json
{
"status": "success",
"data": {
"resultType": "vector",
"result": [
{
"metric": {
"__name__": "up",
"instance": "api-gateway:8080",
"job": "trustgraph"
},
"value": [1640995200, "1"]
}
]
}
}
```
### Range Query Response
```json
{
"status": "success",
"data": {
"resultType": "matrix",
"result": [
{
"metric": {
"__name__": "cpu_usage",
"instance": "worker-1"
},
"values": [
[1640995200, "0.15"],
[1640995260, "0.18"],
[1640995320, "0.12"]
]
}
]
}
}
```
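A matrix result like the one above can be flattened into native Python values. This is an illustrative helper, not part of any TrustGraph client library:

```python
def parse_range_result(response):
    """Flatten a Prometheus range-query ('matrix') response into
    {instance: [(timestamp, value), ...]} with values as floats."""
    series = {}
    for item in response["data"]["result"]:
        key = item["metric"].get("instance", "")
        # Prometheus returns sample values as strings; convert to float
        series[key] = [(ts, float(v)) for ts, v in item["values"]]
    return series

example = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {"metric": {"__name__": "cpu_usage", "instance": "worker-1"},
             "values": [[1640995200, "0.15"], [1640995260, "0.18"]]},
        ],
    },
}
print(parse_range_result(example)["worker-1"][0])   # prints (1640995200, 0.15)
```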
### Error Response
```json
{
"status": "error",
"errorType": "bad_data",
"error": "invalid query syntax"
}
```
## Available Metrics
### Standard System Metrics
- `up`: Service availability (1 = up, 0 = down)
- `process_resident_memory_bytes`: Memory usage
- `process_cpu_seconds_total`: CPU time
- `http_requests_total`: HTTP request count
- `http_request_duration_seconds`: Request latency
### TrustGraph-Specific Metrics
- `trustgraph_documents_processed_total`: Documents processed count
- `trustgraph_triples_count`: Knowledge graph triple count
- `trustgraph_embeddings_generated_total`: Embeddings generated count
- `trustgraph_flow_executions_total`: Flow execution count
- `trustgraph_pulsar_messages_total`: Pulsar message count
- `trustgraph_errors_total`: Error count by component
## Time Series Queries
### Time Ranges
Use standard Prometheus time range formats:
- `5m`: 5 minutes
- `1h`: 1 hour
- `1d`: 1 day
- `1w`: 1 week
### Rate Calculations
```bash
# 5-minute rate
rate(metric_name[5m])
# Increase over time
increase(metric_name[1h])
```
### Aggregations
```bash
# Sum across instances
sum(metric_name)
# Average by label
avg by (instance) (metric_name)
# Top 5 values
topk(5, metric_name)
```
## Integration Examples
### Python Integration
```python
import requests
def query_metrics(token, query):
headers = {"Authorization": f"Bearer {token}"}
params = {"query": query}
response = requests.get(
"http://api-gateway:8080/api/metrics/query",
headers=headers,
params=params
)
return response.json()
# Get system uptime
uptime = query_metrics("your-token", "time() - process_start_time_seconds")
```
### JavaScript Integration
```javascript
async function queryMetrics(token, query) {
const response = await fetch(
`http://api-gateway:8080/api/metrics/query?query=${encodeURIComponent(query)}`,
{
headers: {
'Authorization': `Bearer ${token}`
}
}
);
return await response.json();
}
// Get request rate
const requestRate = await queryMetrics('your-token', 'rate(http_requests_total[5m])');
```
## Error Handling
### Common HTTP Status Codes
- `200`: Success
- `400`: Bad request (invalid query)
- `401`: Unauthorized (invalid/missing token)
- `422`: Unprocessable entity (query execution error)
- `500`: Internal server error
### Error Types
- `bad_data`: Invalid query syntax
- `timeout`: Query execution timeout
- `canceled`: Query was canceled
- `execution`: Query execution error
## Best Practices
### Query Optimization
- Use appropriate time ranges to limit data volume
- Apply label filters to reduce result sets
- Use recording rules for frequently accessed metrics
### Rate Limiting
- Avoid high-frequency polling
- Cache results when appropriate
- Use appropriate step sizes for range queries
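One way to follow these guidelines is a small TTL cache in front of the query call. This is a sketch; the `fetch` callable stands in for whatever function actually hits the metrics endpoint:

```python
import time

class CachedMetrics:
    """Cache query results for a short TTL to avoid high-frequency polling."""

    def __init__(self, fetch, ttl=30.0):
        self.fetch = fetch          # function that calls the metrics endpoint
        self.ttl = ttl
        self._cache = {}            # query -> (expiry_time, result)

    def query(self, q):
        now = time.monotonic()
        hit = self._cache.get(q)
        if hit and hit[0] > now:
            return hit[1]           # still fresh: no network call
        result = self.fetch(q)
        self._cache[q] = (now + self.ttl, result)
        return result

calls = []
cached = CachedMetrics(lambda q: calls.append(q) or {"status": "success"}, ttl=60)
cached.query("up")
cached.query("up")      # served from cache
print(len(calls))       # prints 1
```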
### Security
- Keep API tokens secure
- Use HTTPS in production
- Rotate tokens regularly
## Use Cases
- **System Monitoring**: Track system health and performance
- **Capacity Planning**: Monitor resource utilization trends
- **Alerting**: Set up alerts based on metric thresholds
- **Performance Analysis**: Analyze system performance over time
- **Debugging**: Investigate issues using detailed metrics
- **Business Intelligence**: Track document processing and knowledge extraction metrics


@@ -15,7 +15,7 @@ The request contains the following fields:
### Response
The request contains either of these fields:
The response contains either of these fields:
- `text`: A plain text response
- `object`: A structured object, JSON-encoded
@@ -60,6 +60,7 @@ Request:
{
"id": "akshfkiehfkseffh-142",
"service": "prompt",
"flow": "default",
"request": {
"id": "extract-definitions",
"variables": {


@@ -19,7 +19,7 @@ The request contains the following fields:
### Response
The request contains the following fields:
The response contains the following fields:
- `response`: LLM response
## REST service
@@ -59,6 +59,7 @@ Request:
{
"id": "blrqotfefnmnh7de-1",
"service": "text-completion",
"flow": "default",
"request": {
"system": "You are a helpful agent",
"prompt": "What does NASA stand for?"

docs/apis/api-text-load.md Normal file

@@ -0,0 +1,168 @@
# TrustGraph Text Load API
This API loads text documents into TrustGraph processing pipelines. It's a sender API
that accepts text documents with metadata and queues them for processing through
specified flows.
## Request Format
The text-load API accepts a JSON request with the following fields:
- `id`: Document identifier (typically a URI)
- `metadata`: Array of RDF triples providing document metadata
- `charset`: Character encoding (defaults to "utf-8")
- `text`: Base64-encoded text content
- `user`: User identifier (defaults to "trustgraph")
- `collection`: Collection identifier (defaults to "default")
## Request Example
```json
{
"id": "https://example.com/documents/research-paper-123",
"metadata": [
{
"s": {"v": "https://example.com/documents/research-paper-123", "e": true},
"p": {"v": "http://purl.org/dc/terms/title", "e": true},
"o": {"v": "Machine Learning in Healthcare", "e": false}
},
{
"s": {"v": "https://example.com/documents/research-paper-123", "e": true},
"p": {"v": "http://purl.org/dc/terms/creator", "e": true},
"o": {"v": "Dr. Jane Smith", "e": false}
},
{
"s": {"v": "https://example.com/documents/research-paper-123", "e": true},
"p": {"v": "http://purl.org/dc/terms/subject", "e": true},
"o": {"v": "Healthcare AI", "e": false}
}
],
"charset": "utf-8",
"text": "VGhpcyBpcyBhIHNhbXBsZSByZXNlYXJjaCBwYXBlciBhYm91dCBtYWNoaW5lIGxlYXJuaW5nIGluIGhlYWx0aGNhcmUuLi4=",
"user": "researcher",
"collection": "healthcare-research"
}
```
## Response
The text-load API is a sender API with no response body. Success is indicated by HTTP status code 200.
## REST service
The text-load service is available at:
`POST /api/v1/flow/{flow-id}/service/text-load`
Where `{flow-id}` is the identifier of the flow that will process the document.
Example:
```bash
curl -X POST \
-H "Content-Type: application/json" \
-d @document.json \
http://api-gateway:8080/api/v1/flow/pdf-processing/service/text-load
```
## Metadata Format
Each metadata triple contains:
- `s`: Subject (object with `v` for value and `e` for is_entity boolean)
- `p`: Predicate (object with `v` for value and `e` for is_entity boolean)
- `o`: Object (object with `v` for value and `e` for is_entity boolean)
The `e` field indicates whether the value should be treated as an entity (true) or literal (false).
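The compact encoding can be sketched with two small helpers. These are illustrative only (`compact` and `expand` are not SDK functions); the mapping follows the `v`/`e` convention described above:

```python
def compact(value, is_entity=False):
    """Encode a triple element in the compact wire format."""
    return {"v": value, "e": is_entity}

def expand(element):
    """Map the compact form back to the full schema fields."""
    return {"value": element["v"], "is_uri": element["e"]}

title_triple = {
    "s": compact("https://example.com/doc-123", is_entity=True),
    "p": compact("http://purl.org/dc/terms/title", is_entity=True),
    "o": compact("Sample Document"),   # literal, so e=False
}
print(expand(title_triple["o"]))   # prints {'value': 'Sample Document', 'is_uri': False}
```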
## Common Metadata Properties
### Document Properties
- `http://purl.org/dc/terms/title`: Document title
- `http://purl.org/dc/terms/creator`: Document author
- `http://purl.org/dc/terms/subject`: Document subject/topic
- `http://purl.org/dc/terms/description`: Document description
- `http://purl.org/dc/terms/date`: Publication date
- `http://purl.org/dc/terms/language`: Document language
### Organizational Properties
- `http://xmlns.com/foaf/0.1/name`: Organization name
- `http://www.w3.org/2006/vcard/ns#hasAddress`: Organization address
- `http://xmlns.com/foaf/0.1/homepage`: Organization website
### Publication Properties
- `http://purl.org/ontology/bibo/doi`: DOI identifier
- `http://purl.org/ontology/bibo/isbn`: ISBN identifier
- `http://purl.org/ontology/bibo/volume`: Publication volume
- `http://purl.org/ontology/bibo/issue`: Publication issue
## Text Encoding
The `text` field must contain base64-encoded content. To encode text:
```bash
# Command line encoding (printf avoids including a trailing newline)
printf '%s' "Your text content here" | base64
```

```python
# Python encoding
import base64
encoded_text = base64.b64encode("Your text content here".encode('utf-8')).decode('utf-8')
```
## Integration with Processing Flows
Once loaded, text documents are processed through the specified flow, which typically includes:
1. **Text Chunking**: Breaking documents into manageable chunks
2. **Embedding Generation**: Creating vector embeddings for semantic search
3. **Knowledge Extraction**: Extracting entities and relationships
4. **Graph Storage**: Storing extracted knowledge in the knowledge graph
5. **Indexing**: Making content searchable for RAG queries
## Error Handling
Common errors include:
- Invalid base64 encoding in text field
- Missing required fields (id, text)
- Invalid metadata triple format
- Flow not found or inactive
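These checks can be performed client-side before sending. A minimal validation sketch (the `validate_text_load` helper is hypothetical, not part of the SDK):

```python
import base64
import binascii

def validate_text_load(request):
    """Return a list of problems with a text-load request (empty if OK).
    The checks mirror the common errors listed above."""
    problems = []
    for field in ("id", "text"):
        if not request.get(field):
            problems.append(f"missing required field: {field}")
    if request.get("text"):
        try:
            base64.b64decode(request["text"], validate=True)
        except binascii.Error:
            problems.append("text is not valid base64")
    for t in request.get("metadata", []):
        if not all(k in t for k in ("s", "p", "o")):
            problems.append("metadata triple missing s/p/o")
    return problems

print(validate_text_load({"id": "doc-1", "text": "bm90IGJhc2U2NA=="}))   # prints []
```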
## Python SDK
```python
import base64
from trustgraph.api.text_load import TextLoadClient
client = TextLoadClient()
# Prepare document
document = {
"id": "https://example.com/doc-123",
"metadata": [
{
"s": {"v": "https://example.com/doc-123", "e": True},
"p": {"v": "http://purl.org/dc/terms/title", "e": True},
"o": {"v": "Sample Document", "e": False}
}
],
"charset": "utf-8",
"text": base64.b64encode("Document content here".encode('utf-8')).decode('utf-8'),
"user": "alice",
"collection": "research"
}
# Load document
await client.load_text_document("my-flow", document)
```
## Use Cases
- **Research Paper Ingestion**: Load academic papers with rich metadata
- **Document Processing**: Ingest documents for knowledge extraction
- **Content Management**: Build searchable document repositories
- **RAG System Population**: Load content for question-answering systems
- **Knowledge Base Construction**: Convert documents into structured knowledge
## Features
- **Rich Metadata**: Full RDF metadata support for semantic annotation
- **Flow Integration**: Direct integration with TrustGraph processing flows
- **Multi-tenant**: User and collection-based document organization
- **Encoding Support**: Flexible character encoding support
- **No Response Required**: Fire-and-forget operation for high throughput


@@ -21,7 +21,7 @@ Returned triples will match all of `s`, `p` and `o` where provided.
### Response
The request contains the following fields:
The response contains the following fields:
- `response`: A list of triples.
Each triple contains `s`, `p` and `o` fields describing the
@@ -33,15 +33,53 @@ Each triple element uses the same schema:
- `is_uri`: A boolean value which is true if this is a graph entity i.e.
`value` is a URI, not a literal value.
## Data Format Details
### Triple Element Format
To reduce the size of JSON messages, triple elements (subject, predicate, object) are encoded using a compact format:
- `v`: The value as a string (maps to `value` in the full schema)
- `e`: Boolean indicating if this is an entity/URI (maps to `is_uri` in the full schema)
Each triple element (`s`, `p`, `o`) contains:
- `v`: The actual value as a string
- `e`: Boolean indicating the value type
- `true`: The value is a URI/entity (e.g., `"http://example.com/Person1"`)
- `false`: The value is a literal (e.g., `"John Doe"`, `"42"`, `"2023-01-01"`)
### Examples
**URI/Entity Element:**
```json
{
"v": "http://trustgraph.ai/e/space-station-modules",
"e": true
}
```
**Literal Element:**
```json
{
"v": "space station modules",
"e": false
}
```
**Numeric Literal:**
```json
{
"v": "42",
"e": false
}
```
## REST service
The REST service accepts a request object containing the `s`, `p`, `o`
and `limit` fields.
The response is a JSON object containing the `response` field.
To reduce the size of the JSON, the graph entities are encoded as an
object with `value` and `is_uri` mapped to `v` and `e` respectively.
e.g.
This example query matches triples with a subject of
@@ -58,6 +96,7 @@ Request:
{
"id": "qgzw1287vfjc8wsk-4",
"service": "triples-query",
"flow": "default",
"request": {
"s": {
"v": "http://trustgraph.ai/e/space-station-modules",
@@ -97,13 +136,9 @@ Response:
## Websocket
Requests have a `request` object containing the `system` and
`prompt` fields.
Requests have a `request` object containing the query fields (`s`, `p`, `o`, `limit`).
Responses have a `response` object containing `response` field.
To reduce the size of the JSON, the graph entities are encoded as an
object with `value` and `is_uri` mapped to `v` and `e` respectively.
e.g.
Request:
@@ -178,10 +213,3 @@ The client class is
https://github.com/trustgraph-ai/trustgraph/blob/master/trustgraph-base/trustgraph/clients/triples_query_client.py


@@ -1,3 +1,230 @@
# TrustGraph Pulsar API
Coming soon
Apache Pulsar is the underlying message queue system used by TrustGraph for inter-component communication. Understanding Pulsar queue names is essential for direct integration with TrustGraph services.
## Overview
TrustGraph uses two types of APIs with different queue naming patterns:
1. **Global Services**: Fixed queue names, not dependent on flows
2. **Flow-Hosted Services**: Dynamic queue names that depend on the specific flow configuration
## Global Services (Fixed Queue Names)
These services run independently and have fixed Pulsar queue names:
### Config API
- **Request Queue**: `non-persistent://tg/request/config`
- **Response Queue**: `non-persistent://tg/response/config`
- **Push Queue**: `persistent://tg/config/config`
### Flow API
- **Request Queue**: `non-persistent://tg/request/flow`
- **Response Queue**: `non-persistent://tg/response/flow`
### Knowledge API
- **Request Queue**: `non-persistent://tg/request/knowledge`
- **Response Queue**: `non-persistent://tg/response/knowledge`
### Librarian API
- **Request Queue**: `non-persistent://tg/request/librarian`
- **Response Queue**: `non-persistent://tg/response/librarian`
## Flow-Hosted Services (Dynamic Queue Names)
These services are hosted within specific flows and have queue names that depend on the flow configuration:
- Agent API
- Document RAG API
- Graph RAG API
- Text Completion API
- Prompt API
- Embeddings API
- Graph Embeddings API
- Triples Query API
- Text Load API
- Document Load API
## Discovering Flow-Hosted Queue Names
To find the queue names for flow-hosted services, you need to query the flow configuration using the Config API.
### Method 1: Using the Config API
Query for the flow configuration:
**Request:**
```json
{
"operation": "get",
"keys": [
{
"type": "flows",
"key": "your-flow-name"
}
]
}
```
**Response:**
The response will contain a flow definition with an "interfaces" object that lists all queue names.
### Method 2: Using the CLI
Use the TrustGraph CLI to dump the configuration:
```bash
tg-show-config
```
## Flow Interface Types
Flow configurations define two types of service interfaces:
### 1. Request/Response Interfaces
Services that accept a request and return a response:
```json
{
"graph-rag": {
"request": "non-persistent://tg/request/graph-rag:document-rag+graph-rag",
"response": "non-persistent://tg/response/graph-rag:document-rag+graph-rag"
}
}
```
**Examples**: agent, document-rag, graph-rag, text-completion, prompt, embeddings, graph-embeddings, triples
### 2. Fire-and-Forget Interfaces
Services that accept data but don't return a response:
```json
{
"text-load": "persistent://tg/flow/text-document-load:default"
}
```
**Examples**: text-load, document-load, triples-store, graph-embeddings-store, document-embeddings-store, entity-contexts-load
## Example Flow Configuration
Here's an example of a complete flow configuration showing queue names:
```json
{
"class-name": "document-rag+graph-rag",
"description": "Default processing flow",
"interfaces": {
"agent": {
"request": "non-persistent://tg/request/agent:default",
"response": "non-persistent://tg/response/agent:default"
},
"document-rag": {
"request": "non-persistent://tg/request/document-rag:document-rag+graph-rag",
"response": "non-persistent://tg/response/document-rag:document-rag+graph-rag"
},
"graph-rag": {
"request": "non-persistent://tg/request/graph-rag:document-rag+graph-rag",
"response": "non-persistent://tg/response/graph-rag:document-rag+graph-rag"
},
"text-completion": {
"request": "non-persistent://tg/request/text-completion:document-rag+graph-rag",
"response": "non-persistent://tg/response/text-completion:document-rag+graph-rag"
},
"embeddings": {
"request": "non-persistent://tg/request/embeddings:document-rag+graph-rag",
"response": "non-persistent://tg/response/embeddings:document-rag+graph-rag"
},
"triples": {
"request": "non-persistent://tg/request/triples:document-rag+graph-rag",
"response": "non-persistent://tg/response/triples:document-rag+graph-rag"
},
"text-load": "persistent://tg/flow/text-document-load:default",
"document-load": "persistent://tg/flow/document-load:default",
"triples-store": "persistent://tg/flow/triples-store:default",
"graph-embeddings-store": "persistent://tg/flow/graph-embeddings-store:default"
}
}
```
## Queue Naming Patterns
### Global Services
- **Pattern**: `{persistence}://tg/{namespace}/{service-name}`
- **Example**: `non-persistent://tg/request/config`
### Flow-Hosted Request/Response
- **Pattern**: `{persistence}://tg/{namespace}/{service-name}:{flow-identifier}`
- **Example**: `non-persistent://tg/request/graph-rag:document-rag+graph-rag`
### Flow-Hosted Fire-and-Forget
- **Pattern**: `{persistence}://tg/flow/{service-name}:{flow-identifier}`
- **Example**: `persistent://tg/flow/text-document-load:default`
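The three patterns above can be sketched as a small helper. This is an illustrative sketch only; the function name and defaults are not part of TrustGraph:

```python
def queue_name(service, namespace="request", flow=None,
               persistent=False, fire_and_forget=False):
    """Build a Pulsar topic name following the TrustGraph naming patterns."""
    persistence = "persistent" if persistent else "non-persistent"
    if fire_and_forget:
        # Flow-hosted fire-and-forget: {persistence}://tg/flow/{service}:{flow}
        return f"{persistence}://tg/flow/{service}:{flow}"
    if flow is None:
        # Global service: {persistence}://tg/{namespace}/{service}
        return f"{persistence}://tg/{namespace}/{service}"
    # Flow-hosted request/response: {persistence}://tg/{namespace}/{service}:{flow}
    return f"{persistence}://tg/{namespace}/{service}:{flow}"

print(queue_name("config"))
# non-persistent://tg/request/config
print(queue_name("graph-rag", flow="document-rag+graph-rag"))
# non-persistent://tg/request/graph-rag:document-rag+graph-rag
print(queue_name("text-document-load", flow="default",
                 persistent=True, fire_and_forget=True))
# persistent://tg/flow/text-document-load:default
```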
## Persistence Types
- **non-persistent**: Messages are not persisted to disk, faster but less reliable
- **persistent**: Messages are persisted to disk, slower but more reliable
## Practical Usage
### Python Example
```python
import pulsar
from trustgraph.schema import ConfigRequest, ConfigResponse
# Connect to Pulsar
client = pulsar.Client('pulsar://localhost:6650')
# Create producer for config requests
producer = client.create_producer(
'non-persistent://tg/request/config',
schema=pulsar.schema.AvroSchema(ConfigRequest)
)
# Create consumer for config responses
consumer = client.subscribe(
'non-persistent://tg/response/config',
subscription_name='my-subscription',
schema=pulsar.schema.AvroSchema(ConfigResponse)
)
# Send request
request = ConfigRequest(operation='list-classes')
producer.send(request)
# Receive response and acknowledge it
response = consumer.receive()
print(response.value())
consumer.acknowledge(response)

client.close()
```
### Flow Service Example
```python
# First, get the flow configuration to find queue names
config_request = ConfigRequest(
operation='get',
keys=[ConfigKey(type='flows', key='my-flow')]
)
# Use the returned interface information to determine queue names
# Then connect to the appropriate queues for the service you need
```
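Once the interface map is returned, queue names can be extracted with plain dictionary handling: request/response services map to an object with `request` and `response` keys, while fire-and-forget services map to a single string. A sketch using data shaped like the example flow configuration shown earlier (the helper name is illustrative):

```python
import json

# A fragment shaped like the flow configuration's "interfaces" section
flow_config = json.loads("""{
  "interfaces": {
    "graph-rag": {
      "request": "non-persistent://tg/request/graph-rag:document-rag+graph-rag",
      "response": "non-persistent://tg/response/graph-rag:document-rag+graph-rag"
    },
    "text-load": "persistent://tg/flow/text-document-load:default"
  }
}""")

def service_queues(config, service):
    """Return (request, response) queues for a request/response service,
    or (queue, None) for a fire-and-forget service."""
    iface = config["interfaces"][service]
    if isinstance(iface, dict):
        return iface["request"], iface["response"]
    return iface, None

req, resp = service_queues(flow_config, "graph-rag")
load_queue, _ = service_queues(flow_config, "text-load")
```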
## Best Practices
1. **Query Flow Configuration**: Always query the current flow configuration to get accurate queue names
2. **Handle Dynamic Names**: Flow-hosted service queue names can change when flows are reconfigured
3. **Choose Appropriate Persistence**: Use persistent queues for critical data, non-persistent for performance
4. **Schema Validation**: Use the appropriate Pulsar schema for each service
5. **Error Handling**: Implement proper error handling for queue connection and message failures
## Security Considerations
- Pulsar access should be restricted in production environments
- Use appropriate authentication and authorization mechanisms
- Monitor queue access and message patterns for security anomalies
- Consider encryption for sensitive data in messages


@@ -18,13 +18,16 @@ When hosted using docker compose, you can access the service at
## Request
A request message is a JSON message containing 3 or 4 fields:
- `id`: A unique ID which is used to correlate requests and responses.
You should make sure it is unique.
- `service`: The name of the service to invoke.
- `request`: The request body which is passed to the service - this is
defined in the API documentation for that service.
- `flow`: Some APIs are supported by processors launched within a flow,
  and are dependent on a flow running. For such APIs, the flow identifier
  needs to be provided.
e.g.
@@ -32,6 +35,7 @@ e.g.
{
"id": "qgzw1287vfjc8wsk-1",
"service": "graph-rag",
"flow": "default",
"request": {
"query": "What does NASA stand for?"
}
@@ -86,6 +90,7 @@ Request:
{
"id": "blrqotfefnmnh7de-20",
"service": "agent",
"flow": "default",
"request": {
"question": "What does NASA stand for?"
}

docs/cli/README.md (new file, 170 lines)

@@ -0,0 +1,170 @@
# TrustGraph CLI Documentation
The TrustGraph Command Line Interface (CLI) provides comprehensive command-line access to all TrustGraph services. These tools wrap the REST and WebSocket APIs to provide convenient, scriptable access to TrustGraph functionality.
## Installation
The CLI tools are installed as part of the `trustgraph-cli` package:
```bash
pip install trustgraph-cli
```
## Global Options
Most CLI commands support these common options:
- `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
- `-U, --user USER`: User identifier (default: `trustgraph`)
- `-C, --collection COLLECTION`: Collection identifier (default: `default`)
- `-f, --flow-id FLOW`: Flow identifier (default: `default`)
## Command Categories
### System Administration & Configuration
**System Setup:**
- [`tg-init-trustgraph`](tg-init-trustgraph.md) - Initialize Pulsar with TrustGraph configuration
- [`tg-init-pulsar-manager`](tg-init-pulsar-manager.md) - Initialize Pulsar manager setup
- [`tg-show-config`](tg-show-config.md) - Display current system configuration
**Token Management:**
- [`tg-set-token-costs`](tg-set-token-costs.md) - Configure model token costs
- [`tg-show-token-costs`](tg-show-token-costs.md) - Display token cost configuration
- [`tg-show-token-rate`](tg-show-token-rate.md) - Show token usage rates
**Prompt Management:**
- [`tg-set-prompt`](tg-set-prompt.md) - Configure prompt templates and system prompts
- [`tg-show-prompts`](tg-show-prompts.md) - Display configured prompt templates
### Flow Management
**Flow Operations:**
- [`tg-start-flow`](tg-start-flow.md) - Start a processing flow
- [`tg-stop-flow`](tg-stop-flow.md) - Stop a running flow
- [`tg-show-flows`](tg-show-flows.md) - List all configured flows
- [`tg-show-flow-state`](tg-show-flow-state.md) - Show current flow states
**Flow Class Management:**
- [`tg-put-flow-class`](tg-put-flow-class.md) - Upload/update flow class definition
- [`tg-get-flow-class`](tg-get-flow-class.md) - Retrieve flow class definition
- [`tg-delete-flow-class`](tg-delete-flow-class.md) - Remove flow class definition
- [`tg-show-flow-classes`](tg-show-flow-classes.md) - List available flow classes
### Knowledge Graph Management
**Knowledge Core Operations:**
- [`tg-load-kg-core`](tg-load-kg-core.md) - Load knowledge core into processing
- [`tg-put-kg-core`](tg-put-kg-core.md) - Store knowledge core in system
- [`tg-get-kg-core`](tg-get-kg-core.md) - Retrieve knowledge core
- [`tg-delete-kg-core`](tg-delete-kg-core.md) - Remove knowledge core
- [`tg-unload-kg-core`](tg-unload-kg-core.md) - Unload knowledge core from processing
- [`tg-show-kg-cores`](tg-show-kg-cores.md) - List available knowledge cores
**Graph Data Operations:**
- [`tg-show-graph`](tg-show-graph.md) - Display graph triples/edges
- [`tg-graph-to-turtle`](tg-graph-to-turtle.md) - Export graph to Turtle format
- [`tg-load-turtle`](tg-load-turtle.md) - Import RDF triples from Turtle files
### Document Processing & Library Management
**Document Loading:**
- [`tg-load-pdf`](tg-load-pdf.md) - Load PDF documents into processing
- [`tg-load-text`](tg-load-text.md) - Load text documents into processing
- [`tg-load-sample-documents`](tg-load-sample-documents.md) - Load sample documents for testing
**Library Management:**
- [`tg-add-library-document`](tg-add-library-document.md) - Add documents to library
- [`tg-show-library-documents`](tg-show-library-documents.md) - List documents in library
- [`tg-remove-library-document`](tg-remove-library-document.md) - Remove documents from library
- [`tg-start-library-processing`](tg-start-library-processing.md) - Start processing library documents
- [`tg-stop-library-processing`](tg-stop-library-processing.md) - Stop library document processing
- [`tg-show-library-processing`](tg-show-library-processing.md) - Show library processing status
**Document Embeddings:**
- [`tg-load-doc-embeds`](tg-load-doc-embeds.md) - Load document embeddings
- [`tg-save-doc-embeds`](tg-save-doc-embeds.md) - Save document embeddings
### AI Services & Agent Interaction
**Query & Interaction:**
- [`tg-invoke-agent`](tg-invoke-agent.md) - Interactive agent Q&A via WebSocket
- [`tg-invoke-llm`](tg-invoke-llm.md) - Direct LLM text completion
- [`tg-invoke-prompt`](tg-invoke-prompt.md) - Use configured prompt templates
- [`tg-invoke-document-rag`](tg-invoke-document-rag.md) - Document-based RAG queries
- [`tg-invoke-graph-rag`](tg-invoke-graph-rag.md) - Graph-based RAG queries
**Tool & Prompt Management:**
- [`tg-show-tools`](tg-show-tools.md) - List available agent tools
- [`tg-set-prompt`](tg-set-prompt.md) - Configure prompt templates
- [`tg-show-prompts`](tg-show-prompts.md) - List configured prompts
### System Monitoring & Debugging
**System Status:**
- [`tg-show-processor-state`](tg-show-processor-state.md) - Show processing component states
**Debugging:**
- [`tg-dump-msgpack`](tg-dump-msgpack.md) - Dump MessagePack data for debugging
## Quick Start Examples
### Basic Document Processing
```bash
# Start a flow
tg-start-flow --flow-id my-flow --class-name document-processing
# Load a document
tg-load-text --flow-id my-flow --text "Your document content" --title "Test Document"
# Query the knowledge
tg-invoke-graph-rag --flow-id my-flow --query "What is the document about?"
```
### Knowledge Management
```bash
# List available knowledge cores
tg-show-kg-cores
# Load a knowledge core into a flow
tg-load-kg-core --flow-id my-flow --kg-core-id my-knowledge
# Query the knowledge graph
tg-show-graph --limit 100
```
### Flow Management
```bash
# Show available flow classes
tg-show-flow-classes
# Show running flows
tg-show-flows
# Stop a flow
tg-stop-flow --flow-id my-flow
```
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL for all commands
- `TRUSTGRAPH_USER`: Default user identifier
- `TRUSTGRAPH_COLLECTION`: Default collection identifier
## Authentication
CLI commands inherit authentication from the environment or API configuration. See the main TrustGraph documentation for authentication setup.
## Error Handling
All CLI commands provide:
- Consistent error reporting
- Exit codes (0 for success, non-zero for errors)
- Detailed error messages for troubleshooting
- Retry logic for network operations where appropriate
## Related Documentation
- [TrustGraph API Documentation](../apis/README.md)
- [TrustGraph WebSocket Guide](../apis/websocket.md)
- [TrustGraph Pulsar Guide](../apis/pulsar.md)


@@ -0,0 +1,285 @@
# tg-add-library-document
Adds documents to the TrustGraph library with comprehensive metadata support.
## Synopsis
```bash
tg-add-library-document [options] file1 [file2 ...]
```
## Description
The `tg-add-library-document` command adds documents to the TrustGraph library system, which provides persistent document storage with rich metadata management. Unlike direct document loading, the library approach offers better document lifecycle management, metadata preservation, and processing control.
Documents added to the library can later be processed using `tg-start-library-processing` for controlled batch processing operations.
## Options
### Connection & User
- `-u, --url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
- `-U, --user USER`: User identifier (default: `trustgraph`)
### Document Information
- `--name NAME`: Document name/title
- `--description DESCRIPTION`: Document description
- `--id ID`: Custom document identifier (if not specified, uses content hash)
- `--kind MIMETYPE`: Document MIME type (auto-detected if not specified)
- `--tags TAGS`: Comma-separated list of tags
### Copyright Information
- `--copyright-notice NOTICE`: Copyright notice text
- `--copyright-holder HOLDER`: Copyright holder name
- `--copyright-year YEAR`: Copyright year
- `--license LICENSE`: Copyright license
### Publication Information
- `--publication-organization ORG`: Publishing organization name
- `--publication-description DESC`: Publication description
- `--publication-date DATE`: Publication date
- `--publication-url URL`: Publication URL
### Document Source
- `--document-url URL`: Original document source URL
- `--keyword KEYWORDS`: Document keywords (space-separated)
## Arguments
- `file1 [file2 ...]`: One or more files to add to the library
## Examples
### Basic Document Addition
```bash
tg-add-library-document report.pdf
```
### With Complete Metadata
```bash
tg-add-library-document \
--name "Annual Research Report 2024" \
--description "Comprehensive analysis of research outcomes" \
--copyright-holder "Research Institute" \
--copyright-year "2024" \
--license "CC BY 4.0" \
--tags "research,annual,analysis" \
--keyword "research" "analysis" "2024" \
annual-report.pdf
```
### Academic Paper
```bash
tg-add-library-document \
--name "Machine Learning in Healthcare" \
--description "Study on ML applications in medical diagnosis" \
--publication-organization "University Medical School" \
--publication-date "2024-03-15" \
--copyright-holder "Dr. Jane Smith" \
--tags "machine-learning,healthcare,medical" \
--keyword "ML" "healthcare" "diagnosis" \
ml-healthcare-paper.pdf
```
### Multiple Documents with Shared Metadata
```bash
tg-add-library-document \
--publication-organization "Tech Company" \
--copyright-holder "Tech Company Inc." \
--copyright-year "2024" \
--license "Proprietary" \
--tags "documentation,technical" \
manual-v1.pdf manual-v2.pdf manual-v3.pdf
```
### Custom Document ID
```bash
tg-add-library-document \
--id "PROJ-2024-001" \
--name "Project Specification" \
--description "Technical requirements document" \
project-spec.docx
```
## Document Processing
1. **File Reading**: Reads document content as binary data
2. **ID Generation**: Creates SHA256 hash-based ID (unless custom ID provided)
3. **Metadata Assembly**: Combines all metadata into structured format
4. **Library Storage**: Stores document and metadata in library system
5. **URI Creation**: Generates TrustGraph document URI
## Document ID Generation
- **Automatic**: SHA256 hash of file content converted to TrustGraph URI
- **Custom**: Use `--id` parameter for specific identifiers
- **Format**: `http://trustgraph.ai/d/[hash-or-custom-id]`
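The hash-based scheme described above can be reproduced with the standard library. This is a sketch of the idea only; the exact hashing details used by the CLI may differ:

```python
import hashlib

def document_id(content, custom_id=None):
    """Build a TrustGraph document URI: use the custom ID if given,
    otherwise the SHA256 hash of the document content."""
    ident = custom_id or hashlib.sha256(content).hexdigest()
    return f"http://trustgraph.ai/d/{ident}"

# Automatic, content-derived ID
print(document_id(b"example document body"))

# Explicit ID, as with the --id parameter
print(document_id(b"ignored", custom_id="PROJ-2024-001"))
# http://trustgraph.ai/d/PROJ-2024-001
```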
## MIME Type Detection
The system automatically detects document types:
- **PDF**: `application/pdf`
- **Word**: `application/vnd.openxmlformats-officedocument.wordprocessingml.document`
- **Text**: `text/plain`
- **HTML**: `text/html`
Override with `--kind` parameter if needed.
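Detection of this kind can be done with the standard library's `mimetypes` module. A sketch of the behaviour, with an explicit override in the style of `--kind` (the CLI's own detection logic may differ):

```python
import mimetypes

def detect_kind(filename, override=None):
    """Guess a document's MIME type from its filename,
    allowing an explicit override."""
    if override:
        return override
    kind, _encoding = mimetypes.guess_type(filename)
    return kind or "application/octet-stream"

print(detect_kind("report.pdf"))   # application/pdf
print(detect_kind("notes.txt"))    # text/plain
print(detect_kind("data.bin", override="application/x-custom"))
```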
## Metadata Format
Metadata is stored as RDF triples including:
### Dublin Core Properties
- `dc:title`: Document name
- `dc:description`: Document description
- `dc:creator`: Copyright holder
- `dc:date`: Publication date
- `dc:rights`: Copyright notice
- `dc:license`: License information
- `dc:subject`: Keywords and tags
### Organization Information
- `foaf:Organization`: Publisher details
- `foaf:name`: Organization name
- `vcard:hasURL`: Organization website
### Document Properties
- `bibo:doi`: DOI if applicable
- `bibo:url`: Document source URL
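Conceptually, the stored metadata is a set of (subject, predicate, object) triples. A minimal sketch of what the Dublin Core portion might look like, using the property names listed above (the actual serialization used by TrustGraph may differ):

```python
# Hypothetical document URI, as produced by the ID generation step
doc = "http://trustgraph.ai/d/PROJ-2024-001"

# Metadata expressed as subject-predicate-object triples
triples = [
    (doc, "dc:title", "Project Specification"),
    (doc, "dc:description", "Technical requirements document"),
    (doc, "dc:creator", "Acme Corporation"),
    (doc, "dc:date", "2024-03-15"),
    (doc, "dc:subject", "documentation"),
]

# Simple lookup: all values for a given predicate
titles = [o for s, p, o in triples if p == "dc:title"]
print(titles)  # ['Project Specification']
```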
## Output
For each successfully added document:
```bash
report.pdf: Loaded successfully.
```
For failures:
```bash
invalid.pdf: Failed: File not found
```
## Error Handling
### File Errors
```bash
document.pdf: Failed: No such file or directory
```
**Solution**: Verify file path exists and is readable.
### Permission Errors
```bash
document.pdf: Failed: Permission denied
```
**Solution**: Check file permissions and user access rights.
### Connection Errors
```bash
document.pdf: Failed: Connection refused
```
**Solution**: Verify API URL and ensure TrustGraph is running.
### Library Errors
```bash
document.pdf: Failed: Document already exists
```
**Solution**: Use different ID or update existing document.
## Library Management Workflow
### 1. Add Documents
```bash
tg-add-library-document research-paper.pdf
```
### 2. Verify Addition
```bash
tg-show-library-documents
```
### 3. Start Processing
```bash
tg-start-library-processing --flow-id research-flow
```
### 4. Monitor Processing
```bash
tg-show-library-processing
```
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-show-library-documents`](tg-show-library-documents.md) - List library documents
- [`tg-remove-library-document`](tg-remove-library-document.md) - Remove documents from library
- [`tg-start-library-processing`](tg-start-library-processing.md) - Process library documents
- [`tg-stop-library-processing`](tg-stop-library-processing.md) - Stop library processing
- [`tg-show-library-processing`](tg-show-library-processing.md) - Show processing status
## API Integration
This command uses the [Librarian API](../apis/api-librarian.md) with the `add-document` operation to store documents with metadata.
## Use Cases
### Research Document Management
```bash
tg-add-library-document \
--name "Climate Change Analysis" \
--publication-organization "Climate Research Institute" \
--tags "climate,research,environment" \
climate-study.pdf
```
### Corporate Documentation
```bash
tg-add-library-document \
--name "Product Manual v2.1" \
--copyright-holder "Acme Corporation" \
--license "Proprietary" \
--tags "manual,product,v2.1" \
product-manual.pdf
```
### Legal Document Archive
```bash
tg-add-library-document \
--name "Contract Template" \
--description "Standard service agreement template" \
--copyright-holder "Legal Department" \
--tags "legal,contract,template" \
contract-template.docx
```
### Academic Paper Collection
```bash
tg-add-library-document \
--publication-organization "IEEE" \
--copyright-year "2024" \
--tags "academic,ieee,conference" \
paper1.pdf paper2.pdf paper3.pdf
```
## Best Practices
1. **Consistent Metadata**: Use standardized metadata fields for better organization
2. **Meaningful Tags**: Add relevant tags for document discovery
3. **Copyright Information**: Include complete copyright details for legal compliance
4. **Batch Operations**: Process related documents together with shared metadata
5. **Version Control**: Use clear naming and tagging for document versions
6. **Library Organization**: Use collections and user assignments for multi-tenant systems
## Advantages over Direct Loading
### Library Benefits
- **Persistent Storage**: Documents preserved in library system
- **Metadata Management**: Rich metadata storage and querying
- **Processing Control**: Controlled batch processing with start/stop
- **Document Lifecycle**: Full document management capabilities
- **Search and Discovery**: Better document organization and retrieval
### When to Use Library vs Direct Loading
- **Use Library**: For document management, metadata preservation, controlled processing
- **Use Direct Loading**: For immediate processing, simple workflows, temporary documents


@@ -0,0 +1,330 @@
# tg-delete-flow-class
Permanently deletes a flow class definition from TrustGraph.
## Synopsis
```bash
tg-delete-flow-class -n CLASS_NAME [options]
```
## Description
The `tg-delete-flow-class` command permanently removes a flow class definition from TrustGraph. This operation cannot be undone, so use with caution.
**⚠️ Warning**: Deleting a flow class that has active flow instances may cause those instances to become unusable. Always check for active flows before deletion.
## Options
### Required Arguments
- `-n, --class-name CLASS_NAME`: Name of the flow class to delete
### Optional Arguments
- `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
## Examples
### Delete a Flow Class
```bash
tg-delete-flow-class -n "old-test-flow"
```
### Delete with Custom API URL
```bash
tg-delete-flow-class -n "deprecated-flow" -u http://staging:8088/
```
### Safe Deletion Workflow
```bash
# 1. Check if flow class exists
tg-show-flow-classes | grep "target-flow"
# 2. Backup the flow class first
tg-get-flow-class -n "target-flow" > backup-target-flow.json
# 3. Check for active flow instances
tg-show-flows | grep "target-flow"
# 4. Delete the flow class
tg-delete-flow-class -n "target-flow"
# 5. Verify deletion
tg-show-flow-classes | grep "target-flow" || echo "Flow class deleted successfully"
```
## Prerequisites
### Flow Class Must Exist
Verify the flow class exists before attempting deletion:
```bash
# List all flow classes
tg-show-flow-classes
# Check specific flow class
tg-show-flow-classes | grep "target-class"
```
### Check for Active Flow Instances
Before deleting a flow class, check if any flow instances are using it:
```bash
# List all active flows
tg-show-flows
# Look for instances using the flow class
tg-show-flows | grep "target-class"
```
## Error Handling
### Flow Class Not Found
```bash
Exception: Flow class 'nonexistent-class' not found
```
**Solution**: Verify the flow class exists with `tg-show-flow-classes`.
### Connection Errors
```bash
Exception: Connection refused
```
**Solution**: Check the API URL and ensure TrustGraph is running.
### Permission Errors
```bash
Exception: Access denied to delete flow class
```
**Solution**: Verify user permissions for flow class management.
### Active Flow Instances
```bash
Exception: Cannot delete flow class with active instances
```
**Solution**: Stop all flow instances using this class before deletion.
## Use Cases
### Cleanup Development Classes
```bash
# Delete test and development flow classes
test_classes=("test-flow-v1" "dev-experiment" "prototype-flow")
for class in "${test_classes[@]}"; do
echo "Deleting $class..."
tg-delete-flow-class -n "$class"
done
```
### Migration Cleanup
```bash
# After migrating to new flow classes, remove old ones
old_classes=("legacy-flow" "deprecated-processor" "old-pipeline")
for class in "${old_classes[@]}"; do
# Backup first
tg-get-flow-class -n "$class" > "backup-$class.json" 2>/dev/null
# Delete
tg-delete-flow-class -n "$class"
echo "Deleted $class"
done
```
### Conditional Deletion
```bash
# Delete flow class only if no active instances exist
flow_class="target-flow"
active_instances=$(tg-show-flows | grep "$flow_class" | wc -l)
if [ $active_instances -eq 0 ]; then
echo "No active instances found, deleting flow class..."
tg-delete-flow-class -n "$flow_class"
else
echo "Warning: $active_instances active instances found. Cannot delete."
tg-show-flows | grep "$flow_class"
fi
```
## Safety Considerations
### Always Backup First
```bash
# Create backup before deletion
flow_class="important-flow"
backup_dir="flow-class-backups/$(date +%Y%m%d-%H%M%S)"
mkdir -p "$backup_dir"
echo "Backing up flow class: $flow_class"
tg-get-flow-class -n "$flow_class" > "$backup_dir/$flow_class.json"
if [ $? -eq 0 ]; then
echo "Backup created: $backup_dir/$flow_class.json"
echo "Proceeding with deletion..."
tg-delete-flow-class -n "$flow_class"
else
echo "Backup failed. Aborting deletion."
exit 1
fi
```
### Verification Script
```bash
#!/bin/bash
# safe-delete-flow-class.sh
flow_class="$1"
if [ -z "$flow_class" ]; then
echo "Usage: $0 <flow-class-name>"
exit 1
fi
echo "Safety checks for deleting flow class: $flow_class"
# Check if flow class exists
if ! tg-show-flow-classes | grep -q "$flow_class"; then
echo "ERROR: Flow class '$flow_class' not found"
exit 1
fi
# Check for active instances
active_count=$(tg-show-flows | grep "$flow_class" | wc -l)
if [ $active_count -gt 0 ]; then
echo "ERROR: Found $active_count active instances using this flow class"
echo "Active instances:"
tg-show-flows | grep "$flow_class"
exit 1
fi
# Create backup
backup_file="backup-$flow_class-$(date +%Y%m%d-%H%M%S).json"
echo "Creating backup: $backup_file"
tg-get-flow-class -n "$flow_class" > "$backup_file"
if [ $? -ne 0 ]; then
echo "ERROR: Failed to create backup"
exit 1
fi
# Confirm deletion
echo "Ready to delete flow class: $flow_class"
echo "Backup saved as: $backup_file"
read -p "Are you sure you want to delete this flow class? (y/N): " confirm
if [ "$confirm" = "y" ] || [ "$confirm" = "Y" ]; then
echo "Deleting flow class..."
tg-delete-flow-class -n "$flow_class"
# Verify deletion
if ! tg-show-flow-classes | grep -q "$flow_class"; then
echo "Flow class deleted successfully"
else
echo "ERROR: Flow class still exists after deletion"
exit 1
fi
else
echo "Deletion cancelled"
rm "$backup_file"
fi
```
## Integration with Other Commands
### Complete Flow Class Lifecycle
```bash
# 1. List existing flow classes
tg-show-flow-classes
# 2. Get flow class details
tg-get-flow-class -n "target-flow"
# 3. Check for active instances
tg-show-flows | grep "target-flow"
# 4. Stop active instances if needed
tg-stop-flow -i "instance-id"
# 5. Create backup
tg-get-flow-class -n "target-flow" > backup.json
# 6. Delete flow class
tg-delete-flow-class -n "target-flow"
# 7. Verify deletion
tg-show-flow-classes | grep "target-flow"
```
### Bulk Deletion with Validation
```bash
# Delete multiple flow classes safely
classes_to_delete=("old-flow1" "old-flow2" "test-flow")
for class in "${classes_to_delete[@]}"; do
echo "Processing $class..."
# Check if exists
if ! tg-show-flow-classes | grep -q "$class"; then
echo " $class not found, skipping"
continue
fi
# Check for active instances
if tg-show-flows | grep -q "$class"; then
echo " $class has active instances, skipping"
continue
fi
# Backup and delete
tg-get-flow-class -n "$class" > "backup-$class.json"
tg-delete-flow-class -n "$class"
echo " $class deleted"
done
```
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-show-flow-classes`](tg-show-flow-classes.md) - List available flow classes
- [`tg-get-flow-class`](tg-get-flow-class.md) - Retrieve flow class definitions
- [`tg-put-flow-class`](tg-put-flow-class.md) - Create/update flow class definitions
- [`tg-show-flows`](tg-show-flows.md) - List active flow instances
- [`tg-stop-flow`](tg-stop-flow.md) - Stop flow instances
## API Integration
This command uses the [Flow API](../apis/api-flow.md) with the `delete-class` operation to remove flow class definitions.
## Best Practices
1. **Always Backup**: Create backups before deletion
2. **Check Dependencies**: Verify no active flow instances exist
3. **Confirmation**: Use interactive confirmation for important deletions
4. **Logging**: Log deletion operations for audit trails
5. **Permissions**: Ensure appropriate access controls for deletion operations
6. **Testing**: Test deletion procedures in non-production environments first
## Troubleshooting
### Command Succeeds but Class Still Exists
```bash
# Check if deletion actually occurred
tg-show-flow-classes | grep "deleted-class"
# Verify API connectivity
tg-show-flow-classes > /dev/null && echo "API accessible"
```
### Permissions Issues
```bash
# Verify user has deletion permissions
# Contact system administrator if access denied
```
### Network Connectivity
```bash
# Test API connectivity
curl -s "$TRUSTGRAPH_URL/api/v1/flow/classes" > /dev/null
echo "API response: $?"
```


@@ -0,0 +1,312 @@
# tg-delete-kg-core
Permanently removes a knowledge core from the TrustGraph system.
## Synopsis
```bash
tg-delete-kg-core --id CORE_ID [options]
```
## Description
The `tg-delete-kg-core` command permanently removes a stored knowledge core from the TrustGraph system. This operation is irreversible and will delete all RDF triples, graph embeddings, and metadata associated with the specified knowledge core.
**Warning**: This operation permanently deletes data. Ensure you have backups if the knowledge core might be needed in the future.
## Options
### Required Arguments
- `--id, --identifier CORE_ID`: Identifier of the knowledge core to delete
### Optional Arguments
- `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
- `-U, --user USER`: User identifier (default: `trustgraph`)
## Examples
### Delete Specific Knowledge Core
```bash
tg-delete-kg-core --id "old-research-data"
```
### Delete with Specific User
```bash
tg-delete-kg-core --id "test-knowledge" -U developer
```
### Using Custom API URL
```bash
tg-delete-kg-core --id "obsolete-core" -u http://production:8088/
```
## Prerequisites
### Knowledge Core Must Exist
Verify the knowledge core exists before deletion:
```bash
# Check available knowledge cores
tg-show-kg-cores
# Ensure the core exists
tg-show-kg-cores | grep "target-core-id"
```
### Backup Important Data
Create backups before deletion:
```bash
# Export knowledge core before deletion
tg-get-kg-core --id "important-core" -o backup.msgpack
# Then proceed with deletion
tg-delete-kg-core --id "important-core"
```
## Safety Considerations
### Unload from Flows First
Unload the knowledge core from any active flows:
```bash
# Check which flows might be using the core
tg-show-flows
# Unload from active flows
tg-unload-kg-core --id "target-core" --flow-id "active-flow"
# Then delete the core
tg-delete-kg-core --id "target-core"
```
### Verify Dependencies
Check if other systems depend on the knowledge core:
```bash
# Search for references in flow configurations
tg-show-config | grep "target-core"
# Check processing history
tg-show-library-processing | grep "target-core"
```
## Deletion Process
1. **Validation**: Verifies knowledge core exists and user has permission
2. **Dependency Check**: Ensures core is not actively loaded in flows
3. **Data Removal**: Permanently deletes RDF triples and graph embeddings
4. **Metadata Cleanup**: Removes all associated metadata and references
5. **Index Updates**: Updates system indexes to reflect deletion
## Output
Successful deletion typically produces no output:
```bash
# Delete core (no output expected on success)
tg-delete-kg-core --id "test-core"
# Verify deletion
tg-show-kg-cores | grep "test-core"
# Should return no results
```
## Error Handling
### Knowledge Core Not Found
```bash
Exception: Knowledge core 'invalid-core' not found
```
**Solution**: Check available cores with `tg-show-kg-cores` and verify the core ID.
### Permission Denied
```bash
Exception: Access denied to knowledge core
```
**Solution**: Verify user permissions and ownership of the knowledge core.
### Core In Use
```bash
Exception: Knowledge core is currently loaded in active flows
```
**Solution**: Unload the core from all flows before deletion using `tg-unload-kg-core`.
### Connection Errors
```bash
Exception: Connection refused
```
**Solution**: Check the API URL and ensure TrustGraph is running.
## Deletion Verification
### Confirm Deletion
```bash
# Verify core no longer exists
tg-show-kg-cores | grep "deleted-core-id"
# Should return no results if successfully deleted
echo $? # Should be 1 (not found)
```
### Check Flow Impact
```bash
# Verify flows are not affected
tg-show-flows
# Test that queries still work for remaining knowledge
tg-invoke-graph-rag -q "test query" -f remaining-flow
```
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-show-kg-cores`](tg-show-kg-cores.md) - List available knowledge cores
- [`tg-get-kg-core`](tg-get-kg-core.md) - Export knowledge core for backup
- [`tg-unload-kg-core`](tg-unload-kg-core.md) - Unload core from flows
- [`tg-put-kg-core`](tg-put-kg-core.md) - Store new knowledge cores
## API Integration
This command uses the [Knowledge API](../apis/api-knowledge.md) with the `delete-kg-core` operation to permanently remove knowledge cores.
## Use Cases
### Development Cleanup
```bash
# Remove test knowledge cores
tg-delete-kg-core --id "test-data-v1" -U developer
tg-delete-kg-core --id "experimental-core" -U developer
```
### Version Management
```bash
# Remove obsolete versions after upgrading
tg-get-kg-core --id "knowledge-v1" -o backup-v1.msgpack
tg-delete-kg-core --id "knowledge-v1"
# Keep only knowledge-v2
```
### Storage Cleanup
```bash
# Clean up unused knowledge cores
for core in $(tg-show-kg-cores | grep "temp-"); do
echo "Deleting temporary core: $core"
tg-delete-kg-core --id "$core"
done
```
### Error Recovery
```bash
# Remove corrupted knowledge cores
tg-delete-kg-core --id "corrupted-core-2024"
tg-put-kg-core --id "restored-core-2024" -i restored-backup.msgpack
```
## Safe Deletion Workflow
### Standard Procedure
```bash
# 1. Backup the knowledge core
tg-get-kg-core --id "target-core" -o "backup-$(date +%Y%m%d).msgpack"
# 2. Unload from active flows
tg-unload-kg-core --id "target-core" --flow-id "production-flow"
# 3. Verify no dependencies
tg-show-config | grep "target-core"
# 4. Perform deletion
tg-delete-kg-core --id "target-core"
# 5. Verify deletion
tg-show-kg-cores | grep "target-core"
```
### Bulk Deletion
```bash
# Delete multiple cores safely
cores_to_delete=("old-core-1" "old-core-2" "test-core")
for core in "${cores_to_delete[@]}"; do
echo "Processing $core..."
# Backup
tg-get-kg-core --id "$core" -o "backup-$core-$(date +%Y%m%d).msgpack"
# Delete
tg-delete-kg-core --id "$core"
# Verify
if tg-show-kg-cores | grep -q "$core"; then
echo "ERROR: $core still exists after deletion"
else
echo "SUCCESS: $core deleted"
fi
done
```
## Best Practices
1. **Always Backup**: Export knowledge cores before deletion
2. **Check Dependencies**: Verify no flows are using the core
3. **Staged Deletion**: Delete test/development cores before production
4. **Verification**: Confirm deletion completed successfully
5. **Documentation**: Record why cores were deleted for audit purposes
6. **Access Control**: Ensure only authorized users can delete cores
## Recovery Options
### If Accidentally Deleted
```bash
# Restore from backup if available
tg-put-kg-core --id "restored-core" -i backup.msgpack
# Reload into flows if needed
tg-load-kg-core --id "restored-core" --flow-id "production-flow"
```
### Audit Trail
```bash
# Keep records of deletions
echo "$(date): Deleted knowledge core 'old-core' - reason: obsolete version" >> deletion-log.txt
```
## System Impact
### Storage Recovery
- Disk space is freed immediately
- Database indexes are updated
- System performance may improve
### Service Continuity
- Running flows continue to operate
- Other knowledge cores remain unaffected
- New knowledge cores can use the same ID
## Troubleshooting
### Deletion Fails
```bash
# Check if core is loaded in flows
tg-show-flows | grep -A 10 "knowledge"
# Force unload if necessary
tg-unload-kg-core --id "stuck-core" --flow-id "problem-flow"
# Retry deletion
tg-delete-kg-core --id "stuck-core"
```
### Partial Deletion
```bash
# If core still appears in listings
tg-show-kg-cores | grep "partially-deleted"
# Contact system administrator if deletion appears incomplete
```
# tg-dump-msgpack
Reads and analyzes knowledge core files in MessagePack format for diagnostic purposes.
## Synopsis
```bash
tg-dump-msgpack -i INPUT_FILE [options]
```
## Description
The `tg-dump-msgpack` command is a diagnostic utility that reads knowledge core files stored in MessagePack format and outputs their contents in JSON format or provides a summary analysis. This tool is primarily used for debugging, data inspection, and understanding the structure of knowledge cores.
MessagePack is a binary serialization format that TrustGraph uses for efficient storage and transfer of knowledge graph data.
## Options
### Required Arguments
- `-i, --input-file FILE`: Input MessagePack file to read
### Optional Arguments
- `-s, --summary`: Show a summary analysis of the file contents
- `-r, --records`: Dump individual records in JSON format (default behavior)
## Examples
### Dump Records as JSON
```bash
tg-dump-msgpack -i knowledge-core.msgpack
```
### Show Summary Analysis
```bash
tg-dump-msgpack -i knowledge-core.msgpack --summary
```
### Save Output to File
```bash
tg-dump-msgpack -i knowledge-core.msgpack > analysis.json
```
### Analyze Multiple Files
```bash
for file in *.msgpack; do
echo "=== $file ==="
tg-dump-msgpack -i "$file" --summary
echo
done
```
## Output Formats
### Record Output (Default)
With `-r` or `--records` (default behavior), the command outputs each record as a separate JSON object:
```json
["t", {"m": {"m": [{"s": {"v": "uri1"}, "p": {"v": "predicate"}, "o": {"v": "object"}}]}}]
["ge", {"v": [[0.1, 0.2, 0.3, ...]]}]
["de", {"metadata": {...}, "chunks": [...]}]
```
### Summary Output
With `-s` or `--summary`, the command provides an analytical overview:
```
Vector dimension: 384
- NASA Challenger Report
- Technical Documentation
- Safety Engineering Guidelines
```
## Record Types
MessagePack files may contain different types of records:
### Triple Records ("t")
RDF triples representing knowledge graph relationships:
```json
["t", {
"m": {
"m": [{
"s": {"v": "http://example.org/subject"},
"p": {"v": "http://example.org/predicate"},
"o": {"v": "object value"}
}]
}
}]
```
### Graph Embeddings ("ge")
Vector embeddings for graph entities:
```json
["ge", {
"v": [[0.1, 0.2, 0.3, 0.4, ...]]
}]
```
### Document Embeddings ("de")
Document chunk embeddings with metadata:
```json
["de", {
"metadata": {
"id": "doc-123",
"user": "trustgraph",
"collection": "default"
},
"chunks": [{
"chunk": "text content",
"vectors": [0.1, 0.2, 0.3, ...]
}]
}]
```
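The leading tag on each record makes quick tallies easy. A self-contained sketch (the `printf` lines are sample records standing in for real output; in practice, pipe `tg-dump-msgpack -i core.msgpack` into the same filter):

```shell
# Tally record types by their leading tag ("t", "ge", "de", ...)
# Sample records stand in for real dump output
printf '%s\n' \
  '["t", {"m": {"m": []}}]' \
  '["ge", {"v": [[0.1]]}]' \
  '["t", {"m": {"m": []}}]' \
  | grep -o '^\["[^"]*"' | sort | uniq -c
```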
## Use Cases
### Data Inspection
```bash
# Quick peek at file structure
tg-dump-msgpack -i mystery-core.msgpack --summary
# Detailed record analysis
tg-dump-msgpack -i knowledge-core.msgpack | head -20
```
### Debugging Knowledge Cores
```bash
# Check if file contains expected data types
tg-dump-msgpack -i core.msgpack | grep -o '^\["[^"]*"' | sort | uniq -c
# Find specific entities
tg-dump-msgpack -i core.msgpack | grep "NASA"
# Check vector dimensions
tg-dump-msgpack -i core.msgpack --summary | grep "Vector dimension"
```
### Quality Assurance
```bash
# Validate file completeness
validate_msgpack() {
local file="$1"
echo "Validating: $file"
# Check file exists and is readable
if [ ! -r "$file" ]; then
echo "Error: Cannot read file $file"
return 1
fi
# Get summary
summary=$(tg-dump-msgpack -i "$file" --summary 2>/dev/null)
if [ $? -ne 0 ]; then
echo "Error: Failed to read MessagePack file"
return 1
fi
# Check for vector dimension (indicates embeddings present)
if echo "$summary" | grep -q "Vector dimension:"; then
dim=$(echo "$summary" | grep "Vector dimension:" | awk '{print $3}')
echo "✓ Contains embeddings (dimension: $dim)"
else
echo "⚠ No embeddings found"
fi
# Count labels (indicates entities present)
label_count=$(echo "$summary" | grep "^-" | wc -l)
echo "✓ Found $label_count labeled entities"
return 0
}
# Validate multiple files
for file in cores/*.msgpack; do
validate_msgpack "$file"
done
```
### Data Migration
```bash
# Convert MessagePack to JSON for processing
convert_to_json() {
    local input="$1"
    local output="$2"
    echo "Converting $input to $output..."
    # Slurp the one-JSON-object-per-line dump output into a single valid JSON array
    tg-dump-msgpack -i "$input" | jq -s '.' > "$output"
    echo "Conversion complete"
}
convert_to_json "knowledge.msgpack" "knowledge.json"
```
### Analysis and Reporting
```bash
# Generate comprehensive analysis report
analyze_msgpack() {
local file="$1"
local report_file="${file%.msgpack}_analysis.txt"
echo "MessagePack Analysis Report" > "$report_file"
echo "File: $file" >> "$report_file"
echo "Generated: $(date)" >> "$report_file"
echo "=============================" >> "$report_file"
echo "" >> "$report_file"
# Summary information
echo "Summary:" >> "$report_file"
tg-dump-msgpack -i "$file" --summary >> "$report_file"
echo "" >> "$report_file"
# Record type analysis
echo "Record Type Distribution:" >> "$report_file"
tg-dump-msgpack -i "$file" | \
grep -o '^\["[^"]*"' | \
sort | uniq -c | \
awk '{print " " $2 ": " $1 " records"}' >> "$report_file"
echo "" >> "$report_file"
# File statistics
file_size=$(stat -c%s "$file")
echo "File Statistics:" >> "$report_file"
echo " Size: $file_size bytes" >> "$report_file"
echo " Size (human): $(numfmt --to=iec-i --suffix=B $file_size)" >> "$report_file"
echo "Analysis saved to: $report_file"
}
# Analyze all MessagePack files
for file in *.msgpack; do
analyze_msgpack "$file"
done
```
### Comparative Analysis
```bash
# Compare two knowledge cores
compare_msgpack() {
local file1="$1"
local file2="$2"
echo "Comparing MessagePack files:"
echo "File 1: $file1"
echo "File 2: $file2"
echo "=========================="
# Compare summaries
echo "Summary comparison:"
echo "File 1:"
tg-dump-msgpack -i "$file1" --summary | sed 's/^/ /'
echo ""
echo "File 2:"
tg-dump-msgpack -i "$file2" --summary | sed 's/^/ /'
echo ""
# Compare record counts
echo "Record type comparison:"
echo "File 1:"
tg-dump-msgpack -i "$file1" | \
grep -o '^\["[^"]*"' | \
sort | uniq -c | \
awk '{print " " $2 ": " $1}' | \
sort
echo "File 2:"
tg-dump-msgpack -i "$file2" | \
grep -o '^\["[^"]*"' | \
sort | uniq -c | \
awk '{print " " $2 ": " $1}' | \
sort
}
compare_msgpack "core1.msgpack" "core2.msgpack"
```
## Advanced Usage
### Large File Processing
```bash
# Process large files in chunks
process_large_msgpack() {
local file="$1"
local chunk_size=1000
echo "Processing large file: $file"
# Count total records first
total_records=$(tg-dump-msgpack -i "$file" | wc -l)
echo "Total records: $total_records"
# Process in chunks
tg-dump-msgpack -i "$file" | \
split -l $chunk_size - "chunk_"
echo "Split into chunks of $chunk_size records each"
# Process each chunk
for chunk in chunk_*; do
echo "Processing $chunk..."
# Add your processing logic here
wc -l "$chunk"
done
# Clean up
rm chunk_*
}
```
### Data Extraction
```bash
# Extract specific data types
extract_triples() {
local file="$1"
local output="triples.json"
echo "Extracting triples from $file..."
tg-dump-msgpack -i "$file" | \
grep '^\["t"' > "$output"
echo "Triples saved to: $output"
}
extract_embeddings() {
local file="$1"
local output="embeddings.json"
echo "Extracting embeddings from $file..."
tg-dump-msgpack -i "$file" | \
grep -E '^\["(ge|de)"' > "$output"
echo "Embeddings saved to: $output"
}
# Extract all data types
extract_triples "knowledge.msgpack"
extract_embeddings "knowledge.msgpack"
```
### Integration with Other Tools
```bash
# Convert MessagePack to formats for other tools
msgpack_to_turtle() {
local input="$1"
local output="$2"
echo "Converting MessagePack to Turtle format..."
# Extract triples and convert to Turtle
tg-dump-msgpack -i "$input" | \
grep '^\["t"' | \
jq -r '.[1].m.m[] |
"<" + .s.v + "> <" + .p.v + "> " +
(if .o.e then "<" + .o.v + ">" else "\"" + .o.v + "\"" end) + " ."' \
> "$output"
echo "Turtle format saved to: $output"
}
msgpack_to_turtle "knowledge.msgpack" "knowledge.ttl"
```
## Error Handling
### File Not Found
```bash
Exception: [Errno 2] No such file or directory: 'missing.msgpack'
```
**Solution**: Check file path and ensure the file exists.
### Invalid MessagePack Format
```bash
Exception: Unpack failed
```
**Solution**: Verify the file is a valid MessagePack file and not corrupted.
### Memory Issues with Large Files
```bash
MemoryError: Unable to allocate memory
```
**Solution**: Process large files in chunks or use streaming approaches.
### Permission Errors
```bash
Exception: [Errno 13] Permission denied
```
**Solution**: Check file permissions and ensure read access.
## Performance Considerations
### File Size Optimization
```bash
# Check file compression efficiency
check_compression() {
local file="$1"
original_size=$(stat -c%s "$file")
# Test compression
gzip -c "$file" > "${file}.gz"
compressed_size=$(stat -c%s "${file}.gz")
ratio=$(echo "scale=2; $compressed_size * 100 / $original_size" | bc)
echo "Original: $(numfmt --to=iec-i --suffix=B $original_size)"
echo "Compressed: $(numfmt --to=iec-i --suffix=B $compressed_size)"
echo "Compression ratio: ${ratio}%"
rm "${file}.gz"
}
```
### Processing Speed
```bash
# Time processing operations
time_msgpack_ops() {
local file="$1"
echo "Timing MessagePack operations for: $file"
# Time summary generation
echo "Summary generation:"
time tg-dump-msgpack -i "$file" --summary > /dev/null
# Time full dump
echo "Full record dump:"
time tg-dump-msgpack -i "$file" > /dev/null
}
```
## Related Commands
- [`tg-get-kg-core`](tg-get-kg-core.md) - Export knowledge cores to MessagePack
- [`tg-load-kg-core`](tg-load-kg-core.md) - Load MessagePack knowledge cores
- [`tg-save-doc-embeds`](tg-save-doc-embeds.md) - Save document embeddings to MessagePack
## Best Practices
1. **File Validation**: Always validate MessagePack files before processing
2. **Memory Management**: Be cautious with large files to avoid memory issues
3. **Backup**: Keep backups of original MessagePack files before analysis
4. **Incremental Processing**: Process large files incrementally when possible
5. **Documentation**: Document the structure and content of your MessagePack files
6. **Version Control**: Track changes in MessagePack file formats over time
## Troubleshooting
### Corrupted Files
```bash
# Test file integrity
if tg-dump-msgpack -i "test.msgpack" --summary > /dev/null 2>&1; then
echo "File appears valid"
else
echo "File may be corrupted"
fi
```
### Empty or Incomplete Files
```bash
# Check for empty files
if [ ! -s "test.msgpack" ]; then
echo "File is empty"
fi
# Check record count
record_count=$(tg-dump-msgpack -i "test.msgpack" 2>/dev/null | wc -l)
echo "Records found: $record_count"
```
### Format Issues
```bash
# Validate JSON output
tg-dump-msgpack -i "test.msgpack" | head -1 | jq . > /dev/null
if [ $? -eq 0 ]; then
echo "JSON output is valid"
else
echo "JSON output may be malformed"
fi
```
# tg-get-flow-class
Retrieves and displays a flow class definition in JSON format.
## Synopsis
```bash
tg-get-flow-class -n CLASS_NAME [options]
```
## Description
The `tg-get-flow-class` command retrieves a stored flow class definition from TrustGraph and displays it in formatted JSON. This is useful for examining flow class configurations, creating backups, or preparing to modify existing flow classes.
The output can be saved to files for version control, documentation, or as input for creating new flow classes with `tg-put-flow-class`.
## Options
### Required Arguments
- `-n, --class-name CLASS_NAME`: Name of the flow class to retrieve
### Optional Arguments
- `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
## Examples
### Display Flow Class Definition
```bash
tg-get-flow-class -n "document-processing"
```
### Save Flow Class to File
```bash
tg-get-flow-class -n "production-flow" > production-flow-backup.json
```
### Compare Flow Classes
```bash
# Get multiple flow classes for comparison
tg-get-flow-class -n "dev-flow" > dev-flow.json
tg-get-flow-class -n "prod-flow" > prod-flow.json
diff dev-flow.json prod-flow.json
```
### Using Custom API URL
```bash
tg-get-flow-class -n "remote-flow" -u http://production:8088/
```
## Output Format
The command outputs the flow class definition in formatted JSON:
```json
{
"description": "Document processing and analysis flow",
"interfaces": {
"agent": {
"request": "non-persistent://tg/request/agent:doc-proc",
"response": "non-persistent://tg/response/agent:doc-proc"
},
"document-rag": {
"request": "non-persistent://tg/request/document-rag:doc-proc",
"response": "non-persistent://tg/response/document-rag:doc-proc"
},
"text-load": "persistent://tg/flow/text-document-load:doc-proc",
"document-load": "persistent://tg/flow/document-load:doc-proc",
"triples-store": "persistent://tg/flow/triples-store:doc-proc"
},
"tags": ["production", "document-processing"]
}
```
### Key Components
#### Description
Human-readable description of the flow class purpose and capabilities.
#### Interfaces
Service definitions showing:
- **Request/Response Services**: Services with both request and response queues
- **Fire-and-Forget Services**: Services with only input queues
#### Tags (Optional)
Categorization tags for organizing flow classes.
## Prerequisites
### Flow Class Must Exist
Verify the flow class exists before retrieval:
```bash
# Check available flow classes
tg-show-flow-classes
# Look for specific class
tg-show-flow-classes | grep "target-class"
```
## Error Handling
### Flow Class Not Found
```bash
Exception: Flow class 'invalid-class' not found
```
**Solution**: Check available classes with `tg-show-flow-classes` and verify the class name.
### Connection Errors
```bash
Exception: Connection refused
```
**Solution**: Check the API URL and ensure TrustGraph is running.
### Permission Errors
```bash
Exception: Access denied to flow class
```
**Solution**: Verify user permissions for accessing flow class definitions.
## Use Cases
### Configuration Backup
```bash
# Backup all flow classes
mkdir -p flow-class-backups/$(date +%Y%m%d)
tg-show-flow-classes | awk '{print $1}' | while read class; do
if [ "$class" != "flow" ]; then # Skip header
tg-get-flow-class -n "$class" > "flow-class-backups/$(date +%Y%m%d)/$class.json"
fi
done
```
### Flow Class Migration
```bash
# Export from source environment
tg-get-flow-class -n "production-flow" -u http://source:8088/ > prod-flow.json
# Import to target environment
tg-put-flow-class -n "production-flow" -c "$(cat prod-flow.json)" -u http://target:8088/
```
### Template Creation
```bash
# Get existing flow class as template
tg-get-flow-class -n "base-flow" > template.json
# Modify template and create new class
sed 's/base-flow/new-flow/g' template.json > new-flow.json
tg-put-flow-class -n "custom-flow" -c "$(cat new-flow.json)"
```
### Configuration Analysis
```bash
# Analyze flow class configurations
tg-get-flow-class -n "complex-flow" | jq '.interfaces | keys'
tg-get-flow-class -n "complex-flow" | jq '.interfaces | length'
```
### Version Control Integration
```bash
# Store flow classes in git
mkdir -p flow-classes
tg-get-flow-class -n "main-flow" > flow-classes/main-flow.json
git add flow-classes/main-flow.json
git commit -m "Update main-flow configuration"
```
## JSON Processing
### Extract Specific Information
```bash
# Get only interface names
tg-get-flow-class -n "my-flow" | jq -r '.interfaces | keys[]'
# Get only description
tg-get-flow-class -n "my-flow" | jq -r '.description'
# Get request queues
tg-get-flow-class -n "my-flow" | jq -r '.interfaces | to_entries[] | select(.value.request) | .value.request'
```
### Validate Configuration
```bash
# Validate JSON structure
tg-get-flow-class -n "my-flow" | jq . > /dev/null && echo "Valid JSON" || echo "Invalid JSON"
# Check required fields
config=$(tg-get-flow-class -n "my-flow")
echo "$config" | jq -e '.description' > /dev/null || echo "Missing description"
echo "$config" | jq -e '.interfaces' > /dev/null || echo "Missing interfaces"
```
## Integration with Other Commands
### Flow Class Lifecycle
```bash
# 1. Examine existing flow class
tg-get-flow-class -n "old-flow"
# 2. Save backup
tg-get-flow-class -n "old-flow" > old-flow-backup.json
# 3. Modify configuration
cp old-flow-backup.json new-flow.json
# Edit new-flow.json as needed
# 4. Upload new version
tg-put-flow-class -n "updated-flow" -c "$(cat new-flow.json)"
# 5. Test new flow class
tg-start-flow -n "updated-flow" -i "test-instance" -d "Testing updated flow"
```
### Bulk Operations
```bash
# Process multiple flow classes
flow_classes=("flow1" "flow2" "flow3")
for class in "${flow_classes[@]}"; do
echo "Processing $class..."
tg-get-flow-class -n "$class" > "backup-$class.json"
# Modify configuration
sed 's/old-pattern/new-pattern/g' "backup-$class.json" > "updated-$class.json"
# Upload updated version
tg-put-flow-class -n "$class" -c "$(cat updated-$class.json)"
done
```
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-put-flow-class`](tg-put-flow-class.md) - Upload/update flow class definitions
- [`tg-show-flow-classes`](tg-show-flow-classes.md) - List available flow classes
- [`tg-delete-flow-class`](tg-delete-flow-class.md) - Remove flow class definitions
- [`tg-start-flow`](tg-start-flow.md) - Create flow instances from classes
## API Integration
This command uses the [Flow API](../apis/api-flow.md) with the `get-class` operation to retrieve flow class definitions.
## Advanced Usage
### Configuration Diff
```bash
# Compare flow class versions
tg-get-flow-class -n "flow-v1" > v1.json
tg-get-flow-class -n "flow-v2" > v2.json
diff -u v1.json v2.json
```
### Extract Queue Information
```bash
# Get all queue names from flow class
tg-get-flow-class -n "my-flow" | jq -r '
.interfaces |
to_entries[] |
if .value | type == "object" then
.value.request, .value.response
else
.value
end
' | sort | uniq
```
### Configuration Validation Script
```bash
#!/bin/bash
# validate-flow-class.sh
flow_class="$1"
if [ -z "$flow_class" ]; then
echo "Usage: $0 <flow-class-name>"
exit 1
fi
echo "Validating flow class: $flow_class"
# Get configuration
config=$(tg-get-flow-class -n "$flow_class" 2>/dev/null)
if [ $? -ne 0 ]; then
echo "ERROR: Flow class not found"
exit 1
fi
# Validate JSON
echo "$config" | jq . > /dev/null
if [ $? -ne 0 ]; then
echo "ERROR: Invalid JSON structure"
exit 1
fi
# Check required fields
desc=$(echo "$config" | jq -r '.description // empty')
if [ -z "$desc" ]; then
echo "WARNING: Missing description"
fi
interfaces=$(echo "$config" | jq -r '.interfaces // empty')
if [ -z "$interfaces" ] || [ "$interfaces" = "null" ]; then
echo "ERROR: Missing interfaces"
exit 1
fi
echo "Flow class validation passed"
```
## Best Practices
1. **Regular Backups**: Save flow class definitions before modifications
2. **Version Control**: Store configurations in version control systems
3. **Documentation**: Include meaningful descriptions in flow classes
4. **Validation**: Validate JSON structure before using configurations
5. **Template Management**: Use existing classes as templates for new ones
6. **Change Tracking**: Document changes when updating flow classes
## Troubleshooting
### Empty Output
```bash
# If command returns empty output
tg-get-flow-class -n "my-flow"
# Check if flow class exists
tg-show-flow-classes | grep "my-flow"
```
### Invalid JSON Output
```bash
# If output appears corrupted
tg-get-flow-class -n "my-flow" | jq .
# Should show parsing error if JSON is invalid
```
### Permission Issues
```bash
# If access denied errors occur
# Verify authentication and user permissions
# Contact system administrator if needed
```
# tg-get-kg-core
Exports a knowledge core from TrustGraph to a MessagePack file.
## Synopsis
```bash
tg-get-kg-core --id CORE_ID -o OUTPUT_FILE [options]
```
## Description
The `tg-get-kg-core` command retrieves a stored knowledge core from TrustGraph and exports it to a MessagePack format file. This allows you to backup knowledge cores, transfer them between systems, or examine their contents offline.
The exported file contains both RDF triples and graph embeddings in a compact binary format that can later be imported using `tg-put-kg-core`.
## Options
### Required Arguments
- `--id, --identifier CORE_ID`: Identifier of the knowledge core to export
- `-o, --output OUTPUT_FILE`: Path for the output MessagePack file
### Optional Arguments
- `-u, --url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `ws://localhost:8088/`)
- `-U, --user USER`: User identifier (default: `trustgraph`)
## Examples
### Basic Knowledge Core Export
```bash
tg-get-kg-core --id "research-knowledge" -o research-backup.msgpack
```
### Export with Specific User
```bash
tg-get-kg-core \
--id "medical-knowledge" \
-o medical-backup.msgpack \
-U medical-team
```
### Export with Timestamped Filename
```bash
tg-get-kg-core \
--id "production-core" \
-o "production-backup-$(date +%Y%m%d-%H%M%S).msgpack"
```
### Using Custom API URL
```bash
tg-get-kg-core \
--id "remote-core" \
-o remote-backup.msgpack \
-u ws://production:8088/
```
## Prerequisites
### Knowledge Core Must Exist
Verify the knowledge core exists:
```bash
# Check available knowledge cores
tg-show-kg-cores
# Verify specific core exists
tg-show-kg-cores | grep "target-core-id"
```
### Output Directory Must Be Writable
Ensure the output directory exists and is writable:
```bash
# Create backup directory if needed
mkdir -p backups
# Export to backup directory
tg-get-kg-core --id "my-core" -o backups/my-core-backup.msgpack
```
## Export Process
1. **Connection**: Establishes WebSocket connection to Knowledge API
2. **Request**: Sends get-kg-core request with core ID and user
3. **Streaming**: Receives data in chunks via WebSocket
4. **Processing**: Converts response data to MessagePack format
5. **Writing**: Writes binary data to output file
6. **Summary**: Reports statistics on exported data
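The steps above can be wrapped with basic error checks. A sketch (`export_core` is an illustrative helper, not a TrustGraph command):

```shell
# Illustrative wrapper: run the export, then confirm the file
# was written and is non-empty before trusting the backup
export_core() {
    local core_id="$1" out_file="$2"
    if ! tg-get-kg-core --id "$core_id" -o "$out_file"; then
        echo "Export failed for $core_id" >&2
        return 1
    fi
    if [ ! -s "$out_file" ]; then
        echo "Export produced an empty file: $out_file" >&2
        return 1
    fi
    echo "Exported $core_id to $out_file"
}
```

Usage: `export_core "production-core" backups/prod.msgpack` — a nonzero return means the export should not be treated as a valid backup.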
## Output Format
The exported MessagePack file contains structured data with two types of messages:
### Triple Messages (`"t"`)
Contains RDF triples (facts and relationships):
```python
("t", {
"m": { # metadata
"i": "core-id",
"m": [], # metadata triples
"u": "user",
"c": "collection"
},
"t": [ # triples array
{
"s": {"value": "subject", "is_uri": true},
"p": {"value": "predicate", "is_uri": true},
"o": {"value": "object", "is_uri": false}
}
]
})
```
### Graph Embedding Messages (`"ge"`)
Contains vector embeddings for entities:
```python
("ge", {
"m": { # metadata
"i": "core-id",
"m": [], # metadata triples
"u": "user",
"c": "collection"
},
"e": [ # entities array
{
"e": {"value": "entity", "is_uri": true},
"v": [[0.1, 0.2, 0.3]] # vectors
}
]
})
```
## Output Statistics
The command reports the number of messages exported:
```bash
Got: 150 triple, 75 GE messages.
```
Where:
- **triple**: Number of RDF triple message chunks exported
- **GE**: Number of graph embedding message chunks exported
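For scripting, the counts can be pulled out of that status line. A sketch assuming the exact `Got: N triple, M GE messages.` wording shown above (in practice, capture the line from the command's output rather than hard-coding it):

```shell
# Parse "Got: 150 triple, 75 GE messages." into shell variables
status="Got: 150 triple, 75 GE messages."
triple_count=$(echo "$status" | awk '{print $2}')
ge_count=$(echo "$status" | awk '{print $4}')
echo "triples=$triple_count embeddings=$ge_count"
# prints: triples=150 embeddings=75
```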
## Error Handling
### Knowledge Core Not Found
```bash
Exception: Knowledge core 'invalid-core' not found
```
**Solution**: Check available cores with `tg-show-kg-cores` and verify the core ID.
### Permission Denied
```bash
Exception: Access denied to knowledge core
```
**Solution**: Verify user permissions for the specified knowledge core.
### File Permission Errors
```bash
Exception: Permission denied: output.msgpack
```
**Solution**: Check write permissions for the output directory and filename.
### Connection Errors
```bash
Exception: Connection refused
```
**Solution**: Check the API URL and ensure TrustGraph is running.
### Disk Space Errors
```bash
Exception: No space left on device
```
**Solution**: Free up disk space or use a different output location.
## File Management
### Backup Organization
```bash
# Create organized backup structure
mkdir -p backups/{daily,weekly,monthly}
# Daily backup
tg-get-kg-core --id "prod-core" -o "backups/daily/prod-$(date +%Y%m%d).msgpack"
# Weekly backup
tg-get-kg-core --id "prod-core" -o "backups/weekly/prod-week-$(date +%V).msgpack"
```
### Compression
```bash
# Export and compress for storage
tg-get-kg-core --id "large-core" -o large-core.msgpack
gzip large-core.msgpack
# Results in large-core.msgpack.gz
```
## File Verification
### Check File Size
```bash
# Export and verify
tg-get-kg-core --id "my-core" -o my-core.msgpack
ls -lh my-core.msgpack
# Typical sizes: small cores (KB-MB), large cores (MB-GB)
```
### Validate Export
```bash
# Test the exported file by importing it under a different ID
tg-put-kg-core --id "test-import" -i my-core.msgpack
tg-show-kg-cores | grep "test-import"
```
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL (automatically converted to WebSocket format)
## Related Commands
- [`tg-put-kg-core`](tg-put-kg-core.md) - Import knowledge core from MessagePack file
- [`tg-show-kg-cores`](tg-show-kg-cores.md) - List available knowledge cores
- [`tg-delete-kg-core`](tg-delete-kg-core.md) - Delete knowledge cores
- [`tg-dump-msgpack`](tg-dump-msgpack.md) - Examine MessagePack file contents
## API Integration
This command uses the [Knowledge API](../apis/api-knowledge.md) via WebSocket connection with `get-kg-core` operations to retrieve knowledge data.
## Use Cases
### Regular Backups
```bash
#!/bin/bash
# Daily backup script
cores=("production-core" "research-core" "customer-data")
backup_dir="backups/$(date +%Y%m%d)"
mkdir -p "$backup_dir"
for core in "${cores[@]}"; do
echo "Backing up $core..."
tg-get-kg-core --id "$core" -o "$backup_dir/$core.msgpack"
done
```
### Migration Between Environments
```bash
# Export from development
tg-get-kg-core --id "dev-knowledge" -o dev-export.msgpack
# Import to staging
tg-put-kg-core --id "staging-knowledge" -i dev-export.msgpack
```
### Knowledge Core Versioning
```bash
# Create versioned backups
version="v$(date +%Y%m%d)"
tg-get-kg-core --id "main-knowledge" -o "knowledge-$version.msgpack"
# Tag with git or other version control
git add "knowledge-$version.msgpack"
git commit -m "Knowledge core backup $version"
```
### Data Analysis
```bash
# Export for offline analysis
tg-get-kg-core --id "analytics-data" -o analytics.msgpack
# Process with custom tools
python analyze_knowledge.py analytics.msgpack
```
### Disaster Recovery
```bash
# Create comprehensive backup
cores=$(tg-show-kg-cores)
backup_date=$(date +%Y%m%d-%H%M%S)
backup_dir="disaster-recovery-$backup_date"
mkdir -p "$backup_dir"
for core in $cores; do
echo "Backing up $core..."
tg-get-kg-core --id "$core" -o "$backup_dir/$core.msgpack"
done
# Create checksum file
cd "$backup_dir"
sha256sum *.msgpack > checksums.sha256
```
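The checksum manifest created above can be verified before any restore. A small sketch:

```shell
# Verify a backup set against its checksum manifest before restoring
verify_backup() {
    ( cd "$1" && sha256sum -c checksums.sha256 )
}
```

Usage: `verify_backup "disaster-recovery-20240101-020000"` (directory name illustrative); a nonzero exit means at least one file failed verification.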
## Automated Backup Strategies
### Cron Job Setup
```bash
# Add to crontab for daily backups at 2 AM
# 0 2 * * * /path/to/backup-script.sh
#!/bin/bash
# backup-script.sh
BACKUP_DIR="/backups/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"
# Backup all cores
tg-show-kg-cores | while read core; do
tg-get-kg-core --id "$core" -o "$BACKUP_DIR/$core.msgpack"
done
# Cleanup old backups (keep 30 days)
find /backups -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +
```
### Incremental Backups
```bash
# Compare with previous backup
current_cores=$(tg-show-kg-cores | sort)
previous_cores=$(cat last-backup-cores.txt 2>/dev/null | sort)
# Only back up cores that are new since the last run
comm -13 <(echo "$previous_cores") <(echo "$current_cores") | while read core; do
tg-get-kg-core --id "$core" -o "incremental/$core.msgpack"
done
echo "$current_cores" > last-backup-cores.txt
```
## Best Practices
1. **Regular Backups**: Schedule automated backups of important knowledge cores
2. **Organized Storage**: Use dated directories and consistent naming
3. **Verification**: Test backup files periodically by importing them
4. **Compression**: Compress large backup files to save storage
5. **Access Control**: Secure backup files with appropriate permissions
6. **Documentation**: Document what each knowledge core contains
7. **Retention Policy**: Implement backup retention policies
## Troubleshooting
### Large File Exports
```bash
# For very large knowledge cores
# Monitor progress and disk space
df -h . # Check available space
tg-get-kg-core --id "huge-core" -o huge-core.msgpack &
watch -n 5 'ls -lh huge-core.msgpack' # Monitor file growth
```
### Network Timeouts
```bash
# If export times out, try smaller cores or check network
# Split large cores if possible, or increase timeout settings
```
### Corrupted Exports
```bash
# Verify file integrity
file my-core.msgpack # Should show "data"
python3 -c "import msgpack; print(sum(1 for _ in msgpack.Unpacker(open('my-core.msgpack', 'rb'))), 'records read')"
```
# tg-graph-to-turtle
Exports knowledge graph data to Turtle (TTL) format for backup, analysis, or migration.
## Synopsis
```bash
tg-graph-to-turtle [options]
```
## Description
The `tg-graph-to-turtle` command connects to TrustGraph's triple query service and exports all graph triples in Turtle format. This is useful for creating backups, analyzing graph structure, migrating data, or integrating with external RDF tools.
The command queries up to 10,000 triples and outputs them in standard Turtle format to stdout, while also saving to an `output.ttl` file.
## Options
### Optional Arguments
- `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
- `-f, --flow-id ID`: Flow instance ID to use (default: `default`)
- `-U, --user USER`: User ID for data scope (default: `trustgraph`)
- `-C, --collection COLLECTION`: Collection to export (default: `default`)
## Examples
### Basic Export
```bash
tg-graph-to-turtle
```
### Export to File
```bash
tg-graph-to-turtle > knowledge-graph.ttl
```
### Export Specific Collection
```bash
tg-graph-to-turtle -C "research-data" > research-graph.ttl
```
### Export with Custom Flow
```bash
tg-graph-to-turtle -f "production-flow" -U "admin" > production-graph.ttl
```
## Output Format
The command generates Turtle format with proper RDF syntax:
```turtle
@prefix ns1: <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
ns1:Person rdf:type rdfs:Class .
ns1:john rdf:type ns1:Person ;
ns1:name "John Doe" ;
ns1:age "30" .
ns1:jane rdf:type ns1:Person ;
ns1:name "Jane Smith" ;
ns1:department "Engineering" .
```
### Output Destinations
1. **stdout**: Standard output for piping or display
2. **output.ttl**: Automatically created file in current directory
## Use Cases
### Data Backup
```bash
# Create timestamped backups
timestamp=$(date +%Y%m%d_%H%M%S)
tg-graph-to-turtle > "backup_${timestamp}.ttl"
# Backup specific collections
collections=("research" "products" "customers")
for collection in "${collections[@]}"; do
tg-graph-to-turtle -C "$collection" > "backup_${collection}_${timestamp}.ttl"
done
```
### Data Migration
```bash
# Export from source environment
tg-graph-to-turtle -u "http://source:8088/" > source-data.ttl
# Import to target environment
tg-load-turtle -i "migration-$(date +%Y%m%d)" \
-u "ws://target:8088/" \
source-data.ttl
```
### Graph Analysis
```bash
# Export for analysis
tg-graph-to-turtle > analysis-data.ttl
# Analyze with external tools
rapper -i turtle -o ntriples analysis-data.ttl | wc -l # Count triples
grep -c "rdf:type" analysis-data.ttl # Count type assertions
```
### Integration with External Tools
```bash
# Export for Apache Jena
tg-graph-to-turtle > jena-input.ttl
tdb2.tdbloader --loc=tdb-database jena-input.ttl
# Export for Virtuoso
tg-graph-to-turtle > virtuoso-data.ttl
isql-v -U dba -P password < load-script.sql
```
## Advanced Usage
### Incremental Exports
```bash
# Export with timestamps for incremental backups
last_export_file="last_export_timestamp.txt"
current_time=$(date +%Y%m%d_%H%M%S)
if [ -f "$last_export_file" ]; then
last_export=$(cat "$last_export_file")
echo "Last export: $last_export"
fi
echo "Current export: $current_time"
tg-graph-to-turtle > "incremental_${current_time}.ttl"
echo "$current_time" > "$last_export_file"
```
### Multi-Collection Export
```bash
# Export all collections to separate files
export_all_collections() {
local output_dir="graph_exports_$(date +%Y%m%d)"
mkdir -p "$output_dir"
echo "Exporting all collections to $output_dir"
# Get list of collections (this would need to be implemented)
# For now, use known collections
collections=("default" "research" "products" "documents")
for collection in "${collections[@]}"; do
echo "Exporting collection: $collection"
tg-graph-to-turtle -C "$collection" > "$output_dir/${collection}.ttl"
# Verify export
if [ -s "$output_dir/${collection}.ttl" ]; then
triple_count=$(grep -c "\." "$output_dir/${collection}.ttl")
echo " Exported $triple_count triples"
else
echo " No data exported"
fi
done
}
export_all_collections
```
### Filtered Export
```bash
# Export specific types of triples
export_filtered() {
local filter_type="$1"
local output_file="$2"
echo "Exporting $filter_type triples to $output_file"
# Export all data first
tg-graph-to-turtle > temp_full_export.ttl
# Filter based on type
case "$filter_type" in
"classes")
grep "rdf:type.*Class" temp_full_export.ttl > "$output_file"
;;
"instances")
grep -v "rdf:type.*Class" temp_full_export.ttl > "$output_file"
;;
"properties")
grep "rdf:type.*Property" temp_full_export.ttl > "$output_file"
;;
*)
echo "Unknown filter type: $filter_type"
return 1
;;
esac
rm temp_full_export.ttl
}
# Usage
export_filtered "classes" "schema-classes.ttl"
export_filtered "instances" "instance-data.ttl"
```
### Compression and Packaging
```bash
# Export and compress
export_compressed() {
local collection="$1"
local timestamp=$(date +%Y%m%d_%H%M%S)
local filename="${collection}_${timestamp}"
echo "Exporting and compressing collection: $collection"
# Export to temporary file
tg-graph-to-turtle -C "$collection" > "${filename}.ttl"
# Compress
gzip "${filename}.ttl"
# Create metadata
cat > "${filename}.meta" << EOF
Collection: $collection
Export Date: $(date)
Compressed Size: $(stat -c%s "${filename}.ttl.gz") bytes
MD5: $(md5sum "${filename}.ttl.gz" | cut -d' ' -f1)
EOF
echo "Export complete: ${filename}.ttl.gz"
}
# Export multiple collections compressed
collections=("research" "products" "customers")
for collection in "${collections[@]}"; do
export_compressed "$collection"
done
```
### Validation and Quality Checks
```bash
# Export with validation
export_with_validation() {
local output_file="$1"
echo "Exporting with validation to $output_file"
# Export
tg-graph-to-turtle > "$output_file"
# Validate Turtle syntax
if rapper -q -i turtle "$output_file" > /dev/null 2>&1; then
echo "✓ Valid Turtle syntax"
else
echo "✗ Invalid Turtle syntax"
return 1
fi
# Count triples (rapper writes its summary to stderr, not stdout)
triple_count=$(rapper -i turtle -c "$output_file" 2>&1 >/dev/null | grep -oE 'returned [0-9]+' | grep -oE '[0-9]+')
echo "Total triples: $triple_count"
# Check for common issues
if grep -q "^@prefix" "$output_file"; then
echo "✓ Prefixes found"
else
echo "⚠ No prefixes found"
fi
# Check for URIs containing spaces (malformed); a plain space check would
# match nearly every Turtle line, so look inside angle brackets only
malformed_uris=$(grep -c '<[^>]* [^>]*>' "$output_file" || true)
if [ "${malformed_uris:-0}" -gt 0 ]; then
echo "⚠ Found $malformed_uris lines with space-containing URIs (malformed)"
fi
}
# Validate export
export_with_validation "validated-export.ttl"
```
## Performance Optimization
### Streaming Export
```bash
# Handle large datasets with streaming
stream_export() {
local collection="$1"
local chunk_size="$2"
local output_prefix="$3"
echo "Streaming export of collection: $collection"
# Export to temporary file
tg-graph-to-turtle -C "$collection" > temp_export.ttl
# Keep prefix declarations so every chunk remains valid Turtle
grep '^@prefix' temp_export.ttl > temp_prefixes.ttl
# Split into chunks (note: assumes one statement per line; multi-line
# statements may still straddle a chunk boundary)
grep -v '^@prefix' temp_export.ttl | split -l "$chunk_size" - "${output_prefix}_"
# Prepend prefixes, add .ttl extension, and validate each chunk
for chunk in ${output_prefix}_*; do
cat temp_prefixes.ttl "$chunk" > "$chunk.ttl"
rm "$chunk"
# Validate chunk
if rapper -q -i turtle "$chunk.ttl" > /dev/null 2>&1; then
echo "✓ Valid chunk: $chunk.ttl"
else
echo "✗ Invalid chunk: $chunk.ttl"
fi
done
rm temp_export.ttl temp_prefixes.ttl
}
# Stream large collection
stream_export "large-collection" 1000 "chunk"
```
### Parallel Processing
```bash
# Export multiple collections in parallel
parallel_export() {
local collections=("$@")
local timestamp=$(date +%Y%m%d_%H%M%S)
echo "Exporting ${#collections[@]} collections in parallel"
for collection in "${collections[@]}"; do
(
echo "Exporting $collection..."
tg-graph-to-turtle -C "$collection" > "${collection}_${timestamp}.ttl"
echo "✓ Completed: $collection"
) &
done
wait
echo "All exports completed"
}
# Export collections in parallel
parallel_export "research" "products" "customers" "documents"
```
## Integration Scripts
### Automated Backup System
```bash
#!/bin/bash
# automated-backup.sh
backup_dir="graph_backups"
retention_days=30
echo "Starting automated graph backup..."
# Create backup directory
mkdir -p "$backup_dir"
# Export with timestamp
timestamp=$(date +%Y%m%d_%H%M%S)
backup_file="$backup_dir/graph_backup_${timestamp}.ttl"
echo "Exporting to: $backup_file"
tg-graph-to-turtle > "$backup_file"
# Compress
gzip "$backup_file"
echo "Compressed: ${backup_file}.gz"
# Clean old backups
find "$backup_dir" -name "*.ttl.gz" -mtime +$retention_days -delete
echo "Cleaned backups older than $retention_days days"
# Verify backup
if [ -f "${backup_file}.gz" ]; then
size=$(stat -c%s "${backup_file}.gz")
echo "Backup completed: ${size} bytes"
else
echo "Backup failed!"
exit 1
fi
```
### Data Sync Script
```bash
#!/bin/bash
# sync-graphs.sh
source_url="$1"
target_url="$2"
collection="$3"
if [ -z "$source_url" ] || [ -z "$target_url" ] || [ -z "$collection" ]; then
echo "Usage: $0 <source-url> <target-url> <collection>"
exit 1
fi
echo "Syncing collection '$collection' from $source_url to $target_url"
# Export from source
temp_file="sync_temp_$(date +%s).ttl"
tg-graph-to-turtle -u "$source_url" -C "$collection" > "$temp_file"
# Validate export
if [ ! -s "$temp_file" ]; then
echo "No data exported from source"
exit 1
fi
# Load to target
doc_id="sync-$(date +%Y%m%d-%H%M%S)"
if tg-load-turtle -i "$doc_id" -u "$target_url" -C "$collection" "$temp_file"; then
echo "Sync completed successfully"
else
echo "Sync failed"
exit 1
fi
# Cleanup
rm "$temp_file"
```
## Error Handling
### Connection Issues
```bash
Exception: Connection refused
```
**Solution**: Check API URL and ensure TrustGraph is running.
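A quick reachability probe before a long export can save time. This is a sketch; the URL shown is this command's documented default.

```bash
url="${TRUSTGRAPH_URL:-http://localhost:8088/}"
# Probe the API endpoint with a short timeout
if curl -s --max-time 5 "$url" > /dev/null; then
    echo "API reachable at $url"
else
    echo "API not reachable at $url"
fi
```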
### Flow Not Found
```bash
Exception: Flow instance not found
```
**Solution**: Verify flow ID with `tg-show-flows`.
### Permission Errors
```bash
Exception: Access denied
```
**Solution**: Check user permissions for the specified collection.
### Empty Output
```bash
# No triples exported
```
**Solution**: Verify collection contains data and user has access.
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-load-turtle`](tg-load-turtle.md) - Import Turtle files
- [`tg-triples-query`](tg-triples-query.md) - Query graph triples
- [`tg-show-flows`](tg-show-flows.md) - List available flows
- [`tg-get-kg-core`](tg-get-kg-core.md) - Export knowledge cores
## API Integration
This command uses the [Triples Query API](../apis/api-triples-query.md) to retrieve graph data and convert it to Turtle format.
## Best Practices
1. **Regular Backups**: Schedule regular exports for data protection
2. **Validation**: Always validate exported Turtle files
3. **Compression**: Compress large exports for storage efficiency
4. **Monitoring**: Track export sizes and success rates
5. **Documentation**: Document export procedures and retention policies
6. **Security**: Ensure sensitive data is properly protected in exports
7. **Version Control**: Consider versioning exported schemas
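The version-control practice can be a small commit step after each export. This sketch uses an illustrative repository path, and the `echo` line stands in for real `tg-graph-to-turtle` output.

```bash
# One-time setup of a local repository for exported schemas (demo path)
repo=schema-exports
mkdir -p "$repo"
git -C "$repo" init -q

# After each export, commit the new snapshot
echo '@prefix ns1: <http://example.org/> .' > "$repo/graph.ttl"
git -C "$repo" add graph.ttl
git -C "$repo" -c user.email=ops@example.com -c user.name=ops \
    commit -qm "Graph export $(date +%F)"
git -C "$repo" log --oneline -1
```

Committing each snapshot gives you diffs between exports for free, which is often more useful than the raw backups themselves.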
## Troubleshooting
### Large Dataset Issues
```bash
# Check query limits
grep -c "\." output.ttl # Count exported triples
# Default limit is 10,000 triples
# For larger datasets, consider using tg-get-kg-core
tg-get-kg-core -n "collection-name" > large-export.msgpack
```
### Malformed URIs
```bash
# Check for URIs that contain spaces (a bare space check would match
# nearly every Turtle line)
grep -E '<[^>]* [^>]*>' output.ttl | head -5
# Percent-encode spaces inside URI brackets only (GNU sed)
sed -E ':a; s/(<[^> ]*) ([^>]*>)/\1%20\2/; ta' output.ttl > cleaned-output.ttl
```
### Memory Issues
```bash
# Monitor memory usage during export
free -h
# Consider splitting exports for large datasets
```

# tg-init-pulsar-manager
Initializes Pulsar Manager with default superuser credentials for TrustGraph.
## Synopsis
```bash
tg-init-pulsar-manager
```
## Description
The `tg-init-pulsar-manager` command is a setup utility that creates a default superuser account in Pulsar Manager. This is typically run once during initial TrustGraph deployment to establish administrative access to the Pulsar message queue management interface.
The command configures a superuser with predefined credentials that can be used to access the Pulsar Manager web interface for monitoring and managing Pulsar topics, namespaces, and tenants.
## Default Configuration
The command creates a superuser with these default credentials:
- **Username**: `admin`
- **Password**: `apachepulsar`
- **Description**: `test`
- **Email**: `username@test.org`
## Prerequisites
### Pulsar Manager Service
Pulsar Manager must be running and accessible at `http://localhost:7750` before running this command.
### Network Connectivity
The command requires network access to the Pulsar Manager API endpoint.
## Examples
### Basic Initialization
```bash
tg-init-pulsar-manager
```
### Verify Initialization
```bash
# Run the initialization
tg-init-pulsar-manager
# Check if Pulsar Manager is accessible
curl -s http://localhost:7750/pulsar-manager/ | grep -q "Pulsar Manager"
echo "Pulsar Manager status: $?"
```
### Integration with Setup Scripts
```bash
#!/bin/bash
# setup-trustgraph.sh
echo "Setting up TrustGraph infrastructure..."
# Wait for Pulsar Manager to be ready
echo "Waiting for Pulsar Manager..."
while ! curl -s http://localhost:7750/pulsar-manager/ > /dev/null; do
echo " Waiting for Pulsar Manager to start..."
sleep 5
done
# Initialize Pulsar Manager
echo "Initializing Pulsar Manager..."
tg-init-pulsar-manager
if [ $? -eq 0 ]; then
echo "✓ Pulsar Manager initialized successfully"
echo " You can access it at: http://localhost:7750/pulsar-manager/"
echo " Username: admin"
echo " Password: apachepulsar"
else
echo "✗ Failed to initialize Pulsar Manager"
exit 1
fi
```
## What It Does
The command performs the following operations:
1. **Retrieves CSRF Token**: Gets a CSRF token from Pulsar Manager for secure API access
2. **Creates Superuser**: Makes an authenticated API call to create the superuser account
3. **Sets Permissions**: Configures the user with administrative privileges
### HTTP Operations
```bash
# Equivalent manual operations:
CSRF_TOKEN=$(curl -s http://localhost:7750/pulsar-manager/csrf-token)
curl \
-H "X-XSRF-TOKEN: $CSRF_TOKEN" \
-H "Cookie: XSRF-TOKEN=$CSRF_TOKEN;" \
-H 'Content-Type: application/json' \
-X PUT \
http://localhost:7750/pulsar-manager/users/superuser \
-d '{"name": "admin", "password": "apachepulsar", "description": "test", "email": "username@test.org"}'
```
## Use Cases
### Initial Deployment
```bash
# Part of TrustGraph deployment sequence
deploy_trustgraph() {
echo "Deploying TrustGraph..."
# Start services
docker-compose up -d pulsar pulsar-manager
# Wait for services
wait_for_service "http://localhost:7750/pulsar-manager/" "Pulsar Manager"
wait_for_service "http://localhost:8080/admin/v2/clusters" "Pulsar"
# Initialize Pulsar Manager
echo "Initializing Pulsar Manager..."
tg-init-pulsar-manager
# Initialize TrustGraph
echo "Initializing TrustGraph..."
tg-init-trustgraph
echo "Deployment complete!"
}
```
### Development Environment Setup
```bash
# Development setup script
setup_dev_environment() {
echo "Setting up development environment..."
# Start local services
docker-compose -f docker-compose.dev.yml up -d
# Wait for readiness
echo "Waiting for services to start..."
sleep 30
# Initialize components
tg-init-pulsar-manager
tg-init-trustgraph
echo "Development environment ready!"
echo "Pulsar Manager: http://localhost:7750/pulsar-manager/"
echo "Credentials: admin / apachepulsar"
}
```
### CI/CD Integration
```bash
# Integration testing setup
setup_test_environment() {
local timeout=300 # 5 minutes
local elapsed=0
echo "Setting up test environment..."
# Start services
docker-compose up -d --wait
# Wait for Pulsar Manager
while ! curl -s http://localhost:7750/pulsar-manager/ > /dev/null; do
if [ $elapsed -ge $timeout ]; then
echo "Timeout waiting for Pulsar Manager"
return 1
fi
sleep 5
elapsed=$((elapsed + 5))
done
# Initialize
if tg-init-pulsar-manager; then
echo "✓ Test environment ready"
else
echo "✗ Failed to initialize test environment"
return 1
fi
}
```
## Docker Integration
### Docker Compose Setup
```yaml
# docker-compose.yml
version: '3.8'
services:
pulsar:
image: apachepulsar/pulsar:latest
ports:
- "6650:6650"
- "8080:8080"
command: bin/pulsar standalone
pulsar-manager:
image: apachepulsar/pulsar-manager:latest
ports:
- "7750:7750"
depends_on:
- pulsar
environment:
SPRING_CONFIGURATION_FILE: /pulsar-manager/pulsar-manager/application.properties
trustgraph-init:
image: trustgraph/cli:latest
depends_on:
- pulsar-manager
command: >
sh -c "
sleep 30 &&
tg-init-pulsar-manager &&
tg-init-trustgraph
"
```
### Kubernetes Setup
```yaml
# k8s-init-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
name: trustgraph-init
spec:
template:
spec:
containers:
- name: init
image: trustgraph/cli:latest
command:
- sh
- -c
- |
echo "Waiting for Pulsar Manager..."
while ! curl -s http://pulsar-manager:7750/pulsar-manager/; do
sleep 5
done
echo "Initializing Pulsar Manager..."
tg-init-pulsar-manager
echo "Initializing TrustGraph..."
tg-init-trustgraph
env:
- name: PULSAR_MANAGER_URL
value: "http://pulsar-manager:7750"
restartPolicy: Never
```
## Error Handling
### Connection Refused
```bash
curl: (7) Failed to connect to localhost port 7750: Connection refused
```
**Solution**: Ensure Pulsar Manager is running and accessible on port 7750.
### CSRF Token Issues
```bash
curl: (22) The requested URL returned error: 403 Forbidden
```
**Solution**: The CSRF token mechanism may have changed. Check Pulsar Manager API documentation.
### User Already Exists
```bash
HTTP 409 Conflict - User already exists
```
**Solution**: This is expected on subsequent runs. The superuser is already created.
### Network Issues
```bash
curl: (28) Operation timed out
```
**Solution**: Check network connectivity and firewall settings.
## Security Considerations
### Default Credentials
The command uses default credentials that should be changed in production:
```bash
# After initialization, change the password via Pulsar Manager UI
# Or use the API to update credentials
change_admin_password() {
local new_password="$1"
# Login to get session
session=$(curl -s -c cookies.txt \
-d "username=admin&password=apachepulsar" \
http://localhost:7750/pulsar-manager/login)
# Update password
curl -s -b cookies.txt \
-H "Content-Type: application/json" \
-X PUT \
-d "{\"password\": \"$new_password\"}" \
http://localhost:7750/pulsar-manager/users/admin
rm cookies.txt
}
```
### Access Control
```bash
# Restrict access to Pulsar Manager in production
configure_security() {
echo "Configuring Pulsar Manager security..."
# Change default password
change_admin_password "$(openssl rand -base64 32)"
# Configure firewall rules (example)
# iptables -A INPUT -p tcp --dport 7750 -s 10.0.0.0/8 -j ACCEPT
# iptables -A INPUT -p tcp --dport 7750 -j DROP
echo "Security configuration complete"
}
```
## Advanced Usage
### Custom Configuration
```bash
# Create custom initialization script
create_custom_init() {
cat > custom-pulsar-manager-init.sh << 'EOF'
#!/bin/bash
PULSAR_MANAGER_URL=${PULSAR_MANAGER_URL:-http://localhost:7750}
ADMIN_USER=${ADMIN_USER:-admin}
ADMIN_PASS=${ADMIN_PASS:-$(openssl rand -base64 16)}
ADMIN_EMAIL=${ADMIN_EMAIL:-admin@example.com}
echo "Initializing Pulsar Manager at: $PULSAR_MANAGER_URL"
# Get CSRF token
CSRF_TOKEN=$(curl -s "$PULSAR_MANAGER_URL/pulsar-manager/csrf-token")
if [ -z "$CSRF_TOKEN" ]; then
echo "Failed to get CSRF token"
exit 1
fi
# Create superuser
response=$(curl -s -w "%{http_code}" \
-H "X-XSRF-TOKEN: $CSRF_TOKEN" \
-H "Cookie: XSRF-TOKEN=$CSRF_TOKEN;" \
-H 'Content-Type: application/json' \
-X PUT \
"$PULSAR_MANAGER_URL/pulsar-manager/users/superuser" \
-d "{\"name\": \"$ADMIN_USER\", \"password\": \"$ADMIN_PASS\", \"description\": \"Admin user\", \"email\": \"$ADMIN_EMAIL\"}")
http_code="${response: -3}"
if [ "$http_code" = "200" ] || [ "$http_code" = "409" ]; then
echo "Pulsar Manager initialized successfully"
echo "Username: $ADMIN_USER"
echo "Password: $ADMIN_PASS"
else
echo "Failed to initialize Pulsar Manager (HTTP $http_code)"
exit 1
fi
EOF
chmod +x custom-pulsar-manager-init.sh
}
```
### Health Checks
```bash
# Health check script
check_pulsar_manager() {
local max_attempts=30
local attempt=1
echo "Checking Pulsar Manager health..."
while [ $attempt -le $max_attempts ]; do
if curl -s http://localhost:7750/pulsar-manager/ > /dev/null; then
echo "✓ Pulsar Manager is healthy"
return 0
fi
echo "Attempt $attempt/$max_attempts - Pulsar Manager not ready"
sleep 5
attempt=$((attempt + 1))
done
echo "✗ Pulsar Manager health check failed"
return 1
}
# Use in deployment scripts
if check_pulsar_manager; then
tg-init-pulsar-manager
else
echo "Cannot initialize Pulsar Manager - service not healthy"
exit 1
fi
```
## Related Commands
- [`tg-init-trustgraph`](tg-init-trustgraph.md) - Initialize TrustGraph with Pulsar configuration
- [`tg-show-config`](tg-show-config.md) - Display current TrustGraph configuration
## Integration Points
### Pulsar Manager UI
After initialization, access the web interface at:
- **URL**: `http://localhost:7750/pulsar-manager/`
- **Username**: `admin`
- **Password**: `apachepulsar`
### TrustGraph Integration
This command is typically run before `tg-init-trustgraph` as part of the complete TrustGraph setup process.
## Best Practices
1. **Run Once**: Only run during initial setup - subsequent runs are harmless but unnecessary
2. **Change Defaults**: Change default credentials in production environments
3. **Network Security**: Restrict access to Pulsar Manager in production
4. **Health Checks**: Always verify Pulsar Manager is running before initialization
5. **Automation**: Include in deployment automation scripts
6. **Documentation**: Document custom credentials for operations teams
## Troubleshooting
### Service Not Ready
```bash
# Check if Pulsar Manager is running
docker ps | grep pulsar-manager
netstat -tlnp | grep 7750
```
### Port Conflicts
```bash
# Check if port 7750 is in use
lsof -i :7750
```
### Docker Issues
```bash
# Check Pulsar Manager logs
docker logs pulsar-manager
# Restart if needed
docker restart pulsar-manager
```

View file

@ -0,0 +1,523 @@
# tg-init-trustgraph
Initializes Pulsar with TrustGraph tenant, namespaces, and configuration settings.
## Synopsis
```bash
tg-init-trustgraph [options]
```
## Description
The `tg-init-trustgraph` command initializes the Apache Pulsar messaging system with the required tenant, namespaces, policies, and configuration needed for TrustGraph operation. This is a foundational setup command that must be run before TrustGraph can operate properly.
The command creates the necessary Pulsar infrastructure and optionally loads initial configuration data into the system.
## Options
### Optional Arguments
- `-p, --pulsar-admin-url URL`: Pulsar admin URL (default: `http://pulsar:8080`)
- `--pulsar-host HOST`: Pulsar host for client connections (default: `pulsar://pulsar:6650`)
- `--pulsar-api-key KEY`: Pulsar API key for authentication
- `-c, --config CONFIG`: Initial configuration JSON to load
- `-t, --tenant TENANT`: Tenant name (default: `tg`)
## Examples
### Basic Initialization
```bash
tg-init-trustgraph
```
### Custom Pulsar Configuration
```bash
tg-init-trustgraph \
--pulsar-admin-url http://localhost:8080 \
--pulsar-host pulsar://localhost:6650
```
### With Initial Configuration
```bash
tg-init-trustgraph \
--config '{"prompt": {"system": "You are a helpful AI assistant"}}'
```
### Custom Tenant
```bash
tg-init-trustgraph --tenant production-tg
```
### Production Setup
```bash
tg-init-trustgraph \
--pulsar-admin-url http://pulsar-cluster:8080 \
--pulsar-host pulsar://pulsar-cluster:6650 \
--pulsar-api-key "your-api-key" \
--tenant production \
--config "$(cat production-config.json)"
```
## What It Creates
### Tenant Structure
The command creates a TrustGraph tenant with the following namespaces:
#### Flow Namespace (`tg/flow`)
- **Purpose**: Processing workflows and flow definitions
- **Retention**: Default retention policies
#### Request Namespace (`tg/request`)
- **Purpose**: Incoming API requests and commands
- **Retention**: Default retention policies
#### Response Namespace (`tg/response`)
- **Purpose**: API responses and results
- **Retention**: 3 minutes, unlimited size
- **Subscription Expiration**: 30 minutes
#### Config Namespace (`tg/config`)
- **Purpose**: System configuration and settings
- **Retention**: 10MB size limit, unlimited time
- **Subscription Expiration**: 5 minutes
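After initialization, the namespaces and their policies can be checked through the standard Pulsar v2 admin REST API. This is a sketch: the admin URL and the default tenant `tg` match the options above, and the fallback messages are only for when Pulsar is unreachable.

```bash
admin_url="${PULSAR_ADMIN_URL:-http://pulsar:8080}"

# List namespaces under the TrustGraph tenant
namespaces=$(curl -s --max-time 5 "$admin_url/admin/v2/namespaces/tg" \
    || echo '["unreachable"]')
echo "namespaces: $namespaces"

# Inspect the retention policy on the config namespace
curl -s --max-time 5 "$admin_url/admin/v2/namespaces/tg/config/retention" \
    || echo "retention query failed (is Pulsar running?)"
```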
### Configuration Loading
If a configuration is provided, the command also:
1. Connects to the configuration service
2. Loads the provided configuration data
3. Ensures configuration versioning is maintained
## Configuration Format
The configuration should be provided as JSON with this structure:
```json
{
"prompt": {
"system": "System prompt text",
"template-index": ["template1", "template2"],
"template.template1": {
"id": "template1",
"prompt": "Template text with {{variables}}",
"response-type": "text"
}
},
"token-costs": {
"gpt-4": {
"input_price": 0.00003,
"output_price": 0.00006
}
},
"agent": {
"tool-index": ["tool1"],
"tool.tool1": {
"id": "tool1",
"name": "Example Tool",
"description": "Tool description",
"arguments": []
}
}
}
```
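Since each entry in `template-index` must have a matching `template.<id>` object, a `jq` check can validate a configuration before loading it. A sketch, assuming `jq` is available (the document already uses it elsewhere); the sample config follows the structure above.

```bash
config='{
  "prompt": {
    "template-index": ["template1"],
    "template.template1": { "id": "template1", "prompt": "Hi {{name}}" }
  }
}'

# Verify every indexed template has a matching definition with the same id
if echo "$config" | jq -e '
    [.prompt["template-index"][] as $t
     | .prompt["template." + $t].id == $t] | all' > /dev/null; then
    echo "config OK"
else
    echo "config invalid"
fi
```

The same check can be run against a file with `jq -e '...' config.json` before passing it to `tg-init-trustgraph --config`.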
## Use Cases
### Initial Deployment
```bash
# Complete TrustGraph initialization sequence
initialize_trustgraph() {
echo "Initializing TrustGraph infrastructure..."
# Wait for Pulsar to be ready
wait_for_pulsar
# Initialize Pulsar Manager (if using)
tg-init-pulsar-manager
# Initialize TrustGraph
tg-init-trustgraph \
--config "$(cat initial-config.json)"
echo "TrustGraph initialization complete!"
}
wait_for_pulsar() {
local timeout=300
local elapsed=0
while ! curl -s http://pulsar:8080/admin/v2/clusters > /dev/null; do
if [ $elapsed -ge $timeout ]; then
echo "Timeout waiting for Pulsar"
exit 1
fi
echo "Waiting for Pulsar..."
sleep 5
elapsed=$((elapsed + 5))
done
}
```
### Environment-Specific Setup
```bash
# Development environment
setup_dev() {
tg-init-trustgraph \
--pulsar-admin-url http://localhost:8080 \
--pulsar-host pulsar://localhost:6650 \
--tenant dev \
--config "$(cat dev-config.json)"
}
# Staging environment
setup_staging() {
tg-init-trustgraph \
--pulsar-admin-url http://staging-pulsar:8080 \
--pulsar-host pulsar://staging-pulsar:6650 \
--tenant staging \
--config "$(cat staging-config.json)"
}
# Production environment
setup_production() {
tg-init-trustgraph \
--pulsar-admin-url http://prod-pulsar:8080 \
--pulsar-host pulsar://prod-pulsar:6650 \
--pulsar-api-key "$PULSAR_API_KEY" \
--tenant production \
--config "$(cat production-config.json)"
}
```
### Configuration Management
```bash
# Load different configurations
load_ai_config() {
local config='{
"prompt": {
"system": "You are an AI assistant specialized in data analysis.",
"template-index": ["analyze", "summarize"],
"template.analyze": {
"id": "analyze",
"prompt": "Analyze this data: {{data}}",
"response-type": "json"
}
},
"token-costs": {
"gpt-4": {"input_price": 0.00003, "output_price": 0.00006},
"claude-3-sonnet": {"input_price": 0.000003, "output_price": 0.000015}
}
}'
tg-init-trustgraph --config "$config"
}
load_research_config() {
local config='{
"prompt": {
"system": "You are a research assistant focused on academic literature.",
"template-index": ["research", "citation"],
"template.research": {
"id": "research",
"prompt": "Research question: {{question}}\nContext: {{context}}",
"response-type": "text"
}
}
}'
tg-init-trustgraph --config "$config"
}
```
## Advanced Usage
### Cluster Setup
```bash
# Multi-cluster initialization
setup_cluster() {
local clusters=("cluster1:8080" "cluster2:8080" "cluster3:8080")
for cluster in "${clusters[@]}"; do
echo "Initializing cluster: $cluster"
tg-init-trustgraph \
--pulsar-admin-url "http://$cluster" \
--pulsar-host "pulsar://${cluster%:*}:6650" \
--tenant "cluster-$(echo $cluster | cut -d: -f1)" \
--config "$(cat cluster-config.json)"
done
}
```
### Configuration Migration
```bash
# Migrate configuration between environments
migrate_config() {
local source_env="$1"
local target_env="$2"
echo "Migrating configuration from $source_env to $target_env"
# Export existing configuration (would need a tg-export-config command)
# For now, assume we have the config in a file
tg-init-trustgraph \
--pulsar-admin-url "http://$target_env:8080" \
--pulsar-host "pulsar://$target_env:6650" \
--config "$(cat ${source_env}-config.json)"
}
```
### Validation and Testing
```bash
# Validate initialization
validate_initialization() {
local tenant="${1:-tg}"
local admin_url="${2:-http://pulsar:8080}"
echo "Validating TrustGraph initialization..."
# Check tenant exists
if curl -s "$admin_url/admin/v2/tenants/$tenant" > /dev/null; then
echo "✓ Tenant '$tenant' exists"
else
echo "✗ Tenant '$tenant' missing"
return 1
fi
# Check namespaces
local namespaces=("flow" "request" "response" "config")
for ns in "${namespaces[@]}"; do
if curl -s "$admin_url/admin/v2/namespaces/$tenant/$ns" > /dev/null; then
echo "✓ Namespace '$tenant/$ns' exists"
else
echo "✗ Namespace '$tenant/$ns' missing"
return 1
fi
done
echo "✓ TrustGraph initialization validated"
}
# Test configuration loading
test_config_loading() {
local test_config='{
"test": {
"value": "test-value",
"timestamp": "'$(date -Iseconds)'"
}
}'
echo "Testing configuration loading..."
if tg-init-trustgraph --config "$test_config"; then
echo "✓ Configuration loading successful"
else
echo "✗ Configuration loading failed"
return 1
fi
}
```
### Retry Logic and Error Handling
```bash
# Robust initialization with retry
robust_init() {
local max_attempts=5
local attempt=1
local delay=10
while [ $attempt -le $max_attempts ]; do
echo "Initialization attempt $attempt of $max_attempts..."
if tg-init-trustgraph "$@"; then
echo "✓ Initialization successful on attempt $attempt"
return 0
else
echo "✗ Attempt $attempt failed"
if [ $attempt -lt $max_attempts ]; then
echo "Waiting ${delay}s before retry..."
sleep $delay
delay=$((delay * 2)) # Exponential backoff
fi
fi
attempt=$((attempt + 1))
done
echo "✗ All initialization attempts failed"
return 1
}
```
## Docker Integration
### Docker Compose
```yaml
version: '3.8'
services:
pulsar:
image: apachepulsar/pulsar:latest
ports:
- "6650:6650"
- "8080:8080"
command: bin/pulsar standalone
trustgraph-init:
image: trustgraph/cli:latest
depends_on:
- pulsar
volumes:
- ./config.json:/config.json:ro
command: >
sh -c "
sleep 30 &&
tg-init-trustgraph --config \"$$(cat /config.json)\"
"
environment:
- TRUSTGRAPH_PULSAR_ADMIN_URL=http://pulsar:8080
- TRUSTGRAPH_PULSAR_HOST=pulsar://pulsar:6650
```
### Kubernetes Init Container
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: trustgraph-config
data:
config.json: |
{
"prompt": {
"system": "You are a helpful AI assistant."
}
}
---
apiVersion: batch/v1
kind: Job
metadata:
name: trustgraph-init
spec:
template:
spec:
initContainers:
- name: wait-for-pulsar
image: busybox
command:
- sh
- -c
- |
until nc -z pulsar 8080; do
echo "Waiting for Pulsar..."
sleep 5
done
containers:
- name: init
image: trustgraph/cli:latest
# A shell is required for the $(cat ...) substitution to run; a bare
# command array is executed without one
command:
- sh
- -c
- >-
  tg-init-trustgraph
  --pulsar-admin-url=http://pulsar:8080
  --pulsar-host=pulsar://pulsar:6650
  --config="$(cat /config/config.json)"
volumeMounts:
- name: config
mountPath: /config
volumes:
- name: config
configMap:
name: trustgraph-config
restartPolicy: Never
```
## Error Handling
### Connection Issues
```bash
Exception: Connection refused
```
**Solution**: Verify Pulsar is running and accessible at the specified admin URL.
### Authentication Errors
```bash
Exception: 401 Unauthorized
```
**Solution**: Check Pulsar API key if authentication is enabled.
### Tenant Creation Failures
```bash
Exception: Tenant creation failed
```
**Solution**: Verify admin permissions and cluster configuration.
### Configuration Loading Errors
```bash
Exception: Invalid JSON configuration
```
**Solution**: Validate JSON syntax and structure.
## Security Considerations
### API Key Management
```bash
# Use environment variables for sensitive data
export PULSAR_API_KEY="your-secure-api-key"
tg-init-trustgraph --pulsar-api-key "$PULSAR_API_KEY"
# Or use a secure file
tg-init-trustgraph --pulsar-api-key "$(cat /secure/pulsar-key.txt)"
```
### Network Security
```bash
# Use TLS for production
tg-init-trustgraph \
--pulsar-admin-url https://secure-pulsar:8443 \
--pulsar-host pulsar+ssl://secure-pulsar:6651
```
## Related Commands
- [`tg-init-pulsar-manager`](tg-init-pulsar-manager.md) - Initialize Pulsar Manager
- [`tg-show-config`](tg-show-config.md) - Display current configuration
- [`tg-set-prompt`](tg-set-prompt.md) - Configure individual prompts
## Best Practices
1. **Run Once**: Typically run once per environment during initial setup
2. **Idempotent**: Safe to run multiple times - existing resources are preserved
3. **Configuration**: Always load initial configuration during setup
4. **Validation**: Verify initialization success with validation scripts
5. **Environment Variables**: Use environment variables for sensitive configuration
6. **Retry Logic**: Implement retry logic for robust deployments
7. **Monitoring**: Monitor namespace and topic creation for issues
## Troubleshooting
### Pulsar Not Ready
```bash
# Check Pulsar health
curl http://pulsar:8080/admin/v2/clusters
# Check Pulsar logs
docker logs pulsar
```
### Permission Issues
```bash
# Verify Pulsar admin access
curl http://pulsar:8080/admin/v2/tenants
# Check API key validity if using authentication
```
### Configuration Validation
```bash
# Validate JSON configuration
echo "$CONFIG" | jq .
# Test configuration loading separately
tg-init-trustgraph --config '{"test": "value"}'
```

# tg-invoke-agent
Uses the agent service to answer a question via an interactive WebSocket connection.
## Synopsis
```bash
tg-invoke-agent -q "your question" [options]
```
## Description
The `tg-invoke-agent` command provides an interactive interface to TrustGraph's agent service. It connects via WebSocket to submit questions and receive real-time responses, including the agent's thinking process and observations when verbose mode is enabled.
The agent uses available tools and knowledge sources to answer questions, providing a conversational AI interface to your TrustGraph knowledge base.
## Options
### Required Arguments
- `-q, --question QUESTION`: The question to ask the agent
### Optional Arguments
- `-u, --url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `ws://localhost:8088/`)
- `-f, --flow-id FLOW`: Flow ID to use (default: `default`)
- `-U, --user USER`: User identifier (default: `trustgraph`)
- `-C, --collection COLLECTION`: Collection identifier (default: `default`)
- `-l, --plan PLAN`: Agent plan specification (optional)
- `-s, --state STATE`: Agent initial state (optional)
- `-v, --verbose`: Output agent's thinking process and observations
## Examples
### Basic Question
```bash
tg-invoke-agent -q "What is machine learning?"
```
### Verbose Output with Thinking Process
```bash
tg-invoke-agent -q "Explain the benefits of neural networks" -v
```
### Using Specific Flow
```bash
tg-invoke-agent -q "What documents are available?" -f research-flow
```
### With Custom User and Collection
```bash
tg-invoke-agent -q "Show me recent papers" -U alice -C research-papers
```
### Using Custom API URL
```bash
tg-invoke-agent -q "What is AI?" -u ws://production:8088/
```
## Output Format
### Standard Output
The agent provides direct answers to your questions:
```
AI stands for Artificial Intelligence, which refers to computer systems that can perform tasks typically requiring human intelligence.
```
### Verbose Output
With `-v` flag, you see the agent's thinking process:
```
❓ What is machine learning?
🤔 I need to provide a comprehensive explanation of machine learning, including its definition, key concepts, and applications.
💡 Let me search for information about machine learning in the knowledge base.
Machine learning is a subset of artificial intelligence that enables computers to learn and improve automatically from experience without being explicitly programmed...
```
The emoji indicators represent:
- ❓ Your question
- 🤔 Agent's thinking/reasoning
- 💡 Agent's observations from tools/searches
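When scripting around verbose mode, these markers can be used to separate the reasoning trace from the final answer. A minimal sketch, simulating the agent's output with `printf` (the exact output stream may differ):

```bash
# Stand-in for: output=$(tg-invoke-agent -q "What is machine learning?" -v)
output=$(printf '%s\n' \
  '❓ What is machine learning?' \
  '🤔 I need to explain machine learning.' \
  '💡 Searching the knowledge base.' \
  'Machine learning is a subset of artificial intelligence.')

# Drop lines that begin with a thinking/observation marker
answer=$(printf '%s\n' "$output" | grep -vE '^(❓|🤔|💡)')
echo "$answer"
```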
## Error Handling
Common errors and solutions:
### Connection Errors
```bash
Exception: Connection refused
```
**Solution**: Verify the API URL and ensure TrustGraph is running.
### Flow Not Found
```bash
Exception: Invalid flow
```
**Solution**: Check that the specified flow exists and is running using `tg-show-flows`.
### Authentication Errors
```bash
Exception: Unauthorized
```
**Solution**: Verify your authentication credentials and permissions.
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL (converted to WebSocket URL automatically)
## Related Commands
- [`tg-invoke-graph-rag`](tg-invoke-graph-rag.md) - Graph-based retrieval augmented generation
- [`tg-invoke-document-rag`](tg-invoke-document-rag.md) - Document-based retrieval augmented generation
- [`tg-invoke-llm`](tg-invoke-llm.md) - Direct LLM text completion
- [`tg-show-tools`](tg-show-tools.md) - List available agent tools
- [`tg-show-flows`](tg-show-flows.md) - List available flows
## Technical Details
### WebSocket Communication
The command uses WebSocket protocol for real-time communication with the agent service. The URL is automatically converted from HTTP to WebSocket format.
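The scheme swap can be sketched in shell — an illustration of the conversion, not the tool's actual implementation:

```bash
# http:// becomes ws://, https:// becomes wss://
api_url="${TRUSTGRAPH_URL:-http://localhost:8088/}"
ws_url=$(printf '%s' "$api_url" | sed -e 's|^https://|wss://|' -e 's|^http://|ws://|')
echo "$ws_url"   # ws://localhost:8088/ for the default URL
```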
### Message Format
Messages are exchanged in JSON format:
**Request:**
```json
{
"id": "unique-message-id",
"service": "agent",
"flow": "flow-id",
"request": {
"question": "your question"
}
}
```
**Response:**
```json
{
"id": "unique-message-id",
"response": {
"thought": "agent thinking",
"observation": "agent observation",
"answer": "final answer"
},
"complete": true
}
```
### API Integration
This command uses the [Agent API](../apis/api-agent.md) via WebSocket connection for real-time interaction.
## Use Cases
- **Interactive Q&A**: Ask questions about your knowledge base
- **Research Assistance**: Get help analyzing documents and data
- **Knowledge Discovery**: Explore connections in your data
- **Troubleshooting**: Get help with technical issues using verbose mode
- **Educational**: Learn about topics in your knowledge base

# tg-invoke-document-rag
Invokes the DocumentRAG service to answer questions using document context and retrieval-augmented generation.
## Synopsis
```bash
tg-invoke-document-rag -q QUESTION [options]
```
## Description
The `tg-invoke-document-rag` command uses TrustGraph's DocumentRAG service to answer questions by retrieving relevant document context and generating responses using large language models. This implements a Retrieval-Augmented Generation (RAG) approach that grounds AI responses in your document corpus.
The service searches through indexed documents to find relevant context, then uses that context to generate accurate, source-backed answers to questions.
## Options
### Required Arguments
- `-q, --question QUESTION`: The question to answer
### Optional Arguments
- `-u, --url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
- `-f, --flow-id ID`: Flow instance ID to use (default: `default`)
- `-U, --user USER`: User ID for context isolation (default: `trustgraph`)
- `-C, --collection COLLECTION`: Document collection to search (default: `default`)
- `-d, --doc-limit LIMIT`: Maximum number of documents to retrieve (default: `10`)
## Examples
### Basic Question Answering
```bash
tg-invoke-document-rag -q "What is the company's return policy?"
```
### Question with Custom Parameters
```bash
tg-invoke-document-rag \
-q "How do I configure SSL certificates?" \
-f "production-docs" \
-U "admin" \
-C "technical-docs" \
-d 5
```
### Complex Technical Questions
```bash
tg-invoke-document-rag \
-q "What are the performance benchmarks for the new API endpoints?" \
-f "api-docs" \
-C "performance-reports"
```
### Multi-domain Questions
```bash
# Legal documents
tg-invoke-document-rag -q "What are the privacy policy requirements?" -C "legal-docs"
# Technical documentation
tg-invoke-document-rag -q "How do I troubleshoot connection timeouts?" -C "tech-docs"
# Marketing materials
tg-invoke-document-rag -q "What are our key product differentiators?" -C "marketing"
```
## Output Format
The command returns a structured response with:
```json
{
"question": "What is the company's return policy?",
"answer": "Based on the company policy documents, customers can return items within 30 days of purchase for a full refund. Items must be in original condition with receipt. Digital products are non-refundable except in cases of technical defects.",
"sources": [
{
"document": "customer-service-policy.pdf",
"relevance": 0.92,
"section": "Returns and Refunds"
},
{
"document": "terms-of-service.pdf",
"relevance": 0.85,
"section": "Customer Rights"
}
],
"confidence": 0.89
}
```
## Use Cases
### Customer Support
```bash
# Answer common customer questions
tg-invoke-document-rag -q "How do I reset my password?" -C "support-docs"
# Product information queries
tg-invoke-document-rag -q "What are the system requirements?" -C "product-specs"
# Troubleshooting assistance
tg-invoke-document-rag -q "Why is my upload failing?" -C "troubleshooting"
```
### Technical Documentation
```bash
# API documentation queries
tg-invoke-document-rag -q "How do I authenticate with the REST API?" -C "api-docs"
# Configuration questions
tg-invoke-document-rag -q "What are the required environment variables?" -C "config-docs"
# Architecture information
tg-invoke-document-rag -q "How does the caching system work?" -C "architecture"
```
### Research and Analysis
```bash
# Research queries
tg-invoke-document-rag -q "What are the latest industry trends?" -C "research-reports"
# Compliance questions
tg-invoke-document-rag -q "What are the GDPR requirements?" -C "compliance-docs"
# Best practices
tg-invoke-document-rag -q "What are the security best practices?" -C "security-guidelines"
```
### Interactive Q&A Sessions
```bash
# Batch questions for analysis
questions=(
"What is our market share?"
"How do we compare to competitors?"
"What are the growth projections?"
)
for question in "${questions[@]}"; do
echo "Question: $question"
tg-invoke-document-rag -q "$question" -C "business-reports"
echo "---"
done
```
## Document Context and Retrieval
### Document Limit Tuning
```bash
# Few documents for focused answers
tg-invoke-document-rag -q "What is the API rate limit?" -d 3
# Many documents for comprehensive analysis
tg-invoke-document-rag -q "What are all the security measures?" -d 20
```
### Collection-Specific Queries
```bash
# Target specific document collections
tg-invoke-document-rag -q "What is the deployment process?" -C "devops-docs"
tg-invoke-document-rag -q "What are the testing standards?" -C "qa-docs"
tg-invoke-document-rag -q "What is the coding style guide?" -C "dev-standards"
```
### User Context Isolation
```bash
# Department-specific contexts
tg-invoke-document-rag -q "What is the budget allocation?" -U "finance" -C "finance-docs"
tg-invoke-document-rag -q "What are the hiring requirements?" -U "hr" -C "hr-docs"
```
## Error Handling
### Question Required
```bash
Exception: Question is required
```
**Solution**: Provide a question with the `-q` option.
### Flow Not Found
```bash
Exception: Flow instance 'nonexistent-flow' not found
```
**Solution**: Verify the flow ID exists with `tg-show-flows`.
### Collection Not Found
```bash
Exception: Collection 'invalid-collection' not found
```
**Solution**: Check available collections with document library commands.
### No Documents Found
```bash
Exception: No relevant documents found for query
```
**Solution**: Verify documents are indexed and collection contains relevant content.
### API Connection Issues
```bash
Exception: Connection refused
```
**Solution**: Check API URL and ensure TrustGraph services are running.
## Advanced Usage
### Batch Processing
```bash
# Process questions from file
while IFS= read -r question; do
if [ -n "$question" ]; then
echo "Processing: $question"
tg-invoke-document-rag -q "$question" -C "knowledge-base" > "answer-$(date +%s).json"
fi
done < questions.txt
```
### Question Analysis Pipeline
```bash
#!/bin/bash
# analyze-questions.sh
questions_file="$1"
collection="$2"
if [ -z "$questions_file" ] || [ -z "$collection" ]; then
echo "Usage: $0 <questions-file> <collection>"
exit 1
fi
echo "Question Analysis Report - $(date)"
echo "Collection: $collection"
echo "=================================="
question_num=1
while IFS= read -r question; do
if [ -n "$question" ]; then
echo -e "\n$question_num. $question"
echo "$(printf '=%.0s' {1..50})"
# Get answer
answer=$(tg-invoke-document-rag -q "$question" -C "$collection" 2>/dev/null)
if [ $? -eq 0 ]; then
echo "$answer" | jq -r '.answer' 2>/dev/null || echo "$answer"
# Extract sources if available
sources=$(echo "$answer" | jq -r '.sources[]?.document' 2>/dev/null)
if [ -n "$sources" ]; then
echo -e "\nSources:"
echo "$sources" | sed 's/^/ - /'
fi
else
echo "ERROR: Could not get answer"
fi
question_num=$((question_num + 1))
fi
done < "$questions_file"
```
### Quality Assessment
```bash
# Assess answer quality with multiple document limits
question="What are the security protocols?"
collection="security-docs"
echo "Answer Quality Assessment"
echo "Question: $question"
echo "========================"
for limit in 3 5 10 15 20; do
echo -e "\nDocument limit: $limit"
echo "$(printf '-%.0s' {1..30})"
answer=$(tg-invoke-document-rag -q "$question" -C "$collection" -d $limit 2>/dev/null)
if [ $? -eq 0 ]; then
# Get answer length and source count
answer_length=$(echo "$answer" | jq -r '.answer' 2>/dev/null | wc -c)
source_count=$(echo "$answer" | jq -r '.sources | length' 2>/dev/null)
confidence=$(echo "$answer" | jq -r '.confidence' 2>/dev/null)
echo "Answer length: $answer_length characters"
echo "Source count: $source_count"
echo "Confidence: $confidence"
else
echo "ERROR: Failed to get answer"
fi
done
```
### Interactive Q&A Interface
```bash
#!/bin/bash
# interactive-rag.sh
collection="${1:-default}"
flow_id="${2:-default}"
echo "Interactive Document RAG Interface"
echo "Collection: $collection"
echo "Flow ID: $flow_id"
echo "Type 'quit' to exit"
echo "=================================="
while true; do
echo -n "Question: "
read -r question
if [ "$question" = "quit" ]; then
break
fi
if [ -n "$question" ]; then
echo "Thinking..."
answer=$(tg-invoke-document-rag -q "$question" -C "$collection" -f "$flow_id" 2>/dev/null)
if [ $? -eq 0 ]; then
echo "Answer:"
echo "$answer" | jq -r '.answer' 2>/dev/null || echo "$answer"
# Show sources if available
sources=$(echo "$answer" | jq -r '.sources[]?.document' 2>/dev/null)
if [ -n "$sources" ]; then
echo -e "\nSources:"
echo "$sources" | sed 's/^/ - /'
fi
else
echo "Sorry, I couldn't answer that question."
fi
echo -e "\n$(printf '=%.0s' {1..50})"
fi
done
echo "Goodbye!"
```
## Performance Optimization
### Document Limit Optimization
```bash
# Test different document limits for performance
question="What is the system architecture?"
collection="tech-docs"
for limit in 3 5 10 15 20; do
echo "Testing document limit: $limit"
start_time=$(date +%s%N)
tg-invoke-document-rag -q "$question" -C "$collection" -d $limit > /dev/null 2>&1
end_time=$(date +%s%N)
duration=$(( (end_time - start_time) / 1000000 )) # Convert to milliseconds
echo " Duration: ${duration}ms"
done
```
### Caching Strategy
```bash
# Cache frequently asked questions
cache_dir="rag-cache"
mkdir -p "$cache_dir"
ask_question() {
local question="$1"
local collection="$2"
local cache_key=$(echo "$question-$collection" | md5sum | cut -d' ' -f1)
local cache_file="$cache_dir/$cache_key.json"
if [ -f "$cache_file" ]; then
echo "Cache hit for: $question"
cat "$cache_file"
else
echo "Cache miss, querying: $question"
tg-invoke-document-rag -q "$question" -C "$collection" | tee "$cache_file"
fi
}
# Use cached queries
ask_question "What is the API documentation?" "tech-docs"
ask_question "What are the system requirements?" "spec-docs"
```
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-load-pdf`](tg-load-pdf.md) - Load PDF documents for RAG
- [`tg-show-library-documents`](tg-show-library-documents.md) - List available documents
- [`tg-invoke-prompt`](tg-invoke-prompt.md) - Direct prompt invocation without RAG
- [`tg-start-flow`](tg-start-flow.md) - Start flows for document processing
- [`tg-show-flows`](tg-show-flows.md) - List active flow instances
## API Integration
This command uses the [DocumentRAG API](../apis/api-document-rag.md) to perform retrieval-augmented generation using the document corpus.
## Best Practices
1. **Question Formulation**: Use specific, well-formed questions for better results
2. **Collection Organization**: Organize documents into logical collections
3. **Document Limits**: Balance accuracy with performance using appropriate document limits
4. **User Context**: Use user isolation for sensitive or department-specific queries
5. **Source Verification**: Always check source documents for critical information
6. **Caching**: Implement caching for frequently asked questions
7. **Quality Assessment**: Regularly evaluate answer quality and adjust parameters
## Troubleshooting
### Poor Answer Quality
```bash
# Try different document limits
tg-invoke-document-rag -q "your question" -d 5 # Fewer documents
tg-invoke-document-rag -q "your question" -d 15 # More documents
# Check document collection
tg-show-library-documents -C "your-collection"
```
### Slow Response Times
```bash
# Reduce document limit
tg-invoke-document-rag -q "your question" -d 3
# Check flow performance
tg-show-flows | grep "document-rag"
```
### Missing Context
```bash
# Verify documents are indexed
tg-show-library-documents -C "your-collection"
# Check if collection exists
tg-show-library-documents | grep "your-collection"
```

# tg-invoke-graph-rag
Uses the Graph RAG service to answer questions using knowledge graph data.
## Synopsis
```bash
tg-invoke-graph-rag -q "question" [options]
```
## Description
The `tg-invoke-graph-rag` command performs graph-based Retrieval Augmented Generation (RAG) to answer questions using structured knowledge from the knowledge graph. It retrieves relevant entities and relationships from the graph and uses them to provide contextually accurate answers.
Graph RAG is particularly effective for questions that require understanding relationships between entities, reasoning over structured knowledge, and providing answers based on factual connections in the data.
## Options
### Required Arguments
- `-q, --question QUESTION`: The question to answer using graph knowledge
### Optional Arguments
- `-u, --url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
- `-f, --flow-id FLOW`: Flow ID to use (default: `default`)
- `-U, --user USER`: User identifier (default: `trustgraph`)
- `-C, --collection COLLECTION`: Collection identifier (default: `default`)
### Graph Search Parameters
- `-e, --entity-limit LIMIT`: Maximum entities to retrieve (default: `50`)
- `-t, --triple-limit LIMIT`: Maximum triples to retrieve (default: `30`)
- `-s, --max-subgraph-size SIZE`: Maximum subgraph size (default: `150`)
- `-p, --max-path-length LENGTH`: Maximum path length for graph traversal (default: `2`)
## Examples
### Basic Graph RAG Query
```bash
tg-invoke-graph-rag -q "What is the relationship between AI and machine learning?"
```
### With Custom Parameters
```bash
tg-invoke-graph-rag \
-q "How are neural networks connected to deep learning?" \
-e 100 \
-t 50 \
-s 200
```
### Using Specific Flow and Collection
```bash
tg-invoke-graph-rag \
-q "What research papers discuss climate change?" \
-f research-flow \
-C scientific-papers \
-U researcher
```
### Large Graph Exploration
```bash
tg-invoke-graph-rag \
-q "Explain the connections between quantum computing and cryptography" \
-e 150 \
-t 100 \
-s 300 \
-p 3
```
## Graph Search Parameters Explained
### Entity Limit (`-e, --entity-limit`)
Controls how many entities relevant to the question are retrieved from the knowledge graph. Higher values provide more context but may include less relevant information.
### Triple Limit (`-t, --triple-limit`)
Limits the number of relationship triples (subject-predicate-object) retrieved. These triples define the relationships between entities.
### Max Subgraph Size (`-s, --max-subgraph-size`)
Sets the maximum size of the knowledge subgraph used for answering. Larger subgraphs provide more complete context but require more processing.
### Max Path Length (`-p, --max-path-length`)
Determines how many "hops" through the graph are considered when finding relationships. Higher values can discover more distant but potentially relevant connections.
## Output Format
The command returns a natural language answer based on the retrieved graph knowledge:
```
Neural networks are a fundamental component of deep learning architectures.
The knowledge graph shows that deep learning is a subset of machine learning
that specifically utilizes multi-layered neural networks. These networks consist
of interconnected nodes (neurons) organized in layers, where each layer processes
and transforms the input data. The relationship between neural networks and deep
learning is that neural networks provide the computational structure, while deep
learning represents the training methodologies and architectures that use these
networks to learn complex patterns from data.
```
## How Graph RAG Works
1. **Query Analysis**: Analyzes the question to identify key entities and concepts
2. **Entity Retrieval**: Finds relevant entities in the knowledge graph
3. **Subgraph Extraction**: Retrieves connected entities and relationships
4. **Context Assembly**: Combines retrieved knowledge into coherent context
5. **Answer Generation**: Uses LLM with graph context to generate accurate answers
## Comparison with Document RAG
### Graph RAG Advantages
- **Structured Knowledge**: Leverages explicit relationships between concepts
- **Reasoning Capability**: Can infer answers from connected facts
- **Consistency**: Provides factually consistent answers based on structured data
- **Relationship Discovery**: Excellent for questions about connections and relationships
### When to Use Graph RAG
- Questions about relationships between entities
- Queries requiring logical reasoning over facts
- When you need to understand connections in complex domains
- For factual questions with precise answers
## Error Handling
### Flow Not Available
```bash
Exception: Invalid flow
```
**Solution**: Verify the flow exists and is running with `tg-show-flows`.
### No Graph Data
```bash
Exception: No relevant knowledge found
```
**Solution**: Ensure knowledge has been loaded into the graph using `tg-load-kg-core` or document processing.
### Connection Errors
```bash
Exception: Connection refused
```
**Solution**: Check the API URL and ensure TrustGraph is running.
### Parameter Errors
```bash
Exception: Invalid parameter value
```
**Solution**: Verify that numeric parameters are within valid ranges.
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-invoke-document-rag`](tg-invoke-document-rag.md) - Document-based RAG queries
- [`tg-invoke-agent`](tg-invoke-agent.md) - Interactive agent with multiple tools
- [`tg-load-kg-core`](tg-load-kg-core.md) - Load knowledge into graph
- [`tg-show-graph`](tg-show-graph.md) - Explore graph contents
- [`tg-show-flows`](tg-show-flows.md) - List available flows
## API Integration
This command uses the [Graph RAG API](../apis/api-graph-rag.md) to perform retrieval augmented generation using knowledge graph data.
## Use Cases
### Research and Academia
```bash
tg-invoke-graph-rag \
-q "What are the key researchers working on quantum machine learning?" \
-C academic-papers
```
### Business Intelligence
```bash
tg-invoke-graph-rag \
-q "How do our products relate to market trends?" \
-C business-data
```
### Technical Documentation
```bash
tg-invoke-graph-rag \
-q "What are the dependencies between these software components?" \
-C technical-docs
```
### Medical Knowledge
```bash
tg-invoke-graph-rag \
-q "What are the known interactions between these medications?" \
-C medical-knowledge
```
## Performance Tuning
### For Broad Questions
Increase limits to get comprehensive answers:
```bash
tg-invoke-graph-rag -q "your question" -e 100 -t 80 -s 250 -p 3
```
### For Specific Questions
Use lower limits for faster, focused responses:
```bash
tg-invoke-graph-rag -q "your question" -e 30 -t 20 -s 100 -p 2
```
### For Deep Analysis
Allow longer paths and larger subgraphs:
```bash
tg-invoke-graph-rag -q "your question" -e 150 -t 100 -s 400 -p 4
```
## Best Practices
1. **Parameter Tuning**: Start with defaults and adjust based on question complexity
2. **Question Clarity**: Ask specific questions for better graph retrieval
3. **Knowledge Quality**: Ensure high-quality knowledge is loaded in the graph
4. **Flow Selection**: Use flows with appropriate knowledge domains
5. **Collection Targeting**: Specify relevant collections for focused results

# tg-invoke-llm
Invokes the text completion service with custom system and user prompts.
## Synopsis
```bash
tg-invoke-llm "system prompt" "user prompt" [options]
```
## Description
The `tg-invoke-llm` command provides direct access to the Large Language Model (LLM) text completion service. It allows you to specify both a system prompt (which sets the AI's behavior and context) and a user prompt (the actual query or task), giving you complete control over the LLM interaction.
This is useful for custom AI tasks, experimentation with prompts, and direct LLM integration without the overhead of retrieval augmented generation or agent frameworks.
## Options
### Required Arguments
- `system`: System prompt that defines the AI's role and behavior
- `prompt`: User prompt containing the actual query or task
### Optional Arguments
- `-u, --url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
- `-f, --flow-id FLOW`: Flow ID to use (default: `default`)
## Arguments
The command requires exactly two positional arguments:
1. **System Prompt**: Sets the AI's context, role, and behavior
2. **User Prompt**: The specific question, task, or content to process
## Examples
### Basic Question Answering
```bash
tg-invoke-llm "You are a helpful assistant." "What is the capital of France?"
```
### Code Generation
```bash
tg-invoke-llm \
"You are an expert Python programmer." \
"Write a function to calculate the Fibonacci sequence."
```
### Creative Writing
```bash
tg-invoke-llm \
"You are a creative writer specializing in science fiction." \
"Write a short story about time travel in 200 words."
```
### Technical Documentation
```bash
tg-invoke-llm \
"You are a technical writer who creates clear, concise documentation." \
"Explain how REST APIs work in simple terms."
```
### Data Analysis
```bash
tg-invoke-llm \
"You are a data analyst expert at interpreting statistics." \
"Explain what a p-value means and when it's significant."
```
### Using Specific Flow
```bash
tg-invoke-llm \
"You are a medical expert." \
"Explain the difference between Type 1 and Type 2 diabetes." \
-f medical-flow
```
## System Prompt Design
The system prompt is crucial for getting good results:
### Role Definition
```bash
"You are a [role] with expertise in [domain]."
```
### Behavior Instructions
```bash
"You are helpful, accurate, and concise. Always provide examples."
```
### Output Format
```bash
"You are a technical writer. Always structure your responses with clear headings and bullet points."
```
### Constraints
```bash
"You are a helpful assistant. Keep responses under 100 words and always cite sources when possible."
```
## Output Format
The command returns the LLM's response directly:
```
The capital of France is Paris. Paris has been the capital city of France since the late 10th century and is located in the north-central part of the country along the Seine River. It is the most populous city in France with over 2 million inhabitants in the city proper and over 12 million in the metropolitan area.
```
## Prompt Engineering Tips
### Effective System Prompts
- **Be Specific**: Clearly define the AI's role and expertise
- **Set Tone**: Specify the desired communication style
- **Include Constraints**: Set limits on response length or format
- **Provide Context**: Give relevant background information
### Effective User Prompts
- **Be Clear**: State exactly what you want
- **Provide Examples**: Show the desired output format
- **Add Context**: Include relevant background information
- **Specify Format**: Request specific output structure
## Error Handling
### Flow Not Available
```bash
Exception: Invalid flow
```
**Solution**: Verify the flow exists and is running with `tg-show-flows`.
### Connection Errors
```bash
Exception: Connection refused
```
**Solution**: Check the API URL and ensure TrustGraph is running.
### Prompt Errors
```bash
Exception: Invalid prompt format
```
**Solution**: Ensure both system and user prompts are provided as separate arguments.
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-invoke-agent`](tg-invoke-agent.md) - Interactive agent with tools and reasoning
- [`tg-invoke-graph-rag`](tg-invoke-graph-rag.md) - Graph-based retrieval augmented generation
- [`tg-invoke-document-rag`](tg-invoke-document-rag.md) - Document-based retrieval augmented generation
- [`tg-invoke-prompt`](tg-invoke-prompt.md) - Use predefined prompt templates
## API Integration
This command uses the [Text Completion API](../apis/api-text-completion.md) to perform direct LLM inference with custom prompts.
## Use Cases
### Development and Testing
```bash
# Test prompt variations
tg-invoke-llm "You are a code reviewer." "Review this Python function: def add(a, b): return a + b"
# Experiment with different system prompts
tg-invoke-llm "You are a harsh critic." "What do you think of Python?"
tg-invoke-llm "You are an enthusiastic supporter." "What do you think of Python?"
```
### Content Generation
```bash
# Blog post writing
tg-invoke-llm \
"You are a technical blogger who writes engaging, informative content." \
"Write an introduction to machine learning for beginners."
# Marketing copy
tg-invoke-llm \
"You are a marketing copywriter focused on clear, compelling messaging." \
"Write a product description for a cloud storage service."
```
### Educational Applications
```bash
# Concept explanation
tg-invoke-llm \
"You are a teacher who explains complex topics in simple terms." \
"Explain quantum computing to a high school student."
# Study guides
tg-invoke-llm \
"You are an educational content creator specializing in study materials." \
"Create a study guide for photosynthesis."
```
### Business Applications
```bash
# Report summarization
tg-invoke-llm \
"You are a business analyst who creates executive summaries." \
"Summarize the key points from this quarterly report: [report text]"
# Email drafting
tg-invoke-llm \
"You are a professional communication expert." \
"Draft a polite follow-up email for a job interview."
```
### Research and Analysis
```bash
# Literature review
tg-invoke-llm \
"You are a research academic who analyzes scientific literature." \
"What are the current trends in renewable energy research?"
# Competitive analysis
tg-invoke-llm \
"You are a market research analyst." \
"Compare the features of different cloud computing platforms."
```
## Advanced Techniques
### Multi-step Reasoning
```bash
# Chain of thought prompting
tg-invoke-llm \
"You are a logical reasoner. Work through problems step by step." \
"If a train travels 60 mph for 2 hours, then 80 mph for 1 hour, what's the average speed?"
```
### Format Control
```bash
# JSON output
tg-invoke-llm \
"You are a data processor. Always respond with valid JSON." \
"Convert this to JSON: Name: John, Age: 30, City: New York"
# Structured responses
tg-invoke-llm \
"You are a technical writer. Use markdown formatting with headers and lists." \
"Explain the software development lifecycle."
```
### Domain Expertise
```bash
# Legal analysis
tg-invoke-llm \
"You are a legal expert specializing in contract law." \
"What are the key elements of a valid contract?"
# Medical information
tg-invoke-llm \
"You are a medical professional. Provide accurate, evidence-based information." \
"What are the symptoms of Type 2 diabetes?"
```
## Best Practices
1. **Clear System Prompts**: Define the AI's role and behavior explicitly
2. **Specific User Prompts**: Be precise about what you want
3. **Iterative Refinement**: Experiment with different prompt variations
4. **Output Validation**: Verify the quality and accuracy of responses
5. **Appropriate Flows**: Use flows configured for your specific domain
6. **Length Considerations**: Balance detail with conciseness in prompts

# tg-invoke-prompt
Invokes the LLM prompt service using predefined prompt templates with variable substitution.
## Synopsis
```bash
tg-invoke-prompt [options] template-id [variable=value ...]
```
## Description
The `tg-invoke-prompt` command invokes TrustGraph's LLM prompt service using predefined prompt templates. Templates contain placeholder variables in the format `{{variable}}` that are replaced with values provided on the command line.
This provides a structured way to interact with language models using consistent, reusable prompt templates for specific tasks like question answering, text extraction, analysis, and more.
## Options
### Required Arguments
- `template-id`: Prompt template identifier (e.g., `question`, `extract-definitions`, `summarize`)
### Optional Arguments
- `-u, --url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
- `-f, --flow-id ID`: Flow instance ID to use (default: `default`)
- `variable=value`: Template variable assignments (can be specified multiple times)
## Examples
### Basic Question Answering
```bash
tg-invoke-prompt question text="What is artificial intelligence?" context="AI research field"
```
### Extract Definitions
```bash
tg-invoke-prompt extract-definitions \
document="Machine learning is a subset of artificial intelligence..." \
terms="machine learning,neural networks"
```
### Text Summarization
```bash
tg-invoke-prompt summarize \
text="$(cat large-document.txt)" \
max_length="200" \
style="technical"
```
### Custom Flow and Variables
```bash
tg-invoke-prompt analysis \
-f "research-flow" \
data="$(cat research-data.json)" \
focus="trends" \
output_format="markdown"
```
## Variable Substitution
Templates use `{{variable}}` placeholders that are replaced with command-line values:
### Simple Variables
```bash
tg-invoke-prompt greeting name="Alice" time="morning"
# Template: "Good {{time}}, {{name}}!"
# Result: "Good morning, Alice!"
```
### Complex Variables
```bash
tg-invoke-prompt analyze \
dataset="$(cat data.csv)" \
columns="name,age,salary" \
analysis_type="statistical_summary"
```
### Multi-line Variables
```bash
tg-invoke-prompt review \
code="$(cat app.py)" \
checklist="security,performance,maintainability" \
severity="high"
```
## Common Template Types
### Question Answering
```bash
# Direct question
tg-invoke-prompt question \
text="What is the capital of France?" \
context="geography"
# Contextual question
tg-invoke-prompt question \
text="How does this work?" \
context="$(cat technical-manual.txt)"
```
### Text Processing
```bash
# Extract key information
tg-invoke-prompt extract-key-points \
document="$(cat meeting-notes.txt)" \
format="bullet_points"
# Text classification
tg-invoke-prompt classify \
text="Customer is very unhappy with service" \
categories="positive,negative,neutral"
```
### Code Analysis
```bash
# Code review
tg-invoke-prompt code-review \
code="$(cat script.py)" \
language="python" \
focus="security,performance"
# Bug analysis
tg-invoke-prompt debug \
code="$(cat buggy-code.js)" \
error="TypeError: Cannot read property 'length' of undefined"
```
### Data Analysis
```bash
# Data insights
tg-invoke-prompt data-analysis \
data="$(cat sales-data.json)" \
metrics="revenue,growth,trends" \
period="quarterly"
```
## Template Management
### List Available Templates
```bash
# Show available prompt templates
tg-show-prompts
```
### Create Custom Templates
```bash
# Define a new template
tg-set-prompt analysis-template \
"Analyze the following {{data_type}}: {{data}}. Focus on {{focus_areas}}. Output format: {{format}}"
```
### Template Variables
Common template variables:
- `{{text}}` - Input text to process
- `{{context}}` - Additional context information
- `{{format}}` - Output format specification
- `{{language}}` - Programming language for code analysis
- `{{style}}` - Writing or analysis style
- `{{length}}` - Length constraints for output
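The substitution mechanism itself is simple; a minimal sketch of how `{{variable}}` placeholders are resolved (illustrative only, not the prompt service's actual implementation):

```python
import re

def substitute(template, variables):
    """Replace each {{name}} placeholder with its value from `variables`."""
    def replace(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"Template variable '{name}' not provided")
        return variables[name]
    return re.sub(r"\{\{(\w+)\}\}", replace, template)

print(substitute("Good {{time}}, {{name}}!", {"time": "morning", "name": "Alice"}))
# Good morning, Alice!
```

A missing variable raises an error rather than leaving the placeholder in place, which matches the "Missing Variables" behavior described under Error Handling below.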
## Output Formats
### String Response
```bash
tg-invoke-prompt summarize text="Long document..." max_length="100"
# Output: "This document discusses..."
```
### JSON Response
```bash
tg-invoke-prompt extract-structured data="Name: John, Age: 30, City: NYC"
# Output:
# {
# "name": "John",
# "age": 30,
# "city": "NYC"
# }
```
## Error Handling
### Missing Template
```bash
Exception: Template 'nonexistent-template' not found
```
**Solution**: Check available templates with `tg-show-prompts`.
### Missing Variables
```bash
Exception: Template variable 'required_var' not provided
```
**Solution**: Provide all required variables as `variable=value` arguments.
### Malformed Variables
```bash
Exception: Malformed variable: invalid-format
```
**Solution**: Use `variable=value` format for all variable assignments.
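The `variable=value` convention splits at the first `=` only, so values may themselves contain `=` characters. A sketch of how such arguments can be parsed (an illustration of the convention, not the command's actual source):

```python
def parse_assignment(arg):
    """Split a command-line 'variable=value' argument at the first '='."""
    if "=" not in arg:
        raise ValueError(f"Malformed variable: {arg}")
    key, value = arg.split("=", 1)  # only the first '=' delimits the key
    return key, value

print(parse_assignment("filter=status=active"))
# ('filter', 'status=active')
```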
### Flow Not Found
```bash
Exception: Flow instance 'invalid-flow' not found
```
**Solution**: Verify flow ID exists with `tg-show-flows`.
## Advanced Usage
### File Input Processing
```bash
# Process multiple files
for file in *.txt; do
echo "Processing $file..."
tg-invoke-prompt summarize \
text="$(cat "$file")" \
filename="$file" \
max_length="150"
done
```
### Batch Processing
```bash
# Process data in batches
while IFS= read -r line; do
tg-invoke-prompt classify \
text="$line" \
categories="spam,ham,promotional" \
confidence_threshold="0.8"
done < input-data.txt
```
### Pipeline Processing
```bash
# Chain multiple prompts
initial_analysis=$(tg-invoke-prompt analyze data="$(cat raw-data.json)")
summary=$(tg-invoke-prompt summarize text="$initial_analysis" style="executive")
echo "$summary"
```
### Interactive Processing
```bash
#!/bin/bash
# interactive-prompt.sh
template="$1"
if [ -z "$template" ]; then
echo "Usage: $0 <template-id>"
exit 1
fi
echo "Interactive prompt using template: $template"
echo "Enter variables (var=value), empty line to execute:"
variables=()
while true; do
read -p "> " input
if [ -z "$input" ]; then
break
fi
variables+=("$input")
done
echo "Executing prompt..."
tg-invoke-prompt "$template" "${variables[@]}"
```
### Configuration-Driven Processing
```bash
# Use configuration file for prompts
config_file="prompt-config.json"
template=$(jq -r '.template' "$config_file")
variables=$(jq -r '.variables | to_entries[] | "\(.key)=\(.value)"' "$config_file")
tg-invoke-prompt "$template" $variables
```
## Performance Optimization
### Caching Results
```bash
# Cache prompt results
cache_dir="prompt-cache"
mkdir -p "$cache_dir"
invoke_with_cache() {
local template="$1"
shift
    local args="$*"
local cache_key=$(echo "$template-$args" | md5sum | cut -d' ' -f1)
local cache_file="$cache_dir/$cache_key.txt"
if [ -f "$cache_file" ]; then
echo "Cache hit"
cat "$cache_file"
else
echo "Cache miss, invoking prompt..."
tg-invoke-prompt "$template" "$@" | tee "$cache_file"
fi
}
```
### Parallel Processing
```bash
# Process multiple items in parallel
input_files=(file1.txt file2.txt file3.txt)
for file in "${input_files[@]}"; do
(
echo "Processing $file..."
tg-invoke-prompt analyze \
text="$(cat "$file")" \
filename="$file" > "result-$file.json"
) &
done
wait
```
## Use Cases
### Document Processing
```bash
# Extract metadata from documents
tg-invoke-prompt extract-metadata \
document="$(cat document.pdf)" \
fields="title,author,date,keywords"
# Generate document summaries
tg-invoke-prompt summarize \
text="$(cat report.txt)" \
audience="executives" \
key_points="5"
```
### Code Analysis
```bash
# Security analysis
tg-invoke-prompt security-review \
code="$(cat webapp.py)" \
framework="flask" \
focus="injection,authentication"
# Performance optimization suggestions
tg-invoke-prompt optimize \
code="$(cat slow-function.js)" \
language="javascript" \
target="performance"
```
### Data Analysis
```bash
# Generate insights from data
tg-invoke-prompt insights \
data="$(cat metrics.json)" \
timeframe="monthly" \
focus="trends,anomalies"
# Create data visualizations
tg-invoke-prompt visualize \
data="$(cat sales-data.csv)" \
chart_type="line" \
metrics="revenue,growth"
```
### Content Generation
```bash
# Generate marketing copy
tg-invoke-prompt marketing \
product="AI Assistant" \
audience="developers" \
tone="professional,friendly"
# Create technical documentation
tg-invoke-prompt document \
code="$(cat api.py)" \
format="markdown" \
sections="overview,examples,parameters"
```
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-show-prompts`](tg-show-prompts.md) - List available prompt templates
- [`tg-set-prompt`](tg-set-prompt.md) - Create/update prompt templates
- [`tg-invoke-document-rag`](tg-invoke-document-rag.md) - Document-based question answering
- [`tg-show-flows`](tg-show-flows.md) - List available flow instances
## API Integration
This command uses the prompt service API to process templates and generate responses using configured language models.
## Best Practices
1. **Template Reuse**: Create reusable templates for common tasks
2. **Variable Validation**: Validate required variables before execution
3. **Error Handling**: Implement proper error handling for production use
4. **Caching**: Cache results for repeated operations
5. **Documentation**: Document custom templates and their expected variables
6. **Security**: Avoid embedding sensitive data in templates
7. **Performance**: Use appropriate flow instances for different workloads
## Troubleshooting
### Template Not Found
```bash
# Check available templates
tg-show-prompts
# Verify template name spelling
tg-show-prompts | grep "template-name"
```
### Variable Errors
```bash
# Check template definition for required variables
tg-show-prompts | grep -A 10 "template-name"
# Validate variable format
echo "variable=value" | grep "="
```
### Flow Issues
```bash
# Check flow status
tg-show-flows | grep "flow-id"
# Verify flow has prompt service
tg-get-flow-class -n "flow-class" | jq '.interfaces.prompt'
```

# tg-load-doc-embeds
Loads document embeddings from MessagePack format into TrustGraph processing pipelines.
## Synopsis
```bash
tg-load-doc-embeds -i INPUT_FILE [options]
```
## Description
The `tg-load-doc-embeds` command loads document embeddings from MessagePack files into a running TrustGraph system. This is typically used to restore previously saved document embeddings or to load embeddings generated by external systems.
The command reads document embedding data in MessagePack format and streams it to TrustGraph's document embeddings import API via WebSocket connections.
## Options
### Required Arguments
- `-i, --input-file FILE`: Input MessagePack file containing document embeddings
### Optional Arguments
- `-u, --url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_API` or `http://localhost:8088/`)
- `-f, --flow-id ID`: Flow instance ID to use (default: `default`)
- `--format FORMAT`: Input format - `msgpack` or `json` (default: `msgpack`)
- `--user USER`: Override user ID from input data
- `--collection COLLECTION`: Override collection ID from input data
## Examples
### Basic Loading
```bash
tg-load-doc-embeds -i document-embeddings.msgpack
```
### Load with Custom Flow
```bash
tg-load-doc-embeds \
-i embeddings.msgpack \
-f "document-processing-flow"
```
### Override User and Collection
```bash
tg-load-doc-embeds \
-i embeddings.msgpack \
--user "research-team" \
--collection "research-docs"
```
### Load from JSON Format
```bash
tg-load-doc-embeds \
-i embeddings.json \
--format json
```
### Production Loading
```bash
tg-load-doc-embeds \
-i production-embeddings.msgpack \
-u https://trustgraph-api.company.com/ \
-f "production-flow" \
--user "system" \
--collection "production-docs"
```
## Input Data Format
### MessagePack Structure
Document embeddings are stored as MessagePack records with this structure:
```json
["de", {
"m": {
"i": "document-id",
"m": [{"metadata": "objects"}],
"u": "user-id",
"c": "collection-id"
},
"c": [{
"c": "text chunk content",
"v": [0.1, 0.2, 0.3, ...]
}]
}]
```
### Components
- **Document Metadata** (`m`):
- `i`: Document ID
- `m`: Document metadata objects
- `u`: User ID
- `c`: Collection ID
- **Chunks** (`c`): Array of text chunks with embeddings:
- `c`: Text content of the chunk
- `v`: Vector embedding array
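Since the command also accepts `--format json`, a record with this shape can be constructed with the standard library. Field names follow the structure above; the IDs and embedding values are placeholders:

```python
import json

record = [
    "de",                        # record type tag: document embeddings
    {
        "m": {                   # document metadata
            "i": "doc-001",      # document ID (placeholder)
            "m": [],             # metadata objects
            "u": "trustgraph",   # user ID
            "c": "default",      # collection ID
        },
        "c": [                   # chunks
            {
                "c": "text chunk content",
                "v": [0.1, 0.2, 0.3],  # placeholder embedding vector
            }
        ],
    },
]

print(json.dumps(record))
```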
## Use Cases
### Backup Restoration
```bash
# Restore document embeddings from backup
restore_embeddings() {
local backup_file="$1"
local target_collection="$2"
echo "Restoring document embeddings from: $backup_file"
if [ ! -f "$backup_file" ]; then
echo "Backup file not found: $backup_file"
return 1
fi
# Verify backup file
if tg-dump-msgpack -i "$backup_file" --summary | grep -q "Vector dimension:"; then
echo "✓ Backup file contains embeddings"
else
echo "✗ Backup file does not contain valid embeddings"
return 1
fi
# Load embeddings
tg-load-doc-embeds \
-i "$backup_file" \
--collection "$target_collection"
echo "Embedding restoration complete"
}
# Restore from backup
restore_embeddings "backup-20231215.msgpack" "restored-docs"
```
### Data Migration
```bash
# Migrate embeddings between environments
migrate_embeddings() {
local source_file="$1"
local target_env="$2"
local target_user="$3"
echo "Migrating embeddings to: $target_env"
# Load to target environment
tg-load-doc-embeds \
-i "$source_file" \
-u "https://$target_env/api/" \
--user "$target_user" \
--collection "migrated-docs"
echo "Migration complete"
}
# Migrate to production
migrate_embeddings "dev-embeddings.msgpack" "prod.company.com" "migration-user"
```
### Batch Processing
```bash
# Load multiple embedding files
batch_load_embeddings() {
local input_dir="$1"
local collection="$2"
echo "Batch loading embeddings from: $input_dir"
for file in "$input_dir"/*.msgpack; do
if [ -f "$file" ]; then
echo "Loading: $(basename "$file")"
tg-load-doc-embeds \
-i "$file" \
--collection "$collection"
if [ $? -eq 0 ]; then
echo "✓ Loaded: $(basename "$file")"
else
echo "✗ Failed: $(basename "$file")"
fi
fi
done
echo "Batch loading complete"
}
# Load all embeddings
batch_load_embeddings "embeddings/" "batch-processed"
```
### Incremental Loading
```bash
# Load new embeddings incrementally
incremental_load() {
local embeddings_dir="$1"
local processed_log="processed_embeddings.log"
# Create log if it doesn't exist
touch "$processed_log"
for file in "$embeddings_dir"/*.msgpack; do
if [ -f "$file" ]; then
# Check if already processed
if grep -q "$(basename "$file")" "$processed_log"; then
echo "Skipping already processed: $(basename "$file")"
continue
fi
echo "Processing new file: $(basename "$file")"
if tg-load-doc-embeds -i "$file"; then
echo "$(date): $(basename "$file")" >> "$processed_log"
echo "✓ Processed: $(basename "$file")"
else
echo "✗ Failed: $(basename "$file")"
fi
fi
done
}
# Run incremental loading
incremental_load "embeddings/"
```
## Advanced Usage
### Parallel Loading
```bash
# Load multiple files in parallel
parallel_load_embeddings() {
local files=("$@")
local max_parallel=3
local current_jobs=0
for file in "${files[@]}"; do
# Wait if max parallel jobs reached
while [ $current_jobs -ge $max_parallel ]; do
wait -n # Wait for any job to complete
current_jobs=$((current_jobs - 1))
done
# Start loading in background
(
echo "Loading: $file"
tg-load-doc-embeds -i "$file"
echo "Completed: $file"
) &
current_jobs=$((current_jobs + 1))
done
# Wait for all remaining jobs
wait
echo "All parallel loading completed"
}
# Load files in parallel
embedding_files=(embeddings1.msgpack embeddings2.msgpack embeddings3.msgpack)
parallel_load_embeddings "${embedding_files[@]}"
```
### Validation and Loading
```bash
# Validate before loading
validate_and_load() {
local file="$1"
local collection="$2"
echo "Validating embedding file: $file"
# Check file exists and is readable
if [ ! -r "$file" ]; then
echo "Error: Cannot read file $file"
return 1
fi
# Validate MessagePack structure
if ! tg-dump-msgpack -i "$file" --summary > /dev/null 2>&1; then
echo "Error: Invalid MessagePack format"
return 1
fi
# Check for document embeddings
if ! tg-dump-msgpack -i "$file" | grep -q '^\["de"'; then
echo "Error: No document embeddings found"
return 1
fi
# Get embedding statistics
summary=$(tg-dump-msgpack -i "$file" --summary)
vector_dim=$(echo "$summary" | grep "Vector dimension:" | awk '{print $3}')
if [ -n "$vector_dim" ]; then
echo "✓ Found embeddings with dimension: $vector_dim"
else
echo "Warning: Could not determine vector dimension"
fi
# Load embeddings
echo "Loading validated embeddings..."
tg-load-doc-embeds -i "$file" --collection "$collection"
echo "Loading complete"
}
# Validate and load
validate_and_load "embeddings.msgpack" "validated-docs"
```
### Progress Monitoring
```bash
# Monitor loading progress
monitor_loading() {
local file="$1"
local log_file="loading_progress.log"
# Start loading in background
tg-load-doc-embeds -i "$file" > "$log_file" 2>&1 &
local load_pid=$!
echo "Monitoring loading progress (PID: $load_pid)..."
# Monitor progress
while kill -0 $load_pid 2>/dev/null; do
if [ -f "$log_file" ]; then
# Extract progress from log
embeddings_count=$(grep -o "Document embeddings:.*[0-9]" "$log_file" | tail -1 | awk '{print $3}')
if [ -n "$embeddings_count" ]; then
echo "Progress: $embeddings_count embeddings loaded"
fi
fi
sleep 5
done
# Check final status
wait $load_pid
if [ $? -eq 0 ]; then
echo "✓ Loading completed successfully"
else
echo "✗ Loading failed"
cat "$log_file"
fi
rm "$log_file"
}
# Monitor loading
monitor_loading "large-embeddings.msgpack"
```
### Data Transformation
```bash
# Transform embeddings during loading
transform_and_load() {
local input_file="$1"
local output_file="transformed-$(basename "$input_file")"
local new_user="$2"
local new_collection="$3"
echo "Transforming embeddings: user=$new_user, collection=$new_collection"
    # A full field-level transformation would require a dedicated script;
    # for now, the user and collection are simply overridden at load time.
# Load with override parameters
tg-load-doc-embeds \
-i "$input_file" \
--user "$new_user" \
--collection "$new_collection"
echo "Transformation and loading complete"
}
# Transform during loading
transform_and_load "original.msgpack" "new-user" "new-collection"
```
## Performance Optimization
### Memory Management
```bash
# Monitor memory usage during loading
monitor_memory_usage() {
local file="$1"
echo "Starting memory-monitored loading..."
# Start loading in background
tg-load-doc-embeds -i "$file" &
local load_pid=$!
# Monitor memory usage
while kill -0 $load_pid 2>/dev/null; do
memory_usage=$(ps -p $load_pid -o rss= 2>/dev/null | awk '{print $1/1024}')
if [ -n "$memory_usage" ]; then
echo "Memory usage: ${memory_usage}MB"
fi
sleep 10
done
wait $load_pid
echo "Loading completed"
}
```
### Chunked Loading
```bash
# Load large files in chunks
chunked_load() {
local large_file="$1"
local chunk_size=1000 # Records per chunk
echo "Loading large file in chunks: $large_file"
# Split the MessagePack file (this would need special tooling)
# For demonstration, assuming we have pre-split files
for chunk in "${large_file%.msgpack}"_chunk_*.msgpack; do
if [ -f "$chunk" ]; then
echo "Loading chunk: $(basename "$chunk")"
tg-load-doc-embeds -i "$chunk"
# Add delay between chunks to reduce system load
sleep 2
fi
done
echo "Chunked loading complete"
}
```
## Error Handling
### File Not Found
```bash
Exception: [Errno 2] No such file or directory
```
**Solution**: Verify file path and ensure the MessagePack file exists.
### Invalid Format
```bash
Exception: Unpack failed
```
**Solution**: Verify the file is a valid MessagePack file with document embeddings.
### WebSocket Connection Issues
```bash
Exception: Connection failed
```
**Solution**: Check API URL and ensure TrustGraph is running with WebSocket support.
### Memory Errors
```bash
MemoryError: Unable to allocate memory
```
**Solution**: Process large files in smaller chunks or increase available memory.
### Flow Not Found
```bash
Exception: Flow not found
```
**Solution**: Verify the flow ID exists with `tg-show-flows`.
## Integration with Other Commands
### Complete Workflow
```bash
# Complete document processing workflow
process_documents_workflow() {
local pdf_dir="$1"
local embeddings_file="embeddings.msgpack"
echo "Starting complete document workflow..."
# 1. Load PDFs
for pdf in "$pdf_dir"/*.pdf; do
tg-load-pdf "$pdf"
done
# 2. Wait for processing
sleep 30
# 3. Save embeddings
tg-save-doc-embeds -o "$embeddings_file"
# 4. Process embeddings (example: load to different collection)
tg-load-doc-embeds -i "$embeddings_file" --collection "processed-docs"
echo "Complete workflow finished"
}
```
### Backup and Restore
```bash
# Complete backup and restore cycle
backup_restore_cycle() {
local backup_file="embeddings-backup.msgpack"
echo "Creating embeddings backup..."
tg-save-doc-embeds -o "$backup_file"
echo "Simulating data loss..."
# (In real scenario, this might be system failure)
echo "Restoring from backup..."
tg-load-doc-embeds -i "$backup_file" --collection "restored"
echo "Backup/restore cycle complete"
}
```
## Environment Variables
- `TRUSTGRAPH_API`: Default API URL
## Related Commands
- [`tg-save-doc-embeds`](tg-save-doc-embeds.md) - Save document embeddings to MessagePack
- [`tg-dump-msgpack`](tg-dump-msgpack.md) - Analyze MessagePack files
- [`tg-load-pdf`](tg-load-pdf.md) - Load PDF documents for processing
- [`tg-show-flows`](tg-show-flows.md) - List available flows
## API Integration
This command uses TrustGraph's WebSocket API for document embeddings import, specifically the `/api/v1/flow/{flow-id}/import/document-embeddings` endpoint.
## Best Practices
1. **Validation**: Always validate MessagePack files before loading
2. **Backups**: Keep backups of original embedding files
3. **Monitoring**: Monitor memory usage and loading progress
4. **Chunking**: Process large files in manageable chunks
5. **Error Handling**: Implement robust error handling and retry logic
6. **Documentation**: Document the source and format of embedding files
7. **Testing**: Test loading procedures in non-production environments
## Troubleshooting
### Loading Stalls
```bash
# Check WebSocket connection
netstat -an | grep :8088
# Check system resources
free -h
df -h
```
### Incomplete Loading
```bash
# Compare input vs loaded data
input_count=$(tg-dump-msgpack -i input.msgpack | grep '^\["de"' | wc -l)
echo "Input embeddings: $input_count"
# Check loaded data (would need query command)
# loaded_count=$(tg-query-embeddings --count)
# echo "Loaded embeddings: $loaded_count"
```
### Performance Issues
```bash
# Monitor network usage
iftop
# Check TrustGraph service logs
docker logs trustgraph-service
```

# tg-load-kg-core
Loads a stored knowledge core into a processing flow for active use.
## Synopsis
```bash
tg-load-kg-core --id CORE_ID [options]
```
## Description
The `tg-load-kg-core` command loads a previously stored knowledge core into an active processing flow, making the knowledge available for queries, reasoning, and other AI operations. This is different from storing knowledge cores - this command makes stored knowledge active and accessible within a specific flow context.
Once loaded, the knowledge core's RDF triples and graph embeddings become available for Graph RAG queries, agent reasoning, and other knowledge-based operations within the specified flow.
## Options
### Required Arguments
- `--id, --identifier CORE_ID`: Identifier of the knowledge core to load
### Optional Arguments
- `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
- `-U, --user USER`: User identifier (default: `trustgraph`)
- `-f, --flow-id FLOW`: Flow ID to load knowledge into (default: `default`)
- `-c, --collection COLLECTION`: Collection identifier (default: `default`)
## Examples
### Load Knowledge Core into Default Flow
```bash
tg-load-kg-core --id "research-knowledge-v1"
```
### Load into Specific Flow
```bash
tg-load-kg-core \
--id "medical-knowledge" \
--flow-id "medical-analysis" \
--user researcher
```
### Load with Custom Collection
```bash
tg-load-kg-core \
--id "legal-documents" \
--flow-id "legal-flow" \
--collection "law-firm-data"
```
### Using Custom API URL
```bash
tg-load-kg-core \
--id "production-knowledge" \
--flow-id "prod-flow" \
-u http://production:8088/
```
## Prerequisites
### Knowledge Core Must Exist
The knowledge core must be stored in the system:
```bash
# Check available knowledge cores
tg-show-kg-cores
# Store knowledge core if needed
tg-put-kg-core --id "my-knowledge" -i knowledge.msgpack
```
### Flow Must Be Running
The target flow must be active:
```bash
# Check running flows
tg-show-flows
# Start flow if needed
tg-start-flow -n "my-class" -i "my-flow" -d "Knowledge processing flow"
```
## Loading Process
1. **Validation**: Verifies knowledge core exists and flow is running
2. **Knowledge Retrieval**: Retrieves RDF triples and graph embeddings
3. **Flow Integration**: Makes knowledge available within flow context
4. **Index Building**: Creates searchable indexes for efficient querying
5. **Service Activation**: Enables knowledge-based services in the flow
## What Gets Loaded
### RDF Triples
- Subject-predicate-object relationships
- Entity definitions and properties
- Factual knowledge and assertions
- Metadata and provenance information
### Graph Embeddings
- Vector representations of entities
- Semantic similarity data
- Neural network-compatible formats
- Machine learning-ready representations
## Knowledge Availability
Once loaded, knowledge becomes available through:
### Graph RAG Queries
```bash
tg-invoke-graph-rag \
-q "What information is available about AI research?" \
-f my-flow
```
### Agent Interactions
```bash
tg-invoke-agent \
-q "Tell me about the loaded knowledge" \
-f my-flow
```
### Direct Triple Queries
```bash
tg-show-graph -f my-flow
```
## Output
Successful loading typically produces no output, but knowledge becomes queryable:
```bash
# Load knowledge (no output expected)
tg-load-kg-core --id "research-knowledge"
# Verify loading by querying
tg-show-graph | head -10
```
## Error Handling
### Knowledge Core Not Found
```bash
Exception: Knowledge core 'invalid-core' not found
```
**Solution**: Check available cores with `tg-show-kg-cores` and verify the core ID.
### Flow Not Found
```bash
Exception: Flow 'invalid-flow' not found
```
**Solution**: Verify the flow exists and is running with `tg-show-flows`.
### Permission Errors
```bash
Exception: Access denied to knowledge core
```
**Solution**: Verify user permissions for the specified knowledge core.
### Connection Errors
```bash
Exception: Connection refused
```
**Solution**: Check the API URL and ensure TrustGraph is running.
### Resource Errors
```bash
Exception: Insufficient memory to load knowledge core
```
**Solution**: Check system resources or try loading smaller knowledge cores.
## Knowledge Core Management
### Loading Workflow
```bash
# 1. Check available knowledge
tg-show-kg-cores
# 2. Ensure flow is running
tg-show-flows
# 3. Load knowledge into flow
tg-load-kg-core --id "my-knowledge" --flow-id "my-flow"
# 4. Verify knowledge is accessible
tg-invoke-graph-rag -q "What knowledge is loaded?" -f my-flow
```
### Multiple Knowledge Cores
```bash
# Load multiple cores for comprehensive knowledge
tg-load-kg-core --id "core-1" --flow-id "research-flow"
tg-load-kg-core --id "core-2" --flow-id "research-flow"
tg-load-kg-core --id "core-3" --flow-id "research-flow"
```
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-show-kg-cores`](tg-show-kg-cores.md) - List available knowledge cores
- [`tg-put-kg-core`](tg-put-kg-core.md) - Store knowledge core in system
- [`tg-unload-kg-core`](tg-unload-kg-core.md) - Remove knowledge from flow
- [`tg-show-graph`](tg-show-graph.md) - View loaded knowledge triples
- [`tg-invoke-graph-rag`](tg-invoke-graph-rag.md) - Query loaded knowledge
## API Integration
This command uses the [Knowledge API](../apis/api-knowledge.md) with the `load-kg-core` operation to make stored knowledge active within flows.
## Use Cases
### Research Analysis
```bash
# Load research knowledge for analysis
tg-load-kg-core \
--id "research-papers-2024" \
--flow-id "research-analysis" \
--collection "academic-research"
# Query the research knowledge
tg-invoke-graph-rag \
-q "What are the main research trends in AI?" \
-f research-analysis
```
### Domain-Specific Processing
```bash
# Load medical knowledge for healthcare analysis
tg-load-kg-core \
--id "medical-terminology" \
--flow-id "healthcare-nlp" \
--user medical-team
```
### Multi-Domain Knowledge
```bash
# Load knowledge from multiple domains
tg-load-kg-core --id "technical-specs" --flow-id "analysis-flow"
tg-load-kg-core --id "business-data" --flow-id "analysis-flow"
tg-load-kg-core --id "market-research" --flow-id "analysis-flow"
```
### Development and Testing
```bash
# Load test knowledge for development
tg-load-kg-core \
--id "test-knowledge" \
--flow-id "dev-flow" \
--user developer
```
### Production Processing
```bash
# Load production knowledge
tg-load-kg-core \
--id "production-kb-v2.1" \
--flow-id "production-flow" \
--collection "live-data"
```
## Performance Considerations
### Loading Time
- Large knowledge cores may take time to load
- Loading includes indexing for efficient querying
- Multiple cores can be loaded incrementally
### Memory Usage
- Knowledge cores consume memory proportional to their size
- Monitor system resources when loading large cores
- Consider flow capacity when loading multiple cores
### Query Performance
- Loaded knowledge enables faster query responses
- Pre-built indexes improve search performance
- Multiple cores may impact query speed
## Best Practices
1. **Pre-Loading**: Load knowledge cores before intensive querying
2. **Resource Planning**: Monitor memory usage with large knowledge cores
3. **Flow Management**: Use dedicated flows for specific knowledge domains
4. **Version Control**: Load specific knowledge core versions for reproducibility
5. **Testing**: Verify knowledge loading with simple queries
6. **Documentation**: Document which knowledge cores are loaded in which flows
## Knowledge Loading Strategy
### Single Domain
```bash
# Load focused knowledge for specific tasks
tg-load-kg-core --id "specialized-domain" --flow-id "domain-flow"
```
### Multi-Domain
```bash
# Load comprehensive knowledge for broad analysis
tg-load-kg-core --id "general-knowledge" --flow-id "general-flow"
tg-load-kg-core --id "domain-specific" --flow-id "general-flow"
```
### Incremental Loading
```bash
# Load knowledge incrementally as needed
tg-load-kg-core --id "base-knowledge" --flow-id "analysis-flow"
# ... perform some analysis ...
tg-load-kg-core --id "additional-knowledge" --flow-id "analysis-flow"
```

# tg-load-pdf
Loads PDF documents into TrustGraph for processing and analysis.
## Synopsis
```bash
tg-load-pdf [options] file1.pdf [file2.pdf ...]
```
## Description
The `tg-load-pdf` command loads PDF documents into TrustGraph by directing them to the PDF decoder service. The command extracts content, generates document metadata, and makes the documents available for processing by other TrustGraph services.
Each PDF is assigned a unique identifier based on its content hash, and comprehensive metadata can be attached including copyright information, publication details, and keywords.
**Note**: Consider using `tg-add-library-document` followed by `tg-start-library-processing` for more comprehensive document management.
## Options
### Required Arguments
- `files`: One or more PDF files to load
### Optional Arguments
- `-u, --url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
- `-f, --flow-id ID`: Flow instance ID to use (default: `default`)
- `-U, --user USER`: User ID for document ownership (default: `trustgraph`)
- `-C, --collection COLLECTION`: Collection to assign document (default: `default`)
### Document Metadata
- `--name NAME`: Document name/title
- `--description DESCRIPTION`: Document description
- `--identifier ID`: Custom document identifier
- `--document-url URL`: Source URL for the document
- `--keyword KEYWORD`: Document keywords (can be specified multiple times)
### Copyright Information
- `--copyright-notice NOTICE`: Copyright notice text
- `--copyright-holder HOLDER`: Copyright holder name
- `--copyright-year YEAR`: Copyright year
- `--license LICENSE`: Copyright license
### Publication Details
- `--publication-organization ORG`: Publishing organization
- `--publication-description DESC`: Publication description
- `--publication-date DATE`: Publication date
## Examples
### Basic PDF Loading
```bash
tg-load-pdf document.pdf
```
### Multiple Files
```bash
tg-load-pdf report1.pdf report2.pdf manual.pdf
```
### With Basic Metadata
```bash
tg-load-pdf \
--name "Technical Manual" \
--description "System administration guide" \
--keyword "technical" --keyword "manual" \
technical-manual.pdf
```
### Complete Metadata
```bash
tg-load-pdf \
--name "Annual Report 2023" \
--description "Company annual financial report" \
--copyright-holder "Acme Corporation" \
--copyright-year "2023" \
--license "All Rights Reserved" \
--publication-organization "Acme Corporation" \
--publication-date "2023-12-31" \
--keyword "financial" --keyword "annual" --keyword "report" \
annual-report-2023.pdf
```
### Custom Flow and Collection
```bash
tg-load-pdf \
-f "document-processing-flow" \
-U "finance-team" \
-C "financial-documents" \
--name "Budget Analysis" \
budget-2024.pdf
```
## Document Processing
### Content Extraction
The PDF loader:
1. Calculates SHA256 hash for unique document ID
2. Extracts text content from PDF
3. Preserves document structure and formatting metadata
4. Generates searchable text index
### Metadata Generation
Document metadata includes:
- **Document ID**: SHA256 hash-based unique identifier
- **Content Hash**: For duplicate detection
- **File Information**: Size, format, creation date
- **Custom Metadata**: User-provided attributes
### Integration with Processing Pipeline
```bash
# Load PDF and start processing
tg-load-pdf research-paper.pdf --name "AI Research Paper"
# Check processing status
tg-show-flows | grep "document-processing"
# Query loaded content
tg-invoke-document-rag -q "What is the main conclusion?" -C "default"
```
## Error Handling
### File Not Found
```bash
Exception: [Errno 2] No such file or directory: 'missing.pdf'
```
**Solution**: Verify file path and ensure PDF exists.
### Invalid PDF Format
```bash
Exception: PDF parsing failed: Invalid PDF structure
```
**Solution**: Verify PDF is not corrupted and is a valid PDF file.
### Permission Errors
```bash
Exception: [Errno 13] Permission denied: 'protected.pdf'
```
**Solution**: Check file permissions and ensure read access.
### Flow Not Found
```bash
Exception: Flow instance 'invalid-flow' not found
```
**Solution**: Verify flow ID exists with `tg-show-flows`.
### API Connection Issues
```bash
Exception: Connection refused
```
**Solution**: Check API URL and ensure TrustGraph is running.
## Advanced Usage
### Batch Processing
```bash
# Process all PDFs in directory
for pdf in *.pdf; do
echo "Loading $pdf..."
tg-load-pdf \
--name "$(basename "$pdf" .pdf)" \
--collection "research-papers" \
"$pdf"
done
```
### Organized Loading
```bash
# Load with structured metadata
categories=("technical" "financial" "legal")
for category in "${categories[@]}"; do
for pdf in "$category"/*.pdf; do
if [ -f "$pdf" ]; then
tg-load-pdf \
--collection "$category-documents" \
--keyword "$category" \
--name "$(basename "$pdf" .pdf)" \
"$pdf"
fi
done
done
```
### CSV-Driven Loading
```bash
# Load PDFs with metadata from CSV
# Format: filename,title,description,keywords (keywords pipe-separated, e.g. "ai|ml|nlp")
while IFS=',' read -r filename title description keywords; do
if [ -f "$filename" ]; then
echo "Loading $filename..."
# Convert pipe-separated keywords to multiple --keyword args
keyword_args=""
IFS='|' read -ra KEYWORDS <<< "$keywords"
for kw in "${KEYWORDS[@]}"; do
keyword_args="$keyword_args --keyword \"$kw\""
done
eval "tg-load-pdf \
--name \"$title\" \
--description \"$description\" \
$keyword_args \
\"$filename\""
fi
done < documents.csv
```
### Publication Processing
```bash
# Load academic papers with publication details
load_academic_paper() {
local file="$1"
local title="$2"
local authors="$3"
local journal="$4"
local year="$5"
tg-load-pdf \
--name "$title" \
--description "Academic paper: $title" \
--copyright-holder "$authors" \
--copyright-year "$year" \
--publication-organization "$journal" \
--publication-date "$year-01-01" \
--keyword "academic" --keyword "research" \
"$file"
}
# Usage
load_academic_paper "ai-paper.pdf" "AI in Healthcare" "Smith et al." "AI Journal" "2023"
```
## Monitoring and Validation
### Load Status Checking
```bash
# Check document loading progress
check_load_status() {
local file="$1"
local expected_name="$2"
echo "Checking load status for: $file"
# Check if document appears in library
if tg-show-library-documents | grep -q "$expected_name"; then
echo "✓ Document loaded successfully"
else
echo "✗ Document not found in library"
return 1
fi
}
# Monitor batch loading
for pdf in *.pdf; do
name=$(basename "$pdf" .pdf)
check_load_status "$pdf" "$name"
done
```
### Content Verification
```bash
# Verify PDF content is accessible
verify_pdf_content() {
local pdf_name="$1"
local test_query="$2"
echo "Verifying content for: $pdf_name"
# Try to query the document
result=$(tg-invoke-document-rag -q "$test_query" -C "default" 2>/dev/null)
if [ $? -eq 0 ] && [ -n "$result" ]; then
echo "✓ Content accessible via RAG"
else
echo "✗ Content not accessible"
return 1
fi
}
# Verify loaded documents
verify_pdf_content "Technical Manual" "What is the installation process?"
```
## Performance Optimization
### Parallel Loading
```bash
# Load multiple PDFs in parallel
pdf_files=(document1.pdf document2.pdf document3.pdf)
for pdf in "${pdf_files[@]}"; do
(
echo "Loading $pdf in background..."
tg-load-pdf \
--name "$(basename "$pdf" .pdf)" \
--collection "batch-$(date +%Y%m%d)" \
"$pdf"
) &
done
wait
echo "All PDFs loaded"
```
### Size-Based Processing
```bash
# Process files based on size
for pdf in *.pdf; do
size=$(stat -c%s "$pdf")
if [ $size -lt 10485760 ]; then # < 10MB
echo "Processing small file: $pdf"
tg-load-pdf --collection "small-docs" "$pdf"
else
echo "Processing large file: $pdf"
tg-load-pdf --collection "large-docs" "$pdf"
fi
done
```
## Document Organization
### Collection Management
```bash
# Organize by document type
organize_by_type() {
local pdf="$1"
local filename=$(basename "$pdf" .pdf)
case "$filename" in
*manual*|*guide*) collection="manuals" ;;
*report*|*analysis*) collection="reports" ;;
*spec*|*specification*) collection="specifications" ;;
*legal*|*contract*) collection="legal" ;;
*) collection="general" ;;
esac
tg-load-pdf \
--collection "$collection" \
--name "$filename" \
"$pdf"
}
# Process all PDFs
for pdf in *.pdf; do
organize_by_type "$pdf"
done
```
### Metadata Standardization
```bash
# Apply consistent metadata standards
standardize_metadata() {
local pdf="$1"
local dept="$2"
local year="$3"
local name=$(basename "$pdf" .pdf)
local collection="$dept-$year"
tg-load-pdf \
--name "$name" \
--description "$dept document from $year" \
--copyright-holder "Company Name" \
--copyright-year "$year" \
--collection "$collection" \
--keyword "$dept" --keyword "$year" \
"$pdf"
}
# Usage
standardize_metadata "finance-report.pdf" "finance" "2023"
```
## Integration with Other Services
### Library Integration
```bash
# Alternative approach using library services
load_via_library() {
local pdf="$1"
local name="$2"
# Add to library first
tg-add-library-document \
--name "$name" \
--file "$pdf" \
--collection "documents"
# Start processing
tg-start-library-processing \
--collection "documents"
}
```
### Workflow Integration
```bash
# Complete document workflow
process_document_workflow() {
local pdf="$1"
local name="$2"
echo "Starting document workflow for: $name"
# 1. Load PDF
tg-load-pdf --name "$name" "$pdf"
# 2. Wait for processing
sleep 5
# 3. Verify availability
if tg-show-library-documents | grep -q "$name"; then
echo "Document available in library"
# 4. Test RAG functionality
tg-invoke-document-rag -q "What is this document about?"
# 5. Extract key information
tg-invoke-prompt extract-key-points \
text="Document: $name" \
format="bullet_points"
else
echo "Document processing failed"
fi
}
```
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-add-library-document`](tg-add-library-document.md) - Add documents to library
- [`tg-start-library-processing`](tg-start-library-processing.md) - Process library documents
- [`tg-show-library-documents`](tg-show-library-documents.md) - List library documents
- [`tg-invoke-document-rag`](tg-invoke-document-rag.md) - Query document content
- [`tg-show-flows`](tg-show-flows.md) - Monitor processing flows
## API Integration
This command uses the document loading API to process PDF files and make them available for text extraction, search, and analysis.
## Best Practices
1. **Metadata Completeness**: Provide comprehensive metadata for better organization
2. **Collection Organization**: Use logical collections for document categorization
3. **Error Handling**: Implement robust error handling for batch operations
4. **Performance**: Consider file sizes and processing capacity
5. **Monitoring**: Verify successful loading and processing
6. **Security**: Ensure sensitive documents are properly protected
7. **Backup**: Maintain backups of source PDFs
## Troubleshooting
### PDF Processing Issues
```bash
# Check PDF validity
file document.pdf
pdfinfo document.pdf
# Try alternative PDF processors
qpdf --check document.pdf
```
### Memory Issues
```bash
# For large PDFs, monitor memory usage
free -h
# Consider processing large files separately
```
### Content Extraction Problems
```bash
# Verify PDF contains extractable text
pdftotext document.pdf test-output.txt
cat test-output.txt | head -20
```
# tg-load-sample-documents
Loads predefined sample documents into the TrustGraph library for testing and demonstration purposes.
## Synopsis
```bash
tg-load-sample-documents [options]
```
## Description
The `tg-load-sample-documents` command loads a curated set of sample documents into TrustGraph's document library. These documents include academic papers, government reports, and reference materials that demonstrate TrustGraph's capabilities and provide data for testing and evaluation.
The command downloads documents from public sources and adds them to the library with comprehensive metadata including RDF triples for semantic relationships.
## Options
### Optional Arguments
- `-u, --url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
- `-U, --user USER`: User ID for document ownership (default: `trustgraph`)
## Examples
### Basic Loading
```bash
tg-load-sample-documents
```
### Load with Custom User
```bash
tg-load-sample-documents -U "demo-user"
```
### Load to Custom Environment
```bash
tg-load-sample-documents -u http://demo.trustgraph.ai:8088/
```
## Sample Documents
The command loads the following sample documents:
### 1. NASA Challenger Report
- **Title**: Report of the Presidential Commission on the Space Shuttle Challenger Accident, Volume 1
- **Topics**: Safety engineering, space shuttle, NASA
- **Format**: PDF
- **Source**: NASA Technical Reports Server
- **Use Case**: Demonstrates technical document processing and safety analysis
### 2. Old Icelandic Dictionary
- **Title**: A Concise Dictionary of Old Icelandic
- **Topics**: Language, linguistics, Old Norse, grammar
- **Format**: PDF
- **Publication**: 1910, Clarendon Press
- **Use Case**: Historical document processing and linguistic analysis
### 3. US Intelligence Threat Assessment
- **Title**: Annual Threat Assessment of the U.S. Intelligence Community - March 2025
- **Topics**: National security, cyberthreats, geopolitics
- **Format**: PDF
- **Source**: Director of National Intelligence
- **Use Case**: Current affairs analysis and security research
### 4. Intelligence and State Policy
- **Title**: The Role of Intelligence and State Policies in International Security
- **Topics**: Intelligence, international security, state policy
- **Format**: PDF (sample)
- **Publication**: Cambridge Scholars Publishing, 2021
- **Use Case**: Academic research and policy analysis
### 5. Globalization and Intelligence
- **Title**: Beyond the Vigilant State: Globalisation and Intelligence
- **Topics**: Intelligence, globalization, security studies
- **Format**: PDF
- **Author**: Richard J. Aldrich
- **Use Case**: Academic paper analysis and research
## Use Cases
### Demo Environment Setup
```bash
# Set up demonstration environment
setup_demo_environment() {
echo "Setting up TrustGraph demo environment..."
# Initialize system
tg-init-trustgraph
# Load sample documents
echo "Loading sample documents..."
tg-load-sample-documents -U "demo"
# Wait for documents to finish loading into the library
echo "Waiting for document ingestion..."
sleep 60
# Start document processing
echo "Starting document processing..."
tg-show-library-documents -U "demo" | \
grep "| id" | \
awk '{print $3}' | \
while read doc_id; do
proc_id="demo_proc_$(date +%s)_${doc_id}"
tg-start-library-processing -d "$doc_id" --id "$proc_id" -U "demo"
done
echo "Demo environment ready!"
echo "Try: tg-invoke-document-rag -q 'What caused the Challenger accident?' -U demo"
}
```
### Testing Data Pipeline
```bash
# Test complete document processing pipeline
test_document_pipeline() {
echo "Testing document processing pipeline..."
# Load sample documents
tg-load-sample-documents -U "test"
# List loaded documents
echo "Loaded documents:"
tg-show-library-documents -U "test"
# Start processing for each document
tg-show-library-documents -U "test" | \
grep "| id" | \
awk '{print $3}' | \
while read doc_id; do
echo "Processing document: $doc_id"
proc_id="test_$(date +%s)_${doc_id}"
tg-start-library-processing -d "$doc_id" --id "$proc_id" -U "test"
done
# Wait for processing
echo "Processing documents... (this may take several minutes)"
sleep 300
# Test document queries
echo "Testing document queries..."
test_queries=(
"What is the Challenger accident?"
"What is Old Icelandic?"
"What are the main cybersecurity threats?"
"What is intelligence policy?"
)
for query in "${test_queries[@]}"; do
echo "Query: $query"
tg-invoke-document-rag -q "$query" -U "test" | head -5
echo "---"
done
echo "Pipeline test complete!"
}
```
### Educational Environment
```bash
# Set up educational/training environment
setup_educational_environment() {
local class_name="$1"
echo "Setting up educational environment for: $class_name"
# Create user for the class
class_user=$(echo "$class_name" | tr '[:upper:]' '[:lower:]' | tr ' ' '-')
# Load sample documents for the class
tg-load-sample-documents -U "$class_user"
# Process documents
echo "Processing documents for educational use..."
tg-show-library-documents -U "$class_user" | \
grep "| id" | \
awk '{print $3}' | \
while read doc_id; do
proc_id="edu_$(date +%s)_${doc_id}"
tg-start-library-processing \
-d "$doc_id" \
--id "$proc_id" \
-U "$class_user" \
--collection "education"
done
echo "Educational environment ready for: $class_name"
echo "User: $class_user"
echo "Collection: education"
}
# Set up for different classes
setup_educational_environment "AI Research Methods"
setup_educational_environment "Security Studies"
```
### Benchmarking and Performance Testing
```bash
# Benchmark document processing performance
benchmark_processing() {
echo "Starting document processing benchmark..."
# Load sample documents
start_time=$(date +%s)
tg-load-sample-documents -U "benchmark"
load_time=$(date +%s)
echo "Document loading time: $((load_time - start_time))s"
# Count documents
doc_count=$(tg-show-library-documents -U "benchmark" | grep -c "| id")
echo "Documents loaded: $doc_count"
# Start processing
tg-show-library-documents -U "benchmark" | \
grep "| id" | \
awk '{print $3}' | \
while read doc_id; do
proc_id="bench_$(date +%s)_${doc_id}"
tg-start-library-processing -d "$doc_id" --id "$proc_id" -U "benchmark"
done
processing_start=$(date +%s)
# Monitor processing completion
echo "Monitoring processing completion..."
while true; do
active_processing=$(tg-show-flows | grep -c "bench_")
if [ "$active_processing" -eq 0 ]; then
break
fi
echo "Active processing jobs: $active_processing"
sleep 30
done
processing_end=$(date +%s)
echo "Processing completion time: $((processing_end - processing_start))s"
echo "Total benchmark time: $((processing_end - start_time))s"
# Test query performance
echo "Testing query performance..."
query_start=$(date +%s)
for i in {1..10}; do
tg-invoke-document-rag \
-q "What are the main topics in these documents?" \
-U "benchmark" > /dev/null
done
query_end=$(date +%s)
echo "Average query time: $(echo "scale=2; ($query_end - $query_start) / 10" | bc)s"
}
```
## Advanced Usage
### Selective Document Loading
```bash
# Load only specific types of documents
load_by_category() {
local category="$1"
case "$category" in
"government")
echo "Loading government documents..."
# This would require modifying the script to load selectively
# For now, we load all and filter by tags later
tg-load-sample-documents -U "gov-docs"
;;
"academic")
echo "Loading academic documents..."
tg-load-sample-documents -U "academic-docs"
;;
"historical")
echo "Loading historical documents..."
tg-load-sample-documents -U "historical-docs"
;;
*)
echo "Loading all sample documents..."
tg-load-sample-documents
;;
esac
}
# Load by category
load_by_category "government"
load_by_category "academic"
```
### Multi-Environment Loading
```bash
# Load sample documents to multiple environments
multi_environment_setup() {
local environments=("dev" "staging" "demo")
for env in "${environments[@]}"; do
echo "Setting up $env environment..."
tg-load-sample-documents \
-u "http://$env.trustgraph.company.com:8088/" \
-U "sample-data"
echo "✓ $env environment loaded"
done
echo "All environments loaded with sample documents"
}
```
### Custom Document Sets
```bash
# Create custom document loading scripts based on the sample
create_custom_loader() {
local domain="$1"
cat > "load-${domain}-documents.py" << 'EOF'
#!/usr/bin/env python3
"""
Custom document loader for specific domain
Based on tg-load-sample-documents
"""
import argparse
import os
from trustgraph.api import Api
# Define your own document set here
documents = [
{
"id": "https://example.com/doc/custom-1",
"title": "Custom Document 1",
"url": "https://example.com/docs/custom1.pdf",
# Add your document definitions...
}
]
# Rest of the implementation similar to tg-load-sample-documents
EOF
echo "Custom loader created: load-${domain}-documents.py"
}
# Create custom loaders for different domains
create_custom_loader "medical"
create_custom_loader "legal"
create_custom_loader "technical"
```
## Document Analysis
### Content Analysis
```bash
# Analyze loaded sample documents
analyze_sample_documents() {
echo "Analyzing sample documents..."
# Get document statistics
total_docs=$(tg-show-library-documents | grep -c "| id")
echo "Total documents: $total_docs"
# Analyze by type
echo "Document types:"
tg-show-library-documents | \
grep "| kind" | \
awk '{print $3}' | \
sort | uniq -c
# Analyze tags
echo "Popular tags:"
tg-show-library-documents | \
grep "| tags" | \
sed 's/.*| tags.*| \(.*\) |.*/\1/' | \
tr ',' '\n' | \
sed 's/^ *//;s/ *$//' | \
sort | uniq -c | sort -nr | head -10
# Document sizes (would need additional API)
echo "Document analysis complete"
}
```
### Query Testing
```bash
# Test sample documents with various queries
test_sample_queries() {
echo "Testing sample document queries..."
# Define test queries for different domains
queries=(
"What caused the Challenger space shuttle accident?"
"What is Old Norse language?"
"What are current cybersecurity threats?"
"How does globalization affect intelligence services?"
"What are the main security challenges in international relations?"
)
for query in "${queries[@]}"; do
echo "Testing query: $query"
echo "===================="
result=$(tg-invoke-document-rag -q "$query" 2>/dev/null)
if [ $? -eq 0 ]; then
echo "$result" | head -3
echo "✓ Query successful"
else
echo "✗ Query failed"
fi
echo ""
done
}
```
## Error Handling
### Network Issues
```bash
Exception: Connection failed during download
```
**Solution**: Check internet connectivity and retry. Documents are cached locally after first download.
### Insufficient Storage
```bash
Exception: No space left on device
```
**Solution**: Free up disk space. Sample documents total approximately 50-100MB.
### API Connection Issues
```bash
Exception: Connection refused
```
**Solution**: Verify TrustGraph API is running and accessible.
### Processing Failures
```bash
Exception: Document processing failed
```
**Solution**: Check TrustGraph service logs and ensure all components are running.
## Monitoring and Validation
### Loading Progress
```bash
# Monitor sample document loading
monitor_sample_loading() {
echo "Starting sample document loading with monitoring..."
# Start loading in background
tg-load-sample-documents &
load_pid=$!
# Monitor progress
while kill -0 $load_pid 2>/dev/null; do
doc_count=$(tg-show-library-documents 2>/dev/null | grep -c "| id")
echo "Documents loaded so far: $doc_count"
sleep 10
done
wait $load_pid
if [ $? -eq 0 ]; then
final_count=$(tg-show-library-documents | grep -c "| id")
echo "✓ Loading completed successfully"
echo "Total documents loaded: $final_count"
else
echo "✗ Loading failed"
fi
}
```
### Validation
```bash
# Validate sample document loading
validate_sample_loading() {
echo "Validating sample document loading..."
# Expected document count (based on current sample set)
expected_docs=5
# Check actual count
actual_docs=$(tg-show-library-documents | grep -c "| id")
if [ "$actual_docs" -eq "$expected_docs" ]; then
echo "✓ Document count correct: $actual_docs"
else
echo "⚠ Document count mismatch: expected $expected_docs, got $actual_docs"
fi
# Check for expected documents
expected_titles=(
"Challenger"
"Icelandic"
"Intelligence"
"Threat Assessment"
"Vigilant State"
)
for title in "${expected_titles[@]}"; do
if tg-show-library-documents | grep -q "$title"; then
echo "✓ Found document containing: $title"
else
echo "✗ Missing document containing: $title"
fi
done
echo "Validation complete"
}
```
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-show-library-documents`](tg-show-library-documents.md) - List loaded documents
- [`tg-start-library-processing`](tg-start-library-processing.md) - Process loaded documents
- [`tg-invoke-document-rag`](tg-invoke-document-rag.md) - Query processed documents
- [`tg-load-pdf`](tg-load-pdf.md) - Load individual PDF documents
## API Integration
This command uses the [Library API](../apis/api-librarian.md) to add sample documents to TrustGraph's document repository.
## Best Practices
1. **Demo Preparation**: Use for setting up demonstration environments
2. **Testing**: Ideal for testing document processing pipelines
3. **Education**: Excellent for training and educational purposes
4. **Development**: Use in development environments for consistent test data
5. **Benchmarking**: Suitable for performance testing and optimization
6. **Documentation**: Great for documenting TrustGraph capabilities
## Troubleshooting
### Download Failures
```bash
# Check document URLs are accessible
curl -I "https://ntrs.nasa.gov/api/citations/19860015255/downloads/19860015255.pdf"
# Check local cache
ls -la doc-cache/
```
### Processing Issues
```bash
# Check document processing status
tg-show-library-processing
# Verify documents are in library
tg-show-library-documents | grep -E "(Challenger|Icelandic|Intelligence)"
```
### Performance Problems
```bash
# Monitor system resources during loading
top
df -h
```
# tg-load-text
Loads text documents into TrustGraph processing pipelines with rich metadata support.
## Synopsis
```bash
tg-load-text [options] file1 [file2 ...]
```
## Description
The `tg-load-text` command loads text documents into TrustGraph for processing. It creates a SHA256 hash-based document ID and supports comprehensive metadata including copyright information, publication details, and keywords.
**Note**: Consider using `tg-add-library-document` followed by `tg-start-library-processing` for better document management and processing control.
## Options
### Connection & Flow
- `-u, --url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
- `-f, --flow-id FLOW`: Flow ID for processing (default: `default`)
- `-U, --user USER`: User identifier (default: `trustgraph`)
- `-C, --collection COLLECTION`: Collection identifier (default: `default`)
### Document Metadata
- `--name NAME`: Document name/title
- `--description DESCRIPTION`: Document description
- `--document-url URL`: Document source URL
### Copyright Information
- `--copyright-notice NOTICE`: Copyright notice text
- `--copyright-holder HOLDER`: Copyright holder name
- `--copyright-year YEAR`: Copyright year
- `--license LICENSE`: Copyright license
### Publication Information
- `--publication-organization ORG`: Publishing organization
- `--publication-description DESC`: Publication description
- `--publication-date DATE`: Publication date
### Keywords
- `--keyword KEYWORD [KEYWORD ...]`: Document keywords (can specify multiple)
## Arguments
- `file1 [file2 ...]`: One or more text files to load
## Examples
### Basic Document Loading
```bash
tg-load-text document.txt
```
### Loading with Metadata
```bash
tg-load-text \
--name "Research Paper on AI" \
--description "Comprehensive study of machine learning algorithms" \
--keyword "AI" "machine learning" "research" \
research-paper.txt
```
### Complete Metadata Example
```bash
tg-load-text \
--name "TrustGraph Documentation" \
--description "Complete user guide for TrustGraph system" \
--copyright-holder "TrustGraph Project" \
--copyright-year "2024" \
--license "MIT" \
--publication-organization "TrustGraph Foundation" \
--publication-date "2024-01-15" \
--keyword "documentation" "guide" "tutorial" \
--flow-id research-flow \
trustgraph-guide.txt
```
### Multiple Files
```bash
tg-load-text chapter1.txt chapter2.txt chapter3.txt
```
### Custom Flow and Collection
```bash
tg-load-text \
--flow-id medical-research \
--user researcher \
--collection medical-papers \
medical-study.txt
```
## Output
For each file processed, the command outputs:
### Success
```
document.txt: Loaded successfully.
```
### Failure
```
document.txt: Failed: Connection refused
```
## Document Processing
1. **File Reading**: Reads the text file content
2. **Hash Generation**: Creates SHA256 hash for unique document ID
3. **URI Creation**: Converts hash to document URI format
4. **Metadata Assembly**: Combines all metadata into RDF triples
5. **API Submission**: Sends to TrustGraph via Text Load API
## Document ID Generation
Documents are assigned IDs based on their content hash:
- SHA256 hash of file content
- Converted to TrustGraph document URI format
- Example: `http://trustgraph.ai/d/abc123...`
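The derivation can be reproduced locally to predict a document's URI before loading — a sketch assuming the URI prefix shown in the example above:

```bash
# Sketch: derive the hash-based document URI for a file
# (prefix taken from the example above; content is illustrative).
printf 'example text content' > /tmp/doc.txt
hash=$(sha256sum /tmp/doc.txt | awk '{print $1}')
doc_uri="http://trustgraph.ai/d/$hash"
echo "$doc_uri"
```

Loading the same file twice therefore yields the same document URI.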
## Metadata Format
The metadata is stored as RDF triples including:
### Standard Properties
- `dc:title`: Document name
- `dc:description`: Document description
- `dc:creator`: Copyright holder
- `dc:date`: Publication date
- `dc:rights`: Copyright notice
- `dc:license`: License information
### Keywords
- `dc:subject`: Each keyword as separate triple
### Organization Information
- `foaf:Organization`: Publication organization details
## Error Handling
### File Errors
```bash
document.txt: Failed: No such file or directory
```
**Solution**: Verify the file path exists and is readable.
### Connection Errors
```bash
document.txt: Failed: Connection refused
```
**Solution**: Check the API URL and ensure TrustGraph is running.
### Flow Errors
```bash
document.txt: Failed: Invalid flow
```
**Solution**: Verify the flow exists and is running using `tg-show-flows`.
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-add-library-document`](tg-add-library-document.md) - Add documents to library (recommended)
- [`tg-load-pdf`](tg-load-pdf.md) - Load PDF documents
- [`tg-show-library-documents`](tg-show-library-documents.md) - List loaded documents
- [`tg-start-library-processing`](tg-start-library-processing.md) - Start document processing
## API Integration
This command uses the [Text Load API](../apis/api-text-load.md) to submit documents for processing. The text content is base64-encoded for transmission.
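The encoding step can be reproduced with standard tools. A sketch of the round trip (GNU `base64` flags assumed; on macOS use `-b 0` in place of `-w0`):

```bash
# Sketch: base64-encode file content the way the client does before
# submission (GNU base64; -w0 disables line wrapping).
printf 'hello, TrustGraph' > /tmp/sample.txt
payload=$(base64 -w0 /tmp/sample.txt)
echo "$payload"
# Round-trip check: decoding restores the original text
printf '%s' "$payload" | base64 -d
```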
## Use Cases
### Academic Research
```bash
tg-load-text \
--name "Climate Change Impact Study" \
--publication-organization "University Research Center" \
--keyword "climate" "research" "environment" \
climate-study.txt
```
### Corporate Documentation
```bash
tg-load-text \
--name "Product Manual" \
--copyright-holder "Acme Corp" \
--license "Proprietary" \
--keyword "manual" "product" "guide" \
product-manual.txt
```
### Technical Documentation
```bash
tg-load-text \
--name "API Reference" \
--description "Complete API documentation" \
--keyword "API" "reference" "technical" \
api-docs.txt
```
## Best Practices
1. **Use Descriptive Names**: Provide clear document names and descriptions
2. **Add Keywords**: Include relevant keywords for better searchability
3. **Complete Metadata**: Fill in copyright and publication information
4. **Batch Processing**: Load multiple related files together
5. **Use Collections**: Organize documents by topic or project using collections
# tg-load-turtle
Loads RDF triples from Turtle files into the TrustGraph knowledge graph.
## Synopsis
```bash
tg-load-turtle -i DOCUMENT_ID [options] file1.ttl [file2.ttl ...]
```
## Description
The `tg-load-turtle` command loads RDF triples from Turtle (TTL) format files into TrustGraph's knowledge graph. It parses Turtle files, converts them to TrustGraph's internal triple format, and imports them using WebSocket connections for efficient batch processing.
The command supports retry logic and automatic reconnection to handle network interruptions during large data imports.
## Options
### Required Arguments
- `-i, --document-id ID`: Document ID to associate with the triples
- `files`: One or more Turtle files to load
### Optional Arguments
- `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `ws://localhost:8088/`)
- `-f, --flow-id ID`: Flow instance ID to use (default: `default`)
- `-U, --user USER`: User ID for triple ownership (default: `trustgraph`)
- `-C, --collection COLLECTION`: Collection to assign triples (default: `default`)
## Examples
### Basic Turtle Loading
```bash
tg-load-turtle -i "doc123" knowledge-base.ttl
```
### Multiple Files
```bash
tg-load-turtle -i "ontology-v1" \
schema.ttl \
instances.ttl \
relationships.ttl
```
### Custom Flow and Collection
```bash
tg-load-turtle \
-i "research-data" \
-f "knowledge-import-flow" \
-U "research-team" \
-C "research-kg" \
research-triples.ttl
```
### Load with Custom API URL
```bash
tg-load-turtle \
-i "production-data" \
-u "ws://production:8088/" \
production-ontology.ttl
```
## Turtle Format Support
### Basic Triples
```turtle
@prefix ex: <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ex:Person rdf:type rdfs:Class .
ex:john rdf:type ex:Person .
ex:john ex:name "John Doe" .
ex:john ex:age "30"^^xsd:integer .
```
### Complex Structures
```turtle
@prefix org: <http://example.org/organization/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
org:TechCorp rdf:type foaf:Organization ;
foaf:name "Technology Corporation" ;
org:hasEmployee org:john, org:jane ;
org:foundedYear "2010"^^xsd:gYear .
org:john foaf:name "John Smith" ;
foaf:mbox <mailto:john@techcorp.com> ;
org:position "Software Engineer" .
```
### Ontology Loading
```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex: <http://example.org/> .
<http://example.org/ontology> rdf:type owl:Ontology ;
dc:title "Example Ontology" ;
dc:creator "Knowledge Team" .
ex:Vehicle rdf:type owl:Class ;
rdfs:label "Vehicle" ;
rdfs:comment "A means of transportation" .
ex:Car rdfs:subClassOf ex:Vehicle .
ex:Truck rdfs:subClassOf ex:Vehicle .
```
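Putting the format and the loader together, a minimal end-to-end sketch (document ID and triple content are illustrative):

```bash
# Sketch: write a minimal Turtle file and load it. The document ID is
# illustrative; the `|| echo` keeps the sketch from aborting a script
# when no TrustGraph instance is reachable.
cat > /tmp/hello.ttl << 'EOF'
@prefix ex: <http://example.org/> .
ex:hello ex:says "world" .
EOF
tg-load-turtle -i "hello-demo" /tmp/hello.ttl || echo "load failed (is TrustGraph running?)"
```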
## Data Processing
### Triple Conversion
The loader converts Turtle triples to TrustGraph format:
- **URIs**: Converted to URI references with `is_uri=true`
- **Literals**: Converted to literal values with `is_uri=false`
- **Datatypes**: Preserved in literal values
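The mapping in the bullets above can be sketched in Python. This is an illustration only, not the loader's actual implementation; the field names follow the `{"value": ..., "is_uri": ...}` structure shown in the `tg-put-kg-core` MessagePack documentation:

```python
def to_tg_value(value: str, is_uri: bool) -> dict:
    # URIs become URI references (is_uri=true); literals keep their
    # lexical form, with any datatype preserved inside the value
    return {"value": value, "is_uri": is_uri}

def to_tg_triple(s: str, p: str, o: str, o_is_uri: bool = False) -> dict:
    # Subjects and predicates are always URIs in RDF;
    # objects may be either URIs or literals
    return {
        "s": to_tg_value(s, True),
        "p": to_tg_value(p, True),
        "o": to_tg_value(o, o_is_uri),
    }
```

For example, `to_tg_triple("http://example.org/john", "http://example.org/name", "John Doe")` produces a triple whose object is a literal value.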
### Batch Processing
- Triples are sent individually via WebSocket
- Each triple includes document metadata
- Automatic retry on connection failures
- Progress tracking for large files
### Error Handling
- Invalid Turtle syntax causes parsing errors
- Network interruptions trigger automatic retry
- Malformed triples are skipped with warnings
## Use Cases
### Ontology Import
```bash
# Load domain ontology
tg-load-turtle -i "healthcare-ontology" \
-C "ontologies" \
healthcare-schema.ttl
# Load instance data
tg-load-turtle -i "patient-data" \
-C "healthcare-data" \
patient-records.ttl
```
### Knowledge Base Migration
```bash
# Migrate from external knowledge base
tg-load-turtle -i "migration-$(date +%Y%m%d)" \
-C "migrated-data" \
exported-knowledge.ttl
```
### Research Data Loading
```bash
# Load research datasets
datasets=("publications" "authors" "citations")
for dataset in "${datasets[@]}"; do
tg-load-turtle -i "research-$dataset" \
-C "research-data" \
"$dataset.ttl"
done
```
### Structured Data Import
```bash
# Load structured data from various sources
tg-load-turtle -i "products" -C "catalog" product-catalog.ttl
tg-load-turtle -i "customers" -C "crm" customer-data.ttl
tg-load-turtle -i "orders" -C "transactions" order-history.ttl
```
## Advanced Usage
### Batch Processing Multiple Files
```bash
# Process all Turtle files in directory
for ttl in *.ttl; do
doc_id=$(basename "$ttl" .ttl)
echo "Loading $ttl as document $doc_id..."
tg-load-turtle -i "$doc_id" \
-C "bulk-import-$(date +%Y%m%d)" \
"$ttl"
done
```
### Parallel Loading
```bash
# Load multiple files in parallel
ttl_files=(schema.ttl instances.ttl relationships.ttl)
for ttl in "${ttl_files[@]}"; do
(
doc_id=$(basename "$ttl" .ttl)
echo "Loading $ttl in background..."
tg-load-turtle -i "parallel-$doc_id" \
-C "parallel-import" \
"$ttl"
) &
done
wait
echo "All files loaded"
```
### Size-Based Processing
```bash
# Handle large files differently
for ttl in *.ttl; do
size=$(stat -c%s "$ttl")
doc_id=$(basename "$ttl" .ttl)
if [ $size -lt 10485760 ]; then # < 10MB
echo "Processing small file: $ttl"
tg-load-turtle -i "$doc_id" -C "small-files" "$ttl"
else
echo "Processing large file: $ttl"
# Use dedicated collection for large files
tg-load-turtle -i "$doc_id" -C "large-files" "$ttl"
fi
done
```
### Validation and Loading
```bash
# Validate before loading
validate_and_load() {
local ttl_file="$1"
local doc_id="$2"
echo "Validating $ttl_file..."
# Check Turtle syntax
if rapper -q -i turtle "$ttl_file" > /dev/null 2>&1; then
echo "✓ Valid Turtle syntax"
# Count triples (rapper reports the count on stderr)
triple_count=$(rapper -i turtle -c "$ttl_file" 2>&1 | sed -n 's/.*returned \([0-9]*\) triples.*/\1/p')
echo " Triples: $triple_count"
# Load if valid
echo "Loading $ttl_file..."
tg-load-turtle -i "$doc_id" -C "validated-data" "$ttl_file"
else
echo "✗ Invalid Turtle syntax in $ttl_file"
return 1
fi
}
# Validate and load all files
for ttl in *.ttl; do
doc_id=$(basename "$ttl" .ttl)
validate_and_load "$ttl" "$doc_id"
done
```
## Error Handling
### Invalid Turtle Syntax
```bash
Exception: Turtle parsing failed
```
**Solution**: Validate Turtle syntax with tools like `rapper` or `rdflib`.
### Document ID Required
```bash
Exception: Document ID is required
```
**Solution**: Provide document ID with `-i` option.
### WebSocket Connection Issues
```bash
Exception: WebSocket connection failed
```
**Solution**: Check API URL and ensure TrustGraph WebSocket service is running.
### File Not Found
```bash
Exception: [Errno 2] No such file or directory
```
**Solution**: Verify file paths and ensure Turtle files exist.
### Flow Not Found
```bash
Exception: Flow instance not found
```
**Solution**: Verify flow ID with `tg-show-flows`.
## Monitoring and Verification
### Load Progress Tracking
```bash
# Monitor loading progress
monitor_load() {
local ttl_file="$1"
local doc_id="$2"
echo "Starting load: $ttl_file"
start_time=$(date +%s)
tg-load-turtle -i "$doc_id" -C "monitored" "$ttl_file"
end_time=$(date +%s)
duration=$((end_time - start_time))
echo "Load completed in ${duration}s"
# Verify data is accessible (use a subject URI known to exist in the file)
if tg-triples-query -s "http://example.org/test" > /dev/null 2>&1; then
echo "✓ Data accessible via query"
else
echo "✗ Data not accessible"
fi
}
```
### Data Verification
```bash
# Verify loaded triples
verify_triples() {
local collection="$1"
local expected_count="$2"
echo "Verifying triples in collection: $collection"
# Query for triples
actual_count=$(tg-triples-query -C "$collection" | wc -l)
if [ "$actual_count" -ge "$expected_count" ]; then
echo "✓ Expected triples found ($actual_count >= $expected_count)"
else
echo "✗ Missing triples ($actual_count < $expected_count)"
return 1
fi
}
```
### Content Analysis
```bash
# Analyze loaded content
analyze_turtle_content() {
local ttl_file="$1"
echo "Analyzing content: $ttl_file"
# Extract prefixes
echo "Prefixes:"
grep "^@prefix" "$ttl_file" | head -5
# Count statements (approximate: lines ending with the "." terminator)
statement_count=$(grep -c '\.[[:space:]]*$' "$ttl_file")
echo "Statements: $statement_count"
# Extract subjects
echo "Sample subjects:"
grep -o "^[^[:space:]]*" "$ttl_file" | grep -v "^@" | sort | uniq | head -5
}
```
## Performance Optimization
### Connection Pooling
```bash
# Reuse WebSocket connections for multiple files
load_batch_optimized() {
local collection="$1"
shift
local files=("$@")
echo "Loading ${#files[@]} files to collection: $collection"
# Process files in batches to reuse connections
for ((i=0; i<${#files[@]}; i+=5)); do
batch=("${files[@]:$i:5}")
echo "Processing batch $((i/5 + 1))..."
for ttl in "${batch[@]}"; do
doc_id=$(basename "$ttl" .ttl)
tg-load-turtle -i "$doc_id" -C "$collection" "$ttl" &
done
wait
done
}
```
### Memory Management
```bash
# Handle large files with memory monitoring
load_with_memory_check() {
local ttl_file="$1"
local doc_id="$2"
# Check available memory
available=$(free -m | awk 'NR==2{print $7}')
if [ "$available" -lt 1000 ]; then
echo "Warning: Low memory ($available MB). Consider splitting file."
fi
# Monitor memory during load
tg-load-turtle -i "$doc_id" -C "memory-monitored" "$ttl_file" &
load_pid=$!
while kill -0 $load_pid 2>/dev/null; do
memory_usage=$(ps -p $load_pid -o rss= | awk '{print $1/1024}')
echo "Memory usage: ${memory_usage}MB"
sleep 5
done
}
```
## Data Preparation
### Turtle File Preparation
```bash
# Clean and prepare Turtle files
prepare_turtle() {
local input_file="$1"
local output_file="$2"
echo "Preparing $input_file -> $output_file"
# Remove comments and empty lines
sed '/^#/d; /^$/d' "$input_file" > "$output_file"
# Validate output
if rapper -q -i turtle "$output_file" > /dev/null 2>&1; then
echo "✓ Prepared file is valid"
else
echo "✗ Prepared file is invalid"
return 1
fi
}
```
### Data Splitting
```bash
# Split large Turtle files
split_turtle() {
local input_file="$1"
local lines_per_file="$2"
echo "Splitting $input_file into chunks of $lines_per_file lines"
# Split the file, preserving @prefix declarations in every chunk
# (note: a multi-line statement straddling a chunk boundary will still break)
base=$(basename "$input_file" .ttl)
grep "^@prefix" "$input_file" > "${base}_prefixes.ttl"
grep -v "^@prefix" "$input_file" | split -l "$lines_per_file" - "${base}_part_"
# Prepend the prefixes and add a .ttl extension to each part
for part in "${base}_part_"*; do
cat "${base}_prefixes.ttl" "$part" > "$part.ttl"
rm "$part"
done
rm "${base}_prefixes.ttl"
}
```
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL (WebSocket format)
## Related Commands
- [`tg-triples-query`](tg-triples-query.md) - Query loaded triples
- [`tg-graph-to-turtle`](tg-graph-to-turtle.md) - Export graph to Turtle format
- [`tg-show-flows`](tg-show-flows.md) - Monitor processing flows
- [`tg-load-pdf`](tg-load-pdf.md) - Load document content
## API Integration
This command uses TrustGraph's WebSocket-based triple import API for efficient batch loading of RDF data.
## Best Practices
1. **Validation**: Always validate Turtle syntax before loading
2. **Document IDs**: Use meaningful, unique document identifiers
3. **Collections**: Organize triples into logical collections
4. **Error Handling**: Implement retry logic for network issues
5. **Performance**: Consider file sizes and system resources
6. **Monitoring**: Track loading progress and verify results
7. **Backup**: Maintain backups of source Turtle files
## Troubleshooting
### WebSocket Connection Issues
```bash
# Test WebSocket connectivity
wscat -c ws://localhost:8088/api/v1/flow/default/import/triples
# Check WebSocket service status
tg-show-flows | grep -i websocket
```
### Parsing Errors
```bash
# Validate Turtle syntax
rapper -i turtle -q file.ttl
# Check for common issues
grep -n "^[[:space:]]*@prefix" file.ttl # Check prefixes
grep -n "\.$" file.ttl | head -5 # Check statement terminators
```
### Memory Issues
```bash
# Monitor memory usage
free -h
ps aux | grep tg-load-turtle
# Split large files if needed
split -l 10000 large-file.ttl chunk_
```

# tg-put-flow-class
Uploads or updates a flow class definition in TrustGraph.
## Synopsis
```bash
tg-put-flow-class -n CLASS_NAME -c CONFIG_JSON [options]
```
## Description
The `tg-put-flow-class` command creates or updates a flow class definition in TrustGraph. Flow classes are templates that define processing pipeline configurations, service interfaces, and resource requirements. These classes are used by `tg-start-flow` to create running flow instances.
Flow classes define the structure and capabilities of processing flows, including which services are available and how they connect to Pulsar queues.
## Options
### Required Arguments
- `-n, --class-name CLASS_NAME`: Name for the flow class
- `-c, --config CONFIG_JSON`: Flow class configuration as raw JSON string
### Optional Arguments
- `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
## Examples
### Basic Flow Class Creation
```bash
tg-put-flow-class \
-n "simple-processing" \
-c '{"description": "Simple text processing flow", "interfaces": {"text-completion": {"request": "non-persistent://tg/request/text-completion:simple", "response": "non-persistent://tg/response/text-completion:simple"}}}'
```
### Document Processing Flow Class
```bash
tg-put-flow-class \
-n "document-analysis" \
-c '{
"description": "Document analysis and RAG processing",
"interfaces": {
"document-rag": {
"request": "non-persistent://tg/request/document-rag:doc-analysis",
"response": "non-persistent://tg/response/document-rag:doc-analysis"
},
"text-load": "persistent://tg/flow/text-document-load:doc-analysis",
"document-load": "persistent://tg/flow/document-load:doc-analysis"
}
}'
```
### Loading from File
```bash
# Create configuration file
cat > research-flow.json << 'EOF'
{
"description": "Research analysis flow with multiple AI services",
"interfaces": {
"agent": {
"request": "non-persistent://tg/request/agent:research",
"response": "non-persistent://tg/response/agent:research"
},
"graph-rag": {
"request": "non-persistent://tg/request/graph-rag:research",
"response": "non-persistent://tg/response/graph-rag:research"
},
"document-rag": {
"request": "non-persistent://tg/request/document-rag:research",
"response": "non-persistent://tg/response/document-rag:research"
},
"embeddings": {
"request": "non-persistent://tg/request/embeddings:research",
"response": "non-persistent://tg/response/embeddings:research"
},
"text-load": "persistent://tg/flow/text-document-load:research",
"triples-store": "persistent://tg/flow/triples-store:research"
}
}
EOF
# Upload the flow class
tg-put-flow-class -n "research-analysis" -c "$(cat research-flow.json)"
```
### Update Existing Flow Class
```bash
# Modify existing flow class by adding new service
tg-put-flow-class \
-n "existing-flow" \
-c '{
"description": "Updated flow with new capabilities",
"interfaces": {
"text-completion": {
"request": "non-persistent://tg/request/text-completion:updated",
"response": "non-persistent://tg/response/text-completion:updated"
},
"prompt": {
"request": "non-persistent://tg/request/prompt:updated",
"response": "non-persistent://tg/response/prompt:updated"
}
}
}'
```
## Flow Class Configuration Format
### Required Fields
#### Description
```json
{
"description": "Human-readable description of the flow class"
}
```
#### Interfaces
```json
{
"interfaces": {
"service-name": "queue-definition-or-object"
}
}
```
### Interface Types
#### Request/Response Services
Services that accept requests and return responses:
```json
{
"service-name": {
"request": "pulsar-queue-url",
"response": "pulsar-queue-url"
}
}
```
Examples:
- `agent`
- `graph-rag`
- `document-rag`
- `text-completion`
- `prompt`
- `embeddings`
- `graph-embeddings`
- `triples`
#### Fire-and-Forget Services
Services that accept data without returning responses:
```json
{
"service-name": "pulsar-queue-url"
}
```
Examples:
- `text-load`
- `document-load`
- `triples-store`
- `graph-embeddings-store`
- `document-embeddings-store`
- `entity-contexts-load`
### Queue Naming Conventions
#### Request/Response Queues
```
non-persistent://tg/request/{service}:{flow-identifier}
non-persistent://tg/response/{service}:{flow-identifier}
```
#### Fire-and-Forget Queues
```
persistent://tg/flow/{service}:{flow-identifier}
```
## Complete Example
### Comprehensive Flow Class
```bash
tg-put-flow-class \
-n "full-processing-pipeline" \
-c '{
"description": "Complete document processing and analysis pipeline",
"interfaces": {
"agent": {
"request": "non-persistent://tg/request/agent:full-pipeline",
"response": "non-persistent://tg/response/agent:full-pipeline"
},
"graph-rag": {
"request": "non-persistent://tg/request/graph-rag:full-pipeline",
"response": "non-persistent://tg/response/graph-rag:full-pipeline"
},
"document-rag": {
"request": "non-persistent://tg/request/document-rag:full-pipeline",
"response": "non-persistent://tg/response/document-rag:full-pipeline"
},
"text-completion": {
"request": "non-persistent://tg/request/text-completion:full-pipeline",
"response": "non-persistent://tg/response/text-completion:full-pipeline"
},
"prompt": {
"request": "non-persistent://tg/request/prompt:full-pipeline",
"response": "non-persistent://tg/response/prompt:full-pipeline"
},
"embeddings": {
"request": "non-persistent://tg/request/embeddings:full-pipeline",
"response": "non-persistent://tg/response/embeddings:full-pipeline"
},
"graph-embeddings": {
"request": "non-persistent://tg/request/graph-embeddings:full-pipeline",
"response": "non-persistent://tg/response/graph-embeddings:full-pipeline"
},
"triples": {
"request": "non-persistent://tg/request/triples:full-pipeline",
"response": "non-persistent://tg/response/triples:full-pipeline"
},
"text-load": "persistent://tg/flow/text-document-load:full-pipeline",
"document-load": "persistent://tg/flow/document-load:full-pipeline",
"triples-store": "persistent://tg/flow/triples-store:full-pipeline",
"graph-embeddings-store": "persistent://tg/flow/graph-embeddings-store:full-pipeline",
"document-embeddings-store": "persistent://tg/flow/document-embeddings-store:full-pipeline",
"entity-contexts-load": "persistent://tg/flow/entity-contexts-load:full-pipeline"
}
}'
```
## Output
Successful upload typically produces no output:
```bash
# Upload flow class (no output expected)
tg-put-flow-class -n "my-flow" -c '{"description": "test", "interfaces": {}}'
# Verify upload
tg-show-flow-classes | grep "my-flow"
```
## Error Handling
### Invalid JSON Format
```bash
Exception: Invalid JSON in config parameter
```
**Solution**: Validate JSON syntax using tools like `jq` or online JSON validators.
### Missing Required Fields
```bash
Exception: Missing required field 'description'
```
**Solution**: Ensure configuration includes all required fields (description, interfaces).
### Invalid Queue Names
```bash
Exception: Invalid queue URL format
```
**Solution**: Verify queue URLs follow the correct Pulsar format with proper tenant/namespace.
### Connection Errors
```bash
Exception: Connection refused
```
**Solution**: Check the API URL and ensure TrustGraph is running.
## Validation
### JSON Syntax Check
```bash
# Validate JSON before uploading
config='{"description": "test flow", "interfaces": {}}'
echo "$config" | jq . > /dev/null && echo "Valid JSON" || echo "Invalid JSON"
```
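Beyond raw JSON syntax, the structural rules described above can also be checked before upload. A client-side sketch; the server's actual validation may be stricter:

```python
import json

def validate_flow_class(config_json: str) -> list:
    """Return a list of problems with a flow class config (empty means OK)."""
    try:
        config = json.loads(config_json)
    except json.JSONDecodeError as e:
        return ["invalid JSON: %s" % e]
    errors = []
    if "description" not in config:
        errors.append("missing required field 'description'")
    interfaces = config.get("interfaces")
    if not isinstance(interfaces, dict):
        errors.append("missing or invalid 'interfaces' object")
    else:
        for name, spec in interfaces.items():
            if isinstance(spec, str):
                continue  # fire-and-forget: a single queue URL
            if isinstance(spec, dict) and {"request", "response"} <= set(spec):
                continue  # request/response service
            errors.append(
                "interface '%s' must be a queue URL or a "
                "request/response object" % name)
    return errors
```

Run this over a configuration string before passing it to `tg-put-flow-class -c`.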
### Flow Class Verification
```bash
# After uploading, verify the flow class exists
tg-show-flow-classes | grep "my-flow-class"
# Get the flow class definition to verify content
tg-get-flow-class -n "my-flow-class"
```
## Flow Class Lifecycle
### Development Workflow
```bash
# 1. Create flow class
tg-put-flow-class -n "dev-flow" -c "$dev_config"
# 2. Test with flow instance
tg-start-flow -n "dev-flow" -i "test-instance" -d "Testing"
# 3. Update flow class as needed
tg-put-flow-class -n "dev-flow" -c "$updated_config"
# 4. Restart flow instance with updates
tg-stop-flow -i "test-instance"
tg-start-flow -n "dev-flow" -i "test-instance" -d "Testing updated"
```
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-get-flow-class`](tg-get-flow-class.md) - Retrieve flow class definitions
- [`tg-show-flow-classes`](tg-show-flow-classes.md) - List available flow classes
- [`tg-delete-flow-class`](tg-delete-flow-class.md) - Remove flow class definitions
- [`tg-start-flow`](tg-start-flow.md) - Create flow instances from classes
## API Integration
This command uses the [Flow API](../apis/api-flow.md) with the `put-class` operation to store flow class definitions.
## Use Cases
### Custom Processing Pipelines
```bash
# Create specialized medical analysis flow
tg-put-flow-class -n "medical-nlp" -c "$medical_config"
```
### Development Environments
```bash
# Create lightweight development flow
tg-put-flow-class -n "dev-minimal" -c "$minimal_config"
```
### Production Deployments
```bash
# Create robust production flow with all services
tg-put-flow-class -n "production-full" -c "$production_config"
```
### Domain-Specific Workflows
```bash
# Create legal document analysis flow
tg-put-flow-class -n "legal-analysis" -c "$legal_config"
```
## Best Practices
1. **Descriptive Names**: Use clear, descriptive flow class names
2. **Comprehensive Descriptions**: Include detailed descriptions of flow capabilities
3. **Consistent Naming**: Follow consistent queue naming conventions
4. **Version Control**: Store flow class configurations in version control
5. **Testing**: Test flow classes thoroughly before production use
6. **Documentation**: Document flow class purposes and requirements
## Template Examples
### Minimal Flow Class
```json
{
"description": "Minimal text processing flow",
"interfaces": {
"text-completion": {
"request": "non-persistent://tg/request/text-completion:minimal",
"response": "non-persistent://tg/response/text-completion:minimal"
}
}
}
```
### RAG-Focused Flow Class
```json
{
"description": "Retrieval Augmented Generation flow",
"interfaces": {
"graph-rag": {
"request": "non-persistent://tg/request/graph-rag:rag-flow",
"response": "non-persistent://tg/response/graph-rag:rag-flow"
},
"document-rag": {
"request": "non-persistent://tg/request/document-rag:rag-flow",
"response": "non-persistent://tg/response/document-rag:rag-flow"
},
"embeddings": {
"request": "non-persistent://tg/request/embeddings:rag-flow",
"response": "non-persistent://tg/response/embeddings:rag-flow"
}
}
}
```
### Document Processing Flow Class
```json
{
"description": "Document ingestion and processing flow",
"interfaces": {
"text-load": "persistent://tg/flow/text-document-load:doc-proc",
"document-load": "persistent://tg/flow/document-load:doc-proc",
"triples-store": "persistent://tg/flow/triples-store:doc-proc",
"embeddings": {
"request": "non-persistent://tg/request/embeddings:doc-proc",
"response": "non-persistent://tg/response/embeddings:doc-proc"
}
}
}
```

# tg-put-kg-core
Stores a knowledge core in the TrustGraph system from MessagePack format.
## Synopsis
```bash
tg-put-kg-core --id CORE_ID -i INPUT_FILE [options]
```
## Description
The `tg-put-kg-core` command loads a knowledge core from a MessagePack-formatted file and stores it in the TrustGraph knowledge system. Knowledge cores contain RDF triples and graph embeddings that represent structured knowledge and can be loaded into flows for processing.
This command processes MessagePack files containing both triples (RDF knowledge) and graph embeddings (vector representations) and stores them via WebSocket connection to the Knowledge API.
## Options
### Required Arguments
- `--id, --identifier CORE_ID`: Unique identifier for the knowledge core
- `-i, --input INPUT_FILE`: Path to MessagePack input file
### Optional Arguments
- `-u, --url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `ws://localhost:8088/`)
- `-U, --user USER`: User identifier (default: `trustgraph`)
## Examples
### Store Knowledge Core
```bash
tg-put-kg-core --id "research-core-v1" -i knowledge.msgpack
```
### With Custom User
```bash
tg-put-kg-core \
--id "medical-knowledge" \
-i medical-data.msgpack \
-U researcher
```
### Using Custom API URL
```bash
tg-put-kg-core \
--id "production-core" \
-i prod-knowledge.msgpack \
-u ws://production:8088/
```
## Input File Format
The input file must be in MessagePack format containing structured knowledge data:
### MessagePack Structure
The file contains tuples with type indicators:
#### Triple Data (`"t"`)
```python
("t", {
"m": { # metadata
"i": "core-id",
"m": [], # metadata triples
"u": "user",
"c": "collection"
},
"t": [ # triples array
{
"s": {"value": "subject", "is_uri": true},
"p": {"value": "predicate", "is_uri": true},
"o": {"value": "object", "is_uri": false}
}
]
})
```
#### Graph Embeddings Data (`"ge"`)
```python
("ge", {
"m": { # metadata
"i": "core-id",
"m": [], # metadata triples
"u": "user",
"c": "collection"
},
"e": [ # entities array
{
"e": {"value": "entity", "is_uri": true},
"v": [[0.1, 0.2, 0.3]] # vectors
}
]
})
```
## Processing Flow
1. **File Reading**: Opens MessagePack file for binary reading
2. **Message Unpacking**: Unpacks MessagePack tuples sequentially
3. **Type Processing**: Handles both triples (`"t"`) and graph embeddings (`"ge"`)
4. **WebSocket Transmission**: Sends each message via WebSocket to Knowledge API
5. **Response Handling**: Waits for confirmation of each message
6. **Progress Reporting**: Shows count of processed messages
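Steps 2–3 amount to a dispatch on each tuple's type indicator. A sketch of the counting logic, independent of the WebSocket transport (with real input the tuples would come from a `msgpack.Unpacker` over the file):

```python
def count_messages(messages):
    """Tally triple ("t") and graph-embedding ("ge") messages,
    rejecting any other type indicator."""
    triples = ge = 0
    for kind, _payload in messages:
        if kind == "t":
            triples += 1
        elif kind == "ge":
            ge += 1
        else:
            raise RuntimeError(
                "Unpacked unexpected message type '%s'" % kind)
    return triples, ge
```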
## Output
The command reports the number of messages processed:
```bash
Put: 150 triple, 75 GE messages.
```
Where:
- **triple**: Number of triple data messages processed
- **GE**: Number of graph embedding messages processed
## Error Handling
### File Not Found
```bash
Exception: No such file or directory: 'missing.msgpack'
```
**Solution**: Verify the input file path exists and is readable.
### Invalid MessagePack Format
```bash
Exception: Unpacked unexpected message type 'x'
```
**Solution**: Ensure the input file is properly formatted MessagePack with correct type indicators.
### Connection Errors
```bash
Exception: Connection refused
```
**Solution**: Verify the API URL and ensure TrustGraph is running.
### Knowledge API Errors
```bash
Exception: Knowledge core operation failed
```
**Solution**: Check that the Knowledge API is available and the core ID is valid.
## File Creation
MessagePack files can be created using:
### Python Example
```python
import msgpack
# Create triples data
triples_msg = ("t", {
"m": {"i": "core-id", "m": [], "u": "user", "c": "default"},
"t": [
{
"s": {"value": "Person1", "is_uri": True},
"p": {"value": "hasName", "is_uri": True},
"o": {"value": "John Doe", "is_uri": False}
}
]
})
# Create embeddings data
embeddings_msg = ("ge", {
"m": {"i": "core-id", "m": [], "u": "user", "c": "default"},
"e": [
{
"e": {"value": "Person1", "is_uri": True},
"v": [[0.1, 0.2, 0.3, 0.4]]
}
]
})
# Write to file
with open("knowledge.msgpack", "wb") as f:
msgpack.pack(triples_msg, f)
msgpack.pack(embeddings_msg, f)
```
### Export from Existing Core
```bash
# Export existing core to MessagePack
tg-get-kg-core --id "existing-core" -o exported.msgpack
# Import to new core
tg-put-kg-core --id "new-core" -i exported.msgpack
```
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL (automatically converted to WebSocket format)
## Related Commands
- [`tg-get-kg-core`](tg-get-kg-core.md) - Retrieve knowledge core
- [`tg-load-kg-core`](tg-load-kg-core.md) - Load knowledge core into flow
- [`tg-show-kg-cores`](tg-show-kg-cores.md) - List available knowledge cores
- [`tg-delete-kg-core`](tg-delete-kg-core.md) - Remove knowledge core
- [`tg-dump-msgpack`](tg-dump-msgpack.md) - Debug MessagePack files
## API Integration
This command uses the [Knowledge API](../apis/api-knowledge.md) via WebSocket connection with `put-kg-core` operations to store knowledge data.
## Use Cases
### Knowledge Import
```bash
# Import knowledge from external systems
tg-put-kg-core --id "external-kb" -i imported-knowledge.msgpack
```
### Data Migration
```bash
# Migrate knowledge between environments
tg-get-kg-core --id "prod-core" -o backup.msgpack
tg-put-kg-core --id "dev-core" -i backup.msgpack
```
### Knowledge Versioning
```bash
# Store versioned knowledge cores
tg-put-kg-core --id "research-v2.0" -i research-updated.msgpack
```
### Batch Knowledge Loading
```bash
# Load multiple knowledge domains
tg-put-kg-core --id "medical-core" -i medical.msgpack
tg-put-kg-core --id "legal-core" -i legal.msgpack
tg-put-kg-core --id "technical-core" -i technical.msgpack
```
## Best Practices
1. **Unique IDs**: Use descriptive, unique identifiers for knowledge cores
2. **Versioning**: Include version information in core IDs
3. **Validation**: Verify MessagePack files before importing
4. **Backup**: Keep backup copies of important knowledge cores
5. **Documentation**: Document knowledge core contents and sources
6. **Testing**: Test imports with small datasets first

# tg-remove-library-document
Removes a document from the TrustGraph document library.
## Synopsis
```bash
tg-remove-library-document --id DOCUMENT_ID [options]
```
## Description
The `tg-remove-library-document` command permanently removes a document from TrustGraph's document library. This operation deletes the document metadata, content, and any associated processing records.
**⚠️ Warning**: This operation is permanent and cannot be undone. Ensure you have backups if the document data is important.
## Options
### Required Arguments
- `--identifier, --id ID`: Document ID to remove
### Optional Arguments
- `-u, --url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
- `-U, --user USER`: User ID (default: `trustgraph`)
## Examples
### Remove Single Document
```bash
tg-remove-library-document --id "doc_123456789"
```
### Remove with Custom User
```bash
tg-remove-library-document --id "doc_987654321" -U "research-team"
```
### Remove with Custom API URL
```bash
tg-remove-library-document --id "doc_555" -u http://staging:8088/
```
## Prerequisites
### Document Must Exist
Verify the document exists before attempting removal:
```bash
# List documents to find the ID
tg-show-library-documents
# Search for specific document
tg-show-library-documents | grep "doc_123456789"
```
### Check for Active Processing
Before removing a document, check if it's currently being processed:
```bash
# Check for active processing jobs
tg-show-flows | grep "processing"
# Stop any active processing first
# tg-stop-library-processing --id "processing_id"
```
## Use Cases
### Cleanup Old Documents
```bash
# Remove outdated documents
old_docs=("doc_old1" "doc_old2" "doc_deprecated")
for doc_id in "${old_docs[@]}"; do
echo "Removing $doc_id..."
tg-remove-library-document --id "$doc_id"
done
```
### Remove Test Documents
```bash
# Remove test documents after development
tg-show-library-documents | \
grep "test\|demo\|sample" | \
grep "| id" | \
awk '{print $3}' | \
while read doc_id; do
echo "Removing test document: $doc_id"
tg-remove-library-document --id "$doc_id"
done
```
### User-Specific Cleanup
```bash
# Remove all documents for a specific user
cleanup_user_documents() {
local user="$1"
echo "Removing all documents for user: $user"
# Get document IDs for the user
tg-show-library-documents -U "$user" | \
grep "| id" | \
awk '{print $3}' | \
while read doc_id; do
echo "Removing document: $doc_id"
tg-remove-library-document --id "$doc_id" -U "$user"
done
}
# Usage
cleanup_user_documents "temp-user"
```
### Conditional Removal
```bash
# Remove documents based on criteria
remove_by_criteria() {
local criteria="$1"
echo "Removing documents matching criteria: $criteria"
tg-show-library-documents | \
grep -B5 -A5 "$criteria" | \
grep "| id" | \
awk '{print $3}' | \
while read doc_id; do
# Confirm before removal
echo -n "Remove document $doc_id? (y/N): "
read confirm
if [[ "$confirm" =~ ^[Yy]$ ]]; then
tg-remove-library-document --id "$doc_id"
echo "Removed: $doc_id"
else
echo "Skipped: $doc_id"
fi
done
}
# Remove documents containing "draft" in title
remove_by_criteria "draft"
```
## Safety Procedures
### Backup Before Removal
```bash
# Create backup of document metadata before removal
backup_document() {
local doc_id="$1"
local backup_dir="document_backups/$(date +%Y%m%d)"
mkdir -p "$backup_dir"
echo "Backing up document: $doc_id"
# Get document metadata
tg-show-library-documents | \
grep -A10 -B2 "$doc_id" > "$backup_dir/$doc_id.metadata"
# Note: Actual document content backup would require additional API
echo "Backup saved: $backup_dir/$doc_id.metadata"
}
# Backup then remove
safe_remove() {
local doc_id="$1"
backup_document "$doc_id"
echo "Removing document: $doc_id"
tg-remove-library-document --id "$doc_id"
echo "Document removed: $doc_id"
}
# Usage
safe_remove "doc_123456789"
```
### Verification Script
```bash
#!/bin/bash
# safe-remove-document.sh
doc_id="$1"
user="${2:-trustgraph}"
if [ -z "$doc_id" ]; then
echo "Usage: $0 <document-id> [user]"
exit 1
fi
echo "Safety checks for removing document: $doc_id"
# Check if document exists
if ! tg-show-library-documents -U "$user" | grep -q "$doc_id"; then
echo "ERROR: Document '$doc_id' not found for user '$user'"
exit 1
fi
# Show document details
echo "Document details:"
tg-show-library-documents -U "$user" | grep -A10 -B2 "$doc_id"
# Check for active processing
echo "Checking for active processing..."
active_processing=$(tg-show-flows | grep -c "processing.*$doc_id")
if [ "$active_processing" -gt 0 ]; then
echo "WARNING: Document has $active_processing active processing jobs"
echo "Consider stopping processing first"
fi
# Confirm removal
echo ""
read -p "Are you sure you want to remove this document? (y/N): " confirm
if [ "$confirm" = "y" ] || [ "$confirm" = "Y" ]; then
echo "Removing document..."
tg-remove-library-document --id "$doc_id" -U "$user"
# Verify removal
if ! tg-show-library-documents -U "$user" | grep -q "$doc_id"; then
echo "Document removed successfully"
else
echo "ERROR: Document still exists after removal"
exit 1
fi
else
echo "Removal cancelled"
fi
```
### Bulk Removal with Confirmation
```bash
# Remove multiple documents with individual confirmation
bulk_remove_with_confirmation() {
local doc_list="$1"
if [ ! -f "$doc_list" ]; then
echo "Usage: $0 <file-with-document-ids>"
return 1
fi
echo "Bulk removal with confirmation"
echo "Document list: $doc_list"
echo "=============================="
while IFS= read -r doc_id; do
if [ -n "$doc_id" ]; then
# Show document info
echo -e "\nDocument ID: $doc_id"
tg-show-library-documents | grep -A5 -B1 "$doc_id" | grep -E "title|note|tags"
# Confirm removal
echo -n "Remove this document? (y/N/q): "
read confirm
case "$confirm" in
y|Y)
tg-remove-library-document --id "$doc_id"
echo "Removed: $doc_id"
;;
q|Q)
echo "Quitting bulk removal"
break
;;
*)
echo "Skipped: $doc_id"
;;
esac
fi
done < "$doc_list"
}
# Create list of documents to remove
echo -e "doc_123\ndoc_456\ndoc_789" > remove_list.txt
bulk_remove_with_confirmation "remove_list.txt"
```
## Advanced Usage
### Age-Based Removal
```bash
# Remove documents older than specified days
remove_old_documents() {
local days_old="$1"
local dry_run="${2:-false}"
if [ -z "$days_old" ]; then
echo "Usage: remove_old_documents <days> [dry_run]"
return 1
fi
cutoff_date=$(date -d "$days_old days ago" +"%Y-%m-%d")  # GNU date syntax; on macOS use gdate
echo "Removing documents older than $cutoff_date"
tg-show-library-documents | \
awk -v cutoff="$cutoff_date" -v dry="$dry_run" '
/^\| id/ { id = $3 }
/^\| time/ {
if ($3 < cutoff) {
if (dry == "true") {
print "Would remove: " id " (date: " $3 ")"
} else {
system("tg-remove-library-document --id " id)
print "Removed: " id " (date: " $3 ")"
}
}
}'
}
# Dry run first
remove_old_documents 90 true
# Actually remove
remove_old_documents 90 false
```
### Size-Based Cleanup
```bash
# Remove documents based on collection size limits
cleanup_by_collection_size() {
local max_docs="$1"
echo "Maintaining maximum $max_docs documents per user"
# Get unique users (assumes the listing includes a "| user" row)
users=$(tg-show-library-documents | grep "| user" | awk '{print $3}' | sort -u)
for user in $users; do
echo "Checking user: $user"
# Count documents for user
doc_count=$(tg-show-library-documents -U "$user" | grep -c "| id")
if [ "$doc_count" -gt "$max_docs" ]; then
excess=$((doc_count - max_docs))
echo "User $user has $doc_count documents (removing $excess oldest)"
# Get oldest documents (by time)
tg-show-library-documents -U "$user" | \
awk '
/^\| id/ { id = $3 }
/^\| time/ { print $3 " " id }
' | \
sort | \
head -n "$excess" | \
while read date doc_id; do
echo "Removing old document: $doc_id ($date)"
tg-remove-library-document --id "$doc_id" -U "$user"
done
else
echo "User $user has $doc_count documents (within limit)"
fi
done
}
# Maintain maximum 100 documents per user
cleanup_by_collection_size 100
```
### Pattern-Based Removal
```bash
# Remove documents matching specific patterns
remove_by_pattern() {
local pattern="$1"
local field="${2:-title}"
echo "Removing documents with '$pattern' in $field"
tg-show-library-documents | \
awk -v pattern="$pattern" -v field="$field" '
/^\| id/ { id = $3 }
/^\| title/ && field=="title" { if ($0 ~ pattern) print id }
/^\| note/ && field=="note" { if ($0 ~ pattern) print id }
/^\| tags/ && field=="tags" { if ($0 ~ pattern) print id }
' | \
while read doc_id; do
echo "Removing document: $doc_id"
tg-remove-library-document --id "$doc_id"
done
}
# Remove all test documents
remove_by_pattern "test" "title"
remove_by_pattern "temp" "tags"
```
## Error Handling
### Document Not Found
```bash
Exception: Document not found
```
**Solution**: Verify document ID exists with `tg-show-library-documents`.
### Permission Errors
```bash
Exception: Access denied
```
**Solution**: Check user permissions and document ownership.
### Active Processing
```bash
Exception: Cannot remove document with active processing
```
**Solution**: Stop processing with `tg-stop-library-processing` before removal.
### API Connection Issues
```bash
Exception: Connection refused
```
**Solution**: Check API URL and ensure TrustGraph is running.
## Monitoring and Logging
### Removal Logging
```bash
# Log all removals
logged_remove() {
local doc_id="$1"
local log_file="document_removals.log"
timestamp=$(date)
echo "[$timestamp] Removing document: $doc_id" >> "$log_file"
# Get document info before removal
tg-show-library-documents | \
grep -A5 -B1 "$doc_id" >> "$log_file"
# Remove document
if tg-remove-library-document --id "$doc_id"; then
echo "[$timestamp] Successfully removed: $doc_id" >> "$log_file"
else
echo "[$timestamp] Failed to remove: $doc_id" >> "$log_file"
fi
echo "---" >> "$log_file"
}
# Usage
logged_remove "doc_123456789"
```
### Audit Trail
```bash
# Create audit trail for removals
create_removal_audit() {
local doc_id="$1"
local reason="$2"
local audit_file="removal_audit.csv"
# Create header if file doesn't exist
if [ ! -f "$audit_file" ]; then
echo "timestamp,document_id,user,reason,status" > "$audit_file"
fi
timestamp=$(date '+%Y-%m-%d %H:%M:%S')
user=$(whoami)
# Attempt removal
if tg-remove-library-document --id "$doc_id"; then
status="success"
else
status="failed"
fi
# Log to audit file
echo "$timestamp,$doc_id,$user,$reason,$status" >> "$audit_file"
}
# Usage
create_removal_audit "doc_123" "Outdated content"
```
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-show-library-documents`](tg-show-library-documents.md) - List library documents
- [`tg-add-library-document`](tg-add-library-document.md) - Add documents to library
- [`tg-start-library-processing`](tg-start-library-processing.md) - Start document processing
- [`tg-stop-library-processing`](tg-stop-library-processing.md) - Stop document processing
## API Integration
This command uses the [Library API](../apis/api-librarian.md) to remove documents from the document repository.
## Best Practices
1. **Always Backup**: Create backups before removing important documents
2. **Verification**: Verify document existence before removal attempts
3. **Processing Check**: Ensure no active processing before removal
4. **Audit Trail**: Maintain logs of all removal operations
5. **Confirmation**: Use interactive confirmation for bulk operations
6. **Testing**: Test removal procedures in non-production environments
7. **Access Control**: Ensure appropriate permissions for removal operations
## Troubleshooting
### Document Still Exists After Removal
```bash
# Verify removal
tg-show-library-documents | grep "document-id"
# Check for caching issues
# Wait a moment and try again
# Verify API connectivity
curl -s "$TRUSTGRAPH_URL/api/v1/library/documents" > /dev/null
```
### Permission Issues
```bash
# Check user permissions
tg-show-library-documents -U "your-user" | grep "document-id"
# Verify user ownership of document
```
### Cannot Remove Due to References
```bash
# Check for document references in processing jobs
tg-show-flows | grep "document-id"
# Stop any referencing processes first
```

# tg-save-doc-embeds
Saves document embeddings from TrustGraph processing streams to MessagePack format files.
## Synopsis
```bash
tg-save-doc-embeds -o OUTPUT_FILE [options]
```
## Description
The `tg-save-doc-embeds` command connects to TrustGraph's document embeddings export stream and saves the embeddings to a file in MessagePack format. This is useful for creating backups of document embeddings, exporting data for analysis, or preparing data for migration between systems.
The command should typically be started before document processing begins to capture all embeddings as they are generated.
## Options
### Required Arguments
- `-o, --output-file FILE`: Output file for saved embeddings
### Optional Arguments
- `-u, --url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_API` or `http://localhost:8088/`)
- `-f, --flow-id ID`: Flow instance ID to monitor (default: `default`)
- `--format FORMAT`: Output format - `msgpack` or `json` (default: `msgpack`)
- `--user USER`: Filter by user ID (default: no filter)
- `--collection COLLECTION`: Filter by collection ID (default: no filter)
## Examples
### Basic Document Embeddings Export
```bash
tg-save-doc-embeds -o document-embeddings.msgpack
```
### Export from Specific Flow
```bash
tg-save-doc-embeds \
-o research-embeddings.msgpack \
-f "research-processing-flow"
```
### Filter by User and Collection
```bash
tg-save-doc-embeds \
-o filtered-embeddings.msgpack \
--user "research-team" \
--collection "research-docs"
```
### Export to JSON Format
```bash
tg-save-doc-embeds \
-o embeddings.json \
--format json
```
### Production Backup
```bash
tg-save-doc-embeds \
-o "backup-$(date +%Y%m%d-%H%M%S).msgpack" \
-u https://production-api.company.com/ \
-f "production-flow"
```
## Output Format
### MessagePack Structure
Document embeddings are saved as MessagePack records:
```json
["de", {
"m": {
"i": "document-id",
"m": [{"metadata": "objects"}],
"u": "user-id",
"c": "collection-id"
},
"c": [{
"c": "text chunk content",
"v": [0.1, 0.2, 0.3, ...]
}]
}]
```
### Components
- **Record Type**: `"de"` indicates document embeddings
- **Metadata** (`m`): Document information and context
- **Chunks** (`c`): Text chunks with their vector embeddings
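Assuming the JSON output format (`--format json`) uses the same `["de", {...}]` shape with one record per line, embedding records can be counted with plain `grep`. The sample file and the non-`"de"` record type below are synthetic illustrations, not actual tool output:

```bash
# Assumed: one JSON array per line, first element is the record type.
# "de" marks document embeddings; other types here are hypothetical.
count_de_records() {
    grep -c '^\["de"' "$1" || true   # grep -c still prints 0 on no match
}

printf '%s\n' \
  '["de", {"m": {"i": "doc-1"}, "c": [{"c": "chunk text", "v": [0.1, 0.2]}]}]' \
  '["xx", {"other": "record"}]' > sample-export.json

count_de_records sample-export.json   # prints 1
```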
## Use Cases
### Backup Creation
```bash
# Create regular backups of document embeddings
create_embeddings_backup() {
local backup_dir="embeddings-backups"
local timestamp=$(date +%Y%m%d_%H%M%S)
local backup_file="$backup_dir/embeddings-$timestamp.msgpack"
mkdir -p "$backup_dir"
echo "Creating embeddings backup: $backup_file"
# Start backup process
tg-save-doc-embeds -o "$backup_file" &
save_pid=$!
echo "Backup process started (PID: $save_pid)"
echo "To stop: kill $save_pid"
echo "Backup file: $backup_file"
# Optionally wait for a specific duration
# sleep 3600 # Run for 1 hour
# kill $save_pid
}
# Create backup
create_embeddings_backup
```
### Data Migration Preparation
```bash
# Prepare embeddings for migration
prepare_migration_data() {
local source_env="$1"
local collection="$2"
local migration_file="migration-$(date +%Y%m%d).msgpack"
echo "Preparing migration data from: $source_env"
echo "Collection: $collection"
# Export embeddings from source
tg-save-doc-embeds \
-o "$migration_file" \
-u "http://$source_env:8088/" \
--collection "$collection" &
export_pid=$!
# Let it run for specified time to capture data
echo "Capturing embeddings for migration..."
echo "Process PID: $export_pid"
# In practice, you'd run this for the duration needed
# sleep 1800 # 30 minutes
# kill $export_pid
echo "Migration data will be saved to: $migration_file"
}
# Prepare migration from dev to production
prepare_migration_data "dev-server" "processed-docs"
```
### Continuous Export
```bash
# Continuous embeddings export with rotation
continuous_export() {
local output_dir="continuous-exports"
local rotation_hours=24
local file_prefix="embeddings"
mkdir -p "$output_dir"
while true; do
timestamp=$(date +%Y%m%d_%H%M%S)
output_file="$output_dir/${file_prefix}-${timestamp}.msgpack"
echo "Starting export to: $output_file"
# Start export for specified duration
timeout ${rotation_hours}h tg-save-doc-embeds -o "$output_file"
# Compress completed file
gzip "$output_file"
echo "Export completed and compressed: ${output_file}.gz"
# Optional: clean up old files
find "$output_dir" -name "*.msgpack.gz" -mtime +30 -delete
# Brief pause before next rotation
sleep 60
done
}
# Start continuous export (run in background)
continuous_export &
```
### Analysis and Research
```bash
# Export embeddings for research analysis
export_for_research() {
local research_topic="$1"
local output_file="research-${research_topic}-$(date +%Y%m%d).msgpack"
echo "Exporting embeddings for research: $research_topic"
# Start export with filtering
tg-save-doc-embeds \
-o "$output_file" \
--collection "$research_topic" &
export_pid=$!
echo "Research export started (PID: $export_pid)"
echo "Output: $output_file"
# Create analysis script
cat > "analyze-${research_topic}.sh" << EOF
#!/bin/bash
# Analysis script for $research_topic embeddings
echo "Analyzing $research_topic embeddings..."
# Basic statistics
echo "=== Basic Statistics ==="
tg-dump-msgpack -i "$output_file" --summary
# Detailed analysis
echo "=== Detailed Analysis ==="
tg-dump-msgpack -i "$output_file" | head -10
echo "Analysis complete for $research_topic"
EOF
chmod +x "analyze-${research_topic}.sh"
echo "Analysis script created: analyze-${research_topic}.sh"
}
# Export for different research topics
export_for_research "cybersecurity"
export_for_research "climate-change"
```
## Advanced Usage
### Selective Export
```bash
# Export embeddings with multiple filters
selective_export() {
local users=("user1" "user2" "user3")
local collections=("docs1" "docs2")
for user in "${users[@]}"; do
for collection in "${collections[@]}"; do
output_file="embeddings-${user}-${collection}.msgpack"
echo "Exporting for user: $user, collection: $collection"
tg-save-doc-embeds \
-o "$output_file" \
--user "$user" \
--collection "$collection" &
# Store PID for later management
echo $! > "${output_file}.pid"
done
done
echo "All selective exports started"
}
```
### Monitoring and Statistics
```bash
# Monitor export progress with statistics
monitor_export() {
local output_file="$1"
local pid_file="${output_file}.pid"
if [ ! -f "$pid_file" ]; then
echo "PID file not found: $pid_file"
return 1
fi
local export_pid=$(cat "$pid_file")
echo "Monitoring export (PID: $export_pid)..."
echo "Output file: $output_file"
while kill -0 "$export_pid" 2>/dev/null; do
if [ -f "$output_file" ]; then
file_size=$(stat -c%s "$output_file" 2>/dev/null || echo "0")
human_size=$(numfmt --to=iec-i --suffix=B "$file_size")
# Try to count embeddings
embedding_count=$(tg-dump-msgpack -i "$output_file" 2>/dev/null | grep -c '^\["de"' || true)
echo "File size: $human_size, Embeddings: $embedding_count"
else
echo "Output file not yet created..."
fi
sleep 30
done
echo "Export process completed"
rm "$pid_file"
}
# Start export and monitor
tg-save-doc-embeds -o "monitored-export.msgpack" &
echo $! > "monitored-export.msgpack.pid"
monitor_export "monitored-export.msgpack"
```
### Export Validation
```bash
# Validate exported embeddings
validate_export() {
local export_file="$1"
echo "Validating export file: $export_file"
# Check file exists and has content
if [ ! -s "$export_file" ]; then
echo "✗ Export file is empty or missing"
return 1
fi
# Check MessagePack format
if tg-dump-msgpack -i "$export_file" --summary > /dev/null 2>&1; then
echo "✓ Valid MessagePack format"
else
echo "✗ Invalid MessagePack format"
return 1
fi
# Check for document embeddings
embedding_count=$(tg-dump-msgpack -i "$export_file" | grep -c '^\["de"' || true)
if [ "$embedding_count" -gt 0 ]; then
echo "✓ Contains $embedding_count document embeddings"
else
echo "✗ No document embeddings found"
return 1
fi
# Get vector dimension information
summary=$(tg-dump-msgpack -i "$export_file" --summary)
if echo "$summary" | grep -q "Vector dimension:"; then
dimension=$(echo "$summary" | grep "Vector dimension:" | awk '{print $3}')
echo "✓ Vector dimension: $dimension"
else
echo "⚠ Could not determine vector dimension"
fi
echo "Validation completed successfully"
}
```
### Export Scheduling
```bash
# Scheduled export with cron-like functionality
schedule_export() {
local schedule="$1" # e.g., "daily", "hourly", "weekly"
local output_prefix="$2"
case "$schedule" in
"hourly")
interval=3600
;;
"daily")
interval=86400
;;
"weekly")
interval=604800
;;
*)
echo "Invalid schedule: $schedule"
return 1
;;
esac
echo "Starting $schedule exports with prefix: $output_prefix"
while true; do
timestamp=$(date +%Y%m%d_%H%M%S)
output_file="${output_prefix}-${timestamp}.msgpack"
echo "Starting scheduled export: $output_file"
# Run export for the scheduled interval
timeout ${interval}s tg-save-doc-embeds -o "$output_file"
# Validate and compress
if validate_export "$output_file"; then
gzip "$output_file"
echo "✓ Export completed and compressed: ${output_file}.gz"
else
echo "✗ Export validation failed: $output_file"
mv "$output_file" "${output_file}.failed"
fi
# Brief pause before next cycle
sleep 60
done
}
# Start daily scheduled exports
schedule_export "daily" "daily-embeddings" &
```
## Performance Considerations
### Memory Management
```bash
# Monitor memory usage during export
monitor_memory_export() {
local output_file="$1"
# Start export
tg-save-doc-embeds -o "$output_file" &
export_pid=$!
echo "Monitoring memory usage for export (PID: $export_pid)..."
while kill -0 "$export_pid" 2>/dev/null; do
memory_usage=$(ps -p "$export_pid" -o rss= 2>/dev/null | awk '{print $1/1024}')
if [ -n "$memory_usage" ]; then
echo "Memory usage: ${memory_usage}MB"
fi
sleep 10
done
echo "Export completed"
}
```
### Network Optimization
```bash
# Optimize for network conditions
network_optimized_export() {
local output_file="$1"
local api_url="$2"
echo "Starting network-optimized export..."
# Use compression and buffering
tg-save-doc-embeds \
-o "$output_file" \
-u "$api_url" \
--format msgpack & # MessagePack is more compact than JSON
export_pid=$!
# Monitor network usage
echo "Monitoring export (PID: $export_pid)..."
while kill -0 "$export_pid" 2>/dev/null; do
# Monitor network connections
connections=$(netstat -an | grep ":8088" | wc -l)
echo "Active connections: $connections"
sleep 30
done
}
```
## Error Handling
### Connection Issues
```bash
Exception: WebSocket connection failed
```
**Solution**: Check API URL and ensure TrustGraph WebSocket service is running.
### Disk Space Issues
```bash
Exception: No space left on device
```
**Solution**: Free up disk space or use a different output location.
### Permission Errors
```bash
Exception: Permission denied
```
**Solution**: Check write permissions for the output file location.
### Memory Issues
```bash
MemoryError: Unable to allocate memory
```
**Solution**: Monitor memory usage and consider using smaller export windows.
## Integration with Other Commands
### Complete Backup Workflow
```bash
# Complete backup and restore workflow
backup_restore_workflow() {
local backup_file="embeddings-backup.msgpack"
echo "=== Backup Phase ==="
# Create backup
tg-save-doc-embeds -o "$backup_file" &
backup_pid=$!
# Let it run for a while
sleep 300 # 5 minutes
kill $backup_pid
echo "Backup created: $backup_file"
# Validate backup
validate_export "$backup_file"
echo "=== Restore Phase ==="
# Restore from backup (to different collection)
tg-load-doc-embeds -i "$backup_file" --collection "restored"
echo "Backup and restore workflow completed"
}
```
### Analysis Pipeline
```bash
# Export and analyze embeddings
export_analyze_pipeline() {
local topic="$1"
local export_file="analysis-${topic}.msgpack"
echo "Starting export and analysis pipeline for: $topic"
# Export embeddings
tg-save-doc-embeds \
-o "$export_file" \
--collection "$topic" &
export_pid=$!
# Run for analysis duration
sleep 600 # 10 minutes
kill $export_pid
# Analyze exported data
echo "Analyzing exported embeddings..."
tg-dump-msgpack -i "$export_file" --summary
# Count embeddings by user
echo "Embeddings by user:"
tg-dump-msgpack -i "$export_file" | \
jq -r '.[1].m.u' | \
sort | uniq -c
echo "Analysis pipeline completed"
}
```
## Environment Variables
- `TRUSTGRAPH_API`: Default API URL
## Related Commands
- [`tg-load-doc-embeds`](tg-load-doc-embeds.md) - Load document embeddings from files
- [`tg-dump-msgpack`](tg-dump-msgpack.md) - Analyze MessagePack files
- [`tg-show-flows`](tg-show-flows.md) - List available flows for monitoring
## API Integration
This command uses TrustGraph's WebSocket API for document embeddings export, specifically the `/api/v1/flow/{flow-id}/export/document-embeddings` endpoint.
## Best Practices
1. **Start Early**: Begin export before processing starts to capture all data
2. **Monitoring**: Monitor export progress and file sizes
3. **Validation**: Always validate exported files
4. **Compression**: Use compression for long-term storage
5. **Rotation**: Implement file rotation for continuous exports
6. **Backup**: Keep multiple backup copies in different locations
7. **Documentation**: Document export schedules and procedures
## Troubleshooting
### No Data Captured
```bash
# Check if processing is generating embeddings
tg-show-flows | grep processing
# Verify WebSocket connection
netstat -an | grep :8088
```
### Large File Issues
```bash
# Monitor file growth
watch -n 5 'ls -lh *.msgpack'
# Check available disk space
df -h
```
### Process Management
```bash
# List running export processes
ps aux | grep tg-save-doc-embeds
# Kill stuck processes
pkill -f tg-save-doc-embeds
```

# tg-set-prompt
Sets prompt templates and system prompts for TrustGraph LLM services.
## Synopsis
```bash
# Set a prompt template
tg-set-prompt --id TEMPLATE_ID --prompt TEMPLATE [options]
# Set system prompt
tg-set-prompt --system SYSTEM_PROMPT
```
## Description
The `tg-set-prompt` command configures prompt templates and system prompts used by TrustGraph's LLM services. Prompt templates contain placeholders like `{{variable}}` that are replaced with actual values when invoked. System prompts provide global context for all LLM interactions.
Templates are stored in TrustGraph's configuration system and can be used with `tg-invoke-prompt` for consistent AI interactions.
## Options
### Prompt Template Mode
- `--id ID`: Unique identifier for the prompt template (required for templates)
- `--prompt TEMPLATE`: Prompt template text with `{{variable}}` placeholders (required for templates)
- `--response TYPE`: Response format - `text` or `json` (default: `text`)
- `--schema SCHEMA`: JSON schema for structured responses (required when response is `json`)
### System Prompt Mode
- `--system PROMPT`: System prompt text (cannot be used with other options)
### Common Options
- `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
## Examples
### Basic Prompt Template
```bash
tg-set-prompt \
--id "greeting" \
--prompt "Hello {{name}}, welcome to {{place}}!"
```
### Question-Answer Template
```bash
tg-set-prompt \
--id "question" \
--prompt "Answer this question based on the context: {{question}}\n\nContext: {{context}}"
```
### JSON Response Template
```bash
tg-set-prompt \
--id "extract-info" \
--prompt "Extract key information from: {{text}}" \
--response "json" \
--schema '{"type": "object", "properties": {"name": {"type": "string"}, "age": {"type": "number"}}}'
```
### Analysis Template
```bash
tg-set-prompt \
--id "analyze" \
--prompt "Analyze the following {{data_type}} and provide insights about {{focus_area}}:\n\n{{data}}\n\nFormat the response as {{format}}."
```
### System Prompt
```bash
tg-set-prompt \
--system "You are a helpful AI assistant. Always provide accurate, concise responses. When uncertain, clearly state your limitations."
```
## Template Variables
### Variable Syntax
Templates use `{{variable}}` syntax for placeholders:
```bash
# Template
"Hello {{name}}, today is {{day}}"
# Usage
tg-invoke-prompt greeting name="Alice" day="Monday"
# Result: "Hello Alice, today is Monday"
```
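Substitution happens server-side when the template is invoked, but the expansion can be previewed locally. This is a sketch using plain Bash string replacement; `render_template` is a hypothetical helper, not part of the CLI:

```bash
# Hypothetical local preview of {{variable}} expansion.
render_template() {
    local out="$1"; shift
    local kv key val
    for kv in "$@"; do
        key="${kv%%=*}"                     # text before the first "="
        val="${kv#*=}"                      # text after the first "="
        out="${out//\{\{$key\}\}/$val}"     # replace every {{key}}
    done
    printf '%s\n' "$out"
}

render_template "Hello {{name}}, today is {{day}}" name=Alice day=Monday
# prints: Hello Alice, today is Monday
```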
### Common Variables
- `{{text}}` - Input text for processing
- `{{question}}` - Question to answer
- `{{context}}` - Background context
- `{{data}}` - Data to analyze
- `{{format}}` - Output format specification
## Response Types
### Text Response (Default)
```bash
tg-set-prompt \
--id "summarize" \
--prompt "Summarize this text in {{max_words}} words: {{text}}"
```
### JSON Response
```bash
tg-set-prompt \
--id "classify" \
--prompt "Classify this text: {{text}}" \
--response "json" \
--schema '{
"type": "object",
"properties": {
"category": {"type": "string"},
"confidence": {"type": "number", "minimum": 0, "maximum": 1}
},
"required": ["category", "confidence"]
}'
```
## Use Cases
### Document Processing Templates
```bash
# Document summarization
tg-set-prompt \
--id "document-summary" \
--prompt "Provide a {{length}} summary of this document:\n\n{{document}}\n\nFocus on: {{focus_areas}}"
# Key point extraction
tg-set-prompt \
--id "extract-key-points" \
--prompt "Extract the main points from: {{text}}\n\nReturn as a bulleted list."
# Document classification
tg-set-prompt \
--id "classify-document" \
--prompt "Classify this document into one of these categories: {{categories}}\n\nDocument: {{text}}" \
--response "json" \
--schema '{"type": "object", "properties": {"category": {"type": "string"}, "confidence": {"type": "number"}}}'
```
### Code Analysis Templates
```bash
# Code review
tg-set-prompt \
--id "code-review" \
--prompt "Review this {{language}} code for {{focus}} issues:\n\n{{code}}\n\nProvide specific recommendations."
# Bug detection
tg-set-prompt \
--id "find-bugs" \
--prompt "Analyze this code for potential bugs:\n\n{{code}}\n\nError context: {{error}}"
# Code explanation
tg-set-prompt \
--id "explain-code" \
--prompt "Explain how this {{language}} code works:\n\n{{code}}\n\nTarget audience: {{audience}}"
```
### Data Analysis Templates
```bash
# Data insights
tg-set-prompt \
--id "data-insights" \
--prompt "Analyze this {{data_type}} data and provide insights:\n\n{{data}}\n\nFocus on: {{metrics}}"
# Trend analysis
tg-set-prompt \
--id "trend-analysis" \
--prompt "Identify trends in this data over {{timeframe}}:\n\n{{data}}" \
--response "json" \
--schema '{"type": "object", "properties": {"trends": {"type": "array", "items": {"type": "string"}}}}'
```
### Content Generation Templates
```bash
# Marketing copy
tg-set-prompt \
--id "marketing-copy" \
--prompt "Create {{tone}} marketing copy for {{product}} targeting {{audience}}. Key features: {{features}}"
# Technical documentation
tg-set-prompt \
--id "tech-docs" \
--prompt "Generate technical documentation for:\n\n{{code}}\n\nInclude: {{sections}}"
```
## Advanced Usage
### Multi-Step Templates
```bash
# Research template
tg-set-prompt \
--id "research" \
--prompt "Research question: {{question}}
Available sources: {{sources}}
Please:
1. Analyze the question
2. Review relevant sources
3. Synthesize findings
4. Provide conclusions
Format: {{output_format}}"
```
### Conditional Templates
```bash
# Adaptive response template
tg-set-prompt \
--id "adaptive-response" \
--prompt "Task: {{task}}
Context: {{context}}
Expertise level: {{level}}
If expertise level is 'beginner', provide simple explanations.
If expertise level is 'advanced', include technical details.
If task involves code, include examples.
Response:"
```
### Structured Analysis Template
```bash
tg-set-prompt \
--id "structured-analysis" \
--prompt "Analyze: {{subject}}
Criteria: {{criteria}}
Data: {{data}}
Provide analysis in this structure:
- Overview
- Key Findings
- Recommendations
- Next Steps" \
--response "json" \
--schema '{
"type": "object",
"properties": {
"overview": {"type": "string"},
"key_findings": {"type": "array", "items": {"type": "string"}},
"recommendations": {"type": "array", "items": {"type": "string"}},
"next_steps": {"type": "array", "items": {"type": "string"}}
}
}'
```
### Template Management
```bash
# Create template collection for specific domain
domain="customer-support"
templates=(
"greeting:Hello! I'm here to help with {{issue_type}}. What can I assist you with?"
"escalation:I understand your frustration with {{issue}}. Let me escalate this to {{department}}."
"resolution:Great! I've resolved your {{issue}}. Is there anything else I can help with?"
)
for template in "${templates[@]}"; do
IFS=':' read -r id prompt <<< "$template"
tg-set-prompt --id "${domain}-${id}" --prompt "$prompt"
done
```
## System Prompt Configuration
### General Purpose System Prompt
```bash
tg-set-prompt --system "You are a knowledgeable AI assistant. Provide accurate, helpful responses. When you don't know something, say so clearly. Always consider the context and be concise unless detail is specifically requested."
```
### Domain-Specific System Prompt
```bash
tg-set-prompt --system "You are a technical documentation assistant specializing in software development. Focus on clarity, accuracy, and practical examples. Always include code snippets when relevant and explain complex concepts step-by-step."
```
### Role-Based System Prompt
```bash
tg-set-prompt --system "You are a data analyst AI. When analyzing data, always consider statistical significance, potential biases, and limitations. Present findings objectively and suggest actionable insights."
```
## Error Handling
### Missing Required Fields
```bash
Exception: Must specify --id for prompt
```
**Solution**: Provide both `--id` and `--prompt` for template creation.
### Invalid Response Type
```bash
Exception: Response must be one of: text json
```
**Solution**: Use only `text` or `json` for the `--response` option.
### Invalid JSON Schema
```bash
Exception: JSON schema must be valid JSON
```
**Solution**: Validate JSON schema syntax before using `--schema`.
### Conflicting Options
```bash
Exception: Can't use --system with other args
```
**Solution**: Use `--system` alone, or use template options without `--system`.
## Template Testing
### Test Template Creation
```bash
# Create and test a simple template
tg-set-prompt \
--id "test-template" \
--prompt "Test template with {{variable1}} and {{variable2}}"
# Test the template
tg-invoke-prompt test-template variable1="hello" variable2="world"
```
### Validate JSON Templates
```bash
# Create JSON template
tg-set-prompt \
--id "json-test" \
--prompt "Extract data from: {{text}}" \
--response "json" \
--schema '{"type": "object", "properties": {"result": {"type": "string"}}}'
# Test JSON response
tg-invoke-prompt json-test text="Sample text for testing"
```
### Template Iteration
```bash
# Version 1
tg-set-prompt \
--id "analysis-v1" \
--prompt "Analyze: {{data}}"
# Version 2 (improved)
tg-set-prompt \
--id "analysis-v2" \
--prompt "Analyze the following {{data_type}} and provide insights about {{focus}}:\n\n{{data}}\n\nConsider: {{considerations}}"
# Version 3 (structured)
tg-set-prompt \
--id "analysis-v3" \
--prompt "Analyze: {{data}}" \
--response "json" \
--schema '{"type": "object", "properties": {"summary": {"type": "string"}, "insights": {"type": "array"}}}'
```
## Template Design Guidelines
### Template Design
```bash
# Good: Clear, specific prompts
tg-set-prompt \
--id "good-summary" \
--prompt "Summarize this {{document_type}} in {{word_count}} words, focusing on {{key_aspects}}:\n\n{{content}}"
# Better: Include context and constraints
tg-set-prompt \
--id "better-summary" \
--prompt "Task: Summarize the following {{document_type}}
Length: {{word_count}} words maximum
Focus: {{key_aspects}}
Audience: {{target_audience}}
Document:
{{content}}
Summary:"
```
### Variable Naming
```bash
# Use descriptive variable names
tg-set-prompt \
--id "descriptive-vars" \
--prompt "Analyze {{data_source}} data from {{time_period}} for {{business_metric}} trends"
# Group related variables
tg-set-prompt \
--id "grouped-vars" \
--prompt "Compare {{baseline_data}} vs {{comparison_data}} using {{analysis_method}}"
```
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-show-prompts`](tg-show-prompts.md) - Display configured prompts
- [`tg-invoke-prompt`](tg-invoke-prompt.md) - Use prompt templates
- [`tg-invoke-document-rag`](tg-invoke-document-rag.md) - Document-based AI queries
## API Integration
This command uses the [Config API](../apis/api-config.md) to store prompt templates and system prompts in TrustGraph's configuration system.
## Best Practices
1. **Clear Templates**: Write clear, specific prompt templates
2. **Variable Names**: Use descriptive variable names
3. **Response Types**: Choose appropriate response types for your use case
4. **Schema Validation**: Always validate JSON schemas before setting
5. **Version Control**: Consider versioning important templates
6. **Testing**: Test templates thoroughly with various inputs
7. **Documentation**: Document template variables and expected usage
## Troubleshooting
### Template Not Working
```bash
# Check template exists
tg-show-prompts | grep "template-id"
# Verify variable names match
tg-invoke-prompt template-id var1="test" var2="test"
```
### JSON Schema Errors
```bash
# Validate schema separately
echo '{"type": "object"}' | jq .
# Test with simple schema first
tg-set-prompt --id "test" --prompt "test" --response "json" --schema '{"type": "string"}'
```
### System Prompt Issues
```bash
# Check current system prompt
tg-show-prompts | grep -A5 "System prompt"
# Reset if needed
tg-set-prompt --system "Default system prompt"
```

# tg-set-token-costs
Sets token cost configuration for language models in TrustGraph.
## Synopsis
```bash
tg-set-token-costs --model MODEL_ID -i INPUT_COST -o OUTPUT_COST [options]
```
## Description
The `tg-set-token-costs` command configures the token pricing for language models used by TrustGraph. This information is used for cost tracking, billing, and resource management across AI operations.
Token costs are specified in dollars per million tokens and are stored in TrustGraph's configuration system for use by cost monitoring and reporting tools.
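Because costs are expressed in dollars per million tokens, a single request's cost follows from a simple proportion: cost = (input_tokens × input_rate + output_tokens × output_rate) / 1,000,000. A small helper sketch (the token counts and rates below are illustrative, not real configuration values):

```bash
# Estimate one request's cost from token counts and $/1M-token rates.
estimate_cost() {
    local in_tok="$1" out_tok="$2" in_rate="$3" out_rate="$4"
    awk -v it="$in_tok" -v ot="$out_tok" -v ir="$in_rate" -v orr="$out_rate" \
        'BEGIN { printf "%.6f\n", (it * ir + ot * orr) / 1000000 }'
}

# 1200 input tokens and 350 output tokens at illustrative rates:
estimate_cost 1200 350 3.0 15.0   # prints 0.008850
```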
## Options
### Required Arguments
- `--model MODEL_ID`: Language model identifier
- `-i, --input-costs COST`: Input token cost in $ per 1M tokens
- `-o, --output-costs COST`: Output token cost in $ per 1M tokens
### Optional Arguments
- `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
## Examples
### Set Costs for GPT-4
```bash
tg-set-token-costs \
--model "gpt-4" \
-i 30.0 \
-o 60.0
```
### Set Costs for Claude Sonnet
```bash
tg-set-token-costs \
--model "claude-3-sonnet" \
-i 3.0 \
-o 15.0
```
### Set Costs for Local Model
```bash
tg-set-token-costs \
--model "llama-2-7b" \
-i 0.0 \
-o 0.0
```
### Set Costs with Custom API URL
```bash
tg-set-token-costs \
--model "gpt-3.5-turbo" \
-i 0.5 \
-o 1.5 \
-u http://production:8088/
```
## Model Pricing Examples
### OpenAI Models (as of 2024)
```bash
# GPT-4 Turbo
tg-set-token-costs --model "gpt-4-turbo" -i 10.0 -o 30.0
# GPT-4
tg-set-token-costs --model "gpt-4" -i 30.0 -o 60.0
# GPT-3.5 Turbo
tg-set-token-costs --model "gpt-3.5-turbo" -i 0.5 -o 1.5
```
### Anthropic Models
```bash
# Claude 3 Opus
tg-set-token-costs --model "claude-3-opus" -i 15.0 -o 75.0
# Claude 3 Sonnet
tg-set-token-costs --model "claude-3-sonnet" -i 3.0 -o 15.0
# Claude 3 Haiku
tg-set-token-costs --model "claude-3-haiku" -i 0.25 -o 1.25
```
### Google Models
```bash
# Gemini Pro
tg-set-token-costs --model "gemini-pro" -i 0.5 -o 1.5
# Gemini Ultra
tg-set-token-costs --model "gemini-ultra" -i 8.0 -o 24.0
```
### Local/Open Source Models
```bash
# Local models typically have no API costs
tg-set-token-costs --model "llama-2-70b" -i 0.0 -o 0.0
tg-set-token-costs --model "mistral-7b" -i 0.0 -o 0.0
tg-set-token-costs --model "local-model" -i 0.0 -o 0.0
```
## Use Cases
### Cost Tracking Setup
```bash
# Set up comprehensive cost tracking
models=(
"gpt-4:30.0:60.0"
"gpt-3.5-turbo:0.5:1.5"
"claude-3-sonnet:3.0:15.0"
"claude-3-haiku:0.25:1.25"
)
for model_config in "${models[@]}"; do
IFS=':' read -r model input_cost output_cost <<< "$model_config"
echo "Setting costs for $model..."
tg-set-token-costs --model "$model" -i "$input_cost" -o "$output_cost"
done
```
### Environment-Specific Pricing
```bash
# Set different costs for different environments
set_environment_costs() {
local env_url="$1"
local multiplier="$2" # Cost multiplier for environment
echo "Setting costs for environment: $env_url (multiplier: $multiplier)"
# Base costs
declare -A base_costs=(
["gpt-4"]="30.0:60.0"
["claude-3-sonnet"]="3.0:15.0"
["gpt-3.5-turbo"]="0.5:1.5"
)
for model in "${!base_costs[@]}"; do
IFS=':' read -r input_cost output_cost <<< "${base_costs[$model]}"
# Apply multiplier
adjusted_input=$(echo "$input_cost * $multiplier" | bc -l)
adjusted_output=$(echo "$output_cost * $multiplier" | bc -l)
echo " $model: input=$adjusted_input, output=$adjusted_output"
tg-set-token-costs \
--model "$model" \
-i "$adjusted_input" \
-o "$adjusted_output" \
-u "$env_url"
done
}
# Production environment (full cost)
set_environment_costs "http://prod:8088/" 1.0
# Development environment (reduced cost for budgeting)
set_environment_costs "http://dev:8088/" 0.1
```
### Cost Update Automation
```bash
# Automated cost updates from pricing file
update_costs_from_file() {
local pricing_file="$1"
if [ ! -f "$pricing_file" ]; then
echo "Pricing file not found: $pricing_file"
return 1
fi
echo "Updating costs from: $pricing_file"
# Expected format: model_id,input_cost,output_cost
while IFS=',' read -r model input_cost output_cost; do
# Skip header line
if [ "$model" = "model_id" ]; then
continue
fi
echo "Updating $model: input=$input_cost, output=$output_cost"
tg-set-token-costs --model "$model" -i "$input_cost" -o "$output_cost"
done < "$pricing_file"
}
# Create example pricing file
cat > model_pricing.csv << EOF
model_id,input_cost,output_cost
gpt-4,30.0,60.0
gpt-3.5-turbo,0.5,1.5
claude-3-sonnet,3.0,15.0
claude-3-haiku,0.25,1.25
EOF
# Update costs from file
update_costs_from_file "model_pricing.csv"
```
### Bulk Cost Management
```bash
# Bulk cost updates with validation
bulk_cost_update() {
local updates=(
"gpt-4-turbo:10.0:30.0"
"gpt-4:30.0:60.0"
"claude-3-opus:15.0:75.0"
"claude-3-sonnet:3.0:15.0"
"gemini-pro:0.5:1.5"
)
echo "Bulk cost update starting..."
for update in "${updates[@]}"; do
IFS=':' read -r model input_cost output_cost <<< "$update"
# Validate costs are numeric
if ! [[ "$input_cost" =~ ^[0-9]+\.?[0-9]*$ ]] || ! [[ "$output_cost" =~ ^[0-9]+\.?[0-9]*$ ]]; then
echo "Error: Invalid cost format for $model"
continue
fi
echo "Setting costs for $model..."
if tg-set-token-costs --model "$model" -i "$input_cost" -o "$output_cost"; then
echo "✓ Updated $model"
else
echo "✗ Failed to update $model"
fi
done
echo "Bulk update completed"
}
bulk_cost_update
```
## Advanced Usage
### Cost Tier Management
```bash
# Manage different cost tiers
set_cost_tier() {
local tier="$1"
case "$tier" in
"premium")
echo "Setting premium tier costs..."
tg-set-token-costs --model "gpt-4" -i 30.0 -o 60.0
tg-set-token-costs --model "claude-3-opus" -i 15.0 -o 75.0
;;
"standard")
echo "Setting standard tier costs..."
tg-set-token-costs --model "gpt-3.5-turbo" -i 0.5 -o 1.5
tg-set-token-costs --model "claude-3-sonnet" -i 3.0 -o 15.0
;;
"budget")
echo "Setting budget tier costs..."
tg-set-token-costs --model "claude-3-haiku" -i 0.25 -o 1.25
tg-set-token-costs --model "local-model" -i 0.0 -o 0.0
;;
*)
echo "Unknown tier: $tier"
echo "Available tiers: premium, standard, budget"
return 1
;;
esac
}
# Set costs for different tiers
set_cost_tier "premium"
set_cost_tier "standard"
set_cost_tier "budget"
```
### Dynamic Pricing Updates
```bash
# Update costs based on current market rates
update_dynamic_pricing() {
local pricing_api_url="$1" # Hypothetical pricing API
echo "Fetching current pricing from: $pricing_api_url"
# This would integrate with actual pricing APIs
# For demonstration, using static data
declare -A current_prices=(
["gpt-4"]="30.0:60.0"
["gpt-3.5-turbo"]="0.5:1.5"
["claude-3-sonnet"]="3.0:15.0"
)
for model in "${!current_prices[@]}"; do
IFS=':' read -r input_cost output_cost <<< "${current_prices[$model]}"
echo "Updating $model with current market rates..."
tg-set-token-costs --model "$model" -i "$input_cost" -o "$output_cost"
done
}
```
### Cost Validation
```bash
# Validate cost settings
validate_costs() {
local model="$1"
local input_cost="$2"
local output_cost="$3"
echo "Validating costs for $model..."
# Check cost reasonableness
if (( $(echo "$input_cost < 0" | bc -l) )); then
echo "Error: Input cost cannot be negative"
return 1
fi
if (( $(echo "$output_cost < 0" | bc -l) )); then
echo "Error: Output cost cannot be negative"
return 1
fi
# Check if output cost is typically higher
if (( $(echo "$output_cost < $input_cost" | bc -l) )); then
echo "Warning: Output cost is lower than input cost (unusual but not invalid)"
fi
# Check for extremely high costs
if (( $(echo "$input_cost > 100" | bc -l) )) || (( $(echo "$output_cost > 200" | bc -l) )); then
echo "Warning: Costs are unusually high"
fi
echo "Validation passed for $model"
return 0
}
# Validate before setting
if validate_costs "gpt-4" 30.0 60.0; then
tg-set-token-costs --model "gpt-4" -i 30.0 -o 60.0
fi
```
## Error Handling
### Missing Required Arguments
```bash
Exception: error: the following arguments are required: --model, -i/--input-costs, -o/--output-costs
```
**Solution**: Provide all required arguments: model ID, input cost, and output cost.
### Invalid Cost Values
```bash
Exception: argument -i/--input-costs: invalid float value
```
**Solution**: Ensure cost values are valid numbers (e.g., 1.5, not "1.5a").
### API Connection Issues
```bash
Exception: Connection refused
```
**Solution**: Check API URL and ensure TrustGraph is running.
### Configuration Access Errors
```bash
Exception: Access denied to configuration
```
**Solution**: Verify user permissions for configuration management.
## Cost Monitoring Integration
### Cost Verification
```bash
# Verify costs were set correctly
verify_costs() {
local model="$1"
echo "Verifying costs for model: $model"
# Check current settings
if costs=$(tg-show-token-costs | grep "$model"); then
echo "Current costs: $costs"
else
echo "Error: No costs found for model $model"
return 1
fi
}
# Set and verify
tg-set-token-costs --model "test-model" -i 1.0 -o 2.0
verify_costs "test-model"
```
### Cost Reporting Integration
```bash
# Generate cost report after updates
generate_cost_report() {
local report_file="cost_report_$(date +%Y%m%d_%H%M%S).txt"
echo "Cost Configuration Report - $(date)" > "$report_file"
echo "======================================" >> "$report_file"
tg-show-token-costs >> "$report_file"
echo "Report generated: $report_file"
}
# Update costs and generate report
tg-set-token-costs --model "gpt-4" -i 30.0 -o 60.0
generate_cost_report
```
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-show-token-costs`](tg-show-token-costs.md) - Display current token costs
- [`tg-show-config`](tg-show-config.md) - Show configuration settings
## API Integration
This command uses the [Config API](../apis/api-config.md) to store token cost configuration in TrustGraph's configuration system.
## Best Practices
1. **Regular Updates**: Keep costs current with market rates
2. **Validation**: Validate cost values before setting
3. **Documentation**: Document cost sources and update procedures
4. **Environment Consistency**: Maintain consistent costs across environments
5. **Monitoring**: Track cost changes over time
6. **Backup**: Export cost configurations for backup
7. **Automation**: Automate cost updates where possible
## Troubleshooting
### Costs Not Taking Effect
```bash
# Verify costs were set
tg-show-token-costs | grep "model-name"
# Check API connectivity
curl -s "$TRUSTGRAPH_URL/api/v1/config" > /dev/null
```
### Incorrect Cost Calculations
```bash
# Verify cost format (per million tokens)
# $30 per million tokens = 30.0, not 0.00003
# Check decimal precision
echo "scale=6; 30/1000000" | bc -l # This gives cost per token
```
### Permission Issues
```bash
# Check configuration access
tg-show-token-costs
# Verify user has admin privileges for cost management
```

# tg-show-config
Displays the current TrustGraph system configuration.
## Synopsis
```bash
tg-show-config [options]
```
## Description
The `tg-show-config` command retrieves and displays the complete TrustGraph system configuration in JSON format. This includes flow definitions, service configurations, and other system settings stored in the configuration service.
This is particularly useful for:
- Understanding the current system setup
- Debugging configuration issues
- Finding queue names for Pulsar integration
- Verifying flow definitions and interfaces
## Options
- `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
## Examples
### Display Complete Configuration
```bash
tg-show-config
```
### Using Custom API URL
```bash
tg-show-config -u http://production:8088/
```
## Output Format
The command outputs the configuration version followed by the complete configuration in JSON format:
```
Version: 42
{
"flows": {
"default": {
"class-name": "document-rag+graph-rag",
"description": "Default processing flow",
"interfaces": {
"agent": {
"request": "non-persistent://tg/request/agent:default",
"response": "non-persistent://tg/response/agent:default"
},
"graph-rag": {
"request": "non-persistent://tg/request/graph-rag:document-rag+graph-rag",
"response": "non-persistent://tg/response/graph-rag:document-rag+graph-rag"
},
"text-load": "persistent://tg/flow/text-document-load:default",
...
}
}
},
"prompts": {
"system": "You are a helpful AI assistant...",
"graph-rag": "Answer the question using the provided context..."
},
"token-costs": {
"gpt-4": {
"prompt": 0.03,
"completion": 0.06
}
},
...
}
```
## Configuration Sections
### Flow Definitions
Flow configurations showing:
- **class-name**: The flow class being used
- **description**: Human-readable flow description
- **interfaces**: Pulsar queue names for each service
### Prompt Templates
System and service-specific prompt templates used by AI services.
### Token Costs
Model pricing information for cost tracking and billing.
### Service Settings
Various service-specific configuration parameters.
## Finding Queue Names
The configuration output is essential for discovering Pulsar queue names:
### Flow-Hosted Services
Look in the `flows` section under `interfaces`:
```json
"graph-rag": {
"request": "non-persistent://tg/request/graph-rag:document-rag+graph-rag",
"response": "non-persistent://tg/response/graph-rag:document-rag+graph-rag"
}
```
### Fire-and-Forget Services
Some services only have input queues:
```json
"text-load": "persistent://tg/flow/text-document-load:default"
```
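For scripting, a `jq` filter can pull a single queue name out of the configuration. This sketch assumes `jq` is installed and uses the flow and service names from the example above; `tail -n +2` drops the `Version:` line so only the JSON reaches `jq`:

```shell
# Extract the graph-rag request queue for the default flow
tg-show-config | tail -n +2 | \
    jq -r '.flows["default"].interfaces["graph-rag"].request'
```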
## Error Handling
### Connection Errors
```bash
Exception: Connection refused
```
**Solution**: Verify the API URL and ensure TrustGraph is running.
### Authentication Errors
```bash
Exception: Unauthorized
```
**Solution**: Check authentication credentials and permissions.
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-put-flow-class`](tg-put-flow-class.md) - Update flow class definitions
- [`tg-show-flows`](tg-show-flows.md) - List active flows
- [`tg-set-prompt`](tg-set-prompt.md) - Configure prompt templates
- [`tg-set-token-costs`](tg-set-token-costs.md) - Configure token costs
## API Integration
This command uses the [Config API](../apis/api-config.md) with the `config` operation to retrieve the complete system configuration.
**API Call:**
```json
{
"operation": "config"
}
```
## Use Cases
### Development and Debugging
- Verify flow configurations are correct
- Check that services have proper queue assignments
- Debug configuration-related issues
### System Administration
- Monitor configuration changes over time
- Document current system setup
- Prepare for system migrations
### Integration Development
- Discover Pulsar queue names for direct integration
- Understand service interfaces and capabilities
- Verify API endpoint configurations
### Troubleshooting
- Check if flows are properly configured
- Verify prompt templates are set correctly
- Confirm token cost configurations

# tg-show-flow-classes
Lists all defined flow classes in TrustGraph with their descriptions and tags.
## Synopsis
```bash
tg-show-flow-classes [options]
```
## Description
The `tg-show-flow-classes` command displays a formatted table of all flow class definitions currently stored in TrustGraph. Each flow class is shown with its name, description, and associated tags.
Flow classes are templates that define the structure and services available for creating flow instances. This command helps you understand what flow classes are available for use.
## Options
### Optional Arguments
- `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
## Examples
### List All Flow Classes
```bash
tg-show-flow-classes
```
Output:
```
+-----------------+----------------------------------+----------------------+
| flow class | description | tags |
+-----------------+----------------------------------+----------------------+
| document-proc | Document processing pipeline | production, nlp |
| data-analysis | Data analysis and visualization | analytics, dev |
| web-scraper | Web content extraction flow | scraping, batch |
| chat-assistant | Conversational AI assistant | ai, interactive |
+-----------------+----------------------------------+----------------------+
```
### Using Custom API URL
```bash
tg-show-flow-classes -u http://production:8088/
```
### Filter Flow Classes
```bash
# Show only production-tagged flow classes
tg-show-flow-classes | grep "production"
# Count total flow classes (skip table borders and the header row)
tg-show-flow-classes | awk 'NR>3 && /^\|/' | wc -l
# Show flow classes with specific patterns
tg-show-flow-classes | grep -E "(document|text|nlp)"
```
## Output Format
The command displays results in a formatted table with columns:
- **flow class**: The unique name/identifier of the flow class
- **description**: Human-readable description of the flow class purpose
- **tags**: Comma-separated list of categorization tags
### Empty Results
If no flow classes exist:
```
No flows.
```
## Use Cases
### Flow Class Discovery
```bash
# Find available flow classes for document processing
tg-show-flow-classes | grep -i document
# List all AI-related flow classes
tg-show-flow-classes | grep -i "ai\|nlp\|chat\|assistant"
# Find development vs production flow classes
tg-show-flow-classes | grep -E "(dev|test|staging)"
tg-show-flow-classes | grep "production"
```
### Flow Class Management
```bash
# Get list of flow class names for scripting
tg-show-flow-classes | awk 'NR>3 && /^\|/ {gsub(/[| ]/, "", $2); print $2}' | grep -v "^$"
# Check if specific flow class exists
if tg-show-flow-classes | grep -q "target-flow"; then
echo "Flow class 'target-flow' exists"
else
echo "Flow class 'target-flow' not found"
fi
```
### Environment Comparison
```bash
# Compare flow classes between environments
echo "Development environment:"
tg-show-flow-classes -u http://dev:8088/
echo "Production environment:"
tg-show-flow-classes -u http://prod:8088/
```
### Reporting and Documentation
```bash
# Generate flow class inventory report
echo "Flow Class Inventory - $(date)" > flow-inventory.txt
echo "=====================================" >> flow-inventory.txt
tg-show-flow-classes >> flow-inventory.txt
# Create CSV export
echo "flow_class,description,tags" > flow-classes.csv
tg-show-flow-classes | awk 'NR>3 && /^\|/ {
gsub(/^\| */, "", $0); gsub(/ *\|$/, "", $0);
gsub(/ *\| */, ",", $0); print $0
}' >> flow-classes.csv
```
## Error Handling
### Connection Errors
```bash
Exception: Connection refused
```
**Solution**: Check the API URL and ensure TrustGraph is running.
### Permission Errors
```bash
Exception: Access denied to list flow classes
```
**Solution**: Verify user permissions for reading flow class definitions.
### Network Timeouts
```bash
Exception: Request timeout
```
**Solution**: Check network connectivity and API server status.
## Integration with Other Commands
### Flow Class Lifecycle
```bash
# 1. List available flow classes
tg-show-flow-classes
# 2. Get details of specific flow class
tg-get-flow-class -n "interesting-flow"
# 3. Start flow instance from class
tg-start-flow -n "interesting-flow" -i "my-instance"
# 4. Monitor flow instance
tg-show-flows | grep "my-instance"
```
### Bulk Operations
```bash
# Process all flow classes
tg-show-flow-classes | awk 'NR>3 && /^\|/ {gsub(/[| ]/, "", $2); if($2) print $2}' | \
while read class_name; do
if [ -n "$class_name" ]; then
echo "Processing flow class: $class_name"
tg-get-flow-class -n "$class_name" > "backup-$class_name.json"
fi
done
```
### Automated Validation
```bash
# Check flow class health
echo "Validating flow classes..."
tg-show-flow-classes | awk 'NR>3 && /^\|/ {gsub(/[| ]/, "", $2); if($2) print $2}' | \
while read class_name; do
if [ -n "$class_name" ]; then
echo -n "Checking $class_name... "
if tg-get-flow-class -n "$class_name" > /dev/null 2>&1; then
echo "OK"
else
echo "ERROR"
fi
fi
done
```
## Advanced Usage
### Flow Class Analysis
```bash
# Analyze flow class distribution by tags
tg-show-flow-classes | awk 'NR>3 && /^\|/ {
# Extract tags column
split($0, parts, "|");
tags = parts[4];
gsub(/^ *| *$/, "", tags);
if (tags) {
split(tags, tag_array, ",");
for (i in tag_array) {
gsub(/^ *| *$/, "", tag_array[i]);
if (tag_array[i]) print tag_array[i];
}
}
}' | sort | uniq -c | sort -nr
```
### Environment Synchronization
```bash
# Sync flow classes between environments
echo "Synchronizing flow classes from dev to staging..."
# Get list from development
dev_classes=$(tg-show-flow-classes -u http://dev:8088/ | \
awk 'NR>3 && /^\|/ {gsub(/[| ]/, "", $2); if($2) print $2}')
# Check each class in staging
for class in $dev_classes; do
if tg-show-flow-classes -u http://staging:8088/ | grep -q "$class"; then
echo "$class: Already exists in staging"
else
echo "$class: Missing in staging - needs sync"
# Get from dev and put to staging
tg-get-flow-class -n "$class" -u http://dev:8088/ > temp-class.json
tg-put-flow-class -n "$class" -c "$(cat temp-class.json)" -u http://staging:8088/
rm temp-class.json
fi
done
```
### Monitoring Script
```bash
#!/bin/bash
# monitor-flow-classes.sh
api_url="${1:-http://localhost:8088/}"
echo "Flow Class Monitoring Report - $(date)"
echo "API URL: $api_url"
echo "----------------------------------------"
# Total count (data rows are "|"-prefixed lines after the three header lines)
total=$(tg-show-flow-classes -u "$api_url" | awk 'NR>3 && /^\|/' | wc -l)
echo "Total flow classes: $total"
# Tag analysis
echo -e "\nTag distribution:"
tg-show-flow-classes -u "$api_url" | awk 'NR>3 && /^\|/ {
split($0, parts, "|");
tags = parts[4];
gsub(/^ *| *$/, "", tags);
if (tags) {
split(tags, tag_array, ",");
for (i in tag_array) {
gsub(/^ *| *$/, "", tag_array[i]);
if (tag_array[i]) print tag_array[i];
}
}
}' | sort | uniq -c | sort -nr
# Health check (process substitution keeps the counters in the current shell;
# a plain pipeline into "while read" would lose them in a subshell)
echo -e "\nHealth check:"
healthy=0
unhealthy=0
while read class_name; do
    if [ -n "$class_name" ]; then
        if tg-get-flow-class -n "$class_name" -u "$api_url" > /dev/null 2>&1; then
            healthy=$((healthy + 1))
        else
            unhealthy=$((unhealthy + 1))
            echo "  ERROR: $class_name"
        fi
    fi
done < <(tg-show-flow-classes -u "$api_url" | \
    awk 'NR>3 && /^\|/ {gsub(/[| ]/, "", $2); if($2) print $2}')
echo "Healthy: $healthy, Unhealthy: $unhealthy"
```
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-get-flow-class`](tg-get-flow-class.md) - Retrieve specific flow class definitions
- [`tg-put-flow-class`](tg-put-flow-class.md) - Create/update flow class definitions
- [`tg-delete-flow-class`](tg-delete-flow-class.md) - Delete flow class definitions
- [`tg-start-flow`](tg-start-flow.md) - Create flow instances from classes
- [`tg-show-flows`](tg-show-flows.md) - List active flow instances
## API Integration
This command uses the [Flow API](../apis/api-flow.md) with the `list-classes` operation to retrieve flow class listings.
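Assuming the same request shape as the other configuration commands, the underlying API call is likely:

**API Call:**
```json
{
  "operation": "list-classes"
}
```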
## Best Practices
1. **Regular Inventory**: Periodically review available flow classes
2. **Documentation**: Ensure flow classes have meaningful descriptions
3. **Tagging**: Use consistent tagging for better organization
4. **Cleanup**: Remove unused or deprecated flow classes
5. **Monitoring**: Include flow class health checks in monitoring
6. **Environment Parity**: Keep flow classes synchronized across environments
## Troubleshooting
### No Output
```bash
# If command returns no output, check API connectivity
tg-show-flow-classes -u http://localhost:8088/
# Verify TrustGraph is running and accessible
```
### Formatting Issues
```bash
# If table formatting is broken, check terminal width
export COLUMNS=120
tg-show-flow-classes
```
### Missing Flow Classes
```bash
# If expected flow classes are missing, verify:
# 1. Correct API URL
# 2. Database connectivity
# 3. Flow class definitions are properly stored
```

# tg-show-flow-state
Displays the processor states for a specific flow and its associated flow class.
## Synopsis
```bash
tg-show-flow-state [options]
```
## Description
The `tg-show-flow-state` command shows the current state of processors within a specific TrustGraph flow instance and its corresponding flow class. It queries the metrics system to determine which processing components are running and displays their status with visual indicators.
This command is essential for monitoring flow health and debugging processing issues.
## Options
### Optional Arguments
- `-f, --flow-id ID`: Flow instance ID to examine (default: `default`)
- `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
- `-m, --metrics-url URL`: Metrics API URL (default: `http://localhost:8088/api/metrics`)
## Examples
### Check Default Flow State
```bash
tg-show-flow-state
```
### Check Specific Flow
```bash
tg-show-flow-state -f "production-flow"
```
### Use Custom Metrics URL
```bash
tg-show-flow-state \
-f "research-flow" \
-m "http://metrics-server:8088/api/metrics"
```
### Check Flow in Different Environment
```bash
tg-show-flow-state \
-f "staging-flow" \
-u "http://staging:8088/" \
-m "http://staging:8088/api/metrics"
```
## Output Format
The command displays processor states for both the flow instance and its flow class:
```
Flow production-flow
- pdf-processor 💚
- text-extractor 💚
- embeddings-generator 💚
- knowledge-builder ❌
- document-indexer 💚
Class document-processing-v2
- base-pdf-processor 💚
- base-text-extractor 💚
- base-embeddings-generator 💚
- base-knowledge-builder 💚
- base-document-indexer 💚
```
### Status Indicators
- **💚 (Green Heart)**: Processor is running and healthy
- **❌ (Red X)**: Processor is not running or unhealthy
### Information Displayed
- **Flow Section**: Shows the state of processors in the specific flow instance
- **Class Section**: Shows the state of processors in the flow class template
- **Processor Names**: Individual processing components within the flow
## Use Cases
### Flow Health Monitoring
```bash
# Monitor flow health continuously
monitor_flow_health() {
local flow_id="$1"
local interval="${2:-30}" # Default 30 seconds
echo "Monitoring flow health: $flow_id"
echo "Refresh interval: ${interval}s"
echo "Press Ctrl+C to stop"
while true; do
clear
echo "Flow Health Monitor - $(date)"
echo "=============================="
tg-show-flow-state -f "$flow_id"
sleep "$interval"
done
}
# Monitor production flow
monitor_flow_health "production-flow" 15
```
### Debugging Processing Issues
```bash
# Comprehensive flow debugging
debug_flow_issues() {
local flow_id="$1"
echo "Debugging flow: $flow_id"
echo "======================="
# Check flow state
echo "1. Processor States:"
tg-show-flow-state -f "$flow_id"
# Check flow configuration
echo -e "\n2. Flow Configuration:"
tg-show-flows | grep "$flow_id"
# Check active processing
echo -e "\n3. Active Processing:"
tg-show-flows | grep -i processing
# Check system resources
echo -e "\n4. System Resources:"
free -h
df -h
echo -e "\nDebugging complete for: $flow_id"
}
# Debug specific flow
debug_flow_issues "problematic-flow"
```
### Multi-Flow Status Dashboard
```bash
# Create status dashboard for multiple flows
create_flow_dashboard() {
local flows=("$@")
echo "TrustGraph Flow Dashboard - $(date)"
echo "==================================="
for flow in "${flows[@]}"; do
echo -e "\n=== Flow: $flow ==="
tg-show-flow-state -f "$flow" 2>/dev/null || echo "Flow not found or inaccessible"
done
echo -e "\n=== Summary ==="
echo "Total flows monitored: ${#flows[@]}"
echo "Dashboard generated: $(date)"
}
# Monitor multiple flows
flows=("production-flow" "research-flow" "development-flow")
create_flow_dashboard "${flows[@]}"
```
### Automated Health Checks
```bash
# Automated health check with alerts
health_check_with_alerts() {
local flow_id="$1"
local alert_email="$2"
echo "Performing health check for: $flow_id"
# Capture flow state
flow_state=$(tg-show-flow-state -f "$flow_id" 2>&1)
if [ $? -ne 0 ]; then
echo "ERROR: Failed to get flow state"
# Send alert email if configured
if [ -n "$alert_email" ]; then
echo "Flow $flow_id is not responding" | mail -s "TrustGraph Alert" "$alert_email"
fi
return 1
fi
# Check for failed processors
failed_count=$(echo "$flow_state" | grep -c "❌")
if [ "$failed_count" -gt 0 ]; then
echo "WARNING: $failed_count processors are not running"
echo "$flow_state"
# Send alert if configured
if [ -n "$alert_email" ]; then
echo -e "Flow $flow_id has $failed_count failed processors:\n\n$flow_state" | \
mail -s "TrustGraph Health Alert" "$alert_email"
fi
return 1
else
echo "✓ All processors are running normally"
return 0
fi
}
# Run health check
health_check_with_alerts "production-flow" "admin@company.com"
```
## Advanced Usage
### Flow State Comparison
```bash
# Compare flow states between environments
compare_flow_states() {
local flow_id="$1"
local env1_url="$2"
local env2_url="$3"
echo "Comparing flow state: $flow_id"
echo "Environment 1: $env1_url"
echo "Environment 2: $env2_url"
echo "================================"
# Get states from both environments
echo "Environment 1 State:"
tg-show-flow-state -f "$flow_id" -u "$env1_url" -m "$env1_url/api/metrics"
echo -e "\nEnvironment 2 State:"
tg-show-flow-state -f "$flow_id" -u "$env2_url" -m "$env2_url/api/metrics"
echo -e "\nComparison complete"
}
# Compare production vs staging
compare_flow_states "main-flow" "http://prod:8088" "http://staging:8088"
```
### Historical State Tracking
```bash
# Track flow state over time
track_flow_state_history() {
local flow_id="$1"
local log_file="flow_state_history.log"
local interval="${2:-60}" # Default 1 minute
echo "Starting flow state tracking: $flow_id"
echo "Log file: $log_file"
echo "Interval: ${interval}s"
while true; do
timestamp=$(date '+%Y-%m-%d %H:%M:%S')
# Get current state
state_output=$(tg-show-flow-state -f "$flow_id" 2>&1)
if [ $? -eq 0 ]; then
# Count healthy and failed processors
healthy_count=$(echo "$state_output" | grep -c "💚")
failed_count=$(echo "$state_output" | grep -c "❌")
# Log summary
echo "$timestamp,$flow_id,$healthy_count,$failed_count" >> "$log_file"
# If there are failures, log details
if [ "$failed_count" -gt 0 ]; then
echo "$timestamp - FAILURES DETECTED in $flow_id:" >> "${log_file}.detailed"
echo "$state_output" >> "${log_file}.detailed"
echo "---" >> "${log_file}.detailed"
fi
else
echo "$timestamp,$flow_id,ERROR,ERROR" >> "$log_file"
fi
sleep "$interval"
done
}
# Start tracking (run in background)
track_flow_state_history "production-flow" 30 &
```
### State-Based Actions
```bash
# Perform actions based on flow state
state_based_actions() {
local flow_id="$1"
echo "Checking flow state for automated actions: $flow_id"
# Get current state
state_output=$(tg-show-flow-state -f "$flow_id")
if [ $? -ne 0 ]; then
echo "ERROR: Cannot get flow state"
return 1
fi
# Check specific processors
if echo "$state_output" | grep -q "pdf-processor.*❌"; then
echo "PDF processor is down - attempting restart..."
# Restart specific processor (this would need additional commands)
# restart_processor "$flow_id" "pdf-processor"
fi
if echo "$state_output" | grep -q "embeddings-generator.*❌"; then
echo "Embeddings generator is down - checking dependencies..."
# Check GPU availability, memory, etc.
nvidia-smi 2>/dev/null || echo "GPU not available"
fi
# Count total failures
failed_count=$(echo "$state_output" | grep -c "❌")
if [ "$failed_count" -gt 3 ]; then
echo "CRITICAL: More than 3 processors failed - considering flow restart"
# This would trigger more serious recovery actions
fi
}
```
### Performance Correlation
```bash
# Correlate flow state with performance metrics
correlate_state_performance() {
local flow_id="$1"
local metrics_url="$2"
echo "Correlating flow state with performance for: $flow_id"
# Get flow state
state_output=$(tg-show-flow-state -f "$flow_id" -m "$metrics_url")
healthy_count=$(echo "$state_output" | grep -c "💚")
failed_count=$(echo "$state_output" | grep -c "❌")
echo "Processors - Healthy: $healthy_count, Failed: $failed_count"
# Get performance metrics (this would need additional API calls)
# throughput=$(get_flow_throughput "$flow_id" "$metrics_url")
# latency=$(get_flow_latency "$flow_id" "$metrics_url")
# echo "Performance - Throughput: ${throughput}/min, Latency: ${latency}ms"
# Calculate health ratio
total_processors=$((healthy_count + failed_count))
if [ "$total_processors" -gt 0 ]; then
health_ratio=$(echo "scale=2; $healthy_count * 100 / $total_processors" | bc)
echo "Health ratio: ${health_ratio}%"
fi
}
```
## Integration with Monitoring Systems
### Prometheus Integration
```bash
# Export flow state metrics to Prometheus format
export_prometheus_metrics() {
local flow_id="$1"
local metrics_file="flow_state_metrics.prom"
# Get flow state
state_output=$(tg-show-flow-state -f "$flow_id")
# Count states
healthy_count=$(echo "$state_output" | grep -c "💚")
failed_count=$(echo "$state_output" | grep -c "❌")
# Generate Prometheus metrics
cat > "$metrics_file" << EOF
# HELP trustgraph_flow_processors_healthy Number of healthy processors in flow
# TYPE trustgraph_flow_processors_healthy gauge
trustgraph_flow_processors_healthy{flow_id="$flow_id"} $healthy_count
# HELP trustgraph_flow_processors_failed Number of failed processors in flow
# TYPE trustgraph_flow_processors_failed gauge
trustgraph_flow_processors_failed{flow_id="$flow_id"} $failed_count
# HELP trustgraph_flow_health_ratio Ratio of healthy processors
# TYPE trustgraph_flow_health_ratio gauge
EOF
total=$((healthy_count + failed_count))
if [ "$total" -gt 0 ]; then
ratio=$(echo "scale=4; $healthy_count / $total" | bc)
echo "trustgraph_flow_health_ratio{flow_id=\"$flow_id\"} $ratio" >> "$metrics_file"
fi
echo "Prometheus metrics exported to: $metrics_file"
}
```
### Grafana Dashboard Data
```bash
# Generate data for Grafana dashboard
generate_grafana_data() {
local flows=("$@")
local output_file="grafana_flow_data.json"
echo "Generating Grafana dashboard data..."
echo "{" > "$output_file"
echo " \"flows\": [" >> "$output_file"
for i in "${!flows[@]}"; do
flow="${flows[$i]}"
# Get flow state
state_output=$(tg-show-flow-state -f "$flow" 2>/dev/null)
if [ $? -eq 0 ]; then
healthy=$(echo "$state_output" | grep -c "💚")
failed=$(echo "$state_output" | grep -c "❌")
else
healthy=0
failed=0
fi
echo " {" >> "$output_file"
echo " \"flow_id\": \"$flow\"," >> "$output_file"
echo " \"healthy_processors\": $healthy," >> "$output_file"
echo " \"failed_processors\": $failed," >> "$output_file"
echo " \"timestamp\": \"$(date -Iseconds)\"" >> "$output_file"
if [ $i -lt $((${#flows[@]} - 1)) ]; then
echo " }," >> "$output_file"
else
echo " }" >> "$output_file"
fi
done
echo " ]" >> "$output_file"
echo "}" >> "$output_file"
echo "Grafana data generated: $output_file"
}
```
## Error Handling
### Flow Not Found
```bash
Exception: Flow 'nonexistent-flow' not found
```
**Solution**: Verify the flow ID exists with `tg-show-flows`.
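This check can be scripted as a guard before querying state. A minimal sketch, assuming the `| id | <flow-id> |` row layout produced by `tg-show-flows`:

```bash
# Succeeds when the given flow ID appears in tg-show-flows output
# (sketch; assumes the "| id | <flow-id> |" row layout).
flow_listed() {
    grep -q "| id .*| $1 "
}

# Usage:
# tg-show-flows | flow_listed "research-flow" || echo "flow not found"
```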
### Metrics API Unavailable
```bash
Exception: Connection refused to metrics API
```
**Solution**: Check metrics URL and ensure metrics service is running.
### Permission Issues
```bash
Exception: Access denied to metrics
```
**Solution**: Verify permissions for accessing metrics and flow information.
### Invalid Flow State
```bash
Exception: Unable to parse flow state
```
**Solution**: Check if the flow is properly initialized and processors are configured.
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-show-flows`](tg-show-flows.md) - List all flows
- [`tg-show-processor-state`](tg-show-processor-state.md) - Show all processor states
- [`tg-start-flow`](tg-start-flow.md) - Start flow instances
- [`tg-stop-flow`](tg-stop-flow.md) - Stop flow instances
## API Integration
This command integrates with:
- TrustGraph Flow API for flow information
- Prometheus/Metrics API for processor state information
## Best Practices
1. **Regular Monitoring**: Check flow states regularly in production
2. **Automated Alerts**: Set up automated health checks with alerting
3. **Historical Tracking**: Maintain historical flow state data
4. **Integration**: Integrate with monitoring systems like Prometheus/Grafana
5. **Documentation**: Document expected processor configurations
6. **Correlation**: Correlate flow state with performance metrics
7. **Recovery Procedures**: Develop automated recovery procedures for common failures
## Troubleshooting
### No Processors Shown
```bash
# Check if flow exists
tg-show-flows | grep "flow-id"
# Verify metrics service
curl -s http://localhost:8088/api/metrics/query?query=processor_info
```
### Inconsistent States
```bash
# Check metrics service health
curl -s http://localhost:8088/api/metrics/health
# Restart metrics collection if needed
```
### Connection Issues
```bash
# Test API connectivity
curl -s http://localhost:8088/api/v1/flows
# Test metrics connectivity
curl -s http://localhost:8088/api/metrics/query?query=up
```

# tg-show-flows
Shows configured flows with their interfaces and queue information.
## Synopsis
```bash
tg-show-flows [options]
```
## Description
The `tg-show-flows` command displays all currently configured flow instances, including their identifiers, class names, descriptions, and available service interfaces with corresponding Pulsar queue names.
This command is essential for understanding what flows are available, discovering service endpoints, and finding Pulsar queue names for direct API integration.
## Options
- `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
## Examples
### Show All Flows
```bash
tg-show-flows
```
### Using Custom API URL
```bash
tg-show-flows -u http://production:8088/
```
## Output Format
The command displays each flow in a formatted table with the following information:
```
+-------+---------------------------+
| id | research-flow |
| class | document-rag+graph-rag |
| desc | Research document pipeline |
| queue | agent request: non-persistent://tg/request/agent:default |
| | agent response: non-persistent://tg/response/agent:default |
| | graph-rag request: non-persistent://tg/request/graph-rag:document-rag+graph-rag |
| | graph-rag response: non-persistent://tg/response/graph-rag:document-rag+graph-rag |
| | text-load: persistent://tg/flow/text-document-load:default |
+-------+---------------------------+
+-------+---------------------------+
| id | medical-analysis |
| class | medical-nlp |
| desc | Medical document analysis |
| queue | embeddings request: non-persistent://tg/request/embeddings:medical-nlp |
| | embeddings response: non-persistent://tg/response/embeddings:medical-nlp |
| | document-load: persistent://tg/flow/document-load:medical-analysis |
+-------+---------------------------+
```
### No Flows Available
```bash
No flows.
```
## Interface Types
The queue information shows two types of service interfaces:
### Request/Response Services
Services that accept requests and return responses:
```
agent request: non-persistent://tg/request/agent:default
agent response: non-persistent://tg/response/agent:default
```
### Fire-and-Forget Services
Services that accept data without returning responses:
```
text-load: persistent://tg/flow/text-document-load:default
```
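When scripting against this output, an interface's queue name can be extracted with standard text tools. A minimal sketch, assuming the `interface-name: queue-name` layout shown above:

```bash
# Print the Pulsar queue for a named interface from tg-show-flows output
# (sketch; assumes the "interface-name: queue-name" layout shown above).
queue_for() {
    grep -o "$1: [^ |]*" | head -1 | sed "s/^$1: //"
}

# Usage:
# tg-show-flows | queue_for "text-load"
```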
## Service Interface Discovery
Use this command to discover available services and their queue names:
### Common Request/Response Services
- **agent**: Interactive Q&A service
- **graph-rag**: Graph-based retrieval augmented generation
- **document-rag**: Document-based retrieval augmented generation
- **text-completion**: LLM text completion service
- **prompt**: Prompt-based text generation
- **embeddings**: Text embedding generation
- **graph-embeddings**: Graph entity embeddings
- **triples**: Knowledge graph triple queries
### Common Fire-and-Forget Services
- **text-load**: Text document loading
- **document-load**: Document file loading
- **triples-store**: Knowledge graph storage
- **graph-embeddings-store**: Graph embedding storage
- **document-embeddings-store**: Document embedding storage
- **entity-contexts-load**: Entity context loading
## Queue Name Patterns
### Flow-Hosted Request/Response
```
non-persistent://tg/request/{service}:{flow-class}
non-persistent://tg/response/{service}:{flow-class}
```
### Flow-Hosted Fire-and-Forget
```
persistent://tg/flow/{service}:{flow-id}
```
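When constructing queue names programmatically, the patterns above reduce to simple string templates. A sketch of the documented patterns (not an official client API):

```bash
# Build Pulsar queue names from the documented patterns (sketch).
request_queue()  { echo "non-persistent://tg/request/$1:$2"; }   # service, flow class
response_queue() { echo "non-persistent://tg/response/$1:$2"; }  # service, flow class
flow_queue()     { echo "persistent://tg/flow/$1:$2"; }          # service, flow ID

# Usage:
# request_queue "graph-rag" "document-rag+graph-rag"
```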
## Error Handling
### Connection Errors
```bash
Exception: Connection refused
```
**Solution**: Verify the API URL and ensure TrustGraph is running.
### Authentication Errors
```bash
Exception: Unauthorized
```
**Solution**: Check authentication credentials and permissions.
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-start-flow`](tg-start-flow.md) - Start a new flow instance
- [`tg-stop-flow`](tg-stop-flow.md) - Stop a running flow
- [`tg-show-flow-classes`](tg-show-flow-classes.md) - List available flow classes
- [`tg-show-flow-state`](tg-show-flow-state.md) - Show detailed flow status
- [`tg-show-config`](tg-show-config.md) - Show complete system configuration
## API Integration
This command uses the [Flow API](../apis/api-flow.md) to list flows and the [Config API](../apis/api-config.md) to retrieve interface descriptions.
## Use Cases
### Service Discovery
Find available services and their endpoints:
```bash
# List all flows and their services
tg-show-flows
# Use discovered queue names for direct Pulsar integration
```
### System Monitoring
Monitor active flows and their configurations:
```bash
# Check what flows are running
tg-show-flows
# Verify flow services are properly configured
```
### Development and Debugging
Understand flow configurations during development:
```bash
# Check if flow started correctly
tg-start-flow -n "my-class" -i "test-flow" -d "Test"
tg-show-flows
# Verify service interfaces are available
```
### Integration Planning
Plan API integrations by understanding available services:
```bash
# Discover queue names for Pulsar clients
tg-show-flows | grep "graph-rag request"
# Find WebSocket endpoints for real-time services
```
## Output Interpretation
### Flow Information
- **id**: Unique flow instance identifier
- **class**: Flow class name used to create the instance
- **desc**: Human-readable flow description
- **queue**: Service interfaces and their Pulsar queue names
### Queue Names
Queue names indicate:
- **Persistence**: `persistent://` vs `non-persistent://`
- **Tenant**: Usually `tg`
- **Namespace**: `request`, `response`, or `flow`
- **Service**: The specific service name
- **Flow Identifier**: Either flow class or flow ID
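These components can be split apart mechanically. A sketch that decomposes a queue name into its parts using shell parameter expansion:

```bash
# Split a Pulsar queue name into its documented components (sketch).
# e.g. non-persistent://tg/request/agent:default
parse_queue() {
    q="$1"
    persistence="${q%%://*}"        # persistent | non-persistent
    rest="${q#*://}"                # tg/request/agent:default
    tenant="${rest%%/*}"
    rest="${rest#*/}"
    namespace="${rest%%/*}"         # request | response | flow
    name="${rest#*/}"               # agent:default
    echo "$persistence $tenant $namespace ${name%%:*} ${name#*:}"
}

# Usage:
# parse_queue "non-persistent://tg/request/agent:default"
```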
## Best Practices
1. **Regular Monitoring**: Check flows regularly to ensure they're running correctly
2. **Queue Documentation**: Save queue names for API integration documentation
3. **Flow Lifecycle**: Use in conjunction with flow start/stop commands
4. **Capacity Planning**: Monitor number of active flows for resource planning
5. **Service Discovery**: Use output to understand available capabilities

# tg-show-graph
Displays knowledge graph triples (edges) from the TrustGraph system.
## Synopsis
```bash
tg-show-graph [options]
```
## Description
The `tg-show-graph` command queries the knowledge graph and displays up to 10,000 triples (subject-predicate-object relationships) in a human-readable format. This is useful for exploring knowledge graph contents, debugging knowledge loading, and understanding the structure of stored knowledge.
Each triple represents a fact or relationship in the knowledge graph, showing how entities are connected through various predicates.
## Options
- `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
- `-f, --flow-id FLOW`: Flow ID to query (default: `default`)
- `-U, --user USER`: User identifier (default: `trustgraph`)
- `-C, --collection COLLECTION`: Collection identifier (default: `default`)
## Examples
### Display All Graph Triples
```bash
tg-show-graph
```
### Query Specific Flow
```bash
tg-show-graph -f research-flow
```
### Query User's Collection
```bash
tg-show-graph -U researcher -C medical-papers
```
### Using Custom API URL
```bash
tg-show-graph -u http://production:8088/
```
## Output Format
The command displays triples in subject-predicate-object format:
```
<Person1> <hasName> "John Doe"
<Person1> <worksAt> <Organization1>
<Organization1> <hasName> "Acme Corporation"
<Organization1> <locatedIn> <City1>
<City1> <hasName> "New York"
<Document1> <createdBy> <Person1>
<Document1> <hasTitle> "Research Report"
<Document1> <publishedIn> "2024"
```
### Triple Components
- **Subject**: The entity the statement is about (usually a URI)
- **Predicate**: The relationship or property (usually a URI)
- **Object**: The value or target entity (can be URI or literal)
### URI vs Literal Values
- **URIs**: Enclosed in angle brackets `<Entity1>`
- **Literals**: Enclosed in quotes `"Literal Value"`
### Common Predicates
- `<hasName>`: Entity names
- `<hasTitle>`: Document titles
- `<createdBy>`: Authorship relationships
- `<worksAt>`: Employment relationships
- `<locatedIn>`: Location relationships
- `<publishedIn>`: Publication information
- `<dc:creator>`: Dublin Core creator
- `<foaf:name>`: Friend of a Friend name
## Data Limitations
### 10,000 Triple Limit
The command displays up to 10,000 triples to prevent overwhelming output. For larger graphs:
```bash
# Use graph export for complete data
tg-graph-to-turtle -o complete-graph.ttl
# Use targeted queries for specific data
tg-invoke-graph-rag -q "Show me information about specific entities"
```
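Because output stops silently at the limit, a line count is a quick way to detect truncation. A sketch, assuming one triple per output line:

```bash
# Warn when saved tg-show-graph output appears truncated at the
# 10,000-triple limit (sketch; assumes one triple per line).
check_triple_limit() {
    count=$(wc -l < "$1")
    echo "Triples returned: $count"
    if [ "$count" -ge 10000 ]; then
        echo "WARNING: hit the 10,000-triple limit; use tg-graph-to-turtle for a full export"
    fi
}

# Usage:
# tg-show-graph > triples.txt && check_triple_limit triples.txt
```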
### Collection Scope
Results are limited to the specified user and collection. To see all data:
```bash
# Query different collections
tg-show-graph -C collection1
tg-show-graph -C collection2
```
## Knowledge Graph Structure
### Entity Types
Common entity types in the output:
- **Documents**: Research papers, reports, manuals
- **People**: Authors, researchers, employees
- **Organizations**: Companies, institutions, publishers
- **Concepts**: Technical terms, topics, categories
- **Events**: Publications, meetings, processes
### Relationship Types
Common relationship types:
- **Authorship**: Who created what
- **Membership**: Who belongs to what organization
- **Hierarchical**: Parent-child relationships
- **Temporal**: When things happened
- **Topical**: What topics are related
## Error Handling
### Flow Not Available
```bash
Exception: Invalid flow
```
**Solution**: Verify the flow exists and is running with `tg-show-flows`.
### No Data Available
```bash
# Empty output (no triples displayed)
```
**Solution**: Check if knowledge has been loaded using `tg-show-kg-cores` and `tg-load-kg-core`.
### Connection Errors
```bash
Exception: Connection refused
```
**Solution**: Check the API URL and ensure TrustGraph is running.
### Permission Errors
```bash
Exception: Access denied
```
**Solution**: Verify user permissions for the specified collection.
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-graph-to-turtle`](tg-graph-to-turtle.md) - Export graph to Turtle format
- [`tg-load-kg-core`](tg-load-kg-core.md) - Load knowledge into graph
- [`tg-show-kg-cores`](tg-show-kg-cores.md) - List available knowledge cores
- [`tg-invoke-graph-rag`](tg-invoke-graph-rag.md) - Query graph with natural language
- [`tg-load-turtle`](tg-load-turtle.md) - Import RDF data from Turtle files
## API Integration
This command uses the [Triples Query API](../apis/api-triples-query.md) to retrieve knowledge graph triples with no filtering constraints.
## Use Cases
### Knowledge Exploration
```bash
# Explore what knowledge is available
tg-show-graph | head -50
# Look for specific entities
tg-show-graph | grep "Einstein"
```
### Data Verification
```bash
# Verify knowledge loading worked correctly
tg-load-kg-core --kg-core-id "research-data" --flow-id "research-flow"
tg-show-graph -f research-flow | wc -l
```
### Debugging Knowledge Issues
```bash
# Check if specific relationships exist
tg-show-graph | grep "hasName"
tg-show-graph | grep "createdBy"
```
### Graph Analysis
```bash
# Count different relationship types
tg-show-graph | awk '{print $2}' | sort | uniq -c
# Find most connected entities
tg-show-graph | awk '{print $1}' | sort | uniq -c | sort -nr
```
### Data Quality Assessment
```bash
# Check for malformed triples
tg-show-graph | grep -v "^<.*> <.*>"
# Verify URI patterns
tg-show-graph | grep "http://" | head -20
```
## Output Processing
### Filter by Predicate
```bash
# Show only name relationships
tg-show-graph | grep "hasName"
# Show only authorship
tg-show-graph | grep "createdBy"
```
### Extract Entities
```bash
# List all subjects (entities)
tg-show-graph | awk '{print $1}' | sort | uniq
# List all predicates (relationships)
tg-show-graph | awk '{print $2}' | sort | uniq
```
### Export Subsets
```bash
# Save specific relationships
tg-show-graph | grep "Organization" > organization-data.txt
# Save person-related triples
tg-show-graph | grep "Person" > person-data.txt
```
## Performance Considerations
### Large Graphs
For graphs with many triples:
- Command may take time to retrieve 10,000 triples
- Consider using filtered queries for specific data
- Use `tg-graph-to-turtle` for complete export
### Memory Usage
- Output is streamed, so memory usage is manageable
- Piping to other commands processes data incrementally
## Best Practices
1. **Start Small**: Begin with small collections to understand structure
2. **Use Filters**: Pipe output through grep/awk for specific data
3. **Regular Inspection**: Periodically check graph contents
4. **Data Validation**: Verify expected relationships exist
5. **Performance Monitoring**: Monitor query time for large graphs
6. **Collection Organization**: Use collections to organize different domains
## Integration Examples
### With Other Tools
```bash
# Convert to different formats
tg-show-graph | sed 's/[<>"]//g' > simple-triples.txt
# Create entity lists
tg-show-graph | awk '{print $1}' | sort | uniq > entities.txt
# Generate statistics
echo "Total triples in graph: $(tg-show-graph | wc -l)"
```
### Graph Exploration Workflow
```bash
# 1. Check available knowledge
tg-show-kg-cores
# 2. Load knowledge into flow
tg-load-kg-core --kg-core-id "my-knowledge" --flow-id "my-flow"
# 3. Explore the graph
tg-show-graph -f my-flow
# 4. Query specific information
tg-invoke-graph-rag -q "What entities are in the graph?" -f my-flow
```

# tg-show-kg-cores
Shows available knowledge cores in the TrustGraph system.
## Synopsis
```bash
tg-show-kg-cores [options]
```
## Description
The `tg-show-kg-cores` command lists all knowledge cores available in the TrustGraph system for a specific user. Knowledge cores contain structured knowledge (RDF triples and graph embeddings) that can be loaded into flows for processing and querying.
This command is useful for discovering what knowledge resources are available, managing knowledge core inventories, and preparing for knowledge loading operations.
## Options
- `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
- `-U, --user USER`: User identifier (default: `trustgraph`)
## Examples
### List All Knowledge Cores
```bash
tg-show-kg-cores
```
### List Cores for Specific User
```bash
tg-show-kg-cores -U researcher
```
### Using Custom API URL
```bash
tg-show-kg-cores -u http://production:8088/
```
## Output Format
The command lists knowledge core identifiers, one per line:
```
medical-knowledge-v1
research-papers-2024
legal-documents-core
technical-specifications
climate-data-march
```
### No Knowledge Cores
```bash
No knowledge cores.
```
## Knowledge Core Naming
Knowledge cores typically follow naming conventions that include:
- **Domain**: `medical-`, `legal-`, `technical-`
- **Content Type**: `papers-`, `documents-`, `data-`
- **Version/Date**: `v1`, `2024`, `march`
Example patterns:
- `medical-knowledge-v2.1`
- `research-papers-2024-q1`
- `legal-documents-updated`
- `technical-specs-current`
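Such a convention can be checked mechanically. A sketch whose pattern reflects the example names above (TrustGraph itself does not enforce any naming rule):

```bash
# Check whether a core ID matches a domain-content-version style name
# (sketch; the pattern encodes the examples above, not an enforced rule).
follows_convention() {
    echo "$1" | grep -Eq '^[a-z]+(-[a-z0-9.]+)+$'
}

# Usage:
# tg-show-kg-cores | while read -r core; do
#     follows_convention "$core" || echo "non-standard name: $core"
# done
```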
## Related Operations
After discovering knowledge cores, you can:
### Load into Flow
```bash
# Load core into active flow
tg-load-kg-core --kg-core-id "medical-knowledge-v1" --flow-id "medical-flow"
```
### Examine Contents
```bash
# Export core for examination
tg-get-kg-core --id "research-papers-2024" -o examination.msgpack
```
### Remove Unused Cores
```bash
# Delete obsolete cores
tg-delete-kg-core --id "old-knowledge-v1" -U researcher
```
## Error Handling
### Connection Errors
```bash
Exception: Connection refused
```
**Solution**: Verify the API URL and ensure TrustGraph is running.
### Authentication Errors
```bash
Exception: Unauthorized
```
**Solution**: Check authentication credentials and user permissions.
### User Not Found
```bash
Exception: User not found
```
**Solution**: Verify the user identifier exists in the system.
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-put-kg-core`](tg-put-kg-core.md) - Store knowledge core from file
- [`tg-get-kg-core`](tg-get-kg-core.md) - Retrieve knowledge core to file
- [`tg-load-kg-core`](tg-load-kg-core.md) - Load knowledge core into flow
- [`tg-delete-kg-core`](tg-delete-kg-core.md) - Remove knowledge core
- [`tg-unload-kg-core`](tg-unload-kg-core.md) - Unload knowledge core from flow
## API Integration
This command uses the [Knowledge API](../apis/api-knowledge.md) with the `list-kg-cores` operation to retrieve available knowledge cores.
## Use Cases
### Knowledge Inventory
```bash
# Check what knowledge is available
tg-show-kg-cores
# Document available knowledge resources
tg-show-kg-cores > knowledge-inventory.txt
```
### Pre-Processing Verification
```bash
# Verify knowledge cores exist before loading
tg-show-kg-cores | grep "medical"
tg-load-kg-core --kg-core-id "medical-knowledge-v1" --flow-id "medical-flow"
```
### Multi-User Management
```bash
# Check knowledge for different users
tg-show-kg-cores -U researcher
tg-show-kg-cores -U analyst
tg-show-kg-cores -U admin
```
### Knowledge Discovery
```bash
# Find knowledge cores by pattern
tg-show-kg-cores | grep "2024"
tg-show-kg-cores | grep "medical"
tg-show-kg-cores | grep "v[0-9]"
```
### System Administration
```bash
# Audit knowledge core usage
while read -r user; do
echo "User: $user"
tg-show-kg-cores -U "$user"
echo
done < users.txt
```
### Development Workflow
```bash
# Check development knowledge cores
tg-show-kg-cores -U developer | grep "test"
# Load test knowledge for development
tg-load-kg-core --kg-core-id "test-knowledge" --flow-id "dev-flow"
```
## Knowledge Core Lifecycle
1. **Creation**: Knowledge cores created via `tg-put-kg-core` or document processing
2. **Discovery**: Use `tg-show-kg-cores` to find available cores
3. **Loading**: Load cores into flows with `tg-load-kg-core`
4. **Usage**: Query loaded knowledge via RAG or agent services
5. **Management**: Update, backup, or remove cores as needed
## Best Practices
1. **Regular Inventory**: Check available knowledge cores regularly
2. **Naming Conventions**: Use consistent naming for easier discovery
3. **User Organization**: Organize knowledge cores by user and purpose
4. **Version Management**: Track knowledge core versions and updates
5. **Cleanup**: Remove obsolete knowledge cores to save storage
6. **Documentation**: Document knowledge core contents and purposes
## Integration with Other Commands
### Knowledge Loading Workflow
```bash
# 1. Discover available knowledge
tg-show-kg-cores
# 2. Start appropriate flow
tg-start-flow -n "research-class" -i "research-flow" -d "Research analysis"
# 3. Load relevant knowledge
tg-load-kg-core --kg-core-id "research-papers-2024" --flow-id "research-flow"
# 4. Query the knowledge
tg-invoke-graph-rag -q "What are the latest research trends?" -f "research-flow"
```
### Knowledge Management Workflow
```bash
# 1. Audit current knowledge
tg-show-kg-cores > current-cores.txt
# 2. Import new knowledge
tg-put-kg-core --id "new-research-2024" -i new-research.msgpack
# 3. Verify import
tg-show-kg-cores | grep "new-research-2024"
# 4. Remove old versions
tg-delete-kg-core --id "old-research-2023"
```

# tg-show-library-documents
Lists all documents stored in the TrustGraph document library with their metadata.
## Synopsis
```bash
tg-show-library-documents [options]
```
## Description
The `tg-show-library-documents` command displays all documents currently stored in TrustGraph's document library. For each document, it shows comprehensive metadata including ID, timestamp, title, document type, comments, and associated tags.
The document library serves as a centralized repository for managing documents before and after processing through TrustGraph workflows.
## Options
### Optional Arguments
- `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
- `-U, --user USER`: User ID to filter documents (default: `trustgraph`)
## Examples
### List All Documents
```bash
tg-show-library-documents
```
### List Documents for Specific User
```bash
tg-show-library-documents -U "research-team"
```
### Using Custom API URL
```bash
tg-show-library-documents -u http://production:8088/
```
## Output Format
The command displays each document in a formatted table:
```
+-------+----------------------------------+
| id | doc_123456789 |
| time | 2023-12-15 10:30:45 |
| title | Technical Manual v2.1 |
| kind | PDF |
| note | Updated installation procedures |
| tags | technical, manual, v2.1 |
+-------+----------------------------------+
+-------+----------------------------------+
| id | doc_987654321 |
| time | 2023-12-14 15:22:10 |
| title | Q4 Financial Report |
| kind | PDF |
| note | Quarterly analysis and metrics |
| tags | finance, quarterly, 2023 |
+-------+----------------------------------+
```
### Document Properties
- **id**: Unique document identifier
- **time**: Upload/creation timestamp
- **title**: Document title or name
- **kind**: Document type (PDF, DOCX, TXT, etc.)
- **note**: Comments or description
- **tags**: Comma-separated list of tags
### Empty Results
If no documents exist:
```
No documents.
```
## Use Cases
### Document Inventory
```bash
# Get complete document inventory
tg-show-library-documents > document-inventory.txt
# Count total documents
tg-show-library-documents | grep -c "| id"
```
### Document Discovery
```bash
# Find documents by title pattern
tg-show-library-documents | grep -i "manual"
# Find documents by type
tg-show-library-documents | grep "| kind.*PDF"
# Find recent documents
tg-show-library-documents | grep "2023-12"
```
### User-Specific Queries
```bash
# List documents by different users
users=("research-team" "finance-dept" "legal-team")
for user in "${users[@]}"; do
echo "Documents for $user:"
tg-show-library-documents -U "$user"
echo "---"
done
```
### Document Management
```bash
# Extract document IDs for processing
tg-show-library-documents | \
grep "| id" | \
awk '{print $4}' > document-ids.txt
# Find documents by tags
tg-show-library-documents | \
grep -A5 -B5 "research" | \
grep "| id" | \
awk '{print $4}'
```
## Advanced Usage
### Document Analysis
```bash
# Analyze document distribution by type
analyze_document_types() {
echo "Document Type Distribution:"
echo "=========================="
tg-show-library-documents | \
grep "| kind" | \
awk '{print $4}' | \
sort | uniq -c | sort -nr
}
analyze_document_types
```
### Document Age Analysis
```bash
# Find old documents
find_old_documents() {
local days_old="$1"
echo "Documents older than $days_old days:"
echo "===================================="
cutoff_date=$(date -d "$days_old days ago" +"%Y-%m-%d")
tg-show-library-documents | \
grep "| time" | \
while read -r line; do
doc_date=$(echo "$line" | awk '{print $4}')
if [[ "$doc_date" < "$cutoff_date" ]]; then
echo "$line"
fi
done
}
# Find documents older than 30 days
find_old_documents 30
```
### Tag Analysis
```bash
# Analyze tag usage
analyze_tags() {
echo "Tag Usage Analysis:"
echo "=================="
tg-show-library-documents | \
grep "| tags" | \
sed 's/| tags.*| \(.*\) |/\1/' | \
tr ',' '\n' | \
sed 's/^ *//;s/ *$//' | \
sort | uniq -c | sort -nr
}
analyze_tags
```
### Document Search
```bash
# Search documents by multiple criteria
search_documents() {
local query="$1"
echo "Searching for: $query"
echo "===================="
tg-show-library-documents | \
grep -i -A6 -B6 "$query" | \
grep -E "^\+|^\|"
}
# Search for specific terms
search_documents "financial"
search_documents "manual"
```
### User Document Summary
```bash
# Generate user document summary
user_summary() {
local user="$1"
echo "Document Summary for User: $user"
echo "================================"
docs=$(tg-show-library-documents -U "$user")
if [[ "$docs" == "No documents." ]]; then
echo "No documents found for user: $user"
return
fi
# Count documents
doc_count=$(echo "$docs" | grep -c "| id")
echo "Total documents: $doc_count"
# Count by type
echo -e "\nBy type:"
echo "$docs" | \
grep "| kind" | \
awk '{print $4}' | \
sort | uniq -c | sort -nr
# Recent documents
echo -e "\nRecent documents (last 7 days):"
recent_date=$(date -d "7 days ago" +"%Y-%m-%d")
echo "$docs" | \
grep "| time" | \
awk -v cutoff="$recent_date" '$4 >= cutoff {print $0}'
}
# Generate summary for specific user
user_summary "research-team"
```
### Document Export
```bash
# Export document metadata to CSV
export_to_csv() {
local output_file="$1"
echo "id,time,title,kind,note,tags" > "$output_file"
tg-show-library-documents | \
awk '
BEGIN { record="" }
/^\+/ {
if (record != "") {
print record
record=""
}
}
/^\| id/ { gsub(/^\| id *\| /, ""); gsub(/ *\|$/, ""); record=$0"," }
/^\| time/ { gsub(/^\| time *\| /, ""); gsub(/ *\|$/, ""); record=record$0"," }
/^\| title/ { gsub(/^\| title *\| /, ""); gsub(/ *\|$/, ""); record=record$0"," }
/^\| kind/ { gsub(/^\| kind *\| /, ""); gsub(/ *\|$/, ""); record=record$0"," }
/^\| note/ { gsub(/^\| note *\| /, ""); gsub(/ *\|$/, ""); record=record$0"," }
/^\| tags/ { gsub(/^\| tags *\| /, ""); gsub(/ *\|$/, ""); record=record$0 }
END { if (record != "") print record }
' >> "$output_file"
echo "Exported to: $output_file"
}
# Export to CSV
export_to_csv "documents.csv"
```
### Document Monitoring
```bash
# Monitor document library changes
monitor_documents() {
local interval="$1"
local log_file="document_changes.log"
echo "Monitoring document library (interval: ${interval}s)"
echo "Log file: $log_file"
# Get initial state
tg-show-library-documents > last_state.tmp
while true; do
sleep "$interval"
# Get current state
tg-show-library-documents > current_state.tmp
# Compare states
if ! diff -q last_state.tmp current_state.tmp > /dev/null; then
timestamp=$(date)
echo "[$timestamp] Document library changed" >> "$log_file"
# Log differences
diff last_state.tmp current_state.tmp >> "$log_file"
echo "---" >> "$log_file"
# Update last state
mv current_state.tmp last_state.tmp
echo "[$timestamp] Changes detected and logged"
else
rm current_state.tmp
fi
done
}
# Monitor every 60 seconds
monitor_documents 60
```
### Bulk Operations Helper
```bash
# Generate commands for bulk operations
generate_bulk_commands() {
local operation="$1"
case "$operation" in
"remove-old")
echo "# Commands to remove old documents:"
cutoff_date=$(date -d "90 days ago" +"%Y-%m-%d")
tg-show-library-documents | \
awk -v cutoff="$cutoff_date" \
'/\| id / {id=$4} /\| time/ && $4 < cutoff {print "tg-remove-library-document --id " id}'
;;
"process-unprocessed")
echo "# Commands to process documents:"
tg-show-library-documents | \
grep "| id" | \
awk '{print "tg-start-library-processing -d " $4 " --id proc-" $4}'
;;
*)
echo "Unknown operation: $operation"
echo "Available: remove-old, process-unprocessed"
;;
esac
}
# Generate removal commands for old documents
generate_bulk_commands "remove-old"
```
## Integration with Other Commands
### Document Processing Workflow
```bash
# Complete document workflow
process_document_workflow() {
echo "Document Library Workflow"
echo "========================"
# 1. List current documents
echo "Current documents:"
tg-show-library-documents
# 2. Add new document (example)
# tg-add-library-document --file new-doc.pdf --title "New Document"
# 3. Start processing
# tg-start-library-processing -d doc_id --id proc_id
# 4. Monitor processing
# tg-show-flows | grep processing
# 5. Verify completion
echo "Documents after processing:"
tg-show-library-documents
}
```
### Document Lifecycle Management
```bash
# Manage document lifecycle
lifecycle_management() {
echo "Document Lifecycle Management"
echo "============================"
# Get all documents
tg-show-library-documents | \
grep "| id" | \
awk '{print $3}' | \
while read doc_id; do
echo "Processing document: $doc_id"
# Check if already processed
if tg-invoke-document-rag -q "test" 2>/dev/null | grep -q "$doc_id"; then
echo " Already processed"
else
echo " Starting processing..."
# tg-start-library-processing -d "$doc_id" --id "proc-$doc_id"
fi
done
}
```
## Error Handling
### Connection Issues
```bash
Exception: Connection refused
```
**Solution**: Check API URL and ensure TrustGraph is running.
### Permission Errors
```bash
Exception: Access denied
```
**Solution**: Verify user permissions for library access.
### User Not Found
```bash
Exception: User not found
```
**Solution**: Check user ID spelling and ensure user exists.
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-add-library-document`](tg-add-library-document.md) - Add documents to library
- [`tg-remove-library-document`](tg-remove-library-document.md) - Remove documents from library
- [`tg-start-library-processing`](tg-start-library-processing.md) - Start document processing
- [`tg-stop-library-processing`](tg-stop-library-processing.md) - Stop document processing
- [`tg-invoke-document-rag`](tg-invoke-document-rag.md) - Query processed documents
## API Integration
This command uses the [Library API](../apis/api-librarian.md) to retrieve document metadata and listings.
## Best Practices
1. **Regular Monitoring**: Check library contents regularly
2. **User Organization**: Use different users for different document categories
3. **Tag Consistency**: Maintain consistent tagging schemes
4. **Cleanup**: Regularly remove outdated documents
5. **Backup**: Export document metadata for backup purposes
6. **Access Control**: Use appropriate user permissions
7. **Documentation**: Maintain good document titles and descriptions
## Troubleshooting
### No Documents Shown
```bash
# Check if documents exist for different users
tg-show-library-documents -U "different-user"
# Verify API connectivity
curl -s "$TRUSTGRAPH_URL/api/v1/library/documents" > /dev/null
echo "API response: $?"
```
### Formatting Issues
```bash
# If output is garbled, check terminal width
export COLUMNS=120
tg-show-library-documents
```
### Slow Response
```bash
# For large document libraries, consider filtering by user
tg-show-library-documents -U "specific-user"
# Check system resources
free -h
ps aux | grep trustgraph
```
# tg-show-library-processing
Displays all active library document processing records and their details.
## Synopsis
```bash
tg-show-library-processing [options]
```
## Description
The `tg-show-library-processing` command lists all library document processing records, showing the status and details of document processing jobs that have been initiated through the library system. This provides visibility into which documents are being processed, their associated flows, and processing metadata.
## Options
### Optional Arguments
- `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
- `-U, --user USER`: User ID to filter processing records (default: `trustgraph`)
## Examples
### Show All Processing Records
```bash
tg-show-library-processing
```
### Show Processing for Specific User
```bash
tg-show-library-processing -U "research-team"
```
### Use Custom API URL
```bash
tg-show-library-processing -u http://production:8088/
```
## Output Format
The command displays processing records in formatted tables:
```
+----------------+----------------------------------+
| id | proc_research_001 |
| document-id | doc_123456789 |
| time | 2023-12-15 14:30:22 |
| flow | research-processing |
| collection | research-docs |
| tags | nlp, research, automated |
+----------------+----------------------------------+
+----------------+----------------------------------+
| id | proc_batch_002 |
| document-id | doc_987654321 |
| time | 2023-12-15 14:25:18 |
| flow | document-analysis |
| collection | batch-processed |
| tags | batch, analysis |
+----------------+----------------------------------+
```
### Field Details
- **id**: Unique processing record identifier
- **document-id**: ID of the document being processed
- **time**: Timestamp when processing was initiated
- **flow**: Flow instance used for processing
- **collection**: Target collection for processed data
- **tags**: Associated tags for categorization
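The boxed records above are easy to consume from scripts. A minimal sketch that flattens the `| field | value |` rows into tab-separated lines, fed here from a sample mirroring the example output (in real use, pipe `tg-show-library-processing` into the function instead):

```bash
# Flatten "| field | value |" table rows into field<TAB>value lines.
parse_processing_table() {
  awk -F'|' '/^\|/ {
    gsub(/^[ \t]+|[ \t]+$/, "", $2)   # trim field name
    gsub(/^[ \t]+|[ \t]+$/, "", $3)   # trim value
    if ($2 != "") print $2 "\t" $3
  }'
}

# Sample input mirroring the example output above.
parse_processing_table << 'EOF'
+----------------+----------------------------------+
| id             | proc_research_001                |
| document-id    | doc_123456789                    |
+----------------+----------------------------------+
EOF
```

Separator rows (`+---+`) are skipped because they do not start with `|`.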
### Empty Results
If no processing records exist:
```
No processing objects.
```
## Use Cases
### Processing Status Monitoring
```bash
# Monitor active processing jobs
monitor_processing_status() {
local interval="${1:-30}" # Default 30 seconds
echo "Monitoring library processing status..."
echo "Refresh interval: ${interval}s"
echo "Press Ctrl+C to stop"
while true; do
clear
echo "Library Processing Monitor - $(date)"
echo "===================================="
tg-show-library-processing
echo -e "\nProcessing Summary:"
processing_count=$(tg-show-library-processing 2>/dev/null | grep -c "| id")
echo "Active processing jobs: $processing_count"
sleep "$interval"
done
}
# Start monitoring
monitor_processing_status 15
```
### User Activity Analysis
```bash
# Analyze processing activity by user
analyze_user_processing() {
local users=("user1" "user2" "user3" "research-team")
echo "Processing Activity Analysis"
echo "==========================="
for user in "${users[@]}"; do
echo -e "\n--- User: $user ---"
processing_output=$(tg-show-library-processing -U "$user" 2>/dev/null)
if echo "$processing_output" | grep -q "No processing objects"; then
echo "No active processing"
else
count=$(echo "$processing_output" | grep -c "| id")
echo "Active processing jobs: $count"
# Show recent jobs
echo "Recent processing:"
echo "$processing_output" | grep -E "(id|time|flow)" | head -9
fi
done
}
# Run analysis
analyze_user_processing
```
### Processing Queue Management
```bash
# Manage processing queue
manage_processing_queue() {
echo "Processing Queue Management"
echo "=========================="
# Show current queue
echo "Current processing queue:"
tg-show-library-processing
# Count by flow
echo -e "\nProcessing jobs by flow:"
tg-show-library-processing | \
grep "| flow" | \
awk '{print $3}' | \
sort | uniq -c | sort -nr
# Count by collection
echo -e "\nProcessing jobs by collection:"
tg-show-library-processing | \
grep "| collection" | \
awk '{print $3}' | \
sort | uniq -c | sort -nr
# Find long-running jobs (would need timestamps comparison)
echo -e "\nNote: Check timestamps for long-running jobs"
}
# Run queue management
manage_processing_queue
```
### Cleanup and Maintenance
```bash
# Clean up completed processing records
cleanup_processing_records() {
local user="$1"
local max_age_days="${2:-7}" # Default 7 days
echo "Cleaning up processing records older than $max_age_days days for user: $user"
# Get processing records
processing_output=$(tg-show-library-processing -U "$user")
if echo "$processing_output" | grep -q "No processing objects"; then
echo "No processing records to clean up"
return
fi
# Parse processing records (this is a simplified example)
echo "$processing_output" | \
grep "| id" | \
awk '{print $3}' | \
while read proc_id; do
echo "Checking processing record: $proc_id"
# Get the time for this processing record
proc_time=$(echo "$processing_output" | \
grep -A10 "$proc_id" | \
grep "| time" | \
awk '{print $3 " " $4}')
if [ -n "$proc_time" ]; then
# Calculate age (this would need proper date comparison)
echo "Processing record $proc_id from: $proc_time"
# Check if document processing is complete
if tg-invoke-document-rag -q "test" -U "$user" 2>/dev/null | grep -q "answer"; then
echo "Document appears to be processed, considering cleanup..."
# tg-stop-library-processing --id "$proc_id" -U "$user"
fi
fi
done
}
# Clean up old records
cleanup_processing_records "test-user" 3
```
## Advanced Usage
### Processing Performance Analysis
```bash
# Analyze processing performance
analyze_processing_performance() {
echo "Processing Performance Analysis"
echo "=============================="
# Get all processing records
processing_data=$(tg-show-library-processing)
if echo "$processing_data" | grep -q "No processing objects"; then
echo "No processing data available"
return
fi
# Count total processing jobs
total_jobs=$(echo "$processing_data" | grep -c "| id")
echo "Total active processing jobs: $total_jobs"
# Analyze by flow type
echo -e "\nJobs by flow type:"
echo "$processing_data" | \
grep "| flow" | \
awk '{print $3}' | \
sort | uniq -c | sort -nr | \
while read count flow; do
echo " $flow: $count jobs"
done
# Analyze by time patterns
echo -e "\nJobs by hour (last 24h):"
echo "$processing_data" | \
grep "| time" | \
awk '{print $4}' | \
cut -d: -f1 | \
sort | uniq -c | sort -k2n | \
while read count hour; do
echo " ${hour}:00: $count jobs"
done
}
# Run performance analysis
analyze_processing_performance
```
### Cross-User Processing Comparison
```bash
# Compare processing across users
compare_user_processing() {
local users=("$@")
echo "Cross-User Processing Comparison"
echo "==============================="
for user in "${users[@]}"; do
echo -e "\n--- User: $user ---"
processing_data=$(tg-show-library-processing -U "$user" 2>/dev/null)
if echo "$processing_data" | grep -q "No processing objects"; then
echo "Active jobs: 0"
echo "Collections: none"
echo "Flows: none"
else
# Count jobs
job_count=$(echo "$processing_data" | grep -c "| id")
echo "Active jobs: $job_count"
# List collections
collections=$(echo "$processing_data" | \
grep "| collection" | \
awk '{print $3}' | \
sort | uniq | tr '\n' ',' | sed 's/,$//')
echo "Collections: $collections"
# List flows
flows=$(echo "$processing_data" | \
grep "| flow" | \
awk '{print $3}' | \
sort | uniq | tr '\n' ',' | sed 's/,$//')
echo "Flows: $flows"
fi
done
}
# Compare processing for multiple users
compare_user_processing "user1" "user2" "research-team" "admin"
```
### Processing Health Check
```bash
# Health check for processing system
processing_health_check() {
echo "Library Processing Health Check"
echo "=============================="
# Check if processing service is responsive
if tg-show-library-processing > /dev/null 2>&1; then
echo "✓ Processing service is responsive"
else
echo "✗ Processing service is not responsive"
return 1
fi
# Get processing statistics
processing_data=$(tg-show-library-processing 2>/dev/null)
if echo "$processing_data" | grep -q "No processing objects"; then
echo " No active processing jobs"
else
active_jobs=$(echo "$processing_data" | grep -c "| id")
echo " Active processing jobs: $active_jobs"
# Check for stuck jobs (simplified check)
echo "Recent job timestamps:"
echo "$processing_data" | \
grep "| time" | \
awk '{print $3 " " $4}' | \
head -5
fi
# Check flow availability
echo -e "\nFlow availability check:"
flows=$(echo "$processing_data" | grep "| flow" | awk '{print $3}' | sort | uniq)
for flow in $flows; do
if tg-show-flows | grep -q "$flow"; then
echo "✓ Flow '$flow' is available"
else
echo "⚠ Flow '$flow' may not be available"
fi
done
echo "Health check completed"
}
# Run health check
processing_health_check
```
### Processing Report Generation
```bash
# Generate comprehensive processing report
generate_processing_report() {
local output_file="processing_report_$(date +%Y%m%d_%H%M%S).txt"
echo "Generating processing report: $output_file"
cat > "$output_file" << EOF
TrustGraph Library Processing Report
Generated: $(date)
====================================
EOF
# Overall statistics
echo "OVERVIEW" >> "$output_file"
echo "--------" >> "$output_file"
processing_data=$(tg-show-library-processing 2>/dev/null)
if echo "$processing_data" | grep -q "No processing objects"; then
echo "No active processing jobs" >> "$output_file"
else
total_jobs=$(echo "$processing_data" | grep -c "| id")
echo "Total active jobs: $total_jobs" >> "$output_file"
# Flow distribution
echo -e "\nFLOW DISTRIBUTION" >> "$output_file"
echo "-----------------" >> "$output_file"
echo "$processing_data" | \
grep "| flow" | \
awk '{print $3}' | \
sort | uniq -c | sort -nr >> "$output_file"
# Collection distribution
echo -e "\nCOLLECTION DISTRIBUTION" >> "$output_file"
echo "-----------------------" >> "$output_file"
echo "$processing_data" | \
grep "| collection" | \
awk '{print $3}' | \
sort | uniq -c | sort -nr >> "$output_file"
# Recent activity
echo -e "\nRECENT PROCESSING JOBS" >> "$output_file"
echo "----------------------" >> "$output_file"
echo "$processing_data" | head -50 >> "$output_file"
fi
echo "Report generated: $output_file"
}
# Generate report
generate_processing_report
```
## Integration with Other Commands
### Processing Workflow Management
```bash
# Complete processing workflow
manage_processing_workflow() {
local user="$1"
local action="$2"
case "$action" in
"status")
echo "Processing status for user: $user"
tg-show-library-processing -U "$user"
;;
"start-batch")
echo "Starting batch processing for user: $user"
tg-show-library-documents -U "$user" | \
grep "| id" | \
awk '{print $3}' | \
while read doc_id; do
proc_id="batch_$(date +%s)_${doc_id}"
tg-start-library-processing -d "$doc_id" --id "$proc_id" -U "$user"
done
;;
"cleanup")
echo "Cleaning up completed processing for user: $user"
cleanup_processing_records "$user"
;;
*)
echo "Usage: manage_processing_workflow <user> <status|start-batch|cleanup>"
;;
esac
}
# Manage workflow for user
manage_processing_workflow "research-team" "status"
```
### Monitoring Integration
```bash
# Integration with system monitoring
processing_metrics_export() {
local metrics_file="processing_metrics.txt"
# Get processing data
processing_data=$(tg-show-library-processing 2>/dev/null)
if echo "$processing_data" | grep -q "No processing objects"; then
active_jobs=0
else
active_jobs=$(echo "$processing_data" | grep -c "| id")
fi
# Export metrics
echo "trustgraph_library_processing_active_jobs $active_jobs" > "$metrics_file"
echo "trustgraph_library_processing_timestamp $(date +%s)" >> "$metrics_file"
# Export by flow
if [ "$active_jobs" -gt 0 ]; then
echo "$processing_data" | \
grep "| flow" | \
awk '{print $3}' | \
sort | uniq -c | \
while read count flow; do
echo "trustgraph_library_processing_jobs_by_flow{flow=\"$flow\"} $count" >> "$metrics_file"
done
fi
echo "Metrics exported to: $metrics_file"
}
processing_metrics_export
```
## Error Handling
### API Connection Issues
```bash
Exception: Connection refused
```
**Solution**: Check API URL and ensure TrustGraph is running.
### Permission Errors
```bash
Exception: Access denied
```
**Solution**: Verify user permissions for library access.
### User Not Found
```bash
Exception: User not found
```
**Solution**: Check user ID and ensure user exists in the system.
### Service Unavailable
```bash
Exception: Service temporarily unavailable
```
**Solution**: Check TrustGraph service status and try again.
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-start-library-processing`](tg-start-library-processing.md) - Start document processing
- [`tg-stop-library-processing`](tg-stop-library-processing.md) - Stop document processing
- [`tg-show-library-documents`](tg-show-library-documents.md) - List library documents
- [`tg-show-flows`](tg-show-flows.md) - List available flows
## API Integration
This command uses the [Library API](../apis/api-librarian.md) to retrieve processing record information.
## Best Practices
1. **Regular Monitoring**: Check processing status regularly
2. **User Filtering**: Use user filtering to focus on relevant processing
3. **Cleanup**: Regularly clean up completed processing records
4. **Performance Tracking**: Monitor processing patterns and performance
5. **Integration**: Integrate with monitoring and alerting systems
6. **Documentation**: Document processing workflows and procedures
7. **Troubleshooting**: Use processing information for issue diagnosis
## Troubleshooting
### No Processing Records
```bash
# Check if library service is running
curl -s http://localhost:8088/api/v1/library/processing
# Verify documents exist
tg-show-library-documents
```
### Stale Processing Records
```bash
# List today's records; compare full timestamps to spot long-running jobs
tg-show-library-processing | grep "$(date -d '1 hour ago' '+%Y-%m-%d')"
# Check flow status
tg-show-flows
```
### Performance Issues
```bash
# Check system resources
free -h
df -h
# Monitor API response times
time tg-show-library-processing
```
# tg-show-processor-state
## Synopsis
```
tg-show-processor-state [OPTIONS]
```
## Description
The `tg-show-processor-state` command displays the current state of TrustGraph processors by querying the metrics endpoint. It retrieves processor information from the Prometheus metrics API and displays active processors with visual status indicators.
This command is useful for:
- Monitoring processor health and availability
- Verifying that processors are running correctly
- Troubleshooting processor connectivity issues
- Getting a quick overview of active TrustGraph components
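Under the hood, the command looks for a `processor_info` metric in the metrics response. The sketch below extracts processor names from a Prometheus-style query response; the JSON shape and the `processor` label name are illustrative assumptions, and in practice the data would come from the metrics endpoint (e.g. `curl -s http://localhost:8088/api/metrics`) rather than a here-doc:

```bash
# Pull processor names out of a Prometheus-style query response.
# ASSUMPTION: processors appear under a "processor" label; adjust
# to match the labels your metrics endpoint actually exposes.
list_processors() {
  grep -o '"processor"[[:space:]]*:[[:space:]]*"[^"]*"' |
    sed 's/.*"\([^"]*\)"$/\1/'
}

list_processors << 'EOF'
{"status":"success","data":{"result":[
  {"metric":{"__name__":"processor_info","processor":"chunker"}},
  {"metric":{"__name__":"processor_info","processor":"embeddings"}}
]}}
EOF
```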
## Options
- `-m, --metrics-url URL`
- Metrics endpoint URL to query for processor information
- Default: `http://localhost:8088/api/metrics`
- Should point to a Prometheus-compatible metrics endpoint
- `-h, --help`
- Show help message and exit
## Examples
### Basic Usage
Display processor states using the default metrics URL:
```bash
tg-show-processor-state
```
### Custom Metrics URL
Query processor states from a different metrics endpoint:
```bash
tg-show-processor-state -m http://metrics.example.com:8088/api/metrics
```
### Remote Monitoring
Monitor processors on a remote TrustGraph instance:
```bash
tg-show-processor-state --metrics-url http://10.0.1.100:8088/api/metrics
```
## Output Format
The command displays processor information in a table format:
```
processor_name 💚
another_processor 💚
third_processor 💚
```
Each line shows:
- Processor name (left-aligned, 30 characters wide)
- Status indicator (💚 for active processors)
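The fixed-width layout can be reproduced in your own wrapper scripts with `printf`; a minimal sketch of the same convention:

```bash
# Reproduce the 30-character, left-aligned name column with printf.
show_state() {
  printf '%-30s %s\n' "$1" "$2"
}

show_state "processor_name" "💚"
```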
## Advanced Usage
### Monitoring Script
Create a monitoring script to periodically check processor states:
```bash
#!/bin/bash
while true; do
echo "=== Processor State Check ==="
date
tg-show-processor-state
echo
sleep 30
done
```
### Health Check Integration
Use in health check scripts:
```bash
#!/bin/bash
output=$(tg-show-processor-state 2>&1)
if [ $? -eq 0 ]; then
echo "Processors are running"
echo "$output"
else
echo "Error checking processor state: $output"
exit 1
fi
```
### Multiple Environment Monitoring
Monitor processors across different environments:
```bash
#!/bin/bash
for env in dev staging prod; do
echo "=== $env Environment ==="
tg-show-processor-state -m "http://${env}-metrics:8088/api/metrics"
echo
done
```
## Error Handling
The command handles various error conditions:
- **Connection errors**: If the metrics endpoint is unavailable
- **Invalid JSON**: If the metrics response is malformed
- **Missing data**: If the expected processor_info metric is not found
- **HTTP errors**: If the metrics endpoint returns an error status
Common error scenarios:
```bash
# Metrics endpoint not available
tg-show-processor-state -m http://invalid-host:8088/api/metrics
# Output: Exception: [Connection error details]
# Invalid URL format
tg-show-processor-state -m "not-a-url"
# Output: Exception: [URL parsing error]
```
## Integration with Other Commands
### With Flow Monitoring
Combine with flow state monitoring:
```bash
echo "=== Processor States ==="
tg-show-processor-state
echo
echo "=== Flow States ==="
tg-show-flow-state
```
### With Configuration Display
Check processors and current configuration:
```bash
echo "=== Active Processors ==="
tg-show-processor-state
echo
echo "=== Current Configuration ==="
tg-show-config
```
## Best Practices
1. **Regular Monitoring**: Include in regular health check routines
2. **Error Handling**: Always check command exit status in scripts
3. **Logging**: Capture output for historical analysis
4. **Alerting**: Set up alerts based on processor availability
5. **Documentation**: Keep track of expected processors for each environment
## Troubleshooting
### No Processors Shown
If no processors are displayed:
1. Verify the metrics endpoint is accessible
2. Check that TrustGraph processors are running
3. Ensure processors are properly configured to export metrics
4. Verify the metrics URL is correct
### Connection Issues
For connection problems:
1. Test network connectivity to the metrics endpoint
2. Verify the metrics service is running
3. Check firewall rules and network policies
4. Ensure the correct port is being used
### Metrics Format Issues
If the command fails with JSON parsing errors:
1. Verify the metrics endpoint returns Prometheus-compatible data
2. Check that the `processor_info` metric exists
3. Ensure the metrics service is properly configured
## Related Commands
- [`tg-show-flow-state`](tg-show-flow-state.md) - Display flow processor states
- [`tg-show-config`](tg-show-config.md) - Show TrustGraph configuration
- [`tg-show-token-costs`](tg-show-token-costs.md) - Display token usage costs
- [`tg-show-library-processing`](tg-show-library-processing.md) - Show library processing status
## See Also
- TrustGraph Processor Documentation
- Prometheus Metrics Configuration
- TrustGraph Monitoring Guide
# tg-show-prompts
Displays all configured prompt templates and system prompts in TrustGraph.
## Synopsis
```bash
tg-show-prompts [options]
```
## Description
The `tg-show-prompts` command displays all prompt templates and the system prompt currently configured in TrustGraph. This includes template IDs, prompt text, response types, and JSON schemas for structured responses.
Use this command to review existing prompts, verify configurations, and understand available templates for use with `tg-invoke-prompt`.
## Options
### Optional Arguments
- `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
## Examples
### Display All Prompts
```bash
tg-show-prompts
```
### Using Custom API URL
```bash
tg-show-prompts -u http://production:8088/
```
## Output Format
The command displays prompts in formatted tables:
```
System prompt:
+---------+--------------------------------------------------+
| prompt | You are a helpful AI assistant. Always provide |
| | accurate, concise responses. When uncertain, |
| | clearly state your limitations. |
+---------+--------------------------------------------------+
greeting:
+---------+--------------------------------------------------+
| prompt | Hello {{name}}, welcome to {{place}}! |
+---------+--------------------------------------------------+
question:
+----------+-------------------------------------------------+
| prompt | Answer this question based on the context: |
| | {{question}} |
| | |
| | Context: {{context}} |
+----------+-------------------------------------------------+
extract-info:
+----------+-------------------------------------------------+
| prompt | Extract key information from: {{text}} |
| response | json |
| schema | {"type": "object", "properties": { |
| | "name": {"type": "string"}, |
| | "age": {"type": "number"}}} |
+----------+-------------------------------------------------+
```
### Template Information
For each template, the output shows:
- **prompt**: The template text with variable placeholders
- **response**: Response format (`text` or `json`)
- **schema**: JSON schema for structured responses (when applicable)
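Placeholder substitution happens server-side when a template is invoked, but the effect of the `{{variable}}` syntax can be sketched locally. The `render` helper below is purely illustrative, not the actual rendering code (values must not contain `/` for this sed-based sketch):

```bash
# Roughly mimic {{variable}} substitution for the greeting template above.
render() {
  tmpl="$1"; shift
  for kv in "$@"; do
    key="${kv%%=*}"
    val="${kv#*=}"
    tmpl=$(printf '%s' "$tmpl" | sed "s/{{$key}}/$val/g")
  done
  printf '%s\n' "$tmpl"
}

render 'Hello {{name}}, welcome to {{place}}!' name=Alice place=TrustGraph
# → Hello Alice, welcome to TrustGraph!
```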
## Use Cases
### Template Discovery
```bash
# Find all available templates
tg-show-prompts | grep "^[a-zA-Z]" | grep ":"
# Find templates with specific keywords
tg-show-prompts | grep -B5 -A5 "analyze"
```
### Template Verification
```bash
# Check if specific template exists
if tg-show-prompts | grep -q "my-template:"; then
echo "Template exists"
else
echo "Template not found"
fi
```
### Configuration Review
```bash
# Review current system prompt
tg-show-prompts | grep -A10 "System prompt:"
# Check JSON response templates
tg-show-prompts | grep -B2 -A5 "response.*json"
```
### Template Inventory
```bash
# Count total templates
template_count=$(tg-show-prompts | grep -c "^[a-zA-Z][^:]*:$")
echo "Total templates: $template_count"
# List template names only
tg-show-prompts | grep "^[a-zA-Z][^:]*:$" | sed 's/:$//'
```
## Advanced Usage
### Template Analysis
```bash
# Analyze template complexity
analyze_templates() {
echo "Template Analysis"
echo "================"
tg-show-prompts > temp_prompts.txt
# Count variables per template
echo "Templates with variables:"
grep -B1 -A5 "{{" temp_prompts.txt | \
grep "^[a-zA-Z]" | \
while read template; do
var_count=$(grep -A5 "$template" temp_prompts.txt | grep -o "{{[^}]*}}" | wc -l)
echo " $template $var_count variables"
done
# Find JSON response templates
echo -e "\nJSON Response Templates:"
grep -B1 "response.*json" temp_prompts.txt | \
grep "^[a-zA-Z]" | \
sed 's/:$//'
rm temp_prompts.txt
}
analyze_templates
```
### Template Documentation Generator
```bash
# Generate template documentation
generate_template_docs() {
local output_file="template_documentation.md"
echo "# TrustGraph Prompt Templates" > "$output_file"
echo "Generated on $(date)" >> "$output_file"
echo "" >> "$output_file"
# Extract system prompt
echo "## System Prompt" >> "$output_file"
tg-show-prompts | \
awk '/System prompt:/,/^\+.*\+$/' | \
grep "| prompt" | \
sed 's/| prompt | //' | \
sed 's/ *|$//' >> "$output_file"
echo "" >> "$output_file"
echo "## Templates" >> "$output_file"
# Extract each template
tg-show-prompts | \
grep "^[a-zA-Z][^:]*:$" | \
sed 's/:$//' | \
while read template_id; do
echo "" >> "$output_file"
echo "### $template_id" >> "$output_file"
# Get template details
tg-show-prompts | \
awk "/^$template_id:/,/^$/" | \
while read line; do
if [[ "$line" =~ ^\|\ prompt ]]; then
echo "**Prompt:**" >> "$output_file"
echo '```' >> "$output_file"
echo "$line" | sed 's/| prompt[[:space:]]*| //' | sed 's/ *|$//' >> "$output_file"
echo '```' >> "$output_file"
elif [[ "$line" =~ ^\|\ response ]]; then
response_type=$(echo "$line" | sed 's/| response[[:space:]]*| //' | sed 's/ *|$//')
echo "**Response Type:** $response_type" >> "$output_file"
elif [[ "$line" =~ ^\|\ schema ]]; then
echo "**JSON Schema:**" >> "$output_file"
echo '```json' >> "$output_file"
echo "$line" | sed 's/| schema[[:space:]]*| //' | sed 's/ *|$//' >> "$output_file"
echo '```' >> "$output_file"
fi
done
done
echo "Documentation generated: $output_file"
}
generate_template_docs
```
### Template Validation
```bash
# Validate template configurations
validate_templates() {
echo "Template Validation Report"
echo "========================="
tg-show-prompts > temp_prompts.txt
# Check for templates without variables
echo "Templates without variables:"
grep -B1 -A5 "^[a-zA-Z]" temp_prompts.txt | \
grep -v "{{" | \
grep "^[a-zA-Z][^:]*:$" | \
sed 's/:$//' | \
while read template; do
if ! grep -A5 "$template:" temp_prompts.txt | grep -q "{{"; then
echo " - $template"
fi
done
# Check JSON templates have schemas
echo -e "\nJSON templates without schemas:"
grep -B1 -A10 "response.*json" temp_prompts.txt | \
grep -B10 -A10 "response.*json" | \
while read -r line; do
if [[ "$line" =~ ^([a-zA-Z][^:]*):$ ]]; then
template="${BASH_REMATCH[1]}"
if ! grep -A10 "$template:" temp_prompts.txt | grep -q "schema"; then
echo " - $template"
fi
fi
done
rm temp_prompts.txt
}
validate_templates
```
### Template Usage Examples
```bash
# Generate usage examples for templates
generate_usage_examples() {
local template_id="$1"
echo "Usage examples for template: $template_id"
echo "========================================"
# Extract template and find variables
tg-show-prompts | \
awk "/^$template_id:/,/^$/" | \
grep "| prompt" | \
sed 's/| prompt[[:space:]]*| //' | \
sed 's/ *|$//' | \
while read prompt_text; do
echo "Template:"
echo "$prompt_text"
echo ""
# Extract variables
variables=$(echo "$prompt_text" | grep -o "{{[^}]*}}" | sed 's/[{}]//g' | sort | uniq)
if [ -n "$variables" ]; then
echo "Variables:"
for var in $variables; do
echo " - $var"
done
echo ""
echo "Example usage:"
cmd="tg-invoke-prompt $template_id"
for var in $variables; do
case "$var" in
*name*) cmd="$cmd $var=\"John Doe\"" ;;
*text*|*content*) cmd="$cmd $var=\"Sample text content\"" ;;
*question*) cmd="$cmd $var=\"What is this about?\"" ;;
*context*) cmd="$cmd $var=\"Background information\"" ;;
*) cmd="$cmd $var=\"value\"" ;;
esac
done
echo "$cmd"
else
echo "No variables found."
echo "Usage: tg-invoke-prompt $template_id"
fi
done
}
# Generate examples for specific template
generate_usage_examples "question"
```
### Environment Comparison
```bash
# Compare templates between environments
compare_environments() {
local env1_url="$1"
local env2_url="$2"
echo "Comparing templates between environments"
echo "======================================"
# Get templates from both environments
tg-show-prompts -u "$env1_url" | grep "^[a-zA-Z][^:]*:$" | sed 's/:$//' | sort > env1_templates.txt
tg-show-prompts -u "$env2_url" | grep "^[a-zA-Z][^:]*:$" | sed 's/:$//' | sort > env2_templates.txt
echo "Environment 1 ($env1_url): $(wc -l < env1_templates.txt) templates"
echo "Environment 2 ($env2_url): $(wc -l < env2_templates.txt) templates"
echo ""
# Find differences
echo "Templates only in Environment 1:"
comm -23 env1_templates.txt env2_templates.txt | sed 's/^/ - /'
echo -e "\nTemplates only in Environment 2:"
comm -13 env1_templates.txt env2_templates.txt | sed 's/^/ - /'
echo -e "\nCommon templates:"
comm -12 env1_templates.txt env2_templates.txt | sed 's/^/ - /'
rm env1_templates.txt env2_templates.txt
}
# Compare development and production
compare_environments "http://dev:8088/" "http://prod:8088/"
```
### Template Export/Import
```bash
# Export templates to JSON
export_templates() {
local output_file="$1"
echo "Exporting templates to: $output_file"
echo "{" > "$output_file"
echo " \"export_date\": \"$(date -Iseconds)\"," >> "$output_file"
echo " \"system_prompt\": \"$(tg-show-prompts | awk '/System prompt:/,/^\+.*\+$/' | grep '| prompt' | sed 's/| prompt[[:space:]]*| //' | sed 's/ *|$//' | sed 's/"/\\"/g')\"," >> "$output_file"
echo " \"templates\": {" >> "$output_file"
first=true
tg-show-prompts | \
grep "^[a-zA-Z][^:]*:$" | \
sed 's/:$//' | \
while read template_id; do
if [ "$first" = "false" ]; then
echo "," >> "$output_file"
fi
first=false
echo -n " \"$template_id\": {" >> "$output_file"
# Extract template details
tg-show-prompts | \
awk "/^$template_id:/,/^$/" | \
while read line; do
if [[ "$line" =~ ^\|\ prompt ]]; then
prompt=$(echo "$line" | sed 's/| prompt[[:space:]]*| //' | sed 's/ *|$//' | sed 's/"/\\"/g')
echo -n "\"prompt\": \"$prompt\"" >> "$output_file"
elif [[ "$line" =~ ^\|\ response ]]; then
response=$(echo "$line" | sed 's/| response[[:space:]]*| //' | sed 's/ *|$//')
echo -n ", \"response\": \"$response\"" >> "$output_file"
elif [[ "$line" =~ ^\|\ schema ]]; then
schema=$(echo "$line" | sed 's/| schema[[:space:]]*| //' | sed 's/ *|$//' | sed 's/"/\\"/g')
echo -n ", \"schema\": \"$schema\"" >> "$output_file"
fi
done
echo "}" >> "$output_file"
done
echo " }" >> "$output_file"
echo "}" >> "$output_file"
echo "Export completed: $output_file"
}
# Export current templates
export_templates "templates_backup.json"
```
## Error Handling
### Connection Issues
```bash
Exception: Connection refused
```
**Solution**: Check API URL and ensure TrustGraph is running.
### Permission Errors
```bash
Exception: Access denied
```
**Solution**: Verify user permissions for configuration access.
### No Templates Found
```bash
# Empty output or no templates section
```
**Solution**: Check if any templates are configured with `tg-set-prompt`.
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-set-prompt`](tg-set-prompt.md) - Create/update prompt templates
- [`tg-invoke-prompt`](tg-invoke-prompt.md) - Use prompt templates
- [`tg-invoke-document-rag`](tg-invoke-document-rag.md) - Document-based queries
## API Integration
This command uses the [Config API](../apis/api-config.md) to retrieve prompt templates and system prompts from TrustGraph's configuration system.
## Best Practices
1. **Regular Review**: Periodically review templates for relevance and accuracy
2. **Documentation**: Document template purposes and expected variables
3. **Version Control**: Track template changes over time
4. **Testing**: Verify templates work as expected after viewing
5. **Organization**: Use consistent naming conventions for templates
6. **Cleanup**: Remove unused or outdated templates
7. **Backup**: Export templates for backup and migration purposes
## Troubleshooting
### Formatting Issues
```bash
# If output is garbled or truncated
export COLUMNS=120
tg-show-prompts
```
### Missing Templates
```bash
# Check if templates are actually configured
tg-show-prompts | grep -c "^[a-zA-Z].*:$"
# Verify API connectivity
curl -s "$TRUSTGRAPH_URL/api/v1/config" > /dev/null
```
### Template Not Displaying
```bash
# Check template was set correctly
tg-set-prompt --id "test" --prompt "test template"
tg-show-prompts | grep "test:"
```
# tg-show-token-costs
Displays token cost configuration for language models in TrustGraph.
## Synopsis
```bash
tg-show-token-costs [options]
```
## Description
The `tg-show-token-costs` command displays the configured token pricing for all language models in TrustGraph. This information shows input and output costs per million tokens, which is used for cost tracking, billing, and resource management.
The costs are displayed in a tabular format showing model names and their associated pricing in dollars per million tokens.
## Options
### Optional Arguments
- `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
## Examples
### Display All Token Costs
```bash
tg-show-token-costs
```
### Using Custom API URL
```bash
tg-show-token-costs -u http://production:8088/
```
## Output Format
The command displays costs in a formatted table:
```
+----------------+-------------+--------------+
| model | input, $/Mt | output, $/Mt |
+----------------+-------------+--------------+
| gpt-4 | 30.000 | 60.000 |
| gpt-3.5-turbo | 0.500 | 1.500 |
| claude-3-sonnet| 3.000 | 15.000 |
| claude-3-haiku | 0.250 | 1.250 |
| local-model | 0.000 | 0.000 |
+----------------+-------------+--------------+
```
### Column Details
- **model**: Language model identifier
- **input, $/Mt**: Cost per million input tokens in USD
- **output, $/Mt**: Cost per million output tokens in USD
### Missing Configuration
If a model has incomplete cost configuration:
```
+----------------+-------------+--------------+
| model | input, $/Mt | output, $/Mt |
+----------------+-------------+--------------+
| unconfigured | - | - |
+----------------+-------------+--------------+
```
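Because the table uses `|` separators, it can be parsed reliably with `awk -F'|'`, a pattern several of the scripts below rely on. A minimal, self-contained sketch of that parsing, run here against a captured copy of the sample table rather than live `tg-show-token-costs` output:

```bash
#!/bin/sh
# Convert a tg-show-token-costs style table to CSV: model,input,output.
# In practice, pipe live output in: tg-show-token-costs | parse_costs
parse_costs() {
    awk -F'|' '
        /\|/ && $2 !~ /model/ {                 # data rows only: skip separators and header
            gsub(/ /, "", $2); gsub(/ /, "", $3); gsub(/ /, "", $4)
            if ($2 != "") print $2 "," $3 "," $4
        }'
}

parse_costs <<'EOF'
+----------------+-------------+--------------+
| model          | input, $/Mt | output, $/Mt |
+----------------+-------------+--------------+
| gpt-4          | 30.000      | 60.000       |
| gpt-3.5-turbo  | 0.500       | 1.500        |
+----------------+-------------+--------------+
EOF
# prints:
# gpt-4,30.000,60.000
# gpt-3.5-turbo,0.500,1.500
```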
## Use Cases
### Cost Monitoring
```bash
# Check current cost configuration
tg-show-token-costs
# Monitor costs over time
echo "$(date): $(tg-show-token-costs)" >> cost_history.log
```
### Cost Analysis
```bash
# Find most expensive models (sort on the output-cost column)
tg-show-token-costs | grep "|" | grep -v "model" | sort -t'|' -k4 -nr
# Find free/local models
tg-show-token-costs | grep "0.000"
```
### Budget Planning
```bash
# Calculate potential costs for usage scenarios
analyze_costs() {
echo "Cost Analysis for Usage Scenarios"
echo "================================="
# Extract cost data
tg-show-token-costs | grep -v "model" | grep -v "^\+" | \
while read -r line; do
model=$(echo "$line" | awk -F'|' '{print $2}' | tr -d ' ')
input_cost=$(echo "$line" | awk -F'|' '{print $3}' | tr -d ' ')
output_cost=$(echo "$line" | awk -F'|' '{print $4}' | tr -d ' ')
if [[ "$model" != "" && "$input_cost" != "-" && "$output_cost" != "-" ]]; then
echo "Model: $model"
echo " 1M input tokens: \$${input_cost}"
echo " 1M output tokens: \$${output_cost}"
echo " 10K conversation (5K in/5K out): \$$(echo "scale=3; ($input_cost * 5 + $output_cost * 5) / 1000" | bc -l)"
echo ""
fi
done
}
analyze_costs
```
### Environment Comparison
```bash
# Compare costs across environments
compare_costs() {
local env1_url="$1"
local env2_url="$2"
echo "Cost Comparison"
echo "==============="
echo "Environment 1: $env1_url"
tg-show-token-costs -u "$env1_url"
echo ""
echo "Environment 2: $env2_url"
tg-show-token-costs -u "$env2_url"
}
compare_costs "http://dev:8088/" "http://prod:8088/"
```
## Advanced Usage
### Cost Reporting
```bash
# Generate detailed cost report
generate_cost_report() {
local report_file="token_costs_$(date +%Y%m%d_%H%M%S).txt"
echo "TrustGraph Token Cost Report" > "$report_file"
echo "Generated: $(date)" >> "$report_file"
echo "============================" >> "$report_file"
echo "" >> "$report_file"
tg-show-token-costs >> "$report_file"
echo "" >> "$report_file"
echo "Cost Analysis:" >> "$report_file"
echo "==============" >> "$report_file"
# Add cost analysis
total_models=$(tg-show-token-costs | grep "|" | grep -vc "model")  # count data rows, not the header
free_models=$(tg-show-token-costs | grep -c "0.000")
paid_models=$((total_models - free_models))
echo "Total models configured: $total_models" >> "$report_file"
echo "Paid models: $paid_models" >> "$report_file"
echo "Free models: $free_models" >> "$report_file"
# Find most expensive models
echo "" >> "$report_file"
echo "Most expensive models (by output cost):" >> "$report_file"
tg-show-token-costs | grep -v "model" | grep -v "^\+" | \
sort -t'|' -k4 -nr | head -3 >> "$report_file"
echo "Report saved: $report_file"
}
generate_cost_report
```
### Cost Validation
```bash
# Validate cost configuration
validate_cost_config() {
echo "Cost Configuration Validation"
echo "============================="
local issues=0
# Check for unconfigured models
unconfigured=$(tg-show-token-costs | grep "|" | grep -c " - ")
if [ "$unconfigured" -gt 0 ]; then
echo "⚠ Warning: $unconfigured models have incomplete cost configuration"
tg-show-token-costs | grep "|" | grep " - "
issues=$((issues + 1))
fi
# Check for zero-cost models (might be intentional)
zero_cost=$(tg-show-token-costs | grep -c "0.000.*0.000")
if [ "$zero_cost" -gt 0 ]; then
echo " Info: $zero_cost models configured with zero cost (likely local models)"
fi
# Check for unusual cost patterns (process substitution keeps $issues in this shell)
while read -r line; do
model=$(echo "$line" | awk -F'|' '{print $2}' | tr -d ' ')
input_cost=$(echo "$line" | awk -F'|' '{print $3}' | tr -d ' ')
output_cost=$(echo "$line" | awk -F'|' '{print $4}' | tr -d ' ')
if [[ "$model" != "" && "$input_cost" != "-" && "$output_cost" != "-" ]]; then
# Check if output cost is lower than input cost (unusual)
if (( $(echo "$output_cost < $input_cost" | bc -l) )); then
echo "⚠ Warning: $model has output cost lower than input cost"
issues=$((issues + 1))
fi
# Check for extremely high costs
if (( $(echo "$input_cost > 100" | bc -l) )) || (( $(echo "$output_cost > 200" | bc -l) )); then
echo "⚠ Warning: $model has unusually high costs"
issues=$((issues + 1))
fi
fi
done < <(tg-show-token-costs | grep -v "model" | grep -v "^\+")
if [ "$issues" -eq 0 ]; then
echo "✓ Cost configuration appears valid"
else
echo "Found $issues potential issues"
fi
}
validate_cost_config
```
### Cost Tracking
```bash
# Track cost changes over time
track_cost_changes() {
local history_file="cost_history.txt"
local snapshot_file="last_costs.txt"
local current_file="current_costs.tmp"
# Get current costs
tg-show-token-costs > "$current_file"
# Check if this is the first run
if [ ! -f "$snapshot_file" ]; then
echo "$(date): Initial cost configuration" >> "$history_file"
cat "$current_file" >> "$history_file"
echo "---" >> "$history_file"
echo "Initial configuration recorded in $history_file"
else
# Compare with the last recorded snapshot, not the accumulated history
if ! diff -q "$snapshot_file" "$current_file" > /dev/null 2>&1; then
echo "$(date): Cost configuration changed" >> "$history_file"
# Show differences
echo "Changes:" >> "$history_file"
diff "$snapshot_file" "$current_file" >> "$history_file"
echo "New configuration:" >> "$history_file"
cat "$current_file" >> "$history_file"
echo "---" >> "$history_file"
echo "Cost changes detected and logged to $history_file"
else
echo "No cost changes detected"
fi
fi
# Keep the current snapshot for the next comparison
mv "$current_file" "$snapshot_file"
}
track_cost_changes
```
### Export Cost Data
```bash
# Export costs to CSV
export_costs_csv() {
local output_file="$1"
echo "model,input_cost_per_million,output_cost_per_million" > "$output_file"
tg-show-token-costs | grep -v "model" | grep -v "^\+" | \
while read -r line; do
model=$(echo "$line" | awk -F'|' '{print $2}' | tr -d ' ')
input_cost=$(echo "$line" | awk -F'|' '{print $3}' | tr -d ' ')
output_cost=$(echo "$line" | awk -F'|' '{print $4}' | tr -d ' ')
if [[ "$model" != "" ]]; then
echo "$model,$input_cost,$output_cost" >> "$output_file"
fi
done
echo "Costs exported to: $output_file"
}
# Export to CSV
export_costs_csv "token_costs.csv"
# Export to JSON
export_costs_json() {
local output_file="$1"
echo "{" > "$output_file"
echo " \"export_date\": \"$(date -Iseconds)\"," >> "$output_file"
echo " \"models\": [" >> "$output_file"
first=true
tg-show-token-costs | grep -v "model" | grep -v "^\+" | \
while read -r line; do
model=$(echo "$line" | awk -F'|' '{print $2}' | tr -d ' ')
input_cost=$(echo "$line" | awk -F'|' '{print $3}' | tr -d ' ')
output_cost=$(echo "$line" | awk -F'|' '{print $4}' | tr -d ' ')
if [[ "$model" != "" ]]; then
if [ "$first" = "false" ]; then
echo "," >> "$output_file"
fi
first=false
echo " {" >> "$output_file"
echo " \"model\": \"$model\"," >> "$output_file"
echo " \"input_cost\": \"$input_cost\"," >> "$output_file"
echo " \"output_cost\": \"$output_cost\"" >> "$output_file"
echo -n " }" >> "$output_file"
fi
done
echo "" >> "$output_file"
echo " ]" >> "$output_file"
echo "}" >> "$output_file"
echo "Costs exported to: $output_file"
}
export_costs_json "token_costs.json"
```
### Cost Calculation Tools
```bash
# Calculate costs for usage scenarios
calculate_usage_cost() {
local model="$1"
local input_tokens="$2"
local output_tokens="$3"
echo "Calculating cost for $model usage:"
echo " Input tokens: $input_tokens"
echo " Output tokens: $output_tokens"
# Extract costs for specific model
costs=$(tg-show-token-costs | grep "$model")
if [ -z "$costs" ]; then
echo "Error: Model $model not found in cost configuration"
return 1
fi
input_cost=$(echo "$costs" | awk -F'|' '{print $3}' | tr -d ' ')
output_cost=$(echo "$costs" | awk -F'|' '{print $4}' | tr -d ' ')
if [[ "$input_cost" == "-" || "$output_cost" == "-" ]]; then
echo "Error: Incomplete cost configuration for $model"
return 1
fi
# Calculate total cost
total_cost=$(echo "scale=6; ($input_tokens * $input_cost / 1000000) + ($output_tokens * $output_cost / 1000000)" | bc -l)
echo " Input cost: \$$(echo "scale=6; $input_tokens * $input_cost / 1000000" | bc -l)"
echo " Output cost: \$$(echo "scale=6; $output_tokens * $output_cost / 1000000" | bc -l)"
echo " Total cost: \$${total_cost}"
}
# Example usage calculations
calculate_usage_cost "gpt-4" 1000 500
calculate_usage_cost "claude-3-sonnet" 5000 2000
```
### Model Cost Comparison
```bash
# Compare costs across models for same usage
compare_model_costs() {
local input_tokens="${1:-1000}"
local output_tokens="${2:-500}"
echo "Cost comparison for $input_tokens input + $output_tokens output tokens:"
echo "====================================================================="
tg-show-token-costs | grep -v "model" | grep -v "^\+" | \
while read -r line; do
model=$(echo "$line" | awk -F'|' '{print $2}' | tr -d ' ')
input_cost=$(echo "$line" | awk -F'|' '{print $3}' | tr -d ' ')
output_cost=$(echo "$line" | awk -F'|' '{print $4}' | tr -d ' ')
if [[ "$model" != "" && "$input_cost" != "-" && "$output_cost" != "-" ]]; then
total_cost=$(echo "scale=4; ($input_tokens * $input_cost / 1000000) + ($output_tokens * $output_cost / 1000000)" | bc -l)
printf "%-20s \$%s\n" "$model" "$total_cost"
fi
done | sort -t'$' -k2 -n
}
# Compare costs for typical usage
compare_model_costs 1000 500
```
## Error Handling
### Connection Issues
```bash
Exception: Connection refused
```
**Solution**: Check API URL and ensure TrustGraph is running.
### Permission Errors
```bash
Exception: Access denied
```
**Solution**: Verify user permissions for configuration access.
### No Models Configured
```bash
# Empty table or no data
```
**Solution**: Configure model costs with `tg-set-token-costs`.
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-set-token-costs`](tg-set-token-costs.md) - Configure token costs
- [`tg-show-config`](tg-show-config.md) - Show other configuration settings (if available)
## API Integration
This command uses the [Config API](../apis/api-config.md) to retrieve token cost configuration from TrustGraph's configuration system.
## Best Practices
1. **Regular Review**: Check cost configurations regularly
2. **Cost Tracking**: Monitor cost changes over time
3. **Validation**: Validate cost configurations for accuracy
4. **Documentation**: Document cost sources and update procedures
5. **Reporting**: Generate regular cost reports for budget planning
6. **Comparison**: Compare costs across environments
7. **Automation**: Automate cost monitoring and alerting
## Troubleshooting
### Missing Cost Data
```bash
# Count configured models (data rows only)
tg-show-token-costs | grep "|" | grep -vc "model"
# Verify specific model exists
tg-show-token-costs | grep "model-name"
```
### Formatting Issues
```bash
# If table is garbled
export COLUMNS=120
tg-show-token-costs
```
### Incomplete Data
```bash
# Look for models with missing costs
tg-show-token-costs | grep "|" | grep " - "
# Set missing costs
tg-set-token-costs --model "incomplete-model" -i 1.0 -o 2.0
```
# tg-show-token-rate
## Synopsis
```
tg-show-token-rate [OPTIONS]
```
## Description
The `tg-show-token-rate` command displays a live stream of token usage rates from TrustGraph processors. It monitors both input and output tokens, showing instantaneous rates and cumulative averages over time. This command is essential for monitoring LLM token consumption and understanding processing throughput.
The command queries the metrics endpoint for token usage data and displays:
- Input token rates (tokens per second)
- Output token rates (tokens per second)
- Total token rates (combined input + output)
All rates are calculated as averages since the command started running.
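Consistent with the description above, the averaging is cumulative: if the token counter read `initial` when monitoring began, the rate shown at sample *n* (taken `period` seconds apart) is `(count_n - initial) / (n * period)`. A small illustrative sketch, with made-up counter values rather than reads from a live metrics endpoint:

```bash
#!/bin/sh
# Cumulative average rate from periodic counter samples (illustrative values)
avg_rates() {
    period=1        # seconds between samples
    initial=1000    # counter value when monitoring started
    n=0
    for sample in 1012 1030 1063; do
        n=$((n + 1))
        echo "sample $n: $(( (sample - initial) / (n * period) )) tokens/sec"
    done
}
avg_rates
# prints:
# sample 1: 12 tokens/sec
# sample 2: 15 tokens/sec
# sample 3: 21 tokens/sec
```

Note how a burst late in the run (sample 3) pulls the cumulative average up only gradually; short `--period` values make the display more responsive to such bursts.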
## Options
- `-m, --metrics-url URL`
- Metrics endpoint URL to query for token information
- Default: `http://localhost:8088/api/metrics`
- Should point to a Prometheus-compatible metrics endpoint
- `-p, --period SECONDS`
- Sampling period in seconds between measurements
- Default: `1`
- Controls how frequently token rates are updated
- `-n, --number-samples COUNT`
- Number of samples to collect before stopping
- Default: `100`
- Set to a large value for continuous monitoring
- `-h, --help`
- Show help message and exit
## Examples
### Basic Usage
Monitor token rates with default settings (1-second intervals, 100 samples):
```bash
tg-show-token-rate
```
### Custom Sampling Period
Monitor token rates with 5-second intervals:
```bash
tg-show-token-rate --period 5
```
### Continuous Monitoring
Monitor token rates continuously (1000 samples):
```bash
tg-show-token-rate -n 1000
```
### Remote Monitoring
Monitor token rates from a remote TrustGraph instance:
```bash
tg-show-token-rate -m http://10.0.1.100:8088/api/metrics
```
### High-Frequency Monitoring
Monitor token rates with sub-second precision:
```bash
tg-show-token-rate --period 0.5 --number-samples 200
```
## Output Format
The command displays a table with continuously updated token rates:
```
Input Output Total
----- ------ -----
12.3 8.7 21.0
15.2 10.1 25.3
18.7 12.4 31.1
...
```
Each row shows:
- **Input**: Average input tokens per second since monitoring started
- **Output**: Average output tokens per second since monitoring started
- **Total**: Combined input + output tokens per second
## Advanced Usage
### Token Rate Analysis
Create a script to analyze token usage patterns:
```bash
#!/bin/bash
echo "Starting token rate analysis..."
tg-show-token-rate --period 2 --number-samples 60 > token_rates.txt
echo "Analysis complete. Data saved to token_rates.txt"
```
### Performance Monitoring
Monitor token rates during load testing:
```bash
#!/bin/bash
echo "Starting load test monitoring..."
tg-show-token-rate --period 1 --number-samples 300 | tee load_test_tokens.log
```
### Alert on High Token Usage
Create an alert script for excessive token consumption:
```bash
#!/bin/bash
tg-show-token-rate -n 10 -p 5 | tail -n 1 | awk '{
if ($3 > 100) {
print "WARNING: High token rate detected:", $3, "tokens/sec"
exit 1
}
}'
```
### Cost Estimation
Estimate token costs during processing:
```bash
#!/bin/bash
echo "Monitoring token usage for cost estimation..."
tg-show-token-rate --period 10 --number-samples 36 | \
awk 'NR>2 {total+=$3; n++} END {if (n) print "Average tokens/sec:", total/n}'
```
## Error Handling
The command handles various error conditions:
- **Connection errors**: If the metrics endpoint is unavailable
- **Invalid JSON**: If the metrics response is malformed
- **Missing metrics**: If token metrics are not found
- **Network timeouts**: If requests to the metrics endpoint time out
Common error scenarios:
```bash
# Metrics endpoint not available
tg-show-token-rate -m http://invalid-host:8088/api/metrics
# Output: Exception: [Connection error details]
# Invalid period value
tg-show-token-rate --period 0
# Output: Exception: [Invalid period error]
```
## Integration with Other Commands
### With Cost Monitoring
Combine with token cost analysis:
```bash
echo "=== Token Rates ==="
tg-show-token-rate -n 5 -p 2
echo
echo "=== Token Costs ==="
tg-show-token-costs
```
### With Processor State
Monitor tokens alongside processor health:
```bash
echo "=== Processor States ==="
tg-show-processor-state
echo
echo "=== Token Rates ==="
tg-show-token-rate -n 10 -p 1
```
### With Flow Monitoring
Track token usage per flow:
```bash
#!/bin/bash
echo "=== Active Flows ==="
tg-show-flows
echo
echo "=== Token Usage ==="
tg-show-token-rate -n 20 -p 3
```
## Best Practices
1. **Baseline Monitoring**: Establish baseline token rates for normal operation
2. **Alert Thresholds**: Set up alerts for unusually high token consumption
3. **Cost Tracking**: Monitor token rates to estimate operational costs
4. **Load Testing**: Use during load testing to understand capacity limits
5. **Historical Analysis**: Save token rate data for trend analysis
## Troubleshooting
### No Token Data
If no token rates are displayed:
1. Verify that TrustGraph processors are actively processing requests
2. Check that token metrics are being exported properly
3. Ensure the metrics endpoint is accessible
4. Verify that LLM services are receiving requests
### Inconsistent Rates
For inconsistent or erratic token rates:
1. Check for network issues affecting metrics collection
2. Verify that the sampling period is appropriate for your workload
3. Ensure multiple processors aren't conflicting
4. Check system resources (CPU, memory) on the TrustGraph instance
### High Token Rates
If token rates are unexpectedly high:
1. Investigate the types of queries being processed
2. Check for inefficient prompts or large document processing
3. Verify that caching is working properly
4. Consider if the workload justifies the token usage
## Performance Considerations
- **Sampling Frequency**: Higher frequencies provide more granular data but consume more resources
- **Network Latency**: Consider network latency when setting sampling periods
- **Metrics Storage**: Long monitoring sessions generate significant data
- **Resource Usage**: The command itself uses minimal resources
## Related Commands
- [`tg-show-token-costs`](tg-show-token-costs.md) - Display token usage costs
- [`tg-show-processor-state`](tg-show-processor-state.md) - Show processor states
- [`tg-show-flow-state`](tg-show-flow-state.md) - Display flow processor states
- [`tg-show-config`](tg-show-config.md) - Show TrustGraph configuration
## See Also
- TrustGraph Token Management Documentation
- Prometheus Metrics Configuration
- LLM Cost Optimization Guide
docs/cli/tg-show-tools.md
# tg-show-tools
## Synopsis
```
tg-show-tools [OPTIONS]
```
## Description
The `tg-show-tools` command displays the current agent tool configuration from TrustGraph. It retrieves and presents detailed information about all available tools that agents can use, including their descriptions, arguments, and parameter types.
This command is useful for:
- Understanding available agent tools and their capabilities
- Debugging agent tool configuration issues
- Documenting the current tool set
- Verifying tool definitions and argument specifications
The command queries the TrustGraph API to fetch the tool index and individual tool definitions, then presents them in a formatted table for easy reading.
## Options
- `-u, --api-url URL`
- TrustGraph API URL to query for tool configuration
- Default: `http://localhost:8088/` (or `TRUSTGRAPH_URL` environment variable)
- Should point to a running TrustGraph API instance
- `-h, --help`
- Show help message and exit
## Examples
### Basic Usage
Display all available agent tools using the default API URL:
```bash
tg-show-tools
```
### Custom API URL
Display tools from a specific TrustGraph instance:
```bash
tg-show-tools -u http://trustgraph.example.com:8088/
```
### Remote Instance
Query tools from a remote TrustGraph deployment:
```bash
tg-show-tools --api-url http://10.0.1.100:8088/
```
### Using Environment Variable
Set the API URL via environment variable:
```bash
export TRUSTGRAPH_URL=http://production.trustgraph.com:8088/
tg-show-tools
```
## Output Format
The command displays each tool in a detailed table format:
```
web-search:
+-------------+----------------------------------------------------------------------+
| id | web-search |
+-------------+----------------------------------------------------------------------+
| name | Web Search |
+-------------+----------------------------------------------------------------------+
| description | Search the web for information using a search engine |
+-------------+----------------------------------------------------------------------+
| arg 0 | query: string |
| | The search query to execute |
+-------------+----------------------------------------------------------------------+
| arg 1 | max_results: integer |
| | Maximum number of search results to return |
+-------------+----------------------------------------------------------------------+
file-read:
+-------------+----------------------------------------------------------------------+
| id | file-read |
+-------------+----------------------------------------------------------------------+
| name | File Reader |
+-------------+----------------------------------------------------------------------+
| description | Read contents of a file from the filesystem |
+-------------+----------------------------------------------------------------------+
| arg 0 | path: string |
| | Path to the file to read |
+-------------+----------------------------------------------------------------------+
```
For each tool, the output includes:
- **id**: Unique identifier for the tool
- **name**: Human-readable name of the tool
- **description**: Detailed description of what the tool does
- **arg N**: Arguments the tool accepts, with name, type, and description
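Because tool identifiers appear flush-left with a trailing colon, they are easy to extract with a pattern match (the same pattern the validation script below uses). A self-contained sketch, run against a captured fragment of output rather than a live instance:

```bash
#!/bin/sh
# Extract tool ids from tg-show-tools style output
list_tool_ids() {
    grep -E '^[A-Za-z][A-Za-z0-9_-]*:$' | sed 's/:$//'
}

list_tool_ids <<'EOF'
web-search:
+-------------+------------+
| id          | web-search |
+-------------+------------+
file-read:
+-------------+-----------+
| id          | file-read |
+-------------+-----------+
EOF
# prints:
# web-search
# file-read
```

In practice, pipe live output in: `tg-show-tools | list_tool_ids`.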
## Advanced Usage
### Tool Inventory
Create a complete inventory of available tools:
```bash
#!/bin/bash
echo "=== TrustGraph Agent Tools Inventory ==="
echo "Generated on: $(date)"
echo
tg-show-tools > tools_inventory.txt
echo "Inventory saved to tools_inventory.txt"
```
### Tool Comparison
Compare tools across different environments:
```bash
#!/bin/bash
echo "=== Development Tools ==="
tg-show-tools -u http://dev.trustgraph.com:8088/ > dev_tools.txt
echo
echo "=== Production Tools ==="
tg-show-tools -u http://prod.trustgraph.com:8088/ > prod_tools.txt
echo
diff dev_tools.txt prod_tools.txt
```
### Tool Documentation
Generate documentation for agent tools:
```bash
#!/bin/bash
echo "# Available Agent Tools" > AGENT_TOOLS.md
echo "" >> AGENT_TOOLS.md
echo "Generated on: $(date)" >> AGENT_TOOLS.md
echo "" >> AGENT_TOOLS.md
tg-show-tools >> AGENT_TOOLS.md
```
### Tool Configuration Validation
Validate tool configuration after updates:
```bash
#!/bin/bash
echo "Validating tool configuration..."
if tg-show-tools > /dev/null 2>&1; then
echo "✓ Tool configuration is valid"
tool_count=$(tg-show-tools | grep -c "^[a-zA-Z].*:$")
echo "✓ Found $tool_count tools"
else
echo "✗ Tool configuration validation failed"
exit 1
fi
```
## Error Handling
The command handles various error conditions:
- **API connection errors**: If the TrustGraph API is unavailable
- **Authentication errors**: If API access is denied
- **Invalid configuration**: If tool configuration is malformed
- **Network timeouts**: If API requests time out
Common error scenarios:
```bash
# API not available
tg-show-tools -u http://invalid-host:8088/
# Output: Exception: [Connection error details]
# Invalid API URL
tg-show-tools --api-url "not-a-url"
# Output: Exception: [URL parsing error]
# Configuration not found
# Output: Exception: [Configuration retrieval error]
```
## Integration with Other Commands
### With Agent Configuration
Display tools alongside agent configuration:
```bash
echo "=== Agent Tools ==="
tg-show-tools
echo
echo "=== Agent Configuration ==="
tg-show-config
```
### With Flow Analysis
Understand tools used in flows:
```bash
echo "=== Available Tools ==="
tg-show-tools
echo
echo "=== Active Flows ==="
tg-show-flows
```
### With Prompt Analysis
Analyze tool usage in prompts:
```bash
echo "=== Agent Tools ==="
tg-show-tools | grep -E "^[a-zA-Z].*:$"
echo
echo "=== Available Prompts ==="
tg-show-prompts
```
## Best Practices
1. **Regular Documentation**: Keep tool documentation updated
2. **Version Control**: Track tool configuration changes
3. **Testing**: Test tool functionality after configuration changes
4. **Security**: Review tool permissions and capabilities
5. **Monitoring**: Monitor tool usage and performance
## Troubleshooting
### No Tools Displayed
If no tools are shown:
1. Verify the TrustGraph API is running and accessible
2. Check that tool configuration has been properly loaded
3. Ensure the API URL is correct
4. Verify network connectivity
### Incomplete Tool Information
If tool information is missing or incomplete:
1. Check the tool configuration files
2. Verify the tool index is properly maintained
3. Ensure tool definitions are valid JSON
4. Check for configuration loading errors
### Tool Configuration Errors
If tools are not working as expected:
1. Validate tool definitions against the schema
2. Check for missing or invalid arguments
3. Verify tool implementation is available
4. Review agent logs for tool execution errors
## Tool Management
### Adding New Tools
After adding new tools to the system:
```bash
# Verify the new tool appears
tg-show-tools | grep "new-tool-name"
# Test the tool configuration
tg-show-tools > current_tools.txt
```
### Removing Tools
After removing tools:
```bash
# Verify the tool is no longer listed
tg-show-tools | grep -v "removed-tool-name"
# Update tool documentation
tg-show-tools > updated_tools.txt
```
## Related Commands
- [`tg-show-config`](tg-show-config.md) - Show TrustGraph configuration
- [`tg-show-prompts`](tg-show-prompts.md) - Display available prompts
- [`tg-show-flows`](tg-show-flows.md) - Show active flows
- [`tg-invoke-agent`](tg-invoke-agent.md) - Invoke agent with tools
## See Also
- TrustGraph Agent Documentation
- Tool Configuration Guide
- Agent API Reference
docs/cli/tg-start-flow.md
# tg-start-flow
Starts a processing flow using a defined flow class.
## Synopsis
```bash
tg-start-flow -n CLASS_NAME -i FLOW_ID -d DESCRIPTION [options]
```
## Description
The `tg-start-flow` command creates and starts a new processing flow instance based on a predefined flow class. Flow classes define the processing pipeline configuration, while flow instances are running implementations of those classes with specific identifiers.
Once started, a flow provides endpoints for document processing, knowledge queries, and other TrustGraph services through its configured interfaces.
## Options
### Required Arguments
- `-n, --class-name CLASS_NAME`: Name of the flow class to instantiate
- `-i, --flow-id FLOW_ID`: Unique identifier for the new flow instance
- `-d, --description DESCRIPTION`: Human-readable description of the flow
### Optional Arguments
- `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
## Examples
### Start Basic Document Processing Flow
```bash
tg-start-flow \
-n "document-rag+graph-rag" \
-i "research-flow" \
-d "Research document processing pipeline"
```
### Start Custom Flow Class
```bash
tg-start-flow \
-n "medical-analysis" \
-i "medical-research-2024" \
-d "Medical research analysis for 2024 studies"
```
### Using Custom API URL
```bash
tg-start-flow \
-n "document-processing" \
-i "production-flow" \
-d "Production document processing" \
-u http://production:8088/
```
## Prerequisites
### Flow Class Must Exist
Before starting a flow, the flow class must be available in the system:
```bash
# Check available flow classes
tg-show-flow-classes
# Upload a flow class if needed
tg-put-flow-class -n "my-class" -f flow-definition.json
```
### System Requirements
- TrustGraph API gateway must be running
- Required processing components must be available
- Sufficient system resources for the flow's processing needs
## Flow Lifecycle
1. **Flow Class Definition**: Flow classes define processing pipelines
2. **Flow Instance Creation**: `tg-start-flow` creates a running instance
3. **Service Availability**: Flow provides configured service endpoints
4. **Processing**: Documents and queries can be processed through the flow
5. **Flow Termination**: Use `tg-stop-flow` to stop the instance
## Error Handling
### Flow Class Not Found
```bash
Exception: Flow class 'invalid-class' not found
```
**Solution**: Check available flow classes with `tg-show-flow-classes` and ensure the class name is correct.
### Flow ID Already Exists
```bash
Exception: Flow ID 'my-flow' already exists
```
**Solution**: Choose a different flow ID or stop the existing flow with `tg-stop-flow`.
### Connection Errors
```bash
Exception: Connection refused
```
**Solution**: Verify the API URL and ensure TrustGraph is running.
### Resource Errors
```bash
Exception: Insufficient resources to start flow
```
**Solution**: Check system resources and ensure required processing components are available.
## Output
On successful flow creation:
```bash
Flow 'research-flow' started successfully using class 'document-rag+graph-rag'
```
## Flow Configuration
Once started, flows provide service interfaces based on their class definition. Common interfaces include:
### Request/Response Services
- **agent**: Interactive Q&A service
- **graph-rag**: Graph-based retrieval augmented generation
- **document-rag**: Document-based retrieval augmented generation
- **text-completion**: LLM text completion
- **embeddings**: Text embedding generation
- **triples**: Knowledge graph queries
### Fire-and-Forget Services
- **text-load**: Text document loading
- **document-load**: Document file loading
- **triples-store**: Knowledge graph storage
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-stop-flow`](tg-stop-flow.md) - Stop a running flow
- [`tg-show-flows`](tg-show-flows.md) - List active flows and their interfaces
- [`tg-show-flow-classes`](tg-show-flow-classes.md) - List available flow classes
- [`tg-put-flow-class`](tg-put-flow-class.md) - Upload/update flow class definitions
- [`tg-show-flow-state`](tg-show-flow-state.md) - Check flow status
## API Integration
This command uses the [Flow API](../apis/api-flow.md) with the `start-flow` operation to create and start flow instances.
## Use Cases
### Development Environment
```bash
tg-start-flow \
-n "dev-pipeline" \
-i "dev-$(date +%Y%m%d)" \
-d "Development testing flow for $(date)"
```
### Research Projects
```bash
tg-start-flow \
-n "research-analysis" \
-i "climate-study" \
-d "Climate change research document analysis"
```
### Production Processing
```bash
tg-start-flow \
-n "production-pipeline" \
-i "prod-primary" \
-d "Primary production document processing pipeline"
```
### Specialized Processing
```bash
tg-start-flow \
-n "medical-nlp" \
-i "medical-trials" \
-d "Medical trial document analysis and extraction"
```
## Best Practices
1. **Descriptive IDs**: Use meaningful flow IDs that indicate purpose and scope
2. **Clear Descriptions**: Provide detailed descriptions for flow tracking
3. **Resource Planning**: Ensure adequate resources before starting flows
4. **Monitoring**: Use `tg-show-flows` to monitor active flows
5. **Cleanup**: Stop unused flows to free up resources
6. **Documentation**: Document flow purposes and configurations for team use

# tg-start-library-processing
Submits a library document for processing through TrustGraph workflows.
## Synopsis
```bash
tg-start-library-processing -d DOCUMENT_ID --id PROCESSING_ID [options]
```
## Description
The `tg-start-library-processing` command initiates processing of a document stored in TrustGraph's document library. This triggers workflows that can extract text, generate embeddings, create knowledge graphs, and enable document search and analysis.
Each processing job is assigned a unique processing ID for tracking and management purposes.
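Processing IDs are opaque strings chosen by the caller. One convention, an assumption rather than anything the tool enforces, is to combine a purpose prefix, a timestamp, and the document ID so IDs stay unique and self-describing:

```bash
# Build a processing ID from a prefix, a timestamp, and the document ID.
# This naming scheme is a suggestion; TrustGraph only requires uniqueness.
make_proc_id() {
    local prefix="$1" doc_id="$2"
    printf '%s_%s_%s\n' "$prefix" "$(date +%Y%m%d%H%M%S)" "$doc_id"
}

# Example: make_proc_id "batch" "doc_123" produces something like
# batch_20250101093000_doc_123
```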
## Options
### Required Arguments
- `-d, --document-id ID`: Document ID from the library to process
- `--id, --processing-id ID`: Unique identifier for this processing job
### Optional Arguments
- `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
- `-U, --user USER`: User ID for processing context (default: `trustgraph`)
- `-i, --flow-id ID`: Flow instance to use for processing (default: `default`)
- `--collection COLLECTION`: Collection to assign processed data (default: `default`)
- `--tags TAGS`: Comma-separated tags for the processing job
## Examples
### Basic Document Processing
```bash
tg-start-library-processing -d "doc_123456789" --id "proc_001"
```
### Processing with Custom Collection
```bash
tg-start-library-processing \
-d "research_paper_456" \
--id "research_proc_001" \
--collection "research-papers" \
--tags "nlp,research,2023"
```
### Processing with Specific Flow
```bash
tg-start-library-processing \
-d "technical_manual" \
--id "manual_proc_001" \
-i "document-analysis-flow" \
-U "technical-team" \
--collection "technical-docs"
```
### Processing Multiple Documents
```bash
# Process several documents in sequence
documents=("doc_001" "doc_002" "doc_003")
for i in "${!documents[@]}"; do
doc_id="${documents[$i]}"
proc_id="batch_proc_$(printf %03d $((i+1)))"
echo "Processing document: $doc_id"
tg-start-library-processing \
-d "$doc_id" \
--id "$proc_id" \
--collection "batch-processing" \
--tags "batch,automated"
done
```
## Processing Workflow
### Document Processing Steps
1. **Document Retrieval**: Fetch document from library
2. **Content Extraction**: Extract text and metadata
3. **Text Processing**: Clean and normalize content
4. **Embedding Generation**: Create vector embeddings
5. **Knowledge Extraction**: Generate triples and entities
6. **Index Creation**: Make content searchable
### Processing Types
Different document types may trigger different processing workflows:
- **PDF Documents**: Text extraction, OCR if needed
- **Text Files**: Direct text processing
- **Images**: OCR and image analysis
- **Structured Data**: Schema extraction and mapping
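If you route documents to different flows by type, a small helper keeps the mapping in one place. The flow IDs below (`pdf-flow`, `text-flow`, `ocr-flow`) are hypothetical examples, not built-in names:

```bash
# Map a document filename to a flow ID by file extension.
# The flow IDs here are hypothetical examples, not built-in names.
flow_for_document() {
    case "${1##*.}" in
        pdf)     echo "pdf-flow" ;;
        txt|md)  echo "text-flow" ;;
        png|jpg) echo "ocr-flow" ;;
        *)       echo "default" ;;
    esac
}

# Usage sketch:
# tg-start-library-processing -d "$doc" --id "$proc" -i "$(flow_for_document "$doc")"
```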
## Use Cases
### Batch Document Processing
```bash
# Process all unprocessed documents
process_all_documents() {
local collection="$1"
local batch_id="batch_$(date +%Y%m%d_%H%M%S)"
echo "Starting batch processing for collection: $collection"
# Get all document IDs
tg-show-library-documents | \
grep "| id" | \
awk '{print $3}' | \
while read -r doc_id; do
proc_id="${batch_id}_${doc_id}"
echo "Processing document: $doc_id"
tg-start-library-processing \
-d "$doc_id" \
--id "$proc_id" \
--collection "$collection" \
--tags "batch,automated,$(date +%Y%m%d)"
# Add delay to avoid overwhelming the system
sleep 2
done
}
# Process all documents
process_all_documents "processed-docs"
```
### Department-Specific Processing
```bash
# Process documents by department
process_by_department() {
local dept="$1"
local flow="$2"
echo "Processing documents for department: $dept"
# Find documents with department tag
tg-show-library-documents -U "$dept" | \
grep "| id" | \
awk '{print $3}' | \
while read -r doc_id; do
proc_id="${dept}_proc_$(date +%s)_${doc_id}"
echo "Processing $dept document: $doc_id"
tg-start-library-processing \
-d "$doc_id" \
--id "$proc_id" \
-i "$flow" \
-U "$dept" \
--collection "${dept}-processed" \
--tags "$dept,departmental"
done
}
# Process documents for different departments
process_by_department "research" "research-flow"
process_by_department "finance" "document-flow"
process_by_department "legal" "compliance-flow"
```
### Priority Processing
```bash
# Process high-priority documents first
priority_processing() {
local priority_tags=("urgent" "high-priority" "critical")
for tag in "${priority_tags[@]}"; do
echo "Processing $tag documents..."
tg-show-library-documents | \
grep -B5 -A5 "$tag" | \
grep "| id" | \
awk '{print $3}' | \
while read -r doc_id; do
proc_id="priority_$(date +%s)_${doc_id}"
echo "Processing priority document: $doc_id"
tg-start-library-processing \
-d "$doc_id" \
--id "$proc_id" \
--collection "priority-processed" \
--tags "priority,$tag"
done
done
}
priority_processing
```
### Conditional Processing
```bash
# Process documents based on criteria
conditional_processing() {
local criteria="$1"
local flow="$2"
echo "Processing documents matching criteria: $criteria"
tg-show-library-documents | \
grep -B10 -A10 "$criteria" | \
grep "| id" | \
awk '{print $3}' | \
while read -r doc_id; do
# Check if already processed
if tg-invoke-document-rag -q "test" 2>/dev/null | grep -q "$doc_id"; then
echo "Document $doc_id already processed, skipping"
continue
fi
proc_id="conditional_$(date +%s)_${doc_id}"
echo "Processing document: $doc_id"
tg-start-library-processing \
-d "$doc_id" \
--id "$proc_id" \
-i "$flow" \
--collection "conditional-processed" \
--tags "conditional,$criteria"
done
}
# Process technical documents
conditional_processing "technical" "technical-flow"
```
## Advanced Usage
### Processing with Validation
```bash
# Process with pre and post validation
validated_processing() {
local doc_id="$1"
local proc_id="$2"
local collection="$3"
echo "Starting validated processing for: $doc_id"
# Pre-processing validation
if ! tg-show-library-documents | grep -q "$doc_id"; then
echo "ERROR: Document $doc_id not found"
return 1
fi
# Check if processing ID is unique
if tg-show-flows | grep -q "$proc_id"; then
echo "ERROR: Processing ID $proc_id already in use"
return 1
fi
# Start processing
echo "Starting processing..."
tg-start-library-processing \
-d "$doc_id" \
--id "$proc_id" \
--collection "$collection" \
--tags "validated,$(date +%Y%m%d)"
# Monitor processing
echo "Monitoring processing progress..."
timeout=300 # 5 minutes
elapsed=0
interval=10
while [ $elapsed -lt $timeout ]; do
if tg-invoke-document-rag -q "test" -C "$collection" 2>/dev/null | grep -q "$doc_id"; then
echo "✓ Processing completed successfully"
return 0
fi
echo "Processing in progress... (${elapsed}s elapsed)"
sleep $interval
elapsed=$((elapsed + interval))
done
echo "⚠ Processing timeout reached"
return 1
}
# Usage
validated_processing "doc_123" "validated_proc_001" "validated-docs"
```
### Parallel Processing with Limits
```bash
# Process multiple documents in parallel with concurrency limits
parallel_processing() {
local doc_list=("$@")
local max_concurrent=5
local current_jobs=0
echo "Processing ${#doc_list[@]} documents with max $max_concurrent concurrent jobs"
for doc_id in "${doc_list[@]}"; do
# Wait if max concurrent jobs reached
while [ $current_jobs -ge $max_concurrent ]; do
wait -n # Wait for any job to complete
current_jobs=$((current_jobs - 1))
done
# Start processing in background
(
proc_id="parallel_$(date +%s)_${doc_id}"
echo "Starting processing: $doc_id"
tg-start-library-processing \
-d "$doc_id" \
--id "$proc_id" \
--collection "parallel-processed" \
--tags "parallel,batch"
echo "Completed processing: $doc_id"
) &
current_jobs=$((current_jobs + 1))
done
# Wait for all remaining jobs
wait
echo "All processing jobs completed"
}
# Get document list and process in parallel
doc_list=($(tg-show-library-documents | grep "| id" | awk '{print $3}'))
parallel_processing "${doc_list[@]}"
```
### Processing with Retry Logic
```bash
# Process with automatic retry on failure
processing_with_retry() {
local doc_id="$1"
local proc_id="$2"
local max_retries=3
local retry_delay=30
for attempt in $(seq 1 $max_retries); do
echo "Processing attempt $attempt/$max_retries for document: $doc_id"
if tg-start-library-processing \
-d "$doc_id" \
--id "${proc_id}_attempt_${attempt}" \
--collection "retry-processed" \
--tags "retry,attempt_$attempt"; then
# Wait and check if processing succeeded
sleep $retry_delay
if tg-invoke-document-rag -q "test" 2>/dev/null | grep -q "$doc_id"; then
echo "✓ Processing succeeded on attempt $attempt"
return 0
else
echo "Processing started but content not yet accessible"
fi
else
echo "✗ Processing failed on attempt $attempt"
fi
if [ $attempt -lt $max_retries ]; then
echo "Retrying in ${retry_delay}s..."
sleep $retry_delay
fi
done
echo "✗ Processing failed after $max_retries attempts"
return 1
}
# Usage
processing_with_retry "doc_123" "retry_proc_001"
```
### Configuration-Driven Processing
```bash
# Process documents based on configuration file
config_driven_processing() {
local config_file="$1"
if [ ! -f "$config_file" ]; then
echo "Configuration file not found: $config_file"
return 1
fi
echo "Processing documents based on configuration: $config_file"
# Example configuration format:
# doc_id,flow_id,collection,tags
# doc_123,research-flow,research-docs,nlp research
while IFS=',' read -r doc_id flow_id collection tags; do
# Skip header line
if [ "$doc_id" = "doc_id" ]; then
continue
fi
proc_id="config_$(date +%s)_${doc_id}"
echo "Processing: $doc_id -> $collection (flow: $flow_id)"
tg-start-library-processing \
-d "$doc_id" \
--id "$proc_id" \
-i "$flow_id" \
--collection "$collection" \
--tags "$tags"
done < "$config_file"
}
# Create example configuration
cat > processing_config.csv << EOF
doc_id,flow_id,collection,tags
doc_123,research-flow,research-docs,nlp research
doc_456,finance-flow,finance-docs,financial quarterly
doc_789,general-flow,general-docs,general processing
EOF
# Process based on configuration
config_driven_processing "processing_config.csv"
```
## Error Handling
### Document Not Found
```bash
Exception: Document not found
```
**Solution**: Verify document exists with `tg-show-library-documents`.
### Processing ID Conflict
```bash
Exception: Processing ID already exists
```
**Solution**: Use a unique processing ID or check existing jobs with `tg-show-flows`.
### Flow Not Found
```bash
Exception: Flow instance not found
```
**Solution**: Verify flow exists with `tg-show-flows` or `tg-show-flow-classes`.
### Insufficient Resources
```bash
Exception: Processing queue full
```
**Solution**: Wait for current jobs to complete or scale processing resources.
## Monitoring and Management
### Processing Status
```bash
# Monitor processing progress
monitor_processing() {
local proc_id="$1"
local timeout="${2:-300}" # 5 minutes default
echo "Monitoring processing: $proc_id"
elapsed=0
interval=10
while [ $elapsed -lt $timeout ]; do
# Check if processing is active
if tg-show-flows | grep -q "$proc_id"; then
echo "Processing active... (${elapsed}s elapsed)"
else
echo "Processing completed or stopped"
break
fi
sleep $interval
elapsed=$((elapsed + interval))
done
if [ $elapsed -ge $timeout ]; then
echo "Monitoring timeout reached"
fi
}
# Monitor specific processing job
monitor_processing "proc_001" 600
```
### Batch Monitoring
```bash
# Monitor multiple processing jobs
monitor_batch() {
local proc_pattern="$1"
echo "Monitoring batch processing: $proc_pattern"
while true; do
active_jobs=$(tg-show-flows | grep -c "$proc_pattern")
if [ "$active_jobs" -eq 0 ]; then
echo "All batch processing jobs completed"
break
fi
echo "Active jobs: $active_jobs"
sleep 30
done
}
# Monitor batch processing
monitor_batch "batch_proc_"
```
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-show-library-documents`](tg-show-library-documents.md) - List available documents
- [`tg-stop-library-processing`](tg-stop-library-processing.md) - Stop processing jobs
- [`tg-show-flows`](tg-show-flows.md) - Monitor processing flows
- [`tg-invoke-document-rag`](tg-invoke-document-rag.md) - Query processed documents
## API Integration
This command uses the [Library API](../apis/api-librarian.md) to initiate document processing workflows.
## Best Practices
1. **Unique IDs**: Always use unique processing IDs to avoid conflicts
2. **Resource Management**: Monitor system resources during batch processing
3. **Error Handling**: Implement retry logic for robust processing
4. **Monitoring**: Track processing progress and completion
5. **Collection Organization**: Use meaningful collection names
6. **Tagging**: Apply consistent tagging for better organization
7. **Documentation**: Document processing procedures and configurations
## Troubleshooting
### Processing Not Starting
```bash
# Check document exists
tg-show-library-documents | grep "document-id"
# Check flow is available
tg-show-flows | grep "flow-id"
# Check system resources
free -h
df -h
```
### Slow Processing
```bash
# Check processing queue
tg-show-flows | grep processing | wc -l
# Monitor system load
top
htop
```
### Processing Failures
```bash
# Check processing logs
# (Log location depends on TrustGraph configuration)
# Retry with different flow
tg-start-library-processing -d "doc-id" --id "retry-proc" -i "alternative-flow"
```

docs/cli/tg-stop-flow.md
# tg-stop-flow
Stops a running processing flow.
## Synopsis
```bash
tg-stop-flow -i FLOW_ID [options]
```
## Description
The `tg-stop-flow` command terminates a running flow instance and releases its associated resources. When a flow is stopped, it becomes unavailable for processing requests, and all its service endpoints are shut down.
This command is essential for flow lifecycle management, resource cleanup, and system maintenance operations.
## Options
### Required Arguments
- `-i, --flow-id FLOW_ID`: Identifier of the flow to stop
### Optional Arguments
- `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
## Examples
### Stop Specific Flow
```bash
tg-stop-flow -i research-flow
```
### Using Custom API URL
```bash
tg-stop-flow -i production-flow -u http://production:8088/
```
### Stop Multiple Flows
```bash
# Stop multiple flows in sequence
tg-stop-flow -i dev-flow-1
tg-stop-flow -i dev-flow-2
tg-stop-flow -i test-flow
```
## Prerequisites
### Flow Must Exist and Be Running
Before stopping a flow, verify it exists:
```bash
# Check running flows
tg-show-flows
# Stop the desired flow
tg-stop-flow -i my-flow
```
## Flow Termination Process
1. **Request Validation**: Verifies flow exists and is running
2. **Service Shutdown**: Stops all flow service endpoints
3. **Resource Cleanup**: Releases allocated system resources
4. **Queue Cleanup**: Cleans up associated Pulsar queues
5. **State Update**: Updates flow status to stopped
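The steps above can be wrapped in a small helper that requests the stop and then polls the flow list until the flow disappears. This is a sketch, assuming `tg-stop-flow` and `tg-show-flows` are on the PATH:

```bash
# Stop a flow, then poll tg-show-flows until it is no longer listed.
# Sketch only: assumes tg-stop-flow and tg-show-flows are installed.
stop_and_verify() {
    local flow_id="$1" tries="${2:-10}"
    tg-stop-flow -i "$flow_id" || return 1
    for _ in $(seq 1 "$tries"); do
        if ! tg-show-flows | grep -q "$flow_id"; then
            echo "flow '$flow_id' stopped"
            return 0
        fi
        sleep 2
    done
    echo "flow '$flow_id' still listed after stop request"
    return 1
}

# Usage: stop_and_verify "research-flow"
```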
## Impact of Stopping Flows
### Service Unavailability
Once stopped, the flow's services become unavailable:
- REST API endpoints return errors
- WebSocket connections are terminated
- Pulsar queues are cleaned up
### In-Progress Operations
- **Completed**: Already finished operations remain completed
- **Active**: In-progress operations may be interrupted
- **Queued**: Pending operations are lost
### Resource Recovery
- **Memory**: Memory allocated to flow components is freed
- **CPU**: Processing resources are returned to system pool
- **Storage**: Temporary storage is cleaned up
## Error Handling
### Flow Not Found
```bash
Exception: Flow 'invalid-flow' not found
```
**Solution**: Check available flows with `tg-show-flows` and verify the flow ID.
### Flow Already Stopped
```bash
Exception: Flow 'my-flow' is not running
```
**Solution**: The flow is already stopped. Use `tg-show-flows` to check current status.
### Connection Errors
```bash
Exception: Connection refused
```
**Solution**: Verify the API URL and ensure TrustGraph is running.
### Permission Errors
```bash
Exception: Insufficient permissions to stop flow
```
**Solution**: Check user permissions and authentication credentials.
## Output
On successful flow termination:
```bash
Flow 'research-flow' stopped successfully.
```
An absence of error output also indicates successful operation.
## Flow Management Workflow
### Development Cycle
```bash
# 1. Start flow for development
tg-start-flow -n "dev-class" -i "dev-flow" -d "Development testing"
# 2. Use flow for testing
tg-invoke-graph-rag -q "test query" -f dev-flow
# 3. Stop flow when done
tg-stop-flow -i dev-flow
```
### Resource Management
```bash
# Check active flows
tg-show-flows
# Stop unused flows to free resources
tg-stop-flow -i old-research-flow
tg-stop-flow -i temporary-test-flow
```
### System Maintenance
```bash
# Stop all flows before maintenance
for flow in $(tg-show-flows | grep "id" | awk '{print $2}'); do
tg-stop-flow -i "$flow"
done
```
## Safety Considerations
### Data Preservation
- **Knowledge Cores**: Loaded knowledge cores are preserved
- **Library Documents**: Library documents remain intact
- **Configuration**: System configuration is unaffected
### Service Dependencies
- **Dependent Services**: Ensure no critical services depend on the flow
- **Active Users**: Notify users before stopping production flows
- **Scheduled Operations**: Check for scheduled operations using the flow
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-start-flow`](tg-start-flow.md) - Start a new flow instance
- [`tg-show-flows`](tg-show-flows.md) - List active flows
- [`tg-show-flow-state`](tg-show-flow-state.md) - Check detailed flow status
- [`tg-show-flow-classes`](tg-show-flow-classes.md) - List available flow classes
## API Integration
This command uses the [Flow API](../apis/api-flow.md) with the `stop-flow` operation to terminate flow instances.
## Use Cases
### Development Environment Cleanup
```bash
# Clean up development flows at end of day
tg-stop-flow -i dev-$(whoami)
tg-stop-flow -i test-experimental
```
### Resource Optimization
```bash
# Stop idle flows to free resources
tg-show-flows | grep "idle" | while read flow; do
tg-stop-flow -i "$flow"
done
```
### Environment Switching
```bash
# Switch from development to production configuration
tg-stop-flow -i dev-flow
tg-start-flow -n "production-class" -i "prod-flow" -d "Production processing"
```
### Maintenance Operations
```bash
# Prepare for system maintenance
echo "Stopping all flows for maintenance..."
tg-show-flows | grep -E "^[a-z-]+" | while read flow_id; do
echo "Stopping $flow_id"
tg-stop-flow -i "$flow_id"
done
```
### Flow Recycling
```bash
# Restart flow with fresh configuration
tg-stop-flow -i my-flow
tg-start-flow -n "updated-class" -i "my-flow" -d "Updated configuration"
```
## Best Practices
1. **Graceful Shutdown**: Allow in-progress operations to complete when possible
2. **User Notification**: Inform users before stopping production flows
3. **Resource Monitoring**: Check system resources after stopping flows
4. **Documentation**: Record why flows were stopped for audit purposes
5. **Verification**: Confirm flow stopped successfully with `tg-show-flows`
6. **Cleanup Planning**: Plan flow stops during low-usage periods
## Troubleshooting
### Flow Won't Stop
```bash
# Check flow status
tg-show-flow-state -i problematic-flow
# Force stop if necessary (implementation dependent)
# Contact system administrator if flow remains stuck
```
### Resource Not Released
```bash
# Check system resources after stopping
ps aux | grep trustgraph
netstat -an | grep 8088
# Restart TrustGraph if resources not properly released
```
### Service Still Responding
```bash
# Verify flow services are actually stopped
tg-invoke-graph-rag -q "test" -f stopped-flow
# Should return flow not found error
```

# tg-stop-library-processing
Removes a library document processing record from TrustGraph.
## Synopsis
```bash
tg-stop-library-processing --id PROCESSING_ID [options]
```
## Description
The `tg-stop-library-processing` command removes a document processing record from TrustGraph's library processing system. This command removes the processing record but **does not stop in-flight processing** that may already be running.
This is primarily used for cleaning up processing records, managing processing queues, and maintaining processing history.
## Options
### Required Arguments
- `--id, --processing-id ID`: Processing ID to remove
### Optional Arguments
- `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
- `-U, --user USER`: User ID (default: `trustgraph`)
## Examples
### Remove Single Processing Record
```bash
tg-stop-library-processing --id "proc_123456789"
```
### Remove with Custom User
```bash
tg-stop-library-processing --id "research_proc_001" -U "research-team"
```
### Remove with Custom API URL
```bash
tg-stop-library-processing --id "proc_555" -u http://staging:8088/
```
## Important Limitations
### Processing Record vs Active Processing
This command only removes the **processing record** and does not:
- Stop currently running processing jobs
- Cancel in-flight document analysis
- Interrupt active workflows
### What It Does
- Removes processing metadata from library
- Cleans up processing history
- Allows reuse of processing IDs
- Maintains processing queue hygiene
### What It Doesn't Do
- Stop active processing threads
- Cancel running analysis jobs
- Interrupt flow execution
- Free up computational resources immediately
## Use Cases
### Cleanup Failed Processing Records
```bash
# Remove failed processing records
failed_processes=("proc_failed_001" "proc_error_002" "proc_timeout_003")
for proc_id in "${failed_processes[@]}"; do
echo "Removing failed processing record: $proc_id"
tg-stop-library-processing --id "$proc_id"
done
```
### Batch Cleanup
```bash
# Clean up all processing records for a specific pattern
cleanup_batch_processing() {
local pattern="$1"
echo "Cleaning up processing records matching: $pattern"
# This would require a way to list processing records
# For now, use known processing IDs
tg-show-flows | \
grep "$pattern" | \
awk '{print $1}' | \
while read proc_id; do
echo "Removing processing record: $proc_id"
tg-stop-library-processing --id "$proc_id"
done
}
# Clean up old batch processing records
cleanup_batch_processing "batch_proc_"
```
### User-Specific Cleanup
```bash
# Clean up processing records for specific user
cleanup_user_processing() {
local user="$1"
echo "Cleaning up processing records for user: $user"
# Note: This assumes you have a way to list processing records by user
# Implementation would depend on available APIs
# Example with known processing IDs
user_processes=("${user}_proc_001" "${user}_proc_002" "${user}_proc_003")
for proc_id in "${user_processes[@]}"; do
echo "Removing processing record: $proc_id"
tg-stop-library-processing --id "$proc_id" -U "$user"
done
}
# Clean up for specific user
cleanup_user_processing "temp-user"
```
### Age-Based Cleanup
```bash
# Clean up old processing records
cleanup_old_processing() {
local days_old="$1"
echo "Cleaning up processing records older than $days_old days"
# This would require timestamp information from processing records
# Implementation depends on available metadata
cutoff_date=$(date -d "$days_old days ago" +"%Y%m%d")
# Example with date-pattern processing IDs
# proc_20231215_001, proc_20231214_002, etc.
for proc_id in proc_*; do
if [[ "$proc_id" =~ proc_([0-9]{8})_ ]]; then
proc_date="${BASH_REMATCH[1]}"
if [[ "$proc_date" < "$cutoff_date" ]]; then
echo "Removing old processing record: $proc_id"
tg-stop-library-processing --id "$proc_id"
fi
fi
done
}
# Clean up processing records older than 30 days
cleanup_old_processing 30
```
## Safe Processing Management
### Before Removing Processing Records
```bash
# Check if processing is actually complete before cleanup
safe_processing_cleanup() {
local proc_id="$1"
local doc_id="$2"
echo "Safe cleanup for processing: $proc_id"
# Check if document is accessible (processing likely complete)
if tg-invoke-document-rag -q "test" 2>/dev/null | grep -q "$doc_id"; then
echo "Document $doc_id is accessible, safe to remove processing record"
tg-stop-library-processing --id "$proc_id"
echo "Processing record removed: $proc_id"
else
echo "Document $doc_id not yet accessible, processing may still be active"
echo "Skipping removal of processing record: $proc_id"
fi
}
# Usage
safe_processing_cleanup "proc_001" "doc_123"
```
### Verification Before Cleanup
```bash
# Verify processing completion before removing records
verify_and_cleanup() {
local proc_id="$1"
local collection="$2"
echo "Verifying processing completion for: $proc_id"
# Check if processing is still active in flows
if tg-show-flows | grep -q "$proc_id"; then
echo "Processing $proc_id is still active, not removing record"
return 1
fi
# Additional verification could include:
# - Checking if document content is available
# - Verifying embeddings are generated
# - Confirming knowledge graph updates
echo "Processing appears complete, removing record"
tg-stop-library-processing --id "$proc_id"
echo "Processing record removed: $proc_id"
}
# Usage
verify_and_cleanup "proc_001" "research-docs"
```
## Advanced Usage
### Conditional Cleanup
```bash
# Clean up processing records based on success criteria
conditional_cleanup() {
local proc_id="$1"
local doc_id="$2"
local collection="$3"
echo "Conditional cleanup for: $proc_id"
# Test if document is queryable (indicates successful processing)
test_query="What is this document about?"
if result=$(tg-invoke-document-rag -q "$test_query" -C "$collection" 2>/dev/null); then
if echo "$result" | grep -q "answer"; then
echo "✓ Document is queryable, processing successful"
tg-stop-library-processing --id "$proc_id"
echo "Processing record cleaned up: $proc_id"
else
echo "⚠ Document query returned no answer, processing may be incomplete"
echo "Keeping processing record: $proc_id"
fi
else
echo "✗ Document query failed, processing incomplete or failed"
echo "Keeping processing record: $proc_id"
fi
}
# Usage
conditional_cleanup "proc_001" "doc_123" "research-docs"
```
### Bulk Cleanup with Verification
```bash
# Bulk cleanup with individual verification
bulk_verified_cleanup() {
local proc_pattern="$1"
local collection="$2"
echo "Bulk cleanup with verification for pattern: $proc_pattern"
# Get list of processing IDs (this would need appropriate API)
# For now, use example pattern
for proc_id in proc_batch_*; do
if [[ "$proc_id" =~ $proc_pattern ]]; then
echo "Checking processing: $proc_id"
# Extract document ID from processing ID (example pattern)
if [[ "$proc_id" =~ _([^_]+)$ ]]; then
doc_id="${BASH_REMATCH[1]}"
# Verify document is accessible
if tg-invoke-document-rag -q "test" -C "$collection" 2>/dev/null | grep -q "$doc_id"; then
echo "✓ Verified: $proc_id"
tg-stop-library-processing --id "$proc_id"
else
echo "⚠ Unverified: $proc_id"
fi
else
echo "? Unknown pattern: $proc_id"
fi
fi
done
}
# Usage
bulk_verified_cleanup "batch_" "processed-docs"
```
### Processing Record Maintenance
```bash
# Maintain processing record hygiene
maintain_processing_records() {
local max_records="$1"
echo "Maintaining processing records (max: $max_records)"
# This would require an API to list and count processing records
# For now, demonstrate the concept
# Count current processing records (placeholder)
current_count=150 # Would get this from API
if [ "$current_count" -gt "$max_records" ]; then
excess=$((current_count - max_records))
echo "Found $current_count records, removing $excess oldest"
# Remove oldest processing records
# This would require timestamp information
echo "Would remove $excess oldest processing records"
# Example implementation:
# oldest_records=($(get_oldest_processing_records $excess))
# for proc_id in "${oldest_records[@]}"; do
# tg-stop-library-processing --id "$proc_id"
# done
else
echo "Processing record count within limits: $current_count"
fi
}
# Maintain maximum 100 processing records
maintain_processing_records 100
```
## Error Handling
### Processing ID Not Found
```bash
Exception: Processing ID not found
```
**Solution**: Verify processing ID exists and check spelling.
### Processing Still Active
```bash
Exception: Cannot remove active processing record
```
**Solution**: Wait for processing to complete or verify if processing is actually active.
### Permission Errors
```bash
Exception: Access denied
```
**Solution**: Check user permissions and processing record ownership.
### API Connection Issues
```bash
Exception: Connection refused
```
**Solution**: Check API URL and ensure TrustGraph is running.
## Monitoring and Verification
### Processing Record Status
```bash
# Check processing record status before removal
check_processing_status() {
local proc_id="$1"
echo "Checking status of processing: $proc_id"
# Check if processing is in active flows
if tg-show-flows | grep -q "$proc_id"; then
echo "Status: ACTIVE - Processing is currently running"
return 1
else
echo "Status: INACTIVE - Processing not found in active flows"
return 0
fi
}
# Usage
if check_processing_status "proc_001"; then
echo "Safe to remove processing record"
tg-stop-library-processing --id "proc_001"
else
echo "Processing still active, not removing record"
fi
```
### Cleanup Verification
```bash
# Verify successful removal
verify_removal() {
local proc_id="$1"
echo "Verifying removal of processing record: $proc_id"
# Check if processing record still exists
# This would require an API to query processing records
if tg-show-flows | grep -q "$proc_id"; then
echo "✗ Processing record still exists"
return 1
else
echo "✓ Processing record successfully removed"
return 0
fi
}
# Usage
tg-stop-library-processing --id "proc_001"
verify_removal "proc_001"
```
## Integration with Processing Workflow
### Complete Processing Lifecycle
```bash
# Complete processing lifecycle management
processing_lifecycle() {
local doc_id="$1"
local proc_id="$2"
local collection="$3"
echo "Managing complete processing lifecycle"
echo "Document: $doc_id"
echo "Processing: $proc_id"
echo "Collection: $collection"
# 1. Start processing
echo "1. Starting processing..."
tg-start-library-processing \
-d "$doc_id" \
--id "$proc_id" \
--collection "$collection"
# 2. Monitor processing
echo "2. Monitoring processing..."
timeout=300
elapsed=0
while [ $elapsed -lt $timeout ]; do
if tg-invoke-document-rag -q "test" -C "$collection" 2>/dev/null | grep -q "$doc_id"; then
echo "✓ Processing completed"
break
fi
sleep 10
elapsed=$((elapsed + 10))
done
# 3. Verify completion
echo "3. Verifying completion..."
if tg-invoke-document-rag -q "What is this document?" -C "$collection" 2>/dev/null; then
echo "✓ Document is queryable"
# 4. Clean up processing record
echo "4. Cleaning up processing record..."
tg-stop-library-processing --id "$proc_id"
echo "✓ Processing record removed"
else
echo "✗ Processing verification failed"
echo "Keeping processing record for investigation"
fi
}
# Usage
processing_lifecycle "doc_123" "proc_test_001" "test-collection"
```
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-start-library-processing`](tg-start-library-processing.md) - Start document processing
- [`tg-show-library-documents`](tg-show-library-documents.md) - List library documents
- [`tg-show-flows`](tg-show-flows.md) - Monitor active processing flows
- [`tg-invoke-document-rag`](tg-invoke-document-rag.md) - Verify processed documents
## API Integration
This command uses the [Library API](../apis/api-librarian.md) to remove processing records from the document processing system.
## Best Practices
1. **Verify Completion**: Ensure processing is complete before removing records
2. **Check Dependencies**: Verify no other processes depend on the processing record
3. **Gradual Cleanup**: Remove processing records gradually to avoid system impact
4. **Monitor Impact**: Watch for any effects of record removal on system performance
5. **Documentation**: Log processing record removals for audit purposes
6. **Backup**: Consider backing up processing metadata before removal
7. **Testing**: Test cleanup procedures in non-production environments
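Practice 5 (logging removals for audit purposes) can be automated with a thin wrapper. The helper name and log path below are assumptions for illustration, not part of the CLI:

```bash
# logged_stop PROC_ID [LOGFILE]
# Appends an audit entry, then removes the processing record.
# The default log path is an assumption; adjust for your environment.
logged_stop() {
    local proc_id=$1
    local log=${2:-"$HOME/tg-processing-cleanup.log"}
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) removed $proc_id" >> "$log"
    tg-stop-library-processing --id "$proc_id"
}

# Usage
# logged_stop "proc_001"
```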
## Troubleshooting
### Record Won't Remove
```bash
# Check if processing is actually complete
tg-show-flows | grep "processing-id"
# Verify API connectivity
curl -s "$TRUSTGRAPH_URL/api/v1/library/processing" > /dev/null
```
### Unexpected Behavior After Removal
```bash
# Check if document is still accessible
tg-invoke-document-rag -q "test" -C "collection"
# Verify document processing status
tg-show-library-documents | grep "document-id"
```
### Permission Issues
```bash
# Check user permissions
tg-show-library-documents -U "your-user"
# Verify processing record ownership
```

# tg-unload-kg-core
Removes a knowledge core from an active flow without deleting the stored core.
## Synopsis
```bash
tg-unload-kg-core --id CORE_ID [options]
```
## Description
The `tg-unload-kg-core` command removes a previously loaded knowledge core from an active processing flow, making that knowledge unavailable for queries and processing within that specific flow. The knowledge core remains stored in the system and can be loaded again later or into different flows.
This is useful for managing flow memory usage, switching knowledge contexts, or temporarily removing knowledge without permanent deletion.
## Options
### Required Arguments
- `--id, --identifier CORE_ID`: Identifier of the knowledge core to unload
### Optional Arguments
- `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
- `-U, --user USER`: User identifier (default: `trustgraph`)
- `-f, --flow-id FLOW`: Flow ID to unload knowledge from (default: `default`)
## Examples
### Unload from Default Flow
```bash
tg-unload-kg-core --id "research-knowledge"
```
### Unload from Specific Flow
```bash
tg-unload-kg-core \
--id "medical-knowledge" \
--flow-id "medical-analysis" \
-U medical-team
```
### Unload Multiple Cores
```bash
# Unload several knowledge cores from a flow
tg-unload-kg-core --id "core-1" --flow-id "analysis-flow"
tg-unload-kg-core --id "core-2" --flow-id "analysis-flow"
tg-unload-kg-core --id "core-3" --flow-id "analysis-flow"
```
### Using Custom API URL
```bash
tg-unload-kg-core \
--id "production-knowledge" \
--flow-id "prod-flow" \
-u http://production:8088/
```
## Prerequisites
### Knowledge Core Must Be Loaded
The knowledge core must currently be loaded in the specified flow:
```bash
# Check what's loaded by querying the flow
tg-show-graph -f target-flow | head -10
# If no output, core may not be loaded
```
### Flow Must Be Running
The target flow must be active:
```bash
# Check running flows
tg-show-flows
# Verify the target flow exists
tg-show-flows | grep "target-flow"
```
## Unloading Process
1. **Validation**: Verifies knowledge core is loaded in the specified flow
2. **Query Termination**: Stops any ongoing queries using the knowledge
3. **Index Cleanup**: Removes knowledge indexes from flow context
4. **Memory Release**: Frees memory allocated to the knowledge core
5. **Service Update**: Updates flow services to reflect knowledge unavailability
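Step 1 can also be approximated client-side before issuing the unload. The sketch below is hedged: it uses the triple count from `tg-show-graph` as a loose proxy for "something is loaded", which cannot distinguish between individual cores, and `safe_unload` is a hypothetical helper name:

```bash
# safe_unload CORE_ID FLOW_ID
# Skips the unload if the flow appears to have no knowledge loaded.
safe_unload() {
    local core_id=$1 flow_id=$2
    local triples
    triples=$(tg-show-graph -f "$flow_id" 2>/dev/null | wc -l)
    if [ "$triples" -eq 0 ]; then
        echo "No knowledge loaded in $flow_id; skipping unload"
        return 0
    fi
    tg-unload-kg-core --id "$core_id" --flow-id "$flow_id"
}

# Usage
# safe_unload "research-knowledge" "analysis-flow"
```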
## Effects of Unloading
### Knowledge Becomes Unavailable
After unloading, the knowledge is no longer accessible through the flow:
```bash
# Before unloading - knowledge available
tg-invoke-graph-rag -q "What knowledge is loaded?" -f my-flow
# Unload the knowledge
tg-unload-kg-core --id "my-knowledge" --flow-id "my-flow"
# After unloading - reduced knowledge available
tg-invoke-graph-rag -q "What knowledge is loaded?" -f my-flow
```
### Memory Recovery
- RAM used by knowledge indexes is freed
- Flow performance may improve
- Other knowledge cores in the flow remain unaffected
### Core Preservation
- Knowledge core remains stored in the system
- Can be reloaded later
- Available for loading into other flows
## Output
Successful unloading typically produces no output:
```bash
# Unload core (no output expected)
tg-unload-kg-core --id "test-core" --flow-id "test-flow"
# Verify unloading by checking available knowledge
tg-show-graph -f test-flow | wc -l
# Should show fewer triples if core was successfully unloaded
```
## Error Handling
### Knowledge Core Not Loaded
```bash
Exception: Knowledge core 'my-core' not loaded in flow 'my-flow'
```
**Solution**: Verify the core is actually loaded using `tg-show-graph` or load it first with `tg-load-kg-core`.
### Flow Not Found
```bash
Exception: Flow 'invalid-flow' not found
```
**Solution**: Check running flows with `tg-show-flows` and verify the flow ID.
### Permission Errors
```bash
Exception: Access denied to unload knowledge core
```
**Solution**: Verify user permissions for the knowledge core and flow.
### Connection Errors
```bash
Exception: Connection refused
```
**Solution**: Check the API URL and ensure TrustGraph is running.
## Verification
### Check Knowledge Reduction
```bash
# Count triples before unloading
before=$(tg-show-graph -f my-flow | wc -l)
# Unload knowledge
tg-unload-kg-core --id "my-core" --flow-id "my-flow"
# Count triples after unloading
after=$(tg-show-graph -f my-flow | wc -l)
echo "Triples before: $before, after: $after"
```
### Test Query Impact
```bash
# Test queries before and after unloading
tg-invoke-graph-rag -q "test query" -f my-flow
# Should work with loaded knowledge
tg-unload-kg-core --id "relevant-core" --flow-id "my-flow"
tg-invoke-graph-rag -q "test query" -f my-flow
# May return different results or "no relevant knowledge found"
```
## Use Cases
### Memory Management
```bash
# Free up memory by unloading unused knowledge
tg-unload-kg-core --id "large-historical-data" --flow-id "analysis-flow"
# Load more relevant knowledge
tg-load-kg-core --id "current-data" --flow-id "analysis-flow"
```
### Context Switching
```bash
# Switch from medical to legal knowledge context
tg-unload-kg-core --id "medical-knowledge" --flow-id "analysis-flow"
tg-load-kg-core --id "legal-knowledge" --flow-id "analysis-flow"
```
### Selective Knowledge Loading
```bash
# Load only specific knowledge for focused analysis
tg-unload-kg-core --id "general-knowledge" --flow-id "specialized-flow"
tg-load-kg-core --id "domain-specific" --flow-id "specialized-flow"
```
### Testing and Development
```bash
# Test flow behavior with different knowledge sets
tg-unload-kg-core --id "production-data" --flow-id "test-flow"
tg-load-kg-core --id "test-data" --flow-id "test-flow"
# Run tests
./run-knowledge-tests.sh
# Restore production knowledge
tg-unload-kg-core --id "test-data" --flow-id "test-flow"
tg-load-kg-core --id "production-data" --flow-id "test-flow"
```
### Flow Maintenance
```bash
# Prepare flow for maintenance by unloading all knowledge
cores=$(tg-show-kg-cores)
for core in $cores; do
tg-unload-kg-core --id "$core" --flow-id "maintenance-flow" 2>/dev/null || true
done
# Perform maintenance
./flow-maintenance.sh
# Reload required knowledge
tg-load-kg-core --id "essential-core" --flow-id "maintenance-flow"
```
## Knowledge Management Workflow
### Dynamic Knowledge Loading
```bash
# Function to switch knowledge contexts
switch_knowledge_context() {
local flow_id=$1
local old_core=$2
local new_core=$3
echo "Switching from $old_core to $new_core in $flow_id"
# Unload old knowledge
tg-unload-kg-core --id "$old_core" --flow-id "$flow_id"
# Load new knowledge
tg-load-kg-core --id "$new_core" --flow-id "$flow_id"
echo "Context switch completed"
}
# Usage
switch_knowledge_context "analysis-flow" "old-data" "new-data"
```
### Bulk Knowledge Management
```bash
# Unload all knowledge from a flow
unload_all_knowledge() {
local flow_id=$1
# Get list of potentially loaded cores
tg-show-kg-cores | while read core; do
echo "Attempting to unload $core from $flow_id"
tg-unload-kg-core --id "$core" --flow-id "$flow_id" 2>/dev/null || true
done
echo "All knowledge unloaded from $flow_id"
}
# Usage
unload_all_knowledge "cleanup-flow"
```
## Environment Variables
- `TRUSTGRAPH_URL`: Default API URL
## Related Commands
- [`tg-load-kg-core`](tg-load-kg-core.md) - Load knowledge core into flow
- [`tg-show-kg-cores`](tg-show-kg-cores.md) - List available knowledge cores
- [`tg-show-graph`](tg-show-graph.md) - View currently loaded knowledge
- [`tg-show-flows`](tg-show-flows.md) - List active flows
## API Integration
This command uses the [Knowledge API](../apis/api-knowledge.md) with the `unload-kg-core` operation to remove knowledge from active flows.
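As a sketch of what such a request might carry, the helper below assembles a JSON payload from the command's arguments. The field names are assumptions made for illustration only; consult the Knowledge API documentation for the actual wire format:

```bash
# build_unload_request CORE_ID [FLOW_ID] [USER]
# Emits a JSON payload for an unload-kg-core request.
# Field names are illustrative assumptions, not the documented format.
build_unload_request() {
    local core_id=$1 flow_id=${2:-default} user=${3:-trustgraph}
    printf '{"operation": "unload-kg-core", "user": "%s", "flow": "%s", "id": "%s"}\n' \
        "$user" "$flow_id" "$core_id"
}

build_unload_request "research-knowledge" "analysis-flow"
```

The defaults mirror the command's own defaults (`default` flow, `trustgraph` user).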
## Best Practices
1. **Memory Monitoring**: Monitor flow memory usage when loading/unloading knowledge
2. **Graceful Unloading**: Ensure no critical queries are running before unloading
3. **Documentation**: Document which knowledge cores are needed for each flow
4. **Testing**: Test flow behavior after unloading knowledge
5. **Backup Strategy**: Keep knowledge cores stored even when not loaded
6. **Performance Optimization**: Unload unused knowledge to improve performance
## Troubleshooting
### Knowledge Still Appears in Queries
```bash
# If knowledge still appears after unloading
# Check if multiple cores contain similar data
tg-show-graph -f my-flow | grep "expected-removed-entity"
# Verify all relevant cores were unloaded
```
### Memory Not Released
```bash
# If memory usage doesn't decrease after unloading
# Check system memory usage
free -h
# Contact system administrator if memory leak suspected
```
### Query Performance Issues
```bash
# If queries become slow after unloading
# May need to reload essential knowledge
tg-load-kg-core --id "essential-core" --flow-id "slow-flow"
# Or restart the flow
tg-stop-flow -i "slow-flow"
tg-start-flow -n "flow-class" -i "slow-flow" -d "Restarted flow"
```