# TrustGraph
![TrustGraph banner](TG_Banner_readme.png)

🚀 [Full Documentation](https://trustgraph.ai/docs/getstarted)
💬 [Join the Discord](https://discord.gg/AXpxVjwzAw)
📖 [Read the Blog](https://blog.trustgraph.ai)
📺 [YouTube](https://www.youtube.com/@TrustGraph)
## Introduction
TrustGraph deploys a full end-to-end (E2E) AI solution with native GraphRAG in minutes. Autonomous Knowledge Agents build ultra-dense knowledge graphs that capture the context of your source data. TrustGraph is designed for maximum flexibility and modularity, whether it's calling Cloud LLMs or deploying SLMs On-Device. TrustGraph ingests data to build an RDF-style knowledge graph that enables accurate and private `RAG` responses using only the knowledge you want, when you want.

The pipeline processing components are interconnected with a pub/sub engine to maximize modularity for agent integration. The core processing components decode documents, chunk text, create mapped embeddings, generate an RDF knowledge graph, and generate AI predictions from either a Cloud LLM or an On-Device SLM.

The processing showcases the reliability and efficiencies of GraphRAG algorithms, which can capture contextual language flags that are missed in conventional RAG approaches. Graph querying algorithms retrieve not just relevant knowledge but also the language cues essential to understanding semantic uses unique to a text corpus.
## Deploy in Minutes
TrustGraph is designed to deploy all the services and stores needed for a scalable GraphRAG infrastructure as quickly and simply as possible.
### Install Requirements
```
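# Create and activate a virtual environment
python3 -m venv env
. env/bin/activate
# Install the Pulsar and Cassandra client libraries
pip3 install pulsar-client
pip3 install cassandra-driver
# Make modules in the current directory importable (Python reads PYTHONPATH)
export PYTHONPATH=.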
```
### Download TrustGraph
```
git clone https://github.com/trustgraph-ai/trustgraph trustgraph
cd trustgraph
```
TrustGraph is fully containerized and is launched with a Docker Compose `YAML` file. These files are prebuilt and included in the top-level directory of the download. Select the file that matches your desired model deployment and graph store configuration.

| Model Deployment | Graph Store | Launch File |
| ---------------- | ------------ | ----------- |
| AWS Bedrock | Cassandra | `tg-launch-bedrock-cassandra.yaml` |
| AWS Bedrock | Neo4j | `tg-launch-bedrock-neo4j.yaml` |
| AzureAI Serverless Endpoint | Cassandra | `tg-launch-azure-cassandra.yaml` |
| AzureAI Serverless Endpoint | Neo4j | `tg-launch-azure-neo4j.yaml` |
| Anthropic API | Cassandra | `tg-launch-claude-cassandra.yaml` |
| Anthropic API | Neo4j | `tg-launch-claude-neo4j.yaml` |
| Cohere API | Cassandra | `tg-launch-cohere-cassandra.yaml` |
| Cohere API | Neo4j | `tg-launch-cohere-neo4j.yaml` |
| Llamafile | Cassandra | `tg-launch-llamafile-cassandra.yaml` |
| Llamafile | Neo4j | `tg-launch-llamafile-neo4j.yaml` |
| Mixed Deployment | Cassandra | `tg-launch-mix-cassandra.yaml` |
| Mixed Deployment | Neo4j | `tg-launch-mix-neo4j.yaml` |
| Ollama | Cassandra | `tg-launch-ollama-cassandra.yaml` |
| Ollama | Neo4j | `tg-launch-ollama-neo4j.yaml` |
| OpenAI | Cassandra | `tg-launch-openai-cassandra.yaml` |
| OpenAI | Neo4j | `tg-launch-openai-neo4j.yaml` |
| VertexAI | Cassandra | `tg-launch-vertexai-cassandra.yaml` |
| VertexAI | Neo4j | `tg-launch-vertexai-neo4j.yaml` |

Launching TrustGraph is as simple as running one line:
```
docker compose -f <launch-file> up -d
```
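The launch file names in the table follow a regular `tg-launch-<model>-<store>.yaml` pattern, so picking one can be scripted. The following is a minimal sketch, not part of TrustGraph: the helper names `launch_file` and `compose_command` and the two lookup dictionaries are hypothetical, and the code only builds the command rather than running it.

```python
# Illustrative helper (not part of TrustGraph): maps the table rows
# to launch file names and builds the docker compose command.
MODEL_KEYS = {
    "AWS Bedrock": "bedrock",
    "AzureAI Serverless Endpoint": "azure",
    "Anthropic API": "claude",
    "Cohere API": "cohere",
    "Llamafile": "llamafile",
    "Mixed Deployment": "mix",
    "Ollama": "ollama",
    "OpenAI": "openai",
    "VertexAI": "vertexai",
}
GRAPH_STORES = {"Cassandra": "cassandra", "Neo4j": "neo4j"}


def launch_file(model: str, store: str) -> str:
    """Return the launch file name for a model deployment and graph store."""
    try:
        return f"tg-launch-{MODEL_KEYS[model]}-{GRAPH_STORES[store]}.yaml"
    except KeyError as exc:
        raise ValueError(f"unsupported option: {exc.args[0]!r}") from None


def compose_command(model: str, store: str) -> list[str]:
    """Build (but do not run) the docker compose launch command."""
    return ["docker", "compose", "-f", launch_file(model, store), "up", "-d"]


print(" ".join(compose_command("Ollama", "Neo4j")))
```

To actually launch, the returned list could be passed to `subprocess.run(..., check=True)` from the directory that contains the launch files.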
## Core TrustGraph Features
- PDF decoding
- Text chunking
- On-Device SLM inference with [Ollama](https://ollama.com) or [Llamafile](https://github.com/Mozilla-Ocho/llamafile)
- Cloud LLM inference: `AWS Bedrock`, `AzureAI`, `Anthropic`, `Cohere`, `OpenAI`, and `VertexAI`
- Chunk-mapped vector embeddings with [HuggingFace](https://hf.co) models
- [RDF](https://www.w3.org/TR/rdf12-schema/) Knowledge Extraction Agents
- [Apache Cassandra](https://github.com/apache/cassandra) or [Neo4j](https://neo4j.com/) as the graph store
- [Qdrant](https://qdrant.tech/) as the VectorDB
- Build and load [Knowledge Cores](https://trustgraph.ai/docs/category/knowledge-cores)
- GraphRAG query service
- [Grafana](https://github.com/grafana/) telemetry dashboard
- Module integration with [Apache Pulsar](https://github.com/apache/pulsar/)
- Container orchestration with `Docker` or [Podman](http://podman.io/)
## Architecture
![architecture](architecture_0.8.0.png)

TrustGraph is designed to be modular to support as many Language Models and environments as possible. A natural fit for a modular architecture is to decompose functions into a set of modules connected through a pub/sub backbone. [Apache Pulsar](https://github.com/apache/pulsar/) serves as this pub/sub backbone. Pulsar acts as the data broker, managing the data processing queues connected to the processing modules.
### Pulsar Workflows
- For processing flows, Pulsar accepts the output of a processing module and queues it for input to the next subscribed module.
- For services such as LLMs and embeddings, Pulsar provides a client/server model. A Pulsar queue serves as the input to the service. Once a request is processed, the output is delivered to a separate queue, where the client subscribes to receive it.
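
The two workflow patterns can be sketched without a broker. The example below is only an in-memory illustration using Python's standard `queue` module as a stand-in for Pulsar queues. It does not use the Pulsar client API, and every queue and function name in it is hypothetical.

```python
import queue
import uuid

# In-memory stand-ins for Pulsar queues; a real deployment would use a
# Pulsar broker. All queue and function names here are hypothetical.
chunk_queue = queue.Queue()      # output of a hypothetical chunker module
request_queue = queue.Queue()    # input queue of a hypothetical service
response_queue = queue.Queue()   # separate queue carrying service output


def chunker(text, size=20):
    """Processing flow, producer side: publish fixed-size chunks."""
    for start in range(0, len(text), size):
        chunk_queue.put(text[start:start + size])


def downstream_module():
    """Processing flow, consumer side: drain whatever the chunker queued."""
    received = []
    while not chunk_queue.empty():
        received.append(chunk_queue.get())
    return received


def service_step():
    """Service side: take one request, publish the result under the same id."""
    request_id, payload = request_queue.get()
    response_queue.put((request_id, payload.upper()))  # placeholder for real work


def client_call(payload):
    """Client side: send a request, then read the matching response."""
    request_id = str(uuid.uuid4())
    request_queue.put((request_id, payload))
    service_step()  # in a real system the service runs in its own process
    response_id, result = response_queue.get()
    if response_id != request_id:
        raise RuntimeError("response does not match request")
    return result


chunker("Pulsar queues connect the processing modules.")
print(downstream_module())
print(client_call("embed this text"))
```

The request id is what lets a client pair a response on the output queue with the request it sent on the input queue.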
## Knowledge Agents
TrustGraph extracts knowledge from a text corpus (PDF or text) into an ultra-dense knowledge graph using three autonomous knowledge agents. Each agent focuses on one of the elements needed to build the RDF knowledge graph. The agents are:

- Topic Extraction Agent
- Entity Extraction Agent
- Node Connection Agent

The agent prompts are built from templates, enabling customized extraction agents for a specific use case. The extraction agents are launched automatically with either of the following commands, pointing to the path of a desired text corpus or the included sample files:

PDF file:
```
scripts/load-pdf -f sample-text-corpus.pdf
```
Text file:
```
scripts/load-text -f sample-text-corpus.txt
```
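To illustrate what template-built agent prompts might look like, here is a minimal sketch using Python's standard `string.Template`. The prompt wording, the `$chunk` placeholder, and the names `AGENT_TEMPLATES` and `build_prompts` are illustrative assumptions, not TrustGraph's actual prompts or API.

```python
from string import Template

# Hypothetical prompt templates: the wording and the $chunk placeholder are
# illustrative assumptions, not TrustGraph's actual prompts.
AGENT_TEMPLATES = {
    "topics": Template("List the main topics discussed in this text:\n$chunk"),
    "entities": Template("List the named entities mentioned in this text:\n$chunk"),
    "connections": Template(
        "List relationships between entities in this text as "
        "(subject, predicate, object) triples:\n$chunk"
    ),
}


def build_prompts(chunk: str) -> dict[str, str]:
    """Fill every agent template with the same text chunk."""
    return {name: tpl.substitute(chunk=chunk) for name, tpl in AGENT_TEMPLATES.items()}


for name, prompt in build_prompts("Pulsar connects the modules.").items():
    print(f"--- {name} ---\n{prompt}")
```

Customizing an agent for a specific use case would then amount to swapping the template text while the rest of the pipeline stays unchanged.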
## GraphRAG Queries
Once the knowledge graph has been built or a knowledge core has been loaded, GraphRAG queries are launched with a single line:
```
scripts/query-graph-rag -q "Write a blog post about the 5 key takeaways from SB1047 and how they will impact AI development."
```
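When loading and querying are driven from a script, the command lines can be assembled in Python. This is a sketch under assumptions: the script paths and the `-f` and `-q` flags come from the commands shown in this README, while the helper names and the choice of loader by file extension are hypothetical. The code only builds and prints the commands.

```python
import shlex

# Script paths and the -f / -q flags are taken from the commands in this
# README; the helper names and extension check are hypothetical.


def load_command(path: str) -> list[str]:
    """Pick the loader script from the file extension."""
    if path.lower().endswith(".pdf"):
        script = "scripts/load-pdf"
    elif path.lower().endswith(".txt"):
        script = "scripts/load-text"
    else:
        raise ValueError(f"expected a .pdf or .txt file, got {path!r}")
    return [script, "-f", path]


def query_command(question: str) -> list[str]:
    """Build the GraphRAG query command for a question."""
    return ["scripts/query-graph-rag", "-q", question]


commands = [
    load_command("sample-text-corpus.pdf"),
    query_command("What are the key takeaways from SB1047?"),
]
for argv in commands:
    # shlex.join quotes arguments so the line can be pasted into a shell
    print(shlex.join(argv))
```

Each list could be handed to `subprocess.run(argv, check=True)` from the repository root once the TrustGraph services are up.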
## Deploy and Manage TrustGraph
[🚀 Full Deployment Guide 🚀](https://trustgraph.ai/docs/getstarted)
## TrustGraph Developer's Guide
[Developing for TrustGraph](docs/README.development.md)