Documentation on starting the system

Commit 603ad4e38f (parent ebcd5fe902) by cybermaggedon, 2024-07-12 14:40:45 +01:00. Changes `README.md`.

processed, the output is delivered to a separate queue so that the caller
can collect the data.
All the code is bundled into a single Python package which provides all the
functionality. There is also a container image with the package installed,
which can be used to run everything.
## Included modules
- `chunker-recursive` - Accepts text documents and uses LangChain recurse
Using Docker Compose you should be able to...
- Run a query which uses the vector and graph stores to produce a prompt
which is answered using an LLM.
If you get a Graph RAG response to the query, everything is working.
### Clone the GitHub repo
```
git clone https://github.com/trustgraph-ai/trustgraph trustgraph
cd trustgraph
```
### Docker compose files
There are four Docker Compose files to choose from, depending on the LLM you
wish to use:
- `docker-compose-azure.yaml`. This is for a serverless AI endpoint
hosted on Azure. Set `AZURE_TOKEN` to the secret token and
`AZURE_ENDPOINT` to the endpoint address.
- `docker-compose-claude.yaml`. This is for using the Anthropic Claude LLM.
  Set `CLAUDE_KEY` to the API key.
- `docker-compose-ollama.yaml`. This is for a local LLM (gemma2) hosted
  using Ollama. Set `OLLAMA_HOST` to the host running Ollama (e.g.
  `localhost` to talk to a locally hosted Ollama).
- `docker-compose-vertexai.yaml`. This is for using Google Cloud VertexAI.
  You need a `private.json` authentication file for your Google Cloud
  account, placed at `vertexai/private.json`.
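Since each compose file needs different settings, a quick check before starting can save a confusing failure later. The shell function below is a hypothetical helper, not something shipped in the repository: `compose_file_for` checks the variables or key file listed above for one backend and prints the matching compose file name, failing with a message if something is missing.

```shell
# Hypothetical helper, not part of the TrustGraph repository.
# Checks the settings each compose file needs (per the list above) and
# prints the compose file name, or an error on stderr.
compose_file_for() {
    case "$1" in
        azure)
            if [ -z "${AZURE_TOKEN:-}" ] || [ -z "${AZURE_ENDPOINT:-}" ]; then
                echo "Set AZURE_TOKEN and AZURE_ENDPOINT first" >&2
                return 1
            fi ;;
        claude)
            if [ -z "${CLAUDE_KEY:-}" ]; then
                echo "Set CLAUDE_KEY first" >&2
                return 1
            fi ;;
        ollama)
            if [ -z "${OLLAMA_HOST:-}" ]; then
                echo "Set OLLAMA_HOST first" >&2
                return 1
            fi ;;
        vertexai)
            if [ ! -f vertexai/private.json ]; then
                echo "Missing vertexai/private.json" >&2
                return 1
            fi ;;
        *)
            echo "Unknown backend '$1' (expected azure, claude, ollama or vertexai)" >&2
            return 1 ;;
    esac
    echo "docker-compose-$1.yaml"
}
```

For example, `f=$(compose_file_for claude) && docker-compose -f "$f" up -d` only starts the stack once `CLAUDE_KEY` is set.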
#### docker-compose-azure.yaml
```
export AZURE_ENDPOINT=https://ENDPOINT.HOST.GOES.HERE/
export AZURE_TOKEN=TOKEN-GOES-HERE
docker-compose -f docker-compose-azure.yaml up -d
```
#### docker-compose-claude.yaml
```
export CLAUDE_KEY=TOKEN-GOES-HERE
docker-compose -f docker-compose-claude.yaml up -d
```
#### docker-compose-ollama.yaml
```
export OLLAMA_HOST=localhost # Set to hostname of Ollama host
docker-compose -f docker-compose-ollama.yaml up -d
```
#### docker-compose-vertexai.yaml
```
mkdir -p vertexai
cp PATH-TO-YOUR-KEY-FILE.json vertexai/private.json
docker-compose -f docker-compose-vertexai.yaml up -d
```
On Linux with SELinux enabled, you may need to change the SELinux context of
the `vertexai` directory so that the key file can be mounted into a Docker
container...
```
chcon -Rt svirt_sandbox_file_t vertexai/
```
### Check things are running
Check that you have a set of containers running...
```
docker ps
```
You might also want to list stopped containers, to see whether any have
exited unexpectedly; look at the STATUS field.
```
docker ps -a
```
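Docker can filter stopped containers itself (`docker ps -a --filter status=exited`). If you prefer to post-process the listing, here is a small sketch; the `exited_containers` function is hypothetical, and it assumes the `NAME|STATUS` format produced by the `--format` template shown in the comment.

```shell
# Hypothetical helper. Reads "NAME|STATUS" lines, e.g. from
#   docker ps -a --format '{{.Names}}|{{.Status}}'
# and prints the names of containers whose status starts with "Exited".
exited_containers() {
    awk -F'|' '$2 ~ /^Exited/ { print $1 }'
}
```

`docker ps -a --format '{{.Names}}|{{.Status}}' | exited_containers` prints just the names, which you can then pass to `docker logs`.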
### Wait
Before proceeding, you should leave enough time for the system to
settle into a working state. On my MacBook, Pulsar takes about 30 seconds
to start, and nothing works until it has.
The system uses Cassandra for its graph store, which takes around 60-70
seconds to reach a working state. For your first go, I would advise letting
everything settle for a couple of minutes before doing anything else, so
that if there are errors you know it's not just the system starting up.
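Rather than guessing at the delay, you can poll until something responds. A minimal sketch of a generic retry helper (`wait_until` is a hypothetical name) that reruns a command until it succeeds or a timeout, in seconds, runs out:

```shell
# Hypothetical helper: wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
# Reruns COMMAND every INTERVAL seconds until it succeeds (returns 0)
# or roughly TIMEOUT seconds have passed (returns 1). Only the sleep
# time is counted, so slow commands stretch the real timeout.
wait_until() {
    timeout=$1
    interval=$2
    shift 2
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "Timed out after ${timeout}s waiting for: $*" >&2
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}
```

As an example, and assuming the compose file publishes Pulsar's admin port 8080 on `localhost` (check the port mappings before relying on this), `wait_until 180 5 curl -sf http://localhost:8080/admin/v2/brokers/health` waits for the broker's health check to pass, or gives up after about three minutes.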
### Install requirements
```
python3 -m venv env
. env/bin/activate
pip3 install pulsar-client
pip3 install cassandra-driver
```
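To confirm the packages installed into the virtual environment, you can try importing them. The `pulsar-client` package is imported as `pulsar` and `cassandra-driver` as `cassandra`. A small sketch with a hypothetical helper name:

```shell
# Hypothetical helper: tries to import each named Python module with
# python3 and reports which are missing. Returns 1 if any import fails.
check_py_modules() {
    missing=0
    for mod in "$@"; do
        if python3 -c "import $mod" >/dev/null 2>&1; then
            echo "ok: $mod"
        else
            echo "MISSING: $mod" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Run `check_py_modules pulsar cassandra` with the virtual environment active; it exits non-zero if either import fails.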
### Load some data
Create a sources directory and get a test file...
```
mkdir sources
curl -o sources/Challenger-Report-Vol1.pdf https://sma.nasa.gov/SignificantIncidents/assets/rogers_commission_report.pdf
```
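If the download fails or is redirected, `curl` as written can save an HTML page under the `.pdf` name; adding `-f` (fail on HTTP errors) and `-L` (follow redirects) helps. A quick sanity check on the saved file, assuming only that real PDFs start with the `%PDF` magic bytes (`is_pdf` is a hypothetical helper):

```shell
# Hypothetical helper: succeeds if the file exists and its first four
# bytes are the PDF magic number "%PDF".
is_pdf() {
    [ -f "$1" ] && [ "$(head -c 4 "$1")" = "%PDF" ]
}
```

For example, `is_pdf sources/Challenger-Report-Vol1.pdf && echo ok` before running the loader.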
Then load the file...
```
scripts/loader
```
You will get some output on the screen. If none of it looks like an error
(lines with the ERROR tag), you should be good.
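To avoid scanning the output by eye, you could save it and count lines carrying the ERROR tag. A minimal sketch (`count_errors` and `loader.log` are hypothetical names):

```shell
# Hypothetical helper: counts lines containing "ERROR" on standard input.
# grep -c still prints 0 when nothing matches but exits with status 1,
# so "|| true" keeps a clean run from looking like a failure.
count_errors() {
    grep -c 'ERROR' || true
}
```

Run `scripts/loader 2>&1 | tee loader.log`, then `count_errors < loader.log` should print `0`.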
### Check logs
Look at the PDF decoder...
```
docker logs trustgraph_pdf-decoder_1
```
which should contain some text like...
```
Decoding 1f7b7055...
Done.
```
Look at the chunker output...
```
docker logs trustgraph_chunker_1
```
You will see similar output, except with many entries instead of one.
Look at the vectorizer output...
```
docker logs trustgraph_vectorize_1
```
You will see similar output, except with many entries instead of one.
Look at the LLM output...
```
docker logs trustgraph_llm_1
```
You will see output like this...
```
Handling prompt fa1b98ae-70ef-452b-bcbe-21a867c5e8e2...
Send response...
Done.
```
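For the processors that log one start line per item followed by `Done.`, comparing the two counts gives a rough idea of progress. This is a heuristic sketch based only on the sample log lines shown here; the `log_progress` helper is hypothetical and the prefixes are taken from those samples:

```shell
# Hypothetical helper: log_progress PREFIX < logfile
# Counts lines starting with PREFIX (work started) and lines that are
# exactly "Done." (work finished), based on the sample logs above.
log_progress() {
    awk -v start="$1" '
        index($0, start) == 1 { started++ }
        $0 == "Done."         { finished++ }
        END { printf "started=%d finished=%d\n", started, finished }
    '
}
```

For example, `docker logs trustgraph_llm_1 2>&1 | log_progress 'Handling prompt'`, or `log_progress Decoding` for the PDF decoder and `log_progress Indexing` for the two knowledge-graph extractors.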
Two more log outputs to look at...
```
docker logs trustgraph_kg-extract-definitions_1
docker logs trustgraph_kg-extract-relationships_1
```
You should see definitions output similar to this...
```
Indexing 1f7b7055-p11-c1...
[
    {
        "entity": "Orbiter",
        "definition": "A spacecraft designed for spaceflight."
    },
    {
        "entity": "flight deck",
        "definition": "The top level of the crew compartment, typically where flight controls are located."
    },
    {
        "entity": "middeck",
        "definition": "The lower level of the crew compartment, used for sleeping, working, and storing equipment."
    }
]
Done.
```
and relationships output like this...
```
Indexing 1f7b7055-p11-c3...
[
    {
        "subject": "Space Shuttle",
        "predicate": "carry",
        "object": "16 tons of cargo",
        "object-entity": false
    },
    {
        "subject": "friction",
        "predicate": "generated by",
        "object": "atmosphere",
        "object-entity": true
    }
]
Done.
```
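Checking each container in turn gets repetitive. A sketch of a loop that prints the tail of several logs at once; `show_recent_logs` is hypothetical, and it assumes the `trustgraph_<service>_1` container naming used in the commands above, which depends on your Docker Compose version (Compose v2 joins names with hyphens instead).

```shell
# Hypothetical helper: show_recent_logs LINES SERVICE...
# Prints the last LINES lines of each service's container log, assuming
# containers are named trustgraph_<service>_1 as in the commands above.
# DOCKER can be overridden (e.g. for testing); it defaults to the docker CLI.
DOCKER=${DOCKER:-docker}

show_recent_logs() {
    lines=$1
    shift
    for service in "$@"; do
        echo "=== trustgraph_${service}_1 ==="
        $DOCKER logs --tail "$lines" "trustgraph_${service}_1" 2>&1
    done
}
```

For example: `show_recent_logs 10 pdf-decoder chunker vectorize llm kg-extract-definitions kg-extract-relationships`.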
### Check graph is loading
```
scripts/graph-show
```
You should see many lines of output like this...
```
http://trustgraph.ai/e/enterprise http://trustgraph.ai/e/was-carried to altitude and released for a gliding approach and landing at the Mojave Desert test center
http://trustgraph.ai/e/enterprise http://www.w3.org/2000/01/rdf-schema#label Enterprise
http://trustgraph.ai/e/enterprise http://www.w3.org/2004/02/skos/core#definition A prototype space shuttle orbiter used for atmospheric flight testing.
```
Any output at all is a good sign: it indicates the graph is loading.
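Beyond "any output at all", a rough measure of progress is how many distinct entities appear as subjects. A sketch, assuming each line of `scripts/graph-show` output starts with the subject URI followed by whitespace, as in the sample above (`count_subjects` is a hypothetical helper):

```shell
# Hypothetical helper: counts distinct subjects, taken as the first
# whitespace-separated field of each non-empty line on standard input.
count_subjects() {
    awk 'NF && !seen[$1]++ { n++ } END { print n + 0 }'
}
```

Usage: `scripts/graph-show | count_subjects`.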
### Query time
While the graph is loading, you can check how many graph edges have been
loaded so far...
```
scripts/graph-show | wc -l
```
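Rather than rerunning the count by hand, a polling loop can wait for it to pass a threshold. A minimal sketch; `wait_for_edges` is hypothetical, and the usage below picks 300 as an arbitrary stand-in for the "few hundred" edges discussed next.

```shell
# Hypothetical helper: wait_for_edges MIN [INTERVAL]
# Polls the graph edge count (lines of graph-show output) every INTERVAL
# seconds, printing progress, until it reaches MIN. GRAPH_SHOW defaults to
# scripts/graph-show and can be overridden. There is no timeout: interrupt
# with Ctrl-C if the count stalls.
GRAPH_SHOW=${GRAPH_SHOW:-scripts/graph-show}

wait_for_edges() {
    min=$1
    interval=${2:-10}
    while :; do
        count=$($GRAPH_SHOW 2>/dev/null | wc -l | tr -d ' ')
        echo "edges loaded: $count"
        if [ "$count" -ge "$min" ]; then
            return 0
        fi
        sleep "$interval"
    done
}
```

`wait_for_edges 300 15` checks every 15 seconds and returns once at least 300 lines come back.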
You need a few hundred edges loaded for the query to work on this
particular document, because that is roughly the point where the indexer
has passed the mundane introductory parts of the document and reached the
interesting parts. Once enough edges are loaded, run the Graph RAG test
query...
```
tests/test-graph-rag
```
You should give the command at least a minute to run before being
concerned. The output should look like this...
```
Here are 20 facts from the provided knowledge graph about the Space Shuttle disaster:
1. **Space Shuttle Challenger was a Space Shuttle spacecraft.**
2. **The third Spacelab mission was carried by Orbiter Challenger.**
3. **Francis R. Scobee was the Commander of the Challenger crew.**
4. **Earth-to-orbit systems are designed to transport payloads and humans from Earth's surface into orbit.**
5. **The Space Shuttle program involved the Space Shuttle.**
6. **Orbiter Challenger flew on mission 41-B.**
7. **Orbiter Challenger was used on STS-7 and STS-8 missions.**
8. **Columbia completed the orbital test.**
9. **The Space Shuttle flew 24 successful missions.**
10. **One possibility for the Space Shuttle was a winged but unmanned recoverable liquid-fuel vehicle based on the Saturn 5 rocket.**
11. **A Commission was established to investigate the space shuttle Challenger accident.**
12. **Judit h Arlene Resnik was Mission Specialist Two.**
13. **Mission 51-L was originally scheduled for December 1985 but was delayed until January 1986.**
14. **The Corporation's Space Transportation Systems Division was responsible for the design and development of the Space Shuttle Orbiter.**
15. **Michael John Smith was the Pilot of the Challenger crew.**
16. **The Space Shuttle is composed of two recoverable Solid Rocket Boosters.**
17. **The Space Shuttle provides for the broadest possible spectrum of civil/military missions.**
18. **Mission 51-L consisted of placing one satellite in orbit, deploying and retrieving Spartan, and conducting six experiments.**
19. **The Space Shuttle became the focus of NASA's near-term future.**
20. **The Commission focused its attention on safety aspects of future flights.**
```
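If you save the answer, a quick check is whether it came back as a numbered list like the one above. The exact wording and number of items will vary between runs, since the answer is generated by the LLM. A sketch with hypothetical names (`count_numbered_items`, `rag-answer.txt`):

```shell
# Hypothetical helper: counts lines that start with a number, a dot and a
# space ("1. ", "20. "), i.e. numbered list items, on standard input.
# "|| true" because grep -c exits with status 1 when the count is 0.
count_numbered_items() {
    grep -cE '^[0-9]+\. ' || true
}
```

Pipe the test command's output through `tee rag-answer.txt`, then run `count_numbered_items < rag-answer.txt`.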
If it looks like something isn't working, try following the graph-rag
logs:
```
docker logs -f trustgraph_graph-rag_1
```
If you get an answer to your query, Graph RAG is working!
If you want to try different queries, try modifying the script you ran,
`tests/test-graph-rag`.