diff --git a/README.md b/README.md
index eb28e84f..8fc8f3bb 100644
--- a/README.md
+++ b/README.md
@@ -50,6 +50,10 @@ Pulsar provides two types of connectivity:
 processed, the output is delivered to a separate queue so that the caller
 can collect the data.
 
+All the code is bundled into a single Python package which provides all
+of the functionality. There is also a container image with the package
+installed, which can be used to run everything.
+
 ## Included modules
 
 - `chunker-recursive` - Accepts text documents and uses LangChain recurse
@@ -108,9 +112,285 @@ Using the Docker Compose you should be able to...
 - Run a query which uses the vector and graph stores to produce a prompt
   which is answered using an LLM.
 
-If you get a Graph RAG response to the query, everything is working
+If you get a Graph RAG response to the query, everything is working.
 
-### Docker compose
+### Clone the GitHub repo
 
-TBD
+```
+git clone https://github.com/trustgraph-ai/trustgraph trustgraph
+cd trustgraph
+```
+
+### Docker Compose files
+
+There are four Docker Compose files to choose from, depending on the LLM
+you wish to use:
+
+- `docker-compose-azure.yaml`. This is for a serverless AI endpoint
+  hosted on Azure. Set `AZURE_TOKEN` to the secret token and
+  `AZURE_ENDPOINT` to the endpoint address.
+- `docker-compose-claude.yaml`. This is for using the Anthropic Claude LLM.
+  Set `CLAUDE_KEY` to the API key.
+- `docker-compose-ollama.yaml`. This is for a local LLM (gemma2) hosted
+  using Ollama. Set `OLLAMA_HOST` to the host running Ollama (e.g.
+  `localhost` to talk to a locally hosted Ollama).
+- `docker-compose-vertexai.yaml`. This is for using Google Cloud VertexAI.
+  You need a `private.json` authentication file for your Google Cloud
+  account, placed at `vertexai/private.json`.
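+
+Before running `docker-compose`, it can help to check that the settings
+the chosen compose file needs are present. The following is a minimal
+sketch of such a check, not something shipped with TrustGraph:
+`require_var` and `check_prereqs` are hypothetical helper names, and the
+mapping from compose file to settings is taken from the list above.
+Variables are looked up with `printenv`, so they only count if they have
+been exported, as in the `export` commands for each compose file.
+
+```shell
+#!/bin/sh
+# Hypothetical helpers, not part of TrustGraph: a pre-flight check of the
+# settings listed for each compose file.
+
+# Print a message and return 1 if the named variable is not exported
+# or is empty.
+require_var() {
+    if [ -z "$(printenv "$1")" ]; then
+        echo "missing: $1"
+        return 1
+    fi
+}
+
+# Check the settings needed by one compose file. Returns 0 if everything
+# is present, 1 if something is missing, 2 for an unknown file name.
+check_prereqs() {
+    result=0
+    case "$1" in
+        docker-compose-azure.yaml)
+            require_var AZURE_TOKEN || result=1
+            require_var AZURE_ENDPOINT || result=1
+            ;;
+        docker-compose-claude.yaml)
+            require_var CLAUDE_KEY || result=1
+            ;;
+        docker-compose-ollama.yaml)
+            require_var OLLAMA_HOST || result=1
+            ;;
+        docker-compose-vertexai.yaml)
+            # The key file is expected at this path (see the list above).
+            if [ ! -f vertexai/private.json ]; then
+                echo "missing: vertexai/private.json"
+                result=1
+            fi
+            ;;
+        *)
+            echo "unknown compose file: $1"
+            result=2
+            ;;
+    esac
+    return "$result"
+}
+```
+
+For example, running
+`check_prereqs docker-compose-claude.yaml && docker-compose -f docker-compose-claude.yaml up -d`
+only starts the containers if `CLAUDE_KEY` has been exported.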
+
+
+#### docker-compose-azure.yaml
+
+```
+export AZURE_ENDPOINT=https://ENDPOINT.HOST.GOES.HERE/
+export AZURE_TOKEN=TOKEN-GOES-HERE
+docker-compose -f docker-compose-azure.yaml up -d
+```
+
+#### docker-compose-claude.yaml
+
+```
+export CLAUDE_KEY=TOKEN-GOES-HERE
+docker-compose -f docker-compose-claude.yaml up -d
+```
+
+#### docker-compose-ollama.yaml
+
+```
+export OLLAMA_HOST=localhost # Set to hostname of Ollama host
+docker-compose -f docker-compose-ollama.yaml up -d
+```
+
+#### docker-compose-vertexai.yaml
+
+```
+mkdir -p vertexai
+cp KEY-FILE-GOES-HERE vertexai/private.json
+docker-compose -f docker-compose-vertexai.yaml up -d
+```
+
+On Linux, if SELinux is enabled, you may need to set the security context
+of the `vertexai` directory so that the key file can be mounted into a
+Docker container...
+
+```
+chcon -Rt svirt_sandbox_file_t vertexai/
+```
+
+### Check things are running
+
+Check that you have a set of containers running...
+
+```
+docker ps
+```
+
+You may also want to look at stopped containers to see whether any have
+exited unexpectedly; look at the STATUS field.
+
+```
+docker ps -a
+```
+
+### Wait
+
+Before proceeding, leave enough time for the system to settle into a
+working state. On my MacBook, it takes about 30 seconds for Pulsar to
+start, and nothing works before that.
+
+The system uses Cassandra as its graph store, and Cassandra takes around
+60-70 seconds to reach a working state. For your first go, I would advise
+letting everything settle for a couple of minutes before doing anything
+else, so that if you see errors you know it's not just that the system is
+still starting up.
+
+### Install requirements
+
+```
+python3 -m venv env
+. env/bin/activate
+pip3 install pulsar-client
+pip3 install cassandra-driver
+```
+
+### Load some data
+
+Create a sources directory and get a test file...
+
+```
+mkdir sources
+curl -o sources/Challenger-Report-Vol1.pdf https://sma.nasa.gov/SignificantIncidents/assets/rogers_commission_report.pdf
+```
+
+Then load the file...
+
+```
+scripts/loader
+```
+
+You will get some output on the screen; if nothing looks like an error
+(nothing is tagged ERROR), you should be good.
+
+### Check logs
+
+Look at the PDF decoder...
+
+```
+docker logs trustgraph_pdf-decoder_1
+```
+
+which should contain some text like...
+```
+Decoding 1f7b7055...
+Done.
+```
+
+Look at the chunker output...
+
+```
+docker logs trustgraph_chunker_1
+```
+
+You will see similar output, except with many entries instead of one.
+
+Look at the vectorizer output...
+
+```
+docker logs trustgraph_vectorize_1
+```
+
+You will see similar output, except with many entries instead of one.
+
+Look at the LLM output...
+
+```
+docker logs trustgraph_llm_1
+```
+
+You will see output like this...
+```
+Handling prompt fa1b98ae-70ef-452b-bcbe-21a867c5e8e2...
+Send response...
+Done.
+```
+
+There are two more logs to look at...
+
+```
+docker logs trustgraph_kg-extract-definitions_1
+docker logs trustgraph_kg-extract-relationships_1
+```
+
+The definitions log should contain output similar to this...
+
+```
+Indexing 1f7b7055-p11-c1...
+[
+  {
+    "entity": "Orbiter",
+    "definition": "A spacecraft designed for spaceflight."
+  },
+  {
+    "entity": "flight deck",
+    "definition": "The top level of the crew compartment, typically where flight controls are located."
+  },
+  {
+    "entity": "middeck",
+    "definition": "The lower level of the crew compartment, used for sleeping, working, and storing equipment."
+  }
+]
+Done.
+```
+
+and the relationships log should contain output like this...
+
+```
+Indexing 1f7b7055-p11-c3...
+[
+  {
+    "subject": "Space Shuttle",
+    "predicate": "carry",
+    "object": "16 tons of cargo",
+    "object-entity": false
+  },
+  {
+    "subject": "friction",
+    "predicate": "generated by",
+    "object": "atmosphere",
+    "object-entity": true
+  }
+]
+Done.
+```
+
+### Check graph is loading
+
+```
+scripts/graph-show
+```
+
+You should see a lot of output, made up of lines like this...
+
+```
+http://trustgraph.ai/e/enterprise http://trustgraph.ai/e/was-carried to altitude and released for a gliding approach and landing at the Mojave Desert test center
+http://trustgraph.ai/e/enterprise http://www.w3.org/2000/01/rdf-schema#label Enterprise
+http://trustgraph.ai/e/enterprise http://www.w3.org/2004/02/skos/core#definition A prototype space shuttle orbiter used for atmospheric flight testing.
+```
+
+Any output at all is a good sign: it indicates that the graph is loading.
+
+### Query time
+
+While the graph is loading, you can check the number of graph edges
+loaded so far...
+```
+scripts/graph-show | wc -l
+```
+
+You need a few hundred edges to be loaded for the query to work on this
+particular document, because that is the point where the indexer has got
+past the mundane introductory parts of the document and into the
+interesting parts.
+
+```
+tests/graph/rag
+```
+
+Give the command at least a minute to run before being concerned. The
+output should look like this...
+
+```
+Here are 20 facts from the provided knowledge graph about the Space Shuttle disaster:
+
+1. **Space Shuttle Challenger was a Space Shuttle spacecraft.**
+2. **The third Spacelab mission was carried by Orbiter Challenger.**
+3. **Francis R. Scobee was the Commander of the Challenger crew.**
+4. **Earth-to-orbit systems are designed to transport payloads and humans from Earth's surface into orbit.**
+5. **The Space Shuttle program involved the Space Shuttle.**
+6. **Orbiter Challenger flew on mission 41-B.**
+7. **Orbiter Challenger was used on STS-7 and STS-8 missions.**
+8. **Columbia completed the orbital test.**
+9. **The Space Shuttle flew 24 successful missions.**
+10. **One possibility for the Space Shuttle was a winged but unmanned recoverable liquid-fuel vehicle based on the Saturn 5 rocket.**
+11. **A Commission was established to investigate the space shuttle Challenger accident.**
+12. **Judit h Arlene Resnik was Mission Specialist Two.**
+13. **Mission 51-L was originally scheduled for December 1985 but was delayed until January 1986.**
+14. **The Corporation's Space Transportation Systems Division was responsible for the design and development of the Space Shuttle Orbiter.**
+15. **Michael John Smith was the Pilot of the Challenger crew.**
+16. **The Space Shuttle is composed of two recoverable Solid Rocket Boosters.**
+17. **The Space Shuttle provides for the broadest possible spectrum of civil/military missions.**
+18. **Mission 51-L consisted of placing one satellite in orbit, deploying and retrieving Spartan, and conducting six experiments.**
+19. **The Space Shuttle became the focus of NASA's near-term future.**
+20. **The Commission focused its attention on safety aspects of future flights.**
+```
+
+If it looks like something isn't working, try following the graph-rag
+logs:
+
+```
+docker logs -f trustgraph_graph-rag_1
+```
+
+If you get an answer to your query, Graph RAG is working!
+
+If you want to try different queries, try modifying the
+script you ran at `tests/test-graph-rag`.
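+
+Rather than re-running `scripts/graph-show | wc -l` by hand while waiting
+for a few hundred edges to load, the wait can be scripted. This is a
+minimal sketch under stated assumptions: `wait_for_edges` is a
+hypothetical helper, not part of TrustGraph, and like the `wc -l` check
+in the Query time section it treats each output line as one edge.
+
+```shell
+#!/bin/sh
+# Hypothetical helper, not part of TrustGraph.
+# Usage: wait_for_edges MIN_EDGES MAX_TRIES INTERVAL_SECONDS COMMAND [ARGS...]
+# Runs COMMAND repeatedly, counting its output lines as edges, until at
+# least MIN_EDGES are seen (returns 0) or MAX_TRIES polls fall short
+# (returns 1). Prints the count after each poll.
+wait_for_edges() {
+    min_edges="$1"
+    max_tries="$2"
+    interval="$3"
+    shift 3
+    tries=0
+    while :; do
+        # Discard errors: the graph store may still be starting up, in
+        # which case the count is simply 0 and we poll again.
+        edges=$("$@" 2>/dev/null | wc -l | tr -d ' ')
+        echo "edges loaded: $edges"
+        if [ "$edges" -ge "$min_edges" ]; then
+            return 0
+        fi
+        tries=$((tries + 1))
+        if [ "$tries" -ge "$max_tries" ]; then
+            return 1
+        fi
+        sleep "$interval"
+    done
+}
+```
+
+For example, `wait_for_edges 300 60 10 scripts/graph-show && tests/graph/rag`
+polls every 10 seconds for roughly ten minutes and only runs the query once
+300 edges are loaded; the threshold and timings are illustrative guesses,
+not tested values.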