# tg-load-sample-documents Loads predefined sample documents into TrustGraph library for testing and demonstration purposes. ## Synopsis ```bash tg-load-sample-documents [options] ``` ## Description The `tg-load-sample-documents` command loads a curated set of sample documents into TrustGraph's document library. These documents include academic papers, government reports, and reference materials that demonstrate TrustGraph's capabilities and provide data for testing and evaluation. The command downloads documents from public sources and adds them to the library with comprehensive metadata including RDF triples for semantic relationships. ## Options ### Optional Arguments - `-u, --url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`) - `-U, --user USER`: User ID for document ownership (default: `trustgraph`) ## Examples ### Basic Loading ```bash tg-load-sample-documents ``` ### Load with Custom User ```bash tg-load-sample-documents -U "demo-user" ``` ### Load to Custom Environment ```bash tg-load-sample-documents -u http://demo.trustgraph.ai:8088/ ``` ## Sample Documents The command loads the following sample documents: ### 1. NASA Challenger Report - **Title**: Report of the Presidential Commission on the Space Shuttle Challenger Accident, Volume 1 - **Topics**: Safety engineering, space shuttle, NASA - **Format**: PDF - **Source**: NASA Technical Reports Server - **Use Case**: Demonstrates technical document processing and safety analysis ### 2. Old Icelandic Dictionary - **Title**: A Concise Dictionary of Old Icelandic - **Topics**: Language, linguistics, Old Norse, grammar - **Format**: PDF - **Publication**: 1910, Clarendon Press - **Use Case**: Historical document processing and linguistic analysis ### 3. US Intelligence Threat Assessment - **Title**: Annual Threat Assessment of the U.S. Intelligence Community - March 2025 - **Topics**: National security, cyberthreats, geopolitics - **Format**: PDF - **Source**: Director of National Intelligence - **Use Case**: Current affairs analysis and security research ### 4. Intelligence and State Policy - **Title**: The Role of Intelligence and State Policies in International Security - **Topics**: Intelligence, international security, state policy - **Format**: PDF (sample) - **Publication**: Cambridge Scholars Publishing, 2021 - **Use Case**: Academic research and policy analysis ### 5. Globalization and Intelligence - **Title**: Beyond the Vigilant State: Globalisation and Intelligence - **Topics**: Intelligence, globalization, security studies - **Format**: PDF - **Author**: Richard J. Aldrich - **Use Case**: Academic paper analysis and research ## Use Cases ### Demo Environment Setup ```bash # Set up demonstration environment setup_demo_environment() { echo "Setting up TrustGraph demo environment..." # Initialize system tg-init-trustgraph # Load sample documents echo "Loading sample documents..." tg-load-sample-documents -U "demo" # Wait for processing echo "Waiting for document processing..." sleep 60 # Start document processing echo "Starting document processing..." tg-show-library-documents -U "demo" | \ grep "| id" | \ awk '{print $3}' | \ while read doc_id; do proc_id="demo_proc_$(date +%s)_${doc_id}" tg-start-library-processing -d "$doc_id" --id "$proc_id" -U "demo" done echo "Demo environment ready!" echo "Try: tg-invoke-document-rag -q 'What caused the Challenger accident?' -U demo" } ``` ### Testing Data Pipeline ```bash # Test complete document processing pipeline test_document_pipeline() { echo "Testing document processing pipeline..." # Load sample documents tg-load-sample-documents -U "test" # List loaded documents echo "Loaded documents:" tg-show-library-documents -U "test" # Start processing for each document tg-show-library-documents -U "test" | \ grep "| id" | \ awk '{print $3}' | \ while read doc_id; do echo "Processing document: $doc_id" proc_id="test_$(date +%s)_${doc_id}" tg-start-library-processing -d "$doc_id" --id "$proc_id" -U "test" done # Wait for processing echo "Processing documents... (this may take several minutes)" sleep 300 # Test document queries echo "Testing document queries..." test_queries=( "What is the Challenger accident?" "What is Old Icelandic?" "What are the main cybersecurity threats?" "What is intelligence policy?" ) for query in "${test_queries[@]}"; do echo "Query: $query" tg-invoke-document-rag -q "$query" -U "test" | head -5 echo "---" done echo "Pipeline test complete!" } ``` ### Educational Environment ```bash # Set up educational/training environment setup_educational_environment() { local class_name="$1" echo "Setting up educational environment for: $class_name" # Create user for the class class_user=$(echo "$class_name" | tr '[:upper:]' '[:lower:]' | tr ' ' '-') # Load sample documents for the class tg-load-sample-documents -U "$class_user" # Process documents echo "Processing documents for educational use..." tg-show-library-documents -U "$class_user" | \ grep "| id" | \ awk '{print $3}' | \ while read doc_id; do proc_id="edu_$(date +%s)_${doc_id}" tg-start-library-processing \ -d "$doc_id" \ --id "$proc_id" \ -U "$class_user" \ --collection "education" done echo "Educational environment ready for: $class_name" echo "User: $class_user" echo "Collection: education" } # Set up for different classes setup_educational_environment "AI Research Methods" setup_educational_environment "Security Studies" ``` ### Benchmarking and Performance Testing ```bash # Benchmark document processing performance benchmark_processing() { echo "Starting document processing benchmark..." # Load sample documents start_time=$(date +%s) tg-load-sample-documents -U "benchmark" load_time=$(date +%s) echo "Document loading time: $((load_time - start_time))s" # Count documents doc_count=$(tg-show-library-documents -U "benchmark" | grep -c "| id") echo "Documents loaded: $doc_count" # Start processing processing_ids=() tg-show-library-documents -U "benchmark" | \ grep "| id" | \ awk '{print $3}' | \ while read doc_id; do proc_id="bench_$(date +%s)_${doc_id}" processing_ids+=("$proc_id") tg-start-library-processing -d "$doc_id" --id "$proc_id" -U "benchmark" done processing_start=$(date +%s) # Monitor processing completion echo "Monitoring processing completion..." while true; do active_processing=$(tg-show-flows | grep -c "bench_" || echo "0") if [ "$active_processing" -eq 0 ]; then break fi echo "Active processing jobs: $active_processing" sleep 30 done processing_end=$(date +%s) echo "Processing completion time: $((processing_end - processing_start))s" echo "Total benchmark time: $((processing_end - start_time))s" # Test query performance echo "Testing query performance..." query_start=$(date +%s) for i in {1..10}; do tg-invoke-document-rag \ -q "What are the main topics in these documents?" \ -U "benchmark" > /dev/null done query_end=$(date +%s) echo "Average query time: $(echo "scale=2; ($query_end - $query_start) / 10" | bc)s" } ``` ## Advanced Usage ### Selective Document Loading ```bash # Load only specific types of documents load_by_category() { local category="$1" case "$category" in "government") echo "Loading government documents..." # This would require modifying the script to load selectively # For now, we load all and filter by tags later tg-load-sample-documents -U "gov-docs" ;; "academic") echo "Loading academic documents..." tg-load-sample-documents -U "academic-docs" ;; "historical") echo "Loading historical documents..." tg-load-sample-documents -U "historical-docs" ;; *) echo "Loading all sample documents..." tg-load-sample-documents ;; esac } # Load by category load_by_category "government" load_by_category "academic" ``` ### Multi-Environment Loading ```bash # Load sample documents to multiple environments multi_environment_setup() { local environments=("dev" "staging" "demo") for env in "${environments[@]}"; do echo "Setting up $env environment..." tg-load-sample-documents \ -u "http://$env.trustgraph.company.com:8088/" \ -U "sample-data" echo "✓ $env environment loaded" done echo "All environments loaded with sample documents" } ``` ### Custom Document Sets ```bash # Create custom document loading scripts based on the sample create_custom_loader() { local domain="$1" cat > "load-${domain}-documents.py" << 'EOF' #!/usr/bin/env python3 """ Custom document loader for specific domain Based on tg-load-sample-documents """ import argparse import os from trustgraph.api import Api # Define your own document set here documents = [ { "id": "https://example.com/doc/custom-1", "title": "Custom Document 1", "url": "https://example.com/docs/custom1.pdf", # Add your document definitions... } ] # Rest of the implementation similar to tg-load-sample-documents EOF echo "Custom loader created: load-${domain}-documents.py" } # Create custom loaders for different domains create_custom_loader "medical" create_custom_loader "legal" create_custom_loader "technical" ``` ## Document Analysis ### Content Analysis ```bash # Analyze loaded sample documents analyze_sample_documents() { echo "Analyzing sample documents..." # Get document statistics total_docs=$(tg-show-library-documents | grep -c "| id") echo "Total documents: $total_docs" # Analyze by type echo "Document types:" tg-show-library-documents | \ grep "| kind" | \ awk '{print $3}' | \ sort | uniq -c # Analyze tags echo "Popular tags:" tg-show-library-documents | \ grep "| tags" | \ sed 's/.*| tags.*| \(.*\) |.*/\1/' | \ tr ',' '\n' | \ sed 's/^ *//;s/ *$//' | \ sort | uniq -c | sort -nr | head -10 # Document sizes (would need additional API) echo "Document analysis complete" } ``` ### Query Testing ```bash # Test sample documents with various queries test_sample_queries() { echo "Testing sample document queries..." # Define test queries for different domains queries=( "What caused the Challenger space shuttle accident?" "What is Old Norse language?" "What are current cybersecurity threats?" "How does globalization affect intelligence services?" "What are the main security challenges in international relations?" ) for query in "${queries[@]}"; do echo "Testing query: $query" echo "====================" result=$(tg-invoke-document-rag -q "$query" 2>/dev/null) if [ $? -eq 0 ]; then echo "$result" | head -3 echo "✓ Query successful" else echo "✗ Query failed" fi echo "" done } ``` ## Error Handling ### Network Issues ```bash Exception: Connection failed during download ``` **Solution**: Check internet connectivity and retry. Documents are cached locally after first download. ### Insufficient Storage ```bash Exception: No space left on device ``` **Solution**: Free up disk space. Sample documents total approximately 50-100MB. ### API Connection Issues ```bash Exception: Connection refused ``` **Solution**: Verify TrustGraph API is running and accessible. ### Processing Failures ```bash Exception: Document processing failed ``` **Solution**: Check TrustGraph service logs and ensure all components are running. ## Monitoring and Validation ### Loading Progress ```bash # Monitor sample document loading monitor_sample_loading() { echo "Starting sample document loading with monitoring..." # Start loading in background tg-load-sample-documents & load_pid=$! # Monitor progress while kill -0 $load_pid 2>/dev/null; do doc_count=$(tg-show-library-documents 2>/dev/null | grep -c "| id" || echo "0") echo "Documents loaded so far: $doc_count" sleep 10 done wait $load_pid if [ $? -eq 0 ]; then final_count=$(tg-show-library-documents | grep -c "| id") echo "✓ Loading completed successfully" echo "Total documents loaded: $final_count" else echo "✗ Loading failed" fi } ``` ### Validation ```bash # Validate sample document loading validate_sample_loading() { echo "Validating sample document loading..." # Expected document count (based on current sample set) expected_docs=5 # Check actual count actual_docs=$(tg-show-library-documents | grep -c "| id") if [ "$actual_docs" -eq "$expected_docs" ]; then echo "✓ Document count correct: $actual_docs" else echo "⚠ Document count mismatch: expected $expected_docs, got $actual_docs" fi # Check for expected documents expected_titles=( "Challenger" "Icelandic" "Intelligence" "Threat Assessment" "Vigilant State" ) for title in "${expected_titles[@]}"; do if tg-show-library-documents | grep -q "$title"; then echo "✓ Found document containing: $title" else echo "✗ Missing document containing: $title" fi done echo "Validation complete" } ``` ## Environment Variables - `TRUSTGRAPH_URL`: Default API URL ## Related Commands - [`tg-show-library-documents`](tg-show-library-documents.md) - List loaded documents - [`tg-start-library-processing`](tg-start-library-processing.md) - Process loaded documents - [`tg-invoke-document-rag`](tg-invoke-document-rag.md) - Query processed documents - [`tg-load-pdf`](tg-load-pdf.md) - Load individual PDF documents ## API Integration This command uses the [Library API](../apis/api-librarian.md) to add sample documents to TrustGraph's document repository. ## Best Practices 1. **Demo Preparation**: Use for setting up demonstration environments 2. **Testing**: Ideal for testing document processing pipelines 3. **Education**: Excellent for training and educational purposes 4. **Development**: Use in development environments for consistent test data 5. **Benchmarking**: Suitable for performance testing and optimization 6. **Documentation**: Great for documenting TrustGraph capabilities ## Troubleshooting ### Download Failures ```bash # Check document URLs are accessible curl -I "https://ntrs.nasa.gov/api/citations/19860015255/downloads/19860015255.pdf" # Check local cache ls -la doc-cache/ ``` ### Processing Issues ```bash # Check document processing status tg-show-library-processing # Verify documents are in library tg-show-library-documents | grep -E "(Challenger|Icelandic|Intelligence)" ``` ### Performance Problems ```bash # Monitor system resources during loading top df -h ```