# tg-load-text

Loads text documents into TrustGraph processing pipelines with rich metadata support.

## Synopsis

```bash
tg-load-text [options] file1 [file2 ...]
```

## Description

The `tg-load-text` command loads text documents into TrustGraph for processing. It derives each document ID from a SHA256 hash of the file content and supports metadata including copyright information, publication details, and keywords.

**Note**: Consider using `tg-add-library-document` followed by `tg-start-library-processing` for better document management and processing control.

## Options

### Connection & Flow
- `-u, --url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
- `-f, --flow-id FLOW`: Flow ID for processing (default: `default`)
- `-U, --user USER`: User identifier (default: `trustgraph`)
- `-C, --collection COLLECTION`: Collection identifier (default: `default`)

### Document Metadata
- `--name NAME`: Document name/title
- `--description DESCRIPTION`: Document description
- `--document-url URL`: Document source URL

### Copyright Information
- `--copyright-notice NOTICE`: Copyright notice text
- `--copyright-holder HOLDER`: Copyright holder name
- `--copyright-year YEAR`: Copyright year
- `--license LICENSE`: Copyright license

### Publication Information
- `--publication-organization ORG`: Publishing organization
- `--publication-description DESC`: Publication description
- `--publication-date DATE`: Publication date

### Keywords
- `--keyword KEYWORD [KEYWORD ...]`: Document keywords (can specify multiple)

## Arguments

- `file1 [file2 ...]`: One or more text files to load

## Examples

### Basic Document Loading
```bash
tg-load-text document.txt
```

### Loading with Metadata
```bash
tg-load-text \
  --keyword "AI" "machine learning" "research" \
  --name "Research Paper on AI" \
  --description "Comprehensive study of machine learning algorithms" \
  research-paper.txt
```

### Complete Metadata Example
```bash
tg-load-text \
  --name "TrustGraph Documentation" \
  --description "Complete user guide for TrustGraph system" \
  --copyright-holder "TrustGraph Project" \
  --copyright-year "2024" \
  --license "MIT" \
  --publication-organization "TrustGraph Foundation" \
  --publication-date "2024-01-15" \
  --keyword "documentation" "guide" "tutorial" \
  --flow-id research-flow \
  trustgraph-guide.txt
```

### Multiple Files
```bash
tg-load-text chapter1.txt chapter2.txt chapter3.txt
```

### Custom Flow and Collection
```bash
tg-load-text \
  --flow-id medical-research \
  --user researcher \
  --collection medical-papers \
  medical-study.txt
```

## Output

For each file processed, the command outputs:

### Success
```
document.txt: Loaded successfully.
```

### Failure
```
document.txt: Failed: Connection refused
```

## Document Processing

1. **File Reading**: Reads the text file content
2. **Hash Generation**: Creates SHA256 hash for unique document ID
3. **URI Creation**: Converts hash to document URI format
4. **Metadata Assembly**: Combines all metadata into RDF triples
5. **API Submission**: Sends to TrustGraph via Text Load API

## Document ID Generation

Documents are assigned IDs based on their content hash:
- SHA256 hash of file content
- Converted to TrustGraph document URI format
- Example: `http://trustgraph.ai/d/abc123...`
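
The derivation above can be sketched in shell. This is only an illustration, assuming the URI is the prefix from the example followed by the hex digest; the exact URI layout that `tg-load-text` produces may differ. `doc_uri` is a hypothetical helper, not part of the CLI, and the sketch assumes GNU coreutils `sha256sum` is available.

```bash
# Hypothetical helper: print a content-derived URI for a file.
# Assumption: URI = prefix from the example + hex SHA256 digest of the file bytes.
doc_uri() {
  local file="$1" digest
  # Fail early if the file cannot be read.
  [ -r "$file" ] || return 1
  # Reading from stdin makes sha256sum print "<hex digest>  -"; keep the digest.
  digest="$(sha256sum < "$file" | cut -d' ' -f1)"
  printf 'http://trustgraph.ai/d/%s\n' "$digest"
}
```

Under this assumption the ID depends only on the file's bytes, so `doc_uri a.txt` and `doc_uri copy-of-a.txt` print the same URI when the two files are identical.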

## Metadata Format

The metadata is stored as RDF triples including:

### Standard Properties
- `dc:title`: Document name
- `dc:description`: Document description
- `dc:creator`: Copyright holder
- `dc:date`: Publication date
- `dc:rights`: Copyright notice
- `dc:license`: License information

### Keywords
- `dc:subject`: Each keyword as separate triple

### Organization Information
- `foaf:Organization`: Publication organization details

## Error Handling

### File Errors
```
document.txt: Failed: No such file or directory
```
**Solution**: Verify the file path exists and is readable.

### Connection Errors
```
document.txt: Failed: Connection refused
```
**Solution**: Check the API URL and ensure TrustGraph is running.

### Flow Errors
```
document.txt: Failed: Invalid flow
```
**Solution**: Verify the flow exists and is running using `tg-show-flows`.
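
The failure messages above can be handled in a wrapper script. The sketch below is an illustration under assumptions: `suggest_fix` is a hypothetical helper that matches the status-line formats shown in this document and prints the corresponding solution. Real error text may vary.

```bash
# Hypothetical helper: map one tg-load-text status line to a hint.
# Returns 0 for a success line, 1 for a failure line, 2 if unrecognised.
suggest_fix() {
  local line="$1"
  case "$line" in
    *": Loaded successfully.")
      return 0 ;;
    *": Failed: No such file or directory"*)
      echo "Verify the file path exists and is readable." ;;
    *": Failed: Connection refused"*)
      echo "Check the API URL and ensure TrustGraph is running." ;;
    *": Failed: Invalid flow"*)
      echo "Verify the flow exists and is running using tg-show-flows." ;;
    *": Failed: "*)
      # Any other failure: echo back the reason after "Failed: ".
      echo "Unrecognised failure: ${line#*: Failed: }" ;;
    *)
      return 2 ;;
  esac
  return 1
}
```

Assuming the status lines are written to standard output, they could be piped through the helper with `tg-load-text document.txt | while IFS= read -r line; do suggest_fix "$line"; done`.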

## Environment Variables

- `TRUSTGRAPH_URL`: Default API URL
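
The precedence implied by the `--url` default (an explicit flag, then `TRUSTGRAPH_URL`, then `http://localhost:8088/`) can be mirrored in scripts that call the command. A minimal sketch; `resolve_url` is a hypothetical helper, not part of TrustGraph:

```bash
# Hypothetical helper: choose the API URL the way the --url default suggests.
# $1 is an optional explicit URL (what you would pass to -u/--url).
resolve_url() {
  local explicit="${1:-}"
  if [ -n "$explicit" ]; then
    printf '%s\n' "$explicit"
  else
    # Fall back to the environment variable, then the documented default.
    printf '%s\n' "${TRUSTGRAPH_URL:-http://localhost:8088/}"
  fi
}
```

Exporting `TRUSTGRAPH_URL` once per shell session avoids repeating `-u` on every invocation.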

## Related Commands

- [`tg-add-library-document`](tg-add-library-document.md) - Add documents to library (recommended)
- [`tg-load-pdf`](tg-load-pdf.md) - Load PDF documents
- [`tg-show-library-documents`](tg-show-library-documents.md) - List loaded documents
- [`tg-start-library-processing`](tg-start-library-processing.md) - Start document processing

## API Integration

This command uses the [Text Load API](../apis/api-text-load.md) to submit documents for processing. The text content is base64-encoded for transmission.
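
To see what base64 encoding does to a document's text, the sketch below encodes a file's bytes. It illustrates the encoding step only; the request structure of the Text Load API is not shown, and `encode_text` is a hypothetical helper rather than anything the CLI exposes.

```bash
# Hypothetical helper: base64-encode a file's bytes as a single line.
encode_text() {
  # tr strips the line wrapping that some base64 implementations add.
  base64 < "$1" | tr -d '\n'
}
```

Running `encode_text document.txt | base64 --decode` reproduces the original content, which is a quick way to confirm that the encoding is lossless.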

## Use Cases

### Academic Research
```bash
tg-load-text \
  --keyword "climate" "research" "environment" \
  --name "Climate Change Impact Study" \
  --publication-organization "University Research Center" \
  climate-study.txt
```

### Corporate Documentation
```bash
tg-load-text \
  --keyword "manual" "product" "guide" \
  --name "Product Manual" \
  --copyright-holder "Acme Corp" \
  --license "Proprietary" \
  product-manual.txt
```

### Technical Documentation
```bash
tg-load-text \
  --keyword "API" "reference" "technical" \
  --name "API Reference" \
  --description "Complete API documentation" \
  api-docs.txt
```

## Best Practices

1. **Use Descriptive Names**: Provide clear document names and descriptions
2. **Add Keywords**: Include relevant keywords for better searchability
3. **Complete Metadata**: Fill in copyright and publication information
4. **Batch Processing**: Load multiple related files together
5. **Use Collections**: Organize documents by topic or project using collections
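
Practices 4 and 5 can be combined in a small wrapper that loads every `.txt` file in a directory into one collection. This is a sketch only: `load_dir` and the `TG_LOAD_TEXT` override variable are hypothetical (the override exists so the function can be dry-run with `echo`), while `--collection` and `--name` are the options listed under Options.

```bash
# Hypothetical wrapper: load all .txt files in a directory into one collection.
# TG_LOAD_TEXT is an assumed override for the command name (e.g. "echo" for a dry run).
load_dir() {
  local collection="$1" dir="$2" file
  for file in "$dir"/*.txt; do
    # Skip the literal pattern when the directory has no .txt files.
    [ -e "$file" ] || continue
    # Use the file name without its extension as the document name.
    "${TG_LOAD_TEXT:-tg-load-text}" \
      --collection "$collection" \
      --name "$(basename "$file" .txt)" \
      "$file"
  done
}
```

A call such as `load_dir medical-papers ./papers` invokes the command once per file, so each document gets its own `--name`. Setting `TG_LOAD_TEXT=echo` first prints the commands instead of running them.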