# tg-load-text

Loads text documents into TrustGraph processing pipelines with rich metadata support.

## Synopsis

```bash
tg-load-text [options] file1 [file2 ...]
```

## Description

The `tg-load-text` command loads text documents into TrustGraph for processing. It derives each document's ID from a SHA256 hash of the file content and accepts optional metadata, including copyright information, publication details, and keywords.

**Note**: For more control over document management and processing, consider adding documents with `tg-add-library-document` and then starting processing with `tg-start-library-processing`.

## Options

### Connection & Flow
- `-u, --url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`)
- `-f, --flow-id FLOW`: Flow ID for processing (default: `default`)
- `-U, --user USER`: User identifier (default: `trustgraph`)
- `-C, --collection COLLECTION`: Collection identifier (default: `default`)

### Document Metadata
- `--name NAME`: Document name/title
- `--description DESCRIPTION`: Document description
- `--document-url URL`: Document source URL

### Copyright Information
- `--copyright-notice NOTICE`: Copyright notice text
- `--copyright-holder HOLDER`: Copyright holder name
- `--copyright-year YEAR`: Copyright year
- `--license LICENSE`: License name (for example, `MIT`)

### Publication Information
- `--publication-organization ORG`: Publishing organization
- `--publication-description DESC`: Publication description
- `--publication-date DATE`: Publication date

### Keywords
- `--keyword KEYWORD [KEYWORD ...]`: Document keywords; list several values after a single `--keyword`. Because the option takes a variable number of values, file names written directly after it can be read as keywords, so follow the keyword list with another option or end it with `--` (for example, `--keyword AI research -- paper.txt`).

## Arguments

- `file1 [file2 ...]`: One or more text files to load

## Examples

### Basic Document Loading
```bash
tg-load-text document.txt
```

### Loading with Metadata
```bash
tg-load-text \
  --keyword "AI" "machine learning" "research" \
  --name "Research Paper on AI" \
  --description "Comprehensive study of machine learning algorithms" \
  research-paper.txt
```

### Complete Metadata Example
```bash
tg-load-text \
  --name "TrustGraph Documentation" \
  --description "Complete user guide for TrustGraph system" \
  --copyright-holder "TrustGraph Project" \
  --copyright-year "2024" \
  --license "MIT" \
  --publication-organization "TrustGraph Foundation" \
  --publication-date "2024-01-15" \
  --keyword "documentation" "guide" "tutorial" \
  --flow-id research-flow \
  trustgraph-guide.txt
```

### Multiple Files
```bash
tg-load-text chapter1.txt chapter2.txt chapter3.txt
```

### Custom Flow and Collection
```bash
tg-load-text \
  --flow-id medical-research \
  --user researcher \
  --collection medical-papers \
  medical-study.txt
```
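
If each file in a batch should carry its own `--name`, you can run one invocation per file instead of passing all files at once. This is a hypothetical helper, not part of TrustGraph: `load_dir` and the `TG_LOAD_TEXT` override variable are illustrative names, and the premise that metadata options given in a single call apply to every file in that call is an assumption. Only the documented `--collection` and `--name` options are used.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of TrustGraph): load every .txt file in a
# directory into one collection, one tg-load-text call per file, so each
# document gets a --name derived from its file name.
# TG_LOAD_TEXT lets you substitute another command (for example `echo`
# for a dry run); it defaults to tg-load-text.
load_dir() {
    local dir=$1 collection=$2 f
    for f in "$dir"/*.txt; do
        [ -e "$f" ] || continue   # no .txt files: the glob stays unexpanded
        "${TG_LOAD_TEXT:-tg-load-text}" \
            --collection "$collection" \
            --name "$(basename "$f" .txt)" \
            "$f"
    done
}
```

For example, `TG_LOAD_TEXT=echo load_dir ./papers medical-papers` prints the commands without running them; drop the `TG_LOAD_TEXT=echo` prefix to load for real.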

## Output

For each file processed, the command outputs:

### Success
```
document.txt: Loaded successfully.
```

### Failure
```
document.txt: Failed: Connection refused
```
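
When loading many files, the per-file status lines make it possible to collect the failures automatically. The filter below is a hypothetical sketch (`failed_files` is not a TrustGraph command): it reads the output on standard input, prints the name of each file reported with the `<file>: Failed: <reason>` format shown above, and returns a non-zero status if any failure appears. The page does not say whether status lines go to standard output or standard error, so the usage example merges both.

```bash
#!/usr/bin/env bash
# Hypothetical filter (not part of TrustGraph): read tg-load-text status lines
# on stdin, print the file name from each "<file>: Failed: <reason>" line,
# and return 1 if any failure was seen, 0 otherwise.
failed_files() {
    local line status=0
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            *": Failed: "*)
                # Strip everything from the first ": Failed: " onwards
                printf '%s\n' "${line%%: Failed: *}"
                status=1
                ;;
        esac
    done
    return "$status"
}
```

Usage: `tg-load-text chapter1.txt chapter2.txt 2>&1 | failed_files > failed.txt || echo "some files failed"`.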

## Document Processing

1. **File Reading**: Reads the text file content
2. **Hash Generation**: Creates a SHA256 hash to serve as a unique document ID
3. **URI Creation**: Converts the hash to the document URI format
4. **Metadata Assembly**: Combines all metadata into RDF triples
5. **API Submission**: Sends the document to TrustGraph through the Text Load API

## Document ID Generation

Documents are assigned IDs derived from their content hash, so two files with identical content receive the same ID:
- SHA256 hash of the file content
- Converted to the TrustGraph document URI format
- Example: `http://trustgraph.ai/d/abc123...`
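
To see in advance which ID a file will receive, or whether two files would share one, you can compute the same kind of hash locally. This is a sketch under assumptions: `doc_id` is a hypothetical helper, and the `http://trustgraph.ai/d/` prefix followed by the full hex digest is inferred from the example URI above, not taken from TrustGraph's code. It uses `sha256sum` from GNU coreutils; on macOS, `shasum -a 256` prints the digest in the same format.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of TrustGraph): print a content-hash URI
# for a file. ASSUMPTION: the prefix below and the "prefix + full hex digest"
# layout are inferred from the example URI on this page.
DOC_PREFIX="http://trustgraph.ai/d/"

doc_id() {
    local file=$1 digest
    if [ ! -f "$file" ] || [ ! -r "$file" ]; then
        printf '%s: cannot read file\n' "$file" >&2
        return 1
    fi
    # Reading from stdin makes sha256sum print "<hex digest>  -";
    # cut keeps only the digest field.
    digest=$(sha256sum < "$file" | cut -d' ' -f1)
    printf '%s%s\n' "$DOC_PREFIX" "$digest"
}
```

For example, `doc_id document.txt` prints a URI of the form shown above; renaming the file does not change the result, while editing its content does.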

## Metadata Format

The metadata is stored as RDF triples, including:

### Standard Properties
- `dc:title`: Document name
- `dc:description`: Document description
- `dc:creator`: Copyright holder
- `dc:date`: Publication date
- `dc:rights`: Copyright notice
- `dc:license`: License information

### Keywords
- `dc:subject`: Each keyword as a separate triple

### Organization Information
- `foaf:Organization`: Publication organization details

## Error Handling

### File Errors
```
document.txt: Failed: No such file or directory
```
**Solution**: Verify that the file path exists and is readable.
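
To catch path mistakes before anything is sent to TrustGraph, you can check the files first. A minimal sketch, assuming a bash shell; `check_files` is a hypothetical helper, not a TrustGraph command. It reports every argument that is not a readable regular file and returns a non-zero status if it finds any.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of TrustGraph): report every
# argument that is not a readable regular file; return 1 if any were found.
check_files() {
    local f bad=0
    for f in "$@"; do
        if [ ! -f "$f" ] || [ ! -r "$f" ]; then
            printf '%s: not a readable file\n' "$f" >&2
            bad=1
        fi
    done
    return "$bad"
}
```

Usage: `check_files chapter1.txt chapter2.txt && tg-load-text chapter1.txt chapter2.txt`.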

### Connection Errors
```
document.txt: Failed: Connection refused
```
**Solution**: Check the API URL (`-u`/`--url` or `TRUSTGRAPH_URL`) and ensure TrustGraph is running.

### Flow Errors
```
document.txt: Failed: Invalid flow
```
**Solution**: Use `tg-show-flows` to verify that the flow exists and is running.

## Environment Variables

- `TRUSTGRAPH_URL`: Default API URL, used when `-u`/`--url` is not given
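
The default order for the API URL listed under Options (an explicit `-u`/`--url`, otherwise `$TRUSTGRAPH_URL`, otherwise `http://localhost:8088/`) can be reproduced with shell parameter expansion, which is handy in your own wrapper scripts that talk to the same endpoint. The `resolve_api_url` function is purely illustrative and is not how `tg-load-text` itself is implemented.

```bash
#!/usr/bin/env bash
# Illustrative only: mirror the documented default order for the API URL.
# $1 stands in for an explicit --url value; if it is empty or absent,
# fall back to $TRUSTGRAPH_URL, then to the built-in default.
resolve_api_url() {
    printf '%s\n' "${1:-${TRUSTGRAPH_URL:-http://localhost:8088/}}"
}
```

In practice, `export TRUSTGRAPH_URL=http://tg-host:8088/` (a hypothetical host name) makes every later `tg-load-text` call in that shell use that URL without `-u`.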

## Related Commands

- [`tg-add-library-document`](tg-add-library-document.md) - Add documents to the library (recommended)
- [`tg-load-pdf`](tg-load-pdf.md) - Load PDF documents
- [`tg-show-library-documents`](tg-show-library-documents.md) - List loaded documents
- [`tg-start-library-processing`](tg-start-library-processing.md) - Start document processing

## API Integration

This command uses the [Text Load API](../apis/api-text-load.md) to submit documents for processing. The text content is base64-encoded for transmission.
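
The encoding step can be reproduced with the standard `base64` tool, for example when inspecting a request by hand. A minimal sketch: `encode_text` is a hypothetical helper, and the assumption that the Text Load API expects standard base64 as a single unwrapped string is not confirmed by this page; the request fields themselves are described on the Text Load API page and are not shown here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: base64-encode a text file as one line.
# GNU base64 wraps its output at 76 columns by default, so newlines are
# removed to produce a single string.
# ASSUMPTION: the Text Load API accepts standard (RFC 4648) unwrapped base64.
encode_text() {
    base64 < "$1" | tr -d '\n'
}
```

For instance, a file containing exactly `hello` (no trailing newline) encodes to `aGVsbG8=`.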

## Use Cases

### Academic Research
```bash
tg-load-text \
  --keyword "climate" "research" "environment" \
  --name "Climate Change Impact Study" \
  --publication-organization "University Research Center" \
  climate-study.txt
```

### Corporate Documentation
```bash
tg-load-text \
  --keyword "manual" "product" "guide" \
  --name "Product Manual" \
  --copyright-holder "Acme Corp" \
  --license "Proprietary" \
  product-manual.txt
```

### Technical Documentation
```bash
tg-load-text \
  --keyword "API" "reference" "technical" \
  --name "API Reference" \
  --description "Complete API documentation" \
  api-docs.txt
```

## Best Practices

1. **Use Descriptive Names**: Provide clear document names and descriptions
2. **Add Keywords**: Include relevant keywords for better searchability
3. **Complete Metadata**: Fill in copyright and publication information
4. **Batch Processing**: Load related files together in a single command
5. **Use Collections**: Organize documents by topic or project using collections