mirror of https://github.com/Kaelio/ktx.git
synced 2026-06-07 07:55:13 +02:00
docs: document public ingest command
parent 9afc5c87c3
commit 220fb5f8ea
7 changed files with 75 additions and 88 deletions
15
README.md
@@ -113,7 +113,7 @@ my-project/
 │ └── local/
 ├── raw-sources/
 │ └── warehouse/
-│ └── live-database/ # Scan artifacts and reports
+│ └── <syncId>/ # Database ingest artifacts and reports
 └── .ktx/
 └── db.sqlite # Local state (git-ignored)
 ```
@@ -122,14 +122,13 @@ Semantic sources and wiki pages are committed to git. The `.ktx/` directory
 holds ephemeral state and is git-ignored — delete it and KTX rebuilds on the
 next run.
 
-### Scan the demo warehouse
+### Build demo warehouse context
 
-Scan artifacts are written under
-`raw-sources/warehouse/live-database/<syncId>/` in the project directory.
+Database ingest artifacts are written under `raw-sources/warehouse/<syncId>/`
+in the project directory.
 
 ```bash
-SCAN_OUTPUT="$(ktx scan warehouse --project-dir "$PROJECT_DIR")"
-printf '%s\n' "$SCAN_OUTPUT"
+ktx ingest warehouse --project-dir "$PROJECT_DIR" --fast
 ktx status --project-dir "$PROJECT_DIR"
 ```
 
@@ -218,9 +217,7 @@ KTX provider. Enable it with an environment flag when running an LLM-backed
 command:
 
 ```bash
-KTX_AI_DEVTOOLS_ENABLED=true ktx ingest run \
---connection-id warehouse \
---adapter metabase
+KTX_AI_DEVTOOLS_ENABLED=true ktx ingest warehouse --project-dir "$PROJECT_DIR" --deep
 ```
 
 Traces are written to `.devtools/generations.json` under the current working
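
The README hunks above move database ingest artifacts from `raw-sources/warehouse/live-database/<syncId>/` to `raw-sources/warehouse/<syncId>/`. A minimal sketch of how a script might compute the new location is given here. The helper name `ingestArtifactDir` and the sample id `sync-1` are hypothetical, and treating the segment after `raw-sources/` as the connection id is an assumption read off the demo's `warehouse` connection; the diff does not confirm either.

```javascript
// Hypothetical helper (not part of KTX): builds the post-commit artifact path.
// Assumption: the path segment after raw-sources/ is the connection id.
function ingestArtifactDir(projectDir, connectionId, syncId) {
  for (const [name, value] of Object.entries({ projectDir, connectionId, syncId })) {
    if (typeof value !== 'string' || value.length === 0) {
      throw new TypeError(`${name} must be a non-empty string`);
    }
  }
  // Drop trailing slashes so the joined path has no doubled separator.
  const root = projectDir.replace(/\/+$/, '');
  return [root, 'raw-sources', connectionId, syncId].join('/');
}

console.log(ingestArtifactDir('/tmp/project', 'warehouse', 'sync-1'));
// prints "/tmp/project/raw-sources/warehouse/sync-1"
```

The sketch joins with `/` to stay dependency-free; a real script would more likely use `node:path`.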
@@ -1,39 +1,48 @@
 ---
 title: Building Context
-description: Scan your database schema and ingest context from dbt, Looker, Metabase, and more.
+description: Build database and source context from configured KTX connections.
 ---
 
-Building context is a two-step process. First, you **scan** your database to discover its structure — tables, columns, types, constraints, and relationships. Then you **ingest** from your existing tools to enrich that structure with semantic meaning — metric definitions, business descriptions, join logic, and knowledge that agents need to generate correct analytics.
+Building context reads your configured connections and writes local context that
+agents can use. Database connections produce schema context, and source
+connections such as dbt, Looker, Metabase, and Notion produce semantic sources
+and wiki pages.
 
-## Scanning
+## Database ingest
 
-Scanning connects to your database and extracts structural metadata. KTX stores the results locally so agents can understand your schema without querying the database directly.
+Database ingest connects to your warehouse and extracts structural metadata.
+KTX stores the results locally so agents can understand your schema without
+querying the database directly.
 
-### Running a scan
+### Running database ingest
 
 ```bash
-ktx scan <connection-id>
+ktx ingest <connection-id>
 ```
 
-This runs a structural scan by default. You can control what the scan does with the `--mode` flag:
+This runs a fast schema ingest by default. You can choose the depth with public
+flags:
 
-| Mode | What it does |
+| Flag | What it does |
 |------|-------------|
-| `structural` | Tables, columns, types, constraints, row counts (default) |
-| `enriched` | Structural scan plus LLM-generated column descriptions |
-| `relationships` | Structural scan plus foreign key relationship detection |
+| `--fast` | Tables, columns, types, constraints, and row counts |
+| `--deep` | Fast ingest plus AI-enriched database context |
 
 ```bash
-# Scan with relationship detection
-ktx scan my-postgres --mode relationships
+# Build one connection quickly
+ktx ingest my-postgres --fast
 
-# Preview without writing results
-ktx scan my-postgres --dry-run
+# Build AI-enriched database context
+ktx ingest my-postgres --deep
+
+# Build all configured connections
+ktx ingest --all
 ```
 
-### Checking scan results
+### Checking results
 
-Every scan prints a summary and writes local artifacts. Use `ktx status` after a scan to review project readiness and follow-up setup work:
+Every ingest prints a summary and writes local artifacts. Use `ktx status`
+after ingest to review project readiness and follow-up setup work:
 
 ```bash
 ktx status
@@ -49,7 +58,9 @@ Many databases lack declared foreign keys. KTX infers relationships by scoring c
 | 0.55 – 0.84 | `review` | Plausible — needs human review |
 | < 0.55 | `rejected` | Low confidence — not applied |
 
-Relationship scans run with `ktx scan <connection-id> --mode relationships`. This command only executes the scan; relationship review and calibration subcommands are not part of the current CLI surface.
+Deep database ingest can include relationship evidence where the connector can
+provide it. Relationship review and calibration subcommands are not part of the
+current public CLI surface.
 
 ## Ingestion
 
@@ -66,15 +77,13 @@ Each ingest run follows this flow:
 ### Running an ingest
 
 ```bash
-ktx ingest run --connection-id my-dbt-source --adapter dbt
+ktx ingest my-dbt-source
 ```
 
-Useful low-level flags:
+Useful output flags:
 
 | Flag | Description |
 |------|-------------|
-| `--source-dir <path>` | Directory containing source files (e.g., your dbt project) |
-| `--viz` | Render the memory-flow TUI for real-time progress |
 | `--json` | Output as JSON |
 | `--plain` | Plain text output |
 
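
The rewritten guide replaces `--mode` with the public depth flags `--fast` and `--deep`, plus `--all` for every configured connection. A small sketch of how a wrapper script might turn a depth choice into arguments is given here. `buildIngestArgs` is a hypothetical helper, not part of KTX, and since the docs only show `ktx ingest --all` on its own, the sketch does not combine it with a depth flag.

```javascript
// Hypothetical helper (not part of KTX): maps a depth choice to `ktx ingest` arguments.
function buildIngestArgs({ connectionId, depth, all = false } = {}) {
  if (all) {
    // The docs show `ktx ingest --all` alone; combining it with a depth flag is not shown.
    return ['ingest', '--all'];
  }
  if (!connectionId) {
    throw new Error('connectionId is required unless all is true');
  }
  const args = ['ingest', connectionId];
  if (depth === 'fast') args.push('--fast');
  else if (depth === 'deep') args.push('--deep');
  else if (depth !== undefined) throw new Error(`unknown depth: ${depth}`);
  return args;
}

console.log(buildIngestArgs({ connectionId: 'my-postgres', depth: 'deep' }).join(' '));
// prints "ingest my-postgres --deep"
```

Leaving `depth` undefined emits no flag, which matches the guide's statement that a fast schema ingest is the default.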
@@ -223,18 +223,18 @@ describe('standalone example docs', () => {
 assert.doesNotMatch(readme, /python -m ktx_daemon semantic-validate/);
 });
 
-it('documents scan workflows in the docs site', async () => {
+it('documents public context build workflows in the docs site', async () => {
 const rootReadme = await readText('README.md');
 const buildingContext = await readText('docs-site/content/docs/guides/building-context.mdx');
-const scanReference = await readText('docs-site/content/docs/cli-reference/ktx-scan.mdx');
 
-assert.match(buildingContext, /ktx scan <connection-id>/);
+assert.match(buildingContext, /ktx ingest <connection-id>/);
+assert.match(buildingContext, /ktx ingest --all/);
 assert.match(buildingContext, /ktx status/);
 assert.doesNotMatch(buildingContext, /ktx scan status <run-id>/);
 assert.doesNotMatch(buildingContext, /ktx scan report <run-id>/);
-assert.match(scanReference, /ktx scan <connectionId> \[options\]/);
 assert.match(rootReadme, /raw-sources\//);
-assert.match(rootReadme, /live-database\//);
+assert.doesNotMatch(rootReadme, /live-database\//);
+assert.doesNotMatch(rootReadme, /ktx scan/);
 assert.doesNotMatch(rootReadme, /Run a local ingest smoke test/);
 assert.doesNotMatch(rootReadme, /ktx ingest run --project-dir/);
 assert.doesNotMatch(rootReadme, /ktx ingest status --project-dir/);
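
The test above pins the docs to the public command surface by pairing required patterns with forbidden ones. The same idea can be sketched as a standalone check that collects every violation instead of stopping at the first failed assertion. `checkDocCommands` and `sampleDoc` are hypothetical; the patterns are taken from the assertions in the hunk.

```javascript
// Hypothetical helper: returns a list of problems instead of throwing on the first one.
function checkDocCommands(text, { required = [], forbidden = [] } = {}) {
  const problems = [];
  for (const pattern of required) {
    if (!pattern.test(text)) problems.push(`missing: ${pattern}`);
  }
  for (const pattern of forbidden) {
    if (pattern.test(text)) problems.push(`forbidden: ${pattern}`);
  }
  return problems;
}

// Made-up sample text, not a quote from the docs site.
const sampleDoc = 'Run `ktx ingest <connection-id>` or `ktx ingest --all`, then `ktx status`.';
const problems = checkDocCommands(sampleDoc, {
  required: [/ktx ingest <connection-id>/, /ktx ingest --all/, /ktx status/],
  forbidden: [/ktx scan/],
});
console.log(problems.length); // prints 0
```

The patterns must not carry the `g` flag, because `RegExp.prototype.test` on a global regex keeps state between calls.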
@@ -96,27 +96,20 @@ export function buildKtxYaml(postgresUrl) {
 'storage:',
 ' state: sqlite',
 ' search: sqlite-fts5',
-'ingest:',
-' adapters:',
-' - live-database',
 '',
 ].join('\n');
 }
 
-export function buildLiveDatabaseIngestArgs(projectDir, databaseIntrospectionUrl) {
+export function buildLiveDatabaseIngestArgs(projectDir, _databaseIntrospectionUrl, connectionId = 'warehouse') {
 return [
 'exec',
 'ktx',
 'ingest',
-'run',
+connectionId,
 '--project-dir',
 projectDir,
-'--connection-id',
-'warehouse',
-'--adapter',
-'live-database',
-'--database-introspection-url',
-databaseIntrospectionUrl,
+'--fast',
+'--no-input',
 ];
 }
 
@@ -324,12 +317,9 @@ async function main() {
 env: managedRuntimeEnv(cleanInstallDir),
 timeout: 120_000,
 });
-requireSuccess('ktx ingest run live-database', ingestRun);
-requireOutput('ktx ingest run live-database', ingestRun, /Status: done/);
-requireOutput('ktx ingest run live-database', ingestRun, /Adapter: live-database/);
-requireOutput('ktx ingest run live-database', ingestRun, /Diff: \+4\/~0\/-0\/=0/);
-requireOutput('ktx ingest run live-database', ingestRun, /Raw files: 4/);
-requireOutput('ktx ingest run live-database', ingestRun, /Work units: 2/);
+requireSuccess('ktx ingest warehouse --fast', ingestRun);
+requireOutput('ktx ingest warehouse --fast', ingestRun, /Ingest finished/);
+requireOutput('ktx ingest warehouse --fast', ingestRun, /Database schema/);
 
 const runId = getRunId(ingestRun.stdout);
 const ingestStatus = await run('pnpm', buildLiveDatabaseStatusArgs(projectDir, runId), {
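
The smoke script calls `requireSuccess` and `requireOutput`, but their definitions sit outside the hunks shown. One plausible shape is sketched here so the assertions above are easier to read. These are hypothetical stand-ins: the result shape `{ code, stdout, stderr }` and the choice to match against `stdout` only are assumptions (the hunk confirms only that `ingestRun.stdout` exists).

```javascript
// Hypothetical stand-ins; the real helpers are defined outside the hunks shown.
// Assumption: a run result looks like { code, stdout, stderr }.
function requireSuccess(label, result) {
  if (result.code !== 0) {
    throw new Error(`${label} exited with code ${result.code}: ${result.stderr || ''}`);
  }
}

// Assumption: patterns are matched against stdout only.
function requireOutput(label, result, pattern) {
  if (!pattern.test(result.stdout || '')) {
    throw new Error(`${label} output did not match ${pattern}`);
  }
}

// Stand-in text assembled from the regexes in the hunk, not captured CLI output.
const fakeRun = { code: 0, stdout: 'Ingest finished\nDatabase schema\n', stderr: '' };
requireSuccess('ktx ingest warehouse --fast', fakeRun);
requireOutput('ktx ingest warehouse --fast', fakeRun, /Ingest finished/);
requireOutput('ktx ingest warehouse --fast', fakeRun, /Database schema/);
```

Passing the command label as the first argument keeps failure messages tied to the command that produced them, which is why the commit updates every label along with the patterns.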
@@ -50,7 +50,7 @@ describe('installed live-database artifact smoke helpers', () => {
 );
 });
 
-it('writes a live-database-only KTX project config with SQLite local state', () => {
+it('writes a public database ingest KTX project config with SQLite local state', () => {
 assert.equal(
 buildKtxYaml('postgresql://ktx:postgres@127.0.0.1:15432/warehouse'), // pragma: allowlist secret
 [
@@ -63,9 +63,6 @@ describe('installed live-database artifact smoke helpers', () => {
 'storage:',
 ' state: sqlite',
 ' search: sqlite-fts5',
-'ingest:',
-' adapters:',
-' - live-database',
 '',
 ].join('\n'),
 );
@@ -98,20 +95,16 @@ describe('installed live-database artifact smoke helpers', () => {
 ]);
 });
 
-it('builds installed CLI live-database ingest and status commands', () => {
+it('builds installed CLI public database ingest and status commands', () => {
 assert.deepEqual(buildLiveDatabaseIngestArgs('/tmp/project', 'http://127.0.0.1:8765'), [
 'exec',
 'ktx',
 'ingest',
-'run',
+'warehouse',
 '--project-dir',
 '/tmp/project',
-'--connection-id',
-'warehouse',
-'--adapter',
-'live-database',
-'--database-introspection-url',
-'http://127.0.0.1:8765',
+'--fast',
+'--no-input',
 ]);
 
 assert.deepEqual(buildLiveDatabaseStatusArgs('/tmp/project', 'local-run-1'), [
@@ -653,10 +653,6 @@ try {
 'scan:',
 ' enrichment:',
 ' mode: deterministic',
-'ingest:',
-' adapters:',
-' - fake',
-' - live-database',
 '',
 ].join('\\n'),
 'utf-8',
@@ -819,30 +815,32 @@ try {
 requireOutput('ktx dev runtime stop', runtimeStop, /Stopped KTX Python daemon/);
 process.stdout.write('ktx dev runtime daemon lifecycle verified\\n');
 
-const structuralScan = await run('pnpm', ['exec', 'ktx', 'scan', 'warehouse',
+const structuralScan = await run('pnpm', ['exec', 'ktx', 'ingest', 'warehouse',
 '--project-dir',
 projectDir,
+'--fast',
+'--no-input',
 ]);
-requireProjectStderr('ktx scan structural', structuralScan, projectDir);
-requireOutput('ktx scan structural', structuralScan, /Status: done/);
-requireOutput('ktx scan structural', structuralScan, /Mode: structural/);
-requireOutput('ktx scan structural', structuralScan, /Needs attention\\s+None/);
+requireProjectStderr('ktx ingest fast', structuralScan, projectDir);
+requireOutput('ktx ingest fast', structuralScan, /Ingest finished/);
+requireOutput('ktx ingest fast', structuralScan, /Database schema/);
+requireOutput('ktx ingest fast', structuralScan, /warehouse\\s+done/);
 const structuralScanRunId = getRunId(structuralScan.stdout);
 await access(join(projectDir, 'semantic-layer', 'warehouse', '_schema', 'public.yaml'));
-process.stdout.write('ktx scan structural verified: ' + structuralScanRunId + '\\n');
+process.stdout.write('ktx ingest fast verified: ' + structuralScanRunId + '\\n');
 
-const enrichedScan = await run('pnpm', ['exec', 'ktx', 'scan', 'warehouse',
+const enrichedScan = await run('pnpm', ['exec', 'ktx', 'ingest', 'warehouse',
 '--project-dir',
 projectDir,
-'--mode',
-'enriched',
+'--deep',
+'--no-input',
 ]);
-requireProjectStderr('ktx scan enriched', enrichedScan, projectDir);
-requireOutput('ktx scan enriched', enrichedScan, /Status: done/);
-requireOutput('ktx scan enriched', enrichedScan, /Mode: enriched/);
-requireOutput('ktx scan enriched', enrichedScan, /Enrichment artifacts:/);
+requireProjectStderr('ktx ingest deep', enrichedScan, projectDir);
+requireOutput('ktx ingest deep', enrichedScan, /Ingest finished/);
+requireOutput('ktx ingest deep', enrichedScan, /Database schema/);
+requireOutput('ktx ingest deep', enrichedScan, /warehouse\\s+done/);
 const enrichedScanRunId = getRunId(enrichedScan.stdout);
-process.stdout.write('ktx scan enriched verified: ' + enrichedScanRunId + '\\n');
+process.stdout.write('ktx ingest deep verified: ' + enrichedScanRunId + '\\n');
 
 await mkdir(join(sourceDir, 'orders'), { recursive: true });
 await writeFile(join(sourceDir, 'orders', 'orders.json'), '{"name":"orders"}\\n', 'utf-8');
@@ -464,7 +464,7 @@ describe('verification snippets', () => {
 assert.match(source, /node:sqlite/);
 assert.match(source, /driver: sqlite/);
 assert.match(source, /path: warehouse\.db/);
-assert.match(source, /live-database/);
+assert.doesNotMatch(source, /live-database/);
 assert.match(source, /'--execute'/);
 assert.match(source, /"mode": "compile_only"/);
 assert.match(source, /"mode": "executed"/);
@@ -488,11 +488,11 @@ describe('verification snippets', () => {
 assert.match(source, /ktx dev runtime stop/);
 assert.doesNotMatch(source, /ktx dev runtime prune/);
 assert.doesNotMatch(source, /staleRuntimeDir/);
-assert.match(source, /run\('pnpm', \[\s*'exec',\s*'ktx',\s*'scan',\s*'warehouse'/);
-assert.match(source, /'--mode',\s*'enriched'/);
+assert.match(source, /run\('pnpm', \[\s*'exec',\s*'ktx',\s*'ingest',\s*'warehouse'/);
+assert.match(source, /'--deep'/);
 assert.doesNotMatch(source, /'--enrich'/);
-assert.match(source, /ktx scan structural verified/);
-assert.match(source, /ktx scan enriched verified/);
+assert.match(source, /ktx ingest fast verified/);
+assert.match(source, /ktx ingest deep verified/);
 assert.match(source, /enrichment:/);
 assert.match(source, /mode: deterministic/);
 assert.match(source, /run\('pnpm', \['exec', 'ktx', 'ingest', 'run'/);