docs: document public ingest command

Andrey Avtomonov 2026-05-13 18:09:57 +02:00
parent 9afc5c87c3
commit 220fb5f8ea
7 changed files with 75 additions and 88 deletions

@@ -113,7 +113,7 @@ my-project/
│ └── local/
├── raw-sources/
│ └── warehouse/
│ └── live-database/ # Scan artifacts and reports
│ └── <syncId>/ # Database ingest artifacts and reports
└── .ktx/
└── db.sqlite # Local state (git-ignored)
```
@@ -122,14 +122,13 @@ Semantic sources and wiki pages are committed to git. The `.ktx/` directory
holds ephemeral state and is git-ignored — delete it and KTX rebuilds on the
next run.
### Scan the demo warehouse
### Build demo warehouse context
Scan artifacts are written under
`raw-sources/warehouse/live-database/<syncId>/` in the project directory.
Database ingest artifacts are written under `raw-sources/warehouse/<syncId>/`
in the project directory.
```bash
SCAN_OUTPUT="$(ktx scan warehouse --project-dir "$PROJECT_DIR")"
printf '%s\n' "$SCAN_OUTPUT"
ktx ingest warehouse --project-dir "$PROJECT_DIR" --fast
ktx status --project-dir "$PROJECT_DIR"
```
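As a rough illustration of the new artifact layout, the sketch below lists the per-sync directories for one connection. `listSyncDirs` is a hypothetical helper and not part of KTX. It assumes the directory under `raw-sources/` is named after the connection id (`warehouse` in the demo) and that each `<syncId>` is a plain subdirectory.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper (not a KTX API): list ingest artifact directories
// under <projectDir>/raw-sources/<connectionId>/<syncId>/.
function listSyncDirs(projectDir, connectionId = 'warehouse') {
  const root = path.join(projectDir, 'raw-sources', connectionId);
  // No ingest has run yet (or the layout differs): report nothing.
  if (!fs.existsSync(root)) return [];
  return fs
    .readdirSync(root, { withFileTypes: true })
    .filter((entry) => entry.isDirectory()) // skip stray files
    .map((entry) => path.join(root, entry.name))
    .sort();
}

// Example: inspect the project used in the commands above.
console.log(listSyncDirs(process.env.PROJECT_DIR ?? '.'));
```

The helper only reads the filesystem, so it can run after `ktx ingest` without touching the `.ktx/` state directory.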
@@ -218,9 +217,7 @@ KTX provider. Enable it with an environment flag when running an LLM-backed
command:
```bash
KTX_AI_DEVTOOLS_ENABLED=true ktx ingest run \
--connection-id warehouse \
--adapter metabase
KTX_AI_DEVTOOLS_ENABLED=true ktx ingest warehouse --project-dir "$PROJECT_DIR" --deep
```
Traces are written to `.devtools/generations.json` under the current working

@@ -1,39 +1,48 @@
---
title: Building Context
description: Scan your database schema and ingest context from dbt, Looker, Metabase, and more.
description: Build database and source context from configured KTX connections.
---
Building context is a two-step process. First, you **scan** your database to discover its structure — tables, columns, types, constraints, and relationships. Then you **ingest** from your existing tools to enrich that structure with semantic meaning — metric definitions, business descriptions, join logic, and knowledge that agents need to generate correct analytics.
Building context reads your configured connections and writes local context that
agents can use. Database connections produce schema context, and source
connections such as dbt, Looker, Metabase, and Notion produce semantic sources
and wiki pages.
## Scanning
## Database ingest
Scanning connects to your database and extracts structural metadata. KTX stores the results locally so agents can understand your schema without querying the database directly.
Database ingest connects to your warehouse and extracts structural metadata.
KTX stores the results locally so agents can understand your schema without
querying the database directly.
### Running a scan
### Running database ingest
```bash
ktx scan <connection-id>
ktx ingest <connection-id>
```
This runs a structural scan by default. You can control what the scan does with the `--mode` flag:
This runs a fast schema ingest by default. You can choose the depth with public
flags:
| Mode | What it does |
| Flag | What it does |
|------|-------------|
| `structural` | Tables, columns, types, constraints, row counts (default) |
| `enriched` | Structural scan plus LLM-generated column descriptions |
| `relationships` | Structural scan plus foreign key relationship detection |
| `--fast` | Tables, columns, types, constraints, and row counts |
| `--deep` | Fast ingest plus AI-enriched database context |
```bash
# Scan with relationship detection
ktx scan my-postgres --mode relationships
# Build one connection quickly
ktx ingest my-postgres --fast
# Preview without writing results
ktx scan my-postgres --dry-run
# Build AI-enriched database context
ktx ingest my-postgres --deep
# Build all configured connections
ktx ingest --all
```
### Checking scan results
### Checking results
Every scan prints a summary and writes local artifacts. Use `ktx status` after a scan to review project readiness and follow-up setup work:
Every ingest prints a summary and writes local artifacts. Use `ktx status`
after ingest to review project readiness and follow-up setup work:
```bash
ktx status
@@ -49,7 +58,9 @@ Many databases lack declared foreign keys. KTX infers relationships by scoring c
| 0.55 &ndash; 0.84 | `review` | Plausible — needs human review |
| &lt; 0.55 | `rejected` | Low confidence — not applied |
Relationship scans run with `ktx scan <connection-id> --mode relationships`. This command only executes the scan; relationship review and calibration subcommands are not part of the current CLI surface.
Deep database ingest can include relationship evidence where the connector can
provide it. Relationship review and calibration subcommands are not part of the
current public CLI surface.
## Ingestion
@@ -66,15 +77,13 @@ Each ingest run follows this flow:
### Running an ingest
```bash
ktx ingest run --connection-id my-dbt-source --adapter dbt
ktx ingest my-dbt-source
```
Useful low-level flags:
Useful output flags:
| Flag | Description |
|------|-------------|
| `--source-dir <path>` | Directory containing source files (e.g., your dbt project) |
| `--viz` | Render the memory-flow TUI for real-time progress |
| `--json` | Output as JSON |
| `--plain` | Plain text output |

@@ -223,18 +223,18 @@ describe('standalone example docs', () => {
assert.doesNotMatch(readme, /python -m ktx_daemon semantic-validate/);
});
it('documents scan workflows in the docs site', async () => {
it('documents public context build workflows in the docs site', async () => {
const rootReadme = await readText('README.md');
const buildingContext = await readText('docs-site/content/docs/guides/building-context.mdx');
const scanReference = await readText('docs-site/content/docs/cli-reference/ktx-scan.mdx');
assert.match(buildingContext, /ktx scan <connection-id>/);
assert.match(buildingContext, /ktx ingest <connection-id>/);
assert.match(buildingContext, /ktx ingest --all/);
assert.match(buildingContext, /ktx status/);
assert.doesNotMatch(buildingContext, /ktx scan status <run-id>/);
assert.doesNotMatch(buildingContext, /ktx scan report <run-id>/);
assert.match(scanReference, /ktx scan <connectionId> \[options\]/);
assert.match(rootReadme, /raw-sources\//);
assert.match(rootReadme, /live-database\//);
assert.doesNotMatch(rootReadme, /live-database\//);
assert.doesNotMatch(rootReadme, /ktx scan/);
assert.doesNotMatch(rootReadme, /Run a local ingest smoke test/);
assert.doesNotMatch(rootReadme, /ktx ingest run --project-dir/);
assert.doesNotMatch(rootReadme, /ktx ingest status --project-dir/);

@@ -96,27 +96,20 @@ export function buildKtxYaml(postgresUrl) {
'storage:',
' state: sqlite',
' search: sqlite-fts5',
'ingest:',
' adapters:',
' - live-database',
'',
].join('\n');
}
export function buildLiveDatabaseIngestArgs(projectDir, databaseIntrospectionUrl) {
export function buildLiveDatabaseIngestArgs(projectDir, _databaseIntrospectionUrl, connectionId = 'warehouse') {
return [
'exec',
'ktx',
'ingest',
'run',
connectionId,
'--project-dir',
projectDir,
'--connection-id',
'warehouse',
'--adapter',
'live-database',
'--database-introspection-url',
databaseIntrospectionUrl,
'--fast',
'--no-input',
];
}
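Because this view interleaves removed and added lines, the resulting shape of the helper is easy to misread. The sketch below shows the function as it plausibly reads after the change. It assumes the removed entries are `'run'`, `'--connection-id'` with its `'warehouse'` value, `'--adapter'`, `'live-database'`, and `'--database-introspection-url'` with `databaseIntrospectionUrl`, which is consistent with the `ktx ingest warehouse --fast` label used in the smoke checks.

```javascript
// Sketch of the post-change helper (reconstructed from the diff, not copied
// from the repository). The second parameter is kept for call-site
// compatibility but is no longer used, hence the leading underscore.
function buildLiveDatabaseIngestArgs(projectDir, _databaseIntrospectionUrl, connectionId = 'warehouse') {
  return [
    'exec',
    'ktx',
    'ingest',
    connectionId, // positional connection id replaces --connection-id
    '--project-dir',
    projectDir,
    '--fast', // schema-only ingest
    '--no-input', // non-interactive, suitable for smoke tests
  ];
}

// Arguments are passed to `pnpm`, so the effective command line is:
console.log(['pnpm', ...buildLiveDatabaseIngestArgs('/tmp/project')].join(' '));
// prints "pnpm exec ktx ingest warehouse --project-dir /tmp/project --fast --no-input"
```

Defaulting `connectionId` to `'warehouse'` keeps existing two-argument callers working while letting new callers target another connection.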
@@ -324,12 +317,9 @@ async function main() {
env: managedRuntimeEnv(cleanInstallDir),
timeout: 120_000,
});
requireSuccess('ktx ingest run live-database', ingestRun);
requireOutput('ktx ingest run live-database', ingestRun, /Status: done/);
requireOutput('ktx ingest run live-database', ingestRun, /Adapter: live-database/);
requireOutput('ktx ingest run live-database', ingestRun, /Diff: \+4\/~0\/-0\/=0/);
requireOutput('ktx ingest run live-database', ingestRun, /Raw files: 4/);
requireOutput('ktx ingest run live-database', ingestRun, /Work units: 2/);
requireSuccess('ktx ingest warehouse --fast', ingestRun);
requireOutput('ktx ingest warehouse --fast', ingestRun, /Ingest finished/);
requireOutput('ktx ingest warehouse --fast', ingestRun, /Database schema/);
const runId = getRunId(ingestRun.stdout);
const ingestStatus = await run('pnpm', buildLiveDatabaseStatusArgs(projectDir, runId), {

@@ -50,7 +50,7 @@ describe('installed live-database artifact smoke helpers', () => {
);
});
it('writes a live-database-only KTX project config with SQLite local state', () => {
it('writes a public database ingest KTX project config with SQLite local state', () => {
assert.equal(
buildKtxYaml('postgresql://ktx:postgres@127.0.0.1:15432/warehouse'), // pragma: allowlist secret
[
@@ -63,9 +63,6 @@ describe('installed live-database artifact smoke helpers', () => {
'storage:',
' state: sqlite',
' search: sqlite-fts5',
'ingest:',
' adapters:',
' - live-database',
'',
].join('\n'),
);
@@ -98,20 +95,16 @@ describe('installed live-database artifact smoke helpers', () => {
]);
});
it('builds installed CLI live-database ingest and status commands', () => {
it('builds installed CLI public database ingest and status commands', () => {
assert.deepEqual(buildLiveDatabaseIngestArgs('/tmp/project', 'http://127.0.0.1:8765'), [
'exec',
'ktx',
'ingest',
'run',
'warehouse',
'--project-dir',
'/tmp/project',
'--connection-id',
'warehouse',
'--adapter',
'live-database',
'--database-introspection-url',
'http://127.0.0.1:8765',
'--fast',
'--no-input',
]);
assert.deepEqual(buildLiveDatabaseStatusArgs('/tmp/project', 'local-run-1'), [

@@ -653,10 +653,6 @@ try {
'scan:',
' enrichment:',
' mode: deterministic',
'ingest:',
' adapters:',
' - fake',
' - live-database',
'',
].join('\\n'),
'utf-8',
@@ -819,30 +815,32 @@ try {
requireOutput('ktx dev runtime stop', runtimeStop, /Stopped KTX Python daemon/);
process.stdout.write('ktx dev runtime daemon lifecycle verified\\n');
const structuralScan = await run('pnpm', ['exec', 'ktx', 'scan', 'warehouse',
const structuralScan = await run('pnpm', ['exec', 'ktx', 'ingest', 'warehouse',
'--project-dir',
projectDir,
'--fast',
'--no-input',
]);
requireProjectStderr('ktx scan structural', structuralScan, projectDir);
requireOutput('ktx scan structural', structuralScan, /Status: done/);
requireOutput('ktx scan structural', structuralScan, /Mode: structural/);
requireOutput('ktx scan structural', structuralScan, /Needs attention\\s+None/);
requireProjectStderr('ktx ingest fast', structuralScan, projectDir);
requireOutput('ktx ingest fast', structuralScan, /Ingest finished/);
requireOutput('ktx ingest fast', structuralScan, /Database schema/);
requireOutput('ktx ingest fast', structuralScan, /warehouse\\s+done/);
const structuralScanRunId = getRunId(structuralScan.stdout);
await access(join(projectDir, 'semantic-layer', 'warehouse', '_schema', 'public.yaml'));
process.stdout.write('ktx scan structural verified: ' + structuralScanRunId + '\\n');
process.stdout.write('ktx ingest fast verified: ' + structuralScanRunId + '\\n');
const enrichedScan = await run('pnpm', ['exec', 'ktx', 'scan', 'warehouse',
const enrichedScan = await run('pnpm', ['exec', 'ktx', 'ingest', 'warehouse',
'--project-dir',
projectDir,
'--mode',
'enriched',
'--deep',
'--no-input',
]);
requireProjectStderr('ktx scan enriched', enrichedScan, projectDir);
requireOutput('ktx scan enriched', enrichedScan, /Status: done/);
requireOutput('ktx scan enriched', enrichedScan, /Mode: enriched/);
requireOutput('ktx scan enriched', enrichedScan, /Enrichment artifacts:/);
requireProjectStderr('ktx ingest deep', enrichedScan, projectDir);
requireOutput('ktx ingest deep', enrichedScan, /Ingest finished/);
requireOutput('ktx ingest deep', enrichedScan, /Database schema/);
requireOutput('ktx ingest deep', enrichedScan, /warehouse\\s+done/);
const enrichedScanRunId = getRunId(enrichedScan.stdout);
process.stdout.write('ktx scan enriched verified: ' + enrichedScanRunId + '\\n');
process.stdout.write('ktx ingest deep verified: ' + enrichedScanRunId + '\\n');
await mkdir(join(sourceDir, 'orders'), { recursive: true });
await writeFile(join(sourceDir, 'orders', 'orders.json'), '{"name":"orders"}\\n', 'utf-8');

@@ -464,7 +464,7 @@ describe('verification snippets', () => {
assert.match(source, /node:sqlite/);
assert.match(source, /driver: sqlite/);
assert.match(source, /path: warehouse\.db/);
assert.match(source, /live-database/);
assert.doesNotMatch(source, /live-database/);
assert.match(source, /'--execute'/);
assert.match(source, /"mode": "compile_only"/);
assert.match(source, /"mode": "executed"/);
@@ -488,11 +488,11 @@ describe('verification snippets', () => {
assert.match(source, /ktx dev runtime stop/);
assert.doesNotMatch(source, /ktx dev runtime prune/);
assert.doesNotMatch(source, /staleRuntimeDir/);
assert.match(source, /run\('pnpm', \[\s*'exec',\s*'ktx',\s*'scan',\s*'warehouse'/);
assert.match(source, /'--mode',\s*'enriched'/);
assert.match(source, /run\('pnpm', \[\s*'exec',\s*'ktx',\s*'ingest',\s*'warehouse'/);
assert.match(source, /'--deep'/);
assert.doesNotMatch(source, /'--enrich'/);
assert.match(source, /ktx scan structural verified/);
assert.match(source, /ktx scan enriched verified/);
assert.match(source, /ktx ingest fast verified/);
assert.match(source, /ktx ingest deep verified/);
assert.match(source, /enrichment:/);
assert.match(source, /mode: deterministic/);
assert.match(source, /run\('pnpm', \['exec', 'ktx', 'ingest', 'run'/);