ktx/docs-site/content/docs/cli-reference/ktx-ingest.mdx
Andrey Avtomonov 2c9a58bb56
feat(cli): smart defaults and flatter command surface for ktx (#177)
Bare invocations now do the obvious thing instead of erroring out, and mode-as-subcommand patterns collapse into flags on the parent. No new top-level commands.

- `ktx ingest` (bare) ingests every configured connection. The `text` subcommand is gone; capture inline notes with `ktx ingest --text "..."` and files with `ktx ingest --file path` (use `-` for stdin). `--text`/`--file` reject a positional connection id; pass `--connection-id` to tag captured notes.
- `ktx connection` (bare) lists; `ktx connection test` (bare) tests every configured connection.
- `ktx wiki` and `ktx sl` flatten `list`/`search`: bare lists, with a `[query...]` positional searches (multi-word joined with spaces). `sl validate` and `sl query` stay as distinct verbs and now read `--connection-id` from the parent.
- `ktx mcp` (bare) prints daemon status.

Adds a shared `resolveConnectionSelection` helper consumed by ingest and connection test. Updates README, docs-site cli-reference and guides, next-steps strings, agent SKILL templates, and all affected tests. Per-package type-check, unit tests (605), smoke tests, and dead-code checks all pass.
2026-05-20 01:52:37 +02:00


---
title: "ktx ingest"
description: "Build or refresh KTX context, or capture text into KTX memory."
---
`ktx ingest` builds or refreshes KTX context from configured connections, and
can also capture free-form text into KTX memory. Database connections build
schema context. Context-source connections ingest metadata from tools such as
dbt, Looker, Metabase, MetricFlow, LookML, and Notion. Pass `--text` or
`--file` to capture inline text or text files into memory instead.
## Command signature
```bash
ktx ingest [options] [connectionId]
```
- Bare `ktx ingest` (no positional, no `--all`) ingests every configured
connection.
- `ktx ingest <connectionId>` ingests one configured connection.
- `ktx ingest --text "..."` (or `--file <path>`) captures notes into KTX
memory instead of ingesting a connection.
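
The invocation forms above, together with the capture restrictions described under Options, can be modeled as a small decision function. The sketch below is hypothetical POSIX shell that only makes the selection rules concrete. `classify_ingest` and its output strings are invented for illustration, this is not how `ktx` parses its arguments, and the one combination this page does not describe (`--all` with a positional id) is reported as unspecified.

```bash
# Hypothetical model of how `ktx ingest` arguments pick a run mode.
# classify_ingest and its output strings are illustrative; they are not part of ktx.
classify_ingest() {
  ci_all=0 ci_capture=0 ci_id="" ci_skip=0
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --all) ci_all=1 ;;
      --text|--file) ci_capture=1; ci_skip=1 ;;   # capture flags take a value
      --connection-id|--user-id|--query-history-window-days) ci_skip=1 ;;
      --*) ;;                                      # other flags do not change selection
      *) ci_id="$1" ;;                             # positional connection id
    esac
    shift
    # Consume the value that belongs to the previous flag, if any.
    if [ "$ci_skip" -eq 1 ] && [ "$#" -gt 0 ]; then shift; fi
    ci_skip=0
  done
  if [ "$ci_capture" -eq 1 ]; then
    if [ -n "$ci_id" ] || [ "$ci_all" -eq 1 ]; then
      echo "error: --text/--file reject a connection id and --all"
      return 1
    fi
    echo "capture"
  elif [ -n "$ci_id" ] && [ "$ci_all" -eq 1 ]; then
    echo "unspecified: --all with a connection id"  # not described on this page
    return 2
  elif [ -n "$ci_id" ]; then
    echo "one:$ci_id"
  else
    echo "all"                                      # bare invocation or --all
  fi
}
```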

Database connections run before context-source connections when more than one
connection is selected.
## Options
| Flag | Description | Default |
|------|-------------|---------|
| `--all` | Ingest all configured connections (same as bare invocation) | `false` |
| `--fast` | Use deterministic database schema ingest | Stored connection depth, or `fast` if none is stored |
| `--deep` | Use AI-enriched database ingest | Stored connection depth, or `fast` if none is stored |
| `--query-history` | Include database query-history usage patterns | Stored connection default |
| `--no-query-history` | Skip database query-history usage patterns for this run | Stored connection default |
| `--query-history-window-days <days>` | BigQuery/Snowflake query-history lookback window for this run | Stored connection default |
| `--text <content>` | Capture inline text into KTX memory; repeatable | `[]` |
| `--file <path>` | Capture a text file into KTX memory; use `-` for stdin; repeatable | `[]` |
| `--connection-id <connectionId>` | KTX connection id to tag captured text/file notes | - |
| `--user-id <id>` | Memory user id for text/file capture attribution | `local-cli` |
| `--fail-fast` | Stop after the first failed text/file item | `false` |
| `--plain` | Print plain text output | `true` |
| `--json` | Print JSON output | `false` |
| `--yes` | Install required managed runtime features without prompting | `false` |
| `--no-input` | Disable interactive terminal input | - |

`--fast` and `--deep` are mutually exclusive, and depth flags apply only to
database connections. Query-history flags apply only to database connections
that support query history. The window flag applies to BigQuery and Snowflake;
Postgres query history reads the current aggregates in `pg_stat_statements`
instead of a time-windowed history table. Query-history ingest runs after schema
ingest and requires deep ingest readiness.
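
The depth rules can be read as a precedence order: an explicit flag wins, then the depth stored on the connection, then `fast`. This hypothetical POSIX shell sketch encodes that reading of the Default column and the mutual-exclusion rule. `resolve_depth` is an illustrative name, not part of `ktx`, and its first argument stands in for the stored connection default (an empty string when none is stored).

```bash
# Hypothetical sketch: pick the ingest depth for one database connection.
# resolve_depth is illustrative, not part of ktx.
# $1 = stored connection default ("" if none); remaining args = CLI flags.
resolve_depth() {
  rd_stored="${1-}" rd_fast=0 rd_deep=0
  if [ "$#" -gt 0 ]; then shift; fi
  for rd_arg in "$@"; do
    case "$rd_arg" in
      --fast) rd_fast=1 ;;
      --deep) rd_deep=1 ;;
    esac
  done
  if [ "$rd_fast" -eq 1 ] && [ "$rd_deep" -eq 1 ]; then
    echo "error: --fast and --deep are mutually exclusive"
    return 1
  fi
  if [ "$rd_fast" -eq 1 ]; then echo fast            # explicit flag wins
  elif [ "$rd_deep" -eq 1 ]; then echo deep
  elif [ -n "$rd_stored" ]; then echo "$rd_stored"   # stored connection default
  else echo fast                                     # documented fallback
  fi
}
```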

When more than one connection is selected, database ingest runs first; source
ingest and memory updates then run for the context-source connections.
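
The ordering rule amounts to a partition: database connections first, then context-source connections. A hypothetical POSIX shell sketch of that partition follows. `order_for_ingest` and its input format (one `connectionId kind` pair per line) are invented for illustration, and keeping the configured order within each group is an assumption, not documented behavior.

```bash
# Hypothetical sketch: order selected connections the way this page describes.
# order_for_ingest is illustrative, not part of ktx.
# Input (stdin): one "connectionId kind" pair per line, kind = database | source.
order_for_ingest() {
  awk '
    $2 == "database" { print $1; next }   # databases run first
    { sources = sources $1 "\n" }         # everything else waits
    END { printf "%s", sources }          # then context sources, in input order
  '
}
```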

Some ingest paths use the managed KTX Python runtime: query-history ingest uses
it for SQL analysis, and Looker source ingest uses it for Looker identifier
parsing. In an interactive terminal, `ktx ingest` prompts before installing the
required runtime features. Use `--yes` to install them without prompting, or
`--no-input` to exit immediately with install guidance instead of prompting.
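
For unattended runs such as CI jobs, it can help to choose the runtime-install behavior explicitly rather than rely on terminal detection. The hypothetical helper below passes `--yes` when standard input is not a terminal. `ingest_runtime_flag` is an illustrative name, and treating "stdin is a terminal" as "interactive" is an assumption about how `ktx` decides, not documented behavior. Swap `--yes` for `--no-input` if unattended jobs should stop instead of installing.

```bash
# Hypothetical helper, not part of ktx: choose a runtime-install flag for ktx ingest.
# Assumes "interactive" means standard input is a terminal.
ingest_runtime_flag() {
  if [ -t 0 ]; then
    :                        # interactive: print nothing and let ktx prompt
  else
    printf '%s\n' "--yes"    # unattended: install runtime features without prompting
  fi
}

# Usage (left unquoted so an empty result adds no argument):
# ktx ingest warehouse --deep --query-history $(ingest_runtime_flag)
```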

`--text` and `--file` cannot be combined with a positional `connectionId` or
`--all`; pass `--connection-id <id>` instead to tag captured notes.
## Examples
```bash
# Build every configured connection (bare = --all)
ktx ingest
# Build one database or source connection
ktx ingest warehouse
# Force deterministic database schema ingest
ktx ingest warehouse --fast
# Force AI-enriched database ingest
ktx ingest warehouse --deep
# Include query-history usage patterns
ktx ingest warehouse --deep --query-history
# Set the lookback window for BigQuery or Snowflake query history
ktx ingest warehouse --query-history-window-days 30
# Build a source connection
ktx ingest notion
# Capture inline text into memory
ktx ingest --text "Refunds are excluded from net revenue."
# Capture multiple text snippets in one call
ktx ingest --text "Revenue is gross receipts." --text "Orders are completed purchases."
# Capture a local Markdown file into memory and tag it to a connection
ktx ingest --file docs/revenue-notes.md --connection-id warehouse
# Capture one stdin item
printf "Refunds are excluded from net revenue." | ktx ingest --file -
```
## Output
Plain output summarizes each target and the operations that ran.
```text
Ingest finished
Source     Database schema  Query history  Source ingest  Memory update
warehouse  done             done           skipped        skipped
notion     skipped          skipped        done           done
```
Use `--json` when a script or agent needs the selected plan and per-target
results.
## Inspect source ingest traces
Source ingest writes persistent JSONL traces for postmortem debugging. Plain
ingest output prints the trace path near the report, run, and job identifiers
when a trace is available:
```text
Report: report-abc123
Run: run-abc123
Job: job-abc123
Trace: .ktx/ingest-traces/job-abc123/trace.jsonl
```
The trace file lives under the project directory at
`.ktx/ingest-traces/<jobId>/trace.jsonl`. Each line is a JSON event with the
job id, run id, sync id, connection id, source key, phase, event name, timing,
state snapshot, decision context, and error details. Failed runs also write a
stored ingest report with `status: "failed"`, `failure.phase`,
`failure.message`, and the same trace path.

Use `jq` or another line-oriented tool to inspect a trace:
```bash
JOB_ID=job-abc123  # the Job id printed by ktx ingest
jq -c '{at, level, phase, event, durationMs, data, error}' \
  ".ktx/ingest-traces/$JOB_ID/trace.jsonl"
```
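
To narrow a long trace to the events that carry error details, a filter like the one below can help. It is a sketch, not a documented `ktx` command: `trace_errors` is an illustrative name, and it assumes that events without a failure have no `error` field or a null one.

```bash
# Hypothetical helper, not part of ktx: print only trace events with error details.
# Assumes non-failing events have no "error" field or a null one.
trace_errors() {
  jq -c 'select(.error != null) | {at, phase, event, error}' "$1"
}

# Usage:
# trace_errors ".ktx/ingest-traces/job-abc123/trace.jsonl"
```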
KTX writes `debug` trace events by default. Set `KTX_INGEST_TRACE_LEVEL` to
`error`, `info`, `debug`, or `trace` before running ingest to change the trace
verbosity:
```bash
KTX_INGEST_TRACE_LEVEL=trace ktx ingest metabase
```
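
The listed levels read naturally as a threshold, from least to most verbose: `error`, `info`, `debug`, `trace`. The hypothetical sketch below models that reading. The ordering and the idea that each level includes the ones before it are assumptions rather than documented behavior, and `level_rank` and `trace_enabled` are illustrative names that are not part of `ktx`.

```bash
# Hypothetical model, not part of ktx: assumes levels are cumulative thresholds
# ordered error < info < debug < trace.
level_rank() {
  case "$1" in
    error) echo 1 ;;
    info)  echo 2 ;;
    debug) echo 3 ;;
    trace) echo 4 ;;
    *)     echo 0 ;;   # unknown level
  esac
}

# Succeeds when an event at level $1 would be written under the configured level.
trace_enabled() {
  te_event=$(level_rank "$1")
  te_limit=$(level_rank "${KTX_INGEST_TRACE_LEVEL:-debug}")   # debug is the default
  [ "$te_event" -gt 0 ] && [ "$te_event" -le "$te_limit" ]
}
```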
## Common errors
| Error | Cause | Recovery |
|-------|-------|----------|
| Connection not configured | The connection id is not present in `ktx.yaml` | Add the connection with `ktx setup` or update `ktx.yaml` |
| Deep readiness is missing | `--deep` or query history needs model, embedding, and scan-enrichment configuration | Run `ktx setup` or rerun with `--fast` |
| Query history is unsupported | The selected database driver does not support query history | Run schema ingest without query-history flags |
| Python runtime is missing | The selected ingest target needs runtime-backed SQL analysis or source parsing | Accept the interactive prompt, rerun with `--yes`, or run the suggested `ktx admin runtime install` command |
| Source options were ignored | Depth and query-history flags were supplied for a non-database source | Omit database-only flags when ingesting source connections |
| Text ingest stops early | `--fail-fast` was used and one item failed | Fix the failed item or rerun without `--fail-fast` to collect all failures |