---
title: "ktx ingest"
description: "Build or refresh ktx context, or capture text into ktx memory."
---

`ktx ingest` builds or refreshes **ktx** context from configured connections, and
can also capture free-form text into **ktx** memory. Database connections build
enriched context — schema plus AI-generated descriptions, embeddings, and
relationship evidence — and require a configured model and embeddings.
Context-source connections ingest metadata from tools such as dbt, Looker,
Metabase, MetricFlow, LookML, and Notion. Pass `--text` or `--file` to capture
inline text or text files into memory instead.

## Command signature

```bash
ktx ingest [options] [connectionId]
```

- Bare `ktx ingest` (no positional argument, no `--all`) ingests every
  configured connection.
- `ktx ingest <connectionId>` ingests one configured connection.
- `ktx ingest --text "..."` (or `--file <path>`) captures notes into **ktx**
  memory instead of ingesting a connection.

Database connections run before context-source connections when more than one
connection is selected.

## Options

| Flag | Description | Default |
|------|-------------|---------|
| `--all` | Ingest all configured connections (same as bare invocation) | `false` |
| `--query-history` | Include database query-history usage patterns | Stored connection default |
| `--no-query-history` | Skip database query-history usage patterns for this run | Stored connection default |
| `--query-history-window-days <days>` | BigQuery/Snowflake query-history lookback window for this run | Stored connection default |
| `--text <content>` | Capture inline text into **ktx** memory; repeatable | `[]` |
| `--file <path>` | Capture a text file into **ktx** memory; use `-` for stdin; repeatable | `[]` |
| `--connection-id <connectionId>` | **ktx** connection id to tag captured text/file notes | - |
| `--user-id <id>` | Memory user id for text/file capture attribution | `local-cli` |
| `--fail-fast` | Stop after the first failed text/file item | `false` |
| `--plain` | Print plain text output | `true` |
| `--json` | Print JSON output | `false` |
| `--yes` | Install required managed runtime features without prompting | `false` |
| `--no-input` | Disable interactive terminal input | - |

Database ingest always builds enriched context and requires a configured model
and embeddings (run `ktx setup`); connections without that configuration fail
before any work starts. Query-history flags apply only to database connections
that support query history. The window flag applies to BigQuery and Snowflake;
Postgres reads the current `pg_stat_statements` aggregate data instead of a
time-windowed history table. Query-history ingest runs after the schema scan.

When more than one connection is selected, database ingest runs first; the
context-source ingest and memory updates for context-source connections run
afterward.

Some ingest paths use the managed **ktx** Python runtime. Query-history ingest uses
it for SQL analysis, and Looker context-source ingest uses it for Looker identifier
parsing. In an interactive terminal, `ktx ingest` prompts before installing the
required runtime features. Use `--yes` to install them without prompting, or
use `--no-input` to fail fast with install guidance.

`--text` and `--file` cannot be combined with a positional `connectionId` or
`--all`; pass `--connection-id <id>` instead to tag captured notes.
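
As a sketch of how the repeatable `--file` flag combines with `--connection-id`, the shell function below captures every Markdown file in a directory in a single call. The function name `capture_notes` and the directory layout are hypothetical and not part of **ktx**; only the flags come from the options table.

```bash
# Hypothetical helper (not part of ktx): capture all *.md files in a directory
# as memory notes tagged to one connection.
capture_notes() {
  local dir="$1" connection_id="$2" f
  set --                              # reuse "$@" as the argument list
  for f in "$dir"/*.md; do
    [ -e "$f" ] || continue           # an unmatched glob stays literal; skip it
    set -- "$@" --file "$f"           # --file is repeatable
  done
  if [ "$#" -eq 0 ]; then
    echo "no Markdown files found in $dir" >&2
    return 1
  fi
  ktx ingest "$@" --connection-id "$connection_id"
}
```

Calling `capture_notes docs/notes warehouse` would expand to one `ktx ingest --file ... --file ... --connection-id warehouse` invocation. Append `--fail-fast` to the final command if the run should stop at the first failed item.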

## Examples

```bash
# Build every configured connection (bare = --all)
ktx ingest

# Build one database or context-source connection
ktx ingest warehouse

# Include query-history usage patterns
ktx ingest warehouse --query-history

# Set the lookback window for BigQuery or Snowflake query history
ktx ingest warehouse --query-history-window-days 30

# Build a context-source connection
ktx ingest notion

# Capture inline text into memory
ktx ingest --text "Refunds are excluded from net revenue."

# Capture multiple text snippets in one call
ktx ingest --text "Revenue is gross receipts." --text "Orders are completed purchases."

# Capture a local Markdown file into memory and tag it to a connection
ktx ingest --file docs/revenue-notes.md --connection-id warehouse

# Capture one stdin item
printf "Refunds are excluded from net revenue." | ktx ingest --file -
```

## Output

Plain output summarizes each target and the operations that ran.

```text
Ingest finished

Source         Database schema  Query history  Source ingest  Memory update
warehouse      done             done           skipped        skipped
notion         skipped          skipped        done           done
```

Use `--json` when a script or agent needs the selected plan and per-target
results.
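
A script might save that output for later inspection. The wrapper below is a minimal sketch: the function name `ci_ingest` and the default file name are hypothetical, and it assumes `--json` writes only the JSON document to stdout. Passing `--yes` keeps the run unattended if managed runtime features need installing.

```bash
# Hypothetical wrapper (not part of ktx): run an unattended ingest and keep
# the JSON result in a file.
ci_ingest() {
  local connection_id="$1" out_file="${2:-ingest-result.json}"
  # Assumption: stdout carries only the JSON document when --json is set.
  ktx ingest "$connection_id" --yes --json > "$out_file"
}
```

For example, `ci_ingest warehouse build/ingest.json` would leave the plan and per-target results in `build/ingest.json` for a later step to read.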

## Inspect context-source ingest traces

Context-source ingest writes persistent JSONL traces for postmortem debugging.
Plain ingest output prints the trace path near the report, run, and job
identifiers when a trace is available:

```text
Report: report-abc123
Run: run-abc123
Job: job-abc123
Trace: .ktx/ingest-traces/job-abc123/trace.jsonl
```

The trace file lives under the project directory at
`.ktx/ingest-traces/<jobId>/trace.jsonl`. Each line is a JSON event with the
job id, run id, sync id, connection id, source key, phase, event name, timing,
state snapshot, decision context, and error details. Failed runs also write a
stored ingest report with `status: "failed"`, `failure.phase`,
`failure.message`, and the same trace path.

Use `jq` or line-oriented tools to inspect a trace:

```bash
# Replace job-abc123 with the job id printed in the ingest output
jq -c '{at, level, phase, event, durationMs, data, error}' \
  .ktx/ingest-traces/job-abc123/trace.jsonl
```
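
Because traces sit at a predictable path, a small helper can locate the most recent one without copying the job id by hand. This is a sketch only: the function name `latest_trace` is hypothetical, and it assumes the newest file by modification time belongs to the run you want.

```bash
# Hypothetical helper (not part of ktx): print the path of the most recently
# modified trace file under a project directory (default: current directory).
latest_trace() {
  local project_dir="${1:-.}"
  # ls -t sorts newest first; errors are silenced when no trace exists yet.
  ls -t "$project_dir"/.ktx/ingest-traces/*/trace.jsonl 2>/dev/null | head -n 1
}
```

It can then feed `jq`, for example `jq -c 'select(.level == "error")' "$(latest_trace)"` to show only error-level events.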

**ktx** writes `debug` trace events by default. Set `KTX_INGEST_TRACE_LEVEL` to
`error`, `info`, `debug`, or `trace` before running ingest to change the trace
verbosity:

```bash
KTX_INGEST_TRACE_LEVEL=trace ktx ingest metabase
```

### Profiling a slow ingest

Each timed phase and work unit records a `durationMs` in the trace, and each
agent loop records its step count and token usage. To see where wall-clock time
went, enable profiling; **ktx** then prints a rolled-up breakdown to stderr at
the end of the run. There are two ways to turn it on and two output formats.

Turn it on per run with the `KTX_PROFILE_INGEST` environment variable, or
persistently with `ingest.profile` in `ktx.yaml` (useful for CI or while
iterating on a slow source):

```bash
KTX_PROFILE_INGEST=1 ktx ingest metabase       # human-readable table
KTX_PROFILE_INGEST=json ktx ingest metabase    # raw JSON for coding agents
```

```yaml
ingest:
  profile: true   # human table; use "json" for the machine-readable form
```

Both formats report total wall time, time per phase, and the slowest work units,
splitting each work unit's agent-loop time into model time versus tool-execution
time. The `json` form emits the full structured profile (raw milliseconds and
token counts, stable keys) plus a one-line `summary.headline` diagnosis, so a
coding agent can parse it directly instead of scraping the table. If both the
environment variable and the config request profiling, `json` wins. Example
headline:

```text
Slowest phase: reconciliation (2m 05s, 48% of wall time). 2 work units (1 failed), ~88% model generation vs ~12% tools.
```
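
Since the profile goes to stderr, a script can keep it separate from the normal ingest output. The sketch below is hypothetical (the function name `profile_ingest` and the default file name are not part of **ktx**), and it assumes stderr carries nothing but the JSON profile, which may not hold if the run also logs warnings there.

```bash
# Hypothetical helper (not part of ktx): run one ingest with JSON profiling
# and save the profile to a file.
profile_ingest() {
  local connection_id="$1" profile_file="${2:-ingest-profile.json}"
  # The profile is printed to stderr, so redirect fd 2 rather than stdout.
  KTX_PROFILE_INGEST=json ktx ingest "$connection_id" 2> "$profile_file"
}
```

After `profile_ingest metabase`, a tool such as `jq -r '.summary.headline' ingest-profile.json` could print the one-line diagnosis, provided the file contains only the profile.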

Work units run serially by default (`ingest.workUnits.maxConcurrency` is `1`);
raise it in `ktx.yaml` if the profile shows the run is bound by serialized
work-unit agent loops.

## Common errors

| Error | Cause | Recovery |
|-------|-------|----------|
| Connection not configured | The connection id is not present in `ktx.yaml` | Add the connection with `ktx setup` or update `ktx.yaml` |
| Enrichment is not configured | Database ingest needs a model, embeddings, and scan-enrichment configuration | Run `ktx setup` to configure a model and embeddings |
| Query history is unsupported | The selected database driver does not support query history | Run ingest without query-history flags |
| Python runtime is missing | The selected ingest target needs runtime-backed SQL analysis or source parsing | Accept the interactive prompt, rerun with `--yes`, or run the suggested `ktx admin runtime install` command |
| Context-source options were ignored | Query-history flags were supplied for a context-source connection | Omit database-only flags when ingesting context-source connections |
| Text ingest stops early | `--fail-fast` was used and one item failed | Fix the failed item or rerun without `--fail-fast` to collect all failures |