feat(cli)!: remove fast mode; ktx ingest always builds enriched context (KLO-721) (#237)

Fast mode (the ktx ingest --fast/--deep database-ingest depth toggle) is removed.
ktx ingest now always builds the full enriched ("deep") context. There is no
structural fallback: a database connection without a configured model and
embeddings fails the enrichment-readiness preflight before any work runs, with
a 'Run ktx setup to configure a model and embeddings' hint.

- Remove --fast/--deep flags, the per-connection context.depth field, and the
  ktx setup depth prompt (delete setup-database-context-depth.ts).
- Rename ingest-depth.ts -> connection-drivers.ts; ingest always requests scan
  mode 'enriched'; readiness gate (enrichmentReadinessGaps) runs for every
  database target.
- Drop the database-context-depth telemetry step (Node + Python schema mirrors
  regenerated).
- Update CLI, setup, context-build view, docs, the public ktx skill, and the
  release-smoke / artifacts scripts (which now assert the no-LLM guard failure).
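
The readiness gate described above can be sketched roughly as follows. This is
an illustration only, not the code in this commit: `enrichmentReadinessGaps` is
named in the message, but its signature, the `ProjectConfig` and
`ConnectionConfig` shapes, and the `preflight` helper are assumptions made for
the example.

```typescript
// Hypothetical shapes: the real ktx config types are not shown in this commit.
interface ConnectionConfig {
  id: string;
  kind: "database" | "context-source";
}

interface ProjectConfig {
  model?: string;
  embeddings?: string;
}

// Assumed signature: returns the names of the missing enrichment settings.
// An empty list means the project is ready for enriched ingest.
function enrichmentReadinessGaps(project: ProjectConfig): string[] {
  const gaps: string[] = [];
  if (!project.model) gaps.push("model");
  if (!project.embeddings) gaps.push("embeddings");
  return gaps;
}

// Hypothetical preflight: throws before any ingest work starts when at least
// one database target is selected and enrichment is not configured.
function preflight(project: ProjectConfig, targets: ConnectionConfig[]): void {
  const gaps = enrichmentReadinessGaps(project);
  const hasDatabaseTarget = targets.some((t) => t.kind === "database");
  if (hasDatabaseTarget && gaps.length > 0) {
    throw new Error(
      `Enrichment is not configured (missing: ${gaps.join(", ")}). ` +
        "Run ktx setup to configure a model and embeddings",
    );
  }
}
```

In this sketch, context-source targets pass through untouched, which mirrors
the statement that the gate runs for every database target.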

ktx status --fast (a separate network-probe flag) is unchanged.

Follow-ups: KLO-726 (live progress for ktx ingest --all), KLO-727 (restore
credentialed successful-ingest release smoke coverage).
Andrey Avtomonov 2026-05-29 17:41:04 +02:00 committed by GitHub
parent 637891f030
commit 3f0d11e07d
34 changed files with 222 additions and 884 deletions


@ -5,9 +5,11 @@ description: "Build or refresh ktx context, or capture text into ktx memory."
`ktx ingest` builds or refreshes **ktx** context from configured connections, and
can also capture free-form text into **ktx** memory. Database connections build
schema context. Context-source connections ingest metadata from tools such as
dbt, Looker, Metabase, MetricFlow, LookML, and Notion. Pass `--text` or
`--file` to capture inline text or text files into memory instead.
enriched context — schema plus AI-generated descriptions, embeddings, and
relationship evidence — and require a configured model and embeddings.
Context-source connections ingest metadata from tools such as dbt, Looker,
Metabase, MetricFlow, LookML, and Notion. Pass `--text` or `--file` to capture
inline text or text files into memory instead.
## Command signature
@ -29,8 +31,6 @@ connection is selected.
| Flag | Description | Default |
|------|-------------|---------|
| `--all` | Ingest all configured connections (same as bare invocation) | `false` |
| `--fast` | Use deterministic fast database ingest | Stored connection default, or `fast` |
| `--deep` | Use deep database ingest with AI-generated descriptions, embeddings, and relationship evidence | Stored connection default, or `fast` |
| `--query-history` | Include database query-history usage patterns | Stored connection default |
| `--no-query-history` | Skip database query-history usage patterns for this run | Stored connection default |
| `--query-history-window-days <days>` | BigQuery/Snowflake query-history lookback window for this run | Stored connection default |
@ -44,12 +44,12 @@ connection is selected.
| `--yes` | Install required managed runtime features without prompting | `false` |
| `--no-input` | Disable interactive terminal input | - |
`--fast` and `--deep` are mutually exclusive. Depth flags apply only to
database connections. Query-history flags apply only to database connections
Database ingest always builds enriched context and requires a configured model
and embeddings (run `ktx setup`); connections without that configuration fail
before any work starts. Query-history flags apply only to database connections
that support query history. The window flag applies to BigQuery and Snowflake;
Postgres reads the current `pg_stat_statements` aggregate data instead of a
time-windowed history table. Query-history ingest runs after fast ingest and
requires deep ingest readiness.
time-windowed history table. Query-history ingest runs after the schema scan.
When more than one connection is selected, database ingest runs first, then
context-source ingest and memory updates run for context-source connections.
@ -72,14 +72,8 @@ ktx ingest
# Build one database or context-source connection
ktx ingest warehouse
# Force deterministic fast database ingest
ktx ingest warehouse --fast
# Force deep database ingest with AI enrichment
ktx ingest warehouse --deep
# Include query-history usage patterns
ktx ingest warehouse --deep --query-history
ktx ingest warehouse --query-history
# Set the lookback window for BigQuery or Snowflake query history
ktx ingest warehouse --query-history-window-days 30
@ -154,8 +148,8 @@ KTX_INGEST_TRACE_LEVEL=trace ktx ingest metabase
| Error | Cause | Recovery |
|-------|-------|----------|
| Connection not configured | The connection id is not present in `ktx.yaml` | Add the connection with `ktx setup` or update `ktx.yaml` |
| Deep readiness is missing | `--deep` or query history needs model, embedding, and scan-enrichment configuration | Run `ktx setup` or rerun with `--fast` |
| Query history is unsupported | The selected database driver does not support query history | Run fast ingest without query-history flags |
| Enrichment is not configured | Database ingest needs a model, embeddings, and scan-enrichment configuration | Run `ktx setup` to configure a model and embeddings |
| Query history is unsupported | The selected database driver does not support query history | Run ingest without query-history flags |
| Python runtime is missing | The selected ingest target needs runtime-backed SQL analysis or source parsing | Accept the interactive prompt, rerun with `--yes`, or run the suggested `ktx admin runtime install` command |
| Context-source options were ignored | Depth and query-history flags were supplied for a context-source connection | Omit database-only flags when ingesting context-source connections |
| Context-source options were ignored | Query-history flags were supplied for a context-source connection | Omit database-only flags when ingesting context-source connections |
| Text ingest stops early | `--fail-fast` was used and one item failed | Fix the failed item or rerun without `--fail-fast` to collect all failures |


@ -131,8 +131,8 @@ BigQuery; and `databases` for ClickHouse.
Query history setup is supported for Postgres, BigQuery, and Snowflake. The
window flag applies to BigQuery and Snowflake; Postgres reads the current
`pg_stat_statements` aggregate data instead of a time-windowed history table.
Enabling query history makes deep ingest readiness matter for later
`ktx ingest` runs.
Later `ktx ingest` runs build enriched context and need a configured model and
embeddings, including when query history is enabled.
When query history is enabled for PostgreSQL, Snowflake, or BigQuery,
`ktx setup` runs a non-blocking readiness probe after the connection test


@ -66,8 +66,9 @@ read, how to think, and where to put the results.
## Minimal config
A working `ktx.yaml` needs one entry in `connections`. Everything else accepts
defaults. The example below is enough for `ktx ingest warehouse` to run a fast
schema scan against a local Postgres.
defaults. The example below registers a local Postgres connection; building
context with `ktx ingest warehouse` also needs a model and embeddings, which
`ktx setup` configures.
```yaml
connections:
@ -123,7 +124,7 @@ context-source drivers share the map.
Warehouse connections are open objects: the listed fields are validated, and
any other field is preserved and passed through to the connector. Use
`enabled_tables` to scope deep ingest to a specific list of
`enabled_tables` to scope ingest to a specific list of
`schema.table` names - useful for smoke tests.
```yaml


@ -236,7 +236,7 @@ Testing warehouse
Connection test passed
Building schema context for warehouse
Running fast database ingest
Running database scan
```
If setup exits early, rerun `ktx setup` in the same directory. **ktx** keeps
@ -268,13 +268,13 @@ Agent integration ready: yes (codex:project)
For a structured check inside scripts, use `ktx status --json`.
When setup builds deep context, its final context check looks like:
When setup finishes building context, its final context check looks like:
```text
ktx context is ready for agents.
Databases:
warehouse: deep context complete
warehouse: database context complete
Context sources:
dbt_main: memory update complete
@ -326,7 +326,7 @@ ktx setup \
Then build context:
```bash
ktx ingest warehouse --fast
ktx ingest warehouse
```
See [ktx setup](/docs/cli-reference/ktx-setup) for the full automation flag


@ -24,7 +24,9 @@ external metadata can attach to known warehouse tables.
## Database ingest
Database ingest records table, column, type, constraint, and row-count context.
Database ingest always builds enriched context: tables, columns, types,
constraints, and row counts, plus AI-generated descriptions, embeddings, and
relationship evidence.
```bash
# Build one configured database connection
@ -34,23 +36,8 @@ ktx ingest warehouse
ktx ingest --all
```
Depth controls how much context **ktx** builds:
| Flag | Best for | What it does |
|------|----------|--------------|
| `--fast` | First setup, quick refreshes, CI smoke checks | Deterministic fast ingest with tables, columns, types, constraints, and row counts |
| `--deep` | Agent-ready context for real analysis | Fast ingest plus deep enrichment with descriptions, embeddings, relationship evidence, and optional query history |
Examples:
```bash
ktx ingest warehouse --fast
ktx ingest warehouse --deep
ktx ingest --all --deep
```
Deep ingest needs LLM and embedding readiness. Otherwise run `ktx setup` or use
`--fast`.
Enriched ingest needs a configured model and embeddings. Run `ktx setup` first;
connections without that configuration fail before any work starts.
With `claude-code`, **ktx** agent loops can invoke only the **ktx** MCP tools for the
current run.
@ -64,7 +51,7 @@ Enable it during setup, store it under `connections.<id>.context.queryHistory`,
or request it for one run:
```bash
ktx ingest warehouse --deep --query-history
ktx ingest warehouse --query-history
# Set the lookback window for BigQuery or Snowflake query history
ktx ingest warehouse --query-history-window-days 30
```
@ -74,8 +61,8 @@ for one run.
## Relationship evidence
**ktx** scores relationship candidates during supported deep database ingest. The
public CLI does not expose separate relationship review subcommands.
**ktx** scores relationship candidates during database ingest. The public CLI
does not expose separate relationship review subcommands.
## Context-source ingest
@ -159,7 +146,7 @@ After interactive setup:
```bash
ktx status
ktx ingest --all --deep
ktx ingest --all
ktx status
```
@ -176,8 +163,8 @@ ktx wiki "revenue" --json --limit 10
| Symptom | Likely cause | Recovery |
|---------|--------------|----------|
| Connection not configured | The connection id is missing from `ktx.yaml` | Add it with `ktx setup` |
| Deep readiness is missing | LLM or embeddings are not setup-ready | Run `ktx setup`, or rerun with `--fast` |
| Query history is unsupported | The selected database driver does not expose query history | Run fast ingest without query-history flags |
| Enrichment is not configured | LLM or embeddings are not setup-ready | Run `ktx setup` to configure a model and embeddings |
| Query history is unsupported | The selected database driver does not expose query history | Run ingest without query-history flags |
| No connections configured | The project has no entries under `connections` | Run `ktx setup` and add a database or context-source connection |
| Context-source flags have no effect | Depth and query-history flags were supplied for a context-source connector | Use those flags only for database connections |
| Context-source flags have no effect | Query-history flags were supplied for a context-source connector | Use query-history flags only for database connections |
| Text ingest stops early | `--fail-fast` stopped on the first failed item | Fix the item or rerun without `--fail-fast` |


@ -111,12 +111,13 @@ non-obvious terms.
Agents can refresh context when the user asks them to:
```bash
ktx ingest warehouse --fast
ktx ingest warehouse
ktx ingest
ktx ingest --file docs/revenue-notes.md --connection-id warehouse
```
Use `--deep` only when LLM and embedding setup is ready.
Database ingest builds enriched context and requires a configured model and
embeddings; run `ktx setup` first if they are not ready.
## Good agent behavior


@ -517,5 +517,5 @@ No authentication required - SQLite is file-based. The file must be readable by
| Connection URL appears in git diff | A literal credential URL was written to `ktx.yaml` | Replace it with `env:NAME` or `file:/path/to/secret` and rotate exposed credentials |
| Database ingest returns no tables | Schema, database, or project filter is wrong, or the user lacks metadata permissions | Verify the schema list and grant metadata read permissions |
| Query history is empty | Query history extension or warehouse history view is unavailable | Enable the warehouse-specific history feature, then rerun `ktx ingest <connectionId> --query-history` or `ktx setup` |
| Column statistics are missing | Connector cannot access stats tables or the warehouse does not expose them | Grant stats permissions where supported; otherwise rely on fast schema context |
| Column statistics are missing | Connector cannot access stats tables or the warehouse does not expose them | Grant stats permissions where supported; otherwise rely on schema-level context without column statistics |
| Semantic query execution fails | Connection is missing, unreachable, or query execution is disabled | Run `ktx connection test <id>` and check the `ktx sl query` flags |