feat: merge ingest and scan

* docs: add CLI component reuse guidance

* docs: add unified ingest ux design

* Refine unified ingest UX design after adversarial review iteration 1

* Refine unified ingest UX design after adversarial review iteration 2

* Refine unified ingest UX design after adversarial review iteration 3

* feat(cli): route public connection ingest command

* feat(cli): hide standalone scan from public help

* feat(cli): plan public ingest depth and query history

* feat(cli): execute public database ingest facets

* feat(ingest): read connection query history config

* fix(cli): use public ingest wording

* fix(config): stop generating ingest adapter allow lists

* docs: document public ingest command

* test: align ingest surface expectations

* docs: add unified ingest public CLI surface plan

* feat(cli): preflight deep public ingest readiness

* feat(setup): store query history in connection context

* feat(setup): store database context depth

* feat(setup): verify context readiness by database depth

* fix(setup): keep context build foreground only

* fix(config): reject reserved ingest connection ids

* test: close unified ingest v1 expectations

* docs: add unified ingest v1 closure plan

* fix(ingest): bypass adapter allow-list for public source ingest

* fix(ingest): honor query history window intent

* fix(ingest): hide scan internals from public database ingest

* feat(ingest): use foreground view for interactive public ingest

* fix(setup): use schema context and query history wording

* test(cli): verify unified ingest public output

* docs: add unified ingest v1 public output closure plan

* fix(setup): forward query history flags

* fix(setup): prompt for postgres query history

* fix(status): report query history readiness

* fix(ingest): remove legacy public guidance

* fix(ingest): polish foreground retry copy

* docs(examples): use unified query history wording

* chore(ingest): finish public query history cleanup

* docs: add unified ingest v1 query history status cleanup plan

* test(docs): cover unified ingest public docs

* docs: align ingest CLI reference with unified UX

* docs: update context build guides for unified ingest

* docs: update setup and primary source ingest wording

* docs: stop advertising adapter-backed example ingest

* docs: close unified ingest public docs gaps

* docs: add unified ingest v1 docs site closure plan

* fix: render unified ingest foreground warnings

* fix: explain query history schema order

* fix: add public ingest retry guidance

* fix: align setup next steps with unified ingest

* fix: remove scan wording from demo progress

* test: verify unified ingest ux closure

* docs: add unified ingest v1 foreground and retry closure plan

* fix(cli): preserve query-history pull config in public ingest

* fix(cli): omit hidden commands from docs command tree

* test(cli): close unified ingest final public surface checks

* docs: add unified ingest v1 final public surface closure plan

* fix(cli): use public source labels in ingest reports

* fix(cli): suppress low-level public ingest output

* test(cli): verify unified ingest public plain output

* docs: add unified ingest v1 public plain output closure plan

* fix(cli): add public ingest copy sanitizers

* fix(cli): sanitize public ingest progress copy

* fix(cli): rename setup schema scope prompt

* docs(plan): add progress copy closure; test: align setup back-nav fixture

Adds the iter9 plan and updates the setup back-navigation test fixture
to pass disableQueryHistory plus listSchemas/listTables stubs that the
unified ingest setup step now requires.

* docs(plan): add final ux labels plan with narrowed label scans

* fix(cli): aggregate unsupported query-history warnings

* fix(cli): align setup database labels

* test(cli): fix setup database test type-check

* fix(cli): remove primary-source wording from setup output

* test(cli): verify unified ingest setup closure

* docs(plan): add unified ingest v1 verification copy closure plan

* fix(cli): remove top-level scan command

* fix(cli): remove legacy ingest and wiki commands

* Merge scan into ingest flow

* feat(cli): split ingest progress into per-phase rows, rename work units to tasks

Each database target in the unified ingest dashboard now renders one row per
real subprocess (Schema, then Query history when enabled) instead of a single
combined bar. Each phase has its own monotonic 0-100% bar so the progress
never snaps back to zero when historic-sql starts after scan completes.
Completed phases keep their final bar, summary, and elapsed time visible as
an inline audit trail; queued and skipped phases are shown explicitly.

Also rename user-facing "work units" / "Failed work units" to "tasks" /
"Failed tasks" in ingest output and parseIngestSummary. The parser still
accepts the legacy "Work units:" wording in captured output for backward
compat. Internal memory-flow event names and type fields are left alone.

* Fix test harness failures

* Fix CI smoke checks

---------

Co-authored-by: Andrey Avtomonov <7889985+andreybavt@users.noreply.github.com>
This commit is contained in:
Andrey Avtomonov 2026-05-14 01:43:06 +02:00 committed by GitHub
parent 1a472cf3ed
commit b00c1a11a9
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
118 changed files with 16890 additions and 2992 deletions

View file

@ -3,7 +3,7 @@ title: "ktx dev"
description: "Low-level project initialization and runtime management."
---
`ktx dev` contains development-only project initialization and managed runtime commands. Scan and ingest commands live at the root as [`ktx scan`](/docs/cli-reference/ktx-scan) and [`ktx ingest`](/docs/cli-reference/ktx-ingest).
`ktx dev` contains development-only project initialization and managed runtime commands. Context building lives at the root as [`ktx ingest`](/docs/cli-reference/ktx-ingest).
## Command signature

View file

@ -1,73 +1,59 @@
---
title: "ktx ingest"
description: "Run and inspect local ingest memory-flow output."
description: "Build or refresh KTX context from configured connections."
---
`ktx ingest` runs adapter-level local ingest and renders stored ingest reports.
`ktx ingest` builds or refreshes KTX context from configured connections.
Database connections build schema context. Context-source connections ingest
metadata from tools such as dbt, Looker, Metabase, MetricFlow, LookML, and
Notion.
## Command signature
```bash
ktx ingest <subcommand> [options]
ktx ingest [options] [connectionId]
```
## Subcommands
Use a connection id to build one configured connection. Use `--all` to build
every configured connection. Database connections run before context-source
connections when you use `--all`.
| Subcommand | Description |
|-----------|-------------|
| `run` | Run local ingest for one configured connection and source adapter |
| `status [runId]` | Print status for the latest or selected stored local ingest run or report file |
| `watch [runId]` | Open the latest or selected stored ingest visual report |
| `replay <runId>` | Replay a stored ingest run or bundle report through memory-flow output |
## `ingest run`
## Build options
| Flag | Description | Default |
|------|-------------|---------|
| `--connection-id <connectionId>` | KTX connection id | Required |
| `--adapter <adapter>` | Ingest source adapter name | Required |
| `--source-dir <path>` | Directory containing source files | — |
| `--database-introspection-url <url>` | Daemon URL for live-database introspection | — |
| `--debug-llm-request-file <path>` | Write sanitized LLM request structure to a JSONL file | — |
| `--all` | Build every configured connection | `false` |
| `--fast` | Use deterministic database schema ingest | Stored connection default, or `fast` |
| `--deep` | Use AI-enriched database ingest | Stored connection default, or `fast` |
| `--query-history` | Include database query-history usage patterns | Stored connection default |
| `--no-query-history` | Skip database query-history usage patterns for this run | Stored connection default |
| `--query-history-window-days <days>` | Query-history lookback window for this run | Stored connection default |
| `--plain` | Print plain text output | `true` |
| `--json` | Print JSON output | `false` |
| `--viz` | Render memory-flow TUI output | `false` |
| `--yes` | Install the managed Python runtime without prompting when required | `false` |
| `--no-input` | Disable interactive terminal input for visualization and runtime installation | — |
| `--no-input` | Disable interactive terminal input | `false` |
## `ingest status`, `watch`, and `replay`
| Flag | Description | Default |
|------|-------------|---------|
| `--report-file <path>` | Bundle ingest report JSON file to render | — |
| `--plain` | Print plain text output | `true` for `status` and `replay` |
| `--json` | Print JSON output | `false` |
| `--viz` | Render memory-flow TUI output | `true` for `watch` |
| `--no-input` | Disable interactive terminal input for visualization | — |
`--fast` and `--deep` are mutually exclusive. Depth flags apply only to
database connections. Query-history flags apply only to database connections
that support query history.
## Examples
```bash
ktx ingest run --connection-id my-dbt-source --adapter dbt
ktx ingest run --connection-id prod-metabase --adapter metabase --yes
ktx ingest status
ktx ingest status run-abc123
ktx ingest status --json
ktx ingest watch
ktx ingest watch run-abc123
ktx ingest replay run-abc123
ktx ingest replay run-abc123 --viz
ktx ingest replay run-abc123 --report-file /tmp/ingest-report.json
ktx ingest warehouse
ktx ingest warehouse --fast
ktx ingest warehouse --deep
ktx ingest warehouse --deep --query-history
ktx ingest warehouse --query-history-window-days 30
ktx ingest notion
ktx ingest --all
ktx ingest --all --deep
```
## Common errors
| Error | Cause | Recovery |
|-------|-------|----------|
| Ingest needs credentials | The source adapter requires API or git access | Configure the referenced environment variable or secret file |
| Ingest run cannot find adapter | `--adapter` does not match a supported source adapter | Use a configured adapter such as `dbt`, `metabase`, `looker`, `lookml`, `notion`, or `live-database` |
| Latest run not found | No ingest run has been started in this project | Run `ktx ingest run --connection-id <id> --adapter <adapter>` first |
| Report watch fails in a non-interactive shell | Visual report needs a terminal | Use `ktx ingest status --json` for agent and CI workflows |
| Connection not configured | The connection id is not present in `ktx.yaml` | Add the connection with `ktx setup` or update `ktx.yaml` |
| Deep readiness is missing | `--deep` or query history needs model, embedding, and scan-enrichment configuration | Run `ktx setup` or rerun with `--fast` |
| Query history is unsupported | The selected database driver does not support query history | Run schema ingest without query-history flags |
| No ingest target was selected | No connection id was provided and `--all` was omitted | Run `ktx ingest <connectionId>` or `ktx ingest --all` |

View file

@ -1,44 +0,0 @@
---
title: "ktx scan"
description: "Run standalone database scans."
---
Discover a configured database connection's schema, including tables, columns, types, constraints, and optional relationship signals.
## Command signature
```bash
ktx scan <connectionId> [options]
```
## Options
| Flag | Description | Default |
|------|-------------|---------|
| `--mode <mode>` | Scan mode: `structural`, `enriched`, or `relationships` | `structural` |
| `--dry-run` | Run without writing scan results | `false` |
| `--database-introspection-url <url>` | Daemon URL for live-database introspection | — |
| `--yes` | Install the managed Python runtime without prompting when required | `false` |
| `--no-input` | Disable interactive managed runtime installation | — |
## Examples
```bash
ktx scan my-warehouse
ktx scan my-warehouse --mode enriched
ktx scan my-warehouse --mode relationships
ktx scan my-warehouse --dry-run
ktx scan my-warehouse --database-introspection-url http://127.0.0.1:8765
```
## Output
`ktx scan` prints a human summary and writes scan artifacts under the KTX project directory unless `--dry-run` is set. Use `ktx status` after a scan to inspect project readiness and next setup work.
## Common errors
| Error | Cause | Recovery |
|-------|-------|----------|
| Scan cannot connect | Connection credentials or network access are invalid | Run `ktx connection test <connectionId>` and update the connection before scanning |
| Enriched scan cannot describe columns | LLM credentials are missing or invalid | Complete LLM setup with `ktx setup` before enriched scans |
| Relationship scan has limited evidence | The connector cannot provide optional validation or statistics | Re-run with a connector that supports the missing capability, or treat relationship output as lower-confidence context |

View file

@ -30,7 +30,7 @@ ktx setup [options]
| `--global` | Install agent integration into the global target scope (Claude Code and Codex only) | `false` |
The setup wizard is the public configuration interface. It prompts for LLM
credentials, embeddings, database connections, context sources, Historic SQL,
credentials, embeddings, database connections, context sources, query history,
and agent integration when those values are needed.
## Examples
@ -62,7 +62,7 @@ KTX project: /home/user/analytics
Project ready: yes
LLM ready: yes (claude-sonnet-4-6)
Embeddings ready: yes (text-embedding-3-small)
Primary sources configured: yes (postgres-warehouse)
Databases configured: yes (postgres-warehouse)
Context sources configured: yes (dbt-main)
KTX context built: yes
Agent integration ready: yes (codex:project)

View file

@ -1,6 +1,6 @@
---
title: "ktx wiki"
description: "List, read, search, or write wiki pages."
description: "List or search wiki pages."
---
Manage wiki pages in your KTX project. Wiki pages are Markdown documents that capture business definitions, rules, and gotchas. Agents search them for context when answering questions about your data.
@ -16,9 +16,7 @@ ktx wiki <subcommand> [options]
| Subcommand | Description |
|-----------|-------------|
| `list` | List local wiki pages |
| `read <key>` | Read one local wiki page |
| `search <query>` | Search local wiki pages |
| `write <key>` | Write one local wiki page |
## Options
@ -29,13 +27,6 @@ ktx wiki <subcommand> [options]
| `--json` | Print JSON output | `false` |
| `--user-id <id>` | Local user id | `local` |
### `wiki read`
| Flag | Description | Default |
|------|-------------|---------|
| `--json` | Print JSON output | `false` |
| `--user-id <id>` | Local user id | `local` |
### `wiki search`
| Flag | Description | Default |
@ -44,18 +35,6 @@ ktx wiki <subcommand> [options]
| `--user-id <id>` | Local user id | `local` |
| `--limit <number>` | Maximum search results | — |
### `wiki write`
| Flag | Description | Default |
|------|-------------|---------|
| `--user-id <id>` | Local user id | `local` |
| `--scope <scope>` | Scope: `global` or `user` | `global` |
| `--summary <summary>` | Wiki page summary (required) | — |
| `--content <content>` | Wiki page content (required) | — |
| `--tag <tag>` | Wiki tag; repeatable | — |
| `--ref <ref>` | Wiki ref; repeatable | — |
| `--sl-ref <ref>` | Semantic-layer ref; repeatable | — |
## Examples
```bash
@ -65,48 +44,17 @@ ktx wiki list
# List all wiki pages as JSON
ktx wiki list --json
# Read a specific wiki page
ktx wiki read revenue-definitions
# Read a specific wiki page as JSON
ktx wiki read revenue-definitions --json
# Search wiki pages
ktx wiki search "monthly recurring revenue"
# Search wiki pages as JSON
ktx wiki search "monthly recurring revenue" --json --limit 10
# Write a global wiki page
ktx wiki write revenue-definitions \
--summary "Canonical revenue metric definitions" \
--content "## MRR\nMonthly Recurring Revenue is calculated as..."
# Write a user-scoped wiki page
ktx wiki write my-notes \
--scope user \
--summary "Personal analysis notes" \
--content "Things to check when revenue numbers look off..."
# Write a page with tags and references
ktx wiki write churn-rules \
--summary "Churn calculation business rules" \
--content "A customer is considered churned when..." \
--tag finance \
--tag retention \
--sl-ref customers \
--sl-ref subscriptions
# Write a page with external references
ktx wiki write data-freshness \
--summary "Data pipeline SLAs and freshness guarantees" \
--content "The orders table refreshes every 15 minutes..." \
--ref "https://wiki.example.com/data-pipelines"
```
## Output
Wiki commands print local wiki pages and search results. Agents should search first, then read the most relevant page by key.
Wiki commands print local wiki page listings and search results. Open the
matching Markdown files directly when you need the full page contents.
```json
{
@ -128,6 +76,4 @@ Wiki commands print local wiki pages and search results. Agents should search fi
| Error | Cause | Recovery |
|-------|-------|----------|
| Search returns no results | The query terms do not match summaries, tags, or content | Retry with business synonyms, then create a page if the knowledge is missing |
| Read fails for a key | The page key is wrong or scoped to a different user | Run `ktx wiki list` or search again to get the exact key |
| Write fails due to missing fields | `--summary` or `--content` was omitted | Pass both fields, and keep the summary short enough for search results |
| Agent writes duplicate pages | It did not search existing pages first | Always run `ktx wiki search` before `ktx wiki write` |
| A page is missing | No Markdown file exists for that business context | Add a file under `wiki/` or run `ktx ingest <connectionId>` |

View file

@ -4,7 +4,6 @@
"pages": [
"ktx-setup",
"ktx-connection",
"ktx-scan",
"ktx-ingest",
"ktx-sl",
"ktx-wiki",