mirror of
https://github.com/Kaelio/ktx.git
synced 2026-06-07 07:55:13 +02:00
---
title: Building Context
description: Build database and source context from configured KTX connections.
---

Building context reads your configured connections and writes local context that
agents can use. Database connections produce schema context, and source
connections such as dbt, Looker, Metabase, and Notion produce semantic sources
and wiki pages.

## Database ingest

Database ingest connects to your warehouse and extracts structural metadata.
KTX stores the results locally so agents can understand your schema without
querying the database directly.

### Running database ingest

```bash
ktx ingest <connection-id>
```

This runs a fast schema ingest by default. Choose the depth with one of these
flags:

| Flag | What it does |
|------|-------------|
| `--fast` | Tables, columns, types, constraints, and row counts |
| `--deep` | Fast ingest plus AI-enriched database context |

```bash
# Build one connection quickly
ktx ingest my-postgres --fast

# Build AI-enriched database context
ktx ingest my-postgres --deep

# Build all configured connections
ktx ingest --all
```

### Checking results

Every ingest prints a summary and writes local artifacts. Use `ktx status`
after ingest to review project readiness and follow-up setup work:

```bash
ktx status
```

### Relationship detection

Many databases lack declared foreign keys. KTX infers relationships by scoring column pairs across seven signals: name similarity, type compatibility, value overlap, embedding similarity, profile uniqueness, null rate, and structural priors. The weighted score determines each candidate's status:

| Score range | Status | Meaning |
|-------------|--------|---------|
| ≥ 0.85 | `accepted` | High confidence; applied automatically |
| 0.55 to < 0.85 | `review` | Plausible; needs human review |
| < 0.55 | `rejected` | Low confidence; not applied |

Deep database ingest can include relationship evidence where the connector can
provide it. Relationship review and calibration subcommands are not part of the
current public CLI surface.

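The thresholds in the table can be illustrated with a minimal sketch. It is not KTX's implementation: the signal keys, the equal weights, and the assumption that every signal is already normalized to `[0, 1]` with higher meaning stronger evidence are all hypothetical. Only the `0.85` and `0.55` cut-offs come from the table.

```python
# Hypothetical signal keys mirroring the seven signals named in the text.
SIGNALS = (
    "name_similarity",
    "type_compatibility",
    "value_overlap",
    "embedding_similarity",
    "profile_uniqueness",
    "null_rate",
    "structural_prior",
)

# Assumption: equal weights. The real weights are not documented here.
WEIGHTS = {signal: 1.0 / len(SIGNALS) for signal in SIGNALS}

ACCEPT_THRESHOLD = 0.85  # from the status table
REVIEW_THRESHOLD = 0.55  # from the status table


def weighted_score(signals: dict) -> float:
    """Combine per-signal scores in [0, 1] into one weighted score.

    Signals missing from the input count as 0.0.
    """
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in SIGNALS)


def classify(score: float) -> str:
    """Map a weighted score to the status used in the table."""
    if score >= ACCEPT_THRESHOLD:
        return "accepted"
    if score >= REVIEW_THRESHOLD:
        return "review"
    return "rejected"


if __name__ == "__main__":
    candidate = {name: 0.9 for name in SIGNALS}
    print(classify(weighted_score(candidate)))  # prints "accepted"
```

With unequal weights the same structure applies; only the `WEIGHTS` mapping changes, as long as the weights sum to 1 so the score stays in `[0, 1]`.
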
## Ingestion

Ingestion pulls semantic context from your existing analytics tools, such as dbt projects, Looker models, and Metabase questions, and writes it into your KTX project as semantic sources and wiki pages.

### How it works

Each ingest run follows this flow:

1. An **adapter** extracts metadata from your tool (dbt manifest, LookML files, Metabase API, etc.)
2. An **LLM agent** reconciles the extracted metadata with your existing context, merging changes rather than overwriting what is already there
3. **Semantic sources** (YAML) and **wiki pages** (Markdown) are written to your project directory

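The three steps above can be sketched as a small pipeline. This is an illustration only: `reconcile` is a deterministic stand-in for the LLM agent, and the rule that existing values win is an assumption. The function names and the dict-based context shape are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical context shape: model name -> field name -> value.
Context = Dict[str, Dict[str, str]]


def reconcile(existing: Context, extracted: Context) -> Context:
    """Merge extracted metadata into existing context without overwriting.

    Stand-in for step 2. Assumption: existing values win, and only new
    models and new fields are added.
    """
    merged = {name: dict(fields) for name, fields in existing.items()}
    for name, fields in extracted.items():
        target = merged.setdefault(name, {})
        for key, value in fields.items():
            target.setdefault(key, value)
    return merged


def run_ingest(
    extract: Callable[[], Context],
    existing: Context,
    write: Callable[[Context], None],
) -> Context:
    """Run the extract, reconcile, write flow once."""
    extracted = extract()                    # step 1: adapter
    merged = reconcile(existing, extracted)  # step 2: reconcile
    write(merged)                            # step 3: write to the project
    return merged
```

Passing `extract` and `write` as callables keeps the sketch independent of any particular tool or file layout.
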
### Running an ingest

```bash
ktx ingest my-dbt-source
```

Useful output flags:

| Flag | Description |
|------|-------------|
| `--json` | Output as JSON |
| `--plain` | Plain text output |

Context builds run in the foreground and do not detach into background control
sessions. If a run is interrupted, rerun `ktx ingest <connection-id>` or
`ktx ingest --all`.

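Rerunning after an interruption can be wrapped in a small script. The sketch below assumes that a non-zero exit code means the run did not finish, which this page does not state. The helper names and the attempt limit are hypothetical; only the `ktx ingest <connection-id>` and `ktx ingest --all` invocations come from the text.

```python
import subprocess
from typing import Callable, Optional, Sequence

Runner = Callable[[Sequence[str]], int]


def run_cli(argv: Sequence[str]) -> int:
    """Run a command in the foreground and return its exit code."""
    return subprocess.run(list(argv), check=False).returncode


def ingest_with_rerun(
    connection_id: Optional[str] = None,
    max_attempts: int = 3,
    runner: Runner = run_cli,
) -> int:
    """Run `ktx ingest`, rerunning until it succeeds or attempts run out.

    Assumption: exit code 0 means success, anything else means rerun.
    Returns the exit code of the last attempt.
    """
    if connection_id:
        argv = ["ktx", "ingest", connection_id]
    else:
        argv = ["ktx", "ingest", "--all"]

    code = 1
    for _ in range(max_attempts):
        code = runner(argv)
        if code == 0:
            break
    return code
```

The `runner` parameter exists so the retry logic can be exercised without invoking the CLI.
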
### Supported context sources

| Driver | Source | What gets ingested |
|--------|--------|--------------------|
| `dbt` | dbt project | Model definitions, column descriptions, tests, tags |
| `metricflow` | MetricFlow semantic models | Metrics, dimensions, entities, semantic joins |
| `lookml` | LookML files | Views, explores, dimensions, measures, joins |
| `looker` | Looker API | Explores, looks, dashboard metadata |
| `metabase` | Metabase API | Questions, dashboards, table metadata |
| `notion` | Notion API | Database pages, knowledge articles |

Query history is a facet of a database connection rather than a separate source
driver. Enable it with `connections.<id>.context.queryHistory` or pass
`--query-history` for the current run. See
[Context Sources](/docs/integrations/context-sources) for driver-specific setup
and auth configuration.

### What gets generated

A typical dbt ingest produces semantic sources and wiki pages in your project:

**Semantic source** (`semantic-layer/my-postgres/orders.yaml`):

```yaml title="semantic-layer/my-postgres/orders.yaml"
name: orders
table: public.orders
grain:
  - order_id
columns:
  - name: order_id
    type: string
    description: Unique order identifier
  - name: customer_id
    type: string
    description: Foreign key to customers table
  - name: order_date
    type: time
    role: time
    description: Date the order was placed
  - name: total_amount
    type: number
    description: Total order value in USD
measures:
  - name: total_revenue
    expr: SUM(total_amount)
    description: Sum of all order values
  - name: order_count
    expr: COUNT(DISTINCT order_id)
    description: Number of distinct orders
joins:
  - to: customers
    on: orders.customer_id = customers.customer_id
    relationship: many_to_one
```

**Wiki page** (`wiki/global/order-status-definitions.md`):

```markdown
---
summary: Business definitions for order status values
tags: [orders, definitions]
sl_refs: [orders]
---

## Order Statuses

- **pending**: Order placed but not yet processed
- **confirmed**: Payment received, awaiting fulfillment
- **shipped**: Order dispatched to carrier
- **delivered**: Order received by customer
- **cancelled**: Order cancelled before shipment

Orders in "pending" status for more than 48 hours are flagged for review.
```

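To show how the front matter of a wiki page like the one above might be read by a script, here is a minimal sketch using only the standard library. It handles just the flat `key: value` and `key: [a, b]` forms shown in the example and is not a general YAML parser. The function name `parse_wiki_page` is hypothetical and not part of KTX.

```python
from typing import Dict, List, Tuple, Union

MetaValue = Union[str, List[str]]


def parse_wiki_page(text: str) -> Tuple[Dict[str, MetaValue], str]:
    """Split a wiki page into front matter fields and the Markdown body.

    Returns ({}, text) when the page has no complete front matter block.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text

    # Find the closing `---` of the front matter block.
    end = None
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            end = index
            break
    if end is None:
        return {}, text

    meta: Dict[str, MetaValue] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not `key: value`
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            # Inline list such as `[orders, definitions]`.
            items = value[1:-1].split(",")
            meta[key.strip()] = [item.strip() for item in items if item.strip()]
        else:
            meta[key.strip()] = value

    body = "\n".join(lines[end + 1:]).strip()
    return meta, body
```

For the example page, the `sl_refs` field parses to `["orders"]`, which matches the name of the semantic source it refers to.
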
### Ingest transcripts

Every ingest session records a full transcript: tool calls, LLM responses, and
write decisions. Inspect the stored transcript files when you need to debug why
a source was written a certain way.