The TypeScript checks CI job does not install uv or Python, so the
module-level `execFileSync('uv', ...)` in schemas.contract.test.ts threw
ENOENT and failed the suite. Wrap the schema dump in a try/catch and
guard the describe block with `describe.skipIf` so the test skips in
environments without uv. Local dev and any CI job that has uv on PATH
still runs the cross-language contract assertion.
A transient sampleTable failure during ingest used to take out every
table in a connection: generateTableDescription returned a hardcoded
'Table not found' string into descriptions.ai, and KtxDescriptionGenerator
was constructed without a logger, so the failure left no trail anywhere.
- sampleTable / sampleColumn calls retry 3x with 200/400/800ms backoff,
honouring KtxScanContext.signal via a new KtxAbortedError.
- On retry exhaustion or missing capability, table generation falls back
to a metadata-only prompt built from column name / native type / comment
/ rawDescriptions. The column path follows the same rule -- call the
LLM when any of samples or rawDescriptions are available; skip only
when both are absent.
- Logger is now threaded from KtxScanContext into the generator. Failures
emit structured KtxScanWarning entries (new description_fallback_used
code, plus existing sampling_failed / enrichment_failed /
connector_capability_missing). ktx scan groups warnings by code so a
batch of identical failures collapses to one summary line plus sample.
- Returns null on failure instead of the 'Table not found' sentinel; the
manifest writer's existing guard already skips empty descriptions, so
schema YAML no longer carries misleading text. SCAN_MANAGED_DESCRIPTION_KEYS
already strips stale 'ai' on merge, so existing YAML clears on next run.
Also suppress AI SDK v6 'system in messages' warning: pull system messages
out of KtxMessageBuilder.wrapSimple's output via a new splitKtxSystemMessages
helper and pass them top-level to generateText (preserves cacheControl
providerOptions on the SystemModelMessage). Agent-runner's local
splitSystemPromptMessages dedupes onto the shared helper.
Overlay sources now have two distinct collections: `columns:` for computed
columns (requiring `expr` + `type`) and `column_overrides:` for metadata
patches to inherited manifest columns. Composing or loading an overlay that
mixes the two — or references an unknown column — fails with a typed error.
Introduce `ResolvedSemanticLayerSource` / `resolvedSourceSchema` /
`toResolvedWire` as the strict shape sent to the Python engine, and add a
schema contract test that diffs Zod against the Pydantic JSON schema dumped
by `python -m semantic_layer dump-schema`. `SourceDefinition` is now
`extra="forbid"` on the Python side.
`loadAllSources` surfaces per-file load errors instead of swallowing them,
so validation/query paths can report manifest shard parse failures.
* refactor(context): export and describe mapping shape schemas
* feat(context): add driver-schemas module with warehouse drivers
* feat(context): add metabase, looker, lookml driver schemas with mappings
* feat(context): add notion, dbt, metricflow driver schemas
* refactor(context): make connectionSchema a driver-discriminated union
* chore(context): re-export KtxConnectionConfig from project package
* docs(context): add connection driver schema plan
* chore(secrets): allowlist example credentials in driver-schemas fixtures
* test(cli): align metabase fixtures with required api_url field
The driver-discriminated union added in this branch now requires api_url
for metabase connections and a known driver for warehouses. Update slow
CLI test fixtures and assertions so they exercise the new schema:
- ingest.test-utils.ts: add api_url to the prod-metabase fixture.
- setup.test.ts: switch metabase fixture from 'url' to 'api_url'.
- local-scan-connectors.test.ts: invalid-driver/missing-driver tests now
expect the schema error from loadKtxProject (parse-time rejection).
composeOverlay was appending overlay columns to the manifest column list,
producing duplicate entries when dbt/metabase overlays declared a column
just to attach descriptions. The duplicates carried no `type`, so the
pydantic SourceDefinition rejected them at semantic-query time and broke
`ktx sl query` for every overlay-backed measure. Now overlay columns
match base columns by name (case-insensitive): same-name entries merge
onto the manifest (overlay fields win, type/role fall back to the base,
descriptions merge per source key) and only new names append.
Annotates the Zod config schema with .describe() text on every field and
adds generateKtxProjectConfigJsonSchema() plus a ktx dev schema command
that prints (or writes) a draft-07 JSON Schema for editors and LLM agents.
* feat(cli): extend `ktx connection test` to every supported driver
Dispatch by driver: native DBs now call `connector.testConnection()`
(was `introspect(dryRun)`), looker/notion/metabase hit their auth
endpoints, and dbt/metricflow/lookml run `git ls-remote` via the
existing `testRepoConnection` helper. Unknown drivers exit 1 with a
listing of supported ones.
* feat(cli): add `ktx connection test --all` summary list
Tests every configured connection in parallel and renders a single
Clack-style list (◇/│/◆/└, green ✓ / red ✗) consistent with sl list,
with per-row detail and a passed/failed footer. Exits non-zero if any
connection fails. Single-id `ktx connection test` output is preserved.
* fix(cli): read metabase status url from api_url
`ktx status` was probing `url` / `base_url` on metabase connections, but
ktx.yaml stores it as `api_url`, so the field always reported "url not
set". Read `api_url` directly and align the warning text with the actual
key.
* refactor(context): validate ktx.yaml with Zod and surface issues in status
- Replace hand-rolled ktx.yaml parsing with a strict Zod schema and
derive KtxProjectConfig types from it.
- Add validateKtxProjectConfig returning structured KtxConfigIssue[]
with migration hints for deprecated keys (ingest.llm,
scan.enrichment.backend, etc.).
- Wire ktx status/doctor to run validation, render schema issues in
plain and JSON output, and add a Config row to project status.
- Update the orbit example to camelCase scan.relationships keys to
match the schema.
* fix(context): tolerate legacy setup.completed_steps and optional driver
- Accept and drop the legacy setup.completed_steps field so existing
ktx.yaml files migrated from older versions still load.
- Make connections.<id>.driver optional in the schema; runtime code
already produces a clear "no driver" error at use time.
* feat(cli): add ktx status --validate to run only ktx.yaml schema validation
- New --validate flag dispatches a focused runKtxDoctor 'validate' branch
that reads ktx.yaml, runs validateKtxProjectConfig, and skips LLM,
connection, embedding, and query-history checks.
- Plain output prints a single Config row; JSON output emits
{ok: true} on success or the existing invalid_config / missing_project
shapes on failure.
* fix(llm): wire prompt caching through all Anthropic call sites
- page-triage classifier + light-extraction now put the static skill
prompt in `system:` so the per-document caches hit instead of
re-sending boilerplate in the user message every call.
- Description generation builders return `{ system, user }` with
instruction text + word limit moved into the cacheable system.
- Relationship-LLM proposal framing moved to `system:`.
- `KtxMessageBuilder.wrapSimple` skips the history breakpoint for
single-message calls (cache write that could never be reused).
- Gateway backend now sets `anthropic-beta: extended-cache-ttl-2025-04-11`
so 1h TTLs don't silently downgrade to 5m on Gateway routes.
* fix(llm): keep wrapSimple history breakpoint so multi-step agent loops cache
Reverts the wrapSimple `messages.length > 1` guard from the prior commit.
agent-runner uses wrapSimple with a single user message, but generateText
runs a multi-step tool loop inside it — the cache marker on the first user
message is reused by every subsequent step, so it isn't waste.
The release validator (scripts/validate-llm-debug-jsonl.mjs) also requires
a `message-part` marker target in captured debug JSONL.
* docs: add CLI component reuse guidance
* docs: add unified ingest ux design
* Refine unified ingest UX design after adversarial review iteration 1
* Refine unified ingest UX design after adversarial review iteration 2
* Refine unified ingest UX design after adversarial review iteration 3
* feat(cli): route public connection ingest command
* feat(cli): hide standalone scan from public help
* feat(cli): plan public ingest depth and query history
* feat(cli): execute public database ingest facets
* feat(ingest): read connection query history config
* fix(cli): use public ingest wording
* fix(config): stop generating ingest adapter allow lists
* docs: document public ingest command
* test: align ingest surface expectations
* docs: add unified ingest public CLI surface plan
* feat(cli): preflight deep public ingest readiness
* feat(setup): store query history in connection context
* feat(setup): store database context depth
* feat(setup): verify context readiness by database depth
* fix(setup): keep context build foreground only
* fix(config): reject reserved ingest connection ids
* test: close unified ingest v1 expectations
* docs: add unified ingest v1 closure plan
* fix(ingest): bypass adapter allow-list for public source ingest
* fix(ingest): honor query history window intent
* fix(ingest): hide scan internals from public database ingest
* feat(ingest): use foreground view for interactive public ingest
* fix(setup): use schema context and query history wording
* test(cli): verify unified ingest public output
* docs: add unified ingest v1 public output closure plan
* fix(setup): forward query history flags
* fix(setup): prompt for postgres query history
* fix(status): report query history readiness
* fix(ingest): remove legacy public guidance
* fix(ingest): polish foreground retry copy
* docs(examples): use unified query history wording
* chore(ingest): finish public query history cleanup
* docs: add unified ingest v1 query history status cleanup plan
* test(docs): cover unified ingest public docs
* docs: align ingest CLI reference with unified UX
* docs: update context build guides for unified ingest
* docs: update setup and primary source ingest wording
* docs: stop advertising adapter-backed example ingest
* docs: close unified ingest public docs gaps
* docs: add unified ingest v1 docs site closure plan
* fix: render unified ingest foreground warnings
* fix: explain query history schema order
* fix: add public ingest retry guidance
* fix: align setup next steps with unified ingest
* fix: remove scan wording from demo progress
* test: verify unified ingest ux closure
* docs: add unified ingest v1 foreground and retry closure plan
* fix(cli): preserve query-history pull config in public ingest
* fix(cli): omit hidden commands from docs command tree
* test(cli): close unified ingest final public surface checks
* docs: add unified ingest v1 final public surface closure plan
* fix(cli): use public source labels in ingest reports
* fix(cli): suppress low-level public ingest output
* test(cli): verify unified ingest public plain output
* docs: add unified ingest v1 public plain output closure plan
* fix(cli): add public ingest copy sanitizers
* fix(cli): sanitize public ingest progress copy
* fix(cli): rename setup schema scope prompt
* docs(plan): add progress copy closure; test: align setup back-nav fixture
Adds the iter9 plan and updates the setup back-navigation test fixture
to pass disableQueryHistory plus listSchemas/listTables stubs that the
unified ingest setup step now requires.
* docs(plan): add final ux labels plan with narrowed label scans
* fix(cli): aggregate unsupported query-history warnings
* fix(cli): align setup database labels
* test(cli): fix setup database test type-check
* fix(cli): remove primary-source wording from setup output
* test(cli): verify unified ingest setup closure
* docs(plan): add unified ingest v1 verification copy closure plan
* fix(cli): remove top-level scan command
* fix(cli): remove legacy ingest and wiki commands
* Merge scan into ingest flow
* feat(cli): split ingest progress into per-phase rows, rename work units to tasks
Each database target in the unified ingest dashboard now renders one row per
real subprocess (Schema, then Query history when enabled) instead of a single
combined bar. Each phase has its own monotonic 0-100% bar so the progress
never snaps back to zero when historic-sql starts after scan completes.
Completed phases keep their final bar, summary, and elapsed time visible as
an inline audit trail; queued and skipped phases are shown explicitly.
Also rename user-facing "work units" / "Failed work units" to "tasks" /
"Failed tasks" in ingest output and parseIngestSummary. The parser still
accepts the legacy "Work units:" wording in captured output for backward
compat. Internal memory-flow event names and type fields are left alone.
* Fix test harness failures
* Fix CI smoke checks
---------
Co-authored-by: Andrey Avtomonov <7889985+andreybavt@users.noreply.github.com>
Move setup step completion state out of ktx.yaml into a gitignored local
state file so it is not committed or shared across machines.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>