* feat(cli): define full warehouse dialect contract
* test(cli): keep dialect edge tests focused
* fix(cli): stabilize dialect contract foundation
* refactor(connectors): own read-only query preparation
* refactor(connectors): resolve dialects through registry
* refactor(connectors): keep concrete dialect classes internal
* chore(workspace): enforce dialect import boundary
* refactor(cli): resolve relationship dialect at scan boundary
* refactor(cli): use dialect display parsing for entity details
* refactor(cli): use dialect display parsing for warehouse catalog
* refactor(cli): use dialect SQL in relationship workflows
* test(cli): verify solid dialect scan workflow closure
* test: split cli tests from source tree
* refactor(cli): standardize BigQuery scope listing
* feat(sqlite): implement connector scope listing
* test(connectors): cover required table listing
* feat(cli): add warehouse driver registry
* refactor(setup): route scope discovery through driver registry
* refactor(cli): route local query execution through driver registry
* refactor(historic-sql): route dialect support through driver registry
* refactor(cli): test warehouse connections through driver registry
* fix(cli): close driver registry type export gaps
* Improve setup daemon diagnostics
* refactor(setup): centralize rail-prefixed diagnostics + query-history fallback
Extract errorMessage, writePrefixedLines, and flushPrefixedBufferedCommandOutput
into clack.ts so the setup wizard, managed daemons, and embedding/agent steps
share one rail-formatted writer. setup-databases.ts also adds a
"disable query history and retry" option when the schema-context build fails
and query history is the likely culprit, surfaced via a new
failed-query-history-unavailable status.
* fix(cli): carry catalog through the picker so BigQuery/Snowflake/SQL Server scope filters match
The setup picker's KtxTableListEntry was a 2-level { schema, name } shape, so
qualifiedTableId always wrote db.name into enabled_tables. When BigQuery,
Snowflake, or SQL Server later ran fast ingest, their introspect step filtered
the scope set with scopedTableNames(scope, { catalog: projectId|database, db })
— catalog was non-null on the introspect side but null in the scope refs, so
every entry was rejected, the live-database adapter staged zero table files,
and detect() failed with 'Adapter "live-database" did not recognize fetched
source output'.
Align the picker boundary with the canonical 3-level KtxTableRef:
- Add catalog: string | null to KtxTableListEntry.
- BigQuery/Snowflake/SQL Server listTables populate catalog from the
resolved projectId / database; Postgres/MySQL/ClickHouse/SQLite set null.
- qualifiedTableId emits catalog.schema.name when catalog is non-null
(resolveEnabledTables already accepts the 3-part shape) and
schemasFromEnabledTables now goes through parseDottedTableEntry so it
recovers the schema correctly from both 2-part and 3-part entries.
- Export parseDottedTableEntry from enabled-tables.ts (@internal) for picker
reuse.
Update listTables expectations in all seven connector tests and the setup /
picker test fixtures. Add a picker regression test that covers the
catalog-bearing round-trip (save + refine).
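Illustrative sketch of the 2-part / 3-part round-trip described above. This is not the shipped code: the `TableEntry` type and all three function bodies are assumptions reconstructed from this message, and the sketch assumes identifiers contain no dots.

```typescript
// Hypothetical stand-in for KtxTableListEntry; the real type may carry more fields.
interface TableEntry {
  catalog: string | null;
  schema: string;
  name: string;
}

// Emit catalog.schema.name when a catalog is present, schema.name otherwise.
function qualifiedTableId(entry: TableEntry): string {
  const parts =
    entry.catalog === null
      ? [entry.schema, entry.name]
      : [entry.catalog, entry.schema, entry.name];
  return parts.join(".");
}

// Recover the parts from a 2-part or 3-part dotted entry; anything else is rejected.
function parseDottedTableEntry(value: string): TableEntry | null {
  const parts = value.split(".");
  if (parts.some((part) => part.length === 0)) return null;
  if (parts.length === 2) {
    const [schema, name] = parts as [string, string];
    return { catalog: null, schema, name };
  }
  if (parts.length === 3) {
    const [catalog, schema, name] = parts as [string, string, string];
    return { catalog, schema, name };
  }
  return null;
}

// Collect distinct schemas, whichever shape each entry uses.
function schemasFromEnabledTables(enabled: string[]): string[] {
  const schemas = new Set<string>();
  for (const value of enabled) {
    const parsed = parseDottedTableEntry(value);
    if (parsed) schemas.add(parsed.schema);
  }
  return Array.from(schemas);
}
```

With this shape, a BigQuery-style entry such as `proj.sales.orders` and a Postgres-style entry such as `public.users` both yield the correct schema.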
* fix(cli): allow debug telemetry under opt-out env
* docs: rewrite context-as-code as reviewing-context guide
Move the page from Concepts to Guides and rebuild around an interactive
review-loop diagram. Extract pan/zoom + fit-view controls into a shared
FlowCanvas wrapper and adopt it across all three docs diagrams.
* test: point examples-docs assertion at reviewing-context
Update the doc smoke test that read context-as-code.mdx to read the new
guides/reviewing-context.mdx path. The `ktx ingest --all --no-input`
assertion still holds; the rename was the only break.
* refactor(workspace): relocate @ktx/llm source into packages/cli/src/llm
* refactor(workspace): rewrite @ktx/llm imports to relative paths
* refactor(workspace): fold internal packages into cli
* chore(workspace): gate dead-code with knip production mode
Turn on production-mode knip plus an autofix run in pre-commit and the
`pnpm dead-code` script, document the `/** @internal */` convention for
test-only exports in AGENTS.md, annotate test-only exports across the
CLI with that JSDoc, and drop dead exports/wrappers the new gate
surfaced (e.g. `cli-project.ts`, `lookerRuntimeSourceToFileAdapterSource`,
`createLocalScanEnrichmentProvidersFromConfig`,
`PGLITE_OWNER_PROCESS_BACKEND_CAPABILITIES`, stale type re-exports).
Replace the loose `ignoreIssues` allowlist in `knip.json` with explicit
production entries so cross-package barrel leaks are caught.
* refactor(cli): delete internal barrel index.ts files
The 34 `index.ts` re-export barrels inside `packages/cli/src/` were
holdovers from the pre-fold multi-workspace structure. Post-fold-in they
served no production purpose: external consumers go through the single
package main entry, and in-repo callers mostly imported through them
only because the path was short. Internally, knip flagged most barrel
re-exports as production-dead (only reached via tests).
This change:
- Deletes every internal barrel except `packages/cli/src/index.ts`
(the published package entry).
- Rewrites ~270 source/test files to import each name directly from
the file that defines it.
- Moves `tools/warehouse-verification/index.ts` to
`create-warehouse-verification-tools.ts` (the function it defined
locally) and updates its single consumer.
- Renames `search/backend-conformance.ts` → `.test-utils.ts` to match
the existing test-helper file convention.
- Deletes 13 dead test-only chains (dbt-descriptions/*,
live-database/extracted-schema, live-database/structural-sync,
relationship-* feedback/review chain) plus their tests and a
cascading orphan integration test.
- Updates test mocks that pointed at deleted barrel paths
(notion-client, connector barrels in scan/local-scan-connectors
tests) to mock the source files instead.
- Points the maintainer benchmark script
(`scripts/relationship-benchmark-report.mjs`) at source files
instead of `dist/context/scan/index.js`.
- Drops the barrel `!` entries from `knip.json`; adds explicit
production entries only for the benchmark code reached via dist by
the maintainer script.
Net: 413 files changed, ~1.2k insertions, ~9.4k deletions.
`pnpm run dead-code` (Biome + knip default + knip production) and
`pnpm run type-check` are clean; 2277 tests pass.
* refactor(workspace): rename @ktx/cli to @kaelio/ktx and pack it directly
Promote the CLI workspace package to the public name `@kaelio/ktx` and
drop the separate `scripts/build-public-npm-package.mjs` wrapper. The
CLI package is now publishable in place (`publishConfig.access: public`,
`provenance: true`), so artifact packing uses `pnpm pack` against
`packages/cli/` instead of assembling a parallel package tree.
Updates all workspace filter invocations, docs, tests, and release
readiness checks to reference the new package name, and folds the
tarball-name helper into `scripts/public-npm-release-metadata.mjs`.
* docs: align "agent clients" and "data agents" terminology
Replace "client agents" with "agent clients" and "database agents" with
"data agents" across AGENTS.md, README.md, the docs-site copy, and the
matching setup-agents test description, matching the canonical
vocabulary in docs/terminology.md.
Also moves packages/cli/tsconfig.json's tsBuildInfoFile from
node_modules/.cache/ to dist/.tsbuildinfo so incremental builds survive
node_modules reinstalls.
* refactor(release): single source of truth for package version
Make packages/cli/package.json the single source of truth for the
@kaelio/ktx version. publicNpmPackageVersion() now reads it directly,
so artifact filenames, release-readiness checks, and the Python wheel
version all derive from one field. The duplicate
release-policy.json.publicNpmPackageVersion is removed.
Previously the two fields could drift: tarballs were named
kaelio-ktx-0.4.1.tgz while internally containing
@kaelio/ktx@0.0.0-private.
- update-public-release-version.mjs rewrites both Python pyproject.toml
files (ktx-daemon, ktx-sl) alongside the npm package.jsons,
normalizing the version for PEP 440 (e.g. 0.1.0-rc.2 -> 0.1.0rc2).
- semantic-release-config.cjs adds the two pyproject.toml files to
@semantic-release/git assets so the release commit back to main
carries every version source in lockstep.
- The six "?? '0.0.0-private'" fallback literals across the CLI are
replaced with "?? getKtxCliPackageInfo().version", and
createDefaultKtxMcpServer makes its version arg required.
- docs/release.md describes the actual commit-back model: the dev tree
always reflects the most recent release; no sentinel pin to
maintain.
Verified: pnpm run artifacts:build now produces
kaelio-ktx-0.4.1.tgz and kaelio_ktx-0.4.1-py3-none-any.whl with
@kaelio/ktx@0.4.1 inside. Full type-check, dead-code, and
2287 vitests + 173 script tests pass.
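For reference, a minimal sketch of the semver-to-PEP 440 normalization mentioned above. The function name `toPep440`, the label table, and the accepted input grammar are assumptions; the real update-public-release-version.mjs may handle more cases.

```typescript
// Assumed mapping from semver prerelease labels to PEP 440 segments.
const PEP440_LABELS: Record<string, string> = { alpha: "a", beta: "b", rc: "rc" };

// Convert "X.Y.Z" or "X.Y.Z-<label>.<n>" into its PEP 440 spelling.
function toPep440(version: string): string {
  const match = /^(\d+\.\d+\.\d+)(?:-([a-z]+)\.(\d+))?$/.exec(version);
  if (!match) throw new Error(`Unsupported version: ${version}`);
  const release = match[1] as string;
  const label = match[2] as string | undefined;
  const serial = match[3] as string | undefined;
  if (label === undefined || serial === undefined) return release;
  const mapped = PEP440_LABELS[label];
  if (mapped === undefined) throw new Error(`Unsupported prerelease label: ${label}`);
  return `${release}${mapped}${serial}`;
}
```

Under these assumptions `0.1.0-rc.2` becomes `0.1.0rc2`, and a plain release such as `0.4.1` passes through unchanged.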
* refactor(cli): inject embedding provider resolution and detect sentence-transformers runtime
Make resolveProjectEmbeddingProvider and runtimeIo injectable in ingest and
scan command entrypoints so tests can stub them, and teach
resolvePublicIngestRuntimeRequirements to flag the local-embeddings runtime
feature when ktx.yaml selects sentence-transformers.
* chore(cli): mark buildLocalStatsStatus and LocalStatsStatus as @internal
Both symbols are consumed only by status-project.test.ts. Annotating them with
/** @internal */ keeps knip's production-mode check clean without changing
runtime behavior.
* fix(cli): use real package metadata in print-command-tree
The stubbed package name embedded a forbidden product identifier that
tripped the boundary check in CI. Read the metadata from package.json
instead, which keeps the rendered tree unchanged and removes a duplicate
source of truth.
* feat(cli): show embedding coverage in `ktx status`, drop duplicate disk counts
Inline `(N embedded)` next to the Wiki scope counts and Semantic-layer
source counts, computed with `SUM(embedding_json IS NOT NULL)` over
`knowledge_pages` and `local_sl_sources`. Rename the "Knowledge" label to
"Wiki" (canonical per `docs/terminology.md`) and rename the matching
`localStats.knowledgePages` field to `localStats.wikiPages`.
Drop `wiki=N md` and `semantic-layer=N yaml` from the Disk row — those
duplicated the per-surface rows above. Disk now reports only actual byte
usage (db, cache, raw-sources). The unused `wikiGlobalMarkdownCount` /
`semanticLayerYamlCount` fields, the `isMarkdownEntry` / `isYamlEntry`
helpers, and the `filter` arg on `summarizeDir` are removed.
Bare invocations now do the obvious thing instead of erroring out, and mode-as-subcommand patterns collapse into flags on the parent. No new top-level commands.
- `ktx ingest` (bare) ingests every configured connection. The `text` subcommand is gone; capture inline notes with `ktx ingest --text "..."` and files with `ktx ingest --file path` (use `-` for stdin). `--text`/`--file` reject a positional connection id; pass `--connection-id` to tag captured notes.
- `ktx connection` (bare) lists; `ktx connection test` (bare) tests every configured connection.
- `ktx wiki` and `ktx sl` flatten `list`/`search`: the bare command lists, and a `[query...]` positional searches (multi-word queries are joined with spaces). `sl validate` and `sl query` stay as distinct verbs and now read `--connection-id` from the parent.
- `ktx mcp` (bare) prints daemon status.
Adds a shared `resolveConnectionSelection` helper consumed by ingest and connection test. Updates README, docs-site cli-reference and guides, next-steps strings, agent SKILL templates, and all affected tests. Per-package type-check, unit tests (605), smoke tests, and dead-code checks all pass.
* docs: rewrite Semantic Querying concept with imperative-vs-declarative diagram
Reframe semantic-layer-internals.mdx around the contract the semantic
layer offers an agent: declare what you want (a Semantic Query), KTX
figures out how to compute it. Replaces the old "Context-Aware SQL"
framing with a clear imperative-vs-declarative narrative.
Adds a React Flow component (semantic-layer-flow.tsx) that contrasts a
buggy 4-table agent-authored SQL query (chasm trap, LEFT-JOIN-in-WHERE,
hardcoded DATE_TRUNC) against the chasm-safe per-fact CTE SQL the
planner actually emits, including the outer GROUP BY over the requested
dimensions. Both lanes converge into a shared warehouse node and each
SQL card now has parallel bullet notes (failures on the left, KTX
behavior on the right).
Side fixes bundled in:
- include the /ktx basePath in the favicon metadata so the icon resolves
under the production prefix
- migrate docs-site/middleware.ts to docs-site/proxy.ts (Next 16 rename)
- redirect / to /ktx/docs/getting-started/introduction so the apex docs
URL works
- add tests covering the apex redirect, the favicon basePath, and the
middleware-to-proxy rename
- propagate the Semantic Query terminology across the ktx-sl CLI
reference, the context-layer concept page, and the agent-clients /
primary-sources integration pages
* Fix CI dead-code failures
* docs-site: polish semantic-layer-internals code blocks and flow diagram
- Make CodeBlock a server component so children traverse synchronously
under React 19 RSC streaming; previously extractText returned "" in
dev SSR, leaving code blocks empty.
- Add custom JSON/YAML/SQL/code-like tokenizers with theme-aware token
classes; drop the colored file-glyph dot and gradient tab-head.
- Tighten tab-head: subtle grey background, smaller monospace filename
in muted grey, smaller rectangular language pill placed to the left
of the filename.
- Polish the React Flow semantic-layer diagram (controls, fit-view
padding, edge types).
* docs-site: annotate imperative SQL, add section anchor, drop ClickHouse
- Wire numbered red badges to each problematic span in the "Without KTX"
SQL with hover sync between SQL gutter, lines, and the notes list.
- Add #imperative-vs-declarative anchor on the flow section header so
the eyebrow link is shareable; reveals a # glyph on hover/focus.
- Align the compiled-SQL note dots to the first-line midpoint
(mt-[6px] instead of mt-1) so 4px dots sit at y=8 in a 16px line.
- Remove all ClickHouse references from docs-site (primary-sources,
quickstart, ktx-setup, contributing, agents-setup, mechanics test,
warehouse drivers in the flow diagram).
* test: drop ClickHouse contributing-docs assertion
Align the workspace-package mirror test with the ClickHouse removal
from docs-site (75907eb). The connector-clickhouse package still
exists in packages/, but contributing.mdx no longer lists it, so the
test that mirrored docs against the workspace was failing.
* fix(context): merge overlay columns onto manifest columns by name
composeOverlay was appending overlay columns to the manifest column list,
producing duplicate entries when dbt/metabase overlays declared a column
just to attach descriptions. The duplicates carried no `type`, so the
pydantic SourceDefinition rejected them at semantic-query time and broke
`ktx sl query` for every overlay-backed measure. Now overlay columns
match base columns by name (case-insensitive): same-name entries merge
onto the manifest (overlay fields win, type/role fall back to the base,
descriptions merge per source key) and only new names append.
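A minimal sketch of the merge rule, for reviewers. The `Column` shape and the function name `mergeOverlayColumns` are assumptions; the actual composeOverlay code is not reproduced here.

```typescript
// Hypothetical column shape; real manifest/overlay columns carry more fields.
interface Column {
  name: string;
  type?: string;
  role?: string;
  descriptions?: Record<string, string>;
}

// Merge overlay columns onto base columns by case-insensitive name.
// Same-name entries merge (overlay wins, type/role fall back to the base,
// descriptions merge per source key); only new names are appended.
function mergeOverlayColumns(base: Column[], overlay: Column[]): Column[] {
  const merged: Column[] = base.map((column) => ({ ...column }));
  const indexByName = new Map<string, number>();
  merged.forEach((column, index) => indexByName.set(column.name.toLowerCase(), index));

  for (const column of overlay) {
    const key = column.name.toLowerCase();
    const index = indexByName.get(key);
    if (index === undefined) {
      indexByName.set(key, merged.length);
      merged.push({ ...column });
      continue;
    }
    const existing = merged[index] as Column;
    merged[index] = {
      ...existing,
      ...column,
      type: column.type ?? existing.type,
      role: column.role ?? existing.role,
      descriptions: { ...existing.descriptions, ...column.descriptions },
    };
  }
  return merged;
}
```

An overlay that declares `amount` only to attach a description no longer produces a second, untyped `amount` entry; it inherits the base column's `type`.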
* refactor(sl): split overlay columns from column_overrides and enforce TS/Python wire contract
Overlay sources now have two distinct collections: `columns:` for computed
columns (requiring `expr` + `type`) and `column_overrides:` for metadata
patches to inherited manifest columns. Composing or loading an overlay that
mixes the two — or references an unknown column — fails with a typed error.
Introduce `ResolvedSemanticLayerSource` / `resolvedSourceSchema` /
`toResolvedWire` as the strict shape sent to the Python engine, and add a
schema contract test that diffs Zod against the Pydantic JSON schema dumped
by `python -m semantic_layer dump-schema`. `SourceDefinition` is now
`extra="forbid"` on the Python side.
`loadAllSources` surfaces per-file load errors instead of swallowing them,
so validation/query paths can report manifest shard parse failures.
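A rough sketch of the `columns:` / `column_overrides:` split as a validator. The error class, its codes, and the reading of "mixes the two" (a computed column that reuses a manifest column name) are assumptions for illustration, not the shipped types.

```typescript
// Hypothetical error codes; the real typed errors may differ.
type OverlayErrorCode =
  | "computed_column_incomplete"
  | "computed_column_shadows_manifest"
  | "override_unknown_column";

class OverlayError extends Error {
  code: OverlayErrorCode;
  constructor(code: OverlayErrorCode, message: string) {
    super(message);
    this.code = code;
  }
}

interface Overlay {
  columns?: { name: string; expr?: string; type?: string }[];
  column_overrides?: { name: string; description?: string }[];
}

// Throw a typed error when an overlay violates the split described above.
function validateOverlay(manifestColumns: string[], overlay: Overlay): void {
  const known = new Set(manifestColumns.map((name) => name.toLowerCase()));
  for (const column of overlay.columns ?? []) {
    if (!column.expr || !column.type) {
      throw new OverlayError("computed_column_incomplete", `${column.name} needs expr and type`);
    }
    if (known.has(column.name.toLowerCase())) {
      throw new OverlayError("computed_column_shadows_manifest", `${column.name} belongs in column_overrides`);
    }
  }
  for (const patch of overlay.column_overrides ?? []) {
    if (!known.has(patch.name.toLowerCase())) {
      throw new OverlayError("override_unknown_column", `${patch.name} is not a manifest column`);
    }
  }
}
```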
* fix(context): make scan description generation resilient and quiet
A transient sampleTable failure during ingest used to take out every
table in a connection: generateTableDescription returned a hardcoded
'Table not found' string into descriptions.ai, and KtxDescriptionGenerator
was constructed without a logger, so the failure left no trail anywhere.
- sampleTable / sampleColumn calls retry 3x with 200/400/800ms backoff,
honouring KtxScanContext.signal via a new KtxAbortedError.
- On retry exhaustion or missing capability, table generation falls back
to a metadata-only prompt built from column name / native type / comment
/ rawDescriptions. The column path follows the same rule -- call the
LLM when either samples or rawDescriptions are available; skip only
when both are absent.
- Logger is now threaded from KtxScanContext into the generator. Failures
emit structured KtxScanWarning entries (new description_fallback_used
code, plus existing sampling_failed / enrichment_failed /
connector_capability_missing). ktx scan groups warnings by code so a
batch of identical failures collapses to one summary line plus sample.
- Returns null on failure instead of the 'Table not found' sentinel; the
manifest writer's existing guard already skips empty descriptions, so
schema YAML no longer carries misleading text. SCAN_MANAGED_DESCRIPTION_KEYS
already strips stale 'ai' on merge, so existing YAML clears on next run.
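Sketch of the retry shape from the first bullet, under stated assumptions: "3x" is read as one initial attempt plus up to three retries with 200/400/800ms waits, and `AbortedError`, `sleep`, and `withRetry` are illustrative names rather than the real KtxAbortedError plumbing.

```typescript
// Hypothetical stand-in for KtxAbortedError.
class AbortedError extends Error {}

// Wait for `ms`, rejecting early if the signal aborts.
function sleep(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    if (signal?.aborted) {
      reject(new AbortedError("aborted"));
      return;
    }
    const onAbort = () => {
      clearTimeout(timer);
      reject(new AbortedError("aborted"));
    };
    const timer = setTimeout(() => {
      signal?.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    signal?.addEventListener("abort", onAbort, { once: true });
  });
}

// Run `task`, retrying after each delay in `delaysMs`; rethrow the last error
// once the delays are exhausted so the caller can fall back.
async function withRetry<T>(
  task: () => Promise<T>,
  signal?: AbortSignal,
  delaysMs: number[] = [200, 400, 800],
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= delaysMs.length; attempt += 1) {
    if (signal?.aborted) throw new AbortedError("aborted");
    try {
      return await task();
    } catch (error) {
      lastError = error;
      const delay = delaysMs[attempt];
      if (delay === undefined) break;
      await sleep(delay, signal);
    }
  }
  throw lastError;
}
```

A caller would catch the final error and switch to the metadata-only prompt instead of writing a sentinel description.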
Also suppress the AI SDK v6 'system in messages' warning: pull system messages
out of KtxMessageBuilder.wrapSimple's output via a new splitKtxSystemMessages
helper and pass them top-level to generateText (preserves cacheControl
providerOptions on the SystemModelMessage). Agent-runner's local
splitSystemPromptMessages is consolidated onto the shared helper.
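The split itself can be pictured as a simple partition. This sketch uses a simplified `Message` type and a hypothetical function name; it does not model AI SDK types or providerOptions.

```typescript
// Simplified message shape for illustration only.
type Message = { role: "system" | "user" | "assistant"; content: string };

// Partition messages so system entries can be passed separately from the rest,
// preserving the original order within each group.
function splitSystemMessages(messages: Message[]): { system: Message[]; rest: Message[] } {
  const system: Message[] = [];
  const rest: Message[] = [];
  for (const message of messages) {
    if (message.role === "system") {
      system.push(message);
    } else {
      rest.push(message);
    }
  }
  return { system, rest };
}
```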
* test(docs): align examples-docs assertions with revamped docs
PR #103 (setup/guide doc revamp) reworded several CLI examples and
connection labels; the assertions in scripts/examples-docs.test.mjs
still referenced the pre-revamp wording and were failing in CI on main.
Update the regexes to match the post-revamp content:
- drop the `--json` flag from the sl-query example expectation
- move the `Driver:` / `Status: ok` probe to the connection reference,
which is where that output now lives (driver id is lowercase
`postgres`, not the display name `PostgreSQL`)
- drop the obsolete `Install \`uv\`...` troubleshooting line
- accept `<connectionId>` everywhere; the docs no longer use the
hyphenated `<connection-id>` form
- match the `warehouse` connection id used in the quickstart instead of
the `postgres-warehouse` id only used in the README and setup ref
* fix(sl): skip TS/Python schema contract test when uv is unavailable
The TypeScript checks CI job does not install uv or Python, so the
module-level `execFileSync('uv', ...)` in schemas.contract.test.ts threw
ENOENT and failed the suite. Wrap the schema dump in a try/catch and
guard the describe block with `describe.skipIf` so the test skips in
environments without uv. Local dev and any CI job that has uv on PATH
still run the cross-language contract assertion.
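A sketch of the guard pattern, assuming a small helper named `tryRun` (hypothetical). The real test's command arguments are not shown in this message, so the usage is left as a comment.

```typescript
import { execFileSync } from "node:child_process";

// Run a command and return its stdout, or null when it cannot be spawned
// (for example ENOENT because the binary is not on PATH) or exits non-zero.
function tryRun(command: string, args: string[]): string | null {
  try {
    return execFileSync(command, args, {
      encoding: "utf8",
      stdio: ["ignore", "pipe", "ignore"],
    });
  } catch {
    return null;
  }
}

// In a Vitest suite the module-level dump would then gate the block, e.g.:
//   const dump = tryRun("uv", [/* args elided */]);
//   describe.skipIf(dump === null)("schema contract", () => { /* ... */ });
```

Because the failure is captured at module level instead of thrown, the suite reports a skip rather than an ENOENT crash.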
* refactor(context): validate ktx.yaml with Zod and surface issues in status
- Replace hand-rolled ktx.yaml parsing with a strict Zod schema and
derive KtxProjectConfig types from it.
- Add validateKtxProjectConfig returning structured KtxConfigIssue[]
with migration hints for deprecated keys (ingest.llm,
scan.enrichment.backend, etc.).
- Wire ktx status/doctor to run validation, render schema issues in
plain and JSON output, and add a Config row to project status.
- Update the orbit example to camelCase scan.relationships keys to
match the schema.
* fix(context): tolerate legacy setup.completed_steps and optional driver
- Accept and drop the legacy setup.completed_steps field so existing
ktx.yaml files migrated from older versions still load.
- Make connections.<id>.driver optional in the schema; runtime code
already produces a clear "no driver" error at use time.
* feat(cli): add ktx status --validate to run only ktx.yaml schema validation
- New --validate flag dispatches a focused runKtxDoctor 'validate' branch
that reads ktx.yaml, runs validateKtxProjectConfig, and skips LLM,
connection, embedding, and query-history checks.
- Plain output prints a single Config row; JSON output emits
{ok: true} on success or the existing invalid_config / missing_project
shapes on failure.
* docs: add CLI component reuse guidance
* docs: add unified ingest ux design
* Refine unified ingest UX design after adversarial review iteration 1
* Refine unified ingest UX design after adversarial review iteration 2
* Refine unified ingest UX design after adversarial review iteration 3
* feat(cli): route public connection ingest command
* feat(cli): hide standalone scan from public help
* feat(cli): plan public ingest depth and query history
* feat(cli): execute public database ingest facets
* feat(ingest): read connection query history config
* fix(cli): use public ingest wording
* fix(config): stop generating ingest adapter allow lists
* docs: document public ingest command
* test: align ingest surface expectations
* docs: add unified ingest public CLI surface plan
* feat(cli): preflight deep public ingest readiness
* feat(setup): store query history in connection context
* feat(setup): store database context depth
* feat(setup): verify context readiness by database depth
* fix(setup): keep context build foreground only
* fix(config): reject reserved ingest connection ids
* test: close unified ingest v1 expectations
* docs: add unified ingest v1 closure plan
* fix(ingest): bypass adapter allow-list for public source ingest
* fix(ingest): honor query history window intent
* fix(ingest): hide scan internals from public database ingest
* feat(ingest): use foreground view for interactive public ingest
* fix(setup): use schema context and query history wording
* test(cli): verify unified ingest public output
* docs: add unified ingest v1 public output closure plan
* fix(setup): forward query history flags
* fix(setup): prompt for postgres query history
* fix(status): report query history readiness
* fix(ingest): remove legacy public guidance
* fix(ingest): polish foreground retry copy
* docs(examples): use unified query history wording
* chore(ingest): finish public query history cleanup
* docs: add unified ingest v1 query history status cleanup plan
* test(docs): cover unified ingest public docs
* docs: align ingest CLI reference with unified UX
* docs: update context build guides for unified ingest
* docs: update setup and primary source ingest wording
* docs: stop advertising adapter-backed example ingest
* docs: close unified ingest public docs gaps
* docs: add unified ingest v1 docs site closure plan
* fix: render unified ingest foreground warnings
* fix: explain query history schema order
* fix: add public ingest retry guidance
* fix: align setup next steps with unified ingest
* fix: remove scan wording from demo progress
* test: verify unified ingest ux closure
* docs: add unified ingest v1 foreground and retry closure plan
* fix(cli): preserve query-history pull config in public ingest
* fix(cli): omit hidden commands from docs command tree
* test(cli): close unified ingest final public surface checks
* docs: add unified ingest v1 final public surface closure plan
* fix(cli): use public source labels in ingest reports
* fix(cli): suppress low-level public ingest output
* test(cli): verify unified ingest public plain output
* docs: add unified ingest v1 public plain output closure plan
* fix(cli): add public ingest copy sanitizers
* fix(cli): sanitize public ingest progress copy
* fix(cli): rename setup schema scope prompt
* docs(plan): add progress copy closure; test: align setup back-nav fixture
Adds the iter9 plan and updates the setup back-navigation test fixture
to pass disableQueryHistory plus listSchemas/listTables stubs that the
unified ingest setup step now requires.
* docs(plan): add final ux labels plan with narrowed label scans
* fix(cli): aggregate unsupported query-history warnings
* fix(cli): align setup database labels
* test(cli): fix setup database test type-check
* fix(cli): remove primary-source wording from setup output
* test(cli): verify unified ingest setup closure
* docs(plan): add unified ingest v1 verification copy closure plan
* fix(cli): remove top-level scan command
* fix(cli): remove legacy ingest and wiki commands
* Merge scan into ingest flow
* feat(cli): split ingest progress into per-phase rows, rename work units to tasks
Each database target in the unified ingest dashboard now renders one row per
real subprocess (Schema, then Query history when enabled) instead of a single
combined bar. Each phase has its own monotonic 0-100% bar so the progress
never snaps back to zero when historic-sql starts after scan completes.
Completed phases keep their final bar, summary, and elapsed time visible as
an inline audit trail; queued and skipped phases are shown explicitly.
Also rename user-facing "work units" / "Failed work units" to "tasks" /
"Failed tasks" in ingest output and parseIngestSummary. The parser still
accepts the legacy "Work units:" wording in captured output for backward
compat. Internal memory-flow event names and type fields are left alone.
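To illustrate the backward-compatible parsing, a minimal sketch follows. The line formats ("Tasks: N", "Failed tasks: N") and the function name `parseTaskCounts` are assumptions; the real parseIngestSummary reads more fields.

```typescript
// Parse task counts from captured ingest output, accepting both the new
// "Tasks:" wording and the legacy "Work units:" wording.
function parseTaskCounts(output: string): { tasks: number | null; failed: number | null } {
  const tasks = /^(?:Tasks|Work units):\s*(\d+)/im.exec(output);
  const failed = /^Failed (?:tasks|work units):\s*(\d+)/im.exec(output);
  return {
    tasks: tasks ? Number(tasks[1]) : null,
    failed: failed ? Number(failed[1]) : null,
  };
}
```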
* Fix test harness failures
* Fix CI smoke checks
---------
Co-authored-by: Andrey Avtomonov <7889985+andreybavt@users.noreply.github.com>