Compare commits

74 commits
v0.5.0 ... main

Author SHA1 Message Date
Luca Martial
bf1fe9748e
docs: minor README and docs-site touch-ups (#266)
- Link the Y Combinator badge and the docs "by Kaelio" label
- Add a maintainer line to the README
- Set the npm author field on @kaelio/ktx

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 22:32:08 -04:00
Andrey Avtomonov
698efdcef8
feat(cli): add channel-aware update notifier (#265)
* feat(cli): show cached update notices after commands

* docs(cli): describe update notices

* fix(cli): type update check environment

* fix(cli): decouple update notice display from refresh and harden suppression

Display a cached "update available" notice based solely on the lastNoticeAt
24h throttle, independent of checkedAt refresh freshness, matching the design's
independent display/refresh decisions. Suppress the check unconditionally under
--json, CI, and non-TTY before consulting output-mode preferences, so a
KTX_OUTPUT=pretty override can no longer make CI/non-TTY contexts phone npm.
2026-06-06 10:42:10 +02:00
Luca Martial
377f21acd7
docs: add serving-phase diagram to the introduction page (#264)
* feat(docs): add serving-phase diagram to the introduction page

The introduction's "How ktx works" section described both the ingest and serve sides but only rendered the ingestion diagram. Add a live, theme-aware React Flow diagram for the serving phase (agent <-> ktx via MCP -> context layer + database) so both phases are shown, with a matching content test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(diagram-studio): relabel context edge and use right-angle routing

The hub->context edge searches and reads definitions, not just searches; relabel it "search + read". Route the serving search/read-only edges with smoothstep (right angles) to match the docs diagram. (The README PNG is a baked export and is unchanged until re-exported from the studio.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(docs): point product-mechanics assertions at the FlowCanvas wrapper

product-mechanics renders via the shared FlowCanvas wrapper, so the ReactFlow config (nodesDraggable, zoomOnScroll, etc.) lives there now. Update the stale assertions that still expected those literals inline, fixing a pre-existing test failure.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(serving-diagram): shrink the boxes and drop OpenCode from the agent list

Reduce node dimensions, font sizes, padding, and the canvas height so the serving diagram renders ~25% smaller and more compact. Remove OpenCode from the agent's listed clients.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 19:22:45 -04:00
Luca Martial
d3e20df1d5
fix(docs-site): stop doubling the /ktx basePath on alias-host redirects (#263)
ktx.sh/ and docs.ktx.sh/ redirected to
https://docs.kaelio.com/ktx/ktx/docs/... (note the doubled /ktx) and 404'd.

The host-agnostic `source: "/"` redirect ran before the alias-host
canonicalizers, so it injected the /ktx basePath into the path on the alias
domains, which the alias catch-all then prepended a second time.

Reorder redirects() so alias-host canonicalization runs first, leaving the
generic root/docs rules for the local/canonical host only. The /stars
exclusion stays because redirects run before beforeFiles rewrites.

Add Host-spoofing regression tests (the prior tests only used localhost,
which never exercised the alias-host rules) and remove the vestigial
website/vercel.json, which the live ktx.sh routing already bypasses.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 15:05:22 -04:00
github-actions[bot]
d14227468b chore: refresh star history chart [skip ci] 2026-06-05 18:44:32 +00:00
Andrey Avtomonov
fb7b94b60e
feat(telemetry): collect PostHog $exception error reports in CLI and daemon (#262)
* feat(telemetry): add node exception reporter

* feat(telemetry): report node cli exceptions

* feat(telemetry): add daemon exception reporter

* feat(telemetry): report daemon exceptions

* docs(telemetry): document error reports

* fix(telemetry): pass redaction snapshots from node call sites

* test(telemetry): verify prepared node exception payload

* fix(telemetry): close daemon exception lifecycle gaps

* test(telemetry): verify prepared daemon exception payload

* test(telemetry): close error collection acceptance gaps

* test(telemetry): close posthog exception acceptance gaps
2026-06-05 19:36:21 +02:00
Andrey Avtomonov
c3d8cedb0b
feat(cli): add ingest LLM rate-limit governor with paced retries (#261)
* feat(cli): add ingest rate limit governor

* feat(cli): wire ingest rate-limit config

* feat(cli): report provider rate-limit signals

* feat(cli): show ingest rate-limit waits

* fix(cli): complete rate-limit event coverage

* fix(cli): abort ingest provider calls cleanly

* fix(cli): propagate ingest cancellation

* fix(cli): reject pre-aborted ingest rate-limit waits

* fix(cli): honor Claude rate-limit reset waits

* fix(cli): retry thrown Codex rate-limit failures

* fix(cli): type Claude rate-limit result details

* fix(cli): emit ingest rate-limit countdowns from rejected signals

* fix(cli): report ai sdk rate-limit header utilization

* fix(cli): gate LLM rate-limit retries on the governor budget

The AI SDK and Codex runtimes retried 429 / opaque rate-limit failures up
to 6-7 times with no backoff when constructed without a RateLimitGovernor
(scan, memory, setup) or with pacing disabled, ignoring Retry-After and
worsening the limit. The outer retry loop only cooperates with the
governor's pause, so without active pacing there is no backoff to apply.

Route the retry bound through a single source: RateLimitGovernor
.maxRetryAttempts(), which returns retry.maxAttempts when enabled and 1
(no outer retry) when absent or disabled. All three runtimes (ai-sdk,
codex, claude-code) now use it, so ingest.rateLimit.retry.maxAttempts
genuinely controls attempts and the hard-coded 6 (plus Codex's off-by-one
extra attempt) is gone. Backend-native retry (e.g. the AI SDK's maxRetries)
still handles transient 429s.
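
As a rough sketch of the single-source retry bound described above: the `RateLimitSettings` shape and the synchronous `runWithRetries` loop are assumptions for illustration. The real runtimes are asynchronous and wait on the governor's pause between attempts, which is omitted here.

```typescript
// Hypothetical settings shape mirroring ingest.rateLimit in ktx.yaml.
interface RateLimitSettings {
  enabled: boolean;
  retry: { maxAttempts: number };
}

// Returns retry.maxAttempts when pacing is enabled, and 1 (no outer retry)
// when the governor is absent or disabled.
function maxRetryAttempts(settings: RateLimitSettings | undefined): number {
  if (settings === undefined || !settings.enabled) return 1;
  return Math.max(1, settings.retry.maxAttempts);
}

// Simplified outer retry loop shared by every runtime in this sketch.
function runWithRetries<T>(
  attempt: (attemptNumber: number) => T,
  settings: RateLimitSettings | undefined,
): T {
  const limit = maxRetryAttempts(settings);
  let lastError: unknown;
  for (let n = 1; n <= limit; n++) {
    try {
      return attempt(n);
    } catch (error) {
      lastError = error; // a real loop would pause on the governor here
    }
  }
  throw lastError;
}
```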

Also correct the ktx.yaml docs for maxWaitMs (caps each wait, not the whole
run) and maxAttempts, and sync uv.lock ktx-sl/ktx-daemon to 0.9.0.
2026-06-05 12:10:27 +02:00
github-actions[bot]
5a8821073b chore: refresh star history chart [skip ci] 2026-06-04 18:53:21 +00:00
Andrey Avtomonov
ec7edf8f50
fix(telemetry): preserve driver error class and code in connection_test (#260)
Native connector test failures were flattened to `new Error(message)`,
collapsing every driver's error class to `Error` and dropping `.code` /
`.number`. connection_test telemetry could therefore not tell a SQL Server
login rejection (ELOGIN / 18456) apart from a network or TLS error, and the
only field that varied was a raw message.

Connectors now return `connectorTestFailure(error)`, which preserves the
original driver error as `cause`, and `testNativeConnection` re-throws that
cause. `scrubErrorClass` then records the real class (e.g. ConnectionError)
and `formatErrorDetail` keeps the code prefix (e.g. "ELOGIN: ..."). The
helper is the single source of truth for the failure shape across all seven
native connectors. User-facing terminal output is unchanged.
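
A small sketch of the failure shape, under stated assumptions: the `ConnectorTestFailure` fields other than `cause`, the `ConnectionError` stand-in class, and `describeError` (a simplified stand-in for `scrubErrorClass` plus `formatErrorDetail`) are hypothetical.

```typescript
// Hypothetical result shape; only `cause` is named in the commit message.
interface ConnectorTestFailure {
  ok: false;
  message: string;
  cause: unknown; // the original driver error, untouched
}
type ConnectorTestResult = { ok: true } | ConnectorTestFailure;

function connectorTestFailure(error: unknown): ConnectorTestFailure {
  const message = error instanceof Error ? error.message : String(error);
  return { ok: false, message, cause: error };
}

// Re-throws the preserved driver error instead of a flattened `new Error(message)`.
function testNativeConnection(run: () => ConnectorTestResult): void {
  const result = run();
  if (result.ok === false) throw result.cause;
}

// Stand-in for a driver error class that carries a code.
class ConnectionError extends Error {
  code: string;
  constructor(message: string, code: string) {
    super(message);
    this.name = "ConnectionError";
    this.code = code;
  }
}

// Simplified stand-in for telemetry extraction of class and code-prefixed detail.
function describeError(error: unknown): { errorClass: string; detail: string } {
  if (typeof error === "object" && error !== null) {
    const e = error as { name?: unknown; message?: unknown; code?: unknown };
    const errorClass = typeof e.name === "string" ? e.name : "Error";
    const message = typeof e.message === "string" ? e.message : "";
    const detail = typeof e.code === "string" ? `${e.code}: ${message}` : message;
    return { errorClass, detail };
  }
  return { errorClass: "Error", detail: String(error) };
}
```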
2026-06-04 14:51:14 +02:00
Andrey Avtomonov
c2beaf7d55
feat(setup): wizard prompt tweaks and quieter query-history filter output (#259)
Setup wizard flow tweaks:
- Add a reveal-tail password prompt (reveal-password-prompt.ts) that unmasks
  the last few characters of a typed/pasted secret, and wire it into the setup
  prompt adapter in place of clack's password(); adds the @clack/core dep.
- Reorder wizard select options: surface "Paste a key" before the
  environment-variable option across embeddings/models/sources, promote
  Metabase/Notion in the source list, put Git URL before Local path, reorder
  the Notion crawl-mode choices, and relabel the sources "Done" action.

Query-history filter picker output:
- Collapse the per-template parse-failure lines into a single count in the
  setup output and route the full template-id list to --debug stderr.
- Model parse failures as a structured parseFailedTemplateIds field instead of
  warning strings.
- Add a privacy-safe query_history_filter_completed telemetry event
  (counts/enums only), mirrored into the Python daemon schema.
2026-06-04 14:11:08 +02:00
github-actions[bot]
8eb1cd3e79 chore: refresh star history chart [skip ci] 2026-06-04 07:45:37 +00:00
semantic-release-bot
7ba948a135 chore(release): 0.9.0 [skip ci]
## [0.9.0](https://github.com/Kaelio/ktx/compare/v0.8.0...v0.9.0) (2026-06-03)

### Features

* add codex llm backend for ktx runtime work ([#253](https://github.com/Kaelio/ktx/issues/253)) ([494618a](494618ab14))
* **cli:** consistent connection setup recovery and build-time gate ([#257](https://github.com/Kaelio/ktx/issues/257)) ([ce1516b](ce1516b357))
* **cli:** guide next action at end of ktx setup, not reruns ([#256](https://github.com/Kaelio/ktx/issues/256)) ([45aa95d](45aa95d2cc))
* **cli:** stream plain ktx ingest progress to stderr (KLO-726) ([#251](https://github.com/Kaelio/ktx/issues/251)) ([13774bf](13774bfcef))
* **query-history:** scope mining to modeled schemas by default ([#258](https://github.com/Kaelio/ktx/issues/258)) ([e70ae1e](e70ae1e63b))
* **telemetry:** include error details for failures ([#254](https://github.com/Kaelio/ktx/issues/254)) ([6da8c34](6da8c3452a))

### Bug Fixes

* **ingest:** recover textual-conflict gate failures; fix query-history adapter ([#255](https://github.com/Kaelio/ktx/issues/255)) ([f5dea9a](f5dea9a089))

### Other Changes

* refresh star history chart [skip ci] ([9d3a0b7](9d3a0b751d))
* refresh star history chart [skip ci] ([74c6076](74c6076b72))
* refresh star history chart [skip ci] ([d01abe6](d01abe6f3c))
* revert repo references to Kaelio/ktx and remove rename-resilience ([#252](https://github.com/Kaelio/ktx/issues/252)) ([41e20c9](41e20c9ce7)), closes [#250](https://github.com/Kaelio/ktx/issues/250)
2026-06-03 21:50:59 +00:00
Andrey Avtomonov
e70ae1e63b
feat(query-history): scope mining to modeled schemas by default (#258)
* feat(query-history): structure SQL analysis table refs

* feat(query-history): qualify SQL analysis table refs

* feat(query-history): wire modeled scope floor through ingest

* chore(query-history): verify scope floor

* test(query-history): align daemon SQL batch endpoint contract

* feat(query-history): build scope from same-run scan catalog

* feat(query-history): fail open on scope-floor catalog failures

* chore(query-history): verify scope-floor v1 closure

* refactor(query-history): share scope membership

* feat(setup): apply derived query history filters

* docs: document derived query history filters

* fix(query-history): redact filter picker LLM prompt SQL

* fix(setup): run filter picker SQL analysis through managed daemon

* chore(query-history): verify filter picker v1 closure

* fix(query-history): fail open on partial service-account attribution

* fix(query-history): aggregate BigQuery users by execution count

* fix(query-history): aggregate Snowflake users by execution count

* fix(query-history): use BigQuery query info hash
2026-06-03 17:19:42 +02:00
Andrey Avtomonov
ce1516b357
feat(cli): consistent connection setup recovery and build-time gate (#257)
* feat(cli): block context build when a required connection fails its live test

A context build can take several minutes, so a connection that is
unreachable or misconfigured should stop the build up front instead of
failing partway through. Before the build starts, run a live connection
test for every primary- and context-source connection the build depends
on.

Each test's output is captured in a discarded buffer so raw error text
(and database paths) never reach the user; failures are surfaced only by
connection id and connector type, with a pointer to `ktx connection test
<id>` for the underlying error.

- Interactive setup lets the user fix the connection and retry without
  restarting, re-resolving targets so an added/removed/reconfigured
  connection is honored.
- `--no-input` exits non-zero and writes a failed context state with a
  failureReason, so scripts stop early and setup never reads as ready.

Extract the buffered command IO helper out of setup-databases into
src/io/buffered-command-io.ts so both call sites share one implementation.

* feat(cli): use recovery primitive for database setup

* feat(cli): use recovery primitive for source setup

* docs: document setup connection recovery

* fix(cli): close database recovery gaps

* fix(cli): target failing project in gate hint and preserve missing-input

Address two review findings on the connection-recovery work:

- The connection-gate failure hint emitted `ktx connection test <id>` with no
  --project-dir, so a setup run started with `--project-dir ./analytics` pointed
  users at cwd/KTX_PROJECT_DIR instead of the project that just failed. Emit the
  resolved project dir, matching the contextBuildCommands convention.

- The non-interactive database configure path returned `cancelled`, which the
  recovery primitive collapses to `failed`. Sibling paths still report
  `missing-input` for absent flags, so incomplete-flag runs were
  indistinguishable from real connection failures. The database wrapper now
  tracks the configure missing-input signal and restores the `missing-input`
  step status; the shared primitive keeps its four outcomes.
2026-06-03 11:08:46 +00:00
Andrey Avtomonov
f5dea9a089
fix(ingest): recover textual-conflict gate failures; fix query-history adapter (#255)
* fix(ingest): recover textual-conflict gate failures; fix query-history adapter

Two latent gaps in the isolated-diff local-ingest pipeline that can abort an
otherwise-successful ingest:

- Metabase: when a work-unit patch hit both a textual conflict and a post-merge
  dangling sl_ref, the after-textual-resolution branch returned a hard
  semantic_conflict and rolled back the whole job. It now runs the same
  repairGateFailure recovery the clean-apply branch already uses (re-validate,
  then commit the union of resolved + repaired paths), reaching parity.

- Query history: the historic-sql adapter was registered only when ktx.yaml had
  context.queryHistory.enabled=true, so `--query-history` threw "Adapter not
  available for local ingest". Registration now resolves the dialect from driver
  capability, since the explicit --query-history request is itself the opt-in;
  the config-gated helper is unchanged for status/setup/probes.

Adds the previously-missing tests for both paths.

* chore: sync uv.lock to 0.8.0 (regenerated with pinned uv 0.11.11)

* fix(ingest): drop ktx's own scan probes and dedup tables in query history

Query history (historic-sql) mined two kinds of noise back into context:

- ktx's own warehouse scan emits relationship- and column-profiling probes
  (the relationship_profile_values aggregation and the child_values/parent_values
  FK-overlap CTEs) into pg_stat_statements. shouldDropBySql now filters these
  ktx-owned, dialect-stable signatures so ktx introspection is not ingested as
  usage history.

- The same physical table appears both bare (accounts, via search_path) and
  schema-qualified (orbit_raw.accounts), producing duplicate per-table work
  units. canonicalizeTableIdentifiers collapses a bare name into its unique
  qualified form before work-unit keying; ambiguous names are left untouched.

On the orbit demo this removes ~35% of sampled query templates (ktx self-probes)
and ~45 duplicate per-table work units.
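
The bare-to-qualified collapse can be illustrated with a short sketch. This is an assumption-laden approximation, not the real canonicalizeTableIdentifiers: it treats identifiers as plain strings, splits on the last dot, and ignores quoting and case folding.

```typescript
// Collapse a bare table name into its qualified form when exactly one
// qualified candidate exists; ambiguous bare names are left untouched.
function canonicalizeTableIdentifiers(tables: string[]): string[] {
  // bare name -> distinct qualified forms seen in the input
  const qualifiedByBare: Record<string, string[]> = Object.create(null);
  for (const table of tables) {
    const dot = table.lastIndexOf(".");
    if (dot === -1) continue;
    const bare = table.slice(dot + 1);
    const seen = qualifiedByBare[bare] || (qualifiedByBare[bare] = []);
    if (seen.indexOf(table) === -1) seen.push(table);
  }

  const result: string[] = [];
  for (const table of tables) {
    let canonical = table;
    if (table.indexOf(".") === -1) {
      const candidates = qualifiedByBare[table];
      const only =
        candidates !== undefined && candidates.length === 1 ? candidates[0] : undefined;
      if (only !== undefined) canonical = only;
    }
    // De-duplicate so each physical table yields a single work-unit key.
    if (result.indexOf(canonical) === -1) result.push(canonical);
  }
  return result;
}
```

For example, `accounts` and `orbit_raw.accounts` would collapse to one entry, while a bare `users` that matches two schemas stays as written.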

* docs(agents): add Design Reasoning Defaults section
2026-06-03 13:05:59 +02:00
github-actions[bot]
9d3a0b751d chore: refresh star history chart [skip ci] 2026-06-03 07:50:39 +00:00
Andrey Avtomonov
45aa95d2cc
feat(cli): guide next action at end of ktx setup, not reruns (#256)
Re-running setup was the dominant action for installs that completed setup but never ingested. Classify completion (incomplete | needs-context | needs-agents | ready) and drive one obvious next action per state: route a config-complete project straight to the build, point unbuilt-context users at `ktx ingest` instead of re-running setup or dropping to a bare shell, and confirm readiness for fully-set-up projects rather than reopening the edit menu.
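
A hedged sketch of the four-state classification: the state names come from this message, while the `SetupFacts` inputs, the precedence order, and the action strings other than `ktx ingest` are assumptions for illustration.

```typescript
type SetupCompletion = "incomplete" | "needs-context" | "needs-agents" | "ready";

// Hypothetical inputs; the real classifier may inspect different signals.
interface SetupFacts {
  configComplete: boolean;
  contextBuilt: boolean;
  agentsConfigured: boolean;
}

function classifyCompletion(facts: SetupFacts): SetupCompletion {
  if (!facts.configComplete) return "incomplete";
  if (!facts.contextBuilt) return "needs-context";
  if (!facts.agentsConfigured) return "needs-agents";
  return "ready";
}

// One obvious next action per state (wording is illustrative).
const NEXT_ACTION: Record<SetupCompletion, string> = {
  incomplete: "continue setup",
  "needs-context": "run `ktx ingest`",
  "needs-agents": "configure agents",
  ready: "confirm readiness",
};

function nextAction(facts: SetupFacts): string {
  return NEXT_ACTION[classifyCompletion(facts)];
}
```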
2026-06-03 01:00:21 +02:00
Andrey Avtomonov
cb6a67c2d7 Make telemetry reliable across interrupts and headless installs
Three reliability gaps surfaced while auditing why PostHog numbers were
untrustworthy:

1. Interrupted commands lost their events. capture() is fire-and-forget and the
   only flush guarantee lived in a finally block, which SIGINT/SIGTERM skip — so
   Ctrl-C'ing a long ingest or an MCP client killing 'ktx mcp stdio' dropped the
   command event and any queued events. Add SIGINT/SIGTERM handlers (real-process
   entry only; never under test/programmatic io) that mark the active command
   span aborted, emit it, drain the emitter, then exit. Idempotent with the
   normal finally path via the single-consume command span.

2. Headless-first installs were invisible. loadTelemetryIdentity refused to mint
   an installId unless stdout was a TTY, so a machine whose first run was an
   IDE-launched MCP server or a script emitted nothing, ever. Mint on first run
   regardless of surface (still honoring CI/DO_NOT_TRACK/KTX_TELEMETRY_DISABLED),
   writing the one-time notice to stderr — safe under the MCP stdio protocol,
   which reserves stdout. Drop the now-unused stdoutIsTTY option.

3. No guard against silent emit regressions (the 0.7.0 scan_completed blackout).
   Add tests: the shared executePublicIngestTarget chokepoint emits exactly one
   ingest_completed on success and on the preflight-failure branch, and a
   database target invokes the scan that emits scan_completed; plus coverage for
   the aborted-flush helper.
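
The idempotence between the signal path and the normal finally path in item 1 can be sketched with a single-consume holder. Everything here (`CommandSpan`, `createSpanSlot`, `finishCommand`) is a hypothetical illustration; the signal handlers and the emitter drain are left out.

```typescript
interface CommandSpan {
  command: string;
  aborted: boolean;
}

interface SpanSlot {
  consume: () => CommandSpan | undefined;
}

// Single-consume holder: the first caller gets the span, later callers get undefined.
function createSpanSlot(span: CommandSpan): SpanSlot {
  let current: CommandSpan | undefined = span;
  return {
    consume() {
      const taken = current;
      current = undefined;
      return taken;
    },
  };
}

// Called from both the signal handler (aborted = true) and the finally block
// (aborted = false); whichever runs first emits, the other is a no-op.
function finishCommand(
  slot: SpanSlot,
  aborted: boolean,
  emit: (span: CommandSpan) => void,
): void {
  const span = slot.consume();
  if (span === undefined) return;
  emit({ command: span.command, aborted });
}
```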

Identity is unchanged otherwise: every event still attributes to the installId
in ~/.ktx/telemetry.json. No event/field changes, so Node<->Python schema parity
is untouched. Docs updated to reflect first-run-on-any-surface activation.
2026-06-02 23:19:37 +02:00
Andrey Avtomonov
2334a4b6e3 Emit ingest_completed once per target on every ingest path
emitIngestCompleted was called only in runKtxPublicIngest's plain/json loop,
so the foreground 'ktx ingest' view and all of 'ktx setup' — which delegate to
runContextBuild -> executePublicIngestTarget — never emitted the event. That
left ingest_completed near-useless for measuring ingestion.

Move the emit into executePublicIngestTarget, the single per-target chokepoint
every entrypoint funnels through: a thin wrapper now captures timing, runs the
existing steps (extracted to runIngestTargetSteps), and emits exactly once. The
telemetry echo targets deps.runtimeIo (the real user stream) so a capture
buffer used for step output doesn't swallow it. Thread project through the
context-build call site. No schema/field changes, so Node<->Python telemetry
parity is unaffected.
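
A minimal sketch of the thin-wrapper pattern, assuming a synchronous step runner and a hypothetical event shape; the real executePublicIngestTarget is asynchronous and its event fields are not shown in this message.

```typescript
// Hypothetical event shape for illustration only.
interface IngestCompletedEvent {
  target: string;
  ok: boolean;
  durationMs: number;
}

// Wraps the per-target steps: captures timing, runs them, and emits exactly
// once whether the steps succeed or throw.
function executeTarget(
  target: string,
  runSteps: (target: string) => void,
  emit: (event: IngestCompletedEvent) => void,
  now: () => number,
): void {
  const startedAt = now();
  let ok = false;
  try {
    runSteps(target);
    ok = true;
  } finally {
    emit({ target, ok, durationMs: now() - startedAt });
  }
}
```

Since every entrypoint calls the wrapper rather than emitting on its own, a multi-target run produces one event per target with no double-emit.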

Add tests: the shared chokepoint emits exactly one ingest_completed for any
caller, and a multi-target run emits one per target with no double-emit.
2026-06-02 20:03:27 +02:00
Andrey Avtomonov
6da8c3452a
feat(telemetry): include error details for failures (#254) 2026-06-02 17:23:51 +02:00
Andrey Avtomonov
494618ab14
feat: add codex llm backend for ktx runtime work (#253)
* feat: add codex sdk runner foundation

* feat: parse codex runtime events

* feat: expose codex runtime mcp tools

* feat: add codex llm runtime

* feat: wire codex llm backend

* test: avoid Array.fromAsync in codex runner test

* docs: document codex llm backend

* fix: tighten codex runtime config ownership

* fix: use codex sdk env and thread options

* fix: parse codex sdk event shapes

* test: add codex backend live smoke

* docs: clarify codex backend isolation

* fix: drive codex loop metrics from mcp events

* fix: enforce codex local step budget

* docs: disclose codex isolation limits

* fix: count all codex agent steps and stream step callbacks live

The agent-loop step budget only counted completed mcp_tool_call items, so
built-in command_execution steps (which the public Codex SDK/CLI surface can
still expose) never decremented the budget, letting ingest/reconciliation run
past stepBudget until Codex stopped on its own. onStepFinish was also replayed
only after the whole stream drained, so live work_unit_step / reconciliation
progress appeared stuck until the Codex process exited.

collectEvents is now the single live step accumulator: it counts every
completed agent-action item via a shared isCompletedAgentStep predicate
(command_execution, mcp_tool_call, file_change, web_search), fires onStepFinish
as each step completes, and enforces the budget on that broader count. A
no-tool turn still counts as one step. toolFailures stays MCP-specific, since a
non-zero command exit is normal agent exploration, not a loop failure.
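
A rough sketch of the broader step count, under assumptions: the `StreamEvent` shape is hypothetical rather than the Codex SDK's real event type, and the stream is modeled as an array instead of an async iterator. The four item types are the ones listed above.

```typescript
// Hypothetical, simplified event shape (not the SDK's actual type).
interface StreamEvent {
  event: "item.started" | "item.completed";
  itemType: string;
}

const AGENT_STEP_TYPES = ["command_execution", "mcp_tool_call", "file_change", "web_search"];

// Shared predicate: any completed agent-action item counts as a step.
function isCompletedAgentStep(e: StreamEvent): boolean {
  return e.event === "item.completed" && AGENT_STEP_TYPES.indexOf(e.itemType) !== -1;
}

// Counts steps as they complete, fires the callback immediately, and stops
// once the budget is reached.
function collectEvents(
  events: StreamEvent[],
  stepBudget: number,
  onStepFinish: (stepNumber: number) => void,
): { steps: number; budgetExhausted: boolean } {
  let steps = 0;
  for (const e of events) {
    if (!isCompletedAgentStep(e)) continue;
    steps += 1;
    onStepFinish(steps);
    if (steps >= stepBudget) return { steps, budgetExhausted: true };
  }
  if (steps === 0) {
    // A no-tool turn still counts as one step.
    steps = 1;
    onStepFinish(steps);
  }
  return { steps, budgetExhausted: false };
}
```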

* test: align ingest llm-guard assertions with codex backend

The skip-llm ingest guard message now lists codex as a valid backend and
mentions a Claude Code/Codex session plus a codex setup hint, but this slow
suite test still asserted the pre-codex wording. Update it to match the
production message (already covered by the local-bundle-runtime unit test) and
add the codex setup-line assertion.

* fix: treat codex error:null tool calls as success

The Codex SDK serializes error: null on successful mcp_tool_call items, so
the failure check (item.error !== undefined) flagged every successful tool
call as failed with the empty-payload default "Codex turn failed". This
killed every ingest work unit under the codex backend before it could
produce a patch.

Key on status === 'failed' (authoritative, always set) and only treat a
populated error object as a failure. Add a regression test built from a
verbatim real-SDK event capture.
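
The difference between the two checks can be shown with a small sketch. The `McpToolCallItem` shape and the `"completed"` status value are simplified assumptions for illustration, not the SDK's type definition.

```typescript
// Simplified, hypothetical item shape.
interface McpToolCallItem {
  status: string;
  error?: { message?: string } | null;
}

// Old check: flags every item whose `error` key is present, including error: null.
function legacyIsFailed(item: McpToolCallItem): boolean {
  return item.error !== undefined;
}

// New check: status is authoritative; otherwise only a populated error object counts.
function isFailedToolCall(item: McpToolCallItem): boolean {
  if (item.status === "failed") return true;
  return typeof item.error === "object" && item.error !== null;
}
```

A successful call serialized as `{ status: "completed", error: null }` fails the legacy check but passes the new one.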

* fix: default codex backend to gpt-5.5 and report real probe errors

The previous default gpt-5.3-codex is an API-key-only model that the OpenAI
API rejects under ChatGPT-account (subscription) auth, so codex status/setup
failed with a misleading "authentication is not usable" message even though
auth was fine.

- Default codex model is now gpt-5.5 (works on both subscription and API-key
  auth); the curated setup picker offers gpt-5.5 / gpt-5.4 / gpt-5.4-mini and
  keeps free-form entry for account-specific ids (e.g. gpt-5.3-codex-spark).
- runCodexAuthProbe now distinguishes "model not available" from an auth
  failure and surfaces the real API error: collectEvents retains stream
  events when the SDK throws on a non-zero exit, and the API error JSON
  envelope is unwrapped to its human-readable message.
- The Codex isolation warning now renders inside the clack setup frame.
- Docs updated to gpt-5.5 with a note that *-codex ids require API-key auth.

* fix: require llm.models.default in status and match codex probe remediation

Status reported a project ready when a non-none LLM backend was configured
without llm.models.default, but the runtime (resolveModelSlots) hard-requires
it, so ingest/scan/memory threw after `ktx status` said the project was usable.
buildLlmStatus now fails for any non-none backend missing models.default and no
longer invents a fallback model for claude-code/codex.

Codex probe failures now carry a category-matched fix: a model-access failure
steers the user to llm.models.default instead of the auth/install remediation.
runCodexAuthProbe returns the fix and status consumes it; the message stays
self-sufficient so setup output is unchanged.

Docs: README now lists the codex backend and local Codex auth; ktx-setup.mdx
states --llm-model only accepts codex/default or gpt-*/codex-* ids.

Repaired four doctor fixtures that configured a backend without models.default
(the now-correctly-blocked config) and added coverage for the new behavior.
2026-06-02 13:57:11 +02:00
github-actions[bot]
74c6076b72 chore: refresh star history chart [skip ci] 2026-06-02 07:46:46 +00:00
Andrey Avtomonov
41e20c9ce7
chore: revert repo references to Kaelio/ktx and remove rename-resilience (#252)
The GitHub repo was renamed back from Kaelio/ktx-ai-data-agents-context to Kaelio/ktx, reverting the URL changes from #250 across package metadata, CI (codecov + star-history slugs), issue/security templates, the release runbook, and docs/install commands.

Also removes the rename-resilience machinery #250 added: semantic-release now reads the repository URL straight from package.json (Kaelio/ktx) again, so the repositoryUrl() derivation in scripts/semantic-release-config.cjs, its tests, and the rename note in docs/release.md are no longer needed.
2026-06-02 00:14:43 +02:00
Andrey Avtomonov
13774bfcef
feat(cli): stream plain ktx ingest progress to stderr (KLO-726) (#251)
* feat(cli): share public ingest progress adapter

* feat(cli): stream plain public ingest progress

* test(cli): update plain ingest progress assertions

* chore(cli): satisfy plain ingest progress checks

* fix(artifacts): expect plain ingest stderr progress in installed-CLI smoke

* ci(coverage): make Codecov upload non-fatal and fix repo slug

The Coverage job failed because the Codecov upload returned
'Repository not found' while fail_ci_if_error was true, turning a
Codecov-side issue into a hard CI failure even though all tests pass.

- Set fail_ci_if_error: false on both uploads so Codecov outages or an
  unlinked repo no longer break CI (upload stays best-effort).
- Correct the stale slug Kaelio/ktx -> Kaelio/ktx-ai-data-agents-context
  to match the actual GitHub repo (aligns with main).

* fix(cli): isolate query-history failure capture from scan output

The plain public-ingest progress path passes one captured IO as the
target-level `io`. With progress deps set, both the schema scan and the
query-history ingest resolved their capture to that same shared buffer,
so a non-actionable query-history failure surfaced leftover scan report
text (e.g. "Mode: enriched") as the skipped-facet detail instead of the
real query-history message.

Give the query-history ingest a phase-local capture while preserving the
flow-to-io branch the foreground context-build view relies on.

---------

Co-authored-by: Andrey Avtomonov <7889985+andreybavt@users.noreply.github.com>
2026-06-01 23:31:31 +02:00
github-actions[bot]
d01abe6f3c chore: refresh star history chart [skip ci] 2026-06-01 19:42:24 +00:00
semantic-release-bot
41cccc3448 chore(release): 0.8.0 [skip ci]
## [0.8.0](https://github.com/Kaelio/ktx-ai-data-agents-context/compare/v0.7.0...v0.8.0) (2026-06-01)

### ⚠ BREAKING CHANGES

* **cli:** remove fast mode; ktx ingest always builds enriched context (KLO-721) (#237)

### Features

* **cli:** profile ingest runs and split model vs tool time ([#249](https://github.com/Kaelio/ktx-ai-data-agents-context/issues/249)) ([21744fc](21744fc520))
* **cli:** remove fast mode; ktx ingest always builds enriched context (KLO-721) ([#237](https://github.com/Kaelio/ktx-ai-data-agents-context/issues/237)) ([3f0d11e](3f0d11e07d))
* **cli:** shell completion for commands, flags, and entity names ([#244](https://github.com/Kaelio/ktx-ai-data-agents-context/issues/244)) ([d320d54](d320d54ab2)), closes [#243](https://github.com/Kaelio/ktx-ai-data-agents-context/issues/243)
* README architecture diagrams + React Flow diagram studio ([#245](https://github.com/Kaelio/ktx-ai-data-agents-context/issues/245)) ([ba5bb92](ba5bb92ab7))
* report MCP client telemetry ([#242](https://github.com/Kaelio/ktx-ai-data-agents-context/issues/242)) ([2e5f7f2](2e5f7f25aa))
* **telemetry:** enable PostHog GeoIP enrichment ([#243](https://github.com/Kaelio/ktx-ai-data-agents-context/issues/243)) ([95a2653](95a265323a))
* trim MCP query response payloads ([#240](https://github.com/Kaelio/ktx-ai-data-agents-context/issues/240)) ([25f639f](25f639fba2))

### Bug Fixes

* **brand:** README lockup wordmark in Outfit to match docs-site ([#246](https://github.com/Kaelio/ktx-ai-data-agents-context/issues/246)) ([1959f49](1959f493d6))
* **cli:** align Notion setup credential to --source-auth-token-ref ([#236](https://github.com/Kaelio/ktx-ai-data-agents-context/issues/236)) ([637891f](637891f030))
* **cli:** treat artifact-producing ingests with failures as partial ([#238](https://github.com/Kaelio/ktx-ai-data-agents-context/issues/238)) ([53a6f8d](53a6f8d111))
* **release:** point repository URLs at renamed GitHub repo ([#250](https://github.com/Kaelio/ktx-ai-data-agents-context/issues/250)) ([41f5279](41f52797de))

### Documentation

* **ktx skill:** harden setup guidance from agent-driven demo run ([#247](https://github.com/Kaelio/ktx-ai-data-agents-context/issues/247)) ([5faa16b](5faa16b32c))
* **readme:** add launch video to README hero ([#248](https://github.com/Kaelio/ktx-ai-data-agents-context/issues/248)) ([22ddf55](22ddf5524c))

### Continuous Integration

* normalize star-history.svg trailing newline ([#241](https://github.com/Kaelio/ktx-ai-data-agents-context/issues/241)) ([cbbcf8e](cbbcf8e8bd)), closes [#240](https://github.com/Kaelio/ktx-ai-data-agents-context/issues/240)
* push star-history refresh to protected main with RELEASE_PAT ([#239](https://github.com/Kaelio/ktx-ai-data-agents-context/issues/239)) ([ba06f70](ba06f7078a))
* refresh README star history chart twice daily ([08d08d8](08d08d8ea0))
* stop tombi reformatting uv.lock and sync lock to 0.7.0 ([#235](https://github.com/Kaelio/ktx-ai-data-agents-context/issues/235)) ([8ebc4ce](8ebc4ce107))

### Other Changes

* refresh star history chart [skip ci] ([c196d1f](c196d1f192))
* refresh star history chart [skip ci] ([2058c26](2058c26e84))
* refresh star history chart [skip ci] ([54d6e87](54d6e87733))
* upgrade dependencies and tooling ([#232](https://github.com/Kaelio/ktx-ai-data-agents-context/issues/232)) ([d53cdac](d53cdac366))
2026-06-01 18:09:14 +00:00
Andrey Avtomonov
41f52797de
fix(release): point repository URLs at renamed GitHub repo (#250)
* fix(release): point repository URLs at renamed GitHub repo

The GitHub repo was renamed from Kaelio/ktx to
Kaelio/ktx-ai-data-agents-context. semantic-release reads repositoryUrl
from package.json's repository field and the @semantic-release/github
plugin failed verifyConditions with EMISMATCHGITHUBURL because it no
longer matched the live clone URL.

Update every Kaelio/ktx reference to the renamed repo: package metadata
(root + CLI repository/bugs/homepage), the codecov upload slugs and
star-history slug in CI, the issue-template and security-advisory links,
the release runbook, and all docs/install commands.

* fix(release): derive semantic-release repositoryUrl from the CI repo

@semantic-release/github exact-matches repositoryUrl against the live
GitHub clone_url (no redirect following), so any repo rename re-breaks the
release when repositoryUrl is the static package.json value.

Derive repositoryUrl from the runner's GITHUB_REPOSITORY/GITHUB_SERVER_URL
so it always tracks the current repo name. A future rename (including back
to Kaelio/ktx) now resolves with no code change. Outside CI the option is
omitted, so semantic-release falls back to package.json as documented.
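
A sketch of the derivation, written in TypeScript for illustration (the actual config is a .cjs file). The function shape, the `https://github.com` fallback, and the `.git` suffix are assumptions; the environment is passed in as a parameter so the logic stays testable.

```typescript
type Env = Record<string, string | undefined>;

// Returns undefined outside CI so the caller can omit the option entirely.
function repositoryUrl(env: Env): string | undefined {
  const repo = env["GITHUB_REPOSITORY"]; // e.g. "owner/name" on a GitHub runner
  if (repo === undefined || repo === "") return undefined;
  const server = env["GITHUB_SERVER_URL"] || "https://github.com";
  // Assumption: mirror the clone URL form with a .git suffix.
  return `${server}/${repo}.git`;
}

// Only sets repositoryUrl when it can be derived; otherwise semantic-release
// falls back to the package.json repository field.
function releaseOptions(env: Env): { repositoryUrl?: string } {
  const url = repositoryUrl(env);
  return url === undefined ? {} : { repositoryUrl: url };
}
```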

The package.json repository field stays ktx-ai-data-agents-context as
npm-display metadata, decoupled from the release-time match.
2026-06-01 20:07:24 +02:00
Andrey Avtomonov
9133d243e8 Update demo warehouse URL 2026-06-01 16:44:41 +02:00
Andrey Avtomonov
21744fc520
feat(cli): profile ingest runs and split model vs tool time (#249)
* feat(cli): profile ingest runs to find where wall-clock time goes

Add opt-in profiling for `ktx ingest`. Each timed phase, work unit, and
agent loop now records durationMs / step count / token usage in the
trace, and a post-run aggregator rolls them up into a "where did the
time go" report printed to stderr.

Enable per run with KTX_PROFILE_INGEST (1/true -> human table, json ->
raw structured profile) or persistently via `ingest.profile` in
ktx.yaml. The json form emits raw milliseconds, token counts, and a
summary.headline one-line diagnosis so coding agents can parse it
directly; json wins when both env and config request profiling.

- runtime-port: RunLoopMetrics (totalMs, usage, stepCount,
  stepBoundariesMs) plus onMetrics callbacks on text/object generation
- ai-sdk + claude-code runtimes: capture per-loop timing and token usage
- work-unit-executor and stages 3/4: thread metrics into trace events
- ingest-bundle.runner: time worktree / triage / clustering / index /
  reconcile / squash phases and emit the profile in a finally block
  (best-effort; never affects the run outcome)
- ingest-profile: new trace+transcript aggregator with table/json formatters
- config: ingest.profile flag; docs: profiling section in ktx-ingest.mdx

* fix(cli): flush tool-call logs before reading ingest profile

Tool transcripts are appended fire-and-forget so the agent hot path never
blocks on logging. The ingest profiler read them before the writes settled,
so per-work-unit toolMs (and the model-vs-tool split derived from it) could
be incomplete. Track in-flight appends and expose flushToolCallLogs() —
bounded by a timeout so it can never hang — and flush before the profiler
reads the transcript.
2026-06-01 15:49:17 +02:00
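The precedence between KTX_PROFILE_INGEST and `ingest.profile` in #249, plus the model-vs-tool split, might look something like the sketch below. The function names are hypothetical, `ingest.profile` is assumed to be a plain boolean, unrecognized env values are treated as unset, and model time is assumed to be loop wall-clock time minus tool time. None of this is confirmed by the commit beyond the stated rules.

```typescript
type ProfileMode = "off" | "table" | "json";

// Hypothetical resolver; assumes `ingest.profile` is a boolean flag.
function resolveProfileMode(envValue: string | undefined, configEnabled: boolean): ProfileMode {
  const value = (envValue ?? "").trim().toLowerCase();
  if (value === "json") return "json"; // json wins even when config also asks for profiling
  if (value === "1" || value === "true" || configEnabled) return "table";
  return "off";
}

// Assumption: model time is whatever part of the loop was not spent in tools.
function splitModelVsTool(totalMs: number, toolMs: number): { modelMs: number; toolMs: number } {
  const boundedToolMs = Math.min(Math.max(toolMs, 0), totalMs);
  return { modelMs: totalMs - boundedToolMs, toolMs: boundedToolMs };
}
```

Clamping the tool time keeps the split sane if transcript timings ever exceed the recorded loop duration, which is one reason the flush-before-read fix matters for an accurate report.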
Andrey Avtomonov
22ddf5524c
docs(readme): add launch video to README hero (#248)
Add a clickable launch-video poster (linking to YouTube) directly after
the intro note and before the architecture diagrams. GitHub Markdown cannot
embed a YouTube player, so the poster image links out instead.
2026-06-01 13:42:42 +00:00
Andrey Avtomonov
5faa16b32c
docs(ktx skill): harden setup guidance from agent-driven demo run (#247)
Fold field-tested fixes into the ktx skill, verified against current CLI source:

- prefer file: secret refs over env: (env: re-resolves per-process and resolves
  empty in later ingest/mcp shells)
- pass --skip-agents on data-only setup runs; explain the trailing agent step's
  misleading exit 1 on otherwise-successful runs
- dbt ignores --source-warehouse-connection-id (maps by table name); required
  only for Metabase/Looker/LookML
- never go silent during slow setup/ingest: poll .ktx mtimes and post progress
  so a long run does not look stuck
- judge readiness from verdict, connections[].status, localStats.semanticLayer
  and wikiPages; perConnection under-reports
- add troubleshooting entries for the 'Run in a TTY' exit 1 and secrets that
  resolve empty only during ingest/mcp
2026-06-01 12:08:58 +00:00
Andrey Avtomonov
1959f493d6
fix(brand): README lockup wordmark in Outfit to match docs-site (#246) 2026-06-01 11:18:37 +00:00
Andrey Avtomonov
ba5bb92ab7
feat: README architecture diagrams + React Flow diagram studio (#245)
Replace the tall portrait README ingestion SVG with two landscape
diagrams — "1 · Ingestion" (build the context layer) and "2 · Serving"
(agents query it through MCP) — wired in as transparent 2x PNGs that
read on GitHub light and dark.

Add docs-site/diagram-studio: a static React Flow page with custom
themed nodes and the inlined ktx mascot that renders both diagrams and
exports them to PNG via html-to-image (the diagrams' reproducible
source). Remove the superseded ingestion-flow SVGs.
2026-06-01 12:06:27 +02:00
Andrey Avtomonov
d320d54ab2
feat(cli): shell completion for commands, flags, and entity names (#244)
* feat(completion): complete known argument values

* fix(completion): hide Commander-hidden subcommands from completions

Replace the `__`-prefix name heuristic with Commander's `_hidden` flag so
internal subcommands registered with { hidden: true } (e.g. `mcp serve-internal`)
are excluded from completions, mirroring `ktx --help`.

* test: cover wiki and sl read command routing

* test: cover raw wiki and sl reads

* feat: add wiki read command

* feat: add sl read command

* feat: complete read command entity names

* docs: document wiki and sl read commands

* test: include read commands in command tree

* feat(sl): read and validate unique sources by name

* feat(sl): make read and validate connection id optional

* fix(completion): dedupe semantic source names

* docs(sl): document connection-optional read and validate

* fix(sl): require connection id for query command

* docs(sl): clarify query connection requirement

* fix(completion): don't resolve option values as subcommands

resolveCommand skipped flag tokens but not the value consumed by a
value-taking option in the `--flag value` form, so a connection id like
`query` was matched as the `sl query` subcommand and yielded no `sl`
completions. Track value-taking options and skip their consumed value
before matching subcommands.

* test(telemetry): assert first-run notice via TELEMETRY_NOTICE constant

CI (which tests this branch merged with main) failed because #243 changed
the first-run notice wording in identity.ts (dropped "anonymous") but left
this test grepping for the old literal 'ktx collects anonymous usage data',
so indexOf returned -1. Assert against the exported TELEMETRY_NOTICE
constant instead so the test tracks the source of truth and cannot drift
when the notice text changes again.
2026-05-31 23:44:33 +02:00
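The option-value fix to resolveCommand in #244 can be illustrated with a simplified model. The `CommandNode` shape below is hypothetical and much smaller than Commander's real command objects. It only shows the idea of skipping the token consumed by a value-taking option before matching subcommands.

```typescript
// Hypothetical, simplified command tree (not Commander's actual types).
interface CommandNode {
  subcommands: Map<string, CommandNode>;
  valueOptions: string[]; // flags that consume the next token as their value
}

function resolveCommand(root: CommandNode, tokens: string[]): string[] {
  const path: string[] = [];
  let node = root;
  let skipNext = false;
  for (const token of tokens) {
    if (skipNext) {
      // This token is the value of the previous flag, never a subcommand.
      skipNext = false;
      continue;
    }
    if (token.startsWith("-")) {
      // `--flag value` consumes the next token; `--flag=value` does not.
      skipNext = !token.includes("=") && node.valueOptions.includes(token);
      continue;
    }
    const next = node.subcommands.get(token);
    if (!next) break;
    path.push(token);
    node = next;
  }
  return path;
}
```

With a hypothetical `--connection` option registered on `sl`, the tokens `sl --connection query` resolve to the `sl` command rather than `sl query`, so completions for `sl` are still offered.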
github-actions[bot]
c196d1f192 chore: refresh star history chart [skip ci] 2026-05-31 18:29:55 +00:00
github-actions[bot]
2058c26e84 chore: refresh star history chart [skip ci] 2026-05-30 18:28:06 +00:00
Andrey Avtomonov
95a265323a
feat(telemetry): enable PostHog GeoIP enrichment (#243)
Set disableGeoip: false on the CLI telemetry client so events are enriched with approximate, IP-based location at ingest. Update the first-run notice, public telemetry docs, and the AGENTS telemetry policy to drop the prior "anonymous" wording to match.
2026-05-30 18:33:14 +02:00
Andrey Avtomonov
2e5f7f25aa
feat: report MCP client telemetry (#242) 2026-05-30 18:00:25 +02:00
Andrey Avtomonov
25f639fba2
feat: trim MCP query response payloads (#240) 2026-05-30 17:54:24 +02:00
Andrey Avtomonov
cbbcf8e8bd
ci: normalize star-history.svg trailing newline (#241)
The star-history refresh workflow committed the API's SVG verbatim, but the
response has no trailing newline. Because the refresh commit uses [skip ci],
end-of-file-fixer never ran on the file at commit time. As a result,
pre-commit's `--all-files` run failed end-of-file-fixer on every open PR
(e.g. #240), even PRs that never touched the file.

Normalize the downloaded SVG to exactly one trailing newline in the workflow
(idempotent, so the "unchanged" guard still works), and fix the currently
committed file so open PRs go green now.
2026-05-30 17:44:27 +02:00
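The idempotent normalization in #241 is done in shell inside the workflow. The same idea can be sketched in TypeScript as below, where `normalizeTrailingNewline` is a hypothetical name used only for illustration.

```typescript
// Strip every trailing "\n", then append exactly one.
function normalizeTrailingNewline(text: string): string {
  return text.replace(/\n+$/, "") + "\n";
}
```

Because applying the function twice gives the same result as applying it once, a later "unchanged" comparison against the committed file keeps working.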
github-actions[bot]
54d6e87733 chore: refresh star history chart [skip ci] 2026-05-30 14:02:55 +00:00
Andrey Avtomonov
ba06f7078a
ci: push star-history refresh to protected main with RELEASE_PAT (#239)
The scheduled star-history workflow checked out with the default
GITHUB_TOKEN, so its git push to main was rejected by the branch
protection hook (GH006). Check out with RELEASE_PAT instead, matching
release.yml, whose semantic-release step already pushes to the protected
main branch with the same token.
2026-05-30 16:01:47 +02:00
Andrey Avtomonov
08d08d8ea0 ci: refresh README star history chart twice daily
Point the README chart at a committed assets/star-history.svg instead of
the star-history API URL so GitHub serves it directly and bypasses the Camo
proxy cache. A scheduled workflow regenerates the SVG at 06:00/18:00 UTC,
busting star-history's server-side cache, and commits it when it changes.
2026-05-30 12:07:15 +02:00
Andrey Avtomonov
53a6f8d111
fix(cli): treat artifact-producing ingests with failures as partial (#238)
* fix(cli): derive ingest outcomes from saved artifacts

* fix(cli): treat artifact-producing ingests with failures as partial

* fix(cli): route memory-flow run status through shared ingest outcome

* fix(cli): treat partial ingest as saved context in setup status

* test(cli): align memory-flow replay expectations with partial ingests
2026-05-30 00:42:59 +02:00
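One way to read the outcome rule in #238 is the small classifier below. It is a guess at the shape of the logic, not the CLI's shared ingest-outcome code: the type name, function name, and the choice to treat a run with no failures as a success are all assumptions.

```typescript
type IngestOutcome = "success" | "partial" | "failed";

// Hypothetical classifier based on counts of saved artifacts and failures.
function deriveIngestOutcome(savedArtifacts: number, failures: number): IngestOutcome {
  if (failures === 0) return "success";
  // Failures occurred: the run is partial only if something was still saved.
  return savedArtifacts > 0 ? "partial" : "failed";
}
```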
Andrey Avtomonov
3f0d11e07d
feat(cli)!: remove fast mode; ktx ingest always builds enriched context (KLO-721) (#237)
Fast mode (the ktx ingest --fast/--deep database-ingest depth toggle) is removed.
ktx ingest now always builds the full enriched ("deep") context. There is no
structural fallback: a database connection without a configured model and
embeddings fails the enrichment-readiness preflight before any work runs, with
a 'Run ktx setup to configure a model and embeddings' hint.

- Remove --fast/--deep flags, the per-connection context.depth field, and the
  ktx setup depth prompt (delete setup-database-context-depth.ts).
- Rename ingest-depth.ts -> connection-drivers.ts; ingest always requests scan
  mode 'enriched'; readiness gate (enrichmentReadinessGaps) runs for every
  database target.
- Drop the database-context-depth telemetry step (Node + Python schema mirrors
  regenerated).
- Update CLI, setup, context-build view, docs, the public ktx skill, and the
  release-smoke / artifacts scripts (now assert the no-LLM guard failure).

ktx status --fast (a separate network-probe flag) is unchanged.

Follow-ups: KLO-726 (live progress for ktx ingest --all), KLO-727 (restore
credentialed successful-ingest release smoke coverage).
2026-05-29 17:41:04 +02:00
Andrey Avtomonov
637891f030
fix(cli): align Notion setup credential to --source-auth-token-ref (#236)
Notion's setup path read --source-api-key-ref while writing the auth_token_ref
config field, so --source-auth-token-ref was silently dropped. Align Notion to
the flag=field convention every other connector follows: it now reads
--source-auth-token-ref, and --source-api-key-ref becomes Metabase-only.

Also add validation rejecting any credential-ref flag not applicable to the
chosen --source, with a pointer to the correct flag, closing the silent-drop
class for all connectors.

Update CLI-reference docs, the ktx skill Notion example, and tests.

Fixes KLO-724.
2026-05-29 17:23:46 +02:00
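The validation added in #236 could be sketched as a lookup from source type to its credential-ref flag. The table below is hypothetical and lists only the two mappings named in the commit message. The real CLI covers more connectors, and its error wording is not shown in the source.

```typescript
// Hypothetical table: only the mappings named in the commit message.
const CREDENTIAL_REF_FLAG: Record<string, string> = {
  notion: "--source-auth-token-ref",
  metabase: "--source-api-key-ref",
};

function validateCredentialRefFlags(source: string, passedFlags: string[]): string[] {
  const expected = CREDENTIAL_REF_FLAG[source];
  const knownRefFlags = new Set(Object.values(CREDENTIAL_REF_FLAG));
  const errors: string[] = [];
  for (const flag of passedFlags) {
    // Reject a known credential-ref flag that does not belong to this source.
    if (knownRefFlags.has(flag) && flag !== expected) {
      const hint = expected ? ` Use ${expected} instead.` : "";
      errors.push(`${flag} does not apply to --source ${source}.${hint}`);
    }
  }
  return errors;
}
```

Returning a list of messages instead of throwing lets the caller report every misplaced flag in one pass, which is what closes the silent-drop case.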
Andrey Avtomonov
8ebc4ce107
ci: stop tombi reformatting uv.lock and sync lock to 0.7.0 (#235)
The pre-commit job failed because tombi-format reformats uv.lock to a
layout uv does not produce. Once CI's uv sync re-resolved the stale lock
(workspace members still at 0.6.0) and rewrote it, tombi rewrote it back
and the hook reported a modified file.

Exclude uv.lock from tombi-format so uv stays authoritative for its
generated lockfile, and bump the workspace members to 0.7.0 so the lock
is current and uv stops re-resolving it (uv lock --check now passes).
2026-05-29 15:04:48 +02:00
Andrey Avtomonov
0a517b2c13
skill: document adding context sources; docs: one-shot full-demo path (#234)
- skills/ktx/SKILL.md: add an "Add context sources" section with the generic
  `ktx setup --source ...` flags per connector (dbt, Metabase, Notion, ...),
  warehouse mapping, the --metabase-database-id discovery note, and the
  `ktx ingest` follow-up. The skill previously only documented database setup
  with --skip-sources, so agents couldn't wire up dbt/Metabase/Notion (KLO-723).
- docs-site quickstart: the kaelio.com/start callout now points at the
  "copy agent setup" one-shot prompt that installs the full four-source demo.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 15:02:57 +02:00
Andrey Avtomonov
d53cdac366
chore: upgrade dependencies and tooling (#232)
* chore: upgrade dependencies and tooling

* chore: upgrade dependencies and tooling
2026-05-29 11:56:55 +02:00
semantic-release-bot
ed8f523362 chore(release): 0.7.0 [skip ci]
## [0.7.0](https://github.com/Kaelio/ktx/compare/v0.6.0...v0.7.0) (2026-05-28)

### Features

* **docs-site:** redirect ktx.sh/slack to Slack community invite ([#224](https://github.com/Kaelio/ktx/issues/224)) ([a94f358](a94f35800a))

### Bug Fixes

* **cli:** align ingest step counter with SDK num_turns ([#225](https://github.com/Kaelio/ktx/issues/225)) ([6837ab2](6837ab253d))
* **cli:** preserve project artifacts when ktx setup steps fail ([#229](https://github.com/Kaelio/ktx/issues/229)) ([c1ed5ee](c1ed5eedce))
* **docs-site:** disable Geist Mono ligatures on every font-mono surface ([#228](https://github.com/Kaelio/ktx/issues/228)) ([2a85346](2a85346613))

### Documentation

* add context layer terminology ([#226](https://github.com/Kaelio/ktx/issues/226)) ([27842e1](27842e14a9))
* add ktx skills.sh setup skill ([#227](https://github.com/Kaelio/ktx/issues/227)) ([39f94f3](39f94f39ff))
* **docs-site:** collapse agent setup explainer into a hover overlay ([#231](https://github.com/Kaelio/ktx/issues/231)) ([57b6071](57b607169f))
* **docs-site:** show setup prompt command in backticks ([00d5fd1](00d5fd1b0f))
* **docs-site:** tidy agent setup prompt copy and sizing ([35cecdf](35cecdf65d))
* **skills:** correct ktx setup skill against agent-trial findings ([#230](https://github.com/Kaelio/ktx/issues/230)) ([6c6a3e7](6c6a3e7baf))
2026-05-28 15:21:40 +00:00
Andrey Avtomonov
00d5fd1b0f docs(docs-site): show setup prompt command in backticks 2026-05-28 16:09:03 +02:00
Andrey Avtomonov
57b607169f
docs(docs-site): collapse agent setup explainer into a hover overlay (#231) 2026-05-28 16:05:19 +02:00
Andrey Avtomonov
6c6a3e7baf
docs(skills): correct ktx setup skill against agent-trial findings (#230)
An external agent ran the skill end-to-end against `ktx setup` and reported
seven concrete failures, all verified against the CLI source:

- All useful setup flags are `.hideHelp()`, so the skill's "verify with
  --help" rule led the agent to conclude its own examples were wrong
  (setup-commands.ts:208-332).
- The non-interactive LLM default is `anthropic` (and requires a key), not
  `claude-code` as the skill claimed (setup-models.ts:505-507).
- `ktx status` exits 1 whenever the LLM is `none`, even with healthy
  embeddings and connections (status-project.ts:204-211, doctor.ts:647).
- `ktx ingest` rejects `--yes`+`--no-input` while `ktx setup` accepts both
  (managed-python-command.ts:23-24).
- `--database-url <raw>` auto-externalizes to `.ktx/secrets/<id>-url` —
  worth telling the agent (setup-databases.ts:671-683).
- Resuming setup with only `--llm-backend` fails on missing DB flags even
  when `ktx.yaml` already has one (setup-databases.ts:1778-1782).
- The `--agents` step prints `Required before using agents: ktx mcp start`
  but the skill never told agents to run it (setup-agents.ts:989,1227).

Rewrite SKILL.md to: lead with the scripted (non-interactive) path; add a
single "gather inputs once" checklist; correct the LLM default; document
`--skip-*` flags and resumability; warn that `status` exit code ≠
readiness; fix the `ktx ingest` example to use `--no-input` only; require
`ktx mcp start` after `--agents`; add a ktx-monorepo branch that avoids
`npm install -g`.

Add skills/ktx/troubleshooting.md (one level deep, per Anthropic's
progressive-disclosure guidance) covering the five real failure signatures
the agent hit: invalid ELF header, missing native CLI binary, missing
Anthropic key, claude-code probe failure, and the resume-without-DB error.

Description rewritten to combine what + when per the official skill
authoring guidelines.
2026-05-28 15:36:56 +02:00
Andrey Avtomonov
35cecdf65d docs(docs-site): tidy agent setup prompt copy and sizing
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 15:30:51 +02:00
Andrey Avtomonov
c1ed5eedce
fix(cli): preserve project artifacts when ktx setup steps fail (#229)
ktx setup wiped ktx.yaml, .ktx/setup/state.json, wiki/, semantic-layer/,
raw-sources/, and .git/ — or removed the entire project dir — whenever any
single source in the context-build step failed, destroying hours of ingest
work and the persisted resume state. The cleanup hint was designed for an
"early abort, leave no trace" semantic but was applied indiscriminately to
every later step failure, in direct conflict with the .ktx/setup/state.json
resume mechanism.

Drop the cleanup mechanism entirely (KtxSetupCreatedProjectCleanup,
cleanupForFolderState, createProjectWithCleanup, cleanupCreatedProjectScaffold,
and the createdProjectCleanup plumbing through KtxSetupProjectResult). Step
failures now return non-zero without touching the filesystem, so re-running
ktx setup continues from completed steps and only re-attempts failed sources.

Rewrites the two tests that documented the wipe behavior to assert
preservation, and adds a regression test that simulates partial context-build
artifacts (state.json, wiki/, semantic-layer/) and verifies all survive a
failed context step.

Refs KLO-719
2026-05-28 15:17:06 +02:00
Andrey Avtomonov
b687167bc1 Route ktx stars dashboard 2026-05-28 13:00:49 +02:00
Andrey Avtomonov
2a85346613
fix(docs-site): disable Geist Mono ligatures on every font-mono surface (#228)
Geist Mono fuses `--` into an em-dash glyph that visually swallows the
adjacent space, so prompts like `npx skills add Kaelio/ktx --skill ktx`
rendered as `Kaelio/ktx--skill ktx` on the quickstart page. The existing
ligature-off rule only covered <code>/<pre> and the .ktx-code wrapper —
quickstart.mdx puts the prompt in a plain <div className="font-mono">,
so the rule didn't apply. Extend the selector to also match the
.font-mono Tailwind utility and any inline-style opt-in via the mono
font CSS variable.

Document the convention in AGENTS.md so future docs additions keep
ligatures off on any new monospace container.
2026-05-28 12:51:17 +02:00
Andrey Avtomonov
39f94f39ff
docs: add ktx skills.sh setup skill (#227) 2026-05-28 12:28:10 +02:00
Luca Martial
27842e14a9
docs: add context layer terminology (#226) 2026-05-28 05:58:08 -04:00
Andrey Avtomonov
6837ab253d
fix(cli): align ingest step counter with SDK num_turns (#225)
The Claude Code runtime counted every SDKAssistantMessage with
parent_tool_use_id === null as a step, but the SDK emits extra messages
within a single num_turns round-trip — `stop_reason: 'pause_turn'`
continuations and errored partials it retries internally. The local
counter then outran maxTurns and the ingest HUD rendered confusing
ratios like `step 69/40`.

Filter both cases in collectResult so stepIndex tracks num_turns and
stays bounded by the work-unit stepBudget.
2026-05-28 02:09:53 +02:00
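The filtering described in #225 can be sketched with a simplified message shape. The `AssistantMessage` interface and its field names below are hypothetical stand-ins (the real SDK types differ). The sketch only shows which messages would count as a step.

```typescript
// Hypothetical, simplified message shape; the real SDK types differ.
interface AssistantMessage {
  parentToolUseId: string | null;
  stopReason: string | null;
  isErroredPartial: boolean;
}

function countSteps(messages: AssistantMessage[]): number {
  return messages.filter(
    (message) =>
      message.parentToolUseId === null && // top-level messages only
      message.stopReason !== "pause_turn" && // continuation of the same turn
      !message.isErroredPartial, // partial that is retried internally
  ).length;
}
```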
Andrey Avtomonov
a94f35800a
feat(docs-site): redirect ktx.sh/slack to Slack community invite (#224)
Add a host-scoped redirect for /slack on ktx.sh before the existing
catch-all so the path resolves to the community invite link instead of
docs.kaelio.com/ktx/slack.
2026-05-27 18:20:51 +02:00
semantic-release-bot
5d74bd35de chore(release): 0.6.0 [skip ci]
## [0.6.0](https://github.com/Kaelio/ktx/compare/v0.5.0...v0.6.0) (2026-05-26)

### Features

* **cli:** skip-context-sources menu + clack-style tree picker UX ([#213](https://github.com/Kaelio/ktx/issues/213)) ([cfd1749](cfd1749ab9))
* **cli:** surface docs and demo-warehouse links in ktx setup ([#221](https://github.com/Kaelio/ktx/issues/221)) ([62699bf](62699bfe9d))
* **connectors:** generalize readiness and constraint handling ([#212](https://github.com/Kaelio/ktx/issues/212)) ([78b8a0c](78b8a0c025))

### Bug Fixes

* **ingest:** attribute historic-sql evidence writes in bundle report ([#220](https://github.com/Kaelio/ktx/issues/220)) ([1071f9d](1071f9d1c9))
* **scripts:** make package artifacts pnpm launch work on Windows ([2a6fb19](2a6fb19ba4))
* update ktx CI boundary checks ([#223](https://github.com/Kaelio/ktx/issues/223)) ([bc7373f](bc7373fa8e))

### Documentation

* ban ktx compatibility shims ([#214](https://github.com/Kaelio/ktx/issues/214)) ([a9db379](a9db3797e6))
* **readme:** restructure for clarity and add FAQ + comparison table ([#222](https://github.com/Kaelio/ktx/issues/222)) ([0eeac6f](0eeac6f980))
* standardize fanout terminology ([#218](https://github.com/Kaelio/ktx/issues/218)) ([9248688](924868841d))

### Code Refactoring

* remove legacy ktx compatibility shims ([#211](https://github.com/Kaelio/ktx/issues/211)) ([96952fb](96952fb43c))

### Tests

* split cli tests from source tree ([#216](https://github.com/Kaelio/ktx/issues/216)) ([56985b7](56985b7e09))

### Continuous Integration

* disable telemetry in workflows ([#217](https://github.com/Kaelio/ktx/issues/217)) ([4827437](4827437f3a))
2026-05-26 21:19:07 +00:00
Andrey Avtomonov
bc7373fa8e
fix: update ktx CI boundary checks (#223) 2026-05-26 23:03:47 +02:00
Andrey Avtomonov
0eeac6f980 docs(readme): restructure for clarity and add FAQ + comparison table (#222)
* docs(readme): restructure for clarity and add FAQ + comparison table

Restructure the README: trim Common Commands to the 6 essentials and link
to the CLI Reference, add a "How ktx compares" table and "Who is ktx for"
qualifier, introduce a small FAQ, wrap key prompts in GitHub callouts,
merge the duplicate workspace-layout section into Development, move
Telemetry next to License, and add a Star History chart.

* docs(readme): tighten Skip-ktx list and convert FAQ to bullets
2026-05-26 14:29:53 +02:00
Andrey Avtomonov
62699bfe9d
feat(cli): surface docs and demo-warehouse links in ktx setup (#221)
Add a Clack note pointing to https://docs.kaelio.com/ktx right after the
setup intro, and a second note pointing to https://kaelio.com/start
above the database driver multiselect — mirroring the docs-site CTA
wording. Closes KLO-715 and KLO-716.
2026-05-26 13:42:52 +02:00
Andrey Avtomonov
1071f9d1c9
fix(ingest): attribute historic-sql evidence writes in bundle report (#220)
The emit_historic_sql_evidence tool took rawPath as LLM-supplied input,
so projection actions frequently lacked defensible raw paths and every
row in bundle_ingest_reports fell through as actionType: 'skipped' with
null artifact metadata, hiding the wiki pages and SL merges the run had
actually produced (KLO-698).

The tool now reads the work unit's rawFiles from session.allowedRawPaths
and stores them on the evidence envelope; the projection emits actions
with those paths, and stale/archive actions are anchored to manifest.json
so they also surface as non-skipped provenance rows.
2026-05-26 12:21:53 +02:00
ARYAN
2a6fb19ba4
fix(scripts): make package artifacts pnpm launch work on Windows
Fix Windows package artifact script invocation under pnpm.
2026-05-26 12:16:53 +02:00
Andrey Avtomonov
56985b7e09
test: split cli tests from source tree (#216)
* feat(cli): define full warehouse dialect contract

* test(cli): keep dialect edge tests focused

* fix(cli): stabilize dialect contract foundation

* refactor(connectors): own read-only query preparation

* refactor(connectors): resolve dialects through registry

* refactor(connectors): keep concrete dialect classes internal

* chore(workspace): enforce dialect import boundary

* refactor(cli): resolve relationship dialect at scan boundary

* refactor(cli): use dialect display parsing for entity details

* refactor(cli): use dialect display parsing for warehouse catalog

* refactor(cli): use dialect SQL in relationship workflows

* test(cli): verify solid dialect scan workflow closure

* test: split cli tests from source tree

* refactor(cli): standardize BigQuery scope listing

* feat(sqlite): implement connector scope listing

* test(connectors): cover required table listing

* feat(cli): add warehouse driver registry

* refactor(setup): route scope discovery through driver registry

* refactor(cli): route local query execution through driver registry

* refactor(historic-sql): route dialect support through driver registry

* refactor(cli): test warehouse connections through driver registry

* fix(cli): close driver registry type export gaps

* Improve setup daemon diagnostics

* refactor(setup): centralize rail-prefixed diagnostics + query-history fallback

Extract errorMessage, writePrefixedLines, and flushPrefixedBufferedCommandOutput
into clack.ts so the setup wizard, managed daemons, and embedding/agent steps
share one rail-formatted writer. setup-databases.ts also adds a
"disable query history and retry" option when the schema-context build fails
and query history is the likely culprit, surfaced via a new
failed-query-history-unavailable status.

* fix(cli): carry catalog through the picker so BigQuery/Snowflake/SQL Server scope filters match

The setup picker's KtxTableListEntry was a 2-level { schema, name }, so
qualifiedTableId always wrote db.name into enabled_tables. When BigQuery,
Snowflake, or SQL Server later ran fast ingest, their introspect step filtered
the scope set with scopedTableNames(scope, { catalog: projectId|database, db }).
Catalog was non-null on the introspect side but null in the scope refs, so
every entry was rejected, the live-database adapter staged zero table files,
and detect() failed with 'Adapter "live-database" did not recognize fetched
source output'.

Align the picker boundary with the canonical 3-level KtxTableRef:

- Add catalog: string | null to KtxTableListEntry.
- BigQuery/Snowflake/SQL Server listTables populate catalog from the
  resolved projectId / database; Postgres/MySQL/ClickHouse/SQLite set null.
- qualifiedTableId emits catalog.schema.name when catalog is non-null
  (resolveEnabledTables already accepts the 3-part shape) and
  schemasFromEnabledTables now goes through parseDottedTableEntry so it
  recovers the schema correctly from both 2-part and 3-part entries.
- Export parseDottedTableEntry from enabled-tables.ts (@internal) for picker
  reuse.

Update listTables expectations in all seven connector tests and the setup /
picker test fixtures. Add a picker regression test that covers the
catalog-bearing round-trip (save + refine).

* fix(cli): allow debug telemetry under opt-out env
2026-05-26 08:49:05 +02:00
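The catalog round-trip from the picker fix in #216 can be sketched as follows. The names KtxTableListEntry, qualifiedTableId, and parseDottedTableEntry come from the commit message, but the bodies and the parser's return type are assumptions, and identifiers are assumed to contain no dots or quoting.

```typescript
interface KtxTableListEntry {
  catalog: string | null;
  schema: string;
  name: string;
}

// Emit catalog.schema.name when a catalog is present, otherwise schema.name.
function qualifiedTableId(entry: KtxTableListEntry): string {
  const parts =
    entry.catalog === null
      ? [entry.schema, entry.name]
      : [entry.catalog, entry.schema, entry.name];
  return parts.join(".");
}

// Assumption: plain dotted identifiers only, with no quoting or embedded dots.
function parseDottedTableEntry(value: string): KtxTableListEntry | null {
  const parts = value.split(".");
  if (parts.some((part) => part.length === 0)) return null;
  const [first, second, third] = parts;
  if (parts.length === 2 && first !== undefined && second !== undefined) {
    return { catalog: null, schema: first, name: second };
  }
  if (parts.length === 3 && first !== undefined && second !== undefined && third !== undefined) {
    return { catalog: first, schema: second, name: third };
  }
  return null;
}
```

Parsing through one shared function means the schema is recovered from the second-to-last segment in both shapes, instead of being mistaken for the catalog in 3-part entries.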
Luca Martial
924868841d
docs: standardize fanout terminology (#218) 2026-05-25 11:09:33 -04:00
Andrey Avtomonov
4827437f3a
ci: disable telemetry in workflows (#217) 2026-05-25 16:12:39 +02:00
Andrey Avtomonov
a9db3797e6
docs: ban ktx compatibility shims (#214) 2026-05-24 22:55:08 +02:00
Andrey Avtomonov
78b8a0c025
feat(connectors): generalize readiness and constraint handling (#212)
* feat(connectors): add postgres maxConnections

* feat(connectors): add mysql maxConnections

* feat(connectors): add sqlserver maxConnections

* feat(connectors): rename snowflake pool config

* docs: document connector maxConnections

* feat(scan): add constraint discovery warning helper

* feat(scan): carry structural warnings through reports

* feat(postgres): soft-fail denied constraint discovery

* feat(mysql): soft-fail denied constraint discovery

* feat(sqlserver): soft-fail denied constraint discovery

* feat(bigquery): soft-fail denied primary key discovery

* feat(snowflake): report denied primary key discovery

* test(scan): verify constraint discovery warnings

* feat(historic-sql): use shared readiness probes

* docs: document query history readiness probes

* test(historic-sql): verify readiness probe registry

* test(ingest): account for live database warnings artifact

* Add skip option for agent setup
2026-05-24 19:30:06 +02:00
Andrey Avtomonov
cfd1749ab9
feat(cli): skip-context-sources menu + clack-style tree picker UX (#213)
* feat(cli): add 'skip context sources' option to database setup menu

After databases are configured, the post-setup menu now offers a 'Skip
context sources' choice equivalent to passing --skip-sources, which
plumbs through KtxSetupDatabasesResult.skipSources to bypass the
context-source step in the same run.

* feat(cli): standardize tree picker UX after clack autocomplete-multiselect

Search is always on (no '/' to enter): typed printable chars feed the
query, Tab toggles selection on the focused node without leaving the
search bar, and Space toggles only after arrow-key navigation
(isNavigating); otherwise it is appended to the query. Esc clears a
non-empty query before quitting, Ctrl+A and Ctrl+N replace bare-letter
bulk bindings, and the cursor refocuses on the first match when the
query change would hide it.
2026-05-24 19:29:37 +02:00
Andrey Avtomonov
96952fb43c
refactor: remove legacy ktx compatibility shims (#211)
* refactor: remove legacy ktx compatibility shims

* fix: restore overlay collision guidance
2026-05-24 16:57:23 +02:00
807 changed files with 32434 additions and 8962 deletions


@ -10,6 +10,11 @@ on:
permissions:
contents: read
env:
DO_NOT_TRACK: "1"
KTX_TELEMETRY_DISABLED: "1"
NEXT_TELEMETRY_DISABLED: "1"
concurrency:
group: ktx-ci-${{ github.ref }}
cancel-in-progress: true
@ -212,7 +217,7 @@ jobs:
flags: typescript
name: typescript
disable_search: true
- fail_ci_if_error: true
+ fail_ci_if_error: false
- name: Warn when Codecov token is missing for TypeScript
if: env.CODECOV_TOKEN_CONFIGURED != 'true'
@ -231,7 +236,7 @@ jobs:
flags: python
name: python
disable_search: true
- fail_ci_if_error: true
+ fail_ci_if_error: false
- name: Warn when Codecov token is missing for Python
if: env.CODECOV_TOKEN_CONFIGURED != 'true'


@ -26,6 +26,11 @@ permissions:
contents: write
id-token: write
env:
DO_NOT_TRACK: "1"
KTX_TELEMETRY_DISABLED: "1"
NEXT_TELEMETRY_DISABLED: "1"
concurrency:
group: ktx-release-${{ github.ref }}
cancel-in-progress: false

.github/workflows/star-history.yml (new file, 72 lines added)

@ -0,0 +1,72 @@
name: Refresh star history chart

on:
  schedule:
    # Twice daily at 06:00 and 18:00 UTC.
    - cron: "0 6,18 * * *"
  workflow_dispatch:

permissions:
  contents: write

env:
  DO_NOT_TRACK: "1"
  KTX_TELEMETRY_DISABLED: "1"
  NEXT_TELEMETRY_DISABLED: "1"

concurrency:
  group: star-history-refresh
  cancel-in-progress: true

jobs:
  refresh:
    name: Regenerate assets/star-history.svg
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
        with:
          # RELEASE_PAT can push to the protected main branch; the default
          # GITHUB_TOKEN is rejected by the branch-protection hook (GH006).
          token: ${{ secrets.RELEASE_PAT }}

      - name: Fetch fresh star-history SVG
        run: |
          set -euo pipefail
          # cachebust forces star-history to regenerate instead of serving its
          # own server-side cache; --location follows the slug-normalizing 301.
          url="https://api.star-history.com/svg?repos=Kaelio/ktx&type=Date&cachebust=${GITHUB_RUN_ID}-${GITHUB_RUN_ATTEMPT}"
          curl --fail --location --silent --show-error \
            --retry 3 --retry-delay 5 --max-time 60 \
            -o assets/star-history.svg.new "$url"
          # Guard against error pages / truncated responses before overwriting.
          if ! grep -q "</svg>" assets/star-history.svg.new; then
            echo "Downloaded file is not a valid SVG; aborting." >&2
            exit 1
          fi
          if [ "$(wc -c < assets/star-history.svg.new)" -lt 1000 ]; then
            echo "Downloaded SVG is suspiciously small; aborting." >&2
            exit 1
          fi
          # The star-history API returns the SVG without a trailing newline,
          # which end-of-file-fixer rewrites whenever pre-commit runs
          # --all-files on a PR. Because the refresh commit below uses [skip ci],
          # the hook never runs against it here, so an un-normalized file
          # silently breaks the pre-commit check on every open PR. Normalize to
          # exactly one trailing newline before committing.
          printf '%s\n' "$(cat assets/star-history.svg.new)" > assets/star-history.svg
          rm -f assets/star-history.svg.new

      - name: Commit if changed
        run: |
          set -euo pipefail
          if git diff --quiet -- assets/star-history.svg; then
            echo "Star-history chart unchanged; nothing to commit."
            exit 0
          fi
          git config user.name "github-actions[bot]"
          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
          git add assets/star-history.svg
          # [skip ci] keeps this housekeeping commit from triggering KTX CI.
          git commit -m "chore: refresh star history chart [skip ci]"
          git push


@ -7,6 +7,11 @@ on:
permissions:
issues: write
env:
DO_NOT_TRACK: "1"
KTX_TELEMETRY_DISABLED: "1"
NEXT_TELEMETRY_DISABLED: "1"
jobs:
label-external:
name: Add needs-triage to external issues
@ -17,7 +22,7 @@ jobs:
github.event.issue.author_association != 'COLLABORATOR'
steps:
- name: Apply needs-triage label
- uses: actions/github-script@v7
+ uses: actions/github-script@v9
with:
script: |
await github.rest.issues.addLabels({


@ -14,6 +14,18 @@ repos:
- id: check-case-conflict
- id: mixed-line-ending
- repo: https://github.com/tombi-toml/tombi-pre-commit
rev: v1.1.0
hooks:
- id: tombi-format
args: ["--offline"]
# uv.lock is generated and owned by uv, which writes its own canonical
# TOML layout. tombi reformats that layout differently, so once uv
# regenerates the lock (e.g. after a dependency or version change)
# tombi rewrites it and the hook fails on the modified file. Keep uv
# authoritative for its lockfile; tombi still formats hand-edited TOML.
exclude: ^uv\.lock$
- repo: https://github.com/asottile/pyupgrade
rev: v3.21.2
hooks:

127
AGENTS.md

@ -24,6 +24,11 @@ database migrations, ORPC contracts, or `python-service/` layout exist here.
- **MUST**: Keep package/public API changes intentional. Do not add compatibility
wrappers for old **ktx** names unless the user explicitly asks for a migration
bridge.
- **MUST**: Avoid compatibility shims for old **ktx** features, command shapes,
configuration formats, or internal APIs. This rule does not prohibit
compatibility support for third-party systems and libraries, such as
Metabase version differences. Keep the **ktx** codebase clean instead of
preserving stale **ktx** behavior.
- **MUST**: Treat **ktx** as having no public users unless the user says otherwise.
Legacy support is not necessary by default; prefer clean breaking changes over
compatibility shims, migration bridges, or preserved stale behavior.
@ -154,6 +159,65 @@ and naming asymmetries are bugs in waiting — see
[`docs/code-design.md`](docs/code-design.md). Treat the `MUST` / `MUST NOT`
rules there with the same weight as the ones in this file.
## Design Reasoning Defaults
When proposing a design, an approach, or any non-trivial change, apply these
defaults and run the self-check before presenting it. They encode the
corrections users most often have to make; reaching these conclusions
autonomously — without being asked the leading question — is the bar.
- **MUST**: Optimize for the best outcome, not for an unstated constraint. Do not
silently adopt "smallest change", "least effort", "cheapest", or "least user
intervention" as the goal unless the user said so. Default to the most correct,
durable solution, and present cost / effort / scope as information for the user
to weigh — not as a ceiling you impose on their behalf.
- **MUST**: Separate one-time cost from recurring cost before discarding an
option. A fixed cost paid once (a setup-time computation, an extra LLM call
during setup, a contract change) to make every later run cheaper or more
correct is usually worth it. Do not reject it with recurring-cost reasoning;
quantify both sides. (Example smell: "don't add an LLM call to a cost-cutting
feature" — wrong when the call is one-time and the savings recur.)
- **MUST**: Treat a user's example as a representative of a class, not as the
spec. Design for the general population the example stands for, then stress-test
against deliberately different instances — another warehouse, dialect, stack
layout, or input shape — before committing. If a design only works because of an
incidental property of the example (e.g. "the noise happened to be in a separate
schema *on this demo*"), it is curve-fitting; generalize it or state the
assumption explicitly.
- **MUST**: Prefer deriving from the system's own state over enumerating cases.
Favor an allowlist computed from declared/observed state (config, scanned
catalog, query log, the user's own inputs) over a denylist of known-bad
specifics (particular tables, schemas, tools, or vendors). A hardcoded or
hand-maintained list of external specifics is a smell: it rots and fails on the
next stack. The only acceptable static patterns are genuinely universal
invariants (e.g. DB-engine system catalogs) and ktx's own self-emitted
signatures.
- **SHOULD**: Before inventing an abstraction or hand-rolling structural logic,
search for what already exists and reuse it — the codebase's canonical
representation (a structured ref/key type) instead of a parallel string scheme,
and a mandated/available tool (e.g. `sqlglot` for SQL structure; see
[SQL and Structured Parsing](#sql-and-structured-parsing)) instead of
hand-parsing. Normalize ambiguous input to the canonical form at the boundary;
do not carry the ambiguity downstream. This is the single-source-of-truth / DRY
item from the Priority Hierarchy applied at design time.
Before presenting a design, answer these explicitly:
1. Am I optimizing for a goal the user actually stated, or one I assumed?
2. Does this generalize beyond the example in front of me? Name a real case where
it would break.
3. Am I enumerating known-bad cases when I could derive scope from the system's
own declared/observed state?
4. Is there an existing canonical representation or mandated tool I should reuse
instead of building or parsing my own?
5. Am I discarding the better option on a weak or misapplied constraint
(one-time vs recurring cost, "more surface area", "more work now")?
A user question that nudges toward an alternative ("would X help?", "should I
always do Y?", "will you hardcode Z?") is a signal that a better option exists.
Investigate the implied direction and reason it through *before* defending the
original proposal — and prefer to have asked yourself the question first.
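The "derive from the system's own state" default can be illustrated with a short TypeScript sketch. Every name below (`ObservedState`, `TableRef`, `deriveAllowedSchemas`) is hypothetical and exists only for this example; it assumes scope is decided per schema, which may not match how **ktx** actually scopes ingestion.

```typescript
// All names here are hypothetical and exist only for illustration.
interface TableRef {
  schema: string;
  table: string;
}

interface ObservedState {
  configuredSchemas: string[]; // declared in project config
  queriedTables: TableRef[];   // structured refs observed in the query log
}

// Genuinely universal invariants (engine system catalogs) may stay static.
const SYSTEM_SCHEMAS = new Set<string>(["information_schema", "pg_catalog"]);

export function deriveAllowedSchemas(state: ObservedState): Set<string> {
  const allowed = new Set<string>();
  // Allowlist = what the user declared plus what the system observed.
  state.configuredSchemas.forEach((schema) => allowed.add(schema.toLowerCase()));
  state.queriedTables.forEach((ref) => allowed.add(ref.schema.toLowerCase()));
  // Only the universal invariants are removed; no vendor- or demo-specific list.
  SYSTEM_SCHEMAS.forEach((schema) => allowed.delete(schema));
  return allowed;
}
```

Nothing in this sketch names a particular noisy table, tool, or vendor, so it should keep working on a different warehouse without edits. It also takes structured refs rather than parsing `schema.table` strings, in line with the reuse-the-canonical-representation item.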
## TypeScript Standards
- Use Node 22+ and pnpm workspace commands.
@ -273,7 +337,8 @@ use `PascalCase` without the suffix.
## Telemetry
**ktx** ships anonymous PostHog telemetry. When adding commands or events:
**ktx** ships PostHog usage telemetry. Catalog telemetry events use strict
schemas. When adding commands or events:
- **MUST NOT**: Add fields that carry user data — file paths, hostnames,
environment values, SQL text, schema/table/column names, error messages,
@ -290,6 +355,24 @@ use `PascalCase` without the suffix.
of collected data changes. Adding another event with no new field types
needs no docs change.
### Error reports
**ktx** also sends PostHog Error Tracking `$exception` events when telemetry is
enabled. This channel is separate from the strict catalog event schema and is
used only for exception diagnostics.
`$exception` events may include stack frames, error class names, raw error
messages, cause chains, `source`, `handled`, `fatal`, runtime version fields,
OS/runtime fields, and the hashed `projectId` when known. Stack frames may
include local file paths and the local username when those appear in paths.
`$exception` events must never intentionally include secrets, credentials,
database URLs, auth headers, raw argv, raw environment values, SQL text,
schema/table/column names as explicit properties, customer row data, user prompt
text, or raw MCP arguments. Reporters must redact call-site-provided secret
snapshots and common static credential patterns before the SDK serializes the
exception.
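As a rough illustration of the redaction requirement above, a reporter might scrub an error message as sketched below. This is an assumption-laden example, not the reporter in the codebase: `redactMessage`, the `[REDACTED]` placeholder, and both patterns are hypothetical, and a real pattern list would need to be broader.

```typescript
// Hypothetical patterns and helper; the real reporter's rules are not shown in this diff.
const CREDENTIAL_PATTERNS: RegExp[] = [
  // URLs that embed user:password@host, e.g. database connection strings.
  /\b[a-z][a-z0-9+.-]*:\/\/[^\s\/:@]+:[^\s\/@]+@[^\s]+/gi,
  // Bearer tokens as they appear in auth headers.
  /\bBearer\s+[A-Za-z0-9._~+\/=-]+/g,
];

export function redactMessage(message: string, secrets: string[] = []): string {
  let out = message;
  // 1. Call-site-provided secret snapshots: exact values known to be sensitive.
  for (const secret of secrets) {
    if (secret.length > 0) out = out.split(secret).join("[REDACTED]");
  }
  // 2. Common static credential patterns.
  for (const pattern of CREDENTIAL_PATTERNS) {
    out = out.replace(pattern, "[REDACTED]");
  }
  return out;
}
```

Running this before the SDK serializes the exception matters because the raw message and the cause chain are both sent as-is afterwards.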
## Documentation and Specs
- Keep public documentation in `README.md`, package READMEs, example READMEs,
@ -318,6 +401,26 @@ use `PascalCase` without the suffix.
source-code identifier, package/API name, or other literal value that must
match the implementation.
### Product Category Naming
- **MUST**: Use **context layer** as the primary public category for **ktx**.
Preferred phrase: `context layer for data agents`.
- **MUST**: Use **context engine** only as the secondary mechanism term for the
active system that builds, reconciles, validates, searches, and serves the
context layer.
- **MUST**: Keep **semantic layer** as the narrower term for executable metric
definitions, semantic sources, joins, measures, and SQL compilation.
- **MUST NOT**: Replace every `semantic layer` occurrence with `context layer`;
the semantic layer is one pillar inside the broader context layer.
Preferred pattern:
```md
**ktx** is an open-source context layer for data agents. Its context engine
ingests warehouse metadata, BI definitions, query history, docs, and approved
metrics, then turns them into reviewable files agents can search and execute.
```
### Terminology
For canonical vocabulary used across docs, code, comments, CLI strings, and
@ -325,8 +428,9 @@ error messages — including the disambiguation rule for the overloaded word
`source` (semantic / primary / context / source of truth) — see
[`docs/terminology.md`](docs/terminology.md). Follow that file when choosing
between near-synonyms (e.g. `connector` vs `adapter`, `data agent` vs
`database agent`, `fast ingest` vs `schema ingest`). Product-name rules in
this section take precedence over anything in that file when they conflict.
`database agent`, `context-source ingest` vs `source ingest`). Product-name
rules in this section take precedence over anything in that file when they
conflict.
### Updating `docs-site/` After Code Changes
@ -350,6 +454,23 @@ that do not change user-facing behavior. When you do update docs, follow the
warrants docs but you are out of scope, call it out in your final summary
rather than silently skipping it.
#### Monospace ligatures in `docs-site/`
- **MUST**: Disable monospace ligatures on every surface that uses the
`var(--font-mono)` family (Geist Mono). Geist Mono fuses `--` into an
em-dash glyph that visually eats the adjacent space, so prompts like
`npx skills add Kaelio/ktx --skill ktx` render as
`Kaelio/ktx--skill ktx`.
- **MUST**: When adding a new container that renders user-visible monospace
text outside `<code>` / `<pre>` (e.g. a styled `<div className="font-mono">`
for a copyable prompt), verify the global ligature-off rule in
`docs-site/app/global.css` covers its selector. Either use Tailwind's
`font-mono` utility (already covered) or extend the rule to match the new
class — do not silently rely on Geist Mono's defaults.
- **SHOULD**: Prefer `<code>` / `<pre>` (or a `font-mono` wrapper) for any
string that contains CLI flags, paths, or other tokens with `--`, `->`,
`>=`, `!=`, `==`, `//` so ligatures never alter intent.
## LLM and Prompt Development
When creating or modifying agent prompts, system prompts, tool descriptions, or

239
README.md

@ -13,7 +13,18 @@
<a href="https://docs.kaelio.com/ktx/docs/"><img src="https://img.shields.io/badge/docs-ktx-22c55e?style=flat-square" alt="Documentation" /></a>
<a href="https://join.slack.com/t/ktxcommunity/shared_invite/zt-3y9b44m1x-LVyNNJD5nwaZHq4XS29LMQ"><img src="https://img.shields.io/badge/slack-join%20community-4A154B?style=flat-square&logo=slack&logoColor=white" alt="Join the ktx Slack community" /></a>
<a href="https://github.com/Kaelio/ktx/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-Apache%202.0-blue?style=flat-square" alt="License" /></a>
<a href="https://www.ycombinator.com/companies?batch=P25"><img src="https://img.shields.io/badge/Y%20Combinator-P25-orange?style=flat-square" alt="Y Combinator P25" /></a>
<a href="https://www.ycombinator.com/companies/kaelio"><img src="https://img.shields.io/badge/Y%20Combinator-P25-orange?style=flat-square" alt="Y Combinator P25" /></a>
</p>
<p align="center">
<a href="https://docs.kaelio.com/ktx/docs/getting-started/quickstart"><b>Quickstart</b></a> ·
<a href="https://docs.kaelio.com/ktx/docs/cli-reference/ktx"><b>CLI Reference</b></a> ·
<a href="https://docs.kaelio.com/ktx/docs/ai-resources/agent-quickstart"><b>Agent Setup</b></a> ·
<a href="https://join.slack.com/t/ktxcommunity/shared_invite/zt-3y9b44m1x-LVyNNJD5nwaZHq4XS29LMQ"><b>Slack</b></a>
</p>
<p align="center">
<sub>Built and maintained by <a href="https://www.kaelio.com"><b>Kaelio</b></a></sub>
</p>
---
@ -22,11 +33,25 @@
warehouse accurately - from approved metric definitions, joinable columns, and
business knowledge it builds and maintains for you.
Works with PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, SQL Server, and
SQLite. Integrates with dbt, MetricFlow, LookML, Looker, Metabase, and Notion.
> [!NOTE]
> Run **ktx** with your own LLM API keys or a local agent sign-in — a
> **Claude Pro/Max** subscription through Claude Code, or your local Codex
> authentication. No extra usage billing from **ktx**.
<p align="center">
<a href="https://youtu.be/5V4TuzYVlrA">
<img src="assets/launch-video-thumb.png" alt="Watch the ktx launch video (1:56)" width="820" />
</a>
</p>
<p align="center">
<img src="docs-site/public/images/ingestion-flow.png" alt="Ingestion: ktx ingests databases, BI tools, modeling code, and docs through its context engine (source connectors, context builder, reconciliation, validation) into wiki Markdown and semantic-layer YAML" width="900" />
</p>
<p align="center">
<img src="docs-site/public/images/mcp-runtime-flow.png" alt="Serving: an agent queries ktx through MCP, which searches the wiki and semantic layer, returns approved metrics, and compiles them into read-only SQL run against the warehouse" width="900" />
</p>
Runs with your own LLM API keys or a **Claude
Pro/Max subscription - no extra usage billing from** **ktx**.
## Why ktx
@ -51,23 +76,35 @@ upkeep and don't absorb the rest of your company's knowledge.
- **Serves agents at execution.** Exposes CLI and MCP tools with combined
full-text and semantic search across wiki and semantic-layer entities.
Agents can run raw SQL when they need it, or compose semantic-layer queries
when they want approved metrics with reliable joins.
## How ktx compares
<p align="center">
<img src="docs-site/public/images/ingestion-flow-transparent.svg" alt="ktx ingestion flow from source systems through validation to wiki and semantic-layer outputs" width="900" />
</p>
| | General-purpose agent | Traditional semantic layer | **ktx** |
| --- | :---: | :---: | :---: |
| Builds warehouse context automatically | — | — | ✓ |
| Detects joinable columns + resolves fan/chasm traps | — | Manual | ✓ |
| Approved, reusable metric definitions | — | ✓ | ✓ |
| Absorbs wiki / Notion / team knowledge | — | — | ✓ |
| Flags contradictions across sources | — | — | ✓ |
| Ships CLI + MCP for agent execution | Partial | — | ✓ |
| Read-only by design | n/a | n/a | ✓ |
## Agent Setup
## Who is ktx for
Ask an agent such as Claude Code, Codex, Cursor, or OpenCode to install and
configure **ktx** from your project directory:
**Use ktx if you:**
```text
Follow instructions from
https://docs.kaelio.com/ktx/docs/agents-setup.md
to install and configure ktx
```
- Want agents like Claude Code, Codex, Cursor, or OpenCode to query your
warehouse with approved metric definitions
- Have business knowledge scattered across dbt, Looker, Metabase, Notion, and
team wikis
- Need agents to reuse canonical SQL instead of inventing it on every prompt
**Skip ktx if you:**
- Don't have a SQL warehouse - **ktx** sits on top of one
- Only need one ad-hoc query - `psql` or a notebook will do
Works with PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, SQL Server, and
SQLite. Integrates with dbt, MetricFlow, LookML, Looker, Metabase, and Notion.
## Quick Start
@ -77,10 +114,10 @@ ktx setup
ktx status
```
`ktx setup` creates or resumes a local **ktx** project, configures providers and
connections, builds context, and installs agent integration.
`ktx setup` creates or resumes a local **ktx** project, configures providers
and connections, builds context, and installs agent integration.
Example `ktx status` output after setup:
Example `ktx status` after setup:
```text
ktx project: /home/user/analytics
@ -93,38 +130,32 @@ ktx context built: yes
Agent integration ready: yes (codex:project)
```
## Telemetry
> [!TIP]
> Already using an agent? Ask Claude Code, Codex, Cursor, or OpenCode from
> your project directory:
>
> ```text
> Run npx skills add Kaelio/ktx --skill ktx and use the ktx skill to install
> and configure ktx in this project.
> ```
**ktx** collects anonymous usage telemetry from interactive CLI runs to improve
setup, command reliability, and data-agent workflows. See
[Telemetry](https://docs.kaelio.com/ktx/docs/community/telemetry) for the event
catalog, privacy details, and opt-out options.
> [!IMPORTANT]
> If `ktx status` prints `ktx mcp start --project-dir ...`, run it before
> opening your agent client.
## Common Commands
## First commands
| Command | Purpose |
|---------|---------|
| --- | --- |
| `ktx setup` | Create, resume, or update a **ktx** project |
| `ktx status` | Check project readiness |
| `ktx connection` | List configured connections |
| `ktx connection test` | Test every configured connection |
| `ktx connection test <id>` | Test one connection |
| `ktx ingest` | Build context for every configured connection |
| `ktx ingest <id>` | Build context for one connection |
| `ktx ingest --text "..."` | Capture free-form notes into memory |
| `ktx ingest --file notes.md --connection-id <id>` | Capture a text file into memory |
| `ktx sl` | List semantic sources |
| `ktx sl "revenue"` | Search semantic sources |
| `ktx sl validate <source> --connection-id <id>` | Validate a semantic source |
| `ktx sl query --measure <measure> --format sql` | Compile semantic-layer SQL |
| `ktx sql --connection <id> "select 1"` | Execute read-only SQL |
| `ktx wiki` | List local wiki pages |
| `ktx wiki "revenue definition"` | Search local wiki pages |
| `ktx mcp` | Show MCP daemon status |
| `ktx mcp start` | Start the local MCP server for agent clients |
| `ktx wiki "refund policy"` | Search local wiki pages |
| `ktx mcp start` | Start the MCP server for agent clients |
Project resolution defaults to `KTX_PROJECT_DIR`, then the nearest `ktx.yaml`,
then the current directory. Pass `--project-dir <path>` when scripting.
See the [CLI Reference](https://docs.kaelio.com/ktx/docs/cli-reference/ktx)
for every command, flag, and option.
## Project Layout
@ -140,45 +171,44 @@ my-project/
Commit `ktx.yaml`, `semantic-layer/`, and `wiki/`. Keep `.ktx/` local.
## Agent Usage
Project resolution defaults to `KTX_PROJECT_DIR`, then the nearest `ktx.yaml`,
then the current directory. Pass `--project-dir <path>` when scripting.
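The resolution order described above can be sketched as follows. This is a hedged illustration rather than the CLI's implementation: `resolveProjectDir` and the injected `fileExists` callback are hypothetical, "nearest `ktx.yaml`" is assumed to mean walking up parent directories from the working directory, and an explicit `--project-dir` is assumed to win over everything else.

```typescript
// Hypothetical sketch; `fileExists` is injected so the example performs no I/O.
// Absolute POSIX-style paths are assumed.
export function resolveProjectDir(
  env: Record<string, string | undefined>,
  cwd: string,
  fileExists: (path: string) => boolean,
  explicit?: string,
): string {
  // Assumption: --project-dir <path> overrides the defaults.
  if (explicit) return explicit;
  // 1. KTX_PROJECT_DIR from the environment.
  if (env.KTX_PROJECT_DIR) return env.KTX_PROJECT_DIR;
  // 2. Nearest ktx.yaml, walking up from the current directory.
  let dir = cwd;
  while (true) {
    const candidate = dir === "/" ? "/ktx.yaml" : `${dir}/ktx.yaml`;
    if (fileExists(candidate)) return dir;
    if (dir === "/" || dir === "") break;
    const cut = dir.lastIndexOf("/");
    dir = cut <= 0 ? "/" : dir.slice(0, cut);
  }
  // 3. Fall back to the current directory.
  return cwd;
}
```

Because the defaults depend on the environment and the working directory, scripts that pass the directory explicitly avoid picking up an unrelated project by accident.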
Install **ktx** integration for Claude Code, Claude Desktop, Codex, Cursor,
OpenCode, and generic `.agents` clients:
## FAQ
```bash
ktx setup --agents
```
- **Does ktx send my schema or query results to a hosted service?**
No. **ktx** runs locally. The only data leaving your machine is what you
send to the LLM provider you configured.
- **Which LLM backends are supported?**
Anthropic API, Google Vertex AI, AI Gateway, the local Claude Code session
through the Claude Agent SDK, and your local Codex authentication through the
Codex SDK. See
[LLM configuration](https://docs.kaelio.com/ktx/docs/guides/llm-configuration).
- **How is ktx different from a dbt or MetricFlow semantic layer?**
**ktx** *ingests* those layers and combines them with raw-table
introspection and wiki content. Agents get one searchable surface instead
of three disconnected ones - and **ktx** flags contradictions across
sources.
- **Does ktx need a running server?**
There is no hosted service. The local MCP daemon runs on demand via
`ktx mcp start` when an agent client needs it.
- **Is my warehouse safe?**
Yes. Connections are read-only - **ktx** never writes to your database.
Pass `--target <target>` to install or repair one specific integration.
## Docs
A typical agent workflow combines wiki and semantic-layer search before
querying:
- [Quickstart](https://docs.kaelio.com/ktx/docs/getting-started/quickstart)
- [The Context Layer](https://docs.kaelio.com/ktx/docs/concepts/the-context-layer)
- [Building Context](https://docs.kaelio.com/ktx/docs/guides/building-context)
- [CLI Reference](https://docs.kaelio.com/ktx/docs/cli-reference/ktx)
- [Agent Quickstart](https://docs.kaelio.com/ktx/docs/ai-resources/agent-quickstart)
- [Community & Support](https://docs.kaelio.com/ktx/docs/community/support)
```bash
ktx sl "revenue" --json
ktx wiki "refund policy" --json
ktx sl query --connection-id warehouse --measure orders.revenue --format sql
```
## Community
During setup, choose **Ask data questions with ktx MCP** for agent clients.
Choose **Ask data questions + manage ktx with CLI commands** when an operator
agent also needs pinned `ktx` admin commands.
After setup, **ktx** prints **Required before using agents** with the exact
commands to run. If the output includes `ktx mcp start --project-dir ...`, run
it before opening your agent. Claude Desktop uses its own launcher and prints
separate skill upload steps under `.ktx/agents/claude/`.
## Workspace layout
| Path | Purpose |
|------|---------|
| `packages/cli` | TypeScript CLI package and published npm package source |
| `packages/cli/src/context` | Core context engine |
| `packages/cli/src/llm` | LLM and embedding providers |
| `packages/cli/src/connectors` | Database scan connectors |
| `python/ktx-sl` | Semantic-layer query planning |
| `python/ktx-daemon` | Portable compute service |
- **[Slack](https://join.slack.com/t/ktxcommunity/shared_invite/zt-3y9b44m1x-LVyNNJD5nwaZHq4XS29LMQ)** — ask questions, share what you're building, and chat with maintainers.
- **[GitHub Issues](https://github.com/Kaelio/ktx/issues)** — report bugs and request features.
- **[Contributing](https://docs.kaelio.com/ktx/docs/community/contributing)** — set up the repo, run tests, and open a PR.
## Development
@ -191,7 +221,18 @@ pnpm run build
pnpm run check
```
Use the development CLI locally:
**ktx** is a pnpm + uv workspace:
| Path | Purpose |
| --- | --- |
| `packages/cli` | TypeScript CLI and published npm package source |
| `packages/cli/src/context` | Core context engine |
| `packages/cli/src/llm` | LLM and embedding providers |
| `packages/cli/src/connectors` | Database scan connectors |
| `python/ktx-sl` | Semantic-layer query planning |
| `python/ktx-daemon` | Portable compute service |
Local development CLI:
```bash
pnpm run setup:dev
@ -199,13 +240,6 @@ pnpm run link:dev
ktx-dev --help
```
**ktx** is a pnpm + uv workspace:
- TypeScript packages live in `packages/*`
- CLI source lives in `packages/cli`
- Python runtime source lives in `python/ktx-sl` and `python/ktx-daemon`
- Public docs live in `docs-site/content/docs`
Useful checks:
```bash
@ -215,23 +249,28 @@ pnpm run dead-code
uv run pytest -q
```
## Docs
## Telemetry
- [Quickstart](docs-site/content/docs/getting-started/quickstart.mdx)
- [CLI Reference](docs-site/content/docs/cli-reference/ktx.mdx)
- [Building Context](docs-site/content/docs/guides/building-context.mdx)
- [Community & Support](docs-site/content/docs/community/support.mdx)
- [Contributing](docs-site/content/docs/community/contributing.mdx)
## Community
- **[Slack](https://join.slack.com/t/ktxcommunity/shared_invite/zt-3y9b44m1x-LVyNNJD5nwaZHq4XS29LMQ)** — ask questions, share what you're building, and chat with maintainers and other users.
- **[GitHub Issues](https://github.com/Kaelio/ktx/issues)** — report bugs and request features.
- **[Contributing guide](docs-site/content/docs/community/contributing.mdx)** — set up the repo, run tests, and open a PR.
See [Community & Support](docs-site/content/docs/community/support.mdx) for the
full guide on where to ask what.
**ktx** collects privacy-conscious usage telemetry to understand installs and
improve setup, command reliability, and data-agent workflows. Catalog telemetry
events do not record file paths, hostnames, SQL, schema names, table names,
column names, error messages, raw environment values, or argv. Error reports use
PostHog Error Tracking and can include stack frames and raw error messages,
which may contain local file paths or the local username in those paths.
**ktx** redacts secrets, credentials, database URLs, auth headers, argv, raw
environment values, SQL text, row data, and user-typed prompt or MCP argument
text from the explicit `$exception` payload. See
[Telemetry](https://docs.kaelio.com/ktx/docs/community/telemetry) for the event
catalog and opt-out options.
## License
**ktx** is licensed under the Apache License, Version 2.0. See `LICENSE`.
## Star History
<p align="center">
<a href="https://star-history.com/#Kaelio/ktx&Date">
<img src="assets/star-history.svg" alt="ktx Star History Chart" width="700" />
</a>
</p>


@ -19,14 +19,9 @@
<path d="M 80 84 Q 86 77 92 84" fill="none" stroke="#F5F1EA" stroke-width="3.5" stroke-linecap="round" />
<path d="M 108 84 Q 114 77 120 84" fill="none" stroke="#F5F1EA" stroke-width="3.5" stroke-linecap="round" />
<!-- wordmark: 'ktx', half the logo height, vertically centered -->
<text
x="225"
y="145"
font-family="'JetBrains Mono', 'Fira Code', ui-monospace, 'SF Mono', Menlo, monospace"
font-size="140"
font-weight="600"
fill="#1B3139"
letter-spacing="-0.04em"
>ktx</text>
<!-- wordmark: "ktx" outlined from Outfit SemiBold (the docs-site display font)
so it renders identically everywhere, independent of installed fonts -->
<g transform="translate(242 145)" fill="#1B3139">
<path d="M51.17 0 25.06 -34.79 51.03 -67.62H72.17L41.65 -30.7L42.35 -39.55L73.57 0ZM8.05 0V-101.22H26.46V0ZM88.41 0V-95.69H106.82V0ZM72.66 -51.52V-67.62H122.57V-51.52ZM171.75 0 153.93 -27.41 150.22 -30.17 123.83 -67.62H145.64L161.91 -42.77L165.48 -40.18L193.38 0ZM122.54 0 150.05 -38.61 160.62 -26.22 143.19 0ZM166.11 -30.38 155.44 -42.67 171.54 -67.62H192.08Z" />
</g>
</svg>

Before: 1.1 KiB · After: 1.4 KiB

Binary file not shown.

Size: 135 KiB

1
assets/star-history.svg Normal file

File diff suppressed because one or more lines are too long

Size: 63 KiB


@ -0,0 +1,12 @@
import type { Metadata } from "next";
import { DiagramStudio } from "@/components/diagram-studio/studio";
export const metadata: Metadata = {
title: "Diagram studio",
robots: { index: false, follow: false },
};
export default function DiagramStudioPage() {
return <DiagramStudio />;
}


@ -166,12 +166,16 @@ pre {
}
/* Disable monospace ligatures so `--flag` keeps a visible space and double
dashes don't fuse into an em-dash glyph. */
dashes don't fuse into an em-dash glyph. Covers every monospace surface:
raw <code>/<pre>, the ktx-code wrapper, Tailwind's `font-mono` utility,
and anything that opts in via the `var(--font-mono)` family directly. */
code,
pre,
pre code,
.ktx-code,
.ktx-code code {
.ktx-code code,
.font-mono,
[style*="--font-mono"] {
font-variant-ligatures: none !important;
font-feature-settings: "liga" 0, "calt" 0 !important;
}


@ -5,7 +5,7 @@ import { SlackIcon } from "@/components/slack-icon";
export const baseOptions: BaseLayoutProps = {
nav: {
title: <Logo />,
title: Logo,
transparentMode: "top",
},
links: [


@ -3,11 +3,6 @@ import {
getLlmDocsPages,
getPageMarkdown,
} from "@/lib/llm-docs";
import {
agentSetupSlug,
isAgentSetupSlug,
readAgentSetupMarkdown,
} from "@/lib/agent-setup-markdown";
export const dynamic = "force-static";
@ -16,14 +11,6 @@ export async function GET(
props: { params: Promise<{ slug?: string[] }> },
) {
const params = await props.params;
if (isAgentSetupSlug(params.slug)) {
return new Response(await readAgentSetupMarkdown(), {
headers: {
"Content-Type": "text/markdown; charset=utf-8",
},
});
}
const page = getLlmDocsPage(params.slug);
if (!page) {
return new Response("Documentation page not found.\n", {
@ -42,8 +29,5 @@ export async function GET(
}
export function generateStaticParams() {
return [
...getLlmDocsPages().map((page) => ({ slug: page.slug })),
{ slug: [...agentSetupSlug] },
];
return getLlmDocsPages().map((page) => ({ slug: page.slug }));
}


@ -0,0 +1,328 @@
import { type Edge, MarkerType, type Node } from "@xyflow/react";
import { C } from "./nodes";
const EDGE_COLOR = "#b3bcc4";
const MARKER_COLOR = "#9aa6ad";
const labelStyle = {
fontFamily: "var(--font-inter), system-ui, sans-serif",
fontSize: 15,
fontWeight: 600,
fill: C.inkMuted,
};
const labelBgStyle = { fill: "#ffffff", stroke: C.chipBorder, strokeWidth: 1 };
const labelBg = {
labelBgPadding: [8, 4] as [number, number],
labelBgBorderRadius: 6,
labelStyle,
labelBgStyle,
};
const marker = { type: MarkerType.ArrowClosed, color: MARKER_COLOR, width: 16, height: 16 };
const edgeStyle = { stroke: EDGE_COLOR, strokeWidth: 2 };
/* ============================== INGESTION =============================== */
const SRC_W = 300;
const SRC_H = 138;
const SRC_GAP = 24;
const srcY = (i: number) => i * (SRC_H + SRC_GAP);
export const ingestionNodes: Node[] = [
{
id: "title",
type: "title",
position: { x: 0, y: -96 },
data: {
width: 560,
eyebrow: "1 · Ingestion",
title: "ktx builds your context layer",
},
},
{
id: "db",
type: "card",
position: { x: 0, y: srcY(0) },
data: {
width: SRC_W,
height: SRC_H,
accent: C.teal,
rows: [
{ kind: "title", text: "Databases" },
{ kind: "desc", text: "Schemas, keys, query history" },
{ kind: "muted", text: "Postgres · Snowflake · BigQuery · …" },
],
handles: [{ side: "right", type: "source", id: "out" }],
},
},
{
id: "bi",
type: "card",
position: { x: 0, y: srcY(1) },
data: {
width: SRC_W,
height: SRC_H,
accent: C.orange,
rows: [
{ kind: "title", text: "BI tools" },
{ kind: "desc", text: "Dashboards, explores, usage" },
{ kind: "muted", text: "Metabase · Looker · …" },
],
handles: [{ side: "right", type: "source", id: "out" }],
},
},
{
id: "model",
type: "card",
position: { x: 0, y: srcY(2) },
data: {
width: SRC_W,
height: SRC_H,
accent: C.amber,
rows: [
{ kind: "title", text: "Modeling code" },
{ kind: "desc", text: "Metrics, models, joins, entities" },
{ kind: "muted", text: "dbt · LookML · MetricFlow · …" },
],
handles: [{ side: "right", type: "source", id: "out" }],
},
},
{
id: "docs",
type: "card",
position: { x: 0, y: srcY(3) },
data: {
width: SRC_W,
height: SRC_H,
accent: C.emerald,
rows: [
{ kind: "title", text: "Docs & notes" },
{ kind: "desc", text: "Policies, definitions, notes" },
{ kind: "muted", text: "Notion · any text · …" },
],
handles: [{ side: "right", type: "source", id: "out" }],
},
},
{
id: "engine",
type: "engine",
position: { x: 420, y: 52 },
data: {
width: 380,
height: 520,
steps: [
{ n: 1, title: "Source connectors", desc: "Read each source in its shape" },
{ n: 2, title: "Context builder", desc: "Evidence into proposed updates" },
{ n: 3, title: "Reconciliation", desc: "Merge with existing context" },
{ n: 4, title: "Validation", desc: "Check references & semantics" },
],
handles: [
{ side: "left", type: "target", id: "in" },
{ side: "right", type: "source", id: "out" },
],
},
},
{
id: "wiki",
type: "card",
position: { x: 900, y: 66 },
data: {
width: 320,
height: 220,
accent: C.emerald,
rows: [
{ kind: "mono", text: "wiki/*.md", color: C.emerald },
{ kind: "title", text: "Wiki" },
{ kind: "chips", items: ["free-form", "auto-maintained"] },
{ kind: "desc", text: "Definitions, caveats, policies," },
{ kind: "desc", text: "and notes agents can search." },
],
handles: [{ side: "left", type: "target", id: "in" }],
},
},
{
id: "sl",
type: "card",
position: { x: 900, y: 338 },
data: {
width: 320,
height: 220,
accent: C.teal,
rows: [
{ kind: "mono", text: "semantic-layer/*.yaml", color: C.teal },
{ kind: "title", text: "Semantic layer" },
{ kind: "chips", items: ["executable", "auto-maintained"] },
{ kind: "desc", text: "Metrics, joins, dimensions, and" },
{ kind: "desc", text: "filters ktx compiles into SQL." },
],
handles: [{ side: "left", type: "target", id: "in" }],
},
},
];
const ingestEdge = (source: string, target: string): Edge => ({
id: `${source}-${target}`,
source,
target,
sourceHandle: "out",
targetHandle: "in",
type: "default",
style: edgeStyle,
markerEnd: marker,
});
export const ingestionEdges: Edge[] = [
ingestEdge("db", "engine"),
ingestEdge("bi", "engine"),
ingestEdge("model", "engine"),
ingestEdge("docs", "engine"),
ingestEdge("engine", "wiki"),
ingestEdge("engine", "sl"),
];
/* =============================== RUNTIME ================================ */
export const runtimeNodes: Node[] = [
{
id: "title",
type: "title",
position: { x: 0, y: -84 },
data: {
width: 560,
eyebrow: "2 · Serving",
title: "agents query it through MCP",
},
},
{
id: "agent",
type: "card",
position: { x: 0, y: 115 },
data: {
width: 280,
height: 190,
accent: C.neutral,
align: "center",
rows: [
{ kind: "title", text: "Your agent" },
{ kind: "muted", text: "Claude Code · Cursor" },
{ kind: "muted", text: "Codex · OpenCode" },
],
handles: [
{ side: "right", type: "source", id: "ask", top: "42%" },
{ side: "right", type: "target", id: "answer", top: "62%" },
],
},
},
{
id: "hub",
type: "hub",
position: { x: 420, y: 85 },
data: {
width: 360,
height: 250,
rows: [
"Search wiki + semantic layer",
"Return approved metrics",
"Compile metrics → SQL",
],
handles: [
{ side: "left", type: "target", id: "ask", top: "42%" },
{ side: "left", type: "source", id: "answer", top: "62%" },
{ side: "right", type: "source", id: "to-context", top: "30%" },
{ side: "right", type: "source", id: "to-warehouse", top: "72%" },
],
},
},
{
id: "context",
type: "card",
position: { x: 920, y: 15 },
data: {
width: 300,
height: 150,
accent: C.teal,
rows: [
{ kind: "title", text: "Context layer" },
{ kind: "mono", text: "wiki/*.md", color: C.emerald },
{ kind: "mono", text: "semantic-layer/*.yaml", color: C.teal },
],
handles: [{ side: "left", type: "target", id: "in" }],
},
},
{
id: "warehouse",
type: "card",
position: { x: 920, y: 255 },
data: {
width: 300,
height: 150,
accent: C.slate,
rows: [
{ kind: "title", text: "Warehouse" },
{
kind: "badge",
text: "read-only",
bg: "#ecf6f8",
border: "#bfe3ea",
color: C.teal,
},
{ kind: "desc", text: "Runs the compiled SQL" },
],
handles: [{ side: "left", type: "target", id: "in" }],
},
},
];
export const runtimeEdges: Edge[] = [
{
id: "ask",
source: "agent",
sourceHandle: "ask",
target: "hub",
targetHandle: "ask",
type: "default",
label: "ask",
...labelBg,
style: edgeStyle,
markerEnd: marker,
},
{
id: "answer",
source: "hub",
sourceHandle: "answer",
target: "agent",
targetHandle: "answer",
type: "default",
label: "answer",
...labelBg,
style: edgeStyle,
markerEnd: marker,
},
{
id: "search",
source: "hub",
sourceHandle: "to-context",
target: "context",
targetHandle: "in",
type: "smoothstep",
label: "search + read",
...labelBg,
style: edgeStyle,
markerStart: marker,
markerEnd: marker,
},
{
id: "readonly",
source: "hub",
sourceHandle: "to-warehouse",
target: "warehouse",
targetHandle: "in",
type: "smoothstep",
label: "read-only",
...labelBg,
style: edgeStyle,
markerStart: marker,
markerEnd: marker,
},
];
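Review note: the node and edge tables above are linked only by string ids (`sourceHandle: "to-context"`, `targetHandle: "in"`, and so on), so a typo would silently drop an edge. A minimal sketch of a consistency check follows. It is an assumption, not part of this diff: `findDanglingEdges`, `NodeLike`, and `EdgeLike` are hypothetical names, and the shapes are pared-down stand-ins for the `Node` and `Edge` types from `@xyflow/react` so the block stays self-contained.

```typescript
// Hypothetical, pared-down shapes mirroring only the fields the flow tables use.
type HandleSpec = { side: "left" | "right"; type: "source" | "target"; id: string };
type NodeLike = { id: string; data: { handles?: HandleSpec[] } };
type EdgeLike = {
  id: string;
  source: string;
  target: string;
  sourceHandle?: string;
  targetHandle?: string;
};

// Returns one message per edge endpoint that points at a missing node or handle.
function findDanglingEdges(nodes: NodeLike[], edges: EdgeLike[]): string[] {
  const byId = new Map(nodes.map((n) => [n.id, n] as const));
  const problems: string[] = [];

  const check = (
    edge: EdgeLike,
    nodeId: string,
    handleId: string | undefined,
    type: HandleSpec["type"],
  ): void => {
    const node = byId.get(nodeId);
    if (!node) {
      problems.push(`${edge.id}: unknown ${type} node "${nodeId}"`);
      return;
    }
    if (handleId === undefined) return; // edge does not name a handle
    const found = (node.data.handles ?? []).some(
      (h) => h.id === handleId && h.type === type,
    );
    if (!found) {
      problems.push(`${edge.id}: node "${nodeId}" has no ${type} handle "${handleId}"`);
    }
  };

  for (const edge of edges) {
    check(edge, edge.source, edge.sourceHandle, "source");
    check(edge, edge.target, edge.targetHandle, "target");
  }
  return problems;
}
```

Under these assumptions, running the check over `runtimeNodes`/`runtimeEdges` in a unit test would flag a renamed handle before the diagram renders without the edge.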


@@ -0,0 +1,57 @@
/**
* Inlined ktx mascot, ported from assets/ktx-mascot.svg.
*
* - `light` renders the dark-bodied mascot for light surfaces.
* - `dark` renders the cream-bodied mascot for dark surfaces (e.g. the ktx
* hub panel), mirroring brand/ktx-mascot-dark.svg.
*/
export function KtxMascot({
variant = "light",
size = 56,
}: {
variant?: "light" | "dark";
size?: number;
}) {
const body = variant === "dark" ? "#F5F1EA" : "#1B3139";
const eye = variant === "dark" ? "#1B3139" : "#F5F1EA";
return (
<svg
viewBox="0 0 200 200"
width={size}
height={size}
role="img"
aria-label="ktx mascot"
>
<g fill="none" stroke={body} strokeWidth="16" strokeLinecap="round">
<path d="M 62 110 Q 32 130 44 152" />
<path d="M 88 116 Q 80 152 70 174" />
<path d="M 112 116 Q 120 152 130 174" />
</g>
<path
d="M 134 108 C 162 116, 172 96, 162 78 C 154 64, 168 56, 178 60"
fill="none"
stroke="#FF8A4C"
strokeWidth="16"
strokeLinecap="round"
/>
<path
d="M 48 102 C 48 56, 78 30, 100 30 C 122 30, 152 56, 152 102 C 152 116, 132 120, 100 120 C 68 120, 48 116, 48 102 Z"
fill={body}
/>
<path
d="M 80 84 Q 86 77 92 84"
fill="none"
stroke={eye}
strokeWidth="3.5"
strokeLinecap="round"
/>
<path
d="M 108 84 Q 114 77 120 84"
fill="none"
stroke={eye}
strokeWidth="3.5"
strokeLinecap="round"
/>
</svg>
);
}


@@ -0,0 +1,493 @@
"use client";
import { Handle, Position, type Node, type NodeProps } from "@xyflow/react";
import { KtxMascot } from "./mascot";
/** Fixed palette mirrored from the approved SVG diagrams so the exported PNG
* is theme-independent (one image that reads on light and dark GitHub). */
export const C = {
ink: "#1b1b18",
inkSoft: "#57534e",
inkMuted: "#8c857f",
cardBorder: "#e2dfd9",
engineBg: "#15323a",
engineBorder: "#23474f",
cyan: "#55dced",
stepNum: "#06262c",
stepTitle: "#f3f1ec",
stepDesc: "#9fb6bc",
hubRow: "#eef4f5",
chipBg: "#faf9f6",
chipBorder: "#e7e5e4",
teal: "#0e7490",
emerald: "#059669",
orange: "#f97316",
amber: "#d97706",
slate: "#334155",
neutral: "#94a3b8",
} as const;
const DISPLAY = "var(--font-display), system-ui, sans-serif";
const BODY = "var(--font-inter), system-ui, sans-serif";
const MONO = "var(--font-mono), ui-monospace, monospace";
const CARD_SHADOW = "0 3px 12px rgba(27, 49, 57, 0.10)";
const ENGINE_SHADOW = "0 6px 22px rgba(2, 12, 15, 0.30)";
/** ktx logo mascot size, shared by the engine and hub headers. */
const LOGO_SIZE = 56;
type HandleSpec = {
side: "left" | "right";
type: "source" | "target";
id: string;
top?: string;
};
function Handles({ specs }: { specs?: HandleSpec[] }) {
if (!specs) return null;
return (
<>
{specs.map((h) => (
<Handle
key={`${h.type}-${h.id}`}
id={h.id}
type={h.type}
position={h.side === "left" ? Position.Left : Position.Right}
isConnectable={false}
style={{
opacity: 0,
border: 0,
background: "transparent",
...(h.top ? { top: h.top } : {}),
}}
/>
))}
</>
);
}
/* ------------------------------- Card node ------------------------------- */
type CardRow =
| { kind: "title"; text: string }
| { kind: "mono"; text: string; color: string }
| { kind: "desc"; text: string }
| { kind: "muted"; text: string }
| { kind: "chips"; items: string[] }
| { kind: "badge"; text: string; bg: string; border: string; color: string };
type CardData = {
width: number;
height: number;
accent: string;
align?: "center";
rows: CardRow[];
handles?: HandleSpec[];
};
function gapFor(kind: CardRow["kind"], prev?: CardRow["kind"]): number {
if (!prev) return 0;
if (kind === "desc" && prev === "desc") return 3;
if (kind === "mono" && prev === "mono") return 2;
if (kind === "title") return 6;
return 10;
}
function CardRowView({ row }: { row: CardRow }) {
switch (row.kind) {
case "title":
return (
<span
style={{
fontFamily: DISPLAY,
fontWeight: 700,
fontSize: 26,
lineHeight: 1.15,
color: C.ink,
}}
>
{row.text}
</span>
);
case "mono":
return (
<span
style={{
fontFamily: MONO,
fontWeight: 700,
fontSize: 18,
lineHeight: 1.4,
color: row.color,
}}
>
{row.text}
</span>
);
case "desc":
return (
<span
style={{
fontFamily: BODY,
fontWeight: 500,
fontSize: 17,
lineHeight: 1.45,
color: C.inkSoft,
}}
>
{row.text}
</span>
);
case "muted":
return (
<span
style={{
fontFamily: BODY,
fontWeight: 500,
fontSize: 14,
lineHeight: 1.4,
color: C.inkMuted,
}}
>
{row.text}
</span>
);
case "chips":
return (
<div style={{ display: "flex", gap: 8, flexWrap: "wrap" }}>
{row.items.map((c) => (
<span
key={c}
style={{
fontFamily: BODY,
fontWeight: 600,
fontSize: 14,
color: C.inkSoft,
background: C.chipBg,
border: `1px solid ${C.chipBorder}`,
borderRadius: 6,
padding: "4px 10px",
}}
>
{c}
</span>
))}
</div>
);
case "badge":
return (
<span
style={{
display: "inline-flex",
alignItems: "center",
borderRadius: 14,
padding: "3px 12px",
fontFamily: BODY,
fontWeight: 700,
fontSize: 14,
background: row.bg,
border: `1px solid ${row.border}`,
color: row.color,
}}
>
{row.text}
</span>
);
}
}
function CardNode({ data }: NodeProps<Node<CardData>>) {
const center = data.align === "center";
return (
<div
style={{
width: data.width,
height: data.height,
position: "relative",
background: "#ffffff",
border: `1px solid ${C.cardBorder}`,
borderRadius: 10,
boxShadow: CARD_SHADOW,
padding: "18px 20px",
display: "flex",
flexDirection: "column",
alignItems: center ? "center" : "flex-start",
justifyContent: center ? "center" : "flex-start",
textAlign: center ? "center" : "left",
overflow: "hidden",
}}
>
<span
style={{
position: "absolute",
top: 0,
left: 2,
right: 2,
height: 4,
borderRadius: 2,
background: data.accent,
}}
/>
<Handles specs={data.handles} />
{data.rows.map((row, i) => (
<div
key={i}
style={{ marginTop: gapFor(row.kind, data.rows[i - 1]?.kind) }}
>
<CardRowView row={row} />
</div>
))}
</div>
);
}
/* ------------------------------ Engine node ------------------------------ */
type EngineStep = { n: number; title: string; desc: string };
type EngineData = {
width: number;
height: number;
steps: EngineStep[];
handles?: HandleSpec[];
};
function EngineNode({ data }: NodeProps<Node<EngineData>>) {
return (
<div
style={{
width: data.width,
height: data.height,
position: "relative",
background: C.engineBg,
border: `1px solid ${C.engineBorder}`,
borderRadius: 14,
boxShadow: ENGINE_SHADOW,
padding: "24px 24px",
display: "flex",
flexDirection: "column",
overflow: "hidden",
}}
>
<span
style={{
position: "absolute",
top: 0,
left: 2,
right: 2,
height: 4,
borderRadius: 2,
background: C.cyan,
}}
/>
<Handles specs={data.handles} />
<div style={{ display: "flex", alignItems: "center", gap: 14 }}>
<KtxMascot variant="dark" size={LOGO_SIZE} />
<span
style={{
fontFamily: DISPLAY,
fontWeight: 700,
fontSize: 30,
color: C.stepTitle,
}}
>
ktx
</span>
</div>
<div
style={{
flex: 1,
display: "flex",
flexDirection: "column",
justifyContent: "space-around",
marginTop: 6,
}}
>
{data.steps.map((s) => (
<div
key={s.n}
style={{ display: "flex", alignItems: "center", gap: 18 }}
>
<span
style={{
flex: "none",
width: 44,
height: 44,
borderRadius: "50%",
background: C.cyan,
display: "flex",
alignItems: "center",
justifyContent: "center",
fontFamily: DISPLAY,
fontWeight: 800,
fontSize: 22,
color: C.stepNum,
}}
>
{s.n}
</span>
<div style={{ display: "flex", flexDirection: "column", gap: 3 }}>
<span
style={{
fontFamily: DISPLAY,
fontWeight: 700,
fontSize: 24,
lineHeight: 1.1,
color: C.stepTitle,
}}
>
{s.title}
</span>
<span
style={{
fontFamily: BODY,
fontWeight: 500,
fontSize: 16,
lineHeight: 1.3,
color: C.stepDesc,
}}
>
{s.desc}
</span>
</div>
</div>
))}
</div>
</div>
);
}
/* -------------------------------- Hub node ------------------------------- */
type HubData = {
width: number;
height: number;
rows: string[];
handles?: HandleSpec[];
};
function HubNode({ data }: NodeProps<Node<HubData>>) {
return (
<div
style={{
width: data.width,
height: data.height,
position: "relative",
background: C.engineBg,
border: `1px solid ${C.engineBorder}`,
borderRadius: 14,
boxShadow: ENGINE_SHADOW,
padding: "24px 24px",
display: "flex",
flexDirection: "column",
overflow: "hidden",
}}
>
<span
style={{
position: "absolute",
top: 0,
left: 2,
right: 2,
height: 4,
borderRadius: 2,
background: C.cyan,
}}
/>
<Handles specs={data.handles} />
<div style={{ display: "flex", alignItems: "center", gap: 14 }}>
<KtxMascot variant="dark" size={LOGO_SIZE} />
<span
style={{
fontFamily: DISPLAY,
fontWeight: 700,
fontSize: 30,
color: C.stepTitle,
}}
>
ktx
</span>
</div>
<div
style={{
marginTop: 22,
display: "flex",
flexDirection: "column",
gap: 18,
}}
>
{data.rows.map((r) => (
<div key={r} style={{ display: "flex", alignItems: "center", gap: 14 }}>
<span
style={{
flex: "none",
width: 10,
height: 10,
borderRadius: "50%",
background: C.cyan,
}}
/>
<span
style={{
fontFamily: BODY,
fontWeight: 600,
fontSize: 19,
color: C.hubRow,
}}
>
{r}
</span>
</div>
))}
</div>
</div>
);
}
/* ------------------------------- Title node ------------------------------ */
type TitleData = { width: number; eyebrow: string; title: string };
function TitleNode({ data }: NodeProps<Node<TitleData>>) {
return (
<div
style={{
width: data.width,
display: "flex",
flexDirection: "column",
gap: 6,
}}
>
<span
style={{
fontFamily: BODY,
fontSize: 19,
fontWeight: 800,
letterSpacing: 2,
textTransform: "uppercase",
color: C.teal,
}}
>
{data.eyebrow}
</span>
<span
style={{
fontFamily: DISPLAY,
fontSize: 24,
fontWeight: 600,
color: C.inkMuted,
}}
>
{data.title}
</span>
</div>
);
}
export const nodeTypes = {
card: CardNode,
engine: EngineNode,
hub: HubNode,
title: TitleNode,
};


@@ -0,0 +1,242 @@
"use client";
import "@xyflow/react/dist/style.css";
import { useCallback, useRef, useState } from "react";
import {
Background,
BackgroundVariant,
type Edge,
getNodesBounds,
type Node,
ReactFlow,
ReactFlowProvider,
useEdgesState,
useNodesState,
useReactFlow,
} from "@xyflow/react";
import { toPng } from "html-to-image";
import {
ingestionEdges,
ingestionNodes,
runtimeEdges,
runtimeNodes,
} from "./flows";
import { nodeTypes } from "./nodes";
const EXPORT_PADDING = 48;
const EXPORT_PIXEL_RATIO = 2;
function DiagramCanvasInner({
initialNodes,
initialEdges,
fileName,
height,
dark,
}: {
initialNodes: Node[];
initialEdges: Edge[];
fileName: string;
height: number;
dark: boolean;
}) {
const wrapperRef = useRef<HTMLDivElement>(null);
const [nodes, , onNodesChange] = useNodesState(initialNodes);
const [edges, , onEdgesChange] = useEdgesState(initialEdges);
const { getNodes } = useReactFlow();
const [busy, setBusy] = useState(false);
const download = useCallback(async () => {
const viewport = wrapperRef.current?.querySelector<HTMLElement>(
".react-flow__viewport",
);
if (!viewport) return;
setBusy(true);
try {
await document.fonts.ready;
const bounds = getNodesBounds(getNodes());
const outW = Math.ceil(bounds.width + EXPORT_PADDING * 2);
const outH = Math.ceil(bounds.height + EXPORT_PADDING * 2);
const tx = EXPORT_PADDING - bounds.x;
const ty = EXPORT_PADDING - bounds.y;
const dataUrl = await toPng(viewport, {
width: outW,
height: outH,
pixelRatio: EXPORT_PIXEL_RATIO,
// transparent background so one PNG works on light and dark GitHub
style: {
width: `${outW}px`,
height: `${outH}px`,
transform: `translate(${tx}px, ${ty}px) scale(1)`,
},
});
const link = document.createElement("a");
link.download = fileName;
link.href = dataUrl;
link.click();
} finally {
setBusy(false);
}
}, [fileName, getNodes]);
return (
<div>
<div style={{ display: "flex", gap: 8, marginBottom: 10 }}>
<button
type="button"
onClick={download}
disabled={busy}
style={btnStyle(busy)}
>
{busy ? "Exporting…" : "Download PNG"}
</button>
</div>
<div
ref={wrapperRef}
style={{
height,
borderRadius: 12,
border: "1px solid rgba(127,127,127,0.2)",
background: dark ? "#0d1117" : "#ffffff",
}}
>
<ReactFlow
nodes={nodes}
edges={edges}
nodeTypes={nodeTypes}
onNodesChange={onNodesChange}
onEdgesChange={onEdgesChange}
fitView
fitViewOptions={{ padding: 0.08 }}
nodesDraggable={false}
nodesConnectable={false}
nodesFocusable={false}
edgesFocusable={false}
elementsSelectable={false}
panOnDrag={false}
panOnScroll={false}
zoomOnScroll={false}
zoomOnPinch={false}
zoomOnDoubleClick={false}
preventScrolling={false}
proOptions={{ hideAttribution: true }}
>
<Background
variant={BackgroundVariant.Dots}
gap={18}
size={1}
color={dark ? "#1f2a30" : "#e6e2db"}
/>
</ReactFlow>
</div>
</div>
);
}
function btnStyle(disabled: boolean): React.CSSProperties {
return {
fontFamily: "var(--font-inter), system-ui, sans-serif",
fontSize: 13,
fontWeight: 600,
padding: "7px 14px",
borderRadius: 8,
border: "1px solid #0e7490",
background: disabled ? "#9bbdc6" : "#0e7490",
color: "#ffffff",
cursor: disabled ? "default" : "pointer",
};
}
function DiagramCanvas(props: {
initialNodes: Node[];
initialEdges: Edge[];
fileName: string;
height: number;
dark: boolean;
}) {
return (
<ReactFlowProvider>
<DiagramCanvasInner {...props} />
</ReactFlowProvider>
);
}
export function DiagramStudio() {
const [dark, setDark] = useState(false);
return (
<main
style={{
maxWidth: 1320,
margin: "0 auto",
padding: "32px 24px 80px",
fontFamily: "var(--font-inter), system-ui, sans-serif",
}}
>
<header style={{ marginBottom: 24 }}>
<h1
style={{
fontFamily: "var(--font-display), system-ui, sans-serif",
fontSize: 30,
fontWeight: 700,
color: "#1b1b18",
margin: 0,
}}
>
ktx diagram studio
</h1>
<p style={{ color: "#6b6560", marginTop: 6, fontSize: 15 }}>
Static diagrams. Export is a transparent 2× PNG framed to the node
bounds; the dark-background toggle is only for previewing.
</p>
<label
style={{
display: "inline-flex",
alignItems: "center",
gap: 8,
marginTop: 12,
fontSize: 14,
color: "#57534e",
}}
>
<input
type="checkbox"
checked={dark}
onChange={(e) => setDark(e.target.checked)}
/>
Preview on dark background
</label>
</header>
<section style={{ marginBottom: 40 }}>
<h2 style={sectionTitle}>1 · Ingestion: building the context layer</h2>
<DiagramCanvas
initialNodes={ingestionNodes}
initialEdges={ingestionEdges}
fileName="ingestion-flow.png"
height={560}
dark={dark}
/>
</section>
<section>
<h2 style={sectionTitle}>2 · Serving: answering agents at runtime</h2>
<DiagramCanvas
initialNodes={runtimeNodes}
initialEdges={runtimeEdges}
fileName="mcp-runtime-flow.png"
height={480}
dark={dark}
/>
</section>
</main>
);
}
const sectionTitle: React.CSSProperties = {
fontFamily: "var(--font-display), system-ui, sans-serif",
fontSize: 18,
fontWeight: 600,
color: "#1b1b18",
marginBottom: 12,
};
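Review note: the framing arithmetic inside `download` can be read in isolation. The sketch below restates it as a pure function. `exportFrame` and `Bounds` are hypothetical names introduced for illustration, and the sample bounds are an assumption derived from the declared positions and sizes of the runtime nodes; the bounds React Flow actually measures may differ.

```typescript
// Hypothetical shape for the rectangle returned by a bounds calculation.
type Bounds = { x: number; y: number; width: number; height: number };

// Pads the node bounds on every side and computes the translation that moves
// the top-left corner of the bounds to (padding, padding) in the exported image.
function exportFrame(bounds: Bounds, padding: number) {
  return {
    width: Math.ceil(bounds.width + padding * 2),
    height: Math.ceil(bounds.height + padding * 2),
    translateX: padding - bounds.x,
    translateY: padding - bounds.y,
  };
}

// Illustrative bounds only: x spans 0..1220 and y spans -84..405.
const frame = exportFrame({ x: 0, y: -84, width: 1220, height: 489 }, 48);
// frame is { width: 1316, height: 585, translateX: 48, translateY: 132 }
```

Because the title node sits at a negative `y`, the translation has to shift the viewport down by more than the padding, which is why `translateY` is `padding - bounds.y` rather than just `padding`.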


@@ -1,40 +1,56 @@
-export function Logo() {
+"use client";
+import Link from "next/link";
+const brandFont = {
+fontFamily: "var(--font-display), var(--font-sans), sans-serif",
+} as const;
+export function Logo({ href = "/", className }: { href?: string; className?: string }) {
 return (
-<div className="flex items-center gap-3.5 group">
-<div className="relative flex items-center justify-center transition-transform duration-300 ease-out group-hover:rotate-[-4deg]">
-<img
-src="/ktx/brand/ktx-mascot.svg"
-alt=""
-aria-hidden="true"
-className="h-20 w-20 object-contain block dark:hidden"
-/>
-<img
-src="/ktx/brand/ktx-mascot-dark.svg"
-alt=""
-aria-hidden="true"
-className="h-20 w-20 object-contain hidden dark:block"
-/>
-</div>
-<div className="flex flex-col items-start leading-none">
+<div className={className}>
+<div className="flex items-center gap-3.5 group">
+<Link href={href} aria-label="ktx documentation home" className="flex items-center no-underline">
+<span className="relative flex items-center justify-center transition-transform duration-300 ease-out group-hover:rotate-[-4deg]">
+<img
+src="/ktx/brand/ktx-mascot.svg"
+alt=""
+aria-hidden="true"
+className="h-20 w-20 object-contain block dark:hidden"
+/>
+<img
+src="/ktx/brand/ktx-mascot-dark.svg"
+alt=""
+aria-hidden="true"
+className="h-20 w-20 object-contain hidden dark:block"
+/>
+</span>
+</Link>
+<div className="flex flex-col items-start leading-none">
+<Link
+href={href}
+className="text-[42px] font-semibold text-fd-foreground tracking-tight no-underline"
+style={brandFont}
+>
+ktx
+</Link>
+<a
+href="https://www.kaelio.com"
+target="_blank"
+rel="noreferrer"
+className="mt-1 whitespace-nowrap text-[13px] font-medium text-fd-muted-foreground/80 tracking-tight no-underline transition-colors hover:text-fd-foreground"
+style={brandFont}
+>
+by Kaelio
+</a>
+</div>
 <span
-className="text-[42px] font-semibold text-fd-foreground tracking-tight"
-style={{ fontFamily: "var(--font-display), var(--font-sans), sans-serif" }}
+className="text-[19px] font-medium text-fd-muted-foreground/80 tracking-tight border-l border-fd-border pl-3 ml-1"
+style={brandFont}
 >
-ktx
-</span>
-<span
-className="mt-1 whitespace-nowrap text-[13px] font-medium text-fd-muted-foreground/80 tracking-tight"
-style={{ fontFamily: "var(--font-display), var(--font-sans), sans-serif" }}
->
-by Kaelio
+Docs
 </span>
 </div>
-<span
-className="text-[19px] font-medium text-fd-muted-foreground/80 tracking-tight border-l border-fd-border pl-3 ml-1"
-style={{ fontFamily: "var(--font-display), var(--font-sans), sans-serif" }}
->
-Docs
-</span>
 </div>
 );
 }


@@ -0,0 +1,576 @@
"use client";
import {
type Edge,
type EdgeProps,
getSmoothStepPath,
Handle,
MarkerType,
type Node,
type NodeProps,
Position,
} from "@xyflow/react";
import { FlowCanvas } from "./flow-canvas";
type AgentNodeData = {
title: string;
items: string[];
};
type HubNodeData = {
title: string;
badge: string;
rows: string[];
};
type TargetNodeData = {
accent: string;
title: string;
body: string;
rows: { text: string; color?: string; mono?: boolean }[];
badge?: string;
};
type AgentNode = Node<AgentNodeData, "agent">;
type HubNode = Node<HubNodeData, "hub">;
type TargetNode = Node<TargetNodeData, "target">;
type FlowNode = AgentNode | HubNode | TargetNode;
const AGENT_W = 252;
const AGENT_H = 96;
const HUB_W = 306;
const HUB_H = 190;
const TARGET_W = 268;
const TARGET_H = 148;
const CENTER_X = 470;
const ROW_AGENT_Y = 0;
const ROW_HUB_Y = 196;
const ROW_TARGET_Y = 488;
const AGENT_X = CENTER_X - AGENT_W / 2;
const HUB_X = CENTER_X - HUB_W / 2;
const TARGET_GAP_X = 38;
const TARGETS_TOTAL = TARGET_W * 2 + TARGET_GAP_X;
const TARGETS_START_X = CENTER_X - TARGETS_TOTAL / 2;
const CONTEXT_X = TARGETS_START_X;
const WAREHOUSE_X = TARGETS_START_X + TARGET_W + TARGET_GAP_X;
const EDGE_STROKE = "#94a3b8";
const CYCLE_STROKE = "#0e7490";
const EMERALD = "#059669";
const TEAL = "#0e7490";
const nodes: FlowNode[] = [
{
id: "agent",
type: "agent",
position: { x: AGENT_X, y: ROW_AGENT_Y },
data: {
title: "Your agent",
items: ["Claude Code", "Cursor", "Codex"],
},
draggable: false,
selectable: false,
},
{
id: "hub",
type: "hub",
position: { x: HUB_X, y: ROW_HUB_Y },
data: {
title: "ktx",
badge: "MCP + CLI",
rows: [
"Search wiki + semantic layer",
"Return approved metrics",
"Compile metrics → SQL",
],
},
draggable: false,
selectable: false,
},
{
id: "context",
type: "target",
position: { x: CONTEXT_X, y: ROW_TARGET_Y },
data: {
accent: TEAL,
title: "Context layer",
body: "Approved definitions agents search before they answer.",
rows: [
{ text: "wiki/*.md", color: EMERALD, mono: true },
{ text: "semantic-layer/*.yaml", color: TEAL, mono: true },
],
},
draggable: false,
selectable: false,
},
{
id: "warehouse",
type: "target",
position: { x: WAREHOUSE_X, y: ROW_TARGET_Y },
data: {
accent: "#334155",
title: "Database",
badge: "read-only",
body: "Runs the compiled SQL. ktx never writes to it.",
rows: [],
},
draggable: false,
selectable: false,
},
];
const labelBg = {
labelBgPadding: [6, 3] as [number, number],
labelBgBorderRadius: 4,
labelStyle: {
fontSize: 13,
fontWeight: 600,
fill: "var(--color-fd-muted-foreground)",
},
labelBgStyle: {
fill: "var(--color-fd-background)",
stroke: "var(--color-fd-border)",
strokeWidth: 1,
},
};
const requestMarker = {
type: MarkerType.ArrowClosed,
color: EDGE_STROKE,
width: 16,
height: 16,
};
const flowEdges: Edge[] = [
{
id: "e-ask",
source: "agent",
sourceHandle: "ask",
target: "hub",
targetHandle: "ask",
type: "straight",
label: "ask",
...labelBg,
style: { stroke: EDGE_STROKE, strokeWidth: 1.5 },
markerEnd: requestMarker,
},
{
id: "e-answer",
source: "hub",
sourceHandle: "answer",
target: "agent",
targetHandle: "answer",
type: "straight",
label: "answer",
...labelBg,
style: { stroke: EDGE_STROKE, strokeWidth: 1.5 },
markerEnd: requestMarker,
},
{
id: "e-search",
source: "hub",
sourceHandle: "to-context",
target: "context",
targetHandle: "in",
type: "smoothstep",
label: "search + read",
...labelBg,
style: { stroke: CYCLE_STROKE, strokeWidth: 1.5 },
markerStart: { type: MarkerType.ArrowClosed, color: CYCLE_STROKE, width: 14, height: 14 },
markerEnd: { type: MarkerType.ArrowClosed, color: CYCLE_STROKE, width: 14, height: 14 },
},
{
id: "e-readonly",
source: "hub",
sourceHandle: "to-warehouse",
target: "warehouse",
targetHandle: "in",
type: "smoothstep",
label: "read-only",
...labelBg,
style: { stroke: CYCLE_STROKE, strokeWidth: 1.5 },
markerStart: { type: MarkerType.ArrowClosed, color: CYCLE_STROKE, width: 14, height: 14 },
markerEnd: { type: MarkerType.ArrowClosed, color: CYCLE_STROKE, width: 14, height: 14 },
},
];
function AgentNodeView({ data }: NodeProps<AgentNode>) {
return (
<div
style={{ width: AGENT_W, height: AGENT_H }}
className="flex flex-col justify-center rounded-md border border-fd-border bg-fd-card px-3.5 py-2.5 shadow-sm"
>
<Handle
id="ask"
type="source"
position={Position.Bottom}
className="!opacity-0"
style={{ left: "35%" }}
/>
<Handle
id="answer"
type="target"
position={Position.Bottom}
className="!opacity-0"
style={{ left: "65%" }}
/>
<div className="flex items-center gap-2.5">
<span className="flex h-8 w-8 flex-none items-center justify-center rounded-full bg-fd-primary/15 text-fd-primary">
<svg
xmlns="http://www.w3.org/2000/svg"
width="18"
height="18"
viewBox="0 0 24 24"
fill="none"
stroke="currentColor"
strokeWidth="1.75"
strokeLinecap="round"
strokeLinejoin="round"
aria-hidden="true"
>
<rect x="3" y="6" width="18" height="12" rx="3" />
<circle cx="9" cy="12" r="1.25" fill="currentColor" stroke="none" />
<circle cx="15" cy="12" r="1.25" fill="currentColor" stroke="none" />
<path d="M12 3v3" />
</svg>
</span>
<p className="text-[17px] font-semibold leading-6 text-fd-foreground">
{data.title}
</p>
</div>
<div className="mt-2 flex flex-wrap gap-1.5">
{data.items.map((item) => (
<span
key={item}
className="rounded border border-fd-border bg-fd-background px-1.5 py-0.5 text-[12px] leading-5 text-fd-muted-foreground"
>
{item}
</span>
))}
</div>
</div>
);
}
function HubNodeView({ data }: NodeProps<HubNode>) {
return (
<div
style={{ width: HUB_W, height: HUB_H }}
className="relative flex flex-col rounded-md border border-cyan-200/20 bg-[#0f1f23] px-4 py-3.5 text-white shadow-sm dark:bg-[#0b181b]"
>
<Handle
id="ask"
type="target"
position={Position.Top}
className="!opacity-0"
style={{ left: "37.5%" }}
/>
<Handle
id="answer"
type="source"
position={Position.Top}
className="!opacity-0"
style={{ left: "62.5%" }}
/>
<Handle
id="to-context"
type="source"
position={Position.Bottom}
className="!opacity-0"
style={{ left: "44%" }}
/>
<Handle
id="to-warehouse"
type="source"
position={Position.Bottom}
className="!opacity-0"
style={{ left: "56%" }}
/>
<div className="flex items-center gap-2.5">
<span className="flex h-7 w-7 flex-none items-center justify-center rounded-md bg-cyan-300/95 font-mono text-sm font-bold text-[#0b1c20]">
k
</span>
<span className="text-[19px] font-bold leading-6 text-white">
{data.title}
</span>
<span className="ml-1 rounded border border-cyan-200/30 bg-white/5 px-1.5 py-0.5 font-mono text-[11px] leading-5 text-cyan-100/85">
{data.badge}
</span>
</div>
<div className="mt-3 flex flex-1 flex-col justify-center gap-2">
{data.rows.map((row) => (
<div key={row} className="flex items-center gap-2.5">
<span className="h-1.5 w-1.5 flex-none rounded-full bg-cyan-300/95" />
<span className="text-[14px] font-medium leading-5 text-cyan-50/90">
{row}
</span>
</div>
))}
</div>
</div>
);
}
function TargetNodeView({ data }: NodeProps<TargetNode>) {
return (
<div
style={{
width: TARGET_W,
height: TARGET_H,
borderTop: `3px solid ${data.accent}`,
}}
className="overflow-hidden rounded-md border border-fd-border bg-fd-card px-3.5 py-3 shadow-sm"
>
<Handle id="in" type="target" position={Position.Top} className="!opacity-0" />
<div className="flex items-center gap-2">
<p className="text-[17px] font-semibold leading-6 text-fd-foreground">
{data.title}
</p>
{data.badge ? (
<span
className="rounded-full px-1.5 py-0.5 text-[11px] font-semibold leading-5"
style={{
color: data.accent,
background: "color-mix(in oklch, var(--color-fd-card) 86%, #64748b)",
}}
>
{data.badge}
</span>
) : null}
</div>
{data.rows.length > 0 ? (
<div className="mt-1 flex flex-col gap-0.5">
{data.rows.map((row) => (
<span
key={row.text}
className={
row.mono
? "font-mono text-[13px] font-semibold tracking-tight"
: "text-[12px] leading-4 text-fd-muted-foreground"
}
style={row.color ? { color: row.color } : undefined}
>
{row.text}
</span>
))}
</div>
) : null}
<p className="mt-1.5 line-clamp-2 text-[13px] leading-[18px] text-fd-muted-foreground">
{data.body}
</p>
</div>
);
}
/* ------------------------------- Particles ------------------------------- */
const PARTICLE_SPEED_PX_PER_SEC = 150;
const PARTICLE_MIN_DURATION_SEC = 5;
type Leg = {
sx: number;
sy: number;
sPos: Position;
tx: number;
ty: number;
tPos: Position;
};
const AGENT_ASK_X = AGENT_X + AGENT_W * 0.35;
const AGENT_ANSWER_X = AGENT_X + AGENT_W * 0.65;
const AGENT_BOTTOM_Y = ROW_AGENT_Y + AGENT_H;
const HUB_ASK_X = HUB_X + HUB_W * 0.375;
const HUB_ANSWER_X = HUB_X + HUB_W * 0.625;
const HUB_TO_CONTEXT_X = HUB_X + HUB_W * 0.44;
const HUB_TO_WAREHOUSE_X = HUB_X + HUB_W * 0.56;
const HUB_BOTTOM_Y = ROW_HUB_Y + HUB_H;
const CONTEXT_TOP_X = CONTEXT_X + TARGET_W / 2;
const WAREHOUSE_TOP_X = WAREHOUSE_X + TARGET_W / 2;
function buildCyclePath(spokeX: number, targetX: number): {
d: string;
length: number;
} {
const legs: Leg[] = [
// agent → hub (ask, down)
{ sx: AGENT_ASK_X, sy: AGENT_BOTTOM_Y, sPos: Position.Bottom, tx: HUB_ASK_X, ty: ROW_HUB_Y, tPos: Position.Top },
// through the hub to its spoke handle (down, drawn behind the hub)
{ sx: HUB_ASK_X, sy: ROW_HUB_Y, sPos: Position.Bottom, tx: spokeX, ty: HUB_BOTTOM_Y, tPos: Position.Top },
// hub → target (down)
{ sx: spokeX, sy: HUB_BOTTOM_Y, sPos: Position.Bottom, tx: targetX, ty: ROW_TARGET_Y, tPos: Position.Top },
// target → hub (up)
{ sx: targetX, sy: ROW_TARGET_Y, sPos: Position.Top, tx: spokeX, ty: HUB_BOTTOM_Y, tPos: Position.Bottom },
// through the hub to its answer handle (up, drawn behind the hub)
{ sx: spokeX, sy: HUB_BOTTOM_Y, sPos: Position.Top, tx: HUB_ANSWER_X, ty: ROW_HUB_Y, tPos: Position.Bottom },
// hub → agent (answer, up)
{ sx: HUB_ANSWER_X, sy: ROW_HUB_Y, sPos: Position.Top, tx: AGENT_ANSWER_X, ty: AGENT_BOTTOM_Y, tPos: Position.Bottom },
];
const segments = legs.map((leg) => {
const [segment] = getSmoothStepPath({
sourceX: leg.sx,
sourceY: leg.sy,
sourcePosition: leg.sPos,
targetX: leg.tx,
targetY: leg.ty,
targetPosition: leg.tPos,
});
return segment;
});
let d = segments[0];
for (let i = 1; i < segments.length; i += 1) {
d += ` ${segments[i].replace(/^M/, "L")}`;
}
const length = legs.reduce(
(sum, leg) => sum + Math.abs(leg.tx - leg.sx) + Math.abs(leg.ty - leg.sy),
0,
);
return { d, length };
}
type ParticleEdgeData = {
d: string;
duration: number;
beginOffset: number;
color: string;
};
type ParticleEdge = Edge<ParticleEdgeData, "particle">;
function ParticleEdgeView({ id, data }: EdgeProps<ParticleEdge>) {
if (!data) return null;
const pathId = `runtime-particle-path-${id}`;
return (
<>
<path id={pathId} d={data.d} fill="none" stroke="none" pointerEvents="none" />
<g className="runtime-particle" style={{ color: data.color }}>
<circle r={7.5} fill="currentColor" opacity={0.16} />
<circle r={3.75} fill="currentColor" opacity={0.32} />
<circle r={2.1} fill="currentColor" />
<animateMotion
dur={`${data.duration.toFixed(2)}s`}
begin={`-${data.beginOffset.toFixed(2)}s`}
repeatCount="indefinite"
>
<mpath href={`#${pathId}`} />
</animateMotion>
</g>
</>
);
}
function makeCycleEdge(
id: string,
source: string,
spokeX: number,
targetX: number,
beginFraction: number,
): ParticleEdge {
const { d, length } = buildCyclePath(spokeX, targetX);
const duration = Math.max(
PARTICLE_MIN_DURATION_SEC,
length / PARTICLE_SPEED_PX_PER_SEC,
);
return {
id,
source,
target: source,
type: "particle",
data: { d, duration, beginOffset: duration * beginFraction, color: CYCLE_STROKE },
};
}
const particleEdges: ParticleEdge[] = [
makeCycleEdge("p-context", "context", HUB_TO_CONTEXT_X, CONTEXT_TOP_X, 0),
makeCycleEdge("p-warehouse", "warehouse", HUB_TO_WAREHOUSE_X, WAREHOUSE_TOP_X, 0.5),
];
const nodeTypes = {
agent: AgentNodeView,
hub: HubNodeView,
target: TargetNodeView,
};
const edgeTypes = {
particle: ParticleEdgeView,
};
const edges = [...flowEdges, ...particleEdges];
export function ProductRuntime() {
return (
<section
className="not-prose my-12 w-full max-w-full min-w-0 space-y-5"
aria-labelledby="runtime-title"
>
<div className="max-w-3xl">
<h2
id="runtime-title"
className="text-xl font-semibold tracking-normal text-fd-foreground sm:text-2xl"
style={{ fontFamily: "var(--font-display)" }}
>
How serving works
</h2>
<p className="mt-3 text-sm leading-6 text-fd-muted-foreground">
At runtime, agents reach ktx through MCP. ktx searches the context
layer, returns approved metrics, and compiles them into read-only SQL
the warehouse runs.
</p>
</div>
<article
className="max-w-full min-w-0 overflow-hidden rounded-lg border border-fd-border bg-fd-card shadow-sm"
aria-label="ktx serving flow from an agent request to a governed answer"
>
<div className="border-b border-fd-border bg-fd-muted/35 px-5 py-4">
<p className="text-xs font-semibold uppercase tracking-wide text-fd-primary">
Serving flow
</p>
<h3
className="mt-1 text-base font-semibold tracking-normal text-fd-foreground sm:text-lg"
style={{ fontFamily: "var(--font-display)" }}
>
From an agent request to a governed answer
</h3>
<p className="mt-2 max-w-3xl text-xs leading-5 text-fd-muted-foreground">
The agent asks in plain language. ktx is the only thing that touches
the context layer and the warehouse, and every database connection
is read-only.
</p>
</div>
<FlowCanvas
nodes={nodes}
edges={edges}
nodeTypes={nodeTypes}
edgeTypes={edgeTypes}
canvasStyle={{
height: "min(620px, 98vw)",
minHeight: 430,
}}
className="runtime-canvas"
fitViewOptions={{ padding: 0.06 }}
ariaLabel="ktx serving flow diagram"
/>
</article>
<style>{`
.runtime-canvas .runtime-particle {
pointer-events: none;
filter: drop-shadow(0 0 6px currentColor);
}
@media (prefers-reduced-motion: reduce) {
.runtime-canvas .runtime-particle {
display: none;
}
}
`}</style>
</section>
);
}


@ -253,7 +253,7 @@ const engine: EngineNode = {
},
{
index: 3,
title: "Detect fan-out",
title: "Detect fanout",
detail: "group measures by source, flag chasm traps",
},
{


@ -1,201 +0,0 @@
# Goal
Set up **ktx** from scratch end-to-end as a fully autonomous, agent-driven replacement for the interactive `ktx setup` wizard. Detect the environment, install missing prerequisites, ask the user only for information you genuinely need (which connections to add, credentials), write a valid configuration, verify it works, and run a fast ingest. Keep the user updated throughout.
# Operating principles
- **Be autonomous.** Detect, decide, and act. Only ask the user when you need information that only they can provide: project location, which databases/sources to connect, credentials, and similar choices.
- **Stream short status updates.** Before each major phase ("Checking prerequisites…", "Installing uv…", "Configuring warehouse connection…", "Running fast ingest…") print a one-line update. Not chatty - just enough that the user can see what's happening.
- **Verify against docs, never guess.** CLI flags, config keys, and command names must come from the docs or from `ktx <command> --help`. If something looks wrong or missing, say so explicitly.
- **Print every command you run and its exit code.** Terse, not silent.
- **Fail loudly with cause + fix.** When a command fails: capture the exact error, identify the cause, change something, retry. Never retry an unchanged command. Exceptions for *known soft-failures* are listed in Phase 4 - handle those without retrying.
- **No LLM-based ingestion in this flow.** Only `--fast` ingest. The user can run `--deep` later.
- **Platform-agnostic.** Detect the host OS first and pick the right install commands / path syntax. Anything path- or shell-specific must branch on OS.
# Authoritative docs
**ktx** docs are served at `https://docs.kaelio.com/ktx/`. **Start by fetching `https://docs.kaelio.com/ktx/llms.txt`** to discover the docs map. Scan it for a "troubleshooting" entry - if one exists, read it **before** running install/setup so you can apply known fixes preemptively rather than after failing. If no troubleshooting page is listed (current state of the docs), proceed. Then fetch any other `.md` pages you need (setup, ingest, status, connection types). **Never invent CLI flags or config keys** - verify against the docs or `ktx --help` / `ktx <subcommand> --help`.
> **Note on the `ktx status` JSON example in the docs.** The docs page for `ktx status` shows an example shaped like `{"title": "...", "checks": [...]}`. That example is outdated. The real CLI output uses a top-level `verdict` field plus a `connections[]` array - see Phase 5 for the canonical success criteria. Trust the shape in this prompt over the docs example.
# Workflow
## Phase 1 - Detect environment
Determine the host OS (e.g. via `uname -s`, `process.platform`, or `$env:OS`). Use the right install commands per OS for the rest of this flow.
| Tool | macOS / Linux | Windows (PowerShell) |
|------|---------------|----------------------|
| `uv` | `curl -LsSf https://astral.sh/uv/install.sh \| sh` then re-source shell env | `irm https://astral.sh/uv/install.ps1 \| iex` |
| Node.js | use system / fnm / nvm - **do not** auto-install | use system / nvm-windows - **do not** auto-install |
| **ktx** CLI | `npm install -g …` (see Phase 2) | `npm install -g …` (see Phase 2) |
If Node.js is missing, **stop and ask the user** to install it (https://nodejs.org/). Do not attempt to auto-install Node.
## Phase 2 - Verify and install prerequisites
Check each tool in order; install only if missing.
1. **Node.js** - run `node --version`. Require >= 22. If missing or older, stop and instruct the user.
2. **`uv`** - run `uv --version`. If missing, run the OS-appropriate install command, then re-source the shell environment (`export PATH="$HOME/.local/bin:$PATH"` on Linux/macOS) so `uv` is on `PATH`.
3. **ktx CLI** - install with `npm install -g @kaelio/ktx`, then verify with `ktx --version`.
Print one status line per tool ("✓ uv 0.11.15 found", "Installing uv…", "✓ ktx 0.x.y installed").
## Phase 3 - Gather user choices
Ask the user (grouped if your harness supports it; otherwise sequentially):
1. **Project directory.** Default: current working directory. Confirm before continuing.
2. **LLM provider.** Default: `claude-code` with model `sonnet` (the user is already inside Claude Code; no extra API key needed). Offer `anthropic` (paste API key, stored as `env:` or `file:` ref) and `vertex` (GCP project + location) as alternatives. Skip if defaults are accepted.
3. **Embeddings backend.** Default: `sentence-transformers` (local, no API key, managed Python runtime). Offer `openai` only if the user has a key.
4. **Database connections.** Ask how many to add, then loop. For each, collect:
- Connection name (e.g. `warehouse`, `analytics`).
- Driver: one of `sqlite`, `postgres`, `mysql`, `sqlserver`, `bigquery`, `snowflake`.
- Connection URL/DSN (or service-account file for BigQuery). Accept `env:VAR_NAME` or `file:/abs/path` to avoid pasting raw secrets.
- **Heads-up for the user**: even if they paste a literal URL, **ktx** will silently relocate it into `<project>/.ktx/secrets/<connection>-url` and rewrite `ktx.yaml` to `url: file:…` - this is correct, secure behavior and not a bug.
- Schemas / datasets to include (postgres / sqlserver / snowflake / bigquery only).
- Optional `enabled_tables` allowlist if the user wants to scope ingest to specific tables.
5. **Context sources** (dbt, Metabase, Looker, LookML, MetricFlow, Notion). Default: none. Ask only if the user mentions them.
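The `env:VAR_NAME` / `file:/abs/path` convention from step 4 can be sketched as a small classifier. This is only an illustration: `classifyCredential` and the `CredentialRef` type are hypothetical names, not part of the **ktx** CLI, and the real parser may apply stricter rules.

```typescript
// Hypothetical helper (not a ktx API): classify a user-supplied connection
// value as an env: reference, a file: reference, or a literal value.
type CredentialRef =
  | { kind: "env"; name: string }
  | { kind: "file"; path: string }
  | { kind: "literal"; value: string };

function classifyCredential(input: string): CredentialRef {
  const value = input.trim();
  // "env:" followed by at least one character names an environment variable.
  if (value.startsWith("env:") && value.length > "env:".length) {
    return { kind: "env", name: value.slice("env:".length) };
  }
  // "file:" followed by at least one character names a file on disk.
  if (value.startsWith("file:") && value.length > "file:".length) {
    return { kind: "file", path: value.slice("file:".length) };
  }
  // Anything else is treated as a literal (for example a pasted URL).
  return { kind: "literal", value };
}
```

A `literal` result is the case where the heads-up about relocation into `.ktx/secrets/` is worth showing to the user.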
## Phase 4 - Configure the project
Drive the existing wizard non-interactively (verify exact flag names with `ktx setup --help` and the docs - the automation flags are hidden from help but accepted):
```
ktx setup \
--project-dir <path> \
--no-input --yes \
--llm-backend <claude-code|anthropic|vertex> --llm-model <model> \
[--anthropic-api-key-env ANTHROPIC_API_KEY | --anthropic-api-key-file <path>] \
[--vertex-project <p> --vertex-location <loc>] \
--embedding-backend <sentence-transformers|openai> \
[--embedding-api-key-env OPENAI_API_KEY] \
--skip-sources \
--database <driver> --database-connection-id <name> --database-url <url|env:VAR|file:/path> \
[--database-schema <schema> …]
```
Notes on the flags above:
- **Project creation is automatic with `--no-input --yes`.** When
`ktx.yaml` exists, setup resumes it. When it doesn't exist, setup creates it
at `--project-dir`.
- **`--database-connection-id` is dual-purpose.** With `--database` or
`--database-url`, it names the new connection. Without those flags, it
selects an existing connection id.
- **Configure one new database connection per setup command.** If the user
wants multiple new connections, run setup again for each connection.
- **You don't need `--skip-agents` in this flow.** The agent integration step
is opt-in: setup leaves it alone unless you pass `--agents --target
<target>`.
- **`--skip-sources`** is correct and is the documented way to leave context sources unconfigured.
### Known soft-failure: `ktx setup` exits 1 after a successful fast build
When you select a configuration that only does fast ingest, `ktx setup`'s final readiness verification fails with:
```
ktx context build did not pass agent-readiness verification.
<connection>: deep database context has not completed.
```
This is **expected** and **does not mean setup failed**. Treat the exit code as a soft-failure **only if all of the following hold**:
- The build log shows the fast ingest reached `[100%] Scan completed` for every configured connection.
- `ktx connection test <name>` (run next) exits 0 for every connection.
- `ktx status --json --no-input` reports `verdict: "ready"`.
If those three conditions hold, proceed to Phase 5 without retrying setup, and **do not** switch to `--deep` to "fix" the readiness gate - deep ingest is explicitly out of scope. Mention this in the final report under "Docs / CLI gaps" so the user is aware.
If any of those three conditions do not hold, this is a real failure - capture the error, fetch the relevant docs page, fix the cause, retry.
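The decision above can be written down as a small function. This is a sketch under stated assumptions: `SetupOutcome` and `classifySetup` are hypothetical names, the agent is assumed to have captured the build log per connection, and exit code 0 is assumed to mean setup succeeded outright.

```typescript
// Hypothetical shape for what the agent collected; none of these names come
// from the ktx CLI itself.
interface SetupOutcome {
  setupExitCode: number;
  // Captured build-log text, keyed by connection name.
  buildLogs: Record<string, string>;
  // Exit code of `ktx connection test <name>`, keyed by connection name.
  connectionTestExitCodes: Record<string, number>;
  // Top-level `verdict` from `ktx status --json --no-input`.
  statusVerdict: string;
}

type SetupResult = "success" | "soft-failure" | "real-failure";

function classifySetup(outcome: SetupOutcome): SetupResult {
  // Assumption: a zero exit code needs no further classification.
  if (outcome.setupExitCode === 0) return "success";

  const names = Object.keys(outcome.buildLogs);
  // Condition 1: fast ingest reached 100% for every configured connection.
  const scansDone =
    names.length > 0 &&
    Object.values(outcome.buildLogs).every((log) =>
      log.includes("[100%] Scan completed"),
    );
  // Condition 2: every connection test exited 0.
  const testsPass = names.every(
    (name) => outcome.connectionTestExitCodes[name] === 0,
  );
  // Condition 3: status reports the ready verdict.
  const ready = outcome.statusVerdict === "ready";

  return scansDone && testsPass && ready ? "soft-failure" : "real-failure";
}
```

Only a `real-failure` result should trigger the capture, fix, and retry loop.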
After `ktx setup` writes `ktx.yaml`, edit it directly for anything flags don't cover:
- Per-connection `enabled_tables` allowlist (snake_case, under `connections.<name>.enabled_tables`).
- Any advanced settings the user requested.
Use a YAML-aware editor (e.g. `uv run python -c "import yaml; …"`) - do not hand-edit blindly.
## Phase 5 - Verify
`ktx setup` already runs a fast ingest of every database connection it configures, so you do not need to re-ingest by default. For each configured connection:
```
ktx connection test <connection-name> # must exit 0
```
Only re-run ingest if setup's build log did **not** reach 100% for that connection:
```
ktx ingest <connection-name> --fast --no-input
```
**Mutex warning on `ktx ingest`**: passing both `--yes` and `--no-input` fails with `Choose only one runtime install mode: --yes or --no-input`. Setup already installed the managed Python runtime, so pass **only `--no-input`** to `ktx ingest`. (`--yes` is only needed when an ingest invocation has to install the runtime itself, which is not the case here.)
Then run the global health check:
```
ktx status --json --no-input
```
Success requires (canonical shape - supersedes the example in the docs):
- `verdict: "ready"` at the top of the JSON.
- Every `connections[].status === "ok"`.
- `ktx connection test <name>` exited 0 for every connection.
Do **not** run `--deep` ingest in this flow - that requires LLM time and is out of scope.
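A minimal sketch of the JSON part of that check, assuming only the two fields named above (`verdict` and `connections[].status`); `isStatusReady` is a hypothetical helper, and any other fields in the real output are ignored.

```typescript
// Only the fields this prompt relies on; the real output may contain more.
interface StatusReport {
  verdict: string;
  connections: { status: string }[];
}

// Hypothetical helper: returns true when the status JSON meets the
// verdict and per-connection criteria.
function isStatusReady(jsonText: string): boolean {
  const report = JSON.parse(jsonText) as Partial<StatusReport>;
  return (
    report.verdict === "ready" &&
    Array.isArray(report.connections) &&
    report.connections.every((connection) => connection.status === "ok")
  );
}
```

An empty `connections` array passes this check, so pair it with the `ktx connection test` exit codes rather than relying on it alone.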
### Optional: directly probe the ktx daemon
If the user asks for stronger verification that `sentence-transformers` is actually serving (not just that setup said "ok"), do all of:
1. `ktx admin runtime status --json` → expect `"kind": "ready"` and `"features": [..., "local-embeddings"]`.
2. `pgrep -fa ktx-daemon` → expect a process running `ktx-daemon serve-http`.
3. `curl -sS http://127.0.0.1:<port>/health` → expect HTTP 200 with `{"status":"healthy",…}`.
4. `curl -sS -X POST http://127.0.0.1:<port>/embeddings/compute -H 'content-type: application/json' -d '{"text":"hello"}'` → expect `{"embedding": [...384 floats...]}`.
Discover the port from setup's log line `Started ktx daemon: http://127.0.0.1:<port>` or from the daemon's OpenAPI at `GET /openapi.json`. Note: the routes are `/health` and `/embeddings/compute` - not `/healthz` or `/embeddings`.
## Phase 6 - Final report
Print a structured report:
```
ktx SETUP COMPLETE
Project: <path>
LLM: <backend> / <model>
Embeddings: <backend> / <model>
Runtime: managed Python ✓ (if the ktx daemon was started)
Connections:
- <name> (<driver>) status=ok schemas=[…] tables=<N>
- …
Sources: <list or "none">
Verdict: ready
```
Then **Next steps** (copy-pasteable):
1. Enrich with AI descriptions and embeddings: `ktx ingest <connection> --deep` (several minutes per connection).
2. Add more connections later by rerunning this setup or via `ktx setup --database … --database-connection-id …`.
3. Configure context sources (dbt, Metabase, Looker, LookML, MetricFlow, Notion) - see `ktx setup --help` for `--source …` flags.
4. Install agent integration: `ktx setup --agents --target <claude-code|claude-desktop|codex|cursor|opencode|universal>` (with optional `--global` for `claude-code`/`codex`).
5. Connect the agent / MCP: see docs at `https://docs.kaelio.com/ktx/`.
Under **Docs / CLI gaps to flag** include any of these that applied during your run:
- `ktx setup` exits non-zero after a successful fast build (deep-readiness gate); status reports ready.
- `ktx ingest` rejects `--yes` and `--no-input` together; docs don't note the conflict.
- `ktx status --json` real shape (`verdict`, `connections[]`) doesn't match the example in the docs page.
- The pasted DB URL was moved to `.ktx/secrets/<name>-url` automatically.
End with a single line: `RESULT: PASS` or `RESULT: FAIL - <one-line reason>`.
# Operating rules (recap)
- Print every command you run and its exit code. Status updates may be terse, but never silent.
- On failure: capture the error, fetch the relevant docs page, fix the cause, retry. Never retry an unchanged command.
- Known soft-failures (listed in Phase 4 and Phase 5) are not real failures - handle them as documented; do not retry or escalate.
- If you find a docs/CLI gap ("docs say X but CLI does Y"), call it out in the final report.
- Never commit credentials - **ktx** accepts `env:` and `file:` references; prefer those. **ktx** will also auto-relocate literal URLs into `.ktx/secrets/`, but that does not protect anyone who pasted the URL into chat history.


@ -14,7 +14,8 @@ Read https://docs.kaelio.com/ktx/llms.txt first. Then fetch only the ktx Markdow
## Set up a project
```text
Set up ktx in this repository. Start by reading /docs/ai-resources/agent-quickstart.md and /docs/getting-started/quickstart.md. Install the published CLI with npm; use pnpm only when working from a ktx source checkout. After setup, run ktx status and summarize which steps are complete, which files changed, and what still needs credentials or user input.
Run npx skills add Kaelio/ktx --skill ktx and use the ktx skill to install
and configure ktx in this project.
```
## Find a command


@ -0,0 +1,86 @@
---
title: "ktx completion"
description: "Print a shell completion script for tab completion."
---
Print a shell completion script for **ktx**. Once installed, pressing <kbd>Tab</kbd>
completes commands, subcommands, and flags, and - inside a **ktx** project - the
names of things that already exist: semantic-layer source names for
`ktx sl read` and `ktx sl validate`, wiki page keys for `ktx wiki read`, and
configured connection ids for `ktx connection test`, `ktx ingest`, and
`ktx sql`. This saves you from remembering exact source, page, or connection
names.
## Command signature
```bash
ktx completion <shell>
```
`<shell>` must be `zsh` or `bash`. The command writes the script to stdout; it
does not modify any files. Enable completion by evaluating the script in your
shell startup file.
## Installation
Add the matching line to your shell startup file, then restart your shell (or
`source` the file). `ktx` must be on your `PATH`.
```bash
# zsh — add to ~/.zshrc
eval "$(ktx completion zsh)"
```
```bash
# bash — add to ~/.bashrc
eval "$(ktx completion bash)"
```
To try it for the current session only, run the same `eval` line directly in
your terminal.
## What gets completed
| Position | Completions |
|----------|-------------|
| `ktx <Tab>` | Top-level commands (`setup`, `sl`, `wiki`, `ingest`, …) |
| `ktx sl <Tab>` | The `read` / `validate` / `query` subcommands |
| `ktx sl read <Tab>` | Existing semantic-layer source names |
| `ktx sl validate <Tab>` | Existing semantic-layer source names |
| `ktx wiki <Tab>` | The `read` subcommand |
| `ktx wiki read <Tab>` | Existing wiki page keys |
| `ktx connection test <Tab>` | Configured connection ids |
| `ktx ingest <Tab>` | Configured connection ids |
| `ktx sql --connection <Tab>` | Configured connection ids |
| `ktx completion <Tab>` | `zsh` or `bash` |
| `ktx <command> --<Tab>` | The command's flags and inherited global flags |
| `ktx sl --output <Tab>` | An option's allowed values (here `pretty`, `plain`, `json`) |
| `ktx sl --connection-id <Tab>` | Configured connection ids |
Source names, wiki page keys, and connection ids are read from the **ktx**
project resolved from your current directory (or `--project-dir` /
`KTX_PROJECT_DIR`). Outside a **ktx** project, completion still suggests
commands and flags but no project entities. Bare `ktx sl <Tab>` and
`ktx wiki <Tab>` complete subcommands instead of entity names because their
positional arguments are free-text search queries.
## Examples
```bash
# Print the zsh completion script
ktx completion zsh
# Print the bash completion script
ktx completion bash
# Install for zsh
echo 'eval "$(ktx completion zsh)"' >> ~/.zshrc
```
## Common errors
| Error | Cause | Recovery |
|-------|-------|----------|
| `error: command-argument value '<name>' is invalid for argument 'shell'. Allowed choices are zsh, bash.` | A shell other than `zsh` or `bash` was requested | Re-run with `ktx completion zsh` or `ktx completion bash` |
| Tab completion does nothing | The script was not evaluated, or `ktx` is not on `PATH` | Confirm the `eval` line is in your startup file, restart the shell, and verify `ktx --version` runs |
| Source, page, or connection names are missing | The current directory is not inside a **ktx** project | Run from the project directory, or pass `--project-dir`, or set `KTX_PROJECT_DIR` |


@ -104,6 +104,6 @@ configured connection and exit non-zero if any probe fails.
| Error | Cause | Recovery |
|-------|-------|----------|
| No connections configured | The project has no entries under `connections` | Run `ktx setup` and add a database or context-source connection |
| Connection test fails | Credentials, network access, database, warehouse, or schema is invalid | Verify the same URL with the database's native client, then rerun `ktx setup` and reconfigure the connection |
| Mapping validation fails during setup | BI database mappings do not point at valid warehouse connections | Rerun `ktx setup` and update the context-source mapping selections |
| Connection test fails | Credentials, network access, database, warehouse, or schema is invalid | Use the setup recovery menu to retry or re-enter details; if it still fails, verify the same URL with the database's native client |
| Mapping validation fails during setup | BI database mappings do not point at valid warehouse connections | Use the setup recovery menu to retry validation or re-enter mapping selections; rerun `ktx setup` if you already exited |
| Notion page picker cannot run | The terminal is non-interactive or Notion discovery failed | Rerun interactive `ktx setup`, or use non-interactive setup flags with explicit root page ids |


@ -5,9 +5,11 @@ description: "Build or refresh ktx context, or capture text into ktx memory."
`ktx ingest` builds or refreshes **ktx** context from configured connections, and
can also capture free-form text into **ktx** memory. Database connections build
schema context. Context-source connections ingest metadata from tools such as
dbt, Looker, Metabase, MetricFlow, LookML, and Notion. Pass `--text` or
`--file` to capture inline text or text files into memory instead.
enriched context — schema plus AI-generated descriptions, embeddings, and
relationship evidence — and require a configured model and embeddings.
Context-source connections ingest metadata from tools such as dbt, Looker,
Metabase, MetricFlow, LookML, and Notion. Pass `--text` or `--file` to capture
inline text or text files into memory instead.
## Command signature
@ -29,8 +31,6 @@ connection is selected.
| Flag | Description | Default |
|------|-------------|---------|
| `--all` | Ingest all configured connections (same as bare invocation) | `false` |
| `--fast` | Use deterministic fast database ingest | Stored connection default, or `fast` |
| `--deep` | Use deep database ingest with AI-generated descriptions, embeddings, and relationship evidence | Stored connection default, or `fast` |
| `--query-history` | Include database query-history usage patterns | Stored connection default |
| `--no-query-history` | Skip database query-history usage patterns for this run | Stored connection default |
| `--query-history-window-days <days>` | BigQuery/Snowflake query-history lookback window for this run | Stored connection default |
@ -44,12 +44,12 @@ connection is selected.
| `--yes` | Install required managed runtime features without prompting | `false` |
| `--no-input` | Disable interactive terminal input | - |
`--fast` and `--deep` are mutually exclusive. Depth flags apply only to
database connections. Query-history flags apply only to database connections
Database ingest always builds enriched context and requires a configured model
and embeddings (run `ktx setup`); connections without that configuration fail
before any work starts. Query-history flags apply only to database connections
that support query history. The window flag applies to BigQuery and Snowflake;
Postgres reads the current `pg_stat_statements` aggregate data instead of a
time-windowed history table. Query-history ingest runs after fast ingest and
requires deep ingest readiness.
time-windowed history table. Query-history ingest runs after the schema scan.
When more than one connection is selected, database ingest runs first, then
context-source ingest and memory updates run for context-source connections.
@ -72,14 +72,8 @@ ktx ingest
# Build one database or context-source connection
ktx ingest warehouse
# Force deterministic fast database ingest
ktx ingest warehouse --fast
# Force deep database ingest with AI enrichment
ktx ingest warehouse --deep
# Include query-history usage patterns
ktx ingest warehouse --deep --query-history
ktx ingest warehouse --query-history
# Set the lookback window for BigQuery or Snowflake query history
ktx ingest warehouse --query-history-window-days 30
@ -149,13 +143,51 @@ verbosity:
KTX_INGEST_TRACE_LEVEL=trace ktx ingest metabase
```
### Profiling a slow ingest
Each timed phase and work unit records a `durationMs` in the trace, and each
agent loop records its step count and token usage. To see where wall-clock time
went, enable profiling and **ktx** prints a rolled-up breakdown to stderr at the
end of the run. There are two ways to turn it on, and two output formats.
Turn it on per run with the `KTX_PROFILE_INGEST` environment variable, or
persistently with `ingest.profile` in `ktx.yaml` (useful for CI or while
iterating on a slow source):
```bash
KTX_PROFILE_INGEST=1 ktx ingest metabase # human-readable table
KTX_PROFILE_INGEST=json ktx ingest metabase # raw JSON for coding agents
```
```yaml
ingest:
profile: true # human table; use "json" for the machine-readable form
```
Both formats report total wall time, time per phase, and the slowest work units,
splitting each work unit's agent-loop time into model time versus tool-execution
time. The `json` form emits the full structured profile (raw milliseconds and
token counts, stable keys) plus a `summary.headline` one-line diagnosis, so a
coding agent can parse it directly instead of scraping the table. If both the env
var and the config request profiling, `json` wins. Example headline:
```text
Slowest phase: reconciliation (2m 05s, 48% of wall time). 2 work units (1 failed), ~88% model generation vs ~12% tools.
```
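The precedence rule above (when both sources request profiling, `json` wins) can be sketched as follows. This is an illustration, not the **ktx** implementation: `resolveProfileMode` is a hypothetical name, and treating any env value other than `1` or `json` as "off" is an assumption.

```typescript
type ProfileMode = "off" | "table" | "json";

// Assumption: only the two documented env values turn profiling on.
function modeFromEnv(value: string | undefined): ProfileMode {
  if (value === "json") return "json";
  if (value === "1") return "table";
  return "off";
}

// `ingest.profile` is documented as `true` (table) or "json".
function modeFromConfig(value: boolean | "json" | undefined): ProfileMode {
  if (value === "json") return "json";
  if (value === true) return "table";
  return "off";
}

// Hypothetical resolver: json beats table, and either source can enable profiling.
function resolveProfileMode(
  envValue: string | undefined,
  configValue: boolean | "json" | undefined,
): ProfileMode {
  const fromEnv = modeFromEnv(envValue);
  const fromConfig = modeFromConfig(configValue);
  if (fromEnv === "json" || fromConfig === "json") return "json";
  if (fromEnv === "table" || fromConfig === "table") return "table";
  return "off";
}
```

For example, `KTX_PROFILE_INGEST=1` combined with `ingest.profile: "json"` resolves to `json` under this sketch.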
Work units run serially by default (`ingest.workUnits.maxConcurrency` is `1`);
raise it in `ktx.yaml` if the profile shows the run is bound by serialized
work-unit agent loops. If the provider reports an LLM rate limit, **ktx** shows
a transient wait message and temporarily reduces effective work-unit concurrency
according to `ingest.rateLimit`.
## Common errors
| Error | Cause | Recovery |
|-------|-------|----------|
| Connection not configured | The connection id is not present in `ktx.yaml` | Add the connection with `ktx setup` or update `ktx.yaml` |
| Deep readiness is missing | `--deep` or query history needs model, embedding, and scan-enrichment configuration | Run `ktx setup` or rerun with `--fast` |
| Query history is unsupported | The selected database driver does not support query history | Run fast ingest without query-history flags |
| Enrichment is not configured | Database ingest needs a model, embeddings, and scan-enrichment configuration | Run `ktx setup` to configure a model and embeddings |
| Query history is unsupported | The selected database driver does not support query history | Run ingest without query-history flags |
| Python runtime is missing | The selected ingest target needs runtime-backed SQL analysis or source parsing | Accept the interactive prompt, rerun with `--yes`, or run the suggested `ktx admin runtime install` command |
| Context-source options were ignored | Depth and query-history flags were supplied for a context-source connection | Omit database-only flags when ingesting context-source connections |
| Context-source options were ignored | Query-history flags were supplied for a context-source connection | Omit database-only flags when ingesting context-source connections |
| Text ingest stops early | `--fail-fast` was used and one item failed | Fix the failed item or rerun without `--fail-fast` to collect all failures |


@ -51,8 +51,9 @@ prompts.
| Flag | Description |
|------|-------------|
| `--llm-backend <backend>` | LLM backend: `anthropic`, `vertex`, or `claude-code` |
| `--llm-backend <backend>` | LLM backend: `anthropic`, `vertex`, `claude-code`, or `codex` |
| `--llm-backend claude-code` | Use the local Claude Code session for **ktx** LLM calls |
| `--llm-backend codex` | Use local Codex authentication for **ktx** LLM calls |
| `--llm-model <model>` | LLM model ID or backend model alias to validate and save |
| `--anthropic-api-key-env <name>` | Environment variable containing the Anthropic API key |
| `--anthropic-api-key-file <path>` | File containing the Anthropic API key |
@ -62,9 +63,14 @@ prompts.
Choose only one Anthropic credential source. Anthropic credential flags are only
valid with the Anthropic backend; Vertex flags are only valid with the Vertex
backend. The `claude-code` backend uses local Claude Code authentication instead
backend. The `claude-code` and `codex` backends use local authentication instead
of Anthropic API key or Vertex flags. For Claude Code, `--llm-model` accepts
`sonnet`, `opus`, `haiku`, or a full Claude model ID.
`sonnet`, `opus`, `haiku`, or a full Claude model ID. For Codex, `--llm-model`
accepts `codex`, `default`, or a `gpt-*` / `codex-*` model ID such as
`gpt-5.5`; any other value is rejected before the auth probe. Run `codex` to
see the models available to your login, and pick a `gpt-*` / `codex-*` id from
that list. Note that `*-codex` API-billing model IDs (for example
`gpt-5.3-codex`) are not available to ChatGPT-subscription logins.
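The Codex model-name rule described above can be approximated with a short check. This is a sketch only: `isAcceptedCodexModel` is a hypothetical name, and the exact pattern **ktx** uses is an assumption inferred from the `gpt-*` / `codex-*` wording.

```typescript
// Hypothetical validator: mirrors the documented rule that Codex accepts
// `codex`, `default`, or an id starting with `gpt-` or `codex-`.
function isAcceptedCodexModel(model: string): boolean {
  if (model === "codex" || model === "default") return true;
  // Assumption: a prefix plus at least one further character is required.
  return /^(gpt|codex)-.+$/.test(model);
}
```

A name check like this cannot tell whether a model is available to a given login: `gpt-5.3-codex` passes the pattern even though `*-codex` API-billing ids are not available to ChatGPT-subscription logins.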
### Embeddings
@ -131,11 +137,34 @@ BigQuery; and `databases` for ClickHouse.
Query history setup is supported for Postgres, BigQuery, and Snowflake. The
window flag applies to BigQuery and Snowflake; Postgres reads the current
`pg_stat_statements` aggregate data instead of a time-windowed history table.
Enabling query history makes deep ingest readiness matter for later
`ktx ingest` runs.
Later `ktx ingest` runs build enriched context and need a configured model and
embeddings, including when query history is enabled.
When query history is enabled for PostgreSQL, Snowflake, or BigQuery,
`ktx setup` runs a non-blocking readiness probe after the connection test
passes. A failed probe still writes setup changes, prints the warehouse-specific
grant or extension remediation, and skips query-history processing until you
fix the prerequisite. If the later schema-context build also fails, interactive
setup offers **Disable query history and retry** so you can finish database
setup with `connections.<id>.context.queryHistory.enabled: false`.
After the schema scan completes, setup can derive query-history service-account
filters from in-scope history. If **ktx** finds clear operational roles, it
prints each proposed exclusion with a reason and writes
`connections.<id>.context.queryHistory.filters.serviceAccounts` only when you
apply the proposal. In non-interactive setup with `--yes`, the proposal is
applied automatically. Existing `serviceAccounts` blocks are never overwritten.
For BigQuery, the remediation tells you to grant `roles/bigquery.resourceViewer`
on the BigQuery project, or grant a custom role that contains
`bigquery.jobs.listAll`.
### Context Sources
In interactive setup, after you configure a database, choose
**Skip context sources** to leave optional context-source setup complete with no
sources. This is equivalent to passing `--skip-sources` in scripted setup.
| Flag | Description |
|------|-------------|
| `--source <type>` | Context-source connector type: `dbt`, `metricflow`, `metabase`, `looker`, `lookml`, or `notion` |
@ -144,9 +173,9 @@ Enabling query history makes deep ingest readiness matter for later
| `--source-git-url <url>` | Git URL for dbt, MetricFlow, or LookML |
| `--source-branch <branch>` | Git branch for context-source setup |
| `--source-subpath <path>` | Repo subpath for context-source setup |
| `--source-auth-token-ref <ref>` | `env:` or `file:` credential reference for source repo auth |
| `--source-auth-token-ref <ref>` | `env:` or `file:` credential reference for source repo auth or Notion integration token |
| `--source-url <url>` | Source service URL for Metabase or Looker |
| `--source-api-key-ref <ref>` | `env:` or `file:` API key reference for Metabase or Notion |
| `--source-api-key-ref <ref>` | `env:` or `file:` API key reference for Metabase |
| `--source-client-id <id>` | Looker client id |
| `--source-client-secret-ref <ref>` | `env:` or `file:` Looker client secret reference |
| `--source-warehouse-connection-id <id>` | Warehouse connection id used for context-source mapping |
@ -175,6 +204,17 @@ ktx setup \
--llm-backend claude-code \
--llm-model opus
# Configure ktx to use local Codex authentication for LLM work
ktx setup --llm-backend codex --llm-model gpt-5.5 --no-input
```
When you choose `--llm-backend codex`, setup prints a warning if the public
Codex SDK and CLI surface cannot prove full Claude-Code-style isolation. The
backend restricts **ktx** runtime MCP tools to each run, but Codex may still
load user Codex config and built-in command execution or read-only file
capabilities.
```bash
# Script a Postgres connection that reads its URL from the environment
ktx setup \
--project-dir ./analytics \
@ -205,6 +245,14 @@ ktx setup \
--source-warehouse-connection-id warehouse \
--metabase-database-id 1
# Add a Notion source that crawls selected root pages
ktx setup \
--source notion \
--source-connection-id notion-main \
--source-auth-token-ref env:NOTION_TOKEN \
--notion-crawl-mode selected_roots \
--notion-root-page-id abc123def456
# Install project-scoped agent integration for Codex
ktx setup --agents --target codex
```


@ -11,13 +11,16 @@ the vocabulary agents use to generate correct SQL.
```bash
ktx sl [options] [query...] # list (bare) or search (with query)
ktx sl validate <sourceName> [options]
ktx sl read <sourceName>
ktx sl validate <sourceName>
ktx sl query [options]
```
- Bare `ktx sl` lists semantic sources.
- `ktx sl <query...>` searches semantic sources (multi-word queries are
joined with a space).
- `ktx sl <query...>` searches semantic sources. Multi-word queries are joined
with a space.
- `ktx sl read <sourceName>` prints the YAML for one source. Add
`--connection-id` only when the source name exists in multiple connections.
- `ktx sl validate` and `ktx sl query` remain as explicit subcommands.
## Subcommands
@ -26,6 +29,7 @@ ktx sl query [options]
|-----------|-------------|
| (none, no query) | List semantic sources |
| (none, with query) | Search semantic sources |
| `read <sourceName>` | Print the YAML for one semantic source |
| `validate <sourceName>` | Validate a semantic source against the database schema |
| `query` | Compile or execute a semantic query |
@ -40,17 +44,23 @@ ktx sl query [options]
| `--output <mode>` | Output mode: `pretty` (default in TTY), `plain` (TSV), or `json` | `pretty` |
| `--json` | Shortcut for `--output=json` (overrides `--output`) | `false` |
### `sl read`
| Flag | Description | Default |
|------|-------------|---------|
| `--connection-id <id>` | Optional **ktx** connection id for disambiguation | - |
### `sl validate`
| Flag | Description | Default |
|------|-------------|---------|
| `--connection-id <id>` | **ktx** connection id (required) | - |
| `--connection-id <id>` | Optional **ktx** connection id for disambiguation | - |
### `sl query`
| Flag | Description | Default |
|------|-------------|---------|
| `--connection-id <id>` | **ktx** connection id | - |
| `--connection-id <id>` | Required **ktx** connection id | - |
| `--query-file <path>` | JSON semantic query file | - |
| `--measure <measure>` | Measure to query; repeatable (at least one required) | - |
| `--dimension <dimension>` | Dimension to include; repeatable | - |
@ -65,8 +75,9 @@ ktx sl query [options]
| `--no-input` | Disable interactive managed runtime installation | - |
| `--max-rows <n>` | Maximum rows to return when executing | - |
`sl query` requires at least one `--measure` unless `--query-file` is set.
`--query-file` should point to a JSON semantic query object.
`sl query` requires `--connection-id` and at least one `--measure` unless
`--query-file` is set. `--query-file` must point to a JSON semantic query
object.
## Examples
@ -83,7 +94,16 @@ ktx sl --json
# Search sources as JSON
ktx sl "revenue" --json
# Validate a source against the live schema
# Print the YAML for a source name that is unique across connections
ktx sl read orders
# Print the YAML for a source name that exists in multiple connections
ktx sl --connection-id my-warehouse read orders
# Validate a source name that is unique across connections
ktx sl validate orders
# Validate a source name that exists in multiple connections
ktx sl validate orders --connection-id my-warehouse
# Compile a query and view the generated SQL
@ -144,6 +164,12 @@ shows `#1`, `#2`, and later rank badges for the displayed results. Plain and
JSON output keep the raw `score` value, which is a ranking score rather than a
percentage.
`ktx sl read <sourceName>` prints the source YAML directly to stdout when the
source name is unique across connections. If the name exists in multiple
connections, rerun the command with `--connection-id <id>`. The command does
not wrap output in pretty, plain, or JSON formatting, so it can be piped to
other tools.
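The lookup rule described above can be sketched as follows. This is a minimal illustration in Python, assuming a hypothetical in-memory list of `(connection_id, source_name)` pairs; the function name and the error wording are illustrative and do not reflect the CLI's internals.

```python
def resolve_source(sources, name, connection_id=None):
    """Pick one source by name, optionally narrowed by connection id.

    sources: list of (connection_id, source_name) pairs (hypothetical shape).
    """
    matches = [
        pair for pair in sources
        if pair[1] == name and (connection_id is None or pair[0] == connection_id)
    ]
    if not matches:
        raise LookupError("Source not found: " + name)
    if len(matches) > 1:
        # Same name in several connections: the caller must disambiguate.
        ids = ", ".join(sorted(pair[0] for pair in matches))
        raise LookupError("Source name is ambiguous; pass --connection-id (" + ids + ")")
    return matches[0]
```

A name that exists in one connection resolves without `--connection-id`; a name shared by two connections resolves only once the id narrows the match to a single pair.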
```json
{
"sql": "SELECT orders.status, SUM(orders.total_amount) AS total_revenue FROM public.orders GROUP BY orders.status",
@ -160,7 +186,8 @@ percentage.
| Error | Cause | Recovery |
|-------|-------|----------|
| Source not found | Source name or connection id is wrong | Run `ktx sl --json` and retry with an exact source name and connection id |
| Source not found | Source name or connection id is wrong | Run `ktx sl <query>` or `ktx sl --connection-id <id>` to find the exact source name, then retry `ktx sl read <sourceName>` or `ktx sl validate <sourceName>` |
| Source name is ambiguous | The same source name exists in multiple connections | Rerun with `--connection-id <id>` from the error message |
| Validation fails | YAML references missing columns, invalid joins, or invalid SQL expressions | Fix the source YAML and rerun `ktx sl validate` |
| Query compile fails | Measure, dimension, filter, or segment name is invalid | Search sources with `ktx sl <query>`, inspect the source YAML in your project files, then retry using declared fields |
| Execution returns too many rows | `--max-rows` is missing or too high | Add `--max-rows` with a bounded value before executing |


@ -21,7 +21,7 @@ ktx status [options]
| `--json` | Print JSON output | `false` |
| `-v`, `--verbose` | Show every check, including passing ones | `false` |
| `--validate` | Only validate the `ktx.yaml` schema; skip readiness checks | `false` |
| `--fast` | Skip checks that require external communication (Postgres query-history probe, Claude Code auth probe) | `false` |
| `--fast` | Skip checks that require external communication (query-history readiness probes, Claude Code auth probe, and Codex auth probe) | `false` |
| `--no-input` | Disable interactive terminal input | - |
## Examples
@ -39,7 +39,7 @@ ktx status --verbose
# Validate ktx.yaml without running readiness checks
ktx status --validate
# Skip slow probes (Postgres pg_stat_statements, Claude Code auth)
# Skip slow probes (query-history readiness, Claude Code auth, Codex auth)
ktx status --fast
# Check a project from another directory
@ -57,6 +57,16 @@ flow, then rerun `ktx status`. Use `--fast` to skip this probe (useful in CI
or offline contexts); skipped checks render as `-` and carry
`"status": "skipped"` in JSON output.
For `llm.provider.backend: codex`, `ktx status` runs a minimal non-interactive
Codex request. If the probe fails, verify the Codex CLI installation, then
authenticate Codex locally with the Codex CLI.
When `llm.provider.backend: codex` is configured, `ktx status` also prints a
warning when the installed public Codex SDK and CLI surface cannot prove full
Claude-Code-style isolation. The warning does not block authenticated Codex
usage, but it marks the project status as partial so you can make an explicit
runtime-isolation decision.
A `Local data` section summarises what the project has accumulated locally:
ingest run counts, last completed timestamp per connection, knowledge page
counts by scope, semantic-layer source and dictionary value counts, and the


@ -1,21 +1,24 @@
---
title: "ktx wiki"
description: "List or search wiki pages."
description: "List, search, or read wiki pages."
---
List and search wiki pages in your **ktx** project. Wiki pages are Markdown
documents that capture business definitions, rules, and gotchas. Agents search
them for context when answering questions about your data.
List, search, and read wiki pages in your **ktx** project. Wiki pages are
Markdown documents that capture business definitions, rules, and gotchas.
Agents search them for context when answering questions about your data.
## Command signature
```bash
ktx wiki [options] [query...]
ktx wiki [options] [query...] # list (bare) or search (with query)
ktx wiki read <key>
```
- Bare `ktx wiki` lists local wiki pages.
- `ktx wiki <query...>` searches local wiki pages (multi-word queries are
joined with a space).
- `ktx wiki <query...>` searches local wiki pages. Multi-word queries are
joined with a space.
- `ktx wiki read <key>` prints the whole Markdown file for one wiki page,
including YAML frontmatter.
Edit the Markdown files under `wiki/` directly, or ingest source content with
`ktx ingest`, when you need to add or update wiki knowledge.
@ -50,6 +53,9 @@ ktx wiki "monthly recurring revenue"
# Search wiki pages as JSON
ktx wiki "monthly recurring revenue" --json --limit 10
# Print the exact Markdown file for a known page key
ktx wiki read revenue-definitions
# Print search results as TSV
ktx wiki "monthly recurring revenue" --output plain
@ -62,8 +68,10 @@ ktx --debug wiki "monthly recurring revenue" --json
Wiki commands print clack-style pretty output in a TTY and TSV-style plain
output when requested. JSON output wraps the items with a command metadata
envelope. Search results include `matchReasons` and `lanes` metadata so you can
see whether lexical, token, or semantic search contributed to the ranking. Open
the matching Markdown files directly when you need the full page contents.
see whether lexical, token, or semantic search contributed to the ranking. Use
`ktx wiki read <key>` when you need the full page contents. Read output is the
exact Markdown file stored on disk, including YAML frontmatter, and is not
wrapped in pretty, plain, or JSON formatting.
Pretty search output shows `#1`, `#2`, and later rank badges for the displayed
results. Plain and JSON output keep the raw `score` value, which is a ranking
score rather than a percentage.
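Because `ktx wiki read <key>` prints the raw Markdown file, a script that consumes it has to separate the YAML frontmatter from the body itself. The sketch below shows one way to do that in Python; the helper name is hypothetical, and it assumes the frontmatter is delimited by `---` lines at the top of the file.

```python
def split_frontmatter(markdown_text):
    """Split a Markdown document into (frontmatter, body).

    Returns an empty frontmatter string when the text does not start
    with a '---' line or the closing '---' line is missing.
    """
    lines = markdown_text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", markdown_text
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            return "".join(lines[1:index]), "".join(lines[index + 1:])
    return "", markdown_text
```

The text passed in could be the captured stdout of a `ktx wiki read` call; the frontmatter string can then go to whichever YAML parser the script already uses.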
@ -121,4 +129,4 @@ stays machine-readable:
| Error | Cause | Recovery |
|-------|-------|----------|
| Search returns no results | The query terms do not match summaries, tags, or content, and the semantic lane is unavailable or has no positive matches | Run with `--debug`, check the semantic lane status, retry with business synonyms, then create a page if the knowledge is missing |
| A page is missing | No Markdown file exists for that business context | Add a file under `wiki/` or run `ktx ingest <connectionId>` |
| A page is missing | No Markdown file exists for that business context, or `ktx wiki read <key>` used the wrong key | Run `ktx wiki <query>` to find the page key, then retry `ktx wiki read <key>` |


@ -36,9 +36,11 @@ ktx
wiki
list
search <query>
read <key>
sl
list
search <query>
read <sourceName>
validate <sourceName>
query
sql
@ -57,6 +59,7 @@ ktx
stop
status
reindex
completion <shell>
```
The public context-build entrypoint is `ktx ingest [connectionId]` or
@ -71,6 +74,44 @@ The public context-build entrypoint is `ktx ingest [connectionId]` or
| `-v`, `--version` | Show the CLI package name and version. |
| `-h`, `--help` | Show help for the current command. |
## Update notices
> **Note:** The update notifier writes only to stderr and keeps command stdout
> unchanged.
When a newer package is available on your installed release channel, `ktx`
prints a short notice after the command finishes:
```text
↑ Update available: ktx 0.9.0 → 0.10.0
npm i -g @kaelio/ktx
```
Stable installs compare against the npm `latest` dist-tag.
Release-candidate installs compare against the `next` dist-tag and show:
```text
npm i -g @kaelio/ktx@next
```
The check is skipped for JSON output, CI, non-TTY stdout, and hidden completion
commands. To opt out explicitly, set any of these environment variables:
```bash
KTX_NO_UPDATE_CHECK=1
NO_UPDATE_NOTIFIER=1
DO_NOT_TRACK=1
```
The `ktx` CLI prints one npm command because globally installed binaries don't
expose a reliable runtime package-manager signal. If you prefer another global
package manager, use the equivalent command:
```bash
pnpm add -g @kaelio/ktx
yarn global add @kaelio/ktx
```
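The skip conditions and channel rule in this section can be summarized in a short sketch. It is an illustration in Python, not the CLI's implementation: the function names are hypothetical, and it assumes that any non-empty environment value counts as set and that release candidates carry an `-rc` prerelease suffix.

```python
OPT_OUT_VARS = ("KTX_NO_UPDATE_CHECK", "NO_UPDATE_NOTIFIER", "DO_NOT_TRACK")


def should_check_for_update(env, json_output, stdout_is_tty):
    """Return True only when none of the documented skip conditions apply."""
    if json_output or not stdout_is_tty:
        return False
    if env.get("CI"):  # assumption: any non-empty CI value means CI
        return False
    # assumption: any non-empty value of an opt-out variable disables the check
    return not any(env.get(name) for name in OPT_OUT_VARS)


def dist_tag_for(installed_version):
    # assumption: release candidates are recognized by an "-rc" suffix
    return "next" if "-rc" in installed_version else "latest"


def install_hint(installed_version):
    """Build the npm command shown in the notice for the installed channel."""
    tag = dist_tag_for(installed_version)
    suffix = "" if tag == "latest" else "@" + tag
    return "npm i -g @kaelio/ktx" + suffix
```

Keeping the skip decision separate from the channel lookup mirrors the text: suppression depends on the environment and output mode, while the suggested command depends only on the installed version.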
## Project resolution
Most commands are project-aware. Pass `--project-dir <path>` when scripting or
@ -97,6 +138,10 @@ ktx ingest
ktx sl "revenue"
ktx wiki "revenue recognition"
# Print a known wiki page or semantic source
ktx wiki read revenue-definitions
ktx sl --connection-id warehouse read orders
# Execute read-only SQL
ktx sql --connection warehouse "select count(*) from public.orders"


@ -11,6 +11,7 @@
"ktx-wiki",
"ktx-status",
"ktx-mcp",
"ktx-admin"
"ktx-admin",
"ktx-completion"
]
}


@ -1,12 +1,15 @@
---
title: Telemetry
description: Understand what anonymous usage telemetry ktx collects and how to opt out.
description: Understand what usage telemetry ktx collects and how to opt out.
---
**ktx** collects anonymous, aggregated usage telemetry from interactive CLI
runs so maintainers can see which commands work, where setup fails, and which
parts of the data-agent workflow need improvement. Telemetry is opt-out and
disabled automatically in CI and non-interactive runs.
**ktx** collects aggregated usage telemetry so maintainers can see
which commands work, where setup fails, and which parts of the data-agent
workflow need improvement. Telemetry is opt-out: it turns on the first time you
run **ktx** in any way — an interactive command, a script, or an
agent-launched MCP server — and prints a one-time notice (to the terminal when
there is one, otherwise to standard error). It stays disabled in CI and whenever
an opt-out is set.
## Opt out
@ -17,23 +20,58 @@ Use any of these mechanisms to disable telemetry:
| `export KTX_TELEMETRY_DISABLED=1` | Disables telemetry for the shell and child processes |
| `export DO_NOT_TRACK=1` | Standard do-not-track environment variable |
| `CI=1` | Automatic in CI |
| Non-TTY output | Automatic for pipes and scripts |
| Edit `~/.ktx/telemetry.json` and set `"enabled": false` | Persistent for the machine |
| Edit `~/.ktx/telemetry.json` and set `"enabled": false` | Persistent for the machine, including the MCP server |
## What we collect
High-level signals only: which commands run, how long they take, whether they
High-level signals: which commands run, how long they take, whether they
succeed or fail, and basic environment metadata (CLI version, Node version, OS
platform). For project-level analysis, **ktx** sends a salted hash of the
project directory — never the raw path.
platform). When an operation fails, we also include diagnostic detail about the
error so we can debug it. For project-level analysis, **ktx** sends a salted
hash of the project directory to group events.
When an agent reaches **ktx** through MCP, we also record the connecting client
tool's self-reported name and version (for example Claude Desktop, Cursor, or
Cline) so we can see which agents people use **ktx** with. That describes the
tool, never you or your data.
## What we never collect
- File paths, hostnames, environment variable values, or command arguments
- `ktx.yaml` contents, connection passwords, API keys, or tokens
- Schema names, table names, column names, SQL text, or query results
- Error messages or stack traces
- Git remote URLs, Git user email, OS user, or hostname
We build telemetry around counts and coarse signals, not the contents of your
data or configuration. We don't deliberately collect your `ktx.yaml`, query
results, passwords, API keys, or access tokens.
The one place environment-specific text can appear is failure diagnostics: when
an operation errors, the detail we record is the error as your tools reported
it, which can include identifiers from your setup. If you'd rather send nothing
at all, turn telemetry off using any of the options above.
## Error reports
When telemetry is enabled, **ktx** sends PostHog Error Tracking `$exception`
events for CLI and daemon exceptions. Error reports help group crashes and
handled failures into PostHog issues.
Error reports can include:
- Stack frames, including function names, local file paths, line numbers, and
SDK-provided source context.
- Error class names and raw error messages.
- Cause chains when the runtime exposes them.
- `source`, `handled`, and `fatal` diagnostic fields.
- Runtime version, OS, architecture, and CI fields.
- The hashed `projectId` when **ktx** knows the project.
Error reports never intentionally include:
- Secrets, credentials, API keys, tokens, cookies, signed URLs, or auth headers.
- Database URLs, connection strings, DSNs, raw argv, or raw environment values.
- SQL text, schema names, table names, or column names as explicit payload
properties.
- Customer row data.
- User prompt text or raw MCP arguments.
The same opt-out controls listed above disable error reports.
## Storage and retention


@ -8,7 +8,7 @@ import { SemanticLayerFlow } from "@/components/semantic-layer-flow";
**ktx**'s semantic layer is a compiler that turns intent into SQL. The agent
declares _what_ it wants - measures, dimensions, filters - in a small
semantic query. **ktx** figures out the _how_: which tables to join, what
grain to aggregate at, how to keep fan-out from inflating measures, and
grain to aggregate at, how to keep fanout from inflating measures, and
what dialect the warehouse speaks.
This page covers four mechanics:
@ -16,7 +16,7 @@ This page covers four mechanics:
- The semantic query contract agents send to the compiler.
- The planner steps that turn a semantic query into SQL.
- The join graph that backs those steps, and how it's built.
- The fan-out failure mode the compiler is designed to prevent.
- The fanout failure mode the compiler is designed to prevent.
## Imperative SQL vs declarative semantic querying
@ -84,14 +84,14 @@ same ordered steps before any SQL is emitted.
2. **Pick an anchor and build the join tree.** Choose the largest measure
source as the root, then run a shortest-path search across the typed
join graph to reach every required source.
3. **Detect fan-out.** Group measures by their owning source. If more
3. **Detect fanout.** Group measures by their owning source. If more
than one group exists, the planner marks the query as a chasm trap
and switches to aggregate-locality compilation.
4. **Classify filters.** Split predicates into row-level (`WHERE`) and
aggregate-level (`HAVING`) based on whether they reference a measure.
5. **Generate SQL.** Emit Postgres-shaped SQL in the form the query needs:
single-source aggregation when the query is safe, per-source CTEs
when fan-out is present.
when fanout is present.
6. **Transpile to the target dialect.** Run the result through `sqlglot`
so the warehouse receives syntax it understands.
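Steps 3 and 4 above are simple enough to sketch. The Python below is an illustration under stated assumptions, not the planner's code: it assumes measures are written as `source.measure`, represents filters as hypothetical `(field, operator, value)` tuples, and uses made-up function names.

```python
from collections import defaultdict


def group_measures_by_source(measures):
    """Group qualified measure names ("source.measure") by owning source."""
    groups = defaultdict(list)
    for qualified in measures:
        source, _, name = qualified.partition(".")
        groups[source].append(name)
    return dict(groups)


def is_chasm_trap(measures):
    # More than one owning source means each source needs its own aggregation.
    return len(group_measures_by_source(measures)) > 1


def classify_filters(filters, measures):
    """Split filters into row-level (WHERE) and aggregate-level (HAVING)."""
    measure_names = set(measures)
    where, having = [], []
    for field, op, value in filters:
        target = having if field in measure_names else where
        target.append((field, op, value))
    return where, having
```

A query asking for `orders.total_revenue` and `refunds.total_refunded` would be flagged as a chasm trap here, while a filter on `orders.status` lands in `where` and a filter on `orders.total_revenue` lands in `having`.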
@ -107,7 +107,7 @@ inverted, so the planner can traverse from any anchor.
| Relationship | Planning impact |
|--------------|-----------------|
| `many_to_one` | Safe direction for adding dimensions |
| `one_to_many` | Multiplies measures and triggers fan-out handling |
| `one_to_many` | Multiplies measures and triggers fanout handling |
| `one_to_one` | Safe in either direction when keys match |
| Equal-cost paths | Treated as ambiguous; aliases or explicit joins resolve them |
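The "equal-cost paths" row can be illustrated with a small breadth-first search that also counts how many shortest paths reach the target. This is a sketch in Python, assuming every join edge costs the same and the graph is stored as a plain adjacency dict; the real planner may weight edges differently, and the function name is hypothetical.

```python
from collections import deque


def shortest_join_path(graph, anchor, target):
    """Return (path, ambiguous) for the shortest route from anchor to target.

    graph: dict mapping a source name to the sources it joins to.
    ambiguous is True when more than one shortest path exists.
    """
    dist = {anchor: 0}
    count = {anchor: 1}   # number of shortest paths reaching each node
    parent = {anchor: None}
    queue = deque([anchor])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                count[neighbor] = count[node]
                parent[neighbor] = node
                queue.append(neighbor)
            elif dist[neighbor] == dist[node] + 1:
                # Another route of the same length: accumulate its paths.
                count[neighbor] += count[node]
    if target not in dist:
        return None, False
    path, node = [], target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1], count[target] > 1
```

With `orders` joined to both `customers` and `stores`, and both of those joined to `regions`, the route from `orders` to `regions` comes back flagged as ambiguous, which is the case where an alias or explicit join is needed.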
@ -286,9 +286,9 @@ inference. Each input contributes a different kind of authority.
</div>
</div>
## Fan-out and aggregate locality
## Fanout and aggregate locality
Fan-out is the classic analytics failure mode. Two fact tables join to a
Fanout is the classic analytics failure mode. Two fact tables join to a
shared dimension. A naive query joins them all together first, so each
row from one fact is multiplied by the matching rows from the other.
Measures duplicate, numbers go wrong, and the agent doesn't notice.
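The effect is easy to reproduce. The following sketch uses Python's built-in `sqlite3` module with made-up `customers`, `orders`, and `refunds` tables; it shows the general pattern of aggregating each fact on its own before joining, and is not SQL emitted by **ktx**.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE orders  (customer_id INTEGER, amount INTEGER);
    CREATE TABLE refunds (customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1);
    INSERT INTO orders  VALUES (1, 100), (1, 50);
    INSERT INTO refunds VALUES (1, 10), (1, 5), (1, 1);
""")

# Naive: join both facts first, then aggregate. 2 orders x 3 refunds = 6 rows.
naive = conn.execute("""
    SELECT SUM(o.amount), SUM(r.amount)
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    JOIN refunds r ON r.customer_id = c.id
""").fetchone()

# Aggregate-local: sum each fact at the shared grain, then join the results.
local = conn.execute("""
    WITH o AS (SELECT customer_id, SUM(amount) AS revenue
               FROM orders GROUP BY customer_id),
         r AS (SELECT customer_id, SUM(amount) AS refunded
               FROM refunds GROUP BY customer_id)
    SELECT SUM(o.revenue), SUM(r.refunded)
    FROM customers c
    LEFT JOIN o ON o.customer_id = c.id
    LEFT JOIN r ON r.customer_id = c.id
""").fetchone()

print(naive)  # (450, 32): revenue tripled, refunds doubled
print(local)  # (150, 16): the true totals
```

Each order row is repeated once per refund and each refund once per order, so both sums inflate without any error being raised.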
@ -336,5 +336,5 @@ different from what the agent first proposed.
| Explain the semantic query shape | The semantic query contract | [ktx sl](/docs/cli-reference/ktx-sl) |
| Describe what the planner does between query and SQL | What the planner does | [ktx sl](/docs/cli-reference/ktx-sl) |
| Explain why **ktx** asks for grain and relationship types | The join graph | [Writing context](/docs/guides/writing-context) |
| Diagnose duplicated measures after a join | Fan-out and aggregate locality | [ktx sl](/docs/cli-reference/ktx-sl) |
| Diagnose duplicated measures after a join | Fanout and aggregate locality | [ktx sl](/docs/cli-reference/ktx-sl) |
| Describe how semantic context stays current | Building and maintaining the graph | [Reviewing Context](/docs/guides/reviewing-context) |


@ -156,7 +156,7 @@ joins:
relationship: many_to_one
```
For how the compiler walks the join graph, handles fan-out, and transpiles
For how the compiler walks the join graph, handles fanout, and transpiles
dialects, read [Semantic querying](/docs/concepts/semantic-layer-internals).
## Wiki pages
@ -240,7 +240,7 @@ models every time the warehouse changes.
| **Surface** | Indexed docs and chats | Modeling language or runtime | YAML and Markdown files |
| **Data-stack awareness** | None - treats data tools as text | High for declared metrics, none for the surrounding warehouse | Built in: scans schemas, dbt, BI tools, and query history |
| **Maintenance** | Manual page authoring | Manual modeling, model-per-change | Auto-maintained: reconciles evidence with accepted files |
| **SQL safety** | None - generates plausible text | Compiled, dialect-correct | Compiled with join-graph and fan-out handling |
| **SQL safety** | None - generates plausible text | Compiled, dialect-correct | Compiled with join-graph and fanout handling |
| **Agent edit loop** | Text-only | Tied to the modeling workflow | First-class: patch files, validate, review diffs |
If you already use MetricFlow, LookML, dbt, or BI tools, **ktx** can ingest that


@ -66,8 +66,9 @@ read, how to think, and where to put the results.
## Minimal config
A working `ktx.yaml` needs one entry in `connections`. Everything else accepts
defaults. The example below is enough for `ktx ingest warehouse` to run a fast
schema scan against a local Postgres.
defaults. The example below registers a local Postgres connection; building
context with `ktx ingest warehouse` also needs a model and embeddings, which
`ktx setup` configures.
```yaml
connections:
@ -105,7 +106,7 @@ context-source drivers share the map.
| Driver | Kind | Required fields | Common optional fields |
|--------|------|-----------------|------------------------|
| `postgres` / `postgresql` | Warehouse | `driver` | `url`, `enabled_tables`, `historicSql`, `context.queryHistory` |
| `postgres` | Warehouse | `driver` | `url`, `enabled_tables`, `historicSql`, `context.queryHistory` |
| `mysql` | Warehouse | `driver` | `url`, `enabled_tables` |
| `sqlite` | Warehouse | `driver` | `url` or `path`, `enabled_tables` |
| `sqlserver` | Warehouse | `driver` | `url`, `enabled_tables` |
@ -123,7 +124,7 @@ context-source drivers share the map.
Warehouse connections are open objects: the listed fields are validated, and
any other field is preserved and passed through to the connector. Use
`enabled_tables` to scope deep ingest to a specific list of
`enabled_tables` to scope ingest to a specific list of
`schema.table` names - useful for smoke tests.
```yaml
@ -157,11 +158,14 @@ connections:
dataset_ids: [analytics, mart]
```
For Snowflake connections, set `maxSessions` when deep ingest needs more or
fewer concurrent warehouse sessions. The default is `4`. This caps all
concurrent Snowflake SQL work for that connector instance, including schema
introspection, table sampling, relationship profiling, relationship
validation, and read-only SQL execution.
For Postgres, MySQL, SQL Server, and Snowflake connections, set
`maxConnections` when scan or ingest work needs to stay below the target's
connection cap. Postgres, MySQL, and SQL Server default to `10`; Snowflake
defaults to `4`. This caps all concurrent SQL work for that connector instance,
including schema introspection, table sampling, relationship profiling,
relationship validation, and read-only SQL execution. BigQuery and ClickHouse
do not expose `maxConnections` because their connectors don't use client-side
connection pools.
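To make the interaction between a per-connection cap and a per-stage concurrency setting concrete, here is a minimal sketch in Python. It assumes the effective limit is simply the smaller of `scan.relationships.profileConcurrency` and `maxConnections`; the function and its bookkeeping are illustrative and are not the connector's implementation.

```python
import asyncio


async def run_with_cap(tasks, profile_concurrency=4, max_connections=10):
    """Run async callables while never exceeding the effective cap.

    Returns (results, peak), where peak is the highest number of
    callables that were in flight at the same time.
    """
    effective = min(profile_concurrency, max_connections)
    gate = asyncio.Semaphore(effective)
    in_flight = 0
    peak = 0

    async def guarded(task):
        nonlocal in_flight, peak
        async with gate:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await task()
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(task) for task in tasks))
    return results, peak
```

With `profile_concurrency=4` and `max_connections=2`, eight queued tasks still run at most two at a time, which is the behavior the connection cap is meant to guarantee.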
For Postgres, BigQuery, and Snowflake, `historicSql` and `context.queryHistory`
toggle query-history ingest. The shape is connector-specific; the setup wizard
@ -175,9 +179,22 @@ connections:
context:
queryHistory:
enabled: true
enabledSchemas:
- orbit_raw
- orbit_analytics
minExecutions: 5
```
- `enabledSchemas`: Optional list of schema or dataset names that query-history
ingest may mine. Omit it to let **ktx** derive the modeled schema floor from
the connection and semantic-layer sources. Use `["*"]` to disable the floor
for discovery runs.
- `filters.serviceAccounts`: Optional service-account filter block. During
setup, when query history is enabled and no service-account block already
exists, **ktx** can propose exact role patterns such as `^svc_loader$` from
observed in-scope query history. The block uses `mode: exclude` and remains
hand-editable.
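The anchored pattern `^svc_loader$` matters because it matches one role exactly rather than every role that contains the text. A small Python sketch of an exclude-mode filter shows the difference; the row shape (dicts with a `role` key) and the function name are hypothetical and only stand in for whatever **ktx** does internally.

```python
import re


def exclude_service_accounts(queries, patterns):
    """Drop query-history rows whose role matches any exclude pattern.

    queries: list of dicts with a "role" key (hypothetical row shape).
    patterns: regular expressions, for example "^svc_loader$".
    """
    compiled = [re.compile(pattern) for pattern in patterns]
    return [
        query for query in queries
        if not any(regex.search(query["role"]) for regex in compiled)
    ]
```

Given roles `svc_loader`, `svc_loader_v2`, and `analyst`, the exact pattern removes only `svc_loader`; dropping the anchors would also remove `svc_loader_v2`.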
### Metabase
```yaml
@ -372,13 +389,23 @@ llm:
| Field | Type | Default | Purpose |
|-------|------|---------|---------|
| `provider.backend` | `none` \| `anthropic` \| `vertex` \| `gateway` \| `claude-code` | `none` | Selected backend. `none` disables LLM features. `claude-code` uses the local Claude Code session and needs no API key. |
| `provider.backend` | `none` \| `anthropic` \| `vertex` \| `gateway` \| `claude-code` \| `codex` | `none` | Selected backend. `none` disables LLM features. `claude-code` uses the local Claude Code session and needs no API key. `codex` uses local Codex authentication and needs no API key. |
| `provider.anthropic.api_key` | `string` | - | Anthropic API key. Required when `backend: anthropic`. Accepts `env:` or `file:` references. |
| `provider.anthropic.base_url` | `string` | - | Override the Anthropic API base URL (proxy, self-hosted gateway). |
| `provider.gateway.api_key` / `base_url` | `string` | - | Credentials for an AI Gateway provider. Required when `backend: gateway`. |
| `provider.vertex.project` | `string` | - | Google Cloud project ID hosting the Vertex AI endpoint. |
| `provider.vertex.location` | `string` | - | Vertex AI region (for example `us-east5`). Required when the `vertex` block is present. |
Use `codex` when local Codex authentication should power **ktx** LLM work:
```yaml
llm:
provider:
backend: codex
models:
default: gpt-5.5
```
### Model roles
`models` overrides the per-role model. Keys are fixed; values are
@ -425,6 +452,16 @@ ingest:
stepBudget: 40
maxConcurrency: 2
failureMode: continue
rateLimit:
enabled: true
throttleThreshold: 0.8
minConcurrencyUnderPressure: 1
maxWaitMs: 600000
retry:
maxAttempts: 6
baseDelayMs: 1000
maxDelayMs: 60000
jitter: true
```
### Adapters
@ -471,6 +508,24 @@ handles failures.
| `workUnits.maxConcurrency` | `int > 0` | `1` | How many work units run in parallel. |
| `workUnits.failureMode` | `abort` \| `continue` | `continue` | `abort` stops the whole ingest run on the first failure; `continue` records it and keeps going. |
### Rate limits
`rateLimit` controls provider-neutral pacing for LLM calls during ingest. When a
provider reports a subscription window, retry-after delay, or HTTP 429,
**ktx** pauses new work-unit model calls, shows a transient wait in the CLI,
and reduces work-unit concurrency while the provider is under pressure.
| Field | Type | Default | Purpose |
|-------|------|---------|---------|
| `rateLimit.enabled` | `boolean` | `true` | Master switch for ingest LLM rate-limit pacing and visible waits. |
| `rateLimit.throttleThreshold` | `number between 0 and 1` | `0.8` | Fraction of a known provider window at which **ktx** starts reducing concurrency. |
| `rateLimit.minConcurrencyUnderPressure` | `int > 0` | `1` | Effective work-unit concurrency while a provider is under rate-limit pressure. |
| `rateLimit.maxWaitMs` | `int > 0` | unset | Caps how long a single provider-reset wait can last. This bounds each wait, not the whole run: after a capped wait elapses, **ktx** retries and may pause again. Omit to wait until the provider's reset time. |
| `rateLimit.retry.maxAttempts` | `int > 0` | `6` | Maximum attempts for a single rate-limited LLM call before the failure surfaces (counts the first try). Also bounds how far opaque backoff grows for responses without a reset time or retry-after value. |
| `rateLimit.retry.baseDelayMs` | `int > 0` | `1000` | Initial opaque retry delay in milliseconds. |
| `rateLimit.retry.maxDelayMs` | `int > 0` | `60000` | Maximum opaque retry delay in milliseconds. |
| `rateLimit.retry.jitter` | `boolean` | `true` | Add jitter to opaque retry delays. |
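The `retry` fields describe a capped backoff schedule. The sketch below shows how such a schedule could be computed in Python. It assumes delays double on each retry and that jitter picks a random value between zero and the capped delay; the table does not specify either detail, so treat both as assumptions, along with the function name.

```python
import random


def opaque_retry_delays(max_attempts=6, base_delay_ms=1000, max_delay_ms=60000,
                        jitter=True, rng=random.random):
    """Yield the wait in milliseconds before each retry of one call.

    Assumption: the delay doubles per retry and is capped at max_delay_ms.
    Assumption: jitter scales the capped delay by a random factor in [0, 1).
    """
    # max_attempts counts the first try, so there are max_attempts - 1 retries.
    for retry_index in range(max_attempts - 1):
        capped = min(max_delay_ms, base_delay_ms * (2 ** retry_index))
        yield int(capped * rng()) if jitter else capped
```

With the defaults and jitter turned off, the schedule is 1000, 2000, 4000, 8000, and 16000 ms, so the 60000 ms cap only comes into play with a larger `maxAttempts` or `baseDelayMs`.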
## `scan`
`scan` configures how schema-level inputs become structured context:
@ -517,7 +572,7 @@ the manifest.
| `relationships.maxLlmTablesPerBatch` | `int > 0` | `40` | Max tables included in a single LLM relationship-proposal batch. |
| `relationships.maxCandidatesPerColumn` | `int > 0` | `25` | Max join partners considered per column. |
| `relationships.profileSampleRows` | `int > 0` | `10000` | Rows sampled per table when profiling values for relationship inference. |
| `relationships.profileConcurrency` | `int > 0` | `4` | Parallel relationship-profile queries against the database. For Snowflake, effective database concurrency is also bounded by the connection's `maxSessions`. |
| `relationships.profileConcurrency` | `int > 0` | `4` | Parallel relationship-profile queries against the database. For pooled connectors, effective database concurrency is also bounded by the connection's `maxConnections`. |
| `relationships.validationConcurrency` | `int > 0` | `4` | Parallel relationship validation queries against the database. |
| `relationships.validationBudget` | `all` \| `int ≥ 0` | runtime default | Cap on validation queries per scan. `all` means unlimited. |


@ -4,6 +4,7 @@ description: ktx is an open-source, self-improving context layer for data agents
---
import { ProductMechanics } from "@/components/product-mechanics";
import { ProductRuntime } from "@/components/product-runtime";
<div className="not-prose mb-10">
<div>
@ -59,6 +60,8 @@ serves that context to agents at runtime.
<ProductMechanics />
<ProductRuntime />
## Use it for
Use **ktx** when agents need more than raw database access. Agents can search wiki


@ -30,17 +30,18 @@ If you're a coding assistant choosing a docs route, start with the
</div>
<div className="mt-2.5 text-base leading-relaxed text-fd-foreground">
Try **ktx** against a real data stack - Postgres, dbt, Metabase, and Notion
pre-loaded with the Orbit demo corpus. The page lists demo credentials
you can paste straight into `ktx setup`.
pre-loaded with the Orbit demo corpus. Hit **copy agent setup** on the page
for a one-shot prompt that has an agent install the full four-source demo,
or grab the raw credentials to paste into `ktx setup` yourself.
</div>
<a
href="https://kaelio.com/start"
href="https://www.kaelio.com/start"
className="group mt-5 inline-flex items-center gap-2 rounded-full px-4 py-2.5 text-sm font-semibold text-white no-underline shadow-[inset_0_1px_0_rgba(255,255,255,0.35),0_2px_4px_rgba(255,138,77,0.2),0_10px_24px_-8px_rgba(255,138,77,0.55)] transition-all duration-200 hover:-translate-y-0.5 hover:shadow-[inset_0_1px_0_rgba(255,255,255,0.4),0_3px_6px_rgba(255,138,77,0.28),0_16px_30px_-8px_rgba(255,138,77,0.65)]"
style={{
background: 'linear-gradient(180deg, #ff9d63 0%, #f97316 100%)',
}}
>
Get demo credentials at kaelio.com/start
Get demo credentials at www.kaelio.com/start
<svg
width="14"
height="14"
@ -98,21 +99,70 @@ If you're a coding assistant choosing a docs route, start with the
background: 'color-mix(in oklch, var(--color-fd-primary) 8%, transparent)',
}}
>
<div className="text-sm font-semibold text-fd-foreground">
Run setup from an agent
</div>
<div className="mt-2 text-sm leading-6 text-fd-muted-foreground">
You can ask an agent such as Claude Code, Codex, Cursor, or OpenCode to
install and configure **ktx** for you. The{' '}
<a href="/ktx/docs/agents-setup.md" className="font-medium underline">
agent setup Markdown prompt
</a>{' '}
tells the agent how to check prerequisites, ask only for credentials or
connection choices, run <code>ktx setup</code>, verify connections, and
report the result.
</div>
<div className="mt-3 text-sm leading-6 text-fd-muted-foreground">
Use a prompt like this from the project you want to configure:
<div className="flex flex-wrap items-center gap-x-3 gap-y-2">
<div className="text-sm font-semibold text-fd-foreground">
Or, ask an AI agent to install and configure **ktx** for you.
</div>
<div className="group relative ml-auto inline-flex">
<button
type="button"
aria-describedby="agent-setup-overlay"
className="inline-flex cursor-help items-center gap-1.5 rounded-full border border-fd-border bg-fd-background/70 px-2.5 py-1 text-xs font-medium text-fd-muted-foreground transition-colors hover:border-fd-primary/40 hover:text-fd-foreground focus:outline-none focus-visible:border-fd-primary/40 focus-visible:text-fd-foreground"
>
<svg
width="12"
height="12"
viewBox="0 0 24 24"
fill="none"
stroke="currentColor"
strokeWidth="2.4"
strokeLinecap="round"
strokeLinejoin="round"
aria-hidden="true"
>
<circle cx="12" cy="12" r="10" />
<path d="M9.09 9a3 3 0 0 1 5.83 1c0 2-3 3-3 3" />
<line x1="12" y1="17" x2="12.01" y2="17" />
</svg>
What does it do?
</button>
<div
id="agent-setup-overlay"
role="tooltip"
className="invisible absolute right-0 top-full z-20 translate-y-0.5 pt-2 opacity-0 transition-all duration-150 group-hover:visible group-hover:translate-y-0 group-hover:opacity-100 group-focus-within:visible group-focus-within:translate-y-0 group-focus-within:opacity-100"
>
<div className="w-[min(24rem,calc(100vw-2rem))] rounded-lg border border-fd-border bg-fd-popover p-3 text-sm leading-6 text-fd-popover-foreground shadow-xl">
<div className="text-xs font-semibold uppercase tracking-wide text-fd-muted-foreground">
The agent will
</div>
<ol className="mt-2 space-y-1.5 pl-0">
{[
<>Check prerequisites on your machine</>,
<>Ask only for credentials and connection choices</>,
<>Run <code className="whitespace-nowrap">ktx setup</code> in your project</>,
<>Verify each connection it configured</>,
<>Report what was installed and what is ready</>,
].map((item, index) => (
<li key={index} className="flex gap-2.5">
<span
className="mt-0.5 inline-flex h-5 w-5 shrink-0 items-center justify-center rounded-full text-[11px] font-bold tabular-nums"
style={{
background: 'color-mix(in oklch, var(--color-fd-primary) 18%, transparent)',
color: 'var(--color-fd-primary)',
}}
>
{index + 1}
</span>
<span className="leading-6">{item}</span>
</li>
))}
</ol>
<div className="mt-3 border-t border-fd-border pt-2 text-xs text-fd-muted-foreground">
Works with any AI coding agent.
</div>
</div>
</div>
</div>
</div>
<div className="mt-3 max-w-full overflow-hidden rounded-md border bg-fd-background">
<div className="flex items-center justify-between gap-2 border-b px-3 py-2">
@ -120,16 +170,15 @@ If you're a coding assistant choosing a docs route, start with the
Prompt
</span>
<CopyButton
text={`Follow instructions from
https://docs.kaelio.com/ktx/docs/agents-setup.md
to install and configure ktx`}
text={[
'Run npx skills add Kaelio/ktx --skill ktx and use the ktx skill',
'to install and configure ktx',
].join(' ')}
className="-my-1"
/>
</div>
<div className="p-3 font-mono text-sm leading-6 text-fd-foreground">
<div>Follow instructions from</div>
<div className="break-all">https://docs.kaelio.com/ktx/docs/agents-setup.md</div>
<div>to install and configure ktx</div>
<div className="p-3 font-mono text-[13.5px] leading-6 text-fd-foreground">
Run {'`npx skills add Kaelio/ktx --skill ktx`'} and use the ktx skill to install and configure ktx
</div>
</div>
</div>
@ -166,8 +215,8 @@ The wizard walks you through everything **ktx** needs in one pass:
SQLite, PostgreSQL, MySQL, SQL Server, BigQuery, and Snowflake.
5. **Context sources** - optionally adds dbt, MetricFlow, LookML, Looker,
Metabase, or Notion. You can skip and add them later.
6. **Build** - runs the first ingest so semantic sources and wiki pages
are ready for agents.
6. **Build** - offers to run the first ingest so semantic sources and wiki
pages are ready for agents. If you skip it, build later with `ktx ingest`.
7. **Agent integration** - installs project-local rules for Claude Code,
Codex, Cursor, OpenCode, or universal `.agents`.
@ -187,7 +236,7 @@ Testing warehouse
Connection test passed
Building schema context for warehouse
Running fast database ingest
Running database scan
```
If setup exits early, rerun `ktx setup` in the same directory. **ktx** keeps
@ -198,6 +247,18 @@ progress under `.ktx/setup/` and resumes from the remaining work.
> resuming setup, connecting an agent, checking status, or exploring a
> pre-built demo project.
When the wizard finishes, it states where you stand and the single next action:
- **Context built** - **ktx** confirms it is ready for agents and points you to
open your coding agent and ask a data question.
- **Build skipped** - **ktx** tells you setup is complete and that the only step
left is to build context with `ktx ingest`.
Re-running `ktx setup` on an already-configured project goes straight to the
remaining step - building context or connecting an agent - instead of
re-asking every question. Once everything is ready, it confirms you are set
rather than reopening the configuration menu.
## Verify
When setup finishes, check readiness:
@ -219,18 +280,41 @@ Agent integration ready: yes (codex:project)
For a structured check inside scripts, use `ktx status --json`.
When setup builds deep context, its final context check looks like:
If you skipped the build, `ktx context built` shows `no`. Build it with
`ktx ingest` - there is no need to re-run `ktx setup`.
When setup finishes building context, its final context check looks like:
```text
ktx context is ready for agents.
Databases:
warehouse: deep context complete
warehouse: database context complete
Context sources:
dbt_main: memory update complete
```
Before the build starts, **ktx** runs a live test for every connection the
build depends on. A context build can take several minutes, so if any required
connection is unreachable or misconfigured the build is blocked up front and
**ktx** names the failing connection by id and connector type:
```text
KTX cannot build context: a required connection failed its live test.
Failed connections:
warehouse (postgres)
Each connection must be reachable before KTX builds context.
Run `ktx connection test <id>` to see the error, fix the connection, then retry.
```
Run `ktx connection test <connection-id>` to see the underlying error, fix the
connection, then continue. In interactive setup you can retry without
restarting; with `--no-input` the build exits non-zero and names the failing
connection so scripts can stop early.
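
The up-front gate described here can be pictured as a pure function over per-connection live-test results. This is a minimal sketch, not how **ktx** implements it: `describeBlockedBuild` and the `{ id, type, ok }` result shape are hypothetical names chosen for illustration, and only the message wording is taken from the example output.

```javascript
// Hypothetical helper (not part of ktx): given live-test results, return null
// when the build may proceed, or a blocking message when any connection failed.
function describeBlockedBuild(results) {
  const failed = results.filter((result) => !result.ok);
  if (failed.length === 0) {
    return null; // every required connection passed its live test
  }
  return [
    "KTX cannot build context: a required connection failed its live test.",
    "Failed connections:",
    // Name each failing connection by id and connector type.
    ...failed.map((result) => `${result.id} (${result.type})`),
  ].join("\n");
}

// Assumed sample input: one failing database and one healthy context source.
const sampleMessage = describeBlockedBuild([
  { id: "warehouse", type: "postgres", ok: false },
  { id: "dbt_main", type: "dbt", ok: true },
]);
console.log(sampleMessage);
```

Checking every connection before starting means a misconfigured connection is reported immediately instead of several minutes into the build.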
## Connect a coding agent
The setup wizard installs project-local agent rules in the last step. To
@ -277,7 +361,7 @@ ktx setup \
Then build context:
```bash
ktx ingest warehouse --fast
ktx ingest warehouse
```
See [ktx setup](/docs/cli-reference/ktx-setup) for the full automation flag
@ -290,7 +374,8 @@ surface.
| `ktx: command not found` | Reinstall `@kaelio/ktx` and open a new shell |
| Setup resumes the wrong project | Pass `--project-dir <path>` |
| LLM or embeddings health check fails | Rerun setup and pick a different credential, model, or backend |
| Database test fails | Verify the same connection with the database's native client, then rerun setup |
| Database test fails | Use the setup recovery menu to retry or re-enter details; if it still fails, verify the same connection with the database's native client |
| Context build blocked: a connection failed its live test | Run `ktx connection test <connection-id>` to see the error, fix the connection, then retry the build |
| Agent integration is incomplete | Run `ktx setup --agents --target <target>` |
## Next steps


@ -24,7 +24,9 @@ external metadata can attach to known warehouse tables.
## Database ingest
Database ingest records table, column, type, constraint, and row-count context.
Database ingest always builds enriched context: tables, columns, types,
constraints, and row counts, plus AI-generated descriptions, embeddings, and
relationship evidence.
```bash
# Build one configured database connection
@ -34,37 +36,37 @@ ktx ingest warehouse
ktx ingest --all
```
Depth controls how much context **ktx** builds:
Enriched ingest needs a configured model and embeddings. Run `ktx setup` first;
connections without that configuration fail before any work starts.
| Flag | Best for | What it does |
|------|----------|--------------|
| `--fast` | First setup, quick refreshes, CI smoke checks | Deterministic fast ingest with tables, columns, types, constraints, and row counts |
| `--deep` | Agent-ready context for real analysis | Fast ingest plus deep enrichment with descriptions, embeddings, relationship evidence, and optional query history |
Examples:
Local-auth backends keep provider credentials out of `ktx.yaml`:
```bash
ktx ingest warehouse --fast
ktx ingest warehouse --deep
ktx ingest --all --deep
ktx setup --llm-backend claude-code --no-input
ktx setup --llm-backend codex --llm-model gpt-5.5 --no-input
```
Deep ingest needs LLM and embedding readiness. Otherwise run `ktx setup` or use
`--fast`.
With `claude-code`, **ktx** agent loops can invoke only the **ktx** MCP tools for the
current run.
With `claude-code`, **ktx** agent loops can invoke only the **ktx** MCP tools
for the current run. With `codex`, **ktx** restricts the temporary runtime MCP
server to the current run's tool set, disables Codex web search, requests a
read-only sandbox, and sets `approval_policy=never`. The public Codex SDK and
CLI surface may still load user Codex config and built-in command execution or
read-only file capabilities, so use `claude-code` for stricter runtime tool
isolation.
## Query history
PostgreSQL, BigQuery, and Snowflake can add query-history context: common joins,
filters, service-account patterns, redaction rules, and high-usage templates.
filters, redaction rules, high-usage templates, and service-account exclusions.
When query history is enabled during setup, **ktx** reviews observed in-scope
roles and can write exact `filters.serviceAccounts` patterns for operational
traffic such as loader or refresh roles.
Enable it during setup, store it under `connections.<id>.context.queryHistory`,
or request it for one run:
```bash
ktx ingest warehouse --deep --query-history
ktx ingest warehouse --query-history
# Set the lookback window for BigQuery or Snowflake query history
ktx ingest warehouse --query-history-window-days 30
```
@ -74,8 +76,8 @@ for one run.
## Relationship evidence
**ktx** scores relationship candidates during supported deep database ingest. The
public CLI does not expose separate relationship review subcommands.
**ktx** scores relationship candidates during database ingest. The public CLI
does not expose separate relationship review subcommands.
## Context-source ingest
@ -159,7 +161,7 @@ After interactive setup:
```bash
ktx status
ktx ingest --all --deep
ktx ingest --all
ktx status
```
@ -176,8 +178,8 @@ ktx wiki "revenue" --json --limit 10
| Symptom | Likely cause | Recovery |
|---------|--------------|----------|
| Connection not configured | The connection id is missing from `ktx.yaml` | Add it with `ktx setup` |
| Deep readiness is missing | LLM or embeddings are not setup-ready | Run `ktx setup`, or rerun with `--fast` |
| Query history is unsupported | The selected database driver does not expose query history | Run fast ingest without query-history flags |
| Enrichment is not configured | LLM or embeddings are not setup-ready | Run `ktx setup` to configure a model and embeddings |
| Query history is unsupported | The selected database driver does not expose query history | Run ingest without query-history flags |
| No connections configured | The project has no entries under `connections` | Run `ktx setup` and add a database or context-source connection |
| Context-source flags have no effect | Depth and query-history flags were supplied for a context-source connector | Use those flags only for database connections |
| Context-source flags have no effect | Query-history flags were supplied for a context-source connector | Use query-history flags only for database connections |
| Text ingest stops early | `--fail-fast` stopped on the first failed item | Fix the item or rerun without `--fail-fast` |


@ -16,6 +16,7 @@ Set `llm.provider.backend` to one of these values:
- `gateway`: Use AI Gateway-compatible Anthropic model ids.
- `claude-code`: Use your local Claude Code session through the Claude Agent
SDK. **ktx** strips provider-routing environment variables from child processes.
- `codex`: Use your local Codex authentication through the Codex SDK.
## Claude Code
@ -47,6 +48,42 @@ model IDs are also accepted.
metadata may still list host slash commands, skills, and subagents; **ktx** does not
grant execution access to them.
## Codex backend
Use `codex` when you want **ktx** to run LLM-backed workflows through your
local Codex authentication instead of a direct provider API key.
```yaml
llm:
provider:
backend: codex
models:
default: gpt-5.5
```
Configure it non-interactively:
```bash
ktx setup --llm-backend codex --llm-model gpt-5.5 --no-input
```
This is separate from Codex agent-client setup. `ktx setup --agents --target
codex` installs instructions and MCP access for an end-user Codex session.
`ktx setup --llm-backend codex` makes **ktx** itself execute ingest, scan
enrichment, memory, and other LLM-backed work through Codex.
During runtime loops, **ktx** starts a temporary loopback MCP server for the
current run, exposes only the tools passed to that run, asks Codex to use a
read-only sandbox, sets `approval_policy=never`, auto-approves only those
run-scoped MCP tools, and disables Codex web search.
Codex backend isolation is currently limited by the public Codex SDK and CLI
surface. Codex may still load user Codex config and built-in command execution
or read-only file capabilities. Use `llm.provider.backend: claude-code` when
you need stricter Claude-Code-style runtime tool isolation, or remove host
Codex MCP and tool config before running untrusted prompts through the `codex`
backend.
## Prompt caching
`llm.promptCaching` has partial parity on `claude-code`. Status and doctor warn


@ -111,12 +111,13 @@ non-obvious terms.
Agents can refresh context when the user asks them to:
```bash
ktx ingest warehouse --fast
ktx ingest warehouse
ktx ingest
ktx ingest --file docs/revenue-notes.md --connection-id warehouse
```
Use `--deep` only when LLM and embedding setup is ready.
Database ingest builds enriched context and requires a configured model and
embeddings; run `ktx setup` first if they are not ready.
## Good agent behavior


@ -9,7 +9,9 @@ admin surface for setup, ingest, status, daemon lifecycle, and debugging.
Run `ktx setup` and select your agent client targets, or configure manually
using the snippets below. Choose **Ask data questions with ktx MCP** for agent
clients. Choose **Ask data questions + manage ktx with CLI commands** only when
a developer or operator agent also needs pinned `ktx` admin commands.
a developer or operator agent also needs pinned `ktx` admin commands. Choose
**Skip agent setup for now** to leave agent integration incomplete and run
`ktx setup --agents` later.
## Install with setup
@ -43,14 +45,19 @@ ktx setup --agents --target codex --global
manifest lets status checks report agent readiness and lets future cleanup
remove only files **ktx** installed.
The interactive command asks two questions:
The interactive command first asks what agents are allowed to do:
```txt
◆ What should agents be allowed to do with this ktx project?
│ ○ Ask data questions with ktx MCP
│ ○ Ask data questions + manage ktx with CLI commands
│ ○ Skip agent setup for now
```
If you choose an install mode, it then asks which targets to install:
```txt
◆ Which agent targets should ktx install?
│ ◻ Claude Code
│ ◻ Claude Desktop
@ -183,10 +190,8 @@ Claude Desktop skill packages for the **ktx** workflows:
- `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or
`%AppData%/Claude/claude_desktop_config.json` (Windows) gets an
`mcpServers.ktx` entry that runs the **ktx** MCP server over stdio via a local
launcher shim at `.ktx/agents/claude/ktx-plugin-runner.sh`. The shim locates
a usable Node.js (Volta, NVM, Homebrew, system) so Claude Desktop can spawn
the server without needing `node` in PATH.
`mcpServers.ktx` entry that runs the **ktx** MCP server over stdio with the
current Node.js executable and the installed `ktx` CLI entrypoint.
- `.ktx/agents/claude/ktx-analytics.zip` contains the `ktx-analytics` skill.
If you choose **Ask data questions + manage ktx with CLI commands**, **ktx** also
generates `.ktx/agents/claude/ktx.zip` with the admin `ktx` skill. Claude


@ -517,5 +517,5 @@ No authentication required - SQLite is file-based. The file must be readable by
| Connection URL appears in git diff | A literal credential URL was written to `ktx.yaml` | Replace it with `env:NAME` or `file:/path/to/secret` and rotate exposed credentials |
| Database ingest returns no tables | Schema, database, or project filter is wrong, or the user lacks metadata permissions | Verify the schema list and grant metadata read permissions |
| Query history is empty | Query history extension or warehouse history view is unavailable | Enable the warehouse-specific history feature, then rerun `ktx ingest <connectionId> --query-history` or `ktx setup` |
| Column statistics are missing | Connector cannot access stats tables or the warehouse does not expose them | Grant stats permissions where supported; otherwise rely on fast schema context |
| Column statistics are missing | Connector cannot access stats tables or the warehouse does not expose them | Grant stats permissions where supported; otherwise rely on schema-level context without column statistics |
| Semantic query execution fails | Connection is missing, unreachable, or query execution is disabled | Run `ktx connection test <id>` and check the `ktx sl query` flags |


@ -1,12 +0,0 @@
import { readFile } from "node:fs/promises";
import { join } from "node:path";
export const agentSetupSlug = ["agents-setup"] as const;
export function isAgentSetupSlug(slug: string[] | undefined) {
return slug?.length === 1 && slug[0] === agentSetupSlug[0];
}
export function readAgentSetupMarkdown() {
return readFile(join(process.cwd(), "content/agents-setup.md"), "utf8");
}


@ -52,8 +52,9 @@ ktx provides semantic-layer files, warehouse scans, wiki pages, provenance, and
## Agent Entry Points
- Installable setup skill: run \`npx skills add Kaelio/ktx --skill ktx\` from
the project you want to configure.
${link("/docs/ai-resources/agent-quickstart", "Agent Quickstart", "Task-first route for coding assistants using ktx")}
${link("/docs/agents-setup", "Agent Setup", "Copy-pasteable prompt for agents installing and configuring ktx")}
${link("/docs/ai-resources/markdown-access", "Markdown Access", "Fetch ktx docs as llms.txt, llms-full.txt, or per-page Markdown")}
${link("/docs/ai-resources/agent-instructions", "Agent Instructions", "Suggested instructions for coding assistants that need to read and cite ktx docs")}


@ -6,15 +6,60 @@ const withMDX = createMDX();
const config = {
basePath: "/ktx",
async rewrites() {
return [
{
source: "/docs/:path*.md",
destination: "/llms.mdx/docs/:path*",
},
];
return {
beforeFiles: [
{
source: "/stars",
has: [{ type: "host", value: "ktx.sh" }],
destination: "https://ktx-stars.vercel.app/stars",
basePath: false,
},
{
source: "/stars/:path*",
has: [{ type: "host", value: "ktx.sh" }],
destination: "https://ktx-stars.vercel.app/stars/:path*",
basePath: false,
},
],
afterFiles: [
{
source: "/docs/:path*.md",
destination: "/llms.mdx/docs/:path*",
},
],
};
},
async redirects() {
// Alias-host canonicalization MUST come before the generic root/docs
// redirects below. Those generic rules have no host guard, so if they ran
// first they would inject a "/ktx" basePath into the path on the alias
// hosts, which the alias catch-alls would then prepend a second time —
// producing https://docs.kaelio.com/ktx/ktx/docs/... Redirects also run
// before beforeFiles rewrites, so the ktx.sh catch-all must exclude
// /stars* to let the stars dashboard rewrite proxy through.
return [
{
source: "/slack",
has: [{ type: "host", value: "ktx.sh" }],
destination:
"https://join.slack.com/t/ktxcommunity/shared_invite/zt-3y9b44m1x-LVyNNJD5nwaZHq4XS29LMQ",
permanent: false,
basePath: false,
},
{
source: "/:path*",
has: [{ type: "host", value: "docs.ktx.sh" }],
destination: "https://docs.kaelio.com/ktx/:path*",
permanent: true,
basePath: false,
},
{
source: "/:path((?!stars(?:/|$)).*)",
has: [{ type: "host", value: "ktx.sh" }],
destination: "https://docs.kaelio.com/ktx/:path",
permanent: true,
basePath: false,
},
{
source: "/",
destination: "/ktx/docs/getting-started/introduction",
@ -27,20 +72,6 @@ const config = {
permanent: false,
basePath: false,
},
{
source: "/:path*",
has: [{ type: "host", value: "docs.ktx.sh" }],
destination: "https://docs.kaelio.com/ktx/:path*",
permanent: true,
basePath: false,
},
{
source: "/:path*",
has: [{ type: "host", value: "ktx.sh" }],
destination: "https://docs.kaelio.com/ktx/:path*",
permanent: true,
basePath: false,
},
];
},
};
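
The `/stars` exclusion in the ktx.sh catch-all relies on a negative lookahead. The sketch below illustrates only that lookahead with a plain JavaScript `RegExp`. Next.js compiles `source` patterns with its own path matcher, so the anchoring here is an assumption, and `isCaughtByApexCatchAll` is a hypothetical helper name, not part of the config.

```javascript
// Illustrative only: mirrors the lookahead from the catch-all source
// "/:path((?!stars(?:/|$)).*)" as a standalone regular expression.
const nonStarsPath = /^\/((?!stars(?:\/|$)).*)$/;

// Hypothetical helper: true when the apex catch-all redirect would apply.
function isCaughtByApexCatchAll(pathname) {
  return nonStarsPath.test(pathname);
}

const samplePaths = [
  "/",
  "/docs/getting-started/quickstart",
  "/stars",
  "/stars/history",
  "/starship",
];
for (const pathname of samplePaths) {
  console.log(pathname, isCaughtByApexCatchAll(pathname));
}
```

The `(?:/|$)` part is what keeps the exclusion narrow: `/stars` and anything under `/stars/` fall through to the rewrite, while a path that merely starts with the same letters, such as `/starship`, is still redirected.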


@ -12,15 +12,16 @@
"dependencies": {
"@xyflow/react": "^12.10.2",
"fumadocs-core": "16.8.10",
"fumadocs-mdx": "15.0.4",
"fumadocs-mdx": "15.0.7",
"fumadocs-ui": "16.8.10",
"html-to-image": "1.11.11",
"next": "^16",
"react": "19.2.6",
"react-dom": "19.2.6"
},
"devDependencies": {
"@tailwindcss/postcss": "^4",
"@types/node": "^25.7.0",
"@types/node": "^25.9.1",
"@types/react": "^19",
"@types/react-dom": "^19",
"tailwindcss": "^4",


@ -1,210 +0,0 @@
<svg xmlns="http://www.w3.org/2000/svg" width="1346" height="1710" viewBox="0 0 1346 1710" role="img" aria-labelledby="title desc">
<title id="title">ktx ingestion flow</title>
<desc id="desc">Source systems flow through source connectors, context builder, reconciliation, and validation to create wiki Markdown and semantic-layer YAML outputs.</desc>
<defs>
<filter id="card-shadow" x="-12%" y="-12%" width="124%" height="124%" color-interpolation-filters="sRGB">
<feDropShadow dx="0" dy="2" stdDeviation="2" flood-color="#0f172a" flood-opacity="0.14"/>
</filter>
<filter id="dark-shadow" x="-12%" y="-12%" width="124%" height="124%" color-interpolation-filters="sRGB">
<feDropShadow dx="0" dy="2" stdDeviation="2" flood-color="#020617" flood-opacity="0.22"/>
</filter>
<filter id="glow-blue" x="-160%" y="-160%" width="420%" height="420%">
<feGaussianBlur stdDeviation="7" result="blur"/>
<feMerge>
<feMergeNode in="blur"/>
<feMergeNode in="SourceGraphic"/>
</feMerge>
</filter>
<marker id="arrow" viewBox="0 0 10 10" refX="8.5" refY="5" markerWidth="9" markerHeight="9" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#94a3b8"/>
</marker>
<style>
.card { fill: #ffffff; stroke: #e2e8f0; stroke-width: 1.4; filter: url(#card-shadow); }
.stage { fill: #0b1f23; stroke: #17343a; stroke-width: 1.2; filter: url(#dark-shadow); }
.title { fill: #24272d; font: 700 28px Inter, Arial, sans-serif; }
.body { fill: #666b73; font: 500 18px Inter, Arial, sans-serif; }
.tag { fill: #6b7280; font: 500 16px Inter, Arial, sans-serif; }
.mono { font: 700 20px "SFMono-Regular", Consolas, monospace; }
.stage-title { fill: #f8fafc; font: 700 28px Inter, Arial, sans-serif; }
.stage-body { fill: #b8c6ca; font: 500 20px Inter, Arial, sans-serif; }
.index { fill: #07313a; font: 700 22px Inter, Arial, sans-serif; text-anchor: middle; dominant-baseline: middle; }
.edge { fill: none; stroke: #94a3b8; stroke-width: 2; stroke-linecap: round; stroke-linejoin: round; }
.dash { fill: none; stroke: #64748b; stroke-width: 1.8; stroke-dasharray: 5 8; stroke-linecap: round; }
</style>
</defs>
<g id="source-cards">
<g transform="translate(24 39)">
<rect class="card" x="0" y="0" width="298" height="285" rx="4"/>
<rect x="0" y="0" width="298" height="4" rx="2" fill="#3b82f6"/>
<text class="title" x="22" y="52">Databases</text>
<text class="body" x="22" y="92">Schemas, columns, keys,</text>
<text class="body" x="22" y="120">row counts, and query</text>
<text class="body" x="22" y="148">history.</text>
<g transform="translate(22 180)">
<rect x="0" y="0" width="112" height="36" rx="4" fill="#fbfaf8" stroke="#e5e1dc"/>
<text class="tag" x="12" y="24">PostgreSQL</text>
<rect x="120" y="0" width="100" height="36" rx="4" fill="#fbfaf8" stroke="#e5e1dc"/>
<text class="tag" x="132" y="24">Snowflake</text>
<rect x="0" y="46" width="92" height="36" rx="4" fill="#fbfaf8" stroke="#e5e1dc"/>
<text class="tag" x="12" y="70">BigQuery</text>
<rect x="100" y="46" width="74" height="36" rx="4" fill="#fbfaf8" stroke="#e5e1dc"/>
<text class="tag" x="112" y="70">SQLite</text>
</g>
</g>
<g transform="translate(358 39)">
<rect class="card" x="0" y="0" width="298" height="285" rx="4"/>
<rect x="0" y="0" width="298" height="4" rx="2" fill="#f97316"/>
<text class="title" x="22" y="52">BI tools</text>
<text class="body" x="22" y="92">Dashboards, questions,</text>
<text class="body" x="22" y="120">explores, usage, and trusted</text>
<text class="body" x="22" y="148">examples.</text>
<g transform="translate(22 180)">
<rect x="0" y="0" width="96" height="36" rx="4" fill="#fbfaf8" stroke="#e5e1dc"/>
<text class="tag" x="12" y="24">Metabase</text>
<rect x="104" y="0" width="74" height="36" rx="4" fill="#fbfaf8" stroke="#e5e1dc"/>
<text class="tag" x="116" y="24">Looker</text>
</g>
</g>
<g transform="translate(692 39)">
<rect class="card" x="0" y="0" width="298" height="285" rx="4"/>
<rect x="0" y="0" width="298" height="4" rx="2" fill="#f59e0b"/>
<text class="title" x="22" y="52">Modeling code</text>
<text class="body" x="22" y="92">Existing metrics, dimensions,</text>
<text class="body" x="22" y="120">models, joins, and entities.</text>
<g transform="translate(22 152)">
<rect x="0" y="0" width="48" height="36" rx="4" fill="#fbfaf8" stroke="#e5e1dc"/>
<text class="tag" x="12" y="24">dbt</text>
<rect x="56" y="0" width="82" height="36" rx="4" fill="#fbfaf8" stroke="#e5e1dc"/>
<text class="tag" x="68" y="24">LookML</text>
<rect x="0" y="46" width="102" height="36" rx="4" fill="#fbfaf8" stroke="#e5e1dc"/>
<text class="tag" x="12" y="70">MetricFlow</text>
</g>
</g>
<g transform="translate(1026 39)">
<rect class="card" x="0" y="0" width="298" height="285" rx="4"/>
<rect x="0" y="0" width="298" height="4" rx="2" fill="#10b981"/>
<text class="title" x="22" y="52">Docs and notes</text>
<text class="body" x="22" y="92">Policies, caveats, team</text>
<text class="body" x="22" y="120">definitions, and analyst</text>
<text class="body" x="22" y="148">context.</text>
<g transform="translate(22 180)">
<rect x="0" y="0" width="72" height="36" rx="4" fill="#fbfaf8" stroke="#e5e1dc"/>
<text class="tag" x="12" y="24">Notion</text>
<rect x="80" y="0" width="84" height="36" rx="4" fill="#fbfaf8" stroke="#e5e1dc"/>
<text class="tag" x="92" y="24">Any text</text>
</g>
</g>
</g>
<g id="edges">
<path class="edge" d="M172 324 V380 Q172 394 186 394 H507 Q507 394 507 380 V324"/>
<path class="edge" d="M841 324 V380 Q841 394 827 394 H507"/>
<path class="edge" d="M1175 324 V380 Q1175 394 1161 394 H673 Q673 394 673 408 V433" marker-end="url(#arrow)"/>
<path class="edge" d="M507 394 H673"/>
<path class="edge" d="M673 618 V651" marker-end="url(#arrow)"/>
<path class="edge" d="M673 833 V866" marker-end="url(#arrow)"/>
<path class="edge" d="M673 1048 V1081" marker-end="url(#arrow)"/>
<path class="edge" d="M673 1262 V1310 Q673 1325 656 1325 H305 Q291 1325 291 1339 V1364" marker-end="url(#arrow)"/>
<path class="edge" d="M673 1262 V1310 Q673 1325 690 1325 H1043 Q1057 1325 1057 1339 V1364" marker-end="url(#arrow)"/>
<path class="dash" d="M546 1523 H800"/>
<path d="M546 1523 l9 -6 v12 z" fill="#64748b"/>
<path d="M800 1523 l-9 -6 v12 z" fill="#64748b"/>
</g>
<g id="particles">
<circle cx="256" cy="394" r="18" fill="#3b82f6" opacity="0.18" filter="url(#glow-blue)"/>
<circle cx="256" cy="394" r="6" fill="#3b82f6" opacity="0.9"/>
<circle cx="632" cy="394" r="18" fill="#f97316" opacity="0.18" filter="url(#glow-blue)"/>
<circle cx="632" cy="394" r="6" fill="#f97316" opacity="0.9"/>
<circle cx="830" cy="394" r="18" fill="#10b981" opacity="0.18" filter="url(#glow-blue)"/>
<circle cx="830" cy="394" r="6" fill="#10b981" opacity="0.9"/>
<circle cx="673" cy="635" r="17" fill="#10b981" opacity="0.18" filter="url(#glow-blue)"/>
<circle cx="673" cy="635" r="6" fill="#10b981" opacity="0.9"/>
<circle cx="673" cy="1065" r="17" fill="#f59e0b" opacity="0.18" filter="url(#glow-blue)"/>
<circle cx="673" cy="1065" r="6" fill="#f59e0b" opacity="0.9"/>
<circle cx="573" cy="1322" r="17" fill="#3b82f6" opacity="0.18" filter="url(#glow-blue)"/>
<circle cx="573" cy="1322" r="6" fill="#3b82f6" opacity="0.9"/>
</g>
<g id="stages">
<g transform="translate(464 438)">
<rect class="stage" x="0" y="0" width="420" height="180" rx="4"/>
<circle cx="52" cy="90" r="26" fill="#55dced"/>
<text class="index" x="52" y="90">1</text>
<text class="stage-title" x="98" y="72">Source connectors</text>
<text class="stage-body" x="98" y="110">Read each configured system in</text>
<text class="stage-body" x="98" y="140">its native shape.</text>
</g>
<g transform="translate(464 653)">
<rect class="stage" x="0" y="0" width="420" height="180" rx="4"/>
<circle cx="52" cy="90" r="26" fill="#55dced"/>
<text class="index" x="52" y="90">2</text>
<text class="stage-title" x="98" y="72">Context builder</text>
<text class="stage-body" x="98" y="110">Turn source evidence into</text>
<text class="stage-body" x="98" y="140">proposed context updates.</text>
</g>
<g transform="translate(464 868)">
<rect class="stage" x="0" y="0" width="420" height="180" rx="4"/>
<circle cx="52" cy="90" r="26" fill="#55dced"/>
<text class="index" x="52" y="90">3</text>
<text class="stage-title" x="98" y="72">Reconciliation</text>
<text class="stage-body" x="98" y="110">Merge new evidence with the</text>
<text class="stage-body" x="98" y="140">context that already exists.</text>
</g>
<g transform="translate(464 1082)">
<rect class="stage" x="0" y="0" width="420" height="180" rx="4"/>
<circle cx="52" cy="90" r="26" fill="#55dced"/>
<text class="index" x="52" y="90">4</text>
<text class="stage-title" x="98" y="72">Validation</text>
<text class="stage-body" x="98" y="110">Check references and semantics</text>
<text class="stage-body" x="98" y="140">before agents rely on them.</text>
</g>
</g>
<g id="outputs">
<g transform="translate(60 1373)">
<rect class="card" x="0" y="0" width="485" height="329" rx="4"/>
<rect x="0" y="0" width="485" height="4" rx="2" fill="#10b981"/>
<text class="mono" x="24" y="52" fill="#10b981">wiki/*.md</text>
<text class="title" x="24" y="100">Wiki</text>
<g transform="translate(24 122)">
<rect x="0" y="0" width="90" height="36" rx="4" fill="#fbfaf8" stroke="#e5e1dc"/>
<text class="tag" x="12" y="24">free-form</text>
<rect x="98" y="0" width="140" height="36" rx="4" fill="#fbfaf8" stroke="#e5e1dc"/>
<text class="tag" x="110" y="24">auto-maintained</text>
</g>
<text class="body" x="24" y="194">Definitions, caveats, policies, analyst notes, and</text>
<text class="body" x="24" y="222">business language that agents can search.</text>
</g>
<g transform="translate(803 1373)">
<rect class="card" x="0" y="0" width="485" height="329" rx="4"/>
<rect x="0" y="0" width="485" height="4" rx="2" fill="#3b82f6"/>
<text class="mono" x="24" y="52" fill="#3b82f6">semantic-layer/*.yaml</text>
<text class="title" x="24" y="100">Semantic layer</text>
<g transform="translate(24 122)">
<rect x="0" y="0" width="96" height="36" rx="4" fill="#fbfaf8" stroke="#e5e1dc"/>
<text class="tag" x="12" y="24">structured</text>
<rect x="104" y="0" width="104" height="36" rx="4" fill="#fbfaf8" stroke="#e5e1dc"/>
<text class="tag" x="116" y="24">executable</text>
<rect x="216" y="0" width="140" height="36" rx="4" fill="#fbfaf8" stroke="#e5e1dc"/>
<text class="tag" x="228" y="24">auto-maintained</text>
</g>
<text class="body" x="24" y="194">Metrics, joins, tables, dimensions, filters, and</text>
<text class="body" x="24" y="222">segments that ktx can validate and compile into</text>
<text class="body" x="24" y="250">SQL.</text>
</g>
<g transform="translate(622 1505)">
<rect x="0" y="0" width="102" height="36" rx="4" fill="#ffffff" stroke="#e5e1dc"/>
<text class="tag" x="13" y="24">references</text>
</g>
</g>
</svg>

Image preview (before): 12 KiB

Binary file not shown. Before: 137 KiB; after: 346 KiB.

Binary file not shown. After: 176 KiB.


@ -2,6 +2,8 @@ import assert from "node:assert/strict";
import { spawn } from "node:child_process";
import { once } from "node:events";
import { readFile, writeFile } from "node:fs/promises";
import http from "node:http";
import https from "node:https";
import { dirname, join } from "node:path";
import { createServer } from "node:net";
import { after, before, test } from "node:test";
@ -100,6 +102,37 @@ after(async () => {
}
});
// Node's fetch (undici) overwrites the Host header with the connection host,
// so the alias-host redirect rules never match. The low-level http(s) client
// sends Host verbatim, which is what the alias canonicalization keys off of.
function requestWithHost(hostHeader, path) {
const target = new URL(docsSiteUrl);
const client = target.protocol === "https:" ? https : http;
const port =
target.port || (target.protocol === "https:" ? "443" : "80");
return new Promise((resolve, reject) => {
const request = client.request(
{
hostname: target.hostname,
port,
path,
method: "GET",
headers: { Host: hostHeader },
},
(response) => {
response.resume();
resolve({
status: response.statusCode,
location: response.headers.location,
});
},
);
request.on("error", reject);
request.end();
});
}
test("/ktx/docs redirects to the docs introduction", async () => {
const response = await fetch(`${docsSiteUrl}${docsBasePath}/docs`, {
redirect: "manual",
@ -141,3 +174,51 @@ test("/ktx/api/search returns docs search results", async () => {
"search should return at least one docs result",
);
});
test("ktx.sh canonicalizes to a single /ktx basePath on the docs host", async () => {
const root = await requestWithHost("ktx.sh", "/");
assert.equal(root.status, 308);
assert.equal(root.location, "https://docs.kaelio.com/ktx/");
assert.ok(
!root.location.includes("/ktx/ktx"),
"the basePath must not be doubled",
);
const page = await requestWithHost(
"ktx.sh",
"/docs/getting-started/quickstart",
);
assert.equal(page.status, 308);
assert.equal(
page.location,
"https://docs.kaelio.com/ktx/docs/getting-started/quickstart",
);
});
test("docs.ktx.sh canonicalizes to a single /ktx basePath on the docs host", async () => {
const root = await requestWithHost("docs.ktx.sh", "/");
assert.equal(root.status, 308);
assert.equal(root.location, "https://docs.kaelio.com/ktx");
assert.ok(
!root.location.includes("/ktx/ktx"),
"the basePath must not be doubled",
);
const page = await requestWithHost("docs.ktx.sh", "/llms.txt");
assert.equal(page.status, 308);
assert.equal(page.location, "https://docs.kaelio.com/ktx/llms.txt");
});
test("ktx.sh keeps the /slack and /stars exceptions", async () => {
const slack = await requestWithHost("ktx.sh", "/slack");
assert.equal(slack.status, 307);
assert.match(slack.location, /^https:\/\/join\.slack\.com\//);
// /stars is proxied by a beforeFiles rewrite, so the apex catch-all must not
// canonicalize it to the docs host.
const stars = await requestWithHost("ktx.sh", "/stars");
assert.ok(
!(stars.location ?? "").startsWith("https://docs.kaelio.com"),
"the stars dashboard must not be redirected to the docs host",
);
});
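
The alias-host tests above assert that the `/ktx` basePath appears exactly once in the redirect target. A minimal sketch of that invariant follows. The function name and constants are hypothetical, chosen to mirror the asserted URLs; this is not the site's actual redirect configuration, and it does not model the per-host difference in the root trailing slash.

```typescript
// Hypothetical constants mirroring the URLs asserted in the tests.
const DOCS_ORIGIN = "https://docs.kaelio.com";
const BASE_PATH = "/ktx";

// Prefix the basePath exactly once, even when the incoming path already has it.
function canonicalDocsUrl(path: string): string {
  const normalized = path.startsWith("/") ? path : `/${path}`;
  const alreadyPrefixed =
    normalized === BASE_PATH || normalized.startsWith(`${BASE_PATH}/`);
  // Drop an existing prefix so it cannot be doubled below.
  const suffix = alreadyPrefixed
    ? normalized.slice(BASE_PATH.length)
    : normalized;
  return `${DOCS_ORIGIN}${BASE_PATH}${suffix}`;
}
```

Stripping an existing prefix before re-adding it is one way to make the "must not be doubled" assertion hold by construction rather than by careful ordering of redirect rules.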

View file

@ -85,7 +85,7 @@ test("product mechanics component explains ingestion outputs", async () => {
"compile into SQL",
'"use client"',
"@xyflow/react",
"<ReactFlow",
"<FlowCanvas",
"getSmoothStepPath",
"animateMotion",
"mechanics-particle",
@ -97,21 +97,21 @@ test("product mechanics component explains ingestion outputs", async () => {
);
}
assert.match(
component,
// The ReactFlow canvas config lives in the shared FlowCanvas wrapper, which
// product-mechanics renders. Assert the static read-only behavior there.
const flowCanvas = await readDocsFile("components/flow-canvas.tsx");
for (const guard of [
/nodesDraggable=\{false\}/,
"ReactFlow canvas should disable node dragging",
);
assert.match(
component,
/panOnDrag=\{false\}/,
"ReactFlow canvas should disable panning",
);
assert.match(
component,
/nodesConnectable=\{false\}/,
/zoomOnScroll=\{false\}/,
"ReactFlow canvas should disable scroll zoom",
);
/elementsSelectable=\{false\}/,
]) {
assert.match(
flowCanvas,
guard,
`shared FlowCanvas should enforce static read-only behavior: ${guard}`,
);
}
assert.doesNotMatch(component, /raw-sources/);
assert.doesNotMatch(component, /\.ktx/);

View file

@ -0,0 +1,74 @@
import assert from "node:assert/strict";
import { readFile } from "node:fs/promises";
import { dirname, join } from "node:path";
import { test } from "node:test";
import { fileURLToPath } from "node:url";
const docsSiteDir = join(dirname(fileURLToPath(import.meta.url)), "..");
async function readDocsFile(path) {
return readFile(join(docsSiteDir, path), "utf8");
}
test("docs introduction renders the serving phase after ingestion", async () => {
const introduction = await readDocsFile(
"content/docs/getting-started/introduction.mdx",
);
assert.match(
introduction,
/import\s+\{\s*ProductRuntime\s*\}\s+from\s+"@\/components\/product-runtime";/,
);
assert.match(introduction, /<ProductRuntime\s*\/>/);
const mechanicsIndex = introduction.indexOf("<ProductMechanics />");
const runtimeIndex = introduction.indexOf("<ProductRuntime />");
const useCaseIndex = introduction.indexOf("## Use it for");
assert.ok(
runtimeIndex > mechanicsIndex,
"serving diagram should appear after the ingestion diagram",
);
assert.ok(
runtimeIndex < useCaseIndex,
"serving diagram should appear before use-case sections",
);
});
test("product runtime component explains the serving cycle", async () => {
const component = await readDocsFile("components/product-runtime.tsx");
for (const expectedText of [
"How serving works",
"Serving flow",
"From an agent request to a governed answer",
"Your agent",
"Claude Code",
"Cursor",
"Codex",
"Search wiki + semantic layer",
"Return approved metrics",
"Compile metrics → SQL",
"Context layer",
"Database",
"search + read",
"read-only",
"wiki/*.md",
"semantic-layer/*.yaml",
'"use client"',
"@xyflow/react",
"FlowCanvas",
"getSmoothStepPath",
"animateMotion",
"runtime-particle",
"buildCyclePath",
]) {
assert.ok(
component.includes(expectedText),
`component should include: ${expectedText}`,
);
}
assert.doesNotMatch(component, /raw-sources/);
assert.doesNotMatch(component, /<img/);
});
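
The ordering assertions in the introduction test compare raw `indexOf` results, so a missing first marker (index `-1`) can still satisfy a "greater than" check. A hedged sketch of a stricter helper follows; `inOrder` is a hypothetical name and is not part of the test file above.

```typescript
// Returns true only if every marker is present and they appear in the given order.
function inOrder(text: string, markers: string[]): boolean {
  let cursor = 0;
  for (const marker of markers) {
    const index = text.indexOf(marker, cursor);
    if (index === -1) {
      return false; // a missing marker is a failure, not a silent pass
    }
    cursor = index + marker.length;
  }
  return true;
}

const sample = "<ProductMechanics />\n<ProductRuntime />\n## Use it for";
const ordered = inOrder(sample, [
  "<ProductMechanics />",
  "<ProductRuntime />",
  "## Use it for",
]);
```

Searching from a moving cursor means each marker must occur after the previous one, which covers both the "after ingestion" and "before use-case sections" checks in a single call.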

View file

@ -89,3 +89,41 @@ enough reason to fix it even when the local code "works."
(`loadX` vs `loadHigherX`, `createY` vs `createDefaultY`, `xClient`
vs `xService`), assume callers will pick the wrong one. Unify, or
document inline why both must exist.
## Dispatch and contract leaks across per-variant layers
Layers with multiple per-variant implementations (warehouse drivers,
dialects, LLM providers, ingest adapters, historic-SQL probes) drift
toward parallel switches and informal contracts. The patterns below
look locally reasonable per file but multiply with the number of
variants times the number of consumers — every fix has to be applied
N times, and silent drift between variants is invisible until a user
hits it.
- **MUST NOT**: Maintain two or more files that switch on the same
enum or string union to dispatch to per-variant behavior. Promote
the dispatch to a single registry table keyed by the union, exposed
through one resolution function. If you find yourself writing the
third such switch, the second one was already a bug.
- **MUST**: When every variant of an abstraction implements the same
method, the method belongs on the shared interface. An informal
contract that every implementation happens to satisfy is a leak
waiting to happen — callers will reach for the concrete class
instead of the contract, and the next variant added will silently
forget to implement it.
- **MUST**: When a layer has both a thin shared interface and rich
per-variant concrete classes, they must agree. Either widen the
interface so callers never need the concrete class, or make the
concrete class private (test-only `/** @internal */` JSDoc plus a
boundary check in `scripts/check-boundaries.mjs`). A class that is
public AND has methods the interface does not expose is the exact
configuration that produces leaks.
The warehouse driver / dialect layer in
`packages/cli/src/connectors/<driver>/` plus
`packages/cli/src/context/connections/{dialects,drivers}.ts` is the
canonical worked example: per-driver dialect classes carry
`/** @internal */`, `scripts/check-boundaries.mjs` enforces the import
boundary, and dispatch lives in the two registry files. Apply the
same shape to any other per-variant layer that grows beyond two
implementations.
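
A minimal sketch of the registry shape described above, assuming a hypothetical `DriverId` union and a one-method `Dialect` interface. The real driver names, interface, and registry files may differ; this only illustrates the pattern of one table plus one resolution function.

```typescript
// Hypothetical union; stands in for whatever enum or string union drives dispatch.
type DriverId = "postgres" | "mysql" | "mssql";

interface Dialect {
  // Every variant implements this, so it lives on the shared interface.
  quoteIdentifier(identifier: string): string;
}

// Single registry table keyed by the union. Record<DriverId, Dialect> makes the
// compiler reject a new union member that has no entry here.
const DIALECTS: Record<DriverId, Dialect> = {
  postgres: {
    quoteIdentifier: (identifier) => '"' + identifier.replace(/"/g, '""') + '"',
  },
  mysql: {
    quoteIdentifier: (identifier) => "`" + identifier.replace(/`/g, "``") + "`",
  },
  mssql: {
    quoteIdentifier: (identifier) => "[" + identifier.replace(/]/g, "]]") + "]",
  },
};

// One resolution function: callers never switch on the driver id themselves.
function resolveDialect(driver: DriverId): Dialect {
  return DIALECTS[driver];
}
```

Because the table is typed as `Record<DriverId, Dialect>`, adding a variant to the union without a registry entry fails type-checking, which is the drift the rules above are trying to prevent.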

View file

@ -21,6 +21,41 @@ in prose when ambiguity is possible. Always qualify:
Bare `source` is allowed only inside a section that has already established its
referent (e.g., body of a `Semantic sources` page, or `sourceName` as a CLI arg).
## Context Layer and Context Engine
Use **context layer** as the primary category term for what **ktx** provides to
data agents.
Use **context engine** as the secondary mechanism term for how **ktx** builds,
maintains, validates, and serves that layer.
| Concept | Use | Do not use |
|---|---|---|
| The whole **ktx** product category | **context layer** / **context layer for data agents** | knowledge layer, agent memory |
| The active system that builds and maintains context | **context engine** | context layer when describing ingest/reconciliation internals |
| The durable reviewed surface agents use | **context layer** | context engine |
| The compiler pillar for executable metrics and joins | **semantic layer** | context layer when specifically discussing SQL compilation |
| Prose/business knowledge files | **wiki** / **wiki pages** | wiki context |
### Usage rules
- Use **context layer** in taglines, page titles, meta descriptions, docs
introductions, comparison pages, and first-paragraph definitions.
- Use **context engine** when describing active behavior: ingesting evidence,
reconciling changes, validating references, maintaining files, search, CLI,
and MCP serving.
- Keep **semantic layer** for the narrower YAML/compiler surface: semantic
sources, measures, joins, dimensions, filters, SQL compilation, and semantic
queries.
- Do not use **context engine** as the primary replacement for the whole
product. It sounds like runtime infrastructure; **context layer** better
describes the durable YAML and Markdown surface users review in git.
- Do not use **context layer** when the sentence is specifically about the
compiler. Example: write "the semantic layer compiles semantic queries to
SQL," not "the context layer compiles semantic queries to SQL."
- Default lowercase in prose: `context layer`, `context engine`, `semantic
layer`. Title case only in page titles, headings, nav labels, and UI labels.
## Canonical vocabulary
| Concept | Use | Do not use |
@ -31,7 +66,8 @@ referent (e.g., body of a `Semantic sources` page, or `sourceName` as a CLI arg)
| The connected database | **primary source** / **database connection** | data source |
| Analytics-tooling integration | **context source** / **context-source connection** | BI source, BI model, metadata source, source tool |
| YAML file describing a table | **semantic source** | semantic-layer source, model file, bare "source file" |
| The whole **ktx** surface | **context layer** (lowercase in prose) | "Context Layer" in prose |
| The whole **ktx** surface | **context layer** / **context layer for data agents** (lowercase in prose) | "Context Layer" in prose, knowledge layer, agent memory |
| The active system that builds and maintains context | **context engine** (lowercase in prose) | context layer when describing ingest/reconciliation internals |
| The compiler pillar | **semantic layer** (lowercase in prose) | "Semantic Layer" in prose |
| The query payload | **semantic query** (lowercase in prose) | "Semantic Query" |
| The MCP layer | **MCP server** (the server), **MCP tools** (the functions) | "ktx MCP" as a standalone noun |
@ -41,8 +77,6 @@ referent (e.g., body of a `Semantic sources` page, or `sourceName` as a CLI arg)
| Connection ref in prose | **connection id** (lowercase, two words) | "connection ID" |
| CLI arg/flag literal | `connectionId` (code font) | — |
| File path placeholder | `<connection-id>` (code font) | — |
| Fast schema mode | **fast ingest** | schema ingest, schema-only ingest |
| AI-enriched mode | **deep ingest** | AI-enriched ingest |
| Ingest of a primary connection | **database ingest** | — |
| Ingest of a context-source connection | **context-source ingest** | bare "source ingest" |
| Wiki capture | **text ingest** | — |
@ -56,7 +90,7 @@ referent (e.g., body of a `Semantic sources` page, or `sourceName` as a CLI arg)
| Wiki surface as a whole | **wiki** | "wiki context" |
| A single Markdown file | **wiki page** | — |
| YAML vs Markdown contrast | **wiki Markdown** (only when contrasting with **semantic source YAML**) | — |
| Joins multiplying rows (generic) | **fan-out** | — |
| Joins multiplying rows (generic) | **fanout** | — |
| The two named patterns | **chasm trap** / **fan trap** | — |
| Casual gloss in user prose | **double-count** | (avoid in technical/internals prose) |

View file

@ -14,8 +14,8 @@
"src/telemetry/schema-writer.ts!",
"src/telemetry/index.ts!",
"scripts/**/*.mjs",
"src/**/*.test-utils.ts",
"src/**/acceptance-fixtures.ts",
"test/**/*.test-utils.ts",
"test/**/acceptance-fixtures.ts",
"src/context/scan/relationship-benchmarks.ts!",
"src/context/scan/relationship-benchmark-report.ts!"
]
@ -37,6 +37,9 @@
"@semantic-release/release-notes-generator",
"conventional-changelog-conventionalcommits"
],
"ignore": [
".context/**"
],
"ignoreBinaries": [
"uv",
"lsof"

View file

@ -1,10 +1,10 @@
{
"name": "ktx-workspace",
"version": "0.5.0",
"version": "0.9.0",
"description": "Workspace root for ktx packages",
"private": true,
"type": "module",
"packageManager": "pnpm@11.1.1",
"packageManager": "pnpm@11.4.0",
"engines": {
"node": ">=22.0.0",
"pnpm": ">=10.20.0"
@ -24,6 +24,7 @@
"dead-code:fix": "biome check . --formatter-enabled=false --assist-enabled=false --write && knip --fix --format",
"dead-code:knip": "knip --reporter compact",
"dead-code:knip:production": "knip --production --reporter compact",
"deps:upgrade": "node scripts/upgrade-dependencies.mjs",
"docs": "kill $(lsof -ti:3000) 2>/dev/null; pnpm --filter ktx-docs run dev",
"ktx": "node scripts/run-ktx.mjs",
"link:dev": "node scripts/link-dev-cli.mjs",
@ -31,6 +32,7 @@
"setup:dev": "node scripts/setup-dev.mjs",
"release:published-smoke": "node scripts/published-package-smoke.mjs --require-config",
"release:local-embeddings-smoke": "node scripts/local-embeddings-runtime-smoke.mjs --require-opt-in",
"release:codex-backend-smoke": "node scripts/codex-backend-live-smoke.mjs",
"release:readiness": "node scripts/release-readiness.mjs",
"release:update-version": "node scripts/update-public-release-version.mjs",
"relationships:acquire-public-fixtures": "node scripts/acquire-public-benchmark-fixtures.mjs",
@ -58,11 +60,11 @@
"@semantic-release/github": "^12.0.8",
"@semantic-release/npm": "^13.1.5",
"@semantic-release/release-notes-generator": "^14.1.1",
"@types/node": "^25.7.0",
"@types/node": "^25.9.1",
"better-sqlite3": "^12.10.0",
"conventional-changelog-conventionalcommits": "^9.3.1",
"knip": "^6.12.2",
"pg": "^8.20.0",
"knip": "^6.14.1",
"pg": "^8.21.0",
"semantic-release": "^25.0.3",
"typescript": "^6.0.3",
"yaml": "^2.9.0"

View file

@ -1,7 +1,11 @@
{
"name": "@kaelio/ktx",
"version": "0.5.0",
"version": "0.9.0",
"description": "Standalone ktx context layer for data agents",
"author": {
"name": "Kaelio",
"url": "https://www.kaelio.com"
},
"type": "module",
"engines": {
"node": ">=22.0.0"
@ -32,47 +36,50 @@
"build": "tsc -p tsconfig.json && node dist/telemetry/schema-writer.js src/telemetry/events.schema.json ../../python/ktx-daemon/src/ktx_daemon/telemetry/events.schema.json && node scripts/copy-runtime-assets.mjs && node ../../scripts/prepare-cli-bin.mjs",
"clean": "node -e \"fs.rmSync('dist', { recursive: true, force: true })\"",
"docs:commands": "pnpm run build && node dist/print-command-tree.js",
"smoke": "vitest run src/standalone-smoke.test.ts src/example-smoke.test.ts --testTimeout 30000",
"test": "vitest run --exclude src/standalone-smoke.test.ts --exclude src/example-smoke.test.ts --exclude src/setup-databases.test.ts --exclude src/scan.test.ts --exclude src/commands/connection-metabase-setup.test.ts --exclude src/setup-models.test.ts --exclude src/setup-sources.test.ts --exclude src/setup.test.ts --exclude src/connection.test.ts --exclude src/setup-embeddings.test.ts --exclude src/ingest.test.ts --exclude src/commands/connection-mapping.test.ts --exclude src/ingest-viz.test.ts --exclude src/demo.test.ts --exclude src/setup-project.test.ts --exclude src/sl.test.ts --exclude src/local-scan-connectors.test.ts --exclude src/commands/connection-notion.test.ts --exclude src/context/scan/local-scan.test.ts --exclude src/context/mcp/local-project-ports.test.ts --exclude src/context/ingest/local-stage-ingest.test.ts --exclude src/context/sl/pglite-sl-search-prototype.test.ts --exclude src/context/core/git.service.test.ts --exclude src/context/ingest/local-adapters.test.ts --exclude src/context/ingest/local-bundle-ingest.test.ts --exclude src/context/ingest/local-metabase-ingest.test.ts --exclude src/context/sl/local-sl.test.ts --exclude src/context/search/pglite-owner-process.test.ts --exclude src/context/scan/local-enrichment-artifacts.test.ts --exclude src/context/search/pglite-spike.test.ts --exclude src/context/wiki/local-knowledge.test.ts --exclude src/context/sl/local-query.test.ts --exclude src/context/scan/relationship-review-decisions.test.ts --exclude src/context/scan/relationship-profiling.test.ts",
"test:slow": "vitest run src/setup-databases.test.ts src/scan.test.ts src/commands/connection-metabase-setup.test.ts src/setup-models.test.ts src/setup-sources.test.ts src/setup.test.ts src/connection.test.ts src/setup-embeddings.test.ts src/ingest.test.ts src/commands/connection-mapping.test.ts src/ingest-viz.test.ts src/demo.test.ts src/setup-project.test.ts src/sl.test.ts src/local-scan-connectors.test.ts src/commands/connection-notion.test.ts src/context/scan/local-scan.test.ts src/context/mcp/local-project-ports.test.ts src/context/ingest/local-stage-ingest.test.ts src/context/sl/pglite-sl-search-prototype.test.ts src/context/core/git.service.test.ts src/context/ingest/local-adapters.test.ts src/context/ingest/local-bundle-ingest.test.ts src/context/ingest/local-metabase-ingest.test.ts src/context/sl/local-sl.test.ts src/context/search/pglite-owner-process.test.ts src/context/scan/local-enrichment-artifacts.test.ts src/context/search/pglite-spike.test.ts src/context/wiki/local-knowledge.test.ts src/context/sl/local-query.test.ts src/context/scan/relationship-review-decisions.test.ts src/context/scan/relationship-profiling.test.ts --testTimeout 30000",
"type-check": "tsc -p tsconfig.json --noEmit",
"smoke": "vitest run test/standalone-smoke.test.ts test/example-smoke.test.ts --testTimeout 30000",
"test": "vitest run --exclude test/standalone-smoke.test.ts --exclude test/example-smoke.test.ts --exclude test/setup-databases.test.ts --exclude test/scan.test.ts --exclude test/commands/connection-metabase-setup.test.ts --exclude test/setup-models.test.ts --exclude test/setup-sources.test.ts --exclude test/setup.test.ts --exclude test/connection.test.ts --exclude test/setup-embeddings.test.ts --exclude test/ingest.test.ts --exclude test/commands/connection-mapping.test.ts --exclude test/ingest-viz.test.ts --exclude test/demo.test.ts --exclude test/setup-project.test.ts --exclude test/sl.test.ts --exclude test/local-scan-connectors.test.ts --exclude test/commands/connection-notion.test.ts --exclude test/context/scan/local-scan.test.ts --exclude test/context/mcp/local-project-ports.test.ts --exclude test/context/ingest/local-stage-ingest.test.ts --exclude test/context/sl/pglite-sl-search-prototype.test.ts --exclude test/context/core/git.service.test.ts --exclude test/context/ingest/local-adapters.test.ts --exclude test/context/ingest/local-bundle-ingest.test.ts --exclude test/context/ingest/local-metabase-ingest.test.ts --exclude test/context/sl/local-sl.test.ts --exclude test/context/search/pglite-owner-process.test.ts --exclude test/context/scan/local-enrichment-artifacts.test.ts --exclude test/context/search/pglite-spike.test.ts --exclude test/context/wiki/local-knowledge.test.ts --exclude test/context/sl/local-query.test.ts --exclude test/context/scan/relationship-review-decisions.test.ts --exclude test/context/scan/relationship-profiling.test.ts",
"test:slow": "vitest run test/setup-databases.test.ts test/scan.test.ts test/commands/connection-metabase-setup.test.ts test/setup-models.test.ts test/setup-sources.test.ts test/setup.test.ts test/connection.test.ts test/setup-embeddings.test.ts test/ingest.test.ts test/commands/connection-mapping.test.ts test/ingest-viz.test.ts test/demo.test.ts test/setup-project.test.ts test/sl.test.ts test/local-scan-connectors.test.ts test/commands/connection-notion.test.ts test/context/scan/local-scan.test.ts test/context/mcp/local-project-ports.test.ts test/context/ingest/local-stage-ingest.test.ts test/context/sl/pglite-sl-search-prototype.test.ts test/context/core/git.service.test.ts test/context/ingest/local-adapters.test.ts test/context/ingest/local-bundle-ingest.test.ts test/context/ingest/local-metabase-ingest.test.ts test/context/sl/local-sl.test.ts test/context/search/pglite-owner-process.test.ts test/context/scan/local-enrichment-artifacts.test.ts test/context/search/pglite-spike.test.ts test/context/wiki/local-knowledge.test.ts test/context/sl/local-query.test.ts test/context/scan/relationship-review-decisions.test.ts test/context/scan/relationship-profiling.test.ts --testTimeout 30000",
"type-check": "tsc -p tsconfig.json --noEmit && tsc -p tsconfig.test.json --noEmit",
"relationships:benchmarks": "pnpm --silent run build && node ../../scripts/relationship-benchmark-report.mjs",
"relationships:benchmarks:test": "KTX_RUN_RELATIONSHIP_BENCHMARKS=1 vitest run src/context/scan/relationship-benchmarks.test.ts",
"relationships:benchmarks:test": "KTX_RUN_RELATIONSHIP_BENCHMARKS=1 vitest run test/context/scan/relationship-benchmarks.test.ts",
"search:pglite-spike": "node ../../scripts/pglite-hybrid-search-spike.mjs",
"search:pglite-owner-prototype": "node ../../scripts/pglite-owner-process-prototype.mjs",
"search:pglite-sl-prototype": "node ../../scripts/pglite-sl-search-prototype.mjs"
},
"dependencies": {
"@ai-sdk/anthropic": "3.0.77",
"@ai-sdk/devtools": "0.0.17",
"@ai-sdk/google-vertex": "^4.0.128",
"@anthropic-ai/claude-agent-sdk": "0.3.142",
"@ai-sdk/anthropic": "3.0.78",
"@ai-sdk/devtools": "0.0.18",
"@ai-sdk/google-vertex": "^4.0.134",
"@anthropic-ai/claude-agent-sdk": "0.3.146",
"@clack/core": "1.3.1",
"@clack/prompts": "1.4.0",
"@clickhouse/client": "^1.18.4",
"@clickhouse/client": "^1.18.5",
"@commander-js/extra-typings": "14.0.0",
"@google-cloud/bigquery": "^8.3.1",
"@looker/sdk": "^26.8.0",
"@looker/sdk-node": "^26.8.0",
"@looker/sdk-rtl": "^21.6.5",
"@modelcontextprotocol/sdk": "^1.29.0",
"@notionhq/client": "^5.21.0",
"ai": "^6.0.180",
"@notionhq/client": "^5.22.0",
"@openai/codex-sdk": "^0.133.0",
"ai": "^6.0.188",
"better-sqlite3": "^12.10.0",
"commander": "14.0.3",
"fflate": "^0.8.2",
"fflate": "^0.8.3",
"handlebars": "^4.7.9",
"ink": "^7.0.2",
"ink": "^7.0.3",
"lookml-parser": "7.1.0",
"minimatch": "^10.2.5",
"mssql": "^12.5.2",
"mssql": "^12.5.4",
"mysql2": "^3.22.3",
"openai": "^6.37.0",
"openai": "^6.38.0",
"p-limit": "^7.3.0",
"pg": "^8.20.0",
"posthog-node": "^5.0.0",
"pg": "^8.21.0",
"posthog-node": "^5.34.9",
"react": "^19.2.6",
"semver": "^7.8.1",
"simple-git": "3.36.0",
"snowflake-sdk": "^2.4.1",
"snowflake-sdk": "^2.4.2",
"yaml": "^2.9.0",
"zod": "^4.4.3"
},
@ -81,14 +88,15 @@
"@electric-sql/pglite-socket": "^0.1.5",
"@types/better-sqlite3": "^7.6.13",
"@types/mssql": "^12.3.0",
"@types/node": "^25.7.0",
"@types/node": "^25.9.1",
"@types/pg": "^8.20.0",
"@types/react": "^19.2.14",
"@vitest/coverage-v8": "^4.1.6",
"@types/react": "^19.2.15",
"@types/semver": "^7.7.1",
"@vitest/coverage-v8": "^4.1.7",
"ajv": "8.20.0",
"ink-testing-library": "^4.0.0",
"typescript": "^6.0.3",
"vitest": "^4.1.6"
"vitest": "^4.1.7"
},
"license": "Apache-2.0",
"repository": {

View file

@ -1,7 +1,54 @@
import { cancel, confirm, isCancel, log, spinner } from '@clack/prompts';
import type { KtxCliIo } from './cli-runtime.js';
const ESC = String.fromCharCode(0x1b);
export interface CliStyleEnv {
NO_COLOR?: string;
TERM?: string;
}
function ansiEnabled(env: CliStyleEnv = process.env): boolean {
return !env.NO_COLOR && env.TERM !== 'dumb';
}
function ansiColor(text: string, open: number, close: number, env?: CliStyleEnv): string {
if (!ansiEnabled(env)) {
return text;
}
return `${ESC}[${open}m${text}${ESC}[${close}m`;
}
export function dim(text: string, env?: CliStyleEnv): string {
return ansiColor(text, 2, 22, env);
}
export function cyan(text: string, env?: CliStyleEnv): string {
return ansiColor(text, 36, 39, env);
}
export interface RailBufferedSource {
stdoutText(): string;
stderrText(): string;
}
export function errorMessage(error: unknown): string {
return error instanceof Error ? error.message : String(error);
}
export function writePrefixedLines(write: (chunk: string) => void, output: string): void {
for (const line of output.split(/\r?\n/)) {
if (line.length > 0) {
write(`${line}\n`);
}
}
}
export function flushPrefixedBufferedCommandOutput(io: KtxCliIo, buffered: RailBufferedSource): void {
writePrefixedLines((chunk) => io.stdout.write(chunk), buffered.stdoutText());
writePrefixedLines((chunk) => io.stderr.write(chunk), buffered.stderrText());
}
export interface KtxCliSpinner {
start(message: string): void;
message(message: string): void;
@ -38,11 +85,11 @@ export function createClackSpinner(): KtxCliSpinner {
}
function magenta(text: string): string {
return `${ESC}[35m${text}${ESC}[39m`;
return ansiColor(text, 35, 39);
}
function red(text: string): string {
return `${ESC}[31m${text}${ESC}[39m`;
return ansiColor(text, 31, 39);
}
export function createStaticCliSpinner(io: KtxCliSpinnerIo): KtxCliSpinner {
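
The style helpers in this file accept an optional environment object, which makes the `NO_COLOR` and `TERM=dumb` checks testable without mutating `process.env`. The sketch below is a simplified stand-in: `colorize` and `StyleEnv` are hypothetical names, and unlike the helpers above it requires the environment explicitly instead of defaulting to `process.env`.

```typescript
interface StyleEnv {
  NO_COLOR?: string;
  TERM?: string;
}

const ESCAPE = String.fromCharCode(0x1b);

// Wrap text in an SGR open/close pair unless the environment opts out of color.
function colorize(text: string, open: number, close: number, env: StyleEnv): string {
  const enabled = !env.NO_COLOR && env.TERM !== "dumb";
  return enabled ? `${ESCAPE}[${open}m${text}${ESCAPE}[${close}m` : text;
}

// Same call, different injected environments.
const plainText = colorize("ktx", 36, 39, { NO_COLOR: "1" });
const coloredText = colorize("ktx", 36, 39, { TERM: "xterm-256color" });
```

With `NO_COLOR` set the text comes back untouched, so piped or logged output stays free of escape sequences.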

View file

@ -2,6 +2,7 @@ import { existsSync } from 'node:fs';
import { join } from 'node:path';
import { Command, type CommandUnknownOpts, InvalidArgumentError } from '@commander-js/extra-typings';
import type { KtxCliDeps, KtxCliIo, KtxCliPackageInfo } from './cli-runtime.js';
import { registerCompletionCommands } from './commands/completion-commands.js';
import { registerConnectionCommands } from './commands/connection-commands.js';
import { registerIngestCommands } from './commands/ingest-commands.js';
import { registerWikiCommands } from './commands/knowledge-commands.js';
@ -15,6 +16,7 @@ import { renderMissingProjectMessage } from './doctor.js';
import { findNearestKtxProjectDir, resolveKtxProjectDir } from './project-resolver.js';
import { profileMark, profileSpan } from './startup-profile.js';
import type { CommandOutcome } from './telemetry/index.js';
import { prepareUpdateCheckNotice, type PrepareUpdateCheckNoticeOptions } from './update-check/update-check.js';
profileMark('module:cli-program');
@ -38,6 +40,8 @@ interface KtxCommanderProgramOptions {
runInit: (args: { projectDir: string; force: boolean }, io: KtxCliIo) => Promise<number>;
}
type KtxCliUpdateCheckOptions = Pick<PrepareUpdateCheckNoticeOptions, 'env' | 'fetchDistTags' | 'homeDir' | 'now'>;
export interface BuildKtxProgramOptions {
io: KtxCliIo;
deps: KtxCliDeps;
@ -46,6 +50,7 @@ export interface BuildKtxProgramOptions {
setExitCode?: (code: number) => void;
argv?: string[];
setTelemetryModule?: (telemetry: typeof import('./telemetry/index.js')) => void;
updateCheck?: KtxCliUpdateCheckOptions;
}
type CommanderExitLike = { exitCode: number; code: string; message: string };
@ -430,11 +435,29 @@ export function collectCommandFlagsPresent(command: CommandUnknownOpts): Record<
export function buildKtxProgram(options: BuildKtxProgramOptions): Command {
const program = createBaseProgram(options.packageInfo, options.io);
let pendingUpdateNotice: string | null = null;
program.hook('preAction', async (_thisCommand, actionCommand) => {
// The hidden completion command must stay silent and side-effect free: skip
// the telemetry notice, command span, project checks, and update checks entirely.
if (commandPath(actionCommand as CommandPathNode).includes('__complete')) {
return;
}
const commandNode = actionCommand as CommandPathNode;
const updateCheck = await prepareUpdateCheckNotice({
io: options.io,
env: options.updateCheck?.env,
fetchDistTags: options.updateCheck?.fetchDistTags,
homeDir: options.updateCheck?.homeDir,
installedVersion: options.packageInfo.version,
now: options.updateCheck?.now,
commandOptions: commandOptions(commandNode),
});
pendingUpdateNotice = updateCheck.notice;
const telemetry = await import('./telemetry/index.js');
options.setTelemetryModule?.(telemetry);
await telemetry.showTelemetryNoticeIfNeeded(options.io, options.packageInfo);
const commandNode = actionCommand as CommandPathNode;
const path = commandPath(commandNode);
const projectDir = resolveCommandProjectDir(commandNode);
const hasProject = ktxYamlExists(projectDir);
@ -451,6 +474,13 @@ export function buildKtxProgram(options: BuildKtxProgramOptions): Command {
ensureProjectAvailable(options.io, commandNode);
});
program.hook('postAction', () => {
if (pendingUpdateNotice) {
options.io.stderr.write(pendingUpdateNotice);
pendingUpdateNotice = null;
}
});
const context: KtxCliCommandContext = {
io: options.io,
deps: options.deps,
@ -476,6 +506,7 @@ export function buildKtxProgram(options: BuildKtxProgramOptions): Command {
registerStatusCommands(program, context);
registerMcpCommands(program, context);
registerAdminCommands(program, context);
registerCompletionCommands(program, context);
return program;
}
@ -522,6 +553,13 @@ export async function runCommanderKtxCli(
try {
return await runBareInteractiveCommand(program, io, context);
} catch (error) {
const telemetry = await import('./telemetry/index.js');
await telemetry.reportException({
error,
context: { source: 'bare-interactive', handled: true, fatal: false },
packageInfo: info,
io,
});
io.stderr.write(`${formatCliError(error)}\n`);
return 1;
}
@ -556,6 +594,23 @@ export async function runCommanderKtxCli(
outcome: commandOutcomeForParseResult(parseError, exitCode),
error: parseError,
});
if (
parseError &&
!isCommanderExit(parseError) &&
!isKtxProjectMissingAbortError(parseError)
) {
await telemetryModule.reportException({
error: parseError,
context: {
source: completed?.commandPath.join(' ') ?? 'commander parseAsync',
handled: true,
fatal: false,
},
projectDir: completed?.projectGroupAttached ? completed.projectDir : undefined,
packageInfo: info,
io,
});
}
await telemetryModule.emitCompletedCommand({ completed, packageInfo: info, io });
await telemetryModule.shutdownTelemetryEmitter();
}
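
The hooks above compute the update notice in `preAction` and print it in `postAction`. The commit description says display is gated only by a 24-hour `lastNoticeAt` throttle and is suppressed outright under `--json`, CI, and non-TTY output. A rough sketch of that decision follows; the function and field names are assumptions and do not reflect the actual `update-check` module.

```typescript
const NOTICE_INTERVAL_MS = 24 * 60 * 60 * 1000;

// Hypothetical inputs; the real module derives these from env, io, and a cache file.
interface NoticeContext {
  json: boolean;
  ci: boolean;
  isTty: boolean;
  lastNoticeAt: number | null; // epoch milliseconds of the last shown notice
  now: number; // epoch milliseconds
}

function shouldShowUpdateNotice(context: NoticeContext): boolean {
  // Hard suppression comes first, before any cache or preference is consulted.
  if (context.json || context.ci || !context.isTty) {
    return false;
  }
  if (context.lastNoticeAt === null) {
    return true;
  }
  return context.now - context.lastNoticeAt >= NOTICE_INTERVAL_MS;
}
```

Passing `now` in rather than calling `Date.now()` inside keeps the throttle deterministic in tests, in the same spirit as the injectable `now` option on `updateCheck` above.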

View file

@ -89,6 +89,88 @@ export async function runInitForCommander(
return await runInit(args, io);
}
function signalExitCode(signal: NodeJS.Signals): number {
// 128 + signal number: SIGINT (2) -> 130, SIGTERM (15) -> 143.
return signal === 'SIGTERM' ? 143 : 130;
}
/**
* Flush telemetry on interrupt for the real CLI process. `capture()` is
* fire-and-forget and the only flush guarantee lives in a `finally` a signal
* skips, so Ctrl-C / `kill` of a long-running command (ingest, `mcp stdio`)
* would otherwise drop its `command` event and queued events. Installed only
* when driving the actual process; programmatic/test callers pass their own
* `io` and never reach here. Returns a disposer that removes the listeners.
*/
function installTelemetrySignalFlush(io: KtxCliIo, info: KtxCliPackageInfo): () => void {
let handling = false;
const handle = (signal: NodeJS.Signals): void => {
if (handling) {
process.exit(signalExitCode(signal));
}
handling = true;
void (async () => {
try {
const { emitAbortedCommandAndShutdown } = await import('./telemetry/index.js');
await emitAbortedCommandAndShutdown({ packageInfo: info, io });
} catch {
// Best-effort: never let a telemetry hiccup block the interrupt exit.
}
process.exit(signalExitCode(signal));
})();
};
const onSigint = (): void => handle('SIGINT');
const onSigterm = (): void => handle('SIGTERM');
process.on('SIGINT', onSigint);
process.on('SIGTERM', onSigterm);
return () => {
process.off('SIGINT', onSigint);
process.off('SIGTERM', onSigterm);
};
}
/** @internal */
export function createGlobalExceptionReporter(io: KtxCliIo, info: KtxCliPackageInfo) {
return async (source: 'uncaughtException' | 'unhandledRejection', error: unknown): Promise<void> => {
const { reportException, shutdownTelemetryEmitter } = await import('./telemetry/index.js');
await reportException({
error,
context: { source, handled: false, fatal: true },
io,
packageInfo: info,
immediate: true,
});
await shutdownTelemetryEmitter();
};
}
export function installGlobalExceptionHandlers(io: KtxCliIo, info: KtxCliPackageInfo): () => void {
const report = createGlobalExceptionReporter(io, info);
const handle = (source: 'uncaughtException' | 'unhandledRejection', error: unknown): void => {
void (async () => {
try {
await report(source, error);
} catch {
// Best-effort: preserve Node's process termination behavior.
}
if (error instanceof Error && error.stack) {
io.stderr.write(`${error.stack}\n`);
} else {
io.stderr.write(`${String(error)}\n`);
}
process.exit(1);
})();
};
const onUncaught = (error: Error): void => handle('uncaughtException', error);
const onUnhandled = (reason: unknown): void => handle('unhandledRejection', reason);
process.on('uncaughtException', onUncaught);
process.on('unhandledRejection', onUnhandled);
return () => {
process.off('uncaughtException', onUncaught);
process.off('unhandledRejection', onUnhandled);
};
}
export async function runKtxCli(
argv = process.argv.slice(2),
io: KtxCliIo = process,
@ -98,7 +180,17 @@ export async function runKtxCli(
profileMark('runtime:runKtxCli');
const { runCommanderKtxCli } = await profileSpan('import ./cli-program.js', () => import('./cli-program.js'));
return await runCommanderKtxCli(argv, io, deps, info, {
runInit: runInitForCommander,
});
// Real-process entry only: flush telemetry if interrupted. Test/programmatic
// callers pass their own `io`, so they never install process-level handlers.
const removeSignalFlush = (io as unknown) === process ? installTelemetrySignalFlush(io, info) : undefined;
const removeGlobalExceptionHandlers =
(io as unknown) === process ? installGlobalExceptionHandlers(io, info) : undefined;
try {
return await runCommanderKtxCli(argv, io, deps, info, {
runInit: runInitForCommander,
});
} finally {
removeGlobalExceptionHandlers?.();
removeSignalFlush?.();
}
}
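
The interrupt handling in this file can be sketched in isolation. The snippet below is a minimal illustration, not the ktx implementation: `exitCodeForSignal`, `createInterruptHandler`, and the injected `flush` and `exit` callbacks are hypothetical names, introduced so the logic can run without touching the real process.

```typescript
// Hypothetical names throughout; this only illustrates the pattern in the diff.
type Signal = 'SIGINT' | 'SIGTERM';

// Conventional shell exit status for a signal: 128 + signal number.
const SIGNAL_NUMBERS: Record<Signal, number> = { SIGINT: 2, SIGTERM: 15 };

function exitCodeForSignal(signal: Signal): number {
  return 128 + SIGNAL_NUMBERS[signal];
}

// `flush` and `exit` are injected so the handler can be exercised in a test
// without installing process-level listeners or terminating the test runner.
function createInterruptHandler(
  flush: () => Promise<void>,
  exit: (code: number) => void,
): (signal: Signal) => Promise<void> {
  let handling = false;
  return async (signal) => {
    if (handling) {
      // A second interrupt while the flush is in flight: stop waiting.
      exit(exitCodeForSignal(signal));
      return;
    }
    handling = true;
    try {
      await flush();
    } catch {
      // Best-effort: a failed flush must not block the exit.
    }
    exit(exitCodeForSignal(signal));
  };
}
```

Injecting `exit` mirrors the diff's own split between real-process and programmatic callers: with a real `process.exit` the second call can never be observed, while a test double can record every requested exit code.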


@ -16,7 +16,11 @@ export function walkCommandTree(command: CommandUnknownOpts): CommandTreeNode {
description: command.description(),
aliases: command.aliases(),
arguments: command.registeredArguments.map(formatArgumentDeclaration),
children: command.commands.map((child) => walkCommandTree(child)),
// Internal commands (e.g. the shell-completion helper `__complete`) use a
// `__` prefix and are omitted from the human-facing command tree.
children: command.commands
.filter((child) => !child.name().startsWith('__'))
.map((child) => walkCommandTree(child)),
};
}
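
As a rough illustration of the prefix filter above, here is a sketch over a plain data model. The `CommandLike` and `TreeNode` shapes and the `walkVisible` name are assumptions made for the example; the real code walks Commander command objects.

```typescript
// Hypothetical minimal shapes standing in for Commander commands.
interface CommandLike {
  name: string;
  children: CommandLike[];
}

interface TreeNode {
  name: string;
  children: TreeNode[];
}

// Recursively copy the tree, dropping any child whose name starts with `__`.
function walkVisible(command: CommandLike): TreeNode {
  return {
    name: command.name,
    children: command.children
      .filter((child) => !child.name.startsWith('__'))
      .map((child) => walkVisible(child)),
  };
}

const tree = walkVisible({
  name: 'ktx',
  children: [
    { name: 'completion', children: [] },
    { name: '__complete', children: [] },
  ],
});
console.log(tree.children.map((child) => child.name)); // prints [ 'completion' ]
```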


@ -0,0 +1,44 @@
import { Argument, type Command } from '@commander-js/extra-typings';
import type { KtxCliCommandContext } from '../cli-program.js';
import { computeCompletions } from '../completion/complete-engine.js';
import { completionScript } from '../completion/completion-scripts.js';
import { createProjectCompletionProviders } from '../completion/dynamic-candidates.js';
import { profileMark } from '../startup-profile.js';
profileMark('module:commands/completion-commands');
export function registerCompletionCommands(program: Command, context: KtxCliCommandContext): void {
program
.command('completion')
.description('Print a shell completion script for ktx')
.addArgument(new Argument('<shell>', 'Target shell').choices(['zsh', 'bash']))
.addHelpText(
'after',
'\nEnable completion by adding the matching line to your shell startup file:\n' +
' zsh: eval "$(ktx completion zsh)"\n' +
' bash: eval "$(ktx completion bash)"\n',
)
.action((shell) => {
context.io.stdout.write(completionScript(shell));
});
// Hidden command invoked by the generated shell scripts. It must only ever
// print newline-separated candidates to stdout and exit 0, so a TAB press is
// never disrupted by an error, a telemetry notice, or a parse failure.
program
.command('__complete', { hidden: true })
.argument('[words...]')
.allowUnknownOption(true)
.helpOption(false)
.action(async (words: string[]) => {
try {
const candidates = await computeCompletions(program, words, createProjectCompletionProviders());
if (candidates.length > 0) {
context.io.stdout.write(`${candidates.join('\n')}\n`);
}
} catch {
// Swallow: completion must never break the shell.
}
context.setExitCode(0);
});
}


@ -29,8 +29,6 @@ export function registerIngestCommands(
.usage('[options] [connectionId]')
.argument('[connectionId]', 'Configured connection id to ingest (omit to ingest all)')
.option('--all', 'Ingest all configured connections', false)
.addOption(new Option('--fast', 'Use deterministic database schema ingest').conflicts('deep'))
.addOption(new Option('--deep', 'Use AI-enriched database ingest').conflicts('fast'))
.addOption(new Option('--query-history', 'Include database query-history usage patterns').conflicts('noQueryHistory'))
.addOption(new Option('--no-query-history', 'Skip database query-history usage patterns'))
.option('--query-history-window-days <days>', 'Query-history lookback window for this run', parsePositiveIntegerOption)
@ -87,8 +85,6 @@ export function registerIngestCommands(
all: selection.kind === 'all',
json: options.json === true,
inputMode: options.input === false ? 'disabled' : 'auto',
...(options.fast === true ? { depth: 'fast' as const } : {}),
...(options.deep === true ? { depth: 'deep' as const } : {}),
queryHistory,
...(options.queryHistoryWindowDays !== undefined ? { queryHistoryWindowDays: options.queryHistoryWindowDays } : {}),
cliVersion: context.packageInfo.version,


@ -21,9 +21,9 @@ function isDebugEnabled(command: CommandWithGlobalOptions): boolean {
}
export function registerWikiCommands(program: Command, context: KtxCliCommandContext): void {
program
const wiki = program
.command('wiki')
.description('List or search local wiki pages')
.description('List, search, or read local wiki pages')
.usage('[options] [query...]')
.argument('[query...]', 'Search query; omit to list all pages')
.option('--user-id <id>', 'Local user id', 'local')
@ -76,4 +76,18 @@ export function registerWikiCommands(program: Command, context: KtxCliCommandCon
});
},
);
wiki
.command('read')
.description('Read a wiki page file by key')
.argument('<key>', 'Wiki page key')
.action(async (key: string, _options, command) => {
const parentOpts = command.parent?.opts() as { userId?: string } | undefined;
await runKnowledgeArgs(context, {
command: 'read',
projectDir: resolveCommandProjectDir(command),
key,
userId: parentOpts?.userId ?? 'local',
});
});
}


@ -29,7 +29,7 @@ function embeddingBackend(value: string): 'openai' | 'sentence-transformers' {
}
function llmBackend(value: string): KtxSetupLlmBackend {
if (value === 'anthropic' || value === 'vertex' || value === 'claude-code') {
if (value === 'anthropic' || value === 'vertex' || value === 'claude-code' || value === 'codex') {
return value;
}
throw new InvalidArgumentError(`invalid choice '${value}'`);
@ -308,9 +308,14 @@ export function registerSetupCommands(program: Command, context: KtxCliCommandCo
.addOption(new Option('--source-git-url <url>', 'Git URL for dbt, MetricFlow, or LookML').hideHelp())
.addOption(new Option('--source-branch <branch>', 'Git branch for source setup').hideHelp())
.addOption(new Option('--source-subpath <path>', 'Repo subpath for source setup').hideHelp())
.addOption(new Option('--source-auth-token-ref <ref>', 'env: or file: credential ref for source repo auth').hideHelp())
.addOption(
new Option(
'--source-auth-token-ref <ref>',
'env: or file: credential ref for source repo auth or Notion integration token',
).hideHelp(),
)
.addOption(new Option('--source-url <url>', 'Source service URL for Metabase or Looker').hideHelp())
.addOption(new Option('--source-api-key-ref <ref>', 'env: or file: API key ref for Metabase or Notion').hideHelp())
.addOption(new Option('--source-api-key-ref <ref>', 'env: or file: API key ref for Metabase').hideHelp())
.addOption(new Option('--source-client-id <id>', 'Looker client id').hideHelp())
.addOption(new Option('--source-client-secret-ref <ref>', 'env: or file: Looker client secret ref').hideHelp())
.addOption(new Option('--source-warehouse-connection-id <id>', 'Mapped warehouse connection id').hideHelp())
@ -401,6 +406,8 @@ export function registerSetupCommands(program: Command, context: KtxCliCommandCo
}
const resolvedAgentScope = options.local ? 'local' : options.global ? 'global' : 'project';
const debugEnabled =
((command.optsWithGlobals ? command.optsWithGlobals() : command.opts()) as { debug?: unknown }).debug === true;
await runSetupArgs(context, {
command: 'run',
projectDir: resolveCommandProjectDir(command),
@ -410,6 +417,7 @@ export function registerSetupCommands(program: Command, context: KtxCliCommandCo
agentScope: resolvedAgentScope,
skipAgents: options.skipAgents === true,
inputMode: options.input === false ? 'disabled' : 'auto',
...(debugEnabled ? { debug: true } : {}),
yes: options.yes === true,
cliVersion: context.packageInfo.version,
...(options.llmBackend ? { llmBackend: options.llmBackend } : {}),


@ -94,19 +94,28 @@ export function registerSlCommands(program: Command, context: KtxCliCommandConte
},
);
sl.command('validate')
.description('Validate a semantic-layer source (set --connection-id on `ktx sl`)')
sl.command('read')
.description('Read a semantic-layer source YAML file')
.argument('<sourceName>', 'Semantic-layer source name')
.action(async (sourceName: string, _options, command) => {
const parentOpts = command.parent?.opts() as { connectionId?: string } | undefined;
await runSlArgs(context, {
command: 'read',
projectDir: resolveCommandProjectDir(command),
connectionId: parentOpts?.connectionId,
sourceName,
});
});
sl.command('validate')
.description('Validate a semantic-layer source')
.argument('<sourceName>', 'Semantic-layer source name')
.action(async (sourceName: string, _options, command) => {
const parentOpts = command.parent?.opts() as { connectionId?: string } | undefined;
const connectionId = parentOpts?.connectionId;
if (connectionId === undefined) {
command.error("error: required option '--connection-id <id>' not specified");
}
await runSlArgs(context, {
command: 'validate',
projectDir: resolveCommandProjectDir(command),
connectionId: connectionId as string,
connectionId: parentOpts?.connectionId,
sourceName,
});
});
@ -131,10 +140,14 @@ export function registerSlCommands(program: Command, context: KtxCliCommandConte
throw new Error('sl query requires at least one --measure');
}
const parentOpts = command.parent?.opts() as { connectionId?: string } | undefined;
const connectionId = parentOpts?.connectionId;
if (connectionId === undefined) {
command.error("error: required option '--connection-id <id>' not specified");
}
const args = slQueryCommandSchema.parse({
command: 'query',
projectDir: resolveCommandProjectDir(command),
connectionId: parentOpts?.connectionId,
connectionId,
...(options.queryFile
? { queryFile: options.queryFile }
: {


@ -0,0 +1,172 @@
import type { CommandUnknownOpts, Option } from '@commander-js/extra-typings';
/**
* Dynamic completion candidates that depend on project state (semantic-layer
* source names, wiki page keys, connection ids). Injected so the engine stays
* pure and unit-testable without touching the filesystem.
*/
export interface CompletionProviders {
/** Candidate operands for a positional argument of the active command path. */
positionalCandidates(commandPath: string[], typedTokens: string[]): Promise<string[]>;
/** Candidate values for an option that has no static `choices` (e.g. `--connection-id`). */
optionValueCandidates(commandPath: string[], optionFlag: string, typedTokens: string[]): Promise<string[]>;
}
interface ResolvedCommand {
command: CommandUnknownOpts;
/** Subcommand names from the root down to the active command (root name excluded). */
commandPath: string[];
}
function isHiddenCommand(command: CommandUnknownOpts): boolean {
// Completion mirrors `ktx --help`: commands registered with `{ hidden: true }`
// (the `__complete` helper and `mcp serve-internal`) are internal and must not
// surface. Commander exposes this only through the private `_hidden` field its
// own help renderer reads, so a name heuristic like a `__` prefix is not enough.
return (command as { _hidden?: boolean })._hidden === true;
}
function resolveCommand(program: CommandUnknownOpts, typedTokens: string[]): ResolvedCommand {
let command: CommandUnknownOpts = program;
const commandPath: string[] = [];
for (let index = 0; index < typedTokens.length; index += 1) {
const token = typedTokens[index];
if (token.startsWith('-')) {
// A value-taking option in the `--flag value` form consumes the next token
// as its value, so skip that value before matching subcommands. Otherwise a
// connection id like `query` would be resolved as the `sl query` subcommand
// instead of being treated as the `--connection-id` value. The `--flag=value`
// form carries its own value and consumes nothing extra.
if (!token.includes('=')) {
const option = findOption(command, token);
if (option && !option.isBoolean()) {
index += 1;
}
}
continue;
}
const sub = command.commands.find((candidate) => candidate.name() === token || candidate.aliases().includes(token));
if (sub) {
command = sub;
commandPath.push(sub.name());
}
}
return { command, commandPath };
}
function collectOptions(command: CommandUnknownOpts): Option[] {
const options: Option[] = [];
let current: CommandUnknownOpts | null = command;
while (current) {
options.push(...current.options);
current = current.parent;
}
return options;
}
function findOption(command: CommandUnknownOpts, flag: string): Option | undefined {
return collectOptions(command).find((option) => option.long === flag || option.short === flag);
}
function isRepeatableOption(option: Option): boolean {
// Variadic options, and options backed by a collector with an array default
// (e.g. `--measure`/`--dimension`), may be supplied more than once.
return option.variadic || Array.isArray(option.defaultValue);
}
function flagCandidates(command: CommandUnknownOpts, typedTokens: string[]): string[] {
const present = new Set(typedTokens.filter((token) => token.startsWith('-')));
const candidates: string[] = [];
for (const option of collectOptions(command)) {
if (option.hidden || !option.long) {
continue;
}
if (present.has(option.long) && !isRepeatableOption(option)) {
continue;
}
candidates.push(option.long);
}
return candidates;
}
async function optionValueCandidates(
resolved: ResolvedCommand,
option: Option,
typedTokens: string[],
providers: CompletionProviders,
): Promise<string[]> {
if (option.argChoices && option.argChoices.length > 0) {
return option.argChoices;
}
return providers.optionValueCandidates(resolved.commandPath, option.long ?? option.name(), typedTokens);
}
function dedupeSortFilter(candidates: string[], partial: string): string[] {
const seen = new Set<string>();
const matches: string[] = [];
for (const candidate of candidates) {
if (!candidate.startsWith(partial) || seen.has(candidate)) {
continue;
}
seen.add(candidate);
matches.push(candidate);
}
return matches.sort();
}
/**
* Compute completion candidates for the partial last element of `words`
* (everything the shell has on the line after `ktx`). The active command and
* its flags are derived by walking the live Commander tree, so completion never
* drifts from the real command structure.
*/
export async function computeCompletions(
program: CommandUnknownOpts,
words: string[],
providers: CompletionProviders,
): Promise<string[]> {
const partial = words.length > 0 ? (words[words.length - 1] ?? '') : '';
const typedTokens = words.slice(0, -1);
const resolved = resolveCommand(program, typedTokens);
// (a) Option value via the `--opt=value` form.
const equalsMatch = /^(--[^=]+)=(.*)$/.exec(partial);
if (equalsMatch) {
const [, flag, valuePartial] = equalsMatch;
const option = findOption(resolved.command, flag);
if (!option || option.isBoolean()) {
return [];
}
const values = await optionValueCandidates(resolved, option, typedTokens, providers);
return dedupeSortFilter(
values.map((value) => `${flag}=${value}`),
`${flag}=${valuePartial}`,
);
}
// (b) Option value via the `--opt value` form (previous token is a value-taking option).
const previous = typedTokens[typedTokens.length - 1];
if (previous && previous.startsWith('-') && !partial.startsWith('-')) {
const option = findOption(resolved.command, previous);
if (option && !option.isBoolean()) {
return dedupeSortFilter(await optionValueCandidates(resolved, option, typedTokens, providers), partial);
}
}
// (c) Flag completion.
if (partial.startsWith('-')) {
return dedupeSortFilter(flagCandidates(resolved.command, typedTokens), partial);
}
// (d) Positional: the union of subcommand names, static argument choices, and dynamic operand candidates.
const candidates: string[] = resolved.command.commands
.filter((sub) => !isHiddenCommand(sub))
.map((sub) => sub.name());
for (const argument of resolved.command.registeredArguments) {
if (argument.argChoices) {
candidates.push(...argument.argChoices);
}
}
candidates.push(...(await providers.positionalCandidates(resolved.commandPath, typedTokens)));
return dedupeSortFilter(candidates, partial);
}
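
The reason `resolveCommand` skips the value of a `--flag value` option is easiest to see on a toy command tree. The sketch below is a simplified model under stated assumptions: `Cmd`, `valueOptions`, and `resolvePath` are hypothetical, and unlike the real engine it only consults options declared on the current command rather than walking up to parent commands.

```typescript
// Hypothetical, simplified command model (not Commander's API).
interface Cmd {
  name: string;
  valueOptions: string[]; // long flags that take a value
  subcommands: Cmd[];
}

function resolvePath(root: Cmd, typedTokens: string[]): string[] {
  let current = root;
  const path: string[] = [];
  for (let index = 0; index < typedTokens.length; index += 1) {
    const token = typedTokens[index] ?? '';
    if (token.startsWith('-')) {
      // `--flag value` consumes the next token; `--flag=value` does not.
      if (!token.includes('=') && current.valueOptions.includes(token)) {
        index += 1;
      }
      continue;
    }
    const sub = current.subcommands.find((candidate) => candidate.name === token);
    if (sub) {
      current = sub;
      path.push(sub.name);
    }
  }
  return path;
}

const sl: Cmd = {
  name: 'sl',
  valueOptions: ['--connection-id'],
  subcommands: [
    { name: 'query', valueOptions: [], subcommands: [] },
    { name: 'validate', valueOptions: [], subcommands: [] },
  ],
};
const root: Cmd = { name: 'ktx', valueOptions: [], subcommands: [sl] };

// A connection id that happens to be called `query` is treated as a value.
console.log(resolvePath(root, ['sl', '--connection-id', 'query', 'validate']));
// prints [ 'sl', 'validate' ]
```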


@ -0,0 +1,39 @@
// Static shell completion scripts emitted by `ktx completion <shell>`.
//
// Both scripts gather the words on the current command line (excluding the
// leading `ktx`), append the partial word under the cursor, and delegate to the
// hidden `ktx __complete` command, which prints newline-separated candidates.
// All command/flag/entity knowledge lives in `ktx __complete` so these scripts
// never have to encode the command tree.
//
// Lines are single-quoted JS strings so the shell `${...}` expansions are
// emitted verbatim (a template literal would try to interpolate them).
const ZSH_SCRIPT = [
'#compdef ktx',
'_ktx() {',
' local -a candidates',
' local out',
' out="$(ktx __complete -- "${words[@]:1:$((CURRENT-1))}" 2>/dev/null)" || return 0',
' candidates=("${(@f)out}")',
' compadd -- $candidates',
'}',
'compdef _ktx ktx',
'',
].join('\n');
const BASH_SCRIPT = [
'_ktx() {',
' local cur out',
' cur="${COMP_WORDS[COMP_CWORD]}"',
' out="$(ktx __complete -- "${COMP_WORDS[@]:1:COMP_CWORD}" 2>/dev/null)" || { COMPREPLY=(); return 0; }',
" local IFS=$'\\n'",
' COMPREPLY=($(compgen -W "${out}" -- "$cur"))',
'}',
'complete -F _ktx ktx',
'',
].join('\n');
export function completionScript(shell: 'zsh' | 'bash'): string {
return shell === 'zsh' ? ZSH_SCRIPT : BASH_SCRIPT;
}
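
To make the hand-off from the bash script to `ktx __complete` concrete, the following sketch models the slice `${COMP_WORDS[@]:1:COMP_CWORD}` and the split into typed tokens plus a partial word. The function names are hypothetical; only the bash slice semantics (start at index 1, take `COMP_CWORD` elements) and the last-word-is-partial convention come from the code shown here.

```typescript
// Models bash `${COMP_WORDS[@]:1:COMP_CWORD}`: skip the leading `ktx`, then
// take COMP_CWORD elements, which ends with the word under the cursor.
function wordsAfterProgram(compWords: string[], compCword: number): string[] {
  return compWords.slice(1, 1 + compCword);
}

// The completion engine treats the last word as the partial being completed
// and everything before it as already-typed tokens.
function splitForCompletion(words: string[]): { typedTokens: string[]; partial: string } {
  return {
    typedTokens: words.slice(0, -1),
    partial: words[words.length - 1] ?? '',
  };
}

// `ktx sl read or<TAB>`: COMP_WORDS=(ktx sl read or), COMP_CWORD=3.
const words = wordsAfterProgram(['ktx', 'sl', 'read', 'or'], 3);
console.log(splitForCompletion(words));
// prints { typedTokens: [ 'sl', 'read' ], partial: 'or' }
```

When the cursor sits on a fresh, empty word, bash supplies an empty string at `COMP_CWORD`, so the partial is `''` and every candidate matches the prefix filter.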


@ -0,0 +1,103 @@
import { existsSync } from 'node:fs';
import { join } from 'node:path';
import type { KtxLocalProject } from '../context/project/project.js';
import { resolveKtxProjectDir } from '../project-resolver.js';
import type { CompletionProviders } from './complete-engine.js';
/** Extract an option value from already-typed tokens (`--flag value` or `--flag=value`). */
function extractOptionValue(tokens: string[], flag: string): string | undefined {
const prefix = `${flag}=`;
for (let index = 0; index < tokens.length; index += 1) {
const token = tokens[index];
if (token === flag) {
const next = tokens[index + 1];
if (next !== undefined && !next.startsWith('-')) {
return next;
}
} else if (token.startsWith(prefix)) {
return token.slice(prefix.length);
}
}
return undefined;
}
/**
* Resolve and load the project the user is completing against. Honors a
* `--project-dir` typed on the line, then `KTX_PROJECT_DIR`, then the nearest
* `ktx.yaml`. Returns null (no completions) when there is no project, without
* creating any files.
*/
async function loadCompletionProject(typedTokens: string[]): Promise<KtxLocalProject | null> {
const explicitProjectDir = extractOptionValue(typedTokens, '--project-dir');
const projectDir = resolveKtxProjectDir(explicitProjectDir !== undefined ? { explicitProjectDir } : {});
if (!existsSync(join(projectDir, 'ktx.yaml'))) {
return null;
}
const { loadKtxProject } = await import('../context/project/project.js');
return loadKtxProject({ projectDir });
}
async function sourceNames(typedTokens: string[]): Promise<string[]> {
const project = await loadCompletionProject(typedTokens);
if (!project) {
return [];
}
const connectionId = extractOptionValue(typedTokens, '--connection-id');
const { listLocalSlSources } = await import('../context/sl/local-sl.js');
const summaries = await listLocalSlSources(project, connectionId !== undefined ? { connectionId } : {});
return [...new Set(summaries.map((summary) => summary.name))];
}
async function wikiPageKeys(typedTokens: string[]): Promise<string[]> {
const project = await loadCompletionProject(typedTokens);
if (!project) {
return [];
}
const userId = extractOptionValue(typedTokens, '--user-id');
const { listLocalKnowledgePageKeys } = await import('../context/wiki/local-knowledge.js');
return listLocalKnowledgePageKeys(project, userId !== undefined ? { userId } : {});
}
async function connectionIds(typedTokens: string[]): Promise<string[]> {
const project = await loadCompletionProject(typedTokens);
if (!project) {
return [];
}
return Object.keys(project.config.connections).sort();
}
/**
* Project-backed completion providers. Every entry swallows its own errors so a
* failed lookup never breaks the shell; completion degrades to commands/flags.
*/
export function createProjectCompletionProviders(): CompletionProviders {
return {
async positionalCandidates(commandPath, typedTokens) {
try {
const key = commandPath.join(' ');
if (key === 'sl read' || key === 'sl validate') {
return await sourceNames(typedTokens);
}
if (key === 'wiki read') {
return await wikiPageKeys(typedTokens);
}
if (key === 'connection test' || key === 'ingest') {
return await connectionIds(typedTokens);
}
return [];
} catch {
return [];
}
},
async optionValueCandidates(_commandPath, optionFlag, typedTokens) {
try {
if (optionFlag === '--connection-id' || optionFlag === '--connection') {
return await connectionIds(typedTokens);
}
return [];
} catch {
return [];
}
},
};
}
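
Both provider methods repeat the same try/catch-and-return-empty shape. One possible way to factor that out is a small wrapper, sketched below; `orEmpty` is a hypothetical helper and is not part of the diff.

```typescript
// Hypothetical helper: wrap an async lookup so any failure yields no
// candidates instead of propagating into the shell completion.
function orEmpty<Args extends unknown[]>(
  lookup: (...args: Args) => Promise<string[]>,
): (...args: Args) => Promise<string[]> {
  return async (...args: Args): Promise<string[]> => {
    try {
      return await lookup(...args);
    } catch {
      return [];
    }
  };
}

// Stand-in for a project lookup that fails (e.g. an unreadable config file).
const failingLookup = orEmpty(async (_typedTokens: string[]): Promise<string[]> => {
  throw new Error('project could not be loaded');
});

// Stand-in for a lookup that succeeds.
const workingLookup = orEmpty(async (_typedTokens: string[]): Promise<string[]> => ['warehouse']);
```

Because the call happens inside the `try` of an `async` function, the wrapper covers both rejected promises and synchronous throws from `lookup`.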


@ -0,0 +1,21 @@
import type { KtxProjectConnectionConfig } from './context/project/config.js';
const KTX_DATABASE_DRIVER_IDS = new Set([
'sqlite',
'postgres',
'mysql',
'clickhouse',
'sqlserver',
'bigquery',
'snowflake',
]);
export function normalizeConnectionDriver(connection: KtxProjectConnectionConfig): string {
return String(connection.driver ?? '')
.trim()
.toLowerCase();
}
export function isDatabaseDriver(driver: string): boolean {
return KTX_DATABASE_DRIVER_IDS.has(driver.trim().toLowerCase());
}
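
A short usage sketch of the two helpers, using a trimmed copy so it runs on its own. The `{ driver?: unknown }` parameter type is an assumption standing in for `KtxProjectConnectionConfig`, whose definition is not shown here. Note that, with the set as written in this file, alias spellings such as `postgresql` are not recognized.

```typescript
// Copy of the driver ids from the file above, so the example is self-contained.
const DATABASE_DRIVER_IDS = new Set([
  'sqlite',
  'postgres',
  'mysql',
  'clickhouse',
  'sqlserver',
  'bigquery',
  'snowflake',
]);

// Assumed connection shape; the real type is KtxProjectConnectionConfig.
function normalizeDriver(connection: { driver?: unknown }): string {
  return String(connection.driver ?? '')
    .trim()
    .toLowerCase();
}

function isDatabaseDriver(driver: string): boolean {
  return DATABASE_DRIVER_IDS.has(driver.trim().toLowerCase());
}

console.log(isDatabaseDriver('  Postgres ')); // prints true
console.log(isDatabaseDriver('postgresql')); // prints false
console.log(normalizeDriver({})); // prints an empty string
```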


@ -0,0 +1,132 @@
import type { KtxCliIo } from './cli-runtime.js';
import type { KtxSetupPromptOption } from './setup-prompts.js';
export type RecoveryOutcome = 'ready' | 'skip' | 'back' | 'failed';
/** @internal */
export interface RecoveryAction {
value: string;
label: string;
run: () => Promise<void>;
}
export type ConfigureResult = 'configured' | 'back' | 'cancelled';
export type ValidateResult =
| { status: 'ok' }
| { status: 'back' }
| { status: 'failed'; extraActions?: RecoveryAction[] };
export interface ConnectionRecoveryInput {
label: string;
interactive: boolean;
allowSkip: boolean;
io: KtxCliIo;
prompts: {
select(options: { message: string; options: KtxSetupPromptOption[] }): Promise<string>;
};
snapshot: () => Promise<() => Promise<void>>;
configure: () => Promise<ConfigureResult>;
validate: () => Promise<ValidateResult>;
}
async function runRollbackOnce(input: {
rollback: () => Promise<void>;
state: { rolledBack: boolean };
}): Promise<void> {
if (input.state.rolledBack) {
return;
}
input.state.rolledBack = true;
await input.rollback();
}
function recoveryOptions(input: {
allowSkip: boolean;
extraActions?: RecoveryAction[];
}): KtxSetupPromptOption[] {
return [
{ value: 'retry', label: 'Retry connection test' },
{ value: 're-enter', label: 'Re-enter connection details' },
...(input.extraActions ?? []).map((action) => ({
value: action.value,
label: action.label,
})),
...(input.allowSkip ? [{ value: 'skip', label: 'Skip this connection' }] : []),
{ value: 'back', label: 'Back' },
];
}
export async function runConnectionSetupWithRecovery(
input: ConnectionRecoveryInput,
): Promise<RecoveryOutcome> {
const rollback = await input.snapshot();
const rollbackState = { rolledBack: false };
const firstConfig = await input.configure();
if (firstConfig === 'back') {
await runRollbackOnce({ rollback, state: rollbackState });
return 'back';
}
if (firstConfig === 'cancelled') {
await runRollbackOnce({ rollback, state: rollbackState });
return 'failed';
}
let validation = await input.validate();
while (validation.status !== 'ok') {
if (validation.status === 'back') {
await runRollbackOnce({ rollback, state: rollbackState });
return 'back';
}
if (!input.interactive) {
return 'failed';
}
const action = await input.prompts.select({
message: `Connection setup failed for ${input.label}`,
options: recoveryOptions({
allowSkip: input.allowSkip,
extraActions: validation.extraActions,
}),
});
if (action === 'back') {
await runRollbackOnce({ rollback, state: rollbackState });
return 'back';
}
if (action === 'skip' && input.allowSkip) {
await runRollbackOnce({ rollback, state: rollbackState });
return 'skip';
}
if (action === 're-enter') {
const nextConfig = await input.configure();
if (nextConfig === 'back') {
await runRollbackOnce({ rollback, state: rollbackState });
return 'back';
}
if (nextConfig === 'cancelled') {
await runRollbackOnce({ rollback, state: rollbackState });
return 'failed';
}
validation = await input.validate();
continue;
}
if (action === 'retry') {
validation = await input.validate();
continue;
}
const extraAction = validation.extraActions?.find((candidate) => candidate.value === action);
if (extraAction) {
await extraAction.run();
validation = await input.validate();
continue;
}
validation = await input.validate();
}
return 'ready';
}
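
`runRollbackOnce` guards the rollback with a shared `{ rolledBack }` state object. The same at-most-once guarantee can also be expressed as a closure, sketched below; `once` is a hypothetical helper shown only to illustrate the idea, not a proposed change to the file.

```typescript
// Hypothetical alternative: wrap the rollback so repeated calls are no-ops.
function once(action: () => Promise<void>): () => Promise<void> {
  let done = false;
  return async () => {
    if (done) {
      return;
    }
    // Mark as done before awaiting so a concurrent second call is ignored too.
    done = true;
    await action();
  };
}

let rollbackCalls = 0;
const rollback = once(async () => {
  rollbackCalls += 1;
});
```

The explicit state object in the diff keeps the flag visible to the caller, which a closure hides; which is preferable depends on whether other code needs to inspect whether a rollback has happened.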


@ -6,6 +6,7 @@ import { type NotionBotInfo, NotionClient } from './context/ingest/adapters/noti
import { createLocalLookerCredentialResolver } from './context/ingest/adapters/looker/local-looker.adapter.js';
import { metabaseRuntimeConfigFromLocalConnection } from './context/ingest/adapters/metabase/local-metabase.adapter.js';
import { testRepoConnection } from './context/ingest/repo-fetch.js';
import { getDriverRegistration } from './context/connections/drivers.js';
import { parseNotionConnectionConfig, resolveNotionConnectionAuthToken } from './context/connections/notion-config.js';
import { resolveKtxConfigReference } from './context/core/config-reference.js';
import { type KtxLocalProject, loadKtxProject } from './context/project/project.js';
@ -15,8 +16,9 @@ import { bold, dim, green, red, SYMBOLS } from './io/symbols.js';
import { createKtxCliScanConnector } from './local-scan-connectors.js';
import { profileMark } from './startup-profile.js';
import { isDemoConnection } from './telemetry/demo-detect.js';
import { emitTelemetryEvent } from './telemetry/index.js';
import { scrubErrorClass } from './telemetry/scrubber.js';
import { emitTelemetryEvent, reportException } from './telemetry/index.js';
import { collectTelemetryRedactionSecrets } from './telemetry/redaction-secrets.js';
import { formatErrorDetail, scrubErrorClass } from './telemetry/scrubber.js';
profileMark('module:connection');
@ -73,6 +75,12 @@ async function testNativeConnection(
}
const result = await connector.testConnection();
if (!result.success) {
// Re-throw the driver's original error so connection_test telemetry records
// its real class (e.g. ConnectionError) and code (e.g. ELOGIN) instead of
// collapsing every native failure to a generic Error with no code.
if (result.cause instanceof Error) {
throw result.cause;
}
throw new Error(result.error ?? 'connection test failed');
}
return { driver: connector.driver };
@ -272,17 +280,7 @@ async function testConnectionByDriver(
return { driver, detailKey: 'Repo', detailValue: result.repoUrl };
}
if (
driver === 'sqlite' ||
driver === 'sqlite3' ||
driver === 'postgres' ||
driver === 'postgresql' ||
driver === 'mysql' ||
driver === 'clickhouse' ||
driver === 'sqlserver' ||
driver === 'bigquery' ||
driver === 'snowflake'
) {
if (getDriverRegistration(driver)) {
const result = await testNativeConnection(
project,
connectionId,
@ -313,6 +311,7 @@ async function emitConnectionTest(input: {
io: KtxCliIo;
}): Promise<void> {
const errorClass = input.error ? scrubErrorClass(input.error) : undefined;
const errorDetail = input.error ? formatErrorDetail(input.error) : undefined;
await emitTelemetryEvent({
name: 'connection_test',
projectDir: input.project.projectDir,
@ -323,8 +322,24 @@ async function emitConnectionTest(input: {
outcome: input.outcome,
durationMs: input.durationMs,
...(errorClass ? { errorClass } : {}),
...(errorDetail ? { errorDetail } : {}),
},
});
if (input.error) {
await reportException({
error: input.error,
context: { source: 'connection test', handled: true, fatal: false },
projectDir: input.project.projectDir,
io: input.io,
redactionSecrets: await collectTelemetryRedactionSecrets({
project: input.project,
connectionId: input.connectionId,
includeLlm: false,
includeEmbeddings: false,
env: process.env,
}),
});
}
}
function visualWidth(text: string): number {


@ -1,12 +1,34 @@
import { BigQuery, type TableField } from '@google-cloud/bigquery';
import { normalizeBigQueryProjectId, normalizeBigQueryRegion } from '../../context/connections/bigquery-identifiers.js';
import { getDialectForDriver } from '../../context/connections/dialects.js';
import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js';
import { createKtxConnectorCapabilities, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaColumn, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableListEntry, type KtxTableRef, type KtxTableSampleInput, type KtxTableSampleResult } from '../../context/scan/types.js';
import { tryConstraintQuery } from '../../context/scan/constraint-discovery.js';
import { scopedTableNames } from '../../context/scan/table-ref.js';
import {
connectorTestFailure,
createKtxConnectorCapabilities,
type KtxConnectorTestResult,
type KtxColumnSampleInput,
type KtxColumnSampleResult,
type KtxColumnStatsInput,
type KtxColumnStatsResult,
type KtxQueryResult,
type KtxReadOnlyQueryInput,
type KtxScanConnector,
type KtxScanContext,
type KtxScanInput,
type KtxScanWarning,
type KtxSchemaColumn,
type KtxSchemaSnapshot,
type KtxSchemaTable,
type KtxTableListEntry,
type KtxTableRef,
type KtxTableSampleInput,
type KtxTableSampleResult,
} from '../../context/scan/types.js';
import { readFileSync } from 'node:fs';
import { homedir } from 'node:os';
import { resolve } from 'node:path';
import { KtxBigQueryDialect } from './dialect.js';
export interface KtxBigQueryConnectionConfig {
driver?: string;
@ -185,6 +207,17 @@ function firstNumber(value: unknown): number | null {
return Number.isFinite(numberValue) ? numberValue : null;
}
function isDeniedError(error: unknown): boolean {
if (!error || typeof error !== 'object') {
return false;
}
const candidate = error as { code?: unknown; errors?: Array<{ reason?: unknown }> };
return (
candidate.code === 403 ||
candidate.errors?.some((item) => item.reason === 'accessDenied' || item.reason === 'notFound') === true
);
}
function normalizeValue(value: unknown): unknown {
if (value === null || value === undefined) {
return null;
@ -204,6 +237,23 @@ function normalizeValue(value: unknown): unknown {
return value;
}
/** @internal */
export function prepareBigQueryReadOnlyQuery(
sql: string,
params?: Record<string, unknown>,
): { sql: string; params?: Record<string, unknown> } {
if (!params) {
return { sql, params: undefined };
}
let processedSql = sql;
const processedParams: Record<string, unknown> = {};
for (const [key, value] of Object.entries(params)) {
processedSql = processedSql.replace(new RegExp(`:${key}\\b`, 'g'), `@${key}`);
processedParams[key] = value;
}
return { sql: processedSql, params: Object.keys(processedParams).length > 0 ? processedParams : undefined };
}
export function isKtxBigQueryConnectionConfig(
connection: KtxBigQueryConnectionConfig | undefined,
): connection is KtxBigQueryConnectionConfig {
@ -255,7 +305,7 @@ export class KtxBigQueryScanConnector implements KtxScanConnector {
private readonly now: () => Date;
private readonly maxBytesBilled?: number | string;
private readonly queryTimeoutMs?: number;
private readonly dialect = new KtxBigQueryDialect();
private readonly dialect = getDialectForDriver('bigquery');
private client: KtxBigQueryClient | null = null;
constructor(options: KtxBigQueryScanConnectorOptions) {
@ -272,7 +322,7 @@ export class KtxBigQueryScanConnector implements KtxScanConnector {
this.id = `bigquery:${options.connectionId}`;
}
async testConnection(): Promise<{ success: boolean; error?: string }> {
async testConnection(): Promise<KtxConnectorTestResult> {
try {
const client = this.getClient();
await client.getDatasets({ maxResults: 1 });
@ -281,7 +331,7 @@ export class KtxBigQueryScanConnector implements KtxScanConnector {
}
return { success: true };
} catch (error) {
return { success: false, error: error instanceof Error ? error.message : String(error) };
return connectorTestFailure(error);
}
}
@ -289,11 +339,12 @@ export class KtxBigQueryScanConnector implements KtxScanConnector {
this.assertConnection(input.connectionId);
const tables: KtxSchemaTable[] = [];
const datasetIds = this.requireDatasetIdsForScan();
const snapshotWarnings: KtxScanWarning[] = [];
for (const datasetId of datasetIds) {
const scopedNames = input.tableScope
? scopedTableNames(input.tableScope, { catalog: this.resolved.projectId, db: datasetId })
: null;
tables.push(...(await this.introspectDataset(datasetId, scopedNames)));
tables.push(...(await this.introspectDataset(datasetId, scopedNames, snapshotWarnings)));
}
return {
connectionId: this.connectionId,
@ -307,6 +358,7 @@ export class KtxBigQueryScanConnector implements KtxScanConnector {
total_columns: tables.reduce((sum, table) => sum + table.columns.length, 0),
},
tables,
warnings: snapshotWarnings,
};
}
@ -331,7 +383,7 @@ export class KtxBigQueryScanConnector implements KtxScanConnector {
async executeReadOnly(input: KtxBigQueryReadOnlyQueryInput, _ctx: KtxScanContext): Promise<KtxQueryResult> {
this.assertConnection(input.connectionId);
const limitedSql = limitSqlForExecution(assertReadOnlySql(input.sql), input.maxRows);
const prepared = this.dialect.prepareQuery(limitedSql, input.params);
const prepared = prepareBigQueryReadOnlyQuery(limitedSql, input.params);
const result = await this.query(prepared.sql, prepared.params);
return { ...result, rowCount: result.rows.length };
}
@ -366,7 +418,7 @@ export class KtxBigQueryScanConnector implements KtxScanConnector {
if (!datasetId) {
return 0;
}
const tables = await this.introspectDataset(datasetId, null);
const tables = await this.introspectDataset(datasetId, null, []);
return tables.find((table) => table.name === tableName)?.estimatedRows ?? 0;
}
@ -378,7 +430,7 @@ export class KtxBigQueryScanConnector implements KtxScanConnector {
return this.dialect.quoteIdentifier(identifier);
}
async listDatasets(): Promise<string[]> {
async listSchemas(): Promise<string[]> {
const [datasets] = await this.getClient().getDatasets();
return datasets.map((dataset) => dataset.id).filter((id): id is string => Boolean(id));
}
@ -404,6 +456,7 @@ export class KtxBigQueryScanConnector implements KtxScanConnector {
params,
);
return rows.map((row) => ({
catalog: this.resolved.projectId,
schema: row.table_schema,
name: row.table_name,
kind:
@ -467,13 +520,24 @@ export class KtxBigQueryScanConnector implements KtxScanConnector {
return firstNumber(rows[0]?.[header]);
}
private async introspectDataset(datasetId: string, scopedNames: readonly string[] | null): Promise<KtxSchemaTable[]> {
private async introspectDataset(
datasetId: string,
scopedNames: readonly string[] | null,
snapshotWarnings: KtxScanWarning[],
): Promise<KtxSchemaTable[]> {
if (scopedNames && scopedNames.length === 0) return [];
const dataset = this.getClient().dataset(datasetId);
const [tableRefs] = await dataset.getTables();
const scopeSet = scopedNames ? new Set(scopedNames) : null;
const filteredTableRefs = scopeSet ? tableRefs.filter((tableRef) => scopeSet.has(tableRef.id ?? '')) : tableRefs;
const primaryKeys = await this.primaryKeys(datasetId);
const primaryKeysResult = await tryConstraintQuery(
{ schema: datasetId, kind: 'primary_key', isDeniedError },
() => this.primaryKeys(datasetId),
);
const primaryKeys = primaryKeysResult.ok ? primaryKeysResult.value : new Map<string, Set<string>>();
if (!primaryKeysResult.ok) {
snapshotWarnings.push(primaryKeysResult.warning);
}
const tables: KtxSchemaTable[] = [];
for (const tableRef of filteredTableRefs) {
const tableName = tableRef.id || '';
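
The last hunk above wraps the primary-key lookup in `tryConstraintQuery`, so a permission error becomes a scan warning rather than failing the whole introspection. The helper's implementation is not part of this diff. What follows is only a minimal sketch of how such a wrapper might behave, assuming it returns a discriminated `ok` result and rethrows any error that `isDeniedError` does not recognise. The names `tryQuerySketch`, `SketchResult`, `SketchWarning`, `isDeniedSketch` and the warning message text are hypothetical.

```typescript
// Hypothetical stand-ins; the real types live in context/scan/constraint-discovery.js
// and may look different.
type SketchKind = 'primary_key' | 'foreign_key';

interface SketchWarning {
  schema: string;
  kind: SketchKind;
  message: string;
}

type SketchResult<T> = { ok: true; value: T } | { ok: false; warning: SketchWarning };

async function tryQuerySketch<T>(
  options: { schema: string; kind: SketchKind; isDeniedError: (error: unknown) => boolean },
  run: () => Promise<T>,
): Promise<SketchResult<T>> {
  try {
    return { ok: true, value: await run() };
  } catch (error) {
    // Assumption: only permission-style errors are downgraded to warnings.
    if (!options.isDeniedError(error)) {
      throw error;
    }
    return {
      ok: false,
      warning: {
        schema: options.schema,
        kind: options.kind,
        message: `Skipped ${options.kind} discovery for ${options.schema}: access denied`,
      },
    };
  }
}

// Mirrors the BigQuery check from the diff: HTTP 403, or an accessDenied/notFound reason.
function isDeniedSketch(error: unknown): boolean {
  if (!error || typeof error !== 'object') {
    return false;
  }
  const candidate = error as { code?: unknown; errors?: Array<{ reason?: unknown }> };
  return (
    candidate.code === 403 ||
    candidate.errors?.some((item) => item.reason === 'accessDenied' || item.reason === 'notFound') === true
  );
}
```

With this shape the caller can fall back to an empty result and push `warning` onto the snapshot's `warnings` array, which is what the connector code in the hunk does.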


@ -1,9 +1,18 @@
import type { KtxDialect } from '../../context/connections/dialects.js';
import {
columnDisplayPartCount,
formatDialectDisplayRef,
formatDialectTableName,
limitOffsetClause,
parseDialectDisplayRef,
} from '../../context/connections/dialect-helpers.js';
import type { KtxSchemaDimensionType, KtxTableRef } from '../../context/scan/types.js';
type BigQueryTableNameRef = Pick<KtxTableRef, 'name'> & Partial<Pick<KtxTableRef, 'catalog' | 'db'>>;
export class KtxBigQueryDialect {
readonly type = 'bigquery';
/** @internal */
export class KtxBigQueryDialect implements KtxDialect {
readonly type = 'bigquery' as const;
private readonly typeMappings: Record<string, KtxSchemaDimensionType> = {
TIMESTAMP: 'time',
@ -27,13 +36,19 @@ export class KtxBigQueryDialect {
}
formatTableName(table: BigQueryTableNameRef): string {
if (table.catalog && table.db) {
return `${this.quoteIdentifier(table.catalog)}.${this.quoteIdentifier(table.db)}.${this.quoteIdentifier(table.name)}`;
}
if (table.db) {
return `${this.quoteIdentifier(table.db)}.${this.quoteIdentifier(table.name)}`;
}
return this.quoteIdentifier(table.name);
return formatDialectTableName(table, this.quoteIdentifier.bind(this), 'three-part');
}
formatDisplayRef(table: BigQueryTableNameRef): string {
return formatDialectDisplayRef(table, 'three-part');
}
parseDisplayRef(display: string): KtxTableRef | null {
return parseDialectDisplayRef(display, 'three-part');
}
columnDisplayTablePartCount(): 1 | 2 | 3 {
return columnDisplayPartCount('three-part');
}
mapDataType(nativeType: string): string {
@ -93,19 +108,6 @@ export class KtxBigQueryDialect {
return `SELECT ${quotedColumn} FROM ${tableName} WHERE ${quotedColumn} IS NOT NULL AND TRIM(CAST(${quotedColumn} AS STRING)) != '' ORDER BY RAND() LIMIT ${limit}`;
}
prepareQuery(sql: string, params?: Record<string, unknown>): { sql: string; params?: Record<string, unknown> } {
if (!params) {
return { sql, params: undefined };
}
let processedSql = sql;
const processedParams: Record<string, unknown> = {};
for (const [key, value] of Object.entries(params)) {
processedSql = processedSql.replace(new RegExp(`:${key}\\b`, 'g'), `@${key}`);
processedParams[key] = value;
}
return { sql: processedSql, params: Object.keys(processedParams).length > 0 ? processedParams : undefined };
}
getRandomSampleFilter(samplePct: number): string {
if (samplePct <= 0 || samplePct >= 1) {
return '';
@ -121,7 +123,11 @@ export class KtxBigQueryDialect {
}
getLimitOffsetClause(limit: number, offset?: number): string {
return offset !== undefined && offset > 0 ? `LIMIT ${limit} OFFSET ${offset}` : `LIMIT ${limit}`;
return limitOffsetClause(limit, offset);
}
getTopClause(_limit: number): string {
return '';
}
getNullCountExpression(column: string): string {
@ -132,6 +138,18 @@ export class KtxBigQueryDialect {
return `APPROX_COUNT_DISTINCT(${column})`;
}
textLengthExpression(columnSql: string): string {
return `LENGTH(CAST(${columnSql} AS STRING))`;
}
castToText(columnSql: string): string {
return `CAST(${columnSql} AS STRING)`;
}
getSampleValueAggregation(innerSql: string): string {
return `(SELECT STRING_AGG(CAST(value AS STRING), '\\u001F') FROM (${innerSql}) AS relationship_profile_values)`;
}
generateCardinalitySampleQuery(tableName: string, columnName: string, sampleSize: number): string {
return `
WITH sampled AS (
@ -172,36 +190,4 @@ export class KtxBigQueryDialect {
FROM sampled
`;
}
getTimeTruncExpression(
column: string,
granularity: 'day' | 'week' | 'month' | 'quarter' | 'year',
timezone?: string,
): string {
const bigQueryGranularity = granularity.toUpperCase();
if (timezone) {
return `DATE_TRUNC(DATETIME(${column}, '${timezone}'), ${bigQueryGranularity})`;
}
return `DATE_TRUNC(${column}, ${bigQueryGranularity})`;
}
getCustomTimeTruncExpression(column: string, interval: string, origin?: string, timezone?: string): string {
const col = timezone ? `DATETIME(${column}, '${timezone}')` : column;
const [rawAmount, rawUnit] = interval.split(' ');
let diffUnit = rawUnit!.toUpperCase();
let amount = Number(rawAmount);
let addUnit = diffUnit;
if (diffUnit === 'WEEK') {
diffUnit = 'DAY';
amount = amount * 7;
addUnit = 'DAY';
}
const originExpr = origin ? `TIMESTAMP '${origin}'` : `TIMESTAMP '1970-01-01'`;
return `TIMESTAMP_ADD(${originExpr}, INTERVAL CAST(FLOOR(TIMESTAMP_DIFF(${col}, ${originExpr}, ${diffUnit}) / ${amount}) * ${amount} AS INT64) ${addUnit})`;
}
parseIntervalToSql(interval: string): string {
const [amount, unit] = interval.split(' ');
return `INTERVAL ${amount} ${unit!.toUpperCase()}`;
}
}
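
The BigQuery hunks move the `:name` to `@name` placeholder rewrite out of the dialect class and into the connector-level `prepareBigQueryReadOnlyQuery`. As a rough illustration of that rewrite, here is a standalone sketch modelled on the function shown in the diff. The name `toBigQueryParams` and the sample query are hypothetical, and the sketch assumes parameter keys are plain identifiers, since each key is interpolated into a regular expression without escaping.

```typescript
// Standalone sketch of the ':name' -> '@name' rewrite; not the project's export.
function toBigQueryParams(
  sql: string,
  params?: Record<string, unknown>,
): { sql: string; params?: Record<string, unknown> } {
  if (!params) {
    return { sql, params: undefined };
  }
  let rewritten = sql;
  for (const key of Object.keys(params)) {
    // \b stops ':min' from matching the prefix of a longer name such as ':minimum'.
    rewritten = rewritten.replace(new RegExp(`:${key}\\b`, 'g'), `@${key}`);
  }
  // Every supplied parameter is forwarded, whether or not the SQL references it.
  return { sql: rewritten, params: Object.keys(params).length > 0 ? { ...params } : undefined };
}

const bigQueryPrepared = toBigQueryParams(
  'SELECT * FROM orders WHERE status = :status AND total > :min',
  { status: 'paid', min: 100 },
);
console.log(bigQueryPrepared.sql);
// SELECT * FROM orders WHERE status = @status AND total > @min
```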


@ -1,12 +1,12 @@
import { createClient } from '@clickhouse/client';
import { getDialectForDriver } from '../../context/connections/dialects.js';
import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js';
import { createKtxConnectorCapabilities, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaColumn, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableRef, type KtxTableSampleInput, type KtxTableListEntry, type KtxTableSampleResult } from '../../context/scan/types.js';
import { connectorTestFailure, createKtxConnectorCapabilities, type KtxConnectorTestResult, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaColumn, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableRef, type KtxTableSampleInput, type KtxTableListEntry, type KtxTableSampleResult } from '../../context/scan/types.js';
import { scopedTableNames } from '../../context/scan/table-ref.js';
import { readFileSync } from 'node:fs';
import { Agent as HttpsAgent } from 'node:https';
import { homedir } from 'node:os';
import { resolve } from 'node:path';
import { KtxClickHouseDialect } from './dialect.js';
export interface KtxClickHouseConnectionConfig {
driver?: string;
@ -198,6 +198,49 @@ function clickHouseTableKey(database: string, table: string): string {
return `${database}.${table}`;
}
function inferClickHouseQueryParamType(value: unknown): string {
if (value === null || value === undefined) {
return 'String';
}
if (typeof value === 'boolean') {
return 'Bool';
}
if (typeof value === 'number') {
return Number.isInteger(value) ? 'Int64' : 'Float64';
}
if (value instanceof Date) {
return 'DateTime';
}
return 'String';
}
/** @internal */
export function prepareClickHouseReadOnlyQuery(
sql: string,
params?: Record<string, unknown>,
): { sql: string; params?: Record<string, unknown> } {
if (!params) {
return { sql, params: undefined };
}
let parameterizedQuery = sql;
const queryParams: Record<string, unknown> = {};
const sortedKeys = Object.keys(params).sort((a, b) => b.length - a.length);
for (const key of sortedKeys) {
const placeholder = `:${key}`;
if (parameterizedQuery.includes(placeholder)) {
parameterizedQuery = parameterizedQuery.replace(
new RegExp(`:${key}\\b`, 'g'),
`{${key}:${inferClickHouseQueryParamType(params[key])}}`,
);
queryParams[key] = params[key];
}
}
return { sql: parameterizedQuery, params: Object.keys(queryParams).length > 0 ? queryParams : undefined };
}
export function isKtxClickHouseConnectionConfig(
connection: KtxClickHouseConnectionConfig | undefined,
): connection is KtxClickHouseConnectionConfig {
@ -256,7 +299,7 @@ export class KtxClickHouseScanConnector implements KtxScanConnector {
private readonly clientFactory: KtxClickHouseClientFactory;
private readonly endpointResolver?: KtxClickHouseEndpointResolver;
private readonly now: () => Date;
private readonly dialect = new KtxClickHouseDialect();
private readonly dialect = getDialectForDriver('clickhouse');
private client: KtxClickHouseClient | null = null;
private resolvedEndpoint: KtxClickHouseResolvedEndpoint | null = null;
@ -274,12 +317,12 @@ export class KtxClickHouseScanConnector implements KtxScanConnector {
this.id = `clickhouse:${options.connectionId}`;
}
async testConnection(): Promise<{ success: boolean; error?: string }> {
async testConnection(): Promise<KtxConnectorTestResult> {
try {
await this.query('SELECT 1');
return { success: true };
} catch (error) {
return { success: false, error: error instanceof Error ? error.message : String(error) };
return connectorTestFailure(error);
}
}
@ -408,7 +451,7 @@ export class KtxClickHouseScanConnector implements KtxScanConnector {
async executeReadOnly(input: KtxClickHouseReadOnlyQueryInput, _ctx: KtxScanContext): Promise<KtxQueryResult> {
this.assertConnection(input.connectionId);
const limitedSql = limitSqlForExecution(assertReadOnlySql(input.sql), input.maxRows);
const prepared = this.dialect.prepareQuery(limitedSql, input.params);
const prepared = prepareClickHouseReadOnlyQuery(limitedSql, input.params);
const result = await this.query(prepared.sql, prepared.params);
return { ...result, rowCount: result.rows.length };
}
@ -488,6 +531,7 @@ export class KtxClickHouseScanConnector implements KtxScanConnector {
{ schemas: filterSchemas },
);
return rows.map((row) => ({
catalog: null,
schema: row.database,
name: row.name,
kind: row.engine === 'View' || row.engine === 'MaterializedView' ? ('view' as const) : ('table' as const),


@ -1,9 +1,18 @@
import type { KtxDialect } from '../../context/connections/dialects.js';
import {
columnDisplayPartCount,
formatDialectDisplayRef,
formatDialectTableName,
limitOffsetClause,
parseDialectDisplayRef,
} from '../../context/connections/dialect-helpers.js';
import type { KtxSchemaDimensionType, KtxTableRef } from '../../context/scan/types.js';
type ClickHouseTableNameRef = Pick<KtxTableRef, 'name'> & Partial<Pick<KtxTableRef, 'catalog' | 'db'>>;
export class KtxClickHouseDialect {
readonly type = 'clickhouse';
/** @internal */
export class KtxClickHouseDialect implements KtxDialect {
readonly type = 'clickhouse' as const;
private readonly typeMappings: Record<string, KtxSchemaDimensionType> = {
date: 'time',
@ -45,9 +54,19 @@ export class KtxClickHouseDialect {
}
formatTableName(table: ClickHouseTableNameRef): string {
return table.db
? `${this.quoteIdentifier(table.db)}.${this.quoteIdentifier(table.name)}`
: this.quoteIdentifier(table.name);
return formatDialectTableName(table, this.quoteIdentifier.bind(this), 'ansi');
}
formatDisplayRef(table: ClickHouseTableNameRef): string {
return formatDialectDisplayRef(table, 'ansi');
}
parseDisplayRef(display: string): KtxTableRef | null {
return parseDialectDisplayRef(display, 'ansi');
}
columnDisplayTablePartCount(): 1 | 2 | 3 {
return columnDisplayPartCount('ansi');
}
mapDataType(nativeType: string): string {
@ -97,29 +116,6 @@ export class KtxClickHouseDialect {
return `SELECT ${quotedColumn} FROM ${tableName} WHERE ${quotedColumn} IS NOT NULL AND trim(toString(${quotedColumn})) != '' LIMIT ${limit}`;
}
prepareQuery(sql: string, params?: Record<string, unknown>): { sql: string; params?: Record<string, unknown> } {
if (!params) {
return { sql, params: undefined };
}
let parameterizedQuery = sql;
const queryParams: Record<string, unknown> = {};
const sortedKeys = Object.keys(params).sort((a, b) => b.length - a.length);
for (const key of sortedKeys) {
const placeholder = `:${key}`;
if (parameterizedQuery.includes(placeholder)) {
parameterizedQuery = parameterizedQuery.replace(
new RegExp(`:${key}\\b`, 'g'),
`{${key}:${this.inferClickHouseType(params[key])}}`,
);
queryParams[key] = params[key];
}
}
return { sql: parameterizedQuery, params: queryParams };
}
getRandomSampleFilter(samplePct: number): string {
if (samplePct <= 0 || samplePct >= 1) {
return '';
@ -132,7 +128,11 @@ export class KtxClickHouseDialect {
}
getLimitOffsetClause(limit: number, offset?: number): string {
return offset !== undefined && offset > 0 ? `LIMIT ${limit} OFFSET ${offset}` : `LIMIT ${limit}`;
return limitOffsetClause(limit, offset);
}
getTopClause(_limit: number): string {
return '';
}
getNullCountExpression(column: string): string {
@ -143,6 +143,18 @@ export class KtxClickHouseDialect {
return `COUNT(DISTINCT ${column})`;
}
textLengthExpression(columnSql: string): string {
return `length(toString(${columnSql}))`;
}
castToText(columnSql: string): string {
return `toString(${columnSql})`;
}
getSampleValueAggregation(innerSql: string): string {
return `(SELECT arrayStringConcat(groupArray(toString(value)), '\\x1F') FROM (${innerSql}) AS relationship_profile_values)`;
}
generateCardinalitySampleQuery(tableName: string, columnName: string, sampleSize: number): string {
return `
SELECT COUNT(DISTINCT val) AS cardinality
@ -181,99 +193,9 @@ export class KtxClickHouseDialect {
)
`;
}
getTimeTruncExpression(
column: string,
granularity: 'day' | 'week' | 'month' | 'quarter' | 'year',
timezone?: string,
): string {
const tz = timezone ? `, '${timezone}'` : '';
switch (granularity) {
case 'day':
return `toStartOfDay(${column}${tz})`;
case 'week':
return `toStartOfWeek(${column}, 1${tz})`;
case 'month':
return `toStartOfMonth(${column}${tz})`;
case 'quarter':
return `toStartOfQuarter(${column}${tz})`;
case 'year':
return `toStartOfYear(${column}${tz})`;
}
}
getCustomTimeTruncExpression(column: string, interval: string, origin?: string, timezone?: string): string {
const col = timezone ? `toTimezone(${column}, '${timezone}')` : column;
const [rawAmount, rawUnit] = interval.split(' ');
const amount = Number(rawAmount);
const unit = rawUnit!.toLowerCase();
const originExpr = origin ? `toDateTime('${origin}')` : "toDateTime('1970-01-01')";
const calendarUnit = this.toClickHouseDateDiffUnit(unit);
if (calendarUnit) {
return `dateAdd(${calendarUnit}, intDiv(dateDiff(${calendarUnit}, ${originExpr}, ${col}), ${amount}) * ${amount}, ${originExpr})`;
}
const seconds = this.intervalToSeconds(amount, unit);
return `addSeconds(${originExpr}, intDiv(toUInt64(dateDiff('second', ${originExpr}, ${col})), ${seconds}) * ${seconds})`;
}
parseIntervalToSql(interval: string): string {
const [amount, unit] = interval.split(' ');
return `INTERVAL ${amount} ${unit!.toUpperCase()}`;
}
private unwrapClickHouseType(value: string, wrapper: string): string {
const prefix = `${wrapper}(`;
return value.startsWith(prefix) && value.endsWith(')') ? value.slice(prefix.length, -1) : value;
}
private inferClickHouseType(value: unknown): string {
if (value === null || value === undefined) {
return 'String';
}
if (typeof value === 'boolean') {
return 'Bool';
}
if (typeof value === 'number') {
return Number.isInteger(value) ? 'Int64' : 'Float64';
}
if (value instanceof Date) {
return 'DateTime';
}
return 'String';
}
private toClickHouseDateDiffUnit(unit: string): string | null {
if (unit === 'month' || unit === 'months') {
return "'month'";
}
if (unit === 'quarter' || unit === 'quarters') {
return "'quarter'";
}
if (unit === 'year' || unit === 'years') {
return "'year'";
}
return null;
}
private intervalToSeconds(amount: number, unit: string): number {
switch (unit) {
case 'second':
case 'seconds':
return amount;
case 'minute':
case 'minutes':
return amount * 60;
case 'hour':
case 'hours':
return amount * 3600;
case 'day':
case 'days':
return amount * 86400;
case 'week':
case 'weeks':
return amount * 604800;
default:
return amount * 86400;
}
}
}
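
For ClickHouse, the same refactor moves the typed-placeholder rewrite and its type inference from the dialect into `prepareClickHouseReadOnlyQuery`. The sketch below illustrates the idea under the same identifier-only assumption for keys; `inferParamType`, `toClickHouseParams` and the sample query are hypothetical names used only for illustration.

```typescript
// Maps a JavaScript value to a ClickHouse type name for a {name:Type} placeholder.
function inferParamType(value: unknown): string {
  if (typeof value === 'boolean') return 'Bool';
  if (typeof value === 'number') return Number.isInteger(value) ? 'Int64' : 'Float64';
  if (value instanceof Date) return 'DateTime';
  return 'String'; // null, undefined, strings and anything else
}

function toClickHouseParams(
  sql: string,
  params?: Record<string, unknown>,
): { sql: string; params?: Record<string, unknown> } {
  if (!params) {
    return { sql, params: undefined };
  }
  let rewritten = sql;
  const used: Record<string, unknown> = {};
  // Longest keys first, as in the diff.
  const keys = Object.keys(params).sort((a, b) => b.length - a.length);
  for (const key of keys) {
    if (!rewritten.includes(`:${key}`)) {
      continue; // parameters the SQL never references are dropped
    }
    rewritten = rewritten.replace(new RegExp(`:${key}\\b`, 'g'), `{${key}:${inferParamType(params[key])}}`);
    used[key] = params[key];
  }
  return { sql: rewritten, params: Object.keys(used).length > 0 ? used : undefined };
}

const clickHousePrepared = toClickHouseParams(
  'SELECT * FROM events WHERE user_id = :id AND score >= :score',
  { id: 42, score: 0.5, unused: 'x' },
);
console.log(clickHousePrepared.sql);
// SELECT * FROM events WHERE user_id = {id:Int64} AND score >= {score:Float64}
```

Unlike the removed dialect method, which always returned the collected object, the connector-level function in the diff returns `undefined` when no parameter was referenced; the sketch follows the newer behaviour.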


@ -2,10 +2,37 @@ import mysql, { type FieldPacket, type Pool, type RowDataPacket } from 'mysql2/p
import { readFileSync } from 'node:fs';
import { homedir } from 'node:os';
import { resolve } from 'node:path';
import { getDialectForDriver } from '../../context/connections/dialects.js';
import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js';
import { createKtxConnectorCapabilities, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaColumn, type KtxTableListEntry, type KtxSchemaForeignKey, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableRef, type KtxTableSampleInput, type KtxTableSampleResult } from '../../context/scan/types.js';
import {
constraintDiscoveryWarning,
tryConstraintQuery,
type ConstraintDiscoveryKind,
} from '../../context/scan/constraint-discovery.js';
import { scopedTableNames } from '../../context/scan/table-ref.js';
import { KtxMysqlDialect } from './dialect.js';
import {
connectorTestFailure,
createKtxConnectorCapabilities,
type KtxConnectorTestResult,
type KtxColumnSampleInput,
type KtxColumnSampleResult,
type KtxColumnStatsInput,
type KtxColumnStatsResult,
type KtxQueryResult,
type KtxReadOnlyQueryInput,
type KtxScanConnector,
type KtxScanContext,
type KtxScanInput,
type KtxScanWarning,
type KtxSchemaColumn,
type KtxSchemaForeignKey,
type KtxSchemaSnapshot,
type KtxSchemaTable,
type KtxTableListEntry,
type KtxTableRef,
type KtxTableSampleInput,
type KtxTableSampleResult,
} from '../../context/scan/types.js';
export interface KtxMysqlConnectionConfig {
driver?: string;
@ -18,6 +45,7 @@ export interface KtxMysqlConnectionConfig {
password?: string;
url?: string;
ssl?: boolean | { rejectUnauthorized?: boolean };
maxConnections?: number;
[key: string]: unknown;
}
@ -163,6 +191,23 @@ function maybeNumber(value: unknown): number | undefined {
return typeof value === 'number' && Number.isFinite(value) ? value : undefined;
}
function positiveIntegerConfigValue(input: {
connection: KtxMysqlConnectionConfig;
key: keyof KtxMysqlConnectionConfig;
connectionId: string;
defaultValue: number;
}): number {
const value = input.connection[input.key];
if (value === undefined) {
return input.defaultValue;
}
const numberValue = Number(value);
if (!Number.isInteger(numberValue) || numberValue < 1) {
throw new Error(`connections.${input.connectionId}.${String(input.key)} must be a positive integer`);
}
return numberValue;
}
function parseMysqlUrl(url: string): Partial<KtxMysqlConnectionConfig> {
const parsed = new URL(url);
const sslParam = parsed.searchParams.get('ssl') ?? parsed.searchParams.get('sslmode');
@ -231,6 +276,28 @@ function primaryKeyMap(rows: MysqlPrimaryKeyRow[], fallbackDatabase: string): Ma
return grouped;
}
function isDeniedError(error: unknown): boolean {
if (!error || typeof error !== 'object') {
return false;
}
const code = (error as { code?: unknown }).code;
return (
code === 'ER_TABLEACCESS_DENIED_ERROR' ||
code === 'ER_SPECIFIC_ACCESS_DENIED_ERROR' ||
code === 'ER_DBACCESS_DENIED_ERROR'
);
}
function pushConstraintWarnings(
warnings: KtxScanWarning[],
schemas: readonly string[],
kind: ConstraintDiscoveryKind,
): void {
for (const schema of schemas) {
warnings.push(constraintDiscoveryWarning({ schema, kind }));
}
}
function queryParams(params: Record<string, unknown> | unknown[] | undefined): unknown[] | undefined {
if (!params) {
return undefined;
@ -238,6 +305,25 @@ function queryParams(params: Record<string, unknown> | unknown[] | undefined): u
return Array.isArray(params) ? params : Object.values(params);
}
/** @internal */
export function prepareMysqlReadOnlyQuery(
sql: string,
params?: Record<string, unknown>,
): { sql: string; params?: unknown[] } {
if (!params) {
return { sql, params: undefined };
}
const values: unknown[] = [];
const parameterizedQuery = sql.replace(/:([A-Za-z_][A-Za-z0-9_]*)\b/g, (placeholder, key: string) => {
if (!(key in params)) {
return placeholder;
}
values.push(params[key]);
return '?';
});
return { sql: parameterizedQuery, params: values };
}
export function isKtxMysqlConnectionConfig(
connection: KtxMysqlConnectionConfig | undefined,
): connection is KtxMysqlConnectionConfig {
@ -262,6 +348,12 @@ export function mysqlConnectionPoolConfigFromConfig(input: {
const host = stringConfigValue(merged, 'host', env);
const database = stringConfigValue(merged, 'database', env);
const user = stringConfigValue(merged, 'username', env) ?? stringConfigValue(merged, 'user', env);
const maxConnections = positiveIntegerConfigValue({
connection: merged,
key: 'maxConnections',
connectionId: input.connectionId,
defaultValue: 10,
});
if (!host) {
throw new Error(`Native MySQL connector requires connections.${input.connectionId}.host or url`);
@ -280,7 +372,7 @@ export function mysqlConnectionPoolConfigFromConfig(input: {
database,
user,
password: stringConfigValue(merged, 'password', env),
connectionLimit: 10,
connectionLimit: maxConnections,
waitForConnections: true,
...(ssl ? { ssl: { rejectUnauthorized: ssl.rejectUnauthorized ?? false } } : {}),
};
@ -305,7 +397,7 @@ export class KtxMysqlScanConnector implements KtxScanConnector {
private readonly poolFactory: KtxMysqlPoolFactory;
private readonly endpointResolver?: KtxMysqlEndpointResolver;
private readonly now: () => Date;
private readonly dialect = new KtxMysqlDialect();
private readonly dialect = getDialectForDriver('mysql');
private pool: KtxMysqlPool | null = null;
private resolvedEndpoint: KtxMysqlResolvedEndpoint | null = null;
@ -323,18 +415,19 @@ export class KtxMysqlScanConnector implements KtxScanConnector {
this.id = `mysql:${options.connectionId}`;
}
async testConnection(): Promise<{ success: boolean; error?: string }> {
async testConnection(): Promise<KtxConnectorTestResult> {
try {
await this.query('SELECT 1');
return { success: true };
} catch (error) {
return { success: false, error: error instanceof Error ? error.message : String(error) };
return connectorTestFailure(error);
}
}
async introspect(input: KtxScanInput, _ctx: KtxScanContext): Promise<KtxSchemaSnapshot> {
this.assertConnection(input.connectionId);
const databases = configuredMysqlSchemas(this.connection, this.poolConfig.database);
const snapshotWarnings: KtxScanWarning[] = [];
const placeholders = databases.map(() => '?').join(', ');
let allScopedTables: string[] | null = null;
if (input.tableScope) {
@ -368,8 +461,11 @@ export class KtxMysqlScanConnector implements KtxScanConnector {
`,
[...databases, ...tableNameParams],
);
const primaryKeys = await this.queryRaw<MysqlPrimaryKeyRow>(
`
const primaryKeysResult = await tryConstraintQuery(
{ schema: databases[0] ?? this.poolConfig.database, kind: 'primary_key', isDeniedError },
() =>
this.queryRaw<MysqlPrimaryKeyRow>(
`
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME
FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
WHERE TABLE_SCHEMA IN (${placeholders})
@ -377,10 +473,18 @@ export class KtxMysqlScanConnector implements KtxScanConnector {
${tableNameClause}
ORDER BY TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION
`,
[...databases, ...tableNameParams],
[...databases, ...tableNameParams],
),
);
const foreignKeys = await this.queryRaw<MysqlForeignKeyRow>(
`
const primaryKeys = primaryKeysResult.ok ? primaryKeysResult.value : [];
if (!primaryKeysResult.ok) {
pushConstraintWarnings(snapshotWarnings, databases, 'primary_key');
}
const foreignKeysResult = await tryConstraintQuery(
{ schema: databases[0] ?? this.poolConfig.database, kind: 'foreign_key', isDeniedError },
() =>
this.queryRaw<MysqlForeignKeyRow>(
`
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, REFERENCED_TABLE_NAME, REFERENCED_COLUMN_NAME, CONSTRAINT_NAME
FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
WHERE TABLE_SCHEMA IN (${placeholders})
@ -388,8 +492,13 @@ export class KtxMysqlScanConnector implements KtxScanConnector {
${tableNameClause}
ORDER BY TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME
`,
[...databases, ...tableNameParams],
[...databases, ...tableNameParams],
),
);
const foreignKeys = foreignKeysResult.ok ? foreignKeysResult.value : [];
if (!foreignKeysResult.ok) {
pushConstraintWarnings(snapshotWarnings, databases, 'foreign_key');
}
const columnsByTable = groupByTable(columns, this.poolConfig.database);
const primaryKeysByTable = primaryKeyMap(primaryKeys, this.poolConfig.database);
@ -417,6 +526,7 @@ export class KtxMysqlScanConnector implements KtxScanConnector {
total_columns: schemaTables.reduce((sum, table) => sum + table.columns.length, 0),
},
tables: schemaTables,
warnings: snapshotWarnings,
};
}
@ -461,7 +571,7 @@ export class KtxMysqlScanConnector implements KtxScanConnector {
const limitedSql = limitSqlForExecution(assertReadOnlySql(input.sql), input.maxRows);
const prepared = Array.isArray(input.params)
? { sql: limitedSql, params: input.params }
: this.dialect.prepareQuery(limitedSql, input.params);
: prepareMysqlReadOnlyQuery(limitedSql, input.params);
const result = await this.query(prepared.sql, prepared.params);
return { ...result, rowCount: result.rows.length };
}
@ -536,6 +646,7 @@ export class KtxMysqlScanConnector implements KtxScanConnector {
filterSchemas,
);
return rows.map((row) => ({
catalog: null,
schema: row.TABLE_SCHEMA,
name: row.TABLE_NAME,
kind: row.TABLE_TYPE === 'VIEW' ? ('view' as const) : ('table' as const),
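
The MySQL connector hunks also make the pool size configurable through `maxConnections`, validated by `positiveIntegerConfigValue` with a default of 10. The following is a simplified sketch of that validation rule, not the project's function: `positiveInteger` and its flat argument list are hypothetical, chosen to keep the example self-contained.

```typescript
// Simplified sketch: return the default when unset, otherwise require an integer >= 1.
function positiveInteger(value: unknown, path: string, defaultValue: number): number {
  if (value === undefined) {
    return defaultValue;
  }
  const numberValue = Number(value); // numeric strings such as '25' are accepted
  if (!Number.isInteger(numberValue) || numberValue < 1) {
    throw new Error(`${path} must be a positive integer`);
  }
  return numberValue;
}

const defaultLimit = positiveInteger(undefined, 'connections.main.maxConnections', 10);
const configuredLimit = positiveInteger('25', 'connections.main.maxConnections', 10);
console.log(defaultLimit, configuredLimit);
// 10 25
```

Values such as `0`, `2.5` or `'abc'` throw rather than silently falling back to the default, so a misconfigured pool size surfaces when the connection config is resolved.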


@ -1,9 +1,18 @@
import type { KtxDialect } from '../../context/connections/dialects.js';
import {
columnDisplayPartCount,
formatDialectDisplayRef,
formatDialectTableName,
limitOffsetClause,
parseDialectDisplayRef,
} from '../../context/connections/dialect-helpers.js';
import type { KtxSchemaDimensionType, KtxTableRef } from '../../context/scan/types.js';
type MysqlTableNameRef = Pick<KtxTableRef, 'name'> & Partial<Pick<KtxTableRef, 'catalog' | 'db'>>;
export class KtxMysqlDialect {
readonly type = 'mysql';
/** @internal */
export class KtxMysqlDialect implements KtxDialect {
readonly type = 'mysql' as const;
private readonly typeMappings: Record<string, KtxSchemaDimensionType> = {
datetime: 'time',
@ -41,9 +50,19 @@ export class KtxMysqlDialect {
}
formatTableName(table: MysqlTableNameRef): string {
return table.db
? `${this.quoteIdentifier(table.db)}.${this.quoteIdentifier(table.name)}`
: this.quoteIdentifier(table.name);
return formatDialectTableName(table, this.quoteIdentifier.bind(this), 'ansi');
}
formatDisplayRef(table: MysqlTableNameRef): string {
return formatDialectDisplayRef(table, 'ansi');
}
parseDisplayRef(display: string): KtxTableRef | null {
return parseDialectDisplayRef(display, 'ansi');
}
columnDisplayTablePartCount(): 1 | 2 | 3 {
return columnDisplayPartCount('ansi');
}
mapDataType(nativeType: string): string {
@ -91,21 +110,6 @@ export class KtxMysqlDialect {
return `SELECT ${quotedColumn} FROM ${tableName} WHERE ${quotedColumn} IS NOT NULL AND TRIM(CAST(${quotedColumn} AS CHAR)) != '' LIMIT ${limit}`;
}
prepareQuery(sql: string, params?: Record<string, unknown>): { sql: string; params?: unknown[] } {
if (!params) {
return { sql, params: undefined };
}
const values: unknown[] = [];
const parameterizedQuery = sql.replace(/:([A-Za-z_][A-Za-z0-9_]*)\b/g, (placeholder, key: string) => {
if (!(key in params)) {
return placeholder;
}
values.push(params[key]);
return '?';
});
return { sql: parameterizedQuery, params: values };
}
getRandomSampleFilter(samplePct: number): string {
if (samplePct <= 0 || samplePct >= 1) {
return '';
@ -118,7 +122,11 @@ export class KtxMysqlDialect {
}
getLimitOffsetClause(limit: number, offset?: number): string {
return offset !== undefined && offset > 0 ? `LIMIT ${limit} OFFSET ${offset}` : `LIMIT ${limit}`;
return limitOffsetClause(limit, offset);
}
getTopClause(_limit: number): string {
return '';
}
getNullCountExpression(column: string): string {
@ -129,6 +137,18 @@ export class KtxMysqlDialect {
return `COUNT(DISTINCT ${column})`;
}
textLengthExpression(columnSql: string): string {
return `CHAR_LENGTH(CAST(${columnSql} AS CHAR))`;
}
castToText(columnSql: string): string {
return `CAST(${columnSql} AS CHAR)`;
}
getSampleValueAggregation(innerSql: string): string {
return `(SELECT GROUP_CONCAT(CAST(value AS CHAR) SEPARATOR CHAR(31)) FROM (${innerSql}) AS relationship_profile_values)`;
}
generateCardinalitySampleQuery(tableName: string, columnName: string, sampleSize: number): string {
return `
SELECT COUNT(DISTINCT val) AS cardinality
@ -167,36 +187,4 @@ export class KtxMysqlDialect {
) AS sampled
`;
}
getTimeTruncExpression(
column: string,
granularity: 'day' | 'week' | 'month' | 'quarter' | 'year',
timezone?: string,
): string {
const col = timezone ? `CONVERT_TZ(${column}, '+00:00', '${timezone}')` : column;
switch (granularity) {
case 'day':
return `DATE(${col})`;
case 'week':
return `DATE(${col} - INTERVAL WEEKDAY(${col}) DAY)`;
case 'month':
return `DATE_FORMAT(${col}, '%Y-%m-01')`;
case 'quarter':
return `MAKEDATE(YEAR(${col}), 1) + INTERVAL (QUARTER(${col}) - 1) QUARTER`;
case 'year':
return `DATE_FORMAT(${col}, '%Y-01-01')`;
}
}
getCustomTimeTruncExpression(column: string, interval: string, origin?: string, timezone?: string): string {
const col = timezone ? `CONVERT_TZ(${column}, '+00:00', '${timezone}')` : column;
const [amount, unit] = interval.split(' ');
const originExpr = origin ? `'${origin}'` : `'1970-01-01'`;
return `DATE_ADD(${originExpr}, INTERVAL FLOOR(TIMESTAMPDIFF(${unit!.toUpperCase()}, ${originExpr}, ${col}) / ${amount}) * ${amount} ${unit!.toUpperCase()})`;
}
parseIntervalToSql(interval: string): string {
const [amount, unit] = interval.split(' ');
return `INTERVAL ${amount} ${unit!.toUpperCase()}`;
}
}
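
Reviewer note: the `prepareQuery` body removed from the MySQL dialect above rewrites `:name` placeholders into positional `?` markers. The sketch below restates that logic as a standalone function so its behavior is easy to check. The function name `toMysqlPositional` and the sample query are hypothetical, chosen only for illustration; this is not necessarily where or how the project now performs the rewrite.

```typescript
// Hypothetical standalone restatement of the removed MySQL prepareQuery logic.
type PreparedQuery = { sql: string; params?: unknown[] };

function toMysqlPositional(sql: string, params?: Record<string, unknown>): PreparedQuery {
  if (!params) {
    return { sql, params: undefined };
  }
  const values: unknown[] = [];
  // Replace each :name with ? and collect values in order of appearance.
  const rewritten = sql.replace(/:([A-Za-z_][A-Za-z0-9_]*)\b/g, (placeholder: string, key: string) => {
    if (!(key in params)) {
      // Unknown names are left untouched rather than bound to undefined.
      return placeholder;
    }
    values.push(params[key]);
    return '?';
  });
  return { sql: rewritten, params: values };
}

const mysqlPrepared = toMysqlPositional(
  'SELECT * FROM orders WHERE status = :status AND total > :min AND note = :missing',
  { status: 'paid', min: 100 },
);
console.log(mysqlPrepared.sql);
console.log(mysqlPrepared.params);
```

Because values are pushed in order of appearance, a placeholder that occurs twice contributes its value twice, which is what positional `?` binding needs.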

@ -1,11 +1,34 @@
import { readFileSync } from 'node:fs';
import { homedir } from 'node:os';
import { resolve } from 'node:path';
import { getDialectForDriver } from '../../context/connections/dialects.js';
import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js';
import { createKtxConnectorCapabilities, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaColumn, type KtxSchemaForeignKey, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableListEntry, type KtxTableRef, type KtxTableSampleInput, type KtxTableSampleResult } from '../../context/scan/types.js';
import { tryConstraintQuery } from '../../context/scan/constraint-discovery.js';
import { scopedTableNames } from '../../context/scan/table-ref.js';
import {
connectorTestFailure,
createKtxConnectorCapabilities,
type KtxConnectorTestResult,
type KtxColumnSampleInput,
type KtxColumnSampleResult,
type KtxColumnStatsInput,
type KtxColumnStatsResult,
type KtxQueryResult,
type KtxReadOnlyQueryInput,
type KtxScanConnector,
type KtxScanContext,
type KtxScanInput,
type KtxScanWarning,
type KtxSchemaColumn,
type KtxSchemaForeignKey,
type KtxSchemaSnapshot,
type KtxSchemaTable,
type KtxTableListEntry,
type KtxTableRef,
type KtxTableSampleInput,
type KtxTableSampleResult,
} from '../../context/scan/types.js';
import { Pool } from 'pg';
import { KtxPostgresDialect } from './dialect.js';
const PG_OID_TYPE_MAP: Record<number, string> = {
16: 'boolean',
@ -43,6 +66,7 @@ export interface KtxPostgresConnectionConfig {
sslmode?: string;
sslMode?: string;
rejectUnauthorized?: boolean;
maxConnections?: number;
[key: string]: unknown;
}
@ -197,6 +221,29 @@ function groupByTable<T extends { table_name: string }>(rows: T[]): Map<string,
return grouped;
}
/** @internal */
export function preparePostgresReadOnlyQuery(
sql: string,
params?: Record<string, unknown>,
): { sql: string; params?: unknown[] } {
if (!params) {
return { sql, params: undefined };
}
const paramNames = Object.keys(params);
const values: unknown[] = new Array(paramNames.length);
const paramIndexMap = new Map<string, number>();
paramNames.forEach((name, index) => {
paramIndexMap.set(name, index + 1);
values[index] = params[name];
});
const sortedKeys = [...paramNames].sort((a, b) => b.length - a.length);
let parameterizedQuery = sql;
for (const name of sortedKeys) {
parameterizedQuery = parameterizedQuery.replace(new RegExp(`:${name}\\b`, 'g'), `$${paramIndexMap.get(name)}`);
}
return { sql: parameterizedQuery, params: values };
}
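
Reviewer note: a minimal sketch of how the named-to-numbered rewrite above behaves, assuming the same algorithm as `preparePostgresReadOnlyQuery`. The name `toPostgresNumbered` and the sample query are hypothetical. One deliberate difference, stated plainly: the sketch passes a replacer callback to `String.prototype.replace` instead of a replacement string, so the `$n` text can never be interpreted as a replacement pattern.

```typescript
// Hypothetical restatement of the :name -> $n rewrite, for illustration only.
function toPostgresNumbered(
  sql: string,
  params?: Record<string, unknown>,
): { sql: string; params?: unknown[] } {
  if (!params) {
    return { sql, params: undefined };
  }
  const names = Object.keys(params);
  // Each name gets a fixed 1-based index in key order; values follow the same order.
  const values = names.map((name) => params[name]);
  const indexByName = new Map<string, number>();
  names.forEach((name, i) => indexByName.set(name, i + 1));

  // Longest names first, a defensive ordering so a short name is never
  // substituted inside a longer one.
  const longestFirst = [...names].sort((a, b) => b.length - a.length);
  let rewritten = sql;
  for (const name of longestFirst) {
    const index = indexByName.get(name);
    rewritten = rewritten.replace(new RegExp(`:${name}\\b`, 'g'), () => `$${index}`);
  }
  return { sql: rewritten, params: values };
}

const pgPrepared = toPostgresNumbered(
  'SELECT * FROM users WHERE id = :id OR parent_id = :id AND org = :org',
  { id: 7, org: 'acme' },
);
console.log(pgPrepared.sql);
console.log(pgPrepared.params);
```

Unlike the positional `?` style, a repeated `:id` maps to the same `$1`, so each value is bound once. Every key in `params` is included in the values array even if the SQL never mentions it.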
function primaryKeyMap(rows: PostgresPrimaryKeyRow[]): Map<string, Set<string>> {
const grouped = new Map<string, Set<string>>();
for (const row of rows) {
@ -207,6 +254,14 @@ function primaryKeyMap(rows: PostgresPrimaryKeyRow[]): Map<string, Set<string>>
return grouped;
}
function isDeniedError(error: unknown): boolean {
if (!error || typeof error !== 'object') {
return false;
}
const code = (error as { code?: unknown }).code;
return code === '42501' || code === '42P01';
}
function queryRows(result: KtxPostgresQueryResult): unknown[][] {
const headers = (result.fields ?? []).map((field) => field.name);
return result.rows.map((row) => headers.map((header) => row[header]));
@ -242,6 +297,23 @@ function numberValue(value: unknown): number | undefined {
return typeof value === 'number' && Number.isFinite(value) ? value : undefined;
}
function positiveIntegerConfigValue(input: {
connection: KtxPostgresConnectionConfig;
key: keyof KtxPostgresConnectionConfig;
connectionId: string;
defaultValue: number;
}): number {
const value = input.connection[input.key];
if (value === undefined) {
return input.defaultValue;
}
const numberValue = Number(value);
if (!Number.isInteger(numberValue) || numberValue < 1) {
throw new Error(`connections.${input.connectionId}.${String(input.key)} must be a positive integer`);
}
return numberValue;
}
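
Reviewer note: the validation rule used for `maxConnections` can be exercised in isolation. The sketch below mirrors the check in `positiveIntegerConfigValue` with a simplified, hypothetical signature (`positiveIntegerOrDefault` takes the raw value and a config path directly instead of a connection object).

```typescript
// Hypothetical simplified form of the positive-integer config check.
function positiveIntegerOrDefault(raw: unknown, path: string, defaultValue: number): number {
  if (raw === undefined) {
    return defaultValue;
  }
  // Number() accepts numeric strings such as "8", which config files and
  // environment-substituted values often produce.
  const parsed = Number(raw);
  if (!Number.isInteger(parsed) || parsed < 1) {
    throw new Error(`${path} must be a positive integer`);
  }
  return parsed;
}

// Helper used only by this example to capture the thrown message.
function messageOf(run: () => unknown): string | null {
  try {
    run();
    return null;
  } catch (error) {
    return error instanceof Error ? error.message : String(error);
  }
}

console.log(positiveIntegerOrDefault(undefined, 'connections.warehouse.maxConnections', 10));
console.log(positiveIntegerOrDefault('8', 'connections.warehouse.maxConnections', 10));
console.log(messageOf(() => positiveIntegerOrDefault(0, 'connections.warehouse.maxConnections', 10)));
```

Under this rule, `0`, negative numbers, fractions, and non-numeric strings are all rejected with the same message, while an absent key falls back to the default.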
function parsePostgresUrl(url: string): Partial<KtxPostgresConnectionConfig> {
const parsed = new URL(url);
const sslmode = parsed.searchParams.get('sslmode') ?? undefined;
@ -276,7 +348,7 @@ export function isKtxPostgresConnectionConfig(
connection: KtxPostgresConnectionConfig | undefined,
): connection is KtxPostgresConnectionConfig {
const driver = String(connection?.driver ?? '').toLowerCase();
return driver === 'postgres' || driver === 'postgresql';
return driver === 'postgres';
}
/** @internal */
@ -299,6 +371,12 @@ export function postgresPoolConfigFromConfig(input: {
const user = stringConfigValue(merged, 'username', env) ?? stringConfigValue(merged, 'user', env);
const password = stringConfigValue(merged, 'password', env);
const sslmode = normalizedSslMode(merged);
const maxConnections = positiveIntegerConfigValue({
connection: merged,
key: 'maxConnections',
connectionId: input.connectionId,
defaultValue: 10,
});
if (!referencedUrl && !host) {
throw new Error(`Native PostgreSQL connector requires connections.${input.connectionId}.host or url`);
@ -311,7 +389,7 @@ export function postgresPoolConfigFromConfig(input: {
}
const config: KtxPostgresPoolConfig = {
max: 10,
max: maxConnections,
idleTimeoutMillis: 30_000,
connectionTimeoutMillis: 10_000,
...(referencedUrl && sslmode !== 'prefer' && sslmode !== 'disable'
@ -347,7 +425,7 @@ export class KtxPostgresScanConnector implements KtxScanConnector {
private readonly poolFactory: KtxPostgresPoolFactory;
private readonly endpointResolver?: KtxPostgresEndpointResolver;
private readonly now: () => Date;
private readonly dialect = new KtxPostgresDialect();
private readonly dialect = getDialectForDriver('postgres');
private pool: KtxPostgresPool | null = null;
private lastIdlePoolError: Error | null = null;
private resolvedEndpoint: KtxPostgresResolvedEndpoint | null = null;
@ -366,12 +444,12 @@ export class KtxPostgresScanConnector implements KtxScanConnector {
this.id = `postgres:${options.connectionId}`;
}
async testConnection(): Promise<{ success: boolean; error?: string }> {
async testConnection(): Promise<KtxConnectorTestResult> {
try {
await this.query('SELECT 1');
return { success: true };
} catch (error) {
return { success: false, error: error instanceof Error ? error.message : String(error) };
return connectorTestFailure(error);
}
}
@ -379,10 +457,11 @@ export class KtxPostgresScanConnector implements KtxScanConnector {
this.assertConnection(input.connectionId);
const schemas = schemasFromConnection(this.connection);
const allTables: KtxSchemaTable[] = [];
const snapshotWarnings: KtxScanWarning[] = [];
for (const schema of schemas) {
const scopedNames = input.tableScope ? scopedTableNames(input.tableScope, { catalog: null, db: schema }) : null;
if (scopedNames && scopedNames.length === 0) continue;
const tables = await this.loadSchemaTables(schema, scopedNames);
const tables = await this.loadSchemaTables(schema, scopedNames, snapshotWarnings);
allTables.push(...tables);
}
return {
@ -398,6 +477,7 @@ export class KtxPostgresScanConnector implements KtxScanConnector {
total_columns: allTables.reduce((sum, table) => sum + table.columns.length, 0),
},
tables: allTables,
warnings: snapshotWarnings,
};
}
@ -434,7 +514,7 @@ export class KtxPostgresScanConnector implements KtxScanConnector {
const limitedSql = limitSqlForExecution(assertReadOnlySql(input.sql), input.maxRows);
const prepared = Array.isArray(input.params)
? { sql: limitedSql, params: input.params }
: this.dialect.prepareQuery(limitedSql, input.params);
: preparePostgresReadOnlyQuery(limitedSql, input.params);
const result = await this.query(prepared.sql, prepared.params);
return { ...result, rowCount: result.rows.length };
}
@ -529,6 +609,7 @@ export class KtxPostgresScanConnector implements KtxScanConnector {
[filterSchemas],
);
return rows.map((row) => ({
catalog: null,
schema: row.schema_name,
name: row.table_name,
kind: row.table_kind === 'v' ? ('view' as const) : ('table' as const),
@ -546,7 +627,11 @@ export class KtxPostgresScanConnector implements KtxScanConnector {
}
}
private async loadSchemaTables(schema: string, scopedNames: readonly string[] | null): Promise<KtxSchemaTable[]> {
private async loadSchemaTables(
schema: string,
scopedNames: readonly string[] | null,
snapshotWarnings: KtxScanWarning[],
): Promise<KtxSchemaTable[]> {
if (scopedNames && scopedNames.length === 0) return [];
const pgCatalogScopeClause = scopedNames ? 'AND c.relname = ANY($2)' : '';
const tableConstraintScopeClause = scopedNames ? 'AND tc.table_name = ANY($2)' : '';
@ -591,8 +676,11 @@ export class KtxPostgresScanConnector implements KtxScanConnector {
`,
[schema, ...scopeValues],
);
const primaryKeys = await this.queryRaw<PostgresPrimaryKeyRow>(
`
const primaryKeysResult = await tryConstraintQuery(
{ schema, kind: 'primary_key', isDeniedError },
() =>
this.queryRaw<PostgresPrimaryKeyRow>(
`
SELECT tc.table_name, kcu.column_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
@ -603,10 +691,18 @@ export class KtxPostgresScanConnector implements KtxScanConnector {
${tableConstraintScopeClause}
ORDER BY tc.table_name, kcu.ordinal_position
`,
[schema, ...scopeValues],
[schema, ...scopeValues],
),
);
const foreignKeys = await this.queryRaw<PostgresForeignKeyRow>(
`
const primaryKeys = primaryKeysResult.ok ? primaryKeysResult.value : [];
if (!primaryKeysResult.ok) {
snapshotWarnings.push(primaryKeysResult.warning);
}
const foreignKeysResult = await tryConstraintQuery(
{ schema, kind: 'foreign_key', isDeniedError },
() =>
this.queryRaw<PostgresForeignKeyRow>(
`
SELECT
tc.table_name,
kcu.column_name,
@ -626,8 +722,13 @@ export class KtxPostgresScanConnector implements KtxScanConnector {
${tableConstraintScopeClause}
ORDER BY tc.table_name, kcu.column_name
`,
[schema, ...scopeValues],
[schema, ...scopeValues],
),
);
const foreignKeys = foreignKeysResult.ok ? foreignKeysResult.value : [];
if (!foreignKeysResult.ok) {
snapshotWarnings.push(foreignKeysResult.warning);
}
const columnsByTable = groupByTable(columns);
const primaryKeysByTable = primaryKeyMap(primaryKeys);

@ -1,9 +1,18 @@
import type { KtxDialect } from '../../context/connections/dialects.js';
import {
columnDisplayPartCount,
formatDialectDisplayRef,
formatDialectTableName,
limitOffsetClause,
parseDialectDisplayRef,
} from '../../context/connections/dialect-helpers.js';
import type { KtxSchemaDimensionType, KtxTableRef } from '../../context/scan/types.js';
type PostgresTableNameRef = Pick<KtxTableRef, 'name'> & Partial<Pick<KtxTableRef, 'catalog' | 'db'>>;
export class KtxPostgresDialect {
readonly type = 'postgresql';
/** @internal */
export class KtxPostgresDialect implements KtxDialect {
readonly type = 'postgres' as const;
private readonly typeMappings: Record<string, KtxSchemaDimensionType> = {
timestamp: 'time',
@ -45,9 +54,19 @@ export class KtxPostgresDialect {
}
formatTableName(table: PostgresTableNameRef): string {
return table.db
? `${this.quoteIdentifier(table.db)}.${this.quoteIdentifier(table.name)}`
: this.quoteIdentifier(table.name);
return formatDialectTableName(table, this.quoteIdentifier.bind(this), 'ansi');
}
formatDisplayRef(table: PostgresTableNameRef): string {
return formatDialectDisplayRef(table, 'ansi');
}
parseDisplayRef(display: string): KtxTableRef | null {
return parseDialectDisplayRef(display, 'ansi');
}
columnDisplayTablePartCount(): 1 | 2 | 3 {
return columnDisplayPartCount('ansi');
}
mapDataType(nativeType: string): string {
@ -92,25 +111,6 @@ export class KtxPostgresDialect {
return `SELECT ${quotedColumn} FROM ${tableName} WHERE ${quotedColumn} IS NOT NULL AND TRIM(CAST(${quotedColumn} AS TEXT)) != '' LIMIT ${limit}`;
}
prepareQuery(sql: string, params?: Record<string, unknown>): { sql: string; params?: unknown[] } {
if (!params) {
return { sql, params: undefined };
}
const paramNames = Object.keys(params);
const values: unknown[] = new Array(paramNames.length);
const paramIndexMap = new Map<string, number>();
paramNames.forEach((name, index) => {
paramIndexMap.set(name, index + 1);
values[index] = params[name];
});
const sortedKeys = [...paramNames].sort((a, b) => b.length - a.length);
let parameterizedQuery = sql;
for (const name of sortedKeys) {
parameterizedQuery = parameterizedQuery.replace(new RegExp(`:${name}\\b`, 'g'), `$${paramIndexMap.get(name)}`);
}
return { sql: parameterizedQuery, params: values };
}
getRandomSampleFilter(samplePct: number): string {
if (samplePct <= 0 || samplePct >= 1) {
return '';
@ -126,7 +126,11 @@ export class KtxPostgresDialect {
}
getLimitOffsetClause(limit: number, offset?: number): string {
return offset !== undefined && offset > 0 ? `LIMIT ${limit} OFFSET ${offset}` : `LIMIT ${limit}`;
return limitOffsetClause(limit, offset);
}
getTopClause(_limit: number): string {
return '';
}
getNullCountExpression(column: string): string {
@ -137,6 +141,18 @@ export class KtxPostgresDialect {
return `COUNT(DISTINCT ${column})`;
}
textLengthExpression(columnSql: string): string {
return `LENGTH(CAST(${columnSql} AS TEXT))`;
}
castToText(columnSql: string): string {
return `CAST(${columnSql} AS TEXT)`;
}
getSampleValueAggregation(innerSql: string): string {
return `(SELECT STRING_AGG(CAST(value AS TEXT), CHR(31)) FROM (${innerSql}) AS relationship_profile_values)`;
}
generateCardinalitySampleQuery(tableName: string, columnName: string, sampleSize: number): string {
return `
WITH sampled AS (
@ -191,23 +207,4 @@ export class KtxPostgresDialect {
FROM sampled
`;
}
getTimeTruncExpression(
column: string,
granularity: 'day' | 'week' | 'month' | 'quarter' | 'year',
timezone?: string,
): string {
const col = timezone ? `(${column} AT TIME ZONE '${timezone.replace(/'/g, "''")}')` : column;
return `DATE_TRUNC('${granularity}', ${col})`;
}
getCustomTimeTruncExpression(column: string, interval: string, origin?: string, timezone?: string): string {
const col = timezone ? `(${column} AT TIME ZONE '${timezone.replace(/'/g, "''")}')` : column;
const originExpr = origin ? `TIMESTAMP '${origin.replace(/'/g, "''")}'` : "TIMESTAMP '1970-01-01'";
return `${originExpr} + FLOOR(EXTRACT(EPOCH FROM (${col} - ${originExpr})) / EXTRACT(EPOCH FROM INTERVAL '${interval.replace(/'/g, "''")}')) * INTERVAL '${interval.replace(/'/g, "''")}'`;
}
parseIntervalToSql(interval: string): string {
return `INTERVAL '${interval.replace(/'/g, "''")}'`;
}
}
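
Reviewer note: `getLimitOffsetClause` now delegates to a shared `limitOffsetClause` helper whose implementation is not part of this hunk. Assuming the helper preserves the inline expression it replaces, its behavior would look like the sketch below. The name `limitOffsetClauseSketch` is hypothetical.

```typescript
// Assumption: the shared helper keeps the behavior of the removed inline expression.
function limitOffsetClauseSketch(limit: number, offset?: number): string {
  // OFFSET is emitted only for a strictly positive offset.
  return offset !== undefined && offset > 0 ? `LIMIT ${limit} OFFSET ${offset}` : `LIMIT ${limit}`;
}

console.log(limitOffsetClauseSketch(10));
console.log(limitOffsetClauseSketch(10, 0));
console.log(limitOffsetClauseSketch(10, 20));
```

Treating `offset = 0` the same as an omitted offset keeps the generated SQL identical for the first page regardless of how the caller expresses it.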

@ -2,12 +2,34 @@ import { createPrivateKey } from 'node:crypto';
import { readFileSync } from 'node:fs';
import { homedir } from 'node:os';
import { resolve } from 'node:path';
import { getDialectForDriver } from '../../context/connections/dialects.js';
import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js';
import { createKtxConnectorCapabilities, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaColumn, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableRef, type KtxTableSampleInput, type KtxTableListEntry, type KtxTableSampleResult } from '../../context/scan/types.js';
import { tryConstraintQuery } from '../../context/scan/constraint-discovery.js';
import { scopedTableNames } from '../../context/scan/table-ref.js';
import {
connectorTestFailure,
createKtxConnectorCapabilities,
type KtxConnectorTestResult,
type KtxColumnSampleInput,
type KtxColumnSampleResult,
type KtxColumnStatsInput,
type KtxColumnStatsResult,
type KtxQueryResult,
type KtxReadOnlyQueryInput,
type KtxScanConnector,
type KtxScanContext,
type KtxScanInput,
type KtxScanWarning,
type KtxSchemaColumn,
type KtxSchemaSnapshot,
type KtxSchemaTable,
type KtxTableListEntry,
type KtxTableRef,
type KtxTableSampleInput,
type KtxTableSampleResult,
} from '../../context/scan/types.js';
import snowflake from 'snowflake-sdk';
import type { Bind, Binds, Connection, ConnectionOptions } from 'snowflake-sdk';
import { KtxSnowflakeDialect } from './dialect.js';
import { assertSafeSnowflakeIdentifier, quoteSnowflakeIdentifier } from './identifiers.js';
import { configureSnowflakeSdkLogger } from './sdk-logger.js';
@ -24,7 +46,7 @@ export interface KtxSnowflakeConnectionConfig {
privateKey?: string;
passphrase?: string;
role?: string;
maxSessions?: number;
maxConnections?: number;
[key: string]: unknown;
}
@ -39,7 +61,7 @@ export interface KtxSnowflakeResolvedConnectionConfig {
privateKey?: string;
passphrase?: string;
role?: string;
maxSessions: number;
maxConnections: number;
}
export interface KtxSnowflakeRawColumnMetadata {
@ -166,6 +188,13 @@ function firstNumber(value: unknown): number | null {
return Number.isFinite(numberValue) ? numberValue : null;
}
function isDeniedError(error: unknown): boolean {
if (error instanceof Error) {
return /insufficient privileges|does not exist or not authorized/i.test(error.message);
}
return false;
}
function normalizeSnowflakeValue(value: unknown, columnType?: string): unknown {
if (columnType && DATE_TYPES.some((type) => columnType.toUpperCase().includes(type))) {
if (typeof value === 'number') {
@ -202,6 +231,14 @@ function toSnowflakeBinds(params: unknown[] | undefined): Binds | undefined {
return params?.map((value) => toSnowflakeBind(value));
}
/** @internal */
export function prepareSnowflakeReadOnlyQuery(
sql: string,
params?: Record<string, unknown>,
): { sql: string; params?: unknown[] } {
return { sql, params: params ? Object.values(params) : undefined };
}
export function isKtxSnowflakeConnectionConfig(
connection: KtxSnowflakeConnectionConfig | undefined,
): connection is KtxSnowflakeConnectionConfig {
@ -218,6 +255,10 @@ export function snowflakeConnectionConfigFromConfig(input: {
if (!isKtxSnowflakeConnectionConfig(input.connection)) {
throw new Error(`Native Snowflake connector cannot run driver "${inputDriver}"`);
}
const staleMaxSessionsKey = 'max' + 'Sessions';
if (Object.prototype.hasOwnProperty.call(input.connection, staleMaxSessionsKey)) {
throw new Error(`connections.${input.connectionId}.maxSessions has been renamed to maxConnections`);
}
const env = input.env ?? process.env;
const authMethod = input.connection?.authMethod ?? 'password';
const account = stringConfigValue(input.connection, 'account', env);
@ -249,9 +290,9 @@ export function snowflakeConnectionConfigFromConfig(input: {
database,
schemas: resolvedSchemas,
username,
maxSessions: positiveIntegerConfigValue({
maxConnections: positiveIntegerConfigValue({
connection: input.connection,
key: 'maxSessions',
key: 'maxConnections',
connectionId: input.connectionId,
defaultValue: 4,
}),
@ -322,7 +363,7 @@ class SnowflakeSdkDriver implements KtxSnowflakeDriver {
const message = error instanceof Error ? error.message : String(error);
if (/timeout/i.test(message) && /pool|acquire/i.test(message)) {
throw new Error(
"Snowflake session pool exhausted after 60s - consider lowering maxSessions or increasing your account's concurrent-statement limit.",
"Snowflake session pool exhausted after 60s - consider lowering maxConnections or increasing your account's concurrent-statement limit.",
);
}
throw error;
@ -399,6 +440,7 @@ class SnowflakeSdkDriver implements KtxSnowflakeDriver {
[this.resolved.database, ...(schemas ?? [])],
);
return result.rows.map((row) => ({
catalog: this.resolved.database,
schema: String(row[0]),
name: String(row[1]),
kind: String(row[2]) === 'VIEW' ? ('view' as const) : ('table' as const),
@ -424,7 +466,7 @@ class SnowflakeSdkDriver implements KtxSnowflakeDriver {
await this.query('SELECT 1');
return { success: true };
} catch (error) {
return { success: false, error: error instanceof Error ? error.message : String(error) };
return connectorTestFailure(error);
}
}
@ -432,7 +474,7 @@ class SnowflakeSdkDriver implements KtxSnowflakeDriver {
if (!this.pool) {
this.pool = snowflake.createPool(await this.resolveConnectionOptions(), {
min: 0,
max: this.resolved.maxSessions,
max: this.resolved.maxConnections,
evictionRunIntervalMillis: 30_000,
acquireTimeoutMillis: 60_000,
});
@ -519,7 +561,7 @@ export class KtxSnowflakeScanConnector implements KtxScanConnector {
private readonly resolved: KtxSnowflakeResolvedConnectionConfig;
private readonly driverFactory: KtxSnowflakeDriverFactory;
private readonly dialect = new KtxSnowflakeDialect();
private readonly dialect = getDialectForDriver('snowflake');
private readonly now: () => Date;
private driverInstance: KtxSnowflakeDriver | null = null;
@ -533,20 +575,30 @@ export class KtxSnowflakeScanConnector implements KtxScanConnector {
}
}
async testConnection(): Promise<{ success: boolean; error?: string }> {
async testConnection(): Promise<KtxConnectorTestResult> {
return this.getDriver().test();
}
async introspect(input: KtxScanInput, _ctx: KtxScanContext): Promise<KtxSchemaSnapshot> {
this.assertConnection(input.connectionId);
const tables: KtxSchemaTable[] = [];
const snapshotWarnings: KtxScanWarning[] = [];
for (const schemaName of this.resolved.schemas) {
const scopedNames = input.tableScope
? scopedTableNames(input.tableScope, { catalog: this.resolved.database, db: schemaName })
: null;
if (scopedNames && scopedNames.length === 0) continue;
const rawTables = await this.getDriver().getSchemaMetadata(schemaName, scopedNames);
const primaryKeys = await this.primaryKeys(rawTables.map((table) => table.name), schemaName);
const primaryKeysResult = await tryConstraintQuery(
{ schema: schemaName, kind: 'primary_key', isDeniedError },
() => this.primaryKeys(rawTables.map((table) => table.name), schemaName),
);
const primaryKeys = primaryKeysResult.ok
? primaryKeysResult.value
: new Map(rawTables.map((table) => [table.name, new Set<string>()]));
if (!primaryKeysResult.ok) {
snapshotWarnings.push(primaryKeysResult.warning);
}
tables.push(...rawTables.map((table) => this.toSchemaTable(table, primaryKeys)));
}
return {
@ -563,6 +615,7 @@ export class KtxSnowflakeScanConnector implements KtxScanConnector {
total_columns: tables.reduce((sum, table) => sum + table.columns.length, 0),
},
tables,
warnings: snapshotWarnings,
};
}
@ -593,7 +646,7 @@ export class KtxSnowflakeScanConnector implements KtxScanConnector {
async executeReadOnly(input: KtxSnowflakeReadOnlyQueryInput, _ctx: KtxScanContext): Promise<KtxQueryResult> {
this.assertConnection(input.connectionId);
const limitedSql = limitSqlForExecution(assertReadOnlySql(input.sql), input.maxRows);
const prepared = this.dialect.prepareQuery(limitedSql, input.params);
const prepared = prepareSnowflakeReadOnlyQuery(limitedSql, input.params);
return this.getDriver().query(prepared.sql, prepared.params);
}
@ -654,6 +707,7 @@ export class KtxSnowflakeScanConnector implements KtxScanConnector {
[this.resolved.database, ...(schemas ?? [])],
);
return result.rows.map((row) => ({
catalog: this.resolved.database,
schema: String(row[0]),
name: String(row[1]),
kind: String(row[2]) === 'VIEW' ? ('view' as const) : ('table' as const),
@ -686,9 +740,8 @@ export class KtxSnowflakeScanConnector implements KtxScanConnector {
return grouped;
}
const tableNamePlaceholders = tableNames.map(() => '?').join(', ');
try {
const result = await this.getDriver().query(
`
const result = await this.getDriver().query(
`
SELECT tc.TABLE_NAME, kcu.COLUMN_NAME
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc
JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu
@ -701,16 +754,12 @@ export class KtxSnowflakeScanConnector implements KtxScanConnector {
AND tc.TABLE_NAME IN (${tableNamePlaceholders})
ORDER BY tc.TABLE_NAME, kcu.ORDINAL_POSITION
`,
[schemaName, this.resolved.database, ...tableNames],
);
for (const row of result.rows) {
const tableName = String(row[0]);
const columnName = String(row[1]);
grouped.get(tableName)?.add(columnName);
}
} catch {
// INFORMATION_SCHEMA.KEY_COLUMN_USAGE often isn't granted to read-only roles;
// continue with empty PK map and let FK inference + profiling carry the slack.
[schemaName, this.resolved.database, ...tableNames],
);
for (const row of result.rows) {
const tableName = String(row[0]);
const columnName = String(row[1]);
grouped.get(tableName)?.add(columnName);
}
return grouped;
}
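
Reviewer note: the silent `catch` in `primaryKeys` is gone, and callers now wrap the query in `tryConstraintQuery` and push a warning when it fails. The helper itself is not shown in this diff, so the sketch below is only a guess at its shape, inferred from the call sites (`{ schema, kind, isDeniedError }`, a thunk, and a result with `ok`, `value`, and `warning`). The warning fields, the rethrow of non-denied errors, and all names ending in `Sketch` are assumptions.

```typescript
// Hypothetical shapes inferred from the call sites; not the project's real types.
type ConstraintKindSketch = 'primary_key' | 'foreign_key';
interface ScanWarningSketch {
  code: string;
  message: string;
}
type ConstraintResultSketch<T> = { ok: true; value: T } | { ok: false; warning: ScanWarningSketch };

async function tryConstraintQuerySketch<T>(
  options: { schema: string; kind: ConstraintKindSketch; isDeniedError: (error: unknown) => boolean },
  run: () => Promise<T>,
): Promise<ConstraintResultSketch<T>> {
  try {
    return { ok: true, value: await run() };
  } catch (error) {
    // Assumption: only permission-style failures are downgraded to warnings.
    if (!options.isDeniedError(error)) {
      throw error;
    }
    return {
      ok: false,
      warning: {
        code: 'constraint_discovery_denied',
        message: `Could not read ${options.kind} constraints for schema ${options.schema}`,
      },
    };
  }
}

// Postgres-style classifier: SQLSTATE 42501 (insufficient_privilege) and 42P01 (undefined_table).
function isDeniedSketch(error: unknown): boolean {
  if (!error || typeof error !== 'object') return false;
  const code = (error as { code?: unknown }).code;
  return code === '42501' || code === '42P01';
}

// Caller pattern mirroring the diff: fall back to an empty result and record the warning.
async function collectPrimaryKeysSketch(
  run: () => Promise<string[]>,
  warnings: ScanWarningSketch[],
): Promise<string[]> {
  const result = await tryConstraintQuerySketch(
    { schema: 'analytics', kind: 'primary_key', isDeniedError: isDeniedSketch },
    run,
  );
  if (!result.ok) {
    warnings.push(result.warning);
    return [];
  }
  return result.value;
}
```

The design point is that a read-only role without access to constraint metadata still produces a usable snapshot, but the gap is now reported in `warnings` instead of being swallowed.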

@ -1,9 +1,18 @@
import type { KtxDialect } from '../../context/connections/dialects.js';
import {
columnDisplayPartCount,
formatDialectDisplayRef,
formatDialectTableName,
limitOffsetClause,
parseDialectDisplayRef,
} from '../../context/connections/dialect-helpers.js';
import type { KtxSchemaDimensionType, KtxTableRef } from '../../context/scan/types.js';
type SnowflakeTableNameRef = Pick<KtxTableRef, 'name'> & Partial<Pick<KtxTableRef, 'catalog' | 'db'>>;
export class KtxSnowflakeDialect {
readonly type = 'snowflake';
/** @internal */
export class KtxSnowflakeDialect implements KtxDialect {
readonly type = 'snowflake' as const;
private readonly typeMappings: Record<string, KtxSchemaDimensionType> = {
TIMESTAMP_NTZ: 'time',
@ -45,13 +54,19 @@ export class KtxSnowflakeDialect {
}
formatTableName(table: SnowflakeTableNameRef): string {
if (table.catalog && table.db) {
return `${this.quoteIdentifier(table.catalog)}.${this.quoteIdentifier(table.db)}.${this.quoteIdentifier(table.name)}`;
}
if (table.db) {
return `${this.quoteIdentifier(table.db)}.${this.quoteIdentifier(table.name)}`;
}
return this.quoteIdentifier(table.name);
return formatDialectTableName(table, this.quoteIdentifier.bind(this), 'three-part');
}
formatDisplayRef(table: SnowflakeTableNameRef): string {
return formatDialectDisplayRef(table, 'three-part');
}
parseDisplayRef(display: string): KtxTableRef | null {
return parseDialectDisplayRef(display, 'three-part');
}
columnDisplayTablePartCount(): 1 | 2 | 3 {
return columnDisplayPartCount('three-part');
}
mapDataType(nativeType: string): string {
@ -96,10 +111,6 @@ export class KtxSnowflakeDialect {
return `SELECT ${quotedColumn} FROM ${tableName} WHERE ${quotedColumn} IS NOT NULL AND TRIM(CAST(${quotedColumn} AS STRING)) != '' LIMIT ${limit}`;
}
prepareQuery(sql: string, params?: Record<string, unknown>): { sql: string; params?: unknown[] } {
return { sql, params: params ? Object.values(params) : undefined };
}
getRandomSampleFilter(samplePct: number): string {
if (samplePct <= 0 || samplePct >= 1) {
return '';
@ -115,7 +126,11 @@ export class KtxSnowflakeDialect {
}
getLimitOffsetClause(limit: number, offset?: number): string {
return offset !== undefined && offset > 0 ? `LIMIT ${limit} OFFSET ${offset}` : `LIMIT ${limit}`;
return limitOffsetClause(limit, offset);
}
getTopClause(_limit: number): string {
return '';
}
getNullCountExpression(column: string): string {
@ -126,6 +141,18 @@ export class KtxSnowflakeDialect {
return `APPROX_COUNT_DISTINCT(${column})`;
}
textLengthExpression(columnSql: string): string {
return `LENGTH(CAST(${columnSql} AS TEXT))`;
}
castToText(columnSql: string): string {
return `CAST(${columnSql} AS VARCHAR)`;
}
getSampleValueAggregation(innerSql: string): string {
return `(SELECT LISTAGG(CAST(value AS VARCHAR), '\\x1f') FROM (${innerSql}) AS relationship_profile_values)`;
}
generateCardinalitySampleQuery(tableName: string, columnName: string, sampleSize: number): string {
return `
WITH sampled AS (
@ -164,24 +191,4 @@ export class KtxSnowflakeDialect {
FROM sampled
`;
}
getTimeTruncExpression(
column: string,
granularity: 'day' | 'week' | 'month' | 'quarter' | 'year',
timezone?: string,
): string {
const target = timezone ? `CONVERT_TIMEZONE('UTC', '${timezone}', ${column})` : column;
return `DATE_TRUNC('${granularity}', ${target})`;
}
getCustomTimeTruncExpression(column: string, interval: string, origin?: string, timezone?: string): string {
const target = timezone ? `CONVERT_TIMEZONE('UTC', '${timezone}', ${column})` : column;
const [amount, unit] = interval.split(' ');
const originExpr = origin ? `'${origin}'::TIMESTAMP` : `'1970-01-01'::TIMESTAMP`;
return `DATEADD(${unit}, FLOOR(DATEDIFF(${unit}, ${originExpr}, ${target}) / ${amount}) * ${amount}, ${originExpr})`;
}
parseIntervalToSql(interval: string): string {
return `INTERVAL '${interval}'`;
}
}
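
Reviewer note: all three dialects now route `formatTableName` through `formatDialectTableName` with a style argument (`'ansi'` for MySQL and Postgres, `'three-part'` for Snowflake). The helper's source is not in this diff. Assuming it reproduces the inline bodies it replaces, the two styles would differ as in the sketch below. `formatTableNameSketch`, `TableNameRefSketch`, and the double-quote `quote` function are hypothetical stand-ins.

```typescript
// Hypothetical reconstruction based on the removed inline formatTableName bodies.
type TableRefStyleSketch = 'ansi' | 'three-part';
interface TableNameRefSketch {
  name: string;
  db?: string | null;
  catalog?: string | null;
}

function formatTableNameSketch(
  table: TableNameRefSketch,
  quote: (identifier: string) => string,
  style: TableRefStyleSketch,
): string {
  const parts: string[] = [];
  // Only the three-part style emits the catalog, and only when a schema (db) is also known.
  if (style === 'three-part' && table.catalog && table.db) {
    parts.push(table.catalog);
  }
  if (table.db) {
    parts.push(table.db);
  }
  parts.push(table.name);
  return parts.map((part) => quote(part)).join('.');
}

// Illustrative quoting only; each real dialect supplies its own quoteIdentifier.
const quoteSketch = (identifier: string): string => `"${identifier.replace(/"/g, '""')}"`;

const ordersRef: TableNameRefSketch = { catalog: 'ANALYTICS', db: 'PUBLIC', name: 'ORDERS' };
console.log(formatTableNameSketch(ordersRef, quoteSketch, 'ansi'));
console.log(formatTableNameSketch(ordersRef, quoteSketch, 'three-part'));
console.log(formatTableNameSketch({ name: 'ORDERS' }, quoteSketch, 'three-part'));
```

Passing the quoting function in, as the diff does with `this.quoteIdentifier.bind(this)`, lets one helper serve dialects with different identifier-quoting rules.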

@ -3,11 +3,11 @@ import { existsSync, readFileSync, statSync } from 'node:fs';
import { homedir } from 'node:os';
import { isAbsolute, resolve } from 'node:path';
import { fileURLToPath } from 'node:url';
import { getDialectForDriver } from '../../context/connections/dialects.js';
import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js';
import { normalizeQueryRows } from '../../context/connections/query-executor.js';
import { createKtxConnectorCapabilities, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaForeignKey, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableRef, type KtxTableSampleInput, type KtxTableSampleResult } from '../../context/scan/types.js';
import { connectorTestFailure, createKtxConnectorCapabilities, type KtxConnectorTestResult, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaForeignKey, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableListEntry, type KtxTableRef, type KtxTableSampleInput, type KtxTableSampleResult } from '../../context/scan/types.js';
import { scopedTableNames } from '../../context/scan/table-ref.js';
import { KtxSqliteDialect } from './dialect.js';
export interface KtxSqliteConnectionConfig {
driver?: string;
@ -125,7 +125,7 @@ export function isKtxSqliteConnectionConfig(
connection: KtxSqliteConnectionConfig | undefined,
): connection is KtxSqliteConnectionConfig {
const driver = String(connection?.driver ?? '').toLowerCase();
return driver === 'sqlite' || driver === 'sqlite3';
return driver === 'sqlite';
}
/** @internal */
@ -157,7 +157,7 @@ export class KtxSqliteScanConnector implements KtxScanConnector {
private readonly connectionId: string;
private readonly dbPath: string;
private readonly now: () => Date;
private readonly dialect = new KtxSqliteDialect();
private readonly dialect = getDialectForDriver('sqlite');
private db: Database.Database | null = null;
constructor(options: KtxSqliteScanConnectorOptions) {
@ -167,7 +167,7 @@ export class KtxSqliteScanConnector implements KtxScanConnector {
this.id = `sqlite:${options.connectionId}`;
}
async testConnection(): Promise<{ success: boolean; error?: string }> {
async testConnection(): Promise<KtxConnectorTestResult> {
try {
if (!existsSync(this.dbPath) || !statSync(this.dbPath).isFile()) {
return { success: false, error: `File not found: ${this.dbPath}` };
@ -175,7 +175,7 @@ export class KtxSqliteScanConnector implements KtxScanConnector {
this.database().prepare('SELECT 1').get();
return { success: true };
} catch (error) {
return { success: false, error: error instanceof Error ? error.message : String(error) };
return connectorTestFailure(error);
}
}
@ -209,6 +209,31 @@ export class KtxSqliteScanConnector implements KtxScanConnector {
};
}
async listSchemas(): Promise<string[]> {
return [];
}
async listTables(_schemas?: string[]): Promise<KtxTableListEntry[]> {
const rows = this.database()
.prepare(
`
SELECT name, type
FROM sqlite_master
WHERE type IN ('table', 'view')
AND name NOT LIKE 'sqlite_%'
ORDER BY name
`,
)
.all() as SqliteMasterRow[];
return rows.map((row) => ({
catalog: null,
schema: '',
name: row.name,
kind: row.type === 'view' ? ('view' as const) : ('table' as const),
}));
}
async sampleTable(input: KtxTableSampleInput, _ctx: KtxScanContext): Promise<KtxTableSampleResult> {
this.assertConnection(input.connectionId);
const result = this.query(this.dialect.generateSampleQuery(this.qTableName(input.table), input.limit, input.columns));


@ -1,9 +1,18 @@
import type { KtxDialect } from '../../context/connections/dialects.js';
import {
columnDisplayPartCount,
formatDialectDisplayRef,
formatDialectTableName,
limitOffsetClause,
parseDialectDisplayRef,
} from '../../context/connections/dialect-helpers.js';
import type { KtxSchemaDimensionType, KtxTableRef } from '../../context/scan/types.js';
type SqliteTableNameRef = Pick<KtxTableRef, 'name'> & Partial<Pick<KtxTableRef, 'catalog' | 'db'>>;
export class KtxSqliteDialect {
readonly type = 'sqlite';
/** @internal */
export class KtxSqliteDialect implements KtxDialect {
readonly type = 'sqlite' as const;
private readonly typeMappings: Record<string, KtxSchemaDimensionType> = {
DATETIME: 'time',
@ -29,7 +38,19 @@ export class KtxSqliteDialect {
}
formatTableName(table: SqliteTableNameRef): string {
return this.quoteIdentifier(table.name);
return formatDialectTableName(table, this.quoteIdentifier.bind(this), 'sqlite');
}
formatDisplayRef(table: SqliteTableNameRef): string {
return formatDialectDisplayRef(table, 'sqlite');
}
parseDisplayRef(display: string): KtxTableRef | null {
return parseDialectDisplayRef(display, 'sqlite');
}
columnDisplayTablePartCount(): 1 | 2 | 3 {
return columnDisplayPartCount('sqlite');
}
mapDataType(nativeType: string): string {
@ -76,10 +97,6 @@ export class KtxSqliteDialect {
return `SELECT ${quoted} FROM ${tableName} WHERE ${quoted} IS NOT NULL AND TRIM(CAST(${quoted} AS TEXT)) != '' LIMIT ${limit}`;
}
prepareQuery(sql: string, params?: Record<string, unknown>): { sql: string; params?: unknown } {
return params ? { sql, params } : { sql };
}
getRandomSampleFilter(samplePct: number): string {
if (samplePct <= 0 || samplePct >= 1) {
return '';
@ -92,7 +109,11 @@ export class KtxSqliteDialect {
}
getLimitOffsetClause(limit: number, offset?: number): string {
return offset !== undefined && offset > 0 ? `LIMIT ${limit} OFFSET ${offset}` : `LIMIT ${limit}`;
return limitOffsetClause(limit, offset);
}
getTopClause(_limit: number): string {
return '';
}
getNullCountExpression(column: string): string {
@ -103,6 +124,18 @@ export class KtxSqliteDialect {
return `COUNT(DISTINCT ${column})`;
}
textLengthExpression(columnSql: string): string {
return `LENGTH(CAST(${columnSql} AS TEXT))`;
}
castToText(columnSql: string): string {
return `CAST(${columnSql} AS TEXT)`;
}
getSampleValueAggregation(innerSql: string): string {
return `(SELECT GROUP_CONCAT(CAST(value AS TEXT), char(31)) FROM (${innerSql}) AS relationship_profile_values)`;
}
generateCardinalitySampleQuery(tableName: string, columnName: string, sampleSize: number): string {
return `
WITH sampled AS (
@ -143,35 +176,4 @@ export class KtxSqliteDialect {
FROM sampled
`;
}
getTimeTruncExpression(
column: string,
granularity: 'day' | 'week' | 'month' | 'quarter' | 'year',
_timezone?: string,
): string {
switch (granularity) {
case 'day':
return `DATE(${column})`;
case 'week':
return `DATE(${column}, 'weekday 0', '-6 days')`;
case 'month':
return `DATE(${column}, 'start of month')`;
case 'quarter':
return `DATE(${column}, 'start of month', '-' || ((CAST(STRFTIME('%m', ${column}) AS INTEGER) - 1) % 3) || ' months')`;
case 'year':
return `DATE(${column}, 'start of year')`;
}
}
getCustomTimeTruncExpression(column: string, interval: string, origin?: string, _timezone?: string): string {
const [amount, unit] = interval.split(' ');
const originExpr = origin ? `julianday('${origin}')` : `julianday('1970-01-01')`;
const unitDays = unit === 'day' ? 1 : unit === 'week' ? 7 : 30;
const intervalDays = Number(amount) * unitDays;
return `DATE(julianday('1970-01-01') + (CAST((julianday(${column}) - ${originExpr}) / ${intervalDays} AS INTEGER) * ${intervalDays}))`;
}
parseIntervalToSql(interval: string): string {
return `'${interval}'`;
}
}


@ -1,11 +1,34 @@
import { assertReadOnlySql } from '../../context/connections/read-only-sql.js';
import { createKtxConnectorCapabilities, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaColumn, type KtxSchemaForeignKey, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableListEntry, type KtxTableRef, type KtxTableSampleInput, type KtxTableSampleResult } from '../../context/scan/types.js';
import { getDialectForDriver } from '../../context/connections/dialects.js';
import { tryConstraintQuery } from '../../context/scan/constraint-discovery.js';
import { scopedTableNames } from '../../context/scan/table-ref.js';
import {
connectorTestFailure,
createKtxConnectorCapabilities,
type KtxConnectorTestResult,
type KtxColumnSampleInput,
type KtxColumnSampleResult,
type KtxColumnStatsInput,
type KtxColumnStatsResult,
type KtxQueryResult,
type KtxReadOnlyQueryInput,
type KtxScanConnector,
type KtxScanContext,
type KtxScanInput,
type KtxScanWarning,
type KtxSchemaColumn,
type KtxSchemaForeignKey,
type KtxSchemaSnapshot,
type KtxSchemaTable,
type KtxTableListEntry,
type KtxTableRef,
type KtxTableSampleInput,
type KtxTableSampleResult,
} from '../../context/scan/types.js';
import { readFileSync } from 'node:fs';
import { homedir } from 'node:os';
import { resolve } from 'node:path';
import sql from 'mssql';
import { KtxSqlServerDialect } from './dialect.js';
export interface KtxSqlServerConnectionConfig {
driver?: string;
@ -19,6 +42,7 @@ export interface KtxSqlServerConnectionConfig {
schema?: string;
schemas?: string[];
trustServerCertificate?: boolean;
maxConnections?: number;
[key: string]: unknown;
}
@ -136,6 +160,21 @@ function tableScopeSql(
return { clause: `AND ${columnExpression} IN (${placeholders.join(', ')})`, params };
}
/** @internal */
export function prepareSqlServerReadOnlyQuery(
sql: string,
params?: Record<string, unknown>,
): { sql: string; params?: Record<string, unknown> } {
if (!params) {
return { sql, params: undefined };
}
let parameterizedQuery = sql;
for (const key of Object.keys(params)) {
parameterizedQuery = parameterizedQuery.replace(new RegExp(`:${key}\\b`, 'g'), `@${key}`);
}
return { sql: parameterizedQuery, params };
}
class DefaultSqlServerPoolFactory implements KtxSqlServerPoolFactory {
async createPool(config: KtxSqlServerPoolConfig): Promise<KtxSqlServerPool> {
const pool = await new sql.ConnectionPool(config as sql.config).connect();
@ -197,6 +236,23 @@ function maybeNumber(value: unknown): number | undefined {
return typeof value === 'number' && Number.isFinite(value) ? value : undefined;
}
function positiveIntegerConfigValue(input: {
connection: KtxSqlServerConnectionConfig;
key: keyof KtxSqlServerConnectionConfig;
connectionId: string;
defaultValue: number;
}): number {
const value = input.connection[input.key];
if (value === undefined) {
return input.defaultValue;
}
const numberValue = Number(value);
if (!Number.isInteger(numberValue) || numberValue < 1) {
throw new Error(`connections.${input.connectionId}.${String(input.key)} must be a positive integer`);
}
return numberValue;
}
function schemaNames(connection: KtxSqlServerConnectionConfig, env: NodeJS.ProcessEnv): string[] {
if (Array.isArray(connection.schemas) && connection.schemas.length > 0) {
return connection.schemas.filter((schema) => schema.trim().length > 0).map((schema) => resolveStringReference(schema, env));
@ -219,6 +275,14 @@ function firstNumber(value: unknown): number | null {
return Number.isFinite(numberValue) ? numberValue : null;
}
function isDeniedError(error: unknown): boolean {
if (!error || typeof error !== 'object') {
return false;
}
const number = (error as { number?: unknown }).number;
return number === 229 || number === 230 || number === 297;
}
function limitSqlForSqlServerExecution(sqlText: string, maxRows: number | undefined): string {
const trimmed = assertReadOnlySql(sqlText).replace(/;+\s*$/, '');
if (!maxRows) {
@ -254,6 +318,12 @@ export function sqlServerConnectionPoolConfigFromConfig(input: {
const server = stringConfigValue(merged, 'host', env);
const database = stringConfigValue(merged, 'database', env);
const user = stringConfigValue(merged, 'username', env) ?? stringConfigValue(merged, 'user', env);
const maxConnections = positiveIntegerConfigValue({
connection: merged,
key: 'maxConnections',
connectionId: input.connectionId,
defaultValue: 10,
});
if (!server) {
throw new Error(`Native SQL Server connector requires connections.${input.connectionId}.host or url`);
@ -272,7 +342,7 @@ export function sqlServerConnectionPoolConfigFromConfig(input: {
user,
password: stringConfigValue(merged, 'password', env),
options: { encrypt: true, trustServerCertificate: merged.trustServerCertificate ?? true },
pool: { max: 10, min: 0, idleTimeoutMillis: 30000 },
pool: { max: maxConnections, min: 0, idleTimeoutMillis: 30000 },
};
}
@ -296,7 +366,7 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
private readonly poolFactory: KtxSqlServerPoolFactory;
private readonly endpointResolver?: KtxSqlServerEndpointResolver;
private readonly now: () => Date;
private readonly dialect = new KtxSqlServerDialect();
private readonly dialect = getDialectForDriver('sqlserver');
private pool: KtxSqlServerPool | null = null;
private resolvedEndpoint: KtxSqlServerResolvedEndpoint | null = null;
@ -316,23 +386,24 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
this.id = `sqlserver:${options.connectionId}`;
}
async testConnection(): Promise<{ success: boolean; error?: string }> {
async testConnection(): Promise<KtxConnectorTestResult> {
try {
await this.query('SELECT 1');
return { success: true };
} catch (error) {
return { success: false, error: error instanceof Error ? error.message : String(error) };
return connectorTestFailure(error);
}
}
async introspect(input: KtxScanInput, _ctx: KtxScanContext): Promise<KtxSchemaSnapshot> {
this.assertConnection(input.connectionId);
const tables: KtxSchemaTable[] = [];
const snapshotWarnings: KtxScanWarning[] = [];
for (const schemaName of this.schemas) {
const scopedNames = input.tableScope
? scopedTableNames(input.tableScope, { catalog: this.poolConfig.database, db: schemaName })
: null;
tables.push(...(await this.introspectSchema(schemaName, scopedNames)));
tables.push(...(await this.introspectSchema(schemaName, scopedNames, snapshotWarnings)));
}
return {
connectionId: this.connectionId,
@ -347,6 +418,7 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
total_columns: tables.reduce((sum, table) => sum + table.columns.length, 0),
},
tables,
warnings: snapshotWarnings,
};
}
@ -372,7 +444,7 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
async executeReadOnly(input: KtxSqlServerReadOnlyQueryInput, _ctx: KtxScanContext): Promise<KtxQueryResult> {
this.assertConnection(input.connectionId);
const limitedSql = limitSqlForSqlServerExecution(input.sql, input.maxRows);
const prepared = this.dialect.prepareQuery(limitedSql, input.params);
const prepared = prepareSqlServerReadOnlyQuery(limitedSql, input.params);
const result = await this.query(prepared.sql, prepared.params);
return { ...result, rowCount: result.rows.length };
}
@ -462,6 +534,7 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
params,
);
return rows.map((row) => ({
catalog: this.poolConfig.database,
schema: row.schema_name,
name: row.table_name,
kind: row.table_type === 'VIEW' ? ('view' as const) : ('table' as const),
@ -479,7 +552,11 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
}
}
private async introspectSchema(schemaName: string, scopedNames: readonly string[] | null): Promise<KtxSchemaTable[]> {
private async introspectSchema(
schemaName: string,
scopedNames: readonly string[] | null,
snapshotWarnings: KtxScanWarning[],
): Promise<KtxSchemaTable[]> {
if (scopedNames && scopedNames.length === 0) return [];
const tableScope = tableScopeSql(scopedNames, 'TABLE_NAME');
const tables = await this.queryRaw<{ table_name: string; table_type: string }>(
@ -510,8 +587,22 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
);
const tableComments = await this.tableComments(schemaName, scopedNames);
const columnComments = await this.columnComments(schemaName, scopedNames);
const primaryKeys = await this.primaryKeys(schemaName, scopedNames);
const foreignKeys = await this.foreignKeys(schemaName, scopedNames);
const primaryKeysResult = await tryConstraintQuery(
{ schema: schemaName, kind: 'primary_key', isDeniedError },
() => this.primaryKeys(schemaName, scopedNames),
);
const foreignKeysResult = await tryConstraintQuery(
{ schema: schemaName, kind: 'foreign_key', isDeniedError },
() => this.foreignKeys(schemaName, scopedNames),
);
const primaryKeys = primaryKeysResult.ok ? primaryKeysResult.value : new Map<string, Set<string>>();
const foreignKeys = foreignKeysResult.ok ? foreignKeysResult.value : [];
if (!primaryKeysResult.ok) {
snapshotWarnings.push(primaryKeysResult.warning);
}
if (!foreignKeysResult.ok) {
snapshotWarnings.push(foreignKeysResult.warning);
}
const rowCounts = await this.rowCounts(schemaName, scopedNames);
const columnsByTable = groupByTable(columns);
const foreignKeysByTable = groupByTable(foreignKeys);
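
A minimal sketch of the `:name` to `@name` parameter rewrite that `prepareSqlServerReadOnlyQuery` performs in this diff, restated as a standalone function so it can be run in isolation. The function name `toSqlServerNamedParams` and the sample queries are illustrative assumptions, not part of the change.

```typescript
// Standalone restatement of the rewrite loop shown in the diff.
// `toSqlServerNamedParams` is a hypothetical name used only for this sketch.
function toSqlServerNamedParams(
  sql: string,
  params?: Record<string, unknown>,
): { sql: string; params?: Record<string, unknown> } {
  if (!params) {
    return { sql, params: undefined };
  }
  let rewritten = sql;
  for (const key of Object.keys(params)) {
    // `\b` keeps `:id` from matching the prefix of a longer name such as `:identity`.
    rewritten = rewritten.replace(new RegExp(`:${key}\\b`, 'g'), `@${key}`);
  }
  return { sql: rewritten, params };
}

const prepared = toSqlServerNamedParams(
  'SELECT * FROM orders WHERE id = :id AND status = :status',
  { id: 1, status: 'open' },
);
console.log(prepared.sql);
// prints "SELECT * FROM orders WHERE id = @id AND status = @status"

// The rewrite is purely textual, so a placeholder inside a string literal is rewritten too.
console.log(toSqlServerNamedParams("SELECT ':id' AS s", { id: 1 }).sql);
// prints "SELECT '@id' AS s"
```

Because the replacement is a plain regular-expression pass, it does not distinguish placeholders from matching text inside string literals, and parameter keys are interpolated into the pattern without escaping. Callers would presumably keep keys to simple identifiers.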


@ -1,9 +1,18 @@
import type { KtxDialect } from '../../context/connections/dialects.js';
import {
columnDisplayPartCount,
formatDialectDisplayRef,
formatDialectTableName,
parseDialectDisplayRef,
safeSqlLimit,
} from '../../context/connections/dialect-helpers.js';
import type { KtxSchemaDimensionType, KtxTableRef } from '../../context/scan/types.js';
type SqlServerTableNameRef = Pick<KtxTableRef, 'name'> & Partial<Pick<KtxTableRef, 'catalog' | 'db'>>;
export class KtxSqlServerDialect {
readonly type = 'sqlserver';
/** @internal */
export class KtxSqlServerDialect implements KtxDialect {
readonly type = 'sqlserver' as const;
private readonly typeMappings: Record<string, KtxSchemaDimensionType> = {
datetime: 'time',
@ -39,9 +48,19 @@ export class KtxSqlServerDialect {
}
formatTableName(table: SqlServerTableNameRef): string {
return table.db
? `${this.quoteIdentifier(table.db)}.${this.quoteIdentifier(table.name)}`
: this.quoteIdentifier(table.name);
return formatDialectTableName(table, this.quoteIdentifier.bind(this), 'three-part');
}
formatDisplayRef(table: SqlServerTableNameRef): string {
return formatDialectDisplayRef(table, 'three-part');
}
parseDisplayRef(display: string): KtxTableRef | null {
return parseDialectDisplayRef(display, 'three-part');
}
columnDisplayTablePartCount(): 1 | 2 | 3 {
return columnDisplayPartCount('three-part');
}
mapDataType(nativeType: string): string {
@ -86,17 +105,6 @@ export class KtxSqlServerDialect {
return `SELECT TOP ${limit} ${quotedColumn} FROM ${tableName} WHERE ${quotedColumn} IS NOT NULL AND LTRIM(RTRIM(CAST(${quotedColumn} AS NVARCHAR(MAX)))) != ''`;
}
prepareQuery(sql: string, params?: Record<string, unknown>): { sql: string; params?: Record<string, unknown> } {
if (!params) {
return { sql, params: undefined };
}
let parameterizedQuery = sql;
for (const key of Object.keys(params)) {
parameterizedQuery = parameterizedQuery.replace(new RegExp(`:${key}\\b`, 'g'), `@${key}`);
}
return { sql: parameterizedQuery, params };
}
getRandomSampleFilter(samplePct: number): string {
if (samplePct <= 0 || samplePct >= 1) {
return '';
@ -111,12 +119,12 @@ export class KtxSqlServerDialect {
return `TABLESAMPLE (${samplePct * 100} PERCENT)`;
}
getLimitOffsetClause(limit: number, offset?: number): string {
return offset !== undefined && offset > 0 ? `OFFSET ${offset} ROWS FETCH NEXT ${limit} ROWS ONLY` : '';
getLimitOffsetClause(_limit: number, _offset?: number): string {
return '';
}
getTopClause(limit: number): string {
return `TOP ${limit}`;
return `TOP (${safeSqlLimit(limit)})`;
}
getNullCountExpression(column: string): string {
@ -127,6 +135,18 @@ export class KtxSqlServerDialect {
return `COUNT(DISTINCT ${column})`;
}
textLengthExpression(columnSql: string): string {
return `LEN(CAST(${columnSql} AS NVARCHAR(MAX)))`;
}
castToText(columnSql: string): string {
return `CAST(${columnSql} AS NVARCHAR(MAX))`;
}
getSampleValueAggregation(innerSql: string): string {
return `(SELECT STRING_AGG(CAST(value AS NVARCHAR(MAX)), CHAR(31)) FROM (${innerSql}) AS relationship_profile_values)`;
}
generateCardinalitySampleQuery(tableName: string, columnName: string, sampleSize: number): string {
return `
WITH sampled AS (
@ -167,35 +187,4 @@ export class KtxSqlServerDialect {
FROM sampled
`;
}
getTimeTruncExpression(
column: string,
granularity: 'day' | 'week' | 'month' | 'quarter' | 'year',
timezone?: string,
): string {
const col = timezone ? `${column} AT TIME ZONE 'UTC' AT TIME ZONE '${timezone}'` : column;
switch (granularity) {
case 'day':
return `CAST(${col} AS DATE)`;
case 'week':
return `DATEADD(WEEK, DATEDIFF(WEEK, 0, ${col}), 0)`;
case 'month':
return `DATEFROMPARTS(YEAR(${col}), MONTH(${col}), 1)`;
case 'quarter':
return `DATEFROMPARTS(YEAR(${col}), (DATEPART(QUARTER, ${col}) - 1) * 3 + 1, 1)`;
case 'year':
return `DATEFROMPARTS(YEAR(${col}), 1, 1)`;
}
}
getCustomTimeTruncExpression(column: string, interval: string, origin?: string, timezone?: string): string {
const col = timezone ? `${column} AT TIME ZONE 'UTC' AT TIME ZONE '${timezone}'` : column;
const [amount, unit] = interval.split(' ');
const originExpr = origin ? `'${origin}'` : `'1970-01-01'`;
return `DATEADD(${unit}, (DATEDIFF(${unit}, ${originExpr}, ${col}) / ${amount}) * ${amount}, ${originExpr})`;
}
parseIntervalToSql(interval: string): string {
return `'${interval}'`;
}
}


@ -1,8 +1,6 @@
import type { KtxProgressPort, KtxProgressUpdateOptions } from './context/scan/types.js';
import type { KtxCliIo } from './index.js';
import type { KtxIngestProgressUpdate } from './ingest.js';
import type { KtxManagedPythonInstallPolicy } from './managed-python-command.js';
import { publicDatabaseIngestMessage, publicQueryHistoryMessage } from './public-ingest-copy.js';
import type {
KtxPublicIngestArgs,
KtxPublicIngestDeps,
@ -10,7 +8,8 @@ import type {
KtxPublicIngestProject,
KtxPublicIngestTargetResult,
} from './public-ingest.js';
import { buildPublicIngestPlan, executePublicIngestTarget } from './public-ingest.js';
import { buildPublicIngestPlan, executePublicIngestTarget, publicProgressMessage } from './public-ingest.js';
import { createAggregateProgressPort } from './progress-port-adapter.js';
import { formatDuration } from './demo-metrics.js';
import { profileMark } from './startup-profile.js';
@ -88,7 +87,6 @@ export interface ContextBuildArgs {
targetConnectionId?: string;
all?: boolean;
entrypoint?: 'setup' | 'ingest';
depth?: Extract<KtxPublicIngestArgs, { command: 'run' }>['depth'];
queryHistory?: Extract<KtxPublicIngestArgs, { command: 'run' }>['queryHistory'];
queryHistoryWindowDays?: number;
scanMode?: Extract<KtxPublicIngestArgs, { command: 'run' }>['scanMode'];
@ -371,19 +369,17 @@ function retryCommand(input: {
projectDir?: string;
entrypoint?: 'setup' | 'ingest';
connectionId?: string;
depth?: 'fast' | 'deep';
queryHistory?: boolean;
queryHistoryWindowDays?: number;
}): string {
const projectPart = input.projectDir ? ` --project-dir ${input.projectDir}` : '';
if (input.entrypoint === 'ingest' && input.connectionId) {
const depthPart = input.depth ? ` --${input.depth}` : '';
const queryHistoryPart = input.queryHistory ? ' --query-history' : '';
const windowPart =
input.queryHistory && input.queryHistoryWindowDays !== undefined
? ` --query-history-window-days ${input.queryHistoryWindowDays}`
: '';
return `ktx ingest ${input.connectionId}${projectPart}${depthPart}${queryHistoryPart}${windowPart}`;
return `ktx ingest ${input.connectionId}${projectPart}${queryHistoryPart}${windowPart}`;
}
return input.projectDir ? `ktx setup --project-dir ${input.projectDir}` : 'ktx setup';
}
@ -694,7 +690,7 @@ function isLocalSqlAnalysisConnectionRefused(input: { capturedOutput?: string; f
function friendlyDriverName(driver: string): string {
const normalized = driver.toLowerCase();
if (normalized === 'postgres' || normalized === 'postgresql') return 'PostgreSQL';
if (normalized === 'postgres') return 'PostgreSQL';
if (normalized === 'mysql') return 'MySQL';
if (normalized === 'sqlserver') return 'SQL Server';
if (normalized === 'bigquery') return 'BigQuery';
@ -746,7 +742,6 @@ function appendRetryIfNeeded(input: {
projectDir: input.projectDir,
entrypoint: input.entrypoint,
connectionId: input.target.connectionId,
depth: input.target.databaseDepth,
queryHistory: input.target.queryHistory?.enabled === true,
queryHistoryWindowDays: input.target.queryHistory?.windowDays,
})}`;
@ -769,7 +764,6 @@ function failureTextForTarget(input: {
projectDir: input.projectDir,
entrypoint: input.entrypoint,
connectionId: input.target.connectionId,
depth: input.target.databaseDepth,
queryHistory: input.target.queryHistory?.enabled === true,
queryHistoryWindowDays: input.target.queryHistory?.windowDays,
})}`,
@ -784,7 +778,6 @@ function failureTextForTarget(input: {
projectDir: input.projectDir,
entrypoint: input.entrypoint,
connectionId: input.target.connectionId,
depth: input.target.databaseDepth,
queryHistory: input.target.queryHistory?.enabled === true,
queryHistoryWindowDays: input.target.queryHistory?.windowDays,
})}`,
@ -816,17 +809,6 @@ export function initViewState(targets: KtxPublicIngestPlanTarget[]): ContextBuil
};
}
function publicProgressMessage(message: string, target: KtxPublicIngestPlanTarget): string {
let current = message;
if (target.operation === 'database-ingest') {
current = publicDatabaseIngestMessage(current);
}
if (target.steps.includes('query-history')) {
current = publicQueryHistoryMessage(current, target.connectionId);
}
return current;
}
function formatProgressDetail(
update: Pick<KtxIngestProgressUpdate, 'percent' | 'message'>,
target: KtxPublicIngestPlanTarget,
@ -835,29 +817,6 @@ function formatProgressDetail(
return `[${percent}%] ${publicProgressMessage(update.message, target)}`;
}
function createContextBuildProgressPort(
onProgress: (update: KtxIngestProgressUpdate) => void,
state: { progress: number } = { progress: 0 },
start = 0,
weight = 1,
): KtxProgressPort {
return {
async update(value: number, message?: string, options?: KtxProgressUpdateOptions): Promise<void> {
const absoluteValue = start + Math.max(0, Math.min(1, value)) * weight;
state.progress = Math.max(state.progress, Math.min(1, absoluteValue));
if (!message) return;
onProgress({
percent: Math.max(0, Math.min(100, Math.round(state.progress * 100))),
message,
...(options?.transient !== undefined ? { transient: options.transient } : {}),
});
},
startPhase(phaseWeight: number): KtxProgressPort {
return createContextBuildProgressPort(onProgress, state, state.progress, weight * phaseWeight);
},
};
}
export async function runContextBuild(
project: KtxPublicIngestProject,
args: ContextBuildArgs,
@ -868,7 +827,6 @@ export async function runContextBuild(
projectDir: args.projectDir,
...(args.targetConnectionId ? { targetConnectionId: args.targetConnectionId } : {}),
all: args.all ?? true,
...(args.depth ? { depth: args.depth } : {}),
...(args.queryHistory ? { queryHistory: args.queryHistory } : {}),
...(args.queryHistoryWindowDays !== undefined ? { queryHistoryWindowDays: args.queryHistoryWindowDays } : {}),
...(args.scanMode ? { scanMode: args.scanMode } : {}),
@ -935,7 +893,6 @@ export async function runContextBuild(
all: args.all ?? true,
json: false,
inputMode: args.inputMode,
...(args.depth ? { depth: args.depth } : {}),
...(args.queryHistory ? { queryHistory: args.queryHistory } : {}),
...(args.queryHistoryWindowDays !== undefined ? { queryHistoryWindowDays: args.queryHistoryWindowDays } : {}),
...(args.scanMode ? { scanMode: args.scanMode } : {}),
@ -1030,7 +987,7 @@ export async function runContextBuild(
};
const progressDeps: KtxPublicIngestDeps = {
scanProgress: createContextBuildProgressPort(updateSchemaPhase),
scanProgress: createAggregateProgressPort(updateSchemaPhase),
ingestProgress: updateIngestPhase,
runtimeIo: io,
onPhaseStart,
@ -1040,7 +997,7 @@ export async function runContextBuild(
let result: KtxPublicIngestTargetResult | null = null;
let thrownError: unknown = null;
try {
result = await execTarget(targetState.target, runArgs, capture.io, progressDeps);
result = await execTarget(targetState.target, runArgs, capture.io, progressDeps, project);
} catch (error) {
thrownError = error;
}
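
The `createContextBuildProgressPort` helper removed in this file (in favour of `createAggregateProgressPort`, whose body is not shown here) maps nested phase progress onto one overall percentage. The sketch below restates that weighting logic under simplifying assumptions: `update` is synchronous, the `transient` option is dropped, and the names `ProgressUpdate`, `ProgressPort`, and `createProgressPort` are illustrative rather than the project's types.

```typescript
// Simplified stand-ins for the project's progress types (assumptions for this sketch).
interface ProgressUpdate {
  percent: number;
  message: string;
}

interface ProgressPort {
  update(value: number, message?: string): void;
  startPhase(phaseWeight: number): ProgressPort;
}

function createProgressPort(
  onProgress: (update: ProgressUpdate) => void,
  state: { progress: number } = { progress: 0 },
  start = 0,
  weight = 1,
): ProgressPort {
  return {
    update(value, message) {
      // Map the phase-local value (0..1) into the overall range owned by this port.
      const absolute = start + Math.max(0, Math.min(1, value)) * weight;
      // Shared state only moves forward, so reported progress never goes backwards.
      state.progress = Math.max(state.progress, Math.min(1, absolute));
      if (!message) return;
      onProgress({ percent: Math.round(state.progress * 100), message });
    },
    startPhase(phaseWeight) {
      // A child phase starts at the current overall progress and owns a slice of it.
      return createProgressPort(onProgress, state, state.progress, weight * phaseWeight);
    },
  };
}

const seen: ProgressUpdate[] = [];
const root = createProgressPort((update) => seen.push(update));

root.update(0.5, 'listing tables'); // overall 50%
const phase = root.startPhase(0.4); // child owns the next 40% of the bar
phase.update(0.5, 'sampling'); // overall 70%
phase.update(1, 'sampling done'); // overall 90%
root.update(0.2, 'late update'); // stays at 90% because progress is monotonic

console.log(seen.map((update) => `${update.percent}% ${update.message}`).join('\n'));
```

Sharing one `state` object across parent and child ports is what keeps the reported percentage monotonic even when a parent reports a smaller value after a child has advanced.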


@ -0,0 +1,87 @@
import type { KtxTableRef } from '../scan/types.js';
export type KtxDialectIdentifierShape = 'ansi' | 'sqlite' | 'three-part';
export type KtxDialectTableRef = Pick<KtxTableRef, 'name'> & Partial<Pick<KtxTableRef, 'catalog' | 'db'>>;
export function safeSqlLimit(limit: number): number {
return Math.max(1, Math.floor(limit));
}
function safeSqlOffset(offset: number | undefined): number | null {
if (offset === undefined) {
return null;
}
const normalized = Math.floor(offset);
return normalized > 0 ? normalized : null;
}
function cleanIdentifierPart(part: string): string {
return part.trim().replace(/^["'`\[]|["'`\]]$/g, '');
}
function splitDisplay(display: string): string[] {
return display.trim().split('.').map(cleanIdentifierPart).filter(Boolean);
}
function tableParts(table: KtxDialectTableRef, shape: KtxDialectIdentifierShape): string[] {
if (shape === 'sqlite') {
return [table.name];
}
return [table.catalog ?? null, table.db ?? null, table.name].filter((part): part is string => Boolean(part));
}
function acceptedDisplayPartCounts(shape: KtxDialectIdentifierShape): readonly number[] {
if (shape === 'sqlite') {
return [1];
}
if (shape === 'three-part') {
return [3];
}
return [2, 3];
}
export function formatDialectTableName(
table: KtxDialectTableRef,
quoteIdentifier: (identifier: string) => string,
shape: KtxDialectIdentifierShape,
): string {
return tableParts(table, shape).map(quoteIdentifier).join('.');
}
export function formatDialectDisplayRef(table: KtxDialectTableRef, shape: KtxDialectIdentifierShape): string {
return tableParts(table, shape).join('.');
}
export function parseDialectDisplayRef(display: string, shape: KtxDialectIdentifierShape): KtxTableRef | null {
const parts = splitDisplay(display);
if (!acceptedDisplayPartCounts(shape).includes(parts.length)) {
return null;
}
if (parts.length === 1) {
return { catalog: null, db: null, name: parts[0]! };
}
if (parts.length === 2) {
return { catalog: null, db: parts[0]!, name: parts[1]! };
}
if (parts.length === 3) {
return { catalog: parts[0]!, db: parts[1]!, name: parts[2]! };
}
return null;
}
export function columnDisplayPartCount(shape: KtxDialectIdentifierShape): 1 | 2 | 3 {
if (shape === 'sqlite') {
return 1;
}
if (shape === 'three-part') {
return 3;
}
return 2;
}
export function limitOffsetClause(limit: number, offset?: number): string {
const safeLimit = safeSqlLimit(limit);
const safeOffset = safeSqlOffset(offset);
return safeOffset === null ? `LIMIT ${safeLimit}` : `LIMIT ${safeLimit} OFFSET ${safeOffset}`;
}
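
A short usage sketch for the helpers introduced in this file. The functions are restated locally (condensed, with the illustrative names `TableRef`, `IdentifierShape`, and `parseDisplayRef`) so the block runs on its own. It is meant to show the behaviour implied by the code above, not to replace it.

```typescript
// Local, condensed restatement of the helpers above; type names are assumptions.
type TableRef = { catalog: string | null; db: string | null; name: string };
type IdentifierShape = 'ansi' | 'sqlite' | 'three-part';

function safeSqlLimit(limit: number): number {
  return Math.max(1, Math.floor(limit));
}

function limitOffsetClause(limit: number, offset?: number): string {
  const safeLimit = safeSqlLimit(limit);
  const normalized = offset === undefined ? 0 : Math.floor(offset);
  // Offsets that are missing, zero, or negative are omitted from the clause.
  return normalized > 0 ? `LIMIT ${safeLimit} OFFSET ${normalized}` : `LIMIT ${safeLimit}`;
}

function parseDisplayRef(display: string, shape: IdentifierShape): TableRef | null {
  const parts = display
    .trim()
    .split('.')
    .map((part) => part.trim().replace(/^["'`\[]|["'`\]]$/g, ''))
    .filter(Boolean);
  const accepted = shape === 'sqlite' ? [1] : shape === 'three-part' ? [3] : [2, 3];
  if (accepted.indexOf(parts.length) === -1) return null;
  if (parts.length === 1) return { catalog: null, db: null, name: parts[0]! };
  if (parts.length === 2) return { catalog: null, db: parts[0]!, name: parts[1]! };
  return { catalog: parts[0]!, db: parts[1]!, name: parts[2]! };
}

console.log(limitOffsetClause(10.7, 0)); // prints "LIMIT 10"
console.log(limitOffsetClause(0)); // prints "LIMIT 1"
console.log(limitOffsetClause(25, 50)); // prints "LIMIT 25 OFFSET 50"

// Three-part shapes reject two-part references; quoting characters are stripped.
console.log(parseDisplayRef('[analytics].[dbo].[orders]', 'three-part'));
console.log(parseDisplayRef('dbo.orders', 'three-part')); // prints null
console.log(parseDisplayRef('orders', 'sqlite'));
```

One edge worth noting: `Math.max(1, Math.floor(NaN))` evaluates to `NaN`, so `safeSqlLimit` as written clamps small or fractional limits but does not guard against non-finite input.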


@ -1,30 +0,0 @@
import { describe, expect, it } from 'vitest';
import { getDialectForDriver } from './dialects.js';
describe('getDialectForDriver', () => {
it.each([
['postgres', '"public"."orders"'],
['postgresql', '"public"."orders"'],
['mysql', '`public`.`orders`'],
['clickhouse', '`public`.`orders`'],
['sqlite', '"orders"'],
['snowflake', '"analytics"."public"."orders"'],
['bigquery', '`analytics`.`public`.`orders`'],
['sqlserver', '[analytics].[public].[orders]'],
] as const)('formats table names for %s', (driver, expected) => {
const dialect = getDialectForDriver(driver);
expect(
dialect.formatTableName({
catalog: driver === 'snowflake' || driver === 'bigquery' || driver === 'sqlserver' ? 'analytics' : null,
db: driver === 'sqlite' ? null : 'public',
name: 'orders',
}),
).toBe(expected);
});
it('throws with a supported-driver list for unknown drivers', () => {
expect(() => getDialectForDriver('oracle')).toThrow(
'Unsupported warehouse driver "oracle". Supported drivers: bigquery, clickhouse, mysql, postgres, postgresql, sqlite, sqlite3, snowflake, sqlserver',
);
});
});


@ -1,102 +1,64 @@
import type { KtxSchemaDimensionType, KtxTableRef } from '../scan/types.js';
type SupportedDriver =
| 'postgres'
| 'postgresql'
| 'mysql'
| 'sqlserver'
| 'snowflake'
| 'bigquery'
| 'clickhouse'
| 'sqlite'
| 'sqlite3';
import { KtxBigQueryDialect } from '../../connectors/bigquery/dialect.js';
import { KtxClickHouseDialect } from '../../connectors/clickhouse/dialect.js';
import { KtxMysqlDialect } from '../../connectors/mysql/dialect.js';
import { KtxPostgresDialect } from '../../connectors/postgres/dialect.js';
import { KtxSqliteDialect } from '../../connectors/sqlite/dialect.js';
import { KtxSnowflakeDialect } from '../../connectors/snowflake/dialect.js';
import { KtxSqlServerDialect } from '../../connectors/sqlserver/dialect.js';
import type { KtxConnectionDriver, KtxSchemaDimensionType, KtxTableRef } from '../scan/types.js';
import type { KtxDialectTableRef } from './dialect-helpers.js';
export interface KtxDialect {
readonly type: SupportedDriver;
readonly type: KtxConnectionDriver;
quoteIdentifier(identifier: string): string;
formatTableName(table: KtxTableRef): string;
formatTableName(table: KtxDialectTableRef): string;
formatDisplayRef(table: KtxDialectTableRef): string;
parseDisplayRef(display: string): KtxTableRef | null;
columnDisplayTablePartCount(): 1 | 2 | 3;
getLimitOffsetClause(limit: number, offset?: number): string;
getTopClause(limit: number): string;
getRandomSampleFilter(samplePct: number): string;
getTableSampleClause(samplePct: number): string;
generateSampleQuery(tableName: string, limit: number, columns?: string[]): string;
generateColumnSampleQuery(tableName: string, columnName: string, limit: number): string;
getSampleValueAggregation(innerSql: string): string;
generateCardinalitySampleQuery(tableName: string, columnName: string, sampleSize: number): string;
generateRandomizedCardinalitySampleQuery(tableName: string, columnName: string, sampleSize: number): string;
generateDistinctValuesQuery(tableName: string, columnName: string, limit: number): string;
generateColumnStatisticsQuery(schemaName: string, tableName: string): string | null;
getNullCountExpression(column: string): string;
getDistinctCountExpression(column: string): string;
textLengthExpression(columnSql: string): string;
castToText(columnSql: string): string;
mapToDimensionType(nativeType: string): KtxSchemaDimensionType;
mapDataType(nativeType: string): string;
}
const supportedDrivers: SupportedDriver[] = [
const supportedDrivers: KtxConnectionDriver[] = [
'bigquery',
'clickhouse',
'mysql',
'postgres',
'postgresql',
'sqlite',
'sqlite3',
'snowflake',
'sqlserver',
];
function doubleQuoted(identifier: string): string {
return `"${identifier.replace(/"/g, '""')}"`;
}
function backtickQuoted(identifier: string): string {
return `\`${identifier.replace(/`/g, '``')}\``;
}
function bigQueryQuoted(identifier: string): string {
return `\`${identifier.replace(/`/g, '\\`')}\``;
}
function bracketQuoted(identifier: string): string {
return `[${identifier.replace(/\]/g, ']]')}]`;
}
function inferDimensionType(nativeType: string): KtxSchemaDimensionType {
const normalized = nativeType.toLowerCase().trim();
if (normalized.includes('date') || normalized.includes('time')) {
return 'time';
}
if (
normalized.includes('int') ||
normalized.includes('num') ||
normalized.includes('dec') ||
normalized.includes('float') ||
normalized.includes('double') ||
normalized.includes('real')
) {
return 'number';
}
if (normalized.includes('bool') || normalized === 'bit') {
return 'boolean';
}
return 'string';
}
function formatWithParts(table: KtxTableRef, quote: (identifier: string) => string, sqlite = false): string {
const parts = sqlite ? [table.name] : [table.catalog, table.db, table.name].filter((part): part is string => !!part);
return parts.map(quote).join('.');
}
function createDialect(type: SupportedDriver, quote: (identifier: string) => string, sqlite = false): KtxDialect {
return {
type,
quoteIdentifier: quote,
formatTableName: (table) => formatWithParts(table, quote, sqlite),
mapToDimensionType: inferDimensionType,
};
}
const dialects: Record<SupportedDriver, KtxDialect> = {
postgres: createDialect('postgres', doubleQuoted),
postgresql: createDialect('postgresql', doubleQuoted),
mysql: createDialect('mysql', backtickQuoted),
clickhouse: createDialect('clickhouse', backtickQuoted),
sqlite: createDialect('sqlite', doubleQuoted, true),
sqlite3: createDialect('sqlite3', doubleQuoted, true),
snowflake: createDialect('snowflake', doubleQuoted),
bigquery: createDialect('bigquery', bigQueryQuoted),
sqlserver: createDialect('sqlserver', bracketQuoted),
const dialectFactories: Record<KtxConnectionDriver, () => KtxDialect> = {
bigquery: () => new KtxBigQueryDialect(),
clickhouse: () => new KtxClickHouseDialect(),
mysql: () => new KtxMysqlDialect(),
postgres: () => new KtxPostgresDialect(),
sqlite: () => new KtxSqliteDialect(),
snowflake: () => new KtxSnowflakeDialect(),
sqlserver: () => new KtxSqlServerDialect(),
};
export function getDialectForDriver(driver: string): KtxDialect {
const normalized = driver.toLowerCase().trim();
if (normalized in dialects) {
return dialects[normalized as SupportedDriver];
const factory = dialectFactories[normalized as KtxConnectionDriver];
if (factory) {
return factory();
}
throw new Error(`Unsupported warehouse driver "${driver}". Supported drivers: ${supportedDrivers.join(', ')}`);
}
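
Review note: the new lookup indexes `dialectFactories` directly with the normalized string. A minimal sketch of the same factory-per-driver pattern follows, with an own-property guard so that inherited keys such as `constructor` are rejected rather than treated as factories. The `Driver` and `Dialect` types and the inline factories are hypothetical, trimmed-down stand-ins for `KtxConnectionDriver`, `KtxDialect`, and the connector classes; only the quoting rules are taken from the helpers shown in this hunk.

```typescript
// Hypothetical stand-ins: the real code uses KtxConnectionDriver and KtxDialect.
type Driver = 'postgres' | 'mysql' | 'sqlserver';

interface Dialect {
  readonly type: Driver;
  quoteIdentifier(identifier: string): string;
}

// One factory per driver, so every lookup returns a fresh dialect instance.
const factories: Record<Driver, () => Dialect> = {
  postgres: () => ({
    type: 'postgres',
    quoteIdentifier: (id: string) => '"' + id.replace(/"/g, '""') + '"',
  }),
  mysql: () => ({
    type: 'mysql',
    quoteIdentifier: (id: string) => '`' + id.replace(/`/g, '``') + '`',
  }),
  sqlserver: () => ({
    type: 'sqlserver',
    quoteIdentifier: (id: string) => '[' + id.replace(/\]/g, ']]') + ']',
  }),
};

// Own-property check: keys inherited from Object.prototype are not drivers.
function isDriver(value: string): value is Driver {
  return Object.prototype.hasOwnProperty.call(factories, value);
}

function dialectFor(driver: string): Dialect {
  const normalized = driver.toLowerCase().trim();
  if (!isDriver(normalized)) {
    throw new Error(`Unsupported warehouse driver "${driver}".`);
  }
  return factories[normalized]();
}
```

With this guard, `dialectFor('constructor')` throws like any other unknown driver. Whether the real `getDialectForDriver` needs the same guard depends on how callers validate the driver string, which this diff does not show.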

@ -0,0 +1,199 @@
import type { KtxConnectionDriver, KtxScanConnector } from '../scan/types.js';
/** @internal */
export type KtxScopeConfigKey = 'dataset_ids' | 'databases' | 'schemas' | 'schema_names';
/** @internal */
export interface KtxDriverConnectorModule {
isConnectionConfig(connection: unknown): boolean;
createScanConnector(args: {
connectionId: string;
connection: unknown;
projectDir: string;
}): KtxScanConnector;
}
export interface KtxDriverRegistration {
readonly driver: KtxConnectionDriver;
readonly scopeConfigKey: KtxScopeConfigKey | null;
readonly hasHistoricSqlReader: boolean;
readonly hasLocalQueryExecutor: boolean;
load(): Promise<KtxDriverConnectorModule>;
}
function invalidConnectionConfig(driver: KtxConnectionDriver): Error {
return new Error(`Connection config does not match warehouse driver "${driver}".`);
}
/** @internal */
export const driverRegistrations: Record<KtxConnectionDriver, KtxDriverRegistration> = {
bigquery: {
driver: 'bigquery',
scopeConfigKey: 'dataset_ids',
hasHistoricSqlReader: true,
hasLocalQueryExecutor: false,
load: async () => {
const m = await import('../../connectors/bigquery/connector.js');
return {
isConnectionConfig: (connection) => {
const typedConnection = connection as Parameters<typeof m.isKtxBigQueryConnectionConfig>[0];
return m.isKtxBigQueryConnectionConfig(typedConnection);
},
createScanConnector: ({ connectionId, connection }) => {
const typedConnection = connection as Parameters<typeof m.isKtxBigQueryConnectionConfig>[0];
if (!m.isKtxBigQueryConnectionConfig(typedConnection)) {
throw invalidConnectionConfig('bigquery');
}
return new m.KtxBigQueryScanConnector({ connectionId, connection: typedConnection });
},
};
},
},
clickhouse: {
driver: 'clickhouse',
scopeConfigKey: 'databases',
hasHistoricSqlReader: false,
hasLocalQueryExecutor: false,
load: async () => {
const m = await import('../../connectors/clickhouse/connector.js');
return {
isConnectionConfig: (connection) => {
const typedConnection = connection as Parameters<typeof m.isKtxClickHouseConnectionConfig>[0];
return m.isKtxClickHouseConnectionConfig(typedConnection);
},
createScanConnector: ({ connectionId, connection }) => {
const typedConnection = connection as Parameters<typeof m.isKtxClickHouseConnectionConfig>[0];
if (!m.isKtxClickHouseConnectionConfig(typedConnection)) {
throw invalidConnectionConfig('clickhouse');
}
return new m.KtxClickHouseScanConnector({ connectionId, connection: typedConnection });
},
};
},
},
mysql: {
driver: 'mysql',
scopeConfigKey: 'schemas',
hasHistoricSqlReader: false,
hasLocalQueryExecutor: false,
load: async () => {
const m = await import('../../connectors/mysql/connector.js');
return {
isConnectionConfig: (connection) => {
const typedConnection = connection as Parameters<typeof m.isKtxMysqlConnectionConfig>[0];
return m.isKtxMysqlConnectionConfig(typedConnection);
},
createScanConnector: ({ connectionId, connection }) => {
const typedConnection = connection as Parameters<typeof m.isKtxMysqlConnectionConfig>[0];
if (!m.isKtxMysqlConnectionConfig(typedConnection)) {
throw invalidConnectionConfig('mysql');
}
return new m.KtxMysqlScanConnector({ connectionId, connection: typedConnection });
},
};
},
},
postgres: {
driver: 'postgres',
scopeConfigKey: 'schemas',
hasHistoricSqlReader: true,
hasLocalQueryExecutor: true,
load: async () => {
const m = await import('../../connectors/postgres/connector.js');
return {
isConnectionConfig: (connection) => {
const typedConnection = connection as Parameters<typeof m.isKtxPostgresConnectionConfig>[0];
return m.isKtxPostgresConnectionConfig(typedConnection);
},
createScanConnector: ({ connectionId, connection }) => {
const typedConnection = connection as Parameters<typeof m.isKtxPostgresConnectionConfig>[0];
if (!m.isKtxPostgresConnectionConfig(typedConnection)) {
throw invalidConnectionConfig('postgres');
}
return new m.KtxPostgresScanConnector({ connectionId, connection: typedConnection });
},
};
},
},
sqlite: {
driver: 'sqlite',
scopeConfigKey: null,
hasHistoricSqlReader: false,
hasLocalQueryExecutor: true,
load: async () => {
const m = await import('../../connectors/sqlite/connector.js');
return {
isConnectionConfig: (connection) => {
const typedConnection = connection as Parameters<typeof m.isKtxSqliteConnectionConfig>[0];
return m.isKtxSqliteConnectionConfig(typedConnection);
},
createScanConnector: ({ connectionId, connection, projectDir }) => {
const typedConnection = connection as Parameters<typeof m.isKtxSqliteConnectionConfig>[0];
if (!m.isKtxSqliteConnectionConfig(typedConnection)) {
throw invalidConnectionConfig('sqlite');
}
return new m.KtxSqliteScanConnector({ connectionId, connection: typedConnection, projectDir });
},
};
},
},
snowflake: {
driver: 'snowflake',
scopeConfigKey: 'schema_names',
hasHistoricSqlReader: true,
hasLocalQueryExecutor: false,
load: async () => {
const m = await import('../../connectors/snowflake/connector.js');
return {
isConnectionConfig: (connection) => {
const typedConnection = connection as Parameters<typeof m.isKtxSnowflakeConnectionConfig>[0];
return m.isKtxSnowflakeConnectionConfig(typedConnection);
},
createScanConnector: ({ connectionId, connection, projectDir }) => {
const typedConnection = connection as Parameters<typeof m.isKtxSnowflakeConnectionConfig>[0];
if (!m.isKtxSnowflakeConnectionConfig(typedConnection)) {
throw invalidConnectionConfig('snowflake');
}
return new m.KtxSnowflakeScanConnector({ connectionId, connection: typedConnection, projectDir });
},
};
},
},
sqlserver: {
driver: 'sqlserver',
scopeConfigKey: 'schemas',
hasHistoricSqlReader: false,
hasLocalQueryExecutor: false,
load: async () => {
const m = await import('../../connectors/sqlserver/connector.js');
return {
isConnectionConfig: (connection) => {
const typedConnection = connection as Parameters<typeof m.isKtxSqlServerConnectionConfig>[0];
return m.isKtxSqlServerConnectionConfig(typedConnection);
},
createScanConnector: ({ connectionId, connection }) => {
const typedConnection = connection as Parameters<typeof m.isKtxSqlServerConnectionConfig>[0];
if (!m.isKtxSqlServerConnectionConfig(typedConnection)) {
throw invalidConnectionConfig('sqlserver');
}
return new m.KtxSqlServerScanConnector({ connectionId, connection: typedConnection });
},
};
},
},
};
const supportedDrivers = Object.keys(driverRegistrations).sort() as KtxConnectionDriver[];
function isRegisteredDriver(driver: string): driver is KtxConnectionDriver {
return Object.prototype.hasOwnProperty.call(driverRegistrations, driver);
}
export function getDriverRegistration(driver: string): KtxDriverRegistration | undefined {
const normalized = driver.toLowerCase().trim();
return isRegisteredDriver(normalized) ? driverRegistrations[normalized] : undefined;
}
export function listSupportedDrivers(): KtxConnectionDriver[] {
return [...supportedDrivers];
}
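
Review note: the registry above keeps capability flags next to each driver so callers can query them without loading a connector module. The sketch below shows one way to query such a table. The flag values are copied from the registrations in this file; the `Capabilities` type and the `driversWith` helper are hypothetical and are not part of the diff.

```typescript
type Driver = 'bigquery' | 'clickhouse' | 'mysql' | 'postgres' | 'sqlite' | 'snowflake' | 'sqlserver';

// Hypothetical subset of KtxDriverRegistration: capability flags only.
interface Capabilities {
  hasHistoricSqlReader: boolean;
  hasLocalQueryExecutor: boolean;
}

// Flag values mirror the driverRegistrations table in this diff.
const capabilities: Record<Driver, Capabilities> = {
  bigquery: { hasHistoricSqlReader: true, hasLocalQueryExecutor: false },
  clickhouse: { hasHistoricSqlReader: false, hasLocalQueryExecutor: false },
  mysql: { hasHistoricSqlReader: false, hasLocalQueryExecutor: false },
  postgres: { hasHistoricSqlReader: true, hasLocalQueryExecutor: true },
  sqlite: { hasHistoricSqlReader: false, hasLocalQueryExecutor: true },
  snowflake: { hasHistoricSqlReader: true, hasLocalQueryExecutor: false },
  sqlserver: { hasHistoricSqlReader: false, hasLocalQueryExecutor: false },
};

// Hypothetical helper: list the drivers for which a capability flag is set.
function driversWith(flag: keyof Capabilities): Driver[] {
  return (Object.keys(capabilities) as Driver[]).filter((driver) => capabilities[driver][flag]).sort();
}
```

Because the flags are plain data, a query like this never triggers the dynamic `import()` inside `load()`, which appears to be the point of splitting registration metadata from the connector modules.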

@ -1,3 +1,4 @@
import { driverRegistrations, getDriverRegistration } from './drivers.js';
import { createPostgresQueryExecutor } from './postgres-query-executor.js';
import type {
KtxSqlQueryExecutionInput,
@ -5,6 +6,7 @@ import type {
KtxSqlQueryExecutorPort,
} from './query-executor.js';
import { createSqliteQueryExecutor } from './sqlite-query-executor.js';
import type { KtxConnectionDriver } from '../scan/types.js';
export interface DefaultLocalQueryExecutorOptions {
postgres?: KtxSqlQueryExecutorPort;
@ -15,20 +17,43 @@ function driverFor(input: KtxSqlQueryExecutionInput): string {
return String(input.connection?.driver ?? '').toLowerCase();
}
function localExecutorMap(
options: DefaultLocalQueryExecutorOptions,
): Partial<Record<KtxConnectionDriver, KtxSqlQueryExecutorPort>> {
const wiredExecutors: Partial<Record<KtxConnectionDriver, KtxSqlQueryExecutorPort>> = {
postgres: options.postgres ?? createPostgresQueryExecutor(),
sqlite: options.sqlite ?? createSqliteQueryExecutor(),
};
const executors: Partial<Record<KtxConnectionDriver, KtxSqlQueryExecutorPort>> = {};
for (const registration of Object.values(driverRegistrations)) {
if (!registration.hasLocalQueryExecutor) continue;
const executor = wiredExecutors[registration.driver];
if (executor) {
executors[registration.driver] = executor;
}
}
return executors;
}
export function createDefaultLocalQueryExecutor(options: DefaultLocalQueryExecutorOptions = {}): KtxSqlQueryExecutorPort {
const postgres = options.postgres ?? createPostgresQueryExecutor();
const sqlite = options.sqlite ?? createSqliteQueryExecutor();
const executors = localExecutorMap(options);
return {
async execute(input: KtxSqlQueryExecutionInput): Promise<KtxSqlQueryExecutionResult> {
const driver = driverFor(input);
if (driver === 'postgres' || driver === 'postgresql') {
return postgres.execute(input);
const registration = getDriverRegistration(driver);
if (!registration?.hasLocalQueryExecutor) {
throw new Error(`No local query executor is configured for driver "${input.connection?.driver ?? 'unknown'}".`);
}
if (driver === 'sqlite' || driver === 'sqlite3') {
return sqlite.execute(input);
const executor = executors[registration.driver];
if (!executor) {
throw new Error(
`Local query executor flag is enabled for driver "${registration.driver}", but no executor factory is wired.`,
);
}
throw new Error(`No local query executor is configured for driver "${input.connection?.driver ?? 'unknown'}".`);
return executor.execute(input);
},
};
}
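
Review note: the rewritten `execute` separates two failure modes: a driver with no local-executor capability, and a driver whose flag is set but which has no executor wired. The selection logic can be sketched on its own as below. The `Executor` interface is a synchronous, hypothetical simplification of `KtxSqlQueryExecutorPort`, and `selectExecutor` is an illustrative name, not a function from the diff.

```typescript
type Driver = 'postgres' | 'sqlite' | 'mysql';

// Hypothetical, synchronous simplification of KtxSqlQueryExecutorPort.
interface Executor {
  execute(sql: string): string;
}

// Capability flags, as the registry in this diff records them for these drivers.
const hasLocalQueryExecutor: Record<Driver, boolean> = {
  postgres: true,
  sqlite: true,
  mysql: false,
};

function selectExecutor(driver: string, wired: Partial<Record<Driver, Executor>>): Executor {
  const normalized = driver.toLowerCase().trim();
  const known = Object.prototype.hasOwnProperty.call(hasLocalQueryExecutor, normalized);
  // Failure mode 1: unknown driver, or a driver without the capability flag.
  if (!known || !hasLocalQueryExecutor[normalized as Driver]) {
    throw new Error(`No local query executor is configured for driver "${driver}".`);
  }
  // Failure mode 2: the flag is set, but nothing was wired for this driver.
  const executor = wired[normalized as Driver];
  if (!executor) {
    throw new Error(
      `Local query executor flag is enabled for driver "${normalized}", but no executor factory is wired.`,
    );
  }
  return executor;
}
```

The second error exists to catch a registry flag that was turned on without a matching entry in the executor map, which would otherwise surface as an undefined call at run time.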

@ -20,10 +20,8 @@ export interface LocalConnectionInfo {
const DRIVER_TO_CONNECTION_TYPE: Record<string, ConnectionType> = {
postgres: 'POSTGRESQL',
postgresql: 'POSTGRESQL',
sqlite: 'SQLITE',
sqlserver: 'SQLSERVER',
mssql: 'SQLSERVER',
mysql: 'MYSQL',
clickhouse: 'CLICKHOUSE',
snowflake: 'SNOWFLAKE',

@ -38,7 +38,7 @@ export function createPostgresQueryExecutor(options: PostgresQueryExecutorOption
async execute(input: KtxSqlQueryExecutionInput): Promise<KtxSqlQueryExecutionResult> {
const driver = connectionDriver(input);
const connection = input.connection;
if (driver !== 'postgres' && driver !== 'postgresql') {
if (driver !== 'postgres') {
throw new Error(`Local Postgres execution cannot run driver "${connection?.driver ?? 'unknown'}".`);
}
if (typeof connection?.url !== 'string' || connection.url.trim().length === 0) {

@ -52,7 +52,7 @@ function sqlitePathFromUrl(url: string): string {
/** @internal */
export function sqliteDatabasePathFromConnection(input: KtxSqlQueryExecutionInput): string {
const driver = connectionDriver(input);
if (driver !== 'sqlite' && driver !== 'sqlite3') {
if (driver !== 'sqlite') {
throw new Error(`Local SQLite execution cannot run driver "${input.connection?.driver ?? 'unknown'}".`);
}

@ -0,0 +1,39 @@
/** @internal */
export function createAbortError(message = 'Aborted'): DOMException {
return new DOMException(message, 'AbortError');
}
export function isAbortError(error: unknown): boolean {
if (error instanceof DOMException && error.name === 'AbortError') {
return true;
}
if (!error || typeof error !== 'object') {
return false;
}
const record = error as { name?: unknown; code?: unknown };
return record.name === 'AbortError' || record.code === 'ABORT_ERR';
}
/** @internal */
export function throwIfAborted(signal?: AbortSignal): void {
if (signal?.aborted) {
throw createAbortError();
}
}
export function linkAbortSignal(parent?: AbortSignal): { controller: AbortController; dispose: () => void } {
const controller = new AbortController();
if (!parent) {
return { controller, dispose: () => undefined };
}
if (parent.aborted) {
controller.abort(createAbortError());
return { controller, dispose: () => undefined };
}
const onAbort = () => controller.abort(createAbortError());
parent.addEventListener('abort', onAbort, { once: true });
return {
controller,
dispose: () => parent.removeEventListener('abort', onAbort),
};
}
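
Review note: a short usage sketch for `linkAbortSignal`, showing both that a parent abort propagates to the linked controller and that `dispose()` detaches the listener. The helper is restated from this file so the snippet is self-contained; the surrounding variable names are illustrative. It assumes a runtime with global `AbortController` and `DOMException` (recent Node.js or a browser).

```typescript
// Restated from the file above so the sketch runs on its own.
function createAbortError(message = 'Aborted'): DOMException {
  return new DOMException(message, 'AbortError');
}

function linkAbortSignal(parent?: AbortSignal): { controller: AbortController; dispose: () => void } {
  const controller = new AbortController();
  if (!parent) {
    return { controller, dispose: () => undefined };
  }
  if (parent.aborted) {
    controller.abort(createAbortError());
    return { controller, dispose: () => undefined };
  }
  const onAbort = () => controller.abort(createAbortError());
  parent.addEventListener('abort', onAbort, { once: true });
  return { controller, dispose: () => parent.removeEventListener('abort', onAbort) };
}

// Case 1: aborting the parent aborts the linked child controller.
const parent = new AbortController();
const linked = linkAbortSignal(parent.signal);
parent.abort();
const propagated = linked.controller.signal.aborted;
linked.dispose();

// Case 2: after dispose(), a later parent abort no longer reaches the child.
const laterParent = new AbortController();
const detached = linkAbortSignal(laterParent.signal);
detached.dispose();
laterParent.abort();
const reachedAfterDispose = detached.controller.signal.aborted;
```

Calling `dispose()` in a `finally` block after the linked work completes keeps a long-lived parent signal from accumulating listeners.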

@ -200,27 +200,78 @@ export class BigQueryHistoricSqlQueryHistoryReader {
config: HistoricSqlUnifiedPullConfig,
): AsyncIterable<AggregatedTemplate> {
const sql = `
WITH filtered_jobs AS (
SELECT
COALESCE(query_info.query_hashes.normalized_literals, TO_HEX(SHA256(query))) AS template_id,
query,
user_email,
creation_time,
end_time,
error_result
FROM ${this.viewPath}
WHERE job_type = 'QUERY'
AND statement_type IN ('SELECT', 'MERGE')
AND creation_time >= ${timestampExpression(window.start)}
AND creation_time < ${timestampExpression(window.end)}
AND query IS NOT NULL
),
template_stats AS (
SELECT
template_id,
MIN(query) AS canonical_sql,
COUNT(*) AS executions,
COUNT(DISTINCT user_email) AS distinct_users,
MIN(creation_time) AS first_seen,
MAX(creation_time) AS last_seen,
APPROX_QUANTILES(TIMESTAMP_DIFF(end_time, creation_time, MILLISECOND), 100)[OFFSET(50)] AS p50_ms,
APPROX_QUANTILES(TIMESTAMP_DIFF(end_time, creation_time, MILLISECOND), 100)[OFFSET(95)] AS p95_ms,
SAFE_DIVIDE(COUNTIF(error_result IS NOT NULL), COUNT(*)) AS error_rate,
CAST(NULL AS INT64) AS rows_produced
FROM filtered_jobs
GROUP BY template_id
HAVING COUNT(*) >= ${config.minExecutions}
),
template_users AS (
SELECT
template_id,
user_email AS user,
COUNT(*) AS executions,
MAX(creation_time) AS last_seen
FROM filtered_jobs
GROUP BY template_id, user_email
)
SELECT
query_hash AS template_id,
MIN(query) AS canonical_sql,
COUNT(*) AS executions,
COUNT(DISTINCT user_email) AS distinct_users,
MIN(creation_time) AS first_seen,
MAX(creation_time) AS last_seen,
APPROX_QUANTILES(TIMESTAMP_DIFF(end_time, creation_time, MILLISECOND), 100)[OFFSET(50)] AS p50_ms,
APPROX_QUANTILES(TIMESTAMP_DIFF(end_time, creation_time, MILLISECOND), 100)[OFFSET(95)] AS p95_ms,
SAFE_DIVIDE(COUNTIF(error_result IS NOT NULL), COUNT(*)) AS error_rate,
CAST(NULL AS INT64) AS rows_produced,
TO_JSON_STRING(ARRAY_AGG(STRUCT(user_email AS user, 1 AS executions) ORDER BY creation_time DESC LIMIT 5)) AS top_users
FROM ${this.viewPath}
WHERE job_type = 'QUERY'
AND statement_type IN ('SELECT', 'MERGE')
AND creation_time >= ${timestampExpression(window.start)}
AND creation_time < ${timestampExpression(window.end)}
AND query IS NOT NULL
GROUP BY query_hash
HAVING COUNT(*) >= ${config.minExecutions}
ORDER BY executions DESC`.trim();
stats.template_id,
stats.canonical_sql,
stats.executions,
stats.distinct_users,
stats.first_seen,
stats.last_seen,
stats.p50_ms,
stats.p95_ms,
stats.error_rate,
stats.rows_produced,
TO_JSON_STRING(
ARRAY_AGG(
STRUCT(users.user AS user, users.executions AS executions)
ORDER BY users.executions DESC, users.last_seen DESC
)
) AS top_users
FROM template_stats AS stats
JOIN template_users AS users
ON users.template_id = stats.template_id
GROUP BY
stats.template_id,
stats.canonical_sql,
stats.executions,
stats.distinct_users,
stats.first_seen,
stats.last_seen,
stats.p50_ms,
stats.p95_ms,
stats.error_rate,
stats.rows_produced
ORDER BY stats.executions DESC`.trim();
const result = await queryClient(client).executeQuery(sql);
if (result.error) {
throw grantsError(result.error);

@ -1,6 +1,7 @@
import { createHash } from 'node:crypto';
import { readFile, readdir } from 'node:fs/promises';
import { join, relative } from 'node:path';
import { tableRefKey } from '../../../scan/table-ref.js';
import type { ChunkResult, DiffSet, ScopeDescriptor, WorkUnit } from '../../types.js';
import { isHistoricSqlPatternInputShardPath } from './pattern-inputs.js';
import { stagedManifestSchema, stagedPatternsInputSchema, stagedTableInputSchema } from './types.js';
@ -37,7 +38,7 @@ export async function chunkHistoricSqlUnifiedStagedDir(stagedDir: string, diffSe
}
const table = stagedTableInputSchema.parse(await readJson(stagedDir, path));
workUnits.push({
unitKey: `historic-sql-table-${safeUnitKey(table.table)}`,
unitKey: `historic-sql-table-${safeUnitKey(tableRefKey(table.tableRef))}`,
displayLabel: `Historic SQL usage: ${table.table}`,
rawFiles: [path],
dependencyPaths: ['manifest.json'],
@ -60,7 +61,7 @@ export async function chunkHistoricSqlUnifiedStagedDir(stagedDir: string, diffSe
dependencyPaths: ['manifest.json'],
peerFileIndex: files.filter((file) => file !== path && file !== 'manifest.json').sort(),
notes:
`Use historic_sql_patterns. Read ${path} and emit pattern objects with emit_historic_sql_evidence using rawPath "${path}". Do not call wiki_write or sl_write_source.`,
`Use historic_sql_patterns. Read ${path} and emit pattern objects with emit_historic_sql_evidence. Do not call wiki_write or sl_write_source.`,
});
}

@ -1,5 +1,9 @@
import { getDriverRegistration } from '../../../connections/drivers.js';
import type { KtxConnectionDriver } from '../../../scan/types.js';
import type { HistoricSqlDialect } from './types.js';
const historicSqlDialects: readonly HistoricSqlDialect[] = ['postgres', 'bigquery', 'snowflake'];
function recordOrNull(value: unknown): Record<string, unknown> | null {
return value && typeof value === 'object' && !Array.isArray(value) ? (value as Record<string, unknown>) : null;
}
@ -10,10 +14,33 @@ function queryHistoryRecord(connection: unknown): Record<string, unknown> | null
return context ? recordOrNull(context.queryHistory) : null;
}
function historicSqlDialectForDriver(driver: KtxConnectionDriver): HistoricSqlDialect {
const dialect = historicSqlDialects.find((candidate) => candidate === driver);
if (!dialect) {
throw new Error(`Driver "${driver}" is marked as historic-SQL capable but has no HistoricSqlDialect mapping.`);
}
return dialect;
}
export function isQueryHistoryEnabled(connection: unknown): boolean {
return queryHistoryRecord(connection)?.enabled === true;
}
/**
* Resolves the query-history dialect from the connection's driver capability
* alone, ignoring whether query history is enabled in ktx.yaml. Use this on the
* adapter-registration path when query history has been explicitly requested
* for the run (e.g. via `--query-history`, which is itself the opt-in): the
* persisted `context.queryHistory.enabled` flag must not gate registration.
* Returns null when the connection's driver has no query-history reader.
*/
export function historicSqlDialectForConnectionDriver(connection: unknown): HistoricSqlDialect | null {
const conn = recordOrNull(connection);
const driver = String(conn?.driver ?? '').toLowerCase();
const registration = getDriverRegistration(driver);
return registration?.hasHistoricSqlReader ? historicSqlDialectForDriver(registration.driver) : null;
}
/**
* Resolves the query-history dialect for a connection. Returns null when
* query history is disabled, or when the connection's driver has no
@ -23,10 +50,5 @@ export function queryHistoryDialectForConnection(connection: unknown): HistoricS
if (!isQueryHistoryEnabled(connection)) {
return null;
}
const conn = recordOrNull(connection);
const driver = String(conn?.driver ?? '').toLowerCase();
if (driver === 'postgres' || driver === 'postgresql') return 'postgres';
if (driver === 'bigquery') return 'bigquery';
if (driver === 'snowflake') return 'snowflake';
return null;
return historicSqlDialectForConnectionDriver(connection);
}
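
Review note: the split between `historicSqlDialectForConnectionDriver` (capability only) and `queryHistoryDialectForConnection` (capability plus the persisted `enabled` flag) can be sketched as below. The function names and the plain `readers` list are hypothetical stand-ins for the registry lookup. The sketch also assumes that `queryHistory` lives under `connection.context`, which the doc comment implies but the visible hunk does not fully show.

```typescript
type HistoricSqlDialect = 'postgres' | 'bigquery' | 'snowflake';

// Hypothetical stand-in for the hasHistoricSqlReader flags in the driver registry.
const readers: readonly HistoricSqlDialect[] = ['postgres', 'bigquery', 'snowflake'];

function asRecord(value: unknown): Record<string, unknown> | null {
  return value && typeof value === 'object' && !Array.isArray(value) ? (value as Record<string, unknown>) : null;
}

// Capability only: ignores whether query history is enabled in the project config.
function dialectFromCapability(connection: unknown): HistoricSqlDialect | null {
  const driver = String(asRecord(connection)?.driver ?? '').toLowerCase();
  return readers.find((candidate) => candidate === driver) ?? null;
}

// Capability gated by the persisted flag (assumed path: context.queryHistory.enabled).
function dialectWhenEnabled(connection: unknown): HistoricSqlDialect | null {
  const context = asRecord(asRecord(connection)?.context);
  const queryHistory = asRecord(context?.queryHistory);
  return queryHistory?.enabled === true ? dialectFromCapability(connection) : null;
}
```

For a connection `{ driver: 'postgres' }` with no `context`, the capability lookup returns `'postgres'` while the gated lookup returns `null`, which is the distinction the `--query-history` path relies on. As far as this diff shows, the `postgresql` alias handled by the removed lines no longer resolves here unless it is normalized somewhere upstream.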

Some files were not shown because too many files have changed in this diff.