Commit graph

81 commits

Author SHA1 Message Date
Andrey Avtomonov
f8db99811a
feat(context): add driver-discriminated connection schemas (#96)
* refactor(context): export and describe mapping shape schemas

* feat(context): add driver-schemas module with warehouse drivers

* feat(context): add metabase, looker, lookml driver schemas with mappings

* feat(context): add notion, dbt, metricflow driver schemas

* refactor(context): make connectionSchema a driver-discriminated union

* chore(context): re-export KtxConnectionConfig from project package

* docs(context): add connection driver schema plan

* chore(secrets): allowlist example credentials in driver-schemas fixtures

* test(cli): align metabase fixtures with required api_url field

The driver-discriminated union added in this branch now requires api_url
for metabase connections and a known driver for warehouses. Update slow
CLI test fixtures and assertions so they exercise the new schema:
- ingest.test-utils.ts: add api_url to the prod-metabase fixture.
- setup.test.ts: switch metabase fixture from 'url' to 'api_url'.
- local-scan-connectors.test.ts: invalid-driver/missing-driver tests now
  expect the schema error from loadKtxProject (parse-time rejection).
2026-05-15 00:08:11 +02:00
Luca Martial
372c90b533
Polish documentation copy (#98) 2026-05-14 12:43:14 -04:00
Andrey Avtomonov
ce23aca4c4
fix: remove project from ktx config (#95) 2026-05-14 17:39:31 +02:00
Andrey Avtomonov
2bca308863
feat(cli): add ktx dev schema to emit ktx.yaml JSON Schema (#93)
Annotates the Zod config schema with .describe() text on every field and
adds generateKtxProjectConfigJsonSchema() plus a ktx dev schema command
that prints (or writes) a draft-07 JSON Schema for editors and LLM agents.
2026-05-14 16:21:29 +02:00
Andrey Avtomonov
c7c5f63a66
feat(cli): extend ktx connection test to every supported driver (#92)
* feat(cli): extend `ktx connection test` to every supported driver

Dispatch by driver: native DBs now call `connector.testConnection()`
(was `introspect(dryRun)`), looker/notion/metabase hit their auth
endpoints, and dbt/metricflow/lookml run `git ls-remote` via the
existing `testRepoConnection` helper. Unknown drivers exit 1 with a
listing of supported ones.

* feat(cli): add `ktx connection test --all` summary list

Tests every configured connection in parallel and renders a single
Clack-style list (◇/│/◆/└, green ✓ / red ✗) consistent with sl list,
with per-row detail and a passed/failed footer. Exits non-zero if any
connection fails. Single-id `ktx connection test` output is preserved.

* fix(cli): read metabase status url from api_url

`ktx status` was probing `url` / `base_url` on metabase connections, but
ktx.yaml stores it as `api_url`, so the field always reported "url not
set". Read `api_url` directly and align the warning text with the actual
key.
2026-05-14 16:21:18 +02:00
Andrey Avtomonov
b3be54e3fa
refactor(context): validate ktx.yaml with Zod and surface issues in status (#91)
* refactor(context): validate ktx.yaml with Zod and surface issues in status

- Replace hand-rolled ktx.yaml parsing with a strict Zod schema and
  derive KtxProjectConfig types from it.
- Add validateKtxProjectConfig returning structured KtxConfigIssue[]
  with migration hints for deprecated keys (ingest.llm,
  scan.enrichment.backend, etc.).
- Wire ktx status/doctor to run validation, render schema issues in
  plain and JSON output, and add a Config row to project status.
- Update the orbit example to camelCase scan.relationships keys to
  match the schema.

* fix(context): tolerate legacy setup.completed_steps and optional driver

- Accept and drop the legacy setup.completed_steps field so existing
  ktx.yaml files migrated from older versions still load.
- Make connections.<id>.driver optional in the schema; runtime code
  already produces a clear "no driver" error at use time.

* feat(cli): add ktx status --validate to run only ktx.yaml schema validation

- New --validate flag dispatches a focused runKtxDoctor 'validate' branch
  that reads ktx.yaml, runs validateKtxProjectConfig, and skips LLM,
  connection, embedding, and query-history checks.
- Plain output prints a single Config row; JSON output emits
  {ok: true} on success or the existing invalid_config / missing_project
  shapes on failure.
2026-05-14 15:36:35 +02:00
Andrey Avtomonov
49f1e2720e
fix(llm): wire prompt caching through all Anthropic call sites (#90)
* fix(llm): wire prompt caching through all Anthropic call sites

- page-triage classifier + light-extraction now put the static skill
  prompt in `system:` so the per-document caches hit instead of
  re-sending boilerplate in the user message every call.
- Description generation builders return `{ system, user }` with
  instruction text + word limit moved into the cacheable system.
- Relationship-LLM proposal framing moved to `system:`.
- `KtxMessageBuilder.wrapSimple` skips the history breakpoint for
  single-message calls (cache write that could never be reused).
- Gateway backend now sets `anthropic-beta: extended-cache-ttl-2025-04-11`
  so 1h TTLs don't silently downgrade to 5m on Gateway routes.

* fix(llm): keep wrapSimple history breakpoint so multi-step agent loops cache

Reverts the wrapSimple `messages.length > 1` guard from the prior commit.
agent-runner uses wrapSimple with a single user message, but generateText
runs a multi-step tool loop inside it — the cache marker on the first user
message is reused by every subsequent step, so it isn't waste.
The release validator (scripts/validate-llm-debug-jsonl.mjs) also requires
a `message-part` marker target in captured debug JSONL.
2026-05-14 15:36:27 +02:00
Andrey Avtomonov
b00c1a11a9
feat: merge ingest and scan
* docs: add CLI component reuse guidance

* docs: add unified ingest ux design

* Refine unified ingest UX design after adversarial review iteration 1

* Refine unified ingest UX design after adversarial review iteration 2

* Refine unified ingest UX design after adversarial review iteration 3

* feat(cli): route public connection ingest command

* feat(cli): hide standalone scan from public help

* feat(cli): plan public ingest depth and query history

* feat(cli): execute public database ingest facets

* feat(ingest): read connection query history config

* fix(cli): use public ingest wording

* fix(config): stop generating ingest adapter allow lists

* docs: document public ingest command

* test: align ingest surface expectations

* docs: add unified ingest public CLI surface plan

* feat(cli): preflight deep public ingest readiness

* feat(setup): store query history in connection context

* feat(setup): store database context depth

* feat(setup): verify context readiness by database depth

* fix(setup): keep context build foreground only

* fix(config): reject reserved ingest connection ids

* test: close unified ingest v1 expectations

* docs: add unified ingest v1 closure plan

* fix(ingest): bypass adapter allow-list for public source ingest

* fix(ingest): honor query history window intent

* fix(ingest): hide scan internals from public database ingest

* feat(ingest): use foreground view for interactive public ingest

* fix(setup): use schema context and query history wording

* test(cli): verify unified ingest public output

* docs: add unified ingest v1 public output closure plan

* fix(setup): forward query history flags

* fix(setup): prompt for postgres query history

* fix(status): report query history readiness

* fix(ingest): remove legacy public guidance

* fix(ingest): polish foreground retry copy

* docs(examples): use unified query history wording

* chore(ingest): finish public query history cleanup

* docs: add unified ingest v1 query history status cleanup plan

* test(docs): cover unified ingest public docs

* docs: align ingest CLI reference with unified UX

* docs: update context build guides for unified ingest

* docs: update setup and primary source ingest wording

* docs: stop advertising adapter-backed example ingest

* docs: close unified ingest public docs gaps

* docs: add unified ingest v1 docs site closure plan

* fix: render unified ingest foreground warnings

* fix: explain query history schema order

* fix: add public ingest retry guidance

* fix: align setup next steps with unified ingest

* fix: remove scan wording from demo progress

* test: verify unified ingest ux closure

* docs: add unified ingest v1 foreground and retry closure plan

* fix(cli): preserve query-history pull config in public ingest

* fix(cli): omit hidden commands from docs command tree

* test(cli): close unified ingest final public surface checks

* docs: add unified ingest v1 final public surface closure plan

* fix(cli): use public source labels in ingest reports

* fix(cli): suppress low-level public ingest output

* test(cli): verify unified ingest public plain output

* docs: add unified ingest v1 public plain output closure plan

* fix(cli): add public ingest copy sanitizers

* fix(cli): sanitize public ingest progress copy

* fix(cli): rename setup schema scope prompt

* docs(plan): add progress copy closure; test: align setup back-nav fixture

Adds the iter9 plan and updates the setup back-navigation test fixture
to pass disableQueryHistory plus listSchemas/listTables stubs that the
unified ingest setup step now requires.

* docs(plan): add final ux labels plan with narrowed label scans

* fix(cli): aggregate unsupported query-history warnings

* fix(cli): align setup database labels

* test(cli): fix setup database test type-check

* fix(cli): remove primary-source wording from setup output

* test(cli): verify unified ingest setup closure

* docs(plan): add unified ingest v1 verification copy closure plan

* fix(cli): remove top-level scan command

* fix(cli): remove legacy ingest and wiki commands

* Merge scan into ingest flow

* feat(cli): split ingest progress into per-phase rows, rename work units to tasks

Each database target in the unified ingest dashboard now renders one row per
real subprocess (Schema, then Query history when enabled) instead of a single
combined bar. Each phase has its own monotonic 0-100% bar so the progress
never snaps back to zero when historic-sql starts after scan completes.
Completed phases keep their final bar, summary, and elapsed time visible as
an inline audit trail; queued and skipped phases are shown explicitly.

Also rename user-facing "work units" / "Failed work units" to "tasks" /
"Failed tasks" in ingest output and parseIngestSummary. The parser still
accepts the legacy "Work units:" wording in captured output for backward
compat. Internal memory-flow event names and type fields are left alone.

* Fix test harness failures

* Fix CI smoke checks

---------

Co-authored-by: Andrey Avtomonov <7889985+andreybavt@users.noreply.github.com>
2026-05-14 01:43:06 +02:00
Andrey Avtomonov
1a472cf3ed
fix: clean up ktx yaml config parameters (#75)
* fix: clean up ktx yaml config parameters

* fix: align ci smoke checks with status output

* test: update artifact smoke status assertion
2026-05-14 01:27:31 +02:00
Andrey Avtomonov
0a261fe8a4
ci: add codecov coverage reporting (#82)
* ci: add codecov coverage reporting

* ci: fix codecov and secret scan checks

* ci: fix smoke and artifact checks
2026-05-14 01:13:31 +02:00
Andrey Avtomonov
28b5e2a83e
fix: align KTX agent tools and repair handling (#73) 2026-05-14 00:57:51 +02:00
Andrey Avtomonov
fa9237956e
ci: run pre-commit checks in CI (#74)
* ci: run pre-commit in CI

* test: update CI workflow guardrail
2026-05-13 19:49:25 +02:00
Andrey Avtomonov
3fde4438b1
fix: stop requiring readonly connection config (#71) 2026-05-13 19:37:25 +02:00
Andrey Avtomonov
d7147f9ca1
feat: rename project wiki directory (#66)
* feat: rename project wiki directory

* test: fix wiki skill ordering expectations

* Show configured context sources in setup
2026-05-13 16:05:58 +02:00
Andrey Avtomonov
97da9919e9
refactor: remove legacy compatibility paths (#64)
* refactor: remove legacy compatibility paths

* fix: support legacy metabase native queries

* test: use canonical semantic layer descriptions

* Rename CLI description

* Recover setup scan from SQLite ABI mismatch

* Remove legacy product name from CLI help
2026-05-13 15:55:00 +02:00
Andrey Avtomonov
e1e9c4af91
fix(cli): clean up connection commands (#62)
* fix(cli): clean up connection commands

* test(cli): update connection smoke coverage

* Fix setup output formatting

* fix notion setup picker exit
2026-05-13 15:04:50 +02:00
Luca Martial
4973ca562f
Restore Vertex AI LLM setup (#56)
* feat(context): resolve Vertex AI config references

* feat(cli): restore Vertex AI LLM setup

---------

Co-authored-by: Andrey Avtomonov <andreybavt@gmail.com>
2026-05-13 14:42:38 +02:00
Andrey Avtomonov
b75576279c
fix: store Metabase mappings in ktx.yaml (#61)
* fix: store Metabase mappings in ktx.yaml

* docs: note KTX has no public users

* refactor: drop setup progress compatibility
2026-05-13 13:55:21 +02:00
Andrey Avtomonov
c22248dabf
feat(context): add warehouse verification tools (#46)
* feat(context): add warehouse dialect dispatch

* feat(context): read warehouse scan catalog

* feat(context): add entity details verification tool

* feat(context): add ingest SQL verification tool

* feat(context): add raw warehouse discovery tool

* feat(context): expose warehouse verification tools to ingest

* docs(context): add ingest identifier verification protocol

* test(context): guard ingest identifier verification prompts

* chore(context): verify warehouse verification tools

* docs: add warehouse verification tools plan and spec

* fix(context): expose target warehouses to Notion ingest

* fix(context): update ingest prompts for warehouse verification tools

* fix(context): scope raw schema discovery to allowed connections

* fix(context): verify warehouse column display targets

* docs: add notion warehouse verification gap closure plan

* fix(context): include raw discovery connection names

* fix(context): expose warehouse targets for LookML and MetricFlow

* fix(context): pass connection config to ingest query executors

* fix(cli): enable read-only SQL probes for local ingest

* docs: add warehouse verification final v1 closure plan

* fix(context): align warehouse sql probe prompt shape

* docs: add warehouse verification prompt shape closure plan

* test(context): catch connectionless sql execution prompt examples

* fix(context): include connection name in sl capture sql example

* docs: add warehouse verification sql example closure plan

* fix(context): report structured entity detail misses

* docs: add warehouse verification structured target miss closure plan

* fix: report untracked squash merge conflicts

* feat: require ingest verification ledger

* fix: stabilize ingest wiki references
2026-05-13 13:43:23 +02:00
Andrey Avtomonov
bcb0d2f8f7
chore: add TypeScript dead-code checks (#60)
* chore: add TypeScript dead-code checks

* chore: trim stale Knip ignores

* Fix CI smoke and artifact checks
2026-05-13 13:33:28 +02:00
Andrey Avtomonov
721f1a998f
feat(cli)!: remove ktx agent command (#58)
* feat(cli)!: remove ktx agent command

* test(context): update PGlite boundary guardrail
2026-05-13 13:01:56 +02:00
Andrey Avtomonov
b9e0a746af
feat(cli): clean up dev command surface (#57)
* feat(cli): clean up dev command surface

* test: align CI expectations with CLI cleanup

* test(cli): update slow test command expectations
2026-05-13 12:00:08 +02:00
Luca Martial
9a8cb08192 Refine setup table selection flow 2026-05-12 21:31:11 -07:00
Luca Martial
52ddb061a4 Add scan table filtering 2026-05-12 18:22:03 -07:00
Luca Martial
e13350c970
Merge pull request #47 from Kaelio/luca-martial/save-setup-in-dot-ktx
Save setup completion state in .ktx/setup/state.json
2026-05-12 19:27:26 -04:00
Luca Martial
f70271152b feat(context): add local .ktx/setup/state.json for setup completion tracking
Move setup step completion state out of ktx.yaml into a gitignored local
state file so it is not committed or shared across machines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-12 16:26:13 -07:00
Andrey Avtomonov
85fc408054
chore(deps): refresh workspace dependencies (#43)
* chore(deps): refresh workspace dependencies

* Fix pnpm artifact smoke build approvals
2026-05-13 01:15:35 +02:00
Luca Martial
60457e9407
Improve schema setup and Notion ingest UX (#14)
* Improve schema setup and Notion ingest UX

* Handle Postgres network scan failures

* WIP: save local changes before main merge

* Refine setup prompt choices

* Tighten ingest reconciliation guidance

* Commit setup config updates

* Canonicalize unmapped fallback details

* Count reconciliation actions in reports

* Harden semantic layer source validation

* Return wiki content after edits

* Validate SL sources against manifests

* Validate wiki refs before writes

* Simplify CLI next steps

* Clarify agent setup summary

* Surface dbt target SL sources

* Recover SL write fallbacks

* Preserve failed context build metadata

* Track raw paths for ingest actions

* test(cli): update seeded demo expectations

* fix(ingest): scope fallback recovery checks

* fix(sl): tighten source validation guards

* fix(wiki): ignore empty embedding vectors

* Improve Notion ingest UX

* Enforce flat wiki keys

* test(context): update wiki key assertion

---------

Co-authored-by: Andrey Avtomonov <andreybavt@gmail.com>
2026-05-12 22:56:58 +02:00
Andrey Avtomonov
7a315d580f Merge remote-tracking branch 'origin/main' into andreybavt/execute-context7-plan
# Conflicts:
#	packages/cli/src/ingest.test.ts
#	packages/cli/src/ingest.ts
2026-05-12 14:37:51 +02:00
Andrey Avtomonov
366933c755 perf: parallelize scan description generation 2026-05-12 14:34:59 +02:00
Andrey Avtomonov
4d4441ccd5
fix(context): avoid saving scan error descriptions (#37) 2026-05-12 14:34:15 +02:00
Andrey Avtomonov
15f433930e
Merge branch 'main' into andreybavt/execute-context7-plan 2026-05-12 13:04:16 +02:00
Andrey Avtomonov
d830e8c46e docs: standardize env variable examples 2026-05-12 12:24:25 +02:00
Andrey Avtomonov
d7fb092cb0 feat(cli): route ingest adapter logs through operational logger 2026-05-12 11:26:34 +02:00
Andrey Avtomonov
d5f484eb7e fix: standardize KTX environment variables 2026-05-12 11:21:37 +02:00
Andrey Avtomonov
9e80add72c fix(context): make ingest adapter logging explicit 2026-05-12 11:21:29 +02:00
Andrey Avtomonov
a2dcd4eb08
fix: guide dev ingest llm setup (#15)
Co-authored-by: Andrey Avtomonov <7889985+andreybavt@users.noreply.github.com>
2026-05-12 10:26:07 +02:00
Andrey Avtomonov
9d3b1015cc
fix: allow dbt ingest to discover warehouse schemas (#20)
Co-authored-by: Andrey Avtomonov <7889985+andreybavt@users.noreply.github.com>
2026-05-12 10:25:56 +02:00
Andrey Avtomonov
b981cabdc6 Include historic SQL projection in memory counts 2026-05-11 22:52:47 +02:00
Andrey Avtomonov
1bd29c7eb1 Fix historic SQL ingest setup and progress 2026-05-11 22:35:07 +02:00
Andrey Avtomonov
2d1efda176 test: verify historic sql pattern shard work units 2026-05-11 20:23:37 +02:00
Andrey Avtomonov
8deac9d530 test: align historic sql pattern skill with shards 2026-05-11 20:21:39 +02:00
Andrey Avtomonov
3e11e33b8a feat: emit historic sql pattern shard work units 2026-05-11 20:20:58 +02:00
Andrey Avtomonov
02b621be72 feat: write historic sql pattern shards 2026-05-11 20:20:22 +02:00
Andrey Avtomonov
2a91ea521f feat: shard historic sql pattern inputs 2026-05-11 20:19:47 +02:00
Andrey Avtomonov
06163452b4 feat: redact historic sql staged artifacts 2026-05-11 20:10:17 +02:00
Andrey Avtomonov
7b38418900 feat: add historic sql redaction helper 2026-05-11 20:09:46 +02:00
Andrey Avtomonov
9c615fcb79 test: cover historic sql retrieval acceptance 2026-05-11 20:02:01 +02:00
Andrey Avtomonov
e39aea7e33 test: cover historic sql projection cleanup 2026-05-11 19:50:08 +02:00
Andrey Avtomonov
c2b2356e5e fix: keep historic sql archived patterns stable 2026-05-11 19:49:48 +02:00