398 commits

Author SHA1 Message Date
Andrey Avtomonov
9b38b00194 fix(docs-site): disable Geist Mono ligatures on every font-mono surface
Geist Mono fuses `--` into an em-dash glyph that visually swallows the
adjacent space, so prompts like `npx skills add Kaelio/ktx --skill ktx`
rendered as `Kaelio/ktx--skill ktx` on the quickstart page. The existing
ligature-off rule only covered <code>/<pre> and the .ktx-code wrapper —
quickstart.mdx puts the prompt in a plain <div className="font-mono">,
so the rule didn't apply. Extend the selector to also match the
.font-mono Tailwind utility and any inline-style opt-in via the mono
font CSS variable.

Document the convention in AGENTS.md so future docs additions keep
ligatures off on any new monospace container.
2026-05-28 12:41:26 +02:00
Andrey Avtomonov
39f94f39ff
docs: add ktx skills.sh setup skill (#227) 2026-05-28 12:28:10 +02:00
Luca Martial
27842e14a9
docs: add context layer terminology (#226) 2026-05-28 05:58:08 -04:00
Andrey Avtomonov
6837ab253d
fix(cli): align ingest step counter with SDK num_turns (#225)
The Claude Code runtime counted every SDKAssistantMessage with
parent_tool_use_id === null as a step, but the SDK emits extra messages
within a single num_turns round-trip — `stop_reason: 'pause_turn'`
continuations and errored partials it retries internally. The local
counter then outran maxTurns and the ingest HUD rendered confusing
ratios like `step 69/40`.

Filter both cases in collectResult so stepIndex tracks num_turns and
stays bounded by the work-unit stepBudget.
2026-05-28 02:09:53 +02:00
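
The step-counter fix in 6837ab253d can be pictured with a small sketch. This is an illustration under stated assumptions, not the actual `collectResult` code: the `AssistantMessageLike` shape, the `isError` flag, and the `countSteps` helper are hypothetical names introduced here. Only `parent_tool_use_id`, `stop_reason: 'pause_turn'`, and the idea of skipping errored partials come from the commit message.

```typescript
// Hypothetical, simplified message shape; the real SDK type carries more fields.
interface AssistantMessageLike {
  parent_tool_use_id: string | null;
  stop_reason: string | null;
  // Assumption: stands in for "errored partial that the SDK retries internally".
  isError?: boolean;
}

// Count only top-level assistant messages that represent a completed turn.
function countSteps(messages: AssistantMessageLike[]): number {
  let stepIndex = 0;
  for (const message of messages) {
    if (message.parent_tool_use_id !== null) continue; // nested tool traffic, never a step
    if (message.stop_reason === "pause_turn") continue; // continuation within the same turn
    if (message.isError) continue; // retried partial, not a new turn
    stepIndex += 1;
  }
  return stepIndex;
}

const sampleMessages: AssistantMessageLike[] = [
  { parent_tool_use_id: null, stop_reason: "tool_use" },
  { parent_tool_use_id: null, stop_reason: "pause_turn" },
  { parent_tool_use_id: null, stop_reason: null, isError: true },
  { parent_tool_use_id: "tool-1", stop_reason: "end_turn" },
  { parent_tool_use_id: null, stop_reason: "end_turn" },
];

const stepCount = countSteps(sampleMessages);
```

Counting every message with a null `parent_tool_use_id` would report four steps for this sample, while the filtered count stays at two, which is how a ratio such as `step 69/40` could arise before the fix.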
Andrey Avtomonov
a94f35800a
feat(docs-site): redirect ktx.sh/slack to Slack community invite (#224)
Add a host-scoped redirect for /slack on ktx.sh before the existing
catch-all so the path resolves to the community invite link instead of
docs.kaelio.com/ktx/slack.
2026-05-27 18:20:51 +02:00
semantic-release-bot
5d74bd35de chore(release): 0.6.0 [skip ci]
## [0.6.0](https://github.com/Kaelio/ktx/compare/v0.5.0...v0.6.0) (2026-05-26)

### Features

* **cli:** skip-context-sources menu + clack-style tree picker UX ([#213](https://github.com/Kaelio/ktx/issues/213)) ([cfd1749](cfd1749ab9))
* **cli:** surface docs and demo-warehouse links in ktx setup ([#221](https://github.com/Kaelio/ktx/issues/221)) ([62699bf](62699bfe9d))
* **connectors:** generalize readiness and constraint handling ([#212](https://github.com/Kaelio/ktx/issues/212)) ([78b8a0c](78b8a0c025))

### Bug Fixes

* **ingest:** attribute historic-sql evidence writes in bundle report ([#220](https://github.com/Kaelio/ktx/issues/220)) ([1071f9d](1071f9d1c9))
* **scripts:** make package artifacts pnpm launch work on Windows ([2a6fb19](2a6fb19ba4))
* update ktx CI boundary checks ([#223](https://github.com/Kaelio/ktx/issues/223)) ([bc7373f](bc7373fa8e))

### Documentation

* ban ktx compatibility shims ([#214](https://github.com/Kaelio/ktx/issues/214)) ([a9db379](a9db3797e6))
* **readme:** restructure for clarity and add FAQ + comparison table ([#222](https://github.com/Kaelio/ktx/issues/222)) ([0eeac6f](0eeac6f980))
* standardize fanout terminology ([#218](https://github.com/Kaelio/ktx/issues/218)) ([9248688](924868841d))

### Code Refactoring

* remove legacy ktx compatibility shims ([#211](https://github.com/Kaelio/ktx/issues/211)) ([96952fb](96952fb43c))

### Tests

* split cli tests from source tree ([#216](https://github.com/Kaelio/ktx/issues/216)) ([56985b7](56985b7e09))

### Continuous Integration

* disable telemetry in workflows ([#217](https://github.com/Kaelio/ktx/issues/217)) ([4827437](4827437f3a))
2026-05-26 21:19:07 +00:00
Andrey Avtomonov
bc7373fa8e
fix: update ktx CI boundary checks (#223) 2026-05-26 23:03:47 +02:00
Andrey Avtomonov
0eeac6f980 docs(readme): restructure for clarity and add FAQ + comparison table (#222)
* docs(readme): restructure for clarity and add FAQ + comparison table

Restructure the README: trim Common Commands to the 6 essentials and link
to the CLI Reference, add a "How ktx compares" table and "Who is ktx for"
qualifier, introduce a small FAQ, wrap key prompts in GitHub callouts,
merge the duplicate workspace-layout section into Development, move
Telemetry next to License, and add a Star History chart.

* docs(readme): tighten Skip-ktx list and convert FAQ to bullets
2026-05-26 14:29:53 +02:00
Andrey Avtomonov
62699bfe9d
feat(cli): surface docs and demo-warehouse links in ktx setup (#221)
Add a Clack note pointing to https://docs.kaelio.com/ktx right after the
setup intro, and a second note pointing to https://kaelio.com/start
above the database driver multiselect — mirroring the docs-site CTA
wording. Closes KLO-715 and KLO-716.
2026-05-26 13:42:52 +02:00
Andrey Avtomonov
1071f9d1c9
fix(ingest): attribute historic-sql evidence writes in bundle report (#220)
The emit_historic_sql_evidence tool took rawPath as LLM-supplied input,
so projection actions frequently lacked defensible raw paths and every
row in bundle_ingest_reports fell through as actionType: 'skipped' with
null artifact metadata, hiding the wiki pages and SL merges the run had
actually produced (KLO-698).

The tool now reads the work unit's rawFiles from session.allowedRawPaths
and stores them on the evidence envelope; the projection emits actions
with those paths, and stale/archive actions are anchored to manifest.json
so they also surface as non-skipped provenance rows.
2026-05-26 12:21:53 +02:00
ARYAN
2a6fb19ba4
fix(scripts): make package artifacts pnpm launch work on Windows
Fix Windows package artifact script invocation under pnpm.
2026-05-26 12:16:53 +02:00
Andrey Avtomonov
56985b7e09
test: split cli tests from source tree (#216)
* feat(cli): define full warehouse dialect contract

* test(cli): keep dialect edge tests focused

* fix(cli): stabilize dialect contract foundation

* refactor(connectors): own read-only query preparation

* refactor(connectors): resolve dialects through registry

* refactor(connectors): keep concrete dialect classes internal

* chore(workspace): enforce dialect import boundary

* refactor(cli): resolve relationship dialect at scan boundary

* refactor(cli): use dialect display parsing for entity details

* refactor(cli): use dialect display parsing for warehouse catalog

* refactor(cli): use dialect SQL in relationship workflows

* test(cli): verify solid dialect scan workflow closure

* test: split cli tests from source tree

* refactor(cli): standardize BigQuery scope listing

* feat(sqlite): implement connector scope listing

* test(connectors): cover required table listing

* feat(cli): add warehouse driver registry

* refactor(setup): route scope discovery through driver registry

* refactor(cli): route local query execution through driver registry

* refactor(historic-sql): route dialect support through driver registry

* refactor(cli): test warehouse connections through driver registry

* fix(cli): close driver registry type export gaps

* Improve setup daemon diagnostics

* refactor(setup): centralize rail-prefixed diagnostics + query-history fallback

Extract errorMessage, writePrefixedLines, and flushPrefixedBufferedCommandOutput
into clack.ts so the setup wizard, managed daemons, and embedding/agent steps
share one rail-formatted writer. setup-databases.ts also adds a
"disable query history and retry" option when the schema-context build fails
and query history is the likely culprit, surfaced via a new
failed-query-history-unavailable status.

* fix(cli): carry catalog through the picker so BigQuery/Snowflake/SQL Server scope filters match

The setup picker's KtxTableListEntry was a 2-level { schema, name }, so
qualifiedTableId always wrote db.name into enabled_tables. When BigQuery,
Snowflake, or SQL Server later ran fast ingest, their introspect step filtered
the scope set with scopedTableNames(scope, { catalog: projectId|database, db })
— catalog was non-null on the introspect side but null in the scope refs, so
every entry was rejected, the live-database adapter staged zero table files,
and detect() failed with 'Adapter "live-database" did not recognize fetched
source output'.

Align the picker boundary with the canonical 3-level KtxTableRef:

- Add catalog: string | null to KtxTableListEntry.
- BigQuery/Snowflake/SQL Server listTables populate catalog from the
  resolved projectId / database; Postgres/MySQL/ClickHouse/SQLite set null.
- qualifiedTableId emits catalog.schema.name when catalog is non-null
  (resolveEnabledTables already accepts the 3-part shape) and
  schemasFromEnabledTables now goes through parseDottedTableEntry so it
  recovers the schema correctly from both 2-part and 3-part entries.
- Export parseDottedTableEntry from enabled-tables.ts (@internal) for picker
  reuse.

Update listTables expectations in all seven connector tests and the setup /
picker test fixtures. Add a picker regression test that covers the
catalog-bearing round-trip (save + refine).

* fix(cli): allow debug telemetry under opt-out env
2026-05-26 08:49:05 +02:00
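
The catalog round-trip described in the picker fix inside 56985b7e09 can be sketched roughly as below. The function names `qualifiedTableId`, `parseDottedTableEntry`, and `schemasFromEnabledTables` are taken from the commit message, but their signatures and bodies here are assumptions made for illustration. In particular, this sketch splits on `.` and does not handle quoted identifiers that contain dots.

```typescript
// Assumed shape, modeled on the commit text: catalog is null for two-level drivers.
interface TableListEntry {
  catalog: string | null;
  schema: string;
  name: string;
}

// Emit catalog.schema.name when a catalog is present, schema.name otherwise.
function qualifiedTableId(entry: TableListEntry): string {
  return entry.catalog === null
    ? `${entry.schema}.${entry.name}`
    : `${entry.catalog}.${entry.schema}.${entry.name}`;
}

// Accept both the 2-part and the 3-part dotted form.
function parseDottedTableEntry(id: string): TableListEntry {
  const parts = id.split(".");
  const [first, second, third] = parts;
  if (parts.length === 2 && first !== undefined && second !== undefined) {
    return { catalog: null, schema: first, name: second };
  }
  if (
    parts.length === 3 &&
    first !== undefined &&
    second !== undefined &&
    third !== undefined
  ) {
    return { catalog: first, schema: second, name: third };
  }
  throw new Error(`Expected schema.name or catalog.schema.name, got "${id}"`);
}

// Recover the distinct schemas regardless of which form each entry uses.
function schemasFromEnabledTables(enabledTables: string[]): string[] {
  const schemas = new Set<string>();
  for (const id of enabledTables) {
    schemas.add(parseDottedTableEntry(id).schema);
  }
  return [...schemas];
}

const bigQueryEntry: TableListEntry = { catalog: "my-project", schema: "sales", name: "orders" };
const roundTripped = parseDottedTableEntry(qualifiedTableId(bigQueryEntry));
```

Routing the schema lookup through the same parser keeps 2-part and 3-part entries consistent, so a 3-part id no longer has its catalog mistaken for the schema.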
Luca Martial
924868841d
docs: standardize fanout terminology (#218) 2026-05-25 11:09:33 -04:00
Andrey Avtomonov
4827437f3a
ci: disable telemetry in workflows (#217) 2026-05-25 16:12:39 +02:00
Andrey Avtomonov
a9db3797e6
docs: ban ktx compatibility shims (#214) 2026-05-24 22:55:08 +02:00
Andrey Avtomonov
78b8a0c025
feat(connectors): generalize readiness and constraint handling (#212)
* feat(connectors): add postgres maxConnections

* feat(connectors): add mysql maxConnections

* feat(connectors): add sqlserver maxConnections

* feat(connectors): rename snowflake pool config

* docs: document connector maxConnections

* feat(scan): add constraint discovery warning helper

* feat(scan): carry structural warnings through reports

* feat(postgres): soft-fail denied constraint discovery

* feat(mysql): soft-fail denied constraint discovery

* feat(sqlserver): soft-fail denied constraint discovery

* feat(bigquery): soft-fail denied primary key discovery

* feat(snowflake): report denied primary key discovery

* test(scan): verify constraint discovery warnings

* feat(historic-sql): use shared readiness probes

* docs: document query history readiness probes

* test(historic-sql): verify readiness probe registry

* test(ingest): account for live database warnings artifact

* Add skip option for agent setup
2026-05-24 19:30:06 +02:00
Andrey Avtomonov
cfd1749ab9
feat(cli): skip-context-sources menu + clack-style tree picker UX (#213)
* feat(cli): add 'skip context sources' option to database setup menu

After databases are configured, the post-setup menu now offers a 'Skip
context sources' choice equivalent to passing --skip-sources, which
plumbs through KtxSetupDatabasesResult.skipSources to bypass the
context-source step in the same run.

* feat(cli): standardize tree picker UX after clack autocomplete-multiselect

Search is always on (no '/' to enter): typed printable chars feed the
query, Tab toggles selection on the focused node without leaving the
search bar, and Space toggles only after arrow-key navigation
(isNavigating); otherwise it is appended to the query. Esc clears a
non-empty query before quitting, Ctrl+A and Ctrl+N replace bare-letter
bulk bindings, and the cursor refocuses on the first match when the
query change would hide it.
2026-05-24 19:29:37 +02:00
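
The key handling described for the tree picker in cfd1749ab9 can be modeled as a small state reducer. This is a sketch under assumptions, not the picker's real code: `PickerState`, `PickerKey`, and `handleKey` are hypothetical, only a single focused node is modeled, and resetting `isNavigating` when the user types is a guess that the commit message does not confirm.

```typescript
// Hypothetical state: `selected` is the selection flag of the focused node.
interface PickerState {
  query: string;
  isNavigating: boolean;
  selected: boolean;
  quit: boolean;
}

type PickerKey = "tab" | "space" | "escape" | "up" | "down" | { char: string };

function handleKey(state: PickerState, key: PickerKey): PickerState {
  if (typeof key === "object") {
    // Search is always on: printable characters feed the query.
    return { ...state, query: state.query + key.char, isNavigating: false };
  }
  switch (key) {
    case "up":
    case "down":
      return { ...state, isNavigating: true };
    case "tab":
      // Tab toggles without leaving the search bar.
      return { ...state, selected: !state.selected };
    case "space":
      // Space toggles only after arrow-key navigation; otherwise it is query text.
      return state.isNavigating
        ? { ...state, selected: !state.selected }
        : { ...state, query: state.query + " " };
    case "escape":
      // Esc clears a non-empty query first and quits only when it is already empty.
      return state.query !== ""
        ? { ...state, query: "", isNavigating: false }
        : { ...state, quit: true };
  }
}

const initialState: PickerState = { query: "", isNavigating: false, selected: false, quit: false };
const keys: PickerKey[] = [{ char: "a" }, "space", "down", "space", "escape", "escape"];
const finalState = keys.reduce(handleKey, initialState);
```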
Andrey Avtomonov
96952fb43c
refactor: remove legacy ktx compatibility shims (#211)
* refactor: remove legacy ktx compatibility shims

* fix: restore overlay collision guidance
2026-05-24 16:57:23 +02:00
semantic-release-bot
a954a29a76 chore(release): 0.5.0 [skip ci]
## [0.5.0](https://github.com/Kaelio/ktx/compare/v0.4.1...v0.5.0) (2026-05-23)

### Features

* **cli:** add --fast flag and Local data section to ktx status ([#198](https://github.com/Kaelio/ktx/issues/198)) ([1c7131c](1c7131c6c2))
* **cli:** redesign database scope picker for searchable schema-first setup ([#203](https://github.com/Kaelio/ktx/issues/203)) ([c87d14a](c87d14a554))
* **telemetry:** anonymous posthog usage telemetry across node cli and python daemon ([#205](https://github.com/Kaelio/ktx/issues/205)) ([b0dd13c](b0dd13ce7c))

### Bug Fixes

* **cli:** treat omitted sentence-transformers base_url as managed daemon ([#194](https://github.com/Kaelio/ktx/issues/194)) ([9fc715a](9fc715ac6a)), closes [#184](https://github.com/Kaelio/ktx/issues/184) [#192](https://github.com/Kaelio/ktx/issues/192)
* **snowflake:** unblock multi-schema ingest and relationship discovery ([#204](https://github.com/Kaelio/ktx/issues/204)) ([394a985](394a985d2a)), closes [#206](https://github.com/Kaelio/ktx/issues/206)
* surface silent failures and drop unused dead-code paths ([#193](https://github.com/Kaelio/ktx/issues/193)) ([0958bc0](0958bc03dc))
* surface silent failures in SL, wiki, and embedding wiring ([#195](https://github.com/Kaelio/ktx/issues/195)) ([488b955](488b955024))

### Documentation

* add agent terminology rules and link from AGENTS.md ([#197](https://github.com/Kaelio/ktx/issues/197)) ([d67cf0a](d67cf0aab8))
* add code-design principles and link from AGENTS.md ([#199](https://github.com/Kaelio/ktx/issues/199)) ([a1cfb03](a1cfb03d73))
* add ktx.yaml configuration reference ([#200](https://github.com/Kaelio/ktx/issues/200)) ([5211a03](5211a0317e))
* bold Claude Pro/Max subscription note in README ([50c7bbc](50c7bbc957))
* **quickstart:** redesign demo-warehouse callout with sticker icons ([#202](https://github.com/Kaelio/ktx/issues/202)) ([fd2ba62](fd2ba62d92))
* rewrite context-as-code as reviewing-context guide ([#201](https://github.com/Kaelio/ktx/issues/201)) ([4d4296f](4d4296f397))

### Code Refactoring

* remove legacy compatibility shims ([#208](https://github.com/Kaelio/ktx/issues/208)) ([db09936](db09936085))

### Other Changes

* **workspace:** gate dead-code with knip production mode ([#196](https://github.com/Kaelio/ktx/issues/196)) ([2366b00](2366b00301))
2026-05-23 23:06:49 +00:00
Andrey Avtomonov
db09936085
refactor: remove legacy compatibility shims (#208) 2026-05-24 01:00:20 +02:00
Andrey Avtomonov
394a985d2a
fix(snowflake): unblock multi-schema ingest and relationship discovery (#204)
* feat(setup): drop redundant Snowflake schema prompt; fall back to free-text on listSchemas failure

Snowflake setup previously asked for a single schema as free text, then
ran a multiselect against the discovered schemas — two schema questions
back-to-back, with the first being only a session bootstrap. The SDK's
`schema` is optional, so the bootstrap step is unnecessary.

- Remove the free-text Snowflake schema prompt; only pass `schema` to
  snowflake-sdk when one is configured.
- When `listSchemas()` fails (e.g. role lacks SHOW SCHEMAS), prompt the
  user for a comma-separated list, persist it as `schema_names`, and use
  it as both the table-list filter and the multiselect default. Applies
  to every driver with a scope-discovery spec, not just Snowflake.
- Update docs to lead with `schema_names`; keep `schema_name` as a
  documented single-schema shorthand.

* fix(snowflake): keep introspecting when primary-key discovery is denied

The PK query joins INFORMATION_SCHEMA.TABLE_CONSTRAINTS and
INFORMATION_SCHEMA.KEY_COLUMN_USAGE, which require grants the
connection role may not have. Previously a 'SQL compilation error:
Object ANALYTICS.INFORMATION_SCHEMA.KEY_COLUMN_USAGE does not exist
or not authorized' aborted the entire introspect — schemas, columns,
and row counts were all discarded over a missing nice-to-have.

Wrap the constraint query in try/catch, log a one-line warning per
schema, and return an empty PK map. Columns end up with
primaryKey=false; relationship inference still has FK and profiling
to fall back on.

* fix(scan): unblock relationship discovery on Snowflake

Two adjacent bugs prevented the scan's relationship pipeline from producing
any joins on a Snowflake warehouse:

- relationship-profiling.ts fell through to a default `GROUP_CONCAT` branch
  for unknown drivers. Snowflake has no GROUP_CONCAT, so every per-table
  profile query failed with "Unknown function GROUP_CONCAT". Add an explicit
  Snowflake branch that uses LISTAGG with a literal '\x1f' delimiter
  (Snowflake requires the delimiter to be a constant, so CHR(31) is rejected).
- description-generation.ts destructured `connector.sampleTable` and
  `connector.sampleColumn` into bare locals, losing the `this` binding when
  the class-method connectors (Snowflake, Postgres, MySQL) were invoked.
  Every sample call threw "Cannot read properties of undefined (reading
  'assertConnection')" and degraded LLM descriptions to metadata-only
  prompts. Call the methods through the connector instead.

Without these, even after the primary-key probe is allowed to fail softly,
the scan ends up with 0 validated relationships and an empty `joins:` block
in every shard YAML.
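
The lost `this` binding described for description-generation.ts is a general JavaScript pitfall, and a minimal reproduction looks like the following. `SampleConnector` is a hypothetical stand-in, not one of the real connectors; only the method names `sampleTable` and `assertConnection` echo the commit message.

```typescript
// Hypothetical class-method connector; the real connectors do far more.
class SampleConnector {
  private connected = true;

  private assertConnection(): void {
    if (!this.connected) throw new Error("not connected");
  }

  sampleTable(table: string): string {
    this.assertConnection();
    return `sample of ${table}`;
  }
}

const connector = new SampleConnector();

// Broken pattern: destructuring detaches the method from its instance,
// so `this` is no longer the connector when the method body runs.
const { sampleTable } = connector;
let detachedError = "";
try {
  sampleTable("orders");
} catch (error) {
  detachedError = error instanceof Error ? error.message : String(error);
}

// Fixed pattern: calling through the connector keeps `this` bound.
const viaConnector = connector.sampleTable("orders");
```

The TypeScript compiler accepts the destructured call, so the failure only shows up at runtime. An alternative to calling through the instance would be `connector.sampleTable.bind(connector)`, though the commit describes the former approach.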

* test(scan): cover table-ref helpers

* feat(scan): plumb tableScope through live-database introspection port

* feat(scan): apply tableScope during metadata fetch

* feat(scan): enforce table scope at fetch boundary

* feat(scan): pool Snowflake sessions and batch enrichment for faster ingest (#206)

* feat(cli): add RSA key-pair auth option to Snowflake setup wizard

Extends the interactive Snowflake setup flow with an authentication-method
prompt (password vs RSA/JWT key-pair). The RSA branch collects a private-key
path (env/file/absolute) and an optional passphrase; the resulting connection
config records `authMethod: 'rsa'` with `privateKey` and `passphrase` instead
of `password`.

* feat(scan): pool Snowflake sessions

* fix(scan): reuse structural snapshots and cleanup connectors

* feat(scan): parallelize relationship profiling

* feat(scan): batch table description generation

* docs: document Snowflake ingest concurrency knobs

* fix(scan): close Snowflake ingest perf verification gaps

* fix(scan): keep batched description failure bounded

* feat(scan): dispatch query-history probes by connection driver

Extract historic-sql dialect resolution into a shared helper so the
status-project readiness check and the local ingest factory agree on
which connections enable query history and which probe to run. The
status command now picks the postgres/snowflake/bigquery probe based on
the connection's driver instead of always reporting against postgres,
which previously caused snowflake connections with queryHistory.enabled
to surface a misleading "driver is snowflake" failure.

Also drops a noisy console.warn from Snowflake primary-key discovery —
INFORMATION_SCHEMA.KEY_COLUMN_USAGE is commonly ungranted for read-only
roles and the FK + profiling paths handle the empty PK map already.

* fix(llm): allow StructuredOutput tool and raise maxTurns for generateObject

The Claude Code agent SDK announces an internal pseudo-tool named
StructuredOutput in the system/init message whenever outputFormat is set
to { type: 'json_schema' }. The runtime's isolation check built its
allowedToolIds set only from MCP tool ids and treated StructuredOutput
as an unexpected host-injected tool, so every generateObject call threw
"Claude Code runtime isolation failed: tools=StructuredOutput ..." and
the table-descriptions and relationship-LLM-proposal enrichment stages
recorded null output across the board.

Whitelist StructuredOutput specifically in generateObject's
allowedToolIds — the check also enforces missing_tools symmetry, so
generateText and runAgentLoop, which do not see StructuredOutput, must
not require it.

generateObject also ran with maxTurns: 1, which the model intermittently
breached when it emitted thinking text before the structured response.
Raised to 5 to give the schema-bound call enough headroom without
allowing unbounded loops. The existing tests now exercise the path with
an init message that announces StructuredOutput so the regression cannot
slip back in.

* chore(scripts): add ktx-reset.sh project-cleanup helper

Convenience script for repeatable ingest testing: takes a project
directory and prunes everything except ktx.yaml and .ktx/secrets/, so
the next ktx setup or ktx ingest run starts from a known-clean state.
2026-05-23 10:41:30 +02:00
Andrey Avtomonov
b0dd13ce7c
feat(telemetry): anonymous posthog usage telemetry across node cli and python daemon (#205)
* feat: add telemetry phase 1

* feat: add node telemetry event catalog

* feat: add telemetry event helpers

* feat: emit setup and connection telemetry

* feat: emit connection and stack telemetry

* feat: emit ingest and scan telemetry

* feat: emit query telemetry

* feat: emit sampled mcp telemetry

* docs: expand telemetry event catalog

* feat: add telemetry schema sync artifact

* feat: pass telemetry project id to semantic daemon

* feat: add daemon telemetry foundation

* feat: emit semantic daemon telemetry

* feat: emit daemon lifecycle telemetry

* docs: document full telemetry event catalog

* feat(telemetry): dim first-run notice

* feat(telemetry): show first-run notice before command output

* feat(telemetry): wire ktx PostHog project for live ingestion

* docs(telemetry): drop posthog project name and host from storage section

* docs(telemetry): trim to general overview and disclaimer

* docs(agents): add short telemetry guidelines

* feat(telemetry): enable posthog geoip enrichment

* docs(telemetry): drop ip-geoip note from public overview

* refactor(telemetry): drop no-op groupIdentify, rely on capture groups field

* fix(telemetry): respect CI kill switch in python daemon identity

* fix(sql): route table-count analysis to existing analyze-batch endpoint

* fix(telemetry): emit install_first_run from notice path and derive flagsPresent from commander

* fix(telemetry): read package info via getKtxCliPackageInfo to satisfy boundary check

* fix(telemetry): make python identity env={} bypass os.environ and unset CI in tests

* fix(telemetry): unset CI kill switch in cli-program-telemetry tests
2026-05-22 18:18:47 +02:00
Andrey Avtomonov
c87d14a554
feat(cli): redesign database scope picker for searchable schema-first setup (#203)
* feat: add searchable setup prompt pickers

* fix: make snowflake scope discovery single query

* fix: make bigquery table discovery schema scoped

* fix: honor mysql and clickhouse database scope

* feat: wire schema scope discovery for all relational setup drivers

* feat: add schema-first database scope picker

* test: update setup prompt stubs for type-check

* docs: document database scope picker fields

* Fix database setup edit preservation

---------

Co-authored-by: Andrey Avtomonov <7889985+andreybavt@users.noreply.github.com>
2026-05-22 14:22:11 +02:00
Andrey Avtomonov
fd2ba62d92
docs(quickstart): redesign demo-warehouse callout with sticker icons (#202)
* docs(quickstart): redesign demo-warehouse callout with sticker icons

Replaces the plain warning-style callout with a two-column layout: text
and a pill-shaped CTA on the left, a 2x2 cluster of rotated Postgres,
Metabase, dbt, and Notion sticker tiles on the right. Adds the four
connector SVGs under docs-site/public/icons/ to support it.

* chore(docs-site): refresh auto-generated next-env.d.ts
2026-05-21 16:04:58 +02:00
Andrey Avtomonov
4d4296f397
docs: rewrite context-as-code as reviewing-context guide (#201)
* docs: rewrite context-as-code as reviewing-context guide

Move the page from Concepts to Guides and rebuild around an interactive
review-loop diagram. Extract pan/zoom + fit-view controls into a shared
FlowCanvas wrapper and adopt it across all three docs diagrams.

* test: point examples-docs assertion at reviewing-context

Update the doc smoke test that read context-as-code.mdx to read the new
guides/reviewing-context.mdx path. The `ktx ingest --all --no-input`
assertion still holds; the rename was the only break.
2026-05-21 15:42:50 +02:00
Andrey Avtomonov
5211a0317e
docs: add ktx.yaml configuration reference (#200)
Adds a new Configuration section to the docs with a reference page that
covers every top-level block of ktx.yaml: connections, setup, storage,
llm, ingest, scan, agent, and memory. Each block lists fields, defaults,
accepted values, and a short YAML example, with a leading schematic that
groups blocks into inputs, compute, and persistence.
2026-05-21 15:29:20 +02:00
Andrey Avtomonov
2366b00301
chore(workspace): gate dead-code with knip production mode (#196)
* refactor(workspace): relocate @ktx/llm source into packages/cli/src/llm

* refactor(workspace): rewrite @ktx/llm imports to relative paths

* refactor(workspace): fold internal packages into cli

* chore(workspace): gate dead-code with knip production mode

Turn on production-mode knip plus an autofix run in pre-commit and the
`pnpm dead-code` script, document the `/** @internal */` convention for
test-only exports in AGENTS.md, annotate test-only exports across the
CLI with that JSDoc, and drop dead exports/wrappers the new gate
surfaced (e.g. `cli-project.ts`, `lookerRuntimeSourceToFileAdapterSource`,
`createLocalScanEnrichmentProvidersFromConfig`,
`PGLITE_OWNER_PROCESS_BACKEND_CAPABILITIES`, stale type re-exports).
Replace the loose `ignoreIssues` allowlist in `knip.json` with explicit
production entries so cross-package barrel leaks are caught.

* refactor(cli): delete internal barrel index.ts files

The 34 `index.ts` re-export barrels inside `packages/cli/src/` were
holdovers from the pre-fold multi-workspace structure. Post-fold-in they
served no production purpose: external consumers go through the single
package main entry, and in-repo callers mostly imported through them
only because the path was short. Internally, knip flagged most barrel
re-exports as production-dead (only reached via tests).

This change:
- Deletes every internal barrel except `packages/cli/src/index.ts`
  (the published package entry).
- Rewrites ~270 source/test files to import each name directly from
  the file that defines it.
- Moves `tools/warehouse-verification/index.ts` to
  `create-warehouse-verification-tools.ts` (the function it defined
  locally) and updates its single consumer.
- Renames `search/backend-conformance.ts` → `.test-utils.ts` to match
  the existing test-helper file convention.
- Deletes 13 dead test-only chains (dbt-descriptions/*,
  live-database/extracted-schema, live-database/structural-sync,
  relationship-* feedback/review chain) plus their tests and a
  cascading orphan integration test.
- Updates test mocks that pointed at deleted barrel paths
  (notion-client, connector barrels in scan/local-scan-connectors
  tests) to mock the source files instead.
- Points the maintainer benchmark script
  (`scripts/relationship-benchmark-report.mjs`) at source files
  instead of `dist/context/scan/index.js`.
- Drops the barrel `!` entries from `knip.json`; adds explicit
  production entries only for the benchmark code reached via dist by
  the maintainer script.

Net: 413 files changed, ~1.2k insertions, ~9.4k deletions.

`pnpm run dead-code` (Biome + knip default + knip production) and
`pnpm run type-check` are clean; 2277 tests pass.

* refactor(workspace): rename @ktx/cli to @kaelio/ktx and pack it directly

Promote the CLI workspace package to the public name `@kaelio/ktx` and
drop the separate `scripts/build-public-npm-package.mjs` wrapper. The
CLI package is now publishable in place (`publishConfig.access: public`,
`provenance: true`), so artifact packing uses `pnpm pack` against
`packages/cli/` instead of assembling a parallel package tree.

Updates all workspace filter invocations, docs, tests, and release
readiness checks to reference the new package name, and folds the
tarball-name helper into `scripts/public-npm-release-metadata.mjs`.

* docs: align "agent clients" and "data agents" terminology

Replace "client agents" with "agent clients" and "database agents" with
"data agents" across AGENTS.md, README.md, the docs-site copy, and the
matching setup-agents test description, matching the canonical
vocabulary in docs/terminology.md.

Also moves packages/cli/tsconfig.json's tsBuildInfoFile from
node_modules/.cache/ to dist/.tsbuildinfo so incremental builds survive
node_modules reinstalls.

* refactor(release): single source of truth for package version

Make packages/cli/package.json the single source of truth for the
@kaelio/ktx version. publicNpmPackageVersion() now reads it directly,
so artifact filenames, release-readiness checks, and the Python wheel
version all derive from one field. The duplicate
release-policy.json.publicNpmPackageVersion is removed.

Previously the two fields could drift: tarballs were named
kaelio-ktx-0.4.1.tgz while internally containing
@kaelio/ktx@0.0.0-private.

- update-public-release-version.mjs rewrites both Python pyproject.toml
  files (ktx-daemon, ktx-sl) alongside the npm package.jsons,
  normalizing the version for PEP 440 (e.g. 0.1.0-rc.2 -> 0.1.0rc2).
- semantic-release-config.cjs adds the two pyproject.toml files to
  @semantic-release/git assets so the release commit back to main
  carries every version source in lockstep.
- The six "?? '0.0.0-private'" fallback literals across the CLI are
  replaced with "?? getKtxCliPackageInfo().version", and
  createDefaultKtxMcpServer makes its version arg required.
- docs/release.md describes the actual commit-back model: the dev tree
  always reflects the most recent release; no sentinel pin to
  maintain.

Verified: pnpm run artifacts:build now produces
kaelio-ktx-0.4.1.tgz and kaelio_ktx-0.4.1-py3-none-any.whl with
@kaelio/ktx@0.4.1 inside. Full type-check, dead-code, and
2287 vitests + 173 script tests pass.
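
The PEP 440 normalization mentioned above (`0.1.0-rc.2 -> 0.1.0rc2`) could look roughly like this. It is a sketch, not the contents of `update-public-release-version.mjs`: `toPep440` is a hypothetical helper, and the handling of `alpha` and `beta` labels is an assumption based on PEP 440's `a`, `b`, and `rc` spellings, since the commit message only shows the `rc` case.

```typescript
// PEP 440 spells pre-release segments as a, b, or rc directly after the release number.
const PRE_RELEASE_LABELS: Record<string, string> = { alpha: "a", beta: "b", rc: "rc" };

// Convert a semver string such as 0.1.0-rc.2 to its PEP 440 form, 0.1.0rc2.
// Only plain releases and <label>.<number> pre-releases are handled in this sketch.
function toPep440(version: string): string {
  const match = /^(\d+\.\d+\.\d+)(?:-([a-z]+)\.(\d+))?$/.exec(version);
  const release = match?.[1];
  if (release === undefined) {
    throw new Error(`Unsupported version: ${version}`);
  }
  const label = match?.[2];
  const serial = match?.[3];
  if (label === undefined || serial === undefined) {
    return release; // final release, identical in both schemes
  }
  const pepLabel = PRE_RELEASE_LABELS[label];
  if (pepLabel === undefined) {
    throw new Error(`Unsupported pre-release label: ${label}`);
  }
  return `${release}${pepLabel}${serial}`;
}

const wheelVersion = toPep440("0.1.0-rc.2");
```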

* refactor(cli): inject embedding provider resolution and detect sentence-transformers runtime

Make resolveProjectEmbeddingProvider and runtimeIo injectable in ingest and
scan command entrypoints so tests can stub them, and teach
resolvePublicIngestRuntimeRequirements to flag the local-embeddings runtime
feature when ktx.yaml selects sentence-transformers.

* chore(cli): mark buildLocalStatsStatus and LocalStatsStatus as @internal

Both symbols are consumed only by status-project.test.ts. Annotating with
/** @internal */ keeps knip's production-mode check clean without changing
runtime behavior.

* fix(cli): use real package metadata in print-command-tree

The stubbed package name embedded a forbidden product identifier that
tripped the boundary check in CI. Read the metadata from package.json
instead — keeps the rendered tree unchanged and removes a duplicate
source of truth.

* feat(cli): show embedding coverage in `ktx status`, drop duplicate disk counts

Inline `(N embedded)` next to the Wiki scope counts and Semantic-layer
source counts, computed with `SUM(embedding_json IS NOT NULL)` over
`knowledge_pages` and `local_sl_sources`. Rename the "Knowledge" label to
"Wiki" (canonical per `docs/terminology.md`) and rename the matching
`localStats.knowledgePages` field to `localStats.wikiPages`.

Drop `wiki=N md` and `semantic-layer=N yaml` from the Disk row — those
duplicated the per-surface rows above. Disk now reports only actual byte
usage (db, cache, raw-sources). The unused `wikiGlobalMarkdownCount` /
`semanticLayerYamlCount` fields, the `isMarkdownEntry` / `isYamlEntry`
helpers, and the `filter` arg on `summarizeDir` are removed.
2026-05-21 15:28:58 +02:00
Andrey Avtomonov
a1cfb03d73
docs: add code-design principles and link from AGENTS.md (#199)
Add docs/code-design.md with seven cross-cutting principles for
avoiding overengineering (one way to say one thing, behavior follows
from inputs, failures must reach a decision-maker, no seams without a
second consumer, spec-and-behavior drift, verify the path you fixed,
naming asymmetries). Reference it from AGENTS.md with the same weight
as in-file MUST/MUST NOT rules, mirroring the docs/terminology.md
pattern.
2026-05-21 14:33:27 +02:00
Andrey Avtomonov
1c7131c6c2
feat(cli): add --fast flag and Local data section to ktx status (#198)
Add --fast to skip checks requiring external communication (Claude Code
auth probe and Postgres pg_stat_statements probe); skipped checks render
as `-` and carry `"status": "skipped"` in JSON output. Always show a new
Local data section sourced from .ktx/db.sqlite (ingest run counts and
last-completed per connection, knowledge page counts by scope, semantic
layer source/dictionary value counts) plus on-disk sizes for .ktx/db.sqlite,
.ktx/cache/, raw-sources/, wiki/global/, and semantic-layer/. Wrap the
remaining slow probes in a @clack/prompts spinner when stdout is a TTY.
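
The skip behavior described above can be sketched as follows. This is an illustration under assumptions: the `Check` shape, the `external` flag, and the function names are hypothetical, and the real probes are presumably asynchronous.

```typescript
type CheckStatus = "ok" | "failed" | "skipped";

interface Check {
  name: string;
  external: boolean; // true when the check must talk to a remote service
  run: () => boolean; // synchronous here only to keep the sketch short
}

interface CheckResult {
  name: string;
  status: CheckStatus;
}

// With fast=true, external checks are never executed and are reported as skipped.
function runChecks(checks: Check[], fast: boolean): CheckResult[] {
  return checks.map((check): CheckResult => {
    if (fast && check.external) {
      return { name: check.name, status: "skipped" };
    }
    return { name: check.name, status: check.run() ? "ok" : "failed" };
  });
}

// Human-readable output shows "-" for skipped checks; JSON keeps "skipped".
function renderStatus(status: CheckStatus): string {
  return status === "skipped" ? "-" : status;
}
```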
2026-05-21 14:13:03 +02:00
Andrey Avtomonov
50c7bbc957 docs: bold Claude Pro/Max subscription note in README 2026-05-21 13:09:41 +02:00
Andrey Avtomonov
d67cf0aab8
docs: add agent terminology rules and link from AGENTS.md (#197)
Introduce `docs/terminology.md` with the canonical vocabulary coding
agents should use across docs, code, comments, CLI strings, and error
messages — including the disambiguation rule for the overloaded word
`source` (semantic / primary / context / source of truth) and a
converged-vs-banned table covering connectors, ingest modes, MCP
naming, reconciliation, and supported-system orderings. Reference the
new file from the `Product Naming` section of AGENTS.md so Claude,
Codex, and Gemini all pick it up via the existing AGENTS.md symlinks.
2026-05-21 12:52:50 +02:00
Andrey Avtomonov
488b955024
fix: surface silent failures in SL, wiki, and embedding wiring (#195)
* fix: surface silent failures in SL, wiki, and embedding wiring

- require non-empty `vertex.location` in the project schema instead of defaulting
  to an empty string with a description that promised SDK fallback the resolver
  never honored
- log YAML parse failures from `SemanticLayerService.loadSource` and
  `KnowledgeWikiService.readPage` so corrupted overlays aren't silently treated
  as "does not exist" by ingest/agent tools
- push directory-listing errors in `loadAllSources` and `listPageKeys` into the
  load-error / log path instead of returning empty success
- accept an `embeddingProvider` in `createLocalProjectMemoryIngest` and plumb the
  resolved CLI provider through `mcp-server-factory`; warn in both the memory
  and bundle runtimes when they fall back to `NoopEmbeddingPort` while the
  project config requests an active embedding backend
- clarify `embeddings.dimensions` description as a placeholder valid only with
  `backend: none`, and tighten the sentence-transformers `base_url` description
  to call out that managed-daemon resolution is CLI-only

* test: improve PR coverage
2026-05-21 10:38:23 +02:00
Andrey Avtomonov
9fc715ac6a
fix(cli): treat omitted sentence-transformers base_url as managed daemon (#194)
After PRs #184 and #192 moved managed-embeddings URL resolution to the
CLI project boundary and made `ktx setup` persist `ktx.yaml` without a
`base_url`, the status command still treated the empty value as
misconfiguration and printed "no base_url configured", dragging the
verdict down to "Partially ready — embedding credentials missing".

Update `buildEmbeddingsStatus` to recognize the managed-daemon
convention and report it as ok. Add a `status-project.test.ts` covering
the explicit-url, omitted, empty-string, and openai-missing-key paths.
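
A rough sketch of the four paths the new test covers. The config shape and the name `sketchEmbeddingsStatus` are assumptions for illustration; this is not the body of `buildEmbeddingsStatus`.

```typescript
// Hypothetical, simplified config shape.
interface EmbeddingsConfig {
  backend: "sentence-transformers" | "openai";
  baseUrl?: string;
  apiKey?: string;
}

interface EmbeddingsStatus {
  ok: boolean;
  detail: string;
}

function sketchEmbeddingsStatus(config: EmbeddingsConfig): EmbeddingsStatus {
  if (config.backend === "sentence-transformers") {
    const url = (config.baseUrl ?? "").trim();
    // Omitted or empty base_url follows the managed-daemon convention, so it is ok.
    return url === ""
      ? { ok: true, detail: "managed local daemon" }
      : { ok: true, detail: `custom endpoint ${url}` };
  }
  // openai: a missing key is genuine misconfiguration.
  return config.apiKey
    ? { ok: true, detail: "openai" }
    : { ok: false, detail: "openai API key missing" };
}
```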
2026-05-21 02:38:34 +02:00
Andrey Avtomonov
0958bc03dc
fix: surface silent failures and drop unused dead-code paths (#193)
Address overengineering audit findings across cli/context/connector packages:

- F1 Snowflake `query`: drop bare catch that flattened all errors to empty result
- F2 memory-agent: treat LLM `stopReason === 'error'` as crash (skip squash-merge)
- F3 WikiSearchTool: make the description honest about the token-only fallback vs the sqlite-fts5 hybrid
- F5 Scan enrichment provider resolution: return discriminated status and surface
  distinct `llm_unavailable` / `embedding_unavailable` warnings per failure mode
- F6 Relationship validation budget: drop dead `tableCount === undefined → 'all'`
  branch; update tests to pass `tableCount` like production
- F8 `ktx sql`: use canonical `resolveOutputMode` (now honors KTX_OUTPUT/CI/TTY)
- F9 MCP stdio server: default `protocolIo.stderr` to `process.stderr` so
  memory_ingest startup failures are visible
- F13/F14 Scan/setup JSON readers: distinguish ENOENT from corruption instead of
  silently treating both as missing
- F15 `createKtxCliScanConnector`: throw config-shape error when driver matches
  but type guard rejects, instead of "no native connector"
- F16 ContextEvidenceSearchTool: surface `embedding_unhealthy:<reason>` instead
  of silently dropping the semantic lane
- F17 PromptService: default partials to `[]` (removes stale `clinical_policy`
  reference from a prior product)
- F20 `contextBuildCommands`: drop unused `runId` parameter

Dead-code removal:

- F4 Delete `AgentRunnerService` (duplicated `RuntimeAgentRunner`, only test-used);
  migrate tests to exercise `AiSdkKtxLlmRuntime.runAgentLoop` directly
- F7 Delete `KtxScanOrchestrator` and its test (no production callers; the
  inline pipeline in `runLocalScan` is the single source of truth)
- F18 Delete `generateKtxText`/`generateKtxObject` pass-through helpers; inline
  the single `runtime.generateObject` call at its caller

Plus a clarifying comment on the SQLite `resolveStringReference` `file:` carve-out
(load-bearing for SQLite URI form, not a bug).
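
As an illustration of the F13/F14 item above, distinguishing a missing file from a corrupted one might look like this. The result type and `readJsonState` are hypothetical; the reader is injected so the sketch stays independent of any particular file API.

```typescript
type JsonReadResult<T> =
  | { kind: "ok"; value: T }
  | { kind: "missing" }
  | { kind: "corrupt"; reason: string };

// True when the thrown value carries the given errno-style code.
function hasErrorCode(err: unknown, code: string): boolean {
  return typeof err === "object" && err !== null && (err as { code?: unknown }).code === code;
}

function readJsonState<T>(readText: () => string): JsonReadResult<T> {
  let text: string;
  try {
    text = readText();
  } catch (err) {
    // Only ENOENT means "not there yet"; anything else reaches the caller.
    if (hasErrorCode(err, "ENOENT")) return { kind: "missing" };
    throw err;
  }
  try {
    return { kind: "ok", value: JSON.parse(text) as T };
  } catch (err) {
    // The file exists but cannot be parsed: report corruption, not absence.
    return { kind: "corrupt", reason: err instanceof Error ? err.message : String(err) };
  }
}
```

Callers can then decide per case, for example starting fresh on `missing` while surfacing a warning on `corrupt`.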
2026-05-21 02:38:18 +02:00
semantic-release-bot
7737ccaf1a chore(release): 0.4.1 [skip ci]
## [0.4.1](https://github.com/Kaelio/ktx/compare/v0.4.0...v0.4.1) (2026-05-21)

### Bug Fixes

* **cli:** resolve embedding provider explicitly and surface lane status in sl search ([#192](https://github.com/Kaelio/ktx/issues/192)) ([9d92c79](9d92c79988))

### Documentation

* **concepts:** add Wiki retrieval pillar page ([#191](https://github.com/Kaelio/ktx/issues/191)) ([ed2d2f9](ed2d2f9be0))

### Other Changes

* **docs-site:** add dev shortcut and fix hero heading clipping ([#190](https://github.com/Kaelio/ktx/issues/190)) ([56a9672](56a967278a))
2026-05-21 00:23:17 +00:00
Andrey Avtomonov
9d92c79988
fix(cli): resolve embedding provider explicitly and surface lane status in sl search (#192)
* feat(cli): add tryUseManagedLocalEmbeddingsDaemon for read-only callers

* feat(cli): add resolveProjectEmbeddingProvider helper

* fix(cli): wire sl search through resolveProjectEmbeddingProvider so semantic lane works

* fix(cli): wire wiki/knowledge search through resolveProjectEmbeddingProvider

* feat(cli): surface embeddings-unavailable status when sl search returns empty

* refactor(cli): route admin reindex through resolveProjectEmbeddingProvider

* refactor: pass embeddingProvider into ingest/scan instead of resolving inside @ktx/context

* refactor(mcp): resolve embedding provider in CLI factory, pass into context ports

* refactor(context): delete MANAGED_SENTENCE_TRANSFORMERS_BASE_URL sentinel

* refactor(cli): delete sentinel-based managed-embeddings indirection

* chore: scrub stale managed-embeddings sentinel references from tests and smoke script

* chore: unexport unused EmbeddingResolutionMode alias

* fix(cli): force pathPrefix="" when targeting the managed embeddings daemon

The managed daemon serves /embeddings/compute directly. The default
pathPrefix in @ktx/llm is /api, so omitting sentenceTransformers from
ktx.yaml produced /api/embeddings/compute -> 404. The resolver now
sets pathPrefix='' explicitly when wiring the managed daemon URL,
matching what the daemon actually exposes.
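
To make the 404 concrete, here is a small sketch of how a prefix changes the final URL. `embeddingsEndpoint` is a hypothetical helper and the port in the usage note is made up; only the `/api` default and the `/embeddings/compute` path come from the description above.

```typescript
// Joins base URL, optional path prefix, and the compute route without doubled slashes.
function embeddingsEndpoint(baseUrl: string, pathPrefix: string = "/api"): string {
  const base = baseUrl.replace(/\/+$/, "");
  const trimmed = pathPrefix.replace(/^\/+|\/+$/g, "");
  const prefix = trimmed === "" ? "" : `/${trimmed}`;
  return `${base}${prefix}/embeddings/compute`;
}
```

With the default prefix, `embeddingsEndpoint("http://127.0.0.1:8765")` targets `/api/embeddings/compute`, which the managed daemon does not serve; passing `""` as the second argument yields `/embeddings/compute`.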
2026-05-21 02:21:22 +02:00
Andrey Avtomonov
56a967278a
chore(docs-site): add dev shortcut and fix hero heading clipping (#190)
* chore(docs-site): add dev shortcut and fix hero heading clipping

- Add `pnpm docs` script that frees port 3000 then runs the docs-site
  dev server, so the docs preview is one command away.
- Bump hero heading line-height to 1.2 and add 0.15em bottom padding
  so the gradient text-clip no longer cuts off descenders.
- Sync auto-generated next-env.d.ts to the current Next types path.

* fix(ci): unblock CI on docs-font branch

- Add lsof to knip ignoreBinaries so the new `pnpm docs` script
  (which uses `lsof -ti:3000` to free port 3000) does not trip
  the Unlisted binaries check.
- Make CLI version assertions read @ktx/cli/package.json at runtime
  instead of hardcoding 0.0.0-private. The 0.4.0 release commit on
  main bumped the package version, breaking 18 hardcoded test cases
  in index.test.ts and admin-reindex.test.ts; reading the version
  dynamically keeps the suite green across future version bumps.

* fix ci release version fixtures
2026-05-21 01:30:45 +02:00
Andrey Avtomonov
ed2d2f9be0
docs(concepts): add Wiki retrieval pillar page (#191)
* docs(concepts): add Wiki retrieval pillar page

Adds a dedicated concept page covering the wiki side of the context
layer: the page contract, the hybrid retrieval pipeline (lexical,
semantic, token lanes fused by RRF), the refs/sl_refs/[[wikilink]]
graph, the validation that keeps edges live, and where ingest sources
its pages. Wired into concepts nav and cross-linked from the-context-layer
to mirror the existing Semantic querying link.
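
For readers unfamiliar with RRF, a minimal reciprocal rank fusion sketch follows. It is a generic illustration, not the project's retrieval code: the function name, the page keys, and the constant `k = 60` (a commonly used default) are assumptions.

```typescript
// Reciprocal rank fusion: score(page) = sum over lanes of 1 / (k + rank), rank starting at 1.
// Each lane is an ordered list of page keys, best match first.
function fuseByRrf(lanes: string[][], k: number = 60): string[] {
  const scores = new Map<string, number>();
  for (const lane of lanes) {
    lane.forEach((pageKey, index) => {
      scores.set(pageKey, (scores.get(pageKey) ?? 0) + 1 / (k + index + 1));
    });
  }
  // Highest fused score first; ties broken by key for a stable order.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map(([pageKey]) => pageKey);
}
```

A page that ranks well in several lanes outranks one that tops a single lane, which is why fusing the lexical, semantic, and token lanes does not require comparable raw scores.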

* test: derive release versions in tests instead of hardcoding 0.1.0-rc.1

After @semantic-release/git started committing version bumps back to the
branch, the 0.4.0 release rewrote package.json, packages/cli/package.json,
and release-policy.json — but the script and CLI tests still pinned the
pre-bump strings (0.0.0-private, 0.1.0-rc.1, 0.1.0rc1), so every new
branch off main failed TypeScript checks and Coverage.

Drive the version off the existing source of truth instead: read
@ktx/cli/package.json via createRequire in the CLI tests, and reuse the
already-imported PUBLIC_NPM_PACKAGE_VERSION / RUNTIME_WHEEL_PACKAGE_VERSION
constants in the script tests. The two assertions that pinned those
constants to specific values become semver shape checks.
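
A semver shape check of the kind mentioned above could be as small as the sketch below. The pattern and the name `looksLikeSemver` are assumptions; the wheel-style version string would need its own pattern, since it does not follow semver.

```typescript
// Accepts MAJOR.MINOR.PATCH with optional pre-release and build metadata.
const SEMVER_SHAPE = /^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$/;

// Asserting on shape rather than a pinned value survives future version bumps.
function looksLikeSemver(version: string): boolean {
  return SEMVER_SHAPE.test(version);
}
```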
2026-05-21 01:26:58 +02:00
semantic-release-bot
2f647d5c68 chore(release): 0.4.0 [skip ci]
## [0.4.0](https://github.com/Kaelio/ktx/compare/v0.3.0...v0.4.0) (2026-05-20)

### Features

* **release:** one version everywhere via @semantic-release/git ([#186](https://github.com/Kaelio/ktx/issues/186)) ([2f70861](2f70861a18))

### Bug Fixes

* correct repository URL casing to match canonical Kaelio org name ([#188](https://github.com/Kaelio/ktx/issues/188)) ([b43000f](b43000f961))

### Documentation

* standardize ktx naming ([#187](https://github.com/Kaelio/ktx/issues/187)) ([17647a4](17647a436a))

### Continuous Integration

* **release:** restore RELEASE_PAT for branch push ([#189](https://github.com/Kaelio/ktx/issues/189)) ([16f8a35](16f8a35bee)), closes [#188](https://github.com/Kaelio/ktx/issues/188)
2026-05-20 15:59:28 +00:00
Andrey Avtomonov
16f8a35bee
ci(release): restore RELEASE_PAT for branch push (#189)
Re-applies the RELEASE_PAT wiring on top of the URL-casing fix in #188.
The default GITHUB_TOKEN authenticates as github-actions[bot], which
cannot be added to either restrictions or bypass_pull_request_allowances
on a protected branch. With #188 removing the URL redirect, the PAT
auth header now survives all the way to the protected-branch hook;
since RELEASE_PAT belongs to andreybavt (verified via /user) and
andreybavt is in the bypass list, the push should now be accepted.
2026-05-20 17:57:35 +02:00
Andrey Avtomonov
b43000f961
fix: correct repository URL casing to match canonical Kaelio org name (#188)
semantic-release pushes the release commit to whatever repository.url
holds, then GitHub 301-redirects the lowercase /kaelio/ktx.git to the
canonical /Kaelio/ktx.git. The redirect causes branch-protection actor
evaluation to misbehave (bypass list matches are lost). Pinning the
correct case avoids the redirect entirely.
2026-05-20 17:55:34 +02:00
Andrey Avtomonov
17647a436a
docs: standardize ktx naming (#187)
* docs: align KTX terminology

* docs: standardize ktx naming
2026-05-20 17:33:38 +02:00
Andrey Avtomonov
2f70861a18
feat(release): one version everywhere via @semantic-release/git (#186)
* feat(release): commit version files back to branch for one-version-everywhere

Add @semantic-release/git to the release plugin chain so the bumped
package.json, release-policy.json, and packages/cli/package.json land
back on the release branch after publish. This keeps the published npm
version and the in-repo version files in sync, so local builds from
main report the released version (e.g. ktx --version and the daemon
/health endpoint via KTX_DAEMON_VERSION).

Also widens assertPublicNpmReleaseTag to accept branch-<sanitized> tags,
unblocking branch RC publishes that pass through update-public-release-
version.mjs.

* test(release): pin GITHUB_REF_NAME in main-rc releaseTag assertion

The bare releaseTag('rc') call defaulted to process.env.GITHUB_REF_NAME,
which on PR CI is the merge ref (e.g. 186/merge) and yields
'branch-186-merge' instead of 'next'. Pass an explicit { GITHUB_REF_NAME:
'main' } so the test exercises the main-rc path regardless of CI env.
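
The two outcomes named above can be reproduced with a small sketch. `rcReleaseTagFor` is a hypothetical stand-in for the rc branch of `releaseTag`, and the sanitization rule is an assumption chosen to match the `186/merge` example.

```typescript
// Maps a ref name to an rc dist-tag: main publishes to "next",
// every other ref gets a "branch-<sanitized>" tag.
function rcReleaseTagFor(refName: string): string {
  if (refName === "main") return "next";
  const sanitized = refName.replace(/[^0-9A-Za-z]+/g, "-").replace(/^-+|-+$/g, "");
  return `branch-${sanitized}`;
}
```

Because the result depends on the ref name, a test that leaves it to the ambient CI environment checks a different path on PR builds than on main, hence the explicit pin.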
2026-05-20 17:01:26 +02:00
Andrey Avtomonov
2667952aa9
fix: recover snapshots and branch rc tags (#185) 2026-05-20 15:22:01 +02:00
Andrey Avtomonov
c24e07a115
fix(cli): resolve managed-embeddings daemon URL at project boundary (#184)
A clean `ktx setup` was failing verification because the managed
local-embeddings daemon URL was passed library-side through
`process.env[KTX_MANAGED_SENTENCE_TRANSFORMERS_BASE_URL]`, and the setup
flow never wrote that variable. With no resolved URL the embedding
provider was null, the deep scan emitted
`scan_enrichment_backend_not_configured`, descriptions + embeddings
stayed `skipped`, and the agent-readiness check exited 1.

Replace the env-var indirection with CLI-side substitution at the
project-load boundary. New `loadKtxCliProject` wraps `loadKtxProject`,
ensures the managed daemon when `managed:local-embeddings` is present in
`config.ingest.embeddings` or `config.scan.enrichment.embeddings`, and
substitutes the resolved baseUrl into the in-memory config. Runtime
entry points (scan, ingest, public-ingest, admin-reindex) use the new
loader; setup-time persistence paths keep raw `loadKtxProject` so the
on-disk `ktx.yaml` keeps the portable sentinel.

Cleanup follows from the new design: drop
`MANAGED_SENTENCE_TRANSFORMERS_BASE_URL_ENV`, remove the env-var lookup
branch in `resolveSentenceTransformersBaseUrl`, drop the `env` field
from `ManagedLocalEmbeddingsDaemon`, and collapse the manual
daemon-ensure dance in `admin-reindex.ts`.
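
A simplified sketch of the substitution step, under stated assumptions: the sentinel is taken to live in a `base_url` field, the config shape is reduced to the two sections named above, and `substituteManagedUrl` and `ensureDaemon` are hypothetical names. The real loader is presumably asynchronous.

```typescript
const MANAGED_SENTINEL = "managed:local-embeddings";

interface EmbeddingsSection {
  base_url?: string;
}

interface ProjectConfig {
  ingest: { embeddings: EmbeddingsSection };
  scan: { enrichment: { embeddings: EmbeddingsSection } };
}

// Returns a copy with the sentinel replaced by the resolved daemon URL.
// The input object is left untouched, so what is persisted keeps the sentinel.
function substituteManagedUrl(config: ProjectConfig, ensureDaemon: () => string): ProjectConfig {
  const sections = [config.ingest.embeddings, config.scan.enrichment.embeddings];
  if (!sections.some((section) => section.base_url === MANAGED_SENTINEL)) return config;

  const resolved = ensureDaemon(); // only started when a section asks for it
  const swap = (section: EmbeddingsSection): EmbeddingsSection =>
    section.base_url === MANAGED_SENTINEL ? { ...section, base_url: resolved } : section;

  return {
    ingest: { ...config.ingest, embeddings: swap(config.ingest.embeddings) },
    scan: {
      ...config.scan,
      enrichment: { ...config.scan.enrichment, embeddings: swap(config.scan.enrichment.embeddings) },
    },
  };
}
```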
2026-05-20 14:43:02 +02:00
Andrey Avtomonov
ad9c9eda0d
docs(context-layer): replace "committed" badge with git icon (#183)
Swap the small "committed" chip on the two-pillars figure for an inline
Git logo + "git" label so the source-of-truth signal reads at a glance.
Adds GitIcon component matching the existing GitHubIcon/SlackIcon
inline-SVG pattern. Also picks up a Next.js-regenerated next-env.d.ts
routes path.
2026-05-20 14:37:45 +02:00
Andrey Avtomonov
cbf87074ff
fix: resolve dependabot security advisories (#179) 2026-05-20 14:17:29 +02:00
Andrey Avtomonov
4ec5903aa5
feat(ingest): adapter-owned finalization replaces post-processor escape hatch (#136)
* Refine adapter-owned ingest finalization design after adversarial review iteration 1

* Refine adapter-owned ingest finalization design after adversarial review iteration 2

* Refine adapter-owned ingest finalization design after adversarial review iteration 3

* Implement adapter-owned ingest finalization v1

Moves finalization from runner-owned post-processors into typed
SourceAdapter.finalize() contracts. Adds finalization report schema,
scope derivation, override replay context, and migrates historic-SQL
projection. Removes IngestBundlePostProcessorPort wiring and
HistoricSqlProjectionPostProcessor.

* feat(ingest): export finalization adapter contract types

* test(ingest): exercise historic sql finalization locally

* docs(plans): add adapter-owned finalization v1 closure plan

* fix(setup): unblock clean Linux installs and add enabled_tables allowlist

- Pin managed Python runtime to 3.13 via `uv venv --python 3.13` so installs
  don't pick the system 3.12 on Ubuntu 24.04 and fail at wheel install.
- Sanitize NO_PROXY/no_proxy for the daemon child process — drop IPv6 CIDR
  entries that httpx rejects with InvalidURL (OrbStack injects these by
  default).
- Add `enabled_tables` allowlist on warehouse connections (zod schema +
  live-database introspection filter) to scope ingest to specific tables.
- Add `getting-started/troubleshooting-linux` docs page covering the Python
  3.13 prerequisite, IPv6 proxy gotcha, and a minimal working recipe; link
  it from the quickstart troubleshooting table and the llms-docs map.
- Make docs-site origin overridable via `KTX_DOCS_ORIGIN` so local builds
  can serve under host.docker.internal.
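
The NO_PROXY sanitization in the list above could be sketched like this. The helper names and the detection heuristic (a colon in the address plus a numeric prefix length) are assumptions, not the shipped implementation.

```typescript
// Heuristic: an entry such as "fd07:b51a:cc66::/64" has a colon-bearing
// address followed by "/<prefix length>".
function isIpv6Cidr(entry: string): boolean {
  const slash = entry.lastIndexOf("/");
  if (slash === -1) return false;
  const address = entry.slice(0, slash);
  const prefix = entry.slice(slash + 1);
  return address.includes(":") && /^\d{1,3}$/.test(prefix);
}

// Drops IPv6 CIDR entries and keeps everything else in its original order.
function sanitizeNoProxy(value: string): string {
  return value
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry !== "" && !isIpv6Cidr(entry))
    .join(",");
}
```

Hostnames, IPv4 addresses, IPv4 CIDR blocks, and bare IPv6 addresses pass through unchanged; only the IPv6 CIDR form is removed before the value is handed to the daemon child process.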

* Move docs changes to specs repo

* fix(cli): keep managed runtime python version private

* Deduplicate enabled tables filtering
2026-05-20 14:17:10 +02:00
Andrey Avtomonov
fb82993ce1
docs: rewrite the-context-layer concept and highlight markdown frontmatter (#181)
Restructure the-context-layer.mdx around two committed pillars (semantic
sources + wiki pages) with an inline anatomy card, replace the
semantic-layer-only comparison with a three-way matrix against company
brains and traditional semantic layers, and add a navigable-graph
explanation grounded in sl_refs/refs maintenance. Extend the docs-site
CodeBlock with a markdown highlighter that detects YAML frontmatter,
heading and list markers, and inline code so wiki examples render with
the same token colors as YAML/SQL blocks.
2026-05-20 14:16:33 +02:00
Andrey Avtomonov
82d24c164f
docs: add YC P25 badge (#182) 2026-05-20 14:15:44 +02:00