Commit graph

7 commits

Author SHA1 Message Date
Andrey Avtomonov
0689d709d2
fix(cli): survive ktx.yaml version skew and derive repo ownership from disk (#293)
* fix(cli): survive ktx.yaml version skew and derive repo ownership from disk

Loading ktx.yaml is now tolerant of keys this ktx version does not
recognize: they are stripped from the in-memory config (the file on disk
is never rewritten) and reported by ktx status as non-blocking warnings,
while invalid values on recognized fields still fail hard. Repo
ownership is derived from observed state (a .git directory plus a root
ktx.yaml) instead of a ktx.managed git-config marker, so projects
created by any past or future ktx classify identically. initKtxProject
now runs an explicit foreign-repo pre-check and writes ktx.yaml before
initializing git, so an interrupted init leaves only recoverable
residue instead of a bare .git misread as foreign.

* style(cli): trim comment blocks to constraint-only notes

* docs(agents): require constraint-only code comments
2026-06-11 22:10:47 +02:00
Andrey Avtomonov
00cdf2de90
refactor: enforce ktx naming and AGENTS.md compliance sweep (#289)
Align the tree with AGENTS.md/CLAUDE.md conventions:

- Rewrite user-facing strings, docs, and tests to lowercase `ktx`
  (no bare uppercase `KTX` tokens remain outside literal identifiers).
- Drop the legacy `historicSql` migration path and its now-unused
  helpers, per the no-backward-compat rule.
- Remove `as unknown as` / `any` casts: narrow `BaseTool` generics to
  `z.ZodObject`, add a typed `createLookerClient`, and delete the dead
  `getParametersSchema`/`toAnthropicFormat` pre-AI-SDK helpers.
- Use `InvalidArgumentError` for Commander parse failures.
- Finish the adapter→connector prose conversion in the `ktx.yaml` docs
  while keeping the literal `adapters` config key.
2026-06-11 13:49:45 +02:00
Andrey Avtomonov
2877b85adc
fix(cli): isolate ktx-owned project repositories (#283)
* fix(cli): isolate ktx project git repos

* fix(cli): remove inert auto commit config

* test(cli): drop stale auto commit fixtures

* docs: document isolated ktx project repos

* test(cli): keep stale config grep clean

* fix(cli): guide setup away from foreign repos at the project dir

ktx owns the git repo rooted at the project dir and refuses to adopt one it
did not create (the Finding 3 isolation invariant). But setup steered users
straight into that failure: the interactive menu offers "Current directory"
first, and `--no-input --yes --project-dir <repo-root>` created directly in
place — both then threw a generic "Failed to initialize git repository:"
wrapper from deep in GitService.initialize().

Extract the ownership rule into a shared `classifyKtxRepoOwnership(dir)` used by
both GitService.initialize() (the invariant) and the setup wizard (pre-flight
guidance), so the decision derives from one rule. Setup now detects a foreign
repo before constructing GitService and: interactively re-prompts (the user
picks the existing `ktx-project` subfolder), or non-interactively returns a
clean missing-input with the actionable message. The typed foreign-repo error
is also surfaced verbatim instead of being buried under the generic wrapper.

Empty/non-repo current directories still work — only foreign repos are blocked.

* fix(cli): keep classifyKtxRepoOwnership total for non-directory paths

The setup ownership guard runs before the existing not-a-directory check, so
pointing a custom/--project-dir path at a file made classifyKtxRepoOwnership
lstat `<file>/.git`, hit ENOTDIR, and throw — crashing the setup step instead
of returning the friendly "path exists and is not a directory" result.

A path that is a file (or missing) holds no git repo for ktx to avoid, so treat
ENOTDIR like ENOENT and return 'unowned'. The downstream existingFolderState
check still rejects a non-directory with its friendly message, and the
classifier no longer throws raw errno for any caller.
2026-06-10 14:12:25 +02:00
Andrey Avtomonov
f3f893bf01
fix: read semantic sources safely (#284)
* fix: read semantic sources safely

* test: retarget reindex per-scope error case to a broken manifest

Reading a broken standalone source was made non-fatal in de1f1a8d (it is
surfaced for repair instead of throwing), so the reindex per-scope error
test no longer captured an error. Point it at a corrupt manifest shard,
which is the remaining fatal read failure the per-scope catch must
isolate, and assert the captured error names the offending file.

* fix(sl): decouple semantic-layer file names from warehouse naming rules

The in-file `name:` field is now the sole source identity; the filename is
a derived label that never participates in identity. This removes the
"Unsafe semantic-layer source name" failure class entirely: any warehouse
identifier (Snowflake's uppercase SIGNED_UP, EVENT$LOG, dotted names) can
be read, overlaid, edited, and deleted.

- New `source-files.ts`: one total filename derivation (safe lowercase
  names verbatim; otherwise slug + sha256-hash suffix, immune to
  case-insensitive-filesystem collisions) and one by-name file resolver.
- Reads resolve by name everywhere; the path-from-name fast path and
  `assertSafeSourceName` are gone.
- Writes resolve-then-write: rewrites land on the file that declares the
  name (human renames survive); new sources get a derived filename; a
  derived path occupied by a different source fails instead of clobbering.
- `readSourceFile` returns null for missing files instead of forcing every
  caller to launder IO errors; `deleteSource` distinguishes manifest-backed
  sources from not-found instead of silently succeeding.
- `sl_write_source` accepts verbatim warehouse identifiers (snake_case is
  now a recommendation for new sources) and rejects sourceName/source.name
  mismatches; `sl_edit_source` rejects name-changing edits.
- Ingest projection commits, gate-repair allowlists, and touched-source
  derivation use resolved paths / in-file names instead of interpolating
  `<connId>/<name>.yaml`.
- Collapsed the five parallel path derivations and duplicated path-token
  helpers onto the shared module; dropped dead service methods.

* fix(sl): resolve sources by declared name end-to-end and gate warehouse SQL with the parser-backed validator

- Key broken/renamed semantic-layer files by their recoverable in-file
  name (slSourceNameForFile) so mid-edit sources stay reachable under
  their real identity in reads, listings, and search
- Derive finalization touched sources from composed-source diffs and
  recover deleted files' declared names from the pre-change commit
  instead of parsing hash-derived filenames
- Resolve revert/rollback paths against history (listFilesAtCommit) so
  human-renamed files are restored where they lived at preHead
- Validate ingest sql_execution through the daemon's sqlglot
  validateReadOnly in the connection's dialect, sharing one
  driver-to-dialect map (sql-analysis/dialect.ts) across MCP and ingest
- Harden the local read-only SQL backstop: accept leading comments,
  reject smuggled second statements, and strip trailing
  semicolons/comments before row-limit wrapping
2026-06-10 14:06:13 +02:00
Andrey Avtomonov
6b2f7c3365
fix(cli): ensure git committer identity during ktx setup (#276)
* fix(cli): ensure git committer identity during ktx setup

ktx setup threw "Failed to initialize git repository" when the project
directory was already a git repo with no commits and the machine had no
configured git identity (e.g. a fresh Mac with no ~/.gitconfig).
GitService only set the identity on the path where it created the repo
itself, so the bootstrap commit had no resolvable committer.

Carry ktx's identity via GIT_AUTHOR/GIT_COMMITTER env on the shared
git client so every commit succeeds regardless of whether ktx created
the repo, without mutating the user's repo config. Also preserve the
underlying git error when rethrowing so the failure is diagnosable in
telemetry and actionable for the user.

* chore: sync uv.lock ktx-daemon and ktx-sl versions to 0.10.0
2026-06-09 12:10:02 +02:00
Andrey Avtomonov
c3d8cedb0b
feat(cli): add ingest LLM rate-limit governor with paced retries (#261)
* feat(cli): add ingest rate limit governor

* feat(cli): wire ingest rate-limit config

* feat(cli): report provider rate-limit signals

* feat(cli): show ingest rate-limit waits

* fix(cli): complete rate-limit event coverage

* fix(cli): abort ingest provider calls cleanly

* fix(cli): propagate ingest cancellation

* fix(cli): reject pre-aborted ingest rate-limit waits

* fix(cli): honor Claude rate-limit reset waits

* fix(cli): retry thrown Codex rate-limit failures

* fix(cli): type Claude rate-limit result details

* fix(cli): emit ingest rate-limit countdowns from rejected signals

* fix(cli): report ai sdk rate-limit header utilization

* fix(cli): gate LLM rate-limit retries on the governor budget

The AI SDK and Codex runtimes retried 429 / opaque rate-limit failures up
to 6-7 times with no backoff when constructed without a RateLimitGovernor
(scan, memory, setup) or with pacing disabled, ignoring Retry-After and
worsening the limit. The outer retry loop only cooperates with the
governor's pause, so without active pacing there is no backoff to apply.

Route the retry bound through a single source: RateLimitGovernor
.maxRetryAttempts(), which returns retry.maxAttempts when enabled and 1
(no outer retry) when absent or disabled. All three runtimes (ai-sdk,
codex, claude-code) now use it, so ingest.rateLimit.retry.maxAttempts
genuinely controls attempts and the hard-coded 6 (plus Codex's off-by-one
extra attempt) is gone. Backend-native retry (e.g. the AI SDK's maxRetries)
still handles transient 429s.

Also correct the ktx.yaml docs for maxWaitMs (caps each wait, not the whole
run) and maxAttempts, and sync uv.lock ktx-sl/ktx-daemon to 0.9.0.
2026-06-05 12:10:27 +02:00
Andrey Avtomonov
56985b7e09
test: split cli tests from source tree (#216)
* feat(cli): define full warehouse dialect contract

* test(cli): keep dialect edge tests focused

* fix(cli): stabilize dialect contract foundation

* refactor(connectors): own read-only query preparation

* refactor(connectors): resolve dialects through registry

* refactor(connectors): keep concrete dialect classes internal

* chore(workspace): enforce dialect import boundary

* refactor(cli): resolve relationship dialect at scan boundary

* refactor(cli): use dialect display parsing for entity details

* refactor(cli): use dialect display parsing for warehouse catalog

* refactor(cli): use dialect SQL in relationship workflows

* test(cli): verify solid dialect scan workflow closure

* test: split cli tests from source tree

* refactor(cli): standardize BigQuery scope listing

* feat(sqlite): implement connector scope listing

* test(connectors): cover required table listing

* feat(cli): add warehouse driver registry

* refactor(setup): route scope discovery through driver registry

* refactor(cli): route local query execution through driver registry

* refactor(historic-sql): route dialect support through driver registry

* refactor(cli): test warehouse connections through driver registry

* fix(cli): close driver registry type export gaps

* Improve setup daemon diagnostics

* refactor(setup): centralize rail-prefixed diagnostics + query-history fallback

Extract errorMessage, writePrefixedLines, and flushPrefixedBufferedCommandOutput
into clack.ts so the setup wizard, managed daemons, and embedding/agent steps
share one rail-formatted writer. setup-databases.ts also adds a
"disable query history and retry" option when the schema-context build fails
and query history is the likely culprit, surfaced via a new
failed-query-history-unavailable status.

* fix(cli): carry catalog through the picker so BigQuery/Snowflake/SQL Server scope filters match

The setup picker's KtxTableListEntry was a 2-level { schema, name }, so
qualifiedTableId always wrote db.name into enabled_tables. When BigQuery,
Snowflake, or SQL Server later ran fast ingest, their introspect step filtered
the scope set with scopedTableNames(scope, { catalog: projectId|database, db })
— catalog was non-null on the introspect side but null in the scope refs, so
every entry was rejected, the live-database adapter staged zero table files,
and detect() failed with 'Adapter "live-database" did not recognize fetched
source output'.

Align the picker boundary with the canonical 3-level KtxTableRef:

- Add catalog: string | null to KtxTableListEntry.
- BigQuery/Snowflake/SQL Server listTables populate catalog from the
  resolved projectId / database; Postgres/MySQL/ClickHouse/SQLite set null.
- qualifiedTableId emits catalog.schema.name when catalog is non-null
  (resolveEnabledTables already accepts the 3-part shape) and
  schemasFromEnabledTables now goes through parseDottedTableEntry so it
  recovers the schema correctly from both 2-part and 3-part entries.
- Export parseDottedTableEntry from enabled-tables.ts (@internal) for picker
  reuse.

Update listTables expectations in all seven connector tests and the setup /
picker test fixtures. Add a picker regression test that covers the
catalog-bearing round-trip (save + refine).

* fix(cli): allow debug telemetry under opt-out env
2026-05-26 08:49:05 +02:00