apunkt/ktx - bitfreedom.net: free all bits, everywhere

apunkt/ktx

mirror of https://github.com/Kaelio/ktx.git synced 2026-07-25 12:01:03 +02:00

Author	SHA1	Message	Date
Andrey Avtomonov	852ac8a836	fix(git): scope the dirty-main squash guard to staged changes only The previous guard rejected any uncommitted tracked change (git status --untracked-files=no), which also caught unstaged edits like a ktx.yaml that setup writes during the flow and commits only after the context build — so the guard wrongly blocked setup context builds and ingest (8 local-bundle-ingest cases failed with 'uncommitted changes (ktx.yaml)'). The actual hazard is the index, not the working tree: 'git commit' captures only staged changes, and the auto_commit:false residue is staged by 'git merge --squash'. Narrow the check to 'git diff --cached' so only pre-existing staged changes are refused; unstaged working-tree edits proceed untouched (never committed by the squash). Adds a regression test that an unstaged tracked change does not block the merge and is neither committed nor discarded.	2026-06-09 14:55:08 +02:00
Andrey Avtomonov	f446d207ba	fix(git): refuse squash-merge into a dirty main working tree The auto_commit:false path (stageSquashMergeIntoMain) leaves main staged, but the shared squash helper assumed a clean target. A later ingest/memory run merging into that dirty index would 'git commit' the prior run's staged files under the new run's commit (contamination), and conflict cleanup's 'reset --hard HEAD' would discard them (data loss). Guard applySquashToIndex: if the target worktree has uncommitted tracked changes, refuse before merging and return a 'dirty' result (untracked/gitignored files are ignored — the squash never commits them). Callers surface it cleanly: the bundle runner fails the run with an actionable message; the memory agent rolls back its eager DB writes (like a conflict) so the DB never gets ahead of main. Main is left untouched in every case.	2026-06-09 14:27:53 +02:00
Andrey Avtomonov	a02fcab487	fix(ingest): honor storage.git.auto_commit and memory.auto_commit Both documented flags were read only for status display; every ingest path squash-committed to main unconditionally, so setting either to false was a silent no-op (the reported symptom: 'Memory ingest (external_ingest): ...' commits despite memory.auto_commit: false). Gate the commit at the squash-merge onto main — the one point where ingest work becomes a permanent commit (intermediate session-worktree commits must still happen for the squash to collapse). When auto-commit is off, apply the squash to main's working tree and leave it staged instead of committing, so the run is never silently discarded: - GitService.stageSquashMergeIntoMain: shares the merge core with squashMergeIntoMain but stops before committing and returns the staged tree SHA (a valid diff/read ref). - memory.auto_commit gates MemoryAgentService (its DB writes are eager, so the staged files stay consistent); the commit-message job is skipped. - storage.git.auto_commit gates IngestBundleRunner; the wiki index is reconciled from the staged tree via the existing syncFromCommit (git diff/show accept a write-tree ref), and SL reindex already reads from files. Config descriptions now state precisely what each flag gates and the staged semantics when false.	2026-06-09 12:44:58 +02:00
Andrey Avtomonov	1a6da14f62	fix(git): give each ktx project its own git repo root A ktx project assumes its config dir is its own git working-tree root: writes, session worktrees, squash-merges, and reindex scans all resolve relative to it. GitService.initialize() gated on checkIsRepo() (IN_TREE), which is also satisfied by an enclosing repository — so a project nested inside another git working tree silently operated against the outer repo. Worktree/ingest writes landed at the outer root (e.g. <outer>/wiki/global/) while reindex scanned <projectDir>/wiki/global/, so the wiki was seeded but never indexed: wiki_search returned nothing and knowledge_pages stayed empty, with no error. Semantic-layer and raw-sources had the same divergence. Gate initialization on checkIsRepo('root') instead: require the repo root to be the config dir itself, and initialize a dedicated repository there when it is not (logging clearly when nesting inside an existing repo). This restores the one-repo-per-project invariant at the shared git layer, fixing all artifacts at once, and keeps ktx's commits out of the enclosing repository.	2026-06-09 12:14:07 +02:00
Andrey Avtomonov	c3d8cedb0b	feat(cli): add ingest LLM rate-limit governor with paced retries (#261 ) * feat(cli): add ingest rate limit governor * feat(cli): wire ingest rate-limit config * feat(cli): report provider rate-limit signals * feat(cli): show ingest rate-limit waits * fix(cli): complete rate-limit event coverage * fix(cli): abort ingest provider calls cleanly * fix(cli): propagate ingest cancellation * fix(cli): reject pre-aborted ingest rate-limit waits * fix(cli): honor Claude rate-limit reset waits * fix(cli): retry thrown Codex rate-limit failures * fix(cli): type Claude rate-limit result details * fix(cli): emit ingest rate-limit countdowns from rejected signals * fix(cli): report ai sdk rate-limit header utilization * fix(cli): gate LLM rate-limit retries on the governor budget The AI SDK and Codex runtimes retried 429 / opaque rate-limit failures up to 6-7 times with no backoff when constructed without a RateLimitGovernor (scan, memory, setup) or with pacing disabled, ignoring Retry-After and worsening the limit. The outer retry loop only cooperates with the governor's pause, so without active pacing there is no backoff to apply. Route the retry bound through a single source: RateLimitGovernor .maxRetryAttempts(), which returns retry.maxAttempts when enabled and 1 (no outer retry) when absent or disabled. All three runtimes (ai-sdk, codex, claude-code) now use it, so ingest.rateLimit.retry.maxAttempts genuinely controls attempts and the hard-coded 6 (plus Codex's off-by-one extra attempt) is gone. Backend-native retry (e.g. the AI SDK's maxRetries) still handles transient 429s. Also correct the ktx.yaml docs for maxWaitMs (caps each wait, not the whole run) and maxAttempts, and sync uv.lock ktx-sl/ktx-daemon to 0.9.0.	2026-06-05 12:10:27 +02:00
Andrey Avtomonov	56985b7e09	test: split cli tests from source tree (#216 ) * feat(cli): define full warehouse dialect contract * test(cli): keep dialect edge tests focused * fix(cli): stabilize dialect contract foundation * refactor(connectors): own read-only query preparation * refactor(connectors): resolve dialects through registry * refactor(connectors): keep concrete dialect classes internal * chore(workspace): enforce dialect import boundary * refactor(cli): resolve relationship dialect at scan boundary * refactor(cli): use dialect display parsing for entity details * refactor(cli): use dialect display parsing for warehouse catalog * refactor(cli): use dialect SQL in relationship workflows * test(cli): verify solid dialect scan workflow closure * test: split cli tests from source tree * refactor(cli): standardize BigQuery scope listing * feat(sqlite): implement connector scope listing * test(connectors): cover required table listing * feat(cli): add warehouse driver registry * refactor(setup): route scope discovery through driver registry * refactor(cli): route local query execution through driver registry * refactor(historic-sql): route dialect support through driver registry * refactor(cli): test warehouse connections through driver registry * fix(cli): close driver registry type export gaps * Improve setup daemon diagnostics * refactor(setup): centralize rail-prefixed diagnostics + query-history fallback Extract errorMessage, writePrefixedLines, and flushPrefixedBufferedCommandOutput into clack.ts so the setup wizard, managed daemons, and embedding/agent steps share one rail-formatted writer. setup-databases.ts also adds a "disable query history and retry" option when the schema-context build fails and query history is the likely culprit, surfaced via a new failed-query-history-unavailable status. * fix(cli): carry catalog through the picker so BigQuery/Snowflake/SQL Server scope filters match The setup picker's KtxTableListEntry was a 2-level { schema, name }, so qualifiedTableId always wrote db.name into enabled_tables. When BigQuery, Snowflake, or SQL Server later ran fast ingest, their introspect step filtered the scope set with scopedTableNames(scope, { catalog: projectId\|database, db }) — catalog was non-null on the introspect side but null in the scope refs, so every entry was rejected, the live-database adapter staged zero table files, and detect() failed with 'Adapter "live-database" did not recognize fetched source output'. Align the picker boundary with the canonical 3-level KtxTableRef: - Add catalog: string \| null to KtxTableListEntry. - BigQuery/Snowflake/SQL Server listTables populate catalog from the resolved projectId / database; Postgres/MySQL/ClickHouse/SQLite set null. - qualifiedTableId emits catalog.schema.name when catalog is non-null (resolveEnabledTables already accepts the 3-part shape) and schemasFromEnabledTables now goes through parseDottedTableEntry so it recovers the schema correctly from both 2-part and 3-part entries. - Export parseDottedTableEntry from enabled-tables.ts (@internal) for picker reuse. Update listTables expectations in all seven connector tests and the setup / picker test fixtures. Add a picker regression test that covers the catalog-bearing round-trip (save + refine). * fix(cli): allow debug telemetry under opt-out env	2026-05-26 08:49:05 +02:00

6 commits