Commit graph

5 commits

Author SHA1 Message Date
Matt Senick (Sigma)
acd20ac248
feat(sigma): add Sigma Computing context-source adapter (#316)
* feat(sigma): add Sigma Computing context-source adapter

Closes #168

Adds a full ingest adapter for Sigma Computing so `ktx ingest` can pull
data model specs and workbook summaries into the ktx context layer. The
implementation follows the same fetch → chunk → project → LLM pattern
used by the Looker, Metabase, and MetricFlow adapters.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(sigma): address PR review comments

- Remove manifest from rawFiles; moves to peerFileIndex so fetchedAt
  changes don't mark all work units dirty every run
- Fix workbookFilter.updatedSince eviction bug: fetch full universe first,
  apply filter client-side, evict only on archived/deleted
- Remove measure projection entirely; project() writes measures: [] and
  the sigma_ingest skill surfaces Lookup/aggregation formulas as wiki prose
- Remove joins projection (v1 limitation); project() writes joins: [] and
  Lookup relationships are described in wiki prose instead
- Remove write-back dead code: createDataModel, updateDataModel,
  SigmaDataModelPushResult, mutate/post/put
- Fix emitBatches notes pluralization bug ('2 data modelss' → '2 data models')
- Add tokenInflight dedup on ensureToken to coalesce concurrent auth requests
- Retry spec fetch when existing staged spec is null (transient failure cache)
- Drop unused WorkbookFilter import from client-port.ts
- Note in docs that joins are not projected from Sigma data models in this release

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* updates

* fix(sigma): restore sigma in local adapter test + small cleanups

The gdrive↔sigma merge dropped 'sigma' from the expected adapter source
list in local-adapters.test.ts while keeping gdrive, so the slow TS suite
failed even though the source registers both. Add 'sigma' back at its
registration position (after metabase, before gdrive).

Also:
- Move the orphaned SigmaPullConfig docstring onto the schema it documents
  and drop the stale BullMQ reference (standalone ktx has no BullMQ; the
  config lives in the ingest job's bundleRef.config).
- Drop an O(n^2) find() round-trip in fetch() when building the active
  data-model list; filter once and reuse for the eviction id set.

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Andrey Avtomonov <andreybavt@gmail.com>
Co-authored-by: Luca Martial <48870843+luca-martial@users.noreply.github.com>
2026-07-01 01:14:57 +02:00
ARYAN
5645dc4d28
Add gdrive context source adapter (#209)
* Add gdrive context source adapter

* feat(gdrive): normalize internal doc links, tabs, and header/footer structure

* fix(gdrive): reject generic source credential flags

* test(gdrive): include local adapter in expected list

* fix(gdrive): remove dead exports and silence false positive secret checks

* fix(setup): restore notion source auth flow
2026-06-27 23:41:32 +02:00
Andrey Avtomonov
00cdf2de90
refactor: enforce ktx naming and AGENTS.md compliance sweep (#289)
Align the tree with AGENTS.md/CLAUDE.md conventions:

- Rewrite user-facing strings, docs, and tests to lowercase `ktx`
  (no bare uppercase `KTX` tokens remain outside literal identifiers).
- Drop the legacy `historicSql` migration path and its now-unused
  helpers, per the no-backward-compat rule.
- Remove `as unknown as` / `any` casts: narrow `BaseTool` generics to
  `z.ZodObject`, add a typed `createLookerClient`, and delete the dead
  `getParametersSchema`/`toAnthropicFormat` pre-AI-SDK helpers.
- Use `InvalidArgumentError` for Commander parse failures.
- Finish the adapter→connector prose conversion in the `ktx.yaml` docs
  while keeping the literal `adapters` config key.
2026-06-11 13:49:45 +02:00
Andrey Avtomonov
f3f893bf01
fix: read semantic sources safely (#284)
* fix: read semantic sources safely

* test: retarget reindex per-scope error case to a broken manifest

Reading a broken standalone source was made non-fatal in de1f1a8d (it is
surfaced for repair instead of throwing), so the reindex per-scope error
test no longer captured an error. Point it at a corrupt manifest shard,
which is the remaining fatal read failure the per-scope catch must
isolate, and assert the captured error names the offending file.

* fix(sl): decouple semantic-layer file names from warehouse naming rules

The in-file `name:` field is now the sole source identity; the filename is
a derived label that never participates in identity. This removes the
"Unsafe semantic-layer source name" failure class entirely: any warehouse
identifier (Snowflake's uppercase SIGNED_UP, EVENT$LOG, dotted names) can
be read, overlaid, edited, and deleted.

- New `source-files.ts`: one total filename derivation (safe lowercase
  names verbatim; otherwise slug + sha256-hash suffix, immune to
  case-insensitive-filesystem collisions) and one by-name file resolver.
- Reads resolve by name everywhere; the path-from-name fast path and
  `assertSafeSourceName` are gone.
- Writes resolve-then-write: rewrites land on the file that declares the
  name (human renames survive); new sources get a derived filename; a
  derived path occupied by a different source fails instead of clobbering.
- `readSourceFile` returns null for missing files instead of forcing every
  caller to launder IO errors; `deleteSource` distinguishes manifest-backed
  sources from not-found instead of silently succeeding.
- `sl_write_source` accepts verbatim warehouse identifiers (snake_case is
  now a recommendation for new sources) and rejects sourceName/source.name
  mismatches; `sl_edit_source` rejects name-changing edits.
- Ingest projection commits, gate-repair allowlists, and touched-source
  derivation use resolved paths / in-file names instead of interpolating
  `<connId>/<name>.yaml`.
- Collapsed the five parallel path derivations and duplicated path-token
  helpers onto the shared module; dropped dead service methods.

* fix(sl): resolve sources by declared name end-to-end and gate warehouse SQL with the parser-backed validator

- Key broken/renamed semantic-layer files by their recoverable in-file
  name (slSourceNameForFile) so mid-edit sources stay reachable under
  their real identity in reads, listings, and search
- Derive finalization touched sources from composed-source diffs and
  recover deleted files' declared names from the pre-change commit
  instead of parsing hash-derived filenames
- Resolve revert/rollback paths against history (listFilesAtCommit) so
  human-renamed files are restored where they lived at preHead
- Validate ingest sql_execution through the daemon's sqlglot
  validateReadOnly in the connection's dialect, sharing one
  driver-to-dialect map (sql-analysis/dialect.ts) across MCP and ingest
- Harden the local read-only SQL backstop: accept leading comments,
  reject smuggled second statements, and strip trailing
  semicolons/comments before row-limit wrapping
2026-06-10 14:06:13 +02:00
Andrey Avtomonov
56985b7e09
test: split cli tests from source tree (#216)
* feat(cli): define full warehouse dialect contract

* test(cli): keep dialect edge tests focused

* fix(cli): stabilize dialect contract foundation

* refactor(connectors): own read-only query preparation

* refactor(connectors): resolve dialects through registry

* refactor(connectors): keep concrete dialect classes internal

* chore(workspace): enforce dialect import boundary

* refactor(cli): resolve relationship dialect at scan boundary

* refactor(cli): use dialect display parsing for entity details

* refactor(cli): use dialect display parsing for warehouse catalog

* refactor(cli): use dialect SQL in relationship workflows

* test(cli): verify solid dialect scan workflow closure

* test: split cli tests from source tree

* refactor(cli): standardize BigQuery scope listing

* feat(sqlite): implement connector scope listing

* test(connectors): cover required table listing

* feat(cli): add warehouse driver registry

* refactor(setup): route scope discovery through driver registry

* refactor(cli): route local query execution through driver registry

* refactor(historic-sql): route dialect support through driver registry

* refactor(cli): test warehouse connections through driver registry

* fix(cli): close driver registry type export gaps

* Improve setup daemon diagnostics

* refactor(setup): centralize rail-prefixed diagnostics + query-history fallback

Extract errorMessage, writePrefixedLines, and flushPrefixedBufferedCommandOutput
into clack.ts so the setup wizard, managed daemons, and embedding/agent steps
share one rail-formatted writer. setup-databases.ts also adds a
"disable query history and retry" option when the schema-context build fails
and query history is the likely culprit, surfaced via a new
failed-query-history-unavailable status.

* fix(cli): carry catalog through the picker so BigQuery/Snowflake/SQL Server scope filters match

The setup picker's KtxTableListEntry was a 2-level { schema, name }, so
qualifiedTableId always wrote db.name into enabled_tables. When BigQuery,
Snowflake, or SQL Server later ran fast ingest, their introspect step filtered
the scope set with scopedTableNames(scope, { catalog: projectId|database, db })
— catalog was non-null on the introspect side but null in the scope refs, so
every entry was rejected, the live-database adapter staged zero table files,
and detect() failed with 'Adapter "live-database" did not recognize fetched
source output'.

Align the picker boundary with the canonical 3-level KtxTableRef:

- Add catalog: string | null to KtxTableListEntry.
- BigQuery/Snowflake/SQL Server listTables populate catalog from the
  resolved projectId / database; Postgres/MySQL/ClickHouse/SQLite set null.
- qualifiedTableId emits catalog.schema.name when catalog is non-null
  (resolveEnabledTables already accepts the 3-part shape) and
  schemasFromEnabledTables now goes through parseDottedTableEntry so it
  recovers the schema correctly from both 2-part and 3-part entries.
- Export parseDottedTableEntry from enabled-tables.ts (@internal) for picker
  reuse.

Update listTables expectations in all seven connector tests and the setup /
picker test fixtures. Add a picker regression test that covers the
catalog-bearing round-trip (save + refine).

* fix(cli): allow debug telemetry under opt-out env
2026-05-26 08:49:05 +02:00