mirror of
https://github.com/Kaelio/ktx.git
synced 2026-06-07 07:55:13 +02:00
---
title: Contributing
description: Contribute to ktx through code, docs, connectors, and examples.
---

**ktx** is an open-source context layer for data agents. The project welcomes
focused contributions that improve setup, integrations, CLI behavior,
documentation, connector coverage, and examples.

## Where to start

| Goal | Start here |
|------|------------|
| Prepare a local development checkout | [Development setup](#development-setup) |
| Understand the workspace layout | [Repository structure](#repository-structure) |
| Run verification before a pull request | [Running tests](#running-tests) |
| Add a database connector | [Adding a connector](#adding-a-connector) |
| Update docs for a user-visible CLI or setup change | [PR guidelines](#pr-guidelines) |

## Contribution areas

| Area | Good first context |
|------|--------------------|
| CLI and setup | `packages/cli`, especially setup steps, command definitions, status checks, and smoke tests |
| Context engine | `packages/cli/src/context`, including project config, ingest orchestration, and semantic search |
| Connectors | `packages/cli/src/connectors/<driver>`, plus connector-specific tests and integration docs |
| Python semantic layer | `python/ktx-sl` for planning and SQL compilation |
| **ktx** daemon | `python/ktx-daemon` for the portable runtime API |
| Documentation | `docs-site/content/docs` for public docs and `docs-site/tests` for docs behavior |

## Development setup

This page is for contributors working on the **ktx** repository. To install **ktx** for
an analytics project, install the published
[`@kaelio/ktx`](https://www.npmjs.com/package/@kaelio/ktx) package as described in the
[Quickstart](/docs/getting-started/quickstart).

### Prerequisites

- **Node.js 22+** and **pnpm** - for the TypeScript workspace
- **Python 3.11+** and **uv** - for the Python semantic layer and daemon
- **Git** - for version control

### Clone and install

```bash
git clone https://github.com/kaelio/ktx.git
cd ktx
pnpm install
uv sync --all-groups
```

`pnpm install` sets up the TypeScript workspace.
`uv sync --all-groups` installs Python dependencies for the semantic layer and
daemon, including dev and test groups.

### Build

```bash
pnpm run build
```

This builds the TypeScript package. You can also build the package directly:

```bash
pnpm --filter @kaelio/ktx run build
```

### Link the CLI for local testing

```bash
pnpm run setup:dev
pnpm run link:dev
```

This makes the `ktx-dev` command available globally, pointing at your local
build. Use this development binary when you need to test unpublished repository
changes.

## Repository structure

**ktx** is a pnpm + uv workspace. TypeScript source lives in `packages/cli`,
and Python projects live in `python/`.

```text
packages/
  cli/                  # CLI package and published npm package source
    src/context/        # Core context engine (scan, ingest, MCP, semantic layer)
    src/llm/            # LLM client abstraction
    src/connectors/     # Database connectors

python/
  ktx-sl/               # Semantic layer - grain-aware query planning and SQL compilation
  ktx-daemon/           # Daemon - portable API server around the semantic layer

examples/               # Example projects and fixtures
scripts/                # Workspace scripts (benchmarks, verification, release)
docs-site/              # Documentation site (Fumadocs)
```

The TypeScript package is ESM (`"type": "module"`) and uses `NodeNext` module
resolution. The Python projects use `pyproject.toml` for dependency management.

## Running tests

### TypeScript

```bash
# Run all tests
pnpm run test

# Run tests for the TypeScript package
pnpm --filter @kaelio/ktx run test

# Type-check all packages
pnpm run type-check

# Type-check the TypeScript package
pnpm --filter @kaelio/ktx run type-check

# CLI smoke test
pnpm --filter @kaelio/ktx run smoke
```

### Python

```bash
# Run all Python tests
uv run pytest -q

# Semantic layer tests
uv run pytest python/ktx-sl/tests -q

# Daemon tests
uv run pytest python/ktx-daemon/tests -q
```

### Pre-commit checks

After modifying Python files, run pre-commit on the changed files:

```bash
uv run pre-commit run --files python/ktx-sl/src/changed_file.py
```

### Full verification

For cross-cutting changes that affect package exports or shared contracts:

```bash
pnpm run build
pnpm run type-check
pnpm run test
uv run pytest -q
```

## Adding a connector

Database connectors live in `packages/cli/src/connectors/<driver>/`. Each
connector implements the `KtxScanConnector` interface from the internal context
modules.

### Step 1: Scaffold the connector

Create a new directory at `packages/cli/src/connectors/<driver>/` with:

```text
packages/cli/src/connectors/<driver>/
  index.ts       # Internal connector exports
  connector.ts   # KtxScanConnector implementation
  dialect.ts     # SQL dialect handling
```

Add any connector-specific npm dependency to `packages/cli/package.json`.

### Step 2: Implement the connector

Your connector class must implement `KtxScanConnector`, which requires:

- **`id`** - a string identifier, typically `"<driver>:<connectionId>"`
- **`driver`** - the `KtxConnectionDriver` value for your database
- **`capabilities`** - a `KtxConnectorCapabilities` object declaring what your connector supports: `tableSampling`, `columnSampling`, `columnStats`, `readOnlySql`, `nestedAnalysis`, `eventStreamDiscovery`, `formalForeignKeys`, `estimatedRowCounts`
- **`introspect()`** - discovers tables, columns, types, and constraints, returning a `KtxSchemaSnapshot`

Optional methods for richer scanning:

- **`sampleColumn()`** - sample values from a specific column
- **`sampleTable()`** - sample rows from a table
- **`columnStats()`** - compute column statistics
- **`executeReadOnly()`** - execute arbitrary read-only SQL

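The members listed above can be sketched as follows. This is a minimal illustration, not the repository's API: the `Sketch*` types are local stand-ins for `KtxScanConnector`, `KtxConnectorCapabilities`, and `KtxSchemaSnapshot`, whose real shapes are not shown on this page. The boolean capability flags, the `sampleTable` signature, and the in-memory "driver" are all assumptions.

```typescript
// Stand-in types for illustration only; the real Ktx* types may differ.
interface SketchCapabilities {
  tableSampling: boolean;
  columnSampling: boolean;
  columnStats: boolean;
  readOnlySql: boolean;
  nestedAnalysis: boolean;
  eventStreamDiscovery: boolean;
  formalForeignKeys: boolean;
  estimatedRowCounts: boolean;
}

interface SketchColumn {
  name: string;
  nativeType: string;
}

interface SketchTable {
  name: string;
  columns: SketchColumn[];
}

interface SketchSnapshot {
  tables: SketchTable[];
}

type SketchRow = Record<string, unknown>;

interface SketchConnector {
  id: string;
  driver: string;
  capabilities: SketchCapabilities;
  introspect(): Promise<SketchSnapshot>;
  // Optional: implemented only when the matching capability is declared.
  sampleTable?(table: string, limit: number): Promise<SketchRow[]>;
}

// Hypothetical connector backed by in-memory rows instead of a database.
class InMemoryConnector implements SketchConnector {
  readonly driver = "memory";
  readonly id: string;
  readonly capabilities: SketchCapabilities = {
    tableSampling: true,
    columnSampling: false,
    columnStats: false,
    readOnlySql: false,
    nestedAnalysis: false,
    eventStreamDiscovery: false,
    formalForeignKeys: false,
    estimatedRowCounts: false,
  };
  private readonly rows: Record<string, SketchRow[]>;

  constructor(connectionId: string, rows: Record<string, SketchRow[]>) {
    // Follows the "<driver>:<connectionId>" identifier pattern.
    this.id = `${this.driver}:${connectionId}`;
    this.rows = rows;
  }

  // Derives one table per key and one column per key of the first row.
  async introspect(): Promise<SketchSnapshot> {
    const tables = Object.entries(this.rows).map(([name, rows]) => ({
      name,
      columns: Object.entries(rows[0] ?? {}).map(([column, value]) => ({
        name: column,
        nativeType: typeof value,
      })),
    }));
    return { tables };
  }

  // Returns at most `limit` rows; unknown tables yield an empty sample.
  async sampleTable(table: string, limit: number): Promise<SketchRow[]> {
    return (this.rows[table] ?? []).slice(0, limit);
  }
}
```

Declaring `tableSampling: true` only alongside an implemented `sampleTable()` keeps the capability flags and the optional methods in agreement, which is the point of listing them separately.
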
### Step 3: Add a dialect

The dialect class handles database-specific concerns: identifier quoting, type
mapping from native types to normalized types, and query generation for sampling
and statistics.

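As a rough sketch of those three concerns, the class below quotes identifiers, maps native type names to a normalized set, and generates a sampling query. The class name, method names, normalized type list, and double-quote quoting style are assumptions for illustration; the repository's dialect classes may be structured differently.

```typescript
// Hypothetical normalized type names; the repository's actual set may differ.
type NormalizedType = "string" | "integer" | "float" | "boolean" | "timestamp" | "unknown";

// Assumed mapping from lower-cased native type names to normalized types.
const TYPE_MAP: Record<string, NormalizedType> = {
  text: "string",
  varchar: "string",
  int: "integer",
  integer: "integer",
  bigint: "integer",
  real: "float",
  "double precision": "float",
  boolean: "boolean",
  timestamp: "timestamp",
};

class SketchDialect {
  // Wraps an identifier in double quotes and doubles any embedded quotes.
  quoteIdentifier(name: string): string {
    return `"${name.replace(/"/g, '""')}"`;
  }

  // Strips a trailing length/precision suffix such as "(255)" before lookup.
  normalizeType(nativeType: string): NormalizedType {
    const base = nativeType.toLowerCase().replace(/\(.*\)$/, "").trim();
    return TYPE_MAP[base] ?? "unknown";
  }

  // Builds a row-sampling query; rejects limits that are not non-negative integers.
  sampleTableSql(table: string, limit: number): string {
    if (!Number.isInteger(limit) || limit < 0) {
      throw new RangeError("limit must be a non-negative integer");
    }
    return `SELECT * FROM ${this.quoteIdentifier(table)} LIMIT ${limit}`;
  }
}
```

Keeping quoting in one method means every generated query goes through the same escaping path, so a table name containing a quote character cannot break out of the identifier.
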
### Step 4: Wire it up

Register the new connector in `packages/cli/src/local-scan-connectors.ts` and
`packages/cli/src/local-adapters.ts` so the CLI and scan engine can instantiate
it. Keep runtime loading dynamic when the connector is optional.

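One way to keep loading dynamic is to register a loader function per driver and resolve it only when that driver is requested. The registry below is a hypothetical sketch of that pattern; `registerConnector`, `createConnector`, and the factory shape are invented for illustration and do not describe the actual contents of `local-scan-connectors.ts` or `local-adapters.ts`.

```typescript
// Hypothetical registry shape, for illustration only.
type ConnectorFactory = (connectionId: string) => { id: string };
type ConnectorLoader = () => Promise<ConnectorFactory>;

const loaders = new Map<string, ConnectorLoader>();

// Registers a lazy loader; nothing driver-specific is loaded at this point.
function registerConnector(driver: string, loader: ConnectorLoader): void {
  loaders.set(driver, loader);
}

// Resolves the loader on demand, so an optional dependency is only
// required when a project actually uses that driver.
async function createConnector(driver: string, connectionId: string): Promise<{ id: string }> {
  const loader = loaders.get(driver);
  if (!loader) {
    throw new Error(`Unknown driver: ${driver}`);
  }
  const factory = await loader();
  return factory(connectionId);
}

// In a real connector the loader would typically wrap a dynamic import, e.g.
// `async () => (await import("<path to connector module>")).someFactory`,
// where both the path and the export name are placeholders.
registerConnector("example", async () => (connectionId) => ({
  id: `example:${connectionId}`,
}));
```

With this shape, a missing optional dependency surfaces as a rejected promise from `createConnector` for that one driver rather than as a startup failure for the whole CLI.
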
### Step 5: Test

```bash
pnpm --filter @kaelio/ktx run build
pnpm --filter @kaelio/ktx run type-check
pnpm --filter @kaelio/ktx run test
```

Use `packages/cli/src/connectors/sqlite/` as a minimal reference and
`packages/cli/src/connectors/postgres/` as a full-featured one.

## Code conventions

- **TypeScript**: strict types, no `any`, no `as unknown as`. Use `zod` schemas for runtime validation at CLI and config boundaries. Follow the `camelCaseSchema` / `PascalCaseType` naming convention for Zod schemas and inferred types.
- **Python**: type hints on all new code, `pathlib` over `os.path`, explicit exception types over broad `except Exception`, `logger.exception()` for caught exceptions. Use `sqlglot` for SQL parsing - never regex.
- **Dependencies**: `pnpm` for Node packages, `uv` for Python.
- **Dead code**: remove it. Don't leave commented-out code, unused wrappers, or empty directories.

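To illustrate the TypeScript rule against `any` and `as unknown as`, the sketch below narrows untyped input with a type guard instead of a cast. In the repository, `zod` schemas fill this role at CLI and config boundaries; the hand-written guard here is a dependency-free stand-in, and the `ConnectionConfig` shape is hypothetical.

```typescript
// Hypothetical config shape used only for this example.
interface ConnectionConfig {
  driver: string;
  url: string;
}

// Type guard: narrows `unknown` to ConnectionConfig without `any` or casts.
function isConnectionConfig(value: unknown): value is ConnectionConfig {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  return (
    "driver" in value &&
    typeof value.driver === "string" &&
    "url" in value &&
    typeof value.url === "string"
  );
}

// Parses JSON as `unknown` first, then validates before returning a typed value.
function parseConnectionConfig(raw: string): ConnectionConfig {
  const parsed: unknown = JSON.parse(raw);
  if (!isConnectionConfig(parsed)) {
    throw new TypeError("Invalid connection config");
  }
  return parsed;
}
```

Annotating the `JSON.parse` result as `unknown` forces every field access through the guard, which is the same property a schema-based parser provides.
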
## PR guidelines

Before submitting a pull request:

1. **Run the relevant checks** - at minimum, `pnpm run type-check` and `pnpm run test` for TypeScript changes, and `uv run pytest -q` and `uv run pre-commit run --files <files>` for Python changes.
2. **Build if you changed exports** - run `pnpm run build` to verify package exports and `dist/` expectations still align.
3. **Keep changes focused** - one logical change per PR. Don't bundle unrelated refactors.
4. **Follow existing patterns** - match the style and conventions of surrounding code. The codebase favors explicit over clever.
5. **Update docs for user-visible changes** - update `docs-site/content/docs/` when setup, CLI, configuration, or integration behavior changes.
6. **Don't commit artifacts** - `node_modules/`, `.venv/`, `dist/`, coverage output, and local databases should not be committed.

For larger features or architectural changes, open an issue first to discuss the
approach.

## Agent usage notes

Use this page when an agent is modifying the **ktx** repository itself rather than
using **ktx** in an analytics project.

| Agent task | Command or section |
|------------|--------------------|
| Prepare the workspace | `pnpm install`, `pnpm run setup:dev`, `uv sync --all-groups` |
| Verify TypeScript changes | `pnpm run type-check`, `pnpm run test`, or package-filtered equivalents |
| Verify Python changes | `uv run pytest -q` and `uv run pre-commit run --files <files>` |
| Add a connector | [Adding a connector](#adding-a-connector) |
| Check style expectations | [Code conventions](#code-conventions) |

Common recovery path: if a check fails because generated files or local
runtimes are missing, run the setup commands first. If a check fails because of
a real type, lint, or test error, fix the source file and rerun the smallest
failing check before broadening verification.