---
title: Contributing
description: Contribute to ktx through code, docs, connectors, and examples.
---
**ktx** is an open-source context layer for data agents. The project welcomes
focused contributions that improve setup, integrations, CLI behavior,
documentation, connector coverage, and examples.
## Where to start
| Goal | Start here |
|------|------------|
| Prepare a local development checkout | [Development setup](#development-setup) |
| Understand the workspace layout | [Repository structure](#repository-structure) |
| Run verification before a pull request | [Running tests](#running-tests) |
| Add a database connector | [Adding a connector](#adding-a-connector) |
| Update docs for a user-visible CLI or setup change | [PR guidelines](#pr-guidelines) |
## Contribution areas
| Area | Good first context |
|------|--------------------|
| CLI and setup | `packages/cli`, especially setup steps, command definitions, status checks, and smoke tests |
| Context engine | `packages/cli/src/context`, including project config, ingest orchestration, and semantic search |
| Connectors | `packages/cli/src/connectors/<driver>`, plus connector-specific tests and integration docs |
| Python semantic layer | `python/ktx-sl` for planning and SQL compilation |
| **ktx** daemon | `python/ktx-daemon` for the portable runtime API |
| Documentation | `docs-site/content/docs` for public docs and `docs-site/tests` for docs behavior |
## Development setup
This page is for contributors working on the **ktx** repository. To install **ktx** for
an analytics project, use the published
[`@kaelio/ktx`](https://www.npmjs.com/package/@kaelio/ktx) package in the
[Quickstart](/docs/getting-started/quickstart).
### Prerequisites
- **Node.js 22+** and **pnpm** - for the TypeScript workspace
- **Python 3.11+** and **uv** - for the Python semantic layer and daemon
- **Git** - for version control
### Clone and install
```bash
git clone https://github.com/kaelio/ktx.git
cd ktx
pnpm install
uv sync --all-groups
```
`pnpm install` sets up the TypeScript workspace.
`uv sync --all-groups` installs Python dependencies for the semantic layer and
daemon, including dev and test groups.
### Build
```bash
pnpm run build
```
From the repository root, this builds the TypeScript package. You can also run the package's build script directly with a pnpm filter:
```bash
pnpm --filter @kaelio/ktx run build
```
### Link the CLI for local testing
```bash
pnpm run setup:dev
pnpm run link:dev
```
This makes the `ktx-dev` command available globally, pointing at your local
build. Use this development binary when you need to test unpublished repository
changes.
## Repository structure
**ktx** is a pnpm + uv workspace. TypeScript source lives in `packages/cli`,
and Python projects live in `python/`.
```text
packages/
  cli/                # CLI package and published npm package source
    src/context/      # Core context engine (scan, ingest, MCP, semantic layer)
    src/llm/          # LLM client abstraction
    src/connectors/   # Database connectors
python/
  ktx-sl/             # Semantic layer - grain-aware query planning and SQL compilation
  ktx-daemon/         # Daemon - portable API server around the semantic layer
examples/             # Example projects and fixtures
scripts/              # Workspace scripts (benchmarks, verification, release)
docs-site/            # Documentation site (Fumadocs)
```
The TypeScript package is ESM (`"type": "module"`) and uses `NodeNext` module
resolution. The Python projects use `pyproject.toml` for dependency management.
## Running tests
### TypeScript
```bash
# Run all tests
pnpm run test
# Run tests for the TypeScript package
pnpm --filter @kaelio/ktx run test
# Type-check all packages
pnpm run type-check
# Type-check the TypeScript package
pnpm --filter @kaelio/ktx run type-check
# CLI smoke test
pnpm --filter @kaelio/ktx run smoke
```
### Python
```bash
# Run all Python tests
uv run pytest -q
# Semantic layer tests
uv run pytest python/ktx-sl/tests -q
# Daemon tests
uv run pytest python/ktx-daemon/tests -q
```
### Pre-commit checks
After modifying Python files, run pre-commit on the changed files:
```bash
uv run pre-commit run --files python/ktx-sl/src/changed_file.py
```
### Full verification
For cross-cutting changes that affect package exports or shared contracts:
```bash
pnpm run build
pnpm run type-check
pnpm run test
uv run pytest -q
```
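If you run these often, a small script can execute them in order and stop at the first failure. That points you at the smallest failing check to fix and rerun. This is a hypothetical helper: `runChecks`, `Runner`, and `spawnRunner` are not part of the repository. The runner is injectable so the sequencing logic can be tested without spawning real processes.

```typescript
import { spawnSync } from "node:child_process";

// The "Full verification" commands above, in the same order.
const fullVerification: string[][] = [
  ["pnpm", "run", "build"],
  ["pnpm", "run", "type-check"],
  ["pnpm", "run", "test"],
  ["uv", "run", "pytest", "-q"],
];

// A runner executes one command and returns its exit status.
type Runner = (argv: string[]) => number;

// Default runner: spawns the command and streams its output to the terminal.
const spawnRunner: Runner = ([cmd, ...args]) => {
  if (!cmd) return 1;
  const result = spawnSync(cmd, args, { stdio: "inherit" });
  // status is null when the process could not start or was killed by a signal.
  return result.status ?? 1;
};

// Runs checks in order and stops at the first non-zero exit status.
function runChecks(
  checks: string[][],
  run: Runner = spawnRunner,
): { ok: boolean; failed?: string } {
  for (const argv of checks) {
    if (run(argv) !== 0) {
      return { ok: false, failed: argv.join(" ") };
    }
  }
  return { ok: true };
}
```

The default runner assumes `pnpm` and `uv` are on your `PATH`.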
## Adding a connector
Database connectors live in `packages/cli/src/connectors/<driver>/`. Each
connector implements the `KtxScanConnector` interface from the internal context
modules.
### Step 1: Scaffold the connector
Create a new directory at `packages/cli/src/connectors/<driver>/` with:
```text
packages/cli/src/connectors/<driver>/
  connector.ts   # KtxScanConnector implementation
  dialect.ts     # SQL dialect handling
```
Add any connector-specific npm dependency to `packages/cli/package.json`.
### Step 2: Implement the connector
Your connector class must implement `KtxScanConnector`, which requires:
- **`id`** - a string identifier, typically `"<driver>:<connectionId>"`
- **`driver`** - the `KtxConnectionDriver` value for your database
- **`capabilities`** - a `KtxConnectorCapabilities` object declaring what your connector supports: `tableSampling`, `columnSampling`, `columnStats`, `readOnlySql`, `nestedAnalysis`, `eventStreamDiscovery`, `formalForeignKeys`, `estimatedRowCounts`
- **`introspect()`** - discovers tables, columns, types, and constraints, returning a `KtxSchemaSnapshot`

Optional methods for richer scanning:
- **`sampleColumn()`** - sample values from a specific column
- **`sampleTable()`** - sample rows from a table
- **`columnStats()`** - compute column statistics
- **`executeReadOnly()`** - execute arbitrary read-only SQL
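The sketch below shows roughly how the required members, the capability flags, and an optional method fit together. It uses simplified local copies of the types:

- `ConnectorCapabilities`, `SchemaSnapshot`, `ScanConnector`, and `InMemoryConnector` are hypothetical stand-ins. They are not the real `KtxScanConnector`, `KtxConnectorCapabilities`, or `KtxSchemaSnapshot` definitions.
- The snapshot fields and the `sampleColumn` signature are assumptions.

Check the interface in the context modules for exact shapes.

```typescript
// Simplified stand-ins for the real types (assumption: the actual
// definitions in the context modules carry more fields).
interface ConnectorCapabilities {
  tableSampling: boolean;
  columnSampling: boolean;
  columnStats: boolean;
  readOnlySql: boolean;
  nestedAnalysis: boolean;
  eventStreamDiscovery: boolean;
  formalForeignKeys: boolean;
  estimatedRowCounts: boolean;
}

// Hypothetical snapshot shape: tables with typed columns.
interface SchemaSnapshot {
  tables: { name: string; columns: { name: string; type: string }[] }[];
}

interface ScanConnector {
  readonly id: string;
  readonly driver: string;
  readonly capabilities: ConnectorCapabilities;
  introspect(): Promise<SchemaSnapshot>;
  // Optional: only implemented when the backend supports it.
  sampleColumn?(table: string, column: string, limit: number): Promise<unknown[]>;
}

type Row = Record<string, unknown>;

// Hypothetical connector over in-memory rows, e.g. as a test double.
class InMemoryConnector implements ScanConnector {
  readonly driver = "memory";
  readonly id: string;
  readonly capabilities: ConnectorCapabilities = {
    tableSampling: false,
    columnSampling: true,
    columnStats: false,
    readOnlySql: false,
    nestedAnalysis: false,
    eventStreamDiscovery: false,
    formalForeignKeys: false,
    estimatedRowCounts: false,
  };
  private readonly tables: Record<string, Row[]>;

  constructor(connectionId: string, tables: Record<string, Row[]>) {
    // Follows the "<driver>:<connectionId>" id convention.
    this.id = `${this.driver}:${connectionId}`;
    this.tables = tables;
  }

  async introspect(): Promise<SchemaSnapshot> {
    // Simplification: infer column types from the first row of each table.
    return {
      tables: Object.entries(this.tables).map(([name, rows]) => ({
        name,
        columns: Object.entries(rows[0] ?? {}).map(([column, value]) => ({
          name: column,
          type: typeof value,
        })),
      })),
    };
  }

  async sampleColumn(table: string, column: string, limit: number): Promise<unknown[]> {
    const rows = this.tables[table] ?? [];
    return rows.slice(0, limit).map((row) => row[column]);
  }
}

// Callers check both the capability flag and the method before sampling.
async function maybeSampleColumn(
  connector: ScanConnector,
  table: string,
  column: string,
): Promise<unknown[] | undefined> {
  if (!connector.capabilities.columnSampling || !connector.sampleColumn) {
    return undefined;
  }
  return connector.sampleColumn(table, column, 5);
}
```

Gating on both the flag and the method keeps callers safe when a connector declares only part of the optional surface.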
### Step 3: Add a dialect
The dialect class handles database-specific concerns: identifier quoting, type
mapping from native types to normalized types, and query generation for sampling
and statistics.
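A dialect can be as small as a few pure functions. The sketch below assumes an ANSI-style database: double-quoted identifiers and `LIMIT` for row caps. `AnsiDialect`, `NormalizedType`, and its category names are hypothetical and do not mirror the real normalized type set. Databases with different quoting or row-limit syntax are the reason each connector ships its own dialect.

```typescript
// Hypothetical normalized categories; the real normalized types may differ.
type NormalizedType = "number" | "string" | "boolean" | "timestamp" | "unknown";

const NUMERIC_TYPES = new Set([
  "smallint", "integer", "int", "bigint", "numeric", "decimal", "real", "double precision",
]);

class AnsiDialect {
  // ANSI SQL quotes identifiers with double quotes and escapes an embedded
  // double quote by doubling it.
  quoteIdentifier(name: string): string {
    return `"${name.replace(/"/g, '""')}"`;
  }

  normalizeType(nativeType: string): NormalizedType {
    // Drop modifiers such as varchar(255) or numeric(10,2), then tidy spaces.
    const base = nativeType
      .toLowerCase()
      .replace(/\([^)]*\)/g, "")
      .replace(/\s+/g, " ")
      .trim();
    if (NUMERIC_TYPES.has(base)) return "number";
    if (base === "text" || base.startsWith("varchar") || base.startsWith("character")) return "string";
    if (base === "boolean" || base === "bool") return "boolean";
    if (base === "date" || base.startsWith("timestamp")) return "timestamp";
    return "unknown";
  }

  // Builds a non-null sample query; identifiers are always quoted.
  sampleColumnSql(schema: string, table: string, column: string, limit: number): string {
    if (!Number.isInteger(limit) || limit <= 0) {
      throw new RangeError(`limit must be a positive integer, got ${limit}`);
    }
    const col = this.quoteIdentifier(column);
    const from = `${this.quoteIdentifier(schema)}.${this.quoteIdentifier(table)}`;
    return `SELECT ${col} FROM ${from} WHERE ${col} IS NOT NULL LIMIT ${limit}`;
  }
}
```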
### Step 4: Wire it up
Register the new connector in `packages/cli/src/local-scan-connectors.ts` and
`packages/cli/src/local-adapters.ts` so the CLI and scan engine can instantiate
it. Keep runtime loading dynamic when the connector is optional.
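One way to keep optional connectors lazy is a registry of loader functions that run only when a driver is requested. In the sketch below, `loaders`, `ConnectorFactory`, `createConnector`, and the `example-optional` driver are hypothetical. Real loaders would use a dynamic `import()` of the connector module; here they resolve inline so the sketch runs on its own.

```typescript
// Hypothetical factory shape; the real registration code differs.
interface LoadedConnector {
  id: string;
  driver: string;
}
type ConnectorFactory = (connectionId: string) => LoadedConnector;

// Each loader runs only when its driver is requested. In real code a loader
// might look like (hypothetical module path and `create` export):
//   () => import("./connectors/<driver>/connector.js").then((m) => m.create)
const loaders: Record<string, () => Promise<ConnectorFactory>> = {
  sqlite: async () => (connectionId: string) => ({
    id: `sqlite:${connectionId}`,
    driver: "sqlite",
  }),
  // Simulates an optional connector whose npm dependency is not installed.
  "example-optional": async () => {
    throw new Error("Cannot find package 'example-driver'");
  },
};

async function createConnector(driver: string, connectionId: string): Promise<LoadedConnector> {
  const load = loaders[driver];
  if (!load) {
    throw new Error(`Unknown connector driver: ${driver}`);
  }
  // Wrap load failures so users know which driver needs its dependency.
  const factory = await load().catch((error: unknown) => {
    const reason = error instanceof Error ? error.message : String(error);
    throw new Error(`Connector "${driver}" could not be loaded (${reason}); is its optional dependency installed?`);
  });
  return factory(connectionId);
}
```

Because a loader runs only when its driver is requested, a missing optional dependency surfaces as an error for that driver instead of breaking CLI startup.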
### Step 5: Test
```bash
pnpm --filter @kaelio/ktx run build
pnpm --filter @kaelio/ktx run type-check
pnpm --filter @kaelio/ktx run test
```
Use `packages/cli/src/connectors/sqlite/` as a minimal reference and
`packages/cli/src/connectors/postgres/` as a full-featured one.
## Code conventions
- **TypeScript**: strict types, no `any`, no `as unknown as`. Use `zod` schemas for runtime validation at CLI and config boundaries. Follow the `camelCaseSchema` / `PascalCaseType` naming convention for Zod schemas and inferred types.
- **Python**: type hints on all new code, `pathlib` over `os.path`, explicit exception types over broad `except Exception`, `logger.exception()` for caught exceptions. Use `sqlglot` for SQL parsing - never regex.
- **Dependencies**: `pnpm` for Node packages, `uv` for Python.
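- **Dead code**: remove it. Don't leave commented-out code, unused wrappers, or empty directories. `pnpm run dead-code` runs Biome and knip (default and production modes); annotate exports that only tests consume with `/** @internal */` so the production check stays clean.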
## PR guidelines
Before submitting a pull request:
1. **Run the relevant checks** - at minimum, `pnpm run type-check` and `pnpm run test` for TypeScript changes, `uv run pytest -q` and `uv run pre-commit run --files <files>` for Python changes.
2. **Build if you changed exports** - run `pnpm run build` to verify package exports and `dist/` expectations still align.
3. **Keep changes focused** - one logical change per PR. Don't bundle unrelated refactors.
4. **Follow existing patterns** - match the style and conventions of surrounding code. The codebase favors explicit over clever.
5. **Update docs for user-visible changes** - update `docs-site/content/docs/` when setup, CLI, configuration, or integration behavior changes.
6. **Don't commit artifacts** - `node_modules/`, `.venv/`, `dist/`, coverage output, and local databases should not be committed.

For larger features or architectural changes, open an issue first to discuss the
approach.
## Agent usage notes
Use this page when an agent is modifying the **ktx** repository itself rather than
using **ktx** in an analytics project.
| Agent task | Command or section |
|------------|--------------------|
| Prepare the workspace | `pnpm install`, `pnpm run setup:dev`, `uv sync --all-groups` |
| Verify TypeScript changes | `pnpm run type-check`, `pnpm run test`, or package-filtered equivalents |
| Verify Python changes | `uv run pytest -q` and `uv run pre-commit run --files <files>` |
| Add a connector | [Adding a connector](#adding-a-connector) |
| Check style expectations | [Code conventions](#code-conventions) |

Common recovery path: if a check fails because generated files or local
runtimes are missing, run the setup commands first. If a check fails because of
a real type, lint, or test error, fix the source file and rerun the smallest
failing check before broadening verification.