docs: add agent terminology rules and link from AGENTS.md (#197)

Introduce `docs/terminology.md` with the canonical vocabulary coding agents should use across docs, code, comments, CLI strings, and error messages — including the disambiguation rule for the overloaded word `source` (semantic / primary / context / source of truth) and a converged-vs-banned table covering connectors, ingest modes, MCP naming, reconciliation, and supported-system orderings. Reference the new file from the `Product Naming` section of AGENTS.md so Claude, Codex, and Gemini all pick it up via the existing AGENTS.md symlinks.
2026-07-22 11:51:01 +02:00 · 2026-05-21 12:52:50 +02:00 · 2026-05-21 12:52:50 +02:00 · d67cf0aab8
commit d67cf0aab8
parent 488b955024
2 changed files with 105 additions and 0 deletions
--- a/docs/terminology.md
+++ b/docs/terminology.md
@ -0,0 +1,95 @@
+# ktx Terminology Rules
+
+Canonical vocabulary for coding agents working on this repository. Applies to
+docs prose, code comments, identifiers, CLI strings, error messages, log lines,
+and example output.
+
+For product-name capitalization rules (`ktx` vs `**ktx**` vs code font), see the
+`Product Naming` section of `AGENTS.md` — those rules take precedence over
+anything below when they conflict.
+
+## The "source" rule
+
+`source` does four different jobs in this codebase. Never write bare `source`
+in prose when ambiguity is possible. Always qualify:
+
+- **semantic source** — the YAML file that describes a table
+- **primary source** — the connected database
+- **context source** — the analytics-tooling integration (dbt, Metabase, etc.)
+- **source of truth** — the canonical place a fact lives
+
+Bare `source` is allowed only inside a section that has already established its
+referent (e.g., body of a `Semantic sources` page, or `sourceName` as a CLI arg).
+
+## Canonical vocabulary
+
+| Concept | Use | Do not use |
+|---|---|---|
+| AI consumer (general prose) | **data agent** | analytics agent, database agent, client agent |
+| AI consumer (Integrations nav) | **agent client** | client agent |
+| Coding-tool framing (user-facing) | **coding agent** | — |
+| The connected database | **primary source** / **database connection** | data source |
+| Analytics-tooling integration | **context source** / **context-source connection** | BI source, BI model, metadata source, source tool |
+| YAML file describing a table | **semantic source** | semantic-layer source, model file, bare "source file" |
+| The whole **ktx** surface | **context layer** (lowercase in prose) | "Context Layer" in prose |
+| The compiler pillar | **semantic layer** (lowercase in prose) | "Semantic Layer" in prose |
+| The query payload | **semantic query** (lowercase in prose) | "Semantic Query" |
+| The MCP layer | **MCP server** (the server), **MCP tools** (the functions) | "ktx MCP" as a standalone noun |
+| The plugin/implementation | **connector** (prefix with **primary** or **context** when contrasting) | adapter, driver-as-noun |
+| Config field value | `driver` (code font only) | `driver` as a generic noun |
+| Merge step | **reconcile** / **reconciliation** / **reconciliation agent** | "merge intelligently", bare "LLM agent" |
+| Connection ref in prose | **connection id** (lowercase, two words) | "connection ID" |
+| CLI arg/flag literal | `connectionId` (code font) | — |
+| File path placeholder | `<connection-id>` (code font) | — |
+| Fast schema mode | **fast ingest** | schema ingest, schema-only ingest |
+| AI-enriched mode | **deep ingest** | AI-enriched ingest |
+| Ingest of a primary connection | **database ingest** | — |
+| Ingest of a context-source connection | **context-source ingest** | bare "source ingest" |
+| Wiki capture | **text ingest** | — |
+| Query-history sub-mode | **query-history ingest** | — |
+| SQL compilation | **compile** / **the compiler** / **SQL compilation** | "SQL generation" |
+| Internal stage inside compilation | **planner** / **planning** (only in semantic-layer-internals) | — |
+| Setup flow noun | **setup wizard** | "the wizard" (bare) |
+| Setup flow contrast | **interactive setup** (vs non-interactive / flag-driven) | "interactive command" |
+| The whole project | **ktx project** | "KTX project" (all caps) |
+| The filesystem path | **project directory** | "project dir" |
+| Wiki surface as a whole | **wiki** | "wiki context" |
+| A single Markdown file | **wiki page** | — |
+| YAML vs Markdown contrast | **wiki Markdown** (only when contrasting with **semantic source YAML**) | — |
+| Joins multiplying rows (generic) | **fan-out** | — |
+| The two named patterns | **chasm trap** / **fan trap** | — |
+| Casual gloss in user prose | **double-count** | (avoid in technical/internals prose) |
+
+## Prose rules
+
+- **Article + ktx.** Treat `ktx` as a bare proper noun, no article: `ktx
+  is...`, `in ktx`. Articles attach to the *following* noun, not to `ktx`:
+  `the **ktx** MCP server`, `the **ktx** project`.
+- **Capitalization.** Default lowercase for `context layer`, `semantic layer`,
+  `semantic query`. Title case only inside literal page titles or H1 headings.
+- **Code font.** Reserve code font for the CLI command, binary, paths, config
+  field values (e.g. `driver: postgres`), CLI arg/flag literals
+  (`connectionId`, `--project-dir`), and path placeholders (`<connection-id>`).
+  Do not use code font for prose nouns like *connector* or *reconciliation*.
+- **`driver` is never a prose noun.** Always `driver: postgres` (code font, as
+  a config field value). For the noun, use `connector`.
+
+## Canonical lists
+
+Use these orderings verbatim when listing supported systems:
+
+- **Primary sources:** PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, SQL
+  Server, SQLite
+- **Context sources:** dbt, MetricFlow, LookML, Looker, Metabase, Notion
+
+If a doc or string omits or reorders members of either list, treat that as a
+bug unless the surrounding text justifies the change.
+
+## When updating this file
+
+- Add a new row to the canonical vocabulary table; do not introduce a parallel
+  glossary elsewhere.
+- If you rename a converged term, search the workspace for the previous form
+  and update call sites in the same change.
+- When deprecating a term, add it to the *Do not use* column with a one-line
+  reason in the surrounding prose, not just in the table.