docs: remove stale benchmarks section

2026-07-22 11:51:01 +02:00 · 2026-05-11 23:32:13 -07:00 · 2026-05-11 23:32:13 -07:00 · 37150c0abc
commit 37150c0abc
parent a0193b3fb0
3 changed files with 0 additions and 169 deletions
--- a/docs-site/content/docs/benchmarks/link-detection.mdx
+++ b/docs-site/content/docs/benchmarks/link-detection.mdx
@@ -1,163 +0,0 @@
----
-title: Link Detection
-description: How KTX's relationship detection performs on real-world schemas.
----
-
-KTX infers foreign key relationships between tables even when the database declares no primary keys or foreign key constraints. This is critical for analytics warehouses, where constraints are rarely enforced. This page documents the methodology, scoring pipeline, and a reproducible benchmark you can run yourself.
-
-## Agent usage notes
-
-Use this page when an agent needs to explain, tune, or verify relationship detection.
-
-| Agent task | Relevant section | Command |
-|------------|------------------|---------|
-| Explain why KTX inferred a join | Detection pipeline | `ktx dev scan relationships <run-id> --status all` |
-| Decide whether to accept or reject a candidate | Scoring and threshold configuration | `ktx dev scan relationships <run-id> --accept <candidate-id>` |
-| Tune thresholds from reviewed decisions | Broader benchmark suite and calibration | `ktx dev scan relationship-thresholds --connection <connection-id>` |
-| Reproduce the bundled benchmark | Reproducing the benchmark | `pnpm run relationships:verify-orbit` |
-
-## What this measures
-
-Most analytics warehouses — Snowflake, BigQuery, Redshift — don't enforce referential integrity constraints. Tables like `fct_product_events` reference `dim_accounts` by convention (`account_id` → `id`), but nothing in the schema says so.
-
-KTX's relationship detection discovers these links automatically. The benchmark measures how accurately it recovers known foreign key relationships from a schema with **all declared constraints removed** — the hardest operating mode.
-
-Metrics tracked:
-
-- **Accepted** — relationships scored above the accept threshold (default 0.85) and written to the project manifest
-- **Review** — relationships scored between the review threshold (0.55) and accept threshold, flagged for human review
-- **Rejected** — relationships scored below the review threshold
-- **Skipped** — relationships not evaluated (e.g., filtered by candidate limits)
-
-## Methodology
-
-### Detection pipeline
-
-Relationship detection runs as a multi-stage pipeline during `ktx dev scan`:
-
-1. **Candidate generation** — scans the schema for potential FK relationships using multiple heuristics: exact column name matches, normalized table name matching, name inflection (singular/plural), column suffix patterns (`_id`, `_key`, `_code`, `_uuid`), self-references (`parent_id`, `manager_id`), and optionally embedding similarity and LLM proposals.
-
-2. **Column profiling** — samples up to 10,000 rows per column (configurable via `profile_sample_rows`) to collect statistics: row counts, null rates, distinct value counts, uniqueness ratios, sample values, and text length ranges.
-
-3. **Validation** — tests each candidate relationship against actual data by measuring target uniqueness, source coverage, violation ratio, and value overlap between child and parent columns.
-
-4. **Scoring** — combines 7 weighted signals into a confidence score:
-
-| Signal | Weight | What it captures |
-|--------|--------|-----------------|
-| Name similarity | 0.24 | How closely column/table names match FK conventions |
-| Value overlap | 0.22 | What percentage of FK values exist in the PK column |
-| Profile uniqueness | 0.22 | How unique the target column values are |
-| Type compatibility | 0.10 | Whether data types are compatible (hard gate — score is 0 if incompatible) |
-| Embedding similarity | 0.10 | Semantic similarity between column names |
-| Profile null rate | 0.08 | Presence of non-null values |
-| Structural prior | 0.04 | Baseline structural hints from schema conventions |
-
-Each signal is normalized to \[0, 1\], multiplied by its weight, and summed. The final confidence is `0.56 + (weighted_sum × 0.65)`, clamped to \[0, 1\].
-
-5. **Graph resolution** — resolves conflicts when multiple candidates target the same column, detects primary keys (by name pattern and validation), and classifies each relationship into `accepted`, `review`, or `rejected` based on thresholds.
-
-### Threshold configuration
-
-```yaml
-scan:
-  relationships:
-    accept_threshold: 0.85
-    review_threshold: 0.55
-```
-
-Relationships scoring above `accept_threshold` are automatically accepted into the project manifest. Those between `review_threshold` and `accept_threshold` are flagged for analyst review. Below `review_threshold`, they're rejected.
-
-### Test fixture
-
-The benchmark uses the **Orbit-style product warehouse** — a synthetic schema modeled after a real SaaS analytics warehouse with all declared constraints removed. The fixture is a SQLite database with 6 tables:
-
-| Table | Role | Estimated rows |
-|-------|------|---------------|
-| `dim_accounts` | Dimension | 3 |
-| `dim_users` | Dimension | 4 |
-| `dim_workspaces` | Dimension | 4 |
-| `fct_product_events` | Fact | 5 |
-| `fct_invoices` | Fact | 3 |
-| `support_tickets` | Fact | 4 |
-
-**Ground truth:** 6 primary keys (one `id` column per table) and 9 foreign key relationships, all `many_to_one`:
-
-| Source column | Target |
-|--------------|--------|
-| `dim_users.account_id` | `dim_accounts.id` |
-| `dim_workspaces.account_id` | `dim_accounts.id` |
-| `dim_workspaces.user_id` | `dim_users.id` |
-| `fct_product_events.account_id` | `dim_accounts.id` |
-| `fct_product_events.user_id` | `dim_users.id` |
-| `fct_product_events.workspace_id` | `dim_workspaces.id` |
-| `fct_invoices.account_id` | `dim_accounts.id` |
-| `support_tickets.account_id` | `dim_accounts.id` |
-| `support_tickets.user_id` | `dim_users.id` |
-
-The fixture runs in multiple modes to isolate the contribution of each pipeline stage: with LLM disabled, profiling disabled, validation disabled, and embeddings disabled.
-
-## Results
-
-Results for the default configuration will be added after the benchmark run is finalized.
-
-## Reproducing the benchmark
-
-### Prerequisites
-
-- Node.js 22+
-- pnpm
-- The KTX repository cloned and dependencies installed (`pnpm install`)
-
-### Running
-
-From the repository root:
-
-```bash
-pnpm run relationships:verify-orbit
-```
-
-This runs `ktx dev scan` against the bundled SQLite fixture with enrichment disabled, then generates a verification report at:
-
-```text
-examples/orbit-relationship-verification/reports/orbit-verification.md
-```
-
-The report includes the full relationship summary, enrichment details, artifact paths, and any warnings.
-
-### Custom project
-
-To run verification against your own database (e.g., a local Orbit project):
-
-```bash
-KTX_ORBIT_PROJECT_DIR=/path/to/your-project pnpm run relationships:verify-orbit
-```
-
-### Configuration
-
-The benchmark project configuration lives at `examples/orbit-relationship-verification/ktx.yaml`:
-
-```yaml
-scan:
-  enrichment:
-    backend: none
-  relationships:
-    enabled: true
-    llm_proposals: false
-    accept_threshold: 0.85
-    review_threshold: 0.55
-    profile_sample_rows: 10000
-    validation_concurrency: 4
-```
-
-Adjust `accept_threshold` and `review_threshold` to see how threshold changes affect the accepted/review/rejected distribution. Lower thresholds accept more relationships (higher recall, lower precision); higher thresholds are more conservative.
-
-## Broader benchmark suite
-
-Beyond the Orbit fixture, KTX includes a full benchmark corpus at `packages/context/test/fixtures/relationship-benchmarks/` with fixtures across multiple tiers:
-
-- **Unit** — minimal schemas testing individual heuristics
-- **Row-bearing** — small schemas with data for validation testing
-- **Product** — full warehouse schemas like the Orbit fixture
-
-Fixtures from public datasets (Chinook, Sakila, AdventureWorks, Northwind) supplement the synthetic fixtures. The benchmark runner measures precision, recall, and F1 for both primary key and foreign key detection across all fixtures and modes.
--- a/docs-site/content/docs/benchmarks/meta.json
+++ b/docs-site/content/docs/benchmarks/meta.json
@@ -1,5 +0,0 @@
-{
-  "title": "Benchmarks",
-  "defaultOpen": true,
-  "pages": ["link-detection"]
-}
--- a/docs-site/content/docs/meta.json
+++ b/docs-site/content/docs/meta.json
@@ -6,7 +6,6 @@
    "concepts",
    "guides",
    "integrations",
-    "benchmarks",
    "cli-reference",
    "ai-resources",
    "community"