docs: remove stale benchmarks section

Luca Martial 2026-05-11 23:32:13 -07:00
parent a0193b3fb0
commit 37150c0abc
3 changed files with 0 additions and 169 deletions


@@ -1,163 +0,0 @@
---
title: Link Detection
description: How KTX's relationship detection performs on real-world schemas.
---
KTX infers foreign key relationships between tables even when the database declares no primary keys or foreign key constraints. This is critical for analytics warehouses, where constraints are rarely enforced. This page documents the methodology, scoring pipeline, and a reproducible benchmark you can run yourself.
## Agent usage notes
Use this page when an agent needs to explain, tune, or verify relationship detection.
| Agent task | Relevant section | Command |
|------------|------------------|---------|
| Explain why KTX inferred a join | Detection pipeline | `ktx dev scan relationships <run-id> --status all` |
| Decide whether to accept or reject a candidate | Scoring and threshold configuration | `ktx dev scan relationships <run-id> --accept <candidate-id>` |
| Tune thresholds from reviewed decisions | Broader benchmark suite and calibration | `ktx dev scan relationship-thresholds --connection <connection-id>` |
| Reproduce the bundled benchmark | Reproducing the benchmark | `pnpm run relationships:verify-orbit` |
## What this measures
Most analytics warehouses — Snowflake, BigQuery, Redshift — don't enforce referential integrity constraints. Tables like `fct_product_events` reference `dim_accounts` by convention (`account_id` → `id`), but nothing in the schema says so.
KTX's relationship detection discovers these links automatically. The benchmark measures how accurately it recovers known foreign key relationships from a schema with **all declared constraints removed** — the hardest operating mode.
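As a rough illustration of the `account_id` → `id` convention described above, the sketch below pairs `<entity>_id` columns with the `id` column of a table whose name matches that entity. The function name, the `dim_`/`fct_` prefix stripping, and the naive plural check are assumptions made for illustration; this is not KTX's candidate generator, and it ignores self-references.

```python
def guess_fk_targets(schema: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Pair `<entity>_id` columns with `<table>.id` using naming conventions only."""
    candidates = []
    for table, columns in schema.items():
        for column in columns:
            if not column.endswith("_id"):
                continue
            entity = column[: -len("_id")]  # "account_id" -> "account"
            for target, target_columns in schema.items():
                if target == table or "id" not in target_columns:
                    continue
                # Strip an assumed warehouse prefix such as "dim_" or "fct_".
                if target.startswith(("dim_", "fct_")):
                    base = target.split("_", 1)[1]
                else:
                    base = target
                # Naive singular/plural match: "account" vs "accounts".
                if base in (entity, entity + "s"):
                    candidates.append((f"{table}.{column}", f"{target}.id"))
    return candidates


# Hypothetical schema with no declared constraints.
schema = {
    "dim_accounts": ["id", "name"],
    "dim_users": ["id", "account_id"],
    "fct_product_events": ["id", "account_id", "user_id"],
}
pairs = guess_fk_targets(schema)
```

Name matching alone only proposes candidates; whether a pair is a real relationship still depends on the data.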
Metrics tracked:
- **Accepted** — relationships scored above the accept threshold (default 0.85) and written to the project manifest
- **Review** — relationships scored between the review threshold (0.55) and accept threshold, flagged for human review
- **Rejected** — relationships scored below the review threshold
- **Skipped** — relationships not evaluated (e.g., filtered by candidate limits)
## Methodology
### Detection pipeline
Relationship detection runs as a multi-stage pipeline during `ktx dev scan`:
1. **Candidate generation** — scans the schema for potential FK relationships using multiple heuristics: exact column name matches, normalized table name matching, name inflection (singular/plural), column suffix patterns (`_id`, `_key`, `_code`, `_uuid`), self-references (`parent_id`, `manager_id`), and optionally embedding similarity and LLM proposals.
2. **Column profiling** — samples up to 10,000 rows per column (configurable via `profile_sample_rows`) to collect statistics: row counts, null rates, distinct value counts, uniqueness ratios, sample values, and text length ranges.
3. **Validation** — tests each candidate relationship against actual data by measuring target uniqueness, source coverage, violation ratio, and value overlap between child and parent columns.
4. **Scoring** — combines 7 weighted signals into a confidence score:
| Signal | Weight | What it captures |
|--------|--------|-----------------|
| Name similarity | 0.24 | How closely column/table names match FK conventions |
| Value overlap | 0.22 | What percentage of FK values exist in the PK column |
| Profile uniqueness | 0.22 | How unique the target column values are |
| Type compatibility | 0.10 | Whether data types are compatible (hard gate — score is 0 if incompatible) |
| Embedding similarity | 0.10 | Semantic similarity between column names |
| Profile null rate | 0.08 | Presence of non-null values |
| Structural prior | 0.04 | Baseline structural hints from schema conventions |
Each signal is normalized to \[0, 1\], multiplied by its weight, and summed. The weights add up to 1.00, so the weighted sum also lies in \[0, 1\]. The final confidence is `0.56 + (weighted_sum × 0.65)`, clamped to \[0, 1\].
5. **Graph resolution** — resolves conflicts when multiple candidates target the same column, detects primary keys (by name pattern and validation), and classifies each relationship into `accepted`, `review`, or `rejected` based on thresholds.
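The scoring step can be sketched as follows, using the weights and formula from the table above. The signal key names are hypothetical, and treating the type-compatibility hard gate as zeroing the whole confidence is one reading of the table, not a confirmed detail of the implementation.

```python
# Weights copied from the signal table; the key names are illustrative.
WEIGHTS = {
    "name_similarity": 0.24,
    "value_overlap": 0.22,
    "profile_uniqueness": 0.22,
    "type_compatibility": 0.10,
    "embedding_similarity": 0.10,
    "profile_null_rate": 0.08,
    "structural_prior": 0.04,
}


def clamp(value: float, low: float = 0.0, high: float = 1.0) -> float:
    return min(max(value, low), high)


def confidence(signals: dict[str, float]) -> float:
    """Combine normalized signals into a confidence score in [0, 1]."""
    # Assumed reading of the hard gate: incompatible types give a score of 0.
    if signals.get("type_compatibility", 0.0) == 0.0:
        return 0.0
    weighted_sum = sum(
        weight * clamp(signals.get(name, 0.0)) for name, weight in WEIGHTS.items()
    )
    return clamp(0.56 + weighted_sum * 0.65)


strong = confidence({name: 1.0 for name in WEIGHTS})
weak = confidence({"type_compatibility": 1.0})
gated = confidence({"name_similarity": 1.0, "type_compatibility": 0.0})
```

Under this reading, a candidate with full type compatibility and no other evidence scores 0.56 + 0.10 × 0.65 = 0.625, which is worth keeping in mind when comparing scores against the review threshold.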
### Threshold configuration
```yaml
scan:
  relationships:
    accept_threshold: 0.85
    review_threshold: 0.55
```
Relationships scoring above `accept_threshold` are automatically accepted into the project manifest. Those between `review_threshold` and `accept_threshold` are flagged for analyst review. Below `review_threshold`, they're rejected.
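A minimal sketch of the threshold rule described above. The function name is hypothetical, and treating a score exactly equal to a threshold as meeting it is an assumption, since the text does not say how ties are handled.

```python
def classify(
    confidence: float,
    accept_threshold: float = 0.85,
    review_threshold: float = 0.55,
) -> str:
    """Map a confidence score to accepted / review / rejected."""
    if confidence >= accept_threshold:  # tie handling is an assumption
        return "accepted"
    if confidence >= review_threshold:
        return "review"
    return "rejected"


statuses = [classify(score) for score in (0.91, 0.70, 0.40)]
```

Passing different thresholds, for example `classify(0.70, accept_threshold=0.65)`, shows how a lower accept threshold moves a candidate from review to accepted.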
### Test fixture
The benchmark uses the **Orbit-style product warehouse** — a synthetic schema modeled after a real SaaS analytics warehouse with all declared constraints removed. The fixture is a SQLite database with 6 tables:
| Table | Role | Estimated rows |
|-------|------|---------------|
| `dim_accounts` | Dimension | 3 |
| `dim_users` | Dimension | 4 |
| `dim_workspaces` | Dimension | 4 |
| `fct_product_events` | Fact | 5 |
| `fct_invoices` | Fact | 3 |
| `support_tickets` | Fact | 4 |
**Ground truth:** 6 primary keys (one `id` column per table) and 9 foreign key relationships, all `many_to_one`:
| Source column | Target |
|--------------|--------|
| `dim_users.account_id` | `dim_accounts.id` |
| `dim_workspaces.account_id` | `dim_accounts.id` |
| `dim_workspaces.user_id` | `dim_users.id` |
| `fct_product_events.account_id` | `dim_accounts.id` |
| `fct_product_events.user_id` | `dim_users.id` |
| `fct_product_events.workspace_id` | `dim_workspaces.id` |
| `fct_invoices.account_id` | `dim_accounts.id` |
| `support_tickets.account_id` | `dim_accounts.id` |
| `support_tickets.user_id` | `dim_users.id` |
The fixture runs in multiple modes to isolate the contribution of each pipeline stage: with LLM disabled, profiling disabled, validation disabled, and embeddings disabled.
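To make the validation stage concrete on a SQLite database like this fixture, the sketch below computes three of the metrics named in the pipeline for one candidate. The table contents, the function name, and the exact metric definitions (for example, measuring coverage over non-null source values) are assumptions for illustration, not the fixture's real data or KTX's implementation.

```python
import sqlite3


def validate_candidate(conn, child, child_col, parent, parent_col):
    """Measure how well child.child_col behaves as a foreign key to parent.parent_col."""
    cur = conn.cursor()
    # Target uniqueness: distinct non-null parent values over non-null parent values.
    total, distinct = cur.execute(
        f'SELECT COUNT("{parent_col}"), COUNT(DISTINCT "{parent_col}") FROM "{parent}"'
    ).fetchone()
    # Source coverage: non-null child values that find a match in the parent column.
    non_null, matched = cur.execute(
        f'SELECT COUNT(c."{child_col}"), COUNT(p."{parent_col}") '
        f'FROM "{child}" AS c '
        f'LEFT JOIN (SELECT DISTINCT "{parent_col}" FROM "{parent}") AS p '
        f'ON c."{child_col}" = p."{parent_col}"'
    ).fetchone()
    coverage = matched / non_null if non_null else 0.0
    return {
        "target_uniqueness": distinct / total if total else 0.0,
        "source_coverage": coverage,
        "violation_ratio": 1.0 - coverage,
    }


# Hypothetical rows: one user points at an account that does not exist.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_accounts (id INTEGER)")
conn.execute("CREATE TABLE dim_users (id INTEGER, account_id INTEGER)")
conn.executemany("INSERT INTO dim_accounts VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO dim_users VALUES (?, ?)", [(1, 1), (2, 1), (3, 2), (4, 9)])

result = validate_candidate(conn, "dim_users", "account_id", "dim_accounts", "id")
```

The identifiers are interpolated directly into the SQL here for brevity, so this sketch should only be used with trusted table and column names.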
## Results
Results for the default configuration will be added after the benchmark run is finalized.
## Reproducing the benchmark
### Prerequisites
- Node.js 22+
- pnpm
- The KTX repository cloned and dependencies installed (`pnpm install`)
### Running
From the repository root:
```bash
pnpm run relationships:verify-orbit
```
This runs `ktx dev scan` against the bundled SQLite fixture with enrichment disabled, then generates a verification report at:
```text
examples/orbit-relationship-verification/reports/orbit-verification.md
```
The report includes the full relationship summary, enrichment details, artifact paths, and any warnings.
### Custom project
To run verification against your own database (e.g., a local Orbit project):
```bash
KTX_ORBIT_PROJECT_DIR=/path/to/your-project pnpm run relationships:verify-orbit
```
### Configuration
The benchmark project configuration lives at `examples/orbit-relationship-verification/ktx.yaml`:
```yaml
scan:
  enrichment:
    backend: none
  relationships:
    enabled: true
    llm_proposals: false
    accept_threshold: 0.85
    review_threshold: 0.55
    profile_sample_rows: 10000
    validation_concurrency: 4
```
Adjust `accept_threshold` and `review_threshold` to see how threshold changes affect the accepted/review/rejected distribution. Lower thresholds accept more relationships (higher recall, lower precision); higher thresholds are more conservative.
## Broader benchmark suite
Beyond the Orbit fixture, KTX includes a full benchmark corpus at `packages/context/test/fixtures/relationship-benchmarks/` with fixtures across multiple tiers:
- **Unit** — minimal schemas testing individual heuristics
- **Row-bearing** — small schemas with data for validation testing
- **Product** — full warehouse schemas like the Orbit fixture
Fixtures from public datasets (Chinook, Sakila, AdventureWorks, Northwind) supplement the synthetic fixtures. The benchmark runner measures precision, recall, and F1 for both primary key and foreign key detection across all fixtures and modes.
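For reference, precision, recall, and F1 over a set of detected relationships can be computed as in the sketch below. The `predicted` set is made up for illustration and is not output from the benchmark runner; the function name is also hypothetical.

```python
def precision_recall_f1(predicted: set, truth: set) -> tuple[float, float, float]:
    """Score predicted (source, target) pairs against ground-truth pairs."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Four ground-truth pairs taken from the Orbit fixture table.
truth = {
    ("dim_users.account_id", "dim_accounts.id"),
    ("dim_workspaces.account_id", "dim_accounts.id"),
    ("dim_workspaces.user_id", "dim_users.id"),
    ("fct_invoices.account_id", "dim_accounts.id"),
}
# Hypothetical detector output: two correct pairs and one spurious pair.
predicted = {
    ("dim_users.account_id", "dim_accounts.id"),
    ("dim_workspaces.user_id", "dim_users.id"),
    ("fct_invoices.account_id", "dim_users.id"),
}
precision, recall, f1 = precision_recall_f1(predicted, truth)
```

With these inputs, precision is 2/3, recall is 1/2, and F1 is 4/7.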


@@ -1,5 +0,0 @@
{
  "title": "Benchmarks",
  "defaultOpen": true,
  "pages": ["link-detection"]
}


@@ -6,7 +6,6 @@
"concepts",
"guides",
"integrations",
"benchmarks",
"cli-reference",
"ai-resources",
"community"