schema-lint chassis v1.0: DropProperty Soft + code-tagged diagnostics (MR-694) (#90)

* schema-lint chassis v1 (WIP): tier surfacing + plan doc

First commit of the chassis v1 branch. Lands a small, foundational
slice without behavior change, plus a planning doc that lays out the
remaining 7 commits in sequence so the PR can be reviewed
incrementally.

This commit:

- Adds SchemaMigrationStep::diagnostic() returning the full
  &'static DiagnosticCode (family + tier + severity) for
  UnsupportedChange steps with codes. Renderers can now reach the
  tier without re-implementing the code → tier lookup.

- CLI `omnigraph schema plan` output now displays tier alongside
  code:

    unsupported change on node:Person.age [OG-DS-104, destructive]:
        removing property 'Person.age' is not supported in schema
        migration v1

  Operators see at-a-glance the kind of risk each rejection
  represents — not just the rule identifier.

- No behavior change. All 11 existing schema_apply tests still pass.

Planning doc at docs/schema-lint-v1-plan.md tracks the 7 remaining
commits to bring v1 to feature-complete:

  1. (this commit) Tier surfacing in plan output.
  2. Soft / Hard mode enum on drop steps.
  3. Tombstone fields on catalog IR.
  4. Planner emits DropProperty { Soft } by default.
  5. Apply path implements Soft mode.
  6. Convert PR #62 destructive-rejection tests.
  7. --allow-data-loss flag + Hard mode.
  8. (optional) Tombstone unhide / restore command.

Delete the planning doc when v1 lands. Intentionally checked in to
the WIP branch so the scope is reviewable; not intended as a
permanent doc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* schema-lint v1 commit 2: DropMode + dormant Drop* variants

Second commit of the chassis v1 branch. Lands the type-level shape
of soft/hard drops without wiring them up. Variants are reachable
from emitters but the planner doesn't produce them yet; the apply
path returns an explicit not-yet-implemented error if one shows up
via deserialization.

Added:

- `DropMode { Soft, Hard }` — orthogonal to `SafetyTier`. Tier
  classifies the rule's risk class; mode is the operator's intent
  for data treatment.
    - `Soft` → catalog tombstone, data retained. Tier: safe.
    - `Hard` → Lance-level removal. Tier: destructive; will require
      --allow-data-loss to apply (commit 7).

- `SchemaMigrationStep::DropType { type_kind, name, mode }` and
  `SchemaMigrationStep::DropProperty { type_kind, type_name,
  property_name, mode }` variants.

- Re-export `DropMode` from `omnigraph_compiler::DropMode` so
  downstream crates don't reach into the catalog submodule.

- CLI `render_schema_plan_step` arms for both variants, surfacing
  the mode in plan output: `drop property 'Person.age' of node
  'Person' (soft mode)`.

- `apply_schema_with_lock` exhaustive match arm for the two new
  variants that returns `manifest_internal` with a clear
  not-yet-implemented message. If a SchemaIR JSON containing
  Drop{Type,Property} arrives (e.g. from a future tool or hand-
  written), the apply path fails explicitly rather than silently
  misclassifying.

- Two new in-source tests:
    - `drop_steps_round_trip_through_serde` — pins the wire shape
      for all four (variant × mode) combinations.
    - `drop_mode_serde_uses_snake_case` — pins external-tool-
      friendly serialization (`"soft"` / `"hard"`).

Build: clean, only pre-existing warnings.
Tests:
- omnigraph-compiler schema_plan: 6/6 (4 existing + 2 new).
- omnigraph-engine schema_apply: 11/11 (unchanged — planner still
  emits UnsupportedChange for removal paths).

Next commit (commit 3 per docs/schema-lint-v1-plan.md): add the
`tombstoned: bool` fields to NodeIR / EdgeIR / PropertyIR for the
catalog representation of soft-mode tombstones.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* plan doc: reframe v1 around Lance native drop_columns

After a substrate audit of the Lance data-evolution guide on
2026-05-13, the v1 plan was simplified. Two key findings:

1. Lance's `drop_columns()` is already metadata-only and reversible
   via time travel until cleanup. No need for a parallel
   `tombstoned: bool` field in our catalog IR — Lance's version
   graph IS the tombstone.

2. The full schema_apply substrate migration (add_columns,
   drop_columns, alter_columns vs. stage_overwrite across all step
   types) is consolidated in MR-948 as a sibling issue. v1 only
   uses the relevant slice (drop_columns for OG-DS-1XX).

Net plan changes:

- Commit 3 (original): tombstone fields on catalog IR → dropped.
  No catalog IR change needed. The Lance drop_columns commit IS the
  tombstone.

- Commit 5 (original): apply path writes tombstoned: true → replaced
  with: apply path calls Dataset::drop_columns([name]).

- Commit 7 Hard mode: stage_overwrite removing the column → replaced
  with: drop_columns + compact_files + cleanup_old_versions. Same
  APIs omnigraph cleanup already uses.

- Commit 8 (original): omnigraph schema unhide → dropped. Time
  travel is the undo (omnigraph snapshot --at <commit>).

Net result: 8 commits → 5 commits. ~250 LoC less surface. More
substrate-aligned.

The chassis types from commit 2 (DropMode enum, DropType /
DropProperty variants) are kept exactly as designed; only the
implementation strategy changed.

The Lance docs quote is included in the doc so future readers see
the substrate behavior cited verbatim.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* schema-lint v1 commit 3: emit + apply DropProperty { Soft }

Wire the dormant DropProperty variant end-to-end for the Soft case.
Per docs/schema-lint-v1-plan.md, commit #3 of the schema-lint chassis
v1 series (MR-694).

Planner (schema_plan.rs):
- plan_properties: emit DropProperty { type_kind, type_name,
  property_name, mode: Soft } instead of UnsupportedChange when a
  property exists in accepted but not in desired. Plan is now
  supported = true for drop-only changes.

Apply (schema_apply.rs):
- Route DropProperty { Soft } through rewritten_tables. The existing
  batch_for_schema_apply_rewrite path already iterates the *target*
  schema fields, so a property absent from desired_catalog is
  naturally projected away. The prior Lance version retains the
  dropped column for time-travel reversibility (until cleanup runs).
- DropType still errors (lands in commit #4 with different mechanics:
  __manifest entry removal instead of column projection).
- DropProperty { Hard } still errors (lands in commit #5 with
  --allow-data-loss CLI flag + immediate compact_files +
  cleanup_old_versions).

Tests:
- Planner unit test plan_emits_soft_drop_for_removed_nullable_property
  asserts the variant emission + supported = true + no UnsupportedChange.
- Integration test apply_schema_drops_a_nullable_property_softly_
  preserves_prior_version (replaces the former
  apply_schema_rejects_dropping_a_property_with_data) asserts:
  (a) plan contains DropProperty { Soft }
  (b) apply succeeds + manifest advances + row count unchanged
  (c) current dataset schema lacks the dropped column
  (d) snapshot_at_version(pre_drop) still has the dropped column
  (e) reopen consistency — drop preserved across engine restart

Recovery: rides on SidecarKind::SchemaApply per MR-847. No new
sidecar kind needed; the entire apply path is already sidecar-wrapped.

Substrate alignment: this commit uses the stage_overwrite full-rewrite
path (full_rewrite cost class) rather than Lance native drop_columns
(catalog_only cost class). MR-948 is the follow-up substrate-alignment
refactor that introduces a LanceColumnOp surface and switches the
metadata-only case onto drop_columns. Functional outcome is identical;
cost-class improvement deferred.

Test results:
- cargo test -p omnigraph-compiler --lib: 238 passed
- cargo test -p omnigraph-engine --test schema_apply: 11 passed

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: move schema-lint-v1-plan into docs/dev/ + add to index

Post-rebase fixup for the docs split (#93). The plan doc was added
to docs/ at the top level before main reorganized to docs/{user,dev}/.
This moves it into docs/dev/ and adds an entry to docs/dev/index.md
under a new "Active Implementation Plans" section so the
check-agents-md.sh link check passes.

Per the original commit message (617a77d), the plan doc is intentionally
temporary — it will be deleted when v1 lands.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Andrew Altshuler 2026-05-16 16:30:03 +03:00 committed by GitHub
parent 5c889f8e42
commit e98347eb7b
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
8 changed files with 478 additions and 43 deletions

View file

@ -1035,14 +1035,48 @@ fn render_schema_plan_step(step: &SchemaMigrationStep) -> String {
type_name,
render_annotations(annotations)
),
SchemaMigrationStep::DropType {
type_kind,
name,
mode,
} => format!(
"drop {} type '{}' ({} mode)",
schema_type_kind_label(*type_kind),
name,
drop_mode_label(*mode),
),
SchemaMigrationStep::DropProperty {
type_kind,
type_name,
property_name,
mode,
} => format!(
"drop property '{}.{}' of {} '{}' ({} mode)",
type_name,
property_name,
schema_type_kind_label(*type_kind),
type_name,
drop_mode_label(*mode),
),
SchemaMigrationStep::UnsupportedChange {
entity,
reason,
code,
} => match code {
Some(c) => format!("unsupported change on {} [{}]: {}", entity, c, reason),
None => format!("unsupported change on {}: {}", entity, reason),
},
entity, reason, ..
} => {
// When a schema-lint code is attached, render code + tier
// so operators see at-a-glance the kind of risk (destructive
// / validated / safe) — not just the rule identifier.
// Reach the diagnostic via the `diagnostic()` helper so the
// CLI doesn't need to know how the lookup works.
match step.diagnostic() {
Some(diag) => format!(
"unsupported change on {} [{}, {}]: {}",
entity,
diag.code,
schema_lint_tier_label(diag.tier),
reason,
),
None => format!("unsupported change on {}: {}", entity, reason),
}
}
}
}
@ -1054,6 +1088,21 @@ fn schema_type_kind_label(kind: omnigraph_compiler::SchemaTypeKind) -> &'static
}
}
fn schema_lint_tier_label(tier: omnigraph_compiler::SafetyTier) -> &'static str {
match tier {
omnigraph_compiler::SafetyTier::Safe => "safe",
omnigraph_compiler::SafetyTier::Validated => "validated",
omnigraph_compiler::SafetyTier::Destructive => "destructive",
}
}
fn drop_mode_label(mode: omnigraph_compiler::DropMode) -> &'static str {
match mode {
omnigraph_compiler::DropMode::Soft => "soft",
omnigraph_compiler::DropMode::Hard => "hard",
}
}
fn render_prop_type(prop_type: &omnigraph_compiler::PropType) -> String {
let base = if let Some(values) = &prop_type.enum_values {
format!("Enum({})", values.join("|"))

View file

@ -16,6 +16,29 @@ pub enum SchemaTypeKind {
Edge,
}
/// How a drop step interacts with data.
///
/// - **`Soft`** — catalog tombstone only. The type / property is hidden
/// from queries but the underlying Lance column / dataset is retained
/// on disk. Reversible via `omnigraph schema unhide` (forthcoming).
/// Tier: `safe`.
/// - **`Hard`** — actual data removal. The Lance column is rewritten
/// without the property, or the Lance dataset is dropped. Irreversible
/// short of branch / snapshot restore. Tier: `destructive`; requires
/// `--allow-data-loss` to apply.
///
/// The planner emits `Soft` by default; `--allow-data-loss` on the apply
/// CLI promotes drops to `Hard`. This is the dimension orthogonal to
/// `SafetyTier` from the schema-lint chassis (`crate::lint`): tier
/// describes the rule's class; mode describes the operator's intent for
/// data treatment.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum DropMode {
Soft,
Hard,
}
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct SchemaMigrationPlan {
pub supported: bool,
@ -62,6 +85,28 @@ pub enum SchemaMigrationStep {
property_name: String,
annotations: Vec<Annotation>,
},
/// Remove a node or edge type. Soft mode tombstones in the catalog
/// and retains data on disk; Hard mode drops the Lance dataset and
/// requires `--allow-data-loss`.
///
/// Dormant in this commit — emitted by the planner in a later
/// commit (see `docs/schema-lint-v1-plan.md`).
DropType {
type_kind: SchemaTypeKind,
name: String,
mode: DropMode,
},
/// Remove a property from an existing type. Soft mode tombstones
/// the property in the catalog and retains the Lance column; Hard
/// mode rewrites the column out and requires `--allow-data-loss`.
///
/// Dormant in this commit.
DropProperty {
type_kind: SchemaTypeKind,
type_name: String,
property_name: String,
mode: DropMode,
},
UnsupportedChange {
entity: String,
reason: String,
@ -93,6 +138,24 @@ impl SchemaMigrationStep {
_ => None,
}
}
/// If this step carries a schema-lint code, return the full
/// catalog entry — including family, safety tier, and default
/// severity. Used by renderers that want to display richer
/// context than just the code string (e.g. `omnigraph schema
/// plan` annotating each line with its tier).
///
/// Returns `None` for steps that carry no code (the 12 of 17
/// `UnsupportedChange` paths still untagged in v0, plus every
/// non-`UnsupportedChange` variant).
pub fn diagnostic(&self) -> Option<&'static crate::lint::DiagnosticCode> {
match self {
Self::UnsupportedChange {
code: Some(c), ..
} => crate::lint::lookup(c),
_ => None,
}
}
}
pub fn plan_schema_migration(
@ -499,18 +562,22 @@ fn plan_properties(
.iter()
.filter(|property| !consumed.contains(&property.name))
{
steps.push(SchemaMigrationStep::UnsupportedChange {
entity: format!(
"{}:{}.{}",
schema_type_kind_key(type_kind),
type_name,
leftover.name
),
reason: format!(
"removing property '{}.{}' is not supported in schema migration v1",
type_name, leftover.name
),
code: Some(crate::lint::codes::OG_DS_104.code.to_string()),
// Property removed from the desired schema: emit
// DropProperty { Soft } per docs/schema-lint-v1-plan.md
// commit #3. The Soft mode reuses the existing
// stage_overwrite rewrite path — batch_for_schema_apply_rewrite
// iterates target_schema.fields(), so the dropped column is
// naturally projected away. The prior Lance version retains
// the column until cleanup_old_versions runs, matching the
// OG-DS-104 destructive-tier expectation that data remains
// recoverable via time travel until cleanup. Hard mode (with
// immediate compact_files + cleanup_old_versions) lands in
// commit #5, gated by --allow-data-loss.
steps.push(SchemaMigrationStep::DropProperty {
type_kind,
type_name: type_name.to_string(),
property_name: leftover.name.clone(),
mode: DropMode::Soft,
});
}
@ -863,6 +930,67 @@ node Account @rename_from("User") {
}));
}
#[test]
fn plan_emits_soft_drop_for_removed_nullable_property() {
// Removing a property from the desired schema emits
// DropProperty { Soft } (schema-lint v1 chassis commit #3,
// MR-694). The plan is `supported = true` — the apply path
// handles soft drop via the existing stage_overwrite rewrite
// projection. Verified at the integration level by
// `apply_schema_drops_a_nullable_property_softly_preserves_prior_version`
// in `crates/omnigraph/tests/schema_apply.rs`.
let accepted = build_schema_ir(
&parse_schema(
r#"
node Person {
name: String @key
age: I32?
}
"#,
)
.unwrap(),
)
.unwrap();
let desired = build_schema_ir(
&parse_schema(
r#"
node Person {
name: String @key
}
"#,
)
.unwrap(),
)
.unwrap();
let plan = plan_schema_migration(&accepted, &desired).unwrap();
assert!(
plan.supported,
"drop-property plan must be supported: {plan:?}"
);
assert!(
plan.steps.iter().any(|step| matches!(
step,
SchemaMigrationStep::DropProperty {
type_kind: SchemaTypeKind::Node,
type_name,
property_name,
mode: DropMode::Soft,
..
} if type_name == "Person" && property_name == "age"
)),
"expected DropProperty {{ Soft }} step in plan: {plan:?}",
);
// Negative: no UnsupportedChange anywhere in the plan.
assert!(
!plan
.steps
.iter()
.any(|step| matches!(step, UnsupportedChange { .. })),
"soft drop must not emit UnsupportedChange: {plan:?}",
);
}
#[test]
fn plan_rejects_required_property_addition() {
let accepted = build_schema_ir(
@ -935,4 +1063,56 @@ node Person @description("new") {
}],
}));
}
#[test]
fn drop_steps_round_trip_through_serde() {
// The DropType / DropProperty variants are dormant in this
// commit — the planner doesn't emit them yet — but their
// serde shape needs to be stable from day one. A future
// SchemaIR JSON containing one of these must deserialize
// back to the same value. This test pins the wire format
// so a v0 schema-ir consumer never sees a surprise variant
// shape after v1 ships.
let steps = vec![
SchemaMigrationStep::DropType {
type_kind: SchemaTypeKind::Node,
name: "Person".to_string(),
mode: DropMode::Soft,
},
SchemaMigrationStep::DropType {
type_kind: SchemaTypeKind::Edge,
name: "Knows".to_string(),
mode: DropMode::Hard,
},
SchemaMigrationStep::DropProperty {
type_kind: SchemaTypeKind::Node,
type_name: "Person".to_string(),
property_name: "age".to_string(),
mode: DropMode::Soft,
},
SchemaMigrationStep::DropProperty {
type_kind: SchemaTypeKind::Interface,
type_name: "Named".to_string(),
property_name: "alias".to_string(),
mode: DropMode::Hard,
},
];
for step in steps {
let json = serde_json::to_string(&step).expect("serialize");
let round_trip: SchemaMigrationStep =
serde_json::from_str(&json).expect("deserialize");
assert_eq!(step, round_trip, "round-trip mismatch on {json}");
}
}
#[test]
fn drop_mode_serde_uses_snake_case() {
// External tools may write SchemaIR JSON by hand. Pin the
// wire form so we don't silently break them later.
assert_eq!(serde_json::to_string(&DropMode::Soft).unwrap(), "\"soft\"");
assert_eq!(serde_json::to_string(&DropMode::Hard).unwrap(), "\"hard\"");
let soft: DropMode = serde_json::from_str("\"soft\"").unwrap();
assert_eq!(soft, DropMode::Soft);
}
}

View file

@ -16,7 +16,7 @@ pub use catalog::schema_ir::{
schema_ir_pretty_json,
};
pub use catalog::schema_plan::{
SchemaMigrationPlan, SchemaMigrationStep, SchemaTypeKind, plan_schema_migration,
DropMode, SchemaMigrationPlan, SchemaMigrationStep, SchemaTypeKind, plan_schema_migration,
};
pub use lint::{DiagnosticCode, Family, SafetyTier, Severity};
pub use ir::ParamMap;

View file

@ -18,8 +18,8 @@ use omnigraph_compiler::catalog::{Catalog, EdgeType, NodeType};
use omnigraph_compiler::schema::parser::parse_schema;
use omnigraph_compiler::types::ScalarType;
use omnigraph_compiler::{
SchemaIR, SchemaMigrationPlan, SchemaMigrationStep, SchemaTypeKind, build_catalog_from_ir,
build_schema_ir, plan_schema_migration,
DropMode, SchemaIR, SchemaMigrationPlan, SchemaMigrationStep, SchemaTypeKind,
build_catalog_from_ir, build_schema_ir, plan_schema_migration,
};
use crate::db::graph_coordinator::{GraphCoordinator, PublishedSnapshot};

View file

@ -138,6 +138,41 @@ pub(super) async fn apply_schema_with_lock(
}
SchemaMigrationStep::UpdateTypeMetadata { .. }
| SchemaMigrationStep::UpdatePropertyMetadata { .. } => {}
SchemaMigrationStep::DropProperty {
type_kind,
type_name,
mode,
..
} => {
// Soft = reuse the existing stage_overwrite rewrite
// path. batch_for_schema_apply_rewrite iterates the
// *target* schema fields, so a property absent from
// desired_catalog is naturally projected away. The
// prior Lance version retains the dropped column,
// so reads at the previous snapshot still see it
// (time-travel reversibility). Hard mode (immediate
// compact_files + cleanup_old_versions for actual
// data deletion) lands in commit #5 gated by
// --allow-data-loss.
if !matches!(mode, DropMode::Soft) {
return Err(OmniError::manifest_internal(
"DropProperty { Hard } not yet implemented (commit #5)",
));
}
let table_key = schema_table_key(*type_kind, type_name);
if table_key.starts_with("edge:") {
changed_edge_tables = true;
}
rewritten_tables.insert(table_key);
}
SchemaMigrationStep::DropType { .. } => {
// DropType (whole-table drop via __manifest entry
// removal) lands in commit #4 — different mechanics
// from DropProperty.
return Err(OmniError::manifest_internal(
"DropType not yet implemented (commit #4)",
));
}
step @ SchemaMigrationStep::UnsupportedChange { .. } => {
return Err(OmniError::manifest(
step.unsupported_error_message()

View file

@ -101,17 +101,23 @@ async fn apply_schema_unsupported_plan_does_not_advance_manifest() {
);
}
// ─── Destructive / safety-tier rejections ────────────────────────────────────
// ─── Destructive / safety-tier behavior ──────────────────────────────────────
//
// Schema migration v1 only accepts additive change: add type, add nullable
// property, add index, rename. Every other shape returns an
// `UnsupportedChange` step that surfaces as an error from `apply_schema`,
// without advancing the manifest. These tests pin that contract for the
// destructive shapes (drop type, drop property, narrow type, add required,
// remove constraint) so a regression in the planner can't silently allow them.
// Schema migration v1 accepts:
// - Additive change: add type, add nullable property, add index, rename.
// - DropProperty { Soft } via the schema-lint v1 chassis (commit #3 of MR-694)
// — the dropped column is removed from the current manifest version but
// remains reachable via Lance time travel at the prior version, until
// `omnigraph cleanup` runs. Hard mode (immediate data cleanup) lands in
// commit #5 gated by `--allow-data-loss`.
//
// Every other destructive shape (drop type, narrow type, add required without
// backfill, remove constraint) still returns an `UnsupportedChange` step that
// surfaces as an error from `apply_schema`. These tests pin the current
// contract so a regression in the planner can't silently change behavior.
#[tokio::test]
async fn apply_schema_rejects_dropping_a_property_with_data() {
async fn apply_schema_drops_a_nullable_property_softly_preserves_prior_version() {
let dir = tempfile::tempdir().unwrap();
let mut db = init_and_load(&dir).await;
@ -122,25 +128,96 @@ async fn apply_schema_rejects_dropping_a_property_with_data() {
.unwrap()
.version();
// Drop `age` from Person. v1 doesn't support property removal even when
// the column is nullable — it would silently destroy data.
// Drop `age` from Person. v1 + chassis commit #3 emit
// `DropProperty { Soft }`; the rewrite path projects to the
// target schema (no `age`), commits via stage_overwrite. Row
// counts are unchanged — only the column is dropped from the
// current schema view.
let desired = TEST_SCHEMA.replace(" age: I32?\n", "");
let err = db.apply_schema(&desired).await.unwrap_err();
let msg = err.to_string();
// Confirm the plan emits DropProperty { Soft } (not UnsupportedChange).
let plan = db.plan_schema(&desired).await.unwrap();
assert!(plan.supported, "drop-property plan must be supported");
assert!(
msg.contains("OG-DS-104"),
"expected schema-lint code OG-DS-104 in error, got: {msg}"
plan.steps.iter().any(|step| matches!(
step,
SchemaMigrationStep::DropProperty {
type_kind: SchemaTypeKind::Node,
type_name,
property_name,
mode: omnigraph_compiler::DropMode::Soft,
..
} if type_name == "Person" && property_name == "age"
)),
"expected DropProperty {{ type=Person, property=age, mode=Soft }} in plan; got {plan:?}",
);
// Manifest didn't advance and existing rows are untouched.
assert_eq!(
db.snapshot_of(ReadTarget::branch("main"))
.await
.unwrap()
.version(),
before_version
let result = db.apply_schema(&desired).await.unwrap();
assert!(result.supported);
assert!(result.applied);
// Manifest advanced; row count unchanged.
let after_version = db
.snapshot_of(ReadTarget::branch("main"))
.await
.unwrap()
.version();
assert!(
after_version > before_version,
"manifest version should advance after soft drop; before={before_version}, after={after_version}",
);
assert_eq!(count_rows(&db, "node:Person").await, people_before);
// (a) Current snapshot: `age` is gone from the dataset schema.
let current_snapshot = db.snapshot_of(ReadTarget::branch("main")).await.unwrap();
let current_ds = current_snapshot.open("node:Person").await.unwrap();
let current_fields = current_ds
.schema()
.fields
.iter()
.map(|f| f.name.clone())
.collect::<Vec<_>>();
assert!(
!current_fields.iter().any(|f| f == "age"),
"current Person dataset schema must not include 'age' after soft drop; got fields {current_fields:?}",
);
// (b) Time travel: at the pre-drop manifest version, the prior
// Person dataset version still has `age`. Soft drop is reversible
// via Lance's version graph until `omnigraph cleanup` runs.
let pre_drop_snapshot = db.snapshot_at_version(before_version).await.unwrap();
let pre_drop_ds = pre_drop_snapshot.open("node:Person").await.unwrap();
let pre_drop_fields = pre_drop_ds
.schema()
.fields
.iter()
.map(|f| f.name.clone())
.collect::<Vec<_>>();
assert!(
pre_drop_fields.iter().any(|f| f == "age"),
"pre-drop Person dataset schema must still include 'age' (time-travel reversibility); got fields {pre_drop_fields:?}",
);
// (c) Reopen consistency: close the engine, reopen, verify the
// drop is preserved (column still absent from current schema).
let uri = dir.path().to_str().unwrap().to_string();
drop(db);
let reopened = Omnigraph::open(&uri).await.unwrap();
let reopened_snapshot = reopened
.snapshot_of(ReadTarget::branch("main"))
.await
.unwrap();
let reopened_ds = reopened_snapshot.open("node:Person").await.unwrap();
let reopened_fields = reopened_ds
.schema()
.fields
.iter()
.map(|f| f.name.clone())
.collect::<Vec<_>>();
assert!(
!reopened_fields.iter().any(|f| f == "age"),
"after reopen, Person dataset schema must still lack 'age'; got fields {reopened_fields:?}",
);
}
#[tokio::test]

View file

@ -51,6 +51,14 @@ constraints. User-facing behavior should still be documented through
| Install and deployment packaging | [install.md](../user/install.md), [deployment.md](../user/deployment.md) |
| Release history | [releases/](../releases/) |
## Active Implementation Plans
Working documents for in-flight feature work. Removed when the work lands.
| Area | Read |
|---|---|
| Schema-lint chassis v1 (MR-694) — `--allow-data-loss`, soft/hard drops | [schema-lint-v1-plan.md](schema-lint-v1-plan.md) |
## Boundary
Developer docs may mention implementation details, stale gaps, upstream Lance

View file

@ -0,0 +1,86 @@
# Schema-lint chassis v1 — implementation plan
Work-in-progress checklist for the next slice of the chassis. v0 (the code-tagged diagnostics layer, MR-694 first PR #87) shipped on `main`. v1 brings the chassis to **enforced behavior**: `--allow-data-loss` flag, the `Soft | Hard` mode dimension on drops, and the first real "destructive but supported" migration step.
This document tracks scope so the PR can land in incremental commits without losing the thread. Delete after the work is merged.
## Lance substrate alignment (revised 2026-05-13)
After a substrate audit against the [Lance data-evolution guide](https://lance.org/guide/data_evolution/), the v1 plan was simplified. Lance's `drop_columns()` is **already metadata-only and reversible via time travel until cleanup**:
> `drop_columns` is metadata-only and remains reversible as long as old versions are retained. After `compact_files()` rewrites data files and `cleanup_old_versions()` removes old manifests/files, removed data may become permanently unrecoverable.
This means:
- **Soft mode = `Dataset::drop_columns([name])`**. No separate `tombstoned: bool` catalog field. Lance's version graph IS the tombstone.
- **Hard mode = `drop_columns()` + `compact_files()` + `cleanup_old_versions()`** (the existing `omnigraph cleanup` pipeline).
- **No `omnigraph schema unhide` command needed**. Undo is `omnigraph snapshot --at <commit>` or `omnigraph branch create --from <commit>` — the existing time-travel surface.
Two commits dropped from the v1 plan as a result: the old commit 3 (tombstone fields on catalog IR) and commit 8 (unhide command). Net: ~250 LoC less surface, more substrate-aligned, fewer new concepts.
The broader substrate migration (using Lance native APIs across all migration steps, not just drop) is tracked in **MR-948** — out of scope for this branch but linked from each commit below.
## Done in this branch so far
- [x] **Commit 1**`SchemaMigrationStep::diagnostic()` helper + CLI plan output displays tier alongside the code: `unsupported change on node:Person.age [OG-DS-104, destructive]: ...`. No behavior change. All 11 existing `schema_apply` tests still pass.
- [x] **Commit 2**`DropMode { Soft, Hard }` enum + dormant `DropType` and `DropProperty` variants on `SchemaMigrationStep`. Apply path has an exhaustive-match arm returning `manifest_internal` if either variant arrives via deserialization. Serde round-trip pinned for stable wire shape.
## Next commits (in order)
### Commit 3 — Planner emits `DropProperty { Soft }` + apply calls `Dataset::drop_columns`
Replaces the earlier "tombstone fields on catalog IR" commit. No catalog IR changes needed — Lance handles the tombstone via its version graph.
- [ ] In `plan_properties`'s leftover-property branch: emit `DropProperty { Soft }` instead of `UnsupportedChange` for OG-DS-104.
- [ ] Same for node-type removal (`plan_nodes` leftover → `DropType { Soft }`, OG-DS-102) and edge-type removal (`plan_edges` leftover → `DropType { Soft }`, OG-DS-103).
- [ ] `apply_schema_with_lock` handles `DropProperty { Soft }`: calls `Dataset::drop_columns(&[property_name])` and commits via the staged-write path. **Substrate primitive: Lance metadata-only commit.**
- [ ] `apply_schema_with_lock` handles `DropType { Soft }`: marks the table tombstoned in `__manifest` (data files retained). Reversible via branch / snapshot restore.
- [ ] Recovery sidecar: standard `catalog_only` discipline — the Lance commit IS the recoverable unit.
- [ ] CLI plan output renders the new variants with mode visible.
- [ ] Integration test (extends `tests/schema_apply.rs`): remove a property, assert apply succeeds, row count preserved, current-version schema query no longer surfaces the property, prior-version time-travel query still sees it.
### Commit 4 — Convert PR #62 destructive-rejection tests
- [ ] `tests/schema_apply.rs`'s 6 PR #62 tests currently assert "removing X fails with `OG-DS-XXX`". Convert each to:
- **Without `--allow-data-loss`**: soft drop succeeds (Lance metadata-only, rows preserved in current version, recoverable via time travel).
- **With `--allow-data-loss`**: hard drop succeeds (column data deleted after compact + cleanup) — covered by commit 5.
- [ ] Add a new test that asserts **time-travel reversibility**: drop a column, query at prior version, verify the column is still present in that snapshot.
### Commit 5 — `--allow-data-loss` CLI flag + `Hard` mode
- [ ] Add `--allow-data-loss` boolean flag to `omnigraph schema apply` and `omnigraph schema plan` (plan shows what would happen if applied with the flag).
- [ ] Thread through to `apply_schema_with_lock(.., allow_data_loss: bool)`.
- [ ] Planner: when `--allow-data-loss` is set, emit `Hard` mode instead of `Soft` for drop paths.
- [ ] Apply path for `Hard` mode `DropProperty`: `drop_columns()` + `compact_files()` + `cleanup_old_versions()`. **Substrate primitives:** Lance's existing cleanup pipeline; same APIs `omnigraph cleanup` already uses.
- [ ] Apply path for `Hard` mode `DropType`: remove the manifest entry + drop the Lance dataset.
- [ ] Recovery sidecar discipline: `full_rewrite` (the cleanup phase is the rewrite).
- [ ] Integration tests: hard drop deletes data; without flag, hard drop is impossible (planner only emits soft).
## Open questions
- **`Hard` mode cleanup: inline vs. deferred.** Should `--allow-data-loss` on apply run `compact_files` + `cleanup_old_versions` **inline** (operator gets data deletion immediately) or **defer** to the next `omnigraph cleanup` run (operator's existing pipeline)? Recommend **inline** for ergonomic guarantee — the flag is explicit consent and operators who said "yes data loss" expect data to actually be gone after the apply returns.
- **Query-level enforcement on dropped types**: should a `match { $p: DroppedType }` query fail at parse time, lint time, or runtime? Recommend: lint warning at parse time (new code in the QL family); runtime returns empty result (Lance no longer has the column). Different from before: no "tombstoned" state to surface — the type is genuinely gone in the current version's catalog.
- **`Hard` mode for `DropType` data forensics**: dataset deletion via Lance is irreversible after `cleanup_old_versions`. Operators who want forensics should take a snapshot or tag first. Document this in `docs/schema-language.md` once this lands. No special escape hatch in the apply path.
## Not in v1 scope (deferred)
- Severity config in `omnigraph.yaml` (per-rule `error` / `warn` / `force`).
- `@allow(OG-XXX-NNN, "rationale")` suppression directives.
- Pre-migration checks (MR-941).
- CD / VE / LK / NM family rules (MR-942..MR-945).
- CI integration (MR-946).
- The remaining 12 of 17 `UnsupportedChange` paths still untagged with codes (interface removal, edge endpoint change, edge cardinality change, etc.). Each goes through its own MR-XXX issue.
- **Substrate alignment of non-drop migration steps** (Lance native `add_columns`, `alter_columns` for renames, type casts, defaults). Tracked in **MR-948**. Today's `AddProperty` and `RenameProperty` still go through `stage_overwrite`; v1 doesn't change that.
## What changed from the original plan
For posterity / reviewers comparing against the initial plan committed in commit 1:
| Was | Now | Why |
|---|---|---|
| Commit 3: `tombstoned: bool` on `NodeIR`/`EdgeIR`/`PropertyIR` | **Removed.** No catalog IR change needed. | Lance's `drop_columns()` is metadata-only; Lance's version graph is the tombstone. |
| Commit 5: apply path writes `tombstoned: true` into catalog | Apply path calls `Dataset::drop_columns([name])` | Substrate-aligned. Lance handles the metadata commit. |
| Commit 7 Hard mode: `stage_overwrite` removing the column | `drop_columns` + `compact_files` + `cleanup_old_versions` | Substrate-aligned. Reuses `omnigraph cleanup` pipeline. |
| Commit 8: `omnigraph schema unhide <name>` | **Removed.** | Time travel is the undo. `omnigraph snapshot --at <commit>` reaches the pre-drop version. |
| 8 commits total | 5 commits total | Smaller surface, fewer new concepts. |
The chassis types (`DropMode` enum, `DropType` / `DropProperty` variants — commit 2) are kept exactly as designed; only the implementation strategy changed.