forbidden_apis: add .restore( and document why .append( / .delete( are excluded

Cubic flagged that the guard misses `ds.append() / ds.delete() / ds.restore()`.

`.restore(` is added — Lance-specific (no false positives in the
workspace).

`.append(` and `.delete(` stay excluded with a documenting comment:
* `.append(` over-matches `Vec::append`, `String::append`, every
  `arrow_array::xxxArrayBuilder::append` (30+ legit uses across
  `exec/mutation.rs`, `loader/jsonl.rs`, `exec/projection.rs`).
* `.delete(` over-matches `ObjectStore::delete` (used in `storage.rs`,
  `db/schema_state.rs`, `db/omnigraph.rs:1277` for staging-file
  cleanup) and would require many `// forbidden-api-allow:` sentinels
  for legitimate uses.

The remaining bypass route — engine code that imports `lance::Dataset`
and calls `ds.append(reader, params)` — is bounded by:
1. The trait surface itself (sealed, only-callable-via-trait once
   Phase 1b call-site conversion completes).
2. The PR-review process catching new `lance::Dataset` imports in
   non-storage files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Ragnor Comerford 2026-05-02 19:25:52 +02:00
parent 9b0920b5da
commit bf7d716d1b
No known key found for this signature in database

View file

@ -75,6 +75,19 @@ const FORBIDDEN_PATTERNS: &[&str] = &[
".update_columns(",
".drop_columns(",
".truncate_table(",
// `.restore(` is Lance-specific (no other library in this workspace
// exposes a `.restore(` method); safe to ban without false-positive
// risk. Used to revert a Lance dataset to a prior version — never
// an operation engine code should perform directly.
".restore(",
// NOT included: `.append(`, `.delete(`, `.write(`. Each over-matches
// legitimate non-Lance uses (`Vec::append`, `String::append`, arrow
// array `BuilderArray::append`, `ObjectStore::delete`, etc.).
// Engine code calling `ds.append(reader, params)` against an
// imported `lance::Dataset` is the residual bypass route this guard
// does NOT catch — but the trait surface itself is the primary
// enforcement (sealed + only-callable-via-trait once Phase 1b
// call-site conversion completes), so this gap is bounded.
];
/// Files exempt from the guard. These are the legitimate storage-layer