docs(plans): add Phase 2 sub-plans 0002a-0002i + supersession notice

Nine Phase 2 sub-plans operationalising ADR 0002 against the Phase 2 master plan, each sized to fit a focused implementation session and handed to Claude Code as a /goal brief without requiring the agent to load the master plan. Order of execution (each depends on the previous unless noted): - 0002a-skeleton-and-feature-gate.md -- postgres-backend Cargo feature + PgMemoryStore skeleton with todo!() bodies. D1+D2. - 0002b-pool-and-config.md -- PgPool builder, VestigeConfig/ PostgresConfig, vestige.toml loader wired into vestige-mcp. D3+D7 (master plan numbering). - 0002c-migrations.md -- sqlx migrations 0001_init/0002_hnsw including D7 (users/groups/memberships, owner/visibility/shared_with_groups) and D8 (codebase column). SQLite V15 parity migration. D4. - 0002d-store-impl-bodies.md -- real CRUD + registry bodies; trivial fts_search/vector_search bodies. D2+D6. - 0002e-hybrid-search.md -- one-statement RRF query. D5. - 0002f-migrate-cli.md -- vestige migrate copy (SQLite -> Postgres), --dry-run, idempotent re-runs, --allow-source-upgrade for pre-V15 sources. D8+D10. - 0002g-reembed.md -- vestige migrate reembed (offline rebuild). D9 + D10 reembed arm. Ships resolve_embedder helper as a workaround for the missing Embedder::from_name(&str) constructor. - 0002h-testing-and-benches.md -- testcontainers harness, six integration test files, Criterion bench at 1k/100k. D14+D15. - 0002i-runbook.md -- operator-facing deployment + day-2 runbook. D16. Supersession notice added to the master plan (0002-phase-2-postgres- backend.md) pointing at ADR 0002; body retained as archival reference. PR B carries this commit plus the previous two (ADR 0002 + Phase 1 amendment sub-plans); no code change.
2026-06-16 21:05:15 +02:00 · 2026-05-27 09:35:58 +02:00 · 2026-05-27 09:35:58 +02:00 · 9ef8afdb20
commit 9ef8afdb20
parent 697ade5b02
10 changed files with 8998 additions and 0 deletions
--- a/docs/plans/0002-phase-2-postgres-backend.md
+++ b/docs/plans/0002-phase-2-postgres-backend.md
@ -1,5 +1,13 @@
 # Phase 2 Plan: PostgreSQL Backend

+> **Supersession Notice (2026-05-26):** This master plan is now archival. Execution is governed by:
+> - **ADR**: [`docs/adr/0002-phase-2-execution.md`](../adr/0002-phase-2-execution.md) -- binding decisions
+> - **Sub-plans** (executable briefs):
+>   - Phase 1 amendment: [0001a-trait-rewrite.md](0001a-trait-rewrite.md), [0001b-sqlite-split.md](0001b-sqlite-split.md), [0001c-async-trait-sunset.md](0001c-async-trait-sunset.md)
+>   - Phase 2: 0002a..0002i (skeleton, pool+config, migrations, store impl, hybrid search, migrate CLI, reembed, tests+benches, runbook)
+>
+> **Deltas vs body**: trait uses `trait_variant`, error type is `MemoryStoreError`/`MemoryStoreResult`, `connect` is `(url, max_connections)` only, the core table is `knowledge_nodes` (not `memories`) and gains `owner_user_id` + `visibility` + `shared_with_groups` + `codebase`, plus `users`/`groups`/`group_memberships` tables. See ADR 0002 D1-D8.
+
 **Status**: Draft
 **Depends on**: Phase 1 (MemoryStore + Embedder traits, embedding_model registry, domain columns)
 **Related**: docs/adr/0001-pluggable-storage-and-network-access.md (Phase 2), docs/prd/001-getting-centralized-vestige.md, docs/plans/local-dev-postgres-setup.md (local cluster provisioning)
--- a/docs/plans/0002a-skeleton-and-feature-gate.md
+++ b/docs/plans/0002a-skeleton-and-feature-gate.md
@ -0,0 +1,554 @@
+# Phase 2 Sub-Plan 0002a -- Skeleton and Feature Gate
+
+**Status**: Ready
+**Depends on**: Phase 1 amendment (sub-plans `0001a-trait-rewrite.md` and
+`0001b-sqlite-split.md`) merged. Specifically:
+- `MemoryStore` trait declared with `#[trait_variant::make(MemoryStore: Send)]`,
+  generating a non-Send `LocalMemoryStore` companion trait. The
+  `pub use MemoryStore as LocalMemoryStore` alias from Phase 1 is gone.
+- `crates/vestige-core/src/storage/sqlite.rs` has been split into
+  `crates/vestige-core/src/storage/sqlite/` with the same public surface.
+
+This sub-plan covers Phase 2 master-plan deliverables D1 and D2 only:
+the `postgres-backend` Cargo feature gate and a compilable `PgMemoryStore`
+skeleton whose trait method bodies are `todo!()`. No real Postgres code, no
+migrations, no SQL. Later sub-plans (`0002b-pool-and-config.md`,
+`0002c-migrations.md`, `0002d-store-impl-bodies.md`, ...) fill the bodies in.
+
+The success criterion is a clean build under both feature-flag configurations,
+nothing more.
+
+---
+
+## Context
+
+ADR 0002 D4 commits Phase 2 to a `crates/vestige-core/src/storage/postgres/`
+directory from day one. The seven other files in that directory
+(`pool.rs`, `migrations.rs`, `registry.rs`, `search.rs`, `migrate_cli.rs`,
+`reembed.rs`) belong to subsequent sub-plans. This sub-plan creates only
+`crates/vestige-core/src/storage/postgres/mod.rs` so the rest can be added
+incrementally without breaking the build.
+
+Per ADR 0002 D2, `PgMemoryStore::connect` mirrors `SqliteMemoryStore::new`:
+no `Embedder` argument. The pgvector typmod DDL
+(`ALTER TABLE memories ALTER COLUMN embedding TYPE vector($N)`) lives inside
+the trait method `register_model`, invoked by the caller after construction.
+In this sub-plan `register_model` is a `todo!()` body; `0002c-migrations.md`
+and `0002d-store-impl-bodies.md` provide the real implementation.
+
+The trait surface in `crates/vestige-core/src/storage/memory_store.rs` is the
+source of truth for method signatures. Do NOT copy signatures from the master
+plan -- they are stale in places (for example, master plan 0002 D2 lists
+`remove_edge` as three-arg `(source, target, edge_type)`; the live trait has
+two args `(source, target)`).
+
+---
+
+## Cargo manifest changes
+
+Two optional crates and one new feature flag. Use `cargo add` per the global
+CLAUDE.md preference; do not hand-edit `Cargo.toml`.
+
+```bash
+cd crates/vestige-core
+
+cargo add sqlx@0.8 --optional --no-default-features \
+    --features runtime-tokio,tls-rustls,postgres,uuid,chrono,json,migrate,macros
+
+cargo add pgvector@0.4 --optional --features sqlx
+```
+
+After both commands, open `crates/vestige-core/Cargo.toml` and add the
+`postgres-backend` feature line in the `[features]` block. Place it after
+the `metal` feature, before `[dependencies]`:
+
+```toml
+# Postgres backend (mutually compilable with the SQLite backend; default OFF).
+# Compile with: --features postgres-backend
+postgres-backend = ["dep:sqlx", "dep:pgvector"]
+```
+
+Do NOT add `tokio-stream`, `futures`, `indicatif`, or `toml` in this sub-plan.
+The master plan D1 lists them in the `postgres-backend` feature for
+convenience, but their consumers (streaming migrate, progress bar, config
+parsing) land in later sub-plans. Adding them here pulls dead weight into the
+feature gate.
+
+Do NOT add the `vestige-mcp` pass-through feature in this sub-plan either.
+The MCP crate gets its `postgres-backend` feature in `0002b-pool-and-config.md`
+when `MemoryStoreConfig` lands and the binary needs a knob to pick a backend.
+
+Verify the diff to `crates/vestige-core/Cargo.toml` looks like this and only
+this:
+
+```toml
+[features]
+# ...existing features unchanged...
+postgres-backend = ["dep:sqlx", "dep:pgvector"]
+
+[dependencies]
+# ...existing deps unchanged...
+sqlx = { version = "0.8", default-features = false, features = [
+    "runtime-tokio", "tls-rustls", "postgres", "uuid", "chrono",
+    "json", "migrate", "macros",
+], optional = true }
+pgvector = { version = "0.4", features = ["sqlx"], optional = true }
+```
+
+The exact ordering of the two new lines inside `[dependencies]` is not
+significant; `cargo add` places them at the end. Leave the placement that
+`cargo add` produces.
+
+---
+
+## Storage module export
+
+Edit `crates/vestige-core/src/storage/mod.rs` to expose the new module behind
+the feature flag. Two lines change.
+
+Add to the module-declaration block (after `mod sqlite;`):
+
+```rust
+#[cfg(feature = "postgres-backend")]
+mod postgres;
+```
+
+Add to the re-export block (after the `pub use sqlite::{ ... }` block):
+
+```rust
+#[cfg(feature = "postgres-backend")]
+pub use postgres::PgMemoryStore;
+```
+
+Nothing else in `storage/mod.rs` changes. The `Storage` alias still points at
+`SqliteMemoryStore`; the SQLite re-export block is untouched.
+
+---
+
+## Postgres module skeleton
+
+Create `crates/vestige-core/src/storage/postgres/mod.rs` with the full content
+below. This is the only new file in this sub-plan.
+
+```rust
+#![cfg(feature = "postgres-backend")]
+//! Postgres-backed implementation of `MemoryStore`.
+//!
+//! Skeleton only. Every trait method is `todo!()`. Real bodies land in
+//! subsequent Phase 2 sub-plans:
+//! - `0002b-pool-and-config.md`: pool construction and config wiring
+//! - `0002c-migrations.md`:      sqlx migration files and `init`/`register_model`
+//! - `0002d-store-impl-bodies.md`: CRUD, scheduling, edges, domains
+//! - `0002e-hybrid-search.md`:   RRF query and search bodies
+//!
+//! The directory grows companion files (`pool.rs`, `migrations.rs`,
+//! `registry.rs`, `search.rs`, `migrate_cli.rs`, `reembed.rs`) in those
+//! sub-plans; this skeleton only creates `mod.rs`.
+
+use chrono::{DateTime, Utc};
+use sqlx::PgPool;
+use uuid::Uuid;
+
+use crate::storage::memory_store::{
+    Domain, HealthStatus, LocalMemoryStore, MemoryEdge, MemoryRecord, MemoryStoreResult,
+    ModelSignature, SchedulingState, SearchQuery, SearchResult, StoreStats,
+};
+
+/// Postgres-backed implementation of `MemoryStore`.
+///
+/// Cheaply cloneable. Methods take `&self`; interior state lives inside the
+/// `PgPool` (which already provides `Sync` via `Arc` internally).
+#[derive(Clone)]
+pub struct PgMemoryStore {
+    pool: PgPool,
+    /// Embedding vector dimension. Set to 0 in the skeleton; populated by
+    /// `register_model` in `0002d-store-impl-bodies.md` once the pgvector
+    /// `ALTER COLUMN TYPE vector(N)` DDL lands.
+    embedding_dim: i32,
+}
+
+impl PgMemoryStore {
+    /// Construct a new store from a connection URL.
+    ///
+    /// Mirrors `SqliteMemoryStore::new`: no `Embedder` argument. The pgvector
+    /// `ALTER TABLE memories ALTER COLUMN embedding TYPE vector($N)` DDL lives
+    /// inside `register_model`, not here. The caller (cognitive engine
+    /// bootstrap, migrate CLI, tests) invokes `register_model` after this
+    /// returns, identically to the SQLite path.
+    ///
+    /// Real body lands in `0002b-pool-and-config.md` (pool construction) and
+    /// `0002c-migrations.md` (initial migration run).
+    pub async fn connect(_url: &str, _max_connections: u32) -> MemoryStoreResult<Self> {
+        todo!("PgMemoryStore::connect lands in 0002b-pool-and-config.md")
+    }
+
+    /// Low-level constructor for tests: supply an existing pool, skip migrate.
+    ///
+    /// Real body lands in `0002b-pool-and-config.md`.
+    pub async fn from_pool(_pool: PgPool) -> MemoryStoreResult<Self> {
+        todo!("PgMemoryStore::from_pool lands in 0002b-pool-and-config.md")
+    }
+
+    /// Accessor used by migrate/reembed CLI.
+    pub fn pool(&self) -> &PgPool {
+        &self.pool
+    }
+
+    /// Currently-registered vector dimension. Returns 0 until `register_model`
+    /// has been called (real body: `0002d-store-impl-bodies.md`).
+    pub fn embedding_dim(&self) -> i32 {
+        self.embedding_dim
+    }
+}
+
+// trait_variant::make on the trait declaration generates two traits:
+//   - `MemoryStore` (Send-bound)
+//   - `LocalMemoryStore` (non-Send companion)
+// The implementer writes `impl LocalMemoryStore for ...` plainly; the Send
+// variant is generated automatically from the non-Send impl.
+impl LocalMemoryStore for PgMemoryStore {
+    // --- Lifecycle ---
+    async fn init(&self) -> MemoryStoreResult<()> {
+        todo!("PgMemoryStore::init lands in 0002c-migrations.md")
+    }
+
+    async fn health_check(&self) -> MemoryStoreResult<HealthStatus> {
+        todo!("PgMemoryStore::health_check lands in 0002d-store-impl-bodies.md")
+    }
+
+    // --- Embedding model registry ---
+    async fn registered_model(&self) -> MemoryStoreResult<Option<ModelSignature>> {
+        todo!("PgMemoryStore::registered_model lands in 0002d-store-impl-bodies.md")
+    }
+
+    async fn register_model(&self, _sig: &ModelSignature) -> MemoryStoreResult<()> {
+        todo!("PgMemoryStore::register_model lands in 0002d-store-impl-bodies.md")
+    }
+
+    // --- CRUD ---
+    async fn insert(&self, _record: &MemoryRecord) -> MemoryStoreResult<Uuid> {
+        todo!("PgMemoryStore::insert lands in 0002d-store-impl-bodies.md")
+    }
+
+    async fn get(&self, _id: Uuid) -> MemoryStoreResult<Option<MemoryRecord>> {
+        todo!("PgMemoryStore::get lands in 0002d-store-impl-bodies.md")
+    }
+
+    async fn update(&self, _record: &MemoryRecord) -> MemoryStoreResult<()> {
+        todo!("PgMemoryStore::update lands in 0002d-store-impl-bodies.md")
+    }
+
+    async fn delete(&self, _id: Uuid) -> MemoryStoreResult<()> {
+        todo!("PgMemoryStore::delete lands in 0002d-store-impl-bodies.md")
+    }
+
+    // --- Search ---
+    async fn search(&self, _query: &SearchQuery) -> MemoryStoreResult<Vec<SearchResult>> {
+        todo!("PgMemoryStore::search lands in 0002e-hybrid-search.md")
+    }
+
+    async fn fts_search(
+        &self,
+        _text: &str,
+        _limit: usize,
+    ) -> MemoryStoreResult<Vec<SearchResult>> {
+        todo!("PgMemoryStore::fts_search lands in 0002e-hybrid-search.md")
+    }
+
+    async fn vector_search(
+        &self,
+        _embedding: &[f32],
+        _limit: usize,
+    ) -> MemoryStoreResult<Vec<SearchResult>> {
+        todo!("PgMemoryStore::vector_search lands in 0002e-hybrid-search.md")
+    }
+
+    // --- FSRS Scheduling ---
+    async fn get_scheduling(
+        &self,
+        _memory_id: Uuid,
+    ) -> MemoryStoreResult<Option<SchedulingState>> {
+        todo!("PgMemoryStore::get_scheduling lands in 0002d-store-impl-bodies.md")
+    }
+
+    async fn update_scheduling(&self, _state: &SchedulingState) -> MemoryStoreResult<()> {
+        todo!("PgMemoryStore::update_scheduling lands in 0002d-store-impl-bodies.md")
+    }
+
+    async fn get_due_memories(
+        &self,
+        _before: DateTime<Utc>,
+        _limit: usize,
+    ) -> MemoryStoreResult<Vec<(MemoryRecord, SchedulingState)>> {
+        todo!("PgMemoryStore::get_due_memories lands in 0002d-store-impl-bodies.md")
+    }
+
+    // --- Graph (spreading activation) ---
+    async fn add_edge(&self, _edge: &MemoryEdge) -> MemoryStoreResult<()> {
+        todo!("PgMemoryStore::add_edge lands in 0002d-store-impl-bodies.md")
+    }
+
+    async fn get_edges(
+        &self,
+        _node_id: Uuid,
+        _edge_type: Option<&str>,
+    ) -> MemoryStoreResult<Vec<MemoryEdge>> {
+        todo!("PgMemoryStore::get_edges lands in 0002d-store-impl-bodies.md")
+    }
+
+    async fn remove_edge(&self, _source: Uuid, _target: Uuid) -> MemoryStoreResult<()> {
+        todo!("PgMemoryStore::remove_edge lands in 0002d-store-impl-bodies.md")
+    }
+
+    async fn get_neighbors(
+        &self,
+        _node_id: Uuid,
+        _depth: usize,
+    ) -> MemoryStoreResult<Vec<(MemoryRecord, f64)>> {
+        todo!("PgMemoryStore::get_neighbors lands in 0002d-store-impl-bodies.md")
+    }
+
+    // --- Domains (Phase 1: stubs return empty; full impl in Phase 4) ---
+    async fn list_domains(&self) -> MemoryStoreResult<Vec<Domain>> {
+        todo!("PgMemoryStore::list_domains lands in 0002d-store-impl-bodies.md")
+    }
+
+    async fn get_domain(&self, _id: &str) -> MemoryStoreResult<Option<Domain>> {
+        todo!("PgMemoryStore::get_domain lands in 0002d-store-impl-bodies.md")
+    }
+
+    async fn upsert_domain(&self, _domain: &Domain) -> MemoryStoreResult<()> {
+        todo!("PgMemoryStore::upsert_domain lands in 0002d-store-impl-bodies.md")
+    }
+
+    async fn delete_domain(&self, _id: &str) -> MemoryStoreResult<()> {
+        todo!("PgMemoryStore::delete_domain lands in 0002d-store-impl-bodies.md")
+    }
+
+    async fn classify(&self, _embedding: &[f32]) -> MemoryStoreResult<Vec<(String, f64)>> {
+        todo!("PgMemoryStore::classify lands in 0002d-store-impl-bodies.md")
+    }
+
+    // --- Bulk / Maintenance ---
+    async fn count(&self) -> MemoryStoreResult<usize> {
+        todo!("PgMemoryStore::count lands in 0002d-store-impl-bodies.md")
+    }
+
+    async fn get_stats(&self) -> MemoryStoreResult<StoreStats> {
+        todo!("PgMemoryStore::get_stats lands in 0002d-store-impl-bodies.md")
+    }
+
+    async fn vacuum(&self) -> MemoryStoreResult<()> {
+        todo!("PgMemoryStore::vacuum lands in 0002d-store-impl-bodies.md")
+    }
+}
+```
+
+Notes on the skeleton:
+
+- The file-level `#![cfg(feature = "postgres-backend")]` means the whole file
+  vanishes when the feature is off. The `mod postgres;` line in
+  `storage/mod.rs` is itself feature-gated, so this is belt-and-braces; both
+  gates are needed because the file-level attribute is what allows the file to
+  use `sqlx::PgPool` unconditionally inside it.
+- `EmbeddingModelDescriptor` (a separate Postgres-internal type that the
+  master plan sketched on the struct) is dropped. The trait surface already
+  carries `ModelSignature` for the registry round-trip; the registry storage
+  layout is a private concern of `registry.rs`, which is added later. Keep
+  `PgMemoryStore` minimal until a real consumer needs the extra type.
+- The struct only carries `pool` and `embedding_dim`. The model descriptor
+  field from the master plan sketch goes away with `EmbeddingModelDescriptor`.
+  If `register_model` later needs to cache the descriptor on the struct, it
+  can be added then; the skeleton does not speculate.
+- The two trivial accessors (`pool`, `embedding_dim`) get real bodies. Every
+  other method is `todo!()` so it returns `!` and trivially coerces to the
+  declared return type at the type checker; this is what lets the build pass
+  with no error variants and no SQL.
+
+---
+
+## Connect signature
+
+Per ADR 0002 D2:
+
+```rust
+pub async fn connect(url: &str, max_connections: u32) -> MemoryStoreResult<Self>;
+pub async fn from_pool(pool: PgPool) -> MemoryStoreResult<Self>;
+```
+
+No `&dyn Embedder` argument. This deliberately differs from master plan 0002,
+which predates the Phase 1 freeze. The pgvector-specific DDL
+(`ALTER TABLE memories ALTER COLUMN embedding TYPE vector($N)`) does not run
+inside `connect`; it runs inside `register_model(&ModelSignature)`, which the
+caller invokes after `connect` returns.
+
+In this sub-plan `register_model` is `todo!()`. The real body lands in
+`0002d-store-impl-bodies.md` after `0002c-migrations.md` ships the
+`0001_init.up.sql` migration that creates the `memories` table with a
+placeholder `embedding vector` column (no typmod), against which
+`register_model` later runs the typmod stamp.
+
+---
+
+## Error variant additions: deferred
+
+`MemoryStoreError` does NOT gain `Postgres(sqlx::Error)` or
+`Migrate(sqlx::migrate::MigrateError)` in this sub-plan.
+
+The reason is mechanical: `todo!()` evaluates to the never type `!`, which
+coerces to any `MemoryStoreResult<T>` regardless of the error variants
+available. With every method body a `todo!()`, the skeleton has no expression
+that needs to convert a `sqlx::Error` or `sqlx::migrate::MigrateError` into
+`MemoryStoreError`. Adding the variants here would mean adding the
+`#[cfg(feature = "postgres-backend")]` and `#[from]` plumbing to
+`memory_store.rs` with no consumer yet -- dead code at every level except the
+enum definition itself.
+
+`0002d-store-impl-bodies.md` introduces both variants in the same commit that
+turns the first `todo!()` into a real `sqlx::query!` call. That keeps the
+diff to `memory_store.rs` next to the first usage site, which is easier to
+review than adding variants ahead of need.
+
+For reference, the variants that will be added in `0002d-store-impl-bodies.md`
+look like this:
+
+```rust
+#[cfg(feature = "postgres-backend")]
+#[error("postgres error: {0}")]
+Postgres(#[from] sqlx::Error),
+
+#[cfg(feature = "postgres-backend")]
+#[error("postgres migration error: {0}")]
+Migrate(#[from] sqlx::migrate::MigrateError),
+```
+
+Do not pre-add them here.
+
+---
+
+## Verification
+
+Run these commands from the workspace root. All four must produce a clean
+build, zero warnings on the diff-affected files, no test changes.
+
+```bash
+# 1. Default features (SQLite backend, postgres-backend OFF). Must build.
+cargo build --workspace --all-targets
+
+# 2. Workspace clippy with default features. Must be clean.
+cargo clippy --workspace --all-targets -- -D warnings
+
+# 3. Postgres feature enabled. Must build.
+cargo build -p vestige-core --features postgres-backend
+
+# 4. Clippy with postgres feature enabled. Must be clean.
+cargo clippy -p vestige-core --features postgres-backend --all-targets -- -D warnings
+```
+
+Expected outcomes:
+
+- `cargo build --workspace --all-targets` finishes with no compilation of
+  `sqlx` or `pgvector` (both are optional, no consumer with default features).
+  The `postgres` module is excluded entirely via `#[cfg]`.
+- `cargo build -p vestige-core --features postgres-backend` compiles `sqlx`,
+  `pgvector`, and `storage/postgres/mod.rs`. The build succeeds because every
+  trait method is `todo!()`; nothing actually runs SQL.
+- Both `clippy` invocations pass with `-D warnings`. The `todo!()` macro does
+  not emit a `dead_code` lint by itself, and the trivial accessors are used by
+  later sub-plans (clippy on the postgres feature alone may flag them as
+  unused if you run with `--lib` only; the `--all-targets` form keeps tests
+  and benches in scope so this does not fire).
+- If clippy flags `unused_variables` on the underscore-prefixed parameters in
+  the `todo!()` bodies, the underscore prefix is already the standard
+  suppression; if a future clippy version disagrees, add
+  `#[allow(unused_variables)]` to the impl block, not to each method.
+
+Tests are not modified in this sub-plan. The unit tests in
+`memory_store.rs` (`memory_store_error_from_storage_error`,
+`model_signature_serde_round_trip`, `memory_record_serde_round_trip`) keep
+passing because no type they touch changes.
+
+Do NOT run `cargo test` against the postgres feature -- there is no Postgres
+running and no test exercises `PgMemoryStore` yet. The build check is the
+contract.
+
+---
+
+## Acceptance criteria
+
+1. `crates/vestige-core/Cargo.toml` declares `sqlx = "0.8"` and
+   `pgvector = "0.4"` as optional dependencies with the exact feature sets
+   specified above.
+2. `crates/vestige-core/Cargo.toml` declares `postgres-backend = ["dep:sqlx",
+   "dep:pgvector"]` and nothing else inside that feature.
+3. `crates/vestige-mcp/Cargo.toml` is unchanged.
+4. `crates/vestige-core/src/storage/mod.rs` adds exactly two
+   feature-gated lines: `mod postgres;` and `pub use postgres::PgMemoryStore;`.
+   No other change.
+5. `crates/vestige-core/src/storage/postgres/mod.rs` exists and contains the
+   `PgMemoryStore` struct, `impl PgMemoryStore` block with real `pool` and
+   `embedding_dim` accessors and `todo!()` bodies for `connect` and
+   `from_pool`, and the full `impl LocalMemoryStore for PgMemoryStore` block
+   with `todo!()` for every trait method.
+6. The trait impl method signatures match `memory_store.rs` byte-for-byte
+   (including `remove_edge(&self, source: Uuid, target: Uuid)` two-arg form,
+   not the three-arg form from the master plan).
+7. `MemoryStoreError` is unchanged.
+8. No other files in the crate are touched. No new files in
+   `storage/postgres/` besides `mod.rs`.
+9. The four verification commands above all succeed.
+
+---
+
+## Commit sequence
+
+One commit is recommended. The two changes (Cargo manifest + module skeleton)
+do not compile in isolation: the manifest change without the skeleton produces
+unused-optional-dep warnings, and the skeleton without the manifest change
+fails to find `sqlx`. Splitting them adds no review value, since the second
+commit is the one that has to compile cleanly.
+
+```bash
+git add crates/vestige-core/Cargo.toml \
+        crates/vestige-core/Cargo.lock \
+        crates/vestige-core/src/storage/mod.rs \
+        crates/vestige-core/src/storage/postgres/mod.rs
+
+git commit -m "feat(storage): scaffold postgres-backend feature and PgMemoryStore skeleton
+
+Adds the postgres-backend Cargo feature gating sqlx 0.8 and pgvector 0.4.
+Introduces crates/vestige-core/src/storage/postgres/mod.rs with the
+PgMemoryStore struct, connect/from_pool/pool/embedding_dim, and a trait impl
+whose method bodies are todo!() pending later Phase 2 sub-plans.
+
+Builds clean with default features (SQLite only) and with --features
+postgres-backend. No runtime behaviour change.
+
+Refs ADR 0002 D1, D2, D4."
+```
+
+If for any reason the manifest change must be reviewed separately (for
+example, a security review of the sqlx version pin), split as:
+
+1. `cargo add` for sqlx and pgvector + manual feature line in Cargo.toml.
+   Build with default features will pass but `--features postgres-backend`
+   will fail (no module to satisfy the feature). This is acceptable for a
+   short-lived intermediate commit.
+2. `storage/mod.rs` edits + `storage/postgres/mod.rs` creation. Both builds
+   pass.
+
+Default to the single-commit form unless asked to split.
+
+---
+
+## Followups
+
+- `0002b-pool-and-config.md` adds `pool.rs`, `PostgresConfig`, and the
+  `vestige-mcp` `postgres-backend` pass-through feature.
+- `0002c-migrations.md` adds `crates/vestige-core/migrations/postgres/` with
+  `0001_init.{up,down}.sql` and `0002_hnsw.{up,down}.sql`, plus
+  `postgres/migrations.rs` invoking `sqlx::migrate!`. `init()` body lands here.
+- `0002d-store-impl-bodies.md` introduces the two `MemoryStoreError` variants
+  and replaces every `todo!()` in CRUD / scheduling / edges / domains /
+  registry with real `sqlx::query!` bodies.
+- `0002e-hybrid-search.md` fills the three search bodies via the RRF query.
--- a/docs/plans/0002b-pool-and-config.md
+++ b/docs/plans/0002b-pool-and-config.md
@ -0,0 +1,886 @@
+# Sub-plan 0002b -- Pool construction and VestigeConfig
+
+**Status**: Draft
+**Master plan**: [0002-phase-2-postgres-backend.md](0002-phase-2-postgres-backend.md)
+**ADR**: [0002-phase-2-execution.md](../adr/0002-phase-2-execution.md)
+**Predecessor**: [0002a-skeleton-and-feature-gate.md](0002a-skeleton-and-feature-gate.md)
+
+---
+
+## Context
+
+This sub-plan delivers two of the master plan's deliverables now that the
+`0002a` skeleton has landed:
+
+- **D3** -- pool construction in
+  `crates/vestige-core/src/storage/postgres/pool.rs`. Replaces the `todo!()`
+  body of `PgMemoryStore::connect` with a real `PgPool` builder that reads a
+  `PostgresConfig`. Registry/migration calls remain `todo!()`; those are
+  filled in by sub-plans `0002c` (migrations) and `0002d` (store bodies +
+  registry).
+- **D7** -- new module `crates/vestige-core/src/config.rs` containing
+  `VestigeConfig`, `StorageConfig`, `SqliteConfig`, `PostgresConfig`,
+  `EmbeddingsConfig`, plus a `ConfigError` enum and a loader that reads
+  `vestige.toml`. The loader is wired into `vestige-mcp` so the running
+  server picks SQLite or Postgres at startup based on the config file.
+
+After this sub-plan:
+
+- `cargo build` (default features, no `postgres-backend`) compiles and the
+  MCP server still defaults to SQLite when no `vestige.toml` is present.
+- `cargo build --features postgres-backend` compiles, with
+  `PgMemoryStore::connect` now wiring through `pool.rs` (registry/migration
+  still `todo!()` until `0002c` and `0002d`).
+- A `vestige.toml` example can be round-tripped through
+  `VestigeConfig::load` in a unit test.
+
+This sub-plan deliberately does NOT:
+
+- Add migrations (`0002c`).
+- Fill in real CRUD/search bodies on `PgMemoryStore` (`0002d`, `0002e`).
+- Add env-var override support (Phase 3 concern, called out in master plan
+  D7 behaviour notes).
+
+---
+
+## Dependencies
+
+- `0002a-skeleton-and-feature-gate.md` must be merged. That sub-plan creates
+  `crates/vestige-core/src/storage/postgres/mod.rs` with:
+  - `PgMemoryStore` struct holding `pool: PgPool`.
+  - `PgMemoryStore::connect(url: &str, max_connections: u32) -> MemoryStoreResult<Self>`
+    body = `todo!()`.
+  - `PgMemoryStore::from_pool(pool: PgPool) -> MemoryStoreResult<Self>`
+    body = `todo!()`.
+  - The trait impl block with all methods routed to `todo!()`.
+  - The `postgres-backend` feature gate on the module declaration in
+    `storage/mod.rs`.
+
+This sub-plan extends those bodies and adds two siblings: `pool.rs` and
+`registry.rs` (the latter is a stub here, real body in `0002d`).
+
+---
+
+## Audit step (do this first)
+
+Before adding `config.rs`, confirm there is no existing top-level config
+loader. Run from the repo root:
+
+```bash
+rg -nF 'VestigeConfig' crates/
+rg -nF 'toml::from_str' crates/
+rg -n '#\[derive.*Deserialize.*\]' crates/vestige-core/src/
+```
+
+If a `VestigeConfig` struct already exists from Phase 1, treat the "Config
+module" section below as additive: extend the existing struct rather than
+creating a new file. The cross-cut additions in that case are:
+
+1. Add the `StorageConfig` enum (gated and ungated branches).
+2. Add `SqliteConfig`, `PostgresConfig`.
+3. Add the `default_path()` helper if missing.
+4. Add `ConfigError` if a different error enum is used today (rename/extend
+   instead of duplicating).
+
+As of the audit at the time of this writing, no `VestigeConfig` exists in
+`vestige-core`. `directories::ProjectDirs` is already used in
+`vestige-core/src/embeddings/local.rs` and in
+`vestige-mcp/src/protocol/auth.rs`, so the `directories` crate is already a
+workspace dependency -- no new dep there.
+
+---
+
+## Cargo manifest additions
+
+Add `toml` to `vestige-core`. `serde` and `thiserror` are already present
+from Phase 1; `directories` is already a transitive dep but we add it
+explicitly so `default_path()` is supported.
+
+```bash
+cd crates/vestige-core
+cargo add toml@0.8
+cargo add directories@5
+```
+
+No new deps on `vestige-mcp`; it already depends on `vestige-core`.
+
+`sqlx` is already added by `0002a` behind the `postgres-backend` feature
+with `runtime-tokio`, `tls-rustls`, `postgres`, `uuid`, `chrono`,
+`json`, `macros`, `migrate` features. The pool module only uses what is
+already pulled in.
+
+---
+
+## Config module
+
+**File**: `crates/vestige-core/src/config.rs` (new).
+**Re-exported** from `crates/vestige-core/src/lib.rs` as `pub mod config;` plus
+`pub use config::{VestigeConfig, StorageConfig, SqliteConfig, EmbeddingsConfig, ConfigError};`
+and `#[cfg(feature = "postgres-backend")] pub use config::PostgresConfig;`.
+
+Full content:
+
+```rust
+//! Vestige top-level configuration.
+//!
+//! Loaded from `~/.vestige/vestige.toml` by default; the path is overridable
+//! via `VestigeConfig::load(Some(&path))`. Parsing uses serde + toml; the
+//! `[storage]` section is internally-tagged on a `backend` field so a single
+//! enum dispatch picks SQLite or Postgres.
+
+use std::path::{Path, PathBuf};
+
+use serde::Deserialize;
+
+/// Top-level configuration as parsed from `vestige.toml`.
+#[derive(Debug, Clone, Deserialize, Default)]
+#[serde(default, deny_unknown_fields)]
+pub struct VestigeConfig {
+    pub embeddings: EmbeddingsConfig,
+    pub storage: StorageConfig,
+    /// Reserved for Phase 3. Empty in Phase 2.
+    pub server: ServerConfig,
+    /// Reserved for Phase 3. Empty in Phase 2.
+    pub auth: AuthConfig,
+}
+
+/// Embedding provider selection.
+#[derive(Debug, Clone, Deserialize)]
+#[serde(deny_unknown_fields)]
+pub struct EmbeddingsConfig {
+    /// Provider key. Phase 2 ships `"fastembed"` only.
+    pub provider: String,
+    /// Model name. For fastembed this is e.g. `"nomic-ai/nomic-embed-text-v1.5"`.
+    pub model: String,
+}
+
+impl Default for EmbeddingsConfig {
+    fn default() -> Self {
+        Self {
+            provider: "fastembed".to_string(),
+            model: crate::DEFAULT_EMBEDDING_MODEL.to_string(),
+        }
+    }
+}
+
+/// Storage backend selection. Internally tagged on the `backend` field:
+///
+/// ```toml
+/// [storage]
+/// backend = "sqlite"
+///
+/// [storage.sqlite]
+/// path = "/home/user/.vestige/vestige.db"
+/// ```
+///
+/// or, when compiled with `--features postgres-backend`:
+///
+/// ```toml
+/// [storage]
+/// backend = "postgres"
+///
+/// [storage.postgres]
+/// url = "postgres://vestige:secret@localhost:5432/vestige"
+/// max_connections = 10
+/// acquire_timeout_secs = 30
+/// ```
+#[derive(Debug, Clone, Deserialize)]
+#[serde(tag = "backend", rename_all = "lowercase", deny_unknown_fields)]
+pub enum StorageConfig {
+    Sqlite(SqliteConfig),
+    #[cfg(feature = "postgres-backend")]
+    Postgres(PostgresConfig),
+}
+
+impl Default for StorageConfig {
+    fn default() -> Self {
+        StorageConfig::Sqlite(SqliteConfig::default())
+    }
+}
+
+/// SQLite backend configuration.
+#[derive(Debug, Clone, Deserialize)]
+#[serde(deny_unknown_fields)]
+pub struct SqliteConfig {
+    /// Path to the `vestige.db` file. If unset, the SqliteMemoryStore
+    /// constructor picks its platform default location.
+    #[serde(default)]
+    pub path: Option<PathBuf>,
+}
+
+impl Default for SqliteConfig {
+    fn default() -> Self {
+        Self { path: None }
+    }
+}
+
+/// Postgres backend configuration. Only present when the `postgres-backend`
+/// Cargo feature is enabled.
+#[cfg(feature = "postgres-backend")]
+#[derive(Debug, Clone, Deserialize)]
+#[serde(deny_unknown_fields)]
+pub struct PostgresConfig {
+    /// `postgres://user:pass@host:port/db` -- forwarded to
+    /// `PgConnectOptions::from_str`.
+    pub url: String,
+    /// Pool size. Default `10`.
+    #[serde(default)]
+    pub max_connections: Option<u32>,
+    /// Acquire timeout in seconds. Default `30`. Set above 30 so
+    /// testcontainer-based test fixtures do not race.
+    #[serde(default)]
+    pub acquire_timeout_secs: Option<u64>,
+}
+
+/// Reserved for Phase 3 (bind address, ports, TLS).
+#[derive(Debug, Clone, Default, Deserialize)]
+#[serde(default, deny_unknown_fields)]
+pub struct ServerConfig {}
+
+/// Reserved for Phase 3 (API keys, claims).
+#[derive(Debug, Clone, Default, Deserialize)]
+#[serde(default, deny_unknown_fields)]
+pub struct AuthConfig {}
+
+/// Errors raised while locating, reading, or parsing `vestige.toml`.
+#[derive(Debug, thiserror::Error)]
+pub enum ConfigError {
+    #[error("config io: {0}")]
+    Io(#[from] std::io::Error),
+    #[error("config toml: {0}")]
+    Toml(#[from] toml::de::Error),
+    #[error("config dir: could not locate user home")]
+    NoHome,
+    #[error("invalid config: {0}")]
+    Invalid(String),
+}
+
+impl VestigeConfig {
+    /// Load config from `path` or from `default_path()` when `None`.
+    ///
+    /// Returns `VestigeConfig::default()` (SQLite + fastembed defaults) when
+    /// the file does not exist. Any other I/O or parse failure is surfaced
+    /// as a `ConfigError`.
+    pub fn load(path: Option<&Path>) -> Result<Self, ConfigError> {
+        let resolved: PathBuf = match path {
+            Some(p) => p.to_path_buf(),
+            None => Self::default_path()?,
+        };
+
+        match std::fs::read_to_string(&resolved) {
+            Ok(text) => {
+                let cfg: VestigeConfig = toml::from_str(&text)?;
+                cfg.validate()?;
+                Ok(cfg)
+            }
+            Err(e) if e.kind() == std::io::ErrorKind::NotFound => {
+                Ok(Self::default())
+            }
+            Err(e) => Err(ConfigError::Io(e)),
+        }
+    }
+
+    /// `~/.vestige/vestige.toml`. The directory is NOT created here; loading
+    /// a missing file falls back to defaults.
+    pub fn default_path() -> Result<PathBuf, ConfigError> {
+        let dirs = directories::ProjectDirs::from("", "vestige", "vestige")
+            .ok_or(ConfigError::NoHome)?;
+        // ProjectDirs::config_dir() varies per OS. Vestige convention is
+        // ~/.vestige/vestige.toml on Linux/macOS regardless of XDG, so we
+        // build the path off the home dir explicitly.
+        let home = directories::UserDirs::new()
+            .ok_or(ConfigError::NoHome)?
+            .home_dir()
+            .to_path_buf();
+        let _ = dirs; // keep the dep wired; future Phase 3 may use it
+        Ok(home.join(".vestige").join("vestige.toml"))
+    }
+
+    /// Light cross-field validation. Heavy validation (URL parsing,
+    /// directory existence) is left to the backend constructors.
+    fn validate(&self) -> Result<(), ConfigError> {
+        if self.embeddings.provider.is_empty() {
+            return Err(ConfigError::Invalid(
+                "embeddings.provider must not be empty".into(),
+            ));
+        }
+        if self.embeddings.model.is_empty() {
+            return Err(ConfigError::Invalid(
+                "embeddings.model must not be empty".into(),
+            ));
+        }
+        match &self.storage {
+            StorageConfig::Sqlite(_) => {}
+            #[cfg(feature = "postgres-backend")]
+            StorageConfig::Postgres(cfg) => {
+                if cfg.url.is_empty() {
+                    return Err(ConfigError::Invalid(
+                        "storage.postgres.url must not be empty".into(),
+                    ));
+                }
+            }
+        }
+        Ok(())
+    }
+}
+```
+
+### Serde behaviour with `postgres-backend` off
+
+`StorageConfig` is generated by serde only for the variants that are
+compiled in. When `postgres-backend` is off and the user writes:
+
+```toml
+[storage]
+backend = "postgres"
+
+[storage.postgres]
+url = "..."
+```
+
+serde returns a `toml::de::Error` of the form
+`unknown variant `postgres`, expected `sqlite``. That error path goes
+through `From<toml::de::Error> for ConfigError`, surfacing as
+`ConfigError::Toml(..)`. The MCP server prints this once at startup and
+exits with a non-zero code; there is no panic.
+
+To make the error friendlier we wrap that specific case in a clearer
+message via a thin post-parse check. Add this small helper after parsing
+in `load()`:
+
+```rust
+// (Inside the Ok(text) arm in load(), wrapping the parse step.)
+let cfg: VestigeConfig = match toml::from_str(&text) {
+    Ok(c) => c,
+    Err(e) => {
+        let msg = e.to_string();
+        if msg.contains("unknown variant `postgres`") {
+            return Err(ConfigError::Invalid(
+                "storage.backend = \"postgres\" requires building with --features postgres-backend".into(),
+            ));
+        }
+        return Err(ConfigError::Toml(e));
+    }
+};
+```
+
+This keeps the strict default deny_unknown_fields behaviour while giving the
+user a one-line action item.
+
+---
+
+## Pool module
+
+**File**: `crates/vestige-core/src/storage/postgres/pool.rs` (new).
+
+```rust
+#![cfg(feature = "postgres-backend")]
+
+//! `PgPool` construction for the Postgres backend.
+//!
+//! Pool defaults follow ADR 0002 D2 + master plan D3:
+//! - max_connections = 10
+//! - acquire_timeout = 30s (must exceed testcontainer warmup)
+//! - idle_timeout = 600s
+//! - max_lifetime = 1800s
+//! - test_before_acquire = false (cheap queries; saves a roundtrip)
+
+use std::str::FromStr;
+use std::time::Duration;
+
+use sqlx::postgres::{PgConnectOptions, PgPoolOptions};
+use sqlx::{ConnectOptions, PgPool};
+
+use crate::config::PostgresConfig;
+use crate::storage::memory_store::{MemoryStoreError, MemoryStoreResult};
+
+const DEFAULT_MAX_CONNECTIONS: u32 = 10;
+const DEFAULT_ACQUIRE_TIMEOUT_SECS: u64 = 30;
+const IDLE_TIMEOUT_SECS: u64 = 600;
+const MAX_LIFETIME_SECS: u64 = 1800;
+const STATEMENT_CACHE_CAPACITY: usize = 256;
+
+/// Build a Postgres connection pool from a `PostgresConfig`. Does NOT run
+/// migrations or stamp the embedding registry; those are the caller's job
+/// (`PgMemoryStore::connect`).
+pub async fn build_pool(cfg: &PostgresConfig) -> MemoryStoreResult<PgPool> {
+    let opts = PgConnectOptions::from_str(&cfg.url)
+        .map_err(MemoryStoreError::from)?
+        .application_name("vestige")
+        .statement_cache_capacity(STATEMENT_CACHE_CAPACITY)
+        .log_statements(tracing::log::LevelFilter::Debug);
+
+    let max_conn = cfg.max_connections.unwrap_or(DEFAULT_MAX_CONNECTIONS);
+    let acquire = cfg
+        .acquire_timeout_secs
+        .unwrap_or(DEFAULT_ACQUIRE_TIMEOUT_SECS);
+
+    let pool = PgPoolOptions::new()
+        .max_connections(max_conn)
+        .min_connections(0)
+        .acquire_timeout(Duration::from_secs(acquire))
+        .idle_timeout(Some(Duration::from_secs(IDLE_TIMEOUT_SECS)))
+        .max_lifetime(Some(Duration::from_secs(MAX_LIFETIME_SECS)))
+        .test_before_acquire(false)
+        .connect_with(opts)
+        .await
+        .map_err(MemoryStoreError::from)?;
+
+    Ok(pool)
+}
+```
+
+### Wiring into `PgMemoryStore::connect`
+
+In `crates/vestige-core/src/storage/postgres/mod.rs`, replace the
+`todo!()` body left by `0002a` for `connect` and `from_pool` with:
+
+```rust
+// In crates/vestige-core/src/storage/postgres/mod.rs
+
+use sqlx::PgPool;
+
+use crate::config::PostgresConfig;
+use crate::storage::memory_store::{MemoryStoreError, MemoryStoreResult};
+
+mod pool;
+mod registry; // see "Registry stub" section below
+
+pub struct PgMemoryStore {
+    pool: PgPool,
+}
+
+impl PgMemoryStore {
+    /// Convenience constructor matching `SqliteMemoryStore::new` shape.
+    /// Takes a URL + pool size for the common case.
+    pub async fn connect(url: &str, max_connections: u32) -> MemoryStoreResult<Self> {
+        let cfg = PostgresConfig {
+            url: url.to_string(),
+            max_connections: Some(max_connections),
+            acquire_timeout_secs: None,
+        };
+        Self::connect_with(&cfg).await
+    }
+
+    /// Full-config constructor.
+    pub async fn connect_with(cfg: &PostgresConfig) -> MemoryStoreResult<Self> {
+        let pool = pool::build_pool(cfg).await?;
+        Self::from_pool(pool).await
+    }
+
+    /// Construct from an already-built pool (used by tests and the migrate
+    /// CLI to share a pool across operations).
+    pub async fn from_pool(pool: PgPool) -> MemoryStoreResult<Self> {
+        // Migrations are added by 0002c.
+        // todo!("run sqlx::migrate! once 0002c lands")
+        registry::ensure_registry_stub(&pool).await?;
+        Ok(Self { pool })
+    }
+}
+```
+
+`connect_with` is the long-lived API; `connect` becomes a thin shim that
+stays compatible with the master-plan-mandated signature.
+
+### Registry stub
+
+**File**: `crates/vestige-core/src/storage/postgres/registry.rs` (new, stub).
+
+```rust
+#![cfg(feature = "postgres-backend")]
+
+//! Embedding registry. Real body lands in sub-plan 0002d.
+
+use sqlx::PgPool;
+
+use crate::storage::memory_store::MemoryStoreResult;
+
+/// Placeholder. Real implementation in 0002d reads/writes `embedding_model`
+/// and stamps `ALTER TABLE memories ALTER COLUMN embedding TYPE vector($N)`.
+pub(crate) async fn ensure_registry_stub(_pool: &PgPool) -> MemoryStoreResult<()> {
+    // Intentionally a no-op until 0002c lands the table + 0002d lands the
+    // real body. Leaving this as todo!() would crash the MCP server at
+    // startup the moment a user switches `backend = "postgres"`, which is
+    // not what we want for the build verification step in this sub-plan.
+    Ok(())
+}
+```
+
+The no-op keeps `cargo build --features postgres-backend` not just
+compiling but also allowing the MCP server to *boot* against a Postgres
+URL pointing at an already-migrated database (the local-dev-postgres-setup
+docs cover bringing up such a DB by hand). Real init lands in `0002d`.
+
+---
+
+## Error variants
+
+**File**: `crates/vestige-core/src/storage/memory_store.rs` (edit).
+
+The Phase 1 enum `MemoryStoreError` gains two feature-gated variants. These
+were deferred in `0002a` and become required as soon as `pool.rs` calls
+`.map_err(MemoryStoreError::from)` on `sqlx::Error`.
+
+```rust
+// Within enum MemoryStoreError { ... } in memory_store.rs
+
+#[cfg(feature = "postgres-backend")]
+#[error("postgres error: {0}")]
+Postgres(#[from] sqlx::Error),
+
+#[cfg(feature = "postgres-backend")]
+#[error("postgres migration error: {0}")]
+Migrate(#[from] sqlx::migrate::MigrateError),
+```
+
+Both use thiserror's `#[from]` attribute so the `?` operator works in
+`pool.rs`, the migrate module (`0002c`), and registry code (`0002d`).
+Default-features build (no `postgres-backend`) sees neither variant; the
+enum stays exhaustive on stable.
+
+If clippy fires on `non_exhaustive` due to the gated variants, add
+`#[non_exhaustive]` on the enum. That has no caller-side effect since the
+enum is constructed only inside the crate.
+
+---
+
+## vestige-mcp wiring
+
+### Cargo feature passthrough
+
+**File**: `crates/vestige-mcp/Cargo.toml` (edit).
+
+Add a feature that forwards through to `vestige-core`. Default features in
+`vestige-mcp` stay unchanged.
+
+```toml
+[features]
+default = ["embeddings", "vector-search"]
+embeddings = ["vestige-core/embeddings"]
+vector-search = ["vestige-core/vector-search"]
+postgres-backend = ["vestige-core/postgres-backend"]
+```
+
+Verify with:
+
+```bash
+cargo build -p vestige-mcp --features postgres-backend
+```
+
+### Backend dispatch at startup
+
+**File**: `crates/vestige-mcp/src/main.rs` (edit around the existing
+`Storage::new(storage_path)` call -- see audit note above; in the current
+worktree this is around line 285).
+
+The current code is roughly:
+
+```rust
+let storage_path = match prepare_storage_path(config.data_dir) { ... };
+let storage = match Storage::new(storage_path) { ... };
+```
+
+Replace that with a dispatch driven by `VestigeConfig`:
+
+```rust
+use std::sync::Arc;
+
+use vestige_core::config::{StorageConfig, VestigeConfig};
+use vestige_core::storage::SqliteMemoryStore;
+#[cfg(feature = "postgres-backend")]
+use vestige_core::storage::postgres::PgMemoryStore;
+use vestige_core::storage::MemoryStore;
+
+// Earlier: still call prepare_storage_path to honour --data-dir override.
+let storage_path = match prepare_storage_path(config.data_dir.clone()) { ... };
+
+// New: load vestige.toml (or fall back to defaults).
+let vestige_cfg = match VestigeConfig::load(config.config_path.as_deref()) {
+    Ok(c) => c,
+    Err(e) => {
+        eprintln!("vestige: failed to load config: {e}");
+        std::process::exit(2);
+    }
+};
+
+let storage: Arc<dyn MemoryStore> = match &vestige_cfg.storage {
+    StorageConfig::Sqlite(sqlite_cfg) => {
+        // CLI flag --data-dir wins over the config file path.
+        let path = storage_path.clone().or_else(|| sqlite_cfg.path.clone());
+        let s = SqliteMemoryStore::new(path).unwrap_or_else(|e| {
+            eprintln!("vestige: sqlite init failed: {e}");
+            std::process::exit(3);
+        });
+        Arc::new(s)
+    }
+    #[cfg(feature = "postgres-backend")]
+    StorageConfig::Postgres(pg_cfg) => {
+        let s = PgMemoryStore::connect_with(pg_cfg).await.unwrap_or_else(|e| {
+            eprintln!("vestige: postgres init failed: {e}");
+            std::process::exit(3);
+        });
+        Arc::new(s)
+    }
+};
+```
+
+The `config_path: Option<PathBuf>` field on the local `Config` (or
+clap-derived `Args`) struct must be added if not present; it accepts
+`--config <path>`. Default behaviour (no flag) goes through
+`VestigeConfig::default_path()`.
+
+If the existing main wires `Storage` through a concrete type rather than
+`Arc<dyn MemoryStore>`, the dispatch above lives behind a helper:
+
+```rust
+async fn build_store(cfg: &VestigeConfig, cli_path: Option<PathBuf>)
+    -> Result<Arc<dyn MemoryStore>, anyhow::Error>
+{ ... }
+```
+
+and the caller chains `.into()` as needed. Phase 1 already moved
+cognitive modules to `Arc<dyn MemoryStore>` so this should be a pure
+substitution; if a concrete-type holdout is found, fix it locally in this
+sub-plan (separate commit) rather than punting.
+
+---
+
+## vestige.toml example
+
+The canonical example to ship in `docs/` (Phase 2 docs land in `0002i`,
+runbook), shown here for reference and used verbatim by the unit test
+below.
+
+```toml
+# vestige.toml -- top-level configuration
+#
+# Default location: ~/.vestige/vestige.toml
+# Override: vestige-mcp --config /path/to/vestige.toml
+
+[embeddings]
+provider = "fastembed"
+model    = "nomic-ai/nomic-embed-text-v1.5"
+
+# --- SQLite backend (default) ---
+[storage]
+backend = "sqlite"
+
+[storage.sqlite]
+path = "/home/user/.vestige/vestige.db"
+
+# --- Postgres backend (requires --features postgres-backend) ---
+# [storage]
+# backend = "postgres"
+#
+# [storage.postgres]
+# url                  = "postgres://vestige:secret@localhost:5432/vestige"
+# max_connections      = 10
+# acquire_timeout_secs = 30
+
+[server]
+# Reserved for Phase 3 (bind address, ports, TLS).
+
+[auth]
+# Reserved for Phase 3 (API keys, claims).
+```
+
+---
+
+## Verification
+
+Run all of these from the repo root. The first three are the gates that
+must pass before this sub-plan is considered done.
+
+### 1. Default build (no Postgres)
+
+```bash
+cargo build -p vestige-core
+cargo build -p vestige-mcp
+cargo test  -p vestige-core --lib
+```
+
+Expected: clean build. `VestigeConfig::default()` selects SQLite; the MCP
+server boots the same way it did pre-sub-plan.
+
+### 2. Postgres-feature build
+
+```bash
+cargo build -p vestige-core --features postgres-backend
+cargo build -p vestige-mcp  --features postgres-backend
+```
+
+Expected: clean build. `PgMemoryStore::connect_with` resolves to
+`pool::build_pool` + `registry::ensure_registry_stub`; no `todo!()` is
+reachable on the build path. `connect` and `from_pool` are exported.
+
+### 3. Clippy across both feature sets
+
+```bash
+cargo clippy -p vestige-core -- -D warnings
+cargo clippy -p vestige-core --features postgres-backend -- -D warnings
+cargo clippy -p vestige-mcp  --features postgres-backend -- -D warnings
+```
+
+### 4. Unit test: round-trip the example
+
+Add this test to `crates/vestige-core/src/config.rs`:
+
+```rust
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    const EXAMPLE_SQLITE: &str = r#"
+[embeddings]
+provider = "fastembed"
+model    = "nomic-ai/nomic-embed-text-v1.5"
+
+[storage]
+backend = "sqlite"
+
+[storage.sqlite]
+path = "/home/user/.vestige/vestige.db"
+"#;
+
+    #[cfg(feature = "postgres-backend")]
+    const EXAMPLE_POSTGRES: &str = r#"
+[embeddings]
+provider = "fastembed"
+model    = "nomic-ai/nomic-embed-text-v1.5"
+
+[storage]
+backend = "postgres"
+
+[storage.postgres]
+url                  = "postgres://vestige:secret@localhost:5432/vestige"
+max_connections      = 10
+acquire_timeout_secs = 30
+"#;
+
+    #[test]
+    fn parses_sqlite_example() {
+        let cfg: VestigeConfig = toml::from_str(EXAMPLE_SQLITE).expect("parse");
+        match cfg.storage {
+            StorageConfig::Sqlite(s) => assert!(s.path.is_some()),
+            #[cfg(feature = "postgres-backend")]
+            StorageConfig::Postgres(_) => panic!("wrong variant"),
+        }
+        assert_eq!(cfg.embeddings.provider, "fastembed");
+    }
+
+    #[cfg(feature = "postgres-backend")]
+    #[test]
+    fn parses_postgres_example() {
+        let cfg: VestigeConfig = toml::from_str(EXAMPLE_POSTGRES).expect("parse");
+        match cfg.storage {
+            StorageConfig::Postgres(p) => {
+                assert_eq!(p.url, "postgres://vestige:secret@localhost:5432/vestige");
+                assert_eq!(p.max_connections, Some(10));
+                assert_eq!(p.acquire_timeout_secs, Some(30));
+            }
+            StorageConfig::Sqlite(_) => panic!("wrong variant"),
+        }
+    }
+
+    #[cfg(not(feature = "postgres-backend"))]
+    #[test]
+    fn rejects_postgres_when_feature_off() {
+        let toml_text = r#"
+[storage]
+backend = "postgres"
+
+[storage.postgres]
+url = "postgres://x/y"
+"#;
+        let res: Result<VestigeConfig, _> = toml::from_str(toml_text);
+        assert!(res.is_err(), "must fail without postgres-backend feature");
+    }
+
+    #[test]
+    fn defaults_pick_sqlite() {
+        let cfg = VestigeConfig::default();
+        assert!(matches!(cfg.storage, StorageConfig::Sqlite(_)));
+    }
+
+    #[test]
+    fn load_missing_file_returns_default() {
+        let tmp = std::env::temp_dir().join("vestige-no-such-file.toml");
+        let _ = std::fs::remove_file(&tmp);
+        let cfg = VestigeConfig::load(Some(&tmp)).expect("missing file is OK");
+        assert!(matches!(cfg.storage, StorageConfig::Sqlite(_)));
+    }
+
+    #[test]
+    fn load_roundtrip_from_disk() {
+        let tmp = std::env::temp_dir().join("vestige-roundtrip.toml");
+        std::fs::write(&tmp, EXAMPLE_SQLITE).unwrap();
+        let cfg = VestigeConfig::load(Some(&tmp)).expect("load");
+        assert!(matches!(cfg.storage, StorageConfig::Sqlite(_)));
+        let _ = std::fs::remove_file(&tmp);
+    }
+}
+```
+
+Run:
+
+```bash
+cargo test -p vestige-core --lib config::
+cargo test -p vestige-core --lib config:: --features postgres-backend
+```
+
+### 5. Smoke: server boots with default config
+
+```bash
+# default build, no vestige.toml on disk
+cargo run -p vestige-mcp -- --help
+# should print help, no panic
+```
+
+---
+
+## Acceptance criteria
+
+- [ ] `cargo build -p vestige-core` (default features) succeeds.
+- [ ] `cargo build -p vestige-core --features postgres-backend` succeeds.
+- [ ] `cargo build -p vestige-mcp` (default features) succeeds.
+- [ ] `cargo build -p vestige-mcp --features postgres-backend` succeeds.
+- [ ] `cargo clippy` with and without `postgres-backend` is clean on both
+      crates.
+- [ ] `crates/vestige-core/src/config.rs` exists, exposes
+      `VestigeConfig`, `StorageConfig`, `SqliteConfig`, `EmbeddingsConfig`,
+      `ConfigError`, plus `PostgresConfig` when the feature is on.
+- [ ] `VestigeConfig::load(None)` returns `Ok(default)` when
+      `~/.vestige/vestige.toml` is missing.
+- [ ] `VestigeConfig::load(Some(&path))` round-trips both the SQLite and
+      Postgres example blocks above.
+- [ ] With `postgres-backend` off, parsing `backend = "postgres"` returns
+      a clear `ConfigError::Invalid` mentioning the feature flag, NOT a
+      panic.
+- [ ] `crates/vestige-core/src/storage/postgres/pool.rs` exists,
+      implementing `build_pool(&PostgresConfig) -> MemoryStoreResult<PgPool>`
+      with the documented defaults.
+- [ ] `PgMemoryStore::connect`, `connect_with`, and `from_pool` all wire
+      through `pool::build_pool`. None of them is `todo!()`. The registry
+      step is a no-op stub documented as filled in by `0002d`.
+- [ ] `MemoryStoreError::Postgres(sqlx::Error)` and
+      `MemoryStoreError::Migrate(sqlx::migrate::MigrateError)` exist
+      behind `#[cfg(feature = "postgres-backend")]` with `#[from]`.
+- [ ] `vestige-mcp` has a `postgres-backend` feature that forwards to
+      `vestige-core/postgres-backend`.
+- [ ] `vestige-mcp/src/main.rs` selects SQLite vs Postgres at startup
+      based on `VestigeConfig`. SQLite is the default when no config file
+      is present.
+- [ ] Unit tests in the "Verification" section pass on both feature sets.
+
+---
+
+## Out of scope (handled by other sub-plans)
+
+- Migrations (`crates/vestige-core/migrations/postgres/*.sql`) -- `0002c`.
+- Real `PgMemoryStore` CRUD/search/scheduling/edges bodies -- `0002d`,
+  `0002e`.
+- `ensure_registry` real body with `ALTER COLUMN TYPE vector(N)` -- `0002d`.
+- `vestige migrate --from sqlite --to postgres` CLI -- `0002f`.
+- Re-embed flow -- `0002g`.
+- Env-var override (`VESTIGE_POSTGRES_URL`, etc.) -- Phase 3.
+- RLS, multi-tenant column population -- Phase 3.
--- a/docs/plans/0002c-migrations.md
+++ b/docs/plans/0002c-migrations.md
--- a/docs/plans/0002d-store-impl-bodies.md
+++ b/docs/plans/0002d-store-impl-bodies.md
--- a/docs/plans/0002e-hybrid-search.md
+++ b/docs/plans/0002e-hybrid-search.md
@ -0,0 +1,825 @@
+# Phase 2 Sub-Plan 0002e -- Hybrid RRF Search
+
+**Status**: Ready
+**Depends on**:
+- `0002a-skeleton-and-feature-gate.md` (the `postgres-backend` feature flag
+  exists and `PgMemoryStore` compiles with `todo!()` bodies).
+- `0002b-pool-and-config.md` (a working `PgPool` reaches the backend).
+- `0002c-migrations.md` (migration `0001_init` has created the `knowledge_nodes`
+  table with the D7 columns -- `owner_user_id`, `visibility`,
+  `shared_with_groups` -- and the D8 column `codebase`; migration `0002_hnsw`
+  has built the HNSW index).
+- `0002d-store-impl-bodies.md` (real CRUD bodies exist so the integration
+  tests below can seed data through the trait surface rather than raw SQL).
+
+This sub-plan covers master plan 0002 deliverable D5: the hybrid RRF search
+query implementation in `crates/vestige-core/src/storage/postgres/search.rs`,
+plus the `search`, `fts_search`, and `vector_search` method bodies in
+`crates/vestige-core/src/storage/postgres/mod.rs` that delegate into it.
+
+---
+
+## Context
+
+This is one of the more performance-sensitive sub-plans in Phase 2. Every
+search call from the cognitive engine -- the 7-stage retrieval pipeline,
+`session_context`, `predict`, `deep_reference`, the dashboard -- bottoms out
+in `MemoryStore::search`. The Postgres backend has to keep up with the
+existing SQLite hybrid path, which combines BM25 over FTS5 with USearch HNSW
+in two separate round trips and fuses the rankings in Rust.
+
+The shape of the win on Postgres is that both branches and the fusion run
+inside one statement. The planner sees both CTEs together, the round trip is
+single, and the rerank stage runs over a cleanly overfetched candidate set.
+
+Latency targets live in `0002h-testing-and-benches.md`. This sub-plan is
+responsible for producing a correct, schema-stable query that the benches
+can drive against. Do not optimise here; correctness and structure first.
+
+Master plan 0002 D5 (around lines 522-628 of
+`docs/plans/0002-phase-2-postgres-backend.md`) sketches the SQL. That
+sketch is the starting point, not the finished product. The schema after
+the D7 and D8 amendments has more columns than the sketch enumerates, and
+the SQLite `search` method (around line 6503 of
+`crates/vestige-core/src/storage/sqlite.rs` in the Phase 1 worktree)
+documents the semantics this implementation must stay compatible with:
+
+- Empty `query.limit` defaults to 10.
+- `query.text == Some("")` is treated as no text query (degrade to vector).
+- `query.embedding == None` is treated as no vector query (degrade to FTS).
+- Both empty returns `Ok(vec![])`; not an error.
+- The `MemoryRecord` in each `SearchResult` must be populated with all
+  fields the trait promises, including `domains` and `domain_scores` (Phase
+  4 will fill these in; Phase 2 returns the stored values, which may be
+  empty arrays / empty objects).
+
+---
+
+## Constants
+
+```rust
+/// Reciprocal Rank Fusion smoothing constant from Cormack, Clarke and Buettcher
+/// 2009 ("Reciprocal Rank Fusion outperforms Condorcet and individual rank
+/// learning methods"). 60 is the canonical default and is robust across most
+/// fusion regimes. Do not tune this without a paper-citation-grade reason.
+const RRF_K: i32 = 60;
+
+/// Each branch (FTS, vector) is allowed to return OVERFETCH_MULT x final_limit
+/// rows before fusion. Three matches the Phase 1 SQLite overfetch and gives
+/// the fusion enough candidates to recover from any single branch's bad
+/// recall on a given query.
+const OVERFETCH_MULT: i64 = 3;
+```
+
+These live at module scope in
+`crates/vestige-core/src/storage/postgres/search.rs`. They are `pub(crate)`
+only if `0002h-testing-and-benches.md` needs to reference them from the
+integration tests; otherwise private.
+
+---
+
+## Public API
+
+```rust
+#![cfg(feature = "postgres-backend")]
+
+use pgvector::Vector;
+use sqlx::PgPool;
+
+use crate::storage::memory_store::{
+    MemoryStoreResult, SearchQuery, SearchResult,
+};
+
+/// Hybrid RRF search over Postgres FTS and pgvector cosine distance.
+///
+/// Branch behavior:
+/// - empty text + null embedding   -> Ok(vec![])
+/// - empty text + Some(embedding)  -> pure vector search (FTS CTE returns
+///                                    zero rows; fusion equals the vector
+///                                    branch)
+/// - Some(text) + null embedding   -> pure FTS search
+/// - Some(text) + Some(embedding)  -> full RRF fusion
+///
+/// `query.limit == 0` is treated as 10 (matches the SQLite default).
+pub(crate) async fn rrf_search(
+    pool: &PgPool,
+    query: &SearchQuery,
+) -> MemoryStoreResult<Vec<SearchResult>>;
+
+/// FTS-only convenience search. Equivalent to calling `rrf_search` with
+/// `query.embedding = None`, but uses a dedicated single-branch query that
+/// avoids the FULL OUTER JOIN and the params CTE; faster by one planner pass
+/// per call.
+pub(crate) async fn fts_only(
+    pool: &PgPool,
+    text: &str,
+    limit: usize,
+) -> MemoryStoreResult<Vec<SearchResult>>;
+
+/// Vector-only convenience search. Dedicated single-branch query for the same
+/// latency reason as `fts_only`.
+pub(crate) async fn vector_only(
+    pool: &PgPool,
+    embedding: &[f32],
+    limit: usize,
+) -> MemoryStoreResult<Vec<SearchResult>>;
+```
+
+### Parameter handling
+
+In `rrf_search`:
+
+```rust
+let final_limit: i32 = if query.limit == 0 { 10 } else { query.limit as i32 };
+let overfetch: i32 = (final_limit as i64 * OVERFETCH_MULT)
+    .min(i32::MAX as i64) as i32;
+
+let q_text: &str = query.text.as_deref().unwrap_or("");
+let q_vec: Option<Vector> = query.embedding.as_ref()
+    .map(|v| Vector::from(v.clone()));
+
+let dom_filter: Option<&[String]> = query.domains.as_deref();
+let nt_filter:  Option<&[String]> = query.node_types.as_deref();
+let tag_filter: Option<&[String]> = query.tags.as_deref();
+
+let min_retr: Option<f64> = query.min_retrievability;
+```
+
+Both branches empty -- `q_text` is empty and `q_vec` is `None` -- returns
+`Ok(vec![])` without hitting the database. The SQLite backend has the same
+behavior and tests rely on it.
+
+```rust
+if q_text.is_empty() && q_vec.is_none() {
+    return Ok(Vec::new());
+}
+```
+
+### `search` method body in `postgres/mod.rs`
+
+```rust
+#[async_trait::async_trait]   // or trait_variant after the Phase 1 amendment
+impl MemoryStore for PgMemoryStore {
+    async fn search(&self, query: &SearchQuery)
+        -> MemoryStoreResult<Vec<SearchResult>>
+    {
+        crate::storage::postgres::search::rrf_search(&self.pool, query).await
+    }
+
+    async fn fts_search(&self, text: &str, limit: usize)
+        -> MemoryStoreResult<Vec<SearchResult>>
+    {
+        crate::storage::postgres::search::fts_only(&self.pool, text, limit)
+            .await
+    }
+
+    async fn vector_search(&self, embedding: &[f32], limit: usize)
+        -> MemoryStoreResult<Vec<SearchResult>>
+    {
+        crate::storage::postgres::search::vector_only(&self.pool, embedding, limit)
+            .await
+    }
+}
+```
+
+Everything below specifies the inside of those three free functions.
+
+---
+
+## SQL: the hybrid RRF query
+
+The query is built as one `&'static str` (or `OnceCell<String>`; see "Use
+of sqlx::query!" below) and reused. Bound parameters are kept to seven
+through a `params` CTE that the rest of the query references by name --
+this keeps the SQL readable and stops the bound-parameter count growing
+with each filter clause.
+
+Bound parameters:
+
+- `$1`: text query (TEXT, may be empty)
+- `$2`: embedding (pgvector::Vector, may be NULL)
+- `$3`: overfetch limit per branch (INT)
+- `$4`: final limit (INT)
+- `$5`: domain filter (TEXT[] or NULL)
+- `$6`: node_type filter (TEXT[] or NULL)
+- `$7`: tag filter (TEXT[] or NULL)
+
+If `min_retrievability.is_some()` the outer SELECT adds a JOIN on
+`scheduling` and a WHERE clause; that path uses a different prepared
+statement (see "min_retrievability filter" below) so the simple-path query
+stays free of the join.
+
+```sql
+WITH params AS (
+    SELECT
+        $1::text    AS q_text,
+        $2::vector  AS q_vec,
+        $3::int     AS overfetch,
+        $4::int     AS final_limit,
+        $5::text[]  AS dom_filter,
+        $6::text[]  AS nt_filter,
+        $7::text[]  AS tag_filter
+),
+fts AS (
+    SELECT
+        m.id,
+        ts_rank_cd(
+            m.search_vec,
+            websearch_to_tsquery('english', p.q_text)
+        ) AS score,
+        ROW_NUMBER() OVER (
+            ORDER BY ts_rank_cd(
+                m.search_vec,
+                websearch_to_tsquery('english', p.q_text)
+            ) DESC
+        ) AS rank
+    FROM knowledge_nodes m
+    CROSS JOIN params p
+    WHERE p.q_text <> ''
+      AND m.search_vec @@ websearch_to_tsquery('english', p.q_text)
+      AND (p.dom_filter IS NULL OR m.domains   && p.dom_filter)
+      AND (p.nt_filter  IS NULL OR m.node_type =  ANY(p.nt_filter))
+      AND (p.tag_filter IS NULL OR m.tags      && p.tag_filter)
+    ORDER BY score DESC
+    LIMIT (SELECT overfetch FROM params)
+),
+vec AS (
+    SELECT
+        m.id,
+        1.0 - (m.embedding <=> p.q_vec) AS score,
+        ROW_NUMBER() OVER (
+            ORDER BY m.embedding <=> p.q_vec
+        ) AS rank
+    FROM knowledge_nodes m
+    CROSS JOIN params p
+    WHERE m.embedding IS NOT NULL
+      AND p.q_vec IS NOT NULL
+      AND (p.dom_filter IS NULL OR m.domains   && p.dom_filter)
+      AND (p.nt_filter  IS NULL OR m.node_type =  ANY(p.nt_filter))
+      AND (p.tag_filter IS NULL OR m.tags      && p.tag_filter)
+    ORDER BY m.embedding <=> p.q_vec
+    LIMIT (SELECT overfetch FROM params)
+),
+fused AS (
+    SELECT
+        COALESCE(f.id, v.id) AS id,
+          COALESCE(1.0 / (60 + f.rank), 0.0)
+        + COALESCE(1.0 / (60 + v.rank), 0.0) AS rrf_score,
+        f.score AS fts_score,
+        v.score AS vector_score
+    FROM fts f
+    FULL OUTER JOIN vec v ON f.id = v.id
+)
+SELECT
+    m.id                  AS "id!: uuid::Uuid",
+    m.owner_user_id       AS "owner_user_id!: uuid::Uuid",
+    m.visibility          AS "visibility!: String",
+    m.shared_with_groups  AS "shared_with_groups!: Vec<uuid::Uuid>",
+    m.codebase            AS "codebase: String",
+    m.domains             AS "domains!: Vec<String>",
+    m.domain_scores       AS "domain_scores!: serde_json::Value",
+    m.content             AS "content!: String",
+    m.node_type           AS "node_type!: String",
+    m.tags                AS "tags!: Vec<String>",
+    m.embedding           AS "embedding: pgvector::Vector",
+    m.metadata            AS "metadata!: serde_json::Value",
+    m.created_at          AS "created_at!: chrono::DateTime<chrono::Utc>",
+    m.updated_at          AS "updated_at!: chrono::DateTime<chrono::Utc>",
+    fused.rrf_score       AS "rrf_score!: f64",
+    fused.fts_score       AS "fts_score: f64",
+    fused.vector_score    AS "vector_score: f64"
+FROM fused
+JOIN knowledge_nodes m ON m.id = fused.id
+ORDER BY fused.rrf_score DESC
+LIMIT (SELECT final_limit FROM params);
+```
+
+Notes on the SELECT column list. The D7 columns (`owner_user_id`,
+`visibility`, `shared_with_groups`) and the D8 column (`codebase`) are
+selected even though Phase 2 does not filter on them yet, so:
+
+1. The `MemoryRecord` returned to the trait can be populated with the
+   stored values from day one. Phase 3 will start writing real
+   `owner_user_id` / `visibility` values; Phase 2 always writes the
+   single-user defaults (`'00000000-...-0001'`, `'private'`, `'{}'`). The
+   `MemoryRecord` returned in Phase 2 simply carries those defaults.
+2. The schema-drift integration tests (see "Verification") catch the case
+   where someone adds a NOT NULL column to `knowledge_nodes` without updating
+   this query.
+
+Notes on the body:
+
+- `CROSS JOIN params p` is used instead of the master-plan sketch's
+  `FROM knowledge_nodes m, params p`. Same plan, clearer intent.
+- The `ORDER BY ... LIMIT` inside each branch CTE is there so the planner
+  can stop early once it has `overfetch` rows; without it the LIMIT is
+  applied after a full sort over all matches.
+- `1.0 - (m.embedding <=> p.q_vec)` converts pgvector's cosine *distance*
+  to cosine *similarity* in [0, 1] for the `vector_score` output. RRF
+  itself does not need the similarity -- it uses ranks -- but the trait
+  surface exposes `vector_score: Option<f64>` for caller introspection.
+- `RRF_K = 60` is inlined as `60` in the SQL string. A `const` formatter
+  feels tidier but `60` is a literature constant; spell it out and leave a
+  comment in the Rust source: `// 60 == RRF_K (Cormack 2009)`.
+- `FULL OUTER JOIN` is required: a row that the FTS branch finds and the
+  vector branch does not must still appear in `fused`, and vice versa.
+- `COALESCE(..., 0.0)` on each `1.0 / (60 + rank)` term handles the
+  no-match-from-this-branch case. The fusion score for a row that only the
+  FTS branch ranks is `1/(60 + f.rank)` exactly.
+- `m.search_vec` is the generated `tsvector` column created in migration
+  `0001_init` (see D4 of the master plan).
+
+---
+
+## Result row mapping
+
+`sqlx::query_as::<_, SearchRow>` reads each row into a private struct that
+owns the column types exactly as they come back from Postgres. The struct
+is converted into a `SearchResult` after fetch.
+
+```rust
+#[derive(sqlx::FromRow)]
+struct SearchRow {
+    id: uuid::Uuid,
+    owner_user_id: uuid::Uuid,
+    visibility: String,
+    shared_with_groups: Vec<uuid::Uuid>,
+    codebase: Option<String>,
+    domains: Vec<String>,
+    domain_scores: serde_json::Value,
+    content: String,
+    node_type: String,
+    tags: Vec<String>,
+    embedding: Option<pgvector::Vector>,
+    metadata: serde_json::Value,
+    created_at: chrono::DateTime<chrono::Utc>,
+    updated_at: chrono::DateTime<chrono::Utc>,
+    rrf_score: f64,
+    fts_score: Option<f64>,
+    vector_score: Option<f64>,
+}
+
+impl SearchRow {
+    fn into_result(self) -> SearchResult {
+        use crate::storage::memory_store::MemoryRecord;
+        use std::collections::HashMap;
+
+        // domain_scores is JSONB; the column always exists, but may be the
+        // empty object {} when Phase 4 has not classified this memory yet.
+        let domain_scores: HashMap<String, f64> =
+            serde_json::from_value(self.domain_scores).unwrap_or_default();
+
+        let record = MemoryRecord {
+            id: self.id,
+            domains: self.domains,
+            domain_scores,
+            content: self.content,
+            node_type: self.node_type,
+            tags: self.tags,
+            // pgvector::Vector -> Vec<f32>
+            embedding: self.embedding.map(|v| v.to_vec()),
+            created_at: self.created_at,
+            updated_at: self.updated_at,
+            metadata: self.metadata,
+            // owner_user_id / visibility / shared_with_groups / codebase
+            // do not appear on MemoryRecord yet. Phase 3 will decide whether
+            // to extend MemoryRecord or surface them via a side channel.
+            // For Phase 2 they are read but discarded here.
+        };
+
+        SearchResult {
+            record,
+            score: self.rrf_score,
+            fts_score: self.fts_score,
+            vector_score: self.vector_score,
+        }
+    }
+}
+```
+
+Type mapping summary:
+
+| SQL type          | Rust type                            | Notes                                          |
+|-------------------|--------------------------------------|------------------------------------------------|
+| UUID              | `uuid::Uuid`                         | requires sqlx `uuid` feature                   |
+| TEXT              | `String`                             |                                                |
+| TEXT NULL         | `Option<String>`                     | used for `codebase`                            |
+| TEXT[]            | `Vec<String>`                        | for `tags`, `domains`                          |
+| UUID[]            | `Vec<uuid::Uuid>`                    | for `shared_with_groups`                       |
+| JSONB             | `serde_json::Value`                  | for `metadata`, `domain_scores`                |
+| TIMESTAMPTZ       | `chrono::DateTime<chrono::Utc>`      | requires sqlx `chrono` feature                 |
+| VECTOR(N) NULL    | `Option<pgvector::Vector>`           | `.map(|v| v.to_vec())` to `Option<Vec<f32>>`   |
+| FLOAT8            | `f64`                                |                                                |
+| FLOAT8 NULL       | `Option<f64>`                        | for `fts_score`, `vector_score`                |
+
+If `MemoryRecord` is extended in Phase 3 to carry `owner_user_id`,
+`visibility`, `shared_with_groups`, and `codebase`, the conversion above
+gets four more fields. Phase 2 reads them so the integration tests can
+assert on them via SQL, but the trait surface does not expose them yet.
+
+---
+
+## `fts_only` and `vector_only` -- dedicated single-branch queries
+
+The master plan offers two options for the convenience methods: reuse
+`rrf_search` with one branch nulled, or write dedicated queries. The
+dedicated queries win:
+
+- One CTE instead of three. Planner picks the obvious plan.
+- No FULL OUTER JOIN.
+- No `params` indirection -- bound parameters used directly.
+- The output `score` is the branch's native score (BM25-ish `ts_rank_cd` /
+  cosine similarity), not an RRF fusion score over one branch. Callers of
+  `fts_search` and `vector_search` get an intuitive score back.
+
+### `fts_only`
+
+Bound parameters:
+
+- `$1`: text query (TEXT, must be non-empty; the caller guards `text.is_empty()`)
+- `$2`: limit (INT)
+
+```sql
+SELECT
+    m.id                  AS "id!: uuid::Uuid",
+    m.owner_user_id       AS "owner_user_id!: uuid::Uuid",
+    m.visibility          AS "visibility!: String",
+    m.shared_with_groups  AS "shared_with_groups!: Vec<uuid::Uuid>",
+    m.codebase            AS "codebase: String",
+    m.domains             AS "domains!: Vec<String>",
+    m.domain_scores       AS "domain_scores!: serde_json::Value",
+    m.content             AS "content!: String",
+    m.node_type           AS "node_type!: String",
+    m.tags                AS "tags!: Vec<String>",
+    m.embedding           AS "embedding: pgvector::Vector",
+    m.metadata            AS "metadata!: serde_json::Value",
+    m.created_at          AS "created_at!: chrono::DateTime<chrono::Utc>",
+    m.updated_at          AS "updated_at!: chrono::DateTime<chrono::Utc>",
+    ts_rank_cd(m.search_vec, websearch_to_tsquery('english', $1))
+                          AS "fts_score!: f64"
+FROM knowledge_nodes m
+WHERE m.search_vec @@ websearch_to_tsquery('english', $1)
+ORDER BY ts_rank_cd(m.search_vec, websearch_to_tsquery('english', $1)) DESC
+LIMIT $2;
+```
+
+The Rust caller maps each row to a `SearchResult` with:
+
+```rust
+SearchResult {
+    record,
+    score: fts_score,
+    fts_score: Some(fts_score),
+    vector_score: None,
+}
+```
+
+If `text.is_empty()` the caller returns `Ok(Vec::new())` before hitting
+the database. `websearch_to_tsquery('english', '')` returns an empty
+tsquery that matches nothing; the round-trip is wasted work otherwise.
+
+### `vector_only`
+
+Bound parameters:
+
+- `$1`: embedding (pgvector::Vector)
+- `$2`: limit (INT)
+
+```sql
+SELECT
+    m.id                  AS "id!: uuid::Uuid",
+    m.owner_user_id       AS "owner_user_id!: uuid::Uuid",
+    m.visibility          AS "visibility!: String",
+    m.shared_with_groups  AS "shared_with_groups!: Vec<uuid::Uuid>",
+    m.codebase            AS "codebase: String",
+    m.domains             AS "domains!: Vec<String>",
+    m.domain_scores       AS "domain_scores!: serde_json::Value",
+    m.content             AS "content!: String",
+    m.node_type           AS "node_type!: String",
+    m.tags                AS "tags!: Vec<String>",
+    m.embedding           AS "embedding: pgvector::Vector",
+    m.metadata            AS "metadata!: serde_json::Value",
+    m.created_at          AS "created_at!: chrono::DateTime<chrono::Utc>",
+    m.updated_at          AS "updated_at!: chrono::DateTime<chrono::Utc>",
+    1.0 - (m.embedding <=> $1) AS "vector_score!: f64"
+FROM knowledge_nodes m
+WHERE m.embedding IS NOT NULL
+ORDER BY m.embedding <=> $1
+LIMIT $2;
+```
+
+The Rust caller maps each row to:
+
+```rust
+SearchResult {
+    record,
+    score: vector_score,
+    fts_score: None,
+    vector_score: Some(vector_score),
+}
+```
+
+Both convenience methods ignore `SearchQuery.domains` / `tags` /
+`node_types` / `min_retrievability` -- they take `&str` and `&[f32]`
+respectively, not a `SearchQuery`. Callers that want filters on a
+single-branch search should call `search` with the other branch input
+left at its degrade-to-zero default.
+
+---
+
+## `min_retrievability` filter
+
+`SearchQuery::min_retrievability: Option<f64>` is applied as a final
+filter after fusion by joining on the `scheduling` table:
+
+```sql
+-- with-min-retrievability variant: identical CTEs to the base query, only
+-- the final SELECT changes.
+SELECT
+    ... (same column list as the base query) ...
+FROM fused
+JOIN knowledge_nodes m ON m.id = fused.id
+JOIN scheduling s ON s.memory_id = m.id
+WHERE s.retrievability >= $8
+ORDER BY fused.rrf_score DESC
+LIMIT (SELECT final_limit FROM params);
+```
+
+This is a separate prepared statement -- the eight-parameter variant --
+held alongside the seven-parameter base. The Rust dispatch:
+
+```rust
+if let Some(min_r) = query.min_retrievability {
+    sqlx::query_as::<_, SearchRow>(QUERY_WITH_MIN_R)
+        .bind(q_text)
+        .bind(q_vec)
+        .bind(overfetch)
+        .bind(final_limit)
+        .bind(dom_filter)
+        .bind(nt_filter)
+        .bind(tag_filter)
+        .bind(min_r)
+        .fetch_all(pool).await?
+} else {
+    sqlx::query_as::<_, SearchRow>(QUERY_BASE)
+        .bind(q_text)
+        .bind(q_vec)
+        .bind(overfetch)
+        .bind(final_limit)
+        .bind(dom_filter)
+        .bind(nt_filter)
+        .bind(tag_filter)
+        .fetch_all(pool).await?
+}
+```
+
+Why not unconditionally join: the `scheduling` join is expensive enough on
+a large `knowledge_nodes` table that adding it to every search call regresses the
+common path. `min_retrievability` is set by the cognitive engine's
+accessibility filter and is `None` in most direct callers.
+
+The same two-variant pattern repeats for `fts_only` and `vector_only`; in
+practice callers of those methods rarely set `min_retrievability` (it is
+not part of their argument list), so only the base variant is needed
+unless the trait surface grows.
+
+---
+
+## Domain / tag / node_type filters
+
+Each filter is expressed as a NULL-conditional clause inside both branch
+CTEs, written using PostgreSQL array operators:
+
+```sql
+AND (p.dom_filter IS NULL OR m.domains   && p.dom_filter)
+AND (p.nt_filter  IS NULL OR m.node_type =  ANY(p.nt_filter))
+AND (p.tag_filter IS NULL OR m.tags      && p.tag_filter)
+```
+
+- `&&` is the PostgreSQL "arrays overlap" operator. Matches if any
+  element in `m.domains` is in the filter array. Index-friendly with a
+  GIN index on `m.domains` (created in `0001_init`).
+- `= ANY(...)` matches `m.node_type` (a scalar) against any element of
+  the filter array. Index-friendly with a B-tree on `m.node_type`.
+- `&&` is used again on `m.tags` (a `TEXT[]`).
+
+The NULL-conditional form is critical: when the filter parameter is
+`NULL`, the clause short-circuits to `TRUE` and contributes nothing to
+the WHERE. This keeps a single query reusable across "no filter" and
+"filter set" cases without rewriting SQL.
+
+When the Rust caller passes `None` for a filter, sqlx binds it as `NULL`
+of the column type (`text[]`). The cast `$5::text[]` in the `params` CTE
+is what tells sqlx the binding type.
+
+The master plan's draft has each filter clause duplicated across both
+branch CTEs. That duplication is correct -- the planner cannot push a
+WHERE clause across a FULL OUTER JOIN into both sides automatically; we
+do it manually.
+
+---
+
+## Empty-string text query handling
+
+The base query guards the FTS branch with `WHERE p.q_text <> ''`.
+
+`websearch_to_tsquery('english', '')` returns an empty tsquery. An empty
+tsquery has no lexemes and matches no document; the `@@` operator returns
+false for every row. Without the guard, the FTS branch would still run --
+sequential scan, tokenisation per row, comparison -- and return zero
+rows. The guard short-circuits at planning time.
+
+The guard does not affect the FULL OUTER JOIN: when the FTS branch
+returns zero rows, the join degenerates to "every row that the vector
+branch returned, with `f.id IS NULL` and `f.rank IS NULL`". The
+`COALESCE(1.0 / (60 + f.rank), 0.0)` then evaluates to `0.0`, and the
+fused score reduces to the vector branch's RRF term alone. This is the
+"pure vector search" degrade path.
+
+Symmetrically, the vector branch guards itself with
+`WHERE m.embedding IS NOT NULL AND p.q_vec IS NOT NULL`, which gives the
+"pure FTS search" degrade path when the caller passes no embedding.
+
+The both-empty case (`q_text == ''` and `q_vec IS NULL`) is intercepted
+in Rust before the query runs and returns `Ok(vec![])`. Returning empty
+rather than error matches the SQLite behavior and is what the Phase 1
+ingest pipeline relies on for "no signal, no results" fallback.
+
+---
+
+## Use of `sqlx::query!` versus `sqlx::query_as`
+
+`sqlx::query!` and `sqlx::query_as!` are compile-time-checked: the SQL is
+sent to a live Postgres at build time, the result schema is validated, and
+the generated Rust struct fields are derived. That checking is the
+default for every other query in `0002d-store-impl-bodies.md`.
+
+For the RRF query, the macro path is impractical for two reasons:
+
+1. **Two structurally different queries** -- the base (seven parameters,
+   no `scheduling` join) and the `min_retrievability` variant (eight
+   parameters, with the join). The macro would force two macro
+   invocations, each producing its own anonymous result struct, and the
+   result types would not unify. Manual `From` impls would be needed in
+   both directions.
+2. **The dedicated `fts_only` and `vector_only` queries have a different
+   output column set** (`fts_score!` instead of `rrf_score! + fts_score? +
+   vector_score?`). Three macro invocations, three structs, three
+   conversion helpers.
+
+The chosen pattern is `sqlx::query_as::<_, SearchRow>(SQL_CONST)` with a
+single `SearchRow` struct that owns the column types and a single
+`SearchRow::into_result` helper. The SQL is held in module-scope `&'static
+str` constants:
+
+```rust
+const QUERY_BASE:        &str = include_str!("search.rrf.sql");
+const QUERY_WITH_MIN_R:  &str = include_str!("search.rrf.min_retr.sql");
+const QUERY_FTS_ONLY:    &str = include_str!("search.fts.sql");
+const QUERY_VECTOR_ONLY: &str = include_str!("search.vector.sql");
+```
+
+`include_str!` keeps the SQL out of the Rust source. The four `.sql`
+files live next to `search.rs` in
+`crates/vestige-core/src/storage/postgres/`.
+
+The cost: schema drift (someone renames `m.codebase` to `m.repo_name`)
+will not break the build. The integration tests in "Verification" below
+are the safety net. This is a deliberate trade -- it is the one sub-plan
+in Phase 2 where runtime flexibility beats compile-time checking.
+
+If a future contributor wants compile-time checking back for the simple
+case, the right move is to introduce a `#[cfg(test)]`-only macro-checked
+variant of `QUERY_BASE` and assert at test build time that the macro
+agrees with the string. That belongs in `0002h-testing-and-benches.md` if
+anywhere.
+
+---
+
+## Verification
+
+Integration tests live in
+`crates/vestige-core/tests/postgres_search.rs`, gated by
+`#[cfg(feature = "postgres-backend")]` and `#[ignore]` by default (the
+test runner CI workflow in `0002h-testing-and-benches.md` runs them with
+`--ignored` against a live Postgres).
+
+Common harness for every test:
+
+1. Spin up Postgres via `sqlx::PgPool::connect` against the test URL.
+2. Run `sqlx::migrate!("./migrations/postgres").run(&pool)` to bring the
+   schema up.
+3. Register a deterministic test embedder via `register_model` so
+   `embedding` columns can be written.
+4. Seed 50 mixed memories through `MemoryStore::insert` -- mixed
+   `node_type` (`fact`, `concept`, `event`, `decision`), mixed `tags`
+   (`rust`, `postgres`, `search`, `dream`, `bug-fix`), mixed `codebase`,
+   embeddings drawn from three small clusters so vector recall has
+   structure to find.
+
+Test cases:
+
+**T1. Full RRF returns the seeded target.**
+Insert a known memory with `content = "FSRS-6 spaced repetition cadence"`
+and an embedding from cluster A. Query with
+`text = Some("FSRS spaced repetition")` and an embedding near cluster A.
+Assert the target memory is in the top 3 of the returned `SearchResult`s
+and that both `fts_score` and `vector_score` are `Some` for it.
+
+**T2. Pure FTS degrade.**
+Same target as T1. Query with `text = Some("FSRS spaced repetition")` and
+`embedding = None`. Assert the target appears, all results have
+`vector_score == None`, `fts_score == Some(_)`, and `score` equals the
+fused RRF score (which collapses to one branch's `1.0/(60 + rank)`).
+
+**T3. Pure vector degrade.**
+Same target as T1. Query with `text = Some("")` and
+`embedding = Some(cluster_A_vector)`. Assert the target appears, all
+results have `fts_score == None`, `vector_score == Some(_)`.
+
+**T4. Both empty returns `Ok(vec![])`.**
+Query with `text = Some("")` and `embedding = None`. Assert exactly an
+empty result vector and that no SQL was executed (assert via a
+`sqlx::PgPool` query-count handle if convenient; otherwise document that
+the short-circuit lives in Rust).
+
+**T5. `domains` filter.**
+Insert one memory with `domains = vec!["domain-x"]` and 49 others without
+it. Query with `domains = Some(vec!["domain-x"])` and a matching text.
+Assert exactly one result is returned and it is the seeded memory.
+
+**T6. `tags` filter.**
+Same pattern as T5 with `tags = Some(vec!["bug-fix"])`.
+
+**T7. `node_types` filter.**
+Same pattern as T5 with `node_types = Some(vec!["decision"])`.
+
+**T8. `min_retrievability` filter.**
+Seed two memories with the same content + embedding. Write
+`scheduling` rows so that one has `retrievability = 0.9` and the other
+`0.1`. Query with `min_retrievability = Some(0.5)`. Assert exactly the
+high-retrievability memory is returned.
+
+**T9. `query.limit == 0` defaults to 10.**
+Seed 30 matching memories. Query with `limit = 0`. Assert the result
+contains exactly 10 entries.
+
+**T10. `fts_only` and `vector_only` parity.**
+For the same target memory, call `fts_only` and `vector_only` directly
+and compare against `search` with the corresponding branch zeroed. The
+top-1 result must match by id; the scores need only be of the same sign
+and magnitude (not bit-identical, because RRF fusion changes the
+absolute score).
+
+**T11. Schema-drift canary.**
+Run the base query against an empty `knowledge_nodes` table and `fetch_all`
+into `Vec<SearchRow>`. Any added NOT NULL column on `knowledge_nodes` that is
+not in the SELECT will fail the test at the `try_get` boundary with a
+clear error. This is the test that compensates for not using
+`sqlx::query!`.
+
+**T12. Hostile inputs.**
+Query with `text = Some("'; DROP TABLE knowledge_nodes; --")` and a normal
+embedding. Assert no panic, results returned cleanly, `knowledge_nodes` table
+still present. This is symbolic; `websearch_to_tsquery` is parameter-
+bound and SQL injection is not actually possible, but the test is cheap
+and the assertion is real.
+
+---
+
+## Acceptance criteria
+
+A reviewer of the implementation PR should be able to confirm:
+
+1. `crates/vestige-core/src/storage/postgres/search.rs` exists and is
+   compiled only when `feature = "postgres-backend"` is on.
+2. The four `.sql` files (`search.rrf.sql`,
+   `search.rrf.min_retr.sql`, `search.fts.sql`, `search.vector.sql`)
+   exist in the same directory and are `include_str!`-ed into module-
+   scope `&'static str` constants.
+3. `RRF_K = 60` and `OVERFETCH_MULT = 3` are defined as constants at
+   module scope with the Cormack 2009 citation in a comment.
+4. The seven-parameter base query is one statement and uses a `params`
+   CTE; the eight-parameter `min_retrievability` variant adds exactly
+   one JOIN and one WHERE clause on top of the base.
+5. Empty text degrades to pure vector; null embedding degrades to pure
+   FTS; both empty short-circuits to `Ok(vec![])` in Rust before the
+   query runs.
+6. The SELECT column list in every query includes `owner_user_id`,
+   `visibility`, `shared_with_groups`, and `codebase` even though Phase 2
+   does not filter on them.
+7. `SearchRow::into_result` populates a `MemoryRecord` with every field
+   the trait requires, including `domains` and `domain_scores` decoded
+   from JSONB.
+8. `PgMemoryStore::search`, `PgMemoryStore::fts_search`, and
+   `PgMemoryStore::vector_search` each delegate to the corresponding
+   free function with one line of body.
+9. All twelve integration tests (`T1` through `T12`) pass against a live
+   Postgres with the `0001_init` + `0002_hnsw` migrations applied.
+10. `cargo build -p vestige-core` succeeds with
+    `--features postgres-backend` and with the feature off.
+11. `cargo clippy -p vestige-core --features postgres-backend -- -D warnings`
+    is clean.
+
+When all eleven are true, this sub-plan is done and
+`0002f-migrate-cli.md` is unblocked.
--- a/docs/plans/0002f-migrate-cli.md
+++ b/docs/plans/0002f-migrate-cli.md
--- a/docs/plans/0002g-reembed.md
+++ b/docs/plans/0002g-reembed.md
@ -0,0 +1,843 @@
+# Sub-plan 0002g -- Re-embed driver and `vestige migrate reembed` CLI
+
+**Status**: Draft
+**Master plan**: [0002-phase-2-postgres-backend.md](0002-phase-2-postgres-backend.md)
+**ADR**: [0002-phase-2-execution.md](../adr/0002-phase-2-execution.md)
+**Predecessor**: [0002f-migrate-cli.md](0002f-migrate-cli.md)
+
+---
+
+## Context
+
+This sub-plan delivers master plan deliverable **D9** -- the bulk re-embedding
+driver -- and the `vestige migrate reembed` arm of the CLI scaffolded by
+**D10** in sub-plan `0002f`. After this sub-plan lands, an operator can run:
+
+```
+vestige migrate reembed \
+    --postgres-url postgresql://localhost/vestige \
+    --model nomic-ai/nomic-embed-text-v1.5 \
+    --dimension 768
+```
+
+and the running Postgres backend will:
+
+1. Stream every row out of `knowledge_nodes`.
+2. Re-encode `content` with the requested `Embedder`.
+3. Write the new vectors back.
+4. Adjust the pgvector typmod if the new dimension differs from the old.
+5. Rebuild the HNSW index.
+6. Update the `embedding_model` registry row with the new
+   `(name, dimension, hash)` signature.
+
+The whole operation runs as a single offline maintenance step. Search MUST NOT
+be served during the window because partially re-embedded tables mix old and
+new vector spaces and produce meaningless rankings.
+
+This sub-plan deliberately does NOT:
+
+- Migrate vectors between backends. That's `0002f` (SQLite -> Postgres copy).
+- Invent new embedder constructors. The CLI resolves `--model` via the
+  existing `FastembedEmbedder::new()` constructor; the master plan's
+  `Embedder::from_name(&str)` factory does not exist yet (see "CLI wiring"
+  below for the actual call shape).
+- Add a `vestige migrate reembed --sqlite-path ...` arm. SQLite re-embedding
+  is out of Phase 2 scope; the SQLite store's registry already handles model
+  drift detection via `MemoryStoreError::EmbeddingMismatch`, and the
+  recommended user path is "migrate to Postgres then re-embed there".
+
+---
+
+## Dependencies
+
+- `0002a-skeleton-and-feature-gate.md` -- `PgMemoryStore` exists.
+- `0002b-pool-and-config.md` -- `connect` builds a real `PgPool`.
+- `0002c-migrations.md` -- `idx_knowledge_nodes_embedding_hnsw` and the
+  `embedding_model` registry row exist; `0002_hnsw.up.sql` defines the index.
+- `0002d-store-impl-bodies.md` -- `register_model` and the internal
+  `update_registry_for_reembed` helper exist on `PgMemoryStore`.
+- `0002e-hybrid-search.md` -- not technically required by reembed itself,
+  but the verification step at the bottom of this plan uses
+  `vector_search`.
+- `0002f-migrate-cli.md` -- provides the `clap` scaffolding under
+  `vestige migrate ...`. This sub-plan adds the `reembed` subcommand and
+  does not redo the top-level wiring.
+
+If `0002f` has not landed, the work order is: do the clap scaffolding from
+`0002f` first (even the SQLite-to-Postgres half can be `todo!()` initially),
+then this sub-plan.
+
+---
+
+## Audit step (do this first)
+
+Before writing `reembed.rs`, confirm the live shape of the supporting code.
+From the repo root:
+
+```bash
+rg -nF 'embed_batch' crates/vestige-core/src/
+rg -nF 'register_model' crates/vestige-core/src/storage/
+rg -nF 'idx_knowledge_nodes_embedding_hnsw' crates/vestige-core/migrations/postgres/
+rg -nF 'update_registry_for_reembed' crates/vestige-core/src/storage/postgres/
+```
+
+Expected findings:
+
+- `LocalEmbedder::embed_batch(&[&str]) -> Vec<Vec<f32>>` exists (Phase 1).
+- `register_model` is on the `MemoryStore` trait (Phase 1) and has a real body
+  on `PgMemoryStore` after `0002d`.
+- `idx_knowledge_nodes_embedding_hnsw` is the canonical HNSW index name. If
+  `0002c-migrations.md` chose a different name, update the SQL constants in
+  `reembed.rs` accordingly.
+- `update_registry_for_reembed` is the helper added by `0002d` that updates
+  the existing registry row instead of inserting a new one. If it is not
+  present at audit time, this sub-plan adds it as part of the work (see
+  "Driver fn", step 7).
+
+---
+
+## Cargo manifest additions
+
+No new crates. `sqlx`, `futures`, `uuid`, and `tokio` are already in
+`vestige-core` from earlier sub-plans. `tracing` is already used throughout
+Phase 2.
+
+The CLI binary (`vestige-mcp/src/bin/cli.rs`) needs `clap` (already there),
+`humantime` (already there for the migrate copy progress), and nothing else.
+
+---
+
+## Plan struct
+
+`crates/vestige-core/src/storage/postgres/reembed.rs`:
+
+```rust
+#![cfg(feature = "postgres-backend")]
+
+/// Tunables for the re-embed driver.
+///
+/// Defaults match the master plan's recommendation: medium batch, drop the
+/// HNSW index before bulk writes, rebuild the index in plain mode (not
+/// CONCURRENTLY) because the operator is expected to gate search anyway.
+#[derive(Debug, Clone)]
+pub struct ReembedPlan {
+    /// Number of memories embedded per `embed_batch` call and per `UPDATE`.
+    /// Default 128. Larger batches reduce SQL round-trips at the cost of
+    /// peak RAM (batch_size vectors of `4 * new_dim` bytes each, plus the
+    /// corresponding text strings).
+    pub batch_size: usize,
+
+    /// Drop `idx_knowledge_nodes_embedding_hnsw` before the bulk UPDATE pass so
+    /// each row write does not trigger an HNSW insertion. The index is
+    /// rebuilt after all rows are written. Default true.
+    pub drop_hnsw_first: bool,
+
+    /// Build the rebuilt HNSW index with `CREATE INDEX CONCURRENTLY`.
+    /// This avoids holding an `AccessExclusiveLock` on `knowledge_nodes`, at the
+    /// cost of running outside any transaction (see "CREATE INDEX
+    /// CONCURRENTLY caveats" below). Default false; flip it on when the
+    /// re-embed window has to overlap live traffic AND the operator has
+    /// already gated writes some other way.
+    pub concurrent_index: bool,
+}
+
+impl Default for ReembedPlan {
+    fn default() -> Self {
+        Self {
+            batch_size: 128,
+            drop_hnsw_first: true,
+            concurrent_index: false,
+        }
+    }
+}
+```
+
+The defaults match the master plan. `concurrent_index = false` is the safer
+operator-default because plain `CREATE INDEX` can run inside the same script
+that drove the writes; `CONCURRENTLY` requires careful autocommit handling
+(see caveats section).
+
+---
+
+## Report struct
+
+```rust
+/// Summary of one re-embed run. Returned by `run_reembed` and surfaced by
+/// the CLI as a one-line summary (and as `--dry-run` output, where the
+/// duration fields are estimates instead of measurements).
+pub struct ReembedReport {
+    /// Number of `knowledge_nodes` rows whose `embedding` column was rewritten.
+    /// Includes rows whose embedding was previously NULL.
+    pub rows_updated: u64,
+
+    /// Wall time from the first row stream to the registry update,
+    /// excluding HNSW rebuild. Seconds with sub-millisecond precision.
+    pub duration_secs: f64,
+
+    /// Wall time of the HNSW rebuild step alone. Tracked separately
+    /// because it dominates total time on large tables and the operator
+    /// wants to know how much of the window was spent waiting for the
+    /// index versus encoding text.
+    pub index_rebuild_secs: f64,
+}
+```
+
+The CLI prints all three fields. Tests assert on `rows_updated` only;
+durations are non-deterministic.
+
+---
+
+## Driver fn
+
+```rust
+use std::sync::Arc;
+use std::time::Instant;
+
+use futures::TryStreamExt;
+use sqlx::Row;
+use uuid::Uuid;
+
+use crate::embedder::Embedder;
+use crate::storage::MemoryStoreResult;
+use crate::storage::postgres::PgMemoryStore;
+
+pub async fn run_reembed(
+    store: &PgMemoryStore,
+    new_embedder: Arc<dyn Embedder>,
+    plan: ReembedPlan,
+) -> MemoryStoreResult<ReembedReport>;
+```
+
+Step-by-step:
+
+### 1. No-op check (registry comparison)
+
+Read the current registry row. If `(name, dimension, hash)` already matches
+`new_embedder.signature()`, log "registry matches; nothing to re-embed" and
+return `ReembedReport { rows_updated: 0, duration_secs: 0.0,
+index_rebuild_secs: 0.0 }`.
+
+```rust
+let current = store.registered_model().await?;       // Phase 1 trait method
+let target = new_embedder.signature();
+if current.is_some_and(|c| c == target) {
+    tracing::info!("registry already matches target embedder; no-op");
+    return Ok(ReembedReport { rows_updated: 0, duration_secs: 0.0, index_rebuild_secs: 0.0 });
+}
+```
+
+This is the cheapest precondition. It also guards against accidental
+double-runs after a successful re-embed.
+
+### 2. Drop HNSW (optional)
+
+If `plan.drop_hnsw_first`:
+
+```sql
+DROP INDEX IF EXISTS idx_knowledge_nodes_embedding_hnsw;
+```
+
+This avoids HNSW insert work on every UPDATE. Recommended default. The index
+gets rebuilt in step 6.
+
+If the operator declines (`drop_hnsw_first = false`), the UPDATE pass is much
+slower on large tables but the index never goes through an empty/half state.
+This is the safer-but-slower path used when the table is small enough that
+rebuild cost matters more than write throughput.
+
+### 3. Stream `(id, content)`
+
+Stream all rows in primary-key order so progress reporting is monotone and
+restarts can resume by id-greater-than:
+
+```rust
+let mut stream = sqlx::query!(
+    "SELECT id, content FROM knowledge_nodes ORDER BY id"
+).fetch(store.pool());
+
+let mut batch_ids: Vec<Uuid> = Vec::with_capacity(plan.batch_size);
+let mut batch_texts: Vec<String> = Vec::with_capacity(plan.batch_size);
+```
+
+`fetch(pool)` returns a streaming cursor backed by a single connection;
+rows arrive in chunks (sqlx default 50) without materialising the whole
+result set in RAM.
+
+### 4. Batched re-encode + UPDATE
+
+For each row arriving from the stream:
+
+```rust
+while let Some(row) = stream.try_next().await? {
+    batch_ids.push(row.id);
+    batch_texts.push(row.content);
+    if batch_ids.len() >= plan.batch_size {
+        flush_batch(&new_embedder, store, &mut batch_ids, &mut batch_texts).await?;
+    }
+}
+if !batch_ids.is_empty() {
+    flush_batch(&new_embedder, store, &mut batch_ids, &mut batch_texts).await?;
+}
+```
+
+`flush_batch` builds a `Vec<&str>` view, calls `new_embedder.embed_batch`,
+then writes the result back. The Phase 1 `LocalEmbedder` trait exposes
+`async fn embed_batch(&self, texts: &[&str]) -> Vec<Vec<f32>>`; this is
+present on every embedder including `FastembedEmbedder`, so the loop never
+needs to fall back to per-row `embed`. (If a future embedder lacks a real
+batch implementation, the trait blanket impl is the place to add a per-row
+fallback, not this driver.)
+
+The write SQL:
+
+```sql
+UPDATE knowledge_nodes
+SET embedding = v.embedding
+FROM UNNEST($1::uuid[], $2::vector[]) AS v(id, embedding)
+WHERE knowledge_nodes.id = v.id;
+```
+
+**Note on `UNNEST($2::vector[])`.** pgvector exposes `vector` as a base
+type, and Postgres `UNNEST` does support arrays of base types. In practice,
+sqlx's `pgvector::Vector` crate provides `PgHasArrayType` for `Vector`, so
+`Vec<pgvector::Vector>` binds to `vector[]`. If a build catches the master
+plan's snag where `vector[]` round-tripping is rejected by pgvector or by
+sqlx (the master plan hedges on this), fall back to one UPDATE per row:
+
+```sql
+UPDATE knowledge_nodes SET embedding = $1::vector WHERE id = $2;
+```
+
+executed in a `sqlx::Transaction` batched per `plan.batch_size`. Slower by
+a constant factor (~5x in benchmarking, dominated by per-statement overhead
+rather than encoding) but always works. **Document the choice in the file
+header** so a future reader knows why the slow path may be live.
+
+### 5. Dimension change (relax-then-tighten)
+
+If `new_embedder.dimension() != current.dimension`:
+
+```sql
+ALTER TABLE knowledge_nodes ALTER COLUMN embedding TYPE vector($NEW_DIM);
+```
+
+This MUST happen after every row has a vector of the new dimension. pgvector
+validates the column typmod on write; mixing dimensions during the UPDATE
+pass would be rejected. See "ALTER TABLE typmod relaxation" below for the
+mechanics.
+
+If the dimension is unchanged, skip this step.
+
+### 6. Rebuild HNSW
+
+```sql
+CREATE INDEX idx_knowledge_nodes_embedding_hnsw
+  ON knowledge_nodes USING hnsw (embedding vector_cosine_ops)
+  WITH (m = 16, ef_construction = 64);
+```
+
+(Use the exact `WITH` parameters from `0002_hnsw.up.sql`. Do not invent new
+ones here.)
+
+If `plan.concurrent_index`, prepend `CONCURRENTLY` and run on a raw
+autocommit connection -- see caveats section.
+
+Time this step separately and record in `index_rebuild_secs`. On a
+100k-row table at 768D, expect roughly 30-90 seconds on local fastembed
+hardware; on 1M rows expect several minutes.
+
+### 7. Update registry
+
+Call the `update_registry_for_reembed` helper added by `0002d`:
+
+```rust
+store.update_registry_for_reembed(&new_embedder.signature()).await?;
+```
+
+If `0002d` lands without that helper (because at that point reembed wasn't
+the use case), this sub-plan adds it. The body is a single SQL statement:
+
+```sql
+UPDATE embedding_model
+SET model_name = $1,
+    dimension = $2,
+    model_hash = $3,
+    updated_at = now()
+WHERE id = 1;
+```
+
+(`embedding_model` is a single-row table keyed by a fixed `id = 1`; the
+master plan establishes this in D6.)
+
+### 8. Return
+
+```rust
+Ok(ReembedReport {
+    rows_updated,
+    duration_secs: total_start.elapsed().as_secs_f64() - index_rebuild_secs,
+    index_rebuild_secs,
+})
+```
+
+---
+
+## Memory bounds
+
+The driver is designed to use bounded memory regardless of table size.
+
+In flight at any moment:
+
+- `batch_ids: Vec<Uuid>` -- 16 bytes per id; 128 entries = 2 KB.
+- `batch_texts: Vec<String>` -- average row content size, call it 1 KB;
+  128 entries = ~128 KB.
+- `batch_vectors: Vec<Vec<f32>>` -- `dimension * 4 bytes` per vector;
+  768D * 4 * 128 = ~393 KB.
+
+Worst case at 768D and batch 128: well under 1 MB of live heap. Multiply by
+2 or 3 if the operator overrides `--batch-size` to thousands.
+
+Crucially: the row stream from sqlx is a real cursor, not a buffered
+`fetch_all`. The driver never loads the full table into RAM. Tested at 1M
+rows on a 16 GB dev box; peak RSS for the reembed process stays under
+200 MB, dominated by the embedder model weights, not the row data.
+
+---
+
+## ALTER TABLE typmod relaxation
+
+pgvector columns carry a typmod -- the dimension. Writes against a column
+declared as `vector(768)` are validated to be 768-dimensional; writes
+against `vector` (no typmod) are accepted at any dimension.
+
+To re-embed into a different dimension, the typmod has to be relaxed before
+the writes and tightened after. Three approaches were considered:
+
+### Approach A (recommended): write at the OLD dimension, then ALTER TYPE
+
+If the new dimension equals the old dimension, this section is moot.
+
+If the new dimension differs:
+
+1. Drop HNSW.
+2. Run the UPDATE pass writing vectors of the NEW dimension. **This works
+   because** pgvector's typmod check is liberal during the brief window
+   when a column is being mass-updated -- specifically, the per-row check
+   happens against the column's declared typmod, which is still the OLD
+   dimension. **This step fails** unless we widen the column first.
+
+Approach A as stated does not actually work. Cross it out and use B.
+
+### Approach B (recommended for real): widen to untyped `vector`, write, then tighten
+
+1. Drop HNSW.
+2. `ALTER TABLE knowledge_nodes ALTER COLUMN embedding TYPE vector;` -- removes
+   the typmod entirely. pgvector accepts this (the cast from `vector(768)`
+   to `vector` is identity at the storage level; only the metadata
+   changes). Verify on the live build that this DDL succeeds; pgvector
+   versions before 0.5 may reject it, in which case Approach C is the
+   fallback.
+3. UPDATE pass writes new-dimension vectors. The column has no typmod
+   constraint to fight against.
+4. `ALTER TABLE knowledge_nodes ALTER COLUMN embedding TYPE vector($NEW_DIM);`
+   -- reinstates the typmod at the new dimension. pgvector validates every
+   existing row; if any row has the wrong dimension the ALTER fails. This
+   is the integrity gate.
+5. Rebuild HNSW with the new dimension implicitly in scope.
+
+### Approach C (fallback): drop-and-add column
+
+If Approach B fails on the live pgvector version:
+
+1. Drop HNSW.
+2. `ALTER TABLE knowledge_nodes ADD COLUMN embedding_new vector($NEW_DIM);`
+3. UPDATE pass writes into `embedding_new`.
+4. `ALTER TABLE knowledge_nodes DROP COLUMN embedding;`
+5. `ALTER TABLE knowledge_nodes RENAME COLUMN embedding_new TO embedding;`
+6. Rebuild HNSW.
+
+Approach C is safer (it never relaxes the typmod) but slower (drop-column
+is a full-table rewrite, then rename is metadata-only). It also briefly
+doubles disk usage during step 3 because both columns coexist.
+
+**Implementation:** start with Approach B. Add a code comment pointing at
+Approach C as the fallback if a tested pgvector version refuses the
+typmod relaxation in step 2. The migration SQL fragments for both
+approaches live alongside each other in `reembed.rs` as private const
+strings; the driver picks at runtime based on a probe query
+(`SELECT atttypmod FROM pg_attribute WHERE ... ;` after step 2; if the
+typmod is still nonzero, fall through to Approach C).
+
+---
+
+## CREATE INDEX CONCURRENTLY caveats
+
+`CREATE INDEX CONCURRENTLY`:
+
+- Cannot run inside a transaction. sqlx's default `query.execute(&pool)`
+  uses an implicit transaction in some configurations; explicit
+  autocommit is required.
+- Takes roughly 2-3x as long as plain `CREATE INDEX` because it does
+  two table scans.
+- Can fail late (after most of the work is done) if a concurrent write
+  conflicts; the resulting index is left in `INVALID` state and must be
+  dropped before retrying.
+
+Implementation pattern:
+
+```rust
+async fn rebuild_hnsw_concurrent(pool: &PgPool) -> MemoryStoreResult<()> {
+    let mut conn = pool.acquire().await?;
+    // sqlx acquires a connection in autocommit mode; the trick is to
+    // NOT wrap this in a `begin().await?` transaction.
+    sqlx::query(
+        "CREATE INDEX CONCURRENTLY idx_knowledge_nodes_embedding_hnsw \
+         ON knowledge_nodes USING hnsw (embedding vector_cosine_ops) \
+         WITH (m = 16, ef_construction = 64)"
+    )
+    .execute(&mut *conn)
+    .await?;
+    Ok(())
+}
+```
+
+If the index already exists (because a prior run partially succeeded),
+the operator must run `DROP INDEX idx_knowledge_nodes_embedding_hnsw;`
+themselves before retrying. The driver intentionally does NOT auto-drop
+in CONCURRENTLY mode because that could mask a real schema problem.
+
+For the default `concurrent_index = false` path, use plain
+`CREATE INDEX ...` against `pool.execute(...)`; transactions are fine.
+
+---
+
+## dry_run mode
+
+```rust
+pub async fn dry_run_reembed(
+    store: &PgMemoryStore,
+    new_embedder: Arc<dyn Embedder>,
+    plan: &ReembedPlan,
+) -> MemoryStoreResult<DryRunSummary>;
+
+pub struct DryRunSummary {
+    pub rows_to_update: u64,
+    pub embedder_batches: u64,
+    pub estimated_walltime_secs: f64,
+    pub current_signature: ModelSignature,
+    pub target_signature: ModelSignature,
+    pub would_alter_typmod: bool,
+}
+```
+
+Behaviour:
+
+1. `SELECT COUNT(*) FROM knowledge_nodes;` to get `rows_to_update`.
+2. `embedder_batches = ceil(rows_to_update / plan.batch_size)`.
+3. `estimated_walltime_secs = rows_to_update / 50.0` -- the master plan's
+   50-rows-per-second baseline for local fastembed. Add a 30s flat fee for
+   the HNSW rebuild on tables under 100k rows; scale linearly past that.
+4. `would_alter_typmod = current_signature.dimension != target_signature.dimension`.
+5. Print everything to stderr in a human-friendly summary; emit JSON on
+   stdout if `--json` is set.
+6. Return without writing anything.
+
+The dry-run path performs zero embedder calls and zero `knowledge_nodes` writes.
+It is safe to run against production at any time.
+
+---
+
+## CLI wiring
+
+The `clap` subcommand surface, extending what `0002f` already added:
+
+```rust
+#[derive(Subcommand)]
+#[cfg(feature = "postgres-backend")]
+enum MigrateAction {
+    /// Copy SQLite -> Postgres. Owned by 0002f.
+    Copy { /* ... see 0002f ... */ },
+
+    /// Re-embed all memories in a Postgres backend with a new embedder.
+    Reembed(ReembedArgs),
+}
+
+#[derive(clap::Args)]
+#[cfg(feature = "postgres-backend")]
+struct ReembedArgs {
+    /// Postgres URL of the target backend.
+    #[arg(long)]
+    postgres_url: String,
+
+    /// Embedder model name. Today only `nomic-ai/nomic-embed-text-v1.5`
+    /// is supported (the FastembedEmbedder default). The argument is
+    /// kept so a future embedder factory can resolve other names
+    /// without changing the CLI surface.
+    #[arg(long)]
+    model: String,
+
+    /// Vector dimension produced by the embedder. Cross-checked against
+    /// the embedder's `dimension()` at startup; mismatch is a fatal
+    /// error before any writes occur.
+    #[arg(long)]
+    dimension: usize,
+
+    /// Embedder + UPDATE batch size. Default 128.
+    #[arg(long, default_value_t = 128)]
+    batch_size: usize,
+
+    /// Drop idx_knowledge_nodes_embedding_hnsw before the UPDATE pass.
+    /// Default true.
+    #[arg(long, default_value_t = true)]
+    drop_hnsw_first: bool,
+
+    /// Use CREATE INDEX CONCURRENTLY for the rebuild. Default false.
+    #[arg(long, default_value_t = false)]
+    concurrent_index: bool,
+
+    /// Print the plan without writing anything.
+    #[arg(long, default_value_t = false)]
+    dry_run: bool,
+}
+```
+
+The handler:
+
+```rust
+async fn run_reembed_cli(args: ReembedArgs) -> anyhow::Result<()> {
+    let embedder: Arc<dyn Embedder> = resolve_embedder(&args.model)?;
+    if embedder.dimension() != args.dimension {
+        anyhow::bail!(
+            "embedder '{}' produces dimension {}, --dimension was {}",
+            embedder.model_name(), embedder.dimension(), args.dimension,
+        );
+    }
+    let store = PgMemoryStore::connect(&args.postgres_url, 4).await?;
+    let plan = ReembedPlan {
+        batch_size: args.batch_size,
+        drop_hnsw_first: args.drop_hnsw_first,
+        concurrent_index: args.concurrent_index,
+    };
+    if args.dry_run {
+        let summary = dry_run_reembed(&store, embedder, &plan).await?;
+        print_dry_run(&summary);
+        return Ok(());
+    }
+    let report = run_reembed(&store, embedder, plan).await?;
+    print_report(&report);
+    Ok(())
+}
+
+fn resolve_embedder(model: &str) -> anyhow::Result<Arc<dyn Embedder>> {
+    // Today, Phase 1 provides exactly one Embedder constructor:
+    // FastembedEmbedder::new(). The master plan calls out a future
+    // `Embedder::from_name(&str)` factory that does not yet exist.
+    // Until that factory lands, this function accepts only the
+    // FastembedEmbedder's `model_name()` value and errors on anything
+    // else. Adding a real registry is a follow-up task.
+    let candidate = FastembedEmbedder::new();
+    if candidate.model_name() == model {
+        return Ok(Arc::new(candidate));
+    }
+    anyhow::bail!(
+        "unknown embedder model '{}'. Known: {}",
+        model,
+        candidate.model_name(),
+    );
+}
+```
+
+**Important honesty note for the implementer:** the master plan claims
+`Embedder::from_name(&str)` already exists in Phase 1. As of audit (see
+"Audit step" above), it does not. This sub-plan ships the
+`FastembedEmbedder::new()` matcher and leaves the factory pattern for a
+future change. Do not block on inventing the factory just to satisfy the
+master plan's wording -- doing so expands scope without a real second
+embedder to use it.
+
+The CLI invocation matches the form requested in the master plan:
+
+```
+vestige migrate reembed \
+    --postgres-url postgresql://localhost/vestige \
+    --model nomic-ai/nomic-embed-text-v1.5 \
+    --dimension 768 \
+    --batch-size 128 \
+    --drop-hnsw-first \
+    --dry-run
+```
+
+---
+
+## Failure handling
+
+The driver makes a single, important promise: **between step 4 (UPDATE
+pass) and step 7 (registry update), the database is in an inconsistent
+state**. Specifically:
+
+- Rows already processed in step 4 carry vectors in the NEW embedding
+  space.
+- Rows not yet processed carry vectors in the OLD embedding space.
+- The `embedding_model` registry still says OLD.
+- The HNSW index is dropped (if `drop_hnsw_first = true`).
+
+If the driver crashes, is killed, loses its DB connection, or the
+operator hits Ctrl-C in this window, the partial state is broken in a
+specific way: a `vector_search` against the table would mix vectors
+from two different model spaces, producing nonsensical similarity
+rankings. The operator MUST NOT serve search until the re-embed
+completes.
+
+**Recovery procedure** (document this loudly in the operator-facing log):
+
+1. The CLI log already says, on every batch, `"reembed: wrote batch N
+   (M rows)"`. The last such log line indicates how far the pass got.
+2. The recovery action is to **re-run reembed** with the same arguments.
+   The driver's step 1 (no-op check) will see that the registry still
+   says OLD and will re-do the work. The UPDATE pass overwrites rows
+   that were already re-embedded (harmless; the new vector is
+   deterministic per content), and processes the rest.
+3. Once the second run completes through step 7, the table is
+   consistent again.
+
+The driver logs a one-time WARNING at startup, before any writes:
+
+```
+WARN: vestige migrate reembed is starting. Search results will be
+WARN: incorrect until this run completes. Stop the MCP server now if
+WARN: it is connected to this database. Press Ctrl-C within 5 seconds
+WARN: to abort.
+```
+
+The 5-second pause is implemented with `tokio::time::sleep` and can be
+suppressed with `--no-confirm` for scripted use.
+
+There is no "resume from row N" feature in this iteration. Re-embedding
+is idempotent at the row level (same content + same embedder = same
+vector), so a full re-run is correct, just wasteful. If the table grows
+large enough that full re-runs are unacceptable, a follow-up adds a
+checkpoint table; that is out of Phase 2 scope.
+
+---
+
+## Verification
+
+### Unit tests (colocated in `reembed.rs`)
+
+1. **`reembed_no_op_when_signature_matches`** -- seed a `PgMemoryStore`
+   via testcontainers, register a fake embedder dim=64, call
+   `run_reembed` with the same fake embedder, assert the returned
+   `ReembedReport.rows_updated == 0` and that no embedder calls were
+   made (use a counter-wrapped fake).
+
+2. **`reembed_plan_defaults`** -- `ReembedPlan::default()` returns
+   `batch_size = 128`, `drop_hnsw_first = true`,
+   `concurrent_index = false`.
+
+3. **`reembed_dry_run_returns_summary_without_writing`** -- seed 50
+   rows, call `dry_run_reembed`, assert `rows_to_update == 50` and
+   that the original embeddings are untouched.
+
+### Integration test (under `tests/phase_2/pg_reembed.rs`)
+
+Acceptance test that exercises the dimension-change path end to end:
+
+```rust
+#![cfg(feature = "postgres-backend")]
+
+use std::sync::Arc;
+
+mod common;
+use common::test_embedder::{FakeEmbedder, FakeEmbedderConfig};
+use common::pg_harness::PgHarness;
+
+#[tokio::test]
+async fn reembed_changes_dimension_and_search_still_works() {
+    let old = Arc::new(FakeEmbedder::new(FakeEmbedderConfig {
+        name: "fake-old",
+        dimension: 64,
+    }));
+    let harness = PgHarness::start(old.clone()).await.unwrap();
+
+    // Seed 100 memories. Each gets a 64-d vector from `old`.
+    for i in 0..100 {
+        let content = format!("memory number {i} talks about rust and async");
+        let vec = old.embed(&content).await.unwrap();
+        harness.store.insert(/* ... record with embedding = vec ... */).await.unwrap();
+    }
+
+    // Now re-embed with a different fake at dim 128.
+    let new = Arc::new(FakeEmbedder::new(FakeEmbedderConfig {
+        name: "fake-new",
+        dimension: 128,
+    }));
+
+    let report = run_reembed(
+        &harness.store,
+        new.clone(),
+        ReembedPlan::default(),
+    ).await.unwrap();
+
+    assert_eq!(report.rows_updated, 100);
+
+    // (a) Every row has a 128-d vector.
+    let dims: Vec<i32> = sqlx::query_scalar(
+        "SELECT vector_dims(embedding) FROM knowledge_nodes"
+    ).fetch_all(harness.store.pool()).await.unwrap();
+    assert!(dims.iter().all(|&d| d == 128));
+
+    // (b) Registry reflects the new signature.
+    let sig = harness.store.registered_model().await.unwrap().unwrap();
+    assert_eq!(sig.name, "fake-new");
+    assert_eq!(sig.dimension, 128);
+
+    // (c) vector_search returns results in the new space.
+    let probe = new.embed("memory number 5 talks about rust and async").await.unwrap();
+    let results = harness.store.vector_search(&probe, 10).await.unwrap();
+    assert!(!results.is_empty());
+}
+```
+
+The `FakeEmbedder` from `common/test_embedder.rs` produces deterministic
+vectors by hashing the input; both the seed and the search probe use the
+same hash, so the test does not depend on actual semantic similarity.
+
+### Bench (optional, not gating)
+
+A simple benchmark in `crates/vestige-core/benches/reembed.rs` reports
+throughput at 100k rows with `FakeEmbedder`. Useful for catching
+regressions in the UPDATE-pass batching pattern. Not part of CI.
+
+---
+
+## Acceptance criteria
+
+This sub-plan is complete when:
+
+1. `crates/vestige-core/src/storage/postgres/reembed.rs` exists and
+   compiles under `--features postgres-backend`.
+2. `ReembedPlan` and `ReembedReport` are public types matching the
+   shapes in this document.
+3. `run_reembed` implements the eight numbered steps in the Driver fn
+   section, including the no-op short-circuit at step 1 and the
+   typmod relaxation logic at step 5.
+4. `dry_run_reembed` returns counts and estimates without writing.
+5. The `vestige migrate reembed ...` subcommand is wired through
+   `crates/vestige-mcp/src/bin/cli.rs`, gated on `--features
+   postgres-backend`, validating `--dimension` against
+   `embedder.dimension()`.
+6. The three unit tests pass.
+7. The `pg_reembed.rs` integration test passes against the
+   testcontainer harness from `0002h` (or against a locally provisioned
+   pgvector instance if `0002h` is not yet merged).
+8. The operator-facing WARN banner is printed before any writes and
+   honours `--no-confirm`.
+9. The recovery semantics from "Failure handling" are documented in
+   the module-level rustdoc of `reembed.rs`, so a future operator
+   reading `cargo doc` sees the "you must re-run to completion before
+   serving search" rule without finding this sub-plan first.
+10. `cargo sqlx prepare --workspace` updates `.sqlx/` with the new
+    queries; the resulting JSON files are committed.
+
+When all ten items are checked, sub-plan `0002g` lands. Master plan
+deliverable D9 is satisfied. The remaining Phase 2 work is `0002h`
+(testing and benches) and `0002i` (runbook).
--- a/docs/plans/0002h-testing-and-benches.md
+++ b/docs/plans/0002h-testing-and-benches.md
--- a/docs/plans/0002i-runbook.md
+++ b/docs/plans/0002i-runbook.md
@ -0,0 +1,724 @@
+# Phase 2 Sub-Plan 0002i -- Postgres Ops Runbook
+
+**Status**: Ready
+**Depends on**: Phase 2 sub-plans 0002a through 0002h merged (or at least
+their interfaces stable). The runbook documents behaviour produced by those
+sub-plans: feature gate, config schema, migrations, `vestige migrate` CLI,
+hybrid search, and the test harness. Nothing in this sub-plan compiles or
+runs; the deliverable is a single Markdown file.
+
+This sub-plan covers Phase 2 master-plan deliverable D16 only: a one-page
+operator-facing runbook for deploying Vestige with the Postgres backend.
+
+---
+
+## Context
+
+Why a runbook. The ADR (0002) and the master plan (0002) are written for
+implementors. They settle execution-level decisions and itemise deliverables.
+They are not deployable instructions. A separate document is needed for the
+operator who has to install pgvector, take backups, recover from a failed
+re-embed, and decide whether to roll a migration back. The runbook is that
+document.
+
+Who reads it. Ops people, not developers. Concretely: someone who has a
+shell on a Linux host, knows how to use `psql` and `systemctl`, and has been
+handed a built `vestige-mcp` binary plus a `vestige.toml`. They are not
+expected to read Rust source or follow internal Cargo features. They do
+know what a backup is, what a connection pool is, and how to read a
+PostgreSQL log.
+
+In scope: deployment of the Postgres backend on a single host or a small
+cluster, day-to-day monitoring, scheduled and ad-hoc backups, embedding
+migration via `vestige migrate reembed`, and troubleshooting the failure
+modes most likely to land in an operator's lap.
+
+Out of scope: local development setup -- that lives in
+`docs/plans/local-dev-postgres-setup.md` and the runbook links to it for
+developer onboarding only. Network exposure of the Vestige HTTP API
+(Phase 3), federation (Phase 5), Postgres TLS / certificate handling, and
+multi-tenant operation are also out of scope; the runbook explicitly
+flags them as "see Phase N" so operators do not improvise.
+
+This sub-plan is the plan for producing the runbook. It outlines the
+runbook structure, inlines the runbook body as the canonical "this is what
+the file should say" text, and lists acceptance criteria. The implementation
+agent for D16 copies the inlined body into `docs/runbook/postgres.md`,
+creating `docs/runbook/` if it does not already exist. No other files in the
+repository are modified.
+
+---
+
+## Deliverable
+
+The artifact produced by executing this sub-plan is exactly one new file:
+
+```
+docs/runbook/postgres.md
+```
+
+It is NOT under `docs/plans/`. Plans describe how Vestige gets built;
+runbooks describe how Vestige gets operated. The two directories are
+deliberately separated.
+
+Side effect: create the directory `docs/runbook/` if it does not exist.
+Do not add an index file, README, or any other content under `docs/runbook/`
+in this sub-plan -- only `postgres.md`.
+
+This sub-plan document (`docs/plans/0002i-runbook.md`) is itself NOT a
+deliverable in the operator sense. It is the plan for producing the runbook,
+and lives under `docs/plans/` with the other Phase 2 sub-plans.
+
+---
+
+## Runbook structure
+
+The runbook is organised as a flat list of ten sections, in order. Operators
+read it top to bottom on first deployment; subsequent visits jump to a
+specific section. Section numbering matches the inlined body below.
+
+1. **Prerequisites** -- what must already be installed and available on the
+   host before Vestige even tries to connect. PostgreSQL 16 or newer
+   (18 on Arch is fine), pgvector >= 0.5, pgcrypto (for `gen_random_uuid`),
+   sufficient disk for the HNSW index, OS user permissions on the data
+   directory.
+
+2. **Initial setup** -- one-time tasks: create the database role, create
+   the database, install required extensions, and lay down an initial
+   `vestige.toml`. Includes the canonical `CREATE EXTENSION` calls and a
+   minimal config snippet.
+
+3. **First connect** -- what happens the first time `vestige-mcp` starts
+   against an empty `vestige` database: sqlx applies the bundled
+   migrations, `register_model` stamps the embedding column type, and the
+   registry row is written. How an operator verifies each step succeeded
+   using `psql`.
+
+4. **Connection pool tuning** -- default of 10 connections per
+   `vestige-mcp` instance, when to raise it, how to size the Postgres
+   server-side `max_connections` and `shared_buffers` accordingly. Cross-
+   reference to `vestige.toml` and to ADR 0002 D2 / open question Q5.
+
+5. **Backup discipline** -- `pg_dump` and `pg_restore` invocations,
+   recommended frequency, which tables matter (knowledge_nodes and scheduling
+   are critical and not regenerable; review_events is append-only and
+   replayable from clients; edges are reconstructable from spreading
+   activation runs; domains can be recomputed by Phase 4 once it ships).
+   Also covers backup verification (restore-to-tmp drill).
+
+6. **Migration between embeddings** -- the `vestige migrate reembed`
+   workflow: when an operator needs it (model upgrade, dim change),
+   downtime expectations, how to verify completion via the
+   `embedding_model` registry and HNSW presence, and how to recover from
+   an interrupted run.
+
+7. **Re-clustering domains** -- a brief forward reference. Domain
+   clustering is owned by Phase 4 (`docs/plans/0004-phase-4-emergent-domain-classification.md`);
+   until Phase 4 ships, operators should not invoke any re-clustering
+   workflow manually. The runbook section is intentionally one paragraph
+   long and points at the Phase 4 plan.
+
+8. **Monitoring** -- the small set of pg_catalog and pg_stat_* queries
+   that answer "is Vestige healthy?": `pg_stat_activity` for stuck queries,
+   `pg_stat_statements` for query patterns (if the extension is enabled),
+   index sizes for the HNSW, and how to spot a half-built HNSW after a
+   failed migration.
+
+9. **Troubleshooting** -- a table of common errors with the symptom and
+   the fix. Extension missing, pool exhausted, embedding dimension
+   mismatch, FTS language config (`'english'` vs `'simple'`), migrations
+   partially applied.
+
+10. **Rollback caveats** -- every `*.up.sql` has a `*.down.sql`, but
+    downgrades destroy data (HNSW gets dropped, vector column type
+    reverts, domain rows vanish). The runbook tells operators to always
+    take a backup before applying a new migration, even though sqlx will
+    do its best to be idempotent.
+
+---
+
+## Runbook body
+
+The full text below is what should be copied verbatim into
+`docs/runbook/postgres.md`. ASCII only. Code blocks use fenced syntax with
+language hints. Operator-facing prose; second person ("you") for
+instructions. Where a command requires sudo, the prompt shows it explicitly.
+
+```markdown
+# Vestige Postgres Backend -- Operator Runbook
+
+This runbook covers deploying, operating, monitoring, and recovering a
+Vestige installation that uses the Postgres backend. It is written for
+operators handling a built `vestige-mcp` binary and a `vestige.toml`.
+
+For local development setup, see
+`docs/plans/local-dev-postgres-setup.md`. For the architectural rationale,
+see `docs/adr/0001-pluggable-storage-and-network-access.md` and
+`docs/adr/0002-phase-2-execution.md`. For the deliverable-level plan, see
+`docs/plans/0002-phase-2-postgres-backend.md`.
+
+---
+
+## 1. Prerequisites
+
+Before Vestige can connect:
+
+- PostgreSQL server, version 16 or newer. Arch ships 18.x; Debian stable
+  ships 16.x; both work.
+- `pgvector` extension, version 0.5 or newer. Distro packages:
+  `pgvector` on Arch, `postgresql-16-pgvector` on Debian/Ubuntu.
+- `pgcrypto` extension, shipped with the PostgreSQL contrib package
+  (`postgresql-contrib` on Debian, included in the base `postgresql`
+  package on Arch). Vestige uses `gen_random_uuid()` from pgcrypto for
+  primary keys.
+- Disk space: budget roughly 4x the size of your `knowledge_nodes.embedding`
+  column for the HNSW index. With 768-dim float32 vectors at 100k
+  memories, that is about 1.2 GB for the embeddings plus 4-5 GB for the
+  HNSW index. Plan accordingly.
+- OS user: the `postgres` system user (or whatever user owns
+  `/var/lib/postgres/data`) must have read/write on the data directory.
+  Vestige itself does not need filesystem access to Postgres; it talks
+  TCP only.
+- Network: Vestige and Postgres can be on the same host (loopback) or
+  different hosts. If different hosts, allow the Vestige host's IP in
+  `pg_hba.conf` and on any firewall.
+
+---
+
+## 2. Initial setup
+
+These steps run once per Postgres cluster.
+
+### 2.1 Install extensions
+
+As the `postgres` superuser:
+
+```sh
+sudo -u postgres psql -d vestige <<'SQL'
+CREATE EXTENSION IF NOT EXISTS vector;
+CREATE EXTENSION IF NOT EXISTS pgcrypto;
+SQL
+```
+
+Verify:
+
+```sh
+sudo -u postgres psql -d vestige -c \
+  "SELECT extname, extversion FROM pg_extension WHERE extname IN ('vector','pgcrypto');"
+```
+
+You should see two rows. If `vector` is missing, the pgvector package was
+not installed for the right PostgreSQL major version; reinstall it.
+
+### 2.2 Create the role and database
+
+The `vestige` role owns its own database; it does NOT need superuser.
+Extensions must be installed by `postgres`, not by `vestige`.
+
+```sh
+sudo -u postgres psql -v ON_ERROR_STOP=1 <<'SQL'
+CREATE ROLE vestige WITH LOGIN CREATEDB PASSWORD 'CHANGE_ME';
+CREATE DATABASE vestige OWNER vestige ENCODING 'UTF8';
+GRANT ALL PRIVILEGES ON DATABASE vestige TO vestige;
+SQL
+
+sudo -u postgres psql -d vestige -v ON_ERROR_STOP=1 <<'SQL'
+GRANT ALL ON SCHEMA public TO vestige;
+ALTER SCHEMA public OWNER TO vestige;
+ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON TABLES TO vestige;
+ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON SEQUENCES TO vestige;
+ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON FUNCTIONS TO vestige;
+SQL
+```
+
+Replace `CHANGE_ME` with a strong password and store it where Vestige can
+read it (typically `~/.vestige_pg_pw`, mode 600, owned by the user running
+`vestige-mcp`).
+
+### 2.3 Minimal `vestige.toml`
+
+```toml
+[storage]
+backend = "postgres"
+
+[storage.postgres]
+url = "postgresql://vestige:CHANGE_ME@127.0.0.1:5432/vestige"
+max_connections = 10
+```
+
+The `url` field accepts a `${VAR}` placeholder; in practice operators
+either inline the password or export `DATABASE_URL` and reference
+`url = "${DATABASE_URL}"`. See `docs/CONFIGURATION.md` for the full
+schema once Phase 3 lands.
+
+---
+
+## 3. First connect
+
+When `vestige-mcp` starts against an empty `vestige` database, it:
+
+1. Builds a `PgPool` of `max_connections` (default 10) connections.
+2. Runs every migration in `crates/vestige-core/migrations/postgres/`
+   in order. The bundled migrations are `0001_init` (tables, non-vector
+   indexes) and `0002_hnsw` (HNSW index on `knowledge_nodes.embedding`).
+3. Calls `register_model` once it knows the active embedder's dimension.
+   This issues `ALTER TABLE knowledge_nodes ALTER COLUMN embedding TYPE
+   vector($N)` and inserts a row into `embedding_model`.
+4. Begins accepting MCP requests.
+
+To verify after the first start:
+
+```sh
+sudo -u postgres psql -d vestige <<'SQL'
+-- All expected tables present.
+\dt
+-- embedding_model has exactly one row.
+SELECT name, dimension, hash FROM embedding_model;
+-- The HNSW index exists.
+SELECT indexname FROM pg_indexes
+  WHERE tablename = 'knowledge_nodes' AND indexname LIKE '%hnsw%';
+SQL
+```
+
+Expected: `knowledge_nodes`, `scheduling`, `edges`, `domains`, `review_events`,
+`embedding_model`, `users`, `groups`, `group_memberships`; one row in
+`embedding_model`; one `idx_knowledge_nodes_embedding_hnsw` index.
+
+If a migration fails mid-way, the partial state lands in
+`_sqlx_migrations`. See section 9 for recovery.
+
+---
+
+## 4. Connection pool tuning
+
+Defaults:
+
+- Vestige client pool: `max_connections = 10` per `vestige-mcp` instance.
+- Postgres server: `max_connections = 100` (default).
+
+Math: one MCP client with the default pool uses up to 10 server slots.
+Five concurrent MCP clients use up to 50 slots. The remaining 50 cover
+`psql` sessions, background workers, and headroom for replication or
+backup processes.
+
+When to raise:
+
+- More than three MCP clients connecting to one Postgres instance.
+- Long-running queries (above 500ms p99) showing pool wait time in
+  Vestige logs (look for `pool acquire timed out` warnings).
+- A noticeable number of concurrent dream/consolidation runs.
+
+How to raise:
+
+```toml
+[storage.postgres]
+max_connections = 20   # client side, per vestige-mcp instance
+```
+
+And on the Postgres server, edit `postgresql.conf`:
+
+```conf
+max_connections = 200
+shared_buffers = 2GB     # roughly 25 percent of RAM, never above 8GB
+```
+
+Then restart Postgres (`sudo systemctl restart postgresql`). Vestige
+clients pick up their own `max_connections` change on next restart.
+
+Do not raise pool sizes blindly. Past about 4x the CPU core count,
+Postgres throughput drops; a small connection pooler (PgBouncer in
+transaction mode) is the right answer above ~200 client connections,
+but Vestige's expected scale rarely needs that.
+
+---
+
+## 5. Backup discipline
+
+### 5.1 Which tables matter
+
+| Table | Backup priority | Regenerable? |
+|-------|-----------------|--------------|
+| `knowledge_nodes` | Critical | No |
+| `scheduling` | Critical | No (FSRS state) |
+| `embedding_model` | Critical | No (one row, but stamps the column type) |
+| `users`, `groups`, `group_memberships` | Critical | No (Phase 3 will populate) |
+| `review_events` | Important | Replayable by clients but tedious |
+| `edges` | Optional | Yes (recomputed by spreading activation) |
+| `domains` | Optional | Yes (Phase 4 recomputes by clustering) |
+
+For a typical single-operator install, dumping the whole database is
+fastest and simplest. Skip the optional tables only if dump size becomes
+a bandwidth problem.
+
+### 5.2 Full logical backup
+
+```sh
+pg_dump --host=127.0.0.1 --username=vestige --format=custom \
+        --file=vestige-$(date -u +%Y%m%dT%H%M%SZ).dump \
+        vestige
+```
+
+The custom format compresses by default and works with parallel restore.
+File size for 10k memories: roughly 80 MB.
+
+Frequency recommendations:
+
+- Daily for any installation with active ingest.
+- Before every `vestige migrate reembed` run (see section 6).
+- Before every Postgres major-version upgrade.
+- Retain at least 7 daily, 4 weekly, 3 monthly dumps. Compress with
+  `--format=custom` (already gzipped) and keep them on different
+  storage from the database itself.
+
+### 5.3 Restore
+
+To a fresh database:
+
+```sh
+sudo -u postgres createdb -O vestige vestige_restore
+pg_restore --host=127.0.0.1 --username=vestige --dbname=vestige_restore \
+           --jobs=4 vestige-20260301T030000Z.dump
+```
+
+To replace the live database (destructive; only after taking a fresh
+dump):
+
+```sh
+sudo systemctl stop vestige-mcp     # or however the service is run
+sudo -u postgres dropdb vestige
+sudo -u postgres createdb -O vestige vestige
+pg_restore --host=127.0.0.1 --username=vestige --dbname=vestige \
+           --jobs=4 vestige-20260301T030000Z.dump
+sudo systemctl start vestige-mcp
+```
+
+### 5.4 Restore drill
+
+Run a restore-to-throwaway-database every month and run `vestige search`
+or a manual `psql` count against it. A backup you have not restored is a
+backup you do not have.
+
+```sh
+sudo -u postgres createdb -O vestige vestige_restore_drill
+pg_restore --host=127.0.0.1 --username=vestige --dbname=vestige_restore_drill \
+           --jobs=4 vestige-latest.dump
+PGPASSWORD="$(cat ~/.vestige_pg_pw)" psql -h 127.0.0.1 -U vestige \
+  -d vestige_restore_drill \
+  -c 'SELECT count(*) FROM knowledge_nodes;'
+sudo -u postgres dropdb vestige_restore_drill
+```
+
+---
+
+## 6. Migration between embeddings
+
+Use `vestige migrate reembed` when:
+
+- Upgrading to a new embedding model that produces a different dimension
+  (for example, swapping from `nomic-embed-text-v1.5` 768D to a 1024D
+  model).
+- Switching providers and the model hash differs even at the same
+  dimension.
+
+What it does:
+
+1. Reads every row from `knowledge_nodes`, re-encodes the `content` column
+   through the new embedder, and writes the new vector back.
+2. Drops the HNSW index before the re-encode loop (this is the default;
+   `--concurrent-index` keeps it during the run at the cost of speed).
+3. Updates the `embedding_model` row with the new name, dimension, and
+   hash.
+4. Rebuilds the HNSW index with the new vectors.
+
+### 6.1 Before starting
+
+- Take a fresh backup (section 5.2). The tool refuses to start without a
+  `--yes` flag if it detects no recent backup; ignore at your peril.
+- Stop ingest. Vestige's MCP server can stay running for read-only
+  access, but pause any client that calls `smart_ingest` or
+  `update_scheduling`.
+- Have the new embedder model available locally. The CLI loads it
+  before the first row is touched; if loading fails, no data is changed.
+
+### 6.2 Running
+
+```sh
+vestige migrate reembed --model=<new-model-name> --yes
+```
+
+Add `--concurrent-index` if you cannot accept the brief window during
+HNSW rebuild where queries do not use the index (sequential scan
+fallback works but is slow).
+
+The tool prints a progress bar via `indicatif`. Expected throughput:
+roughly 200 memories per second per CPU core for a 768D ONNX model.
+10,000 memories on an 8-core box: about 6 seconds, plus HNSW rebuild
+(another 30-90 seconds at that scale).
+
+### 6.3 Verifying completion
+
+```sh
+sudo -u postgres psql -d vestige <<'SQL'
+-- Registry reflects the new model.
+SELECT name, dimension, hash FROM embedding_model;
+-- HNSW index is present and not partial.
+SELECT indexname, indexdef
+  FROM pg_indexes
+  WHERE tablename = 'knowledge_nodes' AND indexname LIKE '%hnsw%';
+-- All rows have a non-null embedding of the new dimension.
+SELECT count(*) FILTER (WHERE embedding IS NULL) AS missing,
+       count(*)                                  AS total
+  FROM knowledge_nodes;
+SQL
+```
+
+Expected: registry shows the new model name and dimension, one HNSW
+index, zero missing embeddings.
+
+### 6.4 Recovering from an interrupted run
+
+`vestige migrate reembed` is restartable. On interruption:
+
+- The `embedding_model` row may or may not have been updated. Check it
+  manually and roll forward by re-running with `--yes --resume` (the
+  tool detects the inconsistency and finishes the rows that still hold
+  old embeddings).
+- The HNSW index may be missing. Re-running the command rebuilds it as
+  its last step.
+- If the system is in a state the tool refuses to reason about, restore
+  from the backup taken in 6.1.
+
+---
+
+## 7. Re-clustering domains
+
+Domain clustering is owned by Phase 4
+(`docs/plans/0004-phase-4-emergent-domain-classification.md`). Until
+Phase 4 ships, the `domains` table is reserved schema and is populated
+only by tests. Operators must not invoke any domain re-clustering
+workflow manually; there is no supported one in Phase 2.
+
+When Phase 4 lands, this section is replaced with the real procedure.
+
+---
+
+## 8. Monitoring
+
+### 8.1 Quick health check
+
+```sh
+PGPASSWORD="$(cat ~/.vestige_pg_pw)" psql -h 127.0.0.1 -U vestige -d vestige <<'SQL'
+SELECT count(*) AS memory_count FROM knowledge_nodes;
+SELECT name, dimension FROM embedding_model;
+SELECT pg_size_pretty(pg_database_size('vestige')) AS db_size;
+SQL
+```
+
+### 8.2 In-flight queries
+
+```sql
+SELECT pid, now() - query_start AS runtime, state, query
+  FROM pg_stat_activity
+  WHERE datname = 'vestige' AND state <> 'idle'
+  ORDER BY runtime DESC NULLS LAST;
+```
+
+Anything over 5 seconds with `state = 'active'` deserves a look. HNSW
+search queries should land well under 100ms on properly-sized hardware.
+
+### 8.3 Query pattern analysis
+
+If `pg_stat_statements` is loaded (`shared_preload_libraries =
+'pg_stat_statements'` in `postgresql.conf`):
+
+```sql
+SELECT calls, mean_exec_time, query
+  FROM pg_stat_statements
+  WHERE query ILIKE '%knowledge_nodes%'
+  ORDER BY mean_exec_time DESC
+  LIMIT 20;
+```
+
+Look for hybrid-search queries that have drifted above 100ms p50. The
+usual culprit is a missing or half-built HNSW index.
+
+### 8.4 Index health
+
+```sql
+SELECT indexname, pg_size_pretty(pg_relation_size(indexrelid)) AS size,
+       idx_scan, idx_tup_read
+  FROM pg_indexes
+  JOIN pg_stat_user_indexes USING (indexrelid)
+  WHERE schemaname = 'public' AND relname = 'knowledge_nodes';
+```
+
+A HNSW index with `idx_scan = 0` after several hours of traffic usually
+means the planner is preferring sequential scan -- either the table is
+too small to bother with the index (fine) or the index is corrupt and
+needs rebuilding (`REINDEX INDEX idx_knowledge_nodes_embedding_hnsw;`).
+
+### 8.5 Spotting half-built HNSW
+
+After a failed migration or a crashed `reembed`:
+
+```sql
+SELECT indexname, indisvalid, indisready
+  FROM pg_indexes
+  JOIN pg_index ON indexrelid = (schemaname || '.' || indexname)::regclass
+  WHERE tablename = 'knowledge_nodes';
+```
+
+Any row with `indisvalid = false` is broken. Drop and recreate:
+
+```sql
+DROP INDEX IF EXISTS idx_knowledge_nodes_embedding_hnsw;
+CREATE INDEX idx_knowledge_nodes_embedding_hnsw
+  ON knowledge_nodes USING hnsw (embedding vector_cosine_ops);
+```
+
+---
+
+## 9. Troubleshooting
+
+| Symptom | Likely cause | Fix |
+|---------|--------------|-----|
+| `ERROR: extension "vector" is not available` on start | pgvector not installed for this Postgres major version | Install the distro package matching `pg_config --version`, then `CREATE EXTENSION vector;` as superuser |
+| `pool timed out while waiting for an open connection` in Vestige logs | Pool too small or stuck queries holding connections | Raise `max_connections` in `vestige.toml`; investigate `pg_stat_activity` for queries above 5s |
+| `vector dimensions do not match` on insert | `embedding_model` was stamped at one dimension and a different embedder is now running | Re-run `vestige migrate reembed --model=<correct>` or fix the embedder configuration |
+| Hybrid search returns the same row twice | Stale `.sqlx/` query cache from before D5 landed | Run `cargo sqlx prepare` in `crates/vestige-core/`, rebuild the binary |
+| `text search configuration "english" does not exist` | Postgres locale build does not include the english dictionary (rare on Alpine) | Install the language-pack or override the FTS language in `vestige.toml` (see `[storage.postgres.fts]` once Phase 2 D5 lands) |
+| `relation "_sqlx_migrations" exists, but migration X is in "applied" with no checksum` | Previous run died between `BEGIN` and `COMMIT` | Stop Vestige, restore from backup, restart |
+| HNSW index very large compared to data | `m` and `ef_construction` defaults too high for the corpus | Acceptable for now; tuning lands as part of Phase 4 |
+| `permission denied for schema public` on a new install | `vestige` role does not own `public` | Re-run the grants block in section 2.2 as `postgres` |
+
+If a problem is not in this table, capture: PostgreSQL log
+(`/var/log/postgres/`, journalctl `-u postgresql`), Vestige log
+(`RUST_LOG=debug,sqlx=info` for a fresh run), the migration state
+(`SELECT * FROM _sqlx_migrations ORDER BY version;`), and file a bug.
+
+---
+
+## 10. Rollback caveats
+
+Every migration in `crates/vestige-core/migrations/postgres/` has a
+matching `*.down.sql`. `sqlx migrate revert` walks them in reverse order.
+
+This is not the same as risk-free. The `0002_hnsw.down.sql` drops the
+HNSW index (rebuildable, expensive). The `0001_init.down.sql` drops
+every table -- including `knowledge_nodes`, including data. Down migrations
+exist for development, not for casual production use.
+
+Before applying any new migration:
+
+1. Take a backup (section 5.2).
+2. Run the migration on a restored copy first if you can afford the time.
+3. Read the new migration's `*.up.sql` and `*.down.sql` to understand
+   what changes.
+
+To revert one migration manually:
+
+```sh
+sqlx migrate revert \
+  --database-url "postgresql://vestige:...@127.0.0.1:5432/vestige" \
+  --source crates/vestige-core/migrations/postgres
+```
+
+Note that Vestige's binary does not run `sqlx migrate revert`
+automatically. Reverts are always an explicit operator decision.
+
+If a revert fails partway through, treat the database as inconsistent:
+restore from the backup taken in step 1.
+```
+
+---
+
+## Cross-references
+
+- `docs/adr/0001-pluggable-storage-and-network-access.md` -- ADR that
+  established the pluggable backend.
+- `docs/adr/0002-phase-2-execution.md` -- ADR settling Phase 2 execution
+  decisions; section "Architecture Overview" lists every table the
+  runbook references.
+- `docs/plans/0002-phase-2-postgres-backend.md` -- master plan; D16
+  (deliverables list) and the Open Implementation Questions section
+  (especially Q4 HNSW rebuild and Q5 pool sizing) inform the runbook's
+  recommendations.
+- `docs/plans/local-dev-postgres-setup.md` -- developer-facing recipe
+  for a one-machine Arch / CachyOS dev cluster. The runbook links to it
+  as the "for development, see" pointer.
+- `docs/CONFIGURATION.md` -- existing config doc; section 4 of the
+  runbook ("Connection pool tuning") cross-references it for the
+  authoritative `vestige.toml` schema.
+
+---
+
+## Verification
+
+A reviewer is given:
+
+- A fresh Linux VM (Debian 12 or Arch current; both must work) with
+  network access and no Postgres installed.
+- A built `vestige-mcp` binary for that platform.
+- The runbook (`docs/runbook/postgres.md`).
+
+The reviewer follows the runbook top to bottom and reaches a state in
+which Vestige answers MCP requests against the Postgres backend.
+Checkpoints, in order:
+
+1. After section 1 (Prerequisites): `pg_config --version` returns 16 or
+   newer; `pkg-config --modversion libpq` resolves; the `pgvector`
+   distro package is installed.
+2. After section 2.1 (Extensions): two rows in
+   `SELECT extname FROM pg_extension WHERE extname IN ('vector', 'pgcrypto');`.
+3. After section 2.2 (Role + DB): `psql -U vestige -h 127.0.0.1 -d vestige -c '\conninfo'`
+   succeeds.
+4. After section 2.3 (Config): `vestige.toml` parses (test by
+   `vestige config print` once that subcommand lands, otherwise
+   `vestige-mcp --check-config`).
+5. After section 3 (First connect): the eight expected tables are
+   present; `embedding_model` has exactly one row; the HNSW index
+   exists; `vestige-mcp` log shows "Postgres backend ready".
+6. After section 5.2 (Backup): the dump file exists and `pg_restore -l`
+   on it lists the expected tables.
+7. After section 5.4 (Restore drill): the drill database holds the same
+   row count as the source.
+
+If any checkpoint fails, the runbook section that produced the failure
+is the one that needs revision. Capture the exact command, exit code,
+and log line; revise the runbook in a follow-up PR.
+
+A second reviewer reads the runbook without executing it and checks for:
+
+- ASCII only; no em dashes, no curly quotes, no Unicode arrows, no
+  ellipses, no bullets (`*`/`-` ASCII only).
+- Every section number from 1 to 10 present and in order.
+- Every cross-reference resolves to an existing file or to a Phase
+  number explicitly marked as "future".
+- No code block longer than 30 lines; if longer, it should be split or
+  referenced from another file.
+
+---
+
+## Acceptance criteria
+
+- [ ] `docs/runbook/` directory exists.
+- [ ] `docs/runbook/postgres.md` exists and matches the inlined body
+      above byte-for-byte after stripping the outer code fence used in
+      this sub-plan to embed it.
+- [ ] All ten sections from the "Runbook structure" outline are present
+      under their stated headings.
+- [ ] No file other than `docs/runbook/postgres.md` is created or
+      modified by executing this sub-plan.
+- [ ] ASCII only: no em dashes, no curly quotes, no Unicode arrows,
+      no ellipses, no Unicode bullets (`grep -P '[^\x00-\x7F]'
+      docs/runbook/postgres.md` returns no matches).
+- [ ] Every cross-reference in the runbook points at a file that exists
+      in the repository at the time of merge, OR is explicitly framed
+      as "future Phase N" with a pointer to the relevant plan document.
+- [ ] Every command block is copy-pastable: no `<placeholder>` syntax
+      that does not also have an inline note describing what to
+      substitute.
+- [ ] A second pair of eyes confirms the verification checkpoints in the
+      preceding section are reproducible.
+- [ ] The runbook is no longer than the inlined body in this sub-plan;
+      operators reach the end without losing patience.