docs: ADR 0001 + Phase 1-4 implementation plans

Pluggable storage backend, network access, and emergent domain classification. Introduces MemoryStore + Embedder traits, PgMemoryStore alongside SqliteMemoryStore, HTTP MCP + API key auth, and HDBSCAN-based domain clustering. Phase 5 federation deferred to a follow-up ADR.

- docs/adr/0001-pluggable-storage-and-network-access.md -- Accepted
- docs/plans/0001-phase-1-storage-trait-extraction.md
- docs/plans/0002-phase-2-postgres-backend.md
- docs/plans/0003-phase-3-network-access.md
- docs/plans/0004-phase-4-emergent-domain-classification.md
- docs/prd/001-getting-centralized-vestige.md -- source RFC
2026-07-14 22:52:11 +02:00 · 2026-04-21 20:29:40 +02:00 · 0d273c5641
commit 0d273c5641
parent 2391acf480
6 changed files with 5667 additions and 0 deletions
--- a/docs/plans/0001-phase-1-storage-trait-extraction.md
+++ b/docs/plans/0001-phase-1-storage-trait-extraction.md
--- a/docs/plans/0002-phase-2-postgres-backend.md
+++ b/docs/plans/0002-phase-2-postgres-backend.md
--- a/docs/plans/0003-phase-3-network-access.md
+++ b/docs/plans/0003-phase-3-network-access.md
--- a/docs/plans/0004-phase-4-emergent-domain-classification.md
+++ b/docs/plans/0004-phase-4-emergent-domain-classification.md
@@ -0,0 +1,883 @@
+# Phase 4 Plan: Emergent Domain Classification
+
+**Status**: Draft
+**Depends on**: Phase 1 (domain columns on memories, `Domain` struct + `DomainStore` methods on `MemoryStore`, `Embedder` trait), Phase 2 (Postgres JSONB + TEXT[] support for domain fields, `embedding_model` registry parity), Phase 3 (Axum HTTP server, REST `/api/v1/` scaffolding, API key auth middleware, signed dashboard session cookies)
+**Related**: docs/adr/0001-pluggable-storage-and-network-access.md (Phase 4), docs/prd/001-getting-centralized-vestige.md (Emergent Domain Model)
+
+---
+
+## Scope
+
+### In scope
+
+- `DomainClassifier` cognitive module under `crates/vestige-core/src/neuroscience/domain_classifier.rs`, alongside existing neuroscience modules (spreading_activation, synaptic_tagging, ...).
+- HDBSCAN discovery pipeline using the `hdbscan` crate (v0.10): load all embeddings, cluster, extract centroids, extract top-terms via TF-IDF over cluster members, persist via the trait's `DomainStore` methods.
+- Soft-assignment pipeline: for each memory, compute `cosine_similarity(memory.embedding, domain.centroid)` for every domain, store raw scores in `domain_scores` JSONB, threshold into `domains[]` using `assign_threshold` (default 0.65).
+- Automatic classification on ingest: run through `CognitiveEngine` / `smart_ingest` so new memories get classified against existing centroids immediately; skip when `domain_count == 0` (Phase 0 accumulation).
+- Re-cluster hook in dream consolidation: every Nth four-phase dream cycle (N=5 default) triggers a discovery pass and generates proposals (split / merge / none). Proposals land in a new `domain_proposals` table, surface in the dashboard, and are never auto-applied (conservative drift, ADR Q7).
+- Context signals: `SignalSource` trait with `GitRepoSignal` (detects `.git` in CWD or `metadata.cwd`) and `IdeHintSignal` (reads `metadata.editor` / `metadata.ide`). Each returns a `boost_map` of `domain_id -> additive delta` (typical +0.05). Injected as a `signal_boost: Option<HashMap<String, f64>>` parameter into `DomainClassifier::classify`.
+- Cross-domain spreading activation decay: `ActivationNetwork` traversal multiplies the edge's effective weight by `cross_domain_decay` (default 0.5) when `source.domains` and `target.domains` are both non-empty and disjoint; an unclassified node on either side incurs no penalty. Strict "no overlap" policy, not graded.
+- CLI subcommands (in `crates/vestige-mcp/src/bin/cli.rs`, under a new `Domains` command group): `list`, `discover [--min-cluster-size N] [--force]`, `rename <id> <new_label>`, `merge <a> <b> [--into <id>]`. Human-readable tables on stdout; JSON via `--json`.
+- Dashboard UI additions (`apps/dashboard/src/routes/(app)/domains/`): list page, per-domain detail (memories, centroid top_terms, score histogram, proposal review controls).
+- REST endpoints under `/api/v1/domains` (introduced by Phase 3 skeleton, implemented in Phase 4): list, discover, rename, merge, proposal list / accept / reject.
+- Config additions: `[domains]` section in `vestige.toml` covering `assign_threshold`, `recluster_interval`, `min_cluster_size`, `cross_domain_decay`, `discovery_threshold`, `merge_threshold`, `signal_boost` (per-signal toggle).
+
+### Out of scope
+
+- Phase 5 federation (explicit separate ADR). Domain centroids are installation-local; no sync.
+- Learned re-weighting of domain scores (future, only if retrieval-quality metrics show a need).
+- Interactive cluster-membership editing in the UI (drag-and-drop reassign) -- future enhancement.
+- Multi-user domain namespaces. One domain set per installation; API keys that carry `domain_filter` just restrict access, they do not create namespaces.
+- Auto-sweep of `min_cluster_size` / auto-tuned `assign_threshold` (ADR resolution Q6 + Q9: static defaults, user tunes).
+- Graded cross-domain decay (`|A intersect B| / max(|A|,|B|)`) -- strict "no overlap" is the Phase 4 rule.
+
+---
+
+## Prerequisites
+
+Artifacts that Phases 1-3 are expected to have landed:
+
+- In `vestige-core`:
+  - `Embedder` trait (`crates/vestige-core/src/embedder/`).
+  - `MemoryStore` trait (`crates/vestige-core/src/storage/trait.rs` or similar) including `DomainStore` methods: `list_domains`, `get_domain`, `upsert_domain`, `delete_domain`, `classify(&[f32]) -> Vec<(String, f64)>`, plus a bulk accessor such as `all_embeddings()` (already present in sqlite.rs as `get_all_embeddings`) and a `get_all_memories_with_embeddings()` iterator for discovery. The trait must expose a method to batch-update `(domains, domain_scores)` for a memory id.
+  - `Domain` struct: `{ id: String, label: String, centroid: Vec<f32>, top_terms: Vec<String>, memory_count: usize, created_at: DateTime<Utc> }`.
+  - Columns on memories in both SQLite and Postgres: `domains TEXT[]` (or JSON array on SQLite) and `domain_scores JSONB` (or TEXT JSON on SQLite).
+  - The `domains` table in both backends (see PRD schema sketch).
+- In `vestige-mcp`:
+  - Axum `/api/v1/` router prefix with auth middleware.
+  - CLI skeleton (`bin/cli.rs`) using `clap`; Phase 4 adds a `Domains` subcommand tree.
+  - REST handlers file structure ready under `crates/vestige-mcp/src/dashboard/handlers.rs` (legacy) and a dedicated REST handler under `/api/v1/`; Phase 4 adds `domains.rs` handler module.
+  - SvelteKit dashboard (`apps/dashboard/`) with existing `(app)/memories`, `(app)/timeline`, `(app)/stats`, etc. Phase 4 adds `(app)/domains/`.
+
+New workspace crate additions required (added manually to `Cargo.toml`, since `cargo add` is not run from the plan):
+
+- `hdbscan = "0.10"` in `crates/vestige-core/Cargo.toml` (feature-gated behind `domain-classification`).
+- Optional: a lightweight stop-word constant inline; no external stop-word crate -- the neuroscience modules already do tokenization on whitespace + length>3 (see `dreams.rs::content_similarity`). Reuse that style; no `ndarray` needed because `hdbscan` v0.10 accepts `&[Vec<f32>]` directly (verified from PRD snippet).
+- No new deps in `vestige-mcp` for Phase 4 -- CLI reuses `clap` / `colored` / `comfy-table` if already present, otherwise a hand-rolled padded print. We pick hand-rolled to avoid adding a table crate; this matches the existing style of `run_stats` in `cli.rs`.
+
+Test fixtures:
+
+- A JSON seed corpus checked into `tests/phase_4/fixtures/seed_500.json` containing >= 500 memories drawn from three plausible clusters. A builder function `tests/phase_4/support/fixtures.rs::build_seed_corpus()` deterministically generates or loads this corpus. Each record has `content`, `tags`, `embedding` (768D bge-base-en-v1.5; use a committed vector or a deterministic mock embedder in tests). For deterministic tests we fake embeddings by hashing content -- acceptable as long as the fake preserves cluster separability (prefix-based: "DEV-...", "INFRA-...", "HOME-..." seeds three Gaussian blobs).
+- Reuse `Embedder` mock from Phase 1 tests (`MockEmbedder`) for discovery tests that need real cosine similarity.
+- A minimal git-repo fixture created in a tempdir (`tempfile::tempdir` + `std::process::Command::new("git").arg("init")`) for context-signal tests.
+
+---
+
+## Deliverables
+
+1. `DomainClassifier` cognitive module: struct, defaults, `classify`, `classify_with_boost`, `reassign_all`, `discover`.
+2. `compute_top_terms` helper (TF-IDF over cluster members, returning `top_k` terms).
+3. `cli domains discover` subcommand.
+4. `cli domains list` / `rename` / `merge` subcommands.
+5. Auto-classify hook on ingest (wired into the cognitive engine's ingest pipeline before persistence).
+6. Re-cluster hook in dream consolidation (`DreamEngine::run` orchestrator gets an optional `DreamReClusterHook`; triggers every Nth dream).
+7. Context signal extractor module (`crates/vestige-core/src/neuroscience/context_signals.rs`) with `SignalSource` trait + `GitRepoSignal` + `IdeHintSignal`.
+8. Cross-domain spreading activation decay in `ActivationNetwork::activate` (config-driven).
+9. `vestige.toml` `[domains]` section + defaults loader.
+10. Dashboard UI: SvelteKit routes `(app)/domains/+page.svelte` (list), `(app)/domains/[id]/+page.svelte` (detail), `(app)/domains/proposals/+page.svelte` (review).
+11. REST endpoints under `/api/v1/domains` + `/api/v1/domains/proposals`.
+12. `domain_proposals` table + migration + `DomainProposal` trait methods on `MemoryStore`.
+13. WebSocket event `VestigeEvent::DomainProposalCreated` so the dashboard gets a live notification after a re-cluster fires.
+
+---
+
+## Detailed Task Breakdown
+
+### 1. `DomainClassifier` cognitive module
+
+**File**: `crates/vestige-core/src/neuroscience/domain_classifier.rs`
+**Export**: in `crates/vestige-core/src/neuroscience/mod.rs`, add `pub mod domain_classifier;` and re-export `pub use domain_classifier::{DomainClassifier, ClassificationResult, DomainProposal, ProposalKind};`
+**Deps**: `hdbscan = "0.10"`, `serde`, `serde_json`, `chrono`, `tracing`, existing `crate::storage::Domain`, `crate::storage::MemoryStore` trait.
+
+Struct and defaults (match PRD exactly):
+
+```rust
+pub struct DomainClassifier {
+    pub assign_threshold: f64,      // default 0.65
+    pub discovery_threshold: usize, // default 150
+    pub recluster_interval: usize,  // default 5 (every 5th dream)
+    pub min_cluster_size: usize,    // default 10
+    pub min_samples: usize,         // default 5 (HDBSCAN)
+    pub cross_domain_decay: f64,    // default 0.5
+    pub merge_threshold: f64,       // default 0.90 (centroid cosine)
+    pub top_terms_k: usize,         // default 10
+}
+
+impl Default for DomainClassifier { ... }
+```
+
+Result types:
+
+```rust
+#[derive(Debug, Clone)]
+pub struct ClassificationResult {
+    pub scores: HashMap<String, f64>, // raw per-domain similarities
+    pub domains: Vec<String>,         // above assign_threshold
+}
+
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub enum ProposalKind {
+    Split { parent: String, children: Vec<String> },
+    Merge { targets: Vec<String>, suggested_label: String },
+    NewCluster { top_terms: Vec<String> },
+}
+
+#[derive(Debug, Clone)]
+pub struct DomainProposal {
+    pub id: String,                 // uuid v4
+    pub kind: ProposalKind,
+    pub rationale: String,
+    pub confidence: f64,
+    pub created_at: DateTime<Utc>,
+    pub status: ProposalStatus,     // Pending | Accepted | Rejected
+}
+```
+
+Key methods (all pure where possible; all pub):
+
+```rust
+impl DomainClassifier {
+    pub fn classify(&self, embedding: &[f32], domains: &[Domain]) -> ClassificationResult;
+
+    pub fn classify_with_boost(
+        &self,
+        embedding: &[f32],
+        domains: &[Domain],
+        boost: Option<&HashMap<String, f64>>,
+    ) -> ClassificationResult;
+
+    pub async fn reassign_all(
+        &self,
+        store: &dyn MemoryStore,
+        domains: &[Domain],
+    ) -> Result<usize, StorageError>;
+
+    pub async fn discover(
+        &self,
+        store: &dyn MemoryStore,
+    ) -> Result<Vec<Domain>, StorageError>;
+
+    pub async fn propose_changes(
+        &self,
+        store: &dyn MemoryStore,
+        existing: &[Domain],
+        newly_discovered: &[Domain],
+    ) -> Result<Vec<DomainProposal>, StorageError>;
+
+    pub async fn apply_proposal(
+        &self,
+        store: &dyn MemoryStore,
+        proposal: &DomainProposal,
+    ) -> Result<(), StorageError>;
+}
+```
+
+Behavior notes:
+
+- `classify` returns an empty result `{ scores: {}, domains: [] }` when `domains.is_empty()` (accumulation phase). This matches the PRD snippet verbatim.
+- `classify_with_boost` adds the boost delta to each score after the cosine computation and before thresholding, then clamps the result to `[0.0, 1.0]`. Boost keys not present in `domains` are ignored.
+- `reassign_all` streams memories in batches of 500 (iterator on the store) to keep memory bounded; for each memory it issues a single `UPDATE memories SET domains = ?, domain_scores = ? WHERE id = ?` call. Returns the count of memories whose `domains` vector actually changed.
+- `discover` loads all `(id, embedding)` pairs via an `all_embeddings()` method on the store (exists under `#[cfg(all(feature = "embeddings", feature = "vector-search"))]` in `sqlite.rs::get_all_embeddings`; Phase 1 should promote this onto the trait -- if not yet promoted, add the method). Then:
+  1. Build `Vec<Vec<f32>>` and index -> id map.
+  2. `Hdbscan::new(&embeddings, HdbscanHyperParams::builder().min_cluster_size(self.min_cluster_size).min_samples(self.min_samples).build())` (`Hdbscan::default_hyper_params(&embeddings)` takes no overrides; the exact builder depends on the hdbscan 0.10 surface; see Open Question).
+  3. `let labels = clusterer.cluster()?;`
+  4. `let centers = clusterer.calc_centers(Center::Centroid, &labels)?;`
+  5. Group indices by label ignoring -1 (noise). For each cluster compute `top_terms` via `compute_top_terms`.
+  6. Preserve stable IDs where possible: match each new cluster centroid to the closest existing domain by cosine; if similarity > 0.85, reuse the existing domain id + label. Otherwise generate a fresh id `cluster_{n}` with a label derived from the first 2 terms.
+  7. Upsert all resulting `Domain`s via the store.
+- `propose_changes` compares old vs new clusters:
+  - **Split**: an old domain that best-matches two or more new domains each with >= `min_cluster_size` members. Rationale: "domain `dev` is now 2 clusters of >=10 memories: `systems` and `networking`".
+  - **Merge**: two old domains whose centroids now satisfy `cosine > merge_threshold` get a merge proposal.
+  - **NewCluster**: a new cluster that doesn't match any old domain above 0.85 similarity.
+- `apply_proposal` runs the split or merge against the store (reassign memberships via `reassign_all`), then marks the proposal `Accepted`. It never runs automatically -- only via the CLI or dashboard.
+
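Step 6 of `discover` (stable-ID reuse) could look roughly like the sketch below. It is illustrative only: `match_stable_ids`, the `(id, centroid)` tuples standing in for `Domain`, and the greedy "each existing id is reused at most once" rule are assumptions. The plan itself fixes only the 0.85 cutoff and the `cluster_{n}` fallback id.

```rust
/// Cosine similarity; 0.0 on dimension mismatch or a zero-norm input.
fn cosine(a: &[f32], b: &[f32]) -> f64 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f64 = a.iter().zip(b).map(|(x, y)| *x as f64 * *y as f64).sum();
    let na: f64 = a.iter().map(|x| (*x as f64).powi(2)).sum::<f64>().sqrt();
    let nb: f64 = b.iter().map(|x| (*x as f64).powi(2)).sum::<f64>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// For each new centroid, reuse the id of the closest still-unclaimed existing
/// domain when cosine > `reuse_cutoff`; otherwise mint `cluster_{n}`.
/// Assumption: matching is greedy, in input order.
fn match_stable_ids(
    existing: &[(String, Vec<f32>)],
    new_centroids: &[Vec<f32>],
    reuse_cutoff: f64,
) -> Vec<String> {
    let mut taken = vec![false; existing.len()];
    let mut ids = Vec::with_capacity(new_centroids.len());
    for (n, centroid) in new_centroids.iter().enumerate() {
        let best = existing
            .iter()
            .enumerate()
            .filter(|(i, _)| !taken[*i])
            .map(|(i, (_, c))| (i, cosine(centroid, c)))
            .max_by(|a, b| a.1.total_cmp(&b.1));
        match best {
            Some((i, sim)) if sim > reuse_cutoff => {
                taken[i] = true;
                ids.push(existing[i].0.clone());
            }
            _ => ids.push(format!("cluster_{n}")),
        }
    }
    ids
}
```

A cluster that drifts only slightly keeps its id (and so its user-chosen label), while a genuinely new cluster gets a fresh id that `propose_changes` can surface as `NewCluster`.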
+Helper:
+
+```rust
+fn compute_top_terms(documents: &[&str], k: usize) -> Vec<String>;
+```
+
+Uses TF-IDF: term frequencies come from the passed-in `documents` slice (the cluster members), while IDF is taken over the whole corpus as described in task 2. Tokenization = whitespace split, lowercase, strip non-alphanumeric, drop tokens shorter than 4 chars and a small built-in stop-word list (`the`, `and`, `for`, `that`, `with`, ...). Matches the tokenizer used in `dreams.rs::content_similarity` and `dreams.rs::extract_patterns` so behavior is predictable.
+
+Cosine similarity helper:
+
+```rust
+fn cosine_similarity(a: &[f32], b: &[f32]) -> f64;
+```
+
+Keep the existing crate-level `cosine_similarity` if already present (check `embeddings::` or `search::`); otherwise add a private one. It returns 0.0 on dimension mismatch; a panic here would be a bug.
+
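As a rough, std-only sketch of the soft-assignment rule described in the behavior notes: cosine per domain, add the optional boost, clamp to `[0.0, 1.0]`, then threshold. The free-function form, the `(id, centroid)` tuples standing in for `Domain`, and the use of `>=` for "above `assign_threshold`" are assumptions made for illustration, not the module's final API.

```rust
use std::collections::HashMap;

/// Cosine similarity; returns 0.0 on dimension mismatch or a zero-norm input.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f64 {
    if a.len() != b.len() || a.is_empty() {
        return 0.0;
    }
    let (mut dot, mut na, mut nb) = (0.0f64, 0.0f64, 0.0f64);
    for (x, y) in a.iter().zip(b) {
        let (x, y) = (*x as f64, *y as f64);
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na.sqrt() * nb.sqrt()) }
}

/// Returns (per-domain scores after boost and clamp, ids at or above the threshold).
fn classify_with_boost(
    embedding: &[f32],
    centroids: &[(String, Vec<f32>)], // hypothetical stand-in for &[Domain]
    boost: Option<&HashMap<String, f64>>,
    assign_threshold: f64,
) -> (HashMap<String, f64>, Vec<String>) {
    let mut scores = HashMap::new();
    let mut domains = Vec::new();
    for (id, centroid) in centroids {
        // Boost keys that match no domain are never looked up, so they are ignored.
        let delta = boost.and_then(|b| b.get(id)).copied().unwrap_or(0.0);
        let score = (cosine_similarity(embedding, centroid) + delta).clamp(0.0, 1.0);
        if score >= assign_threshold {
            domains.push(id.clone());
        }
        scores.insert(id.clone(), score);
    }
    domains.sort(); // deterministic order for storage and tests
    (scores, domains)
}
```

With an empty `centroids` slice both return values are empty, which is the accumulation-phase behavior. A memory scoring 0.60 against `dev` stays out of it at the default threshold of 0.65, but a +0.10 boost brings it in.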
+### 2. Top-terms computation helper
+
+**File**: same module, private section.
+
+- `fn tokenize(text: &str) -> Vec<String>`: lowercase, split on non-alphanumeric, filter len >= 4, drop stop-words.
+- `fn tfidf_top_k(docs: &[&str], k: usize) -> Vec<String>`:
+  1. `tf[doc_idx][term] = count / total_terms`.
+  2. `df[term] = number of docs containing term`.
+  3. `idf[term] = log((N + 1) / (df[term] + 1)) + 1` (smoothed).
+  4. For each term, average `tf` across docs in the cluster; multiply by `idf`; sort desc; return top `k`.
+
+Cluster top-terms are computed over cluster members only, with IDF over the **whole corpus** (all memory contents), not the cluster, so common words get penalized globally. Recompute global IDF once per `discover` call.
+
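The following std-only sketch shows one way the two pieces above could fit together: a global smoothed IDF computed once, then mean TF over the cluster members. The function `global_idf`, the extra `idf` parameter on `tfidf_top_k`, the alphabetical tie-break, and the five-word stop list are assumptions for illustration; the plan's signatures take only `docs` and `k`.

```rust
use std::collections::{HashMap, HashSet};

// Illustrative stop list; the real list is longer.
const STOP_WORDS: &[&str] = &["the", "and", "for", "that", "with"];

/// Lowercase, split on non-alphanumeric, keep tokens of length >= 4 that are not stop-words.
fn tokenize(text: &str) -> Vec<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| t.chars().count() >= 4 && !STOP_WORDS.iter().any(|s| *s == *t))
        .map(str::to_string)
        .collect()
}

/// Smoothed IDF over the whole corpus: ln((N + 1) / (df + 1)) + 1.
fn global_idf(corpus: &[&str]) -> HashMap<String, f64> {
    let n = corpus.len() as f64;
    let mut df: HashMap<String, f64> = HashMap::new();
    for doc in corpus {
        let unique: HashSet<String> = tokenize(doc).into_iter().collect();
        for term in unique {
            *df.entry(term).or_insert(0.0) += 1.0;
        }
    }
    df.into_iter()
        .map(|(term, d)| (term, ((n + 1.0) / (d + 1.0)).ln() + 1.0))
        .collect()
}

/// Mean per-document TF over the cluster members, weighted by the global IDF.
fn tfidf_top_k(cluster_docs: &[&str], idf: &HashMap<String, f64>, k: usize) -> Vec<String> {
    let mut tf_sum: HashMap<String, f64> = HashMap::new();
    for doc in cluster_docs {
        let tokens = tokenize(doc);
        let total = tokens.len() as f64;
        for t in tokens {
            *tf_sum.entry(t).or_insert(0.0) += 1.0 / total;
        }
    }
    let n = cluster_docs.len().max(1) as f64;
    let mut scored: Vec<(String, f64)> = tf_sum
        .into_iter()
        .map(|(t, tf)| {
            let w = idf.get(&t).copied().unwrap_or(1.0);
            (t, tf / n * w)
        })
        .collect();
    // Highest score first; ties broken alphabetically so output is deterministic.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    scored.into_iter().take(k).map(|(t, _)| t).collect()
}
```

In a three-document corpus where `rust` appears everywhere, `rust` gets the minimum IDF of 1.0, so a cluster of two Rust documents ranks `async` and `tokio` above it. That is the global penalty on common words the plan asks for.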
+### 3. CLI subcommand: `vestige domains discover`
+
+**File**: `crates/vestige-mcp/src/bin/cli.rs`
+
+Add to `enum Commands`:
+
+```rust
+/// Emergent domain management
+Domains {
+    #[command(subcommand)]
+    action: DomainAction,
+},
+```
+
+```rust
+#[derive(clap::Subcommand)]
+enum DomainAction {
+    /// List all discovered domains
+    List {
+        #[arg(long)] json: bool,
+    },
+    /// Run HDBSCAN discovery on all embeddings and propose domains
+    Discover {
+        #[arg(long, default_value_t = 10)] min_cluster_size: usize,
+        /// Skip the proposal flow and write new domains directly (first-time use)
+        #[arg(long)] force: bool,
+        #[arg(long)] json: bool,
+    },
+    /// Rename a domain (by id)
+    Rename {
+        id: String,
+        new_label: String,
+    },
+    /// Merge two domains
+    Merge {
+        a: String,
+        b: String,
+        #[arg(long)] into: Option<String>, // default: `a`
+    },
+}
+```
+
+Handler plumbing lives in `run_domains(action)` dispatching to `run_domains_list`, `run_domains_discover`, `run_domains_rename`, `run_domains_merge`. Each opens the default `Storage`, constructs a `DomainClassifier::default()`, and invokes the appropriate method.
+
+Output format for `list`:
+
+```
+ID              LABEL              MEMORIES    TOP TERMS
+dev             Development        87          rust, trait, async, tokio, zinit
+infra           Infrastructure     47          bgp, sonic, vlan, frr, peering
+home            Home               31          solar, kwh, battery, pool, esphome
+(unclassified)                     12
+```
+
+Produced via plain `println!` with `{:<15} {:<18} {:<10} {}` style padding (Rust format specifiers, not printf-style). `--json` emits `serde_json::to_string_pretty(&domains)`.
+
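A hand-rolled row formatter along these lines would be enough for the `list` table. The helper names `format_header` and `format_row` are hypothetical, and the column widths simply follow the padding spec quoted above.

```rust
/// Header line of the `domains list` table.
fn format_header() -> String {
    format!("{:<15} {:<18} {:<10} {}", "ID", "LABEL", "MEMORIES", "TOP TERMS")
}

/// One left-aligned, space-padded row; trailing padding is trimmed so rows
/// without top terms (such as the unclassified line) carry no dangling spaces.
fn format_row(id: &str, label: &str, memories: usize, top_terms: &[&str]) -> String {
    format!("{:<15} {:<18} {:<10} {}", id, label, memories, top_terms.join(", "))
        .trim_end()
        .to_string()
}
```

The CLI would print `format_header()` once and then one `format_row(...)` per domain, finishing with a row such as `format_row("(unclassified)", "", 12, &[])`.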
+Output format for `discover` with `--force`:
+
+```
+HDBSCAN: 500 embeddings, min_cluster_size=10, min_samples=5
+Found 3 clusters (ignoring 14 noise points)
+  cluster_0 (N=47)  top: bgp, sonic, vlan, frr, peering
+  cluster_1 (N=31)  top: solar, kwh, battery, pool, esphome
+  cluster_2 (N=22)  top: rust, trait, async, tokio, zinit
+
+Writing 3 domains to the store...
+Soft-assigning 500 memories against centroids...
+  multi-domain: 43
+  single-domain: 412
+  unclassified (below threshold 0.65): 45
+Done in 7.4s.
+```
+
+Output format for `discover` without `--force` (post-Phase-0):
+
+```
+HDBSCAN: 623 embeddings, min_cluster_size=10
+Comparing to existing 3 domains...
+
+Proposals (pending, accept via dashboard or `vestige domains proposals`):
+  [split] dev -> (systems:34, networking:28)    confidence 0.82
+  [new]   cluster_5 (books, novels, reading)    confidence 0.71
+
+Run `vestige domains proposals` to review, or open the dashboard.
+```
+
+### 4. CLI: `list`, `rename`, `merge`
+
+- `list`: calls `store.list_domains()`, fetches unclassified count via `store.count_memories_without_domains()` (Phase 1 should have provided this; if not, Phase 4 adds it to the trait and both backends).
+- `rename`: `store.get_domain(id)` -> mutate `label` -> `store.upsert_domain`. No memory touch.
+- `merge`: load both, compute blended centroid (weighted by `memory_count`), merge `top_terms` (union, recompute TF-IDF rank if both sides share the corpus), delete the non-`into` domain, call `reassign_all`. Wrapped in a transaction on Postgres; on SQLite rely on the existing writer-lock pattern.
+
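The blended centroid used by `merge` is a count-weighted mean. A minimal sketch, assuming a hypothetical free function `blend_centroids` and that both centroids share one embedding dimension:

```rust
/// Weighted mean of two centroids, weighted by each domain's `memory_count`.
/// Returns None when the dimensions differ or both domains are empty.
fn blend_centroids(a: &[f32], a_count: usize, b: &[f32], b_count: usize) -> Option<Vec<f32>> {
    let total = a_count + b_count;
    if a.len() != b.len() || total == 0 {
        return None;
    }
    let wa = a_count as f32 / total as f32;
    let wb = b_count as f32 / total as f32;
    Some(a.iter().zip(b).map(|(x, y)| wa * x + wb * y).collect())
}
```

The result is not re-normalized to unit length. That is harmless for classification, since cosine similarity ignores vector magnitude, but a store that assumes unit-norm centroids would need an extra normalization step.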
+### 5. Auto-classify on ingest
+
+**File**: `crates/vestige-core/src/cognitive.rs` (or equivalent ingest entry in `vestige-mcp/src/tools/smart_ingest.rs`).
+
+Integration point: just before the record is persisted in the smart-ingest path, after the embedder has produced `embedding` and before `storage.insert(...)`. Trace the current call site -- today `Storage::ingest(IngestInput)` computes embedding inside storage; in Phase 1 the embedder becomes external (ADR decision Q2), so classification can hook right there in the cognitive engine.
+
+Pseudocode:
+
+```rust
+let embedding = embedder.embed(&input.content).await?;
+let domains = store.list_domains().await?;
+
+let (domains_assigned, domain_scores) = if domains.is_empty() {
+    (Vec::new(), HashMap::new())
+} else {
+    let boost = context_signals.gather_boost(&input.metadata, &domains);
+    let result = classifier.classify_with_boost(&embedding, &domains, boost.as_ref());
+    (result.domains, result.scores)
+};
+
+record.embedding = Some(embedding);
+record.domains = domains_assigned;
+record.domain_scores = domain_scores;
+store.insert(&record).await?;
+```
+
+Edge cases:
+
+- Accumulation phase (`domains.is_empty()`): skip classification entirely. Zero overhead.
+- Embedding failed / skipped: leave `domains = []`, `domain_scores = {}`. Never fail ingest because of classification.
+- Metric: emit `VestigeEvent::MemoryClassified { id, domains, top_score }` on the WebSocket bus so the dashboard sees it live.
+
+### 6. Re-cluster hook in dream consolidation
+
+**File**: `crates/vestige-core/src/advanced/dreams.rs` (long file, 1131-line `dream()` entry on the `MemoryDreamer` impl) plus `crates/vestige-core/src/consolidation/phases.rs` (the `DreamEngine::run` orchestrator).
+
+Design: `DreamEngine::run(...)` returns `FourPhaseDreamResult`. It does not currently know how many times it has run. Phase 4 introduces a persistent counter on disk (column `dream_cycle_count` on a new singleton `system_state` table, or a simple row in the existing `metadata` / `embedding_model` registry). After the Integration phase finishes, the cognitive engine increments the counter and, if `counter % recluster_interval == 0`, launches discovery asynchronously.
+
+Extension struct in `phases.rs`:
+
+```rust
+pub struct DreamReClusterHook<'a> {
+    pub classifier: &'a DomainClassifier,
+    pub store: &'a dyn MemoryStore,
+    pub event_tx: Option<&'a tokio::sync::mpsc::UnboundedSender<VestigeEvent>>,
+}
+
+impl<'a> DreamReClusterHook<'a> {
+    pub async fn tick(&self, cycle_count: usize) -> Result<Vec<DomainProposal>, StorageError> {
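+        // An interval of 0 disables re-clustering and would otherwise panic on `%`.
+        let interval = self.classifier.recluster_interval;
+        if interval == 0 || cycle_count == 0 || cycle_count % interval != 0 {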
+            return Ok(vec![]);
+        }
+        let existing = self.store.list_domains().await?;
+        let rediscovered = self.classifier.discover(self.store).await?;
+        let proposals = self
+            .classifier
+            .propose_changes(self.store, &existing, &rediscovered)
+            .await?;
+        for p in &proposals {
+            self.store.insert_domain_proposal(p).await?;
+            if let Some(tx) = self.event_tx {
+                let _ = tx.send(VestigeEvent::DomainProposalCreated {
+                    id: p.id.clone(),
+                    kind: format!("{:?}", p.kind),
+                    confidence: p.confidence,
+                    timestamp: Utc::now(),
+                });
+            }
+        }
+        Ok(proposals)
+    }
+}
+```
+
+Caller wires `tick()` after `DreamEngine::run()` returns, at the ingest/consolidation orchestrator level. The hook never mutates existing domains -- it only writes proposals. The acceptance path is manual (CLI or dashboard).
+
+Counter storage: add method `store.bump_dream_cycle_count() -> Result<usize>` returning the new count. It is backed by a small key-value table holding a single `dream_cycle_count` row:
+
+```sql
+CREATE TABLE IF NOT EXISTS system_state (
+    key TEXT PRIMARY KEY,
+    value TEXT NOT NULL
+);
+-- seed: ('dream_cycle_count', '0')
+```
+
+### 7. Context signal extractor
+
+**File**: `crates/vestige-core/src/neuroscience/context_signals.rs`
+
+```rust
+pub trait SignalSource: Send + Sync {
+    /// Returns domain_id -> additive boost (positive or negative, typically in [-0.1, +0.1]).
+    fn boost_map(
+        &self,
+        input_metadata: &serde_json::Value,
+        domains: &[Domain],
+    ) -> HashMap<String, f64>;
+
+    fn name(&self) -> &'static str;
+}
+
+pub struct GitRepoSignal {
+    pub boost: f64, // default +0.05
+}
+
+pub struct IdeHintSignal {
+    pub boost: f64,
+}
+
+pub struct ContextSignals {
+    sources: Vec<Box<dyn SignalSource>>,
+}
+
+impl ContextSignals {
+    pub fn gather_boost(
+        &self,
+        input_metadata: &serde_json::Value,
+        domains: &[Domain],
+    ) -> Option<HashMap<String, f64>>;
+}
+```
+
+Signal encoding convention (document in the module header):
+
+- A signal is a **soft prior**. It nudges the post-cosine score by a small additive delta, clamped to `[-0.10, +0.10]` per signal.
+- Multiple signals sum, then the final boost per domain is clamped to `[-0.15, +0.15]` so signals cannot by themselves push a memory into or out of a domain; the embedding similarity dominates.
+- Signals target domains by heuristic: `GitRepoSignal` boosts any domain whose `top_terms` overlaps `{"rust","async","trait","function","class","def","git","commit","fn","code"}`. `IdeHintSignal` does the same for `{"file","line","editor","vscode","neovim","rust-analyzer","lsp"}`.
+- All signal boosts are logged via `tracing::debug!` so users can audit why a memory picked up a domain.
+
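The two-level clamping in the convention above can be sketched as a pure function. `combine_boosts` and the two constants are hypothetical names; the real `ContextSignals::gather_boost` would first call each `SignalSource::boost_map` to obtain the per-signal maps.

```rust
use std::collections::HashMap;

const PER_SIGNAL_CAP: f64 = 0.10; // each signal's contribution: [-0.10, +0.10]
const TOTAL_CAP: f64 = 0.15; // summed boost per domain: [-0.15, +0.15]

/// Clamp each signal's delta, sum per domain, then clamp the sum.
/// Returns None when no signal produced any boost.
fn combine_boosts(per_signal: &[HashMap<String, f64>]) -> Option<HashMap<String, f64>> {
    let mut total: HashMap<String, f64> = HashMap::new();
    for map in per_signal {
        for (domain_id, delta) in map {
            *total.entry(domain_id.clone()).or_insert(0.0) +=
                (*delta).clamp(-PER_SIGNAL_CAP, PER_SIGNAL_CAP);
        }
    }
    for v in total.values_mut() {
        *v = (*v).clamp(-TOTAL_CAP, TOTAL_CAP);
    }
    if total.is_empty() { None } else { Some(total) }
}
```

For example, a git signal of +0.05 on `dev` plus an over-eager IDE signal of +0.20 on the same domain first becomes 0.05 + 0.10, and the sum is then held at +0.15.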
+`GitRepoSignal::boost_map` implementation:
+
+```rust
+fn boost_map(&self, meta: &Value, domains: &[Domain]) -> HashMap<String, f64> {
+    let is_git = meta.get("cwd")
+        .and_then(|v| v.as_str())
+        .map(|cwd| std::path::Path::new(cwd).join(".git").exists())
+        .unwrap_or(false)
+        || meta.get("git_repo").is_some();
+    if !is_git { return HashMap::new(); }
+    let mut out = HashMap::new();
+    for d in domains {
+        let code_hits = d.top_terms.iter()
+            .filter(|t| CODE_TERMS.contains(t.as_str()))
+            .count();
+        if code_hits > 0 { out.insert(d.id.clone(), self.boost); }
+    }
+    out
+}
+```
+
+Config knob in `[domains.signals]`: `git = true`, `ide = true`, `git_boost = 0.05`, `ide_boost = 0.05`.
+
+### 8. Cross-domain spreading activation decay
+
+**File**: `crates/vestige-core/src/neuroscience/spreading_activation.rs`
+
+Modify `ActivationConfig`:
+
+```rust
+pub struct ActivationConfig {
+    pub decay_factor: f64,
+    pub max_hops: u32,
+    pub min_threshold: f64,
+    pub allow_cycles: bool,
+    pub cross_domain_decay: f64, // NEW, default 0.5
+}
+```
+
+Domain metadata on nodes: the current `ActivationNode` has `id`, `activation`, `last_activated`, `edges: Vec<String>`. Phase 4 adds `pub domains: Vec<String>`. Populated when nodes get added (propagated from the memory's `domains` field). The network is rebuilt on each search from the store; if the in-memory network is persisted (check `ActivationNetwork` lifetime in `CognitiveEngine`), the population happens in the engine at boot and on insert.
+
+Traversal change, in `ActivationNetwork::activate` loop, replacing the single line `let propagated = current_activation * edge.strength * self.config.decay_factor;`:
+
+```rust
+let cross_penalty = {
+    let src_doms = self.nodes.get(&current_id).map(|n| &n.domains);
+    let tgt_doms = self.nodes.get(&target_id).map(|n| &n.domains);
+    match (src_doms, tgt_doms) {
+        (Some(s), Some(t)) if !s.is_empty() && !t.is_empty() => {
+            let overlap = s.iter().any(|d| t.contains(d));
+            if overlap { 1.0 } else { self.config.cross_domain_decay }
+        }
+        _ => 1.0, // unclassified on either side: no penalty
+    }
+};
+let propagated = current_activation * edge.strength * self.config.decay_factor * cross_penalty;
+```
+
+Rationale for "unclassified -> no penalty": unclassified memories are Phase-0 or low-confidence corpus members; penalizing them would block useful cross-pollination during the accumulation ramp.
+
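Pulled out of the traversal loop, the penalty rule is a small pure function that can be unit-tested on its own. The names `cross_domain_penalty` and `propagate` are hypothetical; the logic mirrors the snippet above.

```rust
/// Multiplier applied to activation propagated along one edge.
fn cross_domain_penalty(src: &[String], tgt: &[String], cross_domain_decay: f64) -> f64 {
    if src.is_empty() || tgt.is_empty() {
        return 1.0; // unclassified on either side: no penalty
    }
    if src.iter().any(|d| tgt.contains(d)) {
        1.0 // at least one shared domain
    } else {
        cross_domain_decay // strictly disjoint
    }
}

/// Activation handed to the target node for one hop.
fn propagate(current_activation: f64, edge_strength: f64, decay_factor: f64, penalty: f64) -> f64 {
    current_activation * edge_strength * decay_factor * penalty
}
```

As a worked example with illustrative values, an activation of 0.8 crossing an edge of strength 0.5 with `decay_factor = 0.7` yields 0.28 within a domain and 0.14 across disjoint domains at the default `cross_domain_decay` of 0.5.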
+API to update a node's domains after reclassification:
+
+```rust
+pub fn set_node_domains(&mut self, id: &str, domains: Vec<String>);
+```
+
+Called by the reassignment pipeline after `reassign_all`.
+
+### 9. `vestige.toml` `[domains]` section
+
+**File**: wherever `vestige.toml` is loaded (search for `[storage]` / `[server]` loaders). Add:
+
+```toml
+[domains]
+assign_threshold = 0.65
+discovery_threshold = 150
+recluster_interval = 5
+min_cluster_size = 10
+min_samples = 5
+cross_domain_decay = 0.5
+merge_threshold = 0.90
+top_terms_k = 10
+
+[domains.signals]
+git = true
+ide = true
+git_boost = 0.05
+ide_boost = 0.05
+```
+
+Rust-side: `DomainsConfig { ... }` struct annotated with `#[serde(default)]` so a `vestige.toml` without a `[domains]` section falls back to hard-coded defaults. `DomainClassifier::from_config(cfg: &DomainsConfig) -> Self`.
+
+### 10. Dashboard UI additions
+
+**SvelteKit routes** (`apps/dashboard/src/routes/(app)/domains/`):
+
+- `+page.svelte` (list): fetches `GET /api/v1/domains`, which returns the domains together with the unclassified count. Renders a table: `label`, `memories`, `top_terms` chips, `created_at`. Each row links to `/domains/[id]`. A "Discover" button issues `POST /api/v1/domains/discover`.
+- `[id]/+page.svelte` (detail): fetches `GET /api/v1/domains/:id`, `GET /api/v1/domains/:id/memories?limit=100`, `GET /api/v1/domains/:id/score-histogram`. Renders:
+  - Header: label (editable, triggers `PUT /api/v1/domains/:id`), top-terms chips, memory count, created_at.
+  - Histogram: a vertical bar chart of `domain_scores[:id]` buckets 0-0.1, 0.1-0.2, ..., 0.9-1.0 across all memories. Data source: server precomputes buckets so the client does not need to fetch all scores.
+  - Memory list: paginated, each row shows the raw score for this domain.
+- `proposals/+page.svelte`: fetches `GET /api/v1/domains/proposals?status=pending`. Each pending proposal card shows `kind`, `rationale`, `confidence`, `created_at`, buttons "Accept" (posts `POST /api/v1/domains/proposals/:id/accept`) and "Reject" (`POST .../reject`). Live updates via the existing WebSocket channel (`/ws`) reacting to `DomainProposalCreated` events.
+
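The server-side bucketing behind the score histogram is a few lines. This sketch assumes a hypothetical helper `score_histogram`, assumes non-finite scores are skipped, and assumes a score of exactly 1.0 is counted in the last bucket.

```rust
/// Ten fixed-width buckets: [0.0, 0.1), [0.1, 0.2), ..., [0.9, 1.0].
fn score_histogram(scores: &[f64]) -> [usize; 10] {
    let mut buckets = [0usize; 10];
    for &s in scores {
        if !s.is_finite() {
            continue; // ignore NaN / infinity rather than miscount them
        }
        // Scale to 0..=10, truncate, and fold the 1.0 edge into bucket 9.
        let idx = ((s.clamp(0.0, 1.0) * 10.0) as usize).min(9);
        buckets[idx] += 1;
    }
    buckets
}
```

The handler would return the ten counts as a JSON array, so the client never has to fetch per-memory scores.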
+Styling reuses the existing Tailwind + shadcn-svelte conventions in `apps/dashboard/src/lib/components/`.
+
+Existing `(app)/stats` and `(app)/feed` pages get a small "Domains" summary panel that links to `/domains`.
+
+### 11. REST endpoints
+
+**File**: `crates/vestige-mcp/src/protocol/http.rs` or a new `crates/vestige-mcp/src/api/domains.rs` module, wired into the `/api/v1/` router.
+
+| Method | Path | Handler |
+|--------|------|---------|
+| GET | `/api/v1/domains` | `list_domains` -- returns `[Domain...]` + unclassified count |
+| POST | `/api/v1/domains/discover` | `trigger_discover` -- body `{ min_cluster_size?: usize, force?: bool }`, returns proposals or applied domains |
+| GET | `/api/v1/domains/:id` | `get_domain` |
+| PUT | `/api/v1/domains/:id` | `update_domain` -- rename |
+| DELETE | `/api/v1/domains/:id` | `delete_domain` -- with `?merge_into=other_id` |
+| GET | `/api/v1/domains/:id/memories` | paginated memories in this domain |
+| GET | `/api/v1/domains/:id/score-histogram` | precomputed buckets |
+| GET | `/api/v1/domains/proposals` | `list_proposals` -- optional `?status=pending` filter |
+| POST | `/api/v1/domains/proposals/:id/accept` | `accept_proposal` |
+| POST | `/api/v1/domains/proposals/:id/reject` | `reject_proposal` |
+
+All handlers go through the Phase 3 auth middleware (Bearer / X-API-Key / session cookie). Responses are JSON; error paths use `StatusCode::*` with a small `{"error": "..."}` body.
+
+### 12. `domain_proposals` table + trait methods
+
+Postgres migration (`crates/vestige-core/migrations/postgres/00XX_domain_proposals.sql`):
+
+```sql
+CREATE TABLE domain_proposals (
+    id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    kind         TEXT NOT NULL,      -- 'split' | 'merge' | 'new_cluster'
+    payload      JSONB NOT NULL,     -- serialized ProposalKind body
+    rationale    TEXT NOT NULL,
+    confidence   DOUBLE PRECISION NOT NULL,
+    status       TEXT NOT NULL DEFAULT 'pending', -- pending|accepted|rejected
+    created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
+    resolved_at  TIMESTAMPTZ
+);
+CREATE INDEX idx_domain_proposals_status ON domain_proposals (status, created_at DESC);
+```
+
+SQLite migration: same table, `UUID` -> `TEXT`, `JSONB` -> `TEXT` with JSON-encoded bodies, `TIMESTAMPTZ` -> `TEXT` ISO-8601.
+
+`MemoryStore` trait additions:
+
+```rust
+async fn insert_domain_proposal(&self, p: &DomainProposal) -> Result<()>;
+async fn list_domain_proposals(&self, status: Option<&str>) -> Result<Vec<DomainProposal>>;
+async fn get_domain_proposal(&self, id: &str) -> Result<Option<DomainProposal>>;
+async fn set_proposal_status(&self, id: &str, status: &str) -> Result<()>;
+```
+
+### 13. WebSocket event for proposals
+
+**File**: `crates/vestige-mcp/src/dashboard/events.rs`
+
+Add variant:
+
+```rust
+pub enum VestigeEvent {
+    // ... existing ...
+    DomainProposalCreated {
+        id: String,
+        kind: String,
+        confidence: f64,
+        timestamp: DateTime<Utc>,
+    },
+    MemoryClassified {
+        id: String,
+        domains: Vec<String>,
+        top_score: f64,
+        timestamp: DateTime<Utc>,
+    },
+}
+```
+
+The SvelteKit dashboard's WS client reacts to both events: classified events refresh any open domain-detail page; proposal events push a toast and a badge on the navbar.
+
+---
+
+## Test Plan
+
+Test root: `tests/phase_4/` (a new member of the workspace; mirror the `tests/e2e` layout).
+
+`tests/phase_4/Cargo.toml`:
+
+```toml
+[package]
+name = "vestige-phase4-tests"
+version = "0.0.0"
+edition = "2024"
+publish = false
+
+[dependencies]
+vestige-core = { path = "../../crates/vestige-core", features = ["embeddings", "vector-search", "domain-classification"] }
+vestige-mcp  = { path = "../../crates/vestige-mcp" }
+tokio = { workspace = true }
+anyhow = "1"
+tempfile = "3"
+serde_json = { workspace = true }
+uuid = { workspace = true }
+```
+
+### Unit tests (colocated in `domain_classifier.rs::tests`, `context_signals.rs::tests`, `spreading_activation.rs::tests`)
+
+Each public function must have at least one test:
+
+- `classify_empty_domains_returns_empty`: `classify(&[0.0; 768], &[])` returns `ClassificationResult { scores: {}, domains: [] }`.
+- `classify_single_domain_scores`: one `Domain` with a known centroid; input embedding equal to centroid; expect score 1.0 and `domains == [id]`.
+- `classify_multi_domain_overlap`: two domains A, B; input halfway between centroids; expect both scores >= `assign_threshold`; expect `domains == [A, B]` (order not guaranteed).
+- `classify_below_threshold_returns_empty_domains_but_scores_filled`: input orthogonal to all centroids; expect `scores` populated, `domains` empty.
+- `classify_with_boost_adds_delta`: same input as above, with `boost = {A: 0.4}`; expect A now above threshold, B unchanged.
+- `classify_boost_clamps_to_unit`: `boost = {A: 5.0}`; resulting `scores[A]` must be <= 1.0.
+- `tfidf_top_k_returns_distinct_terms`: given three fake docs, `top_k=3` returns three non-duplicate strings, in descending TF-IDF order.
+- `tfidf_top_k_drops_stopwords`: `["the and for"]` + real content -> stop-words absent.
+- `compute_top_terms_handles_empty_cluster`: returns `vec![]` (no panic).
+- `signal_git_present_vs_absent`: `GitRepoSignal` given metadata with `.git` in cwd returns non-empty map; without it returns empty.
+- `signal_ide_present_vs_absent`: `IdeHintSignal` ditto for `metadata.editor == "vscode"`.
+- `signal_combined_clamped`: two signals both firing each at +0.10 -> combined map values <= +0.15.
+- `cross_domain_decay_full_weight_on_overlap`: graph with node A in domain `dev`, node B in domain `dev`, edge A->B strength 1.0; after `activate`, B's activation equals the standard `initial * strength * decay_factor` (no extra penalty).
+- `cross_domain_decay_half_weight_no_overlap`: A in `dev`, B in `infra`, same edge -> B's activation is 0.5x that of the overlap case.
+- `cross_domain_decay_unclassified_no_penalty`: A classified, B unclassified -> full weight.
+- `propose_changes_detects_split`: existing domain `dev`; new discovery returns two clusters whose centroids both sit close to old `dev` centroid, each >= min_cluster_size members -> proposal of kind `Split { parent: "dev", children: [a, b] }`.
+- `propose_changes_detects_merge`: two existing domains whose new centroids now have cosine > `merge_threshold` -> proposal of kind `Merge`.
+- `propose_changes_detects_new_cluster`: a new cluster with no match >= 0.85 to any existing -> `NewCluster`.
+- `apply_proposal_split_updates_memberships`: after accept, memories previously in `dev` get reassigned (some to child a, some to child b) via `reassign_all`.
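
The `classify_*` cases above pin down a fairly small contract: cosine score per domain, an additive boost, a clamp, and a threshold. A minimal sketch of that contract follows. The `Domain` and `ClassificationResult` shapes, the explicit `assign_threshold` parameter, and the choice to clamp negative cosines to 0.0 are assumptions made for illustration, not the final API.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the plan's `Domain`; only the fields used here.
pub struct Domain {
    pub id: String,
    pub centroid: Vec<f32>,
}

/// Hypothetical result shape, inferred from the unit-test descriptions.
#[derive(Debug, Default)]
pub struct ClassificationResult {
    pub scores: HashMap<String, f64>,
    pub domains: Vec<String>,
}

/// Cosine similarity; returns 0.0 when either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| f64::from(*x) * f64::from(*y)).sum();
    let na: f64 = a.iter().map(|x| f64::from(*x).powi(2)).sum::<f64>().sqrt();
    let nb: f64 = b.iter().map(|x| f64::from(*x).powi(2)).sum::<f64>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Score every domain, add the optional per-domain boost, clamp to [0, 1],
/// and assign each domain whose final score reaches `assign_threshold`.
pub fn classify_with_boost(
    embedding: &[f32],
    domains: &[Domain],
    boost: Option<&HashMap<String, f64>>,
    assign_threshold: f64,
) -> ClassificationResult {
    let mut result = ClassificationResult::default();
    for d in domains {
        let delta = boost.and_then(|b| b.get(&d.id)).copied().unwrap_or(0.0);
        // Assumption: scores live in [0, 1], so an oversized boost cannot exceed 1.0.
        let score = (cosine(embedding, &d.centroid) + delta).clamp(0.0, 1.0);
        if score >= assign_threshold {
            result.domains.push(d.id.clone());
        }
        // `scores` is filled for every domain, even those below the threshold.
        result.scores.insert(d.id.clone(), score);
    }
    result
}
```

Passing the boost as an `Option` keeps the unboosted path identical to the boosted one, so the same function can back both `classify` and `classify_with_boost`.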
+
+### Integration tests (`tests/phase_4/tests/`)
+
+One file per behavior listed in the Phase 4 acceptance sheet.
+
+- `discover_seed_corpus.rs` -- loads the 500-memory fixture, runs `classifier.discover(&store).await`, asserts at least 3 clusters, asserts per-cluster intra-similarity mean > 0.6, asserts discovery wall time < 10s in release. Also asserts `top_terms` for each cluster contains at least one expected keyword per cluster (dev: contains any of `rust/trait/async`; infra: `bgp/vlan/network`; home: `solar/battery/pool`).
+- `soft_assign_multi_domain.rs` -- inserts a memory "deploy zinit containers over BGP network"; after classify, `domains` contains both `dev` and `infra` (from a known centroid setup).
+- `auto_classify_on_ingest.rs` -- with three existing domains, a fresh `smart_ingest` of a dev-ish sentence ends up with `domains == ["dev"]` and non-empty `domain_scores`.
+- `reembed_triggers_recluster.rs` -- after `vestige migrate --reembed`, centroids must be recomputed; verify `list_domains()` returns fresh `centroid` values (different from pre-reembed).
+- `dream_consolidation_recluster_hook.rs` -- run 5 dream cycles with heavy synthetic memory insertion; after the 5th, assert `list_domain_proposals(Some("pending"))` has at least one proposal.
+- `proposal_accept_applies_changes.rs` -- accept a split proposal via `apply_proposal`; verify that memories in `dev` are now distributed across the new children and that the old `dev` domain is removed.
+- `proposal_reject_leaves_state.rs` -- reject a proposal; verify all domains and memberships unchanged.
+- `drift_is_proposal_only.rs` -- over 5 dream cycles with new inserts, never call accept; verify every memory's `domains` field equals its initial post-discovery value. No auto-apply.
+- `cross_domain_activation_decay.rs` -- build an `ActivationNetwork` with two memories linked by a strength-1.0 edge, one in `dev`, one in `infra`; activate the `dev` memory with 1.0; assert the `infra` memory's activation == `0.5 * decay_factor` (0.35 with default decay_factor 0.7). Then set both to `dev` and reassert activation == `0.7`.
+- `cli_domains_discover.rs` -- spawn `cargo run -- domains discover --force --json`, parse stdout, assert at least 3 clusters and valid JSON shape.
+- `cli_domains_rename_merge.rs` -- happy-path rename then merge, with stdout assertions.
+- `context_signal_git_repo.rs` -- ingest the same sentence from inside a tempdir with `.git` vs outside; assert the git-run produces slightly higher `domain_scores` for the code-related domain (diff >= 0.04, matches `git_boost = 0.05`).
+- `threshold_tunable.rs` -- same memory, two runs with `assign_threshold = 0.40` vs `0.85`; the low-threshold run assigns more domains than the high-threshold run for the same content.
+- `signal_boost_clamped.rs` -- artificially configure `git_boost = 5.0` and assert the resulting per-domain score is still <= 1.0.
+- `discover_preserves_stable_ids.rs` -- run discover twice with no new memories; the second run's domain ids match the first's (via centroid-similarity stable-ID matching above 0.85).
+
+### Dashboard UI tests (`tests/phase_4/ui/`)
+
+Use curl-driven smoke tests. This avoids making Playwright a hard dependency of the Phase 4 tests; the Playwright setup that already exists at `apps/dashboard/playwright.config.ts` can be extended later.
+
+- `domains_list_renders.sh` -- `curl -H "X-API-Key: $KEY" http://localhost:3927/api/v1/domains` returns 200 + JSON array with expected keys.
+- `domain_detail_histogram.sh` -- `curl .../api/v1/domains/dev/score-histogram` returns 10 buckets.
+- `proposal_review_flow.sh` -- create a pending proposal via SQL insert; `curl POST .../api/v1/domains/proposals/<id>/accept`; `curl GET .../proposals?status=accepted` shows it.
+- `unauth_domain_list_rejected.sh` -- no auth header -> 401.
+
+### Benchmarks (`tests/phase_4/benches/`)
+
+Criterion benches:
+
+- `bench_discover_10k.rs` -- synthetic 10k x 768D embeddings drawn from 5 blobs; assert `discover` wall p95 < 30s on a warm release build.
+- `bench_auto_classify_single.rs` -- 20 domains in memory, classify one 768D vector; assert p99 < 5ms.
+- `bench_reassign_all.rs` -- 10k memories, 5 domains; assert full `reassign_all` wall time < 90s (a baseline of roughly 110 rows/s).
+
+---
+
+## Acceptance Criteria
+
+- [ ] `cargo build -p vestige-core --features domain-classification` zero warnings.
+- [ ] `cargo build -p vestige-mcp` zero warnings.
+- [ ] `cargo clippy --workspace --all-targets --all-features -- -D warnings` clean.
+- [ ] `cargo test -p vestige-phase4-tests` -- all tests in `tests/phase_4/` pass.
+- [ ] On a 500+ memory seed corpus covering three natural clusters (dev / infra / home), `vestige domains discover --force` produces sensible top-terms matching the expected keyword sets and labels are stable on a second run.
+- [ ] `vestige search` with domain filter `["dev"]` excludes any memory whose `domains` array does not include `dev`.
+- [ ] After 5 dream cycles with ongoing inserts, no existing memory's `domains` has silently changed; proposals exist in `domain_proposals` table; accepting a proposal reassigns as described.
+- [ ] Cross-domain spreading activation: a query in `dev` that crosses a single edge into an `infra`-only memory still returns the memory but with activation `cross_domain_decay * in-domain_activation`.
+- [ ] `vestige domains discover --min-cluster-size 20` produces no more clusters than the default, and with larger per-cluster membership.
+- [ ] Dashboard `/dashboard/domains` route renders all domains within 2 seconds on the seed corpus.
+- [ ] Proposal UI flow (open pending, accept, confirmed in store) works end-to-end.
+- [ ] Benchmarks meet targets (discover 10k p95 < 30s, auto-classify p99 < 5ms, `reassign_all` 10k < 90s).
+
+---
+
+## Rollback Notes
+
+- **Feature gate**: add `domain-classification` to `crates/vestige-core/Cargo.toml`'s `[features]`. When disabled, the `DomainClassifier` module is not compiled, the classification call in the ingest path is a no-op (`#[cfg]`-guarded), and cross-domain decay collapses to `1.0`. The CLI `domains` subcommand emits "domain classification is disabled in this build".
+- **Revert strategy**: drop the `domain_proposals` table (Phase 4) and, unless it was created in Phase 1 and is being retained, the `domains` table. A DOWN migration clears `memories.domains` and `memories.domain_scores`. Existing memories simply lose their domain assignments; all search and retrieval paths work unchanged because `domains = []` is the documented "unclassified" state.
+- **Idempotency**: rerunning `discover` is always safe. Cluster numeric IDs may differ between runs, but the stable-ID match by centroid similarity preserves user-assigned labels. Do not persist cluster ids in client-side bookmarks; link via the user-assigned label.
+- **Data-loss risk**: `apply_proposal` is destructive: a split deletes the old parent domain, and a merge collapses two domains into one. The dashboard's accept button double-confirms with a modal that shows the number of affected memories.
+
+---
+
+## Open Implementation Questions
+
+Each question lists its candidate answers, followed by a RECOMMENDATION.
+
+### OQ1. Top-terms extraction: TF-IDF vs BM25 vs frequency?
+- TF-IDF with smoothed IDF -- standard, cheap, good-enough.
+- BM25 -- better for long-document discrimination, overkill for short memory contents.
+- Raw frequency -- noisy; stop-words dominate.
+**RECOMMENDATION**: TF-IDF with global IDF over the entire memory corpus (not just cluster members), recomputed once per `discover` call. Same tokenizer as the `dreams.rs::content_similarity` Jaccard for consistency.
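
To make the recommendation concrete, here is a rough sketch of top-term extraction with a corpus-wide smoothed IDF. The tokenizer and stop-word list are placeholders (the plan calls for reusing the `dreams.rs::content_similarity` tokenizer), and the `global_idf` / `compute_top_terms` signatures are assumed for illustration.

```rust
use std::collections::{HashMap, HashSet};

/// Placeholder stop-word list; the real list is an open choice.
const STOP_WORDS: &[&str] = &["the", "and", "for", "a", "of", "to", "in"];

/// Placeholder tokenizer: lowercase, split on non-alphanumerics, drop stop-words.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .filter(|t| !STOP_WORDS.contains(&t.as_str()))
        .collect()
}

/// Smoothed IDF over the whole corpus: ln((1 + N) / (1 + df)) + 1.
pub fn global_idf(corpus: &[&str]) -> HashMap<String, f64> {
    let n = corpus.len() as f64;
    let mut df: HashMap<String, usize> = HashMap::new();
    for doc in corpus {
        // Count each term once per document.
        let unique: HashSet<String> = tokenize(doc).into_iter().collect();
        for term in unique {
            *df.entry(term).or_insert(0) += 1;
        }
    }
    df.into_iter()
        .map(|(term, d)| (term, ((1.0 + n) / (1.0 + d as f64)).ln() + 1.0))
        .collect()
}

/// Top-k terms of one cluster by (term count within the cluster) * (global IDF).
pub fn compute_top_terms(cluster: &[&str], idf: &HashMap<String, f64>, k: usize) -> Vec<String> {
    let mut tf: HashMap<String, f64> = HashMap::new();
    for doc in cluster {
        for term in tokenize(doc) {
            *tf.entry(term).or_insert(0.0) += 1.0;
        }
    }
    let mut scored: Vec<(String, f64)> = tf
        .into_iter()
        .map(|(term, f)| {
            let w = idf.get(&term).copied().unwrap_or(1.0);
            (term, f * w)
        })
        .collect();
    // Descending score; ties broken alphabetically so output is deterministic.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    scored.into_iter().take(k).map(|(t, _)| t).collect()
}
```

Computing the IDF once per `discover` call and passing it in by reference keeps the per-cluster work proportional to the cluster's own text.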
+
+### OQ2. Proposal persistence: DB table vs in-memory with dashboard notification?
+- DB table (`domain_proposals`) -- durable, surfaces across restarts, enables audit.
+- In-memory only -- simpler, but loses proposals on server restart.
+**RECOMMENDATION**: DB table. Proposals are rare (every 5th dream) and valuable user-facing artifacts; durability is mandatory.
+
+### OQ3. `hdbscan` crate: f32 vs f64 input, exact API surface?
+- v0.10 historically takes `&[Vec<f64>]`; embeddings are `Vec<f32>`.
+- Cost of converting f32 -> f64 at discovery time: `10k * 768 = 7.68M` f64 values, about 61 MB transient, acceptable.
+**RECOMMENDATION**: verify v0.10's type signature at implementation time; if it requires f64, perform the conversion in `discover()` behind a single allocation. Document in module header. If the crate API diverged from the PRD snippet, fall back to the manual builder style (`HdbscanHyperParams::builder().min_cluster_size(n).min_samples(s).build()`).
+
+### OQ4. Stable domain IDs across discover re-runs?
+- Option A: numeric IDs from HDBSCAN labels -- unstable, re-runs shuffle them.
+- Option B: hash(top_terms) -- stable if top-terms stable, but top-terms drift.
+- Option C (recommended): after computing new centroids, match each to the closest existing domain by centroid cosine; if similarity > 0.85, reuse the existing domain's `id` and `label`. Otherwise mint a fresh `id = "cluster_<uuid>"`.
+**RECOMMENDATION**: Option C. Preserves user-assigned labels across drift. Threshold 0.85 is config-tunable via `stable_id_threshold` if needed later.
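
Option C could look roughly like the sketch below. `ExistingDomain`, `match_stable_ids`, and the injected `mint_id` closure are hypothetical names; the closure stands in for whatever produces `cluster_<uuid>`. The rule that each existing domain can be claimed by at most one new cluster is also an assumption, added so that a split does not hand the same id to both children.

```rust
use std::collections::HashSet;

/// Hypothetical minimal view of an already-stored domain.
pub struct ExistingDomain {
    pub id: String,
    pub label: String,
    pub centroid: Vec<f32>,
}

/// Cosine similarity; returns 0.0 when either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| f64::from(*x) * f64::from(*y)).sum();
    let na: f64 = a.iter().map(|x| f64::from(*x).powi(2)).sum::<f64>().sqrt();
    let nb: f64 = b.iter().map(|x| f64::from(*x).powi(2)).sum::<f64>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// For each new centroid, reuse the id and label of the most similar unclaimed
/// existing domain when similarity exceeds `threshold`; otherwise mint a new id.
/// Returns one `(id, label)` pair per new centroid, in input order.
pub fn match_stable_ids(
    new_centroids: &[Vec<f32>],
    existing: &[ExistingDomain],
    threshold: f64,
    mut mint_id: impl FnMut() -> String,
) -> Vec<(String, Option<String>)> {
    let mut claimed: HashSet<usize> = HashSet::new();
    let mut out = Vec::with_capacity(new_centroids.len());
    for c in new_centroids {
        let best = existing
            .iter()
            .enumerate()
            .filter(|(i, _)| !claimed.contains(i))
            .map(|(i, d)| (i, cosine(c, &d.centroid)))
            .max_by(|a, b| a.1.total_cmp(&b.1));
        match best {
            Some((i, sim)) if sim > threshold => {
                claimed.insert(i);
                out.push((existing[i].id.clone(), Some(existing[i].label.clone())));
            }
            // No sufficiently similar domain: this is a genuinely new cluster.
            _ => out.push((mint_id(), None)),
        }
    }
    out
}
```

Injecting the id generator keeps the matching logic deterministic under test, which is what `discover_preserves_stable_ids.rs` needs.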
+
+### OQ5. Context signal injection site: ingest handler vs embedder vs classifier?
+- Embedder -- would alter embedding; signals are not about embedding quality.
+- Ingest handler -- signals known there, but then `DomainClassifier` cannot be tested in isolation.
+- Classifier as a `classify_with_boost(boost: Option<&HashMap>)` parameter -- pure, testable, composable.
+**RECOMMENDATION**: classifier parameter. The cognitive engine constructs the boost map via `ContextSignals::gather_boost(&metadata, &domains)` and hands it to the classifier. Keeps the classifier stateless w.r.t. signals.
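
As an illustration of what `gather_boost` might hand to the classifier, the sketch below sums per-domain boosts from several signals and caps the total. The function name `combine_boosts` and the `max_total` parameter are assumptions; the 0.15 cap used in the usage note comes from the `signal_combined_clamped` unit test.

```rust
use std::collections::HashMap;

/// Sum the per-domain boosts emitted by each signal, then cap every domain's
/// total at `max_total` so that stacked signals cannot dominate the cosine score.
pub fn combine_boosts(
    signals: &[HashMap<String, f64>],
    max_total: f64,
) -> HashMap<String, f64> {
    let mut combined: HashMap<String, f64> = HashMap::new();
    for signal in signals {
        for (domain, delta) in signal {
            *combined.entry(domain.clone()).or_insert(0.0) += *delta;
        }
    }
    for total in combined.values_mut() {
        *total = (*total).min(max_total);
    }
    combined
}
```

With two signals each contributing +0.10 to the same domain and `max_total = 0.15`, the combined boost for that domain is 0.15 rather than 0.20. Because the output is a plain map, the classifier never needs to know which signals fired.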
+
+### OQ6. Re-cluster proposal cadence: event-based (every Nth dream) vs time-based (weekly)?
+- ADR resolution Q7: every Nth dream (N=5 default).
+- Alternative: once per week regardless of dream cadence.
+**RECOMMENDATION**: stick with every Nth dream. Users who dream rarely re-cluster rarely -- that matches the philosophy ("memory work triggers memory bookkeeping"). Note the alternative as future consideration; if users complain about never seeing proposals, add a time-based fallback.
+
+### OQ7. Minimum corpus size for first discover?
+- PRD default: 150.
+- Too low -> noisy initial clusters, proposals every dream.
+- Too high -> user waits forever for domains to appear.
+**RECOMMENDATION**: 150 as the default discovery gate. With `min_cluster_size=10`, HDBSCAN is expected to find few or no clusters below roughly 100 memories, so the system gracefully produces no domains until the corpus is large enough. Test with `N=80, 150, 500` in `threshold_tunable.rs` to confirm sensible behavior.
+
+### OQ8. Cross-domain decay: strict no-overlap vs graded?
+- Strict: `1.0` if any overlap, `cross_domain_decay` otherwise.
+- Graded: `max(cross_domain_decay, |A intersect B| / max(|A|, |B|))`.
+**RECOMMENDATION**: strict for Phase 4. Easier to reason about, easier to tune, easier to test. Graded is a marked future enhancement; file an issue if retrieval-quality metrics justify it.
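
The two candidates differ only in the factor applied to an edge, which the following sketch makes explicit. The function names are hypothetical. Treating an unclassified endpoint as full weight follows the `cross_domain_decay_unclassified_no_penalty` test; applying the same rule in the graded variant is an assumption.

```rust
use std::collections::HashSet;

/// Strict rule: full weight if the memories share any domain or if either is
/// unclassified; otherwise the configured `cross_domain_decay`.
pub fn strict_factor(a: &HashSet<String>, b: &HashSet<String>, cross_domain_decay: f64) -> f64 {
    if a.is_empty() || b.is_empty() || !a.is_disjoint(b) {
        1.0
    } else {
        cross_domain_decay
    }
}

/// Graded alternative: max(cross_domain_decay, |A intersect B| / max(|A|, |B|)).
pub fn graded_factor(a: &HashSet<String>, b: &HashSet<String>, cross_domain_decay: f64) -> f64 {
    if a.is_empty() || b.is_empty() {
        return 1.0; // assumption: unclassified memories are never penalized
    }
    let overlap = a.intersection(b).count() as f64;
    let denom = a.len().max(b.len()) as f64;
    cross_domain_decay.max(overlap / denom)
}

/// Activation passed along one edge: the standard
/// `initial * strength * decay_factor`, scaled by the domain factor.
pub fn propagate(initial: f64, strength: f64, decay_factor: f64, domain_factor: f64) -> f64 {
    initial * strength * decay_factor * domain_factor
}
```

With `decay_factor = 0.7` and `cross_domain_decay = 0.5`, a `dev` to `infra` edge of strength 1.0 carries 0.35 under the strict rule, while a `dev` to `dev` edge carries 0.7, matching the figures in `cross_domain_activation_decay.rs`.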
+
+### OQ9. Classifier invocation from remote HTTP clients?
+- In server mode, an agent posts `smart_ingest` -> server embeds -> server classifies.
+- All the work stays server-side; MCP clients never do classification.
+**RECOMMENDATION**: confirmed server-side-only. Document in the MCP tool schema that `smart_ingest` now returns `domains` and `domain_scores` in its response so clients can display the classification to the user.
+
+### OQ10. Where to store the dream-cycle counter?
+- In-memory on `CognitiveEngine` -- lost on restart, miscounts cadence.
+- New `system_state` singleton table.
+**RECOMMENDATION**: `system_state` table. Survives restarts. Also useful for future metrics (total memories ever, total dreams ever).
+
+### OQ11. Scope of `reassign_all` after a proposal accept vs a normal discover?
+- On discover --force (first-time), run `reassign_all` against all memories.
+- On proposal accept (split / merge), run `reassign_all` only on affected memories (parent's members for split; both parents' members for merge) to avoid touching unrelated records.
+**RECOMMENDATION**: scoped reassignment where possible; fall back to full `reassign_all` only on `discover --force` or when the set of domains has fundamentally changed. Reduces write amplification on large corpora.
+
+### OQ12. Proposal freshness?
+- Multiple re-clusters could stack up pending proposals.
+**RECOMMENDATION**: before inserting a new proposal, check for existing pending proposals with the same `kind + targets`; if present, bump `created_at` and `confidence` instead of creating a duplicate. Add a `confidence_history` array in the `payload` JSONB for audit.
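
A small in-memory sketch of that dedup rule is shown below. `PendingProposal` and `upsert_proposal` are hypothetical stand-ins for a `domain_proposals` row and the store call; timestamps are plain unix seconds to keep the example dependency-free, and recording the previous confidence in `confidence_history` is one possible reading of the audit requirement.

```rust
/// Hypothetical in-memory stand-in for a pending row in `domain_proposals`.
#[derive(Debug, Clone)]
pub struct PendingProposal {
    pub kind: String,
    /// Domain ids the proposal acts on; compared in sorted order.
    pub targets: Vec<String>,
    pub confidence: f64,
    /// Unix seconds (the real column is a timestamp).
    pub created_at: u64,
    /// Earlier confidence values, kept for audit.
    pub confidence_history: Vec<f64>,
}

/// Insert `incoming`, or refresh the matching pending proposal instead of
/// stacking a duplicate. Returns `true` when a new proposal was added.
pub fn upsert_proposal(pending: &mut Vec<PendingProposal>, mut incoming: PendingProposal) -> bool {
    // Sort so that ["infra", "dev"] and ["dev", "infra"] count as the same target set.
    incoming.targets.sort();
    let existing = pending
        .iter()
        .position(|p| p.kind == incoming.kind && p.targets == incoming.targets);
    match existing {
        Some(i) => {
            let p = &mut pending[i];
            p.confidence_history.push(p.confidence);
            p.confidence = incoming.confidence;
            p.created_at = incoming.created_at;
            false
        }
        None => {
            pending.push(incoming);
            true
        }
    }
}
```

In the real store this would be a single transaction (lookup by `kind` and targets, then update or insert) so that two concurrent re-clusters cannot both insert.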
+
+---
+
+## Implementation Sequencing (suggested order)
+
+1. Land the `DomainClassifier` struct, `classify` / `classify_with_boost`, unit tests. (Day 1)
+2. Add `compute_top_terms` + TF-IDF helper, tests. (Day 1)
+3. Wire `discover` end-to-end against SQLite; `discover_seed_corpus` integration test. (Day 2)
+4. Add `domain_proposals` table migrations + trait methods; both backends. (Day 2)
+5. Implement `propose_changes` + `apply_proposal`; proposal unit tests. (Day 3)
+6. Context signals module + tests. (Day 3)
+7. Hook classifier into ingest path; `auto_classify_on_ingest` integration test. (Day 4)
+8. Cross-domain decay in spreading activation; unit + integration tests. (Day 4)
+9. Dream re-cluster hook + `system_state` counter; integration tests for drift-only behavior. (Day 5)
+10. CLI subcommands. (Day 6)
+11. REST endpoints. (Day 6)
+12. SvelteKit dashboard routes + WebSocket event wiring. (Day 7-8)
+13. Benchmarks + acceptance sweep on the 500-memory seed. (Day 9)
+
+---
+
+## File Map (everything Phase 4 touches or creates)
+
+Creates:
+
+- `crates/vestige-core/src/neuroscience/domain_classifier.rs`
+- `crates/vestige-core/src/neuroscience/context_signals.rs`
+- `crates/vestige-core/migrations/postgres/00XX_domain_proposals.sql`
+- `crates/vestige-core/migrations/sqlite/00XX_domain_proposals.sql` (or inline in `storage/migrations.rs`)
+- `crates/vestige-mcp/src/api/domains.rs` (REST handlers)
+- `apps/dashboard/src/routes/(app)/domains/+page.svelte`
+- `apps/dashboard/src/routes/(app)/domains/[id]/+page.svelte`
+- `apps/dashboard/src/routes/(app)/domains/proposals/+page.svelte`
+- `apps/dashboard/src/lib/api/domains.ts`
+- `tests/phase_4/Cargo.toml`
+- `tests/phase_4/tests/*.rs` (per the Integration test list)
+- `tests/phase_4/fixtures/seed_500.json`
+- `tests/phase_4/support/fixtures.rs`
+
+Modifies:
+
+- `crates/vestige-core/Cargo.toml` -- add `hdbscan = "0.10"` under a new `domain-classification` feature.
+- `crates/vestige-core/src/neuroscience/mod.rs` -- register new modules, re-exports.
+- `crates/vestige-core/src/neuroscience/spreading_activation.rs` -- `cross_domain_decay` field in `ActivationConfig`, `domains` field on `ActivationNode`, decay math in `activate`.
+- `crates/vestige-core/src/consolidation/phases.rs` -- `DreamReClusterHook`.
+- `crates/vestige-core/src/advanced/dreams.rs` -- accept a hook callback from the orchestrator (if the orchestration is done at this level).
+- `crates/vestige-core/src/storage/trait.rs` -- add proposal + system_state methods.
+- `crates/vestige-core/src/storage/sqlite.rs` -- implement proposal + system_state methods + `all_embeddings_with_meta` if not already on the trait.
+- `crates/vestige-core/src/storage/postgres.rs` (Phase 2) -- same.
+- `crates/vestige-core/src/lib.rs` -- re-exports.
+- `crates/vestige-core/src/cognitive.rs` (or equivalent ingest orchestrator) -- auto-classify injection.
+- `crates/vestige-mcp/src/bin/cli.rs` -- `Domains` subcommand + dispatch.
+- `crates/vestige-mcp/src/dashboard/mod.rs` -- wire new REST routes.
+- `crates/vestige-mcp/src/dashboard/events.rs` -- new event variants.
+- `crates/vestige-mcp/src/dashboard/handlers.rs` -- if legacy dashboard gets a domains panel (optional).
+- `vestige.toml` config loader -- `[domains]` section + struct + defaults.
+- Root `Cargo.toml` workspace members -- add `tests/phase_4`.
+
+---
+
+## Risks
+
+- **HDBSCAN determinism**: HDBSCAN is deterministic given input order; sorting embeddings by memory id before feeding the clusterer guarantees reproducibility across runs -- do this in `discover()` and document it.
+- **Embedding dimension drift**: Phase 1's `embedding_model` registry blocks writes from mismatched embedders. If `discover()` ever sees two dimensions, it bails with a clear error and points at `vestige migrate --reembed`.
+- **Classification latency on ingest**: `classify` is O(n_domains * dim). At 20 domains * 768 dimensions that is about 15k multiply-adds per classification, which is trivial; even for users with 1,000 domains (unlikely but possible) it stays under 1M. Still, expose a `classify_budget_ms` config knob for paranoia.
+- **Re-cluster proposal storms**: if the corpus is borderline-stable, small changes can produce conflicting proposals on consecutive dreams. Mitigation: OQ12 (dedup by target set, bump confidence instead of stacking).
+- **Dashboard feature gap**: if the SvelteKit app lands with the domains route but the REST endpoints are not yet deployed, the route 404s. Mitigation: ship the REST endpoints in the same release; a feature flag on the client toggles the nav entry.
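
The first two mitigations above (stable input order and a single embedding dimension) can share one preprocessing step in `discover()`. The sketch below is illustrative only: `prepare_for_clustering` is a hypothetical helper, the `(id, embedding)` row shape is assumed, and the widening to f64 applies only if the `hdbscan` crate turns out to require it (see OQ3).

```rust
/// Sort rows by memory id, verify that every embedding has the same dimension,
/// and widen f32 -> f64. Returns the ids and points in matching order.
pub fn prepare_for_clustering(
    mut rows: Vec<(String, Vec<f32>)>,
) -> Result<(Vec<String>, Vec<Vec<f64>>), String> {
    // A stable input order makes repeated clustering runs reproducible.
    rows.sort_by(|a, b| a.0.cmp(&b.0));
    let dim = match rows.first() {
        Some((_, v)) => v.len(),
        None => return Ok((Vec::new(), Vec::new())),
    };
    let mut ids: Vec<String> = Vec::with_capacity(rows.len());
    let mut points: Vec<Vec<f64>> = Vec::with_capacity(rows.len());
    for (id, v) in rows {
        if v.len() != dim {
            // Bail with a clear error instead of clustering mixed dimensions.
            return Err(format!(
                "memory {} has embedding dimension {}, expected {}; run `vestige migrate --reembed`",
                id,
                v.len(),
                dim
            ));
        }
        points.push(v.iter().map(|&x| f64::from(x)).collect());
        ids.push(id);
    }
    Ok((ids, points))
}
```

Returning the ids alongside the points lets the caller map each cluster label back to its memory without relying on the original row order.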
+
+---
+
+## Non-Goals Reminder
+
+- No Phase 5 federation concerns in this plan.
+- No cross-installation domain sync.
+- No automatic accept of proposals, ever.
+- No graded cross-domain decay; strict only.
+- No ML-based domain label suggestion (top-terms are enough for v1).
+- No editing individual memory memberships from the UI in this phase.