mirror of
https://github.com/samvallad33/vestige.git
synced 2026-06-28 21:49:38 +02:00
Introduce two trait boundaries that the rest of the stack now sits above,
landing Phase 1 of ADR 0001 (pluggable storage and network access).
Rebased onto v2.1.22 Sanhedrin from the original April work.
MemoryStore / LocalMemoryStore (crates/vestige-core/src/storage/memory_store.rs):
One trait, ~25 methods, covering CRUD, hybrid / FTS / vector search,
FSRS scheduling, graph edges, and the forthcoming domain surface.
trait_variant::make generates a Send-bound MemoryStore alias over the
base LocalMemoryStore so Arc<dyn MemoryStore> works under tokio/axum.
Storage errors map through a dedicated MemoryStoreError.
Embedder / LocalEmbedder (crates/vestige-core/src/embedder/):
Pluggable text-to-vector encoder. FastembedEmbedder wraps the existing
EmbeddingService; storage never calls fastembed directly anymore.
Embedder::signature() produces the ModelSignature consumed by the
store's embedding_model registry.
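How a single-row registry can use that signature to catch model drift is sketched below. This is a std-only illustration, not the crate's code: `ModelRegistry`, `RegisterOutcome`, and `register` are hypothetical names, and only the three `ModelSignature` fields (`name`, `dimension`, `hash`) come from the source.

```rust
/// Mirrors the three fields the source shows on `ModelSignature`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ModelSignature {
    pub name: String,
    pub dimension: usize,
    pub hash: String,
}

/// Hypothetical result type for a registration attempt.
#[derive(Debug, PartialEq, Eq)]
pub enum RegisterOutcome {
    /// No model was recorded yet; the signature is now stored.
    Registered,
    /// The stored signature matches the incoming one.
    Unchanged,
    /// The incoming signature differs from the stored one.
    Drift { stored: ModelSignature },
}

/// Hypothetical in-memory stand-in for the single-row registry table.
#[derive(Default)]
pub struct ModelRegistry {
    row: Option<ModelSignature>,
}

impl ModelRegistry {
    /// Record the first signature seen; afterwards only compare against it.
    pub fn register(&mut self, sig: &ModelSignature) -> RegisterOutcome {
        if let Some(stored) = &self.row {
            return if stored == sig {
                RegisterOutcome::Unchanged
            } else {
                RegisterOutcome::Drift { stored: stored.clone() }
            };
        }
        self.row = Some(sig.clone());
        RegisterOutcome::Registered
    }
}
```

Because the hash takes part in the comparison, a model that keeps its name and dimension but changes its output would still surface as `Drift` in this sketch.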
SqliteMemoryStore (crates/vestige-core/src/storage/sqlite.rs):
Storage renamed to SqliteMemoryStore; the old name lives on as a
pub type alias so Arc<Storage> consumers in vestige-mcp stay intact.
All existing inherent methods are untouched; the trait impl is
purely additive and dispatches into them. The db_path field added
by v2.1.1 portable-sync is preserved.
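The rename-plus-alias pattern and the additive trait impl can be sketched as follows. This is a reduced std-only illustration: the two-method trait, the `HashMap` backing, and the names `insert_node`, `fetch_node`, and `legacy_caller` are assumptions, not the real vestige-core API.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical, heavily reduced stand-in for the real storage trait.
pub trait LocalMemoryStore {
    fn put(&self, id: &str, content: &str);
    fn get(&self, id: &str) -> Option<String>;
}

/// Stand-in for the renamed concrete store (a map instead of SQLite).
pub struct SqliteMemoryStore {
    rows: Mutex<HashMap<String, String>>,
}

/// The old name survives as an alias, so `Arc<Storage>` still type-checks.
pub type Storage = SqliteMemoryStore;

impl SqliteMemoryStore {
    pub fn new() -> Self {
        Self { rows: Mutex::new(HashMap::new()) }
    }

    // Inherent methods that predate the trait; the extraction leaves them alone.
    pub fn insert_node(&self, id: &str, content: &str) {
        self.rows.lock().unwrap().insert(id.to_string(), content.to_string());
    }

    pub fn fetch_node(&self, id: &str) -> Option<String> {
        self.rows.lock().unwrap().get(id).cloned()
    }
}

// Additive trait impl: every method only forwards to an inherent method.
impl LocalMemoryStore for SqliteMemoryStore {
    fn put(&self, id: &str, content: &str) {
        self.insert_node(id, content)
    }

    fn get(&self, id: &str) -> Option<String> {
        self.fetch_node(id)
    }
}

/// A legacy call site written against the old name keeps compiling.
pub fn legacy_caller(storage: Arc<Storage>) -> Option<String> {
    storage.insert_node("n1", "hello");
    storage.fetch_node("n1")
}
```

Since `Storage` and `SqliteMemoryStore` are the same type, data written through the old name is visible through the trait object and vice versa.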
Migration V14 (crates/vestige-core/src/storage/migrations.rs):
Renumbered from V12 (the original April number) to V14 to slot in
cleanly after upstream's V12 (v2.1.1 sync_tombstones) and V13
(v2.1.2 purge tombstones).
- embedding_model registry table (CHECK id = 1, code enforces the
single-row invariant).
- knowledge_nodes.domains / domain_scores TEXT columns (JSON, defaulting
to '[]' and '{}' respectively), domains catalogue table, supporting indexes.
Phase 4 populates these columns; Phase 1 just exposes the schema.
Consolidation and other cognitive pathways now accept a
&dyn LocalMemoryStore (sync) or Arc<dyn MemoryStore> (async) rather
than a concrete Storage.
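What accepting `&dyn LocalMemoryStore` buys on the sync path is roughly this: a cognitive routine can be exercised against an in-memory fake with no SQLite involved. The sketch below is std-only and every name in it (`prune_weak`, `FakeStore`, the three trait methods) is hypothetical; the real trait has roughly 25 methods.

```rust
use std::cell::RefCell;

/// Hypothetical three-method slice of the storage trait.
pub trait LocalMemoryStore {
    fn all_ids(&self) -> Vec<String>;
    fn retention(&self, id: &str) -> Option<f64>;
    fn delete(&self, id: &str);
}

/// Hypothetical consolidation-style pass: delete every memory whose
/// retention is below `threshold` and return how many were deleted.
/// It depends only on the trait, never on a concrete store.
pub fn prune_weak(store: &dyn LocalMemoryStore, threshold: f64) -> usize {
    let mut pruned = 0;
    for id in store.all_ids() {
        if matches!(store.retention(&id), Some(r) if r < threshold) {
            store.delete(&id);
            pruned += 1;
        }
    }
    pruned
}

/// In-memory fake used in place of the SQLite-backed store.
pub struct FakeStore {
    rows: RefCell<Vec<(String, f64)>>,
}

impl FakeStore {
    pub fn new(rows: &[(&str, f64)]) -> Self {
        let rows = rows.iter().map(|(id, r)| (id.to_string(), *r)).collect();
        Self { rows: RefCell::new(rows) }
    }
}

impl LocalMemoryStore for FakeStore {
    fn all_ids(&self) -> Vec<String> {
        self.rows.borrow().iter().map(|(id, _)| id.clone()).collect()
    }

    fn retention(&self, id: &str) -> Option<f64> {
        self.rows.borrow().iter().find(|(k, _)| k.as_str() == id).map(|(_, r)| *r)
    }

    fn delete(&self, id: &str) {
        self.rows.borrow_mut().retain(|(k, _)| k.as_str() != id);
    }
}
```

Calling `prune_weak(&FakeStore::new(&[("a", 0.9), ("b", 0.1)]), 0.5)` removes only `"b"`, and the same function would accept any other implementor of the trait unchanged.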
Tests:
- trait-method unit tests colocated in sqlite.rs and migrations.rs
- embedder/fastembed.rs tests for name/dimension/hash stability
- new integration crate tests/phase_1 (added to workspace members):
trait_round_trip (8), embedding_model_registry (7),
domain_column_migration (5), cognitive_module_isolation (4),
send_bound_variant (2), embedder_trait (2).
Acceptance gate post-rebase:
- cargo build --workspace --all-targets: ok
- cargo clippy --workspace --all-targets -- -D warnings: clean
- cargo test -p vestige-core --lib: 428 pass
- cargo test -p vestige-phase-1-tests: 28 pass
- cargo test -p vestige-mcp --lib: 380 pass (Storage alias preserves
every existing call site)
Co-existence with v2.1.1 portable-sync: this trait extraction is
additive. Portable-sync's tombstone migrations (V12, V13) remain
on the concrete SqliteMemoryStore; Phase 2 (Postgres) will decide
which of those surfaces graduate into the trait.
57 lines
2 KiB
Rust
//! Text-to-vector encoding trait. Pluggable per-install.

mod fastembed;

pub use fastembed::FastembedEmbedder;

/// Error returned by every `Embedder` method.
#[non_exhaustive]
#[derive(Debug, thiserror::Error)]
pub enum EmbedderError {
    #[error("embedder initialization failed: {0}")]
    Init(String),
    #[error("embedding generation failed: {0}")]
    EmbedFailed(String),
    #[error("invalid input: {0}")]
    InvalidInput(String),
}

pub type EmbedderResult<T> = std::result::Result<T, EmbedderError>;

/// Pluggable embedder. The storage layer NEVER calls fastembed directly;
/// callers compute vectors via this trait and pass them into `MemoryStore`.
///
/// `#[async_trait::async_trait]` makes every `async fn` return a
/// `Pin<Box<dyn Future + Send>>`, which is required for `Box<dyn Embedder>`
/// and `Arc<dyn Embedder>` to be dyn-compatible.
#[async_trait::async_trait]
pub trait LocalEmbedder: Send + Sync + 'static {
    async fn embed(&self, text: &str) -> EmbedderResult<Vec<f32>>;

    fn model_name(&self) -> &str;

    fn dimension(&self) -> usize;

    /// Stable blake3 hash of (model_name || dimension || vestige-core crate version).
    /// Lowercase hex, 64 chars.
    ///
    /// Used by `MemoryStore::register_model` to detect silent model drift
    /// (e.g. a fastembed minor upgrade that changes vector output).
    fn model_hash(&self) -> String;

    async fn embed_batch(&self, texts: &[&str]) -> EmbedderResult<Vec<Vec<f32>>>;

    /// Returns the `ModelSignature` describing this embedder. Convenience
    /// wrapper over the three accessors above.
    fn signature(&self) -> crate::storage::ModelSignature {
        crate::storage::ModelSignature {
            name: self.model_name().to_string(),
            dimension: self.dimension(),
            hash: self.model_hash(),
        }
    }
}

/// Type alias: `Embedder` is the dyn-compatible, Send+Sync variant.
/// Both names refer to the same `async_trait`-annotated trait.
pub use LocalEmbedder as Embedder;