Mirror of https://github.com/samvallad33/vestige.git
(synced 2026-06-14 20:55:14 +02:00)
Day 2 of the Qwen3 migration. Default build is unchanged: nomic stays
the embedding backend, and every existing caller continues to see
256-dim Matryoshka-truncated vectors from the ONNX path. The `qwen3-embed`
feature flag routes to fastembed's standalone Qwen3TextEmbedding
(Candle backend) for 1024-dim native output with 32K context.
Honest scope note: this commit is SCAFFOLDING. Under `qwen3-embed`
the backend initialises cleanly, the vector index now sizes itself to
1024d via feature-gated DEFAULT_DIMENSIONS, and the full 366-test lib
suite passes. End-to-end ingest + search under qwen3-embed still has
two gaps that Day 3 closes: sqlite.rs hardcodes the embedding_model
string as 'nomic-embed-text-v1.5' at the write sites, and
get_query_embedding doesn't call qwen3_format_query on the query text.
Neither is a regression for default builds — both are explicit Day 3
work items tracked in the audit inventory.
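
For orientation, the compile-time dimension switch described in this commit could look roughly like the sketch below. The constant names and values are the ones given in this message. The layout, and the assumption that `DEFAULT_DIMENSIONS` simply mirrors `EMBEDDING_DIMENSIONS`, are illustrative and not copied from the source tree.

```rust
// Per-backend dimensions as stated in this commit message.
pub const NOMIC_EMBEDDING_DIMENSIONS: usize = 256;
pub const QWEN3_EMBEDDING_DIMENSIONS: usize = 1024;

// Resolved at compile time: building with `--features qwen3-embed`
// selects the 1024-dim backend, every other build keeps nomic's 256.
#[cfg(feature = "qwen3-embed")]
pub const EMBEDDING_DIMENSIONS: usize = QWEN3_EMBEDDING_DIMENSIONS;
#[cfg(not(feature = "qwen3-embed"))]
pub const EMBEDDING_DIMENSIONS: usize = NOMIC_EMBEDDING_DIMENSIONS;

// Assumption: the vector index default tracks the active backend so the
// index is created with the same width as the vectors it will receive.
pub const DEFAULT_DIMENSIONS: usize = EMBEDDING_DIMENSIONS;
```

Because both values are `const`, a mismatch between index width and backend width cannot arise within a single build. It can only appear across builds, which is what the Day 3 cross-backend load check is meant to catch.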
What's here:
- New `Backend` enum wraps either `TextEmbedding` (Nomic ONNX) or
`Qwen3TextEmbedding` (Candle) behind the same Mutex<OnceLock<...>>
the rest of Vestige already calls through. `EmbeddingService::embed`
dispatches via `Backend::embed_batch` + `Backend::post_process` so
the public API shape doesn't change.
- `qwen3-embed` Cargo feature = fastembed/qwen3 + direct candle-core
pinned to =0.10.2 (exact, not caret — supply-chain defence alongside
Cargo.lock; fastembed doesn't re-export candle_core types so we need
a direct dep path for candle_core::{Device, DType}).
- `qwen3_format_query()` helper + `QWEN3_QUERY_INSTRUCTION` constant.
Qwen3 is asymmetric — queries require the Instruct prefix, documents
go in raw. Prefix format matches the canonical `get_detailed_instruct`
Python reference in the HF model card (no space after `Query:`). The
helper is a no-op under the nomic backend so upstream code can wrap
queries unconditionally.
- Per-backend dimensions: `NOMIC_EMBEDDING_DIMENSIONS = 256`,
`QWEN3_EMBEDDING_DIMENSIONS = 1024`. `EMBEDDING_DIMENSIONS` resolves
to the active backend at compile time for back-compat.
- `search/vector.rs::DEFAULT_DIMENSIONS` and
`advanced/adaptive_embedding.rs::{DEFAULT_DIMENSIONS, CODE_DIMENSIONS}`
feature-gated to match the active backend so the USearch index
sizes itself correctly.
- Per-backend model_name() returning the HF repo ID ("nomic-ai/..." or
"Qwen/..."). Will be threaded through storage write sites in Day 3.
- MAX_TEXT_LENGTH bumps to 32K under qwen3-embed to match Qwen3's
context window; stays at 8K for nomic.
- Backend::post_process applies matryoshka_truncate for Nomic only;
Qwen3 output is already last-token pooled + L2-normalized by the
Candle model (verified in fastembed-5.13.2 qwen3.rs:1124-1125).
- Device selection: `#[cfg(feature = "metal")]` uses
Device::new_metal(0) with CPU fallback on failure; otherwise CPU.
CUDA auto-selection deferred to Day 3+.
- Shape-contract guard at the Backend output boundary — empty outer
OR inner vectors return EmbeddingError::EmbeddingFailed instead of
the previous `.unwrap()` + zero-dim vector reaching USearch.
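
A minimal sketch of the output boundary described in the last few bullets: shape guard first, then Nomic-only truncation. `BackendKind`, `check_shape`, the free-function form of `post_process`, and the `String` payload on `EmbeddingFailed` are hypothetical stand-ins. The re-normalisation after truncation is likewise an assumption about what `matryoshka_truncate` does, not a copy of the real code.

```rust
#[derive(Debug, PartialEq)]
pub enum EmbeddingError {
    // Assumption: the real variant may carry a different payload.
    EmbeddingFailed(String),
}

#[derive(Clone, Copy)]
pub enum BackendKind {
    Nomic,
    Qwen3,
}

const NOMIC_EMBEDDING_DIMENSIONS: usize = 256;

/// Shape-contract guard: reject an empty batch or any zero-dim vector
/// before it can reach the vector index.
pub fn check_shape(batch: &[Vec<f32>]) -> Result<(), EmbeddingError> {
    if batch.is_empty() {
        return Err(EmbeddingError::EmbeddingFailed(
            "backend returned no vectors".to_string(),
        ));
    }
    if batch.iter().any(|v| v.is_empty()) {
        return Err(EmbeddingError::EmbeddingFailed(
            "backend returned a zero-dim vector".to_string(),
        ));
    }
    Ok(())
}

/// Keep the leading `dims` components and L2-normalise what remains.
pub fn matryoshka_truncate(mut v: Vec<f32>, dims: usize) -> Vec<f32> {
    v.truncate(dims);
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
    v
}

/// Nomic output is truncated; Qwen3 output passes through untouched
/// because the model already pools and normalises it.
pub fn post_process(
    kind: BackendKind,
    batch: Vec<Vec<f32>>,
) -> Result<Vec<Vec<f32>>, EmbeddingError> {
    check_shape(&batch)?;
    Ok(match kind {
        BackendKind::Nomic => batch
            .into_iter()
            .map(|v| matryoshka_truncate(v, NOMIC_EMBEDDING_DIMENSIONS))
            .collect(),
        BackendKind::Qwen3 => batch,
    })
}
```

Running the guard before any per-backend work means both backends fail the same way on malformed output, with an error instead of a panic.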
Tests: 366 passing under default features AND --features qwen3-embed.
Zero clippy warnings on both. One live integration test
(`test_qwen3_embed_live`) `#[ignore]`d so CI doesn't try to pull the
1.2 GB Qwen3 weights on every run; invoke explicitly with
`cargo test --features qwen3-embed -- --ignored test_qwen3_embed_live`.
Pre-push audit (4 parallel reviewers — security, code-quality,
end-to-end flow trace, external verification) ran clean on:
- Cfg soundness across default / qwen3-embed / qwen3-embed+metal /
nomic-v2 / no-default-features / encryption matrices
- Doc-comment fidelity vs fastembed-5.13.2 source
- External claims (1024d, 32K ctx, MRL 32-1024, L2-normalized,
last-token pooling) all verified against Qwen3 HF model card and
fastembed qwen3.rs
- Zero `unsafe`, zero reachable panics, zero info-disclosure leaks
beyond HF upstream error strings
Day 3 (next session):
- sqlite.rs:663 and :669 — write EmbeddingService::model_name()
instead of hardcoded "nomic-embed-text-v1.5"
- sqlite.rs:1639 get_query_embedding — wrap query text with
qwen3_format_query() before calling embed()
- sqlite.rs load_embeddings_into_index — refuse cross-backend loads
(legacy nomic rows under qwen3 build) instead of silent re-use
- Add a migration warning when a backend mismatch is detected
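
Two of the Day 3 items can be sketched as follows, under stated assumptions. The instruction text is the example task from the Qwen3 embedding model card and may differ from the actual `QWEN3_QUERY_INSTRUCTION` value. The `qwen3_active` parameter stands in for the compile-time feature check so the sketch runs in any build. `check_stored_model` is a hypothetical helper name, not a function in the source tree.

```rust
/// Assumption: example instruction from the model card; the constant in
/// the source tree may use different wording.
pub const QWEN3_QUERY_INSTRUCTION: &str =
    "Given a web search query, retrieve relevant passages that answer the query";

/// Wrap a query in the Instruct prefix (no space after `Query:`).
/// Documents are never passed through this function. With the nomic
/// backend active the text is returned unchanged.
pub fn qwen3_format_query(query: &str, qwen3_active: bool) -> String {
    if qwen3_active {
        format!("Instruct: {}\nQuery:{}", QWEN3_QUERY_INSTRUCTION, query)
    } else {
        query.to_string()
    }
}

/// Hypothetical cross-backend guard: refuse to load a stored embedding
/// whose recorded model name differs from the active backend's.
pub fn check_stored_model(stored: &str, active: &str) -> Result<(), String> {
    if stored == active {
        Ok(())
    } else {
        Err(format!(
            "embedding was written by '{}' but the active backend is '{}'",
            stored, active
        ))
    }
}
```

The guard compares model names and not vector lengths. A length check alone would miss two different models that happen to share a dimension.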