vestige

mirror of https://github.com/samvallad33/vestige.git synced 2026-06-14 20:55:14 +02:00

vestige-core	feat(v2.1.0): Qwen3-Embedding-0.6B backend scaffolding (feature-gated)	2026-04-18 20:54:52 -05:00
vestige-mcp	chore(release): v2.0.6 "Composer" — rebuild + version bump + CHANGELOG	2026-04-18 18:33:31 -05:00

Sam Valladares 82b78ab664 feat(v2.1.0): Qwen3-Embedding-0.6B backend scaffolding (feature-gated)

Day 2 of the Qwen3 migration. Default build is unchanged — nomic stays
the embedding backend, every existing caller continues to see 256-dim
Matryoshka-truncated vectors from the ONNX path. The `qwen3-embed`
feature flag routes to fastembed's standalone Qwen3TextEmbedding
(Candle backend) for 1024-dim native output with 32K context.
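The compile-time switch described above can be sketched as follows. This is a minimal illustration, assuming the constants are plain `usize` values selected by `#[cfg]`; the 256 and 1024 widths come from this commit message, while `index_dimensions` is a hypothetical helper added only for the example.

```rust
/// Width of the Nomic path after Matryoshka truncation (value from the commit text).
pub const NOMIC_EMBEDDING_DIMENSIONS: usize = 256;
/// Native output width of the Qwen3 path (value from the commit text).
pub const QWEN3_EMBEDDING_DIMENSIONS: usize = 1024;

/// Resolves to the active backend's width at compile time.
#[cfg(feature = "qwen3-embed")]
pub const EMBEDDING_DIMENSIONS: usize = QWEN3_EMBEDDING_DIMENSIONS;
#[cfg(not(feature = "qwen3-embed"))]
pub const EMBEDDING_DIMENSIONS: usize = NOMIC_EMBEDDING_DIMENSIONS;

/// Hypothetical helper: the width a vector index should be created with.
pub fn index_dimensions() -> usize {
    EMBEDDING_DIMENSIONS
}
```

Because exactly one of the two `#[cfg]` arms is compiled, callers that only reference `EMBEDDING_DIMENSIONS` need no changes when the feature is toggled.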

Honest scope note: this commit is SCAFFOLDING. Under `qwen3-embed`
the backend initialises cleanly, the vector index now sizes itself to
1024d via feature-gated DEFAULT_DIMENSIONS, and the full 366-test lib
suite passes. End-to-end ingest + search under qwen3-embed still has
two gaps that Day 3 closes: sqlite.rs hardcodes the embedding_model
string as 'nomic-embed-text-v1.5' at the write sites, and
get_query_embedding doesn't call qwen3_format_query on the query text.
Neither is a regression for default builds — both are explicit Day 3
work items tracked in the audit inventory.
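For orientation, the query-side formatting that the second gap refers to might look like the sketch below. It assumes the `Instruct: ...\nQuery:...` layout of the model card's `get_detailed_instruct` reference (no space after `Query:`). The instruction text shown is an assumption, not the confirmed value of `QWEN3_QUERY_INSTRUCTION`, and `format_with_instruction` is a hypothetical helper introduced for the example.

```rust
/// ASSUMPTION: placeholder task description; the real constant may differ.
pub const QWEN3_QUERY_INSTRUCTION: &str =
    "Given a web search query, retrieve relevant passages that answer the query";

/// Hypothetical helper: builds the asymmetric query prompt.
/// Note that nothing separates `Query:` from the query text.
pub fn format_with_instruction(instruction: &str, query: &str) -> String {
    format!("Instruct: {}\nQuery:{}", instruction, query)
}

/// Under the Qwen3 backend, queries get the instruction prefix.
#[cfg(feature = "qwen3-embed")]
pub fn qwen3_format_query(query: &str) -> String {
    format_with_instruction(QWEN3_QUERY_INSTRUCTION, query)
}

/// Under the Nomic backend, the helper is a no-op so callers can wrap unconditionally.
#[cfg(not(feature = "qwen3-embed"))]
pub fn qwen3_format_query(query: &str) -> String {
    query.to_string()
}
```

Documents would bypass this helper entirely and be embedded raw; only the query path wraps its input.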

What's here:

- New `Backend` enum wraps either `TextEmbedding` (Nomic ONNX) or
  `Qwen3TextEmbedding` (Candle) behind the same Mutex<OnceLock<...>>
  the rest of Vestige already calls through. `EmbeddingService::embed`
  dispatches via `Backend::embed_batch` + `Backend::post_process` so
  the public API shape doesn't change.
- `qwen3-embed` Cargo feature = fastembed/qwen3 + direct candle-core
  pinned to =0.10.2 (exact, not caret — supply-chain defence alongside
  Cargo.lock; fastembed doesn't re-export candle_core types so we need
  a direct dep path for candle_core::{Device, DType}).
- `qwen3_format_query()` helper + `QWEN3_QUERY_INSTRUCTION` constant.
  Qwen3 is asymmetric — queries require the Instruct prefix, documents
  go in raw. Prefix format matches the canonical `get_detailed_instruct`
  Python reference in the HF model card (no space after `Query:`). The
  helper is a no-op under the nomic backend so upstream code can wrap
  queries unconditionally.
- Per-backend dimensions: `NOMIC_EMBEDDING_DIMENSIONS = 256`,
  `QWEN3_EMBEDDING_DIMENSIONS = 1024`. `EMBEDDING_DIMENSIONS` resolves
  to the active backend at compile time for back-compat.
- `search/vector.rs::DEFAULT_DIMENSIONS` and
  `advanced/adaptive_embedding.rs::{DEFAULT_DIMENSIONS, CODE_DIMENSIONS}`
  feature-gated to match the active backend so the USearch index
  sizes itself correctly.
- Per-backend model_name() returning the HF repo ID ("nomic-ai/..." or
  "Qwen/..."). Will be threaded through storage write sites in Day 3.
- MAX_TEXT_LENGTH bumps to 32K under qwen3-embed to match Qwen3's
  context window; stays at 8K for nomic.
- Backend::post_process applies matryoshka_truncate for Nomic only;
  Qwen3 output is already last-token pooled + L2-normalized by the
  Candle model (verified in fastembed-5.13.2 qwen3.rs:1124-1125).
- Device selection: `#[cfg(feature = "metal")]` uses
  Device::new_metal(0) with CPU fallback on failure; otherwise CPU.
  CUDA auto-selection deferred to Day 3+.
- Shape-contract guard at the Backend output boundary — empty outer
  OR inner vectors return EmbeddingError::EmbeddingFailed instead of
  the previous `.unwrap()` + zero-dim vector reaching USearch.
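
The dispatch, post-processing, and shape-contract items above can be sketched together. This is a simplified illustration, not the code in this commit: `StubModel` stands in for the two fastembed model types and returns canned vectors, the error enum only mirrors the variant named above, and the assumption that `matryoshka_truncate` re-normalizes after truncating is mine.

```rust
/// Mirrors the `EmbeddingError::EmbeddingFailed` variant named in the commit text.
#[derive(Debug, PartialEq)]
pub enum EmbeddingError {
    EmbeddingFailed(String),
}

/// Hypothetical stand-in for a real model: returns canned vectors instead of running inference.
pub struct StubModel(pub Vec<Vec<f32>>);

pub enum Backend {
    Nomic(StubModel),
    Qwen3(StubModel),
}

const NOMIC_EMBEDDING_DIMENSIONS: usize = 256;

/// ASSUMPTION: keep the leading `dims` components, then L2-normalize again.
fn matryoshka_truncate(mut v: Vec<f32>, dims: usize) -> Vec<f32> {
    v.truncate(dims);
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
    v
}

impl Backend {
    /// Both variants expose the same batch call; the stub ignores the input texts.
    fn embed_batch(&self, _texts: &[&str]) -> Vec<Vec<f32>> {
        match self {
            Backend::Nomic(m) | Backend::Qwen3(m) => m.0.clone(),
        }
    }

    /// Nomic output is truncated; Qwen3 output is passed through untouched.
    fn post_process(&self, v: Vec<f32>) -> Vec<f32> {
        match self {
            Backend::Nomic(_) => matryoshka_truncate(v, NOMIC_EMBEDDING_DIMENSIONS),
            Backend::Qwen3(_) => v,
        }
    }

    /// Single-text entry point with the shape guard: an empty outer or inner
    /// vector becomes an error instead of a zero-dim vector reaching the index.
    pub fn embed(&self, text: &str) -> Result<Vec<f32>, EmbeddingError> {
        let first = self
            .embed_batch(&[text])
            .into_iter()
            .next()
            .ok_or_else(|| EmbeddingError::EmbeddingFailed("backend returned no vectors".into()))?;
        if first.is_empty() {
            return Err(EmbeddingError::EmbeddingFailed("backend returned an empty vector".into()));
        }
        Ok(self.post_process(first))
    }
}
```

Keeping the guard inside `embed` means callers see one error path regardless of which backend is compiled in.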

Tests: 366 passing under default features AND --features qwen3-embed.
Zero clippy warnings on both. One live integration test
(`test_qwen3_embed_live`) `#[ignore]`d so CI doesn't try to pull the
1.2 GB Qwen3 weights on every run; invoke explicitly with
`cargo test --features qwen3-embed -- --ignored test_qwen3_embed_live`.

Pre-push audit (4 parallel reviewers — security, code-quality,
end-to-end flow trace, external verification) ran clean on:
- Cfg soundness across default / qwen3-embed / qwen3-embed+metal /
  nomic-v2 / no-default-features / encryption matrices
- Doc-comment fidelity vs fastembed-5.13.2 source
- External claims (1024d, 32K ctx, MRL 32-1024, L2-normalized,
  last-token pooling) all verified against Qwen3 HF model card and
  fastembed qwen3.rs
- Zero `unsafe`, zero reachable panics, zero info-disclosure leaks
  beyond HF upstream error strings

Day 3 (next session):
- sqlite.rs:663 and :669 — write EmbeddingService::model_name()
  instead of hardcoded "nomic-embed-text-v1.5"
- sqlite.rs:1639 get_query_embedding — wrap query text with
  qwen3_format_query() before calling embed()
- sqlite.rs load_embeddings_into_index — refuse cross-backend loads
  (legacy nomic rows under qwen3 build) instead of silent re-use
- Add a migration warning when a backend mismatch is detected
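
One possible shape for the cross-backend guard in the Day 3 list is sketched below. It is an illustration under assumptions, not the planned implementation: `StoredEmbedding` and `partition_loadable` are hypothetical names, and the rule "model name and vector width must both match" is a guess at what refusing a cross-backend load would mean.

```rust
/// Hypothetical view of one persisted embedding row.
pub struct StoredEmbedding {
    pub id: u64,
    pub model: String,
    pub vector: Vec<f32>,
}

/// Splits rows into those safe to load into the index and a count of rows
/// written by a different backend. Mismatched rows are skipped rather than
/// silently re-used, and a warning is printed once per load.
pub fn partition_loadable<'a>(
    rows: &'a [StoredEmbedding],
    active_model: &str,
    active_dims: usize,
) -> (Vec<&'a StoredEmbedding>, usize) {
    let (loadable, mismatched): (Vec<&StoredEmbedding>, Vec<&StoredEmbedding>) = rows
        .iter()
        .partition(|row| row.model == active_model && row.vector.len() == active_dims);

    if !mismatched.is_empty() {
        eprintln!(
            "warning: skipped {} embedding(s) not produced by '{}' at {} dimensions; re-embedding is required",
            mismatched.len(),
            active_model,
            active_dims
        );
    }
    (loadable, mismatched.len())
}
```

Checking the vector width as well as the model string would also catch legacy rows whose stored model name is wrong, which matters while the write sites still hardcode the Nomic name.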

2026-04-18 20:54:52 -05:00
