`--no-session-persistence` only blocks resumability — Claude Code
still writes `~/.claude/projects/<workspace>/<id>.jsonl` for every
session. Reusing our deterministic brightstaff session id (a v5 UUID
hashed from the conversation prefix) caused the CLI to fail every
second request for the same conversation with
`Error: Session ID ... is already in use`.
Generate a per-spawn random v4 UUID inside `ClaudeProcess::spawn` and
pass that to `claude --session-id` (and stamp it on every stdin
JSONL event so the CLI accepts the turn). Keep the deterministic
brightstaff session id as the `SessionManager` map key so retries
still hit the hot child.
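
The split between the two ids can be sketched as below. This is a minimal, dependency-free illustration, not the actual `ClaudeProcess` / `SessionManager` code: `Child`, `Sessions`, and `evict` are hypothetical names, and an atomic counter stands in for the random v4 UUID so the sketch needs no external crate.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

/// Stand-in for a random v4 UUID (assumption: a counter keeps the sketch
/// free of external crates; the real change uses a v4 UUID).
static NEXT_SPAWN_ID: AtomicU64 = AtomicU64::new(1);

/// Hypothetical child handle. `cli_session_id` is the value that would be
/// passed to `claude --session-id` and stamped on each stdin JSONL event.
struct Child {
    cli_session_id: String,
}

impl Child {
    /// Every spawn gets a fresh id, so the CLI never sees a reused one.
    fn spawn() -> Self {
        let n = NEXT_SPAWN_ID.fetch_add(1, Ordering::Relaxed);
        Child { cli_session_id: format!("spawn-{n}") }
    }
}

/// Hypothetical manager keyed by the deterministic conversation id.
#[derive(Default)]
struct Sessions {
    by_conversation: HashMap<String, Child>,
}

impl Sessions {
    /// Reuse the hot child for a known conversation, otherwise spawn one.
    fn get_or_spawn(&mut self, conversation_id: &str) -> &Child {
        self.by_conversation
            .entry(conversation_id.to_string())
            .or_insert_with(Child::spawn)
    }

    /// Drop a child (for example after idle eviction); the next request
    /// for the same conversation spawns again with a new CLI session id.
    fn evict(&mut self, conversation_id: &str) {
        self.by_conversation.remove(conversation_id);
    }
}
```

A retry for the same conversation lands on the same child, while a respawn after eviction presents a new id to the CLI.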
- main.rs: rebuild claude_cli_config_from_env on top of
SessionManagerConfig::default() and only override fields that have a
parsed env var, so the defaults live in exactly one place.
- hermesllm/apis/claude_cli.rs: delete the dead
`_touch_messages_message_type` stub and its unused MessagesMessage
import; apply pedantic-clippy fixes that touch the new code
(clone_from over `= x.clone()`, Map::default() over Default::default(),
map_or_else over .map(...).unwrap_or_else(...), str::to_string method
reference, collapsed identical match arms).
- hermesllm/providers/id.rs: collapse the two match arms that mapped
"claude-cli" and "claude_cli" to ProviderId::ClaudeCli.
- hermesllm/tests/claude_cli_fixtures.rs: collect text deltas straight
into a String instead of `.collect::<Vec<_>>().join("")`.
- brightstaff/tests/claude_cli_bridge.rs: add a Drop impl on
BridgeFixture so a panicking test still releases the listener task.
- The synthetic message_start path only fired when the very first
observed event was a Result. If the CLI ever emitted (say) a bare
ContentBlockStart first, we'd ship malformed Anthropic SSE without a
preceding message_start. Trigger the synthesis on any first
stream-advancing event that isn't a MessageStart.
- Make every send-to-client branch consistent: break out of the loop
when the receiver has gone away (mpsc send returned Err), so we don't
keep generating events for a vanished client.
- Replace serde_json::to_string(...).unwrap() in the streaming error
path with the same fallback json_response already uses ("{}" on
serialize failure). No more panic surface in the streaming worker.
- Drop the dead `_touch_stream_module` placeholder and its unused
`use futures::stream` import.
- Convert ClaudeProcess::last_used from tokio::sync::Mutex<Instant> to
std::sync::Mutex<Instant>: the critical section is one Copy read/write
with no .await, so a sync mutex lets SessionManager iterate sessions
without holding the map lock across an await point. Fixes the
lock-across-await pattern in lru_session_id and evict_idle.
- Simplify SessionManager::get_or_spawn to a single map-lock acquisition
on the fast path; only release the lock for the rare case where we
need to await a victim shutdown before spawning.
- Replace the hand-rolled "deterministic UUID via DefaultHasher" with a
real UUIDv5 over the OID namespace (uuid feature `v5`). Stable across
Rust toolchain versions, unlike SipHash, and matches what the doc on
the helper claimed all along.
- Introduce ProcessError::MissingStdio { which } so spawns where
Stdio::piped() somehow returned None surface as their own programmer-
error variant rather than masquerading as ExitedEarly.
- Delete the dead is_zero() helper.
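
The message_start synthesis and the break-on-closed-receiver rule from the list above can be sketched together. This is an illustrative sketch only: `Event` and `forward` are hypothetical names, `std::sync::mpsc` stands in for the async mpsc channel, and every event is treated as stream-advancing for simplicity.

```rust
use std::sync::mpsc::Sender;

/// Simplified stand-ins for the CLI's stream events (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Event {
    MessageStart,
    ContentBlockStart,
    TextDelta(String),
    Result,
}

/// Forward events to the client, synthesizing `MessageStart` when the first
/// event is anything else. Returns how many events were actually sent.
fn forward(events: impl IntoIterator<Item = Event>, tx: &Sender<Event>) -> usize {
    let mut sent = 0;
    let mut started = false;
    for ev in events {
        if !started {
            started = true;
            if ev != Event::MessageStart {
                // The stream began without a message_start: emit one first.
                if tx.send(Event::MessageStart).is_err() {
                    break; // receiver dropped, stop producing
                }
                sent += 1;
            }
        }
        if tx.send(ev).is_err() {
            break; // receiver dropped, stop producing
        }
        sent += 1;
    }
    sent
}
```

Both send sites use the same exit, so a vanished client stops the loop regardless of which branch notices first.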
Spawn the local `claude` binary as a subprocess and expose it as an
Anthropic Messages-compatible provider. Hosted in brightstaff
(`CLAUDE_CLI_LISTEN_ADDR`), with session reuse, idle TTL, and watchdog.
User-facing surface is `model_providers: [{ model: claude-cli/* }]` —
the Python CLI auto-fills name/provider_interface/base_url/access_key
and the launcher (native + supervisord) enables the bridge listener
only when at least one claude-cli provider is present.
* signals: restore the pre-port flag marker emoji
#903 inadvertently replaced the legacy FLAG_MARKER (U+1F6A9, '🚩') with
'[!]', which broke any downstream dashboard / alert that searches span
names for the flag emoji. Restores the original marker and updates the
#910 docs pass to match.
- crates/brightstaff/src/signals/analyzer.rs: FLAG_MARKER back to
"\u{1F6A9}" with a comment noting the backwards-compatibility
reason so it doesn't drift again.
- docs/source/concepts/signals.rst and docs/source/guides/observability/
tracing.rst: swap every '[!]' reference (subheading text, example
span name, tip box, dashboard query hint) back to 🚩.
Verified: cargo test -p brightstaff --lib (162 passed, 1 ignored);
sphinx-build clean on both files; rendered HTML shows 🚩 in all
flag-marker references.
Made-with: Cursor
* fix: silence manual_checked_ops clippy lint (rustc 1.95)
Pre-existing warning in router/stress_tests.rs that becomes an error
under CI's -D warnings with rustc 1.95. Replace the manual if/else
with growth.checked_div(num_iterations).unwrap_or(0) as clippy
suggests.
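
For reference, a minimal sketch of the replacement. The function name and the `u64` operand types are assumptions for illustration; only the `checked_div(...).unwrap_or(0)` expression comes from the change itself.

```rust
/// Average growth per iteration; 0 when there were no iterations.
/// `checked_div` returns `None` on a zero divisor, which replaces the
/// manual `if num_iterations > 0 { ... } else { 0 }` branch.
fn growth_per_iteration(growth: u64, num_iterations: u64) -> u64 {
    growth.checked_div(num_iterations).unwrap_or(0)
}
```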
Made-with: Cursor
* signals: port to layered taxonomy with dual-emit OTel
Made-with: Cursor
* fix: silence collapsible_match clippy lint (rustc 1.95)
Made-with: Cursor
* test: parity harness for rust vs python signals analyzer
Validates the brightstaff signals port against the katanemo/signals Python
reference on lmsys/lmsys-chat-1m. Adds a signals_replay bin emitting python-
compatible JSON, a pyarrow-based driver (bypasses the datasets loader pickle
bug on python 3.14), a 3-tier comparator, and an on-demand workflow_dispatch
CI job.
Made-with: Cursor
* Remove signals test from the gitops flow
* style: format parity harness with black
Made-with: Cursor
* signals: group summary by taxonomy, factor misalignment_ratio
Addresses #903 review feedback from @nehcgs:
- generate_summary() now renders explicit Interaction / Execution /
Environment headers so the paper taxonomy is visible at a glance,
even when no signals fired in a given layer. Quality-driving callouts
(high misalignment rate, looping detected, escalation requested) are
appended after the layer summary as an alerts tail.
- repair_ratio (legacy taxonomy name) renamed to misalignment_ratio
and factored into a single InteractionSignals::misalignment_ratio()
helper so assess_quality and generate_summary share one source of
truth instead of recomputing the same divide twice.
Two new unit tests pin the layer headers and the (sev N) severity
suffix. Parity with the python reference is preserved at the Tier-A
level (per-type counts + overall_quality); only the human-readable
summary string diverges, which the parity comparator already classifies
as Tier-C.
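
The single-source-of-truth shape might look like the sketch below. The struct fields, the 0.3 threshold, and the summary wording are all hypothetical placeholders; only the idea of one `misalignment_ratio()` helper shared by `assess_quality` and `generate_summary` is taken from this change.

```rust
/// Hypothetical subset of the interaction-layer counters.
struct InteractionSignals {
    misalignment_count: usize,
    user_turns: usize,
}

impl InteractionSignals {
    /// Share of user turns flagged as misaligned; 0.0 when there are no turns.
    fn misalignment_ratio(&self) -> f64 {
        if self.user_turns == 0 {
            0.0
        } else {
            self.misalignment_count as f64 / self.user_turns as f64
        }
    }
}

/// Both callers go through the helper instead of repeating the divide.
fn assess_quality(s: &InteractionSignals) -> &'static str {
    // The 0.3 cutoff is an illustrative value, not the analyzer's.
    if s.misalignment_ratio() > 0.3 { "poor" } else { "good" }
}

fn generate_summary(s: &InteractionSignals) -> String {
    format!("Interaction: misalignment {:.0}%", s.misalignment_ratio() * 100.0)
}
```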
Made-with: Cursor
* add pluggable session cache with Redis backend
* add Redis session affinity demos (Docker Compose and Kubernetes)
* address PR review feedback on session cache
* document Redis session cache backend for model affinity
* sync rendered config reference with session_cache addition
* add tenant-scoped Redis session cache keys and remove dead log_affinity_hit
- Add tenant_header to SessionCacheConfig; when set, cache keys are scoped
as plano:affinity:{tenant_id}:{session_id} for multi-tenant isolation
- Thread tenant_id through RouterService, routing_service, and llm handlers
- Use Cow<'_, str> in session_key to avoid allocation when no tenant is set
- Remove unused log_affinity_hit (logging was already inlined at call sites)
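
A minimal sketch of the `Cow` approach, assuming the un-scoped key is simply the borrowed session id (that detail is a guess made for illustration; the scoped format is the one listed above). The signature is hypothetical.

```rust
use std::borrow::Cow;

/// Borrow the session id when no tenant is set; allocate only for the
/// tenant-scoped form `plano:affinity:{tenant_id}:{session_id}`.
fn session_key<'a>(tenant_id: Option<&str>, session_id: &'a str) -> Cow<'a, str> {
    match tenant_id {
        Some(tenant) => Cow::Owned(format!("plano:affinity:{tenant}:{session_id}")),
        None => Cow::Borrowed(session_id),
    }
}
```

Callers can treat the result as a `&str` through deref either way, so only the tenant path pays for a `String`.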
* remove session_affinity_redis and session_affinity_redis_k8s demos
* fix: route Perplexity OpenAI paths without /v1
* add tests for Perplexity provider handling in LLM module
* refactor: use constant for Perplexity provider prefix in LLM module
* moving const to top of file
* support configurable orchestrator model via orchestration config section
* add self-hosting docs and demo for Plano-Orchestrator
* list all Plano-Orchestrator model variants in docs
* use overrides for custom routing and orchestration model
* update docs
* update orchestrator model name
* rename arch provider to plano, use llm_routing_model and agent_orchestration_model
* regenerate rendered config reference
* Add Codex CLI support; xAI response improvements
* Add native Plano running check and update CLI agent error handling
* adding PR suggestions for transformations and code quality
* message extraction logic in ResponsesAPIRequest
* xAI support for Responses API by routing to native endpoint + refactor code
* cleaning up plano cli commands
* adding support for wildcard model providers
* fixing compile errors
* fixing bugs related to default model provider, provider hint and duplicates in the model provider list
* fixed cargo fmt issues
* updating tests to always include the model id
* using default for the prompt_gateway path
* fixed the model name, as gpt-5-mini-2025-08-07 wasn't in the config
* making sure that all aliases and models match the config
* fixed the config generator to allow for base_url providers LLMs to include wildcard models
* re-ran the models list utility and added a shell script to run it
* updating docs to mention wildcard model providers
* updated provider_models.json to yaml, added that file to our docs for reference
* updating the build docs to use the new root-based build
---------
Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-342.local>
* adding support for signals
* reducing false positives for signals like positive interaction
* adding docs. Still need to fix the messages list, but waiting on PR #621
* Improve frustration detection: normalize contractions and refine punctuation
* Further refine test cases with longer messages
* minor doc changes
* fixing echo statement for build
* fixing the messages construction and using the trait for signals
* update signals docs
* fixed some minor doc changes
* added more tests and fixed documentation. PR 100% ready
* made fixes based on PR comments
* Optimize latency
1. replace sliding window approach with trigram containment check
2. add code to pre-compute ngrams for patterns
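
The two steps above can be sketched roughly as follows. This is an assumption-laden illustration rather than the analyzer's code: it uses lowercased character trigrams, and `Pattern`, `trigrams`, and `containment` are hypothetical names.

```rust
use std::collections::HashSet;

/// Character trigrams of a lowercased string.
fn trigrams(text: &str) -> HashSet<String> {
    let chars: Vec<char> = text.to_lowercase().chars().collect();
    chars
        .windows(3)
        .map(|w| w.iter().collect::<String>())
        .collect()
}

/// A pattern with its trigrams computed once, up front.
struct Pattern {
    grams: HashSet<String>,
}

impl Pattern {
    fn new(text: &str) -> Self {
        Self { grams: trigrams(text) }
    }

    /// Fraction of the pattern's trigrams that also occur in the message.
    /// Patterns shorter than three characters have no trigrams and score 0.
    fn containment(&self, message_grams: &HashSet<String>) -> f64 {
        if self.grams.is_empty() {
            return 0.0;
        }
        let hits = self
            .grams
            .iter()
            .filter(|g| message_grams.contains(*g))
            .count();
        hits as f64 / self.grams.len() as f64
    }
}
```

The message's trigram set is built once per message and checked against every precomputed pattern, so no window has to slide over the text per pattern; a caller would compare the score against a threshold of its choosing.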
* removed some debug statements to make tests easier to read
* PR comments to make ObservableStreamProcessor accept optional Vec<Message>
* fixed PR comments
---------
Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-342.local>
Co-authored-by: MeiyuZhong <mariazhong9612@gmail.com>
Co-authored-by: nehcgs <54548843+nehcgs@users.noreply.github.com>
* agents framework demo
* more changes
* add more changes
* pending changes
* fix tests
* fix more
* rebase with main and better handle error from mcp
* add trace for filters
* add test for client error, server error and for mcp error
* update schema validate code and rename kind => type in agent_filter
* fix agent description and pre-commit
* fix tests
* add provider specific request parsing in agents chat
* fix precommit and tests
* cleanup demo
* update readme
* fix pre-commit
* refactor tracing
* fix fmt
* fix: handle MessageContent enum in responses API conversion
- Update request.rs to handle new MessageContent enum structure from main
- MessageContent can now be Text(String) or Items(Vec<InputContent>)
- Handle new InputItem variants (ItemReference, FunctionCallOutput)
- Fixes compilation error after merging latest main (#632)
* address pr feedback
* fix span
* fix build
* update openai version
* first commit with tests to enable state management via memory
* fixed logs to follow the conversational flow a bit better
* added support for supabase
* added the state_storage_v1_responses flag, and use that to store state appropriately
* cleaned up logs and fixed issue with connectivity for llm gateway in weather forecast demo
* fixed mixed inputs from openai v1/responses api (#632)
* fixed mixed inputs from openai v1/responses api
* removing tracing from model-alias-routing
* handling additional input types from openairs
---------
Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-342.local>
* resolving PR comments
---------
Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-342.local>
* adding canonical tracing support via bright-staff
* improved formatting for tools in the traces
* removing anthropic from the currency exchange demo
* using Envoy to transport traces, not calling OTEL directly
* moving otel collector cluster outside tracing if/else
* minor fixes to not write to the OTEL collector if tracing is disabled
* fixed PR comments and added more trace attributes
* more fixes based on PR comments
* more clean up based on PR comments
---------
Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-342.local>