hermesllm

A Rust library for handling LLM (Large Language Model) API requests and responses with unified abstractions across multiple providers.

Features

  • Unified request/response types with provider-specific parsing
  • Support for both streaming and non-streaming responses
  • Type-safe provider identification
  • OpenAI-compatible API structure with extensible provider support

Supported Providers

  • OpenAI
  • Mistral
  • Groq
  • Deepseek
  • Gemini
  • Claude
  • GitHub

Installation

Add to your Cargo.toml:

[dependencies]
hermesllm = { path = "../hermesllm" }  # or appropriate path in workspace

Usage

Basic Request Parsing

use hermesllm::providers::{ProviderRequestType, ProviderRequest, ProviderId};

// Parse request from JSON bytes
let request_bytes = r#"{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]}"#;

// Parse with provider context
let request = ProviderRequestType::try_from((request_bytes.as_bytes(), &ProviderId::OpenAI))?;

// Access request properties
println!("Model: {}", request.model());
println!("User message: {:?}", request.get_recent_user_message());
println!("Is streaming: {}", request.is_streaming());

Working with Responses

use hermesllm::providers::{ProviderResponseType, ProviderResponse};

// Parse response from provider
let response_bytes: &[u8] = /* JSON response body from the provider */;
let response = ProviderResponseType::try_from((response_bytes, ProviderId::OpenAI))?;

// Extract token usage
if let Some((prompt, completion, total)) = response.extract_usage_counts() {
    println!("Tokens used: {}/{}/{}", prompt, completion, total);
}

Handling Streaming Responses

use hermesllm::providers::{ProviderStreamResponseIter, ProviderStreamResponse};

// Create streaming iterator from SSE data
let sse_data = /* Server-Sent Events data */;
let stream = ProviderStreamResponseIter::try_from((sse_data, &ProviderId::OpenAI))?;

// Process streaming chunks
for chunk_result in stream {
    match chunk_result {
        Ok(chunk) => {
            if let Some(content) = chunk.content_delta() {
                print!("{}", content);
            }
            if chunk.is_final() {
                break;
            }
        }
        Err(e) => eprintln!("Stream error: {}", e),
    }
}

Provider Compatibility

use hermesllm::providers::{ProviderId, has_compatible_api, supported_apis};

// Check API compatibility
let provider = ProviderId::Groq;
if has_compatible_api(&provider, "/v1/chat/completions") {
    println!("Provider supports chat completions");
}

// List supported APIs
let apis = supported_apis(&provider);
println!("Supported APIs: {:?}", apis);

Core Types

Provider Types

  • ProviderId - Enum identifying supported providers (OpenAI, Mistral, Groq, etc.)
  • ProviderRequestType - Enum wrapping provider-specific request types
  • ProviderResponseType - Enum wrapping provider-specific response types
  • ProviderStreamResponseIter - Iterator for streaming response chunks

Traits

  • ProviderRequest - Common interface for all request types
  • ProviderResponse - Common interface for all response types
  • ProviderStreamResponse - Interface for streaming response chunks
  • TokenUsage - Interface for token usage information

OpenAI API Types

  • ChatCompletionsRequest - Chat completion request structure
  • ChatCompletionsResponse - Chat completion response structure
  • Message, Role, MessageContent - Message building blocks

Architecture

The library uses a type-safe enum-based approach that:

  • Provides Type Safety: All provider operations are checked at compile time
  • Enables Runtime Provider Selection: Provider can be determined from request headers or config
  • Maintains Clean Abstractions: Common traits hide provider-specific details
  • Supports Extensibility: New providers can be added by extending the enums

All requests are parsed into a common ProviderRequestType enum, which implements the ProviderRequest trait, allowing uniform access to request properties regardless of the underlying provider format.
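The enum-plus-trait dispatch described above can be sketched in a few lines of plain Rust. Note that the type and provider names below (MiniOpenAiRequest, MiniClaudeRequest, MiniRequestType) are invented for illustration and are not part of the hermesllm API; the real crate's types carry many more fields and methods.

```rust
// A common trait, analogous to ProviderRequest: callers program against
// this interface and never need to know which provider they hold.
trait MiniProviderRequest {
    fn model(&self) -> &str;
    fn is_streaming(&self) -> bool;
}

// Provider-specific request types (fields reduced to the minimum).
struct MiniOpenAiRequest { model: String, stream: bool }
struct MiniClaudeRequest { model: String, stream: bool }

impl MiniProviderRequest for MiniOpenAiRequest {
    fn model(&self) -> &str { &self.model }
    fn is_streaming(&self) -> bool { self.stream }
}

impl MiniProviderRequest for MiniClaudeRequest {
    fn model(&self) -> &str { &self.model }
    fn is_streaming(&self) -> bool { self.stream }
}

// One enum wraps every provider-specific type, analogous to
// ProviderRequestType. Its trait impl delegates to the wrapped variant,
// so the match on the provider happens in exactly one place.
enum MiniRequestType {
    OpenAi(MiniOpenAiRequest),
    Claude(MiniClaudeRequest),
}

impl MiniProviderRequest for MiniRequestType {
    fn model(&self) -> &str {
        match self {
            MiniRequestType::OpenAi(r) => r.model(),
            MiniRequestType::Claude(r) => r.model(),
        }
    }
    fn is_streaming(&self) -> bool {
        match self {
            MiniRequestType::OpenAi(r) => r.is_streaming(),
            MiniRequestType::Claude(r) => r.is_streaming(),
        }
    }
}

fn main() {
    // The provider variant can be chosen at runtime (e.g. from a header),
    // while every method call remains checked at compile time.
    let req = MiniRequestType::OpenAi(MiniOpenAiRequest {
        model: "gpt-4".into(),
        stream: false,
    });
    assert_eq!(req.model(), "gpt-4");
    assert!(!req.is_streaming());
    println!("model = {}", req.model());
}
```

Adding a provider under this pattern means adding one enum variant and extending the single delegating match, which the compiler then enforces exhaustively.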

Examples

See the src/lib.rs tests for complete working examples of:

  • Parsing requests with provider context
  • Handling streaming responses
  • Working with token usage information

License

This project is licensed under the MIT License.