diff --git a/crates/brightstaff/src/signals/analyzer.rs b/crates/brightstaff/src/signals/analyzer.rs
index 433bfe04..35e342eb 100644
--- a/crates/brightstaff/src/signals/analyzer.rs
+++ b/crates/brightstaff/src/signals/analyzer.rs
@@ -21,9 +21,10 @@ use super::schemas::{
 use super::text_processing::NormalizedMessage;
 
 /// Marker appended to the span operation name when concerning signals are
-/// detected. Kept in sync with the previous implementation for backward
-/// compatibility with downstream consumers.
-pub const FLAG_MARKER: &str = "[!]";
+/// detected. The 🚩 emoji (U+1F6A9) matches the pre-port implementation so
+/// downstream consumers that search for flagged traces by span-name emoji
+/// keep working.
+pub const FLAG_MARKER: &str = "\u{1F6A9}";
 
 /// ShareGPT-shaped row used as the canonical input to the analyzer's
 /// detectors. `from` is one of `"human"`, `"gpt"`, `"function_call"`,
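A minimal downstream-consumer sketch of the span-name check that the new doc
comment describes (illustrative only, not part of the patch; ``is_flagged``
is a hypothetical helper)::

    // Mirrors FLAG_MARKER from analyzer.rs above.
    const FLAG_MARKER: &str = "\u{1F6A9}";

    /// True when a span's operation name carries the concerning-signals marker.
    fn is_flagged(span_name: &str) -> bool {
        span_name.contains(FLAG_MARKER)
    }

    fn main() {
        assert!(is_flagged("POST /v1/chat/completions gpt-4 \u{1F6A9}"));
        assert!(!is_flagged("POST /v1/chat/completions gpt-4"));
    }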
diff --git a/docs/source/concepts/signals.rst b/docs/source/concepts/signals.rst
index 9e8b048b..d5e25e7e 100644
--- a/docs/source/concepts/signals.rst
+++ b/docs/source/concepts/signals.rst
@@ -4,39 +4,64 @@
 Signals™
 ========
 
-Agentic Signals are lightweight, model-free behavioral indicators computed from
-live interaction trajectories and attached to your existing
-OpenTelemetry traces. They make it possible to triage the small fraction of
-trajectories that are most likely to be informative — brilliant successes or
-**severe failures** — without running an LLM-as-judge on every session.
+Agentic Signals are lightweight, model-free behavioral indicators computed
+from live interaction trajectories and attached to your existing
+OpenTelemetry traces. They are the instrumentation layer of a closed-loop
+improvement flywheel for agents — turning raw production traffic into
+prioritized data that can drive prompt, routing, and model updates without
+running an LLM-as-judge on every session.
 
 The framework implemented here follows the taxonomy and detector design in
-*Signals: Trajectory Sampling and Triage for Agentic Interactions* (Chen,
-Hafeez, Paracha, 2026; `arXiv:2604.00356
-<https://arxiv.org/abs/2604.00356>`_). All detectors are computed without
-model calls; the entire pipeline attaches structured attributes and span
-events to existing spans so your dashboards and alerts work unmodified.
+*Signals: Trajectory Sampling and Triage for Agentic Interactions*
+(`Chen et al., 2026 <https://arxiv.org/abs/2604.00356>`_). All detectors
+are computed without model calls; the entire pipeline attaches structured
+attributes and span events to existing spans so your dashboards and alerts
+work unmodified.
 
-The Problem: Knowing What's "Good"
-==================================
+Why Signals Matter: The Improvement Flywheel
+============================================
 
-One of the hardest parts of building agents is measuring how well they
-perform in the real world.
+Agentic applications are increasingly deployed at scale, yet improving them
+after deployment remains difficult. Production trajectories are long,
+numerous, and non-deterministic, making exhaustive human review infeasible
+and auxiliary LLM evaluation expensive. As a result, teams face a
+bottleneck: they cannot score every response, inspect every trace, or
+reliably identify which failures and successes should inform the next model
+update. Without a low-cost triage layer, the feedback loop from production
+behavior to model improvement remains incomplete.
 
-**Offline testing** relies on hand-picked examples and happy-path scenarios,
-missing the messy diversity of real usage. Developers manually prompt models,
-evaluate responses, and tune prompts by guesswork — a slow, incomplete
-feedback loop.
+Signals close this loop by cheaply identifying which interactions among
+millions are worth inspecting:
 
-**Production debugging** floods developers with traces and logs but provides
-little guidance on which interactions actually matter. Finding failures means
-painstakingly reconstructing sessions and manually labeling quality issues.
+1. **Instrument.** Live trajectories are scored with model-free signals
+   attached as structured attributes on existing OpenTelemetry spans,
+   organized under a fixed taxonomy of interaction, execution, and
+   environment signals. This requires no additional model calls,
+   infrastructure, or changes to online agent behavior.
+2. **Sample & triage.** Signal attributes act as filters: they surface
+   severe failures, retrieve representative exemplars, and exclude the
+   uninformative middle. In our experiments, signal-based sampling
+   achieves 82% informativeness on :math:`\tau`-bench, compared with 54%
+   for random sampling, yielding a 1.52× efficiency gain per informative
+   trajectory.
+3. **Data construction.** The triaged subset becomes targeted input for
+   constructing preference datasets or supervised fine-tuning datasets
+   from production trajectories.
+4. **Model optimization.** The resulting datasets are used to update the
+   model through methods such as DPO, RLHF, or supervised fine-tuning, so
+   optimization is driven by targeted production behavior rather than
+   undifferentiated trace noise.
+5. **Deploy.** The improved model is deployed and immediately
+   re-instrumented with the same signals, enabling teams to measure
+   whether the change improved production behavior and to feed the next
+   iteration.
 
-You can't score every response with an LLM-as-judge (too expensive, too slow)
-or manually review every trace (doesn't scale). What you need are
-**behavioral signals** — fast, economical proxies that don't label quality
-outright but dramatically shrink the search space, pointing to sessions most
-likely to be broken or brilliant.
+This loop depends on the first step being nearly free. The framework is
+therefore designed around fixed-taxonomy, model-free detectors with
+:math:`O(\text{messages})` cost, no online behavior change, and no
+dependence on expensive evaluator models. By making production traces
+searchable and sampleable at scale, signals turn raw agent telemetry into a
+practical model-optimization flywheel.
 
 What Are Behavioral Signals?
 ============================
@@ -61,150 +86,159 @@ agent performance.
 Embedded directly into traces, they make it easy to spot friction as it
 happens: where users struggle, where agents loop, where tool failures
 cluster, and where escalations occur.
 
-Signals vs Response Quality
-===========================
-
-Behavioral signals and response quality are complementary.
-
-**Response Quality**
-    Domain-specific correctness: did the agent do the right thing given
-    business rules, user intent, and operational context? This often
-    requires subject-matter experts or outcome instrumentation and is
-    time-intensive but irreplaceable.
-
-**Behavioral Signals**
-    Observable patterns that correlate with quality: misalignment,
-    stagnation, disengagement, satisfaction, tool failures, loops, and
-    environment exhaustion. Fast to compute and valuable for prioritizing
-    which traces deserve inspection.
-
-Used together, signals tell you *where to look*, and quality evaluation tells
-you *what went wrong (or right)*.
-
 Signal Taxonomy
 ===============
 
 Signals are organized into three top-level **layers**, each with its own
 intent. Every detected signal belongs to exactly one leaf type under one of
-seven categories.
+seven categories. The per-category summaries and leaf-type descriptions
+below are borrowed verbatim from the reference implementation at
+`katanemo/signals <https://github.com/katanemo/signals>`_ to keep the
+documentation and the detector contract in sync.
 
-Interaction (user ↔ agent conversational quality)
+Interaction — user ↔ agent conversational quality
 -------------------------------------------------
 
-Covers how the discourse itself is going: is the user being understood, is
-the conversation progressing, is the user engaged, is the user satisfied?
+**Misalignment** — Misalignment signals capture semantic or intent mismatch
+between the user and the agent, such as rephrasing, corrections,
+clarifications, and restated constraints. These signals do not assert that
+either party is "wrong"; they only indicate that shared understanding has
+not yet been established.
 
 .. list-table::
    :header-rows: 1
-   :widths: 25 25 50
+   :widths: 30 70
 
-   * - Category
-     - Leaf signal type
-     - Meaning
-   * - **Misalignment**
-     - ``misalignment.correction``
-     - User explicitly corrects the agent ("No, I meant Paris, France").
-   * -
-     - ``misalignment.rephrase``
-     - User reformulates a previous request; semantic overlap is high.
-   * -
-     - ``misalignment.clarification``
-     - User signals confusion ("I don't understand", "what do you mean").
-   * - **Stagnation**
-     - ``stagnation.dragging``
-     - Conversation length significantly exceeds the expected baseline.
-   * -
-     - ``stagnation.repetition``
-     - Assistant near-duplicates prior turns (bigram Jaccard similarity).
-   * - **Disengagement**
-     - ``disengagement.escalation``
-     - User asks to speak to a human / supervisor / support.
-   * -
-     - ``disengagement.quit``
-     - User expresses intent to give up or abandon the session.
-   * -
-     - ``disengagement.negative_stance``
-     - User expresses frustration: complaints, ALL CAPS, excessive
-       punctuation, agent-directed profanity.
-   * - **Satisfaction**
-     - ``satisfaction.gratitude``
-     - User expresses thanks or appreciation.
-   * -
-     - ``satisfaction.confirmation``
-     - User confirms the outcome ("got it", "sounds good").
-   * -
-     - ``satisfaction.success``
-     - User confirms task success ("that worked", "perfect").
+   * - Leaf signal type
+     - Description
+   * - ``misalignment.correction``
+     - Explicit corrections, negations, mistake acknowledgments.
+   * - ``misalignment.rephrase``
+     - Rephrasing indicators, alternative explanations.
+   * - ``misalignment.clarification``
+     - Confusion expressions, requests for clarification.
 
-Execution (agent-caused action quality)
+**Stagnation** — Stagnation signals capture cases where the discourse
+continues but fails to make visible progress. This includes near-duplicate
+assistant responses, circular explanations, repeated scaffolding, and other
+forms of linguistic degeneration.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 30 70
+
+   * - Leaf signal type
+     - Description
+   * - ``stagnation.dragging``
+     - Excessive turn count, conversation not progressing efficiently.
+   * - ``stagnation.repetition``
+     - Near-duplicate or repetitive assistant responses.
+
+**Disengagement** — Disengagement signals mark the withdrawal of
+cooperative intent from the interaction. These include explicit requests to
+exit the agent flow (e.g., "talk to a human"), strong negative stances, and
+abandonment markers.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 30 70
+
+   * - Leaf signal type
+     - Description
+   * - ``disengagement.escalation``
+     - Requests for human agent or support.
+   * - ``disengagement.quit``
+     - Notification to quit or leave.
+   * - ``disengagement.negative_stance``
+     - Complaints, frustration, negative sentiment.
+
+**Satisfaction** — Satisfaction signals indicate explicit stabilization and
+completion of the interaction. These include expressions of gratitude,
+success confirmations, and closing utterances. We use these signals to
+sample exemplar traces rather than to assign quality scores.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 30 70
+
+   * - Leaf signal type
+     - Description
+   * - ``satisfaction.gratitude``
+     - Expressions of thanks and appreciation.
+   * - ``satisfaction.confirmation``
+     - Explicit satisfaction expressions.
+   * - ``satisfaction.success``
+     - Confirmation of task completion or understanding.
+
+Execution — agent-caused action quality
 ---------------------------------------
 
-Covers attempts to act in the world that don't yield usable outcomes.
-Requires tool-call traces (``function_call`` / ``observation``) to fire.
+**Failure** — Detects agent-caused failures in tool/function usage. These
+are issues the agent is responsible for (as opposed to environment failures
+which are external system issues). Requires tool-call traces
+(``function_call`` / ``observation``) to fire.
 
 .. list-table::
    :header-rows: 1
-   :widths: 25 25 50
+   :widths: 30 70
 
-   * - Category
-     - Leaf signal type
-     - Meaning
-   * - **Failure**
-     - ``failure.invalid_args``
-     - Tool call rejected due to schema / argument validation failure.
-   * -
-     - ``failure.bad_query``
-     - Downstream query rejected as malformed by the tool.
-   * -
-     - ``failure.tool_not_found``
-     - Agent called a tool that doesn't exist or isn't available.
-   * -
-     - ``failure.auth_misuse``
-     - Authentication / authorization failure on a tool call.
-   * -
-     - ``failure.state_error``
-     - Call-order / state-machine violation (e.g. commit without begin).
-   * - **Loops**
-     - ``loops.retry``
-     - Same tool call repeated with near-identical arguments.
-   * -
-     - ``loops.parameter_drift``
-     - Same tool called with slowly drifting parameters (walk pattern).
-   * -
-     - ``loops.oscillation``
-     - Call A → Call B → Call A → Call B pattern across multiple turns.
+   * - Leaf signal type
+     - Description
+   * - ``execution.failure.invalid_args``
+     - Wrong type, missing required field.
+   * - ``execution.failure.bad_query``
+     - Empty results due to overly narrow/wrong query.
+   * - ``execution.failure.tool_not_found``
+     - Agent called non-existent tool.
+   * - ``execution.failure.auth_misuse``
+     - Agent didn't pass credentials correctly.
+   * - ``execution.failure.state_error``
+     - Tool called in wrong state/order.
 
-Environment (external system / boundary conditions)
+**Loops** — Detects behavioral patterns where the agent gets stuck
+repeating tool calls. These are distinct from
+``interaction.stagnation`` (conversation text repetition) and
+``execution.failure`` (single tool errors) — these detect tool-level
+behavioral loops.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 30 70
+
+   * - Leaf signal type
+     - Description
+   * - ``execution.loops.retry``
+     - Same tool with identical args ≥3 times.
+   * - ``execution.loops.parameter_drift``
+     - Same tool with varied args ≥3 times.
+   * - ``execution.loops.oscillation``
+     - Multi-tool A→B→A→B pattern ≥3 cycles.
+
+Environment — external system / boundary conditions
 ---------------------------------------------------
 
-Covers failures **outside** the agent's control that still break the
-interaction. Useful for separating agent-caused issues from infrastructure.
+**Exhaustion** — Detects failures and constraints arising from the
+surrounding system rather than the agent's internal policy or reasoning.
+These are external issues the agent cannot control.
 
 .. list-table::
    :header-rows: 1
-   :widths: 25 25 50
+   :widths: 30 70
 
-   * - Category
-     - Leaf signal type
-     - Meaning
-   * - **Exhaustion**
-     - ``exhaustion.api_error``
-     - Downstream API returned a 5xx or unexpected error.
-   * -
-     - ``exhaustion.timeout``
-     - Tool / API call timed out.
-   * -
-     - ``exhaustion.rate_limit``
-     - Rate-limit response from a tool / API.
-   * -
-     - ``exhaustion.network``
-     - Transient network failure mid-call.
-   * -
-     - ``exhaustion.malformed_response``
-     - Response received but couldn't be parsed.
-   * -
-     - ``exhaustion.context_overflow``
-     - Context window / token budget exceeded.
+   * - Leaf signal type
+     - Description
+   * - ``environment.exhaustion.api_error``
+     - 5xx errors, service unavailable.
+   * - ``environment.exhaustion.timeout``
+     - Connection/read timeouts.
+   * - ``environment.exhaustion.rate_limit``
+     - 429, quota exceeded.
+   * - ``environment.exhaustion.network``
+     - Connection refused, DNS errors.
+   * - ``environment.exhaustion.malformed_response``
+     - Invalid JSON, unexpected schema.
+   * - ``environment.exhaustion.context_overflow``
+     - Token/context limit exceeded.
 
 How It Works
 ============
@@ -368,7 +402,8 @@ Visual Flag Marker
 When concerning signals are detected (disengagement present, stagnation
 count > 2, any execution failure / loop, or overall quality ``poor``/
-``severe``), the marker ``[!]`` is appended to the span's operation name.
+``severe``), the marker 🚩 (U+1F6A9) is appended to the span's
+operation name.
 This makes flagged sessions immediately visible in trace UIs without
 requiring attribute filtering.
@@ -386,7 +421,7 @@ Example queries against the layered keys::
     signals.execution.failure.count > 0
     signals.environment.exhaustion.count > 0
 
-For flagged sessions, search for ``[!]`` in span names.
+For flagged sessions, search for 🚩 in span names.
 
 .. image:: /_static/img/signals_trace.png
    :width: 100%
@@ -473,7 +508,7 @@ Example Span
 A concerning session, showing both layered attributes and a per-instance
 event::
 
-    # Span name: "POST /v1/chat/completions gpt-5.2 [!]"
+    # Span name: "POST /v1/chat/completions gpt-5.2 🚩"
 
     # Top-level
     signals.quality = "severe"
@@ -585,7 +620,7 @@ Mitigation strategies:
    causes.
 
 .. tip::
-   The ``[!]`` marker in the span name provides instant visual feedback in
+   The 🚩 marker in the span name provides instant visual feedback in
    trace UIs, while the structured attributes (``signals.quality``,
    ``signals.interaction.disengagement.severity``, etc.) and per-instance
    span events enable powerful querying and drill-down in your observability
diff --git a/docs/source/conf.py b/docs/source/conf.py
index a32e1383..26d8c280 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -33,6 +33,7 @@ extensions = [
     "sphinx.ext.autodoc",
     "sphinx.ext.intersphinx",
     "sphinx.ext.extlinks",
+    "sphinx.ext.mathjax",
    "sphinx.ext.viewcode",
     "sphinx_sitemap",
     "sphinx_design",
@@ -41,6 +42,7 @@ extensions = [
     "provider_models",
 ]
 
+
 # Paths that contain templates, relative to this directory.
 templates_path = ["_templates"]
 
diff --git a/docs/source/guides/observability/tracing.rst b/docs/source/guides/observability/tracing.rst
index 1e23f5f8..b3660168 100644
--- a/docs/source/guides/observability/tracing.rst
+++ b/docs/source/guides/observability/tracing.rst
@@ -114,11 +114,11 @@ Signals act as early warning indicators embedded in your traces:
 
 **Visual Flag Markers**
 
-When concerning signals are detected (disengagement, execution failures / loops, stagnation > 2, or ``poor`` / ``severe`` quality), Plano automatically appends a ``[!]`` marker to the span's operation name. This makes problematic traces immediately visible in your tracing UI without requiring additional queries.
+When concerning signals are detected (disengagement, execution failures / loops, stagnation > 2, or ``poor`` / ``severe`` quality), Plano automatically appends a 🚩 marker to the span's operation name. This makes problematic traces immediately visible in your tracing UI without requiring additional queries.
 
 **Example Span with Signals**::
 
-    # Span name: "POST /v1/chat/completions gpt-4 [!]"
+    # Span name: "POST /v1/chat/completions gpt-4 🚩"
     # Standard LLM attributes:
     llm.model = "gpt-4"
     llm.usage.total_tokens = 225
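The flagging rule quoted in both documents above (disengagement present,
stagnation count > 2, any execution failure / loop, or overall quality
``poor``/``severe``) reduces to a simple predicate over per-session signal
counts. A hedged sketch, assuming a hypothetical summary shape; the type and
field names below are illustrative, not the brightstaff schema::

    // Illustrative per-session roll-up; not the actual analyzer types.
    struct SignalSummary {
        disengagement_count: u32,
        stagnation_count: u32,
        execution_failure_count: u32,
        execution_loop_count: u32,
        quality: Quality,
    }

    // "Poor" and "Severe" come from the docs; the other variants are assumed.
    enum Quality {
        Good,
        Fair,
        Poor,
        Severe,
    }

    /// Flag when disengagement is present, stagnation count exceeds 2, any
    /// execution failure or loop fired, or overall quality is poor/severe.
    fn should_flag(s: &SignalSummary) -> bool {
        s.disengagement_count > 0
            || s.stagnation_count > 2
            || s.execution_failure_count > 0
            || s.execution_loop_count > 0
            || matches!(s.quality, Quality::Poor | Quality::Severe)
    }

    fn main() {
        let session = SignalSummary {
            disengagement_count: 0,
            stagnation_count: 3,
            execution_failure_count: 0,
            execution_loop_count: 0,
            quality: Quality::Good,
        };
        // stagnation_count > 2 alone is enough to flag the session.
        assert!(should_flag(&session));
    }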