plano/docs/source/concepts/signals.rst

639 lines
24 KiB
ReStructuredText
Raw Normal View History

.. -*- coding: utf-8 -*-
========
Signals™
========
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
Agentic Signals are lightweight, model-free behavioral indicators computed
from live interaction trajectories and attached to your existing
OpenTelemetry traces. They are the instrumentation layer of a closed-loop
improvement flywheel for agents — turning raw production traffic into
prioritized data that can drive prompt, routing, and model updates without
running an LLM-as-judge on every session.
The framework implemented here follows the taxonomy and detector design in
*Signals: Trajectory Sampling and Triage for Agentic Interactions*
(`Chen et al., 2026 <https://arxiv.org/abs/2604.00356>`_). All detectors
are computed without model calls; the entire pipeline attaches structured
attributes and span events to existing spans so your dashboards and alerts
work unmodified.
Why Signals Matter: The Improvement Flywheel
============================================
Agentic applications are increasingly deployed at scale, yet improving them
after deployment remains difficult. Production trajectories are long,
numerous, and non-deterministic, making exhaustive human review infeasible
and auxiliary LLM evaluation expensive. As a result, teams face a
bottleneck: they cannot score every response, inspect every trace, or
reliably identify which failures and successes should inform the next model
update. Without a low-cost triage layer, the feedback loop from production
behavior to model improvement remains incomplete.
Signals close this loop by cheaply identifying which interactions among
millions are worth inspecting:
1. **Instrument.** Live trajectories are scored with model-free signals
attached as structured attributes on existing OpenTelemetry spans,
organized under a fixed taxonomy of interaction, execution, and
environment signals. This requires no additional model calls,
infrastructure, or changes to online agent behavior.
2. **Sample & triage.** Signal attributes act as filters: they surface
severe failures, retrieve representative exemplars, and exclude the
uninformative middle. In our experiments, signal-based sampling
achieves 82% informativeness on :math:`\tau`-bench, compared with 54%
for random sampling, yielding a 1.52× efficiency gain per informative
trajectory.
3. **Data Construction.** The triaged subset becomes targeted input for
constructing preference datasets or supervised fine-tuning datasets
from production trajectories.
4. **Model Optimization.** The resulting preference or supervised
fine-tuning data is used to update the model through methods such as
DPO, RLHF, or supervised fine-tuning, so optimization is driven by
targeted production behavior rather than undifferentiated trace noise.
5. **Deploy.** The improved model is deployed and immediately
re-instrumented with the same signals, enabling teams to measure
whether the change improved production behavior and to feed the next
iteration.
This loop depends on the first step being nearly free. The framework is
therefore designed around fixed-taxonomy, model-free detectors with
:math:`O(\text{messages})` cost, no online behavior change, and no
dependence on expensive evaluator models. By making production traces
searchable and sampleable at scale, signals turn raw agent telemetry into a
practical model-optimization flywheel.
What Are Behavioral Signals?
============================
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
Behavioral signals are canaries in the coal mine — early, objective
indicators that something may have gone wrong (or gone exceptionally well).
They don't explain *why* an agent failed, but they reliably signal *where*
attention is needed.
These signals emerge naturally from the rhythm of interaction:
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
- A user rephrasing or correcting the same request
- Sharp increases in conversation length
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
- Negative stance markers ("this doesn't work", ALL CAPS, excessive !!! or ???)
- Agent repetition or tool-call loops
- Expressions of gratitude, confirmation, or task success
- Requests for a human agent or explicit quit intent
- Tool errors, timeouts, rate limits, and context-window exhaustion
Individually, these clues are shallow; together, they form a fingerprint of
agent performance. Embedded directly into traces, they make it easy to spot
friction as it happens: where users struggle, where agents loop, where tool
failures cluster, and where escalations occur.
Signal Taxonomy
===============
Signals are organized into three top-level **layers**, each with its own
intent. Every detected signal belongs to exactly one leaf type under one of
seven categories. The per-category summaries and leaf-type descriptions
below are borrowed verbatim from the reference implementation at
`katanemo/signals <https://github.com/katanemo/signals>`_ to keep the
documentation and the detector contract in sync.
Interaction — user ↔ agent conversational quality
-------------------------------------------------
**Misalignment** — Misalignment signals capture semantic or intent mismatch
between the user and the agent, such as rephrasing, corrections,
clarifications, and restated constraints. These signals do not assert that
either party is "wrong"; they only indicate that shared understanding has
not yet been established.
.. list-table::
:header-rows: 1
:widths: 30 70
* - Leaf signal type
- Description
* - ``misalignment.correction``
- Explicit corrections, negations, mistake acknowledgments.
* - ``misalignment.rephrase``
- Rephrasing indicators, alternative explanations.
* - ``misalignment.clarification``
- Confusion expressions, requests for clarification.
**Stagnation** — Stagnation signals capture cases where the discourse
continues but fails to make visible progress. This includes near-duplicate
assistant responses, circular explanations, repeated scaffolding, and other
forms of linguistic degeneration.
.. list-table::
:header-rows: 1
:widths: 30 70
* - Leaf signal type
- Description
* - ``stagnation.dragging``
- Excessive turn count, conversation not progressing efficiently.
* - ``stagnation.repetition``
- Near-duplicate or repetitive assistant responses.
**Disengagement** — Disengagement signals mark the withdrawal of
cooperative intent from the interaction. These include explicit requests to
exit the agent flow (e.g., "talk to a human"), strong negative stances, and
abandonment markers.
.. list-table::
:header-rows: 1
:widths: 30 70
* - Leaf signal type
- Description
* - ``disengagement.escalation``
- Requests for human agent or support.
* - ``disengagement.quit``
- Notification to quit or leave.
* - ``disengagement.negative_stance``
- Complaints, frustration, negative sentiment.
**Satisfaction** — Satisfaction signals indicate explicit stabilization and
completion of the interaction. These include expressions of gratitude,
success confirmations, and closing utterances. We use these signals to
sample exemplar traces rather than to assign quality scores.
.. list-table::
:header-rows: 1
:widths: 30 70
* - Leaf signal type
- Description
* - ``satisfaction.gratitude``
- Expressions of thanks and appreciation.
* - ``satisfaction.confirmation``
- Explicit satisfaction expressions.
* - ``satisfaction.success``
- Confirmation of task completion or understanding.
Execution — agent-caused action quality
---------------------------------------
**Failure** — Detects agent-caused failures in tool/function usage. These
are issues the agent is responsible for (as opposed to environment failures
which are external system issues). Requires tool-call traces
(``function_call`` / ``observation``) to fire.
.. list-table::
:header-rows: 1
:widths: 30 70
* - Leaf signal type
- Description
* - ``execution.failure.invalid_args``
- Wrong type, missing required field.
* - ``execution.failure.bad_query``
- Empty results due to overly narrow/wrong query.
* - ``execution.failure.tool_not_found``
- Agent called non-existent tool.
* - ``execution.failure.auth_misuse``
- Agent didn't pass credentials correctly.
* - ``execution.failure.state_error``
- Tool called in wrong state/order.
**Loops** — Detects behavioral patterns where the agent gets stuck
repeating tool calls. These are distinct from
``interaction.stagnation`` (conversation text repetition) and
``execution.failure`` (single tool errors) — these detect tool-level
behavioral loops.
.. list-table::
:header-rows: 1
:widths: 30 70
* - Leaf signal type
- Description
* - ``execution.loops.retry``
- Same tool with identical args ≥3 times.
* - ``execution.loops.parameter_drift``
- Same tool with varied args ≥3 times.
* - ``execution.loops.oscillation``
- Multi-tool A→B→A→B pattern ≥3 cycles.
Environment — external system / boundary conditions
---------------------------------------------------
**Exhaustion** — Detects failures and constraints arising from the
surrounding system rather than the agent's internal policy or reasoning.
These are external issues the agent cannot control.
.. list-table::
:header-rows: 1
:widths: 30 70
* - Leaf signal type
- Description
* - ``environment.exhaustion.api_error``
- 5xx errors, service unavailable.
* - ``environment.exhaustion.timeout``
- Connection/read timeouts.
* - ``environment.exhaustion.rate_limit``
- 429, quota exceeded.
* - ``environment.exhaustion.network``
- Connection refused, DNS errors.
* - ``environment.exhaustion.malformed_response``
- Invalid JSON, unexpected schema.
* - ``environment.exhaustion.context_overflow``
- Token/context limit exceeded.
How It Works
============
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
Signals are computed automatically by the gateway after each assistant
response and emitted as **OpenTelemetry trace attributes** and **span events**
on your existing spans. No additional libraries or instrumentation are
required — just configure your OTEL collector endpoint as usual.
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
Each conversation trace is enriched with layered signal attributes
(category-level counts and severities) plus one span event per detected
signal instance (with confidence, snippet, and per-detector metadata).
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
.. note::
Signal analysis is enabled by default and runs on the request path. It
does **not** affect the response sent to the client. Set
``overrides.disable_signals: true`` in your Plano config to skip this
CPU-heavy analysis (see the configuration reference).
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
OTel Span Attributes
====================
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
Signal data is exported as structured OTel attributes. There are two tiers:
**top-level** attributes (always emitted on spans that carry signal
analysis) and **layered** attributes (emitted only when the corresponding
category has at least one signal instance).
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
Top-level attributes
--------------------
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
Always emitted once signals are computed.
.. list-table::
:header-rows: 1
:widths: 40 15 45
* - Attribute
- Type
- Value
* - ``signals.quality``
- string
- One of ``excellent``, ``good``, ``neutral``, ``poor``, ``severe``.
* - ``signals.quality_score``
- float
- Numeric score 0.0 100.0 that feeds the quality bucket.
* - ``signals.turn_count``
- int
- Total number of user + assistant turns in the interaction.
* - ``signals.efficiency_score``
- float
- Efficiency metric 0.0 1.0 (stays at 1.0 up to baseline turns,
then decays: ``1 / (1 + 0.3 * (turns - baseline))``).
Layered attributes
------------------
Emitted per category, only when ``count > 0``. One ``.count`` and one
``.severity`` attribute per category. Severity is a 03 bucket (see
`Severity levels`_ below).
.. list-table::
:header-rows: 1
:widths: 50 50
* - Attribute (emitted when fired)
- Source
* - ``signals.interaction.misalignment.count``
- Any ``misalignment.*`` leaf type
* - ``signals.interaction.misalignment.severity``
- "
* - ``signals.interaction.stagnation.count``
- Any ``stagnation.*`` leaf type
* - ``signals.interaction.stagnation.severity``
- "
* - ``signals.interaction.disengagement.count``
- Any ``disengagement.*`` leaf type
* - ``signals.interaction.disengagement.severity``
- "
* - ``signals.interaction.satisfaction.count``
- Any ``satisfaction.*`` leaf type
* - ``signals.interaction.satisfaction.severity``
- "
* - ``signals.execution.failure.count``
- Any ``failure.*`` leaf type
* - ``signals.execution.failure.severity``
- "
* - ``signals.execution.loops.count``
- Any ``loops.*`` leaf type
* - ``signals.execution.loops.severity``
- "
* - ``signals.environment.exhaustion.count``
- Any ``exhaustion.*`` leaf type
* - ``signals.environment.exhaustion.severity``
- "
Legacy attributes (deprecated, still emitted)
---------------------------------------------
The following aggregate keys pre-date the paper taxonomy and are still
emitted for one release so existing dashboards keep working. They are
derived from the layered counts above and will be removed in a future
release. Migrate to the layered keys when convenient.
.. list-table::
:header-rows: 1
:widths: 50 50
* - Legacy attribute
- Layered equivalent
* - ``signals.follow_up.repair.count``
- ``signals.interaction.misalignment.count``
* - ``signals.follow_up.repair.ratio``
- (computed: ``misalignment.count / max(user_turns, 1)``)
* - ``signals.frustration.count``
- Count of ``disengagement.negative_stance`` instances
* - ``signals.frustration.severity``
- Derived severity bucket of the above
* - ``signals.repetition.count``
- ``signals.interaction.stagnation.count``
* - ``signals.escalation.requested``
- True if any ``disengagement.escalation`` or ``disengagement.quit`` fired
* - ``signals.positive_feedback.count``
- ``signals.interaction.satisfaction.count``
Span Events
===========
In addition to span attributes, every detected signal instance is emitted as
a span event named ``signal.<dotted-type>`` (e.g.
``signal.interaction.satisfaction.gratitude``). Each event carries:
.. list-table::
:header-rows: 1
:widths: 30 15 55
* - Event attribute
- Type
- Description
* - ``signal.type``
- string
- Full dotted signal type (same as the event name suffix).
* - ``signal.message_index``
- int
- Zero-based index of the message that triggered the signal.
* - ``signal.confidence``
- float
- Detector confidence in [0.0, 1.0].
* - ``signal.snippet``
- string
- Matched substring from the source message (when available).
* - ``signal.metadata``
- string (JSON)
- Per-detector metadata (pattern name, ratio values, etc.).
Span events are the right surface for drill-down: attribute filters narrow
traces, then events tell you *which messages* fired *which signals* with
*what evidence*.
Visual Flag Marker
------------------
When concerning signals are detected (disengagement present, stagnation
count > 2, any execution failure / loop, or overall quality ``poor``/
``severe``), the marker 🚩 (U+1F6A9) is appended to the span's operation
name.
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
This makes flagged sessions immediately visible in trace UIs without
requiring attribute filtering.
Querying in Your Observability Platform
---------------------------------------
Example queries against the layered keys::
signals.quality = "severe"
signals.turn_count > 10
signals.efficiency_score < 0.5
signals.interaction.disengagement.severity >= 2
signals.interaction.misalignment.count > 3
signals.interaction.satisfaction.count > 0 AND signals.quality = "good"
signals.execution.failure.count > 0
signals.environment.exhaustion.count > 0
For flagged sessions, search for 🚩 in span names.
.. image:: /_static/img/signals_trace.png
:width: 100%
:align: center
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
Severity Levels
===============
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
Every category aggregates its leaf signal counts into a severity bucket used
by both the layered ``.severity`` attribute and the overall quality score.
- **None (0)**: 0 instances
- **Mild (1)**: 12 instances
- **Moderate (2)**: 34 instances
- **Severe (3)**: 5+ instances
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
Severity is always computed per-category. For example, three instances of
``misalignment.rephrase`` plus two of ``misalignment.correction`` yield
``signals.interaction.misalignment.severity = 3`` (5 instances total).
Overall Quality Assessment
==========================
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
Signals are aggregated into an overall interaction quality on a 5-point
scale. The scoring model starts at 50.0 (neutral), adds positive weight for
satisfaction, and subtracts weight for disengagement, misalignment (when
ratio > 30% of user turns), stagnation (when count > 2), execution failures,
execution loops, and environment exhaustion.
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
The resulting numeric score maps to the bucket emitted in ``signals.quality``:
**Excellent (75 100)**
Strong positive signals, efficient resolution, low friction.
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
**Good (60 74)**
Mostly positive with minor clarifications; some back-and-forth but
successful.
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
**Neutral (40 59)**
Mixed signals; neither clearly good nor bad.
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
**Poor (25 39)**
Concerning negative patterns (high friction, multiple misalignments,
moderate disengagement, tool failures). High abandonment risk.
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
**Severe (0 24)**
Critical issues — escalation requested, severe disengagement, severe
stagnation, or compounding failures. Requires immediate attention.
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
The raw numeric score is available under ``signals.quality_score``.
Sampling and Prioritization
===========================
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
In production, trace data is overwhelming. Signals provide a lightweight
first layer of triage to select the small fraction of trajectories that are
most likely to be informative. Per the paper, signal-based sampling reaches
82% informativeness on τ-bench versus 54% for random sampling — a 1.52×
efficiency gain per informative trajectory.
Workflow:
1. Gateway captures conversation messages and computes signals
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
2. Signal attributes and per-instance events are emitted to OTEL spans
3. Your observability platform ingests and indexes the attributes
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
4. Query / filter by signal attributes to surface outliers and exemplars
5. Review high-information traces to identify improvement opportunities
6. Update prompts, routing, or policies based on findings
7. Redeploy and monitor signal metrics to validate improvements
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
This creates a reinforcement loop where traces become both diagnostic data
and training signal for prompt engineering, routing policies, and
preference-data construction.
.. note::
An in-gateway triage sampler that selects informative trajectories
inline — with configurable per-category weights and budgets — is planned
as a follow-up to this release. Today, sampling is consumer-side: your
observability platform filters on the signal attributes described above.
Example Span
============
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
A concerning session, showing both layered attributes and a per-instance
event::
# Span name: "POST /v1/chat/completions gpt-5.2 🚩"
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
# Top-level
signals.quality = "severe"
signals.quality_score = 0.0
signals.turn_count = 4
signals.efficiency_score = 1.0
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
# Layered (only non-zero categories are emitted)
signals.interaction.disengagement.count = 6
signals.interaction.disengagement.severity = 3
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
# Legacy (deprecated, emitted while dual-emit is on)
signals.frustration.count = 4
signals.frustration.severity = 2
signals.escalation.requested = true
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
# Per-instance span events
event: signal.interaction.disengagement.escalation
signal.type = "interaction.disengagement.escalation"
signal.message_index = 6
signal.confidence = 1.0
signal.snippet = "get me a human"
signal.metadata = {"pattern_type":"escalation"}
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
Building Dashboards
===================
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
Use signal attributes to build monitoring dashboards in Grafana, Honeycomb,
Datadog, etc. Prefer the layered keys — they align with the paper taxonomy
and will outlive the legacy keys.
- **Quality distribution**: Count of traces by ``signals.quality``
- **P95 turn count**: 95th percentile of ``signals.turn_count``
- **Average efficiency**: Mean of ``signals.efficiency_score``
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
- **High misalignment rate**: Percentage where
``signals.interaction.misalignment.count > 3``
- **Disengagement rate**: Percentage where
``signals.interaction.disengagement.severity >= 2``
- **Satisfaction rate**: Percentage where
``signals.interaction.satisfaction.count >= 1``
- **Escalation rate**: Percentage where a ``disengagement.escalation`` or
``disengagement.quit`` event fired (via span-event filter)
- **Tool-failure rate**: Percentage where
``signals.execution.failure.count > 0``
- **Environment issue rate**: Percentage where
``signals.environment.exhaustion.count > 0``
Creating Alerts
===============
Set up alerts based on signal thresholds:
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
- Alert when ``signals.quality = "severe"`` count exceeds threshold in a
1-hour window
- Alert on sudden spike in
``signals.interaction.disengagement.severity >= 2`` (>2× baseline)
- Alert on sustained ``signals.execution.failure.count > 0`` — agent-caused
tool issues
- Alert on spikes in ``signals.environment.exhaustion.count`` — external
system degradation
- Alert on degraded efficiency (P95 ``signals.turn_count`` up > 50%)
Best Practices
==============
Start simple:
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
- Alert or page on ``severe`` sessions (or on spikes in ``severe`` rate)
- Review ``poor`` sessions within 24 hours
- Sample ``excellent`` sessions as exemplars
Combine multiple signals to infer failure modes:
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
- **Silent loop**: ``signals.interaction.stagnation.severity >= 2`` +
``signals.turn_count`` above baseline
- **User giving up**: ``signals.interaction.disengagement.severity >= 2`` +
any escalation event
- **Misunderstood intent**:
``signals.interaction.misalignment.count / user_turns > 0.3``
- **Agent-caused friction**: ``signals.execution.failure.count > 0`` +
``signals.interaction.misalignment.count > 0``
- **External degradation, not agent fault**:
``signals.environment.exhaustion.count > 0`` while
``signals.execution.failure.count = 0``
- **Working well**: ``signals.interaction.satisfaction.count >= 1`` +
``signals.efficiency_score > 0.8`` + no disengagement
Limitations and Considerations
==============================
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
Signals don't capture:
- Task completion / real outcomes
- Factual or domain correctness
- Silent abandonment (user leaves without expressing frustration)
- Non-English nuance (pattern libraries are English-oriented)
Mitigation strategies:
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
- Periodically sample flagged sessions and measure false positives / negatives
- Tune baselines per use case and user population
- Add domain-specific phrase libraries where needed
- Combine signals with non-text metrics (tool failures, disconnects, latency)
.. note::
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
Behavioral signals complement — but do not replace — domain-specific
response quality evaluation. Use signals to prioritize which traces to
inspect, then apply domain expertise and outcome checks to diagnose root
causes.
.. tip::
The 🚩 marker in the span name provides instant visual feedback in
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
trace UIs, while the structured attributes (``signals.quality``,
``signals.interaction.disengagement.severity``, etc.) and per-instance
span events enable powerful querying and drill-down in your observability
platform.
See Also
========
docs: align signals page with paper taxonomy (#910) * docs: align signals page with paper taxonomy Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903: - Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables - Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted - Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes - Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs) - Updates all example attributes, dashboard queries, and alert rules to use the layered keys - Updates the tracing guide's behavioral-signals subsection to match - Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters Build verified locally: sphinx-build produces no warnings on these files. Made-with: Cursor * docs: reframe signals intro around the improvement flywheel Addresses review feedback on #910: - Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong. - Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading. - Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction). Made-with: Cursor * docs: address signals flywheel review feedback Addresses review comments on #910: - Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form). - Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples. - Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page. Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers. Made-with: Cursor
2026-04-24 12:31:14 -07:00
- `Signals: Trajectory Sampling and Triage for Agentic Interactions
<https://arxiv.org/abs/2604.00356>`_ — the paper this framework implements
- :doc:`../guides/observability/tracing` — Distributed tracing for agent
systems
- :doc:`../guides/observability/monitoring` — Metrics and dashboards
- :doc:`../guides/observability/access_logging` — Request / response logging
- :doc:`../guides/observability/observability` — Complete observability guide