Mirror of https://github.com/katanemo/plano.git, synced 2026-04-29 19:06:34 +02:00
docs: align signals page with paper taxonomy (#910)
* docs: align signals page with paper taxonomy

Updates docs/source/concepts/signals.rst and the tracing guide's signals subsection to reflect the three-layer taxonomy shipped in #903:

- Introduces the paper reference (arXiv:2604.00356) and the three layers (interaction, execution, environment) with all 20 leaf signal types in three reference tables
- Documents the new layered OTel attribute set (signals.interaction.*, signals.execution.*, signals.environment.*) and marks the legacy aggregate keys (signals.follow_up.repair.*, signals.frustration.*, signals.repetition.count, signals.escalation.requested, signals.positive_feedback.count) as deprecated-but-still-emitted
- Adds a Span Events section describing the per-instance signal.<type> events with confidence / snippet / metadata attributes
- Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs)
- Updates all example attributes, dashboard queries, and alert rules to use the layered keys
- Updates the tracing guide's behavioral-signals subsection to match
- Notes that the triage sampler is a planned follow-up and today sampling is consumer-side via observability-platform filters

Build verified locally: sphinx-build produces no warnings on these files.

Made-with: Cursor

* docs: reframe signals intro around the improvement flywheel

Addresses review feedback on #910:

- Replace the triage-only framing at the top with an instrument -> sample & triage -> construct data -> optimize -> deploy flywheel that explains why signals matter, not just what they surface. Paper's 82% / 1.52x numbers move into step 2 of the flywheel where they belong.
- Remove the 'Signals vs Response Quality' section. Per review, signals and response quality overlap rather than complement each other, so the comparison is misleading.
- Borrow the per-category summaries and leaf-type descriptions verbatim from the katanemo/signals reference implementation (module docstrings) so the documentation and the detector contract stay in sync. Drops the hand-crafted examples that were not strictly accurate (e.g. 'semantic overlap is high' for rephrase, 'user explicitly corrects the agent' for correction).

Made-with: Cursor

* docs: address signals flywheel review feedback

Addresses review comments on #910:

- Shorten the paper citation to (Chen et al., 2026) per common cite practice (replacing the full author list form).
- Replace the Why Signals Matter section with the review-suggested rewrite verbatim: more formal intro framing, renumbered steps to Instrument / Sample & triage / Data Construction / Model Optimization / Deploy, removes 'routing decisions' from the data-construction step, and adds DPO/RLHF/SFT as model-optimization examples.
- Renders tau and O(messages) as proper math glyphs via the sphinx built-in :math: role (enabled by adding sphinx.ext.mathjax to conf.py). Using the RST role form rather than raw $...$ inline so sphinx only injects MathJax on pages that actually have math, instead of loading ~1MB of JS on every page.

Build verified locally: sphinx-build produces no warnings on the changed files and the rendered HTML wraps tau and O(messages) in MathJax-ready <span class="math">\(\tau\)</span> containers.

Made-with: Cursor
parent b81eb7266c
commit 5a652eb666
3 changed files with 547 additions and 253 deletions
@@ -101,45 +101,59 @@ This creates a complete end-to-end trace showing the full request lifecycle thro
 Behavioral Signals in Traces
 ----------------------------
 
-Plano automatically enriches OpenTelemetry traces with :doc:`../../concepts/signals` — behavioral quality indicators computed from conversation patterns. These signals are attached as span attributes, providing immediate visibility into interaction quality.
+Plano automatically enriches OpenTelemetry traces with :doc:`../../concepts/signals` — lightweight, model-free behavioral indicators organized into three layers (interaction, execution, environment) per `Chen et al., 2026 <https://arxiv.org/abs/2604.00356>`_. Signals are attached as span attributes and per-instance span events, providing immediate visibility into interaction quality.
 
 **What Signals Provide**
 
 Signals act as early warning indicators embedded in your traces:
 
-- **Quality Assessment**: Overall interaction quality (Excellent/Good/Neutral/Poor/Severe)
-- **Efficiency Metrics**: Turn count, efficiency scores, repair frequency
-- **User Sentiment**: Frustration indicators, positive feedback, escalation requests
-- **Agent Behavior**: Repetition detection, looping patterns
+- **Quality Assessment**: Overall interaction quality (``excellent`` / ``good`` / ``neutral`` / ``poor`` / ``severe``) and numeric score
+- **Interaction layer**: misalignment, stagnation, disengagement, satisfaction
+- **Execution layer**: tool failures and loop patterns (from ``function_call`` / ``observation`` traces)
+- **Environment layer**: exhaustion (API errors, timeouts, rate limits, context overflow)
 
 **Visual Flag Markers**
 
-When concerning signals are detected (frustration, looping, escalation, or poor/severe quality), Plano automatically appends a flag marker **🚩** to the span's operation name. This makes problematic traces immediately visible in your tracing UI without requiring additional queries.
+When concerning signals are detected (disengagement, execution failures / loops, stagnation > 2, or ``poor`` / ``severe`` quality), Plano automatically appends a ``[!]`` marker to the span's operation name. This makes problematic traces immediately visible in your tracing UI without requiring additional queries.
 
 **Example Span with Signals**::
 
-    # Span name: "POST /v1/chat/completions gpt-4 🚩"
+    # Span name: "POST /v1/chat/completions gpt-4 [!]"
     # Standard LLM attributes:
     llm.model = "gpt-4"
     llm.usage.total_tokens = 225
 
-    # Behavioral signal attributes:
-    signals.quality = "Severe"
-    signals.turn_count = 15
-    signals.efficiency_score = 0.234
-    signals.frustration.severity = 3
-    signals.escalation.requested = "true"
+    # Top-level signal attributes:
+    signals.quality = "severe"
+    signals.quality_score = 0.0
+    signals.turn_count = 15
+    signals.efficiency_score = 0.234
+
+    # Layered attributes (only non-zero categories are emitted):
+    signals.interaction.misalignment.count = 4
+    signals.interaction.misalignment.severity = 2
+    signals.interaction.disengagement.count = 5
+    signals.interaction.disengagement.severity = 3
+
+    # Per-instance span event:
+    event: signal.interaction.disengagement.escalation
+    signal.type = "interaction.disengagement.escalation"
+    signal.message_index = 14
+    signal.confidence = 1.0
+    signal.snippet = "get me a human"
 
 **Querying Signal Data**
 
 In your observability platform (Jaeger, Grafana Tempo, Datadog, etc.), filter traces by signal attributes:
 
-- Find severe interactions: ``signals.quality = "Severe"``
-- Find frustrated users: ``signals.frustration.severity >= 2``
+- Find severe interactions: ``signals.quality = "severe"``
+- Find disengaged users: ``signals.interaction.disengagement.severity >= 2``
+- Find misaligned interactions: ``signals.interaction.misalignment.count > 3``
+- Find tool failures: ``signals.execution.failure.count > 0``
+- Find external issues: ``signals.environment.exhaustion.count > 0``
 - Find inefficient flows: ``signals.efficiency_score < 0.5``
-- Find escalations: ``signals.escalation.requested = "true"``
 
-For complete details on all available signals, detection methods, and best practices, see the :doc:`../../concepts/signals` guide.
+For complete details on all 20 leaf signal types, severity scheme, legacy attribute deprecation, and best practices, see the :doc:`../../concepts/signals` guide.
 
 
 Custom Span Attributes
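The updated "Visual Flag Markers" rule can be read as a small predicate over span attributes. The sketch below is a minimal illustration under stated assumptions, not Plano's implementation: it assumes attributes arrive as a flat Python dict, and the loop and stagnation key names (`LOOPS_KEY`, `STAGNATION_KEY`) are hypothetical, because the diff only names `signals.quality`, `signals.interaction.disengagement.count`, and `signals.execution.failure.count`. It also assumes "stagnation > 2" compares a count.

```python
# Minimal sketch of the "[!]" flag-marker rule described in the diff.
# Assumptions (not Plano's actual code):
#   - span attributes are a flat dict of attribute name -> value
#   - LOOPS_KEY and STAGNATION_KEY are hypothetical key names
#   - layered keys are only emitted when non-zero, so a missing key counts as 0

FLAG_MARKER = "[!]"

LOOPS_KEY = "signals.execution.loops.count"              # hypothetical key name
STAGNATION_KEY = "signals.interaction.stagnation.count"  # hypothetical key name


def is_concerning(attrs: dict) -> bool:
    """Return True if any concerning condition listed in the docs holds."""
    if attrs.get("signals.quality") in ("poor", "severe"):
        return True
    if attrs.get("signals.interaction.disengagement.count", 0) > 0:
        return True
    if attrs.get("signals.execution.failure.count", 0) > 0:
        return True
    if attrs.get(LOOPS_KEY, 0) > 0:
        return True
    # "stagnation > 2", assumed here to compare a count
    return attrs.get(STAGNATION_KEY, 0) > 2


def flagged_span_name(name: str, attrs: dict) -> str:
    """Append the marker once, separated by a space, if the span is concerning."""
    if is_concerning(attrs) and not name.endswith(FLAG_MARKER):
        return f"{name} {FLAG_MARKER}"
    return name


span_attrs = {"signals.quality": "severe", "signals.interaction.disengagement.count": 5}
print(flagged_span_name("POST /v1/chat/completions gpt-4", span_attrs))
# POST /v1/chat/completions gpt-4 [!]
```

The `endswith` guard keeps the marker idempotent if the same span name is processed twice.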
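The commit notes that the triage sampler is a planned follow-up and that sampling is currently consumer-side, via observability-platform filters. The filters listed under "Querying Signal Data" can be mirrored client-side as sketched below. The dict representation and the names `QUERIES` and `matching_queries` are hypothetical, and each real backend (Jaeger, Grafana Tempo, Datadog) has its own query syntax. Missing layered keys are treated as 0, since only non-zero categories are emitted. A missing `signals.efficiency_score` is assumed not to indicate an inefficient flow.

```python
# Minimal client-side sketch of the "Querying Signal Data" filters.
# Names and the flat-dict span representation are illustrative assumptions.
from typing import Any, Callable, Dict, List, Tuple

Attrs = Dict[str, Any]

# Each entry mirrors one documented filter; missing layered keys default to 0.
QUERIES: List[Tuple[str, Callable[[Attrs], bool]]] = [
    ("severe interactions", lambda a: a.get("signals.quality") == "severe"),
    ("disengaged users", lambda a: a.get("signals.interaction.disengagement.severity", 0) >= 2),
    ("misaligned interactions", lambda a: a.get("signals.interaction.misalignment.count", 0) > 3),
    ("tool failures", lambda a: a.get("signals.execution.failure.count", 0) > 0),
    ("external issues", lambda a: a.get("signals.environment.exhaustion.count", 0) > 0),
    # Assumption: a span without an efficiency score is not treated as inefficient.
    ("inefficient flows", lambda a: a.get("signals.efficiency_score", 1.0) < 0.5),
]


def matching_queries(attrs: Attrs) -> List[str]:
    """Return the names of all documented filters that this span matches."""
    return [name for name, predicate in QUERIES if predicate(attrs)]


# Attributes copied from the "Example Span with Signals" block.
example_span: Attrs = {
    "signals.quality": "severe",
    "signals.quality_score": 0.0,
    "signals.turn_count": 15,
    "signals.efficiency_score": 0.234,
    "signals.interaction.misalignment.count": 4,
    "signals.interaction.misalignment.severity": 2,
    "signals.interaction.disengagement.count": 5,
    "signals.interaction.disengagement.severity": 3,
}
print(matching_queries(example_span))
# ['severe interactions', 'disengaged users', 'misaligned interactions', 'inefficient flows']
```

Keeping the filters as named predicates makes it straightforward to route matching spans into a review queue until a server-side sampler exists.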