dograh/api/services/voice_prompting_guide/topics/disfluencies.py

"""Topic: build human disfluencies into the agent's speech."""

from __future__ import annotations

from api.services.voice_prompting_guide._base import (
    AuditCheck,
    Stage,
    StageLens,
    VoicePromptingTopic,
)

TOPIC = VoicePromptingTopic(
    id="disfluencies",
    title="Build natural disfluencies into the agent's speech",
    severity="medium",
    applies_to_node_types=("globalNode", "agentNode", "startCall"),
    stages={
        Stage.create: StageLens(
            relevant=True,
            lens=(
                "Give the global prompt a disfluency vocabulary (fillers, thinking "
                "sounds, self-corrects, word repeats), target a couple per turn, and "
                "add a self-check: a perfectly polished sentence means it's drifted "
                "off-character."
            ),
        ),
        Stage.review: StageLens(
            relevant=True,
            lens=(
                "Check the prompt actually instructs natural disfluency and includes "
                "the self-monitor. Polished-by-default speech is the tell that "
                "separates an agent from a person."
            ),
        ),
    },
    content="""\
LLMs default to clean, polished output. In text that reads well; in voice it's
the uncanny valley. Real people stutter, restart, use fillers, and self-correct
mid-thought. If the agent doesn't, callers notice even if they can't say why.

Build a disfluency vocabulary into the global prompt:
- Fillers: um, uh, like, so, well, you know, I mean
- Thinking sounds: let me see, hmm, one sec
- Self-corrects: "your order ID is - wait, let me check - okay, it's A X C one
  eight Z"
- Word repeats: "I can schedule that for - uh - for tomorrow at eight AM"

Target roughly two to four disfluencies per turn — at least one. Too few and
the agent sounds robotic; too many and it sounds glitchy. Add a self-monitoring
instruction: "If a turn comes out as one polished sentence with no disfluency,
you've drifted off-character."

When you give example phrases, write them as complete sample responses — the
model will reuse them closely. Pair that with a "vary your responses, don't
repeat the same sentence twice" rule so the samples don't get parroted.

This is a global-prompt rule whose effect lands on every spoken turn. It works
with the response-style topic (short, contraction-heavy turns are easier to
make sound human).
""",
    audit_checks=(
        AuditCheck(
            id="instructs_disfluency",
            judge_question=(
                "Does the prompt instruct the agent to speak with natural human "
                "disfluencies — fillers, self-corrections, or word repeats — rather "
                "than in consistently polished prose?"
            ),
            expected="yes",
            quote=(
                "No disfluency guidance — fully polished speech reads as robotic on "
                "a call."
            ),
        ),
    ),
    cross_refs=("response_style",),
)