# dograh/api/services/voice_prompting_guide/topics/disfluencies.py

"""Topic: build human disfluencies into the agent's speech."""
from __future__ import annotations

from api.services.voice_prompting_guide._base import (
    AuditCheck,
    Stage,
    StageLens,
    VoicePromptingTopic,
)

TOPIC = VoicePromptingTopic(
    id="disfluencies",
    title="Build natural disfluencies into the agent's speech",
    severity="medium",
    applies_to_node_types=("globalNode", "agentNode", "startCall"),
    stages={
        Stage.create: StageLens(
            relevant=True,
            lens=(
                "Give the global prompt a disfluency vocabulary (fillers, thinking "
                "sounds, self-corrects, word repeats), target roughly two to four "
                "per turn, and add a self-check: a perfectly polished sentence "
                "means the agent has drifted off-character."
            ),
        ),
        Stage.review: StageLens(
            relevant=True,
            lens=(
                "Check that the prompt actually instructs natural disfluency and "
                "includes the self-monitoring instruction. Polished-by-default "
                "speech is the tell that separates an agent from a person."
            ),
        ),
    },
    content="""\
LLMs default to clean, polished output. In text, that reads well; in voice, it's
the uncanny valley. Real people stutter, restart, use fillers, and self-correct
mid-thought. If the agent doesn't, callers notice, even if they can't say why.

Build a disfluency vocabulary into the global prompt:
- Fillers: um, uh, like, so, well, you know, I mean
- Thinking sounds: let me see, hmm, one sec
- Self-corrects: "your order ID is - wait, let me check - okay, it's A X C one
  eight Z"
- Word repeats: "I can schedule that for - uh - for tomorrow at eight AM"

Target roughly two to four disfluencies per turn, and never fewer than one. Too
few and the agent sounds robotic; too many and it sounds glitchy. Add a
self-monitoring instruction: "If a turn comes out as one polished sentence with
no disfluency, you've drifted off-character."

When you give example phrases, write them as complete sample responses, because
the model will reuse them closely. Pair that with a "vary your responses; don't
repeat the same sentence twice" rule so the samples don't get parroted.

This is a global-prompt rule whose effect lands on every spoken turn. It pairs
with the response-style topic: short, contraction-heavy turns are easier to make
sound human.
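
One rough way to spot-check call transcripts against the two-to-four-per-turn
target is to count markers from the vocabulary. The sketch below is an
illustration under stated assumptions: `MARKER_PHRASES`, `count_disfluencies`,
`classify_turn`, and the upper bound of four are hypothetical, and plain keyword
matching is a swapped-in heuristic, not a real disfluency detector.

```python
import re

# Hypothetical marker list drawn from the vocabulary above: fillers, thinking
# sounds, and 'wait' as a self-correct cue. Single words such as 'like', 'so'
# and 'well' also occur in fluent speech, so expect some over-counting.
MARKER_PHRASES = (
    ('you', 'know'), ('i', 'mean'), ('let', 'me', 'see'), ('one', 'sec'),
    ('um',), ('uh',), ('like',), ('so',), ('well',), ('hmm',), ('wait',),
)
# Try longer phrases first so 'let me see' is not split into single words.
_LONGEST_FIRST = sorted(MARKER_PHRASES, key=len, reverse=True)

# Assumed bounds: at least one per turn, as the target above requires; treating
# more than four as too many is this sketch's own choice.
MIN_PER_TURN = 1
MAX_PER_TURN = 4


def count_disfluencies(turn):
    # Lowercase word tokens; apostrophes are kept so "it's" stays one token.
    words = re.findall("[a-z']+", turn.lower())
    count = 0
    previous_word = None  # last non-marker word, used to spot word repeats
    i = 0
    while i < len(words):
        match = next(
            (p for p in _LONGEST_FIRST if tuple(words[i:i + len(p)]) == p),
            None,
        )
        if match is not None:
            count += 1
            i += len(match)
            continue
        # The same word twice, directly or across a marker ('for - uh - for'),
        # counts as a word repeat.
        if words[i] == previous_word:
            count += 1
        previous_word = words[i]
        i += 1
    return count


def classify_turn(turn):
    n = count_disfluencies(turn)
    if n < MIN_PER_TURN:
        return 'polished'  # the drift the self-monitoring instruction targets
    if n > MAX_PER_TURN:
        return 'glitchy'
    return 'natural'


print(classify_turn('I can schedule that for - uh - for tomorrow at eight AM'))
# prints: natural
```

Keyword counting cannot tell a filler "like" from the verb "like", so treat a
score as a reason to listen to the turn, not as a verdict.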
""",
    audit_checks=(
        AuditCheck(
            id="instructs_disfluency",
            judge_question=(
                "Does the prompt instruct the agent to speak with natural human "
                "disfluencies (fillers, self-corrections, or word repeats) rather "
                "than in consistently polished prose?"
            ),
            expected="yes",
            quote=(
                "No disfluency guidance: fully polished speech reads as robotic "
                "on a call."
            ),
        ),
    ),
    cross_refs=("response_style",),
)