What listeners actually hear

The most powerful delivery lever is the one a transcript can't see

Guyer, J. J., Fabrigar, L. R., & Vaughan-Johnston, T. I. (2019). Speech Rate, Intonation, and Pitch. Personality and Social Psychology Bulletin. See also Niebuhr et al. (charismatic speech, TED corpus) and Goupil et al. (2021), Nature Communications.

Ask people what makes a speaker sound confident and they usually point to fillers like "um" and "uh." The data points elsewhere. Variation in pitch and intonation, or vocal variety, is among the strongest predictors of perceived confidence, charisma, and persuasion. Niebuhr's analysis of the TED corpus found that the top talks carry roughly 30% more vocal variety than the average talk. And here is the catch: none of it shows up in a transcript.

~30% more vocal variety in top TED talks vs the average (Niebuhr corpus)

Figure · Schematic (shape illustrative): What drives perceived confidence

The "confidence" a listener hears is mostly acoustic (prosody), not word choice. A transcript-only score sees only the bottom two rows. The ranking synthesizes the tiers in the source document; the weights are illustrative, not measured.

Cue	Channel	Relative weight as a confidence cue (illustrative)
Pitch / intonation variation	Acoustic (invisible to text)	88
Falling terminals on conclusions	Acoustic	74
Adequate volume & range	Acoustic	62
Few hedges ("I think", "sort of")	Visible in transcript	55
Filler rate	Visible in transcript	38

Confidence is a bundle, not a single tell

Goupil and colleagues showed that perceived confidence is a composite: falling intonation, adequate volume, short response latency, little hedging, a moderately fast rate, and few fillers, read together. No single marker is "confidence." That is exactly why scolding each cue separately misfires; the right move is to score the bundle as a whole and coach its highest-leverage piece, which is usually vocal variety.
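"Score the bundle, coach the highest-leverage piece" can be made concrete with a minimal sketch. It assumes each cue has already been measured and normalized to [0, 1] (1 = ideal), and it reuses the illustrative weights from the schematic above, so it covers only those five cues, not latency or rate. The names `CUE_WEIGHTS`, `bundle_score`, and `coaching_target` are hypothetical, not part of Speech Away or of Goupil et al.'s method.

```python
# Hypothetical weights: the illustrative numbers from the schematic above,
# not values measured in any study.
CUE_WEIGHTS: dict[str, float] = {
    "pitch_variation": 88,
    "falling_terminals": 74,
    "volume_range": 62,
    "low_hedging": 55,
    "low_fillers": 38,
}


def bundle_score(cues: dict[str, float]) -> float:
    """Weighted mean of per-cue scores, each assumed to lie in [0, 1]."""
    total = sum(CUE_WEIGHTS.values())
    return sum(CUE_WEIGHTS[name] * cues[name] for name in CUE_WEIGHTS) / total


def coaching_target(cues: dict[str, float]) -> str:
    """Cue whose gap to a perfect 1.0, times its weight, costs the bundle most."""
    return max(CUE_WEIGHTS, key=lambda name: CUE_WEIGHTS[name] * (1.0 - cues[name]))


speaker = {
    "pitch_variation": 0.4,
    "falling_terminals": 0.7,
    "volume_range": 0.8,
    "low_hedging": 0.9,
    "low_fillers": 0.5,
}
print(round(bundle_score(speaker), 3))  # 0.647
print(coaching_target(speaker))  # pitch_variation
```

In this made-up example the speaker's filler score (0.5) is worse than their volume score, yet the sketch still picks pitch variation to coach, because its weighted gap is largest. That is the point of scoring the bundle rather than scolding each cue on its own.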

Figure · Reported values: Vocal variety, top talks vs average

Niebuhr's TED corpus: the most charismatic talks carry ~30% more vocal variety than the average talk. It is trainable, and it is the lever most speakers never work on.

Talk group	Pitch/intonation range (relative, average = 100)
Top TED talks	130
Average talk	100
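To make "relative pitch range" tangible, here is one common way to turn a pitch track into a single number. This is an assumption, not necessarily the metric Niebuhr used: take an F0 contour (in Hz, from any pitch tracker, not shown), convert voiced frames to semitones, and use their standard deviation. Semitones are logarithmic, so the same spread means the same perceived variety for higher and lower voices. `semitone_spread` and `relative_variety` are hypothetical names; unvoiced frames are assumed to be coded as 0.

```python
import math
import statistics


def semitone_spread(f0_hz: list[float]) -> float:
    """Population std dev of voiced F0 values, in semitones."""
    voiced = [f for f in f0_hz if f > 0]  # assumption: unvoiced frames are 0
    ref = statistics.median(voiced)
    # The reference only shifts every value equally, so it does not change
    # the spread; it just keeps the semitone values centered near zero.
    semitones = [12 * math.log2(f / ref) for f in voiced]
    return statistics.pstdev(semitones)


def relative_variety(talk_f0: list[float], baseline_f0: list[float]) -> float:
    """Pitch spread of a talk, indexed so the baseline talk scores 100."""
    return 100 * semitone_spread(talk_f0) / semitone_spread(baseline_f0)


baseline = [100.0, 200.0]               # one octave apart: spread of 6 semitones
wider = [100.0, 100.0 * 2 ** (15.6 / 12)]  # 15.6 semitones apart: spread of 7.8
print(round(relative_variety(wider, baseline)))  # 130
```

The toy contours are two-point examples chosen so the arithmetic is easy to check; real contours have thousands of frames, but the indexing against an average-talk baseline works the same way as in the table above.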

What it means for Speech Away

A transcript carries no prosody, so a text-only "confidence" score is really a proxy for hedging and filler use. That is why we send the audio itself to a multimodal model (Gemini), which can assess pitch variation, falling terminals, pace, and tone, not just words. This unlocks the Tier-1 lever that a transcript-only pipeline is blind to: the actual sound of confidence.
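A minimal sketch of what a text-only score can actually measure helps show why it reduces to a hedging-and-fillers proxy. The word lists and the name `transcript_proxy` are hypothetical illustrations, not Speech Away's pipeline; a real list would need to be much larger and tested.

```python
import re

# Hypothetical, deliberately short word lists for illustration only.
FILLERS = {"um", "uh", "er", "erm"}
HEDGES = ("i think", "sort of", "kind of", "i guess", "maybe")


def transcript_proxy(transcript: str) -> dict[str, float]:
    """Fillers and hedges per 100 words: everything a text-only score can see."""
    words = re.findall(r"[a-z']+", transcript.lower())
    n = max(len(words), 1)  # avoid dividing by zero on an empty transcript
    fillers = sum(1 for w in words if w in FILLERS)
    joined = " ".join(words)  # punctuation removed, so multi-word hedges match
    hedges = sum(len(re.findall(rf"\b{re.escape(h)}\b", joined)) for h in HEDGES)
    return {"fillers_per_100": 100 * fillers / n, "hedges_per_100": 100 * hedges / n}


print(transcript_proxy("Um, I think we should, uh, sort of ship it."))
# {'fillers_per_100': 20.0, 'hedges_per_100': 20.0}
```

Two deliveries of the same sentence, one monotone and one with a wide, well-placed pitch range, produce identical output here. That blind spot is the reason for sending the audio rather than the transcript.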
