Speech Away

What listeners actually hear

The most powerful delivery lever is the one a transcript can't see

Guyer, J. J., Fabrigar, L. R., & Vaughan-Johnston, T. I. (2019). Speech Rate, Intonation, and Pitch. Personality and Social Psychology Bulletin. See also Niebuhr et al. (charismatic speech, TED corpus) and Goupil et al. (2021), Nature Communications.

Ask people what makes a speaker sound confident and they point to filler words, the "ums." The data points elsewhere. Pitch and intonation variation (vocal variety) is among the strongest predictors of perceived confidence, charisma, and persuasion. Niebuhr's analysis of the TED corpus found that the top talks carry roughly 30% more vocal variety than the average talk. And here is the catch: none of it shows up in a transcript.

~30% more vocal variety in top TED talks vs the average (Niebuhr corpus)
Figure · Schematic · shape illustrative · What drives perceived confidence
The "confidence" a listener hears is mostly acoustic - prosody, not word choice. A transcript-only score sees only the bottom two rows. Ranking synthesizes the source doc's tiers; weights are illustrative.
Data table
Item | Where it shows up | Relative weight as a confidence cue
Pitch / intonation variation | acoustic, invisible to text | 88
Falling terminals on conclusions | acoustic | 74
Adequate volume & range | acoustic | 62
Few hedges ("I think", "sort of") | visible in transcript | 55
Filler rate | visible in transcript | 38
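
To make the "bottom two rows" point concrete, here is a minimal sketch that sums the illustrative weights from the table above by channel. The weights are the schematic values from the figure, not measurements, and the names `CUE_WEIGHTS` and `visible_share` are made up for this example.

```python
# Illustrative weights copied from the data table above (schematic, not measured).
CUE_WEIGHTS = {
    "pitch_intonation_variation": (88, "acoustic"),
    "falling_terminals": (74, "acoustic"),
    "volume_and_range": (62, "acoustic"),
    "few_hedges": (55, "transcript"),
    "filler_rate": (38, "transcript"),
}


def visible_share(channel: str) -> float:
    """Fraction of the total illustrative weight carried by one channel."""
    total = sum(weight for weight, _ in CUE_WEIGHTS.values())
    seen = sum(weight for weight, ch in CUE_WEIGHTS.values() if ch == channel)
    return seen / total


print(f"transcript-visible share: {visible_share('transcript'):.0%}")  # 93 / 317, about 29%
print(f"acoustic-only share: {visible_share('acoustic'):.0%}")         # 224 / 317, about 71%
```

Under these illustrative weights, a transcript-only score has access to a bit under a third of the cue weight. The exact split would move with different weights; the direction of the imbalance is the point.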

Confidence is a bundle, not a single tell

Goupil and colleagues showed that perceived confidence is a composite: falling intonation, adequate volume, short response latency, low hedging, a moderate-to-fast rate, and few fillers, read together. No single marker is "confidence," which is exactly why scolding each one separately misfires. The right move is to score the bundle and coach the highest-leverage piece, usually vocal variety.
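
One way to picture "score the bundle, coach the highest-leverage piece" is the sketch below. It is an assumption-laden illustration, not Speech Away's scoring code: the weights reuse the five illustrative values from the confidence-cue table (response latency and rate are left out because the table gives no weight for them), each marker score is a hypothetical value in [0, 1], and "leverage" is defined here as weight times the gap to a perfect score.

```python
# Hypothetical weights: the five illustrative values from the confidence-cue table.
WEIGHTS = {
    "vocal_variety": 88,
    "falling_terminals": 74,
    "volume": 62,
    "low_hedging": 55,
    "few_fillers": 38,
}


def bundle_score(markers: dict[str, float]) -> float:
    """Weighted mean of marker scores, each in [0, 1] (1 = strong cue)."""
    total = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * markers[name] for name in WEIGHTS) / total


def coaching_target(markers: dict[str, float]) -> str:
    """Marker with the largest weighted shortfall: weight * (1 - score)."""
    return max(WEIGHTS, key=lambda name: WEIGHTS[name] * (1.0 - markers[name]))


# Made-up speaker: fillers are the weakest raw score, vocal variety is mediocre.
speaker = {
    "vocal_variety": 0.4,
    "falling_terminals": 0.7,
    "volume": 0.8,
    "low_hedging": 0.6,
    "few_fillers": 0.3,
}

print(round(bundle_score(speaker), 2))  # 0.57
print(coaching_target(speaker))         # vocal_variety
```

In this made-up case the filler score is the lowest number on the sheet, yet the weighted shortfall still points at vocal variety, which is the pattern the paragraph above describes.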

Figure · Reported values · Vocal variety: top talks vs average
Niebuhr's TED corpus: the most charismatic talks carry ~30% more vocal variety. It is trainable - and it is the lever most speakers never work on.
Data table
Item | Pitch/intonation range (relative)
Top TED talks | 130
Average talk | 100
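
For readers wondering how a "relative" vocal-variety number like 130 vs 100 could be produced, the sketch below shows one common approach: express a pitch (f0) contour in semitones around the speaker's median and take the standard deviation, then index it against a baseline set to 100. This is an assumption for illustration only; it is not necessarily the measure used in Niebuhr's corpus work, and the function names and the convention that 0 marks an unvoiced frame are hypothetical.

```python
import math
import statistics


def pitch_variability_semitones(f0_hz: list[float]) -> float:
    """Standard deviation of voiced f0 values, in semitones around the median."""
    voiced = [f for f in f0_hz if f > 0]  # assumed convention: 0 = unvoiced frame
    ref = statistics.median(voiced)
    semitones = [12 * math.log2(f / ref) for f in voiced]
    return statistics.pstdev(semitones)


def relative_index(speaker_sd: float, baseline_sd: float) -> float:
    """Variability expressed against a baseline fixed at 100."""
    return 100 * speaker_sd / baseline_sd


monotone = [200.0, 200.0, 0.0, 200.0]
print(pitch_variability_semitones(monotone))  # 0.0, a flat contour has no variety

# Made-up numbers: a speaker at 2.6 semitones against a 2.0-semitone baseline.
print(round(relative_index(2.6, 2.0)))  # 130
```

Working in semitones rather than raw hertz keeps the measure comparable across low and high voices, since a semitone is a ratio rather than a fixed frequency step.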

What it means for Speech Away

A transcript cannot capture prosody, so a text-only "confidence" score is really a hedging-and-fillers proxy. That is why we send the audio itself to a multimodal model (Gemini), which can assess pitch variation, falling terminals, pace, and tone, not just words. Working from the audio unlocks the Tier-1 lever a transcript-only pipeline is blind to: the actual sound of confidence.
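
As a rough sketch of what sending audio to Gemini could look like, the example below uses the `google-genai` Python SDK's file upload and `generate_content` calls. Everything specific here is an assumption rather than Speech Away's actual pipeline: the prompt wording, the `PROSODY_CUES` list, the helper names, the model name, and the file path are all illustrative.

```python
# Cues named in the paragraph above; the list itself is a hypothetical config.
PROSODY_CUES = ["pitch variation", "falling terminals", "pace", "tone"]


def build_prompt(cues: list[str]) -> str:
    """Ask for per-cue observations about how the speech sounds, not what it says."""
    bullet_list = "\n".join(f"- {cue}" for cue in cues)
    return (
        "Listen to the attached recording and assess the speaker's delivery.\n"
        "Comment on each of the following, based on the audio rather than the words:\n"
        f"{bullet_list}"
    )


def assess_audio(path: str, model: str = "gemini-2.5-flash") -> str:
    """Upload an audio file and return the model's free-text assessment."""
    from google import genai  # pip install google-genai; imported lazily

    client = genai.Client()  # picks up the API key from the environment
    audio = client.files.upload(file=path)
    response = client.models.generate_content(
        model=model,
        contents=[build_prompt(PROSODY_CUES), audio],
    )
    return response.text
```

A call such as `print(assess_audio("talk.mp3"))` (hypothetical file name) would need a valid API key and network access. The model's free-text answer would still have to be mapped onto whatever scoring scheme is in use, which this sketch does not attempt.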