Shared analysis only. Percentages are acoustic estimates for practice tracking—not how listeners will perceive this voice or whether someone “passes.” They do not replace listening to the clip yourself or feedback from another listener or clinician.
Acoustic estimate · not listener perception
Shared recording · anyone with this link can view results until the link expires.
Dickinson — Afraid to own a body
May 28, 2026 4:03 PM EDT · Recording · Emily Dickinson: Afraid to own a body
Pitch (median)
191.8 Hz
Voiced speech frames only; not IQR-trimmed.
Pitch std dev (σ)
23.8 Hz
IQR-trimmed across voiced frames.
Range: 75.1–589.5 Hz
Raw σ (all frames): 83.2 Hz
Cue breakdown (toward target)
Read these first—the headline % below is a weighted blend of these lines.
Pitch
96%
Formant
51%
Intonation
73%
Headline blend
Weighted toward-target score and alternate-direction mirror
Toward target %
66.4%
Weighted blend of the cue breakdown above.
Alternate direction %
33.6%
Mirror of the cue scores—not a separate measurement.
Resonance (formants)
F1–F3 medians and how they feed the blend
F1 (median, trimmed)
599 Hz
F2 (median, trimmed)
1,683 Hz
F3 (median, trimmed)
2,885 Hz
F3 raw median (all valid frames): 2,865 Hz
“Toward target %” blends resonance/formants (60%, F1 weighted), pitch (30%), and intonation (pitch variability and range, 10%).
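The blend described above can be sketched as follows. This is a minimal illustration, assuming the three cue percentages combine as a plain weighted sum with the stated 60/30/10 weights. The function names are hypothetical, the F1 weighting inside the formant score is not modeled, and the app's actual calculation may differ.

```python
# Weights as stated in the report: formants 60%, pitch 30%, intonation 10%.
WEIGHTS = {"formant": 0.60, "pitch": 0.30, "intonation": 0.10}


def toward_target(cues: dict[str, float]) -> float:
    """Weighted blend of per-cue percentages (each on a 0-100 scale)."""
    return sum(WEIGHTS[name] * cues[name] for name in WEIGHTS)


def alternate_direction(toward: float) -> float:
    """Mirror of the toward-target score, not a separate measurement."""
    return 100.0 - toward


# Cue values as displayed in this report (already rounded by the page).
cues = {"pitch": 96.0, "formant": 50.7, "intonation": 73.0}
blend = toward_target(cues)
print(f"toward target: {blend:.1f}%")
print(f"alternate direction: {alternate_direction(blend):.1f}%")
```

With the rounded cue values shown here the sketch gives roughly 66.5%, close to the reported 66.4%. The small gap is plausibly due to rounding in the displayed cue percentages.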
Trimmed formant medians drop outlier frames (1.5× IQR) before taking the median. Raw medians appear below when they differ by more than 2 Hz—often a sign of noisy tracking, not pauses in your speech.
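The trimming step can be illustrated with a short sketch. It assumes the common Tukey-fence rule (drop frames outside Q1 − 1.5×IQR and Q3 + 1.5×IQR) and Python's default quartile method. The frame values are made up for illustration, and the app's exact quartile definition is not stated in this report.

```python
import statistics


def trimmed_median(values: list[float], k: float = 1.5) -> float:
    """Median after dropping frames outside the k x IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    return statistics.median(kept)


# Hypothetical F3 frames in Hz: ten plausible values plus two tracking glitches.
frames = [2870, 2875, 2880, 2880, 2885, 2885, 2885, 2890, 2890, 2895, 1200, 1250]

raw = statistics.median(frames)
trimmed = trimmed_median(frames)
print(f"raw median: {raw} Hz, trimmed median: {trimmed} Hz")
```

In this made-up example the two glitch frames pull the raw median down by 2.5 Hz, which is the kind of difference the report flags when it exceeds 2 Hz.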
Reference comparisons
Population anchors for pitch, variability, and formants
The tables below show population reference anchors for context; they are the same anchors used in the app. Any baselines belong to the owner.
Pitch mean vs reference
Higher reference (toward target)
170 Hz
This clip: +22 Hz
Lower reference
121 Hz
This clip: +70 Hz
Pitch variability (σ) vs reference
Standard deviation of voiced pitch, with outlier frames removed (1.5×IQR).
Higher reference σ (toward target)
27 Hz
This clip: -3.2 Hz
Lower reference σ
22 Hz
This clip: +1.8 Hz
Formants vs reference anchors (Hillenbrand 1995)
Typical adult F1 in vowel studies is often ~300–800 Hz depending on the vowel; 1000+ Hz usually indicates a tracking artifact.
| Formant | This clip | Reference | Δ vs target | Lower reference | Δ vs other | Band |
| --- | --- | --- | --- | --- | --- | --- |
| F1 | 599 Hz | 625 Hz | -26 Hz | 579 Hz | +20 Hz | Higher reference |
| F2 | 1,683 Hz | 1,942 Hz | -259 Hz | 1,531 Hz | +152 Hz | Higher reference |
| F3 | 2,885 Hz | 2,921 Hz | -36 Hz | 2,414 Hz | +471 Hz | Higher reference |
Resonance-only score (toward target): 50.7% (included in headline when tracking is reliable).
In higher-reference band (toward target): F1 F2 F3
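The Δ values in the formant comparison are plain differences between this clip's trimmed medians and each reference anchor. A minimal sketch, using the numbers shown in this report; the dictionary and function names are hypothetical:

```python
# Trimmed formant medians for this clip and the two reference anchors, in Hz.
CLIP = {"F1": 599, "F2": 1683, "F3": 2885}
HIGHER_REF = {"F1": 625, "F2": 1942, "F3": 2921}  # toward target
LOWER_REF = {"F1": 579, "F2": 1531, "F3": 2414}


def deltas(clip: dict[str, int], ref: dict[str, int]) -> dict[str, int]:
    """Signed difference (clip minus reference) for each formant."""
    return {name: clip[name] - ref[name] for name in clip}


vs_target = deltas(CLIP, HIGHER_REF)
vs_other = deltas(CLIP, LOWER_REF)
for name in CLIP:
    print(f"{name}: {vs_target[name]:+d} Hz vs target, {vs_other[name]:+d} Hz vs other")
```

A negative Δ vs target means the clip sits below the higher reference; a positive Δ vs other means it sits above the lower one.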
Charts over time
Pitch, blend, and formants within this clip
[Chart: Pitch over time. Dashed lines: higher reference (population), 170 Hz; starting voice (saved baseline), 121 Hz.]
[Chart: Toward target % over time.]
[Chart: Formants F1–F3.]
Shared via VoiceLab.