Shared analysis only. Percentages are acoustic estimates for practice tracking—not how listeners will perceive this voice or whether someone “passes.” They do not replace listening to the clip yourself or feedback from another listener or clinician.

Acoustic estimate · not listener perception

Shared recording · anyone with this link can view results until the link expires.

Dickinson — Afraid to own a body

May 28, 2026 4:03 PM EDT · Recording · Emily Dickinson: Afraid to own a body

Pitch (median)

191.8 Hz

Voiced speech frames only; not IQR-trimmed.

Pitch std dev (σ)

23.8 Hz

IQR-trimmed across voiced frames.

Range: 75.1–589.5 Hz

Raw σ (all frames): 83.2 Hz

Cue breakdown (toward target)

Read these first—the headline % below is a weighted blend of these lines.

Pitch

96%

Formant

51%

Intonation

73%

Headline blend

Weighted toward-target score and alternate-direction mirror

Toward target %

66.4%

Weighted blend of the cue breakdown above.

Alternate direction %

33.6%

Mirror of the cue scores—not a separate measurement.

Resonance (formants)

F1–F3 medians and how they feed the blend

F1 (median, trimmed)

599 Hz

F2 (median, trimmed)

1,683 Hz

F3 (median, trimmed)

2,885 Hz

F3 raw median (all valid frames): 2,865 Hz

“Toward target %” blends resonance/formants (60%, F1 weighted), pitch (30%), and intonation (pitch variability and range, 10%).
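The blend described above can be sketched as a plain weighted sum. This is only an illustration: the weights and cue scores come from this page, but the function name, the dictionary keys, and the reading of the mirror as 100 minus the blend are assumptions, and the app's internal F1 weighting inside the resonance score is not reproduced here.

```python
# Weights as stated on this page: resonance 60%, pitch 30%, intonation 10%.
WEIGHTS = {"formant": 0.60, "pitch": 0.30, "intonation": 0.10}

def toward_target(cues, weights=WEIGHTS):
    """Weighted sum of cue scores (hypothetical helper, not the app's code)."""
    return sum(weights[name] * cues[name] for name in weights)

# Cue scores as displayed for this clip (resonance-only score is 50.7%).
cues = {"formant": 50.7, "pitch": 96.0, "intonation": 73.0}

blend = toward_target(cues)
mirror = 100.0 - blend  # assumed: the alternate direction is the complement

print(f"toward target ~ {blend:.1f}%, alternate direction ~ {mirror:.1f}%")
```

With the rounded cue scores shown on this page the sketch gives about 66.5%, close to the displayed 66.4%. The small gap is presumably due to rounding in the displayed cue scores.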

Trimmed formant medians drop outlier frames (1.5× IQR) before taking the median. Raw medians appear below when they differ by more than 2 Hz—often a sign of noisy tracking, not pauses in your speech.
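A minimal sketch of the trimming step, assuming the common Tukey-fence rule (keep frames within 1.5× IQR of the quartiles). The function name, the quartile method, and the sample frame values are hypothetical; the app may compute quartiles differently.

```python
import statistics

def iqr_trimmed_median(values, k=1.5):
    """Median after dropping frames outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # Quartile method is an assumption; other methods shift the fences slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    return statistics.median(kept)

# Made-up F3 frames (Hz) with one obvious tracking outlier at 1200 Hz.
frames = [2880, 2885, 2890, 2875, 2895, 1200]

raw = statistics.median(frames)          # outlier pulls the raw median down
trimmed = iqr_trimmed_median(frames)     # outlier frame is dropped first

print(f"raw median: {raw} Hz, trimmed median: {trimmed} Hz")
```

In this toy input the raw and trimmed medians differ by more than 2 Hz, which is the situation in which this page shows the raw median alongside the trimmed one.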

Reference comparisons

Population anchors for pitch, variability, and formants

The tables below compare this clip with population reference anchors, the same ones used in the app, for context. Any saved baselines belong to the owner of the recording.

Pitch mean vs reference

Higher reference (toward target)

170 Hz

This clip: +22 Hz

Lower reference

121 Hz

This clip: +70 Hz

Pitch variability (σ) vs reference

Standard deviation of voiced pitch, with outlier frames removed (1.5× IQR).

Higher reference σ (toward target)

27 Hz

This clip: -3.2 Hz

Lower reference σ

22 Hz

This clip: +1.8 Hz

Formants vs reference anchors (Hillenbrand 1995)

Typical adult F1 in vowel studies is often ~300–800 Hz depending on the vowel; 1000+ Hz usually indicates a tracking artifact.

F1

This clip: 599 Hz

Higher reference: 625 Hz · Δ vs target: -26 Hz

Lower reference: 579 Hz · Δ vs other: +20 Hz

F2

This clip: 1,683 Hz

Higher reference: 1,942 Hz · Δ vs target: -259 Hz

Lower reference: 1,531 Hz · Δ vs other: +152 Hz

F3

This clip: 2,885 Hz

Higher reference: 2,921 Hz · Δ vs target: -36 Hz

Lower reference: 2,414 Hz · Δ vs other: +471 Hz
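The Δ values above are simple differences, clip median minus reference anchor. The sketch below recomputes them from the figures on this page; the variable and function names are illustrative only.

```python
# Trimmed formant medians for this clip and the two reference anchors (Hz),
# copied from the figures above.
clip = {"F1": 599, "F2": 1683, "F3": 2885}
higher_ref = {"F1": 625, "F2": 1942, "F3": 2921}
lower_ref = {"F1": 579, "F2": 1531, "F3": 2414}

def deltas(clip_values, reference):
    """Clip minus reference for each formant (hypothetical helper)."""
    return {name: clip_values[name] - reference[name] for name in clip_values}

delta_vs_target = deltas(clip, higher_ref)
delta_vs_other = deltas(clip, lower_ref)

for name in clip:
    print(f"{name}: {delta_vs_target[name]:+d} Hz vs target, "
          f"{delta_vs_other[name]:+d} Hz vs other")
```

A negative Δ vs target means the clip sits below the higher reference; a positive Δ vs other means it sits above the lower reference.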

Resonance-only score (toward target): 50.7% (included in headline when tracking is reliable).

In higher-reference band (toward target): F1 F2 F3

Charts over time

Pitch, blend, and formants within this clip

Pitch over time

Dashed lines: higher reference (population), 170 Hz · starting voice (saved baseline), 121 Hz

Toward target % over time

Formants F1–F3

Shared via VoiceLab.