Frequently asked questions

Straight answers about formant Hz values, practice index, and what the app can and cannot tell you.

For a full walkthrough of recordings and scores, see How it works. For formants and trans voice training specifically, see Formants & resonance on that page.

Why are F1, F2, and F3 all “so high”? (And is F3 around 2500 Hz wrong?)

Formants are not your speaking pitch. Pitch (fundamental frequency) is how fast your vocal folds vibrate: often roughly 85–180 Hz for many adult speaking voices, sometimes higher during training. Formants are separate resonances of your throat and mouth. They are measured in Hz too, but they describe filtering, not “how high your voice is” in the everyday sense.

Because pitch and all three formants use the same unit (Hz), it is easy to think F2 or F3 must be “impossibly high.” Formant values are not meant to be compared to pitch or to a tone generator set to that frequency.

Typical ballparks from vowel research (Hillenbrand et al., 1995, corner-vowel averages)

Formant	What it mainly reflects	Often cited typical range (Hz)
F1	Mouth openness / tongue height	~500–650 (group averages on steady vowels)
F2	Tongue front vs back (“brightness”)	~1,650–1,950
F3	Higher timbre / vocal-tract length cues	~2,600–2,900

So F3 near 2,500 Hz is not a bug and not “whistle pitch.” It sits just below the group averages in the table and well within the spread seen across individual speakers for the third resonance. F2 near 1,500–2,000 Hz and F1 near 500–700 Hz can also look “high” next to pitch, yet they are ordinary values for speech resonance.

F1 is the formant people most often expect to stay near 500 Hz, because charts quote vowel averages. On full sentences (mixed vowels, automatic tracking), F1 can read higher than on a single steady /a/ or /i/ in a lab, sometimes 800–1000+ Hz when harmonics are mis-tracked. That reflects tracking quality; it is not proof that your tract resonated like a pure 1000 Hz tone.
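A simplified illustration of one way harmonics can mislead a tracker: if a naive peak picker could only report frequencies where the voice actually has energy, its answer would snap to a multiple of the fundamental, and the possible error grows with pitch. The function name and the “snap to the nearest harmonic” model are assumptions for illustration only; real formant trackers are more sophisticated, and mis-tracking can have other causes.

```python
def nearest_harmonic(f0_hz: float, resonance_hz: float) -> float:
    """Hypothetical naive tracker: report the harmonic of f0 closest to the true resonance."""
    k = max(1, round(resonance_hz / f0_hz))  # harmonic number, at least the fundamental
    return k * f0_hz


TRUE_F1_HZ = 720.0  # assumed true resonance peak for this example
for f0 in (100.0, 200.0, 300.0):
    reported = nearest_harmonic(f0, TRUE_F1_HZ)
    print(f"f0 {f0:.0f} Hz: reported F1 {reported:.0f} Hz, error {reported - TRUE_F1_HZ:+.0f} Hz")
```

With a 100 Hz fundamental the naive answer lands within 20 Hz of the true peak; at 300 Hz it is off by 120 Hz. In this toy model the error can reach about half of f0, which is one reason higher-pitched voices, with sparser harmonics, give trackers less to work with.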

VoxAccalia filters implausible frames, uses IQR-trimmed medians, and compares you to research anchors. If trimmed and raw medians differ a lot, treat the headline Hz as shaky until you get a cleaner take—not as something to “verify” with a sine wave.

I put the formant Hz in a tone generator and it sounded nothing like my voice. Does that mean the app is wrong?

No—that test compares two different things. A tone generator plays a pure sine wave at one frequency (for example 2,500 Hz). That is a single, steady pitch-like beep. It does not recreate how your vocal tract shapes speech.

When you talk, your vocal folds produce a complex buzz (fundamental plus many harmonics). Your throat, tongue, lips, and jaw boost some bands of that buzz and damp others. Formants F1, F2, and F3 are the frequencies where those boosts are strongest—they are resonance peaks, not “the note you are singing.”
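To make the source-filter idea concrete, here is a minimal sketch. It assumes a buzz whose harmonics weaken as 1/k and models each formant as a simple bell-shaped peak; the peak shape, the 100 Hz bandwidth, and the function names are illustrative assumptions, not how VoxAccalia analyzes audio.

```python
def resonance_gain(freq_hz: float, center_hz: float, bandwidth_hz: float) -> float:
    """Bell-shaped peak: 1.0 at center_hz, falling off on both sides (illustrative shape)."""
    half_bw = bandwidth_hz / 2.0
    return half_bw**2 / ((freq_hz - center_hz) ** 2 + half_bw**2)


def shaped_harmonics(f0_hz: float, formants_hz: list[float],
                     bandwidth_hz: float = 100.0,
                     max_hz: float = 4000.0) -> list[tuple[float, float]]:
    """(frequency, amplitude) of each harmonic of f0 after the formant 'filter'."""
    spectrum = []
    k = 1
    while k * f0_hz <= max_hz:
        freq = k * f0_hz
        source_amp = 1.0 / k  # the buzz: every harmonic present, weaker as k grows
        filter_gain = sum(resonance_gain(freq, fc, bandwidth_hz) for fc in formants_hz)
        spectrum.append((freq, source_amp * filter_gain))
        k += 1
    return spectrum


spectrum = shaped_harmonics(250.0, [500.0, 1500.0, 2500.0])
print(len(spectrum), "harmonics")  # many components, not one tone
for freq, amp in spectrum[8:11]:   # the harmonics around the assumed F3
    print(f"{freq:.0f} Hz: {amp:.4f}")
```

With a 250 Hz fundamental and formants assumed at 500, 1,500, and 2,500 Hz, the harmonic at 2,500 Hz stands out from its neighbors, but it is only one of 16 components below 4,000 Hz. That is why a lone 2,500 Hz sine cannot stand in for F3.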

Playing 2,500 Hz alone does not reproduce “your F3.” It omits your fundamental pitch, all other harmonics, consonants, timing, and the other two formants working together. Nobody hears your voice as an isolated 2,500 Hz whistle; listeners hear the whole spectrum shaped at once.

Pitch on your recording page is the number to compare with “how high/low” intuition (and even then, listen to the clip). F1–F3 are for resonance training feedback—brightness, mouth shape, tract space—not for typing into a tone app to “check” plausibility.

Why doesn’t practice index match how I sound or my training goal?

It is not a listener test. Practice index is where your clip sits between lower and higher acoustic references for pitch, formants, and intonation. It does not say how others will hear you, whether you sound “convincing,” or anything about identity.

Real perception depends on context (phone vs in-person), articulation, rhythm, word choice, and much more than three Hz numbers. One clip can score high on the index while still sounding strained or unlike your goal—for example very high pitch with little resonance change.

If the % feels “wrong,” check whether you are comparing it to listening instead of to movement on the cue breakdown (pitch, resonance, intonation). Set personal baselines so the scale reflects your starting voice and target, not only population averages.

The % went up but I don’t sound closer to my goal. Why?

Pitch and formants move independently. Pushing pitch to an extreme without mouth and throat resonance changes can raise the blended score while the voice still sounds strained or mismatched to your goal. That is expected arithmetic, not a bug.

Resonance (formants) counts for most of the blend (with F1 weighted highest). Check whether F1–F3 moved, not only median pitch. Sustainable training usually coordinates laryngeal height, oral posture, and pitch—not one knob alone.
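A minimal sketch of why a pitch-only change can lift the blended score. It assumes the index is a weighted average of per-cue positions, each on a 0 to 100 scale. The weights are hypothetical numbers chosen only to match the description above (resonance carrying most of the weight, F1 the most); they are not VoxAccalia's real weights.

```python
# Hypothetical weights for illustration only: resonance (F1-F3) carries most of the
# blend and F1 the single largest share. Not VoxAccalia's actual values.
WEIGHTS = {"F1": 0.30, "F2": 0.20, "F3": 0.15, "pitch": 0.25, "intonation": 0.10}


def blended_index(positions: dict[str, float]) -> float:
    """Weighted average of per-cue positions, each already on a 0-100 scale."""
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[cue] * positions[cue] for cue in WEIGHTS) / total_weight


before = {"F1": 20.0, "F2": 20.0, "F3": 20.0, "pitch": 20.0, "intonation": 20.0}
pitch_only = dict(before, pitch=100.0)                    # pitch pushed hard, resonance unchanged
resonance_too = dict(before, F1=60.0, F2=60.0, F3=60.0)   # moderate shift in all three formants

for label, cues in [("before", before), ("pitch only", pitch_only), ("resonance", resonance_too)]:
    print(f"{label}: {blended_index(cues):.1f}%")
```

In this hypothetical weighting, a large pitch jump alone moves the blend from 20% to 40%, while a moderate shift in all three formants moves it further, to 46%, because resonance carries more total weight.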

What’s the difference between pitch and formants?

Pitch is how fast your vocal folds vibrate—how “high” or “low” the fundamental frequency feels.

Formants (F1–F3) describe how your vocal tract filters that buzz—vowel shape and timbre. You can change brightness and “front/back” resonance without changing pitch much, and vice versa.

Many trans voice training paths work on both; VoxAccalia shows them separately so you can see which cue moved.

Why do I see both “trimmed” and “raw” formant medians?

The big number on your recording is the trimmed median: frames that fall more than 1.5× the interquartile range (IQR) below the first quartile or above the third quartile for that clip are dropped as outliers, then we take the median of what remains. That value feeds practice index when tracking is reliable.

When trimming changes a formant by more than about 2 Hz, we also show the raw median of every valid tracked frame. A large gap usually means noisy audio or unstable tracking—not that you secretly sound like the raw number.
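A minimal Python sketch of the trimmed-versus-raw logic described above, assuming quartiles from the standard library's `statistics.quantiles` (its default “exclusive” method; the app may compute quartiles differently) and the 2 Hz display threshold. The function names and frame values are hypothetical.

```python
import statistics


def iqr_trimmed_median(values: list[float], k: float = 1.5) -> float:
    """Median after dropping frames outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # needs at least two frames
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    return statistics.median(kept)


def formant_summary(frames_hz: list[float], show_raw_above_hz: float = 2.0) -> dict:
    """Always report the trimmed median; add the raw median when the two differ noticeably."""
    raw = statistics.median(frames_hz)
    trimmed = iqr_trimmed_median(frames_hz)
    summary = {"trimmed": trimmed}
    if abs(trimmed - raw) > show_raw_above_hz:
        summary["raw"] = raw
    return summary


# Ten steady F1 frames plus two mis-tracked frames near 1,000 Hz (hypothetical data).
frames = [600, 605, 610, 612, 615, 618, 620, 622, 625, 630, 980, 1020]
print(formant_summary(frames))
```

Here the two frames near 1,000 Hz fall outside the IQR fences and are dropped, so the trimmed median is 616.5 Hz while the raw median is 619.0 Hz. Because that gap exceeds 2 Hz, the raw value is reported alongside it.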

What does unreliable formant tracking mean?

The analyzer could not find enough stable voiced speech, or formant measurements failed plausibility checks (too few frames, values out of range, or heavy disagreement between raw and trimmed values).
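The three failure modes above can be sketched as a simple checklist. This is a hypothetical illustration: the frame minimum, plausible Hz ranges, and disagreement limit are placeholder assumptions, not VoxAccalia's documented values, and the trimmed median is taken as an input computed elsewhere.

```python
import statistics
from dataclasses import dataclass, field

# Hypothetical thresholds for illustration only.
MIN_VALID_FRAMES = 20
PLAUSIBLE_HZ = {"F1": (200.0, 1200.0), "F2": (500.0, 3000.0), "F3": (1500.0, 4000.0)}
MAX_RAW_VS_TRIMMED_HZ = 50.0


@dataclass
class TrackingCheck:
    reliable: bool
    reasons: list[str] = field(default_factory=list)


def check_tracking(formant: str, frames_hz: list[float], trimmed_median_hz: float) -> TrackingCheck:
    """Flag too few frames, out-of-range values, or heavy raw/trimmed disagreement."""
    low, high = PLAUSIBLE_HZ[formant]
    valid = [f for f in frames_hz if low <= f <= high]  # drop implausible frames
    reasons = []
    if len(valid) < MIN_VALID_FRAMES:
        reasons.append(f"too few valid frames ({len(valid)})")
    if not low <= trimmed_median_hz <= high:
        reasons.append("median outside plausible range")
    if valid and abs(statistics.median(valid) - trimmed_median_hz) > MAX_RAW_VS_TRIMMED_HZ:
        reasons.append("raw and trimmed medians disagree")
    return TrackingCheck(reliable=not reasons, reasons=reasons)


print(check_tracking("F1", [600.0] * 5, 600.0))
```

A short take fails on frame count alone; a take split between two very different F1 readings fails on disagreement even if every frame is individually plausible.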

When that happens, resonance may be excluded or down-weighted in practice index. Fix the take: clearer speech, less background noise, a few more seconds of audio, consistent mic distance—then record again.

Should I use population benchmarks or my own baselines?

Population benchmarks (see the Benchmarks page) are useful defaults from published vowel and pitch studies when you have not set anything personal yet.

Personal baselines—a “starting voice” clip and a “training target” clip—score new recordings between your endpoints. That often feels more meaningful than comparing every take to group averages that may not match your anatomy or mic.

You can use either or both; changing baselines updates scores on existing recordings without re-recording.
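A minimal sketch of scoring a stored measurement between two endpoints, assuming a linear scale clipped to 0 to 100%. The class, function, and all Hz values are hypothetical; the point is that rescoring only reruns arithmetic on measurements already saved with each recording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoints:
    start: float   # measurement from the "starting voice" reference
    target: float  # same measurement from the "training target" reference


def score_between(value: float, ends: Endpoints) -> float:
    """0% at start, 100% at target, clipped; works whether target is above or below start."""
    span = ends.target - ends.start
    if span == 0:
        raise ValueError("start and target must differ")
    return 100.0 * min(1.0, max(0.0, (value - ends.start) / span))


# A stored measurement from an existing recording (hypothetical value).
stored_median_f2_hz = 1700.0

population = Endpoints(start=1500.0, target=1900.0)  # hypothetical default references
personal = Endpoints(start=1550.0, target=1750.0)    # hypothetical personal baselines

print(score_between(stored_median_f2_hz, population))
print(score_between(stored_median_f2_hz, personal))
```

The same stored 1,700 Hz reading scores 50% against the hypothetical population defaults and 75% against the personal endpoints. Dividing by a signed span lets the same function handle targets that sit below the starting value.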

Does a high or low % mean I sound convincing to others?

No. VoxAccalia is not a passability detector and cannot hear you the way another person does. Percentages track acoustic movement toward references you chose—they do not label you or replace feedback from listening, trusted listeners, or a qualified coach.

See How it works for what we can’t tell you.

Still stuck?