Tuesday, January 3, 2023

[Phonetics, Speech Technology] What are Fundamental Frequency, Harmonics, and Formants?

The human vocal tract is a kind of resonator, closed at one end (the vocal folds) and open at the other (the lips),
and it acts as the filter in source-filter theory. There are broadly two types of sound source:
the glottal source and the supra-glottal source. The glottal source can be periodic (i.e., voiced), aperiodic (i.e., whispered),
or mixed (i.e., breathy), while the supra-glottal source is almost always aperiodic (i.e., noise).
Of the two, the glottal source, in particular vocal fold vibration, is the one pertinent to the fundamental frequency, harmonics, and formants discussed here,
because voiced sound is essentially periodic.

Among them, the fundamental frequency and harmonics are properties of the source,
namely the vocal folds, while formants are properties of the filter, namely the vocal tract.
First of all, according to Fourier's theorem, a complex periodic wave can be decomposed into simple (sinusoidal) waves, and
each simple wave is referred to as a harmonic. The first harmonic is referred to as
the fundamental frequency, which is determined by the number of times the vocal folds vibrate
in one second, and the following harmonics, also called overtones, are integer multiples of it.
For instance, if the fundamental frequency, the first harmonic, is 100 Hz, the second harmonic
is 200 Hz, the third harmonic is 300 Hz, and so on.
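
The relation between the fundamental frequency and its harmonics can be written out as a minimal Python sketch. The function name `harmonic_frequencies` and its parameters are made up for illustration; harmonic 1 is taken to be the fundamental itself, as in the text.

```python
def harmonic_frequencies(f0_hz, count):
    """Return the first `count` harmonics of a fundamental frequency.

    Harmonic n is n times the fundamental, so harmonic 1 is f0 itself
    and harmonics 2, 3, ... are the overtones.
    """
    return [n * f0_hz for n in range(1, count + 1)]


# The 100 Hz example from the text: first, second, and third harmonic.
print(harmonic_frequencies(100, 3))  # [100, 200, 300]
```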

When those harmonics pass through the vocal tract, some are damped
while others are amplified, depending on the length and shape of the vocal tract. The frequencies
at which the vocal tract amplifies the harmonics most strongly are called formant frequencies, essentially its resonant frequencies.
The first three formants, called the first formant (F1), the second formant (F2),
and the third formant (F3), are widely used by the hearer to categorize speech sounds,
in particular vowels.

In short, overtones are integer multiples of the fundamental frequency produced at the vocal folds,
harmonics comprise the fundamental frequency and the overtones, and formants are
the resonant frequencies of the vocal tract, i.e., the frequencies at which harmonics are amplified as they are filtered through it.

Lastly, formants are determined by the length and shape of the vocal tract. For a uniform tube closed at one end and open at the other, they can be
calculated by Fn = (2n-1)*c/(4L), where 'n' is the number of the resonance,
'c' is the speed of sound (in air), which is around 35,000 cm/sec, and 'L' is the length
of the resonator, here the length of the vocal tract. If one's vocal tract is 17.5 cm long and
one produces a schwa, for which the vocal tract stays close to a uniform tube,
the first three formants can be calculated as in the following table.

Resonance number (n) | Formant frequency
:----------:|:---------------------
1 (F1)|1 * 35,000 cm/sec / (4 * 17.5 cm) = 500 Hz
2 (F2)|3 * 35,000 cm/sec / (4 * 17.5 cm) = 1,500 Hz
3 (F3)|5 * 35,000 cm/sec / (4 * 17.5 cm) = 2,500 Hz
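
The same calculation can be sketched in Python. This is only an illustration of the formula for a uniform tube closed at one end, not a model of real vowels; the function name `tube_formants` and its parameter names are made up, and the default speed of sound is the 35,000 cm/sec value used above.

```python
def tube_formants(length_cm, count=3, speed_cm_per_s=35_000.0):
    """Resonances of a uniform tube closed at one end and open at the other.

    Implements Fn = (2n - 1) * c / (4 * L) for n = 1..count, with
    c in cm/sec and L in cm, so the results are in Hz.
    """
    return [
        (2 * n - 1) * speed_cm_per_s / (4 * length_cm)
        for n in range(1, count + 1)
    ]


# A 17.5 cm vocal tract producing a schwa, as in the table.
print(tube_formants(17.5))  # [500.0, 1500.0, 2500.0]
```

Because L is in the denominator, a shorter tube gives higher formants: halving the length to 8.75 cm doubles every value.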
