Tuesday, January 3, 2023

[Speech Technology] What is Vector Quantization?

The training data should be large enough for reliable values to be derived for all
the parameters. However, the larger the number of feature vectors, the more possible
values there are for each feature. Not only is this memory-inefficient, but it is also problematic
because many feature vectors will not occur at all in the training data. One solution to these
problems is Vector Quantization (VQ).

VQ is a data compression technique. It does not deal with all the feature vectors, but only
with the centroids of clusters of them; each vector is matched to its closest centroid by Euclidean distance.
As a simple example, suppose we want to represent the values 0 to 7 in one dimension. 3 bits (2^3 = 8)
are needed to do so. However, if we apply VQ, only 2 bits (2^2 = 4) are needed
(4 centroids in 4 clusters: 1 for 0 to 1.99; 2 for 2 to 3.99; 5 for 4 to 5.99; and 7 for 6 to 7.99).
So, if “41371512” is the target, 24 bits (3 * 8) are needed without VQ but only 16 bits (2 * 8)
with VQ. However, the two are represented differently: the VQ version is stored
as the centroids “51271512”, while the non-VQ version is stored as it is. VQ can also be applied
in more than one dimension. For example, if there are 16 dots in a two-dimensional space
(a plane), 4 bits (2^4 = 16) per data value are needed, but if the 16 dots are represented
by 4 centroids, only 2 bits (2^2 = 4) per data value are needed.
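
The one-dimensional example can be checked with a few lines of Python. This is only a minimal sketch: the function names (cluster_index, quantize, bits_needed) are made up for illustration, and the cluster boundaries are assumed to be exactly the ranges listed above.

```python
import math

# Centroids from the example: one representative per cluster.
# Assumed cluster ranges: 0-1.99, 2-3.99, 4-5.99, 6-7.99.
CENTROIDS = [1, 2, 5, 7]

def cluster_index(value):
    """Return the index (0-3) of the cluster containing value (0 <= value < 8)."""
    return int(value // 2)

def quantize(digits):
    """Replace each digit with the centroid of the cluster it falls in."""
    return "".join(str(CENTROIDS[cluster_index(int(d))]) for d in digits)

def bits_needed(num_symbols, length):
    """Bits needed to store `length` values drawn from `num_symbols` symbols."""
    return math.ceil(math.log2(num_symbols)) * length

target = "41371512"
print(quantize(target))              # 51271512
print(bits_needed(8, len(target)))   # 24 (without VQ)
print(bits_needed(4, len(target)))   # 16 (with VQ)
```

Note that what is actually stored with VQ is the 2-bit cluster index of each value; the centroid string is what you get back when decoding.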

To be more specific about how this works, let’s take another example.
If there are 4,096 dots in a plane, 12 bits (2^12 = 4,096) per data value are needed,
but if these are represented by 16 centroids obtained through a clustering technique such as k-means,
only 4 bits (2^4 = 16) per data value are needed. Each centroid value (e.g., (2, 5)) is
saved as a vector value and assigned a vector number
(i.e., 0000, 0001, 0010, … 1111 in this case). These are then stored
in the codebook, with vector values as codebook values and vector numbers as codebook entries.
Once a codebook has been built from a training data set in this way, a sequence of acoustic data
in the form of vectors can be represented by codebook entries. For instance, if (2, 5) is saved
as entry 3 in the codebook and one of the vectors in the new acoustic data is (2, 4.9), it will
be represented by entry number 3.
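
The codebook idea can be sketched in plain Python as follows. This is an illustration under stated assumptions, not a reference implementation: the helper names (nearest, train_codebook, encode) are hypothetical, the training points are random numbers rather than real acoustic features, and the toy codebook is invented so that (2, 5) sits at entry 3 as in the example.

```python
import math
import random

def nearest(codebook, vector):
    """Index of the codebook value closest to `vector` (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(codebook[i], vector))

def train_codebook(points, k, iterations=10, seed=0):
    """Basic k-means: returns k centroids to be used as codebook values."""
    rng = random.Random(seed)
    codebook = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(codebook, p)].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        codebook = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else codebook[i]
            for i, cluster in enumerate(clusters)
        ]
    return codebook

def encode(codebook, vectors):
    """Represent each vector by its codebook entry, as a fixed-width bit string."""
    width = max(1, math.ceil(math.log2(len(codebook))))
    return [format(nearest(codebook, v), f"0{width}b") for v in vectors]

# 4,096 made-up points in a plane, compressed to a 16-entry codebook.
rng = random.Random(1)
training = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(4096)]
codebook = train_codebook(training, k=16)
codes = encode(codebook, training[:3])
print(len(codebook), [len(c) for c in codes])  # 16 [4, 4, 4]

# Toy codebook (invented values) in which entry 3 is (2, 5).
toy = [(0.0, 0.0), (8.0, 1.0), (5.0, 9.0), (2.0, 5.0)]
print(nearest(toy, (2, 4.9)))  # 3
```

In practice one would more likely use a library k-means than this hand-written loop; the loop is spelled out here only to show where the Euclidean distance and the centroids come in.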

In short, VQ is a data compression method in which only the representative vectors (centroids) of clusters
are dealt with. The more feature vectors there are, the greater the benefit of VQ.
