is the feature vector of the speech segment used to build the model,
is the cardinality of a set, and W
is the weighted sum of the covariance matrices Cov(Ac). So, during the
final clustering, only the partition that results in the minimum WCD is selected.
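The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each cluster's covariance is weighted by its number of feature vectors, the dispersion is measured as the determinant of W, and the function names (`covariance`, `determinant`, `within_cluster_dispersion`, `best_partition`) are hypothetical, not taken from the source.

```python
def covariance(points):
    """Covariance matrix (dividing by n) of a list of equal-length vectors."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    return [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / n
             for j in range(d)] for i in range(d)]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, det = len(a), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def within_cluster_dispersion(clusters):
    """det(W), with W = sum over clusters of size * Cov(cluster).

    Assumption: the weights are the cluster sizes.
    """
    d = len(clusters[0][0])
    w = [[0.0] * d for _ in range(d)]
    for points in clusters:
        cov = covariance(points)
        for i in range(d):
            for j in range(d):
                w[i][j] += len(points) * cov[i][j]
    return determinant(w)


def best_partition(candidate_partitions):
    """Return the candidate partition with the smallest dispersion."""
    return min(candidate_partitions, key=within_cluster_dispersion)
```

Each candidate partition is a list of clusters, and each cluster is a list of feature vectors. A partition that mixes segments from different speakers inflates the per-cluster covariances and therefore scores a larger dispersion.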
2.5.3 Issues Related
There are many other issues that need attention in speaker clustering. A
uniform model could be built for all segments belonging to a cluster, which is
called average linkage, but this is quite expensive in terms of computational
cost, which suggests that some other form of linkage between the segments or
models in a cluster may be more suitable. Complete linkage uses the distance
between the farthest pair of points, one from each cluster, as the distance
between the two clusters (Sander and Ester, 2000). Single linkage, on the other
hand, uses the distance between the nearest pair of points, one from each
cluster, to represent the distance between the pair of clusters.
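The three linkage criteria can be compared with a minimal Python sketch. It works on raw points with Euclidean distance rather than on segment models, which is a simplifying assumption, and the function names are illustrative only.

```python
import math


def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distance between every cross-cluster pair of points."""
    return [math.dist(p, q) for p in cluster_a for q in cluster_b]


def single_linkage(cluster_a, cluster_b):
    """Distance of the nearest cross-cluster pair."""
    return min(pairwise_distances(cluster_a, cluster_b))


def complete_linkage(cluster_a, cluster_b):
    """Distance of the farthest cross-cluster pair."""
    return max(pairwise_distances(cluster_a, cluster_b))


def average_linkage(cluster_a, cluster_b):
    """Mean distance over all cross-cluster pairs."""
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)
```

All three criteria evaluate every cross-cluster pair here, so the cost grows with the product of the cluster sizes. The criteria differ only in how the pairwise distances are summarized.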
The training process should be initialized with some reference point. The
expectation-maximization (EM) algorithm will find a local maximum of the
likelihood regardless of the starting point, but the likelihood function of a
GMM has many local maxima, and different starting points lead to different
maxima. K-means and K-means++ are some of the initialization methods employed,
but unfortunately most of them are not up to the mark and take many iterations
to converge.
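As an illustration of one of the initialization schemes mentioned above, the following is a minimal sketch of K-means++ seeding in plain Python. The function names and the `seed` parameter are hypothetical; the chosen points would serve as starting component means before EM is run.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_seeds(points, k, seed=0):
    """Pick k starting centres with the K-means++ rule.

    The first centre is drawn uniformly. Each later centre is drawn with
    probability proportional to its squared distance from the nearest
    centre chosen so far.
    """
    rng = random.Random(seed)
    seeds = [rng.choice(points)]
    while len(seeds) < k:
        d2 = [min(squared_distance(p, s) for s in seeds) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a chosen centre: fall back to uniform.
            seeds.append(rng.choice(points))
        else:
            seeds.append(rng.choices(points, weights=d2, k=1)[0])
    return seeds
```

Because different seeds can lead EM to different local maxima, a common precaution is to run the training from several seeds and keep the model with the highest likelihood.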
From Fig. 2.5 above, when training a
nodal variance GMM it has been found that the variance elements can become very
small in magnitude, which is particularly true for a mixture model with a large
number of component densities (≥32). Such small variances generate a singularity
in the likelihood function of the model and degrade identification performance
by distorting the speaker model scores used in the maximum likelihood
classifier. To avoid this problem, which causes numerical instability, a minimum
variance value is imposed on the elements of all variance vectors in a speaker’s
model.
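Since the problem is variances that become too small, the constraint acts as a lower bound (a variance floor). A minimal sketch follows, assuming diagonal covariances stored as one variance vector per mixture component; the function name and the default floor value are hypothetical and would need tuning for real features.

```python
def apply_variance_floor(variances, floor=1e-3):
    """Clamp every element of every component's variance vector to `floor`.

    `variances` is a list with one variance vector per mixture component.
    The default floor of 1e-3 is an illustrative value, not one from the text.
    """
    return [[max(v, floor) for v in component] for component in variances]
```

One plausible place to call this is after each EM re-estimation of the variances, so that no component can collapse onto a single data point.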
2.6 Training of
model is used to collect large quantities of numerical speech-feature data,
expressed in terms of parameters, and is very important in refining the various
speech classes. Acoustic models in speech recognition are built on Hidden
Markov Models (HMMs). Acoustic models are used to evaluate the probability that
speech corresponds to an acoustic unit or an acoustic hypothesis, and language
models are used to estimate the probability of a word sequence. Most
recognition systems use HMMs for acoustic modelling.
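The way the two models work together can be sketched as follows. This assumes the acoustic log-likelihood and the language-model log-probability of each candidate word sequence are already available; the scores in the usage note are invented for illustration, and `lm_weight` is a hypothetical scaling parameter.

```python
import math


def best_hypothesis(hypotheses, lm_weight=1.0):
    """Pick the word sequence with the highest combined log score.

    `hypotheses` maps a word sequence (tuple of words) to a pair
    (acoustic log-likelihood, language-model log-probability).
    """
    def combined(words):
        acoustic_logp, lm_logp = hypotheses[words]
        return acoustic_logp + lm_weight * lm_logp

    return max(hypotheses, key=combined)


# Illustrative, made-up scores for two competing hypotheses.
candidates = {
    ("recognize", "speech"): (-120.0, math.log(0.01)),
    ("wreck", "a", "nice", "beach"): (-119.0, math.log(0.0001)),
}
```

Calling `best_hypothesis(candidates)` returns the first sequence: its acoustic score is slightly worse, but the language model considers it far more probable, so its combined score is higher.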