A Detailed Walkthrough of Kaldi's PLDA Code
# plda.h
The PLDA model follows the paper "Probabilistic Linear Discriminant Analysis" (Ioffe, 2006). Training uses the efficient E-M procedure derived in that paper (note: it could be made even more efficient, but that hardly seems necessary since it is already very fast). In this code the between-class covariance has the same dimension as the feature space; if you want it smaller than the feature dimension, you would remove the smallest elements of the diagonalized between-class covariance matrix.
# struct PldaConfig
This config controls the application of PLDA as a transform to iVectors, prior to dot-product scoring.
struct PldaConfig {
bool normalize_length;
bool simple_length_norm;
PldaConfig(): normalize_length(true), simple_length_norm(false) { }
void Register(OptionsItf *opts) {
opts->Register("normalize-length", &normalize_length,
"If true, do length normalization as part of PLDA (see "
"code for details). This does not set the length unit; "
"by default it instead ensures that the inner product "
"with the PLDA model's inverse variance (which is a "
"function of how many utterances the iVector was averaged "
"over) has the expected value, equal to the iVector "
"dimension.");
opts->Register("simple-length-normalization", &simple_length_norm,
"If true, replace the default length normalization by an "
"alternative that normalizes the length of the iVectors to "
"be equal to the square root of the iVector dimension.");
}
};
- `PldaConfig(): normalize_length(true), simple_length_norm(false) { }` is the default constructor; its member-initializer list simply sets the two flags' default values.
- normalize_length: perform length normalization as part of PLDA. This does not normalize to unit length; instead it scales the iVector so that its inner product with the PLDA model's inverse variance equals the iVector dimension. That inverse variance is a function of how many utterances the iVector was averaged over (see the formulas below).
- simple_length_norm: replace the default normalization with an alternative that scales the iVector's length to the square root of the iVector dimension.
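Summarizing the two modes for a transformed iVector $x$ of dimension $d$ averaged over $n$ utterances (the second formula is derived under GetNormalizationFactor() below):

$$\text{simple: } s = \frac{\sqrt{d}}{\lVert x \rVert_2}, \qquad \text{default: } s = \sqrt{\frac{d}{x^\top (\Psi + I/n)^{-1} x}}$$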
# class Plda
class Plda {
public:
Plda() { } // user-provided default constructor; needed because declaring the copy constructor below suppresses the implicitly generated one
explicit Plda(const Plda &other): // copy constructor; "explicit" rules out implicit copies (e.g. pass-by-value)
mean_(other.mean_),
transform_(other.transform_),
psi_(other.psi_),
offset_(other.offset_) {
};
double TransformIvector(const PldaConfig &config,
const VectorBase<double> &ivector,
int32 num_enroll_examples,
VectorBase<double> *transformed_ivector) const;
float TransformIvector(const PldaConfig &config,
const VectorBase<float> &ivector,
int32 num_enroll_examples,
VectorBase<float> *transformed_ivector) const;
double LogLikelihoodRatio(const VectorBase<double> &transformed_enroll_ivector,
int32 num_enroll_utts,
const VectorBase<double> &transformed_test_ivector)
const;
void SmoothWithinClassCovariance(double smoothing_factor);
void ApplyTransform(const Matrix<double> &in_transform);
int32 Dim() const { return mean_.Dim(); }
void Write(std::ostream &os, bool binary) const;
void Read(std::istream &is, bool binary);
protected:
void ComputeDerivedVars(); // computes offset_.
friend class PldaEstimator; // friend classes may access the protected members
friend class PldaUnsupervisedAdaptor;
Vector<double> mean_; // mean of the training samples in the original iVector space
Matrix<double> transform_; // Dim() x Dim(); makes the within-class covariance unit and diagonalizes the between-class covariance
Vector<double> psi_; // Dim(); diagonal of the between-class covariance, in descending order
Vector<double> offset_; // derived variable: -1.0 * transform_ * mean_
private:
Plda &operator = (const Plda &other); // disallow assignment
double GetNormalizationFactor(const VectorBase<double> &transformed_ivector,
int32 num_examples) const;
};
- Why is psi_ the diagonal of the between-class covariance? Because transform_ simultaneously whitens the within-class covariance and diagonalizes the between-class covariance (two symmetric matrices can be diagonalized simultaneously when one is positive definite), so in the transformed space the between-class covariance is exactly diag(psi_).
# TransformIvector()
Projects an iVector (a single vector) into a space where the within-class covariance is unit and the between-class covariance is diagonal. It is used to pre-transform iVectors before handing them to LogLikelihoodRatio(), so the transform need not be re-applied at scoring time. If config.normalize_length == true, the iVector's length is normalized by multiplying it by a factor that ensures ivector^T inv_var ivector = dim; this requires num_enroll_examples, which affects the expected covariance of the iVector. If config.normalize_length == false, the normalization factor is still computed and returned, just not applied. If config.simple_length_normalization == true, the factor instead makes the iVector's length equal to the square root of the iVector dimension.
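A minimal usage sketch, assuming a loaded Plda model `plda` and a raw iVector `ivec` (both hypothetical names):

```cpp
#include "ivector/plda.h"  // kaldi::Plda, kaldi::PldaConfig

// Hypothetical: "plda" is a trained model, "ivec" a raw iVector of dim plda.Dim().
PldaConfig config;                       // normalize_length is true by default
Vector<double> transformed(plda.Dim());
int32 num_enroll_utts = 3;               // e.g. this speaker enrolled 3 utterances
double scale = plda.TransformIvector(config, ivec, num_enroll_utts, &transformed);
// "transformed" now lives in the space with unit within-class covariance and
// diagonal between-class covariance; "scale" is the normalization factor applied.
```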
# LogLikelihoodRatio()
Returns the log-likelihood ratio log (p(test_ivector | same) / p(test_ivector | different)). transformed_enroll_ivector is the average over all of the speaker's utterances. Both transformed_enroll_ivector and transformed_test_ivector are assumed to have already been transformed by TransformIvector(). Note: length normalization, if enabled, was already applied when the transformed iVectors were computed.
# SmoothWithinClassCovariance()
Smooths the within-class covariance by adding smoothing_factor (e.g. 0.1) times the between-class covariance to it (implemented by modifying transform_). This compensates for the case where the leading elements of psi_ are very large and speakers have too few utterances, where the within-class covariance would otherwise be poorly estimated.
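A sketch of the effect in the transformed space (where the within-class covariance is $I$ and the between-class covariance is $\mathrm{diag}(\Psi)$): adding $\alpha$ times the between-class covariance makes the within-class covariance $I + \alpha\,\mathrm{diag}(\Psi)$, and renormalizing that back to the identity amounts, roughly, per dimension $i$ to

$$\Psi_i \leftarrow \frac{\Psi_i}{1 + \alpha\,\Psi_i},\qquad \text{row } i \text{ of transform\_} \leftarrow \frac{\text{row } i}{\sqrt{1 + \alpha\,\Psi_i}}$$

(the exact implementation lives in plda.cc).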
# ApplyTransform()
Applies a transform to the PLDA model. It is typically used to project the model's parameters into a lower-dimensional space, i.e. with in_transform.NumRows() <= in_transform.NumCols(), for example in speaker diarization with a PCA transform.
# GetNormalizationFactor()
A private function that returns the normalization factor by which transformed_ivector must be multiplied so that it has the length it "should" have. transformed_ivector is assumed to be an iVector in the transformed space (i.e. centered and multiplied by transform_). In that space its covariance "should" be \Psi + I/num_examples.
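A short derivation of the factor, using $E[x^\top A x] = \mathrm{tr}(A\,\mathrm{Cov}(x))$: if $x$ really had covariance $\Psi + I/n$, then $E[x^\top (\Psi + I/n)^{-1} x] = \mathrm{tr}(I) = d$, the dimension. The code therefore solves $s^2\, x^\top (\Psi + I/n)^{-1} x = d$ for the scale $s$:

$$s = \sqrt{\frac{d}{x^\top (\Psi + I/n)^{-1} x}}$$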
# class PldaStats
class PldaStats {
public:
PldaStats(): dim_(0) { } /// The dimension is set up the first time you add samples.
/// This function adds training samples corresponding to
/// one class (e.g. a speaker). Each row is a separate
/// sample from this group. The "weight" would normally
/// be 1.0, but you can set it to other values if you want
/// to weight your training samples.
void AddSamples(double weight,
const Matrix<double> &group);
int32 Dim() const { return dim_; }
void Init(int32 dim);
void Sort() { std::sort(class_info_.begin(), class_info_.end()); }
bool IsSorted() const;
~PldaStats(); // destructor; frees the per-class mean vectors
protected:
friend class PldaEstimator;
int32 dim_; // iVector dimension
int64 num_classes_; // number of speakers (classes)
int64 num_examples_; // total number of examples, summed over classes.
double class_weight_; // total over classes, of their weight.
double example_weight_; // total over classes, of weight times #examples.
Vector<double> sum_; // Weighted sum of class means (normalize by
// class_weight_ to get mean).
SpMatrix<double> offset_scatter_; // Sum over all examples, of the weight
// times (example - class-mean).
// We have one of these objects per class.
struct ClassInfo {
double weight;
Vector<double> *mean; // owned here, but as a pointer so
// sort can be lightweight
int32 num_examples; // the number of examples in the class
bool operator < (const ClassInfo &other) const {
return (num_examples < other.num_examples);
}
ClassInfo(double weight, Vector<double> *mean, int32 num_examples):
weight(weight), mean(mean), num_examples(num_examples) { }
};
std::vector<ClassInfo> class_info_;
private:
KALDI_DISALLOW_COPY_AND_ASSIGN(PldaStats);
};
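A sketch of how the stats are accumulated (the loop bound and per-speaker helper are hypothetical):

```cpp
// One call to AddSamples() per class (speaker); each row of "group" is one
// utterance iVector of that speaker.
PldaStats stats;
for (int32 s = 0; s < num_speakers; s++) {        // hypothetical loop bound
  Matrix<double> group = IvectorsForSpeaker(s);   // hypothetical helper
  stats.AddSamples(1.0 /* weight */, group);      // weight is normally 1.0
}
stats.Sort();  // PldaEstimator expects class_info_ sorted by num_examples
```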
# struct PldaEstimationConfig
struct PldaEstimationConfig {
int32 num_em_iters;
PldaEstimationConfig(): num_em_iters(10){ } // default number of E-M iterations
void Register(OptionsItf *opts) {
opts->Register("num-em-iters", &num_em_iters,
"Number of iterations of E-M used for PLDA estimation");
}
};
# class PldaEstimator
class PldaEstimator {
public:
PldaEstimator(const PldaStats &stats);
void Estimate(const PldaEstimationConfig &config,
Plda *output);
private:
typedef PldaStats::ClassInfo ClassInfo;
/// Returns the part of the objf relating to
/// offsets from the class means. (total, not normalized)
double ComputeObjfPart1() const;
/// Returns the part of the obj relating to
/// the class means (total_not normalized)
double ComputeObjfPart2() const;
/// Returns the objective-function per sample.
double ComputeObjf() const;
int32 Dim() const { return stats_.Dim(); }
void EstimateOneIter();
void InitParameters();
void ResetPerIterStats();
// gets stats from intra-class variation (stats_.offset_scatter_).
void GetStatsFromIntraClass();
// gets part of stats relating to class means.
void GetStatsFromClassMeans();
// M-step
void EstimateFromStats();
// Copy to output.
void GetOutput(Plda *plda); // writes the transform matrix and the diagonal Psi into the Plda object for later scoring
const PldaStats &stats_;
SpMatrix<double> within_var_; // current estimate of Sigma_within (updated each iteration)
SpMatrix<double> between_var_; // current estimate of Sigma_between (updated each iteration)
// These stats are reset on each iteration.
SpMatrix<double> within_var_stats_;
double within_var_count_; // count corresponding to within_var_stats_
SpMatrix<double> between_var_stats_;
double between_var_count_; // count corresponding to between_var_stats_
KALDI_DISALLOW_COPY_AND_ASSIGN(PldaEstimator);
};
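Continuing the PldaStats sketch above, estimation then looks roughly like this:

```cpp
// Run E-M on the accumulated stats and write the resulting model to disk.
PldaEstimationConfig estimation_config;        // num_em_iters defaults to 10
PldaEstimator estimator(stats);                // "stats" from the sketch above
Plda plda;
estimator.Estimate(estimation_config, &plda);  // E-M iterations + GetOutput()
WriteKaldiObject(plda, "plda", true /* binary */);
```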
# struct PldaUnsupervisedAdaptorConfig
struct PldaUnsupervisedAdaptorConfig {
BaseFloat mean_diff_scale;
BaseFloat within_covar_scale;
BaseFloat between_covar_scale;
PldaUnsupervisedAdaptorConfig():
mean_diff_scale(1.0),
within_covar_scale(0.3),
between_covar_scale(0.7) { }
void Register(OptionsItf *opts) {
opts->Register("mean-diff-scale", &mean_diff_scale,
"Scale with which to add to the total data variance, the outer "
"product of the difference between the original mean and the "
"adaptation-data mean");
opts->Register("within-covar-scale", &within_covar_scale,
"Scale that determines how much of excess variance in a "
"particular direction gets attributed to within-class covar.");
opts->Register("between-covar-scale", &between_covar_scale,
"Scale that determines how much of excess variance in a "
"particular direction gets attributed to between-class covar.");
}
};
- mean-diff-scale: scale with which the outer product of the difference between the original (training) mean and the adaptation-data mean is added to the total data variance.
- within-covar-scale: determines how much of the excess variance in a particular direction is attributed to the within-class covariance.
- between-covar-scale: determines how much of the excess variance in a particular direction is attributed to the between-class covariance.
# class PldaUnsupervisedAdaptor
This class uses unlabeled iVectors from the domain of interest (the target domain), together with their mean and variance, to adapt a PLDA model to that new domain. It also stores the accumulated statistics for this form of adaptation.
class PldaUnsupervisedAdaptor {
public:
PldaUnsupervisedAdaptor(): tot_weight_(0.0) { }
void AddStats(double weight, const Vector<double> &ivector);
void AddStats(double weight, const Vector<float> &ivector);
void UpdatePlda(const PldaUnsupervisedAdaptorConfig &config,
Plda *plda) const;
private:
double tot_weight_;
Vector<double> mean_stats_;
SpMatrix<double> variance_stats_;
};
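A usage sketch, assuming a loaded PLDA model `plda` and a collection `indomain_ivectors` of unlabeled in-domain iVectors (both hypothetical):

```cpp
// Accumulate in-domain statistics, then adapt the model in place.
PldaUnsupervisedAdaptor adaptor;
for (const Vector<double> &ivec : indomain_ivectors)
  adaptor.AddStats(1.0, ivec);                 // weight is normally 1.0
PldaUnsupervisedAdaptorConfig adapt_config;    // scales default to 1.0 / 0.3 / 0.7
adaptor.UpdatePlda(adapt_config, &plda);       // updates mean_, transform_, psi_
```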
# AddStats()
Adds statistics to the class; the weight is normally set to 1.0.
# plda.cc
# Plda
# GetNormalizationFactor()
The idea behind the normalization: in the transformed space, an average over num_examples iVectors should have covariance \Psi + I/num_examples, so the quadratic form transformed_ivector^T (\Psi + I/num_examples)^{-1} transformed_ivector should equal the dimension; the returned factor rescales the vector so that it does (see the derivation under GetNormalizationFactor() above).
double Plda::GetNormalizationFactor(
const VectorBase<double> &transformed_ivector,
int32 num_examples) const {
KALDI_ASSERT(num_examples > 0);
// Compute the normalization factor. The covariance for an average over
// "num_examples" training iVectors equals \Psi + I/num_examples.
Vector<double> transformed_ivector_sq(transformed_ivector);
transformed_ivector_sq.ApplyPow(2.0); // square every element
// inv_covar <- 1.0 / (\Psi + I/num_examples)
Vector<double> inv_covar(psi_);
inv_covar.Add(1.0 / num_examples);
inv_covar.InvertElements(); // take the reciprocal of every element
// "transformed_ivector" should have covariance (\Psi + I/num_examples), i.e.
// within-class/num_examples plus between-class covariance. So
// transformed_ivector_sq . (I/num_examples + \Psi)^{-1} should be equal to
// the dimension.
double dot_prod = VecVec(inv_covar, transformed_ivector_sq);
return sqrt(Dim() / dot_prod);
}
# TransformIvector()
Transforms a single iVector; it is called once per enrollment speaker (and once per test utterance).
double Plda::TransformIvector(const PldaConfig &config,
const VectorBase<double> &ivector,
int32 num_examples,
VectorBase<double> *transformed_ivector) const {
KALDI_ASSERT(ivector.Dim() == Dim() && transformed_ivector->Dim() == Dim());
double normalization_factor;
// iVector <- transform_ * (iVector - mean_)
transformed_ivector->CopyFromVec(offset_); // offset_ = -1.0 * transform_ * mean_
transformed_ivector->AddMatVec(1.0, transform_, kNoTrans, ivector, 1.0);
if (config.simple_length_norm)
normalization_factor = sqrt(transformed_ivector->Dim())
/ transformed_ivector->Norm(2.0);
else // default: depends on this speaker's number of enrollment utterances, correcting the otherwise biased scaling
normalization_factor = GetNormalizationFactor(*transformed_ivector,
num_examples);
if (config.normalize_length) // default
transformed_ivector->Scale(normalization_factor);
return normalization_factor;
}
# LogLikelihoodRatio()
Let $p$ denote the probe (the test sample) and $g$ the gallery (the enrollment set). We have $P(u^p | u^g_{1...n}) = N (u^p | \frac{n \Psi}{n \Psi + I} \bar{u}^g, I + \frac{\Psi}{n\Psi + I})$ and $P(u^p) = N(u^p | 0, I + \Psi)$. We need the ratio $P(u^p | u^g_{1..n}) / P(u^p)$, i.e. $N(u^p | \frac{n \Psi}{n \Psi + I} \bar{u}^g, I + \frac{\Psi}{n \Psi + I}) / N(u^p | 0, I + \Psi)$: the numerator is the likelihood that the probe belongs to the enrolled class, the denominator the likelihood that it belongs to no particular class. In the log domain this simplifies to $- 0.5 [ (u^p - m)^T (I + \Psi/(n \Psi + I))^{-1} (u^p - m) + \text{logdet}(I + \Psi/(n \Psi + I)) ] + 0.5 [ (u^p)^T (I + \Psi)^{-1} u^p + \text{logdet}(I + \Psi) ]$, where $m = \frac{n \Psi}{n \Psi + I} \bar{u}^g$.
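Everything is diagonal in the transformed space, so each hypothesis factorizes into per-dimension scalar Gaussians; summed over dimensions, the code below evaluates exactly the `logdet + M_LOG_2PI * dim + VecVec(sqdiff, variance)` expression, i.e. per dimension

$$\log N(x;\mu,\sigma^2) = -\tfrac{1}{2}\left(\log 2\pi + \log \sigma^2 + \frac{(x-\mu)^2}{\sigma^2}\right)$$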
double Plda::LogLikelihoodRatio(
const VectorBase<double> &transformed_train_ivector,
int32 n, // number of training utterances.
const VectorBase<double> &transformed_test_ivector) const {
int32 dim = Dim();
double loglike_given_class, loglike_without_class;
{ // log-likelihood under the same-speaker hypothesis: mean m = \frac{n \Psi}{n \Psi + I} \bar{u}^g,
// variance = I + \frac{\Psi}{n\Psi + I}.
Vector<double> mean(dim, kUndefined);
Vector<double> variance(dim, kUndefined);
for (int32 i = 0; i < dim; i++) {
mean(i) = n * psi_(i) / (n * psi_(i) + 1.0)
* transformed_train_ivector(i);
variance(i) = 1.0 + psi_(i) / (n * psi_(i) + 1.0);
}
double logdet = variance.SumLog();
Vector<double> sqdiff(transformed_test_ivector);
sqdiff.AddVec(-1.0, mean);
sqdiff.ApplyPow(2.0);
variance.InvertElements();
loglike_given_class = -0.5 * (logdet + M_LOG_2PI * dim +
VecVec(sqdiff, variance));
}
{ // log-likelihood under the different-speaker hypothesis: zero mean, variance = I + \Psi.
Vector<double> sqdiff(transformed_test_ivector); // there is no offset.
sqdiff.ApplyPow(2.0);
Vector<double> variance(psi_);
variance.Add(1.0); // I + \Psi.
double logdet = variance.SumLog();
variance.InvertElements();
loglike_without_class = -0.5 * (logdet + M_LOG_2PI * dim +
VecVec(sqdiff, variance));
}
double loglike_ratio = loglike_given_class - loglike_without_class;
return loglike_ratio;
}
# PldaStats
# PldaEstimator
# PldaUnsupervisedAdaptor
# AddStats()
void PldaUnsupervisedAdaptor::AddStats(double weight,
const Vector<double> &ivector) {
if (mean_stats_.Dim() == 0) {
mean_stats_.Resize(ivector.Dim());
variance_stats_.Resize(ivector.Dim());
}
KALDI_ASSERT(weight >= 0.0);
tot_weight_ += weight;
mean_stats_.AddVec(weight, ivector);
variance_stats_.AddVec2(weight, ivector);
}
- tot_weight_: running sum of the weights. Its role: it is the normalizer UpdatePlda() divides by; with unit weights it equals the number of input utterances, otherwise it is an effective count.
- mean_stats_: weighted sum of the in-domain iVectors; the mean is not taken here but later, in UpdatePlda().
- variance_stats_: likewise, the weighted sum of outer products, i.e. weight * ivector * ivector^T per call.
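UpdatePlda() (next section) turns these raw statistics into a mean and covariance in the usual way, with $W = $ tot_weight_:

$$\mu = \frac{1}{W}\sum_i w_i x_i, \qquad \Sigma = \frac{1}{W}\sum_i w_i x_i x_i^\top - \mu\mu^\top$$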
# UpdatePlda()
- Data members of PldaUnsupervisedAdaptor used here: tot_weight_, mean_stats_, variance_stats_ (accumulated by AddStats() above).
void PldaUnsupervisedAdaptor::UpdatePlda(const PldaUnsupervisedAdaptorConfig &config,
Plda *plda) const {
//
KALDI_ASSERT(tot_weight_ > 0.0);
int32 dim = mean_stats_.Dim();
KALDI_ASSERT(dim == plda->Dim());
Vector<double> mean(mean_stats_);
mean.Scale(1.0 / tot_weight_); // mean_stats_ / tot_weight_ = in-domain mean
SpMatrix<double> variance(variance_stats_);
variance.Scale(1.0 / tot_weight_);
variance.AddVec2(-1.0, mean); // var[x] = E[x^2]-[E(x)]^2
// Difference between the adaptation-data (in-domain) mean and the training
// (out-of-domain) mean, optionally added to the total covariance matrix:
Vector<double> mean_diff(mean);
mean_diff.AddVec(-1.0, plda->mean_); // in-domain mean minus out-of-domain mean
KALDI_ASSERT(config.mean_diff_scale >= 0.0);
variance.AddVec2(config.mean_diff_scale, mean_diff); // add mean_diff_scale * mean_diff * mean_diff^T
plda->mean_.CopyFromVec(mean); // update the model mean
// transform_mod is a row-scaled version of plda->transform_ that maps the data
// into a space where the total (training-data) covariance is 1.0.
// plda->transform_ already takes us to a space where the within-class covariance
// is 1.0 and the between-class covariance is diag(plda->psi_),
// so we scale each dimension i by 1.0 / sqrt(1.0 + plda->psi_(i)).
Matrix<double> transform_mod(plda->transform_);
for (int32 i = 0; i < dim; i++)
transform_mod.Row(i).Scale(1.0 / sqrt(1.0 + plda->psi_(i)));
// Project the in-domain variance into that normalized space.
SpMatrix<double> variance_proj(dim);
variance_proj.AddMat2Sp(1.0, transform_mod, kNoTrans,
variance, 0.0); // transform_mod*variance*transform_mod^T
// Eigendecomposition of variance_proj; its leading eigenvectors are the
// directions where the in-domain covariance exceeds the training covariance.
Matrix<double> P(dim, dim);
Vector<double> s(dim);
variance_proj.Eig(&s, &P);
SortSvd(&s, &P);
KALDI_LOG << "Eigenvalues of adaptation-data total-covariance in space where "
<< "training-data total-covariance is unit, is: " << s;
// W and B are the within- and between-class covariances of the training
// (out-of-domain) data in the space produced by transform_mod.
SpMatrix<double> W(dim), B(dim);
for (int32 i = 0; i < dim; i++) {
W(i, i) = 1.0 / (1.0 + plda->psi_(i)),
B(i, i) = plda->psi_(i) / (1.0 + plda->psi_(i));
}
// So variance_proj = P diag(s) P^T (spectral decomposition of a real symmetric matrix).
// Suppose that after transform_mod we additionally project with P^T.
// The in-domain data's variance then satisfies P^T P diag(s) P^T P = diag(s),
// while PLDA's within-class covariance becomes P^T W P and its
// between-class covariance P^T B P; in this space we still have W + B = I.
// First compute these projected covariances, calling them "proj2"
// because the data has now been projected twice (really transformed,
// since the dimension is unchanged): first by transform_mod, then by P^T.
SpMatrix<double> Wproj2(dim), Bproj2(dim);
Wproj2.AddMat2Sp(1.0, P, kTrans, W, 0.0);
Bproj2.AddMat2Sp(1.0, P, kTrans, B, 0.0);
Matrix<double> Ptrans(P, kTrans);
SpMatrix<double> Wproj2mod(Wproj2), Bproj2mod(Bproj2);
for (int32 i = 0; i < dim; i++) {
// For the i'th eigenvalue, compute the within-class covariance projected
// onto this direction; likewise the between-class covariance.
BaseFloat within = Wproj2(i, i),
between = Bproj2(i, i);
KALDI_LOG << "For " << i << "'th eigenvalue, value is " << s(i)
<< ", within-class covar in this direction is " << within
<< ", between-class is " << between;
if (s(i) > 1.0) {
double excess_eig = s(i) - 1.0;
double excess_within_covar = excess_eig * config.within_covar_scale,
excess_between_covar = excess_eig * config.between_covar_scale;
Wproj2mod(i, i) += excess_within_covar;
Bproj2mod(i, i) += excess_between_covar;
}
}
// 将"transform_mod"和 P^T 合并起来,就是{W,B}proj2{,mod}所在的空间
Matrix<double> combined_trans(dim, dim);
combined_trans.AddMatMat(1.0, Ptrans, kNoTrans,
transform_mod, kNoTrans, 0.0);
Matrix<double> combined_trans_inv(combined_trans); // ... and its inverse.
combined_trans_inv.Invert();
// Wmod and Bmod are the analogues of Wproj2mod and Bproj2mod back in the original iVector space.
SpMatrix<double> Wmod(dim), Bmod(dim);
Wmod.AddMat2Sp(1.0, combined_trans_inv, kNoTrans, Wproj2mod, 0.0); // combined_trans_inv * Wproj2mod * combined_trans_inv^T
Bmod.AddMat2Sp(1.0, combined_trans_inv, kNoTrans, Bproj2mod, 0.0);
TpMatrix<double> C(dim);
// Cholesky decomposition Wmod = C C^T. Using C^{-1} as a transform gives
// C^{-1} Wmod C^{-T} = I, i.e. unit within-class covariance.
C.Cholesky(Wmod);
TpMatrix<double> Cinv(C);
Cinv.Invert();
// Project Bmod through Cinv to obtain Bmod_proj.
SpMatrix<double> Bmod_proj(dim);
Bmod_proj.AddTp2Sp(1.0, Cinv, kNoTrans, Bmod, 0.0);
Vector<double> psi_new(dim);
Matrix<double> Q(dim, dim);
// Do symmetric eigenvalue decomposition of Bmod_proj, so
// Bmod_proj = Q diag(psi_new) Q^T
Bmod_proj.Eig(&psi_new, &Q);
SortSvd(&psi_new, &Q);
// This means that if we treat Q^T as a transform, Q^T Bmod_proj Q = diag(psi_new),
// so Q^T diagonalizes Bmod_proj (while the within-class covariance stays unit).
// The final transform we want, from the original space to the new normalized space, is:
// first Cinv, then Q^T, i.e. the matrix Q^T Cinv.
Matrix<double> final_transform(dim, dim);
final_transform.AddMatTp(1.0, Q, kTrans, Cinv, kNoTrans, 0.0);
KALDI_LOG << "Old diagonal of between-class covar was: "
<< plda->psi_ << ", new diagonal is "
<< psi_new;
plda->transform_.CopyFromMat(final_transform);
plda->psi_.CopyFromVec(psi_new);
}
- A real symmetric matrix has an orthonormal basis of eigenvectors, so P above is orthogonal (P^T P = I), which is what justifies the variance_proj = P diag(s) P^T manipulations.
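Putting the adaptation together: with the combined transform in place, each direction $i$ whose adaptation-data eigenvalue satisfies $s_i > 1$ has its excess variance $s_i - 1$ split between the two covariances according to the config scales ($\alpha_w$ = within-covar-scale, $\alpha_b$ = between-covar-scale):

$$W_{ii} \leftarrow W_{ii} + \alpha_w (s_i - 1), \qquad B_{ii} \leftarrow B_{ii} + \alpha_b (s_i - 1)$$

The modified covariances are then mapped back to the original iVector space, a Cholesky factor re-whitens the within-class covariance, and a final eigendecomposition re-diagonalizes the between-class covariance, restoring the standard PLDA form.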
# ivector-plda-scoring
Computes log-likelihood ratios for trials using a PLDA model. For the enrollment (training) side the input is one averaged iVector per speaker; --num-utts supplies the number of utterances each speaker enrolled with, and if it is not given, one utterance per speaker is assumed.
Usage: ivector-plda-scoring <plda> <train-ivector-rspecifier> <test-ivector-rspecifier> <trials-rxfilename> <scores-wxfilename>
e.g.: ivector-plda-scoring --num-utts=ark:exp/train/num_utts.ark plda ark:exp/train/spk_ivectors.ark ark:exp/test/ivectors.ark trials scores
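The trials file contains one trial per line, an enrollment key followed by a test key (the code below rejects lines that do not have exactly two fields); for example, with hypothetical keys:

```
spk001 utt_test_0001
spk001 utt_test_0002
spk002 utt_test_0001
```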
#include "base/kaldi-common.h"
#include "util/common-utils.h"
#include "ivector/plda.h"
int main(int argc, char *argv[]) {
using namespace kaldi;
typedef kaldi::int32 int32;
typedef std::string string;
try {
ParseOptions po(usage); // command-line parser
std::string num_utts_rspecifier; // very important: enables per-speaker enrollment counts
PldaConfig plda_config;
plda_config.Register(&po); // register the PLDA options
po.Register("num-utts", &num_utts_rspecifier, "Table to read the number of "
"utterances per speaker, e.g. ark:num_utts.ark\n");
po.Read(argc, argv);
if (po.NumArgs() != 5) {
po.PrintUsage();
exit(1);
}
std::string plda_rxfilename = po.GetArg(1), // PLDA model
train_ivector_rspecifier = po.GetArg(2), // enrollment set
test_ivector_rspecifier = po.GetArg(3), // test set
trials_rxfilename = po.GetArg(4), // trials file
scores_wxfilename = po.GetArg(5); // output scores
// variable initialization
// diagnostics:
double tot_test_renorm_scale = 0.0, tot_train_renorm_scale = 0.0;
int64 num_train_ivectors = 0, num_train_errs = 0, num_test_ivectors = 0;
int64 num_trials_done = 0, num_trials_err = 0;
// the PLDA model
Plda plda;
ReadKaldiObject(plda_rxfilename, &plda);
int32 dim = plda.Dim();
// table readers
SequentialBaseFloatVectorReader train_ivector_reader(train_ivector_rspecifier);
SequentialBaseFloatVectorReader test_ivector_reader(test_ivector_rspecifier);
RandomAccessInt32Reader num_utts_reader(num_utts_rspecifier);
typedef unordered_map<string, Vector<BaseFloat>*, StringHasher> HashType;
// These hashes will contain the iVectors in the PLDA subspace
// (that makes the within-class variance unit and diagonalizes the
// between-class covariance). They will also possibly be length-normalized,
// depending on the config.
HashType train_ivectors, test_ivectors;
// read the enrollment iVectors
KALDI_LOG << "Reading train iVectors";
for (; !train_ivector_reader.Done(); train_ivector_reader.Next()) {
std::string spk = train_ivector_reader.Key();
if (train_ivectors.count(spk) != 0) {
KALDI_ERR << "Duplicate training iVector found for speaker " << spk;
}
const Vector<BaseFloat> &ivector = train_ivector_reader.Value();
int32 num_examples;
if (!num_utts_rspecifier.empty()) {
if (!num_utts_reader.HasKey(spk)) {
KALDI_WARN << "Number of utterances not given for speaker " << spk;
num_train_errs++;
continue;
}
num_examples = num_utts_reader.Value(spk);
} else {
num_examples = 1;
}
Vector<BaseFloat> *transformed_ivector = new Vector<BaseFloat>(dim);
// Transform and length-normalize; the returned normalization factors are
// accumulated so that the average renormalization scale can be logged below.
tot_train_renorm_scale += plda.TransformIvector(plda_config, ivector,
num_examples,
transformed_ivector);
train_ivectors[spk] = transformed_ivector;
num_train_ivectors++;
}
KALDI_LOG << "Read " << num_train_ivectors << " training iVectors, "
<< "errors on " << num_train_errs;
if (num_train_ivectors == 0)
KALDI_ERR << "No training iVectors present.";
KALDI_LOG << "Average renormalization scale on training iVectors was "
<< (tot_train_renorm_scale / num_train_ivectors);
// read the test iVectors
KALDI_LOG << "Reading test iVectors";
for (; !test_ivector_reader.Done(); test_ivector_reader.Next()) {
std::string utt = test_ivector_reader.Key();
if (test_ivectors.count(utt) != 0) {
KALDI_ERR << "Duplicate test iVector found for utterance " << utt;
}
const Vector<BaseFloat> &ivector = test_ivector_reader.Value();
int32 num_examples = 1; // fixed to 1 for test iVectors; only affects the normalization factor
Vector<BaseFloat> *transformed_ivector = new Vector<BaseFloat>(dim);
tot_test_renorm_scale += plda.TransformIvector(plda_config, ivector,
num_examples,
transformed_ivector);
test_ivectors[utt] = transformed_ivector;
num_test_ivectors++;
}
KALDI_LOG << "Read " << num_test_ivectors << " test iVectors.";
if (num_test_ivectors == 0)
KALDI_ERR << "No test iVectors present.";
KALDI_LOG << "Average renormalization scale on test iVectors was "
<< (tot_test_renorm_scale / num_test_ivectors);
// read the trials file
Input ki(trials_rxfilename);
bool binary = false;
Output ko(scores_wxfilename, binary);
double sum = 0.0, sumsq = 0.0;
std::string line;
while (std::getline(ki.Stream(), line)) {
std::vector<std::string> fields;
SplitStringToVector(line, " \t\n\r", true, &fields);
if (fields.size() != 2) {
KALDI_ERR << "Bad line " << (num_trials_done + num_trials_err)
<< "in input (expected two fields: key1 key2): " << line;
}
// key1 comes from the enrollment (train) iVectors, key2 from the test iVectors
std::string key1 = fields[0], key2 = fields[1];
if (train_ivectors.count(key1) == 0) {
KALDI_WARN << "Key " << key1 << " not present in training iVectors.";
num_trials_err++;
continue;
}
if (test_ivectors.count(key2) == 0) {
KALDI_WARN << "Key " << key2 << " not present in test iVectors.";
num_trials_err++;
continue;
}
// look up the pre-transformed iVectors for this trial
const Vector<BaseFloat> *train_ivector = train_ivectors[key1],
*test_ivector = test_ivectors[key2];
Vector<double> train_ivector_dbl(*train_ivector),
test_ivector_dbl(*test_ivector);
int32 num_train_examples;
if (!num_utts_rspecifier.empty()) {
// we already checked that it has this key.
num_train_examples = num_utts_reader.Value(key1);
} else {
num_train_examples = 1;
}
// scoring
BaseFloat score = plda.LogLikelihoodRatio(train_ivector_dbl,
num_train_examples,
test_ivector_dbl);
sum += score;
sumsq += score * score;
num_trials_done++;
ko.Stream() << key1 << ' ' << key2 << ' ' << score << std::endl;
}
// free memory
for (HashType::iterator iter = train_ivectors.begin();
iter != train_ivectors.end(); ++iter)
delete iter->second;
for (HashType::iterator iter = test_ivectors.begin();
iter != test_ivectors.end(); ++iter)
delete iter->second;
// compute the mean and standard deviation of the scores
if (num_trials_done != 0) {
BaseFloat mean = sum / num_trials_done, scatter = sumsq / num_trials_done,
variance = scatter - mean * mean, stddev = sqrt(variance);
KALDI_LOG << "Mean score was " << mean << ", standard deviation was "
<< stddev;
}
KALDI_LOG << "Processed " << num_trials_done << " trials, " << num_trials_err
<< " had errors.";
return (num_trials_done != 0 ? 0 : 1);
} catch(const std::exception &e) {
std::cerr << e.what();
return -1;
}
}
- Note the two length-normalization paths here: simple_length_norm scales the iVector to length sqrt(dim), while the default applies the model-based factor from GetNormalizationFactor().
# ivector-compute-lda
// ivectorbin/ivector-compute-lda.cc
#include "base/kaldi-common.h"
#include "util/common-utils.h"
#include "gmm/am-diag-gmm.h"
#include "ivector/ivector-extractor.h"
#include "util/kaldi-thread.h"
namespace kaldi {
class CovarianceStats {
public:
CovarianceStats(int32 dim): tot_covar_(dim),
between_covar_(dim),
num_spk_(0),
num_utt_(0) { }
/// get total covariance, normalized per number of frames.
void GetTotalCovar(SpMatrix<double> *tot_covar) const {
KALDI_ASSERT(num_utt_ > 0);
*tot_covar = tot_covar_;
tot_covar->Scale(1.0 / num_utt_);
}
void GetWithinCovar(SpMatrix<double> *within_covar) {
KALDI_ASSERT(num_utt_ - num_spk_ > 0);
*within_covar = tot_covar_;
within_covar->AddSp(-1.0, between_covar_);
within_covar->Scale(1.0 / num_utt_);
}
void AccStats(const Matrix<double> &utts_of_this_spk) {
int32 num_utts = utts_of_this_spk.NumRows();
tot_covar_.AddMat2(1.0, utts_of_this_spk, kTrans, 1.0);
Vector<double> spk_average(Dim());
spk_average.AddRowSumMat(1.0 / num_utts, utts_of_this_spk);
between_covar_.AddVec2(num_utts, spk_average);
num_utt_ += num_utts;
num_spk_ += 1;
}
/// Will return Empty() if the within-class covariance matrix would be zero.
bool SingularTotCovar() { return (num_utt_ < Dim()); }
bool Empty() { return (num_utt_ - num_spk_ == 0); }
std::string Info() {
std::ostringstream ostr;
ostr << num_spk_ << " speakers, " << num_utt_ << " utterances. ";
return ostr.str();
}
int32 Dim() { return tot_covar_.NumRows(); }
// Use default constructor and assignment operator.
void AddStats(const CovarianceStats &other) {
tot_covar_.AddSp(1.0, other.tot_covar_);
between_covar_.AddSp(1.0, other.between_covar_);
num_spk_ += other.num_spk_;
num_utt_ += other.num_utt_;
}
private:
KALDI_DISALLOW_COPY_AND_ASSIGN(CovarianceStats);
SpMatrix<double> tot_covar_;
SpMatrix<double> between_covar_;
int32 num_spk_;
int32 num_utt_;
};
template<class Real>
void ComputeNormalizingTransform(const SpMatrix<Real> &covar,
Real floor,
MatrixBase<Real> *proj) {
int32 dim = covar.NumRows();
Matrix<Real> U(dim, dim);
Vector<Real> s(dim);
covar.Eig(&s, &U);
// Sort eigenvalues from largest to smallest.
SortSvd(&s, &U);
// Floor eigenvalues to a small positive value.
int32 num_floored;
floor *= s(0); // Floor relative to the largest eigenvalue
s.ApplyFloor(floor, &num_floored);
if (num_floored > 0) {
KALDI_WARN << "Floored " << num_floored << " eigenvalues of covariance "
<< "to " << floor;
}
// Next two lines computes projection proj, such that
// proj * covar * proj^T = I.
s.ApplyPow(-0.5);
proj->AddDiagVecMat(1.0, s, U, kTrans, 0.0);
}
void ComputeLdaTransform(
const std::map<std::string, Vector<BaseFloat> *> &utt2ivector,
const std::map<std::string, std::vector<std::string> > &spk2utt,
BaseFloat total_covariance_factor,
BaseFloat covariance_floor,
MatrixBase<BaseFloat> *lda_out) {
KALDI_ASSERT(!utt2ivector.empty());
int32 lda_dim = lda_out->NumRows(), dim = lda_out->NumCols();
KALDI_ASSERT(dim == utt2ivector.begin()->second->Dim());
KALDI_ASSERT(lda_dim > 0 && lda_dim <= dim);
CovarianceStats stats(dim);
std::map<std::string, std::vector<std::string> >::const_iterator iter;
for (iter = spk2utt.begin(); iter != spk2utt.end(); ++iter) {
const std::vector<std::string> &uttlist = iter->second;
KALDI_ASSERT(!uttlist.empty());
int32 N = uttlist.size(); // number of utterances.
Matrix<double> utts_of_this_spk(N, dim);
for (int32 n = 0; n < N; n++) {
std::string utt = uttlist[n];
KALDI_ASSERT(utt2ivector.count(utt) != 0);
utts_of_this_spk.Row(n).CopyFromVec(
*(utt2ivector.find(utt)->second));
}
stats.AccStats(utts_of_this_spk);
}
KALDI_LOG << "Stats have " << stats.Info();
KALDI_ASSERT(!stats.Empty());
KALDI_ASSERT(!stats.SingularTotCovar() &&
"Too little data for iVector dimension.");
SpMatrix<double> total_covar;
stats.GetTotalCovar(&total_covar);
SpMatrix<double> within_covar;
stats.GetWithinCovar(&within_covar);
SpMatrix<double> mat_to_normalize(dim);
mat_to_normalize.AddSp(total_covariance_factor, total_covar);
mat_to_normalize.AddSp(1.0 - total_covariance_factor, within_covar);
Matrix<double> T(dim, dim);
ComputeNormalizingTransform(mat_to_normalize,
static_cast<double>(covariance_floor), &T);
SpMatrix<double> between_covar(total_covar);
between_covar.AddSp(-1.0, within_covar);
SpMatrix<double> between_covar_proj(dim);
between_covar_proj.AddMat2Sp(1.0, T, kNoTrans, between_covar, 0.0);
Matrix<double> U(dim, dim);
Vector<double> s(dim);
between_covar_proj.Eig(&s, &U);
bool sort_on_absolute_value = false; // any negative ones will go last (they
// shouldn't exist anyway so doesn't
// really matter)
SortSvd(&s, &U, static_cast<Matrix<double>*>(NULL),
sort_on_absolute_value);
KALDI_LOG << "Singular values of between-class covariance after projecting "
<< "with interpolated [total/within] covariance with a weight of "
<< total_covariance_factor << " on the total covariance, are: " << s;
// U^T is the transform that will diagonalize the between-class covariance.
// U_part is just the part of U that corresponds to the kept dimensions.
SubMatrix<double> U_part(U, 0, dim, 0, lda_dim);
// We first transform by T and then by U_part^T. This means T
// goes on the right.
Matrix<double> temp(lda_dim, dim);
temp.AddMatMat(1.0, U_part, kTrans, T, kNoTrans, 0.0);
lda_out->CopyFromMat(temp);
}
void ComputeAndSubtractMean(
std::map<std::string, Vector<BaseFloat> *> utt2ivector,
Vector<BaseFloat> *mean_out) {
int32 dim = utt2ivector.begin()->second->Dim();
size_t num_ivectors = utt2ivector.size();
Vector<double> mean(dim);
std::map<std::string, Vector<BaseFloat> *>::iterator iter;
for (iter = utt2ivector.begin(); iter != utt2ivector.end(); ++iter)
mean.AddVec(1.0 / num_ivectors, *(iter->second));
mean_out->Resize(dim);
mean_out->CopyFromVec(mean);
for (iter = utt2ivector.begin(); iter != utt2ivector.end(); ++iter)
iter->second->AddVec(-1.0, *mean_out);
}
}
int main(int argc, char *argv[]) {
using namespace kaldi;
typedef kaldi::int32 int32;
try {
const char *usage =
"Compute an LDA matrix for iVector system. Reads in iVectors per utterance,\n"
"and an utt2spk file which it uses to help work out the within-speaker and\n"
"between-speaker covariance matrices. Outputs an LDA projection to a\n"
"specified dimension. By default it will normalize so that the projected\n"
"within-class covariance is unit, but if you set --normalize-total-covariance\n"
"to true, it will normalize the total covariance.\n"
"Note: the transform we produce is actually an affine transform which will\n"
"also set the global mean to zero.\n"
"\n"
"Usage: ivector-compute-lda [options] <ivector-rspecifier> <utt2spk-rspecifier> "
"<lda-matrix-out>\n"
"e.g.: \n"
" ivector-compute-lda ark:ivectors.ark ark:utt2spk lda.mat\n";
ParseOptions po(usage);
int32 lda_dim = 100; // Dimension we reduce to
BaseFloat total_covariance_factor = 0.0,
covariance_floor = 1.0e-06;
bool binary = true;
po.Register("dim", &lda_dim, "Dimension we keep with the LDA transform");
po.Register("total-covariance-factor", &total_covariance_factor,
"If this is 0.0 we normalize to make the within-class covariance "
"unit; if 1.0, the total covariance; if between, we normalize "
"an interpolated matrix.");
po.Register("covariance-floor", &covariance_floor, "Floor the eigenvalues "
"of the interpolated covariance matrix to the product of its "
"largest eigenvalue and this number.");
po.Register("binary", &binary, "Write output in binary mode");
po.Read(argc, argv);
if (po.NumArgs() != 3) {
po.PrintUsage();
exit(1);
}
std::string ivector_rspecifier = po.GetArg(1),
utt2spk_rspecifier = po.GetArg(2),
lda_wxfilename = po.GetArg(3);
KALDI_ASSERT(covariance_floor >= 0.0);
int32 num_done = 0, num_err = 0, dim = 0;
SequentialBaseFloatVectorReader ivector_reader(ivector_rspecifier);
RandomAccessTokenReader utt2spk_reader(utt2spk_rspecifier);
std::map<std::string, Vector<BaseFloat> *> utt2ivector;
std::map<std::string, std::vector<std::string> > spk2utt;
for (; !ivector_reader.Done(); ivector_reader.Next()) {
std::string utt = ivector_reader.Key();
const Vector<BaseFloat> &ivector = ivector_reader.Value();
if (utt2ivector.count(utt) != 0) {
KALDI_WARN << "Duplicate iVector found for utterance " << utt
<< ", ignoring it.";
num_err++;
continue;
}
if (!utt2spk_reader.HasKey(utt)) {
KALDI_WARN << "utt2spk has no entry for utterance " << utt
<< ", skipping it.";
num_err++;
continue;
}
std::string spk = utt2spk_reader.Value(utt);
utt2ivector[utt] = new Vector<BaseFloat>(ivector);
if (dim == 0) {
dim = ivector.Dim();
} else {
KALDI_ASSERT(dim == ivector.Dim() && "iVector dimension mismatch");
}
spk2utt[spk].push_back(utt);
num_done++;
}
KALDI_LOG << "Read " << num_done << " utterances, "
<< num_err << " with errors.";
if (num_done == 0) {
KALDI_ERR << "Did not read any utterances.";
} else {
KALDI_LOG << "Computing within-class covariance.";
}
Vector<BaseFloat> mean;
ComputeAndSubtractMean(utt2ivector, &mean);
KALDI_LOG << "2-norm of iVector mean is " << mean.Norm(2.0);
Matrix<BaseFloat> lda_mat(lda_dim, dim + 1); // linear part plus one extra column for the mean offset
SubMatrix<BaseFloat> linear_part(lda_mat, 0, lda_dim, 0, dim);
ComputeLdaTransform(utt2ivector,
spk2utt,
total_covariance_factor,
covariance_floor,
&linear_part);
Vector<BaseFloat> offset(lda_dim);
offset.AddMatVec(-1.0, linear_part, kNoTrans, mean, 0.0);
lda_mat.CopyColFromVec(offset, dim); // add mean-offset to transform
KALDI_VLOG(2) << "2-norm of transformed iVector mean is "
<< offset.Norm(2.0);
WriteKaldiObject(lda_mat, lda_wxfilename, binary);
KALDI_LOG << "Wrote LDA transform to "
<< PrintableWxfilename(lda_wxfilename);
std::map<std::string, Vector<BaseFloat> *>::iterator iter;
for (iter = utt2ivector.begin(); iter != utt2ivector.end(); ++iter)
delete iter->second;
utt2ivector.clear();
return 0;
} catch(const std::exception &e) {
std::cerr << e.what();
return -1;
}
}
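In equations, ComputeLdaTransform() first builds $T$ with ComputeNormalizingTransform() so that $T \Sigma T^\top = I$ for the interpolated covariance $\Sigma = f\,\Sigma_{\text{tot}} + (1-f)\,\Sigma_{\text{within}}$ (where $f$ is --total-covariance-factor), then eigendecomposes the projected between-class covariance $T \Sigma_b T^\top = U\,\mathrm{diag}(s)\,U^\top$ and keeps the eigenvectors with the largest eigenvalues. The output transform, with the mean offset appended as the last column, is:

$$L = U_{\text{part}}^\top T, \qquad \text{offset} = -L\,\mu$$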