# Domain Distribution Alignment
# Return of Frustratingly Easy Domain Adaptation[2015]
- Introduces CORAL (CORrelation ALignment) for domain adaptation in computer vision
- Discusses how it differs from feature normalization, manifold (subspace) methods, and Maximum Mean Discrepancy (MMD)
- However, these approaches only align the bases of the subspaces, not the distribution of the projected points. They also require expensive subspace projection and hyperparameter selection.【Other unsupervised methods only align the subspace bases and ignore how the projected data are distributed within the subspace】
- An alternative approach would be whitening the target and then re-coloring it with the source covariance. However, as demonstrated in (Harel and Mannor 2011; Fernando et al. 2013) and our experiments, transforming data from source to target space gives better performance. This might be due to the fact that by transforming the source to target space the classifier was trained using both the label information from the source and the unlabelled structure from the target.【Explains why the source is transformed into the target space rather than the target into the source space】
- For a linear classifier $f_{\vec{w}}(I)=\vec{w}^{T} \phi(I)$, we can apply an equivalent transformation to the parameter vector $\vec{w}$ instead of the features $u$. This results in added efficiency when the number of classifiers is small but the number and dimensionality of target examples is very high.【The feature transformation can be moved onto the classifier parameters instead, which pays off when there are few classifiers but many high-dimensional examples】
- Relationship to Feature Normalization: In this example, although the features are normalized to have zero mean and unit variance in each dimension, the differences in correlations present in the source and target domains cause the distributions to be different.
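- CORAL avoids subspace projection, which can be costly and requires selecting the hyper-parameter that controls the dimensionality of the subspace.【Why CORAL is preferable to low-dimensional subspace/manifold methods】
- Intuitively, symmetric transformations find a space that “ignores” the differences between the source and target domain while asymmetric transformations try to “bridge” the two domains.【A symmetric source/target transformation effectively suppresses the differences between domains, while an asymmetric one connects the two domains. MMD-based methods use symmetric transformations; CORAL, which transforms only the source, is asymmetric】
- In neural networks, the features of every layer can suffer from domain shift. Batch normalization standardizes each layer's features dimension by dimension, but it does not align the correlations between dimensions, which can still differ between the two domains (the same issue as with feature normalization above). CORAL can therefore also be used in neural networks.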
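The CORAL transform itself is only a whitening followed by a re-coloring, so it fits in a few lines. Below is a minimal NumPy sketch, assuming rows are samples and that both covariances are regularized by `eps * I` (`eps=1` mirrors the identity regularizer in the paper's pseudocode; the helper names are hypothetical). The last lines check the linear-classifier remark: scoring the transformed features with $\vec{w}$ equals scoring the raw features with the transformed weights $A\vec{w}$.

```python
import numpy as np


def sym_matrix_power(m: np.ndarray, p: float) -> np.ndarray:
    """Raise a symmetric positive-definite matrix to the power p via eigendecomposition."""
    vals, vecs = np.linalg.eigh(m)
    return (vecs * vals**p) @ vecs.T


def coral_matrix(source: np.ndarray, target: np.ndarray, eps: float = 1.0) -> np.ndarray:
    """Return A such that source @ A carries the (regularized) target covariance.

    Rows are samples, columns are feature dimensions; eps * I regularizes both covariances.
    """
    d = source.shape[1]
    cov_s = np.cov(source, rowvar=False) + eps * np.eye(d)
    cov_t = np.cov(target, rowvar=False) + eps * np.eye(d)
    # Whiten with C_S^{-1/2}, then re-color with C_T^{1/2}.
    return sym_matrix_power(cov_s, -0.5) @ sym_matrix_power(cov_t, 0.5)


# Synthetic source/target features (hypothetical shapes and mixing matrices).
rng = np.random.default_rng(0)
src = rng.normal(size=(200, 5)) @ (rng.normal(size=(5, 5)) + 3 * np.eye(5))
tgt = rng.normal(size=(300, 5)) @ (rng.normal(size=(5, 5)) + 3 * np.eye(5))

A = coral_matrix(src, tgt, eps=0.0)  # eps=0 so the covariances match exactly
src_aligned = src @ A

# Linear classifier: transforming the features or the weight vector gives the same scores.
w = rng.normal(size=5)
scores_from_features = src_aligned @ w
scores_from_weights = src @ (A @ w)
```

Note that only second-order statistics are matched here; the sketch, like the transform described above, does not shift the source mean.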
# Deep CORAL: Correlation Alignment for Deep Domain Adaptation[2016]
- However, it relies on a linear transformation and is not end-to-end trainable: it needs to first extract features, apply the transformation, and then train an SVM classifier in a separate step.【Drawbacks of the original CORAL】
- In this work, we extend CORAL to incorporate it directly into deep networks by constructing a differentiable loss function that minimizes the difference between source and target correlations–the CORAL loss.【Replaces the original linear transformation with a nonlinear one learned end to end】
- Our proposed Deep CORAL approach is similar to DDC, DAN, and ReverseGrad in the sense that a new loss (CORAL loss) is added to minimize the difference in learned feature covariances across domains, which is similar to minimizing MMD with a polynomial kernel. 【Analogous to minimizing MMD; here the difference between feature covariances is minimized】
- However, it is more powerful than DDC (which aligns sample means only), much simpler to optimize than DAN and ReverseGrad, and can be integrated into different layers or architectures seamlessly.【DDC's linear-kernel MMD aligns only first-order statistics (means); CORAL aligns second-order statistics (covariances)】
- As mentioned before, the final deep features need to be both discriminative enough to train a strong classifier and invariant to the difference between source and target domains.【The features must support accurate classification and also be robust to the domain shift】
- DLID [1] trains a joint source and target CNN architecture with two adaptation layers. DDC [23] applies a single linear kernel to one layer to minimize Maximum Mean Discrepancy (MMD) while DAN [13] minimizes MMD with multiple kernels applied to multiple layers. ReverseGrad [5] and Domain Confusion [22] add a binary classifier to explicitly confuse the two domains.【Earlier deep domain adaptation methods】
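The CORAL loss can be written as $\ell_{CORAL}=\frac{1}{4d^{2}}\lVert C_S-C_T\rVert_F^{2}$, where $d$ is the feature dimension and each covariance is computed from a batch $D$ of $n$ feature rows as $C=\frac{1}{n-1}\left(D^{T}D-\frac{1}{n}(\mathbf{1}^{T}D)^{T}(\mathbf{1}^{T}D)\right)$. The NumPy sketch below only evaluates the loss value (a real Deep CORAL layer would compute the same expression in an autodiff framework so it can be backpropagated). The usage builds two batches with identical means but different covariances, illustrating the first-order vs. second-order remark above: a mean-only criterion sees no gap, the CORAL loss does.

```python
import numpy as np


def feature_covariance(features: np.ndarray) -> np.ndarray:
    """Unbiased covariance of an (n, d) feature batch, written as in the CORAL-loss formula."""
    n = features.shape[0]
    col_sums = np.ones((1, n)) @ features  # 1^T D, shape (1, d)
    return (features.T @ features - col_sums.T @ col_sums / n) / (n - 1)


def coral_loss(source: np.ndarray, target: np.ndarray) -> float:
    """Squared Frobenius distance between the two covariances, scaled by 1 / (4 d^2)."""
    d = source.shape[1]
    diff = feature_covariance(source) - feature_covariance(target)
    return float(np.sum(diff**2) / (4 * d * d))


# Two synthetic batches with identical (zero) means but different covariances.
rng = np.random.default_rng(0)
source = rng.normal(size=(500, 4))
source -= source.mean(axis=0)
target = source * np.array([1.0, 3.0, 1.0, 1.0])  # stretch one dimension only

mean_gap = float(np.sum((source.mean(axis=0) - target.mean(axis=0)) ** 2))
loss = coral_loss(source, target)
```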
# Discriminative Feature Alignment: Improving Transferability of Unsupervised Domain Adaptation by Gaussian-guided Latent Alignment[2020]
# Domain-Invariant Features
# 🛠️Unsupervised Domain Adaptation by Backpropagation[2014]
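Gradient reversal layer (GRL): acts as the identity during forward propagation, and during backpropagation reverses the gradient by multiplying it by a negative scalar ($-\lambda$).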
$$\begin{gathered} E\left(\theta_{f}, \theta_{y}, \theta_{d}\right)=\sum_{\substack{i=1..N \\ d_{i}=0}} L_{y}\left(G_{y}\left(G_{f}\left(\mathbf{x}_{i} ; \theta_{f}\right) ; \theta_{y}\right), y_{i}\right)-\lambda \sum_{i=1..N} L_{d}\left(G_{d}\left(G_{f}\left(\mathbf{x}_{i} ; \theta_{f}\right) ; \theta_{d}\right), d_{i}\right) \\ =\sum_{\substack{i=1..N \\ d_{i}=0}} L_{y}^{i}\left(\theta_{f}, \theta_{y}\right)-\lambda \sum_{i=1..N} L_{d}^{i}\left(\theta_{f}, \theta_{d}\right) \end{gathered}$$
where $d_i \in \{0, 1\}$ is the domain label ($d_i=0$ for the labeled source samples), so the label-prediction loss covers only source samples while the domain-classification loss covers all $N$ samples.
$$\begin{gathered} \left(\hat{\theta}_{f}, \hat{\theta}_{y}\right)=\arg \min_{\theta_{f}, \theta_{y}} E\left(\theta_{f}, \theta_{y}, \hat{\theta}_{d}\right) \\ \hat{\theta}_{d}=\arg \max_{\theta_{d}} E\left(\hat{\theta}_{f}, \hat{\theta}_{y}, \theta_{d}\right) \end{gathered}$$
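A plain SGD step toward this saddle point updates the three parameter groups as follows, with $\mu$ the learning rate (this follows the paper's update rules; the $L_y^i$ terms only contribute for labeled source samples). The $-\lambda$ factor on the feature extractor's domain gradient in the first line is exactly what the GRL lets ordinary backpropagation produce:

$$\begin{aligned} \theta_{f} &\leftarrow \theta_{f}-\mu\left(\frac{\partial L_{y}^{i}}{\partial \theta_{f}}-\lambda \frac{\partial L_{d}^{i}}{\partial \theta_{f}}\right) \\ \theta_{y} &\leftarrow \theta_{y}-\mu \frac{\partial L_{y}^{i}}{\partial \theta_{y}} \\ \theta_{d} &\leftarrow \theta_{d}-\mu \frac{\partial L_{d}^{i}}{\partial \theta_{d}} \end{aligned}$$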
GRL implementation: subclass `torch.autograd.Function` to define a custom operation together with its own backward (gradient) function; see the CSDN blog post by tsq292978891.
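To make the forward/backward contract concrete without a framework dependency, here is a minimal NumPy sketch of a GRL inside a toy network. Assumptions: a linear feature extractor `W`, a linear domain head `v`, and a squared-error domain loss stand in for the real $G_f$, $G_d$, and $L_d$ (all hypothetical). The check at the end shows that the gradient reaching `W` is the ordinary one multiplied by $-\lambda$, while the domain head's own gradient is untouched.

```python
import numpy as np


class GradientReversal:
    """Framework-free sketch of a gradient reversal layer."""

    def __init__(self, lambd: float) -> None:
        self.lambd = lambd

    def forward(self, x: np.ndarray) -> np.ndarray:
        return x  # identity in the forward pass

    def backward(self, grad_output: np.ndarray) -> np.ndarray:
        return -self.lambd * grad_output  # flip and scale the incoming gradient


# Toy setup: feature extractor W -> GRL -> domain head v, loss 0.5 * (logit - d)^2.
rng = np.random.default_rng(0)
x = rng.normal(size=3)
W = rng.normal(size=(2, 3))
v = rng.normal(size=2)
domain_label = 1.0
grl = GradientReversal(lambd=0.5)

features = W @ x
logit = v @ grl.forward(features)
dloss_dlogit = logit - domain_label  # derivative of 0.5 * (logit - d)^2

grad_v = dloss_dlogit * features  # the domain head still receives the ordinary gradient
grad_features = grl.backward(dloss_dlogit * v)  # reversed before entering the extractor
grad_W_with_grl = np.outer(grad_features, x)
grad_W_without_grl = np.outer(dloss_dlogit * v, x)
```

In PyTorch the same contract is usually written as a `torch.autograd.Function` subclass whose static `forward` returns `x.view_as(x)` and whose static `backward` returns `grad_output.neg() * lambda` (plus `None` for the non-tensor `lambda` argument).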
# Deep Domain Confusion: Maximizing for Domain Invariance[2014]
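Adaptation layer + domain confusion loss: the domain confusion loss is based on maximum mean discrepancy (MMD).
Domain confusion can be used to choose the dimension of the adaptation layer, and also to choose an effective position for a new adaptation layer in a pretrained CNN architecture, which is then fine-tuned.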
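$$\mathcal{L}=\mathcal{L}_{C}\left(X_{L}, y\right)+\lambda \operatorname{MMD}^{2}\left(X_{S}, X_{T}\right)$$
where $\mathcal{L}_{C}\left(X_{L}, y\right)$ is the classification loss on the labeled data $X_L$ with labels $y$, $\lambda$ controls how strongly the domains are confused, and $\operatorname{MMD}\left(X_{S}, X_{T}\right)=\left\lVert \frac{1}{|X_S|}\sum_{x_s \in X_S}\phi(x_s)-\frac{1}{|X_T|}\sum_{x_t \in X_T}\phi(x_t)\right\rVert$ is measured on the adaptation-layer representation $\phi$.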
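With a linear kernel, $\operatorname{MMD}^2$ reduces to the squared distance between the mean source and mean target representations. A minimal NumPy sketch of the objective, assuming the adaptation-layer features are already computed and the classification loss arrives as a precomputed number (`ddc_objective`, the shapes, and the values `0.7` and `lam=0.25` are hypothetical):

```python
import numpy as np


def mmd2_linear(source_feats: np.ndarray, target_feats: np.ndarray) -> float:
    """Squared MMD with a linear kernel: squared distance between the feature means."""
    gap = source_feats.mean(axis=0) - target_feats.mean(axis=0)
    return float(gap @ gap)


def ddc_objective(classification_loss: float, source_feats: np.ndarray,
                  target_feats: np.ndarray, lam: float) -> float:
    """L = L_C + lambda * MMD^2, with MMD measured on adaptation-layer features."""
    return classification_loss + lam * mmd2_linear(source_feats, target_feats)


rng = np.random.default_rng(0)
source_feats = rng.normal(size=(100, 8))  # hypothetical adaptation-layer outputs
target_feats = source_feats + 2.0         # same spread, mean shifted by 2 in every dimension
mmd2 = mmd2_linear(source_feats, target_feats)  # 8 dimensions * 2^2 = 32
total = ddc_objective(0.7, source_feats, target_feats, lam=0.25)
```

Because only means enter this quantity, two batches with equal means but different covariances would score zero, which is the limitation the Deep CORAL notes above point out.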
# Simultaneous Deep Transfer Across Domains and Tasks[2015]
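- The paper simultaneously optimizes for domain invariance, to facilitate domain transfer, and uses a soft label distribution matching loss to transfer information between tasks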
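Both losses can be sketched in NumPy, assuming the following formulation: the soft label of class $k$ is the average temperature-$\tau$ softmax over source examples of class $k$; each labeled target example is pushed, via cross-entropy, toward the soft label of its class; and domain confusion is the cross-entropy between the domain classifier's output and a uniform distribution over domains. The function names, $\tau=2$, and the random logits are hypothetical, and only the loss values are computed (no training loop or alternating domain-classifier update).

```python
import numpy as np


def softmax(logits: np.ndarray, tau: float = 1.0) -> np.ndarray:
    """Row-wise softmax at temperature tau (shifted by the row max for stability)."""
    z = logits / tau
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def class_soft_labels(source_logits, source_labels, num_classes, tau):
    """Per-class soft label: mean high-temperature softmax over source examples of that class."""
    probs = softmax(source_logits, tau)
    return np.stack([probs[source_labels == k].mean(axis=0) for k in range(num_classes)])


def soft_label_loss(target_logits, target_labels, soft_labels, tau):
    """Cross-entropy pulling each labeled target prediction toward its class's soft label."""
    p = softmax(target_logits, tau)
    return float(-(soft_labels[target_labels] * np.log(p)).sum(axis=1).mean())


def domain_confusion_loss(domain_probs):
    """Cross-entropy between predicted domain probabilities and a uniform target."""
    return float(-np.log(domain_probs).mean(axis=1).mean())


rng = np.random.default_rng(0)
num_classes, tau = 3, 2.0
source_logits = rng.normal(size=(30, num_classes))
source_labels = np.arange(30) % num_classes
soft = class_soft_labels(source_logits, source_labels, num_classes, tau)

target_logits = rng.normal(size=(4, num_classes))
target_labels = np.array([0, 1, 2, 0])
l_soft = soft_label_loss(target_logits, target_labels, soft, tau)
l_conf = domain_confusion_loss(np.full((5, 2), 0.5))  # maximally confused: log 2
```

The soft labels carry inter-class similarity (e.g. how often one class is confused with another), which is the information a one-hot label cannot transfer.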
# Adversarial Discriminative Domain Adaptation[2017]
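Adversarial Discriminative Domain Adaption reading notes (sinat_29381299's CSDN blog)
Adversarial Discriminative Domain Adaptation reading notes - Jianshu (jianshu.com)
The proposed ADDA combines discriminative modeling, untied weight sharing (symmetric/asymmetric mappings), and a GAN loss.
The source domain is labeled, so the source mapping can be learned with a supervised loss; the target domain has no labels, so the target mapping has to be parameterized and learned without supervision.
Usually the target mapping has the same architecture as the source mapping, but methods impose various constraints on the mappings so that, after mapping, the distance between source and target is minimized while the target remains discriminable.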
$$\begin{aligned} \mathcal{L}_{\mathrm{adv}_{D}}\left(\mathbf{X}_{s}, \mathbf{X}_{t}, M_{s}, M_{t}\right)= & -\mathbb{E}_{\mathbf{x}_{s} \sim \mathbf{X}_{s}}\left[\log D\left(M_{s}\left(\mathbf{x}_{s}\right)\right)\right] \\ & -\mathbb{E}_{\mathbf{x}_{t} \sim \mathbf{X}_{t}}\left[\log \left(1-D\left(M_{t}\left(\mathbf{x}_{t}\right)\right)\right)\right] \end{aligned}$$
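Mapping loss adopted by ADDA (the standard GAN loss with inverted labels), optimized over the target mapping $M_t$:
$$\mathcal{L}_{\mathrm{adv}_{M}}\left(\mathbf{X}_{s}, \mathbf{X}_{t}, D\right)=-\mathbb{E}_{\mathbf{x}_{t} \sim \mathbf{X}_{t}}\left[\log D\left(M_{t}\left(\mathbf{x}_{t}\right)\right)\right]$$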
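The two adversarial losses can be evaluated directly from the discriminator's outputs. A minimal NumPy sketch, assuming `d_src` and `d_tgt` hold $D$'s probabilities that a feature came from the source, i.e. $D(M_s(\mathbf{x}_s))$ and $D(M_t(\mathbf{x}_t))$ (the names, batch size, and values are hypothetical). In ADDA's adaptation stage $M_s$ stays fixed while $D$ minimizes $\mathcal{L}_{\mathrm{adv}_D}$ and $M_t$ minimizes $\mathcal{L}_{\mathrm{adv}_M}$ in alternation; only the loss values are computed here.

```python
import numpy as np


def adv_discriminator_loss(d_src: np.ndarray, d_tgt: np.ndarray, eps: float = 1e-12) -> float:
    """L_advD: D should output 1 on source features and 0 on target features."""
    return float(-np.mean(np.log(np.clip(d_src, eps, 1.0)))
                 - np.mean(np.log(np.clip(1.0 - d_tgt, eps, 1.0))))


def adv_mapping_loss(d_tgt: np.ndarray, eps: float = 1e-12) -> float:
    """L_advM with inverted labels: M_t is trained so that D outputs 1 on target features."""
    return float(-np.mean(np.log(np.clip(d_tgt, eps, 1.0))))


undecided = np.full(8, 0.5)  # a discriminator that cannot tell the domains apart
loss_d = adv_discriminator_loss(undecided, undecided)  # 2 * log 2
loss_m = adv_mapping_loss(undecided)                   # log 2
loss_m_fooled = adv_mapping_loss(np.full(8, 0.99))     # D mistakes target for source
```

Using inverted labels rather than directly maximizing $\mathcal{L}_{\mathrm{adv}_D}$ gives the target mapping stronger gradients early on, when the discriminator easily rejects target features.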