ATL(Angular Triplet Loss)

Theory

basic theory

motivation

这个loss虽然叫做angular triplet loss，但实际上是从sphereface的想法上发展出来的。

在说动机之前，先简单的分析一下sphereface。

首先是下面这张图，这是sphereface的核心思想之一：angular（论文中）：

可以看到，origin softmax是没有将weights（分类权重）归一化的，这个我在之前的project(loss functions)中也分析过了，没有将weights归一化，那就永远属于欧式距离的范围。所以，只有将weights归一化，才能称之为cosine距离。

然后是这张图，sphereface的核心思想之二：margin（论文中）：

为什么需要这个m?就是希望它能够是样本满足 max intra-class angle < min inter-class angle。为了满足这个条件，作者推导出m=4,但是这个条件实在是太强了，强到违背了分布的规律，所以在代码中不得不加入lambda来进行调整。

在这么强的约束下，肯定是有一些样本是无法满足margin分类条件的，如果这些样本无法正确分类，然后这些样本的feature norm又很大的话，最终的loss自然会很大，所以这些样本的feature norm就会变小，这点在project(FaceRecognitionSystem)的norm select中分析过,如下图：

显然，对于上图中0038_02来说，确实很难分辨这个人是谁，m=4的要求对于他来说肯定是约束过强了。有没有一种方法能够在满足max intra-class angle < min inter-class angle的情况下同时又能自动的选择每个样本对于的margin呢？

triplet的apn理论非常的适合。如果我们normalize p,n；只更新a。这便既满足了angular又满足了triplet特性，便是angular triplet loss。

Implementation

triplet的随机选取apn的实现有些过于麻烦了，所以我直接基于sphereface的marginInnerProdect实现了一个Easy Triplet。

softmax是这样的，它要维护一个类中心（即分类层的权重），然后其他的样本去做分类。这个类中心在迭代的过程中不断更新，最终保持一种稳定的状态。

要命的是，这个类中心并不一定真的就是当类所有样本的中心位置。

所以在Easy Triplet里面，我舍弃了类中心的更新过程，直接把传进来的样本当成是新的类中心来做分类。具体实现可以参考代码。

同时，为了让收敛更快，我同样在代码里面实现了easy triplet的semihard example选择机制。

Result

Usage

caffe_layer

1.put marginInnerProduct.hpp/.cpp/.cu into caffe dir.

2.add

message LayerParameter {
...
optional MarginInnerProductParameter margin_inner_product_param = 147;
...
}

message MarginInnerProductParameter {
  optional uint32 num_output = 1; // The number of outputs for the layer
  enum MarginType {
    SINGLE = 0;
    DOUBLE = 1;
    TRIPLE = 2;
    QUADRUPLE = 3;
  }
  optional MarginType type = 2 [default = SINGLE]; 
  optional FillerParameter weight_filler = 3; // The filler for the weight

  // The first axis to be lumped into a single inner product computation;
  // all preceding axes are retained in the output.
  // May be negative to index from the end (e.g., -1 for the last axis).
  optional int32 axis = 4 [default = 1];
  optional float base = 5 [default = 1];
  optional float gamma = 6 [default = 0];
  optional float power = 7 [default = 1];
  optional int32 iteration = 8 [default = 0];
  optional float lambda_min = 9 [default = 0];
  optional bool triplet = 10 [default = false];
  optional bool semihard = 11 [default = false];
}

in your caffe.proto

alignment

use RSA for landmark detection && AlignWuXiang with ec-my-y = 40 and size = [112,112]

deploy

resface 28 layer(see in proto/face_deploy.prototxt)

train

method 1:step

step 1:

train a pre-trained model use sphereface.

step 2:

set triplet & semihard param to true.

method 2:directly train

set triplet & semihard param to true.

train with solver(see in proto/face_solver.prototxt)

trainning log will be upload soon

In my experiment,method 2 converge faster than original sphereface,so maybe there is no need for you to try different m and lambda_min if you want to get a good result.I'm still focus on this method nowadays.

final state

notice that original triplet loss also use Margin.So for small training set(like casia webface),you may set m=2 and lambda_min = 3 in Angular Triplet Loss.

Reference

1.FaceNet: A Unified Embedding for Face Recognition and Clustering

2.SphereFace: Deep Hypersphere Embedding for Face Recognition

Thanks

@happynear @deepinsight

Contributor

@KaleidoZhouYN

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
form		form
layer		layer
proto		proto
readme.md		readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ATL(Angular Triplet Loss)

Theory

basic theory

motivation

Implementation

Result

Usage

caffe_layer

alignment

deploy

train

method 1:step

method 2:directly train

final state

Reference

Thanks

Contributor

About

Releases

Packages

Languages

KaleidoZhouYN/Angular-Triplet-Loss

Folders and files

Latest commit

History

Repository files navigation

ATL(Angular Triplet Loss)

Theory

basic theory

motivation

Implementation

Result

Usage

caffe_layer

alignment

deploy

train

method 1:step

method 2:directly train

final state

Reference

Thanks

Contributor

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages