add the LinearChainCrf operator. #5084
Conversation
paddle/framework/tensor_impl.h
Outdated
 PADDLE_ENFORCE_LT(begin_idx, end_idx,
-                  "Begin index must be less than end index.");
+                  "The start row index must be less than the end row index.");
s/less/smaller/g
Done.
// Right now, we just bet that sum won't be zero. If this really happens, we
// will figure out what should be done then.
PADDLE_ENFORCE(sum,
               "The unnormalized probabilities of all possible unfinished "
These are not probabilities. Normalization is performed for numerical stability.
The sum is used to normalize one row of alpha to avoid overflow or underflow, but each row of Alpha (in the forward computation) and Beta (in the backward computation) is still meaningful (alpha dotmul beta is also meaningful).
Beta is calculated in "reverse order" compared with Alpha (not mathematically exact). They both capture unnormalized probabilities of unfinished sequences, and the accumulated sums form the normalization factor.
I tried to make the warning message meaningful.
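The normalization scheme described in this reply can be illustrated with a small NumPy sketch (a hypothetical stand-alone model, not Paddle's kernel code): each row of alpha is divided by its sum to keep values in range, and the discarded sums accumulate into the log normalization factor.

```python
import numpy as np

def forward_alpha(x_exp, w_exp):
    """Forward recursion with per-row L1 normalization (hypothetical sketch).

    x_exp: exponentiated emission scores, shape [seq_len, tag_num]
    w_exp: exponentiated transition scores, shape [tag_num, tag_num]
    Returns the row-normalized alpha table and the log normalization factor.
    """
    seq_len, tag_num = x_exp.shape
    alpha = np.empty((seq_len, tag_num))
    log_z = 0.0
    alpha[0] = x_exp[0]
    for k in range(seq_len):
        if k > 0:
            # alpha[k, i] = sum_j alpha[k-1, j] * w_exp[j, i] * x_exp[k, i]
            alpha[k] = (alpha[k - 1] @ w_exp) * x_exp[k]
        s = alpha[k].sum()
        assert s > 0, "row sum must be positive"
        alpha[k] /= s          # keep each row in range: avoids under/overflow
        log_z += np.log(s)     # the discarded scales accumulate into log Z
    return alpha, log_z
```

With every row normalized, the product of the per-row sums is exactly the partition function, so the rows stay bounded while nothing meaningful is lost.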
PADDLE_ENFORCE(sum,
               "The unnormalized probabilities of all possible unfinished "
               "sequences must be greater than 0.");
for (size_t i = 0; i < len; ++i) x[i] /= sum;
I think we should retain Wei Xu's code here, because multiplication takes fewer CPU cycles than division.
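The idiom this comment is defending can be sketched as follows (a hypothetical helper, not Paddle's actual NormalizeL1): perform the division once to obtain the reciprocal, then multiply every element by it.

```python
def normalize_l1(x):
    """Divide once, multiply many times: a multiply is cheaper per-element
    than a divide, which is the point being made above."""
    s = sum(x)
    assert s != 0.0, "the sum must be non-zero"
    inv = 1.0 / s                    # the single division
    return [v * inv for v in x]      # only multiplications in the loop
```

The trade-off is a possible last-bit rounding difference versus dividing each element directly, which is usually acceptable for this kind of normalization.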
"batch. A two dimensional tensor with shape [N x D], "
"denoted as \f$\alpha\f$. \f$\alpha\f$ is a memo table used to "
"calculate the normalization factor in CRF. \f$\alpha[k, v]\f$ stores "
"the unnormalized probabilities of all possible unfinished sequences of "
I think "unnormalized probabilities" is not a proper interpretation.
I am guessing you are referring to this paper: http://cseweb.ucsd.edu/~elkan/250Bwinter2012/loglinearCRFs.pdf
It would be valuable to cite this in the comments for reference.
You are quite right. It is this paper. I will add it. Thank you very much for all your comments.
Done. I added this to the operator's comments.
"tensor with shape [S x 1], where S is the sequence number in a "
"mini-batch. "
"Note: S is equal to the sequence number in a mini-batch. The "
"output "
Merge lines 91 and 92 into a single line.
Done.
  }
  beta_value[k * tag_num + i] = sum;
}
NormalizeL1<T>(beta_value + k * tag_num, tag_num);
Restore comment "// normalizeL1 is to avoid underflow or overflow at (**)"
Done.
auto alpha_mat = EigenMatrix<T>::From(*alpha);
auto beta_mat = EigenMatrix<T>::From(*beta);
auto x_grad_mat = EigenMatrix<T>::From(*emission_grad);
auto* place = ctx.GetEigenDevice<platform::CPUPlace>();
Change auto* to auto.
Using auto* is suggested by our style guide.
auto row_sum = prob.sum(Eigen::DSizes<int, 1>(1))
                   .reshape(Eigen::DSizes<int, 2>(seq_length, 1))
                   .broadcast(Eigen::DSizes<int, 2>(1, tag_num));
x_grad_mat.device(*place) = prob / row_sum;
Change *place to place.
auto* is suggested by our style guide, so I will not change this.
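The Eigen chain quoted above (sum over axis 1, reshape to a column, broadcast across tags, then element-wise divide) can be mirrored in NumPy, whose broadcasting rules make the explicit reshape-and-broadcast unnecessary. Here prob is a hypothetical stand-in for the alpha-beta product with shape [seq_length, tag_num]:

```python
import numpy as np

# Hypothetical stand-in for the [seq_length x tag_num] probability table.
prob = np.array([[1.0, 3.0],
                 [2.0, 2.0]])

# Equivalent of prob.sum(1).reshape(seq_length, 1).broadcast(1, tag_num):
row_sum = prob.sum(axis=1, keepdims=True)   # shape [seq_length, 1]

# Element-wise divide; NumPy broadcasts the column across every tag.
x_grad = prob / row_sum                     # each row now sums to 1
```

The broadcast in the Eigen version plays the same role as NumPy's implicit broadcasting: it stretches the per-row sum across the tag dimension before the division.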
# the linear_chain_crf operator only supports sequence (LoD level = 1)
lod = [[0]]
for i in range(SEQ_NUM):
    lod[-1].append(lod[-1][-1] + random.randint(1, MAX_SEQ_LEN))
Sequences of length 1, 2, and 3 must be included in the test; these are high-risk boundary conditions.
I have tested all these cases offline. Actually, I agree the unit test needs some fixed boundary cases. I will fix this.
Thank you. I will fix the unit test in the next PR, because the unit test framework in the latest develop branch seems to have a problem.
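A sketch of how the LoD construction quoted above could include the fixed boundary lengths 1, 2, and 3 alongside random ones (SEQ_NUM and MAX_SEQ_LEN are assumed test constants here, not the PR's actual values):

```python
import random

SEQ_NUM = 4       # assumed number of extra random sequences
MAX_SEQ_LEN = 5   # assumed maximum random sequence length

# High-risk boundary lengths first, then random lengths.
lengths = [1, 2, 3] + [random.randint(1, MAX_SEQ_LEN) for _ in range(SEQ_NUM)]

# Level-1 LoD: cumulative offsets, so sequence i spans
# rows [lod[-1][i], lod[-1][i + 1]) of the batch.
lod = [[0]]
for n in lengths:
    lod[-1].append(lod[-1][-1] + n)
```

Mixing fixed boundary cases with random lengths keeps the test deterministic for the risky cases while still exercising varied input shapes.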
from op_test import OpTest

class LinearChainCrfForward(object):
A test case of LinearChainCrfForward() should also be included to make sure the implementation is correct. Furthermore, it is also useful for detecting unexpected errors introduced by future modifications. Reference: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/gserver/tests/test_CRFLayerGrad.cpp#L56
The forward computation has already been added to the unit test. I will follow how the original code does the check later.
Followed the comments.
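In the spirit of the referenced test_CRFLayerGrad.cpp, the forward result can be validated by comparing the dynamic-programming partition function against brute-force enumeration over all tag sequences on a tiny input. This is a hypothetical sketch, not the PR's actual test code:

```python
import itertools
import math

def brute_force_log_z(x, w):
    """log Z by enumerating every tag sequence.
    x: emission scores [T][D]; w: transition scores [D][D] (unexponentiated)."""
    T, D = len(x), len(x[0])
    scores = []
    for tags in itertools.product(range(D), repeat=T):
        s = sum(x[t][tags[t]] for t in range(T))
        s += sum(w[tags[t - 1]][tags[t]] for t in range(1, T))
        scores.append(s)
    m = max(scores)  # log-sum-exp with the max factored out for stability
    return m + math.log(sum(math.exp(s - m) for s in scores))

def dp_log_z(x, w):
    """log Z by the forward algorithm with per-step L1 normalization."""
    T, D = len(x), len(x[0])
    alpha = [math.exp(v) for v in x[0]]
    log_z = 0.0
    for t in range(1, T):
        new = [sum(alpha[j] * math.exp(w[j][i]) for j in range(D)) *
               math.exp(x[t][i]) for i in range(D)]
        s = sum(new)
        alpha = [v / s for v in new]   # normalize to avoid under/overflow
        log_z += math.log(s)
    return log_z + math.log(sum(alpha))
```

Brute force is exponential in the sequence length, so it only works for tiny T and D, but on those it pins down the DP result exactly.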
Linear chain CRF is a special case of CRF that is useful for sequence labeling
task. Sequence labeling tasks do not assume a lot of conditional
independences among inputs. They only concern about the input and the output
being linear sequences. Thus, the graph model of CRF is a simple chain or
Done.
    : public framework::OpKernel<T> {
 public:
  void Compute(const framework::ExecutionContext& ctx) const override {
    PADDLE_ENFORCE(platform::is_cpu_place(ctx.GetPlace()),
Yes, I am working on this. I will make this operator run on GPU by simply copying the inputs to GPU.
// Beta is the memo table used in dynamic programming to calculate the
// backward vectors. For a backward vector i (the i-th row of beta), it
// captures the unnormalized probabilities of partial sequences starting at
As explained above, each element in alpha and beta, and also alpha dotmul beta, is meaningful. I tried to make the comments meaningful.
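The backward recursion this thread discusses can be sketched in NumPy (a hypothetical stand-alone model, not Paddle's kernel): beta[k, i] is proportional to the total unnormalized score of all partial tag sequences starting at position k with tag i, and each row is L1-normalized just like the forward pass.

```python
import numpy as np

def backward_beta(x_exp, w_exp):
    """Backward (beta) recursion with per-row L1 normalization.

    x_exp: exponentiated emission scores, shape [seq_len, tag_num]
    w_exp: exponentiated transition scores, shape [tag_num, tag_num]
    """
    seq_len, tag_num = x_exp.shape
    beta = np.empty((seq_len, tag_num))
    beta[-1] = 1.0 / tag_num            # all-ones row, already normalized
    for k in range(seq_len - 2, -1, -1):
        # beta[k, i] = sum_j w_exp[i, j] * x_exp[k + 1, j] * beta[k + 1, j]
        beta[k] = w_exp @ (x_exp[k + 1] * beta[k + 1])
        beta[k] /= beta[k].sum()        # mirrors NormalizeL1 in the kernel
    return beta
```

Because both alpha and beta are only scaled per row, their element-wise product still gives the correct marginals once re-normalized, which is exactly the point made in the reply above.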
for (size_t i = 0; i < tag_num; ++i) {
  T sum = 0.;
  for (size_t j = 0; j < tag_num; ++j) {
    sum += w_exps[(i + state_trans_base_idx) * tag_num + j] *
Done.
// resized to its correct size in the function Compute.
ctx->SetOutputDim("LogLikelihood", {emission_dims[0], 1});

ctx->ShareLoD("Emission", /*->*/ "EmissionExps");
If the output is not a LoDTensor, this line can be removed.
PADDLE_ENFORCE(ctx->HasOutput(framework::GradVarName("Emission")),
               "Output(Emission@GRAD) should be not null.");
PADDLE_ENFORCE(ctx->HasOutput(framework::GradVarName("Transition")),
               "Output(Transition@GRAD) should be not null.");
Remove these two checks. The input gradients are not calculated at all times; just as in the old framework, sometimes one branch of the network does not need to do the backward pass.
Done.
ctx->SetOutputDim(framework::GradVarName("Emission"), emission_exps_dims);
ctx->SetOutputDim(framework::GradVarName("Transition"),
                  transition_exps_dims);
if (ctx->HasOutput(framework::GradVarName("Emission"))) {
  ...
}
if (ctx->HasOutput(framework::GradVarName("Transition"))) {
  ...
}
See mul_op.cc.
Done.
const Tensor* transition_exps, const Tensor* alpha,
const Tensor* label, Tensor* beta,
Tensor* transition_grad,
Tensor* emission_grad) const {
For the const input arguments, use const Tensor&; for the output arguments, use Tensor*.
Done. Thank you for pointing this out.
@@ -0,0 +1,54 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Will this header file also be used for the CRF decoding code? If not, it seems this header file can be removed in this PR.
After our discussion, I will keep this header, move the function Compute into the header file, and add a GPU kernel function.
Done.
for (size_t i = 0; i < tag_num; ++i) {
  T sum = 0.;
  for (size_t j = 0; j < tag_num; ++j) {
    sum += alpha_value[(k - 1) * tag_num + j] *
Wei added a comment "// (*)" at the end of this line. Line 321 refers to this line by that mark.
for (size_t i = 0; i < tag_num; ++i) {
  T sum = 0.;
  for (size_t j = 0; j < tag_num; ++j) {
    sum += w_exps[(i + state_trans_base_idx) * tag_num + j] *
Wei added a mark "// (**)" at the end of this line, which line 495 refers to.
    framework::make_ddim({static_cast<int>(batch_size), 1}),
    platform::CPUPlace());

auto place = ctx.GetEigenDevice<platform::CPUPlace>();
Change auto to auto*, and change place to *place accordingly in the following lines.
Done.
In addition to @pengli09's comments, there are some style issues. Apart from that, the PR looks very solid. Great job!
paddle/framework/tensor_impl.h
Outdated
PADDLE_ENFORCE_LE(end_idx, dims_[0], "The end row index is out of bound.");
PADDLE_ENFORCE_LT(
    begin_idx, end_idx,
    "The start row index must be smaller than the end row index.");
smaller -> lesser
Done.
"See more details in the operator's comments.");
AddInput(
    "Label",
    "(LoDTensor, default: LoDTensor<int>). The groundtruth which is a 2-D "
There should be a space in "groundtruth". It should be "ground truth".
Done. Thank you.
"Emission",
"(LoDTensor, default: LoDTensor<float>). "
"The unscaled emission weight matrix for the linear chain CRF. "
"This input is a LoDTensor with shape [N x D] where N is the total "
Can we simply write that N is the size of the minibatch?
Yes, I think this is no problem. Thank you. Done.
"batch. A two dimensional tensor with shape [N x D], "
"denoted as \f$\alpha\f$. \f$\alpha\f$ is a memo table used to "
"calculate the normalization factor in CRF. \f$\alpha[k, v]\f$ stores "
"the unnormalized probabilities of all possible unfinished sequences of "
I am guessing you are referring to this paper: http://cseweb.ucsd.edu/~elkan/250Bwinter2012/loglinearCRFs.pdf
It would be valuable to cite this in the comments for reference.
Linear chain CRF is a special case of CRF that is useful for sequence labeling
task. Sequence labeling tasks do not assume a lot of conditional
independences among inputs. They only concern about the input and the output
We can write this sentence as: "The only constraint they impose is that the input and output must be linear sequences."
Thank you. Done.
ctx->SetOutputDim("Alpha", emission_dims);
ctx->SetOutputDim("EmissionExps", emission_dims);
ctx->SetOutputDim("TransitionExps", transition_dims);
// (TODO caoying) This is tricky. The 1st dimension of Output(LogLikelihood)
You can follow the Google C++ stye guide for TODO comments. http://google.github.io/styleguide/cppguide.html#TODO_Comments
Done, thank you.
protected:
// Explicitly set that the data type of output of the linear_chain_crf_grad
// operator is determined by its input: graidents of LogLikelihood.
Typo: graidents -> gradients
Done. Thank you.
}
}  // namespace

using framework::LoDTensor;
This violates the Google Style Guide. For more details check out https://google.github.io/styleguide/cppguide.html#Namespaces
auto* label = ctx.Input<LoDTensor>("Label");

auto in_lod = emission_weights->lod();
PADDLE_ENFORCE(in_lod.size(), "Input(Emission) is not a sequence.");
The error message could be "Input(Emission) must be a sequence".
Done.
auto in_lod = emission_weights->lod();
PADDLE_ENFORCE(in_lod.size(), "Input(Emission) is not a sequence.");

// TODO(caoying) The checks related to LoD information should be
You can follow the Google C++ stye guide for TODO comments. http://google.github.io/styleguide/cppguide.html#TODO_Comments
I have not finished yet, so please do not begin the review. Thank you all.
Thank you all for all of the comments~
I approve this PR. The enhancements, if any, can be done in the next PR.
if (platform::is_gpu_place(ctx.GetPlace())) {
  emission_weights = &emission_weight_tensor;
  transition_weights = &transition_weight_tensor;
  label = &label_tensor;
To shorten the code, the LoDTensor xx and Tensor xx declarations from line 72 to line 86 can be removed; instead, use new LoDTensor() and new Tensor() here for those pointers, and delete the pointers at the end.
Tensor emission_row_max;
emission_row_max.mutable_data<T>(
    framework::make_ddim({static_cast<int>(batch_size), 1}),
Use static_cast<int64_t> here; the number type in DDim is int64_t.
@@ -38,8 +39,8 @@ class CrossEntropyOp : public framework::OperatorWithKernel {
       "If Attr(soft_label) == true, the 2nd dimension of "
       "Input(X) and Input(Label) should be equal.");
 } else {
-  PADDLE_ENFORCE_EQ(label_dims[1], 1,
-                    "If Attr(soft_label) == false, the 2nd dimension of "
+  PADDLE_ENFORCE_EQ(label_dims[1], 1UL,
The dimension type is int64_t.
Approve as requested by @lcy-seso. However, there are still a few places that need to be updated or fixed. Please revise the code once you have time.
I will enhance the unit test and refine the code in a following PR. Thanks for all the comments.
fixes #4168
fixes #5250
1. Currently, this operator only runs on CPU.
2. Currently, the linear chain CRF operator does not backpropagate the gradients it receives, so it can only be used at the end of the network. This can be easily fixed.
3. It is better to use double in the unit test for this operator, but currently I run into a core dump that I haven't investigated yet. I will fix this later.