CTC (Connectionist Temporal Classification) Implementation #4681
base: master
…ests. Added reverse layer (useful for bidirectional recurrent layers, e.g. BLSTM), finished working on CTC-Loss-Layer, more tests. Separated forward and backward pass by introducing new intermediate variables (e.g. alpha and beta). CTCDecoderLayer: added scores and optional accuracy as top blobs. Implemented CTCDecoderLayerTest for GreedyDecoder. Added parameters for the CTC decoder layer to the proto. Added dummy example to the CTC examples. Added an example to show the progress of learning. Fixed lint errors, made layout changes.
…ces that are terminated by -1. Separated header and source file code.
Hi, I tried this CTC implementation with the 'image captioning' example, but the loss didn't decrease and it kept reporting 'no valid path found'. Is there a specific format required for the input data?
I tried to explain the input data layout in the class description of include/caffe/layers/ctc_loss_layer.hpp.
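Independent of the blob layout, one general property of CTC can produce a 'no valid path found' condition: the input sequence must be at least as long as the label sequence plus one extra step for every pair of identical adjacent labels, because a blank has to separate them. Whether this is the cause in the case above is not known; the following is only a minimal sketch of that check, and the function names are hypothetical, not part of this PR.

```python
def min_ctc_input_length(labels):
    """Smallest number of time steps that can emit `labels` under CTC.

    Each label needs one step, and every pair of identical adjacent
    labels needs one additional blank step between them.
    """
    repeats = sum(1 for a, b in zip(labels, labels[1:]) if a == b)
    return len(labels) + repeats


def has_valid_path(num_time_steps, labels):
    """True if at least one CTC alignment of `labels` fits in the input."""
    return num_time_steps >= min_ctc_input_length(labels)
```

For example, the label sequence [1, 1, 2] needs at least four time steps, so an input of length three has no valid alignment.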
Hi, I ran your example. I want to use LSTM+CTC to recognise English words, but it is not clear to me how to do that. Could you give me some suggestions? Thank you very much.
A small problem with the BLSTM part of the code: as of right now, the sequence indicators must be complete (span the whole sequence) to effectively feed the information into the reversed layers.
@Jenkyrados True! I already handled this issue by adding a "ReverseTimeLayer" that correctly reverses each sequence depending on its length. The ReverseLayer can be removed, or considered a 'Mirror' layer that can't be used to reverse sequences of different lengths but can be used to mirror images, etc.
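The difference between a plain mirror and a length-aware reversal can be sketched as follows. This is only an illustration of the idea described above, not the ReverseTimeLayer code; the function name and the list-based data layout are assumptions.

```python
def reverse_by_length(sequences, lengths):
    """Reverse only the valid prefix of each padded sequence.

    sequences: one list per batch item, padded to a common length.
    lengths:   number of valid (non-padding) steps for each item.
    Padding stays at the end, so the reversed data still starts at t = 0.
    """
    reversed_sequences = []
    for seq, n in zip(sequences, lengths):
        reversed_sequences.append(seq[:n][::-1] + seq[n:])
    return reversed_sequences
```

With a padded sequence [1, 2, 3, 0, 0] of length 3, this yields [3, 2, 1, 0, 0], whereas a plain mirror would yield [0, 0, 3, 2, 1] and feed the padding into the recurrent layer first.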
Hmm, curious how you did it, is it in a branch of your fork?
@Jenkyrados Have a look at my warp-ctc branch https://github.com/ChWick/caffe/tree/warp-ctc. It also adds a WarpCTC layer that wraps https://github.com/baidu-research/warp-ctc.
Looks good. Thanks a bunch!
warp_ctc_layer.cpp contains #include <warp_ctc/ctcpp.h>, but I couldn't find the "warp_ctc" folder or ctcpp.h in this branch.
@06221098 If you want to use warp-ctc with Caffe, you need to use the develop branch of my warp-ctc fork: https://github.com/ChWick/warp-ctc/tree/develop. It adds the C++ and template support that Caffe requires. You can compile warp-ctc as a standalone shared library (cmake && make install) and add the path to the installation in Caffe.
@ChWick Thank you very much for your prompt response. I am trying as you said.
@ChWick Hi, ChWick. I tried to compile warp-ctc-develop, but the build failed with the following errors:
/home/yyy/warp-ctc-develop/src/ctcpp_entrypoint.cu(1): error: expected a ";"
2 errors detected in the compilation of "/tmp/tmpxft_00000e27_00000000-16_ctcpp_entrypoint.compute_52.cpp1.ii".
make[2]: *** [CMakeFiles/warpctc.dir/src/./warpctc_generated_ctcpp_entrypoint.cu.o] Error 1
@ChWick Sorry to bother you. I have solved my problems. Thank you very much.
@ChWick Excuse me, could you give a simple usage example for CTCGreedyDecoderLayer? I use this layer at the top of a net, but I get strange output, so I suppose I am using it incorrectly.
@shengyudingli
Provide the output of your last InnerProductLayer as the first bottom blob (probabilities) and the sequence indicators for the LSTM layers as the second bottom blob (used to compute the input sequence lengths). Optionally, you can provide a target_sequence blob as a third bottom blob for computing the scores/accuracy.
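What a greedy CTC decoder does with these two inputs can be sketched roughly as below. This is not the layer's code: the nested-list layout probs[t][n][c] (time, batch, class), the explicit blank argument, and the convention that the indicator is 0 at the first step and on padding and 1 on continuation steps are all assumptions made for illustration.

```python
def sequence_lengths(indicators):
    """Derive per-item lengths from indicators[t][n].

    Assumed convention: 0 at the first step of a sequence and on
    padding, 1 on every continuation step.
    """
    num_steps = len(indicators)
    batch_size = len(indicators[0]) if num_steps else 0
    lengths = []
    for n in range(batch_size):
        length = 1  # the first step always belongs to the sequence
        while length < num_steps and indicators[length][n] == 1:
            length += 1
        lengths.append(length)
    return lengths


def greedy_decode(probs, lengths, blank):
    """Best-path decoding: argmax per step, merge repeats, drop blanks."""
    decoded = []
    for n, length in enumerate(lengths):
        labels, previous = [], None
        for t in range(length):
            scores = probs[t][n]
            best = max(range(len(scores)), key=scores.__getitem__)
            if best != blank and best != previous:
                labels.append(best)
            previous = best
        decoded.append(labels)
    return decoded
```

Note that a blank between two identical labels keeps both of them, which is why the previous class is updated even when the current step is blank.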
I implemented a basic CTC algorithm for Caffe: CTCLossLayer for loss and gradient calculation, and CTCDecoderLayer for decoding (only a greedy decoder is implemented so far).
It is based on the TensorFlow implementation and the paper by A. Graves. Since I mostly transcribed the TensorFlow code, you should check whether there are any copyright issues.
Moreover, I implemented an additional ReverseLayer, which I use for bidirectional recurrent nets (e.g. BLSTMs).
I added a dummy example that shows the basic functionality (besides the tests) by overfitting dummy data.
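For readers unfamiliar with the loss itself, the forward (alpha) recursion from the Graves paper can be sketched for a single sequence as follows. This is a plain-probability illustration under stated assumptions, not the CTCLossLayer code: the function name and the probs[t][c] layout are hypothetical, the inputs are assumed to be softmax-normalized, and practical implementations usually work in log space for numerical stability.

```python
import math


def ctc_neg_log_likelihood(probs, labels, blank):
    """Negative log-likelihood of `labels` given probs[t][c] (one sequence)."""
    # Extended label sequence: blank, l1, blank, l2, ..., blank.
    extended = [blank]
    for label in labels:
        extended += [label, blank]
    num_states = len(extended)

    # Initialisation: a path may start with a blank or the first label.
    alpha = [0.0] * num_states
    alpha[0] = probs[0][extended[0]]
    if num_states > 1:
        alpha[1] = probs[0][extended[1]]

    for t in range(1, len(probs)):
        updated = [0.0] * num_states
        for s in range(num_states):
            total = alpha[s]  # stay in the same state
            if s >= 1:
                total += alpha[s - 1]  # advance by one state
            # Skipping a blank is allowed only between different labels.
            if s >= 2 and extended[s] != blank and extended[s] != extended[s - 2]:
                total += alpha[s - 2]
            updated[s] = total * probs[t][extended[s]]
        alpha = updated

    # A valid path ends in the last label or the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if num_states > 1 else 0.0)
    if likelihood == 0.0:
        return math.inf  # no valid alignment exists
    return -math.log(likelihood)
```

With two classes, uniform probabilities over two time steps, and the single label 0, the three valid alignments ("00", "0 blank", "blank 0") give a likelihood of 0.75.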