Feature Request: Linear Chain Conditional Random Field #4090
Comments
I would find this useful. |
Ok, great! Before I make the pull request I try to figure out the implementation in tensorflow in order to get the ChainCRF going with both backends. |
I'm interested; I would find this very useful. |
Following as well. I was about to embark on an implementation myself. (May still just for the exercise, but if you have something I would be interested.) |
Interested as well! ;) |
This is awesome! |
This will be great! |
Thanks for your support! I am almost done, except for a bug in my tensorflow implementation. I hope to resolve this issue in a few days. |
It will be an awesome function! |
Great, waiting for update! |
Sorry for the delay. I am still working on it, but only in my spare time. The layer is complete, but the example is not finished yet. |
I'm interested as well. My problem is that RNNs are not able to capture e.g. BIO encoding correctly and produce ill-formatted BIO tags (e.g. starting an I-tag without a previous B-tag). Thanks for contributing; I'm looking forward to your implementation. |
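To make the "ill-formatted BIO tags" problem concrete, here is a minimal sketch of a validity check. It assumes the IOB2 convention (every chunk starts with `B-`), which the comment above implies but does not state; the function name `is_valid_bio` is hypothetical and not part of Keras.

```python
def is_valid_bio(tags):
    """Return True if every I-X tag directly follows a B-X or I-X of the same type.

    Assumes IOB2-style tags such as "B-NP", "I-NP" and "O".
    """
    prev = "O"
    for tag in tags:
        if tag.startswith("I-"):
            entity = tag[2:]
            # An I- tag is only legal right after B- or I- of the same entity type.
            if prev not in ("B-" + entity, "I-" + entity):
                return False
        prev = tag
    return True


print(is_valid_bio(["B-NP", "I-NP", "O"]))  # well-formed sequence
print(is_valid_bio(["O", "I-NP"]))          # I-tag without a previous B-tag
```

A tagger that predicts each timestep independently can emit the second kind of sequence; a CRF can learn low transition scores for such tag pairs.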
python3 conll2000_bi_lstm_crf.py I have run the setup.py in https://github.com/fchollet/keras/tree/bba6b521abc462261dd65883be59c94e1467b7cf What is the right way to run this file? |
This should work if the library is properly installed. I guess you had a previous Keras version in your conda environment, so your install didn't update the existing files but only added the new ones. For example, Try again: python setup.py install --force You can check the installation by running python3 -c "from keras.layers import ChainCRF" If this doesn't throw an |
Thanks. |
Hi, I'm running a BLSTM-CRF model, but before training begins I get the following error: and my model is as follows: Before that, I add a TimeDistributed wrapper to make the input dimension of the CRF correct. But I don't know what this error means. Could somebody help me? |
In your setting, the targets must be one-hot encoded and hence of dimension 3 (and not 2), i.e: Y_test.shape = (nb_samples, timesteps, nb_classes) |
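As a minimal sketch of the shape requirement described above, assuming the labels are already integer class ids arranged as `(nb_samples, timesteps)`, NumPy can expand them to one-hot targets. The helper name `one_hot_targets` is illustrative only.

```python
import numpy as np


def one_hot_targets(y, nb_classes):
    """Convert integer labels of shape (nb_samples, timesteps) into one-hot
    targets of shape (nb_samples, timesteps, nb_classes)."""
    y = np.asarray(y, dtype="int64")
    # Indexing the identity matrix with the label array appends a class axis.
    return np.eye(nb_classes, dtype="float32")[y]


y = [[0, 2, 1], [1, 1, 0]]           # 2 samples, 3 timesteps
Y = one_hot_targets(y, nb_classes=3)
print(Y.shape)                       # (2, 3, 3)
```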
Thanks for the reply but I'm not very clear about the shape. After my preprocess I use So how can I take the dimension transition from Thanks a lot. |
You cannot make the desired dimension transition. The model works only for temporal data, but your preprocessing shows that this is not true in your case. Why are you trying to use a ChainCRF? |
I use an LSTM and a CRF for Chinese sequence segmentation. In my preprocessing I slide a window over the sentence, so the training data has It seems that I should set a sequence length as timesteps to form the data shape to be But I have used an embedding layer, which makes the output dimension to the LSTM layer become And I notice that the ChainCRF layer doesn't support the mask_zero argument yet. So does this mean I should discard the embedding layer and re-trim the data dimension to be |
Thanks for the advice. I have already fixed the problem by discarding the embedding layer and re-trimming the data dimensions. |
Can I use this Chain CRF to implement BiLSTM with CRF for NER tagging as shown in the code here https://github.com/glample/tagger |
I got this error. Any help? Loading data... |
Hi @SamihYounes, Please update to the latest version of the pull request #4621. |
I have a question about the example code:
n_words = 10000
maxlen = 32
(X_train, y_train), (X_test, y_test) = load_treebank(nb_words=n_words, maxlen=maxlen)
n_samples, n_steps, n_classes = y_train.shape
model = Sequential()
model.add(Embedding(n_words, 128, input_length=maxlen, dropout=0.2))
model.add(LSTM(64, return_sequences=True))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(n_classes)))
model.add(Dropout(0.2))
crf = ChainCRF()
model.add(crf)
model.compile(loss=crf.loss, optimizer='rmsprop', metrics=['accuracy'])
After the LSTM layer, the Dense layer converts each timestep of the LSTM output to n_classes dimensions. The CRF could take features of size 100 for each timestep and then output tags of size 8, right? |
Dear @JacobIsrael123, Of course, we could integrate a dense layer for input dimension conversion, but this is not always necessary (for example, if the preceding layer is a recurrent layer with output dimension nb_classes). While designing the ChainCRF layer, I decided to keep it as simple as possible. |
@phipleg Can you share the document/tutorial you used for implementing ChainCRF module? Thanks. |
The above issue with loss value is only observed with Theano backend. Tensorflow works fine. |
I'm new to GitHub. How can I run the setup in https://github.com/fchollet/keras/tree/bba6b521abc462261dd65883be59c94e1467b7cf? If I use CNTK, can I still use the CRF layer? |
python setup.py install --force |
I am working on a sequence labeling task based on a bi-directional LSTM architecture with variable sequence length (I'm not padding sentences). Thus, during training, I have a lot of mini-batches, including some of size 1. @phipleg said in a previous post that "mini batches of size 1 are problematic". Does this mean that this implementation won't work in such a situation? |
@dfalci This was fixed in a later commit; the CRF implementation now works fine for mini-batches of size 1. A hint on speeding up your idea: what I do is group sentences by length and then create mini-batches of sentences with the same length. If your training data is large enough and the sentences are approximately the same length, you will have only a few mini-batches with a single sentence. |
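The grouping idea above could be sketched as follows. This is a plain-Python illustration under the assumption that each sentence is a list of tokens; the function name `batches_by_length` is hypothetical and not taken from any of the repositories mentioned in this thread.

```python
from collections import defaultdict


def batches_by_length(sentences, batch_size):
    """Yield mini-batches in which all sentences have the same length."""
    buckets = defaultdict(list)
    for sentence in sentences:
        buckets[len(sentence)].append(sentence)
    for length in sorted(buckets):
        bucket = buckets[length]
        # Slice each bucket into chunks of at most batch_size sentences.
        for start in range(0, len(bucket), batch_size):
            yield bucket[start:start + batch_size]


sentences = [["a"], ["b", "c"], ["d"], ["e", "f"], ["g", "h"]]
batches = list(batches_by_length(sentences, batch_size=2))
for batch in batches:
    print(batch)
```

Only the last chunk of each bucket can be smaller than `batch_size`, so single-sentence batches occur at most once per distinct sentence length.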
@phipleg I'm interested in using your implementation, but am wondering if you could elaborate on what this means:
What I think this means is that the second-from-last layer, the one just before the CRF, is actually trying to predict the target directly, and learns to do so based on a loss function that is distinct from the CRF (e.g. cross-entropy). Meanwhile, the CRF learns a set of transition probabilities based on its own loss function, the log likelihood calculated from the forward-backward algorithm. So, training of the CRF could be decoupled from training of the rest of the network, since the CRF's parameters do not affect the loss function as seen by the rest of the network ("The layer is the identity function during training"). Have I understood correctly? While this seems reasonable, my reading of Bidirectional LSTM-CRF Models for Sequence Tagging (Huang et al. 2015) and Neural Architectures for Named Entity Recognition (Lample 2016) is that their Bi-LSTM-CRF implementations are trained by back-propagating the CRF's log-likelihood loss function through the entire network. (I could be mistaken of course.) |
@enewe101 I can maybe answer that. The CRF layer is updated using back-propagation during training to learn the transition probabilities. However, for training, we already know the correct labels. Hence, what a CRF or hidden Markov model layer in a neural network does during training is distinct from what it does at inference. But the error function of the CRF layer is used for training, and the transitions are updated with each epoch. You could of course also decouple the training of the network from the training of the CRF / HMM, but this is seldom done as it introduces further complexity. The paper by Collobert et al., 'NLP almost from scratch', explains well how to add an HMM to a network and how training and inference must be modified. In my implementation (https://github.com/UKPLab/emnlp2017-bilstm-cnn-crf) I achieve results on par with Huang et al., Lample et al., and Ma & Hovy for various tasks using this CRF implementation. So it appears that this CRF implementation works well. |
Hi @enewe101, as @nreimers pointed out, the CRF layer applies the costly inference only at prediction time, and not during training, because the target labels are already known. During training it acts as the identity but at the same time holds the parameters for the CRF loss. You need to use this loss in your model (and not some cross-entropy like you said). Otherwise, when taking gradients, the CRF parameters won't get any updates. |
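For readers wondering what the inference step at prediction time might look like: the thread does not show the decoding routine, so the following is only a generic sketch of Viterbi decoding for a linear chain, written in plain Python. It assumes per-timestep emission scores from the preceding layer and a square transition matrix, and it omits any start/end boundary scores; `viterbi_decode` is an illustrative name, not the ChainCRF API.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag sequence as a list of tag indices.

    emissions[t][j]   : score of tag j at timestep t (output of the previous layer)
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    scores = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []
    for emit in emissions[1:]:
        new_scores, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for arriving at tag j.
            best_i = max(range(n_tags), key=lambda i: scores[i] + transitions[i][j])
            pointers.append(best_i)
            new_scores.append(scores[best_i] + transitions[best_i][j] + emit[j])
        scores = new_scores
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag.
    best = max(range(n_tags), key=lambda j: scores[j])
    path = [best]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


emissions = [[2.0, 0.0], [0.0, 1.0]]
print(viterbi_decode(emissions, [[0.0, 0.0], [0.0, 0.0]]))    # per-step argmax
print(viterbi_decode(emissions, [[0.0, -10.0], [0.0, 0.0]]))  # transition 0 -> 1 penalised
```

With neutral transitions the result equals the per-timestep argmax; once the 0 -> 1 transition is penalised, the decoder prefers a path that avoids it, which is the mechanism that can suppress invalid tag sequences.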
@phipleg I see that typically the CRF layer loss is applied to a single output layer. Is it possible to have two outputs, as in the functional API demo, while using the CRF losses from two different CRFs? |
Hi @chaxor, have you already tried something like this?
input_for_crf1 = ...
input_for_crf2 = ...
# Keep a reference to each layer object, since the loss lives on the layer
crf1 = ChainCRF(params_for_crf1)
crf2 = ChainCRF(params_for_crf2)
out1 = crf1(input_for_crf1)
out2 = crf2(input_for_crf2)
model = Model(inp, [out1, out2])
model.compile(optimizer=..., loss=[crf1.loss, crf2.loss]) |
@phipleg Well, that was a simple fix. Thank you so much for your help! |
Hey, I just used the newest ChainCRF layer, but the result is strange. The accuracy on the training set first increases and then decreases, while the validation accuracy increases continuously. I am not clear about that; can you explain it? @phipleg |
Dear @KARABAER, it is hard to say without knowing your complete model, the data and training code. Please give more details. |
Hi @phipleg, I think the transitions (a square matrix) should be updated after every training batch and should be stored if I want to save the model, right? |
Hi @phipleg In my problem setting, the input tensor has 4 dimension (batch_size, session_len, query_len, num_classes), so I built a TimeDistributed layer on top of CRF like below:
But I found it always throws an error when I try to fit the sequential model. Does the current CRF implementation not support adding a TimeDistributed layer on top? If that's the case, is there any alternative to support a 4D input? Thanks! |
Hi @phipleg, I found it's okay to define the create_model function as above; the error comes from the CRF loss function, which only accepts 3D predictions. But my predictions are 4D tensors. So I extended the loss function to accept 4D predictions as below:
I finally found that the above solution worked! So for anyone with the same question, maybe you can take a look at the above solution. One more question: the model trains properly and the test results also make sense. However, the validation loss is about 100x larger than the training loss. Does anyone have any idea what may cause this? Thanks! |
Although @phipleg and @nreimers explained it, I am still confused about joint training of the CRF and the LSTM. What I'd expect is that first we obtain the LSTM outputs, and the CRF layer makes a decision using these outputs and a randomly initialized transition matrix by calculating a score function. Then a loss function is calculated from the true and predicted labels, and the gradient of this function with respect to the parameters that we try to learn is calculated. Finally, we update the parameters accordingly. I don't understand how to back-propagate the errors from the output of the CRF layer. In an LSTM without a CRF layer, one may use cross-entropy to do sequence-to-sequence learning by calculating a loss function using the predicted and true (maybe one-hot) labels at each time step. But how is the error back-propagated in the case of a CRF? |
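One way to see why back-propagation works here: the CRF loss is usually defined not on a decoded label sequence but as the negative log-likelihood of the true sequence, which is a smooth function of both the emission scores (the LSTM outputs) and the transition matrix. The sketch below is a generic plain-Python version of that quantity, not the code from the pull request; it assumes no start/end boundary scores, and the function names are illustrative.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def crf_neg_log_likelihood(emissions, transitions, tags):
    """-log p(tags | emissions) for a linear-chain CRF without boundary scores.

    emissions[t][j]   : score of tag j at timestep t
    transitions[i][j] : score of moving from tag i to tag j
    tags              : true tag indices, one per timestep
    """
    n_tags = len(transitions)
    # Score of the true path: its emissions plus the transitions it uses.
    gold = emissions[0][tags[0]]
    for t in range(1, len(tags)):
        gold += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    # Log partition function over all paths, via the forward recursion.
    alpha = list(emissions[0])
    for emit in emissions[1:]:
        alpha = [
            log_sum_exp([alpha[i] + transitions[i][j] for i in range(n_tags)]) + emit[j]
            for j in range(n_tags)
        ]
    return log_sum_exp(alpha) - gold


zeros = [[0.0, 0.0]] * 3
uniform = crf_neg_log_likelihood(zeros, [[0.0, 0.0], [0.0, 0.0]], [0, 1, 0])
print(uniform, math.log(8))  # all 2**3 paths are equally likely
```

Because every step is a sum, a lookup or a log-sum-exp, an automatic differentiation backend can compute gradients of this loss with respect to the transitions and the emissions, and the latter flow on into the LSTM weights. No gradient needs to pass through the decoded output itself.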
Is there any preference or major difference between this implementation and the ChainCrf in keras-contrib? |
@lzfelix This implementation worked better for me (see https://github.com/UKPLab/emnlp2017-bilstm-cnn-crf). "Worked better" means a higher F1-score on standard sequence tagging tasks like NER, chunking, etc. The ChainCrf implementation in keras-contrib has the issue that it produced several invalid BIO tags, i.e., it starts an I-tag without a previous B-tag. This is not the case for this implementation. However, this implementation only works for Keras 1, while ChainCrf in keras-contrib also works for Keras 2. |
Hi @nreimers, thank you for the feedback and your careful evaluation of both models. I'm quite new to CRFs, and coincidentally I was reading your code before reaching this thread. Regarding the limitation on the Keras version that you mentioned, I was able to overcome it through minor code modifications. In this repository I show these changes and evaluate the CRF on the POS task using a very simple model; maybe you can use this as well to upgrade your code to Keras 2, as pointed out in your repository's readme. |
Hi @lzfelix , that is great, I will have a look. |
@nreimers, thank you! Please let me know if you have any comments or find anything strange in my code. As I mentioned previously, I'm still learning CRFs. @Jeffyrao, I have observed that my loss on dev is much higher than on training as well, but not by as much as you reported. You can see this behaviour in my demo code here. |
Closing this issue, as linear chain CRF is supported as a Keras layer in TensorFlow Addons. Thank you for the feature request! |
I implemented a Linear Chain CRF layer for sequence tagging tasks inspired by the paper:
Lample et al., Neural Architectures for Named Entity Recognition
The layer is the identity function during training and applies the forward-backward algorithm during inference. For that it holds a set of trainable parameters which is accessed by a specific loss function.
You can see the API in the short gist for pos tagging on the penn treebank (provided by NLTK):
https://gist.github.com/phipleg/adfccb0ad96b777eecc9bb0f16ab54fc
Currently, it is only implemented in Theano and supports fixed length sequences (no masking).
Is anybody interested in seeing the layer in Keras? The need was raised in issue 824 but the issue is closed.
I could refactor my code and make a pull request in a few days. For that I would need to add a few functions to the Theano backend because I make use of Theano's scan function. I could also provide an example.