Added PubMed embeddings computed by @jessepeng #519
Conversation
Is the size of the hidden layer(s) and the number of layers known for these models? This would be interesting information for comparative experiments.
Hi @khituras - I believe the model was trained with a hidden size of 1150 and 3 layers, with BPTT truncated at a sequence length of 240. It was only trained on a 5% sample of PubMed abstracts up to 2015, which is 1,219,734 abstracts. @jessepeng is this correct?
Yes, this is correct. Below are the hyperparameters used for training:
@jessepeng Thank you so much for this specification. Was there some specific evaluation strategy that led you to choose these parameters?
Yes, good point - we'll add this to the documentation with the release!
Could you share the statistics of the test and validation datasets, and the perplexity on each?
@khituras No, I chose most of those parameters because they are Flair's default parameters. I did, however, choose the number of layers and the hidden dimension to match a word-level LM I also trained on the same corpus. The architecture and hyperparameters I chose for that LM follow Merity et al. 2017. @pinal-patel The dataset consisting of the aforementioned 1,219,734 abstracts was split 60/10/30 into train/validation/test sets. The perplexities on train/val/test were 2.15/2.08/2.07 for the forward model and 2.19/2.1/2.09 for the backward model.
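For anyone who wants to set up a comparable training run, below is a minimal sketch using Flair's LanguageModelTrainer. Only the hidden size (1150), number of layers (3) and sequence length (240) come from this thread; the corpus path, batch size and epoch count are placeholders:

from flair.data import Dictionary
from flair.models import LanguageModel
from flair.trainers.language_model_trainer import LanguageModelTrainer, TextCorpus

# train a forward character LM; repeat with is_forward_lm=False for the backward model
is_forward_lm = True

# default character dictionary shipped with Flair
dictionary: Dictionary = Dictionary.load('chars')

# corpus directory with train/, valid.txt and test.txt (placeholder path)
corpus = TextCorpus('/path/to/pubmed/corpus',
                    dictionary,
                    is_forward_lm,
                    character_level=True)

# hidden size and layer count as reported above
language_model = LanguageModel(dictionary,
                               is_forward_lm,
                               hidden_size=1150,
                               nlayers=3)

trainer = LanguageModelTrainer(language_model, corpus)
trainer.train('resources/language_models/pubmed-forward',
              sequence_length=240,   # BPTT length reported above
              mini_batch_size=100,   # placeholder, not reported in the thread
              max_epochs=10)         # placeholder, not reported in the thread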
@jessepeng Did you start training from scratch on PubMed abstracts, or did you further fine-tune a model trained on Wikipedia or a similar dataset?
@jessepeng ?
@shreyashub I started training from scratch. I trained each direction for about 10 days on a GeForce GTX Titan X.
Hello @shreyashub, to fine-tune an existing LanguageModel, you only need to load an existing one instead of instantiating a new one. The rest of the training code remains the same as in Tutorial 9:

from flair.data import Dictionary
from flair.embeddings import FlairEmbeddings
from flair.models import LanguageModel
from flair.trainers.language_model_trainer import LanguageModelTrainer, TextCorpus

# instantiate an existing LM, such as one from the FlairEmbeddings
language_model: LanguageModel = FlairEmbeddings('news-forward-fast').lm

# the fine-tuned model reuses the character dictionary and direction of the loaded LM
dictionary: Dictionary = language_model.dictionary
is_forward_lm = language_model.is_forward_lm

# get your corpus, process forward and at the character level
corpus = TextCorpus('/path/to/your/corpus',
                    dictionary,
                    is_forward_lm,
                    character_level=True)

# use the model trainer to fine-tune this model on your corpus
trainer = LanguageModelTrainer(language_model, corpus)
trainer.train('resources/taggers/language_model',
              sequence_length=10,
              mini_batch_size=10,
              max_epochs=10)

Note that when you fine-tune, you automatically use the same character dictionary as before and automatically copy the direction (forward/backward).
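Once training finishes, the fine-tuned checkpoint can be loaded back as a FlairEmbeddings model and used to embed sentences. A minimal sketch, assuming the trainer has written a best-lm.pt file into the output directory above (check the directory for the exact filename):

from flair.data import Sentence
from flair.embeddings import FlairEmbeddings

# load the fine-tuned checkpoint (filename assumed; check the training output directory)
fine_tuned_embeddings = FlairEmbeddings('resources/taggers/language_model/best-lm.pt')

# embed a sentence and inspect the per-token embeddings
sentence = Sentence('The patient was treated with metformin .')
fine_tuned_embeddings.embed(sentence)

for token in sentence:
    print(token.text, token.embedding.shape)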
Yes, that works - the pooled variant just builds on top of FlairEmbeddings.
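For completeness, a minimal sketch of the pooled variant; the 'pubmed-forward' identifier is an assumption based on this PR:

from flair.data import Sentence
from flair.embeddings import PooledFlairEmbeddings

# the pooled variant wraps a regular FlairEmbeddings model ('pubmed-forward' assumed here)
pooled_embeddings = PooledFlairEmbeddings('pubmed-forward')

sentence = Sentence('Insulin regulates blood glucose levels .')
pooled_embeddings.embed(sentence)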
@jessepeng computed a character LM over PubMed abstracts and shared the models with us. This PR adds them as FlairEmbeddings.
Init with:
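A minimal sketch, assuming the models are registered under the 'pubmed-forward' and 'pubmed-backward' identifiers:

from flair.embeddings import FlairEmbeddings

# identifiers assumed; forward and backward PubMed character LMs from this PR
embeddings_forward = FlairEmbeddings('pubmed-forward')
embeddings_backward = FlairEmbeddings('pubmed-backward')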