
PPshrimpGo/quest_qa_labeling

 
 


Google QUEST Q&A Labeling

Improving automated understanding of complex question answer content

To run the code, install 'A lightweight python library that helps to keep track of numerical experiments'.
You can find the competition data here.

Example of the default bert-base training command from the master branch:

python run.py --epochs=5 --max_sequence_length=500 --max_title_length=26 --max_question_length=260 --max_answer_length=210 --batch_accumulation=1 --batch_size=8 --warmup=300 --lr=1e-5 --bert_model=bert-base-uncased
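The command caps each text field separately (title 26, question 260, answer 210 tokens) within an overall 500-token sequence. A minimal sketch of how such per-segment truncation could work before the pieces are packed into one BERT input; the function name, the trim-longest-first policy, and the 4 positions reserved for special tokens are assumptions, not the repository's actual code:

```python
def truncate_segments(title_ids, question_ids, answer_ids,
                      max_title=26, max_question=260, max_answer=210,
                      max_sequence_length=500):
    """Trim each segment to its own cap, then to the overall budget.

    The overall budget reserves 4 positions for special tokens, e.g.
    [CLS] title [SEP] question [SEP] answer [SEP] (layout is assumed).
    """
    title = title_ids[:max_title]
    question = question_ids[:max_question]
    answer = answer_ids[:max_answer]
    budget = max_sequence_length - 4  # room for special tokens (assumed)
    # If the three capped segments still exceed the budget,
    # drop tokens from the end of the longest segment first.
    while len(title) + len(question) + len(answer) > budget:
        longest = max((title, question, answer), key=len)
        longest.pop()
    return title, question, answer

t, q, a = truncate_segments(list(range(30)), list(range(300)), list(range(250)))
print(len(t), len(q), len(a))  # -> 26 260 210
```

Note that 26 + 260 + 210 = 496 = 500 - 4, so with these defaults the per-segment caps already satisfy the overall budget.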

Example of the BART training command from the bart branch:

python run.py --epochs=4 --max_sequence_length=500 --max_title_length=26 --max_question_length=260 --max_answer_length=210 --batch_accumulation=4 --batch_size=2 --warmup=250 --lr=2e-5 --bert_model=./bart.large

After you've added a pseudo-label set (we used a 100k subset from the archive):

python run.py --epochs=4 --max_sequence_length=500 --max_title_length=26 --max_question_length=260 --max_answer_length=210 --batch_accumulation=4 --batch_size=2 --warmup=250 --lr=2e-5 --bert_model=./bart.large --pseudo_file ../input/leak-free-pseudo-100k/pseudo-100k-4x-blend-no-leak-fold-{}.csv.gz --split_pseudo --leak_free_pseudo
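The `{}` in `--pseudo_file` is a fold placeholder: each cross-validation fold reads its own pseudo-label file, so no fold trains on predictions produced by models that saw its validation rows. A toy sketch of that idea (the function and data layout are illustrative assumptions, not the repository's implementation):

```python
# Per-fold, leak-free pseudo-labelling: fold i only ever sees the
# pseudo rows generated without fold i's validation data.
PSEUDO_TEMPLATE = "pseudo-100k-4x-blend-no-leak-fold-{}.csv.gz"

def training_rows(train_rows, pseudo_by_fold, fold):
    """Return the labelled rows plus the pseudo rows assigned to this fold."""
    return train_rows + pseudo_by_fold[fold]

# Toy data: 3 folds, 2 pseudo rows each.
pseudo = {f: [f"pseudo-{f}-{i}" for i in range(2)] for f in range(3)}
rows = training_rows(["real-0", "real-1"], pseudo, fold=1)
print(rows)  # -> ['real-0', 'real-1', 'pseudo-1-0', 'pseudo-1-1']
print(PSEUDO_TEMPLATE.format(1))  # the file fold 1 would read
```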

In the monty branch you can find code for LM pretraining on StackExchange data.
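The monty branch itself is not shown here; as background only, BERT-style LM pretraining masks roughly 15% of input tokens and trains the model to recover them, using the standard 80/10/10 replacement split. A minimal masking routine (all details assumed, pure-Python illustration):

```python
import random

def mask_tokens(token_ids, mask_id, vocab_size, mask_prob=0.15, seed=0):
    """Return (masked_ids, labels); labels are -100 where no prediction is needed."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in token_ids:
        if rng.random() < mask_prob:
            labels.append(tok)                        # model must predict the original token
            r = rng.random()
            if r < 0.8:
                masked.append(mask_id)                # 80%: replace with [MASK]
            elif r < 0.9:
                masked.append(rng.randrange(vocab_size))  # 10%: random token
            else:
                masked.append(tok)                    # 10%: keep unchanged
        else:
            masked.append(tok)
            labels.append(-100)                       # ignored by the loss

    return masked, labels

ids, labels = mask_tokens(list(range(20)), mask_id=103, vocab_size=30522)
```

Here -100 follows the common PyTorch convention of an ignore index for the cross-entropy loss.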

Read our solution and explanation here (to be done).
