This repository has been archived by the owner on Jun 18, 2024. It is now read-only.

[Problem / SQuAD V2] The result is much lower than the F1 score in the paper. Is something wrong? #230

Gs-Zhang opened this issue Sep 22, 2020 · 7 comments


I0922 11:46:38.663871 140308634334976] ***** Final Eval results *****
INFO:tensorflow: exact = 50.09685841825992
I0922 11:46:38.663987 140308634334976] exact = 50.09685841825992
INFO:tensorflow: f1 = 50.11359538016659
I0922 11:46:38.664040 140308634334976] f1 = 50.11359538016659
INFO:tensorflow: null_score_diff_threshold = -1.230899453163147
I0922 11:46:38.664077 140308634334976] null_score_diff_threshold = -1.230899453163147
INFO:tensorflow: total = 11873
I0922 11:46:38.664113 140308634334976] total = 11873
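
A note on reading these numbers: exact and f1 are nearly identical and both sit at about 50%. That pattern is what one would expect if the model predicted "no answer" for almost every question, because roughly half of the SQuAD 2.0 dev questions are unanswerable. Below is a minimal sketch for checking that fraction on your own copy of dev-v2.0.json. `no_answer_fraction` is a hypothetical helper, not part of the ALBERT repo, and it assumes the standard SQuAD 2.0 JSON layout with an `is_impossible` field on each question.

```python
import json


def no_answer_fraction(squad):
    """Return the fraction of questions marked unanswerable.

    `squad` is a dict in SQuAD 2.0 layout:
    {"data": [{"paragraphs": [{"qas": [{"is_impossible": bool, ...}]}]}]}
    """
    total = 0
    impossible = 0
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                total += 1
                # SQuAD 2.0 flags unanswerable questions with is_impossible=True.
                if qa.get("is_impossible", False):
                    impossible += 1
    return impossible / total if total else 0.0


# Tiny made-up dataset, only to show the expected structure.
toy_dev = {
    "data": [
        {
            "paragraphs": [
                {
                    "qas": [
                        {"id": "q1", "is_impossible": False},
                        {"id": "q2", "is_impossible": True},
                    ]
                }
            ]
        }
    ]
}

print(no_answer_fraction(toy_dev))

# With the real file (path is an assumption):
# with open("dev-v2.0.json") as f:
#     print(no_answer_fraction(json.load(f)))
```

If the fraction printed for the real dev file is close to the reported exact score, the model is probably not producing real answers at all, which points to a setup problem rather than a tuning problem.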

flags.DEFINE_string(
    "albert_config_file", 'albert_base/albert_config.json',
    "The config json file corresponding to the pre-trained ALBERT model. "
    "This specifies the model architecture.")

flags.DEFINE_string("vocab_file", 'albert_xlarge/30k-clean.vocab',
                    "The vocabulary file that the ALBERT model was trained on.")

flags.DEFINE_string("spm_model_file", 'albert_xlarge/30k-clean.model',
                    "The model file for sentence piece tokenization.")

flags.DEFINE_string(
    "output_dir", 'result',
    "The output directory where the model checkpoints will be written.")

# Other parameters

flags.DEFINE_string("train_file", '/home/gszhang/code/NLP/albert/train-v2.0.json',
                    "SQuAD json for training. E.g., train-v1.1.json")

flags.DEFINE_string(
    "predict_file", 'dev-v2.0.json',
    "SQuAD json for predictions. E.g., dev-v1.1.json or test-v1.1.json")

flags.DEFINE_string("train_feature_file", '/home/gszhang/code/NLP/albert/result/train_feature',
                    "training feature file.")

flags.DEFINE_string(
    "predict_feature_file", '/home/gszhang/code/NLP/albert/result/predict_feature',
    "Location of predict features. If it doesn't exist, it will be written. "
    "If it does exist, it will be read.")

flags.DEFINE_string(
    "predict_feature_left_file", '/home/gszhang/code/NLP/albert/result/predict_left_feature',
    "Location of predict features not passed to TPU. If it doesn't exist, it "
    "will be written. If it does exist, it will be read.")

flags.DEFINE_string(
    "init_checkpoint", 'albert_xlarge/model.ckpt-best.index',
    "Initial checkpoint (usually from a pre-trained BERT model).")

flags.DEFINE_string(
    "albert_hub_module_handle", None,
    "If set, the ALBERT hub module to use.")

flags.DEFINE_bool(
    "do_lower_case", True,
    "Whether to lower case the input text. Should be True for uncased "
    "models and False for cased models.")

flags.DEFINE_integer(
    "max_seq_length", 384,
    "The maximum total input sequence length after WordPiece tokenization. "
    "Sequences longer than this will be truncated, and sequences shorter "
    "than this will be padded.")

flags.DEFINE_integer(
    "doc_stride", 128,
    "When splitting up a long document into chunks, how much stride to "
    "take between chunks.")

flags.DEFINE_integer(
    "max_query_length", 64,
    "The maximum number of tokens for the question. Questions longer than "
    "this will be truncated to this length.")

flags.DEFINE_bool("do_train", False, "Whether to run training.")

flags.DEFINE_bool("do_predict", True, "Whether to run eval on the dev set.")

flags.DEFINE_integer("train_batch_size", 8, "Total batch size for training.")

flags.DEFINE_integer("predict_batch_size", 8,
                     "Total batch size for predictions.")

flags.DEFINE_float("learning_rate", 5e-5, "The initial learning rate for Adam.")

flags.DEFINE_float("num_train_epochs", 3.0,
                   "Total number of training epochs to perform.")

flags.DEFINE_float(
    "warmup_proportion", 0.1,
    "Proportion of training to perform linear learning rate warmup for. "
    "E.g., 0.1 = 10% of training.")

flags.DEFINE_integer("save_checkpoints_steps", 1000,
                     "How often to save the model checkpoint.")

flags.DEFINE_integer("iterations_per_loop", 1000,
                     "How many steps to make in each estimator call.")

flags.DEFINE_integer(
    "n_best_size", 20,
    "The total number of n-best predictions to generate in the "
    "nbest_predictions.json output file.")

flags.DEFINE_integer(
    "max_answer_length", 30,
    "The maximum length of an answer that can be generated. This is needed "
    "because the start and end predictions are not conditioned on one another.")

flags.DEFINE_bool("use_tpu", False, "Whether to use TPU or GPU/CPU.")

tf.flags.DEFINE_string(
    "tpu_name", None,
    "The Cloud TPU to use for training. This should be either the name "
    "used when creating the Cloud TPU, or a grpc://ip.address.of.tpu:8470 "
    "url.")

tf.flags.DEFINE_string(
    "tpu_zone", None,
    "[Optional] GCE zone where the Cloud TPU is located in. If not "
    "specified, we will attempt to automatically detect the GCE project from "
    "metadata.")

tf.flags.DEFINE_string(
    "gcp_project", None,
    "[Optional] Project name for the Cloud TPU-enabled project. If not "
    "specified, we will attempt to automatically detect the GCE project from "
    "metadata.")

tf.flags.DEFINE_string("master", None, "[Optional] TensorFlow master URL.")

flags.DEFINE_integer(
    "num_tpu_cores", 8,
    "Only used if use_tpu is True. Total number of TPU cores to use.")

flags.DEFINE_integer("start_n_top", 5, "beam size for the start positions.")

flags.DEFINE_integer("end_n_top", 5, "beam size for the end positions.")

flags.DEFINE_float("dropout_prob", 0.1, "dropout probability.")

These are the settings I used in run_squad_v2. I can't find the problem. Thanks for your help!
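
Reading the settings above, three things may be worth double-checking. These are guesses from the quoted values, not a confirmed diagnosis: the config file comes from albert_base while the vocab, SentencePiece model, and checkpoint come from albert_xlarge; init_checkpoint ends in .index, whereas TensorFlow checkpoints are normally referenced by their prefix (the path without the .index or .data suffix); and do_train is False, so the run only evaluates whatever checkpoint gets loaded. The sketch below turns those three checks into a small script. `check_settings` and `SETTINGS` are hypothetical names for illustration and are not part of run_squad_v2.

```python
from pathlib import PurePosixPath

# Values copied from the flag defaults quoted in this thread.
SETTINGS = {
    "albert_config_file": "albert_base/albert_config.json",
    "vocab_file": "albert_xlarge/30k-clean.vocab",
    "spm_model_file": "albert_xlarge/30k-clean.model",
    "init_checkpoint": "albert_xlarge/model.ckpt-best.index",
    "do_train": False,
}

# Files that would normally all belong to the same pre-trained model.
ASSET_KEYS = (
    "albert_config_file",
    "vocab_file",
    "spm_model_file",
    "init_checkpoint",
)


def check_settings(settings):
    """Return a list of warning strings for suspicious settings."""
    warnings = []

    # 1. Config, vocab, tokenizer model, and checkpoint in different folders
    #    may indicate that files from two model sizes are being mixed.
    dirs = {str(PurePosixPath(settings[key]).parent) for key in ASSET_KEYS}
    if len(dirs) > 1:
        warnings.append(
            "model assets come from different directories: %s" % sorted(dirs)
        )

    # 2. A checkpoint is usually given as a prefix, not as one of its files.
    ckpt = settings["init_checkpoint"]
    if ckpt.endswith((".index", ".meta")) or ".data-" in ckpt:
        warnings.append(
            "init_checkpoint looks like a single checkpoint file, "
            "not a checkpoint prefix: %s" % ckpt
        )

    # 3. Without training, scores depend entirely on the loaded checkpoint.
    if not settings["do_train"]:
        warnings.append(
            "do_train is False: the run only evaluates, so the checkpoint "
            "must already be fine-tuned on SQuAD to reach paper-level scores"
        )

    return warnings


for message in check_settings(SETTINGS):
    print("WARNING:", message)
```

None of these checks proves what caused the low score; they only flag settings that differ from the usual setup and are cheap to rule out before changing the code.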

Also, the feature files did not exist beforehand; they were generated when I ran the script.

Hi, I ran into the same problem. Since I run the code on a GPU, I changed TPUEstimator to Estimator and TPUEstimatorSpec to EstimatorSpec. That solved the problem, and I now get the F1 score reported in the paper.

@Huibin-Ge - Could you share the notebook or code you are using? I am having trouble fine-tuning ALBERT base on SQuAD 2.0: training does not start and the run stops abruptly without any error.
Some parameter must be wrong.

same problem

Hi, I have the same problem: the result is too low. Can you tell me how to change TPUEstimator to Estimator and TPUEstimatorSpec to EstimatorSpec?

Hi, I have published my fixed code. The change from TPUEstimator to Estimator is mainly in [link not preserved], and the change from TPUEstimatorSpec to EstimatorSpec is mainly in [link not preserved].

