[Problem / SQuAD v2] The result is much lower than the F1 score in the paper. Is something wrong? #230

Gs-Zhang · 2020-09-22T07:45:26Z

I0922 11:46:38.663871 140308634334976 run_squad_v2.py:505] ***** Final Eval results *****
INFO:tensorflow: exact = 50.09685841825992
I0922 11:46:38.663987 140308634334976 run_squad_v2.py:507] exact = 50.09685841825992
INFO:tensorflow: f1 = 50.11359538016659
I0922 11:46:38.664040 140308634334976 run_squad_v2.py:507] f1 = 50.11359538016659
INFO:tensorflow: null_score_diff_threshold = -1.230899453163147
I0922 11:46:38.664077 140308634334976 run_squad_v2.py:507] null_score_diff_threshold = -1.230899453163147
INFO:tensorflow: total = 11873
I0922 11:46:38.664113 140308634334976 run_squad_v2.py:507] total = 11873
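One way to read these numbers: the SQuAD 2.0 dev set has 11,873 questions (the `total` above). The official evaluation script reports 5,945 of them as unanswerable (NoAns_total) and 5,928 as answerable (HasAns_total). A model that predicts "no answer" for every question is exactly right on each unanswerable one and scores 0 on each answerable one, so it gets about 50.07 on both EM and F1. The plain-Python arithmetic below suggests the run above is close to that degenerate baseline: the reported exact score corresponds to 5,948 exact matches, and F1 barely exceeds EM. That is consistent with the model almost never producing a non-empty answer. It is a diagnostic reading only, not a confirmed cause.

```python
# SQuAD 2.0 dev-set composition, as reported by the official evaluation
# script (HasAns_total / NoAns_total).
TOTAL = 11873
NO_ANSWER = 5945
HAS_ANSWER = TOTAL - NO_ANSWER  # 5928

# Predicting the empty answer everywhere is exactly right on every
# unanswerable question and scores 0 on every answerable one, so EM and F1
# are both equal to the unanswerable fraction.
all_null_em = 100.0 * NO_ANSWER / TOTAL

# Number of exact matches implied by the score in the log above.
reported_exact = 50.09685841825992
implied_matches = round(reported_exact * TOTAL / 100)

print(f"all-null baseline EM/F1: {all_null_em:.2f}")  # 50.07
print(f"implied exact matches: {implied_matches} of {TOTAL}")  # 5948 of 11873
```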

Gs-Zhang · 2020-09-22T07:47:44Z

flags.DEFINE_string(
    "albert_config_file", 'albert_base/albert_config.json',
    "The config json file corresponding to the pre-trained ALBERT model. "
    "This specifies the model architecture.")

flags.DEFINE_string("vocab_file", 'albert_xlarge/30k-clean.vocab',
                    "The vocabulary file that the ALBERT model was trained on.")

flags.DEFINE_string("spm_model_file", 'albert_xlarge/30k-clean.model',
                    "The model file for sentence piece tokenization.")

flags.DEFINE_string(
    "output_dir", 'result',
    "The output directory where the model checkpoints will be written.")

## Other parameters

flags.DEFINE_string("train_file", '/home/gszhang/code/NLP/albert/train-v2.0.json',
                    "SQuAD json for training. E.g., train-v1.1.json")

flags.DEFINE_string(
    "predict_file", 'dev-v2.0.json',
    "SQuAD json for predictions. E.g., dev-v1.1.json or test-v1.1.json")

flags.DEFINE_string("train_feature_file", '/home/gszhang/code/NLP/albert/result/train_feature',
                    "training feature file.")

flags.DEFINE_string(
    "predict_feature_file", '/home/gszhang/code/NLP/albert/result/predict_feature',
    "Location of predict features. If it doesn't exist, it will be written. "
    "If it does exist, it will be read.")

flags.DEFINE_string(
    "predict_feature_left_file", '/home/gszhang/code/NLP/albert/result/predict_left_feature',
    "Location of predict features not passed to TPU. If it doesn't exist, it "
    "will be written. If it does exist, it will be read.")

flags.DEFINE_string(
    "init_checkpoint", 'albert_xlarge/model.ckpt-best.index',
    "Initial checkpoint (usually from a pre-trained BERT model).")

flags.DEFINE_string(
    "albert_hub_module_handle", None,
    "If set, the ALBERT hub module to use.")

flags.DEFINE_bool(
    "do_lower_case", True,
    "Whether to lower case the input text. Should be True for uncased "
    "models and False for cased models.")

flags.DEFINE_integer(
    "max_seq_length", 384,
    "The maximum total input sequence length after WordPiece tokenization. "
    "Sequences longer than this will be truncated, and sequences shorter "
    "than this will be padded.")

flags.DEFINE_integer(
    "doc_stride", 128,
    "When splitting up a long document into chunks, how much stride to "
    "take between chunks.")

flags.DEFINE_integer(
    "max_query_length", 64,
    "The maximum number of tokens for the question. Questions longer than "
    "this will be truncated to this length.")

flags.DEFINE_bool("do_train", False, "Whether to run training.")

flags.DEFINE_bool("do_predict", True, "Whether to run eval on the dev set.")

flags.DEFINE_integer("train_batch_size", 8, "Total batch size for training.")

flags.DEFINE_integer("predict_batch_size", 8,
                     "Total batch size for predictions.")

flags.DEFINE_float("learning_rate", 5e-5, "The initial learning rate for Adam.")

flags.DEFINE_float("num_train_epochs", 3.0,
                   "Total number of training epochs to perform.")

flags.DEFINE_float(
    "warmup_proportion", 0.1,
    "Proportion of training to perform linear learning rate warmup for. "
    "E.g., 0.1 = 10% of training.")

flags.DEFINE_integer("save_checkpoints_steps", 1000,
                     "How often to save the model checkpoint.")

flags.DEFINE_integer("iterations_per_loop", 1000,
                     "How many steps to make in each estimator call.")

flags.DEFINE_integer(
    "n_best_size", 20,
    "The total number of n-best predictions to generate in the "
    "nbest_predictions.json output file.")

flags.DEFINE_integer(
    "max_answer_length", 30,
    "The maximum length of an answer that can be generated. This is needed "
    "because the start and end predictions are not conditioned on one another.")

flags.DEFINE_bool("use_tpu", False, "Whether to use TPU or GPU/CPU.")

tf.flags.DEFINE_string(
    "tpu_name", None,
    "The Cloud TPU to use for training. This should be either the name "
    "used when creating the Cloud TPU, or a grpc://ip.address.of.tpu:8470 "
    "url.")

tf.flags.DEFINE_string(
    "tpu_zone", None,
    "[Optional] GCE zone where the Cloud TPU is located in. If not "
    "specified, we will attempt to automatically detect the GCE project from "
    "metadata.")

tf.flags.DEFINE_string(
    "gcp_project", None,
    "[Optional] Project name for the Cloud TPU-enabled project. If not "
    "specified, we will attempt to automatically detect the GCE project from "
    "metadata.")

tf.flags.DEFINE_string("master", None, "[Optional] TensorFlow master URL.")

flags.DEFINE_integer(
    "num_tpu_cores", 8,
    "Only used if use_tpu is True. Total number of TPU cores to use.")

flags.DEFINE_integer("start_n_top", 5, "beam size for the start positions.")

flags.DEFINE_integer("end_n_top", 5, "beam size for the end positions.")

flags.DEFINE_float("dropout_prob", 0.1, "dropout probability.")

This is what I set in run_squad_v2.py. I can't find the problem. Thanks for your help!
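Two details in the flags above may be worth checking first. `albert_config_file`, which according to its own help text "specifies the model architecture," points into `albert_base/`. The vocabulary, SentencePiece model, and `init_checkpoint` all point into `albert_xlarge/`, and the config normally has to match the checkpoint it is used with. Also, `init_checkpoint` names the `.index` file, whereas TensorFlow checkpoints are conventionally referenced by their prefix (here `albert_xlarge/model.ckpt-best`). The helpers below (`checkpoint_prefix`, `model_dirs`) are hypothetical and not part of run_squad_v2.py. They are a minimal plain-Python sketch of such a sanity check.

```python
import os
import re


def checkpoint_prefix(path):
    """Return the checkpoint prefix, dropping a '.index' or '.data-*' suffix.

    A TensorFlow checkpoint is stored as <prefix>.index plus one or more
    <prefix>.data-XXXXX-of-YYYYY shards; loaders are normally given <prefix>.
    """
    return re.sub(r"\.(index|data-\d{5}-of-\d{5})$", "", path)


def model_dirs(paths):
    """Return the sorted distinct directories the model files come from.

    More than one directory suggests files from different ALBERT sizes are
    being mixed (hypothetical check).
    """
    return sorted({os.path.dirname(checkpoint_prefix(p)) for p in paths.values()})


# Values copied from the flags above.
flags_used = {
    "albert_config_file": "albert_base/albert_config.json",
    "vocab_file": "albert_xlarge/30k-clean.vocab",
    "spm_model_file": "albert_xlarge/30k-clean.model",
    "init_checkpoint": "albert_xlarge/model.ckpt-best.index",
}

print(checkpoint_prefix(flags_used["init_checkpoint"]))  # albert_xlarge/model.ckpt-best
print(model_dirs(flags_used))  # ['albert_base', 'albert_xlarge']
```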

Gs-Zhang · 2020-09-22T07:49:07Z

Also, the feature files did not exist beforehand; they were generated when I ran the script.

Huibin-Ge · 2020-11-29T15:01:34Z

Hi, I ran into the same problem. Since I run the code on a GPU, I changed TPUEstimator to Estimator and TPUEstimatorSpec to EstimatorSpec. That solved the problem, and I now get the F1 score reported in the paper.
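For readers running on GPU/CPU, the Estimator half of that change in run_squad_v2.py might look roughly like the sketch below. It is not the commenter's code. It assumes TensorFlow 1.x-style estimators (`tensorflow.compat.v1`) and that the ALBERT `model_fn` and input functions read `params["batch_size"]`, which TPUEstimator fills in automatically but a plain `tf.estimator.Estimator` does not. A plain Estimator also expects `model_fn` to return an `EstimatorSpec` rather than a `TPUEstimatorSpec`, which is presumably why both classes had to be changed. The helper names `input_fn_params` and `build_gpu_estimator` are hypothetical.

```python
def input_fn_params(batch_size):
    """Params dict handed to model_fn/input_fn by a plain Estimator.

    TPUEstimator injects params["batch_size"] itself; tf.estimator.Estimator
    does not, so it is supplied explicitly here (assumption: the ALBERT
    input_fn reads params["batch_size"]).
    """
    return {"batch_size": batch_size}


def build_gpu_estimator(model_fn, output_dir, batch_size,
                        save_checkpoints_steps=1000):
    """Hypothetical replacement for the TPUEstimator built in run_squad_v2.py."""
    # Imported lazily; assumes TF 1.x-style estimators are available.
    import tensorflow.compat.v1 as tf

    run_config = tf.estimator.RunConfig(
        model_dir=output_dir,
        save_checkpoints_steps=save_checkpoints_steps)
    # Unlike TPUEstimator, Estimator takes no use_tpu, train_batch_size or
    # predict_batch_size arguments; the batch size travels through params.
    return tf.estimator.Estimator(
        model_fn=model_fn,
        config=run_config,
        params=input_fn_params(batch_size))
```

For a predict-only run such as the one above (`do_train` is False), this would be called with `batch_size=FLAGS.predict_batch_size`; a training run would build its estimator with `FLAGS.train_batch_size` instead, so each phase's input function sees the batch size it expects.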

PremalMatalia · 2021-04-25T18:49:02Z

@Huibin-Ge, could you share the notebook or code you are using? I am having trouble fine-tuning ALBERT base on SQuAD 2.0: training doesn't start, and the run stops abruptly without any error.
Some parameter must be wrong.

marvel2120 · 2021-08-10T00:52:49Z

Same problem here.

kavin525zhang · 2022-07-25T06:44:37Z

Hi, I have the same problem: the result is too low. Can you tell me how to change TPUEstimator to Estimator and TPUEstimatorSpec to EstimatorSpec?

huibinGe · 2022-07-26T02:12:12Z

Hi, I published my fixed code at https://github.com/huibinGe/albert_gpu_squad. The TPUEstimator-to-Estimator change is mainly in run_squad_v2.py, and the TPUEstimatorSpec-to-EstimatorSpec change is mainly in squad_utils.py.
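On the squad_utils.py side, each `TPUEstimatorSpec(...)` returned by the model function would become a `tf.estimator.EstimatorSpec(...)`. One difference needs handling: TPUEstimatorSpec takes a `scaffold_fn` callable, whereas EstimatorSpec takes a ready-made `scaffold`. The sketch below shows that conversion with a hypothetical helper `to_estimator_spec`, assuming TensorFlow 1.x-style estimators. It covers only the TRAIN and PREDICT cases (an evaluation spec's `eval_metrics` would map to `eval_metric_ops` and is omitted). The `spec_cls` argument exists only so the logic can be exercised without TensorFlow installed.

```python
def to_estimator_spec(mode, predictions=None, loss=None, train_op=None,
                      scaffold_fn=None, spec_cls=None):
    """Build an EstimatorSpec from the arguments a TPUEstimatorSpec would get.

    Hypothetical helper covering the TRAIN and PREDICT cases only.
    """
    if spec_cls is None:
        # Imported lazily; assumes TF 1.x-style estimators are available.
        import tensorflow.compat.v1 as tf
        spec_cls = tf.estimator.EstimatorSpec
    # TPUEstimatorSpec defers Scaffold creation to a callable; EstimatorSpec
    # wants the Scaffold object itself (None means the default Scaffold).
    scaffold = scaffold_fn() if scaffold_fn is not None else None
    return spec_cls(mode=mode, predictions=predictions, loss=loss,
                    train_op=train_op, scaffold=scaffold)
```

In predict mode, a line such as `TPUEstimatorSpec(mode=mode, predictions=predictions, scaffold_fn=scaffold_fn)` would then correspond to `to_estimator_spec(mode, predictions=predictions, scaffold_fn=scaffold_fn)`.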
