But the paper reports a match score of 28.2 for Model-based TinyLlama (1.1B) under Always Retrieval. What does "match" mean here? My reproduced numbers seem inconsistent with it. Is this my misunderstanding, or an operational error on my side?
I'm sorry for not getting back to you sooner. I just saw this message.
In your experiment, you ran on the overall data, which has 2,785 instances: 1,271 labelled as requiring retrieval and 1,514 labelled as not requiring retrieval. We present results on the overall data in Table 10 in the Appendix (see Appendix A.6). In contrast, the 28.2 Always Retrieval match score for Model-based TinyLlama (1.1B) in Table 1 was evaluated only on the 1,271 questions that require retrieval.
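To reproduce the Table 1 setting, you can first filter the data down to the retrieval-required subset and then rerun the evaluation on the filtered file (via the input_data_path argument shown in your log). A minimal sketch; note that the label key `needs_retrieval` is a hypothetical name, so please check the actual field in your copy of ./data/retrievalqa.jsonl:

```python
import json

# Split retrievalqa.jsonl into the subset that requires retrieval
# (1,271 instances) and the rest (1,514 instances).
# NOTE: the label key `needs_retrieval` is an assumption; check the
# actual field name in the JSONL before running.
need, no_need = [], []
with open("./data/retrievalqa.jsonl") as f:
    for line in f:
        ex = json.loads(line)
        (need if ex.get("needs_retrieval") else no_need).append(ex)

print(f"require retrieval: {len(need)}, no retrieval: {len(no_need)}")

# Write out the retrieval-required subset and point input_data_path at it.
with open("./data/retrievalqa_need_retrieval.jsonl", "w") as f:
    for ex in need:
        f.write(json.dumps(ex) + "\n")
```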
As stated in Section 3.2, unlike strict matching, the match score measures whether a gold answer is included in the model prediction. For example, if "Canada" is the gold answer and the model predicts "The answer is Canada", the match score is 1 but the strict match score is 0.
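For concreteness, here is a minimal sketch of the two metrics. The normalization step (lowercasing and stripping punctuation) is an assumption common in QA evaluation; the repo's evaluation code may differ in detail:

```python
import string

def normalize(text: str) -> str:
    # Lowercase, strip punctuation, and collapse whitespace
    # (a common QA-eval normalization; an assumption, not the repo's exact code).
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())

def match_score(prediction: str, gold_answers: list[str]) -> int:
    # "Match": any gold answer appears as a substring of the prediction.
    pred = normalize(prediction)
    return int(any(normalize(g) in pred for g in gold_answers))

def strict_match_score(prediction: str, gold_answers: list[str]) -> int:
    # Strict match: the prediction must equal a gold answer exactly.
    pred = normalize(prediction)
    return int(any(normalize(g) == pred for g in gold_answers))

print(match_score("The answer is Canada", ["Canada"]))         # 1
print(strict_match_score("The answer is Canada", ["Canada"]))  # 0
```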
I ran bash run_lm.sh and it shows:
======= estimate no retrieval (q) API cost: 0.017889500000000003, total tokens #: 35779 ================
======= estimate always retrieval (q+context) API cost: 0.892045, total tokens #: 1784090 ================
======= total retrieval: [2785/2785] ================
{'data_source': 'retrievalqa', 'total_data_count': 2785, 'retrieval_frequency': 2785, 'retrieval_rate': 100.0, 'match_score': 59.9, 'f1_score': 15.2, 'em_score': 0.1, 'accuracy_score': 34.3, 'match_total': 1667, 'f1_total': 424.5294026557026, 'em_total': 4.0, 'accuracy_total': 954.0, 'total_q_tokens': 35779, 'total_context_tokens': 1748311, 'total_no_retrieval_tokens': 35779, 'total_always_retrieval_tokens': 1748311, 'estimate_no_retrieval_cost': 0.017889500000000003, 'estimate_always_retrieval_cost': 0.892045, 'saved_cost_rate': 0.9799455184435762, 'args': {'openai_config_path': './openai_config.txt', 'data_source': 'retrievalqa', 'retrieval_mode': 'always_retrieval', 'input_data_path': './data/retrievalqa.jsonl', 'output_score_path': './results/always_retrieval/TinyLlama/TinyLlama-1.1B-Chat-v1.0/m=vanilla/t=0.0/score_retrievalqa_seed20.json', 'output_prediction_path': './results/always_retrieval/TinyLlama/TinyLlama-1.1B-Chat-v1.0/m=vanilla/t=0.0/predict_retrievalqa_seed20.jsonl', 'model_name': 'TinyLlama/TinyLlama-1.1B-Chat-v1.0', 'max_tokens': 100, 'batch_size': 1, 'doc_top_n': 5, 'limit_input': 0, 'prompt_method': 'vanilla', 'seed': 20, 'temperature': 0.0, 'top_p': 1.0, 'world_size': 1}}
./results/always_retrieval/TinyLlama/TinyLlama-1.1B-Chat-v1.0/m=vanilla/t=0.0