llama3 evaluator #314

zhuang-li · 2024-05-16T08:14:12Z

I evaluated llama3 70b as an evaluator using the pipeline for adding new evaluators. The human agreement is 67.53, only slightly lower than that of 'alpaca_eval_gpt4_turbo_fn'. The cost is 0.41 per 1000 examples, and the speed is 208 seconds per 1000 examples. There may be ways to improve its human agreement further, such as chain-of-thought (CoT) prompting, but I ran into some bugs when using the template from 'alpaca_eval_cot_gpt4_turbo_fn'. For now, I will use this setting as the primary evaluator in my paper, since it gives a cost-effective and fast evaluation, which also helps given the approaching deadline. I hope this convinces the reviewers of its reliability and effectiveness.
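
To put the per-1000-example figures in context, here is a rough back-of-the-envelope scaling. The values 0.41 and 208 are the ones reported above; the function name and the eval-set size of 805 examples are assumptions made for illustration, and linear scaling is assumed as well.

```python
# Rough scaling of the reported per-1000-example figures to one evaluation run.
COST_PER_1000 = 0.41      # reported cost per 1000 examples (unit as reported)
SECONDS_PER_1000 = 208.0  # reported annotation time per 1000 examples


def estimate_run(n_examples):
    """Return (cost, seconds) for annotating n_examples, assuming linear scaling."""
    scale = n_examples / 1000
    return COST_PER_1000 * scale, SECONDS_PER_1000 * scale


if __name__ == "__main__":
    # 805 is an assumed eval-set size, not a number taken from this thread.
    cost, seconds = estimate_run(805)
    print(f"cost ~ {cost:.2f}, time ~ {seconds / 60:.1f} min")
```

Under these assumptions a single run stays well below the per-1000 figures, which is the basis for calling the setup cost-effective and fast.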

YannDubs · 2024-05-16T20:42:35Z

src/alpaca_eval/leaderboards/data_AlpacaEval/alpaca_eval_llama3_70b_fn_leaderboard.csv

I'm not sure what happened here. This file shows the evaluators leaderboard, which should actually be here:
https://github.com/tatsu-lab/alpaca_eval/blob/main/src/alpaca_eval/leaderboards/evaluators/evaluators_leaderboard.csv

Did you change the path, or is that a mistake in the code?

The file at the current path should instead contain a leaderboard of models as evaluated by llama3 (rather than the leaderboard of evaluators).

zhuang-li · 2024-05-17

I first ran this command:

alpaca_eval analyze_evaluators --annotators_config '<path_to_config.yaml>'

It generates a 'leaderboard.csv' under the folder for the llama3 evaluation config file.

But then I saw that, according to this command in the section "Contributing an evaluator", the leaderboard.csv should appear in 'src/alpaca_eval/leaderboards/data_AlpacaEval/_leaderboard.csv':

alpaca_eval make_leaderboard \
  --leaderboard_path src/alpaca_eval/leaderboards/data_AlpacaEval/<evaluator>_leaderboard.csv \
  --all_model_outputs alpaca_eval_all_outputs.json \
  --annotators_config <evaluator_config>

So I copied the 'leaderboard.csv' from the folder of llama3 config file to 'src/alpaca_eval/leaderboards/data_AlpacaEval/_leaderboard.csv'

I guess this was a mistake... How should we resolve it?
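
One way to catch this kind of mix-up before pushing is to look at the header of the CSV. The sketch below is not part of alpaca_eval. The function name is hypothetical, and the marker columns it looks for ("Human agreement" for an evaluators leaderboard, "win_rate" for a models leaderboard) are assumptions about the two file layouts that should be checked against the actual files in the repository.

```python
import csv

# Assumed marker columns; verify against the real leaderboard files.
EVALUATOR_MARKER = "Human agreement"
MODEL_MARKER = "win_rate"


def leaderboard_kind(csv_path):
    """Guess whether a CSV is an evaluators or a models leaderboard from its header."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        # An empty file yields an empty header instead of raising StopIteration.
        header = next(csv.reader(f), [])
    columns = {name.strip() for name in header}
    if EVALUATOR_MARKER in columns:
        return "evaluators"
    if MODEL_MARKER in columns:
        return "models"
    return "unknown"
```

If the assumptions hold, running this on the file copied into data_AlpacaEval/ would report "evaluators", which signals that it belongs with the evaluators leaderboard and not in the models leaderboard folder.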

YannDubs · 2024-05-16T20:43:12Z

Those are great results!!! Can you check the path issue above? Then I'll merge.

YannDubs · 2024-05-17T04:15:23Z

OK, I'm merging this and I'll put this file in the right place. However, have you evaluated any model using llama 3 (beyond the analysis)? If so, you should have the leaderboard under src/alpaca_eval/leaderboards/data_AlpacaEval_2/alpaca_eval_llama3_70b_fn_leaderboard.csv. Can you push that file in another PR?
Thanks @zhuang-li

llama3 evaluator

ad9825c

YannDubs reviewed May 16, 2024


YannDubs mentioned this pull request May 16, 2024

Update README.md #315

Merged

YannDubs merged commit c006178 into tatsu-lab:main May 17, 2024
2 checks passed
