llama3 evaluator #314

Merged: 1 commit merged into tatsu-lab:main on May 17, 2024
Conversation

zhuang-li (Contributor)

I evaluated the performance of llama3 70b as an evaluator using the pipeline for adding new evaluators. Its human agreement is 67.53, only slightly lower than that of 'alpaca_eval_gpt4_turbo_fn'. The cost is 0.41 per 1000 examples, and the speed is 208 seconds per 1000 examples. There may be ways to improve its human correlation further, such as chain-of-thought (CoT) prompting, but I ran into some bugs when using the template from 'alpaca_eval_cot_gpt4_turbo_fn'. For now, I will use this setting as the primary evaluator in my paper, since it provides a cost-effective and fast evaluation that also fits our approaching deadline. I hope this convinces reviewers of its reliability and effectiveness.
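
(For context, the agreement, cost, and speed numbers above come from the evaluator-analysis step of the "adding new evaluators" pipeline. A rough sketch of the assumed setup and invocation follows; the config directory and name are assumptions based on the existing evaluator configs and on the leaderboard filename discussed later in this thread, and the exact command the author actually used is quoted further down.)

# Assumed layout for a new evaluator config (illustrative; compare with existing
# configs such as alpaca_eval_gpt4_turbo_fn in the repo):
#   src/alpaca_eval/evaluators_configs/alpaca_eval_llama3_70b_fn/configs.yaml
#
# Measure the new annotator's human agreement, price, and time:
alpaca_eval analyze_evaluators --annotators_config alpaca_eval_llama3_70b_fn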

Collaborator

I'm not sure what happened here. This file shows the evaluators leaderboard, which should actually be here:
https://github.com/tatsu-lab/alpaca_eval/blob/main/src/alpaca_eval/leaderboards/evaluators/evaluators_leaderboard.csv

Did you change the path, or is that a mistake in the code?

Collaborator

The file at this path should actually contain the leaderboard of models as evaluated by llama3 (rather than the leaderboard of evaluators).

zhuang-li (Contributor, Author)

I first used this command:

alpaca_eval analyze_evaluators --annotators_config '<path_to_config.yaml>'

It generates a 'leaderboard.csv' in the folder of the llama3 evaluator config file.

But then I found that the leaderboard.csv should appear at 'src/alpaca_eval/leaderboards/data_AlpacaEval/<evaluator>_leaderboard.csv', given this command from the section "Contributing an evaluator":

alpaca_eval make_leaderboard \
  --leaderboard_path src/alpaca_eval/leaderboards/data_AlpacaEval/<evaluator>_leaderboard.csv \
  --all_model_outputs alpaca_eval_all_outputs.json \
  --annotators_config <evaluator_config>

So I copied the 'leaderboard.csv' from the llama3 config folder to 'src/alpaca_eval/leaderboards/data_AlpacaEval/<evaluator>_leaderboard.csv'.

I guess this was a mistake... How should we resolve it?
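
(For readers following along: 'analyze_evaluators' produces the evaluators leaderboard, i.e. agreement with human annotations, next to the evaluator's config, whereas 'make_leaderboard' produces the leaderboard of models as judged by the new evaluator, which is the file that belongs under the leaderboards directory. A rough sketch with the placeholders filled in is below; the config name and the target directory (data_AlpacaEval vs data_AlpacaEval_2) are assumptions based on the rest of this thread.)

# Builds the leaderboard of models as judged by the new evaluator
# (not the evaluators leaderboard); names below are assumed.
alpaca_eval make_leaderboard \
  --leaderboard_path src/alpaca_eval/leaderboards/data_AlpacaEval_2/alpaca_eval_llama3_70b_fn_leaderboard.csv \
  --all_model_outputs alpaca_eval_all_outputs.json \
  --annotators_config alpaca_eval_llama3_70b_fn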

@YannDubs (Collaborator)

Those are great results!!! Can you check the path issue above, and then I'll merge.

YannDubs mentioned this pull request on May 16, 2024
@YannDubs (Collaborator)

OK, I'm merging this in and I'll put the file in the right place. However, have you evaluated any model using llama 3 (beyond the evaluator analysis)? If so, you should have a leaderboard under src/alpaca_eval/leaderboards/data_AlpacaEval_2/alpaca_eval_llama3_70b_fn_leaderboard.csv; can you push that file in another PR?
Thanks @zhuang-li
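
(A rough sketch of evaluating a model with the new annotator, which is what would produce the per-model results behind such a leaderboard file; the model-outputs path is a placeholder, and the config name is assumed from the filename above.)

# Evaluate one model's outputs with the llama3 annotator; 'my_model_outputs.json'
# is a placeholder, and the config name is assumed from the leaderboard filename above.
alpaca_eval evaluate \
  --model_outputs my_model_outputs.json \
  --annotators_config alpaca_eval_llama3_70b_fn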

YannDubs merged commit c006178 into tatsu-lab:main on May 17, 2024
2 checks passed