Issue with Model Execution on the HumanEval Benchmark #3201

Open
lucas-s-p opened this issue Dec 6, 2024 · 3 comments

Comments

@lucas-s-p

I’m experiencing an issue while running some models on the HumanEval benchmark in HELM: most of the predictions generated by the model are empty. However, when I tested the same model on Colab, it produced a prediction for a specific input that returns an empty prediction on HELM. Other models run successfully on HELM and generate correct predictions for the given inputs. I’ve tried adjusting several parameters, but I still can’t figure out the cause. Does anyone know why this might be happening?

@yifanmai
Collaborator

yifanmai commented Dec 7, 2024

Hi @lucas-s-p, does this issue also show up when you try to evaluate the same models on other scenarios (e.g. mmlu), or does this only happen for HumanEval?

@lucas-s-p
Author

Hi @yifanmai! I’ve tested the model on another scenario in HELM, the APPS benchmark, and it works correctly, generating proper predictions. The issue with empty predictions occurs on the HumanEval benchmark. I’m using the scenario added in this PR: add scenarios HumanEval and APPS

@yifanmai
Collaborator

yifanmai commented Dec 7, 2024

Unfortunately, most of the authors of the HumanEval integration in HELM are no longer active in this repository, so there isn't much maintenance or support available for this scenario...

cc @lxuechen @q-hwang @teetone in case one of you can help.
