Issue with Model Execution on the HumanEval Benchmark #3201

lucas-s-p · 2024-12-06T11:42:54Z

I’m experiencing an issue while running some models on the HumanEval benchmark in HELM. Most of the predictions generated by the model are empty. However, when I test the same model on Colab, it produces a prediction for a specific input that fails on HELM. Other models run successfully on HELM and generate correct predictions for the given inputs. I’ve tried adjusting several parameters, but I still can’t figure out the cause. Does anyone know why this might be happening?

yifanmai · 2024-12-07T00:36:01Z

Hi @lucas-s-p, does this issue also show up when you try to evaluate the same models on other scenarios (e.g. mmlu), or does this only happen for HumanEval?

lucas-s-p · 2024-12-07T01:10:09Z

Hi @yifanmai! I’ve tested the model on another scenario in HELM, the APPS benchmark, and it works correctly there, generating proper predictions. The empty predictions occur on the HumanEval benchmark. I’m using the scenario added in this PR: "add scenarios HumanEval and APPS".

yifanmai · 2024-12-07T05:32:58Z

Unfortunately, most of the authors of the HumanEval integration in HELM are no longer active in this repository, so there isn't much maintenance or support available for this scenario.

cc @lxuechen @q-hwang @teetone in case one of you can help.

yifanmai added the user question label Dec 7, 2024
