I’m running into an issue when evaluating some models on the HumanEval benchmark in HELM: most of the predictions the model generates are empty, with no output at all. When I test the same model on Colab, however, it produces a prediction for a specific input that returned nothing on HELM. Other models run successfully on HELM and generate correct predictions for the same inputs. I’ve tried adjusting several parameters but still can’t figure out the cause. Does anyone know why this might be happening?
Hi @lucas-s-p, does this issue also show up when you try to evaluate the same models on other scenarios (e.g. mmlu), or does this only happen for HumanEval?
Hi @yifanmai! I’ve tested the model on another scenario in HELM, the APPS benchmark, and it works correctly, generating proper predictions. The issue with empty predictions occurs on the HumanEval benchmark. I’m using the scenario added in this PR: "add scenarios HumanEval and APPS".
Unfortunately most of the authors of the HumanEval integration in HELM are no longer active in this repository, so there isn't much in the way of maintenance or support for this scenario...