[CI/Build] Update Llama4 eval yaml #27070
Signed-off-by: zhewenli <zhewenli@meta.com>
💡 Codex Review
https://github.com/vllm-project/vllm/blob/45f1d12f2b8c11143ee308e7c2b7586377e9735a/.buildkite/lm-eval-harness/configs/Meta-Llama-4-Maverick-17B-128E-Instruct-FP8.yaml#L7-L12
Update MMLU baseline to measured accuracy
The updated instructions and test plan document MMLU‑Pro results of about 0.7743 exact_match, yet the YAML still asserts a ground truth of 0.80. test_lm_eval_correctness.py consumes this file and checks np.isclose against the value field, so the regression test will continue to fail even when the model reproduces the numbers reported in this commit. Please lower the expected value (or justify raising the actual score) so the baseline reflects the measured metric.
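To make the mismatch concrete, here is a minimal sketch of the comparison described above. It assumes the check follows the formula NumPy documents for `np.isclose`, `|a - b| <= atol + rtol * |b|`, with the YAML value as `a` and the measured score as `b`. The helper `is_close` and the tolerances tried below are hypothetical; the tolerance actually used by test_lm_eval_correctness.py is not shown in this thread, so treat the results as illustrative only.

```python
def is_close(a: float, b: float, rtol: float = 1e-5, atol: float = 1e-8) -> bool:
    """Pure-Python mirror of the formula NumPy documents for np.isclose."""
    return abs(a - b) <= atol + rtol * abs(b)


EXPECTED = 0.80    # value currently asserted in the YAML
MEASURED = 0.7743  # approximate exact_match reported in the test plan

# Hypothetical tolerances for illustration; the real test's rtol is an unknown here.
for rtol in (1e-5, 0.01, 0.05):
    print(f"rtol={rtol}: passes={is_close(EXPECTED, MEASURED, rtol=rtol)}")
```

The gap between the two numbers is about 0.0257, so under this formula the check only passes when the relative tolerance is above roughly 0.033. Whether the regression test fails therefore depends on the tolerance configured in the test, which is worth confirming before changing the baseline.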
Signed-off-by: zhewenli <zhewenli@meta.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
Purpose
Fix some missing pieces in #21810
Test Plan
ChartQA:
MMLU-Pro: