
Add batch processing to MMLU and HumanEval evaluation scripts to prevent OOM errors #597

Open · LlamaEnjoyer wants to merge 3 commits into master
Conversation

LlamaEnjoyer
Contributor

This is a workaround for the OOM errors I get when running the MMLU and HumanEval tests on my Windows PC. It works! :)

@turboderp
Owner

Just out of interest, have you tried just doing a gc.collect() periodically while it's creating jobs?
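Something along these lines, as a rough sketch (`questions` and `prepare_job` are placeholders for whatever the eval script actually uses to build its job list, not the real identifiers):

```python
# Hypothetical sketch, not the actual eval code: `questions` and `prepare_job`
# stand in for the script's own job-building logic.
import gc

GC_INTERVAL = 100  # arbitrary: collect every 100 jobs

jobs = []
for i, question in enumerate(questions):
    jobs.append(prepare_job(question))
    if (i + 1) % GC_INTERVAL == 0:
        gc.collect()  # reclaim unreachable Python objects while enqueuing
```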

@LlamaEnjoyer
Contributor Author

Yes, unfortunately it didn't seem to affect anything.

@LlamaEnjoyer
Contributor Author

Basically, my committed memory in Windows fills up to circa 180 GB (which is give or take 3x my system RAM) and then it OOMs.

@jim-plus

I tried this variant out. Limiting the batch to 64 allowed me to run the test with 16 GB VRAM under Windows, while 128 resulted in OOM. I didn't try to optimize the batch size. I used torch 2.4.1 and Python 3.11 along with CUDA 12.4.

@turboderp
Owner

@jim-plus I'm curious whether the OOM you're getting is due to VRAM or system RAM. The issue this PR means to address is a system memory leak of some kind in PyTorch or HF Tokenizers (maybe SentencePiece?), which is why I'm a little hesitant to merge it. While enqueued, each sequence should use a couple of kB of system RAM at most, and zero VRAM until it's moved to the active list. So it's bizarre that a few thousand jobs can overcommit 3x the available system RAM in this way, or that limiting the length of the queue would affect VRAM allocation somehow.

It's definitely unintended behavior, and if it is a bug in ExLlama I'd rather fix it than work around it. Or if it's a memory leak in the tokenizer, perhaps tokenization could be batched in an isolated context.
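One possible shape for that isolated context, as a rough sketch only: tokenize each batch in a short-lived worker process so that anything the tokenizer leaks is returned to the OS when the worker exits. The HF `AutoTokenizer` and the model path here are stand-ins, not what the eval scripts actually use.

```python
# Rough sketch, not ExLlama code: run tokenization in a throwaway worker
# process per batch so leaked memory dies with the worker.
from concurrent.futures import ProcessPoolExecutor

def tokenize_batch(prompts):
    # construct the tokenizer inside the worker so its state is discarded with it
    from transformers import AutoTokenizer
    tok = AutoTokenizer.from_pretrained("path/to/model")  # placeholder path
    return [tok.encode(p) for p in prompts]

def tokenize_isolated(prompts, batch_size=128):
    ids = []
    for start in range(0, len(prompts), batch_size):
        batch = prompts[start:start + batch_size]
        # fresh single-worker pool per batch; the worker exits when the pool closes
        with ProcessPoolExecutor(max_workers=1) as pool:
            ids.extend(pool.submit(tokenize_batch, batch).result())
    return ids
```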

@LlamaEnjoyer
Contributor Author

@turboderp Agreed, it's always better to fix the root cause than to work around it. Still, I just wanted to have it out in the open in case someone finds it useful.

If there's anything else you'd like me to try to help debug this, I'm all ears :)

@jim-plus

Something curious is happening. Running the baseline script with the default batch of 128 will OOM when preparing questions, but that doesn't happen when I select batch size 128 for the updated script above. There seems to be a memory spike when initially preparing questions that then levels off.

@LlamaEnjoyer
Contributor Author

LlamaEnjoyer commented Sep 25, 2024

That's correct. The changes I introduced allow the script to process the 164 questions in batches (with a default batch size of 50, if not altered via a command-line argument) instead of all 164 at once as in the original version. This helps avoid the OOM that happens when the committed memory reaches the Windows limit of circa 3x the system RAM size.
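Roughly the idea, as a simplified sketch rather than the actual diff (`problems`, `build_job` and `run_jobs` are placeholders for the real script's logic):

```python
# Simplified sketch of the batching approach: enqueue and run only one batch
# of problems at a time instead of queuing everything up front.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--batch_size", type=int, default=50,
                    help="number of problems to enqueue per batch")
args = parser.parse_args()

results = []
for start in range(0, len(problems), args.batch_size):
    batch = problems[start:start + args.batch_size]
    jobs = [build_job(p) for p in batch]  # enqueue only this batch
    results.extend(run_jobs(jobs))        # run it before queuing the next one
```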
