Now, we run individual prompts through the queue. #796
Conversation
I'm confused why (1) the length of samples goes down and also (2) the time per batch goes down if the speed goes down too.
@mnoukhov it's because I accidentally changed the batch size between runs! That's not the length, but the total number of tokens generated in the batch. Nice catch.
dd81492 to 6e5cf07
mnoukhov left a comment
Did a quick pass and seems good. Definitely a good precursor to async
open_instruct/vllm_utils3.py
Outdated
dataset_indices_batch = []
eval_prompts = None

# Pull prompts until we have a full batch or queue is empty
Seems like this can have more prompts than batch_size (unless .get only gets one at a time?)
I'm also confused: when would you have a batch that isn't full, but because of the timeout you take whatever you've got?
Ok, I have fixed this! Now, PromptRequest only contains a single prompt, and I have removed the timeout, so it will block until it gets a full batch.
Makes sense, you can remove the comment about non-blocking
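The resolved behavior, as described above: each PromptRequest now carries exactly one prompt, and the engine blocks on the Ray queue until it has collected a full batch. A minimal sketch of that loop, assuming a ray.util.queue.Queue named prompt_queue and illustrative field names on PromptRequest (not the exact code in vllm_utils3.py):

from ray.util.queue import Queue


def collect_prompt_batch(prompt_queue: Queue, batch_size: int):
    """Block until batch_size single-prompt requests have been pulled off the queue."""
    prompts_batch, dataset_indices_batch = [], []
    while len(prompts_batch) < batch_size:
        # Blocks until the next single-prompt PromptRequest arrives; no timeout,
        # so we never take a partial batch.
        request = prompt_queue.get()
        prompts_batch.append(request.prompt)
        dataset_indices_batch.append(request.dataset_index)
    return prompts_batch, dataset_indices_batch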
067d9fc to 01965e9
mnoukhov left a comment
other than the little questions about specific batch sizes and divisibility, seems good to me
- inference_results_Q, pending_queries_map, args.vllm_num_engines, training_step
+ inference_results_Q,
+ pending_queries_map,
+ args.num_unique_prompts_rollout * args.num_samples_per_prompt_rollout,
is it possible that at the end of an epoch or something you will have fewer than this number of results? or are we dropping the last batch?
ShufflingIterator guarantees that they're all the same size:
from typing import Iterator, List, Optional

import numpy as np


class ShufflingIterator:
    def __init__(self, data: np.ndarray, batch_size: int, seed: Optional[int] = None):
        self.data = data.copy()
        self.batch_size = batch_size
        self.index = 0
        self.rng = np.random.default_rng(seed)
        self.rng.shuffle(self.data)

        # Ensure the effective dataset size is divisible by batch_size
        self.effective_size = len(self.data) - (len(self.data) % batch_size)

    def __iter__(self) -> Iterator[List[int]]:
        return self

    def __next__(self) -> List[int]:
        if self.index >= self.effective_size:
            self.index = 0
            self.rng.shuffle(self.data)

        end_index = self.index + self.batch_size
        batch = self.data[self.index : end_index].tolist()
        self.index = end_index

        return batch
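As a quick illustration (not from the PR): with 10 items and batch_size=4, effective_size is 8, so the two leftover indices are dropped for the epoch and every batch has exactly four elements.

data = np.arange(10)
batches = ShufflingIterator(data, batch_size=4, seed=0)
assert len(next(batches)) == 4
# Never a ragged final batch; the remainder is dropped and the data is
# reshuffled when the iterator wraps around to the next epoch.
assert len(next(batches)) == 4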
batch_size_per_engine = (
    args.num_unique_prompts_rollout * args.num_samples_per_prompt_rollout
) // args.vllm_num_engines
do we know if this is evenly divisible? and is it alright if it's not?
It should be divisible; I changed the code to raise a ValueError if it isn't, and we can handle that then.
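A minimal sketch of what that guard could look like (the exact message and placement are assumptions, not the code as merged):

total_rollouts = args.num_unique_prompts_rollout * args.num_samples_per_prompt_rollout
if total_rollouts % args.vllm_num_engines != 0:
    raise ValueError(
        f"num_unique_prompts_rollout * num_samples_per_prompt_rollout ({total_rollouts}) "
        f"must be divisible by vllm_num_engines ({args.vllm_num_engines})."
    )
batch_size_per_engine = total_rollouts // args.vllm_num_engines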
This reverts commit 541058c.
* Now, we run individual prompts through the queue.
* Fixed issues.
* Ran linter
* Fixed linter errors.
* Code lints.
* Test passes.
* Ran linter.
* Ensures that we send single prompts as requests.
* Now, code lints.
* Cleaned up code.
* Fixes test.
* Linter passes.
* Cleaned test up.
* Removed redundant comments.
…" (allenai#804) This reverts commit d3a349a.
Changes the queue setup so that we run individual prompts through the queues rather than batches. This will avoid the need for load balancing and should enable us to get some nice speedups once we enable the async engine (next PR!).
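Conceptually, the producer goes from putting one batched request per engine on the queue to putting one PromptRequest per prompt, so any idle engine can pull the next prompt without explicit load balancing. A rough sketch of the enqueue side (queue and field names are illustrative, not the exact code):

# Before: one request per engine, each carrying a whole batch (requires load balancing).
# After: one request per prompt; whichever engine is free pulls the next one.
for dataset_index, prompt in zip(dataset_indices, prompts):
    prompt_queue.put(
        PromptRequest(prompt=prompt, dataset_index=dataset_index, training_step=training_step)
    )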
Benchmark results! From HEAD (spoiler, basically no change):
From this PR: