Conversation
@finbarrtimbers (Collaborator) commented Jul 21, 2025

Reverts #804, which reverted #796.

The main issue was really dumb: we were missing a return statement in the sync_weights_and_prepare_prompts function (sketched below). I will try to figure out a way to test this, but I want to submit this so we are unblocked.
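A minimal sketch of the failure mode, with hypothetical argument names and engine calls (not the actual open_instruct code):

```python
# Sketch: the function did all its work but never returned the batch,
# so every caller silently received None.
def sync_weights_and_prepare_prompts(policy, vllm_engines, prompt_queue):
    # Push the latest policy weights to the inference engines.
    for engine in vllm_engines:
        engine.update_weights(policy.state_dict())  # assumed engine API
    # Pull the next batch of prompts for generation.
    batch = prompt_queue.get()
    return batch  # <-- this return statement was the missing piece
```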

Repro run.

@finbarrtimbers force-pushed the revert-804-revert-796-insert-prompts branch from 0f2810d to 7dd7e32 on July 22, 2025 20:34
@finbarrtimbers requested a review from hamishivi July 22, 2025 21:07
@finbarrtimbers marked this pull request as ready for review July 22, 2025 21:07
        dataset_index=dataset_index,
    )
    results.append(
        GenerationResult(
Collaborator

I think there's a mismatch in the queues: here, all the outputs for a given prompt are put together in one big list in responses, but then in grpo_fast, inference_results_q is pulled from args.num_unique_prompts_rollout * args.num_samples_per_prompt_rollout times, which is more than is actually in the queue (it only has one entry per prompt, not per response).
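To illustrate the mismatch, a minimal standalone sketch with made-up counts (not the actual grpo_fast code):

```python
import queue

NUM_UNIQUE_PROMPTS = 4       # stands in for args.num_unique_prompts_rollout
NUM_SAMPLES_PER_PROMPT = 8   # stands in for args.num_samples_per_prompt_rollout

inference_results_q = queue.Queue()

# Producer side: one queue entry per *prompt*, holding all of its samples.
for prompt_id in range(NUM_UNIQUE_PROMPTS):
    inference_results_q.put(
        [f"sample {i} of prompt {prompt_id}" for i in range(NUM_SAMPLES_PER_PROMPT)]
    )

# Consumer side (as described above): one get() per *response*. Only 4
# entries exist, so the 5th of these 32 gets would block forever.
# for _ in range(NUM_UNIQUE_PROMPTS * NUM_SAMPLES_PER_PROMPT):
#     result = inference_results_q.get()
```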

Collaborator Author

Agreed, fixed!

@hamishivi (Collaborator) left a comment

Some extra changes needed to get stuff working!

Also, in make_reward_fn, you need to make the following change:

```diff
-_, timeouts, tool_errors, tool_outputs, _, tool_calleds = infos
+timeouts = infos.timeouts
+tool_errors = infos.tool_errors
+tool_outputs = infos.tool_outputs
+tool_calleds = infos.tool_calleds
```
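For illustration, a sketch of why the unpacking breaks, assuming infos changed from a positional tuple to a container with named fields (the RequestInfo name and fields here are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class RequestInfo:
    timeouts: list = field(default_factory=list)
    tool_errors: list = field(default_factory=list)
    tool_outputs: list = field(default_factory=list)
    tool_calleds: list = field(default_factory=list)

infos = RequestInfo()
timeouts = infos.timeouts  # attribute access works
# _, timeouts, tool_errors, tool_outputs, _, tool_calleds = infos
# ^ raises TypeError: a plain dataclass instance is not iterable
```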


# Start vLLM engines to process from queues
batch_size_per_engine = (
    args.num_unique_prompts_rollout * args.num_samples_per_prompt_rollout
Collaborator

I think this should just be args.num_unique_prompts_rollout, since we send only the prompts to the engine (and then ask it to produce multiple rollouts via SamplingParams).
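A sketch of the intended setup, with a SimpleNamespace standing in for the real args (field names taken from this PR) and vLLM's SamplingParams:

```python
from types import SimpleNamespace
from vllm import SamplingParams

# Stand-in for the real args object; the values are illustrative.
args = SimpleNamespace(
    num_unique_prompts_rollout=128,
    num_samples_per_prompt_rollout=8,
    vllm_num_engines=8,
)

# vLLM fans a single prompt out into multiple rollouts via n=..., so the
# per-engine batch should count prompts, not prompt-sample pairs.
sampling_params = SamplingParams(n=args.num_samples_per_prompt_rollout, temperature=1.0)
batch_size_per_engine = args.num_unique_prompts_rollout // args.vllm_num_engines  # 16
```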

Collaborator Author

Nice catch, fixed.

@finbarrtimbers (Collaborator Author) commented

> Some extra changes needed to get stuff working!
>
> Also, in make_reward_fn, need to make the following change:
>
> -_, timeouts, tool_errors, tool_outputs, _, tool_calleds = infos
> +timeouts = infos.timeouts
> +tool_errors = infos.tool_errors
> +tool_outputs = infos.tool_outputs
> +tool_calleds = infos.tool_calleds

@hamishivi I don't follow what you mean by this!

@finbarrtimbers (Collaborator Author) commented

NVM, I understand. Fixed.

@hamishivi (Collaborator) left a comment

Nice! Happy to merge assuming this is tested and runs fine on a small job :)

    code_tool_api_endpoint: Optional[str] = None

    def __post_init__(self):
        if self.num_unique_prompts_rollout % self.vllm_num_engines != 0:
Collaborator

Thinking on this some more, can we remove this requirement? Just round-robin the extra prompts onto some of the engines? I've already hit a case where this bugs me (128 prompts, 24 vLLM engines).

Collaborator Author

How should we allocate the prompts to each machine? Should I add an inference_batch_size parameter?

Collaborator

Can't we just do a best-effort even load balance? Divide evenly, then add one to the batch size of the first n engines when sending out the prompts, until all prompts have been sent; something like the sketch below.
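A minimal sketch of that balancing scheme (the helper name is hypothetical):

```python
def split_prompts(prompts, num_engines):
    """Best-effort even split: the first len(prompts) % num_engines
    engines each receive one extra prompt."""
    base, extra = divmod(len(prompts), num_engines)
    batches, start = [], 0
    for i in range(num_engines):
        size = base + (1 if i < extra else 0)
        batches.append(prompts[start:start + size])
        start += size
    return batches

# 128 prompts over 24 engines -> 8 batches of 6 and 16 batches of 5.
sizes = [len(b) for b in split_prompts(list(range(128)), 24)]
assert sizes.count(6) == 8 and sizes.count(5) == 16
```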

@hamishivi (Collaborator) left a comment

I also get an error like:

2025-07-23T05:37:13.010Z   File "/stage/open_instruct/grpo_fast.py", line 2217, in main
2025-07-23T05:37:13.010Z     maybe_evaluate(
2025-07-23T05:37:13.010Z   File "/stage/open_instruct/grpo_fast.py", line 1912, in maybe_evaluate
2025-07-23T05:37:13.011Z     df = pd.DataFrame(table)
2025-07-23T05:37:13.011Z          ^^^^^^^^^^^^^^^^^^^
2025-07-23T05:37:13.011Z   File "/opt/miniconda3/lib/python3.12/site-packages/pandas/core/frame.py", line 778, in __init__
2025-07-23T05:37:13.011Z     mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
2025-07-23T05:37:13.011Z           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-07-23T05:37:13.011Z   File "/opt/miniconda3/lib/python3.12/site-packages/pandas/core/internals/construction.py", line 503, in dict_to_mgr
2025-07-23T05:37:13.011Z     return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
2025-07-23T05:37:13.011Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-07-23T05:37:13.011Z   File "/opt/miniconda3/lib/python3.12/site-packages/pandas/core/internals/construction.py", line 114, in arrays_to_mgr
2025-07-23T05:37:13.011Z     index = _extract_index(arrays)
2025-07-23T05:37:13.011Z             ^^^^^^^^^^^^^^^^^^^^^^
2025-07-23T05:37:13.011Z   File "/opt/miniconda3/lib/python3.12/site-packages/pandas/core/internals/construction.py", line 677, in _extract_index
2025-07-23T05:37:13.011Z     raise ValueError("All arrays must be of the same length")
2025-07-23T05:37:13.011Z ValueError: All arrays must be of the same length

running this branch. I think the maybe_evaluate function also needs to handle the multiple outputs somehow? Not sure of the exact cause, but a guess is sketched below.
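For what it's worth, a hypothetical repro of that pandas error, assuming the evaluation table mixes per-prompt and per-sample columns:

```python
import pandas as pd

# With n samples per prompt, per-sample columns are n times longer than
# per-prompt columns, and pd.DataFrame rejects ragged dict values.
table = {
    "prompt": ["p0", "p1"],                    # one entry per prompt
    "response": ["r0a", "r0b", "r1a", "r1b"],  # one entry per sample
}
try:
    df = pd.DataFrame(table)
except ValueError as e:
    print(e)  # All arrays must be of the same length
```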

@finbarrtimbers (Collaborator Author) commented

Closing in favour of #859.

@finbarrtimbers deleted the revert-804-revert-796-insert-prompts branch August 18, 2025 14:09