Conversation
@finbarrtimbers (Collaborator) commented Jul 21, 2025

Reverts #804, which reverted #796.

The main issue was really dumb: we were missing a return statement in the sync_weights_and_prepare_prompts function (sketched below). I will try to figure out a way to test this, but I want to submit this so we are unblocked.
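A minimal sketch of the failure mode, with hypothetical argument names and engine calls (not the actual open_instruct code):

```python
# Sketch: the function did all its work but never returned the batch,
# so every caller silently received None.
def sync_weights_and_prepare_prompts(policy, vllm_engines, prompt_queue):
    # Push the latest policy weights to the inference engines.
    for engine in vllm_engines:
        engine.update_weights(policy.state_dict())  # assumed engine API
    # Pull the next batch of prompts for generation.
    batch = prompt_queue.get()
    return batch  # <-- this return statement was the missing piece
```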

Repro run.

@finbarrtimbers force-pushed the revert-804-revert-796-insert-prompts branch from 0f2810d to 7dd7e32 on July 22, 2025 20:34
@finbarrtimbers requested a review from hamishivi July 22, 2025 21:07
@finbarrtimbers marked this pull request as ready for review July 22, 2025 21:07
        dataset_index=dataset_index,
    )
    results.append(
        GenerationResult(
Collaborator

I think there's a mismatch in the queues: here, all the outputs for a given prompt are put together in one big list in responses, but then in grpo_fast, inference_results_q is pulled from args.num_unique_prompts_rollout * args.num_samples_per_prompt_rollout times, which is more than is actually in the queue (it only has one entry per prompt, not per response).
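To illustrate the mismatch, a minimal standalone sketch with made-up counts (not the actual grpo_fast code):

```python
import queue

NUM_UNIQUE_PROMPTS = 4       # stands in for args.num_unique_prompts_rollout
NUM_SAMPLES_PER_PROMPT = 8   # stands in for args.num_samples_per_prompt_rollout

inference_results_q = queue.Queue()

# Producer side: one queue entry per *prompt*, holding all of its samples.
for prompt_id in range(NUM_UNIQUE_PROMPTS):
    inference_results_q.put(
        [f"sample {i} of prompt {prompt_id}" for i in range(NUM_SAMPLES_PER_PROMPT)]
    )

# Consumer side (as described above): one get() per *response*. Only 4
# entries exist, so the 5th of these 32 gets would block forever.
# for _ in range(NUM_UNIQUE_PROMPTS * NUM_SAMPLES_PER_PROMPT):
#     result = inference_results_q.get()
```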

Collaborator Author

Agreed, fixed!

@hamishivi (Collaborator) left a comment

Some extra changes needed to get stuff working!

Also, in make_reward_fn, you need to make the following change:

```diff
-_, timeouts, tool_errors, tool_outputs, _, tool_calleds = infos
+timeouts = infos.timeouts
+tool_errors = infos.tool_errors
+tool_outputs = infos.tool_outputs
+tool_calleds = infos.tool_calleds
```
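For illustration, a sketch of why the unpacking breaks, assuming infos changed from a positional tuple to a container with named fields (the RequestInfo name and fields here are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class RequestInfo:
    timeouts: list = field(default_factory=list)
    tool_errors: list = field(default_factory=list)
    tool_outputs: list = field(default_factory=list)
    tool_calleds: list = field(default_factory=list)

infos = RequestInfo()
timeouts = infos.timeouts  # attribute access works
# _, timeouts, tool_errors, tool_outputs, _, tool_calleds = infos
# ^ raises TypeError: a plain dataclass instance is not iterable
```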


# Start vLLM engines to process from queues
batch_size_per_engine = (
    args.num_unique_prompts_rollout * args.num_samples_per_prompt_rollout
Collaborator

I think this should just be args.num_unique_prompts_rollout, since we send only the prompts to the engine (and then ask it to produce multiple rollouts via SamplingParams).
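A sketch of the intended setup, with a SimpleNamespace standing in for the real args (field names taken from this PR) and vLLM's SamplingParams:

```python
from types import SimpleNamespace
from vllm import SamplingParams

# Stand-in for the real args object; the values are illustrative.
args = SimpleNamespace(
    num_unique_prompts_rollout=128,
    num_samples_per_prompt_rollout=8,
    vllm_num_engines=8,
)

# vLLM fans a single prompt out into multiple rollouts via n=..., so the
# per-engine batch should count prompts, not prompt-sample pairs.
sampling_params = SamplingParams(n=args.num_samples_per_prompt_rollout, temperature=1.0)
batch_size_per_engine = args.num_unique_prompts_rollout // args.vllm_num_engines  # 16
```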

Collaborator Author

Nice catch, fixed.

@finbarrtimbers (Collaborator Author) commented

> Some extra changes needed to get stuff working!
>
> Also, in make_reward_fn, need to make the following change:
>
> -_, timeouts, tool_errors, tool_outputs, _, tool_calleds = infos
> +timeouts = infos.timeouts
> +tool_errors = infos.tool_errors
> +tool_outputs = infos.tool_outputs
> +tool_calleds = infos.tool_calleds

@hamishivi I don't follow what you mean by this!

@finbarrtimbers (Collaborator Author) commented

NVM, I understand. Fixed.

@hamishivi (Collaborator) left a comment

Nice! Happy to merge assuming this is tested and runs fine on a small job :)

    code_tool_api_endpoint: Optional[str] = None

    def __post_init__(self):
        if self.num_unique_prompts_rollout % self.vllm_num_engines != 0:
Collaborator

Thinking on this some more, can we remove this requirement? Just round-robin the extra prompts onto some of the engines? I've already hit a case where this bugs me (128 prompts, 24 vLLM engines).

Collaborator Author

How should we allocate the prompts to each machine? Should I add an inference_batch_size parameter?

Collaborator

Can't we just do a best-effort even load balance? Divide evenly, then add one to the batch size of the first n engines when sending out the prompts, until all prompts have been sent; something like the sketch below.
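A minimal sketch of that balancing scheme (the helper name is hypothetical):

```python
def split_prompts(prompts, num_engines):
    """Best-effort even split: the first len(prompts) % num_engines
    engines each receive one extra prompt."""
    base, extra = divmod(len(prompts), num_engines)
    batches, start = [], 0
    for i in range(num_engines):
        size = base + (1 if i < extra else 0)
        batches.append(prompts[start:start + size])
        start += size
    return batches

# 128 prompts over 24 engines -> 8 batches of 6 and 16 batches of 5.
sizes = [len(b) for b in split_prompts(list(range(128)), 24)]
assert sizes.count(6) == 8 and sizes.count(5) == 16
```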

@hamishivi (Collaborator) left a comment

I also get an error like:

2025-07-23T05:37:13.010Z   File "/stage/open_instruct/grpo_fast.py", line 2217, in main
2025-07-23T05:37:13.010Z     maybe_evaluate(
2025-07-23T05:37:13.010Z   File "/stage/open_instruct/grpo_fast.py", line 1912, in maybe_evaluate
2025-07-23T05:37:13.011Z     df = pd.DataFrame(table)
2025-07-23T05:37:13.011Z          ^^^^^^^^^^^^^^^^^^^
2025-07-23T05:37:13.011Z   File "/opt/miniconda3/lib/python3.12/site-packages/pandas/core/frame.py", line 778, in __init__
2025-07-23T05:37:13.011Z     mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
2025-07-23T05:37:13.011Z           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-07-23T05:37:13.011Z   File "/opt/miniconda3/lib/python3.12/site-packages/pandas/core/internals/construction.py", line 503, in dict_to_mgr
2025-07-23T05:37:13.011Z     return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
2025-07-23T05:37:13.011Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-07-23T05:37:13.011Z   File "/opt/miniconda3/lib/python3.12/site-packages/pandas/core/internals/construction.py", line 114, in arrays_to_mgr
2025-07-23T05:37:13.011Z     index = _extract_index(arrays)
2025-07-23T05:37:13.011Z             ^^^^^^^^^^^^^^^^^^^^^^
2025-07-23T05:37:13.011Z   File "/opt/miniconda3/lib/python3.12/site-packages/pandas/core/internals/construction.py", line 677, in _extract_index
2025-07-23T05:37:13.011Z     raise ValueError("All arrays must be of the same length")
2025-07-23T05:37:13.011Z ValueError: All arrays must be of the same length

running this branch. I think the maybe_evaluate function also needs to handle the multiple outputs somehow? Not sure of the exact cause, but a guess is sketched below.
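For what it's worth, a hypothetical repro of that pandas error, assuming the evaluation table mixes per-prompt and per-sample columns:

```python
import pandas as pd

# With n samples per prompt, per-sample columns are n times longer than
# per-prompt columns, and pd.DataFrame rejects ragged dict values.
table = {
    "prompt": ["p0", "p1"],                    # one entry per prompt
    "response": ["r0a", "r0b", "r1a", "r1b"],  # one entry per sample
}
try:
    df = pd.DataFrame(table)
except ValueError as e:
    print(e)  # All arrays must be of the same length
```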

@finbarrtimbers (Collaborator Author) commented

Closing in favour of #859.

@finbarrtimbers deleted the revert-804-revert-796-insert-prompts branch August 18, 2025 14:09