Second attempt at landing "Now, we run individual prompts through the queue." #807
Conversation
Force-pushed from 0f2810d to 7dd7e32.
    dataset_index=dataset_index,
)
results.append(
    GenerationResult(
I think there's a mismatch in the queues: here, all of the outputs for a given prompt are bundled into one big list in `responses`, but in `grpo_fast`, `inference_results_q` is pulled from `args.num_unique_prompts_rollout * args.num_samples_per_prompt_rollout` times, which is more than is actually in the queue (it only has one entry per prompt, not per response).
Agreed, fixed!
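A minimal, self-contained sketch of the accounting mismatch described above; the variable and queue names mirror the discussion, but the code is illustrative rather than the actual open_instruct implementation:

```python
import queue

# Illustrative numbers, not real configuration values.
num_unique_prompts_rollout = 4
num_samples_per_prompt_rollout = 8

inference_results_q = queue.Queue()

# Producer: one queue entry per *prompt*, each bundling all of its samples.
for prompt_idx in range(num_unique_prompts_rollout):
    responses = [f"sample {i} for prompt {prompt_idx}" for i in range(num_samples_per_prompt_rollout)]
    inference_results_q.put(responses)

# Consumer: attempts one get() per *response*, so most gets would block forever.
attempted_gets = num_unique_prompts_rollout * num_samples_per_prompt_rollout
print(f"entries in queue: {inference_results_q.qsize()}, gets attempted: {attempted_gets}")
```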
hamishivi left a comment:
Some extra changes needed to get stuff working!
Also, in `make_reward_fn`, we need to make the following change:

    _, timeouts, tool_errors, tool_outputs, _, tool_calleds = infos

becomes:

    timeouts = infos.timeouts
    tool_errors = infos.tool_errors
    tool_outputs = infos.tool_outputs
    tool_calleds = infos.tool_calleds
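The attribute-style access above assumes `infos` arrives as a structured object rather than a plain tuple. A hypothetical sketch of such a container, with field names taken from the snippet and everything else assumed:

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical container; the real open_instruct type may differ.
@dataclass
class RequestInfo:
    timeouts: List[bool] = field(default_factory=list)
    tool_errors: List[str] = field(default_factory=list)
    tool_outputs: List[str] = field(default_factory=list)
    tool_calleds: List[bool] = field(default_factory=list)

infos = RequestInfo(timeouts=[False], tool_errors=[""], tool_outputs=["42"], tool_calleds=[True])
timeouts = infos.timeouts
tool_errors = infos.tool_errors
tool_outputs = infos.tool_outputs
tool_calleds = infos.tool_calleds
```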
open_instruct/grpo_fast.py
# Start vLLM engines to process from queues
batch_size_per_engine = (
    args.num_unique_prompts_rollout * args.num_samples_per_prompt_rollout
I think this should just be `args.num_unique_prompts_rollout`, since we send only the prompts to the engine (and then ask it to produce multiple rollouts via `SamplingParams`).
Nice catch, fixed.
@hamishivi I don't follow what you mean by this!
NVM, I understand. Fixed.
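A minimal sketch of the point above, assuming vLLM's `SamplingParams(n=...)` handles the per-prompt rollouts; the numbers and surrounding variable names are illustrative:

```python
from vllm import SamplingParams

# Illustrative values, not the real configuration.
num_unique_prompts_rollout = 128
num_samples_per_prompt_rollout = 16
vllm_num_engines = 8

# Each engine only receives unique prompts, so the per-engine batch size is
# based on the number of unique prompts alone, not prompts * samples.
batch_size_per_engine = num_unique_prompts_rollout // vllm_num_engines

# The multiple rollouts per prompt come from SamplingParams.n instead.
sampling_params = SamplingParams(
    n=num_samples_per_prompt_rollout,
    temperature=1.0,
    max_tokens=512,
)
```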
hamishivi left a comment:
Nice! Happy to merge assuming this is tested and runs fine on a small job :)
code_tool_api_endpoint: Optional[str] = None


def __post_init__(self):
    if self.num_unique_prompts_rollout % self.vllm_num_engines != 0:
Thinking on this some more, can we remove this requirement? Just round-robin the extra prompts onto the remaining engines? I've already hit a case where this bugs me (128 prompts, 24 vLLM engines).
How should we allocate the prompts to each machine? Should I add an inference_batch_size parameter?
Can't we just do a best-effort even load balance? Evenly divide, and then add one to the batch size of the first n engines when sending out the prompts, until all prompts have been sent.
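A minimal sketch of that best-effort split (the function name and shapes are hypothetical, not the actual implementation):

```python
def split_prompts_across_engines(prompts, num_engines):
    """Best-effort even split: every engine gets len(prompts) // num_engines
    prompts, and the first (len(prompts) % num_engines) engines get one extra."""
    base, extra = divmod(len(prompts), num_engines)
    batches, start = [], 0
    for engine_idx in range(num_engines):
        size = base + (1 if engine_idx < extra else 0)
        batches.append(prompts[start:start + size])
        start += size
    return batches

# e.g. 128 prompts over 24 engines -> 8 engines get 6 prompts, 16 get 5.
batches = split_prompts_across_engines(list(range(128)), 24)
print([len(b) for b in batches])
```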
hamishivi left a comment:
I also get an error like:
2025-07-23T05:37:13.010Z File "/stage/open_instruct/grpo_fast.py", line 2217, in main
2025-07-23T05:37:13.010Z maybe_evaluate(
2025-07-23T05:37:13.010Z File "/stage/open_instruct/grpo_fast.py", line 1912, in maybe_evaluate
2025-07-23T05:37:13.011Z df = pd.DataFrame(table)
2025-07-23T05:37:13.011Z ^^^^^^^^^^^^^^^^^^^
2025-07-23T05:37:13.011Z File "/opt/miniconda3/lib/python3.12/site-packages/pandas/core/frame.py", line 778, in __init__
2025-07-23T05:37:13.011Z mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
2025-07-23T05:37:13.011Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-07-23T05:37:13.011Z File "/opt/miniconda3/lib/python3.12/site-packages/pandas/core/internals/construction.py", line 503, in dict_to_mgr
2025-07-23T05:37:13.011Z return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
2025-07-23T05:37:13.011Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-07-23T05:37:13.011Z File "/opt/miniconda3/lib/python3.12/site-packages/pandas/core/internals/construction.py", line 114, in arrays_to_mgr
2025-07-23T05:37:13.011Z index = _extract_index(arrays)
2025-07-23T05:37:13.011Z ^^^^^^^^^^^^^^^^^^^^^^
2025-07-23T05:37:13.011Z File "/opt/miniconda3/lib/python3.12/site-packages/pandas/core/internals/construction.py", line 677, in _extract_index
2025-07-23T05:37:13.011Z raise ValueError("All arrays must be of the same length")
2025-07-23T05:37:13.011Z ValueError: All arrays must be of the same length
while running this branch. I think the maybe_evaluate function also needs to handle the multiple outputs somehow? Not sure of the exact cause.
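One possible way to make the DataFrame construction tolerant of unequal-length columns, purely as an illustrative sketch; the real fix likely belongs upstream, where the per-prompt and per-response bookkeeping diverges:

```python
import pandas as pd

def to_dataframe(table: dict) -> pd.DataFrame:
    """Pad shorter columns with None so pandas accepts unequal-length arrays."""
    max_len = max(len(values) for values in table.values())
    padded = {key: list(values) + [None] * (max_len - len(values)) for key, values in table.items()}
    return pd.DataFrame(padded)

# Example: one column per prompt, another per response (lengths differ).
table = {"prompt": ["p0", "p1"], "response": ["r0a", "r0b", "r1a", "r1b"]}
print(to_dataframe(table))
```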
Closing in favour of #859.
Reverts #804, which reverted #796.
The main issue was really dumb: we were missing a return statement in the sync_weights_and_prepare_prompts function. I will try to figure out a way to test this, but I want to submit it so we are unblocked. Repro run.