Multi-Domain Mix #105
Open
rafapi wants to merge 83 commits into main from multi-env
Commits
cac78d7 Add environment selector (rafapi)
df1d846 Fix env launcher (rafapi)
9735130 Adapt domains to env registry (rafapi)
bb5e5ca Adapt domain configs (rafapi)
a1a02bf Collect env info (rafapi)
e4d0bc4 Remove unrelated files (rafapi)
da43cbc Remove backup (rafapi)
5b18001 Remove duplicates (rafapi)
9af7329 Restore (rafapi)
599b510 add domains (rafapi)
32eb5b8 add coding (rafapi)
efaec65 Add remaining loaders (rafapi)
1220f6d Domain mix tracking metrics (rafapi)
158b2ea update domain rollouts (rafapi)
fe8e728 refresh async llm flow (rafapi)
6f9c5cc sync coding init (rafapi)
455ed42 expand coding dataset (rafapi)
795e490 remove legacy executor (rafapi)
68072b1 revise coding rollouts (rafapi)
981cb74 adjust multidomain loader (rafapi)
f7e6946 refresh preprocess pipeline (rafapi)
73ca9d1 enhance utils helpers (rafapi)
38ff188 add multi domain config (rafapi)
2c5ebfd introduce domain sampling (rafapi)
40cf648 add coding sandbox test (rafapi)
2c74b77 implement verifier api (rafapi)
cdfe57b add symbolic init (rafapi)
a3c4106 add symbolic dataset (rafapi)
7eef15e add symbolic rollouts (rafapi)
cc091ac remove deleted domains (rafapi)
7664773 remove symbolic (rafapi)
120ba7b restore env replica compatibility and uniqueness (rafapi)
f8d147e fix (rafapi)
62ad5fb up test len (rafapi)
2d22d5e per domain logging (rafapi)
52fcb56 add domain to rollout data (rafapi)
9da8f04 keep existing env_replicas value (rafapi)
3368e69 restore template (rafapi)
2831252 add default to rep per actor (rafapi)
cb916c1 weight replicas by domain mix (rafapi)
a6ad805 add easy mix configs (rafapi)
2243897 remove bloated coding rewards (rafapi)
0cbc542 use existing reward structure (rafapi)
6888e05 Merge branch 'main' into multi-env (rafapi)
4adcd81 fix naming (rafapi)
a5a6e44 fix cache data composition (rafapi)
dca8a43 remove tapeagents imports (rafapi)
84b6587 remove tapeagents imports (rafapi)
8bbca61 add fn_calling (rafapi)
1937578 coding conf (rafapi)
69b5154 main mix config (rafapi)
fd2fc3b fix finish reason detection (rafapi)
b139560 include fn_calling loader (rafapi)
8b5c159 add domain_mix placeholder (rafapi)
b876adb add fn_calling (rafapi)
904c80e fix path (rafapi)
e4017d9 change mix (rafapi)
d7935d0 Fix strings (rafapi)
8e2e7b3 ensure we arere passing an empty call type (rafapi)
5e10988 fix imports (rafapi)
dac01c1 return (rafapi)
eb3bacf return too (rafapi)
f5093bf return more (rafapi)
27a2a6d extract ability list (rafapi)
5aea032 normalise prompt (rafapi)
02f9294 add missing math-code mix config (rafapi)
f6c128c declare zero weight domains (rafapi)
60169b2 use hydra object conversion (rafapi)
a3de18e remove empty lines (rafapi)
ba360fa fix end of line (rafapi)
f405321 remove tapeagents import (rafapi)
bba110d remove duplicate code (rafapi)
8f60aa2 Merge branch 'main' into multi-env (rafapi)
67ffd60 fix imports (rafapi)
7ea8744 remove redundant object conversion (rafapi)
182ee6e init dataset placeholders (rafapi)
6bc73eb flattent and convert to python object (rafapi)
63c3035 fix per domain system prompt (rafapi)
fa9a9bc add sys prompt for coding (rafapi)
5e9b037 only spawn environments present in domain mix (rafapi)
2677ad9 adaptive sampling (rafapi)
48afae1 track domains (rafapi)
00fd6cb track domains (rafapi)
@@ -0,0 +1,55 @@
+defaults:
+  - base
+  - _self_
+
+actor:
+  rollout_policy: pipelinerl.domains.coding.generate_coding_rollout
+  system_prompt: |-
+    You are an expert Python programmer. When providing code solutions, format your final code inside markdown code blocks using triple backticks with the python language identifier, like this:
+    ```python
+    # your code here
+    ```
+    Provide complete, working implementations that pass all test cases.
+  task_template: |-
+    {task}
+  task_prompt: ""
+  ensure_boxed_answers: false
+
+  coding_time_limit_s: 15.0
+  coding_per_test_timeout_s: 10.0
+  coding_memory_limit_bytes: 1073741824
+  coding_compile_timeout_s: 10.0
+  coding_sandbox_url: ${oc.env:CODING_SANDBOX_URL, "http://sandbox:8080/run_code"}
+
+dataset_loader: pipelinerl.domains.coding.dataset.load_problems
+dataset_loader_params:
+  dataset_id: ServiceNow-AI/mixed-training-text-datasets
+  dataset_config: 80k-if-math-coding-fncalling-stem
+  split_ratios:
+    train: 0.9
+    validation: 0.05
+    test: 0.05
+  allowed_call_types:
+    - assert
+    - std
+  max_examples_per_split: 2048
+  trust_remote_code: true
+  huggingface_token: ${oc.env:CODING_HF_TOKEN, null}
+
+train_dataset_names:
+  - coding@train
+
+test_dataset_names:
+  - coding@validation
+
+environments:
+  - key: coding
+    mode: remote
+    _target_: pipelinerl.domains.coding.CodingSandboxEnvironment
+    sandbox_url: ${actor.coding_sandbox_url}
+    compile_timeout_s: ${actor.coding_compile_timeout_s}
+    run_timeout_s: ${actor.coding_per_test_timeout_s}
+    request_timeout_s: ${actor.coding_time_limit_s}
+    memory_limit_bytes: ${actor.coding_memory_limit_bytes}
+
+environment_key: coding
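For readers less familiar with OmegaConf interpolation, the sandbox settings in the coding config above can be mirrored in plain Python. This is only a sketch: `SandboxSettings` and `load_sandbox_settings` are hypothetical names that do not come from the PR, and the check that the per-test timeout fits inside the request timeout is an assumption about how the two values relate, not something the diff states.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxSettings:
    """Hypothetical container mirroring the coding_* keys under `actor`."""
    sandbox_url: str
    request_timeout_s: float   # actor.coding_time_limit_s
    run_timeout_s: float       # actor.coding_per_test_timeout_s
    compile_timeout_s: float   # actor.coding_compile_timeout_s
    memory_limit_bytes: int    # actor.coding_memory_limit_bytes


def load_sandbox_settings(env=None) -> SandboxSettings:
    """Build settings with the same defaults as the config.

    `env` is any mapping; it defaults to the process environment.
    """
    env = os.environ if env is None else env
    settings = SandboxSettings(
        # Same fallback as ${oc.env:CODING_SANDBOX_URL, "http://sandbox:8080/run_code"}.
        sandbox_url=env.get("CODING_SANDBOX_URL", "http://sandbox:8080/run_code"),
        request_timeout_s=15.0,
        run_timeout_s=10.0,
        compile_timeout_s=10.0,
        memory_limit_bytes=1073741824,  # 2**30 bytes, i.e. 1 GiB
    )
    # Assumption (not stated in the PR): one test run should fit in one request.
    if settings.run_timeout_s > settings.request_timeout_s:
        raise ValueError("per-test timeout exceeds the request timeout")
    return settings
```

With the values in the diff, a single request (15 s) leaves room for one test run (10 s) plus some overhead, and the memory limit is exactly 1 GiB.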
@@ -0,0 +1,30 @@
+defaults:
+  - base
+  - domain_rollouts: base
+  - override rewards: success_and_format
+  - _self_
+
+actor:
+  rollout_policy: pipelinerl.domains.dispatcher.generate_multidomain_rollout
+  llm_max_rollouts: 2
+  rollout_workers: 1
+  domain_rollouts:
+    math: ${domain_rollouts.math}
+    guessing: ${domain_rollouts.guessing}
+    coding: ${domain_rollouts.coding}
+
+dataset_loader: pipelinerl.domains.multidomain.load_problems
+train_dataset_names:
+  - math_debug
+  - guessing_debug
+  - coding_debug
+test_dataset_names:
+  - math_debug
+  - coding_debug
+
+environment: null
+environment_key: null
+
+world:
+  env_replicas_per_actor: 0
+  environment_mode: embedded
@@ -0,0 +1,12 @@
+# Domain mix presets
+
+Hydra group `domain_mix` stores reusable presets for `actor.domain_mix`.
+
+Usage examples:
+
+```
+python main.py --config-name multi_domain/base +domain_mix=math_coding_70_30
+python main.py --config-name multi_domain/base +domain_mix=balanced
+```
+
+Override or extend these presets by creating new files under `conf/domain_mix/`.
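To make the role of a `domain_mix` preset concrete, here is a minimal sketch of turning a mix into sampling probabilities and drawing domains from it. The function names are hypothetical and the PR's actual sampler is not shown on this page; in particular, dropping zero-weight domains and normalising by the sum of weights are assumptions.

```python
import random


def normalise_mix(mix):
    """Turn raw weights into probabilities, dropping zero-weight domains."""
    positive = {domain: float(w) for domain, w in mix.items() if w > 0}
    total = sum(positive.values())
    if total <= 0:
        raise ValueError("domain mix needs at least one positive weight")
    return {domain: w / total for domain, w in positive.items()}


def sample_domains(mix, n, seed=0):
    """Draw n domain names in proportion to their weights."""
    probs = normalise_mix(mix)
    rng = random.Random(seed)  # seeded so a run can be reproduced
    return rng.choices(list(probs), weights=list(probs.values()), k=n)


# A preset like `math: 0.7, coding: 0.3` would be passed in as a plain dict.
print(sample_domains({"math": 0.7, "coding": 0.3}, n=5))
```

Under this reading, a preset with every weight set to 1.0 gives each listed domain the same probability, and the weights do not need to sum to one.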
@@ -0,0 +1,9 @@
+# @package actor.domain_mix
+
+math: 1.0
ollmer marked this conversation as resolved.
+guessing: 1.0
+counting: 1.0
+chartqa: 1.0
+miniwob: 1.0
+coding: 1.0
+fn_calling: 1.0
@@ -0,0 +1,4 @@
+# @package actor.domain_mix
+
+math: 0.3
+coding: 0.7
@@ -0,0 +1,5 @@
+# @package actor.domain_mix
+
+math: 0.4
+coding: 0.3
+fn_calling: 0.3
@@ -0,0 +1,4 @@
+# @package actor.domain_mix
+
+math: 0.7
+coding: 0.3
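Commit cb916c1 ("weight replicas by domain mix") suggests that environment replicas are distributed according to these weights. The code for that is not visible on this page, so the following is only a sketch of one way to do it, using largest-remainder rounding; `allocate_replicas` is a hypothetical name, and skipping zero-weight domains is an assumption.

```python
import math


def allocate_replicas(mix, total_replicas):
    """Split an integer replica budget across domains in proportion to weights."""
    active = {domain: float(w) for domain, w in mix.items() if w > 0}
    if not active:
        raise ValueError("domain mix needs at least one positive weight")
    weight_sum = sum(active.values())
    # Ideal (fractional) share of the budget for each domain.
    quotas = {d: total_replicas * w / weight_sum for d, w in active.items()}
    # Start from the rounded-down shares.
    counts = {d: math.floor(q) for d, q in quotas.items()}
    # Hand the remaining replicas to the largest fractional remainders.
    leftover = total_replicas - sum(counts.values())
    by_remainder = sorted(active, key=lambda d: quotas[d] - counts[d], reverse=True)
    for domain in by_remainder[:leftover]:
        counts[domain] += 1
    return counts


print(allocate_replicas({"math": 0.7, "coding": 0.3}, 10))
```

One consequence of this scheme is that a low-weight domain can receive zero replicas when the budget is small, so a real implementation may want a per-domain minimum.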
@@ -0,0 +1,8 @@
+# Mapping between domain identifiers and rollout callables.
+math: pipelinerl.domains.math.generate_math_rollout
+guessing: pipelinerl.domains.guessing.generate_guessing_rollout
+counting: pipelinerl.domains.counting.generate_counting_rollout
+miniwob: pipelinerl.domains.miniwob.rollouts.generate_miniwob_rollout
+chartqa: pipelinerl.domains.chartqa.generate_chartqa_rollout
+coding: pipelinerl.domains.coding.generate_coding_rollout
+fn_calling: pipelinerl.domains.fn_calling.generate_fn_calling_rollout
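A mapping from domain names to dotted paths has to be resolved into callables somewhere before a dispatcher can route a problem to its rollout function. The sketch below shows the general technique with `importlib`; `resolve_callable` and `build_dispatch_table` are hypothetical helpers, not the PR's dispatcher, and the demo uses standard-library paths so it runs without `pipelinerl` installed.

```python
import importlib


def resolve_callable(dotted_path):
    """Import `pkg.module.attr` and return the attribute, which must be callable."""
    module_name, _, attr_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {dotted_path!r}")
    module = importlib.import_module(module_name)
    target = getattr(module, attr_name)
    if not callable(target):
        raise TypeError(f"{dotted_path!r} does not point to a callable")
    return target


def build_dispatch_table(domain_rollouts):
    """Resolve every entry once, up front, so bad paths fail at startup."""
    return {domain: resolve_callable(path) for domain, path in domain_rollouts.items()}


# Stand-in paths from the standard library, used only for illustration.
table = build_dispatch_table({"sqrt": "math.sqrt", "join": "os.path.join"})
print(table["sqrt"](9.0))
```

Resolving eagerly trades a slightly slower start for earlier errors: a typo in one domain's path surfaces before any rollout is attempted.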
@@ -0,0 +1,36 @@
+defaults:
+  - base
+  - _self_
+
+actor:
+  rollout_policy: pipelinerl.domains.fn_calling.generate_fn_calling_rollout
+  system_prompt: ""
+  task_template: "{task}"
+  task_prompt: ""
+  ensure_boxed_answers: false
+
+dataset_loader: pipelinerl.domains.fn_calling.dataset.load_problems
+dataset_loader_params:
+  dataset_id: ServiceNow-AI/mixed-training-text-datasets
+  dataset_config: 80k-if-math-coding-fncalling-stem
+  split_ratios:
+    train: 0.9
+    validation: 0.05
+    test: 0.05
+  allowed_call_types: []
+  max_examples_per_split: 2048
+  trust_remote_code: true
+  huggingface_token: ${oc.env:CODING_HF_TOKEN, null}
+
+train_dataset_names:
+  - fn_calling@train
+
+test_dataset_names:
+  - fn_calling@validation
+
+environments:
+  - key: fn_calling
+    mode: remote
+    _target_: pipelinerl.domains.fn_calling.AgenticToolsEnvironment
+
+environment_key: fn_calling
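The `split_ratios` and `max_examples_per_split` parameters imply that one dataset is partitioned into train, validation and test slices and then capped. How the loader actually does this (shuffling, hashing, ordering) is not shown on this page, so the following is a sketch under the assumption of a simple in-order partition; `split_examples` is a hypothetical name.

```python
def split_examples(examples, split_ratios, max_examples_per_split=None):
    """Partition `examples` in order according to `split_ratios`, then cap each split."""
    names = list(split_ratios)
    total_ratio = sum(split_ratios.values())
    n = len(examples)
    splits = {}
    start = 0
    cumulative = 0.0
    for i, name in enumerate(names):
        cumulative += split_ratios[name]
        # The last split takes whatever remains, so no example is lost to rounding.
        end = n if i == len(names) - 1 else round(n * cumulative / total_ratio)
        chunk = examples[start:end]
        if max_examples_per_split is not None:
            chunk = chunk[:max_examples_per_split]
        splits[name] = chunk
        start = end
    return splits


ratios = {"train": 0.9, "validation": 0.05, "test": 0.05}
parts = split_examples(list(range(100)), ratios, max_examples_per_split=2048)
print({name: len(chunk) for name, chunk in parts.items()})
```

With a cap of 2048, the train slice of a large dataset would be truncated well before the validation and test slices, which is worth keeping in mind when reading per-split metrics.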
@@ -0,0 +1,69 @@
+defaults:
+  - base
+  - /domain_rollouts@domain_rollouts: base
+  - domain_mix: math_coding_70_30
+  - _self_
+
+actor:
+  rollout_policy: pipelinerl.domains.dispatcher.generate_multidomain_rollout
+  system_prompt: ""
+  task_template: |-
+    {task}
+  task_prompt: ""
+  ensure_boxed_answers: false
+  domain_rollouts:
+    math: ${domain_rollouts.math}
+    coding: ${domain_rollouts.coding}
+  coding_time_limit_s: 15.0
+  coding_per_test_timeout_s: 10.0
+  coding_memory_limit_bytes: 1073741824
+  coding_compile_timeout_s: 10.0
+  coding_sandbox_url: ${oc.env:CODING_SANDBOX_URL, "http://sandbox:8080/run_code"}
+
+dataset_loader: pipelinerl.domains.multidomain.loader.load_datasets
rafapi marked this conversation as resolved.
+dataset_loader_params:
+  per_domain_params:
+    coding:
+      dataset_id: ServiceNow-AI/mixed-training-text-datasets
+      dataset_config: 80k-if-math-coding-fncalling-stem
+      split_ratios:
+        train: 0.9
+        validation: 0.05
+        test: 0.05
+      allowed_call_types:
+        - assert
+        - std
+      max_examples_per_split: 2048
+      trust_remote_code: true
+      huggingface_token: ${oc.env:CODING_HF_TOKEN, null}
+
+train_dataset_names:
+  - math::open_reasoner_zero_57k
+  - math::open_reasoner_zero_extended_72k
+  - coding::coding@train
+
+test_dataset_names:
+  - math::aime_2024
+  - math::amc_2023
+  - math::math_500
+  - coding::coding@validation
+
+environments:
+  - key: math
+    mode: remote
+    replicas_per_actor: ${world.env_replicas_per_actor}
+    _target_: pipelinerl.domains.math.MathEnvironment
+  - key: coding
+    mode: remote
+    replicas_per_actor: ${world.env_replicas_per_actor}
+    _target_: pipelinerl.domains.coding.CodingSandboxEnvironment
+    sandbox_url: ${actor.coding_sandbox_url}
+    compile_timeout_s: ${actor.coding_compile_timeout_s}
+    run_timeout_s: ${actor.coding_per_test_timeout_s}
+    request_timeout_s: ${actor.coding_time_limit_s}
+    memory_limit_bytes: ${actor.coding_memory_limit_bytes}
+
+environment_key: null
+
+world:
+  env_replicas_per_actor: 1
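The dataset names in the multi-domain config above appear to follow a `domain::dataset@split` pattern, with both the domain prefix and the split suffix optional (compare `math::aime_2024` with `coding@train` in the single-domain configs). That reading is an inference from the names alone; the parser below is a hypothetical sketch of it, not the loader from this PR.

```python
from typing import NamedTuple, Optional


class DatasetRef(NamedTuple):
    """Hypothetical parsed form of a dataset name."""
    domain: Optional[str]
    name: str
    split: Optional[str]


def parse_dataset_name(spec: str) -> DatasetRef:
    """Split `domain::dataset@split`; the domain and split parts may be absent."""
    domain, sep, rest = spec.partition("::")
    if not sep:
        # No `::` present, so the whole string is the dataset part.
        domain, rest = None, spec
    name, sep, split = rest.partition("@")
    return DatasetRef(domain, name, split if sep else None)


for spec in ("coding::coding@train", "math::aime_2024", "fn_calling@validation"):
    print(spec, "->", parse_dataset_name(spec))
```

Parsing into a small named tuple keeps the routing decision (which domain loader to call) separate from the selection decision (which split to take).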