Multi-Domain RL Training #105
base: main
coding_compile_timeout_s: 10.0
coding_sandbox_url: ${oc.env:CODING_SANDBOX_URL, "http://sandbox:8080/run_code"}
dataset_loader: pipelinerl.domains.multidomain.loader.load_datasets
Can we have multiple dataloaders at once so we load different datasets for different domains in the same exp?
yes, it's a proportional concatenation
so how can we define multiple dataset_loader functions in a single config?
in your config you could do (if you wanted to do math, coding and agentic_fn_calling):
defaults:
  - base
  - multi_domain: base  # inits multidomain loader and dispatcher
  - domain_mix: main_mix  # or inline the mix as shown below
actor:
  domain_mix:
    math: 0.4
    coding: 0.3
    fn_calling: 0.3
train_dataset_names:
  - math::open_reasoner_zero_57k
  - coding::coding@train
  - fn_calling::fn_calling@train
test_dataset_names:
  - math::math_500
  - coding::coding@validation
  - fn_calling::fn_calling@validation
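To make the naming scheme in the config above concrete, here is a minimal sketch of how a `domain::dataset@split` entry could be split into its parts. This is not the PR's loader code: `DatasetSpec` and `parse_dataset_name` are hypothetical names, and the assumption that `::` separates the domain and an optional `@` suffix names the split is inferred only from the entries shown in the config.

```python
from typing import NamedTuple, Optional


class DatasetSpec(NamedTuple):
    """Hypothetical container for one parsed dataset entry."""
    domain: str
    dataset: str
    split: Optional[str]


def parse_dataset_name(name: str) -> DatasetSpec:
    """Split 'domain::dataset@split' into its parts (the split is optional).

    Assumption: '::' separates the domain prefix and '@' introduces the split,
    as suggested by entries such as 'coding::coding@train'.
    """
    domain, sep, rest = name.partition("::")
    if not sep or not domain or not rest:
        raise ValueError(f"expected 'domain::dataset[@split]', got {name!r}")
    dataset, _, split = rest.partition("@")
    return DatasetSpec(domain, dataset, split or None)


def group_by_domain(names: list[str]) -> dict[str, list[DatasetSpec]]:
    """Group parsed entries by domain so each domain can load its own data."""
    grouped: dict[str, list[DatasetSpec]] = {}
    for name in names:
        spec = parse_dataset_name(name)
        grouped.setdefault(spec.domain, []).append(spec)
    return grouped
```

With a helper like this, a single `dataset_loader` entry point could dispatch each group to a domain-specific loader, which is one way to read the "proportional concatenation" answer earlier in the thread.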
Domain mix distribution:
Targets: math: 40%, coding: 30%, fn_calling: 30%
Initial distribution: (figure not included)
Using adaptive sampling (366k+ completions): (figure not included)
Enables simultaneous training across multiple domains (math, coding, function calling) with domain-agnostic orchestration
Architecture
Configuration
Adaptive Sampling
DomainWeightedSampler adjusts weights dynamically:
adjusted_weight = base_weight × (target_ratio / actual_ratio), clamped to [0.1, 10.0]. This compensates for domains with slower rollouts (e.g. coding sandbox execution) to maintain the configured mix in the output stream.
Metrics
Per-domain stats: domain_mix_actual/{domain}, domain_mix_target/{domain}, domain_mix_count/{domain}.
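The adaptive weighting rule described under Adaptive Sampling can be sketched as follows. This is an illustration under stated assumptions, not the `DomainWeightedSampler` implementation: the function name `adjusted_weights` is hypothetical, the clamp is applied to the correction factor `target_ratio / actual_ratio` (the description could also be read as clamping the final weight), and a domain with no completions yet is assumed to receive the maximum boost.

```python
def adjusted_weights(
    base_weights: dict[str, float],
    target_ratios: dict[str, float],
    counts: dict[str, int],
    lo: float = 0.1,
    hi: float = 10.0,
) -> dict[str, float]:
    """Rescale per-domain sampling weights toward the target mix.

    counts holds the number of completions observed so far per domain.
    """
    total = sum(counts.values())
    weights: dict[str, float] = {}
    for domain, base in base_weights.items():
        seen = counts.get(domain, 0)
        if total == 0:
            # Nothing observed yet: keep the base weight unchanged.
            factor = 1.0
        elif seen == 0:
            # Assumption: an unseen domain gets the largest allowed boost.
            factor = hi
        else:
            actual_ratio = seen / total
            factor = target_ratios[domain] / actual_ratio
            # Clamp the correction so one lagging domain cannot dominate.
            factor = max(lo, min(hi, factor))
        weights[domain] = base * factor
    return weights


# Example: coding lags behind its 30% target because sandbox runs are slow.
mix = {"math": 0.4, "coding": 0.3, "fn_calling": 0.3}
observed = {"math": 60, "coding": 10, "fn_calling": 30}
new_weights = adjusted_weights(mix, mix, observed)
```

In this example coding holds 10% of the output stream against a 30% target, so its weight is roughly tripled, while math, which is over-represented, is scaled down.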