Multi-Domain RL Training #105

rafapi · 2025-11-17T19:43:23Z

Enables simultaneous training across multiple domains (math, coding, function calling) with domain-agnostic orchestration.

Architecture

Component	Role
multidomain/loader.py	Parses :: syntax, concatenates datasets, injects domain field into each sample
dispatcher.py	Routes problem["domain"] → domain-specific rollout callable via actor.domain_rollouts mapping
domain_sampling.py	Weighted sampling with adaptive rebalancing based on completion ratios to maintain target mix despite varying rollout latencies
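
The loader and dispatcher roles in the table can be illustrated with a minimal sketch. This is not the code from this PR: the helper names `split_domain_spec`, `tag_samples`, and `dispatch` are hypothetical, and rollouts are passed as plain callables here rather than resolved from the dotted paths used in the config.

```python
from typing import Any, Callable, Dict, List, Tuple

Problem = Dict[str, Any]


def split_domain_spec(spec: str) -> Tuple[str, str]:
    """Split a 'domain::dataset' string into (domain, dataset)."""
    domain, sep, dataset = spec.partition("::")
    if not sep or not domain or not dataset:
        raise ValueError(f"expected 'domain::dataset', got {spec!r}")
    return domain, dataset


def tag_samples(spec: str, samples: List[Problem]) -> List[Problem]:
    """Return copies of the samples with a 'domain' field injected."""
    domain, _ = split_domain_spec(spec)
    return [{**sample, "domain": domain} for sample in samples]


def dispatch(problem: Problem, domain_rollouts: Dict[str, Callable[[Problem], Any]]) -> Any:
    """Route a problem to the rollout callable registered for its domain."""
    domain = problem["domain"]
    if domain not in domain_rollouts:
        raise KeyError(f"no rollout registered for domain {domain!r}")
    return domain_rollouts[domain](problem)
```

Because every sample carries its own `domain` field, the orchestration code never needs to know which domains exist; adding a domain only means adding an entry to the mapping.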

Configuration

  actor:
    domain_mix:        # sampling weights (normalised at runtime)
      math: 0.4
      coding: 0.3
      fn_calling: 0.3
    domain_rollouts:   # domain → rollout function mapping
      math: pipelinerl.domains.math.generate_math_rollout
      coding: pipelinerl.domains.coding.generate_coding_rollout
    domain_system_prompts:  # per-domain system prompts
      coding: "You are an expert Python programmer..."

  train_dataset_names:
    - math::open_reasoner_zero_57k
    - coding::coding@train

Adaptive Sampling

DomainWeightedSampler adjusts weights dynamically: adjusted_weight = base_weight × (target_ratio / actual_ratio), clamped to [0.1, 10.0]. This compensates for domains with slower rollouts (e.g. coding sandbox execution) to maintain the configured mix in the output stream.
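
A rough sketch of the rebalancing rule above, not the actual `DomainWeightedSampler` implementation. It assumes the clamp applies to the `target_ratio / actual_ratio` multiplier, that `actual_ratio` is each domain's share of completed rollouts, and that a domain with no completions yet gets the maximum boost. The function names are hypothetical.

```python
import random
from typing import Dict


def adjusted_weights(
    base: Dict[str, float],
    completed: Dict[str, int],
    lo: float = 0.1,
    hi: float = 10.0,
) -> Dict[str, float]:
    """Scale each domain's weight by target/actual, with the multiplier clamped to [lo, hi]."""
    total_weight = sum(base.values())
    target = {d: w / total_weight for d, w in base.items()}  # normalised mix
    total_completed = sum(completed.get(d, 0) for d in base)
    if total_completed == 0:
        return dict(target)  # nothing observed yet: use the configured mix
    weights = {}
    for domain, target_ratio in target.items():
        actual_ratio = completed.get(domain, 0) / total_completed
        # Assumption: a domain with zero completions gets the largest allowed boost.
        multiplier = hi if actual_ratio == 0 else target_ratio / actual_ratio
        multiplier = min(max(multiplier, lo), hi)
        weights[domain] = target_ratio * multiplier
    return weights


def sample_domain(weights: Dict[str, float], rng: random.Random) -> str:
    """Draw one domain with probability proportional to its weight."""
    domains = list(weights)
    return rng.choices(domains, weights=[weights[d] for d in domains], k=1)[0]
```

Under-represented domains get a multiplier above 1 and are drawn more often, so the completed-rollout mix drifts back toward the target even when one domain finishes much more slowly than the others.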

Metrics

Per-domain stats: domain_mix_actual/{domain}, domain_mix_target/{domain}, domain_mix_count/{domain}.
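
As an illustration of how these keys could be populated, here is a hedged sketch. The function `domain_mix_metrics` is hypothetical, and reporting the actual and target values as fractions in [0, 1] (rather than percentages) is an assumption.

```python
from typing import Dict


def domain_mix_metrics(counts: Dict[str, int], target_mix: Dict[str, float]) -> Dict[str, float]:
    """Build per-domain count, target share, and observed share stats."""
    total_target = sum(target_mix.values())
    total_count = sum(counts.get(d, 0) for d in target_mix)
    stats: Dict[str, float] = {}
    for domain, weight in target_mix.items():
        count = counts.get(domain, 0)
        stats[f"domain_mix_count/{domain}"] = float(count)
        stats[f"domain_mix_target/{domain}"] = weight / total_target
        # Observed share of completions; 0.0 until any rollout has completed.
        stats[f"domain_mix_actual/{domain}"] = count / total_count if total_count else 0.0
    return stats
```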

conf/actor/web.yaml

conf/domain_mix/balanced.yaml

ollmer · 2025-11-24T11:33:19Z

conf/math_code.yaml

+  coding_compile_timeout_s: 10.0
+  coding_sandbox_url: ${oc.env:CODING_SANDBOX_URL, "http://sandbox:8080/run_code"}
+
+dataset_loader: pipelinerl.domains.multidomain.loader.load_datasets


Can we have multiple dataloaders at once, so that we load different datasets for different domains in the same experiment?

Yes, it's a proportional concatenation.

So how can we define multiple dataset_loader functions in a single config?

In your config you could do this (if you wanted math, coding and agentic_fn_calling):

  defaults:
    - base
    - multi_domain: base    # inits multidomain loader and dispatcher
    - domain_mix: main_mix  # or inline the mix as shown below

  actor:
    domain_mix:
      math: 0.4
      coding: 0.3
      fn_calling: 0.3

  train_dataset_names:
    - math::open_reasoner_zero_57k
    - coding::coding@train
    - fn_calling::fn_calling@train

  test_dataset_names:
    - math::math_500
    - coding::coding@validation
    - fn_calling::fn_calling@validation

pipelinerl/actor.py

pipelinerl/async_llm.py

pipelinerl/domain_sampling.py

pipelinerl/domains/coding/dataset.py

pipelinerl/domains/coding/rollouts.py

rafapi · 2025-11-30T12:32:00Z

Domain mix distribution:

Targets: math: 40%, coding: 30%, fn_calling: 30%

Initial distribution:

math=22.4%
coding=6.4%
fn_calling=71.2%

Using adaptive sampling (366k+ completions):

math=39.6%
coding=30.3%
fn_calling=30.1%
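
To put these numbers in terms of the rebalancing rule, the snippet below computes the `target / actual` multiplier each domain would receive given the initial distribution, and the largest gap from target in each case. It assumes the initial distribution is the mix observed before adaptive correction takes effect; the helper names are hypothetical.

```python
from typing import Dict

# Percentages reported in this comment.
target = {"math": 40.0, "coding": 30.0, "fn_calling": 30.0}
initial = {"math": 22.4, "coding": 6.4, "fn_calling": 71.2}
adaptive = {"math": 39.6, "coding": 30.3, "fn_calling": 30.1}


def correction_factors(
    target: Dict[str, float], actual: Dict[str, float], lo: float = 0.1, hi: float = 10.0
) -> Dict[str, float]:
    """target/actual multiplier per domain, clamped to [lo, hi]."""
    return {d: min(max(target[d] / actual[d], lo), hi) for d in target}


def max_abs_deviation(target: Dict[str, float], actual: Dict[str, float]) -> float:
    """Largest gap from target across domains, in percentage points."""
    return max(abs(actual[d] - target[d]) for d in target)


factors = correction_factors(target, initial)
for domain, factor in factors.items():
    print(f"{domain}: x{factor:.2f}")
print(f"max deviation, initial:  {max_abs_deviation(target, initial):.1f} pts")
print(f"max deviation, adaptive: {max_abs_deviation(target, adaptive):.1f} pts")
```

Coding, the slowest domain, receives the largest boost, while fn_calling is scaled down; none of the multipliers reaches the [0.1, 10.0] clamp for these values.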

rafapi added 30 commits November 6, 2025 19:26

Add environment selector

cac78d7

Fix env launcher

df1d846

Adapt domains to env registry

9735130

Adapt domain configs

bb5e5ca

Collect env info

a1a02bf

Remove unrelated files

e4d0bc4

Remove backup

da43cbc

Remove duplicates

5b18001

Restore

9af7329

add domains

599b510

add coding

32eb5b8

Add remaining loaders

efaec65

Domain mix tracking metrics

1220f6d

update domain rollouts

158b2ea

refresh async llm flow

fe8e728

sync coding init

6f9c5cc

expand coding dataset

455ed42

remove legacy executor

795e490

revise coding rollouts

68072b1

adjust multidomain loader

981cb74

refresh preprocess pipeline

f7e6946

enhance utils helpers

73ca9d1

add multi domain config

38ff188

introduce domain sampling

2c5ebfd

add coding sandbox test

40cf648

implement verifier api

2c74b77

add symbolic init

cdfe57b

add symbolic dataset

a3c4106

add symbolic rollouts

7eef15e

remove deleted domains

cc091ac