rafapi (Collaborator) commented Nov 17, 2025

Enables simultaneous training across multiple domains (math, coding, function calling) with domain-agnostic orchestration.

Architecture

| Component | Role |
| --- | --- |
| multidomain/loader.py | Parses :: syntax, concatenates datasets, injects domain field into each sample |
| dispatcher.py | Routes problem["domain"] → domain-specific rollout callable via actor.domain_rollouts mapping |
| domain_sampling.py | Weighted sampling with adaptive rebalancing based on completion ratios to maintain target mix despite varying rollout latencies |
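
The routing step in dispatcher.py can be pictured with a minimal sketch. This is not the PR's code: `resolve_callable` and `make_dispatcher` are hypothetical names, and the sketch assumes the values of actor.domain_rollouts are dotted import paths that resolve to a function taking the problem as its first argument.

```python
import importlib
from typing import Any, Callable, Mapping


def resolve_callable(dotted_path: str) -> Callable[..., Any]:
    """Import 'pkg.module.func' and return the function object (hypothetical helper)."""
    module_path, _, attr = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected a dotted path, got {dotted_path!r}")
    return getattr(importlib.import_module(module_path), attr)


def make_dispatcher(domain_rollouts: Mapping[str, str]) -> Callable[..., Any]:
    """Build a function that routes a problem to the rollout for its domain."""
    # Resolve every path up front so a bad entry fails at startup, not mid-run.
    table = {domain: resolve_callable(path) for domain, path in domain_rollouts.items()}

    def dispatch(problem: Mapping[str, Any], *args: Any, **kwargs: Any) -> Any:
        domain = problem.get("domain")
        if domain not in table:
            raise KeyError(f"no rollout registered for domain {domain!r}")
        return table[domain](problem, *args, **kwargs)

    return dispatch
```

With a mapping like the one under actor.domain_rollouts, `make_dispatcher(cfg_mapping)` would return a single callable that the orchestration layer can use without knowing which domains exist.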

Configuration

  actor:
    domain_mix:        # sampling weights (normalised at runtime)
      math: 0.4
      coding: 0.3
      fn_calling: 0.3
    domain_rollouts:   # domain → rollout function mapping
      math: pipelinerl.domains.math.generate_math_rollout
      coding: pipelinerl.domains.coding.generate_coding_rollout
    domain_system_prompts:  # per-domain system prompts
      coding: "You are an expert Python programmer..."

  train_dataset_names:
    - math::open_reasoner_zero_57k
    - coding::coding@train
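
As a rough illustration of how entries such as math::open_reasoner_zero_57k might be handled, the sketch below splits each name on `::`, loads the dataset through a per-domain loader, and tags every sample with its domain. The names `parse_spec`, `load_multidomain`, and the `loaders` mapping are assumptions made for this example, not the API of multidomain/loader.py. Anything after `::` (including an `@train` suffix) is passed through untouched.

```python
from typing import Any, Callable, Iterable, Mapping


def parse_spec(spec: str) -> tuple[str, str]:
    """Split 'domain::dataset' into its two parts."""
    domain, sep, dataset = spec.partition("::")
    if not sep or not domain or not dataset:
        raise ValueError(f"expected 'domain::dataset', got {spec!r}")
    return domain, dataset


def load_multidomain(
    specs: Iterable[str],
    loaders: Mapping[str, Callable[[str], Iterable[Mapping[str, Any]]]],
) -> list[dict[str, Any]]:
    """Concatenate all datasets, injecting a 'domain' field into each sample."""
    samples: list[dict[str, Any]] = []
    for spec in specs:
        domain, dataset = parse_spec(spec)
        if domain not in loaders:
            raise KeyError(f"no loader registered for domain {domain!r}")
        for sample in loaders[domain](dataset):
            # Copy the sample so the underlying dataset is not mutated.
            samples.append({**sample, "domain": domain})
    return samples
```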

Adaptive Sampling

DomainWeightedSampler adjusts weights dynamically: adjusted_weight = base_weight × (target_ratio / actual_ratio), clamped to [0.1, 10.0]. This compensates for domains with slower rollouts (e.g. coding sandbox execution) to maintain the configured mix in the output stream.
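
The rebalancing rule can be sketched as follows. This is an illustration under stated assumptions, not the DomainWeightedSampler implementation: the function names are hypothetical, actual_ratio is taken to be each domain's share of completed rollouts, and the clamp is applied to the correction factor (the description does not say whether it applies to the factor or to the final weight).

```python
import random
from typing import Mapping, Optional


def adjusted_weights(
    base_weights: Mapping[str, float],
    completed: Mapping[str, int],
    lo: float = 0.1,
    hi: float = 10.0,
) -> dict[str, float]:
    """Scale each base weight by target_ratio / actual_ratio, with a clamped factor."""
    total_weight = sum(base_weights.values())
    total_done = sum(completed.get(d, 0) for d in base_weights)
    if total_done == 0:
        # Nothing has completed yet, so there is nothing to correct for.
        return dict(base_weights)

    result = {}
    for domain, weight in base_weights.items():
        target_ratio = weight / total_weight
        actual_ratio = completed.get(domain, 0) / total_done
        # A domain with no completions gets the largest allowed boost.
        factor = hi if actual_ratio == 0 else target_ratio / actual_ratio
        factor = min(max(factor, lo), hi)
        result[domain] = weight * factor
    return result


def sample_domain(weights: Mapping[str, float], rng: Optional[random.Random] = None) -> str:
    """Draw one domain with probability proportional to its weight."""
    rng = rng or random.Random()
    domains = list(weights)
    return rng.choices(domains, weights=[weights[d] for d in domains], k=1)[0]
```

Under-represented domains end up with a factor above 1 and over-represented ones with a factor below 1, so the sampler leans toward whichever domain is lagging its target share.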

Metrics

Per-domain stats: domain_mix_actual/{domain}, domain_mix_target/{domain}, domain_mix_count/{domain}.
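
One way these three metric families could be derived from running counters is sketched below. The function name and the exact definitions are assumptions: here `actual` is a domain's share of completed rollouts, `target` is its normalised configured weight, and `count` is the raw number of completions.

```python
from typing import Mapping, Union


def domain_mix_metrics(
    target_weights: Mapping[str, float],
    counts: Mapping[str, int],
) -> dict[str, Union[int, float]]:
    """Build a flat metrics dict keyed as 'domain_mix_<kind>/<domain>'."""
    total_weight = sum(target_weights.values())
    total_count = sum(counts.get(d, 0) for d in target_weights)
    metrics: dict[str, Union[int, float]] = {}
    for domain, weight in target_weights.items():
        count = counts.get(domain, 0)
        metrics[f"domain_mix_target/{domain}"] = weight / total_weight
        # Report 0.0 rather than dividing by zero before any rollout completes.
        metrics[f"domain_mix_actual/{domain}"] = count / total_count if total_count else 0.0
        metrics[f"domain_mix_count/{domain}"] = count
    return metrics
```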

coding_compile_timeout_s: 10.0
coding_sandbox_url: ${oc.env:CODING_SANDBOX_URL, "http://sandbox:8080/run_code"}

dataset_loader: pipelinerl.domains.multidomain.loader.load_datasets

Collaborator

Can we have multiple dataloaders at once so we load different datasets for different domains in the same exp?

Collaborator Author

yes, it's a proportional concatenation

Collaborator

so how can we define multiple dataset_loader functions in a single config?

Collaborator Author

in your config you could do (if you wanted to do math, coding and agentic_fn_calling):

defaults:
  - base
  - multi_domain: base        # inits multidomain loader and dispatcher
  - domain_mix: main_mix      # Or inline the mix as shown below

actor:
  domain_mix:
    math: 0.4
    coding: 0.3
    fn_calling: 0.3

train_dataset_names:
  - math::open_reasoner_zero_57k
  - coding::coding@train
  - fn_calling::fn_calling@train

test_dataset_names:
  - math::math_500
  - coding::coding@validation
  - fn_calling::fn_calling@validation

rafapi (Collaborator, Author) commented Nov 30, 2025

Domain mix distribution:

Targets: math: 40%, coding: 30%, fn_calling: 30%

Initial distribution:

  • math=22.4%
  • coding=6.4%
  • fn_calling=71.2%

Using adaptive sampling (366k+ completions):

  • math=39.6%
  • coding=30.3%
  • fn_calling=30.1%

rafapi changed the title from "Implement multi-environment and multi-domain mixing" to "Multi-Domain RL Training" on Dec 2, 2025