@xinyuangui2 xinyuangui2 commented Nov 17, 2025

Summary

This PR adds support for per-dataset execution options in DataConfig, allowing users to specify different ExecutionOptions for different datasets. This enables fine-grained control over how each dataset is processed by Ray Data.

Use case

The train and validation (or test) datasets often need different values for execution_options.exclude_resources or execution_options.resource_limits. This PR makes those settings configurable per dataset.

For example, the validation dataset can be given its own ExecutionOptions while the train dataset keeps the defaults:

import ray.train
import ray.train.torch
from ray.data import ExecutionOptions, ExecutionResources

# train_func, train_dataset, and val_dataset are assumed to be defined elsewhere.
trainer = ray.train.torch.TorchTrainer(
    train_func,
    # Pass both datasets in the datasets arg so they are split across training workers
    datasets={"train": train_dataset, "val": val_dataset},
    scaling_config=ray.train.ScalingConfig(
        num_workers=2,
        use_gpu=True,
        # Use powerful GPUs for training
        accelerator_type="A100",
    ),
    dataset_config=ray.train.DataConfig(
        datasets_to_split="all",
        execution_options={
            # "train" is omitted, so it uses the default execution options
            "val": ExecutionOptions(
                exclude_resources=ExecutionResources.zero(),
                resource_limits=ExecutionResources.for_limits(cpu=1),
            ),
        },
        enable_shard_locality=True,
    ),
)
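
The fallback behavior in the example (a dataset missing from the mapping uses the default options) can be sketched without Ray. This is a minimal illustration, not the code in this PR: `Options`, `DEFAULT_OPTIONS`, and `resolve_options` are hypothetical names, and accepting a single options object for all datasets is an assumption about how the argument might be handled.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Union


@dataclass(frozen=True)
class Options:
    """Hypothetical stand-in for ray.data.ExecutionOptions (illustration only)."""

    cpu_limit: Optional[float] = None  # None means "no explicit limit"


DEFAULT_OPTIONS = Options()


def resolve_options(
    dataset_names: Iterable[str],
    execution_options: Union[None, Options, Dict[str, Options]] = None,
) -> Dict[str, Options]:
    """Return one Options object per dataset name.

    - None: every dataset gets the default options.
    - A single Options object: every dataset shares it (assumed behavior).
    - A dict: listed datasets get their own options, the rest get the default.
    """
    names = list(dataset_names)
    if execution_options is None:
        return {name: DEFAULT_OPTIONS for name in names}
    if isinstance(execution_options, Options):
        return {name: execution_options for name in names}
    return {name: execution_options.get(name, DEFAULT_OPTIONS) for name in names}


# Only "val" is overridden; "train" falls back to the default.
resolved = resolve_options(["train", "val"], {"val": Options(cpu_limit=1)})
print(resolved["train"])
print(resolved["val"])
```

Keeping the single-object form alongside the mapping means existing callers that pass one set of options would not need to change.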

@xinyuangui2 xinyuangui2 changed the title from "per dataset config" to "[Train] Per dataset execution_option for DataConfig" Nov 18, 2025
Signed-off-by: xgui <[email protected]>
Signed-off-by: xgui <[email protected]>
@xinyuangui2 xinyuangui2 marked this pull request as ready for review November 18, 2025 18:09
@xinyuangui2 xinyuangui2 requested a review from a team as a code owner November 18, 2025 18:09
@ray-gardener ray-gardener bot added the "train" (Ray Train Related Issue) and "data" (Ray Data-related issues) labels Nov 18, 2025

@justinvyu justinvyu left a comment

great, thanks! can you add a motivating usage example in the PR description?

@justinvyu justinvyu left a comment

Thanks! Almost good

@justinvyu justinvyu left a comment

Thanks!

@justinvyu

Btw, can you add a motivating usage example in the PR description?

@xinyuangui2

> Btw, can you add a motivating usage example in the PR description?

Done.

@justinvyu justinvyu enabled auto-merge (squash) November 24, 2025 20:16
@github-actions github-actions bot added the "go" label (add ONLY when ready to merge, run all tests) Nov 24, 2025
@justinvyu justinvyu merged commit a1167ad into ray-project:master Nov 24, 2025
7 of 8 checks passed
ykdojo pushed a commit to ykdojo/ray that referenced this pull request Nov 27, 2025
This PR adds support for per-dataset execution options in `DataConfig`,
allowing users to specify different `ExecutionOptions` for different
datasets. This enables fine-grained control over how each dataset is
processed by Ray Data.

---------

Signed-off-by: xgui <[email protected]>
Signed-off-by: Xinyuan <[email protected]>
Co-authored-by: Justin Yu <[email protected]>
Signed-off-by: YK <[email protected]>
SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request Dec 1, 2025