[RayTune+RayTrain] The PlacementGroupFactory in Ray Tune fails when using the Ray XGBoost trainer #47439

@sfc-gh-shchen

What happened + What you expected to happen

Description
The PlacementGroupFactory in Ray Tune works as expected for simple nested child tasks, but fails when using a nested Ray XGBoost trainer. The nested XGBoost trainer does not use the resources of the existing placement group; instead it creates an additional placement group, causing tuner.fit to hang forever on that PENDING placement group.

We kindly request the Ray team's assistance:

  • Is the issue described here expected behavior? If not, how should we address it? A rough timeline would be helpful if available.
  • Could you provide guidance on preventing hangs when placement groups can't be satisfied in Ray Tune? We'd prefer a graceful failure over an indefinite hang.

Steps to Reproduce (See Reproduction script section for full scripts)

  1. Set up a Ray cluster with 4 CPUs
  2. Use the following code structure:
from ray import tune
from ray.train import ScalingConfig
from ray.train.xgboost import XGBoostTrainer

def train_xgboost(config):
    trainer = XGBoostTrainer(
        scaling_config=ScalingConfig(
            num_workers=1,
            resources_per_worker={"CPU": 2},
        ),
        ...
    )
    result = trainer.fit()

pg = tune.PlacementGroupFactory([
    {"CPU": 1},
    {"CPU": 3},
])

tuner = tune.Tuner(
    tune.with_resources(train_xgboost, resources=pg),
    ...
)
tuner.fit()

Expected Behavior
The nested XGBoost trainer uses a total of 3 CPUs: 1 for the coordinator + num_workers (1) * CPUs per worker (2) = 3. The XGBoost trainer should therefore be able to use the already-created placement group bundle that has {"CPU": 3}.
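
For reference, this is my understanding of how the nested trainer's demand adds up (a minimal sketch; spelling out trainer_resources explicitly is only meant to mirror the documented 1-CPU default for the coordinator):

from ray.train import ScalingConfig

# Sketch of the expected arithmetic; trainer_resources is written out
# explicitly here only to illustrate the default 1-CPU coordinator.
scaling_config = ScalingConfig(
    trainer_resources={"CPU": 1},      # coordinator
    num_workers=1,
    resources_per_worker={"CPU": 2},   # 1 worker * 2 CPUs
)
# Total demand: 1 + 1 * 2 = 3 CPUs, which matches the second {"CPU": 3}
# bundle reserved by the outer PlacementGroupFactory.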

Actual Behavior
The tuner.fit call hangs forever, and the Ray dashboard shows the following resource usage:

Resource Status
Usage:
1.0/4.0 CPU (1.0 used of 4.0 reserved in placement groups)
0B/28.68GiB memory
35.87KiB/2.00GiB object_store_memory
Demands:
{'CPU': 3.0} * 1 (PACK): 1+ pending placement groups

and the placement group table shows the following:
[Screenshot of the placement group table, taken 2024-08-30 at 4:52:45 PM]

As shown in the placement group table, two placement groups were created:

  1. The first placement group, with bundles {"CPU": 1} and {"CPU": 3}; this PG was successfully created and passed down to Ray Tune.
  2. A second placement group with {"CPU": 3}; this PG could not be fulfilled because the entire cluster has 4 CPUs and all of them are already reserved by the first placement group. As a result, the second PG is stuck in the PENDING state, causing tuner.fit to hang.

In other words, the XGBoost trainer did not utilize the 3 CPUs reserved for it in the placement group; instead it created an additional placement group asking for 3 more CPUs. Since the cluster has a total of 4 CPUs, all of which are already reserved by the first placement group, the second placement group can never be satisfied and hangs forever.
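
For completeness, the PENDING state can also be inspected programmatically with ray.util.placement_group_table() (a minimal sketch; exact field names may vary slightly between Ray versions):

import ray

# List every placement group known to the cluster; in our run the second
# PG shows state "PENDING" while the first one shows "CREATED".
for pg_id, info in ray.util.placement_group_table().items():
    print(pg_id, info.get("state"), info.get("bundles"))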

Things I have Tried

  • It works if we don't specify the additional bundle in the PG. Changing [{"CPU": 1}, {"CPU": 3}] to [{"CPU": 1}] resolves the issue: the first PG uses 1 CPU, and the nested XGBoost trainer creates a second PG using 3 CPUs, totaling 4 CPUs (matching the Ray cluster's capacity). However, this workaround is inadequate. Ray doesn't pre-check whether the nested XGBoost trainer's resource request can ever be satisfied, so without careful tuning hangs may still occur, especially when max_concurrent_trials exceeds 1.

  • We can enforce a timeout at either the trial or the experiment level (see the sketch below). However, this is also inadequate because we don't necessarily know a suitable timeout value to set, given the varying sizes of our workloads.
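
For illustration, this is the kind of timeout we mean (a minimal sketch; the 600-second values are placeholders, not values we actually know to be appropriate for our workloads):

from ray import train, tune

tuner = tune.Tuner(
    tune.with_resources(train_xgboost, resources=pg),
    run_config=train.RunConfig(
        # Trial-level stop criterion: end a trial after 600 seconds of training time.
        stop={"time_total_s": 600},
    ),
    tune_config=tune.TuneConfig(
        # Experiment-level budget: stop the whole run after 600 seconds.
        time_budget_s=600,
    ),
)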

Versions / Dependencies

ray: 2.10.0
python: 3.11.5
OS: macOS

Reproduction script

from ray import tune
from ray import train
from ray.train.xgboost import XGBoostTrainer
from ray.train import ScalingConfig
import ray

# Set up a ray cluster with 4 cpus
ray.shutdown()
ray.init(num_cpus=4)


# Trainable: runs a nested XGBoost trainer inside the Tune trial
def train_xgboost(config):
    train_dataset = ray.data.from_items([{"x": x, "y": x + 1} for x in range(32)])
    trainer = XGBoostTrainer(
        label_column="y",
        params={"objective": "reg:squarederror"},
        scaling_config=ScalingConfig(
            num_workers=1,            
            resources_per_worker={"CPU": 2},            
        ),
        datasets={"train": train_dataset},
    )
    result = trainer.fit()
    train.report({"train-rmse": result.metrics["train-rmse"]})

pg = tune.PlacementGroupFactory([
    {"CPU": 1},
    {"CPU": 3}
])

tuner = tune.Tuner(    
    tune.with_resources(train_xgboost, resources=pg),
    param_space={},
    tune_config=tune.TuneConfig(
        metric="train-rmse",
        mode="min",
        num_samples=1,
        max_concurrent_trials=1,        
    ),
)
tuner.fit()

Issue Severity

High: It blocks me from completing my task.

Labels

bug: Something that is supposed to be working; but isn't
triage: Needs triage (eg: priority, bug/not-bug, and owning component)
tune: Tune-related issues
