
Conversation

@owenowenisme
Member

@owenowenisme owenowenisme commented Nov 19, 2025

Description

Fix aggregator actor OOMs for SF‑100 map_groups on an autoscaling cluster.

Main problems:

  • Even with “scheduling_strategy”: “SPREAD”, autoscaling causes greedy placement on the same nodes (node fills → autoscale → schedule on new node → autoscale), so actors cluster on the first few nodes.
  • Available memory calculations didn’t exclude the Object Store, a known issue ([Core] Actor/Task Memory resource allocation should be separate from Object Store allocation  #49344).
  • Aggregator resource miscalculation: we observed ≈1.7 GiB spikes, but only allocated 1.13 GiB per actor. Previously, aggregator memory was computed as dataset_size/num_aggregators + dataset_size/num_partitions (input + output).
    This is incorrect because partition size is determined by num_partitions, not num_aggregators, and some aggregators receive multiple partitions via partition_id % num_aggregators, leading to under‑allocation. For SF‑100, the dataset size is ≈82.1 GiB.
    The old formula gives (82.1/128) + (82.1/200) ≈ 1.13 GiB. For an aggregator receiving 2 partitions, the max memory should be (82.1/200 per‑partition) × 2 (input + output) × 2 (ceil(num_partitions/num_aggregators)) ≈ 1.76 GiB, matching the observed ≈1.7 GiB peak.
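
The difference between the two estimates can be sketched as follows. This is a toy illustration with round numbers (a 100 GiB dataset), not the actual Ray Data implementation or the SF-100 values; all variable names are illustrative:

```python
import math

GiB = 1024 ** 3

# Toy inputs: 100 GiB dataset, 200 partitions, 128 aggregator actors.
dataset_size = 100 * GiB
num_partitions = 200
num_aggregators = 128

# Old (under-allocating) estimate: per-aggregator input share plus a single
# output partition, i.e. dataset_size/num_aggregators + dataset_size/num_partitions.
old_estimate = dataset_size / num_aggregators + dataset_size / num_partitions

# New estimate: an aggregator can receive up to
# ceil(num_partitions / num_aggregators) partitions, each held twice
# (input + output).
partition_size = dataset_size / num_partitions
max_partitions = math.ceil(num_partitions / num_aggregators)
new_estimate = partition_size * 2 * max_partitions

print(f"old ≈ {old_estimate / GiB:.2f} GiB, new ≈ {new_estimate / GiB:.2f} GiB")
# old ≈ 1.28 GiB, new ≈ 2.00 GiB
```

With these toy numbers the old formula undershoots by roughly a third, mirroring the 1.13 GiB vs 1.76 GiB gap described above.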

This fails only on autoscaling clusters because the resource budget is tighter than on fixed‑size clusters (≈18 nodes vs. 32 nodes), and the aggregators aren't well spread.

Job URL: https://console.anyscale-staging.com/o/anyscale-internal/jobs/prodjob_z2pll84knil1iexjhwfcmislan

Fixes

  • Correct the aggregator memory calculation from dataset_size/num_aggregators + dataset_size/num_partitions to per‑partition size × 2 (input and output) × ceil(num_partitions/num_aggregators); for SF‑100 this is (82.1/200) × 2 × 2 ≈ 1.76 GiB.
    - Compute CPU from the real available memory, excluding the Object Store proportion. (This may not be the best way, but it's what we can do right now.)

  • Manually exclude Object Store Memory proportion when calculating target_num_cpus for aggregators

        worker_heap_memory_proportion = 1 - DEFAULT_OBJECT_STORE_MEMORY_PROPORTION
        target_num_cpus = min(
            cap,
            estimated_aggregator_memory_required
            / (4 * GiB * worker_heap_memory_proportion),
        )
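
A runnable sketch of this CPU sizing follows. It assumes DEFAULT_OBJECT_STORE_MEMORY_PROPORTION is Ray's default Object Store share of 0.3 and takes the 4 GiB-per-CPU heap budget from the snippet above; the cap value here is made up for illustration:

```python
GiB = 1024 ** 3
# Assumption: Ray reserves 30% of node memory for the Object Store by default.
DEFAULT_OBJECT_STORE_MEMORY_PROPORTION = 0.3

def target_num_cpus(estimated_aggregator_memory_required: float, cap: float) -> float:
    # Only the non-Object-Store share of the 4 GiB-per-CPU budget is usable heap.
    worker_heap_memory_proportion = 1 - DEFAULT_OBJECT_STORE_MEMORY_PROPORTION
    return min(
        cap,
        estimated_aggregator_memory_required
        / (4 * GiB * worker_heap_memory_proportion),
    )

# With the corrected ~1.76 GiB estimate: 1.76 / (4 * 0.7) ~= 0.63 CPUs.
print(round(target_num_cpus(1.76 * GiB, cap=1.0), 2))
```

Without the Object Store exclusion the same 1.76 GiB estimate would map to only 1.76 / 4 ≈ 0.44 CPUs, i.e. a smaller per-actor heap reservation.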

Related issues

Closes #58734

bveeramani and others added 7 commits November 17, 2025 13:25
@owenowenisme owenowenisme force-pushed the data/fix-sf-100-release-test branch from b30adf5 to d9ba322 Compare November 19, 2025 09:14
Signed-off-by: You-Cheng Lin <[email protected]>
Signed-off-by: You-Cheng Lin <[email protected]>
Signed-off-by: You-Cheng Lin <[email protected]>
Signed-off-by: You-Cheng Lin <[email protected]>
Signed-off-by: You-Cheng Lin <[email protected]>
@owenowenisme owenowenisme added go add ONLY when ready to merge, run all tests data Ray Data-related issues release-test release test labels Nov 22, 2025
Signed-off-by: You-Cheng Lin <[email protected]>
@owenowenisme owenowenisme marked this pull request as ready for review November 23, 2025 16:31
@owenowenisme owenowenisme requested a review from a team as a code owner November 23, 2025 16:31
Comment on lines 1267 to 1277
f"shuffle={aggregator_shuffle_object_store_memory_required / MiB:.1f}MiB, "
f"output={output_object_store_memory_required / MiB:.1f}MiB, "
Contributor

Let's keep these line-items separate as they help estimate individual components

Member Author

Not sure what this means, but I restored the log line to show the shuffle and output memory components.

# Output (object store)
output_object_store_memory_required = (
    max_partitions_for_aggregator * partition_byte_size_estimate * 2
)
Contributor

Let's preserve the structure of estimating individual components (shuffle, output, working memory)

Comment on lines 1246 to 1269
# Estimate of object store memory required to accommodate all partitions
# handled by a single aggregator
aggregator_shuffle_object_store_memory_required: int = math.ceil(
    estimated_dataset_bytes / num_aggregators
)
# Estimate of memory required to accommodate a single partition as an output
# (inside the Object Store)
output_object_store_memory_required: int = partition_byte_size_estimate

aggregator_total_memory_required: int = (
    # Inputs (object store)
    aggregator_shuffle_object_store_memory_required
    +
    # Output (object store)
    output_object_store_memory_required
)
Contributor

What I meant is the following:

  • We still want to estimate total memory required based on corresponding components (shuffle, output, etc)
  • Core change you're making is moving from dataset_size / num_aggregators to partition_byte_size * max_partitions, which is just a rounded version

Let's keep the overall structure and just make the actual change you want to make by updating shuffle component estimate.

BTW, we can add a skew factor (say, 30% extra as padding against skew)
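
Combined with the later suggestion to make the padding a patchable class variable, the reviewer's idea could look roughly like this. This is a hypothetical helper class, not the actual HashShuffleOperator code; the names are illustrative:

```python
import math

class AggregatorMemoryEstimator:
    # Class var (rather than a local constant) so tests can patch the padding.
    SHUFFLE_AGGREGATOR_MEMORY_ESTIMATE_SKEW_FACTOR = 1.3  # 30% padding for skew

    @classmethod
    def estimate(cls, partition_byte_size: int, num_partitions: int,
                 num_aggregators: int) -> int:
        # An aggregator handles up to ceil(num_partitions / num_aggregators)
        # partitions, each counted twice (input + output).
        max_partitions = math.ceil(num_partitions / num_aggregators)
        base = partition_byte_size * max_partitions * 2
        return math.ceil(base * cls.SHUFFLE_AGGREGATOR_MEMORY_ESTIMATE_SKEW_FACTOR)
```

Patching the padding in a test then becomes a one-liner, e.g. setting `AggregatorMemoryEstimator.SHUFFLE_AGGREGATOR_MEMORY_ESTIMATE_SKEW_FACTOR = 1.0` via monkeypatch.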

Signed-off-by: You-Cheng Lin <[email protected]>
Signed-off-by: You-Cheng Lin <[email protected]>
@owenowenisme owenowenisme force-pushed the data/fix-sf-100-release-test branch from b082697 to 11f563e Compare December 3, 2025 06:28
Comment on lines 1261 to 1262
# Add 30% buffer to account for data skew
SKEW_FACTOR = 1.3
Contributor

Previous comment got lost -- make this a class var so that we can patch it if needed

) # Estimated byte size of a single partition

# Add 30% buffer to account for data skew
SKEW_FACTOR = 1.3
Contributor

Suggested change
SKEW_FACTOR = 1.3
SHUFFLE_AGGREGATOR_MEMORY_ESTIMATE_SKEW_FACTOR = 1.3

Signed-off-by: You-Cheng Lin <[email protected]>
Signed-off-by: You-Cheng Lin <[email protected]>
@cursor cursor bot left a comment

Bug: Test expected values not updated for new memory formula

The test_hash_shuffle_operator_remote_args test cases have expected memory values that match the old memory calculation formula, but HashShuffleOperator._estimate_aggregator_memory_allocation was changed to use a new formula. The new formula multiplies partition_byte_size * max_partitions_for_aggregator for both input and output (plus a 1.3 skew factor), resulting in significantly larger memory values. For example, Case 1 expects 671088640 but the new formula produces approximately 1,395,864,372. These tests will fail until the expected values are updated.

python/ray/data/tests/test_hash_shuffle.py#L392-L393

"num_cpus": 0.16,
"memory": 671088640,

python/ray/data/tests/test_hash_shuffle.py#L440-L441

"num_cpus": 0.16, # ~0.6Gb / 4Gb = ~0.16
"memory": 687865856,
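
As a rough sanity check of the bot's arithmetic: the 512 MiB partition size and single partition per aggregator below are guesses about the test fixture, not values from the source, but they do reproduce the figure the bot reports:

```python
import math

MiB = 1024 ** 2
SKEW_FACTOR = 1.3

# Guessed fixture: 512 MiB per partition, one partition per aggregator.
partition_byte_size = 512 * MiB
max_partitions_for_aggregator = 1

# New formula: partition size x partitions x 2 (input + output) x skew padding.
new_memory = math.ceil(
    partition_byte_size * max_partitions_for_aggregator * 2 * SKEW_FACTOR
)
print(new_memory)  # 1395864372, the value the bot says the new formula produces
```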



Signed-off-by: You-Cheng Lin <[email protected]>
Signed-off-by: You-Cheng Lin <[email protected]>
@alexeykudinkin alexeykudinkin merged commit 8437df4 into ray-project:master Dec 4, 2025
6 checks passed
bveeramani added a commit that referenced this pull request Dec 8, 2025
… lint (#59247)

Our CI uses Bazel to run Python tests (not pytest). If we don't call
`pytest.main` in a test module, the tests aren't actually run.

#58816 fixed this issue for
`test_hash_shuffle`. As a follow up, this PR removes the
`test_hash_shuffle` from the list of files excluded from the
`pytest.main` lint.

Signed-off-by: Balaji Veeramani <[email protected]>

Labels

data Ray Data-related issues go add ONLY when ready to merge, run all tests release-test release test

Development

Successfully merging this pull request may close these issues.

[Data] map_groups release test fails with scale-factor 100 on autoscaling cluster

3 participants