[Data] Improve memory leak test robustness #58968
Conversation
Remove `list` parameter and increase tolerance threshold for the Arrow type inference memory leak test. The memory leak (apache/arrow#45493) specifically occurs with ndarray objects, not lists—testing the list case added no value. The previous 1 MiB threshold was too tight, causing flaky failures due to normal RSS measurement noise. The new approach calls `_infer_pyarrow_type` 8 times in a loop, which would leak ~56 MiB if the bug regresses, and uses a 64 MiB threshold to provide clear signal while avoiding false positives. Signed-off-by: Balaji Veeramani <[email protected]>
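For orientation, here is a minimal sketch of the test shape this commit describes. The import path, test name, array size, and the call signature of `_infer_pyarrow_type` are assumptions for illustration; the real test lives in Ray Data's test suite.

```python
import gc

import numpy as np
import psutil

# Assumed import path: _infer_pyarrow_type is internal to Ray Data, and its
# exact module (and call signature) may differ from this sketch.
from ray.data._internal.arrow_ops.transform_pyarrow import _infer_pyarrow_type

MiB = 1024 * 1024


def test_infer_pyarrow_type_does_not_leak_memory():
    # The leak (apache/arrow#45493) occurs for ndarray inputs specifically,
    # so the test exercises a single large ndarray rather than a list case.
    ndarray = np.zeros(7 * MiB, dtype=np.uint8)  # ~7 MiB payload (assumed size)
    num_repetitions = 8

    process = psutil.Process()
    before = process.memory_info().rss

    # Without Ray Data's workaround, this loop leaks far more than the
    # threshold below, so a regression fails unambiguously.
    for _ in range(num_repetitions):
        _infer_pyarrow_type([ndarray])

    gc.collect()
    after = process.memory_info().rss

    # 64 MiB absorbs normal RSS noise but is far below the growth a real
    # regression would cause.
    assert after - before < 64 * MiB, after - before
```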
Code Review
This pull request improves the robustness of a memory leak test related to _infer_pyarrow_type. The changes focus the test on the specific case that causes the leak (ndarray objects), remove unnecessary parametrization, and make the test less flaky by amplifying the potential leak and increasing the memory threshold. The changes are well-justified and improve the test's reliability. I have one suggestion to improve a comment for clarity.
    # Call the function several times. If there's a memory leak, this loop will leak
    # as much as 1 GiB of memory.
The comment states that the loop will leak "as much as 1 GiB of memory", which is an overstatement and could be confusing. Based on the test setup (an ~7 MiB array processed 8 times), the expected memory leak would be around 56 MiB if the bug regresses. The PR description also mentions this amount. To avoid confusion, it would be better to update the comment to reflect the expected leak size more accurately.
Suggested change:

    - # Call the function several times. If there's a memory leak, this loop will leak
    - # as much as 1 GiB of memory.
    + # Call the function several times. If there's a memory leak of ~7 MiB per call,
    + # this loop will leak ~56 MiB.
This isn't true. I ran this, and it leaked 1 GiB
owenowenisme left a comment:
Overall LGTM, just some nit comments
    after = process.memory_info().rss

    - assert after - before < 1024 * 1024, after - before
    + assert after - before < 64 * MiB, memory_string(after - before)
nit: Something like `8 * 7 MiB * some margin of error` would be much better
Added a named constant rather than a magic number. Ideally it should be 0 MiB, because we garbage collect after the loop.
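For illustration, the named-constant style might look like the sketch below. `MiB` and `memory_string` appear in the diff above, but their definitions here are assumptions, not the PR's actual code.

```python
# Assumed definitions; the PR's actual helpers may differ.
MiB = 1024 * 1024  # named constant replacing the bare 1024 * 1024


def memory_string(num_bytes: int) -> str:
    # Format a byte count as MiB so assertion failures read clearly.
    return f"{num_bytes / MiB:.1f} MiB"


# Usage, as in the new assertion above:
# assert after - before < 64 * MiB, memory_string(after - before)
```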
On second thought, reverted from 8 * 7 MiB * margin of error to a fixed threshold because of this reasoning: #58968 (comment)
Ahh okay, I thought an ndarray is exactly 7 MiB
    after = process.memory_info().rss

    - assert after - before < 1024 * 1024, after - before
    + margin_of_error = ndarray.nbytes * num_repetitions
Bug: Margin of error calculation allows excessive memory growth
The margin_of_error is set to ndarray.nbytes * num_repetitions, which allows up to 56 MiB of memory growth (8 repetitions × 7 MiB). Since garbage collection runs after the loop, the expected memory growth should ideally be zero. This permissive margin would allow approximately one leaked array copy per iteration, potentially masking the very memory leak the test aims to detect. The margin should account only for normal RSS measurement noise, not the cumulative size of all iterations.
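A sketch of the tighter assertion this comment argues for, reusing `after`, `before`, `MiB`, and `memory_string` from the surrounding test; the margin value is an assumed example, not taken from the PR.

```python
# Margin sized for RSS measurement noise only (assumed example value),
# rather than ndarray.nbytes * num_repetitions, which would tolerate one
# leaked copy of the array per iteration.
RSS_NOISE_MARGIN = 16 * MiB
assert after - before < RSS_NOISE_MARGIN, memory_string(after - before)
```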
## Why are these changes needed?

The memory leak being tested ([apache/arrow#45493](apache/arrow#45493)) specifically occurs when inferring types from **ndarray objects**, not from lists containing ndarrays. Testing the `list` case added no value since the leak doesn't manifest there; it only added execution time and obscured the test's purpose.

More importantly, the previous 1 MiB threshold was too tight and caused flaky failures. Memory measurements via RSS are inherently noisy due to OS-level allocation behavior, garbage collection timing, and memory fragmentation. A test that occasionally uses 1.1 MiB would fail despite no actual leak.

The new approach:

- **Calls `_infer_pyarrow_type` 8 times in a loop**, which leaks 1 GiB without Ray Data's workaround (admittedly, 8 is a magic number here)
- **Uses a 64 MiB threshold**, providing a much larger margin above normal variation while still catching any real leak with a clear signal

This creates a much stronger test: if the leak exists, we'd see memory growth approaching 1 GiB (with repeated runs), making failures unambiguous. Meanwhile, normal RSS fluctuations of a few MiB won't trigger false positives.

Signed-off-by: Balaji Veeramani <[email protected]>
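As a self-contained way to observe the RSS noise the description refers to (purely illustrative, not part of the PR; numbers vary by OS and allocator):

```python
import gc

import numpy as np
import psutil

MiB = 1024 * 1024
process = psutil.Process()

before = process.memory_info().rss

# Allocate and immediately free ~7 MiB, then collect garbage. RSS rarely
# returns exactly to its prior value because the allocator may keep pages
# mapped, so small positive deltas are normal and are not leaks.
buf = np.zeros(7 * MiB, dtype=np.uint8)
del buf
gc.collect()

after = process.memory_info().rss
print(f"RSS delta after alloc/free: {(after - before) / MiB:+.2f} MiB")
```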