
Conversation

Member

@bveeramani bveeramani commented Nov 25, 2025

Why are these changes needed?

The memory leak being tested (apache/arrow#45493) specifically occurs when inferring types from ndarray objects, not from lists containing ndarrays. Testing the list case added no value since the leak doesn't manifest there—it only added execution time and obscured the test's purpose.

More importantly, the previous 1 MiB threshold was too tight and caused flaky failures. Memory measurements via RSS are inherently noisy due to OS-level allocation behavior, garbage collection timing, and memory fragmentation. A test that occasionally uses 1.1 MiB would fail despite no actual leak.

The new approach:

  • Calls _infer_pyarrow_type 8 times in a loop, which leaks 1 GiB without Ray Data's workaround (admittedly, 8 is a magic number here)
  • Uses a 64 MiB threshold, providing a much larger margin above normal variation while still catching any real leak with a clear signal

This creates a much stronger test: if the leak exists, we'd see memory growth approaching 1 GiB (with repeated runs), making failures unambiguous. Meanwhile, normal RSS fluctuations of a few MiB won't trigger false positives.
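For context, here is a rough sketch of the test shape described above. The import path, array size, and call signature are assumptions drawn from this thread, not the merged code:

```python
import gc

import numpy as np
import psutil

MiB = 1024 * 1024
NUM_REPETITIONS = 8  # the "magic number" mentioned above


def test_infer_pyarrow_type_does_not_leak():
    # Assumed import path; adjust to wherever _infer_pyarrow_type lives in Ray Data.
    from ray.air.util.tensor_extensions.arrow import _infer_pyarrow_type

    ndarray = np.zeros(7 * MiB, dtype=np.uint8)  # ~7 MiB payload per call (assumed size)
    process = psutil.Process()

    before = process.memory_info().rss
    # Without Ray Data's workaround for apache/arrow#45493, this loop leaks
    # on the order of 1 GiB; with the workaround, growth should stay near zero.
    for _ in range(NUM_REPETITIONS):
        _infer_pyarrow_type([ndarray])
    gc.collect()
    after = process.memory_info().rss

    # 64 MiB is far above normal RSS noise but far below the leak's magnitude.
    assert after - before < 64 * MiB, after - before
```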

Remove `list` parameter and increase tolerance threshold for the
Arrow type inference memory leak test.

The memory leak (apache/arrow#45493) specifically occurs with ndarray
objects, not lists—testing the list case added no value. The previous
1 MiB threshold was too tight, causing flaky failures due to normal
RSS measurement noise.

The new approach calls `_infer_pyarrow_type` 8 times in a loop, which
would leak ~56 MiB if the bug regresses, and uses a 64 MiB threshold
to provide clear signal while avoiding false positives.

Signed-off-by: Balaji Veeramani <[email protected]>
@bveeramani bveeramani requested a review from a team as a code owner November 25, 2025 08:38
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request improves the robustness of a memory leak test related to _infer_pyarrow_type. The changes focus the test on the specific case that causes the leak (ndarray objects), remove unnecessary parametrization, and make the test less flaky by amplifying the potential leak and increasing the memory threshold. The changes are well-justified and improve the test's reliability. I have one suggestion to improve a comment for clarity.

Comment on lines 140 to 141
# Call the function several times. If there's a memory leak, this loop will leak
# as much as 1 GiB of memory.
Contributor


medium

The comment states that the loop will leak "as much as 1 GiB of memory", which is an overstatement and could be confusing. Based on the test setup (an ~7 MiB array processed 8 times), the expected memory leak would be around 56 MiB if the bug regresses. The PR description also mentions this amount. To avoid confusion, it would be better to update the comment to reflect the expected leak size more accurately.

Suggested change

- # Call the function several times. If there's a memory leak, this loop will leak
- # as much as 1 GiB of memory.
+ # Call the function several times. If there's a memory leak of ~7 MiB per call,
+ # this loop will leak ~56 MiB.

Member Author


This isn't true. I ran this, and it leaked 1 GiB

@bveeramani bveeramani added the go add ONLY when ready to merge, run all tests label Nov 25, 2025
Member

@owenowenisme owenowenisme left a comment


Overall LGTM, just some nit comments

  after = process.memory_info().rss

- assert after - before < 1024 * 1024, after - before
+ assert after - before < 64 * MiB, memory_string(after - before)
Member

@owenowenisme owenowenisme Nov 25, 2025


nit: Something like `8 * 7 MiB * some margin of error` would be much better

Member Author


Added a named constant rather than a magic number. Ideally this should be 0 MiB because we garbage collect after the loop.

Member Author


On second thought, reverted from 8 * 7 MiB * margin of error to a fixed threshold because of this reasoning: #58968 (comment)

Member


Ahh okay, I thought an ndarray is exactly 7 MiB
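For reference, the two threshold styles debated in this thread look roughly like this. The values and variable names are illustrative placeholders, not the merged code:

```python
MiB = 1024 * 1024

ndarray_nbytes = 7 * MiB   # assumed size of the test array
num_repetitions = 8
rss_growth = 3 * MiB       # e.g., ordinary RSS measurement noise

# (a) Margin derived from the workload, as in the nit above. Its meaning
#     shifts whenever the array size or loop count changes.
margin_of_error = ndarray_nbytes * num_repetitions  # 56 MiB here
assert rss_growth < margin_of_error

# (b) Fixed threshold, which is what was kept: sized against RSS noise and
#     still small relative to the ~1 GiB leak the test is meant to catch.
assert rss_growth < 64 * MiB
```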

Signed-off-by: Balaji Veeramani <[email protected]>
@bveeramani bveeramani enabled auto-merge (squash) November 25, 2025 09:21
@github-actions github-actions bot disabled auto-merge November 25, 2025 09:22
Signed-off-by: Balaji Veeramani <[email protected]>
Signed-off-by: Balaji Veeramani <[email protected]>
  after = process.memory_info().rss

- assert after - before < 1024 * 1024, after - before
+ margin_of_error = ndarray.nbytes * num_repetitions

Bug: Margin of error calculation allows excessive memory growth

The margin_of_error is set to ndarray.nbytes * num_repetitions, which allows up to 56 MiB of memory growth (8 repetitions × 7 MiB). Since garbage collection runs after the loop, the expected memory growth should ideally be zero. This permissive margin would allow approximately one leaked array copy per iteration, potentially masking the very memory leak the test aims to detect. The margin should account only for normal RSS measurement noise, not the cumulative size of all iterations.


Signed-off-by: Balaji Veeramani <[email protected]>
@ray-gardener ray-gardener bot added the data Ray Data-related issues label Nov 25, 2025
@bveeramani bveeramani merged commit a9d5464 into master Nov 25, 2025
6 checks passed
@bveeramani bveeramani deleted the bveeramani/improve-memory-leak-test branch November 25, 2025 19:02
SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request Dec 1, 2025