[Data] Improve memory leak test robustness #58968
Conversation
Remove `list` parameter and increase tolerance threshold for the Arrow type inference memory leak test. The memory leak (apache/arrow#45493) specifically occurs with ndarray objects, not lists—testing the list case added no value. The previous 1 MiB threshold was too tight, causing flaky failures due to normal RSS measurement noise. The new approach calls `_infer_pyarrow_type` 8 times in a loop, which would leak ~56 MiB if the bug regresses, and uses a 64 MiB threshold to provide clear signal while avoiding false positives. Signed-off-by: Balaji Veeramani <[email protected]>
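For orientation, here is a minimal sketch of the test shape this commit describes. The import path, test name, array size, and the call signature of `_infer_pyarrow_type` are assumptions for illustration; the real test lives in Ray Data's test suite.

```python
import gc

import numpy as np
import psutil

# Assumed import path: _infer_pyarrow_type is internal to Ray Data, and its
# exact module (and call signature) may differ from this sketch.
from ray.data._internal.arrow_ops.transform_pyarrow import _infer_pyarrow_type

MiB = 1024 * 1024


def test_infer_pyarrow_type_does_not_leak_memory():
    # The leak (apache/arrow#45493) occurs for ndarray inputs specifically,
    # so the test exercises a single large ndarray rather than a list case.
    ndarray = np.zeros(7 * MiB, dtype=np.uint8)  # ~7 MiB payload (assumed size)
    num_repetitions = 8

    process = psutil.Process()
    before = process.memory_info().rss

    # Without Ray Data's workaround, this loop leaks far more than the
    # threshold below, so a regression fails unambiguously.
    for _ in range(num_repetitions):
        _infer_pyarrow_type([ndarray])

    gc.collect()
    after = process.memory_info().rss

    # 64 MiB absorbs normal RSS noise but is far below the growth a real
    # regression would cause.
    assert after - before < 64 * MiB, after - before
```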
Code Review
This pull request improves the robustness of a memory leak test related to _infer_pyarrow_type. The changes focus the test on the specific case that causes the leak (ndarray objects), remove unnecessary parametrization, and make the test less flaky by amplifying the potential leak and increasing the memory threshold. The changes are well-justified and improve the test's reliability. I have one suggestion to improve a comment for clarity.
    # Call the function several times. If there's a memory leak, this loop will leak
    # as much as 1 GiB of memory.
The comment states that the loop will leak "as much as 1 GiB of memory", which is an overstatement and could be confusing. Based on the test setup (an ~7 MiB array processed 8 times), the expected memory leak would be around 56 MiB if the bug regresses. The PR description also mentions this amount. To avoid confusion, it would be better to update the comment to reflect the expected leak size more accurately.
Suggested change:

    - # Call the function several times. If there's a memory leak, this loop will leak
    - # as much as 1 GiB of memory.
    + # Call the function several times. If there's a memory leak of ~7 MiB per call,
    + # this loop will leak ~56 MiB.
This isn't true. I ran this, and it leaked 1 GiB
owenowenisme left a comment:
Overall LGTM, just some nit comments
    after = process.memory_info().rss

    - assert after - before < 1024 * 1024, after - before
    + assert after - before < 64 * MiB, memory_string(after - before)
nit: Something like `8 * 7 MiB * some margin of error` would be much better
Added a named constant rather than a magic number. Ideally it should be 0 MiB, because we garbage collect after the loop.
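For illustration, the named-constant style might look like the sketch below. `MiB` and `memory_string` appear in the diff above, but their definitions here are assumptions, not the PR's actual code.

```python
# Assumed definitions; the PR's actual helpers may differ.
MiB = 1024 * 1024  # named constant replacing the bare 1024 * 1024


def memory_string(num_bytes: int) -> str:
    # Format a byte count as MiB so assertion failures read clearly.
    return f"{num_bytes / MiB:.1f} MiB"


# Usage, as in the new assertion above:
# assert after - before < 64 * MiB, memory_string(after - before)
```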
On second thought, reverted from 8 * 7 MiB * margin of error to a fixed threshold because of this reasoning: #58968 (comment)
Ahh okay, I thought an ndarray is exactly 7 MiB
    after = process.memory_info().rss

    - assert after - before < 1024 * 1024, after - before
    + margin_of_error = ndarray.nbytes * num_repetitions
Bug: Margin of error calculation allows excessive memory growth
The margin_of_error is set to ndarray.nbytes * num_repetitions, which allows up to 56 MiB of memory growth (8 repetitions × 7 MiB). Since garbage collection runs after the loop, the expected memory growth should ideally be zero. This permissive margin would allow approximately one leaked array copy per iteration, potentially masking the very memory leak the test aims to detect. The margin should account only for normal RSS measurement noise, not the cumulative size of all iterations.
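A sketch of the tighter assertion this comment argues for, reusing `after`, `before`, `MiB`, and `memory_string` from the surrounding test; the margin value is an assumed example, not taken from the PR.

```python
# Margin sized for RSS measurement noise only (assumed example value),
# rather than ndarray.nbytes * num_repetitions, which would tolerate one
# leaked copy of the array per iteration.
RSS_NOISE_MARGIN = 16 * MiB
assert after - before < RSS_NOISE_MARGIN, memory_string(after - before)
```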
## Why are these changes needed?

The memory leak being tested ([apache/arrow#45493](apache/arrow#45493)) specifically occurs when inferring types from **ndarray objects**, not from lists containing ndarrays. Testing the `list` case added no value since the leak doesn't manifest there; it only added execution time and obscured the test's purpose.

More importantly, the previous 1 MiB threshold was too tight and caused flaky failures. Memory measurements via RSS are inherently noisy due to OS-level allocation behavior, garbage collection timing, and memory fragmentation. A test that occasionally uses 1.1 MiB would fail despite no actual leak.

The new approach:

- **Calls `_infer_pyarrow_type` 8 times in a loop**, which leaks 1 GiB without Ray Data's workaround (admittedly, 8 is a magic number here)
- **Uses a 64 MiB threshold**, providing a much larger margin above normal variation while still catching any real leak with a clear signal

This creates a much stronger test: if the leak exists, we'd see memory growth approaching 1 GiB (with repeated runs), making failures unambiguous. Meanwhile, normal RSS fluctuations of a few MiB won't trigger false positives.

Signed-off-by: Balaji Veeramani <[email protected]>
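As a self-contained way to observe the RSS noise the description refers to (purely illustrative, not part of the PR; numbers vary by OS and allocator):

```python
import gc

import numpy as np
import psutil

MiB = 1024 * 1024
process = psutil.Process()

before = process.memory_info().rss

# Allocate and immediately free ~7 MiB, then collect garbage. RSS rarely
# returns exactly to its prior value because the allocator may keep pages
# mapped, so small positive deltas are normal and are not leaks.
buf = np.zeros(7 * MiB, dtype=np.uint8)
del buf
gc.collect()

after = process.memory_info().rss
print(f"RSS delta after alloc/free: {(after - before) / MiB:+.2f} MiB")
```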