ARROW-4582: [Python/C++] Acquire the GIL on Py_INCREF #3655
Would you mind adding a test for it?
@kszucs I have no idea how to test this. It's a race condition that occurred only very rarely.
Codecov Report
| | master | #3655 | +/- |
|---|---|---|---|
| Coverage | 87.76% | 87.79% | +0.02% |
| Files | 689 | 689 | |
| Lines | 83984 | 84014 | +30 |
| Branches | 1081 | 1081 | |
| Hits | 73712 | 73759 | +47 |
| Misses | 10157 | 10144 | -13 |
| Partials | 115 | 111 | -4 |
Green AppVeyor build: https://ci.appveyor.com/project/xhochy/arrow/builds/22398318
@xhochy nice catch!! Taking a closer look |
wesm left a comment:
+1
Minimal reproducing example:
```
import dask
import pandas as pd
import pyarrow as pa
import numpy as np


def segfault_me(df):
    pa.Table.from_pandas(df, nthreads=1)


while True:
    df = pd.DataFrame(
        {"P": np.arange(0, 10), "L": np.arange(0, 10), "TARGET": np.arange(10, 20)}
    )
    # The same df is deliberately shared by all delayed tasks.
    dask.compute([
        dask.delayed(segfault_me)(df),
        dask.delayed(segfault_me)(df),
        dask.delayed(segfault_me)(df),
        dask.delayed(segfault_me)(df),
        dask.delayed(segfault_me)(df),
    ])
```
Segfaults are more likely when run under AddressSanitizer or on an otherwise slow system with many cores. It is important that the same df is always passed into the functions.
The issue was that the reference count of the underlying NumPy array was incremented by multiple threads at the same time. The decrement then happened with the GIL held, so the array was sometimes destroyed while still in use.
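The idea behind the fix can be sketched with a toy model in plain Python. This is only an illustration, not the actual Arrow C++ change: `ToyRefCount`, `run_workers`, and the `threading.Lock` standing in for the GIL are all hypothetical names introduced here. It shows that when every increment and decrement of a shared counter happens under the same lock, the count stays balanced no matter how the threads interleave.

```python
import threading


class ToyRefCount:
    """Toy stand-in for an object's reference count (not CPython's real one)."""

    def __init__(self):
        self.count = 0
        # Hypothetical stand-in for the GIL.
        self._gil = threading.Lock()

    def incref(self):
        # Take the lock before touching the counter, mirroring the idea of
        # acquiring the GIL before calling Py_INCREF.
        with self._gil:
            self.count += 1

    def decref(self):
        # The decrement uses the same lock, so it cannot interleave with
        # an increment that is still in progress.
        with self._gil:
            self.count -= 1


def run_workers(refcount, n_threads=8, n_iters=10_000):
    """Have several threads take and release references concurrently."""

    def worker():
        for _ in range(n_iters):
            refcount.incref()
            refcount.decref()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return refcount.count


final = run_workers(ToyRefCount())
```

If the increments skipped the lock while the decrements kept it, an update could be lost and the count could reach zero while a thread still held a reference, which in this toy model corresponds to the array being destroyed while still in use.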
Author: Korn, Uwe <[email protected]>
Closes apache#3655 from xhochy/ARROW-4582 and squashes the following commits:
7f9838d <Korn, Uwe> docker-compose run clang-format
3d6e5ee <Korn, Uwe> ARROW-4582: Acquire the GIL on Py_INCREF