Regression segfault under CPython 3.12a5 in the SymPy test suite #102250
Comments
@oscarbenjamin do you have time to debug it to help us? Which test exactly fails? Or is it a combination / sequence of tests? |
I got another crash from sympy but not sure it's CPython issue or Sympy issue at this moment.
|
Python 3.12.0a5+ (heads/main:a35fd38b57, Feb 25 2023, 23:27:09) [Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sympy
>>> sympy.test(subprocess=False, seed=96737803)
============================= test process starts ==============================
executable: /...omitted.../python.exe (3.12.0-alpha-5) [CPython]
architecture: 64-bit
cache: yes
ground types: python
numpy: None
random seed: 96737803
hash randomization: off
...
sympy/functions/combinatorial/tests/test_comb_numbers.py[24] ....zsh: segmentation fault ./python.exe

Second run:

sympy/functions/special/tests/test_bsplines.py[11] .........wzsh: segmentation fault ./python.exe

Not on:

Python 3.11.2 (main, Feb 26 2023, 15:08:59) [Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sympy
>>> sympy.test(subprocess=False, seed=96737803)
============================= test process starts ==============================
executable: /...omitted.../python.exe (3.11.2-final-0) [CPython]
architecture: 64-bit
cache: yes
ground types: python
numpy: None
random seed: 96737803
hash randomization: off
...
True

Moreover:

Python 3.11.2+ (heads/3.11:7a0dc8a802, Feb 26 2023, 16:22:23) [Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
...
True

It looks like if CPython is the culprit, it probably should be bisectable in 3.12. Unfortunately, the test suite is quite long! :) |
The crash you showed suggests this is a segfault arising from a refcounting bug. Yes, SymPy is pure Python. It can be used with other optional dependencies that are not pure Python like numpy, gmpy2 etc but this bug is seen without any of those. A refcounting bug here is definitely a CPython bug. |
I am trying to bisect it currently but it's a slow process because of the time taken to build CPython and then run the SymPy test suite. It is also difficult because the failure itself is not completely deterministic. As I said in the OP, running individual tests does not reproduce the failure, so it is not possible to narrow it down to less than running the whole test suite.

Actually I have narrowed it down a little bit. It seems that the first 1/6 of the test suite is enough to reproduce this:

import sympy
sympy.test(subprocess=False, split="1/6")

That does not reduce the time taken for the crash to show up, but it does reduce the time needed to confirm that a given commit does not crash, which is useful for bisecting. I haven't quite finished bisecting but I'm currently down to this range:
|
Tests pass in |
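For anyone repeating this kind of bisect, here is a minimal sketch of a driver that could be passed to `git bisect run`. It assumes a POSIX system (where `subprocess` reports death by signal as a negative return code) and that the freshly built interpreter path is given as the first argument; `classify` and `run_once` are hypothetical helper names, not part of SymPy or CPython. Only `sympy.test(subprocess=False, split="1/6")` is taken from the thread.

```python
import subprocess
import sys

# Exit codes understood by `git bisect run`: 0 = good, 1 = bad, 125 = skip.
GOOD, BAD, SKIP = 0, 1, 125


def classify(returncode):
    """Map a child process return code to a `git bisect run` exit code."""
    if returncode == 0:
        return GOOD
    # On POSIX, a child killed by a signal (SIGSEGV, SIGBUS, SIGABRT, ...)
    # is reported by subprocess as a negative return code.
    if returncode < 0:
        return BAD
    # An ordinary nonzero exit (e.g. a failing test) is not the crash being
    # hunted, so this sketch treats the commit as untestable (assumption).
    return SKIP


def run_once(python_exe, split="1/6"):
    """Run part of the SymPy test suite under the given interpreter."""
    code = (
        "import sympy, sys; "
        f'sys.exit(0 if sympy.test(subprocess=False, split="{split}") else 1)'
    )
    return subprocess.run([python_exe, "-c", code]).returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(classify(run_once(sys.argv[1])))
```

Because the crash is not fully deterministic, a single clean run does not prove a commit is good; repeating `run_once` a few times before returning `GOOD` would be a reasonable extension.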
Bisect complete:

sympy/functions/elementary/tests/test_miscellaneous.py[18] ......s........... [OK]
sympy/functions/elementary/tests/test_piecewise.py[63] ....ww...w..............................................w...... [OK]
sympy/functions/elementary/tests/test_trigonometric.py[86] ./bug.sh: line 12: 30073 Bus error: 10 ./python.exe -c 'import sympy; sympy.test(subprocess=False, split="1/6")'
$ git bisect bad
7b14c2ef194b6eed79670aa9d7e29ab8e2256a56 is the first bad commit
commit 7b14c2ef194b6eed79670aa9d7e29ab8e2256a56
Author: Mark Shannon <[email protected]>
Date: Mon Jan 16 12:35:21 2023 +0000
GH-100982: Add `COMPARE_AND_BRANCH` instruction (GH-100983)
:040000 040000 a01bf959d999386154fb4dfef5d568c0288cb316 dbd2b3a4a61424abfa5f9e726973b9d03bd24bfd M Doc
:040000 040000 0fac1d9d6fc2e2066a40e527653ecdae70011c23 d3479ece49974dc513a50d3bd1d5684834e95970 M Include
:040000 040000 58458255181b3044deeb7c7f852c89e37d398508 f24643aa3a7f1afbdd58639b6e4496f59f1da4f9 M Lib
:040000 040000 3a605b820cd18bd683911c0a274405ab875a6c79 11052d23b2fb46e6cb0964518d8c215dbc5b3bcd M Misc
:040000 040000 9364b4ad219faab45e789b146bd93c5694865d79 9a9702c99b36e043768cfde5b3dbdb97ba8972f3 M Objects
:040000 040000 40b1d7e8c2b8c5af2fd1186256c49bebd3e9c954 83b3b65a9a27d8cb683313f6d280e9bfd20f16ca M Python
:040000 040000 0cae5606a7c50631954bb014dd67b0384f201210 12b0ea6eb5eafad436ccf7564bacd0a72f40ab59 M Tools

That's 7b14c2e from gh-100983. I will do a bit more testing with the parent commit (b1a74a1) to confirm that the bug is not seen there. CC @markshannon |
Full test suite green for me in b1a74a1. |
Does running the |
Attempting to run the full test suite at 7b14c2e, I encounter a segfault at ...
sympy/diffgeom/tests/test_diffgeom.py[16] ................ [OK]
sympy/diffgeom/tests/test_function_diffgeom_book.py[4] ..zsh: segmentation fault ./python.exe

However, running |
I confirm the full test suite passes with the parent commit so I'm confident that the problem begins with 7b14c2e.
I tried something similar to this. Exactly where it crashes is different on different runs but for example it sometimes crashes in the diffgeom tests so I made a script to run just the diffgeom tests 100 times and that didn't give any crash. I'll try again with:

import sympy

for n in range(1000):
    print()
    print('-------------------------------------- n =', n)
    print()
    sympy.test('test_error_functions', subprocess=False)

It's possible that it would be different by setting the environment variable

How many repetitions should I expect to need? |
The crash @corona10 mentioned is reproducible with
The bytecode for both is below b1a74a1 (good):
7b14c2e (bad):
with the difference between the two being

I can't get this alone to cause a segfault when

Edit: Removing https://github.com/sympy/sympy/blob/sympy-1.11.1/sympy/core/tests/test_relational.py#L860-L869 from the SymPy test suite allows the subset of the test suite that I was running (split='1/12') to run without segfaulting. Adding the test back in reintroduces the segfault that appears later on. |
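The bytecode listings themselves did not survive in this copy of the thread. As a rough sketch of how to check which comparison opcode a given interpreter emits, the standard `dis` module can be used as below. `compare_opnames` is a hypothetical helper name, and the output depends on the interpreter version, so none is shown.

```python
import dis


def compare_opnames(source):
    """Return the names of comparison opcodes the compiler emits for source."""
    code = compile(source, "<demo>", "exec")
    return [
        ins.opname
        for ins in dis.get_instructions(code)
        if ins.opname.startswith("COMPARE")
    ]


# A comparison feeding a conditional jump, the pattern discussed above.
print(compare_opnames("if x > 0:\n    y = 1\n"))
```

Running this under the good and bad builds should make the difference in the emitted comparison instruction visible without reading a full disassembly.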
Okay this starts to make sense. A comparison like

>>> cond = x > 0
>>> cond
x > 0
>>> type(cond)
<class 'sympy.core.relational.StrictGreaterThan'>
>>> bool(cond) # it isn't known if cond is True or False
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/oscar/current/sympy/sympy.git/sympy/core/relational.py", line 510, in __bool__
raise TypeError("cannot determine truth value of Relational")
TypeError: cannot determine truth value of Relational

The Lines 1854 to 1857 in 7b14c2e
Now I think:
I'm not sure what Line 68 in 7b14c2e
I guess it means goto error;. I'm not sure what happens then.
|
To be clear it is the Symbol object refcount that causes the crash. Using

$ ./python.exe t.py
Traceback (most recent call last):
File "/Users/enojb/current/sympy/cpython.git/t.py", line 3, in <module>
assert x > 0
^^^^^
File "/Users/enojb/current/sympy/cpython.git/sympy/core/relational.py", line 511, in __bool__
raise TypeError("cannot determine truth value of Relational")
TypeError: cannot determine truth value of Relational
Modules/gcmodule.c:115: gc_decref: Assertion "gc_get_refs(g) > 0" failed: refcount is too small
Enable tracemalloc to get the memory block allocation traceback
object address : 0x103808410
object refcount : 10
object type : 0x7f80fd043a30
object type name: Symbol
object repr : x
Fatal Python error: _PyObject_AssertFailed: _PyObject_AssertFailed
Python runtime state: finalizing (tstate=0x0000000101fe7000)
Current thread 0x0000000109b74600 (most recent call first):
Garbage-collecting
<no Python frame>
Abort trap: 6 |
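A cheap way to probe for this kind of refcount damage without a debug build is to compare `sys.getrefcount` before and after a comparison whose truth test raises. The sketch below is only illustrative: `FakeRelational` and `FakeSymbol` are hypothetical stand-ins for SymPy's relational and `Symbol` classes, and `refcount_drift` is a made-up helper name. On an unaffected interpreter the drift should be 0; on an affected build the count may drop, or the process may crash outright.

```python
import sys


class FakeRelational:
    """Stand-in for a SymPy relational: its truth value cannot be decided."""

    def __bool__(self):
        raise TypeError("cannot determine truth value")


class FakeSymbol:
    """Stand-in for a SymPy Symbol: comparisons return a symbolic object."""

    def __gt__(self, other):
        return FakeRelational()


def refcount_drift(iterations=3):
    """Return the change in x's refcount after repeated failing truth tests."""
    x = FakeSymbol()
    before = sys.getrefcount(x)
    for _ in range(iterations):
        try:
            if x > 0:  # comparison followed by a branch; bool() raises
                hit = True
        except TypeError:
            pass
    return sys.getrefcount(x) - before


print("refcount drift:", refcount_drift())
```

The iteration count is kept small on purpose, so that an affected interpreter is more likely to report a negative drift than to free the object while it is still in use.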
No it won't. It will return the Lines 1854 to 1869 in 7b14c2e
Then PyObject_IsTrue returns -1 because GreaterThan.__bool__ raises. Then we hit the err < 0 and goto error;.
|
Before 7b14c2e the Lines 1839 to 1854 in b1a74a1
|
Here is a simple reproducer not involving SymPy:

class SymbolicBool:
    def __bool__(self):
        raise TypeError

class Symbol:
    def __gt__(self, other):
        return SymbolicBool()

x = Symbol()

assert x > 0

With

$ ./python.exe t2.py
Traceback (most recent call last):
File "/Users/enojb/current/sympy/cpython.git/t2.py", line 11, in <module>
assert x > 0
^^^^^
File "/Users/enojb/current/sympy/cpython.git/t2.py", line 3, in __bool__
raise TypeError
TypeError
Modules/gcmodule.c:450: visit_decref: Assertion "!_PyObject_IsFreed(op)" failed
Enable tracemalloc to get the memory block allocation traceback
object address : 0x10a16bbf0
object refcount : 5
object type : 0x109ac23e0
object type name: dict
object repr : Segmentation fault: 11 |
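To check several interpreters against this reproducer without taking down the calling process, it can be run in a child process and the exit status inspected. This is a sketch under the assumption of a POSIX system; `run_reproducer` and `describe` are hypothetical names, and the embedded script is the reproducer from the comment above.

```python
import signal
import subprocess
import sys

# The reproducer from the thread, run in a child so a crash stays contained.
REPRODUCER = """\
class SymbolicBool:
    def __bool__(self):
        raise TypeError

class Symbol:
    def __gt__(self, other):
        return SymbolicBool()

x = Symbol()

assert x > 0
"""


def run_reproducer(python_exe=sys.executable):
    """Run the reproducer under python_exe and return its return code."""
    result = subprocess.run([python_exe, "-c", REPRODUCER], capture_output=True)
    return result.returncode


def describe(returncode):
    """Turn a return code into a short human-readable verdict."""
    if returncode < 0:
        # Negative values mean the child was killed by that signal (POSIX).
        return f"crashed with {signal.Signals(-returncode).name}"
    return f"exited normally with status {returncode}"


print(describe(run_reproducer()))
```

An uncaught `TypeError` with a normal exit is the healthy outcome; a signal such as SIGSEGV or SIGABRT indicates the bug. Passing a different interpreter path to `run_reproducer` allows the same check against another build.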
In fact it crashes with an ordinary build as well including the OSX 3.12a5 build that I installed from python.org. |
Fixed in #102287 by @sweeneyde. Thanks for reporting @oscarbenjamin! |
Thanks for fixing and thanks everyone for the help pinning this down! |
* main: (67 commits)
  pythongh-99108: Add missing md5/sha1 defines to Modules/Setup (python#102308)
  pythongh-100227: Move _str_replace_inf to PyInterpreterState (pythongh-102333)
  pythongh-100227: Move the dtoa State to PyInterpreterState (pythongh-102331)
  pythonGH-102305: Expand some macros in generated_cases.c.h (python#102309)
  Migrate to new PSF mailgun account (python#102284)
  pythongh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives (in Python/) (python#102193)
  pythonGH-90744: Fix erroneous doc links in the sys module (python#101319)
  pythongh-87092: Make jump target label equal to the offset of the target in the instructions sequence (python#102093)
  pythongh-101101: Unstable C API tier (PEP 689) (pythonGH-101102)
  IDLE: Simplify DynOptionsMenu __init__code (python#101371)
  pythongh-101561: Add typing.override decorator (python#101564)
  pythongh-101825: Clarify that as_integer_ratio() output is always normalized (python#101843)
  pythongh-101773: Optimize creation of Fractions in private methods (python#101780)
  pythongh-102251: Updates to test_imp Toward Fixing Some Refleaks (pythongh-102254)
  pythongh-102296 Document that inspect.Parameter kinds support ordering (pythonGH-102297)
  pythongh-102250: Fix double-decref in COMPARE_AND_BRANCH error case (pythonGH-102287)
  pythongh-101100: Fix sphinx warnings in `types` module (python#102274)
  pythongh-91038: Change default argument value to `False` instead of `0` (python#31621)
  pythongh-101765: unicodeobject: use Py_XDECREF correctly (python#102283)
  [doc] Improve grammar/fix missing word (pythonGH-102060)
  ...
Thanks again to all. The 3.12a6 release rolled out to SymPy CI which is now all green for CPython 3.12 (although not the latest mypy release...):

I also would like to say thanks to all involved with the faster-CPython project. I know that any significant changes will cause churn and some bugs like this one, but SymPy and many other projects stand to benefit significantly from the speed improvements in CPython.

Now that 3.12 tests pass I have collected the total time taken (in CI) to run the SymPy test and doctest suite under different CPython versions:

3.8: 3439 secs

(That's the time for tests1+tests2+doctests in the linked CI job. There "latest" means 3.11.)

According to those timings 3.12 looks to show the biggest improvement among recent releases, and the cumulative effect from 3.8 to 3.12 is a 30% reduction in runtime. |
Crash report
This comes from sympy/sympy#24776 which adds CPython 3.12 prerelease testing in SymPy's CI.
This is seen with CPython 3.12a5 but not with 3.11 or earlier versions.
The reproducer is to run the SymPy test suite:
Then
I don't yet have a simpler reproducer for this because it seems to be non-deterministic, but the SymPy test suite reliably triggers a segfault under 3.12 alpha 5 after about 5 minutes. The tests that are running at the time of the segfault pass if run in isolation. Running the whole test suite, though, causes it to fail randomly at one of a few specific places, and running a smaller part of the test suite does not reproduce the problem.
Error messages
Usually:
In one case I have also seen (on OSX):
Your environment
The problem was seen initially in GitHub Actions CI on an Ubuntu 20.04 runner but I have also reproduced it locally in OSX (an Intel-CPU Macbook).
I don't immediately have a setup that I can use to bisect this but I will get one set up soon to narrow this down.