Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PanicException doesn't inherit from base Exception #2783

Closed
david-cortes opened this issue Nov 25, 2022 · 31 comments · May be fixed by #3057
Closed

PanicException doesn't inherit from base Exception #2783

david-cortes opened this issue Nov 25, 2022 · 31 comments · May be fixed by #3057

Comments

@david-cortes
Copy link

From pola-rs/polars#5606

PanicException doesn't inherit from the base python Exception, only from BaseException.

As used by libraries such as polars, PanicException is typically something comparable to ValueError, rather than something comparable to SystemExit or KeyboardInterrupt (built-ins which inherit from BaseException but not from Exception).

Using a library that relies on pyo3 in some application/service makes error handling problematic, as one most typically adds something like except Exception in order to handle errors gracefully while still responding to interrupt signals for example, but such code would not catch errors coming from pyo3-wrapped libraries.

@birkenfeld
Copy link
Member

I tried to explain our stance in #2638

Basically, panics should stay akin to Rust panics, not Python exceptions. It is polars' job to raise appropriate Python exceptions for conditions like invalid caller-supplied input (like the mentioned invalid column name) and other run-time errors.

Finally, if you still disagree it's easy enough to catch except (Exception, PanicException).

@david-cortes
Copy link
Author

As a counter argument: pyo3 is used to create python libraries. Major python libraries (e.g. NumPy) never throw SystemExit anywhere in their code, regardless of how important an error is, or how exactly a user got into that error, or whether the error was caused by the user or by the library author.

One would not expect to get such type of exceptions (e.g. SystemExit, KeyboardInterrupt) out of a library, but pyo3 leaves the possibility open to producing the currently-equivalent PanicException due to a combination of user error and library author error.

A C++ library wrapped with cython for example would not throw any exception that's not derived from Exception, and one does not need to manually extend error catching conditions when using a cython-wrapped library, as one would have to do when including a pyo3-wrapped library.

@birkenfeld
Copy link
Member

First, I'm against strawmanning the current situation as "PanicException is basically SystemExit" - that's simply not true. That said: of course, when combining parts written in different languages, mapping concepts of both cannot be 1:1 in some places. Something on the level of a panic does not exist in Python.

As for the C++ comparison, in places where Rust code panics, C++ code would typically run into a segfault.

@davidhewitt
Copy link
Member

Agreed with what @birkenfeld has said above. A Rust panic really is intended to be treated as a (graceful) crash, not as a user-space error. It sounds from the polars thread that the maintainers there are willing to treat panics as bugs and fix them.

one does not need to manually extend error catching conditions when using a cython-wrapped library, as one would have to do when including a pyo3-wrapped library.

The point is that a panic isn't a typical error condition (returning Result is). A Python user isn't expected to interact with panics, the choice to not extend from Exception is deliberate so that panics are not accidentally caught.

Will close this one, if there is significant pressure from the community we can revisit.

@davidhewitt davidhewitt closed this as not planned Won't fix, can't repro, duplicate, stale Nov 27, 2022
@david-cortes
Copy link
Author

Then again, if a user who is using a pyo3-wrapped library runs into some internal library error (e.g. due to a bug in the library itself), would you say it is appropriate to have the library throw a non-Exception error?

If you think it should, I'll then ask, do you think it is reasonable to expect all libraries wrapped with pyo3 to not run into unexpected errors or to expect all of them to never throw PanicException as a response to some user error, in such a way that the user would not need to manually catch PanicException when using such library?

And, if you were using a python library such as pandas (something akin to polars) in some app/service, and you run into some bug in the library that is not your own fault, would you as the user want the error to be non-catcheable?

@adamreichold
Copy link
Member

want the error to be non-catcheable?

You repeatedly say this, but it is not correct. While except Exception not catching it can be considered a usability issue, the writers of said application or service can still add expect PanicException to their request handlers etc. when they integrate PyO3-based dependencies.

@david-cortes
Copy link
Author

want the error to be non-catcheable?

You repeatedly say this, but it is not correct. While except Exception not catching it can be considered a usability issue, the writers of said application or service can still add expect PanicException to their request handlers etc. when they integrate PyO3-based dependencies.

Sure, but then it sounds like you do expect users of pyo3-wrapped libraries to add additional error handling in a way that would not be needed with other types of libraries.

@adamreichold
Copy link
Member

adamreichold commented Nov 27, 2022

Sure, but then it sounds like you do expect users of pyo3-wrapped libraries to add additional error handling in a way that would not be needed with other types of libraries.

No, not in the majority of cases. As explained above, panics in Rust indicate logic errors and similarly unrecoverable failures that might have corrupted a library's internal state and it usually is the best course of action to e.g. crash the service and let the supervisor layer restart it cleanly.

If the Python bindings of polars abuse panics to indicate invalid user input, then that is a bug in those bindings.

@touilleMan
Copy link

Hi I'd like to ad my 2c here

tl;dr: would you accept a PR to add a feature that make PanicException inherits Exception ? this would make usability much better 😄

On Exception vs BaseException

From the Python documentation:

BaseException is the common base class of all exceptions. One of its subclasses, Exception, is the base class of all the non-fatal exceptions. Exceptions which are not subclasses of Exception are not typically handled, because they are used to indicate that the program should terminate.

From the GeneratorExit documentation

It directly inherits from BaseException instead of Exception since it is technically not an error.

On top of that, asyncio.CancelledError inherits from BaseException. Just like for GeneratorExit this is a way to use the exception mechanism to implement a feature.

So, in Python philosophy, Exception should be for errors and BaseException for termination or for feature implementation.

The fact that a panic in Rust should lead to a fatal termination cannot be decided in pyo3 given it depend entirely on what the program does (see example 1 for instance).

On Rust panics are special snowflakes

The rational for having Rust panic raising a BaseException according to @adamreichold

panics in Rust indicate logic errors and similarly unrecoverable failures that might have corrupted a library's internal state

However this is not specific to Rust ! Any library in Python can have it internal state corrupted by an unexpected exception. However those exception still inherit Exception (for instance ZeroDivisionError).

And as says the Zen of Python 😄

Special cases aren't special enough to break the rules.

On how to deal with BaseException-based PanicException

As we saw, BaseException-based exception are all very different, hence we cannot have a except BaseException dealing the same way for all of them:

try:
    # May raise any exception ?
    my_code()
except BaseException as exc:
    # Nothing to do, this is not an error
    if isinstance(exc, GeneratorExit):
        raise
    # Asyncio handle this itself, nothing to do
    elif isinstance(exc, asyncio.CancelledError):
        raise
    else:
        # Real error: Exception or PanicException
        logger.error(f"uncatched exception: {exc}")

But this code is complex and fragile: for instance if we start supporting trio in addition with asyncio the code will break.

We could change it to only catch the exceptions types we care about:

try:
    # May raise any exception ?
    my_code()
except BaseException as exc:
    if isinstance(exc, Exception):
        # Real exception !
        logger.error(f"uncatched exception: {exc}")
    elif isinstance(exc, my_pyo3_module.PanicException):
        # Real exception, but special snowflake
        logger.error(f"uncatched exception: {exc}")
    else:
        # Let go through GeneratorExit and e.g. asyncio.CancelledError
        raise

But now it's even worst: PanicException is specific to each module, so the code will break if we introduce a new pyo3 based module !

This is quite a huge footgun so it's better to limit as much as possible user interaction with this type of exception (it would be even better to have each module have a differently named PanicException, but I guess it's a complex topic ^^).

Now we can compare with if PanicException inherit Exception:

try:
    # May raise any exception ?
    my_code()
except Exception as exc:
        logger.error(f"uncatched exception: {exc}")

That's a huge usability and user-friendly boost ! Especially when comparing the suggested fix when dealing with multiple pyo3 module

Example 1

Let's consider a web server for instance, it uses pyo3 to handle parsing of incoming requests. If an incoming request contains an unexpected payload the pyo3 may raise an exception that will crash the entire server :'(

Of course that mean there is a bug to fix in the server code, but in the meantime we don't have to deal with a randomly crashing server in production !

On top of that, web framework already provide a unexpected error exception handler, but it is based on catching Exception (e.g. flask and django) so we have to roll our own middleware just to catch PanicException (and make sure it won't break next time a new pyo3 module is introduced...)

Example 2

Another real life example: we use pyo3 in a client application.
We rely on Sentry to report unexpected exception, that means:

  1. an unexpected exception occurs somewhere in the application
  2. the exception bubble up
  3. when high enough it will reach a global exception catcher that will log an error
  4. The sentry lib that run in our application listen for the error logs, queue them and send them

Step 3) typically rely on a except Exception global except, it is often provided by a 3rd party (for instance pyQt handle this itself when an exception bubble up in a Qt slot).

Now we have to manually check each part of the code where unexpected exception catching is done, because we have to manually make sure they behave as excepted when a non-Exception bubble up. And if we miss a place we may end up with an exception that terminate the application right away, and Sentry potentially not sending a crash report so we will not be aware of the issue :'(

@adamreichold
Copy link
Member

However this is not specific to Rust ! Any library in Python can have it internal state corrupted by an unexpected exception. However those exception still inherit Exception (for instance ZeroDivisionError).

I don't think is as general as it is put here, or rather this is a fundamental difference in how Rust and Python handle error conditions: Python always uses unwinding whereas Rust differentiates between expected error conditions like I/O errors, unreachable hosts or invalid user input handled via Result and internal inconsistencies handled via panics, i.e. unwinding or aborts.

Python has no such distinction besides BaseException versus Exception and hence one side will always be somewhat unhappy depending on which approach is better approximated on the other side of language barrier.

@birkenfeld
Copy link
Member

While I've been strongly in the BaseException camp before, I can see that this question is getting raised again and again, since panics-as-termination strongly goes against the Python feeling that "everything can be caught".

And since PyO3 is mostly integrating Rust into Python, not the other way round, maybe it really is better to adapt to expectations?

Our argument that broken internal state should lead to termination is basically void, since after all it can be caught and will be caught anyway, just with more workarounds and grumbling.

@mejrs
Copy link
Member

mejrs commented Feb 27, 2023

What matters most for me is which option is most likely to encourage and lead users to write robust and correct programs. Handling panics as exceptions makes it easy to continue running buggy programs (possibly with corrupted internal state) which goes against that.

@birkenfeld
Copy link
Member

Handling panics as exceptions makes it easy to continue running buggy programs

As said before: "the users" will do it anyway, just with more grumbling after finding out their expectations were wrong.

And under similar circumstances, I did it too, even in pure Rust: in a request handling thread, catch panics and restart that specific thread. Exiting and restarting the whole process was not a good option.

@david-cortes
Copy link
Author

What matters most for me is which option is most likely to encourage and lead users to write robust and correct programs. Handling panics as exceptions makes it easy to continue running buggy programs (possibly with corrupted internal state) which goes against that.

The bugs in this case are more likely to be on library developers than on users of those libraries.

@mejrs
Copy link
Member

mejrs commented Feb 27, 2023

Coming back to your first example, how do Python webservers handle things like SystemExit?

@mejrs mejrs reopened this Feb 27, 2023
@adamreichold
Copy link
Member

since panics-as-termination strongly goes against the Python feeling that "everything can be caught".

And under similar circumstances, I did it too, even in pure Rust: in a request handling thread, catch panics and restart that specific thread. Exiting and restarting the whole process was not a good option.

I'd like to reiterate that we are discussing how hard it is to catch PanicException, not whether it is possible at all.

Looking at Rust for example, using catch_unwind together with the UnwindSafe trait is not trivial either will often involve a bit of boiler plate. But that is intentional, as one should definitely be sure that one wants to do this and hence making the programmer jump through a hoop or two can be seen as a safety feature.

Similarly, the linked

except BaseException as e:
  if not "PanicException" in str(type(e)):
    raise

  # restart threads, reset state, etc.

might be ugly, but I think that it would do what is asked and it does not interact with CancelledError or GeneratorExit. (Where I might add that personally, I find calling the use of exceptions as a control flow mechanism "implementing a feature" rather generous. But of course, Python is what it is and it will not change just because I would want it to.)

@birkenfeld
Copy link
Member

Coming back to your first example, how do Python webservers handle things like SystemExit?

Probably not at all - why should they? SystemExit does not simply appear as a result of buggy libraries.

I'd like to reiterate that we are discussing how hard it is to catch PanicException, not whether it is possible at all.

Yes, and since it is pretty clear that our users want to do it, it's just stubbornness to make it surprisingly hard for them.

might be ugly, but I think that it would do what is asked

It is not just ugly, it is also excitingly stringly-typed, and hard to get right.

My first instinct was to catch PanicException by type, as is proper, and only as an afterthought I realized that it is impossible in general. I could wager that a decent percentage of Python developers would assume that the PanicException exported from one PyO3 module is the same as in all others. As mere users of PyO3 based libraries, they cannot be expected to know or understand the reasons preventing it to be so.

@adamreichold
Copy link
Member

Yes, and since it is pretty clear that our users want to do it, it's just stubbornness to make it surprisingly hard for them.

As written above when referring to catch_unwind, this can be seen as a safety feature. I am not saying that this is the only possibly of looking at it, but I would also be glad if the argument I was trying to make was not brushed aside as mere stubbornness.

My first instinct was to catch PanicException by type, as is proper, and only as an afterthought I realized that it is impossible in general.

I am definitely not happy about the stringly type check but I would also like avoid conflating these two issues: That each PyO3 extension has their own type objects and whether PanicException should derive from Exception.

The first issue is regularly reported in different contexts as well where it similarly surprises users. We probably need better documentation for this as I hardly see us committing to a stable C ABI/API for sharing type objects any time soon.

@touilleMan
Copy link

touilleMan commented Feb 28, 2023

I'd like to reiterate that we are discussing how hard it is to catch PanicException, not whether it is possible at all.

@adamreichold I still fail to understand what is the benefit to make such thing harder 🤔

I've provided two real-life use cases where having PanicException inheriting Exception is valuable in term of readability, maintenance and interoperability with 3rd party libs.

Can you provide some examples showing PanicException inheriting BaseException have some benefits ?

as one should definitely be sure that one wants to do this and hence making the programmer jump through a hoop or two can be seen as a safety feature.

What about adding a feature to enable PanicException inheriting Exception ? This way it would be opt-in (hence the jump through a hoop) but not painful to use 😃

Coming back to your first example, how do Python webservers handle things like SystemExit?

@mejrs the webserver (actually the webframework) don't have to handle SystemExit, it just let it bubble up (unwinding the stack as it does so that any resource is automatically closed thanks to the context managers ). In the end the exception reached the end of the stack and the Python interpreter knowns it can safely stop the program altogether.
This is actually pretty similar to what is done with any uncatched exception (including KeyboardInterrupt), the only difference is that Python interpreter choose to print the stacktrace of those errors as they should have been handled by the program and hence we can consider the program has crashed with an unexpected exception.

@mejrs
Copy link
Member

mejrs commented Feb 28, 2023

@mejrs the webserver (actually the webframework) don't have to handle SystemExit, it just let it bubble up (unwinding the stack as it does so that any resource is automatically closed thanks to the context managers ). In the end the exception reached the end of the stack and the Python interpreter knowns it can safely stop the program altogether.

Cleaning up things as it bubbles up is exactly what a panic does. To me a panic feels very similar to SystemExit, so I'm not sure why you feel the need to handle PanicException and SystemExit differently.

@mejrs
Copy link
Member

mejrs commented Feb 28, 2023

As an aside, the "I have a web server that cannot crash no matter what" use case is not a good example. You have to account for your service going down either way. You cannot stop things like aborts/segfaults from bringing down your server.

You really should just let the panic do its thing and have a service for restarting your server if it goes down.

@david-cortes
Copy link
Author

As an aside, the "I have a web server that cannot crash no matter what" use case is not a good example. You have to account for your service going down either way. You cannot stop things like aborts/segfaults from bringing down your server.

You really should just let the panic do its thing and have a service for restarting your server if it goes down.

In a segfault, there might be memory corruption extending outside of the library in question, to the point that a program cannot be certain that e.g. print will call print anymore. At that point, it's not possible to gracefully recover from the error.

In a rust panic, there doesn't need to be any memory corruption that would impede the user to e.g. return an error trace, zero, NaN, or similar; or to do the operation with a non-PyO3 library.

@mejrs
Copy link
Member

mejrs commented Feb 28, 2023

That is not my point. The server can go down whether or not you are catching PanicException. For example:

  • The rust code panics while panicking, causing an abort
  • allocation failure aborts the program
  • the library is compiled with panic=abort (there are good reasons for this, like faster compile times and smaller binaries)

So the usability argument is moot. You still have to be ready for your server crashing. If anything, making it easy to catch PanicException is worse for usability, because it lulls you into the idea that you can just catch it and not have to worry about any of the above things.

In a rust panic, there doesn't need to be any memory corruption

That's not necessarily true; there have been many soundness bugs (including in the standard library) caused by code unwinding out of unsafe code without that unsafe code being ready for it. I have written such buggy unsafe code - it is a really difficult problem.

Moreover, safe code might not behave well after catching a panic. The panic might have poisoned a mutex or messed up an invariant somewhere, and any subsequent calls into that library might just panic again.

Personally I would not trust that if I caught a panic from an underlying rust library that it would continue to do what I want it to. There is really only one use case for catching panics; catching it, doing something (like logging) and resuming the panic.

@touilleMan
Copy link

Cleaning up things as it bubbles up is exactly what a panic does. To me a panic feels very similar to SystemExit, so I'm not sure why you feel the need to handle PanicException and SystemExit differently.

To me a panic is very similar to a ZeroDivisionError it is an exception representing a unexpected error (please tell me nobody uses ZeroDivisionError as control flow in their code😄 ).
On the other hand SystemExit represent the intention of stopping the program, and KeyboardInterrupt an external intention of stopping the program.

Should an unexpected error cause the program to stop ? Maybe.

  • If you ask the C people they will tell you "hell yes !" because the language has no memory safety and things can only go worst from this point.
  • If you ask the Erlang peoply they will tell you "hell no !" because the application uptime is critical and the whole language&runtime is build around an unexpected error recovery system (i.e. the "let it crash" philosophy)

Python is a memory safe language, Rust is a memory safe language.
So we don't risk corrupting the memory state of the program but only it business logic, which by definition only the application developer knows what to do about.

The panic might have poisoned a mutex or messed up an invariant somewhere, and any subsequent calls into that library might just panic again.

That's precisely the point. The application developer is supposed to correctly handle the unexpected exception. If the developer made a mistake in the handling then more exceptions will occur without corrupting memory which is totally what you would expect in a pure-Python code.

As an aside, the "I have a web server that cannot crash no matter what" use case is not a good example. You have to account for your service going down either way. You cannot stop things like aborts/segfaults from bringing down your server.

You are misinterpreting my words:

  • I'm okay with my server application having to shut down (and I'm happy a KeyboardInterrupt exception will bubble up and allow a clean shutdown).
  • I'm okay with my server application crashing in a segfault if there is a catastrophic bug in CPython or a native library I use. That's because there is not much we can do about it, and CPython is heavily battle-tested so the risk is low anyway (never had a single segfault in +10years using CPython ^^)
  • I'm not okay with a library calling exit when it encounter an error instead of raising a proper exception that fit into the Python error handling philosophy.

There is really only one use case for catching panics; catching it, doing something (like logging) and resuming the panic.

See my example 2: it's complex to guarantee the error will be correctly logged and reported when if it inherits BaseException because the whole Python ecosystem uses the convention that errors inherit Exception.

If anything, making it easy to catch PanicException is worse for usability, because it lulls you into the idea that you can just catch it and not have to worry about any of the above things.

I think you try too hard to go into my shoes here. As a programmer I need to be able to recover from error as long as there is no memory corruption.
I understand I have to be careful and conservative about it (for instance I can destroy all resources that are related to the library that have send the error, and restart from zero), but this is the same thing with my whole codebase.

That's not necessarily true; there have been many soundness bugs (including in the standard library) caused by code unwinding out of unsafe code without that unsafe code being ready for it. I have written such buggy unsafe code - it is a really difficult problem.

That's an interesting point 😃

I would say soundness bugs in 3rd party libraries are out of the scope: any piece of Rust may contain unsafe blocks and do broken dark magic, this wouldn't mean we should consider any pyo3 module broken per-se.
On top of that, one can enable panic=abort as you said to ensure extra safety if he considers panicking is too big of a codesmell.

I'm really interesting however if you can share an example of the buggy unsafe code you've written 😃
What I understand from Rust documentation is that unwind safety guarantee memory safety and only "It is possible, however, for logical invariants to be broken in Rust which can end up causing behavioral bugs".
Again this seem to indicate only the application developer can say if his code is unwind safe or not.

@mejrs
Copy link
Member

mejrs commented Mar 1, 2023

I think we don't agree on what a panic is. A panic is not an unexpected error. A panic is a bug in a program. If I write a python library in Rust, and python code can cause a panic in the underlying Rust code that means my library has a bug.

Should a bug in a program cause the program to stop? Absolutely.

@david-cortes
Copy link
Author

I think we don't agree on what a panic is. A panic is not an unexpected error. A panic is a bug in a program. If I write a python library in Rust, and python code can cause a panic in the underlying Rust code that means my library has a bug.

Should a bug in a program cause the program to stop? Absolutely.

This is what a panic looks like in practice in what's perhaps the most popular library using PyO3:

import polars as pl
pl.DataFrame({'x':['Some text']}) / 0
PanicException: data types don't match: InvalidOperation(Owned("division operation not supported for shape: (1,)\nChunkedArray: 'x' [str]\n[\n\t\"Some text\"\n] and shape: (1,)\nSeries: '' [str]\n[\n\t\"0\"\n]"))

@adamreichold
Copy link
Member

This is what a panic looks like in practice in what's perhaps the most popular library using PyO3:

Polars usage of panics for invalid input and operations is simply incorrect. We consider it a bug in Polars and have said so multiple times in issues and discussions here.

@david-cortes
Copy link
Author

This is what a panic looks like in practice in what's perhaps the most popular library using PyO3:

Polars usage of panics for invalid input and operations is simply incorrect. We consider it a bug in Polars and have said so multiple times in issues and discussions here.

Surely it is a bug in polars, but it is not reasonable to expect libraries to be bug-free. And if this is how a high-profile library like polars uses PyO3, I can only venture to guess that smaller libraries might be doing worse things.

@adamreichold
Copy link
Member

Surely it is a bug in polars, but it is not reasonable to expect libraries to be bug-free. And if this is how a high-profile library like polars uses PyO3, I can only venture to guess that smaller libraries might be doing worse things.

This is a reasonable assumption to make, but I don't think it fits this particular case: This is not an oversight or misunderstanding, but Polars rather made a decision to use panics this way which does not align with the community consensus on when panics or Results are appropriate. (Already in the underlying Rust library, not just their Python bindings.)

@touilleMan
Copy link

A panic is not an unexpected error. A panic is a bug in a program.

Both are not mutually exclusive, a panic is unexpected (as the language give no guarantee it won't happen).

Should a bug in a program cause the program to stop? Absolutely.

The Erlang people disagree with that (and also basically everybody writing code than involve dead people if the software stops 😄 )
You cannot deal with absolute when it comes to buggy software. Otherwise why just stop the program ? It's very possible the panic came from a side effect in the OS (or even the hardware), should we also restart the computer ?

There is no perfect solution here, only strategies with pros and cons.
I agree that restarting the software is the best way to be safe when a bug occurs. But we can agree it would be a terrible user experience if something as complicated as Firefox had to close every time it had a hiccup.

@mejrs
Copy link
Member

mejrs commented Mar 22, 2023

Thanks for the discussion but I'm closing this again because I still believe inheriting from BaseException is the right thing to do.

Discussion or solutions on making PanicException easier to name/import/have-one-for-all-pyo3-modules is still welcome, but please make a new issue/PR for doing so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants