mypy assisted error handling, exception mechanisms in other languages, fun with pattern matching and type variance
TLDR: I give an overview of a few error handling techniques (with the emphasis on Python, although I mention a few other programming languages), review some existing Python libraries, and suggest a simple and clean mypy-based approach.
You might learn a few things about error handling in different languages, pattern matching, type variance and mypy's capabilities in general, and pick up clues for making your code and interfaces more mypy-friendly (and IDE friendly if you're using LSP/IntelliJ).
I am somewhat obsessed with personal data and information, analyzing data for quantified self, lifelogging etc. I am trying to integrate all my information sources and make it easy to access and search. You can see some examples in my package and Orger: part I, part II.
To get this data, manipulate it and interact with it, of course, you need to extract it first (e.g. from json/csv), parse it (e.g. from plaintext), or even worse, reverse engineer it from vendor-locked formats (e.g. in my kobo parsing library).
If you have ever worked with data and had to parse semi-structured data (let alone natural language), or scraped web pages, you might start getting flashbacks now. Undocumented APIs, bad characters, cryptic regexes, corrupt fields, unexpected nulls, logical inconsistencies, all sorts of things. You will almost never get it right in the first few attempts, and then when it finally does what you want… it breaks after a couple of days because of course you missed some edge cases or the data provider just gives you utter garbage for no reason. And the thing you've spent so much effort on stops working, spams your mailbox and requires attention.
Ew. Data is messy.
Most modern programming languages are fairly unforgiving of the unexpected and will crash at the slightest opportunity. Some languages do have quirks (e.g. 'undefined' in JS), but generally well-written software aborts very soon after something unexpected starts happening. And for good reasons:
if it didn't, your program's state would lose the properties the author intended it to have.
Ignoring the errors will almost surely prevent the program from getting to desired result anyway and end up with even more severe, or potentially catastrophic inconsistencies. How about formatting your disk if you're really unlucky?
another good reason to fail fast is that it makes the programmer more likely to notice and then fix the bug
So in most cases, as long as you can get away with it, it's good to throw an exception or abort the program immediately in some way. You might not be able to do that if you're literally doing rocket science or flight control software, but most of us aren't. For typical software engineering problems, some errors are less crucial and more manageable than others. So we try to be pragmatic when we program, evaluate failure risks and use try/catch mechanisms where appropriate.
Now, I'm sure we as engineers could handwave about that stuff forever, so let me be more specific straightaway and introduce a motivating real-life problem that I actually had to solve.
Say, you own a Kindle. Electronic books are great. Yeah okay they don't smell like the real thing, but the possibility of highlighting bits of text and typing your comments without being distracted by external means of annotation is incredibly helpful. However, when you want to go through your highlights after reading to refresh your memory or perhaps to share with a friend, you find out it's not so convenient to actually access them quickly.
So you decide to write a script that would process the highlights, perhaps group them by book, display timestamps and render a nice HTML page so you could easily open it from your phone and recall the latest books you read to discuss with friends.
On the device, Kindle stores bookmarks and highlights … in a My Clippings.txt file.
PHYS771 Lecture 12: Proof (scottaaronson.com)
- Your Highlight on Page 2 | Added on Sunday, July 21, 2013 10:06:53 AM
Roger Penrose likes to talk about making direct contact with Platonic reality, but it's a bit embarrassing when you think you've made such contact and it turns out the next morning that you were wrong!
[Tong][2013] Dynamics and Relativity
Your Highlight on Page 120 | Added on Sunday, August 4, 2013 6:17:21 PM
It is worth mentioning that although the two people disagree on whether the light hits the walls at the same time, this does not mean that they can't be friends.
PHYS771 Lecture 12: Proof (scottaaronson.com)
Your Highlight on Page 14 | Added on Sunday, August 4, 2013 8:41:53 PM
No hidden-variable theory can be local (I think some guy named Bell proved that).
Yes, it's a messy format and not very machine friendly. But oh well it's a file, you're a programmer. You know the drill.
from datetime import datetime
from typing import NamedTuple, Sequence
import re
from pathlib import Path
from itertools import groupby
from textwrap import wrap

class Highlight(NamedTuple):
    dt: datetime
    title: str
    page: str
    text: str

class Book(NamedTuple):
    "Represents book along with its highlights"
    title: str
    highlights: Sequence[Highlight]

def parse_entry(entry: str) -> Highlight:
    groups = re.search(
        r'(?P<title>.*)$\n.*Highlight on Page (?P<page>\d+).*Added on (?P<dts>.*)$\n\n(?P<text>.*)$',
        entry,
        re.MULTILINE,
    )
    assert groups is not None, "Couldn't match regex!"
    dt = datetime.strptime(groups['dts'], '%A, %B %d, %Y %I:%M:%S %p')
    return Highlight(
        dt=dt,
        title=groups['title'],
        page=groups['page'],
        text=groups['text'],
    )

def iter_highlights():
    # clippings_file (the path to My Clippings.txt) is assumed to be defined elsewhere
    data = Path(clippings_file).read_text()
    for entry in data.split('=========='):
        yield parse_entry(entry.strip())

def iter_books():
    key = lambda e: e.title
    for book, hls in groupby(sorted(iter_highlights(), key=key), key=key):
        highlights = list(sorted(hls, key=lambda hl: hl.dt))
        yield Book(title=book, highlights=highlights)

def print_books():
    for r in iter_books():
        print(f'* {r.title}')
        for h in r.highlights:
            text = "\n ".join(wrap(h.text))
            print(f' - {h.dt:%d %b %Y %H:%M} {text} [Page {h.page}]')
        print()

print_books()
* PHYS771 Lecture 12: Proof (scottaaronson.com)
- 21 Jul 2013 10:06 Roger Penrose likes to talk about making direct contact with Platonic
reality, but it's a bit embarrassing when you think you've made such
contact and it turns out the next morning that you were wrong! [Page 2]
- 04 Aug 2013 20:41 No hidden-variable theory can be local (I think some guy named Bell
proved that). [Page 14]
* [Tong][2013] Dynamics and Relativity
- 04 Aug 2013 18:17 It is worth mentioning that although the two people disagree on
whether the light hits the walls at the same time, this does not mean
that they can't be friends. [Page 120]
For the purposes of this post, to keep the example output clean, I am just using plain text. Even though it's not quite HTML with CSS, it still looks kinda nice, doesn't it?
If you're wondering why I use yield, I'll explain it further down.
Now:
imagine you've set this script to run in cron, and it's been fine for a while. You left for a three-week holiday to finally get some rest from programming, started reading this new book about quant finance (yeah, you've always had interesting ways of getting a rest from computers) and… your script stopped working.
Traceback (most recent call last):
File "<stdin>", line 55, in <module>
File "<stdin>", line 49, in print_books
File "<stdin>", line 44, in iter_books
File "<stdin>", line 34, in iter_highlights
File "<stdin>", line 21, in parse_entry
AssertionError: Couldn't match regex!
You swear out loud, reach for the laptop you promised to distance yourself from, and it turns out your parser chokes on page instead of Page in one of the new entries (and yes, this was actually the case in my Kindle export).
PHYS771 Lecture 12: Proof (scottaaronson.com)
- Your Highlight on Page 2 | Added on Sunday, July 21, 2013 10:06:53 AM
Roger Penrose likes to talk about making direct contact with Platonic reality, but it's a bit embarrassing when you think you've made such contact and it turns out the next morning that you were wrong!
[Tong][2013] Dynamics and Relativity
Your Highlight on Page 120 | Added on Sunday, August 4, 2013 6:17:21 PM
It is worth mentioning that although the two people disagree on whether the light hits the walls at the same time, this does not mean that they can't be friends.
PHYS771 Lecture 12: Proof (scottaaronson.com)
Your Highlight on Page 14 | Added on Sunday, August 4, 2013 8:41:53 PM
No hidden-variable theory can be local (I think some guy named Bell proved that).
My Life as a Quant: Reflections on Physics and Finance (Emanuel Derman)
Your Highlight on page 54 | Added on Tuesday, October 4, 2013 12:11:16 PM
The Black-Scholes model allows us to determine the fair value of a stock option.
You could argue that you should have made the regex in parse_entry case-insensitive in the first place, but it's not something you would normally expect. Kindle exports in particular have all sorts of nasty things: roman numerals for page numbers, locale-dependent dates, inconsistent separators, and so on.
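To make that concrete, here is a minimal sketch of what a more tolerant header pattern might look like. The pattern and the parse_page helper are hypothetical and only cover two of the quirks mentioned (letter case and roman numerals); they are not the regex used in the rest of this post.

```python
import re
from typing import Optional

# Hypothetical, more tolerant variant of the header pattern: case-insensitive,
# and the page may be written as an arabic or a roman numeral.
HEADER = re.compile(
    r'Highlight on Page (?P<page>\d+|[ivxlcdm]+)\b.*Added on (?P<dts>.*)$',
    re.IGNORECASE | re.MULTILINE,
)

def parse_page(header: str) -> Optional[str]:
    """Return the page token from a clipping header, or None if nothing matches."""
    m = HEADER.search(header)
    return None if m is None else m.group('page')

print(parse_page('Your Highlight on page 54 | Added on Tuesday, October 4, 2013 12:11:16 PM'))
print(parse_page('Your Highlight on Page xiv | Added on Sunday, August 4, 2013 8:41:53 PM'))
```

Even so, each such tweak only handles the quirks you already know about.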
Perhaps you even fix this particular problem, but it's only a matter of time until the next parsing issue. It's quite sad if you have to constantly tend to things that are meant to simplify and enhance your life.
Or,
you wrote this parser and decided that it could be useful for other people.
So for a small fee, you are providing a service that fetches highlights from their Kindles, displays them on profile pages and lets their friends comment.
Imagine a user's highlights result in the same error described above. It would be pretty sad if parsing a single entry took down the user's whole page or prevented updates. No matter how fast you'd be willing to fix these things, users would leave discouraged.
With the way the code is written at the moment, any exception would take the whole program down. So, we need some way of getting around these errors and carrying on.
One simple strategy would be to make parsing fully defensive, wrap the whole parse_entry call in try/except and log:
import logging

def iter_highlights():
    data = Path(clippings_file).read_text()
    for entry in data.split('=========='):
        try:
            yield parse_entry(entry.strip())
        except Exception as e:
            logging.exception(e)
Logging typically works well for minor things not worthy of a proper error (i.e. warnings) and as a means of retrospective error analysis and debugging. In our case logging wouldn't do the job:
you're not aware that an error is happening at all. If it's your personal tool, chances are you don't have time to go through all the logs and inspect them regularly.
user expects to see their data, but can't find it. It's pretty frustrating.
What do we want?
keep track of errors, render as much as we can, but terminate with non-zero exit code
potentially present errors in the interface so you or your users wouldn't worry about lost data
So we need some way of propagating the errors up the call hierarchy instead of throwing immediately or suppressing.
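The desired behaviour can be sketched in isolation before applying it to the Kindle script. The names process_all and exit_code below are hypothetical helpers for illustration only: every item is handled, failures are collected rather than swallowed, and the caller can still signal failure through the exit code.

```python
from typing import Callable, Iterable, List, Tuple

def process_all(
    items: Iterable[str],
    handle: Callable[[str], str],
) -> Tuple[List[str], List[Exception]]:
    """Handle every item; collect failures instead of stopping at the first one."""
    rendered: List[str] = []
    errors: List[Exception] = []
    for item in items:
        try:
            rendered.append(handle(item))
        except Exception as e:
            errors.append(e)
    return rendered, errors

def exit_code(errors: List[Exception]) -> int:
    # Non-zero if anything failed, so cron (or a monitoring job) can still notice.
    return 1 if errors else 0

rendered, errors = process_all(['1', 'two', '3'], lambda s: str(int(s) * 2))
print(rendered, [str(e) for e in errors], exit_code(errors))
```

A real script would pass the result of exit_code to sys.exit once everything has been rendered.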
Often it's tempting to fall back to some sort of special 'default' or 'error' value. I bet you've seen this before: 0 or INT_MAX meaning error for an integer type, or "" for string types. We could try something similar and squeeze the exception into the Highlight object itself.
def iter_highlights():
    data = Path(clippings_file).read_text()
    for entry in data.split('=========='):
        try:
            yield parse_entry(entry.strip())
        except Exception as e:
            # Highlight has no 'book' field, so the marker goes into 'title'
            yield Highlight(dt=datetime.now(), page='', title="ERROR", text=str(e))
One obvious problem is that it's very opaque and relies on an implicit convention: there is no way of telling that this function might return some special Highlight which should be treated as an error. That not only complicates the code, but might also introduce logical inconsistencies.
E.g. if your Highlight object also carried the book's ISBN and you filled it with some arbitrary text, it would almost surely not be a valid ISBN, which might cause failures down the pipeline.
Sometimes it's inevitable though; I'll give an example later.
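The ambiguity is easy to reproduce on a tiny scale. In the sketch below, parse_page_number is a hypothetical parser that uses 0 as its 'error' value; nothing in its signature tells the caller that, and a genuine 0 becomes indistinguishable from a failure.

```python
def parse_page_number(raw: str) -> int:
    """Hypothetical parser that hides failures behind a 'special' value."""
    try:
        return int(raw)
    except ValueError:
        return 0  # implicit convention: 0 means "error"... or is it a real page 0?

pages = [parse_page_number(p) for p in ['12', 'xiv', '0']]
print(pages)                    # the failure and the genuine 0 look the same
print(sum(pages) / len(pages))  # and the bogus value quietly skews any statistics
```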
An abstraction that has stood the test of time well is a container that holds one of two alternatives:
success value, representing the desired outcome of type T
or 'error value', holding error description of type E.
I will try to stick to the same semantics further down, 'result' typically meaning that it could be either desired value or error.
You can vaguely think of it as an interface Result with two implementations: Ok and Error. At runtime, you can ask the instance behind Result which of these alternatives it holds and act accordingly.
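In Python terms, that description could be sketched roughly as follows. This is only an illustration of the idea; the Ok, Err and Result names here are defined locally and are not taken from any of the libraries discussed in this post.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar('T')
E = TypeVar('E')

@dataclass
class Ok(Generic[T]):
    value: T

@dataclass
class Err(Generic[E]):
    error: E

# Illustrative alias: a result is exactly one of the two alternatives.
Result = Union[Ok[T], Err[E]]

def safe_div(a: float, b: float) -> Result[float, str]:
    return Err('division by zero') if b == 0 else Ok(a / b)

def describe(r: Result[float, str]) -> str:
    # At runtime, ask which alternative we hold and act accordingly.
    if isinstance(r, Ok):
        return f'ok: {r.value}'
    return f'error: {r.error}'

print(describe(safe_div(6, 3)))
print(describe(safe_div(1, 0)))
```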
let f: Result<File, io::Error> = File::open("hello.txt");
let f = match f {
    Ok(file) => file,
    Err(error) => {
        panic!("There was a problem opening the file: {:?}", error)
    },
};
main = do
  line <- getLine
  case runParser emailParser line of
    Right (user, domain) -> print ("The email is OK.", user, domain)
    Left (pos, err) -> putStrLn ("Parse error on " <> pos <> ": " <> err)
Yes, Left meaning error and Right meaning success are not necessarily obvious. It's kind of a pun: "right" also means "correct". Also notice that the error is not just a string, but also contains the position where parsing failed.
So, Rust and Haskell programmers seem to be quite happy with it? Why can't we have the same in Python? Well, some people tried! So I'll review a Python library that does that: result.Result
Let's try it on our program and see how it works. To make it easier to compare to the original code I suggest duplicating the tab in a separate window and tiling them side by side.
from result import Ok, Err

def iter_highlights():
    data = Path(clippings_file).read_text()
    for entry in data.split('=========='):
        try:
            yield Ok(parse_entry(entry.strip()))
        except Exception as e:
            yield Err(str(e))
We've had to wrap success and error values in Ok and Err, but so far it's not too bad.
from itertools import tee

def iter_books():
    vit, eit = tee(iter_highlights())
    values = (r.value for r in vit if r.is_ok())
    errors = (r.err() for r in eit if r.is_err())
    key = lambda e: e.title
    for book, hls in groupby(sorted(values, key=key), key=key):
        highlights = list(sorted(hls, key=lambda hl: hl.dt))
        yield Ok(Book(title=book, highlights=highlights))
    yield from map(Err, errors)
We use itertools.tee here so we don't have to pollute our code with temporary lists.
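For readers unfamiliar with tee, a minimal standalone sketch of the same splitting trick follows. The items generator is a made-up stand-in for iter_highlights, with ints playing the role of values and strings the role of errors.

```python
from itertools import tee
from typing import Iterator, Union

def items() -> Iterator[Union[int, str]]:
    # Stand-in for iter_highlights(): ints are 'values', strings are 'errors'.
    yield 1
    yield 'bad entry'
    yield 2

vit, eit = tee(items())  # two independent iterators over the same stream
values = (x for x in vit if isinstance(x, int))
errors = (x for x in eit if isinstance(x, str))

value_list = list(values)
error_list = list(errors)
print(value_list, error_list)
```

One caveat worth keeping in mind: tee buffers whatever one iterator has consumed and the other has not yet seen, so draining values completely before touching errors still keeps the whole stream in memory. The benefit here is tidier code rather than lower memory use.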
def print_books():
    for r in iter_books():
        if r.is_ok():
            v = r.value
            print(f'* {v.title}')
            for h in v.highlights:
                text = "\n ".join(wrap(h.text))
                print(f' - {h.dt:%d %b %Y %H:%M} {text} [Page {h.page}]')
            print()
        else:
            e = r.err()
            print(f"* ERROR: {e}")

print_books()
* PHYS771 Lecture 12: Proof (scottaaronson.com)
- 21 Jul 2013 10:06 Roger Penrose likes to talk about making direct contact with Platonic
reality, but it's a bit embarrassing when you think you've made such
contact and it turns out the next morning that you were wrong! [Page 2]
- 04 Aug 2013 20:41 No hidden-variable theory can be local (I think some guy named Bell
proved that). [Page 14]
* [Tong][2013] Dynamics and Relativity
- 04 Aug 2013 18:17 It is worth mentioning that although the two people disagree on
whether the light hits the walls at the same time, this does not mean
that they can't be friends. [Page 120]
* ERROR: Couldn't match regex!
Cool, we rendered as much as we can, and we get the error displayed as well, so nothing crashes and the users are not as unhappy. The error looks a bit out of nowhere, but at least it's there. We will address how we can improve it later.
Sadly, for someone else who looks at the iter_highlights or iter_books signatures, it's not obvious without reading the code that they yield Result objects rather than Book/Highlight objects. It's a thankless job for a human to keep track of, and mypy is a perfect fit for this task. Luckily, the result library already comes with type annotations.
So, let's try to use mypy to aid us in writing correct code.
Let's focus just on iter_highlights and iter_books and use the Result type.
from result import Ok, Err, Result
from typing import Iterator

Error = str

def iter_highlights() -> Iterator[Result[Error, Highlight]]:
    data = Path(clippings_file).read_text()
    for entry in data.split('=========='):
        try:
            yield Ok(parse_entry(entry.strip()))
        except Exception as e:
            yield Err(str(e))

from itertools import tee

def iter_books() -> Iterator[Result[Error, Book]]:
    vit, eit = tee(iter_highlights())
    values = (r.ok() for r in vit if r.is_ok())
    errors = (r for r in eit if r.is_err())
    key = lambda e: e.title
    for book, hls in groupby(sorted(values, key=key), key=key):
        highlights = list(sorted(hls, key=lambda hl: hl.dt))
        yield Ok(Book(title=book, highlights=highlights))
    yield from errors
Mypy output [exit code 1]:
input.py: note: In function "iter_books":
input.py:52: error: Item "None" of "Optional[Highlight]" has no
attribute "dt" [union-attr]
highlights = list(sorted(hls, key=lambda hl: hl.dt))
^
input.py:53: error: Argument "highlights" to "Book" has incompatible
type "List[Optional[Highlight]]"; expected "Sequence[Highlight]" [arg-type]
yield Ok(Book(title=book, highlights=highlights))
^
input.py:54: error: Incompatible types in "yield from" (actual type
"Result[str, Highlight]", expected type "Result[str, Book]") [misc]
yield from errors
^
Found 3 errors in 1 file (checked 1 source file)
Umm. Let's go through the errors:
errors 1 and 2 are due to the ok() method being too defensive and returning None if is_ok is False. Ideally, you'd throw an exception here, because such a situation is a programming bug. We can just enforce a non-optional type here via an unopt helper.
error 3 happens because even though we filtered the error values, mypy has no idea about that, so it still assumes that errors might hold Highlight objects. You could blame mypy for not being smart enough, but it would be a very hard if not impossible analysis in the general case. We can get around this by unpacking the error and wrapping it back in Err.
Let's apply these insights and try again:
from typing import Optional, TypeVar

X = TypeVar('X')

def unopt(x: Optional[X]) -> X:
    assert x is not None
    return x

from itertools import tee

def iter_books() -> Iterator[Result[Error, Book]]:
    vit, eit = tee(iter_highlights())
    values = (unopt(r.ok()) for r in vit if r.is_ok())
    errors = (unopt(r.err()) for r in eit if r.is_err())
    key = lambda e: e.title
    for book, hls in groupby(sorted(values, key=key), key=key):
        highlights = list(sorted(hls, key=lambda hl: hl.dt))
        yield Ok(Book(title=book, highlights=highlights))
    for err in errors:
        yield Err(err)
Mypy output [exit code 0]:
Success: no issues found in 1 source file
Phew! With some minor changes and restructuring we've convinced mypy.
It does come with some downsides:
readability: there is a bit of visual noise since you need to add Ok/Err wrappers and access the success value via .value property
safety: you could forget to call is_ok/is_err before calling ok/err, and mypy won't even blink.
The contract "if .is_ok() is True, then it's safe to call .ok()" is too complicated to be encoded as a type that mypy can handle. You'll get None or an exception thrown at runtime. The author of the library admits it by the way, so it's not a criticism, just highlighting the limitations of mypy here!
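A small sketch of that safety gap, using a hypothetical ToyResult class that merely imitates the is_ok()/ok() shape described above (it is not the library's actual implementation). The first_value function never checks is_ok(), yet its types line up, so a checker has nothing to complain about; the mistake only surfaces at runtime.

```python
from typing import Generic, Optional, TypeVar

T = TypeVar('T')

class ToyResult(Generic[T]):
    """Hypothetical stand-in for a wrapper exposing is_ok()/ok()."""
    def __init__(self, value: Optional[T] = None, error: Optional[str] = None) -> None:
        self._value = value
        self._error = error

    def is_ok(self) -> bool:
        return self._error is None

    def ok(self) -> Optional[T]:
        return self._value if self.is_ok() else None

def unopt(x: Optional[T]) -> T:
    assert x is not None
    return x

def first_value(r: ToyResult[int]) -> int:
    # No is_ok() check here: the annotations are still satisfied,
    # but an error result raises AssertionError when this runs.
    return unopt(r.ok())

print(first_value(ToyResult(value=3)))
```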
Now, let's try out the returns.result library, clearly inspired by Haskell's Either monad and do notation. I'm quite glad someone already implemented it and I didn't have to reinvent the wheel here.
So, let's try and rewrite the code using returns.result.Result:
from returns.result import safe

@safe
def parse_entry(entry: str) -> Highlight:
    groups = re.search(
        r'(?P<title>.*)$\n.*Highlight on Page (?P<page>\d+).*Added on (?P<dts>.*)$\n\n(?P<text>.*)$',
        entry,
        re.MULTILINE,
    )
    assert groups is not None, "Couldn't match regex!"
    dt = datetime.strptime(groups['dts'], '%A, %B %d, %Y %I:%M:%S %p')
    return Highlight(
        dt=dt,
        title=groups['title'],
        page=groups['page'],
        text=groups['text'],
    )

from returns.result import Result
from typing import Iterator

def iter_highlights() -> Iterator[Result[Highlight, Exception]]:
    data = Path(clippings_file).read_text()
    for entry in data.split('=========='):
        yield parse_entry(entry.strip())
So far the only difference from the original code is the @safe decorator on parse_entry, which basically deals with catching all exceptions and wrapping them into Result.
As a consequence, iter_highlights required no changes in its body (which may not be a desirable thing, as we'll see later).
from typing import cast
from returns.result import Success, Failure
from itertools import tee

def iter_books() -> Iterator[Result[Book, Exception]]:
    vit, eit = tee(iter_highlights())
    sentinel = cast(Highlight, object())
    values = (r.unwrap() for r in vit if r.value_or(sentinel) is not sentinel)
    errors = (r.failure() for r in eit if r.value_or(sentinel) is sentinel)
    key = lambda e: e.title
    for book, hls in groupby(sorted(values, key=key), key=key):
        highlights = list(sorted(hls, key=lambda hl: hl.dt))
        yield Success(Book(title=book, highlights=highlights))
    for e in errors:
        yield Failure(e)
Ok, that definitely requires some explanation…
The returns library's public API doesn't provide any way to tell success from failure (kind of deliberately). The types _Success and _Failure are private, and the only method that we can use seems to be result.value_or(default). This method returns the success value if result is a Success and falls back to default if result is a Failure. So we use a sentinel object to distinguish between actual success values and default ones, and also have to trick mypy with a cast.
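The sentinel trick itself can be shown without the library. ToySuccess and ToyFailure below are hypothetical classes that only mimic the value_or behaviour just described; the point is that a fresh object() can never be identical to a real success value, unlike a default such as None or 0.

```python
from typing import Any

class ToySuccess:
    """Hypothetical stand-in: value_or ignores the default."""
    def __init__(self, value: Any) -> None:
        self._value = value

    def value_or(self, default: Any) -> Any:
        return self._value

class ToyFailure:
    """Hypothetical stand-in: value_or always falls back to the default."""
    def __init__(self, error: Exception) -> None:
        self._error = error

    def value_or(self, default: Any) -> Any:
        return default

sentinel = object()  # unique object: no genuine success value can be identical to it

def is_success(r: Any) -> bool:
    return r.value_or(sentinel) is not sentinel

results = [ToySuccess(None), ToyFailure(ValueError('bad')), ToySuccess(0)]
flags = [is_success(r) for r in results]
print(flags)
```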
Apart from this obscurity, the function suffers from exactly the same issues as the iter_books implementation from the previous section, and for the same reason: contract is too complicated to be expressed in mypy.
One could argue that this function is going to look awkward anyway since we need to separate list of results into successes and errors. Let's see the function that should be more straightforward:
from typing import Callable

def print_books() -> None:
    for r in iter_books():
        def print_ok(r: Book) -> None:
            print(f'* {r.title}')
            for h in r.highlights:
                text = "\n ".join(wrap(h.text))
                print(f' - {h.dt:%d %b %Y %H:%M} {text} [Page {h.page}]')
        print_error = lambda e: print(f"* ERROR: {e}")
        r.map(print_ok).fix(print_error)
The idea here is that we can use the map method (which works like fmap in Haskell) to print successful results, and chain it with fix, which works like fmap but for errors. In a sense, these methods encapsulate pattern matching (which Python lacks syntactically), so as long as the implementor did the dirty business of doing it correctly at runtime, you're safe. However, I feel that this particular library overdid this encapsulation a bit, hence the very hacky implementation of iter_books.
Lambdas can't be multiline, so we have to define a local function for print_ok.
There is a bug in mypy where type inference sometimes struggles with an inlined lambda. Here I'm hitting this bug with print_error, which is why it's not .fix(lambda e: print(f"* ERROR: {e}")).
Another potential problem is that one could forget to implement one of the map/fix clauses, since nothing enforces calling them. Even if you're detecting unused variables, a missing .fix clause could stay unnoticed forever. It's very similar to forgetting catch when using JavaScript Promises.
It might be possible to enforce with some static analysis though, e.g. via mypy plugin by flagging dangling/temporary Result values (e.g. similarly to must_use attribute in Rust), but it's a project on its own.
Well at the very least it works and type checks!
Python output [exit code 0]:
* PHYS771 Lecture 12: Proof (scottaaronson.com)
- 21 Jul 2013 10:06 Roger Penrose likes to talk about making direct contact with Platonic
reality, but it's a bit embarrassing when you think you've made such
contact and it turns out the next morning that you were wrong! [Page 2]
- 04 Aug 2013 20:41 No hidden-variable theory can be local (I think some guy named Bell
proved that). [Page 14]
* [Tong][2013] Dynamics and Relativity
- 04 Aug 2013 18:17 It is worth mentioning that although the two people disagree on
whether the light hits the walls at the same time, this does not mean
that they can't be friends. [Page 120]
* ERROR: Couldn't match regex!
Mypy output [exit code 0]:
Success: no issues found in 1 source file
Overall I'm not sold: Python simply lacks syntax that lets you unpack and compose Result objects in a clean way, and you end up with boilerplate. lifts are not very readable in Haskell, let alone in Python.
I think authors did a great experiment though, the more people have fun with types, the more good abstractions we'll find.
I don't want to discourage people from using their library, so if it's your personal project and it makes your code more manageable or it just feels fun then by all means go for it!
But as much as I like ideas from functional programming, I'm almost certain that it's going to look confusing to an average Python programmer, and won't be warmly welcomed in your team.
f, err := os.Open("filename.ext")
if err != nil {
    log.Fatal(err)
}
// do something with the open *File f
However, it's not limited to Go, e.g. you'd often encounter it implicitly in C (which has no exceptions) or C++ code. For instance, std::filesystem::is_symlink comes in two flavours:
bool is_symlink( const std::filesystem::path& p ), which throws exceptions on errors.
bool is_symlink( const std::filesystem::path& p, std::error_code& ec ) noexcept, which sets ec on errors.
You can think of it as if it returned std::tuple<bool, std::error_code>. I assume it's not that way because the compiler wouldn't be able to distinguish between signatures.
Personally I, like many other people, find it pretty ugly. No judgment here though, as I have no idea about the design requirements and rationale behind such a model in Go. I'm pretty sure one can get used to it after a while and that there are some static flow analyzers that help to ensure correct error handling.
The main issue with this approach in Python is that it's not mypy friendly, as the return type of Open would have to be Tuple[Optional[Success], Optional[Error]]. In type theory terms, it is a product type, so in addition to all members of the Success type and all members of the Error type, it also has inhabitants that don't make sense for our program, such as (None, None) and all of Tuple[Success, Error].
In other words, nothing on the type level prevents the callee (os.Open) from returning something like (file_descriptor, "whoops"), which has an ambiguous meaning. If we use it, we would have to pay by sacrificing type safety or by adding extra code at the call site to eliminate these impossible program states:
f, err = open('filename.ext')
if err is None:
    assert f is not None
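A runnable version of the same idea, with a made-up Go-style parse_int function instead of open, shows where the extra code ends up. Both names below are hypothetical and exist only for this illustration.

```python
from typing import Optional, Tuple

def parse_int(raw: str) -> Tuple[Optional[int], Optional[str]]:
    """Hypothetical Go-style function: returns (value, error)."""
    try:
        return int(raw), None
    except ValueError as e:
        return None, str(e)

def double(raw: str) -> int:
    value, err = parse_int(raw)
    if err is not None:
        raise RuntimeError(err)
    # The declared type still says Optional[int]; only this runtime assert
    # rules out the nonsensical (None, None) state.
    assert value is not None
    return value * 2

print(double('21'))
print(parse_int('xiv'))
```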
It seems that we were on the right track with the container type and combinators, but were never completely satisfied. Let's recall the problems we had again:
readability: extra wrappers and accessor methods like Ok/Success/Error/.is_ok()/.unwrap().
It's visual noise and also they creep throughout the code, so if you decide you won't need them later, you might have to refactor a lot of code.
safety: it's still possible to write logically inconsistent code like if res.is_error(): return res.value * 10.
composability: fmap-style combinators are not really going to look good because Python lacks multiline lambdas.
performance and memory use: not going to make claims here as I haven't benchmarked, but there is a potential for overhead caused by extra wrapper objects.
First, we'll attack readability and safety. Yes, at the same time!
In part it's solved with syntactic sugar in other languages, like do syntax in Haskell, or the try! macro and ? operator in Rust. Sometimes it's inevitable and you have to inject values into Rust's Result explicitly via the Ok/Err constructors. However, checking for .is_ok() or isRight is really not that common in idiomatic Rust and Haskell. The reason is pattern matching! E.g. if we had pattern matching in Python we could write something like:
def print_books():
    for r in iter_books():
        match r:
            Book b:
                print(f'* {b.title}')
                for h in b.highlights:
                    text = "\n ".join(wrap(h.text))
                    print(f' - {h.dt:%d %b %Y %H:%M} {text} [Page {h.page}]')
                print()
            Error e:
                print(f"* ERROR: {e}")
That's cleaner than checking for is_ok/is_err and unpacking, and it also makes the code type safe because b and e already have the appropriate types. In our imaginary world where Python had this syntax, surely mypy would have supported it too, right?
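As it happens, mypy already offers something close: a value of a Union type can be narrowed with isinstance checks, which looks very similar to pattern matching both in terms of syntax and typing rules.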
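The isinstance-based narrowing that mypy performs on Union types can be sketched with a toy function (describe is a hypothetical name used only here). Each branch sees a more specific type, much like the arms of a match expression.

```python
from typing import Union

def describe(r: Union[int, Exception]) -> str:
    if isinstance(r, Exception):
        # In this branch a checker such as mypy treats r as Exception.
        return f'error: {r}'
    # Here r has been narrowed to int, so the arithmetic type checks.
    return f'value: {r + 1}'

print(describe(41))
print(describe(ValueError('boom')))
```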
So, it seems that Union would represent our result type. Do we still need to come up with some special wrapper for errors? Not really, Python already has a fairly convenient candidate for it: Exception! Most often you have one anyway in the except clause; if that's not enough, you can inherit from it, add extra fields and treat it as any other type.
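As a sketch of the "inherit and add extra fields" part: the ParseError class and parse_page_number function below are hypothetical, and show an exception that carries the offending input and is returned as an ordinary value instead of being raised.

```python
from typing import Union

class ParseError(Exception):
    """Hypothetical error type carrying the offending entry alongside the message."""
    def __init__(self, message: str, entry: str) -> None:
        super().__init__(message)
        self.entry = entry

def parse_page_number(entry: str) -> Union[int, ParseError]:
    try:
        return int(entry)
    except ValueError:
        # Returned, not raised: the caller decides what to do with it.
        return ParseError('page is not a number', entry)

r = parse_page_number('xiv')
if isinstance(r, ParseError):
    print(f'{r} (entry: {r.entry!r})')
```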
On the other hand, Exceptions almost never end up as function return values (and when they do, it's normally some fairly unambiguous code dealing specifically with error handling). Hmm, how convenient 🤔.
So even though we don't have explicit tagged unions in Python, if we agree that error values are represented as Exceptions, then we do get a disjoint type (i.e. Ok and Error are mutually exclusive) at runtime.
So, rules of thumb:
use Union[T, Exception] to represent type for results that hold T but can also end up with an error
return or yield exceptions and success values without using any extra wrappers
'pattern match' through isinstance
Let's see how we can rewrite our program by employing these principles:
from typing import TypeVar, Union

T = TypeVar('T')
Res = Union[T, Exception]

from typing import Iterator

def iter_highlights() -> Iterator[Res[Highlight]]:
    data = Path(clippings_file).read_text()
    for entry in data.split('=========='):
        try:
            yield parse_entry(entry.strip())
        except Exception as e:
            yield e

from itertools import tee

def iter_books() -> Iterator[Res[Book]]:
    vit, eit = tee(iter_highlights())
    values = (r for r in vit if not isinstance(r, Exception))
    errors = (r for r in eit if isinstance(r, Exception))
    key = lambda e: e.title
    for book, hls in groupby(sorted(values, key=key), key=key):
        highlights = list(sorted(hls, key=lambda hl: hl.dt))
        yield Book(title=book, highlights=highlights)
    yield from errors

def print_books() -> None:
    for r in iter_books():
        if not isinstance(r, Exception):
            print(f'* {r.title}')
            for h in r.highlights:
                text = "\n ".join(wrap(h.text))
                print(f' - {h.dt:%d %b %Y %H:%M} {text} [Page {h.page}]')
            print()
        else:
            print(f"* ERROR: {r}")

print_books()
Python output [exit code 0]:
* PHYS771 Lecture 12: Proof (scottaaronson.com)
- 21 Jul 2013 10:06 Roger Penrose likes to talk about making direct contact with Platonic
reality, but it's a bit embarrassing when you think you've made such
contact and it turns out the next morning that you were wrong! [Page 2]
- 04 Aug 2013 20:41 No hidden-variable theory can be local (I think some guy named Bell
proved that). [Page 14]
* [Tong][2013] Dynamics and Relativity
- 04 Aug 2013 18:17 It is worth mentioning that although the two people disagree on
whether the light hits the walls at the same time, this does not mean
that they can't be friends. [Page 120]
* ERROR: Couldn't match regex!
Mypy output [exit code 0]:
Success: no issues found in 1 source file
Yay, it works and typechecks. Now you can decide for yourself how clean it is by comparing it side by side with the original code without error handling. You'll see that the only difference (apart from indentation) is the code for error handling.
Here's what I like about this approach:
- no extra wrapper classes, code is clean and readable
  Also note that, surprisingly, Python's dynamic nature actually helps here. E.g. if you rewrote iter_books in Rust, you'd have to use Ok and Err to wrap the return values into a Res object. I can imagine you might be able to get away without explicit wrapping if you use a language with implicit conversions, like Scala or C++.
- because there are no runtime wrappers, the 'successful' code path doesn't need extra code to wrap/unwrap anything
  You can prototype and mess with your program in the interpreter without having to think about errors. If you do get an error, it would most likely just crash the whole program with AttributeError, which is essentially the desired non-defensive behaviour during prototyping.
  You can completely ignore mypy and error handling until you're happy, then harden your program by making sure it passes mypy.
- no memory overhead caused by constant wrapping and unwrapping
  I don't really want to make claims about CPU here. I tried isolated micro benchmarking: isinstance(r, Exception) runs in 50ns, while calling is_err() and then unpacking with err() runs in 60ns. But these numbers might not make sense under a realistic data flow.
- easy to operate on and transform values: you just write regular Python code without extra lambdas or kludgy local functions
  If you don't need to handle the error, you can just yield it up the call stack as we do in iter_books.
- doesn't require modifying existing types or introducing invalid states that signal errors (mentioned here)
- correct variance for free
  Variance reflects how compound types (e.g. containers/functions) behave with respect to inheritance of their arguments and return types. You might have also heard of it in connection with the Liskov substitution principle. I won't try to explain it here, as it's a topic that deserves a whole post and something you need to experiment with to get comfortable with. You can also find some explanations and examples here.
  In short, we can let Res[T] be covariant with respect to T, because it's a simple immutable wrapper around T.
  If you were defining your own generic class, you'd have to declare T = TypeVar('T', covariant=True). It's somewhat misleading, because variance is a property of the generic container; however, for historic reasons, in mypy you specify variance in the definition of the type variable. Because Res is merely an alias to Union, you don't have to remember to do it: Union is already defined as covariant in both its type arguments.
Downsides:
- isinstance looks a bit verbose and might be frowned upon, as it's often considered a code smell
  We can't get around this by hiding it in a helper function, for the same reason mentioned above, though it might be solved in mypy in the near future.
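To make the variance point concrete, here is a minimal sketch contrasting a hand-written generic wrapper with the Union alias. The Ok class, count_ok and first_value are hypothetical names introduced only for illustration; they are not part of the article's code or of any library.

```python
from typing import Generic, Sequence, TypeVar, Union

# For your own generic class, covariance has to be requested on the TypeVar.
T_co = TypeVar('T_co', covariant=True)

class Ok(Generic[T_co]):
    """Hypothetical immutable wrapper, only here to show the declaration."""
    def __init__(self, value: T_co) -> None:
        self._value = value

    @property
    def value(self) -> T_co:
        return self._value

# For the alias there is nothing to declare: Union is covariant already.
T = TypeVar('T')
Res = Union[T, Exception]

def first_value(ok: Ok[object]) -> object:
    return ok.value

def count_ok(results: Sequence[Res[object]]) -> int:
    return sum(1 for r in results if not isinstance(r, Exception))

boxed: Ok[int] = Ok(42)
ints: Sequence[Res[int]] = [1, ValueError('x'), 3]

# Both calls pass a "narrower" type where a "wider" one is expected;
# a type checker should accept them thanks to covariance.
print(first_value(boxed))
print(count_ok(ints))
```

If T_co were declared without covariant=True, a type checker would be expected to reject first_value(boxed), whereas the Res alias needs no such care.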
That's basically what I wanted to show! I've been using this pattern for a while now and I think it could work well. Remember about typing contexts and how isinstance / is None checks impact it, and you can keep your code clean and safe.
I'm not suggesting you go and rewrite all your try/except code right now, though. Every error handling style has its place, and hopefully you'll figure out the parts of your projects where this one is applicable.
Sometimes it's desirable to quickly switch a result back to the non-defensive version. You can do it with a simple helper function unwrap (naming inspired by Rust):
from typing import Union, TypeVar
T = TypeVar('T', covariant=True)
Res = Union[T, Exception]
def unwrap(res: Res[T]) -> T:
    if isinstance(res, Exception):
        raise res
    else:
        return res
Python output [exit code 1]:
123
Traceback (most recent call last):
File "input.py", line 13, in <module>
print(unwrap(bad))
File "input.py", line 6, in unwrap
raise res
RuntimeError: bad
Mypy output [exit code 0]:
Success: no issues found in 1 source file
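The output above comes from calling unwrap on one good and one bad value. A minimal sketch of such calls follows; the variable name ok is an assumption (only bad appears in the traceback), and the second call is wrapped in try/except here so that the snippet runs to completion.

```python
from typing import TypeVar, Union

T = TypeVar('T')
Res = Union[T, Exception]

def unwrap(res: Res[T]) -> T:
    # Turn an error value back into a raised exception.
    if isinstance(res, Exception):
        raise res
    return res

ok: Res[int] = 123
bad: Res[int] = RuntimeError('bad')

print(unwrap(ok))

try:
    print(unwrap(bad))
except RuntimeError as e:
    # Without this handler the program would exit with a traceback.
    print(f'unwrap raised: {e!r}')
```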
When you're actively working on your code and running tests, you want to make sure that there are no errors and be as non-defensive as possible. However, in the field, you want to keep the code more defensive. To switch behaviours quickly, you can use the following trick:
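A minimal sketch of what such an Error wrapper could look like. The names Error, exc and defensive_policy come from the surrounding text and output; the exact class layout and the Res alias are assumptions rather than the original listing.

```python
from typing import Generic, TypeVar, Union

T = TypeVar('T')
E = TypeVar('E', bound=Exception)

class Error(Generic[E]):
    # Assumed class-level switch: True keeps errors as values,
    # False raises them as soon as they are wrapped.
    defensive_policy: bool = True

    def __init__(self, exc: E) -> None:
        if not Error.defensive_policy:
            raise exc
        self.exc = exc

# Assumed alias: a result is either a value or a wrapped exception.
Res = Union[T, Error[Exception]]

# Defensive mode: the exception is stored and can be inspected later.
stored = Error(ValueError('oops'))
print(type(stored.exc).__name__)

# Non-defensive mode: constructing an Error raises straight away.
Error.defensive_policy = False
try:
    Error(ValueError('oops'))
except ValueError as e:
    print(f'raised immediately: {e}')
finally:
    Error.defensive_policy = True
```

Because every error has to pass through the constructor, flipping one flag changes the behaviour of the whole program without touching any call sites.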
The idea here is that Error.defensive_policy determines whether an exception is handled defensively or thrown straightaway. This is enforced on the type level, because in order to get an Error you need to call its constructor at some point.
Also note the use of bound=Exception on the type variable: this is because we can only raise something that inherits from Exception (strictly speaking, from BaseException).
from typing import Iterator

def iter_numbers() -> Iterator[Res[int]]:
    for s in ['1', 'two', '3', '4']:
        try:
            yield int(s)
        except Exception as e:
            # under the non-defensive policy this constructor re-raises e
            yield Error(e)

def print_negated() -> None:
    for n in iter_numbers():
        if not isinstance(n, Error):
            print(-n)
        else:
            print('ERROR! ' + str(n.exc))

print_negated()
Now, the default behavior is defensive:
Python output [exit code 0]:
-1
ERROR! invalid literal for int() with base 10: 'two'
-3
-4
Mypy output [exit code 0]:
Success: no issues found in 1 source file
And if we set the error policy to non-defensive, we get an exception as soon as we hit a parsing error:
Python output [exit code 1]:
-1
Traceback (most recent call last):
File "input.py", line 33, in <module>
print_negated()
File "input.py", line 27, in print_negated
for n in iter_numbers():
File "input.py", line 24, in iter_numbers
yield Error(e)
File "input.py", line 14, in __init__
raise exc
File "input.py", line 22, in iter_numbers
yield int(s)
ValueError: invalid literal for int() with base 10: 'two'
Mypy output [exit code 0]:
Success: no issues found in 1 source file
Even though you never actually return Error under the non-defensive policy, you don't have to change any of the type signatures: Iterator[int] is still a perfectly good Iterator[Res[int]]. Thanks, covariance!
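A small sketch of that covariance claim, using the plain Union[T, Exception] alias from earlier rather than the Error wrapper. The functions iter_ints and total are hypothetical names for illustration only.

```python
from typing import Iterator, TypeVar, Union

T = TypeVar('T')
Res = Union[T, Exception]

def iter_ints() -> Iterator[int]:
    # A producer that never yields an error.
    yield from [1, 2, 3]

def total(results: Iterator[Res[int]]) -> int:
    # A consumer written defensively against Res[int].
    acc = 0
    for r in results:
        if isinstance(r, Exception):
            continue
        acc += r
    return acc

# Iterator is covariant and int is a member of Union[int, Exception],
# so an Iterator[int] is accepted where Iterator[Res[int]] is expected.
print(total(iter_ints()))
```

The same would not hold for List: it is invariant, so a type checker rejects passing a List[int] where a List[Res[int]] is expected. That is one reason to prefer Iterator or Sequence in such signatures.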
I'm using this technique in my Kobo parser and control it via the --errors argument. On CI, it runs in non-defensive mode, of course. However, when other people use the library for the first time, something is likely to fail. It deals with decoding binary blobs in an unspecified format, after all! So one can run it in defensive mode, get most of their data and just ignore the (hopefully few) errors till they are fixed.
If you remember the output, we got a rather cryptic ERROR: Couldn't match regex!. That's of course not desirable, because you can't easily tell what exactly is causing the error.
It would help to wrap the error in one that carries the offending entry and chain it to the original exception. However, the problem is that from is part of the raise statement's syntax rather than an expression, so you can't write yield RuntimeError(entry) from e (see my investigation attempt here).
I find it handy to have a helper function here:
from typing import TypeVar

E = TypeVar('E', bound=Exception)

def echain(e: E, from_: Exception) -> E:
    # mimic `raise e from from_` without actually raising
    e.__cause__ = from_
    return e

Then you can write yield echain(RuntimeError(entry), from_=e), and use traceback.format_exception to unroll it and get the stacktrace. The result looks like this:
* ERROR: Traceback (most recent call last):
File "/tmp/tmp.afhyiITIK2", line 45, in iter_highlights
yield parse_entry(entry.strip())
File "/tmp/tmp.afhyiITIK2", line 26, in parse_entry
assert groups is not None, "Couldn't match regex!"
AssertionError: Couldn't match regex!
The above exception was the direct cause of the following exception:
RuntimeError:
My Life as a Quant: Reflections on Physics and Finance (Emanuel Derman)
- Your Highlight on page 54 | Added on Tuesday, October 4, 2013 12:11:16 PM
The Black-Scholes model allows us to determine the fair value of a stock option.
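A minimal, self-contained sketch of how echain and traceback.format_exception fit together. The parse and describe functions are hypothetical stand-ins for parse_entry and the error-printing code; the three-argument form of format_exception is used here as an assumption that it keeps the snippet working across Python 3 versions.

```python
import traceback
from typing import TypeVar

E = TypeVar('E', bound=Exception)

def echain(e: E, from_: Exception) -> E:
    # Same effect on the chain as `raise e from from_`, minus the raising.
    e.__cause__ = from_
    return e

def parse(entry: str) -> int:
    # Hypothetical stand-in for parse_entry.
    return int(entry)

def describe(entry: str) -> str:
    try:
        return str(parse(entry))
    except Exception as e:
        err = echain(RuntimeError(entry), from_=e)
        # format_exception follows __cause__, so the original error and
        # its stack are rendered before the wrapping RuntimeError.
        lines = traceback.format_exception(type(err), err, err.__traceback__)
        return ''.join(lines)

report = describe('two')
print(report)
```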
Remember parse_entry? Its return type is Highlight, so it can return a single highlight or throw a single error, that will be handled by iter_highlights.
If we change return type to Iterator[Res[Highlight]], we can be more defensive and do some neat fallbacks:
import re
from datetime import datetime
from typing import Iterator

# Res and Highlight are the same as in the earlier listings
def parse_entry(entry: str) -> Iterator[Res[Highlight]]:
    groups = re.search(
        r'(?P<title>.*)$\n.*Highlight on Page (?P<page>\d+).*Added on (?P<dts>.*)$\n\n(?P<text>.*)$',
        entry,
        re.MULTILINE,
    )
    assert groups is not None, "Couldn't match regex!"
    dts = groups['dts']
    title = groups['title']
    page = groups['page']
    text = groups['text']
    if len(dts) == 0:
        # report the problem, but fall back to the current time and carry on
        yield Exception("Bad timestamp!")
        dt = datetime.now()
    else:
        dt = datetime.strptime(dts, '%A, %B %d, %Y %I:%M:%S %p')
    if len(text) == 0:
        # suspicious, but not fatal: warn and still yield the highlight
        yield Exception("Empty highlight, something might be wrong")
    yield Highlight(
        dt=dt,
        title=title,
        page=page,
        text=text,
    )
You can think of Exceptions coming from parse_entry as sort of warnings and you can handle them accordingly in iter_highlights, e.g. attach extra context.
Of course, this complicates code, and you can't predict all possible errors anyway, so there is always some balance of how defensive you can be.
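A rough sketch of how iter_highlights could treat those yielded exceptions as warnings and attach context. Highlight and parse_entry below are simplified, hypothetical stand-ins (title on the first line, text after it), not the article's definitions.

```python
from typing import Iterable, Iterator, NamedTuple, TypeVar, Union

T = TypeVar('T')
Res = Union[T, Exception]

class Highlight(NamedTuple):
    # Simplified stand-in for the article's Highlight.
    title: str
    text: str

def parse_entry(entry: str) -> Iterator[Res[Highlight]]:
    # Simplified stand-in: warns about empty text but still yields a value.
    title, _, text = entry.partition('\n')
    if len(text) == 0:
        yield Exception("Empty highlight, something might be wrong")
    yield Highlight(title=title, text=text)

def iter_highlights(entries: Iterable[str]) -> Iterator[Res[Highlight]]:
    for entry in entries:
        try:
            for r in parse_entry(entry):
                if isinstance(r, Exception):
                    # A 'warning': wrap it with the entry that caused it.
                    ctx = RuntimeError(f'while parsing {entry!r}')
                    ctx.__cause__ = r
                    yield ctx
                else:
                    yield r
        except Exception as e:
            # A hard failure raised (not yielded) by parse_entry.
            yield e

results = list(iter_highlights(['Book A\nsome text', 'Book B']))
for r in results:
    print(repr(r))
```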
One case where I find a 'special error value' more or less appropriate is when your function returns a pandas DataFrame.
When manipulating dataframes, you typically don't iterate explicitly, but apply more idiomatic (and often more efficient!) combinators like merge, join, concat etc., so it makes sense to try and keep errors inside the dataframe. For me, it looks somewhat like this:
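A minimal sketch of the idea, assuming pandas is installed. The iter_rows source and the column names dt, value and error are hypothetical and chosen only for illustration; this is not the author's actual code.

```python
from datetime import datetime
from typing import Any, Dict, Iterator, List, Optional, Union

import pandas as pd

Row = Dict[str, Any]
Res = Union[Row, Exception]

def iter_rows() -> Iterator[Res]:
    # Hypothetical data source that mixes good rows and errors.
    yield {'dt': datetime(2019, 1, 1), 'value': 10}
    yield RuntimeError('bad row')
    yield {'dt': datetime(2019, 1, 3), 'value': 30}

def as_dataframe() -> pd.DataFrame:
    rows: List[Row] = []
    last_dt: Optional[datetime] = None
    for r in iter_rows():
        if isinstance(r, Exception):
            # Keep the error as a row of its own; reuse the last known
            # timestamp so it ends up near its neighbours.
            rows.append({'dt': last_dt, 'error': str(r)})
        else:
            last_dt = r['dt']
            rows.append(r)
    # Rows have different keys; the constructor builds the union of columns.
    return pd.DataFrame(rows)

df = as_dataframe()
errors = df[df['error'].notna()]
print(df)
```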
It looks pretty clean since the DataFrame constructor automatically creates the necessary columns and fills in the missing values (you can see some frame examples here).
Then in the dataframe processing code I would typically check for the presence of a non-null value in the 'error' column and act accordingly. E.g. here I'm using the timestamp attached to the parsing errors to plot them neatly close to the rest of the data.
from typing import Any

def dispatch(x: Any) -> None:
    try:
        raise x
    except A as e:
        print("Matched A!")
    except B as e:
        print("Matched B!")
    except Exception as e:
        print(f"Unhandled object: {type(e)} {e}")

This looks a bit odd. We still have to type Exception, you can't just write except e, which hardly makes it different from isinstance. Note that we have to use an else block: if you put its code under try instead, you'd start catching exceptions coming from the printing code, which is unintended.
And the obvious downside is that it's possible to forget to handle the exception raised by unwrap, and mypy can't help you here.
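A minimal sketch of the try/except/else shape with unwrap that the remarks above refer to. The iter_results generator is a hypothetical stand-in for iter_books, and render is an illustrative name.

```python
from typing import Iterator, List, TypeVar, Union

T = TypeVar('T')
Res = Union[T, Exception]

def unwrap(res: Res[T]) -> T:
    if isinstance(res, Exception):
        raise res
    return res

def iter_results() -> Iterator[Res[int]]:
    # Hypothetical stand-in for iter_books.
    yield 1
    yield ValueError('bad entry')
    yield 3

def render() -> List[str]:
    lines: List[str] = []
    for r in iter_results():
        try:
            value = unwrap(r)
        except Exception as e:
            lines.append(f'* ERROR: {e}')
        else:
            # Runs only if unwrap succeeded. Exceptions raised here are
            # not swallowed by the except clause above.
            lines.append(f'* {value * 10}')
    return lines

print('\n'.join(render()))
```

Nothing forces the caller to write the try block, though: calling unwrap(r) bare typechecks just as well, which is exactly the downside mentioned above.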
- sometimes existing and simple things work better and cleaner
  I'm not trying to advocate avoiding syntactic sugar, decorators and libraries at any cost; however, you might experience friction while trying to introduce them in more conservative teams.
- it's kind of ironic that you can't achieve a similar level of safety and cleanliness in many statically typed programming languages
  Python is often hated by static typing advocates (as is any other dynamically typed language, I suppose). I have to admit, I was one of these haters a few years ago. But in this case Python nails it.
- writing is damn hard
  Literate programming is even harder; however, I'm glad I've started doing it in Emacs and Org mode. That saved me from otherwise massive amounts of code duplication and reference rot.
Python: better typed than you think
mypy assisted error handling, exception mechanisms in other languages, fun with pattern matching and type variance
TLDR: I overview few error handling techniques (with the emphasis on Python, although I mention few other programming languages), some existing Python libraries and suggesting a simple and clean mypy-based approach.
You might learn few things about error handling in different languages, pattern matching, type variance, mypy's capabilities in general and clues for making your code and interfaces more mypy-friendly (and IDE friendly if you're using LSP/Intellij).
¶1 Intro aka computers are hard
I am somewhat obsessed with personal data and information, analyzing data for quantified self, lifelogging etc. I am trying to integrate all my information sources and make it easy to access and search. You can see some examples in my package and Orger: part I, part II.
To get this data, manipulate with it and interact with, of course, you need to extract it first (e.g. from json/csv), parse it (e.g. from plaintext), or even worse, reverse engineer it from vendor locked formats (e.g. in my kobo parsing library).
If you ever worked with data and had to parse some semi-structured data (let alone natural language), or scraped web pages, you might start getting flashbacks now. Undocumented APIs, bad characters, cryptic regexes, corrupt fields, unexpected nulls, logical inconsistencies, all sorts of things. You will almost never get it right from the first few attempts, and then when it finally does what your want… it breaks after couple of days because of course you missed some edge cases or data provider just gives you utter garbage for no reason. And thing you've spent so much effort on stops working, spams your mailbox and requires attention.
Ew. Data is messy.
Most modern programming languages are fairly unforgiving to unexpected, and would crash at the slightest opportunity. Some languages do have quirks (e.g. 'undefined' in JS), but generally well written software aborts very soon after something unexpected starts happening. And for good reasons:
if it didn't, your program's state would lose the properties the author intended it to have.
Ignoring the errors will almost surely prevent the program from getting to desired result anyway and end up with even more severe, or potentially catastrophic inconsistencies. How about formatting your disk if you're really unlucky?
So in most cases, as long as you can get away with it, it's good to throw exception or abort the program immediately in some way. You might not be able to do that if you're literally doing rocket science or flight control software, but most of us aren't. For typical software engineering problems, some errors are less crucial and more manageable than other errors. So we try to be pragmatic when we program, evaluate failure risks and use try/catch mechanisms where appropriate.
Now, I'm sure we as an engineers we could handwave about about that stuff forever, so let me be more specific straightaway and introduce a motivating real life problem that I actually had to solve.
¶2 The problem: parsing Kindle highlights
Say, you own a Kindle book. Electronic books are great. Yeah okay they don't smell like the real thing, but the possibility of highlighting bits of text and typing your comment without distracting on external means of annotation is incredibly helpful. However, then when you want to go through your highlights after reading to refresh your memory or perhaps to share with a friend, you find out it's not so convenient to actually quickly access them.
So you decide to write a script that would process the highlights, perhaps group them by book, displays timestamps and render a nice HTML page so you could easily open it from phone and recall latest books you read to discuss with friends.
On device, Kindle keeps bookmarks and highlights are stored … in My Clippings.txt file.
It is worth mentioning that although the two people disagree on whether the light hits the walls at the same time, this does not mean that they can't be friends.
PHYS771 Lecture 12: Proof (scottaaronson.com)
No hidden-variable theory can be local (I think some guy named Bell proved that).
Yes, it's a messy format and not very machine friendly. But oh well it's a file, you're a programmer. You know the drill.
whether the light hits the walls at the same time, this does not mean
that they can't be friends. [Page 120]
yield
I'll explain it further downNow:
imagine you've set this script to run in cron, and it's been fine for a while. You left for a three week holiday to finally get some rest from programming; started reading this new book about quant finance (yeah, you've always had interesting ways of getting a rest from computer) and… your script stopped working.
You swear out loud, reach for the laptop you promised to distance yourself from and turns our your parser chokes over page instead of Page in one of new entries. (and yes, this was actually the case in my Kindle export)
It is worth mentioning that although the two people disagree on whether the light hits the walls at the same time, this does not mean that they can't be friends.
PHYS771 Lecture 12: Proof (scottaaronson.com)
No hidden-variable theory can be local (I think some guy named Bell proved that).
My Life as a Quant: Reflections on Physics and Finance (Emanuel Derman)
The Black-Scholes model allows us to determine the fair value of a stock option.
You could argue that you should have made the regex in
parse_entry
case independent in the first place, but it's not something you would normally expect. Kindle specifically got all sorts of nasty things: roman numerals for page numbers, locale dependent dates, inconsistent separators, and so on.Perhaps you even fix this particular problem, but it's a matter of short time till next parsing issue. It's quite sad if you have to constantly tend for things that are meant to simplify and enhance your life.
Or,
you wrote this parser and decided that it could be useful for other people.
So for a small fee, you are providing a service that fetches highlights from their Kindles, displays on profile pages and lets their friends comment.
Imagine user's highlights result in the same error described above. It would be pretty sad if parsing a single entry took down the whole user's page or prevented updates. No matter how fast you'd be willing to fix these things, users would leave discouraged.
With the way code is written at the moment, any exception would take the whole program down. So, we need some way of getting around these errors and carrying on.
What do we do?
¶3 A non-solution #1: logging
One simple strategy would be to make parsing fully defensive, wrap the whole
parse_entry
call intry/except
and log:Logging typically works well for minor things not worthy a proper error (i.e. warnings) and as a means of retrospective error analysis and debugging. In our case logging wouldn't do the job:
What do we want?
So we need some way of propagating the errors up the call hierarchy instead of throwing immediately or suppressing.
¶4 A non-solution #2: special error value
Often it's tempting to fallback to some sort of special 'default' or 'error' value. I bet you've seen this before:
0
orINT_MAX
meaning error for integer type, or""
for string types. We could try something similar and squeeze exception into theHighlight
object itself.One obvious problem is that it's very nontransparent and relies on implicit convention: there is no way of telling that this function might return some special
Highlight
which should be treated as error. That not only complicates code, but might also introduce logical inconsistencies.E.g. if your
Highlight
object also returned book's ISBN and you filled it with some arbitrary text, it would almost surely not be a valid ISBN, that might cause failures down the pipeline.Sometimes it's inevitable though, e.g. I'm giving an example later.
¶5 Almost solution #1: Result container
An abstraction that stood the test of time well is a container that holds a result representing one of two:
T
E
.I will try to stick to the same semantics further down, 'result' typically meaning that it could be either desired value or error.
You can vaguely think of it as an interface
Result
, and two implementations:Ok
andError
. In runtime, you can ask the instance behindResult
, which of these alternative it holds and act accordingly.It has manifested as:
in Rust: std::result::Result. Example borrowed from here:
in Haskell:
Either E T
Yes,
Left
meaning error andRight
meaning success are not necessarily obvious. It's kinda a pun: "right" also means "correct". Also notice that error is not just a string, but also contains the position where parsing failed.std::expected<E, T>
So, Rust and Haskell programmers seem to be quite happy with it? Why can't we have same in Python? Well, some people tried! So I'll review a python library that does that: result.Result
Let's try it on our program and see how it works. To make it easier to compare to the original code I suggest duplicating the tab in a separate window and tiling them side by side.
We've had to wrap success and error values in
Ok
andErr
, but so far it's not too bad.We use
itertools.tee
here so we don't have to pollute our code with temporary lists.[Tong][2013] Dynamics and Relativity
whether the light hits the walls at the same time, this does not mean
that they can't be friends. [Page 120]
ERROR: Couldn't match regex!
Cool, we rendered as much as we can, and we get the error displayed as well, so nothing crashes and the users are not as unhappy. The error looks a bit out of nowhere, but at least it's there. We will address how we can improve it later.
Sadly, for someone else who looks at
iter_highlights
oriter_books
signatures, it's not obvious that it yieldsResult
objects, notBook/Highlight
objects without reading the code. It's a thankless job for a human to keep track of, and mypy is a perfect fit for this task. Gladly,result
library already comes with type annotations.So, let's try to use mypy to aid us at writing correct code.
Let's focus just on
iter_highlights
anditer_books
and use theResult
type.Umm. Let's go through the errors:
ok()
method being too defensive and returningNone
ifis_ok
isFalse
. Ideally, you'd throw exception here, because such a situation is a programming bug. We can just enforce non-optional type here viaunopt
helper.errors
might holdHighlight
objects. You could blame mypy of not being smart enough, but it would be a very hard if not impossible analysis in general case. We can get around this by unpacking error and wrapping back inErr
.Let's apply these insights and try again:
Phew! With some minor changes and restructuring we've convinced mypy.
It does come with some downsides:
Ok/Err
wrappers and access the success value via.value
propertysafety: you could forget to call
is_ok/is_err
before callingok/err
, and mypy won't even blink.The contract if .is_ok() is True, then it's safe to call .ok() is too complicated to be encoded as a type that mypy can handle. You'll get
None
or exception thrown in runtime. The author of the library admits it by the way, so it's not a criticism, just highlighting limitations of mypy here!Ok, we've learned something, let's try again.
¶By the way, what's up with
Iterator
everywhere?Glad you asked! Several reasons I'm using generators here:
.append
and then returning them in the end.Iterator
type is covariant, whereasList
is not. I'm elaborating on it later. I'm also usingSequence
for the same reason.¶6 Almost solution #2: use error combinators
Now, let's try out returns.result library, clearly inspired by Haskell's
Either
monad anddo
notation. I'm quite glad someone already implemented it and I didn't have to reinvent the wheel here.So, let's try and rewrite the code using
returns.result.Result
:So far the only difference from the original code is
@safe
decorator onparse_entry
, which basically deals with catching all exceptions and wrapping intoResult
.As a consequence,
iter_highlights
required no changes in its body. (which may not be a desirable thing as we'll see later)Ok, that definitely requires some explanation…
returns
library public API doesn't provide any way to tell between success and failure (kind of deliberately). The types_Success
and_Failure
are private, and the only method that we can use seems to beresult.value_or(default)
. This method returns the success value ifresult
isSuccess
and falls back todefault
ifresult
is aFailure
. So we use a sentinel object to distinguish between actual success values anddefault
ones, and also have to trick mypy with acast
.Apart from this obscurity, the function suffers from exactly the same issues as the
iter_books
implementation from the previous section, and for the same reason: contract is too complicated to be expressed in mypy.One could argue that this function is going to look awkward anyway since we need to separate list of results into successes and errors. Let's see the function that should be more straightforward:
The idea here is that we can use
map
method (that works likefmap
in Haskell) and use it to print successful results, and chain it withfix
that works like likefmap
, but for errors. In a sense, these methods encapsulate pattern matching (which Python lacks syntactically) so as long the implementor did the dirty business of correctly doing it dynamically, you're safe. However I feel that this particular library overdid this encapsulation a bit, hence very hacky implementation ofiter_books
.Lambdas can't be multiline, so we have to define a local function for
print_ok
.There is a bug in mypy that sometimes prevents you from inlining the lambda and struggles with type inference. Here I'm hitting this bug with
print_error
, that's why it's not.fix(lambda e: print(f"* ERROR: {e}"))
.Another potential problem is one could forget to implement one of
map/fix
clauses, since nothing enforces calling them. Even if you're detecting unused variables, missing.fix
clause could stay unnoticed forever. It's very similar to forgettingcatch
when using Javascript Promises.It might be possible to enforce with some static analysis though, e.g. via mypy plugin by flagging dangling/temporary
Result
values (e.g. similarly tomust_use
attribute in Rust), but it's a project on its own.Well at the very least it works and type checks!
Overall I'm not sold, Python simply lacks syntax that lets you unpack and compose
Result
objects in a clean way and you end up with boilerplate.lifts
are not very readable in Haskell, let alone in Python.I think authors did a great experiment though, the more people have fun with types, the more good abstractions we'll find.
I don't want to discourage people from using their library, so if it's your personal project and it makes your code more manageable or it just feels fun then by all means go for it!
But as much as I like ideas from functional programming, I'm almost certain that it's gonna look confusing to an average Python programmer, and won't be welcome warmly in your team.
¶7 Still-not-quite-a-solution #3: (Value, Error) pairs
Before we go on to the solution I propose let me mention another notable pattern of error handling.
It's commonly used in Go.
However, it's not limited only by Go, e.g. you'd often encounter it implicitly in C (which had no exceptions) or C++ code. For instance,
std::filesystem::is_symlink
comes in two flavours:bool is_symlink( const std::filesystem::path& p )
, which throws exceptions on errors.bool is_symlink( const std::filesystem::path& p, std::error_code& ec ) noexcept
, which setsec
on errors.You can think of it as if it returned
std::tuple<bool, std::error_code>
. I assume it's not that way because the compiler wouldn't be able to distinguish between signatures.Personally I as well as many other people find it pretty ugly. No judgment here though as I have no idea behind the design requirements and rationale for such a model in Go. Pretty sure one can get used to it after a while and that there are some static flow analyzers that help to ensure correct error handling.
Main issue with this approach regarding Python is that it's not mypy friendly as return type of
Open
would have to beTuple[Optional[Success], Optional[Error]]
. In the type theory language, it is a product type, so in addition to all members ofSuccess
type and all members ofError
type, it also got inhabitants that don't make sense for our program, such as(None, None)
and also all ofTuple[Success, Error]
.In other words, nothing on type level prevents the callee (
os.Open
) from returning something like(file_descriptor, "whoops")
, which has ambiguous meaning. If we use it we would have to pay with sacrificing type safety or extra code on caller site to eliminate these impossible program states:¶8 Solution: keep it simple
It seems that we were on the right track with the container type and combinators, but never completely satisfied. Let's recall the problems we had again:
readability: extra wrappers and accessor methods like
Ok/Success/Error/.is_ok()/.unwrap()
.It's visual noise and also they creep throughout the code, so if you decide you won't need them later, you might have to refactor a lot of code.
if res.is_error(): return res.value * 10
.fmap
-style combinators are not really going to look good because Python lacks multiline lambdas.First, we'll attack readability and safety. Yes, at the same time!
In part it's solved with syntactic sugar in other languages like
do
syntax inHaskell
, ortry!
macro and?
operator in Rust. Sometimes it's inevitable and you have to inject values into rust'sResult
explicitly viaOk/Err
constructors. However checking for.is_ok()
orisRight
is really not that common in idiomatic Rust and Haskell. Reason is pattern matching! E.g. if we had pattern matching in Python we could write something like:That's cleaner than checking for
is_ok/is_err
and unpacking; and also makes it type safe becauseb
ande
already have the appropriate types. In our imaginary world where python had this syntax, surely mypy would have supported it too, right?Oh wait. It kind of supports it already!
So, mypy keeps track of the typing context and narrows it down after certain operations, in particular,
isinstance
checks andis None/is not None
checks.That looks very similar to pattern matching both in terms of syntax and typing rules.
So, it seems that
Union
would represent our result type. Do we still need to come up with some special wrapper for errors? Not really, Python already has a fairly convenient candidate for it:Exception
! Most often you have it anyway inexcept
clause, if it's not enough, you can inherit it, add extra fields and treat as any other type.On the other hand, Exceptions almost never end up as function return values (and when they do, it's normally some fairly unambiguous code dealing specifically with error handling). Hmm, how convenient 🤔.
So even though we don't have explicit tagged unions in Python, if we agree that error values are represented as Exceptions, then we do get a disjoint type (i.e.
Ok
andError
are mutually exclusive) at runtime.So, rules of thumb:
Union[T, Exception]
to represent type for results that holdT
but can also end up with an errorreturn
oryield
exceptions and success values without using any extra wrappersisinstance
Let's see how we can rewrite our program by employing these principles:
[Tong][2013] Dynamics and Relativity
whether the light hits the walls at the same time, this does not mean
that they can't be friends. [Page 120]
ERROR: Couldn't match regex!
Mypy output [exit code 0]:
Success: no issues found in 1 source file
Yay, it works and typechecks. Now you can decide for yourself how clean it is by comparing it side by side with the original code without error handling. You'd see that the only differences (apart from indentation) is code for error handling.
Here's what I like about this approach:
no extra wrapper classes, code is clean and readable
Also note that surprisingly, Python's dynamic nature actually helps here. E.g. if you rewrote
iter_books
in Rust, you'd have to useOk
andErr
to wrap the return values intoRes
object. I can imagine you might be able get away with explicit wrapping if you use language with conversions likeScala
orC++
.because of no runtime wrappers, on the 'successful' code path, the callee doesn't need extra code to wrap/unwrap anything.
You can prototype and mess with your program in the interpreter without having to think about errors. If you do get an error, it would most likely just crash the whole program with AttributeError, which is essentially the desired non-defensive behaviour during prototyping.
You can completely ignore mypy and error handling until you're happy, then harden your program by making sure it complies with mypy.
no memory overhead caused by constant wrapping and unwrapping.
I don't really want to make claims about CPU here. I tried isolated micro-benchmarking: using isinstance(r, Exception) runs in 50ns, while calling is_err() and then unpacking with err() runs in 60ns. But these numbers might not make sense under a realistic data flow.
easy to operate and transform values, you just write regular Python code without extra lambdas or kludgy local functions.
If you don't need to handle the error, you can just yield it up the call stack as we do in iter_books.
correct variance for free
Variance reflects how compound types (e.g. containers/functions) behave with respect to inheritance of their arguments and return types. You might have also heard of this as the Liskov substitution principle. I won't try to explain it here, as it's a topic that deserves a whole post and something you need to experiment with and get comfortable with. You can also find some explanations and examples here.
In short, we can let Res[T] be covariant with respect to T, because it's a simple immutable wrapper around T.
If you were defining your own generic class, you'd have to declare T = TypeVar('T', covariant=True). It's somewhat misleading, because variance is a property of a generic container; however, for historical reasons, in mypy you specify variance in the definition of the type variable. But because Res is merely an alias to Union, you don't have to remember to do it: Union is already covariant in its type arguments.
Downsides:
isinstance looks a bit verbose and might be frowned upon, as it's often considered a code smell
We can't get around this by hiding it in a helper function, for the same reason mentioned above, though it might be solved in mypy in the near future.
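To make the variance point a bit more concrete, here is a small sketch under stated assumptions: Box and describe are hypothetical names invented for this example, and the claim about what mypy accepts reflects how covariance is expected to work rather than a run of the checker.

```python
from typing import Generic, List, Sequence, TypeVar, Union

T = TypeVar('T')
# Plain alias to Union: no variance declaration needed
Res = Union[T, Exception]

# For a hand-written wrapper, covariance has to be spelled out on the TypeVar
T_co = TypeVar('T_co', covariant=True)


class Box(Generic[T_co]):
    """Hypothetical immutable wrapper around a single value."""

    def __init__(self, value: T_co) -> None:
        self._value = value

    def get(self) -> T_co:
        return self._value


def describe(results: Sequence[Res[object]]) -> List[str]:
    # accepts results holding any value type
    return ['error' if isinstance(r, Exception) else type(r).__name__ for r in results]


ints: List[Res[int]] = [1, ValueError('bad'), 3]
# Sequence and Union are both covariant, so Sequence[Res[int]] should be
# accepted where Sequence[Res[object]] is expected
labels = describe(ints)

int_box: Box[int] = Box(1)
# this assignment relies on covariant=True; with a plain TypeVar a type
# checker is expected to reject it
obj_box: Box[object] = int_box
```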
That's basically what I wanted to show! I've been using this pattern for a while now and I think it could work well. Remember about typing contexts and how isinstance / is None checks impact them, and you can keep your code clean and safe.
I'm not suggesting you go and rewrite all your code that uses try/except now, though. Every error handling style has its place, and hopefully you'll figure out the parts of your projects where it's applicable.
¶9 Tips & tricks
¶Custom error type
While the three line API is enough in most cases, you might want something more fancy.
One improvement is allowing an arbitrary error type.
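A rough sketch of what such a container could look like (the field name error and the safe_div function are assumptions made for this example, not the post's code):

```python
from typing import Generic, TypeVar, Union

T = TypeVar('T')
E = TypeVar('E')


class Error(Generic[E]):
    """Marks a value as an error; the payload can be of any type."""

    def __init__(self, error: E) -> None:
        self.error = error


# Two-parameter alias: Res[float, str] reads "a float, or an Error carrying a str"
Res = Union[T, Error[E]]


def safe_div(a: int, b: int) -> Res[float, str]:
    if b == 0:
        # the error payload is a plain string here, not an Exception
        return Error('division by zero')
    return a / b
```

Callers then check isinstance(res, Error) instead of isinstance(res, Exception).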
The downside now is that you do need to wrap your exception (i.e. presumably you still want to keep the message and stacktrace) in the Error container.
¶unwrap
Sometimes it's desirable to quickly switch a result back to the non-defensive version. You can do it by using a simple helper function unwrap (naming inspired by Rust): it raises the result if it's an error and returns it unchanged otherwise.
¶Global error policy
When you're actively working on your code and running tests, you want to make sure that there are no errors and be as non-defensive as possible. However, in the field, you want to keep the code more defensive. To switch behaviours quickly, you can use the following trick:
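A minimal sketch of such a switch, assuming an Error container whose constructor consults a class-level flag (parse_ints and the exc field are hypothetical names used only for illustration):

```python
from typing import Generic, Iterator, List, TypeVar, Union

T = TypeVar('T')
# bound=Exception so that the wrapped value can be raised
E = TypeVar('E', bound=Exception)


class Error(Generic[E]):
    # True: keep errors as values; False: raise as soon as an Error is constructed
    defensive_policy = True

    def __init__(self, exc: E) -> None:
        if not Error.defensive_policy:
            raise exc
        self.exc = exc


Res = Union[T, Error[Exception]]


def parse_ints(items: List[str]) -> Iterator[Res[int]]:
    for item in items:
        try:
            yield int(item)
        except ValueError as e:
            # the only way to produce an Error is through its constructor,
            # so the policy check cannot be bypassed
            yield Error(e)
```

With the default policy, list(parse_ints(['1', 'x', '3'])) contains an Error in the middle; after setting Error.defensive_policy = False the same call raises ValueError when it reaches 'x'.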
The idea here is that Error.defensive_policy determines whether an exception will be handled defensively or thrown straightaway. This is enforced on the type level, because in order to get an Error you need to call its constructor at some point.
Also note the use of bound=Exception on the type variable; this is because we can only raise something that inherits from Exception.
Now, the default behavior is defensive:
And if we set the error policy to non-defensive, we get an exception as soon as we hit a parsing error:
Even though you never actually return Error under the non-defensive policy, you don't have to change any of the type signatures: Iterator[int] is still a perfectly good Iterator[Res[int]]. Thanks, covariance!
I'm using this technique in my Kobo parser and control it via the --errors argument. On CI, it runs in non-defensive mode of course. However, when other people use the library for the first time, something is likely to fail. It deals with decoding binary blobs in an unspecified format after all! So one can run it in defensive mode, get most of their data and just ignore the (hopefully few) errors till they are fixed.
¶Improving error context
If you remember the output, we got a rather cryptic ERROR: Couldn't match regex!. That's of course not desirable because you can't easily tell what exactly is causing the error.
Normally, you'd use exception chaining, i.e. the raise EXCEPTION from CAUSE syntax here. Unfortunately, from is part of the raise statement's syntax rather than a standalone expression, so you can't write yield RuntimeError(entry) from e (see my investigation attempt here).
I find it handy to have a helper function, echain, for this; then you can write yield echain(RuntimeError(entry), from_=e), and use traceback.format_exception to unroll it and get the stacktrace. The result looks like this:
Now that's better!
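One possible shape for such a helper, as a sketch: only the name echain and its from_ keyword come from the text above, while the body, parse_all and render are assumptions made for this example.

```python
import traceback
from typing import Iterator, List, TypeVar, Union

T = TypeVar('T')
E = TypeVar('E', bound=Exception)
Res = Union[T, Exception]


def echain(ex: E, from_: Exception) -> E:
    # mirrors `raise ex from from_`, but can be used inside an expression
    ex.__cause__ = from_
    return ex


def parse_all(entries: List[str]) -> Iterator[Res[int]]:
    for entry in entries:
        try:
            yield int(entry)
        except ValueError as e:
            # keep the offending entry and the original cause together
            yield echain(RuntimeError(entry), from_=e)


def render(err: Exception) -> str:
    # the three-argument form is accepted across Python 3 versions
    return ''.join(traceback.format_exception(type(err), err, err.__traceback__))
```

render() on a chained error prints the original ValueError first and then the RuntimeError that carries the failing entry.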
¶Fine grained defensiveness
Remember parse_entry? Its return type is Highlight, so it can return a single highlight or throw a single error, which will be handled by iter_highlights.
If we change the return type to Iterator[Res[Highlight]], we can be more defensive and do some neat fallbacks:
You can think of Exceptions coming from parse_entry as sort of warnings, and you can handle them accordingly in iter_highlights, e.g. attach extra context.
Of course, this complicates the code, and you can't predict all possible errors anyway, so there is always some balance in how defensive you can be.
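The fallback idea in this section can be sketched roughly as follows. Only the function names parse_entry and iter_highlights and the Iterator[Res[Highlight]] return type come from the text; the Highlight fields, the "[Page N]" entry format and the fallback itself are assumptions for illustration.

```python
import re
from typing import Iterator, List, NamedTuple, Optional, TypeVar, Union

T = TypeVar('T')
Res = Union[T, Exception]


class Highlight(NamedTuple):  # hypothetical shape
    text: str
    page: Optional[int]


PAGE_RE = re.compile(r'\[Page (\w+)\]\s*$')


def parse_entry(entry: str) -> Iterator[Res[Highlight]]:
    m = PAGE_RE.search(entry)
    if m is None:
        # nothing to fall back on: a hard error for this entry
        yield RuntimeError("Couldn't match regex!")
        return
    text = entry[:m.start()].strip()
    page: Optional[int] = None
    try:
        page = int(m.group(1))
    except ValueError as e:
        # soft error: report it, but still emit the highlight without a page
        yield e
    yield Highlight(text=text, page=page)


def iter_highlights(entries: List[str]) -> Iterator[Res[Highlight]]:
    for entry in entries:
        for res in parse_entry(entry):
            if isinstance(res, Exception):
                # attach extra context to the warning-like error
                yield RuntimeError(f'while parsing {entry!r}: {res}')
            else:
                yield res
```

An entry with an unparseable page number now produces both a warning and a usable highlight instead of being lost entirely.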
¶Error values, revisited
One case where I find 'special error value' more or less appropriate is when your function returns a pandas DataFrame.
When manipulating dataframes, you typically don't iterate explicitly, but apply more idiomatic (and often efficient!) combinators like merge, join, concat etc, so it makes sense to try and keep errors inside the dataframe. For me, it looks somewhat like this:
It looks pretty clean since the DataFrame constructor automatically creates the necessary columns and fills missing values with None (you can see some frame examples here).
Then in the dataframe processing code I would typically check for the presence of a non-nil value in the 'error' column and act accordingly. E.g. here I'm using the timestamp attached to the parsing errors to plot them neatly close to the rest of the data.
¶Cursed pattern matching mechanism
This is forbidden knowledge liberated during the latest Area 51 raid. Tsss… don't tell the government.
Have to admit, this is a pretty weird idea that I haven't got practical use for, but still.
What's a construction in the Python language that dispatches objects according to their type? try/except!
It certainly looks unconventional, and you can only use it as long as your object inherits from Exception.
We can exploit this for our specific case of Union[T, Exception] by using unwrap:
This looks a bit odd. We still have to type Exception, you can't just write except e, which hardly makes it different from isinstance. Note that we have to use the else block: if you put the code in it under try, you'll start catching exceptions coming from the printing code, which is unintended.
And the obvious downside is that there is a potential to forget to handle the exception signaled by unwrap, and mypy can't help you here.
¶10 Closing points
sometimes existing and simple things work better and cleaner
Not trying to advocate avoiding syntactic sugar, decorators and libraries at any cost, however you might experience friction while trying to introduce them in more conservative teams.
it's kind of ironic that you can't achieve a similar level of safety and cleanliness in many statically typed programming languages
Python is often hated by static typing advocates (as is any other dynamically typed language, I suppose). Have to admit, I was one of these haters a few years ago. But in this case Python nails it.
writing is damn hard
Literate programming is even harder, however I'm glad I've started doing this in Emacs and Org mode. That saved me from otherwise massive amounts of code duplication and reference rot.
¶12 --
Let me know what you think! I'm open to all feedback.
Discussion:
via beepb00p.xyz https://beepb00p.xyz
January 16, 2020 at 03:02PM