Add unreachable code rule #5384

Thomasdezeeuw · 2023-06-27T10:21:45Z

Summary

This adds a new rule that detects unreachable code, currently limited to function bodies.

How it Works

The rule works as follows.

First we create "basic blocks" from the statements. A basic block consists of zero or more lines of code (statements) for which the code flow is easy to follow. Specifically, all statements in a single block follow each other, with no diversion of the control flow. At the end of the block the control flow can do one of three things: 1) continue to another code block, 2) jump to one of two code blocks based on a condition, or 3) terminate (return or end of the function).

Second, based on these basic blocks, and the simplified control flow they represent, we can determine what blocks are and aren't reached. We do this by starting with the first block of the function and following the jumps it makes to the next code block, marking them all as reached.

Third, we create a diagnostic for each code block that is not reached by step 2. Currently these diagnostics are quite limited, see below.
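A minimal sketch of the three steps, written in Python (the language being linted) rather than the Rust of this PR. The names `Block`, `Always`, `If`, `Terminate` and `mark_reached` are made up for illustration and are not the types in `unreachable.rs`; the blocks are also written out by hand instead of being built from an AST.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Always:
    """Unconditionally continue to block `target`."""
    target: int


@dataclass
class If:
    """Jump to `then` or `orelse`; `known` is set when the condition is a constant."""
    then: int
    orelse: int
    known: Union[bool, None] = None


@dataclass
class Terminate:
    """Return, or the end of the function."""


@dataclass
class Block:
    stmts: list
    next: Union[Always, If, Terminate]


def mark_reached(blocks, start):
    """Step 2: follow the jumps from `start` and collect every reached block index."""
    reached, todo = set(), [start]
    while todo:
        idx = todo.pop()
        if idx in reached:
            continue
        reached.add(idx)
        nxt = blocks[idx].next
        if isinstance(nxt, Always):
            todo.append(nxt.target)
        elif isinstance(nxt, If):
            # Skip a branch only when the condition is known to rule it out.
            if nxt.known is not False:
                todo.append(nxt.then)
            if nxt.known is not True:
                todo.append(nxt.orelse)
    return reached


# Step 1, done by hand for:
#     if False:
#         print("never")   # block 1
#     return 1             # block 2
blocks = [
    Block(["if False:"], If(then=1, orelse=2, known=False)),
    Block(['print("never")'], Always(2)),
    Block(["return 1"], Terminate()),
]
reached = mark_reached(blocks, start=0)
# Step 3: every block that was not reached would get a diagnostic.
unreachable = [i for i in range(len(blocks)) if i not in reached]
```

Here only block 1 ends up in `unreachable`, which is the block a diagnostic would point at.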

Future Work and Limitations

This commit is only the beginning of this rule; there is much work still left to do.

The diagnostics created are currently quite limited. They only mention the function name and point to the first statement in the basic block. In the future this should be expanded to point to all statements in the basic block. Furthermore, it would be helpful to explain to users why a code block is not reached, since currently, except for the most basic examples, this might not be clear.

We're currently quite limited in how we determine whether a branch is always taken. Specifically, we only detect the constants True and False, and only in the most basic conditions (mainly those of if and while statements). This can be expanded to detect more cases where we can statically determine whether or not a branch is taken.
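To illustrate how narrow this check is, here is a rough Python equivalent using the standard `ast` module. The helper name `constant_branch` is hypothetical and this is not the code in this PR; it only shows that a literal `True`/`False` test is recognised while anything else is treated as unknown.

```python
import ast
from typing import Optional


def constant_branch(test: ast.expr) -> Optional[bool]:
    """Return True/False when `test` is literally `True`/`False`, else None."""
    if isinstance(test, ast.Constant) and isinstance(test.value, bool):
        return test.value
    return None


source = """
if True:
    pass
while False:
    pass
if x > 1:
    pass
"""
results = [
    constant_branch(node.test)
    for node in ast.parse(source).body
    if isinstance(node, (ast.If, ast.While))
]
```

Something like `if 1:` or `if not True:` also returns `None` here, which is the kind of case the expansion mentioned above would cover.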

This currently doesn't handle the try or (async) with statements.

For match statements we currently set BasicBlock::stmts to the entire match statement for each basic block, even though we're only interested in one of its cases.

Within match statements, binding to named patterns is currently not handled. Similar to the wildcard pattern, they should be considered to be always taken (assuming no guard is present).

False Positive

This has the possibility of false positives, mostly around the missing implementation of try and with statements. So far I have found one in the Bokeh repo and added it as a (commented-out) test case. I believe time is best spent implementing try and with instead of working around this issue.

Test Plan

Added new tests; simply run cargo test.

I also manually ran this on the Airflow, Bokeh, CPython, FastAPI and Jupyter server repos.

github-actions · 2023-06-27T10:53:21Z

PR Check Results

Ecosystem

✅ ecosystem check detected no changes.

Benchmark

Linux

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.01      8.1±0.05ms     5.0 MB/sec    1.00      8.0±0.02ms     5.1 MB/sec
formatter/numpy/ctypeslib.py               1.01   1768.6±5.14µs     9.4 MB/sec    1.00   1745.5±2.57µs     9.5 MB/sec
formatter/numpy/globals.py                 1.02    198.3±0.43µs    14.9 MB/sec    1.00    195.1±0.34µs    15.1 MB/sec
formatter/pydantic/types.py                1.02      3.9±0.02ms     6.6 MB/sec    1.00      3.8±0.01ms     6.7 MB/sec
linter/all-rules/large/dataset.py          1.02     14.1±0.08ms     2.9 MB/sec    1.00     13.8±0.06ms     2.9 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.03      3.5±0.02ms     4.7 MB/sec    1.00      3.4±0.02ms     4.9 MB/sec
linter/all-rules/numpy/globals.py          1.00    436.4±1.49µs     6.8 MB/sec    1.00    435.6±0.39µs     6.8 MB/sec
linter/all-rules/pydantic/types.py         1.02      6.3±0.02ms     4.1 MB/sec    1.00      6.1±0.03ms     4.2 MB/sec
linter/default-rules/large/dataset.py      1.01      7.0±0.02ms     5.8 MB/sec    1.00      6.9±0.02ms     5.9 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00   1495.1±2.75µs    11.1 MB/sec    1.00   1499.9±3.07µs    11.1 MB/sec
linter/default-rules/numpy/globals.py      1.00    170.8±3.71µs    17.3 MB/sec    1.00    170.7±0.26µs    17.3 MB/sec
linter/default-rules/pydantic/types.py     1.00      3.1±0.02ms     8.2 MB/sec    1.00      3.1±0.01ms     8.2 MB/sec

Windows

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.02      9.6±0.12ms     4.2 MB/sec    1.00      9.4±0.07ms     4.3 MB/sec
formatter/numpy/ctypeslib.py               1.00      2.0±0.02ms     8.3 MB/sec    1.00      2.0±0.02ms     8.3 MB/sec
formatter/numpy/globals.py                 1.00    218.9±2.29µs    13.5 MB/sec    1.02   222.4±14.54µs    13.3 MB/sec
formatter/pydantic/types.py                1.01      4.5±0.04ms     5.7 MB/sec    1.00      4.5±0.04ms     5.7 MB/sec
linter/all-rules/large/dataset.py          1.00     15.4±0.16ms     2.6 MB/sec    1.00     15.5±0.07ms     2.6 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.00      4.1±0.04ms     4.0 MB/sec    1.00      4.1±0.03ms     4.0 MB/sec
linter/all-rules/numpy/globals.py          1.00   429.0±10.25µs     6.9 MB/sec    1.01    433.8±7.69µs     6.8 MB/sec
linter/all-rules/pydantic/types.py         1.00      7.1±0.12ms     3.6 MB/sec    1.01      7.1±0.16ms     3.6 MB/sec
linter/default-rules/large/dataset.py      1.00      8.1±0.03ms     5.0 MB/sec    1.02      8.2±0.04ms     4.9 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00   1658.9±9.63µs    10.0 MB/sec    1.01   1676.2±8.92µs     9.9 MB/sec
linter/default-rules/numpy/globals.py      1.00    180.5±1.34µs    16.4 MB/sec    1.00    181.1±1.32µs    16.3 MB/sec
linter/default-rules/pydantic/types.py     1.00      3.6±0.01ms     7.0 MB/sec    1.01      3.7±0.02ms     7.0 MB/sec

Thomasdezeeuw · 2023-06-27T11:51:53Z

I found a couple of false positives with the Bokeh repo, looking into them now.

MichaReiser

Impressive work! I've two questions that may as well fall under future work

* Have you thought about how to support e.g. conditional expressions where we also have a conditional data flow?
* I commented on the `BasicBlock` layout. Could you expand on the reason why you chose this specific `BasicBlock` layout? Or in general, could you document how the basic blocks should be structured?

MichaReiser · 2023-06-28T12:42:38Z

crates/ruff/src/rules/ruff/rules/unreachable.rs

+    /// i = 0          # block 0
+    /// while True:    #


What's the reason for creating a connection to block 0 vs creating four blocks:

1. Everything before the while (`i = 0`)
2. While header (`while True:`)
3. While body
4. After the body

And connecting

* 1 and 2 with an unconditional jump
* 2 and 3 with a conditional jump
* 2 and 4 with a conditional jump
* 3 and 2 with an unconditional jump
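For illustration, the four-block layout and its connections can be written down as a small edge list in Python. This is only a sketch of the proposal in this comment: the `edges` tuples, the `"if-true"`/`"if-false"` labels and the `reachable` helper are invented here, and the loop body is assumed to contain no `break`.

```python
# Blocks numbered as in the list above; each edge is (source, target, kind).
edges = [
    (1, 2, "unconditional"),  # code before the loop -> while header
    (2, 3, "if-true"),        # header -> body when the condition holds
    (2, 4, "if-false"),       # header -> code after the loop otherwise
    (3, 2, "unconditional"),  # end of body -> back to the header
]


def reachable(edges, start, constant_condition=None):
    """Blocks reachable from `start`; a constant header condition prunes one edge."""
    dead_kind = {True: "if-false", False: "if-true"}.get(constant_condition)
    seen, todo = set(), [start]
    while todo:
        block = todo.pop()
        if block in seen:
            continue
        seen.add(block)
        todo.extend(dst for src, dst, kind in edges if src == block and kind != dead_kind)
    return seen


unknown_condition = reachable(edges, start=1)
while_true = reachable(edges, start=1, constant_condition=True)
```

With an unknown condition all four blocks are reached; with `while True:` block 4 (the code after the body) drops out.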

Thomasdezeeuw

> Impressive work! I've two questions that may as well fall under future work
> * Have you thought about how to support e.g. conditional expressions where we also have a conditional data flow?

You mean for example an if statement inside of the condition (test) of an if statement (Expr::IfExp)? No, I didn't have time to look at this yet.

> * I commented on the `BasicBlock` layout. Could you expand on the reason why you chose this specific `BasicBlock` layout? Or in general, could you document how the basic blocks should be structured?

I'm going to assume you mean BasicBlocks (multiple) as you left a comment on that.

Basically, BasicBlocks.blocks is a tree laid out as a vector/array. The first and last blocks in the vector are the last and first blocks in the function respectively (i.e. they are switched), because we start processing at the last statement. Why start with the last statement? Because a statement always jumps to the next statement, assuming no control flow diversion, which means the next statement's block already has to be part of the existing tree (vector) for us to reference it.

In between these two blocks, however, things get a little fuzzy... It's not ideal, but I couldn't really find a reasonable way to make it return a fixed order (in the time I had). Basically each statement can add an arbitrary number of blocks as it can have an arbitrary number of statements within it (think the body of a loop or if statement). For these "sub-statements" we use the same approach as the top-level statements, but we reuse the blocks vector (for performance reasons). This all works in a fairly straightforward way up to the point where you have recursion, e.g. for loops.

For while loops the body jumps to the while block itself, unless we see a break or return. But by default create_blocks points to the last block in blocks as the next block (unless after is set). But the after argument isn't sufficient in all cases, so we need change_next_block to fix up the control flow in some cases. (I added the after argument after I had already implemented change_next_block, but I couldn't remove change_next_block as it was still required in some cases the last time I checked.)
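To make the "start with the last statement" point concrete, here is a simplified Python sketch. It is an assumption-heavy illustration, not the `create_blocks` function from this PR: it uses one block per simple statement, a hypothetical dict layout, and no compound statements, so `after` and `change_next_block` never come into play.

```python
def build_blocks(stmts):
    """Build one block per simple statement, walking from last to first.

    Each block stores the index of its successor, which must already be in
    `blocks`; `None` marks the end of the function.
    """
    blocks = []
    for stmt in reversed(stmts):
        # By default the next block is the one pushed most recently.
        successor = len(blocks) - 1 if blocks else None
        blocks.append({"stmt": stmt, "next": successor})
    return blocks


blocks = build_blocks(["a = 1", "b = 2", "return a + b"])
entry = len(blocks) - 1  # the first statement ends up in the last block

# Walking the `next` links from the entry block recovers source order.
order = []
index = entry
while index is not None:
    order.append(blocks[index]["stmt"])
    index = blocks[index]["next"]
```

Because every successor index refers to a block that already exists, no forward references need patching in this straight-line case; loops are where that stops being true.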

Thomasdezeeuw · 2023-06-29T09:33:32Z

crates/ruff/src/rules/ruff/rules/unreachable.rs

+    /// i = 0          # block 0
+    /// while True:    #


For the purposes of detecting unreachable code, your block 1 doesn't contain any control flow, notwithstanding function calls that always raise an exception, thus block 2 is always reached when block 1 is reached. Creating one block instead of two is then simply less work for us later on. Rustc does the same thing, see https://rustc-dev-guide.rust-lang.org/appendix/background.html#what-is-a-control-flow-graph.

Other than the additional block, it's roughly what we create at the moment; see https://github.com/astral-sh/ruff/pull/5384/files#diff-a98f67bee97e7c459fad838467a76c7cbcd7c66beca1bae4826d1fa4054c4e5d (crates/ruff/src/rules/ruff/rules/snapshots/ruff__rules__ruff__rules__unreachable__tests__while.py_12.snap) for a graph of code that is quite similar.

dimaqq

Consider special-casing yield

>>> def foo():
...   if False: yield
...   return 42
... 
>>> def bar():
...   return 42
... 
...
>>> foo()
<generator object foo at 0x7ffa840a4e00>
>>> bar()
42

IIRC there's one more corner case with nonlocals/locals in the presence of a closure, but I can't type it off the top of my head.
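One possible way to special-case this, sketched in Python with the standard `ast` and `inspect` modules: treat any function that contains a `yield` in its own scope as a generator, even if that `yield` sits in dead code. The helper `contains_yield` is hypothetical and not part of this PR.

```python
import ast
import inspect


def contains_yield(func_source: str) -> bool:
    """True if the first function in `func_source` has a yield in its own body."""
    func = ast.parse(func_source).body[0]
    todo = list(func.body)
    while todo:
        node = todo.pop()
        if isinstance(node, (ast.Yield, ast.YieldFrom)):
            return True
        # Nested functions, lambdas and classes have their own scope.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda, ast.ClassDef)):
            continue
        todo.extend(ast.iter_child_nodes(node))
    return False


foo_source = "def foo():\n    if False: yield\n    return 42\n"
bar_source = "def bar():\n    return 42\n"

# Cross-check against what the interpreter itself decides.
namespace = {}
exec(foo_source + bar_source, namespace)
foo_is_generator = inspect.isgeneratorfunction(namespace["foo"])
bar_is_generator = inspect.isgeneratorfunction(namespace["bar"])
```

A rule that removed the unreachable `yield` from `foo` would silently turn a generator into a plain function, so such statements probably should not be reported the same way as other dead code.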

Thomasdezeeuw · 2023-07-02T15:32:38Z

> Consider special-casing yield
>
> >>> def foo():
> ...   if False: yield
> ...   return 42
> ...
> >>> def bar():
> ...   return 42
> ...
> ...
> >>> foo()
> <generator object foo at 0x7ffa840a4e00>
> >>> bar()
> 42
>
> IIRC there's one more corner case with nonlocals/locals in the presence of a closure, but I can't type if off the top of my head.

I've moved Stmt::Expr to be handled separately, but it's future work.

Thomasdezeeuw · 2023-07-02T15:38:16Z

@MichaReiser @charliermarsh I can't push the branch any more, but here are two more patches:

Move Stmt::Expr to a different match branch:

```patch
From d2042fbd090cd59e77363395a48dc94dd6da0db0 Mon Sep 17 00:00:00 2001
From: Thomas de Zeeuw <[email protected]>
Date: Sun, 2 Jul 2023 17:28:09 +0200
Subject: [PATCH 1/2] Move Stmt::Expr to a different match branch

This still needs to be handled.
---
 .../ruff/src/rules/ruff/rules/unreachable.rs  | 34 ++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/crates/ruff/src/rules/ruff/rules/unreachable.rs b/crates/ruff/src/rules/ruff/rules/unreachable.rs
index b8959749b..5bc085cf8 100644
--- a/crates/ruff/src/rules/ruff/rules/unreachable.rs
+++ b/crates/ruff/src/rules/ruff/rules/unreachable.rs
@@ -357,7 +357,6 @@ fn create_blocks<'stmt>(
             | Stmt::Assign(_)
             | Stmt::AugAssign(_)
             | Stmt::AnnAssign(_)
-            | Stmt::Expr(_)
             | Stmt::Break(_)
             | Stmt::Continue(_) // NOTE: the next branch gets fixed up in `change_next_block`.
             | Stmt::Pass(_) => unconditional_next_block(blocks, after),
@@ -437,6 +436,7 @@ fn create_blocks<'stmt>(
                 // TODO: currently we don't include the lines before the match
                 // statement in the block, unlike what we do for other
                 // statements.
+                after = Some(blocks.len() - 1);
                 continue;
             }
             Stmt::Raise(_) => {
@@ -461,6 +461,38 @@ fn create_blocks<'stmt>(
                     orelse,
                 }
             }
+            Stmt::Expr(stmt) => {
+                match &*stmt.value {
+                    Expr::BoolOp(_) |
+                    Expr::BinOp(_) |
+                    Expr::UnaryOp(_) |
+                    Expr::Dict(_) |
+                    Expr::Set(_) |
+                    Expr::Compare(_) |
+                    Expr::Call(_) |
+                    Expr::FormattedValue(_) |
+                    Expr::JoinedStr(_) |
+                    Expr::Constant(_) |
+                    Expr::Attribute(_) |
+                    Expr::Subscript(_) |
+                    Expr::Starred(_) |
+                    Expr::Name(_) |
+                    Expr::List(_) |
+                    Expr::Tuple(_) |
+                    Expr::Slice(_)  => unconditional_next_block(blocks, after),
+                    // TODO: handle these expressions.
+                    Expr::NamedExpr(_) |
+                    Expr::Lambda(_) |
+                    Expr::IfExp(_) |
+                    Expr::ListComp(_) |
+                    Expr::SetComp(_) |
+                    Expr::DictComp(_) |
+                    Expr::GeneratorExp(_) |
+                    Expr::Await(_) |
+                    Expr::Yield(_) |
+                    Expr::YieldFrom(_) => unconditional_next_block(blocks, after),
+                }
+            },
             // The tough branches are done, here is an easy one.
             Stmt::Return(_) => NextBlock::Terminate,
         };
-- 
2.41.0
``` 

Improve BasicBlocks docs:

```patch
From 41e7813b4d83771689905020c5cd1e0779788445 Mon Sep 17 00:00:00 2001
From: Thomas de Zeeuw <[email protected]>
Date: Sun, 2 Jul 2023 17:29:42 +0200
Subject: [PATCH 2/2] Improve BasicBlocks docs

---
 crates/ruff/src/rules/ruff/rules/unreachable.rs | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/crates/ruff/src/rules/ruff/rules/unreachable.rs b/crates/ruff/src/rules/ruff/rules/unreachable.rs
index 5bc085cf8..dcbe33186 100644
--- a/crates/ruff/src/rules/ruff/rules/unreachable.rs
+++ b/crates/ruff/src/rules/ruff/rules/unreachable.rs
@@ -194,9 +194,10 @@ struct BasicBlocks<'stmt> {
     /// # Notes
     ///
     /// The order of these block is unspecified. However it's guaranteed that
-    /// the last block is the statement in the function and the first block is
-    /// the last statement. The block are more or less in reverse order, but it
-    /// gets fussy around control flow statements (e.g. `if` statements).
+    /// the last block is the first statement in the function and the first
+    /// block is the last statement. The block are more or less in reverse
+    /// order, but it gets fussy around control flow statements (e.g. `while`
+    /// statements).
     ///
     /// For loop blocks, and similar recurring control flows, the end of the
     /// body will point to the loop block again (to create the loop). However an
-- 
2.41.0
```

In some cases, for example while constructing a while loop, the block indices don't always exist. Deal with that possibility by simply ignoring that block.

This is often the `after` variable, which wasn't correctly used everywhere. This is now fixed and a regression test based on a function found in Bokeh is added to test this.

But this commit doesn't actually fix the problem. The problem is that try statements aren't handled yet and simply continue with the next block, which in the case of a `while True` loop creates an infinite loop. Thus the rule triggers on any statements after the while loop, but this is incorrect.
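A hypothetical function with the same shape (this is not the Bokeh function used in the regression test) shows why reporting the code after the loop is wrong: the only way out of the `while True` loop is the `break` inside the `except` handler, which is invisible when `try` statements are not modelled.

```python
def sum_until_exhausted(values):
    iterator = iter(values)
    total = 0
    while True:
        try:
            total += next(iterator)
        except StopIteration:
            # The loop exits here, so the `return` below is reachable.
            break
    return total


result = sum_until_exhausted([1, 2, 3])
```

If the `try` body is treated as a plain fall-through to the next block, the loop looks infinite and `return total` is flagged, which is the false positive described above.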

MichaReiser · 2023-07-04T09:25:44Z

Current dependencies on/for this PR:

main
- PR Add unreachable code rule #5384 👈

This comment was auto-generated by Graphite.

MichaReiser · 2023-07-04T11:50:53Z

...ules/ruff/rules/snapshots/ruff__rules__ruff__rules__unreachable__tests__async-for.py.md.snap

@@ -0,0 +1,241 @@
+---
+source: crates/ruff/src/rules/ruff/rules/unreachable.rs


I changed the file extension to .md.snap. This gives us a nice preview if you associate the file extension with markdown in your editor.

MichaReiser · 2023-07-04T11:51:37Z

The only remaining question is how to gate the rule for now.

Thomasdezeeuw force-pushed the thomas/control_flow_graph branch from 936f428 to 576d03f Compare June 27, 2023 12:32

Thomasdezeeuw requested review from MichaReiser and charliermarsh June 27, 2023 13:04

MichaReiser reviewed Jun 28, 2023

Thomasdezeeuw commented Jun 29, 2023

dimaqq reviewed Jun 29, 2023

Thomas de Zeeuw and others added 8 commits July 4, 2023 08:38

Don't assume the next block exists in change_next_block

55847aa

In some cases, for example while constructing a while loop, the block indices don't always exist. Deal with that possibility by simply ignoring that block.

Make RUF014 rule example clearer

0576cfd

Properly propagate the next block

dec4241

This is often the `after` variable, which wasn't correctly used everywhere. This is now fixed and a regression test based on a function found in Bokeh is added to test this.

Fix unreachable code example

f200357

Move Stmt::Expr to a different match branch

0057c01

Improve BasicBlocks docs

0869f5a

MichaReiser force-pushed the thomas/control_flow_graph branch from 5983995 to 0869f5a Compare July 4, 2023 06:43

Introduce BasicBlocKBuilder

9ea325d

MichaReiser added 2 commits July 4, 2023 13:13

Use IndexVec

647033b

Use md.snap for snapshots

9de3e5a

MichaReiser force-pushed the thomas/control_flow_graph branch from de8fd71 to 9de3e5a Compare July 4, 2023 11:50

MichaReiser reviewed Jul 4, 2023

MichaReiser approved these changes Jul 4, 2023

Add unreachable-code feature

d04ba39

MichaReiser force-pushed the thomas/control_flow_graph branch from a0b40e7 to d04ba39 Compare July 4, 2023 14:17

MichaReiser enabled auto-merge (squash) July 4, 2023 14:22

MichaReiser merged commit 0b963dd into main Jul 4, 2023

MichaReiser deleted the thomas/control_flow_graph branch July 4, 2023 14:27

bittner mentioned this pull request Aug 4, 2023

Ruff should detect unreachable code but reports 'Missing explicit return', fixing it with return None #6334

Open

charliermarsh mentioned this pull request Jan 24, 2024

[pylint] - Add unreachable rule (W0101) #9611

Closed

augustelalande mentioned this pull request Apr 27, 2024

[ruff] Re-implement unreachable #10891

Open
