
[pyflakes] Handle some common submodule import situations for unused-import (F401)#20200

Merged
dylwil3 merged 33 commits into astral-sh:main from dylwil3:unused-imports
Sep 26, 2025

Conversation

@dylwil3
Collaborator

@dylwil3 dylwil3 commented Sep 1, 2025

Summary

This PR attempts to make progress on the age-old problem of submodule imports, specifically with regard to their treatment by the rule unused-import (F401).

Some related issues:

Prior art:

One distinguishing feature of this PR is that we have attempted to be less ambitious in the hopes that this will improve the chances of landing nonzero progress into main.

What the People Want

Users expect to see the following behavior in these common situations:

# Example 1
import a    # ok
import a.b  # F401: unused-import

a.foo()

# Example 2
import a    # F401: unused-import
import a.b  # ok

a.b.bar()

# Example 3
import a    # ok
import a.b  # ok

a.foo()
a.b.bar()

The following situations are unintuitive to users, but are valid Python and we are probably forced to proceed as indicated:

# Example 4
import a.b  # ok

a.foo()

# Example 5
import a.b  # F401: unused-import
import a.c  # ok

a.x.baz()

The goal of this PR is to modify the implementation of unused-import to match this expectation.

Modeling the Python in Users' Brains

Given the expected behavior above, how might a typical user be modeling Python in their minds? One possibility is as follows. Upon seeing the statement:

import a.b

the user thinks:

[User]: We have made available the symbol a, which is a module, but we are only allowed to access members of this module that have the form a.b.*.

This happens to be incorrect. Rather than restrict what is available to access on a, the opposite happens:

[Python]: We have executed the equivalent of the statement import a, and, in particular, can access any top-level member of a made available by this import. We then import the submodule b, making possibly even more members available.
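The point about which symbol gets bound can be checked directly. The following is a minimal sketch, not part of the PR: it uses the standard-library package `xml` as a stand-in for `a`, and the helper name `names_bound_by` is made up for illustration. It executes an import statement in an empty namespace and lists the names that end up bound.

```python
def names_bound_by(statement: str) -> list[str]:
    """Run `statement` in a fresh namespace and return the names it binds."""
    namespace: dict[str, object] = {}
    exec(statement, namespace)
    # exec() injects "__builtins__" into the namespace; it is not bound by the import.
    return sorted(name for name in namespace if name != "__builtins__")


# A submodule import binds only the top-level package name.
print(names_bound_by("import xml.dom"))              # ['xml']
# Both statements bind the same symbol, so the second shadows the first.
print(names_bound_by("import xml\nimport xml.dom"))  # ['xml']
```

In both cases the only symbol introduced is the top-level package, which is why the two import statements compete for the same binding.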

Similarly, when this user sees:

import a
import a.b

a.foo()
a.b.bar()

they think:

[User] The line a.foo() is only allowed because of the first import. So that is definitely needed. The second import may or may not be needed depending on whether a/__init__.py imports b.

when really the opposite is true:

[Python] The first import is totally redundant and is not used. The call a.foo() references the symbol a that is being loaded in on the second line.

These views are actually incompatible, so it is not possible to model the user's mind without running afoul of Examples 4 and 5 above.

The Worst of Both Worlds

To model both the behavior that the user expects and the unintuitive behavior that Python allows, we adopt an approach I'll call "Thanks I Hate It (TIHI)". It is the worst of both worlds, and it alters how we think about accessing members of a module. Again we will consider the example:

import a
import a.b

a.foo()
a.b.bar()

[TIHI]: Following Python, the import statements both make available the symbol a and all of its top-level members. The second import statement also makes available the members of the form a.b.*. However, the attribute call a.foo will access a via the first import statement and the call a.b.bar will access a.b via the second.

In general the TIHI model will resolve an attribute load of the form a.a1...an (for a module a) by:

  1. Finding the maximum prefix match amongst all available
    import statements import a.b1...bm,
  2. Among these, finding those of minimal path-length, and
  3. Adding a reference to the binding created by the nearest
    of these.
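The three steps above can be sketched in a few lines of Python. This is an illustrative toy, not Ruff's implementation (which is written in Rust and works on bindings, not strings). The function names are hypothetical, and "nearest" is assumed here to mean the latest import statement in source order, which is the reading that reproduces Example 5.

```python
def common_prefix_len(left: list[str], right: list[str]) -> int:
    """Number of leading segments shared by two dotted paths."""
    count = 0
    for x, y in zip(left, right):
        if x != y:
            break
        count += 1
    return count


def resolve(access: str, imports: list[str]):
    """Return the index of the import that TIHI credits for `access`, or None."""
    target = access.split(".")
    # Only imports that bind the same top-level symbol are candidates.
    candidates = [
        (index, name.split("."))
        for index, name in enumerate(imports)
        if name.split(".")[0] == target[0]
    ]
    if not candidates:
        return None
    # Step 1: keep the imports with the longest prefix match.
    best = max(common_prefix_len(segs, target) for _, segs in candidates)
    candidates = [c for c in candidates if common_prefix_len(c[1], target) == best]
    # Step 2: among these, keep the imports with the shortest path.
    shortest = min(len(segs) for _, segs in candidates)
    candidates = [c for c in candidates if len(c[1]) == shortest]
    # Step 3 (assumption): "nearest" is taken to be the last one in source order.
    return candidates[-1][0]


def unused_imports(imports: list[str], accesses: list[str]) -> list[str]:
    """Imports that no attribute access is credited to under TIHI."""
    used = {resolve(access, imports) for access in accesses}
    return [name for index, name in enumerate(imports) if index not in used]


print(unused_imports(["a", "a.b"], ["a.foo"]))      # Example 1: ['a.b']
print(unused_imports(["a", "a.b"], ["a.b.bar"]))    # Example 2: ['a']
print(unused_imports(["a.b", "a.c"], ["a.x.baz"]))  # Example 5: ['a.b']
```

Under these assumptions the sketch agrees with all five examples listed earlier, including the two that users find unintuitive.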

What we actually do

Rather than force Ruff's semantic model to be incorrect by actually implementing TIHI, we hack TIHI into the logic of unused-import itself. Doing this is somewhat awkward due to the nature of the implementation of unused-import.

Let me briefly review the current implementation and then explain what we do differently.

Current implementation

After having the semantic model visit the entire module, we look at the current live import bindings that have no references. There are various other filters and logic then applied, but our essential starting bucket for this rule consists of that: bindings to import statements that are live at the end of the file and have no references.

In particular, if the module consists solely of:

import a
import a.b

then our starting "bucket" where we might emit a lint does not include import a since the symbol a is shadowed by the statement import a.b.
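As a rough illustration of why `import a` never reaches the starting bucket, here is a toy model in Python. The `Binding` class and `stable_bucket` function are hypothetical names invented for this sketch; they only mimic the idea of "live bindings with no references" and say nothing about how Ruff's semantic model is actually structured.

```python
from dataclasses import dataclass


@dataclass
class Binding:
    statement: str       # source text of the import, e.g. "import a.b"
    symbol: str          # name bound in the scope, e.g. "a"
    references: int = 0  # number of loads resolved to this binding


def stable_bucket(bindings: list[Binding]) -> list[str]:
    """Imports that are live at the end of the scope and have no references."""
    live: dict[str, Binding] = {}
    for binding in bindings:
        # A later binding of the same symbol shadows the earlier one.
        live[binding.symbol] = binding
    return [b.statement for b in live.values() if b.references == 0]


module = [Binding("import a", "a"), Binding("import a.b", "a")]
print(stable_bucket(module))  # ['import a.b']
```

Only the last binding of `a` survives in this model, so `import a` is never a candidate for a diagnostic.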

This PR

From now on we will refer to the current behavior as the "stable" behavior. In this PR we introduce preview behavior under very specific assumptions designed to limit the scope of this PR to handle the most common cases where users get tripped up. If any of these assumptions are not met then we revert to the stable behavior.

To begin, instead of only considering import bindings that are live at the end of the module, we also consider all bindings shadowed by these. We assume that all of these bindings are also import bindings, specifically simple imports and submodule imports without aliases (not from imports); if not, we revert to the stable behavior. Next, we have to decide which of these statements have been "used" in the sense of the TIHI model above. So we iterate through the references to the last binding and decide which import statement is intuitively being referenced, according to the matching logic described earlier.

As a technical note, a call like a.b.foo() creates a reference only to the symbol a, so we do not see the full attribute access. We must therefore crawl up the AST and collect the qualified name of the attribute access. We assume that all references arise from a Name node or from appearance in a binding to __all__; if we find some other kind of reference, we revert to the stable behavior.
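To make the "collect the qualified name" step concrete, here is an analogous sketch using Python's own `ast` module. It is not how Ruff does it: Ruff starts from the Name node and walks upward, while this sketch starts from the outermost Attribute node and walks down to the Name. The helper names `dotted_name` and `attribute_accesses` are made up for illustration.

```python
import ast


def dotted_name(node: ast.AST):
    """Return 'a.b.foo' for a chain of attribute loads rooted at a Name, else None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None  # rooted at a call, subscript, etc.


def attribute_accesses(source: str) -> list[str]:
    """Collect the full dotted path of every outermost name or attribute chain."""
    tree = ast.parse(source)
    # Nodes that are the `.value` of an Attribute are inner links of a longer chain.
    inner = {id(node.value) for node in ast.walk(tree) if isinstance(node, ast.Attribute)}
    found = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.Attribute, ast.Name)) and id(node) not in inner:
            name = dotted_name(node)
            if name is not None:
                found.append(name)
    return found


print(attribute_accesses("import a.b\na.b.foo()\na.bar()\n"))
```

The Name node for `a` is the same in both calls; only by looking at the enclosing Attribute nodes can `a.b.foo` be told apart from `a.bar`.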

Having marked each import statement as used or unused, we collect the unused ones and then proceed as in the stable behavior.

Questions

What if the submodule import has a side-effect and so is used? For example, suppose a/b.py contains import a.x but a/__init__.py does not. Then a.x.baz really does need import a.b.

We already don't model side-effects, even for simple module imports like import a. So the user would be required to suppress the lint for such an anti-pattern.
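The side-effect scenario from the question can be reproduced with a throwaway package. This is a sketch for illustration only: the package name `tihi_demo_pkg` and the helper functions are invented here, and the package is written to a temporary directory that is removed afterwards.

```python
import sys
import tempfile
from pathlib import Path

PACKAGE = "tihi_demo_pkg"  # hypothetical package name standing in for `a`


def build_demo_package(root: Path) -> None:
    """Create a package whose submodule `b` imports sibling `x` as a side effect."""
    pkg = root / PACKAGE
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")            # does NOT import x
    (pkg / "b.py").write_text(f"import {PACKAGE}.x\n")
    (pkg / "x.py").write_text("def baz():\n    return 'baz'\n")


def run_demo() -> str:
    """Import only `<pkg>.b`, then call `<pkg>.x.baz()`."""
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(Path(tmp))
        sys.path.insert(0, tmp)
        try:
            namespace: dict[str, object] = {}
            exec(f"import {PACKAGE}.b\nresult = {PACKAGE}.x.baz()", namespace)
            return str(namespace["result"])
        finally:
            # Undo the changes to interpreter state made by the demo.
            sys.path.remove(tmp)
            for name in [m for m in sys.modules if m.startswith(PACKAGE)]:
                del sys.modules[name]


print(run_demo())  # baz
```

The call succeeds only because importing `b` happens to import `x`. The lint would have to be suppressed for code that relies on this.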

Shouldn't you do ____ to avoid allocating and performance regressions?

Maybe! The benchmarks did not show a regression, but if we want to pre-emptively do that I'm happy to. Some places where it may make sense are:

  • We currently allocate a vector of size 1 in a common situation where we "bail" to the stable behavior when iterating over bindings. This is to make the opaque type of various iterators the same. But we could avoid that.
  • We could use something like SmallVec when collecting candidate unused import bindings shadowing a given one, since there will often not be very many.
  • We could use a bitmask to mark used bindings (for a given symbol) as long as we are only dealing with < 64 of them and fall back to Vec<bool> otherwise.
  • ... probably other things I'm not thinking of!

@dylwil3 dylwil3 added the "rule" (Implementing or modifying a lint rule) and "preview" (Related to preview mode features) labels Sep 1, 2025
@github-actions
Contributor

github-actions bot commented Sep 1, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+245 -0 violations, +0 -0 fixes in 30 projects; 25 projects unchanged)

DisnakeDev/disnake (+2 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview

+ disnake/ext/commands/bot_base.py:5:8: F401 [*] `collections` imported but unused
+ disnake/ext/commands/cog.py:21:8: F401 [*] `disnake` imported but unused

RasaHQ/rasa (+77 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview

+ rasa/cli/data.py:21:8: F401 [*] `rasa.shared.utils.cli` imported but unused
+ rasa/cli/data.py:22:8: F401 [*] `rasa.utils.common` imported but unused
+ rasa/cli/data.py:6:8: F401 [*] `rasa.shared.core.domain` imported but unused
+ rasa/cli/export.py:11:8: F401 [*] `rasa.utils.common` imported but unused
+ rasa/cli/interactive.py:22:8: F401 [*] `rasa.utils.common` imported but unused
+ rasa/cli/telemetry.py:7:8: F401 [*] `rasa.cli.utils` imported but unused
+ rasa/cli/test.py:28:8: F401 [*] `rasa.utils.common` imported but unused
+ rasa/cli/test.py:8:8: F401 [*] `rasa.shared.data` imported but unused
+ rasa/cli/train.py:11:8: F401 [*] `rasa.utils.common` imported but unused
+ rasa/core/actions/action.py:84:8: F401 [*] `rasa.shared.utils.io` imported but unused
+ rasa/core/brokers/broker.py:9:8: F401 [*] `rasa.shared.utils.io` imported but unused
+ rasa/core/channels/rasa_chat.py:8:8: F401 [*] `jwt.exceptions` imported but unused
+ rasa/core/evaluation/marker_base.py:24:8: F401 [*] `rasa.shared.nlu.constants` imported but unused
+ rasa/core/evaluation/marker_base.py:25:8: F401 [*] `rasa.shared.utils.validation` imported but unused
+ rasa/core/evaluation/marker_base.py:27:8: F401 [*] `rasa.shared.utils.common` imported but unused
... 62 additional changes omitted for project

Snowflake-Labs/snowcli (+1 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview

+ tests/test_utils.py:22:8: F401 [*] `snowflake.cli._plugins.snowpark.models` imported but unused

apache/airflow (+23 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview --select ALL

+ airflow-core/src/airflow/plugins_manager.py:22:8: F401 [*] `importlib` imported but unused
+ airflow-core/tests/unit/utils/test_log_handlers.py:24:8: F401 [*] `logging.config` imported but unused
+ devel-common/src/sphinx_exts/docs_build/fetch_inventories.py:19:8: F401 [*] `concurrent` imported but unused
+ devel-common/src/tests_common/pytest_plugin.py:2147:16: F401 `airflow.sdk._shared.logging` imported but unused; consider using `importlib.util.find_spec` to test for availability
+ devel-common/src/tests_common/pytest_plugin.py:2150:20: F401 `airflow.sdk._shared.logging` imported but unused; consider using `importlib.util.find_spec` to test for availability
+ providers/amazon/src/airflow/providers/amazon/aws/hooks/batch_waiters.py:36:8: F401 [*] `botocore.client` imported but unused
+ providers/celery/src/airflow/providers/celery/executors/celery_executor_utils.py:131:16: F401 [*] `airflow.jobs.local_task_job_runner` imported but unused
+ providers/celery/src/airflow/providers/celery/executors/celery_executor_utils.py:132:16: F401 [*] `airflow.macros` imported but unused
+ providers/celery/src/airflow/providers/celery/executors/celery_executor_utils.py:135:16: F401 `airflow.providers.standard.operators.bash` imported but unused; consider using `importlib.util.find_spec` to test for availability
+ providers/celery/src/airflow/providers/celery/executors/celery_executor_utils.py:136:16: F401 `airflow.providers.standard.operators.python` imported but unused; consider using `importlib.util.find_spec` to test for availability
... 13 additional changes omitted for project

apache/superset (+2 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview --select ALL

+ superset/utils/logging_configurator.py:21:8: F401 [*] `flask.app` imported but unused
+ tests/integration_tests/cli_tests.py:29:8: F401 [*] `superset.cli.importexport` imported but unused

aws/aws-sam-cli (+1 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview

+ samcli/lib/package/s3_uploader.py:25:8: F401 [*] `botocore` imported but unused

binary-husky/gpt_academic (+3 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview

+ crazy_functions/Multi_Agent_Legacy.py:57:16: F401 [*] `autogen` imported but unused
+ request_llms/bridge_chatglm.py:22:16: F401 [*] `os` imported but unused
+ request_llms/bridge_chatglm3.py:22:16: F401 [*] `os` imported but unused

bokeh/bokeh (+4 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview --select ALL

+ release/credentials.py:18:8: F401 [*] `boto` imported but unused
+ release/credentials.py:20:8: F401 [*] `boto.s3.key` imported but unused
+ tests/support/defaults.py:87:12: F401 [*] `bokeh.models` imported but unused
+ tests/test_cross.py:39:8: F401 [*] `_pytest.mark` imported but unused

ibis-project/ibis (+3 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview

+ ibis/backends/conftest.py:5:8: F401 `importlib.metadata` imported but unused
+ ibis/backends/tests/test_dot_sql.py:11:8: F401 `ibis.backends.sql.dialects` imported but unused
+ ibis/expr/types/_rich.py:12:8: F401 `rich` imported but unused

langchain-ai/langchain (+1 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview

+ libs/core/tests/unit_tests/tracers/test_langchain.py:3:8: F401 [*] `unittest` imported but unused

latchbio/latch (+3 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview

+ src/latch_cli/centromere/ctx.py:12:8: F401 [*] `paramiko.util` imported but unused
+ src/latch_cli/snakemake/single_task_snakemake.py:12:8: F401 [*] `snakemake.workflow` imported but unused
+ src/latch_cli/snakemake/workflow.py:27:8: F401 [*] `snakemake` imported but unused

lnbits/lnbits (+1 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview

+ lnbits/settings.py:3:8: F401 [*] `importlib` imported but unused

mlflow/mlflow (+52 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview

+ dev/update_ml_package_versions.py:16:8: F401 [*] `urllib.error` imported but unused
+ examples/flower_classifier/score_images_spark.py:19:8: F401 [*] `mlflow` imported but unused
+ examples/hyperparam/search_random.py:19:8: F401 [*] `mlflow.sklearn` imported but unused
+ examples/hyperparam/search_random.py:20:8: F401 [*] `mlflow.tracking` imported but unused
+ mlflow/anthropic/autolog.py:4:8: F401 [*] `mlflow` imported but unused
+ mlflow/langchain/utils/logging.py:282:12: F401 [*] `langchain.agents.agent` imported but unused
+ mlflow/langchain/utils/logging.py:283:12: F401 [*] `langchain.chains.base` imported but unused
+ mlflow/langchain/utils/logging.py:285:12: F401 [*] `langchain.llms.huggingface_hub` imported but unused
+ mlflow/langchain/utils/logging.py:286:12: F401 [*] `langchain.llms.openai` imported but unused
+ mlflow/langchain/utils/logging.py:77:12: F401 [*] `langchain.agents` imported but unused
... 42 additional changes omitted for project

pandas-dev/pandas (+3 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview

+ pandas/_libs/__init__.py:16:8: F401 [*] `pandas._libs.pandas_parser` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
+ pandas/tests/indexes/datetimes/methods/test_to_pydatetime.py:7:8: F401 [*] `dateutil.tz` imported but unused
+ pandas/tests/indexes/datetimes/test_constructors.py:12:8: F401 [*] `dateutil` imported but unused

prefecthq/prefect (+20 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview

+ integration-tests/test_client_context_lifespan.py:10:8: F401 [*] `prefect` imported but unused
+ integration-tests/test_client_context_lifespan.py:12:8: F401 [*] `prefect.exceptions` imported but unused
+ src/integrations/prefect-aws/prefect_aws/workers/ecs_worker.py:71:8: F401 [*] `anyio` imported but unused
+ src/integrations/prefect-dask/prefect_dask/task_runners.py:91:8: F401 [*] `distributed.deploy` imported but unused
+ src/integrations/prefect-gcp/prefect_gcp/workers/cloud_run.py:165:8: F401 [*] `anyio` imported but unused
+ src/integrations/prefect-kubernetes/prefect_kubernetes/worker.py:121:8: F401 [*] `anyio` imported but unused
+ src/integrations/prefect-ray/tests/test_task_runners.py:10:8: F401 [*] `ray` imported but unused
+ src/integrations/prefect-ray/tests/test_task_runners.py:16:8: F401 [*] `prefect.task_engine` imported but unused
+ src/prefect/cli/profile.py:18:8: F401 [*] `prefect.settings.profiles` imported but unused
+ src/prefect/client/schemas/schedules.py:9:8: F401 [*] `dateutil` imported but unused
... 10 additional changes omitted for project

pypa/cibuildwheel (+1 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview

+ cibuildwheel/__main__.py:17:8: F401 [*] `cibuildwheel.util` imported but unused

pypa/pip (+4 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview

+ src/pip/_internal/cli/base_command.py:6:8: F401 [*] `logging.config` imported but unused
+ src/pip/_internal/vcs/__init__.py:5:8: F401 `pip._internal.vcs.bazaar` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
+ src/pip/_internal/vcs/__init__.py:6:8: F401 `pip._internal.vcs.git` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
+ src/pip/_internal/vcs/__init__.py:7:8: F401 `pip._internal.vcs.mercurial` imported but unused; consider removing, adding to `__all__`, or using a redundant alias

... Truncated remaining completed project reports due to GitHub comment length restrictions

Changes by rule (1 rules affected)

| code | total | + violation | - violation | + fix | - fix |
|------|-------|-------------|-------------|-------|-------|
| F401 | 245   | 245         | 0           | 0     | 0     |

@dylwil3 dylwil3 force-pushed the unused-imports branch 3 times, most recently from f955acb to 42cb8c8 Compare September 12, 2025 19:48
Collaborator Author

Note that we have restricted our analyses to the present scope, so this snapshot is possibly surprising. The main reason to restrict to one scope at a time is that the unused import rule already acts on the scope level. So the current behavior is already to emit two diagnostics even for something like:

import a

def foo():
    import a

@dylwil3 dylwil3 marked this pull request as ready for review September 12, 2025 20:33
@MichaReiser MichaReiser added the "great writeup" (A wonderful example of a quality contribution) label Sep 18, 2025
Member

@MichaReiser MichaReiser left a comment

This is a massive improvement with very low overhead. Well done.

I've a couple of nit comments but I also found a correctness bug that we need to look into.

I haven't reviewed all tests yet.

@dylwil3 dylwil3 requested a review from MichaReiser September 25, 2025 20:57
Member

@MichaReiser MichaReiser left a comment

This is great and now with a 0 perf regression. Well done!

Member

There's a hidden unwrap call when you access segments[0]. There's an implicit assumption that segments is never empty.

binding
    .as_any_import()
    .expect("binding to be import binding since current function called after restricting to these in `unused_imports_in_scope`")
    .qualified_name()
    .segments()[0];

I haven't checked all code paths but I think the assumption is true for all QualifiedName. If so, then having a method on QualifiedName that returns the first segment (and documents why doing so without checking the length of segments is okay) seems generally useful (e.g. first_segment, root_name, what not :))

@dylwil3 dylwil3 merged commit 57e1ff8 into astral-sh:main Sep 26, 2025
36 checks passed
@dylwil3 dylwil3 deleted the unused-imports branch September 26, 2025 13:22

Labels

great writeup A wonderful example of a quality contribution preview Related to preview mode features rule Implementing or modifying a lint rule

3 participants
