
Meta issue: plugin system #283

Open
sobolevn opened this issue Sep 29, 2022 · 67 comments
Labels
core Related to core functionality

Comments

@sobolevn
Contributor

I think the main reason Flake8 is so popular is its plugin system.
You can find plugins for every possible type of problem and tool.

Right now docs state:

Beyond rule-set parity, ruff suffers from the following limitations vis-à-vis Flake8:

1. Flake8 has a plugin architecture and supports writing custom lint rules.

I propose designing and implementing a plugin API.
This way ruff can compete with flake8 in terms of adoption and usability.

Plugin API

I think there are some flake8 problems that should be fixed, and also some unique challenges that should be addressed.

  1. Explicit "opt-in" for plugins. Right now flake8 has a problem: when you install a tool that ships a flake8 plugin definition, that plugin is automatically enabled because of how setuptools entry points work. I think all rules must be enabled explicitly, so ESLint's explicit plugins: list looks like a better approach.
  2. A dedicated "fix" API and tooling, so that plugin authors can easily solve many typical problems.
  3. Plugin order. Since plugins will change the source code, we must be very strict about the order they run in. Do they run in parallel while checking files?
  4. Plugin configuration: the current approach of configuring everything in the [flake8] section can cause conflicts between plugins. ESLint's approach is probably better.
  5. Packaging: how do we build and package Rust extensions? How do we build wheels?
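A minimal Python sketch of points 1, 3 and 4, assuming a hypothetical registry in which installing a package only makes a plugin available, the config lists enabled plugins in the order they run, and each plugin receives only its own options table. The `PluginRegistry` class and the `plugins` / `plugin-options` keys are invented for illustration; neither flake8 nor ruff exposes this API.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical plugin signature: (source text, this plugin's options) -> [(line, message)]
Check = Callable[[str, dict], list[tuple[int, str]]]


@dataclass
class PluginRegistry:
    available: dict[str, Check] = field(default_factory=dict)

    def register(self, name: str, check: Check) -> None:
        # Installing a package only makes a plugin available; it never runs
        # unless the configuration enables it by name (explicit opt-in).
        self.available[name] = check

    def run(self, source: str, config: dict) -> list[tuple[str, int, str]]:
        results = []
        # Plugins run in the order listed under "plugins", so ordering is
        # deterministic and controlled by the user.
        for name in config.get("plugins", []):
            if name not in self.available:
                raise KeyError(f"plugin {name!r} is enabled but not installed")
            # Each plugin sees only its own options table, so option names
            # cannot collide between plugins.
            options = config.get("plugin-options", {}).get(name, {})
            for line, message in self.available[name](source, options):
                results.append((name, line, message))
        return results
```

Keeping enablement, ordering and option namespaces in one place means a newly installed package can never silently change lint results.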

Please, share your ideas and concerns.

@charliermarsh
Member

Strongly agree. Will write up some thoughts on this later!

@charliermarsh
Member

The first big decision is: should plugins be written in Rust? Or in Python? I believe that either could be possible (though I haven't scoped out the work at all), e.g., using pyo3. It may even be easier to support plugins in Python given that loading code dynamically is much easier in a scripted language...

However, I'm partial to requiring plugins be written in Rust. It will lead to a more cohesive codebase, allow us to maintain a focus on performance, and avoid requiring extensive cross-language FFI. I'm open to being convinced otherwise here though.

Here are a few relevant resources on implementing a plugin system for Rust:

One of the main challenges seems to be the lack of ABI stability in Rust. Many of the above write-ups discuss how both the plugins and the calling library need to be built with the same version of Rust in order to be compatible, which feels like a tall order. (From that perspective, one thing that's interesting to me is: could we compile plugins to WASM?)

@sobolevn
Contributor Author

sobolevn commented Sep 29, 2022

I think that instead of Python-based plugins, it is better to provide some kind of query language that makes simple plugins very easy to write, like https://github.com/hchasestevens/astpath
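To make the astpath idea concrete, here is a rough stdlib-only sketch that converts a Python AST into an ElementTree document and queries it with ElementTree's limited XPath subset. astpath itself uses lxml and full XPath; `ast_to_xml` and `query` are hypothetical helpers, and the XML layout (node types as tags, AST-valued fields as child elements, scalar fields as attributes) is an assumption.

```python
import ast
import xml.etree.ElementTree as ET


def ast_to_xml(node: ast.AST) -> ET.Element:
    """Node types become tags, AST fields become child elements, scalars become attributes."""
    elem = ET.Element(type(node).__name__)
    if hasattr(node, "lineno"):
        elem.set("lineno", str(node.lineno))
    for field, value in ast.iter_fields(node):
        if isinstance(value, ast.AST):
            ET.SubElement(elem, field).append(ast_to_xml(value))
        elif isinstance(value, list):
            container = ET.SubElement(elem, field)
            for item in value:
                if isinstance(item, ast.AST):
                    container.append(ast_to_xml(item))
        elif value is not None:
            # Identifiers, constants and flags are stored as string attributes.
            elem.set(field, str(value))
    return elem


def query(source: str, path: str) -> list[int]:
    """Line numbers of the elements matched by an ElementTree XPath expression."""
    root = ast_to_xml(ast.parse(source))
    return [int(e.get("lineno")) for e in root.iterfind(path) if e.get("lineno")]
```

For example, `query(source, ".//Call/func/Name[@id='print']")` returns the lines of every `print(...)` call, so a simple rule becomes a one-line query.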

In my opinion, any complex stuff should be in Rust. This way it can reuse existing APIs and be fast. But, I don't know how many Python developers actually know Rust 🤔

I think another way of approaching this is to ask existing flake8 plugin authors about their preferred way of writing plugins. Their feedback would be very valuable!

@charliermarsh
Member

Interesting, Fixit / LibCST has something kind of like that too. It's not quite a distinct query language, but it's effectively a DSL (in Python) to pattern-match against AST patterns.

@thejcannon
Contributor

thejcannon commented Oct 25, 2022

My 2c: flake8's plugin support is pretty rudimentary (it operates on tokens/lines plus a handful of metadata). Therefore, if you supported Python plugins, you could likely design the API so that supporting flake8 plugins out of the box (-ish) would be feasible.

Then you could have most flake8 plugins available through ruff, and plugins wouldn't need to be ported to Rust (only if you want to, because yummy yummy perf).
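For reference, a flake8 AST plugin is a class with `name` and `version` attributes whose constructor accepts the parsed `tree` and whose `run()` yields `(line, column, message, type)` tuples. The sketch shows such a plugin plus a hypothetical host shim, `run_flake8_style_plugin`, that (like flake8) inspects the constructor's parameter names to decide what to pass. The shim only offers `tree`, `lines` and `filename`, a simplification of what flake8 provides, and the `NA100` code is made up.

```python
import ast
import inspect
from typing import Iterator


class NoAssertPlugin:
    """A flake8-style AST plugin that flags `assert` statements."""

    name = "flake8-no-assert"  # hypothetical plugin
    version = "0.1.0"

    def __init__(self, tree: ast.AST) -> None:
        self.tree = tree

    def run(self) -> Iterator[tuple[int, int, str, type]]:
        for node in ast.walk(self.tree):
            if isinstance(node, ast.Assert):
                # flake8 expects (line, column, "CODE message", plugin class).
                yield node.lineno, node.col_offset, "NA100 assert statement used", type(self)


def run_flake8_style_plugin(
    plugin_cls, source: str, filename: str = "<string>"
) -> list[tuple[int, int, str]]:
    """Hypothetical host shim: pass only the arguments the plugin's __init__ names."""
    available = {
        "tree": ast.parse(source, filename),
        "lines": source.splitlines(keepends=True),
        "filename": filename,
    }
    wanted = inspect.signature(plugin_cls.__init__).parameters
    kwargs = {key: value for key, value in available.items() if key in wanted}
    return [(line, col, message) for line, col, message, _ in plugin_cls(**kwargs).run()]
```

A ruff-side compatibility layer would need to build these same arguments from its own parse, which is where the cross-language cost would show up.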

(lightly related to #414)

@charliermarsh
Member

Another idea: we could build a plug-in system atop https://github.com/ast-grep/ast-grep. This would allow users to express lint rules in YAML or via a simple DSL.

@charliermarsh
Member

(That tool is itself built atop tree-sitter.)

@ljodal

ljodal commented Nov 8, 2022

I’m coming to this from a position of having written a flake8 plugin for a very specific need at work, and as part of a larger project. This is not something generic, so it’d never make sense as a built-in feature in ruff. I’d love a way to write plugins to ruff in Python, mostly because it’s convenient as someone familiar with Python and not so much rust, but also because it would be nice to keep a project in pure Python even while interacting with rust.

The specific plugin in my case, oida, was first written as a standalone tool before I discovered how easy it was to add it as a plugin to flake8. It also uses LibCST, for its ability to round-trip code, which we rely on for codemodding. If it were possible to expose a similar AST-based Python interface for plugins, that would be awesome.
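A quick stdlib illustration (standing in for LibCST, which is not used here) of why round-tripping matters for codemods: Python's `ast` has no comment nodes, so `ast.unparse` drops comments, whereas the token stream keeps them. LibCST goes further and preserves all whitespace and formatting in a concrete syntax tree.

```python
import ast
import io
import tokenize

SOURCE = "x = 1  # keep me\n"

# The stdlib AST has no comment nodes, so unparsing loses the comment.
from_ast = ast.unparse(ast.parse(SOURCE))

# Tokens carry comments and positions, so untokenizing full tokens keeps them.
tokens = list(tokenize.generate_tokens(io.StringIO(SOURCE).readline))
from_tokens = tokenize.untokenize(tokens)

print(repr(from_ast))
print(repr(from_tokens))
```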

I also have use cases where I’d like to do auto fixing, which it would also be nice to support. The first thing I’d like to do is normalize import statements (relative vs absolute). In order to do that I’d need an interface where I get import statements or ast nodes and the path to the file so I can locate it in relation to other files on the system.
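The path-dependent part of such a rule is mapping a relative `from ... import` to the absolute module it names. A minimal sketch, assuming files live under a single source root and a file's dotted module name is its path relative to that root; `module_name` and `absolutize` are hypothetical helpers, and the level arithmetic mirrors how `importlib` resolves relative names.

```python
import ast
from pathlib import PurePosixPath


def module_name(path: str, root: str) -> str:
    """Dotted module name of a .py file, relative to an assumed source root."""
    parts = list(PurePosixPath(path).relative_to(root).with_suffix("").parts)
    if parts[-1] == "__init__":
        parts.pop()
    return ".".join(parts)


def absolutize(node: ast.ImportFrom, current_module: str, is_package: bool) -> str:
    """Absolute module named by an ImportFrom, as seen from current_module."""
    if node.level == 0:
        return node.module or ""
    # The anchor is the containing package; for an __init__.py it is the package itself.
    package = current_module.split(".")
    if not is_package:
        package = package[:-1]
    # Each extra leading dot climbs one package level.
    if node.level > len(package):
        raise ValueError("relative import beyond top-level package")
    base = package[: len(package) - (node.level - 1)]
    if node.module:
        base += node.module.split(".")
    return ".".join(base)
```

A fixer could then rewrite `from .models import A` in `project/app/views.py` to `from project.app.models import A`, or the reverse, depending on the chosen policy.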

I understand that writing plugins in Python would be a slowdown compared to writing them in Rust, but I think that trade-off would be very acceptable in many cases.

@charliermarsh
Member

Very helpful and all makes sense.

Maybe just as another data point for the thread: when I was at Spring Discovery, we wrote a few Flake8 plugins to enforce highly codebase-specific rules.

For example:

  • "Always late-import TensorFlow" (i.e., import it within a function that depends on it, rather than at module top-level)
  • "If you ever import module X, make sure the file also imported module Y"
  • "Imports to module Z should always use import from structure"
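Each of those three rules is a short AST walk. A sketch under stated assumptions: `module_x`, `module_y` and `module_z` are placeholder names for X, Y and Z, "late-imported" means "not imported by a module-level statement", and each problem is reported at the offending import's line.

```python
import ast

# Placeholder names for "module X", "module Y" and "module Z" in the examples.
MODULE_X, MODULE_Y, MODULE_Z = "module_x", "module_y", "module_z"


def _matches(name: str, module: str) -> bool:
    return name == module or name.startswith(module + ".")


def check_imports(source: str) -> list[tuple[int, str]]:
    tree = ast.parse(source)
    problems: list[tuple[int, str]] = []
    first_import_line: dict[str, int] = {}

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
            # Rule 3: `import module_z` should be written as `from module_z import ...`.
            if any(_matches(n, MODULE_Z) for n in names):
                problems.append((node.lineno, f"use 'from {MODULE_Z} import ...'"))
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        for n in names:
            first_import_line[n] = min(first_import_line.get(n, node.lineno), node.lineno)

    # Rule 1: TensorFlow must never be imported by a module-level statement
    # (imports nested in module-level if/try blocks are not checked here).
    for node in tree.body:
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        if any(_matches(n, "tensorflow") for n in names):
            problems.append((node.lineno, "late-import tensorflow inside a function"))

    # Rule 2: a file that imports module_x must also import module_y.
    if MODULE_X in first_import_line and MODULE_Y not in first_import_line:
        problems.append((first_import_line[MODULE_X], f"{MODULE_X} imported without {MODULE_Y}"))

    return sorted(problems)
```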

@charliermarsh
Member

So in that light, I think there are different categories of plugins:

  • Plugins that are custom to a codebase
  • Plugins that may apply to many codebases, but don't make sense to include directly in Ruff (e.g., Django-specific stuff could qualify here)

@charliermarsh
Member

I think most of those "custom" plugins / checks could be built atop something like ast-grep, but more complex checks (like rewriting absolute and relative imports) would be limited by that approach.

@charliermarsh
Member

The first thing I’d like to do is normalize import statements (relative vs absolute). In order to do that I’d need an interface where I get import statements or ast nodes and the path to the file so I can locate it in relation to other files on the system.

(Separately: this could arguably make sense to include in Ruff directly.)

@ljodal

ljodal commented Nov 8, 2022

I think most of those "custom" plugins / checks could be built atop something like ast-grep, but more complex checks (like rewriting absolute and relative imports) would be limited by that approach.

Yeah, I wouldn’t really be able to implement any of Oida using ast-grep, as all the rules depend on the context of the project. I use in-process caching to keep that state ready between files in the current flake8 plugin btw, forgot to mention that above, so the flake8 interface isn’t ideal for that kind of plugin.

The first thing I’d like to do is normalize import statements (relative vs absolute). In order to do that I’d need an interface where I get import statements or ast nodes and the path to the file so I can locate it in relation to other files on the system.

(Separately: this could arguably make sense to include in Ruff directly.)

I guess some rules could be, again not for my specific case. What we’re considering at work is to enforce relative imports within a Django app and use absolute imports for everything else. Our structure will be project.component.app or project.app, so it would be very specific to our use case how that rule should be applied. I’ve already played with implementing it in isort, but I found that code base hard to navigate and would love a clean ast/cst based plugin interface where I could add this logic :)
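Deciding which style an import should use depends on knowing where the app boundaries are. A sketch, assuming the set of app packages is known up front (for example mirrored from `INSTALLED_APPS`; the `APPS` values are hypothetical) and that both the importing module and the import target have already been resolved to absolute dotted names.

```python
from typing import Optional

# Hypothetical app packages, e.g. mirrored from Django's INSTALLED_APPS.
APPS = {"project.billing", "project.shop.orders"}


def app_of(module: str) -> Optional[str]:
    """The app package a dotted module belongs to, or None if it is outside any app."""
    matches = [app for app in APPS if module == app or module.startswith(app + ".")]
    # Prefer the longest (most specific) match in case app packages nest.
    return max(matches, key=len) if matches else None


def expected_style(current_module: str, target_module: str) -> str:
    """'relative' for imports within the same app, 'absolute' for everything else."""
    app = app_of(current_module)
    if app is not None and app == app_of(target_module):
        return "relative"
    return "absolute"
```

A checker would then compare this against whether the `ImportFrom` node's `level` is greater than zero and report a mismatch.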

@peterjc

peterjc commented Nov 14, 2022

You asked for feedback from other flake8 plugin authors, so:

  • https://github.com/peterjc/flake8-black (620k downloads/month on PyPI): not needed if you can run black directly alongside flake8, for example via the tool pre-commit or otherwise. Currently it reloads each Python file from disk (there is scope to refactor it to let black use its cache); it would not be possible to use the AST from flake8 directly. It does not make sense to plug into ruff.

  • https://github.com/peterjc/flake8-rst-docstrings/ (238k downloads/month on PyPI), uses the AST to extract docstrings, which are passed as strings to the Python library docutils to be validated as RST. My code is essentially a wrapper, and since docutils is written in Python that would have to be used internally if this plugin were to be ported to ruff.

  • https://github.com/peterjc/flake8-sfs (15k downloads/month on PyPI), uses the AST directly looking for particular kinds of node. Probably could be done in either Python or Rust, although unlikely to be popular enough to deserve including in ruff itself.
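The AST-facing parts of the last two plugins are small. This sketch covers only that part: `iter_docstrings` collects module, class and function docstrings the way a flake8-rst-docstrings-style check might before handing each string to docutils (the docutils step is omitted), and `string_format_styles` classifies `%` formatting, `str.format` calls on literals, and f-strings in the spirit of flake8-sfs. The style labels are illustrative, not flake8-sfs's actual error codes.

```python
import ast


def iter_docstrings(source: str):
    """Yield (line, name, docstring) for the module and each class/function."""
    tree = ast.parse(source)
    defs = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    for node in [tree, *(n for n in ast.walk(tree) if isinstance(n, defs))]:
        doc = ast.get_docstring(node)
        if doc is not None:
            # A flake8-rst-docstrings-style plugin would pass `doc` to docutils here.
            yield node.body[0].lineno, getattr(node, "name", "<module>"), doc


def string_format_styles(source: str) -> list[tuple[int, str]]:
    """Classify the formatting style applied to string literals."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.BinOp) and isinstance(node.op, ast.Mod)
                and isinstance(node.left, ast.Constant) and isinstance(node.left.value, str)):
            found.append((node.lineno, "percent"))  # "..." % args
        elif (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)
                and node.func.attr == "format"
                and isinstance(node.func.value, ast.Constant)
                and isinstance(node.func.value.value, str)):
            found.append((node.lineno, "str.format"))  # "...".format(args)
        elif isinstance(node, ast.JoinedStr):
            found.append((node.lineno, "f-string"))  # f"..."
    return sorted(found)
```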

@charliermarsh
Member

Thank you @peterjc! Really appreciate your engagement here as a plugin author!

(Regarding RST: it looks like there's at least one Rust crate for parsing RST, though it doesn't look super popular.)

@ofek
Contributor

ofek commented Nov 15, 2022

Is this possible, or supported currently? https://github.com/adamchainz/flake8-tidy-imports

@charliermarsh
Member

@ofek - Not currently supported but it’s a pretty small surface area so should be easy to add some time in the next few days.

@ofek
Contributor

ofek commented Nov 17, 2022

Thanks! I've been enforcing absolute imports recently (except in tests) https://github.com/pypa/hatch/blob/b0911bb0eaa8d331c24eda940b97bf244ecd5ac3/.flake8#L8-L11

After that I'll switch over, and make new projects generated by Hatch use this.

@charliermarsh
Member

Sweet! The banned relative import rule I can definitely do today.

@charliermarsh
Member

@ofek -- I252 (banned relative imports) just went out in v0.0.125.

You can use it in Hatch by adding this to your pyproject.toml:

[tool.ruff]
select = [
  "B",
  "C",
  "E",
  "F",
  "W",
  # Ruff doesn't have this, but it does have E722.
  # "B001",
  "B003",
  "B006",
  "B007",
  # These don't exist in newer flake8-bugbear versions IIUC.
  # "B301",
  # "B305",
  # "B306",
  # "B902",
  "Q000",
  "Q001",
  "Q002",
  "Q003",
  "I252",
]
ignore = [
  "B027",
  # "E203",
  # "E722",
  # "W503",
]
line-length = 120
# tests can use relative imports
per-file-ignores = {"tests/*" = ["I252"], "tests/**/*" = ["I252"]}

[tool.ruff.flake8-tidy-imports]
ban-relative-imports = "all"

Let me know if it works, or doesn't! :)
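Roughly what that configuration asks for, sketched in Python: flag any relative import (an `ImportFrom` node with `level > 0`) unless the file matches a per-file-ignore pattern. This is not Ruff's implementation; glob matching uses `fnmatch`, whose semantics differ from Ruff's globs (for instance, `*` also matches `/`), and the message text is illustrative.

```python
import ast
from fnmatch import fnmatch

# Same intent as the per-file-ignores entry in the config: relative imports
# are allowed under tests/. fnmatch is a simplification of Ruff's glob matching.
PER_FILE_IGNORES = {"tests/*": ["I252"], "tests/**/*": ["I252"]}


def banned_relative_imports(path: str, source: str) -> list[tuple[int, str]]:
    ignored = {
        code
        for pattern, codes in PER_FILE_IGNORES.items()
        if fnmatch(path, pattern)
        for code in codes
    }
    if "I252" in ignored:
        return []
    return [
        (node.lineno, "I252 relative import")
        for node in ast.walk(ast.parse(source))
        # level > 0 means the import starts with one or more dots.
        if isinstance(node, ast.ImportFrom) and node.level > 0
    ]
```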

@ofek
Contributor

ofek commented Nov 20, 2022

Thank you!!! pypa/hatch#607

@ljodal

ljodal commented Nov 28, 2022

@charliermarsh You wrote somewhere that libcst is significantly slower than the current ast implementation in ruff (can't find it right now). Do you know why? Is it because it's a cst or is it because the classes it exposes are Python "compatible"?

I'm asking because I've started looking into pyo3 and from what I see the only way to expose an ast to a Python plugin would be to make the ast classes Python classes in pyo3. If that's what's slow with libcst I guess there's not really any point in investigating that route too much, but if we could make that fast enough I guess it could be one way to make plugins work.

That doesn't resolve auto-fixing, but as I suggested in another thread I think maybe doing auto-fixing on the token level could be made to work. Maybe an interface like this:

import ast

def visit_Import(node: ast.Import, tokens: list[str]) -> list[str]:
    # Check the AST node (or its tokens) for violations and return the updated tokens
    return ["import", " ", "foo"]

Or maybe have tokens as an attribute on the ast nodes 🤔
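One way a host could apply the return value of such a hook, sketched under assumptions: instead of splicing a real token stream, the host replaces the exact source span of each `Import` node (using the node's start and end positions) with the joined strings the plugin returns, and passes the node's token strings from `tokenize` for plugins that want them. `visit_Import` here is a hypothetical fixer that sorts the imported names; `apply_import_fixes` and `node_tokens` are hypothetical host helpers.

```python
import ast
import io
import tokenize


def visit_Import(node: ast.Import, tokens: list[str]) -> list[str]:
    # Hypothetical fixer: sort the names in `import b, a`. It only needs the
    # AST node; `tokens` is available for plugins that work at token level.
    aliases = sorted(node.names, key=lambda alias: alias.name)
    rendered = [a.name + (f" as {a.asname}" if a.asname else "") for a in aliases]
    return ["import", " ", ", ".join(rendered)]


def node_tokens(segment: str) -> list[str]:
    """Token strings of a source segment, without newline/end markers."""
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}
    toks = tokenize.generate_tokens(io.StringIO(segment).readline)
    return [tok.string for tok in toks if tok.type not in skip]


def apply_import_fixes(source: str) -> str:
    data = source.encode("utf-8")
    # Byte offset at which each line starts; AST columns are UTF-8 byte offsets.
    starts = [0]
    for line in data.splitlines(keepends=True):
        starts.append(starts[-1] + len(line))

    edits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            start = starts[node.lineno - 1] + node.col_offset
            end = starts[node.end_lineno - 1] + node.end_col_offset
            segment = data[start:end].decode("utf-8")
            replacement = "".join(visit_Import(node, node_tokens(segment)))
            edits.append((start, end, replacement.encode("utf-8")))

    # Apply edits back to front so earlier offsets stay valid.
    for start, end, new in sorted(edits, reverse=True):
        data = data[:start] + new + data[end:]
    return data.decode("utf-8")
```

Because only the node's own span is rewritten, indentation and comments elsewhere on the surrounding lines are left untouched.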

@charliermarsh
Member

charliermarsh commented Nov 28, 2022

@ljodal - This was all based on LibCST as a Rust crate, with no Python FFI -- so I think it's just the CST and parser, and not anything to do with the serialization. (I also hacked in some RustPython vs. LibCST benchmarks into the existing LibCST benches and got similar results. As with all benchmarking, though, I could definitely be doing something wrong!)

@charliermarsh
Member

@ljodal - I don't have great intuition for whether the PyO3 FFI would add much overhead and what the performance impact would be. I think it's worth exploring!

@ljodal

ljodal commented Nov 29, 2022

Aight, then I'll continue investigating :)

I haven't written any Rust before, so it's slow going (thinking of doing Advent of Code in Rust to get a kickstart). My plan was to use the Python ASDL definitions to generate AST classes, but it's been years since I last touched compilers, so I'll have to see how I go about the tokenization and the conversion to an AST.

@zanieb
Member

zanieb commented Jul 28, 2023

There is also a lot of interesting discussion about plugins in the Rust ecosystem over at helix-editor/helix#3806

It may be essential for us to allow plugins to be authored in Python, regardless of the machinery between the Python API and our Rust API.

@adam-azarchs

Definitely not a mainstream idea, but I just wanted to throw this out there for consideration: you could consider using starlark for plugins.

The language is essentially a non-Turing-complete subset of python with strong sandboxing and safe multithreaded execution, designed for having an interpreter embedded in another hosting process. There is in fact an interpreter implementation for rust.

In some ways this would provide the "best of both worlds" in allowing you to keep a near-python syntax while avoiding many of the performance and maintenance issues inherent to supporting python plugins. For one thing, you don't need to be ABI-compatible with python, since you'd be self-hosting the interpreter. The main downside would be lack of availability of arbitrary python packages, but from a performance perspective that's probably a good thing.

The main differences between starlark and python:

  1. Top-level variables (including functions) are frozen after import, and are single-assignment. This makes concurrent execution easy, since there can be no mutable state shared between invocations of a function; something that ruff would I think very much want to be able to take advantage of. It also permits certain forms of ahead of time "compilation" and static checking which are impossible to do reliably in a language as dynamic as python, e.g. checking for undefined names during load rather than at runtime on every reference.
  2. No try/except. Errors are always hard failures. This significantly simplifies the runtime and again allows for more "precompilation".
  3. Disallowing of various bug-prone patterns like modification during iteration.
  4. No unbounded loops. for x in y is allowed but no while loops and, by default, no recursion. This is a little awkward at times but allows the runtime to ensure that all starlark programs will eventually terminate.
  5. No OS access out of the box. While the hosting interpreter can expose methods for things like reading files, none is provided by default, meaning it should be safe to run untrusted starlark code. It also prevents plugin authors from "going around" your provided APIs.
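Starlark enforces these rules in its own interpreter, which is not shown here. As a rough illustration of what restrictions 2 and 4 rule out, this sketch statically scans Python source for `while` loops, `try` blocks and direct self-recursion; mutual recursion would need a call graph and is deliberately not handled.

```python
import ast

# ast.TryStar (try/except*) only exists on Python 3.11+.
TRY_NODES = tuple(
    cls for cls in (getattr(ast, "Try", None), getattr(ast, "TryStar", None)) if cls is not None
)


def starlark_violations(source: str) -> list[tuple[int, str]]:
    """Flag a few Python constructs that Starlark rejects (partial, static check)."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.While):
            problems.append((node.lineno, "while loops are not allowed"))
        elif isinstance(node, TRY_NODES):
            problems.append((node.lineno, "try/except is not allowed"))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Only direct self-recursion is detected; mutual recursion is not.
            for inner in ast.walk(node):
                if (isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name)
                        and inner.func.id == node.name):
                    problems.append((inner.lineno, f"recursive call to {node.name}()"))
    return sorted(problems)
```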

Ultimately, I don't think any of this would provide much of a benefit over and above using WASM for plugins. WASM already enables users to write their code in any language of their choice that supports wasm as a compilation target. However, python is not one of those languages, and as has been pointed out, most ruff users work primarily, if not exclusively, in python, so having something at least near-python may have some value. It still wouldn't enable things like flake8-rst-docstrings delegating out to a python rst-parsing library, but personally I would consider that to be a good thing, as python dependency trees can quickly grow out of control and become difficult to maintain and keep up to date.

@ofek
Contributor

ofek commented Sep 25, 2023

If we do go down the Starlark route, the PyOxidizer project(s) can serve as an extensive example of usage in Rust.

@obi1kenobi

Small update if you might be considering Trustfall: at RustConf last week, @estebank and a few other folks expressed interest in using Trustfall to query Rust HIR as a way to support custom lints for Rust 👀

@Gnosnay

Gnosnay commented Dec 14, 2023

I learned a lot from this long thread. May I ask: for now, if we want to define our own syntax lint checks with ruff, how should we do it?
If anyone can suggest a way, I would really appreciate it.

@monotkate

It sounds like the jury is still out when it comes to creating custom rules, is that correct? I've seen custom linting rules be a valuable tool when modularizing a monolith, with rules very customized for the codebase you're working in. As far as I can tell, Ruff does not allow you to develop custom rules at this time, so we'd have to run another linter alongside Ruff for that ability.

We just switched our codebase to using Ruff, and are also looking to start modularizing. I'm trying to figure out if I need to choose a second tool alongside Ruff for customizations.

@adam-azarchs

I think the main point of contention is not whether it should be allowed but rather how.

IMO if people want to author a plugin in python they should probably use a python-based tool (e.g. flake8 or pylint) to run it. One of the things I like about ruff is that it doesn't have a dependency on a python runtime, and not unrelatedly that it is very fast. A plugin architecture for ruff would be nice, certainly, but I'd advocate for it being either native plugins (e.g. .so libs that can be dynamically loaded into the process and register themselves), WASM, or maybe some kind of DSL (ast-grep was mentioned earlier).

I strongly suspect that a DSL would be sufficient for 80+% of the kind of use cases people are describing where a repo has rules very specific to their code base that wouldn't be sufficiently broadly applicable to upstream. Especially nice about that is that such custom rules could just be included in the pyproject.toml (toml is isomorphic to yaml, though it does get awkward compared to yaml for more deeply nested structures).

@morgante

We've been working with Biome to integrate GritQL as an extension/plugin system and I'd love to offer the same for Ruff. The problem space is similar and I think GritQL provides a few advantages:

  • Preserve some of the best things about Ruff: no runtime Python dependency, pure Rust, no separate installation steps, etc.
  • Most custom/codebase-specific rules can be expressed as simple AST-based transforms.
  • Traversal still happens in pure Rust and, because declarative queries are used, they could be optimized to maintain Ruff's excellent performance

Here are a few examples of how @charliermarsh's earlier custom suggestions could be implemented directly:

  • Always late-import TensorFlow
    `import $import` where {$import <: contains `tensorflow`, $import <: not within block()}
    
  • If you ever import module X, make sure the file also imported module Y
    `import $import` where {
        $import <: contains `moduleX`,
        $program <: not contains `import moduleY`
    }
    
  • Imports to module Z should always use import from structure
    `import $import` where {$import <: contains `module`, $import <: not within `from $_ import $_`}
    

@JobaDiniz

I'm looking to write a few fitness functions by extending ruff linters. I think that would be ideal, and I'd like to use python.

Since ruff does not support plugins, I'm writing these functions as tests that run on CI using pytest, but these specific fitness functions I'm writing are linters, so it would make sense to write them as part of ruff linters.
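For reference, a fitness function of this kind can be an ordinary pytest test that walks the source tree. A sketch, assuming a hypothetical `myapp` package laid out under `src/myapp` where `domain/` must not import from `myapp.infrastructure`; the layering rule itself is just an example.

```python
import ast
from pathlib import Path

# Hypothetical layering rule: code under domain/ must not import myapp.infrastructure.
FORBIDDEN = {"domain": ("myapp.infrastructure",)}


def layering_violations(root: Path) -> list[str]:
    violations = []
    for layer, banned in FORBIDDEN.items():
        for path in sorted((root / layer).rglob("*.py")):
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
            for node in ast.walk(tree):
                if isinstance(node, ast.Import):
                    modules = [alias.name for alias in node.names]
                elif isinstance(node, ast.ImportFrom) and node.level == 0:
                    modules = [node.module or ""]
                else:
                    continue
                for module in modules:
                    if any(module == b or module.startswith(b + ".") for b in banned):
                        rel = path.relative_to(root).as_posix()
                        violations.append(f"{rel}:{node.lineno}: imports {module}")
    return violations


def test_domain_does_not_import_infrastructure():
    # src/myapp is an assumed project layout; pytest collects this like any test.
    assert layering_violations(Path("src/myapp")) == []
```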

@jhosmanfriasbravo

@charliermarsh hi! do you think that ruff will consider a plugin system in the short-medium term?

@MichaReiser
Member

Thank you, @morgante, for offering your support to help us build a GritQL-based plugin system.

GritQL is undoubtedly at the top of my mind when it comes to designing a plugin system for Ruff, and I'm following the work in the Biome repository from a distance (but I must admit, not very closely).

It will probably be a while before we evaluate solutions for a plugin system because we're currently in the middle of rewriting Ruff's compiler infrastructure to support multifile analysis (and more ;)). But I'll come back to your offer when we're ready to explore Ruff plugins.

@ssbarnea

One flake8 plugin that would be very useful to have covered by ruff is pydoclint. Even supporting external plugins, e.g. ones called as shell commands, would be helpful; speed should not be the top priority. In time, plugin authors might rewrite them in Rust, but as a start, a way to hook in external ones would be valuable.

@adam-azarchs

I am far from convinced that there's value in having ruff launch such external processes, including hosting a python runtime (which technically could be done in-process, but that wouldn't have much of an advantage over forking it as a subprocess). The primary value that flake8 gets out of integrating a lot of plugins is that they can all share the same parsed AST representation, which avoids redundant work. Likewise with ruff's integrated linters. That would not be the case for ruff plugins unless those plugins were also written in rust, or some other language that could consume ruff's in-memory representations with limited or no transformations.

It sort of sounds to me that what some people are looking for is a way to run a bunch of arbitrary checkers on python files in a repository, and they don't really care whether those commands are integrated into the same executable or not. If that's what you're looking for, maybe look at something like pre-commit? pre-commit is quite good at that job. ruff just feels like the wrong layer for doing that - it's a thing that gets run by pre-commit, and shouldn't be trying to do the same job.

@flying-sheep
Copy link
Contributor

A new blog post investigating Rust plugin systems, probably helpful! https://benw.is/posts/plugins-with-rust-and-wasi

@2bndy5

2bndy5 commented Aug 30, 2024

FYI, using external processes (like a rust-based standalone binary or a python-based entrypoint script) is how mdbook implements plugins for preprocessing Markdown. This is how mdbook also supports plugins implemented in python or possibly other languages. If designed well enough, one could write a ruff plugin that also acts as a standalone linter. This means adding unsupported rules (from linters that ruff is non-compliant with) could be implemented externally in the (unsupported) third-party linter.
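A sketch of that process-based model in Python, using a JSON-over-stdin/stdout exchange in the spirit of mdbook preprocessors. The request and response shapes, the `EXT001` code, and `run_external_plugin` are all hypothetical; Ruff has no such protocol. Because the plugin just reads stdin, the same script could also be run on its own as a standalone checker.

```python
import json
import subprocess
import sys


# Hypothetical protocol: the host writes {"path": ..., "source": ...} to the
# plugin's stdin and expects a JSON list of {"line", "code", "message"} objects.
def run_external_plugin(command: list[str], path: str, source: str) -> list[dict]:
    proc = subprocess.run(
        command,
        input=json.dumps({"path": path, "source": source}),
        capture_output=True,
        text=True,
        check=True,  # a crashing plugin raises CalledProcessError
        timeout=30,
    )
    return json.loads(proc.stdout)


# A tiny plugin written as a standalone script that flags TODO comments.
PLUGIN = r'''
import json, sys
req = json.load(sys.stdin)
out = [
    {"line": i, "code": "EXT001", "message": "TODO comment"}
    for i, text in enumerate(req["source"].splitlines(), 1)
    if "# TODO" in text
]
json.dump(out, sys.stdout)
'''

if __name__ == "__main__":
    print(run_external_plugin([sys.executable, "-c", PLUGIN], "app.py", "x = 1  # TODO\n"))
```

The cost is one process launch (and one parse) per file per plugin, which is the overhead discussed earlier in the thread.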

@carlpaten

Members of my team have been agitating for Ruff, but without support for custom rules it's a tough sell. We need to be able to implement bespoke rules to support our own coding style - stuff like forbidding top-level statements not guarded by if __name__ == "__main__":, for example. We have the choice between maintaining our own fork of Ruff, with limited in-house Rust experience, or use Flake8 + Black.
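For what it's worth, that particular rule is a short AST check. A sketch under an assumed policy: at module top level, imports, function and class definitions, assignments, the module docstring and an `if __name__ == "__main__":` block are allowed, and every other statement (bare calls, loops, `with` blocks, and so on) is reported by line number.

```python
import ast

# Assumed policy: statement kinds that may appear unguarded at module top level.
ALLOWED = (ast.Import, ast.ImportFrom, ast.FunctionDef, ast.AsyncFunctionDef,
           ast.ClassDef, ast.Assign, ast.AnnAssign)


def is_main_guard(node: ast.stmt) -> bool:
    """True for `if __name__ == "__main__":`, with operands in either order."""
    if not (isinstance(node, ast.If) and isinstance(node.test, ast.Compare)):
        return False
    test = node.test
    if len(test.ops) != 1 or not isinstance(test.ops[0], ast.Eq):
        return False
    operands = [test.left, test.comparators[0]]
    has_name = any(isinstance(o, ast.Name) and o.id == "__name__" for o in operands)
    has_main = any(isinstance(o, ast.Constant) and o.value == "__main__" for o in operands)
    return has_name and has_main


def unguarded_statements(source: str) -> list[int]:
    """Line numbers of top-level statements outside the allowed set."""
    body = ast.parse(source).body
    # Skip the module docstring (a leading string-literal expression).
    if (body and isinstance(body[0], ast.Expr)
            and isinstance(body[0].value, ast.Constant)
            and isinstance(body[0].value.value, str)):
        body = body[1:]
    return [node.lineno for node in body
            if not isinstance(node, ALLOWED) and not is_main_guard(node)]
```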

@Dreamsorcerer

We have the choice between maintaining our own fork of Ruff, with limited in-house Rust experience, or use Flake8 + Black.

Or (and I don't necessarily recommend it), Ruff + Flake8 (using the external config option for rules handled by Flake8).

@muglug

muglug commented Sep 4, 2024

Members of my team have been agitating for Ruff, but without support for custom rules it's a tough sell.

If they want to monetise what they’ve built (and add this as a paid feature) then Astral might welcome this sort of feedback, but otherwise it feels a little off-key for a free tool with a permissive open-source license.

@bverhoeve

I have the same problem as @Dreamsorcerer for my team. flake8 + black works fast enough for us, so the speed of ruff isn't a good enough trade-off for the lack of custom plugins.

@jd-solanki

Along with a plugin system, I would like to share a great idea for beginners who don't know how to write linting rules using the AST or similar tools:

"Lint using regex"

In JS, ESLint has a plugin system and we used to write custom rules for our codebases. We wrote some rules following the official rule docs, but using regex to lint the code allowed everyone on our team to write linting rules without any additional learning.

Regex plugin that allowed beginner devs to write linting rules: https://www.npmjs.com/package/eslint-plugin-regex
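A minimal sketch of regex-based rules, assuming each rule is a line-oriented pattern with a code and a message (the `RX001` / `RX002` codes and both patterns are invented examples). The trade-off is that regexes see raw text, so they can also match inside strings or comments where an AST-based rule would not.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RegexRule:
    code: str     # hypothetical rule code
    pattern: str  # regular expression matched against each line
    message: str


RULES = [
    RegexRule("RX001", r"\bprint\(", "use logging instead of print()"),
    RegexRule("RX002", r"#\s*type:\s*ignore(?!\[)", "type: ignore needs an error code"),
]


def lint_with_regex(source: str, rules=RULES) -> list[tuple[int, int, str, str]]:
    """Return (line, 1-based column, code, message) for every match."""
    compiled = [(rule, re.compile(rule.pattern)) for rule in rules]
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule, regex in compiled:
            for match in regex.finditer(line):
                hits.append((lineno, match.start() + 1, rule.code, rule.message))
    return hits
```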

@chadrik

chadrik commented Oct 12, 2024

I had a look at ast-grep and it's pretty fantastic. It is fast, it is written in Rust, and it provides multiple high-level APIs, including both Python and YAML. But it has one giant problem: it doesn't preserve comments, which makes it much less useful for real-world code manipulation.

Would it be possible to publish ruff's python ast parser as a crate, which ast-grep could be modified to use, and then ruff could use ast-grep's high level APIs for its plugin system?
