
New helper script to run fast lints and pre-push hook that runs it #3949

Merged 22 commits on Oct 23, 2023

Changes from 5 commits
4 changes: 4 additions & 0 deletions justfile
@@ -17,6 +17,10 @@ format: cpp-format toml-format py-format
# Lint all of our code
lint: toml-lint py-lint rs-lint

# Run the fast versions of our linters
fast-lint:
python scripts/fast_lint.py

### C and C++

# Clear the C++ build directories
117 changes: 117 additions & 0 deletions scripts/fast_lint.py
@@ -0,0 +1,117 @@
#!/usr/bin/env python3

"""Runs the common linters on any files that have changed relative to the main branch."""
from __future__ import annotations

import argparse
import concurrent.futures
import logging
import os
import subprocess
import sys
import time

from git import Repo


def changed_files() -> list[str]:
repo = Repo(os.getcwd())

current_branch = repo.active_branch
common_ancestor = repo.merge_base(current_branch, "main")[0]

return [item.b_path for item in repo.index.diff(common_ancestor)]


def run_cmd(cmd: list[str], files: list[str] | bool) -> bool:
start = time.time()
if not files:
logging.info(f"SKIP: `{' '.join(cmd)}`")
return True

if isinstance(files, bool):
files = []
proc = subprocess.run(cmd + files, text=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
if proc.returncode == 0:
logging.info(f"PASS: `{' '.join(cmd)} {'<FILES>' if files else ''}` in {time.time() - start:.2f}s")
if proc.stdout:
logging.debug(f"stdout: {proc.stdout}")
else:
logging.info(f"FAIL: `{' '.join(cmd)} {'<FILES>' if files else ''}` in {time.time() - start:.2f}s")
if proc.stdout:
logging.info(f"stdout: {proc.stdout}")

return proc.returncode == 0


def filter_ext(files: list[str], ext: list[str]) -> list[str]:
return [f for f in files if any(f.endswith(e) for e in ext)]


def filter_cpp(files: list[str]) -> list[str]:
return filter_ext(files, [".h", ".hpp", ".c", ".cpp"])


def main() -> None:
start = time.time()
parser = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter)
parser.add_argument(
"--log-level",
dest="log_level",
default="INFO",
choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
help="Set the logging level (default: INFO)",
)
parser.add_argument("--num-threads", type=int, default=8, help="Number of threads to use (default: 8)")
parser.add_argument(
"files",
metavar="file",
type=str,
nargs="*",
help="File paths. Empty = all files, recursively.",
)

args = parser.parse_args()
logging.basicConfig(level=args.log_level)

script_dirpath = os.path.dirname(os.path.realpath(__file__))
root_dirpath = os.path.abspath(f"{script_dirpath}/..")
os.chdir(root_dirpath)

if args.files:
files = args.files
else:
files = changed_files()

logging.debug("Checking:")
for f in files:
logging.debug(f" {f}")

jobs = [
(["pixi", "run", "codegen", "--check"], True),
Member:

This one takes 5-7s for me, which is dangerously close to some threshold as far as pre-commit hooks are concerned.

❯ python scripts/fast_lint.py 
INFO:root:SKIP: `rustfmt --check`
INFO:root:SKIP: `taplo fmt --check`
INFO:root:SKIP: `pixi run clang-format --dry-run`
INFO:root:PASS: `ruff check --config rerun_py/pyproject.toml <FILES>` in 0.01s
INFO:root:PASS: `typos <FILES>` in 0.04s
INFO:root:PASS: `python scripts/lint.py <FILES>` in 0.10s
INFO:root:PASS: `blackdoc --check <FILES>` in 0.51s
INFO:root:PASS: `black --check --config rerun_py/pyproject.toml <FILES>` in 0.54s
INFO:root:FAIL: `mypy --no-warn-unused-ignore <FILES>` in 0.99s
INFO:root:stdout: scripts/screenshot_compare/build_screenshot_compare.py:84: error: Incompatible types in assignment (expression has type "str", variable has type "Path")  [assignment]
Found 1 error in 1 file (checked 2 source files)

INFO:root:PASS: `pixi run codegen --check ` in 5.70s
INFO:root:Lints failed in 5.73s

That's after a few runs to warm things up.

Member:

(running on my antoine/screenshot_compare branch)

Member Author:

> as far as pre-commit

Ahh yeah, agreed it's too slow for a pre-commit. I have primarily been thinking of this as a pre-push, which I think strikes the right balance. I've now added an option to skip arbitrary jobs if you want to use it for a pre-commit as well.

Member:

pre-push makes sense, yeah 👍🏻
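The PR title mentions a pre-push hook that runs this script, but the hook itself is not part of this diff. As a rough sketch of how such a hook can work (the exit-code propagation is the key mechanism; the exact hook contents here are an assumption, not the PR's actual hook):

```python
#!/usr/bin/env python3
# Hypothetical .git/hooks/pre-push sketch (not the PR's actual hook).
# Git runs the hook before pushing and aborts the push if it exits
# non-zero, so propagating fast_lint.py's exit status is all that's needed.
import subprocess
import sys


def run_hook(cmd: list[str]) -> int:
    # subprocess.run returns a CompletedProcess; its returncode is the
    # child's exit status, which git uses to decide whether to push.
    return subprocess.run(cmd).returncode


# In the actual hook this would be something like:
# sys.exit(run_hook([sys.executable, "scripts/fast_lint.py"]))
```

Installing it is a matter of placing the file at `.git/hooks/pre-push` and marking it executable.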

Member:

Still sounds too slow for a pre-push for my taste. It's also not something that I usually forget to do (running codegen), while things like typos fail my CI very often.

Member Author:

How often are you git pushing?

Member:

I dunno. 5-20 times a day? 🤷

Member Author:

Alright -- if you want to save those 25-100 seconds you can now:

export RERUN_LINT_SKIP=lint-codegen
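The skip mechanism itself landed in later commits that are not shown in this diff, so the following is only a sketch of how an env-var-based skip list can work; the comma-separated semantics and job names are assumptions:

```python
# Hypothetical sketch of the RERUN_LINT_SKIP mechanism described above.
import os


def skipped_jobs() -> set[str]:
    # Read RERUN_LINT_SKIP as a comma-separated list of job names.
    value = os.environ.get("RERUN_LINT_SKIP", "")
    return {name.strip() for name in value.split(",") if name.strip()}


def should_run(job_name: str) -> bool:
    # A job runs unless its name appears in the skip list.
    return job_name not in skipped_jobs()
```

With `RERUN_LINT_SKIP=lint-codegen` exported, a job named `lint-codegen` would be filtered out while every other job still runs.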

(["rustfmt", "--check"], filter_ext(files, [".rs"])),
(["python", "scripts/lint.py"], files),
(["typos"], files),
(["taplo", "fmt", "--check"], filter_ext(files, [".toml"])),
Member:

Love this so much! Taplo is way too slow when running `just toml-lint`.

(["pixi", "run", "clang-format", "--dry-run"], filter_ext(files, [".cpp", ".c", ".h", ".hpp"])),
(["ruff", "check", "--config", "rerun_py/pyproject.toml"], filter_ext(files, [".py"])),
(["black", "--check", "--config", "rerun_py/pyproject.toml"], filter_ext(files, [".py"])),
(["blackdoc", "--check"], filter_ext(files, [".py"])),
(["mypy", "--no-warn-unused-ignore"], filter_ext(files, [".py"])),
]
Member:

Out of scope for this PR, but it would be nice if all of these were actually (just_command, file_list) tuples, with the justfile as the single source of truth for how to run things...

Member Author:

I'm not totally sold on making these all dispatch through just since I'm not sure running them individually always makes sense. However, I've added them all to pixi and updated the pixi deps so they should now run deterministically from the pixi environment.


with concurrent.futures.ThreadPoolExecutor(max_workers=args.num_threads) as executor:
results = [executor.submit(run_cmd, command, files) for command, files in jobs]

success = all(result.result() for result in results)

if success:
logging.info(f"All lints passed in {time.time() - start:.2f}s")
sys.exit(0)
else:
logging.info(f"Lints failed in {time.time() - start:.2f}s")
sys.exit(1)


if __name__ == "__main__":
main()