Add an option to auto regenerate baseline (#202)
jsh9 authored Jan 12, 2025
1 parent 7282eb6 commit fe188f4
Showing 7 changed files with 521 additions and 84 deletions.
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,11 @@
 # Change Log

+## [0.5.17] - 2025-01-12
+
+- Added
+  - A new config option `--auto-regenerate-baseline` to automatically
+    regenerate the baseline file for every successful _pydoclint_ run
+
 ## [0.5.16] - 2025-01-11

 - Added
31 changes: 21 additions & 10 deletions docs/config_options.md
@@ -30,8 +30,9 @@ page:
 - [17. `--only-attrs-with-ClassVar-are-treated-as-class-attrs` (shortform: `-oawcv`, default: `False`)](#17---only-attrs-with-classvar-are-treated-as-class-attrs-shortform--oawcv-default-false)
 - [18. `--baseline`](#18---baseline)
 - [19. `--generate-baseline` (default: `False`)](#19---generate-baseline-default-false)
-- [20. `--show-filenames-in-every-violation-message` (shortform: `-sfn`, default: `False`)](#20---show-filenames-in-every-violation-message-shortform--sfn-default-false)
-- [21. `--config` (default: `pyproject.toml`)](#21---config-default-pyprojecttoml)
+- [20. `--auto-regenerate-baseline` (shortform: `-arb`, default: `True`)](#20---auto-regenerate-baseline-shortform--arb-default-true)
+- [21. `--show-filenames-in-every-violation-message` (shortform: `-sfn`, default: `False`)](#21---show-filenames-in-every-violation-message-shortform--sfn-default-false)
+- [22. `--config` (default: `pyproject.toml`)](#22---config-default-pyprojecttoml)

 <!--TOC-->

@@ -215,26 +216,36 @@ Baseline allows you to remember the current project state and then show only
 new violations, ignoring old ones. This can be very useful when you'd like to
 gradually adopt _pydoclint_ in existing projects.

-A path to the file is expected. It is recommended to add this option to config
-file. (The default config file is `pyproject.toml`.)
+If you'd like to use this feature, pass in the full file path to this option.
+For convenience, you can write this option in your `pyproject.toml` file:

 ```toml
 [tool.pydoclint]
 baseline = "pydoclint-baseline.txt"
 ```

-If `--generate-baseline=True` (or `--generate-baseline True`) is passed,
+If you also set `--generate-baseline=True` (or `--generate-baseline True`),
 _pydoclint_ will generate a file that contains all current violations of your
-project. If `--generate-baseline` is not passed (default value is `False`),
-_pydoclint_ will read your baseline file, and ignore all violations specified
-in that file.
+project.
+
+If `--generate-baseline` is not passed to _pydoclint_ (the default
+is `False`), _pydoclint_ will read your baseline file, and ignore all
+violations specified in that file.

 ## 19. `--generate-baseline` (default: `False`)

 Required to use with `--baseline` option. If `True`, generate the baseline file
 that contains all current violations.

-## 20. `--show-filenames-in-every-violation-message` (shortform: `-sfn`, default: `False`)
+## 20. `--auto-regenerate-baseline` (shortform: `-arb`, default: `True`)
+
+If it's set to True, _pydoclint_ will automatically regenerate the baseline
+file every time you fix violations in the baseline and rerun _pydoclint_.
+
+This saves you from having to manually regenerate the baseline file by setting
+`--generate-baseline=True` and running _pydoclint_.
+
+## 21. `--show-filenames-in-every-violation-message` (shortform: `-sfn`, default: `False`)

 If False, in the terminal the violation messages are grouped by file names:

@@ -268,7 +279,7 @@ This can be convenient if you would like to click on each violation message and
 go to the corresponding line in your IDE. (Note: not all terminal apps offer
 this functionality.)

-## 21. `--config` (default: `pyproject.toml`)
+## 22. `--config` (default: `pyproject.toml`)

 The full path of the .toml config file that contains the config options. Note
 that the command line options take precedence over the .toml file. Look at this
119 changes: 88 additions & 31 deletions pydoclint/baseline.py
@@ -5,17 +5,19 @@

 from pydoclint.utils.violation import Violation

-SEPARATOR = '--------------------\n'
+SEPARATOR = '--------------------\n'  # 20 dashes
 LEN_INDENT = 4
-INDENT = ' ' * LEN_INDENT
+ONE_SPACE = ' '
+INDENT = ONE_SPACE * LEN_INDENT


 def generateBaseline(
-    violationsInAllFiles: dict[str, list[Violation]], path: Path
+    violationsAllFiles: dict[str, list[Violation]] | dict[str, list[str]],
+    path: Path,
 ) -> None:
     """Generate baseline file based on passed violations."""
     with path.open('w', encoding='utf-8') as baseline:
-        for file, violations in violationsInAllFiles.items():
+        for file, violations in violationsAllFiles.items():
             if violations:
                 baseline.write(f'{file}\n')
                 for violation in violations:
@@ -24,10 +26,10 @@ def generateBaseline(
                 baseline.write(f'{SEPARATOR}')


-def parseBaseline(path: Path) -> dict[str, set[str]]:
+def parseBaseline(path: Path) -> dict[str, list[str]]:
     """Parse baseline file."""
     with path.open('r', encoding='utf-8') as baseline:
-        parsed: dict[str, set[str]] = {}
+        parsed: dict[str, list[str]] = {}
         splittedFiles = [
             list(group)
             for key, group in groupby(
@@ -36,34 +38,89 @@ def parseBaseline(path: Path) -> dict[str, set[str]]:
             if not key
         ]
         for file in splittedFiles:
-            parsed[file[0].strip()] = {func.strip() for func in file[1:]}
+            parsed[file[0].strip()] = [func.strip() for func in file[1:]]

     return parsed


-def removeBaselineViolations(
-    baseline: dict[str, set[str]],
-    violationsInAllFiles: dict[str, list[Violation]],
-) -> tuple[bool, dict[str, list[Violation]]]:
+def reEvaluateBaseline(
+    baseline: dict[str, list[str]],
+    actualViolationsInAllFiles: dict[str, list[Violation]],
+) -> tuple[bool, dict[str, list[str]], dict[str, list[Violation]]]:
     """
-    Remove from the violation dictionary the already existing violations
-    specified in the baseline file.
+    Re-evaluate baseline violations, dropping those that are already fixed
+    by the users, and calculating those that still need to be fixed.
+
+    Parameters
+    ----------
+    baseline : dict[str, list[str]]
+        The baseline violations, parsed from the baseline file
+    actualViolationsInAllFiles : dict[str, list[Violation]]
+        The actual violations that pydoclint finds, which may contain
+        baseline violations. The keys of the dictionary are the file names
+        in the repo that pydoclint looks at
+
+    Returns
+    -------
+    baselineRegenerationNeeded : bool
+        Whether the baseline file should be regenerated
+    unfixedBaselineViolationsInAllFiles : dict[str, list[str]]
+        The unfixed baseline violations in all the Python files of the repo
+        that pydoclint looks at. The keys are file names, and the values
+        (``list[str]``) are lists of violation messages (``str``) in
+        each file
+    remainingViolationsInAllFiles : dict[str, list[Violation]]
+        The remaining violations that users still need to fix. The keys are
+        file names, and the values (``list[Violation]``) are lists of
+        violations (``Violation``) in each file
+    """
+    baselineRegenerationNeeded: bool = False
+
+    unfixedBaselineViolationsInAllFiles: dict[str, list[str]] = {}
+    remainingViolationsInAllFiles: dict[str, list[Violation]] = {}
+
+    for file, actualViolations in actualViolationsInAllFiles.items():
+        baselineViolations: list[str] = baseline.get(file, [])
+
+        unfixedBaselineViolations: list[str]
+        remainingViolations: list[Violation]
+
+        (
+            unfixedBaselineViolations,
+            remainingViolations,
+        ) = calcUnfixedBaselineViolationsAndRemainingViolations(
+            baselineViolations=baselineViolations,
+            actualViolations=actualViolations,
+        )
+
+        if unfixedBaselineViolations != baselineViolations:
+            baselineRegenerationNeeded = True
+
+        unfixedBaselineViolationsInAllFiles[file] = unfixedBaselineViolations
+        remainingViolationsInAllFiles[file] = remainingViolations
+
+    return (
+        baselineRegenerationNeeded,
+        unfixedBaselineViolationsInAllFiles,
+        remainingViolationsInAllFiles,
+    )
+
+
+def calcUnfixedBaselineViolationsAndRemainingViolations(
+    baselineViolations: list[str],
+    actualViolations: list[Violation],
+) -> tuple[list[str], list[Violation]]:
+    """
+    Based on the baseline violations and the actual violations, calculate
+    which baseline violations have not been fixed, and which violations are
+    new (not part of the baseline) and need to be fixed.
     """
-    baselineRegenerationNeeded = False
-    clearedViolationsAllFiles: dict[str, list[Violation]] = {}
-    for file, violations in violationsInAllFiles.items():
-        if oldViolations := baseline.get(file):
-            newViolations = []
-            if len(violations) < len(oldViolations):
-                baselineRegenerationNeeded = True
-
-            for violation in violations:
-                if f'{str(violation).strip()}' not in oldViolations:
-                    newViolations.append(violation)
-
-            if newViolations:
-                clearedViolationsAllFiles[file] = newViolations
-        elif violations:
-            clearedViolationsAllFiles[file] = violations
-
-    return baselineRegenerationNeeded, clearedViolationsAllFiles
+    unfixedBaselineViolations: list[str] = []
+    remainingViolations: list[Violation] = []
+    for viol in actualViolations:
+        if str(viol) in baselineViolations:
+            unfixedBaselineViolations.append(str(viol))
+        else:
+            remainingViolations.append(viol)
+
+    return unfixedBaselineViolations, remainingViolations
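
The re-evaluation logic in this file can be tried in isolation. The sketch below is a simplified stand-in, not the project's code: it uses plain strings in place of `Violation` objects (so `str(viol)` is the identity), and the names `partition` and `reevaluate` as well as the sample messages are made up for illustration. It shows that fixing a baselined violation sets the regeneration flag, while a new violation ends up in the "remaining" bucket.

```python
def partition(
    baselineViolations: list[str],
    actualViolations: list[str],
) -> tuple[list[str], list[str]]:
    """Split actual violations into (still in the baseline, new)."""
    unfixed = [v for v in actualViolations if v in baselineViolations]
    remaining = [v for v in actualViolations if v not in baselineViolations]
    return unfixed, remaining


def reevaluate(
    baseline: dict[str, list[str]],
    actual: dict[str, list[str]],
) -> tuple[bool, dict[str, list[str]], dict[str, list[str]]]:
    """Per file, compare the unfixed baseline entries with the baseline."""
    needed = False
    unfixedAll: dict[str, list[str]] = {}
    remainingAll: dict[str, list[str]] = {}
    for file, actualViolations in actual.items():
        baselineViolations = baseline.get(file, [])
        unfixed, remaining = partition(baselineViolations, actualViolations)
        if unfixed != baselineViolations:  # list comparison, order-sensitive
            needed = True

        unfixedAll[file] = unfixed
        remainingAll[file] = remaining

    return needed, unfixedAll, remainingAll


# Hypothetical messages: the second baselined violation has been fixed,
# and one violation that is not in the baseline has appeared.
baseline = {'a.py': ['DOC101: msg one', 'DOC201: msg two']}
actual = {'a.py': ['DOC101: msg one', 'DOC103: msg three']}

needed, unfixedAll, remainingAll = reevaluate(baseline, actual)
print(needed)        # True
print(unfixedAll)    # {'a.py': ['DOC101: msg one']}
print(remainingAll)  # {'a.py': ['DOC103: msg three']}
```

Because the flag comes from comparing two lists, in this sketch a reordering of otherwise identical violations would also set it.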
69 changes: 51 additions & 18 deletions pydoclint/main.py
@@ -11,7 +11,7 @@
 from pydoclint.baseline import (
     generateBaseline,
     parseBaseline,
-    removeBaselineViolations,
+    reEvaluateBaseline,
 )
 from pydoclint.parse_config import (
     injectDefaultOptionsFromUserSpecifiedTomlFilePath,
@@ -275,6 +275,17 @@ def validateStyleValue(
         ' file should be specified by the --baseline option.)'
     ),
 )
+@click.option(
+    '-arb',
+    '--auto-regenerate-baseline',
+    type=bool,
+    show_default=True,
+    default=True,
+    help=(
+        'If True, automatically regenerate the baseline file every time'
+        ' pydoclint runs successfully.'
+    ),
+)
 @click.option(
     '-sfn',
     '--show-filenames-in-every-violation-message',
@@ -341,6 +352,7 @@ def main( # noqa: C901
     require_yield_section_when_yielding_nothing: bool,
     only_attrs_with_classvar_are_treated_as_class_attrs: bool,
     generate_baseline: bool,
+    auto_regenerate_baseline: bool,
     baseline: str,
     show_filenames_in_every_violation_message: bool,
     config: str | None,  # don't remove it b/c it's required by `click`
@@ -400,7 +412,7 @@ def main( # noqa: C901
             click.echo(
                 click.style(
                     "The baseline file was specified but it doesn't exist.\n"
-                    'Use --generate-baseline True to generate it.',
+                    'Use `--generate-baseline=True` to generate it first.',
                     fg='red',
                     bold=True,
                 ),
@@ -444,8 +456,8 @@ def main( # noqa: C901
         if baseline is None:
             click.echo(
                 click.style(
-                    'The baseline file was not specified. '
-                    'Use --baseline option or specify it in your config file',
+                    'The baseline file was not specified. Use the'
+                    ' --baseline option or specify it in your config file',
                     fg='red',
                     bold=True,
                 ),
@@ -456,30 +468,51 @@ def main( # noqa: C901
         generateBaseline(violationsInAllFiles, baselinePath)
         click.echo(
             click.style(
-                'Baseline file was sucessfuly generated', fg='green', bold=True
+                'The baseline file was successfully generated',
+                fg='green',
+                bold=True,
             ),
             err=echoAsError,
         )
         ctx.exit(0)

     if baseline is not None:
-        parsedBaseline = parseBaseline(baselinePath)
+        parsedBaseline: dict[str, list[str]] = parseBaseline(baselinePath)
         (
             baselineRegenerationNeeded,
+            unfixedBaselineViolationsInAllFiles,
             violationsInAllFiles,
-        ) = removeBaselineViolations(parsedBaseline, violationsInAllFiles)
+        ) = reEvaluateBaseline(parsedBaseline, violationsInAllFiles)
         if baselineRegenerationNeeded:
-            click.echo(
-                click.style(
-                    'Some old violations was fixed. Please regenerate'
-                    ' your baseline file after fixing new problems.\n'
-                    'Use option --generate-baseline True',
-                    fg='red',
-                    bold=True,
-                ),
-                err=echoAsError,
-            )
-
+            if auto_regenerate_baseline:
+                generateBaseline(
+                    violationsAllFiles=unfixedBaselineViolationsInAllFiles,
+                    path=baselinePath,
+                )
+                click.echo(
+                    click.style(
+                        'Some old violations were fixed, and'
+                        ' the baseline file was successfully re-generated',
+                        fg='green',
+                        bold=True,
+                    ),
+                    err=echoAsError,
+                )
+            else:
+                click.echo(
+                    click.style(
+                        'Some old violations were fixed. Please regenerate'
+                        ' your baseline file after fixing new problems.\n'
+                        'Use `--generate-baseline=True`. Or you can use'
+                        ' `--auto-regenerate-baseline=True` to do this'
+                        ' automatically in the future.',
+                        fg='red',
+                        bold=True,
+                    ),
+                    err=echoAsError,
+                )
+
+    # Print violation messages nicely to the terminal
     violationCounter: int = 0
     if len(violationsInAllFiles) > 0:
         counter = 0
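
The auto-regeneration branch in this hunk hands plain violation strings to `generateBaseline`, and the next run reads them back with `parseBaseline`. The following is a minimal sketch of that round trip, not the project's implementation: `writeBaseline` and `readBaseline` are hypothetical names, and the per-violation line format and the `groupby` key are assumptions, since those lines are not visible in the diff.

```python
import tempfile
from itertools import groupby
from pathlib import Path

SEPARATOR = '--------------------\n'  # 20 dashes
INDENT = ' ' * 4


def writeBaseline(
    violationsAllFiles: dict[str, list[str]], path: Path
) -> None:
    """Write one block per file: file name, indented messages, separator."""
    with path.open('w', encoding='utf-8') as fp:
        for file, violations in violationsAllFiles.items():
            if violations:  # files without violations are skipped
                fp.write(f'{file}\n')
                for violation in violations:
                    # Assumed line format: indented, stripped message
                    fp.write(f'{INDENT}{str(violation).strip()}\n')

                fp.write(SEPARATOR)


def readBaseline(path: Path) -> dict[str, list[str]]:
    """Split the file on separator lines and strip every entry."""
    with path.open('r', encoding='utf-8') as fp:
        # Assumed grouping key: a line is either a separator or content
        groups = [
            list(group)
            for isSeparator, group in groupby(
                fp, lambda line: line == SEPARATOR
            )
            if not isSeparator
        ]

    return {g[0].strip(): [line.strip() for line in g[1:]] for g in groups}


# Hypothetical data: one file still has baselined violations, one has none.
unfixed = {'pkg/a.py': ['DOC101: first', 'DOC103: second'], 'pkg/b.py': []}
with tempfile.TemporaryDirectory() as tmp:
    baselinePath = Path(tmp) / 'baseline.txt'
    writeBaseline(unfixed, baselinePath)
    roundTripped = readBaseline(baselinePath)

print(roundTripped)  # {'pkg/a.py': ['DOC101: first', 'DOC103: second']}
```

In this sketch, a file whose baselined violations have all been fixed is dropped from the regenerated baseline, and the order of the remaining entries is preserved because lists are used rather than sets.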
10 changes: 10 additions & 0 deletions pydoclint/utils/violation.py
@@ -110,6 +110,16 @@ def _str(self, showLineNum: bool = False) -> str:

         return f'{self.line}: {self.__str__()}'

+    def __eq__(self, other: object) -> bool:
+        if not isinstance(other, Violation):
+            return False
+
+        return (
+            self.line == other.line
+            and self.code == other.code
+            and self.msg == other.msg
+        )
+
     def getInfoForFlake8(self) -> tuple[int, int, str]:
         """Get the violation info for flake8"""
         colOffset: int = 0  # we don't need column offset to locate the issue
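
The effect of the new `__eq__` can be sketched with a stand-in class. The class below is hypothetical and only carries the three compared attributes; the field types and the constructor are assumptions, not the real `Violation` API.

```python
class Violation:
    """Hypothetical stand-in holding only the three compared fields."""

    def __init__(self, line: int, code: int, msg: str) -> None:
        self.line = line
        self.code = code
        self.msg = msg

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Violation):
            return False

        return (
            self.line == other.line
            and self.code == other.code
            and self.msg == other.msg
        )


a = Violation(3, 101, 'some message')
b = Violation(3, 101, 'some message')
c = Violation(4, 101, 'some message')

print(a == b)         # True: same line, code and message
print(a == c)         # False: the line differs
print(a == 'DOC101')  # False: not a Violation
print(Violation.__hash__ is None)  # True: __eq__ without __hash__
```

In Python, a class that defines `__eq__` without also defining `__hash__` becomes unhashable, so instances of this stand-in cannot be put in a set or used as dict keys. Whether the real `Violation` class defines `__hash__` is not visible in this hunk.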
2 changes: 1 addition & 1 deletion setup.cfg
@@ -1,6 +1,6 @@
 [metadata]
 name = pydoclint
-version = 0.5.16
+version = 0.5.17
 description = A Python docstring linter that checks arguments, returns, yields, and raises sections
 long_description = file: README.md
 long_description_content_type = text/markdown