Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Consider space-only lines to be empty #220

Merged
merged 4 commits into from
Jan 14, 2024
Merged

Conversation

pawamoy
Copy link
Member

@pawamoy pawamoy commented Oct 18, 2023

Related to PR #219: #219

@machow

@machow
Copy link
Contributor

machow commented Oct 19, 2023

Whoops, I think I left my comment directly on the commit, rather than the PR 🤷‍♂️

b9bdee6#r130433541

I think checking for " " * 4 is accommodating parameter descriptions that add linebreaks inside themselves (e.g. the description spreads itself out with newlines, but keeps indenting by 4 so it's clear it's still going).

And then the allow_section_blank_line is accommodating an extra blankline between e.g. the documentation for two parameters.

@pawamoy
Copy link
Member Author

pawamoy commented Oct 19, 2023

You're right, but I think the previous order of conditions was not handling both cases correctly, and now it is.

Previously:

  • if line is indented, consider it's a continuation line (this condition catches lines composed only of spaces!)
  • else if line is empty, consider empty line (in accordance with blank line option) -> eaten by previous condition!

Now:

  • if line is empty, consider empty line (in accordance with blank line option)
  • if line is indented, consider it's a continuation line

Single blank lines are still supported in the middle of parameter descriptions.

EDIT: or maybe not... I'll add tests to confirm.

@pawamoy
Copy link
Member Author

pawamoy commented Oct 19, 2023

Seems like there's a single issue:

    docstring = """
        Examples
        --------
        Line 1.

        Line 2.
    """
    sections, _ = parse_numpy(docstring, allow_section_blank_line=False)
    assert len(sections) == 2

This fails.

@machow
Copy link
Contributor

machow commented Oct 19, 2023

if line is indented, consider it's a continuation line

It seems like this won't be routed to right place, since _is_empty_line() uses .strip()

@pawamoy
Copy link
Member Author

pawamoy commented Oct 19, 2023

I'm not sure what you mean in the last comment. And I'm not sure my own comment was clear 😅

I'll try to restate things more clearly.

IMO it looks like the allow_section_blank_line option does not work correctly.
With this docstring:

Examples
--------
Line 1.

Line 2.

...and allow_section_blank_line=False, it should parse as two sections, one example section and one text section. But it parses as a single example section.

Then, in parameter descriptions (or any other kind of item, like raised exceptions or emitted warnings), the option is currently "baked-in": whatever its value, a single blank line does not break the item or section, and two blank lines break the item and section.

I'd say this behavior makes sense, but this can be debated. I don't see why anyone would use two blank lines instead of one if they don't want to break a section... unless they use Markdown extensions that insert dynamic contents in the docstring, with unchecked leading/trailing new lines?

(For sections themselves, I previously decided it should break with a single blank line by default to retain the previous behavior, for backward compatibility)

Maybe the allow_section_blank_line should also have an effect on item descriptions as well as sections?

Damn, that sounds so complicated for a single docstring parser 😅

IMO we just have to fix the failing case, and we're good to go.

@machow
Copy link
Contributor

machow commented Oct 20, 2023

As far as I know, the notion of ending a section based on linebreaks only applies to certain contexts in the parser:

  • _read_block(): empty lines can't trigger a new section. A section ends when a new one is detected. This is because numpydoc sections like Examples can contain a lot of free text, including many empty lines.

  • _read_block_items(): empty lines can trigger a new section (due to the original parser implementation; it makes sense you need to decide when the parameter items stop). In your original implementation, though, a line starting with 4 spaces is not considered empty (in the blame, the 4 space rule is from 2 years ago). This is because 4 spaces is a way to indicate that a parameter item description is still going.

Examples

Since _read_block() doesn't use empty lines to trigger new sections, the example docstring you shared should be a single section:

Examples
--------
Line 1.

Line 2.

Since _read_block_items() originally allowed starting a line with 4 spaces to indicate a line is not empty, this was originally read as a single section:

Parameters
------------
x: int
    a description
    # <- note the 4 spaces, but pretend this comment doesn't exist
    # <- note the 4 spaces, but pretend this comment doesn't exist
    more description

The change allow_section_blank_line made was to allow this to be a single section:

Parameters
-----------
x: int
    the x description
# <- note no spaces, but pretend this comment doesn't exist
y: int
    the y description

I could be very wrong here!

I've tried to read through the git blame for the parser to get a feel for the original behavior, but I could be missing something 😅. Overall, though, I think the PR may be implementing new, more aggressive terminating of sections, and with numpydoc empty lines are often not meant to signal that a section is ending.

Thanks for taking the time read through all this, and improve the numpy parser! I'm happy to dig deeper where useful!

@pawamoy
Copy link
Member Author

pawamoy commented Oct 25, 2023

Thanks for the feedback. It all makes sense. It guess it boils down to: do we want to allow prose in between sections? Which the Numpydoc style guide does not explicitly allow. If not, then we don't have to check for blank lines anywhere, since our only section delimiters are sections titles (and the dash line after them).

However I'm strong on "line with spaces only is an empty/blank line", because linters will report these lines, and formatters will remove the spaces, and therefore users shouldn't have to rely on those. This fits well with "no prose in-between sections" anyway. Should we move towards this?

@machow
Copy link
Contributor

machow commented Oct 31, 2023

Hey, sorry for the wait -- I tried out a bunch of docstrings in sphinx, and also afaict it also doesn't care if you have a line that's just 4 spaces. Let me test a little bit more, but I think I'm coming around to...

However I'm strong on "line with spaces only is an empty/blank line"

@pawamoy
Copy link
Member Author

pawamoy commented Nov 11, 2023

How are your tests going 🙂?

@pawamoy
Copy link
Member Author

pawamoy commented Jan 14, 2024

Hey @machow, just letting you know that I'll move forward with this.

@pawamoy pawamoy merged commit 8c57354 into main Jan 14, 2024
33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants