Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preview style feedback: Parenthesizing long dict, conditional, and type annotations #4123

Open
MichaReiser opened this issue Dec 22, 2023 · 6 comments
Labels
C: preview style Issues with the preview and unstable style. Add the name of the responsible feature in the title. T: style What do we want Blackened code to look like?

Comments

@MichaReiser
Copy link

MichaReiser commented Dec 22, 2023

This is not a proposal for a specific style change but feedback related to the parenthesize_long_type_hints, wrap_long_dict_values_in_parens and parenthesize_conditional_expressions preview styles.

I started implementing the said preview styles in Ruff. Implementing these styles proved challenging, making me take one step back and evaluate the proposed design changes. I concluded that the preview styles improve the overall formatting but don’t fix the underlying problem. I believe that improving the formatting of binary-like expressions, conditional expressions, and potentially other expressions in parenthesized contexts won't just help improve readability in specific parent expressions but improve the readability overall. My goal isn't to stop you from shipping the preview styles as they are (they are improvements in most cases), but I hope to learn more about the design decisions and to align on a future style that addresses the shortcomings outlined in this issue if there's interest. I haven't worked out a specific style proposal for binary expressions and conditional expressions, and it is possible that changing the formatting proves to either a) not match Black's principles or b) be too disruptive to change now.

I start with summarizing the relevant preview styles before talking about concrete feedback. The Black principles that I mention are the principles that I understand from reverse engineering Black. There's a chance that I inferred them incorrectly or that they are unintentional.

Relevant Preview Styles

Parenthesize long type hints

Parenthesize long-type hints to improve separation between arguments.

Stable:

def foo(
    i: int,
    x: Loooooooooooooooooooooooong
    | Looooooooooooooooong
    | Looooooooooooooooooooong
    | Looooooong,
    s: str,
) -> None:
    pass

Preview

def foo(
    x: (
        Loooooooooooooooooooooooong
        | Looooooooooooooooong
        | Looooooooooooooooooooong
        | Looooooong
    ),
    s: str,
) -> None:
    pass

This style also applies to annotated assignments and can help to keep the assignment in the preferred line length:

Stable

x: Loooooooooooooooooooooooong | Looooooooooooooooong | Looooooooooooooooooooo | Looooooong = (
    495
)

Preview

x: (
    Loooooooooooooooooooooooong
    | Looooooooooooooooong
    | Looooooooooooooooooooo
    | Looooooong
) = 495

One important difference between type annotations in annotated assignments and type annotations in function parameters is that type annotations in annotated assignments must be parenthesized, or splitting the expressions over multiple lines may introduce syntax errors (except the expression comes with its own pair of parentheses). Parenthesizing isn’t technically required inside of function parameters because the type annotation is in a parenthesized context (the parameters are always parenthesized, except in lambdas, but they don’t allow type annotations because the : would be ambiguous).

The preview style omits the parentheses if the expression starts or ends with a parenthesized expression (can_omit_optional_parentheses) or is an attribute chain (the same as for expressions in clause headers). The following examples show the formatting with the preview style enabled:

def test(
    argument: VeryLongClassNameWithAwkwardGenericSubtype[
        integeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeer,
        VeryLongClassNameWithAwkwardGenericSubtype,
        str,
    ] = True,
    argument2=None,
):
    pass

def test(
    argument: int | [
        VeryLongClassNameWithAwkwardGeneric,
        VeryLongClassNameWithAwkwardGenericSubtype,
    ] = True,
    argument2=None,
):
    pass

def test(
    argument: [
        VeryLongClassNameWithAwkwardGeneric,
        VeryLongClassNameWithAwkwardGenericSubtype,
    ] | int = True,
    argument2=None,
):
    pass

def test(
    argument: [
        str,
        VeryLongClassNameWithAwkwardGenericSubtype,
    ] | VeryLongClassNameWithAwkwardGenericSubtype[int] = True,
    argument2=None,
):
    pass

def test(
    argument: VeryLongClassNameWithAwkwardGenericSubtype[
        int
    ].VeryLongClassNameWithAwkwardGenericSubtype[str] = True,
    argument2=None,
):
    pass

Not adding the parentheses if splitting after the parenthesized sub-expression is sufficient and improves readability, except when content follows after the first parenthesized expression (last three examples) because it suffers from the same poor readability as the existing stable style.

Parenthesize long dictionary values

Parenthesise long dictionary values to improve separation between dictionary entries:

Stable

{
    aaaaaaaaaa: babbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
    * a
    * call(
        "default",
    ),
    bbbbbbbbbbbbbbbbbbb: 23,
}

It’s hard to tell where the value aaa... ends and the bb... entry starts.

Preview

The value of aaa... gets parenthesized:

{
    aaaaaaaaaa: (
        babbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
        * a
        * call(
            "default",
        )
    ),
    bbbbbbbbbbbbbbbbbbb: 23,
}

Black omits parentheses when the dictionary value starts or ends with a parenthesized expression, similar to parenthesize long type hints and the formatting of expressions in clause headers. The following example is formatted with Black’s new preview style enabled:

{
    aaaaaaaaaa: babbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb * a + call(
        "default",
    ),
    bbbbbbbbbbbbbbbbbbb: 23,
}

Parenthesize conditional expressions

This style is only related in that it adds parentheses around long sub-expressions. It differs from the above preview styles in that it always adds parentheses, even if the expression starts or ends with a parenthesized expression.

Stable

[
    "____________________________",
    "foo",
    "bar",
    "baz"
    if some_really_looooooooong_variable
    else "some other looooooooooooooong value",
]

Preview

[
    "____________________________",
    "foo",
    "bar",
    (
        "baz"
        if some_really_looooooooong_variable
        else "some other looooooooooooooong value"
    ),
]

Style Feedback

Not a local issue

While parenthesizing long sub-expressions inside dictionary values and type annotations clearly improves readability, it only solves some of the problems but not all. The very same issue exists at least for:

Parameter default values

def test(
    aaaaaaaaaa=babbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
    * a
    * call("default"),
    b=None,
):
    pass

Slice Indices

a[
    aaaaaaaaaa
    * babbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
    * a
    * call("default"),
    b,
]

Conditional Expressions

x = (
    aaaaaaaaaa
    * babbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
    * a
    * call("default")
    if cccccccccccccc
    * dddddddddddddddddddddddddddddddddddddddddddddd
    * e
    * call("default")
    else ffffffffffffffffff
    * gggggggggggggggggggggggggggggggggggggg
    * h
    * call("default")
)

Lists

[
    babbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
    * a
    * call(
        "default",
    ),
    23,
]

With Items

with (
    aaaaaaaaaa
    * babbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
    * a
    * call("default") as x,
    yyyyyyyyyyyyy,
    zzzzzzzzzzzzzzz,
):
    pass

I suspect the problem is even more common and applies to all places where Black formats multiple expressions in an indented context, and the sub-expression doesn’t come with its own parentheses.

The issue isn’t specific to binary expression. It also applies to conditional expressions (fixed by the parenthesize long conditional expressions preview style) and long call chains.

def test(
    aaaaaaaaaa=ibabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
    if a
    else call(
        "default",
    ),
    bbbbbbbbbbbbbbbbbbb=23,
):
    pass

def test(
    aaaaaaaaaa=ibabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb(1, 2)
    .aa.bbbbbbbbbb("default")
    .ddddd,
    bbbbbbbbbbbbbbbbbbb=23,
):
    pass

That’s why I believe this is a larger problem about how sub-expressions (binary expressions, conditional expressions) should be formatted. For example, I find it very hard to parse the following binary expression.

(
    aaaaaaaaa
    + bbbbbbbbbbbb
    * ccccccccccccccc
    * ddddddddddddddd
    * xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    + yyyyyyyyyyyy
    + zzzzzzzzzzzzzz
)

Adding some extra space (similar to Prettier and rustfmt) greatly helps.

(
    aaaaaaaaa
        + bbbbbbbbbbbb
            * ccccccccccccccc
            * ddddddddddddddd
            * xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
        + yyyyyyyyyyyy
        + zzzzzzzzzzzzzz
)

That’s why I believe the problem is mainly about how we format some expressions rather than whether they should be parenthesized if embedded in other expressions and statements.

Doesn’t solve the problem entirely

Black omits parentheses for some expressions. This can lead to the very same problem that the new preview styles try to avoid:

{
    aaaaaaaaaa: call(
        a_call,
        with_plenty,
        arguments,
        so_that_it_splits,
    )
    .length.more,
    bbbbbbbbbbbbbbbbbbb: 23,
}

{
    aaaaaaaaaa: x.aaaaaaaaaaa[
        1, 2, 3, 4
    ].bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb,
    bbbbbbbbbbbbbbbbbbb: 23,
}

These examples are uncommon, but they demonstrate that omitting parentheses when the first (or any middle part) is parenthesized surfaces the same problem the new styles intended to solve.

Normalizing parentheses of sub-expressions sets a new precedence.

Black puts a lot of effort into avoiding parenthesizing expressions when not necessary. This also applies to the new preview styles discussed above: They avoid parenthesizing the expression when it starts or ends with an expression with its own set of parentheses. Black also has a long history of preserving parentheses around sub-expressions and only adding/removing parentheses for top-level expressions.

The new preview style formatting breaks with both these principles:

  • Black adds parentheses even if the expression is already in a parenthesized context.
  • Black now normalizes parentheses around dictionary values, conditional expressions, and type annotations.

I would love to see Black normalize more parentheses (as long as it doesn’t change semantics). But I think this change should be made holistically rather than in one-off places.

I agree that adding parentheses helps improve readability, but I think indenting yields similar readability improvements with less clutter and is more in line with Black’s principles:

# Preview
{
    x: (
        aaaaaaaaa
        + bbbbbbbbbbbb
        * ccccccccccccccc
        * ddddddddddddddd
        * xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
        + yyyyyyyyyyyy
        + zzzzzzzzzzzzzz
    ),
    s: str,
}

# Proposed

{
    x: aaaaaaaaa
        + bbbbbbbbbbbb
            * ccccccccccccccc
            * ddddddddddddddd
            * xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
        + yyyyyyyyyyyy
        + zzzzzzzzzzzzzz,
    s: str,
}

For comparison

Prettier:

({
  x:
    aaaaaaaaa +
    bbbbbbbbbbbb *
      ccccccccccccccc *
      ddddddddddddddd *
      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx +
    yyyyyyyyyyyy +
    zzzzzzzzzzzzzz,
  s: str,
});

Rust:

FormatModuleError {
    x: aaaaaaaaa
        + bbbbbbbbbbbb
            * ccccccccccccccc
            * ddddddddddddddd
            * xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
        + yyyyyyyyyyyy
        + zzzzzzzzzzzzzz,
};

In parentheses formatting

Black differentiates between two contexts when it comes to formatting binary expressions (applies to a few others, but we can ignore them for simplicity):

  • Unparenthesized: Black prefers splitting parenthesized expressions (list, dict, calls, subscript) over splitting any binary expressions:

    aaaaaaaaaaaaaaaaaaaaaaaaaaaaa * ccccccccccccccccccc + ccccccaaaaaaaaall(
    	123, 345, 6789, 10000
    )
  • Parenthesized: Black splits around the binary expression operators before splitting the operands if the entire expression is in a parenthesized context.

    a = [
    	aaaaaaaaaaaaaaaaaaaaaaaaaaaaa * ccccccccccccccccccc
    	+ ccccccaaaaaaaaall(123, 345, 6789, 10000)
    ]

The motivation for the two different layouts (as far as I understand it) is that splitting around binary expression operators is only valid if the expression is inside parentheses because otherwise, it results in invalid syntax.

Black’s preview styles now apply the “unparenthesized” layout even in parenthesized contexts:

{
    aa: aaaaaaaaaaaaaaaaaaaaaaaaaaaaa * ccccccccccccccccccc + ccccccaaaaaaaaall(
        123, 345, 6789, 10000
    )
}

Over the “parenthesized” layout (note: I added a manual indent for the second line.):

{
    aa: aaaaaaaaaaaaaaaaaaaaaaaaaaaaa * ccccccccccccccccccc
        + ccccccaaaaaaaaall(123, 345, 6789, 10000)
}

Both styles feel acceptable to me, but using the parenthesized layout feels more consistent with how Black formats the same expression outside of dictionaries.

Summary

The main shortcoming is how binary and conditional expressions are formatted today and, only to a lesser extent, how they are formatted when used as a sub-expression. That’s why I believe that improving the formatting of the said expressions is more impactful than parenthesizing them in some, but not all, contexts. I also believe that improving the formatting increases consistency and reduces the formatter’s complexity (this may not be true for Black, but applies to Ruff) because it avoids making exceptions to some of Black’s principles.

The downside is that tackling the problem holistically requires working out a concrete formatting proposal and takes more time. While I believe that we can do better when it comes to formatting binary expressions, it’s worth considering that it is a very disruptive change. While it would improve overall formatting, there will be cases were it formats worse than it used to.

@jakkdl
Copy link
Contributor

jakkdl commented Dec 22, 2023

Discussions that's happened in Black, for context:
#2316 - main issue discussing different ways of handling long type hints
#3930 - PR that handled long type hints in a different way, abandoned after seeing the diffs on real-life code where they weren't made more readable on average.

When I implemented #3899 I did note a couple related examples that aren't handled well at the moment
https://github.com/jakkdl/black/blob/391a41334100d1640859090a1b55862190680265/tests/data/preview_py_310/pep604_union_types_line_breaks.py#L22-L45

I'm not a seasoned contributor to Black and not especially familiar with the code base, so when I implemented parenthesize_long_type_hints I opted for a minimal change, but I agree that there should probably be a larger overhaul.

@MichaReiser
Copy link
Author

@jakkdl thanks for linking the issues and giving me some more context.

PR that handled long type hints in a different way, abandoned after seeing the diffs on real-life code where they weren't made more readable on average.

I saw the PR but haven't been able to look at the diff shade report because the git logs have expired ;(

I'm not a seasoned contributor to Black and not especially familiar with the code base, so when I implemented parenthesize_long_type_hints I opted for a minimal change, but I agree that there should probably be a larger overhaul.

I think that's very reasonable and I think it's worth shipping the changes for Black because it does improve readability in many cases. For ruff, it's a bit more complicated because implementing requires some significant changes that don't feel justified, especially if we want to change the formatting again later.

@jakkdl
Copy link
Contributor

jakkdl commented Dec 24, 2023

@jakkdl thanks for linking the issues and giving me some more context.

PR that handled long type hints in a different way, abandoned after seeing the diffs on real-life code where they weren't made more readable on average.

I saw the PR but haven't been able to look at the diff shade report because the git logs have expired ;(

You should be able to view them at https://github.com/psf/black/actions/runs/6442045741/job/17492459124?pr=3930 in the "Generate HTML diff report" step

I'm not a seasoned contributor to Black and not especially familiar with the code base, so when I implemented parenthesize_long_type_hints I opted for a minimal change, but I agree that there should probably be a larger overhaul.

I think that's very reasonable and I think it's worth shipping the changes for Black because it does improve readability in many cases. For ruff, it's a bit more complicated because implementing requires some significant changes that don't feel justified, especially if we want to change the formatting again later.

👍

@hauntsaninja hauntsaninja added the C: preview style Issues with the preview and unstable style. Add the name of the responsible feature in the title. label Dec 28, 2023
@JelleZijlstra
Copy link
Collaborator

Thanks, it's a good idea to think about these changes more generally. We often make changes that are limited to one syntactic context, when the same change could really make sense in other contexts too. We're in practice limited to those changes that people are interested enough to send us; I personally don't have the time or inclination to work on bigger changes directly.

We're likely to actually drop wrap_long_dict_values_in_parens because of some issues similar to the ones you mentioned.

Looking at your proposed

{
    x: aaaaaaaaa
        + bbbbbbbbbbbb
            * ccccccccccccccc
            * ddddddddddddddd
            * xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
        + yyyyyyyyyyyy
        + zzzzzzzzzzzzzz,
    s: str,
}

I feel it makes the structure of the code less clear than a solution with more parentheses:

{
    x: (
        aaaaaaaaa
        + (
            bbbbbbbbbbbb
            * ccccccccccccccc
            * ddddddddddddddd
            * xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
        )
        + yyyyyyyyyyyy
        + zzzzzzzzzzzzzz
    ),
    s: str,
}

In this second variant, the structure of the code mirrors the AST more directly, which makes it easier to understand (for me at least!).

@MichaReiser
Copy link
Author

Hy @JelleZijlstra

Thanks, it's a good idea to think about these changes more generally. We often make changes that are limited to one syntactic context, when the same change could really make sense in other contexts too. We're in practice limited to those changes that people are interested enough to send us; I personally don't have the time or inclination to work on bigger changes directly.

That makes sense to me and is probably also a good way to test changes before applying them more broadly.

I like what you're proposing. I would probably omit the outermost parentheses which gets you very close to what prettier does

{
    x: aaaaaaaaa
        + (
            bbbbbbbbbbbb
            * ccccccccccccccc
            * ddddddddddddddd
            * xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
        )
        + yyyyyyyyyyyy
        + zzzzzzzzzzzzzz,
    s: str,
}

Prettier:

a = {
  x:
    aaaaaaaaa +
    bbbbbbbbbbbb *
      ccccccccccccccc *
      ddddddddddddddd *
      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx +
    yyyyyyyyyyyy +
    zzzzzzzzzzzzzz,
  s: str,
};

I have long wondered why Black isn't more opinionated about where it inserts (or removes) parentheses. Prettier normalizes all parentheses for you and I always liked that.

I believe to now understand the reasons why Black doesn't normalize parentheses: Inserting parentheses in binary expressions is safe, but removing them might not be. @konstin pointed this out when I sought feedback for alternative designs for parenthesize_long_type_hints.. The problem is that Python supports operator overloading, and, to my knowledge, it allows you to overload them in arbitrary ways, even such that violate the operator's associativity. This problem has come up more recently in a lint rule that converts unnecessary dunder calls to their corresponding operators source and the example @konstin shared internally is:

class A:
    def __init__(self, x: int):
        self.x = x

    def __or__(self, other: "A"):
        return A(self.x - other.x)


def f(param: A(2) | A(3) | A(4)):
    print("hi")


def g(param: A(2) | (A(3) | A(4))):
    print("hi")


if __name__ == '__main__':
    print(f.__annotations__["param"].x)
    print(g.__annotations__["param"].x)

that prints prints -5 and 3

I fear that this limits us where we can insert parentheses because we can't remove them without risking changing the program's semantics, and not removing them violates reversibility.

Do you know if that's the reason why Black doesn't normalize parentheses today?

@tylerlaprade
Copy link

I feel it makes the structure of the code less clear than a solution with more parentheses

I'm only a single data point, but for me the un-parenthesized version is more readable. Since Python already ascribes meaning to indentation, it's natural for me to assume that everything indented is a single term of the expression, according to order of operations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C: preview style Issues with the preview and unstable style. Add the name of the responsible feature in the title. T: style What do we want Blackened code to look like?
Projects
None yet
Development

No branches or pull requests

5 participants