Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import format changes based on line ending type #1566

Closed
VakarisZ opened this issue Oct 16, 2020 · 6 comments
Closed

Import format changes based on line ending type #1566

VakarisZ opened this issue Oct 16, 2020 · 6 comments
Labels
bug Something isn't working Hacktoberfest help wanted Extra attention is needed
Milestone

Comments

@VakarisZ
Copy link

VakarisZ commented Oct 16, 2020

Import

from impacket.smb3structs import (
    SMB2_CREATE, SMB2_FLAGS_DFS_OPERATIONS, SMB2_IL_IMPERSONATION, SMB2_OPLOCK_LEVEL_NONE, SMB2Create,
    SMB2Create_Response, SMB2Packet)

checked with command python -m isort . -c -l 120 --wl 120 passes on dos line endings, but with unix line endings it fails because it expects

from impacket.smb3structs import (SMB2_CREATE, SMB2_FLAGS_DFS_OPERATIONS, SMB2_IL_IMPERSONATION, SMB2_OPLOCK_LEVEL_NONE,
                                  SMB2Create, SMB2Create_Response, SMB2Packet)

Visually, both line endings look the same, so I'd expect them to be treated the same (either as 1 or as 2 characters).

@timothycrosley timothycrosley added bug Something isn't working Hacktoberfest help wanted Extra attention is needed labels Oct 17, 2020
@anirudnits
Copy link
Collaborator

From my brief inspection, I believe the issue lies here

text = TextIOWrapper(buffer, encoding, line_buffering=True, newline="")

with newline="". From python docs:

When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n',
'\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newlines mode is enabled,but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.

I'm not sure if this the exact reason and I would delve a bit deeper when I get time.

@areeh
Copy link

areeh commented Oct 23, 2020

@anirudnits
Bear in mind that setting relates to isort's history of changing newlines, so while that might fix the 1 vs 2 character issue, it might bring back the changing line endings issue. See eg: #493 #671

@areeh
Copy link

areeh commented Oct 23, 2020

For reference, here's how Black appears to solve both issues at once (edited heavily for brevity):

with open(src, "rb") as buf:
    src_contents, encoding, newline = decode_bytes(buf.read())

dst_contents = format_file_contents(src_contents)

with open(src, "w", encoding=encoding, newline=newline) as f:
    f.write(dst_contents)
def decode_bytes(src: bytes) -> Tuple[FileContent, Encoding, NewLine]:
    """Return a tuple of (decoded_contents, encoding, newline).
    `newline` is either CRLF or LF but `decoded_contents` is decoded with
    universal newlines (i.e. only contains LF).
    """
    srcbuf = io.BytesIO(src)
    encoding, lines = detect_encoding(srcbuf.readline)
    if not lines:
        return "", encoding, "\n"

    newline = "\r\n" if b"\r\n" == lines[0][-2:] else "\n"
    srcbuf.seek(0)
    with io.TextIOWrapper(srcbuf, encoding) as tiow:
        return tiow.read(), encoding, newline

Note that they preserve newlines for output, and use only LF internally (which should give consistent formatting in this case)

Actual implementation in Black source code:
https://github.com/psf/black/blob/407052724fa1c97ee8bcd4e96de650def00be03e/src/black/__init__.py#L1012
https://github.com/psf/black/blob/407052724fa1c97ee8bcd4e96de650def00be03e/src/black/__init__.py#L806

@areeh areeh mentioned this issue Oct 23, 2020
@areeh
Copy link

areeh commented Dec 2, 2020

I abandoned my PR for a couple of reasons:

  • Black handles the newline issue in one place, I couldn't find a way to handle it without adding to StringIOs scattered throughout the lib (as a first time contributor, I want to limit my code changes)
  • I couldn't find a nice way to set newlines on the output when outputting to stdout (black explicitly decided not to handle this as far as I know)
  • the approach I was trying would give 1 type of newline for the entire output, which doesn't seem to match tests such as:
assert api.sort_code_string("import A\n\r\nimportA\n\n") == "import A\r\n\r\nimportA\r\n\n"

@timothycrosley
Copy link
Member

Hi @VakarisZ,

I'm sorry you experienced this error! This has been fixed in develop and will be released to PyPI in the 5.8.0 release of isort.

To address the thread about how newlines are handled that stemmed from this issue: isort can't do things exactly like Black does because it is streaming based, and doesn't load a full file at once. This is much more efficient and scalable but it does come with some costs. That said, isort does a good job of keeping track of the line separator used for a chunk of imports, and generally uses this information when checking line length - I tracked it down to a single place where the parsed line separator was ignored and \n hard coded, I then added an additional test case to ensure this doesn't pop up again.

Thanks!

~Timothy

@timothycrosley timothycrosley added this to the 5.8.0 milestone Mar 20, 2021
@VakarisZ
Copy link
Author

Thanks, @timothycrosley, you rock!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Hacktoberfest help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

4 participants