Add support for byte and unicode Literal strings #6087

Michael0x2a · 2018-12-19T09:03:55Z

This pull request adds support for byte and unicode Literal strings. I left in some comments explaining some nuances of the implementation; here are a few additional meta-notes:

I reworded several of the comments suggesting that the way we represent bytes as a string is a hack or that we should eventually switch to representing bytes as literally bytes.

Basically, I tried experimenting with that approach but ultimately rejected it: I ended up having to constantly serialize/deserialize between bytes and strings, which I felt complicated the code.

As a result, I decided that the solution we had previously is in fact, from a high-level perspective, the best possible approach. (The actual code for translating the output of typed_ast into a human-readable string is admittedly a bit hacky though.)

In any case, the phrase "how mypy currently parses the contents of bytes literals" is severely out-of-date anyways. That comment was added about 3 years ago, when we were adding the fast parser for the first time and running it concurrently with the actual parser.
I removed the is_stub field from fastparse2.ASTConverter: it turned out we were just never using that field.
One complication I ran into was figuring out how to handle forward references to literal strings. For example, suppose we have the type List["Literal['foo']"]. Do we treat this as being equivalent to List[Literal[u'foo']] or List[Literal[b'foo']]?

If this is a Python 3 file or a Python 2 file with unicode_literals, we'd want to pick the former. If this is a standard Python 2 file, we'd want to pick the latter.

In order to make this happen, I decided to use a heuristic where the type of the "outer" string decides the type of the "inner" string. For example:
- In Python 3, "Literal['foo']" is a unicode string. So, the inner Literal['foo'] will be treated as the same as Literal[u'foo'].
- The same thing happens when using Python 2 with unicode_literals.
- In Python 3, it is illegal to use a byte string as a forward reference. So, types like List[b"Literal['foo']"] are already illegal.
- In standard Python 2, "Literal['foo']" is a byte string. So the inner Literal['foo'] will be treated as the same as Literal[u'foo'].
I'll add tests validating that all of this stuff works as expected with incremental and fine-grained mode in a separate diff -- probably after fixing and landing Add tests for Literal types with incremental and fine-grained mode #6075, which I intend to use as a baseline foundation.

This pull request adds support for byte and unicode Literal strings. I left in some comments explaining some nuances of the implementation; here are a few additional meta-notes: 1. I reworded several of the comments suggesting that the way we represent bytes as a string is a "hack" or that we should eventually switch to representing bytes as literally bytes. I started with that approach but ultimately rejected it: I ended up having to constantly serialize/deserialize between bytes and strings, which I felt complicated the code. As a result, I decided that the solution we had previously is in fact, from a high-level perspective, the best possible approach. (The actual code for translating the output of `typed_ast` into a human-readable string *is* admittedly a bit hacky though.) In any case, the phrase "how mypy currently parses the contents of bytes literals" is severely out-of-date anyways. That comment was added about 3 years ago, when we were adding the fast parser for the first time and running it concurrently with the actual parser. 2. I removed the `is_stub` field from `fastparse2.ASTConverter`: it turned out we were just never using that field. 3. One complication I ran into was figuring out how to handle forward references to literal strings. For example, suppose we have the type `List["Literal['foo']"]`. Do we treat this as being equivalent to `List[Literal[u'foo']]` or `List[Literal[b'foo']]`? If this is a Python 3 file or a Python 2 file with `unicode_literals`, we'd want to pick the former. If this is a standard Python 2 file, we'd want to pick the latter. In order to make this happen, I decided to use a heuristic where the type of the "outer" string decides the type of the "inner" string. For example: - In Python 3, `"Literal['foo']"` is a unicode string. So, the inner `Literal['foo']` will be treated as the same as `Literal[u'foo']`. - The same thing happens when using Python 2 with `unicode_literals`. - In Python 3, it is illegal to use a byte string as a forward reference. So, types like `List[b"Literal['foo']"]` are already illegal. - In standard Python 2, `"Literal['foo']"` is a byte string. So the inner `Literal['foo']` will be treated as the same as `Literal[u'foo']`. 4. I will add tests validating that all of this stuff works as expected with incremental and fine-grained mode in a separate diff -- probably after fixing and landing python#6075, which I intend to use as a baseline foundation.

ilevkivskyi

Thanks, looks good! I only have documentation requests. Also please check if this PR fixes/helps any of the following:
#2536
#5098
#3619

ilevkivskyi · 2018-12-22T21:54:30Z

mypy/fastparse.py

+
+    If 'unicode_literals' is true, we assume that `Foo["blah"]` is equivalent
+    to `Foo[u"blah"]` (for both Python 2 and 3). Otherwise, we assume it's
+    equivalent to `Foo[b"blah"]`.


This comment is misleading. IIUC on Python 3 Literal["blah"] is equivalent to Literal[u"blah"], but from the last sentence it looks like it is Literal[b"blah"].

Probably my confusion stems from using unicode_literals for both future import and the argument name. Maybe make them different?

ilevkivskyi · 2018-12-22T22:00:21Z

mypy/nodes.py

@@ -1231,10 +1231,12 @@ class StrExpr(Expression):
    """String literal"""

    value = ''
+    from_python_2 = False


Please document this.

ilevkivskyi · 2018-12-22T22:01:52Z

mypy/types.py

@@ -1368,6 +1383,23 @@ def __eq__(self, other: object) -> bool:
        else:
            return NotImplemented

+    def value_repr(self) -> str:


Please add a docstring.

…dard before the Python 3.4 release

It fouls up the tests under Python 3.4, and is immaterial to the substance of the test anyways.

Michael0x2a · 2018-12-28T23:20:53Z

@ilevkivskyi -- alas, I don't think this PR helps resolve any of those issues. The only real non-literal-types related change this PR makes is modifying the way mypy handles parsing "string inside strings" -- when parsing types like List["Literal['foo']"].

That said, I think we might be able to close #2536 though it seems like it's mostly a duplicate of #5098 at this point. (And also, the code snippet Jukka included at the end doesn't work because it's using function annotations inside of Python 2 code, not because of anything to do with strings or unicode_literals.)

The suggestion in #3619 may be eventually superseded by python/typing#208, not sure.

Michael0x2a added 2 commits December 19, 2018 01:00

Add a few more comments

87b7045

Michael0x2a mentioned this pull request Dec 20, 2018

Literal types tracking issue #5935

Closed

42 tasks

ilevkivskyi approved these changes Dec 22, 2018

View reviewed changes

Michael0x2a added 3 commits December 28, 2018 13:37

Replace upside-down smiley with a character added to the unicode stan…

8d4c425

…dard before the Python 3.4 release

Respond to code review

88f931f

Remove second smiley from test

acccf03

It fouls up the tests under Python 3.4, and is immaterial to the substance of the test anyways.

Michael0x2a merged commit fcee66d into python:master Dec 28, 2018

Michael0x2a deleted the add-support-for-unicode-and-byte-literal-types branch December 28, 2018 23:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for byte and unicode Literal strings #6087

Add support for byte and unicode Literal strings #6087

Michael0x2a commented Dec 19, 2018

ilevkivskyi left a comment

ilevkivskyi Dec 22, 2018

ilevkivskyi Dec 22, 2018

ilevkivskyi Dec 22, 2018

Michael0x2a commented Dec 28, 2018

Add support for byte and unicode Literal strings #6087

Add support for byte and unicode Literal strings #6087

Conversation

Michael0x2a commented Dec 19, 2018

ilevkivskyi left a comment

Choose a reason for hiding this comment

ilevkivskyi Dec 22, 2018

Choose a reason for hiding this comment

ilevkivskyi Dec 22, 2018

Choose a reason for hiding this comment

ilevkivskyi Dec 22, 2018

Choose a reason for hiding this comment

Michael0x2a commented Dec 28, 2018