Literal carriage returns break parser under Python 3.x #153
Comments
Yeah, I think we should not use universal newlines. |
Sorry for the delay. The last couple of days were distracting. I think I know where the problem is, but I'll need to find a moment to test my hypothesis before I can give you a definitive answer. Also, I can't say for certain without reading the Vim source code to try to match whatever it does, but I'm guessing that the proper solution for detecting newlines would be to check for (I'll have to try generating a test Failing that, maybe splitting on |
I didn't have time to check vimlparser yet, but I did find a moment to look up Vim's documentation on the line-ending detection algorithm that we probably want to replicate: From
In other words, assuming that
For step 4, the best way to eliminate surprises would probably be to use |
That said, step 4 may not even be necessary because, a little further down, it looks like it's saying that it uses a simplified form of that algorithm for VimL that's sourced or in vimrc:
(Which, now that I think about it, would make sense. I'm on Linux and had to manually change the line endings from DOS to Unix on bwHomeEndAdv to get it to load properly.) In that case, I suppose the best solution would be to just follow that "systems with a Dos-like Sorry about not catching that before I made the previous post. I didn't sleep well and I'm just about to go back to bed. |
Sorry again for the delay. The problem is this function:

def viml_readfile(path):
    lines = []
    f = open(path)
    for line in f.readlines():
        lines.append(line.rstrip("\r\n"))
    f.close()
    return lines

...and it works if experimentally changed to this:

def viml_readfile(path):
    lines = []
    # Replicate Vim's algorithm for identifying DOS vs. UNIX line endings
    # in sourced scripts and .vimrc files
    with open(path, 'rb') as f:
        content = f.read().decode('utf-8')
    first_n = content.index('\n')
    if first_n > 0 and content[first_n - 1] == '\r':
        raw_lines = content.split('\r\n')
    else:
        raw_lines = content.split('\n')
    for line in raw_lines:
        lines.append(line.rstrip("\r\n"))
    return lines

...though, to be honest, that's still broken and was always broken, because there's another bug in the Python version of the parser that I noticed while writing that. My unconditional
The proper solution to match Vim is tricky because, again, it depends on stuff you can reconfigure external to the script, but it boils down to "Parse once using the system's default encoding as specified by My best guess at an algorithm that doesn't require a big table of regional encodings would be something like this:
That works for two reasons:
I'd have done that too, but the whole "search through the AST for |
The closest trivial hack I can think of would be to replace
Since all structural elements of VimL are 7-bit ASCII, that'll get you something that parses UTF-8 correctly and, if that fails, you instead get a correct AST but any tokens containing non-ASCII characters will be assumed to be latin1, even if that makes them gibberish. (Of course, not irretrievable gibberish. Someone could always do |
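A minimal sketch of that UTF-8-then-latin1 fallback, assuming the file has already been read as bytes. The helper name `decode_viml_source` is hypothetical and is not part of vimlparser:

```python
def decode_viml_source(raw: bytes) -> str:
    """Decode script bytes as UTF-8, falling back to latin1 on failure.

    Hypothetical helper for illustration only.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # latin1 maps every byte value 0x00-0xFF to a code point, so this
        # decode cannot fail. ASCII structure is preserved either way;
        # non-ASCII text may come out as the wrong characters.
        return raw.decode("latin-1")
```

Because the latin1 decode is a one-to-one byte mapping, the original bytes can be recovered afterwards with `text.encode("latin-1")` and decoded again with whatever encoding turns out to be correct.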
Does the
|
Sorry for the delayed response. I'll try to get you an answer soon. |
No. Substituting To avoid this back-and-forthing, here's a test file, containing the relevant two-line bit of my vimrc, plus a few extra lines of context to help make sure things are working properly: Vim parses it properly, as does vimlparser when run with Python 2.x, or when my proposed "Replicate Vim's algorithm for identifying DOS vs. UNIX line endings in sourced scripts and .vimrc files" modifications are run with Python 3.x. Otherwise, vimlparser with Python 3.x will produce the somewhat amusingly confusing failure mode of appearing to output nothing at all because the error message doesn't use If you use
For that reason, I'd also suggest changing the raise VimLParserException(Err(viml_printf("E492: Not an editor command: %s", self.reader.peekline()), self.ea.cmdpos)) ...that removes the need for
However, the successful output still looks truncated unless you run it through something like |
This chunk from my .vimrc causes python3 py/vimlparser.py ~/.vimrc to error out on my system.

Since this didn't show up in my initial search for VimL parsers for Python, I started work on a VimL parser of my own and I know why it happens.

The literal <CR>s in this piece of my .vimrc (the ^Ms)...

...get converted to \n by the universal newline support in Python 3.x's text mode. (I assume because a bare \r is the old Classic MacOS line terminator and Python's universal newlines mode is converting all potential newlines rather than limiting itself to whichever kind of newline it encounters first like Vim itself does.)

The quick hack I came up with to make the parser robust was to open the file in binary mode and then manually decode and split lines:
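As a small illustration of the behaviour described above (a sketch, not the snippet originally posted), the following writes a file containing a bare carriage return and reads it back both ways. The function names and the sample mapping are made up for the demonstration:

```python
import os
import tempfile


def read_text_mode(path):
    # Default text mode uses universal newlines: "\r", "\r\n" and "\n"
    # are all translated to "\n" on read.
    with open(path, encoding="utf-8") as f:
        return f.read()


def read_binary_mode(path):
    # Binary mode performs no newline translation; decode manually.
    with open(path, "rb") as f:
        return f.read().decode("utf-8")


def demo():
    # One logical line with a literal carriage return (^M) in the middle.
    data = b"nnoremap x :echo 1\rdd\n"
    fd, path = tempfile.mkstemp(suffix=".vim")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        return read_text_mode(path), read_binary_mode(path)
    finally:
        os.remove(path)
```

In text mode the single line comes back split in two at the former `\r`, while the binary read keeps the carriage return in place so the caller can decide how to split lines.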