-
-
Notifications
You must be signed in to change notification settings - Fork 397
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extra trailing bracket when default value is negative #894
Comments
If |
Gritty internal details from |
Hmm. The Ruby parser (which is reused for I can see that the scanner event for the |
I am just experimenting with I don't remember my parser theory clearly enough to know if this is the expected behavior for a LALR parser or not. But it is certainly plausible that the parser might need to use lookahead in some cases, but in other cases not. It's clear that YARD's approach of using the lexer events to determine "how far" it has reached in a source file, and how much of the textual source code "belongs to" an AST node, is very brittle. |
I can confirm this. The problem is that lexical tokens and AST nodes don't line up cleanly due to the order that Ripper emits events-- there's a lot of juggling and assumptions around the way source is reconstructed from these events. If someone knows a better way to track locations, I'm all for a rewrite of |
I've just been looking at I don't think it can. That is a serious deficiency. I also looked at Rubinius' parser, However, it looks like the |
I'm not a fan of bringing in unnecessary required dependencies, and it looks like |
I think the first order of business is to refactor The parser events generated by Unfortunately, this is exactly what YARD does. I would like to refactor Once that is done, it will be relatively easy to swap out |
I guess I don't really agree with where effort and complexity is being placed on solving this problem. It seems like there are simpler ways to solve this in a reasonable way than rewriting 500+ LOC, negatively impacting performance, and introducing a number of breaking API changes, in order to fix an extraneously emitted close paren character. More importantly, to the breaking changes point (and something I admittedly forgot to mention), is that Ripper cannot [easily] be ripped out of YARD. The entire handler architecture API is bound to it. Any change that removes Ripper would have to come with a compatibility layer to bring back pretty much all of the AST production nodes that Ripper produces, because that AST is the effective API for YARD's handler arch. Furthermore, those "extraneous" nodes actually are meaningful in many cases-- any AST that scrubbed out, for instance, how comments line up on columns, or removed extraneous whitespace nodes, could actually affect YARD's ability to pull comments from source-- or a plugin's ability to do the same. So, unfortunately, it's a non-starter to try and change the AST structure at this point in time. To be honest, there are tremendous benefits to sticking with Ripper, even if it is occasionally awkward, due to the fact that (a) it is supported as part of Ruby's core codebase, which means it's not going away, and (b) the fact that it's coupled to the grammar means it will always reflect the current syntax of Ruby in an accurate way. You can't get that from a third-party library like Refactor the code to be easier to maintain, sure, but I don't think switching libraries is an option right now. |
I've done some benchmarking and am convinced that none of the alternatives to What I am referring to as "extraneous nodes" are things like Removing unneeded nodes from the AST (actually, not generating them in the first place) will make your code more robust. Why? Because although you don't really care about those nodes, when This leaves the question of how to make I'm trying to think of a solution which is robust and clean, and won't require a ton of new code. The best thing I can think of right now is to see if the MRI guys will accept a patch which makes |
Actually, this is my point. Either one of these nodes may be important to certain plugins. Since YARD is a generalized library that can be used for code analysis, being able to easily identify empty method definitions (by quickly creating a handler that looks for I would suggest looking at this the other way around. Instead of treating these nodes as superfluous, I think it's more appropriate to treat YARD's AST implementation as incomplete. It is clear that YARD does not currently support all nodes made available via Ripper. Typically when implementing a full AST abstraction in an OO language, you would have one class per node, each with its proper attributes and behaviors modeled directly in code. Because we're relying on an s-exp form in this case, we're taking a lot of shortcuts for the sake of maintainability but at the cost of a complete robust implementation. The real solution here is to have a complete picture of Ripper's AST-- even if we ripped out "unnecessary" nodes, YARD is still not there yet. As for YARD's failing tests, if YARD is failing due to new nodes popping up in an otherwise backwards-compatible way, that's a deficiency in our implementation and should be fixed by being more accepting of new node configurations. If we start blacklisting nodes like
There are, you can look through the git blame / history on the file to see adjustments made in the past to deal with this kind of issue.
This is a pragmatic solution, IMO, and I've thought about trying that before. If Ripper gave information about source locations for semantic AST nodes, a ton of code could be ripped out of YARD. Even if it just provided a better way to collate token events with the nodes they are associated with, that would help a great deal. It would help plenty of other Ruby devs, too. |
There are several points I would like to respond to above, but let me ask this first: If the Ruby core devs are willing to accept a patch to I would like it if An alternative would be to move the invocation of "scanner" events out of the lexer, and into the parser. So rather than invoking scanner events as the tokens are recognized, invoke them as the tokens are reduced into parse nodes. This would make it totally unambiguous which tokens are part of which AST nodes. Any thoughts? |
Hmm, interesting discovery. The "tokens" passed to
|
Update - working on this, looking for error patterns.
Notice the repeated comma in the parsing of 1st parameter.... |
@lsegal Think I've got patch. Is there anywhere else in YARD where a parameter, value, whatever may be parsed? I'll check constant & cvar values, but I've seen |
@lsegal Well, I've got the issue fixed with named & unnamed parameters. But, there maybe issues with
I fed it an absurd method parameter string (every kind there is), and it may need some updating. Would you be opposed to moving that method into Most of its functionality comes from calling methods in Obviously, it will work either way. |
Generally speaking, yes, since the AST is not responsible for (re-)formatting its own nodes back into Ruby code. In fact, YARD doesn't even really deal with reformatting Ruby code except for this one edge case, which is specific to that particular handler, not the parser. I would recommend keeping it there, which avoids deprecation and still works correctly. |
@lsegal a test file of the fix, runs benchmark and rspec against current and patched, is located at: Along with the tests in the files, the patch passes YARD tests with 2.3.2 & 2.2.4 |
Do you mind opening this as a pull request (with just the fix / tests)? That would be easier to review and merge. |
@lsegal Be happy to. Seeing as test code is more personal that even 'normal' code, and that I used the file for my own testing, and I wasn't sure if you wanted a benchmark test. The test code has one method removed that was my 'inspect' code. The PR is changing code in the
So, where would you like me to add the tests? |
@MSP-Greg RipperParser is an internal class so it doesn't have its own test file IIRC. Anything but the MethodObject would be appropriate, the closer to the code the better, but don't create a new test file just for that. I think RubyParser has some catch-all tests for just this kind of thing. You can functional / regression test this instead of a pure unit style thing. Unit testing a parser is kind of crazy work. |
Although I've never been involved with a 'code' parser before, I was somewhat involved with XML and WebSocket at very early stages (that might date me somewhat). XML was interesting because, at times, there were very 'serious discussions' between the document centric and the data centric groups (I was a data guy). One could say that Ripper is designed to parse 'human readable' code into structures / sets / lists / groups / etc that are machine readable. One could reassemble the parts from those structures, but that has another set of issues. So, YARD's trying to keep track of ranges in the source. But since the Ripper is not really concerned with the 'human readable' delimiters, there's always going to be edge cases. Hence, as they are discovered, I don't have an issue with writing tests (of any type) for them, especially with Ruby 3 on the horizon.
Maybe add a ripper_parser_spec.rb file and put these tests and the percent array literal tests in it? Use it as a parser catch-all for 'edge cases'? BTW, I'll post the PR tomorrow. |
We just hit this with Sord (issue here), almost finished an issue before finding this one :) I'll leave it below since I already wrote it all out. My issue descriptionI discovered this while working on Sord. (issue here) Steps to reproduceCreate this file: module A
def x(a = -1)
# code
end
end
Actual OutputThe method parameter's default value is Expected OutputI'd expect the method parameter's default value to be Environment details:
I have read the Contributing Guide. |
Closing out some old issues. Given the significant age of this, it would be best to re-open with new context if it's still relevant. |
For the following method:
This documentation is generated:
Note the extra bracket:
a(b = -1))
The text was updated successfully, but these errors were encountered: