Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Heredoc parameters are not parsed correctly #779

Closed
Sharpie opened this issue Jun 2, 2014 · 11 comments
Closed

Heredoc parameters are not parsed correctly #779

Sharpie opened this issue Jun 2, 2014 · 11 comments
Labels

Comments

@Sharpie
Copy link
Contributor

Sharpie commented Jun 2, 2014

I'm working on documenting some Ruby source code that makes heavy use of Heredocs as method arguments. However, YARD doesn't quite parse these correctly.

Given an example:

# @param foo_param [String]
def foo(foo_param = <<-EOS)
Some default string
EOS

  puts foo_param
end

# @param bar_param [String]
def bar(bar_param = <<-EOS
Another default string
EOS
)

  puts bar_param
end

However, YARD ends up capturing a couple extra lines on each side of the Heredoc which results in the following output:

yard_heredoc

The interesting part is that Ripper.sexp is able to correctly parse the parameter value out of the source:

> Ripper.sexp <<-EORUBY
* # @param foo_param [String]
* def foo(foo_param = <<-EOS)
* Some default string
* EOS
* 
*   puts foo_param
* end
* EORUBY

=> [:program,
 [[:def,
   [:@ident, "foo", [2, 4]],
   [:paren,
    [:params,
     nil,
     [[[:@ident, "foo_param", [2, 8]],
       [:string_literal,
        [:string_content,
         [:@tstring_content, "Some default string\n", [3, 0]]]]]],
     nil,
     nil,
     nil]],
   [:bodystmt,
    [[:command,
      [:@ident, "puts", [6, 2]],
      [:args_add_block, [[:var_ref, [:@ident, "foo_param", [6, 7]]]],
    nil,
    nil,
    nil]]]]

Also, running YARD with the --legacy parser seems to correctly parse the defaults. So, it seems that this may be a bug with the implementation of the RubyParser.

This behavior was observed using YARD 0.8.7.4 running under Ruby 1.9.3-p547.

@lsegal
Copy link
Owner

lsegal commented Jun 3, 2014

I'll accept a patch + test case if you can fix this.

@lsegal
Copy link
Owner

lsegal commented Jun 3, 2014

I'm also curious what your expectation is here-- I know there's an off-by-one error on the position, but I'm not really sure what we should be displaying in the case of foo. If we fixed the off by one, the signature would be:

foo(foo_param = <<-EOS) See default string EOS)

Which would not be what you want either.

@Sharpie
Copy link
Contributor Author

Sharpie commented Jun 3, 2014

I spent some time staring at the RubyParser code last weekend, but I don't know the internal mechanics well enough to determine a fix --- this is my first contact with the RubyParser code base. I'll keep trying, but it might take a while.

As for expected output, I would expect the YARD parser to return the same parameter value that Ripper does:

[[[:@ident, "foo_param", [2, 8]],
       [:string_literal,
        [:string_content,
         [:@tstring_content, "Some default string\n", [3, 0]]]]]]

That is: Some default string\n

@lsegal
Copy link
Owner

lsegal commented Jun 3, 2014

I guess I disagree with the expectation here. YARD isn't supposed to be displaying the interpolated string value, it's supposed to be displaying the raw source of your signature line. That's the logic we have implemented. We take the start offset of the method signature up until the end offset (right before the first body statement) and grab all of the source code between those two points-- that's how we get the result we do, and that's how we do it every time.

Basically, if "<<-EOS ...." is in the signature line, that's what YARD should be showing, and that's what it is showing (just with an off-by-one error). This is why I think you're not going to get what you want even if we fix the offset bug. It's the legacy parser that is doing something funky here by changing the resolved syntax of heredoc tokens when tokenizing. That's old behavior.

The only way to make your expected behavior work in the new ripper backed parser would be to not display the source between the "def" and the body statements and instead interpolate tokens. That seems unlikely to happen.

FWIW you can work around this with an @overload tag:

# @overload foo(x="Hello")
def foo(x=<<-EOS)
Hello
EOS
end

You can, of course, also not use heredocs in signature lines, as you want your signature to be a one liner anyway.

@Sharpie
Copy link
Contributor Author

Sharpie commented Jun 4, 2014

I've been using function definitions as an example, but my eventual goal is to parse Puppet Function Definitions using a custom YARD handler. The situation is very similar to the methodify example in the YARD DSL guide.

However, Puppet Functions have the unfortunate quirk of having their docstring passed to the method that creates them:

module Puppet::Parser::Functions
  newfunction(:validate_bool, :doc => <<-'ENDHEREDOC') do |args|
    Validate that all passed values are either true or false. Abort catalog
    compilation if any value fails this check.

    The following values will pass:

        $iamtrue = true
        validate_bool(true)
        validate_bool(true, true, false, $iamtrue)

    The following values will fail, causing compilation to abort:

        $some_array = [ true ]
        validate_bool("false")
        validate_bool("true")
        validate_bool($some_array)

    ENDHEREDOC

    # Function body ...

  end
end

The idea is to write a YARD handler that extracts the docstring from the method call and attaches it to the object:

class ParserFunctionHandler < YARD::Handlers::Ruby::Base
  handles method_call(:newfunction)

  process do
    func_name, docstring = process_parameters(statement.parameters(false))

    obj = MethodObject.new(:root, func_name)
    register obj
    register_docstring(obj, docstring, nil)
  end
end

However, I got confused when I noticed that statement.parameters included the heredoc delimiters while Ripper.sexp statement.source only included the string content. The behavior of the RubyParser makes sense if the goal is to extract the exact signature definition from the source and not just the string contents.

For my usecase, I may be able to assume that if the docstring starts with <<- a line should be removed from the beginning and end of the docstring. Does this sound like a reasonable assumption?

@lsegal lsegal added the Bug label Oct 26, 2014
@MSP-Greg
Copy link
Contributor

@lsegal Well, given that this is an old 'Bug' and it's parser related, I thought I'd take a look at it. Messy.

I guess I disagree with the expectation here. YARD isn't supposed to be displaying the interpolated string value, it's supposed to be displaying the raw source of your signature line.

I'm not sure I agree. For one thing, you could have two methods with identical signatures (using heredocs), and if they had different indentation, YARD would display the default value differently, which would be very odd. We/I might be able to get around that issue by throwing the heredoc string to Ripper (as noted above), but I'm not sure. I assume eval is not something we want anywhere...

What's more of a PITA is that you could have a signature like

def foo(foo_param = <<~EOS.rstrip)
  Some default string

  EOS
end

Displaying the above as a default value is messy, as maybe it could be something like

'Some default string\n\n'.rstrip

I will look at that, but I may have to move out of RipperParser and into ParameterNode simply because the code would be too burdensome for general parsing. I'll also check heredocs for cvar and constant values.

Lastly, re #894, which really involves unary operators, I tried cascading two of them (~ and -), and my code broke. So that PR might be later in the week.

Any thoughts about moving all these tests to ripper_parser_spec.rb?

@lsegal
Copy link
Owner

lsegal commented Aug 17, 2016

Displaying the above as a default value is messy, as maybe it could be something like 'Some default string\n\n'.rstrip

YARD does not re-format code at the parser or AST level. This is something a handler could do (in #format_args), but I don't see a need to support it (this issue would not benefit from this functionality). If you really have such complex signature lines, @overload solves this just fine. Presumably your @overload would remove this extraneous string information, since you probably don't want it in the signature anyway. In the provided use-case, it seems clear that the heredoc data actually belongs as documentation data, not signature data, so such an @overload probably would omit the heredoc anyway.

Any thoughts about moving all these tests to ripper_parser_spec.rb?

As mentioned on another issue, RipperParser is an internal class. RubyParser is the public API. YARD's general convention is to only unit test public methods unless there is an exception. I see no reason to make an exception here. There are already tests for things like this in spec/parser/ruby_parser_spec.rb.

@peterhuene
Copy link

I've found the following code in the YARD Ruby parser that is the source of our heredoc woes:

https://github.com/lsegal/yard/blob/master/lib/yard/parser/ruby/ruby_parser.rb#L450-L452

With this example Ruby code:

class Example
  # Does something
  # @param [Object] x Something.
  # @param [String] y Something else.
  # @return [void]
  def self.foo(x, y = <<-HEREDOC)
this is the default value
HEREDOC
  end
end

If I break on line 450 in ruby_parser.rb, node.inspect evaluates to s(:string_literal, s(:string_content, s(:tstring_content, "this is the default value\n"))) (that's the desired representation for our use case, at least).

However, node.source_range (145..188) and args[0].source_range (155..180) are not in agreement, resulting in the execution of the conditional which changes the range to 146..187 and truncates the heredoc on both ends.

Thus after the source range is altered, node.inspect evaluates to s(:string_literal, s(:string_content, "<-HEREDOC)\nthis is the default value\nHERED")).

I'm not sure why this conditional exists since there's no comments explaining it, but obviously this doesn't handle heredocs well.

@lsegal
Copy link
Owner

lsegal commented Sep 13, 2016

If I break on line 450 in ruby_parser.rb, node.inspect evaluates to s(:string_literal, s(:string_content, s(:tstring_content, "this is the default value\n"))) (that's the desired representation for our use case, at least).

It's not the desired representation for YARD. See above for the full discussion on why not, but ultimately, YARD is actually correcting (or trying to correct) the otherwise incorrectly ranged string_literal node, which should point to the beginning and end positions of the entire string declaration, including delimiters. It is the string_content node that is meant to hold only the string data, and it sounds like you want YARD to only select string_content out of a string. Unfortunately, the expected behavior here is for YARD to display the method signature as it exists in source, which is fundamentally at odds with how these examples are formatted.

I'm not sure why this conditional exists since there's no comments explaining it, ...

You can check the git blame information about commits for comments explaining them. This one in particular fixed a related parsing bug in bd35156.

but obviously this doesn't handle heredocs well.

You're right about this part. The fix was a naive implementation and assumed single character token delimiters wrapping the string_literal node. That should indeed be updated to support arbitrary delimiter tokens, though fixing this will not change the overall expectation.

@peterhuene
Copy link

peterhuene commented Sep 14, 2016

Obviously it's your call as to what is a correct representation for YARD, but as a user I don't care how a string literal is represented in source, just what the value of the string is (that's why I'm looking at the docs and not at the source...). At any rate, we'll work around this on our side. Thanks for the follow-up!

danielparks added a commit to danielparks/puppet-strings that referenced this issue Oct 2, 2022
This changes most heredocs to use the tilde syntax with variable
interpolation off. It indents the contents of the heredoc along with the
final symbol.

To maintain these changes, it enables `Layout/HeredocIndentation` and
`Layout/ClosingHeredocIndentation` in Rubocop.

The tilde syntax smartly strips the indentation, and allows for
different indention of the closing identifier:

```ruby
"test\n" == <<~'END'
    test
  END
```

This syntax was introduced in Ruby 2.3.0, so this will not pose
compatibility problems.

### Tests that should fail

Note that two tests couldn’t be updated to a more common heredoc format
to avoid failing.

It appears there is a problem with the way YARD parses parameters with
defaults with a heredoc ([YARD #779][]). @Sharpie wrote a work-around
for that issue in `PuppetStrings::Yard::Handlers::Ruby::Base`, but it
can’t work in these cases because YARD doesn’t provide enough source to
read the whole heredoc content.

Given that these are tests of the Puppet 3 style functions, it’s
probably not worth pursuing this issue.

[YARD #779]: lsegal/yard#779
danielparks added a commit to danielparks/puppet-strings that referenced this issue Oct 2, 2022
This changes most heredocs to use the tilde syntax with variable
interpolation off. It indents the contents of the heredoc along with the
final symbol.

To maintain these changes, it enables `Layout/HeredocIndentation` and
`Layout/ClosingHeredocIndentation` in Rubocop.

The tilde syntax smartly strips the indentation, and allows for
different indention of the closing identifier:

```ruby
"test\n" == <<~'END'
    test
  END
```

This syntax was introduced in Ruby 2.3.0, so this will not pose
compatibility problems.

### Tests that should fail

Note that two tests couldn’t be updated to a more common heredoc format
to avoid failing.

It appears there is a problem with the way YARD parses parameters with
defaults with a heredoc ([YARD #779][]). There is a work-around for that
issue in `PuppetStrings::Yard::Handlers::Ruby::Base`, but it can’t work
in these cases because YARD doesn’t provide enough source to read the
whole heredoc content.

Given that these are tests of the Puppet 3 style functions, it’s
probably not worth pursuing this issue.

[YARD #779]: lsegal/yard#779
danielparks added a commit to danielparks/puppet-strings that referenced this issue Oct 4, 2022
This changes most heredocs to use the tilde syntax with variable
interpolation off. It indents the contents of the heredoc along with the
final symbol.

To maintain these changes, it enables `Layout/HeredocIndentation` and
`Layout/ClosingHeredocIndentation` in Rubocop.

The tilde syntax smartly strips the indentation, and allows for
different indention of the closing identifier:

```ruby
"test\n" == <<~'END'
    test
  END
```

This syntax was introduced in Ruby 2.3.0, so this will not pose
compatibility problems.

### Tests that should fail

Note that two tests couldn’t be updated to a more common heredoc format
to avoid failing.

It appears there is a problem with the way YARD parses parameters with
defaults with a heredoc ([YARD #779][]). There is a work-around for that
issue in `PuppetStrings::Yard::Handlers::Ruby::Base`, but it can’t work
in these cases because YARD doesn’t provide enough source to read the
whole heredoc content.

Given that these are tests of the Puppet 3 style functions, it’s
probably not worth pursuing this issue.

[YARD #779]: lsegal/yard#779
@lsegal
Copy link
Owner

lsegal commented Aug 26, 2024

Closing out some old issues. Given the significant age of this, it would be best to re-open with new context if it's still relevant.

@lsegal lsegal closed this as completed Aug 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants