-
Notifications
You must be signed in to change notification settings - Fork 439
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change comment directive parsing #1149
base: master
Are you sure you want to change the base?
Conversation
Thanks for the PR! But because 1) the new maintainers are still learning the codebase, and 2) this PR touches a core part of RDoc, I'll hold off merging it until version 6.8.0 series is released 🙏 |
178917a
to
b17e948
Compare
b17e948
to
1b4b08d
Compare
1b4b08d
to
15fa499
Compare
lib/rdoc/comment.rb
Outdated
value_lines = lines.take_while do |l| | ||
l.rstrip[indent_regexp].size > base_indent_size | ||
end | ||
min_indent = value_lines.map { |l| l[indent_regexp].size }.min |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be:
min_indent = value_lines.map { |l| l[indent_regexp].size }.min | |
min_indent = value_lines.min_by { |l| l[indent_regexp].size }.size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be like this
value_lines = [' arg2,', ' arg3,', ' )']
indent_regexp = /^\s*/
min_indent = value_lines.min_by { |l| l[indent_regexp].size }[indent_regexp].size #=> 2
but I think map{}.min
is better
when :c | ||
private_start_regexp = /^(\s*\*)?-{2,}$/ | ||
private_end_regexp = /^(\s*\*)?\+{2}$/ | ||
indent_regexp = /^\s*(\/\*+|\*)?\s*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we store these static regexp objects in constants instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It already have an explicit name private_start_regexp
and I don't see much benefit of making it a constant.
C_PRIVATE_START_REGEXP = /regexp/
private_constant :C_PRIVATE_START_REGEXP, :C_...
...
when :c
private_start_regexp = C_PRIVATE_START_REGEXP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But why can't we directly refer to C_PRIVATE_START_REGEXP
without defining locals for them 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because these locals are a layer to absorbs differences between types
when :ruby
private_start_regexp = /^-{2,}$/
when :c
private_start_regexp = /^(\s*\*)?-{2,}$/
when :simple
private_start_regexp = /^-{2}$/
end
...
line.match?(private_start_regexp)
...
value_lines = take_multiline_directive_value_lines(..., indent_regexp, ...)
I think it could be simplified to:
- Remove # or * depend on type first
- Use the same regexp for all types
But it will introduce incompatibility.
horizontal divider between two paragraphs in :simple
---
--
only two dashes starts private section in :simple
++
#-----
# Two or more dashes after # are private section start in :ruby
# Tested in RDocCommentTest#test_remove_private_long
#++
Removing spaces before *
instead of replacing *
to a space (needed to distinguish *---
from * ---
) also brings changes in a c comment where * are not aligned. (I think this is a small problem though)
/*
* :call-seq:
* foo(bar)
*/
line = lines.shift | ||
read_lines = 1 | ||
if in_private | ||
in_private = false if line.match?(private_end_regexp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to flip in_private
here? Can we add a comment explaining it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private_start_regexp (matches to --
) begins private section (in_private = true
)
private_end_regexp (matchs to ++
) ends private section (in_private = false
)
comment added
prefix_indent = ' ' * prefix.size | ||
line = line.byteslice(prefix.bytesize..) | ||
/\A(?<colon>\\?:|:?)(?<directive>[\w-]+):(?<param>.*)/ =~ line | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add param.strip!
here so we don't need to call param.strip
in multiple branches?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch 👍
changed to:
raw_param = directive_match[:param]
param = raw_param.strip
Unstripped raw_param is also used to reject :toto::
like in the old implementation in pre_process.rb
rdoc/lib/rdoc/markup/pre_process.rb
Line 112 in a253a8d
# skip something like ':toto::' |
lib/rdoc/comment.rb
Outdated
blank_line_regexp = /\A\s*\z/ | ||
min_spaces = lines.map do |l| | ||
l[/\A */].size unless l.match?(blank_line_regexp) | ||
end.compact.min |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blank_line_regexp = /\A\s*\z/ | |
min_spaces = lines.map do |l| | |
l[/\A */].size unless l.match?(blank_line_regexp) | |
end.compact.min | |
min_spaces = lines.map do |l| | |
l.match(/\A *(?=\S)/)&.end(0) | |
end.compact.min |
lib/rdoc/comment.rb
Outdated
lines.shift while lines.first&.match?(blank_line_regexp) | ||
lines.pop while lines.last&.match?(blank_line_regexp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stripping blank lines can't be done first?
lib/rdoc/comment.rb
Outdated
private def take_multiline_directive_value_lines(directive, filename, line_no, lines, base_indent_size, indent_regexp, has_param) | ||
return [] if lines.empty? | ||
|
||
first_indent_size = lines.first[indent_regexp].size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
first_indent_size = lines.first[indent_regexp].size | |
first_indent_size = lines.first.match(indent_regexp)&.end(0) |
It can avoid to create a string for each time.
Fix comment directive parsing problem
Problem of comment parsing
The main problem is that
@preprocess.handle
parses comment, removes directive, and process code_object at the same time.This pull request change RDoc to parse comment and extract directives first, and then apply directives to code object.
Flow of legacy RDoc parsing method
For example parsing this code
Step 1
RDoc performs
@preprocess.hanlde
to RDoc::NormalClass.:category:
is applied to klass and replaced with blank line:not-new:
and:yields:
are replaced with blank line. maybe bug.:args: a, b
is replaced with:args: a, b
Step 2
RDoc performs
@preprocess.hanlde
to RDoc::AnyMethod.:args: a, b
is applied tometh.params
.Step 3
RDoc removes private section that starts with
#--
and ends with#++
.Step 4
RDoc normalizes comment by removing
#
and indentation.Step 5
RDoc extracts
":call-seq:\n initialize(x, y, z)
from comment and apply to method object.Problems
RDoc removes directives and expand
:include:
twice in some case, and once in other case.To avoid all directives removed in the first
@preprocess.handle
, preprocess needs directive-replace mechanizm which is making things complex.Private section and call-seq are processed later. This is making RDoc accept weird comment like directive inside private section and private section inside call-seq.
Handling meta programming method is also hard.
@preprocess.handle(comment, code_object)
requires code object already created.We need to parse the comment to know the code object type (method or attribute). After that, we can finally parse the comment with the code object.
C comments are also complicated. :include: can include text containing
*/
.Removing directive line and private section from the comment might remove
/*
and*/
which makes normalize_comment fail.The original implementation was avoiding this by using different processing order than ruby parser. This is not consistent.
Solution
We need to parse comment first and only once to extract directives.
Expand
:include:
, read directives (including:call-seq:
), remove private section at the same time.Comment parser should return normalized comment text and directives as an attribute hash. Directive should also contain line number.
Changed things
:call-seq:
New type of directive called "multiline directive" is introduced to make
:call-seq:
also a directive.Multiline directive ends with blank line. This restriction is for compatibility with old RDoc.
Some invalid multiline directive (unindented, ends with other directive) is also accepted with warning.
The resuld of parsing this call-seq is changed. I think it get better.
Private section
#----foobar
was accepted as private section start.#++++foobar
was decomposed to#++
(private end) and++foobar
(normal comment).Start is now
/^#-{2,}$/
(two or more -), end is now/^#\+{2}$/
(exactly two +).Unhandled directives
In old RDoc, unhandled directive
# :unknown: foo
remain in normal comment.Now it is removed just like other directives. Unhandled directive is appended to code object's metadata. It does not make sence to leave metadata in the comment. I think this was just a side effect of avoiding double parsing problem.
Normalize and remove private section
Everything is done in parse phase
C and Simple parser
C used to accept
/*\n# :directive:\n*/
but now only accepts* :directive:
.Changes for call-seq, private section and unhandled directive described above are also applied to C and Simple parser.
Old comment parsing
RDoc::Markup::PreProcess#handle
RDoc::Comment#extract_call_seq
RDoc::Comment#remove_private
is only used fromRDoc::Parser::Ruby
. We can remove them in the future.Diff (updated: 2025/02/02)
I compared generated html files of rdoc itself and in
ruby/ruby
.HTML meta tag content (ruby/ruby)
Files:
Example diff
OpenSSL/Timestamp/Factory.html (ruby/ruby)
This invalid document is parsed differentl
Win32.html (ruby/ruby, RDOC_USE_PRISM_PARSER)
This will no longer considered to be a private section(invisible comment surrounded by -- and ++)
History_rdoc.html (ruby/rdoc)
Parsing this part is improved.
The current document looks like
* :stopdoc:
and* :nodoc:
was processed as directive.lib/rdoc/markdown_kpeg.html (ruby/rdoc)
Maybe it shouldn't be documented.
https://ruby.github.io/rdoc/lib/rdoc/markdown_kpeg.html
RDoc/MarkupReference.html (ruby/rdoc, RDOC_USE_PRISM_PARSER)
<pre>:call-seq:
→<pre>:call-seq:
(trailing space removed)RDoc/Parser/Ruby.html (ruby/rdoc, RDOC_USE_PRISM_PARSER)
Escape of
# \:method: or :attr: directives in +comment+.
is now working.Note that this is related to an old bug in master branch