-
-
Notifications
You must be signed in to change notification settings - Fork 398
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Question: checking common documentation errors for CI #1007
Comments
You should check out the "yardstick" gem which is built for doc linting. It might be a bit out of date but should be workable for you. You can also build your own linting using the API: http://yardoc.org/docs -- check the YARD::Docstring.after_parse callback for instance. As for generating without html, you want the |
@lsegal Thanks, I assume yardstick is the right tool (though it is underdeveloped for our needs... but that's how opensource works, right?.. I'll just make some PRs there) |
OK, I've investigated the matters for some time and have more focused question, about this (and similar) code line: https://github.com/lsegal/yard/blob/master/lib/yard/docstring_parser.rb#L210 if library.has_tag?(tag_name)
@tags += [library.tag_create(tag_name, tag_buf)].flatten
else
log.warn "Unknown tag @#{tag_name}" + # <== HERE!
(object ? " in file `#{object.file}` near line #{object.line}" : "")
end What is my concern about it? It is that So, to my proposal. What do you think about introducing structured warnings mechanizm in YARD, so that aforementioned line will look like: Warnings.add 'UnknownTag', tag_name, object Then, default behavior would be the same: just print warnings the moment they appear; but that approach allows several good things to emerge:
Would you accept PR like that, being it properly done and tested?.. |
I'm not opposed to this, but consider that a tool like yardstick can already implement this fairly easily by overriding the default docstring parser: module Warnings
@@warnings = {}
module_function
def add(type, text)
(@@warnings[type] ||= []) << text
end
def print
if @@warnings.size > 0
puts "Warnings:"
@@warnings.each do |k, v|
v.each do |line|
puts " #{k}: #{line}"
end
end
end
end
end
class LintingDocstringParser < YARD::DocstringParser
def create_tag(tag_name, tag_buf = '')
if !library.has_tag?(tag_name)
# handle warnings for non-existent tag
Warnings.add 'UnknownTag', tag_name
return
end
begin
library.tag_create(tag_name, tag_buf)
rescue YARD::Tags::TagFormatError
# handle tag format error
Warnings.add 'InvalidTagFormat', tag_name
end
super
end
end
YARD::Docstring.default_parser = LintingDocstringParser
YARD::Parser::SourceParser.after_parse_list { Warnings.print }
##
## Parsing this code:
##
class Foo
# @parm hi
# @return b<String> desc
# @parmy hi
def foo; end
end If you run with That said, I wouldn't be opposed to the structured logging idea, but note that it's probably a large undertaking. If you do open a PR, I would recommend starting with a proof of concept on a few small modules to get the general API down before you attempt to convert the entire codebase. Also, consider that simply adding an extra def warn(msg, opts={})
if opts[:name]
(@warnings[opts[:name]] ||= []) << opts.merge(message: msg)
end
end Then you could use it as: log.warn "message here", name: "UnknownTag", tag_name: tag_name, object: object The latter keys being optional and specific to the warning metadata. That would retain the same basic API but augment logging to support structured data without breaking the API, or even introducing a new API. |
@lsegal Yes, I understand an option of "just rewriting parser" (and it is easier for me than properly contribute to YARD :)) but there are pretty important considerations:
OK, thanks! Will do it this way. |
You are correct, but my rebuttal to this would be that YARD already handles this responsibility sufficiently. In practice, all warnings that YARD generates are important; they tell you that "X was not parsed correctly". I don't know your exact use-case, but I don't see any reason to hide some warnings but not others, since they are all weighted equally in the context of parsing correctness. At a linting level, I could see certain things weighted higher, but to take the above example, for instance, an typo'd tag name is never something you'd want hidden (IMO). On that note,
Can you point to which warnings you believe are harmless? YARD shouldn't generate warnings for harmless things.
This shouldn't be the case. The implementation above uses the public API and isn't fixed to any YARD version so long as the API sticks around-- which is really the same issue as for any |
Basically, there are two absolutely different kinds of situation YARD is unhappy about:
Examples of first: # common Rails metaphor:
include Rails.application.routes.url_helpers # Undocumentable mixin
# common metaprogramming:
attr_reader *OPTIONS # Undocumentable OPTIONS
class SomeData < Struct.new(:kind, :subject, :reason) # Undocumentable superclass Examples of second:
In the former case, I've wrote code to work, I will not change it just for the sake of documentation. Now, when I've formulated it, maybe entire "structured warnings" solution seems like overkill to solve the thing! What I'd be happy to see, in fact:
|
I think this is something where I would disagree about severity, though I understand where you are coming from. These warnings shouldn't be as innocuous as you think. Let me unpack: attr_reader *OPTIONS # Undocumentable OPTIONS This is usually a really bad thing to do. Your documentation will completely omit attributes on the class. If they are public methods, it's even worse. I understand the code is technically correct here, but there's a difference between technically correct and properly documented. Put another way, the intent of your code is not being correctly expressed in your generated HTML documentation (or yri, etc). From an unobjective standpoint, that is a problem that needs correcting. class SomeData < Struct.new(:kind, :subject, :reason) # Undocumentable superclass YARD actually does recognize the above Struct subclass, but I understand your point. Let's consider instead: class SomeData < DatabaseTable.new(:id, :name, :email)
end This is an equally big documentation problem. SomeData will be documented as inheriting Object. Your API consumers will have no idea how to use the class. For public classes, this is almost equivalent to having no docs at all (especially if the class body is empty as above). You should really be seeing this warning and trying to deal with it so your API consumers can understand your codebase properly. The idea of technically correct DRY code and documentable code are often at odds in Ruby. I'm not sure it's worth being so tied to certain syntaxes when they affect documentability. This is obviously a personal opinion, but if you can get proper docs by just writing out your attribute names, the minimal amount of duplication has big rewards. DRY should not apply to documentation-- repeating yourself is part of good documentation. YARD helps with keeping docs more DRY than say RDoc, but it still can't solve all edge cases. That's where plugins come in. Anyway, let me step back from the opinionated behavior for a second and point out that it actually sounds like you might want something completely different here: It sounds to me like what you really want, pragmatically speaking, is a way to error out a CI build if an error pops up that is important to you. We can actually separate this feature request away from logger output: if the warnings displayed but only certain warnings created a non-zero exit code, presumably the log output would be okay to have around? The way I see it, I still believe YARD should be warning you about the class (1) cases, but I agree that they can still avoid changing the exit code. If you agree with this, I would be open to (a PR) adding an exit code attribute somewhere in YARD that would be toggled on serious errors (like tag parsing and log.error calls) but would not affect UndocumentableErrors. I guess optionally users could opt-in to a strict mode where all warnings toggled the non-zero exit, but I'm okay with your class 1/2 distinction. In this scenario, logging would remain unchanged-- the exit code would not be tied to actual log output-- so you could still get warning text with a 0 exit code. And it sounds like that's really what you want, here. Finally, a It sounds to me like both of these together would give you enough control without needing to touch logging behavior. It's actually fairly easy to do, too, because there are only a few places you'd need to toggle the That said, a Hope that helps! |
About the first part of your reply: I understand your concerns, but we are speaking about really large (and pretty pragmatic) codebase, written by myriads of people. Although we are really aiming high standards, there are certain edge cases and complicated domains when many kinds of hard metaprogramming are absolutely necessary. Then, about documenting things, let's imagine two different situations:
So, my "class 1" errors are not unimportant (and undocumentability of some things should be fixed some way or another), my idea is that they should be skippable somehow (either with "it's bad, but we cannot do anything about it" or with "we document it another way" mindset). So, what it turns out of our discussion:
WDYT?.. Does it look more reasonable now? |
So here's what I think could be done fairly easily:
This would not immediately provide the behavior you want in YARD core, but it would give you the tools necessary to implement what you want quickly (including directive support for toggling warnings on/off in source). Sample implementation: require 'yard'
class DocLinter
class << self
# TODO: use this toggle in a Tag::Directive to toggle behavior on and off.
attr_accessor :enabled
end
self.enabled = true
def initialize
@exit_code = 1
@messages = Hash.new { |h, k| h[k] = [] }
# Add hooks
YARD::Logger.instance.on_warn(&method(:on_warn))
YARD::Logger.instance.on_error(&method(:on_error))
end
EXIT_CODES = {
'InvalidTagFormat' => 1,
'UnknownTag' => 1,
'NamespaceMissing' => 1
}.freeze
def on_warn(message:nil, code:nil, **extra)
return unless self.class.enabled
val = EXIT_CODES[code]
if val
@exit_code = val
# TODO: print custom log messages
@messages[code] << message
# stop the warning message from being printed.
# alternate API ideas:
# return false
# raise YARD::Logger::SuppressMessage
return :suppress
end
end
def on_error(*)
return unless self.class.enabled
@exit_code = 127
end
def print_messages
return if @messages.empty?
puts "Warnings (#{@messages.values.flatten.size}):"
@messages.each do |code, msgs|
msgs.each do |msg|
puts " [#{code}]:\t#{msg}"
end
end
end
def finish
print_messages
exit @exit_code
end
end
module DocLintInjector
def run(*)
lint = DocLinter.new
super
# YARD will do this:
lint.on_warn message: 'invalid tag', code: 'InvalidTagFormat'
lint.on_warn message: 'undocumentable superclass', code: 'Undocumentable'
lint.finish
end
end
YARD::CLI::Yardoc.send(:prepend, DocLintInjector) Unfortunately there's no good way to hook Yardoc execution (mostly only because of the fact that your $ ruby -r./path/to/doc_linter.rb -S yard I can add a separate TODO item for an class DocLinter
def initialize
# ... other stuff ...
YARD::CLI::Yardoc.after_run(&method(:finish))
end
end What do you think about this? |
Everything looks reasonable, but...
So, what am I imagining now (I know that during the ticket I am constantly changing those ideas, but that's how discussions working, right?.. I'm trying to find the compormise that all of us would be happy with). Things I am aware of, currently:
Surprisingly for me, even requirement for different warning levels is not that necessary in this scheme. Like, "don't Hm? |
I'm not sure we're really agreeing on premise here, so I really can't see myself following through to your conclusion points. Or rather, this point is technically correct, but I don't interpret YARD's current behavior as failing to properly ensure documentation correctness. I would argue the other way around: YARD is correctly telling you that your code is undocumentable, and should continue to tell you this for as long as the undocumentable code exists. YARD does not know that you've implemented a workaround; from its perspective, your documentation has correctness problems. In that interpretation, YARD is doing its job. I think where we disagree is the idea that better control of logger output is related to correctness. I don't think it is. Logger output in YARD is simply there to tell you what it is seeing for debugging and analysis purposes. To be clear, there are no plans to support a On the flipside, having redundant warnings when you have a legitimate documentation workaround in place does not in itself work against the goal of improving documentation. Frankly, a constant reminder that you're doing "weird things" to resolve your undocumentable code (i.e., everything except just rewriting the code in a documentable form) is probably a good thing. I would actually argue that by ignoring a warning, you're not just ignoring the log output, but you're ignoring the fundamental problem, which means you're not furthering the goal of improving documentation either (even if you might think you are). As you pointed out, the "third option is 'John rewrites the code'"-- yet it seems like you do not consider this an equally valid solution. Code should be self-documentable-- I understand Ruby's edge cases, but if you're writing meta-code to document I do understand that you're in a different boat, because it sounds like your team is pushing for documentation excellence. For that reason, I see why you are making the arguments you are. Realize, however, that you're very likely in a small minority of people who push that hard for documentation excellence. For YARD to raise the bar, it needs to focus on all users. I personally wish that the concept of a "documentation quality team" was a real and popular thing, but it's not. Perhaps if it becomes a thing, we can re-visit this issue and provide better support for those teams. Until then, YARD provides enough tooling and extensibility to provide more customized reporting for your specific project with very little effort. I think the best path forward here is so provide the minor updates to the logger so that a plugin could be written to do what you need. If the plugin proves that it is worthwhile to users, we can have a discussion about merging it into YARD core. I think this is the right path forward. Hope that makes sense! Thanks for bringing up this topic, although we don't agree 100% I know where you are coming from and I appreciate your focus on trying to improve things! |
OK, I think we've successfully reached "let's agree to disagree" point here :) For me, Ruby's metaprogramming is one of its precious parts, and discouraging metaprogramming for the sake of documentability is somewhat questionable. But yes, I understand your reasons and see your points, so, let's return to a pragmatic compromise discussion. So, as far as I can understand, "pragmatic compromise" would be just adding callbacks to the logger. In this case, I propose to stick "the simplest thing that should possibly work" and implement, for "first try", only But one additional change I'd be willing to do in this case: make warnings atomic. I mean: # Change thins like this: lib/yard/code_objects/proxy.rb:196
log.warn "Load Order / Name Resolution Problem on #{path}:"
log.warn "-"
log.warn "Something is trying to call #{meth} on object #{path} before it has been recognized."
log.warn "This error usually means that you need to modify the order in which you parse files"
log.warn "so that #{path} is parsed before methods or other objects attempt to access it."
log.warn "-"
log.warn "YARD will recover from this error and continue to parse but you *may* have problems"
log.warn "with your generated documentation. You should probably fix this."
log.warn "-"
# Into this:
log.warn "Load Order / Name Resolution Problem on #{path}:\n"\
"-\n"\
"Something is trying to call #{meth} on object #{path} before it has been recognized.\n"\
"This error usually means that you need to modify the order in which you parse files\n"\
"so that #{path} is parsed before methods or other objects attempt to access it.\n"\
"-\n"\
"YARD will recover from this error and continue to parse but you *may* have problems\n"\
"with your generated documentation. You should probably fix this.\n"\
"-"\
# ...or something like that (with warn, for exmple, properly formatting it with indents to
# read like one message) This way it would be one call to callback instead of 9 "independent" ones. Is this acceptable? |
This was actually just resolved the other day: de13266#diff-79d0d073dc872cfaaa6870d2fa4f29ebR168 There are a bunch of others to fix up too, though. |
👍 Then, I just do |
I'm actually working on an API for the logging stuff. I'll have something pretty soon to show! I think it will be very easy to build plugins off of. |
Oh, OK! JFYI, I'm really willing to help wherever I could be useful. |
@zverok Well, I'm working on integrating the following gist into the Logger class. Feel free to play around with it and provide comments or even start filling in tests for that API! https://gist.github.com/lsegal/82c9ddf036ea2cf36111efc2f2f9f412 Basically the idea is that you register a code to a severity so YARD can track usage (and provide debug info if you typo'd a code name), where the core logging severity types (warn/debug/...) are also mapped to codes. You can do: log.add :my_custom_code, 'msg' or just a simple: log.add :warn, 'msg' And callbacks can be registered for a specialized code name or to all messages. The way to think about it is that you can register custom code names that simply map to regular severity levels, where the regular severity levels are default code names. I'm going to be working on wrapping the core #warn #error #fatal #info and #debug calls to also call the callbacks so you could still use the standard |
Gist was just updated to near final API, I'll try to integrate into |
@lsegal Now, it looks really useful and well-thought! So, in a short time we can expect all YARD's log messages to became structured, and callbacks implemented?.. One small notice: maybe here is a good place to use Ruby's infamous |
throw/catch is far too obscure to rely on for a general purpose public API, and fairly difficult to document. I'd rather use a |
You can check the |
Wow. That's quite an impressive work. I am starting to integrate it with our code practices immediately, maybe to find some "thin places". Thank you so much! |
Enables ability to transition YARD log calls away from inline log messages and log only the data required for callbacks to build the correct message. Example: log.add :unknown_tag, object: object, tag_name: tag_name Used by: log.on_message :unknown_tag do |data| data[:message] = "Unknown tag #{data[:tag_name]} in #{object}" end References #1007
@zverok just added the ability to use callbacks to modify logger output, so YARD can move away from ever even passing message parameters to def parse(*args)
...
rescue LoadOrderError => err
log.add :load_order, :error => err
end
# in a central logging message file
log.on_message :load_order do |data|
data[:message] = "the big long load order error warning"
end This gets us almost all the way to your original proposed |
@lsegal Looks good! My attempts to use the new functionality also seem successful. The class looks like this. Sample output:
(And it exits with 2 on errors, 1 on warnings and 0 on what I'm calling "notices" here) The only thing I'm disliking currently is a need for pretty cumbersome code from here and below, to re-format YARDS "free form" messages into more regular ones. |
OK, everything works extremely well with this branch! I have solid validator constantly working in production, and it is really useful for us, so, we are grateful to the sky! When this branch will be released, I'll opensource the tool we build and hope it will be useful for community. |
...And when/if it will be merged, we could opensource our document checker, it is already gem-quality... |
...So we'll have two competing YARD checker gems :))) |
@zverok I would be happy to join forces and contribute back the code I'm writing.. it's not a wheel worth reinventing over and over :D |
Yeah, I am thinking about the same of course. Just let's hope the |
Well... As @lsegal seem not being a big fan of this branch of events... I've finally managed to do the independent patch/plugin gem for structured logging and documentation validations. |
Great read. Thanks for the efforts @zverok |
Hey.
What I'm trying to achieve currently (on large codebase constantly changed by hundreds of people) is adding continuous integration check for YARD docs. What should be checked, ideally:
@param
vs@params
, documentation for parameter that is not there and so on);Let's skip levels 1+ for now completely (just a mention for future discussions) and focus on level 0.
yard
command currently has two levels of error reporting (amirite?):[error]
(can't continue parsing this file) and[warn]
(file is parsed but not everything is allright). Most ofwarn
are indeed useful, so, it almost looks like "if there are any warnings, YARD CI step is failed", but there is one case, looking like this:As far as I understand, "official" solution for this is "just ignore all warnings", which also gets rid of things like
"Unknown tag @params"
,"Cannot resolve link to"
,"@param tag has unknown parameter name"
which are pretty useful, to say the less.So, to be honest, my "ideal" feature request would be:
yardoc
option;yardoc --no-save
?)....but I'm open to any advice and suggestion :)
(Also, I'm not afraid and even willing to provide PRs and other contributions, if it could be appropriate.)
The text was updated successfully, but these errors were encountered: