Add new tidy check to ensure that rustdoc DOM IDs are all declared as expected #86178
I would strongly prefer to avoid parsing free-form Rust code. Something like the feature gate declarations, which use an ad-hoc but simpler DSL, seems preferable, along with the requisite // START-DOM-ID-DECLS (or similar) section marker, rather than looking for the function name.
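A minimal sketch of that idea, assuming a marker-delimited block in the ID map's source file where each declaration is a quoted ID followed by a comma. The `// END-DOM-ID-DECLS` marker, the line format, and `parse_id_decls` are hypothetical; this is not the feature-gate parser's actual code.

```rust
/// Hypothetical parser for a marker-delimited ID declaration block.
/// Inside the block, each non-empty, non-comment line must look exactly
/// like `"some-id",`; anything else is reported as an error.
fn parse_id_decls(src: &str) -> Result<Vec<String>, String> {
    let mut ids = Vec::new();
    let mut inside = false;
    for (i, raw) in src.lines().enumerate() {
        let line = raw.trim();
        if line == "// START-DOM-ID-DECLS" {
            inside = true;
        } else if line == "// END-DOM-ID-DECLS" {
            return if inside {
                Ok(ids)
            } else {
                Err(format!("line {}: END marker without START marker", i + 1))
            };
        } else if inside && !line.is_empty() && !line.starts_with("//") {
            // Expect exactly: "some-id",
            let id = line
                .strip_prefix('"')
                .and_then(|rest| rest.strip_suffix("\","))
                .ok_or_else(|| format!("line {}: malformed declaration `{}`", i + 1, line))?;
            ids.push(id.to_string());
        }
    }
    Err("missing `// END-DOM-ID-DECLS` marker".to_string())
}
```

Because the format is this rigid, a malformed line is a hard error instead of being silently skipped, which is the main advantage over matching arbitrary Rust code.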
I'm wary that the test as-is seems to reach into rustdoc's HTML generation and broadly speaking feels pretty invasive in terms of trying to match things, but at the same time feels pretty unlikely to be "good enough" to actually close out this problem. Do we have cases where not having this tidy check led to problems (e.g., issues, discussions, etc.)? Maybe some broader discussion amongst the rustdoc team that indicates that both a) this is a real problem and b) this solution is going to solve it sufficiently and not cause more pain than necessary?
The second commit of this PR adds a few declarations to the id map, but it's not clear to me what value that has precisely: it doesn't seem like the generated HTML had any problems today, so the purpose of that fix isn't fully clear to me (especially compared with declaring the id "help" as ignored rather than adding it to the map)...
It's prevention. It doesn't cause issues in the std/core/test docs because we check them, but it very likely does in some other crates. The goal of this ID map is to replace duplicated IDs, because you can't use the same ID twice on the same page (it's invalid HTML). My other comment above explains a bit more. I'll update the PR to fix the comment.
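A minimal sketch of the mechanism being defended here, assuming a simplified map that remembers every ID already used on a page (pre-seeded with rustdoc's own IDs) and appends a numeric suffix to any colliding candidate. `IdMap` and `derive` are illustrative names; this is not rustdoc's actual implementation.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified ID map: every ID handed out (or reserved up
/// front for rustdoc's own "static" IDs) is remembered, and a colliding
/// candidate gets a numeric suffix until it is unique on the page.
struct IdMap {
    used: HashSet<String>,
}

impl IdMap {
    fn new(reserved: &[&str]) -> Self {
        IdMap { used: reserved.iter().map(|s| s.to_string()).collect() }
    }

    fn derive(&mut self, candidate: &str) -> String {
        let mut id = candidate.to_string();
        let mut n = 0;
        // Try `candidate`, then `candidate-1`, `candidate-2`, ...
        while self.used.contains(&id) {
            n += 1;
            id = format!("{}-{}", candidate, n);
        }
        self.used.insert(id.clone());
        id
    }
}
```

The pre-seeding is the part this PR tries to enforce: if a static ID is missing from the reserved list, the first Markdown heading that happens to produce the same ID is emitted unchanged and the page ends up with a duplicate.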
I understand that in theory this prevents bugs, but if we've never encountered a bug due to this, I'm a little hesitant to start parsing arbitrary Rust code in tidy via regexes to try to catch it. cc @rust-lang/rustdoc
I am not super familiar with this part of the code, but I think I lean towards Mark's perspective: if we've never had bugs related to this, it seems silly to spend lots of effort on it.
I also lean towards Mark's perspective. Parsing Rust code this way seems rather invasive and I don't think that's how we should achieve this, and I'm unsure whether this check is necessary in the first place.
We did, quite a few times. It's just that I sent PRs to add the ID to the ID map and it ended there. But it happened. I don't have enough time to start writing checks I don't think are useful. ;) However, maybe the approach here isn't the right one. If you have another idea on how to enforce it, I'd be very happy to hear it! (Parsing Rust source code with regex is really not great...)
Updated branch from 07e0162 to 6584dbf.
Updated/rebased: a new ID was created and wasn't added to the ID map (…).
It's an anchor issue. For a long time, we detected these conflicts in std when our HTML check found that a page used the same ID more than once. We still do, but that check doesn't cover everything, far from it. From time to time, someone tells me they found an ID conflict. In such cases, I simply add the ID to the map, and it's fixed once that version is available to users. However, this has happened many times. At some point we even had two ID maps that sort of worked together, but it was a huge mess; I simplified things a bit recently.
Then @jsha started reworking the front-end a lot, and I realized that IDs had been modified or created (and not added to this ID map, creating new potential issues). So, to keep this problem from going unnoticed until someone comes to me directly, I wrote this check. I completely agree that parsing Rust (and now HTML too, because of the templates) with regex is very suboptimal, to say the least, but I didn't find a better idea... We need this test, but like I said, if you have a better idea on how to check the IDs, I'd be very happy to hear about it!
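For reference, the kind of after-the-fact check mentioned above (flagging a page that uses the same ID more than once) can be sketched as follows. This is a naive, hypothetical text scan, not the actual checker run on the std docs; a real tool should use an HTML parser.

```rust
use std::collections::HashMap;

/// Returns every `id` value that appears more than once in `html`, sorted.
/// Naive text scan for ` id="..."` (the leading space keeps attributes
/// such as `data-id="..."` from matching).
fn duplicate_ids(html: &str) -> Vec<String> {
    const NEEDLE: &str = " id=\"";
    let mut counts: HashMap<&str, usize> = HashMap::new();
    let mut rest = html;
    while let Some(pos) = rest.find(NEEDLE) {
        let after = &rest[pos + NEEDLE.len()..];
        match after.find('"') {
            Some(end) => {
                *counts.entry(&after[..end]).or_insert(0) += 1;
                rest = &after[end + 1..];
            }
            None => break, // unterminated attribute value
        }
    }
    let mut dups: Vec<String> = counts
        .into_iter()
        .filter(|&(_, n)| n > 1)
        .map(|(id, _)| id.to_string())
        .collect();
    dups.sort();
    dups
}
```

A check like this only sees the pages that are actually generated and inspected, which is why it cannot catch a conflict that only shows up in some other crate's docs.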
Updated branch from 6584dbf to 708711b.
https://github.com/rust-lang/rust/pull/86178/files#r649044525 seems like a good idea to me; we could use JSON or something similar so it's easier to parse. Parsing a 15-line file once per run doesn't seem too expensive.
But, as I said, this is only half the problem: we still need to check in the rustdoc source code that the IDs are actually used, and that IDs in the rustdoc source code are declared in the map. Improving how they are declared is only a small part of the problem.
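The two directions described here can be expressed as set differences. A sketch, assuming the declared IDs and the IDs found in rustdoc's sources have already been extracted somehow (that extraction is exactly the contested part); `diff_ids` is a hypothetical name.

```rust
use std::collections::BTreeSet;

/// Compares declared IDs against IDs found in rustdoc's sources.
/// Returns (used but not declared, declared but never used), both sorted
/// because `BTreeSet` iterates in order.
fn diff_ids<'a>(
    declared: &BTreeSet<&'a str>,
    used: &BTreeSet<&'a str>,
) -> (Vec<&'a str>, Vec<&'a str>) {
    let undeclared: Vec<&'a str> = used.difference(declared).copied().collect();
    let unused: Vec<&'a str> = declared.difference(used).copied().collect();
    (undeclared, unused)
}
```

Either list being non-empty would be a tidy error: the first means a new ID can collide with a Markdown heading, the second means the map is carrying stale entries.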
@GuillaumeGomez I don't know how to test that all the IDs are used, but you can check that they're in the map by forcing all the IDs to come from the map, something like this:
But unfortunately that doesn't work inside the tera template files. Also, it doesn't prevent someone from using …
Does your PR currently work for tera files?
Sure, but it's a lot easier to catch "you used an id without going through the …
Yes. I simply added the
But that would still require this regex to check it. ;) Also, we currently don't check in JS because there is simply no way to do so...
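As a hypothetical illustration of forcing every ID to come from the map, rustdoc's Rust code could pass IDs around as a newtype that can only be obtained from declared constants. `DomId`, `dom_ids`, and `write_section_header` are invented names for this sketch, and the markup is illustrative only.

```rust
mod dom_ids {
    /// A DOM ID that rustdoc may emit. The field is private, so code outside
    /// this module can only obtain one through the declared constants below.
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub struct DomId(&'static str);

    impl DomId {
        pub fn as_str(self) -> &'static str {
            self.0
        }
    }

    // Hypothetical declarations: the single source of truth.
    pub const MAIN: DomId = DomId("main");
    pub const SEARCH: DomId = DomId("search");
    pub const IMPLEMENTATIONS: DomId = DomId("implementations");

    /// Every declared ID, e.g. to pre-seed the map used for Markdown headings.
    pub const ALL: &[DomId] = &[MAIN, SEARCH, IMPLEMENTATIONS];
}

use dom_ids::DomId;

/// HTML-writing helpers take a `DomId` instead of `&str`, so passing an
/// undeclared string literal here no longer type-checks.
fn write_section_header(out: &mut String, id: DomId, title: &str) {
    out.push_str(&format!("<h2 id=\"{}\">{}</h2>", id.as_str(), title));
}
```

As pointed out in the thread, this only constrains Rust code: IDs written directly in tera templates or in JS bypass the type entirely.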
… r=jyn514
Clean up rustdoc IDs
I cherry-picked the commit from rust-lang#86178. It adds missing rustdoc IDs (for the HTML) and removes unused ones. cc `@camelid` r? `@jyn514`
☔ The latest upstream changes (presumably #86920) made this pull request unmergeable. Please resolve the merge conflicts.
Updated branch from 708711b to 03ed00e.
I am still a little uncertain that this is worthwhile to pursue, but if we do, I would like some comments at the top of the tidy file describing the failure modes and how we're handling them. For example, a new way of generating IDs is added to rustdoc: how do we catch that? (Or, if we don't, that is useful to document too.)
I think some discussion of the goals would also be useful. At least, embedding
We need to declare the IDs in a map in rustdoc to prevent conflicts when generating the DOM. Until now, this was checked manually, which was very "unsafe", as the second commit of this PR proves.
into the tidy check's top-level doc comment would already be useful, but expanding on what is unsafe about it (i.e., what we're trying to prevent) and how this mitigates it would be good, I think. Part of the problem is that the rustdoc ID map is not documented anywhere (that I can find quickly), which means there's no basis for understanding its purpose without some digging and background knowledge of HTML's ID "limitations".
check_id(path, trimmed.split('"').skip(1).next().unwrap(), ids, line_nb, bad);
    is_checking_small_section_header = None;
}
} else if trimmed.starts_with("write_small_section_header(") {
Similarly, it is currently too easy for this function to be renamed or for its argument order to change; there are no comments that I can see indicating that it's special. I think we should at least add comments to the definition. But still, parsing arbitrary Rust code seems worrisome (easy to get wrong).
@GuillaumeGomez Could you add a doc comment to write_small_section_header() saying that calls to it are parsed by tidy?
Also, this check seems somewhat fragile, since it will only check calls that don't have any non-whitespace characters before them. Could you instead use something like this?
- } else if trimmed.starts_with("write_small_section_header(") {
+ } else if trimmed.contains("write_small_section_header(") && !trimmed.contains("fn write_small_section_header(") {
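A minimal sketch of the suggested position-independent matching, assuming (as the reviewed code's `split('"').skip(1)` does) that the ID is the first string literal after the opening parenthesis. `section_header_id` is a hypothetical helper; it only handles calls whose first string argument is on the same line, whereas the reviewed code appears to track multi-line calls with `is_checking_small_section_header`.

```rust
/// Hypothetical helper: returns the first string literal that follows a
/// call to `write_small_section_header(` anywhere on `line`, if any.
fn section_header_id(line: &str) -> Option<&str> {
    const CALL: &str = "write_small_section_header(";
    // Skip the function's own definition.
    if line.contains("fn write_small_section_header(") {
        return None;
    }
    let start = line.find(CALL)? + CALL.len();
    // `.nth(1)` is equivalent to `.skip(1).next()`: the text between the
    // first and second `"` after the call, i.e. the first string literal.
    line[start..].split('"').nth(1)
}
```

Returning `Option` instead of calling `unwrap()` lets the caller decide whether a call without an inline string literal is an error or just needs multi-line handling.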
Adding the doc comment and updating the check.
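A sketch of the kind of doc comment being requested. The signature (with `&mut String` as the output) and the markup are assumptions made so the example compiles on its own; rustdoc's real function may differ.

```rust
use std::fmt::Write;

/// Writes a small section header whose anchor is `id`.
///
/// NOTE: calls to this function are parsed by tidy's rustdoc DOM ID check,
/// which reads `id` as the first string-literal argument and checks that it
/// is declared in rustdoc's ID map. If you rename this function or reorder
/// its parameters, update that tidy check as well.
fn write_small_section_header(w: &mut String, id: &str, title: &str, extra: &str) {
    // Illustrative markup only.
    write!(w, "<h2 id=\"{}\" class=\"small-section-header\">{}</h2>{}", id, title, extra).unwrap();
}
```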
Updated branch from 46d9dcd to 0cf8877.
Good point, I updated the PR description.
Can you explain this part to me? In what situations can we have ID conflicts? IIUC, all the IDs in … I would expect that only "dynamic" IDs (e.g., …
It happens when we generate HTML from Markdown (for headings). To avoid duplicated IDs, we list the ones used by rustdoc itself (the "static" ones, not the "dynamic" ones): basically, the ones that wouldn't go through our ID map unless we put them there ourselves.
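To make the collision concrete, here is a sketch assuming a simple slug rule (lowercased alphanumerics, every other run of characters collapsed into `-`). rustdoc's real heading-ID rule may differ, and `slugify` and `collides_with_static_id` are hypothetical names.

```rust
use std::collections::HashSet;

/// Hypothetical slug rule for Markdown heading text: keep alphanumerics
/// (lowercased) and collapse every other run of characters into one '-'.
fn slugify(heading: &str) -> String {
    let mut out = String::new();
    for c in heading.chars() {
        if c.is_alphanumeric() {
            out.extend(c.to_lowercase());
        } else if !out.is_empty() && !out.ends_with('-') {
            out.push('-');
        }
    }
    // Drop a trailing separator left by trailing punctuation or whitespace.
    while out.ends_with('-') {
        out.pop();
    }
    out
}

/// True if a user's heading would produce an ID rustdoc already uses itself.
fn collides_with_static_id(heading: &str, static_ids: &HashSet<&str>) -> bool {
    static_ids.contains(slugify(heading).as_str())
}
```

A doc comment containing a heading such as "Implementations" would produce the same ID as rustdoc's own section, which is why the static IDs must be known to the map before any Markdown is rendered.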
Ah, that makes sense, I forgot about headings 👍
I just realized that #85833 (comment) is a real-world example of the fragility of the
Yep... I think it's better than nothing while we look for a better idea.
ping
I think this check might not work with some of the IDs that were landed in the scrape-examples PR. Closing and re-opening (to re-run CI) to see for sure. |
Updated branch from 0cf8877 to 9375fde.
@camelid CI passed, but just in case I rebased on master.
Updated branch from 9375fde to 802a293.
☔ The latest upstream changes (presumably #91356) made this pull request unmergeable. Please resolve the merge conflicts.
I'm going to close this PR for the reasons I listed on Zulip. Sorry for taking so long to follow up on this. |
We need to declare the IDs in a map in rustdoc to prevent conflicts when generating the DOM. Until now, this was checked manually, which was very "unsafe", as the second commit of this PR proves (the second commit was moved into #86819 and merged).
cc @jsha
r? @Mark-Simulacrum