Automate detection of dead doc links #15257
Comments
I wonder if this might be something for @nodejs/website to figure out.
IIRC @mikeal once said he created something for crawling every link found on a website?
This is probably not enough; changing headings within a page causes the …
Maybe we can use puppeteer for this.
Strawman with puppeteer for simple detection of wrong hashes (intra-page links only):

```js
'use strict';
const { URL } = require('url');
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    const { href, origin, pathname } = new URL('https://nodejs.org/api/all.html');
    await page.goto(href);

    // Runs in the page: collect links to this same page whose #hash
    // has no element with a matching id.
    const wrongLinks = await page.evaluate((mainOrigin, mainPathname) => {
      return [...document.body.querySelectorAll('a[href]')]
        .filter(link => link.origin === mainOrigin &&
                        link.pathname === mainPathname &&
                        link.hash !== '' &&
                        // getElementById avoids querySelector throwing on (or
                        // misreading) ids that are not valid CSS selectors.
                        document.getElementById(decodeURIComponent(link.hash.slice(1))) === null)
        .map(link => `${link.innerText} : ${link.href}`)
        .join('\n');
    }, origin, pathname);

    console.log(wrongLinks);
  } finally {
    await browser.close();
  }
})();
```

Currently, it detects these links:

```
cluster.settings : https://nodejs.org/api/all.html#clustersettings
```
I was more thinking of external links when I opened this, but it's good to have checks for those relative links too. For external ones, I think a simple status code check should be enough. For the internal ones, I could see a check like the one above being part of the CI, but I don't think we can include puppeteer in the repository; it's just too heavy.
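A minimal sketch of such a status code check for a single external link, assuming Node.js 18+ (global `fetch` and `AbortSignal.timeout`). The function name `checkExternalLink`, the 10-second timeout, and the HEAD-then-GET fallback are illustrative choices, not an existing tool:

```js
'use strict';

// Hypothetical helper: fetch `url` and classify it by HTTP status.
// Assumes Node.js 18+, where `fetch` and `AbortSignal.timeout` are global.
async function checkExternalLink(url, { timeoutMs = 10000 } = {}) {
  // Try a cheap HEAD request first; some servers answer HEAD with
  // 405/501, so retry with GET in that case.
  for (const method of ['HEAD', 'GET']) {
    let res;
    try {
      res = await fetch(url, {
        method,
        redirect: 'follow',
        signal: AbortSignal.timeout(timeoutMs),
      });
    } catch (err) {
      // DNS failure, refused connection or timeout: count the link as dead.
      return { url, status: null, ok: false, error: err.message };
    }
    // Only the status matters, so discard any response body.
    if (res.body) await res.body.cancel();
    if (method === 'HEAD' && (res.status === 405 || res.status === 501)) continue;
    // Anything >= 400 counts as a dead link.
    return { url, status: res.status, ok: res.status < 400 };
  }
}
```

Following redirects means a moved page that still resolves counts as alive; a stricter check could use `redirect: 'manual'` to flag redirects for updating.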
Also, there is a tool called html-proofer that can be used for this. I use it on some of my static pages to check that all resources (images, stylesheets, etc.) exist, and it also checks for broken links on the site.
A more meticulous and tangled variant for internal link checking (both hash-only links and inter-document links within the doc site). It still uses puppeteer, so it is not suitable for the repo or CI, but it can be used locally from time to time.
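Whatever gathers the anchors (puppeteer, jsdom, or the doctool itself), the inter-document part of the check reduces to a lookup. A minimal sketch, assuming the ids of every page have already been collected into a `Map` from page file name to a `Set` of ids, and that links are bare file names such as `cluster.html#some_anchor`; `findBrokenDocLinks` and this input shape are hypothetical:

```js
'use strict';

// Hypothetical: `anchors` maps a page file name (e.g. 'fs.html') to the Set
// of element ids on that page; `links` lists the { from, href } pairs found
// on each page. Returns the links whose target page or #hash is missing.
function findBrokenDocLinks(anchors, links) {
  const broken = [];
  for (const { from, href } of links) {
    // Skip links with a scheme (https:, mailto:, ...); only site links are checked.
    if (/^[a-z][a-z0-9+.-]*:/i.test(href)) continue;
    const hashIndex = href.indexOf('#');
    const pagePart = hashIndex === -1 ? href : href.slice(0, hashIndex);
    const hash = hashIndex === -1 ? '' : decodeURIComponent(href.slice(hashIndex + 1));
    // An empty page part ('#foo') refers to the page the link is on.
    const page = pagePart === '' ? from : pagePart;
    const ids = anchors.get(page);
    if (ids === undefined) {
      broken.push({ from, href, reason: 'missing page' });
    } else if (hash !== '' && !ids.has(hash)) {
      broken.push({ from, href, reason: 'missing anchor' });
    }
  }
  return broken;
}
```

Keeping collection and validation separate means the expensive part (loading pages) can be swapped out without touching the lookup logic.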
Could we use something Node.js-based, like jsdom or cheerio, instead of Puppeteer? Puppeteer sounds like overkill to me, while cheerio might even be small enough to be bundled in core.
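To give a sense of how small a browser-free check can be, here is a sketch of the same-page hash check from the puppeteer strawman, working directly on an HTML string. To stay dependency-free it swaps in a rough regular-expression scan where cheerio or jsdom would go; a regex is not an HTML parser, so a real tool should use one of those libraries. `findWrongHashes` is a hypothetical name, and only `href="#…"` links are handled (not full same-page URLs):

```js
'use strict';

// Hypothetical, dependency-free approximation of the puppeteer check:
// collect id="..." attributes and same-page href="#..." links from an HTML
// string, then report the hashes that have no matching id.
// NOTE: regexes are not a real HTML parser; cheerio or jsdom would be sturdier.
function findWrongHashes(html) {
  const ids = new Set();
  for (const m of html.matchAll(/\sid\s*=\s*["']([^"']+)["']/gi)) {
    ids.add(m[1]);
  }
  const wrong = [];
  for (const m of html.matchAll(/\shref\s*=\s*["']#([^"']*)["']/gi)) {
    const hash = decodeURIComponent(m[1]);
    // Skip bare "#" links (back-to-top style) and report unknown targets.
    if (hash !== '' && !ids.has(hash)) wrong.push(`#${m[1]}`);
  }
  return wrong;
}
```

Since it only needs the generated HTML, a check like this could run on `make doc` output without launching a browser.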
Or, even better, a Markdown-based solution that could be integrated with doctool.
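A Markdown-based variant could work on the doc sources directly: collect the anchors generated from headings and compare them with `[text](#hash)` links. The sketch below uses a simplified, hypothetical `slugify` (lowercase, runs of other characters collapsed to `_`); it does not reproduce the doctool's real anchor scheme, which an actual integration should reuse instead:

```js
'use strict';

// Hypothetical slug rule; the real doctool has its own anchor format,
// which an actual integration should call instead of this function.
function slugify(text) {
  return text.trim().toLowerCase().replace(/[^a-z0-9]+/g, '_').replace(/^_+|_+$/g, '');
}

// Returns the same-document Markdown links ("[text](#hash)") whose hash
// does not match the slug of any heading in the document.
function findDeadMarkdownAnchors(markdown) {
  const anchors = new Set();
  let inFence = false;
  for (const line of markdown.split('\n')) {
    // Ignore '#' lines inside fenced code blocks (e.g. shell comments).
    if (/^\s*(```|~~~)/.test(line)) inFence = !inFence;
    const heading = !inFence && /^#{1,6}\s+(.+?)\s*#*\s*$/.exec(line);
    if (heading) anchors.add(slugify(heading[1]));
  }
  const dead = [];
  for (const m of markdown.matchAll(/\]\(#([^)\s]+)\)/g)) {
    if (!anchors.has(m[1])) dead.push(`#${m[1]}`);
  }
  return dead;
}
```

Working on Markdown also catches the heading-rename case mentioned earlier: renaming a heading changes its slug, and every stale `#hash` pointing at it shows up.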
@TimothyGu Someone opened a PR adding Danger as a CI/CD tool to a markdown-only project of mine to detect broken links. Maybe it would be useful to run it on doc updates? http://danger.systems/js/
There's been zero activity on this in 11 months. I recommend closing.
I wrote a Node-based tool, similar to the html-proofer API: https://github.com/timaschew/link-checker
FWIW, internal doc links are checked now (see #21889), so we only need external link validation.
Still no actual activity on this. Should we keep it open?
I agree, better to close it then.
Links in docs regularly get broken (example), and it should be possible to have a script that iterates over all links in the docs and checks that each one returns an HTTP status code below 400.
Probably not something we want to run as part of the CI, but I could see the script being run on demand regularly.
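A minimal sketch of such an on-demand script, assuming Node.js 18+ (global `fetch`). The function names, the URL regex, and the concurrency of 8 are illustrative choices; the checker is injectable so it can be swapped for a stricter one:

```js
'use strict';
const fs = require('fs/promises');
const path = require('path');

// Hypothetical on-demand script: find external URLs in the Markdown files
// under `docDir` and report those answering with HTTP status >= 400.

// Default checker; assumes Node.js 18+ (global fetch). Some servers reject
// HEAD, so a GET fallback may be needed in practice. Errors map to null.
async function headStatus(url) {
  try {
    const res = await fetch(url, {
      method: 'HEAD',
      redirect: 'follow',
      signal: AbortSignal.timeout(10000),
    });
    return res.status;
  } catch {
    return null;
  }
}

// Recursively yield the paths of all .md files below `dir`.
async function* walkMarkdown(dir) {
  for (const entry of await fs.readdir(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) yield* walkMarkdown(full);
    else if (entry.name.endsWith('.md')) yield full;
  }
}

async function findDeadExternalLinks(docDir, { check = headStatus, concurrency = 8 } = {}) {
  // Map each URL to the files mentioning it, so every URL is checked once.
  const urls = new Map();
  for await (const file of walkMarkdown(docDir)) {
    const text = await fs.readFile(file, 'utf8');
    for (const [match] of text.matchAll(/https?:\/\/[^\s)<>"'\]]+/g)) {
      const url = match.replace(/[.,;:]+$/, ''); // drop trailing sentence punctuation
      if (!urls.has(url)) urls.set(url, []);
      const files = urls.get(url);
      if (!files.includes(file)) files.push(file);
    }
  }

  // A small worker pool keeps at most `concurrency` requests in flight.
  const queue = [...urls.keys()];
  const dead = [];
  async function worker() {
    while (queue.length > 0) {
      const url = queue.shift();
      const status = await check(url);
      if (status === null || status >= 400) dead.push({ url, status, files: urls.get(url) });
    }
  }
  await Promise.all(Array.from({ length: concurrency }, () => worker()));
  return dead.sort((a, b) => a.url.localeCompare(b.url));
}
```

For example, `findDeadExternalLinks('doc/api').then(dead => console.table(dead))` would scan the API doc sources of a nodejs/node checkout. Deduplicating URLs first keeps the number of requests down, since many docs link to the same external pages.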