-
Notifications
You must be signed in to change notification settings - Fork 22.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Broken URL fragments in the repo #25421
Comments
Note that if you group them by broken fragments (rather than by page with a broken link), we can look for one and solve all occurrences, and the list can be resolved more quickly. |
But then we'll get multiple commits on each file :) |
As long as we don't do one commit per flaw… I think this is ok: there is about 20 times the same broken link to HTTP/Cache#varying… If all are fixed in the same PR, it is quick to review, and I don't mind that the other flaws in these files are solved separately. |
@OnkarRuikar: I don't think this issue can be kept up to date. Can you share the script used to generate the gists and every time we try to fix something we can run the script? |
We don't have to. The current script prevents new occurrences from happening. The gist still hold true, e.g. |
What about anchors that happen to be fixed, though? For example, the very first item, "content/files/en-us/glossary/script-supporting_element/index.md Web/HTML/Kinds_of_HTML_content#script-supporting_elements", was fixed in #28000. |
Anyway I'll drop my script used in https://github.com/orgs/mdn/discussions/279: import fs from "node:fs/promises";
import path from "node:path";
import { load } from "cheerio";
async function* getFiles(dir) {
const dirents = await fs.readdir(dir, { withFileTypes: true });
for (const dirent of dirents) {
const res = path.join(dir, dirent.name);
if (dirent.isDirectory()) {
yield* getFiles(res);
} else if (res.endsWith("plain.html")) {
yield Promise.all([res, fs.readFile(res, "utf8")]);
}
}
}
const files = getFiles("build");
/** @type {Record<string, string[]>} */
const pathToIds = { __proto__: null };
/** @type {Record<string, string[]>} */
const pathToLinks = { __proto__: null };
for await (const [filename, content] of files) {
const $ = load(content);
pathToLinks[filename] = $("a")
.map((i, el) => $(el).attr("href"))
.get()
.map(
(url) =>
new URL(
url.startsWith("#")
? `${filename.replace(/build\/|\/plain.html/g, "")}${url}`
: url,
"https://developer.mozilla.org"
)
)
.filter(
(url) =>
url.hostname === "developer.mozilla.org" &&
url.hash &&
url.hash !== "#languages-switcher-button"
);
pathToIds[filename] = $("[id]")
.map((i, el) => $(el).attr("id"))
.get();
}
for (const [filename, urls] of Object.entries(pathToLinks)) {
const failures = [];
for (const url of urls) {
const referencedPath = path.join(
"build",
url.pathname
.replaceAll("*", "_star_")
.replaceAll("::", "_doublecolon_")
.replaceAll(":", "_colon_")
.replaceAll("?", "_question_")
.toLowerCase(),
"plain.html"
);
const ids = pathToIds[referencedPath];
if (ids) {
const target = ids.find((id) => id === decodeURI(url.hash.slice(1)));
if (!target) {
failures.push(
`Anchor not found: ${url
.toString()
.replace("https://developer.mozilla.org", "")}`
);
}
} else if (
!["/", "/en-US/docs/MDN/Contribute/Feedback", "/docs/MDN/About"].includes(
url.pathname
)
) {
failures.push(
`Page not found: ${url
.toString()
.replace("https://developer.mozilla.org", "")}`
);
}
}
if (failures.length) {
const sourceFile = path.resolve(
filename
.replace("build/en-us/docs", "files/en-us")
.replace("plain.html", "index.md")
);
console.log(`In ${sourceFile}:
${failures.map((e) => ` - ${e}`).join("\n")}
`);
}
} |
The script doesn't seem to handle definition list anchors:
I've updated the broken fragment data in OP. As per my script there seems only 617 broken URL fragments left. The script runs on custom platform, which i developed years ago. It'll take some effort to make is stand alone. @bsmth We've already plugged the hole so there aren't going to be new broken frags. For existing ~617 ones we need to create a project and a online spreadsheet to track them while fixing them. |
I don't see those problems in my output. Make sure you have built using the latest content. Also, the pre-commit hook isn't able to catch anchor errors generated from macros. |
I built it in yari. Doing it in content works.
Yes, for macros the entire content needs to be built. Witch won't be favourable before every commit or in GitHub workflows. |
Closing in favor of mdn/mdn#561 |
good idea, thanks! |
There are many broken URL fragments in
mdn/content
.Almost all the time when a header(
## header
) gets modified, contributors don't update related fragments all over the repo.Following is the list of broken fragments data:
https://gist.github.com/OnkarRuikar/fa14827682a37f04f23c49189bfa0701
The text was updated successfully, but these errors were encountered: