Add a blocking lychee linkcheck CI job and fix broken docs links#2699
Open
pankajastro wants to merge 10 commits into
Pull request overview
Adds CI infrastructure to proactively detect broken external links in the Sphinx docs without blocking merges, and introduces a documented configuration hook for future exclusions.
Changes:
- Add `linkcheck_ignore` placeholder (currently empty) to `docs/conf.py` for future linkcheck ignore patterns.
- Add a non-blocking `linkcheck` job to `.github/workflows/docs-build.yml` that runs `sphinx-build -b linkcheck` on the same triggers as the docs HTML build.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `docs/conf.py` | Adds an explicit `linkcheck_ignore` config entry (empty placeholder) for Sphinx link checking. |
| `.github/workflows/docs-build.yml` | Adds a `linkcheck` CI job (`continue-on-error`) to run Sphinx’s linkcheck builder on docs-related changes. |
This was referenced May 19, 2026
b9fc52c to
2f733f8
Compare
Add an empty `linkcheck_ignore` placeholder in docs/conf.py and a `linkcheck` job in docs-build.yml that runs `sphinx-build -b linkcheck` on the same triggers as the HTML build. The job is marked continue-on-error so PRs are not blocked while known broken external links are fixed in separate in-review PRs; once those land it can be promoted to required. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2f733f8 to
e3fe473
Compare
Turn the docs linkcheck job into a real, blocking check instead of an informational one, and resolve every link it flagged so the job is green:

- Remove the job-level `continue-on-error` so broken links fail CI.
- Populate `linkcheck_ignore` only for destinations an automated checker cannot verify: the Slack invite (auth wall), ossrank (flaky TLS / rate-limited), and example `localhost` URLs.
- Add `linkcheck_anchors_ignore_for_url` for github.com. GitHub renders anchors client-side, so fragments like `#L72` / `#issuecomment-...` are false positives; the page itself is still verified.
- Convert hand-written `.html` cross-references to `:doc:`/`:ref:` roles so they resolve per-page and are validated by the HTML build (several also had genuinely wrong relative paths, e.g. `../profiles/index.html`).
- Fix a real 404: the dbt-fusion adapters link now points at the existing `crates/dbt-adapter` directory.

Verified locally: `sphinx-build -b linkcheck` exits 0 with no broken links, and the HTML build introduces no new cross-reference warnings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
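For readers unfamiliar with the Sphinx options this commit names, here is a minimal sketch of what such a `docs/conf.py` fragment could look like. The regular expressions (including the `join.slack.com` host) are hypothetical, since the commit message names the destinations but not the exact patterns, and `is_ignored` is an illustrative helper that mimics matching each pattern against the start of the URL; it is not part of Sphinx.

```python
import re

# Hypothetical patterns: the commit names the destinations, not the regexes.
linkcheck_ignore = [
    r"https://join\.slack\.com/.*",        # Slack invite sits behind an auth wall
    r"https?://(www\.)?ossrank\.com/.*",   # flaky TLS / rate-limited
    r"https?://localhost(:\d+)?(/.*)?",    # example URLs, not real destinations
]

# Skip fragment validation for GitHub pages; the page URL itself is still checked.
linkcheck_anchors_ignore_for_url = [
    r"https://github\.com/.*",
]


def is_ignored(url: str) -> bool:
    """Illustrative helper: True if any ignore pattern matches the start of the URL."""
    return any(re.match(pattern, url) for pattern in linkcheck_ignore)
```

Keeping the ignore list limited to destinations that cannot be verified automatically means a genuinely broken link on any other host still fails the job.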
The www.astronomer.io docs site throttles automated requests from GitHub runners and intermittently exceeds the 30s read timeout, failing the blocking linkcheck job non-deterministically (it resolves fine locally). Ignore the host (same rationale as ossrank) and add linkcheck retries plus a longer timeout so transient slowness on any host doesn't flake CI. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
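As a rough illustration of the settings this commit describes, the fragment below uses Sphinx's `linkcheck_retries` and `linkcheck_timeout` options. The numeric values and the exact host pattern are assumptions chosen for the example; the commit message does not state them.

```python
# docs/conf.py fragment; values are illustrative assumptions, not taken from the PR.
linkcheck_retries = 3    # attempts per link before it is reported as broken
linkcheck_timeout = 60   # seconds to wait for a response (longer than the 30s that was exceeded)

# In a real conf.py this entry would be appended to the existing ignore list.
linkcheck_ignore = [
    r"https://www\.astronomer\.io/.*",  # host throttles requests from CI runners
]
```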
Sphinx's linkcheck builder fails the build on timed-out links just like broken ones, and there is no way to make timeouts non-fatal. Which host times out varies run to run, so the blocking job was flaky.

Replace it with the lychee link checker (lycheeverse/lychee-action), configured via a repo-root lychee.toml that encodes "fail only on real breakage":

- accept 429/503/504 (host up but throttling automated requests) and retry with lower concurrency, so transient CDN throttling (e.g. docs sites) does not fail the job, while 404/403/410 still do;
- only check http/https, so example connection strings (postgres://, s3://, ...) in code blocks are not treated as links;
- exclude loopback and a few un-checkable URLs (Slack invite auth wall, ossrank flaky TLS, Bitnami Helm repo bot-block) and an intentional your-org/... placeholder.

lychee does not check URL fragments by default, so GitHub's client-side anchors are no longer false positives, and it uses GITHUB_TOKEN to avoid GitHub rate limits.

Also fix a genuinely stale link: the Scarf privacy-policy comment in docs/index.rst pointed at docs/policy/security-privacy.rst, which #2519 split; privacy now lives in the top-level PRIVACY_NOTICE.rst.

The hand-written .html cross-references converted to :doc:/:ref: in the previous commits stay (they keep lychee from flagging relative .html paths and are validated by the HTML build). The unused Sphinx linkcheck_* config is removed from docs/conf.py.

Verified locally with lychee 0.24.2: 0 errors across docs/**/*.rst.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
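The "fail only on real breakage" policy described in this commit can be sketched in a few lines of Python. This is not lychee's implementation and does not use its configuration format; the names `CHECKED_SCHEMES`, `TOLERATED_STATUSES`, `should_check`, and `is_failure` are hypothetical and exist only to make the decision rule concrete.

```python
from urllib.parse import urlsplit

# Only these schemes are treated as links worth checking.
CHECKED_SCHEMES = {"http", "https"}

# Statuses read as "host is up but throttling or overloaded", so not a failure.
TOLERATED_STATUSES = {429, 503, 504}


def should_check(url: str) -> bool:
    """Skip example connection strings such as postgres:// or s3://."""
    return urlsplit(url).scheme.lower() in CHECKED_SCHEMES


def is_failure(status: int) -> bool:
    """Return True for statuses that should fail the job.

    Assumes redirects were already followed, so `status` is the final response.
    """
    if 200 <= status < 300:
        return False
    return status not in TOLERATED_STATUSES
```

Under this rule a 404, 403, or 410 fails the check, while a 429 from a rate-limiting docs host does not.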
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
The linkcheck job depends on lychee.toml, but it was not in the path filter, so a config-only change could merge without CI coverage. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
A prior auto-generated commit dropped the `uses:` line for the lychee-action step and left a duplicate `args:` key, producing invalid YAML. Restore the action reference and keep the quoted glob. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2699   +/-   ##
=======================================
  Coverage   98.28%   98.28%
=======================================
  Files         106      106
  Lines        7913     7913
=======================================
  Hits         7777     7777
  Misses        136      136
☔ View full report in Codecov by Sentry.
Summary
Add a `linkcheck` CI job that catches broken links in the docs, and fix the broken links it surfaced.

- Add a `linkcheck` job to `.github/workflows/docs-build.yml` using the lychee link checker. It runs on pushes to `main` and PRs touching `docs/`, `cosmos/`, or the workflow, and fails the build on genuinely broken links (`fail: true`).
- Add `lychee.toml` configuring checker behaviour: only http/https schemes (docs contain example `postgres://` / `s3://` / … strings), retries and timeouts so throttling hosts don't make the check flaky, accept 429/503/504 as "host up but rate-limiting", and an `exclude` list for URLs an automated checker can't verify (Slack auth wall, bot-blocked Bitnami charts, the `your-org` placeholder, flaky ossrank.com).
- Fix the broken links it surfaced in the docs and profile-mapping docstrings (~27 files), including the stale privacy-policy link.

Notes
- The `exclude` list is reserved for unverifiable or intentionally-placeholder URLs.

Test plan
- `linkcheck` job appears and runs lychee against `docs/**/*.rst`.
- `build` job continues to pass.
- `lychee --config lychee.toml 'docs/**/*.rst'` passes.

🤖 Generated with Claude Code