
Add a blocking lychee linkcheck CI job and fix broken docs links #2699

Open
pankajastro wants to merge 10 commits into main from docs/linkcheck-ci

Conversation

@pankajastro
Contributor

@pankajastro pankajastro commented May 19, 2026

Summary

Add a linkcheck CI job that catches broken links in the docs, and fix the
broken links it surfaced.

  • Add a linkcheck job to .github/workflows/docs-build.yml using the
    lychee link checker. It runs on
    pushes to main and PRs touching docs/, cosmos/, or the workflow, and
    fails the build on genuinely broken links (fail: true).
  • Add a repo-root lychee.toml configuring checker behaviour: only http/https
    schemes (docs contain example postgres://, s3://, … strings), retries and
    timeouts so throttling hosts don't make the check flaky, accept 429/503/504
    as "host up but rate-limiting", and an exclude list for URLs an automated
    checker can't verify (Slack auth wall, bot-blocked Bitnami charts, the
    your-org placeholder, flaky ossrank.com).
  • Fix the broken/stale links the checker found across the docs and several
    profile-mapping docstrings (~27 files), including the stale privacy-policy
    link.

Notes

  • Genuinely broken links are fixed in the docs rather than excluded; the
    exclude list is reserved for unverifiable or intentionally-placeholder URLs.

Test plan

  • CI: confirm the new linkcheck job appears and runs lychee against docs/**/*.rst.
  • CI: confirm it fails when a link is broken and passes once links are fixed.
  • CI: confirm the existing build job continues to pass.
  • Local: lychee --config lychee.toml 'docs/**/*.rst' passes.

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings May 19, 2026 05:02
@pankajastro pankajastro requested review from a team, corsettigyg, dwreeves and jbandoro as code owners May 19, 2026 05:02
@pankajastro pankajastro requested review from pankajkoti and tatiana May 19, 2026 05:02
@pankajastro pankajastro marked this pull request as draft May 19, 2026 05:03
Contributor

Copilot AI left a comment


Pull request overview

Adds CI infrastructure to proactively detect broken external links in the Sphinx docs without blocking merges, and introduces a documented configuration hook for future exclusions.

Changes:

  • Add linkcheck_ignore placeholder (currently empty) to docs/conf.py for future linkcheck ignore patterns.
  • Add a non-blocking linkcheck job to .github/workflows/docs-build.yml that runs sphinx-build -b linkcheck on the same triggers as the docs HTML build.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| docs/conf.py | Adds an explicit linkcheck_ignore config entry (empty placeholder) for Sphinx link checking. |
| .github/workflows/docs-build.yml | Adds a linkcheck CI job (continue-on-error) to run Sphinx’s linkcheck builder on docs-related changes. |


Add an empty `linkcheck_ignore` placeholder in docs/conf.py
and a `linkcheck` job in docs-build.yml that runs
`sphinx-build -b linkcheck` on the same triggers as the HTML
build. The job is marked continue-on-error so PRs are not
blocked while known broken external links are fixed in
separate in-review PRs; once those land it can be promoted
to required.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Turn the docs linkcheck job into a real, blocking check instead of an
informational one, and resolve every link it flagged so the job is green:

- Remove the job-level `continue-on-error` so broken links fail CI.
- Populate `linkcheck_ignore` only for destinations an automated checker
  cannot verify: the Slack invite (auth wall), ossrank (flaky TLS /
  rate-limited), and example `localhost` URLs.
- Add `linkcheck_anchors_ignore_for_url` for github.com — GitHub renders
  anchors client-side, so fragments like `#L72` / `#issuecomment-...` are
  false positives; the page itself is still verified.
- Convert hand-written `.html` cross-references to `:doc:`/`:ref:` roles so
  they resolve per-page and are validated by the HTML build (several also
  had genuinely wrong relative paths, e.g. `../profiles/index.html`).
- Fix a real 404: the dbt-fusion adapters link now points at the existing
  `crates/dbt-adapter` directory.

Verified locally: `sphinx-build -b linkcheck` exits 0 with no broken links,
and the HTML build introduces no new cross-reference warnings.
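
For illustration, the Sphinx settings named in this commit could look roughly like the docs/conf.py fragment below. `linkcheck_ignore` and `linkcheck_anchors_ignore_for_url` are real Sphinx options that take lists of regular expressions; the specific patterns and the `is_ignored` helper are assumptions, since the commit message does not quote the actual values.

```python
import re

# Hypothetical patterns: the real regexes in docs/conf.py are not quoted in this PR text.
linkcheck_ignore = [
    r"https://join\.slack\.com/.*",       # Slack invite behind an auth wall (assumed URL shape)
    r"https?://(www\.)?ossrank\.com/.*",  # flaky TLS / rate-limited
    r"https?://localhost(:\d+)?(/.*)?",   # example URLs in the docs, not real destinations
]

# Still verify that github.com pages load, but skip anchor validation for them.
linkcheck_anchors_ignore_for_url = [
    r"https://github\.com/.*",
]


def is_ignored(url: str) -> bool:
    """Illustrative helper: apply the ignore regexes from the start of the URL."""
    return any(re.match(pattern, url) for pattern in linkcheck_ignore)
```

The helper is only there to make the matching behaviour easy to try out; a real conf.py would contain just the two settings.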

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The www.astronomer.io docs site throttles automated requests from GitHub
runners and intermittently exceeds the 30s read timeout, failing the
blocking linkcheck job non-deterministically (it resolves fine locally).
Ignore the host (same rationale as ossrank) and add linkcheck retries plus
a longer timeout so transient slowness on any host doesn't flake CI.
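
A minimal sketch of what this change might look like in docs/conf.py. `linkcheck_retries` and `linkcheck_timeout` are existing Sphinx options; the numeric values and the host pattern are illustrative assumptions, not the values used in the PR.

```python
# docs/conf.py (sketch; values are assumptions)
linkcheck_retries = 3   # attempts per link before it is reported as broken
linkcheck_timeout = 60  # seconds to wait for a response, above the 30s that was being exceeded

# Assumed pattern for skipping the throttling host entirely.
linkcheck_ignore = [
    r"https://www\.astronomer\.io/.*",
]
```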

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sphinx's linkcheck builder fails the build on timed-out links just like
broken ones, and there is no way to make timeouts non-fatal. Because the host
that times out varies from run to run, the blocking job was flaky.

Replace it with the lychee link checker (lycheeverse/lychee-action),
configured via a repo-root lychee.toml that encodes "fail only on real
breakage":
- accept 429/503/504 (host up but throttling automated requests) and retry
  with lower concurrency, so transient CDN throttling (e.g. docs sites) does
  not fail the job, while 404/403/410 still do;
- only check http/https, so example connection strings (postgres://, s3://,
  ...) in code blocks are not treated as links;
- exclude loopback and a few un-checkable URLs (Slack invite auth wall,
  ossrank flaky TLS, Bitnami Helm repo bot-block) and an intentional
  your-org/... placeholder.

lychee does not check URL fragments by default, so GitHub's client-side
anchors are no longer false positives, and it uses GITHUB_TOKEN to avoid
GitHub rate limits.
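
The "fail only on real breakage" policy described above can be sketched in Python. This is not lychee's implementation, only a model of the rules the commit lists; the function names and the `EXCLUDED_PREFIXES` entry are hypothetical, and the real exclude list lives in lychee.toml.

```python
import ipaddress
from urllib.parse import urldefrag, urlsplit

CHECKED_SCHEMES = {"http", "https"}
# Statuses treated as "host is up but throttling", in addition to any 2xx.
TOLERATED_STATUSES = {429, 503, 504}
# Hypothetical exclude entry standing in for the your-org placeholder.
EXCLUDED_PREFIXES = ("https://github.com/your-org/",)


def is_loopback(host: str) -> bool:
    """True for localhost or a loopback IP literal such as 127.0.0.1 or ::1."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:  # not an IP literal, so an ordinary hostname
        return False


def should_check(url: str) -> bool:
    """Decide whether a URL found in the docs is requested at all."""
    parts = urlsplit(url)
    if parts.scheme not in CHECKED_SCHEMES:
        return False  # e.g. postgres:// or s3:// example strings
    if is_loopback(parts.hostname or ""):
        return False
    return not url.startswith(EXCLUDED_PREFIXES)


def request_target(url: str) -> str:
    """Drop the #fragment so only the page itself is verified."""
    return urldefrag(url).url


def is_broken(status: int) -> bool:
    """Fail on anything that is neither a success nor a tolerated throttling status."""
    return not (200 <= status < 300 or status in TOLERATED_STATUSES)
```

Under this model a 404, 403, or 410 response fails the job, while a 429 from a throttling docs host does not.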

Also fix a genuinely stale link: the Scarf privacy-policy comment in
docs/index.rst pointed at docs/policy/security-privacy.rst, which #2519
split — privacy now lives in the top-level PRIVACY_NOTICE.rst.

The hand-written .html cross-references converted to :doc:/:ref: in the
previous commits stay (they keep lychee from flagging relative .html paths
and are validated by the HTML build). The unused Sphinx linkcheck_* config
is removed from docs/conf.py.

Verified locally with lychee 0.24.2: 0 errors across docs/**/*.rst.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@pankajastro pankajastro changed the title from "Add linkcheck CI job and ignore-list config" to "Add a blocking lychee linkcheck CI job and fix broken docs links" Jun 2, 2026
@pankajastro pankajastro marked this pull request as ready for review June 2, 2026 16:59
Copilot AI review requested due to automatic review settings June 2, 2026 16:59
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

Comment thread docs/getting_started/how-cosmos-works.rst Outdated
Comment thread docs/reference/configs/profile-config.rst
Comment thread .github/workflows/docs-build.yml
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 2, 2026 17:04
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
The linkcheck job depends on lychee.toml, but it was not in the path
filter, so a config-only change could merge without CI coverage.
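
To show why the missing entry mattered, here is a rough Python model of a workflow path filter. The pattern list is assumed from the PR description plus the lychee.toml entry this commit adds; `fnmatchcase` only approximates GitHub's glob rules, and `triggers_linkcheck` is a hypothetical name.

```python
from fnmatch import fnmatchcase

# Assumed trigger paths: those named in the PR summary, plus lychee.toml.
TRIGGER_PATTERNS = [
    "docs/**",
    "cosmos/**",
    ".github/workflows/docs-build.yml",
    "lychee.toml",
]


def triggers_linkcheck(changed_files: list[str]) -> bool:
    """Run the job if any changed file matches any pattern in the filter."""
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in TRIGGER_PATTERNS
    )
```

Without the `"lychee.toml"` entry, a change set containing only that file would match nothing and the job would be skipped.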

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

Comment thread .github/workflows/docs-build.yml
Comment thread docs/reference/configs/profile-config.rst
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 2, 2026 17:10
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
A prior auto-generated commit dropped the `uses:` line for the
lychee-action step and left a duplicate `args:` key, producing invalid
YAML. Restore the action reference and keep the quoted glob.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 1 comment.

Comment thread .github/workflows/docs-build.yml
@codecov

codecov Bot commented Jun 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.28%. Comparing base (2d9f6f0) to head (a85730d).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2699   +/-   ##
=======================================
  Coverage   98.28%   98.28%           
=======================================
  Files         106      106           
  Lines        7913     7913           
=======================================
  Hits         7777     7777           
  Misses        136      136           


@tatiana tatiana self-assigned this Jun 6, 2026


3 participants