fix(contributors): handle GitHub secondary rate limit #18073

Merged
pettinarip merged 1 commit into dev from fetch-gh-contributors on Apr 30, 2026

Collaborator

@myelinated-wackerow commented Apr 29, 2026

Summary

The fetch-github-contributors Trigger.dev task was silently failing on a subset of paths, leaving appPages[pagePath] missing or empty in Netlify Blobs. Downstream, getAppPageLastCommitDate([]) reduces to new Date(0), which is why production pages like https://ethereum.org/what-is-ethereum/ have been showing "Page last update: January 1, 1970" plus an empty "See contributors" modal (issue #18070).
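
To illustrate why an empty contributor list renders as January 1, 1970, here is a minimal sketch. The real getAppPageLastCommitDate is not shown in this PR, so the `Contributor` shape and the reduce in `lastCommitDateSketch` are assumptions about how such a function might be written, not the project's code.

```typescript
// Hypothetical contributor shape; the real type in the repo may differ.
interface Contributor {
  login: string
  date: string // ISO 8601 commit date
}

// Assumed implementation: fold over commit dates, starting from the Unix epoch.
function lastCommitDateSketch(contributors: Contributor[]): Date {
  return contributors.reduce((latest, contributor) => {
    const date = new Date(contributor.date)
    return date.getTime() > latest.getTime() ? date : latest
  }, new Date(0))
}

// With no contributors, the initial value (the epoch) is returned unchanged.
const emptyResult = lastCommitDateSketch([])
const populatedResult = lastCommitDateSketch([
  { login: "alice", date: "2025-11-02T10:00:00Z" },
  { login: "bob", date: "2026-03-14T08:30:00Z" },
])

console.log(emptyResult.toISOString()) // 1970-01-01T00:00:00.000Z
console.log(populatedResult.toISOString()) // 2026-03-14T08:30:00.000Z
```

Any reduce seeded with new Date(0) behaves this way, which is why a failed fetch that stores an empty array surfaces as an epoch date rather than as an error.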

This PR fixes the data side. A separate, smaller PR can add UI defenses around empty contributor arrays.

Why -- research notes

The task is authenticated and was not hitting 429s, so the failure mode initially looked unlike rate limiting. It is. GitHub's REST API has two distinct rate limits:

  • Primary (5,000 req/hr authenticated). Returns 429 (or sometimes 403) with X-RateLimit-Remaining: 0 and X-RateLimit-Reset. The old code handled this case.
  • Secondary (a.k.a. abuse rate limit). Returns 403 with a Retry-After header and no X-RateLimit-* headers. Triggers include:
    • "No more than 100 concurrent requests are allowed. This limit is shared across the REST API and GraphQL API."
    • "No more than 900 points per minute are allowed for REST API endpoints" (GET = 1 point).
    • "Make requests for a single user or client ID serially. Do not make requests... concurrently."

A captured 403 response from the failing task confirms this exact shape: status 403, retry-after: 60, no X-RateLimit-Remaining. The old 403 handler keyed on X-RateLimit-Remaining === "0", so secondary-limit 403s fell through to a console.warn + return []. The empty array was then written to blob storage, and the fetcher's own if (contributors.length > 0) filter made "fetch failed" indistinguishable from "no contributors."
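
The gap described above can be sketched as follows. `oldStyleIsRateLimited` is a hypothetical reconstruction based only on this description, not the original handler, and `HeaderReader` / `headersFrom` are illustrative stand-ins for the fetch API's Headers object.

```typescript
// Minimal stand-in for the fetch API's Headers object (case-insensitive lookup).
interface HeaderReader {
  get(name: string): string | null
}

function headersFrom(record: Record<string, string>): HeaderReader {
  const lower = new Map<string, string>()
  for (const key of Object.keys(record)) {
    lower.set(key.toLowerCase(), record[key])
  }
  return {
    get: (name: string): string | null => {
      const value = lower.get(name.toLowerCase())
      return value === undefined ? null : value
    },
  }
}

// Hypothetical reconstruction of the old check: only the primary limit matches.
function oldStyleIsRateLimited(status: number, headers: HeaderReader): boolean {
  return status === 403 && headers.get("x-ratelimit-remaining") === "0"
}

// Also treats a Retry-After header on a 403 as a rate-limit signal.
function isRateLimited(status: number, headers: HeaderReader): boolean {
  return (
    oldStyleIsRateLimited(status, headers) ||
    (status === 403 && headers.get("retry-after") !== null)
  )
}

// Shape of the captured response: 403, retry-after: 60, no X-RateLimit-Remaining.
const captured = headersFrom({ "Retry-After": "60" })

console.log(oldStyleIsRateLimited(403, captured)) // false
console.log(isRateLimited(403, captured)) // true
```

With the old-style check returning false, the response is treated as a generic failure, which is the path that ended in console.warn + return [].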

The likely trigger was concurrency. BATCH_SIZE = 20 parallel requests against /repos/.../commits, multiplied by ~6 historical paths per app page from getAllHistoricalPaths (most of which 404 for the legacy src/pages/... and _-prefixed variants), pushed the task over GitHub's secondary thresholds well before the primary quota was exhausted.
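
As a rough back-of-the-envelope check, the request rate of serially executed batches can be estimated from the batch size, the delay between batches, and the per-request latency. The 250 ms latency below is an illustrative guess, not a measurement from the task, and `estimateRequestsPerMinute` is a hypothetical helper. The second call uses the BATCH_SIZE and BATCH_DELAY_MS values adopted in the Changes section.

```typescript
// Rough steady-state estimate of requests per minute for serially executed batches.
// Assumes every request in a batch takes `latencyMs` and each GET costs 1 point.
function estimateRequestsPerMinute(
  batchSize: number,
  batchDelayMs: number,
  latencyMs: number
): number {
  const batchCycleMs = latencyMs + batchDelayMs
  return (batchSize * 60000) / batchCycleMs
}

const ASSUMED_LATENCY_MS = 250 // illustrative guess, not a measured value
const SECONDARY_POINTS_PER_MINUTE = 900 // from GitHub's documented REST budget

const before = estimateRequestsPerMinute(20, 50, ASSUMED_LATENCY_MS)
const after = estimateRequestsPerMinute(2, 200, ASSUMED_LATENCY_MS)

console.log(Math.round(before), before > SECONDARY_POINTS_PER_MINUTE) // 4000 true
console.log(Math.round(after), after > SECONDARY_POINTS_PER_MINUTE) // 267 false
```

Under these assumptions the old settings exceed the 900 points/min budget several times over, while the new ones stay well below it. Fast 404 responses would shorten the cycle and raise both figures, so the numbers are indicative only.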

Changes

Scoped to src/data-layer/fetchers/fetchGitHubContributors.ts only. fetchRetry.ts and other fetchers untouched, so blast radius is limited to this task.

  • Concurrency cut. BATCH_SIZE 20 -> 2, BATCH_DELAY_MS 50 -> 200. Because parallelBatch runs batches serially, the task has at most 2 requests in flight at any point in its run, well under GitHub's 100-concurrent ceiling and 900 points/min budget. A weekly task does not need 20-wide parallelism.
  • Proper rate-limit detection. New readRateLimitWait(response) returns ms-to-wait or null. Recognizes:
    1. Primary limit via X-RateLimit-Remaining: 0 + X-RateLimit-Reset.
    2. Secondary limit via Retry-After header (delta-seconds or HTTP-date), capped at 5 minutes.
    3. Secondary limit via response-body sniff (/secondary rate limit|abuse/i) when the header is omitted, defaulting to 60 s per GitHub's guidance.
  • Bounded retries. fetchCommitsForPath takes an attempt counter, retries rate-limited responses up to 3 times, then throws. Avoids runaway loops on persistent limits.
  • Loud failures on truly unexpected errors. Non-OK statuses other than 404 or rate limits now throw with status, status text, and the first 300 chars of the response body. Previously these were console.warn + return [], which silently corrupted blob storage. 404 still returns [] (expected for legacy paths from getAllHistoricalPaths).
  • Small refactors that fell out: extracted commit transform into transformCommits with a typed GitHubCommit interface, and pulled the repeated Authorization + Accept headers into a githubApiHeaders() helper used by both fetchCommitsForPath and discoverPathsFromTree.
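
The detection and retry rules listed above can be sketched roughly as below. This is not the code from the PR: the names `readRateLimitWaitSketch` and `decideNextStep`, the parameter lists, the `HeaderReader` stand-in, and the choice to accept 429 as well as 403 and to clamp every wait to 5 minutes are assumptions made for illustration. The retry decision is written as a pure function so it can be read without the surrounding fetch loop.

```typescript
// Minimal stand-in for the fetch API's Headers object.
interface HeaderReader {
  get(name: string): string | null
}

const MAX_WAIT_MS = 5 * 60 * 1000 // 5-minute cap per wait
const DEFAULT_SECONDARY_WAIT_MS = 60 * 1000 // fallback when no header is present
const MAX_RETRIES = 3

const clampWait = (ms: number): number => Math.min(Math.max(ms, 0), MAX_WAIT_MS)

// Returns milliseconds to wait, or null when the response is not rate limited.
function readRateLimitWaitSketch(
  status: number,
  headers: HeaderReader,
  bodyText: string,
  nowMs: number = Date.now()
): number | null {
  if (status !== 403 && status !== 429) return null

  // 1. Primary limit: remaining is 0 and reset is a Unix timestamp in seconds.
  const reset = Number(headers.get("x-ratelimit-reset"))
  if (headers.get("x-ratelimit-remaining") === "0" && Number.isFinite(reset) && reset > 0) {
    return clampWait(reset * 1000 - nowMs)
  }

  // 2. Secondary limit: Retry-After as delta-seconds or an HTTP-date.
  const retryAfter = headers.get("retry-after")
  if (retryAfter !== null && retryAfter.trim() !== "") {
    const seconds = Number(retryAfter)
    if (Number.isFinite(seconds)) return clampWait(seconds * 1000)
    const dateMs = Date.parse(retryAfter)
    if (!Number.isNaN(dateMs)) return clampWait(dateMs - nowMs)
  }

  // 3. Secondary limit without the header: sniff the body, default to 60 s.
  if (/secondary rate limit|abuse/i.test(bodyText)) return DEFAULT_SECONDARY_WAIT_MS
  return null
}

type NextStep =
  | { kind: "ok" }
  | { kind: "empty" } // 404: expected for legacy paths
  | { kind: "retry"; waitMs: number }

// Decides what to do with one response; throws instead of returning [] on real failures.
function decideNextStep(
  status: number,
  statusText: string,
  headers: HeaderReader,
  bodyText: string,
  attempt: number, // number of retries already performed, starting at 0
  nowMs: number = Date.now()
): NextStep {
  if (status >= 200 && status < 300) return { kind: "ok" }
  if (status === 404) return { kind: "empty" }

  const waitMs = readRateLimitWaitSketch(status, headers, bodyText, nowMs)
  if (waitMs !== null) {
    if (attempt >= MAX_RETRIES) {
      throw new Error(`Rate limited after ${attempt} retries (status ${status})`)
    }
    return { kind: "retry", waitMs }
  }
  throw new Error(`GitHub API error ${status} ${statusText}: ${bodyText.slice(0, 300)}`)
}
```

A caller would sleep for `waitMs` on a "retry" result and fetch again with `attempt + 1`. Because only the 404 branch maps to an empty result, a persistent limit or an unexpected status surfaces as a failed task run rather than as an empty array in blob storage.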

What this does NOT change

  • fetchRetry.ts -- intentionally untouched. All 403 handling is inline in this fetcher.
  • getAllHistoricalPaths fan-out -- the speculative legacy paths (src/pages/..., _-prefixed variants) remain. They exist to capture pre-App-Router commit history for files that no longer exist at HEAD, so filtering them against the current git tree would lose contributor data. At BATCH_SIZE=2 the 404 cost is negligible.
  • UI defenses around empty contributor arrays (proposed in #18070) -- worth doing, separate PR.
  • fetchNameLookup's graceful empty-Map fallback -- not on the rate-limit hot path.

Test plan

  • Trigger the fetch-github-contributors task manually in Trigger.dev preview/staging.
  • Confirm task completes without throwing and writes a populated appPages map (no missing keys for current App Router pages).
  • Spot-check that "what-is-ethereum", "what-is-ether", "gas", "learn", "roadmap/vision" all have contributor entries.
  • Verify task logs show no Rate limited on ... retries during a normal run, or if retries occur they recover within bounds.
  • Deploy to a preview env and confirm https://deploy-preview.../what-is-ethereum/ shows a real "Page last update" date and a populated contributors modal.
  • Watch the next scheduled production run to confirm successful completion and current blob freshness.

Related issues

Generated by Claude Opus 4.7

The fetch-github-contributors task was hitting GitHub's secondary rate limit (returned as 403 with Retry-After) on parallel commits API requests, but the existing 403 handler only recognized the primary rate limit (X-RateLimit-Remaining=0 + X-RateLimit-Reset). Secondary-limit 403s fell through to a console.warn and an empty-array return, which was then written to blob storage as a missing entry; consumers fell back to new Date(0) and shipped "Page last update: January 1, 1970" plus an empty contributors modal in production.

Drops BATCH_SIZE from 20 to 2 (GitHub's guidance is serial-per-token; 2 keeps us comfortably under both the 100-concurrent ceiling and the 900-points/min budget) and bumps BATCH_DELAY_MS from 50 to 200. Adds readRateLimitWait that detects primary limits, secondary limits via Retry-After, and the body-message fallback when the header is omitted. Adds a bounded retry loop (3 attempts, 5-minute cap per wait) and replaces the silent empty-array return on unexpected non-OK statuses with a thrown error so a real failure surfaces as a failed task run rather than corrupting the blob.

Refs #18070.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>

netlify Bot commented Apr 29, 2026

Deploy Preview for ethereumorg ready!

🔨 Latest commit: 8aabdb3
🔍 Latest deploy log: https://app.netlify.com/projects/ethereumorg/deploys/69f33a60da204f086ecd2d0d
😎 Deploy Preview: https://deploy-preview-18073.ethereum.it
Lighthouse
7 paths audited
Performance: 72 (🟢 up 23 from production)
Accessibility: 93 (no change from production)
Best Practices: 100 (no change from production)
SEO: 98 (🔴 down 1 from production)
PWA: 59 (no change from production)

Member

@pettinarip left a comment

🚀

@pettinarip merged commit 68f8a70 into dev Apr 30, 2026
11 checks passed
@pettinarip deleted the fetch-gh-contributors branch April 30, 2026 11:53
@wackerow mentioned this pull request May 1, 2026

Development

Successfully merging this pull request may close these issues.

App Router pages show "Page last update: January 1, 1970" and empty "See contributors" modal

2 participants