fix(host-service): fetch PRs per-branch to avoid 504 on large repos #4268

Merged
Kitenite merged 10 commits into superset-sh:main from ruangustavo:fix-pr-badge-graphql-time
May 10, 2026

Conversation

Contributor

@ruangustavo ruangustavo commented May 8, 2026

Summary

  • Replace the single pullRequests(first: 100) GraphQL query with per-branch pullRequests(headRefName: $branch, first: 1) queries, scoped per workspace branch.
  • Make refresh failure-isolating: a 504/rate-limit on one branch no longer blanks every workspace's PR badge — the existing pullRequestId is preserved and the next tick retries.
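
For orientation, a per-branch query of the kind described above might look like the sketch below. Only the constant name `PULL_REQUEST_FOR_BRANCH_QUERY` and the `headRefName` / `first: 1` arguments come from this description. The selected fields, the `orderBy` clause, and the `buildBranchQueryVariables` helper are assumptions made for illustration and are not taken from the actual diff.

```typescript
// Hypothetical reconstruction: the real field selection is not shown in this PR description.
const PULL_REQUEST_FOR_BRANCH_QUERY = /* GraphQL */ `
  query PullRequestForBranch($owner: String!, $name: String!, $branch: String!) {
    repository(owner: $owner, name: $name) {
      pullRequests(
        headRefName: $branch
        first: 1
        orderBy: { field: UPDATED_AT, direction: DESC }
      ) {
        nodes {
          number
          title
          state
          url
          headRefName
          headRepositoryOwner { login }
          headRepository { name }
        }
      }
    }
  }
`;

interface BranchQueryVariables {
  owner: string;
  name: string;
  branch: string;
}

// Hypothetical helper: builds the variables for one workspace branch.
function buildBranchQueryVariables(
  owner: string,
  name: string,
  branch: string,
): BranchQueryVariables {
  return { owner, name, branch };
}
```

Each workspace branch then costs one small request, so a timeout on one request can only affect that branch.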

Why / Context

On large repos (e.g. Monest-Eng/monest-backend) the existing query — fetch all PRs sorted by updated_at — frequently returned GraphQL 504 or tripped GitHub abuse-detection (403). Symptoms in the desktop UI:

  • PR badges in the sidebar disappeared on every refresh tick.
  • Because failures are cached for the full TTL (by design, for stability), the badges stayed gone for ~20s at a time.
  • A single bad branch (or transient outage) blanked badges for unrelated workspaces in the same repo.

How It Works

  • Per-branch GraphQL: PULL_REQUEST_FOR_BRANCH_QUERY filters with headRefName: $branch, cost-light vs. paginating 100 PRs. Util renamed fetchRepositoryPullRequests → fetchPullRequestForBranch.
  • Cache key changed from owner/repo → owner/repo#branch. One failing branch can no longer poison resolution for the rest. Failed promises are still cached for the full TTL to avoid retry storms (existing semantics preserved).
  • Three-way refresh semantics in performProjectRefresh:
    • matched → upsert PR row, set pullRequestId.
    • no-match (resolved-null OR head-identity mismatch) → clear pullRequestId.
    • failed (Promise.allSettled rejection) → preserve existing pullRequestId. Transient 504 must not blank the badge.
  • Fork guard: pullRequests(headRefName: …) filters by branch name only on the base repo. Fork PRs share branch names; we verify headRepositoryOwner.login and headRepository.name match the workspace upstream before accepting the node.
  • Cache eviction scoped by project repo prefix so refreshing one project doesn't evict cache entries belonging to other projects we haven't refreshed yet.
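
The three-way refresh semantics listed above can be sketched roughly as follows. This is a minimal illustration, not the code in performProjectRefresh: the names `BranchOutcome`, `classifySettled`, `nextPullRequestId`, and `resolveBranches` are hypothetical, and the lookup is assumed to resolve to a PR id, or to null when nothing matches after the fork guard.

```typescript
// Hypothetical types and names; only the matched / no-match / failed split comes from the PR.
type BranchOutcome =
  | { kind: "matched"; pullRequestId: string }
  | { kind: "no-match" }
  | { kind: "failed" };

// A rejected lookup (504, rate limit) is "failed"; a resolved null is "no-match".
function classifySettled(result: PromiseSettledResult<string | null>): BranchOutcome {
  if (result.status === "rejected") return { kind: "failed" };
  return result.value === null
    ? { kind: "no-match" }
    : { kind: "matched", pullRequestId: result.value };
}

// matched -> set, no-match -> clear, failed -> keep whatever the workspace already had.
function nextPullRequestId(existing: string | null, outcome: BranchOutcome): string | null {
  switch (outcome.kind) {
    case "matched":
      return outcome.pullRequestId;
    case "no-match":
      return null;
    case "failed":
      return existing;
  }
}

// Fire all branch lookups in parallel; one rejection never rejects the whole batch.
async function resolveBranches(
  branches: string[],
  lookup: (branch: string) => Promise<string | null>,
): Promise<Map<string, BranchOutcome>> {
  const settled = await Promise.allSettled(branches.map((branch) => lookup(branch)));
  const outcomes = new Map<string, BranchOutcome>();
  branches.forEach((branch, index) => {
    const result = settled[index];
    if (result) outcomes.set(branch, classifySettled(result));
  });
  return outcomes;
}
```

Keeping "failed" distinct from "no-match" is what lets a transient error leave the badge alone while a clean empty response still clears it.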

Manual QA Checklist

Refresh behaviour

  • Workspaces in a healthy repo show correct PR badges.
  • Workspaces in a large repo (e.g. Monest-Eng/monest-backend) show badges without 504s.
  • On simulated transient failure, existing PR badges remain (don't blank).
  • After failure resolves, next tick repopulates badges.
  • Closing/merging a PR clears the badge on the next refresh.
  • A workspace whose branch points to a fork (different owner/repo) is not matched against a base-repo PR with the same branch name.

Cache

  • Two workspaces on the same branch share one in-flight fetch (no duplicate GraphQL call).
  • Refreshing project A doesn't evict cache entries for project B.
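
The cache behaviour checked above (shared in-flight fetch, failures cached for the TTL, eviction limited to one repo) could be modelled along these lines. This is a sketch under assumptions: the class `BranchPullRequestCache`, its methods, and the injected clock are hypothetical, and "stale" is assumed to mean "branch no longer used by any workspace in the project". Only the owner/repo#branch key shape and the 20s TTL come from the PR text.

```typescript
interface CacheEntry<T> {
  promise: Promise<T>;
  expiresAt: number;
}

// Key shape from the PR description: owner/repo#branch.
function branchCacheKey(owner: string, repo: string, branch: string): string {
  return `${owner}/${repo}#${branch}`;
}

// Hypothetical cache: stores promises, so concurrent callers share one in-flight fetch.
class BranchPullRequestCache<T> {
  private readonly entries = new Map<string, CacheEntry<T>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string, fetcher: () => Promise<T>): Promise<T> {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.promise;
    const promise = fetcher();
    // A rejected promise stays cached until the TTL expires (no retry storm).
    // The no-op handler only prevents an unhandled-rejection warning.
    promise.catch(() => undefined);
    this.entries.set(key, { promise, expiresAt: this.now() + this.ttlMs });
    return promise;
  }

  has(key: string): boolean {
    return this.entries.has(key);
  }

  // Evict only keys under this repo's prefix whose branch is no longer active.
  evictStale(owner: string, repo: string, activeBranches: Set<string>): void {
    const prefix = `${owner}/${repo}#`;
    for (const key of Array.from(this.entries.keys())) {
      if (key.startsWith(prefix) && !activeBranches.has(key.slice(prefix.length))) {
        this.entries.delete(key);
      }
    }
  }
}
```

Injecting the clock keeps TTL expiry testable without real timers.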

Testing

  • bun run typecheck ✓ (turbo cache hit)
  • bun run lint ✓ (Biome, 4352 files, 0 issues)
  • bun test packages/host-service/src/runtime/pull-requests/pull-requests.test.ts ✓ — 11 pass / 0 fail / 34 expects covering:
    • multi-workspace test harness (PR refresh)
    • hardened harness fragility note + correct multi-PR upsert
    • per-branch routing + same-branch dedup
    • failure-isolation (one branch's 504 doesn't affect siblings)
    • fork-mismatch (branch-name collision with base repo doesn't false-match)

Design Decisions

  • Per-branch fetch over single repo fetch with first: 100: Per-branch costs O(workspaces) GraphQL points, but each query is cheap and failures are isolated. The previous "one fetch for all" looks cheaper in the nominal case, but its tail (504, 403 abuse) was the production failure mode.
  • Preserve pullRequestId on fetch failure instead of clearing: Transient infra blips shouldn't show "no PR" and trigger UI flicker. Stale-but-correct beats correct-but-flickery.

Risks / Rollout

  • Risk: Per-branch fetches multiply GraphQL points usage on repos with many workspaces. Per-branch cache + 20s TTL bounds this to one query per (repo, branch) per refresh window. No primary-rate-limit impact observed in tests.
  • Rollout: Ships in next desktop build; no migration required, no flag.
  • Rollback: Single revert of this PR restores the previous "fetch all" path. Cache state is in-memory only; no on-disk schema or persisted format changed.

Summary by cubic

Fixes PR badge timeouts and flicker on large GitHub repos by fetching PRs per-branch with failure isolation and better fork matching. Badges stay stable under 504s and when multiple forks share the same branch. Addresses #4246.

  • Bug Fixes
    • Replace repo-wide query with per-branch pullRequests(headRefName: $branch, first: 10) and match by head owner/repo case-insensitively.
    • Cache per owner/repo#branch (20s TTL), dedup same-branch requests, and keep failed promises cached to avoid retry storms.
    • Failure isolation: matched → link; no-match → clear; head lookup failed → keep existing pullRequestId (tracked per-branch).
    • Handle multi-fork branch collisions by picking the candidate whose head matches the workspace’s upstream.
    • Evict stale branch cache entries scoped to the project’s repo prefix.
    • Tests cover per-branch routing/dedup, lookup-failure preservation, fork mismatch, multi-fork collisions, and case-insensitive matching.
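
The multi-fork selection summarized above can be illustrated with a small sketch: given up to 10 candidate nodes for a branch name, pick the one whose head repository matches the workspace's upstream, comparing case-insensitively. The review comments mention an `upstreamKey()` helper, but its body here is an assumption, and `PullRequestCandidate` and `pickCandidateForUpstream` are hypothetical names.

```typescript
// Subset of the GitHub GraphQL PullRequest fields relevant to the fork guard.
interface PullRequestCandidate {
  number: number;
  headRepositoryOwner: { login: string } | null;
  headRepository: { name: string } | null;
}

// Assumed implementation: normalise owner/repo so casing drift does not break matching.
function upstreamKey(owner: string, name: string): string {
  return `${owner}/${name}`.toLowerCase();
}

// Returns the first candidate whose head repo is the workspace's upstream, or null.
function pickCandidateForUpstream(
  candidates: PullRequestCandidate[],
  target: { owner: string; name: string },
): PullRequestCandidate | null {
  const wanted = upstreamKey(target.owner, target.name);
  for (const candidate of candidates) {
    const owner = candidate.headRepositoryOwner?.login;
    const name = candidate.headRepository?.name;
    // A deleted fork has no head repository; it can never match.
    if (owner === undefined || name === undefined) continue;
    if (upstreamKey(owner, name) === wanted) return candidate;
  }
  return null;
}
```

With first: 1 the wrong fork's PR could shadow the right one. Scanning several candidates means a null result indicates that none of the fetched PRs belongs to this upstream.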

Written for commit 9e8ff14. Summary will update on new commits.

Summary by CodeRabbit

  • Bug Fixes
    • Improved PR linking resilience: when upstream PR data cannot be fetched, the system now preserves existing PR links instead of clearing them, reducing data loss during refresh operations.

Ruan Gustavo Araujo da Silveira added 7 commits May 8, 2026 11:16
… repos

PullRequestsForSidebar timed out (504) on repos with many PRs because it
materialized 100 PRs x 50 status contexts per refresh. Replace with a
per-branch query keyed by headRefName, fired in parallel via
Promise.allSettled with per-branch caching. Failure of one branch no
longer poisons PR resolution for unrelated workspaces.

Refs superset-sh#4246
Contributor

coderabbitai Bot commented May 8, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2796ef4b-8bda-4fcd-9879-1b45927639e2

📥 Commits

Reviewing files that changed from the base of the PR and between 6588f0a and 9e8ff14.

📒 Files selected for processing (2)
  • packages/host-service/src/runtime/pull-requests/pull-requests.test.ts
  • packages/host-service/src/runtime/pull-requests/pull-requests.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/host-service/src/runtime/pull-requests/pull-requests.test.ts

📝 Walkthrough

The PR refines pull request fetch resilience by tracking failed upstream-key attempts separately from successful matches. fetchRepoPullRequests now returns both a matched PR map and a failedKeys set; performProjectRefresh applies three-way workspace linking: set pullRequestId on upstream-key match, clear when unmatched, or preserve the existing value when that key's fetch failed.

Changes

Preserve existing pullRequestId when upstream-key fetch fails

| Layer / File(s) | Summary |
| --- | --- |
| Fetch return type (packages/host-service/src/runtime/pull-requests/pull-requests.ts) | fetchRepoPullRequests signature changes from returning a single Map<string, PullRequest> to returning { matched: Map, failedKeys: Set } to distinguish successful lookups from fetch errors. |
| Error tracking during fetch (packages/host-service/src/runtime/pull-requests/pull-requests.ts) | Catch blocks during per-upstream-key head resolution now add keys to failedKeys when fetch or validation errors occur, enabling later preservation logic. |
| Return both results (packages/host-service/src/runtime/pull-requests/pull-requests.ts) | Function returns { matched, failedKeys } instead of the match map alone. |
| Three-way workspace linking (packages/host-service/src/runtime/pull-requests/pull-requests.ts) | performProjectRefresh now receives both matched and failedKeys, then applies conditional per-workspace logic: set pullRequestId on match, clear to null when no match exists, or preserve existing pullRequestId when the upstream-key is in failedKeys. |
| Regression test (packages/host-service/src/runtime/pull-requests/pull-requests.test.ts) | New test case confirms pullRequestId is preserved when GitHub API calls throw during head lookup, validating the failure-preservation path. |

🎯 2 (Simple) | ⏱️ ~12 minutes

🐰 A PR that learns from its failures—
When lookups stumble, we hold our ground,
Preserve the bond instead of loss,
Three paths forward, wisely found. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main change: switching to per-branch PR fetches to avoid 504 errors on large repos, which aligns with the core objective of the changeset. |
| Description check | ✅ Passed | The PR description comprehensively covers all template sections including detailed context, implementation, testing results, design decisions, and risks. However, it lacks explicit Related Issues and Type of Change sections from the template. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Contributor

greptile-apps Bot commented May 8, 2026

Greptile Summary

This PR replaces a single pullRequests(first: 100) GraphQL query per repo with a per-branch pullRequests(headRefName: $branch, first: 1) query, scoped to each workspace's upstream branch. It also introduces three-way refresh semantics so transient 504/rate-limit failures preserve existing PR badges instead of blanking them.

  • Per-branch GraphQL: Cache key is now owner/repo#branch; a single failing branch no longer poisons resolution for sibling workspaces in the same repo.
  • Failure isolation via Promise.allSettled: failed branches add their key to failedKeys, leaving the workspace's pullRequestId untouched; only clean no-match responses clear the badge.
  • Fork guard: After fetching, the headRepositoryOwner.login and headRepository.name on the returned node are verified against the workspace's upstream before accepting a match, preventing branch-name collisions with the base repo from false-linking fork workspaces.

Confidence Score: 4/5

Safe to merge for the common case; all workspaces pointing to the base repo or a single fork behave correctly.

The three-way refresh semantics and per-branch cache isolation are sound. The one weak spot is that getCachedBranchPullRequest always queries the base repo and caches results under projectRepo#branch, so two workspaces in the same project pointing to different fork repos but sharing the same branch name compete for one cached result — the workspace whose fork's PR was not the most recently updated will have its badge incorrectly cleared. This is a narrow edge case that won't affect the majority of users.

The fetchBranchPullRequests method in pull-requests.ts (specifically the cache lookup / fork-guard interaction) deserves a second look if multi-fork project setups are a planned use case.

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/host-service/src/runtime/pull-requests/pull-requests.ts | Core logic rewrite: per-branch caching, three-way refresh semantics, and fork identity guard. Works correctly for the common case; a niche edge case exists when two workspaces have different upstream repos but share the same branch name. |
| packages/host-service/src/runtime/pull-requests/utils/github-query/github-query.ts | Renamed and simplified from returning a list to returning the single first node; straightforward, correct. |
| packages/host-service/src/runtime/pull-requests/utils/github-query/query.ts | New GraphQL query adds headRefName filter and reduces first from 100 to 1; fields and ordering are unchanged from the original. |
| packages/host-service/src/runtime/pull-requests/pull-requests.test.ts | 558-line harness extension covering per-branch routing, dedup, failure isolation, fork guard, cache TTL, and eviction. Relies on Drizzle internal JSON shape in extractEqRight; fragility is well-documented. Missing direct test coverage for the multi-fork same-branch collision scenario. |
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
packages/host-service/src/runtime/pull-requests/pull-requests.ts:916-931
**Multi-fork same-branch cache collision clears valid PR links**

`getCachedBranchPullRequest` is always called with `projectRepo`, so two workspace targets that differ only in their upstream repo (e.g., `fork-owner-A/repo` vs `fork-owner-B/repo`) but share the same branch name both resolve to the cache key `projectRepo.owner/projectRepo.name#feat/x`. GitHub returns a single node (the most recently updated PR for that branch on the base repo). The head-identity guard then correctly rejects the non-matching fork — but since `failedKeys` is not set for a resolved-null, the workspace's `pullRequestId` is cleared even though its PR exists. The affected workspace's badge goes blank until the next refresh cycle where the other fork's PR may no longer be the most-recently-updated one. The scenario requires two workspaces in the same project pointing to different fork repos with an identical branch name, which is uncommon but entirely plausible when two contributors each maintain their own fork.

Reviews (1): Last reviewed commit: "chore(host-service): post-verification f..." | Re-trigger Greptile

Comment thread packages/host-service/src/runtime/pull-requests/pull-requests.ts Outdated
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/host-service/src/runtime/pull-requests/pull-requests.ts`:
- Around line 926-929: The fork guard compares raw GraphQL strings
(node.headRepositoryOwner?.login and node.headRepository?.name) to
target.owner/target.name and can false-positive on case-only differences; update
the check to normalize both sides using the existing upstreamKey() helper (reuse
upstreamKey(node.headRepositoryOwner?.login, node.headRepository?.name) and
upstreamKey(target.owner, target.name)) so the comparison is case-insensitive
and consistent with how workspaces are keyed.
- Around line 915-930: The cache lookup in getCachedBranchPullRequest (used
inside the entries.map async callback) only fetches the single most recently
updated PR via pullRequests(headRefName: ..., first: 1), which causes cross-fork
collisions when multiple forks use the same branch name; update the
GraphQL/fetch call inside getCachedBranchPullRequest to request multiple
candidates (e.g., first: 10 or higher), then locally filter the returned nodes
by matching node.headRepositoryOwner?.login === target.owner and
node.headRepository?.name === target.name to pick the correct fork's PR (or
return null if none match) so the entries.map caller receives the fork-specific
PR instead of an unrelated one.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 41687ae8-6d7d-4550-a401-f911e043ee64

📥 Commits

Reviewing files that changed from the base of the PR and between 0d91a02 and ec61f15.

📒 Files selected for processing (6)
  • packages/host-service/src/runtime/pull-requests/pull-requests.test.ts
  • packages/host-service/src/runtime/pull-requests/pull-requests.ts
  • packages/host-service/src/runtime/pull-requests/utils/github-query/github-query.ts
  • packages/host-service/src/runtime/pull-requests/utils/github-query/index.ts
  • packages/host-service/src/runtime/pull-requests/utils/github-query/query.ts
  • packages/host-service/src/runtime/pull-requests/utils/github-query/types.ts

Comment thread packages/host-service/src/runtime/pull-requests/pull-requests.ts Outdated
Comment thread packages/host-service/src/runtime/pull-requests/pull-requests.ts Outdated
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment


2 issues found across 6 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/host-service/src/runtime/pull-requests/pull-requests.ts">

<violation number="1" location="packages/host-service/src/runtime/pull-requests/pull-requests.ts:917">
P2: Single-node branch lookup can clear a workspace PR when another PR with the same branch name is returned first, because the code does not search alternate matches before treating the result as no match.</violation>

<violation number="2" location="packages/host-service/src/runtime/pull-requests/pull-requests.ts:926">
P2: Case-sensitive head-repo matching can reject valid PRs when owner/repo casing differs, causing the workspace PR link to be cleared.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread packages/host-service/src/runtime/pull-requests/pull-requests.ts Outdated
Comment thread packages/host-service/src/runtime/pull-requests/pull-requests.ts Outdated
Ruan Gustavo Araujo da Silveira and others added 3 commits May 8, 2026 18:06
Fetch up to 10 candidates per branch and match the workspace's fork
case-insensitively, instead of trusting the single most-recently-updated
PR returned by GraphQL. Prevents two workspaces sharing a branch name
on different forks from blanking each other's PR badge, and tolerates
owner/repo casing drift.
…dge-graphql-time

# Conflicts:
#	packages/host-service/src/runtime/pull-requests/pull-requests.test.ts
#	packages/host-service/src/runtime/pull-requests/pull-requests.ts
#	packages/host-service/src/runtime/pull-requests/utils/github-query/github-query.ts
#	packages/host-service/src/runtime/pull-requests/utils/github-query/index.ts
#	packages/host-service/src/runtime/pull-requests/utils/github-query/query.ts
#	packages/host-service/src/runtime/pull-requests/utils/github-query/types.ts
@Kitenite Kitenite merged commit d2f92c7 into superset-sh:main May 10, 2026
10 of 11 checks passed
@saddlepaddle saddlepaddle mentioned this pull request May 12, 2026
3 tasks
saddlepaddle added a commit that referenced this pull request May 12, 2026
Changes since v0.2.14:

- workspaces: `superset workspaces list` now accepts `--project` and
  `--search` filters, matching the desktop list view. (#4455)
- cli-framework: `--help` on a subcommand now shows the global options
  (e.g. `--json`, `--quiet`, `--api-key`) instead of hiding them. (#4424)
- host-service: attachment upload no longer rejects unknown mediaType
  values returned by some hosts. (#4439)
- host-service: PR fetch is now per-branch, avoiding 504s on repos with
  large numbers of open PRs. (#4268)

Push cli-v0.2.15 after this lands to fire the release pipeline.
MocA-Love pushed a commit to MocA-Love/superset that referenced this pull request May 25, 2026
…uperset-sh#4268)

* fix(host-service): fetch PRs per-branch to avoid GraphQL 504 on large repos

PullRequestsForSidebar timed out (504) on repos with many PRs because it
materialized 100 PRs x 50 status contexts per refresh. Replace with a
per-branch query keyed by headRefName, fired in parallel via
Promise.allSettled with per-branch caching. Failure of one branch no
longer poisons PR resolution for unrelated workspaces.

Refs superset-sh#4246

* refactor(host-service): drop redundant type cast in branch PR fetcher

* test(host-service): add multi-workspace test harness for PR refresh

* test(host-service): harden harness — fragility note + correct multi-PR upsert

* test(host-service): cover per-branch routing and same-branch dedup

* test(host-service): tighten failure-isolation and fork-mismatch assertions

* chore(host-service): post-verification fixes

* fix(host-service): handle multi-fork branch collisions in PR cache

Fetch up to 10 candidates per branch and match the workspace's fork
case-insensitively, instead of trusting the single most-recently-updated
PR returned by GraphQL. Prevents two workspaces sharing a branch name
on different forks from blanking each other's PR badge, and tolerates
owner/repo casing drift.

* fix(host-service): preserve PR badge on lookup failure

---------

Co-authored-by: Ruan Gustavo Araujo da Silveira <ruan.silveira@M4Pro.local>
Co-authored-by: Kiet Ho <hoakiet98@gmail.com>