fix(host-service): fetch PRs per-branch to avoid 504 on large repos#4268
Conversation
… repos PullRequestsForSidebar timed out (504) on repos with many PRs because it materialized 100 PRs x 50 status contexts per refresh. Replace with a per-branch query keyed by headRefName, fired in parallel via Promise.allSettled with per-branch caching. Failure of one branch no longer poisons PR resolution for unrelated workspaces. Refs superset-sh#4246
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThe PR refines pull request fetch resilience by tracking failed upstream-key attempts separately from successful matches. ChangesPreserve existing pullRequestId when upstream-key fetch fails
🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile SummaryThis PR replaces a single
Confidence Score: 4/5Safe to merge for the common case; all workspaces pointing to the base repo or a single fork behave correctly. The three-way refresh semantics and per-branch cache isolation are sound. The one weak spot is that getCachedBranchPullRequest always queries the base repo and caches results under projectRepo#branch, so two workspaces in the same project pointing to different fork repos but sharing the same branch name compete for one cached result — the workspace whose fork's PR was not the most recently updated will have its badge incorrectly cleared. This is a narrow edge case that won't affect the majority of users. The fetchBranchPullRequests method in pull-requests.ts (specifically the cache lookup / fork-guard interaction) deserves a second look if multi-fork project setups are a planned use case.
|
| Filename | Overview |
|---|---|
| packages/host-service/src/runtime/pull-requests/pull-requests.ts | Core logic rewrite: per-branch caching, three-way refresh semantics, and fork identity guard. Works correctly for the common case; a niche edge case exists when two workspaces have different upstream repos but share the same branch name. |
| packages/host-service/src/runtime/pull-requests/utils/github-query/github-query.ts | Renamed and simplified from returning a list to returning the single first node; straightforward, correct. |
| packages/host-service/src/runtime/pull-requests/utils/github-query/query.ts | New GraphQL query adds headRefName filter and reduces first from 100 to 1; fields and ordering are unchanged from the original. |
| packages/host-service/src/runtime/pull-requests/pull-requests.test.ts | 558-line harness extension covering per-branch routing, dedup, failure isolation, fork guard, cache TTL, and eviction. Relies on Drizzle internal JSON shape in extractEqRight; fragility is well-documented. Missing direct test coverage for the multi-fork same-branch collision scenario. |
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
packages/host-service/src/runtime/pull-requests/pull-requests.ts:916-931
**Multi-fork same-branch cache collision clears valid PR links**
`getCachedBranchPullRequest` is always called with `projectRepo`, so two workspace targets that differ only in their upstream repo (e.g., `fork-owner-A/repo` vs `fork-owner-B/repo`) but share the same branch name both resolve to the cache key `projectRepo.owner/projectRepo.name#feat/x`. GitHub returns a single node (the most recently updated PR for that branch on the base repo). The head-identity guard then correctly rejects the non-matching fork — but since `failedKeys` is not set for a resolved-null, the workspace's `pullRequestId` is cleared even though its PR exists. The affected workspace's badge goes blank until the next refresh cycle where the other fork's PR may no longer be the most-recently-updated one. The scenario requires two workspaces in the same project pointing to different fork repos with an identical branch name, which is uncommon but entirely plausible when two contributors each maintain their own fork.
Reviews (1): Last reviewed commit: "chore(host-service): post-verification f..." | Re-trigger Greptile
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/host-service/src/runtime/pull-requests/pull-requests.ts`:
- Around line 926-929: The fork guard compares raw GraphQL strings
(node.headRepositoryOwner?.login and node.headRepository?.name) to
target.owner/target.name and can false-positive on case-only differences; update
the check to normalize both sides using the existing upstreamKey() helper (reuse
upstreamKey(node.headRepositoryOwner?.login, node.headRepository?.name) and
upstreamKey(target.owner, target.name)) so the comparison is case-insensitive
and consistent with how workspaces are keyed.
- Around line 915-930: The cache lookup in getCachedBranchPullRequest (used
inside the entries.map async callback) only fetches the single most recently
updated PR via pullRequests(headRefName: ..., first: 1), which causes cross-fork
collisions when multiple forks use the same branch name; update the
GraphQL/fetch call inside getCachedBranchPullRequest to request multiple
candidates (e.g., first: 10 or higher), then locally filter the returned nodes
by matching node.headRepositoryOwner?.login === target.owner and
node.headRepository?.name === target.name to pick the correct fork's PR (or
return null if none match) so the entries.map caller receives the fork-specific
PR instead of an unrelated one.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 41687ae8-6d7d-4550-a401-f911e043ee64
📒 Files selected for processing (6)
packages/host-service/src/runtime/pull-requests/pull-requests.test.tspackages/host-service/src/runtime/pull-requests/pull-requests.tspackages/host-service/src/runtime/pull-requests/utils/github-query/github-query.tspackages/host-service/src/runtime/pull-requests/utils/github-query/index.tspackages/host-service/src/runtime/pull-requests/utils/github-query/query.tspackages/host-service/src/runtime/pull-requests/utils/github-query/types.ts
There was a problem hiding this comment.
2 issues found across 6 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/host-service/src/runtime/pull-requests/pull-requests.ts">
<violation number="1" location="packages/host-service/src/runtime/pull-requests/pull-requests.ts:917">
P2: Single-node branch lookup can clear a workspace PR when another PR with the same branch name is returned first, because the code does not search alternate matches before treating the result as no match.</violation>
<violation number="2" location="packages/host-service/src/runtime/pull-requests/pull-requests.ts:926">
P2: Case-sensitive head-repo matching can reject valid PRs when owner/repo casing differs, causing the workspace PR link to be cleared.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Fetch up to 10 candidates per branch and match the workspace's fork case-insensitively, instead of trusting the single most-recently-updated PR returned by GraphQL. Prevents two workspaces sharing a branch name on different forks from blanking each other's PR badge, and tolerates owner/repo casing drift.
…dge-graphql-time # Conflicts: # packages/host-service/src/runtime/pull-requests/pull-requests.test.ts # packages/host-service/src/runtime/pull-requests/pull-requests.ts # packages/host-service/src/runtime/pull-requests/utils/github-query/github-query.ts # packages/host-service/src/runtime/pull-requests/utils/github-query/index.ts # packages/host-service/src/runtime/pull-requests/utils/github-query/query.ts # packages/host-service/src/runtime/pull-requests/utils/github-query/types.ts
Changes since v0.2.14: - workspaces: `superset workspaces list` now accepts `--project` and `--search` filters, matching the desktop list view. (#4455) - cli-framework: `--help` on a subcommand now shows the global options (e.g. `--json`, `--quiet`, `--api-key`) instead of hiding them. (#4424) - host-service: attachment upload no longer rejects unknown mediaType values returned by some hosts. (#4439) - host-service: PR fetch is now per-branch, avoiding 504s on repos with large numbers of open PRs. (#4268) Push cli-v0.2.15 after this lands to fire the release pipeline.
…uperset-sh#4268) * fix(host-service): fetch PRs per-branch to avoid GraphQL 504 on large repos PullRequestsForSidebar timed out (504) on repos with many PRs because it materialized 100 PRs x 50 status contexts per refresh. Replace with a per-branch query keyed by headRefName, fired in parallel via Promise.allSettled with per-branch caching. Failure of one branch no longer poisons PR resolution for unrelated workspaces. Refs superset-sh#4246 * refactor(host-service): drop redundant type cast in branch PR fetcher * test(host-service): add multi-workspace test harness for PR refresh * test(host-service): harden harness — fragility note + correct multi-PR upsert * test(host-service): cover per-branch routing and same-branch dedup * test(host-service): tighten failure-isolation and fork-mismatch assertions * chore(host-service): post-verification fixes * fix(host-service): handle multi-fork branch collisions in PR cache Fetch up to 10 candidates per branch and match the workspace's fork case-insensitively, instead of trusting the single most-recently-updated PR returned by GraphQL. Prevents two workspaces sharing a branch name on different forks from blanking each other's PR badge, and tolerates owner/repo casing drift. * fix(host-service): preserve PR badge on lookup failure --------- Co-authored-by: Ruan Gustavo Araujo da Silveira <ruan.silveira@M4Pro.local> Co-authored-by: Kiet Ho <hoakiet98@gmail.com>
Summary
pullRequests(first: 100)GraphQL query with per-branchpullRequests(headRefName: $branch, first: 1)queries, scoped per workspace branch.pullRequestIdis preserved and the next tick retries.Why / Context
On large repos (e.g.
Monest-Eng/monest-backend) the existing query — fetch all PRs sorted byupdated_at— frequently returned GraphQL 504 or tripped GitHub abuse-detection (403). Symptoms in the desktop UI:How It Works
PULL_REQUEST_FOR_BRANCH_QUERYfilters withheadRefName: $branch, cost-light vs. paginating 100 PRs. Util renamedfetchRepositoryPullRequests→fetchPullRequestForBranch.owner/repo→owner/repo#branch. One failing branch can no longer poison resolution for the rest. Failed promises still cached for the full TTL to avoid retry storms (existing semantics preserved).performProjectRefresh:pullRequestId.pullRequestId.Promise.allSettledrejection) → preserve existingpullRequestId. Transient 504 must not blank the badge.pullRequests(headRefName: …)filters by branch name only on the base repo. Fork PRs share branch names; we verifyheadRepositoryOwner.loginandheadRepository.namematch the workspace upstream before accepting the node.Manual QA Checklist
Refresh behaviour
Monest-Eng/monest-backend) show badges without 504s.Cache
Testing
bun run typecheck✓ (turbo cache hit)bun run lint✓ (Biome, 4352 files, 0 issues)bun test packages/host-service/src/runtime/pull-requests/pull-requests.test.ts✓ — 11 pass / 0 fail / 34 expects covering:Design Decisions
first: 100: Per-branch isO(workspaces)GraphQL points but each is cheap and isolates failures. The previous "one fetch for all" looks cheaper in nominal case but its tail (504, 403 abuse) was the production failure mode.pullRequestIdon fetch failure instead of clearing: Transient infra blips shouldn't show "no PR" and trigger UI flicker. Stale-but-correct beats correct-but-flickery.Risks / Rollout
Summary by cubic
Fixes PR badge timeouts and flicker on large GitHub repos by fetching PRs per-branch with failure isolation and better fork matching. Badges stay stable under 504s and when multiple forks share the same branch. Addresses #4246.
pullRequests(headRefName: $branch, first: 10)and match by head owner/repo case-insensitively.owner/repo#branch(20s TTL), dedup same-branch requests, and keep failed promises cached to avoid retry storms.pullRequestId(tracked per-branch).Written for commit 9e8ff14. Summary will update on new commits.
Summary by CodeRabbit