fix: prevent dev/staging subdomains from being indexed by search engines #17741
✅ Deploy Preview for ethereumorg ready!
myelinated-wackerow left a comment
Nice PR -- the layered approach (HTTP header + build-time env + code-level check) is the right strategy here. Two suggestions:
1. Prefer const with IIFE over export let
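Minor, but export let leaves the binding mutable: importers can't assign to it, but the exporting module can reassign it later and every importer sees the change. An IIFE keeps it immutable: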
```ts
export const IS_PRODUCTION_DEPLOY = (() => {
  try {
    return new URL(SITE_URL).hostname === "ethereum.org"
  } catch {
    return false
  }
})()
```

2. Code-level protection may leak on arbitrary branch deploys and deploy previews
I'm fairly confident (though not 100%) that for branches other than dev and staging, the code-level layer doesn't fire correctly. Since NEXT_PUBLIC_SITE_URL is only set for those two branches, arbitrary branch deploys and deploy previews would fall through the SITE_URL chain. DEPLOY_PRIME_URL / DEPLOY_URL / URL are build-time-only Netlify vars -- I believe they're unavailable at SSR runtime in Netlify Functions -- so SITE_URL would resolve to "https://ethereum.org", making IS_PRODUCTION_DEPLOY true. That means no noindex meta tag, robots.txt allows crawling, and canonical URLs point to ethereum.org.
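To make the fall-through concrete, here is a minimal sketch. It assumes SITE_URL takes the first non-empty value of `NEXT_PUBLIC_SITE_URL`, `DEPLOY_PRIME_URL`, `DEPLOY_URL`, and `URL`, defaulting to `"https://ethereum.org"`, and that none of those Netlify vars are visible at SSR runtime. `resolveSiteUrl`, `isProductionHost`, and the branch-deploy hostname are hypothetical, and the env is passed in explicitly instead of being read from `process.env`.

```typescript
// Hypothetical model of the SITE_URL fallback chain discussed in this PR.
// The env is passed in explicitly so build time and runtime can be compared.
type Env = Record<string, string | undefined>;

const FALLBACK_ORDER = [
  "NEXT_PUBLIC_SITE_URL",
  "DEPLOY_PRIME_URL",
  "DEPLOY_URL",
  "URL",
] as const;

export function resolveSiteUrl(env: Env): string {
  for (const key of FALLBACK_ORDER) {
    const value = env[key];
    if (value) return value; // first non-empty value wins
  }
  return "https://ethereum.org"; // assumed final default
}

// Mirrors the hostname check in the original IS_PRODUCTION_DEPLOY.
export function isProductionHost(siteUrl: string): boolean {
  try {
    return new URL(siteUrl).hostname === "ethereum.org";
  } catch {
    return false; // unparsable URL: treat as non-production
  }
}

// An arbitrary branch deploy at build time: Netlify provides DEPLOY_PRIME_URL.
const buildEnv: Env = {
  DEPLOY_PRIME_URL: "https://my-branch--example.netlify.app", // hypothetical
};
// The same deploy at SSR runtime, assuming the Netlify deploy vars are absent.
const runtimeEnv: Env = {};

console.log(isProductionHost(resolveSiteUrl(buildEnv)));   // false
console.log(isProductionHost(resolveSiteUrl(runtimeEnv))); // true
```

The second call is the leak described above: with nothing set at runtime, the chain lands on the production default and the hostname check reports a production deploy.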
The X-Robots-Tag HTTP header from context.branch-deploy / context.deploy-preview does block indexing at the HTTP level, so these deploys aren't unprotected. But the canonical URL leak could still send confusing duplicate signals to crawlers.
One possible fix: rather than deriving IS_PRODUCTION_DEPLOY from SITE_URL, use a dedicated boolean that only production sets:
```toml
[context.production.environment]
NEXT_PUBLIC_IS_PRODUCTION = "true"
```

```ts
export const IS_PRODUCTION_DEPLOY = process.env.NEXT_PUBLIC_IS_PRODUCTION === "true"
```

Since it's NEXT_PUBLIC_*, it gets inlined at build time. It defaults to false everywhere except production, and doesn't touch SITE_URL -- so no risk of breaking AB testing, OG images, JSON-LD, or other SITE_URL-dependent functionality.
That said, this is a suggestion -- the HTTP header layer already covers the indexing concern, so this is about closing the canonical URL gap for non-dev/staging deploys. Your call on whether that's worth the extra env var.
Reviewed by Claude Opus 4.6
Use NEXT_PUBLIC_CONTEXT (already inlined at build time via next.config.js) instead of parsing SITE_URL hostname. This ensures IS_PRODUCTION_DEPLOY is false on all non-production deploys (branch deploys, deploy previews), closing the canonical URL leak for arbitrary branch deploys where SITE_URL would fall through to "https://ethereum.org".
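A minimal sketch of how a context-based flag could drive both robots.txt and the robots meta tag. It assumes `NEXT_PUBLIC_CONTEXT` carries Netlify's `CONTEXT` value, and it mirrors the relevant Next.js shapes (the `robots.ts` return value and the `robots` field of page metadata) with simplified local types so it runs standalone. `isProductionDeploy`, `buildRobots`, and `buildRobotsMeta` are hypothetical names, not the PR's code.

```typescript
// Netlify deploy contexts (assumed to be the values NEXT_PUBLIC_CONTEXT can hold).
type NetlifyContext = "production" | "deploy-preview" | "branch-deploy" | "dev";

// Only an exact "production" match enables indexing; unset or unknown
// values fall back to the safe, non-indexed behaviour.
export function isProductionDeploy(context: string | undefined): boolean {
  return context === "production";
}

// Simplified local stand-ins for Next.js's MetadataRoute.Robots and
// Metadata["robots"] shapes, so the sketch runs without Next.js installed.
interface RobotsFile {
  rules: { userAgent: string; allow?: string; disallow?: string }[];
}
interface RobotsMeta {
  index: boolean;
  follow: boolean;
}

// robots.txt: allow crawling on production only.
export function buildRobots(isProduction: boolean): RobotsFile {
  return {
    rules: [
      isProduction
        ? { userAgent: "*", allow: "/" }
        : { userAgent: "*", disallow: "/" },
    ],
  };
}

// Page metadata: Next.js renders { index: false, follow: false } as
// <meta name="robots" content="noindex, nofollow">.
export function buildRobotsMeta(isProduction: boolean): RobotsMeta {
  return { index: isProduction, follow: isProduction };
}

const contexts: (NetlifyContext | undefined)[] = [
  "production",
  "deploy-preview",
  "branch-deploy",
  undefined, // variable not set
];
for (const context of contexts) {
  const prod = isProductionDeploy(context);
  console.log(context ?? "(unset)", buildRobots(prod).rules[0], buildRobotsMeta(prod));
}
```

Comparing against a fixed string rather than parsing a URL means any deploy that isn't explicitly production gets the noindex path, regardless of what SITE_URL resolves to.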
@wackerow thanks. I refactored the code to stop using SITE_URL to calculate the production env flag; I now just use the flag we already have available. Safer and simpler.
Move the SITE_URL fallback chain (NEXT_PUBLIC_SITE_URL → DEPLOY_PRIME_URL → DEPLOY_URL → URL) into next.config.js env config so it gets resolved at build time and inlined by webpack. This ensures SSR pages on deploy previews and branch deploys use the correct deploy-specific URL instead of falling through to "https://ethereum.org".
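Roughly, the build-time version of the chain could look like this. It models only the values next.config.js would place under its `env` key, which Next.js inlines into the bundle at build time. `buildNextEnv`, the `CONTEXT` to `NEXT_PUBLIC_CONTEXT` mapping, and the example hostnames are assumptions for illustration.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical subset of the `env` object returned from next.config.js.
interface InlinedEnv {
  NEXT_PUBLIC_SITE_URL: string;
  NEXT_PUBLIC_CONTEXT: string;
}

// Runs once during `next build`, while Netlify's build-time vars exist.
// Whatever it returns is baked into both the client and SSR bundles.
export function buildNextEnv(buildEnv: Env): InlinedEnv {
  const siteUrl =
    buildEnv.NEXT_PUBLIC_SITE_URL || // set explicitly for dev/staging
    buildEnv.DEPLOY_PRIME_URL ||
    buildEnv.DEPLOY_URL ||
    buildEnv.URL ||
    "https://ethereum.org";
  return {
    NEXT_PUBLIC_SITE_URL: siteUrl,
    // Assumed mapping from Netlify's CONTEXT; empty string when unset,
    // which a strict === "production" check treats as non-production.
    NEXT_PUBLIC_CONTEXT: buildEnv.CONTEXT ?? "",
  };
}

// A deploy-preview build: no NEXT_PUBLIC_SITE_URL, so DEPLOY_PRIME_URL wins.
const previewEnv = buildNextEnv({
  CONTEXT: "deploy-preview",
  DEPLOY_PRIME_URL: "https://deploy-preview-17741--example.netlify.app",
  URL: "https://ethereum.org",
});
console.log(previewEnv.NEXT_PUBLIC_SITE_URL);
// → https://deploy-preview-17741--example.netlify.app
```

In an actual next.config.js the same expressions would read `process.env` directly, and SSR code would then see the inlined deploy URL rather than re-running the chain at request time.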

Summary
- `X-Robots-Tag: noindex, nofollow` HTTP headers in `netlify.toml` for branch deploys, deploy previews, and the `dev`/`staging` branches
- `NEXT_PUBLIC_SITE_URL` per branch (`dev` → `https://dev.ethereum.org`, `staging` → `https://staging.ethereum.org`) so canonical URLs and noindex meta tags resolve correctly at build time
- `IS_PRODUCTION_DEPLOY` constant from a `SITE_URL` hostname check, used by both `robots.ts` and `metadata.ts` to gate indexing

Problem
`SITE_URL` was resolving to `https://ethereum.org` on dev/staging deploys at SSR runtime because `DEPLOY_PRIME_URL` (a build-time-only Netlify env var) was unavailable, so the chain fell through to `URL`, which always points to the production domain. This caused:

- Canonical URLs pointing to `ethereum.org`, creating cross-domain duplicate signals
- `robots.txt` correctly blocked crawling, but pages lacked `<meta name="robots" content="noindex">`, so Google indexed URLs it couldn't crawl

Fix layers
1. `X-Robots-Tag` header in `netlify.toml`
2. `NEXT_PUBLIC_SITE_URL` per branch
3. `IS_PRODUCTION_DEPLOY` check

Test plan
- `dev.ethereum.org` returns an `X-Robots-Tag: noindex, nofollow` header
- `dev.ethereum.org` pages have `<meta name="robots" content="noindex, nofollow">`
- `dev.ethereum.org` canonical URLs point to `dev.ethereum.org`, not `ethereum.org`
- `staging.ethereum.org` has the same protections
- `ethereum.org` (production) is unaffected: no noindex, correct canonicals
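Parts of this checklist could be scripted. The sketch below assumes the `X-Robots-Tag` header value and the page HTML have already been fetched (for example with `fetch` and `response.text()`) and only inspects them. `checkIndexingProtection` is a hypothetical helper, and its regex-based tag matching is a rough heuristic (it expects `name`/`rel` to come before `content`/`href`), not an HTML parser.

```typescript
interface ProtectionReport {
  headerNoindex: boolean; // X-Robots-Tag contains "noindex"
  metaNoindex: boolean; // <meta name="robots"> content contains "noindex"
  canonicalHost: string | null; // hostname of <link rel="canonical">, if absolute
}

export function checkIndexingProtection(
  xRobotsTag: string | null,
  html: string,
): ProtectionReport {
  const headerNoindex = (xRobotsTag ?? "").toLowerCase().includes("noindex");

  // Heuristic match; attribute order other than name-then-content is missed.
  const meta = html.match(/<meta\s+name="robots"\s+content="([^"]*)"/i);
  const metaNoindex = meta !== null && meta[1].toLowerCase().includes("noindex");

  const canonical = html.match(/<link\s+rel="canonical"\s+href="([^"]*)"/i);
  let canonicalHost: string | null = null;
  if (canonical !== null) {
    try {
      canonicalHost = new URL(canonical[1]).hostname;
    } catch {
      canonicalHost = null; // relative or malformed href
    }
  }
  return { headerNoindex, metaNoindex, canonicalHost };
}

// Example: what a protected dev page is expected to look like.
const devReport = checkIndexingProtection(
  "noindex, nofollow",
  '<head><meta name="robots" content="noindex, nofollow"/>' +
    '<link rel="canonical" href="https://dev.ethereum.org/en/"/></head>',
);
console.log(devReport);
```

A passing dev or staging page should report both noindex flags as true and a canonical host matching its own subdomain; production should report both flags as false and `ethereum.org` as the canonical host.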