feat(seo): add robots.txt, sitemap, JSON-LD, and enhanced metadata #965
Marketing site (superset.sh):
- Add robots.ts with crawl directives
- Add sitemap.ts with dynamic blog post URLs
- Update layout.tsx with full Open Graph, Twitter Cards, keywords
- Add JSON-LD structured data (Organization, SoftwareApplication, Article, Website)
- Add opengraph-image.tsx for dynamic OG image generation
- Add RSS feed at /feed.xml
- Add manifest.json for PWA support
- Update blog posts with Article JSON-LD and canonical URLs
Docs site (docs.superset.sh):
- Add robots.ts with crawl directives
- Add sitemap.ts from Fumadocs source
- Update layout.tsx with metadataBase and full metadata
📝 Walkthrough
Adds SEO, indexing controls, and content-distribution endpoints: metadata enrichment (Open Graph/Twitter/JSON‑LD), robots.txt and sitemap routes, RSS feed, Open Graph image generator, PWA manifest, and noindex headers for LLM-generated outputs.
Sequence Diagram(s)
sequenceDiagram
participant C as Crawler / Client
participant R as Next.js Route (robots/sitemap/feed)
participant S as Pages Source (getPages/getBlogPosts)
participant CDN as CDN / Cache
rect rgba(200,200,255,0.5)
C->>R: GET /robots.txt
R-->>C: robots.txt (disallow rules, sitemap URL)
end
rect rgba(200,255,200,0.5)
C->>R: GET /sitemap.xml
R->>S: fetch pages/posts
S-->>R: pages list
R-->>C: sitemap XML (entries with lastModified, priority)
R->>CDN: set cache headers
end
rect rgba(255,220,200,0.5)
C->>R: GET /feed.xml
R->>S: fetch blog posts
S-->>R: posts
R-->>C: RSS XML (application/xml, caching headers)
end
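The sitemap leg of the diagram (fetch posts, then emit entries with lastModified and priority) can be sketched roughly as below. This is a minimal, framework-free illustration: `SitemapEntry` is a local stand-in for the Next.js sitemap entry shape, and `buildBlogSitemap` and `BlogPost` are hypothetical names, not identifiers from this PR.

```typescript
// Local stand-in for the sitemap entry shape, so the sketch runs without Next.js.
interface SitemapEntry {
  url: string;
  lastModified: Date;
  changeFrequency: "daily" | "weekly" | "monthly" | "yearly";
  priority: number;
}

// Hypothetical minimal post shape; the real getBlogPosts() result may carry more fields.
interface BlogPost {
  slug: string;
  date: string;
}

const BLOG_POST_PRIORITY = 0.8;

// Map each blog post to one sitemap entry under /blog/<slug>.
function buildBlogSitemap(baseUrl: string, posts: BlogPost[]): SitemapEntry[] {
  return posts.map(
    (post): SitemapEntry => ({
      url: `${baseUrl}/blog/${post.slug}`,
      lastModified: new Date(post.date),
      changeFrequency: "monthly",
      priority: BLOG_POST_PRIORITY,
    }),
  );
}
```

Using the post date for lastModified, rather than the build time, keeps the value stable between deployments.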
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
… cards
- Add RSS feed alternates to blog page metadata for feed discovery
- Add OG/Twitter metadata to blog index page
- Disallow /llms.mdx/ and /llms-full.txt in docs robots.txt
- Add X-Robots-Tag: noindex to LLM/MDX endpoints (prevent duplicate content)
- Add canonical URLs and Twitter cards to docs pages
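As a rough illustration of the docs robots.txt change in this commit, the sketch below builds the two disallow rules and renders them as text. `RobotsConfig`, `docsRobots`, and `renderRobotsTxt` are hypothetical names introduced here; in a Next.js app the framework renders the file itself, so the renderer only shows what the output might look like.

```typescript
// Local stand-in for a robots configuration; not the framework's own type.
interface RobotsConfig {
  rules: { userAgent: string; allow: string[]; disallow: string[] };
  sitemap: string;
}

const DOCS_URL = "https://docs.superset.sh";

// Allow everything except the LLM-oriented outputs named in the commit message.
function docsRobots(): RobotsConfig {
  return {
    rules: {
      userAgent: "*",
      allow: ["/"],
      disallow: ["/llms.mdx/", "/llms-full.txt"],
    },
    sitemap: `${DOCS_URL}/sitemap.xml`,
  };
}

// Hypothetical renderer, only to show the resulting robots.txt text.
function renderRobotsTxt(config: RobotsConfig): string {
  const lines = [
    `User-agent: ${config.rules.userAgent}`,
    ...config.rules.allow.map((path) => `Allow: ${path}`),
    ...config.rules.disallow.map((path) => `Disallow: ${path}`),
    "",
    `Sitemap: ${config.sitemap}`,
  ];
  return lines.join("\n");
}
```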
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@apps/marketing/public/manifest.json`:
- Around line 9-15: The manifest.json's icons array only contains a 192x192
entry; add a 512x512 icon object to the "icons" array (e.g., { "src":
"/favicon-512.png", "sizes": "512x512", "type": "image/png" }) and include the
optional "purpose" field (e.g., "any maskable") to support adaptive icons;
ensure the referenced asset exists in public assets and follow the same key
names ("src", "sizes", "type", "purpose") so Android PWA checks and splash
screens use the 512×512 image.
In `@apps/marketing/src/app/feed.xml/route.ts`:
- Line 33: The RSS <author> element is being populated with post.author (see
escapeXml(post.author)) which is a plain name; update the feed generation to
either (A) emit a Dublin Core creator element instead: replace
<author>${escapeXml(post.author)}</author> with
<dc:creator>${escapeXml(post.author)}</dc:creator> and add the dc namespace
(xmlns:dc="http://purl.org/dc/elements/1.1/") to the feed root, or (B) if you
have an email field, format the <author> per RSS spec using post.authorEmail and
post.author like <author>${escapeXml(post.authorEmail)}
(${escapeXml(post.author)})</author>; adjust the code that builds the XML in
route.ts accordingly (references: escapeXml, post.author, post.authorEmail, feed
root).
In `@apps/marketing/src/app/layout.tsx`:
- Around line 114-116: The SoftwareApplicationJsonLd component currently
includes hardcoded aggregateRating (ratingValue: "5", ratingCount: "100") which
can be misleading; update the SoftwareApplicationJsonLd usage to remove the
aggregateRating field or replace it with real ratings sourced from your review
system before rendering (e.g., pull live ratingValue and ratingCount and pass
them into SoftwareApplicationJsonLd), ensuring the fields are only present when
genuine data exists; specifically locate the SoftwareApplicationJsonLd
invocation in layout.tsx and either delete the aggregateRating prop or
conditionally supply ratingValue/ratingCount derived from your actual reviews
API.
In `@apps/marketing/src/components/JsonLd/JsonLd.tsx`:
- Around line 44-48: The JsonLd component currently contains a hardcoded
aggregateRating object (fields ratingValue and ratingCount) which violates
Google's guidelines; remove the aggregateRating block from the JSON-LD output or
change JsonLd to only include aggregateRating when real review data is provided
(e.g., check a prop or data source before emitting the aggregateRating object).
Specifically, update the JsonLd component to stop emitting the aggregateRating
object (or make it conditional) instead of the current hardcoded aggregateRating
with ratingValue: "5" and ratingCount: "100".
- Around line 131-138: The JSON-LD currently advertises a SearchAction via the
potentialAction object in the JsonLd component (SearchAction, urlTemplate) but
the blog page does not support search; either remove the potentialAction block
from JsonLd to avoid advertising non-existent search, or implement search
handling on the blog page: accept searchParams (e.g., search_term_string) in the
blog page component, filter the posts list by that query, and ensure urlTemplate
in JsonLd matches the actual query parameter name and route; update the JsonLd
potentialAction only if the blog page implements the search handler.
🧹 Nitpick comments (5)
apps/docs/src/app/sitemap.ts (1)
4-14: Extract sitemap priorities into named constants.
This avoids magic numbers and improves maintainability.
♻️ Suggested refactor
+const QUICK_START_PRIORITY = 1.0;
+const DEFAULT_PRIORITY = 0.8;
+
export default function sitemap(): MetadataRoute.Sitemap {
  const baseUrl = "https://docs.superset.sh";
@@
  return pages.map((page) => ({
    url: `${baseUrl}${page.url}`,
    lastModified: new Date(),
    changeFrequency: "weekly" as const,
-   priority: page.url === "/quick-start" ? 1.0 : 0.8,
+   priority: page.url === "/quick-start" ? QUICK_START_PRIORITY : DEFAULT_PRIORITY,
  }));
}
apps/marketing/src/app/sitemap.ts (1)
4-46: Centralize sitemap constants (priorities and fixed dates).
This reduces magic numbers and makes updates safer.
♻️ Suggested refactor
+const HOME_PRIORITY = 1.0;
+const BLOG_INDEX_PRIORITY = 0.9;
+const BLOG_POST_PRIORITY = 0.8;
+const SECTION_PRIORITY = 0.5;
+const LEGAL_PRIORITY = 0.3;
+const LEGAL_LAST_MODIFIED = new Date("2025-01-15");
+
export default function sitemap(): MetadataRoute.Sitemap {
  const baseUrl = "https://superset.sh";
@@
    {
      url: baseUrl,
      lastModified: new Date(),
      changeFrequency: "weekly",
-     priority: 1.0,
+     priority: HOME_PRIORITY,
    },
@@
      url: `${baseUrl}/blog`,
      lastModified: new Date(),
      changeFrequency: "daily",
-     priority: 0.9,
+     priority: BLOG_INDEX_PRIORITY,
    },
@@
      url: `${baseUrl}/privacy`,
-     lastModified: new Date("2025-01-15"),
+     lastModified: LEGAL_LAST_MODIFIED,
      changeFrequency: "yearly",
-     priority: 0.3,
+     priority: LEGAL_PRIORITY,
    },
@@
      url: `${baseUrl}/terms`,
-     lastModified: new Date("2025-01-15"),
+     lastModified: LEGAL_LAST_MODIFIED,
      changeFrequency: "yearly",
-     priority: 0.3,
+     priority: LEGAL_PRIORITY,
    },
@@
      url: `${baseUrl}/ports`,
      lastModified: new Date(),
      changeFrequency: "monthly",
-     priority: 0.5,
+     priority: SECTION_PRIORITY,
    },
@@
  const blogPages: MetadataRoute.Sitemap = posts.map((post) => ({
    url: `${baseUrl}/blog/${post.slug}`,
    lastModified: new Date(post.date),
    changeFrequency: "monthly" as const,
-   priority: 0.8,
+   priority: BLOG_POST_PRIORITY,
  }));
apps/marketing/src/app/feed.xml/route.ts (2)
5-5: Extract hardcoded URL to a named constant.
The `baseUrl` is a magic string that appears in multiple places across the codebase (also in sitemap, JSON-LD components). Consider extracting it to a shared constant to ensure consistency and ease future updates.
Example approach
+// At module top or in a shared config
+const BASE_URL = "https://superset.sh";
+
export async function GET() {
  const posts = getBlogPosts();
- const baseUrl = "https://superset.sh";
+ const baseUrl = BASE_URL;
Or import from a shared config module used across the marketing app.
7-13: Consider moving `escapeXml` outside the handler.
The helper is recreated on every request. Moving it to module scope avoids repeated function allocation.
Suggested refactor
+const escapeXml = (str: string) =>
+  str
+    .replace(/&/g, "&amp;")
+    .replace(/</g, "&lt;")
+    .replace(/>/g, "&gt;")
+    .replace(/"/g, "&quot;")
+    .replace(/'/g, "&apos;");
+
export async function GET() {
  const posts = getBlogPosts();
  const baseUrl = "https://superset.sh";
-
- const escapeXml = (str: string) =>
-   str
-     .replace(/&/g, "&amp;")
-     .replace(/</g, "&lt;")
-     .replace(/>/g, "&gt;")
-     .replace(/"/g, "&quot;")
-     .replace(/'/g, "&apos;");
apps/marketing/src/app/blog/[slug]/page.tsx (1)
23-23: Consider extracting the base URL to avoid duplication.
The URL `https://superset.sh` is hardcoded here and duplicated between `BlogPostPage` (line 23) and `generateMetadata` (line 56). The layout already defines `metadataBase: new URL("https://superset.sh")`.
Consider extracting this to a shared constant or using a helper function to build blog post URLs consistently.
♻️ Suggested refactor
// In a shared constants file (e.g., `@/lib/constants.ts`)
export const SITE_URL = "https://superset.sh";
export const getBlogPostUrl = (slug: string) => `${SITE_URL}/blog/${slug}`;
Then import and use in this file:
+import { getBlogPostUrl } from "@/lib/constants";
+
export default async function BlogPostPage({ params }: PageProps) {
  const { slug } = await params;
  const post = getBlogPost(slug);
  // ...
- const url = `https://superset.sh/blog/${slug}`;
+ const url = getBlogPostUrl(slug);
Also applies to: 56-56
"icons": [
  {
    "src": "/favicon-192.png",
    "sizes": "192x192",
    "type": "image/png"
  }
]
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Step 1: Check if the manifest.json file exists and view its content
echo "=== Checking manifest.json ==="
if [ -f "apps/marketing/public/manifest.json" ]; then
echo "✓ manifest.json exists"
cat -n "apps/marketing/public/manifest.json"
else
echo "✗ manifest.json NOT FOUND"
fi
echo -e "\n=== Checking referenced favicon-192.png ==="
if [ -f "apps/marketing/public/favicon-192.png" ]; then
echo "✓ favicon-192.png exists"
ls -lh "apps/marketing/public/favicon-192.png"
else
echo "✗ favicon-192.png NOT FOUND"
fi
echo -e "\n=== Checking for favicon-512.png ==="
if [ -f "apps/marketing/public/favicon-512.png" ]; then
echo "✓ favicon-512.png exists"
ls -lh "apps/marketing/public/favicon-512.png"
else
echo "✗ favicon-512.png NOT FOUND"
fi
echo -e "\n=== Searching for all icon/favicon files in marketing/public ==="
find "apps/marketing/public/" -maxdepth 1 \( -name "*favicon*" -o -name "*icon*" \) 2>/dev/null
Repository: superset-sh/superset
Length of output: 909
🏁 Script executed:
#!/bin/bash
echo "=== Contents of app-icons directory ==="
find "apps/marketing/public/app-icons" -type f 2>/dev/null | head -20
echo -e "\n=== Checking file sizes and types ==="
ls -lhR "apps/marketing/public/app-icons" 2>/dev/null
Repository: superset-sh/superset
Length of output: 1646
Add 512×512 icon for proper PWA installation on Android.
The manifest only includes a 192×192 icon. PWA best practices require a 512×512 icon for Android splash screens and optimal installation experience across devices. Without it, the PWA may show degraded visuals or fail installation checks on some Android devices.
Additionally, consider adding the "purpose" field (e.g., "any maskable") to support adaptive icons on modern devices.
📱 Recommended icon configuration
"icons": [
{
"src": "/favicon-192.png",
"sizes": "192x192",
- "type": "image/png"
+ "type": "image/png",
+ "purpose": "any maskable"
+ },
+ {
+ "src": "/favicon-512.png",
+ "sizes": "512x512",
+ "type": "image/png",
+ "purpose": "any maskable"
}
]
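A small check along these lines could catch a missing icon size before merge. It is only a sketch: `missingIconSizes`, `WebManifest`, and `ManifestIcon` are hypothetical names, and the required sizes default to the two discussed in this comment.

```typescript
// Minimal shapes for the parts of manifest.json this check reads.
interface ManifestIcon {
  src: string;
  sizes: string; // space-separated list, e.g. "192x192" or "192x192 512x512"
  type: string;
  purpose?: string;
}

interface WebManifest {
  icons: ManifestIcon[];
}

// Return the required sizes that no icon entry declares.
function missingIconSizes(
  manifest: WebManifest,
  required: string[] = ["192x192", "512x512"],
): string[] {
  const declared = new Set<string>();
  for (const icon of manifest.icons) {
    for (const size of icon.sizes.split(/\s+/)) {
      declared.add(size);
    }
  }
  return required.filter((size) => !declared.has(size));
}
```

The check only inspects the declared sizes; it does not verify that the referenced image files exist in the public directory.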
<description>${escapeXml(post.description || "")}</description>
<pubDate>${new Date(post.date).toUTCString()}</pubDate>
<guid isPermaLink="true">${baseUrl}/blog/${post.slug}</guid>
<author>${escapeXml(post.author)}</author>
RSS <author> element expects email format.
Per the RSS 2.0 specification, the <author> element should contain an email address (optionally with a name in parentheses), e.g., author@example.com (Author Name). Using just a name string may cause validation warnings in some feed validators.
Consider using <dc:creator> (with Dublin Core namespace) for plain author names, or omit the author element if email addresses aren't available.
Option: Use dc:creator instead
-<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
+<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
Then in items:
- <author>${escapeXml(post.author)}</author>
+ <dc:creator>${escapeXml(post.author)}</dc:creator>
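Both options from this comment can be combined in one item renderer, sketched below under some assumptions: `renderFeedItem` and `FeedPost` are hypothetical names, `authorEmail` is the optional field the comment mentions rather than a confirmed part of the post type, and the `dc` namespace is assumed to be declared on the feed root.

```typescript
// Module-scope XML escaping; ampersands are replaced first so later entities stay intact.
const escapeXml = (str: string): string =>
  str
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");

// Hypothetical post shape for the sketch.
interface FeedPost {
  slug: string;
  title: string;
  description?: string;
  date: string;
  author: string;
  authorEmail?: string;
}

// Render one <item>, using <author> only when an email is available.
function renderFeedItem(baseUrl: string, post: FeedPost): string {
  const link = `${baseUrl}/blog/${post.slug}`;
  const authorTag = post.authorEmail
    ? `<author>${escapeXml(post.authorEmail)} (${escapeXml(post.author)})</author>`
    : `<dc:creator>${escapeXml(post.author)}</dc:creator>`;
  return [
    "<item>",
    `  <title>${escapeXml(post.title)}</title>`,
    `  <link>${link}</link>`,
    `  <description>${escapeXml(post.description || "")}</description>`,
    `  <pubDate>${new Date(post.date).toUTCString()}</pubDate>`,
    `  <guid isPermaLink="true">${link}</guid>`,
    `  ${authorTag}`,
    "</item>",
  ].join("\n");
}
```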
potentialAction: {
  "@type": "SearchAction",
  target: {
    "@type": "EntryPoint",
    urlTemplate: `${url}/blog?q={search_term_string}`,
  },
  "query-input": "required name=search_term_string",
},
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for search-related functionality in the blog directory
rg -n "searchParams|query|q=" apps/marketing/src/app/blog --type ts --type tsx -A 3
Repository: superset-sh/superset
Length of output: 91
🏁 Script executed:
#!/bin/bash
# Find TypeScript/React files in the blog directory
find apps/marketing/src/app/blog -type f \( -name "*.tsx" -o -name "*.ts" \) | head -20
Repository: superset-sh/superset
Length of output: 854
🏁 Script executed:
#!/bin/bash
# Examine the main blog page
cat -n apps/marketing/src/app/blog/page.tsx
Repository: superset-sh/superset
Length of output: 3334
🏁 Script executed:
#!/bin/bash
# Check if there's any search functionality elsewhere in the blog directory
rg -i "search" apps/marketing/src/app/blog -g "*.tsx" -g "*.ts" -A 2
Repository: superset-sh/superset
Length of output: 46
🏁 Script executed:
#!/bin/bash
# Check the JsonLd component to understand the SearchAction context
sed -n '120,150p' apps/marketing/src/components/JsonLd/JsonLd.tsx
Repository: superset-sh/superset
Length of output: 696
Remove or implement the advertised search functionality.
The SearchAction schema points to /blog?q={search_term_string}, but the blog page at apps/marketing/src/app/blog/page.tsx doesn't accept or handle searchParams. Either implement search filtering on the blog page or remove the SearchAction from the structured data to avoid misleading search engines.
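If the "implement search" route is chosen, the filtering half might be as small as the sketch below. `filterPosts` and `PostSummary` are hypothetical names; the query is assumed to come from the `q` parameter used in the urlTemplate, and how the blog page reads its search params is not shown here.

```typescript
// Hypothetical minimal post shape for filtering.
interface PostSummary {
  slug: string;
  title: string;
  description?: string;
}

// Case-insensitive match on title and description; an empty query returns every post.
function filterPosts(posts: PostSummary[], query?: string): PostSummary[] {
  const needle = (query ?? "").trim().toLowerCase();
  if (needle === "") return posts;
  return posts.filter((post) =>
    `${post.title} ${post.description ?? ""}`.toLowerCase().includes(needle),
  );
}
```

Until something like this is wired into the page, removing the SearchAction remains the simpler option.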
Replace hardcoded domain URLs with COMPANY constants from @superset/shared:
- COMPANY.MARKETING_URL for superset.sh URLs
- COMPANY.DOCS_URL for docs.superset.sh URLs
- COMPANY.NAME, COMPANY.GITHUB_URL, COMPANY.X_URL for branding
This ensures consistency and makes domain changes easier to manage.
Summary
Comprehensive SEO implementation for marketing and docs sites:
- /feed.xml for blog subscribers
Marketing Site Changes
- src/app/robots.ts
- src/app/sitemap.ts
- src/app/layout.tsx
- src/app/opengraph-image.tsx
- src/app/feed.xml/route.ts
- src/app/blog/[slug]/page.tsx
- src/components/JsonLd/
- public/manifest.json
Docs Site Changes
- src/app/robots.ts
- src/app/sitemap.ts
- src/app/layout.tsx
Test plan
- /robots.txt on both sites - should return crawl rules
- /sitemap.xml on both sites - should list all pages
- /feed.xml on marketing site - should return RSS feed