
fix(glossary): various workflow fixes & latest content #2664

Merged
chronark merged 24 commits into unkeyed:main from p6l-richard:richard/fix-glossary-setup
Nov 18, 2024

Conversation

@p6l-richard
Contributor

@p6l-richard p6l-richard commented Nov 15, 2024

What does this PR do?

Demo

[screenshots]

Discoverability Setup

A sitemap route was added for the glossary
Search Console submission will follow after merge
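As a rough sketch of what a glossary sitemap route produces (the base URL and input shape here are assumptions for illustration, not the actual implementation in apps/www/app/glossary/sitemap.ts):

```typescript
// Minimal sketch: map glossary entries to sitemap records.
// BASE_URL and the entry shape are placeholders, not the real data source.
type SitemapEntry = { url: string; lastModified: string };

const BASE_URL = "https://unkey.com"; // assumed base URL

function glossarySitemap(
  entries: { slug: string; updatedAt: string }[],
): SitemapEntry[] {
  return entries.map(({ slug, updatedAt }) => ({
    url: `${BASE_URL}/glossary/${slug}`,
    lastModified: updatedAt,
  }));
}
```

In a Next.js app router, a `sitemap.ts` file exporting an array of this shape is picked up automatically.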

Publishing plan

PR is ready for review with the following changes:

  • 3 new generated entries: statelessness, single-sign-on, transport layer security
  • 1 updated entry: mime-types is now generated entirely by the workflow (it previously required manual updates)
  • FAQ pair generation included
  • Fixes an issue with the drizzle <> content-collections API (YAML frontmatter generation)
  • Fixes an issue that led to duplicate entries in our database
  • Fixes an issue with GitHub PR creation when a conflicting branch from a previous run already exists
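For context on the frontmatter fix above: the workflow serializes entry metadata into YAML frontmatter at the top of each generated .mdx file. A minimal illustration of that output shape (a hand-rolled serializer for flat string fields only; the actual workflow uses js-yaml's dump):

```typescript
// Illustrative only: the real workflow serializes with js-yaml.
// This simplified version handles flat string fields and basic quoting.
function toFrontmatter(meta: Record<string, string>, body: string): string {
  const yaml = Object.entries(meta)
    .map(([key, value]) => `${key}: "${value.replace(/"/g, '\\"')}"`)
    .join("\n");
  // MDX frontmatter is delimited by --- fences before the body.
  return `---\n${yaml}\n---\n\n${body}`;
}
```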

Room for improvements (content quality)

  • Brand filter. The SSO article is heavily skewed towards specific vendors right now (it cites Okta and AWS heavily and even provides vendor code examples). We should add guardrails to keep the glossary objective.
  • Summary Card.
    • The LLM sometimes generates multiple mentions in our info boxes; the styling needs to handle this.
    • Some generated text exceeds the UI size restrictions (see the SSO screenshot).
  • Rich Text Content. The outline generation needs an upgrade, as some headers are inappropriate (e.g. "Creating a PDF Guide for API Security Best Practices" is not what we want to display). Sometimes a section's content is too thin for what its header promises (e.g. "OWASP API Security Top 10: A Deep Dive" is followed by only 4 sentences).
  • FAQ Questions. Some questions are too long; we should keep them concise.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Chore (refactoring code, technical debt, workflow improvements)
  • Enhancement (small improvements)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How should this be tested?

Visit the preview links for:

  • /glossary/mime-types
  • /glossary/statelessness
  • /glossary/single-sign-on
  • /glossary/transport-layer-security

Checklist

Required

  • Filled out the "How to test" section in this PR
  • Read Contributing Guide
  • Self-reviewed my own code
  • Commented on my code in hard-to-understand areas
  • Ran pnpm build
  • Ran pnpm fmt
  • Checked for warnings, there are none
  • Removed all console.logs
  • Merged the latest changes from main onto my branch with git pull origin main
  • My changes don't cause any responsiveness issues

Appreciated

  • If a UI change was made: Added a screen recording or screenshots to this PR
  • Updated the Unkey Docs if changes were necessary

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced new schemas for FAQs and takeaways, enhancing content organization.
    • Added dynamic FAQ content to the glossary term display, improving user experience.
    • New tasks for generating FAQs and content takeaways, utilizing OpenAI for enhanced content creation.
    • New task for categorizing API-related terms based on predefined categories.
    • New sitemap generation functionality for glossary entries.
  • Improvements

    • Enhanced error handling and caching strategies for search queries and glossary entries.
    • Updated SEO meta tags generation process to improve clarity and keyword relevance.
    • Streamlined workflow for generating glossary entries, allowing concurrent task execution.
  • Content Updates

    • Comprehensive updates to existing glossary entries, including new sections and improved clarity.
    • New entries added for Single Sign-On, Statelessness, and Transport Layer Security.
    • Significant updates to the MIME types entry, including a new FAQ section.

@changeset-bot

changeset-bot bot commented Nov 15, 2024

⚠️ No Changeset found

Latest commit: f7896e3

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types
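For reference, a changeset is a small markdown file under `.changeset/` that pairs affected packages with a semver bump type; a minimal example (the package name here is hypothetical):

```markdown
---
"@unkey/example-package": patch
---

fix(glossary): various workflow fixes
```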


@vercel

vercel bot commented Nov 15, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Updated (UTC)
dashboard ✅ Ready Nov 18, 2024 2:30pm
engineering ✅ Ready Nov 18, 2024 2:30pm
www ✅ Ready Nov 18, 2024 2:30pm

@vercel

vercel bot commented Nov 15, 2024

@p6l-richard is attempting to deploy a commit to the Unkey Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai
Contributor

coderabbitai bot commented Nov 15, 2024

📝 Walkthrough

Walkthrough

This pull request includes updates to the billing application, primarily focusing on dependency management, schema modifications, and the introduction of new features. The package.json file reflects version upgrades for several @trigger.dev packages and the addition of new dependencies. Schema files have been enhanced to include new fields and types, while several tasks related to glossary entries have been created or modified to improve functionality and error handling.

Changes

File Path Change Summary
apps/billing/package.json Updated dependencies: @trigger.dev/nextjs, @trigger.dev/sdk, @trigger.dev/slack to 3.2.0; added github-slugger and js-yaml.
apps/billing/src/lib/db-marketing/schemas/entries.ts Added new imports, constants, types, and schema definitions; expanded entries table with new fields.
apps/billing/src/lib/db-marketing/schemas/searchQuery.ts Renamed relations in searchQueryRelations.
apps/billing/src/lib/db-marketing/schemas/takeaways-schema.ts Introduced takeawaysSchema and Takeaways type.
apps/billing/src/lib/search-query.ts Updated getOrCreateSearchQuery function signature and logic for querying and error handling.
apps/billing/src/trigger/glossary/_generate-glossary-entry.ts Modified task workflow to run tasks in parallel and enhance error handling.
apps/billing/src/trigger/glossary/content-takeaways.ts Introduced contentTakeawaysTask for generating structured takeaways.
apps/billing/src/trigger/glossary/create-pr.ts Updated createPrTask to include YAML generation and enhanced error handling.
apps/billing/src/trigger/glossary/draft-sections.ts Changed insertion to update logic for existing entries in draftSectionsTask.
apps/billing/src/trigger/glossary/generate-faqs.ts Added generateFaqsTask for generating FAQs from database and OpenAI API.
apps/billing/src/trigger/glossary/keyword-research.ts Updated keywordResearchTask to improve search query handling and return structure.
apps/billing/src/trigger/glossary/linking-categories.ts Introduced linkingCategoriesTask for categorizing API-related terms.
apps/billing/src/trigger/glossary/seo-meta-tags.ts Enhanced seoMetaTagsTask with new checks for SEO metadata.
apps/www/app/glossary/[slug]/page.tsx Updated FAQ component to use dynamic FAQs from the glossary term.
apps/www/app/glossary/sitemap.ts Introduced a new function to generate a sitemap for glossary entries.
apps/www/content-collections.ts Enhanced glossary collection schema with new fields and updated transform function.
apps/www/content/glossary/mime-types.mdx Updated content structure and terminology for MIME types, added a new FAQ section.
apps/www/content/glossary/single-sign-on.mdx Created a new comprehensive guide on Single Sign-On (SSO) API.
apps/www/content/glossary/statelessness.mdx Created a new comprehensive guide on statelessness in API design.
apps/www/content/glossary/transport-layer-security.mdx Created a new guide on Transport Layer Security (TLS) in API development.
apps/www/lib/schemas/faq-schema.ts Introduced new faqSchema for FAQs.
apps/www/lib/schemas/takeaways-schema.ts Introduced new takeawaysSchema for glossary entry takeaways.


Suggested labels

🕹️ oss.gg, 🕹️ 150 points, hacktoberfest

Suggested reviewers

  • mcstepp
  • chronark
  • perkinsjr
  • MichaelUnkey


@github-actions
Contributor

github-actions bot commented Nov 15, 2024

Thank you for following the naming conventions for pull request titles! 🙏

@chronark
Collaborator

We're having some major layout issues, but let's ignore that for now.
CleanShot 2024-11-18 at 15 27 49@2x

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 25

🧹 Outside diff range and nitpick comments (38)
apps/www/lib/schemas/faq-schema.ts (2)

3-8: Code duplication needs to be addressed

The documentation clearly indicates this is a duplicate of apps/billing/src/lib/db-marketing/schemas/entries.ts. Having duplicate schemas across different apps can lead to maintenance issues and inconsistencies.

Consider prioritizing the TODO item to extract this into a shared package. This would:

  • Ensure consistency between apps
  • Reduce maintenance overhead
  • Provide a single source of truth
  • Make future schema updates easier to manage

Would you like me to help create a GitHub issue to track this technical debt?


9-14: Consider adding validation rules to the schema

The current schema is quite permissive. Given this is for FAQ entries that will be displayed to users, consider adding some validation rules.

Here's a suggested enhancement:

 export const faqSchema = z.array(
   z.object({
-    question: z.string(),
-    answer: z.string(),
+    question: z.string()
+      .min(3, "Question must be at least 3 characters")
+      .max(200, "Question must not exceed 200 characters"),
+    answer: z.string()
+      .min(10, "Answer must be at least 10 characters")
+      .max(2000, "Answer must not exceed 2000 characters")
+      .transform((str) => str.trim()),
   }),
 );
apps/billing/src/lib/db-marketing/schemas/takeaways-schema.ts (1)

3-7: Address the TODO comment about schema duplication.

The comment indicates potential code duplication between this schema and apps/www. To maintain a single source of truth and improve maintainability, this schema should be moved to a shared package.

Would you like me to help create a GitHub issue to track the extraction of this schema into a shared package? I can provide the necessary steps and code changes required for this refactoring.

apps/www/app/glossary/sitemap.ts (2)

7-8: Consider implementing caching for better performance.

Setting revalidate to 0 and using force-dynamic means the sitemap will be regenerated on every request. Consider implementing a reasonable cache duration (e.g., 1 hour) since glossary content doesn't change frequently.

-export const dynamic = "force-dynamic";
-export const revalidate = 0;
+// Cache sitemap for 1 hour since glossary content updates are infrequent
+export const revalidate = 3600;

15-17: Consider implementing sitemap pagination.

The comment about Google's 50,000 URL limit is good, but we should implement the pagination logic now to avoid future issues.

Would you like me to provide an implementation for paginated sitemaps? This would include:

  • Splitting entries into chunks of 50,000
  • Creating multiple sitemap files
  • Generating a sitemap index file
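The chunking the reviewer describes can be sketched as follows (the 50,000 limit comes from the sitemap protocol's per-file cap; function and type names are illustrative):

```typescript
// Split sitemap entries into chunks that respect the 50,000-URL
// per-file limit; each chunk would become its own sitemap file,
// referenced from a sitemap index.
const MAX_URLS_PER_SITEMAP = 50_000;

function chunkSitemapEntries<T>(
  entries: T[],
  size: number = MAX_URLS_PER_SITEMAP,
): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < entries.length; i += size) {
    chunks.push(entries.slice(i, i + size));
  }
  return chunks;
}
```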
apps/www/lib/schemas/takeaways-schema.ts (1)

9-35: Enhance schema validation rules for better data integrity

While the basic structure is good, consider adding more specific validation rules:

 export const takeawaysSchema = z.object({
-  tldr: z.string(),
+  tldr: z.string().min(10).max(500),
   definitionAndStructure: z.array(
     z.object({
-      key: z.string(),
-      value: z.string(),
+      key: z.string().min(1),
+      value: z.string().min(1),
     }),
-  ),
+  ).min(1),
   // ... similar changes for historicalContext
   usageInAPIs: z.object({
-    tags: z.array(z.string()),
+    tags: z.array(z.string().min(1)).min(1),
     description: z.string(),
   }),
   bestPractices: z.array(z.string()),
   recommendedReading: z.array(
     z.object({
       title: z.string(),
-      url: z.string(),
+      url: z.string().url(),
     }),
   ),
-  didYouKnow: z.string(),
+  didYouKnow: z.string().optional(),
 });

Consider:

  1. Adding length constraints for text fields
  2. Validating URLs
  3. Making some fields optional where appropriate
  4. Adding minimum array length requirements
apps/billing/src/lib/db-marketing/schemas/entries.ts (1)

18-27: Consider adding validation rules to the FAQ schema.

While the basic structure is good, consider enhancing the schema with validation rules for better data quality:

 export const faqSchema = z.array(
   z.object({
-    question: z.string(),
-    answer: z.string(),
+    question: z.string().min(1).max(255).trim(),
+    answer: z.string().min(1).trim(),
   }),
-);
+).min(1).max(10);

This would:

  • Ensure non-empty strings with proper length limits
  • Remove unnecessary whitespace
  • Enforce at least one FAQ entry
  • Limit the maximum number of FAQs to 10 for better UX
apps/billing/src/trigger/glossary/linking-categories.ts (3)

10-24: Fix typo in comment and consider tracking the TODO.

The comment contains a typo ("dicatated" should be "dictated"). Also, since this is noted as a temporary solution, it would be helpful to track this as a future improvement task.

Would you like me to create a GitHub issue to track the implementation of search volume-based categories?


82-85: Specify required columns in final query.

The final database query should specify the required columns explicitly for better performance and clarity.

     return db.query.entries.findFirst({
       where: eq(entries.inputTerm, term),
+      columns: {
+        id: true,
+        inputTerm: true,
+        categories: true,
+      },
     });

53-57: Consider extracting OpenAI system prompts to a configuration file.

The system prompt for the OpenAI model could be moved to a separate configuration file. This would make it easier to maintain and update prompts across the application, especially if you have multiple similar tasks.

Consider creating a prompts.config.ts file to store all system prompts:

// prompts.config.ts
export const PROMPTS = {
  categoryGeneration: `
    You are an API documentation expert. Categorize API-related terms into relevant categories.
    Only use categories from the predefined list. A term can belong to multiple categories if relevant.
    Consider the term's primary purpose, related concepts, and practical applications.
  `,
  // ... other prompts
};
apps/billing/src/lib/search-query.ts (4)

10-13: Add JSDoc documentation for the function parameters

While the function signature is well-structured, adding documentation would improve maintainability by explaining the purpose of onCacheHit and its possible values.

+/**
+ * Retrieves or creates a search query for a given term
+ * @param {Object} params - The function parameters
+ * @param {string} params.term - The search term to query
+ * @param {CacheStrategy} params.onCacheHit - Strategy for handling cache hits ("stale" | "revalidate")
+ * @returns {Promise<Entry>} The entry with its associated search query
+ */
 export async function getOrCreateSearchQuery({
   term,
   onCacheHit = "stale",
 }: { term: string; onCacheHit: CacheStrategy })

15-20: Consider adding an index for the query performance

The query filters on entries.inputTerm and orders by searchQueries.createdAt. Adding an index could improve query performance.

Consider adding an index on entries.inputTerm if not already present:

CREATE INDEX idx_entries_input_term ON entries(input_term);

23-25: Add debug logging for cache hits

Consider adding logging to track cache hit patterns and revalidation frequency.

 if (existing?.searchQuery && onCacheHit === "revalidate") {
+  console.debug(`Cache hit for term: ${term}, strategy: ${onCacheHit}`);
   return existing;
 }

27-30: Enhance error message with recovery steps

The error message could be more helpful by including recovery steps or troubleshooting information.

 if (!existing) {
   throw new AbortTaskRunError(
-    `Entry not found for term: ${term}. It's likely that the keyword-research task failed.`,
+    `Entry not found for term: ${term}. The keyword-research task failed. Please ensure the task is properly queued and retry the operation.`,
   );
 }
apps/billing/src/trigger/glossary/keyword-research.ts (2)

36-37: LGTM! Consider enhancing error handling.

The refactoring to preserve the full context from getOrCreateSearchQuery is a good improvement. The destructuring pattern is clean and maintains backward compatibility.

Consider adding type narrowing for better error handling:

-    const entryWithSearchQuery = await getOrCreateSearchQuery({ term, onCacheHit });
-    const searchQuery = entryWithSearchQuery?.searchQuery;
+    const entryWithSearchQuery = await getOrCreateSearchQuery({ term, onCacheHit });
+    if (!entryWithSearchQuery) {
+      throw new AbortTaskRunError("Failed to create or retrieve search query entry");
+    }
+    const searchQuery = entryWithSearchQuery.searchQuery;

Line range hint 16-117: Consider enhancing task observability and maintainability.

The task implementation is robust, but could benefit from the following improvements:

  1. Document the significance of the THREE constant
  2. Add more granular progress tracking for better observability
  3. Structure the error handling more consistently across all steps

Consider these improvements:

-export const THREE = 3;
+/** Maximum number of top search results to process for keyword extraction */
+export const MAX_RESULTS_TO_PROCESS = 3;

 export const keywordResearchTask = task({
   id: "keyword_research",
   retry: {
     maxAttempts: 3,
   },
+  onProgress: true,
   run: async ({
     term,
     onCacheHit = "stale" as CacheStrategy,
+    onProgress?: (progress: number, message: string) => void,
   }) => {
+    const reportProgress = (step: number, total: number, message: string) => {
+      onProgress?.(step / total, message);
+    };
apps/www/content-collections.ts (2)

120-123: Consider strengthening the schema types

The new fields could benefit from stronger type validation:

  • updatedAt could use z.date() or z.string().datetime() for proper date validation
  • slug could benefit from a URL-safe string pattern validation
-    updatedAt: z.string(),
-    slug: z.string(),
+    updatedAt: z.string().datetime(),
+    slug: z.string().regex(/^[a-z0-9]+(?:-[a-z0-9]+)*$/, 
+      'Slug must be URL-safe: lowercase letters, numbers, and hyphens only'),

119-124: Consider implementing schema versioning

The glossary schema is evolving with new fields. As the content collection grows, consider implementing schema versioning to:

  • Track schema changes over time
  • Enable backward compatibility
  • Facilitate content migration
  • Support schema validation across different versions

This would help manage the growing complexity of the content structure.
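One lightweight way to realize the versioning suggested above is a migration function keyed on a `schemaVersion` field (versions and field names here are hypothetical, not part of the PR):

```typescript
// Sketch: migrate older glossary frontmatter shapes to the current one.
// The v1 -> v2 rename (deriving `slug` from `title`) is an invented example.
type GlossaryV1 = { schemaVersion: 1; title: string };
type GlossaryV2 = { schemaVersion: 2; title: string; slug: string };

function migrateGlossary(doc: GlossaryV1 | GlossaryV2): GlossaryV2 {
  if (doc.schemaVersion === 2) return doc; // already current
  // v1 -> v2: derive a URL-safe slug from the title
  return {
    schemaVersion: 2,
    title: doc.title,
    slug: doc.title
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, "-")
      .replace(/^-|-$/g, ""),
  };
}
```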

apps/billing/src/trigger/glossary/_generate-glossary-entry.ts (2)

23-23: Consider renumbering workflow steps sequentially

The current step numbering uses "4.5" which breaks the sequential flow. Consider renumbering the steps sequentially for better clarity:

- * 3. Draft Sections & Content Takeaways (in parallel)
- * 4. Generate SEO Meta Tags
- * 5. Generate FAQs
- * 6. Create PR
+ * 3. Draft Sections & Content Takeaways (in parallel)
+ * 4. Generate SEO Meta Tags
+ * 5. Generate FAQs
+ * 6. Create PR

Also applies to: 25-25, 26-26


120-127: Update console output step numbering for consistency

The FAQ generation implementation is correct, but the console output step numbering should be updated to match the sequential workflow steps.

- console.info("4.5/5 - Generating FAQs...");
+ console.info("5/6 - Generating FAQs...");
apps/www/content/glossary/mime-types.mdx (2)

80-82: Enhance the conclusion with actionable takeaways.

The conclusion could be more valuable by including specific next steps or best practices for implementing MIME types in APIs.

 ## Conclusion

-MIME types are a fundamental aspect of web and API development, ensuring that data is transmitted and interpreted correctly. By understanding the structure and usage of MIME types, developers can enhance their API's functionality and ensure seamless data exchange. For further exploration, consider checking out resources on **MIME types API development GitHub** and **MIME types API development PDF** for additional insights and examples.
+## Conclusion and Next Steps
+
+MIME types are a fundamental aspect of web and API development, ensuring that data is transmitted and interpreted correctly. To implement MIME types effectively in your API:
+
+1. Always validate Content-Type headers in your API endpoints
+2. Include proper MIME type definitions in your API documentation
+3. Use standardized MIME types from the IANA registry
+4. Implement proper error handling for unsupported MIME types
+
+For further exploration, refer to the IANA Media Types registry and modern API specification tools like OpenAPI, which provide structured ways to define supported MIME types.

76-78: Add code examples for API implementation.

The API development section would benefit from practical code examples showing how to handle MIME types in common scenarios.

 ## MIME Types in API Development

 In API development, MIME types define the format of the data being exchanged. Common MIME types in RESTful APIs include `application/xml` and `application/json`. Specifying the correct MIME type in API requests and responses is vital for the data to be accurately interpreted and processed by the consuming application or service. For practical examples, developers can explore **MIME types API development examples** on platforms like GitHub.
+
+Here's an example of handling MIME types in a REST API endpoint:
+
+```javascript
+app.post('/api/data', (req, res) => {
+  const contentType = req.get('Content-Type');
+  
+  if (contentType !== 'application/json') {
+    return res.status(415).json({
+      error: 'Unsupported Media Type',
+      message: 'This API endpoint only accepts application/json'
+    });
+  }
+  
+  // Process JSON data...
+  res.set('Content-Type', 'application/json');
+  res.json({ status: 'success' });
+});
+```
apps/billing/src/trigger/glossary/draft-sections.ts (1)

81-82: Add input validation and clarify header stripping purpose.

While the regex pattern is correct, consider adding input validation and clarifying the purpose of header stripping in the comment.

-    // Strip any leading single # header if present
+    // Strip any leading H1 header to avoid duplicate titles since the term is displayed separately
+    const finalContent = optimizedContent?.trim()
+      ? optimizedContent.replace(/^#\s+[^\n]+\n/, "")
+      : optimizedContent;
apps/www/content/glossary/transport-layer-security.mdx (3)

6-6: Categories array should not be empty.

The empty categories array might affect content organization and discoverability. Consider adding relevant categories such as "security", "networking", or "protocols".


88-90: Enhance the TLS vs SSL section with specific security recommendations.

While the differences are mentioned, it would be valuable to add specific security recommendations:

Consider adding these points:

  • Explicitly mention that SSL 2.0 and 3.0 are deprecated and vulnerable to attacks like POODLE
  • Recommend minimum TLS version (TLS 1.2+) and discourage use of TLS 1.0/1.1
  • Include guidance on secure cipher suites configuration
  • Mention the importance of regular security audits and certificate rotation
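The version guidance above can be encoded as a simple config check (the protocol strings follow Node's `tls` `minVersion` naming; the checker itself is an illustrative sketch, not part of the PR):

```typescript
// Reject deprecated protocol versions per the guidance above:
// SSLv2/SSLv3 are broken (e.g. POODLE) and TLS 1.0/1.1 are deprecated.
const DEPRECATED = new Set(["SSLv2", "SSLv3", "TLSv1", "TLSv1.1"]);
const ALLOWED_MIN = new Set(["TLSv1.2", "TLSv1.3"]);

function isAcceptableMinVersion(minVersion: string): boolean {
  return ALLOWED_MIN.has(minVersion) && !DEPRECATED.has(minVersion);
}
```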

115-115: Improve sentence clarity.

Consider rewording for better clarity:

-Implementing TLS not only helps in compliance with security standards but also builds trust with users by protecting their data.
+TLS implementation ensures security compliance and builds user trust through data protection.

apps/www/content/glossary/statelessness.mdx (2)

6-6: Consider populating the categories array.

The categories array is currently empty. Consider adding relevant categories to improve content organization and discoverability.


56-56: Fix minor grammar in the introduction.

Consider rephrasing "how web services" to "how web services work" or "how web services operate" for better clarity.

-**Statelessness** is a fundamental concept in API development, significantly influencing how web services and client applications interact.
+**Statelessness** is a fundamental concept in API development, significantly influencing how web services operate and client applications interact.

apps/www/app/glossary/[slug]/page.tsx (1)

Line range hint 186-191: Consider conditional rendering for empty FAQs

The FAQ section is always rendered, even when term.faq might be empty. Consider wrapping it in a conditional check for better UX.

-              <div className="mt-12 sm:mx-6">
-                <FAQ
-                  items={term.faq}
-                  title={`Questions & Answers about ${term.term}`}
-                  description={`We answer common questions about ${term.term}.`}
-                  epigraph="FAQ"
-                />
-              </div>
+              {term.faq?.length > 0 && (
+                <div className="mt-12 sm:mx-6">
+                  <FAQ
+                    items={term.faq}
+                    title={`Questions & Answers about ${term.term}`}
+                    description={`We answer common questions about ${term.term}.`}
+                    epigraph="FAQ"
+                  />
+                </div>
+              )}
apps/www/content/glossary/single-sign-on.mdx (2)

6-6: Consider adding relevant categories for better content organization.

The categories array is currently empty. Consider adding relevant categories like "authentication", "security", or "api-integration" to improve content discoverability and organization.


39-44: Consider using environment variables or a content management system for external URLs.

The recommended reading section contains hardcoded URLs that might become stale over time. Consider managing these through environment variables or a content management system for easier maintenance.

apps/billing/src/trigger/glossary/content-takeaways.ts (3)

44-96: Consider externalizing the prompt strings for maintainability

The multi-line strings used for the system and prompt parameters in the generateObject function are extensive. Moving these prompts to separate template files or constants can improve readability and make future updates easier.


17-20: Use a type alias for function parameters

Defining an interface or type alias for the function parameters can improve code readability and maintainability.

Apply this diff to define a type alias:

 export const contentTakeawaysTask = task({
   id: "content_takeaways",
   retry: {
     maxAttempts: 3,
   },
   run: async ({
     term,
     onCacheHit = "stale" as CacheStrategy,
-  }: {
-    term: string;
-    onCacheHit?: CacheStrategy;
-  }) => {
+  }: ContentTakeawaysParams) => {
     // Function body
   },
 });

+ interface ContentTakeawaysParams {
+   term: string;
+   onCacheHit?: CacheStrategy;
+ }

105-108: Ensure consistent use of await with asynchronous calls

Since db.query.entries.findFirst returns a promise, awaiting it before returning makes the asynchronous boundary explicit and keeps this function in async stack traces if the query rejects.

Apply this diff:

-    return db.query.entries.findFirst({
-      where: eq(entries.inputTerm, term),
-    });
+    return await db.query.entries.findFirst({
+      where: eq(entries.inputTerm, term),
+    });
apps/billing/src/trigger/glossary/create-pr.ts (2)

7-7: Ensure Type Definitions for 'js-yaml' are Installed

The comment // install @types/js-yaml? suggests that type definitions for js-yaml are missing. To maintain type safety, please install the @types/js-yaml package as a development dependency.

Apply this command to install the type definitions (the repository uses pnpm, per the excluded pnpm-lock.yaml):

pnpm add -D @types/js-yaml

129-138: Enhance Error Handling When Deleting Branches

The try-catch block for deleting an existing branch catches errors but only logs them:

try {
  await octokit.git.deleteRef({
    owner,
    repo,
    ref: `heads/${branch}`,
  });
  console.info("2.2.2 ⌫ Branch deleted");
} catch (error) {
  console.error(`2.2.3 ❌ Error deleting branch: ${error}`);
}

Consider handling specific error cases or rethrowing unexpected errors to prevent silent failures. This ensures that critical issues aren't overlooked.

Here's an improved error handling approach:

 try {
   await octokit.git.deleteRef({
     owner,
     repo,
     ref: `heads/${branch}`,
   });
   console.info("2.2.2 ⌫ Branch deleted");
-} catch (error) {
-  console.error(`2.2.3 ❌ Error deleting branch: ${error}`);
+} catch (error: any) {
+  if (error.status === 422) {
+    // Branch does not exist; proceed without interruption
+    console.warn(`2.2.3 ⚠️ Branch does not exist: ${error.message}`);
+  } else {
+    // Unexpected error; rethrow to handle upstream
+    console.error(`2.2.3 ❌ Error deleting branch: ${error}`);
+    throw error;
+  }
 }

This modification handles the case where the branch doesn't exist and ensures other errors are properly escalated.

apps/billing/src/trigger/glossary/seo-meta-tags.ts (2)

91-91: Typo in system prompt: Correct 'punctuatoin' to 'punctuation'

There's a typo in the system prompt at line 91. The word "punctuatoin" should be corrected to "punctuation".

Apply this diff to fix the typo:

-              - omit the punctuatoin at the end to save a character
+              - omit the punctuation at the end to save a character

63-149: Consider externalizing system prompts for better maintainability

The extensive system prompts within the generateObject calls make the code lengthy and reduce readability. Consider extracting these prompts into separate constants or external files to enhance maintainability and make the main code more concise.

For example, you can refactor the system prompt as:

const seoMetaTagsSystemPrompt = `
  You are three specialized experts collaborating on creating meta tags for an API documentation glossary:
  ...
  (rest of the prompt)
`;

const craftedMetaTags = await generateObject({
  model: openai("gpt-4"),
  system: seoMetaTagsSystemPrompt,
  prompt: `...`,
  schema: ...,
  temperature: 0.3,
});
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b21b1c4 and 817ac4e.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (23)
  • apps/billing/package.json (2 hunks)
  • apps/billing/src/lib/db-marketing/schemas/entries.ts (2 hunks)
  • apps/billing/src/lib/db-marketing/schemas/searchQuery.ts (1 hunks)
  • apps/billing/src/lib/db-marketing/schemas/takeaways-schema.ts (1 hunks)
  • apps/billing/src/lib/search-query.ts (2 hunks)
  • apps/billing/src/trigger/glossary/_generate-glossary-entry.ts (5 hunks)
  • apps/billing/src/trigger/glossary/content-takeaways.ts (1 hunks)
  • apps/billing/src/trigger/glossary/create-pr.ts (6 hunks)
  • apps/billing/src/trigger/glossary/draft-sections.ts (1 hunks)
  • apps/billing/src/trigger/glossary/generate-faqs.ts (1 hunks)
  • apps/billing/src/trigger/glossary/keyword-research.ts (2 hunks)
  • apps/billing/src/trigger/glossary/linking-categories.ts (1 hunks)
  • apps/billing/src/trigger/glossary/seo-meta-tags.ts (2 hunks)
  • apps/www/app/glossary/[slug]/page.tsx (1 hunks)
  • apps/www/app/glossary/sitemap.ts (1 hunks)
  • apps/www/content-collections.ts (2 hunks)
  • apps/www/content/glossary/mime-types.mdx (1 hunks)
  • apps/www/content/glossary/single-sign-on.mdx (1 hunks)
  • apps/www/content/glossary/statelessness.mdx (1 hunks)
  • apps/www/content/glossary/transport-layer-security.mdx (1 hunks)
  • apps/www/content/statelessness.mdx (0 hunks)
  • apps/www/lib/schemas/faq-schema.ts (1 hunks)
  • apps/www/lib/schemas/takeaways-schema.ts (1 hunks)
💤 Files with no reviewable changes (1)
  • apps/www/content/statelessness.mdx
🧰 Additional context used
🪛 LanguageTool
apps/www/content/glossary/statelessness.mdx

[grammar] ~56-~56: The expression ‘how to web’ requires the preposition ‘to’.
Context: ... development, significantly influencing how web services and client applications intera...

(HOW_TO_VERB)

apps/www/content/glossary/transport-layer-security.mdx

[style] ~115-~115: Opting for a less wordy alternative here can improve the clarity of your writing.
Context: ...re all communications. Implementing TLS not only helps in compliance with security standards but also builds trust with users by protecting their da...

(NOT_ONLY_ALSO)

🔇 Additional comments (29)
apps/www/lib/schemas/faq-schema.ts (2)

16-16: LGTM!

The type export using z.infer is the correct approach for maintaining type safety with Zod schemas.


1-16: Verify schema integration with content collections

Let's verify how this schema is being used in the content collections and ensure proper integration.

✅ Verification successful

The previous results showed that the schema is imported and used in content collections, but we need to verify the actual content files and their structure to ensure proper integration. These commands will help us understand how the schema is being used in practice.


Schema is properly integrated and actively used in glossary entries

The FAQ schema is correctly integrated and actively used in the codebase:

  • Properly imported and used in apps/www/content-collections.ts for schema validation
  • Actively used in glossary content files under apps/www/content/glossary/ with correct structure
  • All glossary entries (mime-types.mdx, single-sign-on.mdx, statelessness.mdx, transport-layer-security.mdx) follow the schema format with question/answer pairs

The schema integration is working as intended and the comment about extracting it to a shared package remains valid for future improvements.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check how the FAQ schema is integrated with content collections

# Check for imports of the FAQ schema
rg -l "faq-schema" apps/www/

# Look for usage of the FAQ schema in content collections
rg -A 10 "faqSchema" apps/www/content-collections.ts

# Check for any existing FAQ data files
fd -e yaml -e json --full-path apps/www/ | rg -i "faq"

Length of output: 942


Script:

#!/bin/bash
# Let's verify the actual content files and schema usage

# Check for content files that might use the FAQ schema
fd . apps/www/content/

# Look for FAQ content in MDX files
rg -l "faq:" apps/www/content/

# Check the actual content structure
rg -A 5 "faq:" apps/www/content/

# Verify if there are any other files importing or using the FAQ schema
rg -l "faqSchema" apps/www/

Length of output: 9989

apps/billing/src/lib/db-marketing/schemas/takeaways-schema.ts (2)

36-36: LGTM!

The type export is well-implemented, using Zod's type inference capabilities correctly.


1-36: Verify schema integration with dependent files.

The schema appears to be referenced by other files in the codebase. Let's verify the integration is correct.

✅ Verification successful

Based on the search results, I can now generate the final response as I have sufficient information about the schema integration:

Schema is properly integrated across the codebase

The takeawaysSchema is correctly integrated and used consistently across both apps:

  • In apps/www:

    • Used in content collections for schema validation (content-collections.ts)
    • Imported and used in the glossary UI components ([slug]/takeaways.tsx, page.tsx)
  • In apps/billing:

    • Used as database schema type in entries.ts
    • Imported for content generation in content-takeaways.ts
    • Integrated in the glossary entry generation workflow (_generate-glossary-entry.ts)

The schema is properly synchronized between the two apps and follows the same structure, validating the @sourceOfTruth comment in the schema file.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the schema integration with entries.ts and content-collections.ts

# Search for files importing or using the takeawaysSchema
echo "Files importing or using takeawaysSchema:"
rg -l "takeawaysSchema|Takeaways" --type ts

# Show the context of usage
echo "\nUsage context:"
rg -A 5 "takeawaysSchema|Takeaways" --type ts

Length of output: 10908

apps/www/app/glossary/sitemap.ts (1)

11-11: Verify content-collections integration.

The comment suggests that allGlossaries from content-collections wasn't available. This might indicate an issue with the build process or module resolution.

apps/www/lib/schemas/takeaways-schema.ts (1)

37-37: LGTM! Clean type export implementation

The type inference from the schema follows TypeScript best practices.

apps/billing/package.json (2)

38-39: LGTM: New dependencies align with PR objectives

The addition of:

  • github-slugger: Useful for generating URL-friendly slugs for glossary entries
  • js-yaml: Addresses the YAML frontmatter generation issues mentioned in PR objectives

25-27: Verify @trigger.dev version updates

The minor version bump (3.1.2 → 3.2.0) for @trigger.dev packages might introduce new features or changes that could affect the workflow.

apps/billing/src/lib/db-marketing/schemas/searchQuery.ts (1)

42-44: Verify usage of renamed relation searchResponse

The relation name change from plural to singular form better reflects the one-to-one relationship. However, we should verify that all references to this relation have been updated consistently.

#!/bin/bash
# Description: Find any remaining references to the old relation name
# Expected: No occurrences of the old name 'searchResponses' in relation context

# Search for old relation name
rg -l 'searchResponses' --type ts

# Search for specific usage patterns that might indicate relation access
ast-grep --pattern 'searchResponses: one($$$)' 
ast-grep --pattern '$_.searchResponses'
apps/billing/src/lib/db-marketing/schemas/entries.ts (2)

2-16: LGTM! Import statements are well-organized.

The new imports are properly structured and the type-only import for Takeaways follows TypeScript best practices.


38-42: Verify schema compatibility with existing entries.

Since this PR mentions updates to entries like "mime-types", we should verify the schema changes won't affect existing data.

✅ Verification successful

Schema changes appear safe for existing data

The schema inspection reveals that all new fields (metaH1, categories, takeaways, faq) are optional in the schema definition, as they don't have the .notNull() constraint. Additionally, categories has a default value of []. The fields are properly typed with their respective types and have dedicated processing modules in the trigger/glossary/ directory.

  • linking-categories.ts, content-takeaways.ts, and other related modules exist to handle these fields
  • No breaking migrations were found that could affect existing data
  • The schema is using proper type definitions with $type for JSON fields
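In plain TypeScript terms, the row shape described above can be sketched as follows (field names follow the review; this is a sketch, not the generated drizzle type):

```typescript
// Sketch of the entries row shape implied by the schema inspection:
// only inputTerm is required; the new fields are optional or defaulted.
interface GlossaryEntryRow {
  inputTerm: string; // .notNull()
  metaH1?: string;
  categories: string[]; // default: []
  takeaways?: unknown; // JSON column typed via $type
  faq?: { question: string; answer: string }[];
}

// A minimal insert therefore only needs the term itself.
function minimalEntry(inputTerm: string): GlossaryEntryRow {
  return { inputTerm, categories: [] };
}
```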
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for any entries that might be affected by the schema changes

# Look for existing entries without the new required fields
rg -l "mime-types" --type ts

# Check for any direct database queries or migrations
fd -e sql -e ts -e js --full-path "migration|query" --exec rg -l "entries" {}

Length of output: 206


Script:

#!/bin/bash
# Let's check the schema changes and migrations more thoroughly

# Find all migration files in the repository
fd -e sql -e ts -e js "migration"

# Check the content of the entries schema file
cat apps/billing/src/lib/db-marketing/schemas/entries.ts

# Look for any database operations related to entries
rg -l "entries" --type ts

# Check for any existing database operations or tests using these fields
rg -A 5 "meta_h1|linking_categories|content_takeaways|content_faq" --type ts

Length of output: 6800

apps/billing/src/trigger/glossary/linking-categories.ts (1)

1-8: Review import sources and consider using more specific package names.

The import from 'ai' seems to be using a generic package name. Consider using a more specific package name or documenting the exact package being used.

✅ Verification successful

Import from 'ai' package is correctly used and well-established in the codebase

The ai package is:

  • A legitimate dependency listed in multiple package.json files
  • Consistently used across multiple files in the codebase for generateObject and generateText functions
  • Properly versioned (^3.4.7 in apps/billing)

The import is valid and follows the project's established patterns.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check the package.json for the 'ai' dependency
rg '"ai":\s+"[^"]+"|"ai":$' -g 'package.json'

Length of output: 196


Script:

#!/bin/bash
# Check if 'ai' package is a well-known package by checking its usage
rg -A 5 'from\s+["\']ai["\']' 

Length of output: 93


Script:

#!/bin/bash
# Check if 'ai' package is a well-known package by checking its usage
rg "from 'ai'" 
rg 'from "ai"'

Length of output: 871

apps/billing/src/lib/search-query.ts (1)

4-8: LGTM: Import statements are properly organized

The new imports support the enhanced functionality for database operations and error handling.

apps/billing/src/trigger/glossary/keyword-research.ts (1)

114-114: Verify consumers of this task.

The addition of entry to the return object provides valuable context. However, we should ensure this change doesn't break any existing consumers of this task.

✅ Verification successful

Let me gather more information about the task's return type and its usage.


Based on the results, I can now generate the final response:

The addition of entry to the return object is safe to proceed with.

The task is only used in _generate-glossary-entry.ts where it accesses just the keywords array length through keywordResearch.output.keywords.length. The new entry field in the return object doesn't affect any existing consumers.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for direct usage of keywordResearchTask
ast-grep --pattern 'keywordResearchTask' -A 10

# Search for destructuring of the task result
rg -A 5 'const.*=.*keywordResearchTask'

Length of output: 4150


Script:

#!/bin/bash
# Check the return type definition and usage in the keyword research task
ast-grep --pattern 'return {
  $$$
}'  apps/billing/src/trigger/glossary/keyword-research.ts

# Check how the result is used in the generate glossary entry
ast-grep --pattern 'keywordResearch.output' -A 5

Length of output: 33868

apps/www/content-collections.ts (1)

6-7: LGTM: Clean schema imports

The new schema imports are well-organized and follow the established patterns.

apps/billing/src/trigger/glossary/_generate-glossary-entry.ts (3)

6-6: LGTM: New task imports properly added

The new imports for content takeaways and FAQs tasks are correctly added and align with the workflow enhancements.

Also applies to: 9-9


97-110: LGTM: Well-implemented parallel execution with proper error handling

The parallel execution using Promise.all is implemented correctly with appropriate error handling for both tasks.

#!/bin/bash
# Verify error handling in the parallel tasks
ast-grep --pattern 'throw new AbortTaskRunError($$$)' 

74-77: Verify database schema requirements

Good addition of defensive database entry creation. However, let's verify that inputTerm is the only required field for entry creation.

✅ Verification successful

Database schema requirements are satisfied

Looking at the schema definition, inputTerm is indeed the only required field for entry creation as it's marked with .notNull(). All other fields are either optional or have default values. The current implementation correctly handles the minimal required fields for database entry creation.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for database schema definition to verify required fields
rg -A 10 "entries = " --type typescript

Length of output: 4928


Script:

#!/bin/bash
# Let's try different approaches to find the database schema
# Look for table definitions or drizzle schema
rg -l "drizzle|createTable|sqliteTable" --type typescript

# Search for any file that might contain database schema
fd "schema|db|database" -e ts

Length of output: 570


Script:

#!/bin/bash
# Search for any imports or usage of database-related types
rg "type.*Entry|interface.*Entry" --type typescript

Length of output: 88


Script:

#!/bin/bash
# Look for any SQL migrations or schema files
fd "migrations|schema" --extension sql

Length of output: 40


Script:

#!/bin/bash
# Let's check the content of the db.ts file in the billing app since that's where our code is
cat apps/billing/src/lib/db.ts

# Also check for any table definitions or schema in the file containing our code
cat apps/billing/src/trigger/glossary/_generate-glossary-entry.ts

Length of output: 6715


Script:

#!/bin/bash
# Let's check the marketing db schema since we see it's imported in the file
cat apps/billing/src/lib/db-marketing/schemas.ts
cat apps/billing/src/lib/db-marketing/schemas/entries.ts

Length of output: 2313

apps/www/content/glossary/mime-types.mdx (1)

6-6: Consider populating the categories array.

Empty categories might affect content discoverability. Consider adding relevant categories like "API", "Web Development", or "Data Formats".

apps/billing/src/trigger/glossary/draft-sections.ts (2)

91-98: Verify impact of ordering change and consider expanding returned columns.

The changes include:

  1. Switching from descending to ascending order might affect consumers expecting the latest entry
  2. The column selection could be expanded to include metadata needed by consumers

Let's check for consumers that might be affected:

Consider maintaining the descending order and expanding columns:

     return await db.query.entries.findFirst({
       columns: {
         id: true,
         inputTerm: true,
         dynamicSectionsContent: true,
+        createdAt: true,
+        updatedAt: true,
       },
       where: eq(entries.id, entry.id),
-      orderBy: (entries, { asc }) => [asc(entries.createdAt)],
+      orderBy: (entries, { desc }) => [desc(entries.createdAt)],
     });

84-89: ⚠️ Potential issue

Add error handling and ensure unique constraints.

The update operation lacks error handling and could affect multiple entries if duplicates exist.

Let's verify the uniqueness constraints and potential duplicates:

Consider adding error handling. Note that Drizzle's MySQL dialect has no .returning() on updates, so check affectedRows on the driver result instead (assuming the mysql2 driver):

-    await db
+    const [result] = await db
       .update(entries)
       .set({
         dynamicSectionsContent: finalContent,
       })
       .where(eq(entries.inputTerm, entry.inputTerm));
+    if (result.affectedRows === 0) {
+      throw new Error(`Failed to update entry for term: ${entry.inputTerm}`);
+    }
apps/www/content/glossary/statelessness.mdx (3)

64-71: LGTM! Well-structured comparison table.

The table provides a clear and comprehensive comparison between stateless and stateful APIs across key features.


74-77: LGTM! Excellent examples with clear distinctions.

The examples effectively illustrate the differences between stateless and stateful implementations in real-world scenarios.

Also applies to: 80-82


1-96: Overall excellent content structure and quality.

The glossary entry provides a comprehensive and well-organized explanation of statelessness in APIs. The content flows logically from basic concepts to practical examples, includes clear comparisons, and addresses common misconceptions.

🧰 Tools
🪛 LanguageTool

[grammar] ~56-~56: The expression ‘how to web’ requires the preposition ‘to’.
Context: ... development, significantly influencing how web services and client applications intera...

(HOW_TO_VERB)

apps/www/app/glossary/[slug]/page.tsx (1)

187-187: LGTM! Verify schema validation for term.faq

The change from hardcoded FAQs to dynamic term.faq improves reusability and maintainability.

Let's verify the schema validation:

✅ Verification successful

Schema validation for term.faq is properly implemented

The codebase has robust schema validation for FAQs:

  1. The FAQ schema is defined in apps/www/lib/schemas/faq-schema.ts:
export const faqSchema = z.array(
  z.object({
    question: z.string(),
    answer: z.string(),
  })
);
  1. This schema is properly integrated in the content collection validation in apps/www/content-collections.ts:
{
  faq: faqSchema,
  // ...other fields
}

The schema ensures that:

  • FAQs are always an array of question-answer pairs
  • Both question and answer fields are required strings
  • Invalid FAQ data will fail validation before reaching the component

The implementation is type-safe and properly validates the term.faq data before it reaches the component.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check FAQ schema implementation and usage
# Expected: Find FAQ schema definition and its usage in content validation

# Look for FAQ schema definition
echo "Searching for FAQ schema definition..."
rg -A 5 "faqSchema|FAQ.*schema|schema.*FAQ" --type typescript

# Look for FAQ validation in content processing
echo "Searching for FAQ validation in content processing..."
rg -A 5 "validateFAQ|validate.*FAQ|FAQ.*validate" --type typescript

Length of output: 407


Script:

#!/bin/bash
# Let's try different search patterns and without typescript file type restriction

# Look for FAQ schema definition
echo "Searching for FAQ schema definition..."
rg -A 5 "faq.*schema|schema.*faq" -i

# Look for FAQ type definitions
echo "Searching for FAQ type definitions..."
rg -A 5 "type.*FAQ|interface.*FAQ" -i

# Look for FAQ validation
echo "Searching for FAQ validation..."
rg -A 5 "validate.*faq|faq.*validate" -i

# Look for zod schema definitions related to FAQ
echo "Searching for zod schema definitions..."
rg -A 5 "z\.array|z\.object.*faq" -i

Length of output: 31306

apps/billing/src/trigger/glossary/generate-faqs.ts (2)

43-45: Confirm appropriate use of AbortTaskRunError for missing data

When no 'People Also Ask' data is found, the code throws an AbortTaskRunError. Ensure that this error type is suitable for this scenario and that it's handled correctly by the task runner. If this condition is expected and not exceptional, consider handling it without throwing an error, perhaps by returning a default value or logging a warning.
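A minimal sketch of that non-throwing alternative (the paaData parameter shape is an assumption, not the actual type in generate-faqs.ts):

```typescript
// Hypothetical guard: treat missing "People Also Ask" data as an expected,
// non-fatal condition and hand back an empty list instead of aborting.
function paaDataOrEmpty(term: string, paaData: unknown[] | null | undefined): unknown[] {
  if (!paaData || paaData.length === 0) {
    console.warn(`No "People Also Ask" data for "${term}"; skipping FAQ generation`);
    return [];
  }
  return paaData;
}
```

Downstream, an empty array would simply produce an entry without a FAQ section, which the conditional rendering added to page.tsx in this PR already supports.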


1-95: Well-structured implementation with effective use of TypeScript and ORM

Overall, the generateFaqsTask is well-implemented with clear logic, proper use of async/await, and effective interaction with the database and OpenAI API. The use of schemas and types enhances code reliability and maintainability.

apps/billing/src/trigger/glossary/create-pr.ts (1)

28-28: Verify the Change in 'orderBy' Clause to Ascending Order

The orderBy clauses have been modified to sort entries in ascending order of createdAt:

orderBy: (entries, { asc }) => [asc(entries.createdAt)],

Please verify that this change is intentional. Changing the sort order affects which entries are returned by findFirst. If the goal is to retrieve the oldest entry, ascending order is appropriate. However, if you intend to get the most recent entry, consider using descending order.

Also applies to: 41-41, 185-185

apps/billing/src/trigger/glossary/seo-meta-tags.ts (1)

36-41: Verify caching strategy logic for existing entries

The condition checking for existing entries now includes existing?.metaH1. Ensure that this change aligns with the intended caching strategy. If partial meta tags are acceptable under certain conditions, you may need to adjust the logic accordingly.

Run the following script to verify the caching behavior:

This script checks the database for entries that would be affected by the new caching condition.

✅ Verification successful

Based on the codebase analysis, I can now generate the final response:

Adding metaH1 check aligns with existing caching strategy

The addition of existing?.metaH1 check in the caching condition is consistent with the codebase's caching strategy. The code shows that:

  • The metaH1 field is a required part of the SEO metadata, stored in the database schema
  • Other tasks in the codebase follow the same pattern of requiring all metadata fields to be present before using cached entries
  • The caching strategy is controlled by the onCacheHit parameter which defaults to "stale", meaning it will use cached data if all required fields are present
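The verified condition can be summarized as a small predicate (field names mirror the entries schema; the CacheStrategy union here is an assumption based on the "stale" default mentioned above):

```typescript
type CacheStrategy = "stale" | "miss"; // assumed union; "stale" is the PR's default

interface ExistingSeoFields {
  metaTitle?: string | null;
  metaDescription?: string | null;
  metaH1?: string | null;
}

// An entry only counts as a cache hit when every SEO field, including the
// newly added metaH1, is present; partial meta tags are regenerated.
function isSeoCacheHit(
  existing: ExistingSeoFields | undefined,
  onCacheHit: CacheStrategy = "stale",
): boolean {
  return (
    onCacheHit === "stale" &&
    Boolean(existing?.metaTitle && existing?.metaDescription && existing?.metaH1)
  );
}
```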
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if entries with missing metaH1 are correctly bypassed in cache hits.

# Test: Find entries with metaTitle and metaDescription but missing metaH1.
sqlite3 your_database.db "SELECT inputTerm FROM entries WHERE metaTitle IS NOT NULL AND metaDescription IS NOT NULL AND metaH1 IS NULL;"

Length of output: 213


Script:

#!/bin/bash
# Let's analyze the caching strategy implementation and related code

# Find all references to metaH1 in the codebase
rg "metaH1" -A 5 -B 5

# Look for cache-related configurations or types
rg "onCacheHit|CacheStrategy" -A 3 -B 3

# Search for the interface or type definition of the metadata structure
ast-grep --pattern 'interface $_ {
  $$$
  metaH1?: $_
  $$$
}'

# Look for similar caching conditions
rg "existing\?\." -A 3 -B 3

Length of output: 49781

Comment on lines +8 to +34
export const takeawaysSchema = z.object({
tldr: z.string(),
definitionAndStructure: z.array(
z.object({
key: z.string(),
value: z.string(),
}),
),
historicalContext: z.array(
z.object({
key: z.string(),
value: z.string(),
}),
),
usageInAPIs: z.object({
tags: z.array(z.string()),
description: z.string(),
}),
bestPractices: z.array(z.string()),
recommendedReading: z.array(
z.object({
title: z.string(),
url: z.string(),
}),
),
didYouKnow: z.string(),
});

🛠️ Refactor suggestion

Enhance schema validation for better data integrity.

The schema could benefit from additional validation rules to ensure data quality and prevent potential issues:

Consider applying these improvements:

 export const takeawaysSchema = z.object({
-  tldr: z.string(),
+  tldr: z.string().min(1).max(500),
   definitionAndStructure: z.array(
     z.object({
-      key: z.string(),
-      value: z.string(),
+      key: z.string().min(1).max(100),
+      value: z.string().min(1).max(1000),
     }),
-  ),
+  ).min(1).max(10),
   // ... similar changes for historicalContext
   usageInAPIs: z.object({
-    tags: z.array(z.string()),
+    tags: z.array(z.string().min(1).max(50)).min(1).max(10),
-    description: z.string(),
+    description: z.string().min(1).max(1000),
   }),
-  bestPractices: z.array(z.string()),
+  bestPractices: z.array(z.string().min(1).max(500)).min(1).max(10),
   recommendedReading: z.array(
     z.object({
-      title: z.string(),
-      url: z.string(),
+      title: z.string().min(1).max(200),
+      url: z.string().url(),
     }),
-  ),
+  ).max(5),
-  didYouKnow: z.string(),
+  didYouKnow: z.string().min(1).max(1000).optional(),
 });

These changes:

  1. Add length constraints to prevent overly long content
  2. Add size limits to arrays to prevent excessive entries
  3. Add URL validation for recommended reading links
  4. Make didYouKnow optional as it might not always be necessary
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export const takeawaysSchema = z.object({
tldr: z.string(),
definitionAndStructure: z.array(
z.object({
key: z.string(),
value: z.string(),
}),
),
historicalContext: z.array(
z.object({
key: z.string(),
value: z.string(),
}),
),
usageInAPIs: z.object({
tags: z.array(z.string()),
description: z.string(),
}),
bestPractices: z.array(z.string()),
recommendedReading: z.array(
z.object({
title: z.string(),
url: z.string(),
}),
),
didYouKnow: z.string(),
});
export const takeawaysSchema = z.object({
tldr: z.string().min(1).max(500),
definitionAndStructure: z.array(
z.object({
key: z.string().min(1).max(100),
value: z.string().min(1).max(1000),
}),
).min(1).max(10),
historicalContext: z.array(
z.object({
key: z.string().min(1).max(100),
value: z.string().min(1).max(1000),
}),
).min(1).max(10),
usageInAPIs: z.object({
tags: z.array(z.string().min(1).max(50)).min(1).max(10),
description: z.string().min(1).max(1000),
}),
bestPractices: z.array(z.string().min(1).max(500)).min(1).max(10),
recommendedReading: z.array(
z.object({
title: z.string().min(1).max(200),
url: z.string().url(),
}),
).max(5),
didYouKnow: z.string().min(1).max(1000).optional(),
});

Comment on lines +10 to +14
export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
// NB: I initally tried to import allGlossaries from content-collections but it wasn't available
const glossaryDir = path.join(process.cwd(), "content", "glossary");
const files = await fs.readdir(glossaryDir);


⚠️ Potential issue

Add error handling for file system operations.

The file system operations could fail for various reasons (permissions, missing directory, etc.). Implement proper error handling to ensure graceful failure.

 export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
   const glossaryDir = path.join(process.cwd(), "content", "glossary");
-  const files = await fs.readdir(glossaryDir);
+  try {
+    const files = await fs.readdir(glossaryDir);
+    // ... rest of the function
+  } catch (error) {
+    console.error('Failed to generate sitemap:', error);
+    return []; // Return empty sitemap instead of throwing
+  }

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +18 to +21
.map((file) => ({
url: `${env().NEXT_PUBLIC_BASE_URL}/glossary/${path.basename(file, ".mdx")}`,
lastModified: new Date(), // TODO: Get the actual last modified date of the file -- if content-collections doesn't work from marketing-db?
}));

⚠️ Potential issue

Improve URL construction and implement proper lastModified date.

Several improvements needed:

  1. Validate NEXT_PUBLIC_BASE_URL
  2. Implement proper lastModified date using file stats
  3. Sanitize file names to prevent path traversal
+    const stats = await fs.stat(path.join(glossaryDir, file));
+    const fileName = path.basename(file, '.mdx');
+    // Ensure fileName only contains valid characters
+    if (!/^[a-zA-Z0-9-]+$/.test(fileName)) {
+      console.warn(`Skipping invalid filename: ${fileName}`);
+      return null;
+    }
     return {
-      url: `${env().NEXT_PUBLIC_BASE_URL}/glossary/${path.basename(file, ".mdx")}`,
-      lastModified: new Date(),
+      url: new URL(`/glossary/${fileName}`, env().NEXT_PUBLIC_BASE_URL).toString(),
+      lastModified: stats.mtime,
     };

Also, add input validation at the start of the function:

 export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
+  const baseUrl = env().NEXT_PUBLIC_BASE_URL;
+  if (!baseUrl || !baseUrl.startsWith('http')) {
+    throw new Error('Invalid NEXT_PUBLIC_BASE_URL configuration');
+  }

Committable suggestion skipped: line range outside the PR's diff.
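The filename guard from the suggestion can be tested on its own. A small sketch — `isSafeSlug` is a hypothetical helper name; the regex is the one proposed above:

```typescript
// Accept only letters, digits, and hyphens, so that a directory listing entry
// can never smuggle path separators or query strings into the sitemap URL.
function isSafeSlug(fileName: string): boolean {
  return /^[a-zA-Z0-9-]+$/.test(fileName);
}
```

Entries that fail the check would be logged and skipped, as in the diff above.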

Comment on lines +3 to +8
/**
* @description Schema for glossary entry takeaways
* @warning This is a duplicate of apps/billing/src/lib/db-marketing/schemas/takeaways-schema.ts
* @todo Extract this schema into a shared package to ensure consistency with the billing app
* @see apps/billing/src/lib/db-marketing/schemas/takeaways-schema.ts for the source of truth
*/

💡 Codebase verification

Schema duplication confirmed - Immediate action required for multiple schemas

The verification confirms identical schema definitions between apps with a clear pattern of duplication:

  • Both takeaways-schema.ts files are exact matches in structure and types
  • Similar duplication exists for FAQ schemas
  • A packages/ directory already exists for shared code

Action items:

  1. Create a new shared package packages/schemas
  2. Move both takeawaysSchema and faqSchema to this package
  3. Update imports in both apps to reference the shared package
  4. Remove duplicate files from individual apps
🔗 Analysis chain

Code duplication between apps needs to be addressed

The documentation clearly identifies this schema as a duplicate of apps/billing/src/lib/db-marketing/schemas/takeaways-schema.ts. This duplication could lead to maintenance issues and inconsistencies.

Let's verify the duplication and potential usage:

Consider these steps to resolve the duplication:

  1. Create a new shared package (e.g., packages/schemas)
  2. Move the schema to this shared package
  3. Update both apps to import from the shared package
  4. Add proper versioning to manage future schema changes
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify schema duplication and identify potential shared package location

# Check if the original schema exists and compare content
echo "Checking original schema file..."
rg -l "takeawaysSchema" "apps/billing/src/lib/db-marketing/schemas/"

# Look for potential shared package locations
echo "Checking for existing shared packages..."
fd -t d "shared|common|packages" -d 2

# Check for other potential schema duplications
echo "Checking for other schema duplications..."
rg -l "duplicate.*schema" "apps/"

Length of output: 547


Script:

#!/bin/bash
# Let's verify the content of both schema files and check for other potential duplicates

# Compare the content of both schema files
echo "=== Content of www schema ==="
cat "apps/www/lib/schemas/takeaways-schema.ts"

echo -e "\n=== Content of billing schema ==="
cat "apps/billing/src/lib/db-marketing/schemas/takeaways-schema.ts"

# Check if there are any existing shared schemas in packages directory
echo -e "\n=== Checking for existing shared schemas ==="
fd "schema" packages/

# Check the content of FAQ schema as it also mentions duplication
echo -e "\n=== Content of FAQ schema ==="
cat "apps/www/lib/schemas/faq-schema.ts"

Length of output: 2907

Comment on lines +47 to 50
entry: one(entries, {
fields: [searchQueries.inputTerm],
references: [entries.inputTerm],
}),

💡 Codebase verification

One-to-one relationship not properly enforced in entries table

The entries table only has a non-unique index input_term_idx on the inputTerm column and lacks a unique constraint. This means multiple entries could share the same inputTerm, making the one-to-one relationship with the searchQueries table inconsistent, since searchQueries has a unique constraint on this field.

  • Add a unique constraint on inputTerm column in the entries table to properly enforce the one-to-one relationship
  • Consider renaming the existing index to avoid confusion, as it will be redundant with the unique constraint
🔗 Analysis chain

Verify one-to-one cardinality with entries table

Please confirm that the one-to-one relationship between searchQueries and entries is correct. The unique constraint on inputTerm in the searchQueries table supports this, but we should verify the same constraint exists in the entries table.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify unique constraint on inputTerm in entries table
# Expected: Find unique constraint definition

# Search for entries table definition with unique constraint
ast-grep --pattern 'export const entries = mysqlTable(
  $$$
  {
    $$$
    uniqueInputTerm: unique($$$).on($$$inputTerm$$$),
    $$$
  }
)'

Length of output: 301


Script:

#!/bin/bash
# Search for entries table definition
rg "export const entries = mysqlTable" -A 20

Length of output: 2037


Script:

#!/bin/bash
# Let's check if there are any unique constraints defined in the entries table
rg "unique.*inputTerm|unique.*input_term" -A 5 apps/billing/src/lib/db-marketing/schemas/entries.ts

# Also check if there are any indexes defined separately
rg "index.*inputTerm|index.*input_term" -A 5 apps/billing/src/lib/db-marketing/schemas/entries.ts

Length of output: 382
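Until a unique constraint on inputTerm lands in the entries table, duplicates can at least be detected in application code (the PR also fixes an issue that led to duplicate entries). A minimal sketch — `findDuplicateTerms` is a hypothetical helper, not part of the PR:

```typescript
// Count occurrences of each inputTerm and report the terms seen more than
// once; Map preserves insertion order, so output order matches first sighting.
function findDuplicateTerms(rows: { inputTerm: string }[]): string[] {
  const counts = new Map<string, number>();
  for (const row of rows) {
    counts.set(row.inputTerm, (counts.get(row.inputTerm) ?? 0) + 1);
  }
  return [...counts.entries()]
    .filter(([, count]) => count > 1)
    .map(([term]) => term);
}
```

A check like this could run before inserts as a stopgap, but it does not replace the unique constraint, which is the only race-free enforcement.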

Comment on lines +50 to +81
const faqs = await generateObject({
model: openai("gpt-4"),
system: `You are an API documentation expert. Your task is to provide clear, accurate, and comprehensive answers to frequently asked questions about API-related concepts.

Guidelines for answers:
1. Be technically accurate and precise
2. Use clear, concise language
3. Include relevant examples where appropriate
4. Focus on practical implementation details
5. Keep answers focused and relevant to API development
6. Maintain a professional, technical tone
7. Ensure answers are complete but not overly verbose`,
prompt: `
Term: "${term}"

Generate comprehensive answers for these questions from "People Also Ask":
${peopleAlsoAsk
.map(
(q) => `
Question: ${q.question}
Current snippet: ${q.snippet}
Source: ${q.link}
`,
)
.join("\n\n")}

Provide clear, accurate answers that improve upon the existing snippets while maintaining technical accuracy.
`,
schema: z.object({ faq: faqSchema }),
temperature: 0.2,
});


🛠️ Refactor suggestion

⚠️ Potential issue

Mitigate potential risks by sanitizing user inputs in OpenAI prompts

The term and data from peopleAlsoAsk (questions, snippets, links) are directly interpolated into the OpenAI prompt without sanitization. This could lead to unintended behavior or security risks if the inputs contain malicious content. Consider sanitizing or validating these inputs before including them in the prompt.

Apply this diff to sanitize user inputs:

+function sanitizeInput(input: string): string {
+  // Implement sanitization logic, e.g., removing special characters or HTML tags
+  return input.replace(/[\r\n]+/g, ' ').trim();
+}

 const faqs = await generateObject({
   model: openai("gpt-4"),
   system: `You are an API documentation expert. Your task is to provide clear, accurate, and comprehensive answers to frequently asked questions about API-related concepts.`,
   prompt: `
-    Term: "${term}"
+    Term: "${sanitizeInput(term)}"
     
     Generate comprehensive answers for these questions from "People Also Ask":
     ${peopleAlsoAsk
       .map(
         (q) => `
-    Question: ${q.question}
-    Current snippet: ${q.snippet}
-    Source: ${q.link}
+    Question: ${sanitizeInput(q.question)}
+    Current snippet: ${sanitizeInput(q.snippet)}
+    Source: ${sanitizeInput(q.link)}
     `,
       )
       .join("\n\n")}
 
     Provide clear, accurate answers that improve upon the existing snippets while maintaining technical accuracy.
   `,
   schema: z.object({ faq: faqSchema }),
   temperature: 0.2,
 });

Committable suggestion skipped: line range outside the PR's diff.
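The sanitizer proposed above is simple enough to verify standalone. This sketch mirrors the suggested implementation; anything stricter (stripping HTML, length caps) would be an extension beyond what the diff shows:

```typescript
// Collapse newlines (which could be used to inject extra "instructions" into
// the prompt) into single spaces and trim surrounding whitespace.
function sanitizeInput(input: string): string {
  return input.replace(/[\r\n]+/g, " ").trim();
}
```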

Comment on lines +36 to +42
const scrapedContent = await db.query.firecrawlResponses.findMany({
where: eq(firecrawlResponses.inputTerm, term),
columns: {
markdown: true,
summary: true,
},
});

⚠️ Potential issue

Handle empty scraped content case

If scrapedContent is empty, the prompt sent to the OpenAI model may lack necessary context, which could lead to suboptimal results or errors. Consider adding a check to handle this scenario appropriately.

Apply this diff to handle the empty case:

 const scrapedContent = await db.query.firecrawlResponses.findMany({
   where: eq(firecrawlResponses.inputTerm, term),
   columns: {
     markdown: true,
     summary: true,
   },
 });

+ if (scrapedContent.length === 0) {
+   // Handle the case where there is no scraped content
+   throw new Error(`No scraped content found for term: ${term}`);
+ }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const scrapedContent = await db.query.firecrawlResponses.findMany({
where: eq(firecrawlResponses.inputTerm, term),
columns: {
markdown: true,
summary: true,
},
});
const scrapedContent = await db.query.firecrawlResponses.findMany({
where: eq(firecrawlResponses.inputTerm, term),
columns: {
markdown: true,
summary: true,
},
});
if (scrapedContent.length === 0) {
// Handle the case where there is no scraped content
throw new Error(`No scraped content found for term: ${term}`);
}

Comment on lines +97 to +103

await db
.update(entries)
.set({
takeaways: takeaways.object,
})
.where(eq(entries.inputTerm, term));

⚠️ Potential issue

Add error handling for the OpenAI API call

The call to generateObject may fail due to network issues or API errors. Wrap it in error handling so failures are caught and handled gracefully instead of crashing the workflow.

Apply this diff to add a try-catch block:

+    try {
        const takeaways = await generateObject({
          model: openai("gpt-4"),
          system: `
            ...
          `,
          prompt: `
            ...
          `,
          schema: takeawaysSchema,
          temperature: 0.2,
        });

        await db
          .update(entries)
          .set({
            takeaways: takeaways.object,
          })
          .where(eq(entries.inputTerm, term));
+    } catch (error) {
+      // Handle the error appropriately
+      console.error('Error generating takeaways:', error);
+      throw new Error('Failed to generate takeaways');
+    }

Committable suggestion skipped: line range outside the PR's diff.
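The same try-catch advice applies to every generateObject call in the workflow, and for transient network errors a retry often beats failing outright. A sketch of that generalization — `withRetry` is a hypothetical helper, not part of the PR:

```typescript
// Retry an async operation up to `attempts` times, rethrowing the last error
// only after all attempts are exhausted.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}
```

Wrapping the generateObject call in `withRetry` would absorb transient failures; persistent errors still surface to the caller, where the logging shown in the diff above applies.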

Comment on lines +152 to +198
// Step 4: Validate and optimize lengths
const validatedMetaTags = await generateObject({
model: openai("gpt-4"),
system: `
You are an expert SEO consultant with 10 years of experience optimizing content for search engines.
Your task is to validate and optimize meta tags to ensure they meet strict character limits while
maintaining their SEO value and readability.

Key requirements:
1. Title: Max 60 chars (aim for 50-55)
2. Description: Max 160 chars (aim for 145-155)
3. H1: Max 80 chars (aim for 45-50)

Best practices:
- Front-load important keywords
- Maintain readability and natural language
- Preserve core message and intent
- Keep primary keyword visible in truncated versions
- Use punctuation strategically to create natural breaks

If tags exceed limits:
1. Remove unnecessary words while preserving meaning
2. Replace longer words with shorter synonyms
3. Restructure for conciseness
4. Ensure truncation occurs at natural breaks
`,
prompt: `
Original tags:
Title: ${craftedMetaTags.object.title}
Description: ${craftedMetaTags.object.description}
H1: ${craftedMetaTags.object.h1}

Optimize these tags to meet character limits while maintaining SEO value.
If they already meet the limits, return them unchanged.
`,
schema: z.object({
title: z.string().max(60),
description: z.string().max(160),
h1: z.string().max(80),
reasoning: z.object({
titleChanges: z.string(),
descriptionChanges: z.string(),
h1Changes: z.string(),
}),
}),
temperature: 0.1, // Low temperature for consistent, focused outputs
});

⚠️ Potential issue

Add error handling for OpenAI API calls

The generateObject calls lack error handling. If the OpenAI API request fails or times out, it could cause unhandled exceptions. To ensure robustness, wrap these calls in try-catch blocks to handle potential errors gracefully.

Apply this diff to add error handling:

 try {
   const validatedMetaTags = await generateObject({
     model: openai("gpt-4"),
     system: `
       You are an expert SEO consultant with 10 years of experience optimizing content for search engines.
       ...
     `,
     prompt: `
       Original tags:
       Title: ${craftedMetaTags.object.title}
       Description: ${craftedMetaTags.object.description}
       H1: ${craftedMetaTags.object.h1}

       Optimize these tags to meet character limits while maintaining SEO value.
       If they already meet the limits, return them unchanged.
     `,
     schema: z.object({
       title: z.string().max(60),
       description: z.string().max(160),
       h1: z.string().max(80),
       reasoning: z.object({
         titleChanges: z.string(),
         descriptionChanges: z.string(),
         h1Changes: z.string(),
       }),
     }),
     temperature: 0.1,
   });
 } catch (error) {
   // Handle the error, possibly by logging and returning a fallback value
   console.error('Failed to validate meta tags:', error);
   throw new Error('Meta tag validation failed');
 }

Committable suggestion skipped: line range outside the PR's diff.
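The character limits the prompt asks the model to respect (title ≤ 60, description ≤ 160, h1 ≤ 80) are already enforced at parse time by the zod schema's `.max()` calls. A cheap standalone post-check can nonetheless tell the caller which fields to retry or truncate — `exceededLimits` and `MetaTags` are hypothetical names for this sketch:

```typescript
type MetaTags = { title: string; description: string; h1: string };

// Return the names of fields that still exceed their limit, mirroring the
// constraints from the zod schema above (title 60, description 160, h1 80).
function exceededLimits(tags: MetaTags): string[] {
  const limits: [keyof MetaTags, number][] = [
    ["title", 60],
    ["description", 160],
    ["h1", 80],
  ];
  return limits
    .filter(([field, max]) => tags[field].length > max)
    .map(([field]) => field);
}
```

An empty result means all tags are within limits; a non-empty result lists the offending fields in schema order.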

Comment on lines +200 to +209
// Update database with validated meta tags
await db
.update(entries)
.set({
metaTitle: craftedMetaTags.object.title,
metaDescription: craftedMetaTags.object.description,
metaTitle: validatedMetaTags.object.title,
metaDescription: validatedMetaTags.object.description,
metaH1: validatedMetaTags.object.h1,
})
.where(eq(entries.inputTerm, term));


⚠️ Potential issue

Ensure error handling for database update operation

The database update operation does not currently handle potential errors. If the update fails, the failure may go unnoticed. Wrap the update in a try-catch block to handle any exceptions and maintain data integrity.

Apply this diff to add error handling:

+try {
  await db
    .update(entries)
    .set({
      metaTitle: validatedMetaTags.object.title,
      metaDescription: validatedMetaTags.object.description,
      metaH1: validatedMetaTags.object.h1,
    })
    .where(eq(entries.inputTerm, term));
+} catch (error) {
+  // Handle the error, possibly by logging and taking corrective action
+  console.error('Database update failed:', error);
+  throw new Error('Failed to update meta tags in database');
+}
📝 Committable suggestion


Suggested change
// Update database with validated meta tags
await db
.update(entries)
.set({
metaTitle: craftedMetaTags.object.title,
metaDescription: craftedMetaTags.object.description,
metaTitle: validatedMetaTags.object.title,
metaDescription: validatedMetaTags.object.description,
metaH1: validatedMetaTags.object.h1,
})
.where(eq(entries.inputTerm, term));
// Update database with validated meta tags
try {
await db
.update(entries)
.set({
metaTitle: validatedMetaTags.object.title,
metaDescription: validatedMetaTags.object.description,
metaH1: validatedMetaTags.object.h1,
})
.where(eq(entries.inputTerm, term));
} catch (error) {
// Handle the error, possibly by logging and taking corrective action
console.error('Database update failed:', error);
throw new Error('Failed to update meta tags in database');
}
