
feat(billing): add glossary entry generation workflow#2642

Merged
chronark merged 12 commits into unkeyed:main from p6l-richard:richard/add-outline-generation
Nov 11, 2024

Conversation

@p6l-richard
Contributor

@p6l-richard p6l-richard commented Nov 4, 2024

  • Adds the first version of our generate-glossary-entry workflow
  • The workflow runs 7 sub-tasks sequentially, including: keyword_research, generate_outline, draft_sections, and seo_meta_tags
  • These workflows require accounts to be set up with the following API providers:
OPENAI_API_KEY=
SERPER_API_KEY=
FIRECRAWL_API_KEY=
GITHUB_PERSONAL_ACCESS_TOKEN=
  • They persist data in the marketing PlanetScale instance
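The sequential orchestration described above can be sketched as a plain async pipeline, where each sub-task consumes the previous step's output. This is an illustrative stand-in, not the PR's actual Trigger.dev code; `runGlossaryWorkflow` and the stubbed `run` bodies are hypothetical names:

```typescript
// Hypothetical sketch of the sequential sub-task orchestration.
type TaskResult = { step: string; output: string };

const subTasks: Array<{ name: string; run: (input: string) => Promise<string> }> = [
  { name: "keyword_research", run: async (term) => `keywords for ${term}` },
  { name: "generate_outline", run: async (prev) => `outline from ${prev}` },
  { name: "draft_sections", run: async (prev) => `draft from ${prev}` },
  { name: "seo_meta_tags", run: async (prev) => `meta tags from ${prev}` },
];

async function runGlossaryWorkflow(term: string): Promise<TaskResult[]> {
  const results: TaskResult[] = [];
  let input = term;
  for (const task of subTasks) {
    // Each sub-task depends on the previous one's output, so order matters.
    const output = await task.run(input);
    results.push({ step: task.name, output });
    input = output;
  }
  return results;
}

runGlossaryWorkflow("MIME types").then((r) =>
  console.log(r.map((x) => x.step).join(" -> ")),
);
// prints: keyword_research -> generate_outline -> draft_sections -> seo_meta_tags
```

In the real workflow each step would be a Trigger.dev task invocation rather than a local function, but the data flow is the same.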

What does this PR do?

Fixes # (issue)

If there is not an issue for this, please create one first. This is used for tracking purposes and also helps us understand why this PR exists.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Chore (refactoring code, technical debt, workflow improvements)
  • Enhancement (small improvements)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How should this be tested?

  • Test A
  • Test B

Checklist

Required

  • Filled out the "How to test" section in this PR
  • Read Contributing Guide
  • Self-reviewed my own code
  • Commented on my code in hard-to-understand areas
  • Ran pnpm build
  • Ran pnpm fmt
  • Checked for warnings, there are none
  • Removed all console.logs
  • Merged the latest changes from main onto my branch with git pull origin main
  • My changes don't cause any responsiveness issues

Appreciated

  • If a UI change was made: Added a screen recording or screenshots to this PR
  • Updated the Unkey Docs if changes were necessary

Summary by CodeRabbit

  • New Features

    • Introduced a comprehensive workflow for generating glossary entries, including keyword research, outline generation, drafting sections, creating SEO meta tags, and generating pull requests.
    • Added guides for prompt engineering and database model generation.
    • Implemented a keyword research system for extracting SEO-relevant keywords.
  • Documentation

    • Added detailed instructions for using the Drizzle ORM and managing tasks with the Trigger.dev SDK.
  • Bug Fixes

    • Enhanced error handling and logging for various tasks to improve reliability and user experience.

Implement generate-outline task for API glossary content

- Created generateOutlineTask with the following steps:
  1. Fetch top-ranking pages' markdown content
  2. Summarize content using AI
  3. Generate initial outline
  4. Perform technical review
  5. Conduct SEO review
  6. Optimize outline for SEO
  7. Perform editorial review

- Implemented helper functions:
  - generateInitialOutline: Creates a flat, comprehensive outline
  - performTechnicalReview: Evaluates coverage, comprehensiveness, and conciseness
  - performSEOReview: Analyzes keyword usage and SEO optimization
  - reviseSEOOutline: Refines outline based on SEO recommendations
  - performEditorialReview: Ensures readability, flow, and engagement

- Utilized OpenAI's GPT-4 for AI-assisted content generation and review
- Integrated with database queries for keyword and content retrieval
- Implemented robust error handling and logging throughout the process
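The steps above amount to a linear generate-then-review pipeline, with retries around each non-deterministic LLM-backed step. A minimal sketch, where only the function names mirror the commit message and the bodies are stand-ins:

```typescript
// Illustrative sketch of the outline pipeline; helper bodies are stand-ins.
type Outline = { sections: string[]; notes: string[] };

// LLM-backed steps are non-deterministic, so each step is retried.
async function withRetries<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

async function generateInitialOutline(term: string): Promise<Outline> {
  return { sections: [`Intro to ${term}`, "Details", "FAQ"], notes: [] };
}

async function performTechnicalReview(o: Outline): Promise<Outline> {
  return { ...o, notes: [...o.notes, "technical review done"] };
}

async function performSEOReview(o: Outline): Promise<Outline> {
  return { ...o, notes: [...o.notes, "seo review done"] };
}

async function performEditorialReview(o: Outline): Promise<Outline> {
  return { ...o, notes: [...o.notes, "editorial review done"] };
}

async function generateOutline(term: string): Promise<Outline> {
  let outline = await withRetries(() => generateInitialOutline(term));
  outline = await withRetries(() => performTechnicalReview(outline));
  outline = await withRetries(() => performSEOReview(outline));
  return withRetries(() => performEditorialReview(outline));
}
```

Each review stage takes the current outline and returns a revised one, which keeps the task boundaries clean and makes individual stages retryable in isolation.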
0. Added generate entry task

1. Fixed the initial error with search query timestamps by omitting them from the schema
2. Improved the firecrawl code to be more robust and debuggable
3. Added better error handling and logging throughout the keyword research task
4. Fixed the entry creation issue by ensuring entries are created alongside search queries
@changeset-bot

changeset-bot bot commented Nov 4, 2024

⚠️ No Changeset found

Latest commit: cb800ad

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR
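For reference, a changeset is a small markdown file under `.changeset/` whose frontmatter maps package names to semver bump types. A hypothetical example, if a version bump were intended here (the package name is an assumption):

```md
---
"@unkey/billing": patch
---

Add the glossary entry generation workflow (keyword research, outline generation, section drafting, SEO meta tags).
```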

@vercel

vercel bot commented Nov 4, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| engineering | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Nov 11, 2024 0:26am |

@coderabbitai
Contributor

coderabbitai bot commented Nov 4, 2024

📝 Walkthrough

This pull request introduces a series of changes across multiple files in the billing application, primarily focused on enhancing database schema definitions, adding new functionalities related to glossary entry generation, and improving task management through the Trigger.dev SDK. Key additions include new files for instructions on using Drizzle ORM and prompt engineering, as well as the introduction of several tasks for generating glossary entries, SEO meta tags, and evaluations. Additionally, updates to the package.json file include new scripts and dependencies to support these functionalities.

Changes

| File Path | Change Summary |
| --- | --- |
| apps/billing/cursor/drizzle_instructions.txt | New file providing instructions for generating MySQL models with Drizzle ORM, including schema definitions and relationships. |
| apps/billing/cursor/prompt-engineering.md | New file detailing strategies for effective prompt engineering for language models. |
| apps/billing/package.json | Updated db:studio script with --verbose, added new scripts: db:generate, db:migrate, db:pull, added new dependency @octokit/rest. |
| apps/billing/src/lib/db-marketing/data-instructions.txt | New content overviewing data structure in db-marketing, detailing relationships and data management. |
| apps/billing/src/lib/db-marketing/schemas/entries.ts | New schema for entries table with relationships and Zod schemas for validation. |
| apps/billing/src/lib/db-marketing/schemas/evals.ts | New schema for evals table with relationships and Zod schemas for validation. |
| apps/billing/src/lib/db-marketing/schemas/firecrawl.ts | Updated firecrawlResponses schema with new fields and relationships. |
| apps/billing/src/lib/db-marketing/schemas/index.ts | Added exports for new modules: entries, sections, and evals. |
| apps/billing/src/lib/db-marketing/schemas/keywords.ts | Added new schemas and types for handling keywords table, including relationships. |
| apps/billing/src/lib/db-marketing/schemas/searchQuery.ts | Updated relationships for search_queries table. |
| apps/billing/src/lib/db-marketing/schemas/sections.ts | New file defining sections table schema and relationships. |
| apps/billing/src/lib/db-marketing/schemas/serper.ts | Updated serperOrganicResults schema with new fields and relationships. |
| apps/billing/src/lib/firecrawl.ts | Replaced getTopResultsContent with getOrCreateFirecrawlResponse and added error handling for scraping responses. |
| apps/billing/src/lib/keywords.ts | New file for keyword research system, defining tasks for extracting SEO-relevant keywords. |
| apps/billing/src/lib/search-query.ts | Updated getOrCreateSearchQuery to ensure entries are created in the entries table. |
| apps/billing/src/trigger.instructions.txt | New file providing documentation for task creation and management using Trigger.dev SDK. |
| apps/billing/src/trigger/glossary/_generate-glossary-entry.ts | New file for generating glossary entries through a structured workflow. |
| apps/billing/src/trigger/glossary/create-pr.ts | New file for creating pull requests based on glossary entries. |
| apps/billing/src/trigger/glossary/draft-sections.ts | New file for drafting sections of glossary entries with error handling. |
| apps/billing/src/trigger/glossary/evals.ts | New file for tasks related to evaluations and generating ratings and recommendations. |
| apps/billing/src/trigger/glossary/generate-outline.ts | New file for generating outlines based on SEO content analysis. |
| apps/billing/src/trigger/glossary/keyword-research.ts | New file for performing keyword research tasks. |
| apps/billing/src/trigger/glossary/seo-meta-tags.ts | New file for generating SEO-optimized meta tags for glossary entries. |
| apps/billing/src/trigger/keyword-research.ts | Deleted file that previously defined the keywordResearchTask. |
| apps/billing/tsconfig.json | Updated target option from "es5" to "esnext" in compilerOptions. |
| apps/billing/updates/update_2024_11_05/prompt.txt | New workflow for generating glossary entries with a structured five-step process. |


@vercel

vercel bot commented Nov 4, 2024

@p6l-richard is attempting to deploy a commit to the Unkey Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions
Contributor

github-actions bot commented Nov 4, 2024

Thank you for following the naming conventions for pull request titles! 🙏

@vercel vercel bot temporarily deployed to Preview – engineering November 4, 2024 20:21 Inactive
@vercel vercel bot temporarily deployed to Preview – engineering November 6, 2024 19:14 Inactive
@vercel vercel bot temporarily deployed to Preview – engineering November 6, 2024 19:18 Inactive
@p6l-richard p6l-richard marked this pull request as ready for review November 6, 2024 19:19
@vercel vercel bot temporarily deployed to Preview – engineering November 6, 2024 19:21 Inactive
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 36

🧹 Outside diff range and nitpick comments (53)
apps/billing/src/lib/db-marketing/schemas/index.ts (1)

Line range hint 1-8: Consider reorganizing exports for better maintainability.

The current export order appears random. Consider grouping related modules together and following a consistent organization pattern:

+// Search-related schemas
 export * from "./serper";
 export * from "./searchQuery";
-export * from "./entries";
 export * from "./keywords";
 export * from "./firecrawl";
-export * from "./sections";
-export * from "./evals";
+
+// Content-related schemas
+export * from "./entries";
+export * from "./sections";
+
+// Evaluation schemas
+export * from "./evals";
apps/billing/tsconfig.json (1)

3-3: Document minimum environment requirements.

Since the target is now set to esnext, consider documenting the minimum required Node.js version and browser requirements in the README to help other developers and users.

Would you like me to help create a documentation section about environment requirements?

apps/billing/src/lib/db-marketing/schemas/entries.ts (2)

28-34: Consider optimizing relations for better performance and data integrity.

The relations are correctly defined, but consider these improvements:

  1. Add an index on the foreign key fields to optimize join operations
  2. Define explicit onDelete behavior to maintain data integrity
 export const entriesRelations = relations(entries, ({ many, one }) => ({
   dynamicSections: many(sections),
   searchQuery: one(searchQueries, {
     fields: [entries.inputTerm],
     references: [searchQueries.inputTerm],
+    relationName: "entry_search_query",
+    onDelete: "cascade",
   }),
 }));

36-40: Enhance schema validation with additional rules and custom error messages.

Consider extending the insert schema with additional validation rules:

-export const insertEntrySchema = createInsertSchema(entries).extend({}).omit({ id: true });
+export const insertEntrySchema = createInsertSchema(entries).extend({
+  githubPrUrl: z.string().url({ message: "Invalid GitHub PR URL format" }),
+  metaTitle: z.string().min(10).max(60),
+  metaDescription: z.string().min(50).max(160),
+}).omit({ id: true });
apps/billing/src/lib/db-marketing/schemas/keywords.ts (2)

27-30: Remove unnecessary .extend({}).

The schema definitions look good, but there's an unnecessary empty extension.

-export const insertKeywordsSchema = createInsertSchema(keywords).extend({}).omit({ id: true });
+export const insertKeywordsSchema = createInsertSchema(keywords).omit({ id: true });

Line range hint 9-26: Consider adding composite indexes for common query patterns.

Given this is part of a glossary entry generation workflow, consider adding composite indexes for frequently accessed combinations of fields (e.g., source + keyword) to optimize query performance.

Example addition to the table definition:

   (table) => ({
     inputTermIdx: index("input_term_idx").on(table.inputTerm),
     sourceUrlIdx: index("source_url_idx").on(table.sourceUrl),
+    sourceKeywordIdx: index("source_keyword_idx").on(table.source, table.keyword),
     uniqueKeyword: unique("keywords_input_term_keyword_unique").on(table.inputTerm, table.keyword),
   }),
apps/billing/src/lib/db-marketing/schemas/firecrawl.ts (2)

1-1: Remove unused import.

The sql import from drizzle-orm is not used in this file.

-import { relations, sql } from "drizzle-orm";
+import { relations } from "drizzle-orm";

Also applies to: 13-14


58-61: Remove unnecessary schema extension.

The empty .extend({}) call can be removed as it doesn't modify the schema.

-export const insertFirecrawlResponseSchema = createInsertSchema(firecrawlResponses)
-  .extend({})
-  .omit({ id: true });
+export const insertFirecrawlResponseSchema = createInsertSchema(firecrawlResponses)
+  .omit({ id: true });
apps/billing/src/lib/db-marketing/schemas/serper.ts (1)

49-49: Add an index for the foreign key column.

Consider adding an index for firecrawlResponseId to improve query performance when joining with the firecrawl_responses table.

  (table) => ({
    searchResponseIdIdx: index("search_response_id_idx").on(table.searchResponseId),
    linkIdx: index("link_idx").on(table.link),
+   firecrawlResponseIdIdx: index("firecrawl_response_id_idx").on(table.firecrawlResponseId),
  }),
apps/billing/src/trigger.instructions.txt (2)

12-12: Fix grammatical issues.

Add missing commas in the following locations:

  1. "There are different types of tasks, including regular tasks and [scheduled tasks]"
  2. "For more information, read [the machines guide]"
  3. "By default, tasks can execute indefinitely"

Also applies to: 115-115, 130-130



309-318: Add glossary workflow examples.

Since this documentation is part of the glossary entry generation workflow PR, it would be helpful to:

  1. Add a section with practical examples demonstrating how to implement the seven sequential sub-tasks (keyword research, outline generation, etc.)
  2. Include code snippets showing how these tasks interact with PlanetScale's marketing instance
apps/billing/cursor/prompt-engineering.md (4)

37-38: Fix list indentation for consistency.

The list items have inconsistent indentation levels. Some items use 3 spaces while others use 4 spaces. This inconsistency might affect readability and markdown rendering.

Ensure all list items at the same level use consistent indentation (3 spaces is the standard).

Also applies to: 62-62



47-53: Standardize code block formatting.

Code blocks have inconsistent formatting:

  1. Missing language specifiers for syntax highlighting
  2. Mixing tilde (~) and backtick (`) fence styles

Apply these improvements:

  1. Add appropriate language specifiers to all code blocks
  2. Standardize on backticks for code fences
    Example:
-~~~
+```python
 def example():
     pass
-~~~
+```

Also applies to: 65-77, 95-128, 187-187, 299-299, 446-446, 606-606, 700-700



170-170: Add alt text to all images for accessibility.

All images in the document are missing alternative text, which is essential for:

  • Screen reader accessibility
  • SEO optimization
  • Fallback when images fail to load

Add descriptive alt text to each image. For example:

-<img width="550" src="..." title="GPT-4 answering questions about a set of expenses.">
+<img width="550" src="..." alt="Screenshot showing GPT-4 accurately answering questions about expense data" title="GPT-4 answering questions about a set of expenses.">

Also applies to: 180-180, 386-392, 404-408, 422-430, 440-440, 600-600, 691-691, 716-724, 742-748, 756-762, 770-770



417-417: Refine grammar and style for improved readability.

Several minor improvements would enhance the document's clarity:

  1. Remove redundant phrases (e.g., "close proximity" → "proximity")
  2. Add missing commas in compound sentences
  3. Use en dashes (–) instead of hyphens (-) for ranges

Apply these refinements throughout the document for improved professionalism and readability.

Also applies to: 682-682, 686-686, 719-719, 721-721, 753-753, 767-767, 779-779


apps/billing/src/lib/db-marketing/schemas/evals.ts (5)

18-20: Consider using 'json' column type for JSON data storage

The fields ratings, recommendations, and outline store JSON stringified data in text columns. If your MySQL version supports it, consider using the json column type provided by Drizzle ORM for better type safety and query capabilities.

Apply this diff to update the column types:

-        ratings: text("ratings").notNull(), // JSON stringified ratings
-        recommendations: text("recommendations").notNull().default("[]"), // Add default empty array
-        outline: text("outline").default("[]"), // Add outline field
+        ratings: json("ratings").notNull(),
+        recommendations: json("recommendations").notNull().default([]),
+        outline: json("outline").default([]),

39-49: Simplify ratingValueSchema by using z.preprocess

The current ratingValueSchema uses a union and multiple transformations to handle both numbers and numeric strings. You can simplify it by using z.preprocess to coerce inputs to numbers before validation.

Here's the refactored schema:

const ratingValueSchema = z.preprocess(
  (val) => Number(val),
  z.number().int().min(0).max(10)
);

51-61: Remove redundant transform in ratingsSchema

Since ratingValueSchema already ensures that the values are numbers, the transform step in ratingsSchema that converts the values to numbers is redundant. Consider removing it to simplify the schema.

Here's the updated schema:

export const ratingsSchema = z.object({
  accuracy: ratingValueSchema,
  completeness: ratingValueSchema,
  clarity: ratingValueSchema,
});

74-74: Remove unnecessary .extend({}) call in insertEvalSchema

The .extend({}) in insertEvalSchema is unnecessary since no additional fields are being added. Removing it improves code readability.

Update the code:

export const insertEvalSchema = createInsertSchema(evals).omit({ id: true });

78-79: Ensure consistent type inference for InsertEval and SelectEval

For consistency in type definitions, consider inferring SelectEval from selectEvalSchema using z.infer, as you did with InsertEval.

Update the type definition:

export type SelectEval = z.infer<typeof selectEvalSchema>;
apps/billing/src/lib/search-query.ts (4)

16-26: Refactor to Avoid Duplicate Insertion Logic into entries Table

The code for inserting into the entries table is duplicated when an existing query is found (lines 16-26) and within the transaction for new queries (lines 65-75). Refactoring this duplication can improve maintainability and reduce potential errors.

Consider extracting the insertion logic into a separate function or moving it outside the conditional blocks if appropriate.

Also applies to: 65-75


31-32: Offer Assistance for Improving Prompting Logic

The comment indicates that the prompting requires improvements to generate better search queries.

Would you like assistance in refining the prompt to enhance the quality of the generated search queries? I can help draft an improved prompt or suggest strategies to improve prompt engineering.


Line range hint 34-49: Implement Error Handling for generateObject Call

The generateObject function may throw an error if the generation fails. Implementing error handling ensures that such exceptions are handled gracefully.

Apply this diff to wrap the generateObject call in a try-catch block:

+ try {
    const generatedQuery = await generateObject({
      model: openai("gpt-4o-mini"),
      system: `You are a Senior Content Writer who specialises in writing technical content for Developer Tools that are SEO optimized.
For every term, you conduct a search on Google to gather the data you need.
You're goal is to create a search query that will return a SERP with the most relevant information for the term.

Make sure to always include the exact term in the search query at the beginning of the query.
If the term is clearly associated to API development, use the term as-is for query.

If the term is ambiguous with non-API development related fields and could result in unrelated results, add context to the search query to clarify the search & return the reason for the ambiguity.

Keep the search query as short and as simple as possible, don't use quotes around the search query.

`,
      prompt: `Create the search query for the term "${term}."`,
      schema: insertSearchQuerySchema.omit({ createdAt: true, updatedAt: true }),
    });
+ } catch (error) {
+   // Handle the error appropriately
+   throw new Error(`Failed to generate search query: ${error.message}`);
+ }

52-75: Enhance Error Handling in Database Transaction

Adding error handling within the transaction block can help manage exceptions effectively and maintain data integrity.

Consider implementing error handling within the transaction:

await db.transaction(async (tx) => {
+ try {
    // Insert search query
    await tx
      .insert(searchQueries)
      .values({
        ...generatedQuery.object,
      })
      .onDuplicateKeyUpdate({
        set: {
          updatedAt: sql`now()`,
        },
      });

    // Insert entry
    await tx
      .insert(entries)
      .values({
        inputTerm: term,
      })
      .onDuplicateKeyUpdate({
        set: {
          updatedAt: sql`now()`,
        },
      });
+ } catch (error) {
+   // Handle transaction errors
+   throw new Error(`Transaction failed: ${error.message}`);
+ }
});
apps/billing/src/lib/db-marketing/schemas/sections.ts (3)

25-25: Avoid using reserved keywords as column names

Using order as a column name can lead to conflicts since ORDER is a reserved keyword in SQL. Consider renaming this field to something like sequence or position to prevent potential issues.


39-41: Define related tables before referencing them

The sectionsRelations reference sectionContentTypes and sectionsToKeywords before they are defined. While JavaScript allows this due to hoisting, defining these tables before they are referenced can improve code readability and maintainability.


47-57: Consider using an enum for contentTypes

Defining contentTypes as a constant array is effective but using a TypeScript enum could enhance type safety and code clarity in the long run.

Here's how you might define it:

enum ContentType {
  Listicle = "listicle",
  Table = "table",
  Image = "image",
  Code = "code",
  Infographic = "infographic",
  Timeline = "timeline",
  Other = "other",
  Text = "text",
  Video = "video",
}

Then, update the mysqlEnum definition accordingly.

apps/billing/src/trigger/glossary/keyword-research.ts (5)

13-13: Rename constant 'THREE' to a more descriptive name

The constant THREE may not clearly convey its purpose. Consider renaming it to something more descriptive like TOP_RESULTS_COUNT for better readability.

Apply this diff to rename the constant and update its usages:

-export const THREE = 3;
+export const TOP_RESULTS_COUNT = 3;

Update references in the code:

-console.info(`3/6 - Getting content for top ${THREE} results`);
+console.info(`3/6 - Getting content for top ${TOP_RESULTS_COUNT} results`);

-const topThree = searchResponse.serperOrganicResults
+const topResults = searchResponse.serperOrganicResults
   .sort((a, b) => a.position - b.position)
-  .slice(0, THREE);
+  .slice(0, TOP_RESULTS_COUNT);

48-48: Avoid logging sensitive information

Similar to the previous comment, ensure that logging does not expose sensitive data. While logging the number of results is acceptable, be cautious with other details.


112-112: Deduplicate keywords before returning

When combining keywords from multiple sources, duplicates may occur. Deduplicating the keywords enhances the quality of the results.

Add deduplication logic:

-const allKeywords = [
-  ...keywordsFromTitles,
-  ...keywordsFromHeaders,
-  ...insertedRelatedSearches,
-];
+const keywordMap = new Map();
+[...keywordsFromTitles, ...keywordsFromHeaders, ...insertedRelatedSearches].forEach((k) => {
+  keywordMap.set(k.keyword, k);
+});
+const uniqueKeywords = Array.from(keywordMap.values());

Update the return statement:

-return {
-  message: 'Keyword Research completed',
-  keywords: [...keywordsFromTitles, ...keywordsFromHeaders, ...insertedRelatedSearches],
-};
+return {
+  message: 'Keyword Research completed',
+  keywords: uniqueKeywords,
+};

15-115: Add unit tests for 'keywordResearchTask'

Implementing unit tests for the keywordResearchTask ensures its correctness and reliability. Cover scenarios like cache hits, cache misses, error handling, and edge cases.


22-22: Type annotation improvement for default parameter

The current type casting using as CacheStrategy might be unnecessary. Consider specifying the type directly in the parameter definition for clarity.

Update the parameter definition:

-run: async ({
-  term,
-  onCacheHit = "stale" as CacheStrategy,
-}: { term: string; onCacheHit?: CacheStrategy }) => {
+run: async ({
+  term,
+  onCacheHit = "stale",
+}: { term: string; onCacheHit?: CacheStrategy }) => {

Ensure that "stale" is a valid value of CacheStrategy.

apps/billing/src/trigger/glossary/create-pr.ts (3)

14-16: Adjust parameter typing for 'onCacheHit'

Using a type assertion in the default parameter value may cause issues. Instead, explicitly type the parameter to ensure proper type checking.

Apply this diff to adjust the parameter typing:

-run: async ({
-  input,
-  onCacheHit = "stale" as CacheStrategy,
-}: { input: string; onCacheHit?: CacheStrategy }) => {
+run: async ({
+  input,
+  onCacheHit = "stale",
+}: { input: string; onCacheHit?: CacheStrategy }) => {
+  onCacheHit = onCacheHit as CacheStrategy;

Alternatively, define the parameter type directly:

-run: async ({
-  input,
-  onCacheHit = "stale" as CacheStrategy,
-}: { input: string; onCacheHit?: CacheStrategy }) => {
+run: async ({
+  input,
+  onCacheHit = "stale",
+}: { input: string; onCacheHit?: CacheStrategy & string }) => {

18-26: Consolidate database queries to improve performance

The code performs two separate database queries to fetch existing and entry, which are essentially fetching the same data. Consolidate these queries to reduce redundant calls.

Combine the queries into one and reuse the result:

-const existing = await db.query.entries.findFirst({
-  where: eq(entries.inputTerm, input),
-  columns: {
-    id: true,
-    inputTerm: true,
-    githubPrUrl: true,
-  },
-  orderBy: (entries, { desc }) => [desc(entries.createdAt)],
-});
+const entry = await db.query.entries.findFirst({
+  where: eq(entries.inputTerm, input),
+  orderBy: (entries, { desc }) => [desc(entries.createdAt)],
+});

...

-if (existing?.githubPrUrl && onCacheHit === "stale") {
+if (entry?.githubPrUrl && onCacheHit === "stale") {
   return {
-    entry: existing,
+    entry,
     message: `Found existing PR for ${input}.mdx`,
   };
 }

-if (!entry?.dynamicSectionsContent) {
+if (!entry || !entry.dynamicSectionsContent) {
   throw new AbortTaskRunError(
     `Unable to create PR: The markdown content for the dynamic sections are not available for the entry to term: ${input}. It's likely that draft-sections.ts didn't run as expected.`,
   );
 }

Also applies to: 36-39


122-128: Enhance the pull request description for clarity

The current PR body is minimal. Providing more context can help reviewers understand the changes better.

Consider updating the PR body:

 body: `This PR adds the ${sanitizedInput}.mdx file to the API documentation.

+Please review the content for accuracy and completeness. Let me know if any changes are required.
 `,
apps/billing/src/trigger/glossary/_generate-glossary-entry.ts (3)

13-33: Fix typographical errors in comments

There are a couple of typos in the comments:

  • In line 29, "undeterministic" should be "non-deterministic".
  • In line 30, "mititage" should be "mitigate".

Apply this diff to correct the typos:

- * Sub-tasks may (in fact most of them do) rely on undeterministic execution when they use LLMs. As a result, the maxAttempts for a subtask is 5.
+ * Sub-tasks may (in fact most of them do) rely on non-deterministic execution when they use LLMs. As a result, the maxAttempts for a subtask is 5.

- * The downside here is that this increases the runtime of the workflow considerably. Ways to mititage this are cited in the `todo` comment inside @file ./generate-outline.ts.
+ * The downside here is that this increases the runtime of the workflow considerably. Ways to mitigate this are cited in the `todo` comment inside @file ./generate-outline.ts.

113-115: Clarify the purpose of the conditional check

The comment // this if statement is here to make TypeScript happy may not provide clear context. Consider rephrasing the comment to explain why this check is necessary for TypeScript type safety.

Apply this diff to improve the comment:

-      // this if statement is here to make TypeScript happy
+      // Ensure pr.output.entry.id is defined before proceeding

72-104: Refactor repeated sub-task execution pattern

The code for executing and handling sub-tasks is repeated for each step. Consider refactoring this pattern into a utility function to reduce duplication and improve maintainability.

Example refactoring:

async function executeSubTask<T>({
  stepNumber,
  stepName,
  taskFunction,
  term,
  onCacheHit,
  successMessage,
  errorPrefix,
}: {
  stepNumber: number;
  stepName: string;
  taskFunction: (args: { term: string; onCacheHit?: CacheStrategy }) => Promise<{ ok: boolean; output?: T }>;
  term: string;
  onCacheHit?: CacheStrategy;
  successMessage: string;
  errorPrefix: string;
}) {
  console.info(`${stepNumber}/5 - ${stepName}...`);
  const result = await taskFunction({ term, onCacheHit });
  if (!result.ok) {
    throw new AbortTaskRunError(`${errorPrefix} failed for term: ${term}`);
  }
  console.info(successMessage);
  return result;
}

Usage in your main function:

// Step 1: Keyword Research
const keywordResearch = await executeSubTask({
  stepNumber: 1,
  stepName: "Starting keyword research",
  taskFunction: keywordResearchTask.triggerAndWait,
  term,
  onCacheHit,
  successMessage: "✓ Keyword research completed",
  errorPrefix: "Keyword research",
});
console.info(`Found ${keywordResearch.output.keywords.length} keywords`);

// Repeat for other steps...
apps/billing/src/trigger/glossary/seo-meta-tags.ts (2)

116-122: Add error handling for database update operation

Currently, the code assumes that the database update operation succeeds. Consider adding error handling to catch and handle any potential errors from the database update.

For example:

try {
  await db
    .update(entries)
    .set({
      metaTitle: craftedMetaTags.object.title,
      metaDescription: craftedMetaTags.object.description,
    })
    .where(eq(entries.inputTerm, term));
} catch (error) {
  // Handle error appropriately
}

54-114: Handle possible OpenAI API errors and rate limits

When calling generateObject, it's important to handle potential API errors, such as rate limits or network issues, to enhance the robustness of your application.

Consider wrapping the API call in a try-catch block and implementing retries or fallback mechanisms as needed.
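One way to implement the retry suggestion is a small generic wrapper with exponential backoff. This is a sketch under assumptions: `withRetries` and its delay parameters are illustrative, not part of the AI SDK, and in practice you would wrap the `generateObject` call (or any other API call) in `fn`.

```typescript
// Retry a promise-returning function with exponential backoff.
// maxAttempts and baseDelayMs are assumed defaults, tune as needed.
async function withRetries<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Back off: 500ms, 1000ms, 2000ms, ... between attempts.
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)));
      }
    }
  }
  // All attempts exhausted; surface the last error to the caller.
  throw lastError;
}
```

Usage would look like `const craftedMetaTags = await withRetries(() => generateObject({ ... }));`, keeping the task body free of retry bookkeeping. Note that trigger.dev tasks also retry at the task level, so an inner retry loop mainly helps with transient rate-limit errors that shouldn't burn a full task attempt.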

apps/billing/src/lib/firecrawl.ts (1)

Line range hint 27-117: Consider refactoring 'getOrCreateFirecrawlResponse' for better readability

The getOrCreateFirecrawlResponse function is quite lengthy and contains multiple nested blocks. Consider refactoring it into smaller helper functions to enhance readability and maintainability.

🧰 Tools
🪛 Biome

[error] 4-4: Declarations inside of a import declaration may not have duplicates

a second declaration of firecrawlResponses is not allowed

firecrawlResponses is first declared here

(parse)

apps/billing/src/lib/keywords.ts (3)

53-53: Fix typos in prompt messages

There are typos in the prompts where "form" should be "from." Correcting these improves clarity.

Apply these diffs to correct the typos:

At line 53:

-Create a list of keywords that are directly related to the main term and its subtopics form the titles of the pages.
+Create a list of keywords that are directly related to the main term and its subtopics from the titles of the pages.

At line 136:

-Create a list of keywords that are directly related to the main term and its subtopics form the h1 headers of the pages.
+Create a list of keywords that are directly related to the main term and its subtopics from the h1 headers of the pages.

Also applies to: 136-136


83-94: Consider processing 'keywordsWithBrandNames' for comprehensive keyword coverage

Currently, the code processes only the keywords array returned by the AI model and ignores keywordsWithBrandNames. If brand-related keywords are relevant to your SEO strategy, consider processing and storing them as well.

Would you like assistance in updating the code to handle keywordsWithBrandNames similarly to keywords?

Also applies to: 158-169


30-106: Refactor common logic to improve maintainability

The functions getOrCreateKeywordsFromTitles and getOrCreateKeywordsFromHeaders share similar structures and logic. Refactoring shared code into helper functions can reduce duplication and enhance maintainability.

Consider extracting common operations such as:

  • Checking for existing keywords in the database
  • Constructing prompts
  • Invoking the AI model and handling the response
  • Inserting keywords into the database

Would you like assistance in outlining a refactored structure?

Also applies to: 108-180
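As an outline of the refactor, the two functions differ mainly in which field of the scraped pages they read and in one phrase of the prompt, so a single prompt builder parameterized by the source closes most of the gap. This is a hypothetical sketch: the `KeywordSource` type and `buildKeywordPrompt` name are illustrative, and the wording only approximates the actual prompts.

```typescript
// The two existing functions differ only in their source of page text.
type KeywordSource = "titles" | "headers";

// Build the shared prompt, varying only the source-specific phrase.
function buildKeywordPrompt(term: string, source: KeywordSource, items: string[]): string {
  const sourceLabel =
    source === "titles" ? "the titles of the pages" : "the h1 headers of the pages";
  return [
    `Create a list of keywords that are directly related to the main term "${term}"`,
    `and its subtopics from ${sourceLabel}:`,
    ...items.map((item) => `- ${item}`),
  ].join("\n");
}
```

The database lookup, the AI call, and the insert could then live in one `getOrCreateKeywords(term, source)` helper that calls this builder, with the existing function names kept as thin wrappers for their call sites.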

apps/billing/src/trigger/glossary/draft-sections.ts (2)

61-63: Consider making the dynamic sections limit configurable

Currently, the code limits dynamicSections to the first 6 items using slice(0, 6). To enhance flexibility and maintainability, consider making this limit a configurable parameter or providing documentation to explain the rationale behind this hardcoded value.
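A minimal sketch of making the limit configurable, assuming an environment variable is the desired mechanism; the variable name `GLOSSARY_DYNAMIC_SECTIONS_LIMIT` and helper name are illustrative, not existing configuration.

```typescript
// Documented default for how many dynamic sections to draft per entry.
const DEFAULT_DYNAMIC_SECTIONS_LIMIT = 6;

// Read the limit from an env-style record, falling back to the default
// when the variable is missing or not a positive integer.
function getDynamicSectionsLimit(env: Record<string, string | undefined>): number {
  const raw = env.GLOSSARY_DYNAMIC_SECTIONS_LIMIT;
  const parsed = raw === undefined ? Number.NaN : Number.parseInt(raw, 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : DEFAULT_DYNAMIC_SECTIONS_LIMIT;
}
```

The call site would then become `dynamicSections.slice(0, getDynamicSectionsLimit(process.env))`, with the rationale for the default captured next to the constant.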


142-151: Correct numbering in the 'Guidelines' section of the prompt

The numbering in the Guidelines section has duplicate entries and skips numbers, which may cause confusion. Please correct the numbering sequence for clarity.

Apply this diff to fix the numbering:

 Guidelines:
-1. Write in markdown format.
-2. Start with an introductory paragraph before stating the first section heading
-2. Use "##" for the section heading
-3. Skip the title of the page, that will be provided separately
-4. Ensure the content is factually correct and relevant to API development.
-5. Include examples, code snippets or other rich content if appropriate.
-6. Keep the content concise but informative, ensure that there are no fluff phrases or statements that don't provide concrete information, context & background to the term.
-7. Don't repeat content between sections, ensure that each section adds value
-8. Only write the content for the section, do not provide any other context, introductions or statements regarding this task.
+1. Write in markdown format.
+2. Start with an introductory paragraph before stating the first section heading.
+3. Use "##" for the section heading.
+4. Skip the title of the page; that will be provided separately.
+5. Ensure the content is factually correct and relevant to API development.
+6. Include examples, code snippets, or other rich content if appropriate.
+7. Keep the content concise but informative, ensuring there are no fluff phrases or statements that don't provide concrete information, context, or background to the term.
+8. Don't repeat content between sections; ensure that each section adds value.
+9. Only write the content for the section; do not provide any other context, introductions, or statements regarding this task.
apps/billing/src/trigger/glossary/evals.ts (4)

141-209: Refactor duplicated code in evaluation tasks

The functions performTechnicalEvalTask, performSEOEvalTask, and performEditorialEvalTask have similar structures with repetitive code. Consider refactoring them into a single generic function that accepts the evaluation type as a parameter. This will reduce code duplication and improve maintainability.

Also applies to: 212-277, 280-362
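The factory could look roughly like the sketch below. The `EvalType`, `EvalResult`, and `createEvalTask` names are hypothetical stand-ins for the shared body of `performTechnicalEvalTask` and friends; the real version would take the type-specific prompt and rating logic as the injected parameter.

```typescript
// The three evaluation flavors the review identifies.
type EvalType = "technical" | "seo" | "editorial";

interface EvalResult {
  type: EvalType;
  rating: number;
}

// Build one eval task per type from a shared skeleton; the injected
// `rate` callback stands in for the type-specific prompt + model call.
function createEvalTask(
  type: EvalType,
  rate: (input: string) => number,
): (input: string) => EvalResult {
  return (input) => {
    // Shared, consistently formatted logging for every eval type.
    console.info(`[workflow=glossary] [task=${type}_eval] Starting for term: ${input}`);
    return { type, rating: rate(input) };
  };
}
```

This also resolves the logging inconsistency noted below, since all three tasks would share the same log-prefix template.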


283-283: Standardize logging format across evaluation tasks

In performEditorialEvalTask, logging statements include specific prefixes like [workflow=glossary] [task=editorial_eval], whereas other evaluation tasks do not use this format. For consistency, consider standardizing the logging statements across all evaluation tasks.

Apply this diff to align logging in other tasks:

 // In performTechnicalEvalTask
- console.info(`Starting technical evaluation for term: ${input}`);
+ console.info(`[workflow=glossary] [task=technical_eval] Starting for term: ${input}`);

 // Similarly update other log statements in the function

Also applies to: 299-301, 309-311, 321-323, 326-328, 343-345


303-307: Handle potential undefined 'existing.outline' property safely

When parsing existing.outline in line 305, ensure that existing.outline is defined and contains valid JSON to prevent runtime errors. Consider adding a check before parsing.

Apply this diff for safer parsing:

- outline: JSON.parse(existing.outline || "[]"),
+ outline: existing.outline ? JSON.parse(existing.outline) : [],
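If this defensive pattern recurs, it can be centralized in a small helper so malformed JSON cannot throw either; `safeJsonParse` is an illustrative name, not an existing utility in the codebase.

```typescript
// Parse JSON defensively: missing, null, or malformed input yields the
// caller-supplied fallback instead of a runtime error.
function safeJsonParse<T>(raw: string | null | undefined, fallback: T): T {
  if (!raw) return fallback;
  try {
    return JSON.parse(raw) as T;
  } catch {
    return fallback;
  }
}
```

The call site then reads `outline: safeJsonParse(existing.outline, [])`, which also guards against a row whose stored outline is corrupt rather than merely absent.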

186-186: Remove unnecessary spread operator in function call

The spread operator ...options is being used along with explicitly passed parameters in function calls. Since all properties of options are already being passed, the spread operator may be redundant and could lead to unintended behavior. Please review its necessity.

Apply this diff to remove the spread operator:

 await getOrCreateRatingsTask.triggerAndWait({
   input,
   type: "technical",
   content: options.content,
   onCacheHit,
-  ...options,
 });

Also applies to: 257-257, 335-335

apps/billing/src/trigger/glossary/generate-outline.ts (3)

83-84: Correct step numbering in log messages

There is an inconsistency in the step numbering of the log messages:

  • Line 83 logs Step 1/8 - ORGANIC RESULTS, but it should read Step 2/8 to align with the sequential flow, since Step 1/8 corresponds to an earlier action.

Ensure that the step numbers in the log messages accurately reflect the sequence of operations to aid in debugging.

Apply this diff to correct the step numbering:

-console.info(`Step 1/8 - ORGANIC RESULTS: ${organicResults?.length} results`);
+console.info(`Step 2/8 - ORGANIC RESULTS: ${organicResults?.length} results`);

197-214: Optimize keyword association logic

The nested loop for associating keywords with sections can be optimized for performance and readability. Consider the following improvements:

  • Use map and filter methods: These methods can simplify the logic and make the code more declarative.
  • Handle cases where keywords are not found: Accumulate unmatched keywords for further analysis or logging.

Apply this diff to refactor the keyword association:

-const keywordInsertionPayload = [];
-for (let i = 0; i < editorialEval.output.outline.length; i++) {
-  const section = { ...editorialEval.output.outline[i], id: newSectionIds[i].id };
-  for (let j = 0; j < section.keywords.length; j++) {
-    const keyword = section.keywords[j];
-    const keywordId = seoKeywords.find(
-      (seoKeyword) => keyword.keyword === seoKeyword.keyword,
-    )?.id;
-    if (!keywordId) {
-      console.warn(`Keyword "${keyword.keyword}" not found in seo keywords`);
-      continue;
-    }
-    const payload = insertSectionsToKeywordsSchema.parse({
-      sectionId: section.id,
-      keywordId,
-    });
-    keywordInsertionPayload.push(payload);
-  }
-}
+const keywordInsertionPayload = editorialEval.output.outline.flatMap((section, index) => {
+  const sectionId = newSectionIds[index].id;
+  return section.keywords.map((keyword) => {
+    const keywordId = seoKeywords.find(
+      (seoKeyword) => keyword.keyword === seoKeyword.keyword,
+    )?.id;
+    if (!keywordId) {
+      console.warn(`Keyword "${keyword.keyword}" not found in SEO keywords`);
+      return null;
+    }
+    return insertSectionsToKeywordsSchema.parse({
+      sectionId,
+      keywordId,
+    });
+  }).filter(Boolean);
+});
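Beyond restructuring with flatMap, the per-keyword `find` is O(n) per lookup; building a Map once makes each lookup O(1). A sketch under assumed shapes (the `SeoKeyword` interface mirrors the review's example rows, and `buildKeywordIdLookup` is an illustrative name):

```typescript
// Assumed row shape for the seoKeywords result set.
interface SeoKeyword {
  id: number;
  keyword: string;
}

// Build the keyword -> id index once, so each association lookup is O(1)
// instead of scanning the whole array per keyword.
function buildKeywordIdLookup(seoKeywords: SeoKeyword[]): Map<string, number> {
  return new Map(seoKeywords.map((row): [string, number] => [row.keyword, row.id]));
}
```

Inside the loop, `seoKeywords.find(...)` then becomes `keywordIdLookup.get(keyword.keyword)`, and a `undefined` result still triggers the existing warning path.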

278-280: Clarify instruction formatting in AI prompts

In the generateInitialOutline function, the system prompt and instructions could be formatted for better clarity when passed to the AI model. Use consistent indentation and bullet points to improve readability.

Apply this diff to adjust the prompt formatting:

 const initialOutlineSystem = `You are a **Technical SEO Content Writer** specializing in API development and computer science.
-Your objective is to create a flat, comprehensive outline for a glossary page based on summarized content from top-ranking pages.
-Ensure factual correctness, clarity, and SEO optimization without unnecessary subheadings.`;
+Your objective is to:
+- Create a flat, comprehensive outline for a glossary page based on summarized content from top-ranking pages.
+- Ensure factual correctness, clarity, and SEO optimization without unnecessary subheadings.`;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 49b5f30 and f305486.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (26)
  • apps/billing/cursor/drizzle_instructions.txt (1 hunks)
  • apps/billing/cursor/prompt-engineering.md (1 hunks)
  • apps/billing/package.json (2 hunks)
  • apps/billing/src/lib/db-marketing/data-instructions.txt (1 hunks)
  • apps/billing/src/lib/db-marketing/schemas/entries.ts (1 hunks)
  • apps/billing/src/lib/db-marketing/schemas/evals.ts (1 hunks)
  • apps/billing/src/lib/db-marketing/schemas/firecrawl.ts (4 hunks)
  • apps/billing/src/lib/db-marketing/schemas/index.ts (1 hunks)
  • apps/billing/src/lib/db-marketing/schemas/keywords.ts (3 hunks)
  • apps/billing/src/lib/db-marketing/schemas/searchQuery.ts (2 hunks)
  • apps/billing/src/lib/db-marketing/schemas/sections.ts (1 hunks)
  • apps/billing/src/lib/db-marketing/schemas/serper.ts (3 hunks)
  • apps/billing/src/lib/firecrawl.ts (2 hunks)
  • apps/billing/src/lib/keywords.ts (1 hunks)
  • apps/billing/src/lib/search-query.ts (3 hunks)
  • apps/billing/src/trigger.instructions.txt (1 hunks)
  • apps/billing/src/trigger/glossary/_generate-glossary-entry.ts (1 hunks)
  • apps/billing/src/trigger/glossary/create-pr.ts (1 hunks)
  • apps/billing/src/trigger/glossary/draft-sections.ts (1 hunks)
  • apps/billing/src/trigger/glossary/evals.ts (1 hunks)
  • apps/billing/src/trigger/glossary/generate-outline.ts (1 hunks)
  • apps/billing/src/trigger/glossary/keyword-research.ts (1 hunks)
  • apps/billing/src/trigger/glossary/seo-meta-tags.ts (1 hunks)
  • apps/billing/src/trigger/keyword-research.ts (0 hunks)
  • apps/billing/tsconfig.json (1 hunks)
  • apps/billing/updates/update_2024_11_05/prompt.txt (1 hunks)
💤 Files with no reviewable changes (1)
  • apps/billing/src/trigger/keyword-research.ts
✅ Files skipped from review due to trivial changes (1)
  • apps/billing/cursor/drizzle_instructions.txt
🧰 Additional context used
🪛 LanguageTool
apps/billing/cursor/prompt-engineering.md

[uncategorized] ~148-~148: Consider inserting a comma before ‘actually’.
Context: ...ts that could plausibly exist, but don’t actually. The art to getting this right is enume...

(FINAL_ADVERB_COMMA)


[grammar] ~175-~175: Possible agreement error -- use the past participle here.
Context: ...be tricky. The best levers we have here are to provide lots of descriptions, and as...

(HAVE_PART_AGREEMENT)


[style] ~175-~175: The phrase ‘lots of’ might be wordy and overused. Consider using an alternative.
Context: ...best levers we have here are to provide lots of descriptions, and as many examples ...

(A_LOT_OF)


[misspelling] ~265-~265: When specifying a month and year, ‘of’ is unnecessary.
Context: ...s of defining commands. #### ReAct In March of 2023, Princeton and Google released a paper ...

(MONTH_OF_XXXX)


[style] ~375-~375: Consider removing “of” to be more concise
Context: ...you will need to combine some subset of all of the strategies enumerated in this document....

(ALL_OF_THE)


[style] ~399-~399: Consider using an extreme adjective for ‘good’.
Context: ...rate. Fortunately, OpenAI’s models are exceptionally good at working with Markdown tables (presum...

(EXTREME_ADJECTIVES)


[uncategorized] ~415-~415: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...or every item has some custom attributes and it doesn’t make sense to have dozens of...

(COMMA_COMPOUND_SENTENCE)


[style] ~417-~417: This phrase is a bit redundant. Consider using just “proximity”.
Context: ...that the model handles really well. The close proximity of keys to their values makes it ea...

(CLOSE_SCRUTINY)


[typographical] ~426-~426: Consider adding a comma after ‘Occasionally’ for more clarity.
Context: ... JSON blob.">

#### Freeform Text Occasionally you’ll want to include freeform text in...

(RB_LY_COMMA)


[uncategorized] ~682-~682: Possible missing comma found.
Context: ... levels of nested joins. In the example above the model successfully joins from `user...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~686-~686: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...age response isn’t sufficient on its own and you’ll want the model’s output to cite ...

(COMMA_COMPOUND_SENTENCE)


[misspelling] ~719-~719: This word is normally spelled as one.
Context: ...tons for booking the flight + hotel, or auto-filling a form for the user. As another exampl...

(EN_COMPOUNDS_AUTO_FILLING)


[typographical] ~721-~721: It appears that a comma is missing.
Context: ...n the creation of that message. In this scenario you won’t know exactly where in the mes...

(DURING_THAT_TIME_COMMA)


[uncategorized] ~753-~753: Possible missing comma found.
Context: ...we asked the bot to evaluate the Python code it gets it slightly wrong. The correct ...

(AI_HYDRA_LEO_MISSING_COMMA)


[style] ~767-~767: Consider removing “of” to be more concise
Context: ..., you may not want to show the end user all of the bot’s thinking and instead just want to...

(ALL_OF_THE)


[uncategorized] ~767-~767: Possible missing comma found.
Context: ...t to show the end user all of the bot’s thinking and instead just want to show the final...

(AI_HYDRA_LEO_MISSING_COMMA)


[typographical] ~779-~779: If specifying a range, consider using an en dash instead of a hyphen.
Context: ...larity for brevity. • For simple tasks, 1-3 sentences are often sufficient. • For m...

(HYPHEN_TO_EN)

apps/billing/src/lib/db-marketing/data-instructions.txt

[grammar] ~6-~6: Two determiners in a row. Choose either “the” or “an”.
Context: ... Relationships: - The central entity is the an inputTerm to which most models are conn...

(DT_DT)

apps/billing/src/trigger.instructions.txt

[uncategorized] ~12-~12: Possible missing comma found.
Context: ...ure." --- There are different types of tasks including regular tasks and [scheduled ...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~115-~115: Possible missing comma found.
Context: ...ements in the machine field. For more information read the machines guide. ...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~130-~130: Did you mean: “By default,”?
Context: ... }, }); ``` ### maxDuration option By default tasks can execute indefinitely, which c...

(BY_DEFAULT_COMMA)

🪛 Markdownlint
apps/billing/cursor/prompt-engineering.md

37-37: Expected: 3; Actual: 4
Inconsistent indentation for list items at the same level

(MD005, list-indent)


38-38: Expected: 3; Actual: 4
Inconsistent indentation for list items at the same level

(MD005, list-indent)


62-62: Expected: 4; Actual: 3
Inconsistent indentation for list items at the same level

(MD005, list-indent)


47-47: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


53-53: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


65-65: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


77-77: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


95-95: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


108-108: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


114-114: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


120-120: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


126-126: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


187-187: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


299-299: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


446-446: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


606-606: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


700-700: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


170-170: null
Images should have alternate text (alt text)

(MD045, no-alt-text)


180-180: null
Images should have alternate text (alt text)

(MD045, no-alt-text)


386-386: null
Images should have alternate text (alt text)

(MD045, no-alt-text)


392-392: null
Images should have alternate text (alt text)

(MD045, no-alt-text)


404-404: null
Images should have alternate text (alt text)

(MD045, no-alt-text)


408-408: null
Images should have alternate text (alt text)

(MD045, no-alt-text)


422-422: null
Images should have alternate text (alt text)

(MD045, no-alt-text)


430-430: null
Images should have alternate text (alt text)

(MD045, no-alt-text)


440-440: null
Images should have alternate text (alt text)

(MD045, no-alt-text)


600-600: null
Images should have alternate text (alt text)

(MD045, no-alt-text)


691-691: null
Images should have alternate text (alt text)

(MD045, no-alt-text)


716-716: null
Images should have alternate text (alt text)

(MD045, no-alt-text)


724-724: null
Images should have alternate text (alt text)

(MD045, no-alt-text)


742-742: null
Images should have alternate text (alt text)

(MD045, no-alt-text)


748-748: null
Images should have alternate text (alt text)

(MD045, no-alt-text)


756-756: null
Images should have alternate text (alt text)

(MD045, no-alt-text)


762-762: null
Images should have alternate text (alt text)

(MD045, no-alt-text)


770-770: null
Images should have alternate text (alt text)

(MD045, no-alt-text)


187-187: Expected: backtick; Actual: tilde
Code fence style

(MD048, code-fence-style)


299-299: Expected: backtick; Actual: tilde
Code fence style

(MD048, code-fence-style)


446-446: Expected: backtick; Actual: tilde
Code fence style

(MD048, code-fence-style)


606-606: Expected: backtick; Actual: tilde
Code fence style

(MD048, code-fence-style)


700-700: Expected: backtick; Actual: tilde
Code fence style

(MD048, code-fence-style)

🪛 Biome
apps/billing/src/lib/firecrawl.ts

[error] 4-4: Declarations inside of a import declaration may not have duplicates

a second declaration of firecrawlResponses is not allowed

firecrawlResponses is first declared here

(parse)

🔇 Additional comments (28)
apps/billing/src/lib/db-marketing/schemas/index.ts (1)

3-3: Verify schema relationships to prevent circular dependencies.

Since these schemas have relationships between them (as mentioned in the AI summary), let's verify there are no circular dependencies that could cause issues.

#!/bin/bash
# Description: Check for potential circular dependencies in schema files
# Expected: No circular import patterns between related schemas

echo "Analyzing schema relationships..."
for file in "entries" "sections" "evals"; do
  echo "Checking imports in ${file}.ts:"
  rg -A 3 "import.*from.*\./" "apps/billing/src/lib/db-marketing/schemas/${file}.ts" || echo "No local imports found"
done

Also applies to: 6-7

apps/billing/tsconfig.json (1)

3-3: Verify browser compatibility requirements.

The change from es5 to esnext enables modern JavaScript features but might affect browser compatibility. Ensure this aligns with your browser support requirements.

Let's check if there are any browser compatibility requirements specified:

✅ Verification successful

Based on the results, I can see that the billing app is part of a monorepo with multiple applications, and the apps/dashboard contains web-specific files including billing-related components (e.g., apps/dashboard/app/(app)/settings/billing/). Since this is a server-side application that's part of a larger web application ecosystem, and modern JavaScript features are commonly used in such environments, the change to esnext is appropriate.

The target change to esnext is safe to proceed with.

The billing app appears to be a server-side component used by the dashboard application, which is a modern Next.js application. Server-side components don't need to consider browser compatibility as they run on the server, and the TypeScript output will be processed by Node.js, which supports modern JavaScript features.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Look for browser compatibility requirements in package.json or README
echo "Checking package.json for browserslist config..."
cat package.json | jq '.browserslist'

echo "Checking README for browser support information..."
fd -g "README*" -X grep -i "browser.*support\|compatibility"

Length of output: 323


Script:

#!/bin/bash
# Let's check if this is a browser-based application or a server-side component
# First, check for web-specific dependencies and entry points
echo "Checking package.json for dependencies..."
cat package.json | jq '.dependencies, .devDependencies'

echo "Looking for web-specific files..."
fd -e html -e tsx -e jsx

# Check if this billing app is imported/used by other apps
echo "Checking usage of billing app..."
rg -l "billing" --type ts --type tsx --type jsx

Length of output: 23674

apps/billing/package.json (1)

26-28: LGTM! Minor version updates for trigger.dev packages.

The updates from 3.1.0 to 3.1.2 are minor version bumps, suggesting backward-compatible changes.

apps/billing/src/lib/db-marketing/schemas/entries.ts (2)

1-7: LGTM: Imports are well-organized and necessary.

All imports are being utilized and follow a logical organization pattern.


1-40: Verify schema consistency across the codebase.

Let's ensure the field names and types are consistent with related schemas and their usage.

✅ Verification successful

Schema definitions and relations are consistent across the codebase

The verification shows that:

  • The entries schema correctly defines its relations with sections and searchQueries tables
  • The inputTerm field is consistently used as the linking field between entries and searchQueries
  • The sections table properly references entries through entryId
  • All schema definitions follow the same pattern using drizzle-orm and drizzle-zod
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for consistency in field names and schema usage

# Test 1: Check field names in related schemas
echo "Checking related schemas..."
rg -t typescript "mysqlTable\(" apps/billing/src/lib/db-marketing/schemas/

# Test 2: Check usage of these schemas in the codebase
echo "Checking schema usage..."
rg -t typescript "insertEntrySchema|selectEntrySchema" apps/billing/src/

Length of output: 345


Script:

#!/bin/bash
# Let's try without the file type filter since it's not recognized
echo "Checking related schemas..."
rg "mysqlTable\(" apps/billing/src/lib/db-marketing/schemas/

echo -e "\nChecking searchQueries schema..."
cat apps/billing/src/lib/db-marketing/schemas/searchQuery.ts

echo -e "\nChecking sections schema..."
cat apps/billing/src/lib/db-marketing/schemas/sections.ts

echo -e "\nChecking usage of entry schemas..."
rg "insertEntrySchema|selectEntrySchema" apps/billing/src/

Length of output: 7987

apps/billing/src/lib/db-marketing/schemas/keywords.ts (2)

5-6: LGTM: Imports are appropriate for schema validation.

The added imports for drizzle-zod and zod types are necessary for implementing type-safe schema validation.


Line range hint 32-41: Verify the existence of referenced tables and their schemas.

The relations look well-structured, particularly the new many-to-many relationship with sectionsToKeywords. Let's verify the referenced tables exist with the correct schema.

✅ Verification successful

All referenced tables and fields exist with correct schema definitions

The verification confirms:

  • sectionsToKeywords table exists with proper junction table structure
  • searchQueries table exists with inputTerm field (varchar(255)) and corresponding index
  • serperOrganicResults table exists with link field (varchar(767)) and corresponding index

The relations configuration is valid and properly references existing tables and fields.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the existence and schema of referenced tables

# Check for the referenced tables and their schemas
echo "Checking for referenced table definitions..."
rg -A 10 "sectionsToKeywords.*=.*mysqlTable" apps/billing/src/lib/db-marketing/schemas/
rg -A 10 "searchQueries.*=.*mysqlTable" apps/billing/src/lib/db-marketing/schemas/
rg -A 10 "serperOrganicResults.*=.*mysqlTable" apps/billing/src/lib/db-marketing/schemas/

# Verify the foreign key fields exist in the referenced tables
echo "Checking for referenced fields..."
rg "inputTerm.*:.*" apps/billing/src/lib/db-marketing/schemas/searchQuery.ts
rg "link.*:.*" apps/billing/src/lib/db-marketing/schemas/serper.ts

Length of output: 4494

apps/billing/updates/update_2024_11_05/prompt.txt (1)

29-29: Validate Drizzle ORM implementation with MySQL.

Given the uncertainty about Drizzle with MySQL:

  1. Consider adding test cases for database operations
  2. Document any MySQL-specific configurations
  3. Include rollback procedures for schema changes

Let's check for any existing Drizzle MySQL configurations:

apps/billing/src/lib/db-marketing/schemas/searchQuery.ts (2)

14-15: LGTM: New imports are properly structured

The new imports for firecrawlResponses and entries are correctly organized and follow the existing import pattern.


41-50: Verify database migration for new relationships

The schema changes introduce new relationships that will require database migrations. Please ensure that:

  1. Migration files have been generated and reviewed
  2. The migrations handle existing data appropriately
  3. The migrations have been tested in a staging environment
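The PR's own package.json scripts (reviewed later in this thread) can drive that flow; a minimal sequence, assuming it is run from apps/billing with a populated .env:

```shell
# Generate SQL migration files from the schema diff, then apply them.
pnpm db:generate
pnpm db:migrate
```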
apps/billing/src/lib/db-marketing/schemas/firecrawl.ts (2)

33-33: LGTM! Schema changes are well-structured.

The field types and lengths are appropriate for their purposes:

  • ogSiteName follows Open Graph protocol naming
  • inputTerm as varchar(255) is suitable for search terms
  • summary as text type accommodates variable-length content

Also applies to: 37-38


43-43: LGTM! Good performance optimization.

Adding an index on inputTerm is a good practice as it will optimize lookups during the glossary entry generation workflow.

apps/billing/src/lib/db-marketing/schemas/serper.ts (1)

6-6: Verify the firecrawl schema dependency.

The import looks correct. Let's verify that the firecrawl.ts schema file exists and contains the referenced table.

✅ Verification successful

Import dependency verified successfully

The firecrawl.ts schema file exists in the same directory and contains the firecrawlResponses table definition, confirming that the import statement is valid and correctly references an existing table.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the existence and content of the firecrawl schema file
# Expected: File exists and contains firecrawlResponses table definition

# Check if the file exists in the same directory
fd -t f "firecrawl.ts" -d 1

# Verify the table definition
rg -A 5 "firecrawlResponses.*=.*mysqlTable" 

Length of output: 602

apps/billing/src/lib/db-marketing/schemas/sections.ts (1)

87-91: Review timestamp configurations for consistency

The timestamps createdAt and updatedAt in sectionsToKeywords specify mode: "date" and fsp: 3, while other tables do not. This may lead to inconsistent timestamp precision and data types across your tables.

Please verify if the use of mode: "date" with fsp: 3 is intentional. If not, consider aligning the timestamp definitions with those in other tables.
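If the fsp: 3 precision is intentional, one way to keep it uniform is a shared column helper that every table spreads into its column map (a sketch, not code from this PR; the drizzle-orm/mysql-core import is assumed):

```typescript
import { timestamp } from "drizzle-orm/mysql-core";

// Shared timestamp columns so every table uses the same mode and precision.
export const timestamps = {
  createdAt: timestamp("created_at", { mode: "date", fsp: 3 }).notNull().defaultNow(),
  updatedAt: timestamp("updated_at", { mode: "date", fsp: 3 })
    .notNull()
    .defaultNow()
    .$onUpdate(() => new Date()),
};
```

Tables would then spread ...timestamps instead of redefining the pair, making any precision change a one-line edit.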

apps/billing/src/trigger/glossary/keyword-research.ts (1)

63-63: Check for empty firecrawl results

After fetching firecrawl responses, ensure that there are valid results before proceeding. This prevents potential errors in subsequent steps.
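A minimal guard sketch (the type and function names are illustrative, not taken from the PR):

```typescript
// Illustrative shape of a firecrawl response row; the real schema has more fields.
type FirecrawlResponse = { sourceUrl: string; summary: string | null };

// Throw early so later steps never operate on an empty result set.
function assertNonEmptyResults(
  term: string,
  results: FirecrawlResponse[],
): FirecrawlResponse[] {
  if (results.length === 0) {
    throw new Error(`No firecrawl results found for term: ${term}`);
  }
  return results;
}
```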

apps/billing/src/trigger/glossary/_generate-glossary-entry.ts (3)

37-38: Verify if maxAttempts should be zero

You have set maxAttempts: 0 in the task's retry configuration, which means the task will not retry if it fails. Given that sub-tasks may rely on non-deterministic execution, is it intentional to have no retries for the main task? Consider setting an appropriate number of retry attempts or ensure that sub-tasks handle retries sufficiently.
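If retries are desired, a sketch of a trigger.dev-style retry block (option names assumed from the SDK; tune the numbers for the workload):

```typescript
// Retry a few times with exponential backoff instead of failing outright.
export const retryConfig = {
  maxAttempts: 3, // instead of 0, which disables retries entirely
  factor: 2, // double the delay after each failure
  minTimeoutInMs: 1_000,
  maxTimeoutInMs: 30_000,
};
```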


59-64: Ensure all necessary fields are checked in cache validation

When checking for existing entries in the cache, you verify several fields. Confirm that these fields (dynamicSectionsContent, metaTitle, metaDescription, githubPrUrl) are sufficient to determine if the cached entry is valid and complete. If there are additional fields or conditions that should be considered, include them to ensure correctness.


108-109: Verify parameter consistency in createPrTask call

In calling createPrTask.triggerAndWait, you pass { input: term, onCacheHit }, whereas in previous tasks you pass { term, onCacheHit }. Verify that createPrTask expects an input parameter instead of term. If it expects term, consider aligning the parameter naming for consistency across all sub-tasks.

apps/billing/src/trigger/glossary/seo-meta-tags.ts (3)

33-35: Ensure correct cache strategy handling

In the condition if (existing?.metaTitle && existing?.metaDescription && onCacheHit === "stale"), verify that the onCacheHit value correctly influences the caching logic. Confirm that this condition meets the intended behavior.


119-120: Check for undefined properties before accessing

Ensure that craftedMetaTags.object is defined before accessing title and description to prevent runtime errors.
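A defensive extraction sketch (the shape is assumed to mirror the generated object; names are illustrative):

```typescript
type CraftedMetaTags = { object?: { title?: string; description?: string } };

// Fail with a clear message instead of a TypeError on undefined access.
function extractMetaTags(crafted: CraftedMetaTags): { title: string; description: string } {
  const { title, description } = crafted.object ?? {};
  if (!title || !description) {
    throw new Error("AI response is missing metaTitle or metaDescription");
  }
  return { title, description };
}
```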


18-20: ⚠️ Potential issue

Redundant type assertion in default parameter assignment

The default parameter assignment onCacheHit = "stale" as CacheStrategy compiles, but the assertion is redundant: the parameter is already annotated as CacheStrategy, so the literal "stale" is assignable without it. Consider dropping the assertion.

Apply this diff to fix the issue:

-run: async ({
-  term,
-  onCacheHit = "stale" as CacheStrategy,
-}: { term: string; onCacheHit?: CacheStrategy }) => {
+run: async ({
+  term,
+  onCacheHit = "stale",
+}: { term: string; onCacheHit?: CacheStrategy }) => {

Likely invalid or redundant comment.

apps/billing/src/lib/firecrawl.ts (1)

179-179: Verify the model name 'gpt-4o-mini' in 'generateText' call

Ensure that 'gpt-4o-mini' is the intended model. Note that gpt-4o-mini is a valid OpenAI model identifier (the small variant of gpt-4o), so this is most likely correct rather than a typo of 'gpt-4' or 'gpt-3.5-turbo'.

Run the following script to list all model names used with openai in the codebase:

apps/billing/src/trigger/glossary/draft-sections.ts (2)

188-188: ⚠️ Potential issue

Verify the model name 'gpt-4o-mini'

The model name 'gpt-4o-mini' used with the OpenAI SDK may be incorrect or unsupported. Please verify that this is the intended model and that it is available in the OpenAI API.

Run the following script to verify available models:

#!/bin/bash
# Description: List available OpenAI models to confirm 'gpt-4o-mini' exists.

# Test: Retrieve model list. Expect: 'gpt-4o-mini' should be listed.
curl -s https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY" | jq '.data[].id'

224-224: ⚠️ Potential issue

Verify the model name 'gpt-4o-mini'

The model name 'gpt-4o-mini' used with the OpenAI SDK may be incorrect or unsupported. Please confirm that this is the intended model and that it is available in the OpenAI API.

Run the following script to verify available models:

✅ Verification successful

Model name 'gpt-4o-mini' is a valid OpenAI model identifier

The codebase search reveals consistent usage of two model identifiers:

  • gpt-4o-mini
  • gpt-4o

Both are standard OpenAI model names from the gpt-4o family, used consistently across multiple files:

  • apps/billing/src/lib/keywords.ts
  • apps/billing/src/lib/search-query.ts
  • apps/billing/src/lib/firecrawl.ts
  • apps/billing/src/trigger/glossary/*.ts

The model name is correct within the context of this codebase's implementation.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: List available OpenAI models to confirm 'gpt-4o-mini' exists.

# Test: Retrieve model list. Expect: 'gpt-4o-mini' should be listed.
curl -s https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY" | jq '.data[].id'

Length of output: 146


Script:

#!/bin/bash
# Let's search for any other model name patterns in the codebase
rg -g '*.ts' -g '*.js' "openai\(['\"].*['\"]" -A 2

# Also search for any model configuration or constants
rg -g '*.ts' -g '*.js' -i "model.*=.*gpt"

Length of output: 3197

apps/billing/src/trigger/glossary/evals.ts (2)

352-352: Verify appropriate storage of content in 'outline' field

In line 352, options.content (a string) is being JSON.stringify-ed and stored in the outline field. If outline is intended to store a structured outline (e.g., an array or object), storing a string may lead to inconsistencies when retrieving and parsing the data. Please verify that this is the intended behavior.


51-55: Consider cache control logic for 'onCacheHit' parameter

In the condition checking for existing evaluations, the code uses onCacheHit === "stale" to decide whether to return cached data. Verify that this logic aligns with the intended cache strategy. If the default behavior should be to use the cache unless a refresh is required, consider updating the condition to reflect that.

Also applies to: 104-108, 159-165, 230-236, 298-307
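The intended semantics could be made explicit with a small predicate shared by all five call sites (a sketch; the "stale"/"revalidate" strategy values are an assumption, not confirmed by the PR):

```typescript
type CacheStrategy = "stale" | "revalidate";

// Return cached data only when it exists and the caller accepts stale entries;
// "revalidate" always recomputes.
function shouldReturnCached<T>(existing: T | undefined, onCacheHit: CacheStrategy): boolean {
  return existing !== undefined && onCacheHit === "stale";
}
```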

apps/billing/src/trigger/glossary/generate-outline.ts (2)

390-394: Confirm model availability and compatibility

The model gpt-4o-mini used in the generateObject calls may not be available or compatible. Ensure that the model identifier is correct and that the model supports the required features.

Consult the AI service documentation to verify the correct model name and its capabilities.


157-167: Ensure consistency in function usage and outputs

In the reviseSEOOutline function, ensure that the inputs and outputs are consistent with the expected schemas. Verify that seoOptimizedOutline.object.outline exists and adheres to the finalOutlineSchema.

Run the following script to verify the structure of seoOptimizedOutline:

✅ Verification successful

Based on the gathered information, I can now provide the final response:

Function implementation and schema validation are properly structured

The reviseSEOOutline function is correctly implemented with proper schema validation:

  • It returns a response validated against finalOutlineSchema using the generateObject function
  • The finalOutlineSchema is properly defined with the expected structure including outline array and keywords
  • The output structure seoOptimizedOutline.object.outline matches the schema definition as it's enforced by the validation
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that seoOptimizedOutline matches finalOutlineSchema

# Test: Check for the existence and structure of seoOptimizedOutline.object.outline
ast-grep --pattern $'const seoOptimizedOutline = $_\nconsole.info($_)' apps/billing/src/trigger/glossary/generate-outline.ts

Length of output: 124


Script:

#!/bin/bash
# Let's verify the schema and usage of reviseSEOOutline function

# Check the definition and return type of reviseSEOOutline
ast-grep --pattern 'function reviseSEOOutline'

# Check for any type definitions or interfaces related to the outline
rg -A 5 "interface.*Outline|type.*Outline|schema.*Outline"

# Look for the finalOutlineSchema definition
rg "finalOutlineSchema" -A 5

# Check imports and usage of reviseSEOOutline
rg "reviseSEOOutline" -A 2

Length of output: 3425


Script:

#!/bin/bash
# Let's check the implementation of reviseSEOOutline to verify its return type

# Check the complete implementation of reviseSEOOutline function
ast-grep --pattern 'async function reviseSEOOutline($_) {
  $$$
}'

# Check how the result is validated before return
rg -B 5 -A 2 "return validateAIResponse" apps/billing/src/trigger/glossary/generate-outline.ts

Length of output: 7196

@@ -1,4 +1,7 @@
export * from "./serper";
export * from "./searchQuery";
export * from "./entries";

💡 Codebase verification

Circular dependency detected between entries and sections modules

There is a circular dependency between the schema modules:

  • entries.ts imports from ./sections
  • sections.ts imports from ./entries

This circular dependency could lead to initialization issues or runtime errors. Consider:

  • Extracting shared types/interfaces to a separate module
  • Restructuring the schema relationships to avoid the circular reference
  • Using interface declarations instead of direct imports where possible
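One way to restructure, sketched under assumed file names (drizzle's relations() lets a single module own the cross-table wiring, so the table files need not import each other):

```typescript
// schemas/relations.ts — the only file that imports both tables.
import { relations } from "drizzle-orm";
import { entries } from "./entries"; // defines only the entries table
import { sections } from "./sections"; // defines only the sections table

export const entriesRelations = relations(entries, ({ many }) => ({
  dynamicSections: many(sections),
}));

export const sectionsRelations = relations(sections, ({ one }) => ({
  entry: one(entries),
}));
```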
🔗 Analysis chain

Verify the existence of newly exported modules.

Let's ensure that all the newly added module files exist in the codebase.

Also applies to: 6-7


Let me generate additional verification scripts to check for potential circular dependencies between these modules.

This will help us verify:

  1. The complete import graph between these modules
  2. Existence of other referenced files (searchQuery.ts, keywords.ts)
  3. Any potential circular dependencies that could cause issues
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if the new module files exist
# Expected: All files should exist to prevent runtime errors

echo "Checking for new module files..."
for module in "entries" "sections" "evals"; do
  if fd -t f "${module}.ts" "apps/billing/src/lib/db-marketing/schemas/"; then
    echo "✓ Found ${module}.ts"
  else
    echo "✗ Missing ${module}.ts"
  fi
done

Length of output: 1296


Script:

#!/bin/bash
# Check for circular dependencies in schema files

echo "Checking entries.ts imports:"
rg "import.*from.*\./.*" apps/billing/src/lib/db-marketing/schemas/entries.ts

echo -e "\nChecking sections.ts imports:"
rg "import.*from.*\./.*" apps/billing/src/lib/db-marketing/schemas/sections.ts

echo -e "\nChecking evals.ts imports:"
rg "import.*from.*\./.*" apps/billing/src/lib/db-marketing/schemas/evals.ts

# Check if all referenced files exist
echo -e "\nVerifying all imported files exist:"
fd -t f "searchQuery.ts|keywords.ts" apps/billing/src/lib/db-marketing/schemas/

Length of output: 928

Comment on lines +10 to +13
"db:studio": "dotenv -e .env drizzle-kit studio --verbose",
"db:generate": "dotenv -e .env drizzle-kit generate",
"db:migrate": "dotenv -e .env drizzle-kit migrate",
"db:pull": "dotenv -e .env drizzle-kit pull"

💡 Codebase verification

Missing environment file for database configuration in billing app

The database scripts require environment variables, but there's no .env.example file in the billing app to document the required configuration. Other apps in the monorepo have example files that show the required database variables:

  • DATABASE_HOST
  • DATABASE_USERNAME
  • DATABASE_PASSWORD

Please add an .env.example file to the apps/billing directory documenting these required database variables.

🔗 Analysis chain

LGTM! Verify environment variable setup.

The new database scripts align well with Drizzle ORM operations. The verbose flag for studio is a good addition for debugging.

Let's verify the required environment variables are documented:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if environment variables are documented in README or example files

# Test: Look for environment variable documentation
rg -i "OPENAI_API_KEY|SERPER_API_KEY|FIRECRAWL_API_KEY|GITHUB_PERSONAL_ACCESS_TOKEN" README.md .env.example

Length of output: 162


Script:

#!/bin/bash
# Let's check for any .env files or environment documentation in the billing app
fd -t f ".env" apps/billing/

# Also check for environment variable documentation in billing app's README
fd -t f "README" apps/billing/

# Look for environment setup instructions in the root README
cat README.md

# Check if there's a template or example env file in the root
fd -t f ".env.example|.env.template" .

Length of output: 1035


Script:

#!/bin/bash
# Let's check the content of example env files to find database-related variables
cat apps/dashboard/.env.example apps/play/.env.example apps/workflows/.env.example apps/www/.env.example tools/migrate/.env.example

# Check the contributing guide for environment setup instructions
cat .github/CONTRIBUTING.md

# Look for any database configuration documentation
rg -i "database|drizzle|planetscale" .github/

Length of output: 6556

"@chronark/zod-bird": "0.3.9",
"@clerk/nextjs": "^4.29.10",
"@mendable/firecrawl-js": "^1.5.2",
"@octokit/rest": "^21.0.2",

💡 Codebase verification

Document required GitHub token scopes for PR creation

Based on the code analysis, the GitHub token needs the following minimum required scopes to function:

  • repo scope for creating/updating files and branches
  • pull_requests scope for creating PRs

The token is used appropriately with specific operations:

  • Reading refs (getRef, listMatchingRefs)
  • Managing branches (deleteRef, createRef)
  • File operations (createOrUpdateFileContents)
  • PR creation (pulls.create)

Consider documenting these required scopes in:

  • Environment variable documentation
  • Repository setup instructions
  • CI/CD configuration
🔗 Analysis chain

Verify GitHub token scope requirements.

The addition of @octokit/rest aligns with the PR creation functionality. However, ensure that:

  1. The required GitHub token scopes are documented
  2. The token is used with minimal required permissions

Let's check for GitHub token scope documentation:


Let me gather more information about the GitHub token usage and configuration:


Let me check the actual PR creation code to understand the required permissions:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Look for GitHub token scope documentation

# Test 1: Search for GitHub token scope documentation
rg -i "github.*token.*scope|github.*scope|github.*permission" -g '!*.{lock,sum}'

# Test 2: Look for Octokit client initialization to verify used scopes
ast-grep --pattern 'new Octokit({ auth: $_ })'

Length of output: 4148


Script:

#!/bin/bash
# Search for environment variable documentation and configuration
rg -i "GITHUB_PERSONAL_ACCESS_TOKEN" -A 5 -B 5

# Look for PR creation related code to understand token usage
ast-grep --pattern 'octokit.$_($$$)'

Length of output: 974


Script:

#!/bin/bash
# Look for the complete PR creation flow
rg -l "create-pr" 

# Then examine the specific file content
cat apps/billing/src/trigger/glossary/create-pr.ts

Length of output: 4841

Comment on lines +8 to +26
export const entries = mysqlTable(
"entries",
{
id: int("id").primaryKey().autoincrement(),
inputTerm: varchar("input_term", { length: 255 }).notNull(),
githubPrUrl: varchar("github_pr_url", { length: 255 }),
dynamicSectionsContent: text("dynamic_sections_content"),
metaTitle: varchar("meta_title", { length: 255 }),
metaDescription: varchar("meta_description", { length: 255 }),
createdAt: timestamp("created_at").notNull().defaultNow(),
updatedAt: timestamp("updated_at")
.notNull()
.defaultNow()
.$onUpdate(() => new Date()),
},
(table) => ({
inputTermIdx: index("input_term_idx").on(table.inputTerm),
}),
);

🛠️ Refactor suggestion

Consider strengthening field constraints and adding performance optimizations.

Several suggestions to improve the table definition:

  1. The githubPrUrl field should be marked as notNull() since PR objectives indicate this is a required part of the workflow.
  2. Consider a larger column type for dynamicSectionsContent, e.g. mediumtext("dynamic_sections_content") from drizzle-orm/mysql-core, since MySQL TEXT caps at ~64 KB while generated section content can grow.
  3. Add validation for the URL format.
  4. Consider adding a composite index on (created_at, updated_at) for better query performance on time-based operations.
 export const entries = mysqlTable(
   "entries",
   {
     id: int("id").primaryKey().autoincrement(),
     inputTerm: varchar("input_term", { length: 255 }).notNull(),
-    githubPrUrl: varchar("github_pr_url", { length: 255 }),
+    githubPrUrl: varchar("github_pr_url", { length: 255 }).notNull(),
-    dynamicSectionsContent: text("dynamic_sections_content"),
+    dynamicSectionsContent: mediumtext("dynamic_sections_content"),
     metaTitle: varchar("meta_title", { length: 255 }),
     metaDescription: varchar("meta_description", { length: 255 }),
     createdAt: timestamp("created_at").notNull().defaultNow(),
     updatedAt: timestamp("updated_at")
       .notNull()
       .defaultNow()
       .$onUpdate(() => new Date()),
   },
   (table) => ({
     inputTermIdx: index("input_term_idx").on(table.inputTerm),
+    timeBasedIdx: index("time_based_idx").on(table.createdAt, table.updatedAt),
   }),
 );

Committable suggestion skipped: line range outside the PR's diff.
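For point 3, a runtime check at the application boundary is one option (a sketch; the expected PR URL shape is an assumption):

```typescript
// Accept only URLs like https://github.com/<owner>/<repo>/pull/<number>.
function isValidGithubPrUrl(value: string): boolean {
  try {
    const url = new URL(value);
    return url.hostname === "github.com" && /^\/[^/]+\/[^/]+\/pull\/\d+$/.test(url.pathname);
  } catch {
    return false; // not parseable as a URL at all
  }
}
```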

Comment on lines +3 to +5
:tv: Demo



⚠️ Potential issue

Add missing demo content.

The Demo section is empty but referenced in the main changes section. Please add:

  • A video/gif demonstration, or
  • Step-by-step instructions with screenshots

This will help reviewers understand the workflow's functionality better.

Comment on lines +74 to +79
const result = await generateObject({
model: openai("gpt-4o-mini"),
system: systemPrompt,
prompt: `Review this content and provide numerical ratings:\n${options.content}`,
schema: ratingsSchema,
});

🛠️ Refactor suggestion

Add error handling for 'generateObject' function calls

The generateObject function may throw errors if the API call fails or if the response doesn't match the schema. To prevent unhandled exceptions, consider wrapping these calls in try-catch blocks and handling errors appropriately.

Apply this diff to handle errors:

 try {
   const result = await generateObject({
     model: openai("gpt-4o-mini"),
     system: systemPrompt,
     prompt: `Review this content and provide numerical ratings:\n${options.content}`,
     schema: ratingsSchema,
   });
+  return result;
+} catch (error) {
+  const message = error instanceof Error ? error.message : String(error);
+  throw new AbortTaskRunError(`Failed to generate ratings: ${message}`);
 }
-
-return result;

Also applies to: 129-134

Comment on lines +26 to +29
// TODO: this task is a bit flake-y still
// - split up into smaller tasks, and/or
// - move some of the in-memory storage to db caching, and/or
// - improve the prompts

🛠️ Refactor suggestion

Address the TODO: Improve task reliability and performance

The code contains a TODO comment indicating that the generateOutlineTask is flaky. To enhance reliability and maintainability, consider the following actions:

  • Split the task into smaller, atomic tasks: Breaking down the task can improve error isolation and make the workflow more manageable.
  • Persist intermediate data to the database: This reduces reliance on in-memory storage, aids in caching, and enhances the idempotency of tasks.
  • Refine the AI prompts: Improving prompt engineering can lead to more consistent and accurate AI responses.
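The second bullet can be sketched as a generic helper that sub-tasks share (names and shapes are illustrative, not from the PR):

```typescript
// Check the db for a previous result; compute and persist only on a miss.
async function withDbCache<T>(
  lookup: () => Promise<T | undefined>,
  compute: () => Promise<T>,
  persist: (value: T) => Promise<void>,
): Promise<T> {
  const cached = await lookup();
  if (cached !== undefined) {
    return cached;
  }
  const fresh = await compute();
  await persist(fresh);
  return fresh;
}
```

Because every call checks the store first, a retried or re-run task becomes idempotent: expensive AI calls happen at most once per term.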

if (organicResults.length === 0) {
throw new AbortTaskRunError(`No organic results found for term: ${term}`);
}
console.info(`Step 1/8 - ORGANIC RESULTS: ${organicResults?.length} results`);

⚠️ Potential issue

Avoid logging sensitive or extensive data to the console

The use of console.info to log extensive content, such as topRankingContent, may expose sensitive information and clutter the logs. Specifically, the following lines log potentially large amounts of data:

  • Line 83: Logs the number of organic results (acceptable).
  • Line 99 & 107: Logs the entire topRankingContent, which may be redundant and excessive.

Consider adjusting the logging strategy:

  • Log summaries or metadata instead of full content: Provide concise information that aids debugging without exposing full content.
  • Remove duplicate log statements: The log at line 107 appears to duplicate the one at line 99.

Apply this diff to adjust the logging:

 console.info(`Step 1/8 - ORGANIC RESULTS: ${organicResults?.length} results`);

-console.info(`Step 3/8 - SUMMARIES: ${topRankingContent}`);
+console.info(`Step 2/8 - Retrieved and summarized top-ranking content.`);

-console.info(`Step 3/8 - SUMMARIES: ${topRankingContent}`);

Also applies to: 99-99, 107-107

Comment on lines +119 to +127
// Step 5: Technical review by domain expert
const technicalEval = await performTechnicalEvalTask.triggerAndWait({
input: term,
content: topRankingContent,
onCacheHit,
});
if (!technicalEval.ok) {
throw new AbortTaskRunError("Technical evaluation failed");
}

⚠️ Potential issue

Handle potential null values in technicalEval

Before accessing technicalEval.output, ensure that technicalEval and technicalEval.output are not null or undefined to prevent runtime errors.

Apply this diff to add null checks:

 if (!technicalEval.ok || !technicalEval.output) {
   throw new AbortTaskRunError("Technical evaluation failed");
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

 // Step 5: Technical review by domain expert
 const technicalEval = await performTechnicalEvalTask.triggerAndWait({
   input: term,
   content: topRankingContent,
   onCacheHit,
 });
-if (!technicalEval.ok) {
+if (!technicalEval.ok || !technicalEval.output) {
   throw new AbortTaskRunError("Technical evaluation failed");
 }

Comment on lines +228 to +243
const newEntry = await db.query.entries.findFirst({
where: eq(entries.id, entry.id),
orderBy: (entries, { desc }) => [desc(entries.createdAt)],
with: {
dynamicSections: {
with: {
contentTypes: true,
sectionsToKeywords: {
with: {
keyword: true,
},
},
},
},
},
});

⚠️ Potential issue

Handle potential null return from db.query.entries.findFirst

Ensure that newEntry is not null before returning it. This prevents unexpected null values and potential runtime errors in downstream code.

Apply this diff to add a null check:

 const newEntry = await db.query.entries.findFirst({
   where: eq(entries.id, entry.id),
   // ...rest of the query
 });

+if (!newEntry) {
+  throw new AbortTaskRunError(`New entry not found for id: ${entry.id}`);
+}

 return newEntry;

Committable suggestion skipped: line range outside the PR's diff.

@vercel vercel bot temporarily deployed to Preview – engineering November 11, 2024 12:26 Inactive
@chronark chronark merged commit ba369f9 into unkeyed:main Nov 11, 2024