feat(billing): add glossary entry generation workflow #2642
chronark merged 12 commits into unkeyed:main from
Conversation
Implement generate-outline task for API glossary content

- Created generateOutlineTask with the following steps:
  1. Fetch top-ranking pages' markdown content
  2. Summarize content using AI
  3. Generate initial outline
  4. Perform technical review
  5. Conduct SEO review
  6. Optimize outline for SEO
  7. Perform editorial review
- Implemented helper functions:
  - generateInitialOutline: Creates a flat, comprehensive outline
  - performTechnicalReview: Evaluates coverage, comprehensiveness, and conciseness
  - performSEOReview: Analyzes keyword usage and SEO optimization
  - reviseSEOOutline: Refines outline based on SEO recommendations
  - performEditorialReview: Ensures readability, flow, and engagement
- Utilized OpenAI's GPT-4 for AI-assisted content generation and review
- Integrated with database queries for keyword and content retrieval
- Implemented robust error handling and logging throughout the process
0. Added generate entry task
1. Fixed the initial error with search query timestamps by omitting them from the schema
2. Improved the firecrawl code to be more robust and debuggable
3. Added better error handling and logging throughout the keyword research task
4. Fixed the entry creation issue by ensuring entries are created alongside search queries
📝 Walkthrough

This pull request introduces a series of changes across multiple files in the billing application, primarily focused on enhancing database schema definitions, adding new functionalities related to glossary entry generation, and improving task management through the Trigger.dev SDK. Key additions include new files for instructions on using Drizzle ORM and prompt engineering, as well as the introduction of several tasks for generating glossary entries, SEO meta tags, and evaluations.
@p6l-richard is attempting to deploy a commit to the Unkey Team on Vercel. A member of the Team first needs to authorize it.
Thank you for following the naming conventions for pull request titles! 🙏
…parated, simplified & db cached
Actionable comments posted: 36
🧹 Outside diff range and nitpick comments (53)
apps/billing/src/lib/db-marketing/schemas/index.ts (1)
Line range hint 1-8: Consider reorganizing exports for better maintainability.

The current export order appears random. Consider grouping related modules together and following a consistent organization pattern:

```diff
+// Search-related schemas
 export * from "./serper";
 export * from "./searchQuery";
-export * from "./entries";
 export * from "./keywords";
 export * from "./firecrawl";
-export * from "./sections";
-export * from "./evals";
+
+// Content-related schemas
+export * from "./entries";
+export * from "./sections";
+
+// Evaluation schemas
+export * from "./evals";
```

apps/billing/tsconfig.json (1)
3-3: Document minimum environment requirements.

Since the target is now set to `esnext`, consider documenting the minimum required Node.js version and browser requirements in the README to help other developers and users.

Would you like me to help create a documentation section about environment requirements?
apps/billing/src/lib/db-marketing/schemas/entries.ts (2)
28-34: Consider optimizing relations for better performance and data integrity.

The relations are correctly defined, but consider these improvements:

- Add an index on the foreign key fields to optimize join operations
- Define explicit `onDelete` behavior to maintain data integrity

```diff
 export const entriesRelations = relations(entries, ({ many, one }) => ({
   dynamicSections: many(sections),
   searchQuery: one(searchQueries, {
     fields: [entries.inputTerm],
     references: [searchQueries.inputTerm],
+    relationName: "entry_search_query",
+    onDelete: "cascade",
   }),
 }));
```
36-40: Enhance schema validation with additional rules and custom error messages.

Consider extending the insert schema with additional validation rules:

```diff
-export const insertEntrySchema = createInsertSchema(entries).extend({}).omit({ id: true });
+export const insertEntrySchema = createInsertSchema(entries).extend({
+  githubPrUrl: z.string().url({ message: "Invalid GitHub PR URL format" }),
+  metaTitle: z.string().min(10).max(60),
+  metaDescription: z.string().min(50).max(160),
+}).omit({ id: true });
```

apps/billing/src/lib/db-marketing/schemas/keywords.ts (2)
27-30: Remove unnecessary `.extend({})`.

The schema definitions look good, but there's an unnecessary empty extension.

```diff
-export const insertKeywordsSchema = createInsertSchema(keywords).extend({}).omit({ id: true });
+export const insertKeywordsSchema = createInsertSchema(keywords).omit({ id: true });
```
Line range hint 9-26: Consider adding composite indexes for common query patterns.

Given this is part of a glossary entry generation workflow, consider adding composite indexes for frequently accessed combinations of fields (e.g., `source` + `keyword`) to optimize query performance.

Example addition to the table definition:

```diff
 (table) => ({
   inputTermIdx: index("input_term_idx").on(table.inputTerm),
   sourceUrlIdx: index("source_url_idx").on(table.sourceUrl),
+  sourceKeywordIdx: index("source_keyword_idx").on(table.source, table.keyword),
   uniqueKeyword: unique("keywords_input_term_keyword_unique").on(table.inputTerm, table.keyword),
 }),
```

apps/billing/src/lib/db-marketing/schemas/firecrawl.ts (2)
1-1: Remove unused import.

The `sql` import from `drizzle-orm` is not used in this file.

```diff
-import { relations, sql } from "drizzle-orm";
+import { relations } from "drizzle-orm";
```

Also applies to: 13-14
58-61: Remove unnecessary schema extension.

The empty `.extend({})` call can be removed as it doesn't modify the schema.

```diff
-export const insertFirecrawlResponseSchema = createInsertSchema(firecrawlResponses)
-  .extend({})
-  .omit({ id: true });
+export const insertFirecrawlResponseSchema = createInsertSchema(firecrawlResponses)
+  .omit({ id: true });
```

apps/billing/src/lib/db-marketing/schemas/serper.ts (1)
49-49: Add an index for the foreign key column.

Consider adding an index for `firecrawlResponseId` to improve query performance when joining with the `firecrawl_responses` table.

```diff
 (table) => ({
   searchResponseIdIdx: index("search_response_id_idx").on(table.searchResponseId),
   linkIdx: index("link_idx").on(table.link),
+  firecrawlResponseIdIdx: index("firecrawl_response_id_idx").on(table.firecrawlResponseId),
 }),
```

apps/billing/src/trigger.instructions.txt (2)
12-12: Fix grammatical issues.

Add missing commas in the following locations:
- "There are different types of tasks, including regular tasks and [scheduled tasks]"
- "For more information, read [the machines guide]"
- "By default, tasks can execute indefinitely"
Also applies to: 115-115, 130-130
🧰 Tools
🪛 LanguageTool
[uncategorized] ~12-~12: Possible missing comma found.
Context: ...ure." --- There are different types of tasks including regular tasks and [scheduled ...(AI_HYDRA_LEO_MISSING_COMMA)
309-318: Add glossary workflow examples.

Since this documentation is part of the glossary entry generation workflow PR, it would be helpful to:
- Add a section with practical examples demonstrating how to implement the seven sequential sub-tasks (keyword research, outline generation, etc.)
- Include code snippets showing how these tasks interact with PlanetScale's marketing instance
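To make the suggestion concrete, here is a minimal, framework-free sketch of the sequential sub-task flow such an example could show. The `Step` type, `runGlossaryPipeline` helper, and the stub steps are all hypothetical illustrations, not the actual Trigger.dev API or the PR's real tasks:

```typescript
// Illustrative sketch: run sub-tasks in order and abort on the first failure,
// mirroring the workflow's sequential orchestration. Names are made up.
type StepResult = { ok: boolean; output: string };
type Step = { name: string; run: (term: string) => Promise<StepResult> };

async function runGlossaryPipeline(term: string, steps: Step[]): Promise<string[]> {
  const log: string[] = [];
  for (const [i, step] of steps.entries()) {
    const result = await step.run(term);
    if (!result.ok) {
      // Abort the whole workflow on the first failed sub-task.
      throw new Error(`${step.name} failed for term: ${term}`);
    }
    log.push(`${i + 1}/${steps.length} - ${step.name}: ${result.output}`);
  }
  return log;
}

// Two stub steps standing in for the real sub-tasks (keyword research, outline, ...).
const steps: Step[] = [
  { name: "keyword research", run: async (t) => ({ ok: true, output: `keywords for ${t}` }) },
  { name: "generate outline", run: async (t) => ({ ok: true, output: `outline for ${t}` }) },
];
```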
apps/billing/cursor/prompt-engineering.md (4)
37-38: Fix list indentation for consistency.

The list items have inconsistent indentation levels. Some items use 3 spaces while others use 4 spaces. This inconsistency might affect readability and markdown rendering.

Ensure all list items at the same level use consistent indentation (3 spaces is the standard).
Also applies to: 62-62
🧰 Tools
🪛 Markdownlint
37-37: Expected: 3; Actual: 4
Inconsistent indentation for list items at the same level(MD005, list-indent)
38-38: Expected: 3; Actual: 4
Inconsistent indentation for list items at the same level(MD005, list-indent)
47-53: Standardize code block formatting.

Code blocks have inconsistent formatting:

- Missing language specifiers for syntax highlighting
- Mixing tilde (~) and backtick (`) fence styles

Apply these improvements:

- Add appropriate language specifiers to all code blocks
- Standardize on backticks for code fences

Example:

````diff
-~~~
+```python
 def example():
     pass
-~~~
+```
````

Also applies to: 65-77, 95-128, 187-187, 299-299, 446-446, 606-606, 700-700
🧰 Tools
🪛 Markdownlint
47-47: null
Fenced code blocks should have a language specified(MD040, fenced-code-language)
53-53: null
Fenced code blocks should have a language specified(MD040, fenced-code-language)
170-170: Add alt text to all images for accessibility.

All images in the document are missing alternative text, which is essential for:
- Screen reader accessibility
- SEO optimization
- Fallback when images fail to load
Add descriptive alt text to each image. For example:
```diff
-<img width="550" src="..." title="GPT-4 answering questions about a set of expenses.">
+<img width="550" src="..." alt="Screenshot showing GPT-4 accurately answering questions about expense data" title="GPT-4 answering questions about a set of expenses.">
```

Also applies to: 180-180, 386-392, 404-408, 422-430, 440-440, 600-600, 691-691, 716-724, 742-748, 756-762, 770-770
🧰 Tools
🪛 Markdownlint
170-170: null
Images should have alternate text (alt text)(MD045, no-alt-text)
417-417: Refine grammar and style for improved readability.

Several minor improvements would enhance the document's clarity:
- Remove redundant phrases (e.g., "close proximity" → "proximity")
- Add missing commas in compound sentences
- Use en dashes (–) instead of hyphens (-) for ranges
Apply these refinements throughout the document for improved professionalism and readability.
Also applies to: 682-682, 686-686, 719-719, 721-721, 753-753, 767-767, 779-779
🧰 Tools
🪛 LanguageTool
[style] ~417-~417: This phrase is a bit redundant. Consider using just “proximity”.
Context: ...that the model handles really well. The close proximity ofkeysto theirvaluesmakes it ea...(CLOSE_SCRUTINY)
apps/billing/src/lib/db-marketing/schemas/evals.ts (5)
18-20: Consider using the 'json' column type for JSON data storage

The fields `ratings`, `recommendations`, and `outline` store JSON-stringified data in `text` columns. If your MySQL version supports it, consider using the `json` column type provided by Drizzle ORM for better type safety and query capabilities.

Apply this diff to update the column types:

```diff
-  ratings: text("ratings").notNull(), // JSON stringified ratings
-  recommendations: text("recommendations").notNull().default("[]"), // Add default empty array
-  outline: text("outline").default("[]"), // Add outline field
+  ratings: json("ratings").notNull(),
+  recommendations: json("recommendations").notNull().default([]),
+  outline: json("outline").default([]),
```
39-49: Simplify `ratingValueSchema` by using `z.preprocess`

The current `ratingValueSchema` uses a union and multiple transformations to handle both numbers and numeric strings. You can simplify it by using `z.preprocess` to coerce inputs to numbers before validation.

Here's the refactored schema:

```typescript
const ratingValueSchema = z.preprocess(
  (val) => Number(val),
  z.number().int().min(0).max(10),
);
```
51-61: Remove redundant `transform` in `ratingsSchema`

Since `ratingValueSchema` already ensures that the values are numbers, the `transform` step in `ratingsSchema` that converts the values to numbers is redundant. Consider removing it to simplify the schema.

Here's the updated schema:

```typescript
export const ratingsSchema = z.object({
  accuracy: ratingValueSchema,
  completeness: ratingValueSchema,
  clarity: ratingValueSchema,
});
```
74-74: Remove unnecessary `.extend({})` call in `insertEvalSchema`

The `.extend({})` in `insertEvalSchema` is unnecessary since no additional fields are being added. Removing it improves code readability.

Update the code:

```typescript
export const insertEvalSchema = createInsertSchema(evals).omit({ id: true });
```
78-79: Ensure consistent type inference for `InsertEval` and `SelectEval`

For consistency in type definitions, consider inferring `SelectEval` from `selectEvalSchema` using `z.infer`, as you did with `InsertEval`.

Update the type definition:

```typescript
export type SelectEval = z.infer<typeof selectEvalSchema>;
```

apps/billing/src/lib/search-query.ts (4)
16-26: Refactor to Avoid Duplicate Insertion Logic into `entries` Table

The code for inserting into the `entries` table is duplicated when an existing query is found (lines 16-26) and within the transaction for new queries (lines 65-75). Refactoring this duplication can improve maintainability and reduce potential errors.

Consider extracting the insertion logic into a separate function or moving it outside the conditional blocks if appropriate.
Also applies to: 65-75
31-32: Offer Assistance for Improving Prompting Logic

The comment indicates that the prompting requires improvements to generate better search queries.

Would you like assistance in refining the prompt to enhance the quality of the generated search queries? I can help draft an improved prompt or suggest strategies to improve prompt engineering.
Line range hint 34-49: Implement Error Handling for `generateObject` Call

The `generateObject` function may throw an error if the generation fails. Implementing error handling ensures that such exceptions are handled gracefully.

Apply this diff to wrap the `generateObject` call in a try-catch block:

```diff
+  try {
     const generatedQuery = await generateObject({
       model: openai("gpt-4o-mini"),
       system: `You are a Senior Content Writer who specialises in writing technical content for Developer Tools that are SEO optimized.
     For every term, you conduct a search on Google to gather the data you need.
     You're goal is to create a search query that will return a SERP with the most relevant information for the term.
     Make sure to always include the exact term in the search query at the beginning of the query.
     If the term is clearly associated to API development, use the term as-is for query.
     If the term is ambiguous with non-API development related fields and could result in unrelated results, add context to the search query to clarify the search & return the reason for the ambiguity.
     Keep the search query as short and as simple as possible, don't use quotes around the search query.
     `,
       prompt: `Create the search query for the term "${term}."`,
       schema: insertSearchQuerySchema.omit({ createdAt: true, updatedAt: true }),
     });
+  } catch (error) {
+    // Handle the error appropriately
+    throw new Error(`Failed to generate search query: ${error.message}`);
+  }
```
52-75: Enhance Error Handling in Database Transaction

Adding error handling within the transaction block can help manage exceptions effectively and maintain data integrity.

Consider implementing error handling within the transaction:

```diff
 await db.transaction(async (tx) => {
+  try {
     // Insert search query
     await tx
       .insert(searchQueries)
       .values({
         ...generatedQuery.object,
       })
       .onDuplicateKeyUpdate({
         set: {
           updatedAt: sql`now()`,
         },
       });
     // Insert entry
     await tx
       .insert(entries)
       .values({
         inputTerm: term,
       })
       .onDuplicateKeyUpdate({
         set: {
           updatedAt: sql`now()`,
         },
       });
+  } catch (error) {
+    // Handle transaction errors
+    throw new Error(`Transaction failed: ${error.message}`);
+  }
 });
```

apps/billing/src/lib/db-marketing/schemas/sections.ts (3)
25-25: Avoid using reserved keywords as column names

Using `order` as a column name can lead to conflicts since `ORDER` is a reserved keyword in SQL. Consider renaming this field to something like `sequence` or `position` to prevent potential issues.
39-41: Define related tables before referencing them

The `sectionsRelations` references `sectionContentTypes` and `sectionsToKeywords` before they are defined. While JavaScript allows this due to hoisting, defining these tables before they are referenced can improve code readability and maintainability.
47-57: Consider using an enum for `contentTypes`

Defining `contentTypes` as a constant array is effective, but using a TypeScript `enum` could enhance type safety and code clarity in the long run.

Here's how you might define it:

```typescript
enum ContentType {
  Listicle = "listicle",
  Table = "table",
  Image = "image",
  Code = "code",
  Infographic = "infographic",
  Timeline = "timeline",
  Other = "other",
  Text = "text",
  Video = "video",
}
```

Then, update the `mysqlEnum` definition accordingly.

apps/billing/src/trigger/glossary/keyword-research.ts (5)
13-13: Rename constant 'THREE' to a more descriptive name

The constant `THREE` may not clearly convey its purpose. Consider renaming it to something more descriptive like `TOP_RESULTS_COUNT` for better readability.

Apply this diff to rename the constant and update its usages:

```diff
-export const THREE = 3;
+export const TOP_RESULTS_COUNT = 3;
```

Update references in the code:

```diff
-console.info(`3/6 - Getting content for top ${THREE} results`);
+console.info(`3/6 - Getting content for top ${TOP_RESULTS_COUNT} results`);
-const topThree = searchResponse.serperOrganicResults
+const topResults = searchResponse.serperOrganicResults
   .sort((a, b) => a.position - b.position)
-  .slice(0, THREE);
+  .slice(0, TOP_RESULTS_COUNT);
```
48-48: Avoid logging sensitive information

Similar to the previous comment, ensure that logging does not expose sensitive data. While logging the number of results is acceptable, be cautious with other details.
112-112: Deduplicate keywords before returning

When combining keywords from multiple sources, duplicates may occur. Deduplicating the keywords enhances the quality of the results.

Add deduplication logic:

```diff
-const allKeywords = [
-  ...keywordsFromTitles,
-  ...keywordsFromHeaders,
-  ...insertedRelatedSearches,
-];
+const keywordMap = new Map();
+[...keywordsFromTitles, ...keywordsFromHeaders, ...insertedRelatedSearches].forEach((k) => {
+  keywordMap.set(k.keyword, k);
+});
+const uniqueKeywords = Array.from(keywordMap.values());
```

Update the return statement:

```diff
-return {
-  message: 'Keyword Research completed',
-  keywords: [...keywordsFromTitles, ...keywordsFromHeaders, ...insertedRelatedSearches],
-};
+return {
+  message: 'Keyword Research completed',
+  keywords: uniqueKeywords,
+};
```
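The Map-based deduplication suggested above can be exercised in isolation. The following is a self-contained sketch with fabricated sample keywords (the `Keyword` shape and the sample values are illustrative, not the real schema):

```typescript
// Self-contained version of the Map-based deduplication: one entry per
// keyword string, where later occurrences overwrite earlier ones.
type Keyword = { keyword: string; source: string };

function dedupeKeywords(...lists: Keyword[][]): Keyword[] {
  const keywordMap = new Map<string, Keyword>();
  for (const k of lists.flat()) {
    keywordMap.set(k.keyword, k);
  }
  return Array.from(keywordMap.values());
}

// Fabricated sample data.
const fromTitles: Keyword[] = [{ keyword: "api key", source: "title" }];
const fromHeaders: Keyword[] = [
  { keyword: "api key", source: "header" },
  { keyword: "rate limit", source: "header" },
];
```

Note the "last one wins" behavior: if the same keyword appears in both lists, the entry from the later list replaces the earlier one, which may or may not be the desired precedence.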
15-115: Add unit tests for 'keywordResearchTask'

Implementing unit tests for the `keywordResearchTask` ensures its correctness and reliability. Cover scenarios like cache hits, cache misses, error handling, and edge cases.
22-22: Type annotation improvement for default parameter

The current type casting using `as CacheStrategy` might be unnecessary. Consider specifying the type directly in the parameter definition for clarity.

Update the parameter definition:

```diff
-run: async ({
-  term,
-  onCacheHit = "stale" as CacheStrategy,
-}: { term: string; onCacheHit?: CacheStrategy }) => {
+run: async ({
+  term,
+  onCacheHit = "stale",
+}: { term: string; onCacheHit?: CacheStrategy }) => {
```

Ensure that `"stale"` is a valid value of `CacheStrategy`.

apps/billing/src/trigger/glossary/create-pr.ts (3)
14-16: Adjust parameter typing for 'onCacheHit'

Using a type assertion in the default parameter value may cause issues. Instead, explicitly type the parameter to ensure proper type checking.

Apply this diff to adjust the parameter typing:

```diff
-run: async ({
-  input,
-  onCacheHit = "stale" as CacheStrategy,
-}: { input: string; onCacheHit?: CacheStrategy }) => {
+run: async ({
+  input,
+  onCacheHit = "stale",
+}: { input: string; onCacheHit?: CacheStrategy }) => {
+  onCacheHit = onCacheHit as CacheStrategy;
```

Alternatively, define the parameter type directly:

```diff
-run: async ({
-  input,
-  onCacheHit = "stale" as CacheStrategy,
-}: { input: string; onCacheHit?: CacheStrategy }) => {
+run: async ({
+  input,
+  onCacheHit = "stale",
+}: { input: string; onCacheHit?: CacheStrategy & string }) => {
```
18-26: Consolidate database queries to improve performance

The code performs two separate database queries to fetch `existing` and `entry`, which are essentially fetching the same data. Consolidate these queries to reduce redundant calls.

Combine the queries into one and reuse the result:

```diff
-const existing = await db.query.entries.findFirst({
-  where: eq(entries.inputTerm, input),
-  columns: {
-    id: true,
-    inputTerm: true,
-    githubPrUrl: true,
-  },
-  orderBy: (entries, { desc }) => [desc(entries.createdAt)],
-});
+const entry = await db.query.entries.findFirst({
+  where: eq(entries.inputTerm, input),
+  orderBy: (entries, { desc }) => [desc(entries.createdAt)],
+});
 ...
-if (existing?.githubPrUrl && onCacheHit === "stale") {
+if (entry?.githubPrUrl && onCacheHit === "stale") {
   return {
-    entry: existing,
+    entry,
     message: `Found existing PR for ${input}.mdx`,
   };
 }
-if (!entry?.dynamicSectionsContent) {
+if (!entry || !entry.dynamicSectionsContent) {
   throw new AbortTaskRunError(
     `Unable to create PR: The markdown content for the dynamic sections are not available for the entry to term: ${input}. It's likely that draft-sections.ts didn't run as expected.`,
   );
 }
```

Also applies to: 36-39
122-128: Enhance the pull request description for clarity

The current PR body is minimal. Providing more context can help reviewers understand the changes better.

Consider updating the PR body:

```diff
 body: `This PR adds the ${sanitizedInput}.mdx file to the API documentation.
+
+Please review the content for accuracy and completeness. Let me know if any changes are required.
 `,
```

apps/billing/src/trigger/glossary/_generate-glossary-entry.ts (3)
13-33: Fix typographical errors in comments

There are a couple of typos in the comments:

- In line 29, "undeterministic" should be "non-deterministic".
- In line 30, "mititage" should be "mitigate".

Apply this diff to correct the typos:

```diff
-* Sub-tasks may (in fact most of them do) rely on undeterministic execution when they use LLMs. As a result, the maxAttempts for a subtask is 5.
+* Sub-tasks may (in fact most of them do) rely on non-deterministic execution when they use LLMs. As a result, the maxAttempts for a subtask is 5.
-* The downside here is that this increases the runtime of the workflow considerably. Ways to mititage this are cited in the `todo` comment inside @file ./generate-outline.ts.
+* The downside here is that this increases the runtime of the workflow considerably. Ways to mitigate this are cited in the `todo` comment inside @file ./generate-outline.ts.
```
113-115: Clarify the purpose of the conditional check

The comment `// this if statement is here to make TypeScript happy` may not provide clear context. Consider rephrasing the comment to explain why this check is necessary for TypeScript type safety.

Apply this diff to improve the comment:

```diff
-  // this if statement is here to make TypeScript happy
+  // Ensure pr.output.entry.id is defined before proceeding
```
72-104: Refactor repeated sub-task execution pattern

The code for executing and handling sub-tasks is repeated for each step. Consider refactoring this pattern into a utility function to reduce duplication and improve maintainability.

Example refactoring:

```typescript
async function executeSubTask<T>({
  stepNumber,
  stepName,
  taskFunction,
  term,
  onCacheHit,
  successMessage,
  errorPrefix,
}: {
  stepNumber: number;
  stepName: string;
  taskFunction: (args: { term: string; onCacheHit?: CacheStrategy }) => Promise<{ ok: boolean; output?: T }>;
  term: string;
  onCacheHit?: CacheStrategy;
  successMessage: string;
  errorPrefix: string;
}) {
  console.info(`${stepNumber}/5 - ${stepName}...`);
  const result = await taskFunction({ term, onCacheHit });
  if (!result.ok) {
    throw new AbortTaskRunError(`${errorPrefix} failed for term: ${term}`);
  }
  console.info(successMessage);
  return result;
}
```

Usage in your main function:

```typescript
// Step 1: Keyword Research
const keywordResearch = await executeSubTask({
  stepNumber: 1,
  stepName: "Starting keyword research",
  taskFunction: keywordResearchTask.triggerAndWait,
  term,
  onCacheHit,
  successMessage: `✓ Keyword research completed with ${keywordResearch.output.keywords.length} keywords`,
  errorPrefix: "Keyword research",
});

// Repeat for other steps...
```

apps/billing/src/trigger/glossary/seo-meta-tags.ts (2)
116-122: Add error handling for database update operation

Currently, the code assumes that the database update operation succeeds. Consider adding error handling to catch and handle any potential errors from the database update.

For example:

```typescript
try {
  await db
    .update(entries)
    .set({
      metaTitle: craftedMetaTags.object.title,
      metaDescription: craftedMetaTags.object.description,
    })
    .where(eq(entries.inputTerm, term));
} catch (error) {
  // Handle error appropriately
}
```
54-114: Handle possible OpenAI API errors and rate limits

When calling `generateObject`, it's important to handle potential API errors, such as rate limits or network issues, to enhance the robustness of your application.

Consider wrapping the API call in a try-catch block and implementing retries or fallback mechanisms as needed.
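One way to implement the suggested retry behavior is a small generic wrapper with exponential backoff. The `withRetries` helper and its delay values are illustrative assumptions; production code would also inspect the error (e.g. only retry HTTP 429/5xx responses) before retrying:

```typescript
// Generic retry helper with exponential backoff: waits baseDelayMs, then
// 2x, 4x, ... between attempts, and rethrows the last error if all fail.
async function withRetries<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Back off before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}
```

A call site would then wrap the AI call, e.g. `withRetries(() => generateObject({ ... }))`, keeping the task body unchanged.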
apps/billing/src/lib/firecrawl.ts (1)
Line range hint 27-117: Consider refactoring 'getOrCreateFirecrawlResponse' for better readability

The `getOrCreateFirecrawlResponse` function is quite lengthy and contains multiple nested blocks. Consider refactoring it into smaller helper functions to enhance readability and maintainability.

🧰 Tools
🪛 Biome
[error] 4-4: Declarations inside of an `import` declaration may not have duplicates. A second declaration of `firecrawlResponses` is not allowed; `firecrawlResponses` is first declared here. (parse)
apps/billing/src/lib/keywords.ts (3)
53-53: Fix typos in prompt messages

There are typos in the prompts where "form" should be "from." Correcting these improves clarity.

Apply these diffs to correct the typos:

At line 53:

```diff
-Create a list of keywords that are directly related to the main term and its subtopics form the titles of the pages.
+Create a list of keywords that are directly related to the main term and its subtopics from the titles of the pages.
```

At line 136:

```diff
-Create a list of keywords that are directly related to the main term and its subtopics form the h1 headers of the pages.
+Create a list of keywords that are directly related to the main term and its subtopics from the h1 headers of the pages.
```

Also applies to: 136-136
83-94: Consider processing 'keywordsWithBrandNames' for comprehensive keyword coverage

Currently, the code processes only the `keywords` array returned by the AI model and ignores `keywordsWithBrandNames`. If brand-related keywords are relevant to your SEO strategy, consider processing and storing them as well.

Would you like assistance in updating the code to handle `keywordsWithBrandNames` similarly to `keywords`?

Also applies to: 158-169
30-106: Refactor common logic to improve maintainability

The functions `getOrCreateKeywordsFromTitles` and `getOrCreateKeywordsFromHeaders` share similar structures and logic. Refactoring shared code into helper functions can reduce duplication and enhance maintainability.

Consider extracting common operations such as:
- Checking for existing keywords in the database
- Constructing prompts
- Invoking the AI model and handling the response
- Inserting keywords into the database
Would you like assistance in outlining a refactored structure?
Also applies to: 108-180
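The extraction listed above could take the shape of one shared function with the source-specific pieces injected. Everything below is a hypothetical skeleton: the `KeywordDeps` type and its field names stand in for the real Drizzle queries and AI call, which would be passed in as dependencies:

```typescript
// Shared cache-check -> generate -> persist flow; only the prompt and the
// source label differ between the titles and headers variants.
type KeywordDeps = {
  findExisting: (term: string, source: string) => Promise<string[]>;
  generate: (prompt: string) => Promise<string[]>;
  persist: (term: string, source: string, keywords: string[]) => Promise<void>;
};

async function getOrCreateKeywords(
  term: string,
  source: "titles" | "headers",
  buildPrompt: (term: string) => string,
  deps: KeywordDeps,
): Promise<string[]> {
  const existing = await deps.findExisting(term, source);
  if (existing.length > 0) return existing; // cache hit: skip the AI call
  const keywords = await deps.generate(buildPrompt(term));
  await deps.persist(term, source, keywords);
  return keywords;
}
```

Injecting the dependencies also makes the flow unit-testable with stubs, which ties into the earlier suggestion to add tests.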
apps/billing/src/trigger/glossary/draft-sections.ts (2)
61-63: Consider making the dynamic sections limit configurable

Currently, the code limits `dynamicSections` to the first 6 items using `slice(0, 6)`. To enhance flexibility and maintainability, consider making this limit a configurable parameter or providing documentation to explain the rationale behind this hardcoded value.
142-151: Correct numbering in the 'Guidelines' section of the prompt

The numbering in the `Guidelines` section has duplicate entries and skips numbers, which may cause confusion. Please correct the numbering sequence for clarity.

Apply this diff to fix the numbering:

```diff
 Guidelines:
-1. Write in markdown format.
-2. Start with an introductory paragraph before stating the first section heading
-2. Use "##" for the section heading
-3. Skip the title of the page, that will be provided separately
-4. Ensure the content is factually correct and relevant to API development.
-5. Include examples, code snippets or other rich content if appropriate.
-6. Keep the content concise but informative, ensure that there are no fluff phrases or statements that don't provide concrete information, context & background to the term.
-7. Don't repeat content between sections, ensure that each section adds value
-8. Only write the content for the section, do not provide any other context, introductions or statements regarding this task.
+1. Write in markdown format.
+2. Start with an introductory paragraph before stating the first section heading.
+3. Use "##" for the section heading.
+4. Skip the title of the page; that will be provided separately.
+5. Ensure the content is factually correct and relevant to API development.
+6. Include examples, code snippets, or other rich content if appropriate.
+7. Keep the content concise but informative, ensuring there are no fluff phrases or statements that don't provide concrete information, context, or background to the term.
+8. Don't repeat content between sections; ensure that each section adds value.
+9. Only write the content for the section; do not provide any other context, introductions, or statements regarding this task.
```

apps/billing/src/trigger/glossary/evals.ts (4)
141-209: Refactor duplicated code in evaluation tasks

The functions `performTechnicalEvalTask`, `performSEOEvalTask`, and `performEditorialEvalTask` have similar structures with repetitive code. Consider refactoring them into a single generic function that accepts the evaluation `type` as a parameter. This will reduce code duplication and improve maintainability.

Also applies to: 212-277, 280-362
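As a sketch of the single-generic-function idea, one factory parameterized by the evaluation `type` could replace the three near-identical bodies. The `EvalType`, `EvalResult`, and `makeEvalTask` names below are illustrative stand-ins, not the real Trigger.dev task definitions:

```typescript
// Factory producing one eval task per type; the three concrete tasks then
// differ only in the evaluation logic they pass in.
type EvalType = "technical" | "seo" | "editorial";
type EvalResult = { type: EvalType; summary: string };

function makeEvalTask(type: EvalType, evaluate: (content: string) => string) {
  return async (input: string, content: string): Promise<EvalResult> => {
    // Shared, consistently formatted logging for every eval type.
    console.info(`[workflow=glossary] [task=${type}_eval] Starting for term: ${input}`);
    return { type, summary: evaluate(content) };
  };
}

const performTechnicalEval = makeEvalTask("technical", (c) => `accuracy check on ${c.length} chars`);
const performSeoEval = makeEvalTask("seo", (c) => `keyword check on ${c.length} chars`);
const performEditorialEval = makeEvalTask("editorial", (c) => `style check on ${c.length} chars`);
```

A side benefit: the factory also fixes the logging inconsistency noted in the next comment, since every task emits the same prefixed format.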
283-283: Standardize logging format across evaluation tasks

In `performEditorialEvalTask`, logging statements include specific prefixes like `[workflow=glossary] [task=editorial_eval]`, whereas other evaluation tasks do not use this format. For consistency, consider standardizing the logging statements across all evaluation tasks.

Apply this diff to align logging in other tasks:

```diff
 // In performTechnicalEvalTask
-console.info(`Starting technical evaluation for term: ${input}`);
+console.info(`[workflow=glossary] [task=technical_eval] Starting for term: ${input}`);
 // Similarly update other log statements in the function
```

Also applies to: 299-301, 309-311, 321-323, 326-328, 343-345
303-307: Handle potential undefined 'existing.outline' property safely

When parsing `existing.outline` in line 305, ensure that `existing.outline` is defined and contains valid JSON to prevent runtime errors. Consider adding a check before parsing.

Apply this diff for safer parsing:

```diff
-  outline: JSON.parse(existing.outline || "[]"),
+  outline: existing.outline ? JSON.parse(existing.outline) : [],
```
186-186: Remove unnecessary spread operator in function call

The spread operator `...options` is being used along with explicitly passed parameters in function calls. Since all properties of `options` are already being passed, the spread operator may be redundant and could lead to unintended behavior. Please review its necessity.

Apply this diff to remove the spread operator:

```diff
 await getOrCreateRatingsTask.triggerAndWait({
   input,
   type: "technical",
   content: options.content,
   onCacheHit,
-  ...options,
 });
```

Also applies to: 257-257, 335-335
apps/billing/src/trigger/glossary/generate-outline.ts (3)
83-84: Correct step numbering in log messages

There is an inconsistency in the step numbering of the log messages:

- Line 83 logs `Step 1/8 - ORGANIC RESULTS`.
- Line 83 should be `Step 2/8` to align with the sequential flow, assuming `Step 1/8` was an earlier action.

Ensure that the step numbers in the log messages accurately reflect the sequence of operations to aid in debugging.

Apply this diff to correct the step numbering:

```diff
-console.info(`Step 1/8 - ORGANIC RESULTS: ${organicResults?.length} results`);
+console.info(`Step 2/8 - ORGANIC RESULTS: ${organicResults?.length} results`);
```
197-214: Optimize keyword association logic

The nested loop for associating keywords with sections can be optimized for performance and readability. Consider the following improvements:

- Use `map` and `filter` methods: these can simplify the logic and make the code more declarative.
- Handle cases where keywords are not found: accumulate unmatched keywords for further analysis or logging.

Apply this diff to refactor the keyword association:

```diff
-const keywordInsertionPayload = [];
-for (let i = 0; i < editorialEval.output.outline.length; i++) {
-  const section = { ...editorialEval.output.outline[i], id: newSectionIds[i].id };
-  for (let j = 0; j < section.keywords.length; j++) {
-    const keyword = section.keywords[j];
-    const keywordId = seoKeywords.find(
-      (seoKeyword) => keyword.keyword === seoKeyword.keyword,
-    )?.id;
-    if (!keywordId) {
-      console.warn(`Keyword "${keyword.keyword}" not found in seo keywords`);
-      continue;
-    }
-    const payload = insertSectionsToKeywordsSchema.parse({
-      sectionId: section.id,
-      keywordId,
-    });
-    keywordInsertionPayload.push(payload);
-  }
-}
+const keywordInsertionPayload = editorialEval.output.outline.flatMap((section, index) => {
+  const sectionId = newSectionIds[index].id;
+  return section.keywords
+    .map((keyword) => {
+      const keywordId = seoKeywords.find(
+        (seoKeyword) => keyword.keyword === seoKeyword.keyword,
+      )?.id;
+      if (!keywordId) {
+        console.warn(`Keyword "${keyword.keyword}" not found in SEO keywords`);
+        return null;
+      }
+      return insertSectionsToKeywordsSchema.parse({
+        sectionId,
+        keywordId,
+      });
+    })
+    .filter(Boolean);
+});
```
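One caveat with the suggested refactor: in strict TypeScript, `.filter(Boolean)` does not narrow `(T | null)[]` to `T[]`, so a type-predicate filter is usually needed. The pattern can be checked in isolation with dummy data (all names below are invented for the demo):

```typescript
interface Keyword {
  keyword: string;
}

interface Section {
  keywords: Keyword[];
}

const seoKeywords = [
  { id: 1, keyword: "api" },
  { id: 2, keyword: "rate limit" },
];

const outline: Section[] = [
  { keywords: [{ keyword: "api" }, { keyword: "unknown" }] },
  { keywords: [{ keyword: "rate limit" }] },
];

const sectionIds = [10, 20];

// flatMap builds one payload per matched keyword; the type-predicate filter
// both removes unmatched entries and narrows the element type for callers.
const insertionPayload = outline.flatMap((section, i) =>
  section.keywords
    .map((kw) => {
      const match = seoKeywords.find((s) => s.keyword === kw.keyword);
      return match ? { sectionId: sectionIds[i], keywordId: match.id } : null;
    })
    .filter((p): p is { sectionId: number; keywordId: number } => p !== null),
);
```

The unmatched keyword (`"unknown"` here) simply drops out of the payload, mirroring the `console.warn` + skip behavior of the original loop.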
278-280: Clarify instruction formatting in AI prompts

In the `generateInitialOutline` function, the system prompt and instructions could be formatted for better clarity when passed to the AI model. Use consistent indentation and bullet points to improve readability.

Apply this diff to adjust the prompt formatting:

```diff
 const initialOutlineSystem = `You are a **Technical SEO Content Writer** specializing in API development and computer science.
-Your objective is to create a flat, comprehensive outline for a glossary page based on summarized content from top-ranking pages.
-Ensure factual correctness, clarity, and SEO optimization without unnecessary subheadings.`;
+Your objective is to:
+- Create a flat, comprehensive outline for a glossary page based on summarized content from top-ranking pages.
+- Ensure factual correctness, clarity, and SEO optimization without unnecessary subheadings.`;
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
`pnpm-lock.yaml` is excluded by `!**/pnpm-lock.yaml`
📒 Files selected for processing (26)
- apps/billing/cursor/drizzle_instructions.txt (1 hunks)
- apps/billing/cursor/prompt-engineering.md (1 hunks)
- apps/billing/package.json (2 hunks)
- apps/billing/src/lib/db-marketing/data-instructions.txt (1 hunks)
- apps/billing/src/lib/db-marketing/schemas/entries.ts (1 hunks)
- apps/billing/src/lib/db-marketing/schemas/evals.ts (1 hunks)
- apps/billing/src/lib/db-marketing/schemas/firecrawl.ts (4 hunks)
- apps/billing/src/lib/db-marketing/schemas/index.ts (1 hunks)
- apps/billing/src/lib/db-marketing/schemas/keywords.ts (3 hunks)
- apps/billing/src/lib/db-marketing/schemas/searchQuery.ts (2 hunks)
- apps/billing/src/lib/db-marketing/schemas/sections.ts (1 hunks)
- apps/billing/src/lib/db-marketing/schemas/serper.ts (3 hunks)
- apps/billing/src/lib/firecrawl.ts (2 hunks)
- apps/billing/src/lib/keywords.ts (1 hunks)
- apps/billing/src/lib/search-query.ts (3 hunks)
- apps/billing/src/trigger.instructions.txt (1 hunks)
- apps/billing/src/trigger/glossary/_generate-glossary-entry.ts (1 hunks)
- apps/billing/src/trigger/glossary/create-pr.ts (1 hunks)
- apps/billing/src/trigger/glossary/draft-sections.ts (1 hunks)
- apps/billing/src/trigger/glossary/evals.ts (1 hunks)
- apps/billing/src/trigger/glossary/generate-outline.ts (1 hunks)
- apps/billing/src/trigger/glossary/keyword-research.ts (1 hunks)
- apps/billing/src/trigger/glossary/seo-meta-tags.ts (1 hunks)
- apps/billing/src/trigger/keyword-research.ts (0 hunks)
- apps/billing/tsconfig.json (1 hunks)
- apps/billing/updates/update_2024_11_05/prompt.txt (1 hunks)
💤 Files with no reviewable changes (1)
- apps/billing/src/trigger/keyword-research.ts
✅ Files skipped from review due to trivial changes (1)
- apps/billing/cursor/drizzle_instructions.txt
🧰 Additional context used
🪛 LanguageTool
apps/billing/cursor/prompt-engineering.md
[uncategorized] ~148-~148: Consider inserting a comma before ‘actually’.
Context: ...ts that could plausibly exist, but don’t actually. The art to getting this right is enume...
(FINAL_ADVERB_COMMA)
[grammar] ~175-~175: Possible agreement error -- use the past participle here.
Context: ...be tricky. The best levers we have here are to provide lots of descriptions, and as...
(HAVE_PART_AGREEMENT)
[style] ~175-~175: The phrase ‘lots of’ might be wordy and overused. Consider using an alternative.
Context: ...best levers we have here are to provide lots of descriptions, and as many examples ...
(A_LOT_OF)
[misspelling] ~265-~265: When specifying a month and year, ‘of’ is unnecessary.
Context: ...s of defining commands. #### ReAct In March of 2023, Princeton and Google released a paper ...
(MONTH_OF_XXXX)
[style] ~375-~375: Consider removing “of” to be more concise
Context: ...you will need to combine some subset of all of the strategies enumerated in this document....
(ALL_OF_THE)
[style] ~399-~399: Consider using an extreme adjective for ‘good’.
Context: ...rate. Fortunately, OpenAI’s models are exceptionally good at working with Markdown tables (presum...
(EXTREME_ADJECTIVES)
[uncategorized] ~415-~415: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...or every item has some custom attributes and it doesn’t make sense to have dozens of...
(COMMA_COMPOUND_SENTENCE)
[style] ~417-~417: This phrase is a bit redundant. Consider using just “proximity”.
Context: ...that the model handles really well. The close proximity of keys to their values makes it ea...
(CLOSE_SCRUTINY)
[typographical] ~426-~426: Consider adding a comma after ‘Occasionally’ for more clarity.
Context: ... JSON blob.">
(RB_LY_COMMA)
[uncategorized] ~682-~682: Possible missing comma found.
Context: ... levels of nested joins. In the example above the model successfully joins from `user...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~686-~686: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...age response isn’t sufficient on its own and you’ll want the model’s output to cite ...
(COMMA_COMPOUND_SENTENCE)
[misspelling] ~719-~719: This word is normally spelled as one.
Context: ...tons for booking the flight + hotel, or auto-filling a form for the user. As another exampl...
(EN_COMPOUNDS_AUTO_FILLING)
[typographical] ~721-~721: It appears that a comma is missing.
Context: ...n the creation of that message. In this scenario you won’t know exactly where in the mes...
(DURING_THAT_TIME_COMMA)
[uncategorized] ~753-~753: Possible missing comma found.
Context: ...we asked the bot to evaluate the Python code it gets it slightly wrong. The correct ...
(AI_HYDRA_LEO_MISSING_COMMA)
[style] ~767-~767: Consider removing “of” to be more concise
Context: ..., you may not want to show the end user all of the bot’s thinking and instead just want to...
(ALL_OF_THE)
[uncategorized] ~767-~767: Possible missing comma found.
Context: ...t to show the end user all of the bot’s thinking and instead just want to show the final...
(AI_HYDRA_LEO_MISSING_COMMA)
[typographical] ~779-~779: If specifying a range, consider using an en dash instead of a hyphen.
Context: ...larity for brevity. • For simple tasks, 1-3 sentences are often sufficient. • For m...
(HYPHEN_TO_EN)
apps/billing/src/lib/db-marketing/data-instructions.txt
[grammar] ~6-~6: Two determiners in a row. Choose either “the” or “an”.
Context: ... Relationships: - The central entity is the an inputTerm to which most models are conn...
(DT_DT)
apps/billing/src/trigger.instructions.txt
[uncategorized] ~12-~12: Possible missing comma found.
Context: ...ure." --- There are different types of tasks including regular tasks and [scheduled ...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~115-~115: Possible missing comma found.
Context: ...ements in the machine field. For more information read the machines guide. ...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~130-~130: Did you mean: “By default,”?
Context: ... }, }); ``` ### maxDuration option By default tasks can execute indefinitely, which c...
(BY_DEFAULT_COMMA)
🪛 Markdownlint
apps/billing/cursor/prompt-engineering.md
37-37: Expected: 3; Actual: 4
Inconsistent indentation for list items at the same level
(MD005, list-indent)
38-38: Expected: 3; Actual: 4
Inconsistent indentation for list items at the same level
(MD005, list-indent)
62-62: Expected: 4; Actual: 3
Inconsistent indentation for list items at the same level
(MD005, list-indent)
47-47: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
53-53: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
65-65: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
77-77: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
95-95: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
108-108: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
114-114: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
120-120: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
126-126: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
187-187: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
299-299: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
446-446: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
606-606: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
700-700: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
170-170: null
Images should have alternate text (alt text)
(MD045, no-alt-text)
180-180: null
Images should have alternate text (alt text)
(MD045, no-alt-text)
386-386: null
Images should have alternate text (alt text)
(MD045, no-alt-text)
392-392: null
Images should have alternate text (alt text)
(MD045, no-alt-text)
404-404: null
Images should have alternate text (alt text)
(MD045, no-alt-text)
408-408: null
Images should have alternate text (alt text)
(MD045, no-alt-text)
422-422: null
Images should have alternate text (alt text)
(MD045, no-alt-text)
430-430: null
Images should have alternate text (alt text)
(MD045, no-alt-text)
440-440: null
Images should have alternate text (alt text)
(MD045, no-alt-text)
600-600: null
Images should have alternate text (alt text)
(MD045, no-alt-text)
691-691: null
Images should have alternate text (alt text)
(MD045, no-alt-text)
716-716: null
Images should have alternate text (alt text)
(MD045, no-alt-text)
724-724: null
Images should have alternate text (alt text)
(MD045, no-alt-text)
742-742: null
Images should have alternate text (alt text)
(MD045, no-alt-text)
748-748: null
Images should have alternate text (alt text)
(MD045, no-alt-text)
756-756: null
Images should have alternate text (alt text)
(MD045, no-alt-text)
762-762: null
Images should have alternate text (alt text)
(MD045, no-alt-text)
770-770: null
Images should have alternate text (alt text)
(MD045, no-alt-text)
187-187: Expected: backtick; Actual: tilde
Code fence style
(MD048, code-fence-style)
299-299: Expected: backtick; Actual: tilde
Code fence style
(MD048, code-fence-style)
446-446: Expected: backtick; Actual: tilde
Code fence style
(MD048, code-fence-style)
606-606: Expected: backtick; Actual: tilde
Code fence style
(MD048, code-fence-style)
700-700: Expected: backtick; Actual: tilde
Code fence style
(MD048, code-fence-style)
🪛 Biome
apps/billing/src/lib/firecrawl.ts
[error] 4-4: Declarations inside of a import declaration may not have duplicates
a second declaration of firecrawlResponses is not allowed
firecrawlResponses is first declared here
(parse)
🔇 Additional comments (28)
apps/billing/src/lib/db-marketing/schemas/index.ts (1)
3-3: Verify schema relationships to prevent circular dependencies.
Since these schemas have relationships between them (as mentioned in the AI summary), let's verify there are no circular dependencies that could cause issues.
#!/bin/bash
# Description: Check for potential circular dependencies in schema files
# Expected: No circular import patterns between related schemas
echo "Analyzing schema relationships..."
for file in "entries" "sections" "evals"; do
echo "Checking imports in ${file}.ts:"
rg -A 3 "import.*from.*\./" "apps/billing/src/lib/db-marketing/schemas/${file}.ts" || echo "No local imports found"
done

Also applies to: 6-7
apps/billing/tsconfig.json (1)
3-3: Verify browser compatibility requirements.
The change from es5 to esnext enables modern JavaScript features but might affect browser compatibility. Ensure this aligns with your browser support requirements.
Let's check if there are any browser compatibility requirements specified:
✅ Verification successful
Based on the results, I can see that the billing app is part of a monorepo with multiple applications, and the apps/dashboard contains web-specific files including billing-related components (e.g., apps/dashboard/app/(app)/settings/billing/). Since this is a server-side application that's part of a larger web application ecosystem, and modern JavaScript features are commonly used in such environments, the change to esnext is appropriate.
The target change to esnext is safe to proceed with.
The billing app appears to be a server-side component used by the dashboard application, which is a modern Next.js application. Server-side components don't need to consider browser compatibility as they run on the server, and the TypeScript output will be processed by Node.js, which supports modern JavaScript features.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Look for browser compatibility requirements in package.json or README
echo "Checking package.json for browserslist config..."
cat package.json | jq '.browserslist'
echo "Checking README for browser support information..."
fd -g "README*" -X grep -i "browser.*support\|compatibility"
Length of output: 323
Script:
#!/bin/bash
# Let's check if this is a browser-based application or a server-side component
# First, check for web-specific dependencies and entry points
echo "Checking package.json for dependencies..."
cat package.json | jq '.dependencies, .devDependencies'
echo "Looking for web-specific files..."
fd -e html -e tsx -e jsx
# Check if this billing app is imported/used by other apps
echo "Checking usage of billing app..."
rg -l "billing" --type ts --type tsx --type jsx
Length of output: 23674
apps/billing/package.json (1)
26-28: LGTM! Minor version updates for trigger.dev packages.
The updates from 3.1.0 to 3.1.2 are minor version bumps, suggesting backward-compatible changes.
apps/billing/src/lib/db-marketing/schemas/entries.ts (2)
1-7: LGTM: Imports are well-organized and necessary.
All imports are being utilized and follow a logical organization pattern.
1-40: Verify schema consistency across the codebase.
Let's ensure the field names and types are consistent with related schemas and their usage.
✅ Verification successful
Schema definitions and relations are consistent across the codebase
The verification shows that:
- The `entries` schema correctly defines its relations with the `sections` and `searchQueries` tables
- The `inputTerm` field is consistently used as the linking field between `entries` and `searchQueries`
- The `sections` table properly references `entries` through `entryId`
- All schema definitions follow the same pattern using drizzle-orm and drizzle-zod
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for consistency in field names and schema usage
# Test 1: Check field names in related schemas
echo "Checking related schemas..."
rg -t typescript "mysqlTable\(" apps/billing/src/lib/db-marketing/schemas/
# Test 2: Check usage of these schemas in the codebase
echo "Checking schema usage..."
rg -t typescript "insertEntrySchema|selectEntrySchema" apps/billing/src/
Length of output: 345
Script:
#!/bin/bash
# Let's try without the file type filter since it's not recognized
echo "Checking related schemas..."
rg "mysqlTable\(" apps/billing/src/lib/db-marketing/schemas/
echo -e "\nChecking searchQueries schema..."
cat apps/billing/src/lib/db-marketing/schemas/searchQuery.ts
echo -e "\nChecking sections schema..."
cat apps/billing/src/lib/db-marketing/schemas/sections.ts
echo -e "\nChecking usage of entry schemas..."
rg "insertEntrySchema|selectEntrySchema" apps/billing/src/
Length of output: 7987
apps/billing/src/lib/db-marketing/schemas/keywords.ts (2)
5-6: LGTM: Imports are appropriate for schema validation.
The added imports for drizzle-zod and zod types are necessary for implementing type-safe schema validation.
Line range hint 32-41: Verify the existence of referenced tables and their schemas.
The relations look well-structured, particularly the new many-to-many relationship with sectionsToKeywords. Let's verify the referenced tables exist with the correct schema.
✅ Verification successful
All referenced tables and fields exist with correct schema definitions
The verification confirms:
- `sectionsToKeywords` table exists with a proper junction table structure
- `searchQueries` table exists with an `inputTerm` field (varchar(255)) and corresponding index
- `serperOrganicResults` table exists with a `link` field (varchar(767)) and corresponding index
The relations configuration is valid and properly references existing tables and fields.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the existence and schema of referenced tables
# Check for the referenced tables and their schemas
echo "Checking for referenced table definitions..."
rg -A 10 "sectionsToKeywords.*=.*mysqlTable" apps/billing/src/lib/db-marketing/schemas/
rg -A 10 "searchQueries.*=.*mysqlTable" apps/billing/src/lib/db-marketing/schemas/
rg -A 10 "serperOrganicResults.*=.*mysqlTable" apps/billing/src/lib/db-marketing/schemas/
# Verify the foreign key fields exist in the referenced tables
echo "Checking for referenced fields..."
rg "inputTerm.*:.*" apps/billing/src/lib/db-marketing/schemas/searchQuery.ts
rg "link.*:.*" apps/billing/src/lib/db-marketing/schemas/serper.ts
Length of output: 4494
apps/billing/updates/update_2024_11_05/prompt.txt (1)
29-29: Validate Drizzle ORM implementation with MySQL.
Given the uncertainty about Drizzle with MySQL:
- Consider adding test cases for database operations
- Document any MySQL-specific configurations
- Include rollback procedures for schema changes
Let's check for any existing Drizzle MySQL configurations:
apps/billing/src/lib/db-marketing/schemas/searchQuery.ts (2)
14-15: LGTM: New imports are properly structured
The new imports for firecrawlResponses and entries are correctly organized and follow the existing import pattern.
41-50: Verify database migration for new relationships
The schema changes introduce new relationships that will require database migrations. Please ensure that:
- Migration files have been generated and reviewed
- The migrations handle existing data appropriately
- The migrations have been tested in a staging environment
apps/billing/src/lib/db-marketing/schemas/firecrawl.ts (2)
33-33: LGTM! Schema changes are well-structured.
The field types and lengths are appropriate for their purposes:
- `ogSiteName` follows Open Graph protocol naming
- `inputTerm` as varchar(255) is suitable for search terms
- `summary` as a text type accommodates variable-length content
Also applies to: 37-38
43-43: LGTM! Good performance optimization.
Adding an index on inputTerm is a good practice as it will optimize lookups during the glossary entry generation workflow.
apps/billing/src/lib/db-marketing/schemas/serper.ts (1)
6-6: Verify the firecrawl schema dependency.
The import looks correct. Let's verify that the firecrawl.ts schema file exists and contains the referenced table.
✅ Verification successful
Import dependency verified successfully
The firecrawl.ts schema file exists in the same directory and contains the firecrawlResponses table definition, confirming that the import statement is valid and correctly references an existing table.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the existence and content of the firecrawl schema file
# Expected: File exists and contains firecrawlResponses table definition
# Check if the file exists in the same directory
fd -t f "firecrawl.ts" -d 1
# Verify the table definition
rg -A 5 "firecrawlResponses.*=.*mysqlTable"
Length of output: 602
apps/billing/src/lib/db-marketing/schemas/sections.ts (1)
87-91: Review timestamp configurations for consistency
The timestamps createdAt and updatedAt in sectionsToKeywords specify mode: "date" and fsp: 3, while other tables do not. This may lead to inconsistent timestamp precision and data types across your tables.
Please verify if the use of mode: "date" with fsp: 3 is intentional. If not, consider aligning the timestamp definitions with those in other tables.
apps/billing/src/trigger/glossary/keyword-research.ts (1)
63-63: Check for empty firecrawl results
After fetching firecrawl responses, ensure that there are valid results before proceeding. This prevents potential errors in subsequent steps.
apps/billing/src/trigger/glossary/_generate-glossary-entry.ts (3)
37-38: Verify if maxAttempts should be zero
You have set maxAttempts: 0 in the task's retry configuration, which means the task will not retry if it fails. Given that sub-tasks may rely on non-deterministic execution, is it intentional to have no retries for the main task? Consider setting an appropriate number of retry attempts or ensure that sub-tasks handle retries sufficiently.
59-64: Ensure all necessary fields are checked in cache validation
When checking for existing entries in the cache, you verify several fields. Confirm that these fields (dynamicSectionsContent, metaTitle, metaDescription, githubPrUrl) are sufficient to determine if the cached entry is valid and complete. If there are additional fields or conditions that should be considered, include them to ensure correctness.
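A cache-completeness check like the one described can be centralized so the field list lives in one place. The field names follow the comment above; whether they are sufficient remains the open question, but adding a field then means editing one array:

```typescript
interface Entry {
  dynamicSectionsContent?: string | null;
  metaTitle?: string | null;
  metaDescription?: string | null;
  githubPrUrl?: string | null;
}

const REQUIRED_FIELDS = [
  "dynamicSectionsContent",
  "metaTitle",
  "metaDescription",
  "githubPrUrl",
] as const;

// True only when every required field is present and non-empty.
function isEntryComplete(entry: Entry | undefined): boolean {
  if (!entry) return false;
  return REQUIRED_FIELDS.every((field) => {
    const value = entry[field];
    return typeof value === "string" && value.trim().length > 0;
  });
}
```

The cache condition then reads `if (isEntryComplete(existing) && onCacheHit === "stale")`, and extra conditions can be added inside the helper without touching callers.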
108-109: Verify parameter consistency in createPrTask call
In calling createPrTask.triggerAndWait, you pass { input: term, onCacheHit }, whereas in previous tasks you pass { term, onCacheHit }. Verify that createPrTask expects an input parameter instead of term. If it expects term, consider aligning the parameter naming for consistency across all sub-tasks.
apps/billing/src/trigger/glossary/seo-meta-tags.ts (3)
33-35: Ensure correct cache strategy handling
In the condition if (existing?.metaTitle && existing?.metaDescription && onCacheHit === "stale"), verify that the onCacheHit value correctly influences the caching logic. Confirm that this condition meets the intended behavior.
119-120: Check for undefined properties before accessing
Ensure that craftedMetaTags.object is defined before accessing title and description to prevent runtime errors.
18-20:
Potential syntax issue in default parameter assignment
The default parameter assignment onCacheHit = "stale" as CacheStrategy may cause a syntax error. Type assertions are not allowed in parameter defaults. Consider specifying the type separately.
Apply this diff to fix the issue:
-run: async ({
- term,
- onCacheHit = "stale" as CacheStrategy,
-}: { term: string; onCacheHit?: CacheStrategy }) => {
+run: async ({
+ term,
+ onCacheHit = "stale",
+}: { term: string; onCacheHit?: CacheStrategy }) => {
+ onCacheHit = onCacheHit as CacheStrategy;

Likely invalid or redundant comment.
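Whatever the status of the assertion, it becomes unnecessary once the destructured parameter is annotated: TypeScript then checks the default literal against the parameter type directly. A minimal sketch (the `CacheStrategy` values and a synchronous `run` are assumptions for the demo; the real handler is async):

```typescript
type CacheStrategy = "stale" | "revalidate";

// Annotating the destructured object lets the default be a plain literal;
// no `as CacheStrategy` assertion is needed, and a typo in the default
// would be a compile-time error.
function run({
  term,
  onCacheHit = "stale",
}: { term: string; onCacheHit?: CacheStrategy }): string {
  return `${term}:${onCacheHit}`;
}

const withDefault = run({ term: "api" });
const explicit = run({ term: "api", onCacheHit: "revalidate" });
```

This is the more idiomatic shape regardless of whether the original assertion parses, since an `as` in a default silently widens instead of checking.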
apps/billing/src/lib/firecrawl.ts (1)
179-179: Verify the model name 'gpt-4o-mini' in 'generateText' call
Ensure that 'gpt-4o-mini' is the correct model name. If it's a typo, consider updating it to the intended model, such as 'gpt-4' or 'gpt-3.5-turbo'.
Run the following script to list all model names used with openai in the codebase:
apps/billing/src/trigger/glossary/draft-sections.ts (2)
188-188:
Verify the model name 'gpt-4o-mini'
The model name 'gpt-4o-mini' used with the OpenAI SDK may be incorrect or unsupported. Please verify that this is the intended model and that it is available in the OpenAI API.
Run the following script to verify available models:
#!/bin/bash
# Description: List available OpenAI models to confirm 'gpt-4o-mini' exists.
# Test: Retrieve model list. Expect: 'gpt-4o-mini' should be listed.
curl -s https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY" | jq '.data[].id'

224-224:
Verify the model name 'gpt-4o-mini'
The model name 'gpt-4o-mini' used with the OpenAI SDK may be incorrect or unsupported. Please confirm that this is the intended model and that it is available in the OpenAI API.
Run the following script to verify available models:
✅ Verification successful
Model name 'gpt-4o-mini' appears to be an internal custom model identifier
The codebase search reveals consistent usage of two custom model identifiers:
- `gpt-4o-mini`
- `gpt-4o`
These appear to be internal model identifiers rather than standard OpenAI model names, as they are used consistently across multiple files:
- apps/billing/src/lib/keywords.ts
- apps/billing/src/lib/search-query.ts
- apps/billing/src/lib/firecrawl.ts
- apps/billing/src/trigger/glossary/*.ts
The model name is likely correct within the context of this codebase's implementation.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: List available OpenAI models to confirm 'gpt-4o-mini' exists.
# Test: Retrieve model list. Expect: 'gpt-4o-mini' should be listed.
curl -s https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY" | jq '.data[].id'
Length of output: 146
Script:
#!/bin/bash
# Let's search for any other model name patterns in the codebase
rg -g '*.ts' -g '*.js' "openai\(['\"].*['\"]" -A 2
# Also search for any model configuration or constants
rg -g '*.ts' -g '*.js' -i "model.*=.*gpt"
Length of output: 3197
apps/billing/src/trigger/glossary/evals.ts (2)
352-352: Verify appropriate storage of content in 'outline' field
In line 352, options.content (a string) is being JSON.stringify-ed and stored in the outline field. If outline is intended to store a structured outline (e.g., an array or object), storing a string may lead to inconsistencies when retrieving and parsing the data. Please verify that this is the intended behavior.
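The inconsistency matters at read time: a string stored via `JSON.stringify` round-trips back to a string, so a reader that expects a parsed array gets a surprise. A quick self-contained demonstration (sample values invented for the demo):

```typescript
const content = "## Introduction\nSome section text";

// Storing a raw string: JSON.parse returns a string again, not a structure.
const storedString = JSON.stringify(content);
const parsedString = JSON.parse(storedString);

// Storing an actual outline structure round-trips to an array as expected.
const outline = [{ heading: "Introduction" }];
const storedOutline = JSON.stringify(outline);
const parsedOutline = JSON.parse(storedOutline);
```

If the `outline` column is meant to hold structure, callers that do `Array.isArray(JSON.parse(row.outline))` will get `false` for rows written the string way, which is exactly the inconsistency the comment warns about.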
51-55: Consider cache control logic for 'onCacheHit' parameter
In the condition checking for existing evaluations, the code uses onCacheHit === "stale" to decide whether to return cached data. Verify that this logic aligns with the intended cache strategy. If the default behavior should be to use the cache unless a refresh is required, consider updating the condition to reflect that.
Also applies to: 104-108, 159-165, 230-236, 298-307
apps/billing/src/trigger/glossary/generate-outline.ts (2)
390-394: Confirm model availability and compatibility
The model gpt-4o-mini used in the generateObject calls may not be available or compatible. Ensure that the model identifier is correct and that the model supports the required features.
Consult the AI service documentation to verify the correct model name and its capabilities.
157-167: Ensure consistency in function usage and outputs
In the reviseSEOOutline function, ensure that the inputs and outputs are consistent with the expected schemas. Verify that seoOptimizedOutline.object.outline exists and adheres to the finalOutlineSchema.
Run the following script to verify the structure of seoOptimizedOutline:
✅ Verification successful
Based on the gathered information, I can now provide the final response:
Function implementation and schema validation are properly structured
The reviseSEOOutline function is correctly implemented with proper schema validation:
- It returns a response validated against `finalOutlineSchema` using the `generateObject` function
- The `finalOutlineSchema` is properly defined with the expected structure, including the outline array and keywords
- The output structure `seoOptimizedOutline.object.outline` matches the schema definition, as it's enforced by the validation
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that seoOptimizedOutline matches finalOutlineSchema
# Test: Check for the existence and structure of seoOptimizedOutline.object.outline
ast-grep --pattern $'const seoOptimizedOutline = $_\nconsole.info($_)' apps/billing/src/trigger/glossary/generate-outline.ts
Length of output: 124
Script:
#!/bin/bash
# Let's verify the schema and usage of reviseSEOOutline function
# Check the definition and return type of reviseSEOOutline
ast-grep --pattern 'function reviseSEOOutline'
# Check for any type definitions or interfaces related to the outline
rg -A 5 "interface.*Outline|type.*Outline|schema.*Outline"
# Look for the finalOutlineSchema definition
rg "finalOutlineSchema" -A 5
# Check imports and usage of reviseSEOOutline
rg "reviseSEOOutline" -A 2
Length of output: 3425
Script:
#!/bin/bash
# Let's check the implementation of reviseSEOOutline to verify its return type
# Check the complete implementation of reviseSEOOutline function
ast-grep --pattern 'async function reviseSEOOutline($_) {
$$$
}'
# Check how the result is validated before return
rg -B 5 -A 2 "return validateAIResponse" apps/billing/src/trigger/glossary/generate-outline.ts
Length of output: 7196
```diff
@@ -1,4 +1,7 @@
 export * from "./serper";
 export * from "./searchQuery";
+export * from "./entries";
```
💡 Codebase verification
Circular dependency detected between entries and sections modules
There is a circular dependency between the schema modules:
entries.tsimports from./sectionssections.tsimports from./entries
This circular dependency could lead to initialization issues or runtime errors. Consider:
- Extracting shared types/interfaces to a separate module
- Restructuring the schema relationships to avoid the circular reference
- Using interface declarations instead of direct imports where possible
🔗 Analysis chain
Verify the existence of newly exported modules.
Let's ensure that all the newly added module files exist in the codebase.
Also applies to: 6-7
Let me generate additional verification scripts to check for potential circular dependencies between these modules.
This will help us verify:
- The complete import graph between these modules
- Existence of other referenced files (searchQuery.ts, keywords.ts)
- Any potential circular dependencies that could cause issues
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if the new module files exist
# Expected: All files should exist to prevent runtime errors
echo "Checking for new module files..."
for module in "entries" "sections" "evals"; do
if fd -t f "${module}.ts" "apps/billing/src/lib/db-marketing/schemas/"; then
echo "✓ Found ${module}.ts"
else
echo "✗ Missing ${module}.ts"
fi
done
Length of output: 1296
Script:
#!/bin/bash
# Check for circular dependencies in schema files
echo "Checking entries.ts imports:"
rg "import.*from.*\./.*" apps/billing/src/lib/db-marketing/schemas/entries.ts
echo -e "\nChecking sections.ts imports:"
rg "import.*from.*\./.*" apps/billing/src/lib/db-marketing/schemas/sections.ts
echo -e "\nChecking evals.ts imports:"
rg "import.*from.*\./.*" apps/billing/src/lib/db-marketing/schemas/evals.ts
# Check if all referenced files exist
echo -e "\nVerifying all imported files exist:"
fd -t f "searchQuery.ts|keywords.ts" apps/billing/src/lib/db-marketing/schemas/
Length of output: 928
```json
"db:studio": "dotenv -e .env drizzle-kit studio --verbose",
"db:generate": "dotenv -e .env drizzle-kit generate",
"db:migrate": "dotenv -e .env drizzle-kit migrate",
"db:pull": "dotenv -e .env drizzle-kit pull"
```
💡 Codebase verification
Missing environment file for database configuration in billing app
The database scripts require environment variables, but there's no .env.example file in the billing app to document the required configuration. Other apps in the monorepo have example files that show the required database variables:
- `DATABASE_HOST`
- `DATABASE_USERNAME`
- `DATABASE_PASSWORD`
Please add an .env.example file to the apps/billing directory documenting these required database variables.
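Following the pattern used by the other apps in the monorepo, a minimal `.env.example` for the billing app might look like this (variable names taken from the review above; values are intentionally left blank):

```bash
# apps/billing/.env.example — placeholder values only
DATABASE_HOST=
DATABASE_USERNAME=
DATABASE_PASSWORD=
```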
🔗 Analysis chain
LGTM! Verify environment variable setup.
The new database scripts align well with Drizzle ORM operations. The verbose flag for studio is a good addition for debugging.
Let's verify the required environment variables are documented:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if environment variables are documented in README or example files
# Test: Look for environment variable documentation
rg -i "OPENAI_API_KEY|SERPER_API_KEY|FIRECRAWL_API_KEY|GITHUB_PERSONAL_ACCESS_TOKEN" README.md .env.example
Length of output: 162
Script:
#!/bin/bash
# Let's check for any .env files or environment documentation in the billing app
fd -t f ".env" apps/billing/
# Also check for environment variable documentation in billing app's README
fd -t f "README" apps/billing/
# Look for environment setup instructions in the root README
cat README.md
# Check if there's a template or example env file in the root
fd -t f ".env.example|.env.template" .
Length of output: 1035
Script:
#!/bin/bash
# Let's check the content of example env files to find database-related variables
cat apps/dashboard/.env.example apps/play/.env.example apps/workflows/.env.example apps/www/.env.example tools/migrate/.env.example
# Check the contributing guide for environment setup instructions
cat .github/CONTRIBUTING.md
# Look for any database configuration documentation
rg -i "database|drizzle|planetscale" .github/
Length of output: 6556
```json
"@chronark/zod-bird": "0.3.9",
"@clerk/nextjs": "^4.29.10",
"@mendable/firecrawl-js": "^1.5.2",
"@octokit/rest": "^21.0.2",
```
💡 Codebase verification
Document required GitHub token scopes for PR creation
Based on the code analysis, the GitHub token needs the following minimum required scopes to function:
- `repo` scope for creating/updating files and branches
- `pull_requests` scope for creating PRs
The token is used appropriately with specific operations:
- Reading refs (`getRef`, `listMatchingRefs`)
- Managing branches (`deleteRef`, `createRef`)
- File operations (`createOrUpdateFileContents`)
- PR creation (`pulls.create`)
Consider documenting these required scopes in:
- Environment variable documentation
- Repository setup instructions
- CI/CD configuration
🔗 Analysis chain
Verify GitHub token scope requirements.
The addition of @octokit/rest aligns with the PR creation functionality. However, ensure that:
- The required GitHub token scopes are documented
- The token is used with minimal required permissions
Let's check for GitHub token scope documentation:
Let me gather more information about the GitHub token usage and configuration:
Let me check the actual PR creation code to understand the required permissions:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Look for GitHub token scope documentation
# Test 1: Search for GitHub token scope documentation
rg -i "github.*token.*scope|github.*scope|github.*permission" -g '!*.{lock,sum}'
# Test 2: Look for Octokit client initialization to verify used scopes
ast-grep --pattern 'new Octokit({ auth: $_ })'
Length of output: 4148
Script:
#!/bin/bash
# Search for environment variable documentation and configuration
rg -i "GITHUB_PERSONAL_ACCESS_TOKEN" -A 5 -B 5
# Look for PR creation related code to understand token usage
ast-grep --pattern 'octokit.$_($$$)'
Length of output: 974
Script:
#!/bin/bash
# Look for the complete PR creation flow
rg -l "create-pr"
# Then examine the specific file content
cat apps/billing/src/trigger/glossary/create-pr.ts
Length of output: 4841
```typescript
export const entries = mysqlTable(
  "entries",
  {
    id: int("id").primaryKey().autoincrement(),
    inputTerm: varchar("input_term", { length: 255 }).notNull(),
    githubPrUrl: varchar("github_pr_url", { length: 255 }),
    dynamicSectionsContent: text("dynamic_sections_content"),
    metaTitle: varchar("meta_title", { length: 255 }),
    metaDescription: varchar("meta_description", { length: 255 }),
    createdAt: timestamp("created_at").notNull().defaultNow(),
    updatedAt: timestamp("updated_at")
      .notNull()
      .defaultNow()
      .$onUpdate(() => new Date()),
  },
  (table) => ({
    inputTermIdx: index("input_term_idx").on(table.inputTerm),
  }),
);
```
🛠️ Refactor suggestion
Consider strengthening field constraints and adding performance optimizations.
Several suggestions to improve the table definition:
- The `githubPrUrl` field should be marked as `notNull()` since the PR objectives indicate this is a required part of the workflow.
- Consider adding size constraints to the `dynamicSectionsContent` TEXT field using `text("dynamic_sections_content", { length: 'medium' })` to optimize storage.
- Add validation for the URL format.
- Consider adding a composite index on `(created_at, updated_at)` for better query performance on time-based operations.
```diff
 export const entries = mysqlTable(
   "entries",
   {
     id: int("id").primaryKey().autoincrement(),
     inputTerm: varchar("input_term", { length: 255 }).notNull(),
-    githubPrUrl: varchar("github_pr_url", { length: 255 }),
+    githubPrUrl: varchar("github_pr_url", { length: 255 }).notNull(),
-    dynamicSectionsContent: text("dynamic_sections_content"),
+    dynamicSectionsContent: text("dynamic_sections_content", { length: 'medium' }),
     metaTitle: varchar("meta_title", { length: 255 }),
     metaDescription: varchar("meta_description", { length: 255 }),
     createdAt: timestamp("created_at").notNull().defaultNow(),
     updatedAt: timestamp("updated_at")
       .notNull()
       .defaultNow()
       .$onUpdate(() => new Date()),
   },
   (table) => ({
     inputTermIdx: index("input_term_idx").on(table.inputTerm),
+    timeBasedIdx: index("time_based_idx").on(table.createdAt, table.updatedAt),
   }),
 );
```

Committable suggestion skipped: line range outside the PR's diff.
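The URL-format validation suggested above can also be enforced at the application layer before insert. A minimal sketch using the WHATWG `URL` parser — the helper name and the GitHub-host restriction are assumptions for illustration, not something the schema requires:

```typescript
// Hypothetical helper: accept only well-formed https URLs on github.com,
// since github_pr_url is expected to hold a pull-request link.
function isValidGithubPrUrl(value: string): boolean {
  try {
    const url = new URL(value); // throws on malformed input
    return url.protocol === "https:" && url.hostname === "github.com";
  } catch {
    return false;
  }
}

console.log(isValidGithubPrUrl("https://github.com/unkeyed/unkey/pull/2642")); // true
console.log(isValidGithubPrUrl("not-a-url")); // false
```

Calling this before the insert (and rejecting or logging invalid values) keeps bad data out without waiting for a database-level constraint.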
:tv: Demo
Add missing demo content.
The Demo section is empty but referenced in the main changes section. Please add:
- A video/gif demonstration or
- Step-by-step instructions with screenshots
This will help reviewers understand the workflow's functionality better.
```typescript
const result = await generateObject({
  model: openai("gpt-4o-mini"),
  system: systemPrompt,
  prompt: `Review this content and provide numerical ratings:\n${options.content}`,
  schema: ratingsSchema,
});
```
🛠️ Refactor suggestion
Add error handling for 'generateObject' function calls
The generateObject function may throw errors if the API call fails or if the response doesn't match the schema. To prevent unhandled exceptions, consider wrapping these calls in try-catch blocks and handling errors appropriately.
Apply this diff to handle errors:
```diff
+try {
   const result = await generateObject({
     model: openai("gpt-4o-mini"),
     system: systemPrompt,
     prompt: `Review this content and provide numerical ratings:\n${options.content}`,
     schema: ratingsSchema,
   });
+  return result;
+} catch (error) {
+  throw new AbortTaskRunError(`Failed to generate ratings: ${error.message}`);
+}
-
-return result;
```

Also applies to: 129-134
```typescript
// TODO: this task is a bit flake-y still
// - split up into smaller tasks, and/or
// - move some of the in-memory storage to db caching, and/or
// - improve the prompts
```
🛠️ Refactor suggestion
Address the TODO: Improve task reliability and performance
The code contains a TODO comment indicating that the generateOutlineTask is flaky. To enhance reliability and maintainability, consider the following actions:
- Split the task into smaller, atomic tasks: Breaking down the task can improve error isolation and make the workflow more manageable.
- Persist intermediate data to the database: This reduces reliance on in-memory storage, aids in caching, and enhances the idempotency of tasks.
- Refine the AI prompts: Improving prompt engineering can lead to more consistent and accurate AI responses.
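As a complement to splitting the task up, transient AI-call failures can be absorbed with a small retry wrapper around each flaky step. The helper below is a generic sketch — the name and parameters are hypothetical and not part of the Trigger.dev SDK:

```typescript
// Hypothetical retry helper for flaky async steps such as AI completions.
async function withRetries<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Exponential backoff between attempts: 500ms, 1000ms, 2000ms, ...
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}
```

A step such as `generateInitialOutline` could then be invoked as `withRetries(() => generateInitialOutline(term))`, so a single bad completion does not abort the whole run.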
```typescript
if (organicResults.length === 0) {
  throw new AbortTaskRunError(`No organic results found for term: ${term}`);
}
console.info(`Step 1/8 - ORGANIC RESULTS: ${organicResults?.length} results`);
```
Avoid logging sensitive or extensive data to the console
The use of console.info to log extensive content, such as topRankingContent, may expose sensitive information and clutter the logs. Specifically, the following lines log potentially large amounts of data:
- Line 83: Logs the number of organic results (acceptable).
- Lines 99 & 107: Log the entire `topRankingContent`, which may be redundant and excessive.
Consider adjusting the logging strategy:
- Log summaries or metadata instead of full content: Provide concise information that aids debugging without exposing full content.
- Remove duplicate log statements: The log at line 107 appears to duplicate the one at line 99.
Apply this diff to adjust the logging:
```diff
 console.info(`Step 1/8 - ORGANIC RESULTS: ${organicResults?.length} results`);
-console.info(`Step 3/8 - SUMMARIES: ${topRankingContent}`);
+console.info(`Step 2/8 - Retrieved and summarized top-ranking content.`);
-console.info(`Step 3/8 - SUMMARIES: ${topRankingContent}`);
```

Also applies to: 99-99, 107-107
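If the full content is still occasionally useful for debugging, a small truncation helper keeps log lines bounded instead of dropping the information entirely. A sketch — the helper name and the 200-character limit are arbitrary choices, not part of the existing code:

```typescript
// Hypothetical helper: log a bounded preview instead of the full content.
function preview(content: string, maxLength = 200): string {
  return content.length <= maxLength
    ? content
    : `${content.slice(0, maxLength)}… (${content.length} chars total)`;
}

console.info(`Step 3/8 - SUMMARIES: ${preview("short summary")}`);
```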
```typescript
// Step 5: Technical review by domain expert
const technicalEval = await performTechnicalEvalTask.triggerAndWait({
  input: term,
  content: topRankingContent,
  onCacheHit,
});
if (!technicalEval.ok) {
  throw new AbortTaskRunError("Technical evaluation failed");
}
```
Handle potential null values in technicalEval
Before accessing technicalEval.output, ensure that technicalEval and technicalEval.output are not null or undefined to prevent runtime errors.
Apply this diff to add null checks:
```typescript
if (!technicalEval.ok || !technicalEval.output) {
  throw new AbortTaskRunError("Technical evaluation failed");
}
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```typescript
// Step 5: Technical review by domain expert
const technicalEval = await performTechnicalEvalTask.triggerAndWait({
  input: term,
  content: topRankingContent,
  onCacheHit,
});
if (!technicalEval.ok || !technicalEval.output) {
  throw new AbortTaskRunError("Technical evaluation failed");
}
```
```typescript
const newEntry = await db.query.entries.findFirst({
  where: eq(entries.id, entry.id),
  orderBy: (entries, { desc }) => [desc(entries.createdAt)],
  with: {
    dynamicSections: {
      with: {
        contentTypes: true,
        sectionsToKeywords: {
          with: {
            keyword: true,
          },
        },
      },
    },
  },
});
```
Handle potential null return from db.query.entries.findFirst
Ensure that newEntry is not null before returning it. This prevents unexpected null values and potential runtime errors in downstream code.
Apply this diff to add a null check:
```diff
 const newEntry = await db.query.entries.findFirst({
   where: eq(entries.id, entry.id),
   // ...rest of the query
 });
+if (!newEntry) {
+  throw new AbortTaskRunError(`New entry not found for id: ${entry.id}`);
+}
 return newEntry;
```

Committable suggestion skipped: line range outside the PR's diff.
marketing planetscale instance

What does this PR do?
Fixes # (issue)
If there is not an issue for this, please create one first. This is used for tracking purposes and also helps us understand why this PR exists.
Type of change
How should this be tested?
Checklist
Required
- pnpm build
- pnpm fmt
- No console.logs
- git pull origin main

Appreciated
Summary by CodeRabbit
New Features
Documentation
Bug Fixes