feat(billing): add keyword research agent #2151
Conversation
📝 Walkthrough

The changes introduce a new configuration for the Drizzle ORM in …

Changes
Suggested reviewers
@p6l-richard is attempting to deploy a commit to the Unkey Team on Vercel. A member of the Team first needs to authorize it.
Thank you for following the naming conventions for pull request titles! 🙏
@chronark / @perkinsjr I removed this placeholder
```ts
randomize: true,
  },
},
dependenciesToBundle: [/^@unkey\//],
```
Deprecated; every dependency now gets bundled by default.
```diff
  ".react-email",
- ".content-collections"
+ ".content-collections",
+ ".trigger"
```
Made Biome ignore the `.trigger` code when running it locally.
We have the marketing database (schemas, client) inside `apps/billing/src/lib/db-marketing/*` since no other app is accessing it right now.
Actionable comments posted: 29
🧹 Outside diff range and nitpick comments (8)
apps/billing/drizzle.config.ts (1)
4-7: LGTM: Configuration settings are well-defined

The configuration settings are appropriate and well-structured. The use of a specific schema location and output directory promotes good organization.

Consider adding a comment explaining the purpose of the `verbose` setting, as it might not be immediately clear to all developers why it's enabled.

apps/billing/src/lib/env.ts (1)

16-18: LGTM! Consider enhancing API key validation.

The addition of the `FIRECRAWL_API_KEY` and `SERPER_API_KEY` environment variables is correct and aligns with the PR objectives. The implementation is consistent with the existing code style and structure.

To further improve the robustness of the environment variable validation, consider adding more specific validation for the API keys. For example, you could use a regex pattern to ensure they match the expected format of the respective APIs. Here's an example of how you might do this:

```ts
FIRECRAWL_API_KEY: z.string().regex(/^fc_[a-zA-Z0-9]{32}$/),
SERPER_API_KEY: z.string().regex(/^[a-zA-Z0-9]{32}$/),
```

Note: Please adjust the regex patterns to match the actual format of the API keys used by Firecrawl and Serper.
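As a quick sanity check of such a pattern outside zod, a plain-RegExp sketch (the `fc_`-prefixed key format is an illustrative assumption, as noted above, not Firecrawl's documented format):

```typescript
// Illustrative sketch: validate an API key's shape with a plain RegExp.
// The fc_-prefixed, 32-character format is an assumption for demonstration.
const FIRECRAWL_KEY_RE = /^fc_[a-zA-Z0-9]{32}$/;

function isValidFirecrawlKey(key: string): boolean {
  return FIRECRAWL_KEY_RE.test(key);
}

console.log(isValidFirecrawlKey("fc_" + "a".repeat(32))); // true
console.log(isValidFirecrawlKey("not-a-key")); // false
```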
apps/billing/package.json (1)
8-10: Approve script changes with a minor suggestion.

The updates to the scripts section look good:

- Switching from `pnpx` to `pnpm dlx` in the `dev` script ensures consistency with the project's package manager.
- The new `db:push` and `db:studio` scripts indicate the introduction of database management, likely using Drizzle ORM.

Consider adding a `db:generate` script for schema generation:

```diff
 "scripts": {
   "dev": "pnpm dlx trigger.dev@latest dev",
   "db:push": "dotenv -e .env drizzle-kit push",
-  "db:studio": "dotenv -e .env drizzle-kit studio"
+  "db:studio": "dotenv -e .env drizzle-kit studio",
+  "db:generate": "dotenv -e .env drizzle-kit generate:mysql"
 },
```

This will streamline the database schema management process.
biome.json (1)
44-45: Consider consistent treatment of ".content-collections" across ignore lists

While ".content-collections" is retained in the `linter.ignore` array, it has been removed from the `formatter.ignore` and `organizeImports.ignore` arrays. This inconsistency might lead to unexpected behavior where the content is linted but not formatted and doesn't have its imports organized.

Consider one of the following actions:

- If ".content-collections" should be ignored by all tools, add it back to the `formatter.ignore` and `organizeImports.ignore` arrays.
- If ".content-collections" should only be ignored by the linter, remove it from the `linter.ignore` array as well for consistency.

Please clarify the intended behavior for ".content-collections" and update the configuration accordingly.

Also applies to: 59-65, 70-76
apps/billing/src/lib/search-query.ts (1)
22-33: Correct the grammatical and typographical errors in the system prompt

There is a typographical error in the system prompt at line 24: "You're goal" should be "Your goal". Additionally, reviewing the prompt for grammatical improvements can enhance clarity and professionalism.
apps/billing/src/lib/serper.ts (1)
135-137: Improve Error Message Specificity

The error message at line 136 is generic. Providing more context can help with debugging and user feedback.

Update the error message to include the search query or other relevant details.

```diff
 if (!newResponse) {
-  throw new Error("Failed to retrieve newly inserted search response");
+  throw new Error(`Failed to retrieve search response for query '${args.query}'`);
 }
```

apps/billing/src/trigger/keyword-research.ts (2)

110-112: Simplify the filter for top organic results

The filter function on lines 110-112 can be simplified for clarity:

```ts
const topThreeOrganicResults = searchResponse.serperOrganicResults.slice(0, 3);
```

This assumes that `serperOrganicResults` is already sorted by position. If so, slicing the first three elements may be more efficient and readable.

Apply this diff if applicable:

```diff
-const topThreeOrganicResults = searchResponse.serperOrganicResults.filter(
-  (result) => result.position <= THREE,
-);
+const topThreeOrganicResults = searchResponse.serperOrganicResults.slice(0, 3);
```

183-195: Handle potential empty 'related_searches'

In lines 183-195, you're inserting related searches into the database. Ensure that `searchResponse.serperRelatedSearches` contains data before attempting to insert. Adding a check can prevent errors if the array is empty.

Apply this diff to include a safety check:

```diff
+if (searchResponse.serperRelatedSearches.length > 0) {
   await db
     .insert(keywords)
     .values(
       searchResponse.serperRelatedSearches.map((search) => ({
         inputTerm: searchQuery.inputTerm,
         keyword: search.query.toLowerCase(),
         source: "related_searches",
         updatedAt: sql`now()`,
       })),
     )
     .onDuplicateKeyUpdate({
       set: {
         updatedAt: sql`now()`,
       },
     });
+}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
`pnpm-lock.yaml` is excluded by `!**/pnpm-lock.yaml`
📒 Files selected for processing (17)
- apps/billing/drizzle.config.ts (1 hunks)
- apps/billing/package.json (2 hunks)
- apps/billing/src/lib/db-marketing/client.ts (1 hunks)
- apps/billing/src/lib/db-marketing/schemas/firecrawl.ts (1 hunks)
- apps/billing/src/lib/db-marketing/schemas/index.ts (1 hunks)
- apps/billing/src/lib/db-marketing/schemas/keywords.ts (1 hunks)
- apps/billing/src/lib/db-marketing/schemas/searchQuery.ts (1 hunks)
- apps/billing/src/lib/db-marketing/schemas/serper.ts (1 hunks)
- apps/billing/src/lib/env.ts (1 hunks)
- apps/billing/src/lib/firecrawl.ts (1 hunks)
- apps/billing/src/lib/search-query.ts (1 hunks)
- apps/billing/src/lib/serper.ts (1 hunks)
- apps/billing/src/trigger/example.ts (0 hunks)
- apps/billing/src/trigger/keyword-research.ts (1 hunks)
- apps/billing/trigger.config.ts (0 hunks)
- biome.json (1 hunks)
- internal/billing/package.json (1 hunks)
💤 Files with no reviewable changes (2)
- apps/billing/src/trigger/example.ts
- apps/billing/trigger.config.ts
✅ Files skipped from review due to trivial changes (1)
- apps/billing/src/lib/db-marketing/schemas/index.ts
🔇 Additional comments (24)
apps/billing/drizzle.config.ts (1)
1-3: LGTM: Proper import and export structure

The import of `defineConfig` from 'drizzle-kit' and the default export of the configuration object are correctly implemented. This structure ensures type safety and enables autocompletion in IDEs.

internal/billing/package.json (1)

20-22: New dependencies added for keyword research task.

The addition of the `@mendable/firecrawl-js` and `ai` packages aligns with the PR objective of implementing a keyword research task.

- `@mendable/firecrawl-js` (v1.5.2) will be used for scraping the top search results as mentioned in the PR description.
- `ai` (v3.0.23) is likely to be used for processing the scraped data or generating keywords, although its specific use is not detailed in the PR objectives.

To ensure these dependencies are used in the implementation, let's check for their imports:
✅ Verification successful

Dependencies usage verified.

The `@mendable/firecrawl-js` and `ai` packages are correctly imported and utilized in the following files:

- apps/billing/src/lib/firecrawl.ts
- apps/billing/src/trigger/keyword-research.ts
- apps/billing/src/lib/search-query.ts

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

```shell
#!/bin/bash
# Description: Verify the usage of new dependencies in the codebase.

# Test 1: Search for imports of @mendable/firecrawl-js
echo "Searching for @mendable/firecrawl-js imports:"
rg --type typescript 'from\s+[\'"]@mendable/firecrawl-js[\'"]'

# Test 2: Search for imports of ai package
echo "Searching for ai package imports:"
rg --type typescript 'from\s+[\'"]ai[\'"]'
```

Length of output: 195

Script:

```shell
#!/bin/bash
# Description: Corrected script to verify the usage of new dependencies in the codebase.

# Test 1: Search for imports of @mendable/firecrawl-js
echo "Searching for @mendable/firecrawl-js imports:"
rg --type typescript "from\s+[\"']@mendable/firecrawl-js[\"']"

# Test 2: Search for imports of ai package
echo "Searching for ai package imports:"
rg --type typescript "from\s+[\"']ai[\"']"
```

Length of output: 365

Script:

```shell
#!/bin/bash
# Description: Corrected script to verify the usage of new dependencies in the codebase by targeting .ts and .tsx files.

# Test 1: Search for imports of @mendable/firecrawl-js
echo "Searching for @mendable/firecrawl-js imports in .ts and .tsx files:"
rg 'from\s+[\"']@mendable/firecrawl-js[\"']' -g '*.ts' -g '*.tsx'

# Test 2: Search for imports of ai package
echo "Searching for ai package imports in .ts and .tsx files:"
rg 'from\s+[\"']ai[\"']' -g '*.ts' -g '*.tsx'
```

Length of output: 577

Script:

```shell
#!/bin/bash
# Description: Fixed script to verify the usage of new dependencies in the codebase by correctly formatting regex patterns.

# Test 1: Search for imports of @mendable/firecrawl-js
echo "Searching for @mendable/firecrawl-js imports in .ts and .tsx files:"
rg "from\s+[\"']@mendable/firecrawl-js[\"']" -g '*.ts' -g '*.tsx'

# Test 2: Search for imports of ai package
echo "Searching for ai package imports in .ts and .tsx files:"
rg "from\s+[\"']ai[\"']" -g '*.ts' -g '*.tsx'
```

Length of output: 630
apps/billing/src/lib/db-marketing/client.ts (2)
1-4: LGTM: Imports and module structure are well-organized.

The imports are appropriate for setting up a Drizzle ORM connection with PlanetScale. Separating the schema definitions into a local module (`./schemas`) is a good practice for maintainability.

12-15: Clarify the purpose of cache header removal in the custom fetch function.

The custom fetch function removes the cache header from the request. While this might be necessary for certain scenarios, it's not immediately clear why this is needed.

Could you please provide more context on why the cache header needs to be removed? This information would be valuable for future maintainers.

Consider adding a comment explaining the rationale, for example:

```ts
fetch: (url: string, init: any) => {
  // Remove cache header to ensure fresh data on each request
  // This is necessary because [explain the specific reason]
  (init as any).cache = undefined;
  return fetch(url, init);
},
```

Additionally, to verify if this cache removal is a common practice or specific to this project, we can search for similar patterns in the codebase:

This will help us understand if this is a project-wide practice or specific to this database client.

✅ Verification successful

Cache header removal is consistently applied across the codebase.

The custom fetch function removes the cache header to ensure fresh data on each request. This pattern is consistently used in multiple components, indicating it's a standard practice within the project.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

```shell
#!/bin/bash
# Search for similar cache removal patterns in other files
rg --type typescript "cache\s*=\s*undefined" -g '!apps/billing/src/lib/db-marketing/client.ts'
```

Length of output: 131

Script:

```shell
#!/bin/bash
# Search for similar cache removal patterns in other TypeScript files
rg "cache\s*=\s*undefined" --type ts -g '!apps/billing/src/lib/db-marketing/client.ts'
```

Length of output: 800
apps/billing/package.json (3)
42-42: Approve addition of drizzle-kit devDependency.

The addition of `drizzle-kit` as a devDependency is appropriate:

- It's consistent with the introduction of Drizzle ORM for database management.
- This package is necessary for running the new database scripts (`db:push` and `db:studio`) added earlier.

Line range hint 1-44: Summary: Changes align well with PR objectives.

The modifications to `package.json` effectively support the introduction of the keyword research agent and database management using Drizzle ORM:

- New dependencies for keyword research and AI functionality have been added.
- Drizzle ORM-related packages have been introduced for database management.
- New scripts for database operations have been included.
- Existing dependencies have been updated to their latest versions.

These changes are well-structured, follow best practices, and align with the PR objectives.

16-17: Approve dependency changes with a security suggestion.

The updates to the dependencies section are appropriate and align with the PR objectives:

- New dependencies (@agentic/serper, @ai-sdk/openai, @mendable/firecrawl-js) support the keyword research functionality.
- Updates to existing dependencies (@trigger.dev/nextjs, @trigger.dev/sdk, @trigger.dev/slack) keep the project up to date.
- Addition of Drizzle-related packages (drizzle-orm, drizzle-zod) confirms the introduction of Drizzle ORM for database management.

As a security precaution, please run an audit to check for any known vulnerabilities in the new dependencies. If the result is greater than 0, please review the vulnerabilities and update the dependencies if necessary.

Also applies to: 20-20, 22-24, 27-27, 31-33
biome.json (1)
44-45: LGTM: Addition of ".trigger" to linter ignore list

The addition of ".trigger" to the `linter.ignore` array is appropriate for the new `keyword-research` task within the `billing` trigger project. This change will prevent the linter from analyzing the trigger-related code, which may have specific requirements or generated code that shouldn't be linted.

apps/billing/src/lib/db-marketing/schemas/keywords.ts (3)

25-28: Verify foreign key relationship for 'inputTerm'

Ensure that the data types and constraints for `keywords.inputTerm` and `searchQueries.inputTerm` are consistent, including length and nullability. This will prevent potential data integrity issues and ensure that the foreign key relationship functions as expected.

6-22: Well-structured schema with appropriate indexes and constraints

The schema definition for the `keywords` table is well-organized, with clear field definitions, indexes, and a unique constraint on `inputTerm` and `keyword`. The use of Drizzle ORM conventions and best practices is evident.

29-32: ⚠️ Potential issue: Handle nullability in 'sourceUrl' foreign key relationship

The `sourceUrl` field in the `keywords` table is nullable, but it's referencing `serperOrganicResults.link`. Verify that `serperOrganicResults.link` can accommodate null values or adjust the foreign key constraint to handle nullable references appropriately. This will prevent potential issues when `sourceUrl` is `NULL`.

apps/billing/src/lib/db-marketing/schemas/firecrawl.ts (2)
41-46: Verify Compatibility of Foreign Key Relationship

In the `firecrawlResponsesRelations`, the foreign key relationship links `firecrawlResponses.sourceUrl` to `serperOrganicResults.link`. Ensure that both fields have compatible data types and lengths to maintain referential integrity.

Run the following script to compare the definitions:

```shell
#!/bin/bash
# Description: Verify that the data types and lengths of the related fields are compatible.
# Expected: Both fields should be of the same type and length.

# Retrieve the definition of 'serperOrganicResults.link'
ast-grep --lang typescript --pattern $'link: varchar($_, { length: $_ }).notNull()' ./apps/billing/src/lib/db-marketing/schemas/serper.ts --json

# Retrieve the definition of 'firecrawlResponses.sourceUrl'
ast-grep --lang typescript --pattern $'sourceUrl: varchar($_, { length: $_ }).notNull()' ./apps/billing/src/lib/db-marketing/schemas/firecrawl.ts --json
```

Confirm that the lengths and nullability constraints match between the two fields.

21-31: Verify VARCHAR Lengths and Index Key Size Limits

Fields like `sourceUrl`, `title`, `ogTitle`, `ogDescription`, `ogImage`, and `ogSiteName` have `VARCHAR` lengths set to 767. While this accommodates the maximum index length for InnoDB (767 bytes), using large `VARCHAR` lengths in indexed columns may cause issues, especially with multibyte character sets like UTF-8.

Run the following script to ensure that the combined index lengths do not exceed MySQL's limitations:

Ensure that the sum of the indexed `VARCHAR` lengths multiplied by the maximum bytes per character does not exceed the InnoDB index key limit.

apps/billing/src/lib/db-marketing/schemas/searchQuery.ts (1)
39-44: ⚠️ Potential issue: Verify the correctness of the relationship definition.

According to the comments on lines 37-38, the foreign key is stored in the `serperSearchResponses` table, and the `searchQueries` relation should have neither `fields` nor `references`. However, in the relation definition on lines 39-44, `fields` and `references` are specified. Please confirm that this is intentional. If the foreign key resides in `serperSearchResponses`, you may need to adjust the relation definition to reflect this.

apps/billing/src/lib/serper.ts (3)
70-70: Ensure Successful Completion of All Insert Operations

The `Promise.all` at line 70 waits for all insert operations to complete. If any insert fails, it will cause the entire promise to reject, and the transaction should roll back. However, since you're filtering the `insertPromises` array with `filter(Boolean)`, ensure that this doesn't inadvertently omit any necessary operations.

Confirm that all required insert operations are included in the `insertPromises` array and that partial failures are appropriately handled by the transaction.

16-82: Consider Bulk Insert Optimization

In the `persistSerperResults` function, you're performing multiple insert operations for different types of search results. Depending on the volume of data, this could be optimized by using bulk insert operations or adjusting the transaction settings. [performance]

Assess whether batching the insert operations or using more efficient database operations could improve performance.

103-105: Check for Completeness of Existing Response

The condition at line 103 checks for `existingResponse.serperOrganicResults.length > 0`. Ensure that other related data such as `serperRelatedSearches` and `serperPeopleAlsoAsk` are also present if they are critical for your application's functionality.

Consider enhancing the completeness check to include other related data:

```diff
 if (
   existingResponse &&
   existingResponse.serperOrganicResults.length > 0 &&
+  existingResponse.serperRelatedSearches.length > 0 &&
+  existingResponse.serperPeopleAlsoAsk.length > 0
 ) {
   return existingResponse;
 }
```

apps/billing/src/lib/db-marketing/schemas/serper.ts (3)
55-55: Ensure consistent nullability of `imageUrl` field across tables.

In `serperOrganicResults`, the `imageUrl` field is optional, whereas in `serperTopStories`, it is defined as `notNull()`. Verify if this difference is intentional. If images may not always be available, consider making `imageUrl` optional in `serperTopStories`.

Also applies to: 119-119

15-15: Verify data types and constraints for `inputTerm` in relations.

Ensure that the data type and length of `inputTerm` in `serperSearchResponses` and `searchQueries` are consistent to maintain referential integrity and prevent potential issues.

Also applies to: 28-31

52-52: 🛠️ Refactor suggestion: Review the maximum length of `link` fields.

The `link` fields are defined with a maximum length of 767 characters. Some URLs can exceed this length. Consider increasing the length to accommodate longer URLs, or ensure that truncation or validation is handled appropriately.

Also applies to: 84-84, 116-116, 151-151

apps/billing/src/trigger/keyword-research.ts (4)
apps/billing/src/trigger/keyword-research.ts (4)
17-18: Verify retry policy: 'maxAttempts' is set to 0Setting
maxAttempts: 0means the task will not retry upon failure. Is this intended? If you want the task to retry on failure, consider settingmaxAttemptsto a positive value.
68-68: Verify the OpenAI model identifier 'gpt-4o'In lines 68 and 139, the OpenAI model is specified as
'gpt-4o':model: openai("gpt-4o"),Please verify if
'gpt-4o'is a valid model identifier. Typically, OpenAI models are referred to as'gpt-4','gpt-3.5-turbo', etc. If'gpt-4o'is a custom model or alias, ensure that it's correctly configured.Also applies to: 139-139
37-37: Verify specificity to API development in the promptIn line 37, the prompt includes:
- - Focus on technical and context-specific terms related to API development.Is the keyword research intended to be specific to API development? If the tool is meant to handle a variety of topics, consider generalizing this line to make the prompt more adaptable.
125-125: Ensure correct extraction of headers from markdownIn line 125, you're using a regular expression to match headers:
${content.markdown?.match(/^##\s+(.*)$/gm)?.join("\n")}Ensure that this regex correctly captures the headers you intend to extract. It currently matches lines starting with
##. If you also want to include#or deeper header levels like###, consider adjusting the regex.Apply this diff if you want to include all header levels:
-${content.markdown?.match(/^##\s+(.*)$/gm)?.join("\n")} +${content.markdown?.match(/^#+\s+(.*)$/gm)?.join("\n")}
```ts
export const db = drizzle(
  new Client({
    host: process.env.MARKETING_DATABASE_HOST!,
    username: process.env.MARKETING_DATABASE_USERNAME!,
    password: process.env.MARKETING_DATABASE_PASSWORD!,
```
Enhance robustness with environment variable validation and error handling.
While using environment variables for connection details is a good security practice, the current implementation has some potential issues:

- The use of non-null assertions (`!`) on environment variables could lead to runtime errors if the variables are not set.
- There's no error handling or connection validation.

Consider implementing the following improvements:

1. Validate environment variables before use:

```ts
const requiredEnvVars = [
  'MARKETING_DATABASE_HOST',
  'MARKETING_DATABASE_USERNAME',
  'MARKETING_DATABASE_PASSWORD'
] as const;

for (const envVar of requiredEnvVars) {
  if (!process.env[envVar]) {
    throw new Error(`Missing required environment variable: ${envVar}`);
  }
}
```

2. Add error handling and connection validation:

```ts
import { drizzle } from "drizzle-orm/planetscale-serverless";
import { Client } from "@planetscale/database";
import * as schema from "./schemas";

const client = new Client({
  host: process.env.MARKETING_DATABASE_HOST,
  username: process.env.MARKETING_DATABASE_USERNAME,
  password: process.env.MARKETING_DATABASE_PASSWORD,
  fetch: (url: string, init: any) => {
    (init as any).cache = undefined; // Remove cache header
    return fetch(url, init);
  },
});

export const db = drizzle(client, { schema });

// Validate the connection
export async function validateDbConnection() {
  try {
    await client.execute('SELECT 1');
    console.log('Database connection successful');
  } catch (error) {
    console.error('Failed to connect to the database:', error);
    throw error;
  }
}
```

These changes will make the database connection setup more robust and easier to debug.
Also applies to: 17-19
```ts
inputTerm: varchar("input_term", { length: 255 }).notNull(),
keyword: varchar("keyword", { length: 255 }).notNull(),
source: varchar("source", { length: 255 }).notNull(),
sourceUrl: varchar("source_url", { length: 767 }),
```
Potential issue with 'sourceUrl' length and indexing
The sourceUrl field is defined with a length of 767 characters and is being indexed (line 19). In MySQL with InnoDB and utf8mb4 character set, the maximum index length is 191 characters (since 191 * 4 = 764 bytes). Indexing a VARCHAR(767) column might cause errors due to exceeding the index length limit.
Consider reducing the length of sourceUrl to 191 characters or implementing a prefix index to ensure compatibility.
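As a sketch of the prefix-index alternative, assuming plain MySQL DDL (the index name is illustrative):

```sql
-- Index only the first 191 characters of source_url so the key stays within
-- InnoDB's 767-byte index limit under utf8mb4 (191 * 4 = 764 bytes).
CREATE INDEX source_url_idx ON keywords (source_url(191));
```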
```ts
source: varchar("source", { length: 255 }).notNull(),
sourceUrl: varchar("source_url", { length: 767 }),
createdAt: timestamp("created_at").defaultNow().notNull(),
updatedAt: timestamp("updated_at").defaultNow().notNull(),
```
🛠️ Refactor suggestion
Ensure 'updatedAt' automatically reflects record modifications
The updatedAt timestamp is set with defaultNow().notNull(), which initializes it at creation time but doesn't update it on record modifications. To keep updatedAt accurate, consider configuring it to automatically update whenever the record is modified. This can be achieved by setting the default to NOW() and adding an ON UPDATE CURRENT_TIMESTAMP clause in the table definition.
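A sketch of the suggested columns in plain MySQL DDL (assuming they are applied to this table; the Drizzle schema builder would need the equivalent on-update behavior):

```sql
-- created_at is set once at insert time; updated_at is initialized the same
-- way but refreshed automatically on every UPDATE of the row.
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
```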
```ts
ogDescription: varchar("og_description", { length: 767 }),
ogUrl: text("og_url"),
ogImage: varchar("og_image", { length: 767 }),
xogSiteName: varchar("og_site_name", { length: 767 }),
```
Typo in Property Name 'xogSiteName'
In line 30, the property is named xogSiteName, but it seems to be a typo. It should likely be ogSiteName to match the pattern of other Open Graph fields.
Apply this diff to fix the typo:

```diff
- xogSiteName: varchar("og_site_name", { length: 767 }),
+ ogSiteName: varchar("og_site_name", { length: 767 }),
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```ts
ogSiteName: varchar("og_site_name", { length: 767 }),
```
```ts
keywordsFromHeaders.object.keywords.map((keyword) => ({
  inputTerm: searchQuery.inputTerm,
  keyword: keyword.keyword.toLowerCase(),
  sourceUrl: keyword.sourceUrl,
  source: "headers",
})),
)
```
🛠️ Refactor suggestion
Consider handling duplicate keywords across different sources
When inserting keywords from titles, headers, and related searches, duplicates might exist across these sources. Currently, duplicates are handled within each insertion but not across different sources. Consider de-duplicating keywords globally to prevent redundancy.
Also applies to: 184-190
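A minimal sketch of such global de-duplication (the helper and row shape are hypothetical; field names follow the schema in this PR):

```typescript
type KeywordRow = { inputTerm: string; keyword: string; source: string };

// Keep the first occurrence of each (inputTerm, keyword) pair, regardless
// of which source (titles, headers, related_searches) produced it.
function dedupeKeywords(rows: KeywordRow[]): KeywordRow[] {
  const seen = new Map<string, KeywordRow>();
  for (const row of rows) {
    const key = `${row.inputTerm}::${row.keyword.toLowerCase()}`;
    if (!seen.has(key)) {
      seen.set(key, row);
    }
  }
  return [...seen.values()];
}

const rows: KeywordRow[] = [
  { inputTerm: "api", keyword: "Rate Limiting", source: "titles" },
  { inputTerm: "api", keyword: "rate limiting", source: "headers" },
  { inputTerm: "api", keyword: "api keys", source: "related_searches" },
];

console.log(dedupeKeywords(rows).length); // 2
```

Running this once over the combined rows before the insert would let a single `onDuplicateKeyUpdate` handle the remaining database-level collisions.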
```ts
const keywordResearchSystemPrompt = `
You are an SEO Expert & Content Writer specializing in creating technical content for Developer Tools that are highly SEO optimized.

**Your Objectives:**
1. **Keyword Extraction:**
   - Extract relevant keywords from the titles of top-ranking organic search results.
   - Focus on technical and context-specific terms related to API development.

2. **Quality Assurance:**
   - **Remove Stopwords:** Ensure that keywords do not include common stopwords (e.g., "for," "and," "the," "of," etc.).
   - **Remove Brand Names:** Ensure that keywords do not include brand names (e.g., "GitHub", "YouTube", "npm", etc.).
   - **Remove README keywords:** Ensure to exclude from instructive headers or titles (e.g., "getting started", "installation", etc.) of readmes.

**Guidelines:**
- Prioritize keywords that directly relate to the main term and its subtopics.
- Maintain a focus on terms that potential users or developers are likely to search for in the context of API development.
- Branded keywords should be included in the keywordsWithBrandNames and not in the keywords.
`;
```
🛠️ Refactor suggestion
Externalize prompt templates for maintainability
The prompt strings in lines 31-48 and 131-137 are lengthy and embedded within the code. Consider externalizing these prompt templates into separate files or constants for better maintainability and readability.
Also applies to: 131-137
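A sketch of what externalizing could look like — a hypothetical prompts module exposing a small typed builder instead of inline template literals (module name and parameter are assumptions):

```typescript
// Hypothetical prompts module: long prompt text lives here and task code
// imports it, optionally parameterized instead of hard-coding the domain.
const keywordResearchSystemPrompt = (domain: string): string =>
  `
You are an SEO Expert & Content Writer specializing in ${domain}.
Extract relevant keywords from the titles of top-ranking organic search results.
`.trim();

console.log(keywordResearchSystemPrompt("Developer Tools").includes("Developer Tools")); // true
```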
```ts
export async function getTopResultsContent(args: { urls: string[] }) {
  // Check for existing responses with markdown content
  const existingResponses = await db.query.firecrawlResponses.findMany({
    where: and(
      inArray(firecrawlResponses.sourceUrl, args.urls),
      isNotNull(firecrawlResponses.markdown),
    ),
  });

  const existingUrlSet = new Set(existingResponses.map((r) => r.sourceUrl));
  const urlsToScrape = args.urls.filter((url) => !existingUrlSet.has(url));

  // Scrape new URLs, then combine existing and new responses
  const newResponses = await Promise.all(urlsToScrape.map(scrapeAndStoreUrl));
  return [...existingResponses, ...newResponses];
}
```
🛠️ Refactor suggestion
Refactor duplicated code to improve maintainability
The functions getTopResultsContent and getScrapedContentMany share similar logic for checking existing responses, determining URLs to scrape, scraping new URLs, and combining results. Consider refactoring the common code into a shared helper function to adhere to the DRY (Don't Repeat Yourself) principle and improve maintainability.
Also applies to: 84-97
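One way to factor the shared logic, as a sketch (names are hypothetical; the real functions would supply their own DB lookup and scraper callbacks):

```typescript
// Generic "return cached rows, scrape the rest" helper that both
// getTopResultsContent and getScrapedContentMany could delegate to.
async function getOrScrape<T extends { sourceUrl: string }>(
  urls: string[],
  findExisting: (urls: string[]) => Promise<T[]>,
  scrape: (url: string) => Promise<T>,
): Promise<T[]> {
  const existing = await findExisting(urls);
  const cached = new Set(existing.map((r) => r.sourceUrl));
  const fresh = await Promise.all(
    urls.filter((u) => !cached.has(u)).map(scrape),
  );
  return [...existing, ...fresh];
}

// Usage with in-memory stubs standing in for the DB and Firecrawl:
const stored = [{ sourceUrl: "https://a.example", markdown: "cached" }];
getOrScrape(
  ["https://a.example", "https://b.example"],
  async (urls) => stored.filter((r) => urls.includes(r.sourceUrl)),
  async (url) => ({ sourceUrl: url, markdown: "scraped" }),
).then((results) => console.log(results.length)); // 2
```

Each caller then reduces to one `getOrScrape` call with its own `where` clause and scraper.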
```ts
const [errorResponse] = await db
  .insert(firecrawlResponses)
  .values({
    sourceUrl: url,
    error: firecrawlResult.error || "Unknown error occurred",
    success: false,
  })
  .$returningId();
return await db.query.firecrawlResponses.findFirst({
  where: eq(firecrawlResponses.id, errorResponse.id),
});
}
```
🛠️ Refactor suggestion
Optimize database operations to avoid unnecessary queries
After inserting responses into the database in both the error handling (lines 33-44) and success handling (lines 46-67) blocks, the code performs additional queries to retrieve the same records. You can modify the insert operations to return the inserted records directly using $returning(), eliminating the need for separate queries and improving performance.
Apply this diff to both insert operations:

```diff
 // Error handling insert
 const [errorResponse] = await db
   .insert(firecrawlResponses)
   .values({
     sourceUrl: url,
     error: firecrawlResult.error || "Unknown error occurred",
     success: false,
-  })
-  .$returningId();
-return await db.query.firecrawlResponses.findFirst({
-  where: eq(firecrawlResponses.id, errorResponse.id),
-});
+  })
+  .$returning(); // Returns the inserted record
+return errorResponse;

 // Success handling insert
 const [insertedResponse] = await db
   .insert(firecrawlResponses)
   .values({
     success: firecrawlResult.success,
     markdown: firecrawlResult.markdown ?? null,
     sourceUrl: firecrawlResult.metadata?.sourceURL || url,
     scrapeId: firecrawlResult.metadata?.scrapeId || "",
     title: firecrawlResult.metadata?.title || "",
     description: firecrawlResult.metadata?.description || "",
     language: firecrawlResult.metadata?.language || "",
     ogTitle: firecrawlResult.metadata?.ogTitle || "",
     ogDescription: firecrawlResult.metadata?.ogDescription || "",
     ogUrl: firecrawlResult.metadata?.ogUrl || "",
     ogImage: firecrawlResult.metadata?.ogImage || "",
     ogSiteName: firecrawlResult.metadata?.ogSiteName || "",
     error: null,
-  })
-  .$returningId();
-return await db.query.firecrawlResponses.findFirst({
-  where: eq(firecrawlResponses.id, insertedResponse.id),
-});
+  })
+  .$returning(); // Returns the inserted record
+return insertedResponse;
```

Also applies to: 46-67
@chronark AFAIU, `$returning()` isn't supported in MySQL (only in Postgres)?
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
```ts
const firecrawl = new FirecrawlApp({
  apiKey: process.env.FIRECRAWL_API_KEY!,
});
```
Add validation for FIRECRAWL_API_KEY environment variable
Currently, the code uses a non-null assertion ! on process.env.FIRECRAWL_API_KEY, which could lead to runtime errors if the environment variable is not set. Consider adding a validation check to ensure that FIRECRAWL_API_KEY is defined before initializing FirecrawlApp.
Apply this diff to add validation:

```diff
+if (!process.env.FIRECRAWL_API_KEY) {
+  throw new Error("FIRECRAWL_API_KEY environment variable is not defined");
+}
 const firecrawl = new FirecrawlApp({
-  apiKey: process.env.FIRECRAWL_API_KEY!,
+  apiKey: process.env.FIRECRAWL_API_KEY,
 });
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```ts
if (!process.env.FIRECRAWL_API_KEY) {
  throw new Error("FIRECRAWL_API_KEY environment variable is not defined");
}
const firecrawl = new FirecrawlApp({
  apiKey: process.env.FIRECRAWL_API_KEY,
});
```
* feat(billing): add keyword research agent
* `pnpm fmt`
* `pnpm fmt` pt 2
* remove db:generate

---------

Co-authored-by: chronark <dev@chronark.com>
What does this PR do?
See demo here.
- Adds a `keyword-research` task to the trigger project `billing`

Typically, it returns anywhere between 15-30 related keywords to be used to write content optimized for search engines.
Fixes # (issue)
n/a
Type of change
How should this be tested?

- Run the `billing` trigger project locally

Checklist
Required

- `pnpm build`
- `pnpm fmt`
- No stray `console.log`s
- `git pull origin main`

Appreciated
Summary by CodeRabbit

- New Features
- Bug Fixes
- Documentation
- Chores