
feat(billing): add keyword research agent#2151

Merged
chronark merged 7 commits into unkeyed:main from p6l-richard:richard/keyword-research
Oct 3, 2024

Conversation

@p6l-richard
Contributor

@p6l-richard p6l-richard commented Oct 2, 2024

What does this PR do?

See demo here.

  • adds a keyword-research task to the trigger project billing
  • the task performs a google search through serper api
  • the task scrapes the top 3 search results through firecrawl api
  • the task extracts a list of keywords from:
  1. the title tags of the top 10 search results
  2. the h1 headers of the top 3 search results
  3. related searches from the Google search

Typically, it returns between 15 and 30 related keywords that can be used to write search-engine-optimized content.
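Conceptually, the final aggregation step can be sketched in plain TypeScript. This is an illustrative sketch only — the function and field names below are hypothetical and not taken from the PR's actual code:

```typescript
// Hypothetical sketch of the keyword-aggregation step described above.
interface ScrapedPage {
  h1s: string[]; // h1 headers extracted from one scraped search result
}

function aggregateKeywords(
  titleTags: string[], // title tags of the top search results
  scrapedPages: ScrapedPage[], // the top 3 scraped pages
  relatedSearches: string[], // "related searches" from the SERP
): string[] {
  const candidates = [
    ...titleTags,
    ...scrapedPages.flatMap((page) => page.h1s),
    ...relatedSearches,
  ].map((keyword) => keyword.trim().toLowerCase());
  // Dedupe and drop empty strings.
  return [...new Set(candidates)].filter((keyword) => keyword.length > 0);
}
```

Normalizing to lowercase before deduplication is what keeps a title tag like "API Keys" and an h1 like "api keys" from producing two entries.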

Fixes # (issue)

n/a

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Chore (refactoring code, technical debt, workflow improvements)
  • Enhancement (small improvements)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How should this be tested?

  • run the billing trigger project locally
pnpm dev --filter=billing

Checklist

Required

  • Filled out the "How to test" section in this PR
  • Read Contributing Guide
  • Self-reviewed my own code
  • Commented on my code in hard-to-understand areas
  • Ran pnpm build
  • Ran pnpm fmt
  • Checked for warnings, there are none
  • Removed all console.logs
  • Merged the latest changes from main onto my branch with git pull origin main
  • My changes don't cause any responsiveness issues

Appreciated

  • If a UI change was made: Added a screen recording or screenshots to this PR
  • Updated the Unkey Docs if changes were necessary

Summary by CodeRabbit

  • New Features

    • Introduced a new database configuration for Drizzle ORM, enhancing ORM functionality.
    • Added new scripts for database management in the billing application.
    • Implemented new schemas for managing search queries and keywords, improving data organization.
    • Added functionality for web scraping and keyword research using the Firecrawl API and OpenAI.
  • Bug Fixes

    • Improved error handling during database operations and API interactions.
  • Documentation

    • Updated environment variable validation to include new API keys.
  • Chores

    • Removed unnecessary dependencies and cleaned up configuration files.

@changeset-bot

changeset-bot bot commented Oct 2, 2024

⚠️ No Changeset found

Latest commit: 36eb9ae

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR
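For context, a changeset is just a small markdown file committed under the .changeset/ directory. A minimal sketch — the package name and bump type below are illustrative assumptions, not taken from this PR:

```
---
"@unkey/billing": patch
---

Add a keyword research agent to the billing trigger project.
```

The frontmatter names each package to bump and its semver level (patch, minor, or major); the body becomes the changelog entry.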

@CLAassistant

CLAassistant commented Oct 2, 2024

CLA assistant check
All committers have signed the CLA.

@coderabbitai
Contributor

coderabbitai bot commented Oct 2, 2024

📝 Walkthrough

The changes introduce a new configuration for the Drizzle ORM in drizzle.config.ts, along with several new schema definitions for database tables in the db-marketing directory. The package.json file is updated with new scripts and dependencies, and new functionality for scraping and managing search queries is added through various new files. Additionally, environment variables for API keys are included, and a previous example task is removed. The overall structure of the application is enhanced to support new features related to database interactions and API integrations.

Changes

File Path Change Summary
apps/billing/drizzle.config.ts Introduced new ORM configuration with logging, schema location, output directory, database dialect, and SSL settings.
apps/billing/package.json Updated dev script, added db:push and db:studio scripts, and modified dependencies with several new additions and updates.
apps/billing/src/lib/db-marketing/client.ts Added a new file to establish a database connection using Drizzle ORM with PlanetScale client.
apps/billing/src/lib/db-marketing/schemas/firecrawl.ts Introduced schema for firecrawl_responses table with fields, indexes, and relationships.
apps/billing/src/lib/db-marketing/schemas/index.ts Added exports for serper, searchQuery, keywords, and firecrawl schemas.
apps/billing/src/lib/db-marketing/schemas/keywords.ts Defined schema for keywords table with fields, indexes, and relationships.
apps/billing/src/lib/db-marketing/schemas/searchQuery.ts Created schema for search_queries table with fields, constraints, and relationships.
apps/billing/src/lib/db-marketing/schemas/serper.ts Introduced multiple schemas for search-related data management with defined tables and relationships.
apps/billing/src/lib/env.ts Added new environment variables FIRECRAWL_API_KEY and SERPER_API_KEY for validation.
apps/billing/src/lib/firecrawl.ts Implemented functions for scraping content from URLs using Firecrawl API.
apps/billing/src/lib/search-query.ts Added getOrCreateSearchQuery function for managing search queries.
apps/billing/src/lib/serper.ts Introduced functions for persisting and retrieving search results from the Serper API.
apps/billing/src/trigger/example.ts Removed the helloWorld2Task function.
apps/billing/src/trigger/keyword-research.ts Added keywordResearchTask for conducting keyword research using OpenAI's model.
apps/billing/trigger.config.ts Removed dependenciesToBundle property from the configuration.
biome.json Updated ignore lists in linter, formatter, and organizeImports sections.
internal/billing/package.json Added new dependencies @mendable/firecrawl-js and ai with specified versions.

Suggested reviewers

  • mcstepp
  • perkinsjr


@vercel

vercel bot commented Oct 2, 2024

@p6l-richard is attempting to deploy a commit to the Unkey Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions
Contributor

github-actions bot commented Oct 2, 2024

Thank you for following the naming conventions for pull request titles! 🙏

Contributor Author


@chronark / @perkinsjr i removed this placeholder

randomize: true,
},
},
dependenciesToBundle: [/^@unkey\//],
Contributor Author


deprecated, now every dep gets bundled by default

".react-email",
".content-collections"
".content-collections",
".trigger"
Contributor Author


made biome ignore the .trigger code when running it locally

Contributor Author


we have the marketing database (schemas, client) inside apps/billing/src/lib/db-marketing/* since we don't have any other app accessing it right now.

@p6l-richard
Contributor Author

pnpm build failed for me because I don't have Go installed.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 29

🧹 Outside diff range and nitpick comments (8)
apps/billing/drizzle.config.ts (1)

4-7: LGTM: Configuration settings are well-defined

The configuration settings are appropriate and well-structured. The use of a specific schema location and output directory promotes good organization.

Consider adding a comment explaining the purpose of the verbose setting, as it might not be immediately clear to all developers why it's enabled.

apps/billing/src/lib/env.ts (1)

16-18: LGTM! Consider enhancing API key validation.

The addition of FIRECRAWL_API_KEY and SERPER_API_KEY environment variables is correct and aligns with the PR objectives. The implementation is consistent with the existing code style and structure.

To further improve the robustness of the environment variable validation, consider adding more specific validation for the API keys. For example, you could use a regex pattern to ensure they match the expected format of the respective APIs. Here's an example of how you might do this:

FIRECRAWL_API_KEY: z.string().regex(/^fc_[a-zA-Z0-9]{32}$/),
SERPER_API_KEY: z.string().regex(/^[a-zA-Z0-9]{32}$/),

Note: Please adjust the regex patterns to match the actual format of the API keys used by Firecrawl and Serper.

apps/billing/package.json (1)

8-10: Approve script changes with a minor suggestion.

The updates to the scripts section look good:

  1. Switching from pnpx to pnpm dlx in the dev script ensures consistency with the project's package manager.
  2. The new db:push and db:studio scripts indicate the introduction of database management, likely using Drizzle ORM.

Consider adding a db:generate script for schema generation:

 "scripts": {
   "dev": "pnpm dlx trigger.dev@latest dev",
   "db:push": "dotenv -e .env drizzle-kit push",
-  "db:studio": "dotenv -e .env drizzle-kit studio"
+  "db:studio": "dotenv -e .env drizzle-kit studio",
+  "db:generate": "dotenv -e .env drizzle-kit generate:mysql"
 },

This will streamline the database schema management process.

biome.json (1)

44-45: Consider consistent treatment of ".content-collections" across ignore lists

While ".content-collections" is retained in the linter.ignore array, it has been removed from the formatter.ignore and organizeImports.ignore arrays. This inconsistency could mean the content is linted but neither formatted nor import-organized.

Consider one of the following actions:

  1. If ".content-collections" should be ignored by all tools, add it back to the formatter.ignore and organizeImports.ignore arrays.
  2. If ".content-collections" should only be ignored by the linter, remove it from the linter.ignore array as well for consistency.

Please clarify the intended behavior for ".content-collections" and update the configuration accordingly.

Also applies to: 59-65, 70-76

apps/billing/src/lib/search-query.ts (1)

22-33: Correct the grammatical and typographical errors in the system prompt

There is a typographical error in the system prompt at line 24: "You're goal" should be "Your goal". Additionally, reviewing the prompt for grammatical improvements can enhance clarity and professionalism.

apps/billing/src/lib/serper.ts (1)

135-137: Improve Error Message Specificity

The error message at line 136 is generic. Providing more context can help with debugging and user feedback.

Update the error message to include the search query or other relevant details.

  if (!newResponse) {
-   throw new Error("Failed to retrieve newly inserted search response");
+   throw new Error(`Failed to retrieve search response for query '${args.query}'`);
  }
apps/billing/src/trigger/keyword-research.ts (2)

110-112: Simplify the filter for top organic results

The filter function on lines 110-112 can be simplified for clarity:

const topThreeOrganicResults = searchResponse.serperOrganicResults.slice(0, 3);

This assumes that serperOrganicResults is already sorted by position. If so, slicing the first three elements may be more efficient and readable.

Apply this diff if applicable:

-const topThreeOrganicResults = searchResponse.serperOrganicResults.filter(
-  (result) => result.position <= THREE,
-);
+const topThreeOrganicResults = searchResponse.serperOrganicResults.slice(0, 3);

183-195: Handle potential empty 'related_searches'

In lines 183-195, you're inserting related searches into the database. Ensure that searchResponse.serperRelatedSearches contains data before attempting to insert. Adding a check can prevent errors if the array is empty.

Apply this diff to include a safety check:

+if (searchResponse.serperRelatedSearches.length > 0) {
     await db
       .insert(keywords)
       .values(
         searchResponse.serperRelatedSearches.map((search) => ({
           inputTerm: searchQuery.inputTerm,
           keyword: search.query.toLowerCase(),
           source: "related_searches",
           updatedAt: sql`now()`,
         })),
       )
       .onDuplicateKeyUpdate({
         set: {
           updatedAt: sql`now()`,
         },
       });
+}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files changed from the base of the PR, between commits b7f9828 and 1414a9a.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (17)
  • apps/billing/drizzle.config.ts (1 hunks)
  • apps/billing/package.json (2 hunks)
  • apps/billing/src/lib/db-marketing/client.ts (1 hunks)
  • apps/billing/src/lib/db-marketing/schemas/firecrawl.ts (1 hunks)
  • apps/billing/src/lib/db-marketing/schemas/index.ts (1 hunks)
  • apps/billing/src/lib/db-marketing/schemas/keywords.ts (1 hunks)
  • apps/billing/src/lib/db-marketing/schemas/searchQuery.ts (1 hunks)
  • apps/billing/src/lib/db-marketing/schemas/serper.ts (1 hunks)
  • apps/billing/src/lib/env.ts (1 hunks)
  • apps/billing/src/lib/firecrawl.ts (1 hunks)
  • apps/billing/src/lib/search-query.ts (1 hunks)
  • apps/billing/src/lib/serper.ts (1 hunks)
  • apps/billing/src/trigger/example.ts (0 hunks)
  • apps/billing/src/trigger/keyword-research.ts (1 hunks)
  • apps/billing/trigger.config.ts (0 hunks)
  • biome.json (1 hunks)
  • internal/billing/package.json (1 hunks)
💤 Files with no reviewable changes (2)
  • apps/billing/src/trigger/example.ts
  • apps/billing/trigger.config.ts
✅ Files skipped from review due to trivial changes (1)
  • apps/billing/src/lib/db-marketing/schemas/index.ts
🔇 Additional comments (24)
apps/billing/drizzle.config.ts (1)

1-3: LGTM: Proper import and export structure

The import of defineConfig from 'drizzle-kit' and the default export of the configuration object are correctly implemented. This structure ensures type safety and enables autocompletion in IDEs.

internal/billing/package.json (1)

20-22: New dependencies added for keyword research task.

The addition of @mendable/firecrawl-js and ai packages aligns with the PR objective of implementing a keyword research task.

  • @mendable/firecrawl-js (v1.5.2) will be used for scraping the top search results as mentioned in the PR description.
  • ai (v3.0.23) is likely to be used for processing the scraped data or generating keywords, although its specific use is not detailed in the PR objectives.

To ensure these dependencies are used in the implementation, let's check for their imports:

✅ Verification successful

Dependencies usage verified.

The @mendable/firecrawl-js and ai packages are correctly imported and utilized in the following files:

  • apps/billing/src/lib/firecrawl.ts
  • apps/billing/src/trigger/keyword-research.ts
  • apps/billing/src/lib/search-query.ts
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of new dependencies in the codebase.

# Test 1: Search for imports of @mendable/firecrawl-js
echo "Searching for @mendable/firecrawl-js imports:"
rg --type typescript 'from\s+[\'"]@mendable/firecrawl-js[\'"]'

# Test 2: Search for imports of ai package
echo "Searching for ai package imports:"
rg --type typescript 'from\s+[\'"]ai[\'"]'

Length of output: 195


Script:

#!/bin/bash
# Description: Corrected script to verify the usage of new dependencies in the codebase.

# Test 1: Search for imports of @mendable/firecrawl-js
echo "Searching for @mendable/firecrawl-js imports:"
rg --type typescript "from\s+[\"']@mendable/firecrawl-js[\"']"

# Test 2: Search for imports of ai package
echo "Searching for ai package imports:"
rg --type typescript "from\s+[\"']ai[\"']"

Length of output: 365


Script:

#!/bin/bash
# Description: Corrected script to verify the usage of new dependencies in the codebase by targeting .ts and .tsx files.

# Test 1: Search for imports of @mendable/firecrawl-js
echo "Searching for @mendable/firecrawl-js imports in .ts and .tsx files:"
rg 'from\s+[\"']@mendable/firecrawl-js[\"']' -g '*.ts' -g '*.tsx'

# Test 2: Search for imports of ai package
echo "Searching for ai package imports in .ts and .tsx files:"
rg 'from\s+[\"']ai[\"']' -g '*.ts' -g '*.tsx'

Length of output: 577


Script:

#!/bin/bash
# Description: Fixed script to verify the usage of new dependencies in the codebase by correctly formatting regex patterns.

# Test 1: Search for imports of @mendable/firecrawl-js
echo "Searching for @mendable/firecrawl-js imports in .ts and .tsx files:"
rg "from\s+[\"']@mendable/firecrawl-js[\"']" -g '*.ts' -g '*.tsx'

# Test 2: Search for imports of ai package
echo "Searching for ai package imports in .ts and .tsx files:"
rg "from\s+[\"']ai[\"']" -g '*.ts' -g '*.tsx'

Length of output: 630

apps/billing/src/lib/db-marketing/client.ts (2)

1-4: LGTM: Imports and module structure are well-organized.

The imports are appropriate for setting up a Drizzle ORM connection with PlanetScale. Separating the schema definitions into a local module (./schemas) is a good practice for maintainability.


12-15: Clarify the purpose of cache header removal in the custom fetch function.

The custom fetch function removes the cache header from the request. While this might be necessary for certain scenarios, it's not immediately clear why this is needed.

Could you please provide more context on why the cache header needs to be removed? This information would be valuable for future maintainers.

Consider adding a comment explaining the rationale, for example:

fetch: (url: string, init: any) => {
  // Remove cache header to ensure fresh data on each request
  // This is necessary because [explain the specific reason]
  (init as any).cache = undefined;
  return fetch(url, init);
},

Additionally, to verify if this cache removal is a common practice or specific to this project, we can search for similar patterns in the codebase:

This will help us understand if this is a project-wide practice or specific to this database client.

✅ Verification successful

Cache header removal is consistently applied across the codebase.

The custom fetch function removes the cache header to ensure fresh data on each request. This pattern is consistently used in multiple components, indicating it’s a standard practice within the project.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for similar cache removal patterns in other files
rg --type typescript "cache\s*=\s*undefined" -g '!apps/billing/src/lib/db-marketing/client.ts'

Length of output: 131


Script:

#!/bin/bash
# Search for similar cache removal patterns in other TypeScript files
rg "cache\s*=\s*undefined" --type ts -g '!apps/billing/src/lib/db-marketing/client.ts'

Length of output: 800

apps/billing/package.json (3)

42-42: Approve addition of drizzle-kit devDependency.

The addition of drizzle-kit as a devDependency is appropriate:

  1. It's consistent with the introduction of Drizzle ORM for database management.
  2. This package is necessary for running the new database scripts (db:push and db:studio) added earlier.

Line range hint 1-44: Summary: Changes align well with PR objectives.

The modifications to package.json effectively support the introduction of the keyword research agent and database management using Drizzle ORM:

  1. New dependencies for keyword research and AI functionality have been added.
  2. Drizzle ORM-related packages have been introduced for database management.
  3. New scripts for database operations have been included.
  4. Existing dependencies have been updated to their latest versions.

These changes are well-structured, follow best practices, and align perfectly with the PR objectives.


16-17: Approve dependency changes with a security suggestion.

The updates to the dependencies section are appropriate and align with the PR objectives:

  1. New dependencies (@agentic/serper, @ai-sdk/openai, @mendable/firecrawl-js) support the keyword research functionality.
  2. Updates to existing dependencies (@trigger.dev/nextjs, @trigger.dev/sdk, @trigger.dev/slack) keep the project up-to-date.
  3. Addition of Drizzle-related packages (drizzle-orm, drizzle-zod) confirms the introduction of Drizzle ORM for database management.

As a security precaution, please run the following command to check for any known vulnerabilities in the new dependencies:

If the result is greater than 0, please review the vulnerabilities and update the dependencies if necessary.

Also applies to: 20-20, 22-24, 27-27, 31-33

biome.json (1)

44-45: LGTM: Addition of ".trigger" to linter ignore list

The addition of ".trigger" to the linter.ignore array is appropriate for the new keyword-research task within the billing trigger project. This change will prevent the linter from analyzing the trigger-related code, which may have specific requirements or generated code that shouldn't be linted.

apps/billing/src/lib/db-marketing/schemas/keywords.ts (3)

25-28: Verify foreign key relationship for 'inputTerm'

Ensure that the data types and constraints for keywords.inputTerm and searchQueries.inputTerm are consistent, including length and nullability. This will prevent potential data integrity issues and ensure that the foreign key relationship functions as expected.


6-22: Well-structured schema with appropriate indexes and constraints

The schema definition for the keywords table is well-organized, with clear field definitions, indexes, and a unique constraint on inputTerm and keyword. The use of Drizzle ORM conventions and best practices is evident.


29-32: ⚠️ Potential issue

Handle nullability in 'sourceUrl' foreign key relationship

The sourceUrl field in the keywords table is nullable, but it's referencing serperOrganicResults.link. Verify that serperOrganicResults.link can accommodate null values or adjust the foreign key constraint to handle nullable references appropriately. This will prevent potential issues when sourceUrl is NULL.

apps/billing/src/lib/db-marketing/schemas/firecrawl.ts (2)

41-46: Verify Compatibility of Foreign Key Relationship

In the firecrawlResponsesRelations, the foreign key relationship links firecrawlResponses.sourceUrl to serperOrganicResults.link. Ensure that both fields have compatible data types and lengths to maintain referential integrity.

Run the following script to compare the definitions:

#!/bin/bash
# Description: Verify that the data types and lengths of the related fields are compatible.

# Expected: Both fields should be of the same type and length.

# Retrieve the definition of 'serperOrganicResults.link'
ast-grep --lang typescript --pattern $'link: varchar($_, { length: $_ }).notNull()' ./apps/billing/src/lib/db-marketing/schemas/serper.ts --json

# Retrieve the definition of 'firecrawlResponses.sourceUrl'
ast-grep --lang typescript --pattern $'sourceUrl: varchar($_, { length: $_ }).notNull()' ./apps/billing/src/lib/db-marketing/schemas/firecrawl.ts --json

Confirm that the lengths and nullability constraints match between the two fields.


21-31: Verify VARCHAR Lengths and Index Key Size Limits

Fields like sourceUrl, title, ogTitle, ogDescription, ogImage, and ogSiteName have VARCHAR lengths set to 767. While this accommodates the maximum index length for InnoDB (767 bytes), using large VARCHAR lengths in indexed columns may cause issues, especially with multibyte character sets like UTF-8.

Run the following script to ensure that the combined index lengths do not exceed MySQL's limitations:

Ensure that the sum of the indexed VARCHAR lengths multiplied by the maximum bytes per character does not exceed the InnoDB index key limit.
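The arithmetic behind the reviewer's concern can be made explicit. A quick back-of-the-envelope check, assuming utf8mb4 encoding and the classic 767-byte InnoDB index prefix limit:

```typescript
// The classic InnoDB single-column index prefix limit is 767 bytes
// (raised to 3072 bytes with innodb_large_prefix / DYNAMIC row format).
const INNODB_INDEX_LIMIT_BYTES = 767;

// utf8mb4 characters occupy up to 4 bytes each.
const MAX_BYTES_PER_CHAR_UTF8MB4 = 4;

// Characters of a utf8mb4 VARCHAR that actually fit within the index limit:
const maxIndexableChars = Math.floor(
  INNODB_INDEX_LIMIT_BYTES / MAX_BYTES_PER_CHAR_UTF8MB4,
); // 191

// Fully indexing a VARCHAR(767) utf8mb4 column would need up to:
const bytesNeeded = 767 * MAX_BYTES_PER_CHAR_UTF8MB4; // 3068 bytes
```

So a VARCHAR(767) column under utf8mb4 can exceed the 767-byte prefix limit by a factor of four, which is why indexed columns are often capped at 191 characters instead.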

apps/billing/src/lib/db-marketing/schemas/searchQuery.ts (1)

39-44: ⚠️ Potential issue

Verify the correctness of the relationship definition.

According to the comments on lines 37-38, the foreign key is stored in the serperSearchResponses table, and the searchQueries relation should have neither fields nor references. However, in the relation definition on lines 39-44, fields and references are specified. Please confirm that this is intentional. If the foreign key resides in serperSearchResponses, you may need to adjust the relation definition to reflect this.

apps/billing/src/lib/serper.ts (3)

70-70: Ensure Successful Completion of All Insert Operations

The Promise.all at line 70 waits for all insert operations to complete. If any insert fails, it will cause the entire promise to reject, and the transaction should roll back. However, since you're filtering the insertPromises array with filter(Boolean), ensure that this doesn't inadvertently omit any necessary operations.

Confirm that all required insert operations are included in the insertPromises array and that partial failures are appropriately handled by the transaction.
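The hazard the reviewer is pointing at: filter(Boolean) silently drops any falsy entry, so a conditionally built operation can vanish from the array without an error. A minimal standalone illustration — not the PR's actual code:

```typescript
// A conditionally built list of "insert" operations; the `undefined`
// stands in for an operation skipped by a condition such as
// `hasRelated ? insertRelated() : undefined`.
const maybePromises: (Promise<string> | undefined)[] = [
  Promise.resolve("organic"),
  undefined,
  Promise.resolve("topStories"),
];

// filter(Boolean) keeps only the truthy (real) promises; the type
// predicate narrows the array type for Promise.all.
const insertPromises = maybePromises.filter(
  (p): p is Promise<string> => Boolean(p),
);

const results = await Promise.all(insertPromises);
// The skipped operation is simply absent from `results`.
```

If an operation is required, it should be added unconditionally rather than relying on the filter, since a falsy entry here fails silently instead of rejecting the transaction.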


16-82: Consider Bulk Insert Optimization

In the persistSerperResults function, you're performing multiple insert operations for different types of search results. Depending on the volume of data, this could be optimized by using bulk insert operations or adjusting the transaction settings.

[performance]

Assess whether batching the insert operations or using more efficient database operations could improve performance.


103-105: Check for Completeness of Existing Response

The condition at line 103 checks for existingResponse.serperOrganicResults.length > 0. Ensure that other related data such as serperRelatedSearches and serperPeopleAlsoAsk are also present if they are critical for your application's functionality.

Consider enhancing the completeness check to include other related data:

  if (
    existingResponse &&
    existingResponse.serperOrganicResults.length > 0 &&
+   existingResponse.serperRelatedSearches.length > 0 &&
+   existingResponse.serperPeopleAlsoAsk.length > 0
  ) {
    return existingResponse;
  }
apps/billing/src/lib/db-marketing/schemas/serper.ts (3)

55-55: Ensure consistent nullability of imageUrl field across tables.

In serperOrganicResults, the imageUrl field is optional, whereas in serperTopStories, it is defined as notNull(). Verify if this difference is intentional. If images may not always be available, consider making imageUrl optional in serperTopStories.

Also applies to: 119-119


15-15: Verify data types and constraints for inputTerm in relations.

Ensure that the data type and length of inputTerm in serperSearchResponses and searchQueries are consistent to maintain referential integrity and prevent potential issues.

Also applies to: 28-31


52-52: 🛠️ Refactor suggestion

Review the maximum length of link fields.

The link fields are defined with a maximum length of 767 characters. Some URLs can exceed this length. Consider increasing the length to accommodate longer URLs, or ensure that truncation or validation is handled appropriately.

Also applies to: 84-84, 116-116, 151-151

apps/billing/src/trigger/keyword-research.ts (4)

17-18: Verify retry policy: 'maxAttempts' is set to 0

Setting maxAttempts: 0 means the task will not retry upon failure. Is this intended? If you want the task to retry on failure, consider setting maxAttempts to a positive value.
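To make the attempt semantics concrete, here is a generic sketch of attempt counting. This is a hypothetical illustration, not trigger.dev's actual retry implementation; under this sketch's convention, maxAttempts is the total number of tries (1 = run once with no retry, 0 = never even invoke the task):

```typescript
// Hypothetical sketch of retry/attempt semantics.
async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts: number,
): Promise<T> {
  let lastError: unknown = new Error("task was never attempted");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}
```

Under this convention a value of 0 means the task body never runs at all, which is why the reviewer asks whether a positive value was intended.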


68-68: Verify the OpenAI model identifier 'gpt-4o'

In lines 68 and 139, the OpenAI model is specified as 'gpt-4o':

model: openai("gpt-4o"),

Please verify if 'gpt-4o' is a valid model identifier. Typically, OpenAI models are referred to as 'gpt-4', 'gpt-3.5-turbo', etc. If 'gpt-4o' is a custom model or alias, ensure that it's correctly configured.

Also applies to: 139-139


37-37: Verify specificity to API development in the prompt

In line 37, the prompt includes:

- Focus on technical and context-specific terms related to API development.

Is the keyword research intended to be specific to API development? If the tool is meant to handle a variety of topics, consider generalizing this line to make the prompt more adaptable.


125-125: Ensure correct extraction of headers from markdown

In line 125, you're using a regular expression to match headers:

${content.markdown?.match(/^##\s+(.*)$/gm)?.join("\n")}

Ensure that this regex correctly captures the headers you intend to extract. It currently matches lines starting with ## . If you also want to include # or deeper header levels like ### , consider adjusting the regex.

Apply this diff if you want to include all header levels:

-${content.markdown?.match(/^##\s+(.*)$/gm)?.join("\n")}
+${content.markdown?.match(/^#+\s+(.*)$/gm)?.join("\n")}
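
A quick check of the two patterns on sample markdown (plain TypeScript, no dependencies) illustrates the difference:

```typescript
const markdown = "# Title\n## Section A\n### Subsection\nBody text";

// Current pattern: matches only h2 headers ("## ...").
const h2Only = markdown.match(/^##\s+(.*)$/gm);
// → ["## Section A"]

// Broader pattern: matches headers of any level.
const allLevels = markdown.match(/^#+\s+(.*)$/gm);
// → ["# Title", "## Section A", "### Subsection"]
```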

Comment on lines +6 to +11
export const db = drizzle(
new Client({
host: process.env.MARKETING_DATABASE_HOST!,
username: process.env.MARKETING_DATABASE_USERNAME!,
password: process.env.MARKETING_DATABASE_PASSWORD!,

⚠️ Potential issue

Enhance robustness with environment variable validation and error handling.

While using environment variables for connection details is a good security practice, the current implementation has some potential issues:

  1. The use of non-null assertions (!) on environment variables could lead to runtime errors if the variables are not set.
  2. There's no error handling or connection validation.

Consider implementing the following improvements:

  1. Validate environment variables before use:
const requiredEnvVars = [
  'MARKETING_DATABASE_HOST',
  'MARKETING_DATABASE_USERNAME',
  'MARKETING_DATABASE_PASSWORD'
] as const;

for (const envVar of requiredEnvVars) {
  if (!process.env[envVar]) {
    throw new Error(`Missing required environment variable: ${envVar}`);
  }
}
  1. Add error handling and connection validation:
import { drizzle } from "drizzle-orm/planetscale-serverless";
import { Client } from "@planetscale/database";
import * as schema from "./schemas";

const client = new Client({
  host: process.env.MARKETING_DATABASE_HOST,
  username: process.env.MARKETING_DATABASE_USERNAME,
  password: process.env.MARKETING_DATABASE_PASSWORD,
  fetch: (url: string, init: any) => {
    init.cache = undefined; // drop the cache option (unsupported in some fetch runtimes)
    return fetch(url, init);
  },
});
});

export const db = drizzle(client, { schema });

// Validate the connection
export async function validateDbConnection() {
  try {
    await client.execute('SELECT 1');
    console.log('Database connection successful');
  } catch (error) {
    console.error('Failed to connect to the database:', error);
    throw error;
  }
}

These changes will make the database connection setup more robust and easier to debug.

Also applies to: 17-19

inputTerm: varchar("input_term", { length: 255 }).notNull(),
keyword: varchar("keyword", { length: 255 }).notNull(),
source: varchar("source", { length: 255 }).notNull(),
sourceUrl: varchar("source_url", { length: 767 }),
⚠️ Potential issue

Potential issue with 'sourceUrl' length and indexing

The sourceUrl field is defined with a length of 767 characters and is being indexed (line 19). In MySQL with InnoDB and the utf8mb4 character set, the default maximum index key length is 767 bytes, which fits at most 191 characters (191 × 4 = 764 bytes). Indexing a VARCHAR(767) column can therefore fail with an index-length error.

Consider reducing the length of sourceUrl to 191 characters or implementing a prefix index to ensure compatibility.
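
Either option can be expressed as a migration; the table and index names below are hypothetical, not the PR's actual schema:

```sql
-- Option A: shrink the column so a full-width index fits under the 767-byte limit.
ALTER TABLE keywords MODIFY source_url VARCHAR(191);

-- Option B: keep VARCHAR(767) but index only the first 191 characters.
CREATE INDEX keywords_source_url_idx ON keywords (source_url(191));
```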

source: varchar("source", { length: 255 }).notNull(),
sourceUrl: varchar("source_url", { length: 767 }),
createdAt: timestamp("created_at").defaultNow().notNull(),
updatedAt: timestamp("updated_at").defaultNow().notNull(),
🛠️ Refactor suggestion

Ensure 'updatedAt' automatically reflects record modifications

The updatedAt timestamp is set with defaultNow().notNull(), which initializes it at creation time but doesn't update it on record modifications. To keep updatedAt accurate, consider configuring it to automatically update whenever the record is modified. This can be achieved by setting the default to NOW() and adding an ON UPDATE CURRENT_TIMESTAMP clause in the table definition.
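
With drizzle's mysql-core helpers this would be a one-line change; the sketch below assumes `.onUpdateNow()` (which maps to MySQL's ON UPDATE CURRENT_TIMESTAMP) is available in the drizzle version this repo uses:

```typescript
import { mysqlTable, timestamp, varchar } from "drizzle-orm/mysql-core";

export const keywords = mysqlTable("keywords", {
  keyword: varchar("keyword", { length: 255 }).notNull(),
  createdAt: timestamp("created_at").defaultNow().notNull(),
  // defaultNow() sets the value on insert; onUpdateNow() refreshes it on every update
  updatedAt: timestamp("updated_at").defaultNow().onUpdateNow().notNull(),
});
```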

ogDescription: varchar("og_description", { length: 767 }),
ogUrl: text("og_url"),
ogImage: varchar("og_image", { length: 767 }),
xogSiteName: varchar("og_site_name", { length: 767 }),
⚠️ Potential issue

Typo in Property Name 'xogSiteName'

In line 30, the property is named xogSiteName, but it seems to be a typo. It should likely be ogSiteName to match the pattern of other Open Graph fields.

Apply this diff to fix the typo:

-    xogSiteName: varchar("og_site_name", { length: 767 }),
+    ogSiteName: varchar("og_site_name", { length: 767 }),

Comment on lines +157 to +163
keywordsFromHeaders.object.keywords.map((keyword) => ({
inputTerm: searchQuery.inputTerm,
keyword: keyword.keyword.toLowerCase(),
sourceUrl: keyword.sourceUrl,
source: "headers",
})),
)
🛠️ Refactor suggestion

Consider handling duplicate keywords across different sources

When inserting keywords from titles, headers, and related searches, duplicates might exist across these sources. Currently, duplicates are handled within each insertion but not across different sources. Consider de-duplicating keywords globally to prevent redundancy.

Also applies to: 184-190
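
One way to do this is to collect all batches first and keep a single row per normalized keyword. A sketch (the row type and the first-source-wins policy are assumptions, not the PR's code):

```typescript
type KeywordInsert = {
  inputTerm: string;
  keyword: string;
  source: string;
  sourceUrl?: string;
};

// Keep one entry per lowercased keyword across all sources;
// the earliest batch passed in wins on conflicts.
function dedupeAcrossSources(batches: KeywordInsert[][]): KeywordInsert[] {
  const seen = new Map<string, KeywordInsert>();
  for (const batch of batches) {
    for (const row of batch) {
      const key = row.keyword.toLowerCase();
      if (!seen.has(key)) {
        seen.set(key, row);
      }
    }
  }
  return [...seen.values()];
}
```

The deduplicated array could then be inserted with a single db.insert call instead of three separate ones.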

Comment on lines +31 to +48
const keywordResearchSystemPrompt = `
You are an SEO Expert & Content Writer specializing in creating technical content for Developer Tools that are highly SEO optimized.

**Your Objectives:**
1. **Keyword Extraction:**
- Extract relevant keywords from the titles of top-ranking organic search results.
- Focus on technical and context-specific terms related to API development.

2. **Quality Assurance:**
- **Remove Stopwords:** Ensure that keywords do not include common stopwords (e.g., "for," "and," "the," "of," etc.).
- **Remove Brand Names:** Ensure that keywords do not include brand names (e.g., "GitHub", "YouTube", "npm", etc.).
- **Remove README keywords:** Ensure to exclude from instructive headers or titles (e.g., "getting started", "installation", etc.) of readmes.

**Guidelines:**
- Prioritize keywords that directly relate to the main term and its subtopics.
- Maintain a focus on terms that potential users or developers are likely to search for in the context of API development.
- Branded keywords should be included in the keywordsWithBrandNames and not in the keywords.
`;
🛠️ Refactor suggestion

Externalize prompt templates for maintainability

The prompt strings in lines 31-48 and 131-137 are lengthy and embedded within the code. Consider externalizing these prompt templates into separate files or constants for better maintainability and readability.

Also applies to: 131-137

Comment on lines +10 to +25
export async function getTopResultsContent(args: { urls: string[] }) {
// Check for existing responses with markdown content
const existingResponses = await db.query.firecrawlResponses.findMany({
where: and(
inArray(firecrawlResponses.sourceUrl, args.urls),
isNotNull(firecrawlResponses.markdown),
),
});

const existingUrlSet = new Set(existingResponses.map((r) => r.sourceUrl));
const urlsToScrape = args.urls.filter((url) => !existingUrlSet.has(url));

// Scrape new URLs
  const newResponses = await Promise.all(urlsToScrape.map(scrapeAndStoreUrl));
  // Combine existing and new responses
  return [...existingResponses, ...newResponses];
}
}
🛠️ Refactor suggestion

Refactor duplicated code to improve maintainability

The functions getTopResultsContent and getScrapedContentMany share similar logic for checking existing responses, determining URLs to scrape, scraping new URLs, and combining results. Consider refactoring the common code into a shared helper function to adhere to the DRY (Don't Repeat Yourself) principle and improve maintainability.

Also applies to: 84-97
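
A generic "return cached rows, fetch the rest" helper that both call sites could delegate to might look like this; the name and signature are illustrative only:

```typescript
// Returns cached items for the given URLs and fetches only the missing ones.
async function getOrFetchMany<T>(
  urls: string[],
  findCached: (urls: string[]) => Promise<T[]>,
  urlOf: (item: T) => string,
  fetchOne: (url: string) => Promise<T>,
): Promise<T[]> {
  const cached = await findCached(urls);
  const cachedUrls = new Set(cached.map(urlOf));
  const fresh = await Promise.all(
    urls.filter((url) => !cachedUrls.has(url)).map(fetchOne),
  );
  return [...cached, ...fresh];
}
```

Each caller would then supply its own db query and scraper, keeping the cache-check logic in one place.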

Comment on lines +33 to +44
const [errorResponse] = await db
.insert(firecrawlResponses)
.values({
sourceUrl: url,
error: firecrawlResult.error || "Unknown error occurred",
success: false,
})
.$returningId();
return await db.query.firecrawlResponses.findFirst({
where: eq(firecrawlResponses.id, errorResponse.id),
});
}
@coderabbitai coderabbitai bot Oct 2, 2024
🛠️ Refactor suggestion

Optimize database operations to avoid unnecessary queries

After inserting responses into the database in both the error handling (lines 33-44) and success handling (lines 46-67) blocks, the code performs additional queries to retrieve the same records. You can modify the insert operations to return the inserted records directly using $returning(), eliminating the need for separate queries and improving performance.

Apply this diff to both insert operations:

// Error handling insert
 const [errorResponse] = await db
   .insert(firecrawlResponses)
   .values({
     sourceUrl: url,
     error: firecrawlResult.error || "Unknown error occurred",
     success: false,
-  })
-  .$returningId();
- return await db.query.firecrawlResponses.findFirst({
-   where: eq(firecrawlResponses.id, errorResponse.id),
- });
+  })
+  .$returning(); // Returns the inserted record
+  return errorResponse;

// Success handling insert
 const [insertedResponse] = await db
   .insert(firecrawlResponses)
   .values({
     success: firecrawlResult.success,
     markdown: firecrawlResult.markdown ?? null,
     sourceUrl: firecrawlResult.metadata?.sourceURL || url,
     scrapeId: firecrawlResult.metadata?.scrapeId || "",
     title: firecrawlResult.metadata?.title || "",
     description: firecrawlResult.metadata?.description || "",
     language: firecrawlResult.metadata?.language || "",
     ogTitle: firecrawlResult.metadata?.ogTitle || "",
     ogDescription: firecrawlResult.metadata?.ogDescription || "",
     ogUrl: firecrawlResult.metadata?.ogUrl || "",
     ogImage: firecrawlResult.metadata?.ogImage || "",
     ogSiteName: firecrawlResult.metadata?.ogSiteName || "",
     error: null,
-  })
-  .$returningId();
- return await db.query.firecrawlResponses.findFirst({
-   where: eq(firecrawlResponses.id, insertedResponse.id),
- });
+  })
+  .$returning(); // Returns the inserted record
+  return insertedResponse;

Also applies to: 46-67

Contributor Author
@chronark afaiu the returning() isn't supported in MySQL (only in pg)?

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Collaborator
yeah I believe so.

Comment on lines +6 to +8
const firecrawl = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY!,
});
⚠️ Potential issue

Add validation for FIRECRAWL_API_KEY environment variable

Currently, the code uses a non-null assertion ! on process.env.FIRECRAWL_API_KEY, which could lead to runtime errors if the environment variable is not set. Consider adding a validation check to ensure that FIRECRAWL_API_KEY is defined before initializing FirecrawlApp.

Apply this diff to add validation:

+if (!process.env.FIRECRAWL_API_KEY) {
+  throw new Error("FIRECRAWL_API_KEY environment variable is not defined");
+}
 const firecrawl = new FirecrawlApp({
-  apiKey: process.env.FIRECRAWL_API_KEY!,
+  apiKey: process.env.FIRECRAWL_API_KEY,
});

@vercel
vercel bot commented Oct 3, 2024

The latest updates on your projects:

| Name | Status | Updated (UTC) |
| --- | --- | --- |
| dashboard | ✅ Ready | Oct 3, 2024 10:32am |
| planetfall | ✅ Ready | Oct 3, 2024 10:32am |
| play | ✅ Ready | Oct 3, 2024 10:32am |
| workflows | ✅ Ready | Oct 3, 2024 10:32am |
| www | ✅ Ready | Oct 3, 2024 10:32am |

@vercel vercel bot temporarily deployed to Preview – play October 3, 2024 10:26 Inactive
@vercel vercel bot temporarily deployed to Preview – planetfall October 3, 2024 10:28 Inactive
@vercel vercel bot temporarily deployed to Preview – dashboard October 3, 2024 10:30 Inactive
@vercel vercel bot temporarily deployed to Preview – workflows October 3, 2024 10:30 Inactive
@vercel vercel bot temporarily deployed to Preview – www October 3, 2024 10:32 Inactive
@chronark chronark self-assigned this Oct 3, 2024
@chronark chronark merged commit 8a9477b into unkeyed:main Oct 3, 2024
mcstepp pushed a commit that referenced this pull request Oct 8, 2024
* feat(billing): add keyword research agent

* `pnpm fmt`

* `pnpm fmt` pt 2

* remove db:generate

---------

Co-authored-by: chronark <dev@chronark.com>
@coderabbitai coderabbitai bot mentioned this pull request Dec 18, 2024
@coderabbitai coderabbitai bot mentioned this pull request Jan 22, 2025