feat(billing): add deep-ish research task #2943

p6l-richard wants to merge 15 commits into unkeyed:main from
Conversation
* Fixed schema drift by bringing in the `inputTermAndKeywordHash` column from the `add-branding-filter` branch
* Fixed TypeScript errors by downgrading `drizzle-orm` to 0.33.0 and `drizzle-kit` to 0.24.2
* Added `takeaways-schema` to the index exports to ensure it's properly included
* Fixed the "TypeError: Cannot read properties of undefined (reading 'type')" error during `drizzle-kit push`
* Removed the dependency on `@unkey/db` to resolve conflicting `drizzle-orm` versions (0.31.2 vs 0.33.0); this dependency is already being removed in another PR
@p6l-richard is attempting to deploy a commit to the Unkey Team on Vercel. A member of the Team first needs to authorize it.
📝 Walkthrough

This pull request introduces modifications across multiple components.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant TR as TechnicalResearchTask
    participant EX as ExaDomainSearchTask
    participant EV as EvaluateSearchResults
    participant API as Exa API
    TR->>EX: Initiate domain searches in parallel
    EX->>API: Execute search for input term
    API-->>EX: Return search results
    TR->>TR: Collect & deduplicate results
    TR->>EV: Evaluate deduplicated results
    EV->>API: Validate search results via AI
    API-->>EV: Return evaluation data
    TR->>API: Scrape content from evaluated URLs
    API-->>TR: Return scraped content
    TR->>TR: Log cost breakdown & summary
```
Thank you for following the naming conventions for pull request titles! 🙏
Actionable comments posted: 1
🔭 Outside diff range comments (2)
apps/billing/src/trigger/glossary/research/exa-domain-search.ts (1)
35-77: 🛠️ Refactor suggestion

Handle undefined `domainCategory` gracefully

The code at line 48 finds a category by name, but if none is found, `domainCategory` might be `undefined`. At line 74, you assign `domainCategory` into `searchResultWithCategory`. Consider error handling or a default scenario to avoid a runtime error if an unexpected domain name is passed.

```diff
- const domainCategory = domainCategories.find((c) => c.name === domain);
- ...
- searchResultWithCategory.category = domainCategory;
+ const domainCategory = domainCategories.find((c) => c.name === domain);
+ if (!domainCategory) {
+   throw new Error(`Unrecognized domain category: ${domain}`);
+ }
+ searchResultWithCategory.category = domainCategory;
```

apps/billing/src/lib/db-marketing/schemas/serper.ts (1)
1-212: 💡 Verification agent

🧩 Analysis chain
Consider adding database migration scripts for these schema changes.
These schema changes will require database migrations in production. Ensure that migration scripts are included in this PR to safely update the database schema without data loss.
🏁 Script executed:
```bash
#!/bin/bash
# Check if migration files exist for these schema changes
fd -e sql -e ts "migration" --exec cat {}
```

Length of output: 75178
Action Required: Add Migration Scripts for Serper Schema Changes
The new serper schema file (apps/billing/src/lib/db-marketing/schemas/serper.ts) introduces multiple tables (e.g. serper_search_responses, serper_organic_results, serper_sitelinks, serper_top_stories, serper_people_also_ask, serper_related_searches). However, the migration files currently present do not include any scripts for these new table definitions. Please add corresponding migration scripts to safely update the production database schema without data loss.
- Add migration files that create the new serper tables.
- Ensure the migrations are tested in a staging environment before deploying to production.
🧹 Nitpick comments (6)
apps/billing/src/trigger/glossary/research/types.ts (1)
1-13: Well-structured cost tracking type

The `ExaCosts` type provides a good structure for tracking API costs from exa.ai, which aligns with the research workflow implementation. Consider adding JSDoc comments to document the purpose of this type and what each property represents.

For additional robustness, you might want to implement a validation function to ensure that when breakdowns are provided, they sum up to the total cost.

```typescript
/**
 * Represents costs associated with exa.ai API usage
 * @property costDollars.total - The total cost in USD
 * @property costDollars.search - Breakdown of search-related costs
 * @property costDollars.contents - Breakdown of content-related costs
 */
export type ExaCosts = {
  costDollars: {
    total: number;
    search?: {
      neural?: number;
      keyword?: number;
    };
    contents?: {
      text?: number;
      summary?: number;
    };
  };
};

/**
 * Validates that breakdown costs add up to the total (if provided)
 */
export function validateExaCosts(costs: ExaCosts): boolean {
  const { total, search, contents } = costs.costDollars;
  let sum = 0;
  if (search) {
    sum += (search.neural || 0) + (search.keyword || 0);
  }
  if (contents) {
    sum += (contents.text || 0) + (contents.summary || 0);
  }
  // If we have breakdowns, verify they match the total
  return sum === 0 || Math.abs(total - sum) < 0.001; // Allow for minor floating point imprecision
}
```

apps/billing/src/trigger/glossary/research/_technical-research.ts (2)
7-59: Handle partially failed search runs
The current implementation performs parallel domain searches (lines 19–28), and you log a warning if any fail (lines 29–31), then continue using only successful results. This is fine, but consider whether you should retry partially failed searches or surface warnings to upstream callers for better visibility.
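One way such a retry pass could look, as a minimal sketch independent of the actual Trigger.dev batch API (the `SearchJob` and `runWithRetries` names are hypothetical, not from the PR):

```typescript
// Hypothetical helper: run a list of async search jobs, collect successes,
// and re-run only the failed ones up to `maxRetries` extra times.
type SearchJob<T> = () => Promise<T>;

export async function runWithRetries<T>(
  jobs: SearchJob<T>[],
  maxRetries = 2,
): Promise<{ results: T[]; failedCount: number }> {
  const results: T[] = [];
  let pending = jobs;
  // attempt 0 is the initial run; later attempts only re-run failures
  for (let attempt = 0; attempt <= maxRetries && pending.length > 0; attempt++) {
    const settled = await Promise.allSettled(pending.map((job) => job()));
    const stillFailing: SearchJob<T>[] = [];
    settled.forEach((outcome, i) => {
      if (outcome.status === "fulfilled") {
        results.push(outcome.value);
      } else {
        stillFailing.push(pending[i]);
      }
    });
    pending = stillFailing;
  }
  return { results, failedCount: pending.length };
}
```

Surfacing `failedCount` to the caller would also give upstream callers the visibility mentioned above.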
88-101: Check for rate limits and large content requests
Scraping the content (lines 89–92) may invoke a large volume of requests in certain scenarios. Confirm that you handle potential rate limits or large responses from the Exa API gracefully.

Would you like help creating a retry strategy or adding backoff logic to handle potential 429 responses from the API?
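A backoff wrapper along those lines might look like this sketch; `RateLimitError` and the delay constants are assumptions, not the Exa SDK's actual error type or recommended settings:

```typescript
// Exponential backoff with jitter for rate-limited calls.
export class RateLimitError extends Error {}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

export async function withBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const isLastAttempt = attempt + 1 >= maxAttempts;
      if (!(err instanceof RateLimitError) || isLastAttempt) throw err;
      // Delays grow 500ms, 1s, 2s, ... plus jitter to avoid thundering herds
      await sleep(baseDelayMs * 2 ** attempt + Math.random() * 100);
    }
  }
}
```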
apps/billing/src/trigger/glossary/research/exa-domain-search.ts (1)
6-33: Ensure domain categories are relevant and comprehensive
Your `domainCategories` constant (lines 6–33) enumerates official and community sources, as well as a general "Google" category. Verify that these cover all your needed search domains, and consider logging mismatched user inputs if a domain is not found.

apps/billing/src/trigger/glossary/research/evaluate-search-results.ts (2)
18-30: Ensure robust error handling for the Gemini API.

Initialization of the Google Generative AI client and usage is straightforward. However, consider how the app should behave if the environment variable is unset or if the AI request fails, ensuring graceful error handling and fallback logic.
74-125: Token usage calculation is helpful.

Logging the cost for input and output tokens provides good transparency. Consider adding a threshold or alert if costs exceed a certain budget, to avoid unexpected usage in production.
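One way to implement such a threshold is a small budget guard like this sketch; the per-million-token prices are placeholder numbers, not Gemini's actual rates:

```typescript
// Estimate cost from token counts and fail (or alert) past a budget.
type TokenUsage = { promptTokens: number; completionTokens: number };

export function estimateCostUsd(
  usage: TokenUsage,
  inputPricePerMillion = 0.1, // placeholder rate, not Gemini's real price
  outputPricePerMillion = 0.4, // placeholder rate
): number {
  return (
    (usage.promptTokens / 1_000_000) * inputPricePerMillion +
    (usage.completionTokens / 1_000_000) * outputPricePerMillion
  );
}

export function assertWithinBudget(usage: TokenUsage, budgetUsd: number): void {
  const cost = estimateCostUsd(usage);
  if (cost > budgetUsd) {
    // In production this could page or log instead of throwing
    throw new Error(`Token cost $${cost.toFixed(4)} exceeds budget $${budgetUsd}`);
  }
}
```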
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`pnpm-lock.yaml` is excluded by `!**/pnpm-lock.yaml`
📒 Files selected for processing (15)
- apps/billing/package.json (2 hunks)
- apps/billing/src/lib/db-marketing/schemas/entries.ts (2 hunks)
- apps/billing/src/lib/db-marketing/schemas/evals.ts (3 hunks)
- apps/billing/src/lib/db-marketing/schemas/firecrawl.ts (1 hunks)
- apps/billing/src/lib/db-marketing/schemas/index.ts (1 hunks)
- apps/billing/src/lib/db-marketing/schemas/keywords.ts (1 hunks)
- apps/billing/src/lib/db-marketing/schemas/searchQuery.ts (2 hunks)
- apps/billing/src/lib/db-marketing/schemas/sections.ts (2 hunks)
- apps/billing/src/lib/db-marketing/schemas/serper.ts (7 hunks)
- apps/billing/src/trigger/glossary/research/_technical-research.ts (1 hunks)
- apps/billing/src/trigger/glossary/research/evaluate-search-results.ts (1 hunks)
- apps/billing/src/trigger/glossary/research/exa-domain-search.ts (1 hunks)
- apps/billing/src/trigger/glossary/research/types.ts (1 hunks)
- apps/billing/trigger.config.ts (1 hunks)
- apps/workflows/next-env.d.ts (1 hunks)
🔇 Additional comments (32)
apps/billing/src/lib/db-marketing/schemas/index.ts (1)
8-8: Export addition looks good

The new export follows the established pattern in this file and integrates well with the existing schema exports. This is a necessary addition for the research workflow mentioned in the PR objectives.
apps/billing/trigger.config.ts (1)
7-7: Consider if 60 seconds is sufficient for research tasks

The addition of `maxDuration` is appropriate for API-intensive research workflows. However, consider if 60 seconds is sufficient for complex research queries involving multiple API calls to exa.ai. Note that this configuration applies globally to all triggers in the project, not just the new research ones.

Have you tested the research workflow with real-world queries to ensure they complete within this time limit?
apps/workflows/next-env.d.ts (1)
1-5: Standard Next.js TypeScript declaration file

This is a standard auto-generated file created by Next.js for TypeScript support. No changes needed as it follows Next.js conventions. As noted in the comment, this file should not be edited manually.
apps/billing/src/lib/db-marketing/schemas/sections.ts (2)
2-2: Remove unused import only if confirmed unused

It appears `varchar` was removed from the import list, which is acceptable if it is truly no longer needed throughout the file. Good job keeping the imports tidy.
13-13: Verify performance impact of switching from varchar to text

Switching `heading` from `varchar("heading", { length: 255 })` to `text("heading").notNull()` can impact indexing and performance if queries are frequently filtering or sorting on this field. Consider ensuring you truly need unbounded text and do not plan to index this column heavily.

Would you like to confirm the usage patterns for this field and decide whether partial indexes or a shorter `varchar` might be more optimal?

apps/billing/src/trigger/glossary/research/_technical-research.ts (2)
60-82: Validate deduplication logic
Your deduplication strategy (lines 66–69) looks correct. Just ensure that you don’t lose relevant variations if URLs differ only by query parameters. If that’s intentional, this is fine. Otherwise, consider including query parameters in the deduplication key.
102-126: Return object structure appears sound
The final return includes a summary, included content, and a detailed cost breakdown. This structure (lines 102–125) neatly organizes relevant data. Looks good!apps/billing/src/trigger/glossary/research/exa-domain-search.ts (1)
1-5: Keep typed imports aligned
You import `SelectEntry` from another schema (line 1), presumably to type `inputTerm`. Verify that the shape of `SelectEntry` and its `inputTerm` field matches your usage, or consider simplifying by using a direct string type if that's all you need.

apps/billing/src/lib/db-marketing/schemas/searchQuery.ts (2)
7-7: No issues with newly added `text` import.

This import call is consistent with the usage of text columns below. No special concerns.
22-26: Be mindful of indexing and performance when switching from `varchar` to `text`.

Increasing `inputTerm` length to 767 and converting other columns to `text` allows for more data flexibility. However, large columns can impact indexing speed, query performance, and storage. Verify that these columns do not need partial or specialized indexes to maintain performance at scale.

You may wish to confirm by searching for queries filtering on these fields and ensure that indexing coverage and usage patterns are optimal.

apps/billing/src/trigger/glossary/research/evaluate-search-results.ts (4)
apps/billing/src/trigger/glossary/research/evaluate-search-results.ts (4)
7-11: Validation for evaluation object is clear and concise.

The Zod schema enforces numerical rating and textual justification. This helps avoid unexpected data from the AI system.
13-16: Nice typed approach for EvaluateSearchOptions.

Clearly defines the expected structure for `inputTerm` and `searchResults`, improving code readability.
32-72: Use caution with large prompt texts and cost.

You've included thorough instructions in the prompt, which is excellent for directing the AI. Take care to periodically review token usage for cost management, especially with large search result sets.
1-5: Details

✅ Verification successful

Check environment variables and references in imports.

Imports appear correct. Ensure that `process.env.GEMINI_API_KEY` is properly configured, or any undefined references here will block the task from running.

🏁 Script executed:

```bash
#!/bin/bash
# Verify references to the GEMINI_API_KEY environment variable
rg 'GEMINI_API_KEY' -A 3
```

Length of output: 405

Verification Complete: Environment Variable Reference Confirmed

The imports in apps/billing/src/trigger/glossary/research/evaluate-search-results.ts are correct. The code correctly references `process.env.GEMINI_API_KEY` (as confirmed by the search output). Please ensure that the `GEMINI_API_KEY` environment variable is properly configured in your deployment environment to prevent runtime issues.

apps/billing/src/lib/db-marketing/schemas/entries.ts (2)
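A minimal fail-fast guard for such required keys could look like this sketch; `requireEnv` is a hypothetical helper, not part of the PR:

```typescript
// Throw at startup when a required environment variable is missing,
// instead of letting an SDK fail mid-task with a less obvious error.
export function requireEnv(
  name: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const value = env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage at module load, so a misconfigured deploy fails immediately:
// const geminiApiKey = requireEnv("GEMINI_API_KEY");
```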
33-38: Transition to `text` for meta fields and GitHub PR URL.

This provides more flexibility in storing longer text. Confirm that there are no constraints or indexing needs on these columns. Also confirm there's no risk of storing overly large content unnecessarily, which can affect table size.
50-50: Renamed index to `inputTermHashIdx`.

Renaming the index clarifies its purpose. Ensure that references in migrations or other code referencing the old name have been updated accordingly.
apps/billing/src/lib/db-marketing/schemas/firecrawl.ts (2)
26-26: Improved schema flexibility with field type changes

Converting these fields from `varchar` to `text` is a good decision. This change allows for unlimited text storage, which is beneficial for content like titles, descriptions, and Open Graph metadata that can vary significantly in length.

Also applies to: 28-30, 32-33
37-37: Enhanced field length for improved data handling

Increasing the `inputTerm` length from 255 to 767 characters aligns with similar changes in other schema files, creating consistency across the database design and allowing for longer search terms.
2-2: Schema update supports new evaluation type

The changes to imports, adding "brand_bias" to evaluation types, and updating the field type to varchar appropriately support the new research workflow. Using varchar instead of mysqlEnum provides more flexibility for future additions.
Also applies to: 7-7, 17-17
73-84: Well-structured schema for brand bias evaluation

The new schemas for brand bias evaluation are well-designed with:
- Clear numerical scores for commercial bias, neutrality, and educational value
- Appropriate validation constraints (min/max values of 0-10)
- A structured recommendation format with specific enum types
This implementation aligns perfectly with the PR objective of preventing bias towards vendor marketing materials.
apps/billing/src/lib/db-marketing/schemas/keywords.ts (2)
13-16: Improved schema design with increased field lengths and hash implementation

The increased field lengths (767 characters) for `inputTerm`, `keyword`, and `source` provide better support for longer text values. The addition of the hash field is a clever solution to MySQL's index length limitations while maintaining data integrity.
23-27: Efficient indexing strategy update

The new index on the `keyword` field and the updated unique constraint using the hash value instead of the combined fields represent a more efficient database design. This approach avoids MySQL's index length limitations while still maintaining uniqueness constraints.
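For illustration, such a hash column could be populated like this sketch; the exact fields, separator, and algorithm the PR uses are assumptions here:

```typescript
// Reduce the long (inputTerm, keyword, source) triple to a fixed 64-char
// SHA-256 hex digest, which fits comfortably under MySQL's index
// key-length limit while preserving uniqueness of the combination.
import { createHash } from "node:crypto";

export function uniqueKeyHash(inputTerm: string, keyword: string, source: string): string {
  // NUL separator avoids collisions like ("ab","c") vs ("a","bc")
  return createHash("sha256")
    .update([inputTerm, keyword, source].join("\u0000"))
    .digest("hex");
}
```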
8-9: Developer experience improvement with update check flag

Adding the `--skip-update-check` flag to the development scripts will improve developer workflow by preventing interruptions from update prompts during development and deployment.
21-21: Appropriate dependencies for research workflow implementation

The addition of `@ai-sdk/google` (v1.1.19) and `exa-js` (v1.4.10), along with updating the Trigger.dev dependencies, aligns perfectly with the PR objective of creating a research workflow that utilizes custom searches from exa.ai.

Also applies to: 26-28, 33-33
apps/billing/src/lib/db-marketing/schemas/serper.ts (8)
13-13: Field length increase aligns with glossary research requirements.

Increasing the `inputTerm` length from 255 to 767 characters is appropriate for handling longer search terms that may come from the new exa.ai research workflow mentioned in the PR objectives.
50-54: Field type change from varchar to text improves storage flexibility.

Changing the `title` and `imageUrl` fields from varchar to text is an appropriate modification, as search result titles and image URLs can potentially be longer than expected varchar limits. This aligns with the research workflow requirements that may need to store more comprehensive search results.
63-66: Good addition of type exports for better type safety.

The new schema exports (`serperOrganicResultSchema` and the `SerperOrganicResult` type) improve TypeScript type safety and code readability when working with search results in the application.
89-89: Consistent field type change improves flexibility.

Changing the `title` field in `serperSitelinks` from varchar to text maintains consistency with similar changes in other tables and allows for storing longer titles.
121-126: Appropriate field type changes for top stories content.

The changes to field types in `serperTopStories` (title, source, date, imageUrl) from varchar to text are appropriate for accommodating potentially longer content from various search sources. This will help prevent truncation issues when storing search results.
154-157: Consistent field type changes for "People Also Ask" data.

Changing the `question` and `title` fields to text type is consistent with other changes and appropriate for storing potentially longer content from "People Also Ask" search results.
188-188: Field length increase for related searches queries.

Increasing the `query` field length from 255 to 767 characters in `serperRelatedSearches` is consistent with the `inputTerm` change and allows for storing longer related search queries.
67-76: Details

❓ Verification inconclusive

Verify removal of sitelinks relation was intentional.

The sitelinks relation has been removed from `serperOrganicResultsRelations`. If this is intentional as part of refactoring the data model, ensure that any code that previously accessed organic results' sitelinks has been updated accordingly.

🏁 Script executed:

```bash
#!/bin/bash
# Search for code that might be using the removed sitelinks relation
rg -A 3 "sitelinks" --type ts
```

Length of output: 1122

Action: Confirm intentional removal of sitelinks relation and update dependent code as needed.

It appears that the sitelinks relation was removed from the `serperOrganicResultsRelations` definition in apps/billing/src/lib/db-marketing/schemas/serper.ts (lines 67–76). However, the search results show that sitelinks-related definitions (e.g., "serper_sitelinks", the schema for sitelinks via `insertSitelinkSchema`, and the type `NewSitelinkParams`) are still present in the same file. Please verify that:

- The removal of the sitelinks relation from `serperOrganicResultsRelations` is intentional and part of your data model refactor.
- All modules or API endpoints that used to access organic results' sitelinks have been updated to rely on the remaining stand-alone sitelinks definitions.
```typescript
import { batch, task } from "@trigger.dev/sdk/v3";
import Exa from "exa-js";
import { evaluateSearchResults } from "./evaluate-search-results";
import { domainCategories, exaDomainSearchTask } from "./exa-domain-search";
import type { ExaCosts } from "./types";
```
💡 Verification agent
🧩 Analysis chain
Ensure environment variables are set correctly
This file imports Exa from exa-js (line 2) and relies on process.env.EXA_API_KEY (line 89). Consider verifying that the environment variable is defined and handled properly to avoid runtime errors.
Check the environment setup:
🏁 Script executed:

```bash
#!/bin/bash
# Verifies that EXA_API_KEY is set in your environment.
if [ -z "$EXA_API_KEY" ]; then
  echo "EXA_API_KEY not set. Please configure it appropriately."
  exit 1
fi
echo "EXA_API_KEY is set."
```

Length of output: 145
Action Required: Set the EXA_API_KEY Environment Variable
During verification, the environment variable EXA_API_KEY was found to be unset. Since the file at apps/billing/src/trigger/glossary/research/_technical-research.ts (specifically around line 89) relies on this key, please ensure that:
- `EXA_API_KEY` is defined in your environment before running the code.
- Consider adding a safeguard (e.g., a runtime check or a fallback mechanism) to prevent runtime errors when the key is missing.
```json
"dev": "pnpm dlx trigger.dev@latest dev --skip-update-check",
"trigger:deploy": "pnpm dlx trigger.dev@latest deploy --skip-update-check",
```
There were some version conflicts when I updated to the latest trigger version, so I skipped their update check
These changes were from a previous branch.
The PR changes don't use this code but I left it to avoid a db drift causing an issue
I prefix entry tasks with a _, so that they get pinned to the top of the folder
```typescript
// we perform a search for each search category in parallel:
const { runs } = await batch.triggerByTaskAndWait(
  domainCategories.map((domainCategory) => ({
```
the domainCategories look like this:
```typescript
export const domainCategories = [
  {
    name: "Official",
    domains: ["tools.ietf.org", "datatracker.ietf.org", "rfc-editor.org", "w3.org", "iso.org"],
    description: "Official standards and specifications sources",
  },
  {
    name: "Community",
    domains: [
      "stackoverflow.com",
      "github.com",
      "wikipedia.org",
      "news.ycombinator.com",
      "stackexchange.com",
    ],
    description: "Community-driven platforms and forums",
  },
  {
    name: "Neutral",
    domains: ["owasp.org", "developer.mozilla.org"],
    description: "Educational and vendor-neutral resources",
  },
  {
    name: "Google",
    domains: [], // Empty domains array to search without domain restrictions
    description: "General search results without domain restrictions",
  },
] as const;
```
I don't know why, but exa-js doesn't provide the type for this on their response even though they return this data.
Ran a fresh `pnpm i` after pulling the latest from upstream.
What does this PR do?
Demo
Type of change
How should this be tested?
Trigger the technical research task `technicalResearchTask()` with the payload `{ "inputTerm": "single-sign-on" }` in PROD using the Run Test button.

Checklist
Required
- `pnpm build`
- `pnpm fmt`
- No `console.log`s
- `git pull origin main`

Appreciated
Summary by CodeRabbit
New Features
Refactor
Chores