
Longer bulk categorize #257

Merged
elie222 merged 66 commits into main from bulk-categorize
Nov 25, 2024

Conversation

elie222 (Owner) commented Nov 8, 2024

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a new CategorizeSendersProgress component for tracking categorization progress.
    • Added a ProgressPanel component for improved display of archiving progress.
    • Implemented batch processing for sender categorization with enhanced error handling.
    • New API endpoint for retrieving user categorization progress.
    • Enhanced sender categorization functionality with AI-driven methods and static rules.
    • New environment variables added for improved configuration management.
  • Improvements

    • Enhanced error handling across various components and actions.
    • Updated success messages to provide clearer feedback to users based on the number of uncategorized senders.
    • Streamlined categorization logic for better performance and maintainability.
    • Simplified component logic by removing unused state and features.
    • Updated logging levels for improved clarity in error reporting.
  • Bug Fixes

    • Resolved issues related to uncategorized senders retrieval and session management.
    • Adjusted the handling of sender categorization to prevent potential runtime errors.
  • Documentation

    • Updated environment variable settings for improved configuration management.

vercel bot commented Nov 8, 2024

Name: inbox-zero · Status: ✅ Ready · Updated (UTC): Nov 25, 2024 9:40pm

coderabbitai bot (Contributor) commented Nov 8, 2024

Caution

Review failed

The pull request is closed.

Walkthrough

This pull request updates components and utilities related to sender categorization. Key changes include renaming and refactoring of categorization actions, new API endpoints for batch processing and progress tracking, and updates to environment configuration. New components for progress display and improved error-handling mechanisms have been added, and several unused features have been removed, streamlining the overall functionality.

Changes

File Change Summary
apps/web/app/(app)/smart-categories/CategorizeWithAiButton.tsx Renamed action from categorizeSendersAction to bulkCategorizeSendersAction, updated toast messages, added buttonProps parameter.
apps/web/app/api/user/categorize/senders/batch/route.ts Defined a new API endpoint for handling POST requests related to sender categorization.
apps/web/env.ts Added several new optional environment variables: NEXTAUTH_URL, QSTASH_TOKEN, QSTASH_CURRENT_SIGNING_KEY, QSTASH_NEXT_SIGNING_KEY, WEBHOOK_URL, INTERNAL_API_KEY.
apps/web/utils/actions/categorize.ts Refactored categorizeSendersAction to bulkCategorizeSendersAction, added getUncategorizedSenders, removed categorizeEmailAction.
apps/web/app/api/user/categorize/senders/find-senders.ts Removed the entire file containing functions for finding senders.
apps/web/utils/gmail/message.ts Updated error message in getMessagesBatch function to specify a maximum of 100 messages.
apps/web/app/api/user/categorize/senders/progress/route.ts Introduced a new API endpoint for retrieving user categorization progress.
apps/web/components/ProgressPanel.tsx Added a new component for displaying progress information during categorization.
apps/web/app/(app)/smart-categories/CategorizeProgress.tsx Introduced CategorizeSendersProgress component for tracking categorization progress.
apps/web/app/api/user/categorize/senders/uncategorized/route.ts Enhanced error handling in the GET method, removed local getUncategorizedSenders function.
apps/web/utils/user/validate.ts Added validateUserAndAiAccess function for checking user access to AI features.
apps/web/prisma/schema.prisma Updated Account and User models, added unique constraint to userId.
apps/web/utils/actions/client.ts Updated onDeleteFilter function parameter from gmailLabelId to filterId.
apps/web/utils/actions/middleware.ts Added logging statement in withActionInstrumentation function.
apps/web/utils/actions/ai-rule.ts Introduced generateRulesPromptAction function for generating rules based on user activity.
apps/web/utils/actions/helpers.ts Updated return type for getSessionAndGmailClient to enhance error handling.
apps/web/utils/ai/categorize-sender/ai-categorize-senders.ts Updated user parameter type in aiCategorizeSenders function.
apps/web/utils/ai/choose-rule/run-rules.ts Updated getApplicableRules function to use findFirst instead of findUnique.
apps/web/prisma/migrations/20241119163400_categorize_date_range/migration.sql Dropped categorizeEmails column and added timestamp columns.
apps/web/prisma/migrations/20241125052523_remove_categorized_time/migration.sql Dropped newestCategorizedEmailTime and oldestCategorizedEmailTime columns.
apps/web/app/api/user/categorize/senders/batch/handle-batch.ts Introduced handleBatchRequest function for processing batch requests for sender categorizations.
apps/web/utils/internal-api.ts Added isValidInternalApiKey function for validating internal API keys.
apps/web/.env.example Added new environment variable declarations: NEXTAUTH_URL, QSTASH_TOKEN, QSTASH_CURRENT_SIGNING_KEY, QSTASH_NEXT_SIGNING_KEY.

🐰 In the garden, where bunnies play,
Changes sprout like flowers in May.
With buttons now more bright and clear,
Categorizing senders brings us cheer!
New paths for progress, we hop along,
In this world of code, we all belong! 🌼✨

Warning

Rate limit exceeded

@elie222 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 6 minutes and 54 seconds before requesting another review.


📥 Commits

Reviewing files that changed from the base of the PR and between b62a30b and 8e2dc7c.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Outside diff range and nitpick comments (11)
apps/web/utils/internal-api.ts (1)

1-3: LGTM! Consider documenting the header's purpose.

The header constant follows security best practices with the 'x-' prefix for custom headers. Consider adding a JSDoc comment to document its purpose and expected usage.

 import { env } from "@/env";

+/** Header key used to authenticate internal API requests */
 export const INTERNAL_API_KEY_HEADER = "x-internal-api-key";
apps/web/app/api/user/categorize/senders/batch/trigger.ts (1)

5-10: Enhance schema validation rules.

Consider adding stricter validation rules to ensure data integrity:

  • Add .min(1) for userId to ensure it's non-empty
  • Add .min(0) for pageIndex to prevent negative page numbers
 export const categorizeSendersBatchSchema = z.object({
-  userId: z.string(),
+  userId: z.string().min(1),
   pageToken: z.string().optional(),
-  pageIndex: z.number().default(0),
+  pageIndex: z.number().min(0).default(0),
 });
apps/web/app/api/user/categorize/senders/batch/route.ts (2)

13-13: Document and consider making MAX_PAGES configurable.

The MAX_PAGES constant's value seems arbitrary. Consider:

  1. Adding a comment explaining why 10 is the chosen limit
  2. Making it configurable via environment variable for different environments
-const MAX_PAGES = 10;
+// Maximum number of batch pages to process to prevent infinite loops
+// Can be configured via MAX_CATEGORIZE_PAGES env var
+const MAX_PAGES = process.env.MAX_CATEGORIZE_PAGES
+  ? parseInt(process.env.MAX_CATEGORIZE_PAGES, 10)
+  : 10;

42-46: Improve response handling and status information.

The current implementation has several areas for improvement:

  1. The sleep duration (100ms) seems arbitrary
  2. Limited status information in the response
  3. No way to track actual processing progress
-  // Make sure the next batch API request is triggered
-  await sleep(100);
+  // Ensure the next batch trigger has time to initialize
+  // This value should be adjusted based on monitoring data
+  await sleep(process.env.BATCH_TRIGGER_DELAY_MS ? 
+    parseInt(process.env.BATCH_TRIGGER_DELAY_MS) : 100);

-  return NextResponse.json({ status: "processing" });
+  return NextResponse.json({
+    status: "processing",
+    currentPage: pageIndex,
+    totalProcessed: pageIndex * BATCH_SIZE,
+    estimatedTimeRemaining: ((MAX_PAGES - pageIndex) * 100) / 1000 + " seconds",
+  });
apps/web/app/(app)/smart-categories/CategorizeWithAiButton.tsx (1)

31-32: Consider enhancing state management for bulk operations.

While the action has been updated for bulk processing, the state management might need refinement:

  1. The isCategorizing state might not accurately reflect the ongoing background processing.
  2. Users might need a way to track the progress of the bulk operation.

Consider these improvements:

 export function CategorizeWithAiButton() {
   const [isCategorizing, setIsCategorizing] = useState(false);
+  const [progress, setProgress] = useState(0);
+  const [totalItems, setTotalItems] = useState(0);

   // ... existing code ...

   const result = await handleActionCall(
     "bulkCategorizeSendersAction",
     bulkCategorizeSendersAction,
+    {
+      onProgress: (processed, total) => {
+        setProgress(processed);
+        setTotalItems(total);
+      }
+    }
   );
apps/web/.env.example (2)

6-7: Document the purpose of INTERNAL_API_KEY

While the implementation and secret generation guidance is good, please add a brief comment explaining the purpose of this key (e.g., "Used for authenticating internal API requests between services").


6-7: Consider adding validation requirements

The comment should include any specific requirements for the key format or minimum length to ensure generated keys meet the validation criteria implemented in env.ts and internal-api.ts.

Example:

 # Generate a random secret here: https://generate-secret.vercel.app/32
+# Must be at least 32 characters long and contain only alphanumeric characters
 INTERNAL_API_KEY=
apps/web/env.ts (1)

10-10: Add documentation for INTERNAL_API_KEY.

Consider adding JSDoc comments to document the purpose and requirements of this environment variable:

+    /** Internal API key used for authenticating batch categorization requests.
+     * Should be a secure random string of at least 32 characters.
+     * @example "your-secure-random-internal-api-key-here"
+     */
     INTERNAL_API_KEY: z.string().min(1),
apps/web/utils/actions/categorize.ts (3)

105-105: Update the log message to match the function name

The log message in categorizeSenders references categorizeSendersAction, which may cause confusion. Update it to reflect the current function name.

-  console.log("categorizeSendersAction", userId, pageToken);
+  console.log("categorizeSenders", userId, pageToken);

Line range hint 304-305: Fix the domain matching logic in preCategorizeSendersWithStaticRules

The condition includes an unintended > character, which may prevent correct identification of personal email domains.

-    if (personalEmailDomains.some((domain) => sender.includes(`@${domain}>`)))
+    if (personalEmailDomains.some((domain) => sender.includes(`@${domain}`)))
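A quick standalone sketch (with made-up sender strings and a made-up domain list) shows why the stray `>` breaks matching for addresses that are not wrapped in angle brackets:

```typescript
// Hypothetical inputs; the real list lives elsewhere in the app.
const personalEmailDomains = ["gmail.com", "yahoo.com"];

// Buggy check: requires a trailing ">", so it only matches senders
// formatted as "Name <address>".
const buggy = (sender: string) =>
  personalEmailDomains.some((domain) => sender.includes(`@${domain}>`));

// Fixed check: matches the domain with or without angle brackets.
const fixed = (sender: string) =>
  personalEmailDomains.some((domain) => sender.includes(`@${domain}`));

console.log(buggy("jane@gmail.com")); // false (bug: bare address is missed)
console.log(fixed("jane@gmail.com")); // true
console.log(fixed("Jane Doe <jane@gmail.com>")); // true
```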

242-245: Return a response from bulkCategorizeSendersAction for consistency

The bulkCategorizeSendersAction function does not return any value after triggering the batch. Consider returning a success status for consistency with other action functions.

     await triggerCategorizeBatch({ userId: session.user.id, pageIndex: 0 });
+    return { success: true };
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between e9c2031 and e5c8cbd.

📒 Files selected for processing (8)
  • apps/web/.env.example (1 hunks)
  • apps/web/app/(app)/smart-categories/CategorizeWithAiButton.tsx (3 hunks)
  • apps/web/app/api/user/categorize/senders/batch/route.ts (1 hunks)
  • apps/web/app/api/user/categorize/senders/batch/trigger.ts (1 hunks)
  • apps/web/env.ts (1 hunks)
  • apps/web/utils/actions/categorize.ts (2 hunks)
  • apps/web/utils/internal-api.ts (1 hunks)
  • turbo.json (1 hunks)
🔇 Additional comments (6)
apps/web/app/api/user/categorize/senders/batch/trigger.ts (1)

13-13: ⚠️ Potential issue

Security: Avoid using NEXT_PUBLIC_ prefix for internal API URLs.

Using NEXT_PUBLIC_BASE_URL for internal API endpoints is risky as it exposes the URL to the client. Consider:

  1. Using a separate INTERNAL_API_URL environment variable
  2. Adding input validation for the URL construction

Let's verify the environment variable usage:

apps/web/app/api/user/categorize/senders/batch/route.ts (1)

31-41: 🛠️ Refactor suggestion

Enhance batch processing reliability and status tracking.

The current implementation has potential reliability issues:

  1. Non-awaited trigger could fail silently
  2. No mechanism to track failed batches
  3. Limited status information

Consider implementing:

  1. Error handling for the trigger
  2. Status tracking mechanism
  3. Cleanup for failed batches
   // Check if completed
-  if (pageIndex >= MAX_PAGES) return NextResponse.json({ status: "completed" });
-  if (!nextPageToken) return NextResponse.json({ status: "completed" });
+  if (pageIndex >= MAX_PAGES) {
+    console.log({
+      message: "Max pages reached",
+      userId,
+      totalPages: pageIndex,
+    });
+    return NextResponse.json({ 
+      status: "completed",
+      reason: "max_pages_reached",
+      processedPages: pageIndex,
+    });
+  }
+  if (!nextPageToken) {
+    return NextResponse.json({ 
+      status: "completed",
+      reason: "no_more_data",
+      processedPages: pageIndex,
+    });
+  }

   // Do not await this, so we can return early
-  triggerCategorizeBatch({
-    userId,
-    pageToken: nextPageToken,
-    pageIndex: pageIndex + 1,
-  });
+  try {
+    triggerCategorizeBatch({
+      userId,
+      pageToken: nextPageToken,
+      pageIndex: pageIndex + 1,
+    });
+  } catch (error) {
+    console.error({
+      message: "Failed to trigger next batch",
+      userId,
+      pageIndex,
+      error: error.message,
+    });
+    throw error;
+  }
apps/web/app/(app)/smart-categories/CategorizeWithAiButton.tsx (2)

7-7: LGTM: Import statement updated for bulk processing.

The import statement has been correctly updated to use the new bulk categorization action.


Line range hint 26-53: Verify rate limiting and batch size constraints.

Since this is now handling bulk operations, ensure that:

  1. The backend has appropriate rate limiting
  2. There are reasonable constraints on batch sizes
  3. The operation can be safely interrupted if needed

Let's verify the implementation of these safety measures:

✅ Verification successful

Let me try a different search strategy to find relevant information about batch processing and safety measures.


Let me check the implementation of the batch processing and categorization logic to fully understand the safety measures.


Based on the gathered information, I can now provide the final response:

Batch processing has appropriate safety measures in place

The implementation includes several safety constraints:

  • Gmail batch operations are limited to 100 items (BATCH_LIMIT)
  • Bulk categorization is capped at 10 pages (MAX_PAGES)
  • The batch route is protected with internal API key validation
  • Asynchronous processing with proper error handling is implemented
  • Rate limiting is inherently handled by Gmail's API quotas

The current implementation appears to have sufficient safeguards for:

  1. Batch size constraints (100 items per batch)
  2. Overall processing limits (10 pages maximum)
  3. Safe interruption (stateless page-based processing)
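The stateless page-based flow described above can be sketched roughly as follows. Note that `fetchPage`, the 25-item data set, and the 10-item page size are invented for illustration; the real code paginates via Gmail's API and hands the next page to a separate triggered request rather than a local loop:

```typescript
// Cap on pages processed per run, mirroring the MAX_PAGES guard.
const MAX_PAGES = 10;

type Page = { items: string[]; nextPageToken?: string };

// Hypothetical data source: 25 items served 10 at a time.
const allItems = Array.from({ length: 25 }, (_, i) => `sender-${i}`);
function fetchPage(pageToken?: string): Page {
  const start = pageToken ? Number(pageToken) : 0;
  const items = allItems.slice(start, start + 10);
  const next = start + 10 < allItems.length ? String(start + 10) : undefined;
  return { items, nextPageToken: next };
}

// Each iteration is self-contained: it only needs the page token, so an
// interrupted run simply stops at a page boundary and can be resumed.
function processAllPages(): number {
  let processed = 0;
  let pageToken: string | undefined;
  for (let pageIndex = 0; pageIndex < MAX_PAGES; pageIndex++) {
    const page = fetchPage(pageToken);
    processed += page.items.length;
    if (!page.nextPageToken) break; // no more data
    pageToken = page.nextPageToken;
  }
  return processed;
}

console.log(processAllPages()); // 25
```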
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for rate limiting and batch size validation in the API route and action
echo "Checking for rate limiting implementation..."
rg -A 5 "rate.*limit|batch.*size|maxItems" apps/web/app/api/user/categorize/

echo "Checking for error handling of large batches..."
ast-grep --pattern 'throw new Error($$$)' apps/web/app/api/user/categorize/

Length of output: 360


Script:

#!/bin/bash
# Search for the bulk categorization action implementation
echo "Checking bulkCategorizeSendersAction implementation..."
rg -l "bulkCategorizeSendersAction" 
rg -A 10 "bulkCategorizeSendersAction.*=|export.*bulkCategorizeSendersAction"

# Search for any validation or constraints in the route handlers
echo -e "\nChecking route handlers for validation..."
fd "route.ts" apps/web/app/api/user/categorize/ --exec rg -A 10 "export.*POST|export.*GET"

# Look for any size or limit related constants
echo -e "\nChecking for size/limit constants..."
rg "MAX_|LIMIT_|BATCH_" --type ts

Length of output: 4445


Script:

#!/bin/bash
# Check the batch processing implementation
echo "Checking batch processing implementation..."
rg -A 10 "triggerCategorizeBatch" 

# Check the categorization batch route implementation
echo -e "\nChecking categorization batch route..."
cat apps/web/app/api/user/categorize/senders/batch/route.ts

# Check Gmail batch utility implementation
echo -e "\nChecking Gmail batch utility..."
cat apps/web/utils/gmail/batch.ts

Length of output: 9216

turbo.json (1)

11-13: Verify the necessity of multiple authentication credentials.

The code includes both the new INTERNAL_API_KEY and the existing Google OAuth credentials. Please verify if both authentication methods are intentionally needed.

If this represents a transition period:

  1. Consider adding documentation about the migration timeline
  2. Plan for the eventual removal of deprecated credentials
  3. Ensure proper communication to users about any authentication changes

If both are needed permanently, document the specific use case for each authentication method.

apps/web/env.ts (1)

10-10: Verify the internal API key usage across the codebase.

Let's ensure the internal API key is being used securely in the new batch categorization endpoints.

✅ Verification successful

Internal API key implementation verified and secure

The internal API key is implemented securely in the batch categorization endpoints:

  • Properly validated using isValidInternalApiKey function
  • Only used server-side in API routes
  • Transmitted via secure header x-internal-api-key
  • No exposure in client-side code
  • Proper 401 response on invalid key
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check how INTERNAL_API_KEY is used across the codebase

# Test 1: Find files that use INTERNAL_API_KEY
echo "Files using INTERNAL_API_KEY:"
rg "INTERNAL_API_KEY" --type ts

# Test 2: Check for potential security issues in API routes
echo -e "\nAPI route implementations:"
rg -A 10 "export async function (GET|POST|PUT|DELETE)" "app/api"

# Test 3: Look for potential key exposure in client-side code
echo -e "\nChecking for potential exposure in client-side code:"
rg "INTERNAL_API_KEY" "app/\(page\|layout\|client\)"

Length of output: 1192


Script:

#!/bin/bash
# Let's check the API routes with correct paths and analyze the implementation
echo "Checking API route implementations:"
rg -A 10 "export async function (GET|POST|PUT|DELETE)" "apps/web/app/api"

# Let's verify client-side code with correct paths
echo -e "\nChecking for potential exposure in client-side code:"
rg "INTERNAL_API_KEY" "apps/web/app/(.*page|.*layout|.*client)"

# Let's examine the internal API key validation implementation
echo -e "\nChecking internal API validation:"
cat apps/web/utils/internal-api.ts

# Let's check the batch categorization endpoint implementation
echo -e "\nChecking batch categorization implementation:"
cat apps/web/app/api/user/categorize/senders/batch/route.ts

Length of output: 3237

Comment on lines 5 to 7
export function isValidInternalApiKey(apiKey: string | null) {
return !!apiKey && apiKey === env.INTERNAL_API_KEY;
}

💡 Codebase verification

⚠️ Potential issue

Based on the gathered information, I can now provide the final response:

Enhance security of API key validation with appropriate measures

The suggested changes need to be adjusted based on the codebase findings:

  1. The environment variable is already validated using Zod with z.string().min(1) in env.ts
  2. The API key is used in a controlled internal API context for batch processing
  3. The codebase already uses crypto.timingSafeEqual for other sensitive comparisons (e.g., in webhook validation)

Suggested implementation:

import { timingSafeEqual } from "node:crypto";

export function isValidInternalApiKey(apiKey: string | null): boolean {
  if (!apiKey || !env.INTERNAL_API_KEY) return false;
  
  try {
    return timingSafeEqual(
      Buffer.from(apiKey),
      Buffer.from(env.INTERNAL_API_KEY)
    );
  } catch {
    return false;
  }
}
  • The whitespace trimming can be skipped as this is an internal API key comparison
  • Length validation is not critical here as the key is validated at environment level
  • Using timingSafeEqual aligns with existing security patterns in the codebase
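One detail worth noting about the try/catch in the suggested implementation: Node's `timingSafeEqual` throws a `RangeError` when the two buffers differ in byte length rather than returning false, so any wrapper must catch that case. A minimal standalone sketch:

```typescript
import { timingSafeEqual } from "node:crypto";

// Returns false instead of throwing when lengths differ; the
// constant-time guarantee only applies to equal-length inputs.
function safeCompare(a: string, b: string): boolean {
  try {
    return timingSafeEqual(Buffer.from(a), Buffer.from(b));
  } catch {
    return false; // RangeError: inputs must have the same byte length
  }
}

console.log(safeCompare("secret-key", "secret-key")); // true
console.log(safeCompare("secret-key", "short")); // false (would have thrown)
```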
🔗 Analysis chain

Enhance security of API key validation.

The current implementation has potential security vulnerabilities:

  1. Direct string comparison is vulnerable to timing attacks
  2. Missing validation for key length and format
  3. No whitespace handling
  4. Accepting null could mask configuration issues

Consider this more secure implementation:

-export function isValidInternalApiKey(apiKey: string | null) {
-  return !!apiKey && apiKey === env.INTERNAL_API_KEY;
+import { timingSafeEqual } from "crypto";
+
+export function isValidInternalApiKey(apiKey: string | null): boolean {
+  if (!apiKey || !env.INTERNAL_API_KEY) return false;
+  
+  const sanitizedKey = apiKey.trim();
+  if (sanitizedKey.length < 32) return false; // minimum recommended length
+  
+  try {
+    return timingSafeEqual(
+      Buffer.from(sanitizedKey),
+      Buffer.from(env.INTERNAL_API_KEY)
+    );
+  } catch {
+    return false;
+  }
+}

Let's verify the environment variable is properly set:


Let me gather more information about how this internal API key is used in the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if INTERNAL_API_KEY is defined in env files
echo "Checking env files for INTERNAL_API_KEY..."
rg "INTERNAL_API_KEY" .env* || echo "No INTERNAL_API_KEY found in env files"

# Check minimum length validation in other internal API related code
echo "Checking for other internal API key validations..."
rg "INTERNAL_API_KEY.*length" || echo "No other length validations found"

Length of output: 477


Script:

#!/bin/bash
# Check where INTERNAL_API_KEY is used
echo "Checking usage of INTERNAL_API_KEY..."
rg "INTERNAL_API_KEY" -A 3 -B 3

# Check for any existing crypto imports or timing-safe comparisons
echo -e "\nChecking for crypto usage..."
rg "timingSafeEqual|crypto" --type ts

# Check for other API key validation patterns
echo -e "\nChecking for other API key validation patterns..."
ast-grep --pattern 'function $FUNC($_): $_ { 
  $$$
  apiKey
  $$$
}'

Length of output: 4559

Comment on lines 12 to 23
export async function triggerCategorizeBatch(body: CategorizeSendersBatchBody) {
const url = `${env.NEXT_PUBLIC_BASE_URL}/api/user/categorize/senders/batch`;

fetch(url, {
method: "POST",
headers: {
"Content-Type": "application/json",
[INTERNAL_API_KEY_HEADER]: env.INTERNAL_API_KEY,
},
body: JSON.stringify(body),
});
}

⚠️ Potential issue

Fix async/await usage and add error handling.

The current implementation has several issues:

  1. The fetch call isn't awaited, causing the function to return before the request completes
  2. Missing error handling
  3. No response handling or return value
  4. No timeout protection for the network request

Apply this fix:

 export async function triggerCategorizeBatch(body: CategorizeSendersBatchBody) {
   const url = `${env.NEXT_PUBLIC_BASE_URL}/api/user/categorize/senders/batch`;
 
-  fetch(url, {
-    method: "POST",
-    headers: {
-      "Content-Type": "application/json",
-      [INTERNAL_API_KEY_HEADER]: env.INTERNAL_API_KEY,
-    },
-    body: JSON.stringify(body),
-  });
+  try {
+    const controller = new AbortController();
+    const timeoutId = setTimeout(() => controller.abort(), 30000); // 30s timeout
+    
+    const response = await fetch(url, {
+      method: "POST",
+      headers: {
+        "Content-Type": "application/json",
+        [INTERNAL_API_KEY_HEADER]: env.INTERNAL_API_KEY,
+      },
+      body: JSON.stringify(body),
+      signal: controller.signal,
+    });
+    
+    clearTimeout(timeoutId);
+    
+    if (!response.ok) {
+      throw new Error(`HTTP error! status: ${response.status}`);
+    }
+    
+    return await response.json();
+  } catch (error) {
+    if (error instanceof Error) {
+      throw new Error(`Failed to trigger batch categorization: ${error.message}`);
+    }
+    throw error;
+  }
 }

Comment on lines 22 to 30
const json = await request.json();
const body = categorizeSendersBatchSchema.parse(json);
const { userId, pageToken, pageIndex } = body;

console.log("categorizeSendersBatch", userId, pageIndex);

// Process the batch
const { nextPageToken } = await categorizeSenders(userId, pageToken);


🛠️ Refactor suggestion

Improve error handling and logging for batch processing.

The current implementation could benefit from:

  1. More comprehensive error handling
  2. Structured logging with additional context
  3. Validation of userId format
   const json = await request.json();
   const body = categorizeSendersBatchSchema.parse(json);
   const { userId, pageToken, pageIndex } = body;
 
-  console.log("categorizeSendersBatch", userId, pageIndex);
+  console.log({
+    message: "Processing categorization batch",
+    userId,
+    pageIndex,
+    batchSize: pageToken?.length || 0,
+    timestamp: new Date().toISOString(),
+  });
 
   // Process the batch
-  const { nextPageToken } = await categorizeSenders(userId, pageToken);
+  let nextPageToken: string | undefined;
+  try {
+    ({ nextPageToken } = await categorizeSenders(userId, pageToken));
+  } catch (error) {
+    console.error({
+      message: "Batch processing failed",
+      userId,
+      pageIndex,
+      error: error instanceof Error ? error.message : String(error),
+    });
+    throw error;
+  }
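For item 3 (validating the userId format), the check can be expressed as a small helper or folded into the schema itself. A minimal sketch under the assumption that user IDs are Prisma-style cuids; the regex, the helper name, and the cuid assumption are all illustrative, not part of the PR:

```typescript
// Hypothetical helper for item 3 above. Prisma's default IDs are cuids
// ("c" followed by lowercase alphanumerics), so that's the assumed format;
// swap in a UUID pattern if the schema uses uuid() instead. In the route
// this could live in categorizeSendersBatchSchema as z.string().regex(...).
const CUID_RE = /^c[a-z0-9]{8,}$/;

function isValidUserIdFormat(userId: string): boolean {
  return CUID_RE.test(userId);
}
```

Rejecting malformed IDs before hitting the database keeps bad input out of logs and avoids a pointless Prisma query.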

Comment on lines 15 to 21
export const POST = withError(async (request: Request) => {
const headersList = headers();
const apiKey = headersList.get(INTERNAL_API_KEY_HEADER);

if (!isValidInternalApiKey(apiKey))
return new NextResponse("Unauthorized", { status: 401 });

🛠️ Refactor suggestion

Enhance API security and error handling.

While basic authentication is implemented, consider these security improvements:

  1. Add rate limiting to prevent abuse
  2. Implement request logging for security auditing
  3. Return a more structured error response
 export const POST = withError(async (request: Request) => {
+  // Log incoming requests (excluding sensitive data)
+  console.log(`Batch categorization request from ${request.headers.get('x-forwarded-for')}`);
+
   const headersList = headers();
   const apiKey = headersList.get(INTERNAL_API_KEY_HEADER);
 
   if (!isValidInternalApiKey(apiKey))
-    return new NextResponse("Unauthorized", { status: 401 });
+    return NextResponse.json(
+      { error: "Invalid or missing API key" },
+      { status: 401 }
+    );

Committable suggestion skipped: line range outside the PR's diff.
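For item 1 (rate limiting), a token-bucket limiter is one common shape. The sketch below is an in-memory illustration only; the class, capacity, and refill rate are assumptions, not code from this PR, and a deployed route would more likely use a shared store (for example the @upstash/ratelimit package, which fits the Upstash stack this codebase already uses) since in-memory state does not survive serverless cold starts:

```typescript
// Minimal in-memory token bucket: `capacity` requests burst, refilled at
// `refillPerSecond`. tryRemove() returns false when the caller should be
// rejected (e.g. with a 429 response).
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;

  constructor(
    private capacity: number,
    private refillPerSecond: number,
    nowMs: number = Date.now(),
  ) {
    this.tokens = capacity;
    this.lastRefillMs = nowMs;
  }

  tryRemove(nowMs: number = Date.now()): boolean {
    // Refill proportionally to elapsed time, capped at capacity
    const elapsedSec = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

A handler would keep one bucket per API key or client IP and short-circuit with a 429 when tryRemove() returns false.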

loading: "Categorizing senders...",
success: () => {
return "Senders categorized successfully!";
return "Categorization started! This might take a while.";

🛠️ Refactor suggestion

Enhance user feedback for background processing.

The success message correctly indicates that processing continues in the background, but consider adding:

  1. A way for users to monitor the progress
  2. A link to view the results when available

Consider this enhancement:

 success: () => {
-  return "Categorization started! This might take a while.";
+  return {
+    message: "Categorization started! This might take a while.",
+    action: {
+      label: "View Progress",
+      onClick: () => router.push("/smart-categories/progress")
+    }
+  };
 },

Committable suggestion skipped: line range outside the PR's diff.

apps/web/env.ts Outdated
NODE_ENV: z.enum(["development", "production", "test"]),
DATABASE_URL: z.string().url(),
NEXTAUTH_SECRET: z.string().min(1),
INTERNAL_API_KEY: z.string().min(1),

🛠️ Refactor suggestion

Consider enhancing security validation for INTERNAL_API_KEY.

While the basic validation ensures a non-empty string, consider strengthening the validation for this security-critical key:

-    INTERNAL_API_KEY: z.string().min(1),
+    INTERNAL_API_KEY: z.string()
+      .min(32, "Internal API key must be at least 32 characters")
+      .regex(/^[a-zA-Z0-9_-]+$/, "Internal API key must only contain alphanumeric characters, underscores, and hyphens"),

Comment on lines 128 to 130
const accessToken = user.accounts[0].access_token;

if (!userHasAiAccess) return { error: "Please upgrade for AI access" };
if (!accessToken) return { error: "No access token" };

⚠️ Potential issue

Ensure user.accounts is not empty before accessing its first element

Accessing user.accounts[0] without checking if the array has elements can lead to runtime errors if user.accounts is empty. Verify the array length before accessing its elements.

+  if (!user.accounts || user.accounts.length === 0) return { error: "No accounts found" };
   const accessToken = user.accounts[0].access_token;

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (4)
apps/web/app/api/user/categorize/senders/find-senders.ts (2)

20-20: Consider using a named constant for maxResults.

The hard-coded value 50 would be more maintainable as a named constant, especially if it represents a Gmail API limitation or a carefully chosen threshold.

+const DEFAULT_PAGE_SIZE = 50; // Maximum number of threads to fetch per page
+
 export async function findSendersWithPagination(
   gmail: gmail_v1.Gmail,
   accessToken: string,
   maxPages: number,
 ) {
   // ...
   const { senders, nextPageToken: newNextPageToken } = await findSenders(
     gmail,
     accessToken,
-    50,
+    DEFAULT_PAGE_SIZE,
     nextPageToken,
   );

Line range hint 7-37: Consider memory usage and rate limiting for bulk operations.

The implementation accumulates all results in memory through the allSenders Map. For large result sets across multiple pages, this could lead to significant memory usage. Additionally, consider implementing rate limiting or respecting Gmail API quotas for bulk operations.

Suggestions:

  1. Consider implementing a streaming/batched processing approach.
  2. Add rate limiting between API calls.
  3. Monitor and handle Gmail API quota errors.
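Suggestions 1 and 2 can be combined into a page-at-a-time loop: handle each page of senders as it arrives instead of accumulating every page into one Map, and pause between pages as crude rate limiting. A rough sketch; fetchPage and handleSenders are hypothetical stand-ins for findSenders and the downstream categorization step:

```typescript
// Streaming/batched page processing: memory stays bounded to one page,
// and a small delay between fetches spaces out Gmail API calls.
async function forEachSenderPage(
  fetchPage: (pageToken?: string) => Promise<{ senders: string[]; nextPageToken?: string }>,
  handleSenders: (senders: string[]) => Promise<void>,
  maxPages: number,
  delayMs = 200,
): Promise<void> {
  let pageToken: string | undefined;
  for (let page = 0; page < maxPages; page++) {
    const { senders, nextPageToken } = await fetchPage(pageToken);
    await handleSenders(senders); // process immediately; nothing accumulates
    if (!nextPageToken) break;
    pageToken = nextPageToken;
    await new Promise((resolve) => setTimeout(resolve, delayMs)); // crude throttle
  }
}
```

Quota errors from the Gmail client would still need a catch around fetchPage; the delay only reduces how often they occur.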
apps/web/utils/actions/categorize.ts (2)

171-178: Consider implementing batch processing for AI categorization

The current implementation processes senders sequentially through the AI service, which could be slow for large datasets. Consider implementing batch processing to improve performance.

- const aiResults = await aiCategorizeSenders({
-   user,
-   senders: sendersToCategorizeWithAi.map((sender) => ({
-     emailAddress: sender,
-     snippets: sendersResult.senders.get(sender)?.map((m) => m.snippet) || [],
-   })),
-   categories,
- });
+ const batchSize = 50;
+ const aiResults = [];
+ for (let i = 0; i < sendersToCategorizeWithAi.length; i += batchSize) {
+   const batch = sendersToCategorizeWithAi.slice(i, i + batchSize);
+   const batchResults = await aiCategorizeSenders({
+     user,
+     senders: batch.map((sender) => ({
+       emailAddress: sender,
+       snippets: sendersResult.senders.get(sender)?.map((m) => m.snippet) || [],
+     })),
+     categories,
+   });
+   aiResults.push(...batchResults);
+ }

187-188: Address the TODO comment regarding unknown sender categorization

The TODO comment indicates that the current implementation might be slow and suggests breaking out the functionality or streaming results. This should be addressed before finalizing the implementation.

Would you like me to help create a GitHub issue to track this enhancement task?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between e5c8cbd and d8c9713.

📒 Files selected for processing (4)
  • apps/web/app/api/user/categorize/senders/batch/route.ts (1 hunks)
  • apps/web/app/api/user/categorize/senders/find-senders.ts (2 hunks)
  • apps/web/utils/actions/categorize.ts (2 hunks)
  • apps/web/utils/gmail/message.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/web/app/api/user/categorize/senders/batch/route.ts

export async function findSenders(
gmail: gmail_v1.Gmail,
accessToken: string,
maxResults: number,

🛠️ Refactor suggestion

Add input validation for maxResults parameter.

Consider adding validation to ensure maxResults is within acceptable bounds for the Gmail API. This can prevent potential API errors or rate limiting issues during bulk operations.

 export async function findSenders(
   gmail: gmail_v1.Gmail,
   accessToken: string,
   maxResults: number,
   pageToken?: string,
 ) {
+  if (maxResults <= 0 || maxResults > 100) {
+    throw new Error('maxResults must be between 1 and 100');
+  }
   const senders: SenderMap = new Map();

Comment on lines +43 to 46
// https://claude.ai/chat/3984ad45-f5d4-4196-8309-9b5dc9211d05
console.error(
`Error fetching message ${message.error.code} ${message.error.message}`,
);

🛠️ Refactor suggestion

Enhance error handling for production use.

The current error logging implementation appears to be a temporary solution, as indicated by the TODO comment. Consider implementing a more robust error handling strategy:

  1. Use a proper logging service instead of console.error
  2. Include additional context (e.g., request ID, timestamp)
  3. Consider retrying failed requests with exponential backoff
  4. Implement proper error reporting to your monitoring system

Here's a suggested implementation:

- // https://claude.ai/chat/3984ad45-f5d4-4196-8309-9b5dc9211d05
- console.error(
-   `Error fetching message ${message.error.code} ${message.error.message}`,
- );
+ logger.error('Failed to fetch message in batch', {
+   errorCode: message.error.code,
+   errorMessage: message.error.message,
+   timestamp: new Date().toISOString(),
+   // Add request ID or other context as needed
+ });
+ 
+ // Optionally, report to error monitoring
+ errorReporting.captureException(new Error('Batch message fetch failed'), {
+   extra: {
+     code: message.error.code,
+     message: message.error.message
+   }
+ });

Committable suggestion skipped: line range outside the PR's diff.
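The retry point (item 3) could look like the generic wrapper below. It is not code from this PR; the name and defaults are illustrative, and the failing message fetch would be passed in as the callback:

```typescript
// Retry a failing async call with exponential backoff:
// 500ms, 1s, 2s, ... between attempts, rethrowing the last error.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) {
        const delayMs = baseDelayMs * 2 ** attempt; // doubles each attempt
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

The batch fetch could then be wrapped as withRetry(() => fetchMessage(...)) (function name illustrative) before falling back to logging the error.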

Comment on lines 205 to 232
for (const sender of unknownSenders) {
const messages = sendersResult.senders.get(sender.sender);

let previousEmails =
messages?.map((m) => m.snippet).filter(isDefined) || [];

if (previousEmails.length === 0) {
previousEmails = await getPreviousEmails(gmail, sender.sender);
}

// categorize unknown senders
// TODO: this will take a while. so probably break this out, or stream results as they come in
const unknownSenders = [
...results,
...existingSenders.map((s) => ({
sender: s.email,
category: s.category?.name,
})),
].filter(
(r) =>
!r.category ||
r.category === defaultCategory.UNKNOWN.name ||
r.category === REQUEST_MORE_INFORMATION_CATEGORY,
);

console.log(
`Found ${unknownSenders.length} unknown senders to categorize with AI`,
);

for (const sender of unknownSenders) {
const messages = sendersResult.senders.get(sender.sender);
const aiResult = await aiCategorizeSender({
user,
sender: sender.sender,
previousEmails,
categories,
});

let previousEmails =
messages?.map((m) => m.snippet).filter(isDefined) || [];
if (aiResult) {
await saveResult(
{
sender: sender.sender,
category: aiResult.category,
},
categories,
userId,
);
}
}

🛠️ Refactor suggestion

Implement parallel processing for unknown senders

The current implementation processes unknown senders sequentially in a for loop, which could be slow for large datasets. Consider using Promise.all for parallel processing.

- for (const sender of unknownSenders) {
-   const messages = sendersResult.senders.get(sender.sender);
-   // ... rest of the loop
- }
+ const processSender = async (sender) => {
+   const messages = sendersResult.senders.get(sender.sender);
+   let previousEmails = messages?.map((m) => m.snippet).filter(isDefined) || [];
+   if (previousEmails.length === 0) {
+     previousEmails = await getPreviousEmails(gmail, sender.sender);
+   }
+   const aiResult = await aiCategorizeSender({
+     user,
+     sender: sender.sender,
+     previousEmails,
+     categories,
+   });
+   if (aiResult) {
+     await saveResult(
+       { sender: sender.sender, category: aiResult.category },
+       categories,
+       userId,
+     );
+   }
+ };
+ 
+ await Promise.all(unknownSenders.map(processSender));

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Outside diff range and nitpick comments (7)
apps/web/utils/upstash.ts (2)

4-7: Consider enhancing error handling and type safety.

While the basic implementation is functional, consider these improvements:

  1. Add return type annotation for better type safety
  2. Add more specific error message including configuration instructions
  3. Consider basic token format validation

Here's a suggested improvement:

-export function getQstashClient() {
+export function getQstashClient(): Client {
   if (!env.QSTASH_TOKEN) 
-    throw new Error("QSTASH_TOKEN is not set");
+    throw new Error(
+      "QSTASH_TOKEN is not configured. Please set this environment variable to enable bulk categorization features."
+    );
+
+  const token = env.QSTASH_TOKEN.trim();
+  if (!token.match(/^[A-Za-z0-9_-]+$/)) {
+    throw new Error("QSTASH_TOKEN appears to be invalid. Please check your configuration.");
+  }
+
-  return new Client({ token: env.QSTASH_TOKEN });
+  return new Client({ token });
 }

1-7: Consider adding resilience patterns for queue operations.

Since this client is used for bulk categorization operations, consider implementing these patterns:

  1. Wrap the client in a service class that handles retries and rate limiting
  2. Add monitoring/logging for queue operations
  3. Consider implementing a circuit breaker pattern for bulk operations

Example monitoring implementation:

class QstashService {
  private client: Client;
  private metrics = {
    operations: 0,
    failures: 0,
  };

  constructor(client: Client) {
    this.client = client;
  }

  // Example method with monitoring: count every operation and any failure
  // around the underlying @upstash/qstash client call
  async publishJSON(request: { url: string; body?: unknown }) {
    this.metrics.operations++;
    try {
      return await this.client.publishJSON(request);
    } catch (error) {
      this.metrics.failures++;
      throw error;
    }
  }
}

Would you like me to provide a complete implementation of the service class with these patterns?

apps/web/app/api/user/categorize/senders/batch/route.ts (3)

8-8: Consider moving MAX_PAGES to configuration.

Magic numbers should be defined in a configuration file for better maintainability and reusability.

-const MAX_PAGES = 50;
+import { MAX_CATEGORIZATION_PAGES } from "@/config/constants";

10-14: Add type documentation for the request body structure.

For better code maintainability and developer experience, consider adding JSDoc comments describing the expected request body structure.

+/**
+ * @typedef {Object} CategorizeBatchRequest
+ * @property {string} userId - The ID of the user
+ * @property {string} pageToken - Token for pagination
+ * @property {number} pageIndex - Current page index
+ */
 export const POST = withError(
   verifySignatureAppRouter(async (request: Request) => {

26-32: Add timeout handling and error recovery for async operations.

The asynchronous batch triggering could benefit from:

  1. Timeout handling for long-running operations
  2. Error recovery mechanism for failed triggers
  3. Dead letter queue for failed batches

Consider implementing a timeout mechanism and error recovery:

+    const TRIGGER_TIMEOUT = 30000; // 30 seconds
+    try {
+      await Promise.race([
+        triggerCategorizeBatch({
           userId,
           pageToken: nextPageToken,
           pageIndex: pageIndex + 1,
-        });
+        }),
+        new Promise((_, reject) =>
+          setTimeout(() => reject(new Error('Trigger timeout')), TRIGGER_TIMEOUT)
+        ),
+      ]);
+    } catch (error) {
+      console.error('Failed to trigger next batch:', error);
+      // Store failed batch in dead letter queue for retry
+      await storeFailedBatch({ userId, pageToken: nextPageToken, pageIndex: pageIndex + 1 });
+    }
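storeFailedBatch above is hypothetical; a minimal in-memory version is sketched below, with a drain function for retries. In production the queue would be persisted (Redis, Postgres, or a QStash DLQ) so failed batches survive restarts and redeploys:

```typescript
// In-memory dead-letter queue sketch for failed categorization batches.
// The shape mirrors the trigger payload used in the route above.
interface FailedBatch {
  userId: string;
  pageToken?: string;
  pageIndex: number;
}

const deadLetterQueue: FailedBatch[] = [];

async function storeFailedBatch(batch: FailedBatch): Promise<void> {
  deadLetterQueue.push({ ...batch });
}

// Replays stored batches in FIFO order through the supplied trigger function
async function retryFailedBatches(
  trigger: (batch: FailedBatch) => Promise<void>,
): Promise<void> {
  while (deadLetterQueue.length > 0) {
    const batch = deadLetterQueue.shift()!;
    await trigger(batch);
  }
}
```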
apps/web/package.json (1)

66-66: Document Qstash configuration requirements.

Since this dependency requires specific environment variables (QSTASH_TOKEN, QSTASH_CURRENT_SIGNING_KEY, QSTASH_NEXT_SIGNING_KEY), consider updating the project documentation.

Consider adding these requirements to your README.md or environment setup documentation:

# Add to README.md or .env.example
+# Qstash Configuration
+QSTASH_TOKEN=
+QSTASH_CURRENT_SIGNING_KEY=
+QSTASH_NEXT_SIGNING_KEY=
apps/web/utils/actions/categorize.ts (1)

104-270: Enhance error logging for better observability

While the error handling is good, adding structured error logging would help with debugging and monitoring in production.

Consider adding structured error logging:

+ import { logger } from "@/utils/logger"; // Add appropriate logger

  export async function categorizeSenders(userId: string, pageToken?: string) {
-   console.log("categorizeSendersAction", userId, pageToken);
+   logger.info("Starting categorizeSenders", { userId, pageToken });
    
    const user = await prisma.user.findUnique({
      where: { id: userId },
      // ...
    });
-   if (!user) return { error: "User not found" };
+   if (!user) {
+     logger.error("User not found in categorizeSenders", { userId });
+     return { error: "User not found" };
+   }
    // Apply similar changes to other error cases
  }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between d8c9713 and f6a50a9.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (7)
  • apps/web/app/api/user/categorize/senders/batch/route.ts (1 hunks)
  • apps/web/app/api/user/categorize/senders/batch/trigger.ts (1 hunks)
  • apps/web/env.ts (1 hunks)
  • apps/web/package.json (1 hunks)
  • apps/web/utils/actions/categorize.ts (2 hunks)
  • apps/web/utils/upstash.ts (1 hunks)
  • turbo.json (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • apps/web/app/api/user/categorize/senders/batch/trigger.ts
  • apps/web/env.ts
  • turbo.json
🔇 Additional comments (3)
apps/web/utils/upstash.ts (1)

1-2: LGTM! Import statements are clean and well-organized.

The imports follow good practices with external dependencies first, followed by internal imports.

apps/web/app/api/user/categorize/senders/batch/route.ts (1)

16-19: Improve error handling and logging for batch processing.

The current implementation needs more comprehensive error handling and structured logging.

The previous review comment about improving error handling and logging is still applicable here.

apps/web/package.json (1)

66-66: Verify @upstash/qstash version and security.

The addition of @upstash/qstash for batch processing aligns well with the existing stack (you're already using @upstash/redis).

Let's verify the version and check for any security advisories:

Comment on lines 1 to 7
import { Client } from "@upstash/qstash";
import { env } from "@/env";

export function getQstashClient() {
if (!env.QSTASH_TOKEN) throw new Error("QSTASH_TOKEN is not set");
return new Client({ token: env.QSTASH_TOKEN });
}

💡 Codebase verification

Inconsistent error handling detected with getQstashClient usage.

  • apps/web/app/api/user/categorize/senders/batch/trigger.ts
🔗 Analysis chain

Verify consistent error handling across usage points.

Let's ensure that all code using this client properly handles potential errors.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Find all files that import and use the getQstashClient
echo "Checking for getQstashClient usage patterns..."

# Look for files importing the client
rg -l "getQstashClient" 

# Check for try-catch blocks around client usage
ast-grep --pattern 'try {
  $$$
  getQstashClient()
  $$$
} catch'

Length of output: 279

Comment on lines 126 to 127
const accessToken = user.accounts[0].access_token;
if (!accessToken) return { error: "No access token" };

⚠️ Potential issue

Add null check for user.accounts array

The code directly accesses user.accounts[0] without checking if the array is empty, which could lead to runtime errors.

Apply this fix:

-  const accessToken = user.accounts[0].access_token;
-  if (!accessToken) return { error: "No access token" };
+  if (!user.accounts?.length) return { error: "No accounts found" };
+  const accessToken = user.accounts[0].access_token;
+  if (!accessToken) return { error: "No access token" };

Comment on lines 263 to 264
const accessToken = user.accounts[0].access_token;
if (!accessToken) return { error: "No access token" };

⚠️ Potential issue

Add null check for user.accounts array

Similar to the previous issue, the code directly accesses user.accounts[0] without checking if the array is empty.

Apply this fix:

-  const accessToken = user.accounts[0].access_token;
-  if (!accessToken) return { error: "No access token" };
+  if (!user.accounts?.length) return { error: "No accounts found" };
+  const accessToken = user.accounts[0].access_token;
+  if (!accessToken) return { error: "No access token" };

Comment on lines 184 to 229
// TODO: this will take a while. so probably break this out, or stream results as they come in
const unknownSenders = [
...results,
...existingSenders.map((s) => ({
sender: s.email,
category: s.category?.name,
})),
].filter(
(r) =>
!r.category ||
r.category === defaultCategory.UNKNOWN.name ||
r.category === REQUEST_MORE_INFORMATION_CATEGORY,
);

console.log(
`Found ${unknownSenders.length} unknown senders to categorize with AI`,
);

for (const sender of unknownSenders) {
const messages = sendersResult.senders.get(sender.sender);

let previousEmails =
messages?.map((m) => m.snippet).filter(isDefined) || [];

if (previousEmails.length === 0) {
previousEmails = await getPreviousEmails(gmail, sender.sender);
}

const aiResults = await aiCategorizeSenders({
const aiResult = await aiCategorizeSender({
user,
senders: sendersToCategorizeWithAi.map((sender) => ({
emailAddress: sender,
snippets:
sendersResult.senders.get(sender)?.map((m) => m.snippet) || [],
})),
sender: sender.sender,
previousEmails,
categories,
});

const results = [...categorizedSenders, ...aiResults];

for (const result of results) {
await saveResult(result, categories, u.id);
if (aiResult) {
await saveResult(
{
sender: sender.sender,
category: aiResult.category,
},
categories,
userId,
);
}
}

🛠️ Refactor suggestion

Optimize processing of unknown senders

The sequential processing of unknown senders in a for loop could be slow for large datasets. The TODO comment also acknowledges this performance concern.

Consider implementing parallel processing using Promise.all:

-  for (const sender of unknownSenders) {
-    const messages = sendersResult.senders.get(sender.sender);
-    let previousEmails = messages?.map((m) => m.snippet).filter(isDefined) || [];
-    if (previousEmails.length === 0) {
-      previousEmails = await getPreviousEmails(gmail, sender.sender);
-    }
-    const aiResult = await aiCategorizeSender({
-      user,
-      sender: sender.sender,
-      previousEmails,
-      categories,
-    });
-    if (aiResult) {
-      await saveResult(
-        { sender: sender.sender, category: aiResult.category },
-        categories,
-        userId,
-      );
-    }
-  }
+  const processSender = async (sender: typeof unknownSenders[0]) => {
+    const messages = sendersResult.senders.get(sender.sender);
+    let previousEmails = messages?.map((m) => m.snippet).filter(isDefined) || [];
+    if (previousEmails.length === 0) {
+      previousEmails = await getPreviousEmails(gmail, sender.sender);
+    }
+    const aiResult = await aiCategorizeSender({
+      user,
+      sender: sender.sender,
+      previousEmails,
+      categories,
+    });
+    if (aiResult) {
+      await saveResult(
+        { sender: sender.sender, category: aiResult.category },
+        categories,
+        userId,
+      );
+    }
+  };
+
+  await Promise.allSettled(unknownSenders.map(processSender));

Note: Using Promise.allSettled instead of Promise.all ensures that if one sender fails, it won't affect the processing of others.
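The difference the note describes can be seen in a tiny self-contained example (not tied to this PR's code):

```typescript
// Promise.allSettled never rejects: each entry records whether its promise
// fulfilled or rejected, so one failing sender cannot abort the whole batch.
// Promise.all over the same array would have thrown on the first rejection.
async function settleAll(): Promise<string[]> {
  const tasks: Promise<string>[] = [
    Promise.resolve("categorized"),
    Promise.reject(new Error("AI call failed")),
    Promise.resolve("categorized"),
  ];
  const results = await Promise.allSettled(tasks);
  return results.map((r) => r.status);
}
```

Failed entries surface as status "rejected" with a reason field, which the caller can log or feed into a retry path instead of losing the other senders' results.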

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
   // TODO: this will take a while. so probably break this out, or stream results as they come in
   const unknownSenders = [
     ...results,
     ...existingSenders.map((s) => ({
       sender: s.email,
       category: s.category?.name,
     })),
   ].filter(
     (r) =>
       !r.category ||
       r.category === defaultCategory.UNKNOWN.name ||
       r.category === REQUEST_MORE_INFORMATION_CATEGORY,
   );
   console.log(
     `Found ${unknownSenders.length} unknown senders to categorize with AI`,
   );
-  for (const sender of unknownSenders) {
-    const messages = sendersResult.senders.get(sender.sender);
-    let previousEmails =
-      messages?.map((m) => m.snippet).filter(isDefined) || [];
-    if (previousEmails.length === 0) {
-      previousEmails = await getPreviousEmails(gmail, sender.sender);
-    }
-    const aiResult = await aiCategorizeSender({
-      user,
-      sender: sender.sender,
-      previousEmails,
-      categories,
-    });
-    if (aiResult) {
-      await saveResult(
-        { sender: sender.sender, category: aiResult.category },
-        categories,
-        userId,
-      );
-    }
-  }
+  const processSender = async (sender: typeof unknownSenders[0]) => {
+    const messages = sendersResult.senders.get(sender.sender);
+    let previousEmails =
+      messages?.map((m) => m.snippet).filter(isDefined) || [];
+    if (previousEmails.length === 0) {
+      previousEmails = await getPreviousEmails(gmail, sender.sender);
+    }
+    const aiResult = await aiCategorizeSender({
+      user,
+      sender: sender.sender,
+      previousEmails,
+      categories,
+    });
+    if (aiResult) {
+      await saveResult(
+        { sender: sender.sender, category: aiResult.category },
+        categories,
+        userId,
+      );
+    }
+  };
+
+  await Promise.allSettled(unknownSenders.map(processSender));
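
The isolation property noted above (Promise.allSettled vs Promise.all) can be checked with a self-contained sketch; the categorize call here is a stub, and all names are illustrative rather than taken from the codebase:

```typescript
// Illustrative stand-in for aiCategorizeSender: fails for some senders.
async function categorizeStub(sender: string): Promise<string> {
  if (sender.includes("bad")) throw new Error(`cannot categorize ${sender}`);
  return "Newsletter";
}

async function categorizeAll(senders: string[]) {
  // Promise.allSettled never rejects: each sender gets its own
  // fulfilled/rejected slot, so one failure cannot abort the batch.
  const results = await Promise.allSettled(senders.map((s) => categorizeStub(s)));
  const ok = senders.filter((_, i) => results[i].status === "fulfilled");
  const failed = senders.filter((_, i) => results[i].status === "rejected");
  return { ok, failed };
}
```

With Promise.all, the single failing sender would reject the whole batch and skip saveResult for the rest; with Promise.allSettled the remaining senders still complete.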

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 12

🧹 Outside diff range and nitpick comments (17)
apps/web/app/api/user/categorize/senders/progress/route.ts (1)

5-7: Consider defining an explicit interface for CategorizeProgress

Instead of deriving the type from the implementation, consider defining an explicit interface. This would make the contract more maintainable and decoupled from the implementation details.

-export type CategorizeProgress = Awaited<
-  ReturnType<typeof getCategorizeProgress>
->;
+export interface CategorizeProgress {
+  // Define the expected shape of the progress data
+  total: number;
+  processed: number;
+  status: 'pending' | 'completed' | 'error';
+}
apps/web/app/(app)/smart-categories/CategorizeProgress.tsx (1)

7-11: Improve prop type definition and naming.

The refresh prop has inconsistent typing - it's marked as required but has a default value. Also, the prop name could be more descriptive.

Consider this improvement:

-export function CategorizeSendersProgress({
-  refresh = false,
-}: {
-  refresh: boolean;
-}) {
+export function CategorizeSendersProgress({
+  enableAutoRefresh = false,
+}: {
+  enableAutoRefresh?: boolean;
+}) {
apps/web/app/(app)/bulk-unsubscribe/ArchiveProgress.tsx (3)

25-31: LGTM: Clean implementation of ProgressPanel

The refactoring to use ProgressPanel improves code maintainability and reusability. The prop names are semantic and self-documenting.

Consider adding a brief JSDoc comment above the component to document its purpose and usage:

+/**
+ * Displays progress for bulk email archiving operations.
+ * Automatically cleans up 5 seconds after completion.
+ */
export const ArchiveProgress = memo(() => {

Add Type Safety for activeThreads

The review comment is valid. Consider adding TypeScript type guards or runtime checks to ensure type safety for activeThreads.

🔗 Analysis chain

Line range hint 11-12: Consider adding type safety for activeThreads

The comment suggests previous type-related issues. Consider adding TypeScript type guards or runtime checks to ensure type safety.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for type definitions and usage of activeThreads
ast-grep --pattern 'type $_Props = {
  $$$
  activeThreads: $_
  $$$
}'

# Look for any error handling related to activeThreads
rg -A 3 "activeThreads.*undefined|null" 

Length of output: 109017


Add Error Handling in ArchiveProgress.tsx

Implement proper error handling around the resetTotalThreads function to manage potential failures gracefully and enhance user experience during rapid operations.

🔗 Analysis chain

Line range hint 17-22: Verify cleanup timing and error handling

The 5-second delay before resetting might affect user experience in rapid operations. Additionally, ensure error states are properly handled.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Look for other uses of resetTotalThreads to understand the impact
rg "resetTotalThreads" 

# Search for error handling patterns in similar progress components
rg -A 5 "catch|error|Error" --type tsx

Length of output: 109017

apps/web/utils/redis/categorization-progress.ts (3)

5-9: Consider adding validation constraints to the schema.

The schema could benefit from additional validation rules to ensure data integrity:

  • pageIndex should probably be non-negative
  • sendersCategorized should be non-negative
 const categorizationProgressSchema = z.object({
-  pageIndex: z.number(),
-  sendersCategorized: z.number(),
+  pageIndex: z.number().min(0),
+  sendersCategorized: z.number().min(0),
 });

11-13: Validate userId to prevent Redis key injection.

While unlikely, malicious userId values could potentially affect Redis key patterns. Consider sanitizing or validating the userId.

 function getKey(userId: string) {
+  if (!userId || userId.includes(':')) {
+    throw new Error('Invalid userId provided');
+  }
   return `categorization-progress:${userId}`;
 }

1-55: Consider adding monitoring and rate limiting.

For production reliability:

  1. Add Redis operation monitoring/metrics
  2. Implement rate limiting for progress updates
  3. Consider adding a cleanup mechanism for stale progress entries

Would you like assistance in implementing any of these suggestions?

apps/web/app/api/user/categorize/senders/batch/route.ts (2)

9-9: Document the MAX_PAGES constant.

Add a comment explaining why 50 is chosen as the maximum number of pages and what happens when this limit is reached.

+// Maximum number of pages to process in a categorization batch to prevent infinite loops
 const MAX_PAGES = 50;

42-42: Enhance response with progress details.

Include additional information in the response to help track progress.

-    return NextResponse.json({ status: "processing" });
+    return NextResponse.json({
+      status: "processing",
+      progress: {
+        currentPage: pageIndex,
+        totalProcessed: categorizedCount,
+        hasMorePages: !!nextPageToken
+      }
+    });
apps/web/components/ProgressPanel.tsx (3)

1-13: LGTM! Consider adding JSDoc documentation.

The imports and type definitions are well-structured. The props interface is clear and properly typed.

Consider adding JSDoc documentation to the interface for better developer experience:

+/**
+ * Props for the ProgressPanel component
+ * @param totalItems - Total number of items to process
+ * @param remainingItems - Number of items yet to be processed
+ * @param inProgressText - Text to display while processing
+ * @param completedText - Text to display when complete
+ * @param itemLabel - Optional label for items (defaults to "items")
+ */
interface ProgressPanelProps {
  totalItems: number;
  remainingItems: number;
  inProgressText: string;
  completedText: string;
  itemLabel?: string;
}

15-59: LGTM! Consider enhancing accessibility.

The implementation is solid with good defensive programming and smooth animations. The progress calculation and state management are handled well.

Consider enhancing accessibility by adding these ARIA attributes:

-<div className="px-4 py-2">
+<div className="px-4 py-2" role="region" aria-label="Progress Status">
   <AnimatePresence mode="wait">
     <motion.div
       key="progress"
       initial={{ opacity: 1 }}
       exit={{ opacity: 0 }}
       transition={{ duration: 0.3 }}
     >
-      <ProgressBar
+      <ProgressBar
+        aria-valuemin={0}
+        aria-valuemax={100}
+        aria-valuenow={progress}
         value={progress}
         className="w-full"
         color={isCompleted ? "green" : "blue"}
       />

22-24: Consider memoizing progress calculations.

The progress calculations are performed on every render. While this might not be a performance bottleneck for most cases, consider memoizing these values if the component re-renders frequently.

+const totalProcessed = useMemo(
+  () => totalItems - remainingItems,
+  [totalItems, remainingItems]
+);
+const progress = useMemo(
+  () => (totalProcessed / totalItems) * 100,
+  [totalProcessed, totalItems]
+);
+const isCompleted = useMemo(
+  () => progress === 100,
+  [progress]
+);
-const totalProcessed = totalItems - remainingItems;
-const progress = (totalProcessed / totalItems) * 100;
-const isCompleted = progress === 100;

Don't forget to add the import:

+import { useMemo } from "react";
apps/web/app/(app)/smart-categories/page.tsx (3)

39-39: Consider adding error handling for progress fetching

While the progress fetching integration is clean, consider adding error handling to gracefully handle cases where progress retrieval fails. This ensures the page remains functional even if the progress tracking service is temporarily unavailable.

- const [senders, categories, user, progress] = await Promise.all([
+ const [senders, categories, user, progressResult] = await Promise.all([
    prisma.newsletter.findMany({/*...*/}),
    getUserCategories(session.user.id),
    prisma.user.findUnique({/*...*/}),
-   getCategorizationProgress({ userId: session.user.id }),
+   getCategorizationProgress({ userId: session.user.id }).catch(error => {
+     console.error('Failed to fetch categorization progress:', error);
+     return null;
+   }),
  ]);

Also applies to: 53-53


65-65: Consider adding type safety for the refresh prop

The refresh prop conversion to boolean is correct but could benefit from explicit typing for better maintainability.

- <CategorizeSendersProgress refresh={!!progress} />
+ <CategorizeSendersProgress refresh={Boolean(progress)} />

63-66: Consider adding a loading state for progress tracking

To prevent layout shifts and improve user experience, consider implementing a loading state for the progress tracking. This would help maintain UI stability while the progress data is being fetched.

 <ClientOnly>
   <ArchiveProgress />
-  <CategorizeSendersProgress refresh={!!progress} />
+  <Suspense fallback={<ProgressSkeleton />}>
+    <CategorizeSendersProgress refresh={!!progress} />
+  </Suspense>
 </ClientOnly>
apps/web/utils/actions/categorize.ts (1)

129-129: Add error handling for Gmail client initialization

The Gmail client initialization should be wrapped in a try-catch block to handle potential initialization errors.

-  const gmail = getGmailClient({ accessToken });
+  let gmail;
+  try {
+    gmail = getGmailClient({ accessToken });
+  } catch (error) {
+    console.error('Failed to initialize Gmail client:', error);
+    return { error: "Failed to initialize Gmail client" };
+  }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between f6a50a9 and 8371899.

📒 Files selected for processing (12)
  • apps/web/app/(app)/bulk-unsubscribe/ArchiveProgress.tsx (2 hunks)
  • apps/web/app/(app)/smart-categories/CategorizeProgress.tsx (1 hunks)
  • apps/web/app/(app)/smart-categories/page.tsx (4 hunks)
  • apps/web/app/api/user/categorize/senders/batch/route.ts (1 hunks)
  • apps/web/app/api/user/categorize/senders/batch/trigger.ts (1 hunks)
  • apps/web/app/api/user/categorize/senders/progress/route.ts (1 hunks)
  • apps/web/components/ProgressPanel.tsx (1 hunks)
  • apps/web/env.ts (2 hunks)
  • apps/web/utils/actions/categorize.ts (2 hunks)
  • apps/web/utils/date.ts (1 hunks)
  • apps/web/utils/redis/categorization-progress.ts (1 hunks)
  • turbo.json (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • apps/web/app/api/user/categorize/senders/batch/trigger.ts
  • apps/web/env.ts
  • turbo.json
🔇 Additional comments (6)
apps/web/app/(app)/smart-categories/CategorizeProgress.tsx (1)

21-29: ⚠️ Potential issue

Fix incorrect progress calculation.

There's a critical issue in the progress calculation. Setting both totalItems and remainingItems to data.sendersCategorized will always result in showing 0% progress since they're equal.

Apply this fix:

   return (
     <ProgressPanel
       totalItems={data.sendersCategorized}
-      remainingItems={data.sendersCategorized}
+      remainingItems={data.sendersRemaining}
       inProgressText="Categorizing senders..."
       completedText="Categorizing complete!"
       itemLabel="senders"
     />
   );

Also, verify that the API response includes the correct fields:

apps/web/app/(app)/bulk-unsubscribe/ArchiveProgress.tsx (1)

5-5: LGTM: Clean import of reusable component

The import of ProgressPanel follows good practices by keeping UI components modular and reusable.

apps/web/utils/date.ts (1)

37-38: LGTM! Good refactoring of time constants.

The introduction of ONE_MINUTE_MS as a base unit and its usage in ONE_HOUR_MS improves code maintainability and readability while preserving the same values.

Let's verify that all dependent time constants maintain their correct values:

✅ Verification successful

Verified! The refactored time constants are consistently and correctly used across the codebase without any issues.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the time constants maintain their expected values
# by searching for any calculations or comparisons using these constants

# Search for usage of time constants in calculations or comparisons
rg -A 2 "ONE_(MINUTE|HOUR|DAY|MONTH|YEAR)_MS.*[+\-*/=<>]" 

# Search for any tests involving these constants
fd -e test.ts -e spec.ts -X rg "ONE_(MINUTE|HOUR|DAY|MONTH|YEAR)_MS"

Length of output: 1567


Script:

#!/bin/bash
# Description: Comprehensive search for usages of time constants in the codebase

# List all occurrences of the time constants
rg "ONE_(MINUTE|HOUR|DAY|MONTH|YEAR)_MS"

# Check for any reassignments or modifications of the constants
rg "ONE_(MINUTE|HOUR|DAY|MONTH|YEAR)_MS\s*="

# Identify files where magic numbers might be used instead of the constants
fd -e ts | xargs rg "[0-9]{4,}"

Length of output: 9943
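
For reference, the constants under review reduce to a pair of derived values; this is a minimal sketch inferred from the constant names, not a quote of apps/web/utils/date.ts:

```typescript
// Sketch: each larger unit is derived from the one below it,
// so a typo in one literal cannot silently skew the others.
const ONE_MINUTE_MS = 60 * 1000;
const ONE_HOUR_MS = 60 * ONE_MINUTE_MS;
const ONE_DAY_MS = 24 * ONE_HOUR_MS;
```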

apps/web/app/(app)/smart-categories/page.tsx (1)

28-29: LGTM: Clean import additions

The new imports for progress tracking functionality are well-organized and follow the project's conventions.

apps/web/utils/actions/categorize.ts (2)

126-127: ⚠️ Potential issue

Add null check for user.accounts array

Direct access to user.accounts[0] without checking the array could lead to runtime errors.

Apply this fix:

-  const accessToken = user.accounts[0].access_token;
-  if (!accessToken) return { error: "No access token" };
+  if (!user.accounts?.length) return { error: "No accounts found" };
+  const accessToken = user.accounts[0].access_token;
+  if (!accessToken) return { error: "No access token" };

Likely invalid or redundant comment.


266-267: ⚠️ Potential issue

Add null check for user.accounts array

Similar to the previous issue, direct access to user.accounts[0] without checking the array could lead to runtime errors.

Apply this fix:

-  const accessToken = user.accounts[0].access_token;
-  if (!accessToken) return { error: "No access token" };
+  if (!user.accounts?.length) return { error: "No accounts found" };
+  const accessToken = user.accounts[0].access_token;
+  if (!accessToken) return { error: "No access token" };

Likely invalid or redundant comment.

Comment on lines +9 to +12
async function getCategorizeProgress(userId: string) {
const progress = await getCategorizationProgress({ userId });
return progress;
}

🛠️ Refactor suggestion

Enhance error handling and consider removing redundant wrapper

The getCategorizeProgress function is a simple pass-through wrapper that could be eliminated. If kept, it should include proper error handling and parameter validation.

 async function getCategorizeProgress(userId: string) {
+  if (!userId) throw new Error("userId is required");
+  
+  try {
   const progress = await getCategorizationProgress({ userId });
   return progress;
+  } catch (error) {
+    console.error("Failed to fetch categorization progress:", error);
+    throw new Error("Failed to fetch categorization progress");
+  }
 }

Alternatively, consider removing this wrapper function and using getCategorizationProgress directly in the route handler.

📝 Committable suggestion


Suggested change
-async function getCategorizeProgress(userId: string) {
-  const progress = await getCategorizationProgress({ userId });
-  return progress;
-}
+async function getCategorizeProgress(userId: string) {
+  if (!userId) throw new Error("userId is required");
+
+  try {
+    const progress = await getCategorizationProgress({ userId });
+    return progress;
+  } catch (error) {
+    console.error("Failed to fetch categorization progress:", error);
+    throw new Error("Failed to fetch categorization progress");
+  }
+}

Comment on lines 14 to 20
export async function GET() {
const session = await auth();
if (!session?.user) return NextResponse.json({ error: "Not authenticated" });

const result = await getCategorizeProgress(session.user.id);
return NextResponse.json(result);
}

🛠️ Refactor suggestion

Add proper HTTP status codes and rate limiting

The route handler should use appropriate HTTP status codes and implement rate limiting to protect the API.

 export async function GET() {
   const session = await auth();
-  if (!session?.user) return NextResponse.json({ error: "Not authenticated" });
+  if (!session?.user) {
+    return NextResponse.json(
+      { error: "Not authenticated" },
+      { status: 401 }
+    );
+  }
 
   try {
     const result = await getCategorizeProgress(session.user.id);
     return NextResponse.json(result);
+  } catch (error) {
+    console.error("Error fetching progress:", error);
+    return NextResponse.json(
+      { error: "Internal server error" },
+      { status: 500 }
+    );
   }
 }

Consider implementing rate limiting using Next.js middleware or a service like Upstash Rate Limiter to protect against abuse.

Committable suggestion skipped: line range outside the PR's diff.
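
A fixed-window limiter illustrates the shape of such a check. This in-memory sketch is illustrative only: in a serverless deployment like this one, the counter state would have to live in a shared store (e.g. Upstash Redis) rather than process memory.

```typescript
// Illustrative in-memory fixed-window rate limiter, keyed per user.
// Not suitable as-is for serverless: each instance would get its own Map.
class FixedWindowLimiter {
  private windows = new Map<string, { start: number; count: number }>();

  constructor(
    private readonly limit: number,
    private readonly windowMs: number,
  ) {}

  // Returns true if the request is allowed within the current window.
  allow(key: string, now: number = Date.now()): boolean {
    const entry = this.windows.get(key);
    if (!entry || now - entry.start >= this.windowMs) {
      // First request in a fresh window: reset the counter.
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}
```

A route handler would call `allow(session.user.id)` before doing any work and return a 429 response when it comes back false.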

@@ -0,0 +1,20 @@
import { NextResponse } from "next/server";
import { auth } from "@/app/api/auth/[...nextauth]/auth";
import { getCategorizationProgress } from "@/utils/redis/categorization-progress";

💡 Codebase verification

Add error handling and Redis client management

The getCategorizationProgress function lacks try-catch blocks for managing Redis operation errors. Additionally, Redis client initialization should be reviewed to ensure proper connection management.

🔗 Analysis chain

Verify Redis utility implementation

Let's verify the implementation of the Redis utility function to ensure proper error handling and connection management.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for the Redis utility implementation
rg -A 10 "export.*getCategorizationProgress" --type typescript

# Look for Redis connection configuration
rg "createClient|REDIS_URL" --type typescript

Length of output: 182


Script:

#!/bin/bash
# Search for the Redis utility implementation
rg -A 10 "export.*getCategorizationProgress" --type ts

# Look for Redis connection configuration
rg "createClient|REDIS_URL" --type ts

Length of output: 1331


Script:

#!/bin/bash
# Check for error handling in getCategorizationProgress
rg "getCategorizationProgress" apps/web/utils/redis/categorization-progress.ts --context 5

# Search for try-catch blocks within categorization-progress.ts
rg "try\s*{" apps/web/utils/redis/categorization-progress.ts --context 5

# Verify Redis client initialization and connection management
rg "createClient" apps/web/utils/redis/*.{ts,tsx} --context 5

Length of output: 1123


Script:

#!/bin/bash
# Check for error handling in getCategorizationProgress
rg "getCategorizationProgress" apps/web/utils/redis/categorization-progress.ts --context 5

# Search for try-catch blocks within categorization-progress.ts with corrected regex
rg "try\s*\{" apps/web/utils/redis/categorization-progress.ts --context 5

# Verify Redis client initialization and connection management in .ts and .tsx files
rg "createClient" apps/web/utils/redis/*.{ts,tsx} --context 5

Length of output: 1046


Script:

#!/bin/bash
# Check for error handling in getCategorizationProgress
rg "getCategorizationProgress" apps/web/utils/redis/categorization-progress.ts --context 5

# Search for try-catch blocks within categorization-progress.ts with corrected regex
rg "try\s*\{" apps/web/utils/redis/categorization-progress.ts --context 5

# Verify Redis client initialization and connection management in .ts files
rg "createClient" apps/web/utils/redis/*.ts --context 5

Length of output: 950

Comment on lines 12 to 17
const { data } = useSWR<CategorizeProgress>(
"/api/user/categorize/senders/progress",
{
refreshInterval: refresh ? 1_000 : undefined,
},
);

🛠️ Refactor suggestion

Enhance data fetching implementation with proper error handling and loading states.

The current implementation has several areas for improvement:

  1. Missing error handling for API failures
  2. Abrupt null rendering during loading
  3. Potentially aggressive refresh interval (1 second)

Consider this improved implementation:

-  const { data } = useSWR<CategorizeProgress>(
+  const { data, error, isLoading } = useSWR<CategorizeProgress>(
     "/api/user/categorize/senders/progress",
     {
-      refreshInterval: refresh ? 1_000 : undefined,
+      refreshInterval: refresh ? 5_000 : undefined, // 5 seconds is more reasonable
+      onError: (err) => {
+        console.error("Failed to fetch categorization progress:", err);
+      },
     },
   );

+  if (error) {
+    return (
+      <div className="text-red-500">
+        Failed to load categorization progress
+      </div>
+    );
+  }

+  if (isLoading) {
+    return <ProgressPanel isLoading totalItems={0} remainingItems={0} />;
+  }

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +15 to +24
export async function getCategorizationProgress({
userId,
}: {
userId: string;
}) {
const key = getKey(userId);
const progress = await redis.get<RedisCategorizationProgress>(key);
if (!progress) return null;
return progress;
}

🛠️ Refactor suggestion

Add error handling and type safety to getCategorizationProgress.

The function should handle Redis operation failures and ensure type safety:

 export async function getCategorizationProgress({
   userId,
 }: {
   userId: string;
-}) {
+}): Promise<RedisCategorizationProgress | null> {
   const key = getKey(userId);
-  const progress = await redis.get<RedisCategorizationProgress>(key);
-  if (!progress) return null;
-  return progress;
+  try {
+    const progress = await redis.get<RedisCategorizationProgress>(key);
+    if (!progress) return null;
+    return categorizationProgressSchema.parse(progress);
+  } catch (error) {
+    console.error('Failed to get categorization progress:', error);
+    throw new Error('Failed to get categorization progress');
+  }
 }

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines 19 to 24
// Process the batch
const { nextPageToken, categorizedCount } = await categorizeSenders(
userId,
pageToken,
);


🛠️ Refactor suggestion

Add timeout handling for batch processing.

Long-running batch operations should have a timeout to prevent hanging requests and resource exhaustion.

     // Process the batch
+    const TIMEOUT_MS = 30000; // 30 seconds
+    const timeoutPromise = new Promise((_, reject) =>
+      setTimeout(() => reject(new Error('Batch processing timeout')), TIMEOUT_MS)
+    );
+    
+    const processingPromise = categorizeSenders(userId, pageToken);
+    
+    const { nextPageToken, categorizedCount } = await Promise.race([
+      processingPromise,
+      timeoutPromise,
+    ]);
-    const { nextPageToken, categorizedCount } = await categorizeSenders(
-      userId,
-      pageToken,
-    );
📝 Committable suggestion


Suggested change
-    // Process the batch
-    const { nextPageToken, categorizedCount } = await categorizeSenders(
-      userId,
-      pageToken,
-    );
+    // Process the batch
+    const TIMEOUT_MS = 30000; // 30 seconds
+    const timeoutPromise = new Promise((_, reject) =>
+      setTimeout(() => reject(new Error('Batch processing timeout')), TIMEOUT_MS)
+    );
+
+    const processingPromise = categorizeSenders(userId, pageToken);
+
+    const { nextPageToken, categorizedCount } = await Promise.race([
+      processingPromise,
+      timeoutPromise,
+    ]);

Comment on lines 25 to 29
await saveCategorizationProgress({
userId,
pageIndex: pageIndex + 1,
incrementCount: categorizedCount || 0,
});

⚠️ Potential issue

Add error handling and validation for progress tracking.

The progress tracking should handle Redis failures gracefully and validate the categorizedCount.

+    if (categorizedCount < 0) {
+      console.error("Invalid categorizedCount:", categorizedCount);
+      return NextResponse.json(
+        { error: "Invalid categorization count" },
+        { status: 500 }
+      );
+    }
+
+    try {
       await saveCategorizationProgress({
         userId,
         pageIndex: pageIndex + 1,
         incrementCount: categorizedCount || 0,
       });
+    } catch (error) {
+      console.error("Failed to save progress:", error);
+      // Continue processing as this is non-critical
+    }
📝 Committable suggestion


Suggested change
-    await saveCategorizationProgress({
-      userId,
-      pageIndex: pageIndex + 1,
-      incrementCount: categorizedCount || 0,
-    });
+    if (categorizedCount < 0) {
+      console.error("Invalid categorizedCount:", categorizedCount);
+      return NextResponse.json(
+        { error: "Invalid categorization count" },
+        { status: 500 }
+      );
+    }
+
+    try {
+      await saveCategorizationProgress({
+        userId,
+        pageIndex: pageIndex + 1,
+        incrementCount: categorizedCount || 0,
+      });
+    } catch (error) {
+      console.error("Failed to save progress:", error);
+      // Continue processing as this is non-critical
+    }

Comment on lines 31 to 40
// Check if completed
if (pageIndex >= MAX_PAGES)
return NextResponse.json({ status: "completed" });
if (!nextPageToken) return NextResponse.json({ status: "completed" });

await triggerCategorizeBatch({
userId,
pageToken: nextPageToken,
pageIndex: pageIndex + 1,
});

⚠️ Potential issue

Handle race conditions and add error handling for batch triggering.

The current implementation has potential race conditions and lacks error handling for the next batch trigger.

     // Check if completed
-    if (pageIndex >= MAX_PAGES)
+    const isLastPage = pageIndex >= MAX_PAGES;
+    const hasMorePages = !!nextPageToken;
+    
+    if (isLastPage || !hasMorePages) {
+      try {
+        // Mark categorization as completed in your storage
+        await saveCategorizationProgress({
+          userId,
+          pageIndex: pageIndex + 1,
+          incrementCount: 0,
+          status: 'completed'
+        });
+      } catch (error) {
+        console.error("Failed to mark completion:", error);
+      }
       return NextResponse.json({ status: "completed" });
-    if (!nextPageToken) return NextResponse.json({ status: "completed" });
+    }

-    await triggerCategorizeBatch({
-      userId,
-      pageToken: nextPageToken,
-      pageIndex: pageIndex + 1,
-    });
+    try {
+      await triggerCategorizeBatch({
+        userId,
+        pageToken: nextPageToken,
+        pageIndex: pageIndex + 1,
+      });
+    } catch (error) {
+      console.error("Failed to trigger next batch:", error);
+      return NextResponse.json(
+        { error: "Failed to schedule next batch" },
+        { status: 500 }
+      );
+    }

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines 11 to 15
export const POST = withError(
verifySignatureAppRouter(async (request: Request) => {
const json = await request.json();
const body = categorizeSendersBatchSchema.parse(json);
const { userId, pageToken, pageIndex } = body;

🛠️ Refactor suggestion

Add request size limits and additional validation.

The endpoint should implement:

  1. Request body size limits to prevent memory issues
  2. Additional validation for userId format
 export const POST = withError(
   verifySignatureAppRouter(async (request: Request) => {
+    const contentLength = request.headers.get('content-length');
+    if (contentLength && parseInt(contentLength) > 1024 * 1024) {
+      return NextResponse.json(
+        { error: 'Request body too large' },
+        { status: 413 }
+      );
+    }
     const json = await request.json();
     const body = categorizeSendersBatchSchema.parse(json);
     const { userId, pageToken, pageIndex } = body;
+    
+    if (!/^[a-zA-Z0-9-_]+$/.test(userId)) {
+      return NextResponse.json(
+        { error: 'Invalid userId format' },
+        { status: 400 }
+      );
+    }
📝 Committable suggestion


Suggested change
-export const POST = withError(
-  verifySignatureAppRouter(async (request: Request) => {
-    const json = await request.json();
-    const body = categorizeSendersBatchSchema.parse(json);
-    const { userId, pageToken, pageIndex } = body;
+export const POST = withError(
+  verifySignatureAppRouter(async (request: Request) => {
+    const contentLength = request.headers.get('content-length');
+    if (contentLength && parseInt(contentLength) > 1024 * 1024) {
+      return NextResponse.json(
+        { error: 'Request body too large' },
+        { status: 413 }
+      );
+    }
+
+    const json = await request.json();
+    const body = categorizeSendersBatchSchema.parse(json);
+    const { userId, pageToken, pageIndex } = body;
+
+    if (!/^[a-zA-Z0-9-_]+$/.test(userId)) {
+      return NextResponse.json(
+        { error: 'Invalid userId format' },
+        { status: 400 }
+      );
+    }

Comment on lines 204 to 232
  for (const sender of unknownSenders) {
    const messages = sendersResult.senders.get(sender.sender);

    let previousEmails =
      messages?.map((m) => m.snippet).filter(isDefined) || [];

    if (previousEmails.length === 0) {
      previousEmails = await getPreviousEmails(gmail, sender.sender);
    }

-   const aiResults = await aiCategorizeSenders({
+   const aiResult = await aiCategorizeSender({
      user,
-     senders: sendersToCategorizeWithAi.map((sender) => ({
-       emailAddress: sender,
-       snippets:
-         sendersResult.senders.get(sender)?.map((m) => m.snippet) || [],
-     })),
+     sender: sender.sender,
+     previousEmails,
      categories,
    });

-   const results = [...categorizedSenders, ...aiResults];
-
-   for (const result of results) {
-     await saveResult(result, categories, u.id);
+   if (aiResult) {
+     await saveResult(
+       {
+         sender: sender.sender,
+         category: aiResult.category,
+       },
+       categories,
+       userId,
+     );
+     categorizedCount++;
    }
  }

🛠️ Refactor suggestion

Optimize processing of unknown senders

The sequential processing in the for loop could be slow for large datasets, as noted in the TODO comment.

Consider implementing parallel processing:

-  for (const sender of unknownSenders) {
-    const messages = sendersResult.senders.get(sender.sender);
-    let previousEmails = messages?.map((m) => m.snippet).filter(isDefined) || [];
-    if (previousEmails.length === 0) {
-      previousEmails = await getPreviousEmails(gmail, sender.sender);
-    }
-    const aiResult = await aiCategorizeSender({
-      user,
-      sender: sender.sender,
-      previousEmails,
-      categories,
-    });
-    if (aiResult) {
-      await saveResult(
-        { sender: sender.sender, category: aiResult.category },
-        categories,
-        userId,
-      );
-      categorizedCount++;
-    }
-  }
+  const processSender = async (sender: typeof unknownSenders[0]) => {
+    const messages = sendersResult.senders.get(sender.sender);
+    let previousEmails = messages?.map((m) => m.snippet).filter(isDefined) || [];
+    if (previousEmails.length === 0) {
+      previousEmails = await getPreviousEmails(gmail, sender.sender);
+    }
+    const aiResult = await aiCategorizeSender({
+      user,
+      sender: sender.sender,
+      previousEmails,
+      categories,
+    });
+    if (aiResult) {
+      await saveResult(
+        { sender: sender.sender, category: aiResult.category },
+        categories,
+        userId,
+      );
+      return true;
+    }
+    return false;
+  };
+
+  const results = await Promise.allSettled(unknownSenders.map(processSender));
+  categorizedCount += results.filter(
+    (r) => r.status === "fulfilled" && r.value
+  ).length;
📝 Committable suggestion


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (6)
apps/web/utils/redis/categorization-progress.ts (2)

12-14: Add input validation to getKey

The function should validate userId to prevent invalid Redis keys.

 function getKey(userId: string) {
+  if (!userId?.trim()) throw new Error("userId is required");
   return `categorization-progress:${userId}`;
 }

1-58: Consider architectural improvements

Several architectural aspects could be enhanced:

  1. The 5-minute expiration might be too short for bulk operations
  2. No retry mechanism for Redis operations
  3. Missing monitoring/metrics for tracking Redis operation success/failure

Consider:

  1. Making expiration time configurable via environment variable
  2. Adding retry mechanism for Redis operations
  3. Adding monitoring/metrics for Redis operations health
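A minimal way to add the retry mechanism suggested above is a generic async wrapper with exponential backoff. This is a sketch, not code from the PR; the helper name `withRetry` and the attempt/delay defaults are illustrative, and in the app it would wrap the actual Redis client calls:

```typescript
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  delayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Back off before the next attempt; skip the wait after the last one.
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, delayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}
```

Usage would look like `withRetry(() => redis.set(key, value))`, where `redis.set` stands in for whichever client call `saveCategorizationProgress` makes.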
apps/web/components/ProgressPanel.tsx (4)

7-19: Add JSDoc documentation and props validation

Consider adding JSDoc documentation for better IDE support and props validation for edge cases.

+/**
+ * A progress panel component that displays a progress bar and status message
+ * @param {Object} props
+ * @param {number} props.totalItems - Total number of items to process
+ * @param {number} props.remainingItems - Number of items remaining to process
+ * @param {string} props.inProgressText - Text to display while processing
+ * @param {string} props.completedText - Text to display when complete
+ * @param {string} props.itemLabel - Label for the items being processed
+ */
 export function ProgressPanel({
   totalItems,
   remainingItems,
   inProgressText,
   completedText,
   itemLabel,
 }: {
-  totalItems: number;
-  remainingItems: number;
+  totalItems: number & NonNegative;
+  remainingItems: number & NonNegative;
   inProgressText: string;
   completedText: string;
   itemLabel: string;
 }) {
+  if (remainingItems > totalItems) {
+    throw new Error("remainingItems cannot be greater than totalItems");
+  }

20-24: Optimize calculations with useMemo

Consider memoizing the calculated values to prevent unnecessary recalculations on re-renders.

+import { useMemo } from "react";
+
 export function ProgressPanel({
   // ... props
 }) {
-  const totalProcessed = totalItems - remainingItems;
-  const progress = (totalProcessed / totalItems) * 100;
-  const isCompleted = progress === 100;
+  const { totalProcessed, progress, isCompleted } = useMemo(() => ({
+    totalProcessed: totalItems - remainingItems,
+    progress: (totalItems - remainingItems) / totalItems * 100,
+    isCompleted: remainingItems === 0
+  }), [totalItems, remainingItems]);

   if (!totalItems) return null;

26-56: Enhance accessibility features

The component has good accessibility foundations but could be improved further.

 <div className="px-4 py-2">
+  <div role="region" aria-label="Progress Status">
     <AnimatePresence mode="wait">
       <motion.div
         key="progress"
         initial={{ opacity: 1 }}
         exit={{ opacity: 0 }}
         transition={{ duration: 0.3 }}
       >
         <ProgressBar
           value={progress}
           className="w-full"
           color={isCompleted ? "green" : "blue"}
+          aria-valuemin={0}
+          aria-valuemax={100}
+          aria-valuenow={progress}
+          aria-valuetext={`${progress.toFixed(0)}% complete`}
         />
-        <p className="mt-2 flex justify-between text-sm" aria-live="polite">
+        <p className="mt-2 flex justify-between text-sm" aria-live="assertive">
           <span
             className={cn(
               "text-muted-foreground",
               isCompleted ? "text-green-500" : "",
             )}
           >
             {isCompleted ? completedText : inProgressText}
           </span>
           <span>
             {totalProcessed} of {totalItems} {itemLabel} processed
           </span>
         </p>
       </motion.div>
     </AnimatePresence>
+  </div>
 </div>

1-57: Well-implemented progress panel component

The component is well-structured with good separation of concerns, proper animation handling, and accessibility considerations. The implementation successfully fulfills the requirements described in the PR summary for providing visual feedback during bulk categorization operations.

Consider extracting the progress calculation logic into a custom hook if this pattern is used across multiple components in the application:

function useProgress(totalItems: number, remainingItems: number) {
  return useMemo(() => ({
    totalProcessed: totalItems - remainingItems,
    progress: (totalItems - remainingItems) / totalItems * 100,
    isCompleted: remainingItems === 0
  }), [totalItems, remainingItems]);
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 8371899 and c245e6c.

📒 Files selected for processing (4)
  • apps/web/app/(app)/smart-categories/CategorizeProgress.tsx (1 hunks)
  • apps/web/app/api/user/categorize/senders/batch/route.ts (1 hunks)
  • apps/web/components/ProgressPanel.tsx (1 hunks)
  • apps/web/utils/redis/categorization-progress.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • apps/web/app/(app)/smart-categories/CategorizeProgress.tsx
  • apps/web/app/api/user/categorize/senders/batch/route.ts
🔇 Additional comments (1)
apps/web/utils/redis/categorization-progress.ts (1)

16-58: Address previous review comments

The existing review comments for these functions are still valid and should be addressed:

  1. Add error handling and type safety to getCategorizationProgress
  2. Enhance saveCategorizationProgress with validation and configuration
  3. Improve deleteCategorizationProgress with error handling
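To illustrate the kind of type-safe, error-tolerant read the first point asks for, here is a dependency-free sketch of validating whatever comes back from Redis. The real file uses a zod schema; this version mirrors the same non-negative-integer constraints with plain runtime checks, and returning `null` lets callers treat corrupt or missing data as "no progress" instead of throwing:

```typescript
type CategorizationProgress = {
  pageIndex: number;
  categorized: number;
  remaining: number;
};

// Returns null for anything that is not a well-formed progress record.
function parseProgress(raw: unknown): CategorizationProgress | null {
  if (typeof raw !== "object" || raw === null) return null;
  const { pageIndex, categorized, remaining } = raw as Record<string, unknown>;
  const isNonNegativeInt = (n: unknown): n is number =>
    typeof n === "number" && Number.isInteger(n) && n >= 0;
  if (
    !isNonNegativeInt(pageIndex) ||
    !isNonNegativeInt(categorized) ||
    !isNonNegativeInt(remaining)
  ) {
    return null;
  }
  return { pageIndex, categorized, remaining };
}
```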

Comment on lines 5 to 10
const categorizationProgressSchema = z.object({
pageIndex: z.number(),
categorized: z.number(),
remaining: z.number(),
});
type RedisCategorizationProgress = z.infer<typeof categorizationProgressSchema>;

🛠️ Refactor suggestion

Enhance schema validation rules

The schema should include additional constraints to ensure data integrity:

  1. remaining should be non-negative
  2. categorized should be non-negative
  3. Consider adding a relationship validation between fields
 const categorizationProgressSchema = z.object({
-  pageIndex: z.number(),
-  categorized: z.number(),
-  remaining: z.number(),
+  pageIndex: z.number().int().min(0),
+  categorized: z.number().int().min(0),
+  remaining: z.number().int().min(0),
 });
📝 Committable suggestion


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (7)
apps/web/utils/actions/categorize.ts (1)

120-120: Remove debug console.log statement

Consider removing the console.log statement as it's not needed in production code.

-    console.log("fastCategorizeSendersAction");
apps/web/utils/categorize/senders/categorize.ts (6)

54-58: Handle Insufficient AI Access More Effectively

Currently, the function returns an error message when the user lacks AI access. Consider throwing an exception or providing a redirect to an upgrade page to enhance the user experience and guide the user toward obtaining AI access.


75-81: Optimize Database Query by Limiting Selected Fields

In the existingSenders query, you are selecting email, name, and description from the nested category relation. If description is not used later in the code, consider removing it from the select clause to reduce data retrieval and improve performance.

Apply this diff if description is unnecessary:

       select: {
         email: true,
         category: { select: {
           name: true,
-          description: true
         } },
       },

85-85: Provide User Guidance When No Categories Are Found

The error message "No categories found" might be unclear to users. Consider providing additional guidance or a call to action, such as prompting the user to create categories or checking if they have the necessary permissions.


141-146: Avoid Redundant API Calls When Retrieving Previous Emails

If previousEmails.length === 0, the code fetches previous emails again using getPreviousEmails. This can lead to unnecessary API calls. Consider restructuring the logic to ensure that email snippets are only fetched once per sender.

Apply this diff to optimize the logic:

        let previousEmails =
          messages?.map((m) => m.snippet).filter(isDefined) || [];

-       if (previousEmails.length === 0) {
+       if (!messages || previousEmails.length === 0) {
          previousEmails = await getPreviousEmails(gmail, sender.sender);
        }

190-195: Consistent AI Access Check Across Functions

The AI access check logic is repeated in both categorizeSenders and fastCategorizeSenders. Consider abstracting this check into a helper function to promote code reuse and maintainability.

Example:

async function hasUserAiAccess(user: User & { premium?: { aiAutomationAccess: boolean } }) {
  return hasAiAccess(user.premium?.aiAutomationAccess, user.aiApiKey);
}

285-293: Cache Previous Emails to Reduce API Calls

In the getPreviousEmails function, consider implementing caching for previously fetched emails to minimize redundant API calls when the same sender is processed multiple times.

Example using a simple in-memory cache:

const emailCache = new Map<string, string[]>();

async function getPreviousEmails(gmail: gmail_v1.Gmail, sender: string) {
  if (emailCache.has(sender)) {
    return emailCache.get(sender)!;
  }
  const threadsFromSender = await getThreadsFromSender(gmail, sender, 3);
  const previousEmails = threadsFromSender
    .map((t) => t?.snippet)
    .filter(isDefined);
  emailCache.set(sender, previousEmails);
  return previousEmails;
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between c245e6c and b8747d7.

📒 Files selected for processing (4)
  • apps/web/app/api/google/webhook/process-history.ts (1 hunks)
  • apps/web/app/api/user/categorize/senders/batch/route.ts (1 hunks)
  • apps/web/utils/actions/categorize.ts (4 hunks)
  • apps/web/utils/categorize/senders/categorize.ts (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • apps/web/app/api/google/webhook/process-history.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/web/app/api/user/categorize/senders/batch/route.ts
🔇 Additional comments (5)
apps/web/utils/actions/categorize.ts (4)

20-27: LGTM! New imports are well-organized

The new imports are properly structured and align with the added functionality for bulk categorization and sender management.


Line range hint 77-93: LGTM! Function rename reflects batch processing capability

The function has been appropriately renamed to bulkCategorizeSendersAction and now uses batch processing through triggerCategorizeBatch.


121-125: LGTM! Clean implementation with proper error handling

The function follows the established pattern and properly handles authentication and error cases.


104-105: ⚠️ Potential issue

Add null check for user.accounts array

The code directly accesses user.accounts[0] without checking if the array is empty.

Apply this fix:

-  const accessToken = user.accounts[0].access_token;
-  if (!accessToken) return { error: "No access token" };
+  if (!user.accounts?.length) return { error: "No accounts found" };
+  const accessToken = user.accounts[0].access_token;
+  if (!accessToken) return { error: "No access token" };
apps/web/utils/categorize/senders/categorize.ts (1)

220-222: Handle Rate Limits When Fetching Previous Emails

Fetching previous emails for each sender in a loop may hit API rate limits. Implement batching or introduce delays between calls to prevent exceeding the API's rate limits.

[performance_issue]

Example:

// Introduce a delay between API calls
for (const sender of sendersToCategorizeWithAi) {
  await new Promise((resolve) => setTimeout(resolve, 100)); // 100ms delay
  const previousEmails = await getPreviousEmails(gmail, sender);
  sendersWithSnippets.set(sender, previousEmails);
}

Comment on lines 60 to 61
const accessToken = user.accounts[0].access_token;
if (!accessToken) return { error: "No access token" };

⚠️ Potential issue

Ensure Access Token Existence Before Usage

The code assumes that user.accounts[0] exists and contains an access_token. If user.accounts is empty or undefined, this will lead to a runtime error. Introduce a check to verify that user.accounts[0] exists before attempting to access access_token.

Apply this diff to fix the issue:

- const accessToken = user.accounts[0].access_token;
+ const accessToken = user.accounts[0]?.access_token;
  if (!accessToken) return { error: "No access token" };
📝 Committable suggestion


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (4)
apps/web/app/(app)/smart-categories/CategorizeProgress.tsx (2)

11-16: Add JSDoc documentation for the custom hook.

Consider adding JSDoc documentation to improve maintainability and provide better IDE support.

+/**
+ * Hook to manage the bulk categorization progress state.
+ * @returns {Object} Object containing the current state and setter
+ * @returns {boolean} isBulkCategorizing - Whether bulk categorization is in progress
+ * @returns {(value: boolean) => void} setIsBulkCategorizing - Function to update the state
+ */
 export function useCategorizeProgress() {

27-27: Consider adjusting the refresh interval based on the total items.

A fixed 1-second refresh interval might be too aggressive, especially with a large number of items. Consider implementing a dynamic refresh interval based on the total items count or the progress percentage.

-      refreshInterval: refresh || isBulkCategorizing ? 1_000 : undefined,
+      refreshInterval: refresh || isBulkCategorizing
+        ? Math.max(1_000, Math.min(data?.totalItems ?? 0, 5_000))
+        : undefined,
apps/web/app/(app)/smart-categories/CategorizeWithAiButton.tsx (1)

38-38: Consider resetting bulk categorizing state on completion

While the state is properly set at the start, consider resetting setIsBulkCategorizing to false in both success and error cases.

 if (isActionError(result)) {
   setIsCategorizing(false);
+  setIsBulkCategorizing(false);
   throw new Error(result.error);
 }

 setIsCategorizing(false);
+setIsBulkCategorizing(false);

Also applies to: 40-41

apps/web/utils/actions/categorize.ts (1)

33-33: Consider making the LIMIT configurable

The hardcoded LIMIT value could be moved to an environment variable or configuration file for better flexibility.
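A small helper makes the env-var approach safe against unset or malformed values. This is a sketch; the variable name `CATEGORIZE_BATCH_LIMIT` and the default of 100 are assumptions, not values from the PR:

```typescript
// Read a positive integer from the environment, falling back to a
// default when the variable is unset or not a positive integer.
function envInt(name: string, fallback: number): number {
  const raw = process.env[name];
  if (!raw) return fallback;
  const parsed = Number.parseInt(raw, 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

const LIMIT = envInt("CATEGORIZE_BATCH_LIMIT", 100);
```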

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 140b828 and fadbe8f.

📒 Files selected for processing (4)
  • apps/web/app/(app)/smart-categories/CategorizeProgress.tsx (1 hunks)
  • apps/web/app/(app)/smart-categories/CategorizeWithAiButton.tsx (3 hunks)
  • apps/web/utils/actions/categorize.ts (1 hunks)
  • apps/web/utils/redis/categorization-progress.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/web/utils/redis/categorization-progress.ts
🔇 Additional comments (9)
apps/web/app/(app)/smart-categories/CategorizeProgress.tsx (1)

24-29: ⚠️ Potential issue

Enhance data fetching implementation.

The current implementation is missing error handling and proper loading states. This was previously flagged in past reviews but hasn't been addressed.

-  const { data } = useSWR<CategorizeProgress>(
+  const { data, error, isLoading } = useSWR<CategorizeProgress>(
     "/api/user/categorize/senders/progress",
     {
       refreshInterval: refresh || isBulkCategorizing ? 1_000 : undefined,
+      onError: (err) => {
+        console.error("Failed to fetch categorization progress:", err);
+      },
     },
   );
apps/web/app/(app)/smart-categories/CategorizeWithAiButton.tsx (5)

7-7: LGTM: Import changes align with the action renaming and new features

The imports correctly reflect the renamed bulk categorization action and add necessary types for button customization and progress tracking.

Also applies to: 12-13


15-19: LGTM: Component signature enhancement adds flexibility

The addition of optional buttonProps allows for customization while maintaining backward compatibility.


24-25: LGTM: Progress tracking integration

The integration with useCategorizeProgress hook enables user feedback during the categorization process.


54-58: LGTM: Improved user feedback messages

The toast messages now provide more specific feedback based on the actual state of uncategorized senders.


66-66: LGTM: Flexible button customization

The implementation properly handles custom button content while maintaining sensible defaults.

Also applies to: 68-73

apps/web/utils/actions/categorize.ts (3)

4-21: LGTM! Well-organized imports and logging setup.

The new imports are logically grouped and the scoped logger implementation will help with debugging and monitoring.


100-115: Add input validation for senderAddress

This issue was already identified in a previous review. The suggestion to add email validation remains valid.


82-86: 🛠️ Refactor suggestion

Add error handling for queue operations

The queue publishing operation should include error handling to ensure system reliability.

-      await publishToAiCategorizeSendersQueue({
-        userId: user.id,
-        senders: uncategorizedSenders,
-      });
+      try {
+        await publishToAiCategorizeSendersQueue({
+          userId: user.id,
+          senders: uncategorizedSenders,
+        });
+      } catch (error) {
+        logger.error("Failed to publish to queue", {
+          userId: user.id,
+          sendersCount: uncategorizedSenders.length,
+          error,
+        });
+        throw error;
+      }

Likely invalid or redundant comment.

Comment on lines +60 to +70
+ for (let i = 0; i < 20; i++) {
+   const newUncategorizedSenders = await getUncategorizedSenders(i * LIMIT);

-   if (categories.length === 0) return { error: "No categories found" };
+   logger.trace("Got uncategorized senders", {
+     userId: user.id,
+     uncategorizedSenders: newUncategorizedSenders.length,
+   });

-   // pre-categorize senders with static rules
-   const categorizedSenders = preCategorizeSendersWithStaticRules(senders);
+   if (newUncategorizedSenders.length === 0) continue;
+   uncategorizedSenders.push(...newUncategorizedSenders);
+   totalUncategorizedSenders += newUncategorizedSenders.length;

🛠️ Refactor suggestion

Improve batch processing configuration and error handling

The hardcoded iteration limit (20) and the accumulation of uncategorized senders could be improved:

  1. Make the iteration limit configurable
  2. Add error handling for the getUncategorizedSenders call
  3. Consider implementing a maximum total senders limit
-    for (let i = 0; i < 20; i++) {
+    const MAX_ITERATIONS = process.env.MAX_ITERATIONS ? parseInt(process.env.MAX_ITERATIONS) : 20;
+    const MAX_TOTAL_SENDERS = process.env.MAX_TOTAL_SENDERS ? parseInt(process.env.MAX_TOTAL_SENDERS) : 2000;
+    
+    for (let i = 0; i < MAX_ITERATIONS; i++) {
+      try {
         const newUncategorizedSenders = await getUncategorizedSenders(i * LIMIT);
+        if (totalUncategorizedSenders + newUncategorizedSenders.length > MAX_TOTAL_SENDERS) {
+          logger.warn("Reached maximum total senders limit", {
+            userId: user.id,
+            totalUncategorizedSenders,
+            maxLimit: MAX_TOTAL_SENDERS,
+          });
+          break;
+        }
         
         if (newUncategorizedSenders.length === 0) continue;
         uncategorizedSenders.push(...newUncategorizedSenders);
         totalUncategorizedSenders += newUncategorizedSenders.length;
+      } catch (error) {
+        logger.error("Failed to get uncategorized senders", {
+          userId: user.id,
+          iteration: i,
+          error,
+        });
+        break;
+      }
📝 Committable suggestion


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Outside diff range and nitpick comments (8)
apps/web/utils/actions/unsubscriber.ts (2)

23-29: Enhance the comment to be more specific

The query logic with OR condition is robust, but the comment could be more descriptive.

-    // Doing this in two steps so that can check for both formats of email
+    // Check for both original and normalized email formats to handle existing entries

31-38: Add error handling for upsert operation

While the upsert logic is correct, consider adding error handling for cases where existingSender?.id is undefined.

+    if (!existingSender?.id) {
+      // Handle new newsletter creation
+      return await prisma.newsletter.create({
+        data: {
+          status,
+          email: sender,
+          user: { connect: { id: session.user.id } },
+        },
+      });
+    }
+
     return await prisma.newsletter.upsert({
       where: { id: existingSender?.id },
       create: {
         status,
         email: sender,
         user: { connect: { id: session.user.id } },
       },
       update: { status },
     });
apps/web/app/api/user/stats/new-senders/route.ts (1)

46-48: Consider implementing an email comparison service

Given this is part of a broader email standardization effort, consider:

  1. Creating a dedicated email comparison service/utility
  2. Adding memoization for frequently compared addresses
  3. Implementing validation caching if performance becomes a concern

This would help maintain consistency across the application and make it easier to update email comparison logic in the future.
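A dedicated comparison utility with memoized normalization could look like the sketch below. The normalization rules here (lowercasing and stripping a `+tag` suffix) are illustrative assumptions; the app's actual normalization may differ:

```typescript
// Cache normalized forms so repeated comparisons of the same address
// skip the string work.
const normalizeCache = new Map<string, string>();

function normalizeEmail(email: string): string {
  const cached = normalizeCache.get(email);
  if (cached !== undefined) return cached;
  const [local, domain] = email.trim().toLowerCase().split("@");
  const normalized = `${(local ?? "").split("+")[0]}@${domain ?? ""}`;
  normalizeCache.set(email, normalized);
  return normalized;
}

function sameSender(a: string, b: string): boolean {
  return normalizeEmail(a) === normalizeEmail(b);
}
```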

apps/web/app/(app)/smart-categories/page.tsx (3)

40-40: Consider enhancing error handling for Promise.all

While the functionality is correct, consider adding proper error handling for the parallel requests to improve reliability.

   const [senders, categories, user, progress] = await Promise.all([
     prisma.newsletter.findMany({
       where: { userId: session.user.id, categoryId: { not: null } },
       select: {
         id: true,
         email: true,
         category: { select: { id: true, description: true, name: true } },
       },
     }),
     getUserCategories(session.user.id),
     prisma.user.findUnique({
       where: { id: session.user.id },
       select: { autoCategorizeSenders: true },
     }),
     getCategorizationProgress({ userId: session.user.id }),
-  ]);
+  ]).catch((error): never => {
+    console.error("Failed to fetch category data:", error);
+    throw new Error("Failed to load categories. Please try again.");
+  });

Also applies to: 54-54, 57-58


81-85: Consider enhancing tooltip information

The tooltip could be more specific about the time estimate and provide more detailed information about the categorization process.

-                  text="Categorize thousands of senders in your inbox. This will take a few minutes."
+                  text="Automatically categorize thousands of senders in your inbox. This process typically takes 3-5 minutes depending on the number of emails."

87-97: Add aria-label for better accessibility

The button includes an icon but could benefit from improved accessibility.

               <CategorizeWithAiButton
                 buttonProps={{
+                  "aria-label": "Start bulk categorization of email senders",
                   children: (
                     <>
                       <SparklesIcon className="mr-2 size-4" />
                       Bulk Categorize
                     </>
                   ),
                   variant: "outline",
                 }}
               />
apps/web/utils/actions/categorize.ts (2)

63-64: Consider making the iteration limit configurable

The hardcoded iteration limit of 20 should be moved to a configuration variable for better flexibility and maintainability.

+const MAX_ITERATIONS = process.env.MAX_ITERATIONS
+  ? Number.parseInt(process.env.MAX_ITERATIONS, 10)
+  : 20;
-for (let i = 0; i < 20; i++) {
+for (let i = 0; i < MAX_ITERATIONS; i++) {

85-89: Consider implementing rate limiting for queue publishing

Publishing to the queue without rate limiting could potentially overwhelm the system with a large number of requests.

Consider implementing a rate limiting strategy or batch delay between queue publications to prevent system overload. This could be achieved using a delay between batches or by implementing a token bucket algorithm.
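As a sketch only, a token-bucket limiter gating queue publishes could look like this; the class name, capacity, and refill rate are assumptions, and timestamps are injected to keep the logic deterministic and testable:

```typescript
// Illustrative token-bucket limiter for queue publishing; not part of
// the existing codebase. Call tryTake() before each publish and back off
// (or await a delay) when it returns false.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillPerSecond: number,
    now: number = Date.now(),
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  tryTake(now: number = Date.now()): boolean {
    // Refill proportionally to elapsed time, capped at capacity.
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond,
    );
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

A batch loop would then check `bucket.tryTake()` before each `publishToQstash` call and sleep until it succeeds, smoothing out bursts.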

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between e61fdd5 and 110c24e.

📒 Files selected for processing (11)
  • apps/web/app/(app)/smart-categories/CategorizeProgress.tsx (1 hunks)
  • apps/web/app/(app)/smart-categories/page.tsx (6 hunks)
  • apps/web/app/api/google/webhook/block-unsubscribed-emails.ts (2 hunks)
  • apps/web/app/api/google/webhook/process-history.ts (12 hunks)
  • apps/web/app/api/user/categorize/senders/uncategorized/get-uncategorized-senders.ts (1 hunks)
  • apps/web/app/api/user/stats/new-senders/route.ts (2 hunks)
  • apps/web/app/api/user/stats/newsletters/route.ts (2 hunks)
  • apps/web/utils/actions/categorize.ts (2 hunks)
  • apps/web/utils/actions/unsubscriber.ts (1 hunks)
  • apps/web/utils/ai/choose-rule/run-rules.ts (3 hunks)
  • apps/web/utils/categorize/senders/categorize.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • apps/web/app/(app)/smart-categories/CategorizeProgress.tsx
  • apps/web/app/api/google/webhook/process-history.ts
  • apps/web/app/api/user/categorize/senders/uncategorized/get-uncategorized-senders.ts
  • apps/web/utils/ai/choose-rule/run-rules.ts
🔇 Additional comments (11)
apps/web/app/api/google/webhook/block-unsubscribed-emails.ts (2)

10-10: LGTM! Good practice to use centralized email parsing.

The addition of the extractEmailAddress utility function import aligns with the codebase's effort to standardize email handling across different modules.


27-27: Verify query performance with the new OR condition.

The enhanced matching logic that checks both raw and extracted email formats is more robust. However, let's verify the query performance, especially for tables with a large number of records.

✅ Verification successful

Based on the shell script results, I can now generate the final response:

The OR condition query is safe to use with the current schema

The Newsletter model has a simple structure with no indexes on the email field. However, this is acceptable because:

  • The table scope is limited to unsubscribed emails only
  • The OR condition operates on a single field (email)
  • The query is used in a webhook context where sub-second performance isn't critical
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if there are appropriate indexes on the email field

# Look for index definitions in schema files
fd -e prisma "schema.prisma" -x cat {} | rg -A 5 -B 5 "model Newsletter"

# Check for any performance-related comments or issues
rg -g "!node_modules" -g "!.git" -A 3 "Newsletter.*performance|index"

Length of output: 16573

apps/web/utils/actions/unsubscriber.ts (2)

7-7: LGTM!

The import of extractEmailAddress aligns with the improved email handling strategy.


11-14: LGTM!

The destructured parameters improve code readability while maintaining type safety.

apps/web/app/api/user/stats/new-senders/route.ts (1)

13-13: LGTM: Clean import addition

The import of extractEmailAddress is well-placed and follows the project's import conventions.

apps/web/app/api/user/stats/newsletters/route.ts (1)

12-12: LGTM: Clean import addition

The import statement follows the project's conventions and is appropriately placed with other imports.

apps/web/app/(app)/smart-categories/page.tsx (2)

4-4: LGTM: New imports are well-organized and purposeful

The added imports align with the new functionality for progress tracking and enhanced UI components.

Also applies to: 28-30


66-66: Verify progress component refresh behavior

The refresh prop is tied directly to the existence of progress data. Consider if this is the optimal refresh trigger.

✅ Verification successful

The refresh prop usage is appropriate and well-implemented

The refresh prop is correctly used as part of the SWR configuration to control the polling interval. When either refresh is true or isBulkCategorizing is active, it polls every second (1000ms) to update the progress. This implementation ensures efficient progress tracking during categorization.

  • The component uses the refresh prop alongside the bulk categorization state to determine when to poll for updates
  • It automatically stops polling when both conditions are false
  • There's a proper cleanup mechanism that stops tracking after completion (3-second delay)
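The polling decision described above can be expressed as a pure helper; `getRefreshInterval` is an illustrative name rather than the actual code, but it captures the verified behavior (SWR disables polling when `refreshInterval` is 0):

```typescript
// Sketch of the polling logic described above; the helper name is
// illustrative. Returns the SWR refreshInterval in milliseconds.
function getRefreshInterval(
  refresh: boolean,
  isBulkCategorizing: boolean,
): number {
  // Poll every second while categorization is in flight; 0 disables polling.
  return refresh || isBulkCategorizing ? 1000 : 0;
}
```

The result would be passed to `useSWR` as the `refreshInterval` option, so polling stops automatically once both conditions are false.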
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for CategorizeSendersProgress implementation to verify refresh prop usage
ast-grep --pattern 'export function CategorizeSendersProgress({ refresh }: { refresh: boolean }) {
  $$$
}'

Length of output: 109


Script:

#!/bin/bash
# Search for CategorizeSendersProgress component with a more flexible pattern
ast-grep --pattern 'function CategorizeSendersProgress($_) {
  $$$
}'

# Also search for the file containing this component
fd CategorizeSendersProgress

Length of output: 3993

apps/web/utils/categorize/senders/categorize.ts (1)

1-21: LGTM! Well-organized imports and logging setup.

The imports are logically grouped and the use of a scoped logger follows good practices for maintainable logging.

apps/web/utils/actions/categorize.ts (2)

4-22: LGTM! Well-organized imports and logger setup.

The new imports are properly structured and the scoped logger implementation will help with debugging and tracing.


134-139: LGTM! Robust sender identification implementation.

The use of the composite key (email_userId) and email address extraction improves the reliability of sender categorization.

Comment on lines 21 to 22
const sender = extractEmailAddress(newsletterEmail);


💡 Codebase verification

Email extraction implementation is incomplete and may cause issues

The extractEmailAddress function has a critical flaw in its implementation. When an email is provided without angle brackets (e.g., "john.doe@gmail.com"), it returns an empty string instead of the email address. This could lead to data inconsistency as seen in its multiple usages across the codebase:

  • Used for sender identification in apps/web/app/api/google/webhook/process-history.ts
  • Used for email matching in apps/web/app/api/user/stats/newsletters/route.ts
  • Used for categorization in apps/web/utils/categorize/senders/categorize.ts

The function should be modified to handle both formats:

export function extractEmailAddress(email: string): string {
  const match = email.match(/<(.*)>/);
  return match ? match[1] : email;
}
🔗 Analysis chain

Verify email extraction behavior across the codebase

The introduction of extractEmailAddress standardizes email handling, but we should verify its behavior with different email formats.


Let me analyze the implementation and usage of extractEmailAddress to verify its behavior.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check usage and implementation of extractEmailAddress

# Find the implementation
echo "=== extractEmailAddress Implementation ==="
ast-grep --pattern 'export function extractEmailAddress($_) {
  $$$
}'

# Find all usages to ensure consistent behavior
echo "=== extractEmailAddress Usage ==="
rg -A 2 'extractEmailAddress\(' 

Length of output: 3998


Script:

#!/bin/bash
# Get the full implementation of email.ts to understand email extraction logic
echo "=== Email Utility Implementation ==="
cat apps/web/utils/email.ts

# Check for any test files related to email extraction
echo -e "\n=== Email-related Test Files ==="
fd -e test.ts -e spec.ts email

Length of output: 1479

Comment on lines 46 to 48
status: userNewsletters?.find(
(n) => extractEmailAddress(n.email) === extractEmailAddress(email.from),
)?.status,

💡 Codebase verification

Email normalization has critical issues

The current implementation of extractEmailAddress has serious flaws:

  1. It only extracts emails within angle brackets <...>
  2. Returns empty string for plain email addresses without brackets
  3. No null/undefined checks
  4. No validation of email format

This could lead to:

  • False negatives in email comparisons when plain addresses are used
  • Silent failures with empty string returns
  • Potential runtime errors with null/undefined inputs

Locations needing attention:

  • apps/web/utils/email.ts: The extractEmailAddress function needs to be fixed to handle all email formats
  • All 13 files using this function (found in the search results) will be affected by this bug
🔗 Analysis chain

Verify email normalization edge cases

The use of extractEmailAddress improves email comparison robustness, but we should verify:

  1. Error handling for invalid email formats
  2. Performance impact on large datasets
  3. Handling of null/undefined values

Let's check the implementation and usage of extractEmailAddress:


Let me gather more information about the extractEmailAddress function implementation since it wasn't found in the previous search.


Let's check the implementation of the extractEmailAddress function since we found its location in apps/web/utils/email.ts.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check extractEmailAddress implementation
echo "Checking email utility implementation:"
rg -A 10 "extractEmailAddress.*=" --type ts

# Check for error handling patterns
echo "Checking error handling patterns:"
rg "throw.*Error.*email" --type ts

# Look for similar email comparisons that might need the same treatment
echo "Checking for similar email comparisons:"
rg "===.*@.*\." --type ts

Length of output: 2595


Script:

#!/bin/bash
# Search for extractEmailAddress function definition
ast-grep --pattern 'function extractEmailAddress($_) {
  $$$
}'

# Search for extractEmailAddress const/let/var definition
ast-grep --pattern 'const extractEmailAddress = $_'

# Search for any file containing extractEmailAddress in its name
fd extractEmailAddress

# Broader search for the function name
rg "extractEmailAddress" --type ts

Length of output: 3438


Script:

#!/bin/bash
# Check the implementation of extractEmailAddress
echo "Checking extractEmailAddress implementation:"
cat apps/web/utils/email.ts

# Check for tests
echo "Checking for email utility tests:"
fd -e test.ts -e spec.ts email

Length of output: 1491

Comment on lines 91 to 93
status: userNewsletters?.find(
(n) => extractEmailAddress(n.email) === extractEmailAddress(email.from),
)?.status,

🛠️ Refactor suggestion

Consider optimizing email comparison and adding error handling

While the use of extractEmailAddress improves email matching accuracy, the current implementation could be optimized and made more robust.

Consider applying these improvements:

 newsletters = newsletterCounts.data.map((email) => {
+  const normalizedFromEmail = extractEmailAddress(email.from);
+  if (!normalizedFromEmail) {
+    console.warn(`Invalid email address: ${email.from}`);
+  }
   return {
     name: email.from,
     value: email.count,
     inboxEmails: email.inboxEmails,
     readEmails: email.readEmails,
     lastUnsubscribeLink: email.lastUnsubscribeLink,
     autoArchived: findAutoArchiveFilter(autoArchiveFilters, email.from),
     status: userNewsletters?.find(
-      (n) => extractEmailAddress(n.email) === extractEmailAddress(email.from),
+      (n) => {
+        const normalizedEmail = extractEmailAddress(n.email);
+        return normalizedEmail && normalizedEmail === normalizedFromEmail;
+      },
     )?.status,
   };
 });

This refactoring:

  1. Extracts the email once outside the find loop for better performance
  2. Adds null checks to handle invalid email addresses
  3. Includes logging for debugging invalid emails
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
status: userNewsletters?.find(
(n) => extractEmailAddress(n.email) === extractEmailAddress(email.from),
)?.status,
newsletters = newsletterCounts.data.map((email) => {
const normalizedFromEmail = extractEmailAddress(email.from);
if (!normalizedFromEmail) {
console.warn(`Invalid email address: ${email.from}`);
}
return {
name: email.from,
value: email.count,
inboxEmails: email.inboxEmails,
readEmails: email.readEmails,
lastUnsubscribeLink: email.lastUnsubscribeLink,
autoArchived: findAutoArchiveFilter(autoArchiveFilters, email.from),
status: userNewsletters?.find(
(n) => {
const normalizedEmail = extractEmailAddress(n.email);
return normalizedEmail && normalizedEmail === normalizedFromEmail;
},
)?.status,
};
});

💡 Codebase verification

Email normalization has potential issues that need attention

The current implementation of extractEmailAddress has a critical flaw that could lead to inconsistent email matching:

  1. It only extracts emails within angle brackets <...> and returns an empty string for plain email addresses
  2. This means extractEmailAddress("john@example.com") returns "" while extractEmailAddress("John <john@example.com>") returns "john@example.com"

This needs to be fixed in apps/web/utils/email.ts to handle both formats:

  • Emails with angle brackets: "Name <email@domain.com>"
  • Plain email addresses: "email@domain.com"
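A defensive version of the fix described above might look like this; the null/undefined guard is an added assumption beyond the PR's suggested one-liner:

```typescript
// Sketch: handles "Name <email@domain.com>", plain "email@domain.com",
// and null/undefined input. The null guard is an assumption, not in the PR.
function extractEmailAddress(email: string | null | undefined): string {
  if (!email) return "";
  const match = email.match(/<([^>]*)>/);
  return (match ? match[1] : email).trim();
}
```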
🔗 Analysis chain

Verify consistent email normalization across the codebase

Let's ensure that extractEmailAddress is used consistently for email comparisons throughout the application.


Let me analyze the results to verify the email normalization consistency:

  1. The extractEmailAddress function is consistently imported from @/utils/email across all files.
  2. It's used in all critical email comparison scenarios:
    • Newsletter status matching
    • Sender categorization
    • Unsubscription handling
    • Webhook processing
    • Message grouping

Let me check the actual implementation of extractEmailAddress to ensure it's robust:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check usage patterns of extractEmailAddress and email comparisons

# Find direct email comparisons that might need normalization
echo "Finding potential email comparisons that might need normalization:"
rg -n "=== .*@.*\." --type ts

# Find current usage of extractEmailAddress
echo -e "\nCurrent usage of extractEmailAddress:"
rg -n "extractEmailAddress" --type ts

# Find other email comparison patterns
echo -e "\nOther email comparison patterns:"
rg -n "email.*===|===.*email" --type ts

Length of output: 5023


Script:

#!/bin/bash
# Get the implementation of extractEmailAddress
cat apps/web/utils/email.ts

Length of output: 1282

Comment on lines 147 to 176
export async function categorizeWithAi({
user,
sendersWithSnippets,
categories,
}: {
user: UserEmailWithAI;
sendersWithSnippets: Map<string, string[]>;
categories: Pick<Category, "name" | "description">[];
}) {
const categorizedSenders = preCategorizeSendersWithStaticRules(
Array.from(sendersWithSnippets.keys()),
);

const sendersToCategorizeWithAi = categorizedSenders
.filter((sender) => !sender.category)
.map((sender) => sender.sender);

console.log(
`Found ${sendersToCategorizeWithAi.length} senders to categorize with AI`,
);

const aiResults = await aiCategorizeSenders({
user,
senders: sendersToCategorizeWithAi.map((sender) => ({
emailAddress: sender,
snippets: sendersWithSnippets.get(sender) || [],
})),
categories,
});

return [...categorizedSenders, ...aiResults];
}

🛠️ Refactor suggestion

Prevent potential memory leaks in sendersWithSnippets Map

The sendersWithSnippets Map could grow unbounded as it stores email snippets for all senders. Consider implementing cleanup or using a bounded cache.

Consider using a LRU cache or implementing cleanup:

+import LRU from 'lru-cache';

+const CACHE_OPTIONS = {
+  max: 500, // Maximum number of items
+  maxSize: 5000000, // Maximum cache size (5MB)
+  sizeCalculation: (snippets: string[]) => 
+    snippets.reduce((sum, s) => sum + s.length * 2, 0), // Approximate size in bytes
+};

 export async function categorizeWithAi({
   user,
-  sendersWithSnippets,
+  sendersWithSnippets,
   categories,
 }: {
   user: UserEmailWithAI;
-  sendersWithSnippets: Map<string, string[]>;
+  sendersWithSnippets: LRU<string, string[]>;
   categories: Pick<Category, "name" | "description">[];
 })

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines 23 to 54
export async function categorizeSender(
senderAddress: string,
user: Pick<User, "id" | "email"> & UserAIFields,
gmail: gmail_v1.Gmail,
) {
const categories = await getUserCategories(user.id);

if (categories.length === 0) return { categoryId: undefined };

const previousEmails = await getPreviousEmails(gmail, senderAddress);

const aiResult = await aiCategorizeSender({
user,
sender: senderAddress,
previousEmails,
categories,
});

if (aiResult) {
const { newsletter } = await updateSenderCategory({
sender: senderAddress,
categories,
categoryName: aiResult.category,
userId: user.id,
});

return { categoryId: newsletter.categoryId };
}

logger.error(`No AI result for sender: ${senderAddress}`);

return { categoryId: undefined };
}

🛠️ Refactor suggestion

Enhance error handling with detailed error types

The function could benefit from more detailed error types to distinguish between different failure scenarios (no categories, AI failure, database errors, etc.).

Consider implementing a discriminated union type for the return value:

+type CategorizeSenderResult =
+  | { success: true; categoryId: string }
+  | { success: false; error: 'NO_CATEGORIES' | 'AI_FAILED' | 'UPDATE_FAILED' };

-export async function categorizeSender(
+export async function categorizeSender(
   senderAddress: string,
   user: Pick<User, "id" | "email"> & UserAIFields,
   gmail: gmail_v1.Gmail,
-) {
+): Promise<CategorizeSenderResult> {
   const categories = await getUserCategories(user.id);

-  if (categories.length === 0) return { categoryId: undefined };
+  if (categories.length === 0) return { success: false, error: 'NO_CATEGORIES' };

   // ... rest of the function ...

-  return { categoryId: undefined };
+  return { success: false, error: 'AI_FAILED' };
 }

Committable suggestion skipped: line range outside the PR's diff.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
apps/web/utils/actions/client.ts (1)

13-15: Consider updating the success message for clarity.

The success message "Auto archive disabled!" could be more specific about the filter deletion operation.

Consider this change:

-  handleActionResult(result, "Auto archive disabled!");
+  handleActionResult(result, "Filter removed successfully!");
apps/web/app/(app)/bulk-unsubscribe/hooks.ts (1)

260-266: Optimize Promise.resolve() usage

Creating a new Promise.resolve() on every render is unnecessary and could impact performance.

Consider memoizing the callback:

+  const noopRefetchPremium = React.useCallback(
+    () => Promise.resolve(),
+    []
+  );
   const { onDisableAutoArchive } = useAutoArchive({
     item,
-    hasUnsubscribeAccess: true,
+    hasUnsubscribeAccess,
     mutate,
     posthog,
-    refetchPremium: () => Promise.resolve(),
+    refetchPremium: noopRefetchPremium,
   });
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 110c24e and 587c0de.

📒 Files selected for processing (3)
  • apps/web/app/(app)/bulk-unsubscribe/BulkActions.tsx (0 hunks)
  • apps/web/app/(app)/bulk-unsubscribe/hooks.ts (3 hunks)
  • apps/web/utils/actions/client.ts (1 hunks)
💤 Files with no reviewable changes (1)
  • apps/web/app/(app)/bulk-unsubscribe/BulkActions.tsx
🔇 Additional comments (2)
apps/web/utils/actions/client.ts (1)

13-14: LGTM! Parameter rename improves clarity.

The parameter rename from gmailLabelId to filterId better reflects the actual purpose of the function.

Let's verify that all callers have been updated to use the new parameter name:

✅ Verification successful

Parameter rename is consistently used across the codebase

The verification shows that all callers of onDeleteFilter are using the new parameter name correctly:

  • In apps/web/app/(app)/bulk-unsubscribe/hooks.ts, it's called with item.autoArchived?.id
  • The implementation in apps/web/utils/actions/client.ts uses the new filterId parameter
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find all usages of onDeleteFilter to ensure they've been updated
# to use filterId instead of gmailLabelId

# Search for function calls
rg -A 2 "onDeleteFilter\(" 

Length of output: 526

apps/web/app/(app)/bulk-unsubscribe/hooks.ts (1)

184-186: LGTM: Improved error handling for auto-archive disable

The added null check for item.autoArchived?.id prevents potential runtime errors when disabling auto-archive functionality.

Comment on lines +260 to +266
const { onDisableAutoArchive } = useAutoArchive({
item,
hasUnsubscribeAccess: true,
mutate,
posthog,
refetchPremium: () => Promise.resolve(),
});

⚠️ Potential issue

Security concern: Bypassing access control

The hook is initialized with hasUnsubscribeAccess: true, which bypasses access control checks. This could allow unauthorized users to perform auto-archive operations.

Consider using the actual access control value:

-    hasUnsubscribeAccess: true,
+    hasUnsubscribeAccess,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const { onDisableAutoArchive } = useAutoArchive({
item,
hasUnsubscribeAccess: true,
mutate,
posthog,
refetchPremium: () => Promise.resolve(),
});
const { onDisableAutoArchive } = useAutoArchive({
item,
hasUnsubscribeAccess,
mutate,
posthog,
refetchPremium: () => Promise.resolve(),
});


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (6)
apps/web/app/api/user/categorize/senders/uncategorized/get-uncategorized-senders.ts (2)

4-4: Add documentation for the MAX_ITERATIONS constant

Consider adding a comment explaining the purpose of this constant and why 200 was chosen as the limit.

+// Maximum number of iterations to prevent infinite loops when fetching senders
+// Each iteration fetches {limit} senders, so with limit=100, this allows up to 20,000 senders
 const MAX_ITERATIONS = 200;

39-41: Consider batching the filter operation for large datasets

When dealing with large datasets, consider processing the filter operation in chunks to optimize memory usage.

-    uncategorizedSenders = allSenders.filter(
-      (email) => !existingSenderEmails.has(email),
-    );
+    const CHUNK_SIZE = 1000;
+    uncategorizedSenders = [];
+    for (let i = 0; i < allSenders.length; i += CHUNK_SIZE) {
+      const chunk = allSenders.slice(i, i + CHUNK_SIZE);
+      uncategorizedSenders.push(
+        ...chunk.filter((email) => !existingSenderEmails.has(email))
+      );
+    }
apps/web/utils/actions/categorize-email.ts (1)

17-56: Consider splitting the implementation into smaller, focused functions

The current implementation handles multiple responsibilities in a single function. Consider extracting the following into separate functions:

  1. Input validation and user verification
  2. Email content processing
  3. Categorization logic

This would improve maintainability and testability.
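One hypothetical shape for that split (all names and the toy categorization rule here are illustrative stand-ins, not the PR's code):

```typescript
// Illustrative decomposition only; validation, content processing, and
// categorization are stand-ins for the real implementations.
type CategorizeResult = { sender: string; category: string | null };

function validateSender(sender: string): string {
  const trimmed = sender.trim();
  if (!trimmed.includes("@")) throw new Error(`Invalid sender: ${sender}`);
  return trimmed;
}

function prepareContent(snippets: string[]): string {
  // Cap the combined content to keep the downstream prompt bounded.
  return snippets.join("\n").slice(0, 500);
}

function categorizeFromContent(
  sender: string,
  content: string,
): CategorizeResult {
  // Stand-in rule; the real code would call the AI categorizer here.
  const isNewsletter = content.toLowerCase().includes("unsubscribe");
  return { sender, category: isNewsletter ? "Newsletter" : null };
}

function categorizeEmailAction(
  sender: string,
  snippets: string[],
): CategorizeResult {
  const valid = validateSender(sender);
  return categorizeFromContent(valid, prepareContent(snippets));
}
```

Each step can then be unit-tested in isolation, which is the main payoff of the split.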

apps/web/utils/categorize/senders/categorize.ts (3)

82-82: Implement or remove the commented color generation code

The commented code suggests that category colors were planned but not implemented. Either implement the feature or remove the comment to maintain clean code.

If you'd like to implement the color feature, I can help generate the implementation.


105-105: Address the TODO comment regarding missing categories

The TODO raises a valid concern about handling cases where users don't have all categories set up.

Would you like me to help implement a solution that creates default categories for new users or handles missing categories gracefully?


141-172: Add progress tracking for bulk categorization

For better user experience and monitoring, consider adding progress tracking for the AI categorization process.

Apply this diff:

 export async function categorizeWithAi({
   user,
   sendersWithSnippets,
   categories,
+  onProgress,
 }: {
   user: UserEmailWithAI;
   sendersWithSnippets: Map<string, string[]>;
   categories: Pick<Category, "name" | "description">[];
+  onProgress?: (progress: { total: number; completed: number }) => void;
 }) {
   const categorizedSenders = preCategorizeSendersWithStaticRules(
     Array.from(sendersWithSnippets.keys()),
   );

   const sendersToCategorizeWithAi = categorizedSenders
     .filter((sender) => !sender.category)
     .map((sender) => sender.sender);

   logger.info("Found senders to categorize with AI", {
     count: sendersToCategorizeWithAi.length,
   });

+  const total = sendersToCategorizeWithAi.length;
+  let completed = 0;
+  onProgress?.({ total, completed });

   const aiResults = await aiCategorizeSenders({
     user,
     senders: sendersToCategorizeWithAi.map((sender) => ({
       emailAddress: sender,
       snippets: sendersWithSnippets.get(sender) || [],
     })),
     categories,
+    onSenderCategorized: () => {
+      completed++;
+      onProgress?.({ total, completed });
+    },
   });

   return [...categorizedSenders, ...aiResults];
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between f856c0f and b62a30b.

📒 Files selected for processing (5)
  • apps/web/app/api/user/categorize/senders/batch/handle-batch.ts (1 hunks)
  • apps/web/app/api/user/categorize/senders/uncategorized/get-uncategorized-senders.ts (1 hunks)
  • apps/web/utils/actions/categorize-email.ts (1 hunks)
  • apps/web/utils/ai/categorize-sender/ai-categorize-senders.ts (2 hunks)
  • apps/web/utils/categorize/senders/categorize.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • apps/web/app/api/user/categorize/senders/batch/handle-batch.ts
  • apps/web/utils/ai/categorize-sender/ai-categorize-senders.ts
🔇 Additional comments (8)
apps/web/app/api/user/categorize/senders/uncategorized/get-uncategorized-senders.ts (3)

6-16: LGTM! Well-structured function signature with proper typing

The function parameters are well-defined with appropriate default values and TypeScript types.


51-55: LGTM! Clear return structure with helpful documentation

The return object is well-structured, and the comment explains the purpose of the optional nextOffset field.


6-16: Verify integration with CategorizeWithAiButton component

Let's ensure the function interface matches its usage in the UI component.

apps/web/utils/actions/categorize-email.ts (5)

1-16: LGTM! Well-organized imports with clear separation of concerns.

The imports are properly structured and all necessary for the implementation.


51-52: Potential null reference with user email

The user email access could be unsafe if the email field is undefined.

This issue was previously identified. Please refer to the existing suggestion to handle this more safely:

-      { email: u.email },
+      { email: u.email ?? (() => { throw new Error("User email is required for categorization"); })() },

35-38: Add error handling for email processing operations

This issue was previously identified. The email processing operations still lack proper error handling. Please refer to the existing suggestion for implementing comprehensive error handling for content processing, unsubscribe link extraction, and previous email checking.


54-54: Improve response handling and type safety

This issue was previously identified. Please refer to the existing suggestion for implementing more comprehensive response handling with proper error cases and additional metadata.


45-47: Verify AI configuration compatibility

The AI configuration (apiKey, provider, model) is passed directly from user preferences. Ensure these values are validated for compatibility.

✅ Verification successful

Based on the search results, I can now provide a final response as there is clear evidence of AI configuration validation in the codebase:

AI configuration is properly validated

The codebase has robust validation for AI configuration:

  • AI provider and model combinations are validated through Zod schema in apps/web/app/api/user/settings/validation.ts
  • The getModel function in apps/web/utils/llms/index.ts handles provider-specific model defaults and compatibility
  • API key requirements are validated based on provider selection (required for OpenAI, optional for Anthropic)
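The rule those bullets describe can be sketched in plain TypeScript (the provider names are illustrative; the real schema lives in `apps/web/app/api/user/settings/validation.ts`):

```typescript
type Provider = "openai" | "anthropic";

// Mirror of the described rule: an API key is required for OpenAI,
// optional for Anthropic. Returns a list of validation errors.
function validateAiConfig(provider: Provider, apiKey: string | null): string[] {
  const errors: string[] = [];
  if (provider === "openai" && !apiKey) {
    errors.push("API key is required when the provider is OpenAI");
  }
  return errors;
}
```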
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for AI provider and model validation logic
rg -A 5 "aiProvider|aiModel" --type typescript

Length of output: 83


Script:

#!/bin/bash
# Search for AI provider and model validation logic without file type restriction
rg -A 5 "aiProvider|aiModel"

Length of output: 22836


Script:

#!/bin/bash
# Search for validation or configuration related to AI settings
rg -A 5 "(validateAI|checkAI|configureAI)"

Length of output: 44


Script:

#!/bin/bash
# Find TypeScript/JavaScript files that might contain AI configuration
fd -e ts -e js --exec rg -l "aiProvider|aiModel|aiApiKey" {}

Length of output: 1114

Comment on lines +18 to +50
export async function categorizeSender(
  senderAddress: string,
  user: Pick<User, "id" | "email"> & UserAIFields,
  gmail: gmail_v1.Gmail,
  userCategories?: Pick<Category, "id" | "name" | "description">[],
) {
  const categories = userCategories || (await getUserCategories(user.id));
  if (categories.length === 0) return { categoryId: undefined };

  const previousEmails = await getPreviousEmails(gmail, senderAddress);

  const aiResult = await aiCategorizeSender({
    user,
    sender: senderAddress,
    previousEmails,
    categories,
  });

  if (aiResult) {
    const { newsletter } = await updateSenderCategory({
      sender: senderAddress,
      categories,
      categoryName: aiResult.category,
      userId: user.id,
    });

    return { categoryId: newsletter.categoryId };
  }

  logger.error(`No AI result for sender: ${senderAddress}`);

  return { categoryId: undefined };
}
Contributor

🛠️ Refactor suggestion

Add input validation for senderAddress parameter

The function should validate the email address format before processing.

Apply this diff:

 export async function categorizeSender(
   senderAddress: string,
   user: Pick<User, "id" | "email"> & UserAIFields,
   gmail: gmail_v1.Gmail,
   userCategories?: Pick<Category, "id" | "name" | "description">[],
 ) {
+  if (!senderAddress || !senderAddress.includes('@')) {
+    logger.error(`Invalid sender address: ${senderAddress}`);
+    return { categoryId: undefined };
+  }
+
   const categories = userCategories || (await getUserCategories(user.id));
   if (categories.length === 0) return { categoryId: undefined };
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

 export async function categorizeSender(
   senderAddress: string,
   user: Pick<User, "id" | "email"> & UserAIFields,
   gmail: gmail_v1.Gmail,
   userCategories?: Pick<Category, "id" | "name" | "description">[],
 ) {
+  if (!senderAddress || !senderAddress.includes('@')) {
+    logger.error(`Invalid sender address: ${senderAddress}`);
+    return { categoryId: undefined };
+  }
+
   const categories = userCategories || (await getUserCategories(user.id));
   if (categories.length === 0) return { categoryId: undefined };

   const previousEmails = await getPreviousEmails(gmail, senderAddress);

   const aiResult = await aiCategorizeSender({
     user,
     sender: senderAddress,
     previousEmails,
     categories,
   });

   if (aiResult) {
     const { newsletter } = await updateSenderCategory({
       sender: senderAddress,
       categories,
       categoryName: aiResult.category,
       userId: user.id,
     });

     return { categoryId: newsletter.categoryId };
   }

   logger.error(`No AI result for sender: ${senderAddress}`);

   return { categoryId: undefined };
 }
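The `includes('@')` guard in the suggestion is deliberately loose; a slightly stricter check is possible without reaching for full RFC 5322 parsing (the regex below is illustrative, not exhaustive):

```typescript
// Reject empty strings and anything without a plausible local@domain.tld shape.
function isPlausibleEmail(address: string): boolean {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(address);
}
```

In the action this would replace the `includes('@')` test while keeping the same early-return behavior on failure.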

@elie222 elie222 merged commit 1f2855c into main Nov 25, 2024
@elie222 elie222 deleted the bulk-categorize branch November 25, 2024 21:36
@coderabbitai coderabbitai bot mentioned this pull request Aug 5, 2025
@coderabbitai coderabbitai bot mentioned this pull request Aug 25, 2025
@coderabbitai coderabbitai bot mentioned this pull request Oct 17, 2025