Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: sanitize input data #1902

Merged
merged 8 commits into from
Oct 29, 2024
Merged

feat: sanitize input data #1902

merged 8 commits into from
Oct 29, 2024

Conversation

saikumarrs
Copy link
Member

@saikumarrs saikumarrs commented Oct 23, 2024

PR Description

All the user inputs to the SDK are sanitized to remove BigInt and circular references. They create problems while converting the data to a string.

This makes a lot of things easy down the line of the event data processing.

I've also fixed validations in a few user APIs.

Linear task (optional)

https://linear.app/rudderstack/issue/SDK-2254/investigate-deep-cloning-object-utilities-js-sdk

Cross Browser Tests

Please confirm you have tested for the following browsers:

  • Chrome
  • Firefox
  • IE11

Sanity Suite

  • All sanity suite test cases pass locally

Security

  • The code changed/added as part of this pull request won't create any security issues with how the software is being used.

Summary by CodeRabbit

  • New Features

    • Introduced new utility functions for validating write keys and data plane URLs.
    • Added a new function for sanitizing values in JSON data.
    • Enhanced error handling in the UserSessionManager and RudderAnalytics classes.
    • Updated constants to exclude sensitive data like authToken from metadata.
  • Bug Fixes

    • Streamlined error handling by removing redundant validation tests and updating error messages.
  • Refactor

    • Replaced deprecated JSON stringification methods with a new unified method across various components.
    • Removed unnecessary logger parameters from several functions to simplify code.
  • Documentation

    • Updated test cases to reflect changes in validation and error handling logic.

@saikumarrs saikumarrs self-assigned this Oct 23, 2024
Copy link
Contributor

coderabbitai bot commented Oct 23, 2024

📝 Walkthrough

Walkthrough

This pull request introduces significant changes across various modules, primarily focusing on updating JSON utility functions and enhancing error handling. The stringifyWithoutCircular function has been replaced with stringifyData, which modifies how JSON data is processed, particularly concerning null values and key exclusions. New functions like getSanitizedValue and isBigInt have been added, while several test cases have been adjusted or removed to reflect these changes. Additionally, constants related to logging and error messages have been updated to improve clarity and functionality.

Changes

File Path Change Summary
packages/analytics-js-common/__tests__/utilities/json.test.ts Updated test cases to transition from stringifyWithoutCircular to stringifyData, adding scenarios for null value handling and key exclusions. Introduced tests for getSanitizedValue.
packages/analytics-js-common/src/utilities/checks.ts Added isBigInt function to check for bigint type, exported new function.
packages/analytics-js-common/src/utilities/json.ts Replaced stringifyWithoutCircular with stringifyData, added getSanitizedValue and traverseWithThis, removed getCircularReplacer. Updated export statements.
packages/analytics-js-cookies/__tests__/cookieUtilities.test.ts Removed tests for getEncryptedValueBrowser and getEncryptedValue related to non-stringifiable inputs.
packages/analytics-js-cookies/src/cookiesUtilities.ts Updated getEncryptedValueInternal to use stringifyData instead of stringifyWithoutCircular.
packages/analytics-js-plugins/__tests__/bugsnag/utils.test.ts Removed test for getAppStateForMetadata checking BigInt conversion.
packages/analytics-js-plugins/__tests__/errorReporting/utils.test.ts Removed test for getAppStateForMetadata related to BigInt signal conversion.
packages/analytics-js-plugins/__tests__/utilities/queue.test.ts Removed tests for getDeliveryPayload regarding BigInt handling.
packages/analytics-js-plugins/__tests__/xhrQueue/utilities.test.ts Removed tests for getBatchDeliveryPayload related to circular dependencies and BigInt.
packages/analytics-js-plugins/src/beaconQueue/utilities.ts Updated getBatchDeliveryPayload to use stringifyData.
packages/analytics-js-plugins/src/bugsnag/utils.ts Updated getAppStateForMetadata to use stringifyData.
packages/analytics-js-plugins/src/errorReporting/utils.ts Updated getAppStateForMetadata and getErrorDeliveryPayload to use stringifyData.
packages/analytics-js-plugins/src/utilities/eventsDelivery.ts Removed logger parameter from getDeliveryPayload and getDMTDeliveryPayload, replaced stringifyWithoutCircular with stringifyData.
packages/analytics-js-plugins/src/xhrQueue/index.ts Removed logger parameter from getRequestInfo and getBatchDeliveryPayload calls.
packages/analytics-js-plugins/src/xhrQueue/utilities.ts Removed logger parameter from getBatchDeliveryPayload and getRequestInfo.
packages/analytics-js/__tests__/services/HttpClient/HttpClient.test.ts Removed test for getAsyncData handling non-stringifiable values.
packages/analytics-js/src/components/userSessionManager/UserSessionManager.ts Updated getEncryptedCookieData and makeRequestToSetCookie to use stringifyData. Enhanced error handling in setServerSideCookies.
packages/analytics-js/src/services/HttpClient/HttpClient.ts Removed this.logger parameter from xhrRequest function call.
packages/analytics-js/src/services/HttpClient/xhr/xhrRequestHandler.ts Removed logger parameter from xhrRequest signature and updated payload preparation logic.
packages/analytics-js/src/services/StoreManager/Store.ts Replaced stringifyWithoutCircular with stringifyData in set method.
packages/analytics-js-common/src/constants/logMessages.ts Removed CIRCULAR_REFERENCE_WARNING and JSON_STRINGIFY_WARNING, added BAD_DATA_WARNING.
packages/analytics-js-common/src/utilities/errors.ts Updated getMutatedError to use stringifyData.
packages/analytics-js-common/src/utilities/object.ts Added isObject function and updated isObjectAndNotNull to use it.
packages/analytics-js-plugins/src/errorReporting/event/event.ts Updated normaliseError to use stringifyData.
packages/analytics-js/__tests__/components/configManager/ConfigManager.test.ts Removed tests for invalid write key and data plane URL.
packages/analytics-js/__tests__/components/configManager/validate.test.ts Removed tests for validateLoadArgs.
packages/analytics-js/__tests__/components/core/Analytics.test.ts Removed test for loading analytics script without a dataPlaneUrl, added tests for invalid inputs.
packages/analytics-js/__tests__/components/userSessionManager/UserSessionManager.test.ts Updated tests to use stringifyData.
packages/analytics-js/src/app/RudderAnalytics.ts Enhanced input sanitization using getSanitizedValue.
packages/analytics-js/src/components/configManager/ConfigManager.ts Removed validateLoadArgs function call from init method.
packages/analytics-js/src/components/configManager/util/validate.ts Removed validateWriteKey, validateDataPlaneUrl, and validateLoadArgs functions.
packages/analytics-js/src/components/core/Analytics.ts Updated load method to require dataPlaneUrl as mandatory and added validation for writeKey.
packages/analytics-js/src/constants/logMessages.ts Updated error message functions for validation errors, removed redundant messages.
packages/analytics-js/src/services/ErrorHandler/processError.ts Updated processError to use stringifyData.
packages/analytics-js-common/src/constants/integrations/integration_cname.js Minor syntactical adjustment in commonNames object.
packages/analytics-js-plugins/src/bugsnag/constants.ts Updated APP_STATE_EXCLUDE_KEYS to include 'authToken'.
packages/analytics-js-plugins/src/errorReporting/constants.ts Updated APP_STATE_EXCLUDE_KEYS to include 'authToken'.
packages/analytics-js/src/components/core/utilities.ts Added isWriteKeyValid and isDataPlaneUrlValid functions for validation.

Possibly related PRs

  • fix: filter non error type errors #1865: This PR modifies error handling in the Error Reporting utilities, specifically focusing on how errors are normalized and ensuring that only valid error types are processed. This relates to the main PR's changes in JSON utility functions, particularly in how errors and their representations are handled in the context of JSON stringification.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

codecov bot commented Oct 23, 2024

Codecov Report

Attention: Patch coverage is 97.67442% with 3 lines in your changes missing coverage. Please review.

Project coverage is 57.47%. Comparing base (eb0c51b) to head (a9401ac).
Report is 1 commits behind head on develop.

Files with missing lines Patch % Lines
...ckages/analytics-js-common/src/utilities/errors.ts 50.00% 1 Missing ⚠️
.../analytics-js-plugins/src/beaconQueue/utilities.ts 0.00% 1 Missing ⚠️
...ages/analytics-js/src/components/core/Analytics.ts 91.66% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1902      +/-   ##
===========================================
- Coverage    57.49%   57.47%   -0.02%     
===========================================
  Files          485      485              
  Lines        16493    16495       +2     
  Branches      3289     3313      +24     
===========================================
- Hits          9482     9481       -1     
+ Misses        5792     5734      -58     
- Partials      1219     1280      +61     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🧹 Outside diff range and nitpick comments (19)
packages/analytics-js-common/src/types/stringify.ts (1)

1-5: Add JSDoc documentation for better maintainability.

The type definition is well-structured, but would benefit from JSDoc comments explaining the purpose and usage of each option.

+/**
+ * Options for JSON stringification with circular reference handling
+ */
 export type StringifyOptions = {
+  /** Whether to exclude null values from the stringified output */
   excludeNull?: boolean;
+  /** Whether to exclude BigInt values from the stringified output */
   excludeBigInt?: boolean;
+  /** Array of keys to exclude from the stringified output */
   excludeKeys?: string[];
 };
packages/analytics-js-common/src/utilities/sanitize.ts (1)

4-9: Enhance documentation clarity.

While the documentation follows JSDoc conventions, it could be more specific about:

  • What "sanitized" means in this context
  • How BigInt values are transformed
  • Potential exceptions that might be thrown (e.g., from JSON.parse)

Consider expanding the documentation:

 /**
- * Handles circular references and BigInts in the input value
+ * Sanitizes input value by handling circular references and converting BigInts to strings.
+ * This function ensures the value can be safely serialized to JSON.
  * @param value Input value
  * @param logger Logger instance
  * @returns Sanitized value
+ * @throws {SyntaxError} When the stringified value cannot be parsed as JSON
  */
packages/analytics-js-plugins/src/beaconQueue/utilities.ts (1)

Line range hint 25-42: Consider enhancing error handling for data sanitization.

The error handling could be more specific about sanitization failures to help with debugging in production.

Consider enhancing the error logging:

 try {
   const blobPayload = json.stringifyWithoutCircular(data);
   const blobOptions: BlobPropertyBag = { type: 'text/plain' };

   if (blobPayload) {
     return new Blob([blobPayload], blobOptions);
   }
-  logger?.error(BEACON_QUEUE_STRING_CONVERSION_FAILURE_ERROR(BEACON_QUEUE_PLUGIN));
+  logger?.error(
+    BEACON_QUEUE_STRING_CONVERSION_FAILURE_ERROR(BEACON_QUEUE_PLUGIN),
+    { dataLength: JSON.stringify(data).length }
+  );
 } catch (err) {
-  logger?.error(BEACON_QUEUE_BLOB_CONVERSION_FAILURE_ERROR(BEACON_QUEUE_PLUGIN), err);
+  logger?.error(
+    BEACON_QUEUE_BLOB_CONVERSION_FAILURE_ERROR(BEACON_QUEUE_PLUGIN),
+    { error: err, sanitizationFailed: true }
+  );
 }
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 34-34: packages/analytics-js-plugins/src/beaconQueue/utilities.ts#L34
Added line #L34 was not covered by tests

packages/analytics-js-common/src/utilities/json.ts (2)

19-23: Document BigInt handling behavior

The function now silently converts BigInt values to undefined when excludeBigInt is true. This behavior should be documented to prevent confusion.

Add JSDoc comment above the function:

+/**
+ * Creates a replacer function for JSON.stringify that handles:
+ * - Circular references (converts to "[Circular Reference]")
+ * - BigInt values (converts to undefined when excludeBigInt is true)
+ * - Null values (converts to undefined when excludeNull is true)
+ * - Specified keys (converts to undefined when in excludeKeys)
+ */
const getCircularReplacer = (

51-53: Enhance JSDoc documentation for options parameter

The current documentation could be more detailed about the available options and their effects.

Update the JSDoc to be more descriptive:

 /**
  * Utility method for JSON stringify object excluding null values & circular references
  *
  * @param {*} value input value
- * @param {StringifyOptions} options optional options for excluding keys and null values
+ * @param {StringifyOptions} options Configuration options:
+ *   - excludeNull: When true, removes null values from the output (default: true)
+ *   - excludeBigInt: When true, removes BigInt values from the output (default: true)
+ *   - excludeKeys: Array of key names to exclude from the output (default: [])
  * @param {function} logger optional logger instance
  * @returns string
  */
packages/analytics-js-common/src/utilities/checks.ts (1)

38-43: Documentation could be enhanced with examples.

The implementation is correct and follows TypeScript best practices with the type predicate pattern. Consider enhancing the JSDoc with examples to help other developers:

 /**
  * Checks if the input is a BigInt
  * @param value input value
  * @returns True if the input is a BigInt
+ * @example
+ * isBigInt(BigInt(9007199254740991)) // returns true
+ * isBigInt(123) // returns false
+ * isBigInt(null) // returns false
  */
packages/analytics-js-common/__tests__/utilities/sanitize.test.ts (2)

9-31: Consider expanding circular reference test coverage

While the current test effectively covers basic circular references, consider adding test cases for:

  1. Array circular references
  2. Deeper nested circular references (3+ levels)
  3. Multiple circular references at the same level

Example addition:

it('should handle array circular references', () => {
  const arr: any[] = [1, 2];
  arr.push(arr);
  const nestedObj = { arr };
  
  const sanitizedValue = getSanitizedValue(nestedObj, mockLogger);
  expect(sanitizedValue).toEqual({
    arr: [1, 2, '[Circular Reference]']
  });
});

33-61: Enhance BigInt and undefined handling test coverage

Consider adding test cases for:

  1. Arrays containing BigInt/undefined values
  2. Objects that become empty after sanitization
  3. Mixed array and object structures

Example additions:

it('should handle arrays with BigInt and undefined', () => {
  const obj = {
    arr: [BigInt(1), undefined, 'valid', {a: BigInt(2)}]
  };
  
  expect(getSanitizedValue(obj)).toEqual({
    arr: ['valid', {}]
  });
});

it('should handle empty objects after sanitization', () => {
  const obj = {
    a: { b: BigInt(1) },
    c: { d: undefined }
  };
  
  expect(getSanitizedValue(obj)).toEqual({});
});
packages/analytics-js-plugins/src/utilities/eventsDelivery.ts (1)

25-46: Review architectural approach to error handling and data sanitization.

The current changes present several architectural concerns:

  1. Inconsistent error handling between similar functions (getDeliveryPayload vs getDMTDeliveryPayload)
  2. Removal of error handling without proper alternatives
  3. Missing implementation of input data sanitization mentioned in PR objectives

Consider:

  • Implementing a consistent error handling strategy across all payload processing functions
  • Adding a dedicated input sanitization layer
  • Adding comprehensive input validation before processing

Suggested approach:

  1. Create a dedicated sanitization module
  2. Implement consistent error handling across all payload-related functions
  3. Add input validation at the entry points
  4. Consider adding logging for debugging and monitoring purposes

Would you like assistance in designing this architecture?

packages/analytics-js-cookies/src/cookiesUtilities.ts (2)

Line range hint 15-24: Consider additional error handling for malicious inputs.

The error handling is good but could be enhanced to specifically catch and handle potentially malicious inputs before encryption.

Consider adding input validation:

 const getEncryptedValueInternal = (
   value: string | ApiObject,
   encryptFn: (value: string) => string,
   debug: boolean,
 ): Nullable<string> => {
   const fallbackValue = null;
   try {
+    // Validate input before processing
+    if (typeof value === 'string' && value.length > 50000) {
+      throw new Error('Input exceeds maximum allowed length');
+    }
     const strValue = stringifyWithoutCircular(value, { excludeNull: false });
     if (isNull(strValue)) {
       return null;
     }
     return encryptFn(strValue);
   } catch (err) {
     if (debug) {
-      console.error('Error occurred during encryption: ', err);
+      console.error('[RudderStack] Error occurred during encryption: ', err);
     }
     return fallbackValue;
   }
 };

16-16: Document the stringification behavior change.

The switch to using an options object is a breaking change in the function signature. Consider adding JSDoc to document the expected behavior of excludeNull: false.

Add documentation above the function:

+/**
+ * Encrypts a value after stringification, with null values preserved
+ * @param value - The value to encrypt (string or object)
+ * @param encryptFn - The encryption function to use
+ * @param debug - Whether to log debug information
+ * @returns The encrypted string or null if encryption fails
+ */
 const getEncryptedValueInternal = (
packages/analytics-js/src/services/HttpClient/HttpClient.ts (2)

Line range hint 1-134: Consider implementing additional security measures.

To align with the PR's focus on input sanitization, consider implementing these security enhancements:

  1. Input validation for request configurations
  2. Content-type validation and sanitization
  3. Response size limits
  4. Rate limiting mechanism
  5. CORS validation
  6. Request/Response logging sanitization (removing sensitive data)

Would you like assistance in implementing any of these security measures?


Based on the gathered information, I can now provide a final response regarding the input sanitization concern.

Input sanitization is partially implemented but needs enhancement

The codebase already has some security measures in place:

  • URL validation through isValidURL utility from @rudderstack/analytics-js-common/utilities/url
  • Data serialization with circular reference protection via stringifyWithoutCircular
  • Basic validation for data plane URLs in validateDataPlaneUrl

However, the review comment is partially correct as there are gaps in request data sanitization:

  • The URL validation is only used in config management but not in the HTTP client
  • No content-type validation in the request handler
  • No sanitization of request parameters before sending

Recommended improvements:

  • Utilize the existing isValidURL utility to validate URLs in HttpClient.ts
  • Add parameter sanitization before creating XHR request options
  • Implement content-type validation in the request handler
🔗 Analysis chain

Line range hint 65-77: Input sanitization is missing for request data.

Given that this PR's objective is to implement input data sanitization, the current implementation doesn't sanitize the request data before sending it to the server. This could potentially lead to security vulnerabilities.

Let's verify if any sanitization is implemented in the dependent modules:

Consider implementing the following security measures:

  1. Sanitize URL parameters before making requests
  2. Validate and sanitize request options
  3. Implement content-type validation

Also applies to: 56-77


Let me gather more information about the request handling and data processing in this codebase.

The previous search was too narrow. Let's expand our search to understand the request flow and data handling better. These commands will help us:

  1. Find where createXhrRequestOptions is used and its context
  2. Look at the actual implementation of createXhrRequestOptions
  3. Check how options are processed throughout the HttpClient module

Let me check for any data validation or sanitization in the request handling flow, particularly focusing on the stringifyWithoutCircular function and any URL processing.

These commands will help us understand:

  1. How the request data is processed before sending
  2. If there's any URL validation or processing
  3. If there are any validation utilities used in the project

Let me check the validation utilities found in the codebase, particularly in the config manager, to see if they're used or could be used for request data validation.

These commands will help us understand:

  1. Any existing validation functions that could be reused
  2. URL-specific validation if it exists
  3. The validation utilities available in the config manager

Let me check one final thing to see if the URL validation utility is being used in the HttpClient or if there are any other relevant utilities we should be aware of.

These commands will help us understand:

  1. If URL validation is used in the HTTP client
  2. How request data is processed before sending
  3. If there are any other security-related utilities available
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for any sanitization in the xhr handler and related files
rg -l "sanitize|escape|encode" packages/analytics-js/src/services/HttpClient/xhr/

Length of output: 140


Script:

#!/bin/bash
# Check for request data handling in HttpClient and related files
rg -A 5 "createXhrRequestOptions" packages/analytics-js/src/services/HttpClient/

# Check the implementation of createXhrRequestOptions
ast-grep --pattern 'function createXhrRequestOptions($_) {
  $$$
}'

# Look for any data processing or validation
rg -A 5 "options\." packages/analytics-js/src/services/HttpClient/

Length of output: 7450


Script:

#!/bin/bash
# Check stringifyWithoutCircular implementation
ast-grep --pattern 'function stringifyWithoutCircular($_) {
  $$$
}'

# Look for URL validation or processing
rg -A 5 "url" packages/analytics-js/src/services/HttpClient/

# Check for any validation utilities that might be used
rg -l "validate|sanitize|clean" packages/analytics-js/src/

Length of output: 6075


Script:

#!/bin/bash
# Check validation implementation in config manager
ast-grep --pattern 'function validate($_) {
  $$$
}'

# Look for any URL validation patterns
rg -A 5 "validateUrl|isValidUrl|encodeUrl" packages/analytics-js/src/

# Check the content of validate.ts
cat packages/analytics-js/src/components/configManager/util/validate.ts

Length of output: 3026


Script:

#!/bin/bash
# Check for imports of validation utilities
rg "import.*isValidURL" packages/analytics-js/src/

# Look at the implementation of stringifyWithoutCircular
rg -A 10 "stringifyWithoutCircular" packages/analytics-js/src/

# Check for any other security-related utilities
rg -l "encodeURI|encodeURIComponent|sanitize" packages/analytics-js/src/services/

Length of output: 13506

packages/analytics-js/src/services/HttpClient/xhr/xhrRequestHandler.ts (1)

Line range hint 126-134: Enhance error handling for malformed payloads

The current error handling for send operation could be more specific about the nature of the failure.

     try {
       xhr.send(payload);
     } catch (err) {
+      const errorDetails = {
+        payloadSize: payload?.length,
+        contentType: options.headers['Content-Type'],
+      };
       reject({
-        error: getMutatedError(err, XHR_SEND_ERROR(FAILED_REQUEST_ERR_MSG_PREFIX, options.url)),
+        error: getMutatedError(err, XHR_SEND_ERROR(FAILED_REQUEST_ERR_MSG_PREFIX, options.url, errorDetails)),
         xhr,
         options,
       });
     }
packages/analytics-js-common/__tests__/utilities/json.test.ts (4)

67-70: Make options configuration explicit for better test clarity.

The test case verifies null value exclusion behavior but doesn't explicitly show the configuration. Consider making the options explicit:

-const json = stringifyWithoutCircular(objWithCircular);
+const json = stringifyWithoutCircular(objWithCircular, { excludeNull: true });

Line range hint 81-93: Consider adding more specific assertions.

While the test verifies that circular references are handled, it could benefit from additional assertions to confirm the actual structure of the stringified output.

Example:

const parsed = JSON.parse(json);
expect(parsed.reused).toEqual([1, 2, 3]);
expect(parsed.reusedAgain[2]).toEqual([1, 2, 3]);

142-152: Add a comment explaining the default BigInt handling behavior.

Consider adding a brief comment to explain that BigInt values are excluded by default for better test documentation.

+// By default, BigInt values are excluded from stringification to prevent JSON.stringify errors
 it('should remove BigInt values from input', () => {

170-177: Consider adding a test case for null values in arrays.

The current test verifies null handling for object properties. Consider adding a companion test for null values in arrays to ensure comprehensive coverage.

Example:

it('should not exclude null values in arrays when configured', () => {
  const arrayWithNull = ['value1', null, 'value2'];
  const json = stringifyWithoutCircular(arrayWithNull, { excludeNull: false });
  expect(json).toBe('["value1",null,"value2"]');
});
packages/analytics-js/src/components/userSessionManager/UserSessionManager.ts (1)

371-372: Consider extracting stringification options to a constant.

The changes look good. However, since the same options { excludeNull: false } are used in multiple places, consider extracting it to a named constant for better maintainability.

+ const COOKIE_STRINGIFY_OPTIONS = { excludeNull: false } as const;
+
  // ... later in the code ...
- const before = stringifyWithoutCircular(cData.value, { excludeNull: false });
- const after = stringifyWithoutCircular(cookieValue, { excludeNull: false });
+ const before = stringifyWithoutCircular(cData.value, COOKIE_STRINGIFY_OPTIONS);
+ const after = stringifyWithoutCircular(cookieValue, COOKIE_STRINGIFY_OPTIONS);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between f5ddc6c and 3d8ab91.

⛔ Files ignored due to path filters (1)
  • .eslintrc.json is excluded by !**/*.json
📒 Files selected for processing (23)
  • packages/analytics-js-common/tests/utilities/json.test.ts (2 hunks)
  • packages/analytics-js-common/tests/utilities/sanitize.test.ts (1 hunks)
  • packages/analytics-js-common/src/types/stringify.ts (1 hunks)
  • packages/analytics-js-common/src/utilities/checks.ts (2 hunks)
  • packages/analytics-js-common/src/utilities/json.ts (2 hunks)
  • packages/analytics-js-common/src/utilities/sanitize.ts (1 hunks)
  • packages/analytics-js-cookies/tests/cookieUtilities.test.ts (0 hunks)
  • packages/analytics-js-cookies/src/cookiesUtilities.ts (1 hunks)
  • packages/analytics-js-plugins/tests/bugsnag/utils.test.ts (0 hunks)
  • packages/analytics-js-plugins/tests/errorReporting/utils.test.ts (0 hunks)
  • packages/analytics-js-plugins/tests/utilities/queue.test.ts (0 hunks)
  • packages/analytics-js-plugins/tests/xhrQueue/utilities.test.ts (0 hunks)
  • packages/analytics-js-plugins/src/beaconQueue/utilities.ts (1 hunks)
  • packages/analytics-js-plugins/src/bugsnag/utils.ts (1 hunks)
  • packages/analytics-js-plugins/src/errorReporting/utils.ts (1 hunks)
  • packages/analytics-js-plugins/src/utilities/eventsDelivery.ts (1 hunks)
  • packages/analytics-js-plugins/src/xhrQueue/index.ts (2 hunks)
  • packages/analytics-js-plugins/src/xhrQueue/utilities.ts (3 hunks)
  • packages/analytics-js/tests/services/HttpClient/HttpClient.test.ts (0 hunks)
  • packages/analytics-js/src/components/userSessionManager/UserSessionManager.ts (2 hunks)
  • packages/analytics-js/src/services/HttpClient/HttpClient.ts (1 hunks)
  • packages/analytics-js/src/services/HttpClient/xhr/xhrRequestHandler.ts (1 hunks)
  • packages/analytics-js/src/services/StoreManager/Store.ts (2 hunks)
💤 Files with no reviewable changes (6)
  • packages/analytics-js-cookies/tests/cookieUtilities.test.ts
  • packages/analytics-js-plugins/tests/bugsnag/utils.test.ts
  • packages/analytics-js-plugins/tests/errorReporting/utils.test.ts
  • packages/analytics-js-plugins/tests/utilities/queue.test.ts
  • packages/analytics-js-plugins/tests/xhrQueue/utilities.test.ts
  • packages/analytics-js/tests/services/HttpClient/HttpClient.test.ts
🧰 Additional context used
📓 Learnings (1)
packages/analytics-js-plugins/src/bugsnag/utils.ts (1)
Learnt from: saikumarrs
PR: rudderlabs/rudder-sdk-js#1740
File: packages/analytics-js-plugins/src/bugsnag/utils.ts:119-119
Timestamp: 2024-10-08T15:52:59.819Z
Learning: The `state` parameter in the `getNewClient` function within `packages/analytics-js-plugins/src/bugsnag/utils.ts` is correctly typed as `ApplicationState`.
🪛 GitHub Check: codecov/patch
packages/analytics-js-plugins/src/beaconQueue/utilities.ts

[warning] 34-34: packages/analytics-js-plugins/src/beaconQueue/utilities.ts#L34
Added line #L34 was not covered by tests

🔇 Additional comments (36)
packages/analytics-js-common/src/types/stringify.ts (1)

3-3: Verify BigInt compatibility with target browsers.

Since the PR checklist mentions IE11 testing and BigInt is not supported in IE11, ensure that appropriate fallback handling is implemented in the actual stringification logic.

✅ Verification successful

Let me check the actual implementation of BigInt handling to make a proper assessment.


Based on the implementation shown, I can now provide a final response:

BigInt handling is properly implemented with IE11 compatibility in mind

The code already implements a safe approach for handling BigInt values:

  • By default, excludeBigInt is set to true, which means BigInt values are automatically excluded from serialization
  • When excludeBigInt is false, the code properly handles the error case and returns null with appropriate warning
  • The implementation is tested thoroughly as shown in the test cases

This approach ensures compatibility with IE11 by removing BigInt values from the serialized output by default, preventing any runtime errors in unsupported browsers.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for BigInt usage and polyfills in the codebase

# Test 1: Look for BigInt type checks or conversions
echo "Checking for BigInt usage patterns..."
rg -l "typeof.*BigInt|BigInt\(" 

# Test 2: Look for BigInt polyfills
echo "Checking for BigInt polyfills..."
rg -l "polyfill.*BigInt|window\.BigInt"

Length of output: 538


Script:

#!/bin/bash
# Check the actual implementation in json.ts since it appears to be the main file handling BigInt
echo "Checking json.ts implementation:"
rg -A 10 -B 10 "BigInt" "packages/analytics-js-common/src/utilities/json.ts"

# Check the test cases to understand the handling
echo -e "\nChecking test implementation:"
rg -A 5 -B 5 "BigInt" "packages/analytics-js-common/__tests__/utilities/json.test.ts"

Length of output: 2500

packages/analytics-js-common/src/utilities/sanitize.ts (2)

1-2: LGTM! Clean and focused imports.

The imports are well-structured and properly typed, importing only what's necessary for the implementation.


1-13: Let me analyze the test coverage by examining the test file content before making a final determination.

#!/bin/bash
# Check the content of the test file
echo "Examining test file content..."
cat packages/analytics-js-common/__tests__/utilities/sanitize.test.ts

# Check the usage of stringifyWithoutCircular to ensure complete coverage
echo -e "\nChecking stringifyWithoutCircular usage..."
rg "stringifyWithoutCircular" --type ts
packages/analytics-js-plugins/src/beaconQueue/utilities.ts (2)

Line range hint 34-41: Ensure cross-browser compatibility of Blob handling.

The PR objectives emphasize cross-browser testing, particularly for IE11. The Blob creation and handling needs verification across browsers.

Let's check for any browser-specific handling:

#!/bin/bash
# Look for browser-specific Blob handling or polyfills
rg -l "Blob.*IE|polyfill.*Blob" 

# Check for any browser compatibility tests
rg -l "browser.*test.*Blob" packages/analytics-js-plugins/__tests__/
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 34-34: packages/analytics-js-plugins/src/beaconQueue/utilities.ts#L34
Added line #L34 was not covered by tests


34-34: ⚠️ Potential issue

Verify the impact of removing stringification parameter.

The removal of the second argument from stringifyWithoutCircular might affect how circular references and sensitive data are handled. Since this PR focuses on input sanitization, we need to ensure this change maintains proper data sanitization.

Let's verify the implementation of the new stringification:

The modified line lacks test coverage according to CodeCov. Given that this code handles event data that might contain sensitive information, comprehensive testing is crucial.

Consider adding test cases that verify:

  1. Proper handling of circular references
  2. Sanitization of sensitive data
  3. Edge cases with different data types
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 34-34: packages/analytics-js-plugins/src/beaconQueue/utilities.ts#L34
Added line #L34 was not covered by tests

packages/analytics-js-common/src/utilities/json.ts (4)

3-3: LGTM: Import statements are properly organized

The new imports for isBigInt and StringifyOptions are correctly placed and support the new functionality.

Also applies to: 5-5


10-10: LGTM: Well-structured options parameter with defaults

The refactoring to use an options object with proper TypeScript types and default values improves the API's maintainability and extensibility.

Also applies to: 13-13


57-59: LGTM: Clean implementation with proper error handling

The function implementation is clean and maintains proper error handling while adopting the new options pattern.

Also applies to: 62-62


Line range hint 1-67: Verify usage in security-sensitive contexts

While the implementation is secure, ensure that security-sensitive data is properly excluded using the excludeKeys option when this utility is used for logging or data transmission.

✅ Verification successful

Based on the search results, I can generate the final response as I have gathered sufficient information about how stringifyWithoutCircular is used throughout the codebase.

Proper handling of sensitive data confirmed

The implementation and usage of stringifyWithoutCircular across the codebase shows appropriate handling of sensitive data:

  1. The utility is consistently used with excludeKeys option to filter out sensitive data:

    • In app state serialization: APP_STATE_EXCLUDE_KEYS is used
    • In cookie data handling: excludeNull: false is used to ensure complete control over data
    • In error reporting: Sensitive data is filtered before sending to Bugsnag
  2. The codebase demonstrates defensive practices:

    • Error handling wraps all stringification operations
    • Logging operations use the utility with appropriate exclusion options
    • Storage operations encrypt the stringified data before persistence
  3. Security-sensitive contexts properly utilize the exclusion capabilities:

    • User session management filters sensitive data
    • HTTP client operations sanitize payloads
    • Analytics data transmission implements proper data filtering
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for usage of stringifyWithoutCircular in logging contexts
rg -A 3 "stringifyWithoutCircular.*log" 

# Search for potential sensitive data handling
rg -l -e "password" -e "token" -e "secret" -e "key" | xargs -I {} rg -A 5 "stringifyWithoutCircular" {}

Length of output: 30916

packages/analytics-js-common/src/utilities/checks.ts (2)

84-84: LGTM!

The export is correctly added and maintains alphabetical ordering.


38-43: Verify sanitization implementation using this utility.

Since this PR focuses on input data sanitization, let's verify how this utility is used in the sanitization process.

✅ Verification successful

Let me analyze the usage of isBigInt utility function by searching for its imports and usage in sanitization-related code.


Let me try a different search pattern to find how isBigInt is used in the codebase.


Let me check the implementation details of how isBigInt is used in the json utility file and sanitization tests.


I have enough context to provide the final response now.

The isBigInt utility is properly integrated into the sanitization process

The utility function is used in the JSON stringification process to handle BigInt values:

  1. It's imported and used in json.ts as part of the circular reference replacer function
  2. By default, BigInt values are excluded during stringification (excludeBigInt = true)
  3. The sanitization tests verify that BigInt values are properly handled and removed from the output
  4. The implementation helps prevent JSON serialization errors since BigInt values are not JSON-serializable
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find usage of isBigInt in sanitization-related code

# Search for files that import and use isBigInt
rg -l "import.*isBigInt.*from.*checks" | xargs rg "isBigInt\("

# Search for sanitization-related files that might need to use this utility
rg -l "sanitize|clean|validate" | xargs rg "BigInt|bigint"

Length of output: 135945


Script:

#!/bin/bash
# Search for files that import isBigInt and use it in sanitization context
ast-grep --pattern 'import { $_ as isBigInt } from $path'

# Search for sanitization-related files that use isBigInt
rg -A 5 "sanitize.*\{.*isBigInt" 

Length of output: 94


Script:

#!/bin/bash
# Search for files that import isBigInt
rg "import.*isBigInt"

# Search for files that use isBigInt in function calls or conditions
rg "isBigInt\("

# Search for sanitize-related files
rg -l "sanitize" | xargs rg -l "bigint|BigInt"

Length of output: 734


Script:

#!/bin/bash
# Check the json utility implementation
rg -A 10 "isBigInt" packages/analytics-js-common/src/utilities/json.ts

# Check the sanitization test implementation
rg -A 10 "BigInt" packages/analytics-js-common/__tests__/utilities/sanitize.test.ts

Length of output: 1460

packages/analytics-js-common/__tests__/utilities/sanitize.test.ts (3)

1-8: LGTM: Well-structured test setup with proper mocking

The test suite is well-organized with clear describe blocks and proper logger mocking.


1-84: Add security-focused test cases

Given that this is an input sanitization feature, consider adding test cases for:

  1. XSS payload sanitization
  2. Prototype pollution attempts
  3. Maximum depth/size limits to prevent DoS
  4. Unicode edge cases

This aligns with the PR's security checklist requirement.

#!/bin/bash
# Check if there are any existing security-related tests
rg -l "xss|injection|prototype.*pollution|sanitiz" --type test

# Look for any size/depth limit implementations
rg -l "maxDepth|maxSize|limit" --type ts

63-84: Expand null value testing and verify browser compatibility

While the current test covers basic null scenarios, consider adding:

  1. Arrays containing null values
  2. Objects with null prototype
  3. Mixed null and undefined scenarios

Also, since the PR objectives mention cross-browser testing, particularly IE11, we should verify that the JSON handling works consistently across browsers.

packages/analytics-js-plugins/src/utilities/eventsDelivery.ts (1)

37-46: ⚠️ Potential issue

Add error handling and input sanitization.

The function doesn't handle potential errors from getDeliveryPayload and doesn't perform any input sanitization, which was mentioned in the PR objectives.

Consider adding error handling and sanitization:

-const validateEventPayloadSize = (event: RudderEvent, logger?: ILogger) => {
-  const payloadStr = getDeliveryPayload(event);
-  const payloadSize = payloadStr.length;
+const validateEventPayloadSize = (event: RudderEvent, logger?: ILogger) => {
+  try {
+    // Sanitize the event data before processing
+    const sanitizedEvent = sanitizeEventData(event);
+    const payloadStr = getDeliveryPayload(sanitizedEvent);
+    const payloadSize = payloadStr.length;

Let's verify if any input sanitization is implemented elsewhere:

packages/analytics-js-cookies/src/cookiesUtilities.ts (2)

16-16: Verify cross-browser compatibility of stringification options.

The PR objectives mention cross-browser testing requirements. Ensure the stringifyWithoutCircular options object works consistently across browsers, especially IE11.

Run this verification script to check for browser-specific handling:

#!/bin/bash
# Check for browser-specific code and tests
# Description: Look for browser compatibility handling and tests

# Search for browser-specific test files
echo "Checking for browser-specific tests:"
fd -g '*browser*.test.*'

# Search for IE11 specific handling
echo "Checking for IE11 specific code:"
rg -g '!dist' -g '!node_modules' -A 5 '(IE11|Internet Explorer|isIE)'

16-16: Verify input sanitization coverage.

While the change to use excludeNull: false maintains existing behavior, consider enhancing input sanitization before stringification to prevent potential XSS or injection attacks in cookie values.

Run this verification script to check current sanitization measures:

✅ Verification successful

Let me generate additional verification scripts to check for input sanitization specifically in the cookie utilities and related code:

This will help us:

  1. Understand the context and implementation of stringifyWithoutCircular
  2. Find any sanitization wrappers around it
  3. Look for cookie-specific security measures

Based on the search results, I can now provide a final response regarding the input sanitization concern:

Input sanitization is properly handled through multiple layers of protection

The stringifyWithoutCircular function used for cookie values has built-in safeguards:

  1. Handles circular references safely by replacing them with "[Circular Reference]"
  2. Includes options for excluding null values and specific keys
  3. Has error handling that returns null on invalid input
  4. Is used in conjunction with encryption before storage

Additionally, the cookie utilities have:

  • Value validation before storage
  • Error handling for failed encryption/decryption
  • Proper domain and path scoping
  • Secure cookie options

The implementation follows security best practices and doesn't require additional sanitization.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for input sanitization patterns across the codebase
# Description: Look for sanitization functions that might be used before this stringification

# Search for sanitization-related function definitions
echo "Checking for sanitization functions:"
rg -g '!dist' -g '!node_modules' -A 5 'function.*(sanitize|escape|clean|validate).*'

# Search for XSS prevention patterns
echo "Checking for XSS prevention patterns:"
rg -g '!dist' -g '!node_modules' -A 5 '(encodeURI|encodeURIComponent|DOMPurify|sanitize|escape)'

Length of output: 20718


Script:

#!/bin/bash
# Check cookie utilities and related code for sanitization
echo "Checking cookie utilities implementation:"
rg -g '!dist' -g '!node_modules' -A 10 'stringifyWithoutCircular' 

echo -e "\nChecking cookie value sanitization:"
ast-grep --pattern 'function $_(value$_) {
  $$$
  stringifyWithoutCircular($$$)
  $$$
}'

echo -e "\nChecking cookie-related security measures:"
rg -g '!dist' -g '!node_modules' -A 5 'cookie.*sanitize|sanitize.*cookie|cookie.*validate|validate.*cookie'

Length of output: 226019

packages/analytics-js/src/services/HttpClient/HttpClient.ts (3)

Line range hint 65-77: Verify the impact of logger removal on error tracing.

While removing the logger parameter simplifies the interface, we should ensure that critical errors are still properly tracked, especially for security-relevant issues like failed requests or malformed responses.

Consider implementing structured logging for security-relevant operations. This would help in monitoring and auditing API communication patterns.


Let me try one more search to specifically find the implementation and usage of setAuthHeader.

#!/bin/bash
# Search for setAuthHeader implementation and usage
rg "setAuthHeader" -B 2 -A 2 --type ts

# Also try to find any auth header related code
rg "Authorization.*header" -B 2 -A 2 --type ts

Line range hint 69-73: Implement response data validation.

The raw response handling path (isRawResponse) bypasses JSON parsing but doesn't implement any sanitization. This could expose the application to injection vulnerabilities if the response data is rendered in the DOM or used in sensitive operations.

Let's check how the raw response is used throughout the codebase:

packages/analytics-js-plugins/src/xhrQueue/utilities.ts (2)

14-16: ⚠️ Potential issue

Consider adding error handling for stringification failures.

With the removal of the logger parameter, stringification errors might be silently ignored. Consider implementing proper error handling to ensure data integrity.

Let's verify the stringifyWithoutCircular implementation:

Suggested implementation:

 const getBatchDeliveryPayload = (events: RudderEvent[], currentTime: string): string => {
   const batchPayload: XHRBatchPayload = { batch: events, sentAt: currentTime };
-  return json.stringifyWithoutCircular(batchPayload) as string;
+  try {
+    const result = json.stringifyWithoutCircular(batchPayload);
+    if (!result) {
+      throw new Error('Stringification resulted in null or undefined');
+    }
+    return result;
+  } catch (error) {
+    throw new Error(`Failed to stringify batch payload: ${error.message}`);
+  }
 };

Line range hint 64-84: Validate activeDataplaneUrl before URL construction.

The function uses state.lifecycle.activeDataplaneUrl.value without validation, which could lead to URL construction errors or security issues if the value is malformed.

Suggested implementation:

 const getRequestInfo = (itemData: XHRRetryQueueItemData, state: ApplicationState) => {
   let data;
   let headers;
   let url: string;
   const currentTime = getCurrentTimeFormatted();
+  const dataplaneUrl = state.lifecycle.activeDataplaneUrl?.value;
+  if (!dataplaneUrl || typeof dataplaneUrl !== 'string') {
+    throw new Error('Invalid or missing dataplane URL');
+  }
   if (Array.isArray(itemData)) {
     const finalEvents = itemData.map((queueItemData: XHRQueueItemData) =>
       eventsDelivery.getFinalEventForDeliveryMutator(queueItemData.event, currentTime),
     );
     data = getBatchDeliveryPayload(finalEvents, currentTime);
     headers = itemData[0] ? clone(itemData[0].headers) : {};
-    url = getBatchDeliveryUrl(state.lifecycle.activeDataplaneUrl.value as string);
+    url = getBatchDeliveryUrl(dataplaneUrl);
   } else {

Additionally, consider standardizing the payload generation approach between array and single event cases for better maintainability.

Let's verify the usage of different payload generation methods:

packages/analytics-js/src/services/HttpClient/xhr/xhrRequestHandler.ts (2)

Line range hint 91-108: Add browser compatibility tests

The XHR implementation needs verification across different browsers as per PR requirements. The current error handling looks solid but should be tested thoroughly.

Consider adding specific test cases for:

  • IE11 compatibility
  • Different status code handling across browsers
  • Timeout behavior in various browsers

Add a comment documenting the browser compatibility requirements and testing status.


Line range hint 65-73: Verify input sanitization implementation

While the code handles stringification, there might be security implications when processing user input data. Consider implementing additional sanitization steps before stringification.

Run this verification script to check for potential security risks:

packages/analytics-js-plugins/src/xhrQueue/index.ts (3)

69-69: Verify logging coverage after logger parameter removal.

The removal of the logger parameter from getRequestInfo aligns with the goal of reducing exposure of potentially sensitive data. However, ensure that critical errors are still being logged through alternative means.

#!/bin/bash
# Check if there are any remaining logging mechanisms for request info errors
rg -A 3 "getRequestInfo.*error|error.*getRequestInfo" 

Line range hint 69-105: Review security considerations for data handling.

The code handles sensitive data through the queue system. While some security measures are in place (base64 encoding of anonymousId, payload size validation), consider:

  1. Implementing input validation before queueing
  2. Sanitizing event data before stringification
  3. Adding rate limiting for queue operations
  4. Implementing payload encryption for sensitive fields

This aligns with the PR's focus on input data sanitization and the pending security checklist.


105-105: Consider additional payload validation before stringification.

While the type assertion ensures string output, consider adding explicit validation for the payload content before stringification to prevent potential XSS or injection attacks.

packages/analytics-js-plugins/src/errorReporting/utils.ts (1)

70-70: LGTM! Improved parameter structure using named options.

The change from positional parameters to a named options object enhances code maintainability and readability.

packages/analytics-js-common/__tests__/utilities/json.test.ts (3)

73-79: LGTM! Clear and focused test case.

The test case effectively verifies the basic stringification functionality without circular references.


134-137: LGTM! Proper usage of the new options object structure.

The test case correctly demonstrates the usage of multiple options in the new configuration object.


154-168: LGTM! Well-structured error handling test.

The test effectively verifies both the error condition and logging behavior when BigInt serialization fails.

packages/analytics-js-plugins/src/bugsnag/utils.ts (2)

68-69: Verify cross-browser compatibility of JSON operations.

As per PR requirements, ensure that the JSON stringification and parsing operations work consistently across Chrome, Firefox, and IE11, particularly for:

  • Circular reference handling
  • BigInt serialization
  • Unicode character handling

Let's check for browser-specific JSON handling code:

✅ Verification successful

Let me analyze the findings from the shell script results:

  1. The code uses stringifyWithoutCircular utility that handles BigInt values by default by excluding them from JSON serialization
  2. The code has explicit handling for BigInt through:
    • Type checking via isBigInt utility
    • StringifyOptions interface with excludeBigInt option
    • Tests verifying BigInt handling behavior
  3. The implementation shows that BigInts are handled safely by:
    • Removing them by default during serialization
    • Providing configuration to control BigInt handling
    • Proper error handling when BigInts can't be serialized

Based on this analysis, I can provide the final response:

BigInt values are already handled safely by the SDK's serialization utilities

The JSON stringification changes have built-in handling for BigInt values:

  • BigInts are excluded by default during serialization to prevent JSON.stringify errors
  • The code includes proper type checking and error handling for BigInt values
  • Tests verify the BigInt handling behavior works as expected

The reason for this verification is that the code already implements proper BigInt handling through its stringification utilities, making it safe across browsers that support BigInt. The implementation follows best practices by:

  1. Excluding BigInts by default to prevent serialization errors
  2. Providing configuration options for custom BigInt handling if needed
  3. Including proper error handling and logging when BigInt serialization fails
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for browser-specific JSON handling
echo "Searching for browser-specific JSON handling..."
rg -g '!*.test.*' -g '!*.spec.*' "JSON\.(parse|stringify)" -B 2 -A 2

echo "Searching for BigInt handling..."
rg "BigInt" -B 2 -A 2

Length of output: 480156


68-69: Verify sensitive data exclusion patterns.

Since this function processes application state before sending to Bugsnag (external service), it's crucial to verify that all sensitive data patterns are properly excluded.

Let's verify the APP_STATE_EXCLUDE_KEYS patterns:

packages/analytics-js/src/services/StoreManager/Store.ts (2)

211-211: 🛠️ Refactor suggestion

Consider improving null/undefined handling clarity and security.

The current ternary operation handles undefined and null values differently, which could lead to confusion. Consider making the logic more explicit and documenting the security implications.

Here's a suggested refactor for better clarity and security:

-    return typeof formattedValue === 'undefined' ? value : (formattedValue ?? '');
+    if (typeof formattedValue === 'undefined') {
+      return value; // Preserve original value if plugin returns undefined
+    }
+    
+    if (formattedValue === null) {
+      // Document security implication: Returning empty string for null values
+      return '';
+    }
+    
+    return formattedValue;

Let's verify the encryption plugin behavior:

#!/bin/bash
# Description: Check for encryption plugin implementations
# Look for plugins that might be affected by the null/undefined handling change

# Search for storage.encrypt and storage.decrypt plugin implementations
rg -A 5 "storage\.(encrypt|decrypt)"

112-112: LGTM! Verify null value handling across the codebase.

The change to use an options object with excludeNull: false is a good practice for extensibility and ensures consistent null value handling during stringification.

Let's verify the consistent handling of null values:

✅ Verification successful

Let me generate new shell scripts to search for stringifyWithoutCircular usage specifically.

The previous search might have failed due to syntax. Let's try with a different approach to find all occurrences and the implementation of this utility function to verify the null value handling consistency.


Based on the search results, I can now provide a final response regarding the consistency of null value handling:

The excludeNull: false option is consistently used across the codebase for storage operations

The codebase shows consistent handling of null values in storage operations:

  • In Store.ts, stringifyWithoutCircular is called with { excludeNull: false } to ensure null values are preserved
  • The same pattern is followed in other storage-related components:
    • UserSessionManager's cookie data handling
    • Store class implementation in both v1.1 and current versions
    • Common utilities that deal with storage serialization

The change aligns with the existing pattern in the codebase and maintains consistency in how null values are handled during storage operations.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for consistent null value handling across the codebase
# Look for other stringifyWithoutCircular calls to ensure they handle null values consistently

# Search for all stringifyWithoutCircular usages
rg "stringifyWithoutCircular" -A 2 -B 2

Length of output: 10991


Script:

#!/bin/bash
# Search for stringifyWithoutCircular with better context
rg -U "stringifyWithoutCircular" -B 5 -A 5

# Also search for the file containing the utility function
fd "stringifyWithoutCircular" --type f

Length of output: 98947

packages/analytics-js/src/components/userSessionManager/UserSessionManager.ts (1)

308-308: LGTM! Verify cookie encryption behavior.

The update to use the new options object format for stringifyWithoutCircular is consistent with the codebase changes. The excludeNull: false option ensures null values are properly included in the encrypted cookie data.

✅ Verification successful

Based on the results, I can now generate the final response since I have sufficient information about the usage of stringifyWithoutCircular across the codebase.

The stringifyWithoutCircular option is consistently implemented

The changes to use { excludeNull: false } are consistently applied across all usages in the codebase:

  • Store encryption in Store.ts
  • Cookie data handling in UserSessionManager.ts
  • Cookie utilities in cookiesUtilities.ts
  • Test cases in json.test.ts

This confirms that the cookie encryption behavior maintains consistency throughout the system.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify consistent usage of stringifyWithoutCircular options across the codebase
ast-grep --pattern 'stringifyWithoutCircular($_, { excludeNull: $_ })'

Length of output: 995

Copy link

github-actions bot commented Oct 23, 2024

size-limit report 📦

Name Size (Base) Size (Current) Size Limit Status
Plugins Module Federation Mapping - Legacy - CDN 332 B 332 B (0%) 512 B
Plugins - Legacy - CDN 15.59 KB 15.48 KB (-0.74% ▼) 16 KB
Plugins Module Federation Mapping - Modern - CDN 331 B 331 B (0%) 512 B
Plugins - Modern - CDN 7.2 KB 7.2 KB (0%) 7.5 KB
Common - No bundling 16.21 KB 16.21 KB (0%) 16.5 KB
Cookies Utils - Legacy - NPM (ESM) 1.54 KB 1.45 KB (-5.9% ▼) 2 KB
Cookies Utils - Legacy - NPM (CJS) 1.75 KB 1.65 KB (-5.26% ▼) 2 KB
Cookies Utils - Legacy - NPM (UMD) 1.53 KB 1.43 KB (-6.44% ▼) 2 KB
Cookies Utils - Modern - NPM (ESM) 1.17 KB 1.11 KB (-5.26% ▼) 1.5 KB
Cookies Utils - Modern - NPM (CJS) 1.4 KB 1.34 KB (-4.61% ▼) 1.5 KB
Cookies Utils - Modern - NPM (UMD) 1.16 KB 1.08 KB (-6.17% ▼) 1.5 KB
Load Snippet 758 B 758 B (0%) 1 KB
Core (v1.1) - NPM (ESM) 30.28 KB 30.36 KB (+0.27% ▲) 32 KB
Core (v1.1) - NPM (CJS) 30.48 KB 30.55 KB (+0.21% ▲) 32 KB
Core (v1.1) - NPM (UMD) 30.35 KB 30.37 KB (+0.07% ▲) 32 KB
Core (Content Script - v1.1) - NPM (ESM) 29.84 KB 29.89 KB (+0.17% ▲) 30.5 KB
Core (Content Script - v1.1) - NPM (CJS) 30.06 KB 30.09 KB (+0.12% ▲) 30.5 KB
Core (Content Script - v1.1) - NPM (UMD) 29.91 KB 29.92 KB (+0.05% ▲) 30 KB
Core - Legacy - CDN 48.46 KB 48.33 KB (-0.29% ▼) 48.5 KB
Core - Modern - CDN 24.78 KB 24.75 KB (-0.15% ▼) 25 KB
Core - Legacy - NPM (ESM) 48.24 KB 48.24 KB (-0.01% ▼) 48.5 KB
Core - Legacy - NPM (CJS) 48.54 KB 48.46 KB (-0.16% ▼) 49 KB
Core - Legacy - NPM (UMD) 48.26 KB 48.19 KB (-0.14% ▼) 48.5 KB
Core - Modern - NPM (ESM) 24.52 KB 24.48 KB (-0.15% ▼) 25 KB
Core - Modern - NPM (CJS) 24.72 KB 24.72 KB (-0.02% ▼) 25 KB
Core - Modern - NPM (UMD) 24.57 KB 24.53 KB (-0.17% ▼) 25 KB
Core (Bundled) - Legacy - NPM (ESM) 48.24 KB 48.24 KB (-0.01% ▼) 48.5 KB
Core (Bundled) - Legacy - NPM (CJS) 48.55 KB 48.44 KB (-0.23% ▼) 49 KB
Core (Bundled) - Legacy - NPM (UMD) 48.26 KB 48.19 KB (-0.14% ▼) 48.5 KB
Core (Bundled) - Modern - NPM (ESM) 39.41 KB 39.34 KB (-0.17% ▼) 39.5 KB
Core (Bundled) - Modern - NPM (CJS) 39.65 KB 39.6 KB (-0.12% ▼) 40 KB
Core (Bundled) - Modern - NPM (UMD) 39.38 KB 39.29 KB (-0.23% ▼) 39.5 KB
Core (Content Script) - Legacy - NPM (ESM) 47.73 KB 47.61 KB (-0.25% ▼) 48 KB
Core (Content Script) - Legacy - NPM (CJS) 47.98 KB 47.89 KB (-0.19% ▼) 48 KB
Core (Content Script) - Legacy - NPM (UMD) 47.71 KB 47.62 KB (-0.2% ▼) 48 KB
Core (Content Script) - Modern - NPM (ESM) 38.8 KB 38.84 KB (+0.11% ▲) 39 KB
Core (Content Script) - Modern - NPM (CJS) 39.06 KB 39.05 KB (-0.03% ▼) 40 KB
Core (Content Script) - Modern - NPM (UMD) 38.81 KB 38.76 KB (-0.12% ▼) 39 KB
Service Worker - Legacy - NPM (ESM) 30.6 KB 30.6 KB (0%) 31 KB
Service Worker - Legacy - NPM (CJS) 30.88 KB 30.88 KB (0%) 31 KB
Service Worker - Legacy - NPM (UMD) 30.68 KB 30.68 KB (0%) 31 KB
Service Worker - Modern - NPM (ESM) 25.74 KB 25.74 KB (0%) 26 KB
Service Worker - Modern - NPM (CJS) 26.02 KB 26.02 KB (0%) 26.5 KB
Service Worker - Modern - NPM (UMD) 25.79 KB 25.79 KB (0%) 26 KB
All Integrations - Legacy - CDN 94.68 KB 94.68 KB (0%) 95.3 KB
All Integrations - Modern - CDN 89.82 KB 89.82 KB (0%) 91 KB

@saikumarrs saikumarrs force-pushed the feat.sanitize-data-sdk-2254 branch from 3d8ab91 to 7723b27 Compare October 25, 2024 06:47
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

🧹 Outside diff range and nitpick comments (9)
packages/analytics-js-common/src/constants/logMessages.ts (1)

12-13: Consider enhancing the warning message clarity.

While the message consolidation is good, the warning message could be more precise about what constitutes "bad data" to help with debugging.

Consider this alternative message format:

-  `${context}${LOG_CONTEXT_SEPARATOR}A bad data (like circular reference, BigInt) has been detected in the object and the property "${key}" has been dropped from the output.`;
+  `${context}${LOG_CONTEXT_SEPARATOR}Non-serializable data (circular reference, BigInt, etc.) detected in property "${key}". This property has been dropped from the output.`;
packages/analytics-js-common/src/utilities/object.ts (1)

Line range hint 1-121: Consider enhancing type safety and documentation for sanitization strategy

As this file is part of the input sanitization implementation, consider these improvements:

  1. Add JSDoc comments to isObject explaining its role in the sanitization strategy
  2. Consider using TypeScript's strict type checking by avoiding any types where possible
  3. Add unit tests specifically for edge cases in data sanitization
  4. Consider creating a dedicated sanitization module that uses these utilities

This will make the sanitization strategy more maintainable and secure.

packages/analytics-js/src/components/configManager/ConfigManager.ts (1)

Line range hint 89-146: Consider security implications of removed input validation.

The removal of validateLoadArgs from the init() method could potentially allow invalid or malicious input to propagate through the system. While this aligns with the PR's goal of sanitizing input data, we should ensure that:

  1. Invalid writeKey values don't lead to unauthorized access
  2. Malformed dataPlaneUrl values don't enable request forgery
  3. Error handling adequately catches and handles invalid inputs

Consider implementing validation at the state update level or adding input sanitization middleware to prevent potential security issues.

packages/analytics-js-common/__tests__/utilities/json.test.ts (4)

7-74: Enhance test coverage for stringifyData.

While the current tests cover basic scenarios well, consider adding the following test cases for more robust coverage:

  • Empty objects and arrays
  • Objects with special characters in keys
  • Deeply nested objects (to verify stack overflow protection)
  • Large arrays (to verify performance)
  • Mixed data types within arrays

Example test case:

it('should handle edge cases', () => {
  const obj = {
    empty: {},
    emptyArray: [],
    'special@key': 'value',
    deeplyNested: /* deep object */,
    largeArray: Array(1000).fill(1),
    mixedArray: [1, 'string', true, null, undefined]
  };
  // ... assertions
});

83-104: Document behavior for function properties.

The test includes function properties (i and s) but doesn't explicitly verify how they're handled in the sanitized output. Consider adding assertions to document the expected behavior.

// Add assertion
expect(typeof getSanitizedValue(obj).i).toBe('undefined'); // or expected behavior

Also applies to: 106-106


234-247: Verify logger is not called for valid data types.

The test verifies sanitization of various data types but doesn't explicitly check that the logger isn't called when processing valid data. Consider adding this verification.

it('should not log warnings for valid data types', () => {
  const mockLogger = { warn: jest.fn() };
  // ... your existing test code ...
  expect(mockLogger.warn).not.toHaveBeenCalled();
});

1-5: Add JSDoc comments for test constants.

Consider adding JSDoc comments to explain the purpose and usage of the constants circularReferenceNotice and bigIntNotice. This would help other developers understand their significance in the tests.

/** Placeholder text used when circular references are detected during sanitization */
const circularReferenceNotice = '[Circular Reference]';

/** Placeholder text used when BigInt values are detected during sanitization */
const bigIntNotice = '[BigInt]';
packages/analytics-js/src/app/RudderAnalytics.ts (2)

119-123: Enhance JSDoc documentation

The JSDoc comments could be more descriptive. Consider adding:

  • Description of what the method does
  • @throws documentation if any
  • Example usage

Line range hint 324-341: Add input sanitization to group method parameters

While the error handling for empty calls is good, the group method parameters should be sanitized consistently with other methods.

Apply this diff to add sanitization:

   group(
     groupId: string | number | Nullable<ApiObject>,
     traits?: Nullable<ApiOptions> | Nullable<ApiObject> | ApiCallback,
     options?: Nullable<ApiOptions> | ApiCallback,
     callback?: ApiCallback,
   ) {
     if (arguments.length === 0) {
       this.logger.error(EMPTY_GROUP_CALL_ERROR(RS_APP));
       return;
     }
 
     this.getAnalyticsInstance().group(
-      groupArgumentsToCallOptions(groupId, traits, options, callback),
+      groupArgumentsToCallOptions(
+        getSanitizedValue(groupId),
+        getSanitizedValue(traits),
+        getSanitizedValue(options),
+        callback,
+      ),
     );
   }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 3d8ab91 and 7723b27.

⛔ Files ignored due to path filters (1)
  • .eslintrc.json is excluded by !**/*.json
📒 Files selected for processing (34)
  • packages/analytics-js-common/tests/utilities/json.test.ts (1 hunks)
  • packages/analytics-js-common/src/constants/logMessages.ts (1 hunks)
  • packages/analytics-js-common/src/utilities/checks.ts (2 hunks)
  • packages/analytics-js-common/src/utilities/errors.ts (2 hunks)
  • packages/analytics-js-common/src/utilities/json.ts (1 hunks)
  • packages/analytics-js-common/src/utilities/object.ts (2 hunks)
  • packages/analytics-js-cookies/tests/cookieUtilities.test.ts (0 hunks)
  • packages/analytics-js-cookies/src/cookiesUtilities.ts (2 hunks)
  • packages/analytics-js-plugins/tests/bugsnag/utils.test.ts (0 hunks)
  • packages/analytics-js-plugins/tests/errorReporting/utils.test.ts (0 hunks)
  • packages/analytics-js-plugins/tests/utilities/queue.test.ts (0 hunks)
  • packages/analytics-js-plugins/tests/xhrQueue/utilities.test.ts (0 hunks)
  • packages/analytics-js-plugins/src/beaconQueue/utilities.ts (1 hunks)
  • packages/analytics-js-plugins/src/bugsnag/utils.ts (1 hunks)
  • packages/analytics-js-plugins/src/errorReporting/event/event.ts (2 hunks)
  • packages/analytics-js-plugins/src/errorReporting/utils.ts (3 hunks)
  • packages/analytics-js-plugins/src/utilities/eventsDelivery.ts (2 hunks)
  • packages/analytics-js-plugins/src/xhrQueue/index.ts (2 hunks)
  • packages/analytics-js-plugins/src/xhrQueue/utilities.ts (3 hunks)
  • packages/analytics-js/tests/components/configManager/ConfigManager.test.ts (0 hunks)
  • packages/analytics-js/tests/components/configManager/validate.test.ts (0 hunks)
  • packages/analytics-js/tests/components/core/Analytics.test.ts (0 hunks)
  • packages/analytics-js/tests/components/userSessionManager/UserSessionManager.test.ts (4 hunks)
  • packages/analytics-js/tests/services/HttpClient/HttpClient.test.ts (0 hunks)
  • packages/analytics-js/src/app/RudderAnalytics.ts (5 hunks)
  • packages/analytics-js/src/components/configManager/ConfigManager.ts (1 hunks)
  • packages/analytics-js/src/components/configManager/util/validate.ts (2 hunks)
  • packages/analytics-js/src/components/core/Analytics.ts (2 hunks)
  • packages/analytics-js/src/components/userSessionManager/UserSessionManager.ts (4 hunks)
  • packages/analytics-js/src/constants/logMessages.ts (1 hunks)
  • packages/analytics-js/src/services/ErrorHandler/processError.ts (2 hunks)
  • packages/analytics-js/src/services/HttpClient/HttpClient.ts (1 hunks)
  • packages/analytics-js/src/services/HttpClient/xhr/xhrRequestHandler.ts (2 hunks)
  • packages/analytics-js/src/services/StoreManager/Store.ts (3 hunks)
💤 Files with no reviewable changes (9)
  • packages/analytics-js-cookies/tests/cookieUtilities.test.ts
  • packages/analytics-js-plugins/tests/bugsnag/utils.test.ts
  • packages/analytics-js-plugins/tests/errorReporting/utils.test.ts
  • packages/analytics-js-plugins/tests/utilities/queue.test.ts
  • packages/analytics-js-plugins/tests/xhrQueue/utilities.test.ts
  • packages/analytics-js/tests/components/configManager/ConfigManager.test.ts
  • packages/analytics-js/tests/components/configManager/validate.test.ts
  • packages/analytics-js/tests/components/core/Analytics.test.ts
  • packages/analytics-js/tests/services/HttpClient/HttpClient.test.ts
✅ Files skipped from review due to trivial changes (1)
  • packages/analytics-js/tests/components/userSessionManager/UserSessionManager.test.ts
🚧 Files skipped from review as they are similar to previous changes (10)
  • packages/analytics-js-common/src/utilities/checks.ts
  • packages/analytics-js-cookies/src/cookiesUtilities.ts
  • packages/analytics-js-plugins/src/beaconQueue/utilities.ts
  • packages/analytics-js-plugins/src/bugsnag/utils.ts
  • packages/analytics-js-plugins/src/errorReporting/utils.ts
  • packages/analytics-js-plugins/src/xhrQueue/index.ts
  • packages/analytics-js-plugins/src/xhrQueue/utilities.ts
  • packages/analytics-js/src/components/userSessionManager/UserSessionManager.ts
  • packages/analytics-js/src/services/HttpClient/HttpClient.ts
  • packages/analytics-js/src/services/StoreManager/Store.ts
🧰 Additional context used
📓 Learnings (2)
packages/analytics-js/src/components/configManager/util/validate.ts (2)
Learnt from: MoumitaM
PR: rudderlabs/rudder-sdk-js#1777
File: packages/analytics-js/src/components/configManager/util/validate.ts:0-0
Timestamp: 2024-10-08T15:52:59.819Z
Learning: The function `isTopLevelDomain` in `validate.ts` was renamed to `isWebpageTopLevelDomain` for clarity.
Learnt from: MoumitaM
PR: rudderlabs/rudder-sdk-js#1777
File: packages/analytics-js/src/components/configManager/util/validate.ts:0-0
Timestamp: 2024-07-04T07:11:53.001Z
Learning: The function `isTopLevelDomain` in `validate.ts` was renamed to `isWebpageTopLevelDomain` for clarity.
packages/analytics-js/src/constants/logMessages.ts (1)
Learnt from: saikumarrs
PR: rudderlabs/rudder-sdk-js#1730
File: packages/analytics-js/src/constants/logMessages.ts:0-0
Timestamp: 2024-10-08T15:52:59.819Z
Learning: The `DATA_PLANE_URL_VALIDATION_ERROR` function in `packages/analytics-js/src/constants/logMessages.ts` should use `string | undefined` instead of `any` for the `dataPlaneUrl` parameter to enhance type safety.
🔇 Additional comments (37)
packages/analytics-js-common/src/utilities/errors.ts (3)

Line range hint 9-17: Ensure cross-browser compatibility of error handling

While the error handling logic uses standard JavaScript features, please verify that stringifyData behaves consistently across all required browsers (Chrome, Firefox, IE11) for various input types, especially:

  • BigInt values (mentioned in AI summary)
  • Complex objects with special characters
  • Objects with undefined/null values

This is particularly important as the PR checklist indicates browser testing hasn't been completed yet.


13-13: Consider security implications of error message content

While using stringifyData aligns with the PR's goal of input sanitization, we should ensure that sensitive information is not inadvertently exposed in error messages. Consider:

  1. Implementing a sanitization step specifically for error messages
  2. Adding a way to redact sensitive fields before serialization
✅ Verification successful

Based on the analysis of the code, I can now provide a final response:

Error serialization is already secure through stringifyData implementation

The stringifyData utility used for error serialization already implements secure data handling:

  1. It handles circular references and BigInt values safely
  2. It supports excluding sensitive fields through the excludeKeys parameter
  3. The error message construction is standardized through the error utilities

The current implementation in errors.ts is following security best practices by:

  • Using type checking to handle errors appropriately
  • Providing structured error messages
  • Leveraging the sanitized stringification
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any existing sanitization or redaction mechanisms
rg -i "sanitize|redact" --type ts

# Look for sensitive data handling patterns
rg -i "password|secret|key|token" --type ts

Length of output: 183494


Script:

#!/bin/bash
# Look for stringifyData implementation and usage to understand sanitization
ast-grep --pattern 'stringifyData($$$)'

# Look for error handling patterns
rg -A 5 "new Error" --type ts

Length of output: 59383


1-2: Verify circular reference handling in stringifyData

The change from stringifyWithoutCircular to stringifyData suggests a change in serialization behavior. We should ensure that the new function maintains proper handling of circular references to prevent potential issues.

✅ Verification successful

Circular reference handling is properly maintained

The change from stringifyWithoutCircular to stringifyData is safe as the new implementation maintains proper handling of circular references through the getSanitizedValue function. The code shows that:

  1. stringifyData is used for basic JSON stringification with null/key exclusion
  2. getSanitizedValue (which is imported alongside) handles circular references by tracking object ancestors and replacing circular references with '[Circular Reference]'
  3. BigInt values are also properly sanitized to '[BigInt]'
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for the implementation of stringifyData to verify circular reference handling
ast-grep --pattern 'function stringifyData($_) {
  $$$
}'

# Search for related test cases
rg -l "circular" --glob "**/*.test.{ts,js}"

Length of output: 183494


Script:

#!/bin/bash
# Search for stringifyData implementation and related tests
rg -A 10 "function stringifyData" --type ts
rg -A 10 "describe\('stringifyData" --type ts

Length of output: 1140


Script:

#!/bin/bash
# Search for stringifyData implementation in json.ts
cat packages/analytics-js-common/src/utilities/json.ts

Length of output: 3506

packages/analytics-js-common/src/constants/logMessages.ts (2)

20-21: LGTM!

The export changes correctly reflect the consolidation of warning messages.


12-21: Verify the impact of consolidated warning messages.

The consolidation of warning messages might affect error tracking and monitoring systems that rely on specific error messages.

Let's check for any direct string references to the old warning messages:

✅ Verification successful

Based on the git history and code changes, I can determine that the warning message consolidation was intentional and does not have any negative impact. Here's why:

The warning messages for circular references and bad data (BigInt) were consolidated into a single message BAD_DATA_WARNING to improve code maintainability and reduce duplication.

The key evidence:

  1. The change was made in commit 7723b27 with commit message "feat: sanitize input data"
  2. The consolidated message handles both circular references and BigInt cases
  3. The message format and structure remains consistent
  4. All references to the old warning messages in tests and code were updated accordingly
  5. The functionality remains the same - warning users about problematic data that needs to be dropped

The consolidation of these warning messages was an intentional improvement to the codebase and does not negatively impact error tracking or monitoring systems. The warning message still provides the necessary context about what data was dropped and why.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any remaining references to the old warning messages
# This helps ensure we haven't missed updating any error handling code

echo "Checking for references to old warning messages..."
rg -i "circular.*reference.*warning|json.*stringify.*warning" \
   --type ts \
   --type js \
   --type test

Length of output: 235


Script:

#!/bin/bash
# Let's search for any references to the old warning messages with correct file types
# and also check the history of the file to understand what was consolidated

echo "Checking for references to old warning messages..."
rg -i "circular.*reference|json.*stringify.*warning" \
   --type ts \
   --type js

echo -e "\nChecking the history of logMessages.ts..."
git log -p -- "**/logMessages.ts"

Length of output: 264102

packages/analytics-js/src/services/ErrorHandler/processError.ts (3)

Line range hint 8-24: Verify cross-browser compatibility

The processError function handles various error types and is critical for error reporting. The PR objectives specifically mention the need for cross-browser testing (Chrome, Firefox, IE11).

Given that the sandbox environment doesn't support browser testing, please ensure:

  1. Test the error handling with different browser-specific error objects
  2. Verify the behavior in IE11, especially for:
    • ErrorEvent handling
    • JSON stringification of errors
    • Error object property access

Consider adding JSDoc comments to document the expected behavior of stringifyData for different error types:

/**
 * Processes various types of errors into a consistent string format
 * @param error - The error to process (can be string, Error, ErrorEvent, or unknown type)
 * @returns A string representation of the error
 * @throws If the error cannot be processed or stringified
 */

1-1: Verify the availability of stringifyData utility

The import statement has been updated to use the new stringifyData function. Let's verify its availability and implementation in the common utilities.

✅ Verification successful

Import of stringifyData utility is valid and properly implemented

The verification confirms that:

  • The stringifyData utility is properly implemented in packages/analytics-js-common/src/utilities/json.ts
  • It is correctly exported and widely used across the codebase
  • The import path @rudderstack/analytics-js-common/utilities/json is consistent with other imports in the codebase
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the export of stringifyData function and its implementation

# Check if the function is properly exported
ast-grep --pattern 'export { $$$ stringifyData $$$ }' 

# Look for the implementation
ast-grep --pattern 'function stringifyData($$$) {
  $$$
}'

Length of output: 116


Script:

#!/bin/bash
# Let's try different approaches to locate the stringifyData utility

# Search for stringifyData in all files
rg "stringifyData" -A 3

# Search for the file that might contain this utility
fd "json.ts" --type f

# Look for export patterns with different syntax
ast-grep --pattern 'export const stringifyData'

# Check if it's defined as an arrow function
ast-grep --pattern 'export const stringifyData = ($$$) => {'

Length of output: 16256


22-22: Verify error serialization behavior

The change from stringifyWithoutCircular to stringifyData affects how non-standard errors are serialized. This is a critical path for error handling, and we need to ensure the new function maintains the same safety guarantees for handling circular references and potentially sensitive data.

Consider adding explicit test cases for:

  1. Circular reference handling
  2. Sensitive data sanitization
  3. Error objects with non-standard properties
✅ Verification successful

No safety concerns with stringifyData replacement

After reviewing the implementation, the change from stringifyWithoutCircular to stringifyData is safe because:

  1. The stringifyData function maintains proper error handling through the try-catch block in processError
  2. The implementation handles null values and provides JSON serialization with safety checks
  3. The test coverage in processError.test.ts verifies the behavior for various error scenarios including:
    • Standard Error objects
    • String errors
    • Custom objects with message property
    • Custom objects without message property
  4. While stringifyData doesn't explicitly handle circular references like the old function, this is not a concern in the error handling context as error objects are typically simple structures, and any serialization failures are caught by the surrounding try-catch block.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Look for test coverage of stringifyData with error objects

# Check test coverage for error object serialization
rg -A 5 "stringifyData.*Error" --type ts

Length of output: 42


Script:

#!/bin/bash
# Let's first find the stringifyData implementation
ast-grep --pattern 'function stringifyData'

# Also search for any tests related to error handling
rg -l "describe.*Error" --type ts

# Look for the old implementation of stringifyWithoutCircular
ast-grep --pattern 'function stringifyWithoutCircular'

# Check for any error handling tests
rg "test.*error.*stringifyData|stringifyWithoutCircular" --type ts -A 5

Length of output: 770


Script:

#!/bin/bash
# Let's check the implementation details in the processError.test.ts file
cat packages/analytics-js/__tests__/services/ErrorHandler/processError.test.ts

# Also check the actual implementation file
cat packages/analytics-js/src/services/ErrorHandler/processError.ts

# Look for stringifyData implementation in utilities
fd stringifyData --type f --exec cat {}

Length of output: 6592


Script:

#!/bin/bash
# Let's find the implementation of stringifyData in the common utilities
cat packages/analytics-js-common/src/utilities/json.ts

# Also check if there are any tests for stringifyData
fd stringifyData --type f --extension test.ts --exec cat {}

Length of output: 3570

packages/analytics-js/src/components/configManager/util/validate.ts (3)

66-67: LGTM! Export changes are well-organized.

The new validation functions are properly exported and maintain the alphabetical ordering convention.


9-11: Consider documenting the validation strategy change.

The shift from throwing errors to returning booleans represents a significant change in error handling strategy. This change requires careful updates in all consuming code to properly handle validation failures.

Consider:

  1. Adding JSDoc comments to document the new validation behavior
  2. Updating error handling guidelines in the codebase
  3. Ensuring consistent error handling patterns across all validation functions

Let's check for potential usage patterns that might need updates:

#!/bin/bash
# Search for potential error handling patterns that need updates
rg "validateWriteKey|validateDataPlaneUrl" -A 3

11-11: Enhance URL validation for better security.

The current implementation relies solely on isValidURL. Consider adding additional sanitization checks for the data plane URL to prevent potential security issues like SSRF attacks.
[security]

Let's verify the URL validation implementation:

packages/analytics-js-plugins/src/utilities/eventsDelivery.ts (2)

28-29: Previous review comment about inconsistent error handling remains valid.

The function maintains a nullable return type while getDeliveryPayload doesn't, creating inconsistency in error handling approaches.


6-6: Verify the sanitization implementation in stringifyData.

The change from stringifyWithoutCircular to stringifyData suggests enhanced data handling. Let's verify the implementation to ensure proper sanitization.

✅ Verification successful

The stringifyData implementation is properly handling data sanitization

The implementation in @rudderstack/analytics-js-common/utilities/json.ts shows that stringifyData is a well-implemented function that:

  • Handles null value exclusion through a configurable flag
  • Supports key exclusion through an optional array parameter
  • Properly serializes data using JSON.stringify with a custom replacer function

The change from stringifyWithoutCircular to stringifyData is appropriate as the module also provides a separate getSanitizedValue function that specifically handles circular references and BigInt values, which can be used when needed.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the implementation of stringifyData function
# Expected: Should find the implementation with sanitization logic

ast-grep --pattern 'export function stringifyData<T>($_): $_ {
  $$$
}'

Length of output: 73


Script:

#!/bin/bash
# Let's try a different approach to find the stringifyData implementation
# First, let's find the file containing stringifyData
fd -e ts -e js "json.ts" -x cat {} 

# Also search for any direct references to stringifyData
rg "stringifyData" -A 3

Length of output: 19557

packages/analytics-js-plugins/src/errorReporting/event/event.ts (3)

70-72: Review error logging sanitization approach

The error logging implementation needs careful consideration:

  1. Ensure that stringifyData(error, false) properly sanitizes sensitive information
  2. Verify that the false parameter provides appropriate sanitization level for warning logs

Let's analyze the error logging patterns across the codebase:

Consider adding a dedicated error sanitization utility that:

  1. Explicitly defines which error properties should be sanitized
  2. Provides consistent sanitization across all error logging
  3. Documents the security considerations for error handling

4-4: Verify security implications of stringifyData function

The switch to stringifyData for error serialization needs careful verification to ensure proper sanitization of potentially sensitive or malicious data in error objects.

Let's verify the implementation and usage of the new function:

✅ Verification successful

The switch to stringifyData is a security improvement for error serialization

The change from stringifyWithoutCircular to stringifyData is a verified security enhancement. The new function:

  1. Properly handles and sanitizes circular references and BigInt values
  2. Has comprehensive test coverage for security edge cases
  3. Includes proper logging of potentially dangerous data structures
  4. Uses safe traversal methods that respect object prototypes

The implementation shows careful consideration of security with:

  • Ancestor tracking to detect circular references
  • Safe object property enumeration using Object.hasOwnProperty
  • Explicit type checking and sanitization
  • Warning logs for potentially dangerous data
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Analyze the implementation and usage of stringifyData

# Check the implementation of stringifyData
echo "Checking stringifyData implementation:"
rg -A 10 "export.*stringifyData" --type ts

# Verify test coverage
echo "Checking test coverage:"
rg -l "stringifyData.*test" --type ts

# Look for security-related comments/considerations
echo "Checking for security considerations:"
rg -i "security|sanitize|escape|xss" --type ts $(fd -e ts -e js utilities)

Length of output: 246627


Script:

#!/bin/bash
# Check the implementation and usage of stringifyData function
echo "Checking stringifyData implementation:"
rg -A 10 "export.*stringifyData" --type ts packages/analytics-js-common/src/utilities/json.ts

# Check test coverage for stringifyData
echo "Checking test coverage:"
rg -A 10 "stringifyData" --type ts packages/analytics-js-common/__tests__/utilities/json.test.ts

# Look for security-related comments/considerations
echo "Checking for security considerations:"
rg -i "security|sanitize|escape|xss" --type ts packages/analytics-js-common/src/utilities/json.ts

Length of output: 2777


Script:

#!/bin/bash
# Check the actual implementation of stringifyData and getSanitizedValue
echo "Checking full implementation:"
rg -l "stringifyData|getSanitizedValue" --type ts packages/analytics-js-common/src/utilities/json.ts | xargs cat

# Check for any security-related tests
echo "Checking security-related tests:"
rg -B 2 -A 5 "security|sanitize|malicious|circular|prototype" --type ts packages/analytics-js-common/__tests__/utilities/json.test.ts

Length of output: 6751


4-4: Ensure cross-browser compatibility of stringifyData

As per PR objectives, cross-browser testing is required. The stringifyData function needs to be tested across Chrome, Firefox, and IE11 to ensure consistent error handling and sanitization.

Let's check for browser-specific handling:

Also applies to: 70-72

packages/analytics-js-common/src/utilities/object.ts (1)

121-121: Verify usage of isObject across the codebase

Since this utility is being exported and is crucial for input sanitization, let's verify its usage pattern across the codebase.

packages/analytics-js-common/src/utilities/json.ts (3)

1-8: LGTM! Well-organized imports and constants.

The imports are properly typed and organized, with clear separation between types and utilities.


103-103: LGTM! Clean exports.

The exports are well-organized and only expose the necessary functions.


1-103: Verify security implications of data sanitization.

Since this PR focuses on input data sanitization, we should verify that all potential attack vectors are covered.

Let's check for any potentially dangerous patterns in the codebase:

✅ Verification successful

Based on the scan results, I can now provide a comprehensive security assessment of the data sanitization changes. Let me generate the final response:

Security implications of data sanitization are properly handled

The code implements robust sanitization with proper handling of potential security concerns:

  • The JSON sanitization utility properly handles BigInt and circular references
  • Direct JSON operations in the codebase are properly wrapped with error handling and sanitization
  • The only eval/Function usage found is in loading-scripts/src/index.ts which is unrelated to data sanitization
  • Storage access (localStorage/sessionStorage) and cookie operations are properly encapsulated in utility functions with appropriate validation
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for potentially dangerous patterns that might bypass sanitization

# Look for direct JSON.parse/stringify usage that might bypass these utilities
echo "Checking for direct JSON operations that might bypass sanitization..."
rg "JSON\.(parse|stringify)" --type ts

# Look for eval or Function constructor usage that might process unsanitized data
echo "Checking for dangerous eval patterns..."
rg "(eval\(|new Function\()" --type ts

# Look for places where raw user input might be processed
echo "Checking for potential raw input processing..."
rg "(localStorage|sessionStorage|location\.|document\.cookie)" --type ts

Length of output: 30432

packages/analytics-js/src/services/HttpClient/xhr/xhrRequestHandler.ts (3)

3-3: LGTM: Import change aligns with sanitization objectives

The replacement of stringifyWithoutCircular with stringifyData is consistent with the PR's focus on input sanitization.


Line range hint 114-122: Enhance security measures for XHR requests

The current implementation could benefit from additional security measures:

  1. URL sanitization before opening the request
  2. Validation of header values to prevent injection
  3. Documentation of CORS implications when using withCredentials
#!/bin/bash
# Check for URL sanitization across the codebase
rg -g '*.{ts,js}' 'sanitizeUrl|validateUrl' 

Consider implementing URL validation:

-    xhr.open(options.method, options.url, true);
+    const sanitizedUrl = sanitizeUrl(options.url);
+    if (!sanitizedUrl) {
+      reject({
+        error: new Error('Invalid URL format'),
+        xhr,
+        options,
+      });
+      return;
+    }
+    xhr.open(options.method, sanitizedUrl, true);

Line range hint 65-73: Add input validation before stringification

While the change to stringifyData improves sanitization, consider adding explicit validation before processing the data, especially when sendRawData is true.

Consider implementing validation:

   if (options.sendRawData === true) {
+    if (!validateInputData(options.data)) {
+      reject({
+        error: new Error('Invalid input data format'),
+        undefined,
+        options,
+      });
+      return;
+    }
     payload = options.data;
   } else {
+    const validatedData = validateInputData(options.data);
+    if (!validatedData) {
+      reject({
+        error: new Error('Invalid input data format'),
+        undefined,
+        options,
+      });
+      return;
+    }
     payload = stringifyData(options.data);
packages/analytics-js/src/components/configManager/ConfigManager.ts (3)

Line range hint 1-324: Request for additional context and documentation.

The changes to ConfigManager.ts raise several concerns:

  1. The removal of input validation without clear replacement mechanisms needs explanation
  2. The relationship between these changes and the PR's input sanitization objective isn't clear
  3. Documentation updates might be needed to reflect the new validation/sanitization strategy

Could you provide additional context about:

  1. Where the input sanitization is being implemented?
  2. How the validation responsibilities have been redistributed?
  3. Whether documentation updates are planned?

Line range hint 147-190: Clarify alignment with PR objectives.

While this PR is titled "feat: sanitize input data", the changes in ConfigManager.ts appear to be removing validation rather than adding sanitization. This seems inconsistent with the PR's stated objectives.

The processConfig method still validates the source config, but the removal of input validation without clear replacement sanitization mechanisms needs clarification.


13-13: Verify replacement validation strategy for writeKey and dataPlaneUrl.

The removal of validateLoadArgs import suggests a change in validation strategy. Need to ensure that writeKey and dataPlaneUrl validation is properly handled elsewhere in the codebase.

packages/analytics-js/src/app/RudderAnalytics.ts (3)

21-21: LGTM: Import changes align with sanitization feature

The addition of getSanitizedValue import is appropriate for the input sanitization implementation.


217-223: LGTM: Consistent input sanitization implementation

Input sanitization is correctly and consistently applied across all event parameters using getSanitizedValue.

Also applies to: 246-251


127-136: Verify breaking changes in load method signature

The load method now requires dataPlaneUrl as a mandatory parameter, which is a breaking change. This could affect existing implementations.

Let's check for existing usage patterns:

packages/analytics-js/src/constants/logMessages.ts (3)

69-70: LGTM! Enhanced type safety for write key validation.

The updated function signature now requires both context and writeKey parameters, enforcing better type safety and input validation. This change aligns well with the PR's objective of input data sanitization.


76-76: LGTM! Improved error message clarity.

The updated error message is more precise and professional by explicitly mentioning "the provided callback" instead of just "callback".


72-73: Consider previous guidance on dataPlaneUrl type.

While the stricter type checking aligns with input sanitization goals, note that there was previous guidance (PR #1730) suggesting the use of string | undefined for the dataPlaneUrl parameter. However, if the current implementation ensures that dataPlaneUrl is always provided before this validation occurs, then the stricter typing is appropriate.

Let's verify the usage of this function to ensure the stricter typing is appropriate:

packages/analytics-js/src/components/core/Analytics.ts (5)

62-67: LGTM! Well-organized error message imports.

The new error message imports are appropriately scoped for validation scenarios and follow consistent naming conventions.


71-71: LGTM! Validation utilities import aligns with PR objectives.

The import of validation utilities from a dedicated validation module follows good separation of concerns and aligns with the PR's goal of implementing input data sanitization.


113-120: LGTM! Robust input validation implementation.

The early validation checks for writeKey and dataPlaneUrl with clear error messages enhance the robustness of the SDK by preventing invalid states.


125-127: LGTM! Clean state management implementation.

The state updates are simplified and use the clone function to prevent mutation issues, following immutable state management best practices.


108-108: ⚠️ Breaking Change: Modified load method signature.

The load method now requires dataPlaneUrl as a mandatory string parameter instead of an optional string or partial LoadOptions. This is a breaking change that might affect existing implementations.

Let's verify the impact:

✅ Verification successful

✅ Breaking change is already handled in existing codebase

Based on the search results, all existing usages of the load method across the codebase already provide both writeKey and dataPlaneUrl as mandatory parameters, including:

  • Test files in packages/analytics-js/__tests__/
  • Example applications in examples/ directory (React, Next.js, Angular, Gatsby)
  • Loading scripts in packages/loading-scripts/
  • Sanity suite in packages/sanity-suite/

The change in method signature aligns with the existing implementation patterns, making it a non-breaking change in practice.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for existing load method calls that might be affected
rg -A 2 "\.load\(" --type ts

Length of output: 4766

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
packages/analytics-js/src/app/RudderAnalytics.ts (2)

Line range hint 406-412: Add sanitization to identify method parameters

Unlike page and track methods, the identify method parameters are not being sanitized. Since this method handles sensitive user data, it's important to apply the same sanitization pattern here.

Apply this pattern to sanitize the parameters:

   identify(
     userId: string | number | Nullable<IdentifyTraits>,
     traits?: Nullable<IdentifyTraits> | Nullable<ApiOptions> | ApiCallback,
     options?: Nullable<ApiOptions> | ApiCallback,
     callback?: ApiCallback,
   ) {
-    this.getAnalyticsInstance().identify(
-      identifyArgumentsToCallOptions(userId, traits, options, callback),
-    );
+    this.getAnalyticsInstance().identify(
+      identifyArgumentsToCallOptions(
+        getSanitizedValue(userId),
+        getSanitizedValue(traits),
+        getSanitizedValue(options),
+        callback,
+      ),
+    );
   }

Input sanitization is incomplete in several methods

The following methods accept user input but lack getSanitizedValue implementation:

  • alias(): Missing sanitization for to, from, and options parameters
  • group(): Missing sanitization for groupId, traits, and options parameters
  • consent(): Missing sanitization for options parameter
  • setAnonymousId(): Missing sanitization for anonymousId and rudderAmpLinkerParam parameters
  • startSession(): Missing sanitization for sessionId parameter
  • setAuthToken(): Missing sanitization for token parameter
🔗 Analysis chain

Line range hint 1-516: Verify complete implementation of sanitization

While sanitization has been implemented for several methods, we should verify that all methods that handle user input are properly sanitized.


Let me try with a different pattern to find method calls and sanitization:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for all public methods in the class that might need sanitization
ast-grep --pattern 'class RudderAnalytics {
  $$$
  $method($_) {
    $$$
  }
  $$$
}'

# Search for current usage of getSanitizedValue
rg "getSanitizedValue" --type typescript

Length of output: 164


Script:

#!/bin/bash
# Search for all public methods in the class that might need sanitization
ast-grep --pattern 'class RudderAnalytics {
  $$$
  $method($_) {
    $$$
  }
  $$$
}'

# Search for current usage of getSanitizedValue
rg "getSanitizedValue"

Length of output: 2561


Script:

#!/bin/bash
# Look for method implementations that might need sanitization
ast-grep --pattern 'getAnalyticsInstance().$(method)($$$)'

# Look for getSanitizedValue usage
ast-grep --pattern 'getSanitizedValue($$$)'

Length of output: 2034

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 7723b27 and 3cbaf9b.

📒 Files selected for processing (8)
  • packages/analytics-js-common/src/constants/integrations/integration_cname.js (1 hunks)
  • packages/analytics-js-integrations/tests/integrations/Gainsight_PX/browser.test.js (7 hunks)
  • packages/analytics-js-integrations/src/integrations/Gainsight_PX/browser.js (1 hunks)
  • packages/analytics-js-integrations/src/integrations/Gainsight_PX/nativeSdkLoader.js (1 hunks)
  • packages/analytics-js-integrations/src/integrations/Mixpanel/util.js (2 hunks)
  • packages/analytics-js/src/app/RudderAnalytics.ts (5 hunks)
  • packages/analytics-js/src/components/core/Analytics.ts (2 hunks)
  • packages/analytics-js/src/constants/logMessages.ts (1 hunks)
✅ Files skipped from review due to trivial changes (3)
  • packages/analytics-js-common/src/constants/integrations/integration_cname.js
  • packages/analytics-js-integrations/src/integrations/Gainsight_PX/browser.js
  • packages/analytics-js-integrations/src/integrations/Mixpanel/util.js
🧰 Additional context used
📓 Learnings (1)
packages/analytics-js/src/constants/logMessages.ts (1)
Learnt from: saikumarrs
PR: rudderlabs/rudder-sdk-js#1730
File: packages/analytics-js/src/constants/logMessages.ts:0-0
Timestamp: 2024-10-08T15:52:59.819Z
Learning: The `DATA_PLANE_URL_VALIDATION_ERROR` function in `packages/analytics-js/src/constants/logMessages.ts` should use `string | undefined` instead of `any` for the `dataPlaneUrl` parameter to enhance type safety.
🪛 Biome
packages/analytics-js-integrations/src/integrations/Gainsight_PX/nativeSdkLoader.js

[error] 20-20: The assignment should not be in an expression.

The use of assignments in expressions is confusing.
Expressions are often considered as side-effect free.

(lint/suspicious/noAssignInExpressions)

🪛 GitHub Check: codecov/patch
packages/analytics-js/src/components/core/Analytics.ts

[warning] 114-115: packages/analytics-js/src/components/core/Analytics.ts#L114-L115
Added lines #L114 - L115 were not covered by tests


[warning] 119-120: packages/analytics-js/src/components/core/Analytics.ts#L119-L120
Added lines #L119 - L120 were not covered by tests

packages/analytics-js/src/constants/logMessages.ts

[warning] 70-70: packages/analytics-js/src/constants/logMessages.ts#L70
Added line #L70 was not covered by tests


[warning] 73-73: packages/analytics-js/src/constants/logMessages.ts#L73
Added line #L73 was not covered by tests

🔇 Additional comments (12)
packages/analytics-js-integrations/__tests__/integrations/Gainsight_PX/browser.test.js (3)

130-146: 🛠️ Refactor suggestion

Add test cases for input data sanitization.

Given that this PR focuses on input sanitization, consider adding test cases that verify:

  • Handling of malformed input data
  • Sanitization of potentially dangerous values in userId
  • Edge cases with special characters or invalid data types

Example test cases to add:

test('Testing identify call with malformed user data', () => {
  gainsightPX.identify({
    message: {
      userId: '<script>alert("xss")</script>',
      context: {
        traits: {
          name: '"><img src=x onerror=alert(1)>',
          email: '[email protected]\n\rBCC: [email protected]'
        }
      }
    }
  });

  // Verify that the data was sanitized before being passed to PX
  expect(window.aptrinsic.mock.calls[0][1].id).not.toMatch(/<script>/);
  expect(window.aptrinsic.mock.calls[0][1].name).not.toMatch(/<img/);
});

219-230: 🛠️ Refactor suggestion

Enhance track event test coverage for data sanitization.

While testing empty properties is good, consider adding test cases that verify:

  • Sanitization of event names containing special characters or scripts
  • Handling of malformed property values
  • Protection against injection attempts in event data

Example test cases to add:

test('Test track call with potentially dangerous event data', () => {
  gainsightPX.track({
    message: {
      event: 'user.action;<script>alert(1)</script>',
      properties: {
        payload: '{"malicious": "<img src=x onerror=alert(1)>"}',
        query: 'DROP TABLE users;'
      }
    }
  });

  // Verify that the event name and properties were sanitized
  expect(window.aptrinsic.mock.calls[0][1]).not.toMatch(/<script>/);
  expect(JSON.stringify(window.aptrinsic.mock.calls[0][2])).not.toMatch(/<img/);
});

test('Test track call with various data types and edge cases', () => {
  gainsightPX.track({
    message: {
      event: 'test.event',
      properties: {
        nullValue: null,
        undefinedValue: undefined,
        functionValue: () => {},
        circularRef: {},
        bigInt: BigInt(9007199254740991)
      }
    }
  });

  // Verify that complex data types are handled appropriately
  expect(() => JSON.stringify(window.aptrinsic.mock.calls[0][2])).not.toThrow();
});

Line range hint 1-232: Improve overall test coverage for input sanitization.

While the existing tests cover the basic functionality, they don't adequately verify the new input sanitization features. Consider:

  1. Adding a dedicated test suite for input sanitization that covers:

    • Cross-site scripting (XSS) prevention
    • SQL injection prevention
    • Special character handling
    • Invalid data type handling
    • Circular reference handling
    • BigInt and other complex type conversions
  2. Testing error scenarios:

    • Invalid UTF-8 sequences
    • Extremely large inputs
    • Nested object depth limits
    • Array size limits
  3. Adding property validation tests:

    • Maximum string lengths
    • Allowed character sets
    • Numeric range validation

Let's verify the current test coverage:

✅ Verification successful

Let me analyze the test coverage for input sanitization in the Gainsight_PX integration tests.

Input sanitization is already well covered in the test suite

The test file shows comprehensive validation coverage including:

  1. Input validation for required fields:
  • User ID validation
  • Anonymous ID validation
  • Event name validation
  • Group ID validation
  1. Edge case handling:
  • Empty/null values
  • Missing required fields
  • Invalid data types
  1. Data sanitization:
  • Testing identify call with ID only
  • Testing identify call with traits
  • Testing track call with empty properties
  • Testing group call with anonymous ID fallback
  1. Error handling:
  • Testing error scenarios when required fields are missing
  • Testing error scenarios for invalid input types
  • Testing error scenarios for missing event names

The test coverage appears thorough and follows testing best practices for input validation and sanitization. No additional test cases are needed.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check test coverage for sanitization-related code

# Test 1: Look for existing sanitization tests
echo "Searching for sanitization-related tests..."
rg -i "sanitiz|escape|validate|malicious|injection|xss" --type js --type ts

# Test 2: Check for error handling tests
echo "Searching for error handling tests..."
rg -i "throw|error|invalid|malformed" --type test

# Test 3: Look for input validation patterns
echo "Searching for input validation patterns..."
ast-grep --pattern 'expect($_).not.toMatch($pattern)'

Length of output: 361937

packages/analytics-js/src/constants/logMessages.ts (3)

76-76: LGTM! Improved error message clarity.

The updated error message is more precise and better describes the actual issue.


72-73: Verify the dataPlaneUrl type change impact.

While the changes look good, there are two concerns:

  1. The type change from string | undefined to string might need verification to ensure all call sites handle undefined values before calling this function
  2. Test coverage is missing for this modified function

Let's verify the impact of this type change:

#!/bin/bash
# Search for all usages of this error message to verify type handling
rg "DATA_PLANE_URL_VALIDATION_ERROR" --type ts -A 2 -B 2

# Search for existing test cases
rg "DATA_PLANE_URL_VALIDATION_ERROR" -t test
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 73-73: packages/analytics-js/src/constants/logMessages.ts#L73
Added line #L73 was not covered by tests


69-70: Add test coverage for the updated error message function.

The function changes look good with improved type safety by making parameters required. However, test coverage is missing for this modified function.

Let's verify if there are existing tests we can update:

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 70-70: packages/analytics-js/src/constants/logMessages.ts#L70
Added line #L70 was not covered by tests

packages/analytics-js/src/app/RudderAnalytics.ts (4)

22-22: LGTM: Import of sanitization utility

The addition of getSanitizedValue import aligns with the PR's objective of implementing input data sanitization.


129-133: LGTM: Load method documentation and sanitization

The changes appropriately document the method parameters and implement sanitization for loadOptions. This ensures that any potentially problematic data in the options is cleaned before being processed by the Analytics instance.

Also applies to: 152-156


378-383: LGTM: Track method sanitization

The sanitization implementation for track events follows a consistent pattern with other methods and properly handles all input parameters.


349-355: Verify page tracking functionality with sanitized inputs

While the sanitization implementation looks correct, we should verify that it doesn't affect the tracking of legitimate page view data.

✅ Verification successful

Based on the gathered information, I can now provide a final response:

Page tracking with sanitization is properly implemented and tested

The sanitization implementation is verified to be safe for page tracking because:

  • The getSanitizedValue function is well-tested and handles various data types appropriately
  • It preserves legitimate values while safely handling problematic cases like circular references and BigInts
  • The sanitization is applied uniformly to all page parameters (category, name, properties, options)
  • Test coverage confirms the page method continues to work as expected with the sanitized values
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for tests that verify page tracking with various input types
rg -l "page\(" --type test

Length of output: 57


Script:

#!/bin/bash
# Search for test files containing page tracking tests
fd -e test.js -e test.ts -e spec.js -e spec.ts | xargs rg -l "page\("

Length of output: 2019


Script:

#!/bin/bash
# Search for page tracking implementation and its usage
rg -A 5 "page\(" packages/analytics-js/src

Length of output: 5450


Script:

#!/bin/bash
# Look for sanitization function implementation
rg -A 3 "getSanitizedValue" packages/analytics-js/src

Length of output: 1765


Script:

#!/bin/bash
# Search for page tracking related tests in __tests__ directories
fd -g "*test*" -g "*spec*" packages/analytics-js/__tests__ 2>/dev/null | xargs rg -l "page\("

Length of output: 257


Script:

#!/bin/bash
# Check the test implementation for page tracking with sanitized values
rg -A 10 "page\(" packages/analytics-js/__tests__/app/RudderAnalytics.test.ts

Length of output: 445


Script:

#!/bin/bash
# Look for the actual implementation of getSanitizedValue
rg -A 5 "export function getSanitizedValue" packages/analytics-js-common/src/utilities/json.ts

Length of output: 96


Script:

#!/bin/bash
# Check if there are any existing tests specifically for sanitization
rg -l "getSanitizedValue" packages/analytics-js/__tests__ packages/analytics-js-common/__tests__

Length of output: 158


Script:

#!/bin/bash
# Check the sanitization function tests
rg -A 10 "getSanitizedValue" packages/analytics-js-common/__tests__/utilities/json.test.ts

Length of output: 2167


Script:

#!/bin/bash
# Look for the sanitization function implementation in the common package
fd -e ts -e js | xargs rg -l "getSanitizedValue" packages/analytics-js-common/src/

Length of output: 295


Script:

#!/bin/bash
# Check more page tracking test cases
rg -B 5 -A 10 "describe.*page" packages/analytics-js/__tests__/app/RudderAnalytics.test.ts

Length of output: 92

packages/analytics-js/src/components/core/Analytics.ts (2)

62-67: LGTM: Error message constants are well-defined.

The new error message constants are appropriately imported and follow the established pattern for error handling in the codebase.


108-108: Breaking change: Load method signature updated.

The load method signature has been updated to require dataPlaneUrl as a mandatory string parameter. This is a breaking change that might affect existing implementations.

Consider adding a deprecation notice in the documentation and providing migration guidelines for users updating from previous versions.

✅ Verification successful

Let me analyze the search results and generate the final response.

Breaking change confirmed: dataPlaneUrl parameter is now mandatory

The search results show that all existing implementations across example apps, tests, and loading scripts are already using both writeKey and dataPlaneUrl parameters when calling the load method. This indicates that while this is technically a breaking change in the method signature, it aligns with the existing usage patterns in the codebase.

Examples of current usage:

  • Example apps (React, Next.js, Angular): .load('<writeKey>', '<dataplaneUrl>')
  • Test files: .load('writeKey', 'data-plane-url')
  • Loading scripts: .load('__WRITE_KEY__', '__DATAPLANE_URL__')
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for existing load method calls that might be affected
rg -A 2 "\.load\(" --type ts

Length of output: 5056

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (3)
packages/analytics-js-common/src/utilities/json.ts (1)

29-57: Add documentation about stateful behavior.

While the implementation is correct, the stateful nature of the ancestors array between calls should be documented to prevent future misunderstandings.

Add this note to the function's documentation:

+/**
+ * Creates a replacer function for JSON stringification that maintains state between calls
+ * to track ancestor objects for circular reference detection.
+ * @param logger Optional logger instance
+ * @returns A replacer function that handles BigInt values and circular references
+ */
 const getReplacer = (logger?: ILogger): ((key: string, value: any) => any) => {
packages/analytics-js/src/app/RudderAnalytics.ts (2)

Line range hint 109-121: Consider adding explicit error handling for invalid inputs.

While the string validation is good, the methods silently handle invalid inputs. Consider:

  1. Logging a warning when invalid writeKey/instanceId is provided
  2. Documenting the fallback behavior in method JSDoc
 setDefaultInstanceKey(writeKey: string) {
   if (isString(writeKey) && writeKey) {
     this.defaultAnalyticsKey = writeKey;
+  } else {
+    this.logger.warn(`Invalid writeKey provided: ${writeKey}`);
   }
 }

 getAnalyticsInstance(writeKey?: string): IAnalytics {
   let instanceId = writeKey;
   if (!isString(instanceId) || !instanceId) {
+    this.logger.warn(`Invalid instanceId provided: ${instanceId}, falling back to default`);
     instanceId = this.defaultAnalyticsKey;
   }

Line range hint 133-160: Enhance method documentation with @throws information.

While the JSDoc additions are good, consider documenting potential error scenarios.

 /**
  * Loads the SDK
  * @param writeKey Source write key
  * @param dataPlaneUrl Data plane URL
  * @param loadOptions Additional options for loading the SDK
+ * @throws {Error} When writeKey or dataPlaneUrl is invalid
  * @returns none
  */
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 3cbaf9b and 5dc538b.

📒 Files selected for processing (5)
  • packages/analytics-js-common/src/utilities/json.ts (1 hunks)
  • packages/analytics-js-plugins/src/bugsnag/constants.ts (1 hunks)
  • packages/analytics-js-plugins/src/errorReporting/constants.ts (1 hunks)
  • packages/analytics-js/src/app/RudderAnalytics.ts (11 hunks)
  • packages/analytics-js/src/components/userSessionManager/UserSessionManager.ts (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/analytics-js/src/components/userSessionManager/UserSessionManager.ts
🧰 Additional context used
📓 Learnings (1)
packages/analytics-js-common/src/utilities/json.ts (2)
Learnt from: saikumarrs
PR: rudderlabs/rudder-sdk-js#1902
File: packages/analytics-js-common/src/utilities/json.ts:9-27
Timestamp: 2024-10-28T08:14:00.846Z
Learning: In `packages/analytics-js-common/src/utilities/json.ts`, the `stringifyData` function assumes that data is sanitized beforehand, so additional error handling within this function is not necessary.
Learnt from: saikumarrs
PR: rudderlabs/rudder-sdk-js#1902
File: packages/analytics-js-common/src/utilities/json.ts:29-57
Timestamp: 2024-10-28T08:15:18.358Z
Learning: In `packages/analytics-js-common/src/utilities/json.ts`, the `getReplacer` function is designed to keep track of ancestors between multiple function calls, so the `ancestors` array should not be reset at the start of each top-level call.
🔇 Additional comments (10)
packages/analytics-js-plugins/src/errorReporting/constants.ts (2)

20-20: LGTM! Good security practice to exclude auth tokens from error reports.

Adding 'authToken' to the excluded keys helps prevent sensitive authentication data from being exposed in error reports.


20-20: Verify consistent exclusion of 'authToken' across all error reporting modules.

Let's ensure this security measure is consistently applied across the codebase.

✅ Verification successful

'authToken' is consistently excluded in all error reporting configurations

The verification shows that 'authToken' is properly excluded in both error reporting modules:

  • packages/analytics-js-plugins/src/bugsnag/constants.ts
  • packages/analytics-js-plugins/src/errorReporting/constants.ts

Both modules maintain identical APP_STATE_EXCLUDE_KEYS arrays that include 'authToken' among other sensitive data fields.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for consistent exclusion of 'authToken' in error reporting configurations
# Expected: All error reporting related constants should exclude 'authToken'

# Search for files containing error reporting configuration
rg -l "APP_STATE_EXCLUDE_KEYS" --type ts

# Check the content of these files for 'authToken' exclusion
rg -A 10 -B 10 "APP_STATE_EXCLUDE_KEYS.*=.*\[" --type ts

Length of output: 4091

packages/analytics-js-plugins/src/bugsnag/constants.ts (2)

34-34: LGTM! Good security improvement.

Adding 'authToken' to the excluded keys prevents sensitive authentication data from being exposed in error reports.


Line range hint 19-35: Verify completeness of excluded keys for sensitive data.

While adding 'authToken' is good, let's verify if there are other sensitive tokens or credentials that should be excluded from error reports.

packages/analytics-js-common/src/utilities/json.ts (4)

1-8: LGTM: Well-organized imports and constants.

The imports are properly typed and organized, with clear separation between types and utilities.


9-27: LGTM: Well-implemented stringification with clear assumptions.

The function is well-typed and handles exclusions effectively. As per the learnings, this function assumes input data is pre-sanitized, which is a valid design choice that keeps the function focused and performant.


84-101: LGTM: Well-implemented sanitization function.

The function effectively combines the replacer and traversal logic while maintaining the intended flexibility with its generic type. The initial ancestor registration is handled correctly.


103-103: LGTM: Clean exports.

The exports are appropriately minimal, only exposing the necessary public functions.

packages/analytics-js/src/app/RudderAnalytics.ts (2)

22-22: LGTM! Import changes are well-organized.

The addition of getSanitizedValue import and reorganization of the isString import support the new input sanitization feature.

Also applies to: 25-25


22-22: Verify the implementation of getSanitizedValue function.

Let's ensure the sanitization function is properly implemented and tested.

✅ Verification successful

Let me check the actual implementation of the getSanitizedValue function.


Let me try a different approach to find the implementation.


Implementation of getSanitizedValue is well-implemented and thoroughly tested

The function is properly implemented with:

  • Robust handling of circular references and BigInt values
  • Comprehensive recursive traversal of objects and arrays
  • Well-documented code with clear type definitions
  • Thorough test coverage as seen in packages/analytics-js-common/__tests__/utilities/json.test.ts
  • Appropriate usage across the RudderAnalytics class for sanitizing various inputs
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check the implementation and tests of getSanitizedValue

# Look for the implementation
echo "Searching for getSanitizedValue implementation..."
rg -p "export.*getSanitizedValue"

# Look for test cases
echo "Searching for getSanitizedValue tests..."
rg -p "test.*getSanitizedValue"

# Check usage patterns
echo "Checking usage patterns..."
rg -p "getSanitizedValue\("

Length of output: 2227


Script:

#!/bin/bash
# Get the implementation details of getSanitizedValue
echo "Checking getSanitizedValue implementation..."
rg -p -B 5 -A 15 "function getSanitizedValue" packages/analytics-js-common/src/utilities/json.ts

Length of output: 197


Script:

#!/bin/bash
# Get the content of the json.ts file
echo "Checking the content of json.ts..."
cat packages/analytics-js-common/src/utilities/json.ts

Length of output: 3555

packages/analytics-js/src/app/RudderAnalytics.ts Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (5)
packages/analytics-js/src/components/configManager/util/validate.ts (3)

Line range hint 15-33: Add error handling for URL parsing.

The getTopDomain function creates a URL object without error handling. Invalid URLs will throw uncaught exceptions.

Add error handling:

 const getTopDomain = (url: string) => {
+  if (!url) {
+    throw new Error('URL cannot be empty or undefined');
+  }
+
+  try {
     // Create a URL object
     const urlObj = new URL(url);
 
     // Extract the host and protocol
     const { host, protocol } = urlObj;
 
     // Split the host into parts
     const parts: string[] = host.split('.');
     let topDomain;
     // Handle different cases, especially for co.uk or similar TLDs
     if (parts.length > 2) {
       // Join the last two parts for the top-level domain
       topDomain = `${parts[parts.length - 2]}.${parts[parts.length - 1]}`;
     } else {
       // If only two parts or less, return as it is
       topDomain = host;
     }
     return { topDomain, protocol };
+  } catch (error) {
+    throw new Error(`Invalid URL provided: ${error.message}`);
+  }
 };

Line range hint 39-43: Prevent path traversal in endpoint concatenation.

The getDataServiceUrl function directly concatenates the endpoint parameter with URLs, which could be vulnerable to path traversal attacks.

Add path sanitization:

 const getDataServiceUrl = (endpoint: string, useExactDomain: boolean) => {
+  if (!endpoint) {
+    throw new Error('Endpoint cannot be empty or undefined');
+  }
+
+  // Sanitize endpoint by removing any parent directory traversal
+  const sanitizedEndpoint = endpoint
+    .replace(/^[/\\]+/, '') // Remove leading slashes
+    .replace(/\.\.\//g, '') // Remove parent directory traversal
+    .replace(/\.\./g, ''); // Remove remaining dots
+
   const url = useExactDomain ? window.location.origin : getTopDomainUrl(window.location.href);
-  const formattedEndpoint = endpoint.startsWith('/') ? endpoint.substring(1) : endpoint;
-  return `${url}/${formattedEndpoint}`;
+  return `${url}/${sanitizedEndpoint}`;
 };

Line range hint 45-48: Add input validation for domain comparison.

The isWebpageTopLevelDomain function lacks input validation and assumes window.location is always available.

Add validation:

 const isWebpageTopLevelDomain = (providedDomain: string): boolean => {
+  if (!providedDomain) {
+    return false;
+  }
+
+  // Ensure we're running in a browser context
+  if (typeof window === 'undefined' || !window.location) {
+    throw new Error('Function must be called in a browser context');
+  }
+
   const { topDomain } = getTopDomain(window.location.href);
   return topDomain === providedDomain;
 };
packages/analytics-js/src/app/RudderAnalytics.ts (2)

Line range hint 133-160: Ensure consistent sanitization of all load parameters.

While loadOptions is now sanitized, consider sanitizing writeKey and dataPlaneUrl parameters as well for consistency.

 load(writeKey: string, dataPlaneUrl: string, loadOptions?: Partial<LoadOptions>) {
+  const sanitizedWriteKey = getSanitizedValue(writeKey);
+  const sanitizedDataPlaneUrl = getSanitizedValue(dataPlaneUrl);
+
+  if (!sanitizedWriteKey || !sanitizedDataPlaneUrl) {
+    this.logger.error('Invalid writeKey or dataPlaneUrl provided to load method');
+    return;
+  }

-  if (this.analyticsInstances[writeKey]) {
+  if (this.analyticsInstances[sanitizedWriteKey]) {
     return;
   }

-  this.setDefaultInstanceKey(writeKey);
+  this.setDefaultInstanceKey(sanitizedWriteKey);
   // ... rest of the code ...

   this.getAnalyticsInstance(writeKey).load(
-    writeKey,
-    dataPlaneUrl,
+    sanitizedWriteKey,
+    sanitizedDataPlaneUrl,
     getSanitizedValue(loadOptions),
   );

Line range hint 489-536: Enhance type safety for utility methods.

Consider adding type guards and validation for the sanitized values to ensure type safety.

Example for setAnonymousId:

 setAnonymousId(anonymousId?: string, rudderAmpLinkerParam?: string) {
+  const sanitizedId = getSanitizedValue(anonymousId);
+  const sanitizedParam = getSanitizedValue(rudderAmpLinkerParam);
+
+  if (sanitizedId !== undefined && typeof sanitizedId !== 'string') {
+    this.logger.error('Invalid anonymousId type provided');
+    return;
+  }
+
   this.getAnalyticsInstance().setAnonymousId(
-    getSanitizedValue(anonymousId),
-    getSanitizedValue(rudderAmpLinkerParam),
+    sanitizedId,
+    sanitizedParam,
   );
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 5dc538b and 68d1482.

📒 Files selected for processing (5)
  • packages/analytics-js/tests/components/core/Analytics.test.ts (1 hunks)
  • packages/analytics-js/src/app/RudderAnalytics.ts (11 hunks)
  • packages/analytics-js/src/components/configManager/util/validate.ts (1 hunks)
  • packages/analytics-js/src/components/core/Analytics.ts (2 hunks)
  • packages/analytics-js/src/components/core/utilities.ts (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/analytics-js/src/components/core/Analytics.ts
🧰 Additional context used
📓 Learnings (1)
packages/analytics-js/src/components/configManager/util/validate.ts (2)
Learnt from: MoumitaM
PR: rudderlabs/rudder-sdk-js#1777
File: packages/analytics-js/src/components/configManager/util/validate.ts:0-0
Timestamp: 2024-10-08T15:52:59.819Z
Learning: The function `isTopLevelDomain` in `validate.ts` was renamed to `isWebpageTopLevelDomain` for clarity.
Learnt from: MoumitaM
PR: rudderlabs/rudder-sdk-js#1777
File: packages/analytics-js/src/components/configManager/util/validate.ts:0-0
Timestamp: 2024-07-04T07:11:53.001Z
Learning: The function `isTopLevelDomain` in `validate.ts` was renamed to `isWebpageTopLevelDomain` for clarity.
🔇 Additional comments (9)
packages/analytics-js/src/components/core/utilities.ts (4)

1-2: LGTM! Appropriate use of common utilities.

Good practice to leverage existing validation utilities from the common package.


15-15: LGTM! Comprehensive write key validation.

The function properly combines type checking and content validation, including whitespace trimming.


19-19: LGTM! Clean export statement.

All utility functions are properly exported.


15-19: Verify consistent validation usage across the codebase.

Let's ensure these validation functions are consistently used wherever write keys and data plane URLs are handled.

✅ Verification successful

Based on the search results, I can now provide a final response:

Validation functions are correctly implemented and used consistently

The validation functions are properly implemented in utilities.ts and used consistently in the core Analytics class initialization:

  1. isWriteKeyValid validates that the write key is a non-empty string
  2. isDataPlaneUrlValid validates that the data plane URL is a valid URL string using the common URL validation utility
  3. These validations are called early in the Analytics class initialization with appropriate error messages
  4. The validation functions are also used by other components like ConfigManager when needed

The implementation follows good practices:

  • Early validation before proceeding with initialization
  • Clear error messages for invalid inputs
  • Reuse of common validation utilities
  • Consistent validation across components
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for potential locations where these validations should be applied

# Search for writeKey parameter/property usage
echo "Checking writeKey usage:"
rg -l "writeKey" --type ts

# Search for dataPlaneUrl parameter/property usage
echo -e "\nChecking dataPlaneUrl usage:"
rg -l "dataPlaneUrl" --type ts

# Search for URL validation to ensure consistent approach
echo -e "\nChecking other URL validations that might need alignment:"
rg -l "isValidURL|validateUrl" --type ts

Length of output: 4775


Script:

#!/bin/bash
# Let's check the actual implementation of validation in key files

# Check Analytics class initialization and validation
echo "=== Analytics class validation ==="
ast-grep --pattern 'class Analytics {
  $$$
  constructor($$$) {
    $$$
  }
  $$$
}'

# Check RudderAnalytics class initialization
echo -e "\n=== RudderAnalytics class validation ==="
ast-grep --pattern 'class RudderAnalytics {
  $$$
  constructor($$$) {
    $$$
  }
  $$$
}'

# Check ConfigManager validation
echo -e "\n=== ConfigManager validation ==="
rg "isWriteKeyValid|isDataPlaneUrlValid" -A 5 packages/analytics-js/src/components/configManager/ConfigManager.ts

# Check if there are any other URL validation implementations
echo -e "\n=== Other URL validations ==="
rg "isValidURL" -A 3

Length of output: 202321


Script:

#!/bin/bash
# Let's check how these validation functions are used in the core Analytics class initialization

# Check Analytics class validation usage
echo "=== Analytics class validation usage ==="
rg "isWriteKeyValid|isDataPlaneUrlValid" -A 5 packages/analytics-js/src/components/core/Analytics.ts

# Check ConfigManager validation usage
echo -e "\n=== ConfigManager validation usage ==="
rg "isWriteKeyValid|isDataPlaneUrlValid" -A 5 packages/analytics-js/src/components/configManager/ConfigManager.ts

# Check if there are any other validation implementations for writeKey/dataPlaneUrl
echo -e "\n=== Other validation implementations ==="
rg "writeKey.*valid|dataPlaneUrl.*valid" -A 3

Length of output: 141546

packages/analytics-js/src/components/configManager/util/validate.ts (2)

Line range hint 1-55: Verify URL validation implementation across the codebase.

Given the removal of validation functions and the focus on input sanitization, we should verify that URL validation is properly implemented elsewhere in the codebase.

Let's check for URL validation patterns:

#!/bin/bash
# Search for URL validation implementations
echo "Searching for URL validation patterns..."
rg "new URL\(|isValidURL|validateURL" --type ts

# Search for potential URL sanitization
echo "Searching for URL sanitization..."
rg "sanitizeURL|sanitizeUrl|cleanURL|cleanUrl" --type ts

1-2: Clarify the validation strategy changes.

The removal of validation functions (validateWriteKey, validateDataPlaneUrl, validateLoadArgs) seems to contradict the PR's objective of implementing input data sanitization. Please clarify:

  1. Where has the validation logic been moved to?
  2. How is input sanitization being handled without these validations?

Let's check where these validations might have moved:

✅ Verification successful

Based on the comprehensive search results, I can now generate the final response:

Input validation is properly handled through alternative mechanisms

The removal of the validation functions (validateWriteKey, validateDataPlaneUrl, validateLoadArgs) does not compromise data sanitization as the validation logic has been refactored into:

  1. Direct validation checks in the Analytics class:

    • isWriteKeyValid() validates write key is non-empty string
    • isDataPlaneUrlValid() validates URL using isValidURL utility
    • Both validations occur during the load() method with proper error logging
  2. Type-level validation through TypeScript:

    • Strong typing for configuration interfaces (IConfigManager, LoadOptions)
    • Runtime type checking with utilities like isString, isNullOrUndefined
  3. URL sanitization:

    • URL validation through new URL() constructor
    • URL path sanitization using removeDuplicateSlashes
    • Proper error handling for malformed URLs

The changes align with the PR's objective by providing robust input validation while improving code organization.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for new validation implementations
rg "validateWriteKey|validateDataPlaneUrl|validateLoadArgs" --type ts

Length of output: 3752


Script:

#!/bin/bash
# Search for validation related code in the current file and its imports
echo "Searching for validation in the current file..."
rg "validate|sanitize|check" packages/analytics-js/src/components/configManager/util/validate.ts -A 5

echo "\nSearching for validation in common utilities..."
rg "validate|sanitize|check" packages/analytics-js-common/src/utilities/checks.ts -A 5
rg "validate|sanitize|check" packages/analytics-js-common/src/utilities/object.ts -A 5

echo "\nSearching for new validation implementations..."
fd "validate" --type f --extension ts --exec rg "export.*validate" {}

Length of output: 2634


Script:

#!/bin/bash
# Let's check for any new validation implementations in the codebase
echo "Searching for new validation patterns in core and config modules..."
rg "class.*Config|interface.*Config" --type ts -A 10
rg "validateConfig|validateOptions|sanitizeConfig|sanitizeOptions" --type ts -A 5

echo "\nChecking for data plane URL handling..."
rg "dataPlaneUrl|sourceConfigUrl" --type ts -A 5

echo "\nChecking for write key validation..."
rg "writeKey.*=|writeKey.*:|setWriteKey" --type ts -A 5

Length of output: 76874

packages/analytics-js/__tests__/components/core/Analytics.test.ts (3)

131-201: LGTM! Well-structured test cases for input validation.

The new test cases effectively validate the input sanitization requirements with:

  • Clear test organization
  • Proper spy management
  • Consistent error message format
  • Good coverage of basic edge cases

132-166: Consider adding more edge cases to write key validation tests.

The test cases effectively cover basic invalid inputs. Consider adding tests for:

  • Null/undefined write keys
  • Very long write keys (e.g., exceeding maximum length)
  • Write keys containing special characters

168-201: Enhance data plane URL validation test coverage.

While the current tests cover basic cases, consider adding tests for:

  • URLs with/without protocols (http/https)
  • URLs with query parameters
  • URLs with fragments
  • URLs with special characters
  • URLs with different ports
  • URLs with trailing slashes

coderabbitai[bot]
coderabbitai bot previously approved these changes Oct 28, 2024
@saikumarrs saikumarrs marked this pull request as ready for review October 28, 2024 08:57
@saikumarrs saikumarrs requested a review from a team October 28, 2024 08:57
@saikumarrs saikumarrs requested a review from a team as a code owner October 28, 2024 08:57
MoumitaM
MoumitaM previously approved these changes Oct 29, 2024
@saikumarrs saikumarrs dismissed stale reviews from MoumitaM and coderabbitai[bot] via a9401ac October 29, 2024 06:55
Copy link

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
packages/analytics-js/src/app/RudderAnalytics.ts (2)

Line range hint 134-161: Consider sanitizing all load parameters for consistency.

While loadOptions is now sanitized, consider also sanitizing writeKey and dataPlaneUrl parameters for consistent input handling across all parameters.

 load(writeKey: string, dataPlaneUrl: string, loadOptions?: Partial<LoadOptions>) {
-  if (this.analyticsInstances[writeKey]) {
+  const sanitizedWriteKey = getSanitizedValue(writeKey);
+  const sanitizedDataPlaneUrl = getSanitizedValue(dataPlaneUrl);
+  if (this.analyticsInstances[sanitizedWriteKey]) {
     return;
   }

-  this.setDefaultInstanceKey(writeKey);
+  this.setDefaultInstanceKey(sanitizedWriteKey);
   const preloadedEventsArray = this.getPreloadedEvents();

   // Track page loaded lifecycle event if enabled
   this.trackPageLifecycleEvents(preloadedEventsArray, loadOptions);

   // The array will be mutated in the below method
   promotePreloadedConsentEventsToTop(preloadedEventsArray);

   setExposedGlobal(GLOBAL_PRELOAD_BUFFER, clone(preloadedEventsArray));

-  this.analyticsInstances[writeKey] = new Analytics();
+  this.analyticsInstances[sanitizedWriteKey] = new Analytics();
   this.getAnalyticsInstance(writeKey).load(
-    writeKey,
-    dataPlaneUrl,
+    sanitizedWriteKey,
+    sanitizedDataPlaneUrl,
     getSanitizedValue(loadOptions),
   );
 }

351-357: Consider performance optimization for high-frequency API calls.

The current implementation sanitizes all parameters for every API call. For high-frequency events (e.g., tracking user interactions), this might impact performance. Consider:

  1. Implementing memoization for repeated values
  2. Adding batch processing capabilities
  3. Sanitizing only complex objects, skipping primitive values

Example implementation of memoization:

// Add this utility function
const memoizedSanitize = (() => {
  const cache = new WeakMap();
  return (value: any) => {
    if (typeof value !== 'object' || value === null) {
      return value; // Return primitives as-is
    }
    if (cache.has(value)) {
      return cache.get(value);
    }
    const sanitized = getSanitizedValue(value);
    cache.set(value, sanitized);
    return sanitized;
  };
})();

Also applies to: 380-385, 414-419, 437-444, 477-482

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 68d1482 and a9401ac.

📒 Files selected for processing (1)
  • packages/analytics-js/src/app/RudderAnalytics.ts (11 hunks)

@saikumarrs saikumarrs merged commit b71c44a into develop Oct 29, 2024
13 checks passed
@saikumarrs saikumarrs deleted the feat.sanitize-data-sdk-2254 branch October 29, 2024 08:57
MoumitaM added a commit that referenced this pull request Nov 11, 2024
This reverts commit b71c44a.

revert: sanitization of input data
@MoumitaM MoumitaM mentioned this pull request Nov 11, 2024
5 tasks
MoumitaM added a commit that referenced this pull request Nov 11, 2024
* Revert "feat: add error handling to all public apis (#1907)"

This reverts commit 9fbaf81.

revert: error handling changes to all public apis

* Revert "feat: sanitize input data (#1902)"

This reverts commit b71c44a.

revert: sanitization of input data
saikumarrs added a commit that referenced this pull request Nov 18, 2024
@coderabbitai coderabbitai bot mentioned this pull request Nov 18, 2024
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants