-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: sanitize input data #1902
Conversation
📝 WalkthroughWalkthroughThis pull request introduces significant changes across various modules, primarily focusing on updating JSON utility functions and enhancing error handling. The Changes
Possibly related PRs
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #1902 +/- ##
===========================================
- Coverage 57.49% 57.47% -0.02%
===========================================
Files 485 485
Lines 16493 16495 +2
Branches 3289 3313 +24
===========================================
- Hits 9482 9481 -1
+ Misses 5792 5734 -58
- Partials 1219 1280 +61 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 9
🧹 Outside diff range and nitpick comments (19)
packages/analytics-js-common/src/types/stringify.ts (1)
1-5
: Add JSDoc documentation for better maintainability.The type definition is well-structured, but would benefit from JSDoc comments explaining the purpose and usage of each option.
+/** + * Options for JSON stringification with circular reference handling + */ export type StringifyOptions = { + /** Whether to exclude null values from the stringified output */ excludeNull?: boolean; + /** Whether to exclude BigInt values from the stringified output */ excludeBigInt?: boolean; + /** Array of keys to exclude from the stringified output */ excludeKeys?: string[]; };packages/analytics-js-common/src/utilities/sanitize.ts (1)
4-9
: Enhance documentation clarity.While the documentation follows JSDoc conventions, it could be more specific about:
- What "sanitized" means in this context
- How BigInt values are transformed
- Potential exceptions that might be thrown (e.g., from JSON.parse)
Consider expanding the documentation:
/** - * Handles circular references and BigInts in the input value + * Sanitizes input value by handling circular references and converting BigInts to strings. + * This function ensures the value can be safely serialized to JSON. * @param value Input value * @param logger Logger instance * @returns Sanitized value + * @throws {SyntaxError} When the stringified value cannot be parsed as JSON */packages/analytics-js-plugins/src/beaconQueue/utilities.ts (1)
Line range hint
25-42
: Consider enhancing error handling for data sanitization.The error handling could be more specific about sanitization failures to help with debugging in production.
Consider enhancing the error logging:
try { const blobPayload = json.stringifyWithoutCircular(data); const blobOptions: BlobPropertyBag = { type: 'text/plain' }; if (blobPayload) { return new Blob([blobPayload], blobOptions); } - logger?.error(BEACON_QUEUE_STRING_CONVERSION_FAILURE_ERROR(BEACON_QUEUE_PLUGIN)); + logger?.error( + BEACON_QUEUE_STRING_CONVERSION_FAILURE_ERROR(BEACON_QUEUE_PLUGIN), + { dataLength: JSON.stringify(data).length } + ); } catch (err) { - logger?.error(BEACON_QUEUE_BLOB_CONVERSION_FAILURE_ERROR(BEACON_QUEUE_PLUGIN), err); + logger?.error( + BEACON_QUEUE_BLOB_CONVERSION_FAILURE_ERROR(BEACON_QUEUE_PLUGIN), + { error: err, sanitizationFailed: true } + ); }🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 34-34: packages/analytics-js-plugins/src/beaconQueue/utilities.ts#L34
Added line #L34 was not covered by testspackages/analytics-js-common/src/utilities/json.ts (2)
19-23
: Document BigInt handling behaviorThe function now silently converts BigInt values to undefined when
excludeBigInt
is true. This behavior should be documented to prevent confusion.Add JSDoc comment above the function:
+/** + * Creates a replacer function for JSON.stringify that handles: + * - Circular references (converts to "[Circular Reference]") + * - BigInt values (converts to undefined when excludeBigInt is true) + * - Null values (converts to undefined when excludeNull is true) + * - Specified keys (converts to undefined when in excludeKeys) + */ const getCircularReplacer = (
51-53
: Enhance JSDoc documentation for options parameterThe current documentation could be more detailed about the available options and their effects.
Update the JSDoc to be more descriptive:
/** * Utility method for JSON stringify object excluding null values & circular references * * @param {*} value input value - * @param {StringifyOptions} options optional options for excluding keys and null values + * @param {StringifyOptions} options Configuration options: + * - excludeNull: When true, removes null values from the output (default: true) + * - excludeBigInt: When true, removes BigInt values from the output (default: true) + * - excludeKeys: Array of key names to exclude from the output (default: []) * @param {function} logger optional logger instance * @returns string */packages/analytics-js-common/src/utilities/checks.ts (1)
38-43
: Documentation could be enhanced with examples.The implementation is correct and follows TypeScript best practices with the type predicate pattern. Consider enhancing the JSDoc with examples to help other developers:
/** * Checks if the input is a BigInt * @param value input value * @returns True if the input is a BigInt + * @example + * isBigInt(BigInt(9007199254740991)) // returns true + * isBigInt(123) // returns false + * isBigInt(null) // returns false */packages/analytics-js-common/__tests__/utilities/sanitize.test.ts (2)
9-31
: Consider expanding circular reference test coverageWhile the current test effectively covers basic circular references, consider adding test cases for:
- Array circular references
- Deeper nested circular references (3+ levels)
- Multiple circular references at the same level
Example addition:
it('should handle array circular references', () => { const arr: any[] = [1, 2]; arr.push(arr); const nestedObj = { arr }; const sanitizedValue = getSanitizedValue(nestedObj, mockLogger); expect(sanitizedValue).toEqual({ arr: [1, 2, '[Circular Reference]'] }); });
33-61
: Enhance BigInt and undefined handling test coverageConsider adding test cases for:
- Arrays containing BigInt/undefined values
- Objects that become empty after sanitization
- Mixed array and object structures
Example additions:
it('should handle arrays with BigInt and undefined', () => { const obj = { arr: [BigInt(1), undefined, 'valid', {a: BigInt(2)}] }; expect(getSanitizedValue(obj)).toEqual({ arr: ['valid', {}] }); }); it('should handle empty objects after sanitization', () => { const obj = { a: { b: BigInt(1) }, c: { d: undefined } }; expect(getSanitizedValue(obj)).toEqual({}); });packages/analytics-js-plugins/src/utilities/eventsDelivery.ts (1)
25-46
: Review architectural approach to error handling and data sanitization.The current changes present several architectural concerns:
- Inconsistent error handling between similar functions (
getDeliveryPayload
vsgetDMTDeliveryPayload
)- Removal of error handling without proper alternatives
- Missing implementation of input data sanitization mentioned in PR objectives
Consider:
- Implementing a consistent error handling strategy across all payload processing functions
- Adding a dedicated input sanitization layer
- Adding comprehensive input validation before processing
Suggested approach:
- Create a dedicated sanitization module
- Implement consistent error handling across all payload-related functions
- Add input validation at the entry points
- Consider adding logging for debugging and monitoring purposes
Would you like assistance in designing this architecture?
packages/analytics-js-cookies/src/cookiesUtilities.ts (2)
Line range hint
15-24
: Consider additional error handling for malicious inputs.The error handling is good but could be enhanced to specifically catch and handle potentially malicious inputs before encryption.
Consider adding input validation:
const getEncryptedValueInternal = ( value: string | ApiObject, encryptFn: (value: string) => string, debug: boolean, ): Nullable<string> => { const fallbackValue = null; try { + // Validate input before processing + if (typeof value === 'string' && value.length > 50000) { + throw new Error('Input exceeds maximum allowed length'); + } const strValue = stringifyWithoutCircular(value, { excludeNull: false }); if (isNull(strValue)) { return null; } return encryptFn(strValue); } catch (err) { if (debug) { - console.error('Error occurred during encryption: ', err); + console.error('[RudderStack] Error occurred during encryption: ', err); } return fallbackValue; } };
16-16
: Document the stringification behavior change.The switch to using an options object is a breaking change in the function signature. Consider adding JSDoc to document the expected behavior of
excludeNull: false
.Add documentation above the function:
+/** + * Encrypts a value after stringification, with null values preserved + * @param value - The value to encrypt (string or object) + * @param encryptFn - The encryption function to use + * @param debug - Whether to log debug information + * @returns The encrypted string or null if encryption fails + */ const getEncryptedValueInternal = (packages/analytics-js/src/services/HttpClient/HttpClient.ts (2)
Line range hint
1-134
: Consider implementing additional security measures.To align with the PR's focus on input sanitization, consider implementing these security enhancements:
- Input validation for request configurations
- Content-type validation and sanitization
- Response size limits
- Rate limiting mechanism
- CORS validation
- Request/Response logging sanitization (removing sensitive data)
Would you like assistance in implementing any of these security measures?
Based on the gathered information, I can now provide a final response regarding the input sanitization concern.
Input sanitization is partially implemented but needs enhancement
The codebase already has some security measures in place:
- URL validation through
isValidURL
utility from@rudderstack/analytics-js-common/utilities/url
- Data serialization with circular reference protection via
stringifyWithoutCircular
- Basic validation for data plane URLs in
validateDataPlaneUrl
However, the review comment is partially correct as there are gaps in request data sanitization:
- The URL validation is only used in config management but not in the HTTP client
- No content-type validation in the request handler
- No sanitization of request parameters before sending
Recommended improvements:
- Utilize the existing
isValidURL
utility to validate URLs inHttpClient.ts
- Add parameter sanitization before creating XHR request options
- Implement content-type validation in the request handler
🔗 Analysis chain
Line range hint
65-77
: Input sanitization is missing for request data.Given that this PR's objective is to implement input data sanitization, the current implementation doesn't sanitize the request data before sending it to the server. This could potentially lead to security vulnerabilities.
Let's verify if any sanitization is implemented in the dependent modules:
Consider implementing the following security measures:
- Sanitize URL parameters before making requests
- Validate and sanitize request options
- Implement content-type validation
Also applies to: 56-77
Let me gather more information about the request handling and data processing in this codebase.
The previous search was too narrow. Let's expand our search to understand the request flow and data handling better. These commands will help us:
- Find where createXhrRequestOptions is used and its context
- Look at the actual implementation of createXhrRequestOptions
- Check how options are processed throughout the HttpClient module
Let me check for any data validation or sanitization in the request handling flow, particularly focusing on the stringifyWithoutCircular function and any URL processing.
These commands will help us understand:
- How the request data is processed before sending
- If there's any URL validation or processing
- If there are any validation utilities used in the project
Let me check the validation utilities found in the codebase, particularly in the config manager, to see if they're used or could be used for request data validation.
These commands will help us understand:
- Any existing validation functions that could be reused
- URL-specific validation if it exists
- The validation utilities available in the config manager
Let me check one final thing to see if the URL validation utility is being used in the HttpClient or if there are any other relevant utilities we should be aware of.
These commands will help us understand:
- If URL validation is used in the HTTP client
- How request data is processed before sending
- If there are any other security-related utilities available
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check for any sanitization in the xhr handler and related files rg -l "sanitize|escape|encode" packages/analytics-js/src/services/HttpClient/xhr/Length of output: 140
Script:
#!/bin/bash # Check for request data handling in HttpClient and related files rg -A 5 "createXhrRequestOptions" packages/analytics-js/src/services/HttpClient/ # Check the implementation of createXhrRequestOptions ast-grep --pattern 'function createXhrRequestOptions($_) { $$$ }' # Look for any data processing or validation rg -A 5 "options\." packages/analytics-js/src/services/HttpClient/Length of output: 7450
Script:
#!/bin/bash # Check stringifyWithoutCircular implementation ast-grep --pattern 'function stringifyWithoutCircular($_) { $$$ }' # Look for URL validation or processing rg -A 5 "url" packages/analytics-js/src/services/HttpClient/ # Check for any validation utilities that might be used rg -l "validate|sanitize|clean" packages/analytics-js/src/Length of output: 6075
Script:
#!/bin/bash # Check validation implementation in config manager ast-grep --pattern 'function validate($_) { $$$ }' # Look for any URL validation patterns rg -A 5 "validateUrl|isValidUrl|encodeUrl" packages/analytics-js/src/ # Check the content of validate.ts cat packages/analytics-js/src/components/configManager/util/validate.tsLength of output: 3026
Script:
#!/bin/bash # Check for imports of validation utilities rg "import.*isValidURL" packages/analytics-js/src/ # Look at the implementation of stringifyWithoutCircular rg -A 10 "stringifyWithoutCircular" packages/analytics-js/src/ # Check for any other security-related utilities rg -l "encodeURI|encodeURIComponent|sanitize" packages/analytics-js/src/services/Length of output: 13506
packages/analytics-js/src/services/HttpClient/xhr/xhrRequestHandler.ts (1)
Line range hint
126-134
: Enhance error handling for malformed payloadsThe current error handling for send operation could be more specific about the nature of the failure.
try { xhr.send(payload); } catch (err) { + const errorDetails = { + payloadSize: payload?.length, + contentType: options.headers['Content-Type'], + }; reject({ - error: getMutatedError(err, XHR_SEND_ERROR(FAILED_REQUEST_ERR_MSG_PREFIX, options.url)), + error: getMutatedError(err, XHR_SEND_ERROR(FAILED_REQUEST_ERR_MSG_PREFIX, options.url, errorDetails)), xhr, options, }); }packages/analytics-js-common/__tests__/utilities/json.test.ts (4)
67-70
: Make options configuration explicit for better test clarity.The test case verifies null value exclusion behavior but doesn't explicitly show the configuration. Consider making the options explicit:
-const json = stringifyWithoutCircular(objWithCircular); +const json = stringifyWithoutCircular(objWithCircular, { excludeNull: true });
Line range hint
81-93
: Consider adding more specific assertions.While the test verifies that circular references are handled, it could benefit from additional assertions to confirm the actual structure of the stringified output.
Example:
const parsed = JSON.parse(json); expect(parsed.reused).toEqual([1, 2, 3]); expect(parsed.reusedAgain[2]).toEqual([1, 2, 3]);
142-152
: Add a comment explaining the default BigInt handling behavior.Consider adding a brief comment to explain that BigInt values are excluded by default for better test documentation.
+// By default, BigInt values are excluded from stringification to prevent JSON.stringify errors it('should remove BigInt values from input', () => {
170-177
: Consider adding a test case for null values in arrays.The current test verifies null handling for object properties. Consider adding a companion test for null values in arrays to ensure comprehensive coverage.
Example:
it('should not exclude null values in arrays when configured', () => { const arrayWithNull = ['value1', null, 'value2']; const json = stringifyWithoutCircular(arrayWithNull, { excludeNull: false }); expect(json).toBe('["value1",null,"value2"]'); });packages/analytics-js/src/components/userSessionManager/UserSessionManager.ts (1)
371-372
: Consider extracting stringification options to a constant.The changes look good. However, since the same options
{ excludeNull: false }
are used in multiple places, consider extracting it to a named constant for better maintainability.+ const COOKIE_STRINGIFY_OPTIONS = { excludeNull: false } as const; + // ... later in the code ... - const before = stringifyWithoutCircular(cData.value, { excludeNull: false }); - const after = stringifyWithoutCircular(cookieValue, { excludeNull: false }); + const before = stringifyWithoutCircular(cData.value, COOKIE_STRINGIFY_OPTIONS); + const after = stringifyWithoutCircular(cookieValue, COOKIE_STRINGIFY_OPTIONS);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
.eslintrc.json
is excluded by!**/*.json
📒 Files selected for processing (23)
- packages/analytics-js-common/tests/utilities/json.test.ts (2 hunks)
- packages/analytics-js-common/tests/utilities/sanitize.test.ts (1 hunks)
- packages/analytics-js-common/src/types/stringify.ts (1 hunks)
- packages/analytics-js-common/src/utilities/checks.ts (2 hunks)
- packages/analytics-js-common/src/utilities/json.ts (2 hunks)
- packages/analytics-js-common/src/utilities/sanitize.ts (1 hunks)
- packages/analytics-js-cookies/tests/cookieUtilities.test.ts (0 hunks)
- packages/analytics-js-cookies/src/cookiesUtilities.ts (1 hunks)
- packages/analytics-js-plugins/tests/bugsnag/utils.test.ts (0 hunks)
- packages/analytics-js-plugins/tests/errorReporting/utils.test.ts (0 hunks)
- packages/analytics-js-plugins/tests/utilities/queue.test.ts (0 hunks)
- packages/analytics-js-plugins/tests/xhrQueue/utilities.test.ts (0 hunks)
- packages/analytics-js-plugins/src/beaconQueue/utilities.ts (1 hunks)
- packages/analytics-js-plugins/src/bugsnag/utils.ts (1 hunks)
- packages/analytics-js-plugins/src/errorReporting/utils.ts (1 hunks)
- packages/analytics-js-plugins/src/utilities/eventsDelivery.ts (1 hunks)
- packages/analytics-js-plugins/src/xhrQueue/index.ts (2 hunks)
- packages/analytics-js-plugins/src/xhrQueue/utilities.ts (3 hunks)
- packages/analytics-js/tests/services/HttpClient/HttpClient.test.ts (0 hunks)
- packages/analytics-js/src/components/userSessionManager/UserSessionManager.ts (2 hunks)
- packages/analytics-js/src/services/HttpClient/HttpClient.ts (1 hunks)
- packages/analytics-js/src/services/HttpClient/xhr/xhrRequestHandler.ts (1 hunks)
- packages/analytics-js/src/services/StoreManager/Store.ts (2 hunks)
💤 Files with no reviewable changes (6)
- packages/analytics-js-cookies/tests/cookieUtilities.test.ts
- packages/analytics-js-plugins/tests/bugsnag/utils.test.ts
- packages/analytics-js-plugins/tests/errorReporting/utils.test.ts
- packages/analytics-js-plugins/tests/utilities/queue.test.ts
- packages/analytics-js-plugins/tests/xhrQueue/utilities.test.ts
- packages/analytics-js/tests/services/HttpClient/HttpClient.test.ts
🧰 Additional context used
📓 Learnings (1)
packages/analytics-js-plugins/src/bugsnag/utils.ts (1)
Learnt from: saikumarrs PR: rudderlabs/rudder-sdk-js#1740 File: packages/analytics-js-plugins/src/bugsnag/utils.ts:119-119 Timestamp: 2024-10-08T15:52:59.819Z Learning: The `state` parameter in the `getNewClient` function within `packages/analytics-js-plugins/src/bugsnag/utils.ts` is correctly typed as `ApplicationState`.
🪛 GitHub Check: codecov/patch
packages/analytics-js-plugins/src/beaconQueue/utilities.ts
[warning] 34-34: packages/analytics-js-plugins/src/beaconQueue/utilities.ts#L34
Added line #L34 was not covered by tests
🔇 Additional comments (36)
packages/analytics-js-common/src/types/stringify.ts (1)
3-3
: Verify BigInt compatibility with target browsers.Since the PR checklist mentions IE11 testing and
BigInt
is not supported in IE11, ensure that appropriate fallback handling is implemented in the actual stringification logic.✅ Verification successful
Let me check the actual implementation of BigInt handling to make a proper assessment.
Based on the implementation shown, I can now provide a final response:
BigInt handling is properly implemented with IE11 compatibility in mind
The code already implements a safe approach for handling BigInt values:
- By default,
excludeBigInt
is set totrue
, which means BigInt values are automatically excluded from serialization- When
excludeBigInt
isfalse
, the code properly handles the error case and returns null with appropriate warning- The implementation is tested thoroughly as shown in the test cases
This approach ensures compatibility with IE11 by removing BigInt values from the serialized output by default, preventing any runtime errors in unsupported browsers.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for BigInt usage and polyfills in the codebase # Test 1: Look for BigInt type checks or conversions echo "Checking for BigInt usage patterns..." rg -l "typeof.*BigInt|BigInt\(" # Test 2: Look for BigInt polyfills echo "Checking for BigInt polyfills..." rg -l "polyfill.*BigInt|window\.BigInt"Length of output: 538
Script:
#!/bin/bash # Check the actual implementation in json.ts since it appears to be the main file handling BigInt echo "Checking json.ts implementation:" rg -A 10 -B 10 "BigInt" "packages/analytics-js-common/src/utilities/json.ts" # Check the test cases to understand the handling echo -e "\nChecking test implementation:" rg -A 5 -B 5 "BigInt" "packages/analytics-js-common/__tests__/utilities/json.test.ts"Length of output: 2500
packages/analytics-js-common/src/utilities/sanitize.ts (2)
1-2
: LGTM! Clean and focused imports.The imports are well-structured and properly typed, importing only what's necessary for the implementation.
1-13
: Let me analyze the test coverage by examining the test file content before making a final determination.#!/bin/bash # Check the content of the test file echo "Examining test file content..." cat packages/analytics-js-common/__tests__/utilities/sanitize.test.ts # Check the usage of stringifyWithoutCircular to ensure complete coverage echo -e "\nChecking stringifyWithoutCircular usage..." rg "stringifyWithoutCircular" --type tspackages/analytics-js-plugins/src/beaconQueue/utilities.ts (2)
Line range hint
34-41
: Ensure cross-browser compatibility of Blob handling.The PR objectives emphasize cross-browser testing, particularly for IE11. The
Blob
creation and handling needs verification across browsers.Let's check for any browser-specific handling:
#!/bin/bash # Look for browser-specific Blob handling or polyfills rg -l "Blob.*IE|polyfill.*Blob" # Check for any browser compatibility tests rg -l "browser.*test.*Blob" packages/analytics-js-plugins/__tests__/🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 34-34: packages/analytics-js-plugins/src/beaconQueue/utilities.ts#L34
Added line #L34 was not covered by tests
34-34
:⚠️ Potential issueVerify the impact of removing stringification parameter.
The removal of the second argument from
stringifyWithoutCircular
might affect how circular references and sensitive data are handled. Since this PR focuses on input sanitization, we need to ensure this change maintains proper data sanitization.Let's verify the implementation of the new stringification:
The modified line lacks test coverage according to CodeCov. Given that this code handles event data that might contain sensitive information, comprehensive testing is crucial.
Consider adding test cases that verify:
- Proper handling of circular references
- Sanitization of sensitive data
- Edge cases with different data types
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 34-34: packages/analytics-js-plugins/src/beaconQueue/utilities.ts#L34
Added line #L34 was not covered by testspackages/analytics-js-common/src/utilities/json.ts (4)
3-3
: LGTM: Import statements are properly organizedThe new imports for
isBigInt
andStringifyOptions
are correctly placed and support the new functionality.Also applies to: 5-5
10-10
: LGTM: Well-structured options parameter with defaultsThe refactoring to use an options object with proper TypeScript types and default values improves the API's maintainability and extensibility.
Also applies to: 13-13
57-59
: LGTM: Clean implementation with proper error handlingThe function implementation is clean and maintains proper error handling while adopting the new options pattern.
Also applies to: 62-62
Line range hint
1-67
: Verify usage in security-sensitive contextsWhile the implementation is secure, ensure that security-sensitive data is properly excluded using the
excludeKeys
option when this utility is used for logging or data transmission.✅ Verification successful
Based on the search results, I can generate the final response as I have gathered sufficient information about how
stringifyWithoutCircular
is used throughout the codebase.Proper handling of sensitive data confirmed
The implementation and usage of
stringifyWithoutCircular
across the codebase shows appropriate handling of sensitive data:
The utility is consistently used with
excludeKeys
option to filter out sensitive data:
- In app state serialization:
APP_STATE_EXCLUDE_KEYS
is used- In cookie data handling:
excludeNull: false
is used to ensure complete control over data- In error reporting: Sensitive data is filtered before sending to Bugsnag
The codebase demonstrates defensive practices:
- Error handling wraps all stringification operations
- Logging operations use the utility with appropriate exclusion options
- Storage operations encrypt the stringified data before persistence
Security-sensitive contexts properly utilize the exclusion capabilities:
- User session management filters sensitive data
- HTTP client operations sanitize payloads
- Analytics data transmission implements proper data filtering
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for usage of stringifyWithoutCircular in logging contexts rg -A 3 "stringifyWithoutCircular.*log" # Search for potential sensitive data handling rg -l -e "password" -e "token" -e "secret" -e "key" | xargs -I {} rg -A 5 "stringifyWithoutCircular" {}Length of output: 30916
packages/analytics-js-common/src/utilities/checks.ts (2)
84-84
: LGTM!The export is correctly added and maintains alphabetical ordering.
38-43
: Verify sanitization implementation using this utility.Since this PR focuses on input data sanitization, let's verify how this utility is used in the sanitization process.
✅ Verification successful
Let me analyze the usage of
isBigInt
utility function by searching for its imports and usage in sanitization-related code.
Let me try a different search pattern to find how
isBigInt
is used in the codebase.
Let me check the implementation details of how
isBigInt
is used in the json utility file and sanitization tests.
I have enough context to provide the final response now.
The
isBigInt
utility is properly integrated into the sanitization processThe utility function is used in the JSON stringification process to handle BigInt values:
- It's imported and used in
json.ts
as part of the circular reference replacer function- By default, BigInt values are excluded during stringification (excludeBigInt = true)
- The sanitization tests verify that BigInt values are properly handled and removed from the output
- The implementation helps prevent JSON serialization errors since BigInt values are not JSON-serializable
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Find usage of isBigInt in sanitization-related code # Search for files that import and use isBigInt rg -l "import.*isBigInt.*from.*checks" | xargs rg "isBigInt\(" # Search for sanitization-related files that might need to use this utility rg -l "sanitize|clean|validate" | xargs rg "BigInt|bigint"Length of output: 135945
Script:
#!/bin/bash # Search for files that import isBigInt and use it in sanitization context ast-grep --pattern 'import { $_ as isBigInt } from $path' # Search for sanitization-related files that use isBigInt rg -A 5 "sanitize.*\{.*isBigInt"Length of output: 94
Script:
#!/bin/bash # Search for files that import isBigInt rg "import.*isBigInt" # Search for files that use isBigInt in function calls or conditions rg "isBigInt\(" # Search for sanitize-related files rg -l "sanitize" | xargs rg -l "bigint|BigInt"Length of output: 734
Script:
#!/bin/bash # Check the json utility implementation rg -A 10 "isBigInt" packages/analytics-js-common/src/utilities/json.ts # Check the sanitization test implementation rg -A 10 "BigInt" packages/analytics-js-common/__tests__/utilities/sanitize.test.tsLength of output: 1460
packages/analytics-js-common/__tests__/utilities/sanitize.test.ts (3)
1-8
: LGTM: Well-structured test setup with proper mockingThe test suite is well-organized with clear describe blocks and proper logger mocking.
1-84
: Add security-focused test casesGiven that this is an input sanitization feature, consider adding test cases for:
- XSS payload sanitization
- Prototype pollution attempts
- Maximum depth/size limits to prevent DoS
- Unicode edge cases
This aligns with the PR's security checklist requirement.
#!/bin/bash # Check if there are any existing security-related tests rg -l "xss|injection|prototype.*pollution|sanitiz" --type test # Look for any size/depth limit implementations rg -l "maxDepth|maxSize|limit" --type ts
63-84
: Expand null value testing and verify browser compatibilityWhile the current test covers basic null scenarios, consider adding:
- Arrays containing null values
- Objects with null prototype
- Mixed null and undefined scenarios
Also, since the PR objectives mention cross-browser testing, particularly IE11, we should verify that the JSON handling works consistently across browsers.
packages/analytics-js-plugins/src/utilities/eventsDelivery.ts (1)
37-46
:⚠️ Potential issueAdd error handling and input sanitization.
The function doesn't handle potential errors from
getDeliveryPayload
and doesn't perform any input sanitization, which was mentioned in the PR objectives.Consider adding error handling and sanitization:
-const validateEventPayloadSize = (event: RudderEvent, logger?: ILogger) => { - const payloadStr = getDeliveryPayload(event); - const payloadSize = payloadStr.length; +const validateEventPayloadSize = (event: RudderEvent, logger?: ILogger) => { + try { + // Sanitize the event data before processing + const sanitizedEvent = sanitizeEventData(event); + const payloadStr = getDeliveryPayload(sanitizedEvent); + const payloadSize = payloadStr.length;Let's verify if any input sanitization is implemented elsewhere:
packages/analytics-js-cookies/src/cookiesUtilities.ts (2)
16-16
: Verify cross-browser compatibility of stringification options.The PR objectives mention cross-browser testing requirements. Ensure the
stringifyWithoutCircular
options object works consistently across browsers, especially IE11.Run this verification script to check for browser-specific handling:
#!/bin/bash # Check for browser-specific code and tests # Description: Look for browser compatibility handling and tests # Search for browser-specific test files echo "Checking for browser-specific tests:" fd -g '*browser*.test.*' # Search for IE11 specific handling echo "Checking for IE11 specific code:" rg -g '!dist' -g '!node_modules' -A 5 '(IE11|Internet Explorer|isIE)'
16-16
: Verify input sanitization coverage.While the change to use
excludeNull: false
maintains existing behavior, consider enhancing input sanitization before stringification to prevent potential XSS or injection attacks in cookie values.Run this verification script to check current sanitization measures:
✅ Verification successful
Let me generate additional verification scripts to check for input sanitization specifically in the cookie utilities and related code:
This will help us:
- Understand the context and implementation of stringifyWithoutCircular
- Find any sanitization wrappers around it
- Look for cookie-specific security measures
Based on the search results, I can now provide a final response regarding the input sanitization concern:
Input sanitization is properly handled through multiple layers of protection
The
stringifyWithoutCircular
function used for cookie values has built-in safeguards:
- Handles circular references safely by replacing them with "[Circular Reference]"
- Includes options for excluding null values and specific keys
- Has error handling that returns null on invalid input
- Is used in conjunction with encryption before storage
Additionally, the cookie utilities have:
- Value validation before storage
- Error handling for failed encryption/decryption
- Proper domain and path scoping
- Secure cookie options
The implementation follows security best practices and doesn't require additional sanitization.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check for input sanitization patterns across the codebase # Description: Look for sanitization functions that might be used before this stringification # Search for sanitization-related function definitions echo "Checking for sanitization functions:" rg -g '!dist' -g '!node_modules' -A 5 'function.*(sanitize|escape|clean|validate).*' # Search for XSS prevention patterns echo "Checking for XSS prevention patterns:" rg -g '!dist' -g '!node_modules' -A 5 '(encodeURI|encodeURIComponent|DOMPurify|sanitize|escape)'Length of output: 20718
Script:
#!/bin/bash # Check cookie utilities and related code for sanitization echo "Checking cookie utilities implementation:" rg -g '!dist' -g '!node_modules' -A 10 'stringifyWithoutCircular' echo -e "\nChecking cookie value sanitization:" ast-grep --pattern 'function $_(value$_) { $$$ stringifyWithoutCircular($$$) $$$ }' echo -e "\nChecking cookie-related security measures:" rg -g '!dist' -g '!node_modules' -A 5 'cookie.*sanitize|sanitize.*cookie|cookie.*validate|validate.*cookie'Length of output: 226019
packages/analytics-js/src/services/HttpClient/HttpClient.ts (3)
Line range hint
65-77
: Verify the impact of logger removal on error tracing.While removing the logger parameter simplifies the interface, we should ensure that critical errors are still properly tracked, especially for security-relevant issues like failed requests or malformed responses.
Consider implementing structured logging for security-relevant operations. This would help in monitoring and auditing API communication patterns.
Let me try one more search to specifically find the implementation and usage of
setAuthHeader
.#!/bin/bash # Search for setAuthHeader implementation and usage rg "setAuthHeader" -B 2 -A 2 --type ts # Also try to find any auth header related code rg "Authorization.*header" -B 2 -A 2 --type ts
Line range hint
69-73
: Implement response data validation.The raw response handling path (
isRawResponse
) bypasses JSON parsing but doesn't implement any sanitization. This could expose the application to injection vulnerabilities if the response data is rendered in the DOM or used in sensitive operations.Let's check how the raw response is used throughout the codebase:
packages/analytics-js-plugins/src/xhrQueue/utilities.ts (2)
14-16
:⚠️ Potential issueConsider adding error handling for stringification failures.
With the removal of the logger parameter, stringification errors might be silently ignored. Consider implementing proper error handling to ensure data integrity.
Let's verify the stringifyWithoutCircular implementation:
Suggested implementation:
const getBatchDeliveryPayload = (events: RudderEvent[], currentTime: string): string => { const batchPayload: XHRBatchPayload = { batch: events, sentAt: currentTime }; - return json.stringifyWithoutCircular(batchPayload) as string; + try { + const result = json.stringifyWithoutCircular(batchPayload); + if (!result) { + throw new Error('Stringification resulted in null or undefined'); + } + return result; + } catch (error) { + throw new Error(`Failed to stringify batch payload: ${error.message}`); + } };
Line range hint
64-84
: Validate activeDataplaneUrl before URL construction.The function uses
state.lifecycle.activeDataplaneUrl.value
without validation, which could lead to URL construction errors or security issues if the value is malformed.Suggested implementation:
const getRequestInfo = (itemData: XHRRetryQueueItemData, state: ApplicationState) => { let data; let headers; let url: string; const currentTime = getCurrentTimeFormatted(); + const dataplaneUrl = state.lifecycle.activeDataplaneUrl?.value; + if (!dataplaneUrl || typeof dataplaneUrl !== 'string') { + throw new Error('Invalid or missing dataplane URL'); + } if (Array.isArray(itemData)) { const finalEvents = itemData.map((queueItemData: XHRQueueItemData) => eventsDelivery.getFinalEventForDeliveryMutator(queueItemData.event, currentTime), ); data = getBatchDeliveryPayload(finalEvents, currentTime); headers = itemData[0] ? clone(itemData[0].headers) : {}; - url = getBatchDeliveryUrl(state.lifecycle.activeDataplaneUrl.value as string); + url = getBatchDeliveryUrl(dataplaneUrl); } else {Additionally, consider standardizing the payload generation approach between array and single event cases for better maintainability.
Let's verify the usage of different payload generation methods:
packages/analytics-js/src/services/HttpClient/xhr/xhrRequestHandler.ts (2)
Line range hint
91-108
: Add browser compatibility testsThe XHR implementation needs verification across different browsers as per PR requirements. The current error handling looks solid but should be tested thoroughly.
Consider adding specific test cases for:
- IE11 compatibility
- Different status code handling across browsers
- Timeout behavior in various browsers
Add a comment documenting the browser compatibility requirements and testing status.
Line range hint
65-73
: Verify input sanitization implementationWhile the code handles stringification, there might be security implications when processing user input data. Consider implementing additional sanitization steps before stringification.
Run this verification script to check for potential security risks:
packages/analytics-js-plugins/src/xhrQueue/index.ts (3)
69-69
: Verify logging coverage after logger parameter removal.The removal of the logger parameter from
getRequestInfo
aligns with the goal of reducing exposure of potentially sensitive data. However, ensure that critical errors are still being logged through alternative means.#!/bin/bash # Check if there are any remaining logging mechanisms for request info errors rg -A 3 "getRequestInfo.*error|error.*getRequestInfo"
Line range hint
69-105
: Review security considerations for data handling.The code handles sensitive data through the queue system. While some security measures are in place (base64 encoding of anonymousId, payload size validation), consider:
- Implementing input validation before queueing
- Sanitizing event data before stringification
- Adding rate limiting for queue operations
- Implementing payload encryption for sensitive fields
This aligns with the PR's focus on input data sanitization and the pending security checklist.
105-105
: Consider additional payload validation before stringification.While the type assertion ensures string output, consider adding explicit validation for the payload content before stringification to prevent potential XSS or injection attacks.
packages/analytics-js-plugins/src/errorReporting/utils.ts (1)
70-70
: LGTM! Improved parameter structure using named options.The change from positional parameters to a named options object enhances code maintainability and readability.
packages/analytics-js-common/__tests__/utilities/json.test.ts (3)
73-79
: LGTM! Clear and focused test case.The test case effectively verifies the basic stringification functionality without circular references.
134-137
: LGTM! Proper usage of the new options object structure.The test case correctly demonstrates the usage of multiple options in the new configuration object.
154-168
: LGTM! Well-structured error handling test.The test effectively verifies both the error condition and logging behavior when BigInt serialization fails.
packages/analytics-js-plugins/src/bugsnag/utils.ts (2)
68-69
: Verify cross-browser compatibility of JSON operations.As per PR requirements, ensure that the JSON stringification and parsing operations work consistently across Chrome, Firefox, and IE11, particularly for:
- Circular reference handling
- BigInt serialization
- Unicode character handling
Let's check for browser-specific JSON handling code:
✅ Verification successful
Let me analyze the findings from the shell script results:
- The code uses
stringifyWithoutCircular
utility that handles BigInt values by default by excluding them from JSON serialization- The code has explicit handling for BigInt through:
- Type checking via
isBigInt
utility- StringifyOptions interface with
excludeBigInt
option- Tests verifying BigInt handling behavior
- The implementation shows that BigInts are handled safely by:
- Removing them by default during serialization
- Providing configuration to control BigInt handling
- Proper error handling when BigInts can't be serialized
Based on this analysis, I can provide the final response:
BigInt values are already handled safely by the SDK's serialization utilities
The JSON stringification changes have built-in handling for BigInt values:
- BigInts are excluded by default during serialization to prevent JSON.stringify errors
- The code includes proper type checking and error handling for BigInt values
- Tests verify the BigInt handling behavior works as expected
The reason for this verification is that the code already implements proper BigInt handling through its stringification utilities, making it safe across browsers that support BigInt. The implementation follows best practices by:
- Excluding BigInts by default to prevent serialization errors
- Providing configuration options for custom BigInt handling if needed
- Including proper error handling and logging when BigInt serialization fails
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for browser-specific JSON handling echo "Searching for browser-specific JSON handling..." rg -g '!*.test.*' -g '!*.spec.*' "JSON\.(parse|stringify)" -B 2 -A 2 echo "Searching for BigInt handling..." rg "BigInt" -B 2 -A 2Length of output: 480156
68-69
: Verify sensitive data exclusion patterns.Since this function processes application state before sending to Bugsnag (external service), it's crucial to verify that all sensitive data patterns are properly excluded.
Let's verify the APP_STATE_EXCLUDE_KEYS patterns:
packages/analytics-js/src/services/StoreManager/Store.ts (2)
211-211
: 🛠️ Refactor suggestionConsider improving null/undefined handling clarity and security.
The current ternary operation handles undefined and null values differently, which could lead to confusion. Consider making the logic more explicit and documenting the security implications.
Here's a suggested refactor for better clarity and security:
- return typeof formattedValue === 'undefined' ? value : (formattedValue ?? ''); + if (typeof formattedValue === 'undefined') { + return value; // Preserve original value if plugin returns undefined + } + + if (formattedValue === null) { + // Document security implication: Returning empty string for null values + return ''; + } + + return formattedValue;Let's verify the encryption plugin behavior:
#!/bin/bash # Description: Check for encryption plugin implementations # Look for plugins that might be affected by the null/undefined handling change # Search for storage.encrypt and storage.decrypt plugin implementations rg -A 5 "storage\.(encrypt|decrypt)"
112-112
: LGTM! Verify null value handling across the codebase.The change to use an options object with
excludeNull: false
is a good practice for extensibility and ensures consistent null value handling during stringification.Let's verify the consistent handling of null values:
✅ Verification successful
Let me generate new shell scripts to search for
stringifyWithoutCircular
usage specifically.The previous search might have failed due to syntax. Let's try with a different approach to find all occurrences and the implementation of this utility function to verify the null value handling consistency.
Based on the search results, I can now provide a final response regarding the consistency of null value handling:
The
excludeNull: false
option is consistently used across the codebase for storage operationsThe codebase shows consistent handling of null values in storage operations:
- In Store.ts,
stringifyWithoutCircular
is called with{ excludeNull: false }
to ensure null values are preserved- The same pattern is followed in other storage-related components:
- UserSessionManager's cookie data handling
- Store class implementation in both v1.1 and current versions
- Common utilities that deal with storage serialization
The change aligns with the existing pattern in the codebase and maintains consistency in how null values are handled during storage operations.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for consistent null value handling across the codebase # Look for other stringifyWithoutCircular calls to ensure they handle null values consistently # Search for all stringifyWithoutCircular usages rg "stringifyWithoutCircular" -A 2 -B 2Length of output: 10991
Script:
#!/bin/bash # Search for stringifyWithoutCircular with better context rg -U "stringifyWithoutCircular" -B 5 -A 5 # Also search for the file containing the utility function fd "stringifyWithoutCircular" --type fLength of output: 98947
packages/analytics-js/src/components/userSessionManager/UserSessionManager.ts (1)
308-308
: LGTM! Verify cookie encryption behavior.The update to use the new options object format for
stringifyWithoutCircular
is consistent with the codebase changes. TheexcludeNull: false
option ensures null values are properly included in the encrypted cookie data.✅ Verification successful
Based on the results, I can now generate the final response since I have sufficient information about the usage of
stringifyWithoutCircular
across the codebase.The
stringifyWithoutCircular
option is consistently implementedThe changes to use
{ excludeNull: false }
are consistently applied across all usages in the codebase:
- Store encryption in
Store.ts
- Cookie data handling in
UserSessionManager.ts
- Cookie utilities in
cookiesUtilities.ts
- Test cases in
json.test.ts
This confirms that the cookie encryption behavior maintains consistency throughout the system.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Verify consistent usage of stringifyWithoutCircular options across the codebase ast-grep --pattern 'stringifyWithoutCircular($_, { excludeNull: $_ })'Length of output: 995
packages/analytics-js/src/services/HttpClient/xhr/xhrRequestHandler.ts
Outdated
Show resolved
Hide resolved
size-limit report 📦
|
3d8ab91
to
7723b27
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 10
🧹 Outside diff range and nitpick comments (9)
packages/analytics-js-common/src/constants/logMessages.ts (1)
12-13
: Consider enhancing the warning message clarity.While the message consolidation is good, the warning message could be more precise about what constitutes "bad data" to help with debugging.
Consider this alternative message format:
- `${context}${LOG_CONTEXT_SEPARATOR}A bad data (like circular reference, BigInt) has been detected in the object and the property "${key}" has been dropped from the output.`; + `${context}${LOG_CONTEXT_SEPARATOR}Non-serializable data (circular reference, BigInt, etc.) detected in property "${key}". This property has been dropped from the output.`;packages/analytics-js-common/src/utilities/object.ts (1)
Line range hint
1-121
: Consider enhancing type safety and documentation for sanitization strategyAs this file is part of the input sanitization implementation, consider these improvements:
- Add JSDoc comments to
isObject
explaining its role in the sanitization strategy- Consider using TypeScript's strict type checking by avoiding
any
types where possible- Add unit tests specifically for edge cases in data sanitization
- Consider creating a dedicated sanitization module that uses these utilities
This will make the sanitization strategy more maintainable and secure.
packages/analytics-js/src/components/configManager/ConfigManager.ts (1)
Line range hint
89-146
: Consider security implications of removed input validation.The removal of
validateLoadArgs
from theinit()
method could potentially allow invalid or malicious input to propagate through the system. While this aligns with the PR's goal of sanitizing input data, we should ensure that:
- Invalid
writeKey
values don't lead to unauthorized access- Malformed
dataPlaneUrl
values don't enable request forgery- Error handling adequately catches and handles invalid inputs
Consider implementing validation at the state update level or adding input sanitization middleware to prevent potential security issues.
packages/analytics-js-common/__tests__/utilities/json.test.ts (4)
7-74
: Enhance test coverage for stringifyData.While the current tests cover basic scenarios well, consider adding the following test cases for more robust coverage:
- Empty objects and arrays
- Objects with special characters in keys
- Deeply nested objects (to verify stack overflow protection)
- Large arrays (to verify performance)
- Mixed data types within arrays
Example test case:
it('should handle edge cases', () => { const obj = { empty: {}, emptyArray: [], 'special@key': 'value', deeplyNested: /* deep object */, largeArray: Array(1000).fill(1), mixedArray: [1, 'string', true, null, undefined] }; // ... assertions });
83-104
: Document behavior for function properties.The test includes function properties (
i
ands
) but doesn't explicitly verify how they're handled in the sanitized output. Consider adding assertions to document the expected behavior.// Add assertion expect(typeof getSanitizedValue(obj).i).toBe('undefined'); // or expected behaviorAlso applies to: 106-106
234-247
: Verify logger is not called for valid data types.The test verifies sanitization of various data types but doesn't explicitly check that the logger isn't called when processing valid data. Consider adding this verification.
it('should not log warnings for valid data types', () => { const mockLogger = { warn: jest.fn() }; // ... your existing test code ... expect(mockLogger.warn).not.toHaveBeenCalled(); });
1-5
: Add JSDoc comments for test constants.Consider adding JSDoc comments to explain the purpose and usage of the constants
circularReferenceNotice
andbigIntNotice
. This would help other developers understand their significance in the tests./** Placeholder text used when circular references are detected during sanitization */ const circularReferenceNotice = '[Circular Reference]'; /** Placeholder text used when BigInt values are detected during sanitization */ const bigIntNotice = '[BigInt]';packages/analytics-js/src/app/RudderAnalytics.ts (2)
119-123
: Enhance JSDoc documentationThe JSDoc comments could be more descriptive. Consider adding:
- Description of what the method does
- @throws documentation if any
- Example usage
Line range hint
324-341
: Add input sanitization to group method parametersWhile the error handling for empty calls is good, the group method parameters should be sanitized consistently with other methods.
Apply this diff to add sanitization:
group( groupId: string | number | Nullable<ApiObject>, traits?: Nullable<ApiOptions> | Nullable<ApiObject> | ApiCallback, options?: Nullable<ApiOptions> | ApiCallback, callback?: ApiCallback, ) { if (arguments.length === 0) { this.logger.error(EMPTY_GROUP_CALL_ERROR(RS_APP)); return; } this.getAnalyticsInstance().group( - groupArgumentsToCallOptions(groupId, traits, options, callback), + groupArgumentsToCallOptions( + getSanitizedValue(groupId), + getSanitizedValue(traits), + getSanitizedValue(options), + callback, + ), ); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
.eslintrc.json
is excluded by!**/*.json
📒 Files selected for processing (34)
- packages/analytics-js-common/tests/utilities/json.test.ts (1 hunks)
- packages/analytics-js-common/src/constants/logMessages.ts (1 hunks)
- packages/analytics-js-common/src/utilities/checks.ts (2 hunks)
- packages/analytics-js-common/src/utilities/errors.ts (2 hunks)
- packages/analytics-js-common/src/utilities/json.ts (1 hunks)
- packages/analytics-js-common/src/utilities/object.ts (2 hunks)
- packages/analytics-js-cookies/tests/cookieUtilities.test.ts (0 hunks)
- packages/analytics-js-cookies/src/cookiesUtilities.ts (2 hunks)
- packages/analytics-js-plugins/tests/bugsnag/utils.test.ts (0 hunks)
- packages/analytics-js-plugins/tests/errorReporting/utils.test.ts (0 hunks)
- packages/analytics-js-plugins/tests/utilities/queue.test.ts (0 hunks)
- packages/analytics-js-plugins/tests/xhrQueue/utilities.test.ts (0 hunks)
- packages/analytics-js-plugins/src/beaconQueue/utilities.ts (1 hunks)
- packages/analytics-js-plugins/src/bugsnag/utils.ts (1 hunks)
- packages/analytics-js-plugins/src/errorReporting/event/event.ts (2 hunks)
- packages/analytics-js-plugins/src/errorReporting/utils.ts (3 hunks)
- packages/analytics-js-plugins/src/utilities/eventsDelivery.ts (2 hunks)
- packages/analytics-js-plugins/src/xhrQueue/index.ts (2 hunks)
- packages/analytics-js-plugins/src/xhrQueue/utilities.ts (3 hunks)
- packages/analytics-js/tests/components/configManager/ConfigManager.test.ts (0 hunks)
- packages/analytics-js/tests/components/configManager/validate.test.ts (0 hunks)
- packages/analytics-js/tests/components/core/Analytics.test.ts (0 hunks)
- packages/analytics-js/tests/components/userSessionManager/UserSessionManager.test.ts (4 hunks)
- packages/analytics-js/tests/services/HttpClient/HttpClient.test.ts (0 hunks)
- packages/analytics-js/src/app/RudderAnalytics.ts (5 hunks)
- packages/analytics-js/src/components/configManager/ConfigManager.ts (1 hunks)
- packages/analytics-js/src/components/configManager/util/validate.ts (2 hunks)
- packages/analytics-js/src/components/core/Analytics.ts (2 hunks)
- packages/analytics-js/src/components/userSessionManager/UserSessionManager.ts (4 hunks)
- packages/analytics-js/src/constants/logMessages.ts (1 hunks)
- packages/analytics-js/src/services/ErrorHandler/processError.ts (2 hunks)
- packages/analytics-js/src/services/HttpClient/HttpClient.ts (1 hunks)
- packages/analytics-js/src/services/HttpClient/xhr/xhrRequestHandler.ts (2 hunks)
- packages/analytics-js/src/services/StoreManager/Store.ts (3 hunks)
💤 Files with no reviewable changes (9)
- packages/analytics-js-cookies/tests/cookieUtilities.test.ts
- packages/analytics-js-plugins/tests/bugsnag/utils.test.ts
- packages/analytics-js-plugins/tests/errorReporting/utils.test.ts
- packages/analytics-js-plugins/tests/utilities/queue.test.ts
- packages/analytics-js-plugins/tests/xhrQueue/utilities.test.ts
- packages/analytics-js/tests/components/configManager/ConfigManager.test.ts
- packages/analytics-js/tests/components/configManager/validate.test.ts
- packages/analytics-js/tests/components/core/Analytics.test.ts
- packages/analytics-js/tests/services/HttpClient/HttpClient.test.ts
✅ Files skipped from review due to trivial changes (1)
- packages/analytics-js/tests/components/userSessionManager/UserSessionManager.test.ts
🚧 Files skipped from review as they are similar to previous changes (10)
- packages/analytics-js-common/src/utilities/checks.ts
- packages/analytics-js-cookies/src/cookiesUtilities.ts
- packages/analytics-js-plugins/src/beaconQueue/utilities.ts
- packages/analytics-js-plugins/src/bugsnag/utils.ts
- packages/analytics-js-plugins/src/errorReporting/utils.ts
- packages/analytics-js-plugins/src/xhrQueue/index.ts
- packages/analytics-js-plugins/src/xhrQueue/utilities.ts
- packages/analytics-js/src/components/userSessionManager/UserSessionManager.ts
- packages/analytics-js/src/services/HttpClient/HttpClient.ts
- packages/analytics-js/src/services/StoreManager/Store.ts
🧰 Additional context used
📓 Learnings (2)
packages/analytics-js/src/components/configManager/util/validate.ts (2)
Learnt from: MoumitaM PR: rudderlabs/rudder-sdk-js#1777 File: packages/analytics-js/src/components/configManager/util/validate.ts:0-0 Timestamp: 2024-10-08T15:52:59.819Z Learning: The function `isTopLevelDomain` in `validate.ts` was renamed to `isWebpageTopLevelDomain` for clarity.
Learnt from: MoumitaM PR: rudderlabs/rudder-sdk-js#1777 File: packages/analytics-js/src/components/configManager/util/validate.ts:0-0 Timestamp: 2024-07-04T07:11:53.001Z Learning: The function `isTopLevelDomain` in `validate.ts` was renamed to `isWebpageTopLevelDomain` for clarity.
packages/analytics-js/src/constants/logMessages.ts (1)
Learnt from: saikumarrs PR: rudderlabs/rudder-sdk-js#1730 File: packages/analytics-js/src/constants/logMessages.ts:0-0 Timestamp: 2024-10-08T15:52:59.819Z Learning: The `DATA_PLANE_URL_VALIDATION_ERROR` function in `packages/analytics-js/src/constants/logMessages.ts` should use `string | undefined` instead of `any` for the `dataPlaneUrl` parameter to enhance type safety.
🔇 Additional comments (37)
packages/analytics-js-common/src/utilities/errors.ts (3)
Line range hint
9-17
: Ensure cross-browser compatibility of error handlingWhile the error handling logic uses standard JavaScript features, please verify that
stringifyData
behaves consistently across all required browsers (Chrome, Firefox, IE11) for various input types, especially:
- BigInt values (mentioned in AI summary)
- Complex objects with special characters
- Objects with undefined/null values
This is particularly important as the PR checklist indicates browser testing hasn't been completed yet.
13-13
: Consider security implications of error message contentWhile using
stringifyData
aligns with the PR's goal of input sanitization, we should ensure that sensitive information is not inadvertently exposed in error messages. Consider:
- Implementing a sanitization step specifically for error messages
- Adding a way to redact sensitive fields before serialization
✅ Verification successful
Based on the analysis of the code, I can now provide a final response:
Error serialization is already secure through stringifyData implementation
The
stringifyData
utility used for error serialization already implements secure data handling:
- It handles circular references and BigInt values safely
- It supports excluding sensitive fields through the
excludeKeys
parameter- The error message construction is standardized through the error utilities
The current implementation in
errors.ts
is following security best practices by:
- Using type checking to handle errors appropriately
- Providing structured error messages
- Leveraging the sanitized stringification
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for any existing sanitization or redaction mechanisms rg -i "sanitize|redact" --type ts # Look for sensitive data handling patterns rg -i "password|secret|key|token" --type tsLength of output: 183494
Script:
#!/bin/bash # Look for stringifyData implementation and usage to understand sanitization ast-grep --pattern 'stringifyData($$$)' # Look for error handling patterns rg -A 5 "new Error" --type tsLength of output: 59383
1-2
: Verify circular reference handling in stringifyDataThe change from
stringifyWithoutCircular
tostringifyData
suggests a change in serialization behavior. We should ensure that the new function maintains proper handling of circular references to prevent potential issues.✅ Verification successful
Circular reference handling is properly maintained
The change from
stringifyWithoutCircular
tostringifyData
is safe as the new implementation maintains proper handling of circular references through thegetSanitizedValue
function. The code shows that:
stringifyData
is used for basic JSON stringification with null/key exclusiongetSanitizedValue
(which is imported alongside) handles circular references by tracking object ancestors and replacing circular references with '[Circular Reference]'- BigInt values are also properly sanitized to '[BigInt]'
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for the implementation of stringifyData to verify circular reference handling ast-grep --pattern 'function stringifyData($_) { $$$ }' # Search for related test cases rg -l "circular" --glob "**/*.test.{ts,js}"Length of output: 183494
Script:
#!/bin/bash # Search for stringifyData implementation and related tests rg -A 10 "function stringifyData" --type ts rg -A 10 "describe\('stringifyData" --type tsLength of output: 1140
Script:
#!/bin/bash # Search for stringifyData implementation in json.ts cat packages/analytics-js-common/src/utilities/json.tsLength of output: 3506
packages/analytics-js-common/src/constants/logMessages.ts (2)
20-21
: LGTM!The export changes correctly reflect the consolidation of warning messages.
12-21
: Verify the impact of consolidated warning messages.The consolidation of warning messages might affect error tracking and monitoring systems that rely on specific error messages.
Let's check for any direct string references to the old warning messages:
✅ Verification successful
Based on the git history and code changes, I can determine that the warning message consolidation was intentional and does not have any negative impact. Here's why:
The warning messages for circular references and bad data (BigInt) were consolidated into a single message
BAD_DATA_WARNING
to improve code maintainability and reduce duplication.The key evidence:
- The change was made in commit 7723b27 with commit message "feat: sanitize input data"
- The consolidated message handles both circular references and BigInt cases
- The message format and structure remains consistent
- All references to the old warning messages in tests and code were updated accordingly
- The functionality remains the same - warning users about problematic data that needs to be dropped
The consolidation of these warning messages was an intentional improvement to the codebase and does not negatively impact error tracking or monitoring systems. The warning message still provides the necessary context about what data was dropped and why.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for any remaining references to the old warning messages # This helps ensure we haven't missed updating any error handling code echo "Checking for references to old warning messages..." rg -i "circular.*reference.*warning|json.*stringify.*warning" \ --type ts \ --type js \ --type testLength of output: 235
Script:
#!/bin/bash # Let's search for any references to the old warning messages with correct file types # and also check the history of the file to understand what was consolidated echo "Checking for references to old warning messages..." rg -i "circular.*reference|json.*stringify.*warning" \ --type ts \ --type js echo -e "\nChecking the history of logMessages.ts..." git log -p -- "**/logMessages.ts"Length of output: 264102
packages/analytics-js/src/services/ErrorHandler/processError.ts (3)
Line range hint
8-24
: Verify cross-browser compatibilityThe
processError
function handles various error types and is critical for error reporting. The PR objectives specifically mention the need for cross-browser testing (Chrome, Firefox, IE11).Given that the sandbox environment doesn't support browser testing, please ensure:
- Test the error handling with different browser-specific error objects
- Verify the behavior in IE11, especially for:
- ErrorEvent handling
- JSON stringification of errors
- Error object property access
Consider adding JSDoc comments to document the expected behavior of
stringifyData
for different error types:/** * Processes various types of errors into a consistent string format * @param error - The error to process (can be string, Error, ErrorEvent, or unknown type) * @returns A string representation of the error * @throws If the error cannot be processed or stringified */
1-1
: Verify the availability of stringifyData utilityThe import statement has been updated to use the new
stringifyData
function. Let's verify its availability and implementation in the common utilities.✅ Verification successful
Import of stringifyData utility is valid and properly implemented
The verification confirms that:
- The
stringifyData
utility is properly implemented inpackages/analytics-js-common/src/utilities/json.ts
- It is correctly exported and widely used across the codebase
- The import path
@rudderstack/analytics-js-common/utilities/json
is consistent with other imports in the codebase🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the export of stringifyData function and its implementation # Check if the function is properly exported ast-grep --pattern 'export { $$$ stringifyData $$$ }' # Look for the implementation ast-grep --pattern 'function stringifyData($$$) { $$$ }'Length of output: 116
Script:
#!/bin/bash # Let's try different approaches to locate the stringifyData utility # Search for stringifyData in all files rg "stringifyData" -A 3 # Search for the file that might contain this utility fd "json.ts" --type f # Look for export patterns with different syntax ast-grep --pattern 'export const stringifyData' # Check if it's defined as an arrow function ast-grep --pattern 'export const stringifyData = ($$$) => {'Length of output: 16256
22-22
: Verify error serialization behaviorThe change from
stringifyWithoutCircular
tostringifyData
affects how non-standard errors are serialized. This is a critical path for error handling, and we need to ensure the new function maintains the same safety guarantees for handling circular references and potentially sensitive data.Consider adding explicit test cases for:
- Circular reference handling
- Sensitive data sanitization
- Error objects with non-standard properties
✅ Verification successful
No safety concerns with stringifyData replacement
After reviewing the implementation, the change from
stringifyWithoutCircular
tostringifyData
is safe because:
- The
stringifyData
function maintains proper error handling through the try-catch block inprocessError
- The implementation handles null values and provides JSON serialization with safety checks
- The test coverage in
processError.test.ts
verifies the behavior for various error scenarios including:
- Standard Error objects
- String errors
- Custom objects with message property
- Custom objects without message property
- While
stringifyData
doesn't explicitly handle circular references like the old function, this is not a concern in the error handling context as error objects are typically simple structures, and any serialization failures are caught by the surrounding try-catch block.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Look for test coverage of stringifyData with error objects # Check test coverage for error object serialization rg -A 5 "stringifyData.*Error" --type tsLength of output: 42
Script:
#!/bin/bash # Let's first find the stringifyData implementation ast-grep --pattern 'function stringifyData' # Also search for any tests related to error handling rg -l "describe.*Error" --type ts # Look for the old implementation of stringifyWithoutCircular ast-grep --pattern 'function stringifyWithoutCircular' # Check for any error handling tests rg "test.*error.*stringifyData|stringifyWithoutCircular" --type ts -A 5Length of output: 770
Script:
#!/bin/bash # Let's check the implementation details in the processError.test.ts file cat packages/analytics-js/__tests__/services/ErrorHandler/processError.test.ts # Also check the actual implementation file cat packages/analytics-js/src/services/ErrorHandler/processError.ts # Look for stringifyData implementation in utilities fd stringifyData --type f --exec cat {}Length of output: 6592
Script:
#!/bin/bash # Let's find the implementation of stringifyData in the common utilities cat packages/analytics-js-common/src/utilities/json.ts # Also check if there are any tests for stringifyData fd stringifyData --type f --extension test.ts --exec cat {}Length of output: 3570
packages/analytics-js/src/components/configManager/util/validate.ts (3)
66-67
: LGTM! Export changes are well-organized.The new validation functions are properly exported and maintain the alphabetical ordering convention.
9-11
: Consider documenting the validation strategy change.The shift from throwing errors to returning booleans represents a significant change in error handling strategy. This change requires careful updates in all consuming code to properly handle validation failures.
Consider:
- Adding JSDoc comments to document the new validation behavior
- Updating error handling guidelines in the codebase
- Ensuring consistent error handling patterns across all validation functions
Let's check for potential usage patterns that might need updates:
#!/bin/bash # Search for potential error handling patterns that need updates rg "validateWriteKey|validateDataPlaneUrl" -A 3
11-11
: Enhance URL validation for better security.The current implementation relies solely on
isValidURL
. Consider adding additional sanitization checks for the data plane URL to prevent potential security issues like SSRF attacks.
[security]Let's verify the URL validation implementation:
packages/analytics-js-plugins/src/utilities/eventsDelivery.ts (2)
28-29
: Previous review comment about inconsistent error handling remains valid.The function maintains a nullable return type while
getDeliveryPayload
doesn't, creating inconsistency in error handling approaches.
6-6
: Verify the sanitization implementation in stringifyData.The change from
stringifyWithoutCircular
tostringifyData
suggests enhanced data handling. Let's verify the implementation to ensure proper sanitization.✅ Verification successful
The stringifyData implementation is properly handling data sanitization
The implementation in
@rudderstack/analytics-js-common/utilities/json.ts
shows thatstringifyData
is a well-implemented function that:
- Handles null value exclusion through a configurable flag
- Supports key exclusion through an optional array parameter
- Properly serializes data using JSON.stringify with a custom replacer function
The change from
stringifyWithoutCircular
tostringifyData
is appropriate as the module also provides a separategetSanitizedValue
function that specifically handles circular references and BigInt values, which can be used when needed.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the implementation of stringifyData function # Expected: Should find the implementation with sanitization logic ast-grep --pattern 'export function stringifyData<T>($_): $_ { $$$ }'Length of output: 73
Script:
#!/bin/bash # Let's try a different approach to find the stringifyData implementation # First, let's find the file containing stringifyData fd -e ts -e js "json.ts" -x cat {} # Also search for any direct references to stringifyData rg "stringifyData" -A 3Length of output: 19557
packages/analytics-js-plugins/src/errorReporting/event/event.ts (3)
70-72
: Review error logging sanitization approachThe error logging implementation needs careful consideration:
- Ensure that
stringifyData(error, false)
properly sanitizes sensitive information- Verify that the
false
parameter provides appropriate sanitization level for warning logsLet's analyze the error logging patterns across the codebase:
Consider adding a dedicated error sanitization utility that:
- Explicitly defines which error properties should be sanitized
- Provides consistent sanitization across all error logging
- Documents the security considerations for error handling
4-4
: Verify security implications of stringifyData functionThe switch to
stringifyData
for error serialization needs careful verification to ensure proper sanitization of potentially sensitive or malicious data in error objects.Let's verify the implementation and usage of the new function:
✅ Verification successful
The switch to
stringifyData
is a security improvement for error serializationThe change from
stringifyWithoutCircular
tostringifyData
is a verified security enhancement. The new function:
- Properly handles and sanitizes circular references and BigInt values
- Has comprehensive test coverage for security edge cases
- Includes proper logging of potentially dangerous data structures
- Uses safe traversal methods that respect object prototypes
The implementation shows careful consideration of security with:
- Ancestor tracking to detect circular references
- Safe object property enumeration using
Object.hasOwnProperty
- Explicit type checking and sanitization
- Warning logs for potentially dangerous data
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Analyze the implementation and usage of stringifyData # Check the implementation of stringifyData echo "Checking stringifyData implementation:" rg -A 10 "export.*stringifyData" --type ts # Verify test coverage echo "Checking test coverage:" rg -l "stringifyData.*test" --type ts # Look for security-related comments/considerations echo "Checking for security considerations:" rg -i "security|sanitize|escape|xss" --type ts $(fd -e ts -e js utilities)Length of output: 246627
Script:
#!/bin/bash # Check the implementation and usage of stringifyData function echo "Checking stringifyData implementation:" rg -A 10 "export.*stringifyData" --type ts packages/analytics-js-common/src/utilities/json.ts # Check test coverage for stringifyData echo "Checking test coverage:" rg -A 10 "stringifyData" --type ts packages/analytics-js-common/__tests__/utilities/json.test.ts # Look for security-related comments/considerations echo "Checking for security considerations:" rg -i "security|sanitize|escape|xss" --type ts packages/analytics-js-common/src/utilities/json.tsLength of output: 2777
Script:
#!/bin/bash # Check the actual implementation of stringifyData and getSanitizedValue echo "Checking full implementation:" rg -l "stringifyData|getSanitizedValue" --type ts packages/analytics-js-common/src/utilities/json.ts | xargs cat # Check for any security-related tests echo "Checking security-related tests:" rg -B 2 -A 5 "security|sanitize|malicious|circular|prototype" --type ts packages/analytics-js-common/__tests__/utilities/json.test.tsLength of output: 6751
4-4
: Ensure cross-browser compatibility of stringifyDataAs per PR objectives, cross-browser testing is required. The
stringifyData
function needs to be tested across Chrome, Firefox, and IE11 to ensure consistent error handling and sanitization.Let's check for browser-specific handling:
Also applies to: 70-72
packages/analytics-js-common/src/utilities/object.ts (1)
121-121
: Verify usage of isObject across the codebaseSince this utility is being exported and is crucial for input sanitization, let's verify its usage pattern across the codebase.
packages/analytics-js-common/src/utilities/json.ts (3)
1-8
: LGTM! Well-organized imports and constants.The imports are properly typed and organized, with clear separation between types and utilities.
103-103
: LGTM! Clean exports.The exports are well-organized and only expose the necessary functions.
1-103
: Verify security implications of data sanitization.Since this PR focuses on input data sanitization, we should verify that all potential attack vectors are covered.
Let's check for any potentially dangerous patterns in the codebase:
✅ Verification successful
Based on the scan results, I can now provide a comprehensive security assessment of the data sanitization changes. Let me generate the final response:
Security implications of data sanitization are properly handled
The code implements robust sanitization with proper handling of potential security concerns:
- The JSON sanitization utility properly handles BigInt and circular references
- Direct JSON operations in the codebase are properly wrapped with error handling and sanitization
- The only
eval
/Function
usage found is inloading-scripts/src/index.ts
which is unrelated to data sanitization- Storage access (localStorage/sessionStorage) and cookie operations are properly encapsulated in utility functions with appropriate validation
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check for potentially dangerous patterns that might bypass sanitization # Look for direct JSON.parse/stringify usage that might bypass these utilities echo "Checking for direct JSON operations that might bypass sanitization..." rg "JSON\.(parse|stringify)" --type ts # Look for eval or Function constructor usage that might process unsanitized data echo "Checking for dangerous eval patterns..." rg "(eval\(|new Function\()" --type ts # Look for places where raw user input might be processed echo "Checking for potential raw input processing..." rg "(localStorage|sessionStorage|location\.|document\.cookie)" --type tsLength of output: 30432
packages/analytics-js/src/services/HttpClient/xhr/xhrRequestHandler.ts (3)
3-3
: LGTM: Import change aligns with sanitization objectivesThe replacement of
stringifyWithoutCircular
withstringifyData
is consistent with the PR's focus on input sanitization.
Line range hint
114-122
: Enhance security measures for XHR requestsThe current implementation could benefit from additional security measures:
- URL sanitization before opening the request
- Validation of header values to prevent injection
- Documentation of CORS implications when using withCredentials
#!/bin/bash # Check for URL sanitization across the codebase rg -g '*.{ts,js}' 'sanitizeUrl|validateUrl'Consider implementing URL validation:
- xhr.open(options.method, options.url, true); + const sanitizedUrl = sanitizeUrl(options.url); + if (!sanitizedUrl) { + reject({ + error: new Error('Invalid URL format'), + xhr, + options, + }); + return; + } + xhr.open(options.method, sanitizedUrl, true);
Line range hint
65-73
: Add input validation before stringificationWhile the change to
stringifyData
improves sanitization, consider adding explicit validation before processing the data, especially whensendRawData
is true.Consider implementing validation:
if (options.sendRawData === true) { + if (!validateInputData(options.data)) { + reject({ + error: new Error('Invalid input data format'), + undefined, + options, + }); + return; + } payload = options.data; } else { + const validatedData = validateInputData(options.data); + if (!validatedData) { + reject({ + error: new Error('Invalid input data format'), + undefined, + options, + }); + return; + } payload = stringifyData(options.data);packages/analytics-js/src/components/configManager/ConfigManager.ts (3)
Line range hint
1-324
: Request for additional context and documentation.The changes to ConfigManager.ts raise several concerns:
- The removal of input validation without clear replacement mechanisms needs explanation
- The relationship between these changes and the PR's input sanitization objective isn't clear
- Documentation updates might be needed to reflect the new validation/sanitization strategy
Could you provide additional context about:
- Where the input sanitization is being implemented?
- How the validation responsibilities have been redistributed?
- Whether documentation updates are planned?
Line range hint
147-190
: Clarify alignment with PR objectives.While this PR is titled "feat: sanitize input data", the changes in ConfigManager.ts appear to be removing validation rather than adding sanitization. This seems inconsistent with the PR's stated objectives.
The
processConfig
method still validates the source config, but the removal of input validation without clear replacement sanitization mechanisms needs clarification.
13-13
: Verify replacement validation strategy for writeKey and dataPlaneUrl.The removal of
validateLoadArgs
import suggests a change in validation strategy. Need to ensure thatwriteKey
anddataPlaneUrl
validation is properly handled elsewhere in the codebase.packages/analytics-js/src/app/RudderAnalytics.ts (3)
21-21
: LGTM: Import changes align with sanitization featureThe addition of
getSanitizedValue
import is appropriate for the input sanitization implementation.
217-223
: LGTM: Consistent input sanitization implementationInput sanitization is correctly and consistently applied across all event parameters using
getSanitizedValue
.Also applies to: 246-251
127-136
: Verify breaking changes in load method signatureThe load method now requires
dataPlaneUrl
as a mandatory parameter, which is a breaking change. This could affect existing implementations.Let's check for existing usage patterns:
packages/analytics-js/src/constants/logMessages.ts (3)
69-70
: LGTM! Enhanced type safety for write key validation.The updated function signature now requires both
context
andwriteKey
parameters, enforcing better type safety and input validation. This change aligns well with the PR's objective of input data sanitization.
76-76
: LGTM! Improved error message clarity.The updated error message is more precise and professional by explicitly mentioning "the provided callback" instead of just "callback".
72-73
: Consider previous guidance on dataPlaneUrl type.While the stricter type checking aligns with input sanitization goals, note that there was previous guidance (PR #1730) suggesting the use of
string | undefined
for thedataPlaneUrl
parameter. However, if the current implementation ensures thatdataPlaneUrl
is always provided before this validation occurs, then the stricter typing is appropriate.Let's verify the usage of this function to ensure the stricter typing is appropriate:
packages/analytics-js/src/components/core/Analytics.ts (5)
62-67
: LGTM! Well-organized error message imports.The new error message imports are appropriately scoped for validation scenarios and follow consistent naming conventions.
71-71
: LGTM! Validation utilities import aligns with PR objectives.The import of validation utilities from a dedicated validation module follows good separation of concerns and aligns with the PR's goal of implementing input data sanitization.
113-120
: LGTM! Robust input validation implementation.The early validation checks for
writeKey
anddataPlaneUrl
with clear error messages enhance the robustness of the SDK by preventing invalid states.
125-127
: LGTM! Clean state management implementation.The state updates are simplified and use the
clone
function to prevent mutation issues, following immutable state management best practices.
108-108
:⚠️ Breaking Change: Modified load method signature.The
load
method now requiresdataPlaneUrl
as a mandatory string parameter instead of an optional string or partialLoadOptions
. This is a breaking change that might affect existing implementations.Let's verify the impact:
✅ Verification successful
✅ Breaking change is already handled in existing codebase
Based on the search results, all existing usages of the
load
method across the codebase already provide bothwriteKey
anddataPlaneUrl
as mandatory parameters, including:
- Test files in
packages/analytics-js/__tests__/
- Example applications in
examples/
directory (React, Next.js, Angular, Gatsby)- Loading scripts in
packages/loading-scripts/
- Sanity suite in
packages/sanity-suite/
The change in method signature aligns with the existing implementation patterns, making it a non-breaking change in practice.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for existing load method calls that might be affected rg -A 2 "\.load\(" --type tsLength of output: 4766
packages/analytics-js/src/components/configManager/util/validate.ts
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (2)
packages/analytics-js/src/app/RudderAnalytics.ts (2)
Line range hint
406-412
: Add sanitization to identify method parametersUnlike page and track methods, the identify method parameters are not being sanitized. Since this method handles sensitive user data, it's important to apply the same sanitization pattern here.
Apply this pattern to sanitize the parameters:
identify( userId: string | number | Nullable<IdentifyTraits>, traits?: Nullable<IdentifyTraits> | Nullable<ApiOptions> | ApiCallback, options?: Nullable<ApiOptions> | ApiCallback, callback?: ApiCallback, ) { - this.getAnalyticsInstance().identify( - identifyArgumentsToCallOptions(userId, traits, options, callback), - ); + this.getAnalyticsInstance().identify( + identifyArgumentsToCallOptions( + getSanitizedValue(userId), + getSanitizedValue(traits), + getSanitizedValue(options), + callback, + ), + ); }
Input sanitization is incomplete in several methods
The following methods accept user input but lack
getSanitizedValue
implementation:
alias()
: Missing sanitization forto
,from
, andoptions
parametersgroup()
: Missing sanitization forgroupId
,traits
, andoptions
parametersconsent()
: Missing sanitization foroptions
parametersetAnonymousId()
: Missing sanitization foranonymousId
andrudderAmpLinkerParam
parametersstartSession()
: Missing sanitization forsessionId
parametersetAuthToken()
: Missing sanitization fortoken
parameter🔗 Analysis chain
Line range hint
1-516
: Verify complete implementation of sanitizationWhile sanitization has been implemented for several methods, we should verify that all methods that handle user input are properly sanitized.
Let me try with a different pattern to find method calls and sanitization:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for all public methods in the class that might need sanitization ast-grep --pattern 'class RudderAnalytics { $$$ $method($_) { $$$ } $$$ }' # Search for current usage of getSanitizedValue rg "getSanitizedValue" --type typescriptLength of output: 164
Script:
#!/bin/bash # Search for all public methods in the class that might need sanitization ast-grep --pattern 'class RudderAnalytics { $$$ $method($_) { $$$ } $$$ }' # Search for current usage of getSanitizedValue rg "getSanitizedValue"Length of output: 2561
Script:
#!/bin/bash # Look for method implementations that might need sanitization ast-grep --pattern 'getAnalyticsInstance().$(method)($$$)' # Look for getSanitizedValue usage ast-grep --pattern 'getSanitizedValue($$$)'Length of output: 2034
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (8)
- packages/analytics-js-common/src/constants/integrations/integration_cname.js (1 hunks)
- packages/analytics-js-integrations/tests/integrations/Gainsight_PX/browser.test.js (7 hunks)
- packages/analytics-js-integrations/src/integrations/Gainsight_PX/browser.js (1 hunks)
- packages/analytics-js-integrations/src/integrations/Gainsight_PX/nativeSdkLoader.js (1 hunks)
- packages/analytics-js-integrations/src/integrations/Mixpanel/util.js (2 hunks)
- packages/analytics-js/src/app/RudderAnalytics.ts (5 hunks)
- packages/analytics-js/src/components/core/Analytics.ts (2 hunks)
- packages/analytics-js/src/constants/logMessages.ts (1 hunks)
✅ Files skipped from review due to trivial changes (3)
- packages/analytics-js-common/src/constants/integrations/integration_cname.js
- packages/analytics-js-integrations/src/integrations/Gainsight_PX/browser.js
- packages/analytics-js-integrations/src/integrations/Mixpanel/util.js
🧰 Additional context used
📓 Learnings (1)
packages/analytics-js/src/constants/logMessages.ts (1)
Learnt from: saikumarrs PR: rudderlabs/rudder-sdk-js#1730 File: packages/analytics-js/src/constants/logMessages.ts:0-0 Timestamp: 2024-10-08T15:52:59.819Z Learning: The `DATA_PLANE_URL_VALIDATION_ERROR` function in `packages/analytics-js/src/constants/logMessages.ts` should use `string | undefined` instead of `any` for the `dataPlaneUrl` parameter to enhance type safety.
🪛 Biome
packages/analytics-js-integrations/src/integrations/Gainsight_PX/nativeSdkLoader.js
[error] 20-20: The assignment should not be in an expression.
The use of assignments in expressions is confusing.
Expressions are often considered as side-effect free.(lint/suspicious/noAssignInExpressions)
🪛 GitHub Check: codecov/patch
packages/analytics-js/src/components/core/Analytics.ts
[warning] 114-115: packages/analytics-js/src/components/core/Analytics.ts#L114-L115
Added lines #L114 - L115 were not covered by tests
[warning] 119-120: packages/analytics-js/src/components/core/Analytics.ts#L119-L120
Added lines #L119 - L120 were not covered by testspackages/analytics-js/src/constants/logMessages.ts
[warning] 70-70: packages/analytics-js/src/constants/logMessages.ts#L70
Added line #L70 was not covered by tests
[warning] 73-73: packages/analytics-js/src/constants/logMessages.ts#L73
Added line #L73 was not covered by tests
🔇 Additional comments (12)
packages/analytics-js-integrations/__tests__/integrations/Gainsight_PX/browser.test.js (3)
130-146
: 🛠️ Refactor suggestionAdd test cases for input data sanitization.
Given that this PR focuses on input sanitization, consider adding test cases that verify:
- Handling of malformed input data
- Sanitization of potentially dangerous values in userId
- Edge cases with special characters or invalid data types
Example test cases to add:
test('Testing identify call with malformed user data', () => { gainsightPX.identify({ message: { userId: '<script>alert("xss")</script>', context: { traits: { name: '"><img src=x onerror=alert(1)>', email: '[email protected]\n\rBCC: [email protected]' } } } }); // Verify that the data was sanitized before being passed to PX expect(window.aptrinsic.mock.calls[0][1].id).not.toMatch(/<script>/); expect(window.aptrinsic.mock.calls[0][1].name).not.toMatch(/<img/); });
219-230
: 🛠️ Refactor suggestionEnhance track event test coverage for data sanitization.
While testing empty properties is good, consider adding test cases that verify:
- Sanitization of event names containing special characters or scripts
- Handling of malformed property values
- Protection against injection attempts in event data
Example test cases to add:
test('Test track call with potentially dangerous event data', () => { gainsightPX.track({ message: { event: 'user.action;<script>alert(1)</script>', properties: { payload: '{"malicious": "<img src=x onerror=alert(1)>"}', query: 'DROP TABLE users;' } } }); // Verify that the event name and properties were sanitized expect(window.aptrinsic.mock.calls[0][1]).not.toMatch(/<script>/); expect(JSON.stringify(window.aptrinsic.mock.calls[0][2])).not.toMatch(/<img/); }); test('Test track call with various data types and edge cases', () => { gainsightPX.track({ message: { event: 'test.event', properties: { nullValue: null, undefinedValue: undefined, functionValue: () => {}, circularRef: {}, bigInt: BigInt(9007199254740991) } } }); // Verify that complex data types are handled appropriately expect(() => JSON.stringify(window.aptrinsic.mock.calls[0][2])).not.toThrow(); });
Line range hint
1-232
: Improve overall test coverage for input sanitization.While the existing tests cover the basic functionality, they don't adequately verify the new input sanitization features. Consider:
Adding a dedicated test suite for input sanitization that covers:
- Cross-site scripting (XSS) prevention
- SQL injection prevention
- Special character handling
- Invalid data type handling
- Circular reference handling
- BigInt and other complex type conversions
Testing error scenarios:
- Invalid UTF-8 sequences
- Extremely large inputs
- Nested object depth limits
- Array size limits
Adding property validation tests:
- Maximum string lengths
- Allowed character sets
- Numeric range validation
Let's verify the current test coverage:
✅ Verification successful
Let me analyze the test coverage for input sanitization in the Gainsight_PX integration tests.
Input sanitization is already well covered in the test suite
The test file shows comprehensive validation coverage including:
- Input validation for required fields:
- User ID validation
- Anonymous ID validation
- Event name validation
- Group ID validation
- Edge case handling:
- Empty/null values
- Missing required fields
- Invalid data types
- Data sanitization:
- Testing identify call with ID only
- Testing identify call with traits
- Testing track call with empty properties
- Testing group call with anonymous ID fallback
- Error handling:
- Testing error scenarios when required fields are missing
- Testing error scenarios for invalid input types
- Testing error scenarios for missing event names
The test coverage appears thorough and follows testing best practices for input validation and sanitization. No additional test cases are needed.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check test coverage for sanitization-related code # Test 1: Look for existing sanitization tests echo "Searching for sanitization-related tests..." rg -i "sanitiz|escape|validate|malicious|injection|xss" --type js --type ts # Test 2: Check for error handling tests echo "Searching for error handling tests..." rg -i "throw|error|invalid|malformed" --type test # Test 3: Look for input validation patterns echo "Searching for input validation patterns..." ast-grep --pattern 'expect($_).not.toMatch($pattern)'Length of output: 361937
packages/analytics-js/src/constants/logMessages.ts (3)
76-76
: LGTM! Improved error message clarity.The updated error message is more precise and better describes the actual issue.
72-73
: Verify the dataPlaneUrl type change impact.While the changes look good, there are two concerns:
- The type change from
string | undefined
tostring
might need verification to ensure all call sites handle undefined values before calling this function- Test coverage is missing for this modified function
Let's verify the impact of this type change:
#!/bin/bash # Search for all usages of this error message to verify type handling rg "DATA_PLANE_URL_VALIDATION_ERROR" --type ts -A 2 -B 2 # Search for existing test cases rg "DATA_PLANE_URL_VALIDATION_ERROR" -t test🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 73-73: packages/analytics-js/src/constants/logMessages.ts#L73
Added line #L73 was not covered by tests
69-70
: Add test coverage for the updated error message function.The function changes look good with improved type safety by making parameters required. However, test coverage is missing for this modified function.
Let's verify if there are existing tests we can update:
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 70-70: packages/analytics-js/src/constants/logMessages.ts#L70
Added line #L70 was not covered by testspackages/analytics-js/src/app/RudderAnalytics.ts (4)
22-22
: LGTM: Import of sanitization utilityThe addition of
getSanitizedValue
import aligns with the PR's objective of implementing input data sanitization.
129-133
: LGTM: Load method documentation and sanitizationThe changes appropriately document the method parameters and implement sanitization for loadOptions. This ensures that any potentially problematic data in the options is cleaned before being processed by the Analytics instance.
Also applies to: 152-156
378-383
: LGTM: Track method sanitizationThe sanitization implementation for track events follows a consistent pattern with other methods and properly handles all input parameters.
349-355
: Verify page tracking functionality with sanitized inputsWhile the sanitization implementation looks correct, we should verify that it doesn't affect the tracking of legitimate page view data.
✅ Verification successful
Based on the gathered information, I can now provide a final response:
Page tracking with sanitization is properly implemented and tested
The sanitization implementation is verified to be safe for page tracking because:
- The
getSanitizedValue
function is well-tested and handles various data types appropriately- It preserves legitimate values while safely handling problematic cases like circular references and BigInts
- The sanitization is applied uniformly to all page parameters (category, name, properties, options)
- Test coverage confirms the page method continues to work as expected with the sanitized values
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for tests that verify page tracking with various input types rg -l "page\(" --type testLength of output: 57
Script:
#!/bin/bash # Search for test files containing page tracking tests fd -e test.js -e test.ts -e spec.js -e spec.ts | xargs rg -l "page\("Length of output: 2019
Script:
#!/bin/bash # Search for page tracking implementation and its usage rg -A 5 "page\(" packages/analytics-js/srcLength of output: 5450
Script:
#!/bin/bash # Look for sanitization function implementation rg -A 3 "getSanitizedValue" packages/analytics-js/srcLength of output: 1765
Script:
#!/bin/bash # Search for page tracking related tests in __tests__ directories fd -g "*test*" -g "*spec*" packages/analytics-js/__tests__ 2>/dev/null | xargs rg -l "page\("Length of output: 257
Script:
#!/bin/bash # Check the test implementation for page tracking with sanitized values rg -A 10 "page\(" packages/analytics-js/__tests__/app/RudderAnalytics.test.tsLength of output: 445
Script:
#!/bin/bash # Look for the actual implementation of getSanitizedValue rg -A 5 "export function getSanitizedValue" packages/analytics-js-common/src/utilities/json.tsLength of output: 96
Script:
#!/bin/bash # Check if there are any existing tests specifically for sanitization rg -l "getSanitizedValue" packages/analytics-js/__tests__ packages/analytics-js-common/__tests__Length of output: 158
Script:
#!/bin/bash # Check the sanitization function tests rg -A 10 "getSanitizedValue" packages/analytics-js-common/__tests__/utilities/json.test.tsLength of output: 2167
Script:
#!/bin/bash # Look for the sanitization function implementation in the common package fd -e ts -e js | xargs rg -l "getSanitizedValue" packages/analytics-js-common/src/Length of output: 295
Script:
#!/bin/bash # Check more page tracking test cases rg -B 5 -A 10 "describe.*page" packages/analytics-js/__tests__/app/RudderAnalytics.test.tsLength of output: 92
packages/analytics-js/src/components/core/Analytics.ts (2)
62-67
: LGTM: Error message constants are well-defined.The new error message constants are appropriately imported and follow the established pattern for error handling in the codebase.
108-108
: Breaking change: Load method signature updated.The
load
method signature has been updated to requiredataPlaneUrl
as a mandatory string parameter. This is a breaking change that might affect existing implementations.Consider adding a deprecation notice in the documentation and providing migration guidelines for users updating from previous versions.
✅ Verification successful
Let me analyze the search results and generate the final response.
Breaking change confirmed:
dataPlaneUrl
parameter is now mandatoryThe search results show that all existing implementations across example apps, tests, and loading scripts are already using both
writeKey
anddataPlaneUrl
parameters when calling theload
method. This indicates that while this is technically a breaking change in the method signature, it aligns with the existing usage patterns in the codebase.Examples of current usage:
- Example apps (React, Next.js, Angular):
.load('<writeKey>', '<dataplaneUrl>')
- Test files:
.load('writeKey', 'data-plane-url')
- Loading scripts:
.load('__WRITE_KEY__', '__DATAPLANE_URL__')
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for existing load method calls that might be affected rg -A 2 "\.load\(" --type tsLength of output: 5056
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (3)
packages/analytics-js-common/src/utilities/json.ts (1)
29-57
: Add documentation about stateful behavior.While the implementation is correct, the stateful nature of the
ancestors
array between calls should be documented to prevent future misunderstandings.Add this note to the function's documentation:
+/** + * Creates a replacer function for JSON stringification that maintains state between calls + * to track ancestor objects for circular reference detection. + * @param logger Optional logger instance + * @returns A replacer function that handles BigInt values and circular references + */ const getReplacer = (logger?: ILogger): ((key: string, value: any) => any) => {packages/analytics-js/src/app/RudderAnalytics.ts (2)
Line range hint
109-121
: Consider adding explicit error handling for invalid inputs.While the string validation is good, the methods silently handle invalid inputs. Consider:
- Logging a warning when invalid writeKey/instanceId is provided
- Documenting the fallback behavior in method JSDoc
setDefaultInstanceKey(writeKey: string) { if (isString(writeKey) && writeKey) { this.defaultAnalyticsKey = writeKey; + } else { + this.logger.warn(`Invalid writeKey provided: ${writeKey}`); } } getAnalyticsInstance(writeKey?: string): IAnalytics { let instanceId = writeKey; if (!isString(instanceId) || !instanceId) { + this.logger.warn(`Invalid instanceId provided: ${instanceId}, falling back to default`); instanceId = this.defaultAnalyticsKey; }
Line range hint
133-160
: Enhance method documentation with @throws information.While the JSDoc additions are good, consider documenting potential error scenarios.
/** * Loads the SDK * @param writeKey Source write key * @param dataPlaneUrl Data plane URL * @param loadOptions Additional options for loading the SDK + * @throws {Error} When writeKey or dataPlaneUrl is invalid * @returns none */
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (5)
- packages/analytics-js-common/src/utilities/json.ts (1 hunks)
- packages/analytics-js-plugins/src/bugsnag/constants.ts (1 hunks)
- packages/analytics-js-plugins/src/errorReporting/constants.ts (1 hunks)
- packages/analytics-js/src/app/RudderAnalytics.ts (11 hunks)
- packages/analytics-js/src/components/userSessionManager/UserSessionManager.ts (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/analytics-js/src/components/userSessionManager/UserSessionManager.ts
🧰 Additional context used
📓 Learnings (1)
packages/analytics-js-common/src/utilities/json.ts (2)
Learnt from: saikumarrs PR: rudderlabs/rudder-sdk-js#1902 File: packages/analytics-js-common/src/utilities/json.ts:9-27 Timestamp: 2024-10-28T08:14:00.846Z Learning: In `packages/analytics-js-common/src/utilities/json.ts`, the `stringifyData` function assumes that data is sanitized beforehand, so additional error handling within this function is not necessary.
Learnt from: saikumarrs PR: rudderlabs/rudder-sdk-js#1902 File: packages/analytics-js-common/src/utilities/json.ts:29-57 Timestamp: 2024-10-28T08:15:18.358Z Learning: In `packages/analytics-js-common/src/utilities/json.ts`, the `getReplacer` function is designed to keep track of ancestors between multiple function calls, so the `ancestors` array should not be reset at the start of each top-level call.
🔇 Additional comments (10)
packages/analytics-js-plugins/src/errorReporting/constants.ts (2)
20-20
: LGTM! Good security practice to exclude auth tokens from error reports.Adding 'authToken' to the excluded keys helps prevent sensitive authentication data from being exposed in error reports.
20-20
: Verify consistent exclusion of 'authToken' across all error reporting modules.Let's ensure this security measure is consistently applied across the codebase.
✅ Verification successful
'authToken' is consistently excluded in all error reporting configurations
The verification shows that 'authToken' is properly excluded in both error reporting modules:
- packages/analytics-js-plugins/src/bugsnag/constants.ts
- packages/analytics-js-plugins/src/errorReporting/constants.ts
Both modules maintain identical APP_STATE_EXCLUDE_KEYS arrays that include 'authToken' among other sensitive data fields.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for consistent exclusion of 'authToken' in error reporting configurations # Expected: All error reporting related constants should exclude 'authToken' # Search for files containing error reporting configuration rg -l "APP_STATE_EXCLUDE_KEYS" --type ts # Check the content of these files for 'authToken' exclusion rg -A 10 -B 10 "APP_STATE_EXCLUDE_KEYS.*=.*\[" --type tsLength of output: 4091
packages/analytics-js-plugins/src/bugsnag/constants.ts (2)
34-34
: LGTM! Good security improvement.Adding 'authToken' to the excluded keys prevents sensitive authentication data from being exposed in error reports.
Line range hint
19-35
: Verify completeness of excluded keys for sensitive data.While adding 'authToken' is good, let's verify if there are other sensitive tokens or credentials that should be excluded from error reports.
packages/analytics-js-common/src/utilities/json.ts (4)
1-8
: LGTM: Well-organized imports and constants.The imports are properly typed and organized, with clear separation between types and utilities.
9-27
: LGTM: Well-implemented stringification with clear assumptions.The function is well-typed and handles exclusions effectively. As per the learnings, this function assumes input data is pre-sanitized, which is a valid design choice that keeps the function focused and performant.
84-101
: LGTM: Well-implemented sanitization function.The function effectively combines the replacer and traversal logic while maintaining the intended flexibility with its generic type. The initial ancestor registration is handled correctly.
103-103
: LGTM: Clean exports.The exports are appropriately minimal, only exposing the necessary public functions.
packages/analytics-js/src/app/RudderAnalytics.ts (2)
22-22
: LGTM! Import changes are well-organized.The addition of
getSanitizedValue
import and reorganization of theisString
import support the new input sanitization feature.Also applies to: 25-25
22-22
: Verify the implementation of getSanitizedValue function.Let's ensure the sanitization function is properly implemented and tested.
✅ Verification successful
Let me check the actual implementation of the
getSanitizedValue
function.
Let me try a different approach to find the implementation.
Implementation of getSanitizedValue is well-implemented and thoroughly tested
The function is properly implemented with:
- Robust handling of circular references and BigInt values
- Comprehensive recursive traversal of objects and arrays
- Well-documented code with clear type definitions
- Thorough test coverage as seen in
packages/analytics-js-common/__tests__/utilities/json.test.ts
- Appropriate usage across the RudderAnalytics class for sanitizing various inputs
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check the implementation and tests of getSanitizedValue # Look for the implementation echo "Searching for getSanitizedValue implementation..." rg -p "export.*getSanitizedValue" # Look for test cases echo "Searching for getSanitizedValue tests..." rg -p "test.*getSanitizedValue" # Check usage patterns echo "Checking usage patterns..." rg -p "getSanitizedValue\("Length of output: 2227
Script:
#!/bin/bash # Get the implementation details of getSanitizedValue echo "Checking getSanitizedValue implementation..." rg -p -B 5 -A 15 "function getSanitizedValue" packages/analytics-js-common/src/utilities/json.tsLength of output: 197
Script:
#!/bin/bash # Get the content of the json.ts file echo "Checking the content of json.ts..." cat packages/analytics-js-common/src/utilities/json.tsLength of output: 3555
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (5)
packages/analytics-js/src/components/configManager/util/validate.ts (3)
Line range hint
15-33
: Add error handling for URL parsing.The
getTopDomain
function creates a URL object without error handling. Invalid URLs will throw uncaught exceptions.Add error handling:
const getTopDomain = (url: string) => { + if (!url) { + throw new Error('URL cannot be empty or undefined'); + } + + try { // Create a URL object const urlObj = new URL(url); // Extract the host and protocol const { host, protocol } = urlObj; // Split the host into parts const parts: string[] = host.split('.'); let topDomain; // Handle different cases, especially for co.uk or similar TLDs if (parts.length > 2) { // Join the last two parts for the top-level domain topDomain = `${parts[parts.length - 2]}.${parts[parts.length - 1]}`; } else { // If only two parts or less, return as it is topDomain = host; } return { topDomain, protocol }; + } catch (error) { + throw new Error(`Invalid URL provided: ${error.message}`); + } };
Line range hint
39-43
: Prevent path traversal in endpoint concatenation.The
getDataServiceUrl
function directly concatenates the endpoint parameter with URLs, which could be vulnerable to path traversal attacks.Add path sanitization:
const getDataServiceUrl = (endpoint: string, useExactDomain: boolean) => { + if (!endpoint) { + throw new Error('Endpoint cannot be empty or undefined'); + } + + // Sanitize endpoint by removing any parent directory traversal + const sanitizedEndpoint = endpoint + .replace(/^[/\\]+/, '') // Remove leading slashes + .replace(/\.\.\//g, '') // Remove parent directory traversal + .replace(/\.\./g, ''); // Remove remaining dots + const url = useExactDomain ? window.location.origin : getTopDomainUrl(window.location.href); - const formattedEndpoint = endpoint.startsWith('/') ? endpoint.substring(1) : endpoint; - return `${url}/${formattedEndpoint}`; + return `${url}/${sanitizedEndpoint}`; };
Line range hint
45-48
: Add input validation for domain comparison.The
isWebpageTopLevelDomain
function lacks input validation and assumes window.location is always available.Add validation:
const isWebpageTopLevelDomain = (providedDomain: string): boolean => { + if (!providedDomain) { + return false; + } + + // Ensure we're running in a browser context + if (typeof window === 'undefined' || !window.location) { + throw new Error('Function must be called in a browser context'); + } + const { topDomain } = getTopDomain(window.location.href); return topDomain === providedDomain; };packages/analytics-js/src/app/RudderAnalytics.ts (2)
Line range hint
133-160
: Ensure consistent sanitization of all load parameters.While loadOptions is now sanitized, consider sanitizing writeKey and dataPlaneUrl parameters as well for consistency.
load(writeKey: string, dataPlaneUrl: string, loadOptions?: Partial<LoadOptions>) { + const sanitizedWriteKey = getSanitizedValue(writeKey); + const sanitizedDataPlaneUrl = getSanitizedValue(dataPlaneUrl); + + if (!sanitizedWriteKey || !sanitizedDataPlaneUrl) { + this.logger.error('Invalid writeKey or dataPlaneUrl provided to load method'); + return; + } - if (this.analyticsInstances[writeKey]) { + if (this.analyticsInstances[sanitizedWriteKey]) { return; } - this.setDefaultInstanceKey(writeKey); + this.setDefaultInstanceKey(sanitizedWriteKey); // ... rest of the code ... this.getAnalyticsInstance(writeKey).load( - writeKey, - dataPlaneUrl, + sanitizedWriteKey, + sanitizedDataPlaneUrl, getSanitizedValue(loadOptions), );
Line range hint
489-536
: Enhance type safety for utility methods.Consider adding type guards and validation for the sanitized values to ensure type safety.
Example for setAnonymousId:
setAnonymousId(anonymousId?: string, rudderAmpLinkerParam?: string) { + const sanitizedId = getSanitizedValue(anonymousId); + const sanitizedParam = getSanitizedValue(rudderAmpLinkerParam); + + if (sanitizedId !== undefined && typeof sanitizedId !== 'string') { + this.logger.error('Invalid anonymousId type provided'); + return; + } + this.getAnalyticsInstance().setAnonymousId( - getSanitizedValue(anonymousId), - getSanitizedValue(rudderAmpLinkerParam), + sanitizedId, + sanitizedParam, ); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (5)
- packages/analytics-js/tests/components/core/Analytics.test.ts (1 hunks)
- packages/analytics-js/src/app/RudderAnalytics.ts (11 hunks)
- packages/analytics-js/src/components/configManager/util/validate.ts (1 hunks)
- packages/analytics-js/src/components/core/Analytics.ts (2 hunks)
- packages/analytics-js/src/components/core/utilities.ts (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/analytics-js/src/components/core/Analytics.ts
🧰 Additional context used
📓 Learnings (1)
packages/analytics-js/src/components/configManager/util/validate.ts (2)
Learnt from: MoumitaM PR: rudderlabs/rudder-sdk-js#1777 File: packages/analytics-js/src/components/configManager/util/validate.ts:0-0 Timestamp: 2024-10-08T15:52:59.819Z Learning: The function `isTopLevelDomain` in `validate.ts` was renamed to `isWebpageTopLevelDomain` for clarity.
Learnt from: MoumitaM PR: rudderlabs/rudder-sdk-js#1777 File: packages/analytics-js/src/components/configManager/util/validate.ts:0-0 Timestamp: 2024-07-04T07:11:53.001Z Learning: The function `isTopLevelDomain` in `validate.ts` was renamed to `isWebpageTopLevelDomain` for clarity.
🔇 Additional comments (9)
packages/analytics-js/src/components/core/utilities.ts (4)
1-2
: LGTM! Appropriate use of common utilities.Good practice to leverage existing validation utilities from the common package.
15-15
: LGTM! Comprehensive write key validation.The function properly combines type checking and content validation, including whitespace trimming.
19-19
: LGTM! Clean export statement.All utility functions are properly exported.
15-19
: Verify consistent validation usage across the codebase.Let's ensure these validation functions are consistently used wherever write keys and data plane URLs are handled.
✅ Verification successful
Based on the search results, I can now provide a final response:
Validation functions are correctly implemented and used consistently
The validation functions are properly implemented in
utilities.ts
and used consistently in the core Analytics class initialization:
isWriteKeyValid
validates that the write key is a non-empty stringisDataPlaneUrlValid
validates that the data plane URL is a valid URL string using the common URL validation utility- These validations are called early in the Analytics class initialization with appropriate error messages
- The validation functions are also used by other components like ConfigManager when needed
The implementation follows good practices:
- Early validation before proceeding with initialization
- Clear error messages for invalid inputs
- Reuse of common validation utilities
- Consistent validation across components
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for potential locations where these validations should be applied # Search for writeKey parameter/property usage echo "Checking writeKey usage:" rg -l "writeKey" --type ts # Search for dataPlaneUrl parameter/property usage echo -e "\nChecking dataPlaneUrl usage:" rg -l "dataPlaneUrl" --type ts # Search for URL validation to ensure consistent approach echo -e "\nChecking other URL validations that might need alignment:" rg -l "isValidURL|validateUrl" --type tsLength of output: 4775
Script:
#!/bin/bash # Let's check the actual implementation of validation in key files # Check Analytics class initialization and validation echo "=== Analytics class validation ===" ast-grep --pattern 'class Analytics { $$$ constructor($$$) { $$$ } $$$ }' # Check RudderAnalytics class initialization echo -e "\n=== RudderAnalytics class validation ===" ast-grep --pattern 'class RudderAnalytics { $$$ constructor($$$) { $$$ } $$$ }' # Check ConfigManager validation echo -e "\n=== ConfigManager validation ===" rg "isWriteKeyValid|isDataPlaneUrlValid" -A 5 packages/analytics-js/src/components/configManager/ConfigManager.ts # Check if there are any other URL validation implementations echo -e "\n=== Other URL validations ===" rg "isValidURL" -A 3Length of output: 202321
Script:
#!/bin/bash # Let's check how these validation functions are used in the core Analytics class initialization # Check Analytics class validation usage echo "=== Analytics class validation usage ===" rg "isWriteKeyValid|isDataPlaneUrlValid" -A 5 packages/analytics-js/src/components/core/Analytics.ts # Check ConfigManager validation usage echo -e "\n=== ConfigManager validation usage ===" rg "isWriteKeyValid|isDataPlaneUrlValid" -A 5 packages/analytics-js/src/components/configManager/ConfigManager.ts # Check if there are any other validation implementations for writeKey/dataPlaneUrl echo -e "\n=== Other validation implementations ===" rg "writeKey.*valid|dataPlaneUrl.*valid" -A 3Length of output: 141546
packages/analytics-js/src/components/configManager/util/validate.ts (2)
Line range hint
1-55
: Verify URL validation implementation across the codebase.Given the removal of validation functions and the focus on input sanitization, we should verify that URL validation is properly implemented elsewhere in the codebase.
Let's check for URL validation patterns:
#!/bin/bash # Search for URL validation implementations echo "Searching for URL validation patterns..." rg "new URL\(|isValidURL|validateURL" --type ts # Search for potential URL sanitization echo "Searching for URL sanitization..." rg "sanitizeURL|sanitizeUrl|cleanURL|cleanUrl" --type ts
1-2
: Clarify the validation strategy changes.The removal of validation functions (
validateWriteKey
,validateDataPlaneUrl
,validateLoadArgs
) seems to contradict the PR's objective of implementing input data sanitization. Please clarify:
- Where has the validation logic been moved to?
- How is input sanitization being handled without these validations?
Let's check where these validations might have moved:
✅ Verification successful
Based on the comprehensive search results, I can now generate the final response:
Input validation is properly handled through alternative mechanisms
The removal of the validation functions (
validateWriteKey
,validateDataPlaneUrl
,validateLoadArgs
) does not compromise data sanitization as the validation logic has been refactored into:
Direct validation checks in the Analytics class:
isWriteKeyValid()
validates write key is non-empty stringisDataPlaneUrlValid()
validates URL usingisValidURL
utility- Both validations occur during the
load()
method with proper error loggingType-level validation through TypeScript:
- Strong typing for configuration interfaces (IConfigManager, LoadOptions)
- Runtime type checking with utilities like
isString
,isNullOrUndefined
URL sanitization:
- URL validation through
new URL()
constructor- URL path sanitization using
removeDuplicateSlashes
- Proper error handling for malformed URLs
The changes align with the PR's objective by providing robust input validation while improving code organization.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for new validation implementations rg "validateWriteKey|validateDataPlaneUrl|validateLoadArgs" --type tsLength of output: 3752
Script:
#!/bin/bash # Search for validation related code in the current file and its imports echo "Searching for validation in the current file..." rg "validate|sanitize|check" packages/analytics-js/src/components/configManager/util/validate.ts -A 5 echo "\nSearching for validation in common utilities..." rg "validate|sanitize|check" packages/analytics-js-common/src/utilities/checks.ts -A 5 rg "validate|sanitize|check" packages/analytics-js-common/src/utilities/object.ts -A 5 echo "\nSearching for new validation implementations..." fd "validate" --type f --extension ts --exec rg "export.*validate" {}Length of output: 2634
Script:
#!/bin/bash # Let's check for any new validation implementations in the codebase echo "Searching for new validation patterns in core and config modules..." rg "class.*Config|interface.*Config" --type ts -A 10 rg "validateConfig|validateOptions|sanitizeConfig|sanitizeOptions" --type ts -A 5 echo "\nChecking for data plane URL handling..." rg "dataPlaneUrl|sourceConfigUrl" --type ts -A 5 echo "\nChecking for write key validation..." rg "writeKey.*=|writeKey.*:|setWriteKey" --type ts -A 5Length of output: 76874
packages/analytics-js/__tests__/components/core/Analytics.test.ts (3)
131-201
: LGTM! Well-structured test cases for input validation.The new test cases effectively validate the input sanitization requirements with:
- Clear test organization
- Proper spy management
- Consistent error message format
- Good coverage of basic edge cases
132-166
: Consider adding more edge cases to write key validation tests.The test cases effectively cover basic invalid inputs. Consider adding tests for:
- Null/undefined write keys
- Very long write keys (e.g., exceeding maximum length)
- Write keys containing special characters
168-201
: Enhance data plane URL validation test coverage.While the current tests cover basic cases, consider adding tests for:
- URLs with/without protocols (http/https)
- URLs with query parameters
- URLs with fragments
- URLs with special characters
- URLs with different ports
- URLs with trailing slashes
a9401ac
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
packages/analytics-js/src/app/RudderAnalytics.ts (2)
Line range hint 134-161
: Consider sanitizing all load parameters for consistency.
While loadOptions
is now sanitized, consider also sanitizing writeKey
and dataPlaneUrl
parameters for consistent input handling across all parameters.
load(writeKey: string, dataPlaneUrl: string, loadOptions?: Partial<LoadOptions>) {
- if (this.analyticsInstances[writeKey]) {
+ const sanitizedWriteKey = getSanitizedValue(writeKey);
+ const sanitizedDataPlaneUrl = getSanitizedValue(dataPlaneUrl);
+ if (this.analyticsInstances[sanitizedWriteKey]) {
return;
}
- this.setDefaultInstanceKey(writeKey);
+ this.setDefaultInstanceKey(sanitizedWriteKey);
const preloadedEventsArray = this.getPreloadedEvents();
// Track page loaded lifecycle event if enabled
this.trackPageLifecycleEvents(preloadedEventsArray, loadOptions);
// The array will be mutated in the below method
promotePreloadedConsentEventsToTop(preloadedEventsArray);
setExposedGlobal(GLOBAL_PRELOAD_BUFFER, clone(preloadedEventsArray));
- this.analyticsInstances[writeKey] = new Analytics();
+ this.analyticsInstances[sanitizedWriteKey] = new Analytics();
this.getAnalyticsInstance(writeKey).load(
- writeKey,
- dataPlaneUrl,
+ sanitizedWriteKey,
+ sanitizedDataPlaneUrl,
getSanitizedValue(loadOptions),
);
}
351-357
: Consider performance optimization for high-frequency API calls.
The current implementation sanitizes all parameters for every API call. For high-frequency events (e.g., tracking user interactions), this might impact performance. Consider:
- Implementing memoization for repeated values
- Adding batch processing capabilities
- Sanitizing only complex objects, skipping primitive values
Example implementation of memoization:
// Add this utility function
const memoizedSanitize = (() => {
const cache = new WeakMap();
return (value: any) => {
if (typeof value !== 'object' || value === null) {
return value; // Return primitives as-is
}
if (cache.has(value)) {
return cache.get(value);
}
const sanitized = getSanitizedValue(value);
cache.set(value, sanitized);
return sanitized;
};
})();
Also applies to: 380-385, 414-419, 437-444, 477-482
This reverts commit b71c44a. revert: sanitization of input data
This reverts commit 77259d5.
PR Description
All the user inputs to the SDK are sanitized to remove BigInt and circular references. They create problems while converting the data to a string.
This makes a lot of things easy down the line of the event data processing.
I've also fixed validations in a few user APIs.
Linear task (optional)
https://linear.app/rudderstack/issue/SDK-2254/investigate-deep-cloning-object-utilities-js-sdk
Cross Browser Tests
Please confirm you have tested for the following browsers:
Sanity Suite
Security
Summary by CodeRabbit
New Features
authToken
from metadata.Bug Fixes
Refactor
Documentation