Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: Add support for user-defined HTTP headers in NetworkReader. #568

Merged
merged 30 commits into from
Oct 31, 2024

Conversation

anlowee
Copy link
Contributor

@anlowee anlowee commented Oct 30, 2024

Description

Modified constructors of NetworkReader, DownloaderThread and CurlDownloadHandler to support custom headers passed by users.

Note that this PR is the same as #564 , which was messed by a newbie contributor.

Validation performed

Added the Authorization header with a Bearer token to fetch a file from remote, validating the response.

Summary by CodeRabbit

  • New Features

    • Enhanced the CurlDownloadHandler and NetworkReader classes to support custom HTTP headers in download requests.
    • Users can now pass key-value pairs for HTTP headers, improving request customization.
  • Bug Fixes

    • Improved error handling for reserved headers and invalid header formats.
  • Tests

    • Added new test cases to validate functionality with valid and illegal HTTP header key-value pairs.

Copy link
Contributor

coderabbitai bot commented Oct 30, 2024

Walkthrough

The changes involve modifications to the CurlDownloadHandler and NetworkReader classes to support custom HTTP headers during download requests. The constructors of both classes have been updated to include a new optional parameter for HTTP headers, which is processed and validated. Additionally, new test cases have been introduced to ensure the correct handling of valid and invalid HTTP headers. The updates maintain existing functionalities while enhancing the ability to customize HTTP requests.

Changes

File Change Summary
components/core/src/clp/CurlDownloadHandler.cpp - Constructor updated to include http_header_kv_pairs parameter for custom HTTP headers.
- Logic added to process headers, check for reserved headers, and throw exceptions if necessary.
- Updated header appending logic.
components/core/src/clp/CurlDownloadHandler.hpp - Added include statements for <optional> and <unordered_map>.
- Constructor signature updated to include http_header_kv_pairs with documentation reflecting this change.
components/core/src/clp/NetworkReader.cpp - Constructor updated to include http_header_kv_pairs parameter.
- DownloaderThread constructor modified to accept the new parameter.
- Adjustments made to curl_write_callback for better data handling.
components/core/src/clp/NetworkReader.hpp - Constructor signature updated to include http_header_kv_pairs.
- DownloaderThread constructor modified to accept http_header_kv_pairs.
components/core/tests/test-NetworkReader.cpp - Added test cases for valid and illegal HTTP header key-value pairs to validate header handling in NetworkReader.
- Included necessary headers for new test cases.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more comments to apply before we merge. Most of the comments in the round can be directly committed through GitHub if you want.
Also, can you mention in the PR description that this PR is the same as #564 and we had discussion in the previous PR?

components/core/src/clp/NetworkReader.cpp Outdated Show resolved Hide resolved
components/core/src/clp/CurlDownloadHandler.cpp Outdated Show resolved Hide resolved
components/core/src/clp/CurlDownloadHandler.cpp Outdated Show resolved Hide resolved
components/core/src/clp/CurlDownloadHandler.cpp Outdated Show resolved Hide resolved
components/core/src/clp/CurlDownloadHandler.cpp Outdated Show resolved Hide resolved
components/core/src/clp/CurlDownloadHandler.cpp Outdated Show resolved Hide resolved
components/core/tests/test-NetworkReader.cpp Outdated Show resolved Hide resolved
components/core/tests/test-NetworkReader.cpp Outdated Show resolved Hide resolved
components/core/tests/test-NetworkReader.cpp Outdated Show resolved Hide resolved
@anlowee anlowee marked this pull request as ready for review October 31, 2024 02:21
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Outside diff range and nitpick comments (7)
components/core/tests/test-NetworkReader.cpp (1)

195-222: Consider addressing external test dependency and magic number.

While the test is well-structured, there are two areas for improvement:

  1. The test depends on an external service (httpbin.org). Consider:

    • Adding retry logic for network failures
    • Documenting the external dependency
    • Adding a mock server for offline testing
  2. The magic number 10 should be defined as a named constant with clear intent.

Here's a suggested improvement:

+    // Number of test headers to generate for comprehensive coverage
+    static constexpr int cNumTestHeaders{10};
-    constexpr int cNumHttpHeaderKeyValuePairs{10};
-    for (size_t i{0}; i < cNumHttpHeaderKeyValuePairs; ++i) {
+    for (size_t i{0}; i < cNumTestHeaders; ++i) {
components/core/src/clp/NetworkReader.hpp (1)

Line range hint 99-112: Consider enhancing header parameter documentation

While the documentation is accurate and references the CURL documentation appropriately, consider adding:

  • Example usage with common headers
  • Any header name restrictions or formatting requirements
  • Impact of invalid header formats

Example addition:

     * @param http_header_kv_pairs Key-value pairs representing HTTP headers to pass to the server
-     * in the download request. Doc: https://curl.se/libcurl/c/CURLOPT_HTTPHEADER.html
+     * in the download request. Headers should follow HTTP/1.1 format (e.g., {"Authorization": 
+     * "Bearer token"}). Invalid header formats may cause the request to fail.
+     * Doc: https://curl.se/libcurl/c/CURLOPT_HTTPHEADER.html
components/core/src/clp/CurlDownloadHandler.cpp (5)

83-93: Simplify the lambda function in std::transform for clarity

The lambda function used in std::transform can be simplified by removing the explicit return type, as it can be inferred by the compiler. This enhances readability without affecting functionality.

Apply this diff to simplify the lambda:

         std::transform(
             lower_key.begin(),
             lower_key.end(),
             lower_key.begin(),
-            [](unsigned char c) -> char {
+            [](unsigned char c) {
                 // Implicitly cast the input character into `unsigned char` to avoid UB:
                 // https://en.cppreference.com/w/cpp/string/byte/tolower
                 return static_cast<char>(std::tolower(c));
             }
         );

76-110: Handle potential exceptions from header processing

When processing user-provided headers, consider wrapping the loop with a try-catch block to handle any unexpected exceptions that may arise, ensuring robustness of the constructor.

Example:

try {
    for (auto const& [key, value] : http_header_kv_pairs.value()) {
        // Existing header processing code
    }
} catch (const std::exception& e) {
    // Handle exception, possibly by rethrowing or logging
}

78-82: Clarify the comment on duplicate header keys

The comment mentions that duplicate handling is left to the server. However, since std::unordered_map does not allow duplicate keys, consider revising the comment for accuracy.

Apply this diff to update the comment:

             // NOTE: We rely on the unordered_map to prevent duplicate keys.
-            // handling to the server.

83-83: Prefer emplace over insert for efficiency

When initializing lower_key, consider using std::string lower_key{key}; directly without the braces for clarity.

Apply this diff:

-    auto lower_key{key};
+    std::string lower_key = key;

61-68: Use consistent naming for constants

Consider renaming the constants cRangeHeaderName, cCacheControlHeaderName, and cPragmaHeaderName to use uppercase letters and underscores, following the convention for constant expressions.

Apply this diff to rename the constants:

-    constexpr std::string_view cRangeHeaderName{"range"};
+    constexpr std::string_view RANGE_HEADER_NAME{"range"};

Repeat for the other constants.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b92a6c6 and aad32df.

📒 Files selected for processing (5)
  • components/core/src/clp/CurlDownloadHandler.cpp (3 hunks)
  • components/core/src/clp/CurlDownloadHandler.hpp (3 hunks)
  • components/core/src/clp/NetworkReader.cpp (4 hunks)
  • components/core/src/clp/NetworkReader.hpp (5 hunks)
  • components/core/tests/test-NetworkReader.cpp (2 hunks)
🔇 Additional comments (10)
components/core/src/clp/CurlDownloadHandler.hpp (3)

8-8: LGTM! Required includes for custom headers support

The addition of <optional> and <unordered_map> headers is necessary and properly placed in the include section.

Also applies to: 11-11


59-61: LGTM! Well-documented parameter

The documentation for http_header_kv_pairs is clear and includes a helpful reference to the libcurl documentation.


72-74: Consider validating header names and values

While the implementation looks good, consider adding validation for header names and values to prevent injection of malformed headers.

Let's check if there's any validation in the implementation:

Consider implementing these validations:

  1. Header names should be valid HTTP header names
  2. Header values should not contain control characters
  3. Certain headers (like Host, Content-Length) should be restricted
components/core/tests/test-NetworkReader.cpp (1)

9-9: LGTM! The new includes are properly organized and necessary.

The additions of unordered_map, fmt, and json headers are well-placed and essential for the new test cases.

Also applies to: 14-15

components/core/src/clp/NetworkReader.cpp (4)

124-125: LGTM: Constructor parameter addition is well-designed

The new optional parameter for HTTP headers is appropriately typed and follows good practices:

  • Uses std::optional for optional headers
  • Uses std::unordered_map for efficient key-value lookup
  • Maintains backward compatibility by adding at the end

137-142: LGTM: Proper parameter forwarding to DownloaderThread

The initialization correctly moves the headers map to avoid unnecessary copying while maintaining the existing initialization pattern.


227-228: LGTM: Clean parameter propagation to CurlDownloadHandler

The headers are properly forwarded to the CurlDownloadHandler constructor.


Line range hint 226-237: Verify thread-safety in error handling

The error handling block involves shared state modifications. Let's verify there are no potential race conditions in error reporting between threads.

✅ Verification successful

Let me gather more context about the error handling implementation to verify thread safety.


Thread safety is properly implemented in error handling

The error handling implementation is thread-safe due to:

  • Error state variables (m_state and m_curl_ret_code) are atomic
  • The error message buffer is protected by shared pointer semantics
  • The ordering of state updates in set_download_completion_status is documented and properly synchronized
  • Buffer resource access is protected by mutex locks
  • Condition variable notifications occur under mutex protection
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for all error handling patterns in the codebase
# to verify thread-safety in error reporting

# Look for error state modifications
rg -p "set_download_completion_status|m_curl_error_msg_buf" --type cpp

# Look for mutex usage around error handling
rg -p "std::mutex|std::lock_guard|std::unique_lock" --type cpp

Length of output: 1480


Script:

#!/bin/bash
# Look for the implementation of set_download_completion_status
ast-grep --pattern 'auto set_download_completion_status(CURLcode $_) -> void { $$$ }'

# Look for any other methods that might access the error state
rg -p "m_download_completed|m_curl_error_code" --type cpp -A 2

Length of output: 152


Script:

#!/bin/bash
# Look for the implementation details in NetworkReader.hpp and NetworkReader.cpp
cat components/core/src/clp/NetworkReader.hpp | rg -A 10 "set_download_completion_status"

# Check for any concurrent access patterns to the error state
cat components/core/src/clp/NetworkReader.cpp | rg -A 5 "set_download_completion_status"

Length of output: 1442

components/core/src/clp/NetworkReader.hpp (2)

16-16: LGTM: Include statement properly placed

The addition of <unordered_map> is necessary for the HTTP headers map type and follows the project's include ordering convention.


Line range hint 251-272: Verify consistent header handling in implementation

The header parameter addition looks good. However, we should verify:

  1. Header validation in the implementation
  2. Proper cleanup of CURL header structures
  3. Thread-safety of header handling
✅ Verification successful

Let me gather more information about the CURL header handling implementation.


Let me check one more aspect about the header validation and setup in the CURL handler.


HTTP header handling implementation is properly managed and thread-safe

The implementation shows robust header handling with:

  • Proper RAII cleanup through CurlStringList wrapper that handles curl_slist memory management
  • Case-insensitive header key validation
  • Thread-safe header handling as headers are stored per-instance
  • Correct CURL option setup using CURLOPT_HTTPHEADER
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for header-related validation and cleanup in the implementation
rg -A 5 "http_header_kv_pairs" components/core/src/clp/

Length of output: 6901


Script:

#!/bin/bash
# Search for CURL header setup and cleanup
ast-grep --pattern 'struct CurlDownloadHandler {
  $$$
  ~CurlDownloadHandler() {
    $$$
  }
  $$$
}'

# Look for header list creation and cleanup
rg -A 10 "curl_slist" components/core/src/clp/

Length of output: 3596


Script:

#!/bin/bash
# Search for header setup in CurlDownloadHandler
rg -A 15 "if \(http_header_kv_pairs\.has_value\(\)\)" components/core/src/clp/CurlDownloadHandler.cpp

# Look for CURL header option setup
rg "CURLOPT_HTTPHEADER" components/core/src/clp/

Length of output: 1652

components/core/tests/test-NetworkReader.cpp Show resolved Hide resolved
components/core/src/clp/CurlDownloadHandler.cpp Outdated Show resolved Hide resolved
components/core/src/clp/CurlDownloadHandler.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between aad32df and f67a590.

📒 Files selected for processing (1)
  • components/core/src/clp/CurlDownloadHandler.cpp (3 hunks)
🔇 Additional comments (3)
components/core/src/clp/CurlDownloadHandler.cpp (3)

31-32: Consider reordering constructor parameters for better logical grouping

The http_header_kv_pairs parameter would be more logically placed before the timeout parameters since it's related to the request configuration rather than timing constraints.


61-68: 🛠️ Refactor suggestion

Optimize reserved headers set initialization

Consider making the reserved_headers set static to prevent reconstruction on each constructor call.


94-107: ⚠️ Potential issue

Enhance header injection prevention

The current implementation only checks for \r\n at the end of header values. To prevent HTTP header injection attacks, consider checking for \r or \n anywhere in the value.

Comment on lines +83 to +93
auto lower_key{key};
std::transform(
lower_key.begin(),
lower_key.end(),
lower_key.begin(),
[](unsigned char c) -> char {
// Implicitly cast the input character into `unsigned char` to avoid UB:
// https://en.cppreference.com/w/cpp/string/byte/tolower
return static_cast<char>(std::tolower(c));
}
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider caching the lowercase key transformation

The lowercase transformation of the key is used only for comparison but discarded afterwards. If the same key needs to be processed multiple times (e.g., in error messages), consider caching the transformed value.

Apply this diff to optimize the key transformation:

-            auto lower_key{key};
+            std::string lower_key;
+            lower_key.reserve(key.size());
+            std::transform(
                 key.begin(),
                 key.end(),
                 std::back_inserter(lower_key),
                 [](unsigned char c) -> char {
                     return static_cast<char>(std::tolower(c));
                 }
             );
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
auto lower_key{key};
std::transform(
lower_key.begin(),
lower_key.end(),
lower_key.begin(),
[](unsigned char c) -> char {
// Implicitly cast the input character into `unsigned char` to avoid UB:
// https://en.cppreference.com/w/cpp/string/byte/tolower
return static_cast<char>(std::tolower(c));
}
);
std::string lower_key;
lower_key.reserve(key.size());
std::transform(
key.begin(),
key.end(),
std::back_inserter(lower_key),
[](unsigned char c) -> char {
// Implicitly cast the input character into `unsigned char` to avoid UB:
// https://en.cppreference.com/w/cpp/string/byte/tolower
return static_cast<char>(std::tolower(c));
}
);

@anlowee anlowee changed the title clp-core: Add custom headers support for NetworkReader clp-core: Add Custom Headers Support for NetworkReader Oct 31, 2024
Copy link
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For PR title, how about:

core: Add support for user-defined HTTP headers in `NetworkReader`.

@anlowee anlowee changed the title clp-core: Add Custom Headers Support for NetworkReader core-clp: Add support for user-defined HTTP headers in NetworkReader. Oct 31, 2024
@anlowee anlowee changed the title core-clp: Add support for user-defined HTTP headers in NetworkReader. core: Add support for user-defined HTTP headers in NetworkReader. Oct 31, 2024
@anlowee anlowee merged commit 9f6a6ce into y-scope:main Oct 31, 2024
20 checks passed
@anlowee anlowee deleted the xiaochong-add-custom-headers-v2 branch October 31, 2024 21:39
LinZhihao-723 added a commit to LinZhihao-723/clp that referenced this pull request Nov 12, 2024
* ffi: Add support for serializing/deserializing auto-generated and user-generated schema tree node IDs. (y-scope#557)

Co-authored-by: kirkrodrigues <[email protected]>

* clp: Add missing C++ standard library includes in IR parsing files. (y-scope#561)

Co-authored-by: kirkrodrigues <[email protected]>

* log-viewer-webui: Update `yscope-log-viewer` to the latest version (which uses `clp-ffi-js`). (y-scope#562)

* package: Upgrade dependencies to resolve security issues. (y-scope#536)

* clp-s: Implement table packing (y-scope#466)

Co-authored-by: wraymo <[email protected]>
Co-authored-by: Kirk Rodrigues <[email protected]>
Co-authored-by: wraymo <[email protected]>

* log-viewer-webui: Update `yscope-log-viewer` to the latest version. (y-scope#565)

* ci: Switch GitHub macOS build workflow to use macos-13 (x86) and macos-14 (ARM) runners. (y-scope#566)

* core: Add support for user-defined HTTP headers in `NetworkReader`. (y-scope#568)

Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Xiaochong Wei <[email protected]>

* chore: Update to the latest version of yscope-dev-utils. (y-scope#574)

* build(core): Upgrade msgpack to v7.0.0. (y-scope#575)

* feat(ffi): Update IR stream protocol version handling in preparation for releasing the kv-pair IR stream format: (y-scope#573)

- Bump the IR stream protocol version to 0.1.0 for the kv-pair IR stream format.
- Treat the previous IR stream format's versions as backwards compatible.
- Differentiate between backwards-compatible and supported versions during validation.

Co-authored-by: kirkrodrigues <[email protected]>

* fix(taskfiles): Trim trailing slash from URL prefix in `download-and-extract-tar` (fixes y-scope#577). (y-scope#578)

* fix(ffi): Correct `clp::ffi::ir_stream::Deserializer::deserialize_next_ir_unit`'s return value when failing to read the next IR unit's type tag. (y-scope#579)

* fix(taskfiles): Update `yscope-log-viewer` sources in `log-viewer-webui-clients` sources list (fixes y-scope#576). (y-scope#580)

* fix(cmake): Add Homebrew path detection for `mariadb-connector-c` to fix macOS build failure. (y-scope#582)

Co-authored-by: kirkrodrigues <[email protected]>

* refactor(ffi): Make `get_schema_subtree_bitmap` a public method of `KeyValuePairLogEvent`. (y-scope#581)

* ci: Schedule GitHub workflows to daily run to detect failures due to upgraded dependencies or environments. (y-scope#583)

* docs: Update the required version of task. (y-scope#567)

* Add pr check workflow

---------

Co-authored-by: kirkrodrigues <[email protected]>
Co-authored-by: Junhao Liao <[email protected]>
Co-authored-by: Henry8192 <[email protected]>
Co-authored-by: Devin Gibson <[email protected]>
Co-authored-by: wraymo <[email protected]>
Co-authored-by: wraymo <[email protected]>
Co-authored-by: Xiaochong(Eddy) Wei <[email protected]>
Co-authored-by: Xiaochong Wei <[email protected]>
Co-authored-by: haiqi96 <[email protected]>
jackluo923 pushed a commit to jackluo923/clp that referenced this pull request Dec 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants