Skip to content

Fix the S3 endpoint constructor ambiguity#700

Merged
rapids-bot[bot] merged 9 commits intorapidsai:branch-25.06from
kingcrimsontianyu:fix-s3-ctor
Apr 28, 2025
Merged

Fix the S3 endpoint constructor ambiguity#700
rapids-bot[bot] merged 9 commits intorapidsai:branch-25.06from
kingcrimsontianyu:fix-s3-ctor

Conversation

@kingcrimsontianyu
Copy link
Contributor

@kingcrimsontianyu kingcrimsontianyu commented Apr 22, 2025

This PR closes #688 .
This PR also introduces to unit tests the utility RAII class EnvVarContext to enable temporarily setting environment variables to new values for a test and then restoring to previous states afterwards.

@copy-pr-bot
Copy link

copy-pr-bot bot commented Apr 22, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@kingcrimsontianyu kingcrimsontianyu added bug Something isn't working non-breaking Introduces a non-breaking change c++ Affects the C++ API of KvikIO python Affects the Python API of KvikIO labels Apr 22, 2025
@kingcrimsontianyu
Copy link
Contributor Author

/ok to test

@kingcrimsontianyu
Copy link
Contributor Author

/ok to test

@kingcrimsontianyu
Copy link
Contributor Author

/ok to test

@kingcrimsontianyu kingcrimsontianyu marked this pull request as ready for review April 23, 2025 05:21
@kingcrimsontianyu kingcrimsontianyu requested review from a team as code owners April 23, 2025 05:21
Copy link
Contributor

@vuule vuule left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor questions/suggestions

* @brief RAII class used to temporarily set environment variables to new values upon construction,
* and restore them to previous values upon destruction.
*/
class EnvVarContext {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

neat! I might borrow this for cudf compression tests, they are getting to a point where more than one env var is being set

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps in a future PR this class could be moved out of the test and become part of KvikIO source. Same for the TempDir class from utils.hpp.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it's fine in tests, it's not a part of the functionality of the kvikIO library

namespace kvikio::test {
EnvVarContext::EnvVarContext(std::unordered_map<std::string, std::string> const& env_var_entries)
{
for (auto const& [key, current_value] : env_var_entries) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you know we'd ever need to unset an env var instead of setting to the specified value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of CUDF_BENCHMARK_DROP_CACHE. No matter what this env var's value is, as long as it exists, the page cache is to be dropped (i.e. CUDF_BENCHMARK_DROP_CACHE=<nothing>/true/false/123 all have the same effect). In this case to fully revert to a previous state, we need to unset the env var.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not referring to reverting the state, but the actual test state requiring an env var to be unset.

Copy link
Contributor Author

@kingcrimsontianyu kingcrimsontianyu Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I think a thorough test of S3 endpoint would need this feature. The constructor of S3 endpoint has "optional" parameters, but if they are neither passed as argument or set via the env var, an exception will be thrown. We may want to check if the constructor can throw properly when some arguments are missing in one test, and then check when another combination of arguments are missing in another test. This is where env var context could help.

@kingcrimsontianyu kingcrimsontianyu requested a review from vuule April 23, 2025 14:09
Copy link
Contributor

@vuule vuule left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good; just one small suggestion

Co-authored-by: Vukasin Milovanovic <vmilovanovic@nvidia.com>
@kingcrimsontianyu
Copy link
Contributor Author

/merge

Copy link
Member

@madsbk madsbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@rapids-bot rapids-bot bot merged commit 3b9c6d7 into rapidsai:branch-25.06 Apr 28, 2025
63 checks passed
rapids-bot bot pushed a commit to rapidsai/cudf that referenced this pull request Apr 28, 2025
rapidsai/kvikio#700 fixed a bug in S3 endpoint constructor. This PR updates cuDF accordingly.

Authors:
  - Tianyu Liu (https://github.com/kingcrimsontianyu)

Approvers:
  - Yunsong Wang (https://github.com/PointKernel)
  - Muhammad Haseeb (https://github.com/mhaseeb123)

URL: #18565
@kingcrimsontianyu kingcrimsontianyu added breaking Introduces a breaking change and removed non-breaking Introduces a non-breaking change labels Apr 29, 2025
rapids-bot bot pushed a commit that referenced this pull request May 7, 2025
This PR adds the `x-amz-security-token` header which is needed when using temporary credentials (These start with `ASIA`). According to the [curl docs](https://curl.se/libcurl/c/CURLOPT_HTTPHEADER.html), the list of headers needs to be freed after use, So I added a destructor to `S3Endpoint` that does that.
Important to note is that I added a parameter for the session token, but it comes directly after the other credential parameters and _before_ the endpoint override, making this a breaking change for code that interacts directly with these C++ APIs! But it makes it consistent with the other values; otherwise 1 of them would need to be passed only via the env var.
It seems there might be another API change here #700 so maybe both changes can be done in the same release.
I hope this is useful for others too, looking forward to any comments!

Fixes #584 (@TomAugspurger)

Authors:
  - Joost Hoozemans (https://github.com/joosthooz)

Approvers:
  - Tianyu Liu (https://github.com/kingcrimsontianyu)
  - Bradley Dice (https://github.com/bdice)

URL: #693
rapids-bot bot pushed a commit that referenced this pull request Jun 13, 2025
KvikIO unit test has a utility class `EnvVarContext` class, introduced in #700, and slightly improved in #735. It has been identified that this class was incorrectly initialized, resulting in UB: it causes the unit test failure in C++20, which by fluke was not observed in C++17. This PR fixes this error. Specifically, the constructor of `EnvVarContext` is:
```
EnvVarContext(std::initializer_list<std::pair<std::string_view, std::string_view>> env_var_entries);
```
There are several ways of instantiation:
```
// Direct initialization
EnvVarContext env_var_ctx({{"env_1", "v1"}, {"env_2", "v2"}});

// Direct list initialization
EnvVarContext env_var_ctx{{"env_1", "v1"}, {"env_2", "v2"}};

// Copy list initialization
EnvVarContext env_var_ctx = {{"env_1", "v1"}, {"env_2", "v2"}};
```
The erroneous instantiation performed is:
```
// Extra pair of braces
// {}: brace-enclosed initializer list
// {{"env_1", "v1"}, {"env_2", "v2"}}: one element of type pair<std::string_view, std::string_view>
// {"env_1", "v1"}: first
// {"env_2", "v2"}: second
EnvVarContext env_var_ctx{{{"env_1", "v1"}, {"env_2", "v2"}}};
``` 
As a result, the initializer list only has 1 pair, with the key being `{"env_1", "v1"}` and value being `{"env_2", "v2"}`. For the key, for instance, the 5-th overload (https://en.cppreference.com/w/cpp/string/basic_string_view/basic_string_view.html) of the constructor was used, where `first` points to "env_1" and `last` points to "v1". Since the two iterators do not form a valid range, UB ensues.

Authors:
  - Tianyu Liu (https://github.com/kingcrimsontianyu)

Approvers:
  - Mads R. B. Kristensen (https://github.com/madsbk)

URL: #751
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking Introduces a breaking change bug Something isn't working c++ Affects the C++ API of KvikIO python Affects the Python API of KvikIO

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ambiguity in the S3Endpoint constructors

3 participants