Fix the S3 endpoint constructor ambiguity#700
Fix the S3 endpoint constructor ambiguity#700rapids-bot[bot] merged 9 commits intorapidsai:branch-25.06from
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test |
|
/ok to test |
|
/ok to test |
vuule
left a comment
There was a problem hiding this comment.
minor questions/suggestions
| * @brief RAII class used to temporarily set environment variables to new values upon construction, | ||
| * and restore them to previous values upon destruction. | ||
| */ | ||
| class EnvVarContext { |
There was a problem hiding this comment.
neat! I might borrow this for cudf compression tests, they are getting to a point where more than one env var is being set
There was a problem hiding this comment.
Perhaps in a future PR this class could be moved out of the test and become part of KvikIO source. Same for the TempDir class from utils.hpp.
There was a problem hiding this comment.
IMO it's fine in tests, it's not a part of the functionality of the kvikIO library
| namespace kvikio::test { | ||
| EnvVarContext::EnvVarContext(std::unordered_map<std::string, std::string> const& env_var_entries) | ||
| { | ||
| for (auto const& [key, current_value] : env_var_entries) { |
There was a problem hiding this comment.
do you know we'd ever need to unset an env var instead of setting to the specified value?
There was a problem hiding this comment.
I was thinking of CUDF_BENCHMARK_DROP_CACHE. No matter what this env var's value is, as long as it exists, the page cache is to be dropped (i.e. CUDF_BENCHMARK_DROP_CACHE=<nothing>/true/false/123 all have the same effect). In this case to fully revert to a previous state, we need to unset the env var.
There was a problem hiding this comment.
I'm not referring to reverting the state, but the actual test state requiring an env var to be unset.
There was a problem hiding this comment.
Yes. I think a thorough test of S3 endpoint would need this feature. The constructor of S3 endpoint has "optional" parameters, but if they are neither passed as argument or set via the env var, an exception will be thrown. We may want to check if the constructor can throw properly when some arguments are missing in one test, and then check when another combination of arguments are missing in another test. This is where env var context could help.
vuule
left a comment
There was a problem hiding this comment.
looks good; just one small suggestion
Co-authored-by: Vukasin Milovanovic <vmilovanovic@nvidia.com>
|
/merge |
rapidsai/kvikio#700 fixed a bug in S3 endpoint constructor. This PR updates cuDF accordingly. Authors: - Tianyu Liu (https://github.com/kingcrimsontianyu) Approvers: - Yunsong Wang (https://github.com/PointKernel) - Muhammad Haseeb (https://github.com/mhaseeb123) URL: #18565
This PR adds the `x-amz-security-token` header which is needed when using temporary credentials (These start with `ASIA`). According to the [curl docs](https://curl.se/libcurl/c/CURLOPT_HTTPHEADER.html), the list of headers needs to be freed after use, So I added a destructor to `S3Endpoint` that does that. Important to note is that I added a parameter for the session token, but it comes directly after the other credential parameters and _before_ the endpoint override, making this a breaking change for code that interacts directly with these C++ APIs! But it makes it consistent with the other values; otherwise 1 of them would need to be passed only via the env var. It seems there might be another API change here #700 so maybe both changes can be done in the same release. I hope this is useful for others too, looking forward to any comments! Fixes #584 (@TomAugspurger) Authors: - Joost Hoozemans (https://github.com/joosthooz) Approvers: - Tianyu Liu (https://github.com/kingcrimsontianyu) - Bradley Dice (https://github.com/bdice) URL: #693
KvikIO unit test has a utility class `EnvVarContext` class, introduced in #700, and slightly improved in #735. It has been identified that this class was incorrectly initialized, resulting in UB: it causes the unit test failure in C++20, which by fluke was not observed in C++17. This PR fixes this error. Specifically, the constructor of `EnvVarContext` is: ``` EnvVarContext(std::initializer_list<std::pair<std::string_view, std::string_view>> env_var_entries); ``` There are several ways of instantiation: ``` // Direct initialization EnvVarContext env_var_ctx({{"env_1", "v1"}, {"env_2", "v2"}}); // Direct list initialization EnvVarContext env_var_ctx{{"env_1", "v1"}, {"env_2", "v2"}}; // Copy list initialization EnvVarContext env_var_ctx = {{"env_1", "v1"}, {"env_2", "v2"}}; ``` The erroneous instantiation performed is: ``` // Extra pair of braces // {}: brace-enclosed initializer list // {{"env_1", "v1"}, {"env_2", "v2"}}: one element of type pair<std::string_view, std::string_view> // {"env_1", "v1"}: first // {"env_2", "v2"}: second EnvVarContext env_var_ctx{{{"env_1", "v1"}, {"env_2", "v2"}}}; ``` As a result, the initializer list only has 1 pair, with the key being `{"env_1", "v1"}` and value being `{"env_2", "v2"}`. For the key, for instance, the 5-th overload (https://en.cppreference.com/w/cpp/string/basic_string_view/basic_string_view.html) of the constructor was used, where `first` points to "env_1" and `last` points to "v1". Since the two iterators do not form a valid range, UB ensues. Authors: - Tianyu Liu (https://github.com/kingcrimsontianyu) Approvers: - Mads R. B. Kristensen (https://github.com/madsbk) URL: #751
This PR closes #688 .
This PR also introduces to unit tests the utility RAII class
EnvVarContextto enable temporarily setting environment variables to new values for a test and then restoring to previous states afterwards.