Skip to content

Add an overload getenv_or that supports env var alias. Add new env var KVIKIO_NUM_THREADS. Fix UB.#735

Merged
rapids-bot[bot] merged 12 commits intorapidsai:branch-25.08from
kingcrimsontianyu:alias-io-thread
May 30, 2025
Merged

Add an overload getenv_or that supports env var alias. Add new env var KVIKIO_NUM_THREADS. Fix UB.#735
rapids-bot[bot] merged 12 commits intorapidsai:branch-25.08from
kingcrimsontianyu:alias-io-thread

Conversation

@kingcrimsontianyu
Copy link
Contributor

This PR performs the following items:

  • Add a new overload kvikio::getenv_or that supports env var alias, i.e. multiple different env vars referring to the same property.
  • Add new env var KVIKIO_NUM_THREADS as the alias of KVIKIO_NTHREADS, and add new C++ API num_threads/set_thread_pool_nthreads as the alias of existing thread_pool_nthreads/set_thread_pool_nthreads. This makes C++ and Python API (i.e. kvikio.defaults.get("num_threads")) consistent.
  • Fix a well-hidden UB related to string temporaries and std::initializer_list.

@kingcrimsontianyu kingcrimsontianyu added bug Something isn't working feature request New feature or request c++ Affects the C++ API of KvikIO labels May 27, 2025
@copy-pr-bot
Copy link

copy-pr-bot bot commented May 27, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@kingcrimsontianyu kingcrimsontianyu added non-breaking Introduces a non-breaking change and removed feature request New feature or request labels May 27, 2025
@kingcrimsontianyu kingcrimsontianyu marked this pull request as ready for review May 27, 2025 22:16
@kingcrimsontianyu kingcrimsontianyu requested review from a team as code owners May 27, 2025 22:16
@kingcrimsontianyu kingcrimsontianyu requested review from vuule and wence- May 27, 2025 22:17

if (env_name_target.empty()) { return {env_name_target, default_val, false}; }

auto res = getenv_or<T>(env_name_target, default_val);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified the logic so that each type's own version of getenv_or<T> is in charge of value extraction.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adds an extra call to getenv, but AFAIK this is fine - repeated calls are VERY fast.

@kingcrimsontianyu kingcrimsontianyu requested a review from vuule May 28, 2025 13:54
Copy link
Contributor

@vuule vuule left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, just a few ideas

Comment on lines 92 to 95
KVIKIO_EXPECT(
!has_already_been_set,
"Environment variable " + std::string{env_var_name} + " has already been set by its alias.",
std::invalid_argument);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how messy would it be to allow multiple aliases to be set, as long as they have the same value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented, in a not so messy way.


if (env_name_target.empty()) { return {env_name_target, default_val, false}; }

auto res = getenv_or<T>(env_name_target, default_val);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adds an extra call to getenv, but AFAIK this is fine - repeated calls are VERY fast.

Comment on lines 93 to 99
// Env var has an empty value
{
kvikio::test::EnvVarContext env_var_ctx{{{"KVIKIO_TEST_ALIAS", ""}}};
EXPECT_THAT(
[=] { kvikio::getenv_or({"KVIKIO_TEST_ALIAS"}, 123); },
ThrowsMessage<std::invalid_argument>(HasSubstr("unknown config value KVIKIO_TEST_ALIAS=")));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about a test where the empty variable is read as a string, so the value is valid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.
When a string stream initialized with an empty string is extracted to a string, its fail bit will still be set. So implementation has been changed (with if constexpr) to permit empty value for string type.

@kingcrimsontianyu kingcrimsontianyu requested a review from vuule May 29, 2025 04:58
Co-authored-by: Vukasin Milovanovic <vmilovanovic@nvidia.com>
Copy link
Member

@madsbk madsbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@kingcrimsontianyu
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit de4bccd into rapidsai:branch-25.08 May 30, 2025
71 checks passed
rapids-bot bot pushed a commit that referenced this pull request Jun 13, 2025
KvikIO unit test has a utility class `EnvVarContext` class, introduced in #700, and slightly improved in #735. It has been identified that this class was incorrectly initialized, resulting in UB: it causes the unit test failure in C++20, which by fluke was not observed in C++17. This PR fixes this error. Specifically, the constructor of `EnvVarContext` is:
```
EnvVarContext(std::initializer_list<std::pair<std::string_view, std::string_view>> env_var_entries);
```
There are several ways of instantiation:
```
// Direct initialization
EnvVarContext env_var_ctx({{"env_1", "v1"}, {"env_2", "v2"}});

// Direct list initialization
EnvVarContext env_var_ctx{{"env_1", "v1"}, {"env_2", "v2"}};

// Copy list initialization
EnvVarContext env_var_ctx = {{"env_1", "v1"}, {"env_2", "v2"}};
```
The erroneous instantiation performed is:
```
// Extra pair of braces
// {}: brace-enclosed initializer list
// {{"env_1", "v1"}, {"env_2", "v2"}}: one element of type pair<std::string_view, std::string_view>
// {"env_1", "v1"}: first
// {"env_2", "v2"}: second
EnvVarContext env_var_ctx{{{"env_1", "v1"}, {"env_2", "v2"}}};
``` 
As a result, the initializer list only has 1 pair, with the key being `{"env_1", "v1"}` and value being `{"env_2", "v2"}`. For the key, for instance, the 5-th overload (https://en.cppreference.com/w/cpp/string/basic_string_view/basic_string_view.html) of the constructor was used, where `first` points to "env_1" and `last` points to "v1". Since the two iterators do not form a valid range, UB ensues.

Authors:
  - Tianyu Liu (https://github.com/kingcrimsontianyu)

Approvers:
  - Mads R. B. Kristensen (https://github.com/madsbk)

URL: #751
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working c++ Affects the C++ API of KvikIO non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants