Skip to content

Conversation

@dentiny
Copy link
Contributor

@dentiny dentiny commented Nov 8, 2024

As titled, I think having MB explicitly called out is more readable than 1024 * 1024 or 1<<20
Intended use case: #48262 (comment)

Signed-off-by: dentiny <[email protected]>
Signed-off-by: dentiny <[email protected]>
@dentiny dentiny added the go add ONLY when ready to merge, run all tests label Nov 8, 2024
Copy link
Contributor

@rynewang rynewang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but can we make at least one usage in code base to avoid dead code and to showcase? src/ray/stats/metric_defs.cc can be a good start.

constexpr unsigned long long operator""_PiB(unsigned long long sz) {
return sz * 1024ULL * 1024ULL * 1024ULL * 1024ULL * 1024ULL;
}
constexpr unsigned long long operator""_PB(unsigned long long sz) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we use long long instead of size_t

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see much difference?

  • unsigned long long is the largest unsigned range we could have in C++ natively
  • we won't have negative value here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ofc they are the same: size_t is just typedef of unsigned long long on 64bit system. However it has a clearer semantics: it represents a size, which is what all these "number of bytes" about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong preference, updated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I found size_t doesn't compile

./src/ray/util/size_literals.h:22:18: error: 'constexpr size_t operator""_B(size_t)' has invalid argument list
   22 | constexpr size_t operator""_B(size_t sz) { return sz; }
      |                  ^~~~~~~~

Working version:

constexpr unsigned long long operator""_B(unsigned long long sz) { return sz; }

Non-workable version:

constexpr size_t operator""_B(size_t sz) { return sz; }


#include <cstdint>

constexpr unsigned long long operator""_PiB(unsigned long long sz) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrap everything in namespace ray::literals so users must opt-in with using

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think putting these function into a namespace is a good idea.
For example, gflag requires all flag values to placed in outmost namespace.
A typical use case is:

DEFINE_int64(some_flag_name, 15_MiB, "some flag description");

Signed-off-by: dentiny <[email protected]>
Copy link
Contributor

@rynewang rynewang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also since we are here - is it possible to define a class ByteSize { size_t bytes_; static operator""_KiB(...){...} } so we can be more type rich? Just discussion - did not research on how big this is.


constexpr unsigned long long operator""_B(unsigned long long sz) { return sz; }

constexpr unsigned long long operator""_KiB(unsigned long long sz) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] I'd prefer ordering as B, KB, MB,...., KiB, MiB, ....

Signed-off-by: dentiny <[email protected]>
@dentiny dentiny requested a review from a team as a code owner November 8, 2024 05:06
@dentiny
Copy link
Contributor Author

dentiny commented Nov 8, 2024

LGTM, but can we make at least one usage in code base to avoid dead code and to showcase? src/ray/stats/metric_defs.cc can be a good start.

Sure, updated one place for metrics defs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] why not:

constexpr size_t operator""_PiB(long double sz) {
const long double res = sz * 1_PiB;
return static_cast<size_t>(res);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and these double-to-size_t code are so duplicated, I wonder if we can make it a macro.

// Macro to generate (long double -> size_t) literal from (size_t -> size_t)
#define CREATE_FLOAT_SIZE_LITERAL(suffix) \
  constexpr size_t operator""_##suffix(long double sz) { \
    return static_cast<size_t>(sz * 1_##suffix); \
  }

// Using the macro to define long double -> size_t literals
CREATE_FLOAT_SIZE_LITERAL(KiB)
CREATE_FLOAT_SIZE_LITERAL(MiB)
CREATE_FLOAT_SIZE_LITERAL(GiB)
CREATE_FLOAT_SIZE_LITERAL(TiB)
CREATE_FLOAT_SIZE_LITERAL(PiB)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] why not:
constexpr size_t operator""_PiB(long double sz) {
const long double res = sz * 1_PiB;
return static_cast<size_t>(res);
}

Updated.

Copy link
Contributor Author

@dentiny dentiny Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of macros.

Using macro is generally not recommended (effective C++ item-1 has a few good arguments):

  • Hard to debug, easier to get wrong;
  • Purely textual replacement, no compiler check (i.e. what if repeated macro definition);
  • Affective, especially in header file.

I almost only use macros at error propagation (i.e. ray/util/logging.h) to save some boilerplate code for unhappy path.
For here, both implementations are just one-liner, don't see much benefit.

@dentiny
Copy link
Contributor Author

dentiny commented Nov 8, 2024

Also since we are here - is it possible to define a class ByteSize { size_t bytes_; static operator""_KiB(...){...} } so we can be more type rich? Just discussion - did not research on how big this is.

Emm could you please give me some examples you think it's better to have such class?
Usually, from my limited experience, size literal seems to be enough.

Signed-off-by: dentiny <[email protected]>
@dentiny dentiny force-pushed the hjiang/size-literals branch from 267eeef to 3b8640d Compare November 8, 2024 08:05
@dentiny dentiny requested a review from rynewang November 8, 2024 08:22
@rynewang rynewang enabled auto-merge (squash) November 12, 2024 20:09
@rynewang rynewang merged commit 4f6a419 into ray-project:master Nov 12, 2024
3 checks passed
JP-sDEV pushed a commit to JP-sDEV/ray that referenced this pull request Nov 14, 2024
As titled, I think having `MB` explicitly called out is more readable
than `1024 * 1024` or `1<<20`
Intended use case:
ray-project#48262 (comment)


Signed-off-by: dentiny <[email protected]>
mohitjain2504 pushed a commit to mohitjain2504/ray that referenced this pull request Nov 15, 2024
As titled, I think having `MB` explicitly called out is more readable
than `1024 * 1024` or `1<<20`
Intended use case:
ray-project#48262 (comment)

Signed-off-by: dentiny <[email protected]>
Signed-off-by: mohitjain2504 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-backlog go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants