-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Reduce tensorprotoutils binary size #6634
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce tensorprotoutils binary size #6634
Conversation
Refactor the unpack tensor logic to switch on data size Add test cases
| size_t element_size_in_bytes, | ||
| gsl::span<const char> source_bytes, gsl::span<char> destination_bytes) { | ||
| void SwapByteOrderCopy(size_t element_size_in_bytes, | ||
| gsl::span<const uint8_t> source_bytes, gsl::span<uint8_t> destination_bytes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the reason i went with "char" was that it was explicitly mentioned as one of the types that one can use to examine an object as bytes.
https://en.cppreference.com/w/cpp/language/reinterpret_cast#Type_aliasing
i'm fine with the more verbose "unsigned char" as well, or if you can point me to a reference that indicates uint8_t is ok here too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
http://www.gotw.ca/gotw/072.htm has a few comments on why unsigned types are better (although technically that is about bits I think it still applies). std::byte is also defined using unsigned char. And the data isn't inherently signed, so using an unsigned type seems like a better fit on that front as will.
Given uint8_t is an alias to unsigned char that's shorter to type and explicit about the number of bits, what's the benefit of typing the longer 'unsigned char'?
Or alternatively we could define it the same way as c++17 and have a byte type, but that should probably be a different PR.
namespace std
{
enum class byte : unsigned char {};
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my understanding is that char, unsigned char, and std::byte are special in that they are allowed to be used for working with bytes of an object.
if uint8_t is guaranteed to be an alias of unsigned char, that should be fine too. all I found was that it should be an 8-bit unsigned integral type.
maybe i'm being overly pedantic here. but i think it was correct before and i'm not completely sure about this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For it to be incorrect, wouldn't that require uint8_t to point to something that was not 8 bits in size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comes down to whether CHAR_BITS is 8 on the platform or not, as that size is considered to be one 'byte'.
https://stackoverflow.com/a/16138308/684911
https://stackoverflow.com/a/43144010/684911
If protobuf packs data using 8 bits for a byte I don't think you could use it on platforms where a byte has more than 8 bits. So unsigned char is technically the most correct, uint8_t should work fine on all platforms we support, but defining an equivalent of std::byte would maybe be best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For it to be incorrect, wouldn't that require uint8_t to point to something that was not 8 bits in size?
hm, i'm not quite saying that uint8_t is not 8 bits wide... but that it might not necessarily be the same type as unsigned char. the second paragraph here talks about some possible distinction: https://stackoverflow.com/a/16138470
yes, i think a byte type is what we want here. we conceptually deal with bytes of data, not (signed or unsigned) integers or characters.
until we can use std::byte, another idea is to have an onnxruntime::byte (typedef to unsigned char, and eventually std::byte). and "byte" is shorter to type than "uint8_t" :)
this is just my preference. but on the platforms we support, using uint8_t is probably fine.
| gsl::span<uint8_t> destination_bytes); | ||
|
|
||
| /** | ||
| * Reads from a little-endian source with check that T is trivially copyable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: only true if not GCC below 5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we are stuck with supporting the older GCC versions, maybe we can consider this:
https://stackoverflow.com/a/31798726
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure that matters. If there's code that is going to fail, it will fail in our CIs which use later versions of GCC won't it, as we don't have any code that is selectively included for a build with GCC < 5?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, then my original comment applies, regarding "with check that T is trivially copyable."
yuslepukhin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
![]()
…nsorProtoUtilsBinarySize
Use unsigned char for bytes. Tried defining 'byte' the same way as c++17 but that created a little more pain than it was worth. Clashed with a typedef in a Windows RPC header, and without the relaxed enum initialization in C++17 it's painful to initialize something like a std::vector<byte>.
Description:
Refactor a few aspects of tensorprotoutils to reduce the binary size.
Reductions: 16.2 KB in Windows release build.
Motivation and Context
Reduce binary size growth from adding external data support in more places.