-
Notifications
You must be signed in to change notification settings - Fork 567
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes for various endianness issues #5302
base: main
Are you sure you want to change the base?
Conversation
words->insert(words->end(), _.words + _.word_index, | ||
_.words + index_after_operand); | ||
} | ||
words->insert(words->end(), _.native_words.get() + _.word_index, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment will need to change. This is not always translating to native endianness.
@@ -590,7 +590,7 @@ spv_result_t Parser::parseOperand(size_t inst_offset, | |||
case SPV_OPERAND_TYPE_OPTIONAL_LITERAL_STRING: { | |||
const size_t max_words = _.num_words - _.word_index; | |||
std::string string = | |||
spvtools::utils::MakeString(_.words + _.word_index, max_words, false); | |||
spvtools::utils::MakeString(_.native_words.get() + _.word_index, max_words, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this change is correct. The SPEC says:
A string is interpreted as a nul-terminated stream of characters. All string comparisons are case sensitive. The character set is Unicode in the UTF-8 encoding scheme. The UTF-8 octets (8-bit bytes) are packed four per word, following the little-endian convention (i.e., the first octet is in the lowest-order 8 bits of the word).
How do you know that the native_words
are the correct little-endian order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the spec confusingly says "little-endian" it actually calls for packing bytes into 4-byte words by shifting them, which is what MakeString
implements. This means that the spv\0
([73,70,76,0]
) string gets turned into the 0x00767073
word which will end up like 73 70 76 00
on a little-endian machine, but on a big-endian machine it will be 00 76 70 73
. Very much not a byte string that can be read directly!
We therefore want all the words to be in the native order because MakeString
is actually endinaness-agnostic as it only looks at the 32-bit words, it never tries to cast them to chars and read the result as a string. Only looking at things at the word level means endianness conversion is just pre/post processing to swap the words to match machine order, with zero additional complexity required.
I had to go back through to 2015 archives for this one:
|
I also dug up the original discussion from 2016 is on the public bugzilla: https://www.khronos.org/members/login/bugzilla-public/show_bug.cgi?id=1474 |
Thanks for digging this up, I found a few dead links last time but wasn't sure where to point to. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've loaded the prior context (from 2015-2016) and am now starting to look at this.
This patch lacks tests, so it can't be accepted as-is.
@@ -218,6 +218,7 @@ class Parser { | |||
// Is the SPIR-V binary in a different endianness from the host native | |||
// endianness? | |||
bool requires_endian_conversion; | |||
std::unique_ptr<uint32_t[]> native_words; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please document what this means.
@@ -262,6 +263,9 @@ spv_result_t Parser::parseModule() { | |||
<< _.words[0] << "'."; | |||
} | |||
_.requires_endian_conversion = !spvIsHostEndian(_.endian); | |||
_.native_words = std::make_unique<uint32_t[]>(_.num_words); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's try to fix the logic without doubling copying the whole module, for everybody in all situations.
There are endianness tests, and two of them that deal with strings specifically. See SPIRV-Tools/test/binary_parse_test.cpp Line 723 in 7c58952
and SPIRV-Tools/test/binary_parse_test.cpp Line 748 in 7c58952
I'd start by modifying them to make sure they cover the case that I think needs fixing: On a little endian machine, a big-endian SPIR-V module is not correctly having its strings be parsed correctly. |
The SPIR-V specifications is agnostic on the issue of endianness (at least at the 32-bit word level), and tooling has some supporting code for it. It sadly appears to suffer from quite a bit of rot, this PR tackles the following:
spirv-dis
fails to disassemble a SPIR-V file in the non-native endiannessMakeString
I ran and passed all tests on a real PowerPC G5, which is as fun a way to spend a day as any other. There's probably more I can do, but I'd like this to be reviewed before just so I know I'm on the right track.
Additionally, I'm not sure how relevant big-endian SPIR-V support really is considering the state of the tooling. Rather it would probably be beneficial to have big-endian architectures default to emitting little-endian files, as I doubt there are many, if any implementations that correctly implement endianness agnosticism. I think it needs to at least be an option!