Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MurmurHash3 unaligned writes #74

Open
BillyDonahue opened this issue Sep 20, 2019 · 1 comment
Open

MurmurHash3 unaligned writes #74

BillyDonahue opened this issue Sep 20, 2019 · 1 comment

Comments

@BillyDonahue
Copy link

In MurmurHash3.cpp, there's some care to point out that getblock32 and getblock64are customization points for alignment and endianness as data is loaded from thevoid* key`.

// Block read - if your platform needs to do endian-swapping or can only
// handle aligned reads, do the conversion here

But there are alignment and endian issues with writing to the void* out parameters as well. For example, lines like: ((uint64_t*)out)[1] = h2;. Maybe add analogous putblock32, putblock64 functions as customization points.

void putblock32( void* p, int i, uint32_t t ) {
  memcpy((char*)p + i * sizeof(t), &t, sizeof(t));
}
void putblock64( void* p, int i, uint64_t t ) {
  memcpy((char*)p + i * sizeof(t), &t, sizeof(t));
}

// ...so, 
putblock64(out, 0, h1);  // formerly `((uint64_t*)out)[0] = h1;`
putblock64(out, 1, h2);  // formerly `((uint64_t*)out)[1] = h2;`

I'm not sure it's technically ok to even form the unaligned pointers in the first place. That is, I don't think it's safe C++ to cast a void* to uint32_t* unless it's aligned appropriately.

@leethomason
Copy link

I'll second this issue. I've seen the problem when I pass a char* to MurmerHash3. In this line: const uint64_t * blocks = (const uint64_t *)(data); in https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp

The cast from a void* to an unaligned uint64_t (and then dereferencing the uint64_t) is undefined behavior. A desktop class chip will handle this without a hitch. But there are mobile chips that will fault and crash.

UBSan (undefined behavior sanitizer) flags the problem. We have run into this issue in https://github.com/adobe/orc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants