Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

64-bit Widening Multiplication #1495

Open
waterlens opened this issue Jan 19, 2024 · 5 comments
Open

64-bit Widening Multiplication #1495

waterlens opened this issue Jan 19, 2024 · 5 comments

Comments

@waterlens
Copy link

Currently, wasm doesn't have instructions to calculate the high bits of 64-bit widening multiplication. Is it possible to add it in the future?
For your information, LLVM emits a very slow routine to simulate it. And it seems unlikely to be optimized easily for those wasm JIT & AOT compilers.

@dtig
Copy link
Member

dtig commented Jan 19, 2024

Do you mean the Arm64 equivalent of umulh instruction? This would be a straightforward addition to the spec, because IIRC on x64 this would be just the regular mul instruction with a REX prefix. This is also potentially an optimization opportunity because doing the equivalent of a * (unsigned __int128)b >> 64; doesn't seem to generate efficient bytecode in the LLVM backend.

@waterlens
Copy link
Author

waterlens commented Jan 20, 2024

@dtig Thank you for your comment!

Yes, umulh is exactly one of the instructions I proposed.

For the architecture support, I can give a simple summary here.

  • RV64M: mulh, mulhsu, mulhu
  • x86_64: mul, imul, mulx (with BMI2 extension)
  • ARMv8: umulh, smulh
    However, it's rare to provide such instructions on the 32-bit platform.

About adding them to the spec, there are several considerations.

  • Usefulness
    Positive. 64-bit Widening multiplication is very common in numeric calculation and cryptography code.
  • Architecture Support
    Neutral. It will be straightforward to support it on the 64-bit arch but hard on the 32-bit arch.
  • Necessity
    Positive. Currently, LLVM emits a __multi3 call to emulate the 128-bit multiplication for 64-bit widening multiplication, and wasm compilers like cranelift is problematic in optimizing it out, by __multi3 optimization bytecodealliance/wasmtime#4077.
    First, for 64-bit widening multiplication, we only expect i64*i64 -> i128, not i128*i128->i128 (__multi3). It not only emulates the calculation of high 64-bit, but also the low 64-bit. Inefficiency! Actually i128 * i128 -> i128 can be implemented using one i64*i64 -> i128 and two i64*i64 -> i64, and I have no idea why they don't directly emit the i64*i64 -> i128. It's a problem of LLVM, but it can never be a real concern on native 64-bit targets. However, it's indeed one for wasm.
    Second, wasm requires the function to return the result i128 through linear memory, which adds an extra layer and hides the information required by optimizers. The function is in a standalone library in some cases, which makes it worse because it's unlikely to inline the routine.

Considering those all, I suggest adding them to wasm spec as an extension. It could be expected that some implementations will happily implement it while others can ignore the extension.

@waterlens
Copy link
Author

waterlens commented Jan 20, 2024

Shall we draft a proposal for it?

@dtig
Copy link
Member

dtig commented Mar 13, 2024

Sorry I lost track of this issue, I think drafting a proposal is a good next step - though it might also be a good question for the CG or the SIMD subgroup to see if there's other operations that we can include as we don't have a precedent for introducing one-off operations, but there's probably a class of operations we could consider. Here is some documentation on how to get a proposal started, and in parallel this could also be a good topic for discussion at a future SIMD subgroup meeting. @penzn are the subgroup meetings still running?

@waterlens
Copy link
Author

waterlens commented Mar 19, 2024

Thank you for your time. Yeah, it could be better to discuss it somewhere. I would like to hear some opinions from other people. I saw some people misunderstood this as trying to add the instruction directly into the core instr set, but it's not correct. Actually, I'm proposing this as an extension. Because it might be hard to do the 64-bit widening multiplication on some platforms.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants