
Proposal: FP16 value type and operations #1497

Open
SPY opened this issue Jan 30, 2024 · 2 comments

SPY commented Jan 30, 2024

Motivation

ML has gone from a lab toy to day-to-day usage and is now integrated into numerous web applications.
To unlock the full potential of AI-augmented applications, several initiatives have recently come to the Web platform.
The WebGPU proposal allows GPU-enabled machines to perform better on the AI front.
Half-precision floating point is a common choice for ML workloads because it offers better memory bandwidth and performance, and the reduced precision matters less there.
The JS Float16Array proposal improves the integration of JS with the WebGPU API.
The Wasm Memory control proposal aims to make GPU <-> Wasm interaction more efficient by reducing memory traffic.
Modern hardware also brings more native support for FP16: ARMv8-A NEON FP16 and x86 F16C, for example.
I believe that introducing native support for half-precision floating-point computation to WebAssembly would extend what can be achieved in this area and would match and complement the trends on the hardware side.

Potential solutions

Second-class support

We can mimic the JS approach and introduce only two memory instructions for reading and writing f32 values in binary16 format:

  • f32.load_f16: [i32] -> [f32]
  • f32.store_f16: [i32, f32] -> []

This is easy for a VM to implement, but it only makes communication with memory regions shared with the GPU more efficient.
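For illustration, here is a rough C model of one plausible semantics for these two instructions (mem stands in for linear memory and addr for the i32 address operand; this assumes a compiler with _Float16 support, such as recent GCC or Clang):

```c
#include <stdint.h>
#include <string.h>

/* Illustrative sketch only: one plausible semantics for the proposed
 * f32.load_f16 / f32.store_f16 instructions, assuming _Float16 support. */

static float f32_load_f16(const uint8_t *mem, uint32_t addr) {
    _Float16 h;
    memcpy(&h, mem + addr, sizeof h);  /* read 16 bits in binary16 format */
    return (float)h;                   /* widen (exactly) to f32 */
}

static void f32_store_f16(uint8_t *mem, uint32_t addr, float v) {
    _Float16 h = (_Float16)v;          /* round f32 to the nearest binary16 */
    memcpy(mem + addr, &h, sizeof h);  /* write the 16-bit encoding */
}
```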

First-class support

For full-scale support, see the dedicated explainer for more details.

Briefly,

  • New value type: f16.
  • New shape for v128 values: f16x8.
  • Scalar arithmetic instructions over f16, at parity with those over f32.
  • Vector instructions for f16x8 shape.

Although this is a more invasive change, it not only enables better interaction with GPU-originated memory but could also provide a fallback for devices without a GPU available for web usage. It could also be used for smaller ML models: text processing, context inference, etc.
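As a rough illustration of what first-class support would look like from a producer's point of view, here is a hypothetical C sketch (again assuming _Float16 support): scalar half-precision arithmetic, plus an 8-lane loop that a toolchain could lower to the proposed f16x8 shape.

```c
#include <stddef.h>

/* Hypothetical sketch, not part of the proposal text: code a toolchain
 * could compile to scalar f16 and f16x8 instructions if they existed. */

static _Float16 f16_mul_add(_Float16 a, _Float16 b, _Float16 c) {
    return a * b + c;                  /* half-precision multiply-add */
}

static void f16x8_add(const _Float16 *a, const _Float16 *b, _Float16 *out) {
    for (size_t i = 0; i < 8; i++)     /* eight 16-bit lanes = one v128 */
        out[i] = a[i] + b[i];          /* candidate for an f16x8.add lowering */
}
```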

Conclusion

I believe the second, first-class support approach is more beneficial for the ecosystem.
Everything said above also applies to non-ML graphics applications.


bakkot commented Mar 29, 2024

People pursuing this may wish to follow along at tc39/proposal-float16array#12: x86 prior to Sapphire Rapids does not have a native way to do casts from float64 to float16, which means it would need to be done in software (though it can probably be done fairly cheaply, depending on your definition of "cheaply").

Not relevant if there are only casts from f32, but do note that f64 -> f32 -> f16 can give different results than f64 -> f16 (because of double rounding), so it may make sense to have both, particularly as languages like C, C++ and Swift add native support for f16 and have casts from f64 -> f16.
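For concreteness, a minimal C sketch of that double-rounding effect (assuming a compiler with _Float16 support); the constant is chosen to sit just above the midpoint between two adjacent f16 values:

```c
#include <stdio.h>

/* Double-rounding sketch: d lies just above the halfway point between the
 * f16 values 1.0 and 1.0009765625 (1 + 2^-10), but rounding to f32 first
 * lands exactly on that halfway point, and ties-to-even then picks 1.0. */

int main(void) {
    double d = 1.0 + 0x1p-11 + 0x1p-30;

    _Float16 direct  = (_Float16)d;         /* f64 -> f16: 1.0009765625 */
    _Float16 via_f32 = (_Float16)(float)d;  /* f64 -> f32 -> f16: 1.0 */

    printf("direct:  %.10f\n", (double)direct);
    printf("via f32: %.10f\n", (double)via_f32);
    return 0;
}
```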


syg commented Mar 29, 2024

x86 prior to Sapphire Rapids

It's worth repeating that Sapphire Rapids is Xeon. There's nothing on Intel roadmaps AFAICT to bring this AVX512 extension to consumer chips.
