Optimized gamma correction on Apple Silicon #1590
Merged
This is a follow-up to #1581. The previous pull request added an optional path for Metal 2.3; this PR uses it.
In Metal, we have to do gamma correction manually. Traditionally, this has meant two render passes and an intermediate buffer.
Apple Silicon GPUs can safely read from and write to the framebuffer within a single pass, because each portion of the framebuffer is directly owned by a GPU core in its tile memory. So we don't need two passes, and we don't need an intermediate buffer. We render straight to the main framebuffer, then execute a shader that reads each pixel value from the main buffer and writes the corrected value directly back to it. This improves performance (render pass changes on Apple Silicon can be expensive) and reduces memory use.
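The difference between the two approaches can be modelled on the CPU. The following is only an illustrative sketch, not the shader code in this PR: `Pixel`, `kAssumedGamma`, `gammaEncode`, `gammaCorrectTwoBuffer`, and `gammaCorrectInPlace` are hypothetical names, and the simple 2.2 power curve is an assumption, since the description does not say which curve the real shader applies.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One RGBA pixel with linear-light colour channels in [0, 1].
struct Pixel {
    float r, g, b, a;
};

// Assumed display gamma; the real shader may use a different curve.
constexpr float kAssumedGamma = 2.2f;

// Encode one linear channel value with a plain power curve.
inline float gammaEncode(float linear) {
    return std::pow(linear, 1.0f / kAssumedGamma);
}

// Two-buffer model: the scene lives in an intermediate buffer, and a
// second step writes the corrected pixels into a separate framebuffer.
void gammaCorrectTwoBuffer(const std::vector<Pixel>& intermediate,
                           std::vector<Pixel>& framebuffer) {
    framebuffer.resize(intermediate.size());
    for (std::size_t i = 0; i < intermediate.size(); ++i) {
        const Pixel& src = intermediate[i];
        framebuffer[i] = Pixel{gammaEncode(src.r), gammaEncode(src.g),
                               gammaEncode(src.b), src.a};
    }
}

// In-place model: each pixel is read from the framebuffer and the
// corrected value is written back to the same location. Alpha is
// left untouched.
void gammaCorrectInPlace(std::vector<Pixel>& framebuffer) {
    for (Pixel& p : framebuffer) {
        p.r = gammaEncode(p.r);
        p.g = gammaEncode(p.g);
        p.b = gammaEncode(p.b);
    }
}
```

Both functions produce the same pixel values; the in-place version simply never allocates the second buffer. On the GPU, the in-place form is only safe because each pixel is read and written by the same tile owner.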
On macOS, Apple Silicon functionality requires shaders compiled against Metal 2.3. Metal 2.3 shaders only run on macOS 11 and later, so we need to keep the Metal 2.1 shaders for earlier macOS versions. (macOS 11 is the first version to support Apple Silicon Macs, so we don't need to worry about an Apple Silicon Mac running Metal 2.1.)
Additionally, we need to check that the GPU is an Apple GPU before attempting to load the optimized shader. This check is exposed on the device class as a tile memory capability.
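The selection logic described above might look roughly like the sketch below. All names here (`DeviceInfo`, `supportsTileMemory`, `GammaPath`, `chooseGammaPath`) are hypothetical and are not the identifiers used in this PR; the sketch only assumes that the optimized path needs both macOS 11 or later and an Apple GPU.

```cpp
// Hypothetical summary of what the renderer knows about the system.
struct DeviceInfo {
    int macOSMajorVersion;  // e.g. 10 or 11
    bool isAppleGPU;        // true on Apple Silicon GPUs
};

// Which gamma-correction path to use.
enum class GammaPath {
    TwoPass,           // fallback: intermediate buffer plus a second pass
    InPlaceTileMemory  // optimized: read and write the framebuffer directly
};

// Models the capability check surfaced on the device class:
// tile memory is treated as available only on Apple GPUs.
inline bool supportsTileMemory(const DeviceInfo& device) {
    return device.isAppleGPU;
}

// Pick the optimized path only when the Metal 2.3 shaders can be
// loaded (macOS 11 or later) and the GPU reports tile memory support.
inline GammaPath chooseGammaPath(const DeviceInfo& device) {
    const bool canLoadMetal23 = device.macOSMajorVersion >= 11;
    if (canLoadMetal23 && supportsTileMemory(device)) {
        return GammaPath::InPlaceTileMemory;
    }
    return GammaPath::TwoPass;
}
```

Under these assumptions, a non-Apple GPU on macOS 11 still falls back to the two-pass path, which is why the GPU check is needed in addition to the OS version check.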