
feat: Use Metal for creating PlatformBuffers #2356

Merged: 12 commits into Shopify:main on Apr 12, 2024

Conversation

@mrousavy (Contributor) commented Apr 11, 2024

Changes the implementation of MakeImageFromPlatformBuffer (which currently uses SkData) to use Metal (MTLTexture) instead.

In theory, this should be more efficient as it operates on the GPU, whereas the previous approach (SkData) uses CPU-shared memory (IOSurface); I haven't tested the performance yet, though.

Either way, this PR also reorganizes the code a bit to be more flexible around PixelBuffer formats, and should allow YUV buffers as well (future PR).

I added getCVPixelBufferBaseFormat(..), which returns an enum that currently only contains rgb. A follow-up PR will extend it to also support yuv, and I will add CVPixelBufferUtils::YUV utils.
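A rough, platform-independent sketch of what getCVPixelBufferBaseFormat(..) might look like (an assumption: the real code would read the format via CVPixelBufferGetPixelFormatType(pixelBuffer), while here the four-character format code is passed in directly, and all names are illustrative):

```cpp
#include <cstdint>
#include <stdexcept>

// Illustrative sketch only; names and structure are assumptions, not the PR's code.
enum class CVPixelBufferBaseFormat { rgb /*, yuv (planned follow-up PR) */ };

// Builds a four-character code, e.g. fourCC('B','G','R','A') == kCVPixelFormatType_32BGRA.
constexpr uint32_t fourCC(char a, char b, char c, char d) {
  return (uint32_t(uint8_t(a)) << 24) | (uint32_t(uint8_t(b)) << 16) |
         (uint32_t(uint8_t(c)) << 8) | uint32_t(uint8_t(d));
}

CVPixelBufferBaseFormat getBaseFormat(uint32_t pixelFormatType) {
  switch (pixelFormatType) {
    case fourCC('B', 'G', 'R', 'A'): // kCVPixelFormatType_32BGRA
      return CVPixelBufferBaseFormat::rgb;
    default:
      throw std::runtime_error("Unsupported CVPixelBuffer format!");
  }
}
```

A follow-up PR could then add the YUV four-character codes as additional switch cases returning a yuv value.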

@mrousavy mrousavy marked this pull request as ready for review April 11, 2024 10:19
…ple PixelFormats (YUV)

Currently it always returns RGB. In the future it might also support YUV
@wcandillon wcandillon self-requested a review April 11, 2024 10:46
@wcandillon (Contributor) commented Apr 11, 2024

The PR looks good; however, it doesn't pass the tests on iOS.
E2E=true yarn test -i PlatformBuffer is the command to run in the package folder to run the tests.
See https://github.com/shopify/react-native-skia?tab=readme-ov-file#running-end-to-end-tests for how to run the e2e tests.
Happy to do it with you offline if needed (and I can add missing documentation on the process if needed).

Metal is unfortunately not available on GitHub Actions, so we can only run such tests locally (see actions/runner-images#6138).

@wcandillon (Contributor) left a comment

commented on the failing tests on the PR comment thread

@mrousavy (Contributor, Author) commented

@wcandillon I refactored makePlatformBuffer(...) function (which is used in the test) to both auto-convert to 32BGRA, and use GPU IOSurface: b2c97b6

@mrousavy mrousavy requested a review from wcandillon April 11, 2024 14:16
@wcandillon (Contributor) commented

Very cool, I'll probably merge the makePlatformBuffer() change.
But does the new MakeImageFromBuffer have exactly the same performance? Can you confirm?

@mrousavy (Contributor, Author) commented Apr 11, 2024

Yea, the call to MakeImageFromPlatformBuffer(..) alone has roughly the same performance. I would've expected the copy to take a bit longer, but apparently that's not the case on my iPhones. Maybe on really old iPhones, I don't know.

Either way, I think the performance difference will show up once you actually submit work to the GPU (flush()): with the SkData approach it still needs to upload the data to the GPU first, whereas with the Metal Texture approach it doesn't.
At least that's my understanding of it, 99% sure.

@wcandillon (Contributor) commented Apr 11, 2024 via email

@mrousavy (Contributor, Author) commented

As an extreme stress test, we can try the react-native-vision-camera v4-skia branch, select a format that has 120 FPS and high resolution (4k), and compare the performance by actually checking how fast it renders (VisionCamera has an FPS counter in the top left).

I think we need this PR either way, but we can test this just for fun later if you want?

@mrousavy (Contributor, Author) commented

> first let’s check if the gpu buffer works with the current method (it will not, most likely), then we will have to use this.

Yea for the future we can add branches for both, right?

  1. In PlatformBuffer.MakeFromImage(..) we could add a check to see if the SkImage is backed by a texture. If yes, we use the IOSurface approach (GPU); if not, we use the normal data-copy approach (CPU).
  2. In Image.MakeImageFromPlatformBuffer(..) we could add a check to see if the CMSampleBuffer has a backing IOSurface. If yes, we create the SkImage from a Metal texture; if not, we create it from SkData.

We don't need this for 99% of the cases, but maybe in the future we could add it if someone really needs CPU-based CMSampleBuffers (aka CVPixelBuffers that have been manually created from void* data instead of streamed in from the hardware).
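A minimal sketch of that branching; a plain bool stands in for the real checks (SkImage::isTextureBacked() in case 1, CVPixelBufferGetIOSurface(..) != NULL in case 2), and the enum/function names are purely illustrative:

```cpp
// Illustrative only: which conversion path to take, depending on whether the
// image/buffer is GPU-backed. Names are assumptions, not the PR's code.
enum class ConversionPath { GpuIOSurface, CpuDataCopy };

ConversionPath choosePath(bool isGpuBacked) {
  // GPU-backed images/buffers can share memory via IOSurface / Metal textures;
  // everything else falls back to the SkData byte copy.
  return isGpuBacked ? ConversionPath::GpuIOSurface : ConversionPath::CpuDataCopy;
}
```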

@wcandillon (Contributor) commented

This new debug method for creating "test" platform buffers is the one we need to use.
I'm eager to merge this part.

However, is it also compatible with the MakeImageFromPlatformBuffer implementation?
I will try to perform more benchmarking (4k 120fps) to see if there are any differences.

@wcandillon (Contributor) left a comment

The current method (in main) supports both types of SampleBuffers and seems to match it exactly performance-wise (while abstracting the texture creation away to Skia, which is preferable).
Will run more benchmarks.

@wcandillon (Contributor) left a comment

to be benchmarked a bit more.

@wcandillon wcandillon self-requested a review April 11, 2024 19:07
@wcandillon (Contributor) left a comment

to be benchmarked

@wcandillon (Contributor) commented Apr 11, 2024

This PR has two potentially compelling changes:

  1. adding a new way to create a Buffer that is not yet supported by MakeImageFromBuffer().
  2. better performance.

Right now I am observing neither but I will keep testing.

@mrousavy (Contributor, Author) commented

This PR changes the implementation of both SkImage -> PlatformBuffer and PlatformBuffer -> SkImage to use GPU Metal Textures.
Previously, both were SkData (CPU) approaches.

The GPU Metal Texture is always preferred because that's the format it is in when you conventionally get a CMSampleBuffer - either from the Camera (AVCaptureVideoDataOutput) or when streaming frames from an existing video file.

So the GPU Metal Texture (aka IOSurface) is definitely the way to go from my side; I'm not sure I understand what the counterargument here is?

@mrousavy (Contributor, Author) commented

As for benchmarks: my assumption is that the SkData approach uses more memory (explicit CPU copy) and adds latency when rendering.

We can test this by streaming 4k 120 FPS buffers in VisionCamera and comparing the bigger picture, including GPU flush and draw times, not just the PlatformBuffer -> SkImage conversion itself.

Let me know if you want to do that before merging this PR, but then I can only do it tomorrow (don't have a PC with me now).

@mrousavy (Contributor, Author) commented Apr 12, 2024

@wcandillon and I chatted offline for a while about this. I am convinced that Metal Buffers are the way to go (that's what every guide on the internet, including Apple's developer documentation, suggests), but William made the discovery that the "CPU approach" (aka getting the raw bytes of the CVPixelBuffer and constructing an SkImage with SkData from it) is also really fast.

To clarify, these are the two approaches we have:

  1. The "CPU-based" approach, where we lock the CVPixelBuffer (CVPixelBufferGetBaseAddress(..)) and copy over the raw data (void*) to an SkImage. This definitely goes over the CPU, as void* is CPU-exposed memory. (this is the code)
  2. The "GPU-based" approach, where we use a Metal Texture pool to create Metal Textures from the given CMSampleBuffer (we use its IOSurface). Then we create an SkImage using the Metal Texture as a backend texture. (this is the code)
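One detail the CPU-based approach has to handle: CVPixelBufferGetBytesPerRow(..) can be larger than width * bytesPerPixel (rows are padded), while a tightly packed pixel blob has no padding. Here is a portable sketch of that stride-aware copy, with plain C++ standing in for the CoreVideo lock/copy/unlock calls (the function name is illustrative, not the PR's code):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copies a possibly row-padded pixel buffer into a tightly packed byte vector.
// `src` and `bytesPerRow` stand in for what CVPixelBufferGetBaseAddress(..) and
// CVPixelBufferGetBytesPerRow(..) would return after locking the buffer.
std::vector<uint8_t> copyPixelsTightlyPacked(const uint8_t* src, size_t width,
                                             size_t height, size_t bytesPerPixel,
                                             size_t bytesPerRow) {
  const size_t tightRow = width * bytesPerPixel;
  std::vector<uint8_t> dst(tightRow * height);
  for (size_t y = 0; y < height; y++) {
    // Copy only the meaningful bytes of each row, skipping the row padding
    // (bytesPerRow >= tightRow).
    std::memcpy(dst.data() + y * tightRow, src + y * bytesPerRow, tightRow);
  }
  return dst;
}
```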

I was actually quite surprised by how fast the CPU-based approach is with SkData, but nevertheless the Metal approach is faster and, more importantly, more energy efficient. Here are my findings:

|               | CPU         | RAM            | FPS                 | CPU Frame Time | GPU Frame Time |
| ------------- | ----------- | -------------- | ------------------- | -------------- | -------------- |
| Metal Texture | 96%         | 409 MB         | 41-61 FPS           | 18.3ms         | 11ms           |
| SkData        | 112% (+16%) | 456 MB (+47MB) | 25-45 FPS (-16 FPS) | 40.3ms (+22ms) | 14.1ms (+3.1ms) |

And here are the screenshots from Xcode, where you can see that the SkData-based approach throttles over time (yes, I let the phone cool down in between):

Xcode profiling screenshots:

  GPU-based: CPU usage (metal-cpu), RAM usage (metal-ram), FPS for CPU and GPU (metal-fps)
  CPU-based: CPU usage (skdata-cpu), RAM usage (skdata-ram), FPS for CPU and GPU (skdata-fps)

(images omitted)

So my conclusion is that the GPU-based approach is faster and, especially, more consistent: after running for a while it continues to maintain its FPS rate, whereas the CPU-based approach likely got thermally throttled, judging by the huge FPS drops. It also used more memory.

Metal automatically flushes texture cache after 1 second.

// pragma MARK: CVPixelBuffer -> Skia Texture

GrBackendTexture SkiaCVPixelBufferUtils::getSkiaTextureForCVPixelBufferPlane(
@wcandillon (Contributor) commented

Is this part of this PR? I'm not sure what it does.

@mrousavy (Contributor, Author) commented

Yes, it converts one plane of a CMSampleBuffer to a GrBackendTexture. In RGB, we currently only have one plane so this is just used directly for RGB buffers. See here:

GrBackendTexture SkiaCVPixelBufferUtils::RGB::getSkiaTextureForCVPixelBuffer(
    CVPixelBufferRef pixelBuffer) {
  return getSkiaTextureForCVPixelBufferPlane(pixelBuffer, /* planeIndex */ 0);
}

But for YUV this is going to be relevant since we have multiple planes, so we need to call that method more often (once for Y plane and once for UV plane).

So we need this regardless of RGB or YUV.

@mrousavy (Contributor, Author) commented

RGB buffers are also split into planes; it's just that RGB always has exactly one plane. The methods that don't require you to pass a plane index are just convenience wrappers for the plane methods with index 0.
So we need this method both for RGB and for YUV.

@wcandillon (Contributor) commented

It looks good; there is just some code that refers to YUV planes that maybe shouldn't be here.

if (textureCache == nil) {
  // Create a new Texture Cache
  auto result = CVMetalTextureCacheCreate(kCFAllocatorDefault, nil,
                                          MTLCreateSystemDefaultDevice(), nil,
@wcandillon (Contributor) commented

Yes, I guess that's fine. So far in our code we made it so that we invoke this function only once, but I guess that's fair.

@mrousavy (Contributor, Author) commented

Yea, I had code in #2352 where I moved that into a separate public function so that we can get that context elsewhere; that included the device. But I think it's fair as it is now; maybe in the future, if we need this somewhere else as well, we can refactor a bit.

// pragma MARK: getTextureCache()

CVMetalTextureCacheRef SkiaCVPixelBufferUtils::getTextureCache() {
static thread_local CVMetalTextureCacheRef textureCache = nil;
@wcandillon (Contributor) commented

Does this need to be in the Metal context (which is thread-safe and which you can pass as an argument)? That way the thread-specific code is only in a single place.

@mrousavy (Contributor, Author) commented

Hm, I don't really want to separate the declaration of this variable (static thread_local) out to somewhere else, especially not another file, when the initialization (the line below) is here.

If you are talking about also moving the initialization out to ThreadSkiaContext, then I wouldn't do that either, as we shouldn't initialize a Metal Texture pool unless we really need it.
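For illustration, a portable sketch of the lazy thread_local initialization being discussed, with a stub standing in for CVMetalTextureCacheCreate(..) (all names here are illustrative, not the PR's code):

```cpp
// Each thread lazily creates its own texture cache on first use and then
// reuses it; no cache is created on threads that never need one.
struct TextureCache { int id; };

static TextureCache* createTextureCache() {
  // Stand-in for CVMetalTextureCacheCreate(kCFAllocatorDefault, ...).
  // Intentionally leaked in this sketch; the real cache lives as long as the thread.
  static int nextId = 0;
  return new TextureCache{nextId++};
}

TextureCache* getTextureCache() {
  static thread_local TextureCache* cache = nullptr;
  if (cache == nullptr) {
    cache = createTextureCache();
  }
  return cache;
}
```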

@wcandillon wcandillon self-requested a review April 12, 2024 11:38
@wcandillon wcandillon merged commit a7041d4 into Shopify:main Apr 12, 2024
10 of 11 checks passed
@mrousavy (Contributor, Author) commented

Nice!!! 💪🚀

btw is the CI stuck?

semantic-release bot commented
🎉 This PR is included in version 1.2.1 🎉

The release is available on:

Your semantic-release bot 📦🚀
