This repository was archived by the owner on Feb 25, 2025. It is now read-only.

@jonahwilliams
Contributor

This one almost works. The idea is that most of our clips are rectangular, so we can avoid expensive software rasterization by using UIView's clipsToBounds functionality.

flutter/flutter#142830
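A minimal sketch of the rectangular-clip idea, assuming every clip has already been mapped into one shared coordinate space and is axis-aligned. The intersection of axis-aligned rectangles is itself an axis-aligned rectangle, so a whole stack of such clips could collapse into the frame of a single view with `clipsToBounds` enabled. `Rect` and `IntersectClipRects` are hypothetical names for illustration, not engine types.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical axis-aligned rectangle in a shared (e.g. device) coordinate space.
struct Rect {
  double left, top, right, bottom;
  bool IsEmpty() const { return right <= left || bottom <= top; }
};

// Intersects every rectangular clip in the stack. Because the intersection of
// axis-aligned rectangles is itself an axis-aligned rectangle, the result could
// be applied as the frame of a single view with clipsToBounds enabled.
// Returns std::nullopt when there are no clips at all (nothing to clip).
// An empty result (IsEmpty() == true) means everything is clipped away.
std::optional<Rect> IntersectClipRects(const std::vector<Rect>& clips) {
  if (clips.empty()) {
    return std::nullopt;
  }
  Rect result = clips.front();
  for (const Rect& clip : clips) {
    result.left = std::max(result.left, clip.left);
    result.top = std::max(result.top, clip.top);
    result.right = std::min(result.right, clip.right);
    result.bottom = std::min(result.bottom, clip.bottom);
  }
  return result;
}
```

This only covers the all-rectangles case; any rounded-rect or path clip would still need a mask.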

ClipViewSetMaskView(clipView);
[(FlutterClippingMaskView*)clipView.maskView clipRect:(*iter)->GetRect()
                                               matrix:transformMatrix];
clip_rect = transformMatrix.mapRect((*iter)->GetRect());
Contributor Author

This might not be correct: it's possible that a skew transform makes a rect non-rectangular, so we'd need to check for that. Also, I think once we hit a clip that isn't a clipRect, we'd need to stop using clipsToBounds.
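The rule proposed in this comment (stop using `clipsToBounds` once a clip is not a plain rectangle) can be sketched as a prefix scan over the clip stack. This is a hypothetical illustration: `ClipKind`, `Clip`, and `CountFastPathClips` are invented names, and the `stays_rect` flag stands in for whatever transform check would actually be used. The remaining clips would still go through the existing mask-view path.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical clip kinds, loosely modeled on the clip types discussed here.
enum class ClipKind { kRect, kRRect, kPath };

// Hypothetical clip entry. `stays_rect` records whether the clip's transform
// maps its rectangle to another axis-aligned rectangle (no rotation or skew).
struct Clip {
  ClipKind kind;
  bool stays_rect;
};

// Returns how many clips, from the start of the stack, could use the cheap
// clipsToBounds path. Everything from the first clip that is not an
// axis-aligned rectangle onward falls back to the mask-view path.
std::size_t CountFastPathClips(const std::vector<Clip>& clips) {
  std::size_t count = 0;
  for (const Clip& clip : clips) {
    if (clip.kind != ClipKind::kRect || !clip.stays_rect) {
      break;  // First non-rectangular clip: stop using clipsToBounds.
    }
    ++count;
  }
  return count;
}
```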

Contributor

There is a mapRect variant that tells you whether the mapping was rect-to-rect; I think you have to use bool mapRect(SkRect* dst, const SkRect& src) to get this info. Alternately, SkMatrix has a getter, rectStaysRect(), to ask whether it preserves rects.
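To illustrate the two checks mentioned here, the sketch below reimplements them on a hypothetical `Affine` type rather than calling Skia. `RectStaysRect()` accepts only scale/translate matrices and 90-degree rotations with non-zero entries (the forms the `SkMatrix::rectStaysRect()` documentation describes), and `MapRect` returns the bounds of the mapped corners plus that flag, mirroring the shape of `bool SkMatrix::mapRect(SkRect* dst, const SkRect& src)`. Perspective is ignored, and all names here are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical 2D affine matrix (perspective ignored), using SkMatrix's naming:
//   x' = scale_x * x + skew_x * y + trans_x
//   y' = skew_y  * x + scale_y * y + trans_y
struct Affine {
  double scale_x, skew_x, trans_x;
  double skew_y, scale_y, trans_y;

  // True when every axis-aligned rect maps to an axis-aligned rect: either
  // scale/translate only, or a 90-degree rotation that swaps the axes, with
  // the remaining entries non-zero.
  bool RectStaysRect() const {
    if (skew_x == 0 && skew_y == 0) {
      return scale_x != 0 && scale_y != 0;
    }
    if (scale_x == 0 && scale_y == 0) {
      return skew_x != 0 && skew_y != 0;
    }
    return false;  // Any other mix of scale and skew tilts or collapses edges.
  }
};

struct Rect {
  double left, top, right, bottom;
};

// Maps the four corners of `src`, writes their bounding box to `*dst`, and
// returns whether that box is exactly the mapped rect.
bool MapRect(const Affine& m, Rect* dst, const Rect& src) {
  const double xs[4] = {src.left, src.right, src.right, src.left};
  const double ys[4] = {src.top, src.top, src.bottom, src.bottom};
  for (int i = 0; i < 4; ++i) {
    const double x = m.scale_x * xs[i] + m.skew_x * ys[i] + m.trans_x;
    const double y = m.skew_y * xs[i] + m.scale_y * ys[i] + m.trans_y;
    if (i == 0) {
      *dst = {x, y, x, y};
    } else {
      dst->left = std::min(dst->left, x);
      dst->top = std::min(dst->top, y);
      dst->right = std::max(dst->right, x);
      dst->bottom = std::max(dst->bottom, y);
    }
  }
  return m.RectStaysRect();
}
```

When `MapRect` returns false, `*dst` is only a bounding box of a tilted shape, so that clip could not be expressed with `clipsToBounds`.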

Contributor Author

Ahh nice, thank you!

@hellohuanlin
Contributor

The math here can get a bit hairy, and it only covers the rectangle case. I think I will start by checking whether CAShapeLayer helps.

Our FlutterClippingMaskView's drawing code looks fine to me. My guess is that with a UIView (backed by a general CALayer) as the mask, the bitmap mask image has to be ready when the CPU draws the masked layer, which likely triggers software rendering of the mask (otherwise at least one frame would be produced unmasked).

But with CAShapeLayer as the mask, Apple should be able to optimize it: the path can be accessed directly while the masked layer is drawn, and its vertices can be sent to the GPU without a pre-rendered bitmap image.

^ Just my guess of how CALayer is implemented internally :)

@jonahwilliams
Contributor Author

That sounds like a good idea, @hellohuanlin. I'll close this one!

@jmagman
Member

jmagman commented Apr 4, 2024

> But with CAShapeLayer as the mask, Apple should be able to optimize it, since its path can be directly accessed during the drawing phase of the masked layer, and then send the vertices to GPU, without the pre-rendered bitmap image.

See also how macOS is doing this with CAShapeLayer:

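- (void)maskToPath:(CGPathRef)path withOrigin:(CGPoint)origin {
  // Reuse the existing shape-layer mask, creating it on first use.
  CAShapeLayer* maskLayer = self.layer.mask;
  if (maskLayer == nil) {
    maskLayer = [CAShapeLayer layer];
    self.layer.mask = maskLayer;
  }
  // The mask is described by a vector path rather than a drawn bitmap.
  maskLayer.path = path;
  // Shift the path by -origin so it lines up with this layer's coordinate space.
  maskLayer.transform = CATransform3DMakeTranslation(-origin.x, -origin.y, 0);
}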

@hellohuanlin
Contributor

I benchmarked this PR and got the results below (10 runs, iPhone 11). I will try again later with more runs (maybe 20) for a more accurate and stable result.

Main branch

Score	Average (noise)
average_frame_build_time_millis	0.25 (3.21%)
worst_frame_build_time_millis	0.86 (37.91%)
90th_percentile_frame_build_time_millis	0.38 (10.52%)
99th_percentile_frame_build_time_millis	0.59 (8.90%)
average_frame_rasterizer_time_millis	3.67 (3.11%)
worst_frame_rasterizer_time_millis	13.72 (21.19%)
90th_percentile_frame_rasterizer_time_millis	6.13 (8.01%)
99th_percentile_frame_rasterizer_time_millis	10.43 (10.53%)
average_layer_cache_count	0.00 (0.00%)
90th_percentile_layer_cache_count	0.00 (0.00%)
99th_percentile_layer_cache_count	0.00 (0.00%)
worst_layer_cache_count	0.00 (0.00%)
average_layer_cache_memory	0.00 (0.00%)
90th_percentile_layer_cache_memory	0.00 (0.00%)
99th_percentile_layer_cache_memory	0.00 (0.00%)
worst_layer_cache_memory	0.00 (0.00%)
average_picture_cache_count	0.00 (0.00%)
90th_percentile_picture_cache_count	0.00 (0.00%)
99th_percentile_picture_cache_count	0.00 (0.00%)
worst_picture_cache_count	0.00 (0.00%)
average_picture_cache_memory	0.00 (0.00%)
90th_percentile_picture_cache_memory	0.00 (0.00%)
99th_percentile_picture_cache_memory	0.00 (0.00%)
worst_picture_cache_memory	0.00 (0.00%)
old_gen_gc_count	0.10 (435.89%)
average_vsync_transitions_missed	0.96 (23.08%)
90th_percentile_vsync_transitions_missed	0.95 (22.94%)
99th_percentile_vsync_transitions_missed	1.00 (31.62%)
average_frame_request_pending_latency	11910.09 (1.91%)
90th_percentile_frame_request_pending_latency	14360.15 (0.46%)
99th_percentile_frame_request_pending_latency	16309.70 (0.85%)
average_cpu_usage	22.78 (2.29%)
average_gpu_usage	6.05 (6.81%)
average_memory_usage	163.51 (3.33%)
90th_percentile_memory_usage	177.15 (1.48%)
99th_percentile_memory_usage	179.18 (1.11%)
total_ui_gc_time	1.22 (28.83%)
30hz_frame_percentage	0.00 (0.00%)
60hz_frame_percentage	100.00 (0.00%)
80hz_frame_percentage	0.00 (0.00%)
90hz_frame_percentage	0.00 (0.00%)
120hz_frame_percentage	0.00 (0.00%)
illegal_refresh_rate_frame_count	0.00 (0.00%)
average_gpu_frame_time	0.92 (3.74%)
90th_percentile_gpu_frame_time	1.29 (8.65%)
99th_percentile_gpu_frame_time	1.55 (7.51%)
worst_gpu_frame_time	1.71 (15.65%)
average_gpu_memory_mb	0.00 (0.00%)
90th_percentile_gpu_memory_mb	0.00 (0.00%)
99th_percentile_gpu_memory_mb	0.00 (0.00%)
worst_gpu_memory_mb	0.00 (0.00%)

After this PR

Score	Average (noise)
average_frame_build_time_millis	0.24 (4.62%)
worst_frame_build_time_millis	1.37 (77.08%)
90th_percentile_frame_build_time_millis	0.38 (12.67%)
99th_percentile_frame_build_time_millis	0.59 (9.94%)
average_frame_rasterizer_time_millis	3.60 (1.80%)
worst_frame_rasterizer_time_millis	14.39 (13.78%)
90th_percentile_frame_rasterizer_time_millis	6.28 (6.25%)
99th_percentile_frame_rasterizer_time_millis	9.98 (11.53%)
average_layer_cache_count	0.00 (0.00%)
90th_percentile_layer_cache_count	0.00 (0.00%)
99th_percentile_layer_cache_count	0.00 (0.00%)
worst_layer_cache_count	0.00 (0.00%)
average_layer_cache_memory	0.00 (0.00%)
90th_percentile_layer_cache_memory	0.00 (0.00%)
99th_percentile_layer_cache_memory	0.00 (0.00%)
worst_layer_cache_memory	0.00 (0.00%)
average_picture_cache_count	0.00 (0.00%)
90th_percentile_picture_cache_count	0.00 (0.00%)
99th_percentile_picture_cache_count	0.00 (0.00%)
worst_picture_cache_count	0.00 (0.00%)
average_picture_cache_memory	0.00 (0.00%)
90th_percentile_picture_cache_memory	0.00 (0.00%)
99th_percentile_picture_cache_memory	0.00 (0.00%)
worst_picture_cache_memory	0.00 (0.00%)
old_gen_gc_count	0.00 (0.00%)
average_vsync_transitions_missed	1.07 (9.45%)
90th_percentile_vsync_transitions_missed	1.33 (35.36%)
99th_percentile_vsync_transitions_missed	1.50 (50.92%)
average_frame_request_pending_latency	11892.16 (1.15%)
90th_percentile_frame_request_pending_latency	14357.33 (0.20%)
99th_percentile_frame_request_pending_latency	16245.00 (0.87%)
average_cpu_usage	22.62 (2.19%)
average_gpu_usage	6.26 (10.13%)
average_memory_usage	147.22 (8.96%)
90th_percentile_memory_usage	158.42 (10.09%)
99th_percentile_memory_usage	161.52 (8.94%)
total_ui_gc_time	1.35 (36.03%)
30hz_frame_percentage	0.00 (0.00%)
60hz_frame_percentage	100.00 (0.00%)
80hz_frame_percentage	0.00 (0.00%)
90hz_frame_percentage	0.00 (0.00%)
120hz_frame_percentage	0.00 (0.00%)
illegal_refresh_rate_frame_count	0.00 (0.00%)
average_gpu_frame_time	0.90 (3.21%)
90th_percentile_gpu_frame_time	1.26 (5.35%)
99th_percentile_gpu_frame_time	1.57 (8.01%)
worst_gpu_frame_time	1.68 (8.06%)
average_gpu_memory_mb	0.00 (0.00%)
90th_percentile_gpu_memory_mb	0.00 (0.00%)
99th_percentile_gpu_memory_mb	0.00 (0.00%)
worst_gpu_memory_mb	0.00 (0.00%)

Observation

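  • average frame rasterizer time: 3.67 -> 3.60 ms (down ~2%), noise ~2-3%
  • 90th percentile frame rasterizer time: 6.13 -> 6.28 ms (up ~2%), noise ~6-8%
  • 99th percentile frame rasterizer time: 10.43 -> 9.98 ms (down ~4%), noise ~10-12%
  • worst frame rasterizer time: 13.72 -> 14.39 ms (up ~5%), noise ~14-21%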
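The percentages in these bullets can be recomputed from the frame rasterizer rows of the two tables. The sketch below compares each change with the larger of the two reported noise values; the "bigger than the noise" threshold is a rough heuristic for illustration, not a rigorous statistical test, and `Comparison`, `PercentChange`, `ExceedsNoise`, and `RasterizerRows` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// One row of the comparison: a metric's mean on main and with this PR,
// plus the larger of the two reported noise percentages.
struct Comparison {
  std::string metric;
  double main_ms;
  double pr_ms;
  double noise_pct;
};

// Percentage change from main to the PR; negative means the PR is faster.
double PercentChange(double before, double after) {
  assert(before != 0);
  return (after - before) / before * 100.0;
}

// Rough heuristic: treat a change as meaningful only if its magnitude is
// larger than the reported run-to-run noise.
bool ExceedsNoise(const Comparison& c) {
  return std::fabs(PercentChange(c.main_ms, c.pr_ms)) > c.noise_pct;
}

// The four frame rasterizer time rows from the main-branch and PR tables.
std::vector<Comparison> RasterizerRows() {
  return {
      {"average_frame_rasterizer_time_millis", 3.67, 3.60, 3.11},
      {"90th_percentile_frame_rasterizer_time_millis", 6.13, 6.28, 8.01},
      {"99th_percentile_frame_rasterizer_time_millis", 10.43, 9.98, 11.53},
      {"worst_frame_rasterizer_time_millis", 13.72, 14.39, 21.19},
  };
}
```

With these numbers none of the four changes is larger than its noise, which is why a single 10-run comparison is hard to read on its own.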

For the average frame time, the delta is about the same size as the noise, so a trend may be hard to spot just by looking at the graph. However, we should be able to see a clear improvement in a Google Sheet over the course of a week (~50 runs).
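Why more runs should help can be sketched with the usual standard-error argument, under two assumptions not stated in this thread: that the reported noise is the relative standard deviation of a single run, and that runs are independent. Under those assumptions the uncertainty of an n-run average shrinks as 1/sqrt(n), so going from 10 to ~50 runs narrows it by about sqrt(5), roughly 2.2x. `RelativeStandardError` and `RunsNeeded` are hypothetical helpers.

```cpp
#include <cassert>
#include <cmath>

// ASSUMPTION: `noise_pct` is the relative standard deviation of one run and
// runs are independent, so the mean of n runs has standard error sd / sqrt(n).
double RelativeStandardError(double noise_pct, int runs) {
  assert(runs > 0);
  return noise_pct / std::sqrt(static_cast<double>(runs));
}

// Smallest number of runs whose standard error is at or below `target_pct`,
// under the same assumptions.
int RunsNeeded(double noise_pct, double target_pct) {
  assert(target_pct > 0);
  const double ratio = noise_pct / target_pct;
  return static_cast<int>(std::ceil(ratio * ratio));
}
```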

For the 90th percentile, 99th percentile, and worst frames, the noise is much larger because each of these statistics depends on only a small subset of frames. I think we can ignore these numbers unless we see a noticeable regression. I still believe the change improves these cases; the apparent regressions are likely just variance, and we got unlucky.

So I believe it's still a very good change to make.

@flar
Contributor

flar commented Apr 8, 2024

There is a utility in the repo to "pretty print" these A/B tables in ASCII: https://github.com/flutter/flutter/blob/master/dev/devicelab/bin/summarize.dart

4 participants