Implement a DisplayList mechanism similar to the Skia SkLiteDL mechanism #26928

flar · 2021-06-24T02:46:03Z

The mechanism is a drop-in replacement for:

  SkPictureRecorder -> DisplayListBuilder
  SkPicture         -> DisplayList
  SkCanvas          -> DisplayListCanvasDispatcher

The current Flutter Picture/Canvas mechanism redirects (will redirect) to DisplayList with minimal changes

Sets the stage for: flutter/flutter#53501
Replaces: #25234

This is a different approach to implementing a replacement for SkPicture from the previous attempt (#25234). In this approach I used a native DisplayList mechanism heavily based on Skia's SkLiteDL with some tweaks for our needs. Unlike #25234, which required extensive modification of the Dart Picture, PictureRecorder, and Canvas classes, this implementation will slide in natively by mimicking the SkCanvas interface and will be used directly by the native counterpart to Canvas. Thus, the dart:ui classes are not aware that they are using a new storage format.

Some of the tests written for this mechanism turned up bugs in Skia which are actively being worked on. ~~At least one is likely already integrated so I can remove a workaround as soon as I can verify and test with it.~~ (fix integrated and workaround removed for ColorFilter)

Some caveats on this WIP:

- It is 100% functional and runs all of the apps I've tested with.
- Most of the code is in new files in flow/display_list* and flow/layers/display_list_layer*.
- It is currently enabled by default to run the pre-integration tests, but due to the need for stability it will be turned off by default (opt-in) before merging.
- The opt-out/opt-in is controlled using --dart-flag=--enable-display-list or --dart-flag=--no-enable-display-list.
- Testing should be complete now ~~is nearly complete. Most all of the functionality needed by Flutter is tested (Text is a notable exception) and~~, including some of the cases that are only needed to mimic SkCanvas fully.
- Every rendering primitive ~~(~90%)~~ should be tested against every rendering attribute ~~(~99%)~~ in display_list_canvas_unittests.cc.

dnfield

My primary concern right now is around the pointer arithmetic/void* usage. Do we have some compelling evidence to show that this is buying us enough in performance gains to avoid using C++ type safety features?

dnfield · 2021-06-24T17:00:35Z

flow/display_list.cc

+  }
+};
+
+// struct DrawShadowRecOp final : DLOp {


Commented out code.

For a good reason documented in so many other places. I got tired of adding the description comment after a while.

Skia doesn't expose that structure (yet). There is a bug out there. They are looking into it. I cannot fully capture all SkCanvas rendering until I can support this structure. The code will eventually be used, and it must exist for completeness, but I am blocked by their header file structuring.

https://bugs.chromium.org/p/skia/issues/detail?id=12125

(I'm hoping that they'll release the structure before I merge this. Otherwise I'll put more reasons in or switch these to ifdefs or something.)

This Skia bug will be discussed in a sync-up today, so I can make a decision about this code soon.

Skia says that this will take some time as they are also working on a redesign of the Shadow stuff so they would like to prioritize the Metal work over this header work. I think I'll just delete all references to the ShadowRec stuff (except, obviously, in the SkCanvas->DL adapter) for now. We only use this from Dart Canvas presently and that code already works around this issue.

dnfield · 2021-06-24T17:12:58Z

flow/display_list.h

+  SkRect cull_;
+
+  template <typename T, typename... Args>
+  void* push(size_t extra, Args&&... args);


We should really, really try to avoid void*. Related to my other comment above.

Boilerplate from SkLiteDL used in both Android and Chrome. What would you suggest instead?

Chrome uses a template here: https://source.chromium.org/chromium/chromium/src/+/main:cc/paint/paint_op_buffer.h;l=1109?q=SkLiteDL&ss=chromium%2Fchromium%2Fsrc

They also still have some void pointers, but some of it is abstracted away or more contained and avoids some pointer arithmetic in https://source.chromium.org/chromium/chromium/src/+/main:cc/paint/paint_op_reader.cc

Part of the challenge is that code like this is really hard to audit for security/safety purposes. And that will become more true as we extend/refactor it (unless we specifically refactor it for safety auditing purposes :). If we can do any of that now we should.

But AFAICT, SkLiteDL is not actually used anywhere in Chrome or Android.

By "using" I meant that Skia had provided SkLiteDL as a template for them to use. They customized it to their needs, but you can see the basic bones of SkLiteDL in the structure of their code. You found the Chromium variant, but Android has their own as well - similarly customized.

Can we customize this to avoid some of the pointer arithmetic? :)

I'm not sure what you are suggesting? Making the structs full classes? That adds overhead of a vtable per op even though the actions are only needed in one or two methods - currently those use a switch statement to inline the 2 implementations they care about (dispatch and destructor).

Since the vast majority of the entries in the list are trivially destructible the destructor can dance along the array only calling unref on the few sk_sp<> references that dot the list and then free the array. If we made them all virtual then each op would have to be manually destructed by a virtual pointer.

The comparison method can also do 99+% of its work by a memcmp in this scenario.

The memory of all of the ops is localized which would go away if we converted to some sort of virtual typesafe list of pointers to the structures. It might end up localized, but that might deteriorate over time.

Perhaps I'm not familiar with what you are suggesting, but those are the advantages that it looks like the design was aiming to accommodate.
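To make the trade-offs listed above concrete, here is a minimal sketch of a packed op buffer with switch-based dispatch and a bulk memcmp comparison. Everything in it (OpBuffer, OpHeader, SetColorOp, DrawRectOp, TallyVisitor) is a hypothetical name invented for illustration; it is not the PR's DLOp layout or SkLiteDL's code, and it assumes C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical op set for illustration; the PR's real DLOp structs differ.
enum class OpType : uint32_t { kSetColor, kDrawRect };

// Fixed header written in front of every record.
struct OpHeader {
  OpType type;
  uint32_t size;  // bytes in this record, header included
};

struct SetColorOp {
  static constexpr OpType kType = OpType::kSetColor;
  uint32_t argb;
};

struct DrawRectOp {
  static constexpr OpType kType = OpType::kDrawRect;
  int32_t left, top, right, bottom;
};

class OpBuffer {
 public:
  // Appends a header plus the op's bytes to one contiguous array.
  template <typename T>
  void Push(const T& op) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "ops are copied and destroyed as raw bytes");
    static_assert(std::has_unique_object_representations<T>::value,
                  "padding bytes would make memcmp unreliable");
    const OpHeader header{T::kType,
                          static_cast<uint32_t>(sizeof(OpHeader) + sizeof(T))};
    Append(&header, sizeof(header));
    Append(&op, sizeof(op));
  }

  // Walks the records in order and calls the matching visitor method.
  template <typename Visitor>
  void Dispatch(Visitor& visitor) const {
    size_t offset = 0;
    while (offset < bytes_.size()) {
      OpHeader header;
      std::memcpy(&header, bytes_.data() + offset, sizeof(header));
      assert(header.size >= sizeof(OpHeader));
      const uint8_t* payload = bytes_.data() + offset + sizeof(OpHeader);
      switch (header.type) {
        case OpType::kSetColor: {
          SetColorOp op;
          std::memcpy(&op, payload, sizeof(op));
          visitor.SetColor(op.argb);
          break;
        }
        case OpType::kDrawRect: {
          DrawRectOp op;
          std::memcpy(&op, payload, sizeof(op));
          visitor.DrawRect(op.left, op.top, op.right, op.bottom);
          break;
        }
      }
      offset += header.size;
    }
  }

  // Bulk comparison: equal sizes and identical bytes.
  bool Equals(const OpBuffer& other) const {
    return bytes_.size() == other.bytes_.size() &&
           (bytes_.empty() ||
            std::memcmp(bytes_.data(), other.bytes_.data(), bytes_.size()) == 0);
  }

  size_t bytes() const { return bytes_.size(); }

 private:
  void Append(const void* data, size_t size) {
    const uint8_t* p = static_cast<const uint8_t*>(data);
    bytes_.insert(bytes_.end(), p, p + size);
  }

  std::vector<uint8_t> bytes_;
};

// Example visitor that just tallies what it is asked to draw.
struct TallyVisitor {
  int color_changes = 0;
  int64_t rect_area = 0;
  void SetColor(uint32_t) { ++color_changes; }
  void DrawRect(int32_t l, int32_t t, int32_t r, int32_t b) {
    rect_area += static_cast<int64_t>(r - l) * (b - t);
  }
};
```

Because every op in this sketch is trivially copyable, freeing the vector is the whole destructor, and two buffers recorded from the same calls compare equal byte for byte. A real list that holds sk_sp<> references would still need a pass to unref them, as described above.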

I like the suggestion from @chinmaygarde to move the op definitions into another file and format so that we can switch up the implementation more easily. We could then investigate the performance impact of something like a vector of virtual dispatch struct/classes.

The one question that I don't think could be easily answered from the above list of "advantages" of the current approach, though, would be the impact of memory fragmentation on locality and, by extension, on app performance. That situation would only come up in practice on long running apps. Has something like that shown up as an issue before?

dnfield · 2021-06-24T17:18:03Z

flow/layers/display_list_layer.cc

+  if (op_cnt_1 > 10) {
+    statistics.AddPictureTooComplexToCompare();
+    return false;
+  }


Why 10?

Could we somehow surface in the display list interface whether it has expensive-to-compare ops in it? For example, if there's a single op that takes 100ns to compare, but 50 ops that each take 1ns...

Do we have benchmarks to say that this is a good number to go with? If not, do we have a plan (and an issue filed) to make this more reliable/performant?

I copied that from the PictureLayer implementation for now. It can be tuned over time. We might want to have a higher threshold given that the DL.equals() method exists and is much more efficient than the SkPicture technique of serializing the stream and comparing a hash.

And for now this is based on the count of the records in the DL. We could also compute and base this on a complexity metric. Since the vast majority of the ops are compared using a bulk compare, we could base it on the number of bytes. I think @knopp probably has some experience with how to tune this metric. For this first pass, I should probably bump this up or eliminate the test - Matt?

I switched to a byte size comparison with a guess at a threshold of 10,000 bytes. I am waiting to hear back from @knopp before I push the change.
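A byte-size gate of the kind described here could look roughly like the sketch below. The names (CompareRecordings, CompareResult, kMaxBytesToCompare) are hypothetical, the 10,000-byte value is only the guess mentioned above, and checking the sizes before the threshold is an assumption of this sketch rather than the PR's actual ordering.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical threshold; the thread only mentions a guess of 10,000 bytes.
constexpr size_t kMaxBytesToCompare = 10000;

enum class CompareResult { kEqual, kDifferent, kTooComplex };

// Compares two recorded byte streams, refusing to scan very large ones.
CompareResult CompareRecordings(const std::vector<unsigned char>& a,
                                const std::vector<unsigned char>& b) {
  // A size mismatch is free to detect, so report it before the threshold.
  if (a.size() != b.size()) {
    return CompareResult::kDifferent;
  }
  // Past the threshold, give up rather than pay for a long scan.
  if (a.size() > kMaxBytesToCompare) {
    return CompareResult::kTooComplex;
  }
  if (a.empty()) {
    return CompareResult::kEqual;
  }
  return std::memcmp(a.data(), b.data(), a.size()) == 0
             ? CompareResult::kEqual
             : CompareResult::kDifferent;
}
```

A caller would record a "too complex" statistic and treat the layers as different when kTooComplex comes back, which mirrors the early return in the diff above.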

dnfield · 2021-06-24T17:24:56Z

flow/layers/layer.h

+  virtual const DisplayListLayer* as_display_list_layer() const {
+    return nullptr;
+  }


I know I'm late to the party here, but rather than continuing to add to these, can we just create a virtual method that gives the layer an opportunity to virtually do what it needs to do with the other layer? It looks like we're more or less doing that with the DiffContext object in some places, but that's failing to capture some significance somehow.

This would be for @knopp to decide and would apply to the overall implementation of the DiffContext (see #21824)

chinmaygarde

Haven't gone over the whole thing yet but sending the comments I have. More to follow.

chinmaygarde · 2021-06-24T19:53:57Z

flow/display_list.cc

+#undef DEFINE_DRAW_2ARG_OP
+
+// 4 byte header + 28 byte payload packs efficiently into 32 bytes
+struct DrawArcOp final : DLOp {


Full disclosure, I stopped double checking each struct definition right about here as it is just repetition of the same pattern.

Perhaps (later if needed) this entire file can be generated from a manifest? It's fairly straightforward to make a target depend on a generated TU in GN. It will also allow us to switch implementations on a whim. For instance, to see if Dan's concerns make sense on whether we should just use vtable dispatch.

Even if I am unsure if we could have just gotten away with simple vtable dispatch, I couldn't spot any obvious deficiencies with this approach.

I like the idea for generating everything from a manifest as that would improve clarity on everything. I'll look into that as a follow-on along with investigating different storage solutions as @dnfield was suggesting.
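The thread proposes generating the op definitions from a manifest through a GN target. As a lighter-weight stand-in for that idea, the sketch below uses an in-source X-macro list, which is a different technique from a generated translation unit but shows the same benefit: one list of ops drives several expansions. All names (SKETCH_FOR_EACH_OP, SketchOpType, SketchOpName) are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical manifest: one line per op. The names are illustrative only.
#define SKETCH_FOR_EACH_OP(V) \
  V(SetAntiAlias)             \
  V(DrawRect)                 \
  V(DrawArc)

// Expansion 1: the op type enum.
enum class SketchOpType {
#define SKETCH_DEFINE_ENUM(name) k##name,
  SKETCH_FOR_EACH_OP(SKETCH_DEFINE_ENUM)
#undef SKETCH_DEFINE_ENUM
};

// Expansion 2: the number of ops in the manifest.
#define SKETCH_COUNT_OP(name) +1
constexpr int kSketchOpCount = 0 SKETCH_FOR_EACH_OP(SKETCH_COUNT_OP);
#undef SKETCH_COUNT_OP

// Expansion 3: a switch that maps each op type to its name.
inline std::string SketchOpName(SketchOpType type) {
  switch (type) {
#define SKETCH_NAME_CASE(name) \
  case SketchOpType::k##name:  \
    return #name;
    SKETCH_FOR_EACH_OP(SKETCH_NAME_CASE)
#undef SKETCH_NAME_CASE
  }
  return "Unknown";
}
```

Adding an op to the list updates the enum, the count, and the name switch together. The same list could in principle also expand into either packed structs or virtual classes, which is the kind of implementation switch discussed above.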

chinmaygarde · 2021-06-24T19:56:09Z

flow/BUILD.gn

@@ -12,6 +12,12 @@ source_set("flow") {
    "compositor_context.h",
    "diff_context.cc",
    "diff_context.h",
+    "display_list.cc",


Minor nit: Can we put the ops in their own header? The stuff about the display list is way more interesting than the op definitions. Will just make the code a bit easier to navigate.

Currently they are private to that cc file. Their definitions aren't needed anywhere else.

Basically display_list.[h,cc] exist only to define, pack, and unpack the structure through Dispatcher. Everything else is in a different file.

I'll lump this in with the previous suggestions for using a manifest and for investigating alternate storage solutions.

flar · 2021-06-24T20:25:14Z

Perhaps we need to have a meta-discussion about the structure packing and sizes.

I wasn't really paying much attention to it until I started writing the display_list_unittests.cc and I decided to assert the sizes of the DLs being generated - mostly for completeness. It seemed like it was just verifying some info that we had an API to provide (dl->bytes()), but I suddenly realized that the sizes weren't what I thought they should be. That should have probably been obvious due to alignment of data members and such, but it was a lot different than what I was expecting. For one thing, I used to have a few ops that I packed into "4 bytes" before I realized that the code imposed 8-byte alignment and for a good reason (ptr alignment). Then I realized that some structures had some major packing issues, so I went about analyzing all of them and adjusting their field orders until they had "fairly optimum" sizes. I documented my work as I went for my own tracking and possibly to inform future maintainers.

But, in the long run, it isn't a big thing to worry about. It might be nice to track that someone doesn't add a huge data structure to every Op and explode our DL sizes, but if it packs to 20 bytes on this platform and 24 on that platform, the difference isn't critical.

On the other hand, a half hour of my diligence in rearranging a few structures saved us some memory at no run-time cost, so there are some minor benefits to it. And I discovered the Windows compiler preference for 16-byte alignments which is gratuitous for our data here, so that is another side benefit.
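The effect of field order and record alignment described above can be sketched as follows. The two structs and the AlignUp helper are hypothetical and are not taken from the PR; exact sizes depend on the platform ABI, so the only property asserted is that the reordered struct is never larger.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical op layouts, not taken from the PR.
// Small fields on both sides of a double typically force padding twice.
struct LooselyOrderedOp {
  uint8_t type;
  double radius;
  uint8_t flag;
};

// Same fields with the widest member first, so the small ones share a slot.
struct TightlyOrderedOp {
  double radius;
  uint8_t type;
  uint8_t flag;
};

// On common 64-bit ABIs these are 24 and 16 bytes, but that is not guaranteed.
static_assert(sizeof(TightlyOrderedOp) <= sizeof(LooselyOrderedOp),
              "reordering should never make the struct larger");

// Rounds a record size up to a multiple of `alignment` (a power of two),
// the way a buffer that keeps every record pointer-aligned would.
constexpr size_t AlignUp(size_t size, size_t alignment) {
  return (size + alignment - 1) & ~(alignment - 1);
}

static_assert(AlignUp(4, 8) == 8, "a 4-byte op still occupies 8 bytes");
static_assert(AlignUp(20, 8) == 24, "a 20-byte op occupies 24 bytes");
```

This is why an op that looks like it packs into 4 bytes still consumes 8 in an 8-byte-aligned buffer, and why reordering fields can shrink a record at no run-time cost.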

flar · 2021-06-29T23:55:19Z

My next push will remove the references to ShadowRec for now.

How do we feel about the scheduling of this PR and whether we should introduce it as an opt-in or opt-out at this time?

@chinmaygarde @dnfield

chinmaygarde

Barring a couple of nits, I believe this patch is good to go.

While its size makes it daunting to review, a substantial portion is boilerplate that we can generate from a manifest (if needed). It is conceptually sound and builds on proven work in Skia. It's also extremely well tested. There are a few comment threads suggesting alternatives but I don't think there are any remaining construction issues. We have also needed this mechanism for a very long time. For these reasons, I feel comfortable in landing this right now despite the size.

The only suggestion would be to make the flag off by default, with a follow-up that flips the switch immediately afterward. If the patch is implicated in a roll, I'd rather keep the delta as small as possible.

chinmaygarde · 2021-06-30T21:09:08Z

shell/common/switches.cc

@@ -66,6 +66,8 @@ static const std::string gAllowedDartFlags[] = {
    "--write-service-info",
    "--null_assertions",
    "--strict_null_safety_checks",
+    "--enable-display-list",


These flags are meant to be sent directly to the Dart VM during initialization. For engine specific flags, we add them to shell/common/switches.h

My impression is that would advertise these switches. I was looking to keep them un-advertised for now...?

All switches are un-advertised and unstable really.

chinmaygarde · 2021-06-30T21:09:36Z

shell/common/switches.cc

@@ -404,6 +406,16 @@ Settings SettingsFromCommandLine(const fml::CommandLine& command_line) {
      settings.dart_flags.push_back(flag);
    }
  }
+  if (std::find(settings.dart_flags.begin(), settings.dart_flags.end(),


See the comment above about adding to shell switches instead of Dart switches.

Or not - intentionally. They do work as is, but are not advertised as I don't know if we want them to become part of our public API unless we plan to support them beyond the initial testing and buy-in phase.
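The diff above looks the flag up in settings.dart_flags with std::find. A standalone sketch of that kind of lookup might read as follows; HasFlag and ResolveDisplayListSetting are hypothetical helpers, and the rule that the --no- form wins when both flags are present is an assumption of this sketch, not something the PR states.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns true if `flag` appears verbatim in the list of flags.
bool HasFlag(const std::vector<std::string>& flags, const std::string& flag) {
  return std::find(flags.begin(), flags.end(), flag) != flags.end();
}

// Hypothetical resolution of the opt-in/opt-out pair described in this PR.
// Assumption: the "--no-" form takes precedence if both are given.
bool ResolveDisplayListSetting(const std::vector<std::string>& dart_flags,
                               bool default_value) {
  if (HasFlag(dart_flags, "--no-enable-display-list")) {
    return false;
  }
  if (HasFlag(dart_flags, "--enable-display-list")) {
    return true;
  }
  return default_value;
}
```

With a helper like this, flipping the default later only changes the default_value passed in, while both flags keep working as overrides.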

chinmaygarde · 2021-07-01T18:00:55Z

One thing to keep track of still is immediately flipping the flag to use DLs by default.

flar · 2021-07-01T18:40:46Z

> One thing to keep track of still is immediately flipping the flag to use DLs by default.

I actually have about 5 or 6 "follow-on" tasks to create issues for. The flag flip is waiting for successful integration down the line to set a baseline.

flar added affects: engine Work in progress (WIP) Not ready (yet) for review! labels Jun 24, 2021

flar requested review from chinmaygarde, knopp and dnfield June 24, 2021 02:46

google-cla bot added the cla: yes label Jun 24, 2021

flar mentioned this pull request Jun 24, 2021

[WIP] Display list picture prototype #25234

Closed

flar changed the title ~~Implement a DisplayList mechanism similar to the Skia SkLiteDL mechanism~~ [WIP] Implement a DisplayList mechanism similar to the Skia SkLiteDL mechanism Jun 24, 2021

dnfield reviewed Jun 24, 2021

chinmaygarde reviewed Jun 24, 2021

flar mentioned this pull request Jun 27, 2021

No default SkFontMgr on Fuchsia causes unexpected engine test behavior flutter/flutter#82202

Open

flar force-pushed the DisplayList-2 branch from 3cb7a74 to 578ab90 Compare June 27, 2021 18:41

flar requested review from dnfield and chinmaygarde June 30, 2021 21:01

chinmaygarde approved these changes Jun 30, 2021

flar removed the Work in progress (WIP) Not ready (yet) for review! label Jun 30, 2021

flar changed the title ~~[WIP] Implement a DisplayList mechanism similar to the Skia SkLiteDL mechanism~~ Implement a DisplayList mechanism similar to the Skia SkLiteDL mechanism Jun 30, 2021

flar added 10 commits June 30, 2021 15:59

fix licenses and a potential source of bulk compare failures

b681d3f

fix shell test expecting string output for cache stats

45d4cfd

use std::numeric_limits for min/max float values

f3da850

fix windows compile errors due to structure packing

921d21d

Fix missing save layer flags

30c0380

switch to enum classes and fully implement SkC::drawPicture

e021fef

adopt review suggestions on naming style

285815b

add alignment pragmas to rein in Windows x64 compiler wasted bytes

3a39b55

remove the ColorFilter workaround from unit test now that Skia bug is fixed

5c0e31f

flar added 12 commits June 30, 2021 16:02

more style guide name changes and more testing of dl.Equals

b8399f3

break up complicated unit tests and enable experimental DL test

5f16bcc

minor formatting and simplification of DL unit tests

0b275af

flush out render testing matrix and fix shadow bounds

d98decd

Skip font rendering tests on Fuchsia

6608c5f

fix signed size_t comparisons causing vector overflow

a160d27

eliminate another unsigned size_t comparison in a unit test

b578935

more review feedback

cc83cf5

add drawImageLattice tests and more style guide renames

bd50c5c

remove all ShadowRec code and leave a few comments pointing to the related Skia bug

4a79eef

switch DL off by default and move some EXPECT_DEATH tests inside DEBUG ifdef

0d66114

adjust new DL sources for recent removal of Fuchsia legacy code

3755bf1

flar force-pushed the DisplayList-2 branch from 14cddc9 to 3755bf1 Compare June 30, 2021 23:26

flar added the waiting for tree to go green This PR is approved and tested, but waiting for the tree to be green to land. label Jul 1, 2021

fluttergithubbot merged commit eab0cd4 into flutter:master Jul 1, 2021

engine-flutter-autoroll mentioned this pull request Jul 1, 2021

Roll Engine from 28c21828fcda to eab0cd490abc (1 revision) flutter/flutter#85675

Merged

engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Jul 1, 2021

eab0cd4 Implement a DisplayList mechanism similar to the Skia SkLiteDL mechanism (flutter/engine#26928)

efc0e81

fluttergithubbot pushed a commit to flutter/flutter that referenced this pull request Jul 1, 2021

eab0cd4 Implement a DisplayList mechanism similar to the Skia SkLiteDL mechanism (flutter/engine#26928) (#85675)

21a95f6

flar added a commit that referenced this pull request Jul 1, 2021

Revert "Implement a DisplayList mechanism similar to the Skia SkLiteDL mechanism (#26928)"

This reverts commit eab0cd4.

ed4a0fc

This was referenced Jul 1, 2021

Revert "Implement a DisplayList mechanism similar to the Skia SkLiteDL mechanism" #27122

Closed

☂️ - Take advantage of new DisplayList format flutter/flutter#85737

Closed

enable DisplayList by default #27130

Merged

chinmaygarde mentioned this pull request Jul 14, 2021

Reland enable DisplayList by default #27407

Merged

moffatman pushed a commit to moffatman/engine that referenced this pull request Aug 5, 2021

Implement a DisplayList mechanism similar to the Skia SkLiteDL mechanism (flutter#26928)

928c563

flar mentioned this pull request Aug 27, 2021

Consider adding a custom recording format flutter/flutter#53501

Closed

naudzghebre pushed a commit to naudzghebre/engine that referenced this pull request Sep 2, 2021

Implement a DisplayList mechanism similar to the Skia SkLiteDL mechanism (flutter#26928)

3b74aae
