Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a DisplayList mechanism similar to the Skia SkLiteDL mechanism #26928

Merged
merged 26 commits into from
Jul 1, 2021

Conversation

flar
Copy link
Contributor

@flar flar commented Jun 24, 2021

The mechanism is a drop-in replacement for:

  SkPictureRecorder -> DisplayListBuilder
  SkPicture         -> DisplayList
  SkCanvas          -> DisplayListCanvasDispatcher

The current Flutter Picture/Canvas mechanism redirects (will redirect) to DisplayList with minimal changes

Sets the stage for: flutter/flutter#53501
Replaces: #25234

This is a new approach to implementing a replacement for SkPicture than the previous attempt (#25234). In this approach I used a native DisplayList mechanism heavily based on Skia's SkLiteDL with some tweaks for our needs. Unlike 25234 which required extensive modification of the Dart Picture, PictureRecorder, and Canvas classes, this implementation will slide in natively by mimicking the SkCanvas interface and will be used directly by the native counterpart to Canvas. Thus, the dart:ui classes are not aware that they are using a new storage format.

Some of the tests written for this mechanism turned up bugs in Skia which are actively being worked on. At least one is likely already integrated so I can remove a workaround as soon as I can verify and test with it. (fix integrated and workaround removed for ColorFilter)

Some caveats on this WIP:

  • It is 100% functional and runs all of the apps I've tested with
  • Most of the code is in new files in flow/display_list* and flow/layers/display_list_layer*
  • It is currently enabled by default to run the pre-integration tests, but due to the need for stability will be turned off by default (opt-in) before merging
  • The opt-out/opt-in is controlled using --dart-flag=--enable-display-list or --dart-flag=--no-enable-display-list
  • Testing should be complete now is nearly complete. Most all of the functionality needed by Flutter is tested (Text is a notable exception) and including some of the cases that are only needed to mimic SkCanvas fully are tested as well.
  • Every rendering primitive (~90%) should be tested against every rendering attribute (~99%) in display_list_canvas_unittests.cc

@flar flar added affects: engine Work in progress (WIP) Not ready (yet) for review! labels Jun 24, 2021
@google-cla google-cla bot added the cla: yes label Jun 24, 2021
@flar flar changed the title Implement a DisplayList mechanism similar to the Skia SkLiteDL mechanism [WIP] Implement a DisplayList mechanism similar to the Skia SkLiteDL mechanism Jun 24, 2021
Copy link
Contributor

@dnfield dnfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My primary concern right now is around the pointer arithmetic/void* usage. Do we have some compelling evidence to show that this is buying us enough in performance gains to avoid using C++ type safety features?

flow/display_list.cc Outdated Show resolved Hide resolved
flow/display_list.cc Show resolved Hide resolved
}
};

// struct DrawShadowRecOp final : DLOp {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented out code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a good reason documented in so many other places. I got tired of adding the description comment after a while.

Skia doesn't expose that structure (yet). There is a bug out there. They are looking into it. I cannot fully capture all SkCanvas rendering until I can support this structure. The code will eventually be used, and it must exist for completeness, but I am blocked by their header file structuring.

https://bugs.chromium.org/p/skia/issues/detail?id=12125

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'm hoping that they'll release the structure before I merge this. Otherwise I'll put more reasons in or switch these to ifdefs or something.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Skia bug will be discussed in a synch-up today, so I can make a decision about this code soon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skia says that this will take some time as they are also working on a redesign of the Shadow stuff so they would like to prioritize the Metal work over this header work. I think I'll just delete all references to the ShadowRec stuff (except, obviously, in the SkCanvas->DL adapter) for now. We only use this from Dart Canvas presently and that code already works around this issue.

flow/display_list.cc Show resolved Hide resolved
flow/display_list.h Outdated Show resolved Hide resolved
SkRect cull_;

template <typename T, typename... Args>
void* push(size_t extra, Args&&... args);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should really, really try to avoid void*. Related to my other comment above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Boilerplate from SkLiteDL used in both Android and Chrome. What would you suggest instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They also still have some void pointers, but some of it is abstracted away or more contained and avoids some pointer arithmetic in https://source.chromium.org/chromium/chromium/src/+/main:cc/paint/paint_op_reader.cc

Part of the challenge is that code like this is really hard to audit for security/safety purposes. And that will become more true as we extend/refactor it (unless we specifically refactor it for safety auditing purposes :). If we can do any of that now we should.

But AFAICT, SkLiteDL is not actually used anywhere in Chrome or Android.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "using" I meant that Skia had provided SkLiteDL as a template for them to use. They customized it to their needs, but you can see the basic bones of SkLiteDL in the structure of their code. You found the Chromium variant, but Android has their own as well - similarly customized.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we customize this to avoid some of the pointer arithmetic? :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you are suggesting? Making the structs full classes? That adds overhead of a vtable per op even though the actions are only needed in one or two methods - currently those use a switch statement to inline the 2 implementations they care about (dispatch and destructor).

Since the vast majority of the entries in the list are trivially destructible the destructor can dance along the array only calling unref on the few sk_sp<> references that dot the list and then free the array. If we made them all virtual then each op would have to be manually destructed by a virtual pointer.

The comparison method can also do 99+% of its work by a memcmp in this scenario.

The memory of all of the ops is localized which would go away if we converted to some sort of virtual typesafe list of pointers to the structures. It might end up localized, but that might deteriorate over time.

Perhaps I'm not familiar with what you are suggesting, but those are the advantages that it looks like the design was aiming to accomodate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the suggestion from @chinmaygarde to move the op definitions into another file and format so that we can switch up the implementation more easily. We could then investigate the performance impact of something like a vector of virtual dispatch struct/classes.

The one question that I don't think could be easily answered from the above list of "advantages" of the current approach, though, would be the impact of memory fragmentation on locality and, by extension, on app performance. That situation would only come up in practice on long running apps. Has something like that shown up as an issue before?

Comment on lines 65 to 71
if (op_cnt_1 > 10) {
statistics.AddPictureTooComplexToCompare();
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 10?

Could we somehow surface in the display list interface whether it has expensive-to-compare ops in it? For example, if there's a single op that takes 100ns to compare, but 50 ops that each take 1ns...

Do we have benchmarks to say that this is a good number to go with? If not, do we have a plan (and an issue filed) to make this more reliable/performant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied that from the PictureLayer implementation for now. It can be tuned over time. We might want to have a higher threshold given that the DL.equals() method exists and is much more efficient than the SkPicture technique of serializing the stream and comparing a hash.

And for now this is based on the count of the records in the DL. We could also compute and base this on a complexity metric. Since the vast majority of the ops are compared using a bulk compare, we could base it on the number of bytes. I think @knopp probably has some experience with how to tune this metric. For this first pass, I should probably bump this up or eliminate the test - Matt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched to a byte size comparison with a guess at a threshold of 10,000 bytes. I am waiting to hear back from @knopp before I push the change.

Comment on lines +265 to +245
virtual const DisplayListLayer* as_display_list_layer() const {
return nullptr;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I'm late to the party here, but rather than continuing to add to these, can we just create a virtual method that gives the layer an opportunity to virtually do what it needs to do with the other layer? It looks like we're more or less doing that with the DiffContext object in some places, but that's failing to capture some significance somehow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be for @knopp to decide and would apply to the overall implementation of the DiffContext (see #21824)

flow/raster_cache.cc Show resolved Hide resolved
lib/ui/compositing/scene_builder.cc Show resolved Hide resolved
Copy link
Member

@chinmaygarde chinmaygarde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't gone over the whole thing yet but sending the comments I have. More to follow.

flow/display_list.cc Outdated Show resolved Hide resolved
flow/display_list.cc Show resolved Hide resolved
#undef DEFINE_DRAW_2ARG_OP

// 4 byte header + 28 byte payload packs efficiently into 32 bytes
struct DrawArcOp final : DLOp {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Full disclosure, I stopped double checking each struct definition right about here as it is just repetition of the same pattern.

Perhaps (later if needed) this entire file can be generated from a manifest? It's fairly straightforward to make a target depend on a generated TU in GN. It will also allow us to switch implementations on a whim. For instance, to see if Dan's concerns make sense on whether we should just use vtable dispatch.

Even if I am unsure if we could have just gotten away with simple vtable dispatch, I couldn't spot any obvious deficiencies with this approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea for generating everything from a manifest as that would improve clarity on everything. I'll look into that as a follow-on along with investigating different storage solutions as @dnfield was suggesting.

@@ -12,6 +12,12 @@ source_set("flow") {
"compositor_context.h",
"diff_context.cc",
"diff_context.h",
"display_list.cc",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit: Can we put the ops in their own header? The stuff about the display list is way more interesting that the op definitions. Will just make the code a bit easier to navigate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently they are private to that cc file. Their definitions aren't needed anywhere else.

Basically display_list.[h,cc] exist only to define, pack, and unpack the structure through Dispatcher. Everything else is in a different file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll lump this in with the previous suggestions for using a manifest and for investigating alternate storage solutions.

@flar
Copy link
Contributor Author

flar commented Jun 24, 2021

Perhaps we need to have a meta-discussion about the structure packing and sizes.

I wasn't really paying much attention to it until I started writing the display_list_unittests.cc and I decided to assert the sizes of the DLs being generated - mostly for completeness. It seemed like it was just verifying some info that we had an API to provide (dl->bytes()), but I suddenly realized that the sizes weren't what I thought they should be. That should have probably been obvious due to alignment of data members and such, but it was a lot different than what I was expecting. For one thing, I used to have a few ops that I packed into "4 bytes" before I realized that the code imposed 8-byte alignment and for a good reason (ptr alignment). Then I realized that some structures had some major packing issues, so I went about analyzing all of them and adjusting their field orders until they had "fairly optimum" sizes. I documented my work as I went for my own tracking and possibly to inform future maintainers.

But, in the long run, it isn't a big thing to worry about. It might be nice to track that someone doesn't add a huge data structure to every Op and explode our DL sizes, but if it packs to 20 bytes on this platform and 24 on that platform, the difference isn't critical.

On the other hand, a half hour of my diligence in rearranging a few structures saved us some memory at no run-time cost, so there are some minor benefits to it. And I discovered the Windows compiler preference for 16-byte alignments which is gratuitous for our data here, so that is another side benefit.

flow/display_list.cc Outdated Show resolved Hide resolved
@flar
Copy link
Contributor Author

flar commented Jun 29, 2021

My next push will remove the references to ShadowRec for now.

How do we feel about the scheduling of this PR and whether we should introduce it as an opt-in or opt-out at this time?

@chinmaygarde @dnfield

Copy link
Member

@chinmaygarde chinmaygarde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Barring a couple of nits, I believe this patch is good to go.

While its size make it daunting to review, a substantial portion is boilerplate that we can generate from a manifest (if needed). It is conceptually sound and builds on proven work in Skia. It's also extremely well tested. There are few comment threads suggesting alternatives but I don't think there are any remaining construction issues. We have also needed this mechanism for a very long time. For these reasons, I feel comfortable in landing this right now despite the size.

The only suggestion would be to make the flag off by default with a flag that flips the switch immediately to follow. If the patch is implicated in a roll, I rather keep the delta as small as possible.

common/settings.h Outdated Show resolved Hide resolved
@@ -66,6 +66,8 @@ static const std::string gAllowedDartFlags[] = {
"--write-service-info",
"--null_assertions",
"--strict_null_safety_checks",
"--enable-display-list",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These flags are meant to be sent directly to the Dart VM during initialization. For engine specific flags, we add them to shell/common/switches.h

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My impression is that would advertise these switches. I was looking to keep them un-advertised for now...?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All switches are un-advertised and unstable really.

@@ -404,6 +406,16 @@ Settings SettingsFromCommandLine(const fml::CommandLine& command_line) {
settings.dart_flags.push_back(flag);
}
}
if (std::find(settings.dart_flags.begin(), settings.dart_flags.end(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the comment above about adding to shell switches instead of Dart switches.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or not - intentionally. They do work as is, but are not advertised as I don't know if we want them to become part of our public API unless we plan to support them beyond the initial testing and buy-in phase.

flow/layers/display_list_layer_unittests.cc Show resolved Hide resolved
@flar flar removed the Work in progress (WIP) Not ready (yet) for review! label Jun 30, 2021
@flar flar changed the title [WIP] Implement a DisplayList mechanism similar to the Skia SkLiteDL mechanism Implement a DisplayList mechanism similar to the Skia SkLiteDL mechanism Jun 30, 2021
@flar flar added the waiting for tree to go green This PR is approved and tested, but waiting for the tree to be green to land. label Jul 1, 2021
@fluttergithubbot fluttergithubbot merged commit eab0cd4 into flutter:master Jul 1, 2021
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Jul 1, 2021
fluttergithubbot pushed a commit to flutter/flutter that referenced this pull request Jul 1, 2021
@chinmaygarde
Copy link
Member

One thing to keep track of still is immediately flipping the flag to use DLs by default.

@flar
Copy link
Contributor Author

flar commented Jul 1, 2021

One thing to keep track of still is immediately flipping the flag to use DLs by default.

I actually have about 5 or 6 "follow-on" tasks to create issues for. The flag flip is waiting for successful integration down the line to set a baseline.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects: engine cla: yes waiting for tree to go green This PR is approved and tested, but waiting for the tree to be green to land.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants