-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Huge performance regression in 2.0.17 #13899
Comments
I think you should be able to just pass |
Would this flag affect system libraries compilation (libc, libc++...)? |
Passing |
Ah yes, the system libraries won't effected by |
I should have said I was testing on Safari 14.0.3. On Chrome 90 performance doesn't seem to have any regression, regardless of flags. After further testing on Safari:
Regardless of flags: ~14% size reduction. Mostly from system libraries? Seems like the new LLVM pass manager combined with the old Binaryen inlining behavior gives the better performance for our use case on both browsers. Anyway, I think it would be nice when there are changes to defaults if changelog included the flags needed to revert to old behavior 😅. |
Thats somewhat counter intuitive.. it sounds like the only way to match the old performance is to use the new pass manager. I guess we may want to revisit the binaryen inlining change and/or file a bug with apple about that. |
I know... I don't get it but I've run each (flag combination x version) multiple times and those are the results I get... |
Good point about the changelog, thanks, I'll open a PR to update it for both those changes. |
As for the perf issue, perhaps more inlining ends up hitting a worse case with Safari register allocation or something like that? If you can provide a testcase that could be interesting to look at. Otherwise let's see if there are more reports of issues. |
I see performance is dropping when going from 2.0.16 to 2.0.17. it drops from 50fps to 8fps. Strange thing is that it drops only when using Firefox. Chrome and Safari are doing fine on 2.0.17 builds. code is at https://github.com/dirkwhoffmann/virtualc64web |
Did you try with |
@kripken I wonder if we can find a reasonable a value to set |
Just tell me if you would like me to retest with another value than 1. 🤓 |
It would be good if both of you can test with more values of Also, these are specific browser limitations or bugs that we need to investigate and file. Can you please create testcases showing the issue, that is, a build with the 'bad' and the 'good' settings? Then we can file bugs for them to look into. (I can help with the filing part.) Without filing bugs for them things may not improve. |
ok first some testing for vc64web ... still fast values=1,1024,2048,4096,16384,18k,19k,19250,19280,19296,19304,19306 |
machine is a mbp 2007 with macos10.11 and firefox 78.10.0esr (64-Bit) logoutput for
logoutput for
any thoughts? is it serious? or is it high enough to ignore it ? BTW what is the meaning of the number ? Is the unit of the number bytes or instructions ? should I still provide the two wasm files? |
Thanks! Yes, the wasm files would be very helpful. I'd like to see the change between them. And we can file a bug on the relevant browser with them. The meaning of the number is the number of IR nodes in binaryen, which is almost exactly the number of instructions in wasm. That the number is 19306/7 is interesting - this is not a small function that is being inlined. The next question is into what does it get inlined (I can see that with the wasm files). |
@kripken I made the two builds and put them into the zip file ... is that maybe already enough and you are looking only for those .. or do you need the other js files which are needed to run the whole thing too? see the whole thing running (currently with a 2.0.16 build) here https://dirkwhoffmann.github.io/virtualc64web/ the source files (and also the other js files which are needed to run it) are here https://github.com/dirkwhoffmann/virtualc64web |
Thanks @mithrendal ! Yes, the other JS files will be needed for filing a browser bug. What are the steps to plug in the js/wasm combos from the archive, to get it running with that repo? Meanwhile, inspecting the wasm files, it looks like the difference is
and after:
(the combined new function is index 326 since 215 was removed, and then all subsequent indexes decreased by 1. I don't think there is much we can do on the tools side since this looks like something we should inline:
|
To summarize the last few comments in this issue, we have a testcase that dropped from 50fps to 8fps on Firefox, after binaryen started to do more inlining. There is no change on Chrome or Safari, so this is Firefox specific. Investigating the wasm file, I don't see a reason not to perform the inlining (see last comment) - it looks like something the toolchain should be doing as best I can tell (that is, the new inlining heuristics look broadly better, and there is no obvious tweak to the heuristics to avoid this particular testcase). Perhaps this is hitting a bad case in the Firefox register allocator, or some internal limit on the optimizing compiler, or tiering? We can file a bug for further investigation if that would be useful (still need some details from @mithrendal for how to run the code). |
@kripken, thanks, I'll investigate next week. I assume this is x64? |
as vc64web will soon move to its new home address vc64web.github.io (we wait first to get the CORS related http-response headers set at related cross sites from which vc64web currently can fetch its retro content from the demo scene) I took this opportunity to publish the latest copy of it into it, (effectively use the new address for this issue) that is the build emsdk_2.0.17 with the problematic parameter so you will currently get that 2.0.17 version up and running here at its future address http://vc64web.github.io (the current official address with the 2.0.16 build is still https://dirkwhoffmann.github.io/virtualc64web/) when you start it, it asks you for ROM-chips. You may just hit install open-roms or press ESC now. In both case it will install three system roms from the mega65-open-roms project. But if you only! do this the slowdown will not yet! occur. This is because the open-roms do not include the romchip of the Commodore 1541 floppy drive and I found out that on the C++ side the slowdown occurs only! when emulating the 1541 floppy drive. so the long function with 19307 WASM instructions seems to be this one in /Emulator/C64.cpp see link to code here @dirkwhoffmann can you confirm ? @lars-t-hansen had the question whether the code is x64 ? Can you say something on that matter too? to get complete build (including the js-files) just head to https://github.com/vc64web/vc64web.github.io and click green button 'code' -> Download ZIP to get the floppy drive8 activated you will have to click the missing 1541-rom-chip icon here. try this file https://sourceforge.net/p/vice-emu/code/HEAD/tree/trunk/vice/data/DRIVES/d1541II?format=raw as soon you loaded the missing floppy drive rom, the emulator will activate the emulation code for the 1541 floppy drive and begin to slow down dramatically when using firefox and when being compiled with you can spot the perfstats in the web console they will be outputted every second. (there is also a live-debug-console which you can enable in the settings of vc64web) if there is still need to build and compile then you will find the source code currently here https://github.com/dirkwhoffmann/virtualc64web |
I think the question was where you saw the regression - is your machine x64 (as opposed to 32-bit, or ARM)? Info about the browser, CPU, and OS can be helpful in figuring out why it became slower on a particular combination. |
Mithrendal was using Firefox 78.10.0esr (64-Bit) on a MacBook Pro from 2007 (Intel core2duo 2.2 GHz) with macOS 10.11 El Capitan. |
today I changed the current app at to a 2.0.17 the firefox problematic 2.0.17 / 19307 build is still at for building I used an intel macmini on bigsur latest patch level with emsdk 2.0.17. test running was at a slow 2007 mbp machine as @dirkwhoffmann already descibed |
I have hardware like that but the latest Mac OS I have for it is 10.6.8, on which esr78 is not supported. Updating to 10.11 is a little bit of a project, as it's not exclusively my computer. @mithrendal, I'm wondering, can you rerun the experiment but first go into about:config and set javascript.options.wasm_baselinejit (it may be called something slightly different than that but it should be obvious) to false? The most obvious explanation for the problem is that you're being bitten by tiering / being stuck in slow code, and it would be good to rule that out. (Load times may increase.) |
@lars-t-hansen I did set the baslinejit to false and it changed not anything, I spotted between 2 and 4 FPS on the old machine with that setting... then I did a test on a newer machine a intel mac mini 2014 2.6 Ghz Big Sur 11.2.3 Firefox 88 64Bit: I got full and around |
Thanks. It'll be easier to test on the newer hardware with newer Firefox, I'll try to repro. |
OK, I can repro on my current Linux desktop system, life is good. https://bugzilla.mozilla.org/show_bug.cgi?id=1708381. |
@lars-t-hansen vc64web now finally moved to its new home address http://vc64web.github.io , which you used as a test case for the aggressive inline problem on firefox bug 1708381 that means the address serves no longer the build with the problematic inlining for firefox. As far as I understood from the bugzilla ticket you and your colleges have already isolated a small testcase which shows the same problem right? Anyway, if you still need that problematic build I can make one for you on another address. Good luck ! |
@mithrendal, we do indeed have a separate test case for this, thanks for letting us know. The bug referenced above should have all the details. |
@mithrendal I've been working on a fix for this (https://bugzilla.mozilla.org/show_bug.cgi?id=1712078). I'd like to verify that it really fixes the performance problem on the problematic build. So .. could you please make it available on another URL, per your comment above? |
sure 🤓 build with maximal inlining (parameter published to build with limited inlining ( published to |
@mithrendal Thanks. With the patch at https://bugzilla.mozilla.org/show_bug.cgi?id=1712078#c9 I got the following results: Limited inlining, with patch 56 fps In other words .. the patch fixes this problem. |
We have seen some cases in both Chrome and Firefox where extremely large modules cause overhead, #3730 (comment) (and link therein) emscripten-core/emscripten#13899 (comment) There is no "right" value to use as a limit here, but pick an arbitrary one that is very high. (This setting is verified to have no effect on the emscripten benchmark suite.)
Hi, |
It sounds like the OOM is of the browser itself? Inlining leads to larger function sizes, which can require more work by the VM. For example, if the VM does work that is |
Yes, browser eats memory till crash. Problem stably happens in imho Amount of locals can be reduced by splitting big functions into subfunctions (each outer loop for example). It's like a research to measure all browsers and derive good enough ranges. Maybe |
You can try |
I already use |
We've experienced a drastic performance regression (between 3x - 8x) in our app by updating from 2.0.16 to 2.017.
On the other hand, size has decreased 14%.
I suspect it's due to LLVM's new pass manager. Is there any flag we can use to select the legacy LLVM pass manager? This new behavior is preventing us from updating as performance is mission critical for our use case and we don't really care about size.
Thanks.
The text was updated successfully, but these errors were encountered: