
Version 3.10.x uses significantly more memory and is significantly slower than 3.9.0 #3434

Closed

PatSmuk360 opened this issue Sep 12, 2019 · 95 comments

@PatSmuk360

Our builds recently started failing because we run about 80 Less builds in parallel during our project's build process, and the new version of Less.js uses so much memory that Node crashes. We traced the crash to upgrading Less.js from 3.9.0 to 3.10.3.

I changed our Less script to compile the files sequentially (building 2 files at a time) and sampled Node's memory usage during the process and got the following results:

[graph: Node memory usage during the sequential build, 3.9.0 vs 3.10.3]

Less.js seems to use 130% more memory now and takes about 100% longer to compile for us.

Just wondering if you've benchmarked Less.js and whether you see similar results.

@matthew-dean (Member)

That's odd. What is your Node version?

@matthew-dean (Member)

@PatSmuk360 Could you test and get a memory profile of 3.10.0 and see if it differs?

@PatSmuk360 (Author)

We're running the latest version of Node 10 (10.16.3).

Heap snapshot before:

[image: heap snapshot]

Heap snapshot after:

[image: heap snapshot]

@PatSmuk360 (Author)

Also I tried on Node 12.10.0 and it seems a lot worse, hitting 587 MB of memory usage at one point in the sequential build.

@PatSmuk360 (Author)

CPU profile before:
CPU-20190916T133934.cpuprofile.zip

[image: CPU profile]

CPU profile after:
CPU-20190916T134917.cpuprofile.zip

[image: CPU profile]

@matthew-dean (Member)

@PatSmuk360 So, the long and short of it is that between those versions, the codebase was transformed to ES6 syntax. That isn't technically a breaking change, which is why it wasn't released as a major version.

BUT... my suspicion is that some of the Babel conversions for things like object / array spread syntax are less efficient than the more verbose ES5 versions. That's why I was asking if you could test 3.10.0: originally I was exporting a transpiled package compatible with Node 6 and up, but it broke an integration with a particular library that couldn't handle class syntax, so I went down to Node 4 with no class constructs.

If we can figure out exactly which ES6-to-ES5 transform is not performing well, we could theoretically tune those Babel settings more granularly for a more performant output.

@matthew-dean (Member)

@PatSmuk360 Incidentally, what's that split function that's getting so much time?

@kevinramharak

@matthew-dean That seems to be String.prototype.split. If you open the profile in your Chrome DevTools you can see all the data color-coded. I'm trying to alter the profile to link to https://cdn.jsdelivr.net/npm/less@3.10.3/dist/less.cjs.js as its source so it is easier to inspect the bottlenecks. Is there a source map available for the *.cjs.js to *.js file? The https://cdn.jsdelivr.net/npm/less@3.10.3/dist/less.min.js.map file seems to map the .min file to the ES6 source. Maybe we can feed the source map to the DevTools so that we can figure out where transpilation causes bottlenecks.

@kevinramharak

This line in particular:

```javascript
sourceLines = inputSource.split('\n');
```

seems to have a high amount of CPU time. But considering the operation it performs, that does not seem out of place to me.

@kevinramharak

kevinramharak commented Sep 17, 2019

I don't have much experience with heap profiles, but the thing that stands out to me is the increase in the amount of (closures), (system), (array), and system / Context objects. Trying to tie that to the CPU profile, it seems that the increase in these objects results in a massive increase in time spent garbage collecting.

@matthew-dean (Member)

@kevinramharak In general, converting an AST like Less's, with n depth, into a serialized, flattened output tree like CSS involves lots of temporary object creation. So what could be happening is that certain transforms add some additional number of objects. Even temporarily, you can get a compounding effect: each node creating even just 2-3x the objects, multiplied by the number of nodes, multiplied by the number of times it has to flatten rules... I could see it adding up.

We were probably naive to think of ES6 syntax as essentially syntactic sugar for ES5 syntax. (JavaScript developers are probably guilty of this in general.) In reality, transpiling newer syntax can create ES5 patterns that are not very performant. For 99% of devs this isn't a big deal, because they're not iterating through that code hundreds or thousands of times per second. But this is my guess as to what's happening, because there were no other major changes.
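A back-of-the-envelope sketch of that multiplication effect (all numbers and names here are hypothetical, purely to illustrate the arithmetic):

```javascript
// Count total allocations for a tree of the given depth and breadth,
// where each non-leaf node allocates `extraPerNode` temporaries
// on top of itself during evaluation.
function evalTree(depth, breadth, extraPerNode) {
  if (depth === 0) return 1; // a leaf allocates just itself
  var allocations = 1 + extraPerNode; // the node plus its temporaries
  for (var i = 0; i < breadth; i++) {
    allocations += evalTree(depth - 1, breadth, extraPerNode);
  }
  return allocations;
}

// A modest tree: just 2 extra temporaries per node yields ~1.7x
// more total allocations (1821 vs 1093), i.e. ~1.7x the GC pressure.
var baseline = evalTree(6, 3, 0);
var inflated = evalTree(6, 3, 2);
```

The point is only that per-node overhead is multiplied by the node count and by the number of evaluation passes, so small per-node differences compound.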

@matthew-dean (Member)

@kevinramharak Re: the source lines - this is because the original Less parser doesn't track lines / columns of input, so when source mapping was added, it needed to chunk the input into lines to figure out how it should map to the original source. This won't be an issue in 4.x+, but it makes sense that it would spend a lot of time there now.
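As a hedged sketch (not the actual Less implementation), a parser that tracks positions could derive line/column from a character index on demand, instead of pre-splitting the whole source with `input.split('\n')` as seen in the profile:

```javascript
// Derive a 0-based { line, column } from a character index by
// scanning for newlines up to that index, with no line array kept.
function getLocation(input, index) {
  var line = 0;
  var lastBreak = -1; // index of the most recent '\n' before `index`
  for (var i = 0; i < index; i++) {
    if (input.charCodeAt(i) === 10 /* '\n' */) {
      line++;
      lastBreak = i;
    }
  }
  return { line: line, column: index - lastBreak - 1 };
}
```

This trades a per-lookup scan for not holding every line of every input file in memory; whether that wins depends on how often locations are actually requested.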

@alecpl

alecpl commented Oct 7, 2019

I'm using less.min.js in a browser, and 3.10.3 is two times slower than the 2.7.3 I was using before. Both in Chrome and Firefox.

@matthew-dean (Member)

@PatSmuk360 Can you check out this branch? https://github.com/matthew-dean/less.js/tree/3.11.0

In short, Babel's transpilation to ES5 is kinda awful and uses tons of Object.defineProperty calls to transpile classes. I switched transpilation to TypeScript, which has much saner output based on function prototypes.
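For illustration only (simplified, not actual transpiler output), the two broad styles of ES5 class emission being compared look roughly like this:

```javascript
// Babel-style: methods attached through Object.defineProperty,
// with a full property descriptor per method.
function BabelNode() {}
Object.defineProperty(BabelNode.prototype, 'eval', {
  value: function () { return 'babel'; },
  writable: true,
  configurable: true,
  enumerable: false
});

// TypeScript-style: plain assignment onto the prototype.
function TsNode() {}
TsNode.prototype.eval = function () { return 'ts'; };
```

Both produce callable methods, but the defineProperty form does more work per method and yields non-enumerable properties, which is one plausible source of the overhead described above.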

This is all fine and dandy, but after doing this, Less's browser tests won't run because it uses the very old and outdated PhantomJS, and thus far, no one (including me) has been able to successfully migrate the tests off of PhantomJS onto Headless Chrome.

But, one problem at a time: if the dist source on that branch doesn't have the memory issues, then maybe we could tackle the mess that is browser testing.

@matthew-dean (Member)

I've managed to migrate Less browser tests to Headless Chrome, but in terms of performance / stability, I need a lot of feedback from users before it's safe to merge, because of changing the transpilation pipeline entirely from Babel to TypeScript.

Current branch(es) can be found here: #3442

@alecpl

alecpl commented Oct 16, 2019

Still 2x slower than 2.7.

@matthew-dean (Member)

@alecpl That's good information, but really I want to know if 3.11.0 is an improvement over 3.10.

@matthew-dean (Member)

The odd thing about this is that, in theory, the original Less code was converted with Lebab, which should be the opposite of Babel. That means ES5 -> ES6 -> ES5 should produce nearly identical code, but that's obviously not the case. So I'll need to investigate (unless anyone else has the time, in which case help is welcome) just how the ES6 -> ES5 code differs from the original ES5 code.

@matthew-dean (Member)

@alecpl

So, I've run a variety of tests and spent time benchmarking 3.11.0 vs. 3.10.3 vs. 3.9.0 vs. 2.7.3 in Headless Chrome.

I've found no evidence that the Less compiler is slower for any version, let alone 2x slower. It could still be true that 3.10.0 has more memory overhead because of the transpilation settings, and maybe if a system was constrained for space and doing more memory swaps or more GC as a result, it might result in a slowdown? But I can't verify it.

You can test yourself by running grunt benchmark on the 3.11.0 branch. Individual runs may report different numbers, but if you run it enough times, you should see the times are roughly equal. So I don't know where you are getting your data.

@PatSmuk360 Have you been able to test the memory overhead of 3.11.0?

@alecpl

alecpl commented Oct 19, 2019

I don't build your code myself. I don't use Node.js. I just take the less.min.js file from the dist folder for the two different versions I mentioned and use them on my page. Then in the console I see the timings printed by Less code. My Less code uses multiple files and the output is about 100kB of minified CSS. That's a "real life" benchmark. I'm talking about Roundcube code.

Chrome is much faster than Firefox, but in both the difference between versions is similar.

@matthew-dean (Member)

@alecpl Hmm....maybe it's a particular difference for that particular Less code. It's true that the benchmark is arbitrary. If I have time, I'll add those Less files to benchmarking.

@stale

stale bot commented Feb 16, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Feb 16, 2020
@PatSmuk360 (Author)

Just wanted to post a quick update: I tried 3.11.1 and it was just as slow as 3.10.3. I'll see if I can cook up a representative benchmark for testing/profiling this.

@stale stale bot removed the stale label Feb 26, 2020
@rjgotten (Contributor)

rjgotten commented May 4, 2020

I've upgraded to 3.11.1 and have the same memory consumption problems now.
A modestly sized project is taking ~600MB of RAM to build through webpack and less-loader.

I've taken a heap allocation timeline, resulting in this:

[image: heap allocation timeline]

Something is causing crazy huge allocations kept alive by Ruleset.


[EDIT]
I'm downgrading to 3.9 and grabbing a heap alloc timeline from that. Going to see if I spot anything radically different.

@rjgotten (Contributor)

rjgotten commented May 4, 2020

@matthew-dean
Found a hint for you.

In 3.11 one of the listed retainers for RuleSet is the ImportManager.
In 3.9 this is not the case.

If everything is kept alive by ImportManager, and ImportManager is a singleton across the entire compilation process... well, yes: that would significantly inflate memory use, because nothing can be garbage collected. Not even intermediary rulesets.

@matthew-dean (Member)

@rjgotten Hmm...... if an object has a reference to another object, why would that prevent GC? I mean, technically, all objects retain references to the public API nodes via the prototype chain. It would only prevent GC if the reverse is true i.e. some object retaining references to every ruleset instance.

@matthew-dean (Member)

matthew-dean commented Dec 6, 2020

@jrnail23 @Justineo @rjgotten

Note that this is a 4.0 build, so you might encounter errors if your code is affected by these breaking changes:

  • Parentheses are required for mixin calls (e.g. .mixin; is not allowed)
  • Less's default math mode is now parens-division, so slashes (intended as math) need to be in parentheses

@matthew-dean (Member)

matthew-dean commented Dec 6, 2020

So, while I'm now much more optimistic I've fixed the issue, based on the latest benchmarks, I have yet to figure out why this issue has been happening. If performance holds up for everyone else, I can give a run-down of how I finally narrowed down the issue and what I found. In other words, I think I found where the issue is/was, but not necessarily why.

@Justineo (Contributor)

Justineo commented Dec 6, 2020

The result of the same test suite as #3434 (comment):

| Version | Time |
| --- | --- |
| v3.9.0 | ~1.6s |
| v3.10.0~v3.11.1 | ~3.6s |
| v3.11.2+ | ~12s |
| 4.0.1-alpha.2+b2049010 | ~1.6s |

I can confirm that for my specific test suite, performance has improved to be as fast as v3.9.0. Much appreciated, Matt! I'm not sure about the breaking change on math mode, though. Changing this may break many of our applications, so we may be stuck on v3.9.0 even though the performance issue is solved.

@matthew-dean (Member)

@Justineo

> I'm not sure about the breaking change on math mode, though.

You can explicitly compile in math=always mode to get previous math behavior. It's just a different default.

@matthew-dean (Member)

matthew-dean commented Dec 6, 2020

### Breakdown of the issue

**TL;DR:** Beware of the class pattern

The problem was in the transpilation of classes in both Babel and TypeScript. (In other words, both had the same performance issues with the transpiled code, with Babel being slightly worse.) Now, years ago, when the class pattern was introduced, I was told--and until this issue, believed--that class inheritance in JavaScript is syntactic sugar for functional inheritance.

In short, it isn't. (Edit: well.... it is and it isn't. It depends on what you mean by "inheritance" and how you've defined it as you'll see shortly.... i.e. there are several patterns to create "inheritance" in the prototype chain in JavaScript, and classes represent just one pattern, but that pattern is somewhat different from all the others, hence TS's / Babel's need for helper code to "mimic" this pattern using specialized functions.)

Less code looked like this:

```javascript
var Node = function() {
  this.foo = 'bar';
}

var Inherited = function() {
  this.value = 1;
}
Inherited.prototype = new Node();

var myNode = new Inherited();
```

Now say you wanted to re-write this using "modern" JS. In fact, you can automate this process, which I did, but either way I would have written it the same way:

```javascript
class Node {
  constructor() {
    this.foo = 'bar';
  }
}
class Inherited extends Node {
  constructor() {
    super();
    this.value = 1;
  }
}
var myNode = new Inherited();
```

Same thing, right? Actually, no. The first one will create an object with a property of { value: 1 }, and on its prototype chain, it has an object of { foo: 'bar' }.

The second one, because it will call the constructor on both Inherited and Node, will create an object with a structure like { value: 1, foo: 'bar' }.

Now, for the user, this really doesn't matter, because you can access both value and foo from myNode in either case. Functionally, they appear to behave the same.
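The structural difference is easy to check directly. This is a minimal sketch (names are variants of the example above, to avoid shadowing) contrasting the two patterns:

```javascript
// ES5 pattern: `foo` lives on the shared prototype object.
var NodeFn = function () { this.foo = 'bar'; };
var InheritedFn = function () { this.value = 1; };
InheritedFn.prototype = new NodeFn();
var es5Node = new InheritedFn();

// Class pattern: super() runs Node's constructor on `this`,
// so `foo` becomes an own property of every instance.
class NodeCls { constructor() { this.foo = 'bar'; } }
class InheritedCls extends NodeCls {
  constructor() { super(); this.value = 1; }
}
var classNode = new InheritedCls();

console.log(es5Node.hasOwnProperty('foo'));   // false: inherited via prototype
console.log(classNode.hasOwnProperty('foo')); // true: copied onto the instance
```

Both expose `foo` and `value` to the caller, but the instances have different own-property layouts, which is exactly the distinction the shape argument below turns on.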

### Here's where I enter pure speculation

From what I remember in articles about JIT engines like V8, object structures are actually very important. If I create a bunch of structures like { value: 1 }, { value: 2 }, { value: 3 }, { value: 4 }, V8 creates an internal static representation of that structure. You're essentially storing the structure+data once, and then the data 3 more times.

BUT if I add different properties to this each time, like: { a: 'a', value: 1 }, { b: 'b', value: 2 }, { c: 'c', value: 3 }, { d: 'd', value: 4 }, then those are 4 different structures with 4 datasets, even if they were created from the same set of original classes. Every mutation of a JS object de-optimizes data lookups, and a class pattern that's transpiled into functions causes (maybe) more-unique mutations. (I honestly have no idea if this is true for native class support in browsers.)
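As an illustrative sketch of that shape argument (hidden classes can't be observed from plain JavaScript, so this only shows the two allocation patterns, not V8 internals; all names are hypothetical):

```javascript
// Objects built with the same properties in the same order can share
// one hidden class ("shape") in V8.
function makeUniform(v) {
  return { value: v }; // every result has shape { value }
}

// Objects whose leading property differs each time get distinct shapes,
// so a call site reading `.value` sees many shapes and de-optimizes.
function makeVaried(key, v) {
  var o = {};
  o[key] = key;   // a different first property per object
  o.value = v;
  return o;
}

var uniform = ['a', 'b', 'c', 'd'].map(function (_, i) { return makeUniform(i + 1); });
var varied = ['a', 'b', 'c', 'd'].map(function (k, i) { return makeVaried(k, i + 1); });
```

The data is the same in both arrays; only the property layouts differ, which is the kind of difference an engine's inline caches care about.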

AFAIK, what's also true is that the more properties you have on an object, the longer an individual property lookup takes.

Again for end users, this rarely matters because JS engines are so fast. Buuuuut say you're maintaining an engine that creates and looks up properties / methods on LOTS of objects as quickly as possible. (Ding ding ding.) Suddenly these small differences between the way TypeScript / Babel "extend" objects and native functional prototypal inheritance adds up very quickly.

I wrote a bare-bones implementation of a Node and an inheriting function, using the old Less syntax and the class pattern transpiled with TS. Right out of the gate, the inherited node consumes 25% more memory / resources, and that's BEFORE any instances of the inherited node have been created.

Now consider that some nodes inherit from other nodes. That means, the in-memory representation of the classes starts to multiply, as does their inherited instances.

### Again, this is speculation

I have to emphasize taking all this with a grain of salt, because I've never heard of a conversion to classes being this disastrous for performance. What I think is happening is that there's some special combination of object / instance lookups that Less is using that is seriously de-optimizing the JIT. (If some JavaScript engine expert knows why transpiled classes perform so much worse than native JS inheritance methods in this case, I would love to know.) I tried to create a performance measure of creating thousands of inherited objects using both methods, and I could never see a consistent difference in performance.

So, before you walk away with "classes in JavaScript are bad", it could be some other really unfortunate pattern in the Less codebase coupled with the difference in transpiled classes vs. native JS that created this drop in performance.

### How I finally figured it out

Honestly, I never ever suspected the class pattern. I knew that the original ES5 code ran faster than the reverse-Babelified code, but I suspected something around arrow functions or spread syntax somewhere. I still had the up-to-date ES5 branch, so one day I decided to run lebab again, and only use these transforms: let,class,commonjs,template. That's when I discovered it again had performance problems. I knew it wasn't string templating or let-to-var; I thought maybe requires-to-imports did something so I played with that for a while. That left classes. So on a hunch, I rewrote all the extended classes back to functional inheritance. Bam, performance was back.

### A cautionary tale

So! Lesson learned. If your project is on old ES5 code, and you're craving that "modern" Babel-ified or TypeScripted goodness, remember that whatever you transpile, you didn't write.

I was still able to re-write the classes into something more "class-like", with a slightly more maintainable pattern, but more or less retaining Less's original functional inheritance pattern for nodes, and I'll be leaving a strong note in the project for a future maintainer to never do this again.

@Justineo (Contributor)

Justineo commented Dec 7, 2020

> You can explicitly compile in math=always mode to get previous math behavior. It's just a different default.

I’m aware of this. We have many Less codebases across many different teams, so breaking changes, even in default compilation options, would increase communication costs. I’m not sure the benefits are worth the breakage.

@Justineo (Contributor)

Justineo commented Dec 7, 2020

Thank you for the detailed breakdown!

I saw the usage of Object.assign in the compiled output, which means it only works in browsers that support native ES class syntax, unless polyfills are now required. So can we just use the native syntax without transpiling down to ES5, if we intend to drop support for older environments (e.g. IE11, Node 4, ...)?

At the same time, I think it would be better to separate the performance fix from the breaking changes: land the performance fix in v3 and include the breaking changes only in v4.

@rjgotten (Contributor)

rjgotten commented Dec 7, 2020

@matthew-dean
The fact that ES classes are the culprit is crazy scary.
Thanks for the elaborate breakdown.

Goes to show: composition over inheritance 😛

Though FYI: if you want a cleaner prototypal inheritance chain, you really should do it a bit differently than your example. If you use Object.create like this:

```javascript
var Node = function() {
  this.foo = 'bar';
}
Node.prototype = Object.create(Object.prototype);
Node.prototype.constructor = Node;

var Inherited = function() {
  Node.prototype.constructor.call( this );
  this.value = 1;
}
Inherited.prototype = Object.create( Node.prototype );
Inherited.prototype.constructor = Inherited;

var myNode = new Inherited();
```

then you flatten the properties onto the instances themselves which then share the same shape; the prototypes share shape; and you avoid having to crawl up the prototype chains for each property access. 😉
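A quick sketch (with hypothetical names) confirming what that pattern actually produces: properties flattened onto the instance, so instances share one shape, plus an intact prototype chain.

```javascript
var Base = function () { this.foo = 'bar'; };
Base.prototype.constructor = Base;

var Derived = function () {
  Base.prototype.constructor.call(this); // run Base's initializer on `this`
  this.value = 1;
};
Derived.prototype = Object.create(Base.prototype);
Derived.prototype.constructor = Derived;

var node = new Derived();
// `foo` and `value` are both own properties of the instance,
// and `node instanceof Base` still holds via the prototype chain.
```

Note this flattening of own properties is also what transpiled class constructors do, which is the point matthew-dean raises in the reply below.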

@matthew-dean (Member)

matthew-dean commented Dec 8, 2020

@Justineo

> I saw the usage of Object.assign in the compiled output, which means it only works in browsers that support native ES class syntax unless polyfills are now required. So can we just use the native syntax without transpiling down to ES5 if we intend to drop support for older environments (eg. IE11, Node 4, ...)?

I don't think that's quite right, i.e. Object.assign landed in browsers shortly before the class implementation did, but your point is taken. I was just hoping to avoid writing [Something].prototype.property over and over :/

Technically, everything's still transpiling, so I dunno. The original goal was a more maintainable / readable code-base. If you need an Object.assign polyfill in some environment, so be it. It would be on a version of something that Less doesn't support.

> At the same time, I think it's better if we can separate the fix of performance degradation and the breaking changes, which means to land the performance fix in v3 and include the breaking changes only in v4.

I've been thinking of that, and I guess that's also a fair point. I'm just trying to make my head not explode building / maintaining 3 major versions of Less.

@rjgotten AFAIK that's what a class pattern is supposed to do, exactly as you've defined it. Unless I'm not seeing a critical difference. So 🤷‍♂️. I'm not going to try to alter Less's node inheritance pattern again if it's performing well (other than possibly removing the Object.assign call, as @Justineo suggested).

@matthew-dean (Member)

@Justineo @rjgotten Can you try this: less@3.13.0-alpha.10+b1390a54

@Justineo (Contributor)

Justineo commented Dec 8, 2020

Just tried it:

| Version | Time |
| --- | --- |
| v3.9.0 | ~1.6s |
| v3.10.0~v3.11.1 | ~3.6s |
| v3.11.2+ | ~12s |
| 4.0.1-alpha.2+b2049010 | ~1.6s |
| 3.13.0-alpha.10+b1390a54 | ~4.7s |

P.S. Tested with Node.js v12.13.1.

@matthew-dean (Member)

I'm.... what.

@rjgotten (Contributor)

rjgotten commented Dec 8, 2020

Didn't have the time to set up separate benchmarks. But from an integration-test perspective, here are some real-world numbers: the cumulative results of a production-grade Webpack project that uses Less.

| Version | Time | Peak Memory |
| --- | --- | --- |
| 3.9 | 35376ms | 950MB |
| 3.11.3 | 37878ms | 920MB |
| 3.13.0-alpha.10+b1390a54 | 34801ms | 740MB |
| 3.13.1-alpha.1+84d40222 | 37367ms | 990MB |
| 4.0.1-alpha.2+b2049010 | 35857ms | 770MB |

For me, 3.13.0 gives the best results... 🙈

3.11.3 also seems not to have the runaway memory use that I saw before with 3.11.1.
But the 4.0.1 and 3.13.0 betas are better still.

The 3.13.1 build, which restored the cache, doesn't improve my real-world compile times and just inflates memory use.

@matthew-dean (Member)

@rjgotten Yeah, I think the problem is that, in this thread, people are measuring different things, and everyone thus far essentially has private data which is impossible to replicate (without submitting to Less benchmarking, or translating that into a PR). Less does have a benchmark test, which I could use against different clones of the repo, to verify for myself differences in parse / eval time, but where caching doesn't (currently) apply.

Here is a published build with the inheritance changes on the tree, and with the parse tree cache restored: less@3.13.0-alpha.12+e8d05c61

@Justineo (Contributor)

Justineo commented Dec 8, 2020

### Test case

https://github.com/ecomfe/dls-tooling/tree/master/packages/less-plugin-dls

### Results

| Version | Time |
| --- | --- |
| v3.9.0 | ~1.6s |
| v3.10.0~v3.11.1 | ~3.6s |
| v3.11.2+ | ~12s |
| 4.0.1-alpha.2 | ~1.6s |
| 3.13.0-alpha.10 | ~4.7s |
| 3.13.0-alpha.12 | ~1.6s |

### Node.js version

v12.13.1


This version seems to work perfectly for me. I'll ask my colleagues to try it out and verify.

@rjgotten (Contributor)

rjgotten commented Dec 8, 2020

Works slightly worse than alpha.10 for me, but it could also just be variance:

| Version | Time | Peak Memory |
| --- | --- | --- |
| 3.9 | 35376ms | 950MB |
| 3.11.3 | 37878ms | 920MB |
| 3.13.0-alpha.10+b1390a54 | 34801ms | 740MB |
| 3.13.0-alpha.12+e8d05c61 | 36263ms | 760MB |
| 3.13.1-alpha.1+84d40222 | 37367ms | 990MB |
| 4.0.1-alpha.2+b2049010 | 35857ms | 770MB |

@jrnail23

jrnail23 commented Dec 8, 2020

@matthew-dean, it looks like my code is not yet compatible with the 4.0.1-alpha.2+b2049010 changes. I'll see if I can resolve my issues and give it a try.

@matthew-dean (Member)

@rjgotten Yeah, the difference seems minor for your test case.

I think the outcome of this is that I might prepare both a 3.x and a 4.0 release. I wasn't itching to push 4.0 with this issue unresolved. Thankfully, it looks like it has been resolved.

The other thing I'll do is open an issue for anyone to grab: adding a CI step that fails the build if Less falls below a certain benchmark threshold.

@matthew-dean (Member)

@jrnail23 It would be great if you can modify your build to run the 4.0 alpha, but in the meantime can you try less@3.13.0-alpha.12+e8d05c61?

@jrnail23

jrnail23 commented Dec 8, 2020

@matthew-dean, I'm getting the following for my build:

  • v3.9: 98 seconds
  • v3.10.3: 161 seconds
  • v3.13.0-alpha.12: 93-96 seconds

So it looks like you're back down to where you want to be! Nice work!

@rjgotten (Contributor)

rjgotten commented Dec 9, 2020

Nice work!

Yeah, I'm going to second; third; and fourth that.
This was a nasty, nasty performance regression that really took a lot of effort to clean out.

Job very well done.

@matthew-dean (Member)

matthew-dean commented Dec 9, 2020

Okay team! I'm going to publish the 3.x build, and sometime later, maybe next week, publish 4.0. Both should now have the performance fixes. Thanks to everyone who helped debug!

@matthew-dean (Member)

By the way, I'm currently in the process of converting the Less.js codebase to TypeScript, and plan to use the opportunity to do some performance tuning / refactoring. I'm open to help if anyone is interested! matthew-dean/less.js@master...matthew-dean:next

@kevinramharak

kevinramharak commented Jan 7, 2021

Anything specific where you would prefer some extra eyes? The PR is very large, so I don't really know where to start.

@matthew-dean (Member)

@kevinramharak That's a fair question. To be honest, I ran into unexpected roadblocks with converting to TypeScript (errors that became hard to resolve), and am now re-thinking doing it at all. Less is running fine as it is, and a lot of the reasons I wanted to convert (to make refactoring / adding new language features easier) are now irrelevant as I've decided to move my ideas for new language features to a new pre-processing language. I don't want to over-self-promote, so DM me on Twitter or Gitter if you want details.
