
Massive CPU usage #266

Closed
0atman opened this issue Sep 16, 2016 · 29 comments

0atman commented Sep 16, 2016

I tried the latest version of alda today (alda update), and found that running alda up maxed out my machine's CPU and basically locked it up, with load averages over 80(!).

Here are the offending threads:

[screenshot: 2016-09-16-132039_2020x123_scrot]

What's happened in the last 10 days? My machine is very powerful, and more importantly, it didn't have this problem before.

My version:

$ alda repl
Sep 16, 2016 1:25:00 PM com.jsyn.engine.SynthesisEngine start
INFO: Pure Java JSyn from www.softsynth.com, rate = 44100, RT, V16.7.3 (build 457, 2014-12-25)
Preparing MIDI system... done.

 █████╗ ██╗     ██████╗  █████╗ 
██╔══██╗██║     ██╔══██╗██╔══██╗
███████║██║     ██║  ██║███████║
██╔══██║██║     ██║  ██║██╔══██║
██║  ██║███████╗██████╔╝██║  ██║
╚═╝  ╚═╝╚══════╝╚═════╝ ╚═╝  ╚═╝

            v1.0.0-rc38
         repl session

My machine:

$ uname -a
Linux T420 4.7.2-1-ARCH #1 SMP PREEMPT Sat Aug 20 23:02:56 CEST 2016 x86_64 GNU/Linux

My jvm:

$ java -version
java version "1.8.0_102"
Java(TM) SE Runtime Environment (build 1.8.0_102-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode)
@daveyarwood
Member

You're not alone... I've started to notice similar issues on my machine, which is also fairly powerful (it's a 2014 MacBook Pro).

The problem seems to be something about the worker processes as they're starting up. With my system monitor open, I see each worker process spike to 100-200% CPU for about 10-15 seconds, then drop to 2-3% and stay there.

I'll do some experiments with commenting out blocks of code and see if I can figure out the cause of the high CPU usage.

@0atman
Member Author

0atman commented Sep 16, 2016

Thanks Dave, maybe they are doing compilation? I don't know the details.

Maybe you could tweak the niceness of the processes while they're starting up, so they are more polite? No idea how to do that in Java, I'm afraid; maybe some JVM options?

@daveyarwood
Member

The code for forking the worker processes is here.

There is no compilation involved -- Alda basically runs from a jar file (it's wrapped in an executable which essentially just runs java -jar followed by the contents of the jar file). The jar file contains all of the compiled code for the client, server, and workers. So to start a worker process, the server runs a util function to determine the path of the program you're running (e.g. /usr/local/bin/alda), then starts a new process of the same program, but with different command-line args so that it starts up a worker.
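
As a rough illustration (not the actual alda.util code -- the names here are made up), spawning a worker that way might look something like this in Clojure:

(ns example.spawn)

;; Hypothetical sketch of the approach described above: re-invoke the same
;; executable with worker-specific args. Not the actual Alda implementation.
(defn spawn-worker!
  [program-path port]
  ;; e.g. program-path = "/usr/local/bin/alda"
  (-> (ProcessBuilder. [program-path "-p" (str port) "worker"])
      (.redirectErrorStream true)
      (.start)))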

I've been googling around and not really finding any definitive solutions for limiting CPU usage in a JVM process (apparently there isn't a convenient JVM option for that)... tweaking niceness might be something we can do.

This looks promising, though: http://stackoverflow.com/a/2877813/2338327

Apparently spawning a child process in Java causes high memory usage in general, and a workaround is to spawn one "exec server" process early on in the application, then have that secondary process be the one to spawn your external processes. Because the second process is much smaller, the external processes end up consuming less memory. It's something to do with child processes using the same amount of memory as the parent process, or something to that effect. I'll play around with this.
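
For reference, here's the idea from that answer sketched in Clojure (a hypothetical helper, not Alda code): the big JVM forks one tiny helper process early on, and from then on asks the helper to spawn external processes on its behalf.

(ns example.exec-server
  (:require [clojure.string :as str]))

;; Sketch of the tiny "exec server" process: it reads one command per line on
;; stdin and spawns it, so the large parent JVM never has to fork directly.
(defn -main [& _args]
  (doseq [line (line-seq (java.io.BufferedReader. *in*))]
    (.start (ProcessBuilder. (str/split line #"\s+")))))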

@0atman
Member Author

0atman commented Sep 16, 2016

Got it, but be careful of memory being a red herring. See my screenshot; I in no way ran out of memory:

[screenshot: 2016-09-16-132039_2020x123_scrot]

My load was 80 - EIGHTY! - before I managed to kill the server: no swapping or out-of-memory errors, and not even any I/O!

@0atman
Member Author

0atman commented Sep 16, 2016

As we are friends, I know you would tell me if you were mining bitcoin on my computer, so it can't be that...

@daveyarwood
Member

Hmm... perhaps memory is not the issue after all. I'll try the change I mentioned anyway and see if it helps at all -- if not, then back to the drawing board.

@daveyarwood
Member

I tried the ProcessHelper (intermediate process-spawning process) approach -- here's my code on a branch. I'm having trouble getting it to work, though. The child processes are not being spawned.

I don't think the issue is memory, anyway -- it's unconstrained CPU use -- so I think I'll abandon that approach.

@daveyarwood
Member

I'm not sure if we can tweak niceness in a cross-platform way, but we could at least shell out and use Unix commands to set the niceness of each worker process pid immediately after spawning it... it's better than nothing.
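
Something along these lines, perhaps (a sketch only -- renice exists on Unix-like systems, and the worker pid is assumed to already be known):

(ns example.renice
  (:require [clojure.java.shell :as sh]))

;; Sketch: lower the priority of a freshly spawned worker by shelling out to
;; the Unix `renice` command. Assumes we already know the worker's pid.
(defn renice-worker!
  [worker-pid]
  (sh/sh "renice" "-n" "10" "-p" (str worker-pid)))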

@0atman
Member Author

0atman commented Sep 16, 2016

It would be perfect to profile the code, see what is spinning, and optimise it - solve it at the source. But first we'd need to replicate it.


@daveyarwood
Member

I did take a stab at profiling using jvisualvm, but I wasn't really able to find a single source of high CPU usage... although the problem there could definitely be lack of experience with profiling on my part.

I am reproducing this issue every time I run alda up. I can see the worker processes start up in Activity Monitor, and CPU spikes to around 200% for about 10 seconds before dropping back down to a reasonable level (2-3%) and staying there.

[screenshot: Activity Monitor, 2016-09-16 11:29:08 AM]

[screenshot: Activity Monitor, 2016-09-16 11:29:19 AM]

I think I might be getting somewhere by commenting out code as a hacky attempt at profiling... will keep digging!

@daveyarwood
Member

daveyarwood commented Sep 16, 2016

Simply requiring our logging library appears to cause CPU to spike >200% in the worker processes briefly when they start up... not sure what that's all about. Maybe we could try a build where workers do no logging at all?

Currently the only time the user ever sees worker logs is if they log an error to ~/.alda/logs/error.log, but we could easily get the same functionality without requiring the whole log library.
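
Something as simple as this sketch could cover that case without the logging library (hypothetical names; Alda's actual error log format may differ):

(ns example.error-log
  (:require [clojure.java.io :as io]))

;; Sketch: append errors to ~/.alda/logs/error.log directly, no log library.
(def error-log
  (io/file (System/getProperty "user.home") ".alda" "logs" "error.log"))

(defn log-error!
  [^Throwable e]
  (io/make-parents error-log)
  (spit error-log
        (format "%s ERROR %s%n" (java.util.Date.) (.getMessage e))
        :append true))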

@refi64

refi64 commented Sep 16, 2016

@daveyarwood What if you compile the library AOT?

@daveyarwood
Member

AOT compiling does not appear to help.

@elydpg
Contributor

elydpg commented Sep 17, 2016

Hmm......

I wonder if this is why my computer is freezing beyond use? Of course, this is from leaving the server running by default, so I don't know if it's the cause of the issue.

@daveyarwood
Member

daveyarwood commented Sep 17, 2016

@elyisgreat Yikes, sorry about that! I noticed on my computer that the problem gets worse when the computer goes into sleep mode and then comes back*, but I think that high CPU usage is the root cause.

*What happens is: the existing worker processes hang around briefly while they're shutting down (see #160), and meanwhile the server is starting N more workers (4 by default, or a different number if you started the server with the --workers N option), so you have a total of N x 2 worker processes active at once. Because of the CPU usage issue, this can really slow down even a high-powered system.

I've been thinking about how to handle this in terms of Alda releases... It seems like fixing the immediate CPU usage issue is not the easiest thing in the world. One thing I can do, though, is push out a release ASAP that disables the "immediately start more workers after coming out of sleep" behavior and makes it opt-in via an option when you run alda up. Once the CPU usage issue is fixed, I can make it the default again. This will at least make it safe to have your Alda server always running in the background.

@daveyarwood
Member

1.0.0-rc39 makes the worker-cycling behavior introduced in 1.0.0-rc38 an opt-in feature. To opt in, include --cycle-workers when starting a server, i.e. alda --cycle-workers up. See the release notes for more details.

Once we are able to fix this CPU issue, we can remove the --cycle-workers option and go back to it happening by default.

@daveyarwood
Member

So far, I've been able to determine that requiring timbre and requiring instaparse both cause the CPU to spike. The way I'm testing this is by commenting out code so that only certain code paths run; each time I test, I build alda to /tmp and run /tmp/alda -p 12345 worker. I also commented out nearly all the code in the alda.worker namespace, so that all it's doing is requiring alda.now, waiting 20 seconds, and then exiting.

I'm not sure yet what we can do about this. We definitely need instaparse, at least. I wonder if there is some way we can get the Java process to take its time loading the dependencies, so it doesn't blow through all the CPU at once?

@daveyarwood
Member

I created a branch with this change to alda.worker where it just loads timbre and instaparse, sleeps for 20 seconds, then exits. With the requires, I'm seeing my CPU spike to 150-200%. Without them, CPU usage is < 5%.
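
For anyone who wants to reproduce this, the stripped-down worker looks approximately like the following (an approximation of the idea, not the exact branch code):

(ns example.worker-test
  (:require [taoensso.timbre :as log]
            [instaparse.core :as insta]))

;; The CPU spike (if any) happens while the two requires above are loading.
(defn -main [& _args]
  (Thread/sleep 20000)
  (System/exit 0))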

@daveyarwood added this to the 1.0.0 milestone Sep 17, 2016
@daveyarwood
Member

An interesting observation: when starting 1 worker, my CPU spikes to ~30-40%. When starting 4 workers, it spikes to 175-200%. Could there be something off about the way we start a worker process?

Code for starting a worker process is here -- basically we just call alda.worker/start-worker! from Java using Clojure's Java API.

@daveyarwood
Member

daveyarwood commented Sep 18, 2016

I had an idea for a workaround this morning -- regardless of what is causing the spike in CPU, we can work around it by not trying to start up all 4 workers at the same time. We really only need 1 worker to be available for the server to be considered "up," and the other 3 workers can take their time. I don't think it will be a major inconvenience if all 4 workers are not available right away.

So what we can do is stagger the spawning of the workers, i.e.:

  • spawn worker 1
  • wait 5 seconds; spawn worker 2
  • wait 5 seconds; spawn worker 3
  • wait 5 seconds; spawn worker 4

We can do this on a background thread so it doesn't hold up the server, and this won't feel like it takes any longer than it does currently to spawn the workers. (In fact, it might even be faster, because the first worker won't be competing with 3 others for CPU.)
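
A sketch of what that staggering could look like (assuming a spawn-worker! function like the one sketched earlier; the 5-second delay and worker count come from the list above, not from the actual Alda implementation):

;; Sketch: spawn n workers on a background thread, 5 seconds apart, so the
;; server call returns immediately and workers don't all compete for CPU.
(defn spawn-workers-staggered!
  [n spawn-worker!]
  (future
    (dotimes [i n]
      (when (pos? i)
        (Thread/sleep 5000))
      (spawn-worker! i))))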

@0atman
Member Author

0atman commented Sep 18, 2016

nice workaround!

@daveyarwood
Member

I also think the default number of workers should maybe be 2 instead of 4. For the most part, you shouldn't need to play more than 1 score at a time, and having 2 workers allows you to start playing the next score as the one before it is ending.

(Remember that you can always add more workers when starting the server by running e.g. alda --workers 4 up)

@daveyarwood
Member

daveyarwood commented Sep 18, 2016

I believe 1.0.0-rc41 fixes this issue. @0atman @kirbyfan64 @elyisgreat please test and let me know what you think!

To summarize the changes in this release:

  • There is now a 10 second delay between starting workers. The server is still considered "up" as soon as there is at least 1 worker available.
  • The default number of workers is now 2.
  • The server now checks on (and adjusts) the number of workers every 60 seconds, instead of every 30 seconds. (This is not exactly relevant to the high CPU usage, but worth mentioning)
  • Removed the --cycle-workers option -- cycling workers is now the default behavior again, like it was prior to 1.0.0-rc39.

@daveyarwood self-assigned this Sep 18, 2016
@0atman
Member Author

0atman commented Sep 19, 2016

Awesome, rc41 works great on my Linux machine.

However, it doesn't spawn workers on my Windows machine:

λ alda up
[27713] Starting Alda server...
[27713] Server up ?
[27713] Starting worker processes...
[ this hangs, I waited for 5 minutes ]
^C

λ alda status
[27713] Server up (0/2 workers available)

@daveyarwood
Member

That's strange... was it working before this release? I can't think of what might have changed between 1.0.0-rc38 and 41 that would affect workers spawning on Windows.

@0atman
Member Author

0atman commented Sep 19, 2016

Ah, I had upgraded from rc31, sorry I can't be more specific!

Interestingly, alda repl works fine.

@daveyarwood
Member

It makes sense that the REPL works -- the REPL doesn't use a server or worker processes at all; it's just a standalone program.

Can you try alda -p 12345 worker and see if any errors come up? 12345 is a made-up number, but we can at least see if the workers are crashing right when they start up.

@daveyarwood
Member

daveyarwood commented Sep 20, 2016

I'm going to close this -- I think the problem is solved by 1.0.0-rc41, and workers not spawning on Windows seems like a separate issue. @0atman I'll be happy to continue to troubleshoot that with you -- I want to make sure Alda works on Windows!

@elydpg
Contributor

elydpg commented Sep 20, 2016

Makes sense. If I'm still seeing freezing issues I will comment here.
