Massive CPU usage #266
Comments
You're not alone... I've started to notice similar issues on my machine, which is also fairly powerful (it's a 2014 MacBook Pro). The problem seems to be something about the worker processes as they're starting up. With my system monitor open, I see each worker process spike to 100-200% CPU for about 10-15 seconds, then drop to 2-3% and stay there. I'll do some experiments with commenting out blocks of code and see if I can figure out the cause of the high CPU usage.
Thanks Dave, maybe they are doing compilation? I don't know the details. Maybe you could tweak the niceness of the processes while they're starting up, so they are more polite? No idea how to do that in Java I'm afraid, maybe some JVM options?
The code for forking the worker processes is here. There is no compilation involved -- Alda basically runs from a jar file (it's wrapped in an executable which essentially just runs the jar).

I've been googling around and not really finding any definitive solutions for limiting CPU usage in a JVM process (apparently there isn't a convenient JVM option for that)... tweaking niceness might be something we can do.

This looks promising, though: http://stackoverflow.com/a/2877813/2338327 -- apparently spawning a child process in Java causes high memory usage in general, and a workaround is to spawn one "exec server" process early on in the application, then have that secondary process be the one to spawn your external processes. Because the second process is much smaller, the external processes end up consuming less memory. It has something to do with child processes using the same amount of memory as the parent process, or something to that effect. I'll play around with this.
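Here's roughly the shape of that idea, just to illustrate the pattern -- a hypothetical sketch, not Alda's actual code; the helper jar and namespace names are made up:

```clojure
;; Hypothetical sketch of the "exec server" idea above (not Alda's actual
;; code; the helper jar and namespace are made up). Fork one tiny helper
;; process early, while the parent JVM's footprint is still small, then ask
;; it (via its stdin) to spawn any further external processes for us.
(ns example.exec-server
  (:require [clojure.java.io :as io]))

(defonce helper-process
  ;; started once, ideally very early in application startup
  (delay (.start (ProcessBuilder. ["java" "-jar" "tiny-helper.jar"]))))

(defn spawn-via-helper!
  "Send a command line to the helper process, which execs it on our behalf."
  [command-line]
  (let [out (io/writer (.getOutputStream @helper-process))]
    (.write out (str command-line "\n"))
    (.flush out)))
```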
As we are friends, I know you would tell me if you were mining bitcoin on my computer, so it can't be that...
Hmm... perhaps memory is not the issue after all. I'll try the change I mentioned anyway and see if it helps at all -- if not, then back to the drawing board.
I tried the ProcessHelper (intermediate process-spawning process) approach -- here's my code on a branch. I'm having trouble getting it to work, though. The child processes are not being spawned. I don't think the issue is memory, anyway -- it's unconstrained CPU use -- so I think I'll abandon that approach.
I'm not sure if we can tweak niceness in a cross-platform way, but we could at least shell out and use Unix commands to set the niceness of each worker process (by pid) immediately after spawning it... it's better than nothing.
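A rough sketch of what that could look like (not actual Alda code; it assumes a Unix-like OS and that we already have the worker's pid, which on Java 8 takes a bit of reflection to obtain):

```clojure
;; Rough sketch (not Alda's actual code): lower a freshly spawned worker's
;; scheduling priority by shelling out to the Unix `renice` command.
;; Assumes we already know the worker's pid; on Windows this would simply
;; be skipped.
(require '[clojure.java.shell :as sh])

(defn renice-worker!
  "Set the niceness of the process with the given pid to 10 (lower priority).
  Returns the shell result map ({:exit ... :out ... :err ...})."
  [pid]
  (sh/sh "renice" "-n" "10" "-p" (str pid)))
```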
Would be perfect to profile the code, see what is spinning, and optimise it.
I did take a stab at profiling. I am reproducing this issue every time I start the server. I think I might be getting somewhere by commenting out code as a hacky attempt at profiling... will keep digging!
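For anyone else poking at this, one low-tech way to see which threads are busiest from inside a worker process is the standard JMX thread bean -- just a sketch, not what I actually ran:

```clojure
;; Sketch: list the busiest threads in the current JVM using ThreadMXBean.
;; Illustrative only; an external profiler or `jstack` gives a fuller picture.
(import '[java.lang.management ManagementFactory])

(defn hottest-threads
  "Return the n threads with the most accumulated CPU time, as
  [thread-name cpu-time-nanos] pairs."
  [n]
  (let [bean (ManagementFactory/getThreadMXBean)]
    (->> (.getAllThreadIds bean)
         (map (fn [id]
                (when-let [info (.getThreadInfo bean id)]
                  [(.getThreadName info) (.getThreadCpuTime bean id)])))
         (remove nil?)
         (sort-by second >)
         (take n))))
```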
Simply requiring our logging library appears to cause CPU usage to spike above 200% in the worker processes briefly when they start up... not sure what that's all about. Maybe we could try a build where workers do no logging at all? Currently the only time the user ever sees worker logs is when an error is logged.
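A quick way to eyeball the cost is just timing the initial require in a fresh JVM (a sketch; timbre's main namespace shown, since that's the library in question):

```clojure
;; Quick-and-dirty check: in a fresh JVM, time how long it takes to load the
;; logging library for the first time, while watching CPU in `top`.
(time (require '[taoensso.timbre :as log]))
;; => prints something like "Elapsed time: ... msecs"
```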
@daveyarwood What if you compile the library AOT?
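For reference, a minimal sketch of what that could look like in a Leiningen-style project.clj -- hypothetical only; Alda's actual build setup may differ, and the versions and main namespace here are placeholders:

```clojure
;; Hypothetical project.clj snippet (illustrative only; not Alda's actual
;; build config, and the versions/namespaces are placeholders).
;; `:aot :all` compiles the Clojure source to .class files ahead of time,
;; so workers load precompiled classes instead of compiling at startup.
(defproject alda "x.y.z"
  :dependencies [[org.clojure/clojure "x.y.z"]
                 [com.taoensso/timbre "x.y.z"]
                 [instaparse "x.y.z"]]
  :aot :all
  :main alda.main)
```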
AOT compiling does not appear to help.
Hmm... I wonder if this is why my computer is freezing beyond use? Of course, this is from leaving the server running by default, so I don't know if it's the cause of the issue.
@elyisgreat Yikes, sorry about that! I noticed on my computer that the problem gets worse when the computer goes into sleep mode and then comes back*, but I think that high CPU usage is the root cause.

*What happens is, the existing worker processes hang around briefly while they're shutting down (see #160), and meanwhile the server is starting N more workers (4 by default, or however many you specified when starting the server).

I've been thinking about how to handle this in terms of Alda releases... It seems like the immediate issue with CPU usage is not the easiest thing in the world to fix. One thing I can do, though, is push a release out ASAP that disables the "immediately start more workers after coming out of sleep" behavior and makes it opt-in via a command-line option when starting the server.
1.0.0-rc39 makes the worker-cycling behavior introduced in 1.0.0-rc38 an opt-in feature; to opt in, pass the new option when starting the server. Once we are able to fix this CPU issue, we can remove the opt-in flag.
So far, I've been able to determine that requiring timbre and requiring instaparse both cause the CPU to spike. The way I'm testing this is by commenting out code so that only certain code paths are exercised, rebuilding, and watching the CPU each time.

I'm not sure yet what we can do about this. We definitely need instaparse, at least. I wonder if there is some way we can get the Java process to take its time loading the dependencies, so it doesn't blow through all the CPU at once?
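One (untested) idea along those lines: require the heavy namespaces lazily on a background thread, with a pause between them, rather than all at once during startup. A sketch (the function name is made up):

```clojure
;; Untested sketch: load the expensive namespaces one at a time on a
;; background thread, pausing between them, instead of requiring everything
;; eagerly during worker startup.
(defn load-heavy-deps-gradually!
  []
  (future
    (doseq [lib '[taoensso.timbre instaparse.core]]
      (require lib)
      (Thread/sleep 500)))) ; brief pause so the loads don't compete for CPU
```

The catch, of course, is that anything needing instaparse or timbre would have to wait on that future, so this only helps if the startup spike (rather than total load time) is the real problem.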
I created a branch with this change.
An interesting observation: when starting 1 worker, my CPU spikes ~30-40%. When starting 4 workers, it spikes 175-200%. Could there be something off about the way we start a worker process? Code for starting a worker process is here.
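For reference, spawning a worker presumably boils down to launching another JVM from the same jar, roughly along these lines -- a hypothetical sketch, not the actual code; the jar path and arguments are made up:

```clojure
;; Hypothetical sketch of spawning a worker process (not Alda's actual code;
;; the jar path and arguments are made up). Each worker is a separate JVM,
;; so starting several at once multiplies the startup cost.
(defn spawn-worker!
  [port]
  (-> (ProcessBuilder. ["java" "-jar" "alda.jar" "worker" "--port" (str port)])
      (.inheritIO)
      (.start)))
```

If that's roughly right, then four workers starting at once means four JVMs all loading and compiling Clojure namespaces at the same time, which would line up with the 1-worker vs. 4-worker numbers above.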
I had an idea for a workaround this morning -- regardless of what is causing the spike in CPU, we can work around it by not trying to start up all 4 workers at the same time. We really only need 1 worker to be available for the server to be considered "up," and the other 3 workers can take their time. I don't think it will be a major inconvenience if all 4 workers are not available right away. So what we can do is stagger the spawning of the workers, i.e. start the first worker right away, and once it's up, bring up the remaining workers one at a time in the background (sketched below).
We can do this on a background thread so it doesn't hold up the server, and this won't feel like it takes any longer than it does currently to spawn the workers. (In fact, it might even be faster, because the first worker won't be competing with 3 others for CPU.)
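A rough sketch of that staggered startup (untested; `spawn-worker!` is the same hypothetical helper sketched a couple of comments up, and the delay is arbitrary):

```clojure
;; Untested sketch of staggered worker startup. Start the first worker right
;; away so the server can be considered "up" as soon as one worker is
;; available, then bring up the rest one at a time on a background thread.
(defn start-workers-staggered!
  [ports]
  (spawn-worker! (first ports))
  (future
    (doseq [port (rest ports)]
      (Thread/sleep 10000) ; arbitrary pause so workers don't compete for CPU
      (spawn-worker! port))))
```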
Nice workaround!
I also think the default number of workers should maybe be 2 instead of 4. For the most part, you shouldn't need to play more than 1 score at a time, and having 2 workers allows you to start playing the next score as the one before it is ending. (Remember that you can always request more workers when starting the server.)
I believe 1.0.0-rc41 fixes this issue. @0atman @kirbyfan64 @elyisgreat please test and let me know what you think! To summarize the changes in this release: worker processes are now spawned in a staggered fashion instead of all at once, and the default number of workers is now 2 instead of 4.
Awesome, rc41 works great on my Linux machine. However, it doesn't spawn workers on my Windows machine:

```
λ alda up
[27713] Starting Alda server...
[27713] Server up ?
[27713] Starting worker processes...
[ this hangs; I waited for 5 minutes ]
^C
λ alda status
[27713] Server up (0/2 workers available)
```
That's strange... was it working before this release? I can't think of what might have changed between 1.0.0-rc38 and rc41 that would affect workers spawning on Windows.
Ah, I had upgraded from rc31, sorry I can't be more specific! Interestingly, the REPL works fine.
It makes sense that the REPL works -- the REPL doesn't use a server or worker processes at all; it's just a standalone program. Can you try again and let me know what you see?
I'm going to close this -- I think the problem is solved by 1.0.0-rc41, and workers not spawning on Windows seems like a separate issue. @0atman I'll be happy to continue to troubleshoot that with you -- I want to make sure Alda works on Windows!
Makes sense. If I'm still seeing freezing issues, I will comment here.
I tried the latest version of alda today (`alda update`), and found that running `alda up` maxed out my machine's CPU and basically locked it up, with load averages over 80(!). Here are the offending threads:
What's happened in the last 10 days? My machine is very powerful, and more importantly didn't use to have this problem.
My version:
My machine:
My JVM: