100% CPU usage during idle #51

Closed
theronic opened this issue Mar 18, 2019 · 6 comments

theronic commented Mar 18, 2019

Not sure if this is Timely-related, websockets, or something about polling, but I'm seeing the release executable use 100% CPU while idle. Is this expected?

Probably this loop
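
For context, the problematic shape is roughly the following (a minimal sketch assuming a mio 0.6-style event loop like the one linked above; identifiers are illustrative, not the repo's actual code):

```rust
use std::time::Duration;
use mio::{Events, Poll};

fn main() -> std::io::Result<()> {
    let poll = Poll::new()?;
    let mut events = Events::with_capacity(1024);
    loop {
        // A zero-millisecond timeout makes poll() return immediately even when
        // nothing is ready, so this loop spins and pegs one core at ~100%.
        poll.poll(&mut events, Some(Duration::from_millis(0)))?;
        for _event in events.iter() {
            // handle websocket / command events here
        }
        // step the Timely worker, process inputs, etc.
    }
}
```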


comnik commented Mar 18, 2019

Right now, this is partly on us (that loop you point out), partly on Timely (TimelyDataflow/differential-dataflow#114). A first step to rectifying this was event-driven scheduling landing in Timely, but some pieces are still missing. And it is not clear whether we even want it for large-scale, data-processing use cases.

If you increase the polling timeout here:

poll.poll(&mut events, Some(Duration::from_millis(0)))
that should get it down quite a bit, but I'm not sure whether that has any adverse effects on the rest of the system.

edit: I tried it with 100ms, and that reduced the usage to 0.1%.
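
Concretely, the workaround swaps the zero timeout in the loop sketched above for something like this (sketch only; 100ms per the edit above):

```rust
// Sleep for up to 100ms when no events are pending, instead of returning
// immediately; this is what reportedly brought idle CPU down to ~0.1%.
poll.poll(&mut events, Some(Duration::from_millis(100)))?;
```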


comnik commented Mar 18, 2019

Right now I think we could make input handling completely blocking, because we always step the worker until it has caught up with all inputs before continuing the event loop. But that is something we want to keep flexible.

edit: nope, that would be dangerous, because the other workers would stop receiving commands.
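
For reference, "completely blocking" here would mean a `None` timeout on the same call, roughly (sketch only):

```rust
// Blocks indefinitely until a local input event arrives. Dangerous with
// multiple workers: a worker parked here never steps its dataflows, so it
// stops picking up commands sequenced by the other workers.
poll.poll(&mut events, None)?;
```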

theronic (Author) commented

Understandably, dedicated DF nodes could peg a core at 100%, but if adoption is a priority, it seems worthwhile to be able to run DF (perhaps in debug mode) against blocking channels so it doesn't use up one of my eight cores. Since mutations arrive via evented websockets, those events could wake up the poll thread. For distributed replication, maybe a layer of crossbeam-channel indirection could feed into the main loop (see the sketch below)?
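
A minimal sketch of that kind of indirection (assumed design with hypothetical names, not the project's actual architecture): the network thread forwards commands into a crossbeam channel, and the worker loop blocks on `recv()`, waking only when there is real work to do.

```rust
use std::thread;
use crossbeam_channel::{unbounded, Receiver, Sender};

// Hypothetical command type standing in for whatever the server sequences.
#[derive(Debug)]
enum Command {
    Transact(String),
    Shutdown,
}

fn network_thread(tx: Sender<Command>) {
    // In the real system these sends would be driven by websocket events.
    tx.send(Command::Transact("[{:name \"Alice\"}]".into())).unwrap();
    tx.send(Command::Shutdown).unwrap();
}

fn worker_loop(rx: Receiver<Command>) {
    loop {
        // Blocks without burning CPU until a command arrives.
        match rx.recv() {
            Ok(Command::Transact(tx_data)) => {
                // feed dataflow inputs here, then step the worker
                println!("processing {}", tx_data);
            }
            Ok(Command::Shutdown) | Err(_) => break,
        }
    }
}

fn main() {
    let (tx, rx) = unbounded();
    let net = thread::spawn(move || network_thread(tx));
    worker_loop(rx);
    net.join().unwrap();
}
```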


comnik commented Mar 19, 2019

Yeah, I agree. Does the polling timeout workaround help? With superficial testing I don't notice any immediate problems.

Running with a blocking input source in single-worker development mode would be a helpful feature. I tried it just now and it stalls sometimes, because the command-sequencing dataflow is not taken into consideration when the server decides whether more work is needed to process all inputs. Fixing that shouldn't be too hard, but it needs a bit of work.
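
The underlying pattern is Timely's usual probe loop: keep stepping the worker until a probe on the dataflow in question (here standing in for the command-sequencing dataflow) has caught up with the input frontier, and only then consider blocking again. A minimal, self-contained sketch, not the server's actual code:

```rust
use timely::dataflow::operators::{Input, Probe};
use timely::dataflow::{InputHandle, ProbeHandle};

fn main() {
    timely::execute_from_args(std::env::args(), |worker| {
        let mut input = InputHandle::new();
        let mut probe = ProbeHandle::new();

        // Stand-in for the command-sequencing dataflow.
        worker.dataflow::<u64, _, _>(|scope| {
            scope.input_from(&mut input).probe_with(&mut probe);
        });

        for round in 0..3u64 {
            input.send(round);
            input.advance_to(round + 1);
            // Only once this probe has caught up is it safe for the server to
            // decide that no more work is pending and block on its inputs.
            while probe.less_than(input.time()) {
                worker.step();
            }
        }
    })
    .unwrap();
}
```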


comnik commented Mar 29, 2019

Update: the server now comes with a "blocking" compile-time feature flag. Another potential problem there is sources, which is why we now have some scheduling logic (1a393c4).

That has to be integrated and the sequencer dataflow has to be probed.
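
For readers unfamiliar with compile-time feature flags: the feature is declared under `[features]` in Cargo.toml (e.g. `blocking = []`) and then gates code paths via `cfg`. A rough sketch of how such a gate could look inside the event loop above (illustrative, not the repo's actual code):

```rust
// With `--features blocking`, the event loop blocks until network activity;
// otherwise it falls back to a finite polling timeout.
#[cfg(feature = "blocking")]
let timeout: Option<Duration> = None;

#[cfg(not(feature = "blocking"))]
let timeout: Option<Duration> = Some(Duration::from_millis(100));

poll.poll(&mut events, timeout)?;
```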

comnik pushed a commit that referenced this issue Apr 29, 2019
This is the first step to remediate #51 without running the risk of accidentally slowing down useful work.

* Add until_next() method to scheduler
* Use step_or_park in server loop
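
A rough sketch of how those two pieces fit together in a server loop (the `Scheduler`/`until_next()` shown here is paraphrased rather than the repo's actual type; `step_or_park` is Timely's `Worker::step_or_park(Option<Duration>)`):

```rust
use std::time::Duration;
use timely::communication::Allocate;
use timely::worker::Worker;

// Paraphrased scheduler: tracks deferred activations (e.g. polling sources).
struct Scheduler;

impl Scheduler {
    /// Time until the next scheduled activation, or None if nothing is queued.
    fn until_next(&self) -> Option<Duration> {
        None // placeholder
    }
}

fn event_loop<A: Allocate>(worker: &mut Worker<A>, scheduler: &Scheduler) {
    loop {
        // Instead of spinning, park the worker thread until either the next
        // scheduled activation is due or another thread unparks it (e.g.
        // because new data or commands arrived). With None and no pending
        // work, the thread parks until unparked, so idle CPU drops to ~0%.
        worker.step_or_park(scheduler.until_next());

        // ...drain commands, advance inputs, serve results, etc.
    }
}
```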

comnik commented May 6, 2019

Fixed, now that TimelyDataflow/timely-dataflow#268 landed in Timely master.

comnik closed this as completed May 6, 2019