
Multicore processing in future being deprecated in rstudio #231

Open
kmishra9 opened this issue Aug 9, 2019 · 6 comments

@kmishra9

kmishra9 commented Aug 9, 2019

Hey there -- for the parallelization vignette, this may be important news

@kmishra9
Author

kmishra9 commented Aug 9, 2019

Actually, yeah, this is a big deal -- I swapped from multicore to multisession and got a 25x speedup (I think multicore is already falling back on "sequential"). I was wondering why things were going so slowly despite using the parallelization framework...
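For reference, a minimal sketch of the switch being described, assuming the `future` and `future.apply` packages (the `mean(rnorm(...))` call is just a stand-in workload, not code from the vignette):

```r
library(future)
library(future.apply)

# multicore forking is unstable inside RStudio, where future falls
# back to sequential execution -- multisession avoids that entirely
plan(multisession, workers = availableCores())

# example parallel map over a toy workload
results <- future_lapply(1:8, function(i) mean(rnorm(1e6)))
```

`multisession` launches separate background R sessions instead of forking the current process, which is why it works under RStudio where `multicore` does not.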

@jeremyrcoyle
Collaborator

Good to know. Did you get the warning mentioned in that future issue when you tried multicore in RStudio?

@kmishra9
Author

kmishra9 commented Aug 9, 2019

Yup.

I'm also wondering -- in the vignette, 12 was chosen as the number of logical cores. But if it's spinning up a bunch of threads, is there any reason not to spin up, say, 100 threads (or sessions in this case) and just let the CPU schedule them as it sees fit? That way it would stay at 100% utilization the whole time. I imagine the answer may be "scheduler overhead", but limiting to 12 seems to keep total CPU utilization below 100%.

@kmishra9
Author

kmishra9 commented Aug 9, 2019

Can confirm that setting up 30 workers on my 12-thread laptop seems to be a solid move for maxing out utilization.
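A minimal sketch of the oversubscription being described, using the `future` package (the worker count of 30 mirrors the comment above; tune it for your own machine):

```r
library(future)

# oversubscribe: more workers than the 12 logical cores,
# trading some scheduler overhead for higher CPU utilization
plan(multisession, workers = 30)

n <- nbrOfWorkers()  # query how many workers the current plan provides
```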

[screenshot: CPU utilization with 30 workers]

@jeremyrcoyle
Collaborator

The number of threads was based on the machine used to run the vignette.

That said, in my experience, even hyperthreading (2 threads per core) either has no effect or actually hurts performance compared to 1 thread per core. Also, ML in R is almost always memory bound, not CPU bound, unless your machine was built specifically to have a lot of memory relative to its core count. You can see the kernel_task process in your screenshot, which can indicate that memory is being swapped to disk after physical memory runs out; that's going to hurt your performance a lot. Pegging your CPU at 100% does not always produce the shortest run times. YMMV -- feel free to play around and benchmark different numbers of threads for your specific compute resources and application.
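The benchmarking suggested above can be sketched roughly as follows, assuming `future` and `future.apply`; the worker counts and the `mean(rnorm(...))` workload are placeholders to swap for your actual task:

```r
library(future)
library(future.apply)

# crude benchmark: wall-clock time for the same workload
# under different worker counts
for (w in c(6, 12, 24)) {
  plan(multisession, workers = w)
  t <- system.time(
    future_lapply(1:48, function(i) mean(rnorm(5e5)))
  )
  cat(w, "workers:", t[["elapsed"]], "s\n")
}
plan(sequential)  # reset to the default plan afterwards
```

Watching memory pressure alongside elapsed time while this runs is what reveals whether a given worker count has pushed the machine into swapping.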

@kmishra9
Author

kmishra9 commented Aug 9, 2019

Very fair point about being memory-limited. I think I avoided swapping because my data isn't too large, but I'd agree that's generally the rate-limiting factor relative to CPU usage. Totally agree it's a YMMV situation -- it may be worth mentioning the tradeoff in the vignette.
