GluonTS not using all available (CPU) resources #200
Comments
I've seen gluonts use high CPU in the past, but it does vary by the underlying model.
Thanks for the quick reply!
Set
Working on
Sorry for the late reply. I have been doing some more testing, and it appears the CPU usage drops when several models from GluonTS are run, including DeepAR, WaveNet, Transformer, SFF, and NPTS. I attached a few lines from the top of the run log, hoping you're willing to take a look at them and help me find out what's wrong here. If you need more details, please let me know. The input pandas dataframe consists of 50k rows (I tried fewer as well, by the way) with a datetime column in the correct format and multiple feature columns containing proper data (no NaNs, extreme outliers, or otherwise invalid values). Thank you very much in advance for your support!
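For reference, a minimal sketch of how input data like this might be checked and reshaped for AutoTS. The file name and column names are assumptions for illustration, not taken from the attached log:

```python
import pandas as pd

# Hypothetical reconstruction of the input described above: ~50k rows,
# a datetime column plus several numeric feature columns, no NaNs.
df = pd.read_csv("input.csv", parse_dates=["datetime"])

# AutoTS wide format expects a DatetimeIndex with one column per series.
df = df.set_index("datetime").sort_index()

print(df.shape)          # e.g. (50000, number_of_feature_columns)
print(df.isna().sum())   # confirm there really are no missing values
```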
Some things I have noticed:
What frequency are you using with 50K rows of history? This must be hourly or minute-level data? Neural networks are always unreliable. I've been working on adding the TiDE model and it keeps killing my kernel for no clear reason. For GluonTS, I am adding limited support for their pytorch approach in the next release, which might help, since apparently mxnet is deprecated. Something else that can help with full core utilization is setting some environment variables.
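The specific variables aren't named here, so purely as an assumption, a sketch of the thread-count environment variables that are commonly set, done from Python before the numerical libraries are imported:

```python
import os

# Thread counts for the BLAS/OpenMP layers; 16 matches the core count
# mentioned in this issue, adjust to your own machine.
os.environ["OMP_NUM_THREADS"] = "16"
os.environ["OPENBLAS_NUM_THREADS"] = "16"
os.environ["MKL_NUM_THREADS"] = "16"
# mxnet-specific worker threads (the GluonTS backend at the time of this thread)
os.environ["MXNET_CPU_WORKER_NTHREADS"] = "16"

# These must be set before numpy/mxnet are first imported,
# otherwise they are read too late to have any effect.
import numpy as np  # noqa: E402
```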
First of all, thanks for your reply, Colin. I really appreciate your support! Secondly, I'm running the script directly from the command line (user terminal on Ubuntu 22.04), FYI. I ran the script once again with the suggested parameters:
Run results:
I attached the terminal output and full debug log here for a quick review, hoping you are willing to take a look at it. I would be very grateful if you would take the effort to do so and help me further along here since I don't know what to do at this point. terminal ouput.txt Last but not least: Thanks in advance for any help and please let me know if you need more details. |
Cloud provider VMs are listed as vCPUs, which is the number of 'logical' processors or threads, not the physical number of cores. For workloads with lots of small, lightweight tasks this works out fine, but data science tends toward pretty heavy-duty workloads that can't hyperthread very effectively, so the number of physical cores is the performance constraint. Normally the number of actual cores is the listed number of CPUs / 2. It might be worth setting n_jobs to the VM's official CPU count / 2.

As for the low usage of RAM and CPU, that might be because the 'superfast' model list is mostly matrix operations that are quite efficient and don't parallelize much. That said, you should double-check and make sure NumPy is linked to a BLAS/LAPACK correctly. It should be, but it's worth checking (and also setting that OMP_NUM_THREADS environment variable, which can help). Here's a Stack Overflow thread on checking for LAPACK.

Some errors are to be expected. Some combinations of parameters AutoTS generates, and some parameters on some datasets, don't work. Those errors don't look like a problem to me. Something that can help diagnose which model is crashing your script is passing

Not sure if you have looked at the production_example.py example yet, but you definitely should if you haven't.
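A quick way to do both checks mentioned above; the n_jobs heuristic is only an illustration, not an AutoTS built-in:

```python
import multiprocessing
import numpy as np

# Show which BLAS/LAPACK libraries numpy was built against;
# an openblas or MKL entry is what you want to see, not the reference BLAS.
np.show_config()

# On hyperthreaded cloud VMs, physical cores are roughly vCPUs / 2,
# so that is a reasonable starting point for n_jobs.
n_jobs_guess = max(1, multiprocessing.cpu_count() // 2)
print(f"suggested n_jobs: {n_jobs_guess}")
```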
Check
Thanks for the advice on the virtual CPUs. I am playing with the OMP_NUM_THREADS variable and n_jobs parameter settings but haven't gotten much better results than about 30% usage yet. It's better than 13% though, so I will persevere to find an optimum (>30% hopefully!). Any other advice that might help with this? Or is this actually the best possible on a VM? I checked the numpy package, which is using openblas:

The script doesn't crash anymore with a smaller input dataset (currently testing with 500 rows of data). With 50k rows of data I get some memory allocation errors. Regarding the other errors, could you please take a look at this merged debug log:

For sure I studied the production_example.py before, but I will do that once again tonight :) I have not yet succeeded in checking the

Again, thanks a lot for your help in advance, very much appreciated!
Where are you collecting your CPU utilization from? From the droplet internally, or externally on the dashboard? Internally collected metrics on a VM might be inaccurate. If your only goal is 100% CPU utilization, try `model_list='parallel'` then increase n_jobs until you see 100%. Although I wouldn't advise aiming for a full 100% all the time; that often means the system is overutilized and 'thrashing', and a bit lower is better.

Actually, I should point out that utilization will still be low if you are only inputting ONE time series. Many of the optimizations here are designed to parallelize across multiple time series, not across a single input time series. Try with the example load_daily() dataset and see how utilization is looking. If you only have one series, try

But really, you shouldn't be aiming for maximal CPU utilization, you should focus on overall runtime optimization. Some operations and some models can't be parallelized but are still fast. The openblas config looks good.

Is this a dataset you can share? If you want, you can send it to me and I can see how it works for me. Given that you are working with 5-minute frequency and 1-step-ahead predictions, I'm 90% willing to bet you are trying to do some sort of semi-high-frequency stock trading automation.

No, 'Transformer failed on fit' is not usually something to worry about. model.df_wide_numeric should be a pandas dataframe; it will be available after .fit() is run. Although if you just want to check the data, you could try just fit_data:

```python
from autots import AutoTS

model = AutoTS()
model = model.fit_data(df)
print(model.df_wide_numeric)
```
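For illustration, a sketch of the parallel-model-list suggestion above using the bundled load_daily() data; the forecast length and n_jobs value are just assumed starting points to tune, not recommendations:

```python
from autots import AutoTS, load_daily

# load_daily has many series, so cross-series parallelism has work to distribute
df = load_daily(long=True)

model = AutoTS(
    forecast_length=14,       # illustrative horizon
    frequency="infer",
    model_list="parallel",    # preset list of models that spread across cores
    n_jobs=8,                 # raise gradually while watching utilization
)
model = model.fit(df, date_col="datetime", value_col="value", id_col="series_id")
```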
Thanks again for your reply, Colin. I'm using the graphs on the DO dashboard for CPU and RAM utilization analysis, but I also tried the internal Ubuntu performance metrics application; the metrics from both instruments are in line with each other. Full CPU utilization is not necessarily my goal; however, because of the quite extensive amount of training data, it would save a lot of time if all CPUs and their capacity could be utilized as optimally as possible. Nevertheless, thank you for pointing out that runtime optimization is much more essential than CPU utilization optimization. Thanks to your explanation I now understand that parallelization is limited depending on the model (list) used. Thanks for the confirmation on the openblas config.

We're trying to predict dynamic electricity prices in order to advise on when and where to charge/discharge EVs and other high-capacity batteries like home batteries. Here's an example of the input data (simplified to only 5 feature columns and 1k records): data.csv

Lastly, using
Still, I get the above errors, which I guess might be the cause of other errors as well. I hope you will be able to evaluate the example data and pinpoint the cause of the errors. Thanks again in advance!
I'm a bit busy with a few other things at the moment, but I hope to give your data a proper examination in the next couple of days.
Thanks Colin, I'm very happy to hear you're willing to take the effort to help us on this matter. Looking forward to your message after you have had the opportunity to examine this case. Thanks a million already!
While training a GluonTS model, less than 10% of the CPU resources (16 cores in my test case) are used, while other models use 70%+ of the available CPU resources. Memory and disk capacity are not the issue in my case; there are more than enough free resources there as well. The model is set to use n_jobs='auto', which seems to work fine for the other models, which use the major part of the CPU capacity. Is GluonTS not configured to, or capable of, parallelizing tasks? Thanks for your reply and explanation in advance.
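For context, a minimal sketch of the setup being described; the forecast length is illustrative, and df is assumed to be a wide pandas DataFrame with a DatetimeIndex:

```python
from autots import AutoTS

# Reproduction sketch: GluonTS-family models only, with the same n_jobs setting
# that utilizes most of the CPU for other model lists.
model = AutoTS(
    forecast_length=12,       # illustrative, not from the issue
    frequency="infer",
    model_list=["GluonTS"],   # the model family showing <10% CPU usage
    n_jobs="auto",
)
model = model.fit(df)         # df: wide DataFrame with a DatetimeIndex
```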