-
Notifications
You must be signed in to change notification settings - Fork 316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Settings > Performance > Threads: Automatic setting works poorly on CPUs with many threads #7160
Comments
I checked the code. The automatic detection works fine, but there are differences in what the code does depending on if the number of threads is set to 0 or not. Oddly enough, performance is actually worse for CPUs with few cores. Can you enable verbose mode and see what gets printed in the terminal/command prompt? Here's what I see. Automatic:
Manual (4 threads, the maximum for my computer):
For me, automatic is about 30% faster. |
How do I do that? |
To enable verbose mode, find your Open the terminal or command prompt. Run RawTherapee from there. You may need to add the |
Is the command terminal supposed to show something as soon as I change the threads option? |
If it does show something, you can ignore it. We are only interested in what it shows when the preview updates. |
Its not showing anything. Am I doing something wrong? |
I have encountered this issue before, but I don't remember how to get it to show messages because it's been a while since I debugged in Windows. Maybe @Desmis can tell us how to see the verbose messages. |
@Lawrence37 The options file, in : C:\Users\jdesm\AppData\Local\RawTherapee5-dev [External Editor] and after in console Mingw64 |
@Desmis that worked, apparently the dash is what threw it off. I needed to type Here is the output, I hope it is useful (because I don't really understand it)
Automatic:
|
Interesting. It says it uses one thread when you manually set it, but uses 24 threads when it's automatic. Theoretically, it should be much faster with 24 threads (automatic) which is the opposite of what you observe. |
I was puzzled by that as well. I guess it possible I mixed them up? |
I am puzzled by both settings only using one main thread and a difference in nested threads. The GUI has an algorithm to calculate an optimum setting, but the efficiency depends on memory and wavelet levels. We ran a controlled experiment using the
I believe we maxxed out the efficiency by offering OMP threads that closely matched the wavelet levels. Moving up to 16 only shaved a few hundred milliseconds off a pretty long and duplicative routine. A similar data point measured by @SilvioGrosso shows a similar optimization around 8 threads:
Here, the increase to 16 from 8 was about 10% more inefficient. |
It's probably not mixed up. My results show the same behavior. The two specific things I find interesting are (1) the use of 24 threads when the number of cores you have is 20 (the code puts a limit equal to the number of cores) and (2) why the thread count is still 1 after manually setting the threads so high (the threads used is calculated with a formula that should result in a number greater than 1). |
@Lawrence37 @SilvioGrosso ‘s computer has the 20 threads (6 P-cores, 8 E-cores). |
Ok, that makes sense. I thought the first system specs in the original post was @chaimav's computer. |
Correct, I have only benchmarked one computer, an i7 13700 (full specs here https://www.amazon.com/dp/B0CFBDRMXT ). My previous computer was an i3 8100 which rotated to my wife. I it will be of value, I can run the scripts on it as well. |
@chaimav I created a branch which respects the number of threads set in preferences when using wavelets. I'm interested in knowing what the performance is like for different manually-set values. Executables will be available for download at the bottom of this page in a few minutes: https://github.com/Beep6581/RawTherapee/actions/runs/10238785142 |
@Lawrence37 I just tested RawTherapee_wavelet-thread-num_5.10-383-gf9bcf594b_win64_release with different numbers set for for threads and found no discernable difference with processing post scrolling**. Performance was similar to zero (automatic) of the standard dev build. **Tested with a stopwatch so some error is to be expected, but on the regular Dev build, non zero numbers shows noticeable improvement |
It's also slow with 1 thread? I expected it to have the same performance as dev with manual threads since both use 1 thread. |
I didn't try with 1, but I tried other numbers like 8 and 24. On the dev version, those are noticiably faster than 0. On this version I saw no difference between 0 and those numbers. I can try 1 when I get home tonight |
The default setting of 0 (Automatic) does not perform well on modern Intel CPUs with high thread counts.
Tested on:
Test: Enabling Wavelets > Sharp-mask and clarity and panning while zoomed in. Raising from automatic to a higher number greatly reduces processing time post panning.
Processor: 12th Gen Intel(R) Core™ i7-12700H (20 CPUs), ~2.7GHz
Memory: 32768MB RAM
Card name: NVIDIA GeForce RTX 3070 Ti Laptop GPU (RawTherapee does not take advantage of the GPU…)
SSD: 1 terabyte - NVMe SAMSUNG MZVL21T0HCLR-00BT7
video here: https://discuss.pixls.us/t/how-to-optimize-rawtherapee/44786/27?u=chaimav
Also tested on:
Processor: 13th Gen Intel(R) Core(TM) i7-13700 2.10 GHz
Memory: 16Gb
Card name: None (integrated graphics)
SSD: 1 TB NVMe Micron_2400_MTFDKBA1T0QFM
Can the automatic setting be improved to detect higher processors?
The text was updated successfully, but these errors were encountered: