4% CPU usage during FeatureExtraction on multi-core system #1352

Closed
grische opened this issue Mar 27, 2021 · 16 comments
Labels
bug: for actual bugs (unsure? use type:question)
stale: for issues that becomes stale (no solution)

Comments

grische commented Mar 27, 2021

I am running Meshroom on Linux machines with different core/thread counts, and it seems that Meshroom barely uses any of them.

For example, in this 64-thread system, it uses 3-4% of each core:
image

I ran Meshroom with the following command:

export MESHROOM_USE_MULTI_CHUNKS=0
~/Meshroom-2021.1.0-av2.4.0-centos7-cuda10.2/meshroom_compute --node FeatureExtraction --forceCompute ./myfile.mg

Log

Program called with the following parameters:
 * contrastFiltering =  Unknown Type "N11aliceVision7feature26EFeatureConstrastFilteringE"
 * describerPreset =  Unknown Type "N11aliceVision7feature21EImageDescriberPresetE"
 * describerQuality =  Unknown Type "N11aliceVision7feature15EFeatureQualityE"
 * describerTypes = "sift,akaze"
 * forceCpuExtraction = 0
 * gridFiltering = 1
 * input = "/home/ubuntu/Meshroom_projects/project_alpha1/MeshroomCache/CameraInit/48cdb1011c673b04ea7f87745787e994e92d6544/cameraInit.sfm"
 * maxNbFeatures = 0 (default)
 * maxThreads = 0
 * output = "/home/ubuntu/Meshroom_projects/project_alpha1/MeshroomCache/FeatureExtraction/e74be379aa871aafbd7ce447ef310e97ff429bf1"
 * rangeSize = 1 (default)
 * rangeStart = -1 (default)
 * relativePeakThreshold = 0.02 (default)
 * verboseLevel =  Unknown Type "N11aliceVision6system13EVerboseLevelE"

[14:25:14.113838][warning] Could not determine number of CUDA cards in this system
[14:25:14.113616][warning] No CUDA-Enabled GPU.
0
[14:25:14.432039][error] cudaGetDeviceCount failed: CUDA driver version is insufficient for CUDA runtime version
[14:25:14.432114][info] Can't find CUDA-Enabled GPU.
[14:25:14.480049][info] Job max memory consumption for one image: 2137 MB
[14:25:14.480098][info] Memory information:
        - Total RAM:     122.948 GB
        - Free RAM:      97.4961 GB
        - Available RAM: 121.118 GB
        - Total swap:    0.957027 GB
        - Free swap:     0.957027 GB

[14:25:14.480140][info] Max number of threads regarding memory usage: 52
[14:25:14.480163][info] # threads for extraction: 52
[14:25:14.771045][info] Extracting akaze features from view '/home/ubuntu/Meshroom_projects/frames_w_metadata/2423.jpg' [cpu]
[14:25:14.806557][info] Extracting akaze features from view '/home/ubuntu/Meshroom_projects/frames_w_metadata/0527.jpg' [cpu]
[14:25:14.823509][info] Extracting akaze features from view '/home/ubuntu/Meshroom_projects/frames_w_metadata/2003.jpg' [cpu]
[14:25:14.826405][info] Extracting akaze features from view '/home/ubuntu/Meshroom_projects/frames_w_metadata/1860.jpg' [cpu]
[14:25:14.826392][info] Extracting akaze features from view '/home/ubuntu/Meshroom_projects/frames_w_metadata/2663.jpg' [cpu]
grische added the bug label Mar 27, 2021

natowi (Member) commented Mar 27, 2021

That's because the chunk does not need, or cannot use, more resources.
Parallel programming is not an easy task, and some things simply take time.

Here is an analogy to describe the situation:

"20 musicians play a symphony of 94 minutes. How long do 40 musicians need to play the same symphony?"

Running chunks in parallel instead of in sequence can speed up the overall process, but it does not change the resource requirements of an individual chunk.

To speed up the process, you could try to disable "Force CPU extraction" to utilize the GPU.
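
Editor's note: the distinction drawn in this comment, between running many chunks side by side and the time a single chunk needs, can be sketched with a small idealised model. This is only an illustration, not Meshroom code: the function name `wall_time` and the chunk counts and durations are hypothetical, and the model assumes chunks are independent and equally long.

```python
import math


def wall_time(n_chunks: int, workers: int, minutes_per_chunk: float) -> float:
    """Idealised wall-clock time for independent, equally long chunks.

    Workers shorten the number of sequential rounds, but a single chunk
    still takes minutes_per_chunk no matter how many workers exist.
    """
    if n_chunks <= 0:
        return 0.0
    rounds = math.ceil(n_chunks / max(1, workers))
    return float(rounds * minutes_per_chunk)


# The symphony from the analogy is one indivisible chunk:
# doubling the musicians leaves the duration unchanged.
symphony_20 = wall_time(n_chunks=1, workers=20, minutes_per_chunk=94)
symphony_40 = wall_time(n_chunks=1, workers=40, minutes_per_chunk=94)
print(symphony_20, symphony_40)

# Many independent chunks (hypothetical numbers) do benefit from workers.
print(wall_time(n_chunks=100, workers=1, minutes_per_chunk=3))
print(wall_time(n_chunks=100, workers=20, minutes_per_chunk=3))
```

In this model, extra workers only help when there are more chunks than workers; it says nothing about why a running thread would sit mostly idle.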

grische (Author) commented Mar 27, 2021

I am not sure I understand the analogy, as we are not talking about a single frame being processed, but thousands of frames instead.

There are 3500+ frames and 64 threads which should allow creating 50+ batches (considering the memory constraint, it seems to be 52), so I would expect 50+ threads to be fully utilized.

What am I missing here?
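
Editor's note: the figure of 52 threads in the log can be reproduced with simple arithmetic from the logged values (2137 MB per image, 121.118 GB available RAM, 64 hardware threads). The sketch below is an assumption, not the aliceVision implementation: the 0.9 headroom factor and the 1 GB = 1024 MB conversion are guesses chosen because they match the logged result, and `max_threads_for_memory` is a hypothetical name.

```python
import math


def max_threads_for_memory(available_ram_gb: float,
                           per_image_mb: float,
                           logical_cores: int,
                           headroom: float = 0.9) -> int:
    """Estimate how many extraction threads fit in memory.

    headroom is an assumed safety factor (not confirmed by the thread);
    the result is capped by the number of logical cores.
    """
    usable_mb = available_ram_gb * 1024 * headroom
    by_memory = math.floor(usable_mb / per_image_mb)
    return max(1, min(by_memory, logical_cores))


threads = max_threads_for_memory(121.118, 2137, 64)
print(threads)       # 52 under the assumptions above
print(threads / 64)  # 0.8125: upper bound on overall CPU use if every thread stayed busy
```

Under these assumptions, 52 fully busy threads would keep roughly 81% of a 64-thread machine occupied, which is far from the 3-4% reported, so the memory cap alone does not explain the observation.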

grische (Author) commented Mar 27, 2021

> To speed up the process, you could try to disable "Force CPU extraction" to utilize the GPU.

If you look at the output, that should already happen:

 * forceCpuExtraction = 0

natowi (Member) commented Mar 27, 2021

The point is that one chunk does not necessarily need, or cannot utilize, the full resources of a single thread. (MESHROOM_USE_MULTI_CHUNKS is still experimental.)

grische (Author) commented Mar 27, 2021

Is the problem restricted to this single node, FeatureExtraction, or do all nodes perform similarly?

natowi (Member) commented Mar 27, 2021

"I tested this [MESHROOM_USE_MULTI_CHUNKS] on a 48 core workstation. The speed-up is not huge, it reduces the overall computation time by ~10%. This reduces computation time in FeatureExtraction, SfM, DepthMapping and Meshing but increases computation time slightly in DMFilter and Texturing."

"Here are the results for a test with 2000 images:
FeatureExtraction/ultra: 2h, CPU up to 100% and up to 200GB ram
FeatureMatching (10^9 features), 100 chunks, 3h per chunk... (aborted) resources for one chunk: CPU@1GHz 5%, ram 26GB -> computer has the capacity to run 10-20 chunks in parallel."

#175
#778


When you run your graph without MESHROOM_USE_MULTI_CHUNKS, does the cpu consumption max out?

grische (Author) commented Mar 27, 2021

> FeatureExtraction/ultra: 2h, CPU up to 100% and up to 200GB ram

I am curious how you managed to get 100%, while I seem to be getting around ~4% per thread. Different settings?

natowi (Member) commented Mar 27, 2021

Maybe the CPU clock at 1GHz, also the images were quite large.

(A multithreaded program can run slower on a faster CPU, because the faster CPU gets more threads into a blocked state more quickly than a slower CPU would. *)

grische (Author) commented Mar 27, 2021

> Maybe the CPU clock at 1GHz, also the images were quite large.

The test images I am working on are quite small, that might have a huge impact.

> When you run your graph without MESHROOM_USE_MULTI_CHUNKS, does the cpu consumption max out?

No. I tried it without the parameter, with =0, with =False and with =True without any change.

ChemicalXandco (Contributor) commented Mar 27, 2021

> * forceCpuExtraction = 0

If you want to max out your cpu you should force cpu extraction and use export MESHROOM_USE_MULTI_CHUNKS=0 like you were doing

grische (Author) commented Mar 27, 2021

> If you want to max out your cpu you should force cpu extraction and use export MESHROOM_USE_MULTI_CHUNKS=0 like you were doing

I forced CPU extraction, but it seems to make no difference.
A difference would have surprised me anyway, as there is no GPU present in this system.

natowi (Member) commented Mar 27, 2021

OK, I checked again, and 4% is indeed not much. In power saving mode, my PC runs at 50% capacity, but in Power mode it runs at 100%. Did you check your power consumption settings?

If you don't have a supported GPU, don't change forceCpuExtraction; by default (1) it is set to use the CPU.

grische (Author) commented Mar 27, 2021

> Did you check your power consumption settings?

I am not sure what you mean. The process is running in a VM, has niceness 0 (i.e. the default), and from what I can tell, the underlying host is not throttling in any way.

natowi (Member) commented Mar 27, 2021

With a VM there could also be other bottlenecks, such as read/write time to the hard drive or a network drive.
This is why I asked whether running the default pipeline also utilizes only 5% of your CPU.

As you can see here, there are multiple factors at play:
https://superuser.com/questions/1264798/why-wont-my-cpu-operate-at-its-max-potential-even-when-my-application-which-ut/1264805
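
Editor's note: one way to check the bottlenecks mentioned in this comment on a Linux guest is to compare two samples of the aggregate `cpu` line in `/proc/stat` and see how much time went to `iowait` (waiting on disk or network storage) or `steal` (time taken by the hypervisor). The sketch below is not part of Meshroom; the helper names are hypothetical, and it assumes the standard Linux field order for that line.

```python
from typing import Dict

# First eight counters of the aggregate "cpu" line in /proc/stat.
# Guest time is already included in "user", so it is left out here.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> Dict[str, int]:
    """Turn 'cpu  1 2 3 ...' into a mapping of field name to tick count."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def shares(before: str, after: str) -> Dict[str, float]:
    """Percentage of elapsed CPU time spent in each state between two samples."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    delta = {name: b.get(name, 0) - a.get(name, 0) for name in FIELDS}
    total = sum(delta.values()) or 1  # avoid division by zero
    return {name: 100.0 * ticks / total for name, ticks in delta.items()}


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as handle:
        return handle.readline()


# Usage while FeatureExtraction is running (Linux only):
#   import time
#   first = read_cpu_line(); time.sleep(5); second = read_cpu_line()
#   print(shares(first, second))
```

A high `iowait` share would point towards storage, a high `steal` share towards the host, and a high `idle` share with both near zero would suggest the threads are simply not being given work.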

stale bot commented Jul 28, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Jul 28, 2021

stale bot commented Aug 18, 2021

This issue is closed due to inactivity. Feel free to re-open if new information is available.

stale bot closed this as completed Aug 18, 2021