Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Heavy CPU load while downloading export templates #35868

Open
touilleMan opened this issue Feb 3, 2020 · 22 comments
Open

Heavy CPU load while downloading export templates #35868

touilleMan opened this issue Feb 3, 2020 · 22 comments

Comments

@touilleMan
Copy link
Member

Godot version:

3.2 stable

OS/device including version:

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Pop!_OS 19.10
Release:	19.10
Codename:	eoan

Issue description:

The export template download page seems to be very demanding

image

Screenshot from 2020-02-03 07-18-31

@Calinou
Copy link
Member

Calinou commented Mar 29, 2020

The export template manager's progress bar only updates twice a second, so it shouldn't be due to the editor re-rendering continuously. As a workaround, you can download the export templates TPZ from the Download page and install them manually from the export template manager using the Install From File option.

@setanarut
Copy link
Contributor

same on macOS

@snoopdouglas
Copy link
Contributor

This still seems to be a problem in 3.3.1. Any ideas what's causing it / why this is assigned to 4.0?

@Calinou
Copy link
Member

Calinou commented May 20, 2021

This still seems to be a problem in 3.3.1. Any ideas what's causing it / why this is assigned to 4.0?

I don't know what's causing it. As for the milestone, we generally assign issues to a distant milestone unless they are regressions that occurred in a specific release.

@Calinou
Copy link
Member

Calinou commented Jun 24, 2021

#48902 might help with this once it's cherry-picked to the 3.x branch.

Can anyone try to use a profiler to see where the CPU usage is coming from?

@rcorre
Copy link
Contributor

rcorre commented Mar 19, 2022

Disabling use_threads reduces the CPU consumption from ~100% to <5% without significantly changing my download speed:

download_templates->set_use_threads(true);

This can be reproed using HTTPRequest to download a large file in GDScript as well:

extends Node2D

func _ready():
	var req := HTTPRequest.new()
	req.use_threads = true
	add_child(req)
	var error = req.request("https://downloads.tuxfamily.org/godotengine/3.4.3/Godot_v3.4.3-stable_x11.64.zip")
	if error != OK:
		push_error("failed")

@rcorre
Copy link
Contributor

rcorre commented Mar 19, 2022

I'm guessing because the thread func does a tight busy loop

while (!hr->thread_request_quit.is_set()) {
bool exit = hr->_update_connection();
if (exit) {
break;
}
OS::get_singleton()->delay_usec(1);
}

@Calinou
Copy link
Member

Calinou commented Mar 19, 2022

The 1 microsecond busywait has always been around: ab93fd1

There were merged PRs increasing a similar busywait on the remote debugger and editor export plugin, but none on HTTPRequest yet.

Can you try increasing delay_usec(1) to delay_usec(6900) (6.9 milliseconds)? If this slows down the download too much, can you try increasing it to delay_usec(1000)?

@rcorre
Copy link
Contributor

rcorre commented Mar 19, 2022

delay_usec(6900) significantly reduces the CPU load overall, though it still spikes close to 100% near the start of the download.

There's no noticeable change in download speed. AFAICT the thread is just checking "are we there yet" repeatedly, so I don't think it generally makes more sense to query faster than the framerate. In the export dialog you could probably get away with an even slower poll rate. Maybe it makes sense as a customization option on the HTTPRequest?

@Calinou
Copy link
Member

Calinou commented Mar 19, 2022

Maybe it makes sense as a customization option on the HTTPRequest?

It seems like a pretty advanced thing to change, so I'm not sure. I'd just leave it at 6900 which is enough to feed any 144 Hz monitor out there (if you want to display the progress somewhere).

rcorre added a commit to rcorre/godot that referenced this issue Mar 19, 2022
Using a threaded HTTPRequest will currently eat ~100% CPU on Linux (and
possibly other OS). The thread is just checking whether the request is
done. It generally doesn't make sense to do this any faster than the
framerate.

Fixes godotengine#35868.
@Faless
Copy link
Collaborator

Faless commented Mar 25, 2022

It seems like a pretty advanced thing to change, so I'm not sure. I'd just leave it at 6900 which is enough to feed any 144 Hz monitor out there (if you want to display the progress somewhere).

I don't see much of a point to have the thread delay to be equal to the frame rate.
You could just not use threads at that point and get basically the same result.
In fact, that would result in worse threaded performance than non-threaded one on monitors with higher frame rate.

The point of using threads is to maximize the download speed (at the cost of CPU), I agree the current usage might be too high, but setting it to the frame rate kind of defeat its purpose.

@rcorre
Copy link
Contributor

rcorre commented Mar 25, 2022

In fact, that would result in worse threaded performance than non-threaded one on monitors with higher frame rate.

It seems like separate bug that non-threaded HTTPRequest performance is framerate-dependent. Should non-threaded requests update in _physics_process rather than _process?

The point of using threads is to maximize the download speed (at the cost of CPU), I agree the current usage might be too high, but setting it to the frame rate kind of defeat its purpose.

With a default chunk size of 64k and a poll rate of 1us, we're expending CPU to optimize for a download speed of:

64k / 1us = 64 * 10^3 * 10^6 = 64GB/s

My proposal would be:

  1. Set the threaded sleep to 64us, which will optimally utilize a 1GB/s connection (ish, handwaving disk I/O, ect.)
  2. Doc that HTTPRequest will burn CPU at lower download rates when using threads until blocking is fixed (at which point the usleep can be removed?)
  3. Use non-threaded http requests in the editor (or make it configurable like https://github.com/rcorre/godot/blob/9614d11bbc98c09c53ccca63d792c2526c6b0c04/editor/plugins/asset_library_editor_plugin.cpp#L45, but I don't see why you'd want this given the current behavior)
  4. Change HTTPRequest to poll during physics frames rather than render frames for non-threaded requests.

Does that sound reasonable?

@Faless
Copy link
Collaborator

Faless commented Mar 25, 2022

Does that sound reasonable?

Not really to be honest.

  • There is no guarantee that the chunk size will be 64k since the server can choose a different one when using chunked encoding (as per the standard).
  • Moving the polling to physics will have little effect, as physics is usually synced with FPS, and when it's not, the extra steps are executed in quick succession during the frame time anyway.
  • I don't want to limit download speed to save some CPU, especially given the problem is of little impact (you are not constantly downloading large files).
  • If your concern is battery life and you do a lot of downloads, you might want to not use threads for now, in most other cases, it doesn't matter much.

I think there is no point adding more hacks than we already have, and the root cause should be fixed instead.

EDIT: To clarify, I truly believe this has to be fixed, but I want the fix to be the correct one, not just another hack that might end up biting back.

@rcorre
Copy link
Contributor

rcorre commented Mar 25, 2022

physics is usually synced with FPS

Fair, but couldn't you use that argument to say that anything belongs in _process rather than _physics_process? My mental model (which could be wrong) is that rendering-related stuff goes in _process, while logic-related stuff goes in _physics_process. This feels like the latter category.

EDIT: To clarify, I truly believe this has to be fixed, but I want the fix to be the correct one, not just another hack that might end up biting back.

Fair enough. Could we leave HTTPRequest alone, but stop using threaded downloads for export templates, or at least make it optional like asset library downloads? I'm guessing the average user won't notice if their templates download a little faster, but they will notice if the CPU is spinning.

@Faless
Copy link
Collaborator

Faless commented Mar 25, 2022

I'm guessing the average user won't notice if their templates download a little faster, but they will notice if the CPU is spinning.

Well, honestly, giving templates it's something you only download when updating the engine, I think the improved download speed is much better then a lower CPU usage when you download them.

@Calinou
Copy link
Member

Calinou commented Mar 25, 2022

I think the improved download speed is much better then a lower CPU usage when you download them.

We can probably try various sleep values on a fast fiber connection and figure out a sweet spot between download speed and CPU usage. Think of the laptops 🙂

@Faless
Copy link
Collaborator

Faless commented Mar 25, 2022

Think of the laptops slightly_smiling_face

Again, we are talking about the case of downloading the editor templates, on battery, on a laptop, on a slow connection.
I think it's more important to think about all the other 99% of the cases.

@rcorre
Copy link
Contributor

rcorre commented Mar 25, 2022

I'm sufficiently convinced that we should not band-aid HTTPRequest by changing the poll rate, but should rather fix blocking, so thanks for walking me through that @Faless. Is there an issue tracking the blocking fix?

I still think using a non-threaded request for export templates would be preferable until threaded requests can work without hot-spinning, but I won't continue to harp on it. I suppose those who are concerned can use curl instead.

@Calinou
Copy link
Member

Calinou commented Mar 25, 2022

I still think using a non-threaded request for export templates would be preferable until threaded requests can work without hot-spinning, but I won't continue to harp on it. I suppose those who are concerned can use curl instead.

I've tested this on a 60 Hz monitor (and V-Sync enabled) and found the download speed to be negatively impacted too much. For reference, I have a 1000/500 FTTH connection and am quite close to TuxFamily's servers.

  • 3 downloads of 4.0alpha5 export templates with threads enabled:
    • First run: 35 seconds
    • Second run: 24 seconds
    • Third run: 19 seconds
    • Average: 26 seconds
  • 1 download of 4.0alpha5 export templates with threads disabled: 77 seconds

To test this on your end, change version.py to use "alpha5" instead of "alpha" as a version status and recompile the editor.

@rcorre
Copy link
Contributor

rcorre commented Mar 26, 2022

Ok, I'll concede it is better to leave this as-is for now. What follows was just for my own curiosity:

I decided to give it a try as well, with this script:

extends Node

func _ready():
	var req = HTTPRequest.new()
	req.set_use_threads(false)
	add_child(req)

	req.download_file = "templates.zip"
	var err = req.request("https://downloads.tuxfamily.org/godotengine/3.4.4/Godot_v3.4.4-stable_x11.64.zip")
	if err != OK:
		push_error("An error occurred in the HTTP request.")
		get_tree().quit()

	await req.request_completed
	get_tree().quit()

I started with curl for a baseline:

real    0m18.428s
user    0m0.062s
sys     0m0.144s

Then I tried with use_threads=true:

real    0m18.348s
user    0m7.559s
sys     0m8.403s

real    0m16.789s
user    0m7.381s
sys     0m8.332s

real    0m18.096s
user    0m8.494s
sys     0m9.216s

Same speed as curl, but a lot more time on the CPU.

Then I set use_threads=false:

real    0m19.014s
user    0m0.870s
sys     0m0.346s

real    0m18.931s
user    0m0.987s
sys     0m0.422s

real    0m17.981s
user    0m0.728s
sys     0m0.561s

Still the same speed, but much more reasonable CPU:

Then I set my display to 30hz and tried again with use_threads=false:

real    0m30.904s
user    0m0.763s
sys     0m0.312s

Ok, that's a fair bit slower.
It's still weird to me that it uses process and not physics_process, but oh well.

Out of curiosity, I tried patching HTTPRequest to greedily consume all available chunks on each poll:

diff --git a/scene/main/http_request.cpp b/scene/main/http_request.cpp
index 700ba761f6..b035a14055 100644
--- a/scene/main/http_request.cpp
+++ b/scene/main/http_request.cpp
@@ -372,6 +372,7 @@ bool HTTPRequest::_update_connection() {
                                }
                        }

+                       while (true) {
                                client->poll();
                                if (client->get_status() != HTTPClient::STATUS_BODY) {
                                        return false;
@@ -391,6 +392,8 @@ bool HTTPRequest::_update_connection() {
                                        } else {
                                                body.append_array(chunk);
                                        }
+                               } else {
+                                       break;
                                }

                                if (body_size_limit >= 0 && downloaded.get() > body_size_limit) {
@@ -408,6 +411,7 @@ bool HTTPRequest::_update_connection() {
                                        call_deferred(SNAME("_request_done"), RESULT_SUCCESS, response_code, response_headers, body);
                                        return true;
                                }
+                       }

                        return false;
real    0m19.904s
user    0m0.594s
sys     0m0.326s

In a way, that's the best of both worlds, but I'm guessing HTTPRequest deliberately consumes only a single chunk per poll to avoid framerate drops?

@Faless
Copy link
Collaborator

Faless commented Mar 26, 2022

but I'm guessing HTTPRequest deliberately consumes only a single chunk per poll to avoid framerate drops?

Exactly, polling in a tight loop during process is not okay because it may stall the engine for too long if there the transfer rate is high.

@Calinou
Copy link
Member

Calinou commented Feb 8, 2024

I can still reproduce this as of 4.3.dev3 on Linux:

image

The download is going between 20 MB/s and 40 MB/s (megabytes) for reference, and the CPU is an i9-13900K. The speed matches a curl download of https://downloads.tuxfamily.org/godotengine/4.3/dev3/Godot_v4.3-dev3_export_templates.tpz so it's good, but curl only needs 3-4% CPU to achieve the same speed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants