Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add multithreaded Godot Engine Bullet processing. #40073

Closed
wants to merge 1 commit into from

Conversation

fire
Copy link
Member

@fire fire commented Jul 2, 2020

Test case godotengine/godot-demo-projects#483.

See #40072.

There's a crash.

#40064 needs to have a proposal written.

Run test case, on Win10 when in 4-5th test it crashes.

Soft body is disabled when using multicore.

@starwolfy
Copy link

Here is the proposal
Thank you @fire !

@fire fire force-pushed the multicore-bullet branch 2 times, most recently from 722b422 to f6c7b50 Compare July 3, 2020 18:47
}

dispatcher = bulletnew(GodotCollisionDispatcher(collisionConfiguration));
broadphase = bulletnew(btDbvtBroadphase);
solver = bulletnew(btSequentialImpulseConstraintSolver);
int32_t process_count = OS::get_singleton()->get_processor_count() - 2;
Copy link
Contributor

@Ansraer Ansraer Jul 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain your reasoning for hardcoding the number of used threads? I am not sure how exactly multithreading is implemented in bullet and how well it scales, but I am guessing that after some point (e.g. machines with >=16/32) It would make more sense to keep some of them free for stuff like graphics or audio.

Copy link
Member Author

@fire fire Jul 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming that 2 is the number of hyperthreads that consist of one core. I then subtract it from the list of logical threads the os sees. The main thing is to not step on the gameplay thread (Main).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am guessing this would work fine as long as there are not too many other threads out there. However, this might fail terribly if a dev writes a heavily parallelized game. In the end, it all comes down to how the threads are distributed.
I personally would probably set a max limit of physics threads, but tbh it probably won't ever really matter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we end up integrating this feature, the best would be to have the max number of threads in project settings. As you mentioned, the right value really depends on how threads are used for the whole project.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The project configuration will default to 0 and any integer number above 0 will be the limit of core threads.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, some sort of config option would be nice. However, I am not sure if we want to use integers since the dev has no way of knowing what hardware players have. It's not a perfect solution, but what about using percentages?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure a percent where it is a float number from 0 to 1? Have to see how other settings handle this.

@fire
Copy link
Member Author

fire commented Jul 3, 2020

Testing by @pouleyKetchoupp showed that my gut feel of single threaded physics is faster than multicore. Not sure where the bottleneck is.

@Ansraer
Copy link
Contributor

Ansraer commented Jul 4, 2020

Could you link us the produced data and information on how exactly this test works?
Right now it doesn't really surprise me that a parallel algorithm is slower than a sequential one. That's almost always the case for small enough n.
Could it be that the test wasn't demanding enough to really push the sequential algorithm to it's limit?

Really hope that's the problem, because all other reasons that come to mind would be an absolute nightmare to debug.

@pouleyKetchoupp
Copy link
Contributor

Here's the demo scene I was using for tests:
https://github.com/godotengine/godot-demo-projects/blob/master/3d/physics_tests/tests/performance/test_perf_contacts.tscn

Godot target: release_debug
OS: Windows 10
CPU: core i7 (2 cores, 4 logical processors)

So far I've only checked the integrated profiler in Godot to compare the spike in physics, and I got ~1s in single-threaded mode and ~1.2s in multi-threaded mode.

It might be because this stress test is not pushing the physics in the areas that are parallelized (there's only 100 bodies at once in this specific test, all added in the same spot to generate lots of contacts). It could be also that Bullet doesn't do such a good job at parallelizing, so it needs more investigation to see if there are some parameters to change that could give better results.

So I'm planning to check with more detailed profiling and with more different scenarios. I'll share more updates here.

@Ansraer
Copy link
Contributor

Ansraer commented Jul 4, 2020

100 bodies sounds like a very small problem, which would favor the sequential algorithm.
Maybe try adding more cubes over time. If everything works as intended the runtime of the parallel algorithm should rise significantly slower than that of the sequential one.

@pouleyKetchoupp
Copy link
Contributor

The scenario I've tested is a stress test for the solver, and it's important since that part of Bullet supports parallelization. The number of bodies is limited on purpose, so you can specifically identify performance problems in that part of the physics engine. I've started with this scenario because I had a test scene ready for it.

I agree though about the fact a scenario like the one you describe is needed too. It will help to stress out other parts of the physics engine that scale up with the number of bodies in the simulation.

}

dispatcher = bulletnew(GodotCollisionDispatcher(collisionConfiguration));
broadphase = bulletnew(btDbvtBroadphase);
solver = bulletnew(btSequentialImpulseConstraintSolver);
int32_t process_count = OS::get_singleton()->get_processor_count() - 2;
solver_pool = bulletnew(btConstraintSolverPoolMt(process_count));
solver_mutithread = bulletnew(btSequentialImpulseConstraintSolverMt);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

classbtSequentialImpulseConstraintSolverMt is the single island solver. To gain performance may be helpful to tune this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent the last half hour reading through the forums on pybullet.org. I only found a handful of mentions of other solvers, and each and every time users were warned against using them since they were badly optimized.
As far as I can tell from what little information I found btSequentialImpulseConstraintSolverMt is the way to go.
Would appreciate it if someone who knows more about bullet could confirm this.

@fire
Copy link
Member Author

fire commented Jul 6, 2020

I'm stalled. Not sure where to go with this.

@pouleyKetchoupp
Copy link
Contributor

pouleyKetchoupp commented Jul 6, 2020

I'm still up for testing more things when I find time and see where to go from there. Maybe you can convert it to draft for now since it's still in evaluation phase?

@fire fire marked this pull request as draft July 6, 2020 19:17
@Ansraer
Copy link
Contributor

Ansraer commented Jul 8, 2020

Ok, so I spent some more time looking into Bullet Physics. (I think) I am making steady progress. @fire , could it be that you forgot to update the Godot collision dispatcher? Right now it extends btCollisionDispatcher, which is fine for single-threaded physics, but we will probably need btCollisionDispatcherMt for multithreading. If we want to support both we will probably need a second godot_collision_dispatcher.

@fire
Copy link
Member Author

fire commented Jul 9, 2020

@Ansraer I update[d] the pr. However, I can't test this in 4.0 at all. The 3.2 branch is testable but different.

@Ansraer
Copy link
Contributor

Ansraer commented Jul 11, 2020

I just built this in 4.0 and created a little test scene for it. What I did was to spawn a new cube every 100 ticks and let it drop onto a plane.
From what I could tell all 8 threads were used by godot, with godot taking about 20ms for a physics tick. I tried to compare that to single threaded performance,but had some trouble getting a consistent results since my laptop usually ended up thermal throttling sooner or later.
It feels like something is working, but I would appreciate it if someone with more stable thermals could run a similar test and post the results. Would be good to know how big the performance difference really is and what speedup we can expect.

@fire fire force-pushed the multicore-bullet branch 2 times, most recently from 3fdadd4 to dd27de0 Compare July 11, 2020 17:54
@fire fire marked this pull request as ready for review November 12, 2020 03:45
@fire
Copy link
Member Author

fire commented Nov 12, 2020

Action items:

Best would be to have the max number of threads in project settings. As you mentioned, the right value really depends on how threads are used for the whole project.

A percent where it is a float number from 0 to 1? It will default to 1.

Edited:

Done.

@fire
Copy link
Member Author

fire commented Nov 12, 2020

Anyone willing to create a 4.0 sample test?

@fire fire force-pushed the multicore-bullet branch 3 times, most recently from 185cae4 to a85ab9d Compare November 12, 2020 07:48
@fire
Copy link
Member Author

fire commented Dec 23, 2020

@pouleyKetchoupp can use this as a reference but will be closing the pr since I won't be updating it and I didn't see a performance increase.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants