Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimized physics object spawn time, improved bullet physics server code. #42643

Closed
wants to merge 2 commits into from

Conversation

AndreaCatania
Copy link
Contributor

@AndreaCatania AndreaCatania commented Oct 8, 2020

This is a combination of #39726 #40252 #41096 that got reverted here #42639

  • Optimized physics object spawn time, by integrating lazy body reload.
  • Optimized shape usage when the shape is not scaled.
  • Make sure to correctly fetch gravity when the integrate_forces function is used
  • Added override keyword
  • Using LocalVector

Test project: TestSpawnRB.zip you can just open an launch it.
The benchmark code is this:

extends Node

var count = 0

var shapes_per_frame := 2000
var shape := BoxShape3D.new()
var t1
var overall: int = 0

func _ready():
	t1 = OS.get_ticks_msec()
	_do_test()

func _physics_process(delta):
	if count < 3:
		var t2 = OS.get_ticks_msec()
		count += 1
		print("Frame ", count, ": ", t2 - t1, "ms")
		overall += t2 - t1
	else:
		print("Overall ", overall, "ms")
		get_tree().quit()

func _do_test():
	print("Test start")
	for i in range(0, shapes_per_frame):
		var rb = RigidBody3D.new()
		var c = CollisionShape3D.new()
		c.set_shape(shape)
		rb.add_child(c)
		add_child(rb)
		if i % 50 == 0:
			print("Added: ", i)
Test start
Added: 0
Added: 50
Added: 100
Added: 150
Added: 200
Added: 250
Added: 300
Added: 350
Added: 400
Added: 450
Added: 500
Added: 550
Added: 600
Added: 650
Added: 700
Added: 750
Added: 800
Added: 850
Added: 900
Added: 950
Added: 1000
Added: 1050
Added: 1100
Added: 1150
Added: 1200
Added: 1250
Added: 1300
Added: 1350
Added: 1400
Added: 1450
Added: 1500
Added: 1550
Added: 1600
Added: 1650
Added: 1700
Added: 1750
Added: 1800
Added: 1850
Added: 1900
Added: 1950
Frame 1: 33ms
Frame 2: 9146ms
Frame 3: 14503ms
Overall 23682ms

This is flame graph extracted from the perf of this PR:
Screenshot from 2020-08-07 09-43-54

This is flame graph extracted from the perf of master branch:
Screenshot from 2020-08-07 09-13-49

Without this PR, the set_ functions (like set_layer / set_mask) remove and add the body from the Physics World to recompute the entire hierarchy and cache.

Godot, call these functions one by one when a new RigidBody node is created, so the cache is internally reconstructed multiple times. With this PR, the body is created at the start of the frame and so all the code is executed only 1 time.

@akien-mga
Copy link
Member

akien-mga commented Oct 8, 2020

Do we have any demo projects available that have been ported to the current master branch? It would be nice to test physics changes on demos like the 3D platformer or TPS demo to check that everything work well (and ideally better when it's an optimizing like here, though I don't know if they would be impacted).

Maybe we could also have "pathological" physics demo that use lots of bodies and various APIs to stress test the implementation.

@akien-mga
Copy link
Member

Note: It's not particularly important, but remember that commit logs should be formatted as one title on the first line, then a blank line, and then the body of the commit message. Otherwise all of it is concerned part of the commit title, which leads to this kind of weirdness (see header): https://patch-diff.githubusercontent.com/raw/godotengine/godot/pull/42643.patch

@Calinou
Copy link
Member

Calinou commented Oct 8, 2020

Maybe we could also have "pathological" physics demo that use lots of bodies and various APIs to stress test the implementation.

We have one, but it needs to be ported to the master branch: https://github.com/godotengine/godot-demo-projects/tree/master/3d/physics_tests

@AndreaCatania
Copy link
Contributor Author

Here the test project TestSpawnRB.zip you can just open and launch it. You will see in the console the overall time taken to process 3 frames; you can test the PR and the current master branch to see the difference.

@AndreaCatania
Copy link
Contributor Author

Note: It's not particularly important, but remember that commit logs should be formatted as one title on the first line, then a blank line, and then the body of the commit message. Otherwise all of it is concerned part of the commit title, which leads to this kind of weirdness (see header): https://patch-diff.githubusercontent.com/raw/godotengine/godot/pull/42643.patch

Title added.

@akien-mga
Copy link
Member

Here's a quick and dirty port of the 3D platformer demo to current master: platformer.zip

@madmiraal
Copy link
Contributor

This PR is difficult to review, because it is both complex and trying to do multiple things. I think, to enable it to be reviewed and tested properly and effectively, it needs to be broken up into separate, self-contained commits and probably PRs; perhaps one for each of its deliverables:

  • Optimized physics object spawn time, by integrating lazy body reload.
  • Optimized shape usage when the shape is not scaled.
  • Make sure to correctly fetch gravity when the integrate_forces function is used
  • Added override keyword
  • Using LocalVector

@pouleyKetchoupp
Copy link
Contributor

@madmiraal I'm not totally opposed to splitting, but I don't think it's really necessary. At this point, I would rather review this PR as it is, because asking @AndreaCatania to group his PRs when they are split, then to split them when they are grouped is not going to help with (hopefully) finalizing this optimization soon :)

These two are a bit off-topic so they could be possibly separated. They don't add too much to the complexity though, because the first one is just one block and the second one is only in headers.

Make sure to correctly fetch gravity when the integrate_forces function is used
Added override keyword

The others are all about optimizing physics spawn time and they affect different areas of code.

  1. Optimized physics object spawn time, by integrating lazy body reload

This one is the only really complex change. For test purpose, different parts can be disabled by calling do_reload_shapes, do_reload_collision_filters or do_reload_body directly in some methods.

  1. Optimized shape usage when the shape is not scaled

This change is localized in shape_bullet so it's easy to review. For test purpose, it can be disabled by commenting the first lines in ShapeBullet::create_bt_shape.

  1. Using LocalVector

This one is only affecting vector accesses, and it's an easy, obvious optimization since it just gets rid of copy-on-write.

Copy link
Contributor

@pouleyKetchoupp pouleyKetchoupp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes generally look ok, I've made a few comments for possible improvements.

I'll try to run my own tests about the performance gain later this week.

modules/bullet/collision_object_bullet.h Outdated Show resolved Hide resolved
Comment on lines 395 to 409
for (uint32_t i = 0; i < queue_pre_flush.size(); i += 1) {
queue_pre_flush[i]->dispatch_callbacks();
queue_pre_flush[i]->is_in_flush_queue = false;
}
for (uint32_t i = 0; i < queue_flush.size(); i += 1) {
queue_flush[i]->dispatch_callbacks();
queue_flush[i]->is_in_flush_queue = false;
}
queue_pre_flush.clear();
queue_flush.clear();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments here to explain what the logic is behind having two different queues could make it clearer for future contributors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes please more comments for future contributors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 awesome thx!


btCollisionShape *ShapeBullet::create_bt_shape(const Vector3 &p_implicit_scale, real_t p_extra_edge) {
btVector3 s;
G_TO_B(p_implicit_scale, s);
return create_bt_shape(s, p_extra_edge);
}

btCollisionShape *ShapeBullet::create_bt_shape(const btVector3 &p_implicit_scale, real_t p_extra_edge) {
if (p_extra_edge == 0.0 && (p_implicit_scale - btVector3(1, 1, 1)).length2() <= CMP_EPSILON) {
return default_shape;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not creating the default shape here only when needed, to avoid always creating two shapes when there is a scale? Also, maybe with a ref count for the default shape to know when to destroy it, rather than storing the old default shape? That would simplify the logic around notifyShapeChanged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good catch! Integrated.

@pouleyKetchoupp
Copy link
Contributor

I have problems to do proper performance tests at the moment, because I'm getting crashing asserts from Bullet when running TestSpawnRB project on latest master.

It's this assert in bvVector3::normalize():

btAssert(!fuzzyZero());
[Inline Frame] godot.windows.opt.tools.64.exe!btVector3::normalize() Line 305	C++
[Inline Frame] godot.windows.opt.tools.64.exe!btVector3::normalized() Line 953	C++
[Inline Frame] godot.windows.opt.tools.64.exe!btSingleSweepCallback::{ctor}(const btConvexShape *) Line 1040	C++
godot.windows.opt.tools.64.exe!btCollisionWorld::convexSweepTest(const btConvexShape * castShape, const btTransform & convexFromWorld, const btTransform & convexToWorld, btCollisionWorld::ConvexResultCallback & resultCallback, float allowedCcdPenetration) Line 1101	C++
godot.windows.opt.tools.64.exe!btDiscreteDynamicsWorld::integrateTransformsInternal(btRigidBody * * bodies, int numBodies, float timeStep) Line 1002	C++
godot.windows.opt.tools.64.exe!btDiscreteDynamicsWorld::integrateTransforms(float timeStep) Line 1056	C++
godot.windows.opt.tools.64.exe!btDiscreteDynamicsWorld::internalSingleStepSimulation(float timeStep) Line 489	C++
godot.windows.opt.tools.64.exe!btSoftRigidDynamicsWorld::internalSingleStepSimulation(float timeStep) Line 91	C++
godot.windows.opt.tools.64.exe!btDiscreteDynamicsWorld::stepSimulation(float timeStep, int maxSubSteps, float fixedTimeStep) Line 435	C++
godot.windows.opt.tools.64.exe!BulletPhysicsServer3D::step(float p_deltaTime) Line 1552	C++

It looks like it's due to something weird happening around ccd. Looking at the callstack, this code shouldn't be run on bodies with default settings.

From what I see, ccd seems to be activated on all bodies with this callstack:

>	godot.windows.tools.64.exe!RigidBodyBullet::set_continuous_collision_detection(bool p_enable) Line 734	C++
 	godot.windows.tools.64.exe!RigidBodyBullet::main_shape_changed() Line 309	C++
 	godot.windows.tools.64.exe!RigidCollisionObjectBullet::reload_shapes() Line 361	C++
 	godot.windows.tools.64.exe!RigidBodyBullet::reload_shapes() Line 811	C++
 	godot.windows.tools.64.exe!RigidCollisionObjectBullet::add_shape(ShapeBullet * p_shape, const Transform & p_transform, bool p_disabled) Line 231	C++
 	godot.windows.tools.64.exe!BulletPhysicsServer3D::body_add_shape(RID p_body, RID p_shape, const Transform & p_transform, bool p_disabled) Line 501	C++
 	godot.windows.tools.64.exe!CollisionObject3D::shape_owner_add_shape(unsigned int p_owner, const Ref<Shape3D> & p_shape) Line 253	C++

Because main_shape_changed is called two times for each added body. The first time it disables ccd, which sets the internal ccd threshold to be 10000. The second time, it enables ccd again because the internal threshold is > 0, due to this method:

bool RigidBodyBullet::is_continuous_collision_detection_enabled() const {
	return 0. < btBody->getCcdMotionThreshold();
}

I'll open a PR to propose a fix so performance changes from this PR can be properly compared on master.

…t server.

- Optimized physics object spawn time, by integrating lazy body reload.
- Optimized shape usage when the shape is not scaled.
- Make sure to correctly fetch gravity when the integrate_forces function is used
- Added override keyword
- Using LocalVector
@AndreaCatania
Copy link
Contributor Author

@pouleyKetchoupp I saw that you fixed the CCD issue. I've rebased and the crash seems gone. Though, I didn't had that, so I can't really say it's fixed. Please let me know if you still have the crash (in case you still have that, plese paste the stack trace).

@AndreaCatania
Copy link
Contributor Author

Never mind, I'm able to reproduce the issue. Thought, it's not affecting this PR any more.

@AndreaCatania
Copy link
Contributor Author

@pouleyKetchoupp did you try this after the CCD fix?

@pouleyKetchoupp
Copy link
Contributor

@AndreaCatania I've made some quick tests but I still need to do more in-depth performance comparison. I've been busy with other things but I'll finish reviewing it as soon as I can find extra time.

@AndreaCatania
Copy link
Contributor Author

@pouleyKetchoupp Ok thanks!

@pouleyKetchoupp
Copy link
Contributor

pouleyKetchoupp commented Jan 6, 2021

I finally got to do proper performance comparisons, sorry for the delay!

Here's the project I'm using (it's part of the physics tests framework, ported to 4.0):
broadphase-test-4.0.zip

This PR is also needed on master to make it work properly:
#44104 - Fix error when calling coroutine with await in _ready

And I can confirm a clear improvement when adding objects (about 2x faster with 125000 objects on my configuration, when counting the extra time spent on the next physics step, since now a part of the process is postponed).

On the other hand, it seems removing objects is almost 3x slower than before. I haven't investigated why exactly. @AndreaCatania could you please check that?

Here are detailed timings I get with the attached test scene (timings are pretty consistent).
Physics ticks timings are deltas between two physics ticks, and the first one includes the time spent to create/add/move/remove objects as well.

Bullet / Current master

* Creating objects...
  Create Time: 1588.323 ms
  Physics Tick: 1595.326 ms (total = 1595.326 ms)
  Physics Tick: 1.999 ms (total = 1597.325 ms)
  Physics Tick: 1.231 ms (total = 1598.556 ms)
  Physics Tick: 1.506 ms (total = 1600.062 ms)
* Adding objects...
  Add Time: 9086.594 ms
  Physics Tick: 9390.825 ms (total = 9390.825 ms)
  Physics Tick: 564.139 ms (total = 9954.964 ms)
  Physics Tick: 246.837 ms (total = 10201.801 ms)
  Physics Tick: 245.333 ms (total = 10447.134 ms)
* Moving objects...
  Move Time: 112.258 ms
  Physics Tick: 251.868 ms (total = 251.868 ms)
  Physics Tick: 579.750 ms (total = 831.618 ms)
  Physics Tick: 245.037 ms (total = 1076.655 ms)
  Physics Tick: 245.109 ms (total = 1321.764 ms)
* Removing objects...
  Remove Time: 2254.943 ms
  Physics Tick: 2276.565 ms (total = 2276.565 ms)
  Physics Tick: 3.215 ms (total = 2279.780 ms)
  Physics Tick: 1.398 ms (total = 2281.178 ms)
  Physics Tick: 1.228 ms (total = 2282.406 ms)

Bullet / PR 42643

* Creating objects...
  Create Time: 1464.757 ms
  Physics Tick: 1481.533 ms (total = 1481.533 ms)
  Physics Tick: 2.624 ms (total = 1484.157 ms)
  Physics Tick: 2.054 ms (total = 1486.211 ms)
  Physics Tick: 1.397 ms (total = 1487.608 ms)
* Adding objects...
  Add Time: 884.057 ms
  Physics Tick: 1010.470 ms (total = 1010.470 ms)
  Physics Tick: 3700.365 ms (total = 4710.835 ms)
  Physics Tick: 246.892 ms (total = 4957.727 ms)
  Physics Tick: 247.128 ms (total = 5204.855 ms)
* Moving objects...
  Move Time: 109.260 ms
  Physics Tick: 241.588 ms (total = 241.588 ms)
  Physics Tick: 575.832 ms (total = 817.420 ms)
  Physics Tick: 248.515 ms (total = 1065.935 ms)
  Physics Tick: 247.323 ms (total = 1313.258 ms)
* Removing objects...
  Remove Time: 5944.551 ms
  Physics Tick: 5995.082 ms (total = 5995.082 ms)
  Physics Tick: 2.237 ms (total = 5997.319 ms)
  Physics Tick: 2.046 ms (total = 5999.365 ms)
  Physics Tick: 1.515 ms (total = 6000.880 ms)

Godot Physics 3D

* Creating objects...
  Create Time: 1417.275 ms
  Physics Tick: 1421.562 ms (total = 1421.562 ms)
  Physics Tick: 11.435 ms (total = 1432.997 ms)
  Physics Tick: 2.029 ms (total = 1435.026 ms)
  Physics Tick: 1.596 ms (total = 1436.622 ms)
* Adding objects...
  Add Time: 1280.441 ms
  Physics Tick: 1318.668 ms (total = 1318.668 ms)
  Physics Tick: 357.871 ms (total = 1676.539 ms)
  Physics Tick: 227.257 ms (total = 1903.796 ms)
  Physics Tick: 228.116 ms (total = 2131.912 ms)
* Moving objects...
  Move Time: 120.289 ms
  Physics Tick: 512.350 ms (total = 512.350 ms)
  Physics Tick: 224.549 ms (total = 736.899 ms)
  Physics Tick: 2.229 ms (total = 739.128 ms)
  Physics Tick: 18.212 ms (total = 757.340 ms)
* Removing objects...
  Remove Time: 465.413 ms
  Physics Tick: 484.925 ms (total = 484.925 ms)
  Physics Tick: 2.018 ms (total = 486.943 ms)
  Physics Tick: 2.646 ms (total = 489.589 ms)
  Physics Tick: 1.151 ms (total = 490.740 ms)

@akien-mga
Copy link
Member

This would need to be redone for 3.x as Bullet was removed from the master branch, so it can be merged.

I also agree that the scope of this PR seems to be a bit big and it might be easier to assess and merge some of the changes independently (e.g. switch to LocalVector could likely be merged as is).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants