improved Collada loader #569
base: gz-common5
Conversation
Signed-off-by: frederik <[email protected]>
@marcoag @mjcarroll It appears that 3 tests are failing. I am not sure whether this is caused by my modifications, as I ran the tests on the main branch and they seem to fail there as well. Do you have an idea?
@frede791 Mind splitting out the linting changes from the functional code changes, or removing the linting changes completely? That will make it easier to review the actual changes and also easier to debug.
Thanks, this patch looks very promising! I was able to reproduce your numbers as well. I had a few very minor suggestions around the code. Here's my most important question:
Your description mentions that the optimizations are ranked in order of impact. Could you share some extra information about how much each optimization contributes to the speedup? I'm surprised that optimization (1) is so important, as we were mostly passing references.
@@ -330,7 +339,7 @@ namespace gz
   void hash_combine(std::size_t &_seed, const double &_v)
   {
     std::hash<double> hasher;
-    _seed ^= hasher(_v) + 0x9e3779b9 + (_seed << 6) + (_seed >> 2);
Could you explain why the constant was removed, please?
Adding a constant doesn't change the distribution of the hash function but costs about 0.2 seconds.
That is unexpected!
I think I took the code from https://stackoverflow.com/a/2595226. That was a while ago; maybe there's a better way to do it now.
The current implementation of the hash function accounts for about 0.1 seconds on the Torino map.
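For readers following along, here is a sketch contrasting the boost-style combiner removed in the diff above with a constant-free variant of the kind this thread describes. The exact replacement line is not shown in this excerpt, so the simplified form below is an assumption:

#include <cstddef>
#include <functional>

// Boost-style combiner, as in the line removed in the diff above.
void hash_combine_boost(std::size_t &_seed, const double &_v)
{
  std::hash<double> hasher;
  _seed ^= hasher(_v) + 0x9e3779b9 + (_seed << 6) + (_seed >> 2);
}

// Presumed simplified combiner: per the thread, dropping the constant
// does not change the distribution for this workload but saves roughly
// 0.2 seconds of arithmetic across the full load.
void hash_combine_simple(std::size_t &_seed, const double &_v)
{
  std::hash<double> hasher;
  _seed ^= hasher(_v) + (_seed << 6) + (_seed >> 2);
}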
@caguero I ran the code through Intel VTune and then analyzed where the majority of the startup time was spent. By far the largest time save comes from changing the
@bperseghetti @caguero I will undo or improve the linting changes and address the other requests tomorrow.
I can take a closer look in the morning.
Signed-off-by: frederik <[email protected]>
Force-pushed from 9516485 to c22b4d9
Signed-off-by: frederik <[email protected]>
Force-pushed from 1098a84 to f0ff34f
@caguero @bperseghetti I have addressed all issues raised above. Codecheck passes as well.
…loop, use emplace_back
Signed-off-by: frederik <[email protected]>
Force-pushed from 4cff57c to baa748a
Signed-off-by: frederik <[email protected]>
Signed-off-by: frederik <[email protected]>
Force-pushed from 1571a65 to 6911db6
So I think the failures are "real" in that those tests are segfaulting. Edit: and of course they don't reproduce locally.
graphics/src/ColladaLoader.cc (outdated)
@@ -2028,10 +2241,10 @@ void ColladaLoader::Implementation::LoadPolylist(
   std::string offset = polylistInputXml->Attribute("offset");
   if (semantic == "VERTEX")
   {
-    unsigned int count = norms.size();
+    unsigned int count = (*norms).size();
This is where the segfault is happening in CI. Since norms is an uninitialized shared_ptr, norms->size() isn't a valid call at this point. I suppose it depends on your compiler/system whether something kind of bad (data corruption) or really bad (segfault) happens here, because it seems to only happen consistently on my jammy system.
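As a minimal, self-contained illustration of that failure mode (hypothetical stand-in code, not the loader itself):

#include <memory>
#include <vector>

int main()
{
  // Default-constructed shared_ptr: holds nullptr, like `norms` above.
  std::shared_ptr<std::vector<int>> norms;

  // Undefined behavior if uncommented: dereferencing a null shared_ptr.
  // Depending on compiler/platform it may appear to work, corrupt data,
  // or segfault, which matches the inconsistent CI behavior.
  // std::size_t bad = norms->size();

  // Safe pattern: initialize (or at least check) before use.
  if (!norms)
    norms = std::make_shared<std::vector<int>>();
  return static_cast<int>(norms->size());  // valid, returns 0
}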
I fixed this with the latest commit.
The remaining test run failure for the Collada loader seems to be the result of a mismatch between the actual mesh vertex and normal counts and the expected counts.
Overall, I am on board with the approach of minimizing copies where possible here, but I'm not sure yet if switching to shared_ptrs is the exact approach that we want. I'm going to spend a little more time with this today and see if I can come up with concrete feedback.
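For context on that design question, a rough sketch of the two parameter-passing styles under discussion; the names and types here are illustrative stand-ins, not the loader's actual API:

#include <memory>
#include <vector>

struct Vec3 { double x, y, z; };  // stand-in for gz::math::Vector3d

// Style this PR adopts: shared ownership avoids copies, but the pointer
// can be null (see the segfault above) and lifetime becomes shared state.
void LoadValuesShared(std::shared_ptr<std::vector<Vec3>> &_values)
{
  if (!_values)
    _values = std::make_shared<std::vector<Vec3>>();
  _values->push_back(Vec3{0.0, 0.0, 1.0});
}

// Alternative: an out-parameter by reference is equally copy-free,
// cannot be null, and leaves ownership at the call site.
void LoadValuesByRef(std::vector<Vec3> &_values)
{
  _values.push_back(Vec3{0.0, 0.0, 1.0});
}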
There are also existing memory issues in some of these tests that ASAN catches. I'm going to try to get those resolved so we have a clean baseline to discuss from. Make it work right and then make it work fast, right?
Signed-off-by: frederik <[email protected]>
Okay, I have addressed any current ASAN issues in the
@mjcarroll I am not quite sure why the ColladaLoader test failure is happening. CI tells me that there is a mismatch between the expected vertex/normal counts and the actual counts, but I am not sure why, especially since it is only 1 of the 14 tests in the suite. I could see if the changes from #571 would have an impact. What do you think?
@mjcarroll Would you be able to take another look at this?
I made a comment with a patch that gets the test to pass. It reverts a check but I'm not sure if that change is intended or not.
  // create a map of duplicate indices
  if((*prev_vec) != vec){
Looked into the test failure and found that this check results in more unique normals. Not sure if I understand the logic here: a vector is marked as unique if it's not equal to the previous one? Say we have the following vectors: A = [0, 0, 1], B = [0, 1, 0], C = [0, 0, 1]. With this logic, C will be marked as unique because it does not equal B, even though it duplicates A (see the standalone sketch after the patch below).
If I remove the extra check with the following patch, the test passes:
collada.patch
diff --git a/graphics/src/ColladaLoader.cc b/graphics/src/ColladaLoader.cc
index b7979a3..c6c231c 100644
--- a/graphics/src/ColladaLoader.cc
+++ b/graphics/src/ColladaLoader.cc
@@ -1562,8 +1562,6 @@ void ColladaLoader::Implementation::LoadPositions(const std::string &_id,
   auto values = toDoubleVec(valueStr, totCount);
   gz::math::Vector3d vec;
-  std::shared_ptr<gz::math::Vector3d> prev_vec =
-    std::make_shared<gz::math::Vector3d>(gz::math::Vector3d::Zero);
   if (!_values)
     _values = std::make_shared<std::vector<gz::math::Vector3d>>();
   if (!_duplicates)
@@ -1579,16 +1577,10 @@ void ColladaLoader::Implementation::LoadPositions(const std::string &_id,
     (*_values).emplace_back(vec);
     // create a map of duplicate indices
-    if((*prev_vec) != vec){
-      if (unique.find(vec) != unique.end())
-        (*_duplicates)[(*_values).size()-1] = unique[vec];
-      else
-        unique[vec] = (*_values).size()-1;
-    }
+    if (unique.find(vec) != unique.end())
+      (*_duplicates)[(*_values).size()-1] = unique[vec];
     else
       unique[vec] = (*_values).size()-1;
-
-    (*prev_vec) = vec;
   }
   this->positionDuplicateMap[_id] = _duplicates;
@@ -1728,8 +1720,6 @@ void ColladaLoader::Implementation::LoadNormals(const std::string &_id,
   auto values = toDoubleVec(valueStr, totCount);
   gz::math::Vector3d vec;
-  std::shared_ptr<gz::math::Vector3d> prev_vec =
-    std::make_shared<gz::math::Vector3d>(gz::math::Vector3d::Zero);
   if (!_values)
     _values = std::make_shared<std::vector<gz::math::Vector3d>>();
   if (!_duplicates)
@@ -1747,16 +1737,10 @@ void ColladaLoader::Implementation::LoadNormals(const std::string &_id,
     (*_values).emplace_back(vec);
     // create a map of duplicate indices
-    if((*prev_vec) != vec){
-      if (unique.find(vec) != unique.end())
-        (*_duplicates)[(*_values).size()-1] = unique[vec];
-      else
-        unique[vec] = (*_values).size()-1;
-    }
+    if (unique.find(vec) != unique.end())
+      (*_duplicates)[(*_values).size()-1] = unique[vec];
     else
       unique[vec] = (*_values).size()-1;
-
-    (*prev_vec) = vec;
   }
   this->normalDuplicateMap[_id] = _duplicates;
@@ -1917,8 +1901,6 @@ void ColladaLoader::Implementation::LoadTexCoords(const std::string &_id,
   auto values = toDoubleVec(valueStr, totCount);
   gz::math::Vector2d vec;
-  std::shared_ptr<gz::math::Vector2d> prev_vec =
-    std::make_shared<gz::math::Vector2d>(gz::math::Vector2d::Zero);
   if (!_values)
     _values = std::make_shared<std::vector<gz::math::Vector2d>>();
   if (!_duplicates)
@@ -1933,16 +1915,10 @@ void ColladaLoader::Implementation::LoadTexCoords(const std::string &_id,
     (*_values).emplace_back(vec);
     // create a map of duplicate indices
-    if((*prev_vec) != vec){
-      if (unique.find(vec) != unique.end())
-        (*_duplicates)[(*_values).size()-1] = unique[vec];
-      else
-        unique[vec] = (*_values).size()-1;
-    }
+    if (unique.find(vec) != unique.end())
+      (*_duplicates)[(*_values).size()-1] = unique[vec];
     else
       unique[vec] = (*_values).size()-1;
-
-    (*prev_vec) = vec;
   }
   this->texcoordDuplicateMap[_id] = _duplicates;
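To make the duplicate-detection logic concrete, here is the standalone sketch referenced above (illustrative code, not the loader's) reproducing the A/B/C example with the short-circuit removed:

#include <cstddef>
#include <iostream>
#include <map>
#include <vector>

int main()
{
  // Stand-in for gz::math::Vector3d; std::vector orders lexicographically.
  using Vec3 = std::vector<double>;

  // A, B, C from the comment above: A and C are duplicates.
  const std::vector<Vec3> values = {{0, 0, 1}, {0, 1, 0}, {0, 0, 1}};

  std::map<Vec3, std::size_t> unique;       // value -> first index seen
  std::map<std::size_t, std::size_t> dups;  // later index -> first index

  for (std::size_t i = 0; i < values.size(); ++i)
  {
    auto it = unique.find(values[i]);
    if (it != unique.end())
      dups[i] = it->second;  // C (index 2) correctly maps back to A (index 0)
    else
      unique[values[i]] = i;
  }

  // With the prev_vec short-circuit, C would only be compared against B
  // and wrongly counted as unique; without it, dups == {2: 0}.
  for (const auto &[later, first] : dups)
    std::cout << later << " duplicates " << first << "\n";
  return 0;
}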
I'll go ahead and remove
Bug fix
Summary
This PR accelerates the loading of Collada files, which for large meshes drastically cuts down loading times.
This was done using several techniques, ranked below in order of impact:
To see the improvements for yourself, I have linked a folder called torino below, containing a medium-high-resolution map of the center of Turin, Italy. You can run this map using:
GZ_SIM_RESOURCE_PATH=/path/to/parent/dir gz sim -r /path/to/torino/torino.sdf
When running the default ColladaLoader.cc, this map took about 85 seconds to load on my machine; with this new implementation it took about 10 seconds. For larger maps I have seen even more drastic improvements: a map that previously had not finished loading after more than 2 hours now takes only 3 minutes.
One technique that I did not manage to use was parallelization. A majority of the startup time is still spent in the LoadTriangles() function, which itself calls several other functions; based on my understanding, these could not be parallelized. The other major cost is now the XML parsing itself. The current implementation uses tinyxml2; while good, some research suggests that several other parsers are even faster (pugixml, rapidxml). Rewriting the XML sections to use one of these would be a worthwhile future endeavor, as it could yield another large performance gain for big files.
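As a taste of that possible future direction, a minimal pugixml-based sketch; the file name and element path are illustrative only, and this is not part of the PR:

#include <iostream>
#include <pugixml.hpp>

int main()
{
  pugi::xml_document doc;
  pugi::xml_parse_result result = doc.load_file("model.dae");
  if (!result)
  {
    std::cerr << "parse error: " << result.description() << "\n";
    return 1;
  }

  // Walk the <geometry> elements much like the Collada loader does.
  for (pugi::xml_node geom :
       doc.child("COLLADA").child("library_geometries").children("geometry"))
  {
    std::cout << "geometry id: " << geom.attribute("id").value() << "\n";
  }
  return 0;
}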
Link to "torino" map:
https://drive.google.com/drive/folders/13TVY2aOQiS-SN2l3HkUXauy1qlJFKaGa?usp=sharing
Checklist
codecheck passed (see contributing)

Note to maintainers: Remember to use Squash-Merge and edit the commit message to match the pull request summary while retaining Signed-off-by messages.