-
-
Notifications
You must be signed in to change notification settings - Fork 874
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: CollectDirtyChildren causing app to hang when saving object with many pointers #1589
base: master
Are you sure you want to change the base?
fix: CollectDirtyChildren causing app to hang when saving object with many pointers #1589
Conversation
This may be related to #1588. My suggestion would be to port the test cases I listed first (in this PR), then if that utility method is broken, add the fix (it’s in the comment also). I say this because saving child objects relies on batch saves and looks like the batching decision for large arrays is messed up. If the Android SDK is using the same algorithm for the batch calculation, it will probably need to be fixed also. |
Parse/Parse/PFObject.m
Outdated
@@ -207,7 +207,7 @@ + (BFTask *)_enqueue:(BFTask *(^)(BFTask *toAwait))taskStart forObjects:(NSArray | |||
+ (BOOL)collectDirtyChildren:(id)node | |||
children:(NSMutableSet *)dirtyChildren | |||
files:(NSMutableSet *)dirtyFiles | |||
seen:(NSSet *)seen | |||
seen:(NSMutableSet *)seen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this more, this may or not be the right solution. This is a recursive call, depending on how it's used this may or may not need to be mutable.
Does the original implementation work fine on a number of smaller children? If it does, I'm assuming it's less than 50, and the issue comes up over 50. If so, the problem might be the batching calculation I mentioned.
When I wrote those test cases, when it went over the set amount to batch (50), it returned only the first 50 objects every time which sounds like a similar symptom that you are reporting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a batching issue as batching will happen after all dirty children are calculated. In my situation this recursive method spins without finishing, consuming all CPU. It does indeed work fine for small objects but explodes quickly. There is no batching involved in my case as I know there is only one dirty object, the root object, but this method will run into a seemingly endless loop trying to traverse all child objects.
This recursive call is internal and only called in one place passing in a NSMutableSet
(changing this to require a mutable set only failed in this one call site).
The problem here is triggered by our complex schema with a couple of classes, where each unfortunately is linked back to a root parent class. Now when we use arrays that are traversed and at the end many pointers point to the same parent, this parent is visited many, many, many times, and this explodes very quickly.
p = [Parent]
a.parent = p
b.parent = p
c.parent = p
d.parent = p
r = [Root]
r.objects = [a, b, c, d]
// saving this root will visit each of the children a, b, c, d,
// but the algorithm will not remember that p has already been visited and p will be visited 4 times,
// including all of p's children.
r.saveInBackground()
This explodes exponentially when you have layers of nested pointers to the common object or if you imagine the parent having pointers to other objects. This happens before the dirty children are returned, so before any batching could occur, so I don't think #1588 is necessarily related.
In my case this method never returns in reasonable time (I think it will return eventually, I cannot detect an endless loop in there, but that may take years).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
p = [Parent]
a.parent = p
b.parent = p
c.parent = p
d.parent = p
r = [Root]
r.objects = [a, b, c, d]
// saving this root will visit each of the children a, b, c, d,
// but the algorithm will not remember that p has already been visited and p will be visited 4 times,
// including all of p's children.
r.saveInBackground()
I agree that it seems like it should remember p
after visiting a
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also seems like this would be remedied if p
points to all it's children and if the children really want to know their relation to their parent p
, they just add a field referencing p
's objectId
, not pointing back up to p
, allowing everything to traverse. This was the issue that was coming up in the other issue you referenced.
Eyeballing android, it seems to have may the batch count issue also https://github.com/parse-community/Parse-SDK-Android/blob/a5cfa1f9c6654281c76367ad8b3252e2499bb821/parse/src/main/java/com/parse/Lists.java#L67 |
Can you rebase this with the master branch to run the CI? Also, it sounds like you have a possible testcase to verify your bug fix, can you add that also? |
fc60a5a
to
1f26e35
Compare
Rebased to Does my explaination of the flaw in the algorithm make sense though? |
The explanation makes sense, but I need to think more if the case you presented should work and not introduce other issues. The problems listed in #982 look like circular dependencies to me that the algorithm couldn't detect. In those examples, it doesn't seem like they setup the schema in a way conducive to Parse. What have you seen after your change? Are you using the change in production? Do you have any sample code you can show that demonstrates the issue? |
@dplewis @mtrezza @TomWFox what do you all think? This was also brought up in the Android repo. Sounds like it can be a big problem. The questions I have is “what should the algorithm do in this scenario?”. “Does the change fix the problem without introducing other problems?”. I left some other comments above also... |
Yes, we are using this fix in production now as it has fixed the issue we were seeing of the CPU spinning when trying to save a dirty object that contains an array of pointers to other objects, which point to quite a lot of other objects. I at first also thought about circular dependencies that this algorithm wouldn't detect and that led me down to this code here, where objects would be visited multiple times. Now of course we have circular dependencies everywhere, but the algorithm is 1) detecting the only problematic case just time (two new objects pointing to each other) and 2) is supposed to stop when an object has already been In our case it has been working fine without issues but a recent addition of another level of pointers to our schema made the whole thing spin forever. Again, I cannot proof that this is stuck in an infinite loop or only for 200+ years. We have the same fix in production in Android since a few month and have not seen any issues related to this, but our saving usually only saves a single dirty object in one operation. |
Forgive my limited ObjC / Swift / iOS skills, but a simple test case would be:
No circular dependencies. This call should succeed in both cases, with or without my fix, but without my fix |
in this particular framework, every PFObject is a class (reference type) and since you didn’t save the parent first it might have trouble hashing to a unique value (It doesn’t have an objectId yet) causing the parent not to be recognized as the same object. If you really wanted to do this it seems you should save the parent first (Gets an objectId), then make the children point to the parent after the parent is saved |
Two objects attempting to point to the same object that is unsaved and therefore not unique can cause a problem |
If PFObjects were structs and equatable then there probably wouldn’t be an issue with this, but they are not here |
This would have disastrous consequences already as you'd end up with multiple dirty instances if that was the case. Also I'm pretty sure that because PFObject is a class and not a struct, two pointers to the parent will be identical (reference) and the hash will be identical and the object would only be added to the set once. But this is not the point I am trying to make, my algorithm change will not change what happens when two objects point to the same unsaved object. |
Classes don’t hash to the same value unless you make them conform to hashable/equatable or do some bitwise comparison. Since PFObject subclasses NSObject, you may be getting some help, but that’s not something I would personally depend on. Parse considers an object “Identfiable” when it has an objectId, in your case, none of them do yet but are pointing to the same object. You are claiming they should be identifiable because they are referencing the same local object, but these objects are being encoded and sent to the server with no objectId. Two objects can look exactly the same in encoded form, does that make them the same object? No way to tell, this can either be resolved as a circular dependency or treated as separate ParseObjects... Your coded example looks like a corner case circular dependency when using the current design of the SDK and it seems to be the case with the infinite loop you are getting. I gave a couple of examples of how to offset this, at least when it comes to saving to the server. I don’t know much about how save to local storage work as I assume they get some localId before saving. My questions above about will this change fix the problem without introducing others still remain. If the algorithm is indeed missing this corner case or can be improved to mitigate it, it should. |
In my example the parent object is the same instance. This should have the same hash because it is literally the same instance, not another object of the same class qith the same values. The two child classes are different and will get different objectIds. Also there is no circle and this is no edge case. Feel free to assume that all objects are already persisted and have objectIds and that there is no local storage. The algorithm is simply inefficient and with larger data sets is likely to quickly churn on CPU and may appear to never finish even if none of the objects are dirty. |
This was not the example you presented, it’s completely different. You presented saving 3 “new” objects, that means no objectIds. Your latest comment says assume objectIds... if the children have Parse Ponters to “p”, then their shouldn’t be a traversal issue. I’m assuming in your new scenario you have all keys included, having the complete object of p pointed to which is why p’s keys are even being looked at from the children... At any rate, in order to understand the possible improvements or issues your change may introduce you should post the updated code depicting the scenario that’s causing the issue as closely as possible. |
I fail to see why my example is not sufficient enough to outline what is happening in the algorithm and what I believe is a flaw. Whether objects are new or not is not relevant as they will be traversed regardless, to find circular dependencies of new objects. Neither does my example contain a circular dependency, to keep it simple. Can we agree that the algorithm would be expected to:
The algorithm does all that and there is no issue. My fix is a way to stop the algorithm from visiting nodes it has already seen. Now this will ideally not change the end result of what the algorithm is producing. Whether my fix is negatively impacting other aspects of this algorithm other than outlined above would be up for debate and I'd like to focus on that rather than batching issues outside the scope of the affected function. My simplified test case is of course not suitable to find regressions my fix could introduce but I was hoping we could get back to discuss my proposed changes and whether we can agree on them having positive or negative effect on the algorithm. |
Is there any update on this? We have the occasional issue when saving complex objects, and I suspect it's related. |
This issue has been automatically marked as stale because it has not had recent activity. If you believe it should stay open, please let us know! As always, we encourage contributions, check out the Contributing Guide |
This project is stale. :( |
This issue has been automatically marked as stale because it has not had recent activity. If you believe it should stay open, please let us know! As always, we encourage contributions, check out the Contributing Guide |
7db9be5
to
42f8143
Compare
I will reformat the title to use the proper commit message syntax. |
Rebased onto |
@shlusiak Could you try to write up a test case? A test case is usually required to sustainably fix a bug. Maybe someone from @parse-community/ios-sdk could help. You could look at existing test cases, choose and that is most similar to what you're trying to test, then copy and modify it. Also maybe take a look at the related issue in the Android SDK if there is any test case you can use here. |
Welcome to Codecov 🎉Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests. Thanks for integrating Codecov - We've got you covered ☂️ |
@stephanboner Would you be interested in joining the iOS SDK team there for occasional reviews? |
Hey @mtrezza, thanks for asking! Unfortunately I'm not working as a software developer anymore and therefore I no longer have the required infrastructure (a Mac with Xcode). Thank you though! |
@stephanboner Thanks for replying, wish you all the best in your new career! |
d7625ba
to
0dc8128
Compare
86c1353
to
aac4bdc
Compare
…ng when saving object with many pointers
aac4bdc
to
717081a
Compare
@shlusiak I noticed you made some commits; what is the state of this PR? |
Thanks for reaching out. This PR is still open and valid, and we rely on this branch for our production environment. I had to rebase this branch to get it to compile using XCode 16, but getting this into |
Do you think it's possible to write up a short test for this change? Then we can go ahead and merge this. |
Maybe something like this, added to (void)testCollectDirtyChildrenWithManyPointers {
// Create a root object
ParseObject *rootObject = [ParseObject objectWithClassName:@"RootObject"];
// Add a large number of child pointers to simulate heavy usage
for (int i = 0; i < 1000; i++) {
ParseObject *childObject = [ParseObject objectWithClassName:@"ChildObject"];
childObject[@"data"] = @(i); // Mock data
[rootObject addObject:childObject forKey:@"children"];
}
// Attempt to save the root object, expecting no hang or excessive recursion
XCTestExpectation *expectation = [self expectationWithDescription:@"Object saved without hang"];
[rootObject saveInBackgroundWithBlock:^(BOOL succeeded, NSError *error) {
XCTAssertTrue(succeeded, @"Root object with many pointers should save successfully.");
XCTAssertNil(error, @"No error should occur.");
[expectation fulfill];
}];
[self waitForExpectationsWithTimeout:10 handler:nil];
} |
@mtrezza I'm not actually an iOS developer, and unfortunately don't know Objective C, so even finding the fix was hard enough. If you are happy for me to just commit that test you provided, I can certainly do that. I'm a bit hesitant though to add tests that expect certain timings of functions, as success failure now may depend on the speed of the agents executing these tests, further I don't think unit tests are meant to stress test performance of functions. Are there sufficient unit tests covering |
Could you try adding the test and running it without the fix, just so we know whether it even fails. If it doesn't fail then there's no need to add it. Can you confirm that this fix worked for you? How did you try this out? |
I'm sorry, this is very much out of me league. How do I run tests? None of them compile in my XCode, I do not know how to write ObjC code. I'm very happy to explain the change to the algorithm, why the previous algorithm was very inefficient, caused us great performance problems in our production app, and why my proposed change fixes that for us. The same change was made for Android, and the issue is explained in parse-community/Parse-SDK-Android#1006 (comment) We are running our production app from this custom branch here, and please believe me that the performance problems went away. I understand you might want a usecase so you can reproduce and verify the issue, but I would need some help adding and running the tests. |
Sure, maybe you want to take a look at the existing tests in |
Please see parse-community/Parse-SDK-Android#1058 for the Android PR and parse-community/Parse-SDK-Android#1006 (comment) for the analysis of the issue.
The collectDirtyChildren forgets seen siblings when traversing, which causes extra traversals if those siblings point to the same objects and causes lots of garbage to be created in memory. It also has the potential of adding the same dirty object twice to the dirtyChildren collection, which may or may not be a set.
Imagine the simplified setup of an Object A with an attribute which is an Array containing two pointers to object B. When traversing A, the array will be looped over. For each object found, the
seen
set would be duplicated and B would be added and B would be traversed. When backing out and iterating over the next attribute in that array, the originalseen
variable would be used and B would be traversed again and all of it's child pointers.Removing the copy of the seen set may improve performance significantly.
Closes: #1588
Possibly closes: #982