-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
In case of errors on some items during sync, no items are saved #3259
Comments
Hi @gbitaudeau, do you have an redacted version of your schema for us to test with? |
Sorry for the delay, I will make a short project with a simplified schema and add it here as soon as I can. |
Great, thanks. Let us know how it goes and if you have any questions. |
Hi, Here you can find a way to reproduce easily a case where sync failed and no items were saved in datastore. Please note that to make as short as possible, I changed how to reproduce the problem because I encountered this case due to authorization issue (what I described initially: owner changed on some model linked to other), but it takes more step to reproduce as you need to manage auth, signup, signin... So using this schema:
You execute the following code:
For me the import part is not the details about The problem come from how SyncComment query response is managed in Amplify Datastore. Here a sample response:
Hope you have enough information. |
Thank you, we will investigate this and post further updates. |
Hi @thisisabhash , In our latest production release, we added more logs to Crashlytics to be able to measure whether or not this bug affects a large number of our users.. An analysis on our data and Crashlytics show us than more than 500 users have the issue. All affected users have model objects which suffer from a foreign key that targets an object not present in DynamoDB.
This error occurs when sync response contain a content like :
We investigated to try to find the root cause of why some foreign keys reference missing objects and we were able to find at least 3 possible scenarios (but perhaps there are others ...):
I will describe just after how we were able to reproduce these 3 cases on the schema given here #3259 (comment). I will used the model describe by this schema instead of our real model to simplify. When one of these error occurs on a create mutation for a post, the post is not saved in server side but is present in local datastore. Restarting the datastore will not send the post mutation. Then if a comment which belongs to this post is created, it is saved in dynamoDB and then sync failed. To be able to fix sync problem in our application, we took the decision to replace all relations ( Case 1: Expired Cognito TokenWe use amazon cognito user pools and all objects have only one auth rule :
The server return an HTTP error code 401. If it can help, we observed that in method
Case 2: Network secureConnectionFailed errorSimulate this error is quite challenging because the request
Stopping the connection randomly during a request doesn't always produce this error (but we randomly succeeded to get it severals times). To help, we modify
In case 1 and 2, the
Case 3: DataStore.stop during mutationUse the following code:
|
I forgot to mention that we use Amplify 2.23.0 for latest production release. |
Hello @gbitaudeau , Thank you for the detailed information on the ticket. We are investigating this issue with priority. |
Hi @gbitaudeau , I've implemented a solution in the |
Thanks @5d , I will take a look when I can. Just to resume, there are 3 parts in this problem:
|
Hi @gbitaudeau, thanks for your patience and the detailed analysis. For (1), that is right, #3474 is specifically addressing handling partial responses (response with For (2), we have two PRs which will update the retry determination logic to #3475 - Adds the With these two changes, DataStore should retry on the one mutation event indefinitely as long as the retry determination logic returns
I understand this was to mock an error response from the service but If the server actually returns 401 unauthorized, DataStore will not retry. DataStore does not retry on unauthorized errors since it may never succeed, and block the rest of the items. If the Refresh token is expired and cannot refresh the access token, then we will retry (after this PR is in #3477). For (3). In the event that a mutation event fails to sync to AppSync and it is determined not to retry, all failure code paths end up calling the |
We have added the above changes to this branch: https://github.com/aws-amplify/amplify-swift/commits/fix/gh-3259/ |
Hi @gbitaudeau, a question about the repro step in the issue description
What steps did you take to change the parent's owner to another? Which schema and operations did you perform on which model? |
Hi, Thanks all for all your work and for the version 2.25.5 👍 I made some tests on it, and here what I can say:
Also to give you more insight, we listen
I don't know why we got this, but I suspect bugs on our side with the signOut/signIn process explain just before.
I well get the stream of mutation events which fail, but I don't really know what I can do with it because :
|
Hi @gbitaudeau, thanks for the reply
We actually started with this inversed but it caused a retry storm. I think we can explore inversing it in certain places and making it smarter, ie. if the same URLError occurs more than x number of times, then don't retry.
This is difficult for the client (Datastore) to handle if the server is sending back Unauthorized instead of something like TokenExpired error. I'm looking to see if I can cover most use cases by checking the expiry of the token before we send the request, or checking the expiry if we sent the request and received unauthorized. If it was due to an expired token, then we can keep the mutation event and retry.
If you sign the user out, can you also stop DataStore? This will prevent the mutations from being sent unncessarily if you don't expect any requests to be sent with a public auth type like API key. And then start DataStore after the user has signed in successfully?
I'm working on some additional checks to further categorize the mutation event failures as retryable (checking the expiry of the token, and checking if the error is due to the user signed out). However, if on the off chance, even after these changes, if you can actually determine that the mutation is one that you want to keep, you can perform the mutation against AppSync directly though |
Good point, after checking, I don't know why but we forgot to stop the datastore when the user is signed out for whatever reason. I will add this on our side, thanks.
This could be done, but it might need quite a big amount of work ... May be a nice feature on your side could be either :
Actually, to "replay" mutation we follow this flow :
|
#3487 further improves the retryability determination and has been released in https://github.com/aws-amplify/amplify-swift/releases/tag/2.25.6 Please give this update a try. @gbitaudeau Could you open a feature request for that? I'll look into flushing out the details. Feel free to reference back to this issue to keep the context. We'd like to resolve this bug since the root cause has been fixed and released. For replaying the mutation, in flow 3, do you replay all missing creates / updates / deletes against the local database by calling DataStore.save/delete? |
@lawmicha did you test this scenario ? |
I didn't test this one, it looks like it simplifies to a sequence of calls in this order that may cause a problem:
Did you have the error log from the errorHandler callback? cc @thisisabhash / @5d |
I've re-opened this issue to track Case 3, @thisisabhash had attempted earlier but could not reproduce even with the sample code. If you can reproduce, can you try again on my branch? |
Hi @lawmicha , Using your branch doesn't fix the issue. Here the log I had (using the sample code for
The error handler (in
In this log, the error appears for the first comment, but if I try several times, the comment in error is not always the first. |
Thanks for the logs @gbitaudeau, I noticed in the sample code, the // Code from Case 3 code sample
let post = Post(id:"post-\(j)-\(i)", title: "post-\(j)-\(i)", blog: blog)
let comment1 = Comment(id:"comment1-\(j)-\(i)", post: post, content: "Content-1-\(j)-\(i)") Add uniqueness to the id generation such as Can you dig into the state of the comment with id |
I was able to reproduce this (with the non-unique ids sample code from Case 3). Here's the state of the object directly querying AppSync. DataStore's create mutation for
Based on the state of these objects, it appears the post was not saved in DynamoDB, create mutation for comment was successful in creating a comments that points to a post which does not exist. The second query with the comment and nested post fields error because the call is looking to resolve the post from the Post table in dynamoDB but it does not exist |
@lawmicha , I understand your point about id uniqueness, but I cleanup everything before each test (local Datastore and DynamoDB). I used I changed the code to use UUID and got the same error:
Here you can find logs, and a Charles proxy file ( https://www.charlesproxy.com ) Here the dynamo table contents: |
Oh sorry, I missed that |
Just a short update, when I breakpoint into the error handler for the createComment error, I was able to check the local DB contents using https://sqlitebrowser.org/ and can see that both the post and comment's mutation events are there. What's unexpected is they both have an InProcess state of true. This means, the post mutation event was picked up (marked as InProcess true) and got interrupted by DataStore.stop. When DataStore.starts, it should start off by marking them all as false before picking up the oldest, which should have been the post MutationEvent. There's a bug in which the comment mutation event is being picked up because the post mutation event is hidden from the query due to it already having InProcess state of true. |
Updated the branch |
Hey @gbitaudeau, some more testing on my end. I ended up spliting the sample code in Case 3 into three sections The init() {
Task {
try await AmplifyConf.instance.start()
}
} // In AmplifyConf
func start() async throws {
Amplify.Logging.logLevel = .verbose
let datastoreConfig = DataStoreConfiguration.custom(errorHandler: { error in
var dumped = String()
dump(error, to: &dumped)
print("[lawmicha] ERROR \(error)")
})
try Amplify.add(plugin: AWSDataStorePlugin(modelRegistration: AmplifyModels(),
configuration: datastoreConfig))
try Amplify.add(plugin: AWSAPIPlugin(modelRegistration: AmplifyModels()))
try Amplify.configure()
} A button to start DataStore without running the test:
and the test itself:
// Updated `testStartStopLostMutation()
func testStartStopLostMutation() async {
do {
print(">>> wait blog")
let blog = try await subscribeToBlog()
let j = Int16.random(in: 0...Int16.max)
for i in 0...100 {
print(">>> for n°\(i)")
let post = Post(id:"post-\(j)-\(i)", title: "post-\(j)-\(i)", blog: blog)
let comment1 = Comment(id:"comment1-\(j)-\(i)", post: post, content: "Content-1-\(j)-\(i)")
let task = Task {
Task.detached {
try await Amplify.DataStore.stop()
try await Amplify.DataStore.start()
}
Task.detached {
try await Amplify.DataStore.save(post)
try await Amplify.DataStore.save(comment1)
}
}
let _ = await task.result
}
} catch {
print ("ERROR: \(error)")
}
} In the code above, i removed the wait for outbox to be empty so it should actually be a greater stress test. Just starting DataStore, without clearing out DynamoDB, will get some errors back on the errorHandler because of some of my previous runs were against the code with the bug, so there are a bunch of comments pointing to non-existent posts for me. This data can be cleaned up so it doesn't get pulled in anymore. Can you confirm the error is coming back is during create mutations, and not during a Running the "Test" button is giving me a new error with |
That's why I cleanup everything before each test ;)
Yes I can confirm this point: the error occurs on a About |
Other than |
Sorry if I wasn't clear enough but with latest
|
If this can help, when I breakpoint into the error handler for the createComment error, like you, I was able to check the local DB contents using https://sqlitebrowser.org/ and can see that only the comment's mutation events is there. |
Thanks for the details, I added back the wait for outbox code to match exactly Case 3, and can see that one of the post's mutation event isn't being created
In this instance, |
Hey @gbitaudeau, I've updated the PR with some changes to actually do what I had intended it to do. This will result in a failure thrown when saving the post, so it won't continue onward to save the comment. In the sample code, I've updated to catch and log this: Task.detached {
do {
try await Amplify.DataStore.save(post)
try await Amplify.DataStore.save(comment1)
} catch {
print("Failed to save: \(error)")
}
} My breakpoint on the errorHandler is no longer getting called since the comment never gets saved after the failure from saving the post. Please go ahead and try the latest from |
Hi @lawmicha , all tests are ok on my side, failure on save is nice in this case. Thank you very much. During this test I can see that there are memory leaks on web sockets: each This memory leak might be better tracked by another issue but as the code to easily reproduce it is here, I let you organise this as you want. |
Thanks for confirming your tests are working. I'll keep this issue open until my PR is merged and released. In terms of the Starscream dependency and memory leak patch, i'll create two new issues to track it, thanks for the details! |
The fix in #3492 has been released in 2.26.3 https://github.com/aws-amplify/amplify-swift/releases/tag/2.26.3 |
|
Describe the bug
If during initial sync, AppSync responds with some items on error, these items are
null
initems
response array. The other items which are not in error are not saved in datastore. Here a redacted result for sync Model named Draw with an error on one item:Steps To Reproduce
Expected behavior
DataStore's initial sync should reconcile partial GraphQL responses by saving the items in the local Database and emitting the errors to the errorHandler.
Amplify Framework Version
2.19.0
Amplify Categories
DataStore
Dependency manager
Swift PM
Swift version
5.9
CLI version
12.4.0
Xcode version
15.0
Relevant log output
No response
Is this a regression?
No
Regression additional context
No response
Platforms
iOS
OS Version
iOS 17.0.1
Device
iPhone
Specific to simulators
No response
Additional context
See discussion #3232 (comment)
The text was updated successfully, but these errors were encountered: