-
Notifications
You must be signed in to change notification settings - Fork 53
Fix #223 - Drop vector time when resource is removed #714
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I am still not completely happy with this as the set of deleted resources is never cleared (unless the resource is recreated), meaning it continuously grows. Furthermore, we store references that might not be needed anymore as all participants might already have executed the deletion/move activity. I only see two options for this:
Pruning seems easier, but ACKs would be a nicer solution as it guarantees that all activities are handled correctly (which could be an issue if we prune to early). So I think I am currently leaning towards adding an ACK. @Drakulix @m273d15 @srossbach Any opinions? |
|
I updated my comment yesterday regarding the issue. You cannot simply delete something. If the vector time mismatch the Watchdog is unable to detect inconsistencies. As an example. When you perform a recovery (which includes a reset on the client as well on the server side) the whole session is locked/blocked, i.e nobody can edit files and modify something in the editor. As you are primary testing under IntelliJ and your plugin only supports 2 people you MUST NOT assume that if something works for n = 2 users that this will work for n > 2 users. |
core/src/saros/concurrent/management/ConcurrentDocumentClient.java
Outdated
Show resolved
Hide resolved
core/src/saros/concurrent/management/ConcurrentDocumentClient.java
Outdated
Show resolved
Hide resolved
|
So after taking a quick look you changed the logic to: When I see a create / delete activity the specific document gets a state which decide if an activity is applied. If you are not in that state do the old logic and create documents on the fly ? |
Yes. When a file is deleted, before the vector time is reset, it is added to the set of deleted resources. This happens for the server and the client whenever they receive a deletion/move activity. Whenever the server or client receives an Resources are removed from the set when they are recreated, either through a file creation or a file move. This ensures that activities will be processed again for the newly created resource. And yes, I did not change the old logic, so jupiter documents are still created on the fly. But due to the filter, it should not matter. |
9b4ea28 to
e97fe2a
Compare
While I like the overall approach, this remaining problem needs to be solved. Otherwise, we are introducing a new memory leak to fix an issue, that was mostly caused by a fix to an even worse leak. (But reading through the patch, this is something any activity could actually trigger, it is just much more likely with the HeartbeatDispatcher, right?) I like the ACK-solution, as it is cleaner, but I think the prune is more robost. With the ACK we could keep entries in cases, where we miss the ACK (e.g. an unexpected disconnect). One can absolutely argue, that this is negligible. I guess I am fine with either solution. |
@Drakulix No, this issue is independent from the hearbeat stuff. It just has the same solution.
This issue is that this causes a desync when subsequently recreating a deleted resource as the vector times no longer match. The issue for #710 is that the participant that deleted the resource still sends heartbeats for the deleted resource as the local jupiter document was never reset. And they also receive heartbeats from other clients that they can't process. Both issues are solved by this patch as it adjusts the behavior to also drop the vector time on outgoing deletion activities and adds a filter for straggling activities. |
e97fe2a to
2a5a0fc
Compare
|
Updated the change by introducing |
|
I am not sure whether adding ACKs is really the best solution. I implemented it and it seems to work (but I only did very limited testing with 2 participants; but that is not the point I am trying to make), but it adds a lot of logic (about 220 lines of code before cleanup and adding javadoc), a new activity type, and a lot of network traffic (as each user has to send an ACK for every other participant besides the creator of the deletion activity). Furthermore, as this adds to the concurrent nature of the problem, it is annoying to debug and to ensure that it correctly works for multiple clients. The prune solution on the other hand needs about 70 LOC and it ensures that the list of deleted resources is always cleared. I still think that using ACKs is a more elegant solution, but I am still pondering on whether it is worth the hassle (testing troubles and increased code complexity, which makes it harder to maintain). |
|
The current approach only looks at file activities. This is problematic when a folder is deleted as the vector time of the contained files will not be reset. However, this issue can not be solved with the current logic as the resources will already have been deleted on the creator's side, meaning we can't access the child resources. The only solution I can think of is to require the creator of the activity to always completely tear down deleted folders (i.e. create deletion activities for all contained resources that are dispatched before the actual folder is deleted). This is currently not handled in such a way in Saros/I, but there is already a TODO for it (as this is also needed for other cleanup stuff) and its a relatively easy change. In Saros/E, the logic in Furthermore, if this is now a requirement for folder deletions, this should be put in writing somewhere. Probably the javadoc of the folder deletion activity. |
We have no guideline on how to send file deletions when they are triggered from another delete operation (e.g delete a folder). Unless this is specified somewhere I would not make any assumption which data in which order you get. E.g you could just send a folder delete activity, but I guess this would blow up the partial sharing as it also have to keep track of the current resources. |
That is what I was fearing. I would prefer the prune solution in that case. |
@srossbach I know that we currently don't have any guidelines on this. My question was rather whether my understanding of the Eclipse handling (i.e. that child resources are explicitly handled) is correct. From looking at the code, it looks that way to me, but I am not familiar with the details of the Eclipse As I see no other way of handling this case, I would just make this explicit handling of child resources a new guideline. |
|
Hmmmm... a big problem with the prune/cleanup solution is that we don't have a timeout for clients. In other words, it is completely acceptable for a client to be stuck for a while and the process everything that happened in the meantime without causing a desync. An example of this happening in a normal session is the developer executing some blocking actions (e.g. opening the settings menu in IntelliJ) that block the IDE execution thread. This blocks the execution of received activities (like file deletions). This makes it basically impossible to set a cleanup window (i.e. after what time to throw away the deletion filter for a particular resource) that is guaranteed to work. |
7aafddf to
6441a56
Compare
|
Rebased onto current master without any interaction. |
6441a56 to
9fa3477
Compare
|
Rebased onto current master without any interaction. |
9fa3477 to
8ed15b5
Compare
|
Rebased onto current master without any interaction. Also added the commit |
|
Man, this PR grew a lot from its initial size (mostly because of the addition of the deletion ACK). Now that it has been tested multiple times in a multi-user session and the found issues were resolved, I think it is time to again start reviewing this PR. @Drakulix @m273d15 @srossbach @stefaus Can I interest anyone of you to again review this PR? As a note for the review: The filter logic is located in `ResourceActivityFilter#isFiltered(IActivity). I went through all existing activities and checked whether they were important for the teardown, but maybe I missed something. So if you see any other possible issues with needed activities being filtered that I might have missed (i.e. that are not handled correctly by the logic), please say so. |
|
Issues
======
+ Solved 1
Complexity increasing per file
==============================
- core/src/saros/concurrent/management/ConcurrentDocumentClient.java 1
- core/src/saros/activities/DeletionAcknowledgmentActivity.java 1
- core/src/saros/session/internal/DeletionAcknowledgmentDispatcher.java 3
- core/src/saros/concurrent/management/ResourceActivityFilter.java 7
- core/src/saros/concurrent/management/ConcurrentDocumentServer.java 1
See the complete overview on Codacy |
|
It seems like there is no interest in reviewing (or re-reviewing) this PR. (I want to make very clear that this is not meant as any form of criticism, just an observation! I know this PR isn't easy to review and a lot of work went into the review process already.) But, as I would still like to move the process along, after conferring with @m273d15, I have decided to proceed with merging this PR without a complete review unless anybody voices objections. I have now tested these changes many times and have removed any outstanding issues. I have also re-run the STF to confirm that all tests (besides the known faulty ones) pass. I feel like I have done my due diligence on that front. If you don't want that to happen, you have at least till Monday evening (the 28th of January 2020) to either voice some objections/concerns or at least declare your intentions of reviewing this PR (in a somewhat timely manner 😉). Otherwise, I will merge this some time next week. |
Adjusts ConcurrentDocumentClient and ConcurrentDocumentServer to drop the vector time for a resource when it is deleted (or moved, thereby "deleting" the resource with the previous path). Adds DeletedResourceFilter to encapsulate the logic of detecting and correctly handling resource deletions. Furthermore, it provides a way of filtering out subsequent activities for deleted resource. This is needed as there can be straggling activities for deleted resources that were created by remote participants before they executed the deletion activity for the resource. Such activities are now filter out until the corresponding resource is recreated. This ensures that activities for the new resource will still be processed correctly. Checksum activities confirming the file deletion are explicitly excluded from this filer (i.e. they will not be dropped for deleted resources) as they are needed to check the consistency of all clients. Fixes #223. Also fixes #710.
Adds DeletionAcknowledgmentActivity, which can be used to acknowledge that a file was successfully deleted locally as a reaction to a received activity. This acknowledgment allows us to drop the activity filter for the deleted resource once all other session participants have acknowledged that they received the deletion. Adds the DeletionAcknowledgmentDispatcher to automatically dispatch an acknowledgment when an activity is received that contains a file deletion.
Uses received deletion acknowledgment activities to determine when all other participants have received the deletion, allowing the local instance to drop the activity filter for the deleted resource.
Adjusts the wording in the activity filter to reflect that it is specifically dealing with files and not resources in general.
Adjusts ResourceActivityFilter.isFiltered(IActivity) to also never filter out editor activities of the type "CLOSED". This is necessary as such activities are used to clean up the user editor state. Adjusts the wording of the javadoc for isFiltered(IActivity). Adjusts code in ResourceActivityFilter to always uses the full qualifier when using a variable of the type 'FileActivity.Type'. This was done to make them easier to differentiate from the newly introduced usage of variables of the type 'EditorActivity.Type'.
Adds information on how to handle deleted child resources to FolderDeletedActivity. Adds notes that activities might be filtered out to the corresponding methods in ConcurrentDocumentServer and ConcurrentDocumentClient.
a5df054 to
eab54ff
Compare
|
Rebased onto current master without any interaction in preparation of merge. |
[CORE] Improve exception messages for jupiter preconditions
[FIX][CORE] #223 Drop vector time when resource is removed
Adjusts ConcurrentDocumentClient and ConcurrentDocumentServer to drop
the vector time for a resource when it is deleted (or moved, thereby
"deleting" the resource with the previous path).
Adds DeletedResourceFilter to encapsulate the logic of detecting and
correctly handling resource deletions.
Furthermore, it provides a way of filtering out subsequent activities
for deleted resource. This is needed as there can be straggling
activities for deleted resources that were created by remote
participants before they executed the deletion activity for the
resource. Such activities are now filter out until the corresponding
resource is recreated. This ensures that activities for the new
resource will still be processed correctly.
Checksum activities confirming the file deletion are explicitly excluded
from this filer (i.e. they will not be dropped for deleted resources) as
they are needed to check the consistency of all clients.
Fixes #223. Also fixes #710.
[NOP][I] Remove comments to reset vector time
[INTERNAL][CORE] Add deletion acknowledgments
Adds DeletionAcknowledgmentActivity, which can be used to acknowledge
that a file was successfully deleted locally as a reaction to a received
activity. This acknowledgment allows us to drop the activity filter for
the deleted resource once all other session participants have
acknowledged that they received the deletion.
Adds the DeletionAcknowledgmentDispatcher to automatically dispatch an
acknowledgment when an activity is received that contains a file
deletion.
[INTERNAL][CORE] Use deletion acknowledgments to drop activity filters
Uses received deletion acknowledgment activities to determine when all
other participants have received the deletion, allowing the local
instance to drop the activity filter for the deleted resource.
[REFACTOR][CORE] Adjust wording in activity filter
Adjusts the wording in the activity filter to reflect that it is
specifically dealing with files and not resources in general.
[FIX][CORE] Exclude editor closed activities from filter
Adjusts ResourceActivityFilter.isFiltered(IActivity) to also never
filter out editor activities of the type "CLOSED". This is necessary as
such activities are used to clean up the user editor state.
Adjusts the wording of the javadoc for isFiltered(IActivity).
Adjusts code in ResourceActivityFilter to always uses the full qualifier
when using a variable of the type 'FileActivity.Type'. This was done to
make them easier to differentiate from the newly introduced usage of
variables of the type 'EditorActivity.Type'.
[DOC][CORE] Add documentation on how to handle deleted child resources
Adds information on how to handle deleted child resources to
FolderDeletedActivity.
Adds notes that activities might be filtered out to the corresponding
methods in ConcurrentDocumentServer and ConcurrentDocumentClient.