bug: synchronization sometimes hangs while still reporting connected peers #256

Open
zvolin opened this issue Mar 25, 2024 · 3 comments

zvolin commented Mar 25, 2024

No description provided.

@zvolin changed the title from "Synchronization sometimes hang while still reporting connected peers" to "Synchronization sometimes hangs while still reporting connected peers" on Mar 25, 2024
@zvolin changed the title from "Synchronization sometimes hangs while still reporting connected peers" to "bug: synchronization sometimes hangs while still reporting connected peers" on Mar 25, 2024

zvolin commented May 17, 2024

I've identified 3 different issues causing this.

panic in libp2p-kad

The kbucket uses std::time::Instant, which sometimes triggers a runtime panic. After that, the node is left in a somewhat undefined state: I saw it either stop logging completely or remain active only on libp2p-gossipsub while no longer syncing, updating peers, etc.
We got a fix for this included in libp2p/rust-libp2p#5347. We'll either have to wait for the 0.54 release of libp2p or try to get a backport.

IndexedDb hanging on committing in append_single_unchecked

This happened to me on Firefox. Switching from .commit() to .done() didn't solve it; the call still hung indefinitely. This causes the syncer to hang too, and it also blocks our UI updates, because we await node.syncer_stats(), which never resolves. The only thing that helped was clearing the whole IndexedDb store. I haven't seen it happen since, and I'm not sure what caused it.

header-ex request never resolved on libp2p level

With Sessions we make a few requests in parallel (8 currently) and retry the ones that errored out or returned incomplete ranges of headers. It sometimes happens that for a single request we never get any event from the request-response behavior. It should time out, finish, or error, but in this case we just don't get any event. Session.run() then hangs on

        while self.ongoing > 0 {
            let (height, requested_amount, res) = self.recv_response().await;

with self.ongoing == 1, and the syncer waits for it to finish. We could solve this by re-introducing timeouts by hand in our header-ex behavior; however, it'd be good to know what's going on in libp2p. I spotted one place that could lead to this bug, but it should emit a log line, and no such line was present in my reproduction. Additional debugging is needed here.
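To illustrate what "re-introducing timeouts by hand" could look like, here is a minimal sketch of the same drain loop with a bound on how long it waits for the next response. It is a synchronous, std-only analogue and not the actual async Session code: the `Response` alias, the `drain_with_timeout` function, and the use of an mpsc channel are all assumptions made for the example.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical response shape, mirroring the tuple in the loop above:
/// (height, requested_amount, result).
type Response = (u64, u64, Result<Vec<u64>, String>);

/// Waits for responses until nothing is outstanding. If no response at all
/// arrives within `timeout`, every still-outstanding request is given up on
/// instead of being awaited forever.
/// Returns (responses received, requests given up on).
fn drain_with_timeout(
    rx: &mpsc::Receiver<Response>,
    mut ongoing: usize,
    timeout: Duration,
) -> (usize, usize) {
    let mut received = 0;
    let mut given_up = 0;
    while ongoing > 0 {
        match rx.recv_timeout(timeout) {
            Ok((_height, _requested_amount, _res)) => {
                // A real session would inspect `_res` here and schedule
                // retries for errors or incomplete ranges.
                received += 1;
                ongoing -= 1;
            }
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => {
                // Nothing arrived in time (or nothing can arrive any more):
                // treat the remaining requests as failed so the loop ends.
                given_up += ongoing;
                ongoing = 0;
            }
        }
    }
    (received, given_up)
}
```

With three requests outstanding and only two responses ever delivered, this loop returns after the timeout instead of hanging with one request left. Such a guard would only work around the symptom; it would not explain why libp2p emits no event for the request.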


zvolin commented May 17, 2024

I have a branch for debugging this here. It uses my fork of libp2p, where I already backported the fix for the kbucket, but it's better to clone the fork locally and switch the patches to path-based ones so that you can add logs to libp2p manually. The branch also implements bulk inserts into IndexedDb.

When debugging, run the node in Chromium. There will be a lot of logs because logging is at trace level. Wait for 2-3 syncer batches to see whether the issue reproduces. If it does, wait half a minute or more, toggle the preserve logs button in dev tools, and refresh the page so that new logs stop appearing and don't flood you. You can then check the logs, which should persist across the refresh. If it doesn't reproduce, just refresh and repeat; I found it hard to debug with logs from more than 2-3 batches because there are so many of them. Firefox also has a nice feature for saving all logs to a file.

Status: In Progress