Element shows session errors after being restarted during opening spinner #18625

Closed
ell1e opened this issue Aug 18, 2021 · 39 comments
Labels
  • O-Uncommon: Most users are unlikely to come across this or unexpected workflow
  • S-Critical: Prevents work, causes data loss and/or has no workaround
  • T-Defect
  • X-Needs-Info: This issue is blocked awaiting information from the reporter
  • Z-Platform-Specific

Comments

@ell1e

ell1e commented Aug 18, 2021

Steps to reproduce

  1. Sometimes Element gets stuck on an opening spinner. This in itself is a usability problem, because sometimes it recovers after half a minute and sometimes it's stuck for half an hour, and there is no progress indicator whatsoever. So that sucks.
  2. Since it gets stuck so often, if the spinner takes longer I often just close and reopen it on a wild guess that it is stuck. This happens about every second launch.
  3. Now, after doing just this, it told me that the session data is missing. That's not good, especially since I assume this can cause irrevocably lost messages. The design around key exchanges is brittle: if keys were sent only to the now-dead session, other sessions cannot request them again, even when the sender's session is still online and all receiving sessions are signed.

What happened?

spinner gets stuck a lot. zero useful indications on progress. restarting element due to this suddenly wiped my session data, telling me it is now "missing" and I'm prompted with a blank login.

What did you expect?

spinner has proper progress indicator and session data isn't ruined if i quit early

Operating system

Linux

Application version

1.7.33

How did you install the app?

flatpak x64

Have you submitted a rageshake?

No

@ell1e ell1e added the T-Defect label Aug 18, 2021
@robintown robintown added the O-Uncommon, S-Critical, and X-Needs-Investigation labels Aug 18, 2021
@dbkr
Member

dbkr commented Aug 18, 2021

Wow, that's pretty terrible. Unfortunately I think debug logs are the only way we're going to be able to find out what's going on here: I'm fairly sure this isn't a common problem.

@novocaine novocaine added the X-Needs-Info label and removed the X-Needs-Investigation label Aug 25, 2021
@ilka-schulz

ilka-schulz commented Aug 30, 2021

This definitely is absolutely terrible! It just happened to me, too, and I am furious about such bad design. I lost all my conversations!
Sorry for the harsh tone, though, but it really is bad and completely unexpected.

My system:

  • Debian 10 on Qubes OS 4.1.0 beta1
  • element-desktop 1.8.1 installed via https://packages.riot.im/debian default/main amd64 Packages
  • just uploaded a rageshake

@ilka-schulz

Is there any workaround for now? I have decided I will ditch Element for personal use immediately (back to email with PGP – I guess it is the only channel that never failed me) but I need Element for work.

  • The "Restore from Backup" says it restored some keys but I still cannot decrypt anything.
  • Would it make sense to run a second Element instance on another device and let it just sit around "gathering keys"? I have only one physical device that connects to the internet, but it runs Qubes OS, so I could run another VM with Element in it.

@yajo

yajo commented Oct 6, 2021

Some more details about this problem:

  1. This happened to me like 5-6 times during last month (never happened before... 🤷🏼‍♂️)
  2. When it happens, I cannot send logs. It says: "Error sending logs: No connected database". See it:
    image
  3. I'm using https://flathub.org/apps/details/im.riot.Riot
  4. Whenever I have to log in again, Element loses all its settings (notifications, theme, message previews, etc., all go to defaults)

The workaround I found is to have Element also installed on my phone, so when I have to log in again, it syncs keys from there.

@novocaine
Contributor

> Some more details about this problem:
>
>   1. This happened to me like 5-6 times during last month (never happened before... 🤷🏼‍♂️)
>   2. When it happens, I cannot send logs. It says: "Error sending logs: No connected database". See it:
>     image
>   3. I'm using https://flathub.org/apps/details/im.riot.Riot
>   4. Whenever I have to log in again, Element loses all its settings (notifications, theme, message previews, etc., all go to defaults)
>
> The workaround I found is to have Element also installed on my phone, so when I have to log in again, it syncs keys from there.

"No connected database" occurs when Element can't open an IndexedDB database to write the logs to. There may be some error you can see about both this and your other problem if you open the developer console (Ctrl+Shift+I). It would help us if you could try that next time it happens and post the results.

@novocaine
Contributor

It might be relevant that both the OP and yajo are flatpak users, although maybe this is just becoming a common way of installing Element

@yajo

yajo commented Nov 3, 2021

Yes, I use flatpaks everywhere I can. I installed from https://flathub.org/apps/details/im.riot.Riot

Today it happened again. I got an upgrade, then rebooted. Element was open, so it had to close immediately. After rebooting, I opened Element again, and it became dumb:

image

After a long time like this, I got the logs: vector-1635935322508.log

I closed it (Ctrl+Q). Then opened it again. I got this:

image
vector-1635935496886.log

Clicking on the "send us the logs" link, then clicking on "Download logs", fails with "No connected database...":

image
vector-1635935635995.log

Instead of clicking on "close session", I quit Element (Ctrl+Q) and opened it again. It still logs some IndexedDB connection errors:

image
vector-1635935739979.log

Now I just have to log in again 🤷🏼‍♂️

I hope this info helps diagnose the bug.

@novocaine
Contributor

novocaine commented Nov 3, 2021

Your issue sounds identical to dexie/Dexie.js#271 (comment) which has a suggestion here

We are already doing this but it doesn't work out of the box in flatpak ..

So I've submitted a PR to the flatpak config based on what others have done. I don't have a linux environment to test on, so I don't know for sure if this works. If someone in this thread could test it, that would help.

@novocaine novocaine changed the title from "Element corrupted/lost all session data when closed during opening spinner" to "Element corrupted/lost all session data when closed during opening spinner using flatpak" Nov 3, 2021
@novocaine
Contributor

novocaine commented Nov 3, 2021

Until that gets merged .. don't use flatpak, or if you must insist, try really hard not to open two copies of the app at the same time

@yajo

yajo commented Nov 3, 2021

But I never opened more than one Element in parallel... Actually, the problem always happens when I have just rebooted and logged in, so everything is closed at that point.

@novocaine
Contributor

I don't know what's causing the hanging causing you to restart the app in the first place - there's nothing in the logs - but the IndexedDB errors and session corruption occurred immediately after this step:

> I closed it (Ctrl+Q). Then opened it again

So, here I'm fairly confident the first process wasn't quite dead when you opened the second, causing the issue

@novocaine
Contributor

novocaine commented Nov 3, 2021

If I were to take a guess at why it's apparently hanging after a version upgrade, I expect replacing the flatpak likely deletes Element's sync cache (due to flatpak sandboxing all app data), and so your next sync is from cold, which unfortunately can take a very long time if you are in many rooms with a lot of history and/or the homeserver isn't the fastest.

@yajo

yajo commented Nov 3, 2021

Sorry I meant I got an upgrade for my OS, not for the app; the app didn't update.

> I expect replacing the flatpak likely deletes Element's sync cache

Where is that cache supposedly stored? Flatpak sandboxes stuff, but the app still has write permissions on the dirs needed to function.

> So, here I'm fairly confident the first process wasn't quite dead when you opened the second, causing the issue

That can be true, I didn't check the task manager. I'll check next time.

> I don't know what's causing the hanging causing you to restart the app in the first place - there's nothing in the logs

Any way to get that info next time? I guess that should be the main issue to diagnose.

@novocaine
Contributor

novocaine commented Nov 3, 2021

> Sorry I meant I got an upgrade for my OS, not for the app; the app didn't update.

Okay, then the problem lies elsewhere!

> > I don't know what's causing the hanging causing you to restart the app in the first place - there's nothing in the logs
>
> Any way to get that info next time? I guess that should be the main issue to diagnose.

I would look at the developer tools console and check what network requests are being made to see if it's a client or server issue.

It would also be useful for debugging purposes to understand if the issue occurs more often with flatpak than without. Your issue does sound like this one, where the OP indicates that the problems only repro under flatpak.

@yajo

yajo commented Nov 5, 2021

Today it happened again and it seems like the only requests being made are for piwik:

image

Following your suggestion, this time I:

  1. Closed element.
  2. Opened the task manager.
  3. There were some element-desktop processes still running, so I terminated them.
  4. Once terminated, I opened element.
  5. It resurrected!

So, you weren't so far off with #18625 (comment)! Thank you, at least now I have a workaround. 😊
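
The check in steps 2 and 3 above can be sketched in Python. This is only an illustration under assumptions, not part of Element: it assumes a Linux `/proc` filesystem and that the leftover processes report the command name `element-desktop` (as seen in the task manager here; a different packaging may use another name). The helper names `find_lingering` and `terminate` are hypothetical.

```python
import os
import signal


def find_lingering(name, proc_root="/proc"):
    """Return sorted PIDs whose command name in <proc_root>/<pid>/comm equals `name`.

    Note: the kernel truncates comm to 15 characters, so longer names
    would need to be compared against their truncated form.
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                comm = fh.read().strip()
        except OSError:
            continue  # the process exited while we were scanning
        if comm == name:
            pids.append(int(entry))
    return sorted(pids)


def terminate(pids):
    """Send SIGTERM to each PID, ignoring processes that are already gone."""
    for pid in pids:
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass
```

Calling `find_lingering("element-desktop")` after closing the window would list any processes that survived; `terminate(...)` on that list corresponds to ending them in the task manager before relaunching.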

@novocaine
Contributor

It's quite good to know that it does actually resurrect if you kill the processes and there's no permanent damage.

I'm moving this issue upstream as it seems flatpak-specific; let's continue this in flathub/im.riot.Riot#230

@novocaine novocaine changed the title from "Element corrupted/lost all session data when closed during opening spinner" to "Element shows session errors after being restarted during opening spinner" Nov 5, 2021
@SISheogorath

@yajo as far as I can tell from your screenshots, you seem to be running GNOME. I know that Element minimizes to the tray by default, which should be the process you see there. Since GNOME doesn't have a system tray, this process ends up continuing to run invisibly in the background.

@Erick555

Erick555 commented Nov 5, 2021

You may try disabling tray in app prefs to test if it helps.

@ilka-schulz

> You may try disabling tray in app prefs to test if it helps.

> It's quite good to know that it does actually resurrect if you kill the processes and there's no permanent damage.
>
> I'm moving this issue to upstream as it seems flatpak specific, lets continue this in flathub/im.riot.Riot#230

It sounds like no one is taking this bug as seriously as it deserves. I would expect Element to implement some failsafe behavior. I mean, I lost all my chat history due to this bug. Element should be able to recover chat history!

@SISheogorath

SISheogorath commented Nov 6, 2021

@ilka-schulz You shouldn't be able to lose your chat history if you use the "Secure Backup" functionality, since all your session keys are stored encrypted on the homeserver. 👀

At least unless you say that's broken as well.

@ilka-schulz

> @ilka-schulz You shouldn't be able to lose your chat history if you use the "Secure Backup" functionality, since all your session keys are stored encrypted on the homeserver. 👀
>
> At least unless you say that's broken as well.

Thank you, I will try that!

I have Element installed in a VM and I back that up frequently. However, restoring a previous snapshot of the VM did not work.

@novocaine
Contributor

@ilka-schulz if the issue recurs, we need to understand if it's a similar instance of the problem reported by other posters in this thread with similar root causes -

@novocaine
Contributor

> @yajo as far as I can tell from your screenshots you seem to run on GNOME. I know that Element minimizes to tray by default which should be the process you see there. Since GNOME doesn't have a system tray, this process ends up to continue to run in background invisibly.

hrm, I would have hoped the single instance lock would still be held if a process is in the tray..
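
For readers unfamiliar with the idea, here is a minimal Python sketch of what a single-instance lock is meant to guarantee, using an advisory file lock. It is not Element's or Electron's implementation: the function name and lock path are hypothetical, and it assumes a POSIX system where `fcntl.flock` is available.

```python
import fcntl


def acquire_single_instance_lock(lock_path):
    """Try to take an exclusive, non-blocking lock on lock_path.

    Returns the open file object on success (keep it open for the lifetime
    of the process), or None if another holder already has the lock.
    """
    fh = open(lock_path, "a+")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(fh.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fh.close()
        return None
    return fh
```

With a lock like this, a second launch gets `None` for as long as the first process is alive, including while it sits hidden in the background, and can exit instead of opening the same storage. The operating system releases the lock when the holder closes the file or exits.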

@novocaine novocaine added the X-Needs-Info label Nov 9, 2021
@pv

pv commented Feb 1, 2022

Here, when this occurs (on flatpak Element; this bug still happens ~once a month), the spinner window looks a bit different from normal. Normally, there's a "Log out" text on the bottom of the window when the spinner is active, but in the failing case, there's just the spinner in the window but no other UI elements.

@novocaine
Contributor

> Here, when this occurs (on flatpak Element; this bug still happens ~once a month), the spinner window looks a bit different from normal. Normally, there's a "Log out" text on the bottom of the window when the spinner is active, but in the failing case, there's just the spinner in the window but no other UI elements.

Thanks for this report @pv - we still don't understand what's driving this, so if you could provide debug logs when it occurs, that will help us

@turt2live
Member

Apologies if this is in the backlog already, but are there multiple clients running at the same time? (ie: close doesn't actually close/end all processes). This can lead to a lock on the storage, which makes the second instance unhappy and think it can recover.

See also: element-hq/element-desktop#840

@turt2live
Member

Related: element-hq/element-desktop#819

@turt2live
Member

is anyone still seeing this?

@ell1e
Author

ell1e commented Jun 14, 2022

> is anyone still seeing this?

Yes, pretty frequently. One sure way for me to get it is to not launch Element for a while, while being in a few busy group chats (I assume it probably helps if some of them use E2EE), and then launch Element. It will sometimes get stuck for minutes, sometimes without even a loading indicator. If, during that phase, you use the GNOME 3 top bar app menu > "Quit Element" and then launch it again, there is a good chance it will complain afterward about corrupted data. (I don't actually know whether it would reliably get unstuck on its own, since I usually give up after some minutes.)

@turt2live
Member

@ell1e that would be a classic case of element-hq/element-desktop#819 fwiw

@ell1e
Author

ell1e commented Jun 14, 2022

I think the GNOME Shell "Quit ..." entry does something different than close window. Not sure what it is, but I think it usually won't leave stuff running? (Maybe it figures out the parent of the window and sends SIGTERM?) So I'm not entirely sure it's multiple parallel processes, might just be Element breaking when it's shut down during the wrong moment of starting up. Could also be that it's parallel processes though, I don't think I ever checked when that happened

@turt2live
Member

Quit probably kills that instance but there is still a second one lingering somewhere else.

@turt2live
Member

Closing as there hasn't been a lot of activity here to suggest it's a recurring problem. If this is untrue, please open a new issue with fresh reproduction steps for our QA team to take a look at.

@ell1e
Author

ell1e commented Jan 10, 2023

I don't know, I still see this a lot. The reproduction steps are "use element flatpak a lot" 👀 I don't think anyone knows how to trigger it deliberately

Edit: oops, I forgot this issue was about the corruption initially. I haven't actually seen that in a while but mainly because I do killall element-desktop all the time now. That seems to avoid that. But it still gets stuck a ton during launch after updates

@turt2live
Member

If this is the flatpak, then please report it upstream to them: https://github.com/flathub/im.riot.Riot/

@ell1e
Author

ell1e commented Jan 10, 2023

No idea honestly if it happens without the flatpak, since I never run it without one

10 participants