Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zellij server stability discussion #1050

Closed
raphCode opened this issue Feb 10, 2022 · 16 comments
Closed

Zellij server stability discussion #1050

raphCode opened this issue Feb 10, 2022 · 16 comments
Labels
discussion Conversation to figure out actionable steps. Most feature ideas start here. stability Issues in relation to stability

Comments

@raphCode
Copy link
Contributor

If the server crashes, all programs and potentially unsaved work dissappear, which is frustrating to say the least. For me, it also wipes my mental context which is supported by opened programs and zellij workspaces.
I really love zellij and would like to improve it to never loose work again, so here are my thoughts:

  • Is there anything we can do to catch all/most crashing errors in the server? (I guess probably not)
  • Should we organize a tracking issue for refactoring all potentially crashing locations in the code, and stop adding new ones, if possible?
  • Should we tag issues about server crashes specially to help working on server stability?
Just for the record, the crash which led me to open this issue in frustration:

I was just opening a new Tab and got this, never-seen-before error.

Error occurred in server:
Originating Thread(s):
1. stdin_handler_thread: AcceptInput
2. pty_thread: NewTab
3. screen_thread: NewTab
4. plugin_thread: Load

Error: thread 'wasm' panicked at 'called `Result::unwrap()` on an `Err` value: Error("unknown variant `SystemClipboardFailure`, expected one of `ModeUpdate`, `TabUpdate`, `Key`, `Mouse`, `Timer`, `CopyToClipboard`, `InputReceived`, `Visible`", line: 1, column: 84)': zellij-server/src/wasm_vm.rs:482

Ironically, this happened to me while I was working on a pull request to fix some other zellij crashes I am experiencing quite often lately: #882

@jaeheonji
Copy link
Member

jaeheonji commented Feb 10, 2022

Is there anything we can do to catch all/most crashing errors in the server? (I guess probably not)

Yes, it is difficult to catch all errors at once as you think. and, also zellij has intentional panic code.
Of course, we have to solve unexpected errors. That's why we need an error reporting system that can help solve errors easily (#1038)

Should we organize a tracking issue for refactoring all potentially crashing locations in the code, and stop adding new ones, if possible?

I think it's a good idea to create a new label for "refactoring" or "stability" and track the issue. But, I don't think there's a need to stop adding new things. Because each person has a different view and interest for some feature or function. So, I think it is right to proceed with adding and refactoring at the same time. And if there is a conflict between the two, I think the best solution is communication.

Should we tag issues about server crashes specially to help working on server stability?

As above, Personally, I agree 100% on this 😄

Just for the record, the crash which led me to open this issue in frustration:
I was just opening a new Tab and got this, never-seen-before error.

main branches can always have errors until a new version is released.
As far as I know, we are currently working on this error. (maybe related to the clipboard system? @tlinford)

@jaeheonji jaeheonji added the discussion Conversation to figure out actionable steps. Most feature ideas start here. label Feb 10, 2022
@tlinford
Copy link
Contributor

@raphCode This specific error looks related to the changes in #996 and #1022, could it be the same issue as #1041? (Basically plugins not being updated correctly).

But in general, totally agree with you on improving stability, ideally we need to get to a point where crashes should basically never happen. The error reporting system is going to be a great start, and there's a lot of code that can be improved to properly handle errors instead of just unwrapping.

@imsnif
Copy link
Member

imsnif commented Feb 10, 2022

Also, just wanted to mention - regardless of why this happened or how to solve this: I'm super sorry about the experience! This is definitely very frustrating when it happens in the middle of your work. Glad to hear you want to take this in a positive direction.

I'm not sure whether you're running directly from main or not - but if you are I'd very much recommend not using main as your daily driver. Things can be a little unstable before we finalize them in a release.

@raphCode
Copy link
Contributor Author

raphCode commented Feb 11, 2022

Regarding the crash:
I am using zellij 0.24.0 from the official Arch repos for my daily driver, also the error above occured with it.
It happened to me twice now on opening a new tab.
I cleared my plugin directory, maybe that helps. If I find a reproducer, I will take it to a dedicated issue.

I think it's a good idea to create a new label for "refactoring" or "stability" and track the issue.

I think a single label stability or server-crash suffices for issues where crashes happen in the wild. Refactoring is just the prevention of possible crash locations in the code and seems to be better discussed in PRs?

That being said, I am willing to label/organize issues around server crashes, as well as looking for locations to refactor and trying to fix them.

But, I don't think there's a need to stop adding new things. Because each person has a different view and interest for some feature or function.

Let me clarify here: I did not mean to suppress new features, just new potential crash locations by using unwrap(), expect() etc in calls that might also fail gracefully.
It is a call for more graceful error handling in zellij-server when writing new code :)

This would ensure that only the already existing code needs to be checked and refactored, not newly add one.

A quick search for unwrap and friends in zellij-server yields 1900 matches, so in practice, I think the possible refactor locations should be limited to things that call outside of zellij - like syscalls and the plugin system.
The rest of the code should not crash, and if it does it is probably a logic bug anyways.

Glad to hear you want to take this in a positive direction.

Thank you and everybody else for making zellij, which I already like infinitely more than tmux or screen, despite some crashing bugs!

@raphCode
Copy link
Contributor Author

If I should label issues related to server stability, someone needs to grant me triage access in this repo.
Also, please create a label, since triage access does not include label creation. I would call it stability because there are non-crashing issues which negatively impact sessions (like #1063).

Alternatively, we can start organizing the issues in projects which also include a kanban style view.

@jaeheonji
Copy link
Member

If you want to organize issues related to stability, you can create a new issue like issue #280 and manage it as a list. (then I can pinned the issue) How about this?

@raphCode
Copy link
Contributor Author

raphCode commented Feb 23, 2022

Perfect, haven't thought of this!
Here we go: #1100

@imsnif
Copy link
Member

imsnif commented Feb 23, 2022

Thank you very much for doing this, @raphCode ! I plan on prioritizing those issues and this is really helpful.

@imsnif
Copy link
Member

imsnif commented Mar 7, 2022

Hey @raphCode - I'm gathering a list of stability issues to work on. Did clearing the plugin cash directory fix this in the end?

@raphCode
Copy link
Contributor Author

raphCode commented Mar 8, 2022

It did not happen again, I believe this might fixed it. But I cannot tell definitely - the crash was sporadic.

@imsnif
Copy link
Member

imsnif commented Mar 9, 2022

It makes a lot of sense that this fixed it. This happened due to a plugin API change (running old plugins on a new Zellij API).
Zellij has built-in protections for this (basically versioning the plugin cache) - so I'd be very curious to understand how this happened. IIRC you installed directly from Arch?

@a-kenji a-kenji added the stability Issues in relation to stability label Mar 9, 2022
@raphCode
Copy link
Contributor Author

raphCode commented Mar 9, 2022

IIRC you installed directly from Arch

Yes, that is correct.
From the infos in the linked issues I assume this happened when I test-run zellij compiled from source: It may write something into the plugin cache that the official releases don't like.

How do you manage a local development version alongside a "production" zellij?

@a-kenji
Copy link
Contributor

a-kenji commented Mar 10, 2022

How do you manage a local development version alongside a "production" zellij?

cargo make run does that for you. You could also run a script that runs the release with the --data-dir flag, or just set that flag manually. Or install the wrapper script to your path.

@raphCode
Copy link
Contributor Author

I think I only ever used cargo run, but maybe I did something strange otherwise...

@imsnif
Copy link
Member

imsnif commented Mar 10, 2022

I think I only ever used cargo run, but maybe I did something strange otherwise...

That's probably the issue. You need to do cargo make run (more details in CONTRIBUTING.md if you're interested)

@raphCode
Copy link
Contributor Author

Ah well, that explains it.
TIL about cargo-make.
I probably should check CONTRIBUTING.md more often on github projects :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion Conversation to figure out actionable steps. Most feature ideas start here. stability Issues in relation to stability
Projects
None yet
Development

No branches or pull requests

5 participants