Fail to reprocess records from previous version #5581
Comments
I'm assigning it to @saig0 for now, mainly for myself, so I can track who's working on it and that someone is working on it.
Once we start on an issue breakdown, please create a milestone instead and close this issue (and reference the milestone).
Just to add a few more findings from my side: the issue can happen not just while upgrading, but also during a simple restart of a broker. I came across it in
@strawhat5 thanks for sharing 👍 I'm not aware of this bug. It doesn't seem to be directly related to this issue. Please open a new issue if you have more information about it or how it can be reproduced.
It is related to #5251, as we have the same parallel workflow deployed here as in #5251. In this case, though, I did not perform any version upgrade: one of the broker pods randomly restarted and went into an inconsistent state during reprocessing. I was just emphasizing, with regard to this line, that the reprocessing failure can happen even during a normal restart:
The new concept of the workflow processing is described here: ZEP 004
Describe the bug
When upgrading Zeebe to a new version, a partition can fail to start and stay unhealthy after the upgrade.
The issue is caused by a conceptual problem in the reprocessing. On reprocessing, the broker restores (rehydrates) its state (i.e. RocksDB) by reading the records on the log stream and processing them again, without writing any follow-up records. If the behavior of the workflow engine changes in the new version (e.g. through a non-user-facing refactoring or a bug fix), reprocessing may write data to the state that doesn't match the records on the log stream. As a result, the state may be corrupted, or records cannot be reprocessed because their preconditions are not fulfilled.
The issue can be avoided if a snapshot is created beforehand. If no new records are processed after the snapshot, no reprocessing is performed.
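To make the failure mode more concrete, here is a minimal sketch in Python. It is not Zeebe code: `Record`, `engine_v1`, `engine_v2`, `recover`, and the intents `CREATE`/`ACTIVATE` are all hypothetical names chosen for illustration. It assumes a toy engine where the old version wrote a follow-up record to the log, and the new version changes what the first record does to the state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    position: int  # position on the (toy) log stream
    key: int       # key of the element the record refers to
    intent: str    # hypothetical intents: "CREATE", "ACTIVATE"


class PreconditionError(Exception):
    """Raised when a record cannot be applied to the current state."""


def apply_activate(state, record):
    # Both engine versions expect the element to be in "ACTIVATING" here.
    if state.get(record.key) != "ACTIVATING":
        raise PreconditionError(
            f"record at position {record.position}: "
            f"expected ACTIVATING, found {state.get(record.key)!r}"
        )
    state[record.key] = "ACTIVATED"


def engine_v1(state, record):
    # Old behavior: CREATE leaves the element in an intermediate state,
    # and a follow-up ACTIVATE record (already on the log) finishes it.
    if record.intent == "CREATE":
        state[record.key] = "ACTIVATING"
    elif record.intent == "ACTIVATE":
        apply_activate(state, record)


def engine_v2(state, record):
    # Assumed refactoring: CREATE now activates the element directly.
    if record.intent == "CREATE":
        state[record.key] = "ACTIVATED"
    elif record.intent == "ACTIVATE":
        apply_activate(state, record)


def recover(engine, log, snapshot=None):
    """Rebuild the state: restore the snapshot (if any), then reprocess
    only the records after the snapshot position. Nothing is written
    back to the log."""
    state, last_position = ({}, 0) if snapshot is None else (dict(snapshot[0]), snapshot[1])
    for record in log:
        if record.position > last_position:
            engine(state, record)
    return state


# The log as written while the old version was running.
LOG = [Record(1, 100, "CREATE"), Record(2, 100, "ACTIVATE")]
```

With this toy model, `recover(engine_v1, LOG)` rebuilds the state, while `recover(engine_v2, LOG)` raises `PreconditionError` on the second record because the new engine already moved the element past the state that record expects. Passing a snapshot that covers position 2, e.g. `recover(engine_v2, LOG, snapshot=({100: "ACTIVATED"}, 2))`, skips reprocessing entirely and succeeds.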
To Reproduce
See #5251, #5268, #5393
Expected behavior
I can upgrade Zeebe to a new version and continue with my existing data.
Log/Stacktrace
See the linked issues.
Environment:
0.24.2, 0.25.0-SNAPSHOT