Increase Tobira Worker Resilience When Opencast is Unreachable #1175

Open
geichelberger opened this issue Jun 4, 2024 · 4 comments

@geichelberger
Contributor

If Opencast becomes unreachable, the Tobira worker crashes, and after repeated unsuccessful restart attempts the systemd service ends up in a failed state. This can happen during network outages or while Opencast is being updated.

Expected behavior: the worker should handle the error and, importantly, keep running instead of exiting.
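
To illustrate the kind of behavior requested here, the following is a minimal sketch of a retry loop with capped exponential backoff. It is not Tobira's actual code: `retry_with_backoff`, the delays, and the simulated "fetch API version" closure are all hypothetical names and values chosen for the example.

```rust
use std::fmt::Display;
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: calls `op` until it succeeds. After each failure it
/// logs the error, sleeps, and doubles the delay up to `max_delay`.
fn retry_with_backoff<T, E: Display>(
    mut op: impl FnMut() -> Result<T, E>,
    initial_delay: Duration,
    max_delay: Duration,
) -> T {
    let mut delay = initial_delay;
    loop {
        match op() {
            Ok(value) => return value,
            Err(e) => {
                // Log and keep going instead of exiting the process.
                eprintln!("error synchronizing: {e}; retrying in {delay:?}");
                sleep(delay);
                delay = (delay * 2).min(max_delay);
            }
        }
    }
}

fn main() {
    // Simulated stand-in for the version check: fails twice, then succeeds.
    let mut attempts = 0;
    let version = retry_with_backoff(
        || {
            attempts += 1;
            if attempts < 3 {
                Err("503 Service Unavailable")
            } else {
                Ok("1.0")
            }
        },
        Duration::from_millis(10),
        Duration::from_millis(100),
    );
    println!("fetched version {version} after {attempts} attempts");
}
```

Capping the delay keeps the worker from waiting arbitrarily long once Opencast is reachable again, while the doubling avoids flooding the log during a longer outage.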

@LukasKalbertodt
Member

Can you give more details? The worker should in fact not fail when Opencast is down. I regularly look at three long-running Tobira systems and there the worker never failed because of an unavailable Opencast. It just prints errors to the log but recovers automatically. So you have to give me more details to reproduce your error state. What Tobira version? What exactly are you doing?

@geichelberger
Contributor Author

Log:

Jun 04 06:42:47 oc-presentation-01.xyz systemd[1]: Started Tobira Worker.
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: 2024-06-04 06:42:47.155 INFO  tobira >  Starting Tobira ~~ cli_args=["/opt/tobira/tobira", "worker", "-c", "/etc/tobira/config.toml"]
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: 2024-06-04 06:42:47.155 INFO  tobira >  Loaded config ~~ source_file="/etc/tobira/config.toml"
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: 2024-06-04 06:42:47.155 INFO  tobira >  Starting Tobira worker ...
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: 2024-06-04 06:42:47.171 INFO  tobira::db >  Connected to DB! ~~ server_version="15.3" user="tobira" session_user="tobira" schema="tobira" database="tobira"
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: 2024-06-04 06:42:47.176 INFO  tobira::db::migrations >  All migrations are already applied: database schema is up to date.
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: 2024-06-04 06:42:47.246 INFO  tobira::search >  Connected to MeiliSearch at 'https://oc-index-02.xyz:7700'
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: 2024-06-04 06:42:47.287 ERROR tobira >  error synchronizing with Opencast
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]:                                      >  
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]:                                      >  Caused by:
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]:                                      >      0: failed to fetch API version
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]:                                      >      1: API returned unexpected HTTP code 503 Service Unavailable (for 'https://xyz/tobira/version', authenticating as 'admin')
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: ▶▶▶ Error: error synchronizing with Opencast
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: Caused by:
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]:  ‣ failed to fetch API version
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]:    ‣ API returned unexpected HTTP code 503 Service Unavailable (for 'https://xyz/tobira/version', authenticating as 'admin')
Jun 04 06:42:47 oc-presentation-01.xyz systemd[1]: tobira-worker.service: Main process exited, code=exited, status=1/FAILURE
Jun 04 06:42:47 oc-presentation-01.xyz systemd[1]: tobira-worker.service: Failed with result 'exit-code'.
Jun 04 06:42:47 oc-presentation-01.xyz systemd[1]: tobira-worker.service: Scheduled restart job, restart counter is at 5.
Jun 04 06:42:47 oc-presentation-01.xyz systemd[1]: Stopped Tobira Worker.

@LukasKalbertodt
Member

Oh, so you are saying the worker cannot be started while Opencast is down? But a running worker does not go down with Opencast. Yes?

@geichelberger
Copy link
Contributor Author

Sorry, I should have been a little bit more precise.
