Increase Tobira Worker Resilience When Opencast is Unreachable #1175

geichelberger · 2024-06-04T07:21:29Z

If Opencast becomes unreachable, the Tobira worker crashes, and the systemd service ends up in a failed state once its restart attempts are exhausted. This can happen during network outages or Opencast updates.

The expected behavior is for the worker to handle the error and, importantly, keep running instead of exiting.
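
To illustrate the kind of behavior meant here, the following is a minimal, std-only sketch of retrying a failing operation with capped exponential backoff instead of exiting on the first error. It is not Tobira's actual code: `retry_with_backoff`, `simulated_startup`, and all parameters are hypothetical names chosen for illustration, and the real worker is presumably asynchronous rather than blocking.

```rust
use std::fmt::Display;
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: runs `op` until it succeeds, sleeping between attempts.
/// The delay doubles after every failure but never exceeds `max_delay`.
/// With `max_attempts = None` the helper retries forever.
fn retry_with_backoff<T, E: Display>(
    mut op: impl FnMut() -> Result<T, E>,
    initial_delay: Duration,
    max_delay: Duration,
    max_attempts: Option<u32>,
) -> Result<T, E> {
    let mut delay = initial_delay;
    let mut attempt = 1u32;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) => {
                // Give up only if the caller set a limit and it has been reached.
                if max_attempts.map_or(false, |max| attempt >= max) {
                    return Err(err);
                }
                eprintln!("attempt {attempt} failed: {err}; retrying in {delay:?}");
                sleep(delay);
                delay = (delay * 2).min(max_delay);
                attempt += 1;
            }
        }
    }
}

/// Simulated startup (assumption for illustration only): the first two
/// "requests" fail as if Opencast answered 503, the third one succeeds.
fn simulated_startup() -> (Result<&'static str, &'static str>, u32) {
    let mut calls = 0;
    let result = retry_with_backoff(
        || {
            calls += 1;
            if calls < 3 {
                Err("503 Service Unavailable")
            } else {
                Ok("reachable")
            }
        },
        Duration::from_millis(1),
        Duration::from_millis(4),
        None, // retry indefinitely
    );
    (result, calls)
}
```

In this sketch the process only returns an error when an explicit attempt limit is set, so a temporary outage leads to logged retries rather than a non-zero exit that systemd has to handle.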

LukasKalbertodt · 2024-06-05T10:12:05Z

Can you give more details? The worker should in fact not fail when Opencast is down. I regularly look at three long-running Tobira systems and there the worker never failed because of an unavailable Opencast. It just prints errors to the log but recovers automatically. So you have to give me more details to reproduce your error state. What Tobira version? What exactly are you doing?

geichelberger · 2024-06-05T14:45:55Z

Log:

Jun 04 06:42:47 oc-presentation-01.xyz systemd[1]: Started Tobira Worker.
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: 2024-06-04 06:42:47.155 INFO  tobira >  Starting Tobira ~~ cli_args=["/opt/tobira/tobira", "worker", "-c", "/etc/tobira/config.toml"]
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: 2024-06-04 06:42:47.155 INFO  tobira >  Loaded config ~~ source_file="/etc/tobira/config.toml"
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: 2024-06-04 06:42:47.155 INFO  tobira >  Starting Tobira worker ...
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: 2024-06-04 06:42:47.171 INFO  tobira::db >  Connected to DB! ~~ server_version="15.3" user="tobira" session_user="tobira" schema="tobira" database="tobira"
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: 2024-06-04 06:42:47.176 INFO  tobira::db::migrations >  All migrations are already applied: database schema is up to date.
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: 2024-06-04 06:42:47.246 INFO  tobira::search >  Connected to MeiliSearch at 'https://oc-index-02.xyz:7700'
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: 2024-06-04 06:42:47.287 ERROR tobira >  error synchronizing with Opencast
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]:                                      >  
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]:                                      >  Caused by:
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]:                                      >      0: failed to fetch API version
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]:                                      >      1: API returned unexpected HTTP code 503 Service Unavailable (for 'https://xyz/tobira/version', authenticating as 'admin')
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: ▶▶▶ Error: error synchronizing with Opencast
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: Caused by:
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]:  ‣ failed to fetch API version
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]:    ‣ API returned unexpected HTTP code 503 Service Unavailable (for 'https://xyz/tobira/version', authenticating as 'admin')
Jun 04 06:42:47 oc-presentation-01.xyz systemd[1]: tobira-worker.service: Main process exited, code=exited, status=1/FAILURE
Jun 04 06:42:47 oc-presentation-01.xyz systemd[1]: tobira-worker.service: Failed with result 'exit-code'.
Jun 04 06:42:47 oc-presentation-01.xyz systemd[1]: tobira-worker.service: Scheduled restart job, restart counter is at 5.
Jun 04 06:42:47 oc-presentation-01.xyz systemd[1]: Stopped Tobira Worker.

LukasKalbertodt · 2024-06-05T15:57:06Z

Oh, so you are saying the worker cannot be started while Opencast is down? But a running worker does not go down with Opencast. Yes?

geichelberger · 2024-06-05T18:52:45Z

Sorry, I should have been a little bit more precise.
