non-robust handling of HA server disconnection #26

Open
KeithSBB opened this issue Dec 23, 2023 · 12 comments

Comments

KeithSBB commented Dec 23, 2023

My HA server occasionally performs automatic software updates in the early morning and then reboots. This breaks the connection to wyoming-satellite, which then goes nuts, producing many errors in its log files:

8:30 AM:
...
ERROR:root:Unexpected error sending event to server
Traceback (most recent call last):
  File "/home/mycroft/wyoming-satellite/wyoming_satellite/satellite.py", line 128, in event_to_server
    await async_write_event(event, self._writer)
  File "/home/mycroft/wyoming-satellite/.venv/lib/python3.9/site-packages/wyoming/event.py", line 129, in async_write_event
    await writer.drain()

Sometimes it reconnects, other times it just hangs and I have to restart wyoming-satellite manually.

I'm running both wyoming-openwakeword and wyoming-satellite as services on a RP4. I also noticed that when wyoming-satellite hangs due to the server disconnecting, I can't simply restart it. I must shut down wyoming-openwakeword first (which forces a shutdown of wyoming-satellite) and then start wyoming-satellite (which will start up wyoming-openwakeword).

henne49 commented Jan 2, 2024

Not sure it will help you, but the wakeword service was not being started for me. After issuing this command it worked, as the wakeword service is now started before the satellite:

sudo systemctl enable --now wyoming-openwakeword.service

KeithSBB commented Jan 2, 2024

henne49: When my HA server updates itself and reboots, the connection to my RP4 wyoming-satellite is broken and never reconnects. Both the wyoming-satellite and wyoming-openwakeword services are still running. I have to manually restart them to regain the connection to HA, so your suggestion doesn't apply in this case. A more robust solution would be code added to reconnect to HA if the connection is lost.
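
To make the request concrete, here is a minimal sketch of what such reconnect logic could look like: retry with a capped, doubling delay whenever the connection drops. This is not the wyoming-satellite code; `connect`, `serve`, `backoff_delays` and `run_with_reconnect` are hypothetical names used only for illustration.

```python
import asyncio
import logging

_LOGGER = logging.getLogger(__name__)


def backoff_delays(base=1.0, cap=30.0):
    """Yield retry delays in seconds: base, 2*base, 4*base, ... capped at cap."""
    delay = base
    while True:
        yield delay
        delay = min(delay * 2, cap)


async def run_with_reconnect(connect, serve, *, max_attempts=None, sleep=asyncio.sleep):
    """Connect and serve, retrying after connection errors.

    connect: hypothetical coroutine function that opens the connection
    serve:   hypothetical coroutine function that uses it until it drops
    Returns the number of connection attempts made once serve() ends
    without a connection error.
    """
    delays = backoff_delays()
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        try:
            conn = await connect()
            delays = backoff_delays()  # connected: start the backoff over
            await serve(conn)
            return attempt
        except (OSError, EOFError) as err:
            # ConnectionResetError and BrokenPipeError are OSError subclasses.
            delay = next(delays)
            _LOGGER.warning("Connection lost (%s); retrying in %.0f s", err, delay)
            await sleep(delay)
    raise ConnectionError(f"gave up after {attempt} attempts")
```

With `max_attempts=None` the loop keeps retrying indefinitely, which is roughly the behaviour wanted here: the satellite would pick the connection back up once HA has finished rebooting.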

henne49 commented Jan 2, 2024

Fully agreed. In the meantime I simply restart the Raspberry Pi to fix it, but that did not work because openwakeword was not starting properly after a reboot.

KeithSBB commented Jan 3, 2024 via email

KeithSBB commented Jan 20, 2024

I just updated my installation to 1.1.1 (wyoming 1.5.2) and I'm still seeing wyoming-satellite failing to reconnect after the HA server reboots (automatically upon software updates).
This is a real pain, as I must manually shut down wyoming-openwakeword (which stops wyoming-satellite) and then start wyoming-satellite (which starts wyoming-openwakeword).

Here's the wyoming-satellite log:


9:34 AM:
INFO:root:Disconnected from server
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='wyoming event handler' coro=<AsyncEventHandler.run() done, defined at /home/mycroft/wyoming-satellite/.venv/lib/python3.11/site-packages/wyoming/server.py:28> exception=ConnectionResetError(104, 'Connection reset by peer')>
Traceback (most recent call last):
  File "/home/mycroft/wyoming-satellite/.venv/lib/python3.11/site-packages/wyoming/server.py", line 31, in run
    event = await async_read_event(self.reader)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mycroft/wyoming-satellite/.venv/lib/python3.11/site-packages/wyoming/event.py", line 79, in async_read_event
    json_line = await reader.readline()
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/streams.py", line 545, in readline
    line = await self.readuntil(sep)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/streams.py", line 637, in readuntil
    await self._wait_for_data('readuntil')
  File "/usr/lib/python3.11/asyncio/streams.py", line 522, in _wait_for_data
    await self._waiter
  File "/usr/lib/python3.11/asyncio/selector_events.py", line 995, in _read_ready__data_received
    data = self._sock.recv(self.max_size)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ConnectionResetError: [Errno 104] Connection reset by peer
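
The "Task exception was never retrieved" message in the log above is what asyncio prints when a task ends with an exception that nothing ever awaited or inspected. As a rough sketch (not the actual wyoming handler), a read loop could catch the reset itself and report it to its caller, so that the caller can decide to reconnect. `read_lines` and `on_line` are hypothetical names.

```python
import asyncio
import logging

_LOGGER = logging.getLogger(__name__)


async def read_lines(reader, on_line):
    """Read newline-terminated messages until the peer goes away.

    reader:  any object with an awaitable readline(), e.g. asyncio.StreamReader
    on_line: hypothetical callback invoked with each line (bytes)
    Returns True on a clean end of stream and False if the connection was
    reset, instead of letting the exception escape the task.
    """
    try:
        while True:
            line = await reader.readline()
            if not line:  # empty bytes: the peer closed the stream
                return True
            on_line(line)
    except (ConnectionResetError, BrokenPipeError) as err:
        _LOGGER.warning("Connection lost: %s", err)
        return False
```

A caller that receives False knows the server went away unexpectedly and can schedule a reconnect rather than leaving a dead task behind.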

Rafaille commented Jan 25, 2024

> I just updated my installation to 1.1.1 (wyoming 1.5.2) and I'm still seeing wyoming-satellite failing to reconnect after the HA server reboots (automatically upon software updates). This is a real pain, as I must manually shut down wyoming-openwakeword (which stops wyoming-satellite) and then start wyoming-satellite (which starts wyoming-openwakeword).

As a workaround you can create an automation in HA to remotely restart services on your satellite right after HA updates.

KeithSBB commented Jan 25, 2024

[Rafaille]

> As a workaround you can create an automation in HA to remotely restart services on your satellite right after HA updates.

That sounds like a good idea, but I'm not sure how to remotely restart a service from within an HA 'start' automation.
Can you provide a little more detail to point me in the right direction?
Thanks,

Update: added

shell_command:
   restart-satellite: sudo systemctl --host [email protected] restart wyoming-satellite

and I then called shell_command.restart-satellite from an HA startup automation. It doesn't work yet because of passwords and other security issues, but I think this is the right approach.

KeithSBB commented Jan 26, 2024

No luck getting the restart of the wyoming-satellite service from an automation upon HA start-up to work, as suggested by [Rafaille].

I created key pairs for ssh. It works fine in an HA terminal, but gives "Host key verification failed." when run from the automation (or service call).
No success so far in figuring out why.
This is still a serious issue for me.

@Rafaille

> No luck getting the restart of the wyoming-satellite service from an automation upon HA start-up to work, as suggested by [Rafaille].
>
> I created key pairs for ssh. It works fine in an HA terminal, but gives "Host key verification failed." when run from the automation (or service call). No success so far in figuring out why. This is still a serious issue for me.

You are indeed on the right track, I had the same issue.
You just need to add a couple of arguments. This is the shell command I use:

ssh -o StrictHostKeyChecking=no -i /config/.ssh/id_xxxxxxx [email protected] -tt "sudo systemctl restart wyoming-satellite.service"

Just replace the x with your config and make sure that the user you log in as has sudo privileges.
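
For anyone who prefers to trigger the restart from a script rather than a shell command, the same call can be sketched in Python. This is only an illustration under assumptions: `build_restart_command` and `restart_remote_service` are hypothetical helpers, the user, host and key path are whatever applies to your setup, and passwordless sudo on the satellite is assumed. `BatchMode=yes` is an addition not used in this thread; it makes ssh fail instead of waiting for a password prompt that an automation cannot answer. `StrictHostKeyChecking=no` is deliberately left out.

```python
import shlex
import subprocess


def build_restart_command(user, host, key_path, service="wyoming-satellite.service"):
    """Return the ssh argument list that restarts `service` on the remote host."""
    remote = f"sudo systemctl restart {shlex.quote(service)}"
    return [
        "ssh",
        "-o", "BatchMode=yes",  # assumption: fail rather than prompt for a password
        "-i", key_path,         # explicit private key, as in the command above
        "-tt",                  # force a tty, as in the command above
        f"{user}@{host}",
        remote,
    ]


def restart_remote_service(user, host, key_path, timeout=30):
    """Run the restart over ssh and return the CompletedProcess for inspection."""
    cmd = build_restart_command(user, host, key_path)
    return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout, check=False)
```

Passing the arguments as a list avoids a local shell, so only the remote command string needs quoting. Checking `returncode` and `stderr` on the result shows errors such as "Host key verification failed." directly.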

@KeithSBB

Adding -o StrictHostKeyChecking=no is usually considered to be a bad approach, but I tried it and along with explicitly specifying the key "-i /config/.ssh/id_rsa" it works. (I had to move id_rsa from under /root to /config)

Since I created RSA key pairs, I tried it again without '-o StrictHostKeyChecking=no' and that works too!

I guess the problem was that the automation runs under a different user than root, while the terminal is root?

Anyway, thanks for your help.

@Rafaille

Yes, sorry, I forgot to mention that I had to move the key file. Good point about StrictHostKeyChecking not being necessary; I will take it out of my command as well, just for good measure.
Happy that it works for you, although a permanent fix will be welcome eventually ;)

Mincka commented Jun 3, 2024

In my case, I tried the remote restart of the service, but sadly, it does not help. Same error as @KeithSBB.
The error is still there when I wake the satellite remotely (thanks to rhasspy/wyoming#10 and #144).
However, what seems to work is a remote reboot of the device... It will do the trick while we hope for a fix.
