
Cyberpower stale commands #91

Closed
mattlward opened this issue Feb 11, 2021 · 47 comments

@mattlward

Problem/Motivation

Container file system is not complete. After the upgrade to 0.6.1 my UPS started going stale. I opened Portainer and consoled into the NUT container to edit /etc/nut/ups.conf and add pollinterval = 15, but neither vi nor nano is in the container.

I did adjust my config this way and am waiting to see if it works.

users:
  - username: upsmonmaster
    password: H8NzOjV30g471PF6TZtr
    instcmds:
      - all
    actions: []
devices:
  - name: DR_UPS
    alias: drups
    driver: usbhid-ups
    port: auto
    pollinterval: 15
    config:
      - vendorid = 0764*
mode: netserver
shutdown_host: 'false'
log_level: debug
shutdown_hassio: 'false'

I can restart NUT without complaints at this point.

I suspect that the editors went away with the Linux change in 0.6.0.

Expected behavior

I expect my ups to stay online.

Actual behavior

Data goes stale after a few hours. This is a CyberPower UPS, and they are known for needing a short pollinterval.

I may need to set DEADTIME 25, which normally lives in /etc/nut/upsmon.conf.
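
(For reference only, since the addon generates its config files itself: in a stock NUT install that is a single directive in /etc/nut/upsmon.conf, with the value in seconds.)

DEADTIME 25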

System Health

version: core-2021.2.3
installation_type: Home Assistant Supervised
dev: false
hassio: true
docker: true
virtualenv: false
python_version: 3.8.7
os_name: Linux
os_version: 4.19.0-13-amd64
arch: x86_64
timezone: America/Chicago


GitHub API: ok
Github API Calls Remaining: 5000
Installed Version: 1.11.2
Stage: running
Available Repositories: 745
Installed Repositories: 10


host_os: Debian GNU/Linux 10 (buster)
update_channel: stable
supervisor_version: supervisor-2021.02.6
docker_version: 20.10.2
disk_total: 218.1 GB
disk_used: 10.0 GB
healthy: true
supported: true
supervisor_api: ok
version_api: ok
installed_addons: Backup Hassio to Google Drive (1.7.2), Dropbox Sync (1.3.0), Duck DNS (1.12.5), FTP (4.0.0), File editor (5.2.0), Log Viewer (0.9.1), RPC Shutdown (2.2), WireGuard (0.5.0), Mosquitto broker (5.1), SSH & Web Terminal (8.0.1), Samba share (9.3.0), TasmoAdmin (0.14.0), motionEye (0.11.0), AdGuard Home (3.0.0), Portainer (1.4.0), Glances (0.11.0), Check Home Assistant configuration (3.6.0), Network UPS Tools (0.6.1), DHCP server (1.2)


dashboards: 1
resources: 3
views: 16
mode: storage

Steps to reproduce

(How can someone else make/see it happen)

Proposed changes

(If you have a proposed change, workaround or fix,
describe the rationale behind it)

@sinclairpaul
Member

The addons aren't designed for manually editing files. I would recommend looking at the upsd_maxage option as well; it can be set in the addon config.

@mattlward
Author

Paul, I just noticed that in the docs. Is this a proper config? It does load.

users:
  - username: upsmonmaster
    password: H8NzOjV30g471PF6TZtr
    instcmds:
      - all
    actions: []
devices:
  - name: DR_UPS
    alias: drups
    driver: usbhid-ups
    port: auto
    pollinterval: 15
    upsd_maxage: 25
    config:
      - vendorid = 0764*
mode: netserver
shutdown_host: 'false'
log_level: debug
shutdown_hassio: 'false'

@sinclairpaul
Member

upsd_maxage: 25 should be outside the device (i.e. at the same level as devices/mode etc).

Also shutdown_hassio is not valid anymore and can be removed.

@mattlward
Author

Thanks for the info. New config:

users:
  - username: upsmonmaster
    password: H8NzOjV30g471PF6TZtr
    instcmds:
      - all
    actions: []
devices:
  - name: DR_UPS
    alias: drups
    driver: usbhid-ups
    port: auto
    pollinterval: 15
    config:
      - vendorid = 0764*
mode: netserver
shutdown_host: 'false'
upsd_maxage: 25
log_level: debug

I will update if it stays online for 3 to 4 hours. Since the update I have only made it about 2 hours. I am surprised that directly connected hardware is less tolerant than the other units I have connected remotely to RPis; those just never fail.

@mattlward
Author

Does log_level: debug place undue load on anything? It really does provide a lot of info.

@sinclairpaul
Copy link
Member

You can remove log_level: debug; it is rarely that helpful. :)

Please let us know how you get on.

@mattlward
Author

Do I need to worry about the cruft in the /etc/nut config files that was brought forward and now can't be removed? I verified that /etc/nut/ups.conf still has the basic parameters that I have been running forever. Or is that being added to the file by the config in the addon?

@sinclairpaul
Member

The addon creates the config on startup; I'm not sure what you are referring to.

@sinclairpaul
Member

sinclairpaul commented Feb 11, 2021

Actually looking further, you would likely need:

  - name: DR_UPS
    driver: usbhid-ups
    port: auto
    config:
      - vendorid = 0764*
      - alias = drups
      - pollinterval = 15

Assuming those are valid driver options.
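
As far as the generated file goes, each entry under config should just end up as an extra line in that UPS's section of the generated /etc/nut/ups.conf, so the result would look roughly like this (a sketch only, with the same caveat about which options the driver accepts):

[DR_UPS]
  driver = usbhid-ups
  port = auto
  vendorid = 0764*
  alias = drups
  pollinterval = 15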

@mattlward
Author

This config did not work...

users:
  - username: upsmonmaster
    password: H8NzOjV30g471PF6TZtr
    instcmds:
      - all
    actions: []
devices:
  - name: DR_UPS
    driver: usbhid-ups
    port: auto
    config:
      - vendorid = 0764
      - alias = drups
      - pollinterval = 15
mode: netserver
shutdown_host: 'false'
upsd_maxage: 25

Log snippet around the NUT restart:

0.007333	User [email protected] logged into UPS [DR_UPS]
   0.001957	Logged into UPS DR_UPS@localhost
   0.002118	Poll UPS [DR_UPS@localhost] failed - Driver not connected
{"message": "Event nut.ups_event fired."}Network UPS Tools upsmon 2.7.4
   5.002540	Poll UPS [DR_UPS@localhost] failed - Driver not connected
{"message": "Event nut.ups_event fired."}Network UPS Tools upsmon 2.7.4
  10.003294	Poll UPS [DR_UPS@localhost] failed - Driver not connected
  15.003649	Poll UPS [DR_UPS@localhost] failed - Driver not connected

My config from above does not seem to process the pollinterval properly; the UPS becomes stale after about 1 hour and 45 minutes. Restarting NUT restores service.

@sinclairpaul
Member

sinclairpaul commented Feb 11, 2021

I personally have a CyberPower UPS; it has been up and running for the past 5 hours or so of testing with the following:

devices:
  - name: Cyberpower
    driver: usbhid-ups
    port: auto
    config: []
mode: netserver
shutdown_host: 'false'
list_usb_devices: true
upsd_maxage: 25

Looking at the NUT docs, the default poll interval is 2 seconds.

@mattlward
Author

I just stripped mine down to this:

devices:
  - name: DR_UPS
    alias: drups
    driver: usbhid-ups
    port: auto
    config:
      - vendorid = 0764
mode: netserver
shutdown_host: 'false'
list_usb_devices: true
upsd_maxage: 20

Maybe I just made it too complex or too smart. I am more accustomed to running on a Pi and having to configure it by hand.

Will report back.

@mattlward
Author

Well crap, no data already. It is reporting stale data in the log. Will try a system restart; I don't expect a change.

@ricarva

ricarva commented Feb 12, 2021

Same problem here: CyberPower UPS was working properly with the Addon, but now is throwing up "stale data" log messages and the sensors become unavailable.

It's unclear what the trigger is. A manual restart of the Addon seems to get things going, albeit temporarily.

I first noticed this more than two days ago, so it was either brought about by HA 2021.2 or by the NUT Addon's v0.5.0.

Let me know what I can do to provide any more useful information.

@mattlward
Author

@ricarva This morning I fell back to my last stable config; it was version 0.5.0. I did a partial restore a few moments ago. I have noticed that just unplugging and replugging the USB restored service.

I am waiting to see if this test allows the unit to remain up.

@mattlward
Author

I have exceeded 2 hours of good connection after reverting to my 0.5.0 snapshot. That is up longer than I managed on 0.6.0 or 0.6.1, but the long-term stability test will just take time.

@ricarva

ricarva commented Feb 12, 2021

@mattlward Thanks for the heads-up.

I'm still on 0.6.1 and HA 2021.2, but I can confirm your finding that unplugging/replugging the USB restores the service. Let's see for how long.

@sinclairpaul Any idea of where the issue may lie?

Thanks for the help and insights.

@mattlward
Author

mattlward commented Feb 12, 2021

Under 0.6.1 the service was restored for less than 2 hours, no different from restarting NUT. I am on HA 2021.2.3.

I am running on a Lenovo M73, so I have multiple USB controllers; changing to a different controller gave the same short-duration fix.

@mattlward
Author

@ricarva, did you edit the /etc/nut *.conf files under older versions? I did and I know the container gets rebuilt on upgrade, but my ups.conf and upsd.conf files still seemed to contain my old data when changing from 0.5.0 to 0.6.0.

It could be that it only appears that way, if when the system builds those files it reads the data from the addon config file.

@ricarva

ricarva commented Feb 12, 2021

I'm running on an RPi 3B, and my last few attempts saw NUT last less than an hour.

@mattlward

I did not edit the *.conf files.

@mattlward
Author

mattlward commented Feb 12, 2021

Not sure if this means anything... This is a graph of my free memory; the spike was the last time 0.6.1 died and I fell back to 0.5.0. My system has always appeared to have a memory leak, but free memory never really goes below around 5600 and the system never becomes unstable. If not for version changes, I would expect wonderful uptimes; I have seen 16 weeks between reboots.

[image: graph of free memory]

@assices

assices commented Feb 12, 2021

> Same problem here: CyberPower UPS was working properly with the Addon, but now is throwing up "stale data" log messages and the sensors become unavailable.
>
> It's unclear what the trigger is. A manual restart of the Addon seems to get things going, albeit temporarily.
>
> I first noticed this more than two days ago, so it was either brought about by HA 2021.2 or by the NUT Addon's v0.5.0.
>
> Let me know what I can do to provide any more useful information.

Same issue on my side. It began yesterday.
My UPS is a CyberPower VALUE600EILCD, connected by USB to a Raspberry Pi 4 (Hassio).

@sinclairpaul
Member

I just pushed an update to the edge repo, which you are welcome to test; it allows configuring the deadtime parameter. I have been running fine for at least 5 hours with it (although I got that yesterday), with the following config:

devices:
  - name: Cyberpower
    driver: usbhid-ups
    port: auto
    config:
      - pollinterval = 15
mode: netserver
shutdown_host: 'false'
upsd_maxage: 25
upsmon_deadtime: 25

Can I also suggest that when you save the config, you take a quick look at the Supervisor log, as it will report any issues with it. I will continue to test over the next day or so.
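
(For anyone unsure where to find it: the Supervisor panel's System tab shows the Supervisor log, and any warnings about the addon options should appear there right after you hit save.)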

@mattlward
Author

I will try to convert to the edge repo this evening or in the morning.

Thanks for your work and help.

@ricarva

ricarva commented Feb 12, 2021

@sinclairpaul, thanks for the quick turnaround on a possible fix.

I do wonder: do you have an idea of what made the issue manifest when it wasn't a problem in the past?

@mattlward
Author

mattlward commented Feb 12, 2021

@sinclairpaul, now on the latest edge version with the following config:

devices:
  - name: DR_UPS
    alias: drups
    driver: usbhid-ups
    port: auto
    config:
      - vendorid = 0764*
      - pollinterval = 15
mode: netserver
shutdown_host: 'false'
list_usb_devices: true
upsd_maxage: 25
upsmon_deadtime: 25

@mattlward

This comment has been minimized.

@sinclairpaul
Member

> At one time I could send them as a switch using sshpass to send commands into the NUT container via a docker exec. But that no longer works because I have lost system shell access in order to stay healthy and supported.

You can docker exec all you want; however, I think this is getting a little off topic for this issue and is likely better asked on the forums or Discord.

@mattlward
Author

Understood. Just throwing it out there.

@sinclairpaul
Member

So currently I am at 8 hours without an issue; any other updates?

@mattlward
Author

I am at 3:45 on the edge build, still looking good.

@ricarva

ricarva commented Feb 13, 2021

@sinclairpaul still the question holds: why would deadtime need to be set now, when it wasn't a problem in the past?

@sinclairpaul
Member

As mine has been running all night without an issue, I will release and close this out.

> still the question holds: why would deadtime need to be set now, when it wasn't a problem in the past?

Deadtime was already set before; it can now be adjusted. The repo has a changelog, and after spending ~20 hours of my own time on the addon this week, I'm not really going to look any further into it 😉.

It might have been the Debian change or the HA hardware layer change, but I can't do anything about either.

@sinclairpaul
Member

Hopefully fixed with v0.6.2, closing out for now.

@sinclairpaul changed the title from "Stale data and incomplete shell in docker container" to "Cyberpower stale commands" on Feb 14, 2021
@sblantipodi

@sinclairpaul 0.6.2 doesn't solve it here. Can you reopen the issue, please?

@garyak

garyak commented Feb 21, 2021

I'm also seeing stale data errors with v0.6.2.

@sinclairpaul
Member

I'm sorry folks, it is likely your configuration. I have been running for over a week with no issues, and based on the other comments I would suggest it works. To clarify, my config is:

devices:
  - name: Cyberpower
    driver: usbhid-ups
    port: auto
    config:
      - pollinterval = 15
mode: netserver
shutdown_host: 'false'
list_usb_devices: true
upsd_maxage: 25
upsmon_deadtime: 25
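
For anyone mapping these options back to the stock NUT files (a rough sketch only; the addon writes these files itself on startup), upsd_maxage and upsmon_deadtime should correspond to:

MAXAGE 25     # in upsd.conf
DEADTIME 25   # in upsmon.conf

and the pollinterval = 15 entry lands in the [Cyberpower] section of ups.conf.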

@garyak

garyak commented Feb 21, 2021

Alright, I'll duplicate your config, as it should work for me, and let you know how it goes.

@sblantipodi

sblantipodi commented Feb 22, 2021

Same here, testing Paul's config...

@ricarva

ricarva commented Feb 22, 2021

My input on the configuration piece: what made the setup stable for me was setting the poll interval.

The Maxage and Deadtime params, by themselves, were not enough.

Cheers,

@geiseri

geiseri commented Feb 22, 2021

I can confirm here that since the 0.6.2 update AND the upsmon_deadtime change it has worked perfectly for the last few days. Looking at my history, I think this might have been a regression/change/feature with the Supervisor, since I only noticed this after my last update. Either way, it now works like a charm!

@Hyrules

Hyrules commented Feb 22, 2021

I'm having this issue. Here is my config:

devices:
  - name: BR1500G
    driver: usbhid-ups
    port: auto
    config:
      - pollinterval = 15
mode: netserver
shutdown_host: 'false'
upsd_maxage: 25
upsmon_deadtime: 25

Let's see if this fixes it. I had already set the maxage and deadtime without success. I'm on 0.6.2 as well.

@geiseri
Copy link

geiseri commented Feb 22, 2021

I am not sure if it matters, but I have the serial in there because I have multiple UPSes attached.

devices:
  - name: ups_1
    driver: usbhid-ups
    port: auto
    config:
      - serial = "CXEJP2003238"
      - pollinterval = 15
  - name: ups_2
    driver: usbhid-ups
    port: auto
    config:
      - serial = "CTHGO2007041"
      - pollinterval = 15
mode: netserver
shutdown_host: 'false'
log_level: debug
list_usb_devices: true
upsd_maxage: 25
upsmon_deadtime: 25
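
(If it helps anyone else: serial is one of the standard usbhid-ups matching options, alongside vendorid and productid, so each named section should bind to the right physical unit even when several identical UPSes are plugged in.)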

@sblantipodi

Something broke in the recent HA Core; the problem started after the last HA Core update.

@Hyrules

Hyrules commented Feb 23, 2021

In my case it's still working at the moment. The solution was to add pollinterval = 15 in the device -> config option, so my config is working.

@garyak

garyak commented Feb 23, 2021

A couple days now without error. Thanks @sinclairpaul.

@sinclairpaul
Member

Thanks for all the feedback, please feel free to open new issues.

@hassio-addons hassio-addons locked as resolved and limited conversation to collaborators Feb 23, 2021