storage: SMART support #19103
For now I added a new card to the disk detail which shows basic information and allows running or aborting a self-test. Note that for now only (S)ATA devices are supported by UDisks; NVMe support will be added in the latest release: https://github.com/storaged-project/udisks/releases/tag/udisks-2.10.0 @garrett I believe you wanted to work on mockups for this, any chance you could look into a proper design?
f2b0ca0
to
00eb709
Compare
Creating tests for this would be great, maybe udisks already has a way to mock it?
pkg/storaged/smart-details.jsx
    return (
        <Card>
            <CardHeader actions={{ actions: <SmartActions smartInfo={smartInfo} /> }}>
                <CardTitle component="h2">{_("S.M.A.R.T")}</CardTitle>
I don't think that's really something one can translate?
Hmm, force of habit.
Maybe in non-Latin languages? Like Chinese or Russian or so?
Yeah, if in doubt, keep it translatable. I can't say I like the many dots, this looks a bit like a 90s superhero comic 😁 So please either "SMART", or at least make it consistent and add a period to the T as well.
Yes, it does. SmartUpdate() has an |
I am not sure how to test this. In order to run |
Neither does udisks test it (only if they find SMART disks), maybe because of the same reason? Random googling https://stackoverflow.com/questions/48351096/how-to-emulate-a-sata-disk-drive-in-qemu
So my suggestion would be to yolo hack |
FYI udisksctl has a smart-simulate option:
@tomasmatus Are you still interested in working on this? It's quite a nice feature. |
Wow, it's been a year already... yes, this is something I should revive @martinpitt |
If I recall correctly the issue was tests, so trying to experiment with |
So, I think having SMART info is good. I remember I suggested it a long time ago in some issue and/or mockup. I don't remember where or what though. Where would this go?
It's usually stylized as SMART, not S.M.A.R.T. (which is just awkward). At least in English, acronyms and initialisms usually do not include periods (https://en.wikipedia.org/wiki/Acronym), and acronyms that are proper names (including SMART) or commonly used are generally not supposed to even have periods... and they're usually (but not always) all in uppercase. It's kind of weird with all kinds of special-casing, however; Wikipedia's acronym page actually has a "nomenclature" section with some examples (https://en.wikipedia.org/wiki/Acronym#Nomenclature), and it even covers pronunciation a bit (when some are pronounced and some are spelled out... and even some that are a mixture, like "JPEG" and "MS-DOS").
As far as being detailed: I think we should show an overview and use a "progressive disclosure" concept. (That is, show the important, common stuff first and then click somewhere to see everything.)
Since it looks like you're picking this up, it would be great to talk about the design of it. 👍 I know we had the concept of SMART errors propagating all the way up to the health card on the overview too (which would then link to the details).
I am interested in this feature. Is there anything that can be done to progress the PR? |
@SchoolGuy: A recent update related to this PR: @tomasmatus is back and is planning on picking this up (as you can see in an above comment). I will be working with him on the design. If you'd like to help out, that'd be fine as well. In what ways are you interested in SMART? What data are you looking for, specifically? And when do you consult SMART data, usually? (From time to time or only when there are strange things going on as a diagnosis tool?) |
@garrett I am no designer by trade, so I am not sure which part of the design I could help with. I should be able to dig into the code and help out with implementing an idea that somebody else had. If that is not available I may be able to design things on my own, but they may be hard to use... I am interested in SMART in two ways:
The idea for this specific project is to have Cockpit running as a web UI for a single host with an mdadm array, where the resulting device is exposed as an NFS share that other hosts use. In short: I want to use the host as a NAS. I will additionally collect the SMART values with the smartd exporter and let Prometheus scrape them, but since Prometheus will be hosted on one of the other hosts, I want to be able to tell whether a failure there is caused by the mdadm array being down or by a configuration error. I am quite paranoid about my infrastructure, so I check my monitoring regularly, though not on a fixed schedule.
I'm not a developer but I'm a Fedora packager and would be very interested in seeing this in Fedora/RHEL. |
I can try to find some time to get back around to this to make some mockups. But the gist is that we need it in two places:
FWIW, here are the design ideas for disk issues (from a long time ago): #8787 (comment) SMART is mentioned, with regard to showing it in the health card:
We probably want to have some kind of status API where the overview page can query the storage page to get the SMART info and show it on the health card if there is a problem (and only if there is a problem), similar to the updates being on the health card when there are updates. (The software updates page is queried for this.) |
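To make the "only if there is a problem" idea concrete, here is a rough sketch in Python, not a proposal for the actual API. It assumes the storage page hands over a plain dict per drive using the property names from the UDisks `Drive.Ata` interface (`SmartSupported`, `SmartEnabled`, `SmartFailing`, `SmartNumBadSectors`). The function name `smart_problems` and the drive keys are made up for illustration, and whether bad sectors alone should count as a problem is an open design question.

```python
def smart_problems(drives):
    """Return one message per drive that needs attention.

    `drives` maps a drive name to a dict of SMART properties
    (hypothetical shape, modelled on UDisks Drive.Ata property names).
    An empty result means the health card shows nothing about SMART.
    """
    problems = []
    for name, props in drives.items():
        # Drives without usable SMART data are skipped, not reported.
        if not props.get("SmartSupported") or not props.get("SmartEnabled"):
            continue
        if props.get("SmartFailing"):
            problems.append(f"{name}: disk is failing")
        elif props.get("SmartNumBadSectors", 0) > 0:
            # Assumption: bad sectors are worth surfacing even when the
            # disk still reports itself as healthy.
            problems.append(f"{name}: {props['SmartNumBadSectors']} bad sector(s)")
    return problems
```

The overview would then render a health-card entry only when the returned list is non-empty, the same way pending updates are handled.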
Perfect. This is already a great amount of detail. @tomasmatus how do you want to organize so we don't run into conflicts working on the same files? |
That falls into the scope of Cockpit!
What metrics are you interested in? UDisks provides some basic health metrics as seen in the screenshot on top. For other attributes we have to use SmartGetAttributes, so if you can give us a list of interesting attributes and maybe a
Cockpit will only support showing SMART values for a single host, the host you connect to. We don't want to support anything outside of the UDisks API.
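For anyone putting together such a list, a small sketch of what picking a few attributes out of a `SmartGetAttributes` reply could look like. It assumes each entry arrives as a tuple in the field order of the D-Bus signature `a(ysqiiixia{sv})` (id, name, flags, value, worst, threshold, pretty, pretty_unit, expansion). The helper name `pick_attributes` and the attribute names in the usage note are illustrative assumptions, not a vetted list.

```python
from collections import namedtuple

# Field order assumed from the SmartGetAttributes signature a(ysqiiixia{sv}).
Attribute = namedtuple(
    "Attribute",
    "id name flags value worst threshold pretty pretty_unit expansion",
)


def pick_attributes(raw, wanted):
    """Keep only the attributes whose name is in `wanted`, keyed by name."""
    picked = {}
    for entry in raw:
        attr = Attribute(*entry)
        if attr.name in wanted:
            picked[attr.name] = attr
    return picked
```

Called with something like `pick_attributes(reply, {"reallocated-sector-count", "power-on-hours"})`, this returns just those entries, so the UI never has to deal with the full and often misleading attribute dump.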
For reference, udisks reports only this information from SMART:
It's not much, but it's probably enough in most cases? If this is all, then we can't do the expand idea, unless we get more information, like from |
Is it worth looking to see how gnome-disk-utility looks up the extra data? |
See my comment:
I've looked into testing this in a test virtual machine. The main issue is that even if I add a SATA disk, udisks does not seem to identify it as an ATA drive. On my NAS:
Right, that's the extended data that we're definitely not going to show by default. It's way too verbose and misleading. We could, in a third stage, add all the details similar to that, possibly in a modal dialog similar to that. But, we should handle SMART in these steps:
This first PR should implement step # 1. Then we need to do a follow-up for step # 2 (overview). Then eventually add a step # 3 for more details. We should not do this all in one go, nor should we just show all the data points SMART can give us. (Many are very misleading, scary, or not relevant depending on the circumstance. And it varies per manufacturer and storage medium (spinning disks of different types, SSD, NVMe, etc.). It's a mess, really.) (I think @tomasmatus and I will talk about this more either later this week or next week.) |
For testing SMART in CI, there is a udisks issue now https://issues.redhat.com/browse/STORAGECFG-801 |
Please check this release out: https://github.com/storaged-project/udisks/releases/tag/udisks-2.10.90
This works now in UDisks, however it's still an However I've found this hidden workaround in Let me know if it helps your testing use case. |
00eb709
to
a19bc7f
Compare
I installed newest udisks from copr and played a bit with
After running the VM I can simulate SMART info with
I tried to look into this as that is for a different tool, not virsh. I didn't find any mentions of |
Yeah, triggering SMART tests wouldn't work on any emulated device as far as I know.
You'd likely need to tweak the
That's a qemu commandline, as libvirt is just a wrapper, it'll likely have corresponding features in a slightly different syntax (i.e. |
(notes to self, will update with more later) Testing this on my trusty WD friend: I am able to start the test and see "live" updates on the progress status: I can abort the test while it's running: I can see one bad sector on my SSD. The disk still reports itself as healthy. |
We need udisks-nightly, or rather the udisks pre-release, for testing this feature, as it now uses udev instead of direct drive access for determining disk types etc. This apparently works better for emulating SMART data in a virtual machine. We have two options to get this tested:
In the test you would install those packages and kind of pray you have no dependency resolution issues 😄 Now for adding the SATA disk, that's a good question... It is officially not supported. We can't easily override the so an idea I had was:
    class TestSmart(testlib.MachineCase):
        provision = {
            "0": {"drive_bus": "sata"},
        }
And then you'll need to hack on your own bots checkout:
Basically the domain XML to get a drive_bus format argument
Thanks @jelly. I agree that we should only throw so much effort on automating the test here -- if it's not possible with current udisks, there's that. But as far as I understand it, this is exclusively about testing with a mock smart device, right? i.e. current udisks can trigger smart in current versions, just not inject mock data? Please test it carefully on actual hardware. The tests should then do a feature test (like udisks version) so that it can run in the /nightly scenario -- you can easily trigger that in PRs. Then we have at least one place where it's covered, and over time it will flow into the distros. IMHO adding a package set etc. is overkill. |
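A feature test along those lines could be as small as the following sketch. It assumes the udisks 2.10.90 pre-release mentioned earlier in this thread is the first version that can inject mock SMART data; the constant and the helper names `parse_version` and `supports_smart_mock` are hypothetical. How the test obtains the version string (package query, or the version reported by the UDisks manager object) is left open.

```python
import re

# Assumption: mock SMART data needs at least the pre-release from this thread.
MIN_MOCK_VERSION = (2, 10, 90)


def parse_version(text):
    """Turn '2.10.90' or '2.10.90-1.fc41' into a tuple of ints."""
    match = re.match(r"(\d+(?:\.\d+)*)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def supports_smart_mock(version_text):
    """True if this udisks version is new enough for the mock-based test."""
    return parse_version(version_text) >= MIN_MOCK_VERSION
```

The test would skip itself when `supports_smart_mock()` returns false, so it only really runs in the nightly scenario until the distros catch up.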
Correct, current udisks works with the real hardware. The newer version (pre-release) is only needed for virtual disks so I can mock something in the test. @martinpitt
I am trying to do this as much as I can. I can do that on my laptop with Fedora 41 and my desktop with Arch, which together cover all three cases: HDD, SATA SSD and NVMe device.
Initial SMART self test support
temporary placement, wip
3bda430
to
9bb8359
Compare
Needed in order to test SMART disk info in CI cockpit-project/cockpit#19103
Initial SMART self test support
Uses data available in UDisks http://storaged.org/doc/udisks2-api/latest/gdbus-org.freedesktop.UDisks2.Drive.Ata.html
https://issues.redhat.com/browse/COCKPIT-935