Feature Request: show temperature #48
Comments
Since smartmontools is already present in ShredOS, the following commands could be used as a fallback in case drivetemp doesn't deliver... |
Thanks for that info, saves me a lot of time and should be quick to implement. In fact after adding the code so SATA shows up on your controller as well as SAS, I think I'll add these next. I would think an update interval of 20 seconds should be sufficient? I wouldn't have thought the drive temperature changes too fast. Maybe even once a minute. |
A minute should be fine until proven otherwise :) I'd think that drivetemp would be the preferred first choice as it's agnostic and doesn't rely on 3rd-party tools. Since it's using common ATA commands it should cover IDE, SATA, SAS and possibly even NVMe drives. As a fallback it will read out the temperature from SMART, according to their documentation:
The temperature is SMART attribute 194, btw. Many drives (older ones especially) don't have that attribute, or the FW implements it incorrectly, or there is some other odd reason out of our control, so even that can fail. If that fails, then smartctl/smartmontools or other 3rd-party tools will ALSO not be able to tell you the temp. |
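For reference, a minimal sketch of pulling attribute 194 out of the text that `smartctl -A` prints. It assumes the usual ATA attribute table, where the raw value is the tenth whitespace-separated column; `smart_temp_from_attrs` is a made-up helper name, and a drive that doesn't report the attribute simply produces no output.

```shell
# Print the raw value of SMART attribute 194 from `smartctl -A` text on stdin.
# Assumption: ATA attribute table layout, raw value in column 10.
# Prints nothing when the attribute is missing.
# smart_temp_from_attrs is a hypothetical helper name, not part of nwipe.
smart_temp_from_attrs() {
    awk '$1 == "194" { print $10; exit }'
}

# Typical use (needs root and smartmontools):
#   smartctl -A /dev/sda | smart_temp_from_attrs
```

An empty result is the case described above: the attribute is absent, so no tool reading SMART data can report a temperature.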
Drivetemp it is, I'll leave this link here as a reminder https://www.kernel.org/doc/html/latest/hwmon/drivetemp.html for next week |
Just some work notes
Sample output for
|
cross-reference martijnvanbrummelen/nwipe#290 |
Temperature feature currently being written: Hopefully I will have a version ready for testing by the end of this week. |
Nice. Looking forward to it. Thanks so much. |
A couple of other things that I've noticed.
One other thought: In regards to the meaning of the colours that are used to display temperature:
|
Thorough! Hats off. Nothing to add really. This is it. Maybe one side note to put things into perspective: USB/SATA adapters seem to get really hot, too. Due to the sustained max data transfers during the wipe the IC/chips seem to be able to reach a critical (undocumented) temperature much quicker... and fail. Way earlier than the drive would fail ;) I've had this happen with a cheap adapter in the past. This is just anecdotal and I'm not saying it happens all the time or with all adapters. But something to consider. |
I have the same problem with USB adapters, even had one smoke and go pop! I tend to use an ATX PSU now to power the 3.5" drives I plug into the USB port, especially if I'm using an older drive that pulls a lot more startup current. I've added a little demo routine (which will get stripped out now it's working) just to test the GUI logic in terms of displaying the temperature in its different forms. The limits I've set are not real world but they will do just to test the code. The real limits will be obtained from the drive and if not available the temperature will remain the default white text on blue background. nwipe_demo_temperature-2021-11-10_22.47.24.mp4 |
committed martijnvanbrummelen/nwipe#360 |
Thanks for the video. Seems to work as intended. I looked at the committed code and was surprised how much work had to be put in. This should be a reminder anytime a feature addition is requested :) |
No problem. Regarding the amount of work, yes indeed. For what on the face of it is a small addition, it does take a fair amount of code and time. But then nwipe is, I suppose, a fairly complex program, what with its multiple threads and its ncurses interface. I just did a search online to see how much a C programmer in the US gets paid, apparently $40 per hour (I'm assuming that's full time, rather than contract), but based on that hourly rate, adding the temperature feature would have resulted in an invoice of about $900 and that's before I even make the package changes to ShredOS and add the latest nwipe. Luckily I do it for the love of coding :-) BTW I'll be uploading the latest ShredOS with the nwipe v0.32.003 (temperature edition) this evening. |
Now the major part of adding the temperature feature is done, making other additions, like having an nwipe option that either pauses a given thread or even shuts down the computer when the temperature reaches critical should be fairly straightforward (famous last words) :-) nwipe --temperature_critical=ACTION
|
@Firminator @Carl-Wilhelm shredos-2020.05.017_x86-64_0.32.003_20211111.img |
OK thank you so much for the IMG. That'll help testing on different platforms. I did a quick test earlier compiling NWIPE on my production box (without running an actual wipe) and the log said it detected the temperature on 2 out of 4 drives. So that's quite something already. One of the two failed drives was a SATA SSD with a botched/buggy FW, so drivetemp basically failed with its native capabilities ("it uses the ATA SCT Command Transport feature to read the current drive temperature") and also its fallback (reading out SMART attribute 194). Nothing anyone can do here. |
One more addition... drivetemp detects the temperature of the NVMe drive:
So
|
That's great, thanks for the detailed explanation. Regarding that nvme drive that doesn't show up in nwipe but does appear in hwmon, can you give me a list of all the files/directories under |
As a reminder to myself, if we end up using smartmontools data extensively in the new user interface, then I should consider using the --json output mode of smartctl and combining that with the json parser library https://github.com/PartialVolume/jsmn to generate a structure in C containing the smartmontools output. Currently I'm simply parsing the text output using string manipulations, which is ok for the serial number info as it's only one item (and relies upon smartmontools not drastically changing their current text format), but if I'm going to start pulling more data out of smartmontools a more formal approach using the jsmn library would be preferable. |
I think I'll put the temperature in the drive selection menu as well (if it's not too involved) |
.iso now available for v2020.05.017_x86-64_0.32.003 |
I'll try to get the data from that other device. I don't have it available right now.
... so I think this is a local issue just on my production box. I'll check "all the files/directories under sys/class/hwmon/hwmon1" on that box later this evening.
That would be super-helpful. 👍 Thanks for the ISO also. Much appreciated. |
Yes
|
and then
|
Perfect, that gives me what I need to match the device detected by the 'ped_device_get_next' function (libparted) against the correct hwmon directory. |
Hallelujah. We are getting there. |
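A rough sketch of that matching step, written as a shell function rather than nwipe's C code. It assumes each hwmon directory exposes the drive at `hwmonN/device/block/<name>`, which may not hold for every device class; `find_hwmon_dir` and its arguments are hypothetical names used only for illustration.

```shell
# Find the hwmon directory that belongs to a block device such as "sda".
# Assumption: layout <root>/hwmonN/device/block/<name>; other layouts
# (for example some NVMe devices) would need additional search paths.
# find_hwmon_dir is a hypothetical helper; <root> is normally /sys/class/hwmon.
find_hwmon_dir() {
    root=$1
    name=$2
    for dir in "$root"/hwmon*; do
        if [ -e "$dir/device/block/$name" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    # No hwmon directory references this device.
    return 1
}

# Typical use:
#   find_hwmon_dir /sys/class/hwmon sda
```

Passing the root directory as an argument keeps the function testable against a fake directory tree.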
I am wondering though if the temperatures found in |
Good point. Only time will tell. I have another device with an NVME disk available later today and can test it on there as well... even though it wouldn't give a definite answer to your thought. |
Uploaded latest martijnvanbrummelen/nwipe#380 |
Log has some odd errors though
|
@Firminator Did you want me to update shredos with nwipe v0.32.012 ? It will take a couple of hours to build, I'll set it going before hitting the sack, so will be available about 10:00hrs UTC+0 tomorrow UK |
I think we should be good to go regarding the temperature readout for now. Take your time though with the compile. There is no pressure. It can wait.
|
That error message needs rewording. It should say "error: hwmon: Can't open /sys/class/hwmon/hwmonX/block" as it first tries to open block devices. I need to correct that. It's also (or should be) a verbose message so normally won't appear in the logs |
I see. For clarification: this output was done with -v so I guess there is nothing to fix there except for your mentioned wording. |
I need to add a third path to look for this device
I'll make the changes this morning and let you know when it's ready. Regarding the blank screen issue, I've built a version of shredos with framebuffer support, so commands like fbgrab image.png or fbgrab -C2 image.png now work for taking a snapshot of the three virtual screens. Tested in both legacy and UEFI. Legacy defaults to 640x480 on my DELL optiplex and I see the video resolution switch as it boots up, and hopefully the resolution should be user switchable via the command line or linux kernel line. In UEFI it comes up in 1024x768. I'm using a 4:3 monitor, hence the resolutions it's defaulting to. Fingers crossed this will fix the blank screen issue. fbgrab -v image.png will tell you what mode you're in. Regarding devices /dev/sdc and /dev/sdd above, do they output any temperature info via smartctl -a or are they the drives that don't seem to produce any temperature data? |
Latest nwipe available for testing martijnvanbrummelen/nwipe#381 Updated hwmon search path to handle more nvme devices. |
Yes, there is no temp output from smartctl. I checked recently and found that the firmware doesn't even provide the SMART attribute 194 for temp. The tech specs for that drive series don't mention it at all. My guess is that INTEL decided to not put a temp. sensor in the drives (to drive costs down maybe). It's odd as you would expect that INTEL drives do have all the features. It's not that temp reading in storage is a new thing. |
|
That's great, I'll release Shredos with the latest version. Then after that, I'll start on implementing the smartctl json structure so I can update the USB temperatures (where the adapter supports ATA pass through). |
I think I'll close this and open a new issue regarding adding temperature support to USB devices. |
I ran nwipe yesterday on another device that couldn't boot ShredOS at all. It came preinstalled with Ubuntu, so I took advantage of that and compiled nwipe. NVMe temperature readout was successful on an M.2 port connected to PCIe Gen3, so I think we've got this covered pretty well. Awesome. |
Excellent, glad the temperature readouts are working well. Did ShredOS just display the Recalbox message and get no further? |
For these systems that don't appear to boot but in fact are running nwipe, I could have ShredOS pipe the output of dmesg to the USB flash. Then we could see whether anything is showing up in the logs. |
Yes, this device is latest gen and the BIOS is UEFI-only with no option to set it to Legacy. Intel GPU. No additional video. It just hangs saying Recalbox.
Yes good idea. Pipe out the whole boot and initialization process to the flash drive. |
Good news. NVMe temperatures show up with 2 NVMe drives as well. Well done PartialVolume! I don't have a picture right now, but nwipe is now more advanced in that regard than GSmartControl/smartmontools. Who would have thought. Once I get the 3rd drive I'll run another test. |
That's great !, I'm still running tests on the new build based on kernel rev. 5.13.19. It probably won't be ready for a day or two but I'll keep you updated. |
[I took this picture 2021-11-30, but the latest ShredOS version from this week looks identical. So this works and this thread could be closed.] On a side note (and this should probably go into a separate issue): |
Excellent. The drive names appear in whatever order libparted decides to find the drives. The drive contexts are then created in that same order. Probably best to open that as a separate issue. If it was to be fixed I'd probably do that at the libparted stage. After finding all the drives, sort the list then create the contexts, the GUI displays the list in order of context c[0]-> through c[n]->. I probably wouldn't do a sort in the GUI as that code is already pretty convoluted. |
For NVMe:
source https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.5-NVMe-HWMON-Support
grep . /sys/class/nvme/nvme0/device/hwmon/hwmon[1-9]/{name,temp1_input}
The number 1 to 9 probably depends on the mainboard/device and will list temperatures from other devices as well.
Example on my box:
cat /sys/class/nvme/nvme0/device/hwmon/hwmon3/temp1_input
The value that it spits out (e.g. 31850) needs to be divided by 1000 for human readable format, i.e. 31.85 Celsius
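The division by 1000 can be sketched as a small shell helper. `millideg_to_celsius` is a hypothetical name; it assumes the input is an integer number of millidegrees Celsius, as read from a `temp1_input` file.

```shell
# Convert a hwmon reading in millidegrees Celsius to degrees with two decimals.
# millideg_to_celsius is a hypothetical helper name, not part of nwipe.
millideg_to_celsius() {
    awk -v raw="$1" 'BEGIN { printf "%.2f\n", raw / 1000 }'
}

millideg_to_celsius 31850   # prints 31.85
```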
nvme smart-log /dev/nvme0 | grep -i '^temperature'
For SATA: