Smart agent reports only null and zero values for SAS drives #390

Open
snowsnoot opened this issue Dec 21, 2021 · 3 comments
@snowsnoot

The smart script prints only 'null' and '0' values for my SAS drives. I get a few more values for my SATA SSD, but still a few nulls:

sda is the SSD; sdb through sdi are SAS drives (HP MB3000FBUCN).

# ./smart -c /etc/snmp/smart.config
sda,null,null,0,null,null,null,0,null,null,22,0,null,null,0,98,3649,0,0,0,0,0,0,0,0,9319
sdb,0,null,null,null,null,null,null,null,null,38,null,null,null,null,null,null,0,0,0,0,0,0,0,0,null
sdc,0,null,null,null,null,null,null,null,null,39,null,null,null,null,null,null,0,0,0,0,0,0,0,0,null
sdd,47,null,null,null,null,null,null,null,null,37,null,null,null,null,null,null,0,0,0,0,0,0,0,0,null
sde,244,null,null,null,null,null,null,null,null,40,null,null,null,null,null,null,0,0,0,0,0,0,0,0,null
sdf,0,null,null,null,null,null,null,null,null,39,null,null,null,null,null,null,0,0,0,0,0,0,0,0,null
sdg,105,null,null,null,null,null,null,null,null,38,null,null,null,null,null,null,0,0,0,0,0,0,0,0,null
sdh,1,null,null,null,null,null,null,null,null,39,null,null,null,null,null,null,0,0,0,0,0,0,0,0,null
sdi,63,null,null,null,null,null,null,null,null,37,null,null,null,null,null,null,0,0,0,0,0,0,0,0,null

Config file:

# cat /etc/snmp/smart.config
useSN=0
smartctl=/usr/sbin/smartctl
cache=/var/cache/smart/cache
sda /dev/sda -d sat
sdb /dev/sdb -d scsi
sdc /dev/sdc -d scsi
sdd /dev/sdd -d scsi
sde /dev/sde -d scsi
sdf /dev/sdf -d scsi
sdg /dev/sdg -d scsi
sdh /dev/sdh -d scsi
sdi /dev/sdi -d scsi
@napaster

napaster commented Jun 2, 2022

Same story with SAS disks.

@JvGinkel
Contributor

JvGinkel commented Nov 3, 2022

The smart script parses the smartctl output, picks out specific attribute IDs, and prints their values, as you can see here: https://github.com/librenms/librenms-agent/blob/master/snmp/smart#L442-L444

$toReturn=$toReturn.$disk_id.','.$IDs{'5'}.','.$IDs{'10'}.','.$IDs{'173'}.','.$IDs{'177'}.','.$IDs{'183'}.','.$IDs{'184'}.','.$IDs{'187'}.','.$IDs{'188'}
	    .','.$IDs{'190'} .','.$IDs{'194'}.','.$IDs{'196'}.','.$IDs{'197'}.','.$IDs{'198'}.','.$IDs{'199'}.','.$IDs{'231'}.','.$IDs{'233'}.','.
		$completed.','.$interrupted.','.$read_failure.','.$unknown_failure.','.$extended.','.$short.','.$conveyance.','.$selective.','.$IDs{'9'}."\n";
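
To make the comma-separated lines in the report easier to read, the column order in the quoted Perl can be turned into labels. The following is a minimal Python sketch, not part of the agent (which is Perl). The names `COLUMNS` and `label_line` are made up for illustration, and the ordering is copied only from the three quoted lines.

```python
# Column order copied from the quoted Perl: disk id, SMART attribute IDs,
# the eight self-test counters, then attribute 9. Names are illustrative only.
COLUMNS = (
    ["disk"]
    + ["id%d" % i for i in (5, 10, 173, 177, 183, 184, 187, 188,
                            190, 194, 196, 197, 198, 199, 231, 233)]
    + ["completed", "interrupted", "read_failure", "unknown_failure",
       "extended", "short", "conveyance", "selective"]
    + ["id9"]
)


def label_line(line):
    """Pair each comma-separated field of one output line with its column."""
    fields = line.strip().split(",")
    if len(fields) != len(COLUMNS):
        raise ValueError("expected %d fields, got %d" % (len(COLUMNS), len(fields)))
    return dict(zip(COLUMNS, fields))


if __name__ == "__main__":
    sas = label_line("sdb,0,null,null,null,null,null,null,null,null,38,"
                     "null,null,null,null,null,null,0,0,0,0,0,0,0,0,null")
    # Show only the columns that are not "null" for this SAS drive.
    print({k: v for k, v in sas.items() if v != "null"})
```

Under that ordering, the SAS lines in the report carry a value only in the ID 5 and ID 194 columns and in the self-test counters; every other attribute column is null.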

So maybe you can run smartctl on the CLI, see which IDs are returned and with what values, and check whether they match what the script outputs.

For example one of my disks gives:

smartctl -a /dev/sda 

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   080   080   000    Old_age   Always       -       98683
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       42
177 Wear_Leveling_Count     0x0013   095   095   000    Pre-fail  Always       -       53
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   056   048   000    Old_age   Always       -       44
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       27
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       30079114404

You can see that, for example, IDs 10 and 173 are missing, so they show up as null in the script output because there is nothing to parse.
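
The lookup described here can be sketched roughly as follows. This is a Python illustration of the idea, not the agent's actual Perl. `WANTED_IDS` and `parse_ata_attributes` are hypothetical names, and the sketch assumes the ten-column ATA attribute table layout shown above, with the raw value in the tenth column.

```python
# IDs taken from the quoted output line of the script.
WANTED_IDS = (5, 10, 173, 177, 183, 184, 187, 188, 190,
              194, 196, 197, 198, 199, 231, 233, 9)


def parse_ata_attributes(text):
    """Return {id: raw_value or 'null'} for every ID in WANTED_IDS."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # An attribute row has at least ten columns, starts with a numeric ID
        # and has a hex flag in the third column (assumed from the table above).
        if len(fields) >= 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            found[int(fields[0])] = fields[9]
    # Any wanted ID that never appeared in the table is reported as 'null'.
    return {i: found.get(i, "null") for i in WANTED_IDS}
```

Fed the table above, this would return the raw values for IDs such as 5, 9, 177 and 190, and 'null' for IDs such as 10 and 173 that the drive does not report.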

@rci-kmccolm

I think the issue is with how the script parses the smartctl output. Here is what smartctl gives against my SAS drive. As you can see, it is very different from what you posted.

# smartctl -a /dev/sdb
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-5.19.16-200.fc36.x86_64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HP
Product:              MB3000FBUCN
Revision:             HPD2
Compliance:           SPC-3
User Capacity:        3,000,592,982,016 bytes [3.00 TB]
Logical block size:   512 bytes
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca01a7830fc
Serial number:        YHJ4342D
Device type:          disk
Transport protocol:   SAS (SPL-4)
Local Time is:        Thu Nov  3 09:39:26 2022 EDT
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     40 C
Drive Trip Temperature:        65 C

Accumulated power on time, hours:minutes 52796:53
Manufactured in week 12 of year 2012
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  136
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  2183
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0   446238         0         0          0     146784.484           0
write:         0 49852631         0  49852631          0     209954.084           0
verify:        0       18         0        18          0        437.992           0

Non-medium error count:     2204

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -   27867                 - [-   -    -]
# 2  Background short  Completed                   -   25967                 - [-   -    -]
# 3  Background short  Completed                   -   25967                 - [-   -    -]
# 4  Background short  Completed                   -   25898                 - [-   -    -]
# 5  Background long   Completed                   -   25898                 - [-   -    -]
# 6  Background short  Completed                   -      24                 - [-   -    -]
# 7  Background short  Completed                   -      21                 - [-   -    -]

Long (extended) Self-test duration: 27182 seconds [7.6 hours]
