hdrecover
---------
Attempts to recover a hard disk that has bad blocks on it.
WARNING: A hard disk with bad blocks on it is likely to fail! If you
value your data you should get a new hard disk instead of using this
program!
However, if you can't afford a new hard disk, or just like being
reckless with your data then this tool might just help you out!
Requirements:
* A 2.6 kernel
* A dead hard drive (well, one with bad blocks, not a completely dead one)
* smartmontools is handy although not strictly needed
* root access
When to use:
First of all, if you haven't already got it, go and download
smartmontools (http://smartmontools.sf.net)! Then run:
smartctl -A /dev/hda
Replace hda with whichever drive you're interested in. The output
should include three attributes of interest:
* Reallocated_Event_Count
This is how many sector reallocations have already occurred on the
drive. We're hoping to get the hard disk to increase this number!
* Current_Pending_Sector
The number of sectors that the drive thinks are dodgy. Bear in mind
that drives sometimes change their mind about whether a sector is bad,
so this number can go down without a reallocation occurring.
* Offline_Uncorrectable
This is the number of sectors that the drive has tried, and failed,
to correct itself. Running the command:
smartctl -t offline /dev/hda
should cause the drive to test the sectors and attempt to fix
them. Not all drives support this, though.
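If you want to check these attributes from a script, a minimal Python
sketch could run smartctl -A and pull out each attribute's RAW_VALUE.
It assumes the ten-column table layout shown in the sample run below
(other smartctl versions or output modes may differ), and the function
names are hypothetical: they are not part of hdrecover or
smartmontools.

```python
import subprocess


def parse_smart_attributes(text):
    """Return {ATTRIBUTE_NAME: RAW_VALUE} from smartctl -A output.

    Assumes the classic ten-column attribute table; rows whose raw value
    is not a plain integer are skipped.
    """
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10
        # columns; headers and banner lines are ignored.
        if len(fields) >= 10 and fields[0].isdigit():
            name, raw = fields[1], fields[9]
            if raw.isdigit():
                attrs[name] = int(raw)
    return attrs


def read_smart_attributes(device):
    """Run smartctl -A on DEVICE (e.g. /dev/hda; needs root) and parse it."""
    # smartctl's exit status is a bitmask that can be non-zero on a sick
    # disk, so a non-zero status is not treated as fatal here.
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True)
    return parse_smart_attributes(result.stdout)
```

For example, read_smart_attributes("/dev/hda")["Current_Pending_Sector"]
gives the pending-sector count for that drive.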
So the attribute you're interested in for now is
Current_Pending_Sector. If this is not 0, there's something wrong
with your disk. If Offline_Uncorrectable is less than
Current_Pending_Sector, you may want to run an offline test, which
may fix it, or you can dive straight in and use hdrecover, which will
test (and attempt to fix) the sectors itself.
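The rule of thumb above is simple enough to write down. This
hypothetical helper (not part of hdrecover) just encodes it, taking
the raw Current_Pending_Sector and Offline_Uncorrectable values.

```python
def suggest_next_step(pending, offline_uncorrectable):
    """Encode the README's rule of thumb for two SMART raw values."""
    if pending == 0:
        # Nothing is waiting to be reallocated, so nothing to recover.
        return "disk looks OK"
    if offline_uncorrectable < pending:
        # Some pending sectors haven't been marked uncorrectable yet,
        # so an offline test may fix them.
        return "run 'smartctl -t offline', or go straight to hdrecover"
    return "run hdrecover"
```

With the "before" values from the sample run below
(Current_Pending_Sector 1, Offline_Uncorrectable 0) it suggests an
offline test or going straight to hdrecover.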
So run (as root):
hdrecover /dev/hda
and sit back and wait (or, better still, go and do something else
while you wait). If the program comes across a bad sector it will make
several attempts at reading it. Before each attempt it seeks to a
random sector so as to reposition the head, in the hope of getting the
data off the disk. If an attempt succeeds, the drive should
automatically reallocate the sector. If the data still can't be read
after several attempts, the program will give you the option of
overwriting the sector. This forces the drive to rewrite it and, if
necessary, reallocate the block.
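The retry strategy described above can be sketched in Python against a
simulated disk. Everything here is illustrative: FakeDisk, scan and
pounce are hypothetical names, the 20-sector block size and 20
attempts are assumptions loosely based on the sample run below, and
this is not hdrecover's actual code. The scan reads a block at a time;
when a block fails it reads each of its sectors individually, and any
sector that still fails is retried, with a read of a random sector
before each retry to move the head.

```python
import errno
import random


class FakeDisk:
    """Hypothetical in-memory stand-in for a block device.

    bad maps a sector number to how many more reads of it will fail
    before it reads OK; None means it never reads OK.
    """

    def __init__(self, num_sectors, bad, sector_size=512):
        self.num_sectors = num_sectors
        self.bad = dict(bad)
        self.sector_size = sector_size

    def read(self, sector, count=1):
        for s in range(sector, sector + count):
            left = self.bad.get(s, 0)
            if left is None:
                raise OSError(errno.EIO, "Input/output error")
            if left > 0:
                self.bad[s] = left - 1
                raise OSError(errno.EIO, "Input/output error")
        return bytes(self.sector_size * count)


def pounce(disk, sector, attempts=20, rng=random):
    """Retry one unreadable sector; return its data, or None if it stays bad."""
    for _ in range(attempts):
        # Read some random sector first so the head has to seek back.
        try:
            disk.read(rng.randrange(disk.num_sectors))
        except OSError:
            pass  # a failure here doesn't matter; only the seek does
        try:
            return disk.read(sector)
        except OSError:
            pass  # still unreadable; try again
    return None


def scan(disk, block=20, rng=random):
    """Scan the whole disk; return (recovered, unrecoverable) sector lists."""
    recovered, unrecoverable = [], []
    for start in range(0, disk.num_sectors, block):
        count = min(block, disk.num_sectors - start)
        try:
            disk.read(start, count)
            continue  # the whole block is readable
        except OSError:
            pass
        # The block failed: find out which sector(s) are at fault.
        for sector in range(start, start + count):
            try:
                disk.read(sector)
            except OSError:
                if pounce(disk, sector, rng=rng) is None:
                    unrecoverable.append(sector)
                else:
                    recovered.append(sector)
    return recovered, unrecoverable
```

On a real disk the reads would go to the device node instead. Here,
FakeDisk(1000, {42: 3, 505: None}) makes sector 42 readable after a
few failed reads and sector 505 permanently bad, so scan reports 42 as
recovered and 505 as unrecoverable.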
WARNING: Overwriting the sector WILL cause data loss (fairly obvious,
really!). Hopefully, since it is only one sector (or a handful if
there are more bad sectors on the drive), the lost data won't be too
important. But bear in mind that, at the very least, you should run
fsck afterwards to check the integrity of the filesystem.
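For illustration, overwriting a sector amounts to writing zeros at
offset (sector number times sector size) and reading the sector back,
roughly what the sample run below shows as "Wiping sector..." and
"Checking sector is now readable...". This Python sketch assumes
512-byte logical sectors, wipe_sector is a hypothetical name, and it
is not hdrecover's code. One caveat: a read through the page cache may
be answered from memory rather than from the disk. The sketch asks the
kernel to drop its cached copy with posix_fadvise where available, but
a real tool might prefer O_DIRECT instead.

```python
import os

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def wipe_sector(path, sector, sector_size=SECTOR_SIZE):
    """Overwrite one sector with zeros and check it reads back as zeros.

    WARNING: destroys that sector's contents. On a real disk, path would
    be the device node (e.g. /dev/hdb) and this must run as root.
    """
    offset = sector * sector_size
    zeros = bytes(sector_size)
    fd = os.open(path, os.O_RDWR)
    try:
        if os.pwrite(fd, zeros, offset) != sector_size:
            return False
        os.fsync(fd)  # flush the write out to the device
        if hasattr(os, "posix_fadvise"):
            # Best effort: drop the cached copy so the read hits the disk.
            os.posix_fadvise(fd, offset, sector_size, os.POSIX_FADV_DONTNEED)
        return os.pread(fd, sector_size, offset) == zeros
    except OSError:
        return False
    finally:
        os.close(fd)
```

Trying it on an ordinary file first is a safe way to see what it
does before pointing anything like it at a real disk.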
Once the program has finished (it takes a long time, I'm afraid), it
prints a summary showing how many bad sectors were found. If you
repeat the smartctl command:
smartctl -A /dev/hda
you should see that Current_Pending_Sector is now 0, and
Reallocated_Event_Count will have risen if the drive decided to
reallocate any sectors. Offline_Uncorrectable usually doesn't update
immediately; you have to either wait for the drive to update it itself
or run an offline test, which should force the drive to update it.
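Comparing the before and after tables by eye is error-prone. A small
hypothetical helper can list just the raw values that changed, given
two dictionaries mapping attribute names to the RAW_VALUE column of
smartctl -A.

```python
def attribute_changes(before, after):
    """Return {name: (old, new)} for every raw value that differs.

    before and after map SMART attribute names to raw values; a name that
    is missing from one side shows up as None on that side.
    """
    names = sorted(set(before) | set(after))
    return {name: (before.get(name), after.get(name))
            for name in names
            if before.get(name) != after.get(name)}
```

With the three attributes from the sample run below (before:
Reallocated_Event_Count 1, Current_Pending_Sector 1,
Offline_Uncorrectable 0; after: 1, 0, 0) it returns
{'Current_Pending_Sector': (1, 0)}.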
Have fun!
DISCLAIMER: Hard disks with bad sectors could fail at any time.
Although hdrecover appears to fix the drive, you shouldn't trust it!
And just to reinforce the GPL licence: if it breaks your computer in
any way, you get to keep both pieces! I accept no responsibility for
any damage it causes!
-------------------------------------------------------------------------------
Sample run:
SMART attribute table before:
SMART Attributes Data Structure revision number: 11
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0029   100   100   020    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   064   063   020    Pre-fail  Always       -       4623
  4 Start_Stop_Count        0x0032   096   096   008    Old_age   Always       -       2882
  5 Reallocated_Sector_Ct   0x0033   099   099   020    Pre-fail  Always       -       5
  7 Seek_Error_Rate         0x000b   100   093   023    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0012   080   080   001    Old_age   Always       -       13227
 10 Spin_Retry_Count        0x0026   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0013   100   090   020    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   097   097   008    Old_age   Always       -       1993
 13 Read_Soft_Error_Rate    0x000b   100   093   023    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   078   073   042    Old_age   Always       -       57
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       295108
196 Reallocated_Event_Count 0x0010   099   099   020    Old_age   Offline      -       1
197 Current_Pending_Sector  0x0032   100   099   020    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   092   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x001a   156   156   000    Old_age   Always       -       44
Notice that Current_Pending_Sector is 1. Running hdrecover on
/dev/hdb (it happens to be the second IDE drive):
# hdrecover /dev/hdb
hdrecover version 0.2
By Steven Price
Disk is 156355584 sectors big
Sector 28000 (00%) ETR: 1 hours 33 minutes 3 seconds
Sector 75920 (00%) ETR: 1 hours 8 minutes 36 seconds
Sector 121520 (00%) ETR: 1 hours 4 minutes 16 seconds
Sector 169400 (00%) ETR: 1 hours 1 minutes 27 seconds
Sector 217120 (00%) ETR: 59 minutes 55 seconds
Failed to read block at sector 257500, investigating futher...
Error at sector 257505
Attempting to pounce on it...
Attempt 1 from sector 117727958: FAILED
Attempt 2 from sector 149263364: FAILED
Attempt 3 from sector 84429675: FAILED
Attempt 4 from sector 81160097: FAILED
Attempt 5 from sector 156272094: FAILED
Attempt 6 from sector 75028634: FAILED
Attempt 7 from sector 63238892: FAILED
Attempt 8 from sector 116585171: FAILED
Attempt 9 from sector 102184049: FAILED
Attempt 10 from sector 5777672: FAILED
Attempt 11 from sector 81154822: FAILED
Attempt 12 from sector 69534256: FAILED
Attempt 13 from sector 90633165: FAILED
Attempt 14 from sector 150946922: FAILED
Attempt 15 from sector 18909543: FAILED
Attempt 16 from sector 29845152: FAILED
Attempt 17 from sector 24976866: FAILED
Attempt 18 from sector 52496797: FAILED
Attempt 19 from sector 104093022: FAILED
Attempt 20 from sector 127778471: FAILED
Couldn't recover sector
The data for this sector could not be recovered. However, destroying the
contents of this sector (ie writing zeros to it) should cause the hard disk
to reallocate it making the drive useable again
Do you want to destroy the sector? [y/n]:
/---------\
| WARNING |
\---------/
Up until this point you haven't lost any data because of this program
However, if you say yes below YOU WILL LOSE DATA!
Proceed at your own risk!
Type 'destroy data' to continue
destroy data
Wiping sector...
Checking sector is now readable...
Sector is now readable. But you have lost data!
Sector 257520 (00%) ETR: 30 hours 58 minutes 53 seconds
Sector 290700 (00%) ETR: 27 hours 35 minutes 18 seconds
... (loads more status info) ...
Sector 156197020 (99%) ETR: 3 seconds
Sector 156242080 (99%) ETR: 2 seconds
Sector 156287080 (99%) ETR: 1 seconds
Sector 156332200 (99%) ETR: 0 seconds
Summary:
1 bad sectors found
of those 0 were recovered
and 1 could not be recovered and were destroyed causing data loss
*****************************************
* You have wiped a sector on this disk! *
*****************************************
* If you care about the filesystem on *
* this disk you should run fsck on it *
* before mounting it to correct any *
* potential metadata errors *
*****************************************
#
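The status lines above report the current sector, the percentage done
and an estimated time remaining (ETR). One simple way to produce lines
in this format is a linear estimate that assumes the average rate so
far continues. This is an assumption about how hdrecover estimates
ETR, not taken from its source, but it would explain why the estimate
jumps from about an hour to over 30 hours after the retries: the time
spent on the bad sector counts as elapsed time while almost no sectors
were read, and the estimate then shrinks again as the scan continues
at normal speed. The function names are hypothetical.

```python
def format_etr(seconds):
    """Format seconds like the status lines, e.g. '59 minutes 55 seconds'."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    parts = []
    if hours:
        parts.append(f"{hours} hours")
    if hours or minutes:
        parts.append(f"{minutes} minutes")
    parts.append(f"{secs} seconds")
    return " ".join(parts)


def status_line(done, total, elapsed):
    """Build a status line from sectors done, total sectors and seconds elapsed."""
    percent = done * 100 // total  # truncated, so 99.98% shows as 99%
    # Linear estimate: remaining sectors at the average rate so far
    # (no rate yet at the very start, so report 0).
    remaining = (total - done) * elapsed / done if done else 0
    return f"Sector {done} ({percent:02d}%) ETR: {format_etr(remaining)}"
```

For example, format_etr(111533) gives "30 hours 58 minutes 53
seconds", the estimate shown just after the wiped sector.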
So the sector had to be wiped to recover it. fsck was then run to
check that the metadata wasn't damaged (it wasn't), and smartctl now
returns the following attribute table:
SMART Attributes Data Structure revision number: 11
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0029   100   100   020    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   064   063   020    Pre-fail  Always       -       4623
  4 Start_Stop_Count        0x0032   096   096   008    Old_age   Always       -       2882
  5 Reallocated_Sector_Ct   0x0033   099   099   020    Pre-fail  Always       -       5
  7 Seek_Error_Rate         0x000b   100   093   023    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0012   080   080   001    Old_age   Always       -       13227
 10 Spin_Retry_Count        0x0026   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0013   100   090   020    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   097   097   008    Old_age   Always       -       1993
 13 Read_Soft_Error_Rate    0x000b   100   093   023    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   078   073   042    Old_age   Always       -       57
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       295108
196 Reallocated_Event_Count 0x0010   099   099   020    Old_age   Offline      -       1
197 Current_Pending_Sector  0x0032   100   099   020    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   092   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x001a   156   156   000    Old_age   Always       -       44
Current_Pending_Sector is now 0, so the drive is 'fixed' (for
now). Also note that Reallocated_Event_Count hasn't gone up (it was 1
before), and Reallocated_Sector_Ct is still 5. This means that the
drive now has confidence in the overwritten sector and has not
replaced it with a spare. Whether you still have confidence in the
drive is another matter!