
TensorFlow model trained on Backblaze disk drive failure statistics to predict future errors on disks


bobbae/tensorblaze


TensorFlow model for Backblaze data

The Backblaze disk failure dataset contains disk drive failure statistics broken down by vendor and model, along with SMART attributes and some other information.

With TensorFlow, a model can be trained on this data to predict failure rates. First, the Backblaze data was cleaned up and preprocessed with some Python scripts; the cleaned data was then used to train the model.
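The preprocessing scripts are not shown in this README. As a minimal sketch of that step, assuming the standard columns of a Backblaze daily CSV (`model`, `failure`, `smart_N_raw`, ...), the code below splits the vendor off the model name and keeps five SMART raw counters. The choice of SMART attributes (5, 187, 188, 197, 198), the label-first column order, and the names `split_vendor` and `preprocess` are all hypothetical, picked only to match the shape of the `-d` record in the prediction example (vendor, model, five numbers).

```python
import csv

# Hypothetical feature choice: five SMART raw counters. The repository's
# actual scripts may keep different columns.
SMART_COLUMNS = ["smart_5_raw", "smart_187_raw", "smart_188_raw",
                 "smart_197_raw", "smart_198_raw"]


def split_vendor(model_field):
    """Split a Backblaze 'model' value such as 'HGST HMS5C4040ALE640'
    into (vendor, model). Seagate models usually have no vendor prefix
    (e.g. 'ST4000DM000'), so guess the vendor for those."""
    parts = model_field.strip().split(" ", 1)
    if len(parts) == 2:
        return parts[0], parts[1]
    vendor = "Seagate" if parts[0].startswith("ST") else "unknown"
    return vendor, parts[0]


def preprocess(lines):
    """Turn Backblaze CSV lines (header first) into rows of
    [failure, vendor, model, smart...]. Empty SMART cells become "0"."""
    rows = []
    for record in csv.DictReader(lines):
        vendor, model = split_vendor(record["model"])
        smart = [record.get(col) or "0" for col in SMART_COLUMNS]
        rows.append([record["failure"], vendor, model] + smart)
    return rows
```

With real files, open the input with `newline=""` (as the `csv` module recommends) and write the result with `csv.writer(out).writerows(preprocess(infile))`.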

Usage:

usage: train.py [-h] [-f FILE] [-t TRAIN] [-p PREDICT]

train backblaze model

optional arguments:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  input file
  -t TRAIN, --train TRAIN
            train and save to file
  -p PREDICT, --predict PREDICT
            predict using model from file
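The usage text above has the layout Python's `argparse` prints. A minimal sketch of a parser that would produce it follows; it is a reconstruction, not the repository's code. The `-d` option does not appear in the help output but is used in the prediction example, so it is included here with a hypothetical destination name `drive`.

```python
import argparse


def build_parser():
    """Rebuild the command line described by the usage text."""
    parser = argparse.ArgumentParser(prog="train.py",
                                     description="train backblaze model")
    parser.add_argument("-f", "--file", help="input file")
    parser.add_argument("-t", "--train", help="train and save to file")
    parser.add_argument("-p", "--predict",
                        help="predict using model from file")
    # Hypothetical: -d is used in the prediction example but is not listed
    # in the help output, so its dest name and help text are guesses.
    parser.add_argument("-d", dest="drive",
                        help="comma-separated drive record to predict on")
    return parser
```

For example, `build_parser().parse_args(["-t", "modelfailures.tf", "-f", "failed.csv"])` sets `args.train` and `args.file` and leaves `args.predict` as `None`.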

Examples:

To train the model, give the output model file with -t and the preprocessed CSV with -f:

$ python train.py -t modelfailures.tf -f failed.csv
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
>>>Using input file failed.csv
./failed.csv 4597 bytes.
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.2155
pciBusID 0000:81:00.0
Total memory: 12.00GiB
Free memory: 11.87GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:81:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:81:00.0)
---------------------------------
Run id: GIZTH5
Log directory: /tmp/tflearn_logs/
---------------------------------
Training samples: 86
Validation samples: 0
--
Training Step: 6  | total loss: 0.48509
| Adam | epoch: 001 | loss: 0.48509 - acc: 0.9995 -- iter: 86/86
--
Training Step: 12  | total loss: 0.15169
| Adam | epoch: 002 | loss: 0.15169 - acc: 1.0000 -- iter: 86/86
--
Training Step: 18  | total loss: 0.13430
| Adam | epoch: 003 | loss: 0.13430 - acc: 0.9784 -- iter: 86/86
--
Training Step: 24  | total loss: 0.10718
| Adam | epoch: 004 | loss: 0.10718 - acc: 0.9845 -- iter: 86/86
--
Training Step: 30  | total loss: 0.13343
| Adam | epoch: 005 | loss: 0.13343 - acc: 0.9825 -- iter: 86/86
--
Training Step: 36  | total loss: 0.03229
| Adam | epoch: 006 | loss: 0.03229 - acc: 0.9960 -- iter: 86/86
--
Training Step: 42  | total loss: 0.09462
| Adam | epoch: 007 | loss: 0.09462 - acc: 0.9866 -- iter: 86/86
--
Training Step: 48  | total loss: 0.03574
| Adam | epoch: 008 | loss: 0.03574 - acc: 0.9956 -- iter: 86/86
--
Training Step: 54  | total loss: 0.09058
| Adam | epoch: 009 | loss: 0.09058 - acc: 0.9836 -- iter: 86/86
--
Training Step: 60  | total loss: 0.04486
| Adam | epoch: 010 | loss: 0.04486 - acc: 0.9933 -- iter: 86/86
--
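In the training log the step counter advances by 6 per epoch while each epoch covers all 86 training samples, ending at step 60 after 10 epochs. Assuming one optimizer step per mini-batch, steps per epoch equal ceil(samples / batch size), so the batch size, which this README does not state, can be bounded with a quick check:

```python
import math

SAMPLES = 86          # "Training samples: 86" in the log
STEPS_PER_EPOCH = 6   # step counter goes 6, 12, ..., 60
EPOCHS = 10


def steps_per_epoch(samples, batch_size):
    """One optimizer step per mini-batch; the last batch may be partial."""
    return math.ceil(samples / batch_size)


# Batch sizes that would yield exactly 6 steps per epoch over 86 samples.
consistent = [b for b in range(1, SAMPLES + 1)
              if steps_per_epoch(SAMPLES, b) == STEPS_PER_EPOCH]
print(consistent)                  # → [15, 16, 17]
print(STEPS_PER_EPOCH * EPOCHS)    # → 60
```

So under that assumption the run used a batch size between 15 and 17.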

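To use the model, pass the saved model file with -p and a comma-separated drive record (vendor, model, and five numeric fields) with -d: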

$ python train.py -p modelfailures.tf -d HGST,HMS5C4040ALE640,0,0,0,0,0
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.2155
pciBusID 0000:81:00.0
Total memory: 12.00GiB
Free memory: 11.87GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:81:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:81:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:81:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:81:00.0)
::: [0.47443267703056335, 0.525567352771759]
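The value given to -d appears to be one drive record laid out like the training rows without the label: vendor, model, then five numeric fields. Before a network can consume it, the two text fields need numeric codes; the final `:::` line then prints two values that sum to about 1, which is what a two-class softmax output looks like. The sketch below is hypothetical throughout: the integer-code lookup (`build_vocab`, `encode_record`) is one simple encoding (one-hot is a common alternative), and treating index 1 as the "failed" class in `failure_probability` is an assumption this README does not confirm.

```python
def build_vocab(values):
    """Map each distinct string to an integer code (0, 1, 2, ...)
    in order of first appearance."""
    vocab = {}
    for v in values:
        vocab.setdefault(v, len(vocab))
    return vocab


def encode_record(record, vendors, models):
    """Turn 'VENDOR,MODEL,n1,n2,...' into a list of floats.
    Vendors or models not seen in training are encoded as -1."""
    vendor, model, *numbers = record.split(",")
    return ([float(vendors.get(vendor, -1)), float(models.get(model, -1))]
            + [float(n) for n in numbers])


def failure_probability(output, failed_index=1):
    """Pick one class probability from a two-class softmax output.
    failed_index=1 is an assumption, not documented behavior."""
    if abs(sum(output) - 1.0) > 1e-3:
        raise ValueError("expected probabilities that sum to 1")
    return output[failed_index]
```

Under that assumption, `failure_probability([0.47443267703056335, 0.525567352771759])` would read the example prediction as roughly a 53% chance of the "failed" class.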
