Skip to content

Latest commit

 

History

History
109 lines (97 loc) · 5.01 KB

README.md

File metadata and controls

109 lines (97 loc) · 5.01 KB

Tensorflow model for Backblaze data

Backblaze disk failure dataset is disk drive failure stats that are based on vendor, model and SMART and some other information.

Using Tensorflow it is possible to get a model to predict the failure rates based on the data. First, the Backblaze data was cleaned up and preprocessed via some python scripts. Then the data is used to train a model.

Usage:

usage: train.py [-h] [-f FILE] [-t TRAIN] [-p PREDICT]

train backblaze model

optional arguments:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  input file
  -t TRAIN, --train TRAIN
            train and save to file
  -p PREDICT, --predict PREDICT
            predict using model from file

Examples:

To train the model:

$ python train.py -t modelfailures.tf -f failed.csv
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
>>>Using input file failed.csv
./failed.csv 4597 bytes.
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.2155
pciBusID 0000:81:00.0
Total memory: 12.00GiB
Free memory: 11.87GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:81:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:81:00.0)
---------------------------------
Run id: GIZTH5
Log directory: /tmp/tflearn_logs/
---------------------------------
Training samples: 86
Validation samples: 0
--
Training Step: 6  | total loss: 0.48509
| Adam | epoch: 001 | loss: 0.48509 - acc: 0.9995 -- iter: 86/86
--
Training Step: 12  | total loss: 0.15169
| Adam | epoch: 002 | loss: 0.15169 - acc: 1.0000 -- iter: 86/86
--
Training Step: 18  | total loss: 0.13430
| Adam | epoch: 003 | loss: 0.13430 - acc: 0.9784 -- iter: 86/86
--
Training Step: 24  | total loss: 0.10718
| Adam | epoch: 004 | loss: 0.10718 - acc: 0.9845 -- iter: 86/86
--
Training Step: 30  | total loss: 0.13343
| Adam | epoch: 005 | loss: 0.13343 - acc: 0.9825 -- iter: 86/86
--
Training Step: 36  | total loss: 0.03229
| Adam | epoch: 006 | loss: 0.03229 - acc: 0.9960 -- iter: 86/86
--
Training Step: 42  | total loss: 0.09462
| Adam | epoch: 007 | loss: 0.09462 - acc: 0.9866 -- iter: 86/86
--
Training Step: 48  | total loss: 0.03574
| Adam | epoch: 008 | loss: 0.03574 - acc: 0.9956 -- iter: 86/86
--
Training Step: 54  | total loss: 0.09058
| Adam | epoch: 009 | loss: 0.09058 - acc: 0.9836 -- iter: 86/86
--
Training Step: 60  | total loss: 0.04486
| Adam | epoch: 010 | loss: 0.04486 - acc: 0.9933 -- iter: 86/86
--

To use the model:

$ python train.py -p modelfailures.tf -d HGST,HMS5C4040ALE640,0,0,0,0,0
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.2155
pciBusID 0000:81:00.0
Total memory: 12.00GiB
Free memory: 11.87GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:81:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:81:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:81:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:81:00.0)
::: [0.47443267703056335, 0.525567352771759]