Anomaly detection in HPC systems

Dataset

The data were collected with the Examon monitoring tool from "Marconi100", a supercomputer hosted at CINECA. The dataset is organised into folders, one per selected node. It does not cover all of the hundreds of nodes on Marconi100, only a selection of nodes whose monitored periods also contain failures. The information monitored on each node is varied: the load of the individual cores, the temperature of the room housing the nodes, fan speeds, details of memory read/write accesses, and so on. The sampling rate at the source varies between 5 and 10 seconds. In the dataset, however, the data are aggregated into 15-minute intervals: the mean ("avg: <metric_name>") and the variance ("var: <metric_name>") are computed over each interval.
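The 15-minute aggregation described above can be illustrated with a minimal pandas sketch. This is not the Examon export pipeline: the metric name `ambient_temp`, the synthetic 10-second samples, and the exact column-name format are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic raw samples every 10 seconds for one hypothetical metric
# (assumption: real Examon metrics have different names and values).
rng = np.random.default_rng(0)
index = pd.date_range(
    "2021-01-01 00:00:00", periods=360, freq=pd.Timedelta(seconds=10)
)  # 360 samples = one hour of data
raw = pd.DataFrame(
    {"ambient_temp": rng.normal(22.0, 0.5, size=len(index))}, index=index
)

# Aggregate into 15-minute windows: mean and variance of every metric.
agg = raw.resample(pd.Timedelta(minutes=15)).agg(["mean", "var"])

# Flatten the (metric, statistic) column pairs into "avg:<metric>" and
# "var:<metric>" names, mirroring the naming scheme of the dataset.
prefix = {"mean": "avg", "var": "var"}
agg.columns = [f"{prefix[stat]}:{metric}" for metric, stat in agg.columns]

print(agg)
```

Each row of `agg` then corresponds to one 15-minute interval, which is the granularity at which the models in this project see the data.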

Task

I performed anomaly detection using three approaches: semi-supervised, unsupervised, and self-supervised learning.

Models

I used semi-supervised, unsupervised, and self-supervised models in order to compare the three types of approach.

Autoencoders
Isolation Forest
Local Outlier Factor
One-Class SVM
Minimum Covariance Determinant
Self-supervised TabNet
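As a rough illustration of the four unsupervised detectors listed above, the following sketch fits their scikit-learn counterparts on synthetic data. The data, the hyperparameters (`nu`, `contamination`), and the use of `EllipticEnvelope` as the Minimum Covariance Determinant detector are assumptions for this example, not necessarily the settings used in the notebook.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

# Synthetic stand-in data (assumption): normal samples around 0,
# and ten obvious anomalies around 8 appended to the test set.
rng = np.random.default_rng(42)
X_train = rng.normal(0.0, 1.0, size=(500, 4))
X_test = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 4)),
    rng.normal(8.0, 1.0, size=(10, 4)),
])

detectors = {
    "Isolation Forest": IsolationForest(random_state=0),
    # novelty=True lets LOF score samples it was not fitted on.
    "Local Outlier Factor": LocalOutlierFactor(novelty=True),
    "One-Class SVM": OneClassSVM(nu=0.05, gamma="scale"),
    # EllipticEnvelope fits a robust (MCD) covariance estimate.
    "Minimum Covariance Determinant": EllipticEnvelope(
        contamination=0.05, random_state=0
    ),
}

predictions = {}
for name, model in detectors.items():
    model.fit(X_train)
    # scikit-learn convention: +1 means inlier, -1 means outlier.
    predictions[name] = model.predict(X_test)
    n_flagged = int((predictions[name] == -1).sum())
    print(f"{name}: flagged {n_flagged} of {len(X_test)} test samples")
```

All four estimators share the same `fit` / `predict` interface, which makes a side-by-side comparison straightforward.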

Project WorkFlow

Dataset preparation
Data analysis
Split the data into training (normal data only) and testing sets
MinMax scaling
Semi-supervised learning: autoencoder
Reconstruction error check
Choosing the threshold based on the F1 score
Implementation of unsupervised algorithms
- Isolation Forest
- Local Outlier Factor
- One-Class SVM
- Minimum Covariance Determinant
Self-supervised learning using TabNet
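The scaling, reconstruction-error, and threshold-selection steps of the workflow above can be sketched as follows. To keep the example small, a PCA reconstruction is used as a stand-in for the autoencoder (this is a swapped-in technique, not the project's model), and the data, the `MIXING` matrix, and the helper `best_f1_threshold` are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import f1_score
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)

# Hypothetical "normal" behaviour: 6 features driven by 2 latent factors.
MIXING = np.array([
    [1.0, 0.5, -1.0, 0.8, 0.3, -0.6],
    [0.2, -1.0, 0.7, 0.5, -0.9, 1.0],
])

def make_normal(n):
    latent = rng.normal(size=(n, 2))
    return latent @ MIXING + rng.normal(scale=0.05, size=(n, 6))

# Training set contains normal data only; the test set mixes in anomalies
# (uncorrelated noise that breaks the structure of the normal data).
X_train = make_normal(1000)
X_test = np.vstack([make_normal(200), rng.normal(scale=1.5, size=(40, 6))])
y_test = np.array([0] * 200 + [1] * 40)  # 1 = anomaly

# MinMax scaling, fitted on the normal training data only.
scaler = MinMaxScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Stand-in for the autoencoder: compress to 2 components and reconstruct.
pca = PCA(n_components=2).fit(X_train_s)
reconstructed = pca.inverse_transform(pca.transform(X_test_s))
errors = np.mean((X_test_s - reconstructed) ** 2, axis=1)

def best_f1_threshold(scores, labels):
    """Return the score threshold that maximises F1, and that F1 value."""
    candidates = np.unique(scores)
    f1s = [f1_score(labels, (scores >= t).astype(int)) for t in candidates]
    best = int(np.argmax(f1s))
    return candidates[best], f1s[best]

threshold, best_f1 = best_f1_threshold(errors, y_test)
print(f"threshold={threshold:.6f}, F1={best_f1:.3f}")
```

Choosing the threshold this way requires labelled samples, so in practice it is usually done on a labelled validation split rather than on the data used for the final evaluation.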
