Home

Welcome to the Middleware for Data-intensive Analysis and Science (MIDAS) tutorial!

Ioannis Paraskevakos, Oliver Beckstein, Andre Luckow and Shantenu Jha

You have learnt about the high-performance libraries for data analysis (SPIDAL). MIDAS is the middleware to support these analytical libraries. It does so by enabling ABDS frameworks to execute on HPC. Specifically, it supports:

Resource management capabilities via Pilot-Hadoop
Coordination and communication via Pilot-Spark

Pilot-Hadoop serves as a resource management layer for executing multiple applications. Pilot-Spark is used to support iterative analytical algorithms.

There are two components of the MIDAS tutorial:

Infrastructure: focuses on how to run ABDS capabilities on existing HPC resources. We will introduce the concept of a "Pilot Job" as an effective resource management capability and will discuss the use of RADICAL-Pilot, Pilot-Hadoop and Pilot-Spark for data-intensive applications.
Applications/libraries: discusses how MDAnalysis, a Python tool for molecular dynamics analysis, can use MIDAS for new functionality and higher performance.
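
As a rough illustration of the "Pilot Job" idea mentioned above, the sketch below reserves a fixed set of workers once and then schedules many independent tasks onto them. It uses only the Python standard library and is not the RADICAL-Pilot API; the names `analyze_frame` and `run_on_pilot` and the `cores` parameter are hypothetical and exist only for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_frame(frame_id):
    """Hypothetical stand-in for one independent analysis task."""
    return frame_id * frame_id


def run_on_pilot(tasks, cores=4):
    """Reserve a pool of workers once, then schedule many tasks onto it."""
    # The pool plays the role of the pilot: resources are acquired a single
    # time, and individual tasks are placed on them without going back to
    # the system scheduler for each task.
    with ThreadPoolExecutor(max_workers=cores) as pilot:
        # map() returns results in the same order as the inputs.
        return list(pilot.map(analyze_frame, tasks))


results = run_on_pilot(range(8))
print(results)
```

A real pilot system holds nodes on an HPC machine rather than threads in one process, but the separation between acquiring resources and assigning tasks to them is the same.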

At the end of this tutorial, we expect you will be able to:

Start a Spark or Hadoop cluster on your favorite HPC machine.
Understand the basic concepts of task-parallel execution and be able to use RADICAL-Pilot to run task-parallel applications.
Perform scalable data-intensive analysis of biomolecular trajectories using MDAnalysis enhanced by MIDAS.

For the Infrastructure component:

Click here to get a link to the slides.
Click here to get started and for tutorial exercises.

For the MDAnalysis with MIDAS component:

Click here to get a link to the slides for MDAnalysis.
Click here to get started and for tutorial exercises.
