Welcome to the Middleware for Data-intensive Analysis and Science (MIDAS) tutorial!
Ioannis Paraskevakos, Oliver Beckstein, Andre Luckow and Shantenu Jha
You have learnt about SPIDAL, the high-performance libraries for data analysis. MIDAS is the middleware that supports these analytical libraries by enabling Apache Big Data Stack (ABDS) frameworks to execute on HPC resources. Specifically, it supports:
- Resource management capabilities via Pilot-Hadoop
- Coordination and communication via Pilot-Spark
Pilot-Hadoop serves as a resource management layer for executing multiple applications. Pilot-Spark supports iterative analytical algorithms.
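The Pilot-Job idea behind these tools can be illustrated with a small local analogy: resources are acquired once as a placeholder (the "pilot"), and many independent tasks are then scheduled onto them without each task waiting in the batch queue again. The sketch below is a hypothetical illustration only, using Python's standard `concurrent.futures` thread pool on a single machine. It is not the RADICAL-Pilot, Pilot-Hadoop or Pilot-Spark API, and `task` and `run_with_pilot` are made-up names.

```python
"""Hypothetical, single-machine analogy of the Pilot-Job pattern.

Not the RADICAL-Pilot / Pilot-Hadoop / Pilot-Spark API: a thread pool stands
in for the resources a real pilot would hold on HPC nodes.
"""
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor


def task(n: int) -> int:
    """Stand-in for one independent analysis task: sum of squares below n."""
    return sum(i * i for i in range(n))


def run_with_pilot(task_sizes: list[int], cores: int = 4) -> list[int]:
    # Acquire the resources once (the "pilot") ...
    with ThreadPoolExecutor(max_workers=cores) as pilot:
        # ... then bind many tasks onto it; map() returns results in input order.
        return list(pilot.map(task, task_sizes))


if __name__ == "__main__":
    print(run_with_pilot([10, 100, 1000]))
```

The design point is that the cost of acquiring resources is paid once, while the number and size of tasks can vary freely afterwards.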
There are two components of the MIDAS tutorial:
- Infrastructure: how to run ABDS capabilities on existing HPC resources. We will introduce the concept of a "Pilot Job" as an effective resource management capability and discuss the use of RADICAL-Pilot, Pilot-Hadoop and Pilot-Spark for data-intensive applications.
- Applications/libraries: how MDAnalysis, a Python molecular dynamics analysis tool, can use MIDAS for new functionality and higher performance.
At the end of this tutorial, we expect you will be able to:
- Fire up a Spark or Hadoop cluster on your favorite HPC machine.
- Understand the basic concepts of task-parallel execution and use RADICAL-Pilot to run task-parallel applications.
- Perform scalable data-intensive analysis of biomolecular trajectories using MDAnalysis enhanced by MIDAS.
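As a rough illustration of task-parallel trajectory analysis, the sketch below splits a trajectory into contiguous blocks of frames, computes a per-frame quantity (an unweighted radius of gyration) for each block as a separate task, and concatenates the results in frame order. It assumes a trajectory held in memory as a NumPy array of shape `(n_frames, n_atoms, 3)` filled with synthetic coordinates. It does not use the MDAnalysis or RADICAL-Pilot APIs, and the helper names are hypothetical.

```python
"""Hypothetical split/apply/combine sketch of block-parallel trajectory analysis.

Synthetic data only; not the MDAnalysis or RADICAL-Pilot API.
"""
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor

import numpy as np


def radius_of_gyration(frame: np.ndarray) -> float:
    """Unweighted radius of gyration of one frame with shape (n_atoms, 3)."""
    centered = frame - frame.mean(axis=0)
    return float(np.sqrt((centered ** 2).sum(axis=1).mean()))


def analyze_block(block: np.ndarray) -> list[float]:
    """Apply step: one task processes a contiguous block of frames."""
    return [radius_of_gyration(frame) for frame in block]


def parallel_rgyr(trajectory: np.ndarray, n_blocks: int = 4) -> list[float]:
    """Split frames into blocks, analyze blocks concurrently, combine in order."""
    blocks = np.array_split(trajectory, n_blocks)  # split step
    with ThreadPoolExecutor(max_workers=n_blocks) as pool:
        per_block = list(pool.map(analyze_block, blocks))
    # Combine step: concatenate per-block results, preserving frame order.
    return [value for block_result in per_block for value in block_result]


if __name__ == "__main__":
    rng = np.random.default_rng(seed=0)
    traj = rng.normal(size=(100, 50, 3))  # synthetic: 100 frames, 50 atoms
    rgyr = parallel_rgyr(traj)
    print(len(rgyr), np.allclose(rgyr, analyze_block(traj)))
```

Because each block is independent, the same structure could in principle map blocks onto tasks in a pilot or onto Spark partitions. The threads here only stand in for that distribution.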
For the Infrastructure component:
- Click here to get a link to the slides.
- Click here to get started and for tutorial exercises.
For the MDAnalysis with MIDAS component:
- Click here to get a link to the slides for MDAnalysis.
- Click here to get started and for tutorial exercises.