Oliver Beckstein edited this page Apr 21, 2017 · 21 revisions

Welcome to the Middleware for Data-intensive Analysis and Science (MIDAS) tutorial!

Ioannis Paraskevakos, Oliver Beckstein, Andre Luckow and Shantenu Jha

You have learnt about the high-performance libraries for data analysis (SPIDAL). MIDAS is the middleware that supports these analytical libraries. It does so by enabling Apache Big Data Stack (ABDS) frameworks to execute on high-performance computing (HPC) resources. Specifically, it supports:

  1. Resource management capabilities via Pilot-Hadoop
  2. Coordination and communication via Pilot-Spark

Pilot-Hadoop serves as a resource management layer for executing multiple applications. Pilot-Spark supports iterative analytical algorithms.

There are two components of the MIDAS tutorial:

  • Infrastructure: covers how to run ABDS capabilities on existing HPC resources. We will introduce the concept of a "Pilot Job" as an effective resource management capability and will discuss the use of RADICAL-Pilot, Pilot-Hadoop and Pilot-Spark for data-intensive applications.
  • Applications/libraries: discusses how MDAnalysis, a Python tool for molecular dynamics analysis, can use MIDAS for new functionality and higher performance.
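
To make the "Pilot Job" idea concrete before the hands-on parts, here is a minimal sketch in plain Python. It is only an analogy and does not use the RADICAL-Pilot API: the class `ToyPilot`, its methods, and the `count_words` task are hypothetical names invented for illustration, and a local thread pool stands in for the cores that a real pilot would hold on an HPC machine. The point it shows is that resources are acquired once and many independent tasks are then scheduled onto them.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyPilot:
    """Hypothetical stand-in for a pilot job: holds a fixed set of workers."""

    def __init__(self, cores):
        # A real pilot system would submit ONE placeholder job to the HPC
        # batch queue at this point; here we just start a local thread pool.
        self.cores = cores
        self._pool = ThreadPoolExecutor(max_workers=cores)

    def submit_tasks(self, func, inputs):
        # Tasks run on the workers the pilot already holds, so no task
        # has to wait in the batch queue on its own.
        futures = [self._pool.submit(func, item) for item in inputs]
        # Collect results in the same order as the inputs.
        return [future.result() for future in futures]

    def cancel(self):
        # Release the resources held by the pilot.
        self._pool.shutdown(wait=True)


def count_words(text):
    # Illustrative task; any independent unit of work would do.
    return len(text.split())


def run_demo():
    pilot = ToyPilot(cores=2)
    try:
        return pilot.submit_tasks(count_words, ["a b c", "d e", "f"])
    finally:
        pilot.cancel()


if __name__ == "__main__":
    print(run_demo())
```

In this toy version the tasks are ordinary Python functions; in the tutorial itself the tasks are executables or analysis kernels and the pilot runs on a remote HPC resource.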

At the end of this tutorial, we expect you will be able to:

  • Fire up a Spark or Hadoop cluster on your favorite HPC machine.
  • Understand the basic concepts of task-parallel execution and be able to use RADICAL-Pilot to run task-parallel applications.
  • Perform scalable data-intensive analysis of biomolecular trajectories using MDAnalysis enhanced by MIDAS.

For the Infrastructure component:

For the MDAnalysis with MIDAS component:
