CS205: Extreme Scale Data and Computational Science

Spring 2019

About the Course

Computational science has become a third partner, together with theory and experimentation, in advancing scientific knowledge and practice, and an essential tool for product and process development and manufacturing in industry. Big data science adds the ‘fourth pillar’ to scientific advancements, providing the methods and algorithms to extract knowledge or insights from data.

The course is a journey into the foundations of Parallel Computing at the intersection of large-scale computational science and big data analytics. Many science communities are combining high performance computing and high-end data analysis platforms and methods in workflows that orchestrate large-scale simulations or incorporate them into the stages of large-scale analysis pipelines for data generated by simulations, experiments, or observations.

This is an applications course highlighting the use of modern computing platforms in solving computational and data science problems, enabling simulation, modeling and real-time analysis of complex natural and social phenomena at unprecedented scales. The class emphasizes making effective use of the diverse landscape of programming models, platforms, open-source tools, computing architectures and cloud services for high performance computing and high-end data analytics.

Main course site: Harvard-CS205.org

About the Projects

Extreme scale data science at the convergence of big data and massively parallel computing is enabling simulation, modelling and real-time analysis of complex natural and social phenomena at unprecedented scales. The aim of the projects is to gain practical experience with this interplay by applying parallel computation principles to solve a compute- and data-intensive problem.

Each final project solves a data-intensive or compute-intensive problem with parallel processing on the AWS cloud, on Harvard’s supercomputer Odyssey, or on both. Each team identified a compute or data science problem, analysed its compute scaling requirements, collected the data, designed and implemented parallel software, and demonstrated scaled performance of an end-to-end application.

Spring 2019 Projects

Presented on 8 May 2019

Group Number	Project Title	Team	Links
1	Real-Time Crowd Dynamics	Daniel Inge, Maddy Nakada, Raymond Lin, Stephen Slater, William Fu	GitHub, Website
2	Visualizing a Galactic Dark Matter Simulation	Alpha Sanneh, Sihan Yuan, Will Claybaugh, Kaley Brauer	GitHub, Website
3	Parallel YouTube Classification	Filip Michalsky, Dylan Randle, Tommy Hill, Paxton Maeder-York	GitHub, Website
4	Parallel Newton Step for the SCOPF problem	Srivatsan Srinivasan, Aditya Karan, Cory Williams, Manish Reddy Vuyyuru	GitHub, Website
5	Thunderstruct	Nick Pagel, Jonathan Guillotte-Blouin, Santiago Vargas	GitHub, Website
6	Bayesian Additive Regression Trees (BART) using Spark and CUDA	Beau Coker, Patrick Emedom-Nnamdi, Isabella Grabski, Hali Hambridge, Matthew Quinn	GitHub, Website
7	Density Equalizing Maps	Millie Zhou, Benedikt Groever, Baptiste Lemaire	GitHub, Website
8	Large-Scale Distributed Sentiment Analysis With RNNs	Jianzhun Du, Rong Liu, Matteo Zhang, Yan Zhao	GitHub, Website
9	Parallel Echo State Networks	Zachary Blanks, Cedric Flamant, Elizabeth Lim, Zhai Yi	GitHub, Website
10	Parallelization on Single-Nucleotide Variant (SNV) Calling	Pu Zheng, Zijie Zhao, Weihung Hsu, Kangli Wu	GitHub
11	Parallelized Amazon Recommendation System based on Spark and OpenMP	Zhaohong Jin, Zheyu Wu, Abhimanyu Vasishth, Yuhao Lu	GitHub, Website
12	Parallelized analysis of CRISPR genetic screens	Bhaven Patel, Rory Maizels, Hugo Ramambason	GitHub, Website
13	Parallel Simulation of Federated Learning	Danyun He, Xin Dong, Meng Dong, Ziao Lin	GitHub, Website
14	Space Chemistry	William Burke, Drake Deuel, Esmail Fadae, Jamila Pegues	GitHub, Website

harvard-cs-205/CS205-Spring2019-Projects

Folders and files

Latest commit

History

Repository files navigation

CS205: Extreme Scale Data and Computational Science

About the Course

About the Projects

Spring 2019 Projects

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 15

Packages