Example scenario - HPC SaaS #759

Closed
19 commits
ba7336b
Revised example scenario for data warehousing
alexbuckgit Aug 3, 2018
2755f32
Minor edits to example scenario for data warehousing
alexbuckgit Aug 3, 2018
71490cd
Removed extraneous diagram
alexbuckgit Aug 6, 2018
9a34e0f
Revisions to example scenario for data warehousing
alexbuckgit Aug 10, 2018
49c3c00
Revised version of example scenario for data warehousing
alexbuckgit Aug 21, 2018
ec75fb9
Revisions to example scenario for data warehousing
alexbuckgit Aug 21, 2018
40b1814
Revisions to example scenario for data warehousing
alexbuckgit Aug 21, 2018
82ea3f6
Incorporated feedback into example scenario for data warehousing
alexbuckgit Aug 23, 2018
3006c0d
Minor updates to example scenario for data warehousing
alexbuckgit Aug 23, 2018
d734a17
Updated art in example scenario for data warehousing
alexbuckgit Aug 29, 2018
70fe3f0
Draft version of example scenario for HPC SaaS
alexbuckgit Aug 30, 2018
a7e6549
Revisions to example scenario for HPC SaaS
alexbuckgit Aug 30, 2018
2f8a71e
Revisions to example scenario for HPC SaaS
alexbuckgit Aug 31, 2018
49c2fd6
Remove extraneous files from branch
alexbuckgit Sep 5, 2018
035657a
Minor diagram updates
alexbuckgit Sep 6, 2018
a9ab8bb
Minor revisions to example scenario
alexbuckgit Sep 6, 2018
c8af9e0
Incorporated feedback into example scenario for HPC
alexbuckgit Sep 18, 2018
bd1e441
Revised draft diagram for example scenario
alexbuckgit Sep 18, 2018
279e2fc
Incorporated feedback into example scenario for HPC
alexbuckgit Sep 19, 2018
75 changes: 75 additions & 0 deletions docs/example-scenario/apps/hpc-saas.md
@@ -0,0 +1,75 @@
---
title: Computer-aided engineering through high performance computing on Azure
Contributor

Let's work on this title. Does this actually reflect the article? Would it be better to call it a "Managed system for Computer Aided Design on Azure"?

description: <Article Description>
author: alexbuckgit
ms.date: 08/22/2018
---

# Computer-aided engineering through high performance computing on Azure

This example scenario demonstrates delivery of a software-as-a-service (SaaS) platform built on the high-performance computing (HPC) capabilities of Azure. This scenario is based on an engineering software solution. However, the architecture is relevant to other industries requiring HPC resources such as image rendering, complex modeling, and financial risk calculation.
Contributor

Is this SaaS for internal resources or building a platform to be consumed by external customers?


This example demonstrates an engineering software provider that delivers computer-aided engineering (CAE) applications to engineering firms and manufacturing enterprises. CAE solutions enable innovation, reduce development times, and lower costs throughout the lifetime of a product's design. These solutions require substantial compute resources and often process high data volumes. The high costs of an on-premises HPC appliance or high-end workstations often put these technologies out of reach for small engineering firms, entrepreneurs, and students.

The company wants to expand the market for its applications by building a SaaS platform backed by cloud-based HPC technologies. Their customers should be able to pay for compute resources as needed and access massive computing power that would be unaffordable otherwise. The company's goals include:
* Taking advantage of HPC capabilities in Azure to accelerate the product design and testing process
* Using the latest hardware innovations to run complex simulations, while minimizing the costs for simpler simulations
* Enabling true-to-life visualization and rendering in a web browser, without requiring a high-end engineering workstation

## Potential use cases

Other scenarios using this architecture might include:

* Genomics research
* Weather simulation
* Computational chemistry applications

## Architecture

![Architecture for a SaaS solution enabling HPC capabilities][architecture]

Contributor

Let's re-word this to sound more like a data flow.

* Users access NV-series virtual machines (VMs) from a browser over an HTML5-based RDP connection provided by the [Apache Guacamole](http://guacamole.apache.org/) service. These VM instances provide powerful GPUs for rendering and collaborative tasks, so users can edit their designs and view their results without needing high-end mobile computing devices or laptops. The Altair PBS Works or open-source PBS Professional scheduler spins up additional nodes based on user-defined heuristics. A solution based on Azure Functions is a possible alternative.
* From a desktop CAD session, users submit workloads for execution on available HPC cluster nodes. These workloads perform tasks such as stress analysis or computational fluid dynamics calculations, eliminating the need for dedicated on-premises compute clusters. The cluster nodes can be configured to auto-scale based on load or queue depth, so that capacity follows active user demand for compute resources.
* Azure Kubernetes Service (AKS) is used to host the web resources available to end users.
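
The queue-depth scaling mentioned in the data flow can be sketched roughly as follows. This is a minimal illustration, not the scheduler's actual logic: the `ScalingPolicy` class, its default values, and both function names are hypothetical and do not correspond to any PBS Works or Azure API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical autoscale settings; none of these values come from the scenario."""
    jobs_per_node: int = 4   # assumed number of queued jobs one node can absorb
    min_nodes: int = 0       # floor, so an idle cluster can scale to zero
    max_nodes: int = 100     # ceiling, to cap spend


def target_node_count(queued_jobs: int, policy: ScalingPolicy) -> int:
    """Return the node count needed to drain the queue, clamped to the policy limits."""
    if queued_jobs < 0:
        raise ValueError("queued_jobs must be non-negative")
    needed = math.ceil(queued_jobs / policy.jobs_per_node)
    return max(policy.min_nodes, min(policy.max_nodes, needed))


def scaling_delta(current_nodes: int, queued_jobs: int, policy: ScalingPolicy) -> int:
    """Positive result: nodes to add. Negative result: nodes that can be released."""
    return target_node_count(queued_jobs, policy) - current_nodes
```

A real deployment would likely add a cool-down period before releasing nodes so that short dips in queue depth do not cause repeated scale-in and scale-out.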

### Components

* [H-series virtual machines](/azure/virtual-machines/linux/sizes-hpc) are used to run compute-intensive simulations such as molecular modeling and computational fluid dynamics. The solution also takes advantage of technologies like remote direct memory access (RDMA) connectivity and InfiniBand networking.
* [NV-series virtual machines](/azure/virtual-machines/windows/sizes-gpu) give engineers high-end workstation functionality from a standard web browser. These virtual machines have NVIDIA Tesla M60 GPUs that support advanced rendering and can run single-precision workloads.
* [General purpose virtual machines](/azure/virtual-machines/linux/sizes-general) running CentOS handle more traditional workloads such as web applications.
* [Application Gateway](/azure/application-gateway/) load balances the requests coming into the web servers.
* [Azure Kubernetes Service (AKS)](/azure/aks/) is used to run scalable workloads at a lower cost for simulations that don't require the high-end capabilities of HPC or GPU virtual machines.
* [Altair PBS Works Suite](https://www.pbsworks.com/PBSProduct.aspx?n=PBS-Works-Suite&c=Overview-and-Capabilities) orchestrates the HPC workflow, ensuring that enough virtual machine instances are available to handle the current load. It also deallocates virtual machines when demand is lower to reduce costs.
* [Blob storage](/azure/storage/blobs/storage-blobs-introduction) stores files that support the scheduled jobs.

### Alternatives

* [Azure CycleCloud](/azure/cyclecloud/overview) simplifies creating, managing, operating, and optimizing HPC clusters. It offers advanced policy and governance features. CycleCloud supports any job scheduler or software stack.
* [HPC Pack](/azure/virtual-machines/windows/hpcpack-cluster-options) can create and manage an Azure HPC cluster for Windows Server-based workloads. HPC Pack isn't an option for Linux-based workloads.
* [Azure Automation State Configuration](/azure/automation/automation-dsc-overview) provides an infrastructure-as-code approach to defining the virtual machines and software to be deployed. Virtual machines can be deployed as part of a virtual machine scale set, with auto-scaling rules for compute nodes based on the number of jobs submitted to the job queue. When a new virtual machine is needed, it is provisioned using the latest patched image from the Azure image gallery, and then the required software is installed and configured via a PowerShell DSC configuration script.

## Considerations

* While using an infrastructure-as-code approach is a great way to manage virtual machine build definitions, it can take a long time to provision a new virtual machine using a script. This solution found a good middle ground by using the DSC script to periodically create a golden image, which can then be used to provision a new virtual machine faster than completely building a VM on demand using DSC. Azure DevOps Services or other CI/CD tooling can periodically refresh golden images using DSC scripts.
* Balancing overall solution costs with fast availability of compute resources is a key consideration. Provisioning a pool of N-series virtual machine instances and putting them in a deallocated state lowers operating costs. When an additional virtual machine is needed, reallocating an existing instance involves powering up the virtual machine on a different host. However, a virtual machine that is deallocated and then reallocated retains the same PCI bus for the GPU upon restart, which eliminates the PCI bus detection time the OS would otherwise need to identify the GPU and install its drivers.
* The original architecture relied entirely on Azure virtual machines for running simulations. In order to reduce costs for workloads that didn't require all the capabilities of a virtual machine, these workloads were containerized and deployed to Azure Kubernetes Service (AKS).
* The company's workforce has existing skills in open-source technologies. They can take advantage of these skills by building on technologies like Linux and Kubernetes.
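
The deallocated-pool consideration above can be illustrated with a short sketch that prefers restarting pooled instances over provisioning new ones. The `PooledVm` type, the `VmState` enum, and `plan_capacity` are hypothetical names used only for illustration. The sketch models the decision alone and makes no Azure calls.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class VmState(Enum):
    RUNNING = "running"
    DEALLOCATED = "deallocated"


@dataclass
class PooledVm:
    name: str
    state: VmState


def plan_capacity(pool: List[PooledVm], extra_needed: int) -> Dict[str, object]:
    """Split a capacity request into pool restarts and fresh provisions.

    Deallocated VMs are reused first because they are already built;
    anything the pool cannot cover must be provisioned from an image.
    """
    extra_needed = max(0, extra_needed)
    idle = [vm.name for vm in pool if vm.state is VmState.DEALLOCATED]
    restart = idle[:extra_needed]
    return {"restart": restart, "provision_new": extra_needed - len(restart)}
```

For example, a request for three more instances against a pool holding two deallocated VMs would restart both and provision one new VM from the golden image.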

## Pricing

To help you explore the cost of running this scenario, many of the required services are pre-configured in a [cost calculator example][calculator]. The costs of your solution are dependent on the number and scale of services needed to meet your requirements.

The following considerations will drive a substantial portion of the costs for this solution:
* Azure virtual machine costs increase linearly as additional instances are provisioned. Virtual machines that are deallocated will only incur storage costs, and not compute costs. These deallocated machines can then be reallocated when demand is high.
* Azure Kubernetes Service (AKS) costs are based on the VM type chosen to support the workload. The costs will increase linearly based on the number of VMs in the cluster.
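
As a rough illustration of the cost drivers above, the following sketch models compute cost as linear in the number of running instances, with deallocated instances paying only for storage. The function name, its parameters, and the 730-hour month are assumptions. Actual rates should come from the cost calculator.

```python
def estimate_monthly_cost(
    running_vms: int,
    deallocated_vms: int,
    hourly_compute_rate: float,
    monthly_disk_cost: float,
    hours_running: float = 730.0,  # assumed average hours in a month
) -> float:
    """Estimate monthly VM cost under a simple linear model.

    Running VMs pay compute plus storage; deallocated VMs pay storage only.
    All rates are caller-supplied placeholders, not published prices.
    """
    compute = running_vms * hourly_compute_rate * hours_running
    storage = (running_vms + deallocated_vms) * monthly_disk_cost
    return compute + storage
```

Calling the function with a larger `deallocated_vms` value shows how a standby pool adds only the storage term to the total.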

## Next steps

* Read the [Altair customer story][source-document]. This example scenario is based on a version of their architecture.
* Review other [Big Compute solutions](https://azure.microsoft.com/en-us/solutions/big-compute/) available in Azure.
* Learn proven practices for building Azure-based solutions in the [Azure Architecture Center](/azure/architecture/).

<!-- links -->
[source-document]: https://customers.microsoft.com/story/altair-manufacturing-azure
[architecture]: ./media/architecture-diagram-hpc-saas.png
[calculator]: https://azure.com/e/3cb9ccdc893f41ffbcdb00c328178ccf