Example scenario - HPC SaaS #759
Closed
19 commits
ba7336b Revised example scenario for data warehousing (alexbuckgit)
2755f32 Minor edits to example scenario for data warehousing (alexbuckgit)
71490cd Removed extraneous diagram (alexbuckgit)
9a34e0f Revisions to example scenario for data warehousing (alexbuckgit)
49c3c00 Revised version of example scenario for data warehousing (alexbuckgit)
ec75fb9 Revisions to example scenario for data warehousing (alexbuckgit)
40b1814 Revisions to example scenario for data warehousing (alexbuckgit)
82ea3f6 Incorporated feedback into example scenario for data warehousing (alexbuckgit)
3006c0d Minor updates to example scenario for data warehousing (alexbuckgit)
d734a17 Updated art in example scenario for data warehousing (alexbuckgit)
70fe3f0 Draft version of example scenario for HPC SaaS (alexbuckgit)
a7e6549 Revisions to example scenario for HPC SaaS (alexbuckgit)
2f8a71e Revisions to example scenario for HPC SaaS (alexbuckgit)
49c2fd6 Remove extraneous files from branch (alexbuckgit)
035657a Minor diagram updates (alexbuckgit)
a9ab8bb Minor revisions to example scenario (alexbuckgit)
c8af9e0 Incorporated feedback into example scenario for HPC (alexbuckgit)
bd1e441 Revised draft diagram for example scenario (alexbuckgit)
279e2fc Incorporated feedback into example scenario for HPC (alexbuckgit)
@@ -0,0 +1,75 @@
---
title: Computer-aided engineering through high performance computing on Azure
description: <Article Description>
author: alexbuckgit
ms.date: 08/22/2018
---

# Computer-aided engineering through high performance computing on Azure

This example scenario demonstrates delivery of a software-as-a-service (SaaS) platform built on the high-performance computing (HPC) capabilities of Azure. This scenario is based on an engineering software solution. However, the architecture is relevant to other industries that require HPC resources for workloads such as image rendering, complex modeling, and financial risk calculation.

Review comment: Is this SaaS for internal resources or building a platform to be consumed by external customers?

This example demonstrates an engineering software provider that delivers computer-aided engineering (CAE) applications to engineering firms and manufacturing enterprises. CAE solutions enable innovation, reduce development times, and lower costs throughout the lifetime of a product's design. These solutions require substantial compute resources and often process high data volumes. The high costs of an on-premises HPC appliance or high-end workstations often put these technologies out of reach for small engineering firms, entrepreneurs, and students.

The company wants to expand the market for its applications by building a SaaS platform backed by cloud-based HPC technologies. Its customers should be able to pay for compute resources as needed and access massive computing power that would otherwise be unaffordable. The company's goals include:
* Taking advantage of HPC capabilities in Azure to accelerate the product design and testing process
* Using the latest hardware innovations to run complex simulations, while minimizing the costs for simpler simulations
* Enabling true-to-life visualization and rendering in a web browser, without requiring a high-end engineering workstation

## Potential use cases

Other scenarios using this architecture might include:

* Genomics research
* Weather simulation
* Computational chemistry applications

## Architecture

![Architecture for a SaaS solution enabling HPC capabilities][architecture]

Review comment: Let's re-word this to sound more like a data flow.

* Users access NV-series virtual machines (VMs) from a browser over an HTML5-based RDP connection provided by the Apache Guacamole service (http://guacamole.apache.org/). These VM instances provide powerful GPUs for rendering and collaborative tasks. Users can edit their designs and view their results without needing access to high-end mobile computing devices or laptops. The Altair PBS Works or open-source PBS Professional scheduler spins up additional nodes based on user-defined heuristics. A solution based on Azure Functions could be used as an alternative.
* From a desktop CAD session, users submit workloads for execution on available HPC cluster nodes. These workloads perform tasks such as stress analysis or computational fluid dynamics calculations, eliminating the need for dedicated on-premises compute clusters. The cluster nodes can be configured to autoscale based on load or queue depth, so that capacity follows active user demand for compute resources.
* Azure Kubernetes Service (AKS) hosts the web resources available to end users.
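
The queue-depth scaling mentioned above can be illustrated with a minimal Python sketch. It is not the scheduler's actual logic: the function name `desired_node_count`, the `jobs_per_node` ratio, and the node limits are all assumptions chosen for illustration.

```python
import math


def desired_node_count(queued_jobs: int, running_jobs: int,
                       jobs_per_node: int = 1,
                       min_nodes: int = 0, max_nodes: int = 16) -> int:
    """Return the node count a simple queue-depth heuristic would request.

    All parameters are hypothetical; a real scheduler applies its own
    user-defined heuristics.
    """
    if jobs_per_node < 1:
        raise ValueError("jobs_per_node must be at least 1")
    # Total demand is everything waiting plus everything already running.
    demand = queued_jobs + running_jobs
    needed = math.ceil(demand / jobs_per_node)
    # Clamp to the configured floor and ceiling for the cluster.
    return max(min_nodes, min(max_nodes, needed))
```

A scaling loop could call this function periodically and add or deallocate nodes to match the returned value. The upper bound keeps a burst of submissions from provisioning an unbounded number of virtual machines.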

### Components

* [H-series virtual machines](/azure/virtual-machines/linux/sizes-hpc) are used to run compute-intensive simulations such as molecular modeling and computational fluid dynamics. The solution also takes advantage of technologies like remote direct memory access (RDMA) connectivity and InfiniBand networking.
* [NV-series virtual machines](/azure/virtual-machines/windows/sizes-gpu) give engineers high-end workstation functionality from a standard web browser. These virtual machines have NVIDIA Tesla M60 GPUs that support advanced rendering and can run single-precision workloads.
* [General purpose virtual machines](/azure/virtual-machines/linux/sizes-general) running CentOS handle more traditional workloads such as web applications.
* [Application Gateway](/azure/application-gateway/) load balances the requests coming into the web servers.
* [Azure Kubernetes Service (AKS)](/azure/aks/) is used to run scalable workloads at a lower cost for simulations that don't require the high-end capabilities of HPC or GPU virtual machines.
* [Altair PBS Works Suite](https://www.pbsworks.com/PBSProduct.aspx?n=PBS-Works-Suite&c=Overview-and-Capabilities) orchestrates the HPC workflow, ensuring that enough virtual machine instances are available to handle the current load. It also deallocates virtual machines when demand is lower to reduce costs.
* [Blob storage](/azure/storage/blobs/storage-blobs-introduction) stores files that support the scheduled jobs.

### Alternatives

* [Azure CycleCloud](/azure/cyclecloud/overview) simplifies creating, managing, operating, and optimizing HPC clusters. It offers advanced policy and governance features. CycleCloud supports any job scheduler or software stack.
* [HPC Pack](/azure/virtual-machines/windows/hpcpack-cluster-options) can create and manage an Azure HPC cluster for Windows Server-based workloads. HPC Pack isn't an option for Linux-based workloads.
* [Azure Automation State Configuration](/azure/automation/automation-dsc-overview) provides an infrastructure-as-code approach to defining the virtual machines and software to be deployed. Virtual machines can be deployed as part of a virtual machine scale set, with auto-scaling rules for compute nodes based on the number of jobs submitted to the job queue. When a new virtual machine is needed, it is provisioned using the latest patched image from the Azure image gallery, and then the required software is installed and configured via a PowerShell DSC configuration script.

## Considerations

* While using an infrastructure-as-code approach is a great way to manage virtual machine build definitions, it can take a long time to provision a new virtual machine using a script. This solution found a good middle ground by using the DSC script to periodically create a golden image, which can then be used to provision a new virtual machine faster than building a VM entirely on demand using DSC. Azure DevOps Services or other CI/CD tooling can periodically refresh golden images using DSC scripts.
* Balancing overall solution costs with fast availability of compute resources is a key consideration. Provisioning a pool of N-series virtual machine instances and putting them in a deallocated state lowers operating costs. When an additional virtual machine is needed, reallocating an existing instance involves powering up the virtual machine on a different host. However, the OS doesn't have to spend time detecting the PCI bus and installing drivers for the GPU, because a virtual machine that is deprovisioned and then reprovisioned retains the same PCI bus for the GPU upon restart.
* The original architecture relied entirely on Azure virtual machines for running simulations. To reduce costs for workloads that didn't require all the capabilities of a virtual machine, these workloads were containerized and deployed to Azure Kubernetes Service (AKS).
* The company's workforce had existing skills in open source technologies. They can take advantage of these skills by building on technologies like Linux and Kubernetes.
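
The trade-off between a deallocated pool and on-demand provisioning can be sketched as a small planning function. This is a hedged illustration in Python, not the solution's actual code: the `PooledVm` type, its state labels, and `plan_scale_out` are hypothetical names, and real start or provision calls are deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PooledVm:
    name: str
    state: str  # hypothetical labels: "running" or "deallocated"


def plan_scale_out(pool: List[PooledVm],
                   extra_needed: int) -> Tuple[List[str], int]:
    """Prefer restarting deallocated VMs; provision the rest from an image.

    Returns the names of pooled VMs to start and the number of additional
    VMs that would have to be built from the golden image.
    """
    if extra_needed <= 0:
        return [], 0
    deallocated = [vm.name for vm in pool if vm.state == "deallocated"]
    to_start = deallocated[:extra_needed]
    to_provision = extra_needed - len(to_start)
    return to_start, to_provision
```

For example, with one running and two deallocated instances in the pool, a request for three more VMs would start both deallocated instances and provision one new VM from the golden image.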

## Pricing

To help you explore the cost of running this scenario, many of the required services are pre-configured in a [cost calculator example][calculator]. The costs of your solution depend on the number and scale of services needed to meet your requirements.

The following considerations will drive a substantial portion of the costs for this solution:
* Azure virtual machine costs increase linearly as additional instances are provisioned. Virtual machines that are deallocated incur only storage costs, not compute costs. These deallocated machines can then be reallocated when demand is high.
* Azure Kubernetes Service costs are based on the VM type chosen to support the workload. The costs increase linearly with the number of VMs in the cluster.
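
The linear cost behavior described above can be expressed as a short Python sketch. The function name, its parameters, and the default of 730 hours per month are assumptions for illustration only; actual rates come from the cost calculator, not from this example.

```python
def estimate_monthly_cost(running_vms: int, deallocated_vms: int,
                          compute_rate_per_hour: float,
                          storage_cost_per_vm_month: float,
                          hours_running: float = 730.0) -> float:
    """Rough monthly estimate under a purely linear pricing assumption.

    Running VMs pay for compute and storage; deallocated VMs pay for
    storage only. All rates are placeholders supplied by the caller.
    """
    compute = running_vms * compute_rate_per_hour * hours_running
    storage = (running_vms + deallocated_vms) * storage_cost_per_vm_month
    return compute + storage
```

Under this model, moving a VM from running to deallocated removes its compute charge while its storage charge remains, which is the saving the deallocated pool relies on.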

## Next steps

* Read the [Altair customer story][source-document]. This example scenario is based on a version of their architecture.
* Review other [Big Compute solutions](https://azure.microsoft.com/en-us/solutions/big-compute/) available in Azure.
* Learn proven practices for building Azure-based solutions in the [Azure Architecture Center](/azure/architecture/).

<!-- links -->
[source-document]: https://customers.microsoft.com/story/altair-manufacturing-azure
[architecture]: ./media/architecture-diagram-hpc-saas.png
[calculator]: https://azure.com/e/3cb9ccdc893f41ffbcdb00c328178ccf

Review comment: Let's work on this title. Does this actually reflect the article? Would it be better to call it a "Managed system for Computer Aided Design on Azure"?