feat: Add examples for PAIF automation #168

Merged · 4 commits · May 10, 2024

34 changes: 34 additions & 0 deletions examples/workflows/private-ai-foundation/README.md
@@ -0,0 +1,34 @@
# Private AI Foundation via Terraform - Samples

This repository contains sample automation for enabling a Tanzu Kubernetes Cluster with NVIDIA GPUs in VMware Cloud Foundation.

The configuration is divided into several steps, which are intended to be executed in order.
Each step is designed to be atomic and can be executed independently of the rest, provided that its infrastructure prerequisites are in place.

These examples also use the [vSphere Terraform Provider](https://github.com/hashicorp/terraform-provider-vsphere).
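
Each step is a standalone root module. The steps that talk to both SDDC Manager and vCenter Server declare their providers along the following lines (as in the step configurations shown later in this change):

```hcl
terraform {
  required_providers {
    vcf = {
      source = "vmware/vcf"
    }
    vsphere = {
      source = "hashicorp/vsphere"
    }
  }
}
```

Step 1 needs only the vSphere provider, since it works directly against the management domain's vCenter Server.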

## Workflow Overview

### Starting State

This configuration is intended to be applied to an environment with a configured management domain.

### Desired State

Using these samples you can:

* Create a cluster image with vGPU drivers provided by NVIDIA (see the sketch after this list)
* Create a subscribed content library with custom container images
* Deploy a workload domain with vSAN storage and NSX network backing
* Create an NSX Edge Cluster
* Enable vSphere Supervisor on a cluster
* Configure a vSphere Namespace and a Virtual Machine Class with your vGPU drivers
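
The first and third items above hinge on the `vcf_cluster_personality` resource: step 2 extracts the vLCM image from the cluster built in step 1 and then references it from the new workload domain, as in this condensed excerpt from step 2:

```hcl
# Extract the vLCM cluster image created in step 1...
resource "vcf_cluster_personality" "custom_image" {
  name       = "custom-image"
  cluster_id = data.vsphere_compute_cluster.image_source_cluster.id
  domain_id  = var.management_domain_id
}

# ...and reference it when creating the workload domain's cluster
resource "vcf_domain" "wld01" {
  name = var.workload_domain_name
  # (vCenter, host, vSAN, and NSX settings omitted; see step 2 for the full configuration)
  cluster {
    name             = "vi-cluster"
    cluster_image_id = vcf_cluster_personality.custom_image.id
  }
}
```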

## Contents

### The [steps](https://github.com/vmware/terraform-provider-vcf/-/tree/main/examples/workflows/private-ai-foundation/steps) directory contains the sample configuration, divided into a number of steps
#### [Step 1](https://github.com/vmware/terraform-provider-vcf/-/blob/main/examples/workflows/private-ai-foundation/steps/01) - Create a vLCM cluster image with NVIDIA GPU drivers
#### [Step 2](https://github.com/vmware/terraform-provider-vcf/-/blob/main/examples/workflows/private-ai-foundation/steps/02) - Export the cluster image and create a workload domain with it
#### [Step 3](https://github.com/vmware/terraform-provider-vcf/-/blob/main/examples/workflows/private-ai-foundation/steps/03) - Create an NSX Edge Cluster
#### [Step 4](https://github.com/vmware/terraform-provider-vcf/-/blob/main/examples/workflows/private-ai-foundation/steps/04) - Create a subscribed Content Library
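
The step 4 configuration is not part of this excerpt. For orientation only, a subscribed content library in the vSphere provider is declared roughly as follows; the datacenter, datastore, library name, and subscription URL here are placeholders rather than values from the samples:

```hcl
# Illustrative sketch only - all names and the URL are placeholders
data "vsphere_datacenter" "wld_dc" {
  name = "wld01-datacenter"
}

data "vsphere_datastore" "library_datastore" {
  name          = "wld01-vsan"
  datacenter_id = data.vsphere_datacenter.wld_dc.id
}

resource "vsphere_content_library" "subscribed" {
  name            = "tkg-content-library"
  storage_backing = [data.vsphere_datastore.library_datastore.id]

  subscription {
    subscription_url = "https://example.com/tkg-library/lib.json" # placeholder URL
    automatic_sync   = true
    on_demand        = false
  }
}
```
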
#### [Step 5](https://github.com/vmware/terraform-provider-vcf/-/blob/main/examples/workflows/private-ai-foundation/steps/05) - Enable Supervisor, create a vSphere Namespace and a VM class
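
Step 5 is likewise not included in this excerpt. The vGPU-backed VM class it configures can be sketched with the vSphere provider roughly as follows; the class name, sizing, and vGPU profile string are placeholders, and the authoritative configuration lives in the step 5 directory:

```hcl
# Illustrative sketch only - sizing and the vGPU profile are placeholders
resource "vsphere_virtual_machine_class" "vgpu_class" {
  name               = "vm-class-vgpu"
  cpus               = 8
  memory             = 16384             # MB
  memory_reservation = 100               # vGPU-backed classes require full memory reservation
  vgpu_devices       = ["nvidia_a40-2q"] # placeholder vGPU profile name
}
```

The class is then referenced from the vSphere Namespace configuration when the Supervisor is enabled on the workload domain's cluster.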
55 changes: 55 additions & 0 deletions examples/workflows/private-ai-foundation/steps/01/main.tf
@@ -0,0 +1,55 @@
# This step generates a custom host image with vGPU drivers
# on the vCenter Server for the management domain.
# The source for the offline software depot used in this step
# has to contain the drivers.

terraform {
  required_providers {
    vsphere = {
      source = "hashicorp/vsphere"
    }
  }
}

# Connect to the vCenter Server backing the management domain
provider "vsphere" {
  user           = var.vcenter_username
  password       = var.vcenter_password
  vsphere_server = var.vcenter_server
}

# Read a datacenter. It can be any datacenter.
data "vsphere_datacenter" "dc" {
  name = var.datacenter_name
}

# Retrieve the list of available host images from vLCM.
# It is also valid to base your custom image on the build of a particular host,
# but this scenario is not automated.
data "vsphere_host_base_images" "base_images" {}

# Create an offline software depot.
# The source for the depot should contain the vGPU drivers.
resource "vsphere_offline_software_depot" "depot" {
  location = var.depot_location
}

# Create a compute cluster.
# It will remain empty and its sole purpose is to be used by vLCM to configure
# a custom image with the GPU drivers.
resource "vsphere_compute_cluster" "image_source_cluster" {
  name          = var.cluster_name
  datacenter_id = data.vsphere_datacenter.dc.id

  # The "host_image" block enables vLCM on the cluster and configures a custom image with the provided settings.
  # It is recommended to add this block after you have configured your depot and retrieved the list of base images,
  # so that you can select the correct values.
  # This example uses the first available image and the first available component.
  host_image {
    esx_version = data.vsphere_host_base_images.base_images.version.0
    component {
      key     = vsphere_offline_software_depot.depot.component.0.key
      version = vsphere_offline_software_depot.depot.component.0.version.0
    }
  }
}
23 changes: 23 additions & 0 deletions examples/workflows/private-ai-foundation/steps/01/variables.tf
@@ -0,0 +1,23 @@
variable "vcenter_username" {
description = "Username used to authenticate against the vCenter Server"
}

variable "vcenter_password" {
description = "Password used to authenticate against the vCenter Server"
}

variable "vcenter_server" {
description = "FQDN or IP Address of the vCenter Server for the management domain"
}

variable "datacenter_name" {
description = "The name of the datacenter where the new cluster will be created"
}

variable "cluster_name" {
description = "The name of the compute cluster"
}

variable "depot_location" {
description = "The URL where the contents for the offline software depot are hosted"
}
193 changes: 193 additions & 0 deletions examples/workflows/private-ai-foundation/steps/02/main.tf
@@ -0,0 +1,193 @@
# This step creates a new workload domain with the custom image
# from step 1.

terraform {
  required_providers {
    vcf = {
      source = "vmware/vcf"
    }
    vsphere = {
      source = "hashicorp/vsphere"
    }
  }
}

# Connect to the SDDC Manager
provider "vcf" {
  sddc_manager_host     = var.sddc_manager_host
  sddc_manager_username = var.sddc_manager_username
  sddc_manager_password = var.sddc_manager_password
}

# Connect to the vCenter Server backing the management domain
provider "vsphere" {
  user           = var.vcenter_username
  password       = var.vcenter_password
  vsphere_server = var.vcenter_server
}

# Request the same datacenter on which you created your cluster in step 1
data "vsphere_datacenter" "dc" {
  name = var.datacenter_name
}

# Request the compute cluster which you created in step 1
data "vsphere_compute_cluster" "image_source_cluster" {
  name          = var.source_cluster_name
  datacenter_id = data.vsphere_datacenter.dc.id
}

# Configure a network pool for your hosts
resource "vcf_network_pool" "domain_pool" {
  name = "engineering-pool"

  network {
    gateway = "192.168.10.1"
    mask    = "255.255.255.0"
    mtu     = 9000
    subnet  = "192.168.10.0"
    type    = "VSAN"
    vlan_id = 100
    ip_pools {
      start = "192.168.10.5"
      end   = "192.168.10.50"
    }
  }

  network {
    gateway = "192.168.11.1"
    mask    = "255.255.255.0"
    mtu     = 9000
    subnet  = "192.168.11.0"
    type    = "vMotion"
    vlan_id = 100
    ip_pools {
      start = "192.168.11.5"
      end   = "192.168.11.50"
    }
  }
}

# Commission 3 hosts for the new domain
resource "vcf_host" "host1" {
  fqdn            = var.esx_host1_fqdn
  username        = "root"
  password        = var.esx_host1_pass
  network_pool_id = vcf_network_pool.domain_pool.id
  storage_type    = "VSAN"
}

resource "vcf_host" "host2" {
  fqdn            = var.esx_host2_fqdn
  username        = "root"
  password        = var.esx_host2_pass
  network_pool_id = vcf_network_pool.domain_pool.id
  storage_type    = "VSAN"
}

resource "vcf_host" "host3" {
  fqdn            = var.esx_host3_fqdn
  username        = "root"
  password        = var.esx_host3_pass
  network_pool_id = vcf_network_pool.domain_pool.id
  storage_type    = "VSAN"
}

# Extract a vLCM personality (a cluster image) from the cluster you created in step 1.
# This will be applied to the new workload domain.
# It is crucial that you do this before creating the domain, as it is not possible to enable vLCM afterwards.
resource "vcf_cluster_personality" "custom_image" {
  name       = "custom-image-3"
  cluster_id = data.vsphere_compute_cluster.image_source_cluster.id
  domain_id  = var.management_domain_id
}

# Create a workload domain
resource "vcf_domain" "wld01" {
  name = var.workload_domain_name

  vcenter_configuration {
    datacenter_name = "${var.workload_domain_name}-datacenter"
    fqdn            = var.workload_vcenter_fqdn
    gateway         = "10.0.0.250"
    ip_address      = var.workload_vcenter_address
    name            = "${var.workload_domain_name}-vcenter"
    root_password   = var.vcenter_root_password
    subnet_mask     = "255.255.252.0"
  }

  cluster {
    name = "vi-cluster"

    host {
      id          = vcf_host.host1.id
      license_key = var.esx_license_key
    }

    host {
      id          = vcf_host.host2.id
      license_key = var.esx_license_key
    }

    host {
      id          = vcf_host.host3.id
      license_key = var.esx_license_key
    }

    vds {
      name = "${var.workload_domain_name}-vds01"

      portgroup {
        name           = "${var.workload_domain_name}-vds01-PortGroup-Mgmt"
        transport_type = "MANAGEMENT"
      }

      portgroup {
        name           = "${var.workload_domain_name}-vds01-PortGroup-vMotion"
        transport_type = "VMOTION"
      }

      portgroup {
        name           = "${var.workload_domain_name}-vds01-PortGroup-VSAN"
        transport_type = "VSAN"
      }
    }

    vsan_datastore {
      datastore_name = "${var.workload_domain_name}-vsan"
      license_key    = var.vsan_license_key
    }

    geneve_vlan_id   = "112"
    cluster_image_id = vcf_cluster_personality.custom_image.id
  }

  nsx_configuration {
    license_key                = var.nsx_license_key
    nsx_manager_admin_password = var.nsx_manager_admin_password

    # You need to prepare the DNS entries for these hostnames before running Terraform.
    # You are free to modify the FQDNs.
    nsx_manager_node {
      fqdn        = "nsx-mgmt-wld-1.vrack.vsphere.local"
      gateway     = "10.0.0.250"
      ip_address  = var.nsx_manager_node1_address
      name        = "nsx-mgmt-wld-1"
      subnet_mask = "255.255.252.0"
    }

    nsx_manager_node {
      fqdn        = "nsx-mgmt-wld-2.vrack.vsphere.local"
      gateway     = "10.0.0.250"
      ip_address  = var.nsx_manager_node2_address
      name        = "nsx-mgmt-wld-2"
      subnet_mask = "255.255.252.0"
    }

    nsx_manager_node {
      fqdn        = "nsx-mgmt-wld-3.vrack.vsphere.local"
      gateway     = "10.0.0.250"
      ip_address  = var.nsx_manager_node3_address
      name        = "nsx-mgmt-wld-3"
      subnet_mask = "255.255.252.0"
    }

    vip      = var.nsx_manager_vip_address
    vip_fqdn = "nsx-manager-wld.vrack.vsphere.local"
  }
}