Argo workflows on EKS example and doc #43

Merged · 23 commits · Oct 31, 2022
67 changes: 66 additions & 1 deletion schedulers/argo-workflow/README.md
@@ -1 +1,66 @@
# Argo Worklfow on EKS (Coming Soon)
# Argo Workflows on EKS
Check out the [documentation website](https://awslabs.github.io/data-on-eks/docs/job-schedulers-eks/argo-workflows-eks) to deploy this pattern and run sample tests.
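
To deploy directly from this folder instead, the flow is standard Terraform (a sketch, assuming AWS credentials are configured and the default `region` and `name` inputs listed below):

```sh
cd schedulers/argo-workflow
terraform init
terraform plan
terraform apply
```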

<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
## Requirements

| Name | Version |
|------|---------|
| <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | >= 1.0.0 |
| <a name="requirement_aws"></a> [aws](#requirement\_aws) | >= 3.72 |
| <a name="requirement_helm"></a> [helm](#requirement\_helm) | >= 2.4.1 |
| <a name="requirement_kubernetes"></a> [kubernetes](#requirement\_kubernetes) | >= 2.10 |
| <a name="requirement_random"></a> [random](#requirement\_random) | 3.3.2 |

## Providers

| Name | Version |
|------|---------|
| <a name="provider_aws"></a> [aws](#provider\_aws) | >= 3.72 |
| <a name="provider_kubernetes"></a> [kubernetes](#provider\_kubernetes) | >= 2.10 |
| <a name="provider_helm"></a> [helm](#provider\_helm) | 2.0.0 |

## Modules

| Name | Source | Version |
|------|--------|---------|
| <a name="module_eks_blueprints"></a> [eks\_blueprints](#module\_eks\_blueprints) | github.com/aws-ia/terraform-aws-eks-blueprints | v4.10.0 |
| <a name="module_eks_blueprints_kubernetes_addons"></a> [eks\_blueprints\_kubernetes\_addons](#module\_eks\_blueprints\_kubernetes\_addons) | github.com/aws-ia/terraform-aws-eks-blueprints//modules/kubernetes-addons | v4.10.0 |
| <a name="module_helm_addon"></a> [helm_addon](#module\_eks\_blueprints\_kubernetes\helm_addons) | github.com/aws-ia/terraform-aws-eks-blueprints//modules/kubernetes-addons/helm-addon | v4.12 |
| <a name="module_vpc"></a> [vpc](#module\_vpc) | terraform-aws-modules/vpc/aws | ~> 3.0 |
| <a name="module_vpc_endpoints"></a> [vpc\_endpoints](#module\_vpc\_endpoints) | terraform-aws-modules/vpc/aws//modules/vpc-endpoints | ~> 3.0 |
| <a name="module_vpc_endpoints_sg"></a> [vpc\_endpoints\_sg](#module\_vpc\_endpoints\_sg) | terraform-aws-modules/security-group/aws | ~> 4.0 |

## Resources

| Name | Type |
|------|------|
| [kubernetes_cluster_role.spark-cluster](https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/cluster_role) | resource |
| [kubernetes_role_binding.spark_role_binding](https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/role_binding) | resource |
| [kubernetes_role_binding.argo-admin-rolebinding](https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/role_binding) | resource |
| [aws_availability_zones.available](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/availability_zones) | data source |
| [aws_caller_identity.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/caller_identity) | data source |
| [aws_eks_cluster.eks_cluster](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/eks_cluster) | data source |
| [aws_eks_cluster_auth.this](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/eks_cluster_auth) | data source |
| [aws_partition.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/partition) | data source |
| [aws_region.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/region) | data source |

## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_eks_cluster_version"></a> [eks\_cluster\_version](#input\_eks\_cluster\_version) | EKS Cluster version | `string` | `"1.23"` | no |
| <a name="input_name"></a> [name](#input\_name) | Name of the VPC and EKS Cluster | `string` | `"spark-k8s-operator"` | no |
| <a name="input_private_subnets"></a> [private\_subnets](#input\_private\_subnets) | Private Subnets CIDRs. 16382 IPs per Subnet | `list(string)` | <pre>[<br> "10.1.0.0/18",<br> "10.1.64.0/18",<br> "10.1.128.0/18"<br>]</pre> | no |
| <a name="input_public_subnets"></a> [public\_subnets](#input\_public\_subnets) | Public Subnets CIDRs. 4094 IPs per Subnet | `list(string)` | <pre>[<br> "10.1.192.0/20",<br> "10.1.208.0/20",<br> "10.1.224.0/20"<br>]</pre> | no |
| <a name="input_region"></a> [region](#input\_region) | region | `string` | `"eu-west-1"` | no |
| <a name="input_vpc_cidr"></a> [vpc\_cidr](#input\_vpc\_cidr) | VPC CIDR | `string` | `"10.1.0.0/16"` | no |

## Outputs

| Name | Description |
|------|-------------|
| <a name="output_configure_kubectl"></a> [configure\_kubectl](#output\_configure\_kubectl) | Configure kubectl: make sure you're logged in with the correct AWS profile and run the following command to update your kubeconfig |
<!-- END OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
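
The `configure_kubectl` output above prints the exact command for the cluster this stack creates; with the default inputs it is equivalent to the following:

```sh
aws eks update-kubeconfig --region eu-west-1 --name spark-k8s-operator
kubectl get pods -n argo-workflows   # sanity check: Argo Workflows server and controller
```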
106 changes: 106 additions & 0 deletions schedulers/argo-workflow/addons.tf
@@ -0,0 +1,106 @@
module "eks_blueprints_kubernetes_addons" {
source = "github.com/aws-ia/terraform-aws-eks-blueprints//modules/kubernetes-addons?ref=v4.10.0"

eks_cluster_id = module.eks_blueprints.eks_cluster_id
eks_cluster_endpoint = module.eks_blueprints.eks_cluster_endpoint
eks_oidc_provider = module.eks_blueprints.oidc_provider
eks_cluster_version = module.eks_blueprints.eks_cluster_version

#---------------------------------------------------------------
# Amazon EKS Managed Add-ons
#---------------------------------------------------------------
# EKS Addons
enable_amazon_eks_vpc_cni = true
enable_amazon_eks_coredns = true
enable_amazon_eks_kube_proxy = true
enable_amazon_eks_aws_ebs_csi_driver = true

#---------------------------------------------------------------
# CoreDNS Autoscaler helps to scale for large EKS Clusters
# Further tuning for CoreDNS is to leverage NodeLocal DNSCache -> https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns/
#---------------------------------------------------------------
enable_coredns_autoscaler = true

#---------------------------------------------------------------
# Metrics Server
#---------------------------------------------------------------
enable_metrics_server = true


#---------------------------------------------------------------
# Cluster Autoscaler
#---------------------------------------------------------------
enable_cluster_autoscaler = true


#---------------------------------------------------------------
# Spark Operator Add-on
#---------------------------------------------------------------
enable_spark_k8s_operator = true

#---------------------------------------------------------------
# Apache YuniKorn Add-on
#---------------------------------------------------------------
enable_yunikorn = true


depends_on = [
module.eks_blueprints
]
tags = local.tags

}
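
# A post-apply sanity check for the add-ons enabled above (a sketch; it assumes your
# kubeconfig points at this cluster and the add-ons use their default namespaces):
#   kubectl get deployments -n kube-system   # coredns, cluster-autoscaler, metrics-server
#   kubectl get pods -n spark-operator       # namespace is an assumption
#   kubectl get pods -n yunikorn             # namespace is an assumption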


#-------------------------------------------------
# Argo Workflows Helm Add-on
#-------------------------------------------------
locals {

default_helm_config = {
name = "argo-workflows"
chart = "argo-workflows"
repository = "https://argoproj.github.io/argo-helm"
version = "v0.20.1"
namespace = "argo-workflows"
create_namespace = true
description = "Argo workflows Helm chart deployment configuration"
}
irsa_config = {
kubernetes_namespace = local.default_helm_config["namespace"]
kubernetes_service_account = local.name
create_kubernetes_namespace = try(local.default_helm_config["create_namespace"], true)
create_kubernetes_service_account = false
irsa_iam_policies = []
}

eks_oidc_issuer_url = replace(data.aws_eks_cluster.eks_cluster.identity[0].oidc[0].issuer, "https://", "")
}

module "helm_addon" {
source = "github.com/aws-ia/terraform-aws-eks-blueprints//modules/kubernetes-addons/helm-addon?ref=v4.12.0"
helm_config = local.default_helm_config
irsa_config = local.irsa_config

addon_context = {
aws_caller_identity_account_id = data.aws_caller_identity.current.account_id
aws_caller_identity_arn = data.aws_caller_identity.current.arn
aws_eks_cluster_endpoint = module.eks_blueprints.eks_cluster_endpoint
aws_partition_id = data.aws_partition.current.partition
aws_region_name = data.aws_region.current.name
eks_cluster_id = module.eks_blueprints.eks_cluster_id
eks_oidc_issuer_url = replace(data.aws_eks_cluster.eks_cluster.identity[0].oidc[0].issuer, "https://", "")
eks_oidc_provider_arn = "arn:${data.aws_partition.current.partition}:iam::${data.aws_caller_identity.current.account_id}:oidc-provider/${local.eks_oidc_issuer_url}"
tags = local.tags
irsa_iam_role_path = "/"
irsa_iam_permissions_boundary = ""
}

depends_on = [
module.eks_blueprints
]

}
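
# Once applied, the Argo Workflows chart behaves like any other Helm release and can be
# inspected directly (a sketch, assuming helm and kubectl point at this cluster):
#   helm list -n argo-workflows
#   kubectl get pods -n argo-workflows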
15 changes: 15 additions & 0 deletions schedulers/argo-workflow/data.tf
@@ -0,0 +1,15 @@
data "aws_eks_cluster_auth" "this" {
name = module.eks_blueprints.eks_cluster_id
}

data "aws_availability_zones" "available" {}

data "aws_region" "current" {}

data "aws_caller_identity" "current" {}

data "aws_partition" "current" {}

data "aws_eks_cluster" "eks_cluster" {
name = module.eks_blueprints.eks_cluster_id
}
164 changes: 164 additions & 0 deletions schedulers/argo-workflow/main.tf
@@ -0,0 +1,164 @@
locals {
name = var.name
region = var.region
azs = slice(data.aws_availability_zones.available.names, 0, 3)
vpc_endpoints = ["autoscaling", "ecr.api", "ecr.dkr", "ec2", "ec2messages", "elasticloadbalancing", "sts", "kms", "logs", "ssm", "ssmmessages"]

tags = {
Blueprint = local.name
GithubRepo = "github.com/awslabs/data-on-eks"
}
}

#---------------------------------------------------------------
# EKS Blueprints
#---------------------------------------------------------------
module "eks_blueprints" {
source = "github.com/aws-ia/terraform-aws-eks-blueprints?ref=v4.10.0"

cluster_name = local.name
cluster_version = var.eks_cluster_version

vpc_id = module.vpc.vpc_id
private_subnet_ids = module.vpc.private_subnets

cluster_endpoint_private_access = true # if true, Kubernetes API requests within your cluster's VPC (such as node to control plane communication) use the private VPC endpoint
cluster_endpoint_public_access = true # if true, Your cluster API server is accessible from the internet. You can, optionally, limit the CIDR blocks that can access the public endpoint.

#---------------------------------------------------------------
# Note: This can be further restricted to the specific ports required for each Add-on and your application
#---------------------------------------------------------------
node_security_group_additional_rules = {
# Extend node-to-node security group rules. Recommended and required for the Add-ons
ingress_self_all = {
description = "Node to node all ports/protocols"
protocol = "-1"
from_port = 0
to_port = 0
type = "ingress"
self = true
}
# Recommended outbound traffic for Node groups
egress_all = {
description = "Node all egress"
protocol = "-1"
from_port = 0
to_port = 0
type = "egress"
cidr_blocks = ["0.0.0.0/0"]
ipv6_cidr_blocks = ["::/0"]
}
# Allows control plane nodes to talk to worker nodes on all ports. This is added to simplify the example and to avoid issues with add-ons communicating with the control plane.
# It can be restricted further to specific ports based on the requirements of each add-on, e.g., metrics-server 4443, analytics-operator 8080, karpenter 8443, etc.
# Change this according to your security requirements if needed
ingress_cluster_to_node_all_traffic = {
description = "Cluster API to Nodegroup all traffic"
protocol = "-1"
from_port = 0
to_port = 0
type = "ingress"
source_cluster_security_group = true
}
}

managed_node_groups = {
# Core node group for deploying all the critical add-ons
mng1 = {
node_group_name = "core-node-grp"
subnet_ids = module.vpc.private_subnets

instance_types = ["m5.xlarge"]
ami_type = "AL2_x86_64"
capacity_type = "ON_DEMAND"

disk_size = 100
disk_type = "gp3"

max_size = 9
min_size = 3
desired_size = 3
create_launch_template = true
launch_template_os = "amazonlinux2eks"

update_config = [{
max_unavailable_percentage = 50
}]

k8s_labels = {
Environment = "preprod"
Zone = "test"
WorkerType = "ON_DEMAND"
NodeGroupType = "core"
}
# Check out the docs for more details on node-template labels: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-can-i-scale-a-node-group-to-0
additional_tags = {
Name = "core-node-grp"
subnet_type = "private"
"k8s.io/cluster-autoscaler/node-template/label/arch" = "x86"
"k8s.io/cluster-autoscaler/node-template/label/kubernetes.io/os" = "linux"
"k8s.io/cluster-autoscaler/node-template/label/noderole" = "core"
"k8s.io/cluster-autoscaler/node-template/label/node-lifecycle" = "on-demand"
"k8s.io/cluster-autoscaler/experiments" = "owned"
"k8s.io/cluster-autoscaler/enabled" = "true"
}
}
}

tags = local.tags
}
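
# The k8s_labels above make it possible to pin workloads to this core node group; a
# quick post-apply check (sketch):
#   kubectl get nodes -l NodeGroupType=core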



#---------------------------------------------------------------
# Kubernetes ClusterRole for Argo Workflows to manage SparkApplications
#---------------------------------------------------------------
resource "kubernetes_cluster_role" "spark-cluster" {
metadata {
name = "spark-cluster-role"
}

rule {
verbs = ["*"]
api_groups = ["sparkoperator.k8s.io"]
resources = ["sparkapplications"]
}
}
#---------------------------------------------------------------
# Kubernetes RoleBinding that grants the Spark ClusterRole to Argo Workflows
#---------------------------------------------------------------
resource "kubernetes_role_binding" "spark_role_binding" {
metadata {
name = "argo-spark-rolebinding"
namespace = local.default_helm_config.namespace
}

subject {
kind = "ServiceAccount"
name = "default"
namespace = local.default_helm_config.namespace
}

role_ref {
api_group = "rbac.authorization.k8s.io"
kind = "ClusterRole"
name = kubernetes_cluster_role.spark-cluster.id
}
}
resource "kubernetes_role_binding" "argo-admin-rolebinding" {
metadata {
name = "argo-admin-rolebinding"
namespace = local.default_helm_config.namespace
}

subject {
kind = "ServiceAccount"
name = "default"
namespace = local.default_helm_config.namespace
}

role_ref {
api_group = "rbac.authorization.k8s.io"
kind = "ClusterRole"
name = "admin"
}
}
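
# With the two role bindings above, the "default" service account in the argo-workflows
# namespace can manage SparkApplications and act as a namespace admin. A sketch of
# verifying this with kubectl impersonation:
#   kubectl auth can-i create sparkapplications.sparkoperator.k8s.io -n argo-workflows --as system:serviceaccount:argo-workflows:default
#   kubectl auth can-i create pods -n argo-workflows --as system:serviceaccount:argo-workflows:default
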
9 changes: 9 additions & 0 deletions schedulers/argo-workflow/outputs.tf
@@ -0,0 +1,9 @@
output "configure_kubectl" {
description = "Configure kubectl: make sure you're logged in with the correct AWS profile and run the following command to update your kubeconfig"
value = module.eks_blueprints.configure_kubectl
}

output "eks_api_server_url" {
description = "Your EKS API server endpoint"
value = module.eks_blueprints.eks_cluster_endpoint
}
17 changes: 17 additions & 0 deletions schedulers/argo-workflow/providers.tf
@@ -0,0 +1,17 @@
provider "aws" {
region = local.region
}

provider "kubernetes" {
host = module.eks_blueprints.eks_cluster_endpoint
cluster_ca_certificate = base64decode(module.eks_blueprints.eks_cluster_certificate_authority_data)
token = data.aws_eks_cluster_auth.this.token
}

provider "helm" {
kubernetes {
host = module.eks_blueprints.eks_cluster_endpoint
cluster_ca_certificate = base64decode(module.eks_blueprints.eks_cluster_certificate_authority_data)
token = data.aws_eks_cluster_auth.this.token
}
}