| 1 | +# MSI Support Testing for Bedrock AKS-gitops |
| 2 | + |
| 3 | +| Revision | Date | Author | Remarks | |
| 4 | +| -------: | ------------ | -------------- | ------------- | |
| 5 | +| 0.1 | Mar-30, 2020 | Nathaniel Rose | Initial Draft | |
| 6 | + |
| 7 | +## 1. Overview |
| 8 | + |
| 9 | +Managed Identities for Azure resources provides Azure services with an |
| 10 | +automatically managed identity in Azure AD. You can use the identity to |
| 11 | +authenticate to any service that supports Azure AD authentication, including Key |
| 12 | +Vault, without any credentials in your code. Terraform can be configured to use |
| 13 | +managed identity for authentication in one of two ways: using environment |
| 14 | +variables, or by defining the fields within the provider block. |
| 15 | + |
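As a minimal sketch of the second option, a provider block for the `azurerm` provider might enable managed identity as shown below. The subscription and tenant IDs are placeholders rather than values from this design, and the `features {}` block is included on the assumption that the template targets the 2.x provider line. The equivalent environment variables are listed in the comments.

```hcl
# Sketch only: authenticate the azurerm provider with a managed identity.
# The IDs below are placeholders, not values from this design.
provider "azurerm" {
  features {} # required by azurerm 2.x and later

  use_msi         = true
  subscription_id = "00000000-0000-0000-0000-000000000000"
  tenant_id       = "00000000-0000-0000-0000-000000000000"
}

# Environment-variable alternative (leave the three fields above unset):
#   ARM_USE_MSI=true
#   ARM_SUBSCRIPTION_ID=<subscription id>
#   ARM_TENANT_ID=<tenant id>
```

Either form only works when Terraform runs on an Azure resource that has a managed identity assigned, such as a build agent VM.
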
| 16 | +AKS creates two managed identities: |
| 17 | + |
| 18 | +- System-assigned managed identity: The identity that the Kubernetes cloud |
| 19 | + provider uses to create Azure resources on behalf of the user. |
| 20 | + |
| 21 | +- User-assigned managed identity: The identity that's used for authorization in |
| 22 | + the cluster. |
| 23 | + |
| 24 | +This document outlines a test suite for managed identity support in AKS. It |
| 25 | +proposes a new Bedrock environment that uses a modified Cobalt test harness |
| 26 | +to test pod identity within an AKS cluster through agile CI/CD and test |
| 27 | +validation. |
| 28 | + |
| 29 | +### Scenarios Addressed: |
| 30 | + |
| 31 | +1. [As an SRE, I want Enable MSI Support for aks-gitops module](https://github.com/microsoft/bedrock/issues/994) |
| 32 | +2. [As an Operator, I want automated testing validation for MSI verified within Bedrock](https://github.com/microsoft/bedrock/issues/1197) |
| 33 | +3. [As an operator, I want integration Tests tracking with junit logs from terratest](https://github.com/microsoft/bedrock/issues/867) |
| 34 | +4. [As an operator, I want to implement a managed service identity (via AAD Pod Identity) based secret handling strategy](https://github.com/microsoft/bedrock/issues/482) |
| 35 | + |
| 36 | +## 2. Out of Scope |
| 37 | + |
| 38 | +A pull request for Bedrock already enables MSI support for the aks-gitops |
| 39 | +module: [#995](https://github.com/microsoft/bedrock/pull/995). |
| 40 | +This design document solely covers a Terraform template and its |
| 41 | +complementary test. |
| 42 | + |
| 43 | +The following are not included in this proposal: |
| 44 | + |
| 45 | +- Mocking for Terraform Unit Tests |
| 46 | +- Feature revert and Rollback from failed merges |
| 47 | +- Adjusting Cobalt Test Fixture support for the current file organization of |
| 48 | +  Bedrock, i.e., test files in their respective template environment folders. |
| 49 | + |
| 50 | +## 3. Design Details |
| 51 | + |
| 52 | +This design introduces modular testing for Terraform, known as |
| 53 | +`Test Fixtures`, based on best practices initially introduced by |
| 54 | +[Project Cobalt](https://github.com/microsoft/cobalt). The test fixtures decouple |
| 55 | +Terraform commands into respective pipeline templates that are called and |
| 56 | +dynamically populated by a targeted template test. |
| 57 | + |
| 58 | +### 3.1 Embed new Infrastructure DevOps Model Flow - Continuous Integration |
| 59 | + |
| 60 | +Bedrock infrastructure integration tests have gaps: they do not account for |
| 61 | +Terraform unit testing, state validation against live environments, or staged |
| 62 | +release management for Bedrock versioning. The Bedrock test harness does not |
| 63 | +contain module-targeted, fail-fast resource definition validation outside the |
| 64 | +scope of an environment `terraform plan`. In addition, integration tests are |
| 65 | +validated through new deployments that require extensive time to provision. |
| 66 | +Furthermore, feature releases have no issue reporting benchmark, automated |
| 67 | +deployment validation, or guidance process for merging into master. This |
| 68 | +design provides a single template leveraging MSI that verifies a new |
| 69 | +Infrastructure Testing Workflow, which improves on the current Bedrock test |
| 70 | +harness. |
| 71 | + |
| 72 | +This design is intended to address expected core testing functionality |
| 73 | +including: |
| 74 | + |
| 75 | +- Support deployment of application-hosting infrastructure that will eventually |
| 76 | +  house the actual application service components |
| 77 | +- Capture basic metrics and telemetry from the deployment process for monitoring |
| 78 | +  of ongoing pipeline performance and diagnosis of any deployment failures |
| 79 | +- Support deployment into multiple staging environments |
| 80 | +- Execute automated unit-level and integration-level tests against the |
| 81 | + resources, prior to deployment into any long-living environments |
| 82 | +- Provide a manual approval process to gate deployment into long-living |
| 83 | + environments |
| 84 | +- Provide detection, abort, and reporting of deployment status when a failure |
| 85 | + occurs. |
| 86 | + |
| 87 | + |
| 88 | + |
| 89 | +The proposed new Infrastructure DevOps Flow for Terraform Testing can be |
| 90 | +separated into four key steps: |
| 91 | + |
| 92 | +1. Test Suite Initialization - Provisioning global artifacts, secrets, and |
| 93 | +   dependencies needed for the targeted, whitelisted test matrix. |
| 94 | +2. Static Validation - Environment initialization, code validation, inspection, |
| 95 | + terraform security compliance, and terraform module unit tests. |
| 96 | +3. Dynamic Validation - Targeted environment interoperability, integration |
| 97 | + tests, cloud deployment, de-provisioning of resources, error reporting. |
| 98 | +4. QA - Peer approval, release management, feature staging, acceptance test |
| 99 | +   within a live cluster. |
| 100 | + |
| 101 | +> The diagram above contains green check marks that indicate preexisting Bedrock |
| 102 | +> testing components that are already implemented through the current test |
| 103 | +> harness. |
| 104 | + |
| 105 | +### 3.2 Creation of Managed Identity enabled AKS Gitops Environments |
| 106 | + |
| 107 | +A new AKS Bedrock template with Managed Identity enabled (`azure-MI`) will be |
| 108 | +added to the collection of environment templates. This template will be an |
| 109 | +upgraded derivative of the `azure-simple` template, with a new dependency on |
| 110 | +`azure-common-infra`, and will contain the following: |
| 111 | + |
| 112 | +- Managed Identity System Level for AKS |
| 113 | +- Pod Identity Security Policy |
| 114 | +- Backend State |
| 115 | + |
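For the backend state item, one possible shape is an `azurerm` backend that also authenticates with managed identity. This is a sketch under assumptions: the resource group, storage account, container, and key names are hypothetical, and a template may instead leave the block empty and supply these values through a file passed to `terraform init -backend-config`.

```hcl
# Sketch only: remote state in Azure Storage, authenticated with a managed identity.
# Backend blocks cannot reference variables, so values are literal placeholders.
terraform {
  backend "azurerm" {
    resource_group_name  = "bedrock-tfstate-rg" # hypothetical name
    storage_account_name = "bedrocktfstate"     # hypothetical name
    container_name       = "tfstate"            # hypothetical name
    key                  = "azure-mi.tfstate"   # hypothetical state file name

    use_msi         = true
    subscription_id = "00000000-0000-0000-0000-000000000000" # placeholder
    tenant_id       = "00000000-0000-0000-0000-000000000000" # placeholder
  }
}
```
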
| 116 | +**Sample `Main.tf`** |
| 117 | + |
| 118 | +``` |
| 119 | +resource "azurerm_resource_group" "aks_rg" { |
| 120 | +  name     = local.aks_rg_name |
| 121 | +  location = local.region |
| 122 | +} |
| 123 | + |
| 124 | +module "aks-gitops" { |
| 125 | +  source = "github.com/microsoft/bedrock//cluster/azure/aks-gitops?ref=aks_msi_integration" |
| 126 | + |
| 127 | +  acr_enabled              = true |
| 128 | +  agent_vm_count           = var.aks_agent_vm_count |
| 129 | +  agent_vm_size            = var.aks_agent_vm_size |
| 130 | +  cluster_name             = local.aks_cluster_name |
| 131 | +  dns_prefix               = local.aks_dns_prefix |
| 132 | +  flux_recreate            = var.flux_recreate |
| 133 | +  gc_enabled               = true |
| 134 | +  msi_enabled              = true |
| 135 | +  gitops_ssh_url           = var.gitops_ssh_url |
| 136 | +  gitops_ssh_key           = var.gitops_ssh_key_file |
| 137 | +  gitops_path              = var.gitops_path |
| 138 | +  gitops_poll_interval     = var.gitops_poll_interval |
| 139 | +  gitops_label             = var.gitops_label |
| 140 | +  gitops_url_branch        = var.gitops_url_branch |
| 141 | +  kubernetes_version       = var.kubernetes_version |
| 142 | +  resource_group_name      = azurerm_resource_group.aks_rg.name |
| 143 | +  service_principal_id     = module.app_management_service_principal.service_principal_application_id |
| 144 | +  service_principal_secret = module.app_management_service_principal.service_principal_password |
| 145 | +  ssh_public_key           = file(var.ssh_public_key_file) |
| 146 | +  vnet_subnet_id           = module.vnet.vnet_subnet_ids[0] |
| 147 | +  network_plugin           = var.network_plugin |
| 148 | +  network_policy           = var.network_policy |
| 149 | +  oms_agent_enabled        = var.oms_agent_enabled |
| 150 | +} |
| 151 | +``` |
| 152 | + |
| 153 | +Questions & Limitations: |
| 154 | + |
| 155 | +- With the deployment of the `azure-common-infra` template for Key Vault, will |
| 156 | +  that also need to be modified for Managed Identity to whitelist AKS to access |
| 157 | +  Key Vault? |
| 158 | + |
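One way the access could be granted, should a change be needed, is a Key Vault access policy for the cluster's identity. The sketch below is an assumption rather than part of the current `azure-common-infra` template: the variable names are hypothetical, and the casing of permission values differs between `azurerm` provider versions.

```hcl
# Sketch only: grant an AKS-related managed identity read access to Key Vault secrets.
# Both variables are hypothetical inputs, not existing Bedrock variables.
variable "keyvault_id" {
  type        = string
  description = "Resource ID of the Key Vault deployed by the common infrastructure."
}

variable "aks_identity_object_id" {
  type        = string
  description = "Object ID of the managed identity that should read secrets."
}

# Tenant of the credentials Terraform is currently running with.
data "azurerm_client_config" "current" {}

resource "azurerm_key_vault_access_policy" "aks_msi" {
  key_vault_id = var.keyvault_id
  tenant_id    = data.azurerm_client_config.current.tenant_id
  object_id    = var.aks_identity_object_id

  # Lowercase in azurerm 2.x; later provider versions expect "Get" and "List".
  secret_permissions = ["get", "list"]
}
```
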
| 159 | +### 3.3 Testing for Managed Identity enabled AKS Gitops Environments |
| 160 | + |
| 161 | +The testing for the Managed Identity enabled AKS gitops environment will |
| 162 | +incorporate the aforementioned new Infrastructure DevOps Model Flow for |
| 163 | +Terraform to assess pod identity access for a Voting App service deployed using |
| 164 | +terraform and a flux manifest repository. |
| 165 | + |
| 166 | +#### Unit Tests |
| 167 | + |
| 168 | +Cobalt Test Fixtures includes a library that simplifies writing Terraform unit |
| 169 | +tests against templates. It extracts out pieces of this process and provides |
| 170 | +static validation against a sample JSON output per module. This design requires |
| 171 | +unit tests for the following modules: |
| 172 | + |
| 173 | +- AKS |
| 174 | +- Key Vault |
| 175 | +- VNet |
| 176 | +- Subnet |
| 177 | +- Gitops |
| 178 | + |
| 179 | +#### Integration Tests |
| 180 | + |
| 181 | +Integration tests will validate resource interoperability upon deployment. |
| 182 | +After a successful `terraform apply`, this design will use a Go script and the |
| 183 | +Terratest Go library to run an integration test for the respective |
| 184 | +environment template that verifies: |
| 185 | + |
| 186 | +- Access to cluster through MI |
| 187 | +- Flux namespace |
| 188 | +- Access to voting app using Pod Identity |
| 189 | +- Access to key using flex-volume |
| 190 | + ([Unable to use Env Vars](https://github.com/Azure/kubernetes-keyvault-flexvol/issues/28)) |
| 191 | +- 200 response on Voting App |
| 192 | + |
| 193 | +#### Acceptance Test |
| 194 | + |
| 195 | +Acceptance tests are defined in this design as a system affirmation that the |
| 196 | +incoming PR has a successful build in a live staging environment once applied. |
| 197 | +A live QA environment will be maintained, and successful builds from an |
| 198 | +incoming PR will be applied to its state file. |
| 199 | + |
| 200 | +Questions & Limitations: |
| 201 | + |
| 202 | +- With an incoming change to an Azure provider module, how will this be applied |
| 203 | +  to an existing Terraform deployment? If it fails, should we redeploy a new |
| 204 | +  `azure-MI` environment for QA? |
| 205 | + |
| 206 | +#### Reporting |
| 207 | + |
| 208 | +The pipeline will output a test failure report using the out-of-the-box |
| 209 | +Terratest log parser, which emits JUnit XML, to capture errors thrown during the build. |
| 210 | + |
| 211 | +The whitelisted integration test for `azure-MI` will include: |
| 212 | + |
| 213 | +> `go test -v -run TestIT_Bedrock_AzureMI_Test -timeout 99999s | tee TestIT_Bedrock_AzureMI_Test.log` |
| 214 | + |
| 215 | +> `terratest_log_parser -testlog TestIT_Bedrock_AzureMI_Test.log -outputdir single_test_output` |
| 216 | + |
| 217 | +The pipeline will publish the XML report to AzDO as a uniquely named |
| 218 | +artifact. |
| 219 | + |
| 220 | +``` |
| 221 | +- task: PublishPipelineArtifact@1 |
| 222 | +  inputs: |
| 223 | +    path: $(modulePath)/test/single_test_output |
| 224 | +    artifact: simple_test_logs |
| 225 | +  condition: always() |
| 226 | +- task: PublishTestResults@2 |
| 227 | +  inputs: |
| 228 | +    testResultsFormat: 'JUnit' |
| 229 | +    testResultsFiles: '**/report.xml' |
| 230 | +    searchFolder: $(modulePath)/test |
| 231 | +  condition: and(eq(variables['Agent.JobStatus'], 'Succeeded'), endsWith(variables['Agent.JobName'], 'Bedrock_Build_Azure_MI')) |
| 232 | +``` |
| 233 | + |
| 234 | +## 4. Dependencies |
| 235 | + |
| 236 | +This design for a Managed Identity AKS Testing Harness will leverage the |
| 237 | +following: |
| 238 | + |
| 239 | +- [Bedrock Pre-Reqs: az cli | terraform | golang | fabrikate ](https://github.com/microsoft/bedrock/tree/master/tools/prereqs) |
| 240 | +- [Terratest](https://github.com/gruntwork-io/terratest) |
| 241 | +- [Terraform Compliance](https://github.com/eerkunt/terraform-compliance) |
| 242 | +- [Cobalt Terraform Test Fixtures](https://github.com/microsoft/cobalt/tree/master/test-harness) |
| 243 | + |
| 244 | +## 5. Risks & Mitigations |
| 245 | + |
| 246 | +Risks & Limitations: |
| 247 | + |
| 248 | +- With the deployment of the `azure-common-infra` template for Key Vault, will |
| 249 | +  that also need to be modified for Managed Identity to whitelist AKS to access |
| 250 | +  Key Vault? |
| 251 | +- With an incoming change to an Azure provider module, how will this be applied |
| 252 | +  to an existing Terraform deployment? If it fails, should we redeploy a new |
| 253 | +  `azure-MI` environment for QA? |
| 254 | +- How long does it take to deploy MI and Key Vault in a pipeline? |
| 255 | + |
| 256 | +## 6. Documentation |
| 257 | + |
| 258 | +Yes. Documentation will need to be added for the new Terraform environment and |
| 259 | +to the Bedrock testing guidance. |