8 changes: 8 additions & 0 deletions infrastructure/terraform/README.md
@@ -31,6 +31,14 @@ cp terraform.tfvars.example terraform.tfvars
terraform init && terraform apply
```

## ⚙️ Optional AML diagnostics

Set `should_enable_aml_diagnostic_logs = true` in `terraform.tfvars` to create an AML workspace diagnostic setting that sends all AML resource logs to the platform Log Analytics workspace. The default is `false`.

```hcl
should_enable_aml_diagnostic_logs = true
```

## 📖 Documentation

| Guide | Description |
1 change: 1 addition & 0 deletions infrastructure/terraform/TERRAFORM.md
@@ -75,6 +75,7 @@ Architecture:
| should\_add\_current\_user\_storage\_blob | Whether to add the current user as Storage Blob Data Contributor | `bool` | `true` | no |
| should\_create\_resource\_group | Whether to create the resource group for the robotics infrastructure | `bool` | `true` | no |
| should\_deploy\_aml\_compute | Whether to deploy an AzureML managed compute cluster for GPU workloads | `bool` | `false` | no |
| should\_enable\_aml\_diagnostic\_logs | Whether to enable AML workspace diagnostic logs in Log Analytics | `bool` | `false` | no |
| should\_deploy\_ampls | Whether to deploy Azure Monitor Private Link Scope and its private endpoint | `bool` | `true` | no |
| should\_deploy\_dce | Whether to deploy Data Collection Endpoint for observability | `bool` | `true` | no |
| should\_deploy\_grafana | Whether to deploy Azure Managed Grafana dashboard | `bool` | `true` | no |
5 changes: 3 additions & 2 deletions infrastructure/terraform/main.tf
@@ -134,8 +134,9 @@ module "platform" {
   should_deploy_dce = var.should_deploy_dce

   // AzureML compute
-  should_deploy_aml_compute = var.should_deploy_aml_compute
-  aml_compute_config        = var.aml_compute_config
+  should_enable_aml_diagnostic_logs = var.should_enable_aml_diagnostic_logs
+  should_deploy_aml_compute         = var.should_deploy_aml_compute
+  aml_compute_config                = var.aml_compute_config

   // DNS zone flags
   should_include_aks_dns_zone = var.should_include_aks_dns_zone
2 changes: 2 additions & 0 deletions infrastructure/terraform/modules/platform/TERRAFORM.md
@@ -44,6 +44,7 @@ Optional: PostgreSQL and Redis for OSMO workloads.
| [azurerm_machine_learning_compute_cluster.gpu](https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/machine_learning_compute_cluster) | resource |
| [azurerm_managed_redis.main](https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/managed_redis) | resource |
| [azurerm_monitor_data_collection_endpoint.main](https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/monitor_data_collection_endpoint) | resource |
| [azurerm_monitor_diagnostic_setting.ml_workspace_logs](https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/monitor_diagnostic_setting) | resource |
| [azurerm_monitor_private_link_scope.main](https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/monitor_private_link_scope) | resource |
| [azurerm_monitor_private_link_scoped_service.ai](https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/monitor_private_link_scoped_service) | resource |
| [azurerm_monitor_private_link_scoped_service.dce](https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/monitor_private_link_scoped_service) | resource |
@@ -118,6 +119,7 @@ Optional: PostgreSQL and Redis for OSMO workloads.
| should\_add\_current\_user\_key\_vault\_admin | Whether to add the current user as Key Vault Secrets Officer | `bool` | `true` | no |
| should\_add\_current\_user\_storage\_blob | Whether to add the current user as Storage Blob Data Contributor | `bool` | `true` | no |
| should\_deploy\_aml\_compute | Whether to deploy an AzureML managed compute cluster for GPU workloads | `bool` | `false` | no |
| should\_enable\_aml\_diagnostic\_logs | Whether to enable AML workspace diagnostic logs in Log Analytics | `bool` | `false` | no |
| should\_deploy\_ampls | Whether to deploy Azure Monitor Private Link Scope and its private endpoint | `bool` | `true` | no |
| should\_deploy\_dce | Whether to deploy Data Collection Endpoint for observability | `bool` | `true` | no |
| should\_deploy\_grafana | Whether to deploy Azure Managed Grafana dashboard | `bool` | `true` | no |
17 changes: 17 additions & 0 deletions infrastructure/terraform/modules/platform/azureml.tf
@@ -89,6 +89,23 @@
  }
}

Check warning on line 92 (GitHub Actions / Terraform Validation): Argument is deprecated

resource "azurerm_monitor_diagnostic_setting" "ml_workspace_logs" {
  count = var.should_enable_aml_diagnostic_logs ? 1 : 0

  name                       = "diag-mlw-${local.resource_name_suffix}"
  target_resource_id         = azapi_resource.ml_workspace.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.main.id

  enabled_log {
    category_group = "allLogs"
  }

  metric {
    category = "AllMetrics"
    enabled  = false
  }
}
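
Because the diagnostic setting is guarded by `count`, any reference to it has to handle the zero-instance case. A minimal sketch of one way to do that, assuming a hypothetical output named `aml_diagnostic_setting_id` that is not part of this change:

```hcl
// Hypothetical output, not part of this pull request.
// The splat expression yields a list with zero or one element, and one()
// turns that into null (flag off) or the single ID (flag on).
output "aml_diagnostic_setting_id" {
  description = "ID of the AML workspace diagnostic setting, or null when diagnostics are disabled"
  value       = one(azurerm_monitor_diagnostic_setting.ml_workspace_logs[*].id)
}
```

Indexing with `[0]` directly would fail at plan time when `should_enable_aml_diagnostic_logs` is `false`, which is why the sketch uses the splat form.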

// ============================================================
// AzureML Managed Compute Cluster (Optional)
// ============================================================
@@ -601,6 +601,58 @@ run "osmo_identity_disabled" {
  }
}

// ============================================================
// AML Diagnostic Logs Conditional
// ============================================================

run "aml_diagnostic_logs_enabled" {
  command = plan

  variables {
    resource_prefix                   = run.setup.resource_prefix
    environment                       = run.setup.environment
    instance                          = run.setup.instance
    location                          = run.setup.location
    resource_group                    = run.setup.resource_group
    current_user_oid                  = run.setup.current_user_oid
    should_enable_aml_diagnostic_logs = true
  }

  assert {
    condition     = length(azurerm_monitor_diagnostic_setting.ml_workspace_logs) == 1
    error_message = "AML diagnostic setting should be created when enabled"
  }

  assert {
    condition     = azurerm_monitor_diagnostic_setting.ml_workspace_logs[0].name == "diag-mlw-${run.setup.resource_prefix}-${run.setup.environment}-${run.setup.instance}"
    error_message = "AML diagnostic setting should use the standard diagnostic setting name"
  }

  assert {
    condition     = one(azurerm_monitor_diagnostic_setting.ml_workspace_logs[0].enabled_log).category_group == "allLogs"
    error_message = "AML diagnostic setting should enable all AML log categories"
  }
}

run "aml_diagnostic_logs_disabled" {
  command = plan

  variables {
    resource_prefix                   = run.setup.resource_prefix
    environment                       = run.setup.environment
    instance                          = run.setup.instance
    location                          = run.setup.location
    resource_group                    = run.setup.resource_group
    current_user_oid                  = run.setup.current_user_oid
    should_enable_aml_diagnostic_logs = false
  }

  assert {
    condition     = length(azurerm_monitor_diagnostic_setting.ml_workspace_logs) == 0
    error_message = "AML diagnostic setting should not be created when disabled"
  }
}

// ============================================================
// AML Compute Conditional
// ============================================================
@@ -89,6 +89,12 @@ run "verify_defaults" {
    error_message = "AML compute cluster should NOT be created by default"
  }

  // AML diagnostic logs NOT enabled by default
  assert {
    condition     = length(azurerm_monitor_diagnostic_setting.ml_workspace_logs) == 0
    error_message = "AML diagnostic setting should NOT be created by default"
  }

  // OSMO identity enabled by default
  assert {
    condition     = length(azurerm_user_assigned_identity.osmo) == 1
6 changes: 6 additions & 0 deletions infrastructure/terraform/modules/platform/variables.tf
@@ -282,6 +282,12 @@ variable "should_deploy_dce" {
 * AzureML Compute Configuration
 */

variable "should_enable_aml_diagnostic_logs" {
  type        = bool
  description = "Whether to enable AML workspace diagnostic logs in Log Analytics"
  default     = false
}

variable "should_deploy_aml_compute" {
  type        = bool
  description = "Whether to deploy an AzureML managed compute cluster for GPU workloads"