Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new resource - azurerm_machine_learning_compute_cluster #11675

Merged

Conversation

gro1m
Copy link
Contributor

@gro1m gro1m commented May 12, 2021

Fixes #11254

Basic implementation for a Machine Learning Compute Cluster (without possibility to ssh into compute cluster). Unfortunately there are still some problems:

--- TEST SUMMARY ---

TF_ACC=1 go test -v ./azurerm/internal/services/machinelearning -run=TestAccComputeCluster -timeout 60m -ldflags="-X=github.com/terraform-providers/terraform-provider-azurerm/version.ProviderVersion=acc"
2021/05/12 10:29:26 [DEBUG] not using binary driver name, it's no longer needed
2021/05/12 10:29:26 [DEBUG] not using binary driver name, it's no longer needed
=== RUN   TestAccComputeCluster_basic
=== PAUSE TestAccComputeCluster_basic
=== RUN   TestAccComputeCluster_requiresImport
=== PAUSE TestAccComputeCluster_requiresImport
=== RUN   TestAccComputeCluster_basicUpdate
=== PAUSE TestAccComputeCluster_basicUpdate
=== CONT  TestAccComputeCluster_basic
=== CONT  TestAccComputeCluster_basicUpdate
=== CONT  TestAccComputeCluster_requiresImport

<>

    testing.go:620: Step 2/2, expected an error with pattern, no match on: Error running apply: exit status 1
        
        Error: ID contained more segments than required: "/subscriptions/b609167b-ec1b-406c-accc-dc6cb5df9f43/resourceGroups/acctestRG-ml-210512102926964607/providers/Microsoft.MachineLearningServices/workspaces/acctest-MLW2105121029269646/computes/CC-21051207", map[computes:CC-21051207]
        
          on terraform_plugin_test.tf line 113, in resource "azurerm_machine_learning_compute_cluster" "import":
         113: resource "azurerm_machine_learning_compute_cluster" "import" {
        
        
=== CONT  TestAccComputeCluster_basicUpdate
    testing.go:620: Step 3/4 error: After applying this test step, the plan was not empty.
        stdout:
        
        
        An execution plan has been generated and is shown below.
        Resource actions are indicated with the following symbols:
          ~ update in-place
        
        Terraform will perform the following actions:
        
          # azurerm_machine_learning_compute_cluster.test will be updated in-place
          ~ resource "azurerm_machine_learning_compute_cluster" "test" {
                id                            = "/subscriptions/b609167b-ec1b-406c-accc-dc6cb5df9f43/resourceGroups/acctestRG-ml-210512102926969345/providers/Microsoft.MachineLearningServices/workspaces/acctest-MLW2105121029269693/computes/CC-21051245"
                name                          = "CC-21051245"
                # (5 unchanged attributes hidden)
        
        
              ~ scale_settings {
                  ~ max_node_count                   = 1 -> 2
                  ~ min_node_count                   = 0 -> 1
                  ~ node_idle_time_before_scale_down = "PT30S" -> "PT60S"
                }
                # (1 unchanged block hidden)
            }
        
        Plan: 0 to add, 1 to change, 0 to destroy.
--- FAIL: TestAccComputeCluster_requiresImport (865.39s)
--- FAIL: TestAccComputeCluster_basicUpdate (876.13s)
--- PASS: TestAccComputeCluster_basic (976.30s)
FAIL
FAIL    github.com/terraform-providers/terraform-provider-azurerm/azurerm/internal/services/machinelearning     981.063s
FAIL
make: *** [acctests] Error 1

@gro1m
Copy link
Contributor Author

gro1m commented May 15, 2021

@aristosvo If you have time would be happy if you could review this as well. I tried to fix the requiresImport error, but I did not find what the issue is. I removed the update for now, as this is impossible to achieve with the current API. For the requiresImport I still get this error:

Error: ID contained more segments than required: "/subscriptions/b609167b-ec1b-406c-accc-dc6cb5df9f43/resourceGroups/acctestRG-ml-210512102926964607/providers/Microsoft.MachineLearningServices/workspaces/acctest-MLW2105121029269646/computes/CC-21051207", map[computes:CC-21051207]

It seems like the computes/CC-21051207 part is unexpected, but why. I thought that I am using the correct parsers and I did not find any erroneous ones. But maybe you can find something?!

Copy link
Collaborator

@aristosvo aristosvo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To solve your initial problem we can fix the test, haven't looked at the whole resource in detail

@gro1m
Copy link
Contributor Author

gro1m commented May 16, 2021

TEST SUMMARY:

TF_ACC=1 go test -v ./azurerm/internal/services/machinelearning -run=TestAccComputeCluster -timeout 60m -ldflags="-X=github.com/terraform-providers/terraform-provider-azurerm/version.ProviderVersion=acc"
2021/05/16 10:18:46 [DEBUG] not using binary driver name, it's no longer needed
2021/05/16 10:18:46 [DEBUG] not using binary driver name, it's no longer needed
=== RUN TestAccComputeCluster_basic
=== PAUSE TestAccComputeCluster_basic
=== RUN TestAccComputeCluster_requiresImport
=== PAUSE TestAccComputeCluster_requiresImport
=== CONT TestAccComputeCluster_basic
=== CONT TestAccComputeCluster_requiresImport
--- PASS: TestAccComputeCluster_basic (560.54s)
--- PASS: TestAccComputeCluster_requiresImport (568.04s)
PASS
ok github.com/terraform-providers/terraform-provider-azurerm/azurerm/internal/services/machinelearning 572.932s

@ghost ghost added size/XXL and removed size/XL labels May 16, 2021
@gro1m gro1m changed the title [WIP] new resource - azurerm_machine_learning_compute_cluster new resource - azurerm_machine_learning_compute_cluster May 16, 2021
@gro1m gro1m force-pushed the r/azurerm_machine_learning_compute_cluster branch from 44236cd to 178d4ca Compare May 18, 2021 07:07
@gro1m
Copy link
Contributor Author

gro1m commented May 18, 2021

As there have been some small changes, I tested again and everything still looks good:

TF_ACC=1 go test -v ./azurerm/internal/services/machinelearning -run=TestAccComputeCluster -timeout 60m -ldflags="-X=github.com/terraform-providers/terraform-provider-azurerm/version.ProviderVersion=acc"
2021/05/18 10:18:42 [DEBUG] not using binary driver name, it's no longer needed
2021/05/18 10:18:42 [DEBUG] not using binary driver name, it's no longer needed
=== RUN   TestAccComputeCluster_basic
=== PAUSE TestAccComputeCluster_basic
=== RUN   TestAccComputeCluster_requiresImport
=== PAUSE TestAccComputeCluster_requiresImport
=== CONT  TestAccComputeCluster_basic
=== CONT  TestAccComputeCluster_requiresImport
--- PASS: TestAccComputeCluster_basic (461.23s)
--- PASS: TestAccComputeCluster_requiresImport (472.31s)
PASS
ok      github.com/terraform-providers/terraform-provider-azurerm/azurerm/internal/services/machinelearning     476.998s

@favoretti
Copy link
Collaborator

My 2ct on the matter - since the cluster is ephemeral by design, we could potentially release these without ability to update the scale settings and basically recreate it every time, it's not a super time-consuming operation. While we're working with Service Team on fixing the swagger - people (we) could use this already and implement scaling updates once SDK has a fix?

@gro1m
Copy link
Contributor Author

gro1m commented May 20, 2021

@katbyte Should look good as well now:)

TF_ACC=1 go test -v ./azurerm/internal/services/machinelearning -run=TestAccComputeCluster -timeout 60m -ldflags="-X=github.com/terraform-providers/terraform-provider-azurerm/version.ProviderVersion=acc"
2021/05/20 11:10:17 [DEBUG] not using binary driver name, it's no longer needed
2021/05/20 11:10:17 [DEBUG] not using binary driver name, it's no longer needed
=== RUN   TestAccComputeCluster_basic
=== PAUSE TestAccComputeCluster_basic
=== RUN   TestAccComputeCluster_requiresImport
=== PAUSE TestAccComputeCluster_requiresImport
=== CONT  TestAccComputeCluster_basic
=== CONT  TestAccComputeCluster_requiresImport
--- PASS: TestAccComputeCluster_basic (764.75s)
--- PASS: TestAccComputeCluster_requiresImport (785.94s)
PASS
ok      github.com/terraform-providers/terraform-provider-azurerm/azurerm/internal/services/machinelearning     790.986s

@favoretti favoretti force-pushed the r/azurerm_machine_learning_compute_cluster branch 2 times, most recently from 245103e to 4a80be0 Compare May 26, 2021 11:30
@favoretti favoretti force-pushed the r/azurerm_machine_learning_compute_cluster branch from 4a80be0 to 51d5ca1 Compare May 26, 2021 11:53
@favoretti
Copy link
Collaborator

image

@katbyte katbyte modified the milestones: v2.62.0, v2.63.0 Jun 4, 2021
@gro1m
Copy link
Contributor Author

gro1m commented Jun 10, 2021

@katbyte @tombuildsstuff @aristosvo @favoretti @ArcturusZhang Microsoft informed us that they expect we could start implementing the update operation in August (fix in July, communication in August). The workaround for the time being to get unblocked would be to manually update the auto generated GO SDK.

Copy link
Collaborator

@katbyte katbyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THanks @caius - just have a couple comments on schema bit otherwise this is looking good


* `scale_settings` - (Required) A `scale_settings` block as defined below. Changing this forces a new Machine Learning Compute Cluster to be created.

* `vm_priority` - (Required) The priority of the VM. Changing this forces a new Machine Learning Compute Cluster to be created.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what values are possible here? and can we validate against them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Options are "LowPriority" or "Dedicated", so possible we could validate that :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in new commit

## Arguments Reference

The following arguments are supported:

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we put name. location, worspace id first?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in new commit.


* `min_node_count` - (Required) Minimal node count. Changing this forces a new Machine Learning Compute Cluster to be created.

* `node_idle_time_before_scale_down` - (Required) Node Idle Time Before Scale Down: defines the time until the compute is shutdown when it has gone into Idle state. Is defined according to W3C XML schema standard for duration. Changing this forces a new Machine Learning Compute Cluster to be created.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would this make more sense as

Suggested change
* `node_idle_time_before_scale_down` - (Required) Node Idle Time Before Scale Down: defines the time until the compute is shutdown when it has gone into Idle state. Is defined according to W3C XML schema standard for duration. Changing this forces a new Machine Learning Compute Cluster to be created.
* `scale_down_nodes_after_idle_duration` - (Required) Node Idle Time Before Scale Down: defines the time until the compute is shutdown when it has gone into Idle state. Is defined according to W3C XML schema standard for duration. Changing this forces a new Machine Learning Compute Cluster to be created.

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the azure cli term used here: https://docs.azure.cn/zh-cn/cli/ext/azure-cli-ml/ml/computetarget/create?view=azure-cli-latest#ext_azure_cli_ml_az_ml_computetarget_create_amlcompute. But I agree that your proposal is more descriptive, so up to you.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets use this more description one, we tend to change names to better match the terraform schema/provider the best UX

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in newest commit

@gro1m
Copy link
Contributor Author

gro1m commented Jun 10, 2021

@katbyte Everything should be addressed now, could you test please, because I won't push for some hours now? I did not change the naming of vm_size and node_idle_time_after_shutdown (see reasons above). If you still think it is better to change it would be nice if you could that.

@katbyte
Copy link
Collaborator

katbyte commented Jun 10, 2021

address your comments inline above, just node_idle_time_after_shutdown to address now

@katbyte katbyte modified the milestones: v2.63.0, v2.64.0 Jun 11, 2021
@gro1m
Copy link
Contributor Author

gro1m commented Jun 11, 2021

@katbyte I addressed your change and the tests should pass:

TEST SUMMARY:

=== RUN   TestAccComputeCluster_basic
=== PAUSE TestAccComputeCluster_basic
=== RUN   TestAccComputeCluster_requiresImport
=== PAUSE TestAccComputeCluster_requiresImport
=== CONT  TestAccComputeCluster_basic
=== CONT  TestAccComputeCluster_requiresImport
--- PASS: TestAccComputeCluster_basic (515.41s)
--- PASS: TestAccComputeCluster_requiresImport (529.17s)
PASS
ok      github.com/terraform-providers/terraform-provider-azurerm/azurerm/internal/services/machinelearning     540.560s

Copy link
Collaborator

@katbyte katbyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gro1m - LGTM 👍

@katbyte katbyte merged commit 353fedd into hashicorp:master Jun 11, 2021
katbyte added a commit that referenced this pull request Jun 11, 2021
@gro1m gro1m deleted the r/azurerm_machine_learning_compute_cluster branch June 11, 2021 20:51
@github-actions
Copy link

This functionality has been released in v2.64.0 of the Terraform Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!

@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 19, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for Azure Machine Learning Compute Clusters (new resource)
6 participants