Skip to content

Support resolving Azure identities from VM scale set for SQL Server connections#51671

Merged
gabrielcorado merged 7 commits intomasterfrom
gabrielcorado/support-connect-sqlserver-vmss
Feb 14, 2025
Merged

Support resolving Azure identities from VM scale set for SQL Server connections#51671
gabrielcorado merged 7 commits intomasterfrom
gabrielcorado/support-connect-sqlserver-vmss

Conversation

@gabrielcorado
Copy link
Copy Markdown
Contributor

Closes #51606.

Updates the flow for retrieving the Azure identity for SQL Server connections to handle if the VM is on a scale set. For this case, we need to retrieve the identities list from a different API, given that fetching it directly using the resource ID doesn't work, as VMSS has a different resource ID. And, we cannot use the already existent GetByVMID because those VMs are not on the regular VMs list.

This PR adds an option to GetByVMID so callers can specify the scale set name. Once this is set, we'll use the Azure VM set scale client to retrieve the VM details. The remaining of the flow is the same.

changelog: Fix Azure SQL Servers connect failures when the database agent runs on a VM scale set.

@github-actions github-actions Bot added database-access Database access related issues and PRs size/md labels Jan 30, 2025
Comment thread lib/cloud/azure/vm.go
@gabrielcorado
Copy link
Copy Markdown
Contributor Author

@greedy52 I reworked this PR to go for more like the solution you mentioned in your comment: take the resource ID and extract the info from there. This is given that VM scale sets have two orchestration modes that we'll need to support:

  • Flexible: Works like a regular VM and has a VM resource ID. But the info about the scale set to be present in the metadata response. We still need to call the regular VM endpoint to retrieve the identities. The original solution on this PR wouldn't work because we would make the call to the wrong API. So we don't need anything changed on the flow if they're using this mode.
  • Uniform: This requires a different API call (to scale set VM endpoint). We can distinguish it from the resource ID. This is usually the mode used on AKS clusters.

@greedy52
Copy link
Copy Markdown
Contributor

greedy52 commented Feb 5, 2025

@greedy52 I reworked this PR to go for more like the solution you mentioned in your comment: take the resource ID and extract the info from there. This is given that VM scale sets have two orchestration modes that we'll need to support:

  • Flexible: Works like a regular VM and has a VM resource ID. But the info about the scale set to be present in the metadata response. We still need to call the regular VM endpoint to retrieve the identities. The original solution on this PR wouldn't work because we would make the call to the wrong API. So we don't need anything changed on the flow if they're using this mode.
  • Uniform: This requires a different API call (to scale set VM endpoint). We can distinguish it from the resource ID. This is usually the mode used on AKS clusters.

sounds good. could you update the description of the function so reader knows what to expect when different IDs are passed in?

@greedy52
Copy link
Copy Markdown
Contributor

greedy52 commented Feb 5, 2025

could you also verify what impact this change has for other callers? like

// If the token is from the system-assigned managed identity, the resource ID
// is for the VM itself and we can use it to look up the VM.
if slices.Contains(resourceID.ResourceType.Types, azureVirtualMachine) {
vm, err = vmClient.Get(ctx, tokenClaims.ManangedIdentityResourceID)

@gabrielcorado
Copy link
Copy Markdown
Contributor Author

@greedy52 Added comments for better context on the flow.

could you also verify what impact this change has for other callers? like

It is only being used by the place you mentioned. It already matches VMSS VMs there, but then it would fail to retrieve their information using the regular VMs API. So, we're improving the Azure join handling in some parts (when the identity is system-managed). It is worth mentioning that this doesn't change the other flows, including discovery. After merging this, I'll add this information to the other issue.

Comment thread lib/cloud/azure/vm.go Outdated
} {
t.Run(tc.desc, func(t *testing.T) {
vmClient := NewVirtualMachinesClientByAPI(tc.client)
vmClient := NewVirtualMachinesClientByAPI(tc.client, nil /* scaleSetAPI */)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe use the mock here. it doesn't have to mock anything but it's better nil in my opinion. Or, since it's the same api call, we could also merge the test below here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that this test case shouldn't rely on the secondary client. If that happens, the test will fail with a panic (which would indicate the client's misusage). We could use a mock that always returns an error, but that could also cause false positives when the tests expect an error.

Comment thread lib/srv/db/common/auth_test.go Outdated
@gabrielcorado gabrielcorado added this pull request to the merge queue Feb 14, 2025
Merged via the queue into master with commit 66c019c Feb 14, 2025
@gabrielcorado gabrielcorado deleted the gabrielcorado/support-connect-sqlserver-vmss branch February 14, 2025 03:04
@public-teleport-github-review-bot
Copy link
Copy Markdown

@gabrielcorado See the table below for backport results.

Branch Result
branch/v15 Failed
branch/v16 Failed
branch/v17 Failed

@milos-teleport
Copy link
Copy Markdown
Contributor

Heya @gabrielcorado, any plans for a v17 backport?

carloscastrojumo pushed a commit to carloscastrojumo/teleport that referenced this pull request Feb 19, 2025
…onnections (gravitational#51671)

* feat(sqlserver): support resolving identity for scale set VMs

* fix: update mocks signature

* refactor: rework fetch scale set vms

* chore(imds): remove unused metadata attribute

* chore: add comments clarifying the flow

* refactor: code review suggestions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport/branch/v17 database-access Database access related issues and PRs size/md

Projects

None yet

Development

Successfully merging this pull request may close these issues.

VMSS hosted database agents on Azure are broken

4 participants