Skip to content

Conversation

@m1kola
Copy link
Contributor

@m1kola m1kola commented Jul 8, 2021

#1599 needs to be merged first.

Which issue this PR addresses:

Part of work item №9586080.

What this PR does / why we need it:

Next iteration of #1569: adds validation for Server-Side Encryption and Encryption at host:

  • Check first party SP permissions to read DiskEncryptionSet
  • Check cluster SP permissions to read DiskEncryptionSet
  • Check whether VM supports encryption at host (see docs: we support Standard_E64i_v3, Standard_G5 and Standard_GS5 amongst others and these three do not seem to support encryption at host)

Test plan for issue:

PR adds unit tests + Manual tests (see #1569 for instructions).

Is there any documentation that needs to be updated for this PR?

We need to update customer facing docs and CLI, but it is out of scope of this work.

@m1kola m1kola force-pushed the 9586080_encryption-validation branch 3 times, most recently from cfca6d7 to 749b166 Compare July 9, 2021 14:27
@m1kola m1kola force-pushed the 9586080_encryption-validation branch 7 times, most recently from d9c1fe1 to 61c4997 Compare July 12, 2021 12:40
@m1kola m1kola marked this pull request as ready for review July 12, 2021 12:42
@m1kola m1kola requested review from jewzaam and mjudeikis as code owners July 12, 2021 12:42
Comment on lines 276 to 278
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that currently we allow two different disk encrypthion sets to be used: one for master and one for worker nodes.

We can change the static validation to make sure that master and worker profiles use the same disk encryption set. I'm however not sure whether it makes sense to impose this restriction or not. Opinions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think of a reason why it would need to be the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think of a reason why it would need to be the same.

Did you mean different? I would imagine that most of the customers will be using the same disk encryption set for both master and worker nodes. But there are probably use cases for having different sets for workers. Maybe for multi tenant clusters?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep them separate. If there is a possibility - somebody will take it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had a discussion with MJ about it and decided to force master and worker disk encryption sets to be the same for now in static validation.

We can relax this validation later, if we see a use case.

Comment on lines 276 to 278
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep them separate. If there is a possibility - somebody will take it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to validate location too. If I recall right keyvault + DES has to be in the same region too where cluster is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right:

All resources related to your customer-managed keys (Azure Key Vaults, disk encryption sets, VMs, disks, and snapshots) must be in the same subscription and region.

However, I think it should be enough to validation only location of DES here: we do not work with KV directly and do not even need permissions for it. Sounds like validation of location of KV is a responsibility of compute RP: you first create a KV, then you create a DES via compute RP which should have own validation and only after this you create an ARO cluster providing DES resoruce ID.

Looking into it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mjudeikis I added location validation.

I think it should be enough to validation only location of DES here: we do not work with KV directly and do not even need permissions for it. Sounds like validation of location of KV is a responsibility of compute RP...

I looked into it and it seems like compute RP doesn't validate location of the KV when creating DES: I was able to create a DES in westeurope with a KV from eastus. Not sure whether it is just outdated documentation or lack of validation on the compute RP.

I would prefer not to validate KV location, becuase it will likely require extra permissions for RP and cluster SP. But we can look into it. What do you think?

Copy link
Contributor

@mjudeikis mjudeikis Jul 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But were you able to use DES in westeurope with cluster in eastusu? Im interested if this works/does not work. I expect this to fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But were you able to use DES in westeurope with cluster in eastusu?

@mjudeikis with the new validation in place it will not be possible because our validation will fail: des location != cluster location. But I'm going to test this case this afternoon:

  • KV from westeurope
  • DES in eastus with KV from westeurope
  • Cluster in eastus

It should pass our validation, but I expect it to fail on the underlying RP somewhere. But who knows...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mjudeikis, ok so it looks like compute RP doesn't validate when you create a DES, but does validate when you create a VM. I got a DeploymentFailed with KeyVaultAndDiskInDifferentRegions for each master and bootstrap VM.

Not great. I wish they were validating it during DES creation...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now this is something I seen :) we need to catch this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into catching it and there are two options:

  1. Catch in the dynamic validation.
  2. Catch DeploymentFailed and check if one of the deploymnet errors has something to do with KeyVaultAndDiskInDifferentRegions.

And both end up being a mess.

If we want to catch on validation we need:

  1. Customer to give us extra permissions which is not great as we only need it during cluster creation for validation of something which is actually not ARO responsibility.
  2. We will have to do the following (additional) checks:
    • Does disk encrhption set have an a source vault set (not sure in which circumstances it can be nil, but the field is optional/nilable)
    • Do we have permissions to read KV object?
    • Does the key vault actually exist (you can create a KV & disk encryption set, then delete key vault and try to create a cluster)
    • Is KV in the same region as cluster?
    • Probably some other edge cases.

If we want to catch this on DeploymentFailed - then it is also tricky. We already catching DeploymentFailed in pkg/cluster/arm.go and propogating it to a customer. If we want to have a nice message then, we need to handle at least these failures:

  1. Disk encryption set exists and in correct region, but KV is in different region. We can catch it by KeyVaultAndDiskInDifferentRegions.
  2. Disk encryption set exists, but key vault doesn't exist. We can't catch it easily: errors are just BadRequest and text message.

Even if we do catch these cases, what do we do next? Do we introduce a special case of api.CloudErrorCodeDeploymentFailed where we say "your key vault is in different region/doesn't exist"? What about other errors from DeploymentFailed? It is possible that DES/KV errors are not the only cause of deploymnet being failed. Do we just ignore other errors?

After looking and trying different options, my opition on this is - it is not worth handling as we already handling DeploymentFailed in pkg/cluster/arm.go and propogating all the erors from ARM to the customer:

if serviceErr != nil {
b, _ := json.Marshal(serviceErr)
return &api.CloudError{
StatusCode: http.StatusBadRequest,
CloudErrorBody: &api.CloudErrorBody{
Code: api.CloudErrorCodeDeploymentFailed,
Message: "Deployment failed.",
Details: []api.CloudErrorBody{
{
Message: string(b),
},
},
},
}
}

@m1kola m1kola force-pushed the 9586080_encryption-validation branch from 61c4997 to ce96da7 Compare July 13, 2021 11:34
@m1kola m1kola requested a review from mjudeikis July 13, 2021 11:38
@m1kola m1kola force-pushed the 9586080_encryption-validation branch 2 times, most recently from c11c679 to de20631 Compare July 16, 2021 11:31
Validations for Server-Side Encryption and Encryption At Host.
@m1kola m1kola force-pushed the 9586080_encryption-validation branch from de20631 to 1eecec6 Compare July 16, 2021 13:24
@troy0820 troy0820 added ready-for-review LGTM Looks Good To Me size-medium Size medium labels Jul 16, 2021
@m1kola m1kola merged commit cca924c into Azure:master Jul 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

LGTM Looks Good To Me ready-for-review size-medium Size medium

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants