
✨ Additional data volumes for machines#1668

Merged
k8s-ci-robot merged 1 commit into kubernetes-sigs:main from stackhpc:feature/data-volumes
Oct 4, 2023

Conversation

@mkjpryor

@mkjpryor mkjpryor commented Sep 13, 2023

What this PR does / why we need it:

This PR adds the ability for a machine to have additional data volumes attached, e.g. for etcd.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1286

Special notes for your reviewer:

  1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Sep 13, 2023
@netlify

netlify bot commented Sep 13, 2023

Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!

Name Link
🔨 Latest commit 94d9690
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-cluster-api-openstack/deploys/651c295a278dfd00086e887a
😎 Deploy Preview https://deploy-preview-1668--kubernetes-sigs-cluster-api-openstack.netlify.app

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 13, 2023
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 13, 2023
@k8s-ci-robot
Contributor

Hi @mkjpryor. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 13, 2023
Contributor

@mdbooth mdbooth left a comment


Looking good! Let's start the discussion.

NameSuffix string `json:"nameSuffix"`
Size int `json:"diskSize,omitempty"`
VolumeType string `json:"volumeType,omitempty"`
AvailabilityZone string `json:"availabilityZone,omitempty"`
Contributor

@mdbooth mdbooth Sep 13, 2023

@stephenfin Very much interested in your input here.

I've proposed adding useMachineAZ (default false) to a future api version for the root volume. What do you think the semantics of this one should be?

Options:

  • Empty AZ means no AZ
  • Empty AZ means something machine related
  • Anything else?

Is there any reason to have an option here to use the same AZ as the machine?

Author

I was planning to make the semantics the same as the root volume for now. I think that is less confusing.

Then when we change the root volume we can change this as well. My plan is to have one "volume creating" function so hopefully the logic doesn't have to be duplicated.

Contributor

The problem is we know the semantics of the rootVolume are wrong, and we're going to have a minor upgrade problem there. If we know what we're going to do with rootVolume I'd prefer to get this right in the first instance.

See #1662 for WIP.

Author

What is left to do on the root volume WIP? Should we get that merged first?

Author

It looks like a sensible change to me. Are you worried about it being a breaking change because the default behaviour changes from "machine AZ" to "no AZ"?

(Which should have been the default all along IMHO, but what's done is done...)

Author

I guess if we want a non-breaking change, then the useServerAZ flag defaults to true.

Contributor

@EmilienM EmilienM Sep 22, 2023

I guess if we want a non-breaking change, then the useServerAZ flag defaults to true.

That's what I believe is already CAPO's behaviour on rootVolumes:

availabilityZone := instanceSpec.FailureDomain
if rootVolume.AvailabilityZone != "" {
	availabilityZone = rootVolume.AvailabilityZone
}

Author

@mkjpryor mkjpryor Sep 22, 2023

Yes - I mirrored the current root behaviour for the additional volumes on purpose.

I think we should change the behaviour for all types of volume together in a separate PR.

Contributor

I don't like this behaviour for the same reason we need to change it for the root volume, namely what @JohnGarbutt called the '80% case' is impossible. We are definitely going to have to change it.

However, I can also see the benefit of being consistent with the root volume, and making a breaking change to both APIs at the same time in the future.

Contributor

@EmilienM EmilienM Sep 27, 2023

We came to an agreement to bulk change it via another PR. I added it to the TODO.

Contributor

@mdbooth mdbooth left a comment

Please could you also add field comments of the form

// FieldName does X. It defaults to Y. It is incompatible with Z.

@mdbooth
Contributor

mdbooth commented Sep 13, 2023

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 13, 2023
@mkjpryor mkjpryor force-pushed the feature/data-volumes branch from 8491f71 to 28501e5 on September 14, 2023 16:03
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 15, 2023
RootVolume *RootVolume `json:"rootVolume,omitempty"`

// DataVolumes is a list of specifications for volumes to attach to the server instance
DataVolumes []DataVolume `json:"dataVolumes,omitempty"`
Contributor

Not a big deal, but just to mention that CAPZ used DataDisks.
I wonder if we want to follow that pattern, since our data might end up being on volumes and/or local disks at some point.

Author

I used DataVolumes to match RootVolume. I don't really have a strong opinion.

Contributor

I think we're likely to end up with:

  • RootVolume
  • DataVolumes
  • LocalDataVolume

I'm with @mkjpryor on the lack of a strong opinion on the name. I agree that DataVolumes is consistent with RootVolume. Presumably they're also referenced in... KubeadmControlPlane? What are they called there?

Author

The things you have to set in kubeadmcontrolplane and/or kubeadmbootstrapconfig correspond to cloud-init fields, so DiskSetup and Mounts.
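For reference, a sketch of what that looks like on the bootstrap side. This is a hypothetical excerpt, not from this PR: the device path, label, and mount point are illustrative assumptions, and the exact device name depends on the hypervisor.

```yaml
# Hypothetical KubeadmControlPlane excerpt: format and mount an extra
# volume for etcd via cloud-init's DiskSetup and Mounts fields.
kubeadmConfigSpec:
  diskSetup:
    filesystems:
      - device: /dev/vdb        # assumed device name for the data volume
        filesystem: ext4
        label: etcd_disk
  mounts:
    - - LABEL=etcd_disk
      - /var/lib/etcd
```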

Author

But the OpenStack terminology for these things is volume, so my feeling is that we stick with that 🤷‍♂️

Contributor

Could we settle on the API, considering that we'll support multiple ephemeral disks and volumes?

Option A. DataVolumes and LocalDataVolumes, where each of them has the options needed for their use-case.

Option B. One field (maybe name it BlockDevices, to avoid confusion with volumes) with all fields that we need for both use cases.

I have a preference for Option B (and the rename) for simplicity's sake.

Author

@mkjpryor mkjpryor Sep 22, 2023

My preference is also for option B with the name BlockDevices. I like using a name that mirrors the OpenStack API attribute that we are ultimately setting - I think it is probably less confusing for users.

Contributor

Could we capture that these block devices explicitly don't include the root volume? I feel that DataDisks and DataVolumes already capture that in the word Data. If we really like BlockDevices, perhaps AdditionalBlockDevices?
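For illustration, a machine spec using the fields from this PR's excerpts might look like the following. This is a hypothetical sketch: the field name was still under discussion at this point, and the volume type and AZ values are assumptions.

```yaml
# Hypothetical OpenStackMachineTemplate excerpt (field names per the
# discussion in this thread, values illustrative only).
additionalBlockDevices:
  - nameSuffix: etcd           # volume will be named "<machine-name>-etcd"
    diskSize: 10               # size in GiB
    volumeType: fast-ssd       # assumed Cinder volume type
    availabilityZone: nova     # assumed availability zone
```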

// This is then available in the instance metadata, allowing devices to be
// identified by tag inside the server.
Tag string `json:"tag,omitempty"`
}
Contributor

This can be addressed in a separate PR but for supporting local disks, we'll need to add more fields.

Author

Just one I think, to indicate whether it should be a local or Cinder disk? The rest can be reused in both cases.

Contributor

If we accept my argument that there will be a single local disk then I would prefer it to have a separate field because it's more obvious to the user and doesn't require validation. I would only put a local flag here if we decide it's useful to have multiple local disks.

But as you say we don't need to consider this yet.

Author

@mdbooth as I raised in Slack, I think I have a use case for multiple ephemeral devices carved out of the ephemeral volume, so I would like to see an option that permits that.

Author

But it is something for a future PR anyway

Contributor

@mkjpryor Please look at my recent refactor & patches, I think the way we're proposing now is the same behaviour as with rootVolumes. And indeed this can be bulk-changed later.

Contributor

If we were to add support for local ephemeral drives, we would need the DestinationType field and that's it, I think.

@EmilienM
Contributor

👍 excellent start to me!

Besides a few comments and missing docs (and tests?), this is good to me.

Note: I'm very interested in local disks instead of volumes, but with your PR I can easily see it extended via a separate PR. However I have no objection (besides making the PR too big) if you include it here.

@mkjpryor
Author

Note: I'm very interested in local disks instead of volumes, but with your PR I can easily see it extended via a separate PR. However I have no objection (besides making the PR too big) if you include it here.

I feel like it should be a separate PR, just in case there are unintended consequences that need addressing.

@EmilienM
Contributor

/test pull-cluster-api-provider-openstack-test

@EmilienM
Contributor

/test pull-cluster-api-provider-openstack-build

@EmilienM EmilienM force-pushed the feature/data-volumes branch from 93fc4f6 to fceff67 on September 25, 2023 02:13
DeleteOnTermination: true,
// TODO(mkjpryor) uncomment once gophercloud patch is merged
// https://github.com/gophercloud/gophercloud/pull/2777
// Tag: dataVolumeSpec.Tag,
Contributor

Once gophercloud/gophercloud#2789 is merged and a release is produced, I'll update vendoring here.

@EmilienM
Contributor

EmilienM commented Sep 25, 2023

TODO:

@mdbooth
Contributor

mdbooth commented Sep 25, 2023

TODO:

See above comment. I'd also like to make it clearer that BlockDevices isn't all block devices because it doesn't include the root volume.

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 25, 2023
bootVolume := firstVolume
additionalVolume := secondVolume
// The boot volume is the one with the "bootable" flag set.
if firstVolume.Bootable != "true" { // This is genuinely a string, not a bool
Contributor

Curious as to why we need these gymnastics. Don't we have the guarantee that the first volume is always the boot volume?

Contributor

I need to do more testing to confirm that listing volumes provides output ordered by creation time. I'll look at that today because I agree that this is ugly 😄 .

Contributor

I tested it and indeed it seems that the volume used for boot is always first.

- diskSize: 1
nameSuffix: extravol
volumeType: ${OPENSTACK_VOLUME_TYPE_ALT}
availabilityZone: ${OPENSTACK_FAILURE_DOMAIN}
Contributor

Perhaps it would be good to drop the availabilityZone there, to replicate the test we do for rootVolume?
We're testing setting an availabilityZone for the worker.

Expect(bootVolume.VolumeType).To(Equal(volumeTypeAlt))

additionalVolume := additionalVolumes[machine.Name]
Expect(additionalVolume.AvailabilityZone).To(Equal(failureDomainAlt))
Contributor

If we want to ensure the additionalVolumes behave the same as rootVolume, we should probably replicate the test, i.e. expect the value to be failureDomain.

"block_device_mapping_v2": []map[string]interface{}{
{
"delete_on_termination": true,
"destination_type": "local",
Contributor

IIUC, we don't support local yet, so it's weird to have it there.

Contributor

it's the default block device mapping when no root volume is being used.

Author

If you want to specify a non-bootable block device, you also have to specify a bootable one. That means that we need to specify the "from image to local" block device.
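A sketch of the resulting server-create request body under that constraint: a bootable "image to local" mapping plus one data volume. This is illustrative, not taken from the PR; the placeholder IDs and the exact set of attributes are assumptions (Nova uses `boot_index: -1` for non-bootable devices).

```json
{
  "block_device_mapping_v2": [
    {
      "uuid": "<image_id>",
      "source_type": "image",
      "destination_type": "local",
      "boot_index": 0,
      "delete_on_termination": true
    },
    {
      "uuid": "<volume_id>",
      "source_type": "volume",
      "destination_type": "volume",
      "boot_index": -1,
      "delete_on_termination": true
    }
  ]
}
```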

might create the volume, but the instance create fails. This would leave us with a dangling
volume with no instance.
1. Create the volume
2. Create the instance with the created volume in iAdditionalBlockDevices with DeleteOnTermination=true
Contributor

vim user spotted.

Suggested change
2. Create the instance with the created volume in iAdditionalBlockDevices with DeleteOnTermination=true
2. Create the instance with the created volume in AdditionalBlockDevices with DeleteOnTermination=true

Contributor

Feels like we should mention that the root volume is added to AdditionalBlockDevices as it's not completely obvious, no?

Contributor

Good point. I'm updating the doc.

BootIndex: 0,
DeleteOnTermination: true,
})
} else {
Contributor

Just to make sure I understand the reason to set an ephemeral drive from the image, is that because if we don't set it and we have additional block devices, then we don't know where to boot the instance from?

Contributor

Yes, exactly. We need ephemeral storage to be created using the imageID and to be bootable (boot index 0).

Author

Basically, yeah. OpenStack won't let you set non-bootable block devices without setting a bootable one

return nil, err
}
if existingVolume != nil {
if existingVolume.Size != opts.Size {
Contributor

Don't we need to also check the availability zone and the type to ensure the existing volume conforms to the spec?
Alternatively, it might be simpler to error out when there is an existing volume with the name we want to use.

Contributor

What if we do this:

  • When we create a volume, we put the Kubernetes UUIDs of the OpenStackCluster and the OpenStackMachine in the volume metadata
    • openstack.infrastructure.cluster.x-k8s.io/cluster-uuid
    • openstack.infrastructure.cluster.x-k8s.io/machine-uuid
  • When we search for a volume by name and find one, we then check if the UUIDs in the metadata match the cluster/machine being reconciled
  • If they do, we own the volume and that is the volume we return
  • If they don't, we don't own the volume and we error out

If that's a pattern people agree on, I'll be happy to do this via a separate PR if possible, for the sake of not adding too many design changes into one PR.
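A minimal sketch of that ownership check, assuming the metadata keys proposed above. This is not the merged implementation; the function name and shape are hypothetical, and the real code would sit in the volume reconcile path after looking the volume up by name.

```go
package main

import "fmt"

// Proposed ownership metadata keys from the discussion above (assumed names).
const (
	clusterUUIDKey = "openstack.infrastructure.cluster.x-k8s.io/cluster-uuid"
	machineUUIDKey = "openstack.infrastructure.cluster.x-k8s.io/machine-uuid"
)

// ownsVolume reports whether a volume's metadata marks it as owned by the
// given cluster and machine, per the scheme sketched in this thread.
func ownsVolume(metadata map[string]string, clusterUUID, machineUUID string) bool {
	return metadata[clusterUUIDKey] == clusterUUID && metadata[machineUUIDKey] == machineUUID
}

func main() {
	meta := map[string]string{clusterUUIDKey: "c1", machineUUIDKey: "m1"}
	fmt.Println(ownsVolume(meta, "c1", "m1")) // our volume: adopt and reuse it
	fmt.Println(ownsVolume(meta, "c2", "m1")) // someone else's volume: error out
}
```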

Author

@mdbooth and I also discussed doing this for Neutron resources using tags

Contributor

@mandre mandre Sep 26, 2023

That looks reasonable. I would just change the logic to search for volume by name and by metadata in a single query if that is possible. If it returns a result, that's the one we own and we can use it, otherwise, we know we need to create a new one.

Author

Not sure the single query is possible unfortunately, although I might have missed something. I would be surprised if the metadata items were indexed for searching.

Contributor

We came to the agreement that we (I) will take care of it in a separate PR.

Contributor

If we ever cut a release containing this feature which does not add the metadata described by @EmilienM then we have an upgrade issue here which would be a whole lot more complicated. If we don't think we can get it into this PR, let's not wait around to implement it.

Contributor

Like I said on another comment, I'm working on it and this will be done before the next release is cut.

@EmilienM EmilienM force-pushed the feature/data-volumes branch from 9fca026 to de72667 on September 26, 2023 18:09
@EmilienM
Contributor

EmilienM commented Sep 26, 2023

/hold cancel

This is ready for review.
There is a list of things we want to do next, but we believe these should be done via a separate PR:

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 26, 2023
Contributor

@mdbooth mdbooth left a comment

This is really close, thanks!

Can you please also consider how soon we can add the volume metadata support? Ideally this will be within the next couple of months to avoid an upgrade issue.

Expect(volumes).To(HaveLen(2))

bootVolume, err := shared.GetOpenStackVolume(e2eCtx, volumes[0].ID)
rootVolume, err := shared.GetOpenStackVolume(e2eCtx, volumes[0].ID)
Contributor

Is the order of AttachedVolumes stable? I would guess strictly not, although possibly stable enough in practice for this test?

Contributor

It's apparently stable. The volume attachments listed for a server seem to be ordered by creation time, and the rootVolume is always created first. I was surprised myself too but I've tested it a few times and it was always the case 🤷 .

I'll try to find the code in Nova to confirm what I saw.

Contributor

The API output is generated here. The bdms parameter is populated here. I followed that call to nova.objects.BlockDeviceMappingList.bdms_by_instance_uuid all the way down to the DB layer and didn't spot any sorting being done. This leads me to believe we're probably relying on the order of the rows being returned by the DB. As you likely know, MySQL doesn't guarantee sorting without a specific ORDER BY so that doesn't seem wise since it won't be consistent.

You should double check this (!) but assuming this is correct, you probably want to explore device tagging. In general, nova doesn't guarantee anything wrt device ordering and asks that you use device tagging if you need this. I don't know if gophercloud supports this, but you simply want to add the tags field to the BDM for the root volume, e.g.

{
  "block_device_mapping_v2": [
    {
      "uuid": "<image_id>",
      "boot_index": 0,
      "source_type": "image",
      "destination_type": "volume",
      "volume_size": "<volume_size>",
      "tags": ["foo", "bar"]
    }
  ]
}

(note: this requires API microversion 2.42).

You can then use the volume attachments API (GET /servers/{serverID}/os-volume-attachments) to list the attached volumes with tags, since this info is not returned in the os-extended-volumes:volumes_attached field in the GET /servers/{serverID} response.

Contributor

right, that was my thought as well. I'll investigate it.

Contributor

For now, I've addressed this comment with:

// nova.objects.BlockDeviceMappingList.bdms_by_instance_uuid does not guarantee order of the volumes
// so we need to find the boot volume by checking the "bootable" flag for now.
firstVolumeFound, err := shared.GetOpenStackVolume(e2eCtx, volumes[0].ID)
Expect(err).NotTo(HaveOccurred(), "failed to get OpenStack volume %s for machine %s", volumes[0].ID, machine.Name)
secondVolumeFound, err := shared.GetOpenStackVolume(e2eCtx, volumes[1].ID)
Expect(err).NotTo(HaveOccurred(), "failed to get OpenStack volume %s for machine %s", volumes[1].ID, machine.Name)
rootVolume := firstVolumeFound
additionalVolume := secondVolumeFound
// The boot volume is the one with the "bootable" flag set.
if firstVolumeFound.Bootable != "true" { // This is genuinely a string, not a bool
	rootVolume = secondVolumeFound
	additionalVolume = firstVolumeFound
}

Until we find a better solution. With device tagging that I plan to work on next, we'll probably make this better.

@EmilienM
Contributor

EmilienM commented Sep 27, 2023

This is really close, thanks!

Can you please also consider how soon we can add the volume metadata support? Ideally this will be within the next couple of months to avoid an upgrade issue.

I'm working on it now, I think the PR will be pushed before Tuesday next week.

Contributor

@lentzi90 lentzi90 left a comment

Good job with this!
I agree with the decision to implement the AZ in the same way as for the root volume. It will be easier to understand it this way and then break both at the same time.

Could you please add some doc strings for at least the more complicated functions? I commented below on those I think are most important.

@lentzi90
Contributor

/test pull-cluster-api-provider-openstack-e2e-full-test

@EmilienM EmilienM force-pushed the feature/data-volumes branch from b76c92c to 26e9191 on September 29, 2023 15:35
@EmilienM
Contributor

@lentzi90 comments addressed. Thanks for the review!

Contributor

@mdbooth mdbooth left a comment

Please could you respond to the 3 API requests:

  • Rename nameSuffix to name
  • Remove tag in favour of just using name
  • Addition of strategic merge patch annotation to additionalBlockDevices

There's also an outstanding request from @lentzi90 for a missing docstring.

Apart from that, this looks good and has good test coverage. I don't think there's anything complicated in the above requests.

type AdditionalBlockDevice struct {
// NameSuffix is the suffix to be added to volume names.
// It will be combined with the machine name to make the volume name.
NameSuffix string `json:"nameSuffix"`
Contributor


Could we call this name instead? The docstring can explain that the volume name will be the machine name with this appended.

Contributor

@EmilienM EmilienM Oct 3, 2023


I don't understand why you want to name it Name. This is not a name, it's a name suffix.
Calling it Name would mislead the API user into thinking their block device will be named exactly this, which wouldn't be the case.

The actual block device name is chosen from within the code:

volumeName(instanceSpec.Name, blockDevice.NameSuffix)

Contributor


See below. It would be the name of the volume in the context of this machine. It would also be the name of the volume in the additionalBlockDevices list.

Contributor


fair enough, I'll do it.

// Tag is a tag to add to the volume attachment.
// This is then available in the instance metadata, allowing devices to be
// identified by tag inside the server.
Tag string `json:"tag,omitempty"`
Contributor


Is there any reason to have a separate tag element instead of just using the name?

@mdbooth
Contributor

mdbooth commented Oct 3, 2023

To explain the above API requests, consider the case where a user wants to specify an additional volume to hold etcd.

Before:

additionalBlockDevices:
- nameSuffix: etcd
  size: 10
  tag: etcd

After:

additionalBlockDevices:
- name: etcd
  size: 10

I prefer the latter.

@EmilienM EmilienM force-pushed the feature/data-volumes branch from 26e9191 to 61afcac Compare October 3, 2023 14:15
@EmilienM
Contributor

EmilienM commented Oct 3, 2023

@mdbooth @lentzi90 I've applied all of your suggestions, thanks for the feedback. I think this time it's ready to go!

This adds the ability for a machine to have additional
block devices attached.

Here is an example of how to use `additionalBlockDevices`
to attach additional Cinder volumes to the server instance:

```yaml
additionalBlockDevices:
- nameSuffix: dbvol1
  size: 50
  volumeType: my-volume-type
  availabilityZone: az0
  tag: database
```

Co-authored-by: Matt Pryor <matt@stackhpc.com>
Co-authored-by: Emilien Macchi <emilien@redhat.com>
@EmilienM EmilienM force-pushed the feature/data-volumes branch from 61afcac to 94d9690 Compare October 3, 2023 14:46
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mdbooth, mkjpryor

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 3, 2023
@mdbooth
Contributor

mdbooth commented Oct 3, 2023

Pending tests, of course.

@lentzi90 Can you lgtm if you're happy with the API changes?

@lentzi90
Contributor

lentzi90 commented Oct 4, 2023

/test pull-cluster-api-provider-openstack-e2e-full-test

Contributor

@lentzi90 lentzi90 left a comment


/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 4, 2023
@k8s-ci-robot k8s-ci-robot merged commit ade1ed0 into kubernetes-sigs:main Oct 4, 2023
@EmilienM EmilienM deleted the feature/data-volumes branch October 4, 2023 11:58

Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
lgtm: "Looks good to me", indicates that a PR is ready to be merged.
ok-to-test: Indicates a non-member PR verified by an org member that is safe to test.
size/XL: Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ability to mount additional volumes

8 participants