
Raw block volume mode support #249

Merged

Conversation

nicktate
Contributor

@nicktate nicktate commented Jan 3, 2020

Description

What does this pull request accomplish?

  • Adds VolumeMode: Block support
  • Parallelizes test/integration_test.go
  • Adds Block volume tests to integration tests

Additional Info

Here is a high level summary of the changes to the driver:

  • controller.go/validateCapabilities - Updated to accept the Block access mode in the capabilities check
  • node.go/NodeStageVolume - Updated to be a no-op for block volumes, because we bind mount the absolute device path directly
  • node.go/NodePublishVolume - Checks the AccessMode and calls nodePublishVolumeForFileSystem or nodePublishVolumeForBlock accordingly
    • The main difference is that the source path for nodePublishVolumeForBlock is the absolute path of the device on disk
  • mounter.go/Mount - Updated to handle the empty fsType that is passed in the Block case
  • node.go/NodeGetVolumeStats - Checks whether the volume is a block device; if so, it uses blockdev to determine the total capacity of the device (no other information can be collected)
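The publish branching above can be sketched roughly as follows (a minimal illustration; `publishSource` and the constants are hypothetical names, not the driver's actual code):

```go
package main

import "fmt"

// AccessType mirrors the CSI volume capability access type that
// NodePublishVolume inspects (illustrative, not the driver's actual types).
type AccessType int

const (
	AccessTypeMount AccessType = iota // filesystem volume
	AccessTypeBlock                   // raw block volume
)

// publishSource returns the bind-mount source and fsType for a publish call:
// block volumes bind mount the absolute device path with an empty fsType,
// while filesystem volumes use the staging path with the requested fsType.
func publishSource(access AccessType, devicePath, stagingPath, fsType string) (string, string) {
	if access == AccessTypeBlock {
		return devicePath, "" // bind mount the device node directly
	}
	return stagingPath, fsType
}

func main() {
	src, typ := publishSource(AccessTypeBlock, "/dev/disk/by-id/example-volume", "/var/lib/kubelet/staging", "ext4")
	fmt.Printf("%s %q\n", src, typ)
}
```

The key point is that the block path bypasses staging entirely, which is why NodeStageVolume becomes a no-op.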

Here is a high level summary of the changes to the integration tests:

  • Supports running tests in parallel
  • Generates resource names based on the test name
  • Adds table-driven tests, one per volume mode: Filesystem and Block
  • Cleans up all resources on test exit
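Deriving per-test resource names might look like this (`resourceName` and the `csi-test-` prefix are hypothetical, not the repository's actual test code):

```go
package main

import (
	"fmt"
	"strings"
)

// resourceName derives a Kubernetes-safe resource name from a Go test name so
// that parallel tests do not collide on shared cluster resources.
func resourceName(testName string) string {
	name := strings.ToLower(testName)
	// t.Name() uses "/" for subtests; "/" and "_" are invalid in object names.
	name = strings.NewReplacer("/", "-", "_", "-").Replace(name)
	return "csi-test-" + name
}

func main() {
	fmt.Println(resourceName("TestPod/Block")) // csi-test-testpod-block
}
```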

Test Results

Fixes #192

@nicktate nicktate requested review from adamwg and timoreimann January 3, 2020 20:37
@nicktate nicktate force-pushed the ntate/feature/raw-block-support branch from b25af5a to f517dc1 Compare January 3, 2020 20:40
Contributor

@timoreimann timoreimann left a comment

Looking good overall; left a few suggestions and questions.

Could I ask you to also amend the change log and extend the feature list in our README?

@@ -147,7 +148,31 @@ func (m *mounter) Mount(source, target, fsType string, opts ...string) error {
return errors.New("target is not specified for mounting the volume")
}

mountArgs = append(mountArgs, "-t", fsType)
// This is a raw block device mount. Create the mount point as a file
// since bind mount device node requires it to be a file
Contributor

@timoreimann timoreimann Jan 4, 2020

Interesting, I didn't know that you have to use a file target. Curious, how/where did you find out about this?

Contributor Author

Trial and error, and then I found the corresponding man page (https://linux.die.net/man/8/mount), whose information about it is extremely vague:

mount --bind olddir newdir

or shortoption
mount -B olddir newdir

or fstab entry is:
/olddir /newdir none bind

After this call the same contents is accessible in two places. One can also remount a single file (on a single file).

Specifically:

One can also remount a single file (on a single file).

Given that the absolute device path is a block device file, we correspondingly have to bind mount that onto another file.
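The file-on-file behavior can be demonstrated with two regular files (requires root; the `/tmp` paths are placeholders) — a block device node works the same way, which is why the driver creates a file target before bind mounting:

```shell
echo "hello" > /tmp/bind-src
touch /tmp/bind-dst                       # the target must exist and be a file
mount --bind /tmp/bind-src /tmp/bind-dst  # "remount a single file (on a single file)"
cat /tmp/bind-dst                         # shows the source's contents
umount /tmp/bind-dst
```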

Contributor

👍 Thanks (and good job discovering this)

@timoreimann
Contributor

timoreimann commented Jan 4, 2020

I forgot to mention that I haven't reviewed the extended integration tests yet; will do that in a second round.

@nicktate btw thanks for the excellent PR description -- appreciated the quick overview I was able to get thanks to that. 👏

@timoreimann
Contributor

timoreimann commented Jan 4, 2020

Noticed another thing... the spec describes how NodeExpandVolume should handle block mode:

  // Volume capability describing how the CO intends to use this volume.
  // This allows SP to determine if volume is being used as a block
  // device or mounted file system. For example - if volume is being
  // used as a block device the SP MAY choose to skip expanding the
  // filesystem in NodeExpandVolume implementation but still perform
  // rest of the housekeeping needed for expanding the volume. If
  // volume_capability is omitted the SP MAY determine
  // access_type from given volume_path for the volume and perform
  // node expansion. This is an OPTIONAL field.
  VolumeCapability volume_capability = 5;

This raises two questions for me:

  1. Should we skip expanding in case of block mode access?
  2. (If yes to 1:) Should we implement access type detection if volume_capability is omitted?


tt := []struct {
pod func() *v1.Pod
pvc func() *v1.PersistentVolumeClaim
Contributor

I remember we discussed that the integration tests will only be useful for testing Kubernetes < 1.14 going forward, because newer releases support running the upstream end-to-end tests (which ship with an entire suite of access mode-related tests). What we likely want at some point is to introduce a flag to run all integration tests for older releases (1.13 while still supported by us, potentially even older ones) and none for newer releases.

I understand the function returning a PVC is to support testing filesystem and block mode, respectively, here and for most (all?) tests we have built so far. Unfortunately, this would make it difficult to run the tests on 1.13 or below, because block mode only turned beta in Kubernetes 1.14; the block mode-related tests would all fail on older releases.

Thinking about how we can resolve this, I can see two possibilities:

  1. Test the access mode in a single, dedicated test only, and revert the pre-existing tests to their prior state. That way, we can easily turn tests on and off as needed per tested Kubernetes release.
  2. Do not test the access mode in our integration tests at all, but rely on the upstream tests only. Merging the feature without any tests is not ideal, however, so for this scenario we'd probably want to wait for #248 (add support for running upstream storage end-to-end tests) first so that we can enable the block testing capability in the upstream tests.

The first option allows us to move forward without depending on #248, at the price of a bit more tech debt (as we will eventually be able to remove our access mode integration tests entirely in favor of the upstream ones; at least, I assume so). #248 seems close to completion, but you never know, so option 1 seems slightly preferable.

Let me know what you think.

Contributor Author

Good catch on the version support 👍

I think #248 is definitely the direction we will move towards, but I think option 1, or a variation of it, would be better in the short term to reduce the risk of this PR getting blocked.

I'm fine with either creating a separate test as you suggested in option 1, or simply determining the Kubernetes version of the cluster under test during setup and then having a minimum-version gate in the table-driven test configuration. Either way, I think it would be smart to create a version gate for the block tests.

I'm personally leaning towards adding the gate on top of the updated table-driven tests, for the enhanced test cleanup functionality and naming. If they were staying long term, I think that would be the right decision; but since these tests also have a relatively short EOL, I would be fine with reverting them to their prior state as well.
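A minimum-version gate of the kind discussed here could be sketched as follows (hypothetical names; the repository's actual implementation may differ — block volume mode turned beta in Kubernetes 1.14):

```go
package main

import "fmt"

// kubeVersion is a minimal (major, minor) pair for gating test cases on the
// cluster version discovered during test setup.
type kubeVersion struct {
	Major, Minor int
}

// atLeast reports whether v satisfies the given minimum version.
func (v kubeVersion) atLeast(min kubeVersion) bool {
	if v.Major != min.Major {
		return v.Major > min.Major
	}
	return v.Minor >= min.Minor
}

func main() {
	blockMin := kubeVersion{Major: 1, Minor: 14}
	fmt.Println(kubeVersion{1, 13}.atLeast(blockMin)) // gate: skip block tests
	fmt.Println(kubeVersion{1, 16}.atLeast(blockMin)) // gate: run block tests
}
```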

Contributor

Gating on top of your improvements sounds fine for me as well. 👍

@timoreimann
Contributor

Left a general comment/question regarding the direction of our integration tests, which I think we should answer first before going into details.

Contributor

@adamwg adamwg left a comment

This lgtm on a quick read, modulo Timo's notes.

Regarding the resize case, my opinion is that we should not resize filesystems for raw block devices. We didn't create the filesystem, so it feels wrong to touch it later; if a user is using raw block, they're signing up to take care of such things themselves. Interested in others' thoughts on this.

@nicktate
Contributor Author

nicktate commented Jan 6, 2020

@adamwg / @timoreimann I agree with you both on NodeExpandVolume (I lean towards not automatically expanding in the case of Block access mode).

https://github.com/kubernetes/kubernetes/blob/4c50ee993c82c6852eb3b3aa8dfa8ecc4bcfe330/pkg/util/resizefs/resizefs_linux.go#L49 skips resizing the device if it is not formatted. I hadn't considered the case of the user requesting block mode and formatting the volume after the fact.

I will no-op if Block is in the volume capability, but if it is omitted, I think it is fine to attempt the resize since the unformatted case is handled internally.
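That decision can be sketched like so (`shouldResizeFilesystem` is a hypothetical name, not the driver's actual code):

```go
package main

import "fmt"

// shouldResizeFilesystem sketches the NodeExpandVolume decision discussed in
// this thread: skip the filesystem resize when the volume capability reports
// block access; attempt it when the capability is omitted, since the resizer
// itself skips unformatted devices.
func shouldResizeFilesystem(capabilityPresent, isBlock bool) bool {
	if capabilityPresent && isBlock {
		return false // block mode: housekeeping only, no filesystem resize
	}
	return true // capability omitted or mount access: attempt the resize
}

func main() {
	fmt.Println(shouldResizeFilesystem(true, true))   // block capability
	fmt.Println(shouldResizeFilesystem(false, false)) // capability omitted
}
```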

@nicktate nicktate force-pushed the ntate/feature/raw-block-support branch from b307610 to f102259 Compare January 6, 2020 20:48
@timoreimann
Contributor

@nicktate

I will no-op if Block is in the volume capability, but if it is omitted, I think it is fine to attempt the resize since the unformatted case is handled internally.

That sounds good to me. 👍

FWIW, the volumeMode field is immutable, so users wouldn't be able to change the mode through the Kubernetes API. They'd have to access the volume directly, at which point all bets are off.

@nicktate nicktate force-pushed the ntate/feature/raw-block-support branch from f102259 to 553c03e Compare January 6, 2020 23:51
@nicktate
Contributor Author

nicktate commented Jan 6, 2020

@timoreimann Most comments have been addressed, with the exception of the test updates and README/changelog. I will follow up once I have re-run the integration / e2e tests.

@nicktate nicktate force-pushed the ntate/feature/raw-block-support branch from be86518 to 2e40914 Compare January 7, 2020 15:22
@nicktate
Contributor Author

nicktate commented Jan 7, 2020

@timoreimann / @adamwg This should be good for a final review. I believe all comments have been addressed, and I've re-run the e2e and upstream tests and posted the new results in the description.

I do have some interesting findings regarding expand volume. I updated the logic per the spec: we check the volume capability if it exists and, in the case of a block volume, skip the filesystem resize. In effect, if a user formatted their raw block volume and then expanded the persistent volume, we would also attempt to expand the filesystem, because we cannot discern that it is block access mode.

In testing, the VolumeCapability argument is always nil; after digging into the source, I found that it is not passed in the corresponding calls:

I posed the question in the community CSI channel, so you can follow any updates here: https://kubernetes.slack.com/archives/C8EJ01Z46/p1578414177000600.

@nicktate
Contributor Author

nicktate commented Jan 7, 2020

I've also tested against #209 and it seems to be working as expected:

╰─ k get po
NAME                                                            READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-88krf                                          3/3     Running     0          4m35s
csi-cephfsplugin-lbjd8                                          3/3     Running     0          4m35s
csi-cephfsplugin-provisioner-86d9db5b7b-l52pg                   4/4     Running     0          4m35s
csi-cephfsplugin-provisioner-86d9db5b7b-tnk8h                   4/4     Running     0          4m35s
csi-cephfsplugin-wlmb7                                          3/3     Running     0          4m35s
csi-rbdplugin-6gzfg                                             3/3     Running     0          4m36s
csi-rbdplugin-cg2zj                                             3/3     Running     0          4m36s
csi-rbdplugin-gqkb6                                             3/3     Running     0          4m36s
csi-rbdplugin-provisioner-76bb56764b-8cx4t                      5/5     Running     0          4m36s
csi-rbdplugin-provisioner-76bb56764b-9crmp                      5/5     Running     0          4m36s
rook-ceph-crashcollector-pool-3yiqj6yja-kxkn-84bc647d4b-btq8j   1/1     Running     0          2m55s
rook-ceph-crashcollector-pool-3yiqj6yja-kzep-6c7bcd8686-mmrfj   1/1     Running     0          90s
rook-ceph-crashcollector-pool-3yiqj6yja-kzes-56cbb64c4f-9dld9   1/1     Running     0          3m14s
rook-ceph-mgr-a-6c7497f4d6-k5j9f                                1/1     Running     0          2m32s
rook-ceph-mon-a-856565757b-sk4t2                                1/1     Running     0          3m34s
rook-ceph-mon-b-5bf967cfb5-gxhnz                                1/1     Running     0          3m14s
rook-ceph-mon-c-6cd5c4bbc4-q9cs2                                1/1     Running     0          2m55s
rook-ceph-operator-f978b8565-j45jq                              1/1     Running     0          5m33s
rook-ceph-osd-0-7bf8956b67-5tczx                                1/1     Running     0          90s
rook-ceph-osd-1-959f64d47-79mxw                                 1/1     Running     0          86s
rook-ceph-osd-2-6df95cd54-tglwn                                 1/1     Running     0          84s
rook-ceph-osd-prepare-set1-0-data-7jgmd-2cwkr                   0/1     Completed   0          2m11s
rook-ceph-osd-prepare-set1-1-data-lc9hv-2ngbw                   0/1     Completed   0          2m11s
rook-ceph-osd-prepare-set1-2-data-bvkk9-55mg6                   0/1     Completed   0          2m11s
rook-discover-h5t7z                                             1/1     Running     0          5m13s
rook-discover-lndkl                                             1/1     Running     0          5m13s
rook-discover-lqdcc                                             1/1     Running     0          5m13s

Contributor

@timoreimann timoreimann left a comment

👏 for validating the PR with Rook as well.

LGTM 👍 Can you use GitHub to squash-merge the change (or squash manually and force-push) so that we end up with a single clean commit in master?

Contributor

@adamwg adamwg left a comment

lgtm!

* Refactor integration tests for parallelization and raw block volume mode
* Add kubernetes version compatibility check for integration tests
* Add changelog and readme updates for raw block mode
@nicktate nicktate force-pushed the ntate/feature/raw-block-support branch from 2e40914 to 0e22e5c Compare January 7, 2020 20:47
@nicktate nicktate merged commit 8a59ae6 into digitalocean:master Jan 7, 2020
Closes: Support for Raw Block Volume mounts (#192)