
Adds capability to provision directories on the EFS dynamically #732

Closed
jonathanrainer wants to merge 8 commits into kubernetes-sigs:master from jonathanrainer:directory-provisioning

Conversation

@jonathanrainer
Contributor

This PR was the result of a hackathon at my current company, plus some work outside it to shore up the unit tests and E2E-test the final product.

Is this a bug fix or adding new feature?
New feature, requested in #538 and #517 and possibly other issues, this comes up a lot as a feature people would like.

What is this PR about? / Why do we need it?
Since its creation the driver has supported access point provisioning as its main means of dynamic provisioning. However, this is problematic for a few reasons: it can cause issues with user permissions around deleting provisioned directories, and there is a hard limit of 120 access points per EFS file system, which for some use cases is very quickly exhausted. Further, it requires various AWS IAM permissions that can be complicated to sort out and manage.

This PR adds a new provisioning mode called efs-dir so that, instead of creating EFS access points, directories are created. This new provisioning mode supports all of the previous parameters (UID, GID, GID ranges, etc.) and so is very easy to transition to, or to run alongside the access-point provisioning style if preferred. At present the directories are named after the PersistentVolume that is created, to ensure a unique directory name and avoid security problems, but once #640 merges there's no reason why the same method wouldn't apply to this provisioning mode as well with a little extra work. Either this PR can be updated, or #640 can be extended if this merges first.
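To make the mode concrete, a StorageClass using it might look roughly like the following. This is an illustrative sketch: provisioningMode: efs-dir is the new mode named in this PR, while the other parameter names (fileSystemId, basePath, gidRangeStart, gidRangeEnd) are assumed by analogy with the driver's existing access-point documentation and may differ in the actual implementation.

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-dir-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-dir       # new directory-provisioning mode from this PR
  fileSystemId: fs-12345678       # target EFS file system (example ID)
  basePath: "/dynamic_provisioning"  # assumed: parent directory for provisioned dirs
  gidRangeStart: "1000"           # assumed: same GID-range parameters as efs-ap mode
  gidRangeEnd: "2000"
reclaimPolicy: Delete
```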

This is achieved by creating a new interface called Provisioner that is implemented by an AccessPointProvisioner (the original method) and a DirectoryProvisioner (the new method). This also allows different kinds of provisioning to be added in future, for example a FileSystemProvisioner.
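The Provisioner abstraction described above can be sketched roughly as follows. This is an illustrative guess at the shape of the interface, not the PR's actual code: the method signatures, struct fields, and the `<fs-id>:<path>` volume-handle format are assumptions made for the sketch.

```go
// Hypothetical sketch of the Provisioner interface from this PR's description.
// Names and signatures are assumptions, not the real driver code.
package main

import (
	"fmt"
	"path"
)

// Provisioner abstracts how a volume is backed on EFS: by an
// access point or by a plain directory.
type Provisioner interface {
	Provision(pvName string) (volumeHandle string, err error)
	Delete(volumeHandle string) error
}

// DirectoryProvisioner creates one directory per PersistentVolume
// under a configurable base path (fields are hypothetical).
type DirectoryProvisioner struct {
	FileSystemID string
	BasePath     string
	DeleteDir    bool // mirrors the deleteProvisionedDir flag
}

func (d DirectoryProvisioner) Provision(pvName string) (string, error) {
	// Name the directory after the PV so it is unique, as the PR does.
	dir := path.Join(d.BasePath, pvName)
	// Real code would create the directory on the mounted file system;
	// here we only compute an assumed "<fs-id>:<path>" volume handle.
	return fmt.Sprintf("%s:%s", d.FileSystemID, dir), nil
}

func (d DirectoryProvisioner) Delete(volumeHandle string) error {
	if !d.DeleteDir {
		return nil // leave the directory in place on reclaim
	}
	// Real code would remove the directory tree here.
	return nil
}

func main() {
	var p Provisioner = DirectoryProvisioner{
		FileSystemID: "fs-12345678",
		BasePath:     "/dyn",
	}
	handle, _ := p.Provision("pvc-0000-1111")
	fmt.Println(handle) // prints "fs-12345678:/dyn/pvc-0000-1111"
}
```

Because both provisioners satisfy the same interface, the controller can pick one per StorageClass at request time, which is what makes running both modes side by side straightforward.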

What testing is done?
I've moved some of the unit tests that used to relate to controller.go into new files that reference each of the provisioners. These all run 🟢 and I'm fairly happy I've covered most of the important cases. Overall I'm not 100% sure about the placement of the tests and the coverage, but they definitely cover all the new code as well as the pre-existing code, and replicate existing behaviour.

I've also run this on my own EKS cluster with the deleteProvisionedDir flag on and off, and it performs exactly as I anticipated: directories get provisioned, and when the PVC gets deleted they are either deleted or left in place. I've also written E2E tests to run in CI for creation and deletion, but I'll need the PR to be approved before I can road-test those.

fixes #538

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 1, 2022
@k8s-ci-robot
Contributor

Hi @jonathanrainer. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot requested a review from leakingtapan July 1, 2022 09:58
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jul 1, 2022
@k8s-ci-robot k8s-ci-robot requested a review from wongma7 July 1, 2022 09:58
@jonathanrainer
Contributor Author

@Ashley-wenyizha Hi, could I get an ok-to-test on here so I can start sorting out the E2E tests?

Comment thread pkg/driver/directory_provisioner.go Fixed
Comment thread pkg/driver/fs_identifier_manager.go Fixed
Comment thread pkg/driver/fs_identifier_manager.go Fixed
Comment thread pkg/driver/fs_identifier_manager.go Fixed
Comment thread pkg/driver/os_client.go Fixed
Comment thread pkg/driver/os_client.go Fixed
@thesuperzapper

@Ashley-wenyizha @wongma7 @jsafrane I truly believe a feature like this PR is critical for the EFS CSI driver to remain useful, and I would greatly appreciate your thoughts on this PR.

With the current approach, people will often get themselves into a situation where they are suddenly unable to provision new PVCs (once they hit 120 EFS "Access Points" in an AWS Account) and then have no idea what is wrong, let alone be able to fix it without moving to another CSI driver.

The kubernetes-csi/csi-driver-nfs driver already uses a "sub-folder" approach that works on arbitrary NFS servers (including EFS), so if the EFS driver is unwilling to implement this feature, we should consider deprecating this EFS driver and point people towards the official NFS one, in the interest of not leaving users with a ticking timebomb that breaks their cluster once 120 PVCs exist.

Comment thread pkg/driver/directory_provisioner.go Outdated
Comment thread pkg/driver/directory_provisioner.go Outdated
Comment thread pkg/driver/directory_provisioner.go Outdated
@jsafrane
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 14, 2022
@andre-lx

andre-lx commented Aug 5, 2022

Hi @jonathanrainer,

Great work.

We have been using #640 for a while, but the 120-access-point limit is a real blocker for us, so this looks like a better solution for our problem in the near future.

We only have two problems. One of them, as you said in the description, is solved by #640's subPathPattern and ensureUniqueDirectory, since we want to write to the root folder (and folder paths inside it) and do not want one folder created per PVC. As an example:

  • we have controllers writing to the folder A using the subPath on the deployment
  • and we have some pods, writing to the folder A/xxx-xxx also using the subPath on the deployment

Using #640 we are able to do this.

The other problem we see is that, since you are not using access points, all the files on the EFS are written as the root user.

Why not use a single access point per storage class to do all the work?

Thanks for the hard work, and I hope these two PRs are merged soon.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 12, 2022
@oe-hbk

oe-hbk commented Aug 23, 2022

@jonathanrainer Thanks for this PR. It solves a big problem for us, since we can't use access points because we don't use Amazon DNS. I took this and merged it with #687 (fixing some merge conflicts along the way), and they work great together. One thing, which may or may not be related to the combined PR on my end: for directory provisioning, when the SC has reclaimPolicy: Delete, the created directory does not get deleted. If I set basePath: /foo/bar, then when the PVC and PV are deleted the /foo/bar/pvc-unique-id directory still remains.

@FleetAdmiralButter

Thanks for this @jonathanrainer! I've done a build of this and deployed to our cluster where I've force-updated the EFS storage class to use directory provisioning. Existing PVs all still mount correctly via access points and new PVs are being provisioned correctly using directories.

@ChamMach

Hello, any news on this? I think it's critical to merge it for this driver's future.

@jsafrane
Contributor

@jonathanrainer can you please rebase the PR?
@wongma7 @torredil can you please review it? IMO it's quite solid.

@jonathanrainer
Contributor Author

@jsafrane Sorry, I've kind of let this one get away from me a bit; I've just changed jobs but should be able to dedicate some time to this later in the week. I think it needs a rebase and the E2Es need looking at, but it's nearly there.

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 30, 2022
@jonathanrainer jonathanrainer force-pushed the directory-provisioning branch 3 times, most recently from 8126091 to 26f4763 Compare October 1, 2022 12:25
@jsafrane
Contributor

@jonathanrainer I am sorry this PR has taken more than one year; can you please rebase again?

@wolffberg

Hi @jonathanrainer. Any chance you will have time for the above in the near future? Your work is much appreciated ⭐

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 30, 2024
@wolffberg

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 14, 2024
@jsafrane
Contributor

@jonathanrainer would you be able to rebase this PR?

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 12, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 11, 2024
@wolffberg

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Sep 11, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 10, 2024
@z0rc
Contributor

z0rc commented Dec 10, 2024

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 10, 2024

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 10, 2025
@z0rc
Contributor

z0rc commented Mar 11, 2025

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 11, 2025

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 9, 2025

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 9, 2025
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closed this PR.

Details

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


Labels

  • approved — Indicates a PR has been approved by an approver from all required OWNERS files.
  • cncf-cla: yes — Indicates the PR's author has signed the CNCF CLA.
  • lifecycle/rotten — Denotes an issue or PR that has aged beyond stale and will be auto-closed.
  • needs-rebase — Indicates a PR cannot be merged because it has merge conflicts with HEAD.
  • ok-to-test — Indicates a non-member PR verified by an org member that is safe to test.
  • size/XXL — Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Disable Access Point Usage