
support quota set on overlay snapshot#5859

Closed
yylt wants to merge 1 commit into containerd:main from yylt:quota

Conversation

@yylt
Contributor

@yylt yylt commented Aug 11, 2021

Support setting quota on overlay snapshots

as discussed in issue #3329

Fixes #3329

Enable quota as follows:

  1. Make sure the root directory of containerd is mounted with 'pquota'.
  2. Set 'overlay' as the default snapshotter and enable quota in the 'overlay' config.
  3. Set the default quota size in CRI.

config.toml looks like this:

  [plugins.'io.containerd.cri.v1.runtime']
    enable_selinux = false
    default_snapshot_quota_size = '2M'   #+
...
  [plugins.'io.containerd.snapshotter.v1.overlayfs']
    root_path = ''
    upperdir_label = false
    sync_remove = false
    slow_chown = false
    mount_options = []
    enable_quota = true  #+

Check the containerd root mount info:

# /etc/fstab
...
/dev/mapper/containerd_root  /var/lib/containerd  xfs     defaults,pquota        0 0

# mount ...
/dev/mapper/containerd_root on /var/lib/containerd type xfs (rw,relatime,prjquota)

Check that the quota has been set for the container:

xfs_quota -x -c "report -h" /var/lib/containerd/

Signed-off-by: Yang Yang yang8518296@163.com

@k8s-ci-robot

Hi @yylt. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@theopenlab-ci

theopenlab-ci bot commented Aug 11, 2021

Build succeeded.

See the License for the specific language governing permissions and
limitations under the License.
*/

Member

This file seems copied from Moby. Please clarify the origin of the code and the license in code comments.

#define Q_XGETPQUOTA QCMD(Q_XGETQUOTA, PRJQUOTA)
#endif
*/
import "C"
Member

Can we avoid cgo

Contributor Author

Can we avoid cgo

Done!

@AkihiroSuda AkihiroSuda added the status/needs-discussion Needs discussion and decision from maintainers label Aug 11, 2021
@AkihiroSuda AkihiroSuda added this to the 1.6 milestone Aug 11, 2021
@dmcgowan
Member

Quota handling is a topic we should discuss at a high level first and possibly a good topic for a future community meeting. However, getting library support for setting project quotas across different backing FS makes sense to do now.

As an end feature, we need to discuss how it will be integrated with Kubernetes and how we can use it from our own tooling and go libraries. It seems backwards to have the cri layer here enable the quota then the snapshot set a static quota. I would think the quota would be set by the client adding a quota and the snapshotter enabling it via configuration and erroring out if the backing FS couldn't support it when enabled. The difficult part is figuring out from the client perspective whether quota is enabled so it knows whether it can avoid expensive ephemeral storage accounting (such as in the Kubelet).

NetNSMountsUnderStateDir bool `toml:"netns_mounts_under_state_dir" json:"netnsMountsUnderStateDir"`
// SupportSetQuota indicates whether to set quota on the container read-write snapshot; when the
// snapshotter does not support setting quota, it does nothing.
SupportSetQuota bool `toml:"support_set_quota" json:"supportSetQuota"`
@lining2020x lining2020x Nov 11, 2021

Maybe EnableRootfsQuota or EnableSetQuota sounds better.

@dmcgowan dmcgowan modified the milestones: 1.6, 1.7 Dec 9, 2021
@pacoxu
Contributor

pacoxu commented Jun 9, 2022

The difficult part is figuring out from the client perspective whether quota is enabled so it knows whether it can avoid expensive ephemeral storage accounting (such as in the Kubelet).

In Kubernetes, there is a feature gate and a check method SupportsQuotas, here is the KEP "Quotas for Ephemeral Storage" kubernetes/enhancements#1029 from Kubernetes.
https://github.com/kubernetes/kubernetes/blob/226323178e23b4c476001266beab1e2f116b3879/pkg/volume/util/fsquota/common/quota_linux_common_impl.go#L180-L190

// SupportsQuotas determines whether the filesystem supports quotas.
func SupportsQuotas(mountpoint string, qType QuotaType) (bool, error) {
	data, err := runXFSQuotaCommand(mountpoint, "state -p")
	if err != nil {
		return false, err
	}
	if qType == FSQuotaEnforcing {
		return strings.Contains(data, "Enforcement: ON"), nil
	}
	return strings.Contains(data, "Accounting: ON"), nil
}

@yylt
Contributor Author

yylt commented Jun 16, 2022

Sorry for replying after such a long time.

As an end feature, we need to discuss how it will be integrated with Kubernetes and how we can use it from our own tooling and go libraries. It seems backwards to have the cri layer here enable the quota then the snapshot set a static quota. I would think the quota would be set by the client adding a quota and the snapshotter enabling it via configuration and erroring out if the backing FS couldn't support it when enabled. The difficult part is figuring out from the client perspective whether quota is enabled so it knows whether it can avoid expensive ephemeral storage accounting (such as in the Kubelet).

That is a good suggestion. The quota setter now focuses on overlay, implemented in pure Go.

The snapshots package is also updated with a new option, WithQuotaSize. If the option is not defined, the default size from the overlayfs configuration is used.

such as:

  [plugins."io.containerd.snapshotter.v1.overlayfs"]
    root_path = ""
    quota_size = "128MB"

Contributor

@pacoxu pacoxu left a comment

I tried to build containerd to do some tests. Too many quotas are created in my env.

[root@daocloud ~]# xfs_quota -xc "report -a -p -h"  | head -n 100
Project quota on /var/lib/kubelet (/dev/sdc)
                        Blocks
Project ID   Used   Soft   Hard Warn/Grace
---------- ---------------------------------
#0           820K      0      0  00 [------]
volume1048590      0     8E     8E  00 [------]
volume1048605      0     8E     8E  00 [------]
volume1048607      0     8E     8E  00 [------]
volume1048610      0     8E     8E  00 [------]

Project quota on /var/lib/containerd (/dev/sdb)
                        Blocks
Project ID   Used   Soft   Hard Warn/Grace
---------- ---------------------------------
#0          10.2G      0      0  00 [------]
#2              0      0      0  00 [------]
#3              0    15G    15G  00 [------]
#4            12K    15G    15G  00 [------]
#5              0    15G    15G  00 [------]
#6              0    15G    15G  00 [------]
#7              0    15G    15G  00 [------]
#8              0    15G    15G  00 [------]
#9              0    15G    15G  00 [------]
#10             0    15G    15G  00 [------]
#11             0    15G    15G  00 [------]
#12             0    15G    15G  00 [------]
#13             0    15G    15G  00 [------]
#14             0    15G    15G  00 [------]
#15             0    15G    15G  00 [------]
#16             0    15G    15G  00 [------]
#17             0    15G    15G  00 [------]
#18             0    15G    15G  00 [------]
#19             0    15G    15G  00 [------]
#20             0    15G    15G  00 [------]
#21             0    15G    15G  00 [------]
#22             0    15G    15G  00 [------]
#23             0    15G    15G  00 [------]
#24            4K    15G    15G  00 [------]
#25             0    15G    15G  00 [------]
#26             0    15G    15G  00 [------]
#27             0    15G    15G  00 [------]
#28             0    15G    15G  00 [------]
#29            4K    15G    15G  00 [------]
#30             0    15G    15G  00 [------]
#31             0    15G    15G  00 [------]
#32             0    15G    15G  00 [------]
#33             0    15G    15G  00 [------]
#34             0    15G    15G  00 [------]
#35            4K    15G    15G  00 [------]
#36            4K    15G    15G  00 [------]
#37             0    15G    15G  00 [------]
#38          540K    15G    15G  00 [------]
#39             0    15G    15G  00 [------]
#40             0    15G    15G  00 [------]
#41             0    15G    15G  00 [------]
#42             0    15G    15G  00 [------]
#43             0    10G    10G  00 [------]
#44             0    10G    10G  00 [------]
#45             0    10G    10G  00 [------]
#46             0    10G    10G  00 [------]
#47             0    10G    10G  00 [------]
#48             0    10G    10G  00 [------]
#49             0    10G    10G  00 [------]
#50             0    10G    10G  00 [------]
#51             0    10G    10G  00 [------]
#52             0    10G    10G  00 [------]
#53             0    10G    10G  00 [------]
#54             0    10G    10G  00 [------]
#55             0    10G    10G  00 [------]
#56             0    10G    10G  00 [------]
#57             0    10G    10G  00 [------]
#58             0    10G    10G  00 [------]
#59             0    10G    10G  00 [------]
#60             0    10G    10G  00 [------]
#61             0    10G    10G  00 [------]
#62             0    10G    10G  00 [------]
#63             0    10G    10G  00 [------]

I set the size to 10G at first and 15G later.

[root@daocloud ~]# xfs_quota -xc "report -a -p -h"  | wc -l
1956507

oOpts = append(oOpts, overlay.WithUpperdirLabel)
}

ic.Meta.Exports["root"] = root
Contributor

dup with line 74.

Contributor Author

done

Comment on lines +121 to +125
fsMagic, err := overlayutils.GetFSMagic(root)
if err != nil {
return nil, err
}
if config.quotaSize > 0 {
Contributor

Suggested change
fsMagic, err := overlayutils.GetFSMagic(root)
if err != nil {
return nil, err
}
if config.quotaSize > 0 {
if config.quotaSize > 0 {
fsMagic, err := overlayutils.GetFSMagic(root)
if err != nil {
return nil, err
}

} else {
size = config.quotaSize
}
return quotaCtl.SetAllQuota(size, targets...)
Contributor

Setting the quota may need some logging, probably at debug level.

Contributor Author

Done

@yylt
Contributor Author

yylt commented Jun 27, 2022

I tried to build containerd to do some tests. Too many quotas are created in my env. […] I set the size to 10G at first and 15G later.

Yes, that is because quota is set on the committed layers too, which is not expected.

@pacoxu
Contributor

pacoxu commented Jul 6, 2022

I got some new errors with the latest code.

Jul 06 14:57:03 paco containerd[10220]: time="2022-07-06T14:57:03.299383030+08:00" level=info msg="loading plugin "io.containerd.snapshotter.v1.overlayfs"..." type=io.containerd.snapshotter.v1
Jul 06 14:57:03 paco containerd[10220]: time="2022-07-06T14:57:03.299968488+08:00" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.overlayfs" error="Failed to set quota limit for projid 1 on /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/backingFsBlockDev: function not implemented"
Jul 06 14:57:03 paco containerd[10220]: time="2022-07-06T14:57:03.300169800+08:00" level=warning msg="could not use snapshotter overlayfs in metadata plugin" error="Failed to set quota limit for projid 1 on /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/backingFsBlockDev: function not implemented"

And the quota is not set as expected.

@yylt yylt force-pushed the quota branch 2 times, most recently from f72d83c to ed16c4b Compare July 29, 2022 07:42
@yylt
Contributor Author

yylt commented Jul 29, 2022

I tried the command GOGC=off containerd -c myconfig.toml and it always succeeded, but with the code below it does not work:

	_, _, errno := syscall.Syscall6(syscall.SYS_QUOTACTL, uintptr(qcmd(Q_XSETQLIM, XFS_PROJ_QUOTA)),
		(*[2]uintptr)(unsafe.Pointer(&backingFsBlockDev))[0], uintptr(projectID),
		uintptr(unsafe.Pointer(&fd)), 0, 0)

Debugging with bpftrace, the syscall __sys_quotactl retval could be 0, -2, or -20:

bpftrace -e 'kretprobe:__x64_sys_quotactl {printf("%s, retval %d\n", kstack, retval)}'

Finally, converting the string to a uintptr as below works fine:

	devbyte := append([]byte(backingFsBlockDev), 0)

	_, _, errno := syscall.Syscall6(syscall.SYS_QUOTACTL, uintptr(qcmd(Q_XSETQLIM, XFS_PROJ_QUOTA)),
		uintptr(unsafe.Pointer(&devbyte[0])), uintptr(fd.id),
		uintptr(unsafe.Pointer(&fd)), 0, 0)

@lukasmrtvy

Any update?

Contributor Author

At present, there is no direct way to use quota on overlayfs; this change supports enabling quota for overlayfs only.

As mentioned at #5859 (comment), perhaps in the future configuring quotas through config.toml, the CRI API, or annotations will be added; this needs more discussion.

Therefore, I believe it is more appropriate to update the documentation once that configuration is supported.

Signed-off-by: Yang Yang <yang8518296@163.com>
Member

@mikebrow mikebrow left a comment

see comments

}
}

// WithLabels appends labels to a created snapshot
Member

Suggested change
// WithLabels appends labels to a created snapshot
// WithQuotaSize appends a label to this snapshot identifying its maximum quota size in bytes for the snapshot writable layer


// WithLabels appends labels to a created snapshot
func WithQuotaSize(bytesize uint64) Opt {
return func(info *Info) error {
Member

call WithLabels

return nil
}

// WithEnableQuota define the enable quota
Member

Suggested change
// WithEnableQuota define the enable quota
// WithEnableQuota set enable quota to true/on

}
}

func QuotaSize(labels map[string]string) string {
Member

Suggested change
func QuotaSize(labels map[string]string) string {
// QuotaSize returns a string comprising the maximum quota size in bytes as an unsigned integer or nil if unset
func QuotaSize(labels map[string]string) string {

Member

I don't think we should add more helpers here. This helper doesn't do much either.

// MountOptions are options used for the overlay mount (not used on bind mounts)
MountOptions []string `toml:"mount_options"`

// EnableQuota is define the quota on the snapshot writable layer
Member

Suggested change
// EnableQuota is define the quota on the snapshot writable layer
// EnableQuota turns on the quota max size in bytes feature for the snapshot writable layer

case fs.MagicXfs:
quotaCtl, err := quota.NewControl(root)
if err != nil {
log.L.WithError(err).Warnf("could not initinal quota control")
Member

Suggested change
log.L.WithError(err).Warnf("could not initinal quota control")
log.L.WithError(err).Warnf("could not initialize quota control")

log.L.WithError(err).Warnf("could not initinal quota control")
return nil, fmt.Errorf("directory '%s' does not support set quota, make sure mount with 'pquota'", root)
}
return func(targets []string, size uint64) error {
Member

Noting we switched to a string... but quota size is still used here as a uint, so there is no way to distinguish an explicit "0" size from "" meaning unset / return nil...

Member

@dmcgowan dmcgowan left a comment

I understand this is only the first step, but it is defining an interface for how it will be used by other snapshotters and backing filesystems in the future. I am fine if we want to start by just defining the label, but I think we should have a more generalized solution or plan before we put it into the default snapshotter.

Is there a plan for how this would interact with CRI/k8s? Setting a default max in the CRI config seems more dangerous than helpful as these sort of limits should be defined by k8s. This should also be able to free up the costly Usage calculation when it is used, otherwise it is only a minor enhancement.

I would suggest not considering this for the 2.1 release, since it would only be experimental and the limited backing-store support and possible client use don't warrant changes to the default snapshotter. I do envision that in 2.2 and beyond we can start thinking about more hybrid snapshotters, where a snapshotter may be using overlay as well as block devices (similar to what erofs is doing). That could provide a more generalized solution for quota, though possibly not as part of the default overlay snapshotter.

LabelSnapshotUIDMapping = "containerd.io/snapshot/uidmapping"
// LabelSnapshotGIDMapping is the label used for GID mappings
LabelSnapshotGIDMapping = "containerd.io/snapshot/gidmapping"
// LabelSnapshotQuotaSize is the quota size.
Member

If we define the label here, we should clearly define the expected value

// SupportQuota checks if overlay filesystem supports quota or not.
//
// This function returns quotaSetter or error if it fails to check the filesystem.
func SupportQuota(root string) (QuotaSetter, error) {
Member

This isn't overlay specific

@yylt
Contributor Author

yylt commented Mar 10, 2025

@dmcgowan the current concern is clear; the current implementation is indeed limited to overlayfs. More detailed context can be found in this discussion: #11516.

//
// Get project id of parent dir as minimal id to be used by driver
//
minProjectID, err := getProjectID(basePath)


When containerd restarts, it cannot obtain the existing paths with prjid already set on the system. In this case the map q.quotas[] is empty, and a newly started container will reuse a prjid already used by another container:
(dlv) p q
("*github.com/containerd/containerd/snapshots/quota.Control")(0xc000456800)
github.com/containerd/containerd/snapshots/quota.Control {
RWMutex: sync.RWMutex {
w: (sync.Mutex)(0xc000456800),
writerSem: 0,
readerSem: 0,
readerCount: (
"sync/atomic.Int32")(0xc000456810),
readerWait: (
"sync/atomic.Int32")(0xc000456814),},
backingFsBlockDev: "/home/containerd_rt/root/io.containerd.snapshotter.v1.overlayfs/backingFsBlockDev",
nextProjectID: 2,
quotas: map[string]uint32 [],}
(dlv)

@yylt
Contributor Author

yylt commented Mar 13, 2025

Compared to ephemeral storage handling, adding storage configuration to the CRI API seems completely unnecessary, and there are many filesystem restrictions, making it seem of limited use.

@mikebrow
Member

are we on the same page? "Goals - Universal snapshot support for quota: To implement a unified approach for managing quotas across all snapshot types. ... " along with Crosby's orig. recommendation "this can be implemented with snapshot labels and then the snapshotters will have to be updated to support these things. However, we still need higher layers like CRI and kube to pass these labels" and this discussion thread: #759

It doesn't feel like we should only care about quotas for overlayfs, and it does seem like quota requirements need to flow over the CRI pod/container/metrics APIs... I'm not following the idea that quotas would be completely unnecessary and seemingly useless for CRI clients.

@yylt
Contributor Author

yylt commented Mar 13, 2025

are we on the same page? "Goals - Universal snapshot support for quota: To implement a unified approach for managing quotas across all snapshot types. ... " along with Crosby's orig. recommendation "this can be implemented with snapshot labels and then the snapshotters will have to be updated to support these things. However, we still need higher layers like CRI and kube to pass these labels" and this discussion thread: #759

Sorry about that; it's possible I didn't fully express my thoughts. Integrating with CRI is obviously necessary, but that would require submitting a KEP (Kubernetes Enhancement Proposal) and engaging in discussions, and I anticipate this process to be quite time-consuming. Moreover, ephemeral storage generally doesn't depend on the snapshot type, making it more convenient.

Only after the KEP has been merged would it be necessary to modify the current patch; it's likely that widespread agreement will only be achieved at that point.

@Vigilans

Vigilans commented Apr 1, 2025

As mentioned at #5859 (comment), perhaps in the future, configuring quotas through config.toml, cri-api, or annotations will be added, and this should discussion more.

As an end feature, we need to discuss how it will be integrated with Kubernetes and how we can use it from our own tooling and go libraries.

It would be great if this feature could also be introduced to docker with the containerd snapshotter enabled. Currently with containerd-snapshotter: true, docker loses the ability to set quota, which was available in the overlay2 graph driver.

@lukasmrtvy

lukasmrtvy commented May 27, 2025

@yylt, by any chance, is it possible to set the quota manually for each container with xfs_quota as a workaround? Thanks

By the way, what is this #10404?

@yylt
Contributor Author

yylt commented May 28, 2025

@yylt, by any chance, is it possible to set the quota manually for each container with xfs_quota as workaround ? Thanks

Ref: https://linux.die.net/man/8/xfs_quota — here a project id is used to set the limit, like this:

# mount -o prjquota /dev/xvm/var /var
# xfs_quota -x -c 'project -s -p /var/log 42' /var
# xfs_quota -x -c 'limit -p bhard=1g 42' /var

btw whats is this #10404?

Perhaps it's another way of implementing quota

@lukasmrtvy

lukasmrtvy commented May 28, 2025

@yylt Thanks. This is working for vanilla Docker (for UpperDir), but with containerd there are snapshots, and these are not predictable. Any idea?

This is what I have tried:

docker create --name foobar alpine sleep infinity
cid=$(docker inspect foobar --format='{{.Id}}')
hex_part="${cid:0:8}"
projid=$((0x$hex_part % 65000 + 1000))
root_dir="/mnt/data/docker/rootfs/overlayfs/$cid"
mkdir "$root_dir"
xfs_quota -x -c "project -s -p $root_dir $projid" /mnt/data/docker/
xfs_quota -x -c "limit -p bsoft=2G bhard=2G $projid" /mnt/data/docker/
docker start foobar && docker exec foobar sh -c 'df -h'

Filesystem                Size      Used Available Use% Mounted on
overlay                 289.6G      2.4G    287.2G   1% /

@yylt
Contributor Author

yylt commented May 28, 2025

@yylt Thanks. This is working for vanilla Docker ( for UpperDir ), but with containerd, there are snapshots, and these are not predictable. Any idea?

Snapshotters aim to implement a union file system with different backends. Currently, if it's overlayfs, you can usually see the following information on the system:

# mount
...
overlay on /run/containerd/io.containerd.runtime.v2.task/k8s.io/{cid}/rootfs type overlay (rw,relatime,lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/229/fs,upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/262/fs,workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/262/work)

Quotas can be set for the directories specified by upperdir and workdir through xfs_quota

@lukasmrtvy

lukasmrtvy commented May 28, 2025

This information is available after the container is started, not before; thus, it's impossible to set quotas before the container starts.

@yylt
Contributor Author

yylt commented May 28, 2025

This information is available after the container is started, not before; thus, it's impossible to set quotas before the container starts.

Right, but that should be acceptable as a workaround. You can also subscribe to CONTAINER_STARTED_EVENT through the containerd socket so that you can set the quotas up as quickly as possible.

@hdp7891000

@yylt Hello, could you please provide the configuration and usage methods again? I attempted to set a quota limit but was unsuccessful.
"Ignoring unknown key in TOML for plugin" error="strict mode: fields in the document are missing in the target struct" key=default_snapshot_quota_size plugin=io.containerd.cri.v1.runtime

The file system has prjquota enabled.

@yylt
Contributor Author

yylt commented Jul 10, 2025

@yylt Hello, could you please provide the configuration and usage methods again?

Sorry for the late reply.

The current implementation cannot be used directly; it will be completed in a few steps. For now it only adds internal support for the quota syscall to overlayfs.

@hsiangkao
Member

hsiangkao commented Jul 10, 2025

@yylt Hello, could you please provide the configuration and usage methods again?

Sorry for the late reply.

The current implementation cannot be used directly; it will be completed in a few steps. For now it only adds internal support for the quota syscall to overlayfs.

I do believe a quota solution (or other forms for writable layers, such as separate sparse files) is much cleaner implemented as part of the mount manager/plugin, so that overlayfs-based writable layers among various snapshotters can leverage this seamlessly.

@dmcgowan
Member

I'm going to remove this from the 2.2 milestone. I'm not sure we want this PR as is; a few things to consider though:

  1. In 2.2 we will have a block based option for setting quota, this will allow quota to be set regardless of the underlying filesystem. Currently this is only planned to be implemented for erofs snapshotter in 2.2.
  2. We need a quota plumbed through CRI so that kubelet can use it for ephemeral limits and have an alternative to the check usage and evict logic.
  3. The proposal to use NRI may be worth exploring for the non-block case Filesystem quotas #759 (comment)

If you want to split out the changes to define the quota size label and documentation around it, I think we could consider that for 2.2.

@yylt
Contributor Author

yylt commented Oct 17, 2025

I'm going to remove this from the 2.2 milestone. I'm not sure we want this PR as is […] If you want to split out the changes to define the quota size label and documentation around it, I think we could consider that for 2.2.


Thank you for providing this information. I've always had some concerns about whether to make changes:

  • Quotas are only applicable to OverlayFS and require specific mount options.

  • For the CRI to detect quota support, the CRI layer has to know the current snapshotter, but the snapshotter sits below the image layer, and exposing this doesn't seem ideal.

  • NRI is an excellent external management tool.

Splitting out the changes should be related to Mount Management (https://github.com//issues/11303), and should also be another pull request.

Let's leave it at that for now.


Labels

size/XL status/needs-discussion Needs discussion and decision from maintainers


Development

Successfully merging this pull request may close these issues.

Feature request: overlayfs quota limitation