-
Notifications
You must be signed in to change notification settings - Fork 208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Add io_uring_* syscalls #1264
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: rhatdan The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Fixes: containers/podman#16796 Signed-off-by: Daniel J Walsh <[email protected]>
@giuseppe PTAL |
I don't see any dangerous syscall at the moment, but this could change in the future as new syscalls are supported: enum io_uring_op {
IORING_OP_NOP,
IORING_OP_READV,
IORING_OP_WRITEV,
IORING_OP_FSYNC,
IORING_OP_READ_FIXED,
IORING_OP_WRITE_FIXED,
IORING_OP_POLL_ADD,
IORING_OP_POLL_REMOVE,
IORING_OP_SYNC_FILE_RANGE,
IORING_OP_SENDMSG,
IORING_OP_RECVMSG,
IORING_OP_TIMEOUT,
IORING_OP_TIMEOUT_REMOVE,
IORING_OP_ACCEPT,
IORING_OP_ASYNC_CANCEL,
IORING_OP_LINK_TIMEOUT,
IORING_OP_CONNECT,
IORING_OP_FALLOCATE,
IORING_OP_OPENAT,
IORING_OP_CLOSE,
IORING_OP_FILES_UPDATE,
IORING_OP_STATX,
IORING_OP_READ,
IORING_OP_WRITE,
IORING_OP_FADVISE,
IORING_OP_MADVISE,
IORING_OP_SEND,
IORING_OP_RECV,
IORING_OP_OPENAT2,
IORING_OP_EPOLL_CTL,
IORING_OP_SPLICE,
IORING_OP_PROVIDE_BUFFERS,
IORING_OP_REMOVE_BUFFERS,
IORING_OP_TEE,
IORING_OP_SHUTDOWN,
IORING_OP_RENAMEAT,
IORING_OP_UNLINKAT,
IORING_OP_MKDIRAT,
IORING_OP_SYMLINKAT,
IORING_OP_LINKAT,
IORING_OP_MSG_RING,
IORING_OP_FSETXATTR,
IORING_OP_SETXATTR,
IORING_OP_FGETXATTR,
IORING_OP_GETXATTR,
IORING_OP_SOCKET,
IORING_OP_URING_CMD,
IORING_OP_SEND_ZC,
/* this goes last, obviously */
IORING_OP_LAST,
}; I am not sure we should permit it by default. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Docker allows these syscalls. But I prefer another head nod given the security concerns.
I think this isn't future proof. The seccomp profile is deny by default, right? So when the kernel adds a new syscall we have to add it explicitly to the allow list. If io_uring adds support for a new syscall it is impossible to restrict this use if it is considered dangerous. Of course I have no idea how likely this is but when this happens we might need to disallow io_uring here which will then break users that rely on io_uring. |
agreed. I think we need a new profile, we have no control over what other syscalls will be enabled in io_uring. |
How do you envision users dealing with this? |
Well they would need to create their own profile or we ship one and then the user only needs to do Ideally the kernel would just apply the seccomp filter for the io_uring operations just like the normal syscalls. To me it fells extremely weird that you can bypass the seccomp filter like that but I have no idea about the kernel internals so I guess there must be reasons for that. |
That would mean us shipping a separate json file or users being smart enough to figure this out on their own. (Not likely). |
Should we add the ability to modify the seccomp file directly with podman run --security-opt seccomp:+io_uring ... |
Let's explore the problem space a bit further. I wonder
|
I've asked the Moby folks why they decided to enable io_uring: moby/moby#41223 (comment) If it is an accepted risk from the community, then let's go forward with it but to me it still sounds like a risk to open up to syscalls we have no control over and it is also not possible to control what io_uring does from seccomp at the moment.
seccomp customization is difficult, especially with the JSON file we use. Maybe we need to provide a series of curated profiles, since CRI-O and Podman are already using different profiles, something like |
I like having limited choice that we can document, and perhaps diagnose when it fails. I don't think we should merge this PR, since we should be as secure if not more secure the Docker. Having an easy wait to turn on io_uring for users who actually need it would be better then turning it on for everyone by default. |
That's really something to improve on. Changing seccomp profiles is hard when we only want to add one ore two syscalls. |
Fixes: containers/podman#16796
Signed-off-by: Daniel J Walsh [email protected]