-
Notifications
You must be signed in to change notification settings - Fork 160
POSIX backend: LINUXAIO API introduced #885
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
👋 Hi anton-nayshtut! Thank you for contributing to ai-dynamo/nixl. Your PR reviewers will review your contribution then trigger the CI to test your changes. 🚀 |
456f622
to
754e723
Compare
754e723
to
f5080a5
Compare
4842d05
to
b018b40
Compare
/build |
@ovidiusm , I see that it needs a rebase. Could you please advise on the correct procedure? Should I rebase and push manually? |
rt_dep = cpp.find_library('rt', required: true) | ||
thread_dep = dependency('threads') | ||
|
||
# Check for libaio (for POSIX plugin and test) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a different PR we should fix this mistake where it assumes libaio is required for POSIX AIO. It's not. We don't need any symbol test or extra library for POSIX AIO, it's baked into glibc. But your Linux AIO additions here are correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know. I just wanted to make this change a separate PR as you mentioned.
if (ret != num_ios_to_submit) { | ||
if (ret < 0) { | ||
NIXL_ERROR << absl::StrFormat("linux_aio submit failed: %s", nixl_strerror(-ret)); | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial submission isn't a failure. We should handle it by trying again later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I completely agree. However, it seems that the current API is built on the premise that it’s all or nothing, so for now I decided to follow it rather than break it by implementing a retry mechanism in functions that aren’t intended for this but are repeatedly called until success - like checkCompleted
.
Generally speaking, I think we need to make the API aware of retries and partial submissions.
Please advise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How large would it have to be in practice for full submission to be rejected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Asking because we have code freeze on Monday, and I am wondering if you think you can merge this new API by then as is and fix the partial submission issue in a separate PR since a fix would be touching code that is out of scope of this PR, or you think it is best to postpone everything and merge in a future version.
Right now we are switching to major / bugfix release cadence, so 0.7.0 would be a good candidate for this feature and 0.7.1 would be a good candidate for the fix. Unless you think that the code will be broken and prefer not to ship it now. IIUC currently requests are rejected if the client app is trying to write/read too many descriptors at once, which may be acceptable if the limit is high and users have not reported such issues until now in the current implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It behaves the same way as, for example, UringQueue
which submits IOs using io_uring_submit API. This API also returns the number of submitted submission queue entries.
That’s why I implemented the Linux AIO plugin in the same way. It adds functionality and doesn’t make things worse.
So, I think we can add it.
That said, I believe we must adjust the API to be aware of possible partial submissions ASAP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need change of API for partial submission, we can entirely handle this inside the plugin implementation. Having said that, we can address this together for all POSIX IOs in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need change of API for partial submission, we can entirely handle this inside the plugin implementation. Having said that, we can address this together for all POSIX IOs in a separate PR.
Could you please elaborate on your suggestion? As far as I understand, the submit()
API is only called once, and then checkCompleted()
is called repeatedly in a loop until the transfer is finished. Is that correct? If so, how would you recommend implementing partial submission handling? Should the plugin keep sending the rest of the data in checkCompleted()
until it’s all done, or do you have something else in mind?
|
||
ios_to_submit[idx] = nullptr; // Mark as completed | ||
|
||
if (events[i].res < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no way to report that a single operation failed? The whole backend simply fails? This isn't a comment so much on your patch but on this entire POSIX backend I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it seems so, unfortunately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In NIXL if entire transfer is not complete its considered a failure. If user wants to get more visibility, they can split their transfers to more requests. Everything is async
There are no conflicts, either a normal merge or a rebase should just work. You can also click on the update branch button and pull, it does a merge. |
If URING is not supported, the getQueueType() API now returns the correct return value (queue_t::UNSUPPORTED). The patch also cleans up the code by removing an unnecessary ifdef. Signed-off-by: Anton Nayshtut <[email protected]>
Linux AIO, although only available on Linux, is known for much better performance than POSIX AIO. This patch implements a Linux AIO backend and integrates it into the NIXL build system so it is only built when the platform supports it. Signed-off-by: Anton Nayshtut <[email protected]>
This patch adds support for a POSIX LINUXAIO API using the Linux AIO plugin. Signed-off-by: Anton Nayshtut <[email protected]>
b018b40
to
a2297c1
Compare
Done! Thanks! |
"--posix_api_type", | ||
type=str, | ||
help="API type for POSIX operations [AIO, URING] (only used with POSIX backend", | ||
help="API type for POSIX operations [AIO, URING, LINUXAIO] (only used with POSIX backend", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can Linux AIO be made default, is it available with manylinux pip builds? @aranadive ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can Linux AIO be made default, is it available with manylinux pip builds? @aranadive ?
It can. As @benlwalker rightly mentioned in one of his comments, the current POSIX AIO dependency is incorrect. NIXL currently assumes that libaio
is required for POSIX AIO, while in reality it is part of GLIBC.
We’re going to fix this dependency in a separate PR, but the fact that POSIX AIO is available in the manylinux build means that libaio
is available there. This means we can make Linux AIO the default option.
if (ret != num_ios_to_submit) { | ||
if (ret < 0) { | ||
NIXL_ERROR << absl::StrFormat("linux_aio submit failed: %s", nixl_strerror(-ret)); | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need change of API for partial submission, we can entirely handle this inside the plugin implementation. Having said that, we can address this together for all POSIX IOs in a separate PR.
|
||
ios_to_submit[idx] = nullptr; // Mark as completed | ||
|
||
if (events[i].res < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In NIXL if entire transfer is not complete its considered a failure. If user wants to get more visibility, they can split their transfers to more requests. Everything is async
return NIXL_SUCCESS; | ||
} | ||
|
||
struct io_event events[32]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why specifically 32 events?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Up to. No specific reason. We can pick any other number.
This PR introduces a Linux AIO plugin for the POSIX backend.
Linux AIO, although only available on Linux, is known for much better performance than POSIX AIO.
This patch implements a Linux AIO backend and integrates it into the NIXL build system so it is only built when the platform supports it.
The PR also introduces the
LINUXAIO
API parameter tonixlbench
.Here are the
nixlbench
results (with--storage_enable_direct
on top of NVME drive - SAMSUNG MZPLJ1T6HBJR-00007):AIO
URING
LINUXAIO