-
Notifications
You must be signed in to change notification settings - Fork 177
Integration of GUSLI plugin #494
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
👋 Hi danielhe-nvidia! Thank you for contributing to ai-dynamo/nixl. Your PR reviewers will review your contribution then trigger the CI to test your changes. 🚀 |
src/plugins/gusli/gusli_backend.cpp
Outdated
| __LOG_RETERR(NIXL_ERR_INVALID_PARAM, "missing SGL, or SGL too small 0x%lx[b]", local[0].len); | ||
| } | ||
| } else { | ||
| unsigned i = (has_sgl_mem ? 1 : 0); // If supplied sgl, can't use it for now, just ignore it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm reading this correctly the code here assumes that the first descriptor is "special" memory to be used internally by Gusli for optimization when '-sgl' custom param is set. @vvenkates27 what do you think about exposing such API toggle?
src/plugins/gusli/gusli_backend.cpp
Outdated
| std::vector<class nixlGusliBackendReqH> child; // Array of sub completions, possibly a tree, though tested only 2 level io. | ||
| static void completion_cb(nixlGusliBackendReqH *c) { | ||
| __LOG_IO(c, "_done, rv=%d", c->io.get_error()); | ||
| c->pollable_async_rv = c->io.get_error(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to double check: This callback is always called from the same thread that initialized the request handle?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, by definition no. It will be called from internal gusli thread. However there is a cancellation point that guarantees, when IO is destroyed (it calls cancel) so callback will not arrive
81896b2 to
54cab9a
Compare
| std::stoi(backParams->at("max_num_simultaneous_requests")); | ||
| if (backParams->count("config_file") > 0) | ||
| gusli_params.config_file = backParams->at("config_file").c_str(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extract gusli::global_clnt_context::init_params into a helper function...
src/plugins/gusli/gusli_backend.cpp
Outdated
| if (backParams->count("config_file") > 0) | ||
| gusli_params.config_file = backParams->at("config_file").c_str(); | ||
| } | ||
| lib_ = std::make_unique<gusli::global_clnt_raii>(gusli_params); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... and move this initialization into initializer list by passing return value of the new helper to it directly instead of using a temporary. With this, you won't need to even use smart pointer anymore since you could just make lib_ by-value class field.
| if (rv == gusli::connect_rv::C_OK) return NIXL_SUCCESS; | ||
| if (rv == gusli::connect_rv::C_NO_DEVICE) return NIXL_ERR_NOT_FOUND; | ||
| if (rv == gusli::connect_rv::C_WRONG_ARGUMENTS) return NIXL_ERR_INVALID_PARAM; | ||
| return NIXL_ERR_BACKEND; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use switch here.
| assert(lib_->BREAKING_VERSION == 1); | ||
| } | ||
|
|
||
| nixlGusliEngine::~nixlGusliEngine() {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this and use "=default" in the header instead.
| nixl_status_t | ||
| nixlGusliEngine::deregisterMem(nixlBackendMD *_md) { | ||
| nixlGusliMemReq *md = (nixlGusliMemReq *)_md; | ||
| if (!md) __LOG_RETERR(NIXL_ERR_INVALID_PARAM, "md==null"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is redundant. Validate function arg instead, rename auto_deleter to md and use it to access the metadata in a safe manner.
| } | ||
| }; // namespace | ||
|
|
||
| nixlGusliEngine::nixlGusliEngine(const nixlBackendInitParams *nixlInit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please consult the code style guide regarding variable naming conventions.
| } | ||
| return true; | ||
| } | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of code in this file is a re-implementation of infrastructure that already exist in test/gtest/plugins or of the primitives provided by Gtest library. Please integrate the test-cases from here into one of the existing Gtest suites so it could be validated by CI from day1.
|
Can you please add your test to the CI (need to be added to .gitlab/test_cpp.sh)? |
|
@barneuman
Naturally - I tested GUSLI <--> NIXL with each and every 1 of the above 3 approaches, but I don't know how eventually NIXL team will decide to integrate NIXL unit-tests. Note: The application which is the backend server (loads GUSLI server library) be it SPDK based or NVMeshUM, or user space NVME driver, etc - Should not be part of NIXL and should be managed by external team. For example, SPDK team should be in charge of the SPDK server running, restarting upon crash, configuring to give access to specific block devices and not to others. It is definitely not the responsibility of NIXL / nor GUSLI teams. This is equivalent to real NVME disks. NIXL team should not be responsible for inserting new disks when existing run out of space or managing which local disks should be used for io and which (like os disk) should not. This external configuration of hardware is managed outside of NIXL |
Change-Id: I05e7b35eb3076142b4dcc1f58a68865d5a5962c0 Signed-off-by: danielhsh <[email protected]>
452a7a2 to
1d184fa
Compare
Signed-off-by: Adit Ranadive <[email protected]>
|
Closing this PR since #887 was merged |
Change-Id: I05e7b35eb3076142b4dcc1f58a68865d5a5962c0
What?
Review only: Don't merge, as GUSLI standalone repository still do not exist
Support transfer of 1 element in a list
Support transfer of N element in a list, with/Without scattaer gather extra element
Support the above to multiple block devices in one transfer
Why?
Allow using gusli access (to server or bdevs) via nixl