fabtests/rdm_atomic, ubertest: move atomic verification to common code and use in functional test #10155
This allows fabtests to make use of atomic validation code Signed-off-by: Alexia Ingerson <[email protected]>
To properly validate atomic data, we need host bounce buffers for the result and compare buffers in addition to the regular bounce buffer for the tx/rx bufs. This adds two extra bufs allocated only for atomic purposes and adds hmem support to the common atomic validation path. It also renames the alloc/free_tx_buf calls to generic alloc/free_host_bufs which allocates all three buffers at once. Signed-off-by: Alexia Ingerson <[email protected]>
ft_post_atomic posted "buf" which is the base address for the entire send and recv buffer allocation. The first half of the allocation is the receive buffer and the second half is the send buffer. Posting just "buf" meant it was sending the receive buffer. This changes it to send the tx buf and do an atomic on the rx buf which allows us to properly do atomic validation Signed-off-by: Alexia Ingerson <[email protected]>
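The buffer layout described in this commit can be sketched as follows. This is only an illustration: `split_bufs` and `split_buffer` are hypothetical names, not fabtests symbols, and the sketch assumes the two halves are equal in size.

```c
#include <stddef.h>

/* Hypothetical view of one allocation split into two halves. */
struct split_bufs {
	char *rx; /* first half: receive buffer */
	char *tx; /* second half: send buffer */
};

/* Assumption: base points at 2 * half_size bytes. */
static struct split_bufs split_buffer(char *base, size_t half_size)
{
	struct split_bufs b;

	b.rx = base;             /* same address as the base allocation */
	b.tx = base + half_size; /* send data starts after the rx half */
	return b;
}
```

Posting the base address as the source therefore sends the receive half, which is the bug this commit describes.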
This allows us to post the rx buf without corrupting memory in case it's needed for validation Signed-off-by: Alexia Ingerson <[email protected]>
Match the behavior of memset() where the value passed in is an int, but it is interpreted as a char. While ZE can technically handle this scenario, others may not so we need to standardize across ifaces Signed-off-by: Alexia Ingerson <[email protected]>
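A minimal sketch of the memset() convention referred to here, where the value arrives as an int but only its low byte is written. `fill_like_memset` is a hypothetical helper for illustration and is not the fabtests hmem API.

```c
#include <stddef.h>

/* Hypothetical helper: fill size bytes the way memset() would. */
static void fill_like_memset(void *dst, int value, size_t size)
{
	unsigned char *p = dst;
	/* memset() converts the int argument to unsigned char */
	unsigned char byte = (unsigned char) value;
	size_t i;

	for (i = 0; i < size; i++)
		p[i] = byte;
}
```

Under this convention a caller passing 0x141 gets a buffer filled with 0x41 bytes, whichever interface performs the fill.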
Add data validation to the atomic test by using the newly added atomic fill and check support imported from ubertest. This code uses a macro that switches on datatype for filling and checking the buffer contents. The atomic validation path requires an extra buffer that holds a copy of the original atomic buffer, so that the atomic function can be recreated locally and the buffer checked against the simulated atomic operation. This patch also refactors the entire test to remove the extremely confusing macros used for the base/fetch/compare operations. The macros made the code extremely difficult to read and debug and also made it difficult to add data validation. Separating it into three explicit functions is about the same amount of code and significantly more readable. Synchronization messages are added in the validation case to ensure the atomic operation completed on both sides before validation occurs. This requires the addition of the FI_ORDER_SAW and FI_ORDER_SAR message ordering to ensure that we get the completion for the send/recv sync after the atomic message is processed Signed-off-by: Alexia Ingerson <[email protected]>
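The validation idea in this commit (keep a copy of the original target data, replay the atomic locally, compare with what the provider produced) can be sketched roughly as below. Every name here (`sketch_type`, `sketch_op`, `SKETCH_CHECK`, `sketch_check_atomic`) is hypothetical and covers only two datatypes and two operations; the actual fabtests macros and enums differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the datatype and operation enums. */
enum sketch_type { SKETCH_INT32, SKETCH_UINT64 };
enum sketch_op { SKETCH_SUM, SKETCH_MAX };

/* Replay op on a saved copy of the target and compare to the result. */
#define SKETCH_CHECK(ctype, op, orig, src, res, cnt, ok)		\
	do {								\
		const ctype *o_ = (orig), *s_ = (src), *r_ = (res);	\
		size_t i_;						\
		(ok) = 1;						\
		for (i_ = 0; i_ < (cnt); i_++) {			\
			ctype e_ = o_[i_];				\
			if ((op) == SKETCH_SUM)				\
				e_ = (ctype) (e_ + s_[i_]);		\
			else if (s_[i_] > e_)				\
				e_ = s_[i_];				\
			if (e_ != r_[i_])				\
				(ok) = 0;				\
		}							\
	} while (0)

/* Returns 0 when res matches the locally simulated atomic, else -1. */
static int sketch_check_atomic(enum sketch_type type, enum sketch_op op,
			       const void *orig, const void *src,
			       const void *res, size_t count)
{
	int ok = 0;

	switch (type) {
	case SKETCH_INT32:
		SKETCH_CHECK(int32_t, op, orig, src, res, count, ok);
		break;
	case SKETCH_UINT64:
		SKETCH_CHECK(uint64_t, op, orig, src, res, count, ok);
		break;
	}
	return ok ? 0 : -1;
}
```

The saved copy (`orig`) is the reason for the extra buffer: once the atomic lands, the pre-operation contents of the target are otherwise gone.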
Run fi_rdm_atomic with data validation in standard and short test suites Signed-off-by: Alexia Ingerson <[email protected]>
	ret = ft_hmem_copy_from(opts.iface, opts.device, dev_host_comp,
				cmp, count * datatype_to_size(type));
	if (ret) {
		FT_ERR("Failed to copy from atomic buffer\n");
		return ret;
	}

	check_comp = dev_host_comp;
Is check_comp needed when atomic != FI_ATOMIC_COMPARE?
No, I think I left it because coverity gets really confused by the unused variables with the atomic cases. But I'm going to refactor this whole check which is looking really messy. I'll push an update when I figure out the other failures.
Looks like MS compiler doesn't like the complex types:
Ubertest copies the software implementation of atomics from libfabric in order to add atomic operation data validation. This is done by saving the sent data and performing a local atomic in order to validate against the actual data calculated by the libfabric provider.
This patch set moves that implementation from ubertest to the common code for other tests (specifically the fi_rdm_atomic test) to leverage it.
Ubertest atomics with validation are not run in the CI by default, but the rdm_atomic test is. Because of the lack of validation in the functional fabtest, an incorrect fetched buffer pointer was never caught. This patch set also enables running rdm_atomic with validation in the standard and short test sets in runfabtests.sh so we can catch this in the future.
During testing of these changes, the validation caught the previously missed bug, and the test passes once the bug is fixed.