rdm_tagged_bw is broken with OOB sync #10118

Open
shijin-aws opened this issue Jun 25, 2024 · 3 comments

@shijin-aws
Contributor

This happens on both main and v1.21.x (I haven't checked older versions yet).

FI_LOG_LEVEL=warn fi_rdm_tagged_bw -p efa -b -j 0
bytes   iters   total       time     MB/sec    usec/xfer   Mxfers/sec
64      20k     1.2m        0.06s     22.55       2.84       0.35
256     20k     4.8m        0.03s    201.48       1.27       0.79
1k      20k     19m         0.03s    813.89       1.26       0.79
libfabric:3180488:1719276361::efa:cq:efa_rdm_rxe_report_completion():762<warn> Message truncated! tag: 60030 incoming message size: 4096 receiving buffer size: 1024
[error] fabtests:common/shared.c:2904: cq_readerr 265 (Truncation error), provider errno: -265 (Unknown error)

The same test fails in a similar way with the tcp provider:

ubuntu@ip-172-31-39-234:~/PortaFiducia/build/libraries/libfabric/main/source/libfabric/fabtests$ FI_LOG_LEVEL=warn fi_rdm_tagged_bw -p tcp -b -j 0
bytes   iters   total       time     MB/sec    usec/xfer   Mxfers/sec
64      20k     1.2m        0.04s     31.00       2.06       0.48
256     20k     4.8m        0.05s     94.87       2.70       0.37
1k      20k     19m         0.07s    281.92       3.63       0.28
libfabric:3181046:1719276643::tcp:ep_data:xnet_handle_truncate():159<warn> msg recv truncated
[error] fabtests:common/shared.c:2904: cq_readerr 265 (Truncation error), provider errno: 11 (Resource temporarily unavailable)

Since two unrelated providers fail the same way, this is likely a fabtests issue rather than a provider bug.
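
For readers unfamiliar with the error, here is a minimal toy model of what the efa warning reports: a tagged receive posted with a 1024-byte buffer is matched by a 4096-byte message carrying the same tag. This is not libfabric or fabtests code. `ToyTaggedEndpoint`, `post_recv`, and `deliver` are hypothetical names, and the sketch makes no claim about why the sizes get out of step in the real test; the thread has not established that.

```python
from dataclasses import dataclass

ETRUNC = 265   # error code copied from the log ("Truncation error")
TAG = 60030    # tag value copied from the efa warning; its base does not matter here


@dataclass
class PostedRecv:
    tag: int
    buf_size: int


class ToyTaggedEndpoint:
    """Hypothetical receive side that matches incoming messages to posted receives by tag."""

    def __init__(self):
        self.posted = []

    def post_recv(self, tag, buf_size):
        # Queue a receive buffer of buf_size bytes for messages carrying this tag.
        self.posted.append(PostedRecv(tag, buf_size))

    def deliver(self, tag, msg_size):
        # Match the oldest posted receive with the same tag.
        # Return 0 if the message fits, ETRUNC if the buffer is too small.
        for i, recv in enumerate(self.posted):
            if recv.tag == tag:
                del self.posted[i]
                return 0 if msg_size <= recv.buf_size else ETRUNC
        # Toy model only: no unexpected-message queue is simulated.
        raise LookupError("no posted receive matches this tag")


ep = ToyTaggedEndpoint()

# Sender and receiver agree on the 1k size: the message fits.
ep.post_recv(TAG, 1024)
ok = ep.deliver(TAG, 1024)

# Receiver still has a 1k buffer posted, but a 4k message with the same tag arrives.
ep.post_recv(TAG, 1024)
err = ep.deliver(TAG, 4096)

print(ok, err)
```

The second delivery mirrors the sizes in the warning (incoming 4096, buffer 1024). In the toy model it is reported as a truncation rather than matched silently.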

@shijin-aws
Contributor Author

shijin-aws commented Jun 25, 2024

Non-OOB (-E) sync works.

rdm_tagged_pingpong and rma_bw also work fine with OOB sync.

@j-xiong
Contributor

j-xiong commented Jun 25, 2024

Is it related to #10108?

Update: probably not, since the commit is in main only.

@shijin-aws
Contributor Author

@j-xiong No, it seems to be a long-standing issue. I will dig into it.
