Tail of RMA buffer sporadically not updated after remote host put request finished. #10487

Open
YaoHaoLau opened this issue Feb 13, 2025 · 0 comments
YaoHaoLau commented Feb 13, 2025

I'm using the UCX Java interfaces to read files from remote hosts with the RMA feature. The remote host calls UcpEndpoint.putNonBlocking() to transfer data to the local host. After the corresponding UcpRequest finishes with status UCS_OK, the remote host alerts the local host that the data is ready to read. The local host then reads the RMA segment buffer and checks the data against the expected content.

Bug description

This routine usually works, but sporadically the content of the RMA buffer differs from what is expected: the tail of the RMA buffer (tens of KB) remains unchanged from the previous RMA put transfer.

This error is rare, but it does occur. In my test case, 50 client pods continuously read ~24TB of data in total from ~50 server pods. The error occurred once across dozens of runs of this test case, with no exceptions raised; all requests completed with UCS_OK.
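To pin down the symptom described above, a check along the following lines could report where the received data first diverges from the expected data and how many trailing bytes still match the previous transfer. This is only a minimal sketch: StaleTailCheck and both method names are hypothetical helpers, not part of the UCX Java bindings, and it assumes the previous transfer's content is still available for comparison.

```java
import java.nio.ByteBuffer;

/** Hypothetical diagnostic helper; not part of the UCX Java bindings. */
final class StaleTailCheck {

    /** Returns the index of the first differing byte in [0, length), or -1 if none differ. */
    static int firstMismatch(ByteBuffer received, ByteBuffer expected, int length) {
        for (int i = 0; i < length; i++) {
            // Absolute get(i) leaves the position and limit of both buffers untouched.
            if (received.get(i) != expected.get(i)) {
                return i;
            }
        }
        return -1;
    }

    /** Counts trailing bytes in [0, length) that are identical to the previous transfer. */
    static int staleTailLength(ByteBuffer received, ByteBuffer previous, int length) {
        int stale = 0;
        for (int i = length - 1; i >= 0 && received.get(i) == previous.get(i); i--) {
            stale++;
        }
        return stale;
    }
}
```

If the previous and the expected content happen to share bytes near the mismatch boundary, staleTailLength can overstate the stale region, so the two results are best read together.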

Setup and versions

Each pod has four mlx5 devices available.

Build

  • #define UCX_CONFIGURE_FLAGS "--disable-logging --disable-debug --disable-assertions --disable-params-check --enable-mt --enable-compiler-opt=2 --without-avx --prefix=/root/ucx_1_18_0"
  • #define UCX_MODULE_SUBDIR "ucx"
  • #define VERSION "1.18"
  • #define uct_MODULES ":ib:rdmacm:cma"
  • #define ENABLE_MT 1

Running

  • UCX_MAX_RMA_RAILS=1

Steps to Reproduce

A simplified test case is described here.

Server part

  1. create shared UcpContext with new UcpContext(new UcpParams().requestRmaFeature().requestWakeupFeature().setMtWorkersShared(true))
  2. create UcpWorker for listener from shared UcpContext with requestThreadSafety()
  3. for each incoming connection:
    • create UcpWorker from shared UcpContext with requestThreadSafety()
    • accept UcpConnectionRequest in current UcpWorker
    • wait until the endpoint is established and exchange the necessary RMA info with the client
    • create 2MB UCS_MEMORY_TYPE_HOST UcpMemory and the related ByteBuffer
  4. for each read request:
    • pick corresponding endpoint according to read request
    • offset = 0, remain = expected read length
    • do
      transfer_size = min(128KB, remain)
      UcpEndpoint.putNonBlocking(localAddress=local_offset + offset, size=transfer_size, remoteAddress = remote_address + offset, remoteKey, null)
      progress and wait until the request completes with status UCS_OK
      flush endpoint / worker
      alert the client that the RMA buffer segment starting at offset is readable
      remain -= transfer_size, offset += transfer_size
    • while remain > 0
    • release the held endpoint so it can be reused
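
The offset and size arithmetic of the per-request loop in step 4 can be sketched in plain Java as follows. This is an illustration only: ChunkedPut and ChunkSender are hypothetical names, and ChunkSender.sendAndNotify stands in for the real sequence of putNonBlocking, waiting for UCS_OK, flushing, and alerting the client. Unlike the do/while above, the sketch uses a plain while loop, so a zero-length read issues no transfer at all.

```java
/** Hypothetical sketch of the server-side chunking loop; no UCX calls are made here. */
final class ChunkedPut {
    static final long CHUNK_SIZE = 128L * 1024; // 128KB per put, as in the steps above

    /** Hypothetical stand-in for: put, wait for completion, flush, alert the client. */
    interface ChunkSender {
        void sendAndNotify(long offset, long size);
    }

    /** Splits a read of the given length into chunks and returns how many were sent. */
    static int transfer(long length, ChunkSender sender) {
        long offset = 0;
        long remain = length;
        int chunks = 0;
        while (remain > 0) {
            long size = Math.min(CHUNK_SIZE, remain);
            // Real code would add offset to both the local and the remote base address.
            sender.sendAndNotify(offset, size);
            offset += size;
            remain -= size;
            chunks++;
        }
        return chunks;
    }
}
```

For a 300KB read, this produces two full 128KB chunks followed by one 44KB chunk at offset 256KB.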

Client part

  1. create shared UcpContext with new UcpContext(new UcpParams().requestRmaFeature().requestWakeupFeature().setMtWorkersShared(true))
  2. for server_addr in server_list:
    • create UcpWorker from shared UcpContext with requestThreadSafety()
    • create 2MB UCS_MEMORY_TYPE_HOST UcpMemory
    • generate remote key buffer via UcpMemory.getRemoteKeyBuffer()
    • create UcpEndpoint to server_addr in current UcpWorker
    • wait until the endpoint is established and exchange the necessary RMA info with the server
    • async waitEvent & progress worker
  3. start multiple threads to perform data reads from remote hosts; each thread:
    • picks an endpoint to a remote host and tells the remote host to start the read
    • the read length of each read operation varies from 128KB to 2MB
    • offset = 0, remain = expected read length
    • do:
      wait for the remote host's alert message
      the RMA segment starting at offset is now readable
      transfer_size = min(128KB, remain)
      read transfer_size bytes and check them against the expected data
      remain -= transfer_size, offset += transfer_size
    • while remain > 0
    • release the held endpoint so it can be reused
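
The per-thread verification loop in step 3 might look roughly like the sketch below. It is not the actual test code: ClientReadLoop is a hypothetical class, and the BlockingQueue of ready offsets is an assumed stand-in for whatever channel delivers the server's alert messages. The method returns the first bad offset so that a failure can be related to the chunk in which it occurred.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch of the client-side verify loop; no UCX calls are made here. */
final class ClientReadLoop {
    static final int CHUNK_SIZE = 128 * 1024; // must match the server's chunk size

    /** Returns the offset of the first byte that differs from expected, or -1 if all match. */
    static int readAndVerify(BlockingQueue<Long> readyOffsets, ByteBuffer rmaBuffer,
                             ByteBuffer expected, int length) {
        int offset = 0;
        int remain = length;
        while (remain > 0) {
            long ready;
            try {
                ready = readyOffsets.take(); // blocks until the server's alert arrives
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for alert", e);
            }
            if (ready != offset) {
                throw new IllegalStateException("alert for offset " + ready + ", expected " + offset);
            }
            int size = Math.min(CHUNK_SIZE, remain);
            for (int i = offset; i < offset + size; i++) {
                if (rmaBuffer.get(i) != expected.get(i)) {
                    return i; // first stale or corrupted byte
                }
            }
            offset += size;
            remain -= size;
        }
        return -1;
    }
}
```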
@YaoHaoLau YaoHaoLau added the Bug label Feb 13, 2025