Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fabtests/efa: Add fabtests for efa-direct #10800

Merged
merged 1 commit into from
Feb 25, 2025

Conversation

jiaxiyan
Copy link
Contributor

Run fabtests for efa-direct up to device max msg size(8K) where FI_RMA is supported.
RMA tests currently post a recv with the max transfer size. It could run up to max rdma size after fixing fabtests to post the recv within device max msg size.
Also skip 0 byte for rma because fabtests use inject for messages smaller than inject_size, but efa-direct does not support it until firmware supports inline write.

@jiaxiyan jiaxiyan requested a review from a team February 19, 2025 21:13
@@ -1,8 +1,8 @@
import pytest

@pytest.mark.functional
def test_flood_peer(cmdline_args, fabric):
def test_flood_peer(cmdline_args):
from common import ClientServerTest
test = ClientServerTest(cmdline_args, f"fi_flood -e rdm -W 6400 -S 512 -T 5",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fails with
transmit(): common/shared.c:2243, ret=12 (Cannot allocate memory)
Batch memory registration:Fail

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks bad, need to look into

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a TODO here so that we don't forget?

Copy link
Contributor

@shijin-aws shijin-aws Feb 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test uses a mode FT_OPT_ALLOC_MULT_MR , which allocates another array of buffers of size opts.transfer_size and register each of them of such size https://github.com/ofiwg/libfabric/blob/v1.22.0/fabtests/common/shared.c#L587-L690. Then
during fi_flood test it post recv from these separate mr buffers via ft_post_rx . However ft_post_rx will finally post a fi_recv with size as the MAX of opts.transfer_size and FT_MAX_CTRL_MSG (1024). When opts.transfer_size is smaller than FT_MAX_CTRL_MSG, it will finally post a recv with size larger than the recv buffer.

I am still discussing with Intel whether it is a bug of the test, but it won't work for efa-direct anyway

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, should be fixed by #10819

@@ -3,14 +3,14 @@

@pytest.mark.functional
@pytest.mark.parametrize("environment_variable", ["FI_EFA_FORK_SAFE", "RDMAV_FORK_SAFE"])
def test_fork_support(cmdline_args, completion_semantic, environment_variable, fabric):
def test_fork_support(cmdline_args, completion_semantic, environment_variable):
from common import ClientServerTest
import copy
cmdline_args_copy = copy.copy(cmdline_args)

cmdline_args_copy.append_environ("{}=1".format(environment_variable))
test = ClientServerTest(cmdline_args_copy, "fi_rdm_tagged_bw -K",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

efa-direct does not support tag

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to run with tagged_bw, we can run with fi_rdm_bw

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this and got timeout

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the default message sizes are too big?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed by #10802

from common import ClientServerTest
cmd = "fi_multi_ep -e rdm"
if shared_cq:
cmd += " -Q"
test = ClientServerTest(cmdline_args, cmd, fabric=fabric)
test = ClientServerTest(cmdline_args, cmd, fabric="efa")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This timeout with efa-direct

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds a bug too

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figure it out, it is because the rx_size too large again. And this test post a too large rx buffer during the ep enable, and that rx buffer cannot get a send delivered at the final ft_finalize_ep.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to specify a small -S after this PR is merged #10814

command = "fi_rdm_pingpong" + " " + perf_progress_model_cli
if iteration_type == "standard" and fabric == "efa-direct":
pytest.skip("Skip standard iteration for efa-direct")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fails with inject(): common/shared.c:2321, ret=12 (Cannot allocate memory)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should pass?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ret=12 sounds like a bug that this PR will fix #10818

@@ -15,7 +15,7 @@ def test_rnr_read_cq_error(cmdline_args, fabric):
# in this case.
cmdline_args_copy = copy.copy(cmdline_args)
cmdline_args_copy.strict_fabtests_mode = False
test = ClientServerTest(cmdline_args_copy, "fi_efa_rnr_read_cq_error", fabric=fabric)
test = ClientServerTest(cmdline_args_copy, "fi_efa_rnr_read_cq_error", fabric="efa")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

libfabric:27104:1739913431::efa:cq:efa_cq_handle_error():100 err: 269, message: Destination resource not ready (no work queue entries posted on receive queue) My EFA addr: fi_addr_efa://[fe80::4f6:e9ff:fefe:adcd]:1:1128862596 My host id: i-059a4b7d0e872e94d Peer EFA addr: fi_addr_efa://[fe80::409:2dff:fe85:85d9]:1:1721598274 Peer host id: N/A (10)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks unexpected too. efa-direct should configure the rnr retry based on setopt

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/*
	 * In order for the sender to get RNR error, we need to first consume
	 * all pre-posted receive buffer (in efa provider, fi->rx_attr->size
	 * receiving buffer are pre-posted) on the receiver side, the subsequent
	 * sends (expected_rnr_error) will then get RNR errors.
	 */

This is only true for efa-rdm. Efa-direct doesn't prepost any buffer, the number of fi_send posted should be adjusted accordingly.

@jiaxiyan jiaxiyan force-pushed the efa-direct-fabtests branch from 052ea96 to 352de40 Compare February 19, 2025 22:36
@@ -1,8 +1,8 @@
import pytest

@pytest.mark.functional
def test_flood_peer(cmdline_args, fabric):
def test_flood_peer(cmdline_args):
from common import ClientServerTest
test = ClientServerTest(cmdline_args, f"fi_flood -e rdm -W 6400 -S 512 -T 5",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks bad, need to look into

@@ -3,14 +3,14 @@

@pytest.mark.functional
@pytest.mark.parametrize("environment_variable", ["FI_EFA_FORK_SAFE", "RDMAV_FORK_SAFE"])
def test_fork_support(cmdline_args, completion_semantic, environment_variable, fabric):
def test_fork_support(cmdline_args, completion_semantic, environment_variable):
from common import ClientServerTest
import copy
cmdline_args_copy = copy.copy(cmdline_args)

cmdline_args_copy.append_environ("{}=1".format(environment_variable))
test = ClientServerTest(cmdline_args_copy, "fi_rdm_tagged_bw -K",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to run with tagged_bw, we can run with fi_rdm_bw

from common import ClientServerTest
cmd = "fi_multi_ep -e rdm"
if shared_cq:
cmd += " -Q"
test = ClientServerTest(cmdline_args, cmd, fabric=fabric)
test = ClientServerTest(cmdline_args, cmd, fabric="efa")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds a bug too

@@ -15,7 +15,7 @@ def test_rnr_read_cq_error(cmdline_args, fabric):
# in this case.
cmdline_args_copy = copy.copy(cmdline_args)
cmdline_args_copy.strict_fabtests_mode = False
test = ClientServerTest(cmdline_args_copy, "fi_efa_rnr_read_cq_error", fabric=fabric)
test = ClientServerTest(cmdline_args_copy, "fi_efa_rnr_read_cq_error", fabric="efa")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks unexpected too. efa-direct should configure the rnr retry based on setopt

"""
import subprocess
binpath = os.getcwd()
cmd = "timeout 360 " + os.path.join(binpath, "fi_efa_rdma_checker -o writedata")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should you check all read, write, and writedata?

@@ -167,6 +183,8 @@ def fabtests_args_to_pytest_args(fabtests_args, shared_options, run_mode):
pytest_args.append(markers)

if fabtests_args.expression:
if not support_fi_rma(fabtests_args.server_id):
fabtests_args.expression += " and not efa-direct"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't you just add pytest.skip() in the rma test when fabric == 'efa-direct' and not support_fi_rma

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought you prefer adding this in runfabtests.py. How about checking has_rdma in the fixture rma_bw_completion_semantic?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't mix these, you can introduce another fixture as rma_fabric like

def rma_fabric(fabric):
    if fabric == 'efa-direct' and not_fi_rma(...):
        pytest.skip()
    return fabric

Then make all test_rma_bw and test_rma_pingpong use rma_fabric instead of fabric as the fixture?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this skip efa-direct on all instances that don't support writedata?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my testing of this PR. support_fi_rma is even False on p5, and all efa-direct tests are skipped...

@@ -1,8 +1,8 @@
import pytest

@pytest.mark.functional
def test_flood_peer(cmdline_args, fabric):
def test_flood_peer(cmdline_args):
from common import ClientServerTest
test = ClientServerTest(cmdline_args, f"fi_flood -e rdm -W 6400 -S 512 -T 5",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a TODO here so that we don't forget?

command = "fi_rdm_pingpong" + " " + perf_progress_model_cli
if iteration_type == "standard" and fabric == "efa-direct":
pytest.skip("Skip standard iteration for efa-direct")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should pass?

@@ -167,6 +183,8 @@ def fabtests_args_to_pytest_args(fabtests_args, shared_options, run_mode):
pytest_args.append(markers)

if fabtests_args.expression:
if not support_fi_rma(fabtests_args.server_id):
fabtests_args.expression += " and not efa-direct"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this skip efa-direct on all instances that don't support writedata?

@@ -30,6 +30,9 @@ def test_transfer_with_read_protocol_cuda(cmdline_args, fabtest_name, cntrl_env_
if not has_cuda(cmdline_args.client_id) or not has_cuda(cmdline_args.server_id):
pytest.skip("Client and server both need a Cuda device")

if fabtest_name == "fi_rdm_tagged_bw" and fabric == "efa-direct":
pytest.skip("efa-direct does not support tagged")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need to run this test with efa-direct at all, it is testing protocol selection that efa-direct doesn't have

@jiaxiyan jiaxiyan force-pushed the efa-direct-fabtests branch from 6a1262e to 8f18de1 Compare February 24, 2025 19:10
Run fabtests for efa-direct up to device max msg size(8K) where
FI_RMA is supported.
RMA tests currently post a recv with the max transfer size. It
could run up to max rdma size after fixing fabtests to post the
recv within device max msg size.
Also skip 0 byte for rma because fabtests use inject for messages
smaller than inject_size, but efa-direct does not support it
until firmware supports inline write.

Signed-off-by: Jessie Yang <[email protected]>
@jiaxiyan jiaxiyan force-pushed the efa-direct-fabtests branch from 8f18de1 to 35ba211 Compare February 25, 2025 00:47
@shijin-aws shijin-aws merged commit 3ac1613 into ofiwg:main Feb 25, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants