Conversation

@tstamler
Contributor

What?

Update the remote storage example with pipelining code and more details in README.

@copy-pr-bot

copy-pr-bot bot commented Oct 14, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

👋 Hi tstamler! Thank you for contributing to ai-dynamo/nixl.

Your PR reviewers will review your contribution then trigger the CI to test your changes.

🚀

@tstamler force-pushed the remote_storage_pipeline branch from 1111e18 to c3f3cdc on October 14, 2025 15:13
@tstamler marked this pull request as ready for review on October 17, 2025 14:20
Signed-off-by: Timothy Stamler <[email protected]>
@tstamler force-pushed the remote_storage_pipeline branch from b2c6c33 to 805df1e on October 17, 2025 14:20
@tstamler force-pushed the remote_storage_pipeline branch from cb625ac to 93ab2a6 on October 17, 2025 14:41
Comment on lines 290 to 292:

```python
end_time = time.time()

elapsed = end_time - start_time
```
Contributor
Suggested change:

```diff
-end_time = time.time()
-
-elapsed = end_time - start_time
+elapsed = time.time() - start_time
```

Comment on lines 308 to 310:

```python
end_time = time.time()

elapsed = end_time - start_time
```
Contributor
Suggested change:

```diff
-end_time = time.time()
-
-elapsed = end_time - start_time
+elapsed = time.time() - start_time
```

Comment on lines +369 to +372:

```python
mem = "DRAM"

if args.role == "client":
    mem = "VRAM"
```
Contributor
Don't we have some constants for "DRAM", "VRAM", "GDS_MT" and so on?

Contributor Author

We pretty consistently expect Python applications to just use strings; it's a little clearer than constant definitions in the bindings. The Python API will do these translations.

Contributor

I meant constant strings

Something like:

```python
class MemoryType(StrEnum):
    DRAM = "DRAM"
    VRAM = "VRAM"
    ...
```

Then usage will be:

```python
MemoryType.DRAM
```

We have those enums in Rust and C++, and I thought we had something similar here. If there isn't one, it would require a separate PR, as it's unrelated to this change.

```python
nixl_mem_reg_list = []
nixl_file_reg_list = []

if mem == "VRAM":
```
Contributor
Maybe it would be better to use constants for memory/storage types

Contributor Author

see above


```python
elapsed = end_time - start_time

logger.info(f"Time for {iterations} READ iterations: {elapsed} seconds")
```
Contributor
Suggested change:

```diff
-logger.info(f"Time for {iterations} READ iterations: {elapsed} seconds")
+logger.info("Time for %s READ iterations: %s seconds", iterations, elapsed)
```

We shouldn't use f-strings in loggers; the formatting is not lazy, and pylint warns about it (`logging-fstring-interpolation`).
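A minimal illustration of the lazy `%s` style being suggested (logger name and values are placeholders, not from the example):

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("remote_storage_example")  # illustrative name

iterations, elapsed = 10, 0.42  # placeholder values

# f-string version builds the message eagerly, even if INFO is disabled:
#   logger.info(f"Time for {iterations} READ iterations: {elapsed} seconds")

# %s style defers formatting until the record is actually emitted:
logger.info("Time for %s READ iterations: %s seconds", iterations, elapsed)
```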

```python
s += 1
continue

if s == iterations:
```
Contributor

How can this flow happen given the `while s < iterations` loop condition?

Contributor Author

Good catch, the while loop condition should use an `or`.
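A toy sketch of the loop shape being described; the names below are stand-ins for the example's real transfer bookkeeping, not its API:

```python
iterations = 3
in_flight = []  # placeholder for outstanding transfer handles
s = 0

# With "while s < iterations" alone, the drain branch below could never run;
# adding "or in_flight" keeps looping until outstanding work completes.
while s < iterations or in_flight:
    if s < iterations:
        in_flight.append(f"transfer-{s}")  # stand-in for submitting a transfer
        s += 1
        continue
    in_flight.pop()  # stand-in for reaping a completed transfer
```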

Comment on lines +106 to +118:

```python
if s == 0:
    futures.append(
        executor.submit(
            execute_transfer,
            my_agent,
            my_mem_descs,
            my_file_descs,
            my_agent.name,
            "READ",
        )
    )
    s += 1
    continue
```
Contributor

Maybe we can move this first submission before the loop, initializing `s` to 1?

I think it would simplify the loop and avoid a branch.

Contributor Author

sure
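A sketch of that refactor; `execute_transfer`, the agent, and the descriptors are stubbed here, since only the loop shape is the point:

```python
from concurrent.futures import ThreadPoolExecutor

def execute_transfer(agent, mem_descs, file_descs, name, op):
    # Stub standing in for the example's real transfer helper.
    return (name, op)

class Agent:
    name = "initiator"  # illustrative agent name

my_agent = Agent()
my_mem_descs, my_file_descs = object(), object()
iterations = 4

with ThreadPoolExecutor(max_workers=2) as executor:
    # First READ submitted before the loop, so s starts at 1 and the
    # "if s == 0" branch disappears from the loop body.
    futures = [
        executor.submit(
            execute_transfer, my_agent, my_mem_descs, my_file_descs,
            my_agent.name, "READ",
        )
    ]
    s = 1
    while s < iterations:
        futures.append(
            executor.submit(
                execute_transfer, my_agent, my_mem_descs, my_file_descs,
                my_agent.name, "READ",
            )
        )
        s += 1

results = [f.result() for f in futures]
```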


```python
end_time = time.time()

logger.info(f"Time for {iterations} iterations: {end_time - start_time} seconds")
```
Contributor

f-string -> %s logger formatting

### Pipelining

To improve the performance of the remote storage server, we can pipeline operations to the network and to storage, allowing multiple threads to handle each request. However, to maintain correctness, the network and storage steps must still execute in order within each individual remote storage operation. To do this, we implemented a simple pipelining scheme.
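A toy sketch of that idea; the stage functions below are hypothetical, not the example's API. Each request's network and storage stages run back-to-back in one worker, so they stay ordered per request, while the pool overlaps different requests:

```python
from concurrent.futures import ThreadPoolExecutor

def network_stage(req_id):
    # Stand-in for moving one request's data over the network.
    return f"payload-{req_id}"

def storage_stage(req_id, payload):
    # Stand-in for reading/writing that request's data to storage.
    return f"stored {payload}"

def handle(req_id):
    # Network, then storage, strictly in order for this one request.
    return storage_stage(req_id, network_stage(req_id))

with ThreadPoolExecutor(max_workers=4) as pool:
    # Different requests are pipelined across workers concurrently.
    results = list(pool.map(handle, range(8)))
```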

Contributor

@tstamler please can you add some more description on how this is implemented with NIXL in this example?

Contributor Author

added a brief description


### Performance Tips

For high-speed storage and network hardware, you may need to tweak performance with a couple of environment variables.
Contributor

Please provide a suggested list of environment variables, e.g. an example configuration for CX-7/SPX, along with the UCX and GDS tuning that has been seen to be beneficial.

Contributor Author

The next two lines are these suggestions. I don't know what other example configuration options you might be talking about.


For high-speed storage and network hardware, you may need to tweak performance with a couple of environment variables.

First, for optimal GDS performance, ensure you are using the GDS_MT backend with default concurrency. Additionally, you can use the cufile options described in the [GDS README](https://github.com/ai-dynamo/nixl/blob/main/src/plugins/cuda_gds/README.md).
Contributor

We should also highlight that the GDS plugins work in compat mode by default, and explain what is required for true GDS support.

Contributor Author

This sounds GDS specific and should be in the GDS README. I can at least add a reminder here.
