@oleksandr-pavlyk
Contributor

While working on stand-alone C++ code that executes what test_graph.py does in gh-843, I noticed that add_child passes the dependencies extracted from the capturing stream in a way that is inconsistent with the num_dependencies parameter obtained from the same cuStreamGetCaptureInfo call.

Incidentally, after correcting this error, I can no longer reproduce the errors reported in gh-843.

Description

closes #843

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@copy-pr-bot
Contributor

copy-pr-bot bot commented Aug 21, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@oleksandr-pavlyk
Contributor Author

@pciolkosz could you take a look as well, please

@oleksandr-pavlyk
Contributor Author

/ok to test

handle_return(
driver.cuGraphAddChildGraphNode(
- graph_out, deps_info_out[0], num_dependencies_out, child_graph._mnff.graph
+ graph_out, *deps_info_out, num_dependencies_out, child_graph._mnff.graph
Member

It was done like this because during the CUDA 13 bring-up @rwgk wanted to make cuda-core work for both 12 and 13 (which have different signatures) without revealing anything about 13 (#722). @vzhurba01 expressed concerns that make sense to me. Now that we have been bitten by this and 13 is out, we should properly check the binding version and not try to hide the differences.

Contributor Author

@oleksandr-pavlyk oleksandr-pavlyk Aug 21, 2025

The code that hides the discrepancy is actually a few lines below, where we pad deps_info_update with None to match the size of deps_info_out.

Using deps_info_out[0] together with num_dependencies_out is a mistake. We could keep deps_info_out[0] by changing num_dependencies_out to 1, but that runs the risk of missing dependencies.
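
To illustrate the padding mentioned above, here is a minimal sketch that uses plain Python lists in place of the driver outputs. The helper name pad_with_none and the stand-in values are hypothetical and not part of cuda.core; only the idea of padding deps_info_update with None up to the length of deps_info_out comes from this thread.

```python
def pad_with_none(items, target_len):
    """Return a copy of items, padded with None up to target_len entries."""
    items = list(items)
    # A non-positive difference yields an empty padding list.
    return items + [None] * (target_len - len(items))


# Stand-in values (assumptions); real entries would come from the driver.
deps_info_out = ["deps", "edge_data"]
deps_info_update = pad_with_none(["new_node"], len(deps_info_out))
assert len(deps_info_update) == len(deps_info_out)
```

The padding keeps the two sequences the same length, so code that unpacks them into a later call would not need to know how many outputs the binding returned.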

Collaborator

This looks good to me.

We could branch with some sort of CTK < 13, but the resulting copy-paste introduces its own class of potential accidents. I'm not sure it'll be less error prone than the compact implementation that we have right now.

Where I went wrong in #722: I wasn't careful enough about reviewing the documentation for cuGraphAddChildGraphNode. I think even if I had taken the copy-paste route, I might have made this mistake (sorry). When the tests passed, I didn't look any further.

@leofang leofang added bug Something isn't working P0 High priority - Must do! cuda.core Everything related to the cuda.core module labels Aug 21, 2025
@leofang leofang added this to the cuda.core beta 7 milestone Aug 21, 2025
@oleksandr-pavlyk
Contributor Author

oleksandr-pavlyk commented Aug 21, 2025

Ok, all CTK 13.0 tests failed. This is because:

  1. For CTK 12.9, deps_info_out has a length of num_dependencies, see the 12.9 driver API.
  2. For CTK 13.0, deps_info_out has a length of 2 * num_dependencies, where the extra elements are edge data, see the 13.0 driver API.
  3. cuGraphAddChildGraphNode only takes num_dependencies of these, per the CUDA Driver API.

So the solution would be to replace *deps_info_out in the call to cuGraphAddChildGraphNode with *deps_info_out[:num_dependencies].
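
A rough sketch of that slicing, taking the lengths described above at face value. It uses plain lists with made-up string entries instead of real driver handles, and the helper name leading_dependencies is hypothetical; it does not call the driver.

```python
def leading_dependencies(deps_info_out, num_dependencies):
    """Keep only the first num_dependencies entries of deps_info_out."""
    if num_dependencies > len(deps_info_out):
        raise ValueError("num_dependencies exceeds the number of entries")
    return list(deps_info_out[:num_dependencies])


# Stand-in values only (assumptions); real entries come from the driver.
ctk12_like = ["node_a", "node_b"]                      # length == num_dependencies
ctk13_like = ["node_a", "node_b", "edge_a", "edge_b"]  # length == 2 * num_dependencies

# Both shapes reduce to the same leading entries.
assert leading_dependencies(ctk12_like, 2) == ["node_a", "node_b"]
assert leading_dependencies(ctk13_like, 2) == ["node_a", "node_b"]
```

With the slice in place, the same call site would work for either length without branching on the CTK version.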

Will push in a second.

@oleksandr-pavlyk
Contributor Author

/ok to test

@leofang leofang requested a review from vzhurba01 August 21, 2025 15:19
@oleksandr-pavlyk oleksandr-pavlyk moved this from Todo to In Review in CCCL Aug 21, 2025
Member

@leofang leofang left a comment

@copilot please add release note

@leofang
Member

leofang commented Aug 21, 2025

nvm the copilot does not work like this, I'll ask it to fix release notes in a separate PR

@leofang leofang merged commit 05952a3 into NVIDIA:main Aug 21, 2025
48 checks passed
@github-project-automation github-project-automation bot moved this from In Review to Done in CCCL Aug 21, 2025
@leofang
Member

leofang commented Aug 21, 2025

Thanks Sasha and all!

Copilot AI pushed a commit that referenced this pull request Aug 21, 2025
* Fix an apparent mistake in GraphBuilder.add_child

* Implemented fix to work with both CTK 12.9 and CTK 13.0
@oleksandr-pavlyk oleksandr-pavlyk deleted the tentative-fix-for-gh-843 branch August 21, 2025 18:37
Labels

bug Something isn't working cuda.core Everything related to the cuda.core module P0 High priority - Must do!

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Out of bound access in the test_graph_update test

4 participants