-
Notifications
You must be signed in to change notification settings - Fork 230
Fix an apparent mistake in GraphBuilder.add_child #879
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix an apparent mistake in GraphBuilder.add_child #879
Conversation
While working on C++ stand-alone code executing what `test_graph.py` does in NVIDIAgh-843, I noticed that `add_child` passes dependendencies extracted from capturing stream inconsistently with num_dependencies parameter obtained in the same cuStreamGetCaptureInfo call. Incidentally, after correcting this error, I can no longer reproduce errors reported in NVIDIAgh-843
|
Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
@pciolkosz could you take a look as well, please |
|
/ok to test |
| handle_return( | ||
| driver.cuGraphAddChildGraphNode( | ||
| graph_out, deps_info_out[0], num_dependencies_out, child_graph._mnff.graph | ||
| graph_out, *deps_info_out, num_dependencies_out, child_graph._mnff.graph |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was done like this because during CUDA 13 bringup @rwgk wanted to make cuda-core work for both 12 and 13 (which have different signatures) without revealing anything about 13 (#722). @vzhurba01 expressed concerns that make sense to me. Now that we are bitten by this and that 13 is out, we should properly check the binding version and not try to hide the differences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way to hide discrepancy is actually few lines below, where we pad deps_info_update with None to match the size of deps_info_out.
Using deps_info_out[0] while using num_dependencies_out is a mistake. We could keep deps_info_out[0] by change num_dependencies_out to 1. This runs the risk of missing dependencies though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me.
We could branch with some sort of CTK < 13, but the resulting copy-paste introduces its own class of potential accidents. I'm not sure it'll be less error prone than the compact implementation that we have right now.
Where I went wrong in #722: I wasn't careful enough about reviewing the documentation for cuGraphAddChildGraphNode. I think even if I had taken the copy-paste route, I might have made this mistake (sorry). When the tests passed, I didn't look any further.
This comment has been minimized.
This comment has been minimized.
|
Ok, all CTK 13.0 tests failed. This is because:
So the solution would be to replace Will push in a second. |
|
/ok to test |
leofang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot please add release note
|
nvm the copilot does not work like this, I'll ask it to fix release notes in a separate PR |
|
Thanks Sasha and all! |
* Fix an apparent mistake in GraphBuilder.add_child While working on C++ stand-alone code executing what `test_graph.py` does in gh-843, I noticed that `add_child` passes dependendencies extracted from capturing stream inconsistently with num_dependencies parameter obtained in the same cuStreamGetCaptureInfo call. Incidentally, after correcting this error, I can no longer reproduce errors reported in gh-843 * Implemented fix to work with both CTK 12.9 and CTK 13.0
|
While working on C++ stand-alone code executing what
test_graph.pydoes in gh-843, I noticed thatadd_childpasses dependendencies extracted from capturing stream inconsistently with num_dependencies parameter obtained in the same cuStreamGetCaptureInfo call.Incidentally, after correcting this error, I can no longer reproduce errors reported in gh-843
Description
closes #843
Checklist