Skip to content

Conversation

@jtamzn
Copy link
Contributor

@jtamzn jtamzn commented Feb 6, 2023

This patch to fix compilation issue in #11381

bot:notacherrypick

@github-actions
Copy link

github-actions bot commented Feb 6, 2023

Hello! The Git Commit Checker CI bot found a few problems with this PR:

e492149: Fix compilation issue in OFI with CUDA

  • check_signed_off: does not contain a valid Signed-off-by line
  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

@github-actions
Copy link

github-actions bot commented Feb 6, 2023

Hello! The Git Commit Checker CI bot found a few problems with this PR:

3e1d43c: Fix compilation issue in OFI with CUDA

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

@jsquyres
Copy link
Member

jsquyres commented Feb 6, 2023

Thanks for the fix! Does this fix also apply to the main branch?

@jsquyres jsquyres added this to the v4.1.5 milestone Feb 6, 2023
@jsquyres jsquyres changed the title Fix compilation issue in OFI with CUDA v4.1.x: Fix compilation issue in OFI with CUDA Feb 6, 2023
@github-actions
Copy link

github-actions bot commented Feb 6, 2023

Hello! The Git Commit Checker CI bot found a few problems with this PR:

3e1d43c: Fix compilation issue in OFI with CUDA

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

@wenduwan
Copy link
Contributor

wenduwan commented Feb 6, 2023

Thanks for the fix! Does this fix also apply to the main branch?

Thanks for looking at the issue! main branch does not have this problem so we do not need to apply the fix.

@jtamzn
Copy link
Contributor Author

jtamzn commented Feb 6, 2023

@jsquyres no, we believe main/5.0.x is good. I think it only applies to v4 branch.

@jsquyres
Copy link
Member

jsquyres commented Feb 6, 2023

Ok, if it's not needed on main, then bot:notacherrypick is appropriate -- I see you already added that.

@bwbarrett bwbarrett self-assigned this Feb 6, 2023
Copy link
Member

@bwbarrett bwbarrett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the wrong fix for what is actually a bad merge in 55d3501.

@bwbarrett
Copy link
Member

Yeah, there's no reason for the OFI MTL on the 4.1.x series to have any CUDA code in it at all. It shouldn't be closing it, and it looks like it's just a bad merge that brought it in.

@jtamzn
Copy link
Contributor Author

jtamzn commented Feb 6, 2023

Yeah, there's no reason for the OFI MTL on the 4.1.x series to have any CUDA code in it at all. It shouldn't be closing it, and it looks like it's just a bad merge that brought it in.

Oh I forget switching github account.

Actually this sounds more reasonable to me. I was thinking that could be a backported CUDA code. If not, we should remove it.

@jtamzn
Copy link
Contributor Author

jtamzn commented Feb 7, 2023

@bwbarrett I updated the PR to remove blocking cuda related code here. However, upon checking the file, there is an additional block also calling in Line 911 as

#if OPAL_CUDA_SUPPORT
    /**
     * Some providers do not require the use of the CUDA convertor
     * in OMPI and its use will cause performance degradation. The
     * following providers will disable it when selected.
     */
    if (!strncmp(prov->fabric_attr->prov_name, "psm3", 4)
        || !strncmp(prov->fabric_attr->prov_name, "psm2", 4))
    {
        ompi_mtl_ofi.base.mtl_flags |= MCA_MTL_BASE_FLAG_CUDA_INIT_DISABLE;
    }
#endif /* OPAL_CUDA_SUPPORT */

It doesn't preventing compilation and compiled bin can pass OSU microbenchmark. Are these lines legit here? If not, we should remove them, too.

@bwbarrett
Copy link
Member

bwbarrett commented Feb 7, 2023

@bwbarrett I updated the PR to remove blocking cuda related code here. However, upon checking the file, there is an additional block also calling in Line 911 as

I think that code should be fine, and since it doesn't reference any CUDA calls, doesn't need the include you asked about.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants