@artpol84
Contributor

@artpol84 artpol84 commented Apr 20, 2021

See #8823 for the details.

bot:notacherrypick

@artpol84
Contributor Author

The fixes for other branches will follow.

@artpol84 artpol84 requested review from hppritcha and rhc54 April 20, 2021 18:50
@artpol84
Contributor Author

I'm waiting for the guys from LANL to confirm that this fixes their problem (should be later today).
Once I hear from them, I'll remove the "WIP" label and open PRs to the other branches.

@artpol84
Contributor Author

@rhc54 what is "PMIX_ID" and do we want to still check against it?

@artpol84 artpol84 force-pushed the topic/v4.1.x/fix_pmix_detection branch 2 times, most recently from 8a14505 to 52c8340 April 20, 2021 19:02
@rhc54
Contributor

rhc54 commented Apr 20, 2021

> @rhc54 what is "PMIX_ID" and do we want to still check against it?

I have no idea - it isn't even something we deprecated. I'd remove it.

@artpol84 artpol84 force-pushed the topic/v4.1.x/fix_pmix_detection branch from 52c8340 to cc45161 April 20, 2021 22:03
See open-mpi#8823 for the details.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
See open-mpi#8823 for more details.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
@artpol84 artpol84 force-pushed the topic/v4.1.x/fix_pmix_detection branch from cc45161 to 2210251 April 20, 2021 22:04
@artpol84
Contributor Author

Just got confirmation that this indeed fixes the issue on the system where the problem was observed:

$> OMPI_MCA_pmix_base_verbose=100 srun -n 2 --mpi=pmix_v3 ./a.out
...
[sn001.localdomain:35359] mca:base:select: Auto-selecting pmix components
[sn001.localdomain:35359] mca:base:select:( pmix) Querying component [isolated]
[sn001.localdomain:35359] mca:base:select:( pmix) Query of component [isolated] set priority to 0
[sn001.localdomain:35359] mca:base:select:( pmix) Querying component [pmix3x]
[sn001.localdomain:35359] mca:base:select:( pmix) Query of component [pmix3x] set priority to 100
[sn001.localdomain:35359] mca:base:select:( pmix) Querying component [s1]
[sn001.localdomain:35359] mca:base:select:( pmix) Query of component [s1] set priority to 10
[sn001.localdomain:35359] mca:base:select:( pmix) Querying component [s2]
[sn001.localdomain:35359] mca:base:select:( pmix) Selected component [pmix3x]
...
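
For anyone who wants to check the same thing on their own system, here is a minimal sketch that pulls the queried priorities and the selected component out of verbose output like the above. It assumes the log lines keep exactly the format shown in this comment; `summarize_selection`, the two regexes, and `SAMPLE_LOG` are illustrative names, not part of Open MPI.

```python
import re

# Assumed line formats, taken from the verbose output quoted above.
QUERY_RE = re.compile(
    r"Query of component \[(?P<name>[^\]]+)\] set priority to (?P<prio>-?\d+)"
)
SELECT_RE = re.compile(r"Selected component \[(?P<name>[^\]]+)\]")

# Sample input copied from the log in this comment (elided parts omitted).
SAMPLE_LOG = """\
[sn001.localdomain:35359] mca:base:select: Auto-selecting pmix components
[sn001.localdomain:35359] mca:base:select:( pmix) Querying component [isolated]
[sn001.localdomain:35359] mca:base:select:( pmix) Query of component [isolated] set priority to 0
[sn001.localdomain:35359] mca:base:select:( pmix) Querying component [pmix3x]
[sn001.localdomain:35359] mca:base:select:( pmix) Query of component [pmix3x] set priority to 100
[sn001.localdomain:35359] mca:base:select:( pmix) Querying component [s1]
[sn001.localdomain:35359] mca:base:select:( pmix) Query of component [s1] set priority to 10
[sn001.localdomain:35359] mca:base:select:( pmix) Querying component [s2]
[sn001.localdomain:35359] mca:base:select:( pmix) Selected component [pmix3x]
"""


def summarize_selection(log_text):
    """Return (priorities, selected) parsed from verbose selection output.

    priorities maps component name -> reported priority; components that
    were queried but reported no priority line are simply absent.
    selected is the name from the last "Selected component" line, or None.
    """
    priorities = {}
    selected = None
    for line in log_text.splitlines():
        match = QUERY_RE.search(line)
        if match:
            priorities[match.group("name")] = int(match.group("prio"))
            continue
        match = SELECT_RE.search(line)
        if match:
            selected = match.group("name")
    return priorities, selected


if __name__ == "__main__":
    priorities, selected = summarize_selection(SAMPLE_LOG)
    print("priorities:", priorities)
    print("selected:", selected)
```

In the sample above, the selected component (pmix3x) is also the one that reported the highest priority, which is what we want to see with `--mpi=pmix_v3`.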

@artpol84 artpol84 added this to the v4.1.1 milestone Apr 20, 2021
@ibm-ompi

The IBM CI (PGI) build failed! Please review the log, linked below.

Gist: https://gist.github.com/ibm-ompi/88b480fb98dfd2bdac939f023e1f8fe6

@jsquyres
Member

Which release branches is this PR relevant to?

@artpol84
Contributor Author

I think all of them

@artpol84
Contributor Author

I'm going to create PRs for the 4.x and 3.x branches

@artpol84
Contributor Author

@jjhursey @gpaulsen please help with the IBM failure

@jsquyres
Member

Ok. After this PR is merged, can you cherry-pick to all relevant release branches? Thanks.

@artpol84
Contributor Author

@jsquyres already opened.
I guess we don't want to go as far as v2.x, right?

@jsquyres
Member

> @jsquyres already opened.
> I guess we don't want to go as far as v2.x, right?

Correct.

@jjhursey
Member

bot:ibm:pgi:retest

@jsquyres
Member

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@artpol84
Contributor Author

bot:mellanox:retest

@artpol84
Contributor Author

bot:mellanox:retest

@jsquyres jsquyres merged commit 6ce9b9d into open-mpi:v4.1.x Apr 22, 2021