Skip to content

Conversation

@awlauria
Copy link
Contributor

@awlauria awlauria commented Aug 12, 2021

The following command:

mpirun -host hostA:0,hostB:6 ./x

was launching: (num_cores on hostA) + (6 hostB ranks)
instead of only the expected 6 ranks on hostB.

This works correctly with -hostfile with "hostA slots=0".
This patch just makes the behaviors consistent.

bot:notacherrypick

Co-authored-by: Austen Lauria [email protected]

Signed-off-by: Austen Lauria [email protected]

@awlauria awlauria added this to the v4.0.7 milestone Aug 12, 2021
node->name = strdup(orte_process_info.nodename);
node->index = ORTE_PROC_MY_NAME->vpid;
node->slots = 0;
ORTE_FLAG_SET(node, ORTE_NODE_FLAG_SLOTS_GIVEN);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you really want to do this - setting that flag means we won't sense the number of available slots by counting the number of cores. So basically you are "locking" mpirun itself to 0 slots. IIRC, the object constructor sets slots to zero - otherwise, this initializer is fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - I think you're right and these lines aren't needed for the patch. Verifying..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

The following command:

mpirun -host hostA:0,hostB:6 ./x

was launching: (num_cores on hostA) + (6 hostB ranks)
instead of only the expected 6 ranks on hostB.

This works correctly with -hostfile with "hostA slots=0".
This patch just makes the behaviors consistent.

Co-authored-by: Austen Lauria <[email protected]>

Signed-off-by: Austen Lauria <[email protected]>
@awlauria
Copy link
Contributor Author

bot:aws:retest

more java.

@rhc54
Copy link
Contributor

rhc54 commented Aug 12, 2021

FWIW: I ported this over to PRRTE

@awlauria
Copy link
Contributor Author

@rhc54 thanks! I can port the others as well if you don't have the time to do it.

@rhc54
Copy link
Contributor

rhc54 commented Aug 12, 2021

@rhc54 thanks! I can port the others as well if you don't have the time to do it.

I ported #9232 - were there any others?

@awlauria
Copy link
Contributor Author

awlauria commented Aug 13, 2021

Does #9222 apply as well?

@rhc54
Copy link
Contributor

rhc54 commented Aug 13, 2021

Does #9222 apply as well?

No - PRRTE doesn't modify LD_LIBRARY_PATH the way ORTE did.

@jjhursey
Copy link
Member

Does PR #9234 need to be updated?

@awlauria
Copy link
Contributor Author

@jjhursey #9234 is a cherry-pick of this one - so they should be the same.

@gpaulsen gpaulsen merged commit e5a33c1 into open-mpi:v4.0.x Aug 18, 2021
@awlauria awlauria deleted the fix_host_parse branch March 17, 2022 17:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants