Skip to content

Conversation

@yosefe
Copy link
Contributor

@yosefe yosefe commented Jan 28, 2021

Why

Allow MPI runtime skip request leak check, which will silence errors like reported in open-mpi/ompi#8426

@yosefe yosefe added the API label Jan 28, 2021
@shamisp
Copy link
Contributor

shamisp commented Jan 28, 2021

@yosefe - Why MPI_Finalize does not cleanup the resources ? AFAIK, It supposed to track communicators and clean the resources. Adding @hppritcha

* @ingroup UCP_WORKER
* @brief UCP worker flags
*
* This enumeration allows pecifying flags for @ref ucp_worker_params_t.flags,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

specifying


uct_thread_mode = UCS_THREAD_MODE_SINGLE;
worker->flags = 0;
worker->flags = (params->field_mask & UCP_WORKER_PARAM_FIELD_FLAGS) ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: you can use UCP_PARAM_VALUE

UCP_WORKER_FLAG_EXTERNAL_EVENT_FD = UCS_BIT(0), /**< worker event fd is external */
UCP_WORKER_FLAG_EDGE_TRIGGERED = UCS_BIT(1), /**< events are edge-triggered */
UCP_WORKER_FLAG_MT = UCS_BIT(2) /**< MT locking is required */
/** MT locking is required */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are values moved to allow combining with new params->flags?
Can you please add a comment and some check they are not mixed?

@yosefe
Copy link
Contributor Author

yosefe commented Jan 29, 2021

@yosefe - Why MPI_Finalize does not cleanup the resources ? AFAIK, It supposed to track communicators and clean the resources. Adding @hppritcha

Some applications do not release all requests [even though required by MPI standard], and other MPI implementations do not warn about this. There are user applications that start to spew these warnings when ran with UCX, and users see this as UCX issue, and in cases just silence them by setting UCX_LOG_LEVEL=error

See similar reports:

Since that kind of leak does would not usually have side effects (MPI_Finalize is executed right before process exit), it would be better to not print it in case of MPI.

@shamisp
Copy link
Contributor

shamisp commented Jan 29, 2021

I think part of the annoyance here (from user perspective) is that it generates tons of errors... Would it be better to generate one-liner error: your application leaked NN requests. @hppritcha @pavanbalaji is the preference from MPI community to silence this ? thanks !

@pavanbalaji
Copy link
Contributor

pavanbalaji commented Jan 30, 2021

I think part of the annoyance here (from user perspective) is that it generates tons of errors... Would it be better to generate one-liner error: your application leaked NN requests. @hppritcha @pavanbalaji is the preference from MPI community to silence this ? thanks !

MPICH only generates this when the user provides a special configure flag, not by default. From MPI's perspective, it doesn't matter if you throw an error in UCX or not. We keep track of MPI requests internally and would throw a warning (if the user specified the configure flag) anyway. The UCX warning would be redundant in that case.

We could also release the UCX requests internally in MPI_Finalize as you suggested. So it shouldn't matter one way or another for MPI libraries. But I'm not an MPI library developer anymore, so I can't speak for them. :-)

@shamisp
Copy link
Contributor

shamisp commented Jan 30, 2021

I think part of the annoyance here (from user perspective) is that it generates tons of errors... Would it be better to generate one-liner error: your application leaked NN requests. @hppritcha @pavanbalaji is the preference from MPI community to silence this ? thanks !

MPICH only generates this when the user provides a special configure flag, not by default. From MPI's perspective, it doesn't matter if you throw an error in UCX or not. We keep track of MPI requests internally and would throw a warning (if the user specified the configure flag) anyway. The UCX warning would be redundant in that case.

We could also release the UCX requests internally in MPI_Finalize as you suggested. So it shouldn't matter one way or another for MPI libraries. But I'm not an MPI library developer anymore, so I can't speak for them. :-)

As I read - MPICH detects the leak but does not really clean it up.

@hppritcha
Copy link
Contributor

I think part of the annoyance here (from user perspective) is that it generates tons of errors... Would it be better to generate one-liner error: your application leaked NN requests. @hppritcha @pavanbalaji is the preference from MPI community to silence this ? thanks !

preference for silence from the OMPI side.

@yosefe
Copy link
Contributor Author

yosefe commented Feb 4, 2021

@shamisp are we good to go?

@shamisp
Copy link
Contributor

shamisp commented Feb 4, 2021

👍 from my end. @tonycurtis can you please review documentation.

Copy link
Contributor

@tonycurtis tonycurtis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ucp.h:1137: could simplify to "if UCP_WORKER_......." not set, defaults to 0 ?

@yosefe
Copy link
Contributor Author

yosefe commented Feb 4, 2021

test output:

[==========] Running 2 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 2 tests from all/test_ucp_worker_request_leak
[ RUN      ] all/test_ucp_worker_request_leak.tag_send_recv/0 <all/leak_check>
[     INFO ] send req: 0x1e0e040, recv req: 0x1e18080
[     INFO ] < object 0x1e0df40 was not returned to mpool ucp_requests >
[     INFO ] < object 0x1e17f80 was not returned to mpool ucp_requests >
[       OK ] all/test_ucp_worker_request_leak.tag_send_recv/0 (371 ms)
[ RUN      ] all/test_ucp_worker_request_leak.tag_send_recv/1 <all/leak_ignore>
[     INFO ] send req: 0x288b040, recv req: 0x2895080
[       OK ] all/test_ucp_worker_request_leak.tag_send_recv/1 (373 ms)
[----------] 2 tests from all/test_ucp_worker_request_leak (744 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test case ran. (744 ms total)
[  PASSED  ] 2 tests.

@yosefe
Copy link
Contributor Author

yosefe commented Feb 5, 2021

@tonycurtis fixed the comment
@brminich can you pls review the gtest?

return get_variant_value(0) == LEAK_IGNORE;
}

/// @override
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor extra / here and below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's supposed to be doxgen comment
https://www.doxygen.nl/manual/docblocks.html

A third alternative is to use a block of at least two C++ comment lines, where each line starts with an additional slash or an exclamation mark. Here are examples of the two cases:

///
/// ... text ...
///

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aa, ok

Some runtimes, such as MPI, allow exiting the application without
releasing all allocated requests.
@yosefe yosefe force-pushed the topic/ucp-api-flag-to-silence-request-leak branch from 7a382c3 to ebff59d Compare February 5, 2021 18:38
@yosefe yosefe merged commit ced9ac6 into openucx:master Feb 6, 2021
@yosefe yosefe deleted the topic/ucp-api-flag-to-silence-request-leak branch February 6, 2021 15:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants