Skip to content

Conversation

@vspetrov
Copy link
Collaborator

@vspetrov vspetrov commented May 5, 2022

What

A set of gtest improvements:

  • Switch to gtest release v1.10.x that has "GTEST_SKIP" macro
  • Cleanup/refactor mc_reduce test suite: split cuda/host test types_ops enumerations. (this will be used in PR API: add float128 and float32(64,128)_complex dt #492 )
  • Add checks for availability of mc cuda component at runtime and proper gtest_skip. This way we can launch gtest on systems with and w/o cuda w/o failures (that we might have in current master)

@vspetrov
Copy link
Collaborator Author

bot:retest

@vspetrov vspetrov force-pushed the topic/gtest_update branch from 9ebe916 to 548c0ab Compare May 12, 2022 09:10
@vspetrov
Copy link
Collaborator Author

@Sergei-Lebedev @bureddy @manjugv plz review this PR. It is large but simple. All the changes in gtest.h and gtest-all.cc (which are biggest part of the PR) can be ignored in review, since this is just an update of the googletest to the official v1.10 release.

All the rest is simple: mostly refactoring of reduction tests - making groups of type+op pairs for cuda/host cases; and adding proper GTEST_SKIP in the unsupported cases.

1 failure in openucx.ucc azure check should also be ignored - there is timeout in clang-format check. apparently 15K lines of code too much for git-clang-format to handle in 60mins.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we call ucc_collective_finalize to free all successfully initialized requests

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Destructor of UCC_REQ is called later and it takes care of reqs finalize

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some pairs are missing compared to original version e.g. UCC_DT_INT16 max

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know. We never had ALL the combinations (to limit the time spent in collectives with reductions). I changed it that way: we now test ALL possible combos with test_mc_reduce, AND we test "every possible dtype" and "every possible op" in the reduce coll tests (reduce, allreduce, reduce_scatter), but now every possible combination of them.
Also i added explicitly INT32, FLOAT32, FLOAT64 with all arithmetic ops since those are most commonly used. If you think we should add more plz comment - i will extend.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we make pairs so that reduction is always supported then this check is not needed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For test_mc_reduce (not collective but just reduction backend) i don't enumerate separate dtypes for CUDA. I have the enumerations for ALL possible pairs, but skip not-supported. I could create separate set of testing::types for CUDA only. It will be little bit more explicit but then i can remove test skip here. What do you prefer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, keep it as is pls, i thought we support all ops and dtypes in cuda

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing UCC_DT_BFLOAT16

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PREDEFINED_TYPES macro is only used with collectives that do not do any reductions. Do you think we need to add?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, why not, we anyway test different datatypes and I don't think we test bfloat16 copy operation somewhere else

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@vspetrov vspetrov force-pushed the topic/gtest_update branch from 548c0ab to 1934021 Compare May 16, 2022 07:27
@vspetrov vspetrov requested a review from Sergei-Lebedev May 16, 2022 07:27
@vspetrov vspetrov force-pushed the topic/gtest_update branch 2 times, most recently from 106219b to 00f69e8 Compare May 16, 2022 11:56
Valentin Petrov added 3 commits May 18, 2022 22:01
    Don't list all the skipped tests by default
    Refactor reduce-ops tests (to support custom tests for CUDA/HOST)
    Use GTEST_SKIP and mc_available checks to properly skip
    unsupported cases
@vspetrov vspetrov force-pushed the topic/gtest_update branch from 00f69e8 to efbff3d Compare May 18, 2022 19:01
@vspetrov
Copy link
Collaborator Author

Merged w/o codestyle/openucx.ucc pass - both time out on the code style format - too large change in gtest.h/gtest_all.cc.

@vspetrov vspetrov merged commit e6f32f1 into openucx:master May 18, 2022
@vspetrov vspetrov deleted the topic/gtest_update branch May 18, 2022 20:31
@vspetrov vspetrov restored the topic/gtest_update branch May 19, 2022 08:15
@vspetrov vspetrov deleted the topic/gtest_update branch May 25, 2022 11:53
kingchc pushed a commit to facebookresearch/ucc that referenced this pull request Jul 20, 2022
* TEST: update to gtest 1.10

* TEST: gtest_print_skipped

    Don't list all the skipped tests by default

* TEST: gtest improvements

    Refactor reduce-ops tests (to support custom tests for CUDA/HOST)
    Use GTEST_SKIP and mc_available checks to properly skip
    unsupported cases
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants