STK_mesh_base Mac build errors #13324

Closed
spdomin opened this issue Aug 5, 2024 · 12 comments
Labels
pkg: STK type: bug The primary issue is a bug in Trilinos code or tests

spdomin commented Aug 5, 2024

Bug Report

@alanw0, after a hiatus in supporting nightly Mac builds, I noticed a few build errors in stk_mesh_base. I have not bisected since it should be simple to resolve.

Description

[ 79%] Building CXX object packages/stk/stk_mesh/stk_mesh/base/CMakeFiles/stk_mesh_base.dir/__/baseImpl/DeletedEntityCache.cpp.o
In file included from /Users/naluIt/gitHubWork/nightlyBuildAndTest/Trilinos/packages/stk/stk_mesh/stk_mesh/baseImpl/DeletedEntityCache.cpp:2:
In file included from /Users/naluIt/gitHubWork/nightlyBuildAndTest/Trilinos/packages/stk/stk_mesh/stk_mesh/baseImpl/MeshModification.hpp:44:
/Users/naluIt/gitHubWork/nightlyBuildAndTest/Trilinos/packages/stk/stk_mesh/stk_mesh/baseImpl/DeletedEntityCache.hpp:47:14: error: no template named 'unordered_map' in namespace 'std'
typedef std::unordered_map<EntityKey, Entity::entity_value_type, std::hash> GhostReuseMap;
~~~~~^
/Users/naluIt/gitHubWork/nightlyBuildAndTest/Trilinos/packages/stk/stk_mesh/stk_mesh/baseImpl/DeletedEntityCache.cpp:14:22: error: type 'stk::mesh::GhostReuseMap' (aka 'int') does not provide a subscript operator
m_ghost_reuse_map[m_bulkData.entity_key(entity)] = entity.local_offset();
~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/naluIt/gitHubWork/nightlyBuildAndTest/Trilinos/packages/stk/stk_mesh/stk_mesh/baseImpl/DeletedEntityCache.cpp:40:26: error: invalid range expression of type 'int'; no viable 'begin' function available
for (auto keyAndOffset : m_ghost_reuse_map) {
^ ~~~~~~~~~~~~~~~~~
/Users/naluIt/gitHubWork/nightlyBuildAndTest/Trilinos/packages/stk/stk_mesh/stk_mesh/baseImpl/DeletedEntityCache.cpp:43:20: error: member reference base type 'stk::mesh::GhostReuseMap' (aka 'int') is not a structure or union
m_ghost_reuse_map.clear();

Steps to Reproduce

commit f70ac4d14e3b37e0f16e52d8095bf4b9a97da549
Merge: 24eb97a b0895ea

@spdomin spdomin added the type: bug The primary issue is a bug in Trilinos code or tests label Aug 5, 2024
@alanw0 alanw0 added the pkg: STK label Aug 5, 2024

alanw0 commented Aug 5, 2024

Thanks for the report. This is a clear case of a missing include.
I'm surprised it works with other compilers; I guess they pick up the header through indirect includes.
I'll get an update in as soon as I can.
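
The fix described here, including the standard header directly instead of relying on another header to pull it in, can be sketched as follows. This is a minimal illustration, not the actual STK change: the `demo` namespace, the `Key` type, the `ReuseMap` alias, and the `remember` helper are all hypothetical stand-ins.

```cpp
// Include every standard header this file uses directly. Whether another
// header happens to pull in <unordered_map> differs between standard
// library implementations, so relying on that is not portable.
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

namespace demo {  // hypothetical namespace, not stk::mesh

// Hypothetical stand-in for a key type used in a hash map.
struct Key {
  std::uint64_t id;
  bool operator==(const Key& other) const { return id == other.id; }
};

}  // namespace demo

// Specialize std::hash so the key type works with std::unordered_map.
namespace std {
template <>
struct hash<demo::Key> {
  std::size_t operator()(const demo::Key& k) const noexcept {
    return std::hash<std::uint64_t>{}(k.id);
  }
};
}  // namespace std

namespace demo {

using ReuseMap = std::unordered_map<Key, unsigned, std::hash<Key>>;

// Stores an offset for a key and returns the number of cached entries.
inline std::size_t remember(ReuseMap& map, Key key, unsigned offset) {
  map[key] = offset;
  return map.size();
}

}  // namespace demo
```

Without the `#include <unordered_map>` line, a compiler whose other headers do not include it transitively cannot resolve the alias, and every later use of the map fails as well, which matches the cascade of errors in the log above.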

spdomin commented Aug 9, 2024

Looks like this attempt to update STK failed.

alanw0 commented Aug 9, 2024

> Looks like this attempt to update STK failed.

Yeah, we've had some CMake/macro changes recently related to the Sierra CMake conversion. The Trilinos PR testing caught a unit-test failure that I'm fixing now.
I guess it will notify you when it succeeds.

spdomin commented Aug 13, 2024

I am now noting hangs on the new nightly process (non-Mac, simply Linux).

I will perform a bisect and see what is going on.

alanw0 commented Aug 13, 2024

> I am now noting hangs on the new nightly process (non-Mac, simply Linux).
>
> I will perform a bisect and see what is going on.

Ugh. Let me know what you find.

spdomin commented Aug 13, 2024

At least we have a small unit test that is showing the hang: Hex8Mesh.faceBasic

I have a couple more iterations. There was also a Kokkos snapshot...

@ndellingwood

@spdomin If the test failure is correlated with the Kokkos snapshot, the hang could indicate something like a View creation or destruction within a parallel region (the failure would, for example, be similar to #13328). If that is the case, there is a new tool being developed that helps detect those cases, kokkos/kokkos-tools#267. Running the hanging test with that tool loaded would help home in on the code locations to inspect.
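
The general remedy for this class of problem is to keep allocation and deallocation out of the parallel region. The sketch below shows that pattern with standard C++ threads only. It is an analogue under stated assumptions: it does not use Kokkos, it does not reproduce the deadlock, and `fill_squares` is a hypothetical helper invented for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: computes out[i] = i * i using several worker threads.
std::vector<double> fill_squares(std::size_t n, unsigned num_workers) {
  if (num_workers == 0) num_workers = 1;

  // Allocate the shared output once, before any worker starts.
  std::vector<double> out(n);

  std::vector<std::thread> workers;
  workers.reserve(num_workers);
  const std::size_t chunk = (n + num_workers - 1) / num_workers;

  for (unsigned w = 0; w < num_workers; ++w) {
    const std::size_t begin = std::min(n, w * chunk);
    const std::size_t end = std::min(n, begin + chunk);
    // Inside the parallel part, each worker only writes to its own
    // disjoint slice. It never allocates, resizes, or frees shared storage.
    workers.emplace_back([&out, begin, end] {
      for (std::size_t i = begin; i < end; ++i) {
        out[i] = static_cast<double>(i) * static_cast<double>(i);
      }
    });
  }

  // Wait for all workers before the buffer is returned or destroyed.
  for (auto& t : workers) t.join();
  return out;
}
```

For example, `fill_squares(10, 3)` returns a vector of ten values whose element at index 3 is 9.0. In Kokkos code the corresponding change would be to construct the View before the parallel dispatch and let it go out of scope only after the dispatch has finished.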

spdomin commented Aug 13, 2024

Great, I am on the final iteration:

commit f8ff2ad (HEAD)
Author: Nathan Ellingwood [email protected]
Date: Wed Aug 7 16:39:21 2024 -0600

stk: modify test to prevent allocation in parallel region

modify NgpMeshTest.volatileFastSharedCommMap to prevent allocation in a parallel region, which can result in deadlock with kokkos version 4.4
address issue #13328

Co-authored-by: Christian Trott <[email protected]>
Signed-off-by: Nathan Ellingwood <[email protected]>

and this code base actually does not build cleanly...

[ 45%] Building CXX object packages/kokkos/containers/src/CMakeFiles/kokkoscontainers.dir/impl/Kokkos_UnorderedMap_impl.cpp.o
In file included from /fgs/spdomin/nightly/Trilinos/packages/kokkos/core/src/View/MDSpan/Kokkos_MDSpan_Extents.hpp:25,
from /fgs/spdomin/nightly/Trilinos/packages/kokkos/core/src/Kokkos_View.hpp:40,
from /fgs/spdomin/nightly/Trilinos/packages/kokkos/core/src/Kokkos_Parallel.hpp:31,
from /fgs/spdomin/nightly/Trilinos/packages/kokkos/core/src/Kokkos_MemoryPool.hpp:26,
from /fgs/spdomin/nightly/Trilinos/packages/kokkos/core/src/Kokkos_TaskScheduler.hpp:34,
from /fgs/spdomin/nightly/Trilinos/packages/kokkos/core/src/Serial/Kokkos_Serial.hpp:37,
from /fgs/spdomin/nightly/Trilinos/packages/kokkos/core/src/decl/Kokkos_Declare_SERIAL.hpp:21,
from /fgs/spdomin/nightly/Trilinos/build_nightly_release_10.3.0/packages/kokkos/KokkosCore_Config_DeclareBackend.hpp:22,
from /fgs/spdomin/nightly/Trilinos/packages/kokkos/core/src/Kokkos_Core.hpp:45,
from /fgs/spdomin/nightly/Trilinos/packages/kokkos/containers/src/Kokkos_UnorderedMap.hpp:30,
from /fgs/spdomin/nightly/Trilinos/packages/kokkos/containers/src/impl/Kokkos_UnorderedMap_impl.cpp:21:
/fgs/spdomin/nightly/Trilinos/packages/kokkos/core/src/View/MDSpan/Kokkos_MDSpan_Header.hpp:47:10: fatal error: mdspan/mdspan.hpp: No such file or directory
47 | #include <mdspan/mdspan.hpp>

@crtrott or @alanw0 - how shall I proceed to bisect the new Nalu hang?

spdomin commented Aug 13, 2024

I will create a new issue

spdomin commented Aug 13, 2024

@alanw0 - STK is off the hook:) See: #13351

I may need to chat with you about this view-of-views thing... It reminds me of my photo-art that I often title "View from the [flower | stream | etc.]"

alanw0 commented Aug 14, 2024

> @alanw0 - STK is off the hook:) See: #13351
>
> I may need to chat with you about this view-of-views thing... It reminds me of my photo-art that I often title "View from the [flower | stream | etc.]"

I always like being off the hook.
But seriously, I'm pretty sure Nalu doesn't use the STK class that was the subject of issue #13328 mentioned above. That class (NgpMesh/HostMesh) is related to NGP code.

spdomin commented Aug 14, 2024

Looks like the Mac build and test is back online, thanks. I still see the same hang, which I will report in #13351.

@spdomin spdomin closed this as completed Aug 14, 2024