
arch: support rocm for gpu info #2261

Merged
merged 2 commits into from
Nov 10, 2023
Conversation


@mloubout mloubout commented Nov 7, 2023

Currently, the custom compiler removes the Xcompile flag needed for the host side, treating it as a duplicate flag.

Fixes #2260
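For reference, ROCm exposes GPU details through the `rocm-smi` command-line tool. Below is a minimal sketch of how such info could be gathered; the helper names and the exact output format are assumptions (the line format varies across ROCm versions), and this is not Devito's actual implementation:

```python
import re
import subprocess


def parse_rocm_smi(text):
    """Extract one dict per GPU from rocm-smi product-name output.

    Assumes lines shaped like 'GPU[0] : Card series: AMD Instinct MI250X'
    (the format is version-dependent, so this is only a sketch).
    """
    gpus = {}
    for line in text.splitlines():
        m = re.match(r"GPU\[(\d+)\]\s*:\s*([^:]+?)\s*:\s*(.+)", line)
        if m:
            idx = int(m.group(1))
            key, val = m.group(2).strip(), m.group(3).strip()
            gpus.setdefault(idx, {})[key] = val
    # Return a list ordered by GPU index
    return [gpus[i] for i in sorted(gpus)]


def get_amd_gpu_info():
    """Return parsed GPU info, or None if rocm-smi is unavailable."""
    try:
        out = subprocess.run(["rocm-smi", "--showproductname"],
                             capture_output=True, text=True,
                             check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_rocm_smi(out)
```

Keeping the parsing separate from the subprocess call makes the format-sensitive part testable without an AMD GPU present.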

@mloubout mloubout added the arch jitting, archinfo, ... label Nov 7, 2023

codecov bot commented Nov 7, 2023

Codecov Report

Merging #2261 (d0002b2) into master (e707027) will decrease coverage by 0.07%.
The diff coverage is 19.44%.

@@            Coverage Diff             @@
##           master    #2261      +/-   ##
==========================================
- Coverage   86.95%   86.88%   -0.07%     
==========================================
  Files         229      229              
  Lines       41970    42034      +64     
  Branches     7752     7760       +8     
==========================================
+ Hits        36495    36522      +27     
- Misses       4838     4871      +33     
- Partials      637      641       +4     
Files Coverage Δ
devito/mpi/distributed.py 92.92% <100.00%> (+0.02%) ⬆️
devito/arch/compiler.py 44.42% <85.71%> (+4.34%) ⬆️
tests/test_gpu_common.py 1.40% <0.00%> (-0.02%) ⬇️
devito/arch/archinfo.py 40.15% <2.12%> (-3.72%) ⬇️

@mloubout mloubout changed the title compiler: prevent custom compiler from removing host flags compiler: misc arch and mpi bug fix Nov 7, 2023
@mloubout mloubout added bug-py-minor MPI mpi-related labels Nov 7, 2023
Contributor

@georgebisbas georgebisbas left a comment


What was the mpi problem?


deckerla commented Nov 7, 2023

What was the mpi problem?

This is the related issue:

#2262

@@ -150,7 +150,10 @@ def preprocess(clusters, options=None, **kwargs):
    for c in clusters:
        if c.is_halo_touch:
            hs = HaloScheme.union(e.rhs.halo_scheme for e in c.exprs)
            queue.append(c.rebuild(halo_scheme=hs))
            if hs.distributed_aindices:
Contributor


After playing with the test a bit, I'm not sure this is the right fix.

The test does need a halo exchange (u needs to be up-to-date before getting interpolated), but what this patch essentially says is: "well, if the halo_scheme is empty, drop it and don't generate anything".

However, empty halo schemes should have been dropped earlier on, to avoid clusters-level pollution. In particular, here: https://github.com/devitocodes/devito/blob/master/devito/ir/clusters/algorithms.py#L395

So, it turns out that is_void gives False here because the fmapper is non-empty while the distributed-aindices are empty. And the latter is caused by the fact that no dimensions are used to index into u -- just plain symbols, AFAICT

So we might have to:

  • fix this somewhere else
  • make the test more robust, e.g. by placing the receivers at the MPI borders

About the fix: it might need to go somewhere here https://github.com/devitocodes/devito/blob/master/devito/mpi/halo_scheme.py#L470

By the looks of it, if f.grid.is_distributed(d), then:

  • if there's an aindex, we're good (most of the cases)
  • if i[d] is a number, we should throw a user exception, since it's illegal MPI code (you can't use pure numbers to index into distributed dimensions)
  • if i[d] isn't a number (e.g., like in our test, an expression of pure symbols), then we should use a dummy dimension placeholder, capturing the fact that such a dimension does require a halo exchange
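The three cases above could be sketched roughly as follows; the function and argument names are hypothetical, not Devito's actual API:

```python
from numbers import Number


def classify_distributed_index(d, idx, aindex, is_distributed):
    """Sketch of the proposed per-dimension check (names hypothetical).

    d: the grid dimension; idx: the index expression used along d;
    aindex: the access index detected for d, or None; is_distributed:
    whether d is an MPI-decomposed dimension.
    """
    if not is_distributed:
        return "local"            # no halo exchange needed along d
    if aindex is not None:
        return "aindex"           # common case: regular distributed access
    if isinstance(idx, Number):
        # Illegal: a literal number cannot address a distributed dimension
        raise ValueError("cannot index distributed dimension %s with the "
                         "literal %s" % (d, idx))
    # An expression of pure symbols: record a dummy placeholder so the
    # dimension is still known to require a halo exchange
    return "dummy-placeholder"
```

The key point is the last branch: an otherwise-unclassifiable index must not silently look like "no distributed access", which is exactly what made is_void misfire in the test.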

Contributor Author


It doesn't drop it; it just keeps it where it is in the list of clusters. It's dropped somewhere else, probably in the MPI pass, but that's a different issue.

@mloubout mloubout force-pushed the tweak-custom-cuda branch 10 times, most recently from 5a7a65a to 1d87f69 Compare November 9, 2023 20:12
else:
    hispace = None

if hispace and options['mpi']:
Contributor


Whatever happens here, and regardless of how we end up doing it, we should always do it, irrespective of whether MPI is enabled or not. This will confer robustness on our compiler and, in particular, avoid bugs that only pop up with MPI enabled.

@mloubout mloubout removed bug-py-minor MPI mpi-related labels Nov 10, 2023
@mloubout mloubout changed the title compiler: misc arch and mpi bug fix arch: support rocm for gpu info Nov 10, 2023
@mloubout mloubout force-pushed the tweak-custom-cuda branch 3 times, most recently from dc67515 to 94e7c92 Compare November 10, 2023 14:00
@mloubout mloubout force-pushed the tweak-custom-cuda branch 3 times, most recently from 43a7baa to 99b776c Compare November 10, 2023 15:16
@mloubout mloubout merged commit 730ed09 into master Nov 10, 2023
32 checks passed
@mloubout mloubout deleted the tweak-custom-cuda branch July 22, 2024 16:01
Labels
arch jitting, archinfo, ...
Development

Successfully merging this pull request may close these issues.

get_gpu_info not working for AMD GPU
4 participants