
arch: support rocm for gpu info #2261

Merged
merged 2 commits into from
Nov 10, 2023
Conversation


@mloubout mloubout commented Nov 7, 2023

Currently, the custom compiler removes the Xcompile flag needed for the host side, treating it as a duplicate flag.

Fixes #2260
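For reference, ROCm exposes GPU details through the `rocm-smi` command-line tool. Below is a minimal sketch of how such info could be gathered; the helper names and the exact output format are assumptions (the line format varies across ROCm versions), and this is not Devito's actual implementation:

```python
import re
import subprocess


def parse_rocm_smi(text):
    """Extract one dict per GPU from rocm-smi product-name output.

    Assumes lines shaped like 'GPU[0] : Card series: AMD Instinct MI250X'
    (the format is version-dependent, so this is only a sketch).
    """
    gpus = {}
    for line in text.splitlines():
        m = re.match(r"GPU\[(\d+)\]\s*:\s*([^:]+?)\s*:\s*(.+)", line)
        if m:
            idx = int(m.group(1))
            key, val = m.group(2).strip(), m.group(3).strip()
            gpus.setdefault(idx, {})[key] = val
    # Return a list ordered by GPU index
    return [gpus[i] for i in sorted(gpus)]


def get_amd_gpu_info():
    """Return parsed GPU info, or None if rocm-smi is unavailable."""
    try:
        out = subprocess.run(["rocm-smi", "--showproductname"],
                             capture_output=True, text=True,
                             check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_rocm_smi(out)
```

Keeping the parsing separate from the subprocess call makes the format-sensitive part testable without an AMD GPU present.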

@mloubout mloubout added the arch jitting, archinfo, ... label Nov 7, 2023

codecov bot commented Nov 7, 2023

Codecov Report

Merging #2261 (d0002b2) into master (e707027) will decrease coverage by 0.07%.
The diff coverage is 19.44%.

@@            Coverage Diff             @@
##           master    #2261      +/-   ##
==========================================
- Coverage   86.95%   86.88%   -0.07%     
==========================================
  Files         229      229              
  Lines       41970    42034      +64     
  Branches     7752     7760       +8     
==========================================
+ Hits        36495    36522      +27     
- Misses       4838     4871      +33     
- Partials      637      641       +4     
Files Coverage Δ
devito/mpi/distributed.py 92.92% <100.00%> (+0.02%) ⬆️
devito/arch/compiler.py 44.42% <85.71%> (+4.34%) ⬆️
tests/test_gpu_common.py 1.40% <0.00%> (-0.02%) ⬇️
devito/arch/archinfo.py 40.15% <2.12%> (-3.72%) ⬇️

@mloubout mloubout changed the title compiler: prevent custom compiler from removing host flags compiler: misc arch and mpi bug fix Nov 7, 2023
@mloubout mloubout added bug-py-minor MPI mpi-related labels Nov 7, 2023
Contributor

@georgebisbas georgebisbas left a comment


What was the mpi problem?


deckerla commented Nov 7, 2023

What was the mpi problem?

This is the related issue:

#2262

@@ -150,7 +150,10 @@ def preprocess(clusters, options=None, **kwargs):
    for c in clusters:
        if c.is_halo_touch:
            hs = HaloScheme.union(e.rhs.halo_scheme for e in c.exprs)
            queue.append(c.rebuild(halo_scheme=hs))
            if hs.distributed_aindices:
Contributor


After playing with the test a bit, I'm not sure this is the right fix.

The test does need a halo exchange (u needs to be up-to-date before getting interpolated), but what this patch essentially says is: "well, if the halo_scheme is empty, drop it and don't generate anything".

However, empty halo schemes should have been dropped earlier on, to avoid clusters-level pollution. In particular, here: https://github.com/devitocodes/devito/blob/master/devito/ir/clusters/algorithms.py#L395

So, it turns out that is_void gives False here because the fmapper is non-empty while the distributed-aindices are empty. And the latter is caused by the fact that no dimensions are used to index into u -- just plain symbols, AFAICT

So we might have to:

  • fix this somewhere else
  • make the test more robust, e.g. by placing the receivers at the MPI borders

About the fix: it might need to go somewhere here https://github.com/devitocodes/devito/blob/master/devito/mpi/halo_scheme.py#L470

By the looks of it, if f.grid.is_distributed(d), then:

  • if there's an aindex, we're good (most of the cases)
  • if i[d] is a number, we should throw a user exception, since it's illegal MPI code (you can't use pure numbers to index into distributed dimensions)
  • if i[d] isn't a number (e.g., like in our test, an expression of pure symbols), then we should use a dummy dimension placeholder, capturing the fact that such a dimension does require a halo exchange
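The three cases above could be sketched roughly as follows; the function and argument names are hypothetical, not Devito's actual API:

```python
from numbers import Number


def classify_distributed_index(d, idx, aindex, is_distributed):
    """Sketch of the proposed per-dimension check (names hypothetical).

    d: the grid dimension; idx: the index expression used along d;
    aindex: the access index detected for d, or None; is_distributed:
    whether d is an MPI-decomposed dimension.
    """
    if not is_distributed:
        return "local"            # no halo exchange needed along d
    if aindex is not None:
        return "aindex"           # common case: regular distributed access
    if isinstance(idx, Number):
        # Illegal: a literal number cannot address a distributed dimension
        raise ValueError("cannot index distributed dimension %s with the "
                         "literal %s" % (d, idx))
    # An expression of pure symbols: record a dummy placeholder so the
    # dimension is still known to require a halo exchange
    return "dummy-placeholder"
```

The key point is the last branch: an otherwise-unclassifiable index must not silently look like "no distributed access", which is exactly what made is_void misfire in the test.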

Contributor Author


It doesn't drop it; it just keeps it where it is in the list of clusters. It's dropped somewhere else, probably in the MPI pass, but that's a different issue.

@mloubout mloubout force-pushed the tweak-custom-cuda branch 10 times, most recently from 5a7a65a to 1d87f69 Compare November 9, 2023 20:12
else:
    hispace = None

if hispace and options['mpi']:
Contributor


Whatever happens here, and regardless of how we end up doing it, we should always do it, irrespective of whether MPI is enabled or not. This will confer robustness on our compiler and, in particular, avoid bugs that only pop up with MPI enabled.

@mloubout mloubout removed bug-py-minor MPI mpi-related labels Nov 10, 2023
@mloubout mloubout changed the title compiler: misc arch and mpi bug fix arch: support rocm for gpu info Nov 10, 2023
@mloubout mloubout force-pushed the tweak-custom-cuda branch 3 times, most recently from dc67515 to 94e7c92 Compare November 10, 2023 14:00
@mloubout mloubout force-pushed the tweak-custom-cuda branch 3 times, most recently from 43a7baa to 99b776c Compare November 10, 2023 15:16
@mloubout mloubout merged commit 730ed09 into master Nov 10, 2023
32 checks passed
@mloubout mloubout deleted the tweak-custom-cuda branch July 22, 2024 16:01
Labels
arch jitting, archinfo, ...
Development

Successfully merging this pull request may close these issues.

get_gpu_info not working for AMD GPU
4 participants