trixi-gpu · huiyuxie · Sep 9, 2024
diff --git a/.github/workflows/SpellCheck.yml b/.github/workflows/SpellCheck.yml
@@ -11,3 +11,5 @@ jobs:
         uses: actions/checkout@v4
       - name: Check spelling
         uses: crate-ci/[email protected]
+        with:
+          config: typos.toml # specify the custom config file
diff --git a/docs/dev_env_info.md b/docs/dev_env_info.md
@@ -0,0 +1,262 @@
+# Development Environment Information
+
+## Recent Update
+
+The hardware for this project was updated recently. Implementation and testing now continue on an NVIDIA GeForce series GPU. The current environment is detailed below.
+
+The following shows the specific CPU information.
+
+```Bash
+huiyu@huiyuxps15:~$ lscpu
+Architecture:            x86_64
+  CPU op-mode(s):        32-bit, 64-bit
+  Address sizes:         46 bits physical, 48 bits virtual
+  Byte Order:            Little Endian
+CPU(s):                  20
+  On-line CPU(s) list:   0-19
+Vendor ID:               GenuineIntel
+  Model name:            13th Gen Intel(R) Core(TM) i9-13900H
+    CPU family:          6
+    Model:               186
+    Thread(s) per core:  2
+    Core(s) per socket:  10
+    Socket(s):           1
+    Stepping:            2
+    BogoMIPS:            5990.39
+    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse ss
+                         e2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop
+                         _tsc cpuid pni pclmulqdq vmx ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_time
+                         r aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb stibp ibrs_enha
+                         nced tpr_shadow vnmi ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdsee
+                         d adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves avx_vnni umip waitpkg gfni vae
+                         s vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize flush_l1d arch_capabilities
+Virtualization features:
+  Virtualization:        VT-x
+  Hypervisor vendor:     Microsoft
+  Virtualization type:   full
+Caches (sum of all):
+  L1d:                   480 KiB (10 instances)
+  L1i:                   320 KiB (10 instances)
+  L2:                    12.5 MiB (10 instances)
+  L3:                    24 MiB (1 instance)
+Vulnerabilities:
+  Gather data sampling:  Not affected
+  Itlb multihit:         Not affected
+  L1tf:                  Not affected
+  Mds:                   Not affected
+  Meltdown:              Not affected
+  Mmio stale data:       Not affected
+  Retbleed:              Mitigation; Enhanced IBRS
+  Spec rstack overflow:  Not affected
+  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl and seccomp
+  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
+  Spectre v2:            Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
+  Srbds:                 Not affected
+  Tsx async abort:       Not affected
+```
+
+The following shows the specific GPU information.
+
+```Bash
+huiyu@huiyuxps15:~$ nvidia-smi
+Thu May 16 17:26:16 2024
++---------------------------------------------------------------------------------------+
+| NVIDIA-SMI 545.23.07              Driver Version: 546.12       CUDA Version: 12.3     |
+|-----------------------------------------+----------------------+----------------------+
+| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
+| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
+|                                         |                      |               MIG M. |
+|=========================================+======================+======================|
+|   0  NVIDIA GeForce RTX 4060 ...    On  | 00000000:01:00.0 Off |                  N/A |
+| N/A   43C    P3              11W /  45W |      0MiB /  8188MiB |      0%      Default |
+|                                         |                      |                  N/A |
++-----------------------------------------+----------------------+----------------------+
+
++---------------------------------------------------------------------------------------+
+| Processes:                                                                            |
+|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+|        ID   ID                                                             Usage      |
+|=======================================================================================|
+|  No running processes found                                                           |
++---------------------------------------------------------------------------------------+
+```
+
+## Legacy (AWS)
+
+This project was previously implemented and tested on AWS EC2 instances. The instance types used so far are `p3.2xlarge` (single-GPU) and `g4dn.12xlarge` (multi-GPU). For more information about these instance types, see [AWS EC2 Instance Types](https://aws.amazon.com/ec2/instance-types/).
+
+The environment of each EC2 instance used in this project is detailed below.
+
+### AWS `p3.2xlarge`
+The following shows the specific CPU information of the `p3.2xlarge` instance used in this project.
+
+```Bash
+ubuntu@ip-172-31-7-163:~/trixi_cuda$ lscpu
+Architecture:            x86_64
+  CPU op-mode(s):        32-bit, 64-bit
+  Address sizes:         46 bits physical, 48 bits virtual
+  Byte Order:            Little Endian
+CPU(s):                  8
+  On-line CPU(s) list:   0-7
+Vendor ID:               GenuineIntel
+  Model name:            Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
+    CPU family:          6
+    Model:               79
+    Thread(s) per core:  2
+    Core(s) per socket:  4
+    Socket(s):           1
+    Stepping:            1
+    CPU max MHz:         3000.0000
+    CPU min MHz:         1200.0000
+    BogoMIPS:            4600.04
+    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscal
+                         l nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni
+                          pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdra
+                         nd hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erm
+                         s invpcid rtm rdseed adx xsaveopt
+Virtualization features: 
+  Hypervisor vendor:     Xen
+  Virtualization type:   full
+Caches (sum of all):     
+  L1d:                   128 KiB (4 instances)
+  L1i:                   128 KiB (4 instances)
+  L2:                    1 MiB (4 instances)
+  L3:                    45 MiB (1 instance)
+NUMA:                    
+  NUMA node(s):          1
+  NUMA node0 CPU(s):     0-7
+Vulnerabilities:         
+  Itlb multihit:         KVM: Mitigation: VMX unsupported
+  L1tf:                  Mitigation; PTE Inversion
+  Mds:                   Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
+  Meltdown:              Mitigation; PTI
+  Mmio stale data:       Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
+  Retbleed:              Not affected
+  Spec store bypass:     Vulnerable
+  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
+  Spectre v2:            Mitigation; Retpolines, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
+  Srbds:                 Not affected
+  Tsx async abort:       Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
+```
+
+The following shows the specific GPU information of the `p3.2xlarge` instance used in this project.
+
+```Bash
+ubuntu@ip-172-31-7-163:~/trixi_cuda$ nvidia-smi
+Sat Aug 26 00:38:06 2023       
++---------------------------------------------------------------------------------------+
+| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
+|-----------------------------------------+----------------------+----------------------+
+| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
+| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
+|                                         |                      |               MIG M. |
+|=========================================+======================+======================|
+|   0  Tesla V100-SXM2-16GB            On | 00000000:00:1E.0 Off |                    0 |
+| N/A   47C    P0               25W / 300W|      0MiB / 16384MiB |      0%      Default |
+|                                         |                      |                  N/A |
++-----------------------------------------+----------------------+----------------------+
+
++---------------------------------------------------------------------------------------+
+| Processes:                                                                            |
+|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+|        ID   ID                                                             Usage      |
+|=======================================================================================|
+|  No running processes found                                                           |
++---------------------------------------------------------------------------------------+
+```
+
+### AWS `g4dn.12xlarge`
+The following shows the specific CPU information of the `g4dn.12xlarge` instance used in this project.
+
+```Bash
+ubuntu@ip-172-31-4-230:~/trixi_cuda$ lscpu
+Architecture:            x86_64
+  CPU op-mode(s):        32-bit, 64-bit
+  Address sizes:         46 bits physical, 48 bits virtual
+  Byte Order:            Little Endian
+CPU(s):                  48
+  On-line CPU(s) list:   0-47
+Vendor ID:               GenuineIntel
+  Model name:            Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
+    CPU family:          6
+    Model:               85
+    Thread(s) per core:  2
+    Core(s) per socket:  24
+    Socket(s):           1
+    Stepping:            7
+    BogoMIPS:            4999.99
+    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clf
+                         lush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch
+                         _perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_fre
+                         q pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popc
+                         nt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dno
+                         wprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms i
+                         nvpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512
+                         bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni
+Virtualization features: 
+  Hypervisor vendor:     KVM
+  Virtualization type:   full
+Caches (sum of all):     
+  L1d:                   768 KiB (24 instances)
+  L1i:                   768 KiB (24 instances)
+  L2:                    24 MiB (24 instances)
+  L3:                    35.8 MiB (1 instance)
+NUMA:                    
+  NUMA node(s):          1
+  NUMA node0 CPU(s):     0-47
+Vulnerabilities:         
+  Gather data sampling:  Unknown: Dependent on hypervisor status
+  Itlb multihit:         KVM: Mitigation: VMX unsupported
+  L1tf:                  Mitigation; PTE Inversion
+  Mds:                   Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unkno
+                         wn
+  Meltdown:              Mitigation; PTI
+  Mmio stale data:       Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unkno
+                         wn
+  Retbleed:              Vulnerable
+  Spec rstack overflow:  Not affected
+  Spec store bypass:     Vulnerable
+  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
+  Spectre v2:            Mitigation; Retpolines, STIBP disabled, RSB filling, PBRSB-eIBRS Not affect
+                         ed
+  Srbds:                 Not affected
+  Tsx async abort:       Not affected
+```
+
+The following shows the specific GPU information of the `g4dn.12xlarge` instance used in this project.
+
+```Bash
+ubuntu@ip-172-31-4-230:~/trixi_cuda$ nvidia-smi
+Sat Dec 30 20:19:54 2023       
++---------------------------------------------------------------------------------------+
+| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
+|-----------------------------------------+----------------------+----------------------+
+| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
+| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
+|                                         |                      |               MIG M. |
+|=========================================+======================+======================|
+|   0  Tesla T4                       On  | 00000000:00:1B.0 Off |                    0 |
+| N/A   20C    P8               8W /  70W |      2MiB / 15360MiB |      0%      Default |
+|                                         |                      |                  N/A |
++-----------------------------------------+----------------------+----------------------+
+|   1  Tesla T4                       On  | 00000000:00:1C.0 Off |                    0 |
+| N/A   20C    P8              10W /  70W |      2MiB / 15360MiB |      0%      Default |
+|                                         |                      |                  N/A |
++-----------------------------------------+----------------------+----------------------+
+|   2  Tesla T4                       On  | 00000000:00:1D.0 Off |                    0 |
+| N/A   21C    P8               9W /  70W |      2MiB / 15360MiB |      0%      Default |
+|                                         |                      |                  N/A |
++-----------------------------------------+----------------------+----------------------+
+|   3  Tesla T4                       On  | 00000000:00:1E.0 Off |                    0 |
+| N/A   20C    P8               9W /  70W |      2MiB / 15360MiB |      0%      Default |
+|                                         |                      |                  N/A |
++-----------------------------------------+----------------------+----------------------+
+
++---------------------------------------------------------------------------------------+
+| Processes:                                                                            |
+|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+|        ID   ID                                                             Usage      |
+|=======================================================================================|
+|  No running processes found                                                           |
++---------------------------------------------------------------------------------------+
+```
diff --git a/examples/euler_shockcapturing_1d.jl b/examples/euler_shockcapturing_1d.jl
@@ -0,0 +1,65 @@
+using Trixi, TrixiGPU
+using OrdinaryDiffEq
+
+# This example is taken from Trixi.jl
+
+###############################################################################
+# semidiscretization of the compressible Euler equations
+
+equations = CompressibleEulerEquations1D(1.4)
+
+initial_condition = initial_condition_weak_blast_wave
+
+surface_flux = flux_lax_friedrichs
+volume_flux = flux_shima_etal
+basis = LobattoLegendreBasis(3)
+indicator_sc = IndicatorHennemannGassner(equations, basis,
+                                         alpha_max = 0.5,
+                                         alpha_min = 0.001,
+                                         alpha_smooth = true,
+                                         variable = density_pressure)
+volume_integral = VolumeIntegralShockCapturingHG(indicator_sc;
+                                                 volume_flux_dg = volume_flux,
+                                                 volume_flux_fv = surface_flux)
+solver = DGSEM(basis, surface_flux, volume_integral)
+
+coordinates_min = -2.0
+coordinates_max = 2.0
+mesh = TreeMesh(coordinates_min, coordinates_max,
+                initial_refinement_level = 5,
+                n_cells_max = 10_000)
+
+semi = SemidiscretizationHyperbolic(mesh, equations, initial_condition, solver)
+
+###############################################################################
+# ODE solvers, callbacks etc.
+
+tspan = (0.0, 1.0)
+ode = semidiscretize_gpu(semi, tspan) # from TrixiGPU.jl
+
+summary_callback = SummaryCallback()
+
+analysis_interval = 100
+analysis_callback = AnalysisCallback(semi, interval = analysis_interval)
+
+alive_callback = AliveCallback(analysis_interval = analysis_interval)
+
+save_solution = SaveSolutionCallback(interval = 100,
+                                     save_initial_solution = true,
+                                     save_final_solution = true,
+                                     solution_variables = cons2prim)
+
+stepsize_callback = StepsizeCallback(cfl = 0.8)
+
+callbacks = CallbackSet(summary_callback,
+                        analysis_callback, alive_callback,
+                        save_solution,
+                        stepsize_callback)
+
+###############################################################################
+# run the simulation
+
+sol = solve(ode, CarpenterKennedy2N54(williamson_condition = false),
+            dt = 1.0, # solve needs some value here but it will be overwritten by the stepsize_callback
+            save_everystep = false, callback = callbacks);
+summary_callback() # print the timer summary
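
The CPU and GPU dumps added in `docs/dev_env_info.md` come from running `lscpu` and `nvidia-smi` by hand. A minimal sketch of how they could be regenerated in one step is shown here. The function name `collect_env_info` and the section headers are hypothetical and not part of this PR, and the script only assumes that both tools may or may not be on `PATH`.

```shell
#!/usr/bin/env bash
# Sketch only: gather the same CPU/GPU information that docs/dev_env_info.md records.
# `collect_env_info` and the "== ... ==" headers are illustrative names, not project code.

collect_env_info() {
    echo "== CPU =="
    if command -v lscpu >/dev/null 2>&1; then
        lscpu
    else
        # Fall back to a message instead of failing when the tool is missing
        echo "lscpu not found"
    fi

    echo "== GPU =="
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi
    else
        echo "nvidia-smi not found"
    fi
}

collect_env_info
```

Redirecting the output to a file (for example `bash collect.sh > env_info.txt`, both names being placeholders) would give a snapshot that can be pasted into the document when the hardware changes again.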