NaN error and integer overflow in number of atoms #12

minitu opened this issue Sep 6, 2020 · 0 comments

minitu commented Sep 6, 2020

Hello,
I'd like to report two errors that I observed when running the MPI + Kokkos version of MiniMD (miniMD/kokkos).

The first error is that the T and P values show up as NaN, which causes some kernels to run abnormally fast.
The configuration is as follows, executed on 32 nodes of OLCF Summit:

$ jsrun -n192 -a1 -c1 -g1 -K3 -r6 -M -gpu ./miniMD -i in.lj.miniMD -gn 0 -nx 768 -ny 768 -nz 384 -n 100
# Create System:
# Done ....
# miniMD-Reference 1.2 (MPI+OpenMP) output ...
# Run Settings:
        # MPI processes: 192
        # Host Threads: 1
        # Inputfile: ../inputs/in.lj.miniMD
        # Datafile: None
# Physics Settings:
        # ForceStyle: LJ
        # Force Parameters: 1.00 1.00
        # Units: LJ
        # Atoms: 905969664
        # Atom types: 8
        # System size: 1289.93 1289.93 644.96 (unit cells: 768 768 384)
        # Density: 0.844200
        # Force cutoff: 2.500000
        # Timestep size: 0.005000
# Technical Settings:
        # Neigh cutoff: 2.800000
        # Half neighborlists: 1
        # Team neighborlist construction: 0
        # Neighbor bins: 460 460 230
        # Neighbor frequency: 1000
        # Sorting frequency: 1000
        # Thermo frequency: 100
        # Ghost Newton: 0 
        # Use intrinsics: 0
        # Do safe exchange: 0
        # Size of float: 8

# Starting dynamics ...   
# Timestep T U P Time
0 nan -6.773368e+00 nan  0.000
100 nan 0.000000e+00 nan  1.138


# Performance Summary:
# MPI_proc OMP_threads nsteps natoms t_total t_force t_neigh t_comm t_other performance perf/thread grep_string t_extra
192 1 100 905969664 1.137955 0.050640 0.000000 0.671161 0.416153 79613833194.819092 414655381.223016 PERF_SUMMARY 0.000000
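
One thing I noticed: the atom count in this run (905,969,664) still fits in a 32-bit signed int, so this NaN does not appear to be the same issue as the overflow reported below. Since the run finishes without any warning, it might be worth guarding the thermo output so that a non-finite T or P fails loudly instead of silently. A minimal sketch of such a guard (purely illustrative, not miniMD's actual thermo code; the names and signature below are made up):

    // Purely illustrative guard, not miniMD's actual code; "t_act" and "p_act"
    // stand in for the reduced temperature and pressure, "step" for the timestep.
    #include <cmath>
    #include <cstdio>
    #include <cstdlib>

    void check_thermo_finite(double t_act, double p_act, int step) {
      if (!std::isfinite(t_act) || !std::isfinite(p_act)) {
        std::fprintf(stderr, "Non-finite thermo values at step %d: T=%g P=%g\n",
                     step, t_act, p_act);
        std::abort();  // fail loudly instead of printing "nan" and continuing
      }
    }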

The second error is an integer overflow in the total number of atoms when running large problem sizes (see the arithmetic sketch after the log below):

$ jsrun -n1536 -a1 -c1 -g1 -K3 -r6 -M -gpu ./miniMD -i in.lj.miniMD -gn 0 -nx 1536 -ny 1536 -nz 768 -n 100
# Create System:
# Done ....
# miniMD-Reference 1.2 (MPI+OpenMP) output ...
# Run Settings:
        # MPI processes: 1536
        # Host Threads: 1
        # Inputfile: ../inputs/in.lj.miniMD
        # Datafile: None
# Physics Settings:
        # ForceStyle: LJ
        # Force Parameters: 1.00 1.00
        # Units: LJ
        # Atoms: -1342177280
        # Atom types: 8
        # System size: 2579.86 2579.86 1289.93 (unit cells: 1536 1536 768)
        # Density: 0.844200
        # Force cutoff: 2.500000
        # Timestep size: 0.005000
# Technical Settings:
        # Neigh cutoff: 2.800000
        # Half neighborlists: 1
        # Team neighborlist construction: 0
        # Neighbor bins: 921 921 460
        # Neighbor frequency: 1000
        # Sorting frequency: 1000
        # Thermo frequency: 100
        # Ghost Newton: 0
        # Use intrinsics: 0
        # Do safe exchange: 0
        # Size of float: 8

# Starting dynamics ...
# Timestep T U P Time
0 1.440000e+00 3.657619e+01 -6.220309e+00  0.000
100 1.435069e+00 3.657569e+01 -6.219723e+00  2.041


# Performance Summary:
# MPI_proc OMP_threads nsteps natoms t_total t_force t_neigh t_comm t_other performance perf/thread grep_string t_extra
1536 1 100 -1342177280 2.040788 0.056916 0.000000 0.852597 1.131275 -65767589680.726250 -42817441.198389 PERF_SUMMARY 0.000000
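
For what it's worth, the printed value is consistent with a 32-bit wraparound. With 4 atoms per FCC unit cell (the first run's 768 × 768 × 384 × 4 = 905,969,664 matches its printed count), this run should have 1536 × 1536 × 768 × 4 = 7,247,757,312 atoms, which exceeds INT_MAX and wraps to -1,342,177,280 when narrowed to a 32-bit signed int. A minimal sketch of the arithmetic (assuming the total is held in a 32-bit int somewhere; the variable names below are made up, not miniMD's):

    // Illustrative arithmetic only; not miniMD's actual code or variable names.
    #include <cstdint>
    #include <cstdio>

    int main() {
      const std::int64_t nx = 1536, ny = 1536, nz = 768;
      const std::int64_t atoms_per_cell = 4;  // FCC lattice used by the LJ setup
      const std::int64_t natoms64 = nx * ny * nz * atoms_per_cell;  // 7,247,757,312

      // Narrowing to 32 bits reproduces the value printed above
      // (-1,342,177,280) on the usual two's-complement platforms.
      const std::int32_t natoms32 = static_cast<std::int32_t>(natoms64);

      std::printf("64-bit total: %lld\n", static_cast<long long>(natoms64));
      std::printf("narrowed to 32-bit: %d\n", natoms32);
      return 0;
    }

So it seems likely that keeping the global atom count (and the performance figures derived from it in the summary line) in a 64-bit type would avoid the negative values.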