Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenMP Sparse access outside array boundaries #405

Open
6 of 11 tasks
Jim-Walk opened this issue Jul 18, 2019 · 2 comments
Open
6 of 11 tasks

OpenMP Sparse access outside array boundaries #405

Jim-Walk opened this issue Jul 18, 2019 · 2 comments

Comments

@Jim-Walk
Copy link

Jim-Walk commented Jul 18, 2019

What type of issue is this?

  • Bug in the code or other problem
  • Inadequate/incorrect documation
  • Feature request

If this is a bug report, please use the following template.
Otherwise, please delete the rest of the template.

Where does this bug appear?

Check all that apply:

  • MacOS
  • Linux
  • Cray
  • GCC
  • Clang
  • Intel compiler
  • MPICH and derivatives (MVAPICH2, Intel MPI, Cray MPI, etc.)
  • Open-MPI

Operating system

What is the output of uname -a?
Linux l0 4.18.0-17-generic #18~18.04.1-Ubuntu SMP Fri Mar 15 15:27:12 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Compiler

gcc
What is the output of ${COMPILER} -v or ${COMPILER} --version?
gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0

PRK build information

Please attach or inline make.defs.

#
# This file shows the GCC toolchain options for PRKs using
# OpenMP, MPI and/or Fortran coarrays only.
#
# Base compilers and language options
#
VERSION=-7
# C99 is required in some implementations.
CC=gcc${VERSION} -std=c11 -pthread
#EXTRA_CLIBS=-lrt
# All of the Fortran code is written for the 2008 standard and requires preprocessing.
FC=gfortran${VERSION} -std=f2008 -cpp
# C++11 may not be required but does no harm here.
CXX=g++${VERSION} -std=gnu++17 -pthread
#
# Compiler flags
#
# -mtune=native is appropriate for most cases.
# -march=native is appropriate if you want portable binaries.
DEFAULT_OPT_FLAGS=-O3 -mtune=native -ffast-math
#DEFAULT_OPT_FLAGS=-O0
DEFAULT_OPT_FLAGS+=-g3
#DEFAULT_OPT_FLAGS+=-fsanitize=undefined
#DEFAULT_OPT_FLAGS+=-fsanitize=undefined,leak
#DEFAULT_OPT_FLAGS+=-fsanitize=address
#DEFAULT_OPT_FLAGS+=-fsanitize=thread
# If you are compiling for KNL on a Xeon login node, use the following:
# DEFAULT_OPT_FLAGS=-g -O3 -march=knl
# See https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html for details.
#
#DEFAULT_OPT_FLAGS+=-fopt-info-vec-missed
DEFAULT_OPT_FLAGS+=-Wall #-Werror
DEFAULT_OPT_FLAGS+=-Wno-ignored-attributes -Wno-deprecated-declarations
#DEFAULT_OPT_FLAGS+=-mavx -mfma
#
# OpenMP flags
#
OPENMPFLAG=-fopenmp
OPENMPSIMDFLAG=-fopenmp-simd
OFFLOADFLAG=-foffload="-O3 -v"
ORNLACCFLAG=-fopenacc
#
# OpenCL flags
#
# MacOS
OPENCLFLAG=-framework OpenCL
# Linux
#OPENCLDIR=/etc/alternatives/opencl-intel-tools
#OPENCLFLAG=-I${OPENCLDIR} -L${OPENCLDIR}/lib64 -lOpenCL
OPENCLFLAG+=-Wno-ignored-attributes -Wno-deprecated-declarations
METALFLAG=-framework MetalPerformanceShaders
#
# SYCL flags
#
# triSYCL
# https://github.com/triSYCL/triSYCL is header-only so just clone in Cxx11 directory...
SYCLDIR=./triSYCL
SYCLCXX=${CXX} -std=c++17 ${OPENMPFLAG}
SYCLFLAG=-I$(SYCLDIR)/include
# ProGTX
# https://github.com/ProGTX/sycl-gtx
#SYCLDIR=${HOME}/Work/OpenCL/sycl-gtx
#SYCLCXX=${CXX} ${OPENMPFLAG}
#SYCLFLAG=-DUSE_SYCL -I${SYCLDIR}/sycl-gtx/include -L${SYCLDIR}/build/sycl-gtx -lsycl-gtx ${OPENCLFLAG}
METALFLAG=-framework MetalPerformanceShaders
#
# OCCA
#
#OCCADIR=${HOME}/prk-repo/Cxx11/occa
#
# Cilk
#
#CILKFLAG=-fcilkplus
#
# TBB
#
TBBDIR=/usr/local/Cellar/tbb/2019_U5_1
TBBFLAG=-I${TBBDIR}/include -L${TBBDIR}/lib -ltbb
#
# Parallel STL, Boost, etc.
#
BOOSTFLAG=-I/usr/local/Cellar/boost/1.69.0_2/include
RANGEFLAG=-DUSE_BOOST_IRANGE ${BOOSTFLAG}
#RANGEFLAG=-DUSE_RANGES_TS -I./range-v3/include
PSTLFLAG=${OPENMPSIMDFLAG} ${TBBFLAG} ${RANGEFLAG}
#PSTLFLAG=${OPENMPSIMDFLAG} ${TBBFLAG} -DUSE_INTEL_PSTL -I./pstl/include ${RANGEFLAG}
KOKKOSDIR=/opt/kokkos/gcc
KOKKOSFLAG=-I${KOKKOSDIR}/include -L${KOKKOSDIR}/lib -lkokkos ${OPENMPFLAG}
RAJADIR=/opt/raja/gcc
RAJAFLAG=-I${RAJADIR}/include -L${RAJADIR}/lib -lRAJA ${OPENMPFLAG} ${TBBFLAG}
THRUSTDIR=/Users/jrhammon/Work/NVIDIA/thrust
THRUSTFLAG=-I${THRUSTDIR} ${RANGEFLAG}
#
# SYCL flags
#
# triSYCL
# https://github.com/triSYCL/triSYCL is header-only so just clone in Cxx11 directory...
SYCLDIR=./triSYCL
SYCLCXX=${CXX} -O3 -Wall -std=c++17 ${OPENMPFLAG}
SYCLFLAG=-I${SYCLDIR}/include ${BOOSTFLAG} -DTRISYCL
# ProGTX
# https://github.com/ProGTX/sycl-gtx
#SYCLDIR=${HOME}/Work/OpenCL/sycl-gtx
#SYCLCXX=${CXX} ${OPENMPFLAG}
#SYCLFLAG=-I${SYCLDIR}/sycl-gtx/include -L${SYCLDIR}/build/sycl-gtx -lsycl-gtx ${OPENCLFLAG}
SYCLFLAG+=${RANGEFLAG}
#
# SYCL flags
#
# triSYCL
# https://github.com/triSYCL/triSYCL is header-only so just clone in Cxx11 directory...
SYCLDIR=./triSYCL
SYCLCXX=${CXX} -std=c++17 ${OPENMPFLAG}
SYCLFLAG=-I${SYCLDIR}/include ${BOOSTFLAG}
# ProGTX
# https://github.com/ProGTX/sycl-gtx
#SYCLDIR=${HOME}/Work/OpenCL/sycl-gtx
#SYCLCXX=${CXX} ${OPENMPFLAG}
#SYCLFLAG=-DUSE_SYCL -I${SYCLDIR}/sycl-gtx/include -L${SYCLDIR}/build/sycl-gtx -lsycl-gtx ${OPENCLFLAG}
#
# CBLAS for C++ DGEMM
#
BLASFLAG=-DACCELERATE -framework Accelerate
CBLASFLAG=-DACCELERATE -framework Accelerate -flax-vector-conversions
#
# CUDA flags
#
# Mac w/ CUDA emulation via https://github.com/hughperkins/coriander
#NVCC=/opt/llvm/cocl/bin/cocl
# Linux w/ NVIDIA CUDA
NVCC=nvcc
CUDAFLAGS=-g -O3 -std=c++11 -arch=sm_50
# https://github.com/tensorflow/tensorflow/issues/1066#issuecomment-200574233
CUDAFLAGS+=-D_MWAITXINTRIN_H_INCLUDED
#
# ISPC
#
ISPC=ispc
ISPCFLAG=-O3 --target=host --opt=fast-math
#
# MPI
#
# We assume you have installed an implementation of MPI-3 that is in your path.
MPICC=mpicc -std=c99
#
# Fortran 2008 coarrays
#
# see https://github.com/ParRes/Kernels/blob/master/FORTRAN/README.md for details
# single-node
COARRAYFLAG=-fcoarray=single -lcaf_single
# multi-node
# COARRAYFLAG=-fcoarray=lib -lcaf_mpi

MEMKINDDIR=/home/parallels/PRK/deps
MEMKINDFLAGS=-I${MEMKINDDIR}/include -L${MEMKINDDIR}/lib -lmemkind -Wl,-rpath=${MEMKINDDIR}/lib

Output showing problem

When the sparse OpenMP benchmark is run with ./sparse 12 2 11 10 the program tries to write data to memory which has not been allocated. To find this error, please comment out line 262 and 263 of sparse.c, and then outside the for loop, on line 265 add printf("nent: %llu elm: %llu \n", nent, elm+4);. This will show that the length of col_index, nent, is 171966464, and the program tries to write to 171966467. Due to the parallel nature of this program, there is a strong chance that each thread is writing outside of it the boundaries of its own array. Furthermore, if the program is compiled with icc --check-pointers=rw on linux, the solution fails to validate.

If the output is short, please inline it here.
Otherwise, please pipe it to a plain text file and attach that file.
Note that you may need to use $command 2>&1 $log to capture the error messages.

Please do not attach screenshots of your terminal.

@jeffhammond
Copy link
Member

Thanks for the bug report. I will try to figure this out.

@jeffhammond jeffhammond pinned this issue Aug 15, 2019
@AtlantaPepsi
Copy link
Contributor

I believe the boundary indices are valid. 171966467 printed is elm+4 (elm=171966463), in the case of not commenting out 262 and 263, elm will be reassigned at line 262 and end up being 171966464 after the loop exits. Since the print statement at 265 is out side the matrix loop (249-264), elm will never be used again, so there should be no out of bound errors. (If you run it with icc where the solution validates, you will get the same output.)

But validation does fail when compiled with gcc, and I think the source of the error is at line 285. temp should be declared private as well to avoid race condition. I still need to run more cases and check if there are other problems, but I believe this is the one.

For some reason icc is immune to this bug, I couldn't replicate this issue even with -check-pointers=rw, quite curious.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants