Issues with Paired-end datasets #28

Open
ps-account opened this issue May 16, 2019 · 5 comments

Comments


ps-account commented May 16, 2019

Running nvBowtie (including vmiheer's latest version) on a paired-end dataset leads to a crash. The output under gdb suggests that the "opposite alignment kernel" is where things go wrong:

info    : [0] aligning reads [168820736, 169869311]
verbose : [0]   1048576 reads
verbose : [0]   209.715 M bps (300.0 MB)
verbose : [0]   100.0 bps/read (min: 100, max: 100)
verbose : [0]   26.7 K reads/s
info    : [0] aligning reads [169869312, 170758330]
verbose : [0]   889019 reads
verbose : [0]   177.764 M bps (254.3 MB)
verbose : [0]   100.0 bps/read (min: 100, max: 100)
error   : opposite alignment kernel: an illegal memory access was encountered
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  device free failed: an illegal memory access was encountered

Thread 17 "nvBowtie" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fff80d61700 (LWP 25577)]
0x00007ffff693c428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
@ps-account
Author

Here is the backtrace. It might be just a paired-end issue:

#0  0x00007ffff693c428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007ffff693e02a in __GI_abort () at abort.c:89
#2  0x00007ffff74ae8f7 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff74b4a46 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff74b3aa9 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff74b4458 in __gxx_personality_v0 () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff6ce1573 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#7  0x00007ffff6ce1ad1 in _Unwind_RaiseException () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#8  0x00007ffff74b4ca7 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x0000000000695e7b in thrust::cuda_cub::throw_on_error (status=cudaErrorIllegalAddress,
    msg=0x9f1f37 "device free failed") at /home/bla/local/cuda/cuda-10.0/include/thrust/system/cuda/detail/util.h:194
#10 0x00000000006a2196 in thrust::cuda_cub::free<thrust::cuda_cub::tag, thrust::device_ptr<void> > (ptr=...)
    at /home/bla/local/cuda/cuda-10.0/include/thrust/system/cuda/detail/malloc_and_free.h:87
#11 0x00000000006a02c1 in thrust::free<thrust::cuda_cub::tag, thrust::device_ptr<void> > (exec=..., ptr=...)
    at /home/bla/local/cuda/cuda-10.0/include/thrust/detail/malloc_and_free.h:78
#12 0x000000000069f3d2 in thrust::device_free (ptr=...)
    at /home/bla/local/cuda/cuda-10.0/include/thrust/detail/device_free.inl:40
#13 0x000000000072b940 in thrust::device_malloc_allocator<unsigned int>::deallocate (this=0x7fff80d60ab0, p=...,
    cnt=1572862) at /home/bla/local/cuda/cuda-10.0/include/thrust/device_malloc_allocator.h:148
#14 0x0000000000728eda in thrust::detail::allocator_traits<thrust::device_malloc_allocator<unsigned int> >::deallocate(thrust::device_malloc_allocator<unsigned int>&, thrust::device_ptr<unsigned int>, unsigned long)::workaround_warnings::deallocate(thrust::device_malloc_allocator<unsigned int>&, thrust::device_ptr<unsigned int>, unsigned long) (a=..., p=..., n=1572862)
    at /home/bla/local/cuda/cuda-10.0/include/thrust/detail/allocator/allocator_traits.inl:257
#15 0x0000000000728f07 in thrust::detail::allocator_traits<thrust::device_malloc_allocator<unsigned int> >::deallocate (
    a=..., p=..., n=1572862) at /home/bla/local/cuda/cuda-10.0/include/thrust/detail/allocator/allocator_traits.inl:261
#16 0x000000000072628c in thrust::detail::contiguous_storage<unsigned int, thrust::device_malloc_allocator<unsigned int> >::deallocate (this=0x7fff80d60ab0) at /home/bla/local/cuda/cuda-10.0/include/thrust/detail/contiguous_storage.inl:190
#17 0x0000000000725ee8 in thrust::detail::contiguous_storage<unsigned int, thrust::device_malloc_allocator<unsigned int> >::~contiguous_storage (this=0x7fff80d60ab0, __in_chrg=<optimized out>)
    at /home/bla/local/cuda/cuda-10.0/include/thrust/detail/contiguous_storage.inl:64
#18 0x0000000000770fe8 in thrust::detail::vector_base<unsigned int, thrust::device_malloc_allocator<unsigned int> >::~vector_base (this=0x7fff80d60ab0, __in_chrg=<optimized out>)
    at /home/bla/local/cuda/cuda-10.0/include/thrust/detail/vector_base.inl:497

#19 0x00000000007701aa in thrust::device_vector<unsigned int, thrust::device_malloc_allocator<unsigned int> >::~device_vector (this=0x7fff80d60ab0, __in_chrg=<optimized out>) at /home/bla/local/cuda/cuda-10.0/include/thrust/device_vector.h:78
#20 0x0000000000770854 in nvbio::vector<nvbio::device_tag, unsigned int>::~vector (this=0x7fff80d60ab0,
    __in_chrg=<optimized out>) at /home/bla/local/nvBowtie-cuda10/nvbio/nvbio/basic/vector.h:113
#21 0x0000000000774d90 in nvbio::io::SequenceDataStorage<nvbio::device_tag>::~SequenceDataStorage (this=0x7fff80d60a00,
    __in_chrg=<optimized out>) at /home/bla/local/nvBowtie-cuda10/nvbio/nvbio/io/sequence/sequence.h:436
#22 0x0000000000768221 in nvbio::bowtie2::cuda::ComputeThreadPE::do_run (this=0x37aee80)
    at /home/bla/local/nvBowtie-cuda10/nvbio/nvBowtie/bowtie2/cuda/compute_thread.cu:597
#23 0x00000000007682b5 in nvbio::bowtie2::cuda::ComputeThreadPE::run (this=0x37aee80)
    at /home/bla/local/nvBowtie-cuda10/nvbio/nvBowtie/bowtie2/cuda/compute_thread.cu:693
#24 0x0000000000678930 in nvbio::Thread<nvbio::bowtie2::cuda::ComputeThreadPE>::execute (arg=0x37aee80)
    at /home/bla/local/nvBowtie-cuda10/nvbio/nvbio/basic/threads.h:116
#25 0x00007ffff7bc16ba in start_thread (arg=0x7fff80d61700) at pthread_create.c:333
#26 0x00007ffff6a0e41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

@ps-account
Author

Now with cuda-gdb. Another odd thing is that this issue happens on Pascal but not on Maxwell.
The exception is raised in finish_alignment_kernel, so that kernel seems to be where the problem is:


info    : [0] aligning reads [168820736, 169869311]
verbose : [0]   1048576 reads
verbose : [0]   209.715 M bps (300.0 MB)
verbose : [0]   100.0 bps/read (min: 100, max: 100)
verbose : [0]   26.8 K reads/s
info    : [0] aligning reads [169869312, 170758330]
verbose : [0]   889019 reads
verbose : [0]   177.764 M bps (254.3 MB)
verbose : [0]   100.0 bps/read (min: 100, max: 100)

CUDA Exception: Warp Out-of-range Address
The exception was triggered at PC 0x562ebd0

Thread 17 "nvBowtie" received signal CUDA_EXCEPTION_5, Warp Out-of-range Address.
[Switching focus to CUDA kernel 156, grid 433544, block (5731,0,0), thread (78,0,0), device 0, sm 18, warp 40, lane 14]
0x000000000562ebf0 in nvbio::bowtie2::cuda::detail::finish_alignment_kernel<nvbio::bowtie2::cuda::detail::BestTracebackStream<0u, nvbio::aln::GotohAligner<(nvbio::aln::AlignmentType)1, nvbio::bowtie2::cuda::SmithWatermanScoringScheme<nvbio::bowtie2::cuda::QualCost<int>, nvbio::bowtie2::cuda::ConstantCost<int> >, nvbio::aln::PatternBlockingTag>, nvbio::bowtie2::cuda::TracebackPipelineState<nvbio::bowtie2::cuda::SmithWatermanScoringScheme<nvbio::bowtie2::cuda::QualCost<int>, nvbio::bowtie2::cuda::ConstantCost<int> > > >, nvbio::bowtie2::cuda::SmithWatermanScoringScheme<nvbio::bowtie2::cuda::QualCost<int>, nvbio::bowtie2::cuda::ConstantCost<int> >, nvbio::bowtie2::cuda::TracebackPipelineState<nvbio::bowtie2::cuda::SmithWatermanScoringScheme<nvbio::bowtie2::cuda::QualCost<int>, nvbio::bowtie2::cuda::ConstantCost<int> > > ><<<(5734,1,1),(96,1,1)>>> ()
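
For reference, the cuda-gdb focus line can be turned into a flat thread index. This is only a sketch: it assumes the kernel uses the usual one-dimensional mapping blockIdx.x * blockDim.x + threadIdx.x, which nothing in this thread confirms for finish_alignment_kernel, and global_thread_index is a hypothetical helper name.

```shell
# Flatten a 1D CUDA (block, thread) pair into a global thread index.
# Assumption: idx = blockIdx.x * blockDim.x + threadIdx.x (not confirmed
# against the nvbio source, which is not shown in this thread).
global_thread_index() {
    local block_x="$1" block_dim_x="$2" thread_x="$3"
    echo $(( block_x * block_dim_x + thread_x ))
}

# Values copied from the cuda-gdb focus line: block (5731,0,0), thread (78,0,0),
# launch configuration <<<(5734,1,1),(96,1,1)>>>.
faulting=$(global_thread_index 5731 96 78)
total=$(( 5734 * 96 ))
echo "faulting thread ${faulting} of ${total} launched"
```

With the values above this prints `faulting thread 550254 of 550464 launched`, so the faulting thread sits in one of the last blocks of the grid (block index 5731 out of 5734 blocks).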

@ps-account
Author

How to reproduce this, producing a truncated SAM file from a small paired-end dataset, assuming you have nvbio installed:

# get arabidopsis from e.g. illumina igenome
wget ftp://igenome:[email protected]/Arabidopsis_thaliana/Ensembl/TAIR10/Arabidopsis_thaliana_Ensembl_TAIR10.tar.gz
# unpack
tar -zxvf Arabidopsis_thaliana_Ensembl_TAIR10.tar.gz
cd Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/WholeGenomeFasta/
# create index
nvBWT -d 1 genome.fa genome-index

cd 
# if you don't have it, download sra toolkit from https://www.ncbi.nlm.nih.gov/sra/docs/toolkitsoft/
~/sratoolkit.2.9.6-1-ubuntu64/bin/prefetch -v ERX3219973
~/sratoolkit.2.9.6-1-ubuntu64/bin/fastq-dump --outdir . --split-files $HOME/ncbi/public/sra/ERX3219973.sra

# now run nvBowtie
nvBowtie -x $HOME/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/WholeGenomeFasta/genome-index  -1 ERX3219973_1.fastq -2 ERX3219973_2.fastq -S ERX3219973.bam

# make sure you have samtools installed, then run 
samtools view ERX3219973.bam | tail -n1
[main_samview] truncated file.
ERX3219973.91 ST-J00101:86:HMYKLBBXX:7:1103:9709:41950 length=150       4       *       0       0       *       *       0   0AACCGGTGAGACTTCCAATGATTGATTCAAATTAACTTCGAAGCTTCCATTTGTTCTTCACTTTGCTGACTGTGTTTATTGTTGGTTACAGGAAGGCAAGGACAATGTTAGAGTCATAGGTATTTTTCTTGACTTGTCTCAGATAAAGGG       AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
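
A quicker way to spot the truncation than reading the tail of the file, as a rough sketch: `samtools quickcheck` checks that a file has a valid header and, for BAM, the end-of-file block. Whether nvBowtie actually wrote BAM or SAM here is not stated in this thread, and check_alignment_file is a hypothetical helper name, not part of nvbio or samtools.

```shell
# Return 0 if the alignment file looks complete, non-zero otherwise.
# check_alignment_file is a hypothetical helper, not an nvbio or samtools command.
check_alignment_file() {
    local file="$1"
    # A missing or empty file is certainly not a complete output.
    if [ ! -s "$file" ]; then
        echo "missing or empty: $file" >&2
        return 2
    fi
    # quickcheck exits non-zero on failure; -v prints the names of failing files.
    if command -v samtools >/dev/null 2>&1; then
        samtools quickcheck -v "$file"
        return $?
    fi
    # Without samtools nothing more can be verified here.
    echo "samtools not found; skipped quickcheck for $file" >&2
    return 3
}

# Example use after an nvBowtie run:
#   check_alignment_file ERX3219973.bam || echo "output looks incomplete"
```

A non-zero status here only says the output is incomplete. It does not say why the run stopped.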

@ps-account
Author

I also get an error at the start of nvBowtie. Could that be related?

info    : nvBowtie... started
verbose :   cuda devices : 1
verbose :   device 0 has compute capability 6.1
verbose :     SM count          : 20
verbose :     SM clock rate     : 1733 Mhz
verbose :     memory clock rate : 4.5 Ghz
verbose :   chosen device 0
verbose :     device name        : Quadro P5000
verbose :     compute capability : 6.1
visible : mapping reference index... started
info    :   file: "genome-index"
info    : SequenceDataMMAP: error mapping file "/nvbio.genome-index.seq_info" (2)!
visible : mapping reference index... failed
visible : loading reference index... started
info    :   file: "genome-index"
visible : loading reference index... done
visible : FMIndexData: loading... started
visible :   genome : genome-index
info    : reading bwt... started
info    : reading bwt... done
verbose :   length: 119667750
info    : building occurrence table... started

@teepean

teepean commented Oct 21, 2020

I have experienced the same problem. Running strace shows the following error:

openat(AT_FDCWD, "/dev/shm/nvbio.hs37d5-index.seq_info", O_RDONLY|O_NOFOLLOW|O_CLOEXEC) = -1 ENOENT (No such file or directory)

EDIT: This error occurs when the shared-memory server is not running, so in my case I ran:

./nvFM-server hs37d5-index hs37d5 &
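
To check up front whether a shared-memory copy exists, something like the following might do. It is a sketch that only tests for the file named in the strace line above (/dev/shm/nvbio.<name>.seq_info); nvbio_index_is_mapped is a hypothetical helper name, and the second argument exists only so the check can be pointed at another directory. Note that in the startup log earlier in this thread the failed mapping is followed by "loading reference index... done", so the message by itself did not stop that run.

```shell
# Return 0 if a shared-memory file for the index name "$1" exists.
# The path pattern comes from the strace output; shm_dir defaults to /dev/shm.
# nvbio_index_is_mapped is a hypothetical helper, not an nvbio tool.
nvbio_index_is_mapped() {
    local name="$1" shm_dir="${2:-/dev/shm}"
    [ -e "${shm_dir}/nvbio.${name}.seq_info" ]
}

# Example use, with the name taken from the strace line:
#   nvbio_index_is_mapped hs37d5-index || echo "no shared-memory copy found"
```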
