sambamba markdup deadlock #189
Actually, it appears to be ignoring "-t 1" and still creating 16 threads. The "-t" option only seems to have an effect if it is greater than the number of cores on the machine (so "-t 32" results in 32 threads but "-t 1" still gets 16 threads).
From top, running with
I understand there is a main thread and a single task pool thread, but what are all those other threads doing?
We are running the v0.5.8 release binary for Linux x64.
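For anyone who wants to check the thread count without reading it off top, here is a minimal Python sketch. It assumes a Linux system where each thread of a process appears as a numeric directory under /proc/<pid>/task; the function name `count_threads` and the `proc_root` parameter are made up for illustration and are not part of sambamba.

```python
from pathlib import Path


def count_threads(pid: int, proc_root: str = "/proc") -> int:
    """Count the threads of a process by listing <proc_root>/<pid>/task.

    Assumption: Linux procfs layout, where every thread ID (TID) of the
    process is a numeric subdirectory of the task/ directory.
    """
    task_dir = Path(proc_root) / str(pid) / "task"
    # Only numeric entries are thread IDs; ignore anything else.
    return sum(1 for entry in task_dir.iterdir() if entry.name.isdigit())
```

Calling `count_threads(pid)` with the PID of a running sambamba process before and after changing "-t" should show whether the option is being honoured. The `proc_root` parameter exists only so the function can be tried against a fake directory tree.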
Thanks for the excellent report! There are two task pools being created here, where one of them mostly sits idle. I fixed this in the
Could it be that you're hitting this bug? http://www.infoq.com/news/2015/05/redhat-futex
On the system in question, we're running Linux 3.19.0-49-generic on Ubuntu 14.04. It's a little hard to tell from the article exactly what's affected, but since the bug was introduced in Linux 3.10 it seems like we're probably new enough to be safe.
Hi there, We are hitting the same bug on certain systems. We've checked the kernel bug in question, and confirmed that our systems include the patch. In addition to hanging, we get the following stack trace in /var/log/messages:
With the same input BAMs it fails repeatedly, but regenerating the BAMs can often fix the issue. We're still debugging, but we've also noticed that this error doesn't happen on XenServer hypervisors, only on KVM. Is there anything we could do to generate more information? Thanks in advance!
@Seth-Karlo thanks for putting effort into debugging, 'repeatedly' sounds promising to me.
Sure thing, I'm waiting for another run to fail again. Once it does I'll do the test and report back my findings.
@lomereiter I'm a colleague of Seth-Karlo and followed up on one of the hanging sambamba processes today. I've taken some log snippets and created a stack trace for the deadlocked process. I've also rescheduled the same job with --show-progress, only to find it finished without fault: I hope the stack trace can help you root-cause the problem.
It would be good to have the strace output prior to the deadlock (to have some context).
I've tried to get some more data on the deadlock situation, but that just seems to confirm we're still hitting the previously mentioned kernel bug. This article says attaching gdb or strace will wake up the application: https://access.redhat.com/solutions/1386323
Absence of errors with attached
For now I've been running summarized straces on the processes just to keep them running.
98.65 21938.838735 109 200371614 39874878 futex
100.00 22238.076002 202592160 40093885 total
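The two summary rows above look like output of `strace -c`, whose columns are % time, seconds, usecs/call, calls, errors and syscall. Below is a minimal Python sketch for reading such rows and working out what fraction of futex calls returned an error. The names `SyscallRow` and `parse_strace_summary_row` are hypothetical, and the handling of four-number rows is an assumption based on the "total" row shown above, which has no usecs/call value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyscallRow:
    pct_time: float
    seconds: float
    usecs_per_call: Optional[int]
    calls: int
    errors: int
    name: str


def parse_strace_summary_row(line: str) -> SyscallRow:
    """Parse one data row of an `strace -c` summary table."""
    *numbers, name = line.split()
    if len(numbers) == 5:
        pct, secs, usecs, calls, errors = numbers
        return SyscallRow(float(pct), float(secs), int(usecs),
                          int(calls), int(errors), name)
    if len(numbers) == 4 and name == "total":
        # Assumption: this "total" row omits usecs/call, as in the thread.
        pct, secs, calls, errors = numbers
        return SyscallRow(float(pct), float(secs), None,
                          int(calls), int(errors), name)
    if len(numbers) == 4:
        # Assumption: a syscall row with an empty errors column.
        pct, secs, usecs, calls = numbers
        return SyscallRow(float(pct), float(secs), int(usecs),
                          int(calls), 0, name)
    raise ValueError(f"unrecognised strace summary row: {line!r}")


def error_fraction(row: SyscallRow) -> float:
    """Fraction of calls that returned an error (0.0 if there were no calls)."""
    return row.errors / row.calls if row.calls else 0.0
```

Applied to the futex row quoted above, `error_fraction` comes out at roughly one failed call in five, which gives a rough sense of how much futex traffic the process generates while strace is attached.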
We also seem to be seeing this issue on Here is the
Thanks @rtnh for the nice gist w/ backtraces (from gdb I guess, right?).
Couldn't reproduce this w/ sambamba-0.5.8, VirtualBox, a Core i7-6700, and debian-7.4/linux 3.2.0.
@MartinNowak what makes you think this is broken in the 3.10 version mentioned in the Gist? Both issues appear to have their fixes in:
If you grab the SRPM and apply patches you can also see the relevant changes in
I am willing to look at this bug if anyone is still facing it.
This is still an issue for us, causing a lot of seemingly stochastic failures, so it is hard to pin down. Samtools has recently introduced a markdup subcommand, so we've switched to that.
Thanks for reporting. Reopened because I am rewriting markdup combined with subsampling.
that would be ideal!!!
@isthisthat can you tell me what CPU markdup is crashing on? Is it a Xeon 26xx series?
Hi @pjotrp (I've been looking at this with @isthisthat): yes (via c4 AWS EC2 instances). Just to clarify: it is not crashing but deadlocking indefinitely.
These are Intel Xeon E5-2666 v3 and should not suffer from the particular hardware bug we encountered in #335. It is interesting that it only appears on KVM instances (reported above). I think we ought to try a new release that will come out in the coming days because it uses the latest LDC and LLVM. If that does not fix it we can consider replacing the reader, which appears to have the deadlock. I have written a new reader and it would be interesting to try.
Can you check whether the new release still shows this behaviour?
sambamba version: 0.7.1; I downloaded the binary from the releases page and uncompressed it.
LSB Version: core-2.0-amd64:core-2.0-noarch:core-3.0-amd64:core-3.0-noarch:core-3.1-amd64:core-3.1-noarch:core-3.2-amd64:core-3.2-noarch:core-4.0-amd64:core-4.0-noarch:core-4.1-amd64:core-4.1-noarch:security-4.0-amd64:security-4.0-noarch:security-4.1-amd64:security-4.1-noarch
Distributor ID: Ubuntu
Description: Ubuntu 14.04.6 LTS
Release: 14.04
Codename: trusty
At Feb 10 01:39 I ran the following command:
$ /home/luna/Desktop/Software/sambamba/build/sambamba markdup -t 16 -l 9 --tmpdir ./ --sort-buffer-size 4096 ./Bulk.mem.sort.bam ./Bulk.mem.sort.mkdup.bam
But it deadlocked at Feb 10 01:40. There are just 42 bam files:
$ ls ~/work/TempChimera/Bulk/sambatmp/sambamba-pid70545-markdup-oerh
PairedEndsInforbsv0 sorted.13.bam.idx sorted.18.bam.idx sorted.22.bam.idx sorted.27.bam.idx sorted.31.bam.idx sorted.36.bam.idx sorted.40.bam.idx sorted.6.bam.idx
sorted.0.bam sorted.14.bam sorted.19.bam sorted.23.bam sorted.28.bam sorted.32.bam sorted.37.bam sorted.41.bam sorted.7.bam
sorted.0.bam.idx sorted.14.bam.idx sorted.19.bam.idx sorted.23.bam.idx sorted.28.bam.idx sorted.32.bam.idx sorted.37.bam.idx sorted.41.bam.idx sorted.7.bam.idx
sorted.10.bam sorted.15.bam sorted.1.bam sorted.24.bam sorted.29.bam sorted.33.bam sorted.38.bam sorted.42.bam sorted.8.bam
sorted.10.bam.idx sorted.15.bam.idx sorted.1.bam.idx sorted.24.bam.idx sorted.29.bam.idx sorted.33.bam.idx sorted.38.bam.idx sorted.42.bam.idx sorted.8.bam.idx
sorted.11.bam sorted.16.bam sorted.20.bam sorted.25.bam sorted.2.bam sorted.34.bam sorted.39.bam sorted.4.bam sorted.9.bam
sorted.11.bam.idx sorted.16.bam.idx sorted.20.bam.idx sorted.25.bam.idx sorted.2.bam.idx sorted.34.bam.idx sorted.39.bam.idx sorted.4.bam.idx sorted.9.bam.idx
sorted.12.bam sorted.17.bam sorted.21.bam sorted.26.bam sorted.30.bam sorted.35.bam sorted.3.bam sorted.5.bam
sorted.12.bam.idx sorted.17.bam.idx sorted.21.bam.idx sorted.26.bam.idx sorted.30.bam.idx sorted.35.bam.idx sorted.3.bam.idx sorted.5.bam.idx
sorted.13.bam sorted.18.bam sorted.22.bam sorted.27.bam sorted.31.bam sorted.36.bam sorted.40.bam sorted.6.bam
I checked /var/log/messages and found this:
Feb 10 01:46:09 bioinfo vmunix: [2800262.044005] ------------[ cut here ]------------
Feb 10 01:46:09 bioinfo vmunix: [2800262.044027] WARNING: CPU: 13 PID: 70577 at /build/linux-lts-xenial-lRzcrX/linux-lts-xenial-4.4.0/arch/x86/include/asm/thread_info.h:226 sigsuspend+0x6d/0x70()
Feb 10 01:46:09 bioinfo vmunix: [2800262.044031] Modules linked in: nfsv3 xt_multiport ipmi_devintf ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 ipmi_ssif lrw gf128mul glue_helper ablk_helper cryptd joydev input_leds ast ttm drm_kms_helper sb_edac drm edac_core fb_sys_fops syscopyarea sysfillrect sysimgblt mei_me mei lpc_ich shpchp wmi ipmi_si 8250_fintek rfcomm ipmi_msghandler bnep bluetooth parport_pc ppdev acpi_pad mac_hid knem(OE) lp parport nfsd auth_rpcgss nfs_acl binfmt_misc nfs lockd grace sunrpc fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(
Feb 10 01:46:09 bioinfo vmunix: onfigfs ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_ib(OE) ib_uverbs(OE) ib_core(OE) mlx4_en(OE) vxlan ip6_udp_tunnel udp_tunnel raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic mlx4_core(OE) igb mlx_compat(OE) usbhid mpt3sas i2c_algo_bit ahci dca raid_class hid libahci ptp scsi_transport_sas raid6_pq pps_core libcrc32c raid1 raid0 multipath linear fjes
Feb 10 01:46:09 bioinfo vmunix: [2800262.044237] CPU: 13 PID: 70577 Comm: sambamba Tainted: G W OE 4.4.0-148-generic #174~14.04.1-Ubuntu
Feb 10 01:46:09 bioinfo vmunix: [2800262.044242] Hardware name: Sugon I840-G20/80B32-U4/H, BIOS 2.57 05/15/2018
Feb 10 01:46:09 bioinfo vmunix: [2800262.044246] 0000000000000000 ffff881625847ed0 ffffffff813eee37 0000000000000000
Feb 10 01:46:09 bioinfo vmunix: [2800262.044255] ffffffff81cc5118 ffff881625847f08 ffffffff810829f6 ffff882038e82a00
Feb 10 01:46:09 bioinfo vmunix: [2800262.044263] 00007fc2c57f8c50 00007fc32c6f12b8 00007fc2c57f85a8 0000000000000001
Feb 10 01:46:09 bioinfo vmunix: [2800262.044270] Call Trace:
Feb 10 01:46:09 bioinfo vmunix: [2800262.044288] [<ffffffff813eee37>] dump_stack+0x63/0x8c
Feb 10 01:46:09 bioinfo vmunix: [2800262.044297] [<ffffffff810829f6>] warn_slowpath_common+0x86/0xc0
Feb 10 01:46:09 bioinfo vmunix: [2800262.044303] [<ffffffff81082aea>] warn_slowpath_null+0x1a/0x20
Feb 10 01:46:09 bioinfo vmunix: [2800262.044310] [<ffffffff81092aed>] sigsuspend+0x6d/0x70
Feb 10 01:46:09 bioinfo vmunix: [2800262.044318] [<ffffffff81094140>] SyS_rt_sigsuspend+0x40/0x50
Feb 10 01:46:09 bioinfo vmunix: [2800262.044332] [<ffffffff8182d61b>] entry_SYSCALL_64_fastpath+0x22/0xcb
Feb 10 01:46:09 bioinfo vmunix: [2800262.044337] ---[ end trace 042f2041827a6556 ]---
I tried version 0.8.0 and it still deadlocked. The log:
Feb 10 19:17:10 bioinfo vmunix: [2863321.656228] WARNING: CPU: 67 PID: 16791 at /build/linux-lts-xenial-lRzcrX/linux-lts-xenial-4.4.0/arch/x86/include/asm/thread_info.h:226 sigsuspend+0x6d/0x70()
Feb 10 19:17:10 bioinfo vmunix: [2863321.656231] Modules linked in: nfsv3 xt_multiport ipmi_devintf ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 ipmi_ssif lrw gf128mul glue_helper ablk_helper cryptd joydev input_leds ast ttm drm_kms_helper sb_edac drm edac_core fb_sys_fops syscopyarea sysfillrect sysimgblt mei_me mei lpc_ich shpchp wmi ipmi_si 8250_fintek rfcomm ipmi_msghandler bnep bluetooth parport_pc ppdev acpi_pad mac_hid knem(OE) lp parport nfsd auth_rpcgss nfs_acl binfmt_misc nfs lockd grace sunrpc fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(
Feb 10 19:17:10 bioinfo vmunix: onfigfs ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_ib(OE) ib_uverbs(OE) ib_core(OE) mlx4_en(OE) vxlan ip6_udp_tunnel udp_tunnel raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic mlx4_core(OE) igb mlx_compat(OE) usbhid mpt3sas i2c_algo_bit ahci dca raid_class hid libahci ptp scsi_transport_sas raid6_pq pps_core libcrc32c raid1 raid0 multipath linear fjes
Feb 10 19:17:10 bioinfo vmunix: [2863321.656402] CPU: 67 PID: 16791 Comm: sambamba Tainted: G W OE 4.4.0-148-generic #174~14.04.1-Ubuntu
Feb 10 19:17:10 bioinfo vmunix: [2863321.656405] Hardware name: Sugon I840-G20/80B32-U4/H, BIOS 2.57 05/15/2018
Feb 10 19:17:10 bioinfo vmunix: [2863321.656408] 0000000000000000 ffff88185a7f3ed0 ffffffff813eee37 0000000000000000
Feb 10 19:17:10 bioinfo vmunix: [2863321.656413] ffffffff81cc5118 ffff88185a7f3f08 ffffffff810829f6 ffff8840380ee200
Feb 10 19:17:10 bioinfo vmunix: [2863321.656417] 000000000001a05b 00007f521f066018 00007f51c8ff75e8 00007f521f066144
Feb 10 19:17:10 bioinfo vmunix: [2863321.656422] Call Trace:
Feb 10 19:17:10 bioinfo vmunix: [2863321.656436] [<ffffffff813eee37>] dump_stack+0x63/0x8c
Feb 10 19:17:10 bioinfo vmunix: [2863321.656443] [<ffffffff810829f6>] warn_slowpath_common+0x86/0xc0
Feb 10 19:17:10 bioinfo vmunix: [2863321.656446] [<ffffffff81082aea>] warn_slowpath_null+0x1a/0x20
Feb 10 19:17:10 bioinfo vmunix: [2863321.656450] [<ffffffff81092aed>] sigsuspend+0x6d/0x70
Feb 10 19:17:10 bioinfo vmunix: [2863321.656455] [<ffffffff81094140>] SyS_rt_sigsuspend+0x40/0x50
Feb 10 19:17:10 bioinfo vmunix: [2863321.656466] [<ffffffff8182d61b>] entry_SYSCALL_64_fastpath+0x22/0xcb
Feb 10 19:17:10 bioinfo vmunix: [2863321.656469] ---[ end trace 042f2041827a655d ]---
I then tried version 0.6.6 to test whether it works, but it still deadlocked, and there are 298 bam files:
$ ls /home/luna/work/TempChimera/Bulk/sam/sambamba-pid17263-markdup-xxmk
PairedEndsInfocdty0 sorted.130.bam sorted.165.bam sorted.19.bam sorted.233.bam sorted.268.bam sorted.32.bam sorted.67.bam
PairedEndsInfocdty1 sorted.130.bam.idx sorted.165.bam.idx sorted.19.bam.idx sorted.233.bam.idx sorted.268.bam.idx sorted.32.bam.idx sorted.67.bam.idx
PairedEndsInfocdty2 sorted.131.bam sorted.166.bam sorted.1.bam sorted.234.bam sorted.269.bam sorted.33.bam sorted.68.bam
PairedEndsInfocdty3 sorted.131.bam.idx sorted.166.bam.idx sorted.1.bam.idx sorted.234.bam.idx sorted.269.bam.idx sorted.33.bam.idx sorted.68.bam.idx
PairedEndsInfocdty4 sorted.132.bam sorted.167.bam sorted.200.bam sorted.235.bam sorted.26.bam sorted.34.bam sorted.69.bam
PairedEndsInfocdty5 sorted.132.bam.idx sorted.167.bam.idx sorted.200.bam.idx sorted.235.bam.idx sorted.26.bam.idx sorted.34.bam.idx sorted.69.bam.idx
PairedEndsInfocdty6 sorted.133.bam sorted.168.bam sorted.201.bam sorted.236.bam sorted.270.bam sorted.35.bam sorted.6.bam
SingleEndBasicInfofozu0 sorted.133.bam.idx sorted.168.bam.idx sorted.201.bam.idx sorted.236.bam.idx sorted.270.bam.idx sorted.35.bam.idx sorted.6.bam.idx
sorted.0.bam sorted.134.bam sorted.169.bam sorted.202.bam sorted.237.bam sorted.271.bam sorted.36.bam sorted.70.bam
sorted.0.bam.idx sorted.134.bam.idx sorted.169.bam.idx sorted.202.bam.idx sorted.237.bam.idx sorted.271.bam.idx sorted.36.bam.idx sorted.70.bam.idx
sorted.100.bam sorted.135.bam sorted.16.bam sorted.203.bam sorted.238.bam sorted.272.bam sorted.37.bam sorted.71.bam
sorted.100.bam.idx sorted.135.bam.idx sorted.16.bam.idx sorted.203.bam.idx sorted.238.bam.idx sorted.272.bam.idx sorted.37.bam.idx sorted.71.bam.idx
sorted.101.bam sorted.136.bam sorted.170.bam sorted.204.bam sorted.239.bam sorted.273.bam sorted.38.bam sorted.72.bam
sorted.101.bam.idx sorted.136.bam.idx sorted.170.bam.idx sorted.204.bam.idx sorted.239.bam.idx sorted.273.bam.idx sorted.38.bam.idx sorted.72.bam.idx
sorted.102.bam sorted.137.bam sorted.171.bam sorted.205.bam sorted.23.bam sorted.274.bam sorted.39.bam sorted.73.bam
sorted.102.bam.idx sorted.137.bam.idx sorted.171.bam.idx sorted.205.bam.idx sorted.23.bam.idx sorted.274.bam.idx sorted.39.bam.idx sorted.73.bam.idx
sorted.103.bam sorted.138.bam sorted.172.bam sorted.206.bam sorted.240.bam sorted.275.bam sorted.3.bam sorted.74.bam
sorted.103.bam.idx sorted.138.bam.idx sorted.172.bam.idx sorted.206.bam.idx sorted.240.bam.idx sorted.275.bam.idx sorted.3.bam.idx sorted.74.bam.idx
sorted.104.bam sorted.139.bam sorted.173.bam sorted.207.bam sorted.241.bam sorted.276.bam sorted.40.bam sorted.75.bam
sorted.104.bam.idx sorted.139.bam.idx sorted.173.bam.idx sorted.207.bam.idx sorted.241.bam.idx sorted.276.bam.idx sorted.40.bam.idx sorted.75.bam.idx
sorted.105.bam sorted.13.bam sorted.174.bam sorted.208.bam sorted.242.bam sorted.277.bam sorted.41.bam sorted.76.bam
sorted.105.bam.idx sorted.13.bam.idx sorted.174.bam.idx sorted.208.bam.idx sorted.242.bam.idx sorted.277.bam.idx sorted.41.bam.idx sorted.76.bam.idx
sorted.106.bam sorted.140.bam sorted.175.bam sorted.209.bam sorted.243.bam sorted.278.bam sorted.42.bam sorted.77.bam
sorted.106.bam.idx sorted.140.bam.idx sorted.175.bam.idx sorted.209.bam.idx sorted.243.bam.idx sorted.278.bam.idx sorted.42.bam.idx sorted.77.bam.idx
sorted.107.bam sorted.141.bam sorted.176.bam sorted.20.bam sorted.244.bam sorted.279.bam sorted.43.bam sorted.78.bam
sorted.107.bam.idx sorted.141.bam.idx sorted.176.bam.idx sorted.20.bam.idx sorted.244.bam.idx sorted.279.bam.idx sorted.43.bam.idx sorted.78.bam.idx
sorted.108.bam sorted.142.bam sorted.177.bam sorted.210.bam sorted.245.bam sorted.27.bam sorted.44.bam sorted.79.bam
sorted.108.bam.idx sorted.142.bam.idx sorted.177.bam.idx sorted.210.bam.idx sorted.245.bam.idx sorted.27.bam.idx sorted.44.bam.idx sorted.79.bam.idx
sorted.109.bam sorted.143.bam sorted.178.bam sorted.211.bam sorted.246.bam sorted.280.bam sorted.45.bam sorted.7.bam
sorted.109.bam.idx sorted.143.bam.idx sorted.178.bam.idx sorted.211.bam.idx sorted.246.bam.idx sorted.280.bam.idx sorted.45.bam.idx sorted.7.bam.idx
sorted.10.bam sorted.144.bam sorted.179.bam sorted.212.bam sorted.247.bam sorted.281.bam sorted.46.bam sorted.80.bam
sorted.10.bam.idx sorted.144.bam.idx sorted.179.bam.idx sorted.212.bam.idx sorted.247.bam.idx sorted.281.bam.idx sorted.46.bam.idx sorted.80.bam.idx
sorted.110.bam sorted.145.bam sorted.17.bam sorted.213.bam sorted.248.bam sorted.282.bam sorted.47.bam sorted.81.bam
sorted.110.bam.idx sorted.145.bam.idx sorted.17.bam.idx sorted.213.bam.idx sorted.248.bam.idx sorted.282.bam.idx sorted.47.bam.idx sorted.81.bam.idx
sorted.111.bam sorted.146.bam sorted.180.bam sorted.214.bam sorted.249.bam sorted.283.bam sorted.48.bam sorted.82.bam
sorted.111.bam.idx sorted.146.bam.idx sorted.180.bam.idx sorted.214.bam.idx sorted.249.bam.idx sorted.283.bam.idx sorted.48.bam.idx sorted.82.bam.idx
sorted.112.bam sorted.147.bam sorted.181.bam sorted.215.bam sorted.24.bam sorted.284.bam sorted.49.bam sorted.83.bam
sorted.112.bam.idx sorted.147.bam.idx sorted.181.bam.idx sorted.215.bam.idx sorted.24.bam.idx sorted.284.bam.idx sorted.49.bam.idx sorted.83.bam.idx
sorted.113.bam sorted.148.bam sorted.182.bam sorted.216.bam sorted.250.bam sorted.285.bam sorted.4.bam sorted.84.bam
sorted.113.bam.idx sorted.148.bam.idx sorted.182.bam.idx sorted.216.bam.idx sorted.250.bam.idx sorted.285.bam.idx sorted.4.bam.idx sorted.84.bam.idx
sorted.114.bam sorted.149.bam sorted.183.bam sorted.217.bam sorted.251.bam sorted.286.bam sorted.50.bam sorted.85.bam
sorted.114.bam.idx sorted.149.bam.idx sorted.183.bam.idx sorted.217.bam.idx sorted.251.bam.idx sorted.286.bam.idx sorted.50.bam.idx sorted.85.bam.idx
sorted.115.bam sorted.14.bam sorted.184.bam sorted.218.bam sorted.252.bam sorted.287.bam sorted.51.bam sorted.86.bam
sorted.115.bam.idx sorted.14.bam.idx sorted.184.bam.idx sorted.218.bam.idx sorted.252.bam.idx sorted.287.bam.idx sorted.51.bam.idx sorted.86.bam.idx
sorted.116.bam sorted.150.bam sorted.185.bam sorted.219.bam sorted.253.bam sorted.288.bam sorted.52.bam sorted.87.bam
sorted.116.bam.idx sorted.150.bam.idx sorted.185.bam.idx sorted.219.bam.idx sorted.253.bam.idx sorted.288.bam.idx sorted.52.bam.idx sorted.87.bam.idx
sorted.117.bam sorted.151.bam sorted.186.bam sorted.21.bam sorted.254.bam sorted.289.bam sorted.53.bam sorted.88.bam
sorted.117.bam.idx sorted.151.bam.idx sorted.186.bam.idx sorted.21.bam.idx sorted.254.bam.idx sorted.289.bam.idx sorted.53.bam.idx sorted.88.bam.idx
sorted.118.bam sorted.152.bam sorted.187.bam sorted.220.bam sorted.255.bam sorted.28.bam sorted.54.bam sorted.89.bam
sorted.118.bam.idx sorted.152.bam.idx sorted.187.bam.idx sorted.220.bam.idx sorted.255.bam.idx sorted.28.bam.idx sorted.54.bam.idx sorted.89.bam.idx
sorted.119.bam sorted.153.bam sorted.188.bam sorted.221.bam sorted.256.bam sorted.290.bam sorted.55.bam sorted.8.bam
sorted.119.bam.idx sorted.153.bam.idx sorted.188.bam.idx sorted.221.bam.idx sorted.256.bam.idx sorted.290.bam.idx sorted.55.bam.idx sorted.8.bam.idx
sorted.11.bam sorted.154.bam sorted.189.bam sorted.222.bam sorted.257.bam sorted.291.bam sorted.56.bam sorted.90.bam
sorted.11.bam.idx sorted.154.bam.idx sorted.189.bam.idx sorted.222.bam.idx sorted.257.bam.idx sorted.291.bam.idx sorted.56.bam.idx sorted.90.bam.idx
sorted.120.bam sorted.155.bam sorted.18.bam sorted.223.bam sorted.258.bam sorted.292.bam sorted.57.bam sorted.91.bam
sorted.120.bam.idx sorted.155.bam.idx sorted.18.bam.idx sorted.223.bam.idx sorted.258.bam.idx sorted.292.bam.idx sorted.57.bam.idx sorted.91.bam.idx
sorted.121.bam sorted.156.bam sorted.190.bam sorted.224.bam sorted.259.bam sorted.293.bam sorted.58.bam sorted.92.bam
sorted.121.bam.idx sorted.156.bam.idx sorted.190.bam.idx sorted.224.bam.idx sorted.259.bam.idx sorted.293.bam.idx sorted.58.bam.idx sorted.92.bam.idx
sorted.122.bam sorted.157.bam sorted.191.bam sorted.225.bam sorted.25.bam sorted.294.bam sorted.59.bam sorted.93.bam
sorted.122.bam.idx sorted.157.bam.idx sorted.191.bam.idx sorted.225.bam.idx sorted.25.bam.idx sorted.294.bam.idx sorted.59.bam.idx sorted.93.bam.idx
sorted.123.bam sorted.158.bam sorted.192.bam sorted.226.bam sorted.260.bam sorted.295.bam sorted.5.bam sorted.94.bam
sorted.123.bam.idx sorted.158.bam.idx sorted.192.bam.idx sorted.226.bam.idx sorted.260.bam.idx sorted.295.bam.idx sorted.5.bam.idx sorted.94.bam.idx
sorted.124.bam sorted.159.bam sorted.193.bam sorted.227.bam sorted.261.bam sorted.296.bam sorted.60.bam sorted.95.bam
sorted.124.bam.idx sorted.159.bam.idx sorted.193.bam.idx sorted.227.bam.idx sorted.261.bam.idx sorted.296.bam.idx sorted.60.bam.idx sorted.95.bam.idx
sorted.125.bam sorted.15.bam sorted.194.bam sorted.228.bam sorted.262.bam sorted.297.bam sorted.61.bam sorted.96.bam
sorted.125.bam.idx sorted.15.bam.idx sorted.194.bam.idx sorted.228.bam.idx sorted.262.bam.idx sorted.297.bam.idx sorted.61.bam.idx sorted.96.bam.idx
sorted.126.bam sorted.160.bam sorted.195.bam sorted.229.bam sorted.263.bam sorted.298.bam sorted.62.bam sorted.97.bam
sorted.126.bam.idx sorted.160.bam.idx sorted.195.bam.idx sorted.229.bam.idx sorted.263.bam.idx sorted.298.bam.idx sorted.62.bam.idx sorted.97.bam.idx
sorted.127.bam sorted.161.bam sorted.196.bam sorted.22.bam sorted.264.bam sorted.29.bam sorted.63.bam sorted.98.bam
sorted.127.bam.idx sorted.161.bam.idx sorted.196.bam.idx sorted.22.bam.idx sorted.264.bam.idx sorted.29.bam.idx sorted.63.bam.idx sorted.98.bam.idx
sorted.128.bam sorted.162.bam sorted.197.bam sorted.230.bam sorted.265.bam sorted.2.bam sorted.64.bam sorted.99.bam
sorted.128.bam.idx sorted.162.bam.idx sorted.197.bam.idx sorted.230.bam.idx sorted.265.bam.idx sorted.2.bam.idx sorted.64.bam.idx sorted.99.bam.idx
sorted.129.bam sorted.163.bam sorted.198.bam sorted.231.bam sorted.266.bam sorted.30.bam sorted.65.bam sorted.9.bam
sorted.129.bam.idx sorted.163.bam.idx sorted.198.bam.idx sorted.231.bam.idx sorted.266.bam.idx sorted.30.bam.idx sorted.65.bam.idx sorted.9.bam.idx
sorted.12.bam sorted.164.bam sorted.199.bam sorted.232.bam sorted.267.bam sorted.31.bam sorted.66.bam
sorted.12.bam.idx sorted.164.bam.idx sorted.199.bam.idx sorted.232.bam.idx sorted.267.bam.idx sorted.31.bam.idx sorted.66.bam.idx
The log messages:
Feb 10 19:29:46 bioinfo vmunix: [2864077.314148] ------------[ cut here ]------------
Feb 10 19:29:46 bioinfo vmunix: [2864077.314167] WARNING: CPU: 13 PID: 17291 at /build/linux-lts-xenial-lRzcrX/linux-lts-xenial-4.4.0/arch/x86/include/asm/thread_info.h:226 sigsuspend+0x6d/0x70()
Feb 10 19:29:46 bioinfo vmunix: [2864077.314170] Modules linked in: nfsv3 xt_multiport ipmi_devintf ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 ipmi_ssif lrw gf128mul glue_helper ablk_helper cryptd joydev input_leds ast ttm drm_kms_helper sb_edac drm edac_core fb_sys_fops syscopyarea sysfillrect sysimgblt mei_me mei lpc_ich shpchp wmi ipmi_si 8250_fintek rfcomm ipmi_msghandler bnep bluetooth parport_pc ppdev acpi_pad mac_hid knem(OE) lp parport nfsd auth_rpcgss nfs_acl binfmt_misc nfs lockd grace sunrpc fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(
Feb 10 19:29:46 bioinfo vmunix: onfigfs ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_ib(OE) ib_uverbs(OE) ib_core(OE) mlx4_en(OE) vxlan ip6_udp_tunnel udp_tunnel raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic mlx4_core(OE) igb mlx_compat(OE) usbhid mpt3sas i2c_algo_bit ahci dca raid_class hid libahci ptp scsi_transport_sas raid6_pq pps_core libcrc32c raid1 raid0 multipath linear fjes
Feb 10 19:29:46 bioinfo vmunix: [2864077.314320] CPU: 13 PID: 17291 Comm: sambamba.bak Tainted: G W OE 4.4.0-148-generic #174~14.04.1-Ubuntu
Feb 10 19:29:46 bioinfo vmunix: [2864077.314323] Hardware name: Sugon I840-G20/80B32-U4/H, BIOS 2.57 05/15/2018
Feb 10 19:29:46 bioinfo vmunix: [2864077.314325] 0000000000000000 ffff881d52a6fed0 ffffffff813eee37 0000000000000000
Feb 10 19:29:46 bioinfo vmunix: [2864077.314329] ffffffff81cc5118 ffff881d52a6ff08 ffffffff810829f6 ffff8840360d0000
Feb 10 19:29:46 bioinfo vmunix: [2864077.314333] 0000000000936c01 00007efe9604d058 00007efe2a7fa6f8 00007efe96045f00
Feb 10 19:29:46 bioinfo vmunix: [2864077.314337] Call Trace:
Feb 10 19:29:46 bioinfo vmunix: [2864077.314350] [<ffffffff813eee37>] dump_stack+0x63/0x8c
Feb 10 19:29:46 bioinfo vmunix: [2864077.314356] [<ffffffff810829f6>] warn_slowpath_common+0x86/0xc0
Feb 10 19:29:46 bioinfo vmunix: [2864077.314359] [<ffffffff81082aea>] warn_slowpath_null+0x1a/0x20
Feb 10 19:29:46 bioinfo vmunix: [2864077.314362] [<ffffffff81092aed>] sigsuspend+0x6d/0x70
Feb 10 19:29:46 bioinfo vmunix: [2864077.314367] [<ffffffff81094140>] SyS_rt_sigsuspend+0x40/0x50
Feb 10 19:29:46 bioinfo vmunix: [2864077.314378] [<ffffffff8182d61b>] entry_SYSCALL_64_fastpath+0x22/0xcb
Feb 10 19:29:46 bioinfo vmunix: [2864077.314381] ---[ end trace 042f2041827a656c ]---
I don't know what happened. Previously this command worked normally, but sometimes it deadlocked.
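All three kernel warnings in this report name the same location (thread_info.h:226, reached from sigsuspend+0x6d/0x70) and differ only in CPU and PID. To compare many such entries without reading each trace by hand, a regular expression along the following lines may help. This is a minimal Python sketch: the pattern is inferred only from the WARNING lines quoted above, and `parse_kernel_warning` is a made-up helper name, not an existing tool.

```python
import re
from typing import Optional

# Pattern inferred from the WARNING lines quoted in this thread; other
# kernel versions may format the line differently.
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<source>\S+):(?P<line>\d+)\s+"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_kernel_warning(text: str) -> Optional[dict]:
    """Extract CPU, PID, source file, line and symbol from a WARNING entry.

    Returns None when the text contains no matching WARNING line. The
    whitespace between the source location and the symbol may be a line
    break, so hard-wrapped log excerpts are matched as well.
    """
    match = WARNING_RE.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "cpu": int(fields["cpu"]),
        "pid": int(fields["pid"]),
        "source": fields["source"],
        "line": int(fields["line"]),
        "symbol": fields["symbol"],
    }
```

Grouping the parsed entries by `(source, line, symbol)` would show at a glance whether every hang on a machine goes through the same kernel code path.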
We are experiencing deadlock in sambamba markdup v0.5.8. When it hits the deadlock, the process is completely stuck and never makes progress. We have experienced it with both -t 1 and -t 16, although it seems to be more prevalent with -t 16.
After some indeterminate amount of time (could be 10 minutes, could be 40 minutes) the CPU usage drops to zero and it never reports anything past the first log message. Using strace indicates that all threads are waiting on a mutex: