-
Notifications
You must be signed in to change notification settings - Fork 3.5k
Description
Hi ,
We observed certain processes getting into the hung state intermittently in one of our production servers running on LS 1.5.0-1 .
Redis version : redis-2.8.3-1.el6.remi.x86_64.
Redis gem : logstash-input-redis-0.1.5,logstash-output-redis-0.1.4.
The issue was observed when there was a data loss over 6 hours , upon deeper analysis this was found from the dmesg output(below)
Logs from dmesg:
[130080.754158] INFO: task jbd2/vda1-8:404 blocked for more than 120 seconds.
[130080.760642] Not tainted 2.6.32-504.3.3.el6.x86_64 #1
[130080.763900] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[130080.768831] jbd2/vda1-8 D 0000000000000000 0 404 2 0x00000000
[130080.772753] ffff8804347f7c20 0000000000000046 ffff8804347f7bb0 ffffffff81041e98
[130080.775250] 00000000ffffffff 00014aef7c132530 0000000000190490 ffff880433d83ad0
[130080.777580] 000000000488db84 ffffffffaa3742d6 ffff880433cda5f8 ffff8804347f7fd8
[130080.779830] Call Trace:
[130080.780769] [] ? pvclock_clocksource_read+0x58/0xd0
[130080.782369] [] ? ktime_get_ts+0xb1/0xf0
[130080.783821] [] ? sync_buffer+0x0/0x50
[130080.785258] [] io_schedule+0x73/0xc0
[130080.786579] [] sync_buffer+0x40/0x50
[130080.787920] [] __wait_on_bit+0x5f/0x90
[130080.789402] [] ? sync_buffer+0x0/0x50
[130080.790815] [] out_of_line_wait_on_bit+0x78/0x90
[130080.792304] [] ? wake_bit_function+0x0/0x50
[130080.793723] [] __wait_on_buffer+0x26/0x30
[130080.795212] [] jbd2_journal_commit_transaction+0x1181/0x1500 [jbd2]
[130080.797359] [] ? autoremove_wake_function+0x0/0x40
[130080.798860] [] kjournald2+0xb8/0x220 [jbd2]
[130080.800291] [] ? autoremove_wake_function+0x0/0x40
[130080.801929] [] ? kjournald2+0x0/0x220 [jbd2]
[130080.803370] [] kthread+0x9e/0xc0
[130080.804686] [] child_rip+0xa/0x20
[130080.805980] [] ? kthread+0x0/0xc0
[130080.807384] [] ? child_rip+0x0/0x20
[130080.808815] INFO: task monit:7737 blocked for more than 120 seconds.
[130080.810374] Not tainted 2.6.32-504.3.3.el6.x86_64 #1
[130080.811792] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[130080.814006] monit D 0000000000000002 0 7737 1 0x00000080
[130080.815728] ffff88043681ddc8 0000000000000082 ffff88043681dd90 ffff88043681dd8c
[130080.818115] 000000000000000e ffff88043fc24500 00007628f0511016 ffff8800282158c0
[130080.820557] 0000000000000400 0000000107b9cf25 ffff880436800638 ffff88043681dfd8
[130080.822826] Call Trace:
[130080.823715] [] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
[130080.825291] [] ? autoremove_wake_function+0x0/0x40
[130080.826805] [] jbd2_complete_transaction+0x68/0xb0 [jbd2]
[130080.828434] [] ext4_sync_file+0x121/0x1d0 [ext4]
[130080.829960] [] vfs_fsync_range+0xa1/0x100
[130080.831305] [] vfs_fsync+0x1d/0x20
[130080.832593] [] do_fsync+0x3e/0x60
[130080.833877] [] sys_fsync+0x10/0x20
[130080.835203] [] system_call_fastpath+0x16/0x1b
[130080.836679] INFO: task redis-server:16342 blocked for more than 120 seconds.
[130080.838347] Not tainted 2.6.32-504.3.3.el6.x86_64 #1
[130080.839740] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[130080.841842] redis-server D 0000000000000000 0 16342 7669 0x00000080
[130080.843495] ffff8802ff523dc8 0000000000000086 ffff8802ff523e70 0000000000000001
[130080.845683] 000000000000000e ffff8802ff523de8 ffff8802ff523d88 ffffffff81123d80
[130080.847905] ffff880418442220 ffff8802ff523de8 ffff880438b9e5f8 ffff8802ff523fd8
[130080.850120] Call Trace:
[130080.850987] [] ? find_get_pages_tag+0x40/0x130
[130080.852500] [] ? prepare_to_wait+0x4e/0x80
[130080.853907] [] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
[130080.855517] [] ? autoremove_wake_function+0x0/0x40
[130080.857042] [] jbd2_complete_transaction+0x68/0xb0 [jbd2]
[130080.858701] [] ext4_sync_file+0x121/0x1d0 [ext4]
[130080.860364] [] vfs_fsync_range+0xa1/0x100
[130080.861914] [] vfs_fsync+0x1d/0x20
[130080.863271] [] do_fsync+0x3e/0x60
[130080.864585] [] sys_fsync+0x10/0x20
[130080.865924] [] system_call_fastpath+0x16/0x1b
[130440.867121] INFO: task auditd:976 blocked for more than 120 seconds.
[130440.887808] Not tainted 2.6.32-504.3.3.el6.x86_64 #1
[130440.889428] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[130440.891858] auditd D 0000000000000002 0 976 1 0x00000000
[130440.893890] ffff8804345c5dc8 0000000000000086 0000000000000000 0000000000000001
[130440.896336] 000000000000000e ffff8804345c5de8 000076857ec710a1 ffffffff81123d80
[130440.898530] ffff88043658c9f0 0000000107bfe1f2 ffff880439b245f8 ffff8804345c5fd8
[130440.900691] Call Trace:
[130440.901570] [] ? find_get_pages_tag+0x40/0x130
[130440.903118] [] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
[130440.904704] [] ? autoremove_wake_function+0x0/0x40
[130440.906237] [] jbd2_complete_transaction+0x68/0xb0 [jbd2]
[130440.907907] [] ext4_sync_file+0x121/0x1d0 [ext4]
[130440.909426] [] vfs_fsync_range+0xa1/0x100
[130440.910797] [] vfs_fsync+0x1d/0x20
[130440.912099] [] do_fsync+0x3e/0x60
[130440.913418] [] sys_fsync+0x10/0x20
[130440.914720] [] system_call_fastpath+0x16/0x1b
[130440.916265] INFO: task monit:7737 blocked for more than 120 seconds.
[130440.917819] Not tainted 2.6.32-504.3.3.el6.x86_64 #1
[130440.919187] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[130440.921423] monit D 0000000000000000 0 7737 1 0x00000080
[130440.923205] ffff88043681ddc8 0000000000000082 ffff88043681de70 0000000000000001
[130440.925459] 000000000000000e ffff88043681dde8 ffff88043681dd88 ffffffff81123d80
[130440.927732] ffff8803e9691dd8 ffff88043681dde8 ffff880436800638 ffff88043681dfd8
[130440.929976] Call Trace:
[130440.930840] [] ? find_get_pages_tag+0x40/0x130
[130440.932328] [] ? prepare_to_wait+0x4e/0x80
[130440.933761] [] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
[130440.935380] [] ? autoremove_wake_function+0x0/0x40
[130440.936969] [] jbd2_complete_transaction+0x68/0xb0 [jbd2]
[130440.938589] [] ext4_sync_file+0x121/0x1d0 [ext4]
[130440.940136] [] vfs_fsync_range+0xa1/0x100
[130440.941543] [] vfs_fsync+0x1d/0x20
[130440.942868] [] do_fsync+0x3e/0x60
[130440.944152] [] sys_fsync+0x10/0x20
[130440.945456] [] system_call_fastpath+0x16/0x1b
[130800.947600] INFO: task monit:7737 blocked for more than 120 seconds.
[130800.952255] Not tainted 2.6.32-504.3.3.el6.x86_64 #1
[130800.955780] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[130800.961182] monit D 0000000000000000 0 7737 1 0x00000080
[130800.965362] ffff88043681ddc8 0000000000000082 0000000000000000 0000000000000001
[130800.972321] 000000000000000e ffff88043681dde8 000076b55fefce7b ffffffff81123d80
[130800.978541] ffff8803e9691dd8 0000000107c304ac ffff880436800638 ffff88043681dfd8
[130800.984111] Call Trace:
[130800.985754] [] ? find_get_pages_tag+0x40/0x130
[130800.987516] [] ? prepare_to_wait+0x4e/0x80
[130800.989063] [] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
[130800.990687] [] ? autoremove_wake_function+0x0/0x40
[130800.992253] [] jbd2_complete_transaction+0x68/0xb0 [jbd2]
[130800.994023] [] ext4_sync_file+0x121/0x1d0 [ext4]
[130800.995597] [] vfs_fsync_range+0xa1/0x100
[130800.997073] [] vfs_fsync+0x1d/0x20
[130801.004995] [] do_fsync+0x3e/0x60
[130801.006273] [] sys_fsync+0x10/0x20
[130801.007618] [] system_call_fastpath+0x16/0x1b
As observed processes/task getting "blocked for more than 120 seconds" occur.We believe this stalled the logstash related processes to enter a livelock state.On restarting the logstash related processes the data flow seemed to start. Also the redis instance on that server was @ 100% CPU when this issue occured.
This kind of issue/error was never observed during the use of logstash 1.4.1 , even on a production environment and is beleived to be only observed once switched over to logstash 1.5.0-1.
The question now is whether this is a logstash related issue , or anything got to do with the redis version impacting this on logstash I/O . ? If so what can be done to avoid this issue .
Also is there a specific version of redis required while working with logstash 1.5.0-1 ?