Occasional kernel panics in Hyper-V #2195

Closed
neoGeneva opened this issue Nov 14, 2017 · 5 comments
Labels
co/hyperv, lifecycle/rotten, os/windows

Comments

@neoGeneva

Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

Please provide the following details:

Environment:

Minikube version: v0.23.0

  • OS: Microsoft Windows 10 Pro 10.0.15063 N/A Build 15063
  • VM Driver: hyperv
  • ISO version: v0.23.6
  • Install tools: n/a
  • Others: fabric8 v0.4.173 | kubectl v1.8.0

What happened:
With a fresh install of minikube, attempting to install fabric8 eventually leads to a kernel panic after several minutes. The panic seems to happen occasionally without fabric8, though that's the most reliable way to produce it.

What you expected to happen:
minikube to not crash

How to reproduce it (as minimally and precisely as possible):

  1. Create a virtual switch in Hyper-V (name: minikubeNAT, connection type: internal network, VLAN ID: enabled); a PowerShell equivalent is sketched after these steps
  2. Install minikube: minikube start --vm-driver hyperv --hyperv-virtual-switch minikubeNAT
  3. Install fabric8: gofabric8 deploy --package system -n fabric8 --github-client-id xxx --github-client-secret xxx
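
For reference, step 1 can also be done from an elevated PowerShell prompt; this is only a rough equivalent of the GUI steps, and the VLAN ID of 100 is a placeholder rather than the value actually used:

New-VMSwitch -Name minikubeNAT -SwitchType Internal
# Optionally tag the host-side adapter of the switch with a VLAN (placeholder ID)
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName minikubeNAT -Access -VlanId 100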

Output of minikube logs (if applicable):

The final output displayed in Hyper-V (manually transcribed, hopefully there are no typos):

[  1817.969079]  0000000000000000 ffffc90002acbe40 ffffffffa0002ae5 ffffffff813ee645
[  1817.969079] Call Trace: 
[  1817.969079]  [<ffffffff810bd9e3>] ? del_timer_sync+0x43/0x50
[  1817.969079]  [<ffffffffa0002ae5>] vmbus_sendpacket_ctl+0xa5/0xb0 [hv_vmbus]
[  1817.969079]  [<ffffffff813ee645>] ? find_next_bit+0x15/0x20
[  1817.969079]  [<ffffffffa0002b04>] vmbus_sendpacket+0x14/0x20 [hv_vmbus]
[  1817.969079]  [<ffffffffa0081374>] post_status+0x104/0x110 [hv_balloon]
[  1817.969079]  [<ffffffffa0081380>] ? post_status+0x110/0x110 [hv_balloon]
[  1817.969079]  [<ffffffffa00813ad>] dm_thread_func+0x2d/0x40 [hv_balloon]
[  1817.969079]  [<ffffffff8107b145>] kthread+0xc5/0xe0
[  1817.969079]  [<ffffffff8107b080>] ? kthread_park+0x60/0x60
[  1817.969079]  [<ffffffff819e1005>] ret_from_fork+0x25/0x30
[  1817.969079] Code: 13 08 44 39 e0 75 ed 44 8d 69 08 80 7d d0 00 48 c7 45 b8 00 00 00 00 0f 85 2e 01 00 00 49 8b 97 f8 00 00 00 41 8b 87 08 01 00 00 <8b> 72 04 8b 3a 89 f1 89 7d c4 29 f9 39 fe 77 06 89 c1 29 f9 01
[  1817.969079] RIP  [<ffffffffa0004d30>] hv_ringbuffer_write+0x60/0x1d0 [hv_vmbus]
[  1817.969079]  RSP <ffffc90002acbd98>
[  1817.969079] CR2: ffffc90002aa9004
[  1817.969079] ---[ end trace lcff63732a36dce0 ]---
[  1817.969079] Kernel panic - not syncing: Fatal exception
[  1817.969079] Kernel Offset: disabled
[  1817.969079] Rebooting in 10 seconds.. 

The tail of the dmesg logs before it crashed (though from a separate crash from the one above):

[  760.586458] audit: type=1300 audit(1510647327.127:117): arch=c000003e syscall=54 success=yes exit=0 a0=4 a1=0 a2=40 a3=f47c50 items=0 ppid=3369 pid=7124 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="iptables" exe="/usr/sbin/xtables-multi" subj=kernel key=(null)
[  760.586540] audit: type=1327 audit(1510647327.127:117): proctitle=69707461626C6573002D7732002D43004B5542452D4D41524B2D44524F50002D74006E6174002D6A004D41524B002D2D7365742D786D61726B00307830303030383030302F30783030303038303030
[  760.653018] audit: type=1325 audit(1510647327.193:118): table=nat family=2 entries=105
[  760.653132] audit: type=1300 audit(1510647327.193:118): arch=c000003e syscall=54 success=yes exit=0 a0=4 a1=0 a2=40 a3=1d52c50 items=0 ppid=3369 pid=7131 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="iptables" exe="/usr/sbin/xtables-multi" subj=kernel key=(null)
[  760.653198] audit: type=1327 audit(1510647327.193:118): proctitle=69707461626C6573002D7732002D43004B5542452D4D41524B2D4D415351002D74006E6174002D6A004D41524B002D2D7365742D786D61726B00307830303030343030302F30783030303034303030
[  760.672711] audit: type=1325 audit(1510647327.213:119): table=nat family=2 entries=105
[  760.672776] audit: type=1300 audit(1510647327.213:119): arch=c000003e syscall=54 success=yes exit=0 a0=4 a1=0 a2=40 a3=1b3c9b0 items=0 ppid=3369 pid=7133 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="iptables" exe="/usr/sbin/xtables-multi" subj=kernel key=(null)
[  760.672781] audit: type=1327 audit(1510647327.213:119): proctitle=69707461626C6573002D7732002D43004B5542452D504F5354524F5554494E47002D74006E6174002D6D00636F6D6D656E74002D2D636F6D6D656E74006B756265726E657465732073657276696365207472616666696320726571756972696E6720534E4154002D6D006D61726B002D2D6D61726B0030783030303034303030
[  761.695118] hv_balloon: Memory hot add failed
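
As an aside, the proctitle= values in those audit lines are hex-encoded, NUL-separated command lines; on a Linux box they can be decoded with something like the following (hex string shortened here):

echo '69707461626C6573002D7732002D43...' | xxd -r -p | tr '\0' ' '
# should print roughly: iptables -w2 -C KUBE-MARK-DROP -t nat -j MARK --set-xmark 0x00008000/0x00008000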

Anything else we need to know:
I've tried getting additional dmesg logs out, but I'm not sure how to change the kernel parameters to add ttyS0 as a console.
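
In case it's useful, here is roughly what I believe is involved (the guest-side part is a guess, since the minikube ISO has its own boot configuration rather than GRUB): append console=ttyS0,115200 to the kernel command line inside the guest, and attach the VM's COM1 to a named pipe on the host (Generation 1 VMs only), e.g.:

# Host side (PowerShell); the pipe name is arbitrary
Set-VMComPort -VMName minikube -Number 1 -Path \\.\pipe\minikube-com1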

The times I've seen it crash, the VM has had a smallish amount of memory (512MB to 1GB) allocated to it. Given the "Memory hot add failed" error, I assume this is related to #1403, though in some cases it causes a kernel panic rather than just failing to add memory.

I've tried reproducing the issue with dynamic memory disabled and RAM set to 2GB, and haven't yet been able to (though I'm still trying). However, the fabric8 pods get stuck in a crash loop and the VM runs out of memory.

@gbraad
Contributor

gbraad commented Nov 28, 2017

I have seen someone report this before as part of another issue. The kernel panic looks like it is related to the hypervisor communication modules. I will have to try this myself... Note: I haven't seen this happen with Minishift (which uses CentOS-based images), so this is something with the minikube ISO.

Do make sure you assign plenty of memory. In your case I would expect > 2GB, since you are using fabric8. Also, check whether there is swap assigned to the VM.
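
For example, something along these lines (4096 is just a suggestion, and swap can be checked from inside the VM):

minikube start --vm-driver hyperv --hyperv-virtual-switch minikubeNAT --memory 4096
minikube ssh    # then run: free -m   (the Swap row shows whether any swap is configured)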

@chgeuer

chgeuer commented Dec 15, 2017

My minikube on Windows 10 Pro (15063.786) was crashing every 30-200 seconds. I changed the memory allocation from 2GB to 4GB and disabled "Dynamic Memory" in Hyper-V. Currently, it works.
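
For anyone scripting this, the rough PowerShell equivalent of those Hyper-V Manager changes should be (minikube is the default VM name the hyperv driver creates):

Stop-VM minikube
Set-VMMemory minikube -DynamicMemoryEnabled $false -StartupBytes 4GB
Start-VM minikube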

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Mar 15, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Apr 14, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
