Occasional kernel panics in Hyper-V #2195

Closed
neoGeneva opened this issue Nov 14, 2017 · 5 comments
Labels
co/hyperv, lifecycle/rotten, os/windows

Comments

@neoGeneva

Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

Please provide the following details:

Environment:

Minikube version: v0.23.0

  • OS: Microsoft Windows 10 Pro 10.0.15063 N/A Build 15063
  • VM Driver: hyperv
  • ISO version: v0.23.6
  • Install tools: n/a
  • Others: fabric8 v0.4.173 | kubectl v1.8.0

What happened:
With a fresh install of minikube, attempting to install fabric8 eventually leads to a kernel panic after several minutes. The panic seems to happen occasionally without fabric8, though that's the most reliable way to produce it.

What you expected to happen:
minikube to not crash

How to reproduce it (as minimally and precisely as possible):

  1. Create a virtual switch in Hyper-V (name: minikubeNAT, connection type: internal network, VLAN ID: enabled); a PowerShell equivalent is sketched after these steps
  2. Install minikube: minikube start --vm-driver hyperv --hyperv-virtual-switch minikubeNAT
  3. Install fabric8: gofabric8 deploy --package system -n fabric8 --github-client-id xxx --github-client-secret xxx
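
For reference, step 1 can also be done from an elevated PowerShell prompt; this is only a rough equivalent of the GUI steps, and the VLAN ID of 100 is a placeholder rather than the value actually used:

New-VMSwitch -Name minikubeNAT -SwitchType Internal
# Optionally tag the host-side adapter of the switch with a VLAN (placeholder ID)
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName minikubeNAT -Access -VlanId 100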

Output of minikube logs (if applicable):

The final output displayed in Hyper-V (manually transcribed, hopefully there are no typos):

[  1817.969079]  0000000000000000 ffffc90002acbe40 ffffffffa0002ae5 ffffffff813ee645
[  1817.969079] Call Trace: 
[  1817.969079]  [<ffffffff810bd9e3>] ? del_timer_sync+0x43/0x50
[  1817.969079]  [<ffffffffa0002ae5>] vmbus_sendpacket_ctl+0xa5/0xb0 [hv_vmbus]
[  1817.969079]  [<ffffffff813ee645>] ? find_next_bit+0x15/0x20
[  1817.969079]  [<ffffffffa0002b04>] vmbus_sendpacket+0x14/0x20 [hv_vmbus]
[  1817.969079]  [<ffffffffa0081374>] post_status+0x104/0x110 [hv_balloon]
[  1817.969079]  [<ffffffffa0081380>] ? post_status+0x110/0x110 [hv_balloon]
[  1817.969079]  [<ffffffffa00813ad>] dm_thread_func+0x2d/0x40 [hv_balloon]
[  1817.969079]  [<ffffffff8107b145>] kthread+0xc5/0xe0
[  1817.969079]  [<ffffffff8107b080>] ? kthread_park+0x60/0x60
[  1817.969079]  [<ffffffff819e1005>] ret_from_fork+0x25/0x30
[  1817.969079] Code: 13 08 44 39 e0 75 ed 44 8d 69 08 80 7d d0 00 48 c7 45 b8 00 00 00 00 0f 85 2e 01 00 00 49 8b 97 f8 00 00 00 41 8b 87 08 01 00 00 <8b> 72 04 8b 3a 89 f1 89 7d c4 29 f9 39 fe 77 06 89 c1 29 f9 01
[  1817.969079] RIP  [<ffffffffa0004d30>] hv_ringbuffer_write+0x60/0x1d0 [hv_vmbus]
[  1817.969079]  RSP <ffffc90002acbd98>
[  1817.969079] CR2: ffffc90002aa9004
[  1817.969079] ---[ end trace lcff63732a36dce0 ]---
[  1817.969079] Kernel panic - not syncing: Fatal exception
[  1817.969079] Kernel Offset: disabled
[  1817.969079] Rebooting in 10 seconds.. 

The tail of the dmesg logs before it crashed (though from a separate crash from the one above):

[  760.586458] audit: type=1300 audit(1510647327.127:117): arch=c000003e syscall=54 success=yes exit=0 a0=4 a1=0 a2=40 a3=f47c50 items=0 ppid=3369 pid=7124 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="iptables" exe="/usr/sbin/xtables-multi" subj=kernel key=(null)
[  760.586540] audit: type=1327 audit(1510647327.127:117): proctitle=69707461626C6573002D7732002D43004B5542452D4D41524B2D44524F50002D74006E6174002D6A004D41524B002D2D7365742D786D61726B00307830303030383030302F30783030303038303030
[  760.653018] audit: type=1325 audit(1510647327.193:118): table=nat family=2 entries=105
[  760.653132] audit: type=1300 audit(1510647327.193:118): arch=c000003e syscall=54 success=yes exit=0 a0=4 a1=0 a2=40 a3=1d52c50 items=0 ppid=3369 pid=7131 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="iptables" exe="/usr/sbin/xtables-multi" subj=kernel key=(null)
[  760.653198] audit: type=1327 audit(1510647327.193:118): proctitle=69707461626C6573002D7732002D43004B5542452D4D41524B2D4D415351002D74006E6174002D6A004D41524B002D2D7365742D786D61726B00307830303030343030302F30783030303034303030
[  760.672711] audit: type=1325 audit(1510647327.213:119): table=nat family=2 entries=105
[  760.672776] audit: type=1300 audit(1510647327.213:119): arch=c000003e syscall=54 success=yes exit=0 a0=4 a1=0 a2=40 a3=1b3c9b0 items=0 ppid=3369 pid=7133 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="iptables" exe="/usr/sbin/xtables-multi" subj=kernel key=(null)
[  760.672781] audit: type=1327 audit(1510647327.213:119): proctitle=69707461626C6573002D7732002D43004B5542452D504F5354524F5554494E47002D74006E6174002D6D00636F6D6D656E74002D2D636F6D6D656E74006B756265726E657465732073657276696365207472616666696320726571756972696E6720534E4154002D6D006D61726B002D2D6D61726B0030783030303034303030
[  761.695118] hv_balloon: Memory hot add failed
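
As an aside, the proctitle= values in those audit lines are hex-encoded, NUL-separated command lines; on a Linux box they can be decoded with something like the following (hex string shortened here):

echo '69707461626C6573002D7732002D43...' | xxd -r -p | tr '\0' ' '
# should print roughly: iptables -w2 -C KUBE-MARK-DROP -t nat -j MARK --set-xmark 0x00008000/0x00008000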

Anything else we need to know:
I've tried getting additional dmesg logs out, but I'm not sure how to change the kernel parameters to add ttyS0 as a console.
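
In case it's useful, here is roughly what I believe is involved (the guest-side part is a guess, since the minikube ISO has its own boot configuration rather than GRUB): append console=ttyS0,115200 to the kernel command line inside the guest, and attach the VM's COM1 to a named pipe on the host (Generation 1 VMs only), e.g.:

# Host side (PowerShell); the pipe name is arbitrary
Set-VMComPort -VMName minikube -Number 1 -Path \\.\pipe\minikube-com1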

The times I've seen it crash, the VM has had a smallish amount of memory (512MB to 1GB) allocated to it. Given the "Memory hot add failed" error, I assume this is related to #1403, though in some cases it causes a kernel panic rather than just failing to add memory.

I've tried reproducing the issue with dynamic memory disabled and RAM set to 2GB, and haven't yet been able to (though I'm still trying). However, the fabric8 pods get stuck in a crash loop and the VM runs out of memory.

@gbraad
Contributor

gbraad commented Nov 28, 2017

I have seen someone report this before as part of another issue. The kernel panic looks like it is related to the hypervisor communication modules. I will have to try this myself... Note: I haven't seen this happen with Minishift (which uses CentOS-based images), so this is something with the minikube ISO.

Do make sure you assign plenty of memory. In your case I would expect > 2GB, since you are using fabric8. Also, check whether there is swap assigned to the VM.
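
For example, something along these lines (4096 is just a suggestion, and swap can be checked from inside the VM):

minikube start --vm-driver hyperv --hyperv-virtual-switch minikubeNAT --memory 4096
minikube ssh    # then run: free -m   (the Swap row shows whether any swap is configured)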

@chgeuer

chgeuer commented Dec 15, 2017

My minikube on Windows 10 Pro (15063.786) was crashing every 30-200 seconds. I changed the memory allocation from 2GB to 4GB and disabled "Dynamic Memory" in Hyper-V. Currently, it works.
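
For anyone scripting this, the rough PowerShell equivalent of those Hyper-V Manager changes should be (minikube is the default VM name the hyperv driver creates):

Stop-VM minikube
Set-VMMemory minikube -DynamicMemoryEnabled $false -StartupBytes 4GB
Start-VM minikube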

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Mar 15, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Apr 14, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
