Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Hyper-V Nested Virt and Guest UEFI Support #729

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

willronchetti
Copy link

@willronchetti willronchetti commented Aug 3, 2017

This patch set integrates nested virtualization and guest UEFI support into OpenXT. These patches have been tested on OpenXT. The following tests have been carried out. Areas where testing can be improved are highlighted.

  • Windows Hyper-V VM creation
  • Windows UEFI Secure Boot Guest
  • XXX: Windows Credential Guard can be run, but appears to cripple the system and exhibit high CPU usage to the point where the OS in unresponsive/unusable. More testing is needed. This appears to be a known issue between Microsoft and Citrix.
  • XXX: Linux UEFI Guests tested: Debian, CentOS, Ubuntu, Arch and Fedora. Arch requires some extra work, while Ubuntu appears to place its 'bootx64.efi' file in a spot where OVMF cannot locate it. Both issues should be able to be fixed rather painlessly.

This is a very large patch set that will require thorough testing beyond what I've done before being merged into master.

@@ -1,48 +1,8 @@
################################################################################
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you removing this header?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake - this patch was giving me some trouble. Will add it back in.

@jean-edouard
Copy link
Member

Fails to build (xen patchqueue doesn't apply): http://openxt-builder.ainfosec.com:8010/builders/openxt/builds/1042/steps/Build/logs/stdio

@dpsmith
Copy link
Member

dpsmith commented Aug 3, 2017

I have done a cursory read of the change set and will provide those comments later. Before that, I would note that this adds two major features in a single PR that is one commit. The single PR is fine since it is related featues, but I would suggest that this at least gets broken into two commits, one that adds the guest UEFI support and one builds on it to add the nested virtualization support.

if ( !++port )
nvcpu->nv_vmexit_pending = 1;
} while ( !nvcpu->nv_vmexit_pending );
+#if 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of just disabling, why not test for DEBUG. That way it is easier to turn on when troubleshooting.

+}
+
+
+static int paging_read_l2_entry(struct vcpu *v, unsigned long l2_pa, uint64_t * entry)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please be consistent with spacing and honor spacing conventions of existing code you are patching.

+ return 0;
+}
+
+static unsigned int paging_get_l2_pa_from_l2_va(struct vcpu *v,unsigned long l2_va, unsigned long *l2_pa)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spacing here as well

+
+ uint64_t pml4e_addr;
+ uint64_t pml4e;
+
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: Is this extra white space needed?

+
+ uint64_t l2_pa;
+
+if (paging_get_l2_pa_from_l2_va(v,va,&l2_pa))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation


if ( is_hvm_vcpu(v) && paging_mode_hap(v->domain) && nestedhvm_is_n2(v) )
{
+#if 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't just turn off code blocks, if replacing then replace. If it is being left to provide understanding, then add a comment within the block to let people know why its still here.

if ( (vendor_id == 0xffff) && (device_id == 0xffff) )
continue;

+#if 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again don't #if out replaced code.

Device (ISA)
{
- Name (_ADR, 0x00010000) /* device 1, fn 0 */
+ //Name (_ADR, 0x00010000) /* device 1, fn 0 */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just like #if, if you are leaving for understanding, then explain otherwise just delete the line

+ if (b_info->stubdomain_version == LIBXL_STUBDOMAIN_VERSION_LINUX) {
+ drive = libxl__sprintf
+ (gc, "file=%s%c,if=scsi,bus=0,unit=%d,format=%s,cache=writeback",
+ "/dev/xvd", 'a'+disk, disk, format);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way not to have to hard code the device path? Maybe make it a #define variable?

libxl__xs_get_dompath(gc, dm_domid)),
"%d", guest_domid);
if (guest_config->b_info.stubdomain_version == LIBXL_STUBDOMAIN_VERSION_LINUX) {
+#if 0 /* LIES */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is wrong, then just replace it.

(AHCI_NUM_COMMAND_SLOTS << 8) |
(AHCI_SUPPORTED_SPEED_GEN1 << AHCI_SUPPORTED_SPEED) |
- HOST_CAP_NCQ | HOST_CAP_AHCI;
+ /*HOST_CAP_NCQ |*/ HOST_CAP_AHCI;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just delete and if you feel its needed, add a comment that it was removed and why

return;
}
- dma_blk_unmap(dbs);
+ //dma_blk_unmap(dbs);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just delete it

#include "migration/migration.h"
#include "kvm_i386.h"

+extern int xen_q35;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not like blind accessible global state, they are too easily abused and leaves yourself open to potential exploit. Make it local and export wrappers to set and get that guard for correct values, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like it's how QEMU defines a bunch of other Xen variables too in include/hw/xen/xen.h, so I would just place this next to the other ones:

extern uint32_t xen_domid;
extern enum xen_mode xen_mode;
extern bool xen_allowed;

+ fatal_errmsg = g_strdup_printf("failed to get backing file size");
+ } else if (size == 0) {
+ fatal_errmsg = g_strdup_printf("PC system firmware (pflash) "
+ "cannot have zero size");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am all about observing 80 char, and this looks like it would have been fine. Please don't unnecessarily split.

+ }
+
+
+ blk = blk_by_legacy_dinfo(pflash_drv);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation consistency

+
+
+ if (size < 0) {
+ fatal_errmsg = g_strdup_printf("failed to get backing file size");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation

pc_memory_init(pcms, get_system_memory(),
rom_memory, &ram_memory);
+ } else {
+ xen_map_efi_var_rom(rom_memory);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this logic trigger for non-uefi guest on xen and if so, is it handled gracefully?

pc_q35_machine_options(m);
- m->alias = "q35";
+ if (xen_q35)
+ m->alias = "pc";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation

IN EFI_HANDLE Controller
)
{
+#if 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace, not #if 0


#include "AcpiPlatform.h"

+#define FISH \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get this for development, but maybe a more meaningful debug augmentation would be better.

EFI_STATUS Status;
VOID *Interface;
EFI_EVENT PciEnumerated;
+#if 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just remove, leave comment if necessary

return Status;
}

+#if 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again


EFI_ACPI_2_0_ROOT_SYSTEM_DESCRIPTION_POINTER *XenAcpiRsdpStructurePtr = NULL;

+#define FISH \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

debug again

SETTINGS Settings;

Status = GetSettings (&Settings);
+#if 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

AddReservedMemoryBaseSizeHob (0xFC000000, 0x1000000, FALSE);

- PcdSetBool (PcdPciDisableBusEnumeration, TRUE);
+ //PcdSetBool (PcdPciDisableBusEnumeration, TRUE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

// 0xFEE00000 LAPIC 1 MB
//
PciSize = 0xFC000000 - PciBase;
+
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is returns necessary? would reduce diff churn

+ PcdSet64 (PcdPciMmio32Base, 0x80000000);
+ PcdSet64 (PcdPciMmio32Size, 0x7c000000);
+
+# if 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

turned off code you added, should it just not be added?

XENIO_PROTOCOL *XenIo;
EFI_DEVICE_PATH_PROTOCOL *DevicePath;

+ return EFI_UNSUPPORTED;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of just shorting out this function, can you not go to the invocations of this function and implement a check whether to even call?

EFI_STATUS Status;
XENBUS_PROTOCOL *XenBusIo;

+ return EFI_UNSUPPORTED;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

short out again

XEN_BLOCK_FRONT_DEVICE *Dev;
EFI_BLOCK_IO_MEDIA *Media;

+ return EFI_UNSUPPORTED;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

short out again

@dozylynx
Copy link
Contributor

dozylynx commented Aug 3, 2017

I'm very pleased to see this progress on nested virtualization -- thanks for the PR.

We need a detailed and clear set of test validation steps for this feature, or set of features, in this PR so that we will be able to verify that the new functionality has been preserved correctly when OpenXT uprevs to a newer version of the Xen hypervisor and toolstack software.

We need to know exactly what software, and what features of that software, this introduces a requirement to remain compatible with, please.

@willronchetti
Copy link
Author

A few updates:

  • The upstream patch revert should solve the build problem.
  • In lieu of working on code review for the moment I'll be working on pushing up some documents detailing how exactly to test out the new features.
  • This was all tested with Xen 4.6.5 - now that 4.6.6 has been merged to master, I've been testing today briefly on that. UEFI functionality looks good so far.
  • Today is my last (official) day at AIS so much of the work remaining to be done for this to be merged will be done by someone else. With that said, I will still keep an eye on this PR and provide any insight if need be.

@willronchetti
Copy link
Author

I have attached two documents here that may be freely distributed that describe how to go about testing out the new features. I hope these documents are helpful. I will be periodically keeping track of this PR and hope to contribute more in the future, but for now the remaining work will be picked up by someone else at AIS.

There is, however, one unfortunate thing to note. In my latest round of testing I've found that enabling Windows Hyper-V causes a triple fault upon boot as both a standard and UEFI guest. Having glanced through the Xen 4.6.6 changeset, I am not convinced that the problem was caused there. It's possible that other changes have come in recently that may have had an effect. Fortunately, the surface where this could have happened is relatively small. In addition, the original document we received suggests that all of this functionality (and more) was working in April 2017, thus the time frame for such a change is relatively small. This is the only issue I've found in my (limited) testing since merging many commits from master today.

openxt_efi_guide.txt
readme.pdf

@openxt-booter
Copy link

You may wish to leave the "#if 0"s in place as this functionality is likely to be merged upstream (Citrix are also instrested in q35, ovmf and Credential guard and HVCI), and it will make unpicking your PQ from the the upstream changes easier.

@dpsmith
Copy link
Member

dpsmith commented Aug 7, 2017

Okay, then don't just #if 0 out the code. Use a descriptive flag and possible comments that a) makes the code easily locatable and b) helps reviewers understand why the code is disabled.

@jean-edouard
Copy link
Member

Now OVMF won't build. Does it need nasm installed on the host?
http://openxt-builder.ainfosec.com:8010/builders/openxt/builds/1051/steps/Build/logs/stdio

@openxt-booter
Copy link

You should obtain the original documentation provided with this patch series, it explains in detail what each part of the patches do and why. How you wish to modify it an maintain them in this PR is entirely up to you: but each modification that diverges significantly from upstream is something you're going to either have to merge in later or for which a PQ must be maintained - eg it's quite likely that xen upstream will implement q35 and ovmf support itself soon.

@willronchetti
Copy link
Author

@jean-edouard I did not experience any issues building OVMF, but judging from that error message it looks like nasm is indeed needed to build OVMF. My build machine must've had it, which is why I did not make any note of it. The only original documentation given is the 'readme' posted above, which gives some additional background/details.

@jean-edouard
Copy link
Member

Thanks @willronchetti.
The package will have to be added there before merging this PR: https://github.com/OpenXT/openxt/blob/master/build-scripts/setup.sh#L162

In the meantime, I will manually add it to the AIS builders to be able to build and test this PR.

@jean-edouard
Copy link
Member

@jean-edouard
Copy link
Member

It also requires adding a device to qemu, which is not covered by this PR:
Aug 9 18:38:52.546306 xenconsoled: win10efi-dm (4): qemu-system-i386: Unknown device 'lsi53c895a' for bus 'PCI'

@jean-edouard
Copy link
Member

I ran a build with a qemu fix here: http://openxt-builder.ainfosec.com:8010/builders/openxt/builds/1077

I really feel that this PR should be split into multiple ones.

I was able to smoke-test the guest UEFI support (ovmf), and it actually works pretty well. It's also optional and not enabled by default, so I would be willing to merge it pretty quickly if it was in its own PR.

The nested-virt support is a more important feature that needs more reviewing, and not having guest UEFI support mixed-in would also help.

@eric-ch
Copy link
Contributor

eric-ch commented Aug 17, 2017

The nesting support is dependent on a work-around in this PR. Quoting readme.pdf that @willronchetti added:

The next patch fixes bit rot in the nesting support [...] we would not recommend using it for production as it does not enforce all the isolation requirements.

See also comments inline:

Until a proper fix is available, this is a -1 from me.

@jean-edouard +1 on Splitting the PR to get the guest UEFI support merged independently.

@stacktrust
Copy link

What are the test cases for guest UEFI support and nested virt in this PR?

In today's community call, it was stated that new developers would be working on these features. Should we wait for them to submit a new PR?

@openxt-booter
Copy link

@eric-ch the fix is already upstream, and I beleive there's a 4.8 backport.

- outb(0x4d0, (uint8_t)(PCI_ISA_IRQ_MASK >> 0));
- outb(0x4d1, (uint8_t)(PCI_ISA_IRQ_MASK >> 8));
+ if ((pci_readw(PCI_ISA_DEVFN, PCI_VENDOR_ID) == 0x8086) &&
+ (pci_readw(PCI_ISA_DEVFN, PCI_VENDOR_ID) == 0x7000)) {
Copy link
Contributor

@tklengyel tklengyel Aug 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This here should be PCI_DEVICE_ID

@tklengyel
Copy link
Contributor

Relevant discussion on xen-devel https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00379.html

The same conclusion has been reached about this approach I shared during the meeting we had in the fall: "While Secure Boot can be enabled with this implementation, it is not sufficiently secure because the guest is able to write anything it wants to the emulated flash. KVM solved this problem with SMM mode but I don't like that solution either."

@openxt-booter
Copy link

The correct solution is to implement efi variable reads and writes with a hypercall. There are only 3 functions. GetNextVariableName only needs 3 arguments, Get and Set both take 5 arguments,
leaving one register for the hypercall number (or you could pass a structure).

Since the interface is not often used, and performance isn't an issue, you could easily
implement this via a v4v connexion (thus requiring no hypervisor modifications).
Guest EFI would have a statically allocated space for a v4v ring, and then the EFI code
on each call would:

  1. take a spinlock
  2. walk CR3 to find the pfns of the ring
  3. bind the ring
  4. send a message to dom0 to do the operation
  5. poll the ring, in a busyloop calling pause in between
  6. read the reply
  7. release the ring
  8. release the spinlock.

This would be easy to implement, the only subtlety is that the code needs to register
for the Phys->Virt event so it can adjust the pointer to the ring and the spinlock, all
the other allocations can be from the stack.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants