
XCP-ng

guide-by-example

logo

  1. Purpose & Overview
  2. Why XCP-ng
  3. Xen Orchestra
  4. The Basics
  5. Backups
  6. Advanced Concepts
  7. Issues encountered
  8. Videos

Purpose & Overview

A virtualization platform built around Xen, a type 1 hypervisor.
An alternative to ESXi or Proxmox.

Xen is an open source project, developed under the Linux Foundation with support from major industry players - aws, intel, amd, arm, google, alibaba cloud,...
From 2007 to 2013 Citrix directed Xen development, but gave up control to attract more collaboration from the industry giants, of which aws (amazon) is the biggest xen user.

XCPng itself started on kickstarter as a fork of XenServer, which with version 7.0 closed-sourced some of its components. The first release came out in 2018, but Vates - the company behind it - has worked on Xen Orchestra since 2012. They are located in France and have ~40 employees.

  • Xen - The hypervisor.
  • XCPng - A single purpose linux distro preconfigured with xen, uses centos user space.
    This is what you install on the metal.
  • XO - Xen Orchestra - a web interface for centralized management of xcpng hosts.
    Deployed as a container or a virtual machine.
  • XOA - Xen Orchestra Appliance - a paid version of XO with full support and some extra features like XOSTOR through webGUI.
  • XCPng Center - A windows desktop application for management of xcpng hosts, a community project. Was abandonware but it has a new maintainer.

Why XCP-ng

vs-proxmox

In 2022 Broadcom announced the plan to buy VMware for $60 billion and the search for an ESXi replacement started.

Proxmox

The absolute front runner candidate. Debian based, uses KVM for VMs, LXC for containers, native ZFS, native CEPH, huge active community, dozens of tech youtubers, a proven solution that's been out there for almost 20 years. Made in Austria.
Tried it and it looked good. A bit complicated, a bit unpolished, but very powerful. The thing is that I never felt drawn to it. Felt like I would be spending a lot of time learning the ins and outs to get the confidence I had with esxi. And while that's expected, it's still a chore, still an annoyance.
This made me want to stick longer with esxi and let proxmox cook, get a few more major releases and improvements as vmware refugees start to give feedback and money.

XCPng

Seen it mentioned and had a spare Fujitsu P558 with i3-9100 to test it on. After I wrapped my head around the need to deploy Xen Orchestra somewhere, it felt like everything was simple and it just worked with minimal effort. And that apparently is what makes me enthusiastic about stuff.

  • Tried to spin up win11 24H2 and it just worked without manually dealing with TPM. The VM creation did not feel overwhelming with 19 options and settings, which all feel like opportunities for a fuck up. Once RDPed in, it felt fast and responsive, without anything weird happening and without being sent to read 5 pages on how to tweak it to improve performance.
  • Tried to spin arch linux, no problem, zero pauses to read up on what to do to get uefi boot working, or some dealings with secure boot. No issues, no complications, no refusal to turn off when the VM was told to turn off.
  • Tried igpu passthrough into that arch VM to test jellyfin in docker.. and it was just pushing a slider next to the igpu, a restart of the host, and then in the VM's settings picking the igpu from a list.
  • Tried to deploy opnsense with a wan-side and a lan-side network and some windows VM that would be connected only to the opnsense LAN side... and it also just straight up worked, no crowded complicated menus and settings, no spending a lot of time investigating.
  • Tried snapshots and it was simple. Though they are simple in all hypervisors I guess.
  • Tried to back up a VM to a network share and it seemed ok, also rolling snapshots just worked with simple scheduling. Though in backups there are more options and menus and terms that will require reading and testing... but even this basic intuitive stuff beats the free esxi with the ghetto script.
  • Tried migrating a VM from esxi and it was also ridiculously easy. Just giving the ip and credentials, selecting which VM to migrate and what kind of system it is. Though it took ~5 hours for a 90GB vmdk.

The webUI of XO has a bit of an amateurish vibe compared to proxmox or esxi, but generally it's clean and simple. I like that often the info you see can be clicked and edited right then and there. That's how you change names, the number of cores, the ram, or enlarge disks.
They are working on a new redesign that reminds me of opnsense, which is good.

When googling proxmox vs xcpng there seems to be a repeating opinion that xcpng is a bit more stable, a bit more reliable. Which obviously sits well with me, but I also know that guys like Jeff from Craft Computing have been deploying proxmox commercially left and right for years, so it must be pretty stable and I am just glad that I did not find a bunch of complaints about instability.

Now, since that first try I installed xcpng on a few more machines and the experience there was not as hurdle-free as that first time. There's now a chapter where I note issues I encounter. But still.. that first impression sold me on it pretty hard.

Hypervisors Benchmarks

benchmark-symbols

For me the performance is not a deciding factor and I expect it to be adequate with all of the modern hypervisors. But since I am playing with these I might as well run some benchmarks.

Test machine - ThinkCentre M75q Gen2; ryzen 4350GE; 16+4GB ram; 128GB nvme oem ssd for OS, 500GB sata ssd for VMs
VMs are win10 x64, 8 cores 16GB ram
Tests are run 3+ times, the highest value is noted.

  • metal - nothing of note, though it had 20GB of ram compared to the VMs' 16GB.
  • xcpng - storage is ext thin; the guest drivers installed.
  • proxmox - cpu - host; storage - thin LVM; virtio drivers installed; followed this video for general setup.
  • hyperv - ram was not dynamic
| Win10 VM test     | metal       | xcpng       | proxmox     | hyperv      |
|-------------------|-------------|-------------|-------------|-------------|
| cinebench         | 866         | 839         | 788         | 776         |
| geekbench         | 1380 & 4780 | 1306 & 4636 | 1292 & 4213 | 1283 & 4455 |
| crystal disk mark | 41          | 22          | 17          | 18          |
| hdtune            | 177         | 144         | 87          | 113         |
| iperf             | pass        | pass        | pass        | pass        |
| latency           | pass        | pass        | pass        | pass        |
| setup overview    | info        | info        | info        | info        |

Cinebench R15 is pretty clear cut. Was run several times on each, even with restarts.
Geekbench is nice in that it gives a link to detailed results, it has some note about an issue with timers on proxmox.
Crystal disk mark shows pretty big differences, random read was picked as the important value for the table. From HDtune the Burst speed was picked for the table. There also was some caching going on with hyperv and xcpng, hence after a while the sequentials were higher than on metal.
iperf would be more interesting with a 2.5gbit or 10gbit nic, maybe there would be some difference.
The DPC latency test is probably worthless, but it's maybe a check that there is not some weird slowness going on, and all passed. Of note is that the very first measurement on proxmox had better values than metal, but I could not replicate it with later runs.

Likely the performance can be tweaked and improved on some of them, but I am fine with xcpng performance, so it's nice to not need to bother.
Also of note - the performance of a windows VM is not indicative of the performance of a linux VM, but I don't feel like doing linux, it would probably be geekbench + fio and I hate dealing with fio test configs and results.



Xen Orchestra

diagram

The official docs and the official docs2.
Github.
Ronivay's install script.

An open source web-based centralized management platform for xcpng servers.

  • XO - Xen Orchestra - Free version compiled from the source.
  • XOA - Xen Orchestra Appliance - Paid version. Functional in free mode but with limitations.
  • XO Lite - Xen Orchestra Lite - Running on every host. Only provides basic info and simplifies XOA deployment. Under development.

Most non commercial users want to deploy XO, which provides full functionality while being free. It can run either as a VM or a docker container. And either on the xcpng host or any other machine that can ping the host. The complication is that you need XO to deploy XO onto an xcpng host.

  • docker container - the most trivial and quick deployment.
  • virtual machine - there's extra work of spinning up a new debian VM.
  • VM on the xcpng itself - there's additional extra work of using XOA first.

XO in Docker

docker-logo

Ronivay's github.

The compose here uses ronivay's image and is a variation of their compose.

The changes made - switching from volumes to bind mounts, and not mapping port 80 to docker host port 80 but just using expose to document the port the webGUI is using. The reason is that there's an expectation of running a reverse proxy. If no reverse proxy then go with ronivay's port mapping.

compose.yml

services:

  xen-orchestra:
    image: ronivay/xen-orchestra:latest
    container_name: xen-orchestra
    hostname: xen-orchestra
    restart: unless-stopped
    env_file: .env
    stop_grace_period: 1m
    expose:
        - "80"         # webGUI
    cap_add:           # capabilities are needed for NFS/SMB mount
      - SYS_ADMIN
      - DAC_READ_SEARCH
    # additional setting required for apparmor enabled systems. also needed for NFS mount
    security_opt:
      - apparmor:unconfined
    volumes:
      - ./xo_data:/var/lib/xo-server
      - ./redis_data:/var/lib/redis
    # these are needed for file restore.
    # allows one backup to be mounted at a time, it gets unmounted after a few minutes
    # if unused (prevents other backups from being mounted during that time)
    # add loop devices (loop1, loop2, etc.) if multiple simultaneous mounts are needed.
    devices:
      - "/dev/fuse:/dev/fuse"
      - "/dev/loop-control:/dev/loop-control"
      # - "/dev/loop0:/dev/loop0"

networks:
  default:
    name: $DOCKER_MY_NETWORK
    external: true

.env

# GENERAL
DOCKER_MY_NETWORK=caddy_net
TZ=Europe/Bratislava

# XO
HTTP_PORT=80

Caddy is used for reverse proxy, details here.

Caddyfile

xo.{$MY_DOMAIN} {
    reverse_proxy xen-orchestra:80
}
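
To bring it up - a minimal sequence, assuming the compose.yml and .env above sit in the current directory and that the caddy_net network may not exist yet:

docker network create caddy_net     # only needed if the network does not exist yet
docker compose up -d
docker logs -f xen-orchestra        # watch the first start finish initializing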


XO in a VM

debian-logo

If you have another server with a hypervisor, or run virtualbox or hyperv on your desktop, or for the final VM deployment of XO on xcpng itself...

  • Spin up a new debian virtual machine, click through the regular install.
  • clone the github repo with the install script
    git clone https://github.com/ronivay/XenOrchestraInstallerUpdater.git
  • go inside
    cd XenOrchestraInstallerUpdater
  • make a copy of the sample config.
    cp sample.xo-install.cfg xo-install.cfg
  • run the install script
    sudo ./xo-install.sh

More discussion about the process here.



XO on XCPng itself

web-install

Note: the videos showcasing the process are in the last chapter.

The easiest way is to first deploy the paid XOA and use that to deploy XO.

  • In a browser, go to the xcpng host IP address, top right corner you see Deploy XOA
    • btw, one can also initialize this deployment by creating an account on xen-orchestra.com and under the account find "XOA quick deploy - Deploy now".
  • Click through the setup.
  • Login to XOA at the ip address this new VM got.
  • Follow The Basics section to:
    • create iso storage and upload iso
    • spin up a new VM with debian or ubuntu or centos stream
    • git clone XO install script repo, rename the config file, execute the install script
      or alternatively setup the new VM as a docker host and deploy XO as a container there
    • add the xcpng host as a server in XO
    • delete XOA virtual machine


XO deployment is an extra step compared to other hypervisors, but if you ever get more servers this approach starts to make sense - not running the management tool on the thing it manages, the hosts being thought of as replaceable cogs in a bigger machine...
But yeah, it also means extra work for "my first home server" types of deployments.

Some aspects of XO

  • Once VMs are up and running, XO is not required for them to function. But you lose some functionality if you turn it off or disconnect it.
    • Backups schedule and their execution.
      XO is what manages backups, even the data of the VMs being backed up flows through XO during a backup job if it's going to a network share. There's even an XO Proxy to sit with the VMs on-site while the main management XO is wherever...
    • Metrics monitoring.
      Can't look up cpu load from the last week if XO was not there to record it.
    • HA - High Availability - ...like duh, something needs to orchestrate it...
  • XO is the free version, compiled from the source, nagging notices about not having a subscription are something that's just there occasionally.

The Basics

XCPng Host Installation

The official docs.

Download the latest release ISO. Boot the ISO, I use ventoy, click through the installation...
All is pretty straightforward. The official docs have pretty hand-holding instructions too.

After reboot, we are shown the basic info menu, similar to esxi but better. I really like the look with all the info and all the functionality. This menu can be opened even when SSHed in, with the xsconsole command.
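
For example, from another machine - the host IP here is just a placeholder:

ssh [email protected]        # the xcpng host
xsconsole                   # opens the same menu as on the physical console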

xcpng-console-menu

The First Login into XO

  • [email protected] // admin
  • change login email and password
    Settings > Users
  • Turn off the filters for VMs, as by default only the running VMs are shown
    user icon in the left bottom corner > Customize filters > VM - Default filter = None

Add Server

  • New > Server
  • label whatever
  • ip address
  • root / password set during the xcpng installation
  • slider Allow Unauthorized Certificates - True

Updates

  • Home > Hosts > your_host > Patches
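
Patches can also be applied from the host's command line - a sketch, assuming ssh access; the xcpng dom0 uses yum/rpm under the hood:

ssh [email protected]         # hypothetical host address
yum update                   # installs the available XCPng patches
reboot                       # kernel/xen updates need a host reboot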

Create DVD ISO storage

iso-sr-250px

The official docs.

Local Storage Repo
  • New > Storage
  • Select your_host
  • Set the name and the description
  • Select storage type: ISO SR: Local
  • Path: /media if you are ok putting ISOs on the boot disk
    • Alternative is to ssh in and look around for a path to another drive, usually it's somewhere in /run/sr-mount/
  • Create
  • To upload an ISO
    Import > Disk > To SR: whatever_named
    It knows the type of the storage repo and allows upload of ISOs

If /media is selected, the storage repo is created on the 18GB root partition.
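
The same local ISO SR can also be created from the host's CLI - a rough sketch, the name and path are just examples:

xe sr-create name-label="Local ISO" type=iso content-type=iso \
  device-config:location=/media device-config:legacy_mode=true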

NFS share
  • Have an NFS share, I use truenas scale
  • New > Storage
  • Select your_host
  • Set the name and the description
  • Select storage type: NFS ISO
  • Server: ip_address_of_the_nfs_share
    • search icon
  • select detected path
  • Create
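
For reference, an NFS ISO SR through the host's CLI as well - a sketch, the server and export path are made up:

xe sr-create name-label="NFS ISO" type=iso content-type=iso shared=true \
  device-config:location=192.168.1.150:/mnt/tank/isos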


Virtual Machine Creation

vms-250px

The official docs.

Spinning up a new VM is easy and quick.
Preconfigured templates take care of lots of settings. They are in json format and can be browsed in /usr/share/xapi/vm-templates.

  • New > VM
  • Select template
  • vCPU, RAM, socket
  • ISO
  • Network default
  • Disk - change the name, set the size
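
The available templates can also be listed from the host's CLI, just to see what the GUI picks from - a quick check, nothing more:

xe template-list params=name-label | less     # all the preconfigured templates
ls /usr/share/xapi/vm-templates               # the json files mentioned above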

Guest Tools

agent-detect

The official docs.

They consist of two components and you absolutely want to make sure you got both working properly.
The info is in the General tab of every virtual machine.

  • Kernel Paravirtualization Drivers - improve performance, usually I/O.
    • HVM - no drivers
    • PVHVM - drivers present
  • Management Agent - better guest management and metrics reporting.
    • Management agent not detected
    • Management agent detected

Windows

The official docs.

The official docs linked above cover the details well.

There's also a VM option to get the drivers through windows updates, but reading the docs, it's just the driver and the VM still needs the agent, so you would still be installing the agent... so it's not worth the bother.



Linux

The official docs.

Again, the linked docs cover all the details well.
The drivers are in the linux kernel, so one only needs the agent.

For my go-to archlinux I just

  • yay xe-guest-utilities-xcp-ng
  • sudo systemctl enable --now xe-linux-distribution.service

For the occasional debian install it's just as the docs say

  • XO comes with an XCPng-Tools iso, mount that into the virtual dvd in the General tab of a VM
  • restart the VM
  • sudo mount /dev/cdrom /mnt
  • sudo bash /mnt/Linux/install.sh
  • reboot and unmount


Backups

backup-diagram

Backups are important enough that the official docs should be the main source of information. Stuff here is just some highlights and notes.

Be aware - Xen Orchestra is what schedules and executes backups.
XO must be running along with the xcpng hosts, it's not a fire-and-forget deployment.

Backup Jobs Types

At the moment I just played with rolling snapshots, backups, and delta backups.

  • VM Backup & Replication
    • Rolling Snapshot
      Takes a snapshot on a schedule. Retention is set in the schedule section.
    • Backup
      Snapshot of a VM that is then exported to a remote location. Full size every time, so a lot of space, bandwidth and time is used.
    • Delta Backup
      Incremental backups of only the changes against the initial full backup.
      CBT - Changed Block Tracking - a new way to do incremental backups.
    • Disaster Recovery
      Full replication. The backed up VM can be started immediately, no restoration needed.
    • Continuous Replication
      Replication but through incremental changes.
  • VM Mirror Backup
    • Mirror full backup
      backup of a backup repo
    • Mirror incremental backup
      backup of a backup repo through incremental changes.
  • XO config & Pool metadata Backup
  • Sequence

Smart mode

Gives the ability to target VMs for backup jobs more broadly, instead of just selecting VMs manually.
It can be that all running VMs on all hosts get rolling snapshots, or that the ones tagged as production get a nightly full backup,...

Health check

Here's Lawrence Systems video on backups that are automatically tested. A VM is restored on a host of choice, booted without network, and there's a check that the guest tools agent starts. If all that happens, the backup is marked as healthy and the VM is destroyed.

Veeam Support

Seems there's a prototype and praise of the xen api from veeam devs.
Though that does not mean management will decide to create and support a xen edition of veeam.

Remotes

Kinda weird how for backups you are creating remotes and not storage, like it's some different category even when I am pointing at the same nfs..

showmount -e 192.168.1.150 - a handy command showing nfs shares and paths

  • Settings > Remotes
  • Local or NFS or SMB or S3
  • IP address
  • port - can be left empty
  • path of the share
  • custom options - can be empty

Create a backup job

  • Backup > New > VM Backup & Replication

backup-job-report

Backup reports

First set up an email server for notifications

  • Settings > Plugins > transport-email
  • I use a free Brevo account for an smtp server - 300 emails a day.

Then in backup job settings

  • Report when - always | skipped or failure
  • Report recipient - set an email and you have to press the plus sign
  • Save

Advanced Concepts

Passthrough

passthrough-pic

When you want to give a virtual machine direct full hardware access to some device.
Be aware that once passthrough is set up it's tied to hardware addresses, and hardware changes first require disabling the passthrough, or your xcpng might not boot or devices might be stuck hidden.

intel igpu passthrough

  • On the server host
    • Home > Hosts > your_host > Advanced > PCI Devices
      Enable slider next to VGA compatible controller
    • Reboot the host, go check if the slider is on
  • On the Virtual Machine
    • Home > VMs > your_VM > Advanced >
      At the end there's a button - Attach PCIs, there pick the igpu from the list.

In the VM you can check with lspci | grep -i vga

Tested with jellyfin with transcoding enabled, monitored with btop and intel_gpu_top.
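
For reference, the relevant bit of the jellyfin compose inside that VM - a minimal sketch, the paths are made up, the point is just passing /dev/dri into the container:

services:

  jellyfin:
    image: jellyfin/jellyfin:latest
    container_name: jellyfin
    restart: unless-stopped
    devices:
      - /dev/dri:/dev/dri          # the igpu as it appears inside the VM
    volumes:
      - ./jellyfin_config:/config
      - /mnt/media:/media:ro       # hypothetical media path
    ports:
      - "8096:8096"                # webGUI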

The old way - cli passthrough

Lawrence video.

  • ssh in on to the xcpng host
  • lspci -D - lists the devices that can be passed through
  • pick the device you want, note the HW address at the beginning, in this case it was 0000:00:02.0
  • hide the device from the system
    /opt/xensource/libexec/xen-cmdline --set-dom0 "xen-pciback.hide=(0000:00:02.0)"
    • be aware, the command overwrites the current blacklist, so for multiple devices it would be
      /opt/xensource/libexec/xen-cmdline --set-dom0 "xen-pciback.hide=(0000:00:02.0)(0000:00:01.0)"
  • reboot the hypervisor
  • the command xl pci-assignable-list can be used to check which devices can be passed through


amd igpu passthrough

No luck so far.

udevadm info --query=all --name=/dev/dri/renderD128
lspci -vvv -s 00:08.0 - "00:08.0" being the physical address shown in the udevadm output
dmesg | grep -i amdgpu - to check if the driver loaded correctly

Pools

pool-join-pic

For easier management at a larger scale.
Pools remove some duplicated effort when setting up shared storage, or networks, or backups. They allow for easier/faster live migration of VMs or for automatic load balancing. Safer updates of the hosts and easier scale up of the compute power by adding more hosts.
To join a pool, the hosts must have similar CPUs, for example you can not mix amd and intel, but I am not sure how similar they must be before you get the message - "Failed: The hosts in this pool are not homogeneous. CPUs differ."

  • All hosts are masters of their own pool, pick one that will be the master
    and rename its pool to something more specific
  • for the machines that will be joining that pool
    • ssh in or get to the console of the host
      Home > Hosts > your_host > Console
    • xsconsole to get the core menu
      • Resource Pool Configuration > Join a Resource Pool
      • give the hostname of the pool master
      • root and password
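
For reference, the join can also be done with a single command from the joining host's CLI - a sketch, the address and password are placeholders for the pool master's:

xe pool-join master-address=192.168.1.50 master-username=root master-password=password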

I only had a few machines in a pool to check it out and do some testing.
Might add more info in the future.

Monitoring

prometheus-monit

Prometheus + Grafana monitoring

To get metrics and set up alerts.
Details on general prometheus + grafana deployment here.

compose.yml

services:

  xen01:
    image: ghcr.io/mikedombo/xen-exporter:latest
    container_name: xen01
    hostname: xen01
    restart: unless-stopped
    environment:
      - XEN_HOST=10.0.19.62
      - XEN_USER=root
      - XEN_PASSWORD=aaaaaa
      - XEN_SSL_VERIFY=false

networks:
  default:
    name: caddy_net
    external: true

prometheus.yml

scrape_configs:
  - job_name: 'xenserver'
    static_configs:
      - targets: ['xen01:9100']


opnsense or pfsense as a VM in xcpng

tx-checksumming-off

The official docs.

The most important bit of info is to disable TX Checksum Offload.
A more detailed example of an opnsense deployment will come here later.
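
The offload can also be disabled per virtual interface from the host's CLI - a sketch, assuming the VM is named opnsense and the uuid comes from the first command:

xe vif-list vm-name-label=opnsense                                # list the VM's VIFs and their uuids
xe vif-param-set uuid=<vif-uuid> other-config:ethtool-tx="off"    # disable TX checksumming on that VIF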

Notes on some concepts

Storage

The official docs.

The above docs link gives a good overview. I plan to keep it simple.

  • ext4 for local storage
  • nfs for network shares

Various file types encountered.

  • VDI - Virtual Disk Image - a concept, not an actual file type
  • .vhd - A file representing a virtual disk and its snapshots.
  • .xva - An archive of a VM, used for backups.
  • .iso - A bootable dvd image, usually for OS installation.

Virtualization Models

xen-virt-modes

The Xenproject wiki has a good article on these, especially with a bit of history.

  • PV - Paravirtualization
    The oldest way, bypassing the need for hardware emulation by having the guest OS aware of being in a VM, running with a modified kernel and using a specific hypervisor API.
  • HVM - Hardware Virtual Machine
    Full emulation of hardware using hardware support - intel VT-x | AMD-V
  • PVHVM - Hardware virtualization with paravirtualization drivers enabled
    HVM performed worse in some aspects, especially I/O; installing the PV device drivers bypasses some of the qemu overhead, improving the performance.
  • PVH - Paravirtualization-on-HVM
    Further performance improvements and reduced complexity. Completely drops the need for qemu for the emulation of hardware. Not yet really used.
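
To check which mode a given VM actually runs in, the domain-type parameter can be queried on the host - a sketch, assuming xe access and a known VM uuid:

xe vm-param-get uuid=<vm-uuid> param-name=domain-type    # prints hvm, pv or pv_in_pvh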

Issues encountered

  • A VM with "Generic Linux UEFI" preset failed to boot from arch ISO
    Weird issue. Seems the cause is that the ISO SR was created in /media on the boot drive which was a small OEM nvme ssd that came with that miniPC. The thing is that I had 3 lenovo miniPCs at that time and every single one of them had this issue. Debian 12 ISO and template also had that issue.
    Any change to the setup solved the problem. Replacing the ssd with a larger brand-name nvme ssd; creating ISO SR on a different drive; switch to a sata ssd; using nfs share for ISOs; switching to bios;...
    Probably some weird quirk with uefi and ext3 and a small nvme ssd or something.
  • igpu passthrough of ryzen 4350GE is not working at all, ThinkCentre M75q Gen 2.
  • igpu passthrough of i5-8400T had a poor performance, ThinkCentre M720q.
    I am starting to wonder if my initial test with i3-9100 of the passthrough really worked as well as I remember it working.
    Will keep testing when I get some intel based machines in hands as right now I got none.

Videos

How to deploy XOA on a freshly installed xcpng host, then use XOA to deploy a debian VM and run the script that installs XO.

01-XO-lite-Deploy-XOA.mp4
02-XOA-firstlogin-new-vm.mp4
03-debian-install-speed.mp4
04-XO-install-script.mp4
05-XO-firstlogin-adding-host-remove-XOA.mp4

