The NATS execution engine node service runs as a daemon on a host, where it responds to control commands to start, stop, and monitor workloads.
The following is a (potentially incomplete) list of what you'll need in order to run the node application:
- An accessible copy of the `firecracker` binary in the path of the node service
- A Linux kernel
- A root file system with `nex-agent` configured as a boot service
- CNI plugins installed (at least) to `/opt/cni/bin` (see the install sketch after this list):
  - `host-local`
  - `ptp`
  - `tc-redirect-tap`
- A CNI configuration file at `/etc/cni/conf.d/fcnet.conflist`
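If the CNI plugins aren't already on the host, a rough install sketch looks like the following. The release version and download URL are assumptions based on the upstream containernetworking/plugins releases; `tc-redirect-tap` is distributed separately and isn't part of that bundle.

```bash
# Install the standard CNI plugin bundle (includes host-local and ptp) into /opt/cni/bin.
# The version is an assumption -- use whichever release you prefer.
CNI_VERSION=v1.4.0
sudo mkdir -p /opt/cni/bin
curl -sSL "https://github.com/containernetworking/plugins/releases/download/${CNI_VERSION}/cni-plugins-linux-amd64-${CNI_VERSION}.tgz" \
  | sudo tar -xz -C /opt/cni/bin

# tc-redirect-tap is not part of that bundle; build it from the
# awslabs/tc-redirect-tap repository and copy the binary here as well.
```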
First and foremost, the `nex node` application makes use of the firecracker SDK to start VMs, which is an act that requires a number of elevated privileges. When running locally for testing, it's easy enough to just `sudo nex-node ...`, but in production you might want to craft a bespoke user and group that can do just enough to make firecracker happy.
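As a rough sketch, assuming your distribution exposes `/dev/kvm` to a `kvm` group (and noting that CNI network setup may still require extra capabilities such as `CAP_NET_ADMIN`), a dedicated service account could look like this:

```bash
# Create a locked-down system account for the node service (names are illustrative)
sudo useradd --system --shell /usr/sbin/nologin nex

# Firecracker needs read/write access to /dev/kvm, which is typically owned by the "kvm" group
sudo usermod -aG kvm nex
```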
Because `nex node` utilizes firecracker VMs, workloads must be statically linked, freestanding Linux ELF binaries. The node service will reject dynamically-linked or non-ELF binaries.
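For example, a Go workload can be compiled into a suitable static ELF binary roughly like this (the `echoservice` name is just a placeholder):

```bash
# Disabling cgo yields a statically linked, freestanding Linux ELF binary
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o echoservice .

# Sanity check -- the output should include "statically linked"
file echoservice
```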
As you run this and create warm VMs to reside in the pool, you'll be allocating IP addresses from the firecracker network CNI device (defaults to `fcnet`). This means that eventually you'll run out of IP addresses during development. To clear them, you can purge your `/var/lib/cni/networks/{device}` directory, as shown below. This will start IP allocation back at `.2` (`.1` is the host).
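Assuming the default `fcnet` network name, the purge is just:

```bash
# Remove host-local IPAM allocations so addresses start again at .2
sudo rm -rf /var/lib/cni/networks/fcnet/*
```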
Additionally, while running and testing this you may end up with a large number of idle/dormant `veth` devices. To remove them all, you can use the following script:
```bash
for name in $(ifconfig -a | sed 's/[ \t].*//;/^\(lo\|\)$/d' | grep veth)
do
    echo $name
    sudo ip link delete $name
done
```
This removes all of the dangling `veth` devices so you can start over fresh.
The `nex node` service needs to know the size and shape of the firecracker machines to dispense. As a result, it needs a JSON file that describes the cookie cutter from which VMs are stamped. Here's a sample `machineconfig.json` file:
```json
{
    "kernel_path": "/home/kevin/lab/firecracker/vmlinux-5.10",
    "rootfs_path": "/home/kevin/lab/firecracker/rootfs.ext4",
    "machine_pool_size": 1,
    "cni": {
        "network_name": "fcnet",
        "interface_name": "veth0"
    },
    "machine_template": {
        "vcpu_count": 1,
        "memsize_mib": 256
    },
    "requester_public_keys": []
}
```
This file tells `nex node` where to find the kernel and rootfs for the firecracker VMs, as well as the CNI configuration. Finally, if you supply a non-empty value for `requester_public_keys`, it will serve as an allow-list of public XKeys that are permitted to submit requests. XKeys are basically nkeys that can be used for encryption. Note that the `network_name` field must exactly match the `{network_name}.conflist` file in `/etc/cni/conf.d`.
Here's a look at the `fcnet.conflist` file that we use as a default. This gives each firecracker VM access to whatever the host can access, and allows the host to make inbound requests. Regardless of your CNI configuration, the agent process must be able to communicate with the host node via the internal NATS server (defaults to running on port `9222`).
```json
{
    "name": "fcnet",
    "cniVersion": "0.4.0",
    "plugins": [
        {
            "type": "ptp",
            "ipMasq": true,
            "ipam": {
                "type": "host-local",
                "subnet": "192.168.127.0/24",
                "resolvConf": "/etc/resolv.conf"
            }
        },
        {
            "type": "tc-redirect-tap"
        }
    ]
}
```
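A quick sanity check that the internal NATS server is listening on the host (assuming the default port `9222`):

```bash
# The agent inside each VM must be able to reach this listener
ss -ltn | grep 9222
```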
If you want to add constraints and rules to the `fcnet` definition, consult the CNI documentation, as we use plain CNI configuration files here.
You can monitor events coming out of the node and workloads running within the node by subscribing to the following subject pattern:

```
$NEX.events.{namespace}.{event_type}
```

The payload of these events is a CloudEvent envelope containing an inner JSON object for the `data` field.
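For example, with the `nats` CLI you can watch these events directly (the subject is quoted so the shell doesn't expand `$NEX`; the `default` namespace below is just an illustration):

```bash
# All event types across all namespaces
nats sub '$NEX.events.*.*'

# Only events from the "default" namespace
nats sub '$NEX.events.default.*'
```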
You can subscribe to log emissions without console access by using the following subject pattern:

```
$NEX.logs.{namespace}.{host}.{workload}.{vmId}
```
This gives you the flexibility of monitoring everything from a given host, workload, or vmID. For example, if you want to see an aggregate of all logs emitted by the `bankservice` workload throughout an entire system, you could just subscribe to:

```
$NEX.logs.*.*.bankservice.*
```
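With the `nats` CLI, that subscription looks like this (again quoted to avoid shell expansion):

```bash
# Aggregate logs for the "bankservice" workload across all namespaces, hosts, and VM IDs
nats sub '$NEX.logs.*.*.bankservice.*'
```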