Commit 91891de

olethanh authored and nesitor committed
Fix Instance network not working on server reboot
Related Jira ticket: ALEPH-549

Problem: When the whole server rebooted, the internet was not working inside instances and they were unreachable on the network, because their network interfaces were not properly recreated.

Analysis: When an instance restarts automatically after the server has rebooted, its network interface does not exist yet. QEMU therefore creates some kind of default network interface that is not set up as needed (tunneling, IP address, NAT, IPv6, etc.), so the network inside the instance is non-functional.

Solution:
* In the controller, wait for the network interface to be recreated by the supervisor.
* In the supervisor, recreate the network interface and set it up if it does not exist.

It is done this way because it is not possible to add tuntap after the interface is created.

How to test:
* Stop the supervisor
* Stop the instance controller (via systemctl stop aleph-vm-controller@ )
* Delete the network interface: ip link del vmtapX (e.g. vmtap4 if it is the one the instance is using)
* Start the instance again (systemctl start aleph-vm-controller@... )
* Check that it is waiting for the network interface (journalctl -u aleph-vm-controller@... -f )
* Start the supervisor again; its log should show that it creates the network interface
* The instance controller should detect it and launch QEMU
* Check that the interface exists and has an IP address: ip a
* SSH into the instance and check that the network is working as intended
1 parent c978419 commit 91891de

2 files changed: +23 -13 lines changed

src/aleph/vm/controllers/__main__.py

Lines changed: 20 additions & 13 deletions
@@ -6,6 +6,7 @@
 import sys
 from asyncio.subprocess import Process
 from pathlib import Path
+from time import sleep

 from aleph.vm.hypervisors.firecracker.microvm import MicroVM
 from aleph.vm.hypervisors.qemu.qemuvm import QemuVM
@@ -127,21 +128,27 @@ def main():

     config.settings.check()

-    if args.initialize_network_settings:
-        network = Network(
-            vm_ipv4_address_pool_range=config.settings.IPV4_ADDRESS_POOL,
-            vm_network_size=config.settings.IPV4_NETWORK_PREFIX_LENGTH,
-            external_interface=config.settings.NETWORK_INTERFACE,
-            ipv6_allocator=make_ipv6_allocator(
-                allocation_policy=config.settings.IPV6_ALLOCATION_POLICY,
-                address_pool=config.settings.IPV6_ADDRESS_POOL,
-                subnet_prefix=config.settings.IPV6_SUBNET_PREFIX,
-            ),
-            use_ndp_proxy=config.settings.USE_NDP_PROXY,
-            ipv6_forwarding_enabled=config.settings.IPV6_FORWARDING_ENABLED,
-        )
+    network = Network(
+        vm_ipv4_address_pool_range=config.settings.IPV4_ADDRESS_POOL,
+        vm_network_size=config.settings.IPV4_NETWORK_PREFIX_LENGTH,
+        external_interface=config.settings.NETWORK_INTERFACE,
+        ipv6_allocator=make_ipv6_allocator(
+            allocation_policy=config.settings.IPV6_ALLOCATION_POLICY,
+            address_pool=config.settings.IPV6_ADDRESS_POOL,
+            subnet_prefix=config.settings.IPV6_SUBNET_PREFIX,
+        ),
+        use_ndp_proxy=config.settings.USE_NDP_PROXY,
+        ipv6_forwarding_enabled=config.settings.IPV6_FORWARDING_ENABLED,
+    )

+    if args.initialize_network_settings:
         network.setup()
+    else:
+        # Wait for the network interface to be by the supervisor
+        # Otherwise QEMU will create a new one, and it won't be properly setup as a tuntap as we expect
+        while not network.interface_exists(config.vm_id):
+            logger.info("Waiting for network interface to be created...")
+            sleep(1)

     asyncio.run(run_persistent_vm(config))

src/aleph/vm/pool.py

Lines changed: 3 additions & 0 deletions
@@ -299,6 +299,9 @@ async def load_persistent_executions(self):
             if self.network:
                 vm_type = VmType.from_message_content(execution.message)
                 tap_interface = await self.network.prepare_tap(vm_id, vm_hash, vm_type)
+                if not self.network.interface_exists(vm_id):
+                    # In case of a reboot, the network is not automatically created
+                    await self.network.create_tap(vm_id, tap_interface)

             # Activate ndp_proxy for existing interfaces if needed
             if self.network.ndp_proxy and self.network.interface_exists(vm_id):
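The supervisor change above follows a "create only if missing" pattern, so reloading persistent executions is safe both after a reboot (interface gone) and after a plain supervisor restart (interface still present). The sketch below illustrates that pattern with a hypothetical in-memory `FakeNetwork`; the class, the `ensure_tap` helper, and the simplified `create_tap(vm_id)` signature are assumptions for this example and do not reflect the real aleph-vm `Network` API beyond the two method names visible in the diff.

```python
import asyncio


class FakeNetwork:
    """Hypothetical in-memory stand-in for the supervisor's network object."""

    def __init__(self) -> None:
        self.interfaces: set[int] = set()
        self.create_calls = 0

    def interface_exists(self, vm_id: int) -> bool:
        return vm_id in self.interfaces

    async def create_tap(self, vm_id: int) -> None:
        # The real method also takes the prepared tap interface; omitted here.
        self.create_calls += 1
        self.interfaces.add(vm_id)


async def ensure_tap(network: FakeNetwork, vm_id: int) -> bool:
    """Create the tap only when it is missing; return True if it was created."""
    if network.interface_exists(vm_id):
        return False
    await network.create_tap(vm_id)
    return True


async def demo() -> tuple[bool, bool, int]:
    network = FakeNetwork()
    first = await ensure_tap(network, 4)   # interface missing, as after a reboot
    second = await ensure_tap(network, 4)  # interface present, as after a supervisor restart
    return first, second, network.create_calls


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the check and the creation are guarded together, calling the helper twice creates the interface only once.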
