dnsvizor: init at 0-unstable-2025-12-15 #1907

Closed
jian-lin wants to merge 7 commits into ngi-nix:main from linj-fork:pr/dnsvizor-init

@jian-lin
Contributor

@jian-lin jian-lin commented Dec 15, 2025

To build this locally, run commands like these:

nix build -f . dnsvizor.unix
nix build -f . dnsvizor.hvt

Progress

The macosx and genode targets fail to build. They seem to be niche targets, so I do not plan to fix them. They are not exposed, to avoid CI errors.

Optional TODO list

  • enable tests: tests fail to build if target is not unix
  • avoid IFD (optional because CI passes now even with IFD)

Closes #1906

@jian-lin jian-lin marked this pull request as draft December 15, 2025 02:46
@jian-lin jian-lin changed the title from "dnsvizor: init at dev" to "dnsvizor: init at 0-unstable-2025-12-15" Dec 15, 2025
Comment thread default.nix Outdated
Comment thread default.nix Outdated
@jian-lin jian-lin marked this pull request as ready for review December 16, 2025 04:21
@jian-lin
Contributor Author

The remaining thing to do is to upstream my patches. Other than that, this PR is ready for review.

@jian-lin jian-lin requested a review from eljamm December 16, 2025 04:23
Comment thread pkgs/by-name/dnsvizor/hillingar.nix
bad: eval failure by IFD or build failure
@imincik imincik self-requested a review December 16, 2025 08:39
Contributor

@imincik imincik left a comment

Please investigate if we can solve IFDs somehow. Thanks.

Comment thread flake.nix Outdated
inputs.opam-nix.inputs.opam-overlays.follows = "opam-overlays";
inputs.opam-nix.inputs.mirage-opam-overlays.follows = "mirage-opam-overlays";
# update opam-nix to fix eval error of new Nixpkgs: attribute 'overrideScope'' missing
inputs.opam-nix.url = "github:tweag/opam-nix";
Contributor

This one uses IFD, which will prevent us from merging this PR. Is there any workaround (for example, including some generated file in git)?

Contributor Author

By "prevent us from merging this PR", do you mean that

  • there is a no-IFD policy in ngipkgs, or
  • CI currently fails on aarch64?

Indeed, opam-nix, used by hillingar, can avoid IFD by adding generated files. However, hillingar does not expose that functionality from opam-nix yet. More work is needed to implement that in hillingar.

Contributor Author

I changed the buildbot-nix CI config in bf173c8 as a workaround to make CI pass.

See the commit message for more details and possible alternative methods.

Avoiding IFD is of low priority (to me) now that CI passes, so I'll focus on projects(dnsvizor): init next. I am also fine with working on avoiding IFD before projects(dnsvizor): init if you think that is better.

WDYT?

Comment thread flake.nix Outdated
Comment thread pkgs/by-name/dnsvizor/hillingar.nix Outdated
@jian-lin jian-lin requested a review from imincik December 17, 2025 04:47
My patches have been upstreamed.
My patches are applied using nix.
By default, buildbot-nix looks at "checks", which consists of
"checks.x86_64-linux" and "checks.aarch64-linux".  Our buildbot-nix CI
runs on an x86_64-linux[1] machine, so buildbot-nix errors out for IFDs
that need to be built on aarch64-linux systems.

This patch fixes that error by letting buildbot-nix only look at
"checks.x86_64-linux", which was made possible by [2].  Compared to
the previous state, the only disadvantage is that we do not catch eval
errors on aarch64-linux in the buildbot-nix CI any more.

There are 3 possible alternative fixes:

1. ban IFD in ngipkgs
2. exclude "aarch64-linux" from "checks"
3. emulate aarch64-linux on an x86_64-linux machine using
   boot.binfmt.emulatedSystems = [ "aarch64-linux" ]

The 1st alternative fix usually needs extra work to implement and
usually means we have to commit generated lock files to ngipkgs repo.

The 2nd alternative fix affects more than just buildbot-nix CI, such
as "nix flake check", which may not be desirable.

The 3rd alternative fix will slow down the buildbot-nix CI since
emulating another system is slow.

[1]: https://github.com/ngi-nix/ngipkgs/blob/dfab738d4a1d00f6c1b958be29163d672badf05f/infra/makemake/default.nix#L3
[2]: nix-community/buildbot-nix#318
@jian-lin
Contributor Author

jian-lin commented Jan 9, 2026

While working on #1944, I found that the built dnsvizor.hvt has a runtime error. It needs further investigation.

Fortunately, upstream provides binary releases of unikernels (the same unikernel binary works on many architectures, platforms, and systems) and claims that they are reproducible.

@eljamm
Contributor

eljamm commented Jan 9, 2026

  • make buildbot-nix CI pass with IFD

As @ju1m mentioned in the review meeting:

If you want to speed up your project by avoiding Import From Derivation, opam-nix supports haskell.nix-style materialization.

Could we use materialization?

@jian-lin
Contributor Author

jian-lin commented Jan 9, 2026

To be clear, I already know that opam-nix supports materialization. I just described materialization in different words, "adding generated files":

Indeed, opam-nix, used by hillingar, can avoid IFD by adding generated files. However, hillingar does not expose that functionality from opam-nix yet. More work is needed to implement that in hillingar.

The complex part of avoiding IFD for MirageOS unikernels lies in the (cross-)building process of the unikernel.

@ju1m ju1m mentioned this pull request Jan 28, 2026
4 tasks
@ju1m

ju1m commented Feb 4, 2026

Oh noes, unlike the prebuilt dnsvizor.hvt that is used in #1944 (comment), the resulting dnsvizor.hvt crashes at startup on my x86_64-linux machine, with the error:

Solo5: trap: type=#PF ec=0x0 rip=0x466a86 rsp=0x1ffffc10 rflags=0x10002
Solo5: trap: cr2=0x28
Solo5: ABORT: cpu_x86_64.c:181: Fatal trap

Reproducer

# Remove `services.dnsvizor.package` in `projects/DNSvizor/services/dnsvizor/module.nix`
$ nix build -f. projects.DNSvizor.nixos.tests.dns --show-trace --allow-import-from-derivation
# Or after a rebase on latest main Git branch:
$ nix build -f. projects.DNSvizor.tests."Enable DNSvizor as a dual-stack recursive DNS resolver:resolverKind=recursive-useNetworkd=false-useNftables=false" --show-trace --allow-import-from-derivation
[…]
vm-test-run-DNSvizor> dnsResolver: waiting for success: journalctl -u dnsvizor -g "listening on"
vm-test-run-DNSvizor> dnsResolver # [   23.807864] solo5-hvt[1008]:             |      ___|
vm-test-run-DNSvizor> dnsResolver # [   23.808586] solo5-hvt[1008]:   __|  _ \  |  _ \ __ \
vm-test-run-DNSvizor> dnsResolver # [   23.811182] solo5-hvt[1008]: \__ \ (   | | (   |  ) |
vm-test-run-DNSvizor> dnsResolver # [   23.811624] solo5-hvt[1008]: ____/\___/ _|\___/____/
vm-test-run-DNSvizor> dnsResolver # [   23.812131] solo5-hvt[1008]: Solo5: Bindings version v0.10.0
vm-test-run-DNSvizor> dnsResolver # [   23.812481] solo5-hvt[1008]: Solo5: Memory map: 128 MB addressable:
vm-test-run-DNSvizor> dnsResolver # [   23.813071] solo5-hvt[1008]: Solo5:   reserved @ (0x0 - 0xfffff)
vm-test-run-DNSvizor> dnsResolver # [   23.813467] solo5-hvt[1008]: Solo5:       text @ (0x100000 - 0x50bfff)
vm-test-run-DNSvizor> dnsResolver # [   23.813974] solo5-hvt[1008]: Solo5:     rodata @ (0x50c000 - 0x5c8fff)
vm-test-run-DNSvizor> dnsResolver # [   23.814440] solo5-hvt[1008]: Solo5:       data @ (0x5c9000 - 0xa16fff)
vm-test-run-DNSvizor> dnsResolver # [   23.814967] solo5-hvt[1008]: Solo5:       heap >= 0xa17000 < stack < 0x8000000
vm-test-run-DNSvizor> dnsResolver # [   23.851537] solo5-hvt[1008]: Solo5: trap: type=#PF ec=0x0 rip=0x466a86 rsp=0x7fffc10 rflags=0x10002
vm-test-run-DNSvizor> dnsResolver # [   23.852588] solo5-hvt[1008]: Solo5: trap: cr2=0x28
vm-test-run-DNSvizor> dnsResolver # [   23.853930] solo5-hvt[1008]: Solo5: ABORT: cpu_x86_64.c:181: Fatal trap
vm-test-run-DNSvizor> dnsResolver # [   23.862646] systemd[1]: dnsvizor.service: Main process exited, code=exited, status=255/EXCEPTION
vm-test-run-DNSvizor> dnsResolver # [   23.865940] systemd[1]: dnsvizor.service: Failed with result 'exit-code'.
vm-test-run-DNSvizor> dnsResolver # [   24.214968] systemd[1]: dnsvizor.service: Scheduled restart job, restart counter is at 5.
vm-test-run-DNSvizor> dnsResolver # [   24.215543] systemd[1]: dnsvizor.service: Start request repeated too quickly.
vm-test-run-DNSvizor> dnsResolver # [   24.216436] systemd[1]: dnsvizor.service: Failed with result 'exit-code'.
vm-test-run-DNSvizor> dnsResolver # [   24.217489] systemd[1]: Failed to start dnsvizor recursive/stub DNS resolver and DHCP server

A quicker reproducer

$ nix shell nixpkgs#solo5
$ sudo ip tuntap add tap-unikernel mode tap
$ sudo ip link set dev tap-unikernel up
$ solo5-hvt --mem=512 --net:service=tap-unikernel -- $(nix build --print-out-paths --no-link -f. dnsvizor.hvt --allow-import-from-derivation)/dnsvizor.hvt
            |      ___|
  __|  _ \  |  _ \ __ \
\__ \ (   | | (   |  ) |
____/\___/ _|\___/____/
Solo5: Bindings version v0.10.0
Solo5: Memory map: 512 MB addressable:
Solo5:   reserved @ (0x0 - 0xfffff)
Solo5:       text @ (0x100000 - 0x50bfff)
Solo5:     rodata @ (0x50c000 - 0x5c8fff)
Solo5:       data @ (0x5c9000 - 0xa16fff)
Solo5:       heap >= 0xa17000 < stack < 0x20000000
Solo5: trap: type=#PF ec=0x0 rip=0x466a86 rsp=0x1ffffc10 rflags=0x10002
Solo5: trap: cr2=0x28
Solo5: ABORT: cpu_x86_64.c:181: Fatal trap

Expecting:

$ aria2c https://builds.robur.coop/job/dnsvizor/build/dd2ac462-f0d4-4866-8439-e4fbbc1e97ae/f/bin/dnsvizor.hvt
$ solo5-hvt --mem=512 --net:service=tap-unikernel -- ./dnsvizor.hvt
            |      ___|
  __|  _ \  |  _ \ __ \
\__ \ (   | | (   |  ) |
____/\___/ _|\___/____/
Solo5: Bindings version v0.10.0
Solo5: Memory map: 512 MB addressable:
Solo5:   reserved @ (0x0 - 0xfffff)
Solo5:       text @ (0x100000 - 0x4ddfff)
Solo5:     rodata @ (0x4de000 - 0x66afff)
Solo5:       data @ (0x66b000 - 0xa02fff)
Solo5:       heap >= 0xa03000 < stack < 0x20000000
2026-02-04T02:58:00-00:00: [INFO] [netif] Plugging into service with mac 42:55:0f:61:29:30 mtu 1500
2026-02-04T02:58:00-00:00: [INFO] [ethernet] Connected Ethernet interface 42:55:0f:61:29:30
2026-02-04T02:58:00-00:00: [INFO] [ARP] Sending gratuitous ARP for 10.0.0.2 (42:55:0f:61:29:30)
2026-02-04T02:58:00-00:00: [INFO] [ARP] Sending gratuitous ARP for 10.0.0.2 (42:55:0f:61:29:30)
2026-02-04T02:58:00-00:00: [INFO] [ipv6] IP6: Starting
2026-02-04T02:58:00-00:00: [INFO] [ndpc6] IP6: Processing unknown option, MSB 5
2026-02-04T02:58:00-00:00: [INFO] [ndpc6] ICMP6: Unknown packet type: ty=143
2026-02-04T02:58:01-00:00: [INFO] [ndpc6] IP6: Processing unknown option, MSB 5
2026-02-04T02:58:01-00:00: [INFO] [ndpc6] ICMP6: Unknown packet type: ty=143
2026-02-04T02:58:01-00:00: [INFO] [ipv6] IP6: Started with fe80::4055:fff:fe61:2930
2026-02-04T02:58:01-00:00: [INFO] [udp] UDP layer connected on 10.0.0.2/24, fe80::4055:fff:fe61:2930/64
2026-02-04T02:58:01-00:00: [INFO] [tcp.pcb] TCP layer connected on 10.0.0.2/24, fe80::4055:fff:fe61:2930/64
2026-02-04T02:58:01-00:00: [INFO] [tcpip-stack-direct] Dual TCP/IP stack assembled: mac=42:55:0f:61:29:30,ip=10.0.0.2/24, fe80::4055:fff:fe61:2930/64
2026-02-04T02:58:01-00:00: [WARNING] [happy-eyeballs.mirage] inject was called the 2 times
2026-02-04T02:58:01-00:00: [WARNING] [application] No password specified, endpoints requiring authentication won't be accessible.
2026-02-04T02:58:01-00:00: [ERROR] [application] Neither --no-tls nor --ca-seed specified. The seed (base64 encoded) used to generate the private key for the certificate. The seed can be prepended by the type of the key (rsa or ed25519) plus a colon. For a RSA key, the user can also specify bits: "rsa:4096:foo=".
Solo5: solo5_exit(64) called

No better luck with dnsvizor.spt:

$ solo5-spt --mem=512 --net:service=tap-unikernel -- $(nix build --print-out-paths --no-link -f. dnsvizor.spt --allow-import-from-derivation)/dnsvizor.spt
            |      ___|
  __|  _ \  |  _ \ __ \
\__ \ (   | | (   |  ) |
____/\___/ _|\___/____/
Solo5: Bindings version v0.10.0
Solo5: Memory map: 512 MB addressable:
Solo5:   reserved @ (0x0 - 0xfffff)
Solo5:       text @ (0x100000 - 0x509fff)
Solo5:     rodata @ (0x50a000 - 0x5c6fff)
Solo5:       data @ (0x5c7000 - 0xa10fff)
Solo5:       heap >= 0xa11000 < stack < 0x20000000
Segmentation fault         (core dumped) solo5-spt --mem=512 --net:service=tap-unikernel -- $(nix build --print-out-paths --no-link -f. dnsvizor.spt --allow-import-from-derivation)/dnsvizor.spt

Debug

$ mkdir dump
$ solo5-hvt-debug --dumpcore=dump --mem=512 --net:service=tap-unikernel -- $(nix build --print-out-paths --no-link -f. dnsvizor.hvt --allow-import-from-derivation)/dnsvizor.hvt
$ gdb --core=dump/core.solo5-hvt.1970565 -- result/dnsvizor.hvt
Reading symbols from result/dnsvizor.hvt...
warning: found thread with pid 0, assigned replacement Target Id: LWP 1
[New LWP 1]
#0  0x0000000000466a86 in __gmpn_cpuvec_init ()
(gdb) bt
#0  0x0000000000466a86 in __gmpn_cpuvec_init ()
#1  0x00000000001013bc in _abort ()
#2  0x0000000000100b93 in cpu_trap_handler ()
#3  0x0000000000100f01 in cpu_trap_14 ()
#4  0x0000000000000001 in ?? ()
#5  0x0000000000cdfb10 in ?? ()
#6  0x0000000000000038 in ?? ()
#7  0x000000001ffffdf8 in ?? ()
#8  0x0000000000000038 in ?? ()
#9  0x0000000000000001 in ?? ()
#10 0x0000000000000000 in ?? ()

There's surely a way to load the missing symbols, but either way we get back to rip=0x466a86, which contains mov %fs:0x28,%r12:

$ gdb $(nix build --print-out-paths --no-link -f. dnsvizor.hvt --allow-import-from-derivation)/dnsvizor.hvt
Reading symbols from /nix/store/zz0pbbh42440mzcbrbl3nmbgm6xncnbn-mirage-dnsvizor-hvt-0-unstable-2025-12-17/dnsvizor.hvt...
(gdb) disassemble 0x466a86
Dump of assembler code for function __gmpn_cpuvec_init:
   0x0000000000466a70 <+0>:	push   %rbp
   0x0000000000466a71 <+1>:	xor    %esi,%esi
   0x0000000000466a73 <+3>:	mov    %rsp,%rbp
   0x0000000000466a76 <+6>:	push   %r15
   0x0000000000466a78 <+8>:	push   %r14
   0x0000000000466a7a <+10>:	push   %r13
   0x0000000000466a7c <+12>:	push   %r12
   0x0000000000466a7e <+14>:	push   %rbx
   0x0000000000466a7f <+15>:	sub    $0x138,%rsp
   0x0000000000466a86 <+22>:	mov    %fs:0x28,%r12

(gdb) info registers
rax            0xa0cf28            10538792
rbx            0xa0cfe8            10538984
rcx            0x0                 0
rdx            0x508               1288
rsi            0xa0cfe8            10538984
rdi            0xff                255
rbp            0xa0cf40            0xa0cf40 <cpu_trap_stack+3872>
rsp            0x3ffffc10          0x3ffffc10
r8             0x0                 0
r9             0x0                 0
r10            0x0                 0
r11            0x0                 0
r12            0x527602            5404162
r13            0x52760d            5404173
r14            0x527611            5404177
r15            0x0                 0
rip            0x466a86            0x466a86 <__gmpn_cpuvec_init+22>
eflags         0x10002             [ RF ]
cs             0x8                 8
ss             0x10                16
ds             0x10                16
es             0x10                16
fs             0x10                16
gs             0x10                16
fs_base        0x0                 0
gs_base        0x0                 0

Solo5/solo5#331 explains that dnsvizor.hvt is "accessing memory outside of the range given by Solo5", as in solo5's test_notls.c:

https://github.com/Solo5/solo5/blob/dabc69fd89b8119449ec4088c54b458d4ccc851b/tests/test_notls/test_notls.c#L34

AFAIU, Solo5 zeroes fs_base to disable thread-local storage (TLS); the %fs base normally holds the address of the current thread's user-space thread structure.
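
The fault address follows directly from the register dump: with fs_base at 0x0, the %fs:0x28 operand resolves to address 0x28, which is the cr2 value in the trap message and lies in the range the memory map lists as reserved (0x0 - 0xfffff). A minimal sketch of that arithmetic in shell (the variable names are only illustrative):

```shell
# Effective address of `mov %fs:0x28,%r12` is fs_base + displacement.
fs_base=0x0   # fs_base from `info registers`
disp=0x28     # displacement encoded in the faulting instruction
cr2=0x28      # faulting address reported by Solo5

fault_addr=$(printf '0x%x' "$(($fs_base + $disp))")
echo "computed=$fault_addr reported=$cr2"
```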

The faulting function, __gmpn_cpuvec_init, suggests that pkgsStatic.gmp accesses TLS while initializing its CPU-specific function table.

The disassembly suggests this happens before the calls to __gmpn_cpuid in:
https://github.com/gmp-mirror/gmp/blob/4b3b66d3328e390af894b97e0438aed90c3305be/mpn/x86_64/fat/fat.c#L263-L266

Maybe it happens in CPUVEC_SETUP_x86_64, which is generated when configuring the gmp build.

This seems to match the fact that the perfectly working prebuilt binary on https://builds.robur.coop is built on FreeBSD for x86-64:

$ file dnsvizor.hvt
dnsvizor.hvt: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), statically linked, interpreter /nonexistent/solo5/, for OpenBSD, stripped

a platform on which TLS is not supported by the build toolchain:

On amd64 the assembler does not support thread-local access relocations in 64-bit mode (binutils 2.13.2)

Which gives:

$ readelf -l dnsvizor.hvt | grep TLS -A1
  TLS            0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000         0x0

Whereas the dnsvizor.hvt built using this Nix package currently has:

$ readelf -l result/dnsvizor.hvt | grep TLS -A1
  TLS            0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000         0x8
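
Reading the two headers with readelf's usual column order (Offset, VirtAddr, PhysAddr on the first line; FileSiz, MemSiz, Flags, Align on the second), both TLS segments appear to have zero FileSiz and MemSiz and differ only in Align (0x0 versus 0x8). A small sketch to pull out just those fields for comparison; `tls_fields` is a hypothetical helper name, and it expects `readelf -lW` output, where `-W` keeps each program header on one line:

```shell
# Print FileSiz, MemSiz and Align of every PT_TLS program header.
# Reads `readelf -lW` output on stdin. Align is taken as the last
# field because the Flags column can be empty.
tls_fields() {
  awk '$1 == "TLS" { print "filesz=" $5, "memsz=" $6, "align=" $NF }'
}
```

For example, `readelf -lW result/dnsvizor.hvt | tls_fields` prints one line per TLS header; no output means the binary has no PT_TLS header at all.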

It may be possible to tell the assembler or linker to disable TLS, but I have not yet found a flag that does so, only -mtls-check in gas, which disables the checking of TLS relocations.
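
To see how widespread the %fs-relative accesses are, the disassembly can be scanned for them. The sketch below counts such instructions in `objdump -d` output; `count_fs_accesses` is a hypothetical helper name. One thing worth ruling out, as an assumption rather than something established in this thread: on x86-64 Linux, GCC's stack protector loads its canary from %fs:0x28, the same offset as in the faulting instruction, so hits at exactly that offset may come from -fstack-protector rather than from thread-local variables.

```shell
# Count instructions that address memory through %fs.
# Reads `objdump -d` output (AT&T syntax) on stdin.
count_fs_accesses() {
  # grep -c still prints 0 when nothing matches but exits non-zero,
  # so `|| true` keeps the function's exit status clean.
  grep -c '%fs:0x' || true
}
```

Usage: `objdump -d result/dnsvizor.hvt | count_fs_accesses`. If the hits turn out to be stack-protector canaries, `-fno-stack-protector` would be the GCC flag to try; that has not been tested here.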

@jian-lin
Contributor Author

jian-lin commented Feb 4, 2026

Oh noes, ... the resulting dnsvizor.hvt crashes at startup on my x86_64-linux, with the error:

Solo5: trap: type=#PF ec=0x0 rip=0x466a86 rsp=0x1ffffc10 rflags=0x10002
Solo5: trap: cr2=0x28
Solo5: ABORT: cpu_x86_64.c:181: Fatal trap

This is the same error I already mentioned. Not sure why you seem surprised 😅.

I did not post the error details because I did not expect help from others. I guess I probably should have posted the error itself and maybe my investigation. FWIW, my investigation stopped at solo5 and did not go down to gmp TLS.

Great progress on the investigation! 👍

@jian-lin
Contributor Author

jian-lin commented Feb 7, 2026

merged via #2018

closing

@jian-lin jian-lin closed this Feb 7, 2026
@github-project-automation github-project-automation Bot moved this to Done in Nix@NGI Feb 7, 2026
Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

DNSvizor: Package it for NGIpkgs

4 participants