libutil: Use Boost.URL for URI parsing#13445
Conversation
Ericson2314
left a comment
There was a problem hiding this comment.
I like this a lot!
But going to leave for others to review a bit just case there is some issues e.g. with how much Boost or compat I didn't think of
ab42ce4 to
77750fd
Compare
7140696 to
e76b2d6
Compare
Mic92
left a comment
There was a problem hiding this comment.
Fixed some issue in to_string(), let me know what you think.
These cases do not seem to be covered by the test suite at all.
This matcher is useful for checking error messages, which always contain ANSI escapes.
The myriad of hand-rolled URL parsing and validation code is a constant source of problems. Regexes are not a great way of writing parsers and there's a history of getting them wrong. Boost.URL is a good library we can outsource most of the heavy lifting to.
The default comparison operator can be generated by the compiler since C++20.
Boost.URL is a significantly more RFC-compliant parser than what libutil currently has a bundle of incomprehensible regexes. One aspect of this change is that RFC4007 ZoneId IPv6 literals are represented in URIs according to RFC6874 [1]. Previously they were represented naively like so: [fe80::818c:da4d:8975:415c\%enp0s25]. This is not entirely correct, because the percent itself has to be pct-encoded: > "%" is always treated as an escape character in a URI, so, according to the established URI syntax [RFC3986] any occurrences of literal "%" symbols in a URI MUST be percent-encoded and represented in the form "%25". Thus, the scoped address fe80::a%en1 would appear in a URI as http://[fe80::a%25en1]. [1]: https://datatracker.ietf.org/doc/html/rfc6874 Co-authored-by: Jörg Thalheim <joerg@thalheim.io>
7d12190 to
a54284c
Compare
|
Ran the whole |
|
One concern I have is the flake regression suite only tests public flakes, and I suspect some flakes out there will no longer work correctly with new parsing semantics.
…On Sat, Jul 19, 2025, at 4:52 PM, Sergei Zimmerman wrote:
*xokdvium* left a comment (NixOS/nix#13445) <#13445 (comment)>
Ran the whole `flake-regressions` suite and `hydraJobs.tests`. Best I can tell this doesn't regress anything. What's great is that this also fixed a long-standing issue #10898 <#10898>.
—
Reply to this email directly, view it on GitHub <#13445 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAASXLH67Q2DEPSFXGBVUSL3JKVYVAVCNFSM6AAAAACBHRJBRKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTAOJSGU3DKNJWG4>.
You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>
|
I did see that Determinate runs GHA CI with a private flake-regressions-data repository.
Any particular issues you can think of? Did some of the prior non-compliance get ossified into lock files? |
|
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/2025-08-06-nix-team-meeting-minutes-239-240/67694/1 |
|
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: |
Motivation
Boost.URL is a significantly more RFC-compliant parser
than what libutil currently has a bundle of incomprehensible
regexes.
One aspect of this change is that RFC4007 ZoneId IPv6 literals
are represented in URIs according to RFC6874 1.
Previously they were represented naively like so:
[fe80::818c:da4d:8975:415c\%enp0s25].This is not entirely correct, because the percent itself has to be pct-encoded:
Context
Starting to pay off the tech debt #9603.
Fixes #10898.
Add 👍 to pull requests you find important.
The Nix maintainer team uses a GitHub project board to schedule and track reviews.