Add support for IPv6 scoped addresses (RFC4007)#15263
Add support for IPv6 scoped addresses (RFC4007)#15263straight-shoota merged 29 commits intocrystal-lang:masterfrom
Conversation
|
Could you please open an issue as a feature request first? We ask this to separate high-level discussion of the feature (in the issue discussion) from details of this specific PR (PR discussion). Ref: https://github.com/crystal-lang/crystal/blob/master/CONTRIBUTING.md#using-the-issue-tracker I'm marking this PR as draft for now. |
|
ah well, shame on me. I've just created #15264 now to discuss the feature itself. |
12d0b01 to
e9e524e
Compare
In order to translate interface names in such scoped addresses the required `LibC` binding to `if_nametoindex()` has been added. This method obviously only works for interfaces (devices) that are actually present on the system. The binding for the reverse operation `if_indextoname()` has also been added, although its usage is a bit more cumbersome due to LibC::Char* buffer handling. The necessary buffer length has been placed into the constant `LibC::IF_NAMESIZE`, which appears to be `16u8` on unix-like systems and `257u16` on windows. This could potentially be reworked via a preprocessor block at compile-time as indicated by some folks over on discord, I currently do not know how to achieve that though. Scoped identifiers are only valid for link-local (`fe80::`) addresses, e.g. `fe80::1%eth0` References: - https://datatracker.ietf.org/doc/html/rfc4007 Fixes crystal-lang#15264
e9e524e to
5b253ee
Compare
|
Alright, here we go. Not sure why, but something is not happy in the workflow runs in my fork.
Apart from that the remaining workflows appear to be happy with it. Let me know if you'd like more comments in the added code or want something changed. |
This should never have made it into the previous commit. If allowed, I could squash/fixup to get rid of this noise but as per the contributing guidelines amending/editing and force-pushing branches with active PRs is undesired.
I fully agree with all suggestions, so this commit implements them. As per the contributing guidelines this is a separate commit and not squashed.
|
Thanks for the detailed review and constructive feedback/suggestions. Following the contributing guidelines I have done so with separate commits but I'd be happy to rebase+squash/fixup if desired for a "cleaner" history before merging into |
This just implicitly assumes that `eth0` in the example got enumerated with interface index `2` by the kernel. Should be sufficient for an example.
straight-shoota
left a comment
There was a problem hiding this comment.
Don't worry about squash/rebase. We do that when merging.
Commit history in the PR is best left as is.
Instead of using `#partition('%')` on the input string, which results in
additional `String` allocations, we can stop consuming the slice within
`#parse_v6_fields?` when we encounter the zone seprator character (`'%'`)
and pass that index back to the constructor.
However, this change to `#parse_v6_fields?` is a bit more intrusive and
changes it return type from a simple `UInt16[8]?` to a
`Tuple(UInt16[8]?, Int32?)` to account for the zone separator index.
All references to that method and its specs were updated accordingly to
adopt this change.
Additionally the exception conditions for invalid v6 addresses have been
simplified a bit and all of the converted to proper `Socket::Error`
exceptions.
This is proposed as an alternate solution for the previous implementation. Returning the index of the zone separator (`'%'`) back to the constructor is a bit cumbersome for subsequent usage. Return the remaining subslice instead makes it much easier to use in the constructor for subsequent parsing and within the exception messages. I definitely prefer this approach over the previous one and consider it an improvement.
To avoid a breaking change only modify the private overload. Additionally remove unneeded counter variable for tracking the index of the zone separator as its value can be calculated when needed.
|
Thanks for the explanation and dilligence in the reviews, that goes to everyone who has taken part here, much appreciated. I've pushed two more commits, one to switch the macro to use |
|
I'm not happy about the
IMO input and output types should always be identical, so |
|
That is definitely a good catch. I've adjusted |
|
Windows CI is failing, because Win32 presumably does not set
|
Unfortunately I can't seem to find any useful
|
|
Are we good to merge now? (I'd really like this to be part of the 1.17.0 release 🙂 ) |
| # Scoped/Zoned IPv6 link-local addresses are supported per RFC4007, e.g. | ||
| # `fe80::abcd%eth0` but will always use their numerical interface index | ||
| # in the `#inspect` representation. The interface name can be retrieved later | ||
| # using `#link_local_interface` on the `IPAddress` object. |
There was a problem hiding this comment.
Sorry for the post-merge comment, this is a fantastic PR I will get a bunch of use out of.
However, isn't it actually #to_s which i more important to document the representation of? #to_s is expected to be used in application code, whereas #inspect is for debugging, and happens to use #to_s here.
Additionally, would it make sense for #inspect to show the interface by name, since performance is a non-issue? With fallback to zone number if #inspect fails.
There was a problem hiding this comment.
Improving documentation sounds good for a follow-up 👌
I'm not sure about name vs. id. #to_s is supposed to be a user-friendly format, so that would be an argument to show the name instead of the id there...
There was a problem hiding this comment.
Modifying the #to_s output would be a breaking change. There are likely lots of things/applications/libs that occasionally convert an IPv4/IPv6 address to a string and handing them off to whatever. Each of these implementations would then potentially need adjusting if there was suddenly a %<zone_id> suffix as part of the string.
I do think a separate #to_s_rfc4007? method might be useful to get that exact format. With an optional boolean to chose between numerical zone_id and link_local_interface.
There was a problem hiding this comment.
I do think a separate
#to_s_rfc4007? method might be useful to get that exact format. With an optional boolean to chose between numericalzone_idandlink_local_interface.
Sounds like a great idea. I'd suggest using a dedicated named argument in to_s instead though.
There was a problem hiding this comment.
Correct me if I'm wrong, but the current state of this PR looks like to_s is modified as well as inspect.
I'd prefer for inspect to by default include the zone, by default with name.
There was a problem hiding this comment.
Yes, this PR adds the zone ID to the output of both, #to_s and #inspect (the latter delegates to the former).
We missed adding a spec for that to make it explicit.
Considering https://github.com/crystal-lang/crystal/pull/15263/files#r2103365809 this might have been an unintended change and we should revert that.
I'd prefer for inspect to always include the zone.
Agreed. The information should be there. But should it be ID or name?
Maybe ID because it's sufficient and more succinct?
There was a problem hiding this comment.
After taking another look, the #to_s always returned the address accompanied by the port. If one wanted to obtain just the address, there is an #address method, which hasn't been modified by this PR, i.e. it doesn't include any %<zone_id>, neither numerical nor string.
It is indeed true, that we've now implicitly changed the #to_s behavior with a potentially breaking formatting change. This was definitely an oversight as there is also no spec for it.
I can follow up with another PR for adding an optional bool to #to_s that controls whether or not to include the zone_id.
Should we default to always include it or omit it in #to_s to keep previous behavior and only always include it in #inspect?
Furthermore, when including it, do we use numerical ID or interface name?
I'd be fine attempting #link_local_interface and fallback to using numerical @zone_id if it failed. Thoughts?
There was a problem hiding this comment.
One more thing worth noting:
Since zone_id can only ever be set (i.e. .positive?) on link-local ipv6 addresses, the format change doesn't really affect anything.
If at all, it actually might fix certain implementations since link-local ipv6 addresses might not even work properly without the zone_id.
But I still think we can make improvements in both the comments/docs, default behavior and obviously also add some specs for that.
There was a problem hiding this comment.
Given the effect is only for link-local addresses, I think including the zone ID by default makes sense for both to_s and inspect. However, looking up the interface name has a performance impact so I believe it should only be on by default for inspect.
In order to translate interface names in such scoped addresses the
required
LibCbinding toif_nametoindex()has been added.This method obviously only works for interfaces (devices) that are
actually present on the system.
The binding for the reverse operation
if_indextoname()has also beenadded, although its usage is a bit more cumbersome due to
LibC::Char*buffer handling. The necessary buffer length has been placed into the
constant
LibC::IF_NAMESIZE, which appears to be16u8on unix-likesystems and
257u16on windows.This could potentially be reworked via a preprocessor block at
compile-time as indicated by some folks over on discord, I currently do
not know how to achieve that though.
Scoped identifiers are only valid for link-local (
fe80::) addresses, e.g.fe80::1%eth0References:
Fixes #15264 .