Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add probe_icmp_reply_ttl_total #694

Merged
merged 1 commit into from
Oct 8, 2020
Merged

Conversation

luizluca
Copy link
Contributor

IPv4 TTL (IPv6 hoplimit) identifies the number of routers used by
an arriving packet. It can be used to detect routing problems.

Signed-off-by: Luiz Angelo Daros de Luca [email protected]

Copy link
Contributor

@brian-brazil brian-brazil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All codepaths here will need manual testing on both Linux and Windows.

prober/icmp.go Outdated Show resolved Hide resolved
prober/icmp.go Outdated Show resolved Hide resolved
prober/icmp.go Outdated Show resolved Hide resolved
prober/icmp.go Outdated Show resolved Hide resolved
prober/icmp.go Outdated Show resolved Hide resolved
@luizluca luizluca force-pushed the icmp_ttl branch 2 times, most recently from f5d07a1 to 28c5059 Compare September 25, 2020 19:15
Copy link
Contributor

@brian-brazil brian-brazil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks better, this will all need manual testing as we've been caught out before.

prober/icmp.go Outdated Show resolved Hide resolved
prober/icmp.go Outdated Show resolved Hide resolved
prober/icmp.go Outdated Show resolved Hide resolved
prober/icmp.go Show resolved Hide resolved
@luizluca
Copy link
Contributor Author

New push:

  • Renamed rawConn -> v4RawConn, ttlGauge -> hopLimitGauge
  • check if ControlMessage is null (possibly in Windows)

@brian-brazil
Copy link
Contributor

That push didn't seem to change anything, can you try again?

@luizluca luizluca force-pushed the icmp_ttl branch 3 times, most recently from 6e8cf46 to 70673e8 Compare September 29, 2020 21:39
prober/icmp.go Outdated Show resolved Hide resolved
prober/icmp.go Outdated Show resolved Hide resolved
Copy link
Contributor

@brian-brazil brian-brazil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it you've manually tested the various code paths?

prober/icmp.go Outdated Show resolved Hide resolved
@luizluca
Copy link
Contributor Author

luizluca commented Oct 1, 2020

I take it you've manually tested the various code paths

I tested both a working and not working case for IPv4 (raw and not), IPv6 and dns. I forced 'dontFragment' flag modifying the code as I could not pass it using URL variables. Do I need to test anything else?

The raw ping from windows failed as v4RawConn.WriteTo returns "write ip 0.0.0.0->8.8.8.8: sendmsg: not implemented on windows/amd64". That's the only code path that still calls the same functions as I only removed the v4conn wrapper.

It looks like it is expected:
☞ On Windows, the ReadFrom and WriteTo methods of RawConn are not implemented.
☞ On Windows, the ControlMessage for ReadFrom and WriteTo methods of PacketConn is not implemented.
https://godoc.org/golang.org/x/net/ipv4#pkg-note-bug

I guess RawConn does not have any use in Windows as both {Read/Write}(Batch,To,From} are not implemented yet.

BTW, the timeout for non-existing targets is quite long (duration_seconds=119.500175799, about 2min) for both protocols and both systems. At ts=2020-10-01T17:35:09.042Z deadline is 2020-10-01T14:37:08.541967114-03:00. It's quite a long time for an ICMP answer.

@luizluca
Copy link
Contributor Author

luizluca commented Oct 1, 2020

New push:

  • conditionally register the hopLimit prop
  • remove the RawConn msg

Copy link
Contributor

@brian-brazil brian-brazil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forced 'dontFragment' flag modifying the code as I could not pass it using URL variables.

It comes from the config file.

Do I need to test anything else?

The unprivileged sockets should also be tested.

BTW, the timeout for non-existing targets is quite long

It used to be much shorter, however that made it difficult to test slow targets from the browser.

prober/icmp.go Show resolved Hide resolved
@luizluca
Copy link
Contributor Author

luizluca commented Oct 2, 2020

New Push:

  • removed hasHopLimit. Now it uses hopLimit==-1 as hasHopLimit==false
  • hopLimit initialized as 0 to emphasizes its initial state (although it should be 0 by default)

prober/icmp.go Outdated
requestType icmp.Type
replyType icmp.Type
icmpConn *icmp.PacketConn
v4RawConn *ipv4.RawConn
hopLimit float64 = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The = 0 is redundant, however it'd be cleaner to set it to -1 here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be either -1 or 0. I just need to set it to -1 if SetControlMessage fails or 0 if it works. As I don't use the "else" when I check SetControlMessage , I opted to use 0 by default and set it to -1 inside the existing block.

What would be the better option?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A separate bool would be cleaner I think, rather than overloading this variable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was the first approach. Updated.

prober/icmp.go Outdated
if dstIPAddr.IP.To4() == nil {
var cm *ipv6.ControlMessage
n, cm, peer, err = icmpConn.IPv6PacketConn().ReadFrom(rb)
if cm != nil && hopLimit >= 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the check on hopLimit here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hopLimit==-1 would tell me if SetControlMessage failed. In this cade, TTL/HopLimit info might exist (when there is any OOB data) but it is trash. It would be much easier if hopLimit == 0 wasn't a valid state.

var n int
var peer net.Addr
var err error
var hopLimit float64 = -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still messy, as there's now two hasHopLimit indicators doing different things. It'd be better to name the bool what is actually, namely whether the CM was set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The real issue is that go does not tell me if hopLimit is missing or zero. I opened an issue golang/go#41820.

I renamed the bool to ipv6HopLimitFlagSet and now I use it only for IPv6. If golang fixes the issue, it can be removed in the future to adapt to whatever go uses to inform hopLimit is missing.

Another solution is to assume that hopLimit will never be 0 in a echo-reply. I assumed that hop limit could be zero when it is using the last permitted hop. However, I was wrong as a packet with hopLimit==1 will not get forwarded. The IPv6 echo reply will only be 0 if the target OS is explicitly setting the answer to be 0 (maybe a way to avoid its traffic to be routed?). As it is a very very special case, we might ignore it.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably best not to make assumptions about the TTL in a tool like this, as if that weird situation does happen you wouldn't want to lead users down the wrong path. It's probably best to use the flag for both protocols.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Let's play safe. I reintroduced the bool check for both.

@luizluca
Copy link
Contributor Author

luizluca commented Oct 6, 2020

  • Changed the debug message when TTL/HopLimit is missing
  • Using ipv6HopLimitFlagSet instead of hasHopLimit

prober/icmp.go Outdated Show resolved Hide resolved
IPv6 hop limit (IPv4 TTL) tells how many remaining hops the reply
datagram has. When the target uses a well known initial hop limit,
it can be used to count the number of hops from the target to the
prober and detect routing problems.

Signed-off-by: Luiz Angelo Daros de Luca <[email protected]>
@brian-brazil brian-brazil merged commit 953d80b into prometheus:master Oct 8, 2020
@brian-brazil
Copy link
Contributor

Thanks!

@luizluca
Copy link
Contributor Author

luizluca commented Oct 8, 2020

Thanks!

Thank YOU for Prometheus!

@luizluca luizluca deleted the icmp_ttl branch October 8, 2020 17:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants