Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set keepalives on rpcap's control socket #773

Closed
wants to merge 4 commits into from
Closed

Set keepalives on rpcap's control socket #773

wants to merge 4 commits into from

Conversation

kevinboulain
Copy link
Contributor

Would you be interested in the following extension of libpcap's API? It builds up on #772 and closes #766 (but doesn't hard-code anything in libpcap and leave it to the users, which seems to be a much more sensible approach).

Since #772 now verifies the integrity of the TCP control connection, optionally allowing keep-alives on it will allow to more efficiently react on a broken network connectivity (the data being transported over UDP or TCP).
If you prefer this commit to follow #772, feel free to close this pull request (but #772 is useful on its own for UDP-based data transport when the network is regarded as perfect).

On Linux, it seems the first keep-alive is sent after 2 hours of idling and a broken connection will be detected 20 minutes later (from man 7 tcp), if and only if SO_KEEPALIVE is set, so a libpcap application retrieving packets from a remote rpcapd may be stuck for quite a long while.

The usage is quite simple: pcap_set_control_keepalive (happy to get suggestions about a better name) needs to be called after a successful pcap_open with some values (for example, something more aggressive but still reasonable would be pcap_set_control_keepalive(handle, 1, 5, 15, 5) which would wait 15 seconds of idle time before sending a keep-alive, dropping the connection after 5 missed keep-alives each one spaced by 5 seconds).

The keep-alives settings probably only works on Linux.

If the idea is validated, I'll push another commit for the corresponding man pages.

@infrastation
Copy link
Member

Regardless of the other points, TCP keep-alives date back to at least 1989, and Linux isn't the only OS that implements them in present-day Internet.

@kevinboulain
Copy link
Contributor Author

Yup, I only have a Linux for now, but I guess I could probably check the API for *BSD and Windows.

I was more interested in getting your feedback on the basic idea than to implement something exhaustive if I must rewrite most of it because it doesn't fit libpcap's API.

@mcr mcr added this to the release-after-next milestone Apr 25, 2019
@mcr mcr added the rpcap label Apr 26, 2019
@infrastation
Copy link
Member

How do you think, where would a custom keep-alive interval be used: at the client side or the server side or both?

@kevinboulain
Copy link
Contributor Author

I'm not part of the project anymore, but @fjolliton might want to hear about that.

@infrastation
Copy link
Member

Rebased on the current master branch. Any objections to merging this change as it is?

@infrastation
Copy link
Member

As it turns out, this code needs some amendments to build on OSes other than Linux.

@guyharris
Copy link
Member

We no longer have pcap_snprintf() in the trunk; just use snprintf(), as we only support building on platforms (OS+compiler) that support it.

SOL_TCP is a Linuxism; NetBSD and FreeBSD don't define it, although they
define TCP_KEEPCNT, TCP_KEEPIDLE and TCP_KEEPINTVL.  The matter is, even
on Linux the man page says:

"For example, to indicate that an option is to be interpreted by the TCP
protocol, level should be set to the protocol number of TCP; see
getprotoent(3)."
@infrastation
Copy link
Member

C:\projects\libpcap\pcap-rpcap.c(3523): warning C4133: 'function': incompatible types - from 'int *' to 'const char *'

That seems to mean that setsockopt() signature is different on Windows.

@infrastation
Copy link
Member

Except Windows, this change now looks much better CI-wise. When the platform does not support setting keepalive, would it be more appropriate to have the error compile-time instead of run-time?

@guyharris
Copy link
Member

guyharris commented Jan 21, 2022

APIs that affect contents of the private data pointed to by the priv member of a pcap_t should not be called directly - they should be called through a pointer in the pcap_t, as their behavior is module-dependent.

You'll have to add a new set_control_keepalive_op to the "More methods." section of the pcap_t structure in pcap-int.h.

Then add a new PCAP_WARNING_CONTROL_KEEPALIVE_NOTSUP error code to the list of warning codes in pcap/pcap.h.

Then, add a pcap_set_control_keepalive_not_initialized() routine to pcap.c that looks something like

static int
pcap_set_control_keepalive_not_initialized(pcap_t *p, int enable _U_, int keepcnt _U, int keepidle _U_, int keepintvl _U_)
{
	if (p->activated) {
		/* Not set by the module, so probably not supported */
		return PCAP_WARNING_CONTROL_KEEPALIVE_NOTSUP;
	}
	/* in case the caller doesn't check for PCAP_ERROR_NOT_ACTIVATED */
	(void)snprintf(pcap->errbuf, sizeof(pcap->errbuf),
	    "This handle hasn't been activated yet");
}

and set set_control_keepalive_op to pcap_set_control_keepalive_not_initialized in initialize_ops() in pcap.c.

Then have add a int pcap_set_control_keepalive() to pcap.c that looks something like

int
pcap_set_control_keepalive(pcap_t *p, int enable, int keepcnt, int keepidle, int keepintvl)
{
	 return (p-> set_control_keepalive_op(p, enable, keepcnt, keepidle, keepintvl);
}

Then rename pcap_set_control_keepalive() in pcap-rpcap.c to pcap_set_control_keepalive_rpcap() and, in pcap_open_rpcap(), set fp-> set_control_keepalive_op to pcap_set_control_keepalive_rpcap.

Finally, add a message string for PCAP_WARNING_CONTROL_KEEPALIVE_NOTSUP to pcap_statustostr().

That way, if somebody calls pcap_set_control_keepalive() on a pcap_t that doesn't refer to an rpcap session, it'll return PCAP_WARNING_CONTROL_KEEPALIVE_NOTSUP, which the application can report or can just ignore.

(Also, make pcap_set_control_keepalive_rpcap() return PCAP_ERROR, which is #defined to -1, rather than -1.)

@infrastation
Copy link
Member

Thank you for the detailed comments, clearly this change is far from being ready for merging. If nobody volunteers to amend it as described, I might try doing that later.

GabrielGanne added a commit to GabrielGanne/libpcap that referenced this pull request Feb 16, 2022
Closes: the-tcpdump-group#773

Signed-off-by: Kevin Boulain <[email protected]>
Signed-off-by: Gabriel Ganne <[email protected]>
@infrastation
Copy link
Member

@ether42, do you prefer to complete this change yourself or to let @GabrielGanne do it?

@kevinboulain
Copy link
Contributor Author

@ether42, do you prefer to complete this change yourself or to let @GabrielGanne do it?

As explained in #773 (comment) I'm not working on this project anymore so anyone can take over.

@infrastation
Copy link
Member

Thank you for confirming, closing this pull request then. Please excuse us for not being able to work on it earlier. @GabrielGanne, please address Guy Harris feedback above in your version of these changes.

GabrielGanne added a commit to GabrielGanne/libpcap that referenced this pull request Sep 20, 2022
Closes: the-tcpdump-group#773

Signed-off-by: Kevin Boulain <[email protected]>
Signed-off-by: Gabriel Ganne <[email protected]>
@GabrielGanne GabrielGanne mentioned this pull request Sep 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

rpcap & rpcapd: lost link isn't detected and may get stuck.
4 participants