Skip to content

Conversation

@dain
Copy link
Member

@dain dain commented Jul 2, 2025

Description

This PR adds two new node inventory systems: Announce and DNS. Announce is the new default and replaces Airlift Discovery. The system used is determined by the discovery.type property which currently can be announce, dns or airlift-discovery.

Announce inventory is a trivial announcement service for nodes. Nodes only announce their internal URI every 5 seconds, as opposed to a full discovery announcement. The announcement server simply keeps a list of known workers that times out after 30 seconds. The existing internal node manager code handles the rest, reaching out to the node to get state and any other critical information. The announcer client uses the same discovery.uri property name as before, so no configuration changes are needed. The new announcer supports multiple names so announcements can be sent to multiple coordinators.

Secondly, the PR adds a DNS based node manager. Instead of announcement, this relies on K8s style DNS infrastructure. In this setup it is expect that there are one or more hostnames that will provide all IP addresses for the workers in the cluster. It is assumed that all nodes are bound to the same port and mixed HTTP / HTTPS is not allowed (all nodes must be configured the same way). The hostnames are configured with the cluster.hosts property. DNS does not have an announcement system, so no configuration is needed on workers.

Release notes

(x) Release notes are required, with the following suggested text:

## Section
* Add Announce node inventory to replace Airlift Discovery.
* Add DNS node inventory for use in K8s like environments that provide a DNS name for all workers.
* Add `discovery.type` to set node inventory system. The value can be `announce`, `dns` or `airlift-discovery`.

@cla-bot cla-bot bot added the cla-signed label Jul 2, 2025
@dain dain requested a review from electrum July 2, 2025 06:30
@dain dain force-pushed the node-manager/announce branch from e609de6 to a2ae1d2 Compare July 2, 2025 18:43
@dain dain changed the title WIP Node manager/announce Node manager/announce Jul 2, 2025
@dain dain marked this pull request as ready for review July 2, 2025 18:44
@dain dain force-pushed the node-manager/announce branch from a2ae1d2 to 80cfdea Compare July 2, 2025 21:40
@dain dain changed the title Node manager/announce Add new simple announce node manager and DNS node manager Jul 2, 2025
{
private List<URI> coordinatorUris = List.of();

@NotNull
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be @NotEmpty? Is there a purpose to configure discovery.type=announce with an empty list?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coordinator does not need an announcement unless there are multiple coordinators.

private List<URI> coordinatorUris = List.of();

@NotNull
public List<@NotNull URI> getCoordinatorUris()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does using @NotNull here do anything? Can the config system create a list with null elements?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means you cannot have a null element in the list. It isn't required, but it is reasonable documentation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand what it does, but seems confusing because it should be impossible. We don't do that anywhere else, but I'm fine if you want to leave it.

checkArgument(port > 0 && port < 65536, "invalid port: %s", port);
this.hosts = requireNonNull(hosts, "hosts is null")
.stream()
.filter(not(Strings::isNullOrEmpty))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to get a null host from config?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, but I'm being careful with announcement configs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The real reason is that the docs for getAllByName say that null or empty result in the loopback address being returned and I don't want to deal with that

dain added 2 commits July 2, 2025 18:55
Nodes only announce their internal URI every 5 seconds, as opposed to a
full discovery announcement. The announcement server simply keeps a list
of known workers that times out after 30 seconds. The existing internal
node manager code handles the rest, reaching out to the node to get
state and any other critical information.

The announcer client uses the same discovery.uri property name as
before, and the new code supports multiple names if there are multiple
coordinators to announce to.

The system used is determined by the discovery.type property which
currently can be announce or airlift_discovery.
@dain dain force-pushed the node-manager/announce branch from 80cfdea to 96e4ebd Compare July 3, 2025 02:01
Comment on lines +130 to +147
Futures.addCallback(
responseFuture, new FutureCallback<>()
{
@Override
public void onSuccess(StatusResponse response)
{
int statusCode = response.getStatusCode();
if (statusCode < 200 || statusCode >= 300) {
logWarning("Failed to announce node state to %s: %s", announceUri, statusCode);
}
}

@Override
public void onFailure(Throwable t)
{
logWarning("Error announcing node state to %s: %s", announceUri, t.getMessage());
}
}, directExecutor());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, this could be

Futures.addCallback(addSuccessCallback(responseFuture, () -> {
    ...
});
Futures.addCallback(addExceptionCallback(responseFuture, t ->
    logWarning("Error announcing node state to %s: %s", announceUri, t.getMessage()));

@dain dain merged commit 365f1b7 into master Jul 3, 2025
104 checks passed
@dain dain deleted the node-manager/announce branch July 3, 2025 02:59
@github-actions github-actions bot added this to the 477 milestone Jul 3, 2025
@dain
Copy link
Member Author

dain commented Jul 3, 2025

I put the followup suggestions in #26125

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

3 participants