Conversation

@andythsu
Member

@andythsu andythsu commented Sep 3, 2025

Description

Expose CPU usage and memory of the Trino cluster on an endpoint.

Additional context and related issues

Resolves #26549

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(X) Release notes are required, with the following suggested text:

## Section
* Expose metrics for Trino Gateway on /v1/integrations/gateway/metrics (issue #26549)

double systemCpuLoad = 0.0;
if (OPERATING_SYSTEM_MX_BEAN instanceof com.sun.management.OperatingSystemMXBean) {
systemCpuLoad = ((com.sun.management.OperatingSystemMXBean) OPERATING_SYSTEM_MX_BEAN).getCpuLoad();
}
Member Author

@andythsu andythsu Sep 3, 2025

Looking for suggestions here. This could produce false metrics: if OPERATING_SYSTEM_MX_BEAN instanceof com.sun.management.OperatingSystemMXBean is always false, systemCpuLoad will always be 0.0, and trino-gateway will think there's no load on this cluster.

Member

We could log a warning

Suggested change
}
} else {
// Log a warning, or throw if this metric is critical
log.warn("Could not retrieve system CPU load: OperatingSystemMXBean is not an instance of com.sun.management.OperatingSystemMXBean");
}

Member

We could consider making this a JVM requirement we verify in TrinoSystemRequirements. @electrum what do you think?

Member

Alternatively, we could make this Optional everywhere, and then the client of this endpoint would need to decide what to do when the statistic is not available.

Member Author

I think it's better to make it a JVM requirement if possible. That's also how it's used here:

if (ManagementFactory.getOperatingSystemMXBean() instanceof OperatingSystemMXBean) {

Member

That seems fine to me. We already require com.sun.management.UnixOperatingSystemMXBean and Trino has other implicit requirements on OpenJDK.
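
For readers following this thread, the two options discussed above (verify the JVM capability up front, or report the statistic as optional) could look roughly like the sketch below. This is an illustration only and not code from the PR: CpuLoadProbe and both method names are hypothetical, and where such a check would actually live (for example TrinoSystemRequirements) is left open.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.OptionalDouble;

// Hypothetical helper, not part of the PR: contrasts the two options from the thread.
final class CpuLoadProbe
{
    private CpuLoadProbe() {}

    // Option 1: fail fast at startup if the JVM lacks the com.sun.management extension.
    static void verifyCpuLoadSupported(OperatingSystemMXBean bean)
    {
        if (!(bean instanceof com.sun.management.OperatingSystemMXBean)) {
            throw new IllegalStateException("JVM does not provide com.sun.management.OperatingSystemMXBean");
        }
    }

    // Option 2: report "unknown" explicitly instead of a misleading 0.0.
    static OptionalDouble systemCpuLoad(OperatingSystemMXBean bean)
    {
        if (bean instanceof com.sun.management.OperatingSystemMXBean) {
            double load = ((com.sun.management.OperatingSystemMXBean) bean).getCpuLoad();
            // getCpuLoad() returns a negative value when the load is not available
            if (!Double.isNaN(load) && load >= 0) {
                return OptionalDouble.of(load);
            }
        }
        return OptionalDouble.empty();
    }

    public static void main(String[] args)
    {
        System.out.println(systemCpuLoad(ManagementFactory.getOperatingSystemMXBean()));
    }
}
```

With the first option the endpoint never has to represent a missing value; with the second, the consumer of the endpoint has to decide how to treat an empty result.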

@andythsu andythsu force-pushed the gateway_metrics branch 2 times, most recently from b9a76ec to 11317f9 Compare September 4, 2025 13:27
@wendigo
Contributor

wendigo commented Sep 4, 2025

Why add another endpoint if the public /v1/status already contains CPU load and available memory?

@wendigo
Contributor

wendigo commented Sep 4, 2025

{
  "nodeId": "presto-master",
  "nodeVersion": {
    "version": "477-test-1-7-g515c8c0-dirty"
  },
  "environment": "test",
  "coordinator": true,
  "uptime": "8.73s",
  "externalAddress": "presto-master.docker.cluster",
  "internalAddress": "presto-master.docker.cluster",
  "memoryInfo": {
    "availableProcessors": 16,
    "pool": {
      "maxBytes": 1503238554,
      "reservedBytes": 0,
      "reservedRevocableBytes": 0,
      "queryMemoryReservations": {},
      "queryMemoryAllocations": {},
      "queryMemoryRevocableReservations": {},
      "taskMemoryReservations": {},
      "taskMemoryRevocableReservations": {},
      "freeBytes": 1503238554
    }
  },
  "processors": 16,
  "processCpuLoad": 0.000903614457831325,
  "systemCpuLoad": 0.142035287566504,
  "heapUsed": 244729616,
  "heapAvailable": 2147483648,
  "nonHeapUsed": 306181120
}

@andythsu
Member Author

andythsu commented Sep 4, 2025

@wendigo as far as I understand, /v1/status only reports status per node; it doesn't give metrics for the entire cluster. This PR aggregates the per-node metrics and returns them per cluster.

@wendigo
Contributor

wendigo commented Sep 4, 2025

@andythsu how does this PR address this need?

@andythsu
Member Author

andythsu commented Sep 4, 2025

@wendigo memory info for all nodes is available on the coordinator node, so we can easily sum up every node's free bytes:

        long totalFreeBytes = clusterMemoryManager.getAllNodesMemoryInfo()
                .values()
                .stream()
                .flatMap(Optional::stream)
                .map(MemoryInfo::getPool)
                .mapToLong(MemoryPoolInfo::getFreeBytes)
                .sum();
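
As a self-contained illustration of the stream above, the sketch below uses hypothetical stand-in records (PoolInfo and NodeMemory in place of Trino's MemoryPoolInfo and MemoryInfo) and assumes the per-node map holds Optional values, as the flatMap(Optional::stream) step suggests. Nodes with no memory info yet are simply skipped.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

final class FreeBytesSketch
{
    // Hypothetical stand-ins for Trino's MemoryPoolInfo and MemoryInfo
    record PoolInfo(long freeBytes) {}

    record NodeMemory(PoolInfo pool) {}

    static long totalFreeBytes(Map<String, Optional<NodeMemory>> nodes)
    {
        return nodes.values().stream()
                .flatMap(Optional::stream) // drops nodes that have no memory info yet
                .map(NodeMemory::pool)
                .mapToLong(PoolInfo::freeBytes)
                .sum();
    }

    public static void main(String[] args)
    {
        Map<String, Optional<NodeMemory>> nodes = new LinkedHashMap<>();
        nodes.put("worker-1", Optional.of(new NodeMemory(new PoolInfo(100))));
        nodes.put("worker-2", Optional.empty());
        nodes.put("worker-3", Optional.of(new NodeMemory(new PoolInfo(250))));
        System.out.println(totalFreeBytes(nodes)); // prints 350
    }
}
```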


@GET
@Path("metrics")
@ResourceSecurity(MANAGEMENT_READ)
Member Author

@andythsu andythsu Sep 11, 2025

I'm not sure if we want to use MANAGEMENT_READ because this endpoint will be hit by a dummy user from gateway. Most likely it'll be a role account. We may not want this role account to have management privileges.

Since this endpoint is for integrations, maybe we could require that user A has an <integration> privilege in order to call it. We could then reuse that access type for future integrations as well. I'm not sure whether any of the existing access types can achieve such a thing.

In my opinion the 'MANAGEMENT_READ' is totally fine :)

Member

We added management read for just this kind of thing. It allows the caller to read data that some may find sensitive, like the IPs of workers. Honestly, I would have made this stuff public, but people get concerned.



import static io.trino.server.security.ResourceSecurity.AccessType.MANAGEMENT_READ;

@Path("/v1/integrations/trinoGateway")

Maybe use kebab-case or snake_case for paths?

Contributor

Actually, we should use a path that is more "generic", as this endpoint could be used for other use cases as well. These kinds of endpoints become a de facto part of the protocol over time, so its design and shape should be considered carefully.

Member Author

Other endpoints in the Trino codebase use camelCase.

Member

@wendigo there was a discussion on Slack about making a special endpoint for the gateway, like we have for the UI. The implication is that changes to the endpoint would be decided by the gateway team and could be backwards incompatible if they chose. They likely won't do that, as they need compatibility, but it would be their choice.

Contributor

Let's call the endpoint just gateway.

@ResourceSecurity(MANAGEMENT_READ)
public ClusterMetrics clusterMetrics()
{
long totalFreeBytes = clusterMemoryManager.getAllNodesMemoryInfo()

You're calling clusterMemoryManager.getAllNodesMemoryInfo() twice, which adds unnecessary overhead and can produce inconsistent results if the state changes between the two calls.
Please cache the result in a variable.
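
A minimal sketch of the suggestion, assuming a simplified per-node view: NodeStats and the Supplier-based source are hypothetical stand-ins for the map returned by getAllNodesMemoryInfo(), not the PR's actual types. The node state is read once, and both aggregates are computed from that single snapshot.

```java
import java.util.List;
import java.util.function.Supplier;

final class SnapshotSketch
{
    // Hypothetical per-node view; the PR reads these values from the cluster memory manager
    record NodeStats(long freeBytes, double systemCpuLoad) {}

    record ClusterMetrics(long totalFreeBytes, double aggregatedSystemLoad) {}

    static ClusterMetrics clusterMetrics(Supplier<List<NodeStats>> source)
    {
        // Read the node state once so both aggregates describe the same snapshot
        List<NodeStats> snapshot = source.get();
        long totalFreeBytes = snapshot.stream().mapToLong(NodeStats::freeBytes).sum();
        double aggregatedSystemLoad = snapshot.stream().mapToDouble(NodeStats::systemCpuLoad).sum();
        return new ClusterMetrics(totalFreeBytes, aggregatedSystemLoad);
    }

    public static void main(String[] args)
    {
        List<NodeStats> nodes = List.of(new NodeStats(10, 0.25), new NodeStats(20, 0.5));
        System.out.println(clusterMetrics(() -> nodes));
    }
}
```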

.map(MemoryInfo::getPool)
.mapToLong(MemoryPoolInfo::getFreeBytes)
.sum();
double totalSystemLoad = clusterMemoryManager.getAllNodesMemoryInfo()

Suggested change
double totalSystemLoad = clusterMemoryManager.getAllNodesMemoryInfo()
double aggregatedSystemLoad = clusterMemoryManager.getAllNodesMemoryInfo()

.values()
.stream()
.flatMap(Optional::stream)
.map(MemoryInfo::getPool)

Can getPool() be null? If so, wrap it in a null check to avoid a potential NullPointerException.

Member Author

No, pool won't be null; the constructor already enforces that:

this.pool = requireNonNull(pool, "pool is null");

return new ClusterMetrics(totalFreeBytes, totalSystemLoad);
}

public record ClusterMetrics(long totalFreeBytes, double totalSystemLoad)

Perhaps add documentation to clarify what each field means.

Member Author

Sure

@GET
@Path("metrics")
@ResourceSecurity(MANAGEMENT_READ)
public ClusterMetrics clusterMetrics()

rename to getClusterMetrics (as it is actionable)

@andythsu andythsu force-pushed the gateway_metrics branch 2 times, most recently from e3e042f to c4121d7 Compare September 15, 2025 03:46
Member

@dain dain left a comment

This all seems fine to me, but @wendigo should sign off.

Comment on lines 67 to 68
public record ClusterMetrics(long totalFreeBytes, double aggregatedSystemLoad)
{}
Member

This can be on one line

@andythsu
Member Author

@wendigo Could you take another look at this PR?

boolean coordinator = buildConfigObject(ServerConfig.class).isCoordinator();
if (coordinator) {
jaxrsBinder(binder).bind(AnnounceNodeResource.class);

Contributor

undo

Member Author

Hmm, I'm not sure what changed here, because it looks fine locally.

newExporter(binder).export(ClusterMemoryManager.class).withGeneratedName();

// metrics used by Trino Gateway
jaxrsBinder(binder).bind(TrinoGatewayResource.class);
Contributor

Drop the comment. Rename the class to GatewayResource.

@Inject
public TrinoGatewayResource(ClusterMemoryManager clusterMemoryManager)
{
this.clusterMemoryManager = clusterMemoryManager;
Contributor

requireNonNull(clusterMemoryManager, "clusterMemoryManager is null")
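
Applied to the constructor, the suggestion would look roughly like the sketch below. GatewayResourceSketch and the MemoryManager placeholder are hypothetical names used only to keep the example self-contained; the real class injects ClusterMemoryManager.

```java
import static java.util.Objects.requireNonNull;

final class GatewayResourceSketch
{
    // Hypothetical placeholder for the injected ClusterMemoryManager
    interface MemoryManager {}

    private final MemoryManager clusterMemoryManager;

    GatewayResourceSketch(MemoryManager clusterMemoryManager)
    {
        // Fail at construction time with a clear message instead of a later, harder to trace NPE
        this.clusterMemoryManager = requireNonNull(clusterMemoryManager, "clusterMemoryManager is null");
    }

    MemoryManager memoryManager()
    {
        return clusterMemoryManager;
    }
}
```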

*
* @param totalFreeBytes the sum of free memory from each node in the cluster
* @param aggregatedSystemLoad the sum of system load from each node in the cluster
*/
Contributor

@wendigo wendigo Sep 25, 2025

A record definition doesn't return anything, so this should rather say "represents".

@andythsu andythsu force-pushed the gateway_metrics branch 2 times, most recently from 4530594 to 5853788 Compare September 30, 2025 14:45
@andythsu
Member Author

andythsu commented Oct 9, 2025

@wendigo this is ready for another review!

@wendigo
Contributor

wendigo commented Oct 16, 2025

@andythsu I applied some changes. Take a look, and if you are OK with them, I can merge this PR.

@andythsu
Member Author

@wendigo is this the only change you made?

    private static final Supplier<Double> SYSTEM_CPU_LOAD = Suppliers
            .memoizeWithExpiration(() -> clamp(((UnixOperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean()).getCpuLoad(), 0.0, 1.0), 5, TimeUnit.SECONDS);

if so, I'm happy with this change. TIL clamp is cool
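
For anyone unfamiliar with the pattern in that snippet, here is a standard-library-only sketch of what it does: cache a value for a fixed time and clamp it to the range 0.0 to 1.0. ExpiringSupplier and clampToUnit are hypothetical stand-ins written for illustration; the merged code uses Guava's Suppliers.memoizeWithExpiration and a clamp call, whose exact semantics may differ in detail (for example around thread safety). The injectable clock exists only to make the expiry testable.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Stdlib-only stand-in for a memoizing supplier with expiration, for illustration
final class ExpiringSupplier<T>
        implements Supplier<T>
{
    private final Supplier<T> delegate;
    private final long ttlNanos;
    private final LongSupplier nanoClock;
    private T value;
    private long expiresAt;
    private boolean loaded;

    ExpiringSupplier(Supplier<T> delegate, long duration, TimeUnit unit, LongSupplier nanoClock)
    {
        this.delegate = delegate;
        this.ttlNanos = unit.toNanos(duration);
        this.nanoClock = nanoClock;
    }

    @Override
    public synchronized T get()
    {
        long now = nanoClock.getAsLong();
        // Difference-based comparison stays correct even if nanosecond values wrap around
        if (!loaded || now - expiresAt >= 0) {
            value = delegate.get();
            expiresAt = now + ttlNanos;
            loaded = true;
        }
        return value;
    }

    // Limits finite inputs to [0.0, 1.0]; NaN passes through as NaN
    static double clampToUnit(double value)
    {
        return Math.max(0.0, Math.min(1.0, value));
    }
}
```

Caching for a few seconds keeps frequent polling of the endpoint from querying the operating system on every request, and clamping guards against out-of-range readings.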

@andythsu
Member Author

@wendigo thanks! huh weird I couldn't find that new commit. The diff LGTM!

@wendigo wendigo merged commit 1ca42b4 into trinodb:master Oct 16, 2025
192 of 193 checks passed
@github-actions github-actions bot added this to the 478 milestone Oct 16, 2025
@ebyhr
Member

ebyhr commented Oct 16, 2025

The PR title and description are stale - the actual endpoint is different. Please update it.

@wendigo wendigo changed the title Expose /v1/integrations/trinoGateway/metrics Expose /v1/integrations/gateway/metrics Oct 17, 2025
Development

Successfully merging this pull request may close these issues.

Expose metrics for Trino Gateway to consume

8 participants