Skip to content

Implement Strict Routing Semantics for Routing Rules#804

Open
Peiyingy wants to merge 8 commits intotrinodb:mainfrom
Peiyingy:Peiyingy/enforceIsolation
Open

Implement Strict Routing Semantics for Routing Rules#804
Peiyingy wants to merge 8 commits intotrinodb:mainfrom
Peiyingy:Peiyingy/enforceIsolation

Conversation

@Peiyingy
Copy link
Copy Markdown
Member

@Peiyingy Peiyingy commented Dec 4, 2025

Description

This PR adds strictRouting flag for routing as proposed in #797.
Routing rules can now support strictRouting: True to force pinning the query to the routing group

When strictRouting = false (default), the routing behavior remains unchanged.
When strictRouting = true, the gateway would not fall back to default clusters if the pinned backend becomes unavailable. Instead, the query should fail immediately with a 404 error.

Testing

  • mvn clean install
  • Added new test cases in TestStochasticRoutingManager
  • Local test to verify the strictRouting flag works
trino> select 1;
Error starting query at http://localhost:8080/v1/statement returned an invalid response: JsonResponse{statusCode=404, headers={content-length=[96], content-type=[text/plain], date=[Thu, 04 Dec 2025 03:45:04 GMT], vary=[Accept-Encoding]}, hasValue=false} [Error: No healthy backends available for routing group 'adhoc1' under strict routing for user 'pye']

clean up code

wip

add tests
@cla-bot cla-bot Bot added the cla-signed label Dec 4, 2025
@Peiyingy Peiyingy marked this pull request as ready for review December 4, 2025 03:57
@RoeyoOgen
Copy link
Copy Markdown
Contributor

Just a few remarks:

  1. why "enforceIsolation" and not strictRouting? or noFallback?
  2. does this differ if the routing group is a single cluster rather than a group? - does this clash with your other PR #796?

@xkrogen
Copy link
Copy Markdown
Member

xkrogen commented Dec 4, 2025

why "enforceIsolation" and not strictRouting? or noFallback?

+1 strictRouting is a good name

@Peiyingy
Copy link
Copy Markdown
Member Author

Peiyingy commented Dec 4, 2025

Just a few remarks:

  1. why "enforceIsolation" and not strictRouting? or noFallback?
  2. does this differ if the routing group is a single cluster rather than a group? - does this clash with your other PR #796?
  1. Thanks for the suggesting. I'll change the name to strictRouting
  2. This would also support routing cluster rules. After either of the PR is merged, we just need to rebase the other PR to make sure both strictRouting and cluster routing are supported.

@Peiyingy Peiyingy changed the title Implement Stronger Isolation Semantics for Routing Rules Implement Strict Routing Semantics for Routing Rules Dec 4, 2025
Comment thread gateway-ha/src/main/java/io/trino/gateway/ha/router/BaseRoutingManager.java Outdated
.filter(backEnd -> isBackendHealthy(backEnd.getName()))
.toList();
if (strictRouting && backends.isEmpty()) {
throw new WebApplicationException(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why WebApplicationException and not IllegalStateException like in IllegalStateException("Number of active backends found zero"))

or maybe a custom Explicit Exception inherited from RouterException?

Copy link
Copy Markdown
Contributor

@felicity3786 felicity3786 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also update the official doc?

.filter(backEnd -> isBackendHealthy(backEnd.getName()))
.toList();
if (strictRouting && backends.isEmpty()) {
throw new WebApplicationException(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add a WARN or ERROR with context when the 404 is triggered? Also +1 on maybe making it a NoBackendAvailableException?

Copy link
Copy Markdown
Member

@oneonestar oneonestar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ambiguous what you mean strictRouting.

There are two fallback logics in routing.

  1. If RoutingGroupSelector failed to select a routing group for a query, fallback to defaultRoutingGroup.

    String routingGroup = !isNullOrEmpty(routingDestination.routingGroup())
    ? routingDestination.routingGroup()
    : defaultRoutingGroup;

  2. If there are no available active healthy cluster for the selected routing group, fallback to ActiveDefaultBackends (ie. active healthy cluster in defaultRoutingGroup).

    return selectBackend(backends, user).orElseGet(() -> provideDefaultBackendConfiguration(user));

It's not clear which fallback logic (or both) are disabled. If you decided to disable both (currently you only handled 2), please make sure the error messages are clearly distinguishable for these cases. Docs are also required to explain the exact affect of this config.

.toList();
if (strictRouting && backends.isEmpty()) {
throw new WebApplicationException(
Response.status(NOT_FOUND)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Client did nothing wrong and shouldn't receive a 4xx error. We should return a 5XX indicate a server side error, maybe 503 Service Unavailable.

@github-actions
Copy link
Copy Markdown

This pull request has gone a while without any activity. Ask for help on #trino-gateway-dev on Trino slack.

@github-actions github-actions Bot added the stale label Jan 27, 2026
@github-actions
Copy link
Copy Markdown

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

@github-actions github-actions Bot closed this Feb 17, 2026
@Peiyingy Peiyingy reopened this Feb 18, 2026
@Peiyingy Peiyingy removed the stale label Feb 19, 2026
@mosabua mosabua added the stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed. label Feb 19, 2026
Copy link
Copy Markdown
Member

@oneonestar oneonestar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t like setting the default value to false when it’s null in multiple places, but it might require heavy refactoring to remove that pattern. I’m okay with it for now, and we can improve it later in another PR.

? routingDestination.routingGroup()
: defaultRoutingGroup;
ProxyBackendConfiguration backendConfiguration = routingManager.provideBackendConfiguration(routingGroup, user);
boolean strictRouting = Optional.ofNullable(routingDestination.strictRouting()).orElse(false);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
boolean strictRouting = Optional.ofNullable(routingDestination.strictRouting()).orElse(false);
boolean strictRouting = requireNonNullElse(routingDestination.strictRouting(), false);

* Response from the external routing service that includes:
* - routingGroup: The target routing group for the request (optional)
* - errors: Any errors that occurred during routing
* - externalHeaders: Headers that can be set in the request
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update comment.

* @param priority priority of the routing rule. Higher number represents higher priority. If two rules have same priority then order of execution is not guaranteed.
* @param actions actions of the routing rule
* @param condition condition of the routing rule
*/
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update comment.

Comment on lines +94 to +96
// When no cluster is found:
// - If strictRouting is false, fall back to the default routing group backend.
// - If strictRouting is true, return a 404 response.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the comment to provideBackendConfiguration().
Also, we return 503 now, not 404.

Comment on lines +76 to +84
boolean strictRouting = false;
for (RoutingRule rule : requireNonNull(rules.get())) {
if (rule.evaluateCondition(data, state)) {
log.debug("%s evaluated to true on request: %s", rule, request);
rule.evaluateAction(result, data, state);
strictRouting = rule.isStrictRouting();
}
});
return new RoutingSelectorResponse(result.get(RESULTS_ROUTING_GROUP_KEY));
}
return new RoutingSelectorResponse(result.get(RESULTS_ROUTING_GROUP_KEY), ImmutableMap.of(), strictRouting);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is flawed.
The value of strictRouting is always equal to the last evaluated rule, but the last evaluated rule may not be the one that sets the final routingGroup (e.g., it may only update the state without setting routingGroup).

@ebyhr ebyhr removed the stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed. label Mar 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

8 participants