Commit b159e46

Add support for retrying on a different cluster
1 parent ca0c7e5 commit b159e46

13 files changed: +953 -21 lines changed

presto-client/src/main/java/com/facebook/presto/client/PrestoHeaders.java

Lines changed: 1 addition & 0 deletions
@@ -42,6 +42,7 @@ public final class PrestoHeaders
     public static final String PRESTO_SESSION_FUNCTION = "X-Presto-Session-Function";
     public static final String PRESTO_ADDED_SESSION_FUNCTION = "X-Presto-Added-Session-Functions";
     public static final String PRESTO_REMOVED_SESSION_FUNCTION = "X-Presto-Removed-Session-Function";
+    public static final String PRESTO_RETRY_QUERY = "X-Presto-Retry-Query";
 
     public static final String PRESTO_CURRENT_STATE = "X-Presto-Current-State";
     public static final String PRESTO_MAX_WAIT = "X-Presto-Max-Wait";

presto-docs/src/main/sphinx/admin/properties.rst

Lines changed: 45 additions & 1 deletion
@@ -1156,4 +1156,48 @@ The corresponding session property is :ref:`admin/properties-session:\`\`query_c
 
 Use to configure how long a query can be queued before it is terminated.
 
-The corresponding session property is :ref:`admin/properties-session:\`\`query_max_queued_time\`\``.
+The corresponding session property is :ref:`admin/properties-session:\`\`query_max_queued_time\`\``.
+
+Query Retry Properties
+----------------------
+
+``retry.enabled``
+^^^^^^^^^^^^^^^^^
+
+* **Type:** ``boolean``
+* **Default value:** ``true``
+
+Enable cross-cluster retry functionality. When enabled, queries that fail with
+specific error codes can be automatically retried on a backup cluster if a
+retry URL is provided.
+
+``retry.allowed-domains``
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+* **Type:** ``string``
+* **Default value:** (empty, signifying current second-level domain allowed only)
+
+Comma-separated list of allowed domains for retry URLs. Supports wildcards
+like ``*.example.com``. For example: ``cluster1.example.com,*.backup.example.net``.
+When empty (default), only retry URLs from the same domain as the current server
+are allowed.
+
+``retry.require-https``
+^^^^^^^^^^^^^^^^^^^^^^^
+
+* **Type:** ``boolean``
+* **Default value:** ``false``
+
+Require HTTPS for retry URLs. When enabled, only HTTPS URLs will be accepted
+for cross-cluster retry operations.
+
+``retry.cross-cluster-error-codes``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+* **Type:** ``string``
+* **Default value:** ``REMOTE_TASK_ERROR``
+
+Comma-separated list of error codes that allow cross-cluster retry. When a query
+fails with one of these error codes, it can be automatically retried on a backup
+cluster if a retry URL is provided. Available error codes include standard Presto
+error codes such as ``REMOTE_TASK_ERROR``, ``CLUSTER_OUT_OF_MEMORY``, etc.
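
Reviewer note: the interaction between ``retry.allowed-domains`` and ``retry.require-https`` documented above can be illustrated with a small Java sketch. This is not the `RetryUrlValidator` bound in this commit (its source is not part of this excerpt); the class name `RetryUrlCheckSketch`, the method `isAllowed`, and the exact wildcard semantics (`*.example.com` matching any host that ends in `.example.com`) are assumptions made for illustration. The documented "empty list means same domain as the current server" behaviour is not modelled here.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not the RetryUrlValidator from this commit.
public final class RetryUrlCheckSketch
{
    private RetryUrlCheckSketch() {}

    // allowedDomains is expected to hold lower-case entries such as
    // "cluster1.example.com" or "*.backup.example.net".
    public static boolean isAllowed(String retryUrl, Set<String> allowedDomains, boolean requireHttps)
    {
        URI uri;
        try {
            uri = URI.create(retryUrl);
        }
        catch (IllegalArgumentException e) {
            // Not a syntactically valid URI
            return false;
        }
        String scheme = uri.getScheme();
        String host = uri.getHost();
        if (scheme == null || host == null) {
            return false;
        }
        if (requireHttps && !scheme.equalsIgnoreCase("https")) {
            return false;
        }
        String lowerHost = host.toLowerCase(Locale.ROOT);
        for (String pattern : allowedDomains) {
            if (pattern.startsWith("*.")) {
                // Assumption: "*.example.com" matches hosts ending in ".example.com"
                if (lowerHost.endsWith(pattern.substring(1))) {
                    return true;
                }
            }
            else if (lowerHost.equals(pattern)) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions, a wildcard entry does not match the bare parent domain itself, so `*.backup.example.net` would admit `a.backup.example.net` but not `backup.example.net`. Whether the real validator behaves the same way should be checked against its source.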

presto-docs/src/main/sphinx/develop/client-protocol.rst

Lines changed: 69 additions & 0 deletions
@@ -122,6 +122,9 @@ Request Header Name Description
 ``X-Presto-Extra-Credential``           Provides extra credentials to the connector. The header is a name=value string that
                                         is saved in the session ``Identity`` object. The name and value are only
                                         meaningful to the connector.
+``X-Presto-Retry-Query``                Boolean flag indicating that this query is a placeholder for potential retry.
+                                        When set to ``true``, marks the query on the backup cluster as a retry placeholder
+                                        and prevents retry chains in cross-cluster retry scenarios.
 ====================================== =========================================================================================
 
 
@@ -184,3 +187,69 @@ Data Member Type Notes
 =================
 
 Class ``PrestoHeaders`` enumerates all the HTTP request and response headers allowed by the Presto client REST API.
+
+
+Cross-Cluster Query Retry
+=========================
+
+Presto supports automatic query retry on a backup cluster when a query fails on the primary cluster. This feature enables
+high availability by transparently redirecting failed queries to a backup cluster.
+
+The cross-cluster retry mechanism works as follows:
+
+Query Parameters
+----------------
+
+When a router or load balancer handles a query that should support cross-cluster retry, it includes the following
+query parameters when redirecting the client to the primary cluster:
+
+* ``retryUrl`` - The URL-encoded endpoint of the backup cluster where the query can be retried if it fails
+* ``retryExpirationInSeconds`` - The number of seconds until the retry URL expires (must be at least 1). This value
+  should be set based on the ``Cache-Control`` headers returned by Presto query endpoints. Presto uses ``Cache-Control``
+  headers to indicate how long a query will be retained in the server's memory. The retry expiration should not exceed
+  this cache duration to ensure the placeholder query is still available when the retry occurs.
+
+Both parameters must be provided together. If only one is provided, the request will be rejected with a 400 Bad Request error.
+
+Example request to primary cluster::
+
+    POST /v1/statement?retryUrl=https%3A%2F%2Fbackup.example.com%3A8080%2Fv1%2Fstatement&retryExpirationInSeconds=300
+
+Retry Header
+------------
+
+The ``X-Presto-Retry-Query`` header is used to indicate that a query is being created as a placeholder for potential
+retry. When set to ``true``, this header:
+
+* Indicates the query is a retry placeholder on the backup cluster
+* Prevents retry chains - a query marked with this header will not trigger another retry if it fails
+
+Retry Flow
+----------
+
+1. Router/load balancer POSTs the query to the backup cluster with ``X-Presto-Retry-Query: true`` header to create
+   a placeholder query that can be used as a retry destination
+2. Router redirects (HTTP 307) the client to the primary cluster with ``retryUrl`` and ``retryExpirationInSeconds``
+   query parameters
+3. Client follows the redirect and POSTs the query to the primary cluster
+4. Primary cluster executes the query normally
+5. If the query fails with a retriable error code (configured on the server), the Presto server modifies the
+   ``nextUri`` in the response to point to the retry URL of the backup cluster
+6. Client follows the ``nextUri`` to the backup cluster where the placeholder query executes the actual query
+7. If the retry query fails, it will not trigger another retry since it's marked with ``X-Presto-Retry-Query``
+
+Limitations
+-----------
+
+Cross-cluster retry has the following limitations:
+
+* **Query types**: Retry only works when no results have been sent back to the client. In practice, this feature
+  works well for:
+
+  - ``CREATE TABLE AS SELECT`` statements
+  - DDL operations (``CREATE``, ``ALTER``, ``DROP``, etc.)
+  - ``INSERT`` statements
+  - ``SELECT`` queries that fail before any results are produced
+
+  For ``SELECT`` queries that produce results, retry will only occur if the failure happens during planning or
+  before the first batch of results is generated.
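
Reviewer note: step 2 of the documented retry flow has the router redirect the client to the primary cluster with ``retryUrl`` and ``retryExpirationInSeconds`` attached. The following Java sketch shows one way a router might assemble that URI so that it matches the example request in the documentation. The class `RetryRequestUriSketch`, the method `primaryStatementUri`, and the host `primary.example.com` are hypothetical names introduced here; the router code itself is not part of this excerpt.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical router-side helper; names are illustrative, not from the commit.
public final class RetryRequestUriSketch
{
    private RetryRequestUriSketch() {}

    public static String primaryStatementUri(String primaryBaseUrl, String backupStatementUrl, long retryExpirationInSeconds)
    {
        // The documentation states the expiration must be at least 1 second
        if (retryExpirationInSeconds < 1) {
            throw new IllegalArgumentException("retryExpirationInSeconds must be at least 1");
        }
        // The backup endpoint travels as a query parameter, so it must be URL-encoded
        String encodedRetryUrl = URLEncoder.encode(backupStatementUrl, StandardCharsets.UTF_8);
        return primaryBaseUrl + "/v1/statement"
                + "?retryUrl=" + encodedRetryUrl
                + "&retryExpirationInSeconds=" + retryExpirationInSeconds;
    }
}
```

Both parameters are always emitted together here, which sidesteps the documented 400 Bad Request for a request carrying only one of them. `URLEncoder.encode(String, Charset)` requires Java 10 or later.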
Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,116 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.facebook.presto.server;
+
+import com.facebook.airlift.configuration.Config;
+import com.facebook.airlift.configuration.ConfigDescription;
+import com.facebook.presto.common.ErrorCode;
+import com.facebook.presto.spi.StandardErrorCode;
+import com.google.common.base.Splitter;
+import com.google.common.collect.ImmutableSet;
+import jakarta.validation.constraints.NotNull;
+
+import java.util.Set;
+
+import static com.facebook.presto.spi.StandardErrorCode.REMOTE_TASK_ERROR;
+import static com.google.common.collect.ImmutableSet.toImmutableSet;
+
+public class RetryConfig
+{
+    private boolean retryEnabled = true;
+    private Set<String> allowedRetryDomains = ImmutableSet.of();
+    private boolean requireHttps;
+    private Set<Integer> crossClusterRetryErrorCodes = ImmutableSet.of(
+            REMOTE_TASK_ERROR.toErrorCode().getCode());
+
+    public boolean isRetryEnabled()
+    {
+        return retryEnabled;
+    }
+
+    @Config("retry.enabled")
+    @ConfigDescription("Enable cross-cluster retry functionality")
+    public RetryConfig setRetryEnabled(boolean retryEnabled)
+    {
+        this.retryEnabled = retryEnabled;
+        return this;
+    }
+
+    @NotNull
+    public Set<String> getAllowedRetryDomains()
+    {
+        return allowedRetryDomains;
+    }
+
+    @Config("retry.allowed-domains")
+    @ConfigDescription("Comma-separated list of allowed domains for retry URLs " +
+            "(supports wildcards like *.example.com)")
+    public RetryConfig setAllowedRetryDomains(String domains)
+    {
+        if (domains == null || domains.trim().isEmpty()) {
+            this.allowedRetryDomains = ImmutableSet.of();
+        }
+        else {
+            this.allowedRetryDomains = Splitter.on(',')
+                    .trimResults()
+                    .omitEmptyStrings()
+                    .splitToList(domains)
+                    .stream()
+                    .map(String::toLowerCase)
+                    .collect(toImmutableSet());
+        }
+        return this;
+    }
+
+    public boolean isRequireHttps()
+    {
+        return requireHttps;
+    }
+
+    @Config("retry.require-https")
+    @ConfigDescription("Require HTTPS for retry URLs")
+    public RetryConfig setRequireHttps(boolean requireHttps)
+    {
+        this.requireHttps = requireHttps;
+        return this;
+    }
+
+    @NotNull
+    public Set<Integer> getCrossClusterRetryErrorCodes()
+    {
+        return crossClusterRetryErrorCodes;
+    }
+
+    @Config("retry.cross-cluster-error-codes")
+    @ConfigDescription("Comma-separated list of error codes that allow cross-cluster retry")
+    public RetryConfig setCrossClusterRetryErrorCodes(String errorCodes)
+    {
+        if (errorCodes == null || errorCodes.trim().isEmpty()) {
+            // Keep the default error codes
+            return this;
+        }
+        else {
+            this.crossClusterRetryErrorCodes = Splitter.on(',')
+                    .trimResults()
+                    .omitEmptyStrings()
+                    .splitToList(errorCodes)
+                    .stream()
+                    .map(StandardErrorCode::valueOf)
+                    .map(StandardErrorCode::toErrorCode)
+                    .map(ErrorCode::getCode)
+                    .collect(toImmutableSet());
+        }
+        return this;
+    }
+}
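
Reviewer note: the normalisation performed by `setAllowedRetryDomains` (split on commas, trim, drop empty entries, lower-case) can be reproduced without Guava for quick experimentation. The sketch below is a JDK-only approximation, not the committed code: `AllowedDomainsParseSketch` and `parse` are hypothetical names, `String.trim()` stands in for Guava's `trimResults()`, and `Locale.ROOT` is used for lower-casing where the commit calls `String::toLowerCase` with the default locale.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// JDK-only approximation of the parsing in RetryConfig.setAllowedRetryDomains.
public final class AllowedDomainsParseSketch
{
    private AllowedDomainsParseSketch() {}

    public static Set<String> parse(String domains)
    {
        // Null or blank input yields an empty set, as in the committed setter
        if (domains == null || domains.trim().isEmpty()) {
            return Set.of();
        }
        return Arrays.stream(domains.split(","))
                .map(String::trim)
                .filter(entry -> !entry.isEmpty())
                .map(entry -> entry.toLowerCase(Locale.ROOT))
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }
}
```

For instance, the input `" Cluster1.Example.com, ,*.Backup.example.NET "` yields the two entries `cluster1.example.com` and `*.backup.example.net`. Because entries are stored lower-cased, any matching code presumably needs to lower-case the candidate host before comparing.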
Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.facebook.presto.server;
+
+import com.facebook.airlift.configuration.testing.ConfigAssertions;
+import com.google.common.collect.ImmutableMap;
+import org.testng.annotations.Test;
+
+import java.util.Map;
+
+import static com.facebook.airlift.configuration.testing.ConfigAssertions.assertFullMapping;
+import static com.facebook.airlift.configuration.testing.ConfigAssertions.assertRecordedDefaults;
+
+public class TestRetryConfig
+{
+    @Test
+    public void testDefaults()
+    {
+        assertRecordedDefaults(ConfigAssertions.recordDefaults(RetryConfig.class)
+                .setRetryEnabled(true)
+                .setRequireHttps(false)
+                .setAllowedRetryDomains(null)
+                .setCrossClusterRetryErrorCodes("REMOTE_TASK_ERROR"));
+    }
+
+    @Test
+    public void testExplicitPropertyMappings()
+    {
+        Map<String, String> properties = new ImmutableMap.Builder<String, String>()
+                .put("retry.enabled", "false")
+                .put("retry.allowed-domains", "*.foo.bar,*.baz.qux")
+                .put("retry.require-https", "true")
+                .put("retry.cross-cluster-error-codes", "QUERY_QUEUE_FULL")
+                .build();
+
+        RetryConfig expected = new RetryConfig()
+                .setRetryEnabled(false)
+                .setRequireHttps(true)
+                .setAllowedRetryDomains("*.foo.bar,*.baz.qux")
+                .setCrossClusterRetryErrorCodes("QUERY_QUEUE_FULL");
+
+        assertFullMapping(properties, expected);
+    }
+}

presto-main/src/main/java/com/facebook/presto/server/CoordinatorModule.java

Lines changed: 4 additions & 0 deletions
@@ -210,6 +210,10 @@ protected void setup(Binder binder)
         binder.bind(QueryBlockingRateLimiter.class).in(Scopes.SINGLETON);
         newExporter(binder).export(QueryBlockingRateLimiter.class).withGeneratedName();
 
+        // retry configuration
+        configBinder(binder).bindConfig(RetryConfig.class);
+        binder.bind(RetryUrlValidator.class).in(Scopes.SINGLETON);
+
         binder.bind(LocalQueryProvider.class).in(Scopes.SINGLETON);
         binder.bind(ExecutingQueryResponseProvider.class).to(LocalExecutingQueryResponseProvider.class).in(Scopes.SINGLETON);
 