Skip to content
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions api/iceberg-service/build.gradle.kts
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,9 @@ dependencies {
implementation("com.fasterxml.jackson.core:jackson-annotations")
implementation("com.fasterxml.jackson.core:jackson-core")
implementation("com.fasterxml.jackson.core:jackson-databind")

compileOnly(platform(libs.quarkus.bom))
Copy link
Contributor

@dimas-b dimas-b Apr 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we do not need the Quarkus BOM here. Could we add a specific local version for org.eclipse.microprofile.fault-tolerance:microprofile-fault-tolerance-api to the TOML file? Since the dep is compile-only, version mismatches should not be a problem. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this was my main concern initially, but I did not property research and articulate it 🤦

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a version for microprofile-fault-tolerance-api to the toml file, which is the same version brought in by the current version of Quarkus.

compileOnly("io.quarkus:quarkus-smallrye-fault-tolerance")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one doubt I have about this PR is whether we can safely add these dependencies here -- or what the alternative would be if that's not recommended.

cc @snazy / @dimas-b / @singhpk234

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... Why do we need this dep? @RichardLiu2001 ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify: Even if we go with server-side timeouts, I believe this control should be applied to implementations, not to API definitions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add more detail here @dimas-b? Within dropwizard, we used a filter for this because that seemed like the best narrow waist for a top-level timeout like this. How can we cover all of the APIs if we push this into implementations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this dep?

This dep is so that IcebergRestCatalogApi can compile.

believe this control should be applied to implementations, not to API definitions

Why? Is it to avoid adding a Quarkus dependency in the API? If so, what is wrong with that?
We also have @Timed at the API level. Why is that acceptable but @Timeout isn't?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Timed is non-intrusive, hence less of a concern, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compile-only deps should not leak to POM, I guess.

In the end, since API is generated and implementations are meant to be used together with the generated API, I guess it does not make a lot of different from the maintenance POV where the annotation is places.

Sorry about the false alarm.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to not import Quarkus but rather only the dependency that defines the annotation: org.eclipse.microprofile.fault-tolerance:microprofile-fault-tolerance-api

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, cf. #1431 (comment)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced the dependency to use just microprofile-fault-tolerance-api rather than bringing in the Quarkus.

}

openApiGenerate {
Expand Down
3 changes: 3 additions & 0 deletions api/management-service/build.gradle.kts
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,9 @@ dependencies {
compileOnly("io.micrometer:micrometer-core")

implementation(libs.slf4j.api)

compileOnly(platform(libs.quarkus.bom))
compileOnly("io.quarkus:quarkus-smallrye-fault-tolerance")
}

openApiGenerate {
Expand Down
3 changes: 3 additions & 0 deletions api/polaris-catalog-service/build.gradle.kts
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,9 @@ dependencies {
implementation("com.fasterxml.jackson.core:jackson-annotations")
implementation("com.fasterxml.jackson.core:jackson-core")
implementation("com.fasterxml.jackson.core:jackson-databind")

compileOnly(platform(libs.quarkus.bom))
compileOnly("io.quarkus:quarkus-smallrye-fault-tolerance")
}

openApiGenerate {
Expand Down
3 changes: 3 additions & 0 deletions quarkus/defaults/src/main/resources/application.properties
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,9 @@ quarkus.otel.sdk.disabled=true

quarkus.test.integration-test-profile=it

quarkus.fault-tolerance.global.timeout.unit=seconds
quarkus.fault-tolerance.global.timeout.value=60
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, 60 sec default timeout is too low. I believe we should set the default to allow most calls. Administrators concerned with overload, will have to lower this config based on their specific requirements.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, maybe the default can be something like 600s or even disabled (is that possible?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to no timeout by default (if possible), otherwise 600 sec looks reasonable.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disabled it via application.properties.


polaris.realm-context.type=default
polaris.realm-context.realms=POLARIS
polaris.realm-context.header-name=Polaris-Realm
Expand Down
1 change: 1 addition & 0 deletions quarkus/service/build.gradle.kts
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,7 @@ dependencies {
implementation("io.quarkus:quarkus-opentelemetry")
implementation("io.quarkus:quarkus-security")
implementation("io.quarkus:quarkus-smallrye-context-propagation")
implementation("io.quarkus:quarkus-smallrye-fault-tolerance")

implementation(libs.jakarta.enterprise.cdi.api)
implementation(libs.jakarta.inject.api)
Expand Down
3 changes: 3 additions & 0 deletions server-templates/api.mustache
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,8 @@ import {{javaxPackage}}.inject.Inject;

import org.apache.polaris.core.context.RealmContext;

import org.eclipse.microprofile.faulttolerance.Timeout;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

Expand Down Expand Up @@ -108,6 +110,7 @@ public class {{classname}} {
@Produces({ {{#produces}}"{{{mediaType}}}"{{^-last}}, {{/-last}}{{/produces}} }){{/hasProduces}}{{#hasAuthMethods}}
{{#authMethods}}{{#isOAuth}}@RolesAllowed("**"){{/isOAuth}}{{/authMethods}}{{/hasAuthMethods}}
@Timed("{{metricsPrefix}}.{{baseName}}.{{nickname}}")
@Timeout
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is the timeout actually defined?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this change is beneficial... Clients should control their own timeouts (in general).

If this annotation cancels the related HTTP request, will it actually cancel the server-side processing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Within Open Catalog, server-side timeouts have been very useful for protecting the service from very long-running requests. Clients can of course have their own timeouts, but I think server side timeouts are very beneficial.

Agreed that this only makes sense if the server-side processing actually gets cancelled

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is the timeout actually defined?

Quarkus configs - I added a default value of 60s to the default application.properties.

will it actually cancel the server-side processing?

The API method will throw a TimeoutException which gets appropriately mapped in the IcebergExceptionMapper.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API method will throw a TimeoutException

That is clear :) What happens to the server threads that are still executing the request that timed out?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading the docs :) I guess the server thread will get a simple java interrupt, which does look useful. Sorry, I should have paid more attention to this PR initially 🤦

It LGTM (assuming longer default timeput), but let's collect some more reviews, though.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default is 1 second. To disable the Timeout, we can disable Quarkus fault tolerance which I did in application.properties.

public Response {{nickname}}({{#isMultipart}}MultipartFormDataInput input,{{/isMultipart}}{{#allParams}}{{>queryParams}}{{>pathParams}}{{>headerParams}}{{>bodyParams}}{{^isMultipart}}{{>formParams}},{{/isMultipart}}{{#isMultipart}}{{^isFormParam}},{{/isFormParam}}{{/isMultipart}}{{/allParams}}@Context @MeterTag(key="realm_id",expression="realmIdentifier") RealmContext realmContext,@Context SecurityContext securityContext) {
{{! Don't log form or header params in case there are secrets, e.g., OAuth tokens }}
LOGGER.atDebug().setMessage("Invoking {{baseName}} with params")
Expand Down
3 changes: 3 additions & 0 deletions service/common/build.gradle.kts
Original file line number Diff line number Diff line change
Expand Up @@ -90,6 +90,9 @@ dependencies {
implementation("com.azure:azure-storage-blob")
implementation("com.azure:azure-storage-file-datalake")

compileOnly(platform(libs.quarkus.bom))
compileOnly("io.quarkus:quarkus-smallrye-fault-tolerance")

testImplementation(platform(libs.junit.bom))
testImplementation("org.junit.jupiter:junit-jupiter")
testImplementation(libs.assertj.core)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,7 @@
import org.apache.iceberg.exceptions.ValidationException;
import org.apache.iceberg.rest.responses.ErrorResponse;
import org.apache.polaris.core.exceptions.FileIOUnknownHostException;
import org.eclipse.microprofile.faulttolerance.exceptions.TimeoutException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.event.Level;
Expand Down Expand Up @@ -191,6 +192,7 @@ static int mapExceptionToResponseCode(RuntimeException rex) {
case IllegalArgumentException e -> Status.BAD_REQUEST.getStatusCode();
case UnsupportedOperationException e -> Status.NOT_ACCEPTABLE.getStatusCode();
case WebApplicationException e -> e.getResponse().getStatus();
case TimeoutException e -> Status.REQUEST_TIMEOUT.getStatusCode();
default -> Status.INTERNAL_SERVER_ERROR.getStatusCode();
};
}
Expand Down
Loading