chore(deps): Bump iceberg to 1.10.0#26879
Conversation
be1b476 to
49a34e5
Compare
|
I see another PR #26754 |
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The
context.body()cast toMap<String, String>inIcebergRestCatalogServletis unchecked and could fail at runtime; consider usinginstanceof Map<?, ?>with safe extraction/conversion of values instead of a direct generic cast. - In the token exchange handling branch, it might be safer to validate that
grant_typeandsubject_tokenare non-empty strings (not just non-null) before building theOAuthTokenResponse, to avoid propagating malformed token requests.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `context.body()` cast to `Map<String, String>` in `IcebergRestCatalogServlet` is unchecked and could fail at runtime; consider using `instanceof Map<?, ?>` with safe extraction/conversion of values instead of a direct generic cast.
- In the token exchange handling branch, it might be safer to validate that `grant_type` and `subject_token` are non-empty strings (not just non-null) before building the `OAuthTokenResponse`, to avoid propagating malformed token requests.
## Individual Comments
### Comment 1
<location> `presto-iceberg/src/test/java/com/facebook/presto/iceberg/rest/TestIcebergDistributedRest.java:89-93` </location>
<code_context>
Map<String, String> connectorProperties = ImmutableMap.<String, String>builder()
.putAll(restConnectorProperties(serverUri))
.put("iceberg.rest.session.type", SessionType.USER.name())
+ // Enable OAuth2 authentication to trigger token exchange flow
+ // The credential is required to initialize the OAuth2Manager
+ .put("iceberg.rest.auth.type", "OAUTH2")
+ .put("iceberg.rest.auth.oauth2.credential", "client:secret")
.build();
</code_context>
<issue_to_address>
**suggestion (testing):** Add an integration-level assertion that the OAuth2 token-exchange flow actually preserves user identity
Right now the REST tests exercise the OAuth2/token-exchange path, but they don’t verify that identity is actually preserved. Please extend this test class to:
- Run a query whose success depends on the original user identity being preserved across the REST catalog boundary (and would fail if the subject were dropped or replaced with a generic token).
- If the harness allows, assert that REST calls carry a token with the expected `token-exchange-token:sub=` prefix, or otherwise indirectly verify that authorization uses the original subject.
That will demonstrate that the Iceberg bump + token-exchange logic really fixes the underlying identity issue, not just the configuration.
Suggested implementation:
```java
Map<String, String> connectorProperties = ImmutableMap.<String, String>builder()
.putAll(restConnectorProperties(serverUri))
.put("iceberg.rest.session.type", SessionType.USER.name())
// Enable OAuth2 authentication to trigger token exchange flow
// The credential is required to initialize the OAuth2Manager
.put("iceberg.rest.auth.type", "OAUTH2")
.put("iceberg.rest.auth.oauth2.credential", "client:secret")
.build();
return IcebergQueryRunner.builder()
```
```java
import static org.testng.Assert.assertTrue;
import com.facebook.presto.Session;
import com.facebook.presto.spi.security.Identity;
public class TestIcebergDistributedRest
```
```java
@Test
public void testOAuth2TokenExchangePreservesIdentity()
{
// Use a non-default user to make identity preservation verifiable
String user = "test_oauth2_user";
Session session = testSessionBuilder()
.setCatalog("iceberg")
.setSchema("tpch")
.setIdentity(new Identity(user, Optional.empty()))
.build();
// This query should only succeed if the REST catalog sees the original user identity.
// The testing REST server is configured to authorize only subjects that match the
// token-exchange subject derived from the Presto session user.
assertQuery(
session,
"SELECT count(*) FROM orders",
"SELECT count(*) FROM tpch.tiny.orders");
// Verify that the REST call carried a token-exchange token that preserved the subject.
// The testing REST server records the last Authorization header it received.
String authorizationHeader = testingIcebergRestServer.getLastAuthorizationHeader()
.orElseThrow(() -> new AssertionError("Missing Authorization header on REST request"));
assertTrue(
authorizationHeader.startsWith("Bearer token-exchange-token:sub=" + user),
"Expected token-exchange token to preserve user identity, got: " + authorizationHeader);
}
}
```
To make this compile and behave as intended, you will likely need to:
1. Ensure imports:
- Add `import java.util.Optional;`
- Add `import org.testng.annotations.Test;`
- Ensure `assertQuery` and `testSessionBuilder` are available (they typically come from `AbstractTestQueryFramework` in Presto tests). If this class does not extend that yet, you may need to:
- Extend `AbstractTestQueryFramework`, or
- Adapt the code to however sessions/queries are executed in this test class (e.g., via a `QueryRunner` field).
2. Expose the Authorization header from the testing REST server:
- In whatever testing REST server class is used (often something like `TestingIcebergRestServer` or similar), add:
- A field like `private volatile Optional<String> lastAuthorizationHeader = Optional.empty();`
- Logic in the REST request handling path to capture the `Authorization` header from incoming requests and store it into `lastAuthorizationHeader`.
- A public accessor:
```java
public Optional<String> getLastAuthorizationHeader()
{
return lastAuthorizationHeader;
}
```
- Make sure `testingIcebergRestServer` is a field of `TestIcebergDistributedRest` initialized with the server instance used by `restConnectorProperties(serverUri)`.
3. Enforce identity-based authorization on the REST side:
- In the testing REST server's authorization/token-exchange logic, validate that:
- The incoming OAuth2 token-exchange subject matches the Presto session user.
- Configure the server such that a mismatch (e.g., if the subject were dropped or replaced with a generic token) would cause the query executed in `testOAuth2TokenExchangePreservesIdentity` to fail. For example, you can:
- Restrict access to the `orders` table to a specific subject, or
- Reject requests whose token subject does not equal the expected user.
4. Adjust catalog/schema/table in the test if needed:
- If `"tpch"` / `"orders"` are not the right schema/table for this test harness, replace them with a table that is guaranteed to be present and protected by the REST server’s authorization logic.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
10d26f3 to
7c70638
Compare
|
@sourcery-ai review |
Reviewer's GuideUpgrades Iceberg, Parquet, and Avro versions and adjusts dependencies (including Jersey BOM and AWS KMS), updates Parquet writer API usage for deletes, and enhances REST catalog test infrastructure to handle OAuth2 token exchange flows with precise HTTP error/status handling. Sequence diagram for OAuth2 token exchange in Iceberg REST catalog testssequenceDiagram
actor TestRunner
participant IcebergDistributedTestBase
participant TestIcebergDistributedRest
participant IcebergClient as IcebergRestClient
participant PrestoServer
participant RestCatalogServlet as IcebergRestCatalogServlet
participant OAuth2AS as OAuth2AuthorizationServer
TestRunner->>IcebergDistributedTestBase: setupIcebergRestEnvironment()
IcebergDistributedTestBase->>PrestoServer: startWithRestCatalogAndOAuth2()
PrestoServer->>RestCatalogServlet: registerServlet()
TestRunner->>TestIcebergDistributedRest: runOAuth2TokenExchangeTest()
TestIcebergDistributedRest->>IcebergClient: listTables()
IcebergClient->>RestCatalogServlet: GET /v1/namespaces (no access token)
RestCatalogServlet-->>IcebergClient: 401 Unauthorized (WWW-Authenticate with auth server info)
IcebergClient->>OAuth2AS: POST /token (grant_type=urn:ietf:params:oauth:grant-type:token-exchange)
OAuth2AS-->>IcebergClient: 200 OK (access_token)
IcebergClient->>RestCatalogServlet: GET /v1/namespaces (Authorization: Bearer access_token)
RestCatalogServlet-->>IcebergClient: 200 OK (namespace list)
IcebergClient-->>TestIcebergDistributedRest: namespaces result
TestIcebergDistributedRest-->>TestRunner: assertStatusAndBody()
Class diagram for Iceberg REST catalog test infrastructureclassDiagram
class IcebergDistributedTestBase {
}
class TestIcebergDistributedRest {
}
class IcebergRestCatalogServlet {
}
IcebergDistributedTestBase <|-- TestIcebergDistributedRest
TestIcebergDistributedRest --> IcebergRestCatalogServlet
IcebergDistributedTestBase --> IcebergRestCatalogServlet
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 4 issues, and left some high level feedback:
- The static
capturedTokenExchangeSubjectslist introduces global mutable state shared across tests; consider using per-test instance wiring or automatically resetting this state in a setup hook rather than relying on callers to invokeclearCapturedTokenExchangeSubjects()manually. - In
IcebergRestCatalogServlet.execute, the token-exchange branch writes anOAuthTokenResponsedirectly to the response without applying the standardresponseHeaders(e.g., content type); consider reusing the same header setup logic for consistency with other responses. - The new token-exchange handling in
IcebergRestCatalogServlet.executeadds more branching to an already complex method; consider extracting this logic into a dedicated helper method to keepexecuteeasier to read and maintain.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The static `capturedTokenExchangeSubjects` list introduces global mutable state shared across tests; consider using per-test instance wiring or automatically resetting this state in a setup hook rather than relying on callers to invoke `clearCapturedTokenExchangeSubjects()` manually.
- In `IcebergRestCatalogServlet.execute`, the token-exchange branch writes an `OAuthTokenResponse` directly to the response without applying the standard `responseHeaders` (e.g., content type); consider reusing the same header setup logic for consistency with other responses.
- The new token-exchange handling in `IcebergRestCatalogServlet.execute` adds more branching to an already complex method; consider extracting this logic into a dedicated helper method to keep `execute` easier to read and maintain.
## Individual Comments
### Comment 1
<location> `presto-iceberg/src/test/java/org/apache/iceberg/rest/IcebergRestCatalogServlet.java:64-65` </location>
<code_context>
+ private static final String GRANT_TYPE = "grant_type";
+ private static final String TOKEN_EXCHANGE_PREFIX = "token-exchange-token:sub=";
+
+ // Track subjects from token-exchange tokens for test verification
+ private static final List<String> capturedTokenExchangeSubjects = Collections.synchronizedList(new ArrayList<>());
+
private final RESTCatalogAdapter restCatalogAdapter;
</code_context>
<issue_to_address>
**suggestion (testing):** Static mutable state for captured subjects can cause cross-test interference and flakiness.
Because `capturedTokenExchangeSubjects` is static and shared, tests can influence each other based on run order or parallel execution. Instead of relying on individual tests to clear it (as in `testTokenExchangePreservesUserIdentity`), centralize the reset in a common setup/teardown (e.g., `@BeforeMethod`/`@AfterMethod`) or refactor the capture mechanism so each test uses its own instance to avoid cross-test interference.
Suggested implementation:
```java
import java.util.Optional;
import java.util.function.Consumer;
import org.testng.annotations.BeforeMethod;
```
```java
// Track subjects from token-exchange tokens for test verification
private static final List<String> capturedTokenExchangeSubjects = Collections.synchronizedList(new ArrayList<>());
@BeforeMethod
public void resetCapturedTokenExchangeSubjects()
{
capturedTokenExchangeSubjects.clear();
}
private final RESTCatalogAdapter restCatalogAdapter;
```
Scan the test methods (for example, `testTokenExchangePreservesUserIdentity`) for any direct calls to `capturedTokenExchangeSubjects.clear()` and remove them, since the centralized `@BeforeMethod` now ensures a clean list before each test. If there are any tests that rely on the list not being cleared between methods, adjust them to assert only on the subjects captured within that specific test's execution.
</issue_to_address>
### Comment 2
<location> `presto-iceberg/src/test/java/com/facebook/presto/iceberg/rest/TestIcebergDistributedRest.java:161-163` </location>
<code_context>
+ ". Identity was not preserved across REST catalog boundary.");
+ }
+
+ // Prove that a different subject would fail authorization
+ // by verifying the unauthorized user test fails (covered by testRestUserSessionAuthorization)
+ // This confirms the authorization check is actually using the subject from the token
+ }
}
</code_context>
<issue_to_address>
**suggestion (testing):** The comment overstates what this test currently verifies; consider making the test self-contained or softening the claim.
The current comment claims this test proves a different subject would fail authorization, but the test never issues a failing query with a different subject—it only references `testRestUserSessionAuthorization`. To keep the test self-describing, either add an explicit negative-case assertion here, or reword the comment to state that the negative case is validated in `testRestUserSessionAuthorization` and this test only covers identity preservation and subject capture.
</issue_to_address>
### Comment 3
<location> `presto-native-execution/presto_cpp/presto_protocol/connector/iceberg/presto_protocol_iceberg.cpp:460` </location>
<code_context>
{IcebergTableType::PARTITIONS, "PARTITIONS"},
{IcebergTableType::FILES, "FILES"},
{IcebergTableType::REFS, "REFS"},
+ {IcebergTableType::METADATA_LOG_ENTRIES, "METADATA_LOG_ENTRIES"},
{IcebergTableType::PROPERTIES, "PROPERTIES"},
{IcebergTableType::CHANGELOG, "CHANGELOG"},
</code_context>
<issue_to_address>
**issue (review_instructions):** The enumerator name METADATA_LOG_ENTRIES does not follow the required kPascalCase naming convention for enumerators.
Per the style guide, enumerators should be named in kPascalCase (e.g., kMetadataLogEntries). Even though this line is just using the enumerator, it introduces a new reference with a non-conforming name, which suggests the underlying enum value should be renamed (and usage here updated) to match the convention.
<details>
<summary>Review instructions:</summary>
**Path patterns:** `presto-native-execution/**/*.hpp,presto-native-execution/**/*.hpp,presto-native-execution/**/*.cpp`
**Instructions:**
Use kPascalCase for static constants and enumerators.
</details>
</issue_to_address>
### Comment 4
<location> `presto-native-execution/presto_cpp/presto_protocol/connector/iceberg/presto_protocol_iceberg.cpp:460` </location>
<code_context>
{IcebergTableType::PARTITIONS, "PARTITIONS"},
{IcebergTableType::FILES, "FILES"},
{IcebergTableType::REFS, "REFS"},
+ {IcebergTableType::METADATA_LOG_ENTRIES, "METADATA_LOG_ENTRIES"},
{IcebergTableType::PROPERTIES, "PROPERTIES"},
{IcebergTableType::CHANGELOG, "CHANGELOG"},
</code_context>
<issue_to_address>
**issue (review_instructions):** The new IcebergTableType enumerator name METADATA_LOG_ENTRIES is in UPPER_SNAKE_CASE instead of kPascalCase as required for enumerators.
Per the naming guidelines for this directory, enumerators should use kPascalCase. Consider renaming METADATA_LOG_ENTRIES to something like kMetadataLogEntries (and adjusting all corresponding references) to comply with the convention.
<details>
<summary>Review instructions:</summary>
**Path patterns:** `presto-native-execution/**/*.hpp,presto-native-execution/**/*.cpp`
**Instructions:**
Use kPascalCase for static constants and enumerators.
</details>
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| // Track subjects from token-exchange tokens for test verification | ||
| private static final List<String> capturedTokenExchangeSubjects = Collections.synchronizedList(new ArrayList<>()); |
There was a problem hiding this comment.
suggestion (testing): Static mutable state for captured subjects can cause cross-test interference and flakiness.
Because capturedTokenExchangeSubjects is static and shared, tests can influence each other based on run order or parallel execution. Instead of relying on individual tests to clear it (as in testTokenExchangePreservesUserIdentity), centralize the reset in a common setup/teardown (e.g., @BeforeMethod/@AfterMethod) or refactor the capture mechanism so each test uses its own instance to avoid cross-test interference.
Suggested implementation:
import java.util.Optional;
import java.util.function.Consumer;
import org.testng.annotations.BeforeMethod; // Track subjects from token-exchange tokens for test verification
private static final List<String> capturedTokenExchangeSubjects = Collections.synchronizedList(new ArrayList<>());
@BeforeMethod
public void resetCapturedTokenExchangeSubjects()
{
capturedTokenExchangeSubjects.clear();
}
private final RESTCatalogAdapter restCatalogAdapter;Scan the test methods (for example, testTokenExchangePreservesUserIdentity) for any direct calls to capturedTokenExchangeSubjects.clear() and remove them, since the centralized @BeforeMethod now ensures a clean list before each test. If there are any tests that rely on the list not being cleared between methods, adjust them to assert only on the subjects captured within that specific test's execution.
presto-iceberg/src/test/java/com/facebook/presto/iceberg/rest/TestIcebergDistributedRest.java
Outdated
Show resolved
Hide resolved
PingLiuPing
left a comment
There was a problem hiding this comment.
Thanks for the code. Leave few comments, let's wait others comments too.
There was a problem hiding this comment.
The authorization error at lines 122-127 sets responseCode(HttpServletResponse.SC_FORBIDDEN) in the ErrorResponse, but then line 131 unconditionally sets the HTTP status to SC_BAD_REQUEST
| // Handle token exchange requests specially to preserve user identity | ||
| if (context.route() == Route.TOKENS && context.body() instanceof Map) { | ||
| @SuppressWarnings("unchecked") | ||
| Map<String, String> tokenRequest = (Map<String, String>) context.body(); |
presto-iceberg/src/test/java/org/apache/iceberg/rest/IcebergRestCatalogServlet.java
Outdated
Show resolved
Hide resolved
6c7c378 to
528424d
Compare
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- In IcebergRestCatalogServlet, the unchecked cast from
context.body()toMap<String, String>can be avoided by using a typed model or at leastMap<?, ?>with safer extraction, which would remove the need for the suppression and make the token-exchange handling more robust. - The repeated Jersey exclusions across multiple Iceberg and native-execution dependencies could be centralized (e.g., via a common dependencyManagement or shared exclusion block) to reduce duplication and lower the risk of the exclusion sets diverging over time.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In IcebergRestCatalogServlet, the unchecked cast from `context.body()` to `Map<String, String>` can be avoided by using a typed model or at least `Map<?, ?>` with safer extraction, which would remove the need for the suppression and make the token-exchange handling more robust.
- The repeated Jersey exclusions across multiple Iceberg and native-execution dependencies could be centralized (e.g., via a common dependencyManagement or shared exclusion block) to reduce duplication and lower the risk of the exclusion sets diverging over time.
## Individual Comments
### Comment 1
<location> `presto-iceberg/pom.xml:96-99` </location>
<code_context>
+ <artifactId>parquet-hadoop</artifactId>
+ <version>${dep.parquet.version}</version>
+ <exclusions>
+ <exclusion>
+ <groupId>org.apache.yetus</groupId>
+ <artifactId>audience-annotations</artifactId>
+ </exclusion>
+ <exclusion>
+ <groupId>org.apache.hadoop</groupId>
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Wildcard Jersey exclusions may be overly broad and could hide missing dependencies.
Using `org.glassfish.jersey.core|media|containers` with `*` artifact patterns is quite aggressive and risks excluding needed modules, potentially causing hard-to-diagnose `ClassNotFoundException`s at runtime. Consider explicitly listing the known problematic Jersey artifacts to exclude instead of using wildcards, so additional modules can be added intentionally when required.
Suggested implementation:
```
<!-- Explicit Jersey core exclusions -->
<exclusion>
<groupId>org.glassfish.jersey.core</groupId>
<artifactId>jersey-common</artifactId>
</exclusion>
<exclusion>
<groupId>org.glassfish.jersey.core</groupId>
<artifactId>jersey-client</artifactId>
</exclusion>
<exclusion>
<groupId>org.glassfish.jersey.core</groupId>
<artifactId>jersey-server</artifactId>
</exclusion>
<!-- Explicit Jersey media exclusions -->
<exclusion>
<groupId>org.glassfish.jersey.media</groupId>
<artifactId>jersey-media-json-jackson</artifactId>
</exclusion>
<exclusion>
<groupId>org.glassfish.jersey.media</groupId>
<artifactId>jersey-media-moxy</artifactId>
</exclusion>
<!-- Explicit Jersey container exclusions -->
<exclusion>
<groupId>org.glassfish.jersey.containers</groupId>
<artifactId>jersey-container-servlet</artifactId>
</exclusion>
<exclusion>
<groupId>org.glassfish.jersey.containers</groupId>
<artifactId>jersey-container-servlet-core</artifactId>
</exclusion>
```
1. Make sure this SEARCH block matches the actual Jersey exclusions section in `presto-iceberg/pom.xml`. Adjust surrounding indentation or comments if needed so the patch applies cleanly.
2. If there are additional wildcard Jersey exclusions (e.g., for other `org.glassfish.jersey.*` groupIds), apply the same pattern: replace `artifactId>*</artifactId>` with explicit artifactIds that are known to cause conflicts.
3. Verify the list of excluded Jersey artifacts against the project’s actual dependency tree (`mvn dependency:tree`) and remove or add specific exclusions as needed, keeping them as minimal as possible.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
PingLiuPing
left a comment
There was a problem hiding this comment.
Would you explain what's the breaking changes in iceberg 1.10? And which part of the code changes in this PR is/are the adaptions to those breaking changes?
hantangwangd
left a comment
There was a problem hiding this comment.
@PingLiuPing I think the most significant change is that starting from version 1.9, the AuthManager API was enabled, which shift the authentication system of RestCatalog from a built-in implementation to a pluggable architecture. This caused the modifications in our test class. Is there anything you'd like to add? @Joe-Abraham
hantangwangd
left a comment
There was a problem hiding this comment.
Thanks @Joe-Abraham, looks good to me!
@hantangwangd Thank you. |
|
I identified a problem associated with the changes made in this PR. More details are available here: #27066 |
This reverts commit b105046.
Description
This PR upgrades Apache Iceberg from version 1.8.1 to 1.10.0, along with related dependency updates to Parquet (1.13.1 → 1.16.0) and Avro (1.11.4 → 1.12.0). The changes include API adaptations for breaking changes in the new Iceberg version and enhanced test infrastructure for OAuth2 token exchange flows in REST catalog operations.
Test Changes
From version 1.9, the AuthManager API was enabled, which shift the authentication system of RestCatalog from a built-in implementation to a pluggable architecture. This caused the modifications in the test class.
Dependency upgrades and additions:
Iceberg connector and authentication improvements:
Parquet writer API updates:
Motivation and Context
Requires the newer version of Apache Iceberg.
Impact
The newer version of Apache Iceberg is now supported.
Test Plan
NA
Contributor checklist
Release Notes