Skip to content

Conversation

@gaborkaszab
Copy link
Collaborator

No description provided.

import com.google.errorprone.annotations.FormatMethod;

/** Exception to indicate that resource is unchanged. */
public class NotModifiedException extends RESTException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I agree with clients handling the 304 from the server as an error and throwing . I see how it's convenient though in the current implementation since then it's just a try/catch and also in this case there's no real body to work with in the response. In my mind, it's fundamentally not an error so modeling it as an exception doesn't seem right.

I'll need to think more if there's a better alternative though

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we considered putting the handling of 304 in the actual HTTP client https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/rest/HTTPClient.java#L326 similar to 204 handling? I think the argument there is we should just be able to return null for the table response and then we don't really need to treat this like an "error" case.

Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I guess the argument against doing it in the http client is that if we just return null, it's a bit weird for clients to now have to worry about if (loadTableResponse != null) if it's a 304 vs getting an explicit exception and handling that way...

I think I still prefer putting this logic in the HTTP client instead of treating it as an error because all this handling is anyways contained in RESTSessionCatalog and abstracted away from users anyways.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was exactly my thought process, I just came to a different conclusion :)
My argument here is that we associate LoadTableResponse == null with a meaning (304-NotModified) that works now but:

  • it's not that intuitive when reading the RESTSessionCatalog code
  • Not entirely future proof, because if later some other mechanism also returns null for non-304 scenarios then it's hard to differentiate in RESTSessionCatalog why the result is null
  • Having to catch NotModifiedException in RESTSessionCatalog.loadTable() is clear and intuitive, because we explicitly articulate with catching the exception that this is the case when the table is unchanged (and not comparing results to null)

WDYT?

Copy link
Collaborator Author

@gaborkaszab gaborkaszab Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thought: This PR was meant to focus on the server side implementation of sending 304. What we are discussing now is more about how the client should handle such a response and I was going to take care of that in a follow-up PR. The overlap is that the TableErrorHandler is used both in RESTCatalogAdapter.handleRequest() to give error from the server, and also on the HTTPClient.execute() side on the client to throw exceptions based on the received responses.

I admit my client-side follow-up PoC catches NotModifiedException on the RESTSessionCatalog.loadTable() side, but I can adjust that to get null if the table is not changed. But still we'd need this NotModifiedException for the server side if I'm not mistaken.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for taking a look, @pvary !
The thing with changing response types (and parameters) is that it is an API breaking change. Not directly for RESTCatalogAdapter.handleRequest() but for the calling RESTCatalogAdapter.execute() that invokes this, because it is an override of HTTPRequest.execute().
So with the current design of the reference IRC, I think we can either follow the exception driven approach, or implement some (not that foolproof) checks in RESTCatalogServlet like "if the responseBody is null and the load table endpoint was called, then return 304". The exception-driven approach seems to be the cleanest on the reference IRC side.
cc @amogh-jahagirdar @nastra

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, @gaborkaszab @pvary .

So @pvary, I do think it's like point 1 where it's a reference implementation of showing how things should work but i'm also aiming for "principle of least surprise" here. Also I just am a little bit against exposing concepts like exceptions for representing cases where it's not really exceptional. At the end of the day the 304 is really the server telling the client "hey use your cache".

Separating out the client side and server side:

1.) On the client side, after looking a bit further, clients like the Github cli and other Http client frameworks do essentially handle null response bodies for 304 and it's expected that the response body is null for a 304.

2.) On the server side, it sounds like the reason for adding the exception was that our current implementation of RESTCatalogAdapter doesn't have a way of expressing non-200 codes without an exception? I feel like that's the limitation to address in RESTCatalogAdapter.

In the end I'm mostly +0 if we want to do the exception mostly because it just feels like exposing an unnecessary concept to work around the current reference server handling, and that can lead to people misusing that exception since it's in the public API. It'd be ideal if we can at least show how complicated the other solution is for being able to return status codes other than 200 without an exception.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting back with your thoughts, @amogh-jahagirdar !

I totally get your point that exceptions shouldn't be the way to control flow for non-error use cases. I did some digging around the area a bit and explored a couple of options. In summary what I think what makes it difficult for expressing non-200 codes from RESTCatalogAdapter is how we use it in our tests. There are 2 setups:

  1. RESTSessionCatalog -> HttpClient -> RESTCatalogServlet -> RESTCatalogAdapter
  2. RESTSessionCatalog -> RESTCatalogAdapter (basically mocking the server)

For scenario 1) we have more flexibility because we could abstract whatever we return from RESTCatalogAdapter either in RESTCatalogServlet by converting it to HTTP status codes, or even in HttpClient. Starting here I created some PoC where I return a new LoadTableResponseWithStatusCode from RESTCatalogAdapter and then this could be translated into a 304 on the servlet level without a response body. In turn in HttpClient I could translate such responses into a null response.
However, the difficulty is scenario 2) where RESTCatalogAdapter is directly wired into RESTSessionCatalog as a client, because then whatever is returned from the adapter should be understood by the session catalog, including this new ...WithStatusCode response. This isn't something I'd follow, because it would expose too much details to the session catalog that shouldn't be there.

So long story short, I think we can't do anything fancy here because of the current design. I see 2 options:

  1. The current one with exceptions, but then these exceptions have to be caught within RESTSessionCatalog too that could be somewhat misleading to the readers because it's a success-case driven by exception.
  2. Return null from RESTCatalogAdapter.handleRequest(). With this, on RESTSessionCatalog.loadTable() side we have to make null checks and associate getting nulls meaning the table is unchanged. I don't think this is very intuitive.

Btw, do you know the purpose of scenario 2) other than having simple tests mocking the communication with the server? I think it adds extra complexity for the reference IRC implementation a lot, and reduces the amount of complexity we have, e.g. with status code now. Would it make sense to get rid of such usage from the tests?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amogh-jahagirdar FYI, this is how it would look like to return null from RESTCatalogAdapter. I think this is a reasonable alternative to the exception driven approach. I don't think we have any other way to return 304 because of the scenario 2) I described in my previous comment. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @gaborkaszab for implementing the alternative, I took a look and that's much more inline with what I was thinking and I think I'd prefer that solution.

@github-actions github-actions bot removed the API label Sep 30, 2025
@gaborkaszab gaborkaszab changed the title API, Core: Reference IRC to throw 304-NotModifiedException Core: Reference IRC to return 304-NotModified Sep 30, 2025
@gaborkaszab gaborkaszab force-pushed the main_304_not_modified branch 4 times, most recently from fbf278f to 66f885e Compare September 30, 2025 18:35
@gaborkaszab
Copy link
Collaborator Author

Hi @amogh-jahagirdar
I've published the alternative approach within this PR. Would you mind taking a look?

@gaborkaszab
Copy link
Collaborator Author

Hey @amogh-jahagirdar ,
Would you mind taking a look at this approach?

@amogh-jahagirdar
Copy link
Contributor

Hey sorry for the delay @gaborkaszab yes this is on my list to review today!

Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gaborkaszab. I think this generally looks good, just some minor comments on the implementation relevant for supporting 304-not modified. To keep this PR easier to review, could we take up that Route refactoring separately first as it's kind of a large change relative to the actual change for supporting this feature in the refernce IRC.

Comment on lines +365 to +366
return response.getCode() == HttpStatus.SC_NO_CONTENT
|| response.getCode() == HttpStatus.SC_NOT_MODIFIED
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the extra code check here? Wouldn't the isSuccessful already cover this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is required here because this one triggers returning early with. Without this we'd run into an "Invalid (null) response body" RESTException.
However, the additional check into isSuccessful is not necessary required (same would be tru for SC_NO_CONTENT too), I think to be future proof it semantically makes sense for an isSuccessful function to return true for 304 too even though the current code returns before calling that function. WDYT?

Additionally, when I experimented around this area I noticed that the current test injects RESTCatalogAdapter directly into RESTSessionCatalog avoiding this code path through HTTPClient. Let me rethink the test a little bit so that we could go through the whole path.

this.asViewCatalog = catalog instanceof ViewCatalog ? (ViewCatalog) catalog : null;
}

enum Route {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we publish a separate PR for this refactoring? I think we could get that in first. It'll make this change really just focused on what's necessary for supporting not modified

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Here it is: #14313

Comment on lines 289 to 291
httpRequest.headers().entries().stream()
.filter(e -> e.name().equals(HttpHeaders.IF_NONE_MATCH))
.findFirst();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a getFirstHeader on httpRequest, so I think we could simplify this a bit.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the classes from external hc.http source has such a function, however, this is the internal, Iceberg specific HTTPRequest class that doesn't. I can add such a function there, let me know if it makes sense (this or separate PR)

.findFirst();

String eTag = ETagProvider.of(response.metadataLocation());
Preconditions.checkNotNull(eTag, "ETag is null");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this checkNotNull is neccessary.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, removed

@gaborkaszab gaborkaszab force-pushed the main_304_not_modified branch from 66f885e to ac313a8 Compare October 13, 2025 14:37
RESTCatalogAdapter returns null for load table requests where the table
wasn't changed based on ETag comparisons. The RESTCatalogServlet turns
this null into a 304 - NOT_MODIFIED.

Note, the client side (RESTSessionCatalog) handling of this null
response is not implemented within this PR and will come as a follow-up.
This won't cause any issues because the IF_NONE_MATCH header is not set,
and in turn the null response is not triggered.
@gaborkaszab gaborkaszab force-pushed the main_304_not_modified branch from ac313a8 to a049ba6 Compare October 13, 2025 16:07
@gaborkaszab
Copy link
Collaborator Author

FYI, I did a rebase with master to have the Route refactor

@gaborkaszab
Copy link
Collaborator Author

The changes I made recently on this PR:

  • Rebased with main to have the Route class refactor
  • Noticed, the current test mocks the adapter in RESTSessionCatalog and as a result the query doesn't go through HTTPClient and RESTCatalogServlet. Made some adjustments on the tests to be able to mock the adapter within the servlet instead of the one within RESTSessionCatalog to not test the whole chain.

@amogh-jahagirdar Do you think you have some capacity to take a look?

Copy link
Contributor

@pvary pvary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 from my side.
@amogh-jahagirdar: any more comments?

Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks reasonable now thanks @gaborkaszab

@amogh-jahagirdar
Copy link
Contributor

Thanks @gaborkaszab and @pvary for reviewing

@amogh-jahagirdar amogh-jahagirdar merged commit f631298 into apache:main Oct 27, 2025
42 checks passed
@gaborkaszab
Copy link
Collaborator Author

Thanks for the reviews @amogh-jahagirdar @pvary !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants