-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Core: Reference IRC to return 304-NotModified #14035
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core: Reference IRC to return 304-NotModified #14035
Conversation
core/src/test/java/org/apache/iceberg/rest/RESTCatalogAdapter.java
Outdated
Show resolved
Hide resolved
| import com.google.errorprone.annotations.FormatMethod; | ||
|
|
||
| /** Exception to indicate that resource is unchanged. */ | ||
| public class NotModifiedException extends RESTException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I agree with clients handling the 304 from the server as an error and throwing . I see how it's convenient though in the current implementation since then it's just a try/catch and also in this case there's no real body to work with in the response. In my mind, it's fundamentally not an error so modeling it as an exception doesn't seem right.
I'll need to think more if there's a better alternative though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we considered putting the handling of 304 in the actual HTTP client https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/rest/HTTPClient.java#L326 similar to 204 handling? I think the argument there is we should just be able to return null for the table response and then we don't really need to treat this like an "error" case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I guess the argument against doing it in the http client is that if we just return null, it's a bit weird for clients to now have to worry about if (loadTableResponse != null) if it's a 304 vs getting an explicit exception and handling that way...
I think I still prefer putting this logic in the HTTP client instead of treating it as an error because all this handling is anyways contained in RESTSessionCatalog and abstracted away from users anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was exactly my thought process, I just came to a different conclusion :)
My argument here is that we associate LoadTableResponse == null with a meaning (304-NotModified) that works now but:
- it's not that intuitive when reading the RESTSessionCatalog code
- Not entirely future proof, because if later some other mechanism also returns null for non-304 scenarios then it's hard to differentiate in RESTSessionCatalog why the result is null
- Having to catch NotModifiedException in RESTSessionCatalog.loadTable() is clear and intuitive, because we explicitly articulate with catching the exception that this is the case when the table is unchanged (and not comparing results to null)
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more thought: This PR was meant to focus on the server side implementation of sending 304. What we are discussing now is more about how the client should handle such a response and I was going to take care of that in a follow-up PR. The overlap is that the TableErrorHandler is used both in RESTCatalogAdapter.handleRequest() to give error from the server, and also on the HTTPClient.execute() side on the client to throw exceptions based on the received responses.
I admit my client-side follow-up PoC catches NotModifiedException on the RESTSessionCatalog.loadTable() side, but I can adjust that to get null if the table is not changed. But still we'd need this NotModifiedException for the server side if I'm not mistaken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for taking a look, @pvary !
The thing with changing response types (and parameters) is that it is an API breaking change. Not directly for RESTCatalogAdapter.handleRequest() but for the calling RESTCatalogAdapter.execute() that invokes this, because it is an override of HTTPRequest.execute().
So with the current design of the reference IRC, I think we can either follow the exception driven approach, or implement some (not that foolproof) checks in RESTCatalogServlet like "if the responseBody is null and the load table endpoint was called, then return 304". The exception-driven approach seems to be the cleanest on the reference IRC side.
cc @amogh-jahagirdar @nastra
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay, @gaborkaszab @pvary .
So @pvary, I do think it's like point 1 where it's a reference implementation of showing how things should work but i'm also aiming for "principle of least surprise" here. Also I just am a little bit against exposing concepts like exceptions for representing cases where it's not really exceptional. At the end of the day the 304 is really the server telling the client "hey use your cache".
Separating out the client side and server side:
1.) On the client side, after looking a bit further, clients like the Github cli and other Http client frameworks do essentially handle null response bodies for 304 and it's expected that the response body is null for a 304.
2.) On the server side, it sounds like the reason for adding the exception was that our current implementation of RESTCatalogAdapter doesn't have a way of expressing non-200 codes without an exception? I feel like that's the limitation to address in RESTCatalogAdapter.
In the end I'm mostly +0 if we want to do the exception mostly because it just feels like exposing an unnecessary concept to work around the current reference server handling, and that can lead to people misusing that exception since it's in the public API. It'd be ideal if we can at least show how complicated the other solution is for being able to return status codes other than 200 without an exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for getting back with your thoughts, @amogh-jahagirdar !
I totally get your point that exceptions shouldn't be the way to control flow for non-error use cases. I did some digging around the area a bit and explored a couple of options. In summary what I think what makes it difficult for expressing non-200 codes from RESTCatalogAdapter is how we use it in our tests. There are 2 setups:
- RESTSessionCatalog -> HttpClient -> RESTCatalogServlet -> RESTCatalogAdapter
- RESTSessionCatalog -> RESTCatalogAdapter (basically mocking the server)
For scenario 1) we have more flexibility because we could abstract whatever we return from RESTCatalogAdapter either in RESTCatalogServlet by converting it to HTTP status codes, or even in HttpClient. Starting here I created some PoC where I return a new LoadTableResponseWithStatusCode from RESTCatalogAdapter and then this could be translated into a 304 on the servlet level without a response body. In turn in HttpClient I could translate such responses into a null response.
However, the difficulty is scenario 2) where RESTCatalogAdapter is directly wired into RESTSessionCatalog as a client, because then whatever is returned from the adapter should be understood by the session catalog, including this new ...WithStatusCode response. This isn't something I'd follow, because it would expose too much details to the session catalog that shouldn't be there.
So long story short, I think we can't do anything fancy here because of the current design. I see 2 options:
- The current one with exceptions, but then these exceptions have to be caught within
RESTSessionCatalogtoo that could be somewhat misleading to the readers because it's a success-case driven by exception. - Return null from
RESTCatalogAdapter.handleRequest(). With this, onRESTSessionCatalog.loadTable()side we have to make null checks and associate getting nulls meaning the table is unchanged. I don't think this is very intuitive.
Btw, do you know the purpose of scenario 2) other than having simple tests mocking the communication with the server? I think it adds extra complexity for the reference IRC implementation a lot, and reduces the amount of complexity we have, e.g. with status code now. Would it make sense to get rid of such usage from the tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amogh-jahagirdar FYI, this is how it would look like to return null from RESTCatalogAdapter. I think this is a reasonable alternative to the exception driven approach. I don't think we have any other way to return 304 because of the scenario 2) I described in my previous comment. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @gaborkaszab for implementing the alternative, I took a look and that's much more inline with what I was thinking and I think I'd prefer that solution.
5ee6311 to
59aa65e
Compare
59aa65e to
bc704da
Compare
bc704da to
5709c4f
Compare
fbf278f to
66f885e
Compare
|
Hi @amogh-jahagirdar |
|
Hey @amogh-jahagirdar , |
|
Hey sorry for the delay @gaborkaszab yes this is on my list to review today! |
amogh-jahagirdar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @gaborkaszab. I think this generally looks good, just some minor comments on the implementation relevant for supporting 304-not modified. To keep this PR easier to review, could we take up that Route refactoring separately first as it's kind of a large change relative to the actual change for supporting this feature in the refernce IRC.
| return response.getCode() == HttpStatus.SC_NO_CONTENT | ||
| || response.getCode() == HttpStatus.SC_NOT_MODIFIED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the extra code check here? Wouldn't the isSuccessful already cover this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is required here because this one triggers returning early with. Without this we'd run into an "Invalid (null) response body" RESTException.
However, the additional check into isSuccessful is not necessary required (same would be tru for SC_NO_CONTENT too), I think to be future proof it semantically makes sense for an isSuccessful function to return true for 304 too even though the current code returns before calling that function. WDYT?
Additionally, when I experimented around this area I noticed that the current test injects RESTCatalogAdapter directly into RESTSessionCatalog avoiding this code path through HTTPClient. Let me rethink the test a little bit so that we could go through the whole path.
| this.asViewCatalog = catalog instanceof ViewCatalog ? (ViewCatalog) catalog : null; | ||
| } | ||
|
|
||
| enum Route { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we publish a separate PR for this refactoring? I think we could get that in first. It'll make this change really just focused on what's necessary for supporting not modified
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Here it is: #14313
| httpRequest.headers().entries().stream() | ||
| .filter(e -> e.name().equals(HttpHeaders.IF_NONE_MATCH)) | ||
| .findFirst(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a getFirstHeader on httpRequest, so I think we could simplify this a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the classes from external hc.http source has such a function, however, this is the internal, Iceberg specific HTTPRequest class that doesn't. I can add such a function there, let me know if it makes sense (this or separate PR)
| .findFirst(); | ||
|
|
||
| String eTag = ETagProvider.of(response.metadataLocation()); | ||
| Preconditions.checkNotNull(eTag, "ETag is null"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this checkNotNull is neccessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, removed
66f885e to
ac313a8
Compare
RESTCatalogAdapter returns null for load table requests where the table wasn't changed based on ETag comparisons. The RESTCatalogServlet turns this null into a 304 - NOT_MODIFIED. Note, the client side (RESTSessionCatalog) handling of this null response is not implemented within this PR and will come as a follow-up. This won't cause any issues because the IF_NONE_MATCH header is not set, and in turn the null response is not triggered.
ac313a8 to
a049ba6
Compare
|
FYI, I did a rebase with master to have the Route refactor |
|
The changes I made recently on this PR:
@amogh-jahagirdar Do you think you have some capacity to take a look? |
pvary
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 from my side.
@amogh-jahagirdar: any more comments?
amogh-jahagirdar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks reasonable now thanks @gaborkaszab
|
Thanks @gaborkaszab and @pvary for reviewing |
|
Thanks for the reviews @amogh-jahagirdar @pvary ! |
No description provided.