-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Core: Extended header support for RESTClient implementations #12194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Conflict with this refactor: #11992 |
c214964 to
0ee6b38
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to allow users to provide their own HTTPClient implementation here and leverage HTTP Caching in a transparent manner while also exposing cache control/size?
This simplifies the flow from Iceberg perspective as well. I think it's better to avoid explicit header handling when it could be done via a compliant HTTP client.
|
To make things simpler from a user perspective we could also add a default CachedHttpClient to the builder in case someone wants something that just works out of the box. |
|
Thanks for taking a look, @1raghavmahajan! This is the proposal doc the freshness-aware table loading just in case: https://docs.google.com/document/d/1rnVSP_iv2I47giwfAe-Z3DYhKkKwWCVvCkC9rEvtaLA In general the whole improvement could work with letting the HttpClient to take care of the caching out of the box itself as you proposed. So I'd continue with this PR to add extended header support for Iceberg's REST clients. |
752eb00 to
7ad2f3f
Compare
| } | ||
|
|
||
| @Override | ||
| public <T extends RESTResponse> T get( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're adding this, would it be possible to push the other get implementation to the interface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late response.
Do you mean we could move that other get function from this class into the interface? We could have a "responseHeader-less" version in the interface that calls the version with the responseHeader param using a non-null h -> {} consumer, but I think that would be a behaviour change within the interface. Calling the "responseHeader-less" version would result in an UnsupportedOperationException from the default implementation of the other get function in the interface.
With the current approach I tried to follow how this PR from @nastra introduced the same for the post methods.
| * additional resource allocation. | ||
| */ | ||
| private HTTPClient(HTTPClient parent, AuthSession authSession) { | ||
| HTTPClient(HTTPClient parent, AuthSession authSession) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the builder of this class instead of changing the visibility of the constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, I'll revert this part (the second commit) from this PR because that was a kind of experiment to see how we could test the new interface in HTTPClient. Agreed, with @nastra that this isn't the way we'd want to go.
7ad2f3f to
df18911
Compare
|
Just sharing some updates for the record: I had limited capacity to follow-up on this recently, but this changes soon hopefully. I continued with the implementation of the freshness-aware loading, but I don't think there is an easy way anyway to test this part of the code as a separate building block. I wonder if we can merge this one as it is (with the additional test reverted) to have granularity with the implementation. @nastra @danielcweeks |
df18911 to
744bb96
Compare
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions. |
|
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
|
Hi @danielcweeks , @nastra , I left this PR to expire since I plan to come up with a wider code change that covers more parts of the freshness aware loading. I just wanted to let you know that I recently had some shift in priorities and didn't have time to work in this, but in some weeks I'll again find capacity to continue the implementation. I have some open question wrt the design, I might reach out to you if you don't mind. |
|
@gaborkaszab @nastra @danielcweeks I am interested in this change to configure/expose the etag headers on GET request, would love to push this forward in a separate PR, please let me know if there's any WIP effort on this. |
|
@dramaticlly I don't think anyone is actively working this, so if you want to pick it up and move forward, I think that would be great. |
|
@dramaticlly @danielcweeks I admit I got a bit sidetracked from this, but still this is on my plate, so I'd be a bit uncomfortable to give it away. Can we coordinate on intentions here before we move one? Anyway, this was my last message on this PR:
|
744bb96 to
4e7fe97
Compare
|
@gaborkaszab do you want to take a look at suggestion from @nastra? Happy to help in any way |
…n ETags The freshness-aware table loading requires some additional support for HTTP headers: - Response headers for get and post requests - Input headers for get request Extended the RESTClient and its implementations to fill in the gaps where these headers weren't supported. With this patch, RESTCatalogAdapter populates the ETag HTTP response header.
e834987 to
669b44e
Compare
gaborkaszab
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking a look @nastra and @dramaticlly !
I've taken care of your comments except the one with ETags for stageCreate. I think there we should have some additional discussion.
| import org.apache.iceberg.rest.responses.LoadTableResponse; | ||
|
|
||
| /** Interface for creating the content of the ETag HTTP headers */ | ||
| public interface ETagProvider { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed the interface and kept the utility class with a static method
| import org.apache.iceberg.relocated.com.google.common.hash.Hashing; | ||
| import org.apache.iceberg.rest.responses.LoadTableResponse; | ||
|
|
||
| public class DefaultETagProvider implements ETagProvider { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| private static final HashFunction MURMUR3 = Hashing.murmur3_32_fixed(); | ||
|
|
||
| @Override | ||
| public String of(LoadTableResponse resp) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @nastra @dramaticlly ! changed the parameter to metadata location
| private static final HashFunction MURMUR3 = Hashing.murmur3_32_fixed(); | ||
|
|
||
| @Override | ||
| public String of(LoadTableResponse resp) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| return this; | ||
| } | ||
|
|
||
| public RESTClient withETagProvider(ETagProvider eTagProv) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
|
||
| respHeaders.clear(); | ||
|
|
||
| Table tbl = catalog.loadTable(TABLE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| TableIdentifier.of(TABLE.namespace(), "other_table"), | ||
| ((BaseTable) tbl).operations().current().metadataFileLocation()); | ||
|
|
||
| assertThat(respHeaders).containsKey(HttpHeaders.ETAG); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thx, done
| assertThat(eTag).isEqualTo(respHeaders.get(HttpHeaders.ETAG)); | ||
| } | ||
|
|
||
| private RESTCatalog setUpETagTest(Map<String, String> respHeaders) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
|
||
| RESTCatalog catalog = catalog(adapter); | ||
|
|
||
| if (requiresNamespaceCreate()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| request.validate(); | ||
| if (request.stageCreate()) { | ||
| return castResponse( | ||
| responseType, CatalogHandlers.stageTableCreate(catalog, namespace, request)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure here. Isn't the metadata location null for stage create? I created a simple test and it was null there, so I don't think we can add an ETag here. Do I miss something?
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java
Outdated
Show resolved
Hide resolved
| import org.apache.iceberg.relocated.com.google.common.hash.HashFunction; | ||
| import org.apache.iceberg.relocated.com.google.common.hash.Hashing; | ||
|
|
||
| class ETagProvider { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it might make sense to have a small test class for this where the metadata location is null/well-defined and where we compare against a precalculated etag value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by recalculated tag value do you mean that a test where the expected output is hard-coded? Would that guard against someone changing the implementation of ETag creation? I added a test for that, let me know if this is what you mean
dramaticlly
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, Thanks @gaborkaszab!
|
According to Slack, @nastra seems to be offline until mid Sept. Maybe @amogh-jahagirdar would you mind taking a look? I got some approvals previously already, I just addressed a nit on top to add some extra tests. |
|
Merged to main. |
The freshness-aware table loading requires some additional support for HTTP headers:
Extended the RESTClient and its implementations to fill in the gaps where these headers weren't supported.