Fix ADLSLocation file parsing #11395
LGTM. @danielcweeks This adds in that test I was looking for where URI would fail, although it looks like we have a bug in the current implementation anyway.
Thanks @mrcnc, though overall it's really unfortunate that we have notably different behavior between S3 and ADLS in the URI handling. S3 allows for query params (though they're not considered part of the key), while ADLS appears to have non-standard handling. The one thing I'm not clear about is that the linked documentation doesn't actually go into what the valid path characters are. Is that documented somewhere that we can reference? It would be great to include that in the Javadoc for future reference.
jbonofre
left a comment
It makes sense to me to avoid using URI/split there.
I originally stripped off the query params to be on the safe side and to be consistent w/ S3FileIO, but given that query params aren't specified in the URI format, this change looks OK to me. +1 to Dan's comment about adding a Javadoc w/ the URI spec reference.
dab1b25 to b39772b
@danielcweeks Could you check to see if the Javadoc change is what you were looking for?
* Azure: Fix ADLSLocation file parsing
* Azure: Remove invalid test cases from ADLSLocationTest
* Update Javadocs with reference to ADLS URI
After reviewing the concerns raised in #11344 about using `java.net.URI` for parsing in ADLSLocation, I contrived an example of a location that does not parse correctly. It also fails in the current implementation, so this PR adds a test and fix for the parsing code. Additionally, it removes test cases that are invalid, since they don't test valid ABFS syntax.

Motivation
The main reason to avoid using `java.net.URI` is that it parses according to RFC 2396, but object storage providers do not strictly follow this specification. Specifically, in standard URI syntax, the question mark `?` separates the path component from the query component. However, Azure Blob Storage allows question marks in blob/file names, making these names incompatible with the RFC 2396 URI specification.

Another important point is that Azure Storage is accessed via HTTP APIs, so the `abfs` and `wasb` location syntaxes serve as identifiers for blobs accessed through HTTP URLs. This is the motivation behind removing the tests that included query and fragment components, since they would only be used in the HTTP URLs and not in the ABFS URI-like syntax.
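
To make the `?` problem concrete, here is a minimal sketch, not the code in this PR. The class name `AbfsLocationSketch`, the regular expression, and the sample location are all assumptions made up for illustration. It compares the path that `java.net.URI` reports with the path recovered by matching the location as a plain string.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: class name, pattern, and sample location are hypothetical.
class AbfsLocationSketch {
  // Assumed shape: scheme://authority followed by everything else as the path.
  private static final Pattern LOCATION =
      Pattern.compile("^(abfss?|wasbs?)://([^/?#]+)(.*)?$");

  // java.net.URI treats '?' as the start of the query, so the path is cut short.
  static String pathViaUri(String location) {
    return URI.create(location).getPath();
  }

  // Plain pattern matching keeps '?' as an ordinary character in the path.
  static String pathViaPattern(String location) {
    Matcher matcher = LOCATION.matcher(location);
    if (!matcher.matches()) {
      throw new IllegalArgumentException("Invalid location: " + location);
    }
    String path = matcher.group(3);
    return path == null ? "" : path;
  }

  public static void main(String[] args) {
    String location = "abfs://container@account.dfs.core.windows.net/dir/file?name.txt";
    System.out.println(pathViaUri(location));     // prints "/dir/file"
    System.out.println(pathViaPattern(location)); // prints "/dir/file?name.txt"
  }
}
```

With `java.net.URI`, the `?name.txt` suffix ends up in the query component and is lost from the path, which is the failure mode described above.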