PARQUET-2158: Upgrade Hadoop dependency to version 3.2.0 #976
This updates Parquet's Hadoop dependency to 3.2.0. This version adds compatibility with Java 11, as well as many other features and bug fixes.
The thrift module doesn't compile: it uses a Hadoop-internal class that is tagged as private and that changed incompatibly in Hadoop 3; see HADOOP-12436. The good news is that the class is deprecated, which explains why nobody has seen this in the wild. Any attempt to use that class would fail with Hadoop 3.x on the classpath.
The deprecated parquet-thrift class PathGlobPattern doesn't compile against Hadoop 3.x because, in HADOOP-12436, the implementation of the nominally private class org.apache.hadoop.fs.GlobPattern switched from java.util.regex.Pattern to com.google.re2j.Pattern. The fact that nobody has ever reported this problem implies that the class is never used on any Hadoop 3 release. This commit fixes the build by moving to the Google classes, which will work unless/until the Hadoop project changes the class again. The alternative strategy would be to fork the Hadoop class. It may be time to consider removing the class entirely; clearly nobody is actually using it.
Disables the API compatibility check and adds re2j as a 'provided' dependency so that the relevant auditing checks do not fail.
This PR fixes Parquet to build/link against Hadoop 3.2.0 and higher. It would be cleaner to remove the deprecated class causing the compatibility issues; the fact that nobody has ever reported linkage errors implies it is not in active use.
  <japicmp.version>0.14.2</japicmp.version>
  <shade.prefix>shaded.parquet</shade.prefix>
- <hadoop.version>2.10.1</hadoop.version>
+ <hadoop.version>3.2.0</hadoop.version>
hmm why 3.2.0, not 3.3.1/3.3.2?
+1 for the question
I was being unambitious. Moving to this, the oldest 3.x release that works on Java 11, ensures that anything on a version >= this should link properly.
If you do want to be more current: Spark is on 3.3.3, Hive is trying to move to 3.3.x, and I will be doing a 3.3.4 release in a week's time, which is mostly security changes of relevance to servers.
- import java.util.regex.Pattern;
- import java.util.regex.PatternSyntaxException;
+ import com.google.re2j.Pattern;
I think this may not work for projects like Spark that use the Hadoop shaded client, since the type of GlobPattern.compiled is relocated there to org.apache.hadoop.shaded.com.google.re2j.Pattern.
It might be easier to just remove the class, as it has been marked as deprecated since Parquet 1.8.0 (2015). It is also not used anywhere in the project.
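To make the relocation concern concrete, here is a minimal sketch that probes which of the two class names mentioned in this thread can be loaded on the current classpath. `Re2jLocator` and `firstPresent` are hypothetical names invented for illustration; they are not part of Parquet or Hadoop. The sketch only reports what is present. It does not fix anything: code compiled against `com.google.re2j.Pattern` would still fail to link when only the relocated name exists.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper for illustration only; not part of Parquet or Hadoop.
final class Re2jLocator {
    // The unshaded re2j class name and the relocated name quoted in the review comment.
    static final List<String> CANDIDATES = List.of(
        "com.google.re2j.Pattern",
        "org.apache.hadoop.shaded.com.google.re2j.Pattern");

    /** Returns the first class name in {@code candidates} that can be loaded, if any. */
    static Optional<String> firstPresent(List<String> candidates) {
        ClassLoader loader = Re2jLocator.class.getClassLoader();
        for (String name : candidates) {
            try {
                // Load without initializing; we only care whether the class exists.
                Class.forName(name, false, loader);
                return Optional.of(name);
            } catch (ClassNotFoundException | LinkageError e) {
                // Not loadable on this classpath; try the next candidate.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(
            firstPresent(CANDIDATES).orElse("re2j not found under either name"));
    }
}
```

With the unshaded Hadoop jars one would expect the first name to resolve, and with the shaded client the second, but that depends on how the classpath is assembled.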
+1 for cutting. i will update the patch
I will do a separate PR to remove it. It is used in DeprecatedFieldProjectionFilter, which in turn is used in org.apache.parquet.hadoop.thrift.ThriftReadSupport if "parquet.thrift.column.filter" is set. That use would have to be cut and, rather than just printing a deprecation warning, it would actually have to fail. Nobody can be using this on anything with ASF Hadoop binaries 3.2+, or they would have complained about linkage errors by now.
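The "fail rather than warn" behaviour could look roughly like the sketch below. It is an assumption about the shape of such a check, not the code in ThriftReadSupport: `DeprecatedFilterGuard` and `rejectDeprecatedFilter` are hypothetical names, the exception type is an arbitrary choice, and a plain `Map` stands in for the Hadoop `Configuration` so the example needs no Hadoop dependency.

```java
import java.util.Map;

// Hypothetical guard for illustration; a plain Map stands in for the Hadoop Configuration.
final class DeprecatedFilterGuard {
    // Property name taken from the comment above.
    static final String DEPRECATED_KEY = "parquet.thrift.column.filter";

    /** Throws if the deprecated filter property is set, instead of only logging a warning. */
    static void rejectDeprecatedFilter(Map<String, String> conf) {
        String value = conf.get(DEPRECATED_KEY);
        if (value != null && !value.isEmpty()) {
            throw new UnsupportedOperationException(
                DEPRECATED_KEY + " is no longer supported (was set to '" + value + "')");
        }
    }

    public static void main(String[] args) {
        rejectDeprecatedFilter(Map.of());                       // passes silently
        rejectDeprecatedFilter(Map.of(DEPRECATED_KEY, "a/**")); // throws
    }
}
```

Failing at read-setup time surfaces the problem with a clear message, rather than as a linkage error deep inside the glob-matching code.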
LGTM
thanks.
Tests!
it's a dependency update.