[SPARK-19660][CORE][SQL] Replace the configuration property names that are deprecated in the version of Hadoop 2.6 #16990
wangyum wants to merge 7 commits into apache:master from wangyum:HadoopDeprecatedProperties
Test build #73127 has finished for PR 16990 at commit
srowen
left a comment
I agree with the principle, and all the changes do look as described. I am familiar with some of the properties and yes these are their newer counterparts. I agree we should avoid use of deprecated properties where possible.
Test build #73128 has finished for PR 16990 at commit
I'm working on the test failures.
Test build #73205 has started for PR 16990 at commit
Test build #73207 has finished for PR 16990 at commit
@srowen @felixcheung e.g.; so I changed the following file names:
srowen
left a comment
Looks good to me. Given how it touches Hadoop config, maybe @vanzin or @steveloughran has a comment
LGTM, though you'd have to do the full coverage to verify that there's not a typo in any of the strings. This is why, although Spark has adopted the more readable inline strings, I'm more of a fan of "refer to the constant", both for spelling and for the ability to locate uses through the IDE. That said, in HDFS-9301, HDFS-10610 and HDFS-6418 I have expressed my concerns about HDFS constants, and invariably encountered resistance to fixing regressions. Core and YARN are stable, and I'll happily revert anything there if people complain. Note also that deprecation warnings go to a special log. Finally, I still have no idea why HDFS-531 changed fs.default.name to a new mixed-case string. It does, well, nothing.
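The trade-off between inline strings and named constants can be sketched as follows. This is a minimal illustration, not Spark or Hadoop code: `HadoopKeys` and its attribute names are hypothetical, and in Python the misspelled constant only fails at run time, whereas in Scala or Java it would fail at compile time.

```python
class HadoopKeys:
    """Hypothetical holder for property names; not an existing Spark or Hadoop class."""
    FS_DEFAULT_FS = "fs.defaultFS"
    JOB_REDUCES = "mapreduce.job.reduces"


conf = {}

# Inline string with a typo: accepted silently, the setting is simply ignored later.
conf["mapreduce.job.reducse"] = "4"

# Reference to the constant: correct spelling, and easy to locate from an IDE.
conf[HadoopKeys.JOB_REDUCES] = "4"

# A misspelled constant name fails immediately instead of passing unnoticed.
try:
    conf[HadoopKeys.JOB_REDUCSE] = "4"
    typo_caught = False
except AttributeError:
    typo_caught = True
```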
srowen
left a comment
I went over this again to check more carefully and have two small questions.
@@ -1515,12 +1515,12 @@ def test_oldhadoop(self):
    conf = {
        "mapred.output.format.class": "org.apache.hadoop.mapred.SequenceFileOutputFormat",
I'm not sure what this key was supposed to be before; maybe mapreduce.outputformat.class? But can it be mapreduce.job.outputformat.class now?
@srowen
mapred.output.format.class maps to the old API and mapreduce.job.outputformat.class maps to the new API. See:
https://github.com/wangyum/spark/blob/97734c5af3df4e6525e8015459af16ab193dfc24/python/pyspark/tests.py#L1664-L1679
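As a rough sketch of the distinction described in the reply above, the two configurations might look like this. The key and value classes and the output paths are placeholders chosen for illustration, not values taken from this PR.

```python
# Illustrative only: paths and key/value classes are placeholder assumptions.

# Old API (org.apache.hadoop.mapred): the output format is selected by
# mapred.output.format.class, per the reply above.
OLD_API_CONF = {
    "mapred.output.format.class": "org.apache.hadoop.mapred.SequenceFileOutputFormat",
    "mapreduce.job.output.key.class": "org.apache.hadoop.io.IntWritable",
    "mapreduce.job.output.value.class": "org.apache.hadoop.io.Text",
    "mapreduce.output.fileoutputformat.outputdir": "/tmp/example/olddataset/",
}

# New API (org.apache.hadoop.mapreduce): the output format is selected by
# mapreduce.job.outputformat.class instead.
NEW_API_CONF = {
    "mapreduce.job.outputformat.class":
        "org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat",
    "mapreduce.job.output.key.class": "org.apache.hadoop.io.IntWritable",
    "mapreduce.job.output.value.class": "org.apache.hadoop.io.Text",
    "mapreduce.output.fileoutputformat.outputdir": "/tmp/example/newdataset/",
}

# Only the format-selection key differs between the two dictionaries.
differing_keys = set(OLD_API_CONF) ^ set(NEW_API_CONF)
```

With PySpark, dictionaries of this shape would be passed to `rdd.saveAsHadoopDataset(OLD_API_CONF)` and `rdd.saveAsNewAPIHadoopDataset(NEW_API_CONF)` respectively.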
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.mapred.reduce.tasks.speculative.execution=false;
set hive.mapreduce.job.reduces.speculative.execution=false;
Is this supposed to be mapreduce.reduce.speculative? I'm looking at https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/DeprecatedProperties.html
Or maybe the hive.* version is different?
Looks like hive.mapred.reduce.tasks.speculative.execution in the Hive wiki (https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties).
But it's probably best to pull in a Hive developer, maybe @jcamachor. Jesus: could you look at the Hive config options and make sure they are the current set?
@steveloughran, I checked the code, and the property name in Hive is hive.mapred.reduce.tasks.speculative.execution.
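One plausible way a name like hive.mapreduce.job.reduces.speculative.execution can arise is a substring replacement that also rewrites Hive's own keys. The sketch below is an assumption about the failure mode, not a description of how this PR was prepared; `rename_exact` is a hypothetical helper.

```python
# A real Hadoop rename: mapred.reduce.tasks -> mapreduce.job.reduces
OLD = "mapred.reduce.tasks"
NEW = "mapreduce.job.reduces"

# A Hive property that merely contains the old Hadoop name as a substring.
hive_key = "hive.mapred.reduce.tasks.speculative.execution"

# Substring replacement rewrites the Hive key into a name Hive does not define.
naive = hive_key.replace(OLD, NEW)


def rename_exact(key):
    """Rename only when the whole key equals the deprecated Hadoop name."""
    return NEW if key == OLD else key


# Whole-key matching leaves the Hive property alone and still renames the Hadoop one.
safe_hive = rename_exact(hive_key)
safe_hadoop = rename_exact(OLD)
```

Matching on the whole key keeps project-specific prefixes such as `hive.` out of scope for the rename.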
…s and revert hive.mapred.reduce.tasks.speculative.execution
OK. I have reverted
Test build #73511 has finished for PR 16990 at commit
I merged this to master, but the script gave an error from git. I had experienced some intermittent GitHub errors, but I also have a new environment. The commit looks correct but hasn't synced immediately to GitHub. Not sure what's happened, but will monitor it.
What changes were proposed in this pull request?
Replace all deprecated Hadoop configuration property names with their current names, according to DeprecatedProperties, except in:
https://github.com/apache/spark/blob/v2.1.0/python/pyspark/sql/tests.py#L1533
https://github.com/apache/spark/blob/v2.1.0/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala#L987
https://github.com/apache/spark/blob/v2.1.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/SetCommand.scala#L45
https://github.com/apache/spark/blob/v2.1.0/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L614
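The kind of renaming described above can be sketched as a lookup over old and new key names. The pairs below are a small excerpt from Hadoop's DeprecatedProperties table; `modernize` is a hypothetical helper written for illustration and is not part of this patch, which edits the strings directly in the source.

```python
# Excerpt of deprecated -> current Hadoop property names (not the full table).
DEPRECATED_TO_CURRENT = {
    "fs.default.name": "fs.defaultFS",
    "mapred.job.name": "mapreduce.job.name",
    "mapred.reduce.tasks": "mapreduce.job.reduces",
    "mapred.output.key.class": "mapreduce.job.output.key.class",
    "mapred.output.value.class": "mapreduce.job.output.value.class",
    "mapred.output.dir": "mapreduce.output.fileoutputformat.outputdir",
    "mapred.reduce.tasks.speculative.execution": "mapreduce.reduce.speculative",
}


def modernize(conf):
    """Return a copy of conf with deprecated keys renamed; values are unchanged."""
    return {DEPRECATED_TO_CURRENT.get(key, key): value for key, value in conf.items()}


old_conf = {
    "fs.default.name": "hdfs://namenode:8020",  # placeholder address
    "mapred.reduce.tasks": "10",
    "spark.app.name": "example",  # unrelated keys pass through untouched
}
new_conf = modernize(old_conf)
```

Keys that are absent from the mapping are returned as-is, so the helper is safe to run over mixed Spark and Hadoop settings.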
How was this patch tested?
Existing tests