Skip to content

[SPARK-28203][Core][Python] PythonRDD should respect SparkContext's hadoop configuration#25002

Closed
advancedxy wants to merge 2 commits intoapache:masterfrom
advancedxy:SPARK-28203
Closed

[SPARK-28203][Core][Python] PythonRDD should respect SparkContext's hadoop configuration#25002
advancedxy wants to merge 2 commits intoapache:masterfrom
advancedxy:SPARK-28203

Conversation

@advancedxy
Copy link
Contributor

What changes were proposed in this pull request?

  1. PythonHadoopUtil.mapToConf generates a Configuration with loadDefaults disabled
  2. merging hadoop conf in several places of PythonRDD is consistent.

How was this patch tested?

Added a new test and existed tests

…adoop

configuration.

This commit also makes merging hadoop conf consistent in several places
of PythonRDD
*/
def mapToConf(map: java.util.Map[String, String]): Configuration = {
val conf = new Configuration()
val conf = new Configuration(false)
Copy link
Member

@dongjoon-hyun dongjoon-hyun Jul 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this doesn't break anything. Did you run the UT locally?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internally this is only called in PythonRDD and I have replaced all the invocations with merged SparkContext's hadoop conf. So it shouldn't break things in spark side. I ran the UTs of Scala side, haven't run python unit tests though.

mapToConf from PythonHadoopUtilSuite to PythonRDDSuite
@advancedxy
Copy link
Contributor Author

@dongjoon-hyun sorry for the delay. I added a test case which should clearly show the wrongly used case in PythonRDD.

@advancedxy
Copy link
Contributor Author

Gently ping @dongjoon-hyun.
Also cc @cloud-fan

@cloud-fan
Copy link
Contributor

seems reasonable to me

@HyukjinKwon
Copy link
Member

ok to test

@HyukjinKwon
Copy link
Member

yea, it seems fine to me too. Let me take another look since this fixes RDD API which is pretty conservative lately.

@SparkQA
Copy link

SparkQA commented Jul 31, 2019

Test build #108433 has finished for PR 25002 at commit 612dadb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class PythonRDDSuite extends SparkFunSuite with LocalSparkContext

@advancedxy
Copy link
Contributor Author

@HyukjinKwon The failure looks unrelated to this commit, could you trigger the test again or add
me to the whitelist?

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Jul 31, 2019

Test build #108445 has finished for PR 25002 at commit 612dadb.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class PythonRDDSuite extends SparkFunSuite with LocalSparkContext

@advancedxy
Copy link
Contributor Author

This patch fails due to an unknown error code, -9.

Still unrelated to this commit. Test should be re-triggered. Sorry for the disturbance, @HyukjinKwon

@advancedxy
Copy link
Contributor Author

Gently ping @cloud-fan, @HyukjinKwon and @dongjoon-hyun.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Aug 12, 2019

Test build #108962 has finished for PR 25002 at commit 612dadb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class PythonRDDSuite extends SparkFunSuite with LocalSparkContext

@advancedxy
Copy link
Contributor Author

@HyukjinKwon tests are passed. Do you have any other concerns?

@HyukjinKwon
Copy link
Member

Merged to master.

Copy link
Member

@gatorsmile gatorsmile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 92 usages of new Configuration(), whose default is true.

@advancedxy Could you please check whether their usage is right?
image

@advancedxy
Copy link
Contributor Author

@gatorsmile I did a quick search, most of the usage happens in XxxSuite file, so it may be ignored. Let me do have a thorough look and report back.

@advancedxy
Copy link
Contributor Author

@gatorsmile I believe only two occurrences of new Configuration are invalid, and should be fixed,
namely in org.apache.spark.input.PortableDataStream and org.apache.spark.input.WholeTextFileRecordReader

I will create a jira to resolve that.

@gatorsmile
Copy link
Member

@advancedxy Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants