[SPARK-14596][SQL] Remove not used SqlNewHadoopRDD and some more unused imports #12354

HyukjinKwon · 2016-04-13T09:18:54Z

What changes were proposed in this pull request?

The old HadoopFsRelation API included buildInternalScan(), which used SqlNewHadoopRDD in ParquetRelation.
Now that the old API has been removed, SqlNewHadoopRDD is no longer used.

So, this PR removes SqlNewHadoopRDD and several unused imports.

This was discussed in #12326.

How was this patch tested?

Several related existing unit tests and sbt scalastyle.

SparkQA · 2016-04-13T10:44:37Z

Test build #55707 has finished for PR 12354 at commit 8b8e961.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2016-04-13T11:13:32Z

cc @cloud-fan

cloud-fan · 2016-04-13T12:08:49Z

There is also a SqlNewHadoopRDDState; we should rename it to FileScanRDDState.

HyukjinKwon · 2016-04-13T12:21:48Z

Thanks! I just renamed it.

SparkQA · 2016-04-13T13:59:31Z

Test build #55708 has finished for PR 12354 at commit a291332.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-13T14:14:49Z

Test build #55709 has finished for PR 12354 at commit 3f8e878.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2016-04-13T14:39:07Z

core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala

      split.inputSplit.value match {
-        case fs: FileSplit => SqlNewHadoopRDDState.setInputFileName(fs.getPath.toString)
-        case _ => SqlNewHadoopRDDState.unsetInputFileName()
+        case fs: FileSplit => FileScanRDDState.setInputFileName(fs.getPath.toString)


@liancheng @yhuai Do you still use HadoopRDD to read data source relations? If not, I think we don't need to update the file name here anymore.

yhuai · Apr 13, 2016

I think it is used by HiveTableScan.

HyukjinKwon · Apr 14, 2016

@cloud-fan I see. This is used in https://github.com/apache/spark/blob/d6dc12ef0146ae409834c78737c116050961f350/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InputFileName.scala during code generation.

Should I remove the change of file name?

cloud-fan · Apr 14, 2016

I see. How about renaming it to InputFileNameHolder?

HyukjinKwon · Apr 14, 2016

Sure. Thanks.
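For readers following the thread, here is a minimal sketch of the pattern being discussed: a per-thread holder that the RDD's reader writes the current file path into, and that an expression such as InputFileName can later read. It is not the Spark source. The `getInputFileName` accessor, the plain `String` value type, the use of `ThreadLocal`, and every name ending in `Sketch` are assumptions made for illustration; only `setInputFileName` and `unsetInputFileName` appear in the diff above.

```scala
// Hypothetical, simplified stand-in for the holder discussed in this thread.
object InputFileNameHolderSketch {
  // One value per thread, so tasks running on different threads
  // do not overwrite each other's current file name.
  private val inputFileName: ThreadLocal[String] =
    new ThreadLocal[String] {
      override protected def initialValue(): String = ""
    }

  // Assumed accessor; a reader such as an input-file-name expression would call this.
  def getInputFileName(): String = inputFileName.get()

  def setInputFileName(file: String): Unit = inputFileName.set(file)

  // After removal, the next get() falls back to initialValue(), i.e. "".
  def unsetInputFileName(): Unit = inputFileName.remove()
}

// Hypothetical split types standing in for Hadoop's InputSplit / FileSplit.
sealed trait SplitSketch
final case class FileSplitSketch(path: String) extends SplitSketch
case object OtherSplitSketch extends SplitSketch

object SplitReaderSketch {
  // Mirrors the shape of the match in the diff: record the path for
  // file-based splits, clear it for anything else.
  def recordSplit(split: SplitSketch): Unit = split match {
    case fs: FileSplitSketch => InputFileNameHolderSketch.setInputFileName(fs.path)
    case _ => InputFileNameHolderSketch.unsetInputFileName()
  }
}
```

Because the holder is keyed by thread rather than by RDD type, a name like InputFileNameHolder describes what it stores instead of which RDD happens to write to it, which is the motivation for the rename suggested above.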

cloud-fan · 2016-04-14T00:44:28Z

core/src/main/scala/org/apache/spark/rdd/SqlNewHadoopRDDState.scala

- * TODO: Move/Combine this with org.apache.spark.sql.datasources.SqlNewHadoopRDD
+ * State for FileScanRDD objects. This is split this way because of the package splits.
 */
 private[spark] object SqlNewHadoopRDDState {


SqlNewHadoopRDDState is definitely not a good name here. How about InputFileNameHolder?

HyukjinKwon · Apr 14, 2016

Yes. Thanks!

HyukjinKwon · 2016-04-14T00:56:53Z

@cloud-fan The commits I just submitted include the changes for MiMa tests and some comments.

SparkQA · 2016-04-14T02:33:52Z

Test build #55772 has finished for PR 12354 at commit 6cb3547.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-14T02:37:17Z

Test build #55769 has finished for PR 12354 at commit 9c5893d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2016-04-14T02:39:22Z

retest this please

SparkQA · 2016-04-14T04:32:49Z

Test build #55784 has finished for PR 12354 at commit 6cb3547.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2016-04-14T07:45:24Z

thanks! merging to master!

Remove SqlNewHadoopRDD and unused imports

8b8e961

Rename SqlNewHadoopRDDState to FileScanRDDState

a291332

Update indentation

3f8e878

cloud-fan reviewed Apr 13, 2016

Remove the change of file name and add some more changes for removing SqlNewHadoopRDD

9c5893d

cloud-fan reviewed Apr 14, 2016

Rename SqlNewHadoopRDDState to InputFileNameHolder

6cb3547

asfgit closed this in b481940 Apr 14, 2016

HyukjinKwon deleted the SPARK-14596 branch January 2, 2018 03:40
