Conversation

@huaxingao (Contributor) commented Mar 28, 2020

What changes were proposed in this pull request?

Add back the deprecated R APIs removed by #22843 and #22815.

These APIs are:

  • sparkR.init
  • sparkRSQL.init
  • sparkRHive.init
  • registerTempTable
  • createExternalTable
  • dropTempTable
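
For illustration, here is roughly how the restored deprecated entry points relate to their SparkSession-era replacements (a sketch; the exact deprecation messages may differ):

```r
library(SparkR)

# Old context-based entry points, restored by this PR; each is deprecated
# in favor of the unified sparkR.session().
sc <- sparkR.init(master = "local")   # superseded by sparkR.session()
sqlContext <- sparkRSQL.init(sc)      # superseded by sparkR.session()

# Temp-table helpers, also restored; their successors operate on temp views.
df <- createDataFrame(faithful)
registerTempTable(df, "faithful")     # superseded by createOrReplaceTempView()
head(sql("SELECT * FROM faithful LIMIT 3"))
dropTempTable("faithful")             # superseded by dropTempView()
```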

There is no need to port wrapper functions such as

```r
createExternalTable <- function(x, ...) {
  dispatchFunc("createExternalTable(tableName, path = NULL, source = NULL, ...)", x, ...)
}
```

because this wrapper existed for backward compatibility with the old SQLContext (judging from #9192), and it seems we no longer need it since SparkR replaced SQLContext with SparkSession in #13635.
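
For readers unfamiliar with the pattern: the dispatchFunc wrapper existed so that legacy call sites passing a SQLContext as the first argument kept working after the context became unnecessary. A simplified, hypothetical sketch of the idea (not the actual SparkR implementation; createExternalTableCompat is an illustrative name):

```r
# Hypothetical sketch of the SQLContext-dispatch pattern; the real
# dispatchFunc in SparkR differs in detail. If the first argument looks
# like a legacy SQLContext (a JVM object reference of class "jobj"),
# warn and drop it, then dispatch on the SparkSession-style arguments.
createExternalTableCompat <- function(x, ...) {
  if (inherits(x, "jobj")) {
    .Deprecated("createExternalTable(tableName, path = NULL, source = NULL, ...)")
    createExternalTable(...)
  } else {
    createExternalTable(x, ...)
  }
}
```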

Why are the changes needed?

To follow the amended Spark semantic versioning policy (https://spark.apache.org/versioning-policy.html), which weighs the cost of breaking APIs against the benefit of removing them.

Does this PR introduce any user-facing change?

Yes. The previously removed R APIs are restored.

How was this patch tested?

Added back the removed tests.

@huaxingao (Contributor, Author)

cc @HyukjinKwon @felixcheung

@SparkQA commented Mar 28, 2020

Test build #120517 has finished for PR 28058 at commit 69e2953.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@huaxingao (Contributor, Author) commented Mar 28, 2020

Do we need to add back saveAsParquetFile? It seems it was not added back in Scala. Also, do we need to add back sparkR.init, sparkRSQL.init, and sparkRHive.init?

@gatorsmile (Member)

Also cc @mengxr @falaki Could you take a look?

@falaki (Contributor) commented Mar 28, 2020

If we have added these back to Scala, adding them to R seems good.

dongjoon-hyun changed the title from [SPARK-31290][SQL][R] Add back the deprecated R APIs to [SPARK-31290][R] Add back the deprecated R APIs on Mar 29, 2020
@dongjoon-hyun (Member)

cc @marmbrus


@felixcheung (Member) commented Mar 29, 2020 via email

@rxin (Contributor) commented Mar 29, 2020

If they have already been removed prior to 3.0 and nobody has said anything, I don't think we should add those back in.

@HyukjinKwon (Member) commented Mar 30, 2020

Just to clarify things, what the community agreed on is the rubric added at https://spark.apache.org/versioning-policy.html.

Several of the APIs listed here (jsonFile, parquetFile and saveAsParquetFile) were removed from Scala and PySpark before Spark 3.0 in SPARK-12600. I skimmed the user mailing list with a keyword search, and nobody complained.

One last thing: SparkR dropped RDD support, so I don't think it makes sense to add jsonRDD back either.

@HyukjinKwon (Member) commented Mar 30, 2020

@huaxingao, it seems we roughly agreed on excluding jsonFile, parquetFile and saveAsParquetFile per SPARK-12600. jsonRDD also does not seem worth adding back, since SparkR officially dropped RDD support. Can we exclude these four in this PR?

@SparkQA commented Mar 30, 2020

Test build #120563 has finished for PR 28058 at commit 754f938.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member) left a comment (this comment has been minimized)

@huaxingao, can you also add the reason why we didn't port:

```r
createExternalTable <- function(x, ...) {
  dispatchFunc("createExternalTable(tableName, path = NULL, source = NULL, ...)", x, ...)
}
```

in the PR description?

Apparently, this wrapper existed for backward compatibility with the old SQLContext (judging from #9192), but it seems we no longer need it since SparkR replaced SQLContext with SparkSession in #13635.

@HyukjinKwon (Member) commented Mar 30, 2020

It seems more complicated than I thought. Let me try to clarify the current status here:

jsonFile and parquetFile

jsonRDD

  • SparkR made the RDD APIs private as of SPARK-7230; RDD-related APIs are supposed to be removed, since they are useless without public RDD support.

saveAsParquetFile

I skimmed the user mailing list with a keyword search and found no complaints. In particular, there were no complaints about:

  • jsonFile, parquetFile and jsonRDD missing in PySpark since Spark 2.0
  • saveAsParquetFile missing in Scala and PySpark since Spark 2.0

So I guess it's fine not to add jsonFile, parquetFile, jsonRDD and saveAsParquetFile back.


TL;DR: from my rough investigation, I think it's fine not to add jsonFile, parquetFile, jsonRDD and saveAsParquetFile back, because:

  • people don't seem to care much about jsonFile, parquetFile, jsonRDD and saveAsParquetFile.
  • jsonFile and parquetFile don't exist in PySpark.
  • jsonRDD is useless since the RDD APIs became private in SparkR.
  • saveAsParquetFile does not exist on the Scala or PySpark side.
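
For anyone migrating, the current SparkR equivalents of the four excluded APIs would look roughly like this (a sketch; the paths are placeholders):

```r
library(SparkR)
sparkR.session()

# jsonFile / parquetFile  ->  read.json / read.parquet
people <- read.json("path/to/people.json")
users  <- read.parquet("path/to/users.parquet")

# saveAsParquetFile  ->  write.parquet
write.parquet(people, "path/to/people_copy.parquet")

# jsonRDD has no direct replacement: the RDD APIs are private in SparkR,
# so JSON data is read straight into a SparkDataFrame instead.
```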

WDYT @falaki, @mengxr, @felixcheung?

@SparkQA commented Mar 30, 2020

Test build #120567 has finished for PR 28058 at commit ce5969b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member)

I discussed offline with @marmbrus. It looks like we're fine to exclude the four APIs above.

Merged to master and branch-3.0.

HyukjinKwon pushed a commit that referenced this pull request Apr 1, 2020
Closes #28058 from huaxingao/r.

Authored-by: Huaxin Gao <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit fd0b228)
Signed-off-by: HyukjinKwon <[email protected]>
@huaxingao (Contributor, Author)

Thank you all for the help!

huaxingao deleted the r branch April 1, 2020 01:41
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020