Skip to content

Conversation

@sarutak
Copy link
Member

@sarutak sarutak commented Feb 16, 2021

What changes were proposed in this pull request?

This PR replaces all the occurrence of symbol literals with Symbol() constructors.

Why are the changes needed?

As of Scala 2.13, symbol literals are deprecated so when we build with Scala 2.13 and sbt, the compiler loudly inform us.

[warn] /home/kou/work/oss/spark-scala-2.13/examples/src/main/scala/org/apache/spark/examples/sql/SimpleTypedAggregator.scala:34:38: [deprecation @  | origin= | version=2.13.0] symbol literal is deprecated; use Symbol("id") instead
[warn]     val ds = spark.range(20).select(('id % 3).as("key"), 'id).as[(Long, Long)]
[warn]                                      ^
[warn] /home/kou/work/oss/spark-scala-2.13/examples/src/main/scala/org/apache/spark/examples/sql/SimpleTypedAggregator.scala:34:58: [deprecation @  | origin= | version=2.13.0] symbol literal is deprecated; use Symbol("id") instead
[warn]     val ds = spark.range(20).select(('id % 3).as("key"), 'id).as[(Long, Long)]

Does this PR introduce any user-facing change?

No.

How was this patch tested?

I confirmed that compile and test:compile successfully finish with both Scala 2.12 and 2.13.

@sarutak sarutak changed the title [SPARK-34443][CORE] Replace symbol literals with Symbol constructor invocations [SPARK-34443][CORE] Replace symbol literals with Symbol constructor invocations to comply with Scala 2.13 Feb 16, 2021
@sarutak
Copy link
Member Author

sarutak commented Feb 16, 2021

For reviewers: This PR changes too many files though the changes themselves are simple.
So if it's better, I'll split the PR.

@SparkQA
Copy link

SparkQA commented Feb 16, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39740/

@HyukjinKwon
Copy link
Member

I am okay doing this in one go.

@HyukjinKwon
Copy link
Member

@SparkQA
Copy link

SparkQA commented Feb 16, 2021

Test build #135164 has started for PR 31569 at commit 25112ec.

@SparkQA
Copy link

SparkQA commented Feb 16, 2021

Test build #135159 has finished for PR 31569 at commit be80a62.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 16, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39747/

@SparkQA
Copy link

SparkQA commented Feb 16, 2021

Test build #135166 has started for PR 31569 at commit 748bc9a.

@SparkQA
Copy link

SparkQA commented Feb 16, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39747/

@srowen
Copy link
Member

srowen commented Feb 16, 2021

I think it's ok. I did a lot of this a long time ago too just to tackle the compiler warnings.

@srowen
Copy link
Member

srowen commented Feb 16, 2021

I share some of that frustration, not to the same extreme. Breaking binary/source compatibility across every minor release is a ... painful choice for users to eat. I have always thought of Scala as a bit more of a research language and acts accordingly. There are upsides to deciding to change quickly. In this case - removing the weird back tick syntax is a good thing, but should it ever have been that way? But not sure why it should go away entirely. As one of the major Scala projects, I'd hope nothing happens that really busts how Spark has to work.

@rxin
Copy link
Contributor

rxin commented Feb 16, 2021

This is going to be a nightmare. I didn't realize this change had happened upstream in Scala 2.13. I felt like it'd be virtually impossible for most Spark users, including all Databricks customers, to upgrade in the future, because they would need to rewrite a lot of their code, and worse dependency code that they might not control.

@sarutak
Copy link
Member Author

sarutak commented Feb 17, 2021

Hmm, sounds like bad.
This PR itself doesn't intend to prohibit users to use symbol literals (just suppress compiler warning) but if we'll complain to the Scala community of the change, I'll leave the code as it is.
What do you think?

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Feb 17, 2021

The change is fine but I have to say I feel frustrated either for the same reasons from @mallman, @srowen and @rxin above.

@sarutak
Copy link
Member Author

sarutak commented Feb 17, 2021

retest this please.

@SparkQA
Copy link

SparkQA commented Feb 17, 2021

Test build #135186 has started for PR 31569 at commit 748bc9a.

@SparkQA
Copy link

SparkQA commented Feb 17, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39767/

@SparkQA
Copy link

SparkQA commented Feb 17, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39767/

@HeartSaVioR
Copy link
Contributor

Just a 2 cents, it might be better to convert to col() or $ (I'm now feeling unsafe for this as well though) instead of Symbol() on column representations as I feel like they're what we'd prefer as alternatives. Not a requirement as I know you should've done bugging works already.


// int type can be up cast to long type
val attrs1 = Seq('a.string, 'b.int)
val attrs1 = Seq(Symbol("a").string, Symbol("b").int)
Copy link
Member

@maropu maropu Feb 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we replace this with another semantically-equal one, I prefer $"a" instead of Symbol as for the SQL module. IIRC @cloud-fan left the simillar comment in another PR.

HyukjinKwon pushed a commit that referenced this pull request Feb 23, 2021
…mples and documents

### What changes were proposed in this pull request?

This PR replaces all the occurrences of symbol literals (`'name`) with string interpolation (`$"name"`) in examples and documents.

### Why are the changes needed?

Symbol literals are used to represent columns in Spark SQL but the Scala community seems to remove `Symbol` completely.
As we discussed in #31569, first we should replacing symbol literals with `$"name"` in user facing examples and documents.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Build docs.

Closes #31615 from sarutak/replace-symbol-literals-in-doc-and-examples.

Authored-by: Kousuke Saruta <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
@rxin
Copy link
Contributor

rxin commented Feb 26, 2021

I asked @dragos to chime in and he started a discussion with the Scala team and Martin. Let's see what happens.

@SparkQA
Copy link

SparkQA commented Apr 6, 2021

Test build #136941 has finished for PR 31569 at commit 748bc9a.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@LuciferYang
Copy link
Contributor

Is there any progress on this issue?

@sarutak
Copy link
Member Author

sarutak commented Apr 16, 2021

@LuciferYang I keep watching the activity of the Scala community.
They will introduce a language import feature which allows to use symbol literals but it doesn't mean we can continue to use symbol literals forever.
lampepfl/dotty-feature-requests#182 (comment)

So, it seems that Spark users can use the language import feature until they finishes migrating their application but Spark needs to be replace symbol literals.
If we need to replace, most of them will be done by #31601

@srowen
Copy link
Member

srowen commented Apr 16, 2021

Backing up - this change isn't about removing usage of Symbol, but a deprecated syntax for Symbol. This doesn't actually change anything but how it's expressed in source code right? if so, why not do this in Spark (aside from the large code churn of course)?

Changing to Column syntax is another question.

@sarutak
Copy link
Member Author

sarutak commented Apr 16, 2021

Backing up - this change isn't about removing usage of Symbol, but a deprecated syntax for Symbol. This doesn't actually change anything but how it's expressed in source code right? if so, why not do this in Spark (aside from the large code churn of course)?

I intended to just replace symbol literal with Symbol constructor invocation but I found Symbol is deprecated in the future.
#31569 (comment)
If the decision will not be changed, we need to remove the usage of symbols right? (If so, the title of this PR needs do be changed)

Changing to Column syntax is another question.

Yeah, at first, the syntax stuff is a separate one but after a discussion, the most part of the symbol usage was to be removed.
#31601 (comment)

@srowen
Copy link
Member

srowen commented Apr 16, 2021

That's right. That would indeed be a bigger, problematic change.

I'm saying that this change by itself isn't doing anything but removing deprecation warnings, so we're not arguing with it (except perhaps to debate the code churn).

If the argument is - we may have to make a bigger more painful change later anyway, so this isn't worth it, I buy that too. But this by itself seems plausible to merge.

@SethTisue
Copy link

fwiw, I will oppose any deprecation of Symbol itself. I don't think you should assume I will lose.

The literal syntax was worth deprecating because it decreases the amount of Scala syntax, which makes everything easier for tooling authors, educators, etc. Whereas deprecating Symbol itself would merely be annoying, to no great benefit.

@sarutak
Copy link
Member Author

sarutak commented Apr 19, 2021

fwiw, I will oppose any deprecation of Symbol itself. I don't think you should assume I will lose.

The literal syntax was worth deprecating because it decreases the amount of Scala syntax, which makes everything easier for tooling authors, educators, etc. Whereas deprecating Symbol itself would merely be annoying, to no great benefit.

O.K, I don't care about replacing Symbol for now. Thank you.

@cloud-fan
Copy link
Contributor

I think there are 2 topics here:

  1. Public Spark APIs that use Symbols. We can't change them as they are public APIs. We are fine if Scala won't remove the Symbol class. I assume these public APIs will be less popular over time, as they are hard to use without the symbol literal syntax. $"col_name" is better.
  2. Internal Spark APIs that use Symbols, especially the testing DSL. I'd prefer to not use Symbol anymore, as it becomes hard to use without the symbol literal syntax. e.g. LocalRelation('a.int) looks good, but LocalRelation(Symbol("a").int) looks cumbersome. We can create a new DSL syntax to make it easier to write tests in Spark. (see [SPARK-34484][SQL] Rename map to mapAttr in Catalyst DSL #31601)

What do you think?

@sarutak
Copy link
Member Author

sarutak commented Apr 20, 2021

Public Spark APIs that use Symbols. We can't change them as they are public APIs. We are fine if Scala won't remove the Symbol class. I assume these public APIs will be less popular over time, as they are hard to use without the symbol literal syntax. $"col_name" is better.

Fortunately, the usage of Symbols in public non-internal APIs is limited.

  • as and alias in Dataset.scala
  • as in Column.scala
  • typedlit in functions.scala

So I think we can decide to deprecate these APIs later.

Internal Spark APIs that use Symbols, especially the testing DSL. I'd prefer to not use Symbol anymore, as it becomes hard to use without the symbol literal syntax. e.g. LocalRelation('a.int) looks good, but LocalRelation(Symbol("a").int) looks cumbersome. We can create a new DSL syntax to make it easier to write tests in Spark. (see #31601)

As I noticed in #31601, most usage of Symbols seems to be in testing DSL.
Symbol constructor is used a little but most is symbol literals.
So I think we can safely replace them with another syntax.

@rxin
Copy link
Contributor

rxin commented Apr 20, 2021 via email

@sarutak
Copy link
Member Author

sarutak commented Apr 20, 2021

DataFrame API. Basically any time anybody wants to refer to a column... just look up stack over low. There are tons of answers with sample code doing ‘col.

I mean that we don't need to decide how we treat those public APIs for now and it's easy even if we (unfortunately) need to replace it because there are three APIs.
If Scala community doesn't remove symbol literals, it's best.

It looks like it’s possible it will be kept with an import and then Spark users wouldn’t need to suffer.

Yes. As I mentioned here I think users can use the language import feature too.

@cloud-fan
Copy link
Contributor

If the language import feature for symbol literal will be always there, I think we don't need any changes. We can probably wait until the Scala community decides to remove that language import.

@LuciferYang
Copy link
Contributor

Or we can temporarily suppress the compilation warnings to make it look cleaner

@cloud-fan
Copy link
Contributor

Or we can temporarily suppress the compilation warnings to make it look cleaner

+1

@SparkQA
Copy link

SparkQA commented Apr 20, 2021

Test build #137686 has finished for PR 31569 at commit 748bc9a.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@LuciferYang
Copy link
Contributor

LuciferYang commented Apr 20, 2021

@cloud-fan create a new jira SPARK-35151 (#32261)

@SethTisue
Copy link

If the language import feature for symbol literal will be always there

As I understand it, the language import was added to give the Spark project extra time to migrate; I don't think the intention was for Spark to then turn around and say "yay, now that we have this import we don't need to do anything".

@cloud-fan
Copy link
Contributor

the language import was added to give the Spark project extra time to migrate

If that's the case, I think we need to change our internal DSL to not use symbol literals, which is done by #31601 . Shall we do it now or defer it? cc @rxin

@smarter
Copy link

smarter commented Apr 29, 2021

If the language import feature for symbol literal will be always there,

It won't always be there, we've explicitly said that it can go away at any point: lampepfl/dotty-feature-requests#182 (comment).

@srowen
Copy link
Member

srowen commented Apr 29, 2021

Yup I think that if eventually it has to go away, then, maybe doesn't make sense to replace the deprecated syntax, just suppress warnings, and take it out entirely in the future all at once.

@LuciferYang
Copy link
Contributor

Do we need to continue this work?

@SparkQA
Copy link

SparkQA commented Jul 1, 2021

Test build #140475 has finished for PR 31569 at commit 748bc9a.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.