[SPARK-26555][SQL] make ScalaReflection subtype checking thread safe#24085
[SPARK-26555][SQL] make ScalaReflection subtype checking thread safe#24085mwlon wants to merge 3 commits intoapache:masterfrom
Conversation
|
|
||
| private[catalyst] def isSubtype(tpe1: `Type`, tpe2: `Type`): Boolean = { | ||
| this.synchronized { | ||
| tpe1 <:< tpe2 |
There was a problem hiding this comment.
Hm, crazy. This is a global lock though and I'm afraid it will bottleneck/deadlock something. I'm wondering if there is any other way, or if this even catches all the thread-safety issues
There was a problem hiding this comment.
In my experiments it does. Are you familiar with a situation where it might not?
One thing I'm not totally certain about is whether it's as efficient (performance-wise) to lock here (which means accessing the lock many times per function) vs. locking once each function.
There was a problem hiding this comment.
Yeah, that's the problem, I would bet this creates some thread contention bottlenecks. It's a problem although somewhat niche w.r.t. Spark as I'm not sure you'd perform operations in parallel, usually, that called this code. It's more of a Scala bug; lightweight workarounds on the Spark end are OK but I'm unsure if this, even if it's the whole fix, is worth the contention. That said it's only an educated guess.
There was a problem hiding this comment.
It is important to workaround this in Spark, since it breaks thread safety in many important functions like createDataset. I can give examples of where it is necessary to use multithreading.
Trying this with concurrency 5, this seems to slow down subtype checking by ~4%, which I wouldn't worry about since it's never the bottleneck in a Spark application. I'm happy to change this to larger blocks of locking rather than a helper function, but it's important that we resolve this issue.
There was a problem hiding this comment.
I tend to agree it's worth the cost, for correctness. I'd hope there's a finer-grained way to lock to avoid this, but I don't know if there is, as it's quite specific to Scala internals. Let me ask dev@ for any ideas or input.
| */ | ||
| def dataTypeFor[T : TypeTag]: DataType = dataTypeFor(localTypeOf[T]) | ||
|
|
||
| private[catalyst] def isSubtype(tpe1: `Type`, tpe2: `Type`): Boolean = { |
There was a problem hiding this comment.
can we add some comments to explain it and link to the scala bug report?
|
There used to be a similar logic: https://github.com/apache/spark/pull/1423/files for a similar issue for 2.10 |
|
Good point @skonto , that looks like it was added to address a Scala bug that had been fixed, and then removed as a result. That's good precedent for reintroducing it. |
|
interesting. We removed that hack because we believe the bug has been fixed in Scala 2.11. Seems we were wrong... |
|
@cloud-fan yeah I think something else was fixed and this is a different issue of the same type: scala/bug#10766 (comment) |
|
Test build #4626 has finished for PR 24085 at commit
|
|
I'd like to merge this back to 2.3 if there are no concerns. It was indeed a mechanism that already existed and looks like it is needed post-2.10 anyway. |
|
Shall we go ahead and merge this? |
|
thanks, merging to master! We need individual PRs for 2.4 and 2.3, as |
Make ScalaReflection subtype checking thread safe by adding a lock. There is a thread safety bug in the <:< operator in all versions of scala (scala/bug#10766). Existing tests and a new one for the new subtype checking function. Closes apache#24085 from mwlon/SPARK-26555. Authored-by: mwlon <mloncaric@hmc.edu> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
|
I've opened a 2.4.x backport for this at #24913 |
…thread safe This is a Spark 2.4.x backport of #24085. Original description follows below: ## What changes were proposed in this pull request? Make ScalaReflection subtype checking thread safe by adding a lock. There is a thread safety bug in the <:< operator in all versions of scala (scala/bug#10766). ## How was this patch tested? Existing tests and a new one for the new subtype checking function. Closes #24913 from JoshRosen/joshrosen/SPARK-26555-branch-2.4-backport. Authored-by: mwlon <mloncaric@hmc.edu> Signed-off-by: Josh Rosen <rosenville@gmail.com>
…thread safe This is a Spark 2.4.x backport of apache#24085. Original description follows below: ## What changes were proposed in this pull request? Make ScalaReflection subtype checking thread safe by adding a lock. There is a thread safety bug in the <:< operator in all versions of scala (scala/bug#10766). ## How was this patch tested? Existing tests and a new one for the new subtype checking function. Closes apache#24913 from JoshRosen/joshrosen/SPARK-26555-branch-2.4-backport. Authored-by: mwlon <mloncaric@hmc.edu> Signed-off-by: Josh Rosen <rosenville@gmail.com>
…thread safe This is a Spark 2.4.x backport of apache#24085. Original description follows below: ## What changes were proposed in this pull request? Make ScalaReflection subtype checking thread safe by adding a lock. There is a thread safety bug in the <:< operator in all versions of scala (scala/bug#10766). ## How was this patch tested? Existing tests and a new one for the new subtype checking function. Closes apache#24913 from JoshRosen/joshrosen/SPARK-26555-branch-2.4-backport. Authored-by: mwlon <mloncaric@hmc.edu> Signed-off-by: Josh Rosen <rosenville@gmail.com>
…thread safe This is a Spark 2.4.x backport of apache#24085. Original description follows below: ## What changes were proposed in this pull request? Make ScalaReflection subtype checking thread safe by adding a lock. There is a thread safety bug in the <:< operator in all versions of scala (scala/bug#10766). ## How was this patch tested? Existing tests and a new one for the new subtype checking function. Closes apache#24913 from JoshRosen/joshrosen/SPARK-26555-branch-2.4-backport. Authored-by: mwlon <mloncaric@hmc.edu> Signed-off-by: Josh Rosen <rosenville@gmail.com>
## What changes were proposed in this pull request?
Make ScalaReflection subtype checking thread safe by adding a lock. There is a thread safety bug in the <:< operator in all versions of scala (scala/bug#10766).
## How was this patch tested?
Existing tests and a new one for the new subtype checking function.
Closes apache#24085 from mwlon/SPARK-26555.
Authored-by: mwlon <mloncaric@hmc.edu>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
RB=1924052
BUG=LIHADOOP-50758
G=superfriends-reviewers
R=fli,latang,mshen,zolin,yezhou
A=mshen
What changes were proposed in this pull request?
Make ScalaReflection subtype checking thread safe by adding a lock. There is a thread safety bug in the <:< operator in all versions of scala (scala/bug#10766).
How was this patch tested?
Existing tests and a new one for the new subtype checking function.