Conversation

@navis (Contributor) commented Sep 18, 2015

Kryo fails with a buffer overflow even with the maximum buffer value (2G).

{noformat}
org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 1
Serialization trace:
containsChild (org.apache.spark.sql.catalyst.expressions.BoundReference)
child (org.apache.spark.sql.catalyst.expressions.SortOrder)
array (scala.collection.mutable.ArraySeq)
ordering (org.apache.spark.sql.catalyst.expressions.InterpretedOrdering)
interpretedOrdering (org.apache.spark.sql.types.StructType)
schema (org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema). To avoid this, increase spark.kryoserializer.buffer.max value.
at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:263)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:240)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{noformat}
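
For reference, the limit the error message points at is the Kryo serializer buffer ceiling. A minimal sketch of how it is configured (the app name is a placeholder; even the hard ceiling of just under 2048m is not enough here, which is why the patch stops serializing interpretedOrdering instead):

import org.apache.spark.SparkConf;

// Sketch of the configuration knobs named in the error message.
SparkConf conf = new SparkConf()
        .setAppName("kryo-buffer-demo") // placeholder
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .set("spark.kryoserializer.buffer", "64k")        // initial buffer size
        .set("spark.kryoserializer.buffer.max", "2047m"); // ceiling, must be < 2048m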

@SparkQA commented Sep 18, 2015

Test build #1772 has finished for PR 8808 at commit a26512b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class TaskCommitDenied(
    • class Interaction(override val uid: String) extends Transformer
    • abstract class LocalNode(conf: SQLConf) extends QueryPlan[LocalNode] with Logging

@rxin (Contributor) commented Sep 18, 2015

Thanks - I've merged this.

@asfgit closed this in e3b5d6c Sep 18, 2015
@JoshRosen (Contributor) commented Sep 18, 2015

@rxin, should this also go into 1.5.1?

@rxin (Contributor) commented Sep 18, 2015

Yes - I cherry-picked it now. Thanks.

asfgit pushed a commit that referenced this pull request Sep 18, 2015
…ialized


Author: navis.ryu <[email protected]>

Closes #8808 from navis/SPARK-10684.

(cherry picked from commit e3b5d6c)
Signed-off-by: Reynold Xin <[email protected]>
@rxin (Contributor) commented Sep 18, 2015

@navis can you give us the data type that caused this problem?

@navis (Contributor, Author) commented Sep 22, 2015

@rxin It's just a table with 100+ string columns, partitioned by a string key. It happened with a simple query like select <100+ columns> from <table> where <partition-key condition>.
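
A hypothetical reconstruction of that shape, for readers who want to reproduce it (every table, column, and value name below is invented; the 1.5.x Java API is assumed):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class WideTableRepro {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("wide-table-repro") // placeholder
                .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        // A table with 100+ string columns, partitioned by a string key and
        // queried with a simple partition filter, is the shape that triggered
        // the overflow. wide_table and partition_key are stand-ins.
        DataFrame result = sqlContext.sql(
                "SELECT * FROM wide_table WHERE partition_key = 'some-value'");
        result.show();
        sc.stop();
    }
}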

@uhonnavarkar commented

Does this problem still exist in Spark 1.5.2 / 1.6.0?

@JoshRosen (Contributor) commented

@uhonnavarkar, this patch was incorporated into 1.5.1 and 1.6.0, if that's what you're asking.
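
If in doubt about which release a cluster is actually running, the version can be printed from the context (a minimal sketch; the master URL and app name are placeholders):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class VersionCheck {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setMaster("local[*]").setAppName("version-check"));
        // The fix landed in 1.5.1 and 1.6.0, so anything older still has the bug.
        System.out.println("Running Spark " + sc.version());
        sc.stop();
    }
}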

@uhonnavarkar commented

I have two questions:

  1. I have downloaded the pre-built spark-1.5.2-bin-hadoop2.6.tgz from the Spark website (http://spark.apache.org/downloads.html), but I still see the stack trace in question #2 below. Does this package contain the fix?

  2. When I try to query the table (DataFrame), I get the exception. It is also inconsistent: sometimes it works, sometimes it does not.
    Stack trace:
    Job aborted due to stage failure: Task 0 in stage 1063.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1063.0 (TID 15469, XX.XXX.XX.XXX): org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 2. To avoid this, increase spark.kryoserializer.buffer.max value.
            at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:263)
            at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:240)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)
    Driver stacktrace:

And this is how I am creating the Spark context:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// sparkMaster, spark_home, spark_executor_memory, and the other variables
// are supplied elsewhere in the application.
sparkConf = new SparkConf().setMaster(sparkMaster)
        .setAppName("ABCTesting")
        .set("spark.home", spark_home)
        .set("spark.shuffle.consolidateFiles", "true")
        .set("spark.shuffle.manager", "sort")
        .set("spark.shuffle.spill", "false")
        .set("spark.executor.memory", spark_executor_memory)
        .set("spark.executor.extraClassPath", spark_executor_extra_classpath)
        .set("spark.cores.max", spark_cores_max)
        .set("spark.sql.shuffle.partitions", "15")
        .set("spark.driver.memory", spark_driver_memory)
        .set("spark.default.parallelism", "90")
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
sparkContext = new JavaSparkContext(sparkConf);
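
One thing stands out in that configuration: spark.kryoserializer.buffer.max is never set, so the default ceiling still applies. On a build without this patch, raising it explicitly is the stopgap the error message itself suggests (a minimal sketch; 512m is an arbitrary example value, and the hard limit is just under 2048m):

// Stopgap for unpatched builds, appended to the sparkConf built above.
// 512m is an arbitrary example; the real fix is the patch in this PR.
sparkConf.set("spark.kryoserializer.buffer.max", "512m");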

ashangit pushed a commit to ashangit/spark that referenced this pull request Oct 19, 2016
…ialized


Author: navis.ryu <[email protected]>

Closes apache#8808 from navis/SPARK-10684.

(cherry picked from commit e3b5d6c)
Signed-off-by: Reynold Xin <[email protected]>
(cherry picked from commit 2c6a51e)