
Conversation

@ueshin
Member

@ueshin ueshin commented Oct 2, 2014

Original problem is SPARK-3764.

AppendingParquetOutputFormat calls context.getTaskAttemptID, a method that is not binary-compatible across Hadoop versions.
This makes Spark itself binary-incompatible: if Spark is built against hadoop-1, the artifact works only with hadoop-1, and vice versa.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21231/

@ueshin ueshin changed the title [SPARK-3771][SQL] AppendingParquetOutputFormat should use reflection to prevent breaking binary-compatibility. [SPARK-3771][SQL] AppendingParquetOutputFormat should use reflection to prevent from breaking binary-compatibility. Oct 3, 2014
@SparkQA

SparkQA commented Oct 3, 2014

QA tests have started for PR 2638 at commit ec213c1.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 3, 2014

QA tests have finished for PR 2638 at commit ec213c1.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • println(s"Failed to load main class $childMainClass.")
    • case class GetPeers(blockManagerId: BlockManagerId) extends ToBlockManagerMaster

@srowen
Member

srowen commented Oct 3, 2014

A particular instance of Spark will be built for a particular version of Hadoop and/or YARN. It is not at this point a universal binary anyway, and so, I do not think it is necessary to add this indirection via reflection. That is, if you are deploying on Hadoop 1, you need to build Spark for Hadoop 1, and similarly for Hadoop 2.

@ueshin
Member Author

ueshin commented Oct 3, 2014

@srowen, Thank you for your comment.
Indeed, once a completed app is deployed to a Spark cluster, there is a particular Spark instance.
But Spark app developers use the artifacts in Maven Central while developing and unit-testing. Those artifacts seem to be built for Hadoop 2, so testing against Hadoop 1 won't work.
What do you think?

@marmbrus
Contributor

marmbrus commented Oct 9, 2014

@ueshin I'm not sure I fully understand. What are the two method signatures in question, such that the code compiles but then fails at runtime? Can you perhaps include these details in a comment?

@srowen are you satisfied with that explanation?

@AmplabJenkins

Can one of the admins verify this patch?

@ueshin
Member Author

ueshin commented Oct 9, 2014

@marmbrus, Thank you for your comment.
TaskAttemptContext is a class in hadoop-1 but an interface in hadoop-2.
The signature of TaskAttemptContext.getTaskAttemptID is the same in both versions, so the call is source-compatible but NOT binary-compatible: a call to a class method compiles to the INVOKEVIRTUAL opcode, while a call to an interface method compiles to INVOKEINTERFACE.
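To make the failure mode concrete, here is a minimal, self-contained sketch in plain Java. The types TaskAttemptContextV1, TaskAttemptContextV2, V2Impl, and the trivial TaskAttemptID are stand-ins for illustration, not Hadoop's real classes: V1 plays the hadoop-1 role (a concrete class), V2 plays the hadoop-2 role (an interface). A direct call site bakes either INVOKEVIRTUAL or INVOKEINTERFACE into the caller's bytecode at compile time; looking the method up reflectively, as this patch's title suggests, defers that decision to runtime so one artifact can work against either shape.

```java
import java.lang.reflect.Method;

public class ReflectiveCallDemo {

    // Stand-in for Hadoop's TaskAttemptID.
    static class TaskAttemptID {
        final int id;
        TaskAttemptID(int id) { this.id = id; }
    }

    // hadoop-1 style: TaskAttemptContext is a concrete class, so a direct
    // call to getTaskAttemptID() compiles to INVOKEVIRTUAL.
    static class TaskAttemptContextV1 {
        public TaskAttemptID getTaskAttemptID() { return new TaskAttemptID(1); }
    }

    // hadoop-2 style: TaskAttemptContext is an interface, so the same source
    // compiles to INVOKEINTERFACE. Bytecode built against one version throws
    // IncompatibleClassChangeError when run against the other.
    interface TaskAttemptContextV2 {
        TaskAttemptID getTaskAttemptID();
    }

    static class V2Impl implements TaskAttemptContextV2 {
        public TaskAttemptID getTaskAttemptID() { return new TaskAttemptID(2); }
    }

    // Reflection-based call: the method is resolved at runtime from the
    // receiver's actual class, so neither opcode is baked into this caller.
    static TaskAttemptID getTaskAttemptID(Object context) throws Exception {
        Method m = context.getClass().getMethod("getTaskAttemptID");
        return (TaskAttemptID) m.invoke(context);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(getTaskAttemptID(new TaskAttemptContextV1()).id); // prints 1
        System.out.println(getTaskAttemptID(new V2Impl()).id);               // prints 2
    }
}
```

The reflective lookup costs a little per call, but in this code path the call happens once per task attempt, so the overhead is negligible compared to breaking binary compatibility.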

@SparkQA

SparkQA commented Oct 10, 2014

QA tests have started for PR 2638 at commit efd3784.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 10, 2014

QA tests have finished for PR 2638 at commit efd3784.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21566/

@marmbrus
Contributor

Thanks! Merged.

@asfgit asfgit closed this in 73da9c2 Oct 13, 2014