
@bbossy (Contributor) commented Aug 14, 2014

SPARK-3039: Adds the maven property "avro.mapred.classifier" to build spark-assembly with avro-mapred with support for the new Hadoop API. Sets this property to hadoop2 for Hadoop 2 profiles.

I am not very familiar with maven, nor do I know whether this potentially breaks something in the hive part of spark. There might be a more elegant way of doing this.
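For readers unfamiliar with the mechanism, here is a minimal sketch of what such a change typically looks like in a Maven build (the property and classifier names follow the PR description, but this is illustrative, not the exact diff):

```xml
<!-- Parent pom: property defaults to empty, i.e. the old-API avro-mapred artifact -->
<properties>
  <avro.mapred.classifier></avro.mapred.classifier>
</properties>

<!-- Wherever avro-mapred is declared, the classifier selects the API flavor -->
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>${avro.version}</version>
  <classifier>${avro.mapred.classifier}</classifier>
</dependency>

<!-- Hadoop 2 profiles then pin the new-API artifact -->
<profile>
  <id>hadoop-2.2</id>
  <properties>
    <avro.mapred.classifier>hadoop2</avro.mapred.classifier>
  </properties>
</profile>
```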

@AmplabJenkins commented

Can one of the admins verify this patch?

@srowen (Member) commented Aug 14, 2014

I've looked at this part of the build a lot and can say LGTM

@bbossy (Contributor, Author) commented Aug 14, 2014

Should I also add the avro.mapred.classifier property to the yarn profile? Maybe even yarn-alpha and mapr?

Because as of this change, building according to the README would require: sbt/sbt -Dhadoop.version=2.2.0 -Pyarn -Davro.mapred.classifier=hadoop2 assembly

@srowen (Member) commented Aug 14, 2014

You already have to specify a Hadoop profile, and you added the classifier to all of them, so that's fine. Building with YARN is orthogonal, so I don't think the classifier belongs in the yarn profiles.

@bbossy (Contributor, Author) commented Aug 14, 2014

The problem I see is that if you build according to the README:

# Apache Hadoop 2.2.X and newer
$ sbt/sbt -Dhadoop.version=2.2.0 -Pyarn assembly

avro.mapred.classifier will not be set to hadoop2

Either the README should be changed to account for this, or the property should be added to the yarn and yarn-alpha profiles (not mapr, I think).

Or is there a way to fix this with Maven?
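For context on why the property ends up inside profiles rather than being derived automatically: Maven's profile activation can key off a system property, but only on an exact value (or its absence), so it cannot pattern-match an arbitrary -Dhadoop.version=2.x. A sketch of the limitation, with hypothetical profile names:

```xml
<!-- Sketch only: property-based activation matches exact values, not patterns,
     so a profile like this would fire for one specific version string only. -->
<profile>
  <id>avro-hadoop2</id>
  <activation>
    <property>
      <name>hadoop.version</name>
      <value>2.2.0</value>
    </property>
  </activation>
  <properties>
    <avro.mapred.classifier>hadoop2</avro.mapred.classifier>
  </properties>
</profile>
```

This limitation is why setting the property inside the existing -Phadoop-* profiles, as the PR does, is the practical route.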

@srowen (Member) commented Aug 14, 2014

Yeah, that's out of date, I believe. For example, -Phadoop-2.3 has to be specified together with -Dhadoop.version=2.3.0. And I think mvn is the primary build now. I imagine you could correct this in this PR. I wonder if the README should not just point to the web site rather than duplicate this info? The web docs are up to date.

@bbossy (Contributor, Author) commented Aug 14, 2014

Yeah, you're right about yarn being orthogonal to the Hadoop version.

Apart from the maven/sbt question there is another issue: The Cloudera CDH 4.2.0 with MapReduce v2 case from the README is not covered by a hadoop profile right now. I would need to change it to
sbt/sbt -Dhadoop.version=2.0.0-cdh4.2.0 -Davro.mapred.classifier=hadoop2 -Pyarn assembly or the mvn equivalent.
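As a sketch, the two invocations side by side (the mvn goal and -DskipTests flag here are assumptions based on the usual Spark Maven build of the time, not taken from this thread):

```sh
# sbt, with the classifier passed explicitly
sbt/sbt -Dhadoop.version=2.0.0-cdh4.2.0 -Davro.mapred.classifier=hadoop2 -Pyarn assembly

# a plausible mvn equivalent
mvn -Dhadoop.version=2.0.0-cdh4.2.0 -Davro.mapred.classifier=hadoop2 -Pyarn -DskipTests package
```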

@srowen (Member) commented Aug 15, 2014

I think it works with the invocation you describe. Honestly, it's not a big priority for this version, but it's nice to get it right. Want to open a JIRA to track updating/deleting the info from README.md? I think it needs to be fixed one way or the other.

@bbossy (Contributor, Author) commented Aug 15, 2014

Created the issue: https://issues.apache.org/jira/browse/SPARK-3069 (Build instructions in README are outdated)

@srowen: Thank you for your input!

@SparkQA commented Sep 5, 2014

Can one of the admins verify this patch?

@pwendell (Contributor) commented

Yeah - LGTM pending tests.

@SparkQA commented Sep 12, 2014

QA tests have started for PR 1945 at commit c32ce59.

  • This patch merges cleanly.

@SparkQA commented Sep 12, 2014

QA tests have finished for PR 1945 at commit c32ce59.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • throw new IllegalStateException("The main method in the given main class must be static")

@asfgit asfgit closed this in c243b21 Sep 15, 2014
asfgit pushed a commit that referenced this pull request Sep 15, 2014
SPARK-3039: Adds the maven property "avro.mapred.classifier" to build spark-assembly with avro-mapred with support for the new Hadoop API. Sets this property to hadoop2 for Hadoop 2 profiles.

I am not very familiar with maven, nor do I know whether this potentially breaks something in the hive part of spark. There might be a more elegant way of doing this.

Author: Bertrand Bossy <[email protected]>

Closes #1945 from bbossy/SPARK-3039 and squashes the following commits:

c32ce59 [Bertrand Bossy] SPARK-3039: Allow spark to be built using avro-mapred for hadoop2

(cherry picked from commit c243b21)
Signed-off-by: Patrick Wendell <[email protected]>
@andrewor14 (Contributor) commented

Hey @pwendell @srowen @bbossy this is actually causing issues for SBT applications that use the spark-hive_2.10 module. More details can be found here: https://issues.apache.org/jira/browse/SPARK-4359. For now, I have reverted this in branch-1.1 to prepare for the Spark 1.1.1 release. It may need to be reverted in other branches as well. Just a heads up.
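To illustrate the kind of downstream friction involved (coordinates and versions here are hypothetical, not taken from SPARK-4359): when a published pom carries a classified avro-mapred dependency, an sbt/ivy build consuming spark-hive may resolve the wrong artifact flavor and end up pinning it by hand:

```scala
// build.sbt sketch -- hypothetical versions; shows the manual pinning a
// classified transitive dependency can force on downstream builds
libraryDependencies += "org.apache.spark" %% "spark-hive" % "1.1.0"
libraryDependencies += "org.apache.avro" % "avro-mapred" % "1.7.6" classifier "hadoop2"
```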
