@meawoppl
Contributor

I have used this script to launch, destroy, start, and stop clusters successfully.

@meawoppl
Contributor Author

The changes are pretty minimal honestly. I haven't tested 100% of the possible permutations.

Much thanks go to Eric Jonas who turned us on to this project.

@JoshRosen
Contributor

Jenkins, this is ok to test.

@JoshRosen
Contributor

@nchammas is probably the right person to review this. Seems pretty straightforward to me.

@SparkQA

SparkQA commented May 22, 2015

Test build #33302 has finished for PR 6336 at commit 2e87046.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

Good catch.

Contributor Author

Yeah, without this it passes a float to the start-instances call, which gets formatted like "1.0", and Amazon sends back some XML barf.
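
For context, the float most likely comes from Python 3's `/` operator, which always performs true division. Here is a minimal sketch of that failure mode. It assumes a hypothetical `instances_per_zone` helper and an illustrative `MaxCount=` request string, neither of which is taken from the actual script:

```python
def instances_per_zone(total, num_zones):
    """Hypothetical helper: number of instances to request in each zone."""
    # Floor division keeps the result an int on both Python 2 and Python 3.
    return total // num_zones


def format_count(count):
    """Render a count the way it might be interpolated into a request string."""
    # "MaxCount" is only an illustrative parameter name here.
    return "MaxCount=%s" % count


if __name__ == "__main__":
    # On Python 3, "/" is true division and yields a float even for exact results.
    print(format_count(4 / 2))
    # "//" yields an int, which formats without a decimal point.
    print(format_count(instances_per_zone(4, 2)))
```

Under Python 2 the first call would have produced an int as well, which would explain why the problem only shows up after moving to Python 3.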

@nchammas
Contributor

Thanks for this @meawoppl. I left some minor comments.

@meawoppl
Contributor Author

I left my responses inline. As far as I can find, no tests are applied to this script. Am I right about that?

@nchammas
Contributor

Nope, no tests apart from dev/lint-python which just checks for style.

This patch LGTM.

@meawoppl
Contributor Author

Cool. Do I need to do anything like sign a contribution agreement, or put something in an authors file?

We here at 3scan are going to be making heavy use of Spark, and I am happy to allocate some dev time from myself and my team to get Spark/PySpark running really smoothly on EC2.

The next problem I have in mind is the one pertaining to large-cluster startup time. I can't find the JIRA issue off hand, but I suspect we can architect something to improve that situation significantly.

Please let me know if there are specific issues that would benefit from a concerted effort.

@nchammas
Contributor

> Cool. Do I need to do anything like sign a contribution agreement, or put something in an authors file?

Nope, I think you're all set here.

> The next problem I have in mind is the one pertaining to large-cluster startup time. I can't find the JIRA issue off hand, but I suspect we can architect something to improve that situation significantly.

Take a look at these issues:

  • SPARK-4325: Improve spark-ec2 cluster launch times
  • SPARK-5189: Reorganize EC2 scripts so that nodes can be provisioned independent of Spark master

I put quite a bit of research into the problem of long launch times and would be more than happy to help you improve spark-ec2 in that area. Let's continue this discussion on the appropriate JIRAs.

@davies
Contributor

davies commented May 26, 2015

LGTM, merge this into master and 1.4 branch.

asfgit pushed a commit that referenced this pull request May 26, 2015
…Python3

I have used this script to launch, destroy, start, and stop clusters successfully.

Author: meawoppl <[email protected]>

Closes #6336 from meawoppl/py3ec2spark and squashes the following commits:

2e87046 [meawoppl] Py3 compat fixes.

(cherry picked from commit 8dbe777)
Signed-off-by: Davies Liu <[email protected]>
@asfgit asfgit closed this in 8dbe777 May 26, 2015
@meawoppl
Contributor Author

Thanks guys!

jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
@meawoppl
Contributor Author

meawoppl commented Jul 9, 2015

This script is also in a weird state: it depends on the spark-mesos tooling as well as some external deps (compiled versions/binaries, etc.), so its administration and updating doesn't have a single strong champion across projects.

@JoshRosen
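Contributor

@meawoppl, there's a discussion of some of those issues on the spark-dev mailing list:
http://mail-archives.apache.org/mod_mbox/incubator-spark-dev/201507.mbox/%3CCAOhmDzcnYgswssNP11VbGzSLisOKjGfnuMQMQc7yHiDL5SusmA%40mail.gmail.com%3E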

@meawoppl
Contributor Author

It doesn't appear there was ever a consensus reached there. My major
complaint in all this is that I actually had to sit down and draw a giant
flow chart of how all this stuff hooks together to debug/diagnose getting
it running with Anaconda python3, an undertaking which remains non-trivial
even for the next person. The fact remains that at present it's kind of a
b!tch to get this all running smoothly, and the split-brain repository thing
isn't helping anyone.

--Matthew Goodman

Check Out My Website: http://craneium.net
Find me on LinkedIn: http://tinyurl.com/d6wlch

On Thu, Jul 9, 2015 at 4:45 PM, Josh Rosen [email protected] wrote:

@meawoppl, there's a discussion of some of
those issues on the spark-dev mailing list:
http://mail-archives.apache.org/mod_mbox/incubator-spark-dev/201507.mbox/%3CCAOhmDzcnYgswssNP11VbGzSLisOKjGfnuMQMQc7yHiDL5SusmA%40mail.gmail.com%3E

@meawoppl
Contributor Author

Should we chime in on that thread?

@srowen
Member

srowen commented Jul 10, 2015

@meawoppl yes please

@meawoppl
Contributor Author

Done.
