@atongen

@atongen atongen commented Jan 27, 2016

We have a similar need to what is proposed in #10768 by @Astralidea and reviewed by @dragos. This pull request implements the suggestion in the comments of that PR.

It takes logic similar to what is found in #5563 for the executors and applies it to the Mesos cluster scheduler.

The scheduler will now only accept offers from agents with attributes matching the constraints from the submission.
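
A minimal sketch of the kind of check described here, assuming text-valued attributes only and a `key:v1,v2;key2` constraint string. The object and method names (`ConstraintSketch`, `parseConstraints`, `matches`) are hypothetical and are not the code in this patch:

```scala
// Simplified illustration: text attributes only, hypothetical names throughout.
object ConstraintSketch {
  type Constraints = Map[String, Set[String]]

  // Parse "key1:v1,v2;key2:v3" into a map. A key with no values ("key1" or "key1:")
  // means "the attribute must be present, any value is acceptable".
  def parseConstraints(spec: String): Constraints =
    spec.split(';').map(_.trim).filter(_.nonEmpty).map { clause =>
      clause.split(":", 2) match {
        case Array(key, values) =>
          key.trim -> values.split(',').map(_.trim).filter(_.nonEmpty).toSet
        case Array(key) =>
          key.trim -> Set.empty[String]
      }
    }.toMap

  // An offer matches when every constrained attribute is present and, if values
  // were listed for it, the offer's value is one of them.
  def matches(constraints: Constraints, offerAttributes: Map[String, String]): Boolean =
    constraints.forall { case (key, allowed) =>
      offerAttributes.get(key).exists(v => allowed.isEmpty || allowed.contains(v))
    }
}
```

With this sketch, an empty constraint string accepts every offer, and an offer that lacks a constrained attribute is declined regardless of its other attributes.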

@dragos
Contributor

dragos commented Jan 27, 2016

ok to test

@dragos
Contributor

dragos commented Jan 27, 2016

I'll try it out. The code looks good otherwise, thanks for picking up that PR!

@atongen
Author

atongen commented Jan 29, 2016

Hi @dragos, I'm not sure how to grant access rights to the CI server, but let me know if there's anything I can do to help out here. Thanks.

@dragos
Contributor

dragos commented Jan 29, 2016

The failure is spurious; git failed to check out.

@andrewor14
Contributor

retest this please

Contributor

this will fail scalastyle tests; the line needs to be below 100 chars

@andrewor14
Contributor

Also cc @tnachen who wrote this code originally. From the JIRA:

CoarseMesosSchedulerBackend has the constraints feature, but dispatcher deploys use MesosClusterScheduler, which is a different method.

Is this caused by duplicate code somewhere? Can we resolve that, either in this patch or separately?

@andrewor14
Contributor

By the way, I'd just like to point out that there is another patch, #10768, that fixes the same issue. @tnachen @dragos what's the difference, and which one should we proceed with?

@SparkQA

SparkQA commented Feb 1, 2016

Test build #50500 has finished for PR 10949 at commit 7c6650d.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dragos
Contributor

dragos commented Feb 2, 2016

@andrewor14 this is the one that should go forward. The first sentence of this PR says:

We have a similar need to what is proposed in #10768 by @Astralidea and reviewed by @dragos. This pull request implements the suggestion in the comments of that PR.

Unfortunately I can't close #10768.

@dragos
Contributor

dragos commented Feb 2, 2016

Regarding sharing code: The logic to check constraints is already shared. The actual resource processing isn't. Maybe there is room to share more logic. I opened SPARK-10444.

@SparkQA

SparkQA commented Feb 11, 2016

Test build #51103 has started for PR 10949 at commit 6c934bd.

@atongen
Author

atongen commented Feb 11, 2016

Pushed changes to address scalastyle test failures.

Also, to run multiple dispatchers on the same Mesos cluster, set spark.deploy.zookeeper.dir to a different value for each dispatcher. The default is "/spark_mesos_dispatcher"; dispatchers that share it will use the same ZooKeeper path, and there will be conflicts.
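
As a rough illustration of that setting, the sketch below derives a distinct ZooKeeper directory per dispatcher and checks that no two dispatchers collide. The naming scheme (default path plus a suffix) and the helper names `confFor` and `pathsAreDistinct` are assumptions for the example; only the property key and its default come from the comment above.

```scala
object DispatcherZkDirs {
  // Default value of spark.deploy.zookeeper.dir, as stated in the comment above.
  val DefaultZkDir = "/spark_mesos_dispatcher"

  // Hypothetical naming scheme: suffix the default path with a dispatcher name.
  def confFor(dispatcherName: String): Map[String, String] = {
    require(dispatcherName.matches("[A-Za-z0-9_-]+"), s"unsafe dispatcher name: $dispatcherName")
    Map("spark.deploy.zookeeper.dir" -> s"$DefaultZkDir-$dispatcherName")
  }

  // True when no two dispatchers would share a ZooKeeper path. A dispatcher
  // that does not set the property is treated as using the default.
  def pathsAreDistinct(confs: Seq[Map[String, String]]): Boolean = {
    val dirs = confs.map(_.getOrElse("spark.deploy.zookeeper.dir", DefaultZkDir))
    dirs.distinct.size == dirs.size
  }
}
```

Two dispatchers left on the default fail the `pathsAreDistinct` check, which is the conflict case described above.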

@shaneknapp
Contributor

jenkins, test this please

@SparkQA

SparkQA commented Feb 11, 2016

Test build #51113 has finished for PR 10949 at commit 6c934bd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tnachen
Contributor

tnachen commented Feb 12, 2016

Btw can you add a quick unit test for this? We've added tests before already so should be straightforward to do so.

@dragos
Contributor

dragos commented Feb 16, 2016

Good idea about the unit test. I don't think it's too hard to add one along the lines of what's already in MesosClusterSchedulerSuite.

@dragos
Contributor

dragos commented Feb 23, 2016

Hey, @atongen will you have time to look into the additional test?

@atongen
Author

atongen commented Feb 26, 2016

Yes, I'll be able to add tests. Sorry for the delay!

@BrickXu

BrickXu commented Mar 8, 2016

Any updates here?

@Astralidea

@atongen this PR did not get merged into Spark 1.6.1. I hope it can be merged into a later version, such as 1.6.2.

@atongen
Author

atongen commented Mar 16, 2016

@tnachen, @dragos: I reviewed the tests introduced by #5563, and from what I can tell they mainly test only MesosSchedulerUtils#matchesAttributeRequirements with regard to attribute constraints, which is the only significant thing introduced by this PR.

Without further refactoring, testing at the next level up (MesosClusterScheduler#scheduleTasks) would require a functioning MesosClusterPersistenceEngine (not the black hole), and quite a bit of additional scaffolding.

Let me know if this is still a requirement. I would like to confirm before putting in the effort.

@tnachen
Contributor

tnachen commented Mar 16, 2016

I think we should add tests, and I don't think it requires that much refactoring. If you look at MesosClusterSchedulerSuite, the test "can handle multiple roles" already tries a submission and verifies that it uses the Offer passed in. We can test this with a similar setup: one Offer with attributes and one without, verifying that the scheduler performs the correct logic in each case. Tests are very important, and we're looking to really increase our coverage, as it's getting harder and harder to catch things.
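
The shape of such a test could look roughly like the sketch below. It uses hypothetical stand-ins (`FakeOffer`, `OfferSelection.eligible`) rather than the real Mesos `Offer` type or anything in MesosClusterSchedulerSuite, and it assumes simple equality matching on attribute values:

```scala
// Hypothetical stand-ins; none of these are Spark or Mesos classes.
final case class FakeOffer(id: String, attributes: Map[String, String])

object OfferSelection {
  // Keep only offers whose attributes satisfy every required key/value pair.
  def eligible(offers: Seq[FakeOffer], required: Map[String, String]): Seq[FakeOffer] =
    offers.filter(o => required.forall { case (k, v) => o.attributes.get(k).contains(v) })
}

object OfferSelectionCheck {
  // One offer carries the attribute, the other does not.
  def run(): Unit = {
    val withAttr = FakeOffer("o1", Map("rack" -> "r1"))
    val withoutAttr = FakeOffer("o2", Map.empty)
    val offers = Seq(withoutAttr, withAttr)

    // With a constraint, only the attributed offer should be used.
    assert(OfferSelection.eligible(offers, Map("rack" -> "r1")).map(_.id) == Seq("o1"))
    // With no constraint, both offers remain eligible.
    assert(OfferSelection.eligible(offers, Map.empty).size == 2)
  }
}
```

A real suite test would submit a driver and inspect which offer the scheduler launched on; the sketch only shows the two cases worth asserting.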

@atongen
Author

atongen commented Mar 17, 2016

Ok, I'll look into it further, thanks!

@SparkQA

SparkQA commented Mar 23, 2016

Test build #53852 has finished for PR 10949 at commit 82f71bc.

  • This patch fails R style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@dragos
Contributor

dragos commented Mar 30, 2016

@atongen can you please rebase? The tests look good, but I'd like to see the test suite passing.

@dragos
Contributor

dragos commented Apr 4, 2016

Hey @atongen, I think this is really close to being merged, can you please rebase?

@dragos
Contributor

dragos commented Apr 27, 2016

ping @atongen

@atongen atongen force-pushed the mesos-scheduler-constraints branch from 82f71bc to e6001e9 Compare May 9, 2016 20:13
@atongen
Author

atongen commented May 9, 2016

Hello @dragos, again, sorry for all the delays. This branch has been rebased onto master, but I've been having some trouble getting the test suite to run. Please advise.

@SparkQA

SparkQA commented May 9, 2016

Test build #58169 has finished for PR 10949 at commit e6001e9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tnachen
Contributor

tnachen commented Jun 2, 2016

@atongen please rebase and try again.

* mesos scheduler respect agent constraints
* reduce line length for scalastyle test
* update test suites
@atongen atongen force-pushed the mesos-scheduler-constraints branch from e6001e9 to ce3047d Compare June 13, 2016 16:31
@SparkQA

SparkQA commented Jun 13, 2016

Test build #60406 has finished for PR 10949 at commit ce3047d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@atongen
Author

atongen commented Jun 13, 2016

@tnachen Tests updated and rebased. Let me know if there's anything else I can do to help out.

@evilezh

evilezh commented Feb 6, 2017

Any update on this? It is a real pain with the driver. As far as I can see, the patch is ready, so the question is when you can merge it.

@AmplabJenkins

Can one of the admins verify this patch?

@atongen
Author

atongen commented Mar 6, 2018

I am going to close this PR because there doesn't appear to be any interest in getting it merged. It's unfortunate, because it was a nice feature.

@atongen atongen closed this Mar 6, 2018
@dragos
Contributor

dragos commented Mar 6, 2018

I think both @tnachen and I have moved on to the non-Spark world in the meantime. Anyway, neither of us had commit rights. I agree it's a pity to drop it, perhaps @andrewor14 could help.
