
Add the ability to configure EC2Cluster to start a publicly accessible Scheduler while keeping Workers private #388

Open
filpano opened this issue Nov 4, 2022 · 1 comment
Labels
enhancement New feature or request help wanted Extra attention is needed provider/aws/ec2 Cluster provider for AWS EC2 Instances

Comments

filpano commented Nov 4, 2022

For existing discussion, see this Discourse thread.


I have a use case where I develop something locally and then start up a dask_cloudprovider.aws.EC2Cluster such that all Scheduler/Worker communication happens on the internal EC2 network (i.e. via the private VPC/subnet IP addresses), while still allowing me to set up a rule so that my own IP can reach the Scheduler dashboard.

The current normal development flow is to deploy this cluster from within AWS. This is a bit burdensome since it means I need to manage a separate VM as well as code deployment to that VM (or rebuild an image every time I make a code change).

Currently, when I start an EC2Cluster with use_private_ip=False, the Scheduler advertises its public IP to the workers. A relatively straightforward security group configuration (specific to AWS) might be:

| Port range  | Source      |
|-------------|-------------|
| 8786 - 8787 | [own group] |
| 8786 - 8787 | [my ip]     |

This would technically allow the communication described above, but it does not work in practice because it does not permit public IP <-> public IP communication. Creating this group is probably outside the scope of dask_cloudprovider's responsibilities (as opposed to the default group it creates), and it would suffice to be able to supply and use a group like the one above.
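For concreteness, the two rules in the table could be expressed in the `IpPermissions` shape that boto3's `authorize_security_group_ingress` accepts. This is only a sketch: the group id and CIDR below are placeholders, and the helper function name is my own.

```python
def dask_ingress_rules(group_id, my_ip_cidr):
    """Build the two ingress rules from the table above, in the
    IpPermissions shape that boto3's authorize_security_group_ingress
    expects."""
    return [
        {
            # 8786-8787 from the group itself: scheduler <-> worker traffic
            "IpProtocol": "tcp",
            "FromPort": 8786,
            "ToPort": 8787,
            "UserIdGroupPairs": [{"GroupId": group_id}],
        },
        {
            # 8786-8787 from my own IP: dashboard access
            "IpProtocol": "tcp",
            "FromPort": 8786,
            "ToPort": 8787,
            "IpRanges": [{"CidrIp": my_ip_cidr}],
        },
    ]

# Placeholder values; these could then be applied with e.g.
#   boto3.client("ec2").authorize_security_group_ingress(
#       GroupId="sg-...", IpPermissions=rules)
rules = dask_ingress_rules("sg-0123456789abcdef0", "203.0.113.7/32")
```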

According to @jacobtomlinson , it might already be possible to do this with an ECSCluster, though I haven't verified that functionality.


As a side note, I was able to get this working locally by monkey-patching dask_cloudprovider.aws.ec2#configure_vm to return instance['PublicDnsName'] instead of instance['PublicIpAddress'], but this is a super hacky workaround that:

  • Only works because AWS resolves DNS as the public IP externally and private IP internally
  • Requires the workers to have public IP addresses that they do not need (and that are not reachable)
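The address-selection part of that hack boils down to something like the following sketch (the `advertised_address` helper is my own name for it; the actual patch replaced the return value inside `configure_vm`, and the instance dict mirrors the shape of boto3's `describe_instances` output):

```python
def advertised_address(instance):
    """Return the address the scheduler advertises to workers.

    The stock behaviour returns instance["PublicIpAddress"]; the hack
    returns instance["PublicDnsName"] instead, because AWS resolves that
    name to the public IP outside the VPC and to the private IP inside
    it. Falls back to the public IP if no DNS name is set.
    """
    return instance.get("PublicDnsName") or instance["PublicIpAddress"]

# Example instance record (placeholder values)
instance = {
    "PublicIpAddress": "203.0.113.7",
    "PublicDnsName": "ec2-203-0-113-7.eu-central-1.compute.amazonaws.com",
}
print(advertised_address(instance))  # the DNS name, not the raw public IP
```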

The solution to this would likely not use anything like the above, but I thought the information might be helpful in this context.


filpano commented Nov 4, 2022

If time allows, I'd love to work on this, partly to get more familiar with the dask_cloudprovider ecosystem. It does not seem like an overly complicated problem, especially if a similar solution already exists for ECS.

I'm relatively new to Python and Dask, though, so I might need a bit of hand-holding/a few pointers to get started... if that's acceptable, let me know.

@jacobtomlinson jacobtomlinson added enhancement New feature or request help wanted Extra attention is needed provider/aws/ec2 Cluster provider for AWS EC2 Instances labels Nov 7, 2022