
MapR non-standard Hadoop security not supported #70

Open
oandre7 opened this issue Jun 25, 2019 · 14 comments

Comments

@oandre7

oandre7 commented Jun 25, 2019

Hi guys,

Sorry if this is a silly or repeated question. We have been trying to follow the example to bring up Dask on YARN, and we keep getting the error "Kerberos ticket not found, please kinit and restart", even though the user starting the cluster does have a valid ticket.

Is there a way to point to the ticket location explicitly when the cluster starts? We have a MapR Hadoop cluster and want to use Dask on YARN. Has anybody tried this combination (MapR Hadoop cluster / Dask on YARN)? Any pointers would be highly appreciated.

Thank you!
Andre

Error attached.
log-daskyarn.txt

@jcrist
Member

jcrist commented Jun 25, 2019

This has to do with MapR forking Hadoop and not providing a 100% compatible authentication mechanism. The problematic code is here:

https://github.com/jcrist/skein/blob/6ac489e139f5169caae3a7a8415c92c418250e92/java/src/main/java/com/anaconda/skein/Driver.java#L248-L261

To provide a better error message than the Hadoop APIs do (just a deadlock :/), we try to detect users forgetting to log in before instantiating UserGroupInformation. Whatever MapR has done breaks this detection logic.
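The "detect a missing login up front" idea can be illustrated outside of Java. The sketch below is not skein's check (that lives in Java in Driver.java); it is a hypothetical Python pre-flight helper that runs MIT Kerberos `klist -s`, which prints nothing and exits with status 0 only when a readable, unexpired credentials cache exists. The helper names, and the idea of calling it before creating a `YarnCluster`, are assumptions for illustration. As this thread shows, a check built on standard Kerberos assumptions can misreport on a distribution like MapR that authenticates differently.

```python
import subprocess

# Hypothetical default: MIT Kerberos `klist -s` is silent and exits 0 only
# if a readable, unexpired credentials cache exists.
DEFAULT_CHECK_CMD = ("klist", "-s")


def has_kerberos_ticket(check_cmd=DEFAULT_CHECK_CMD):
    """Return True if the ticket-check command reports a valid ticket."""
    try:
        completed = subprocess.run(
            list(check_cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # klist is not installed, so we cannot confirm that a ticket exists.
        return False
    return completed.returncode == 0


def require_kerberos_ticket(check_cmd=DEFAULT_CHECK_CMD):
    """Fail fast with a clear message instead of letting a later call hang."""
    if not has_kerberos_ticket(check_cmd):
        # Message mirrors the error text quoted at the top of this issue.
        raise RuntimeError("Kerberos ticket not found, please kinit and restart")
```

The `check_cmd` parameter exists only so the check can be swapped out or tested without a Kerberos installation.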

If you have any suggestions, I'd welcome a PR adding MapR support. I don't have a MapR cluster available for testing.

@oandre7
Author

oandre7 commented Jun 26, 2019

Hi there,

Thanks for the reply. We are in contact with MapR support and will try to find a way forward on that.
We do have a MapR cluster, and we will share here whatever help we get from them.

@jcrist changed the title from "Kerberos ticket not found. (hadoop cluster)" to "MapR non-standard Hadoop security not supported" on Jul 15, 2019
@costrouc

costrouc commented Aug 16, 2019

@jcrist I am running into this exact same issue with a MapR cluster. Is there a temporary workaround that we could do? I would be interested in contributing this feature to dask-yarn. What do you think a fix would require?

@andregouveiasantana did you hear back from MapR about this issue?

@jcrist
Member

jcrist commented Aug 16, 2019

The issue here is our check for whether a user is appropriately logged in before any requests are made. The Hadoop APIs block if the user isn't logged in (an unfortunate design), so I'd like to keep this check around to provide a nicer user experience. Due to MapR's fork of Hadoop, our check code is incorrect. Without access to a MapR cluster to test on, I'm not sure what to do here. The MapR sandbox VM doesn't have security enabled; if you know of a way to get that test setup working and reproducible, I can take a look.

@costrouc

Unfortunately I don't have any way to get a test setup working and reproducible. But I am happy to test for you and post the work around/solution and help in any way that I can.

> The issue here is our check for whether a user is appropriately logged in before any requests are made.

So as a hack for now would patching out this check (assuming that I am properly logged in) work?

> The Hadoop APIs block if the user isn't logged in

By blocks do you mean that it just hangs and doesn't give a response one way or the other?

@jcrist
Member

jcrist commented Aug 16, 2019

> But I am happy to test for you and post the work around/solution and help in any way that I can.

I'd need to experiment with a running MapR install to figure out how their fork is different, which would be hard to do remotely.

> So as a hack for now would patching out this check (assuming that I am properly logged in) work?

Yes.

> By blocks do you mean that it just hangs and doesn't give a response one way or the other?

Yes, an error is logged but the request just hangs forever.
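Because the hang is the real usability problem, one alternative to a pre-login check (not what skein does) is to bound the blocking call with a timeout so it surfaces an error instead of hanging forever. A minimal Python sketch, assuming the blocking request can be wrapped in a zero-argument callable; `call_with_timeout` is a hypothetical helper. Python cannot kill a running thread, so a hung call is abandoned in a daemon thread rather than cancelled, and any timeout value is a guess that a slow but healthy cluster could exceed.

```python
import threading


def call_with_timeout(fn, timeout, message):
    """Run fn() in a daemon thread; raise TimeoutError if it outlives timeout.

    The worker is a daemon thread, so a call that hangs forever does not keep
    the interpreter alive at exit. It is abandoned, not killed.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = fn()
        except BaseException as exc:  # re-raised below in the caller's thread
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        raise TimeoutError(message)
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```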

@oandre7
Author

oandre7 commented Aug 23, 2019

Hey guys,

Sorry for the delayed reply. We had been waiting for an answer from MapR, which took longer than expected. Unfortunately, it mostly stated the obvious: the issue is related to the specific authentication method used by MapR's implementation. I am attaching their reply for now, although I didn't see anything in it that could help.
I am trying to get them to share more details about their implementation and help pin down exactly what needs to be done.
@jcrist, thanks for leaving the issue open; we will do whatever we can to get further information. Maybe you can also tell me exactly what is needed from MapR. I have invited their developer to this thread; hopefully they will be willing to join.
I will keep you guys posted...
Andre
email.txt

@jcrist
Member

jcrist commented Aug 23, 2019

What I want to know is how to check beforehand, from a UserGroupInformation object, whether the user is authenticated in a MapR context. Have they added another method to check whether MAPRSASL authentication succeeded? Is there a way to detect that a user is running on a MapR cluster instead of standard Hadoop? Since MapR is closed source, I can't determine this myself; if you're still in contact with them, this would be good to know.
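One way to at least detect a non-standard security setup is to read the standard Hadoop property `hadoop.security.authentication` from `core-site.xml`, which stock Hadoop sets to `simple` (the default) or `kerberos`. The sketch below parses that property and flags any other value as non-standard. Whether a MapR installation actually reports a different value there is an unverified assumption that would need checking on a real MapR cluster; `get_hadoop_property` and `security_mode` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

AUTH_KEY = "hadoop.security.authentication"


def get_hadoop_property(core_site_xml, key):
    """Return the value of `key` from core-site.xml text, or None if unset."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == key:
            return (prop.findtext("value") or "").strip()
    return None


def security_mode(core_site_xml):
    """Classify the configured authentication method as a short label."""
    value = get_hadoop_property(core_site_xml, AUTH_KEY)
    if value is None:
        return "simple"  # stock Hadoop's default when the property is unset
    value = value.lower()  # Hadoop parses this value case-insensitively
    if value in ("simple", "kerberos"):
        return value
    return "non-standard: " + value
```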

@oandre7
Author

oandre7 commented Aug 24, 2019

@jcrist, I have asked MapR for this information and whether they can join the conversation here. Hopefully they can help.
@costrouc, have you managed to get it working with the workaround of patching out the authentication check?

@bytesemantics

bytesemantics commented Sep 11, 2019

I can confirm similar issues with Hortonworks Hadoop.

Some cursory research (Google queries) led me to:
https://community.cloudera.com/t5/Support-Questions/Connecting-to-Kerberos-Enabled-hive-via-JDBC-directly-from/m-p/95833

Having looked through the https://github.com/jcrist/skein codebase (used by dask-yarn for YARN connectivity), I wonder whether using the API "getLoginUser()" is the best approach.

I suggest changing it to use "getUGIFromTicketCache(ticketCache, userId)" and adding ticketCache and userId parameters as Driver arguments.

Note that in my use case we explicitly run kinit before instantiating dask-yarn/skein, and 'klist' reports a valid, unexpired Kerberos ticket.
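To pass an explicit `ticketCache` argument as suggested above, the client would first need to know where `kinit` stored the ticket. A minimal sketch of the MIT Kerberos conventions: the `KRB5CCNAME` environment variable names the cache, a `FILE:` prefix (or no prefix) means a plain file path, and the usual default is `/tmp/krb5cc_<uid>`. It ignores `default_ccache_name` overrides in `krb5.conf` and returns None for non-file cache types such as `KEYRING:` or `KCM:`. `ticket_cache_path` is a hypothetical helper, not part of skein or dask-yarn.

```python
import os


def ticket_cache_path(env=None, uid=None):
    """Best-effort path of a file-based MIT Kerberos credentials cache.

    Returns None when KRB5CCNAME names a non-file cache type.
    """
    env = os.environ if env is None else env
    name = env.get("KRB5CCNAME")
    if name:
        if name.startswith("FILE:"):
            return name[len("FILE:"):]
        if ":" in name:
            return None  # another cache type, e.g. KEYRING:, KCM:, DIR:
        return name  # no type prefix: treated as a file path
    if uid is None:
        uid = os.getuid()  # POSIX only
    # Common default; krb5.conf's default_ccache_name may override it.
    return f"/tmp/krb5cc_{uid}"
```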

@jcrist
Member

jcrist commented Sep 16, 2019

dask-yarn (and skein, the underlying YARN client library) have been used successfully on Hortonworks installations in the past (I've done it myself, and I know others who have as well). AFAIK Hortonworks hasn't done anything special with their distribution, and skein works just fine with standard Hadoop (whereas MapR has a fork with additional features we don't support, which is what this issue is about).

If you're having issues on Hortonworks, please file a new issue in https://github.com/jcrist/skein where we can discuss them.

@pkvprakash

Submitted a PR for skein: jcrist/skein#235

@pkvprakash

@jcrist Can you please review the fix for this issue?

@Jes6ka

Jes6ka commented Jul 13, 2023

Any update on patching this issue? I am also using MapR Hadoop and facing exactly the same issue in 2023.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants