@prabhjyotsingh
Contributor

@prabhjyotsingh prabhjyotsingh commented Aug 11, 2016

What is this PR for?

When a notebook runs the shell, spark, or python interpreter, the interpreter process runs as the same OS user as the Zeppelin server, which means these interpreters have the same file-system permissions as the Zeppelin server.
IMO, for a complete security system, users should be able to impersonate themselves.

What type of PR is it?

[Improvement]

Todos

  • - Update doc
  • - FIX NPEs
  • - FIX CI

What is the Jira issue?

How should this be tested?

  • Enable shiro auth in shiro.ini
  • Set up an ssh key for the user you want to impersonate (say user1):
adduser user1
ssh-keygen
ssh user1@localhost mkdir -p .ssh
cat ~/.ssh/id_rsa.pub | ssh user1@localhost 'cat >> .ssh/authorized_keys'
  • Start the Zeppelin server.
  • Go to the interpreter settings page and enable "User Impersonate" for one of the interpreters (the shell interpreter in my example).
  • Run the following paragraph in a notebook:
%sh
whoami

Check that it runs as the new user, i.e. "user1".
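Before enabling "User Impersonate", it may help to confirm that the ssh setup above really works without a password prompt. A minimal sketch, assuming an OpenSSH client (`BatchMode` and `ConnectTimeout` are standard `ssh -o` options); the helper name `can_ssh_as` is hypothetical and not part of Zeppelin:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Zeppelin): returns 0 only if the current OS
# user can run a command as "$1" over ssh to localhost without any prompt.
can_ssh_as() {
  local user="$1"
  # BatchMode=yes makes ssh fail instead of asking for a password or a
  # host-key confirmation; ConnectTimeout bounds the wait if sshd is down.
  ssh -o BatchMode=yes -o ConnectTimeout=5 "${user}@localhost" true >/dev/null 2>&1
}

if can_ssh_as user1; then
  printf 'passwordless ssh to user1@localhost works\n'
else
  printf 'passwordless ssh to user1@localhost is not set up\n'
fi
```

If this reports that ssh is not set up, the interpreter launch over ssh would most likely fail or hang on a prompt as well.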

Screenshots (if appropriate)

user impersonate

Questions:

  • Does the licenses files need update? no
  • Is there breaking changes for older versions? no
  • Does this needs documentation? yes

@felixcheung
Member

Shouldn't the interpreter process impersonate the user logged in to the web front end?

@prabhjyotsingh
Contributor Author

@felixcheung Fair point, let me try and do it. I'll change the title to WIP for now.

@prabhjyotsingh prabhjyotsingh changed the title [ZEPPELIN-1320] Security fix for Shell/Spark and Python Interpreter [WIP] [ZEPPELIN-1320] Security fix for Shell/Spark and Python Interpreter Aug 12, 2016

- private Interpreter getInterpreter(String noteId, InterpreterSetting setting, String name) {
+ private Interpreter getInterpreter(String noteId, InterpreterSetting setting, String name,
+ String userName) {
Member

Probably a nitpick, but horizontal alignment is a controversial idea and is generally discouraged by the style guide, as it creates a "blast radius" of reformatting on future changes, e.g. renaming a function.

# Conflicts:
#	zeppelin-web/src/app/interpreter/interpreter.controller.js
#	zeppelin-zengine/src/main/java/org/apache/zeppelin/interpreter/InterpreterFactory.java
#	zeppelin-zengine/src/main/java/org/apache/zeppelin/interpreter/InterpreterOption.java
@prabhjyotsingh prabhjyotsingh changed the title [WIP] [ZEPPELIN-1320] Security fix for Shell/Spark and Python Interpreter [ZEPPELIN-1320] Run zeppelin interpreter process as web front end user Aug 16, 2016
@prabhjyotsingh
Contributor Author

CI green! Ready for review.

u)
ZEPPELIN_SSH_COMMAND="ssh ${OPTARG}@localhost "
;;
esac
Contributor

This requires that the login user exists as an OS account and is able to ssh to localhost. I am not sure whether this is a good way; the approach just feels a little strange compared to the impersonation implementation in Hadoop.

Contributor Author

@zjffdu Yes, I agree, it's not like the implementation in Hadoop. Would you recommend something else?
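The interpreter.sh hunk above only shows the `u)` branch of a `case` statement. As a minimal sketch, assuming the option is parsed with bash's `getopts` (the wrapper function `parse_user_opt` is hypothetical; only `ZEPPELIN_SSH_COMMAND` and its value come from the hunk), the `-u <user>` handling could look like this:

```shell
#!/usr/bin/env bash
# Sketch only: parse_user_opt is a hypothetical wrapper around the "u)" branch.
parse_user_opt() {
  # Reset OPTIND per call so the function can be invoked more than once.
  local OPTIND=1 opt
  ZEPPELIN_SSH_COMMAND=""
  while getopts "u:" opt; do
    case "${opt}" in
      u)
        # Prefix that runs the interpreter process as the web front-end user.
        ZEPPELIN_SSH_COMMAND="ssh ${OPTARG}@localhost "
        ;;
    esac
  done
}

parse_user_opt -u user1
printf '[%s]\n' "${ZEPPELIN_SSH_COMMAND}"
```

The trailing space in the value presumably lets the script prepend it directly to the launch command; when `-u` is absent the prefix stays empty and the interpreter runs as the server's own user.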

@jongyoul
Member

I agree that using ssh is a simple way to support impersonation, but I'm worried about it. First, we should consider setups without an ssh server on the local machine: it's disabled on Mac by default, and Windows users might not have any ssh server at all. Second, even if all users can connect to their machine via ssh, every user name would have to match a system user. AFAIK, in some Zeppelin use cases the system admin uses virtual users as well. What do you think?

@prabhjyotsingh
Contributor Author

Yes, I thought about usage on Mac and Windows, and initially started off using RUNAS ${userName} for Windows and su - ${userName} for *nix systems, but that requires the Zeppelin server to run as root. Hence, I implemented it with ssh ${userName}@localhost.

I have not thought about cases in which the system admin uses virtual users.

Now that we are able to propagate the end web user to RemoteInterpreterManagedProcess.start, we can choose some other mechanism in interpreter.sh/interpreter.cmd instead of "ssh", or maybe make it configurable through an extra setting in "zeppelin-env.sh".

What would you recommend as a secure and foolproof mechanism for running the interpreter as a different user?


import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;
import java.util.*;
Member

@bzz bzz Aug 19, 2016

Probably a nitpick, but Zeppelin's Java code conventions discourage wildcard imports.

Could you please check that all the other changed files follow this convention as well?

Contributor Author

Sure, I'll revert this and check that my editor (IntelliJ IDEA) is configured properly.

@jongyoul
Member

@prabhjyotsingh Actually, I don't know how to fully support different users' environments. But I think it's better to use RUNAS ~ and su - ~, because using ssh without a password raises some security issues. Mesos, for example, uses that approach to restrict resources, but I have never seen passwordless ssh used this way. What do you think?

@jongyoul
Member

@prabhjyotsingh Apart from the issues above, could you check whether this PR also supports scoped mode, which uses multiple threads in one process?

@Leemoonsoo
Member

One more thing: what do you think about adding an "Impersonate" option to the interpreter setting in the GUI?

That would give users the flexibility to choose between the current behavior (without impersonation) and the new behavior. Otherwise, this PR will introduce an incompatible change in user-facing behavior.

@prabhjyotsingh
Contributor Author

It's better to use RUNAS ~ and su - ~

@jongyoul How about I use RUNAS ~ and su - ~ by default, but if a property in zeppelin-env.sh, say USE_SSH_IMPERSONATION, is set to true, it uses ssh web-user@localhost instead? This way users get to decide what best suits their use case.

Could you check this PR support scoped as well which uses multiple threads in one process?

Yes, I've checked this with the Shell and Python interpreters, and it works as expected.

@Leemoonsoo Yes, agreed, I too think this option should be there, and I have implemented it as well. If you take a look at the GIF attached to this PR description, it does what you are asking for :)
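A minimal *nix sketch of the USE_SSH_IMPERSONATION proposal above, assuming a boolean variable of that name set in zeppelin-env.sh (the name is only the one suggested in this thread) and a hypothetical helper `launch_as_user_cmd` that prints the command it would run instead of executing it. The Windows RUNAS branch is left out.

```shell
#!/usr/bin/env bash
# Sketch of the proposal above (*nix only; the Windows RUNAS branch is omitted).
# launch_as_user_cmd is hypothetical: it prints the command instead of running it.
launch_as_user_cmd() {
  local user="$1"
  shift
  local cmd="$*"
  if [ "${USE_SSH_IMPERSONATION:-false}" = "true" ]; then
    # Opt-in: key-based ssh to localhost (needs sshd and authorized_keys for the user).
    printf 'ssh %s@localhost %s\n' "${user}" "${cmd}"
  else
    # Default: "su -" asks for the target user's password unless the caller is
    # root, so this path assumes the Zeppelin server runs as root.
    # Naive quoting: assumes cmd contains no single quotes.
    printf "su - %s -c '%s'\n" "${user}" "${cmd}"
  fi
}

launch_as_user_cmd user1 bin/interpreter.sh
(USE_SSH_IMPERSONATION=true; launch_as_user_cmd user1 bin/interpreter.sh)
```

Defaulting to `su -` avoids depending on an sshd, but, as noted earlier in the thread, it only works non-interactively when the server runs as root; the ssh opt-in trades that for the passwordless-key setup.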

@echarles
Member

Whether su or ssh is used, I feel the main difficulty is user provisioning on the host running the interpreter. Until now, the Shiro authentication system had no impact on user provisioning; this PR changes that.

I guess we all agree that adding user foo to shiro.ini and enabling impersonation will require running adduser foo manually.

We should make this clear in the docs, but also stress it in the UI (with a hover, or clear text/a link near the User Impersonate option).
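Because each Shiro user now needs a matching OS account, an admin could run a quick pre-flight check before enabling impersonation. A minimal sketch, assuming the user names are copied by hand from shiro.ini (no shiro.ini parsing is attempted); `os_user_exists` and `check_impersonation_users` are hypothetical helpers, not part of Zeppelin:

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of Zeppelin): with impersonation
# enabled, every Shiro user also needs a matching OS account.
os_user_exists() {
  # "id -u NAME" exits non-zero when NAME is not a known OS user.
  id -u "$1" >/dev/null 2>&1
}

check_impersonation_users() {
  local missing=0 user
  for user in "$@"; do
    if ! os_user_exists "${user}"; then
      printf "no OS account for '%s'; run: adduser %s\n" "${user}" "${user}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Example with "root", which exists on virtually every Unix host.
check_impersonation_users root && printf 'all listed users are provisioned\n'
```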

@prabhjyotsingh
Contributor Author

@echarles Yes, agreed. I will update the docs and add an extra tooltip near the checkbox where users can enable User Impersonate.

@echarles
Member

To make ZEPPELIN-1337 ("Umbrella for multiple user support for zeppelin") more readable, should we rename the following:

  • ZEPPELIN-1340: "Run Hadoop-based interpreter process on Kerberos as web front end user"
  • ZEPPELIN-1320: "Run zeppelin interpreter process as web front end user"

@echarles
Member

... and make ZEPPELIN-1320 a subtask of ZEPPELIN-1337?

@prabhjyotsingh
Contributor Author

prabhjyotsingh commented Aug 22, 2016

Yes, you are right, let me do it right away.

@jongyoul
Member

@prabhjyotsingh I agree with @echarles's idea. The interpreter would first try to find Hadoop dependencies and, if they are present, use doAs. Otherwise, let's talk about how to do it. What do you think?

@prabhjyotsingh
Contributor Author

Sure. In this PR I was only thinking about the "otherwise" case, i.e. environments where Hadoop dependencies are not present, and hence starting the interpreter as the end web user.

@echarles
Member

Btw, for the Hadoop case (or the Spark-on-YARN case), this PR may cause an issue with doAs.

Typically, you configure hadoop.proxyuser.foo.hosts and hadoop.proxyuser.foo.groups, foo being the OS/Kerberos user under which you run the Java code that calls doAs.

If we run ssh/su as the front-end user, we will not fulfill what the Hadoop/YARN cluster expects.

We thus should have two checkboxes:

  • One for OS/Kerberos impersonation (this PR only addresses OS).
  • The other for Hadoop impersonation.

If you select one, I would expect the other one to be disabled.

@prabhjyotsingh
Contributor Author

Agreed @echarles, the doAs part will be a problem until ZEPPELIN-1340 is resolved. Until then, for security, we may have to run some interpreters with "User Impersonate" enabled from the UI (for example the shell and python interpreters), and use the standard doAs already implemented for the others (like livy, spark, jdbc).

@Leemoonsoo
Member

Instead of USE_SSH_IMPERSONATION, how about letting users customize the impersonation method?
For example,

ZEPPELIN_INTERPRETER_IMPERSONATION_CMD="su - ${ZEPPELIN_USER_NAME}"

by default, but users can override this env variable, like

ZEPPELIN_INTERPRETER_IMPERSONATION_CMD="ssh -p12345 ${ZEPPELIN_USER_NAME}@localhost"

I think it gives more flexibility (e.g. passing additional options like -p, or using a different command to impersonate).
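A minimal sketch of how a configurable impersonation command could be resolved per web user. One assumption differs from the proposal above: the override is single-quoted, so `${ZEPPELIN_USER_NAME}` stays a literal placeholder until a user is known instead of being expanded when zeppelin-env.sh is sourced. `impersonation_cmd` is a hypothetical helper, not an existing Zeppelin function.

```shell
#!/usr/bin/env bash
# Sketch of the configurable impersonation command proposed above.
# Assumption: the override is single-quoted in zeppelin-env.sh, e.g.
#   export ZEPPELIN_INTERPRETER_IMPERSONATION_CMD='ssh -p12345 ${ZEPPELIN_USER_NAME}@localhost'
# impersonation_cmd is a hypothetical helper, not an existing Zeppelin function.
impersonation_cmd() {
  local user="$1"
  local placeholder='${ZEPPELIN_USER_NAME}'
  # Fall back to the proposed "su -" default when nothing is configured.
  local template="${ZEPPELIN_INTERPRETER_IMPERSONATION_CMD:-su - ${placeholder}}"
  # Replace every occurrence of the literal placeholder with the web front-end user.
  printf '%s\n' "${template//"${placeholder}"/${user}}"
}

impersonation_cmd user1
ZEPPELIN_INTERPRETER_IMPERSONATION_CMD='ssh -p12345 ${ZEPPELIN_USER_NAME}@localhost'
impersonation_cmd user1
```

Substituting the placeholder on each call lets a single server process impersonate many different web users; this is a design choice of the sketch, not necessarily how the final patch resolves the variable.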

@prabhjyotsingh
Contributor Author

@Leemoonsoo Yes, that's a good suggestion. Let me try and do it.

@astroshim
Contributor

I got the following checkstyle error while building the source.

[INFO] There are 1 checkstyle errors.
[ERROR] NotebookServer.java[1381] (sizes) LineLength: Line is longer than 100 characters (found 102).

@prabhjyotsingh Could you fix this?

@prabhjyotsingh
Contributor Author

Closing this; I will open a new one that merges in #1265.

asfgit pushed a commit that referenced this pull request Nov 18, 2016
Have recreated this from #1322
### What is this PR for?

When a notebook runs the shell, spark, or python interpreter, the interpreter process runs as the same OS user as the Zeppelin server, which means these interpreters have the same file-system permissions as the Zeppelin server.
IMO, for a complete security system, users should be able to impersonate themselves.
### What type of PR is it?

[Improvement]
### Todos
- [x] - Update doc
- [x] - FIX NPEs
- [x] - FIX CI
### What is the Jira issue?
- [ZEPPELIN-1320](https://issues.apache.org/jira/browse/ZEPPELIN-1320)
### How should this be tested?
- Enable shiro auth in shiro.ini
- Set up an ssh key for the user you want to impersonate (say user1):

```
adduser user1
ssh-keygen
ssh user1@localhost mkdir -p .ssh
cat ~/.ssh/id_rsa.pub | ssh user1@localhost 'cat >> .ssh/authorized_keys'
```
- Start the Zeppelin server.
- Go to the interpreter settings page and enable "User Impersonate" for one of the interpreters (the shell interpreter in my example).
- Run the following paragraph in a notebook:

```
%sh
whoami
```

Check that it runs as the new user, i.e. "user1".
### Screenshots (if appropriate)

![user impersonate](https://cloud.githubusercontent.com/assets/674497/20213127/f32fdc52-a82c-11e6-8e33-aebd6a943c5f.gif)

### Questions:
- Does the licenses files need update? no
- Is there breaking changes for older versions? no
- Does this needs documentation? yes

Author: Prabhjyot Singh <[email protected]>

Closes #1554 from prabhjyotsingh/ZEPPELIN-1320-2 and squashes the following commits:

dc69c9d [Prabhjyot Singh] @Leemoonsoo review comment: making ZEPPELIN_SSH_COMMAND configurable
1b26cc0 [Prabhjyot Singh] add doc
5a76839 [Prabhjyot Singh] show User Impersonate only when interpreter setting is "per user" and "isolated"
02c3084 [Prabhjyot Singh] Merge remote-tracking branch 'origin/master' into ZEPPELIN-1320-2
03b2f20 [Prabhjyot Singh] use user instead of ""
0ff80ec [Prabhjyot Singh] Merge remote-tracking branch 'origin/master' into ZEPPELIN-1320-2
dd0731d [Prabhjyot Singh] fix missing test cases
aff1bf0 [Prabhjyot Singh] user should have option to run these interpreters as different user.
@zjffdu
Contributor

zjffdu commented Nov 23, 2016

Sorry for the late comment; I was on vacation for the last 2 weeks. I found that this doesn't work for the spark interpreter. @prabhjyotsingh Did you try it with the spark interpreter and other interpreters?

@prabhjyotsingh
Contributor Author

@zjffdu Yes, you are right, with SPARK_HOME/SPARK_SUBMIT it doesn't work.

@zjffdu
Contributor

zjffdu commented Nov 23, 2016

Then I think we should either revert this PR or fix it for the spark interpreter as well, because IMO the spark interpreter is the most important interpreter in Zeppelin.

@prabhjyotsingh
Contributor Author

Sure, makes sense. I'll try to fix it ASAP: https://issues.apache.org/jira/browse/ZEPPELIN-1701

tae-jun pushed a commit to tae-jun/zeppelin that referenced this pull request Nov 23, 2016
@prabhjyotsingh prabhjyotsingh deleted the ZEPPELIN-1320 branch February 25, 2018 03:46