@hayssams
Contributor

What is this PR for?

This PR replaces PR-1479. It removes every Hadoop dependency by using WebHDFS as the communication protocol (code borrowed from PR1600).
Zeppelin currently supports many backends for storing notes through Apache Commons VFS.
Apache Commons VFS supports HDFS in read-only mode only.
This PR makes HDFS a first-class citizen by allowing users to load notes from and save notes to HDFS.

What type of PR is it?

Improvement

Todos

Task

What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-1515

How should this be tested?

Set the zeppelin.notebook.dir property to a value such as hdfs://localhost:9000/tmp/notebook, and set the zeppelin.notebook.storage property to org.apache.zeppelin.notebook.repo.HdfsNotebookRepo.

Check that your notes are loaded from and stored to HDFS by listing them with the command:
hdfs dfs -ls /tmp/notebook
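
Because this PR talks to HDFS over WebHDFS, the same listing can also be requested through the WebHDFS REST API with the LISTSTATUS operation. The following is a minimal sketch, not code from this PR: the helper name WebHdfsUrls is hypothetical, and the NameNode HTTP host and port (localhost:50070, the Hadoop 2.x default) are assumptions that depend on your cluster.

```java
import java.net.URI;

/** Hypothetical helper (not part of this PR) that maps a notebook dir to a WebHDFS URL. */
final class WebHdfsUrls {
    private WebHdfsUrls() {}

    /**
     * Builds the WebHDFS LISTSTATUS URL for a notebook directory.
     *
     * @param notebookDir value of zeppelin.notebook.dir, e.g. hdfs://localhost:9000/tmp/notebook
     * @param httpHost    NameNode HTTP host (assumption: supplied by the caller)
     * @param httpPort    NameNode HTTP port (assumption: 50070 on Hadoop 2.x)
     */
    static String listStatusUrl(String notebookDir, String httpHost, int httpPort) {
        // Only the path part of the hdfs:// URI is reused; the RPC port (9000) is not
        // the HTTP port that WebHDFS listens on.
        String path = URI.create(notebookDir).getPath();
        return "http://" + httpHost + ":" + httpPort + "/webhdfs/v1" + path + "?op=LISTSTATUS";
    }
}
```

Calling WebHdfsUrls.listStatusUrl("hdfs://localhost:9000/tmp/notebook", "localhost", 50070) yields a URL that can be fetched with any HTTP client; the JSON response should list the same notes as the hdfs dfs -ls command above.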

Screenshots (if appropriate)

Questions:

  • Do the license files need an update? No
  • Are there breaking changes for older versions? No
  • Does this need documentation? Yes

@jongyoul
Member

I don't think it's a good idea to add interpreters as dependencies of zeppelin-zengine.

@hayssams
Contributor Author

@jongyoul I moved HDFSCommand to zeppelin-interpreter.

@felixcheung
Member

It might be important to call this webHdfs instead of hdfs.
There are significant differences in what each supports, especially with cloud providers.

@hayssams
Contributor Author

@felixcheung
HDFSCommand has existed without the Web prefix since the beginning.

@felixcheung
Member

felixcheung commented May 12, 2017 via email

@hayssams
Contributor Author

@felixcheung Do you want me to update both the docs and the code, or the docs only?

@felixcheung
Member

Both of them, if that makes sense?

@hayssams
Contributor Author

@felixcheung
Renaming done.

@zjffdu
Contributor

zjffdu commented Jun 30, 2017

Sorry @hayssams, I created another PR, #2455, that uses the HDFS library directly. I think using the WebHDFS library is a little complicated and makes testing hard. It may also lose some features; for example, I am not sure whether WebHDFS works with a kerberized cluster.

@hayssams
Contributor Author

Hello @jongyoul
OK got it.
However, keep in mind that WebHDFS works in a kerberized environment without any change to the source code. It simply needs to run in a JAAS context, which is nothing more than a jaas.conf file passed as a parameter to the Zeppelin JVM.

@felixcheung
Member

Perhaps there is value in both WebHDFS and HDFS (jar client)?

@zjffdu
Contributor

zjffdu commented Jul 1, 2017

Maybe add a property that lets the user choose which method to use. HdfsNoteBookRepo can then delegate the real work to WebHdfsNotebookRepo, which uses WebHDFS, or to NativeHdfsNotebookRepo, which uses the native Hadoop client jar. That way each implementation does its own work without affecting the other.
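
The delegation idea could look roughly like the sketch below. It is only an illustration under stated assumptions: the NoteStore interface is a hypothetical stand-in for Zeppelin's real repository interface, the property name zeppelin.notebook.hdfs.mode is invented for this example, and the two implementations are stubs that only report which backend they represent.

```java
import java.util.Properties;

/** Hypothetical minimal contract; NOT Zeppelin's real NotebookRepo interface. */
interface NoteStore {
    /** Returns a short label for the backend that serves requests. */
    String backend();
}

/** Stub standing in for an implementation that talks WebHDFS (REST). */
final class WebHdfsNotebookRepo implements NoteStore {
    @Override
    public String backend() { return "webhdfs"; }
}

/** Stub standing in for an implementation that uses the native Hadoop client jar. */
final class NativeHdfsNotebookRepo implements NoteStore {
    @Override
    public String backend() { return "native"; }
}

/** Facade that picks one implementation from a property and forwards every call to it. */
final class HdfsNotebookRepo implements NoteStore {
    // Hypothetical property name, introduced only for this sketch.
    static final String MODE_PROPERTY = "zeppelin.notebook.hdfs.mode";

    private final NoteStore delegate;

    HdfsNotebookRepo(Properties conf) {
        // Defaulting to "webhdfs" is an arbitrary choice made for the example.
        this.delegate = choose(conf.getProperty(MODE_PROPERTY, "webhdfs"));
    }

    private static NoteStore choose(String mode) {
        switch (mode) {
            case "webhdfs":
                return new WebHdfsNotebookRepo();
            case "native":
                return new NativeHdfsNotebookRepo();
            default:
                throw new IllegalArgumentException("Unknown HDFS mode: " + mode);
        }
    }

    @Override
    public String backend() {
        // All real work is forwarded, so the two backends never share code paths.
        return delegate.backend();
    }
}
```

With this shape, users keep a single storage class name in their configuration and switch backends through one property, while each implementation can be tested on its own.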

hayssams added 3 commits July 12, 2017 15:45
# Conflicts:
#	file/src/main/java/org/apache/zeppelin/file/HDFSCommand.java
#	file/src/main/java/org/apache/zeppelin/file/WebHDFSFileInterpreter.java
#	file/src/test/java/org/apache/zeppelin/file/WebHDFSFileInterpreterTest.java
@hayssams closed this Jan 16, 2018