@jongyoul
Member

@jongyoul jongyoul commented Apr 14, 2016

What is this PR for?

This PR enables the Zeppelin server to register interpreters without needing any of their dependencies. For instance, we currently have to build spark with spark-dependencies even when we use our own Spark cluster, because the current initialisation mechanism needs all of an interpreter's dependencies.

What type of PR is it?

[Improvement]

Todos

  • - Add RegisteredInterpreter from interpreter-setting.json in a jar or interpreter/{interpreter}/interpreter-setting.json
  • - Adjust it to Spark*Interpreter

What is the Jira issue?

How should this be tested?

  1. Prepare your own spark cluster - e.g. standalone, Yarn, Mesos -
  2. rm -rf interpreter
  3. mvn clean package -DskipTests -pl 'zeppelin-display,zeppelin-interpreter,zeppelin-server,zeppelin-web,zeppelin-zengine,angular,jdbc,spark'
  4. bin/zeppelin-daemon.sh start
  5. Check error in log
  6. apply patch
  7. mvn clean package -DskipTests -pl 'zeppelin-display,zeppelin-interpreter,zeppelin-server,zeppelin-web,zeppelin-zengine,angular,jdbc,spark'
  8. bin/zeppelin-daemon.sh start
  9. run some paragraph with simple command like sc.version

Screenshots (if appropriate)

Questions:

  • Does the licenses files need update? No
  • Is there breaking changes for older versions? No
  • Does this needs documentation? No

Description

This PR introduces three initialisation mechanisms, including the current one.

  • {interpreter_dir}/{interpreter_group}/interpreter-setting.json
  • interpreter-setting.json in your interpreter jar
  • Current static initialization

Initialization steps

  1. Get {interpreter_dir} from Configuration
  2. Get the list of {interpreter_dir}/[{interpreter_group1},{interpreter_group2}...]
  3. Find {interpreter_dir}/{interpreter_group1}/interpreter-setting.json
  4. Find interpreter-setting.json in the resources of {interpreter_dir}/{interpreter_group1}/*/.jar
  5. Adopt static init
  6. Repeat them from the second step with {interpreter_group2}
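
The lookup order in the steps above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the class `InterpreterSettingLocator`, the `Source` enum and the `locate` method are hypothetical names, and the sketch assumes the jars sit directly under the group directory.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.JarFile;

class InterpreterSettingLocator {
    static final String SETTING_FILE = "interpreter-setting.json";

    /** Where the registration data for one interpreter group was found. */
    enum Source { DIRECTORY, JAR, STATIC_INIT }

    /** Steps 3 to 5 for a single interpreter group directory. */
    static Source locate(Path groupDir) throws IOException {
        // Step 3: a plain file directly under the group directory wins.
        if (Files.isRegularFile(groupDir.resolve(SETTING_FILE))) {
            return Source.DIRECTORY;
        }
        // Step 4: otherwise look for the file as a resource inside a jar.
        // Assumption: only jars directly under the group directory are checked.
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(groupDir, "*.jar")) {
            for (Path jar : jars) {
                try (JarFile jarFile = new JarFile(jar.toFile())) {
                    if (jarFile.getEntry(SETTING_FILE) != null) {
                        return Source.JAR;
                    }
                }
            }
        }
        // Step 5: nothing found, so fall back to legacy static initialization.
        return Source.STATIC_INIT;
    }

    public static void main(String[] args) throws IOException {
        // Step 1: the interpreter directory (here taken from the command line).
        Path interpreterDir = Paths.get(args.length > 0 ? args[0] : "interpreter");
        if (!Files.isDirectory(interpreterDir)) {
            return;
        }
        // Steps 2 and 6: visit every group directory in turn.
        try (DirectoryStream<Path> groups =
                 Files.newDirectoryStream(interpreterDir, p -> Files.isDirectory(p))) {
            for (Path group : groups) {
                System.out.println(group.getFileName() + " -> " + locate(group));
            }
        }
    }
}
```

Checking the plain file before the jars means a file dropped into the group directory can take precedence without rebuilding the jar; whether the PR resolves conflicts in exactly this way is an assumption of the sketch.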

@jongyoul
Member Author

This is the first step towards reducing the cost of initialisation and loosening the coupling between the server and the interpreters. First I'll apply it to Spark*Interpreter, and then to all the other interpreters. Finally, I'll remove the registeredInterpreters mechanism.

@jongyoul
Member Author

I did my best to preserve the existing code for backward compatibility.

@jongyoul
Member Author

@Leemoonsoo Ready to review

@Leemoonsoo
Member

Thanks @jongyoul for taking care of the issue.

I think this subject is related to ZEPPELIN-598 and ZEPPELIN-533.

ZEPPELIN-598 tries to load interpreters dynamically from a maven repository,
and ZEPPELIN-533 will create a registry for that (as well as a registry for Helium applications and notebook repos).

Considering ZEPPELIN-598 and ZEPPELIN-533, I think the source of interpreter information needs to be packaged and distributed with each interpreter, rather than placed all together in interpreter-setting.json.

For example, with Helium Applications, each application provides a separate json file, which holds the application's information, to the Helium package registry.

I was thinking of the same deploy model for Interpreter and NotebookRepo.

Deploy,

  1. Deploy jar into maven repository.
  2. Deploy json file that contains information into Helium registry. (either local or central (future)).

Use,

  1. Zeppelin fetches all json files from the Helium registry to list the available Interpreters/NotebookRepos/Applications
  2. When the user selects one, Zeppelin dynamically loads the jar from the maven repository and runs it based on the information from the Helium registry.

What do you think?

@jongyoul
Member Author

@Leemoonsoo Basically, the idea of Helium is very good and promising, and I also think my implementation is a somewhat oversized patch for the problem it solves. But to solve the problem caused by using Class.forName, we need to change the initialisation mechanism so that it no longer uses it. Do you have any ideas for solving this?

@Leemoonsoo
Member

Leemoonsoo commented Apr 16, 2016

Maybe I didn't explain it very well. :-)

I think having another mechanism that registers interpreters based on information in a json file, in addition to the current Class.forName, is a good idea.

In short, if each interpreter can have its own json file instead of a single interpreter-setting.json for all interpreters, it would be much easier to align with ZEPPELIN-598 and ZEPPELIN-533 in the future.
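
To make the Class.forName concern in this exchange concrete, here is a small, self-contained sketch of registration through a static block. It is not Zeppelin code: `StaticRegistrationDemo`, `LegacyInterpreter` and `REGISTERED` are hypothetical names. It only shows the general Java behaviour that `Class.forName(String)` loads and initializes a class, so a static registration block cannot run unless the class itself can be loaded and linked.

```java
import java.util.ArrayList;
import java.util.List;

class StaticRegistrationDemo {
    // Stand-in for a server-side registry of interpreters.
    static final List<String> REGISTERED = new ArrayList<>();

    // Hypothetical interpreter that registers itself in a static block,
    // mirroring the legacy pattern discussed in this thread.
    static class LegacyInterpreter {
        static {
            REGISTERED.add("legacy");
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // A class literal alone does not initialize the class.
        String name = LegacyInterpreter.class.getName();
        System.out.println("before: " + REGISTERED);

        // Class.forName(String) loads AND initializes the class, which runs
        // the static block. The class must be loadable for this to happen.
        Class.forName(name);
        System.out.println("after: " + REGISTERED);
    }
}
```

A json-based mechanism avoids this step for registration: the server can read the group, name and class name as plain data and defer loading the class until the interpreter is actually used.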

@jongyoul
Member Author

jongyoul commented Apr 18, 2016

@Leemoonsoo Thanks for explaining what the problem is. But interpreter-setting.json doesn't include the settings of all interpreters. In the case of Spark, interpreter-setting.json includes only the settings of the spark group. My assumption is that all interpreters in the same group (or the same jar) are deployed at the same time, in the same jar. In more detail, if I apply the new mechanism to JDBCInterpreter, I will also write a new interpreter-setting.json inside the JDBCInterpreter jar or under {interpreter}/jdbc/. Does that make sense?

As for Helium, I've reviewed that code and found that it has a different structure and doesn't break the current one, so this PR doesn't conflict with it on registering interpreters. I'm also willing to move the new mechanism into the Helium structure after Helium is merged. I think the first step would be to move interpreter-setting.json into HeliumRegistry.

@jongyoul
Member Author

@Leemoonsoo I've updated the description with more information about the initialization steps. Please review it.

@jongyoul
Member Author

After this PR is accepted, I'll apply the new mechanism to all the other interpreters in separate PRs for easier review.

init();
}

private void init() throws InterpreterException, IOException, RepositoryException {
Member

This method is now quite big and may be hard to follow. Would it be possible, and would it make sense, to extract a few high-level methods here and have it call them?

It might aid the readability and simplify understanding of new interpreter registration process.

What do you think?

Member Author

@bzz I agree with you. I'll split this method into several smaller ones.

Member

It looks very good now!

@bzz
Member

bzz commented Apr 18, 2016

@jongyoul thank you for an awesome update - static interpreter registration was a hack and it's great to see a better, testable way to do it.

Could you please help me understand: do all the interpreter implementations eventually need to be refactored, one by one, to the new registration system?

If so, do you think there should be an approach we aim for to get there, i.e. create JIRA issues for each, mark them as entry-level tasks (we use the JIRA label beginner) and address them one by one, or do you have something else in mind?

@Leemoonsoo
Member

@jongyoul Sounds like a good plan! Thanks for the explanation.

@jongyoul
Member Author

@bzz "Personally", I think static initialization is the final way, thus I want to change all of them. It also sounds like a great idea to make them beginners' tasks: they are very clear and would help newcomers understand the backend of Zeppelin.

@jongyoul
Member Author

re-trigger

@jongyoul jongyoul closed this Apr 22, 2016
@jongyoul jongyoul reopened this Apr 22, 2016
@jongyoul jongyoul closed this Apr 23, 2016
@jongyoul jongyoul reopened this Apr 23, 2016
@jongyoul
Member Author

again

@jongyoul jongyoul closed this Apr 27, 2016
@jongyoul jongyoul reopened this Apr 27, 2016
jongyoul added 7 commits May 11, 2016 16:29
- Added a new initialization mechanism to use interpreter-setting.json
- Adjusted new mechanism to SparkInterpreter for verification
- Changed Spark*Interpreter to use interpreter-setting.json
- Excluded interpreter-setting.json from rat check
- Excluded interpreter-setting.json from rat check
- Extracted new initialization logic into another methods
jongyoul added 5 commits May 20, 2016 11:30
- Fixed logic to check for supporting legacy mechanism
- Fixed some unicode characters in interpreter-setting.json
- Changed logger setting only for test. This will be reverted after test
@jongyoul
Member Author

@echarles Hi, I have a problem testing the SparkRInterpreter that you wrote. Could you please help me solve the issue below?

Results :

Failed tests: 
  ZeppelinSparkClusterTest.sparkRTest:105 expected:<[[1] ]3> but was:<[localDF <- data.frame(name=c("a", "b", "c"), age=c(19, 23, 18))


df <- createDataFrame(sqlContext, localDF)


count(df)



simpleWarning in library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, : there is no package called ‘htmltools’






simpleWarning in library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, : there is no package called ‘repr’


]3>

It looks like the result is correct but the output messages are not the same as expected. I think the output format of R is related to zeppelin.R.render.options. On my local machine, the test passes. Do you have any idea what causes this?

@echarles
Member

@jongyoul your conclusions are correct. I would also say there is something happening with zeppelin.R.render.options not being taken into account (the echo = FALSE is there to ensure the commands you type are not printed in the result).

Here, it seems the R result is returned together with the given command.
Maybe the interpreter settings override the values defined in the interpreter code?

@jongyoul
Member Author

@echarles Thanks for the advice. I've found the reason why zeppelin.R.render.options is not taken into account: it is because of building with -Pr -Pspark. With this PR, InterpreterFactory reads initialization data from interpreter-setting.json and, for backward compatibility, also reads the static initialization block for a specific interpreter group and name. Thus if we include the same interpreter group and name twice, as with those profiles, some variables may be overridden. I'll fix it to support the case where we build Zeppelin with both profiles.
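
The collision described here, where the same interpreter group and name arrive from two sources and the later one overrides the earlier, can be illustrated with a registry in which the first registration wins. This is only a sketch of one way to handle it, not the actual InterpreterFactory code: `InterpreterRegistry`, its methods and the class names passed to it are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class InterpreterRegistry {
    // Key is "group.name"; value is the implementing class name.
    private final Map<String, String> registered = new LinkedHashMap<>();

    /** Registers an interpreter unless its key is already taken; returns true if added. */
    boolean register(String group, String name, String className) {
        String key = group + "." + name;
        // putIfAbsent keeps the existing value, so a duplicate is ignored.
        return registered.putIfAbsent(key, className) == null;
    }

    String classNameOf(String group, String name) {
        return registered.get(group + "." + name);
    }

    public static void main(String[] args) {
        InterpreterRegistry registry = new InterpreterRegistry();
        // Hypothetical class names, for illustration only.
        registry.register("spark", "r", "example.FromJsonInterpreter");
        boolean added = registry.register("spark", "r", "example.FromStaticBlockInterpreter");
        System.out.println(added);                              // prints "false"
        System.out.println(registry.classNameOf("spark", "r")); // prints "example.FromJsonInterpreter"
    }
}
```

With first-wins semantics, the order in which the sources are read decides which settings survive, so the json-based registration would have to run before the legacy static blocks for its values to take precedence.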

@echarles
Member

@jongyoul Building with both -Pr -Pspark may lead to unexpected behavior, even without the changes you brought...

@jongyoul
Member Author

@echarles You're right but we need to pass the CI first. :-) I don't think this is a real case and I'll handle it through another PR which may be about changing CI.

jongyoul added 2 commits May 26, 2016 18:18
- Ignored while registering a new interpreter with existing interpreter key
@jongyoul
Member Author

re-trigger

@jongyoul jongyoul closed this May 26, 2016
@jongyoul jongyoul reopened this May 26, 2016
@jongyoul
Member Author

jongyoul commented May 26, 2016

Finally, I've passed CI.

Summary of changes,

  • Read interpreter info from file with json format
  • Fixed some tests in order to pass without static initialization

It never breaks the use of any existing interpreter, so we can make this change without side effects.

@bzz I'll create subtasks of ZEPPELIN-804, tagged for beginners, to move interpreters from the old initialization mechanism to the new one.

@Leemoonsoo @bzz Please review this PR

@bzz
Member

bzz commented May 27, 2016

The code looks great to me.

Are there any potential side-effects of this change for the existing users?

@Leemoonsoo
Member

@jongyoul Code looks good to me.
It would be better if the new interpreter registration mechanism (e.g. the format of interpreter-setting.json, where the file can be placed, etc.) were documented in https://github.com/apache/incubator-zeppelin/blob/master/docs/development/writingzeppelininterpreter.md.

@jongyoul
Member Author

@Leemoonsoo Sure. I'll update it.

@jongyoul
Member Author

@bzz In my opinion, there are no side effects of this feature.

@jongyoul
Member Author

Merging if there's no more discussion.

@bzz
Member

bzz commented May 30, 2016

:shipit:

@asfgit asfgit closed this in f8e1f6c May 31, 2016
asfgit pushed a commit that referenced this pull request Jun 2, 2016
### What is this PR for?
Currently, the available interpreter list is not shown in the `Creating New Interpreter` section. It seems this bug was introduced after #835 was merged, so I temporarily deactivated [3 SerializedName code lines](6d7f1bc).

### What type of PR is it?
Bug Fix

### Todos
* [x] - Fix interpreter listing bug when creating new interpreter

### What is the Jira issue?
[ZEPPELIN-931](https://issues.apache.org/jira/browse/ZEPPELIN-931)

### How should this be tested?
1. Build latest master branch and browse Zeppelin home
2. Create new interpreter -> You can not see the available interpreter list in this step like below attached screenshot
3. Apply this patch
4. Build again and browse  -> You can see the available interpreter list as normal

### Screenshots (if appropriate)
 - **Before**
<img width="1273" alt="screen shot 2016-06-01 at 12 36 42 pm" src="https://cloud.githubusercontent.com/assets/10060731/15723066/9082435e-27f5-11e6-9783-df44638dbbec.png">

 - **After**
<img width="1273" alt="screen shot 2016-06-01 at 12 33 06 pm" src="https://cloud.githubusercontent.com/assets/10060731/15723067/92bcc8ce-27f5-11e6-82f5-6c0db7b4342c.png">

### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No

Author: AhyoungRyu <[email protected]>
Author: Jongyoul Lee <[email protected]>
Author: Ah young <[email protected]>

Closes #945 from AhyoungRyu/ZEPPELIN-931 and squashes the following commits:

711eb54 [Ah young] Merge pull request #2 from jongyoul/ZEPPELIN-931
6121f9b [Jongyoul Lee] - Fixed documentation
6e7dac9 [Ah young] Merge pull request #1 from jongyoul/ZEPPELIN-931
fed1b40 [Jongyoul Lee] - Fixed fieldName in interpreter-setting.json
6d7f1bc [AhyoungRyu] ZEPPELIN-931: fix interpreter listing bug

4 participants