ZEPPELIN-804 Refactoring registration mechanism on Interpreters #835

jongyoul · 2016-04-14T03:16:12Z

What is this PR for?

This PR enables the Zeppelin server to register interpreters without loading any of their dependencies. For instance, we currently have to build spark with spark-dependencies even when we use our own Spark cluster, because the current initialisation mechanism needs all of an interpreter's dependencies.

What type of PR is it?

[Improvement]

Todos

- Add RegisteredInterpreter from interpreter-setting.json in a jar or interpreter/{interpreter}/interpreter-setting.json
- Adjust it to Spark*Interpreter

What is the Jira issue?

https://issues.apache.org/jira/browse/ZEPPELIN-804

How should this be tested?

1. Prepare your own Spark cluster, e.g. standalone, YARN, or Mesos
2. rm -rf interpreter
3. mvn clean package -DskipTests -pl 'zeppelin-display,zeppelin-interpreter,zeppelin-server,zeppelin-web,zeppelin-zengine,angular,jdbc,spark'
4. bin/zeppelin-daemon.sh start
5. Check the error in the log
6. Apply the patch
7. mvn clean package -DskipTests -pl 'zeppelin-display,zeppelin-interpreter,zeppelin-server,zeppelin-web,zeppelin-zengine,angular,jdbc,spark'
8. bin/zeppelin-daemon.sh start
9. Run a paragraph with a simple command such as sc.version

Screenshots (if appropriate)

Questions:

Does the licenses files need update? No
Is there breaking changes for older versions? No
Does this needs documentation? No

Description

This PR introduces three initialisation mechanisms, including the current one.

- {interpreter_dir}/{interpreter_group}/interpreter-setting.json
- interpreter-setting.json in your interpreter jar
- Current static initialization

Initialization step

1. Get {interpreter_dir} from Configuration
2. Get the list of {interpreter_dir}/[{interpreter_group1},{interpreter_group2}...]
3. Find {interpreter_dir}/{interpreter_group1}/interpreter-setting.json
4. Find interpreter-setting.json in the resources of {interpreter_dir}/{interpreter_group1}/*/.jar
5. Adopt static init
6. Repeat steps 3 to 5 with {interpreter_group2} and each remaining group
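
The lookup order described above can be sketched roughly as follows. This is an illustration only, not the code in InterpreterFactory: the class name `RegistrationLookupSketch`, the method `availableSources`, and the returned labels are hypothetical. As simplifying assumptions, it only inspects jars placed directly in the group directory and only looks for the settings file at the root of each jar. It reports which sources exist for one group, in the order the steps list them; whether the real implementation stops at the first hit is not stated here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

// Illustrative sketch only; names are hypothetical, not taken from Zeppelin.
final class RegistrationLookupSketch {
  static final String SETTING_FILE = "interpreter-setting.json";

  /** Lists, in lookup order, the registration sources present for one group directory. */
  static List<String> availableSources(Path groupDir) {
    List<String> sources = new ArrayList<>();
    // Settings file placed directly in {interpreter_dir}/{interpreter_group}.
    if (Files.isRegularFile(groupDir.resolve(SETTING_FILE))) {
      sources.add("directory-file");
    }
    // Settings file packaged at the root of a jar in the group directory.
    try (DirectoryStream<Path> jars = Files.newDirectoryStream(groupDir, "*.jar")) {
      for (Path jar : jars) {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
          if (jarFile.getEntry(SETTING_FILE) != null) {
            sources.add("jar-resource:" + jar.getFileName());
          }
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // The legacy static initializer is always listed last.
    sources.add("static-init");
    return sources;
  }

  public static void main(String[] args) {
    // Usage: pass one or more interpreter group directories as arguments.
    for (String dir : args) {
      System.out.println(dir + " -> " + availableSources(Paths.get(dir)));
    }
  }
}
```

Calling `availableSources` once per group directory mirrors the final "repeat" step of the list.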

jongyoul · 2016-04-14T03:19:57Z

This is the first step towards reducing the cost of initialisation and loosening the coupling between server and interpreter. First, I'll apply it to Spark*Interpreter, and then to all other interpreters. Finally, I'll remove the registeredInterpreters mechanism.

jongyoul · 2016-04-14T03:25:30Z

I did my best to preserve the existing code for backward compatibility.

jongyoul · 2016-04-15T02:42:33Z

@Leemoonsoo Ready to review

Leemoonsoo · 2016-04-16T03:54:47Z

Thanks @jongyoul for taking care of the issue.

I think this subject is related to ZEPPELIN-598 and ZEPPELIN-533.

ZEPPELIN-598 tries to load interpreters dynamically from a maven repository,
and ZEPPELIN-533 will create a registry for that (as well as a registry for helium applications and notebook repos).

Considering ZEPPELIN-598 and ZEPPELIN-533, I think the source of interpreter information needs to be packaged and distributed with each interpreter, rather than placed all together in interpreter-setting.json.

For example, with Helium Application, each application provides a separate json file, holding information about the application, to the helium package registry.

I was thinking the same deploy model for Interpreter and NotebookRepo.

Deploy,

Deploy jar into maven repository.
Deploy json file that contains information into Helium registry. (either local or central (future)).

Use,

Zeppelin fetches all json files from the Helium registry to list available Interpreters/NotebookRepos/Applications
When the user selects one, Zeppelin dynamically loads the jar from the maven repository and runs it based on the information from the helium registry.

What do you think?

jongyoul · 2016-04-16T06:16:11Z

@Leemoonsoo Basically, the idea of Helium is very good and promising, and I also think my implementation is a somewhat oversized patch for the problem it solves. But to solve the problem of using Class.forName, we need to change the initialisation mechanism so that it doesn't use it. Do you have any ideas for solving this?

Leemoonsoo · 2016-04-16T09:19:04Z

Maybe I didn't explain it very well. :-)

I think having another mechanism that registers interpreters based on information in a json file, in addition to the current Class.forName, is a good idea.

In short, instead of a single interpreter-setting.json for all interpreters, if each interpreter can have its own json file, it will be much easier to align with ZEPPELIN-598 and ZEPPELIN-533 in the future.

jongyoul · 2016-04-18T05:48:15Z

@Leemoonsoo Thanks for explaining what the problem is. But interpreter-setting.json doesn't include all interpreters' settings. In the case of Spark, interpreter-setting.json includes the settings of the spark group. I assumed that all interpreters in the same group, or the same jar, are deployed together in that jar. In more detail, if I apply the new mechanism to JDBCInterpreter, I would also write a new interpreter-setting.json inside the JDBCInterpreter jar or under {interpreter}/jdbc/. Does that make sense?

As for Helium, I've reviewed that code and learned that it has a different structure and doesn't break the current one, so this PR doesn't conflict with it on registering interpreters. I'm also willing to move forward and merge the new mechanism into the Helium structure after Helium is merged. I think the first step is to move interpreter-setting.json into HeliumRegistry.

jongyoul · 2016-04-18T05:59:41Z

@Leemoonsoo I've updated the description with more information about the initialization steps. Please review it.

jongyoul · 2016-04-18T06:03:02Z

After this PR is accepted, I'll apply the new mechanism to all other interpreters in separate PRs for easier review.

bzz · 2016-04-18T07:45:12Z

zeppelin-zengine/src/main/java/org/apache/zeppelin/interpreter/InterpreterFactory.java

    init();
  }

  private void init() throws InterpreterException, IOException, RepositoryException {


This method is now quite big and may be hard to follow. Would it be possible, and would it make sense, to extract a few high-level methods here and have it call them?

It might aid the readability and simplify understanding of new interpreter registration process.

What do you think?

jongyoul · 2016-04-18

@bzz I agree with you. I'll split this method into several smaller ones.

bzz · 2016-05-27

It looks very good now!

bzz · 2016-04-18T07:57:51Z

@jongyoul thank you for an awesome update - static interpreter registration was a hack and it's great to see a better, testable way to do it.

Could you please help me understand, do all the interpreter implementations need to be eventually refactored one by one to the new registration system?

If so, do you think there should be an approach we are aiming for to get there, i.e. create JIRA issues for each, mark them as entry-level tasks (we use the JIRA label beginner) and address them one by one, or do you have something else in mind?

Leemoonsoo · 2016-04-18T09:02:03Z

@jongyoul Sounds like a good plan! Thanks for the explanation.

jongyoul · 2016-04-18T12:18:44Z

@bzz "Personally", I think static initialization is the final way, thus I want to change all of them. And it sounds like a great idea to make them beginners' tasks, since they are very clear and would help newcomers understand the backend of Zeppelin.

jongyoul · 2016-04-22T10:36:35Z

re-trigger

jongyoul · 2016-04-27T06:35:44Z

again

- Added a new initialization mechanism to use interpreter-setting.json
- Adjusted new mechanism to SparkInterpreter for verification

- Fixed the style

- Changed Spark*Interpreter to use interpreter-setting.json

- Fixed test environments

- Excluded interpreter-setting.json from rat check

- Extracted new initialization logic into another methods

- Fixed logic to check for supporting legacy mechanism

- Checked if path exists or not

- Fixed some unicode characters in interpreter-setting.json

- Changed logger setting only for test. This will be reverted after test

jongyoul · 2016-05-25T01:43:24Z

@echarles Hi, I have a problem testing the SparkRInterpreter that you wrote. Could you please help me solve the issue below?

Results :

Failed tests: 
  ZeppelinSparkClusterTest.sparkRTest:105 expected:<[[1] ]3> but was:<[localDF <- data.frame(name=c("a", "b", "c"), age=c(19, 23, 18))


df <- createDataFrame(sqlContext, localDF)


count(df)



simpleWarning in library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, : there is no package called ‘htmltools’






simpleWarning in library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, : there is no package called ‘repr’


]3>

It looks like the result is correct but the output messages are not the same as expected. I think the output format of R is related to zeppelin.R.render.options. And on my local machine, it passes. Do you have any idea about it?

echarles · 2016-05-26T06:18:36Z

@jongyoul your conclusions are correct. I would also say there is something happening with zeppelin.R.render.options not being taken into account (the `echo = FALSE` is there to ensure the commands you type are not printed in the result).

Here, it seems the R result is returned with the given command.
Maybe the interpreter settings override the values defined in the interpreter code?

jongyoul · 2016-05-26T09:01:18Z

@echarles Thanks for the advice. I've found the reason why zeppelin.R.render.options is not taken into account: it is because of -Pr -Pspark. With this PR, InterpreterFactory reads initialization data from interpreter-setting.json and, for backward compatibility, also reads the static initialization block of each specific interpreter group and name. Thus, if we include two interpreters with the same group and name, as with those profiles, some variables may be overridden. I'll fix it to support the case where we build Zeppelin with both profiles.

echarles · 2016-05-26T09:03:25Z

@jongyoul Building with both -Pr -Pspark may lead to unexpected behavior, even without the changes you brought...

jongyoul · 2016-05-26T09:06:20Z

@echarles You're right but we need to pass the CI first. :-) I don't think this is a real case and I'll handle it through another PR which may be about changing CI.

- Reverted log4j setting

- Ignored while registering a new interpreter with existing interpreter key
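
The "ignore" behaviour named in the commit note above can be pictured with a small first-registration-wins registry. This is a hedged sketch, not the code from this PR: the class `FirstWinsRegistrySketch`, the `group.name` key format, and the example class names are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; the key format and all names are assumptions.
final class FirstWinsRegistrySketch {
  private final Map<String, String> classNameByKey = new LinkedHashMap<>();

  /**
   * Registers className under the key "group.name" unless that key is taken.
   * Returns true if the registration was stored, false if it was ignored.
   */
  boolean register(String group, String name, String className) {
    String key = group + "." + name;
    // putIfAbsent returns null only when no earlier mapping existed.
    return classNameByKey.putIfAbsent(key, className) == null;
  }

  /** Returns the class name registered for the key, or null if none. */
  String lookup(String group, String name) {
    return classNameByKey.get(group + "." + name);
  }

  public static void main(String[] args) {
    FirstWinsRegistrySketch registry = new FirstWinsRegistrySketch();
    // First registration, e.g. one read from a settings file.
    System.out.println(registry.register("spark", "r", "com.example.FromJson"));
    // Second registration with the same key, e.g. from a static block, is ignored.
    System.out.println(registry.register("spark", "r", "com.example.FromStaticBlock"));
    System.out.println(registry.lookup("spark", "r"));
  }
}
```

With this policy, a later registration under an existing key cannot override properties set by the earlier one, which is the kind of collision discussed for builds that combine both profiles.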

jongyoul · 2016-05-26T13:27:08Z

re-trigger

jongyoul · 2016-05-26T14:19:29Z

Finally, I've passed CI.

Summary of changes,

Read interpreter info from file with json format
Fixed some tests in order to pass without static initialization

It never breaks any existing interpreter, so we can make this change without side effects.

@bzz I'll make sub tasks of ZEPPELIN-804 with a tag for beginner to move initialization mechanism from old to new

@Leemoonsoo @bzz Please review this PR

bzz · 2016-05-27T05:58:54Z

The code looks great to me.

Are there any potential side-effects of this change for the existing users?

Leemoonsoo · 2016-05-28T15:06:33Z

@jongyoul Code looks good to me.
It would be better if new interpreter registration mechanism (e.g. Format of interpreter-setting.json and where the file can be placed, etc) is documented in https://github.com/apache/incubator-zeppelin/blob/master/docs/development/writingzeppelininterpreter.md.

jongyoul · 2016-05-29T06:57:08Z

@Leemoonsoo Sure. I'll update it.

- Added documentation

jongyoul · 2016-05-29T16:32:49Z

@bzz In my opinion, there are no side effects from this feature.

jongyoul · 2016-05-30T05:57:55Z

Merging if there's no more discussion.

bzz · 2016-05-30T07:18:53Z

### What is this PR for?
Currently available interpreter list is not shown in `Creating New Interpreter` section. It seems this bug was generated after #835 was merged. So I temporally deactivated [3 SerializedName code lines](6d7f1bc).

### What type of PR is it?
Bug Fix

### Todos
* [x] - Fix interpreter listing bug when creating new interpreter

### What is the Jira issue?
[ZEPPELIN-931](https://issues.apache.org/jira/browse/ZEPPELIN-931)

### How should this be tested?
1. Build latest master branch and browse Zeppelin home
2. Create new interpreter -> You can not see the available interpreter list in this step like below attached screenshot
3. Apply this patch
4. Build again and browse -> You can see the available interpreter list as normal

### Screenshots (if appropriate)
- **Before** <img width="1273" alt="screen shot 2016-06-01 at 12 36 42 pm" src="https://cloud.githubusercontent.com/assets/10060731/15723066/9082435e-27f5-11e6-9783-df44638dbbec.png">
- **After** <img width="1273" alt="screen shot 2016-06-01 at 12 33 06 pm" src="https://cloud.githubusercontent.com/assets/10060731/15723067/92bcc8ce-27f5-11e6-82f5-6c0db7b4342c.png">

### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No

Author: AhyoungRyu <[email protected]>
Author: Jongyoul Lee <[email protected]>
Author: Ah young <[email protected]>

Closes #945 from AhyoungRyu/ZEPPELIN-931 and squashes the following commits:

711eb54 [Ah young] Merge pull request #2 from jongyoul/ZEPPELIN-931
6121f9b [Jongyoul Lee] - Fixed documentation
6e7dac9 [Ah young] Merge pull request #1 from jongyoul/ZEPPELIN-931
fed1b40 [Jongyoul Lee] - Fixed fieldName in interpreter-setting.json
6d7f1bc [AhyoungRyu] ZEPPELIN-931: fix interpreter listing bug

jongyoul force-pushed the ZEPPELIN-804 branch from dce045f to 7bb5bdc Compare April 18, 2016 00:40

bzz reviewed Apr 18, 2016

jongyoul closed this Apr 22, 2016

jongyoul reopened this Apr 22, 2016

jongyoul closed this Apr 23, 2016

jongyoul reopened this Apr 23, 2016

jongyoul closed this Apr 27, 2016

jongyoul reopened this Apr 27, 2016

jongyoul added 7 commits May 11, 2016 16:29

ZEPPELIN-804 Refactoring registration mechanism on Interpreters

ca7b96c

- Added a new initialization mechanism to use interpreter-setting.json
- Adjusted new mechanism to SparkInterpreter for verification

ZEPPELIN-804 Refactoring registration mechanism on Interpreters

48ac41d

- Fixed the style

ZEPPELIN-804 Refactoring registration mechanism on Interpreters

d54f98e

- Changed Spark*Interpreter to use interpreter-setting.json

ZEPPELIN-804 Refactoring registration mechanism on Interpreters

8a90fe4

- Fixed test environments

ZEPPELIN-804 Refactoring registration mechanism on Interpreters

1fa2e52

- Fixed test environments

ZEPPELIN-804 Refactoring registration mechanism on Interpreters

00f55a8

- Excluded interpreter-setting.json from rat check

ZEPPELIN-804 Refactoring registration mechanism on Interpreters

519f057

- Excluded interpreter-setting.json from rat check

ZEPPELIN-804 Refactoring registration mechanism on Interpreters

c5b7d54

- Extracted new initialization logic into another methods

jongyoul force-pushed the ZEPPELIN-804 branch from 36a64fa to c5b7d54 Compare May 11, 2016 07:29

jongyoul added 5 commits May 20, 2016 11:30

ZEPPELIN-804 Refactoring registration mechanism on Interpreters

844dccb

- Fixed logic to check for supporting legacy mechanism

ZEPPELIN-804 Refactoring registration mechanism on Interpreters

1b3cd0c

- Checked if path exists or not

ZEPPELIN-804 Refactoring registration mechanism on Interpreters

3ad41bb

- Checked if path exists or not

ZEPPELIN-804 Refactoring registration mechanism on Interpreters

e8f990f

- Fixed some unicode characters in interpreter-setting.json

ZEPPELIN-804 Refactoring registration mechanism on Interpreters

81ab361

- Changed logger setting only for test. This will be reverted after test

jongyoul added 2 commits May 26, 2016 18:18

ZEPPELIN-804 Refactoring registration mechanism on Interpreters

312dd77

- Reverted log4j setting

ZEPPELIN-804 Refactoring registration mechanism on Interpreters

25bc501

- Ignored while registering a new interpreter with existing interpreter key

jongyoul closed this May 26, 2016

jongyoul reopened this May 26, 2016

ZEPPELIN-804 Refactoring registration mechanism on Interpreters

823321e

- Added documentation

asfgit closed this in f8e1f6c May 31, 2016

AhyoungRyu mentioned this pull request Jun 1, 2016

[HOTFIX] ZEPPELIN-931: fix interpreter listing bug #945

Closed

1 task

minahlee mentioned this pull request Jul 7, 2016

[ZEPPELIN-1026] set syntax highlight based on default bound interpreter #1148

Closed

4 tasks

bzz mentioned this pull request Aug 3, 2016

[ZEPPELIN-1261] Bug fix in z.show() for matplotlib graphs #1267

Closed
