@cloverhearts
Member

cloverhearts commented Jul 13, 2016

What is this PR for?

This PR adds a workflow feature.
(Paragraphs run consecutively, and each one must succeed before the next one runs.)

Case 1

Through a dynamic form, you can specify the order in which paragraphs are executed.
This differs from the existing way of running paragraphs.
Please see the following flowchart.
[image: workflowdynamicformcontrol]

Case 2

In general, to run multiple paragraphs, you run the entire note.
This is a good way to run many paragraphs contained in one note.
However, a problem occurs when the paragraphs use different interpreters.
[image: notebook_example]
When each paragraph in a note uses a different interpreter, the paragraphs are started in sequence but finish at different times.

[image: normal notebook run]
For example, Markdown is a very fast interpreter, so its paragraph completes very quickly.
This is a problem when paragraphs are supposed to execute sequentially.

[image: workflow run]
This feature guarantees the execution order of the paragraphs in a notebook, regardless of which interpreter each one uses.

Case 3

For concurrent jobs in the workflow ...

[image: job_repl]

With the current design, if the same job is requested to run at the same time, the result of the job is shared, as shown above.
However, if the jobs really do need to run at the same time, each one follows its own execution flow.

**The next paragraph is executed only if the previous one succeeds.**
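
The rule above can be sketched in a few lines. This is only an illustration in Python, not the code in this PR: `run_chain` and the step callables are hypothetical names, and each step simply reports success or failure as a boolean.

```python
from typing import Callable, List, Tuple


def run_chain(steps: List[Tuple[str, Callable[[], bool]]]) -> List[str]:
    """Run steps in order and stop at the first one that reports failure.

    Returns the names of the steps that were actually started.
    """
    started = []
    for name, step in steps:
        started.append(name)
        if not step():
            # A failed step blocks everything that comes after it.
            break
    return started


# Hypothetical paragraphs: the second one fails, so the third never starts.
steps = [
    ("p1", lambda: True),
    ("p2", lambda: False),
    ("p3", lambda: True),
]
print(run_chain(steps))  # ['p1', 'p2']
```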

What type of PR is it?

Improvement

jira

https://issues.apache.org/jira/browse/ZEPPELIN-1165

How should this be tested?

Dynamic form
For usage, see the following:
https://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/manual/dynamicform.html

In a markdown paragraph, type:

%md
${workflow:workflow=MyWorkflowName,NotebookId01:ParagraphId01|NotebookId02:ParagraphId02}
  • Detailed explanation.

You can specify a title for the workflow.
It is only displayed in the UI.

${workflow:workflow=
or
${workflow:workflow=MyWorkflow

title name : MyWorkflow

After a comma (,), you can start defining the list of tasks.

${workflow:workflow=MyWorkflow**,**

Enter your Notebook ID and Paragraph ID.

${workflow:workflow=MyWorkflow,2BS99R6QA:20160712-163556_1475991024}

A colon (:) separates the Notebook ID from the Paragraph ID.
The example above runs paragraph 20160712-163556_1475991024 in the notebook whose ID is 2BS99R6QA.

  • How to get the notebook ID?
    [image: getnotebookid]
  • How to get the paragraph ID?
    [image: getparagraphid]

With a pipe character (|), you can chain further tasks, which run consecutively.

${workflow:workflow=MyWorkflow,2BS99R6QA:20160712-163556_1475991024|2BS99R6QA:20160710-03221_12234323}
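
To make the pieces of this expression concrete, here is a minimal parsing sketch in Python. It is not the parser from this PR: `parse_workflow`, `Task`, and `Workflow` are hypothetical names, and the grammar is inferred only from the examples above (title, then a comma, then `NotebookId:ParagraphId` items separated by `|`).

```python
import re
from typing import List, NamedTuple, Optional


class Task(NamedTuple):
    note_id: str
    paragraph_id: Optional[str]  # None when no ":ParagraphId" part was given


class Workflow(NamedTuple):
    title: str
    tasks: List[Task]


# Grammar assumed from the examples: ${workflow:workflow=<title>[,<tasks>]}
_PATTERN = re.compile(r"^\$\{workflow:workflow=([^,}]*)(?:,([^}]*))?\}$")


def parse_workflow(expr: str) -> Workflow:
    match = _PATTERN.match(expr.strip())
    if match is None:
        raise ValueError(f"not a workflow expression: {expr!r}")
    title = match.group(1)
    task_list = match.group(2) or ""
    tasks = []
    for item in task_list.split("|"):
        item = item.strip()
        if not item:
            continue
        # The colon separates the notebook ID from the paragraph ID.
        note_id, sep, paragraph_id = item.partition(":")
        tasks.append(Task(note_id.strip(), paragraph_id.strip() if sep else None))
    return Workflow(title, tasks)


wf = parse_workflow(
    "${workflow:workflow=MyWorkflow,"
    "2BS99R6QA:20160712-163556_1475991024|2BS99R6QA:20160710-03221_12234323}"
)
print(wf.title)
for task in wf.tasks:
    print(task.note_id, task.paragraph_id)
```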

Note run

[image: cap 2016-07-14 15-11-07-036]

To run all paragraphs in a note, you can use either form:

NoteID : *
or
NoteID
${workflow:workflow=MyworkflowName,NotebookId}
or
${workflow:workflow=MyworkflowName,NotebookId:*}

For example:

${workflow:workflow=Myworkflow,2A94M5J1Z}

Of course, you can mix whole-note runs with specific paragraphs.

%md
${workflow:workflow=Myworkflow,2A94M5J1Z|r|2BQWXDH1C:20160714-151024_1284650101}
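
As a rough illustration of how a mixed task list could be resolved, the Python sketch below expands a bare note ID or `NoteID:*` into every paragraph of that note. It is an assumption-laden sketch, not this PR's implementation: `expand_tasks` is a hypothetical helper, and the `notes` dictionary with its paragraph IDs `p-001` and `p-002` is made-up sample data.

```python
from typing import Dict, List, Tuple


def expand_tasks(task_list: str, notes: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Turn 'Note|Note:*|Note:Paragraph' into explicit (note, paragraph) pairs."""
    resolved = []
    for item in task_list.split("|"):
        note_id, sep, paragraph_id = item.strip().partition(":")
        note_id, paragraph_id = note_id.strip(), paragraph_id.strip()
        if not sep or paragraph_id == "*":
            # Bare note ID or explicit '*': every paragraph, in note order.
            resolved.extend((note_id, p) for p in notes[note_id])
        else:
            resolved.append((note_id, paragraph_id))
    return resolved


# Hypothetical note contents, for illustration only.
notes = {
    "2A94M5J1Z": ["p-001", "p-002"],
    "2BQWXDH1C": ["20160714-151024_1284650101"],
}

for pair in expand_tasks("2A94M5J1Z|2BQWXDH1C:20160714-151024_1284650101", notes):
    print(pair)
```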

Screenshots (if appropriate)

[image: workflow]

Questions:

  • Do the license files need an update? no
  • Are there breaking changes for older versions? no
  • Does this need documentation? yes, after the merge.

@AhyoungRyu
Contributor

@cloverhearts Is this PR WIP?

@astroshim
Contributor

Great feature!
Can you add documentation for this, or explain it in more detail for review?

@cloverhearts
Member Author

The test description has been updated.

@cloverhearts
Member Author

@AhyoungRyu
It is complete.
However, if there is anything you would like changed, please tell me at any time.
Thank you.

@cloverhearts
Member Author

Updated the description:

This feature guarantees the execution order of the paragraphs in a notebook, regardless of which interpreter each one uses.

@AhyoungRyu
Contributor

@cloverhearts Thanks for the detailed description. It makes this cool feature much easier to review 👍

@prabhjyotsingh
Contributor

Cool feature; I have wanted something similar myself. I will test it and give more feedback. In the meantime, referring to the GIF posted in the Screenshots section, can the rest of the paragraphs go into a pending state?

For example, in this case, can the last paragraph (%md) go into a pending state while %spark is executing?

@cloverhearts
Member Author

cloverhearts commented Jul 14, 2016

@prabhjyotsingh
Thank you for your feedback.
I understood your question to be about the same paragraph being run more than once during a workflow.
With the current design, if the same job is requested to run at the same time, the result of the job is shared, as follows:

[image: job_repl]

However, if the jobs really do need to run at the same time, each one follows its own execution flow.
Does this answer your question?

@cloverhearts
Member Author

Update:

Added an operator for all paragraphs (*) and fixed the UI.

[image: cap 2016-07-14 15-11-07-036]

To run all paragraphs in a note, you can use either form:

NoteID : *
or
NoteID
${workflow:workflow=MyworkflowName,NotebookId}
or
${workflow:workflow=MyworkflowName,NotebookId:*}

For example:

${workflow:workflow=Myworkflow,2A94M5J1Z}

Of course, you can mix whole-note runs with specific paragraphs.

%md
${workflow:workflow=Myworkflow,2A94M5J1Z|r|2BQWXDH1C:20160714-151024_1284650101}

@cloverhearts
Member Author

Screenshot change

@Leemoonsoo
Member

Leemoonsoo commented Jul 14, 2016

Thanks @cloverhearts for the contribution and the descriptions.
And sorry for giving some negative feedback here.

Workflow definition
I'm not sure the workflow definition should share the dynamic form syntax ${ ... }.
I don't think it's a good idea to access two completely different features with one syntax.

Alternative approach
The current approach creates new syntax for workflow creation, along with WorkflowManager, WorkflowJob, and so on.

I think there is a much simpler way to achieve the same goal, with more flexible and powerful workflow control, while keeping usage simple and easy.

Currently, z.run(PARAGRAPH_ID) in the Spark interpreter allows running another paragraph in the same notebook.

Simply extend this method, from

z.run(PARAGRAPH_ID)   // run paragraph in current notebook

to

z.run(PARAGRAPH_ID)   // run paragraph in any notebook
z.run(NOTE_ID)        // run all paragraphs in the notebook

This achieves the same goal without introducing new syntax or a WorkflowManager.
The user can then define a workflow in Scala in the Spark interpreter, e.g.

z.run("Paragraph1")
z.run("Paragraph2")
if (somecondition) {
   z.run("Paragraph3")
} else {
   z.run("Note2")
}

Note that the z.run() feature can be implemented not only in SparkInterpreter but also in any other interpreter. Therefore, other interpreters may provide a different interface (Python, etc.) to access this feature. And if we really want new syntax, it's even possible to create a %workflow interpreter and provide custom syntax or a GUI using the AngularDisplay system to define and run workflows.

So I think just improving z.run() provides a more flexible, more modular, and easier solution.

What do you think?

@xiufengliu

xiufengliu commented Jul 14, 2016

z.run("Paragraph1")
z.run("Paragraph2")

Will "Paragraph1" and "Paragraph2" run sequentially? If "Paragraph1" fails, will "Paragraph2" still run?

@corneadoug
Contributor

I like the idea, but I also don't think that a new dynamic form is a good way to manage it.
I think the feature would fit better inside the JOB page, especially since it is cross-notebook.
The JOB page currently only lists the notebooks and the status of each paragraph; we could see it as a way to manage any workflow (full notebook or custom workflow).

I also wonder how we can handle paragraphs that can run in parallel during one of the workflow steps (multiple SQL queries, for example).

There has been a lot of demand for a better 'Run all paragraphs' feature, and maybe we could later have a dropdown on the run button that allows running multiple profiles (to run some workflow related to that notebook, a bit like building different profiles in Eclipse).

@cloverhearts
Member Author

cloverhearts commented Jul 14, 2016

@Leemoonsoo
Thank you for your opinion!

Workflow definition

I actually agree with your opinion.
I used the dynamic form because I thought users would want to manage the workflow from within a note.
I am satisfied with the usability of defining a workflow through the dynamic form.
However, I did have doubts about whether using the dynamic form this way is appropriate.
It is not consistent with how the dynamic form is normally used.
Maybe @corneadoug is right
(add the workflow feature to the job menu).
I also feel the workflow needs to be represented well within a paragraph.
%workflow is a good idea.
However, creating a new interpreter raises a number of concerns.
The dynamic form may have the same problem.
I will think more about this part.

Alternative approach

I am aware of z.run.
However, z.run cannot replace this feature.
z.run does not guarantee the execution order of paragraphs that use different interpreters.
See Case 2 and the following image.

[image: seqinterpreter]

Thanks again for the detailed feedback.

@cloverhearts
Member Author

cloverhearts commented Jul 14, 2016

@corneadoug
You are right.
should be added to the Workflow.
But I was thinking that it should also be controllable from within a note.
The dynamic form is convenient to use, and it does express the workflow,
so I am satisfied with it.
However, I'm not sure whether this is the correct approach,
particularly in terms of usability and consistency.
@Leemoonsoo's %workflow idea is good.

If you have any other idea, please advise me.

@Leemoonsoo
Member

z.run() currently submits each paragraph to its interpreter's scheduler.

Let's say there are three paragraphs, PARAGRAPH_1, PARAGRAPH_2, and PARAGRAPH_3 (P1, P2, P3 below):

%md
...

%spark
...

%sql
...

When the user executes

z.run(P1)
z.run(P2)
z.run(P3)

z.run(P1) submits a job to run P1 to the scheduler of the Markdown interpreter.
z.run(P2) submits a job to run P2 to the scheduler of the Spark interpreter.
z.run(P3) submits a job to run P3 to the scheduler of the SparkSQL interpreter.

The scheduler of each interpreter works independently.
That's why P1, P2, and P3 run concurrently (and this is why the 'Run all' button doesn't run paragraphs sequentially).
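
The effect can be reproduced with a toy model. The Python sketch below is not Zeppelin code: each single-threaded executor merely stands in for one interpreter's scheduler, and the run times in `DURATIONS` are made-up values chosen for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical run times in seconds (Markdown is fast, Spark is slow).
DURATIONS = {"P1-md": 0.0, "P2-spark": 0.3, "P3-sql": 0.15}


def simulate_independent_schedulers(paragraphs):
    """Submit every paragraph to its own scheduler without waiting in between."""
    finished = []

    def job(name):
        time.sleep(DURATIONS[name])
        finished.append(name)  # record the completion order

    # One single-threaded executor per interpreter, standing in for its scheduler.
    schedulers = {name: ThreadPoolExecutor(max_workers=1) for name in paragraphs}
    futures = [schedulers[name].submit(job, name) for name in paragraphs]
    for future in futures:
        future.result()
    for scheduler in schedulers.values():
        scheduler.shutdown()
    return finished


order = simulate_independent_schedulers(["P1-md", "P2-spark", "P3-sql"])
print(order)  # P3 finishes before P2 although it was submitted later
```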

Then we can add a synchronous option, such as

val result1 = z.runSynchronously(P1)
val result2 = z.runSynchronously(P2)
val result3 = z.runSynchronously(P3)

Then we can make sure P2 is not submitted before P1 finishes, and P3 is not submitted before P2 finishes.
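
A blocking variant can be sketched the same way. Again this is a Python toy model rather than Zeppelin code: `run_synchronously` only mirrors the proposed z.runSynchronously() in name, the executors stand in for interpreter schedulers, and the durations are invented.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical run times in seconds.
DURATIONS = {"P1": 0.0, "P2": 0.2, "P3": 0.1}


def run_paragraph(name, log):
    log.append(("start", name))
    time.sleep(DURATIONS[name])
    log.append(("finish", name))
    return name + ":ok"


def run_synchronously(scheduler, name, log):
    """Submit one paragraph and block until its job has finished."""
    return scheduler.submit(run_paragraph, name, log).result()


def run_workflow(names):
    log, results = [], []
    # One single-threaded executor per paragraph's interpreter, as a stand-in scheduler.
    schedulers = {name: ThreadPoolExecutor(max_workers=1) for name in names}
    try:
        for name in names:
            # The next paragraph is not submitted until this call returns.
            results.append(run_synchronously(schedulers[name], name, log))
    finally:
        for scheduler in schedulers.values():
            scheduler.shutdown()
    return results, log


results, log = run_workflow(["P1", "P2", "P3"])
print(results)
print(log)
```

Because each call blocks, the slow P2 finishes before the faster P3 is even submitted, which is the ordering guarantee discussed here.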

My points are,

  • We can provide an API that lets an interpreter run a notebook, or a paragraph in another notebook.
    So the user can define a workflow in a programming language (i.e. scala, python, etc).
    This naturally gives the user more flexibility and power to create workflows, because programming languages provide things such as conditional blocks, futures, threads, and so on.
  • To provide sequential job execution, we can provide a synchronous API such as z.runSynchronously().
  • For users who are not familiar with a programming language but want to create a workflow, we can create a %workflow interpreter and invent new syntax or a GUI on top of the same z.run() API internally. This is more modular and less confusing compared to running a workflow with dynamic form syntax.

@cloverhearts
Member Author

Thank you for your answer.
I will organize your feedback and restart work on this soon.
I plan to expand the .
If anyone has a different opinion, please leave a comment.
Thank you.

@xiufengliu

Hi, may I ask if this feature has been merged to the master?

@corneadoug
Contributor

@xiufengliu No, it wasn't merged.
In a merged PR you can see a message saying: "asfgit closed this in CommitHash Xdays ago"
This is an interesting feature, and there was some discussion about how it could be handled and how to define the flow, so I think it will need a lot of changes.

@cloverhearts
Member Author

@xiufengliu
Thanks for your interest in this feature.
As it stands, this feature could undermine the consistency of how Zeppelin is used.
I would like to re-implement it in a different way.
If you could write a comment about your environment or the purpose you want to use it for, it would be a great help in improving and implementing this feature.

@xiufengliu

Thanks! It is a fantastic feature. I am really looking forward to it.

@cloverhearts
Member Author

This feature has not been merged.
It will be re-implemented.
However, because there are people who need it, I hope to keep maintaining this PR.
I will continue the discussion later in a new form.
Thank you.

@corneadoug
Contributor

@cloverhearts If I understand, the plan is to open a new PR for this feature?
Can it be closed then?

@cloverhearts
Member Author

@corneadoug
Yes, no problem.
:)

@xiufengliu

Hi, may I ask if the new PR has been opened? I am looking forward to this feature.

@xiufengliu

Dear corneadoug:
May I ask if this pull request is based on Zeppelin 7.0? Are there any problems with using it for now, since I saw there are some conflicting files in the checks above? Thanks

/Xiufeng

@cloverhearts
Member Author

@xiufengliu

This feature is currently being implemented.
I have almost finished the change, so that it can be used from code within a paragraph.
If it is merged quickly, I think I can introduce this feature to you before 2017.
Thank you.

# Conflicts:
#	zeppelin-server/src/main/java/org/apache/zeppelin/socket/NotebookServer.java
#	zeppelin-web/src/app/notebook/paragraph/paragraph-parameterizedQueryForm.html
#	zeppelin-web/src/app/notebook/paragraph/paragraph.controller.js
#	zeppelin-web/src/components/websocketEvents/websocketMsg.service.js
@cloverhearts
Member Author

#1799

I created a new PR.

It is a code-based re-implementation.
