@mispecto

What is this PR for?

This patch restarts Livy session when it's lost.

What type of PR is it?

Improvement/Bug Fix

Todos

What is the Jira issue?

https://issues.apache.org/jira/browse/ZEPPELIN-1293

How should this be tested?

  • Configure livy.server.session.timeout parameter in Livy
  • Start new Livy session through Zeppelin
  • Wait for Livy session to expire
  • See that new Livy session is started when running something in Zeppelin

Questions:

  • Do the license files need an update? - No
  • Are there breaking changes for older versions? - No
  • Does this need documentation? - No

@zjffdu
Contributor

zjffdu commented Sep 23, 2016

@spektom Thanks for the contribution. I am working on a PR that adds integration tests for the Livy interpreter. Would you mind waiting for that PR and then adding an integration test on top of it?

@mispecto
Author

Of course, it would be great.

@gss2002

gss2002 commented Oct 14, 2016

@spektom and @zjffdu, is it possible to add a null check as well? I've been doing some debugging over the past few days, and certain situations can cause a null to be returned. In theory, if a null is returned, the session is dead.

So instead of this:
if (json.matches("^(\")?Session (\'[0-9]\' )?not found(.?\"?)$")) {
  throw new LivyNoSessionException();
}

This:
boolean clearSession = false;
if (json != null) {
  if (json.matches("^(\")?Session (\'[0-9]\' )?not found(.?\"?)$")) {
    clearSession = true;
  }
} else {
  // A null response is treated as a dead session.
  clearSession = true;
}
if (clearSession) {
  throw new LivyNoSessionException();
}
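
The flag-based check above can be collapsed into a single short-circuit condition. The following is a minimal sketch, assuming the escaped form of the regular expression from the patch; the class name `SessionLostCheck` and its two method names are hypothetical and exist only to compare the two forms side by side.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Zeppelin code base.
final class SessionLostCheck {
    // Expression from the patch, with the double quotes escaped for a Java literal.
    private static final Pattern NOT_FOUND =
        Pattern.compile("^(\")?Session ('[0-9]' )?not found(.?\"?)$");

    // Flag-based form, following the structure proposed in the comment.
    static boolean verbose(String json) {
        boolean clearSession = false;
        if (json != null) {
            if (NOT_FOUND.matcher(json).matches()) {
                clearSession = true;
            }
        } else {
            clearSession = true;
        }
        return clearSession;
    }

    // Short-circuit form: a null body counts as a lost session,
    // and the regex is only evaluated when the body is non-null.
    static boolean compact(String json) {
        return json == null || NOT_FOUND.matcher(json).matches();
    }
}
```

Both methods return the same value for every input, so the choice between them is purely about readability.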

@gss2002

gss2002 commented Oct 14, 2016

Also, I have some concerns about using exceptions as gotos, as there are definite performance issues with that approach: http://javarevisited.blogspot.com/2013/03/0-exception-handling-best-practices-in-Java-Programming.html. Specifically, code inside a try/catch cannot be optimized.

@mispecto mispecto force-pushed the ZEPPELIN-1293 branch 2 times, most recently from 12ee62d to 5fdb6df Compare October 14, 2016 06:26
@mispecto
Author

@gss2002 you shouldn't be concerned about performance, as this code only runs when a command is executed.

@gss2002

gss2002 commented Oct 14, 2016

@spektom this fix is good. I did some extensive load testing with it this morning and it solves the session expiration issues. Thanks for the contribution.

throw new LivyException("Error executing command in Livy", e);
}
if (json == null || json.matches("^(\")?Session (\'[0-9]\' )?not found(.?\"?)$")) {
throw new LivyNoSessionException();

Could we add some logging when this happens?
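
One way the requested logging might look is sketched below. This is an illustration under assumptions, not the code that was merged: it uses `java.util.logging` so that it runs without extra dependencies, and the names `SessionCheckLogging`, `NoSessionException`, and `ensureSessionAlive` are hypothetical stand-ins for the PR's own classes.

```java
import java.util.logging.Logger;

// Hypothetical sketch; the real interpreter has its own logger and exception type.
final class SessionCheckLogging {
    private static final Logger LOGGER =
        Logger.getLogger(SessionCheckLogging.class.getName());

    // Stand-in for the PR's LivyNoSessionException.
    static final class NoSessionException extends Exception {
        NoSessionException(String message) {
            super(message);
        }
    }

    // Logs the reason before throwing, so a null body and an explicit
    // "not found" response can be told apart in the log.
    static void ensureSessionAlive(String json) throws NoSessionException {
        if (json == null) {
            LOGGER.warning("Livy returned no response body; treating the session as lost");
            throw new NoSessionException("Session not found: empty response");
        }
        if (json.matches("^(\")?Session ('[0-9]' )?not found(.?\"?)$")) {
            LOGGER.warning("Livy reported a missing session: " + json);
            throw new NoSessionException("Session not found: " + json);
        }
    }
}
```

Splitting the two conditions costs a few extra lines, but it records which of the two cases triggered the session restart.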

Contributor

@purechoc purechoc left a comment

As written, the "Session not found" pattern only matches
a single-digit session id.
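
To make the point about the id range concrete, the sketch below runs the pattern from the patch against a one-digit and a two-digit session id. The `MULTI_DIGIT` variant, which uses `[0-9]+`, is an assumption about what a fix could look like and is not taken from the thread; the class name `SessionIdPatternDemo` is also hypothetical.

```java
import java.util.regex.Pattern;

final class SessionIdPatternDemo {
    // Pattern from the patch: exactly one digit between the single quotes.
    static final Pattern SINGLE_DIGIT =
        Pattern.compile("^(\")?Session ('[0-9]' )?not found(.?\"?)$");

    // Hypothetical variant: one or more digits between the single quotes.
    static final Pattern MULTI_DIGIT =
        Pattern.compile("^(\")?Session ('[0-9]+' )?not found(.?\"?)$");

    public static void main(String[] args) {
        String[] bodies = {
            "\"Session '9' not found.\"",
            "\"Session '12' not found.\"",
            "Session not found"
        };
        for (String body : bodies) {
            System.out.println(body
                + " single=" + SINGLE_DIGIT.matcher(body).matches()
                + " multi=" + MULTI_DIGIT.matcher(body).matches());
        }
    }
}
```

The two-digit body is rejected by the original pattern and accepted by the variant, while the other two bodies match both patterns.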

} catch (Exception e) {
throw new LivyException("Error executing command in Livy", e);
}
if (json == null || json.matches("^(\")?Session (\'[0-9]\' )?not found(.?\"?)$")) {

This needs to be changed like this, doesn't it?
json.matches("^(")?Session ('[0-9]' )?not found(.?"?)$")

@gss2002

gss2002 commented Oct 19, 2016

@spektom / @zjffdu @purechoc there is definitely an additional condition. I'm not sure whether it's because the ConcurrentHashMaps are not being used correctly, but at times the exception doesn't get caught completely or correctly with the fix proposed here.

ERROR [2016-10-19 14:19:57,638]({pool-2-thread-11} LivyHelper.java[executeHTTP]:378) - Error with 404 StatusCode: "Session '9' not found."
ERROR [2016-10-19 14:19:57,638]({pool-2-thread-11} LivyHelper.java[interpretInput]:229) - error in interpretInput
org.apache.zeppelin.livy.LivyHelper$LivyNoSessionException: Session not found, Livy server would have restarted, or lost session.
at org.apache.zeppelin.livy.LivyHelper.executeCommand(LivyHelper.java:312)
at org.apache.zeppelin.livy.LivyHelper.interpret(LivyHelper.java:241)
at org.apache.zeppelin.livy.LivyHelper.interpretInput(LivyHelper.java:189)
at org.apache.zeppelin.livy.LivySparkInterpreter.interpret(LivySparkInterpreter.java:106)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:94)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:390)
at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
INFO [2016-10-19 14:19:57,639]({pool-2-thread-11} SchedulerFactory.java[jobFinished]:137) - Job remoteInterpretJob_1476901197622 finished by scheduler org.apache.zeppelin.livy.LivySparkInterpreter37814848
INFO [2016-10-19 14:19:57,819]({pool-2-thread-34} SchedulerFactory.java[jobStarted]:131) - Job remoteInterpretJob_1476901197819 started by scheduler org.apache.zeppelin.livy.LivySparkInterpreter37814848
ERROR [2016-10-19 14:19:57,835]({pool-2-thread-34} LivyHelper.java[executeHTTP]:378) - Error with 404 StatusCode: "Session '9' not found."
ERROR [2016-10-19 14:19:57,835]({pool-2-thread-34} LivyHelper.java[interpretInput]:229) - error in interpretInput
org.apache.zeppelin.livy.LivyHelper$LivyNoSessionException: Session not found, Livy server would have restarted, or lost session.
at org.apache.zeppelin.livy.LivyHelper.executeCommand(LivyHelper.java:312)
at org.apache.zeppelin.livy.LivyHelper.interpret(LivyHelper.java:241)
at org.apache.zeppelin.livy.LivyHelper.interpretInput(LivyHelper.java:189)
at org.apache.zeppelin.livy.LivySparkInterpreter.interpret(LivySparkInterpreter.java:106)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:94)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:390)
at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
INFO [2016-10-19 14:19:57,836]({pool-2-thread-34} SchedulerFactory.java[jobFinished]:137) - Job remoteInterpretJob_1476901197819 finished by scheduler org.apache.zeppelin.livy.LivySparkInterpreter37814848

@mispecto
Author

@gss2002 are you using the latest code from the branch? I can't find the org.apache.zeppelin.livy.LivyHelper.interpretInput(LivyHelper.java:189) invocation at org.apache.zeppelin.livy.LivySparkInterpreter.interpret(LivySparkInterpreter.java:106) that is shown in your stack trace.

@gss2002

gss2002 commented Oct 20, 2016

@spektom I think what happens here is that the following code fires, which has nothing to do with the fix here.

In LivySparkInterpreter:
return livyHelper.interpretInput(line, interpreterContext, userSessionMap, out,
sessionId2AppIdMap.get(sessionId), sessionId2WebUIMap.get(sessionId), displayAppInfo);

That call is made before the LivyNoSessionException occurs. Then, in LivyHelper, public InterpreterResult interpretInput catches the exception and handles it itself, so it never reaches the caller. I guess the question is: can we find the root cause here and rethrow?

} catch (Exception e) {
  LOGGER.error("error in interpretInput", e);
  return new InterpreterResult(Code.ERROR, e.getMessage());
}

@gss2002

gss2002 commented Oct 20, 2016

@spektom I just tested this against my build by catching the exception and rethrowing it. It definitely solves the issue.

} catch (LivyNoSessionException e) {
    throw e;
} catch (Exception e) {
  LOGGER.error("error in interpretInput", e);
  return new InterpreterResult(Code.ERROR, e.getMessage());
}

}

public InterpreterResult interpret(String stringLines,

The code base I'm using is here, with your patch, a few of @zjffdu's patches, and one of mine for NestedRuntimeException for 404s with KerberosTemplate: https://github.com/gss2002/zeppelin/blob/GSS_PROD_BUILD/livy/src/main/java/org/apache/zeppelin/livy/LivyHelper.java
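
The effect of the extra catch clause can be sketched in isolation. The example below is a simplified model, not the Zeppelin code: `RethrowSketch`, `Command`, `NoSessionException`, and the `recreateSession` callback are hypothetical names, and results are plain strings rather than `InterpreterResult` objects. It assumes the outer caller reacts to the exception by recreating the session once and retrying.

```java
// Simplified model of the catch ordering discussed above; all names are hypothetical.
final class RethrowSketch {
    // Stand-in for LivyNoSessionException.
    static final class NoSessionException extends Exception {
        NoSessionException(String message) {
            super(message);
        }
    }

    // Stand-in for a command sent to the Livy session.
    interface Command {
        String run() throws Exception;
    }

    // Inner layer: the session-lost exception is rethrown before the
    // generic handler can turn it into an ordinary error result.
    static String interpretInput(Command command) throws NoSessionException {
        try {
            return command.run();
        } catch (NoSessionException e) {
            throw e;
        } catch (Exception e) {
            return "ERROR: " + e.getMessage();
        }
    }

    // Outer layer: recreates the session once and retries the command.
    // A second NoSessionException is propagated to the caller.
    static String interpret(Command command, Runnable recreateSession)
            throws NoSessionException {
        try {
            return interpretInput(command);
        } catch (NoSessionException e) {
            recreateSession.run();
            return interpretInput(command);
        }
    }
}
```

Without the first catch clause, the generic handler would convert the session-lost exception into an error string and the outer layer would never get the chance to recreate the session.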

@zjffdu
Contributor

zjffdu commented Oct 31, 2016

Sorry for the late response. @spektom Could you add a message about the session recreation to be displayed in the frontend? If the session is recreated, all the paragraphs may need to be rerun; without knowing about the recreation, users may be confused by errors such as undefined variables.

@purechoc
Contributor

purechoc commented Nov 3, 2016

LGTM. It's working nicely in my environment (zeppelin 0.7 + livy 2.1).

@zjffdu
Contributor

zjffdu commented Dec 23, 2016

@spektom Do you want to rebase it and continue working on it?

@zjffdu
Contributor

zjffdu commented Jan 6, 2017

@spektom Sorry, I didn't get your reply, so I created another PR for it: #1861

@asfgit asfgit closed this in c38a0a0 May 9, 2018
asfgit pushed a commit that referenced this pull request May 9, 2018
close #83
close #86
close #125
close #133
close #139
close #146
close #193
close #203
close #246
close #262
close #264
close #273
close #291
close #299
close #320
close #347
close #389
close #413
close #423
close #543
close #560
close #658
close #670
close #728
close #765
close #777
close #782
close #783
close #812
close #822
close #841
close #843
close #878
close #884
close #918
close #989
close #1076
close #1135
close #1187
close #1231
close #1304
close #1316
close #1361
close #1385
close #1390
close #1414
close #1422
close #1425
close #1447
close #1458
close #1466
close #1485
close #1492
close #1495
close #1497
close #1536
close #1545
close #1561
close #1577
close #1600
close #1603
close #1678
close #1695
close #1739
close #1748
close #1765
close #1767
close #1776
close #1783
close #1799