support for multi-line log messages #455
Comments
Possible solutions:
|
This might be more of a client issue, because the API only says "string", which implicitly allows newlines, I'd say.
@soxofaan Could you copy me an example of such an error in Python, so that I can see how it is structured and how we could format it? I think we can already make this much more pleasing without a lot of rewriting. I'm thinking about something like:
{
  "id": "132",
  "level": "error",
  "message": "error processing batch job due to ...",
  "data": {
    "type": "Stacktrace",
    "stacktrace": [
      {"file": "batch_job.py", "line": 319, "text": "in main"},
      {"file": "batch_job.py", "line": 292, "text": "in run_driver"},
      ...
    ]
  }
}
Something like that should already render much nicer in the component.
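A back-end written in Python could produce an entry of roughly that shape directly from a caught exception. The following is only a minimal sketch: the helper name `log_entry_from_exception` is hypothetical, and the `data` layout is the one proposed above, not something the openEO API defines.

```python
import traceback


def log_entry_from_exception(log_id: str, exc: BaseException) -> dict:
    """Build a log entry with a structured stack trace (hypothetical layout)."""
    # extract_tb returns one FrameSummary per frame, oldest call first.
    frames = traceback.extract_tb(exc.__traceback__)
    return {
        "id": log_id,
        "level": "error",
        "message": f"{type(exc).__name__}: {exc}",
        "data": {
            "type": "Stacktrace",
            "stacktrace": [
                {"file": f.filename, "line": f.lineno, "text": f"in {f.name}"}
                for f in frames
            ],
        },
    }


def run_driver():
    # Stand-in for the real job code; it only raises an error.
    raise RuntimeError("connect timed out")


try:
    run_driver()
except RuntimeError as e:
    entry = log_entry_from_exception("132", e)
```

This only covers the Python part of a trace; Java/Scala frames that arrive as text inside the exception message would still need separate handling.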
This is an example {"logs":[{"id": "error", "level": "error", "message": "error processing batch job\nTraceback (most recent call last):\n File \"batch_job.py\", line 319, in main\n run_driver()\n File \"batch_job.py\", line 292, in run_driver\n run_job(\n File \"/data2/hadoop/yarn/local/usercache/johndoe/appcache/application_1652795411773_14281/container_e5028_1652795411773_14281_01_000002/venv/lib/python3.8/site-packages/openeogeotrellis/utils.py\", line 41, in memory_logging_wrapper\n return function(*args, **kwargs)\n File \"batch_job.py\", line 388, in run_job\n assets_metadata = result.write_assets(str(output_file))\n File \"/data2/hadoop/yarn/local/usercache/johndoe/appcache/application_1652795411773_14281/container_e5028_1652795411773_14281_01_000002/venv/lib/python3.8/site-packages/openeo_driver/save_result.py\", line 110, in write_assets\n return self.cube.write_assets(filename=directory, format=self.format, format_options=self.options)\n File \"/data2/hadoop/yarn/local/usercache/johndoe/appcache/application_1652795411773_14281/container_e5028_1652795411773_14281_01_000002/venv/lib/python3.8/site-packages/openeogeotrellis/geopysparkdatacube.py\", line 1542, in write_assets\n timestamped_paths = self._get_jvm().org.openeo.geotrellis.geotiff.package.saveRDDTemporal(\n File \"/opt/spark3_2_0/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py\", line 1309, in __call__\n return_value = get_return_value(\n File \"/opt/spark3_2_0/python/lib/py4j-0.10.9.2-src.zip/py4j/protocol.py\", line 326, in get_return_value\n raise Py4JJavaError(\npy4j.protocol.Py4JJavaError: An error occurred while calling z:org.openeo.geotrellis.geotiff.package.saveRDDTemporal.\n: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3652 in stage 14.0 failed 4 times, most recent failure: Lost task 3652.3 in stage 14.0 (TID 3949) (epod130.vgt.vito.be executor 37): net.jodah.failsafe.FailsafeException: java.net.SocketTimeoutException: connect timed out\n\tat 
net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:385)\n\tat net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:68)\n\tat org.openeo.geotrellissentinelhub.package$.withRetries(package.scala:59)\n\tat org.openeo.geotrellissentinelhub.DefaultProcessApi.getTile(ProcessApi.scala:119)\n\tat org.openeo.geotrellissentinelhub.PyramidFactory.$anonfun$datacube_seq$1(PyramidFactory.scala:193)\n\tat org.openeo.geotrellissentinelhub.MemoizedRlGuardAdapterCachedAccessTokenWithAuthApiFallbackAuthorizer.authorized(Authorizer.scala:46)\n\tat org.openeo.geotrellissentinelhub.PyramidFactory.authorized(PyramidFactory.scala:56)\n\tat org.openeo.geotrellissentinelhub.PyramidFactory.org$openeo$geotrellissentinelhub$PyramidFactory$$getTile$1(PyramidFactory.scala:191)\n\tat org.openeo.geotrellissentinelhub.PyramidFactory.org$openeo$geotrellissentinelhub$PyramidFactory$$dataTile$1(PyramidFactory.scala:201)\n\tat org.openeo.geotrellissentinelhub.PyramidFactory.loadMasked$1(PyramidFactory.scala:226)\n\tat org.openeo.geotrellissentinelhub.PyramidFactory.$anonfun$datacube_seq$16(PyramidFactory.scala:283)\n\tat scala.collection.Iterator$$anon$10.next(Iterator.scala:459)\n\tat scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:512)\n\tat scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)\n\tat scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)\n\tat scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)\n\tat scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)\n\tat scala.collection.Iterator.foreach(Iterator.scala:941)\n\tat scala.collection.Iterator.foreach$(Iterator.scala:941)\n\tat scala.collection.AbstractIterator.foreach(Iterator.scala:1429)\n\tat org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:307)\n\tat org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:670)\n\tat 
org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:424)\n\tat org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2019)\n\tat org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:259)\nCaused by: java.net.SocketTimeoutException: connect timed out\n\tat java.base/java.net.PlainSocketImpl.socketConnect(Native Method)\n\tat java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:412)\n\tat java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:255)\n\tat java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:237)\n\tat java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)\n\tat java.base/java.net.Socket.connect(Socket.java:609)\n\tat java.base/sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:300)\n\tat java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:177)\n\tat java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:474)\n\tat java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:569)\n\tat java.base/sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:266)\n\tat java.base/sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:373)\n\tat java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:203)\n\tat 
java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1187)\n\tat java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1081)\n\tat java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:189)\n\tat java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1592)\n\tat java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1520)\n\tat java.base/java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:527)\n\tat java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:334)\n\tat scalaj.http.HttpRequest.doConnection(Http.scala:367)\n\tat scalaj.http.HttpRequest.exec(Http.scala:343)\n\tat org.openeo.geotrellissentinelhub.DefaultProcessApi.$anonfun$getTile$7(ProcessApi.scala:120)\n\tat org.openeo.geotrellissentinelhub.package$$anon$1.get(package.scala:60)\n\tat net.jodah.failsafe.Functions.lambda$get$0(Functions.java:46)\n\tat net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:65)\n\tat net.jodah.failsafe.Execution.executeSync(Execution.java:128)\n\tat net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:378)\n\t... 
24 more\n\nDriver stacktrace:\n\tat org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2403)\n\tat org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2352)\n\tat org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2351)\n\tat scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)\n\tat scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)\n\tat scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)\n\tat org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2351)\n\tat org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1109)\n\tat org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1109)\n\tat scala.Option.foreach(Option.scala:407)\n\tat org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1109)\n\tat org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2591)\n\tat org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2533)\n\tat org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2522)\n\tat org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)\n\tat org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:898)\n\tat org.apache.spark.SparkContext.runJob(SparkContext.scala:2214)\n\tat org.apache.spark.SparkContext.runJob(SparkContext.scala:2235)\n\tat org.apache.spark.SparkContext.runJob(SparkContext.scala:2254)\n\tat org.apache.spark.SparkContext.runJob(SparkContext.scala:2279)\n\tat org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1030)\n\tat org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)\n\tat org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)\n\tat org.apache.spark.rdd.RDD.withScope(RDD.scala:414)\n\tat 
org.apache.spark.rdd.RDD.collect(RDD.scala:1029)\n\tat org.openeo.geotrellis.geotiff.package$.saveRDDTemporal(package.scala:136)\n\tat org.openeo.geotrellis.geotiff.package.saveRDDTemporal(package.scala)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.base/java.lang.reflect.Method.invoke(Method.java:566)\n\tat py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)\n\tat py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\n\tat py4j.Gateway.invoke(Gateway.java:282)\n\tat py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)\n\tat py4j.commands.CallCommand.execute(CallCommand.java:79)\n\tat py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)\n\tat py4j.ClientServerConnection.run(ClientServerConnection.java:106)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: net.jodah.failsafe.FailsafeException: java.net.SocketTimeoutException: connect timed out\n\tat net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:385)\n\tat net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:68)\n\tat org.openeo.geotrellissentinelhub.package$.withRetries(package.scala:59)\n\tat org.openeo.geotrellissentinelhub.DefaultProcessApi.getTile(ProcessApi.scala:119)\n\tat org.openeo.geotrellissentinelhub.PyramidFactory.$anonfun$datacube_seq$1(PyramidFactory.scala:193)\n\tat org.openeo.geotrellissentinelhub.MemoizedRlGuardAdapterCachedAccessTokenWithAuthApiFallbackAuthorizer.authorized(Authorizer.scala:46)\n\tat org.openeo.geotrellissentinelhub.PyramidFactory.authorized(PyramidFactory.scala:56)\n\tat org.openeo.geotrellissentinelhub.PyramidFactory.org$openeo$geotrellissentinelhub$PyramidFactory$$getTile$1(PyramidFactory.scala:191)\n\tat 
org.openeo.geotrellissentinelhub.PyramidFactory.org$openeo$geotrellissentinelhub$PyramidFactory$$dataTile$1(PyramidFactory.scala:201)\n\tat org.openeo.geotrellissentinelhub.PyramidFactory.loadMasked$1(PyramidFactory.scala:226)\n\tat org.openeo.geotrellissentinelhub.PyramidFactory.$anonfun$datacube_seq$16(PyramidFactory.scala:283)\n\tat scala.collection.Iterator$$anon$10.next(Iterator.scala:459)\n\tat scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:512)\n\tat scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)\n\tat scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)\n\tat scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)\n\tat scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)\n\tat scala.collection.Iterator.foreach(Iterator.scala:941)\n\tat scala.collection.Iterator.foreach$(Iterator.scala:941)\n\tat scala.collection.AbstractIterator.foreach(Iterator.scala:1429)\n\tat org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:307)\n\tat org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:670)\n\tat org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:424)\n\tat org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2019)\n\tat org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:259)\nCaused by: java.net.SocketTimeoutException: connect timed out\n\tat java.base/java.net.PlainSocketImpl.socketConnect(Native Method)\n\tat java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:412)\n\tat java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:255)\n\tat java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:237)\n\tat java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)\n\tat java.base/java.net.Socket.connect(Socket.java:609)\n\tat 
java.base/sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:300)\n\tat java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:177)\n\tat java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:474)\n\tat java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:569)\n\tat java.base/sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:266)\n\tat java.base/sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:373)\n\tat java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:203)\n\tat java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1187)\n\tat java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1081)\n\tat java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:189)\n\tat java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1592)\n\tat 
java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1520)\n\tat java.base/java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:527)\n\tat java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:334)\n\tat scalaj.http.HttpRequest.doConnection(Http.scala:367)\n\tat scalaj.http.HttpRequest.exec(Http.scala:343)\n\tat org.openeo.geotrellissentinelhub.DefaultProcessApi.$anonfun$getTile$7(ProcessApi.scala:120)\n\tat org.openeo.geotrellissentinelhub.package$$anon$1.get(package.scala:60)\n\tat net.jodah.failsafe.Functions.lambda$get$0(Functions.java:46)\n\tat net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:65)\n\tat net.jodah.failsafe.Execution.executeSync(Execution.java:128)\n\tat net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:378)\n\t... 24 more\n\n"}],"links":[]}
Also note that, because of our processing stack, this is a pretty complex stack trace: part of it is a Python stack trace and part of it is a Java/Scala stack trace, both of which can have multiple phases (e.g. an exception handler that raises another exception).
That makes it pretty hard to come up with some useful |
It would already help to extract the actual error message and put the stack trace in an array/object-style structure so that the component can render it in a more structured way.
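For messages that already arrive as one multi-line string, a client or back-end could attempt that extraction with a small parser. This is a rough sketch under several assumptions: `split_message` and `FRAME_RE` are hypothetical names, the first line is treated as the summary, and only standard CPython `File "...", line N, in func` frame headers are recognised. Java/Scala `at ...` lines are skipped here.

```python
import re

# Matches one frame header of a standard CPython traceback.
FRAME_RE = re.compile(
    r'^\s*File "(?P<file>.+)", line (?P<line>\d+), in (?P<func>.+)$'
)


def split_message(message: str) -> dict:
    """Split a multi-line log message into a summary and Python frames."""
    lines = message.split("\n")
    frames = []
    for line in lines[1:]:
        m = FRAME_RE.match(line)
        if m:
            frames.append(
                {"file": m["file"], "line": int(m["line"]), "text": f"in {m['func']}"}
            )
    # Assumption: the first line carries the human-readable summary.
    return {"message": lines[0], "stacktrace": frames}


raw = (
    "error processing batch job\n"
    "Traceback (most recent call last):\n"
    '  File "batch_job.py", line 319, in main\n'
    "    run_driver()\n"
    '  File "batch_job.py", line 292, in run_driver\n'
    "    run_job(\n"
)
parsed = split_message(raw)
```

With the shortened sample above, `parsed["stacktrace"]` holds the two `batch_job.py` frames and `parsed["message"]` holds the first line. A mixed Python and Java/Scala trace would need more rules than this.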
* Clarified the formatting of the `message` property. #455
When a batch job fails in the VITO back-end, the error log typically contains a multi-line stack trace.
We currently encode the newlines (JSON style) as `\n` in the message string.
This `\n` is currently an ad-hoc solution in terms of the openEO API, because the API does not specify how things like multi-line log messages should be encoded. Can we standardize this in some way, so that all clients can build on it?
E.g. the web editor component collapses all whitespace (newlines and indentation), which makes user support painful.
And in the Python client you also have to be careful to get a useful render of the log message.
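To illustrate the encoding and what a careful render could look like on the client side, here is a small sketch. It uses only the standard library; `render_logs` is a hypothetical helper, not part of the openEO Python client, and the sample payload is a shortened version of the kind of response discussed in this issue.

```python
import json
import textwrap

# Serialising a message with real newlines yields the JSON escape \n on the wire.
payload = json.dumps({
    "logs": [{
        "id": "error",
        "level": "error",
        "message": (
            "error processing batch job\n"
            "Traceback (most recent call last):\n"
            '  File "batch_job.py", line 319, in main'
        ),
    }],
    "links": [],
})


def render_logs(response: dict) -> str:
    """Render log entries as text, keeping newlines and indentation."""
    blocks = []
    for entry in response["logs"]:
        first, _, rest = entry["message"].partition("\n")
        block = f"[{entry['level'].upper()}] {first}"
        if rest:
            # Indent continuation lines so they stay visually attached.
            block += "\n" + textwrap.indent(rest, "    ")
        blocks.append(block)
    return "\n".join(blocks)


rendered = render_logs(json.loads(payload))
print(rendered)
```

Decoding with `json.loads` restores the real newlines, so the remaining work is purely presentational: the renderer must not collapse whitespace.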