
Error: Failed to read the message size from the input stream #25

Open
PovilasKud opened this issue Sep 13, 2018 · 7 comments

Comments

@PovilasKud

PROBLEM: I ran a batch of 4301 requests against a REST API endpoint, getting back JSON responses, and the transformation crashes with the error "Failed to read the message size from the input stream".

Full traceback:

```
2018-09-12 12:24:40.337 INFO <Thread-404> [/root/.kettle/data/main_py_job.kjb file:///root/.kettle/data/P_get_data.ktr] Dispatching started for transformation [P_get_data]
2018-09-12 12:24:40.355 INFO <P_get_data - Get rows from result> [/root/.kettle/data/main_py_job.kjb file:///root/.kettle/data/P_get_data.ktr] Finished processing (I=0, O=0, R=4301, W=4301, U=0, E=0)
2018-09-12 13:01:09.892 ERROR <P_get_data - Get Profile Data> [/root/.kettle/data/main_py_job.kjb file:///root/.kettle/data/P_get_data.ktr] Unexpected error
2018-09-12 13:01:09.893 ERROR <P_get_data - Get Profile Data> [/root/.kettle/data/main_py_job.kjb file:///root/.kettle/data/P_get_data.ktr] org.pentaho.di.core.exception.KettleException: java.io.IOException: Failed to read the message size from the input stream!
Failed to read the message size from the input stream!

    at org.pentaho.python.ServerUtils.receiveRowsFromPandasDataFrame(ServerUtils.java:591)
    at org.pentaho.python.PythonSession.rowsFromPythonDataFrame(PythonSession.java:462)
    at org.pentaho.di.trans.steps.cpythonscriptexecutor.CPythonScriptExecutorData.constructOutputRowsFromFrame(CPythonScriptExecutorData.java:238)
    at org.pentaho.di.trans.steps.cpythonscriptexecutor.CPythonScriptExecutor.executeScriptAndProcessResult(CPythonScriptExecutor.java:367)
    at org.pentaho.di.trans.steps.cpythonscriptexecutor.CPythonScriptExecutor.processBatch(CPythonScriptExecutor.java:284)
    at org.pentaho.di.trans.steps.cpythonscriptexecutor.CPythonScriptExecutor.processRow(CPythonScriptExecutor.java:243)
    at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
    at java.lang.Thread.run(Thread.java:748)

Caused by: java.io.IOException: Failed to read the message size from the input stream!
    at org.pentaho.python.ServerUtils.readDelimitedFromInputStream(ServerUtils.java:921)
    at org.pentaho.python.ServerUtils.receiveRowsFromPandasDataFrame(ServerUtils.java:587)
    ... 7 more

2018-09-12 13:01:09.896 ERROR <Thread-404> [/root/.kettle/data/main_py_job.kjb file:///root/.kettle/data/P_get_data.ktr] Errors detected!
2018-09-12 13:01:09.897 INFO <P_get_data - Get Profile Data> [/root/.kettle/data/main_py_job.kjb file:///root/.kettle/data/P_get_data.ktr] Finished processing (I=0, O=0, R=4301, W=0, U=0, E=1)
2018-09-12 13:01:09.898 WARN <P_get_data - Get Profile Data> [/root/.kettle/data/main_py_job.kjb file:///root/.kettle/data/P_get_data.ktr] Transformation detected one or more steps with errors.
2018-09-12 13:01:09.899 WARN <P_get_data - Get Profile Data> [/root/.kettle/data/main_py_job.kjb file:///root/.kettle/data/P_get_data.ktr] Transformation is killing the other steps!
```

I have checked server resources (CPU, RAM), and the error doesn't appear to be resource-related.

What might be the problem?

@m-a-hall
Contributor

This often happens when the Python script fails to execute, or when there is some sort of catastrophic failure in the Python micro-service (though the latter more often results in broken-socket errors). Can you run your scripts successfully outside of PDI?

@usbrandon

I have this problem too. My script does execute outside of PDI.

@laercioleo

I get the same error. The Python script works outside of Pentaho.

Another situation: I have a Python script that works in Pentaho. If I change it and it fails during execution, it keeps giving the same error even after I revert to the previous, working version. But if I close and reopen Pentaho, the error goes away.

@m-a-hall
Contributor

If you are trying to retrieve the contents of a variable from Python (to pass on downstream in PDI), then the variable must be JSON-serializable. If it is not, this can cause a communications failure with the micro-service. E.g. NumPy arrays are not JSON-serializable, so they need to be converted to a list before they can be retrieved from Python into the step.
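A minimal sketch of the point above (assuming `numpy` is available; the variable name is illustrative): a raw NumPy array fails `json.dumps`, while the same data converted with `.tolist()` serializes cleanly.

```python
import json

import numpy as np

result = np.arange(5)  # illustrative output variable from a Python script

# A raw ndarray is not JSON-serializable, so retrieving it from the
# step would break communication with the micro-service.
try:
    json.dumps(result)
except TypeError:
    pass  # "Object of type ndarray is not JSON serializable"

# Converting to a plain Python list makes the value serializable.
result = result.tolist()
print(json.dumps(result))  # [0, 1, 2, 3, 4]
```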

@oddworldng

I had the same problem and the solution was normalizing all Python strings with accent marks using "unicodedata" library.

https://docs.python.org/3/library/unicodedata.html#unicodedata.normalize
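A minimal sketch of that workaround (the helper name is mine): NFKD normalization decomposes each accented character into a base letter plus a combining mark, and filtering out the combining marks leaves plain ASCII letters.

```python
import unicodedata


def strip_accents(text: str) -> str:
    # NFKD splits 'í' into 'i' + a combining acute accent;
    # dropping combining characters keeps only the base letters.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("añadir las columnas del índice"))
# anadir las columnas del indice
```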

@qqwerty221

Entering a manual Python script that contains Chinese characters also triggers this error:

```python
file_obj = getObjectByPath('/test_folder')  # works
file_obj = getObjectByPath('/脚本')  # raises the exception
```
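Given the reports above, one quick sanity check before pasting a script into the step is to scan it for non-ASCII characters (this is a hypothetical helper, not part of PDI):

```python
def find_non_ascii(script: str):
    # Return (index, character) pairs for every non-ASCII character,
    # e.g. accented letters or CJK characters that reportedly break the step.
    return [(i, ch) for i, ch in enumerate(script) if ord(ch) > 127]


print(find_non_ascii("file_obj = getObjectByPath('/test_folder')"))  # []
print(find_non_ascii("file_obj = getObjectByPath('/脚本')"))
```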

@cdm-tao

cdm-tao commented Jun 27, 2024

This error also occurs if a comment in the Python code includes accented characters. For example:
# Llamar a DataFrame.reset_index para añadir las columnas del índice

will cause the "Failed to read message size..." error. Pentaho PDI then needs to be relaunched; otherwise subsequent runs of tasks containing CPython nodes provoke the error "Software caused connection abort: socket write error", even after the original code has been modified to no longer include accents.
