[Core] Deserialize tensorflow MultilineMessageKeyError #50138

Open
Oblynx opened this issue Jan 30, 2025 · 0 comments
Labels: bug, core, P1


Oblynx commented Jan 30, 2025

What happened + What you expected to happen

What happened

Ray throws an opaque error in place of the actual one. This is the error from Ray:

  File "/usr/local/lib/python3.11/dist-packages/ray/exceptions.py", line 45, in from_bytes
    return RayError.from_ray_exception(ray_exception)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/ray/exceptions.py", line 54, in from_ray_exception
    raise RuntimeError(msg) from e
RuntimeError: Failed to unpickle serialized exception

What should have happened

Digging in with the Ray debugger, I can catch the original exception before Ray tries to deserialize it.

  1. I expect Ray to deserialize it.
  2. If Ray can't deserialize it, I expect it to still dump it in some way in the logs -- otherwise I have no means of fixing the issue!

The exception is of type tensorflow.python.autograph.pyct.error_utils.MultilineMessageKeyError:

Traceback (most recent call last):
  File "python/ray/_raylet.pyx", line 1974, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 1879, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 1820, in ray._raylet.execute_task.function_executor
  File "/usr/local/lib/python3.11/dist-packages/ray/_private/function_manager.py", line 696, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/ray/util/tracing/tracing_helper.py", line 467, in _resume_span
    return method(self, *_args, **_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  <user frames>
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/tmp/__autograph_generated_filey6_d4pwe.py", line 76, in tf__compute_gradients
    ag__.if_stmt(ag__.ld(is_design_module), if_body_1, else_body_1, get_state_1, set_state_1, ('loss_value', 'preds'), 2)
  File "/tmp/__autograph_generated_filey6_d4pwe.py", line 58, in if_body_1
    loss_value = ag__.converted_call(ag__.ld(loss), (), dict(preds=ag__.ld(preds), inputs=ag__.ld(full_sample)), fscope)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/__autograph_generated_fileg_9kc5kn.py", line 31, in tf____call__
    raise
  File "/tmp/__autograph_generated_file6ahn6zy_.py", line 16, in tf__call
    raise
tensorflow.python.autograph.pyct.error_utils.MultilineMessageKeyError: in user code:
  <user error>

Proposed fix

  1. When deserialization fails, dump a base64-encoded copy of the pickled error to the logs so it can be recovered.
  2. Could Ray use cloudpickle so that more errors can be deserialized?
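A minimal sketch of what the first point could look like. `load_exception_or_dump` is a hypothetical helper written for illustration, not an existing Ray function, and it uses plain `pickle` on raw bytes rather than Ray's actual exception envelope:

```python
import base64
import logging
import pickle

logger = logging.getLogger(__name__)


def load_exception_or_dump(payload: bytes):
    """Hypothetical helper (not Ray API): unpickle, or keep the bytes recoverable."""
    try:
        return pickle.loads(payload)
    except Exception as e:
        # Base64 keeps the raw pickle printable so it can be copied out of a log.
        encoded = base64.b64encode(payload).decode("ascii")
        logger.error(
            "Failed to unpickle serialized exception; base64 payload: %s", encoded
        )
        raise RuntimeError(
            "Failed to unpickle serialized exception; base64 payload: " + encoded
        ) from e
```

With something like this, the payload can be decoded offline with `base64.b64decode` and inspected in an environment where the original exception class is importable.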

Similar issues

Versions / Dependencies

  • Ray 2.40.0
  • Python 3.11.11

Reproduction script

Not sure how to produce an unserializable exception without TensorFlow.
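One possible TensorFlow-free starting point, assuming the failure comes from an exception whose constructor takes more arguments than it stores in `args` (a guess about the mechanism, not confirmed here). `TwoArgKeyError` and `unpickle_fails` are hypothetical names made up for this sketch:

```python
import pickle


class TwoArgKeyError(KeyError):
    """Hypothetical stand-in, not the TensorFlow class."""

    def __init__(self, message, original_key):
        # Only one of the two constructor arguments ends up in self.args.
        super().__init__(original_key)
        self._message = message

    def __str__(self):
        return self._message


def unpickle_fails(exc):
    """Return True if exc pickles fine but cannot be unpickled."""
    payload = pickle.dumps(exc)  # pickling stores the class plus exc.args
    try:
        pickle.loads(payload)  # re-creates the object by calling cls(*exc.args)
    except TypeError:
        # The constructor was called with too few arguments.
        return True
    return False
```

Here `unpickle_fails(TwoArgKeyError("in user code: ...", "k"))` should report a failed round trip, while a plain `KeyError("k")` round-trips. Raising `TwoArgKeyError` from a Ray task or actor method would be the next step; whether that produces the same `RuntimeError` on Ray 2.40.0 is untested.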

Issue Severity

Medium: It is a significant difficulty but I can work around it.

@Oblynx added the bug and triage labels Jan 30, 2025
@jcotant1 added the core label Jan 31, 2025
@jjyao added the P1 label and removed the triage label Feb 3, 2025