-
Notifications
You must be signed in to change notification settings - Fork 7.1k
import custom class safely in controller by using runtime env #56855
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request correctly defers the loading of the custom request router to the replica, which resolves the issue of ModuleNotFoundError in the controller. The change from serializing/deserializing the class to normalizing it to a path and importing it on demand is a good approach. I've added a couple of comments to update the docstrings which are now outdated due to these changes. Also, it seems the _serialized_request_router_cls attribute in RequestRouterConfig is now obsolete and could be removed in a follow-up to improve code clarity.
|
I think this still wont work given runtime-env for proxy is not from application |
zcin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the intention to explicitly only support this for non-ingress deployments then? (unless they make it importable from the image)
We would encourage users to use
|
Signed-off-by: abrar <[email protected]>
773063f to
e97f7ba
Compare
Signed-off-by: abrar <[email protected]>
Signed-off-by: abrar <[email protected]>
Signed-off-by: abrar <[email protected]>
eicherseiji
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @abrarsheikh!
Is the intention to explicitly only support this for non-ingress deployments then? (unless they make it importable from the image)
We would encourage users to use
- Custom routers shipped with serve, there wont be that many of these to be honest.
- If they build their own, then we make these nuances clear to them in the documentation.
@abrarsheikh would it be possible to add the docs update to this PR as well?
I was planning to do that in a follow up PR, this once already quite long |
Signed-off-by: abrar <[email protected]>
eicherseiji
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @abrarsheikh!
Signed-off-by: abrar <[email protected]>
Signed-off-by: abrar <[email protected]>
Signed-off-by: abrar <[email protected]>
Signed-off-by: abrar <[email protected]>
Signed-off-by: abrar <[email protected]>
This sounds reasonable to me assuming that there will be a very good error message if ser/de fails here that tells people how to set up the more advanced distribution pattern (build into the image). Is it already supported to pass a string import path rather than the class directly? |
Not in this PR, but in a follow up PR i will write serve docs on how to do this. And that is where will point users to in the exception.
unfortunately, yes |
Signed-off-by: abrar <[email protected]>
Signed-off-by: abrar <[email protected]>
Signed-off-by: abrar <[email protected]>
Signed-off-by: abrar <[email protected]>
Signed-off-by: abrar <[email protected]>
…oject#56855) currently the `RequestRouterConfig` is initialized from the controller. When `RequestRouterConfig` is initialized, we try to import the Class and serialize it. This is unsafe because the controller does not have the application runtime-env. See the following stack trace for evidence of how `RequestRouterConfig` is initialized in controller ``` Unexpected error occurred while applying config for application 'app1': Traceback (most recent call last): File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/serve/_private/application_state.py", line 691, in _reconcile_build_app_task overrided_infos = override_deployment_info( ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/serve/_private/application_state.py", line 1333, in override_deployment_info override_options["deployment_config"] = DeploymentConfig(**original_options) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/pydantic/v1/main.py", line 339, in __init__ values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/pydantic/v1/main.py", line 1074, in validate_model v_, errors_ = field.validate(value, values, loc=field.alias, cls=cls_) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/pydantic/v1/fields.py", line 881, in validate v, errors = self._validate_singleton(v, values, loc, cls) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/pydantic/v1/fields.py", line 1098, in _validate_singleton return self._apply_validators(v, values, loc, cls, self.validators) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/pydantic/v1/fields.py", line 1154, in _apply_validators v = validator(cls, v, values, self, self.model_config) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/pydantic/v1/class_validators.py", line 337, in <lambda> return lambda cls, v, values, field, config: validator(v) ^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/pydantic/v1/main.py", line 711, in validate return cls(**value) ^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/serve/config.py", line 135, in __init__ self._serialize_request_router_cls() File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/serve/config.py", line 151, in _serialize_request_router_cls request_router_class = import_attr(request_router_path) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/_common/utils.py", line 44, in import_attr module = importlib.import_module(module_name) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/importlib/__init__.py", line 90, in import_module return _bootstrap._gcd_import(name[level:], package, level) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "<frozen importlib._bootstrap>", line 1387, in _gcd_import File "<frozen importlib._bootstrap>", line 1360, in _find_and_load File "<frozen importlib._bootstrap>", line 1324, in _find_and_load_unlocked ModuleNotFoundError: No module named 'custom_request_router' ``` This PR import the custom class from `build_serve_application` so that the operation is performed in context of runtime env The same idea applies to the deployment level autoscaling function and application-level autoscaling function. Another issue happens because `cloudpickle` sees the `UniformRequestRouter` class as **a top-level symbol in an importable module** (`custom_request_router`). When `cloudpickle.dumps()` runs, it recognizes that the object can be **re-imported** via `import custom_request_router; custom_request_router.UniformRequestRouter`, so instead of embedding the actual code, it just stores that import reference. That’s why the pickle bytes only contain `b'...custom_request_router...UniformRequestRouter...'` — no source or bytecode. but we want `cloudpickle` to embed the class definition instead of just referencing it, call: ```python import importlib, cloudpickle mod = importlib.import_module("custom_request_router") cloudpickle.register_pickle_by_value(mod) payload = cloudpickle.dumps(mod.UniformRequestRouter) ``` This tells `cloudpickle` to serialize everything from that module **by value (with code included)** rather than **by reference**. We cannot rely on the reference because custom request router class can be used in serve proxy running outside of runtime env, hence having the entire bytecode make `cloudpickle.loads` possible Changes ``` request_router_config: RequestRouterConfig = Field( default=RequestRouterConfig(), update_type=DeploymentOptionUpdateType.NeedsActorReconfigure, ) ``` to ``` request_router_config: RequestRouterConfig = Field( default_factory=RequestRouterConfig, update_type=DeploymentOptionUpdateType.NeedsActorReconfigure, ) ``` because `RequestRouterConfig` was getting initialized at import time and causing a problem is the docs build. also changed ``` created_at: float = field(default_factory=lambda: time.time()) ``` ``` self._get_curr_time_s = ( get_curr_time_s if get_curr_time_s is not None else lambda: time.time() ) ``` because a reference to `time.time` is not serializable by cloudpickle --------- Signed-off-by: abrar <[email protected]> Signed-off-by: xgui <[email protected]>
## Description 1. Update docs 2. catch exception and redirect users to docs ## Related issues #56855 ## Additional information Hard to write tests for this situation. Manually verified that this is the right exception to catch ``` try: cloudpickle.loads(b'\x80\x05\x95\xbc\x03\x00\x00\x00\x00\x00\x00\x8c\x1bray.cloudpickle.cloudpickle\x94\x8c\x0e_make_function\x94\x93\x94(h\x00\x8c\r_builtin_type\x94\x93\x94\x8c\x08CodeType\x94\x85\x94R\x94(K\x01K\x00K\x00K\x03K\x03KCCnt\x00\xa0\x01\xa1\x00}\x01|\x01j\x02}\x02t\x03t\x04\x83\x01\x01\x00d\x01|\x02\x04\x00\x03\x00k\x01r*d\x02k\x00r:n\x04\x01\x00n\x0cd\x03d\x04d\x05i\x01f\x02S\x00d\x06|\x02\x04\x00\x03\x00k\x01rNd\x07k\x00r^n\x04\x01\x00n\x0cd\x08d\x04d\ti\x01f\x02S\x00d\nd\x04d\x0bi\x01f\x02S\x00d\x00S\x00\x94(NK\tK\x11K\x02\x8c\x06reason\x94\x8c\x0eBusiness hours\x94K\x12K\x14K\x04\x8c\x18Evening batch processing\x94K\x01\x8c\x0eOff-peak hours\x94t\x94(\x8c\x08datetime\x94\x8c\x03now\x94\x8c\x04hour\x94\x8c\x05print\x94\x8c\x04avro\x94t\x94\x8c\x03ctx\x94\x8c\x0ccurrent_time\x94\x8c\x0ccurrent_hour\x94\x87\x94\x8c\x1b/home/ubuntu/apps/policy.py\x94\x8c!scheduled_batch_processing_policy\x94K\x0eC\x10\x00\x03\x08\x01\x06\x01\x08\x02\x18\x01\x0c\x02\x18\x01\x0c\x03\x94))t\x94R\x94}\x94(\x8c\x0b__package__\x94N\x8c\x08__name__\x94\x8c\x08__main__\x94\x8c\x08__file__\x94h\x18uNNNt\x94R\x94h\x00\x8c\x12_function_setstate\x94\x93\x94h#}\x94}\x94(h\x1fh\x19\x8c\x0c__qualname__\x94h\x19\x8c\x0f__annotations__\x94}\x94(h\x14\x8c\x10ray.serve.config\x94\x8c\x12AutoscalingContext\x94\x93\x94\x8c\x06return\x94h\x04\x8c\x0cGenericAlias\x94\x85\x94R\x94\x8c\x08builtins\x94\x8c\x05tuple\x94\x93\x94h2\x8c\x03int\x94\x93\x94\x8c\t_operator\x94\x8c\x07getitem\x94\x93\x94\x8c\x06typing\x94\x8c\x04Dict\x94\x93\x94h2\x8c\x03str\x94\x93\x94h:\x8c\x03Any\x94\x93\x94\x86\x94\x86\x94R\x94\x86\x94\x86\x94R\x94u\x8c\x0e__kwdefaults__\x94N\x8c\x0c__defaults__\x94N\x8c\n__module__\x94h \x8c\x07__doc__\x94N\x8c\x0b__closure__\x94N\x8c\x17_cloudpickle_submodules\x94]\x94\x8c\x0b__globals__\x94}\x94(h\x0eh\x0e\x8c\x08datetime\x94\x93\x94h\x12h\x00\x8c\tsubimport\x94\x93\x94h\x12\x85\x94R\x94uu\x86\x94\x86R0.') except (ModuleNotFoundError, ImportError) as e: print(f"caused by {e} {type(e)}") ``` ``` ❯ python policy.py caused by No module named 'avro' <class 'ModuleNotFoundError'> ``` --------- Signed-off-by: abrar <[email protected]>
…oject#56855) currently the `RequestRouterConfig` is initialized from the controller. When `RequestRouterConfig` is initialized, we try to import the Class and serialize it. This is unsafe because the controller does not have the application runtime-env. See the following stack trace for evidence of how `RequestRouterConfig` is initialized in controller ``` Unexpected error occurred while applying config for application 'app1': Traceback (most recent call last): File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/serve/_private/application_state.py", line 691, in _reconcile_build_app_task overrided_infos = override_deployment_info( ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/serve/_private/application_state.py", line 1333, in override_deployment_info override_options["deployment_config"] = DeploymentConfig(**original_options) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/pydantic/v1/main.py", line 339, in __init__ values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/pydantic/v1/main.py", line 1074, in validate_model v_, errors_ = field.validate(value, values, loc=field.alias, cls=cls_) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/pydantic/v1/fields.py", line 881, in validate v, errors = self._validate_singleton(v, values, loc, cls) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/pydantic/v1/fields.py", line 1098, in _validate_singleton return self._apply_validators(v, values, loc, cls, self.validators) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/pydantic/v1/fields.py", line 1154, in _apply_validators v = validator(cls, v, values, self, self.model_config) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/pydantic/v1/class_validators.py", line 337, in <lambda> return lambda cls, v, values, field, config: validator(v) ^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/pydantic/v1/main.py", line 711, in validate return cls(**value) ^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/serve/config.py", line 135, in __init__ self._serialize_request_router_cls() File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/serve/config.py", line 151, in _serialize_request_router_cls request_router_class = import_attr(request_router_path) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/_common/utils.py", line 44, in import_attr module = importlib.import_module(module_name) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/importlib/__init__.py", line 90, in import_module return _bootstrap._gcd_import(name[level:], package, level) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "<frozen importlib._bootstrap>", line 1387, in _gcd_import File "<frozen importlib._bootstrap>", line 1360, in _find_and_load File "<frozen importlib._bootstrap>", line 1324, in _find_and_load_unlocked ModuleNotFoundError: No module named 'custom_request_router' ``` This PR import the custom class from `build_serve_application` so that the operation is performed in context of runtime env The same idea applies to the deployment level autoscaling function and application-level autoscaling function. Another issue happens because `cloudpickle` sees the `UniformRequestRouter` class as **a top-level symbol in an importable module** (`custom_request_router`). When `cloudpickle.dumps()` runs, it recognizes that the object can be **re-imported** via `import custom_request_router; custom_request_router.UniformRequestRouter`, so instead of embedding the actual code, it just stores that import reference. That’s why the pickle bytes only contain `b'...custom_request_router...UniformRequestRouter...'` — no source or bytecode. but we want `cloudpickle` to embed the class definition instead of just referencing it, call: ```python import importlib, cloudpickle mod = importlib.import_module("custom_request_router") cloudpickle.register_pickle_by_value(mod) payload = cloudpickle.dumps(mod.UniformRequestRouter) ``` This tells `cloudpickle` to serialize everything from that module **by value (with code included)** rather than **by reference**. We cannot rely on the reference because custom request router class can be used in serve proxy running outside of runtime env, hence having the entire bytecode make `cloudpickle.loads` possible Changes ``` request_router_config: RequestRouterConfig = Field( default=RequestRouterConfig(), update_type=DeploymentOptionUpdateType.NeedsActorReconfigure, ) ``` to ``` request_router_config: RequestRouterConfig = Field( default_factory=RequestRouterConfig, update_type=DeploymentOptionUpdateType.NeedsActorReconfigure, ) ``` because `RequestRouterConfig` was getting initialized at import time and causing a problem is the docs build. also changed ``` created_at: float = field(default_factory=lambda: time.time()) ``` ``` self._get_curr_time_s = ( get_curr_time_s if get_curr_time_s is not None else lambda: time.time() ) ``` because a reference to `time.time` is not serializable by cloudpickle --------- Signed-off-by: abrar <[email protected]>
## Description 1. Update docs 2. catch exception and redirect users to docs ## Related issues ray-project#56855 ## Additional information Hard to write tests for this situation. Manually verified that this is the right exception to catch ``` try: cloudpickle.loads(b'\x80\x05\x95\xbc\x03\x00\x00\x00\x00\x00\x00\x8c\x1bray.cloudpickle.cloudpickle\x94\x8c\x0e_make_function\x94\x93\x94(h\x00\x8c\r_builtin_type\x94\x93\x94\x8c\x08CodeType\x94\x85\x94R\x94(K\x01K\x00K\x00K\x03K\x03KCCnt\x00\xa0\x01\xa1\x00}\x01|\x01j\x02}\x02t\x03t\x04\x83\x01\x01\x00d\x01|\x02\x04\x00\x03\x00k\x01r*d\x02k\x00r:n\x04\x01\x00n\x0cd\x03d\x04d\x05i\x01f\x02S\x00d\x06|\x02\x04\x00\x03\x00k\x01rNd\x07k\x00r^n\x04\x01\x00n\x0cd\x08d\x04d\ti\x01f\x02S\x00d\nd\x04d\x0bi\x01f\x02S\x00d\x00S\x00\x94(NK\tK\x11K\x02\x8c\x06reason\x94\x8c\x0eBusiness hours\x94K\x12K\x14K\x04\x8c\x18Evening batch processing\x94K\x01\x8c\x0eOff-peak hours\x94t\x94(\x8c\x08datetime\x94\x8c\x03now\x94\x8c\x04hour\x94\x8c\x05print\x94\x8c\x04avro\x94t\x94\x8c\x03ctx\x94\x8c\x0ccurrent_time\x94\x8c\x0ccurrent_hour\x94\x87\x94\x8c\x1b/home/ubuntu/apps/policy.py\x94\x8c!scheduled_batch_processing_policy\x94K\x0eC\x10\x00\x03\x08\x01\x06\x01\x08\x02\x18\x01\x0c\x02\x18\x01\x0c\x03\x94))t\x94R\x94}\x94(\x8c\x0b__package__\x94N\x8c\x08__name__\x94\x8c\x08__main__\x94\x8c\x08__file__\x94h\x18uNNNt\x94R\x94h\x00\x8c\x12_function_setstate\x94\x93\x94h#}\x94}\x94(h\x1fh\x19\x8c\x0c__qualname__\x94h\x19\x8c\x0f__annotations__\x94}\x94(h\x14\x8c\x10ray.serve.config\x94\x8c\x12AutoscalingContext\x94\x93\x94\x8c\x06return\x94h\x04\x8c\x0cGenericAlias\x94\x85\x94R\x94\x8c\x08builtins\x94\x8c\x05tuple\x94\x93\x94h2\x8c\x03int\x94\x93\x94\x8c\t_operator\x94\x8c\x07getitem\x94\x93\x94\x8c\x06typing\x94\x8c\x04Dict\x94\x93\x94h2\x8c\x03str\x94\x93\x94h:\x8c\x03Any\x94\x93\x94\x86\x94\x86\x94R\x94\x86\x94\x86\x94R\x94u\x8c\x0e__kwdefaults__\x94N\x8c\x0c__defaults__\x94N\x8c\n__module__\x94h \x8c\x07__doc__\x94N\x8c\x0b__closure__\x94N\x8c\x17_cloudpickle_submodules\x94]\x94\x8c\x0b__globals__\x94}\x94(h\x0eh\x0e\x8c\x08datetime\x94\x93\x94h\x12h\x00\x8c\tsubimport\x94\x93\x94h\x12\x85\x94R\x94uu\x86\x94\x86R0.') except (ModuleNotFoundError, ImportError) as e: print(f"caused by {e} {type(e)}") ``` ``` ❯ python policy.py caused by No module named 'avro' <class 'ModuleNotFoundError'> ``` --------- Signed-off-by: abrar <[email protected]>
…oject#56855) currently the `RequestRouterConfig` is initialized from the controller. When `RequestRouterConfig` is initialized, we try to import the Class and serialize it. This is unsafe because the controller does not have the application runtime-env. See the following stack trace for evidence of how `RequestRouterConfig` is initialized in controller ``` Unexpected error occurred while applying config for application 'app1': Traceback (most recent call last): File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/serve/_private/application_state.py", line 691, in _reconcile_build_app_task overrided_infos = override_deployment_info( ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/serve/_private/application_state.py", line 1333, in override_deployment_info override_options["deployment_config"] = DeploymentConfig(**original_options) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/pydantic/v1/main.py", line 339, in __init__ values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/pydantic/v1/main.py", line 1074, in validate_model v_, errors_ = field.validate(value, values, loc=field.alias, cls=cls_) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/pydantic/v1/fields.py", line 881, in validate v, errors = self._validate_singleton(v, values, loc, cls) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/pydantic/v1/fields.py", line 1098, in _validate_singleton return self._apply_validators(v, values, loc, cls, self.validators) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/pydantic/v1/fields.py", line 1154, in _apply_validators v = validator(cls, v, values, self, self.model_config) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/pydantic/v1/class_validators.py", line 337, in <lambda> return lambda cls, v, values, field, config: validator(v) ^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/pydantic/v1/main.py", line 711, in validate return cls(**value) ^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/serve/config.py", line 135, in __init__ self._serialize_request_router_cls() File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/serve/config.py", line 151, in _serialize_request_router_cls request_router_class = import_attr(request_router_path) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/_common/utils.py", line 44, in import_attr module = importlib.import_module(module_name) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/importlib/__init__.py", line 90, in import_module return _bootstrap._gcd_import(name[level:], package, level) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "<frozen importlib._bootstrap>", line 1387, in _gcd_import File "<frozen importlib._bootstrap>", line 1360, in _find_and_load File "<frozen importlib._bootstrap>", line 1324, in _find_and_load_unlocked ModuleNotFoundError: No module named 'custom_request_router' ``` This PR import the custom class from `build_serve_application` so that the operation is performed in context of runtime env The same idea applies to the deployment level autoscaling function and application-level autoscaling function. Another issue happens because `cloudpickle` sees the `UniformRequestRouter` class as **a top-level symbol in an importable module** (`custom_request_router`). When `cloudpickle.dumps()` runs, it recognizes that the object can be **re-imported** via `import custom_request_router; custom_request_router.UniformRequestRouter`, so instead of embedding the actual code, it just stores that import reference. That’s why the pickle bytes only contain `b'...custom_request_router...UniformRequestRouter...'` — no source or bytecode. but we want `cloudpickle` to embed the class definition instead of just referencing it, call: ```python import importlib, cloudpickle mod = importlib.import_module("custom_request_router") cloudpickle.register_pickle_by_value(mod) payload = cloudpickle.dumps(mod.UniformRequestRouter) ``` This tells `cloudpickle` to serialize everything from that module **by value (with code included)** rather than **by reference**. We cannot rely on the reference because custom request router class can be used in serve proxy running outside of runtime env, hence having the entire bytecode make `cloudpickle.loads` possible Changes ``` request_router_config: RequestRouterConfig = Field( default=RequestRouterConfig(), update_type=DeploymentOptionUpdateType.NeedsActorReconfigure, ) ``` to ``` request_router_config: RequestRouterConfig = Field( default_factory=RequestRouterConfig, update_type=DeploymentOptionUpdateType.NeedsActorReconfigure, ) ``` because `RequestRouterConfig` was getting initialized at import time and causing a problem is the docs build. also changed ``` created_at: float = field(default_factory=lambda: time.time()) ``` ``` self._get_curr_time_s = ( get_curr_time_s if get_curr_time_s is not None else lambda: time.time() ) ``` because a reference to `time.time` is not serializable by cloudpickle --------- Signed-off-by: abrar <[email protected]> Signed-off-by: Aydin Abiar <[email protected]>
## Description 1. Update docs 2. catch exception and redirect users to docs ## Related issues ray-project#56855 ## Additional information Hard to write tests for this situation. Manually verified that this is the right exception to catch ``` try: cloudpickle.loads(b'\x80\x05\x95\xbc\x03\x00\x00\x00\x00\x00\x00\x8c\x1bray.cloudpickle.cloudpickle\x94\x8c\x0e_make_function\x94\x93\x94(h\x00\x8c\r_builtin_type\x94\x93\x94\x8c\x08CodeType\x94\x85\x94R\x94(K\x01K\x00K\x00K\x03K\x03KCCnt\x00\xa0\x01\xa1\x00}\x01|\x01j\x02}\x02t\x03t\x04\x83\x01\x01\x00d\x01|\x02\x04\x00\x03\x00k\x01r*d\x02k\x00r:n\x04\x01\x00n\x0cd\x03d\x04d\x05i\x01f\x02S\x00d\x06|\x02\x04\x00\x03\x00k\x01rNd\x07k\x00r^n\x04\x01\x00n\x0cd\x08d\x04d\ti\x01f\x02S\x00d\nd\x04d\x0bi\x01f\x02S\x00d\x00S\x00\x94(NK\tK\x11K\x02\x8c\x06reason\x94\x8c\x0eBusiness hours\x94K\x12K\x14K\x04\x8c\x18Evening batch processing\x94K\x01\x8c\x0eOff-peak hours\x94t\x94(\x8c\x08datetime\x94\x8c\x03now\x94\x8c\x04hour\x94\x8c\x05print\x94\x8c\x04avro\x94t\x94\x8c\x03ctx\x94\x8c\x0ccurrent_time\x94\x8c\x0ccurrent_hour\x94\x87\x94\x8c\x1b/home/ubuntu/apps/policy.py\x94\x8c!scheduled_batch_processing_policy\x94K\x0eC\x10\x00\x03\x08\x01\x06\x01\x08\x02\x18\x01\x0c\x02\x18\x01\x0c\x03\x94))t\x94R\x94}\x94(\x8c\x0b__package__\x94N\x8c\x08__name__\x94\x8c\x08__main__\x94\x8c\x08__file__\x94h\x18uNNNt\x94R\x94h\x00\x8c\x12_function_setstate\x94\x93\x94h#}\x94}\x94(h\x1fh\x19\x8c\x0c__qualname__\x94h\x19\x8c\x0f__annotations__\x94}\x94(h\x14\x8c\x10ray.serve.config\x94\x8c\x12AutoscalingContext\x94\x93\x94\x8c\x06return\x94h\x04\x8c\x0cGenericAlias\x94\x85\x94R\x94\x8c\x08builtins\x94\x8c\x05tuple\x94\x93\x94h2\x8c\x03int\x94\x93\x94\x8c\t_operator\x94\x8c\x07getitem\x94\x93\x94\x8c\x06typing\x94\x8c\x04Dict\x94\x93\x94h2\x8c\x03str\x94\x93\x94h:\x8c\x03Any\x94\x93\x94\x86\x94\x86\x94R\x94\x86\x94\x86\x94R\x94u\x8c\x0e__kwdefaults__\x94N\x8c\x0c__defaults__\x94N\x8c\n__module__\x94h \x8c\x07__doc__\x94N\x8c\x0b__closure__\x94N\x8c\x17_cloudpickle_submodules\x94]\x94\x8c\x0b__globals__\x94}\x94(h\x0eh\x0e\x8c\x08datetime\x94\x93\x94h\x12h\x00\x8c\tsubimport\x94\x93\x94h\x12\x85\x94R\x94uu\x86\x94\x86R0.') except (ModuleNotFoundError, ImportError) as e: print(f"caused by {e} {type(e)}") ``` ``` ❯ python policy.py caused by No module named 'avro' <class 'ModuleNotFoundError'> ``` --------- Signed-off-by: abrar <[email protected]> Signed-off-by: Aydin Abiar <[email protected]>
…oject#56855) currently the `RequestRouterConfig` is initialized from the controller. When `RequestRouterConfig` is initialized, we try to import the Class and serialize it. This is unsafe because the controller does not have the application runtime-env. See the following stack trace for evidence of how `RequestRouterConfig` is initialized in controller ``` Unexpected error occurred while applying config for application 'app1': Traceback (most recent call last): File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/serve/_private/application_state.py", line 691, in _reconcile_build_app_task overrided_infos = override_deployment_info( ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/serve/_private/application_state.py", line 1333, in override_deployment_info override_options["deployment_config"] = DeploymentConfig(**original_options) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/pydantic/v1/main.py", line 339, in __init__ values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/pydantic/v1/main.py", line 1074, in validate_model v_, errors_ = field.validate(value, values, loc=field.alias, cls=cls_) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/pydantic/v1/fields.py", line 881, in validate v, errors = self._validate_singleton(v, values, loc, cls) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/pydantic/v1/fields.py", line 1098, in _validate_singleton return self._apply_validators(v, values, loc, cls, self.validators) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/pydantic/v1/fields.py", line 1154, in _apply_validators v = validator(cls, v, values, self, self.model_config) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/pydantic/v1/class_validators.py", line 337, in <lambda> return lambda cls, v, values, field, config: validator(v) ^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/pydantic/v1/main.py", line 711, in validate return cls(**value) ^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/serve/config.py", line 135, in __init__ self._serialize_request_router_cls() File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/serve/config.py", line 151, in _serialize_request_router_cls request_router_class = import_attr(request_router_path) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/_common/utils.py", line 44, in import_attr module = importlib.import_module(module_name) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ray/anaconda3/lib/python3.12/importlib/__init__.py", line 90, in import_module return _bootstrap._gcd_import(name[level:], package, level) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "<frozen importlib._bootstrap>", line 1387, in _gcd_import File "<frozen importlib._bootstrap>", line 1360, in _find_and_load File "<frozen importlib._bootstrap>", line 1324, in _find_and_load_unlocked ModuleNotFoundError: No module named 'custom_request_router' ``` This PR import the custom class from `build_serve_application` so that the operation is performed in context of runtime env The same idea applies to the deployment level autoscaling function and application-level autoscaling function. Another issue happens because `cloudpickle` sees the `UniformRequestRouter` class as **a top-level symbol in an importable module** (`custom_request_router`). When `cloudpickle.dumps()` runs, it recognizes that the object can be **re-imported** via `import custom_request_router; custom_request_router.UniformRequestRouter`, so instead of embedding the actual code, it just stores that import reference. That’s why the pickle bytes only contain `b'...custom_request_router...UniformRequestRouter...'` — no source or bytecode. but we want `cloudpickle` to embed the class definition instead of just referencing it, call: ```python import importlib, cloudpickle mod = importlib.import_module("custom_request_router") cloudpickle.register_pickle_by_value(mod) payload = cloudpickle.dumps(mod.UniformRequestRouter) ``` This tells `cloudpickle` to serialize everything from that module **by value (with code included)** rather than **by reference**. We cannot rely on the reference because custom request router class can be used in serve proxy running outside of runtime env, hence having the entire bytecode make `cloudpickle.loads` possible Changes ``` request_router_config: RequestRouterConfig = Field( default=RequestRouterConfig(), update_type=DeploymentOptionUpdateType.NeedsActorReconfigure, ) ``` to ``` request_router_config: RequestRouterConfig = Field( default_factory=RequestRouterConfig, update_type=DeploymentOptionUpdateType.NeedsActorReconfigure, ) ``` because `RequestRouterConfig` was getting initialized at import time and causing a problem is the docs build. also changed ``` created_at: float = field(default_factory=lambda: time.time()) ``` ``` self._get_curr_time_s = ( get_curr_time_s if get_curr_time_s is not None else lambda: time.time() ) ``` because a reference to `time.time` is not serializable by cloudpickle --------- Signed-off-by: abrar <[email protected]> Signed-off-by: Future-Outlier <[email protected]>
## Description 1. Update docs 2. catch exception and redirect users to docs ## Related issues ray-project#56855 ## Additional information Hard to write tests for this situation. Manually verified that this is the right exception to catch ``` try: cloudpickle.loads(b'\x80\x05\x95\xbc\x03\x00\x00\x00\x00\x00\x00\x8c\x1bray.cloudpickle.cloudpickle\x94\x8c\x0e_make_function\x94\x93\x94(h\x00\x8c\r_builtin_type\x94\x93\x94\x8c\x08CodeType\x94\x85\x94R\x94(K\x01K\x00K\x00K\x03K\x03KCCnt\x00\xa0\x01\xa1\x00}\x01|\x01j\x02}\x02t\x03t\x04\x83\x01\x01\x00d\x01|\x02\x04\x00\x03\x00k\x01r*d\x02k\x00r:n\x04\x01\x00n\x0cd\x03d\x04d\x05i\x01f\x02S\x00d\x06|\x02\x04\x00\x03\x00k\x01rNd\x07k\x00r^n\x04\x01\x00n\x0cd\x08d\x04d\ti\x01f\x02S\x00d\nd\x04d\x0bi\x01f\x02S\x00d\x00S\x00\x94(NK\tK\x11K\x02\x8c\x06reason\x94\x8c\x0eBusiness hours\x94K\x12K\x14K\x04\x8c\x18Evening batch processing\x94K\x01\x8c\x0eOff-peak hours\x94t\x94(\x8c\x08datetime\x94\x8c\x03now\x94\x8c\x04hour\x94\x8c\x05print\x94\x8c\x04avro\x94t\x94\x8c\x03ctx\x94\x8c\x0ccurrent_time\x94\x8c\x0ccurrent_hour\x94\x87\x94\x8c\x1b/home/ubuntu/apps/policy.py\x94\x8c!scheduled_batch_processing_policy\x94K\x0eC\x10\x00\x03\x08\x01\x06\x01\x08\x02\x18\x01\x0c\x02\x18\x01\x0c\x03\x94))t\x94R\x94}\x94(\x8c\x0b__package__\x94N\x8c\x08__name__\x94\x8c\x08__main__\x94\x8c\x08__file__\x94h\x18uNNNt\x94R\x94h\x00\x8c\x12_function_setstate\x94\x93\x94h#}\x94}\x94(h\x1fh\x19\x8c\x0c__qualname__\x94h\x19\x8c\x0f__annotations__\x94}\x94(h\x14\x8c\x10ray.serve.config\x94\x8c\x12AutoscalingContext\x94\x93\x94\x8c\x06return\x94h\x04\x8c\x0cGenericAlias\x94\x85\x94R\x94\x8c\x08builtins\x94\x8c\x05tuple\x94\x93\x94h2\x8c\x03int\x94\x93\x94\x8c\t_operator\x94\x8c\x07getitem\x94\x93\x94\x8c\x06typing\x94\x8c\x04Dict\x94\x93\x94h2\x8c\x03str\x94\x93\x94h:\x8c\x03Any\x94\x93\x94\x86\x94\x86\x94R\x94\x86\x94\x86\x94R\x94u\x8c\x0e__kwdefaults__\x94N\x8c\x0c__defaults__\x94N\x8c\n__module__\x94h \x8c\x07__doc__\x94N\x8c\x0b__closure__\x94N\x8c\x17_cloudpickle_submodules\x94]\x94\x8c\x0b__globals__\x94}\x94(h\x0eh\x0e\x8c\x08datetime\x94\x93\x94h\x12h\x00\x8c\tsubimport\x94\x93\x94h\x12\x85\x94R\x94uu\x86\x94\x86R0.') except (ModuleNotFoundError, ImportError) as e: print(f"caused by {e} {type(e)}") ``` ``` ❯ python policy.py caused by No module named 'avro' <class 'ModuleNotFoundError'> ``` --------- Signed-off-by: abrar <[email protected]> Signed-off-by: Future-Outlier <[email protected]>
currently the
RequestRouterConfigis initialized from the controller. WhenRequestRouterConfigis initialized, we try to import the Class and serialize it. This is unsafe because the controller does not have the application runtime-env. See the following stack trace for evidence of howRequestRouterConfigis initialized in controllerThis PR import the custom class from
build_serve_applicationso that the operation is performed in context of runtime envThe same idea applies to the deployment level autoscaling function and application-level autoscaling function.
Another issue happens because
cloudpicklesees theUniformRequestRouterclass as a top-level symbol in an importable module (custom_request_router).When
cloudpickle.dumps()runs, it recognizes that the object can be re-imported viaimport custom_request_router; custom_request_router.UniformRequestRouter,so instead of embedding the actual code, it just stores that import reference.
That’s why the pickle bytes only contain
b'...custom_request_router...UniformRequestRouter...'— no source or bytecode.but we want
cloudpickleto embed the class definition instead of just referencing it, call:This tells
cloudpickleto serialize everything from that module by value (with code included) rather than by reference.We cannot rely on the reference because custom request router class can be used in serve proxy running outside of runtime env, hence having the entire bytecode make
cloudpickle.loadspossibleChanges
to
because
RequestRouterConfigwas getting initialized at import time and causing a problem is the docs build.also changed
because a reference to
time.timeis not serializable by cloudpickle