[core] Use graceful shutdown path when actor OUT_OF_SCOPE (del actor)
#57090
Merged
24 commits
bc2d24f [core] Use graceful shutdown path when actor OUT_OF_SCOPE (`del actor`) (codope)
efbe38f add timeout for graceful actor cleanup with fallback to kill (codope)
7b827dc keep a map of timer by actor, handle actor restart and cancel timer w… (codope)
d3e7aca defer created_actors_ cleanup for grceful shutdown; fix doc lint (codope)
fb6b05e fixed workerID verification and infinite timeout case (codope)
35802e1 skip doctest for code snippet (codope)
ef7f98e address test and other minor comments (codope)
e454b70 fix test fixture (codope)
ef861fc created_actors_ cleanup eagerly and simplify timer callback (codope)
0bfa458 fix new test fix (codope)
f56c9d5 Update comment in ray_config_def.h (codope)
b320379 Merge remote-tracking branch 'origin/master' into ray-shutdown-del-actor (codope)
c33b2f8 Merge branch 'master' into ray-shutdown-del-actor (codope)
a45cbaa Merge remote-tracking branch 'origin/master' into ray-shutdown-del-actor (codope)
ac164aa address timer creation, move seq, erase, shutdownflag (codope)
c3d20b4 Merge remote-tracking branch 'origin/master' into ray-shutdown-del-actor (codope)
a3c6f3c resolve conflict (codope)
5454dc5 Merge remote-tracking branch 'origin/master' into ray-shutdown-del-actor (codope)
da1af4a clarify comments in actor manager wrt timer (codope)
362353a clarify doc notes and add testable code snippet (codope)
62f67e0 Merge remote-tracking branch 'origin/master' into ray-shutdown-del-actor (codope)
0803797 use weak_ptr` for gcs actor manager and remove is_shutdown_ flag (codope)
3bf9e04 Merge remote-tracking branch 'origin/master' into ray-shutdown-del-actor (codope)
27cc79d revert comments (codope)
@@ -3,7 +3,9 @@ Terminating Actors
 
 Actor processes will be terminated automatically when all copies of the
 actor handle have gone out of scope in Python, or if the original creator
-process dies.
+process dies. When actors terminate gracefully, Ray calls the actor's
+``__ray_shutdown__()`` method if defined, allowing for cleanup of resources
+(see :ref:`actor-cleanup` for details).
 
 Note that automatic termination of actors is not yet supported in Java or C++.
 
@@ -33,9 +35,8 @@ manually destroyed.
             actor_handle = Actor.remote()
 
             ray.kill(actor_handle)
-            # This will not go through the normal Python sys.exit
-            # teardown logic, so any exit handlers installed in
-            # the actor using ``atexit`` will not be called.
+            # Force kill: the actor exits immediately without cleanup.
+            # This will NOT call __ray_shutdown__() or atexit handlers.
 
 
     .. tab-item:: Java
@@ -191,3 +192,59 @@ You could see the actor is dead as a result of the user's `exit_actor()` call:
 is_detached: false
 placement_group_id: null
 repr_name: ''
+
+
+.. _actor-cleanup:
+
+Actor cleanup with `__ray_shutdown__`
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When an actor terminates gracefully, Ray calls the ``__ray_shutdown__()`` method
+if it exists, allowing cleanup of resources like database connections or file handles.
+
+.. tab-set::
+
+    .. tab-item:: Python
+
+        .. testcode::
+
+            import ray
+            import tempfile
+            import os
+
+            @ray.remote
+            class FileProcessorActor:
+                def __init__(self):
+                    self.temp_file = tempfile.NamedTemporaryFile(delete=False)
+                    self.temp_file.write(b"processing data")
+                    self.temp_file.flush()
+
+                def __ray_shutdown__(self):
+                    # Clean up temporary file
+                    if hasattr(self, 'temp_file'):
+                        self.temp_file.close()
+                        os.unlink(self.temp_file.name)
+
+                def process(self):
+                    return "done"
+
+            actor = FileProcessorActor.remote()
+            ray.get(actor.process.remote())
+            del actor  # __ray_shutdown__() is called automatically
+
+When ``__ray_shutdown__()`` is called:
+
+- **Automatic termination**: When all actor handles go out of scope (``del actor`` or natural scope exit)
+- **Manual graceful termination**: When you call ``actor.__ray_terminate__.remote()``
+
+When ``__ray_shutdown__()`` is **NOT** called:
+
+- **Force kill**: When you use ``ray.kill(actor)`` - the actor is killed immediately without cleanup.
Collaborator: In the future we should introduce a
Contributor (Author): agree; will create a followup
+- **Unexpected termination**: When the actor process crashes or exits unexpectedly (such as a segfault or being killed by the OOM killer).
+
+**Important notes:**
+
+- ``__ray_shutdown__()`` runs after all actor tasks complete.
+- By default, Ray waits 30 seconds for the graceful shutdown procedure (including ``__ray_shutdown__()``) to complete. If the actor doesn't exit within this timeout, it's force killed. Configure this with ``ray.init(_system_config={"actor_graceful_shutdown_timeout_ms": 60000})``.
+- Exceptions in ``__ray_shutdown__()`` are caught and logged but don't prevent actor termination.
+- ``__ray_shutdown__()`` must be a synchronous method, including for async actors.
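
Reviewer note: the timeout-and-fallback semantics in the notes above can be illustrated without Ray. The sketch below is a plain-Python simulation, not Ray's implementation (this PR's changes live in Ray's core and GCS code). The function `graceful_shutdown`, its `timeout_s` parameter, and the three example classes are hypothetical names introduced here for illustration. It assumes only what the notes state: the hook is optional, exceptions in it are logged and do not block termination, and a hook that outlives the timeout leads to a force kill. A Python thread cannot actually be killed, so the sketch only reports which outcome would apply.

```python
import logging
import threading

logger = logging.getLogger("shutdown_sketch")


def graceful_shutdown(actor, timeout_s):
    """Simulate the documented shutdown path for one actor-like object.

    Returns "graceful" if the hook is absent or finishes within
    timeout_s seconds, and "force_killed" if the timeout expires first.
    """
    hook = getattr(actor, "__ray_shutdown__", None)
    if hook is None:
        return "graceful"

    def run_hook():
        try:
            hook()
        except Exception:
            # Documented behavior: log the error, still terminate.
            logger.exception("__ray_shutdown__ raised")

    worker = threading.Thread(target=run_hook, daemon=True)
    worker.start()
    worker.join(timeout_s)  # wait at most timeout_s for cleanup
    return "force_killed" if worker.is_alive() else "graceful"


class CleanActor:
    """Hook completes quickly."""

    def __init__(self):
        self.closed = False

    def __ray_shutdown__(self):
        self.closed = True


class RaisingActor:
    """Hook fails; termination should still proceed."""

    def __ray_shutdown__(self):
        raise RuntimeError("cleanup failed")


class HangingActor:
    """Hook blocks until released, so it outlives a short timeout."""

    def __init__(self):
        self.release = threading.Event()

    def __ray_shutdown__(self):
        self.release.wait()
```

Calling `graceful_shutdown(HangingActor(), timeout_s=0.05)` models the case where an actor's cleanup exceeds the configured timeout. In Ray itself the corresponding knob is the `actor_graceful_shutdown_timeout_ms` system config shown in the diff.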