[core] add entrypoint log for jobs #58300

chrisfellowes-anyscale · 2025-10-30T01:17:54Z

Description

This helps prevent an edge case when using file-based log exporters like Vector that use fingerprinting ([ref](https://vector.dev/docs/reference/configuration/sources/file/#fingerprint)) to identify unique files.

Example edge case that this fixes: two jobs are submitted to a cluster and begin executing at the same time, and both contain an invalid entrypoint that references a nonexistent file.

Before the fix:

- Both jobs have the identical "Runtime env is setting up" log line with identical timestamps.
- Both jobs have identical entrypoint failure logs.

As a result, the log files for these jobs are identical, so Vector only exports one of them.

After the fix:

- Both jobs have the identical "Runtime env is setting up" log line with identical timestamps.
- Each job has a **unique** entrypoint log line containing its job_id.
- Both jobs have identical entrypoint failure logs.

Vector can now differentiate between the two files, so both are exported.
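To illustrate why identical files collide, here is a minimal sketch that assumes a simplified checksum-style fingerprint which hashes the first `n_lines` lines of a file. The `fingerprint` function, the log text, and the job IDs `job_a`/`job_b` are all hypothetical; this is not Vector's actual algorithm, only a model of content-based fingerprinting.

```python
import hashlib


def fingerprint(contents: str, n_lines: int) -> str:
    """Hypothetical checksum-style fingerprint: hash the first n_lines lines.

    Simplified model for illustration only, not Vector's implementation.
    """
    head = "\n".join(contents.splitlines()[:n_lines])
    return hashlib.sha256(head.encode("utf-8")).hexdigest()


# Hypothetical log contents (not real Ray output).
SETUP = "2025-10-30 01:00:00 Runtime env is setting up.\n"
FAILURE = "python: can't open file 'missing.py': [Errno 2] No such file or directory\n"

# Before the fix: both job logs are byte-for-byte identical.
before_a = SETUP + FAILURE
before_b = SETUP + FAILURE

# After the fix: each log gains a line containing its (hypothetical) job ID.
after_a = SETUP + "Running entrypoint for job job_a: python missing.py\n" + FAILURE
after_b = SETUP + "Running entrypoint for job job_b: python missing.py\n" + FAILURE

print(fingerprint(before_a, 3) == fingerprint(before_b, 3))  # True: files collide
print(fingerprint(after_a, 3) == fingerprint(after_b, 3))    # False: files differ
print(fingerprint(after_a, 1) == fingerprint(after_b, 1))    # True: line 2 not hashed
```

In this model the new line only separates the two files when it falls inside the fingerprinted region, so the effect in practice depends on how the exporter's fingerprinting is configured.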

Related issues

Additional information

edoakes · 2025-10-30T01:36:38Z

python/ray/dashboard/modules/job/job_supervisor.py

        # Open in append mode to avoid overwriting runtime_env setup logs for the
        # supervisor actor, which are also written to the same file.
        with open(logs_path, "a") as logs_file:
+            self._logger.info(f"Running entrypoint for job {self._job_id}: {self._entrypoint}\n")


I haven't touched this code in a long while, but I believe self._logger goes to a log file for this supervisor actor, not the log file for the job itself. You'll want to write this to the logs_file that was opened a line above instead.
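A minimal sketch of the suggested approach, assuming a standalone helper rather than the actual supervisor method: the line is written to the file handle opened in append mode, so it lands in the job's own log file and any runtime_env setup output already in that file is preserved. The helper name `write_entrypoint_line` and the file name `job-driver.log` are hypothetical.

```python
import os
import tempfile


def write_entrypoint_line(logs_path: str, job_id: str, entrypoint: str) -> None:
    """Hypothetical helper: append the entrypoint line to the job's log file."""
    # Append mode keeps content already written (e.g. runtime_env setup logs).
    with open(logs_path, "a") as logs_file:
        # Write to the job log file itself, rather than through a logger whose
        # handlers may point at a different file (such as the supervisor's log).
        logs_file.write(f"Running entrypoint for job {job_id}: {entrypoint}\n")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "job-driver.log")  # hypothetical file name
    with open(path, "w") as f:
        f.write("Runtime env is setting up.\n")  # simulated earlier setup log
    write_entrypoint_line(path, "job_a", "python missing.py")
    with open(path) as f:
        contents = f.read()

print(contents, end="")
```

The setup line survives because the file is reopened with `"a"`; opening it with `"w"` would have truncated it.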


edoakes

LGTM assuming tests pass. I just added the go label, which will run the full CI tests.

edoakes · 2025-10-30T15:37:48Z

python/ray/dashboard/modules/job/job_supervisor.py

        # Open in append mode to avoid overwriting runtime_env setup logs for the
        # supervisor actor, which are also written to the same file.
        with open(logs_path, "a") as logs_file:
+            logs_file.write(f"Running entrypoint for job {self._job_id}: {self._entrypoint}\n")


nit:

Suggested change

logs_file.write(f"Running entrypoint for job {self._job_id}: {self._entrypoint}\n")

logs_file.write(f"Running entrypoint for job '{self._job_id}': {self._entrypoint}\n")
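A small sketch of the suggested format, factored into a hypothetical `format_entrypoint_line` helper. One plausible benefit of quoting the ID (an assumption, not stated in the review) is that the ID's boundaries stay visible even when it is empty. The job ID `raysubmit_abc123` is a made-up example value.

```python
def format_entrypoint_line(job_id: str, entrypoint: str) -> str:
    # Same f-string as the suggested change: the job ID is wrapped in quotes.
    return f"Running entrypoint for job '{job_id}': {entrypoint}\n"


line = format_entrypoint_line("raysubmit_abc123", "python missing.py")
print(line, end="")
# Running entrypoint for job 'raysubmit_abc123': python missing.py

# An empty ID still shows up clearly as '' in the log line.
print(format_entrypoint_line("", "python x.py"), end="")
# Running entrypoint for job '': python x.py
```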

edoakes · 2025-10-30T15:38:39Z

Linter is failing: https://buildkite.com/ray-project/microcheck/builds/30180#019a35ad-f7f5-4910-941d-7a244b4f1c05/186-305

Instructions for local linting here: https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting


This helps prevent an edge case when using file-based log exporters like Vector that use fingerprinting [ref](https://vector.dev/docs/reference/configuration/sources/file/#fingerprint) to identify unique files.

Example edge case that this fixes: two jobs are submitted to a cluster and begin executing at the same time, and both contain an invalid entrypoint that references a nonexistent file.

Before fix:
- both jobs have the identical "Runtime env is setting up" log with identical timestamps
- both jobs have identical entrypoint failure logs

As a result, the log files for these jobs are identical, so Vector will only export one.

After fix:
- both jobs have the identical "Runtime env is setting up" log with identical timestamps
- each job has a **unique** entrypoint log containing its job_id
- both jobs have identical entrypoint failure logs

Vector can differentiate between these two files, so both will be exported.

---------

Signed-off-by: Chris Fellowes <[email protected]>
Signed-off-by: chrisfellowes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Co-authored-by: Edward Oakes <[email protected]>


chrisfellowes-anyscale changed the title ~~[core] add entrypoint log for jobs~~ [wip][core] add entrypoint log for jobs Oct 30, 2025

edoakes reviewed Oct 30, 2025


chrisfellowes-anyscale marked this pull request as ready for review October 30, 2025 14:53

chrisfellowes-anyscale requested a review from a team as a code owner October 30, 2025 14:53

chrisfellowes-anyscale added 4 commits October 30, 2025 07:57

add entrypoint log for jobs

3abb855

Signed-off-by: Chris Fellowes <[email protected]>

fix test for entrypoint log

daa1fda

Signed-off-by: chrisfellowes <[email protected]> Signed-off-by: Chris Fellowes <[email protected]>

use logger

5fd357d

Signed-off-by: Chris Fellowes <[email protected]>

use correct log file

b7afa9a

Signed-off-by: Chris Fellowes <[email protected]>

chrisfellowes-anyscale force-pushed the master branch from 3724561 to b7afa9a Compare October 30, 2025 14:57

edoakes added the go label (add ONLY when ready to merge, run all tests) Oct 30, 2025

edoakes approved these changes Oct 30, 2025


ray-gardener bot added the core (Issues that should be addressed in Ray Core) and observability (Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling) labels Oct 30, 2025

fix linter

dd06aed

Signed-off-by: Chris Fellowes <[email protected]>

edoakes enabled auto-merge (squash) October 30, 2025 20:40

edoakes changed the title ~~[wip][core] add entrypoint log for jobs~~ [core] add entrypoint log for jobs Oct 30, 2025

edoakes added 3 commits October 31, 2025 08:34

Merge branch 'master' of https://github.com/ray-project/ray into chrisfellowes-anyscale/master

87e27f7

fix

baa2942

Signed-off-by: Edward Oakes <[email protected]>

fix lint

535059f

Signed-off-by: Edward Oakes <[email protected]>

github-actions bot disabled auto-merge October 31, 2025 14:04

edoakes enabled auto-merge (squash) October 31, 2025 14:05

fix test

476cd6d

Signed-off-by: Chris Fellowes <[email protected]>

auto-merge was automatically disabled October 31, 2025 16:02
Head branch was pushed to by a user without write access

This comment was marked as outdated.


chrisfellowes-anyscale added 2 commits October 31, 2025 09:23

fix test

bb4c758

Signed-off-by: Chris Fellowes <[email protected]>

fix test

2fbd612

Signed-off-by: Chris Fellowes <[email protected]>

edoakes merged commit 769abf6 into ray-project:master Oct 31, 2025
6 checks passed
