@chrisfellowes-anyscale chrisfellowes-anyscale commented Oct 30, 2025

Description

this helps prevent an edge case when using file-based log exporters like vector, which use fingerprinting (ref: https://vector.dev/docs/reference/configuration/sources/file/#fingerprint) to identify unique files.

example edge case that this fixes:
two jobs are submitted to a cluster and begin executing at the same time; both contain an invalid entrypoint that references a nonexistent file

before fix:

  • both jobs have the identical "Runtime env is setting up" log with identical timestamps
  • both jobs have identical entrypoint failure logs

as a result, the log files for these jobs are identical, so vector will only export one.

after fix:

  • both jobs have the identical "Runtime env is setting up" log with identical timestamps
  • each job has a unique entrypoint log containing its job_id
  • both jobs have identical entrypoint failure logs

vector can differentiate between these two files, so both will be exported
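
The before/after scenario above can be illustrated with a minimal sketch. It assumes a fingerprint computed purely from file content (here a SHA-256 of the whole text), which is a simplification and not a reproduction of vector's actual, configurable fingerprint strategy. The helper names, job ids, entrypoint, and log lines are all hypothetical placeholders.

```python
import hashlib


def fingerprint(log_text: str) -> str:
    """Content-only fingerprint (a stand-in, NOT vector's real algorithm)."""
    return hashlib.sha256(log_text.encode("utf-8")).hexdigest()


def build_log(job_id: str, entrypoint: str, with_entrypoint_line: bool) -> str:
    """Assemble a placeholder job log; the line contents are illustrative only."""
    lines = ["<timestamp> Runtime env is setting up"]
    if with_entrypoint_line:
        # The line added by this PR: it embeds the job id, so it differs per job.
        lines.append(f"Running entrypoint for job {job_id}: {entrypoint}")
    lines.append("<entrypoint failure output>")
    return "\n".join(lines) + "\n"


ENTRYPOINT = "python missing.py"  # hypothetical invalid entrypoint
JOB_IDS = ("job_a", "job_b")      # hypothetical job ids

before = [fingerprint(build_log(j, ENTRYPOINT, False)) for j in JOB_IDS]
after = [fingerprint(build_log(j, ENTRYPOINT, True)) for j in JOB_IDS]

print(before[0] == before[1])  # True: identical content, identical fingerprint
print(after[0] == after[1])    # False: the job id makes each file unique
```

With a content-based identity, only a per-job difference in the file contents can separate the two files, which is why the added line includes the job id.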

Related issues

Additional information

@chrisfellowes-anyscale chrisfellowes-anyscale changed the title [core] add entrypoint log for jobs [wip][core] add entrypoint log for jobs Oct 30, 2025
# Open in append mode to avoid overwriting runtime_env setup logs for the
# supervisor actor, which are also written to the same file.
with open(logs_path, "a") as logs_file:
    self._logger.info(f"Running entrypoint for job {self._job_id}: {self._entrypoint}\n")
I haven't touched this code in a long while, but I believe self._logger goes to a log file for this supervisor actor, not the log file for the job itself. You'll want to write this to the logs_file that was opened a line above instead.
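
The distinction raised in this comment can be shown with a small standalone sketch using only the standard library. It assumes a logger whose handler points at a separate file; the file names and logger name are hypothetical and do not reflect Ray's actual log layout.

```python
import logging
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    supervisor_log = os.path.join(tmp, "supervisor.log")  # hypothetical path
    job_log = os.path.join(tmp, "job.log")                # hypothetical path

    # A logger whose only handler writes to the supervisor's file.
    logger = logging.getLogger("supervisor_sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.FileHandler(supervisor_log)
    logger.addHandler(handler)

    # Opening the job log does not redirect the logger.
    with open(job_log, "a") as logs_file:
        logger.info("Running entrypoint (via logger)")      # goes to supervisor.log
        logs_file.write("Running entrypoint (via file)\n")  # goes to job.log

    handler.close()
    logger.removeHandler(handler)

    with open(supervisor_log) as f:
        supervisor_text = f.read()
    with open(job_log) as f:
        job_text = f.read()

print("via logger" in job_text)  # False
print("via file" in job_text)    # True
```

A logger writes wherever its handlers point, regardless of which file happens to be open in the enclosing `with` block, so the line only reaches the job's log when it is written to `logs_file` directly.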

@chrisfellowes-anyscale chrisfellowes-anyscale marked this pull request as ready for review October 30, 2025 14:53
@chrisfellowes-anyscale chrisfellowes-anyscale requested a review from a team as a code owner October 30, 2025 14:53
Signed-off-by: Chris Fellowes <[email protected]>
Signed-off-by: chrisfellowes <[email protected]>
Signed-off-by: Chris Fellowes <[email protected]>
Signed-off-by: Chris Fellowes <[email protected]>
Signed-off-by: Chris Fellowes <[email protected]>
@edoakes edoakes added the go add ONLY when ready to merge, run all tests label Oct 30, 2025
@edoakes edoakes left a comment

LGTM assuming tests pass. I just added the go label, which will run the full CI tests.

# Open in append mode to avoid overwriting runtime_env setup logs for the
# supervisor actor, which are also written to the same file.
with open(logs_path, "a") as logs_file:
    logs_file.write(f"Running entrypoint for job {self._job_id}: {self._entrypoint}\n")
nit:

Suggested change
-     logs_file.write(f"Running entrypoint for job {self._job_id}: {self._entrypoint}\n")
+     logs_file.write(f"Running entrypoint for job '{self._job_id}': {self._entrypoint}\n")

edoakes commented Oct 30, 2025

@ray-gardener ray-gardener bot added core Issues that should be addressed in Ray Core observability Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling labels Oct 30, 2025
Signed-off-by: Chris Fellowes <[email protected]>
@edoakes edoakes enabled auto-merge (squash) October 30, 2025 20:40
@edoakes edoakes changed the title [wip][core] add entrypoint log for jobs [core] add entrypoint log for jobs Oct 30, 2025
@github-actions github-actions bot disabled auto-merge October 31, 2025 14:04
@edoakes edoakes enabled auto-merge (squash) October 31, 2025 14:05
Signed-off-by: Chris Fellowes <[email protected]>
auto-merge was automatically disabled October 31, 2025 16:02

Head branch was pushed to by a user without write access

cursor[bot]

This comment was marked as outdated.

Signed-off-by: Chris Fellowes <[email protected]>
Signed-off-by: Chris Fellowes <[email protected]>
@edoakes edoakes merged commit 769abf6 into ray-project:master Oct 31, 2025
6 checks passed
YoussefEssDS pushed a commit to YoussefEssDS/ray that referenced this pull request Nov 8, 2025
this helps prevent an edge case when using file based log exporters like
vector that use fingerprinting
[ref](https://vector.dev/docs/reference/configuration/sources/file/#fingerprint)
to identify unique files.

example edge case that this fixes:
two jobs are submitted to a cluster and begin executing at the same
time, they both contain an invalid entrypoint that references a
nonexistant file

before fix:
- both jobs have the identical "Runtime env is setting up" log with
identical timestamps
  - both jobs have identical entrypoint failure logs
  
as a result, the log files for these jobs are identical, so vector will
only export one.

after fix:
- both jobs have the identical "Runtime env is setting up" log with
identical timestamps
- each job has a **unique** entrypoint log containing its job_id
- both jobs have identical entrypoint failure logs

vector can differentiate between these two files, so both will be
exported

---------

Signed-off-by: Chris Fellowes <[email protected]>
Signed-off-by: chrisfellowes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Co-authored-by: Edward Oakes <[email protected]>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
Signed-off-by: Aydin Abiar <[email protected]>
SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request Dec 1, 2025
Labels

  • core: Issues that should be addressed in Ray Core
  • go: add ONLY when ready to merge, run all tests
  • observability: Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling

2 participants