
[pull] master from ray-project:master #140

Open

wants to merge 6,497 commits into base: master

Conversation

pull[bot]

@pull pull bot commented Jun 29, 2023

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

kevin85421 and others added 28 commits March 6, 2025 07:21
…51055)

It's not common to call `unique_ptr::release()` because it can easily
lead to memory leaks. However, `ray_syncer_test.cc` is a special case.

I tried changing `cli_reactor` to a `unique_ptr`, but then the tests fail
with a double free. I used ASAN to confirm:
 
1. `RayClientBidiReactor::OnDone` calls `delete this;`.
2. The `unique_ptr` goes out of scope, freeing the same object again.

<img width="1728" alt="image"
src="https://github.com/user-attachments/assets/5f807a16-2633-4576-b057-d34dd1aaa546"
/>

Signed-off-by: kaihsun <[email protected]>
The macro is never used in our codebase, so delete it.

Signed-off-by: dentiny <[email protected]>
TPU device logs for k8s containers that request `google.com/tpu`
resources are written to the `/tmp/tpu_logs` directory. This PR adds a
symlink to the `/tmp/tpu_logs` directory when the `TPU_WORKER_ID` env
var is set; TPU log files are then added to `monitor_log_paths`. The
logs are then viewable from the Ray Dashboard (see the sketch after the
screenshots below):

Create a file in /tmp/tpu_logs and view symlink:

![command-line-logging](https://github.com/user-attachments/assets/c50915ad-8382-4af7-a398-40d5a249e8c8)

The tpu_logs directory is added to the 'Logs' tab on a TPU Ray worker:

![tpu_logs_dir](https://github.com/user-attachments/assets/394133b0-be70-4b98-9e86-dcad50c1b4fd)

The log file we created is ingested/viewable:

![tpu-device-log-file](https://github.com/user-attachments/assets/c42ab96a-f88b-4959-adf2-8650fd75c773)
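
A minimal sketch of the symlink logic described above, assuming an illustrative helper name and session log path (not the actual Ray implementation):

```python
import os

def create_tpu_log_symlink(session_log_dir: str) -> None:
    """Link /tmp/tpu_logs into the session log dir so the dashboard ingests it."""
    # Only TPU workers set TPU_WORKER_ID, so do nothing on other nodes.
    if os.environ.get("TPU_WORKER_ID") is None:
        return
    tpu_log_dir = "/tmp/tpu_logs"
    link_path = os.path.join(session_log_dir, "tpu_logs")
    if os.path.isdir(tpu_log_dir) and not os.path.islink(link_path):
        os.symlink(tpu_log_dir, link_path)
```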

---------

Signed-off-by: Ryan O'Leary <[email protected]>
Co-authored-by: Kai-Hsun Chen <[email protected]>
#51113)

This reverts commit e4a448f.


## Why are these changes needed?


#47814 (comment)

UV doesn't seem to carry over the right environment markers in
`requirements_compiled.txt`. We can temporarily revert this until we
find a fix for the issue.

Signed-off-by: Kevin H. Luu <[email protected]>
Fix typos in comments and strings

Signed-off-by: co63oc <[email protected]>
…tutorials (#50240)

Updates docs to use correct normalization values in image datasets.

---------

Signed-off-by: Ricardo Decal <[email protected]>
…a HuggingFace `Dataset` (#50998)

## Why are these changes needed?

`override_num_blocks` is not supported when reading from a HuggingFace
`Dataset` object, i.e., in non-streaming mode. It is, however, supported
in streaming mode. The current error message is incorrect and mixes up
the wording of the two cases.

This is a tiny PR to improve the wording. 
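
A hedged sketch of the two modes, assuming the `datasets` library and the `override_num_blocks` argument on `ray.data.from_huggingface` as described above:

```python
import datasets
import ray

# Streaming mode yields an IterableDataset; override_num_blocks is supported.
stream_ds = datasets.load_dataset(
    "wikitext", "wikitext-2-v1", split="train", streaming=True
)
ray_ds = ray.data.from_huggingface(stream_ds, override_num_blocks=4)

# Non-streaming mode yields a materialized Dataset; passing
# override_num_blocks here raises the error whose message this PR fixes.
plain_ds = datasets.load_dataset("wikitext", "wikitext-2-v1", split="train")
ray_ds = ray.data.from_huggingface(plain_ds)
```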

Signed-off-by: sumanthrh <[email protected]>
Changes the ordering of libraries based on popularity (PyTorch first,
then XGBoost, then the rest).

---------

Signed-off-by: Ricardo Decal <[email protected]>
## Why are these changes needed?

The proxy currently counts HTTP redirects as HTTP errors, which are
emitted to metrics. 3xx responses shouldn't be errors. This PR excludes
3xx responses from the error count and updates the relevant test case.



Signed-off-by: akyang-anyscale <[email protected]>
Currently the V2 Autoscaler formats logs by converting the V2 data
structure `ClusterStatus` to the V1 structures `AutoscalerSummary` and
`LoadMetricsSummary` and then passing them to the legacy
`format_info_string`. It'd be useful for the V2 autoscaler to directly
format `ClusterStatus` to the correct output log format. This PR
refactors `utils.py` to directly format `ClusterStatus`. Additionally,
this PR changes the node reports to output `instance_id` rather than
`ip_address`, since the latter is not necessarily unique for failed
nodes.

## Related issue number

Closes #37856

---------

Signed-off-by: ryanaoleary <[email protected]>
Signed-off-by: Ryan O'Leary <[email protected]>
## Why are these changes needed?

Some docs were failing to index properly due to their extreme length.
I've hidden verbose cell outputs so that they index properly.


Signed-off-by: Ricardo Decal <[email protected]>
Double-checked locking is known to be buggy (checking for null and then
updating the pointer leads to a data race); `std::once_flag` is the
solution.

---------

Signed-off-by: dentiny <[email protected]>
Add various metrics that are captured in the progress bar but are not
captured in the emitted Prometheus metrics.
---------

Signed-off-by: Matthew Owen <[email protected]>
Explicitly tear down the compiled graph and kill the actors rather than relying on GC.

Also, previously in Compiled Graph, an actor was killed only when its actor task did not finish within the timeout. This PR fixes that by always killing the actor when `kill_actors=True` (see the sketch below).
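
A hedged sketch of explicit teardown, assuming the compiled-graph API and the `kill_actors` argument as described in this PR:

```python
import ray
from ray.dag import InputNode

@ray.remote
class EchoActor:
    def echo(self, x):
        return x

actor = EchoActor.remote()
with InputNode() as inp:
    dag = actor.echo.bind(inp)

compiled = dag.experimental_compile()
print(ray.get(compiled.execute(1)))
# Explicitly kill the actors instead of relying on GC to reclaim them.
compiled.teardown(kill_actors=True)
```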

Signed-off-by: Rui Qiao <[email protected]>
Fix the operator ID name format. The current format causes potential
collisions between operators in rare cases.

Example:

```
ds = ray.data.range(100, override_num_blocks=20).limit(11)
for i in range(11):
    ds = ds.limit(1)

ds._set_name("data_head_test")
ds.materialize()
```

You would expect 12 limit operators, but the dashboard only shows 11
because of ID collisions:

<img width="1821" alt="Screenshot 2025-03-05 at 5 34 31 PM"
src="https://github.com/user-attachments/assets/1cdb2a58-eb0c-4c10-bb91-a33f6fa5e946"
/>

Test:
- CI

Signed-off-by: can <[email protected]>
When an asyncio task creates another asyncio task, raising
`AsyncioActorExit` cannot make the caller exit because they are not the
same task. Therefore, this PR makes `exit_actor` request actor exit in
the core worker context, which the core worker checks regularly.
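
A hedged sketch of the scenario this fixes (actor and method names are illustrative):

```python
import asyncio
import ray

@ray.remote
class Worker:
    async def run(self):
        # exit_actor is invoked from a child asyncio task, not the task
        # executing this method; with this fix the exit request is recorded
        # in the core worker context and honored on its next check.
        asyncio.create_task(self.shutdown())
        await asyncio.sleep(10)

    async def shutdown(self):
        ray.actor.exit_actor()
```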

Closes: #49451

---------

Signed-off-by: Chi-Sheng Liu <[email protected]>
Co-authored-by: Edward Oakes <[email protected]>
`logging.warn` is legacy and deprecated; `logging.warning` is the supported spelling.
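
For reference, the one-line replacement:

```python
import logging

logging.warning("message")  # logging.warn is a deprecated alias of this
```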

Signed-off-by: Chi-Sheng Liu <[email protected]>
Resolves #51135

Error message pasted here:
```
[2025-03-06T19:49:02Z] >       assert file_info.filename == str(tpu_log_dir / tpu_device_log_file)
[2025-03-06T19:49:02Z] E       AssertionError: assert 'C:\\Users\\C...pu-device.log' == 'C:\\Users\\C...pu-device.log'
[2025-03-06T19:49:02Z] E         - C:\Users\ContainerAdministrator\AppData\Local\Temp\pytest-of-ContainerAdministrator\pytest-1\test_tpu_logs0\logs\tpu_logs\tpu-device.log
[2025-03-06T19:49:02Z] E         ?                                                                                                                 ^
[2025-03-06T19:49:02Z] E         + C:\Users\ContainerAdministrator\AppData\Local\Temp\pytest-of-ContainerAdministrator\pytest-1\test_tpu_logs0\logs/tpu_logs\tpu-device.log
[2025-03-06T19:49:02Z] E         ?                                     
```
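
The mismatch is a mixed `/` and `\` separator on Windows. A minimal sketch of a separator-agnostic comparison (illustrative, not necessarily the exact fix):

```python
from pathlib import Path

def assert_same_file(actual: str, expected: str) -> None:
    # Path normalizes the mixed "/" and "\" separators on Windows,
    # so the comparison no longer depends on how the path was joined.
    assert Path(actual) == Path(expected)
```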

Signed-off-by: dentiny <[email protected]>
A lot of changes touch coregpubuild, so the multi-GPU tests run more often than they need to. We're moving to running these multi-GPU tests manually.

---------

Signed-off-by: dayshah <[email protected]>
This docs code was not passing the multi-GPU CI step.

---------

Signed-off-by: dayshah <[email protected]>
Somehow a bad import snuck in...

## Related issue number

Closes #49634
Closes #49632
Closes #49638
Closes #49642

---------

Signed-off-by: Edward Oakes <[email protected]>
## Why are these changes needed?

- Make the `num_blocks` argument optional, so there is no need to set
`num_blocks=None` when using `target_num_rows_per_block`.

- Add a type hint for the `None` value.

- Fix formatting in [docs
page](https://docs.ray.io/en/latest/data/api/doc/ray.data.Dataset.repartition.html)


![image](https://github.com/user-attachments/assets/bfe8a845-3c37-4be6-a2dc-ef78d56c80d4)
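
A hedged usage sketch, assuming the `target_num_rows_per_block` argument described above:

```python
import ray

ds = ray.data.range(1000)
# Before this change: ds.repartition(num_blocks=None, target_num_rows_per_block=100)
# Now num_blocks can simply be omitted:
ds = ds.repartition(target_num_rows_per_block=100)
```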

---------

Signed-off-by: Praveen Gorthy <[email protected]>
Signed-off-by: Praveen <[email protected]>
Co-authored-by: Hao Chen <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>
Update the document to cover the Python standard logging attributes in
log lines.

The PR also fixes all applicable errors/warnings in the doc.

Closes #49502

---------

Signed-off-by: Mengjin Yan <[email protected]>
Co-authored-by: Dhyey Shah <[email protected]>
* Use `psutil.process_iter` to replace `psutil.pids` (see the sketch below).
* Use `proc.info["name"]` instead of `proc.name()`.
* I'm not sure whether `proc.name()` uses the cache set by
`process_iter`, but I'm certain that using `info` is correct, since the
official docs consistently use `proc.info[...]` with `process_iter`.
* I asked a question on giampaolo/psutil#2518,
but I'm not sure how long it will take to get an answer from the
community. For now, I think we can merge this, and I'll update the
psutil usage if the maintainers have any suggestions.
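
A minimal sketch of the pattern, following the psutil docs (the process name here is illustrative):

```python
import psutil

# process_iter prefetches the requested attributes in one pass; each
# value is exposed via proc.info instead of a per-process syscall.
for proc in psutil.process_iter(attrs=["name"]):
    if proc.info["name"] == "raylet":
        print(proc.pid)
```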

---------

Signed-off-by: kaihsun <[email protected]>
kevin85421 and others added 30 commits March 20, 2025 17:27
I tried to reproduce the ASAN errors in `scheduling_queue_test.cc` for
#51516 by running:

```
bazel test --features=asan -c dbg //:scheduling_queue_test  --test_output=all
```

However, I got the following ODR error instead of the actual data race
error.

<img width="1728" alt="image"
src="https://github.com/user-attachments/assets/0d495f26-efaa-4586-a6ec-be1729b185da"
/>

It looks like we have two packages `@com_github_madler_zlib//:zlib` and
`@net_zlib_zlib//:zlib` in our C++ codebase.

```sh
bazel query --noimplicit_deps \
    'allpaths(//:scheduling_queue_test, @net_zlib_zlib//:zlib)'

# Output
//:core_worker_lib
//:scheduling_queue_test
//src/ray/util:pipe_logger
//src/ray/util:stream_redirection_utils
@boost//:iostreams
@net_zlib_zlib//:zlib
Loading: 7 packages loaded
```

My initial thought was to avoid having `scheduling_queue_test` use
`@net_zlib_zlib//:zlib`, so I tried separating CoreWorker into smaller
Bazel targets. However, I found that non-trivial and eventually gave up,
using the following command as a workaround.

```sh
bazel test --features=asan -c dbg //:scheduling_queue_test  --test_output=all --test_env=ASAN_OPTIONS="detect_odr_violation=0"
```

---------

Signed-off-by: kaihsun <[email protected]>
…lugins (#51565)

Fixes #51196

…shboard_[module_name].err (#51545)

Signed-off-by: Chi-Sheng Liu <[email protected]>
…e and supports websocket handler returns normal HTTP response (#51552)

Signed-off-by: Chi-Sheng Liu <[email protected]>
Created by release automation bot.

Update with commit 8ee3f00

Signed-off-by: Lonnie Liu <[email protected]>
Co-authored-by: Lonnie Liu <[email protected]>
This PR is stacked upon #51179,
to make the redirection stream unit-testable.
It is basically a no-op change: it extracts the redirection logic into a
separate file, and leaves the exit hook and global registry where they
are now.

---------

Signed-off-by: dentiny <[email protected]>
…nerated code and `custom_types.py` are inconsistent (#51568)

The validation fails when I update a `.proto` file, compile the Ray
codebase, and run a Ray program. However, the original error message
instructs me to generate the Protobuf code again, which I have already
done. Instead, I need to update `custom_types.py` to fix the issue.

The error message with this PR:

<img width="1132" alt="Screenshot 2025-03-20 at 2 49 18 PM"
src="https://github.com/user-attachments/assets/96ca439d-6e35-49c0-aabf-759ed618cd91"
/>

Signed-off-by: Kai-Hsun Chen <[email protected]>
```
REGRESSION 9.35%: client__get_calls (THROUGHPUT) regresses from 1094.7883444776185 to 992.4456902391204 in microbenchmark.json
REGRESSION 7.87%: tasks_per_second (THROUGHPUT) regresses from 399.43954902981744 to 367.9840802358416 in benchmarks/many_tasks.json
REGRESSION 6.60%: multi_client_put_gigabytes (THROUGHPUT) regresses from 43.246981615749526 to 40.39150444280067 in microbenchmark.json
REGRESSION 5.16%: client__tasks_and_put_batch (THROUGHPUT) regresses from 14341.529664523765 to 13601.436104861408 in microbenchmark.json
REGRESSION 5.03%: 1_1_actor_calls_concurrent (THROUGHPUT) regresses from 5402.532852540871 to 5130.570133178275 in microbenchmark.json
REGRESSION 4.83%: 1_1_actor_calls_async (THROUGHPUT) regresses from 8588.075503140139 to 8173.653446206568 in microbenchmark.json
REGRESSION 4.71%: single_client_tasks_and_get_batch (THROUGHPUT) regresses from 6.116479739439202 to 5.828378076935622 in microbenchmark.json
REGRESSION 4.06%: single_client_get_calls_Plasma_Store (THROUGHPUT) regresses from 10975.200393255369 to 10529.193272608605 in microbenchmark.json
REGRESSION 3.71%: client__tasks_and_get_batch (THROUGHPUT) regresses from 0.9551721070094008 to 0.9197513826205774 in microbenchmark.json
REGRESSION 3.25%: 1_1_actor_calls_sync (THROUGHPUT) regresses from 2024.9514970549762 to 1959.1925407193576 in microbenchmark.json
REGRESSION 2.78%: single_client_put_gigabytes (THROUGHPUT) regresses from 18.30617444315663 to 17.79739662942353 in microbenchmark.json
REGRESSION 1.46%: client__1_1_actor_calls_async (THROUGHPUT) regresses from 1057.2932167754398 to 1041.8730021547178 in microbenchmark.json
REGRESSION 1.32%: 1_n_actor_calls_async (THROUGHPUT) regresses from 8168.440029557936 to 8060.698907411474 in microbenchmark.json
REGRESSION 1.19%: single_client_tasks_sync (THROUGHPUT) regresses from 981.51641421362 to 969.8384217890384 in microbenchmark.json
REGRESSION 0.89%: client__1_1_actor_calls_concurrent (THROUGHPUT) regresses from 1056.4662855748954 to 1047.1016344870811 in microbenchmark.json
REGRESSION 0.58%: actors_per_second (THROUGHPUT) regresses from 591.3775923644333 to 587.9457127979538 in benchmarks/many_actors.json
REGRESSION 0.56%: 1_1_async_actor_calls_sync (THROUGHPUT) regresses from 1434.2085547024217 to 1426.2018801386466 in microbenchmark.json
REGRESSION 116.92%: dashboard_p50_latency_ms (LATENCY) regresses from 32.123 to 69.681 in benchmarks/many_actors.json
REGRESSION 59.07%: dashboard_p99_latency_ms (LATENCY) regresses from 589.9 to 938.359 in benchmarks/many_tasks.json
REGRESSION 57.53%: dashboard_p95_latency_ms (LATENCY) regresses from 398.245 to 627.361 in benchmarks/many_tasks.json
REGRESSION 53.36%: dashboard_p50_latency_ms (LATENCY) regresses from 89.962 to 137.963 in benchmarks/many_tasks.json
REGRESSION 37.60%: dashboard_p99_latency_ms (LATENCY) regresses from 3067.405 to 4220.801 in benchmarks/many_actors.json
REGRESSION 12.91%: stage_0_time (LATENCY) regresses from 6.343268156051636 to 7.161974191665649 in stress_tests/stress_test_many_tasks.json
REGRESSION 10.77%: dashboard_p95_latency_ms (LATENCY) regresses from 2575.96 to 2853.454 in benchmarks/many_actors.json
REGRESSION 6.85%: dashboard_p99_latency_ms (LATENCY) regresses from 252.85 to 270.166 in benchmarks/many_pgs.json
REGRESSION 2.52%: 10000_get_time (LATENCY) regresses from 23.620077062999997 to 24.215384834000005 in scalability/single_node.json
REGRESSION 2.22%: avg_iteration_time (LATENCY) regresses from 1.1939783954620362 to 1.220467975139618 in stress_tests/stress_test_dead_actors.json
REGRESSION 1.80%: stage_3_time (LATENCY) regresses from 1829.902144908905 to 1862.925583600998 in stress_tests/stress_test_many_tasks.json
REGRESSION 1.73%: 1000000_queued_time (LATENCY) regresses from 191.976472028 to 195.30269835 in scalability/single_node.json
REGRESSION 1.56%: time_to_broadcast_1073741824_bytes_to_50_nodes (LATENCY) regresses from 17.602684142 to 17.87641767999999 in scalability/object_store.json
REGRESSION 1.12%: 10000_args_time (LATENCY) regresses from 18.656692702999997 to 18.865748501 in scalability/single_node.json
REGRESSION 0.60%: stage_2_avg_iteration_time (LATENCY) regresses from 39.46649179458618 to 39.70143375396729 in stress_tests/stress_test_many_tasks.json
REGRESSION 0.45%: 107374182400_large_object_time (LATENCY) regresses from 29.23165342300001 to 29.36276392100001 in scalability/single_node.json
```

Signed-off-by: Lonnie Liu <[email protected]>
Co-authored-by: Lonnie Liu <[email protected]>
They're owned by the core team.

---------

Signed-off-by: Edward Oakes <[email protected]>
## Why are these changes needed?

It's tricky for users to implement the `preprocess` function when
constructing a Processor, because they may not know what the input
dataset should look like (i.e., the expected schema). This PR proposes a
new API, `log_input_column_names()`, that logs the expected schema.
Example:

```python
import ray
from ray.data.llm import build_llm_processor, vLLMEngineProcessorConfig

processor_config = vLLMEngineProcessorConfig(...)
processor = build_llm_processor(...)
processor.log_input_column_names()
# The first stage of the processor is ChatTemplateStage.
# Required input columns:
#     messages: A list of messages in OpenAI chat format. See https://platform.openai.com/docs/api-reference/chat/create for details.

processor_config = vLLMEngineProcessorConfig(
    apply_chat_template=False,
    tokenize=False,
)
processor = build_llm_processor(...)
processor.log_input_column_names()
# The first stage of the processor is vLLMEngineStage.
# Required input columns:
#    prompt: The text prompt (str).
#    sampling_params: The sampling parameters. See https://docs.vllm.ai/en/latest/api/inference_params.html#sampling-parameters for details.
# Optional input columns:
#    tokenized_prompt: The tokenized prompt. If provided, the prompt will not be tokenized by the vLLM engine.
#    images: The images to generate text from. If provided, the prompt will be a multimodal prompt.
#    model: The model to use for this request. If the model is different from the model set in the stage, then this is a LoRA request.
```


---------

Signed-off-by: Cody Yu <[email protected]>
## Why are these changes needed?

1. Adding more ops to `BlockColumnAccessor`
2. Fixing circular imports in Ray Data
3. Fixing `AggregateFnV2` to be a proper ABC
4. Simplifying the `accumulate_block` op

---------

Signed-off-by: Alexey Kudinkin <[email protected]>
)


## Why are these changes needed?

Fixes #51195

Signed-off-by: liuxsh9 <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Co-authored-by: Kourosh Hakhamaneshi <[email protected]>
…51563)

## Why are these changes needed?

`use_legacy_format` has been deprecated since Arrow 15.0.0 and [has been
deleted from the
repo](https://github.com/apache/arrow/pull/45742/files).

Given that it defaults to `use_legacy_format=False`, this removes it
from the Ray repo completely.
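
For reference, a minimal Parquet write without the removed flag:

```python
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"x": [1, 2, 3]})
# No use_legacy_format argument: modern Arrow always writes the current format.
pq.write_table(table, "example.parquet")
```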
---------

Signed-off-by: Alexey Kudinkin <[email protected]>

## Why are these changes needed?

Add docs related to generation config.

## Related issue number

Closes
https://anyscale1.atlassian.net/browse/LLM-1786?atlOrigin=eyJpIjoiZDg2MWMxNmU0YTY2NDRhMGJiN2JmNDk0NmNjYjE3OWIiLCJwIjoiaiJ9


---------

Signed-off-by: Gene Su <[email protected]>
The compiled graphs quickstart was misusing `testcode`, which cannot be
composed with `literalinclude`. The included code is already tested, so
there is no concern about missed coverage.

I've also split off the `core` and `tune` doctests into builds tagged
with the appropriate team.

---------

Signed-off-by: Edward Oakes <[email protected]>

## Why are these changes needed?

The Locust request duration is in milliseconds.


Signed-off-by: akyang-anyscale <[email protected]>

## Why are these changes needed?

The request for the image in the resnet50 application could hang
indefinitely. This could block the event loop and make tests flaky. This
PR adds a 5s timeout to the `get` call (see the sketch after the log
excerpt below).

The replica is stuck for some reason; as a result, requests are not
making progress and the client disconnects due to the timeout:
```
2025-03-20, 0:02:18.331 | replica | d4c5cca8-c654-4d45-b212-ae0061194a86 | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 60001.2ms
-- | -- | -- | -- | -- | --
I | 2025-03-20, 0:02:19.247 | replica | d486af45-d50e-468f-903f-d4bf723ea143 | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59998.3ms
I | 2025-03-20, 0:02:19.785 | replica | 173f5209-2324-43f4-a59d-d4e5ee974462 | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59998.1ms
I | 2025-03-20, 0:02:22.534 | proxy | edeba18e-7af1-4881-a584-7f528f6d9ed2 | ip-10-0-43-65 |   | Replica(id='6v27b6by', deployment='Model', app='default') rejected request because it is at max capacity of 5 ongoing requests. Retrying request edeba18e-7af1-4881-a584-7f528f6d9ed2.
I | 2025-03-20, 0:03:16.520 | replica | 100a8f5d-b8be-4263-ada3-518419eb6673 | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59995.2ms
I | 2025-03-20, 0:03:17.747 | replica | d433ecfa-a55b-4a56-bbad-0de5c6f1f6d9 | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59999.2ms
I | 2025-03-20, 0:03:19.081 | replica | ed2a9925-5e9b-474c-a644-ff701c8c7899 | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59998.9ms
I | 2025-03-20, 0:03:19.273 | replica | f67b7e3a-d33c-43e6-88a8-415c9aa6be69 | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59998.4ms
I | 2025-03-20, 0:03:20.918 | replica | 696ca5d9-f9b4-4899-a813-c2640a24c1ff | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59999.3ms
I | 2025-03-20, 0:03:22.772 | proxy | 5dafad1a-455e-46ab-b584-686fafb7f420 | ip-10-0-43-65 |   | Replica(id='6v27b6by', deployment='Model', app='default') rejected request because it is at max capacity of 5 ongoing requests. Retrying request 5dafad1a-455e-46ab-b584-686fafb7f420.
I | 2025-03-20, 0:04:16.590 | replica | 54e4994f-efaa-413e-b45b-11080b60573f | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59996.8ms
I | 2025-03-20, 0:04:18.386 | replica | cc74890f-9369-4664-9dd7-4bbadc979245 | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59998.7ms
I | 2025-03-20, 0:04:20.177 | proxy | 3d09d80c-9180-47cb-b8d2-16e54420447a | ip-10-0-43-65 |   | Replica(id='6v27b6by', deployment='Model', app='default') rejected request because it is at max capacity of 5 ongoing requests. Retrying request 3d09d80c-9180-47cb-b8d2-16e54420447a.
I | 2025-03-20, 0:04:20.922 | replica | 769f085e-17d2-4fec-ab78-9051cb2ae1ee | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59999.1ms
I | 2025-03-20, 0:04:21.137 | replica | a1515707-2daf-4c85-9894-9303351ac64f | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59997.8ms
I | 2025-03-20, 0:04:21.486 | replica | 9db28a80-a45a-4a0a-a209-ccd8f7d4a559 | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59998.7ms
I | 2025-03-20, 0:04:29.288 | proxy | 58cb1d1e-fb68-4367-8b71-a704c140f8ab | ip-10-0-43-65 |   | Replica(id='6v27b6by', deployment='Model', app='default') rejected request because it is at max capacity of 5 ongoing requests. Retrying request 58cb1d1e-fb68-4367-8b71-a704c140f8ab.
I | 2025-03-20, 0:05:16.701 | replica | 868d7cd3-e205-4a8b-9747-7a36a362be0c | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59997.5ms
I | 2025-03-20, 0:05:20.062 | replica | 413193a9-1162-4dfd-a4bd-714cc3973cfe | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59999.1ms
I | 2025-03-20, 0:05:21.596 | replica | 7a5109e9-d975-4319-b8b1-412b6b90b6c7 | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 60001.3ms
I | 2025-03-20, 0:05:21.946 | replica | 2dd3a7d2-0988-46db-abed-088241b1f065 | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59999.3ms
I | 2025-03-20, 0:05:22.908 | proxy | d95da992-9e5e-40f6-8746-afcf8f3cbc34 | ip-10-0-43-65 |   | Replica(id='6v27b6by', deployment='Model', app='default') rejected request because it is at max capacity of 5 ongoing requests. Retrying request d95da992-9e5e-40f6-8746-afcf8f3cbc34.
I | 2025-03-20, 0:05:23.046 | replica | 65460812-87b2-4410-8360-c82521233400 | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59997.2ms
I | 2025-03-20, 0:05:24.479 | proxy | a8bc058a-04f6-41c8-9c66-49956f32dad0 | ip-10-0-41-135 |   | Replica(id='6v27b6by', deployment='Model', app='default') rejected request because it is at max capacity of 5 ongoing requests. Retrying request a8bc058a-04f6-41c8-9c66-49956f32dad0.
I | 2025-03-20, 0:06:16.765 | replica | b34ad934-8662-4e4c-a109-9cb44c636613 | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 60000.9ms
I | 2025-03-20, 0:06:17.227 | proxy | a571adce-3fe6-4799-a72c-d2ad11594edc | ip-10-0-41-135 |   | Replica(id='6v27b6by', deployment='Model', app='default') rejected request because it is at max capacity of 5 ongoing requests. Retrying request a571adce-3fe6-4799-a72c-d2ad11594edc.
I | 2025-03-20, 0:06:20.594 | replica | 3175da08-d8ed-4928-ba62-1a9dc774b690 | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59999.2ms
I | 2025-03-20, 0:06:21.329 | proxy | 2b6cb246-0ff3-4a03-a129-4e5c69e99d6e | ip-10-0-43-65 |   | Replica(id='6v27b6by', deployment='Model', app='default') rejected request because it is at max capacity of 5 ongoing requests. Retrying request 2b6cb246-0ff3-4a03-a129-4e5c69e99d6e.
I | 2025-03-20, 0:06:22.382 | replica | b8adf8db-a21a-490f-98aa-ce4e2eb347fd | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 60000.3ms
I | 2025-03-20, 0:06:22.476 | replica | 03b6fe72-d630-479a-8072-0e1665192db9 | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 60000.7ms
I | 2025-03-20, 0:06:24.476 | replica | e260d657-0975-45dd-8503-98a53d8f201e | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59999.4ms
```
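
A minimal sketch of the change (the URL is illustrative; the app fetches its own test image):

```python
import requests

image_url = "https://example.com/dog.jpg"  # illustrative placeholder
# Fail fast instead of hanging indefinitely and blocking the event loop.
resp = requests.get(image_url, timeout=5)
```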


---------

Signed-off-by: akyang-anyscale <[email protected]>
…s to pypi (#51517)

- Add a helper function that inserts a build tag (e.g. `-1`) right after
the Ray version in the wheel name, for cases where the original wheels
uploaded to PyPI/test PyPI are corrupted (see the sketch below).
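
A hedged sketch of such a helper; the function name and exact behavior are assumed from the description above, not taken from the actual script:

```python
def add_build_tag(wheel_name: str, build_tag: str = "1") -> str:
    """ray-2.44.0-cp39-...whl -> ray-2.44.0-1-cp39-...whl"""
    # Wheel filenames follow {dist}-{version}[-{build}]-{python}-{abi}-{platform}.whl,
    # so the build tag slots in right after the version component.
    distribution, version, rest = wheel_name.split("-", 2)
    return f"{distribution}-{version}-{build_tag}-{rest}"

print(add_build_tag("ray-2.44.0-cp39-cp39-manylinux2014_x86_64.whl"))
# ray-2.44.0-1-cp39-cp39-manylinux2014_x86_64.whl
```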

---------

Signed-off-by: kevin <[email protected]>
The correct destination is stderr, not stdout.

- We've documented the stream-to-stderr behavior here:
https://github.com/ray-project/ray/blob/a42e6580a59dff3291a56595a74ff27c04d9e29d/python/ray/_private/services.py#L1142-L1144
- A stream handler is used when no logging filename is specified, and it
streams to stderr by default (see the quick check below):
https://docs.python.org/3/library/logging.handlers.html#logging.StreamHandler
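
A quick check of that default:

```python
import logging
import sys

handler = logging.StreamHandler()    # no stream argument given
assert handler.stream is sys.stderr  # defaults to stderr, not stdout
```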

Signed-off-by: dentiny <[email protected]>
## Why are these changes needed?
Add TorchDataLoader to the Train benchmark.

---------

Signed-off-by: Srinath Krishnamachari <[email protected]>
Just a missing comma and equals sign.

Signed-off-by: Jonathan Dumaine <[email protected]>