forked from ray-project/ray
[pull] master from ray-project:master #140
Open · pull wants to merge 6,497 commits into garymm:master from ray-project:master
…51055) It's not common to call `unique_ptr::release()` because it can easily lead to memory leaks. However, `ray_syncer_test.cc` is a special case. I tried changing `cli_reactor` to a `unique_ptr`, and the tests then failed with a double free. I used ASAN to check: 1. `RayClientBidiReactor::OnDone` calls `delete this;`. 2. The unique pointer goes out of scope. (ASAN report screenshot omitted.) Signed-off-by: kaihsun <[email protected]>
The macro is never used in our codebase, so delete it. Signed-off-by: dentiny <[email protected]>
TPU device logs for k8s containers that request `google.com/tpu` resources are written to the `/tmp/tpu_logs` directory. This PR adds a symlink to the `/tmp/tpu_logs` directory when the `TPU_WORKER_ID` env var is set; TPU log files are then added to `monitor_log_paths` and become viewable from the Ray Dashboard. Verified by creating a file in `/tmp/tpu_logs` and viewing the symlink: the `tpu_logs` directory is added to the 'Logs' tab on a TPU Ray worker, and the log file we created is ingested and viewable. (Screenshots omitted.) --------- Signed-off-by: Ryan O'Leary <[email protected]> Co-authored-by: Kai-Hsun Chen <[email protected]>
#51113) This reverts commit e4a448f. Context: #47814 (comment).
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
UV doesn't seem to carry over the right environment markers in `requirements_compiled.txt`. We can temporarily revert this until we find a fix for the issue. Signed-off-by: Kevin H. Luu <[email protected]>
Fix typos in comments and strings Signed-off-by: co63oc <[email protected]>
…tutorials (#50240) Updates docs to use correct normalization values in image datasets. --------- Signed-off-by: Ricardo Decal <[email protected]>
…a HuggingFace `Dataset` (#50998) `override_num_blocks` is not supported when reading from a HuggingFace `Dataset` object, i.e. in non-streaming mode; it is supported in streaming mode. The current error message is incorrect and mixes up the wording. This is a tiny PR to improve it. Signed-off-by: sumanthrh <[email protected]>
Changes the ordering of libraries based on popularity (PyTorch first, then XGBoost, then the rest). --------- Signed-off-by: Ricardo Decal <[email protected]>
The proxy currently counts HTTP redirects as HTTP errors, which are emitted to metrics. 3xx responses shouldn't be errors. This PR excludes 3xx responses from the error count and updates the relevant test case. Signed-off-by: akyang-anyscale <[email protected]>
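As a sketch of the classification change described above (the helper name is hypothetical, not the actual Serve proxy code): only 4xx and 5xx status codes should feed the error metric, while 3xx redirects pass through uncounted.

```python
def counts_as_error(status_code: int) -> bool:
    # Hypothetical helper mirroring the fix: informational (1xx), success
    # (2xx), and redirect (3xx) responses are not errors; only client (4xx)
    # and server (5xx) responses feed the proxy's error metric.
    return 400 <= status_code <= 599

assert not counts_as_error(200)
assert not counts_as_error(302)  # redirect: previously miscounted as an error
assert counts_as_error(404)
assert counts_as_error(503)
```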
Currently the V2 Autoscaler formats logs by converting the V2 data structure `ClusterStatus` to the V1 structures `AutoscalerSummary` and `LoadMetricsSummary` and then passing them to the legacy `format_info_string`. It'd be useful for the V2 autoscaler to directly format `ClusterStatus` to the correct output log format. This PR refactors `utils.py` to directly format `ClusterStatus`. Additionally, this PR changes the node reports to output `instance_id` rather than `ip_address`, since the latter is not necessarily unique for failed nodes. ## Related issue number Closes #37856 --------- Signed-off-by: ryanaoleary <[email protected]> Signed-off-by: Ryan O'Leary <[email protected]>
Some docs were failing to index properly due to their extreme length. I've hidden verbose cell outputs so that they index properly. Signed-off-by: Ricardo Decal <[email protected]>
Double-checked locking is known to be buggy (checking for null and then updating the pointer is a data race); `std::once_flag` is the solution. --------- Signed-off-by: dentiny <[email protected]>
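For illustration only, a Python sketch of the double-checked pattern (not the C++ code this PR touches): the unsynchronized first read is the fast path, and in C++ that read is a data race, which is why `std::call_once` over a `std::once_flag` is preferred there.

```python
import threading

_instance = None
_instance_lock = threading.Lock()

def get_instance():
    """Double-checked locking: the fast path skips the lock once initialized."""
    global _instance
    if _instance is None:             # unsynchronized read: a data race in C++
        with _instance_lock:
            if _instance is None:     # re-check under the lock
                _instance = object()
    return _instance

# Every caller observes the same single instance.
assert get_instance() is get_instance()
```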
Add various metrics that are captured in the progress bar but not in the emitted Prometheus metrics. --------- Signed-off-by: Matthew Owen <[email protected]>
Explicitly tear down the compiled graph and kill the actors rather than relying on GC. Previously in Compiled Graph, the actor was killed only when the actor task did not finish within the timeout; this PR fixes that by always killing the actor when kill_actors=True. Signed-off-by: Rui Qiao <[email protected]>
Fix the operator id name format. The current format can cause collisions between operators in rare cases. Example:
```
ds = ray.data.range(100, override_num_blocks=20).limit(11)
for i in range(11):
    ds = ds.limit(1)
ds._set_name("data_head_test")
ds.materialize()
```
You would expect 12 limit operators, but the dashboard shows only 11 because of id collisions. (Screenshot omitted.) Test: - CI Signed-off-by: can <[email protected]>
When an asyncio task creates another asyncio task, raising `AsyncioActorExit` cannot make the caller exit because they are not the same task. Therefore, this PR makes `exit_actor` request actor exit in the core worker context, which the core worker checks regularly. Closes: #49451 --------- Signed-off-by: Chi-Sheng Liu <[email protected]> Co-authored-by: Edward Oakes <[email protected]>
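A minimal repro of the mechanism described above (plain asyncio, no Ray): an exception raised inside a child task is stored on that task object and never unwinds the caller, which is why raising `AsyncioActorExit` from a nested task cannot exit the actor.

```python
import asyncio

class AsyncioActorExit(Exception):
    """Stand-in for Ray's internal exit exception (named as in the PR)."""

async def child():
    # Raising here does not unwind the parent task's stack.
    raise AsyncioActorExit()

async def parent(results):
    task = asyncio.ensure_future(child())
    await asyncio.sleep(0)  # yield once so the child task runs (and raises)
    results.append("parent kept running")
    # The exception is attached to the child task object instead.
    results.append(type(task.exception()).__name__)

results = []
asyncio.run(parent(results))
assert results == ["parent kept running", "AsyncioActorExit"]
```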
`logging.warn` is a legacy, deprecated alias of `logging.warning`. Signed-off-by: Chi-Sheng Liu <[email protected]>
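The rename is a drop-in change: `warning` takes exactly the same arguments as the deprecated `warn` alias.

```python
import logging

logger = logging.getLogger("demo")

# `logging.warn` / `Logger.warn` survive only as deprecated aliases;
# `warning` is the supported spelling and takes the same arguments.
logger.warning("disk usage at %d%%", 91)
assert hasattr(logging, "warning") and hasattr(logger, "warning")
```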
Resolves #51135 Error message pasted here: ``` [2025-03-06T19:49:02Z] > assert file_info.filename == str(tpu_log_dir / tpu_device_log_file) [2025-03-06T19:49:02Z] E AssertionError: assert 'C:\\Users\\C...pu-device.log' == 'C:\\Users\\C...pu-device.log' [2025-03-06T19:49:02Z] E - C:\Users\ContainerAdministrator\AppData\Local\Temp\pytest-of-ContainerAdministrator\pytest-1\test_tpu_logs0\logs\tpu_logs\tpu-device.log [2025-03-06T19:49:02Z] E ? ^ [2025-03-06T19:49:02Z] E + C:\Users\ContainerAdministrator\AppData\Local\Temp\pytest-of-ContainerAdministrator\pytest-1\test_tpu_logs0\logs/tpu_logs\tpu-device.log [2025-03-06T19:49:02Z] E ? ``` Signed-off-by: dentiny <[email protected]>
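The assertion above fails only on the separator: one path segment was joined with `/`, the rest with `\`. Comparing path objects instead of raw strings (a sketch with `pathlib`, not the actual test code) makes the check separator-agnostic.

```python
from pathlib import PureWindowsPath

# These differ as raw strings (one "/" vs "\\") but name the same Windows path.
mixed = r"logs/tpu_logs\tpu-device.log"
pure = r"logs\tpu_logs\tpu-device.log"

assert mixed != pure                                    # raw string comparison fails
assert PureWindowsPath(mixed) == PureWindowsPath(pure)  # path comparison passes
```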
Signed-off-by: Cody Yu <[email protected]>
A lot of things change coregpubuild, so the multi-GPU tests run more often than they need to. We're moving to manually running these multi-GPU tests. --------- Signed-off-by: dayshah <[email protected]>
This docs code was not passing the multi-GPU CI step. --------- Signed-off-by: dayshah <[email protected]>
- `https://docs.ray.io/en/master/serve/model_composition.html#visualizing-the-graph` is the wrong URL for `Visualization of DAGs`. - I think `https://docs.ray.io/en/master/ray-core/compiled-graph/visualization.html` is the right one. --------- Signed-off-by: Sangyeon Cho <[email protected]> Co-authored-by: Dhyey Shah <[email protected]>
- Make the `num_blocks` argument optional, so there's no need to set `num_blocks=None` when using `target_num_rows_per_block`. - Add a type hint for the `None` value. - Fix formatting in the [docs page](https://docs.ray.io/en/latest/data/api/doc/ray.data.Dataset.repartition.html). --------- Signed-off-by: Praveen Gorthy <[email protected]> Signed-off-by: Praveen <[email protected]> Co-authored-by: Hao Chen <[email protected]> Co-authored-by: Alexey Kudinkin <[email protected]>
Update the document to include the feature of python standard attributes in log lines. The PR also fixes all applicable errors/warnings in the doc. Closes #49502 --------- Signed-off-by: Mengjin Yan <[email protected]> Co-authored-by: Dhyey Shah <[email protected]>
* Use `psutil.process_iter` to replace `psutil.pids`. * Use `proc.info["name"]` instead of `proc.name()`. * I'm not sure whether `proc.name()` uses the cache set by `process_iter`, but I'm certain that using info is correct since the official docs frequently use `proc.info[...]` with `process_iter`. * I asked a question on giampaolo/psutil#2518, but I'm not sure how long it will take to get an answer from the community. For now, I think we can merge it, and I'll update the use of psutil if the maintainers have any suggestions. --------- Signed-off-by: kaihsun <[email protected]>
I tried to reproduce the ASAN errors in `scheduling_queue_test.cc` for #51516 by running:
```
bazel test --features=asan -c dbg //:scheduling_queue_test --test_output=all
```
However, I got the following ODR violation instead of the actual data race. (ASAN report screenshot omitted.) It looks like we have two packages `@com_github_madler_zlib//:zlib` and `@net_zlib_zlib//:zlib` in our C++ codebase.
```sh
bazel query --noimplicit_deps \
  'allpaths(//:scheduling_queue_test, @net_zlib_zlib//:zlib)'
# Output
//:core_worker_lib
//:scheduling_queue_test
//src/ray/util:pipe_logger
//src/ray/util:stream_redirection_utils
@boost//:iostreams
@net_zlib_zlib//:zlib
Loading: 7 packages loaded
```
My initial thought was to avoid having `scheduling_queue_test` use `@net_zlib_zlib//:zlib`, so I tried separating CoreWorker into smaller BAZEL targets. However, I found it non-trivial and eventually gave up, using the following command as a workaround:
```sh
bazel test --features=asan -c dbg //:scheduling_queue_test --test_output=all --test_env=ASAN_OPTIONS="detect_odr_violation=0"
```
--------- Signed-off-by: kaihsun <[email protected]>
…lugins (#51565) Fixes #51196.
…shboard_[module_name].err (#51545) Signed-off-by: Chi-Sheng Liu <[email protected]>
…e and supports websocket handler returns normal HTTP response (#51552) Signed-off-by: Chi-Sheng Liu <[email protected]>
Created by release automation bot. Update with commit 8ee3f00 Signed-off-by: Lonnie Liu <[email protected]> Co-authored-by: Lonnie Liu <[email protected]>
…P API (#51555) Signed-off-by: Chi-Sheng Liu <[email protected]>
This PR is stacked on #51179, to make the redirection stream unit-testable. Basically a no-op change: it extracts the redirection logic into a separate file and leaves the exit hook and global registry where they are now. --------- Signed-off-by: dentiny <[email protected]>
…nerated code and `custom_types.py` are inconsistent (#51568) The validation fails when I update a `.proto` file, compile the Ray codebase, and run a Ray program. However, the original error message instructs me to regenerate the Protobuf code, which I have already done. Instead, I need to update `custom_types.py` to fix the issue. (Screenshot of the error message with this PR omitted.) Signed-off-by: Kai-Hsun Chen <[email protected]>
```
REGRESSION 9.35%: client__get_calls (THROUGHPUT) regresses from 1094.7883444776185 to 992.4456902391204 in microbenchmark.json
REGRESSION 7.87%: tasks_per_second (THROUGHPUT) regresses from 399.43954902981744 to 367.9840802358416 in benchmarks/many_tasks.json
REGRESSION 6.60%: multi_client_put_gigabytes (THROUGHPUT) regresses from 43.246981615749526 to 40.39150444280067 in microbenchmark.json
REGRESSION 5.16%: client__tasks_and_put_batch (THROUGHPUT) regresses from 14341.529664523765 to 13601.436104861408 in microbenchmark.json
REGRESSION 5.03%: 1_1_actor_calls_concurrent (THROUGHPUT) regresses from 5402.532852540871 to 5130.570133178275 in microbenchmark.json
REGRESSION 4.83%: 1_1_actor_calls_async (THROUGHPUT) regresses from 8588.075503140139 to 8173.653446206568 in microbenchmark.json
REGRESSION 4.71%: single_client_tasks_and_get_batch (THROUGHPUT) regresses from 6.116479739439202 to 5.828378076935622 in microbenchmark.json
REGRESSION 4.06%: single_client_get_calls_Plasma_Store (THROUGHPUT) regresses from 10975.200393255369 to 10529.193272608605 in microbenchmark.json
REGRESSION 3.71%: client__tasks_and_get_batch (THROUGHPUT) regresses from 0.9551721070094008 to 0.9197513826205774 in microbenchmark.json
REGRESSION 3.25%: 1_1_actor_calls_sync (THROUGHPUT) regresses from 2024.9514970549762 to 1959.1925407193576 in microbenchmark.json
REGRESSION 2.78%: single_client_put_gigabytes (THROUGHPUT) regresses from 18.30617444315663 to 17.79739662942353 in microbenchmark.json
REGRESSION 1.46%: client__1_1_actor_calls_async (THROUGHPUT) regresses from 1057.2932167754398 to 1041.8730021547178 in microbenchmark.json
REGRESSION 1.32%: 1_n_actor_calls_async (THROUGHPUT) regresses from 8168.440029557936 to 8060.698907411474 in microbenchmark.json
REGRESSION 1.19%: single_client_tasks_sync (THROUGHPUT) regresses from 981.51641421362 to 969.8384217890384 in microbenchmark.json
REGRESSION 0.89%: client__1_1_actor_calls_concurrent (THROUGHPUT) regresses from 1056.4662855748954 to 1047.1016344870811 in microbenchmark.json
REGRESSION 0.58%: actors_per_second (THROUGHPUT) regresses from 591.3775923644333 to 587.9457127979538 in benchmarks/many_actors.json
REGRESSION 0.56%: 1_1_async_actor_calls_sync (THROUGHPUT) regresses from 1434.2085547024217 to 1426.2018801386466 in microbenchmark.json
REGRESSION 116.92%: dashboard_p50_latency_ms (LATENCY) regresses from 32.123 to 69.681 in benchmarks/many_actors.json
REGRESSION 59.07%: dashboard_p99_latency_ms (LATENCY) regresses from 589.9 to 938.359 in benchmarks/many_tasks.json
REGRESSION 57.53%: dashboard_p95_latency_ms (LATENCY) regresses from 398.245 to 627.361 in benchmarks/many_tasks.json
REGRESSION 53.36%: dashboard_p50_latency_ms (LATENCY) regresses from 89.962 to 137.963 in benchmarks/many_tasks.json
REGRESSION 37.60%: dashboard_p99_latency_ms (LATENCY) regresses from 3067.405 to 4220.801 in benchmarks/many_actors.json
REGRESSION 12.91%: stage_0_time (LATENCY) regresses from 6.343268156051636 to 7.161974191665649 in stress_tests/stress_test_many_tasks.json
REGRESSION 10.77%: dashboard_p95_latency_ms (LATENCY) regresses from 2575.96 to 2853.454 in benchmarks/many_actors.json
REGRESSION 6.85%: dashboard_p99_latency_ms (LATENCY) regresses from 252.85 to 270.166 in benchmarks/many_pgs.json
REGRESSION 2.52%: 10000_get_time (LATENCY) regresses from 23.620077062999997 to 24.215384834000005 in scalability/single_node.json
REGRESSION 2.22%: avg_iteration_time (LATENCY) regresses from 1.1939783954620362 to 1.220467975139618 in stress_tests/stress_test_dead_actors.json
REGRESSION 1.80%: stage_3_time (LATENCY) regresses from 1829.902144908905 to 1862.925583600998 in stress_tests/stress_test_many_tasks.json
REGRESSION 1.73%: 1000000_queued_time (LATENCY) regresses from 191.976472028 to 195.30269835 in scalability/single_node.json
REGRESSION 1.56%: time_to_broadcast_1073741824_bytes_to_50_nodes (LATENCY) regresses from 17.602684142 to 17.87641767999999 in scalability/object_store.json
REGRESSION 1.12%: 10000_args_time (LATENCY) regresses from 18.656692702999997 to 18.865748501 in scalability/single_node.json
REGRESSION 0.60%: stage_2_avg_iteration_time (LATENCY) regresses from 39.46649179458618 to 39.70143375396729 in stress_tests/stress_test_many_tasks.json
REGRESSION 0.45%: 107374182400_large_object_time (LATENCY) regresses from 29.23165342300001 to 29.36276392100001 in scalability/single_node.json
```
Signed-off-by: Lonnie Liu <[email protected]> Co-authored-by: Lonnie Liu <[email protected]>
They're owned by core team. --------- Signed-off-by: Edward Oakes <[email protected]>
## Why are these changes needed? It's tricky for users to implement the `preprocess` function when constructing a Processor, because users may not know what the input dataset should look like (i.e. the expected schema). This PR proposes a new API `log_input_column_names()` that logs the expected schema. Example:
```python
import ray
from ray.data.llm import build_llm_processor, vLLMEngineProcessorConfig

processor_config = vLLMEngineProcessorConfig(...)
processor = build_llm_processor(...)
processor.log_input_column_names()
# The first stage of the processor is ChatTemplateStage.
# Required input columns:
#   messages: A list of messages in OpenAI chat format. See https://platform.openai.com/docs/api-reference/chat/create for details.

processor_config = vLLMEngineProcessorConfig(
    apply_chat_template=False,
    tokenize=False,
)
processor = build_llm_processor(...)
processor.log_input_column_names()
# The first stage of the processor is vLLMEngineStage.
# Required input columns:
#   prompt: The text prompt (str).
#   sampling_params: The sampling parameters. See https://docs.vllm.ai/en/latest/api/inference_params.html#sampling-parameters for details.
# Optional input columns:
#   tokenized_prompt: The tokenized prompt. If provided, the prompt will not be tokenized by the vLLM engine.
#   images: The images to generate text from. If provided, the prompt will be a multimodal prompt.
#   model: The model to use for this request. If the model is different from the model set in the stage, then this is a LoRA request.
```
--------- Signed-off-by: Cody Yu <[email protected]>
## Why are these changes needed? 1. Adding more ops to `BlockColumnAccessor` 2. Fixing circular imports in Ray Data 3. Fixing AggregateFnV2 to be proper ABC 4. Simplifying `accumulate_block` op --------- Signed-off-by: Alexey Kudinkin <[email protected]>
) Fixes #51195.
Signed-off-by: liuxsh9 <[email protected]> Signed-off-by: Kourosh Hakhamaneshi <[email protected]> Co-authored-by: Kourosh Hakhamaneshi <[email protected]>
…51563) ## Why are these changes needed? `use_legacy_format` has been deprecated since Arrow 15.0.0 and [is deleted from the repo](https://github.com/apache/arrow/pull/45742/files). Provided that it defaults to `use_legacy_format=False`, removing it from the Ray repo completely. --------- Signed-off-by: Alexey Kudinkin <[email protected]>
Signed-off-by: Chi-Sheng Liu <[email protected]>
Add gen config related docs. Closes https://anyscale1.atlassian.net/browse/LLM-1786?atlOrigin=eyJpIjoiZDg2MWMxNmU0YTY2NDRhMGJiN2JmNDk0NmNjYjE3OWIiLCJwIjoiaiJ9 --------- Signed-off-by: Gene Su <[email protected]>
The compiled graphs quickstart was misusing `testcode`, which cannot be composed with `literalinclude`. The included code is already tested, so no concern about missed coverage. I've also split off the `core` and `tune` doctests into builds tagged with the appropriate team. --------- Signed-off-by: Edward Oakes <[email protected]>
The locust request duration is in milliseconds. Signed-off-by: akyang-anyscale <[email protected]>
## Why are these changes needed? The request for the image in the resnet50 application could hang indefinitely, which could block the event loop and make tests flaky. This PR adds a 5s timeout to the `get` call. The replica is stuck for some reason; as a result, requests are not making progress and the client disconnects due to timeout.
```
2025-03-20, 0:02:18.331 | replica | d4c5cca8-c654-4d45-b212-ae0061194a86 | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 60001.2ms
I | 2025-03-20, 0:02:19.247 | replica | d486af45-d50e-468f-903f-d4bf723ea143 | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59998.3ms
I | 2025-03-20, 0:02:19.785 | replica | 173f5209-2324-43f4-a59d-d4e5ee974462 | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59998.1ms
I | 2025-03-20, 0:02:22.534 | proxy | edeba18e-7af1-4881-a584-7f528f6d9ed2 | ip-10-0-43-65 | | Replica(id='6v27b6by', deployment='Model', app='default') rejected request because it is at max capacity of 5 ongoing requests. Retrying request edeba18e-7af1-4881-a584-7f528f6d9ed2.
I | 2025-03-20, 0:03:16.520 | replica | 100a8f5d-b8be-4263-ada3-518419eb6673 | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59995.2ms
I | 2025-03-20, 0:03:17.747 | replica | d433ecfa-a55b-4a56-bbad-0de5c6f1f6d9 | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59999.2ms
I | 2025-03-20, 0:03:19.081 | replica | ed2a9925-5e9b-474c-a644-ff701c8c7899 | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59998.9ms
I | 2025-03-20, 0:03:19.273 | replica | f67b7e3a-d33c-43e6-88a8-415c9aa6be69 | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59998.4ms
I | 2025-03-20, 0:03:20.918 | replica | 696ca5d9-f9b4-4899-a813-c2640a24c1ff | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59999.3ms
I | 2025-03-20, 0:03:22.772 | proxy | 5dafad1a-455e-46ab-b584-686fafb7f420 | ip-10-0-43-65 | | Replica(id='6v27b6by', deployment='Model', app='default') rejected request because it is at max capacity of 5 ongoing requests. Retrying request 5dafad1a-455e-46ab-b584-686fafb7f420.
I | 2025-03-20, 0:04:16.590 | replica | 54e4994f-efaa-413e-b45b-11080b60573f | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59996.8ms
I | 2025-03-20, 0:04:18.386 | replica | cc74890f-9369-4664-9dd7-4bbadc979245 | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59998.7ms
I | 2025-03-20, 0:04:20.177 | proxy | 3d09d80c-9180-47cb-b8d2-16e54420447a | ip-10-0-43-65 | | Replica(id='6v27b6by', deployment='Model', app='default') rejected request because it is at max capacity of 5 ongoing requests. Retrying request 3d09d80c-9180-47cb-b8d2-16e54420447a.
I | 2025-03-20, 0:04:20.922 | replica | 769f085e-17d2-4fec-ab78-9051cb2ae1ee | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59999.1ms
I | 2025-03-20, 0:04:21.137 | replica | a1515707-2daf-4c85-9894-9303351ac64f | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59997.8ms
I | 2025-03-20, 0:04:21.486 | replica | 9db28a80-a45a-4a0a-a209-ccd8f7d4a559 | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59998.7ms
I | 2025-03-20, 0:04:29.288 | proxy | 58cb1d1e-fb68-4367-8b71-a704c140f8ab | ip-10-0-43-65 | | Replica(id='6v27b6by', deployment='Model', app='default') rejected request because it is at max capacity of 5 ongoing requests. Retrying request 58cb1d1e-fb68-4367-8b71-a704c140f8ab.
I | 2025-03-20, 0:05:16.701 | replica | 868d7cd3-e205-4a8b-9747-7a36a362be0c | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59997.5ms
I | 2025-03-20, 0:05:20.062 | replica | 413193a9-1162-4dfd-a4bd-714cc3973cfe | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59999.1ms
I | 2025-03-20, 0:05:21.596 | replica | 7a5109e9-d975-4319-b8b1-412b6b90b6c7 | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 60001.3ms
I | 2025-03-20, 0:05:21.946 | replica | 2dd3a7d2-0988-46db-abed-088241b1f065 | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59999.3ms
I | 2025-03-20, 0:05:22.908 | proxy | d95da992-9e5e-40f6-8746-afcf8f3cbc34 | ip-10-0-43-65 | | Replica(id='6v27b6by', deployment='Model', app='default') rejected request because it is at max capacity of 5 ongoing requests. Retrying request d95da992-9e5e-40f6-8746-afcf8f3cbc34.
I | 2025-03-20, 0:05:23.046 | replica | 65460812-87b2-4410-8360-c82521233400 | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59997.2ms
I | 2025-03-20, 0:05:24.479 | proxy | a8bc058a-04f6-41c8-9c66-49956f32dad0 | ip-10-0-41-135 | | Replica(id='6v27b6by', deployment='Model', app='default') rejected request because it is at max capacity of 5 ongoing requests. Retrying request a8bc058a-04f6-41c8-9c66-49956f32dad0.
I | 2025-03-20, 0:06:16.765 | replica | b34ad934-8662-4e4c-a109-9cb44c636613 | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 60000.9ms
I | 2025-03-20, 0:06:17.227 | proxy | a571adce-3fe6-4799-a72c-d2ad11594edc | ip-10-0-41-135 | | Replica(id='6v27b6by', deployment='Model', app='default') rejected request because it is at max capacity of 5 ongoing requests. Retrying request a571adce-3fe6-4799-a72c-d2ad11594edc.
I | 2025-03-20, 0:06:20.594 | replica | 3175da08-d8ed-4928-ba62-1a9dc774b690 | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59999.2ms
I | 2025-03-20, 0:06:21.329 | proxy | 2b6cb246-0ff3-4a03-a129-4e5c69e99d6e | ip-10-0-43-65 | | Replica(id='6v27b6by', deployment='Model', app='default') rejected request because it is at max capacity of 5 ongoing requests. Retrying request 2b6cb246-0ff3-4a03-a129-4e5c69e99d6e.
I | 2025-03-20, 0:06:22.382 | replica | b8adf8db-a21a-490f-98aa-ce4e2eb347fd | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 60000.3ms
I | 2025-03-20, 0:06:22.476 | replica | 03b6fe72-d630-479a-8072-0e1665192db9 | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 60000.7ms
I | 2025-03-20, 0:06:24.476 | replica | e260d657-0975-45dd-8503-98a53d8f201e | ip-10-0-41-135 | 6v27b6by | GET / CANCELLED 59999.4ms
```
--------- Signed-off-by: akyang-anyscale <[email protected]>
…s to pypi (#51517) - Add helper function to add build tag (e.g. `-1`) right after Ray version in the wheel name, in cases where original wheels uploaded to pypi/test pypi are corrupted. --------- Signed-off-by: kevin <[email protected]>
The correct destination is stderr, not stdout. - We've mentioned the effect of streaming to stderr here: https://github.com/ray-project/ray/blob/a42e6580a59dff3291a56595a74ff27c04d9e29d/python/ray/_private/services.py#L1142-L1144 - A stream handler is used when the logging filename is not specified, and it streams to stderr: https://docs.python.org/3/library/logging.handlers.html#logging.StreamHandler Signed-off-by: dentiny <[email protected]>
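The stdlib behavior relied on here can be checked directly: `logging.StreamHandler()` with no argument binds to `sys.stderr` (the logger name below is just for the demo).

```python
import io
import logging
import sys

# With no stream argument, StreamHandler writes to sys.stderr, not stdout.
assert logging.StreamHandler().stream is sys.stderr

# Sending output elsewhere requires passing the stream explicitly.
buf = io.StringIO()
logger = logging.getLogger("stream-demo")
logger.addHandler(logging.StreamHandler(buf))
logger.warning("hello")
assert "hello" in buf.getvalue()
```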
## Why are these changes needed? Add TorchDataLoader to Train Benchmark. --------- Signed-off-by: Srinath Krishnamachari <[email protected]>
Signed-off-by: Chi-Sheng Liu <[email protected]>
…50984) Signed-off-by: Chi-Sheng Liu <[email protected]>
…51557) Signed-off-by: Tatsuya Nishiyama <[email protected]>
Signed-off-by: Chi-Sheng Liu <[email protected]>
…y in RayService (#51095) Signed-off-by: Cheng-Yeh Chung <[email protected]>
Signed-off-by: dentiny <[email protected]>
Just a missing comma and equals sign. Signed-off-by: Jonathan Dumaine <[email protected]>
Created by pull[bot]