@gangsf gangsf commented Oct 30, 2025

Description

Ray Train introduced a change that is not backward compatible. This PR updates the multimodal AI workload template to use the latest Ray image and to pass device to collate_fn.
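The description does not say how the device reaches collate_fn at each call site. As a minimal sketch of one common option, assuming the consumer still calls the collate function with only the batch, the device can be bound ahead of time with functools.partial. The collate_fn body and iterate_batches below are toy stand-ins for illustration, not the template's code:

```python
from functools import partial


def collate_fn(batch, device=None):
    """Toy stand-in: pair each column with the device it would be moved to."""
    device = "cpu" if device is None else device
    return {key: (values, device) for key, values in batch.items()}


def iterate_batches(batches, collate_fn):
    """Hypothetical consumer that calls collate_fn with the batch only."""
    return [collate_fn(batch) for batch in batches]


# Bind the device up front so the consumer's one-argument call still works.
collate_on_gpu = partial(collate_fn, device="cuda:0")
results = iterate_batches([{"label": [0, 1]}], collate_on_gpu)
```

Callers that do not bind a device keep working, because the device parameter defaults to None.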

Test

Tested in this workspace.

@gangsf gangsf requested a review from a team as a code owner October 30, 2025 22:21
@gangsf gangsf requested a review from matthewdeng October 30, 2025 22:25

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request updates the image-search-and-classification template to be compatible with a recent change in Ray Train, which now requires passing the device to collate_fn. The changes correctly propagate the device parameter through the call chain in infer.py, serve.py, and model.py. The Ray image version is also updated in the configuration files. The logic in the updated collate_fn is robust, handling execution both inside and outside of a Ray Train context. My main feedback is on the Jupyter notebook, where code is duplicated from the Python modules. I've suggested replacing these duplicated code blocks with imports to improve maintainability.
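The fallback order described above (explicit device first, then the Ray Train context, then CPU) can be sketched in isolation. This is an illustration under assumptions: resolve_device and the two fake getters are hypothetical names, and the getter is injected so the sketch runs without Ray installed. In the PR the lookup is the get_device() call, which the quoted code expects to raise RuntimeError outside a Ray Train context.

```python
def resolve_device(device, get_device):
    """Return the explicit device if given, else ask get_device, else use CPU."""
    if device is not None:
        return device
    try:
        return get_device()
    except RuntimeError:
        # Not running inside a training context: fall back to CPU.
        return "cpu"


def fake_get_device_inside_train():
    """Hypothetical stand-in for a successful device lookup."""
    return "cuda:0"


def fake_get_device_outside_train():
    """Hypothetical stand-in for a lookup made outside a training context."""
    raise RuntimeError("no training context")


explicit = resolve_device("cuda:1", fake_get_device_outside_train)
from_context = resolve_device(None, fake_get_device_inside_train)
fallback = resolve_device(None, fake_get_device_outside_train)
```

An explicit device always wins, so serving and inference code can pass its own device without touching the training context.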

Comment on lines +898 to 917
"def collate_fn(batch, device=None):\n",
"    dtypes = {\"embedding\": torch.float32, \"label\": torch.int64}\n",
"    tensor_batch = {}\n",
"    \n",
"    # If no device is provided, try to get it from Ray Train context\n",
"    if device is None:\n",
"        try:\n",
"            device = get_device()\n",
"        except RuntimeError:\n",
"            # When not in Ray Train context, use CPU for testing\n",
"            device = \"cpu\"\n",
"    \n",
"    for key in dtypes.keys():\n",
"        if key in batch:\n",
"            tensor_batch[key] = torch.as_tensor(\n",
"                batch[key],\n",
"                dtype=dtypes[key],\n",
"                device=device,\n",
"            )\n",
"    return tensor_batch\n"

medium

To improve maintainability and avoid code duplication, it's better to import collate_fn from the doggos.model module instead of redefining it in the notebook. This ensures that any future changes to the function in the source file are automatically reflected here.

The notebook already has the necessary path setup to import from the doggos package.

Suggested change
-"def collate_fn(batch, device=None):\n",
-"    dtypes = {\"embedding\": torch.float32, \"label\": torch.int64}\n",
-"    tensor_batch = {}\n",
-"    \n",
-"    # If no device is provided, try to get it from Ray Train context\n",
-"    if device is None:\n",
-"        try:\n",
-"            device = get_device()\n",
-"        except RuntimeError:\n",
-"            # When not in Ray Train context, use CPU for testing\n",
-"            device = \"cpu\"\n",
-"    \n",
-"    for key in dtypes.keys():\n",
-"        if key in batch:\n",
-"            tensor_batch[key] = torch.as_tensor(\n",
-"                batch[key],\n",
-"                dtype=dtypes[key],\n",
-"                device=device,\n",
-"            )\n",
-"    return tensor_batch\n"
+"from doggos.model import collate_fn\n"

@kouroshHakha kouroshHakha left a comment

stamp

@ray-gardener ray-gardener bot added the train (Ray Train Related Issue) label Oct 31, 2025
@kouroshHakha kouroshHakha added the go (add ONLY when ready to merge, run all tests) label Oct 31, 2025
@kouroshHakha kouroshHakha enabled auto-merge (squash) October 31, 2025 21:09
@kouroshHakha kouroshHakha merged commit b6e6210 into ray-project:master Oct 31, 2025
8 checks passed
YoussefEssDS pushed a commit to YoussefEssDS/ray that referenced this pull request Nov 8, 2025
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
…collate_fn (ray-project#58327)

Signed-off-by: Gang Zhao <[email protected]>
Co-authored-by: Gang Zhao <[email protected]>
Signed-off-by: Aydin Abiar <[email protected]>
SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request Dec 1, 2025
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025
…collate_fn (ray-project#58327)

Signed-off-by: Gang Zhao <[email protected]>
Co-authored-by: Gang Zhao <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Labels

go: add ONLY when ready to merge, run all tests
train: Ray Train Related Issue