
@simeetnayan81
Contributor

Description

  • Expose a version parameter on ray.data.read_lance to read historical Lance dataset versions.
  • Add unit test python/ray/data/tests/test_lance.py::test_lance_read_with_version, which writes an initial dataset, records the initial version, merges new data, and asserts that a default read returns the latest version while read_lance(path, version=initial_version) returns the original columns and rows.
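The test follows a write, record version, merge, read-back pattern. To illustrate the time-travel semantics it checks without needing Ray or Lance installed, here is a toy in-memory model. `ToyVersionedDataset` is hypothetical and is not how the Lance format stores data. In this model each commit appends an immutable snapshot, and versions are numbered from 1.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Row = Dict[str, object]


@dataclass
class ToyVersionedDataset:
    """Hypothetical toy model: every commit appends an immutable snapshot."""

    _snapshots: List[List[Row]] = field(default_factory=list)

    @property
    def latest_version(self) -> int:
        # Versions are numbered from 1 in this toy model.
        return len(self._snapshots)

    def write(self, rows: List[Row]) -> int:
        # Store a copy so later mutation of `rows` cannot change history.
        self._snapshots.append([dict(r) for r in rows])
        return self.latest_version

    def merge(self, new_cols: List[Row], on: str) -> int:
        # Join new columns onto the latest snapshot by key, creating a new version.
        if not self._snapshots:
            raise ValueError("cannot merge into an empty dataset")
        by_key = {r[on]: r for r in new_cols}
        merged = [{**r, **by_key.get(r[on], {})} for r in self._snapshots[-1]]
        self._snapshots.append(merged)
        return self.latest_version

    def read(self, version: Optional[int] = None) -> List[Row]:
        # None means "latest", mirroring the default behaviour being tested.
        v = self.latest_version if version is None else version
        if not 1 <= v <= self.latest_version:
            raise ValueError(f"unknown version {v}")
        return [dict(r) for r in self._snapshots[v - 1]]


# Mirror the test's steps: write, record version, merge, then read both ways.
ds = ToyVersionedDataset()
initial_version = ds.write([{"id": 1, "a": 10}, {"id": 2, "a": 20}])
ds.merge([{"id": 1, "b": "x"}, {"id": 2, "b": "y"}], on="id")

assert set(ds.read()[0]) == {"id", "a", "b"}  # latest has the merged column
assert ds.read(version=initial_version) == [{"id": 1, "a": 10}, {"id": 2, "a": 20}]
```

Returning copies from `read` keeps older snapshots immutable. That immutability is what makes reading a historical version meaningful.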

Related issues

Closes #58226

Additional information

As mentioned in the original issue, this PR exposes a version parameter in the read_lance function. The parameter is passed down to LanceDatasource, which is updated accordingly. Ultimately, lance.dataset receives this version argument and opens the requested version.
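The pass-through described here (read_lance, then LanceDatasource, then lance.dataset) can be sketched with hypothetical stand-ins. `sketch_read_lance`, `SketchLanceDatasource`, and `fake_lance_dataset` are illustrative names only and are not Ray's or Lance's code. The fake opener just echoes its arguments, so the sketch runs without either library installed.

```python
from typing import Any, Dict, Optional


def fake_lance_dataset(uri: str, version: Optional[int] = None) -> Dict[str, Any]:
    """Hypothetical stand-in for lance.dataset(): echoes its arguments instead of opening storage."""
    return {"uri": uri, "version": version}


class SketchLanceDatasource:
    """Hypothetical, heavily simplified datasource (not Ray's LanceDatasource)."""

    def __init__(self, uri: str, version: Optional[int] = None) -> None:
        self.uri = uri
        # None means "read the latest version".
        self.version = version

    def open_dataset(self) -> Dict[str, Any]:
        # Forward the version unchanged to the underlying opener, which is
        # where the specific historical version would actually be selected.
        return fake_lance_dataset(self.uri, version=self.version)


def sketch_read_lance(uri: str, *, version: Optional[int] = None) -> Dict[str, Any]:
    """Hypothetical entry point mirroring the read_lance -> datasource -> opener chain."""
    return SketchLanceDatasource(uri, version=version).open_dataset()


print(sketch_read_lance("/tmp/example.lance"))
# -> {'uri': '/tmp/example.lance', 'version': None}
print(sketch_read_lance("/tmp/example.lance", version=1))
# -> {'uri': '/tmp/example.lance', 'version': 1}
```

A `None` default is one way to leave existing callers unaffected: they keep reading the latest version unless they explicitly pin one.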

@simeetnayan81 simeetnayan81 requested a review from a team as a code owner November 21, 2025 19:38
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request successfully adds support for reading specific versions of a Lance dataset by exposing a version parameter in ray.data.read_lance. The change is correctly propagated through the LanceDatasource to the underlying lance.dataset call. The implementation is clean, and the new parameter is well-documented. A comprehensive unit test has been added to verify the new functionality, ensuring that both the latest version (by default) and a specific historical version can be read correctly. I have one minor suggestion to improve a comment in the new test for clarity. Overall, this is a great addition.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Simeet Nayan <[email protected]>
@richardliaw richardliaw added data Ray Data-related issues go add ONLY when ready to merge, run all tests labels Nov 21, 2025
@richardliaw
Contributor

@simeetnayan81 awesome, thanks!

@richardliaw richardliaw changed the title Add version support to read_lance [data] Add version support to read_lance Nov 21, 2025
@richardliaw richardliaw changed the title [data] Add version support to read_lance [data] Add version support to read_lance Nov 21, 2025
@richardliaw richardliaw merged commit c8e59db into ray-project:master Nov 21, 2025
7 checks passed
ykdojo pushed a commit to ykdojo/ray that referenced this pull request Nov 27, 2025
## Description
- Expose a version parameter on ray.data.read_lance to read historical
Lance dataset versions.
- Add unit test
python/ray/data/tests/test_lance.py::test_lance_read_with_version, which
writes an initial dataset, records the initial version, merges new data,
and asserts that a default read returns the latest version while
read_lance(path, version=initial_version) returns the original columns and rows.

## Related issues
> Closes ray-project#58226

## Additional information
As mentioned in the original issue, this exposes a `version` parameter in
the `read_lance` function. The parameter is passed down to
`LanceDatasource`, which is updated accordingly. Ultimately,
`lance.dataset` receives this `version` argument and opens the requested
version.

---------

Signed-off-by: Simeet Nayan <[email protected]>
Signed-off-by: Simeet Nayan <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: YK <[email protected]>
SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request Dec 1, 2025

Labels

data Ray Data-related issues go add ONLY when ready to merge, run all tests

Development

Successfully merging this pull request may close these issues.

[data] Read previous versions of Lance dataset into ray data

2 participants