[data] Add version support to read_lance
#58895
…e Implementation Signed-off-by: Simeet Nayan <[email protected]>
Signed-off-by: Simeet Nayan <[email protected]>
Code Review
This pull request successfully adds support for reading specific versions of a Lance dataset by exposing a version parameter in ray.data.read_lance. The change is correctly propagated through the LanceDatasource to the underlying lance.dataset call. The implementation is clean, and the new parameter is well-documented. A comprehensive unit test has been added to verify the new functionality, ensuring that both the latest version (by default) and a specific historical version can be read correctly. I have one minor suggestion to improve a comment in the new test for clarity. Overall, this is a great addition.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Simeet Nayan <[email protected]>
@simeetnayan81 awesome, thanks!
## Description
- Expose a version parameter on ray.data.read_lance to read historical Lance dataset versions.
- Add unit test python/ray/data/tests/test_lance.py::test_lance_read_with_version that writes an initial dataset, records the initial version, merges new data, and asserts that the default read returns the latest version while read_lance(path, version=initial_version) returns the original columns and rows.

## Related issues
> Closes ray-project#58226

## Additional information
As mentioned in the original issue, this change exposes a version parameter in the `read_lance` function. The parameter is passed down to `LanceDatasource`, which is updated as well. Ultimately, `lance.dataset` takes this version parameter to read the specific version.

---------

Signed-off-by: Simeet Nayan <[email protected]>
Signed-off-by: Simeet Nayan <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: YK <[email protected]>
Description
Related issues
Additional information
As mentioned in the original issue, this PR exposes a version parameter in the read_lance function. The parameter is passed down to LanceDatasource, which is updated as well. Ultimately, lance.dataset takes this version parameter to read the specific version.
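The pass-through described above can be illustrated with a small, self-contained sketch. It is not the code from this PR: `ToyVersionedStore`, `ToyLanceDatasource`, and `toy_read_lance` are hypothetical stand-ins for the on-disk Lance dataset, `LanceDatasource`, and `read_lance`, and the 1-based version numbering is an assumption made for the toy store. The sketch only shows the shape of the change: an optional `version` is accepted at the top, stored by the datasource, and handed to the call that opens the dataset, with `None` meaning "latest".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Row = Dict[str, int]


@dataclass
class ToyVersionedStore:
    """Hypothetical stand-in for a versioned dataset on disk."""

    snapshots: List[List[Row]] = field(default_factory=list)

    def commit(self, rows: List[Row]) -> int:
        """Store a new snapshot and return its (assumed 1-based) version."""
        self.snapshots.append(list(rows))
        return len(self.snapshots)

    def open(self, version: Optional[int] = None) -> List[Row]:
        """Stand-in for opening the dataset; version=None means latest."""
        if not self.snapshots:
            raise ValueError("store has no versions")
        if version is None:
            version = len(self.snapshots)
        if not 1 <= version <= len(self.snapshots):
            raise ValueError(f"unknown version {version}")
        return list(self.snapshots[version - 1])


class ToyLanceDatasource:
    """Hypothetical datasource that remembers which version to read."""

    def __init__(self, store: ToyVersionedStore, version: Optional[int] = None):
        self._store = store
        self._version = version

    def read(self) -> List[Row]:
        # The stored version is forwarded unchanged to the open call.
        return self._store.open(version=self._version)


def toy_read_lance(store: ToyVersionedStore, *, version: Optional[int] = None) -> List[Row]:
    """Hypothetical top-level reader that exposes the version parameter."""
    return ToyLanceDatasource(store, version=version).read()


# Mirror the flow of the unit test described above: write an initial
# dataset, record its version, add a column, then read both ways.
store = ToyVersionedStore()
initial_version = store.commit([{"id": 1}, {"id": 2}])
store.commit([{"id": 1, "score": 10}, {"id": 2, "score": 20}])

latest_rows = toy_read_lance(store)
initial_rows = toy_read_lance(store, version=initial_version)
```

With the real API, the equivalent call has the form given in the description, `read_lance(path, version=initial_version)`. Omitting `version` keeps the existing behavior of reading the latest version.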