@400Ping (Contributor) commented Nov 18, 2025

Description

Completing the datetime namespace operations

Related issues

Related to #58674

Additional information

@400Ping 400Ping requested a review from a team as a code owner November 18, 2025 15:38

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces datetime expression operations under the .dt namespace, which is a great addition. The implementation in dt_namespace.py is mostly clean and follows existing patterns.

However, there is a critical omission: the dt property is not defined on the Expr class in python/ray/data/expressions.py. Without this, expressions like col('my_date').dt.year() will fail. This property needs to be added, similar to how .list, .str, and .struct are implemented. You can add the following property to the Expr class:

    @property
    def dt(self) -> "_DatetimeNamespace":
        """Access datetime operations for this expression."""
        from ray.data.namespace_expressions.dt_namespace import _DatetimeNamespace

        return _DatetimeNamespace(self)

I've also left a couple of comments in dt_namespace.py regarding potential improvements:

  • A high-severity issue with how the return data type is determined for ceil, floor, and round functions, which could lead to incorrect type inference.
  • A medium-severity suggestion to refactor duplicated code for better maintainability.
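
The refactoring suggestion in the second bullet can be illustrated in isolation. The following is a minimal pure-Python sketch, not the code from dt_namespace.py: `_make_component` and `DatetimeOps` are hypothetical names, and plain `datetime` values stand in for Arrow arrays. It shows one helper generating the near-identical component extractors instead of six hand-written method bodies.

```python
from datetime import datetime
from typing import Callable, List


def _make_component(name: str) -> Callable[["DatetimeOps"], List[int]]:
    """Build a method that extracts one datetime attribute (hypothetical helper)."""

    def method(self: "DatetimeOps") -> List[int]:
        # getattr selects the field (year, month, ...) by name.
        return [getattr(value, name) for value in self.values]

    method.__name__ = name
    method.__doc__ = f"Extract the {name} component of each value."
    return method


class DatetimeOps:
    """Toy stand-in for a datetime namespace; not the Ray class."""

    def __init__(self, values: List[datetime]) -> None:
        self.values = values

    # One line per component instead of one full method body each.
    year = _make_component("year")
    month = _make_component("month")
    day = _make_component("day")
    hour = _make_component("hour")
    minute = _make_component("minute")
    second = _make_component("second")


ops = DatetimeOps([datetime(2025, 11, 18, 15, 38, 0)])
print(ops.year(), ops.month(), ops.hour())
```

Whether the merged code uses a factory like this or a private helper method is not stated in this conversation; either way the idea is the same, with the shared logic living in one place.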

@400Ping changed the title from "[Data]Compute Expressions-datetime" to "[WIP][Data]Compute Expressions-datetime" Nov 18, 2025
@ray-gardener bot added the labels data (Ray Data-related issues) and community-contribution (Contributed by the community) Nov 18, 2025
@400Ping 400Ping marked this pull request as draft November 20, 2025 10:27
@400Ping 400Ping force-pushed the data/compute-expressions-dt branch from 86ce74f to 5a04f43 Compare November 21, 2025 18:39
Signed-off-by: 400Ping <[email protected]>
Signed-off-by: 400Ping <[email protected]>
@400Ping changed the title from "[WIP][Data]Compute Expressions-datetime" to "[Data] Compute Expressions-datetime" Nov 22, 2025
@400Ping 400Ping marked this pull request as ready for review November 22, 2025 05:57
Signed-off-by: 400Ping <[email protected]>
@goutamvenkat-anyscale added the label go (add ONLY when ready to merge, run all tests) Nov 23, 2025
Signed-off-by: 400Ping <[email protected]>

@goutamvenkat-anyscale (Contributor) left a comment

lgtm!

@goutamvenkat-anyscale (Contributor) commented

/gemini summary

@gemini-code-assist bot (Contributor) commented

Summary of Changes

This pull request significantly enhances Ray Data's expression capabilities by adding a dedicated datetime namespace. It allows users to perform a wide range of temporal manipulations directly on columns, such as extracting date parts, formatting, and rounding timestamps. This improves the expressiveness and convenience for data processing workflows involving datetime data.

Highlights

  • New Datetime Namespace: Introduced a new _DatetimeNamespace for Ray Data expressions, enabling direct datetime operations on columns using the .dt accessor (e.g., col('timestamp').dt.year()).
  • Comprehensive Datetime Operations: The new namespace includes methods for extracting temporal components (year, month, day, hour, minute, second), formatting datetimes with strftime, and temporal rounding operations (ceil, floor, round).
  • Integration with Expression API: The dt accessor is now a property of the Expr class, making datetime operations seamlessly available within the Ray Data expression framework.
  • New Test Coverage: Added new tests to validate the functionality of all datetime namespace operations and to ensure proper error handling when used on non-datetime column types.
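
To make the accessor pattern from the highlights concrete, here is a toy model in plain Python. It is a sketch under stated assumptions, not Ray's implementation: the `Expr`, `DatetimeNamespace`, and `col` defined below are simplified stand-ins that evaluate row by row over dictionaries, whereas the summary says the real methods are built on `pyarrow_udf`. Only the call shape, `col('timestamp').dt.year()`, is taken from the summary.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class Expr:
    """Toy expression: a function evaluated once per row (not Ray's Expr)."""

    fn: Callable[[Row], Any]

    @property
    def dt(self) -> "DatetimeNamespace":
        # The accessor wraps the expression, mirroring the .dt property pattern.
        return DatetimeNamespace(self)

    def evaluate(self, rows: List[Row]) -> List[Any]:
        return [self.fn(row) for row in rows]


@dataclass
class DatetimeNamespace:
    """Toy namespace whose methods each return a new expression."""

    _expr: Expr

    def year(self) -> Expr:
        return Expr(lambda row: self._expr.fn(row).year)

    def strftime(self, fmt: str) -> Expr:
        return Expr(lambda row: self._expr.fn(row).strftime(fmt))


def col(name: str) -> Expr:
    """Reference a column by name."""
    return Expr(lambda row: row[name])


rows = [{"timestamp": datetime(2025, 11, 24, 9, 30)}]
years = col("timestamp").dt.year().evaluate(rows)
labels = col("timestamp").dt.strftime("%Y-%m-%d").evaluate(rows)
print(years, labels)
```

Because every namespace method returns another expression, calls compose lazily and nothing is computed until the expression is evaluated against data.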

Changelog

  • python/ray/data/expressions.py
    • Added _DatetimeNamespace to TYPE_CHECKING imports.
    • Introduced a dt property to the Expr class to provide access to datetime operations.
    • Included _DatetimeNamespace in the __all__ list and the __getattr__ method for lazy loading.
  • python/ray/data/namespace_expressions/dt_namespace.py
    • Added a new file defining the _DatetimeNamespace class.
    • Implemented methods for extracting year, month, day, hour, minute, and second components.
    • Added strftime method for custom datetime formatting.
    • Implemented ceil, floor, and round methods for temporal rounding, preserving the input data type.
    • Utilized pyarrow_udf for all datetime operations.
  • python/ray/data/tests/test_namespace_expressions.py
    • Imported the datetime module for testing purposes.
    • Added test_datetime_namespace_all_operations to comprehensively test all new datetime functions.
    • Added test_dt_namespace_invalid_dtype_raises to verify that using the dt namespace on non-datetime columns raises an exception.
    • Made minor formatting adjustments in existing struct tests.
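
The changelog notes that ceil, floor, and round preserve the input data type. The semantics can be sketched with the standard library alone; this is an illustration, not the PyArrow-backed code from the PR. `UNIT_STEPS`, `floor_dt`, `ceil_dt`, and `round_dt` are hypothetical names, only fixed-width units and naive datetimes are handled, and rounding exact halves upward is an assumption made here rather than a statement about Arrow's behavior.

```python
from datetime import datetime, timedelta

# Fixed-width units only; calendar units such as month or year need other logic.
UNIT_STEPS = {
    "second": timedelta(seconds=1),
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
}


def floor_dt(value: datetime, unit: str) -> datetime:
    step = UNIT_STEPS[unit]
    # Subtract the remainder measured from a midnight-aligned origin.
    return value - (value - datetime.min) % step


def ceil_dt(value: datetime, unit: str) -> datetime:
    floored = floor_dt(value, unit)
    # Values already on a boundary are left unchanged.
    return floored if floored == value else floored + UNIT_STEPS[unit]


def round_dt(value: datetime, unit: str) -> datetime:
    floored = floor_dt(value, unit)
    step = UNIT_STEPS[unit]
    # Assumption: exact halves round up.
    return floored + step if (value - floored) * 2 >= step else floored


ts = datetime(2025, 11, 18, 15, 38, 45)
for op in (floor_dt, ceil_dt, round_dt):
    result = op(ts, "hour")
    # Each result is still a datetime, so the value type is unchanged.
    print(op.__name__, result.isoformat(), type(result).__name__)
```

The point relevant to the review discussion is the last one: a timestamp goes in and a timestamp of the same kind comes out, which is why the return type should follow the input type.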

Activity

  • A cursor[bot] comment highlighted that the _DatetimeNamespace was implemented but inaccessible due to a missing dt property on the Expr class.
  • A gemini-code-assist[bot] comment suggested using DataType.temporal() for return_dtype in temporal rounding functions to avoid object type inference issues, which was acknowledged by goutamvenkat-anyscale.
  • A gemini-code-assist[bot] comment recommended refactoring the year, month, etc., methods to reduce code duplication using a helper function, also acknowledged by goutamvenkat-anyscale.
  • goutamvenkat-anyscale inquired about the tie_breaker parameter for round_temporal and suggested using Literal for unit parameters in temporal functions.
  • goutamvenkat-anyscale requested adding comprehensive tests for the new datetime functionality in test_namespace_expressions.py, including a consolidated test and a failure case test.
  • A cursor[bot] comment pointed out that temporal rounding should preserve the specific input temporal type (self._expr.data_type) rather than using a generic DataType.temporal(), which goutamvenkat-anyscale confirmed with an example.
  • goutamvenkat-anyscale suggested using the rows_same utility for assertions in tests.
  • A cursor[bot] comment identified a low-severity bug regarding a test parametrization mismatch.
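
The suggestion to use Literal for unit parameters can be sketched as follows. This is an assumption-laden illustration rather than the merged code: the alias name `TemporalUnit`, the set of unit strings, and the `check_unit` helper are all hypothetical, and the `floor` function here only returns a description instead of operating on data.

```python
from typing import Literal, get_args

# Hypothetical alias; the unit names listed here are illustrative.
TemporalUnit = Literal["year", "month", "day", "hour", "minute", "second"]


def check_unit(unit: TemporalUnit) -> TemporalUnit:
    """Return unit unchanged, or raise if it is not an allowed value."""
    allowed = get_args(TemporalUnit)
    if unit not in allowed:
        raise ValueError(f"unit must be one of {allowed}, got {unit!r}")
    return unit


def floor(unit: TemporalUnit) -> str:
    # Static checkers flag floor("fortnight"); check_unit catches it at runtime.
    return f"floor to {check_unit(unit)}"


print(floor("hour"))
```

Literal on its own only helps static type checkers, so pairing it with a runtime check via `get_args` gives users a clear error when an unsupported unit string slips through.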

@alexeykudinkin alexeykudinkin merged commit 0cfeb95 into ray-project:master Nov 24, 2025
6 checks passed
ykdojo pushed a commit to ykdojo/ray that referenced this pull request Nov 27, 2025
SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request Dec 1, 2025
