@uros-db
Contributor

@uros-db uros-db commented Aug 4, 2025

What changes were proposed in this pull request?

Implement the make_timestamp_ntz and try_make_timestamp_ntz functions in PySpark & PySpark Connect API.

Why are the changes needed?

Expand API support for the make_timestamp_ntz and try_make_timestamp_ntz functions.

Does this PR introduce any user-facing change?

Yes, the new functions are now available in Python API.

How was this patch tested?

Added appropriate Python function tests.

  • pyspark.sql.tests.test_functions
  • pyspark.sql.tests.connect.test_parity_functions

Was this patch authored or co-authored using generative AI tooling?

No.

Contributor Author

@uros-db uros-db left a comment

Putting this PR on pause until the corresponding Scala PR is merged: #51828.

@uros-db uros-db marked this pull request as draft August 4, 2025 21:30
@uros-db uros-db marked this pull request as ready for review August 27, 2025 07:18


def make_timestamp_ntz( # type: ignore[misc]
yearsOrDate: "ColumnOrName",
Contributor

Can we change the argument names?
Wouldn't that break existing keyword calls like this one?

make_timestamp_ntz(years=..., months=..., days=..., hours=..., mins=..., secs=...)

Member

Yeah, let's not change the name.

Contributor

Reverted this change. The new date and time arguments are now keyword arguments.

@zhengruifeng zhengruifeng requested a review from ueshin September 17, 2025 07:35
@uros-db
Contributor Author

uros-db commented Sep 17, 2025

Adding @Yicong-Huang to take over this task and address lint failures & outstanding comments.

@Yicong-Huang
Contributor

@zhengruifeng could you please review this again? I've updated it so that date and time are keyword arguments.

…ions with improved argument handling and error messaging


def make_timestamp_ntz( # type: ignore[misc]
*args: "ColumnOrName",
Contributor

@zhengruifeng zhengruifeng Sep 24, 2025

Shall we remove the *args: "ColumnOrName" and support only the following patterns? @HyukjinKwon

make_timestamp_ntz(years=y, months=mo, days=d, hours=h, mins=mi, secs=s)
make_timestamp_ntz(y, mo, d, h, mins=mi, secs=s)
make_timestamp_ntz(y, mo, d, h, mi, s)

make_timestamp_ntz(date=d, time=t) <- for the new code path

Contributor

If we keep *args: "ColumnOrName", we need a helper function that preprocesses the arguments, so that it can be reused in every place that needs it.

def _preprocess_make_timestamp_args(...) -> Tuple[ColumnOrName, ..., ColumnOrName], which returns an 8-tuple for the 8 arguments

Contributor

We need *args to match the positional arguments from the original use case. If we remove it, I'm afraid we'd introduce a breaking change.

Contributor

@zhengruifeng zhengruifeng Sep 24, 2025

I think it won't break positional cases like the following?

make_timestamp_ntz(y, mo, d, h, mi, s)
make_timestamp_ntz(y, mo, d, h, mins=mi, secs=s)

Contributor

I think you are right. I did one more refactor to remove *args and now we explicitly support the following 4 cases:

make_timestamp_ntz(years=y, months=mo, days=d, hours=h, mins=mi, secs=s)
make_timestamp_ntz(y, mo, d, h, mins=mi, secs=s)
make_timestamp_ntz(y, mo, d, h, mi, s)

make_timestamp_ntz(date=d, time=t)

Could you please check again?
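
For readers following the thread, the four call patterns above can be checked against a plain-Python signature without *args. This is only a sketch: make_timestamp_ntz_sketch is a hypothetical stand-in that uses strings in place of columns and returns the arguments it received, so it shows how the calls bind rather than how PySpark builds the expression.

```python
from typing import Dict, Optional


def make_timestamp_ntz_sketch(
    years: Optional[str] = None,
    months: Optional[str] = None,
    days: Optional[str] = None,
    hours: Optional[str] = None,
    mins: Optional[str] = None,
    secs: Optional[str] = None,
    *,  # everything after this marker is keyword-only
    date: Optional[str] = None,
    time: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical stand-in: return only the arguments that were supplied."""
    supplied = {
        "years": years,
        "months": months,
        "days": days,
        "hours": hours,
        "mins": mins,
        "secs": secs,
        "date": date,
        "time": time,
    }
    return {name: value for name, value in supplied.items() if value is not None}


# The three legacy patterns all bind to the same six parameters.
all_keywords = make_timestamp_ntz_sketch(
    years="y", months="mo", days="d", hours="h", mins="mi", secs="s"
)
mixed = make_timestamp_ntz_sketch("y", "mo", "d", "h", mins="mi", secs="s")
all_positional = make_timestamp_ntz_sketch("y", "mo", "d", "h", "mi", "s")

# The new code path uses only the keyword-only parameters.
date_time = make_timestamp_ntz_sketch(date="d", time="t")
```

Because the six legacy parameters keep their names and order, positional and keyword callers are unaffected by dropping *args.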

…nctions by removing overloads and enhancing argument handling
"Value for `<arg_name>` must be between <lower_bound> and <upper_bound> (inclusive), got <actual>"
]
},
"WRONG_NUM_ARGS": {
Contributor

it seems we can reuse error class CANNOT_SET_TOGETHER:

            raise PySparkValueError(
                errorClass="CANNOT_SET_TOGETHER",
                messageParameters={"arg_list": "years|months|days|hours|mins|secs and date|time"},
            )

Contributor

Changed!


@overload
def make_timestamp_ntz(
*,
Contributor

do we still need *?

Contributor

Yes. The * marks everything after it (date and time) as keyword-only. Without it, those parameters would also be accepted positionally, and mypy would complain.
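
The runtime effect of the bare * can be seen in a small, self-contained sketch. The two functions below are hypothetical and take strings instead of columns; they only illustrate the Python rule being discussed, not the mypy overload check itself.

```python
from typing import Optional


def with_marker(years: Optional[str] = None, *, date: Optional[str] = None) -> str:
    """`date` is keyword-only because it follows the bare `*`."""
    return f"years={years!r}, date={date!r}"


def without_marker(years: Optional[str] = None, date: Optional[str] = None) -> str:
    """Without the marker, `date` can also be filled positionally."""
    return f"years={years!r}, date={date!r}"


# Without the marker, a second positional value silently binds to `date`.
loose = without_marker("y", "d")

# With the marker, the same call is rejected with a TypeError.
try:
    with_marker("y", "d")
    rejected = False
except TypeError:
    rejected = True

# Passing `date` by keyword still works.
explicit = with_marker("y", date="d")
```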

hours: Optional["ColumnOrName"] = None,
mins: Optional["ColumnOrName"] = None,
secs: Optional["ColumnOrName"] = None,
*,
Contributor

do we still need *?

Contributor

ditto.

and date is None
and time is None
):
from typing import cast
Contributor

cast is already imported at the top of this file

Contributor

removed redundant imports (cast, PySparkValueError).

date: Optional["ColumnOrName"] = None,
time: Optional["ColumnOrName"] = None,
) -> Column:
# 6 positional/keyword arguments: years, months, days, hours, mins, secs
Contributor

@zhengruifeng zhengruifeng Sep 24, 2025

I think we can simplify it a bit:

if years is not None:
    if any(arg is not None for arg in [date, time]):
        raise CANNOT_SET_TOGETHER
    return _invoke_function_over_columns("make_timestamp_ntz", years, ...)
else:
    if any(arg is not None for arg in [years, months, days, hours, mins, secs]):
        raise CANNOT_SET_TOGETHER
    return _invoke_function_over_columns("make_timestamp_ntz", date, time)

_invoke_function_over_columns will raise NOT_COLUMN_OR_STR if an input is None
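
A runnable approximation of this suggestion, with no PySpark dependency, might look like the sketch below. It rests on assumptions: CannotSetTogetherError is a hypothetical stand-in for PySparkValueError with errorClass="CANNOT_SET_TOGETHER", and choose_arguments is a hypothetical helper that returns the arguments that would be forwarded instead of calling _invoke_function_over_columns.

```python
from typing import Optional, Tuple


class CannotSetTogetherError(ValueError):
    """Hypothetical stand-in for the CANNOT_SET_TOGETHER error class."""


ARG_LIST = "years|months|days|hours|mins|secs and date|time"


def choose_arguments(
    years: Optional[str] = None,
    months: Optional[str] = None,
    days: Optional[str] = None,
    hours: Optional[str] = None,
    mins: Optional[str] = None,
    secs: Optional[str] = None,
    *,
    date: Optional[str] = None,
    time: Optional[str] = None,
) -> Tuple[Optional[str], ...]:
    """Pick the argument group to forward, mirroring the two branches above."""
    components = (years, months, days, hours, mins, secs)
    if years is not None:
        # Component path: date/time must not be mixed in.
        if any(arg is not None for arg in (date, time)):
            raise CannotSetTogetherError(ARG_LIST)
        return components
    # date/time path: no component may be set.
    if any(arg is not None for arg in components):
        raise CannotSetTogetherError(ARG_LIST)
    # Values that are still None are passed through unchanged; in the
    # suggestion, the downstream call is what rejects them.
    return (date, time)


component_args = choose_arguments("y", "mo", "d", "h", "mi", "s")
date_time_args = choose_arguments(date="d", time="t")

try:
    choose_arguments("y", "mo", "d", "h", "mi", "s", date="d")
    mixed_rejected = False
except CannotSetTogetherError:
    mixed_rejected = True
```

The sketch checks only for conflicting groups. As in the suggestion, detecting a missing argument is left to whatever consumes the returned tuple.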

Contributor

Got it. I had made it too complicated by checking all the arguments. I've followed your suggestion and pushed the other validations to _invoke_function_over_columns.

Comment on lines 4182 to 4186
# Invalid argument combinations - return NULL for try_ functions
# For try_ functions, invalid inputs should return NULL
from pyspark.sql.connect.functions import lit

return lit(None)
Contributor

I think we should not allow this

Contributor

I've changed to use default values when inputs are None. Please check.

Comment on lines 4117 to 4121
_months = lit(0) if months is None else months
_days = lit(0) if days is None else days
_hours = lit(0) if hours is None else hours
_mins = lit(0) if mins is None else mins
_secs = lit(0) if secs is None else secs
Contributor

why support None => lit(0) here?

…arameters and enhance test cases for required arguments
…ueError for conflicting argument combinations
@zhengruifeng
Contributor

merged to master

@zhengruifeng
Contributor

@Yicong-Huang some of the newly added tests actually depend on ANSI mode and fail when ANSI is off.
see https://github.com/apache/spark/actions/runs/18025276597/job/51291154866

I am going to remove them in #52466

huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
…p_ntz and try_make_timestamp_ntz functions in PySpark

Closes apache#51831 from uros-db/python-try_make_timestamp_ntz.

Lead-authored-by: Yicong-Huang <[email protected]>
Co-authored-by: Uros Bojanic <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>