[BUG] UNBOUNDED window ranges on null timestamp columns produce incorrect results. #1039

@revans2

Describe the bug
This is a follow-on issue to #825. It looks like the window function was fixed for nulls, except for the UNBOUNDED case.

Steps/Code to reproduce bug
test_window_aggs_for_ranges currently fails every time when run against cudf-0.17. I will mark the test with an xfail for those situations, pointing to this issue.

Expected behavior
This test should pass.

It looks like we are not handling unbounded properly, and in the short term we will have to disable nullable timestamp columns with unbounded preceding or following intervals.

I simplified the query to make debugging easier:

select
  count(c) over
    (partition by a order by cast(b as timestamp) asc
        range between CURRENT ROW and UNBOUNDED following) as count_c_asc, a, b, c
from window_agg_table order by a, b, c

I reduced the length of the generated data to 100 and see results like the following (emphasis added):

CPU:

Row(count_c_asc=*5*, a=-5831592707909023540, b=None, c=756780896), 
Row(count_c_asc=4, a=-5831592707909023540, b=datetime.date(2020, 2, 29), c=-656902282), 
Row(count_c_asc=3, a=-5831592707909023540, b=datetime.date(2020, 3, 18), c=-756294971), 
Row(count_c_asc=2, a=-5831592707909023540, b=datetime.date(2020, 10, 10), c=2117211837), 
Row(count_c_asc=1, a=-5831592707909023540, b=datetime.date(2020, 12, 25), c=110877650), 

GPU:

Row(count_c_asc=*1*, a=-5831592707909023540, b=None, c=756780896), 
Row(count_c_asc=4, a=-5831592707909023540, b=datetime.date(2020, 2, 29), c=-656902282), 
Row(count_c_asc=3, a=-5831592707909023540, b=datetime.date(2020, 3, 18), c=-756294971), 
Row(count_c_asc=2, a=-5831592707909023540, b=datetime.date(2020, 10, 10), c=2117211837), 
Row(count_c_asc=1, a=-5831592707909023540, b=datetime.date(2020, 12, 25), c=110877650),

The GPU side appears to be doing what we talked about, but the CPU side does not appear to treat the null in the timestamp column specially. It looks like, on the CPU, unbounded really means unbounded: when I change the following bound to INTERVAL 1000 DAYS, the GPU matches the CPU results.
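
To make the difference concrete, here is a minimal pure-Python sketch of the semantics the CPU results above seem to follow. It is not Spark's or cudf's implementation. The function name `count_current_row_to_following` is hypothetical, and the rule that a null row in a bounded frame only sees its null peers is my assumption, inferred from the 1000 DAYS experiment. The partition is the five rows shown above.

```python
from datetime import date, timedelta

def count_current_row_to_following(rows, following_days=None):
    """Sketch of count(c) over (order by b asc range between CURRENT ROW and X following).

    rows: list of (b, c) tuples for ONE partition; b is a date or None.
    following_days: None means UNBOUNDED FOLLOWING, otherwise a day interval.
    Returns the counts in ascending b order with nulls first.
    """
    # Ascending order, nulls first.
    ordered = sorted(rows, key=lambda r: (r[0] is not None, r[0] or date.min))
    counts = []
    for b, _ in ordered:
        n = 0
        for other_b, other_c in ordered:
            if other_c is None:
                continue  # count(c) ignores null values of c
            if b is None:
                # Assumption: null peers are always in the frame; non-null rows
                # (which sort after the nulls) are only reached when unbounded.
                in_frame = other_b is None or following_days is None
            else:
                in_frame = (
                    other_b is not None
                    and other_b >= b
                    and (following_days is None
                         or other_b <= b + timedelta(days=following_days))
                )
            if in_frame:
                n += 1
        counts.append(n)
    return counts

# The partition from the output above (a = -5831592707909023540).
partition = [
    (None, 756780896),
    (date(2020, 2, 29), -656902282),
    (date(2020, 3, 18), -756294971),
    (date(2020, 10, 10), 2117211837),
    (date(2020, 12, 25), 110877650),
]

unbounded = count_current_row_to_following(partition)       # [5, 4, 3, 2, 1]
bounded = count_current_row_to_following(partition, 1000)   # [1, 4, 3, 2, 1]
```

Under these assumptions the unbounded case reproduces the CPU column above, and the bounded case gives the null row a count of 1, which is the value the GPU currently returns for the unbounded query.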

Labels

P0 (Must have for release), bug (Something isn't working)
