Add upper_bound_column to order by #660

sfc-gh-ancoleman · 2022-08-30T13:56:15Z

resolves #659

This is a:

documentation update
bug fix with no breaking changes
new functionality
a breaking change

All pull requests from community contributors should target the main branch (default).

Description & motivation

The order by clause(s) used in the window functions in the mutually_exclusive_ranges test give non-deterministic results when more than one range within a partition has the same lower_bound. If zero-length ranges are not allowed, the test will FAIL (as it should) whenever this occurs. But if zero-length ranges are allowed and gaps are not required, more than one range within a partition can legitimately share the same lower_bound.

This PR changes the order by clause(s) to use {{ lower_bound_column }}, {{ upper_bound_column }} as the ordering criteria. When multiple ranges share the same lower_bound, they are now sorted by their upper_bound. This places all zero-length ranges together, with any non-zero-length range that has the same lower_bound appearing as the last record in the group, so that its upper_bound is compared to the next distinct lower_bound.

Since the test only considers the lower and upper bound columns, the fact that records with the same lower and upper bounds won't have a guaranteed order shouldn't be problematic, since these are effectively interchangeable as far as the test is concerned. So this should be enough to guarantee deterministic results (i.e., the test will always PASS or will always FAIL, unless changes are made to the dataset or to the test configuration).
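
To make the ordering argument concrete, here is a minimal Python sketch of the comparison described above. It is not the macro's actual SQL: `failing_rows`, the sort keys, and the sample dates are all hypothetical, and the check is a simplified stand-in for "each range's upper bound must not exceed the next range's lower bound" with zero-length ranges allowed. Python's stable sort is used to imitate a database that returns tied rows in whatever order it happens to find them.

```python
from datetime import date

def failing_rows(rows, sort_key):
    """Return the (lower, upper) ranges that overlap the next range.

    Hypothetical simplification of the test: rows belong to one partition,
    zero-length ranges are allowed, and each upper bound is compared to the
    lower bound of the following row (like lead() over the given ordering).
    """
    ordered = sorted(rows, key=sort_key)  # stable: ties keep input order
    failures = []
    for i, (lower, upper) in enumerate(ordered):
        next_lower = ordered[i + 1][0] if i + 1 < len(ordered) else None
        if lower > upper:
            failures.append((lower, upper))
        elif next_lower is not None and upper > next_lower:
            failures.append((lower, upper))
    return failures

# Illustrative data only: two ranges start on the same day.
positive = (date(2020, 5, 8), date(2020, 5, 10))   # positive-length range
zero = (date(2020, 5, 8), date(2020, 5, 8))        # zero-length range
following = (date(2020, 5, 10), date(2020, 5, 12))

positive_first = [positive, zero, following]
zero_first = [zero, positive, following]

by_lower = lambda r: r[0]                 # ordering before this PR
by_lower_then_upper = lambda r: (r[0], r[1])  # ordering after this PR

# Ordering on lower bound alone: the result depends on how ties fall.
print(failing_rows(positive_first, by_lower))
print(failing_rows(zero_first, by_lower))

# Ordering on (lower, upper): the same result for either input order.
print(failing_rows(positive_first, by_lower_then_upper))
print(failing_rows(zero_first, by_lower_then_upper))
```

With the lower bound as the only sort key, the positive-length range is flagged only when it happens to land before the zero-length range. Adding the upper bound as a tiebreaker always puts the zero-length range first, so both input orders give the same answer.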

Checklist

joellabes · 2022-09-05T11:28:52Z

@sfc-gh-ancoleman I like this! Would you be able to add a test case just like your one from the initial issue which fails under the current setup (subject to the vagaries of non-deterministic sort algorithms at least) and passes under the new system?

There are sample tests in the integration_tests project in this repo - let me know if you need a hand digging into them

sfc-gh-ancoleman · 2022-09-09T17:16:28Z

@joellabes Since I believe that I just identified an edge-case that the existing test data doesn't include, I modified the existing data_test_mutually_exclusive_ranges_with_gaps_zero_length.csv seed so that it includes a zero-length record for 2020-05-08 as well as a positive-length record that also begins on 2020-05-08. I placed the positive-length record above the zero-length record in a (possibly misguided) attempt to influence ordering.

I had initially also extended the final record for subscription_id 3 so that it overlapped with some records for subscription_id 4; that way, if the macro is ever changed in a way that breaks the partition_by argument, the test on the dataset I modified would fail. I have undone this change, since it's not really related to this issue, and there is probably a better place to incorporate such a test.

joellabes · 2022-09-12T03:47:23Z

Works for me! Thanks @sfc-gh-ancoleman 🌟

sfc-gh-ancoleman added 2 commits August 30, 2022 09:40

Add upper_bound_column to order by

8dc71a0

Update CHANGELOG

1120da6

sfc-gh-ancoleman added 2 commits September 9, 2022 13:01

Update integration test

6527ce5

Remove overlap between partitions

7ea68fe

Merge branch 'main' into zero_length_range_allowed_fix

03949a7

joellabes approved these changes Sep 12, 2022

joellabes merged commit 53f8352 into dbt-labs:main Sep 12, 2022
