-
Notifications
You must be signed in to change notification settings - Fork 496
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add test for 'a has fewer rows than b' #221
Conversation
Hi dan! Thanks for the PR! I'm a little behind on a few other things, so just to set expectations, I probably won't look at this until late this week or potentially later than that 😅 |
You do your thing 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question about this design of this test, then I'll review fully! (Trying to get these done before Christmas haha!)
select | ||
case | ||
when count_model > count_comparison then count_model - count_comparison + 1 | ||
when count_model = count_comparison then count_model - count_comparison |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've spent more minutes looking at this line than I'd care to admit.
In this case, the two tables have the same number of rows, right? Do we expect that to be a passing, or failing case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who wrote this? Someone in the dim and distant past. Yikes. Anyway, we're trying to find the number of errors so line 24+ is supposed to find the number of excess rows:
- If
count_model
=count_comparison
then we have 1 too many rows therefore return 1 - if
count_model
>count_comparison
then we havecount_model - count_comparison + 1
too many rows so return that number - if
count_model
<count_comparison
then we're good so return 0
So I think L27 should be:
when count_model = count_comparison then 1
That's one bug squashed. Is it clearer now? If it's failed the very first review by an experienced user then it probably needs some more work 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah got it! (Welcome back from the Christmas break 😉 )
I think one thing that is throwing me is "count model" and "count comparison" — these terms are generic so it makes it a little challenging to follow the logic. I think we can make these more explicit, and maybe adjust the case logic to be a little more verbose.
What do you think of something like this?
select
(count_model_with_more_rows - count_model_with_fewer_rows) as row_count_delta,
case
-- pass the test if the delta is positive (i.e. return the number 0)
when row_count_delta > 0 then 0
-- fail the test if they are the same number
when row_count_delta = 0 then 1
-- fail the test for negative numbers
when row_count_delta < 0 then abs(row_count_delta)
end as excess_rows
from counts
)
select excess_rows from final
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that this may not work on postgres — most cloud warehouses support lateral column aliasing (i.e. using a calculated column in the same CTE) but I suspect postgres does not, so we might need to move the "case" statement into another CTE! 😬
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your suggestion looks good! I didn't know that about postgres but the very verbose CTE-heavy version I've just committed might solve it...
bbba960
to
60a3b3c
Compare
Reopened as #343 (to integrate upstream changes) |
Description & motivation
Add a
fewer_rows_than
test.This lets an analyst confirm that our model (A) has fewer rows than some reference model (B). It's useful for confirming that the number of rows has actually reduced after applying some filter or inner join in model A.
Checklist