Use sp_return_logs for billing reports, not monthly table #6935

zachmargolis merged 9 commits into main
Conversation
* Also update month ranges to use timestamps instead of dates to make sure we're at the correct beginning/end of day
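The "timestamps instead of dates" point above can be sketched in plain Ruby (the real helper uses ActiveSupport; `month_time_range` and its bounds here are illustrative):

```ruby
require 'date'

# Turn a calendar month into a Time range that covers the full first and
# last days, rather than a Date range that a timestamp comparison would
# truncate at midnight of the last day.
def month_time_range(year, month)
  first = Date.new(year, month, 1)
  last  = Date.new(year, month, -1) # -1 selects the last day of the month
  Time.new(first.year, first.month, first.day, 0, 0, 0)..
    Time.new(last.year, last.month, last.day, 23, 59, 59)
end

month_time_range(2021, 1)
# covers 2021-01-01 00:00:00 through 2021-01-31 23:59:59
```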
app/jobs/reports/month_helper.rb (Outdated)
```diff
  # ]
  # @param [Range<Date>] date_range
- # @return [Array<Range<Date>>]
+ # @return [Array<Range<Time>>]
```
technically these are ActiveSupport::TimeWithZone ... but that was a lot more to write, and that's a subclass, so I felt like the gist was correct here
**Why**: Sometimes these long-running queries get serialization errors, so let's retry them individually rather than throw away the whole report
changelog: Internal, Reporting, Update billing reports to be more accurate
```ruby
temp_copy = ial_to_year_month_to_users.deep_dup

with_retries(
  max_tries: 3,
  rescue: PG::TRSerializationFailure,
  handler: proc { ial_to_year_month_to_users = temp_copy },
) do
```
see commit notes (70ae9c6) for a longer comment, but short version is we get these errors occasionally and I figured it was worth a shot seeing if we could quickly retry + recover rather than abort the entire job
and in case it's not obvious what it's doing: we have a Ruby nested hash/multiset object where we add results incrementally, and this creates a copy before streaming the results, then restores the copy to the last known good result every time the query fails
we could rewrite this as begin/rescue syntax, but we'd need to add some sleep code and keep a retry counter ourselves, so this seemed clear enough? the alternative would be:
```ruby
temp_copy = ial_to_year_month_to_users.deep_dup
attempt_count = 0
begin
  stream_query(query) do |row|
    # ...
  end
rescue PG::TRSerializationFailure => e
  attempt_count += 1
  if attempt_count < 3
    ial_to_year_month_to_users = temp_copy
    retry
  else
    raise e
  end
end
```

```diff
  iaa_start_date: iaa_range.begin.to_s,
  iaa_end_date: iaa_range.end.to_s,
- total_auth_count: 300,
+ total_auth_count: 21,
```
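The snapshot-and-restore retry pattern discussed above can be shown as a self-contained toy (the names `FlakyError` and `accumulate_with_retry` are illustrative, not the real report code, and `Marshal` stands in for ActiveSupport's `deep_dup`):

```ruby
class FlakyError < StandardError; end

def accumulate_with_retry(results, max_tries: 3)
  attempts = 0
  snapshot = Marshal.load(Marshal.dump(results)) # stand-in for deep_dup
  begin
    attempts += 1
    results[:rows] << :partial          # incremental mutation mid-stream
    raise FlakyError if attempts < 3    # simulate serialization failures
    results[:rows] << :done
    results
  rescue FlakyError
    # roll back to the last known good state, then try again
    results.replace(Marshal.load(Marshal.dump(snapshot)))
    retry if attempts < max_tries
    raise
  end
end

accumulate_with_retry({ rows: [] })
# => { rows: [:partial, :done] } — the two failed attempts were rolled back
```

The key point is that the snapshot is taken once, before any streaming, and the rescue path restores it in place so partially accumulated rows from a failed attempt never leak into the final result.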
now that we're using the direct table, this was just to create fewer rows and make a faster test
```ruby
{
  iaa: iaa2_key,
  ial1_total_auth_count: 0,
  ial2_total_auth_count: 1,
  ial1_unique_users: 0,
  ial2_unique_users: 1,
  ial1_new_unique_users: 0,
  ial2_new_unique_users: 1,
  year_month: inside_iaa2.strftime('%Y%m'),
  iaa_start_date: iaa2_range.begin.to_s,
  iaa_end_date: iaa2_range.end.to_s,
},
```
I think this is correct now that this has another section... I think it may have been grouped inaccurately before
```diff
- sp_return_logs.requested_at::date BETWEEN %{range_start} AND %{range_end}
+ sp_return_logs.requested_at BETWEEN %{range_start} AND %{range_end}
```
so I think that this change may mean these are essentially unindexed queries now 😬 , this is our partial index:
Line 588 in c62f569
cc @mitchellhenke @stevegsa @jmhooper if you have any thoughts on whether I should try to add a "plain" index on (requested_at, issuer)? (it's a huge table, so I know it would fail during a normal deploy)
Yes, we need requested_at; issuer does nothing and can be dropped.
for the combined billing reports, we do break things out by issuer by month, so I think it does help for those?
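If a plain index does end up being needed, one way to build it without locking a huge table is a concurrent index build; a minimal sketch, assuming a standard Rails migration (the class name, Rails version, and index name here are illustrative, not from this PR):

```ruby
# Sketch: add the proposed (requested_at, issuer) index concurrently so the
# build doesn't take an exclusive lock on the large sp_return_logs table.
class AddRequestedAtIssuerIndexToSpReturnLogs < ActiveRecord::Migration[6.1]
  disable_ddl_transaction! # required for algorithm: :concurrently

  def change
    add_index :sp_return_logs, [:requested_at, :issuer],
              algorithm: :concurrently,
              name: 'index_sp_return_logs_on_requested_at_and_issuer'
  end
end
```

This maps to Postgres's `CREATE INDEX CONCURRENTLY`, which builds the index without blocking writes, at the cost of a slower build and the need to run outside a transaction.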
update, thanks to some investigation help from @mitchellhenke, the ::date truncation does work like we expect so I undid this timestamp change part in c5b005e
stevegsa
left a comment
looks great. just need the index and i think we are good to go. not sure why date isn't getting the equivalent full timestamp though — if it were, this PR wouldn't be needed, correct?
(see context in Slack discussion)
Accuracy

We caught a few discrepancies in our reports: when we use `BETWEEN '2021-01-01' AND '2021-01-31'` we get truncated results compared to the fuller timestamp version `BETWEEN '2021-01-01 00:00:00.00000' AND '2021-01-31 23:59:59.999999'`, so this switches to use that.

Performance

`sp_return_logs` is a bigger table, so these queries will likely be slower. I am running them now to make sure they return eventually, but unfortunately I think we need to be making this performance/accuracy tradeoff right now.

Next Steps
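The accuracy issue above — a bare date literal in a timestamp comparison meaning midnight at the *start* of the last day — can be demonstrated in plain Ruby (variable names are illustrative):

```ruby
require 'time'

# A bare date upper bound means midnight, the very first instant of Jan 31,
# so a timestamp BETWEEN comparison silently drops the rest of that day.
range_end = Time.parse('2021-01-31')           # 2021-01-31 00:00:00
event     = Time.parse('2021-01-31 10:00:00')  # mid-morning on the last day

event <= range_end                  # false: excluded by the naive timestamp BETWEEN
event.to_date <= range_end.to_date  # true: kept when both sides are dates (like ::date)
```

This is the same reason the `requested_at::date BETWEEN ...` form and the explicit `23:59:59.999999` upper bound both capture the full last day, while the bare-timestamp form does not.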