Skip to content

Use sp_return_logs for billing reports, not monthly table#6935

Merged
zachmargolis merged 9 commits intomainfrom
margolis-plain-return-logs
Sep 15, 2022
Merged

Use sp_return_logs for billing reports, not monthly table#6935
zachmargolis merged 9 commits intomainfrom
margolis-plain-return-logs

Conversation

@zachmargolis
Copy link
Contributor

@zachmargolis zachmargolis commented Sep 9, 2022

(see context in Slack discussion)

Accuracy

We caught a few discrepancies in our reports

  • the monthly table is off in some some cases, I couldn't get it to match sp_return_logs, so since sp_return_logs is the raw table, it seemed more accurate and easier to check if both reports used the same thing
  • when we use BETWEEN '2021-01-01' AND '2021-01-31' we get truncated results compared to the fuller timestamp version BETWEEN '2021-01-01 00:00:00.00000' AND '2021-01-31 23:59:59.999999' so switches this to use that

Performance

sp_return_logs is a bigger table, so these queries will likely be slower. I am running them now to make sure they return eventually, but unfortunately I think we need to be making this performance/accuracy tradeoff right now.

Next Steps

  • Apparently billing only uses these two reports, so maybe we should remove all the rest
  • Maybe we drop the monthly tables since we have trouble reconciling them, and we still retain the raw data

* Also update month ranges to use timestamps instead of dates to make sure we're
  at the correct beginning/end of day
@zachmargolis zachmargolis requested a review from a team September 9, 2022 20:29
# ]
# @param [Range<Date>] date_range
# @return [Array<Range<Date>>]
# @return [Array<Range<Time>>]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically these are ActiveSupport::TimeWithZone ... but that was a lot more to write, and that's a subclass, so I felt like the gist was correct here

**Why**: Sometimes these long-running queries get serialization errors,
so let's retry them individually rather than throw away the whole report
changelog: Internal, Reporting, Update billing reports to be more accurate
Comment on lines +32 to +38
temp_copy = ial_to_year_month_to_users.deep_dup

with_retries(
max_tries: 3,
rescue: PG::TRSerializationFailure,
handler: proc { ial_to_year_month_to_users = temp_copy },
) do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see commit notes (70ae9c6) for a longer comment, but short version is we get these errors occasionally and I figured it was worth a shot seeing if we could quickly retry + recover rather than abort the entire job

Copy link
Contributor Author

@zachmargolis zachmargolis Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and in case it's not obvious what it's doing, we have a ruby nested hash/multiset thing object where we add results incrementally and this is creating a copy before streaming the results, and then restoring the copy to the last known good result every time the query fails

we could rewrite this as begin/rescue syntax but we'd need add some sleep code and need a retry counter ourself, so this seemed clear enough? the alternative would be:

temp_copy = ial_to_year_month_to_users.deep_dup
attempt_count = 0
begin
  stream_query(query) do |row|
    # ...
  end
rescue PG::TRSerializationFailure => e
  attempt_count += 1
  if attempt_count <3
    ial_to_year_month_to_users = temp_copy
    retry
   else
    raise e
  end
end

@zachmargolis zachmargolis marked this pull request as ready for review September 9, 2022 21:46
iaa_start_date: iaa_range.begin.to_s,
iaa_end_date: iaa_range.end.to_s,
total_auth_count: 300,
total_auth_count: 21,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that we're using the direct table, this was just to create fewer rows and make a faster test

Comment on lines +212 to +223
{
iaa: iaa2_key,
ial1_total_auth_count: 0,
ial2_total_auth_count: 1,
ial1_unique_users: 0,
ial2_unique_users: 1,
ial1_new_unique_users: 0,
ial2_new_unique_users: 1,
year_month: inside_iaa2.strftime('%Y%m'),
iaa_start_date: iaa2_range.begin.to_s,
iaa_end_date: iaa2_range.end.to_s,
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is correct now that this has another section... I think it may have been grouped inaccurately before

Comment on lines +128 to +104
sp_return_logs.requested_at::date BETWEEN %{range_start} AND %{range_end}
sp_return_logs.requested_at BETWEEN %{range_start} AND %{range_end}
Copy link
Contributor Author

@zachmargolis zachmargolis Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I think that this change may mean these are essentially unindexed queries now 😬 , this is our partial index:

t.index "((requested_at)::date), issuer", name: "index_sp_return_logs_on_requested_at_date_issuer", where: "(returned_at IS NOT NULL)"

cc @mitchellhenke @stevegsa @jmhooper if you have any thoughts on if I should try to add a "plain" index on (requested_at, issuer)? (and it's a huge table so I know it would fail during a normal deploy)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we need requested_at and issuer does nothing it can be dropped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the combined billing reports, we do break things out by issuer by month, so I think it does help for those?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update, thanks to some investigation help from @mitchellhenke, the ::date truncation does work like we expect so I undid this timestamp change part in c5b005e

Copy link
Contributor

@stevegsa stevegsa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great. just need the index and i think we are good to go. not sure why date isn't getting that equivalent full timestamp though. if it was this pr wouldn't be needed correct?

@zachmargolis zachmargolis merged commit 636c497 into main Sep 15, 2022
@zachmargolis zachmargolis deleted the margolis-plain-return-logs branch September 15, 2022 20:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants