
Update long-running report queries to reset connection#7104

Merged
zachmargolis merged 3 commits into main from margolis-try-to-recover-big-report-queries on Oct 6, 2022

Conversation

@zachmargolis
Contributor

🎫 Ticket

(none)

🛠 Summary of changes

Why: Occasionally we see PG::UnableToSend errors, at which point the connection stops receiving queries. NewRelic link

In an attempt to recover the job and continue, we call #reconnect! on the connection. However, that resets our session-specific timeouts, so there's a bit of rejiggering to let the connection-timeout setup from the reports be passed into the db query classes.
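A minimal plain-Ruby sketch of the recovery pattern described above (the class and method names are illustrative stand-ins, not the PR's actual code): on a send failure, reset the connection and re-apply the session timeout before retrying, since reconnecting wipes session-level settings.

```ruby
# Sketch only: a stub connection standing in for the real PG/ActiveRecord one.
class UnableToSendError < StandardError; end

class FakeConnection
  attr_reader :timeout, :reconnected

  def initialize
    @broken = true
    @reconnected = false
    @timeout = nil
  end

  def execute(sql)
    raise UnableToSendError, "no connection to the server" if @broken
    "ok: #{sql}"
  end

  def reconnect!
    # A real reconnect! drops session state, including any statement timeout
    @broken = false
    @reconnected = true
    @timeout = nil
  end

  def set_timeout(ms)
    @timeout = ms
  end
end

def run_with_recovery(conn, sql, timeout_ms:)
  conn.set_timeout(timeout_ms)
  conn.execute(sql)
rescue UnableToSendError
  conn.reconnect!
  conn.set_timeout(timeout_ms) # must re-apply: reconnect! reset the session
  conn.execute(sql)
end
```

The key line is the second `set_timeout`: without it, the retried query would run with the adapter's default timeout instead of the report's long one.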

📜 Testing Plan

  • Try patching these changes in a Rails console and see if they work

# We use good_job's concurrency features to cancel "extra" or duplicative runs of the same job
discard_on GoodJob::ActiveJobExtensions::Concurrency::ConcurrencyExceededError

def self.transaction_with_timeout(rails_env = Rails.env)
Contributor Author

rubocop got mad that this was after the `private` keyword, because static (class-level) methods are public regardless of `private`

Time.zone.now.end_of_day
end
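To illustrate the RuboCop point above: in Ruby, `private` only hides instance methods, so a `def self.` method stays public wherever it is defined. The sketch below is hypothetical (a recording stub connection, not the PR's implementation), showing a class-level `transaction_with_timeout`-style wrapper that sets a transaction-scoped statement timeout.

```ruby
# Hypothetical sketch, not the PR's actual code.
class ReportQueryBase
  def self.transaction_with_timeout(conn, timeout_ms)
    # SET LOCAL scopes the timeout to the surrounding transaction in Postgres;
    # this stub connection just records the statements it receives.
    conn.execute("BEGIN")
    conn.execute("SET LOCAL statement_timeout = #{timeout_ms}")
    result = yield conn
    conn.execute("COMMIT")
    result
  end

  private

  # An instance method below `private` is hidden from callers...
  def internal_helper; end
  # ...but a `def self.` method defined here would still be public.
end

class RecordingConnection
  attr_reader :statements

  def initialize
    @statements = []
  end

  def execute(sql)
    @statements << sql
  end
end
```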

def report_timeout
Contributor Author

we never overrode this timeout in any instances, so I just removed the method by inlining the value

[skip changelog]
Contributor

@mitchellhenke left a comment

Would switching from streaming results back to not streaming change anything regarding this?

@zachmargolis
Contributor Author

Would switching from streaming results back to not streaming change anything regarding this?

you know, at this point it's worth a shot. The streaming queries use lower-level connection methods, so they're more likely to mess up the connections. I can try switching to an all-at-once query and see if things retry better
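To make the streaming-vs-all-at-once distinction concrete, here's an illustrative plain-Ruby stub (not the actual ActiveRecord/PG streaming API): a streamed read hands rows to the caller while the query is still in flight, so a mid-stream failure leaves partial results consumed and the connection in an unknown state, while an all-at-once fetch either returns everything or fails before any rows are handed out.

```ruby
# Sketch only: a stub result source, with an optional injected failure point.
class FakeResultSource
  def initialize(rows, fail_at: nil)
    @rows = rows
    @fail_at = fail_at
  end

  # Streaming: rows are yielded one by one while the "query" is mid-flight,
  # so a failure can strike after some rows were already consumed.
  def stream_each
    @rows.each_with_index do |row, i|
      raise IOError, "connection lost mid-stream" if i == @fail_at
      yield row
    end
  end

  # All-at-once: the full result is materialized before anything is returned.
  def fetch_all
    @rows.each_index { |i| raise IOError, "connection lost" if i == @fail_at }
    @rows.dup
  end
end
```

Under this model, retrying an all-at-once query is simpler: there is no half-consumed stream to reconcile, only a clean failure to rerun.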

@zachmargolis
Contributor Author

Would switching from streaming results back to not streaming change anything regarding this?

Ok, so it took almost 1.5 hours, but when I ran the changes in this PR manually in the console, they worked. I'll merge this now because it at least works. I'll have a follow-up PR where I remove streaming and see if it makes a difference in runtime

@zachmargolis zachmargolis merged commit dc57be3 into main Oct 6, 2022
@zachmargolis zachmargolis deleted the margolis-try-to-recover-big-report-queries branch October 6, 2022 23:32
jskinne3 pushed a commit that referenced this pull request Oct 12, 2022

* Move transaction_with_timeout to be static

[skip changelog]