[Quick Fix] Add wait time between web scrapping requests #269

Merged
merged 1 commit into from
Jan 18, 2024

Conversation

davidgxue
Contributor

@davidgxue davidgxue commented Jan 16, 2024

Description

  • Context: see ask_astro_load_astro_forum DAGs failing intermittently #262.
  • Note: this is a quick-turnaround fix. Long term, we will need to look into better retry logic and potentially write a wrapper for the requests.
  • Added wait time between calls so we don't get 429 errors when web scraping in our DAGs.

Technical Changes

  • Added a wait of 1 sec between requests at a few spots
  • Added a User-Agent to the request headers (see reason here)
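
For readers without the diff at hand, the two changes above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: it uses the standard library's `urllib.request` instead of `requests` so that it stays self-contained, and `USER_AGENT`, `WAIT_SECONDS`, `fetch`, and `fetch_all` are hypothetical names chosen for the example.

```python
import time
import urllib.request
from typing import Callable, Iterable, List

# Hypothetical values: the PR description does not show the actual
# User-Agent string or the exact call sites.
USER_AGENT = "example-scraper/0.1"
WAIT_SECONDS = 1.0


def fetch(url: str) -> bytes:
    """Fetch one URL, sending an explicit User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def fetch_all(
    urls: Iterable[str],
    get: Callable[[str], bytes] = fetch,
    wait_seconds: float = WAIT_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> List[bytes]:
    """Fetch URLs one at a time, pausing between consecutive requests."""
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first one.
            sleep(wait_seconds)
        pages.append(get(url))
    return pages
```

The `get` and `sleep` parameters are injectable only so the pacing can be checked without network access or real delays. A fixed pause lowers the request rate but does not react to a 429 once one arrives.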

Notes

@davidgxue davidgxue self-assigned this Jan 16, 2024
@davidgxue davidgxue marked this pull request as ready for review January 17, 2024 05:53
Collaborator

@Lee-W Lee-W left a comment

Should we use something like https://github.com/jd/tenacity? Not a blocker for now though.

@davidgxue davidgxue changed the title Add wait time between web scrapping requests [Quick Fix] Add wait time between web scrapping requests Jan 18, 2024
@davidgxue
Contributor Author

davidgxue commented Jan 18, 2024

Thanks @Lee-W for the suggestion, and yes, I agree. This is just a quick-turnaround fix for now so that prod is more stable. Long term, we should probably use a mix of tenacity and an implementation similar to https://github.com/alexwlchan/handling-http-429-with-tenacity. It is also probably better practice to create a requests client wrapper that automatically retries all of our requests.get calls. I will open an issue to track this.
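
As a rough illustration of the wrapper idea, here is a minimal hand-rolled sketch. It deliberately does not use tenacity, and it is not the implementation from the linked repository. The name `get_with_retry` and its parameters are hypothetical, and `get` is assumed to be any callable that returns an object with `status_code` and `headers`, as `requests.get` does.

```python
import time
from typing import Any, Callable


def get_with_retry(
    url: str,
    get: Callable[[str], Any],
    max_attempts: int = 5,
    base_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call get(url), retrying while the server answers 429 Too Many Requests."""
    for attempt in range(1, max_attempts + 1):
        response = get(url)
        if response.status_code != 429:
            return response
        if attempt == max_attempts:
            break
        # Prefer the server's Retry-After value when it is a whole number
        # of seconds; otherwise fall back to exponential backoff.
        retry_after = response.headers.get("Retry-After", "")
        if retry_after.isdigit():
            wait = float(retry_after)
        else:
            wait = base_wait * 2 ** (attempt - 1)
        sleep(wait)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts: {url}")
```

With `requests` installed, this could be called as `get_with_retry(url, get=requests.get)`. A tenacity-based version would express the same policy through its retry decorator rather than an explicit loop.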

@davidgxue davidgxue merged commit 837ed46 into main Jan 18, 2024
8 checks passed
@davidgxue davidgxue deleted the web_scrap_request_frequency branch January 18, 2024 01:24

Successfully merging this pull request may close these issues.

ask_astro_load_astro_forum DAGs failing intermittently
3 participants