Consider a less aggressive scraping strategy on Fridays #419
For reference: the crons themselves. @shrayshray, to decode the crons: we scrape all events at 0, 15, 30, and 45 minutes after the hour. We implemented this solution to handle issues arising with Legistar timestamps that did not change as expected, e.g., #310 and #267. We also aggressively scrape bills on Friday (again, to address timestamp issues, e.g., #328). I think we should consider a multi-faceted revision of our crons:
@shrayshray - can you let us know how this sounds, and how we should prioritize making changes?
@reginafcompton this sounds good. This should be high priority. Do you have any concerns about implementing it right away, as there will also be an agenda posted this Friday?
@shrayshray - I can implement this solution tomorrow. I'd rather test this with our upcoming agenda on Friday than wait until we have multiple agendas posted in April (which seems higher risk to me).
@reginafcompton sounds like a plan, thank you!
@shrayshray - we can close this issue via datamade/scrapers-us-municipal#32!
Summary of changes
On Fridays, from 2-10:00 pm CT ––
This minimizes the load we place on Legistar, which should prevent an abundance of 104s.
Recently, our aggressive event scrapes failed with 104 status codes for a period of several hours. Metro uploaded agendas during this time, but they did not show up on the site due to failures in the scraper (indeed, the problem the aggressive scrapes were meant to prevent!). Consider a less aggressive schedule. @reginafcompton suggests two scrapes per hour: one complete and one with a small window.
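To make the "two scrapes per hour" idea concrete, here is a rough Python sketch of how a scheduler might pick between a complete scrape and a small-window scrape. The minute offsets (0 and 30), the function names, and the `"full"` / `"windowed"` labels are all assumptions for illustration; the thread does not say which minutes would be used, and this is not the code in our crons.

```python
from datetime import datetime
from typing import List, Optional

# Hypothetical offsets: the thread only asks for two scrapes per hour,
# one complete and one with a small window. The minutes are assumed.
FULL_SCRAPE_MINUTE = 0
WINDOWED_SCRAPE_MINUTE = 30


def plan_scrape(now: datetime) -> Optional[str]:
    """Return which scrape (if any) should start at this minute."""
    if now.minute == FULL_SCRAPE_MINUTE:
        return "full"
    if now.minute == WINDOWED_SCRAPE_MINUTE:
        return "windowed"
    return None


def scrapes_in_hour(hour_start: datetime) -> List[str]:
    """List the scrapes that would start during the hour beginning at hour_start."""
    planned = []
    for minute in range(60):
        kind = plan_scrape(hour_start.replace(minute=minute))
        if kind is not None:
            planned.append(kind)
    return planned
```

Under these assumed offsets, `scrapes_in_hour` yields two runs per hour, compared with the four runs (at 0, 15, 30, and 45 minutes) in the current event crons.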