Metro scraper did not scrape EventAgendaStatusName #239

Open
reginafcompton opened this issue Jul 18, 2018 · 2 comments

Comments

@reginafcompton
Contributor

Yesterday Metro canceled two events:
http://webapi.legistar.com/v1/metro/events/1366
http://webapi.legistar.com/v1/metro/events/1367

Neither the DataMade windowed scrape nor the nightly full scrape captured this change.

I manually executed pupa update lametro events window=7, which captured the change. Let's determine what happened.

@reginafcompton
Contributor Author

reginafcompton commented Oct 17, 2018

This occurred again, and the cause of the issue is slightly more apparent. This morning, Metro canceled the following events:

http://webapi.legistar.com/v1/metro/events/1378
http://webapi.legistar.com/v1/metro/events/1379

However, EventLastModifiedUtc still reads 2018-06-20.... That is, the Legistar API did not update this timestamp when EventAgendaStatusName changed.
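To make the failure mode concrete, here is a minimal sketch of a windowed filter, assuming the windowed scrape selects events by comparing EventLastModifiedUtc against a cutoff. The function name `in_scrape_window`, the sample record, and the status value are hypothetical. Only the 2018-06-20 date and the event ID come from this thread, and the time of day is a placeholder.

```python
from datetime import datetime, timedelta


def in_scrape_window(event, now, window_days):
    """Return True if a filter on EventLastModifiedUtc would select this event.

    Hypothetical helper: it only mirrors the assumed behavior of a windowed
    scrape, not the actual scraper code.
    """
    cutoff = now - timedelta(days=window_days)
    return event["EventLastModifiedUtc"] >= cutoff


# Illustrative record, not a real API response. The status string is a
# placeholder, and midnight stands in for the unknown time of day.
canceled_event = {
    "EventId": 1378,
    "EventAgendaStatusName": "Canceled",
    "EventLastModifiedUtc": datetime(2018, 6, 20),
}

# Day the cancellation was noticed, per this comment.
now = datetime(2018, 10, 17)

# The stale timestamp keeps the event outside a 7-day window.
missed = not in_scrape_window(canceled_event, now, window_days=7)
```

Under these assumptions, any window shorter than the gap between the stale timestamp and the real edit skips the event, no matter how often the windowed scrape runs.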

I manually executed pupa update lametro events (which scrapes everything), and it pulled in the status changes.

I've reported a similar issue here: Metro-Records/la-metro-councilmatic#267

Thoughts

  1. The Legistar EventLastModifiedUtc timestamp does not behave the way our scrapers expect: it does not always change when an event is edited. As a result, we may need to scrape events more aggressively. Scraping all events once or twice per hour seems reasonable. Currently, we scrape all events once per day, except on Fridays, when we scrape all events every 15 minutes. That schedule came out of a related issue, "scrape missed location change of special board meeting: June 15, 2018" (Metro-Records/la-metro-councilmatic#310). We saw a similar issue with bills.
  2. Does Metro have any control over this timestamp in the API? Can they ensure that the timestamp changes when they update an event, or does Legistar fully automate this?
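If we do scrape all events more often, one way to spot edits without trusting the timestamp is to compare a fingerprint of each event's fields against the one stored from the previous scrape. The sketch below is only an illustration of that idea, not something the scrapers do today. The helpers `fingerprint` and `changed_event_ids`, the in-memory dict of stored fingerprints, and the sample field values are all hypothetical.

```python
import hashlib
import json

# The timestamp is excluded because it is the field we cannot rely on.
IGNORED_FIELDS = {"EventLastModifiedUtc"}


def fingerprint(event):
    """Hash every field of an event dict except the ignored ones."""
    payload = {k: v for k, v in event.items() if k not in IGNORED_FIELDS}
    # sort_keys makes the hash independent of key order; default=str
    # covers values that json cannot serialize natively.
    encoded = json.dumps(payload, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def changed_event_ids(fetched_events, known_fingerprints):
    """Return IDs of events that are new or whose content differs from the stored fingerprint."""
    changed = []
    for event in fetched_events:
        if known_fingerprints.get(event["EventId"]) != fingerprint(event):
            changed.append(event["EventId"])
    return changed


# Placeholder values for illustration only.
before = {
    "EventId": 1378,
    "EventAgendaStatusName": "Final",
    "EventLastModifiedUtc": "unchanged",
}
known = {before["EventId"]: fingerprint(before)}

# Status changes while the timestamp stays the same.
after = dict(before, EventAgendaStatusName="Canceled")
ids_to_update = changed_event_ids([after], known)
```

This still requires fetching every event, so it does not reduce API traffic. It would only limit how much downstream work each full scrape triggers.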

@reginafcompton
Contributor Author

We also saw this behavior with the media links. Metro staff added the audio URLs, but the timestamps did not change. Incidentally, our nightly scrape would have grabbed the URLs.
