Agenda items not preserved when two scrapers collect same event #239
Comments
I think this is a dup of #169.
I don't think they're dups, James. This one describes unexpected behavior where rows from … The linked issue seems to be about a larger problem: not doing more scraping work than required, by reading from the database during the scrape process. I don't think its solution would resolve this issue's concern.
Hmm, perhaps.
Actually, I re-read the other issue and it does seem relevant, as does this comment: the importer can't determine whether a scrape with no agenda items lacks them because it's an intentionally incomplete object, or because those agenda items were removed in the real world. paultag's comments in that Slack conversation were too brief to read much into. It seems to me that this is behaving as designed. I think the resolution would be for you to adjust Pupa's logic for your special case. Basically, you'd have to skip this logic if the data for the related model (in this case … I suppose a generic solution would be to have a flag on the …
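A hypothetical sketch of the kind of flag described above, not Pupa's actual importer: the scraped object records which related collections it did not collect, and the importer leaves those stored rows alone. `ScrapedEvent`, `skip_related`, `import_event`, and the in-memory `db` are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical stand-in for the database: event identifier -> stored agenda items.
db: Dict[str, List[str]] = {}


@dataclass
class ScrapedEvent:
    event_id: str
    agenda: List[str] = field(default_factory=list)
    # Hypothetical flag: names of related collections this scrape did NOT collect.
    skip_related: Set[str] = field(default_factory=set)


def import_event(event: ScrapedEvent) -> None:
    if "agenda" in event.skip_related:
        # Intentionally incomplete object: keep any stored agenda items,
        # only making sure the event itself has an entry.
        db.setdefault(event.event_id, [])
        return
    # Otherwise the scraped agenda is authoritative, so an empty list
    # really does mean "these agenda items were removed".
    db[event.event_id] = list(event.agenda)


# Hypothetical usage: an incremental scrape with agenda items,
# then a full-schedule scrape that declares it skipped agendas.
import_event(ScrapedEvent("event-1", ["Call to order", "Budget vote"]))
import_event(ScrapedEvent("event-1", skip_related={"agenda"}))
print(db["event-1"])  # ['Call to order', 'Budget vote']
```

The flag makes the scraper state its intent explicitly, which resolves the ambiguity the importer otherwise faces between "not collected" and "removed".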
Ah, OK. I'll admit I was optimistic after the convo with paultag that this was a "simple" regression and that the feature already existed, but I realize from the comment you linked that you're totally correct. Thanks for humouring me, James!
Reticketed from opencivicdata/scrapers-us-municipal#111
I have one scraper that scrapes all events scheduled in the legislative session, and another that collects details about recent and upcoming events (for which agendas are just being published).
So essentially:

- `events-incremental`: a quick nightly scrape that builds events with agendas around the current date, and
- `events-full`: another scraper for the full schedule, which doesn't touch agendas.

I am seeing that if `events-full` runs after `events-incremental`, it blows away all the agendas.

A Slack conversation with @paultag led me to believe that this shouldn't be happening if no agenda items are added or touched.
However, I've just built a simple scraper and reproduced the wiping behaviour:
https://github.com/patcon/scrapers-us-municipal/tree/agenda-wipe-bug-demo
If I run the code below, the agenda items appear and then disappear after the second scrape runs:
Can someone confirm that this is unexpected behaviour, and perhaps suggest any ideas on where the regression may have happened? Thanks! :)