-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revise bill relations filter #112
Conversation
legistar/bills.py
Outdated
) | ||
|
||
# Assumes that there will not be more than 10 versions of a relation. | ||
seen_relations = deque([], maxlen=10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think a set object would be safer and clearer here. is there a reason to strongly prefer a deque?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
relations
is a list of dictionaries. Dictionaries are not hashable, so relations
can't be cast to a set. Also, relation objects referring to the same related bill are not identical duplicates:
<GranicusMatterRelation>
<MatterRelationFlag>1</MatterRelationFlag>
<MatterRelationGuid>1B541337-1A65-4D93-A6B3-BCFE4F5DA75C</MatterRelationGuid>
<MatterRelationId>1195</MatterRelationId>
<MatterRelationLastModifiedUtc>2017-09-20T22:38:45.3533333</MatterRelationLastModifiedUtc>
<MatterRelationMatterId>4333</MatterRelationMatterId>
<MatterRelationRowVersion>AAAAAACJaxw=</MatterRelationRowVersion>
</GranicusMatterRelation>
<GranicusMatterRelation>
<MatterRelationFlag>2</MatterRelationFlag>
<MatterRelationGuid>7863769D-6A6B-46AB-8F86-47CD6D0E7C33</MatterRelationGuid>
<MatterRelationId>1206</MatterRelationId>
<MatterRelationLastModifiedUtc>2017-10-09T22:27:14.51</MatterRelationLastModifiedUtc>
<MatterRelationMatterId>4333</MatterRelationMatterId>
<MatterRelationRowVersion>AAAAAACOQLs=</MatterRelationRowVersion>
</GranicusMatterRelation>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's true, but you are just checking relation_id which is hashable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. I misunderstood your suggestion. Done.
Description
In #47, we assumed that current relations would share the highest flag value. Per Metro-Records/la-metro-councilmatic#669 (comment), this is not the case. This PR updates the
relations
method to return a deduplicated list of relations use the most recent version of each relation, rather than a deduplicated list of relations sharing the max value of the relation flag across the entire set. It also exposes a method that can be overridden in downstream scraper instances to customize how, if at all, relations should be filtered during a scrape.Connects Metro-Records/la-metro-councilmatic#669.
Notes
We aren't 100% sure how the relation flag value is set (Metro is looking into it), but we do know that it isn't necessarily meaningful across all relations, only within versions of the same related bill.
Testing instructions
pip install -e /path/to/python-legistar-scraper
pupa update lametro --scrape bills matter_ids=4455,6008
4455
contains duplicate relations: http://webapi.legistar.com/v1/metro/matters/4455/relations6008
contains relations that do not share a common max version: http://webapi.legistar.com/v1/metro/matters/6008/relations_data/lametro/bill*
and confirm that therelated_bills
array in both files does not contain duplicates but does contain a relation object for each distinct bill in the API call.