Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCD model migration #240

Merged
merged 67 commits into from
Jun 27, 2019
Merged

OCD model migration #240

merged 67 commits into from
Jun 27, 2019

Conversation

fgregg
Copy link
Member

@fgregg fgregg commented Apr 15, 2019

This PR changes django-councilmatic to use opencivicdata models.

In concert with datamade/chi-councilmatic#252, these code changes can largely reproduce the Chicago Councilmatic site.

TODO:

@fgregg
Copy link
Member Author

fgregg commented May 13, 2019

@hancush this is ready for me to indoctrinate you.

@hancush
Copy link
Member

hancush commented Jun 12, 2019

@fgregg making good progress here.

can you say more about:

  • add order_by to meta class on EventAgendaItem
  • figure out how we want to handle updating the index

?

@hancush
Copy link
Member

hancush commented Jun 17, 2019

re: add order_by to meta class on EventAgendaItem, there's already clean_agenda_items which orders and dedupes agenda items.

@property
def clean_agenda_items(self):
agenda_items = self.agenda_items.order_by('order').all()
agenda_deduped = []
descriptions_seen = []
for a in agenda_items:
if a.description not in descriptions_seen:
descriptions_seen.append(a.description)
agenda_deduped.append(a)
return agenda_deduped

we already use this property to display agenda items. should be able to continue using with minor changes, e.g., i think it's event.agenda to get the agenda items (not event.agenda_items).

@fgregg
Copy link
Member Author

fgregg commented Jun 17, 2019

if you add ordering to the model class, it will, by default return querysets in that order. This is just an opportunity to clean up some code, and not required for this migration.

https://docs.djangoproject.com/en/2.2/ref/models/options/#ordering

@hancush
Copy link
Member

hancush commented Jun 18, 2019

Change log

Description

tl;dr - Remove the need for extensive and error-prone ETL by basing the Councilmatic app off Open Civic Data models, i.e., a jurisdiction-specific database fed by the scrapers, rather than maintaining a remote database containing a set of similar, Councilmatic-specific models.

There is a Python package containing Django versions of the OCD models. The relevant ones live in opencivicdata.core.models and opencivicdata.legislative.models.

Further reading:

Approach

Extend Open Civic Data models

Subclassing

Leverage multi-table inheritance to add additional fields to OCD models. The primary use case in the refactor is adding slugs to first-class models – Person, Event, Bill, and Organization - and adding a headshot to Person.

Since the scrapers create OCD objects, we added signals to each of the subclassed models to create the related Councilmatic model on save. Read more on Django signals. »

We have also added a management command to import headshot files.

Proxying

We make extensive use of proxy models to add custom managers and additional properties and methods to classes for use in Councilmatic code. This is an issue because the proxied models return OCD objects, not Councilmatic objects. To get around that, we used the hacky but effective django-proxy-overrides library to force models to return related objects as Councilmatic models, as needed.

To customize the model of related objects, proxy a model like opencivicdata.legislative.models.BillAction -> councilmatic_core.models.BillAction, and override one or more related object attributes with a ProxyForeignKey.

A brief summary of alternative approaches and why we ultimately did not choose to use them can be found here.

Related changes

  • Update attribute references
  • Handle changes in data type, e.g., cast date strings to datetime objects
  • Refactor use of obsolete raw SQL, e.g., management commands

Upgrade to Django 2

  • Update some module import paths
  • staticfiles -> static template tag
  • Build URLs via static template tag

Repair and extend the tests

  • Centralize fixtures and add a management command to regenerate them from a local instance DB
  • Align individual fixtures with the model change

Remaining changes

Copy link
Member Author

@fgregg fgregg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall looks great! there are some thing would be nice to change. I can't mark this as a request change.

@@ -120,7 +120,7 @@ def description(self, obj):
return "Recent sponsored bills from " + obj.name + "."

def items(self, person):
sponsored_bills = [s.bill for s in person.primary_sponsorships.order_by('-_bill__last_action_date')[:10]]
sponsored_bills = sorted((s.bill for s in person.primary_sponsorships), key=lambda x: x.last_action_date)[:10]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be ideal to push this sorting to the query.

last_date = timezone.now() - timedelta(36500)

if last_date:
last_date = last_date - last_date.utcoffset()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's going on here? a comment on the method would be good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'll be honest with you, i have no idea

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think that an easier question to answer is: how should the index handle bills without a last action date?

it seems like this is placing them way, way in the past. is this for the legislative session filter? if so, then it would have the effect of not returning bills with this funny date for any of the current legislative sessions. i think that we could achieve this by simply returning None if there is no last_action_date, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think that this has to do with https://github.com/datamade/chi-councilmatic/blob/master/chicago/search_indexes.py#L20-L26

but we should just handle that kind of changes in that code, and we override this method anyway. I think we can drop this method altogether.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done! rebuilt the chicago index to make sure it didn't break anything, and it looks ok to me!


def convert_document_to_plaintext(self):
import textract
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this import here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so you can run the tests w/o needing to install textract, which is a pita. https://textract.readthedocs.io/en/stable/installation.html / #193 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a comment to that effect.

from django.db import connection


class Command(BaseCommand):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!


bill = ProxyForeignKey(Bill, related_name='sponsorships', on_delete=models.CASCADE)
person = ProxyForeignKey(Person, null=True, on_delete=models.SET_NULL)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An organization can be a sponsor.

Copy link
Member Author

@fgregg fgregg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approve

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants