
allow_no_match when resolving members of an Organization #253

Closed
jamesturk opened this issue Feb 19, 2017 · 6 comments

Comments

@jamesturk
Member

jamesturk commented Feb 19, 2017

Took a swing at writing an Organization scraper here:

https://github.com/openstates/openstates/blob/pupa-fl/scrapers/fl/committees.py

Traceback (most recent call last):
  File "/Users/james/.virtualenvs/pupa/bin/pupa", line 9, in <module>
    load_entry_point('pupa==0.6.0', 'console_scripts', 'pupa')()
  File "/Users/james/code/openstates/pupa/pupa/cli/__main__.py", line 64, in main
    subcommands[args.subcommand].handle(args, other)
  File "/Users/james/code/openstates/pupa/pupa/cli/commands/update.py", line 266, in handle
    report['import'] = self.do_import(juris, args)
  File "/Users/james/code/openstates/pupa/pupa/cli/commands/update.py", line 190, in do_import
    report.update(membership_importer.import_directory(datadir))
  File "/Users/james/code/openstates/pupa/pupa/importers/base.py", line 188, in import_directory
    return self.import_data(json_stream())
  File "/Users/james/code/openstates/pupa/pupa/importers/base.py", line 225, in import_data
    obj_id, what = self.import_item(data)
  File "/Users/james/code/openstates/pupa/pupa/importers/base.py", line 245, in import_item
    data = self.prepare_for_db(data)
  File "/Users/james/code/openstates/pupa/pupa/importers/memberships.py", line 45, in prepare_for_db
    data['person_id'] = self.person_importer.resolve_json_id(data['person_id'])
  File "/Users/james/code/openstates/pupa/pupa/importers/base.py", line 163, in resolve_json_id
    raise UnresolvedIdError(errmsg)
pupa.exceptions.UnresolvedIdError: cannot resolve pseudo id to Person: ~{"name": "Keith Perry"}

The issue is that, just as with bill sponsorships and similar data, there are going to be members listed whose names don't exactly match their Person and who will require future resolution.

I wanted to see if anyone has written a committee scraper yet, and if so what people thought of various options:

  • force people to create new Person objects within Org scrapers, similarly to how we create new Organizations within People scrapers if the natural way to scrape committees is from a person's detail page
  • relax the constraint, so if a person isn't found person_id is set to None on the membership

Or maybe I'm missing something else obvious that we can do here.

Thoughts?
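A minimal sketch of the second option (relaxing the constraint), in Python since pupa is Python. `ToyPersonResolver`, its `resolve` method and the local `UnresolvedIdError` are hypothetical stand-ins, not pupa's importer API. The only detail carried over from the traceback above is the assumption that a pseudo id is `~` followed by a JSON object of lookup fields.

```python
import json
from typing import Dict, Optional


class UnresolvedIdError(Exception):
    """Local stand-in for the error in the traceback (not imported from pupa)."""


class ToyPersonResolver:
    """Hypothetical resolver: maps a pseudo id like ~{"name": ...} to a person id."""

    def __init__(self, people_by_name: Dict[str, str]):
        # Assumed shape: exact person name -> already-imported person id.
        self.people_by_name = people_by_name

    def resolve(self, pseudo_id: str, allow_no_match: bool = False) -> Optional[str]:
        # Assumption: a pseudo id is "~" followed by a JSON object of lookup fields.
        spec = json.loads(pseudo_id[1:])
        person_id = self.people_by_name.get(spec["name"])
        if person_id is not None:
            return person_id
        if allow_no_match:
            # Option 2: keep the membership but leave it unlinked for later resolution.
            return None
        raise UnresolvedIdError(f"cannot resolve pseudo id to Person: {pseudo_id}")
```

With `allow_no_match=True`, a membership for an unmatched name such as `~{"name": "Keith Perry"}` would import with `person_id` set to `None` instead of aborting the whole import, while the default keeps the current strict behavior.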

@jamesturk
Member Author

probably most relevant to @jpmckinney @fgregg

@fgregg
Contributor

fgregg commented Feb 20, 2017

I just wrote one. Unfortunately, I surfaced a bug that might relate to your current problem. #254

In any case, I prefer this option:

  • force people to create new Person objects within Org scrapers, similarly to how we create new Organizations within People scrapers if the natural way to scrape committees is from a person's detail page

I often want to get a listing of all the people associated with an organization. If we don't create Person objects for them, the alternative would be for me to check the memberships of every organization and sub-organization.
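To make the cost of that alternative concrete, here is a rough sketch of collecting everyone tied to an organization by walking its sub-organizations and their memberships. The function name and the dict/tuple data shapes are assumptions for illustration, not pupa or Open Civic Data models. Memberships with no resolved person (`None`) are skipped, so such members would not show up in the result.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


def people_in_org_tree(
    root_org: str,
    parent_of: Dict[str, Optional[str]],
    memberships: Iterable[Tuple[Optional[str], str]],
) -> Set[str]:
    """Collect person ids with a membership in root_org or any descendant org.

    parent_of: org id -> parent org id (None for a top-level org)  [assumed shape]
    memberships: (person_id, org_id) pairs; person_id may be None if unresolved
    """
    children = defaultdict(list)
    for org, parent in parent_of.items():
        if parent is not None:
            children[parent].append(org)

    # Gather root_org plus every sub-organization using an explicit stack.
    orgs: Set[str] = set()
    stack = [root_org]
    while stack:
        org = stack.pop()
        if org not in orgs:
            orgs.add(org)
            stack.extend(children[org])

    # Unresolved memberships (person_id None) cannot contribute a person.
    return {person for person, org in memberships if org in orgs and person is not None}
```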

@jpmckinney
Member

Here's @patcon's committee scraper for Toronto: https://github.com/opencivicdata/scrapers-ca/blob/master/ca_on_toronto/committees.py As @jamesturk encountered, the people need to have already been scraped. If their names differed, we added other_names to the people scraper.
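A sketch of why adding other_names helps: a lookup succeeds if the name scraped from the committee page matches either a person's primary name or any alternate name. The `match_person` helper and the record shape (loosely Popolo-style, with `other_names` as a list of `{"name": ...}` objects) are assumptions, not the Toronto people scraper's actual code.

```python
from typing import List, Optional


def match_person(name: str, people: List[dict]) -> Optional[str]:
    """Return the id of the one person whose name or other_names equals `name`.

    Assumed record shape: {"id": ..., "name": ..., "other_names": [{"name": ...}, ...]}
    """
    matches = [
        person["id"]
        for person in people
        if person["name"] == name
        or any(alt["name"] == name for alt in person.get("other_names", []))
    ]
    # Only accept an unambiguous match; zero or several matches need human review.
    return matches[0] if len(matches) == 1 else None
```

Returning `None` for ambiguous matches is a deliberate choice in this sketch: guessing between two people would silently attach a membership to the wrong person.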

I guess a higher level question is: Within a jurisdiction, should all the scrapers collectively create:

  1. one object per real thing in the world
  2. any number of objects per real thing in the world

If we only run one scraper at a time, then there will always only be one object per thing by the time we import, so we're fine in any case (unless a single scraper incorrectly creates multiple objects for one thing). If we run multiple scrapers and then perform a single import, then we could have multiple objects per thing, which raises a DuplicateItemError.
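The multiple-scraper failure mode can be sketched like this: if a people scraper and a committee scraper both create the same person in one import run, a uniqueness check trips. The local `DuplicateItemError`, the `(scraper, name)` input shape and keying on the name alone are simplifications for illustration; pupa's own duplicate detection is not reproduced here.

```python
from typing import Dict, Iterable, Tuple


class DuplicateItemError(Exception):
    """Local stand-in named after pupa's error; not imported from pupa."""


def import_people(scraped: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Import (scraper_name, person_name) pairs, allowing one object per person.

    Keying on the name alone is a simplification for illustration.
    """
    created_by: Dict[str, str] = {}
    for scraper, name in scraped:
        if name in created_by:
            raise DuplicateItemError(
                f"{name!r} created by both {created_by[name]!r} and {scraper!r}"
            )
        created_by[name] = scraper
    return created_by
```

Running one scraper per import means only one source ever feeds this check, so the error can only fire if that single scraper emits the same person twice.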

DuplicateItemError is useful to me, so I prefer not to relax that. I could probably live with running a single scraper at a time. But if we want to be able to run multiple scrapers at a time, then I think we couldn't follow your first option and would have to follow your second option (unless what you meant to say is that if your org scraper also scrapes people, then you should not have a person scraper that creates people).

@jpmckinney changed the title from "writing Organization scrapers is painful" to "allow_no_match when resolving members of an Organization" on Feb 20, 2017
@fgregg
Contributor

fgregg commented Feb 20, 2017

I am persuaded by @jpmckinney.

@jamesturk
Member Author

jamesturk commented Feb 21, 2017

If we are going to relax person_id resolution in orgs, we'd have to change the models too, since right now memberships require a person_id.

I assume we'd do something akin to what we do in PersonVote, add a person_name field. I can start to put together a draft PR for this.
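A rough sketch of the model change described here, written as a plain Python dataclass rather than an actual database model: `person_id` becomes optional and `person_name` always keeps the scraped name, so a later pass can fill in the link. `organization_id`, `role`, `is_resolved` and `resolve_later` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Membership:
    organization_id: str
    person_name: str                 # always stored, exactly as scraped
    person_id: Optional[str] = None  # None until the name is resolved
    role: str = "member"             # hypothetical extra field

    @property
    def is_resolved(self) -> bool:
        return self.person_id is not None


def resolve_later(memberships: List[Membership], name_to_id: Dict[str, str]) -> int:
    """Fill in person_id for unresolved memberships whose name now matches.

    Returns the number of memberships that were newly linked.
    """
    fixed = 0
    for m in memberships:
        if not m.is_resolved and m.person_name in name_to_id:
            m.person_id = name_to_id[m.person_name]
            fixed += 1
    return fixed
```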

@jpmckinney
Member

Sounds reasonable to me!
