[WIP] Reconciliation Service #92

Open: 7 commits to merge into master


fgregg (Contributor) commented Jun 14, 2017

This OCDEP proposes a design spec and governance model for an opencivicdata entity resolution service. This will allow publishers of civic data to use the same OCD IDs while staying loosely coupled.

fgregg changed the title from "Reconciliation Service" to "[WIP] Reconciliation Service" on Jun 14, 2017

fgregg (Contributor, Author) commented Jul 25, 2017

Large outstanding question: how to handle existing IDs

I've been reflecting on Julie A. McMurry et al.'s excellent paper on identifiers, which advises "only creating your own identifiers for new knowledge." While this sounds sensible, I'm not sure how to square it with our requirement that our identifiers be OCD identifiers.

If we do try to heed the advice not to create new, redundant identifiers, then who will be responsible for synchronizing with these other authorities?

If that's too much of a dependency, should we allow searching by external IDs?
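One possible way to support search by external IDs, without taking on responsibility for synchronizing with other authorities, is a crosswalk from external identifiers to OCD IDs. The Python sketch below is hypothetical throughout: the `ExternalIdIndex` class, keying by `(scheme, value)` pairs, and the `"wikidata"` scheme name are illustrations only, and the `ocd-person/<uuid>` shape of newly minted IDs is an assumption about the ID format.

```python
import uuid
from typing import Dict, Optional, Tuple


class ExternalIdIndex:
    """Hypothetical crosswalk from external identifiers to OCD IDs.

    Keys are (scheme, value) pairs such as ("wikidata", "Q42"); the scheme
    names are illustrative and not part of any OCD specification.
    """

    def __init__(self) -> None:
        self._index: Dict[Tuple[str, str], str] = {}

    def link(self, scheme: str, value: str, ocd_id: str) -> None:
        key = (scheme, value)
        existing = self._index.get(key)
        if existing is not None and existing != ocd_id:
            # One external ID must never resolve to two different OCD IDs.
            raise ValueError(f"{scheme}:{value} is already linked to {existing}")
        self._index[key] = ocd_id  # re-linking to the same ID is a no-op

    def lookup(self, scheme: str, value: str) -> Optional[str]:
        # Returns None when the external ID has never been linked.
        return self._index.get((scheme, value))


def new_person_id() -> str:
    # Assumed ID shape: "ocd-person/" followed by a random UUID.
    return f"ocd-person/{uuid.uuid4()}"
```

A lookup that returns `None` would then fall through to the ordinary record search, so the index only short-circuits cases where a publisher already holds a linked external ID.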

fgregg (Contributor, Author) commented Jul 25, 2017

Some notes

  • Should we require some sort of proof that a search was made before creating a new ID? Something like this may be necessary for training.
  • If we allow a delete-record method, we need to decide what happens when an ID's last record is deleted.
  • Should records have a source field?
  • We need to encourage putting records after matching.
  • Should the search method return anything beyond the IDs, such as the nearest matching record?
  • We should have a "What this is not" section, i.e., this is not an attempt to integrate information from various datasets or to provide canonical information.
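The notes above raise several API questions at once: a source field on records, search returning more than bare IDs, proof of a prior search before minting an ID, and what deleting an ID's last record means. The minimal in-memory Python sketch below makes one possible set of answers concrete. Everything in it is hypothetical: the `ReconciliationService` class and its `search`/`create`/`put`/`delete` methods, a single-use search token as the "proof", `difflib` name similarity with a 0.8 threshold standing in for real matching, the `ocd-person/<uuid>` ID shape, and the policy of retiring (never reissuing) an ID whose last record is deleted. `put` reflects one reading of "putting after matching": attaching a matched record to its existing ID.

```python
import difflib
import uuid
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Record:
    name: str
    source: str  # which publisher supplied the record (the "source field" question)


@dataclass
class Match:
    ocd_id: str
    score: float
    nearest: Record  # best-matching stored record under this ID, not just the ID


class ReconciliationService:
    """Hypothetical in-memory sketch; all names and policies are assumptions."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self.records: Dict[str, List[Record]] = {}
        self.retired: Set[str] = set()
        self._search_tokens: Set[str] = set()

    def search(self, record: Record) -> Tuple[str, List[Match]]:
        """Return a search token plus candidate IDs, best match first."""
        matches = []
        for ocd_id, recs in self.records.items():
            scored = [
                (difflib.SequenceMatcher(None, record.name.lower(), r.name.lower()).ratio(), r)
                for r in recs
            ]
            score, nearest = max(scored, key=lambda pair: pair[0])
            if score >= self.threshold:
                matches.append(Match(ocd_id, score, nearest))
        matches.sort(key=lambda m: m.score, reverse=True)
        token = str(uuid.uuid4())
        self._search_tokens.add(token)  # the "proof" that a search was made
        return token, matches

    def create(self, record: Record, search_token: str) -> str:
        """Mint a new ID, but only with a token from a prior search."""
        if search_token not in self._search_tokens:
            raise PermissionError("create() requires a token from search()")
        self._search_tokens.discard(search_token)  # each token mints one ID
        ocd_id = f"ocd-person/{uuid.uuid4()}"
        self.records[ocd_id] = [record]
        return ocd_id

    def put(self, ocd_id: str, record: Record) -> None:
        """Attach a matched record to an existing, non-retired ID."""
        if ocd_id in self.retired or ocd_id not in self.records:
            raise KeyError(ocd_id)
        self.records[ocd_id].append(record)

    def delete(self, ocd_id: str, record: Record) -> None:
        """Remove one record; an ID left with no records is retired."""
        self.records[ocd_id].remove(record)
        if not self.records[ocd_id]:
            del self.records[ocd_id]
            self.retired.add(ocd_id)  # never reissued or reattached
```

Retiring an emptied ID, rather than reusing it, keeps an OCD ID that a publisher has already stored from later pointing at a different entity; keeping the ID alive with zero records is the other obvious choice the note leaves open.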

fgregg (Contributor, Author) commented Dec 5, 2017

This is ready for a preliminary review. There are a number of governance and incentive questions I still have, but I figure those can wait until we have worked on an implementation.
