Add a command to detect duplicates #2

Open
hampelm opened this issue Apr 29, 2017 · 0 comments

Comments

@hampelm
Member

hampelm commented Apr 29, 2017

Once we have the CSV, we should check whether any of the grants are obvious duplicates.

Command: npm run check knight
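Here is a minimal sketch of how the script behind that command might read the funder name. It assumes a hypothetical `check.js` wired up in package.json as `"check": "node check.js"`. Current npm versions pass positional arguments after the script name through to the script, so `knight` would arrive as `process.argv[2]`.

```javascript
// check.js (hypothetical file name), run via `npm run check <funder>`.
// Assumes package.json has a script like: "check": "node check.js"

// process.argv[0] is the node executable and process.argv[1] the script path,
// so the first user-supplied argument (the funder) is at index 2.
function parseFunder(argv) {
  const funder = argv[2];
  if (!funder) {
    throw new Error("Usage: npm run check <funder>");
  }
  return funder;
}

// Usage inside the script:
//   const funder = parseFunder(process.argv); // "knight"
```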

To start, we can just grab the list of all grants made by the funder (since we know that ID is good); that way we don't have to depend on #1.

Heuristics:

  • Same grant amount as existing grant
  • Same grant start and/or end date as an existing grant
  • Same or similar name (maybe using a text-similarity function)
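The heuristics above could be combined into a simple score, one point per heuristic that fires. This is a sketch under assumptions. The grant shape `{ name, amount, startDate, endDate }` is hypothetical. The name check uses the Dice coefficient on character bigrams, which is just one possible text-similarity function. The 0.8 threshold is an arbitrary starting point.

```javascript
// Hypothetical grant shape: { name, amount, startDate, endDate },
// with dates as ISO strings (e.g. "2016-01-01") or null.

// Character bigrams of a normalized name (lowercase, alphanumerics and single spaces).
function bigrams(text) {
  const s = text.toLowerCase().replace(/[^a-z0-9 ]/g, "").replace(/\s+/g, " ").trim();
  const grams = [];
  for (let i = 0; i < s.length - 1; i++) {
    grams.push(s.slice(i, i + 2));
  }
  return grams;
}

// Dice coefficient over bigram multisets: 1 for identical names, 0 for no shared bigrams.
function nameSimilarity(a, b) {
  const x = bigrams(a);
  const y = bigrams(b);
  if (x.length + y.length === 0) return 0;
  const counts = new Map();
  for (const g of x) counts.set(g, (counts.get(g) || 0) + 1);
  let overlap = 0;
  for (const g of y) {
    const c = counts.get(g) || 0;
    if (c > 0) {
      overlap++;
      counts.set(g, c - 1); // each bigram in x can match only once
    }
  }
  return (2 * overlap) / (x.length + y.length);
}

// Score 0-3: same amount, same start or end date, similar name.
function duplicateScore(grant, existing, nameThreshold = 0.8) {
  let score = 0;
  if (grant.amount != null && grant.amount === existing.amount) score++;
  const sameStart = grant.startDate && grant.startDate === existing.startDate;
  const sameEnd = grant.endDate && grant.endDate === existing.endDate;
  if (sameStart || sameEnd) score++;
  if (nameSimilarity(grant.name, existing.name) >= nameThreshold) score++;
  return score;
}
```

Bigram similarity tolerates small spelling and punctuation differences without requiring an external library. A proper string-distance package could be swapped in later.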

If a match is found, we should add a duplicate column, maybe with a score (0, 1, 2, or similar), and a link to the potential duplicate(s).
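One way the output columns might be produced is sketched below. The column names `duplicate_score` and `duplicate_links`, the `url` field on existing grants, and the pluggable `scoreFn` are all hypothetical. Any scoring heuristic returning a number can be passed in. CSV quoting follows the usual RFC 4180 convention.

```javascript
// Quote a CSV field only when it contains a comma, quote, or line break;
// embedded quotes are doubled.
function csvField(value) {
  const s = value == null ? "" : String(value);
  return /[",\n\r]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Compare each new row with every existing grant. Matches at or above minScore
// contribute a link; duplicate_score is the best matching score (0 if none).
function annotateDuplicates(rows, existing, scoreFn, minScore = 1) {
  return rows.map((row) => {
    let best = 0;
    const links = [];
    for (const other of existing) {
      const score = scoreFn(row, other);
      if (score >= minScore) {
        links.push(other.url);
        best = Math.max(best, score);
      }
    }
    return { ...row, duplicate_score: best, duplicate_links: links.join(" ") };
  });
}

// Serialize rows to CSV text using the given column order.
function toCsv(rows, columns) {
  const lines = [columns.map(csvField).join(",")];
  for (const row of rows) {
    lines.push(columns.map((c) => csvField(row[c])).join(","));
  }
  return lines.join("\n");
}
```

Keeping the best score rather than a count of matches means a row that resembles several existing grants still reports how strong its closest match is, while the links column lists all candidates for manual review.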
