R044: More robust address matching #33

Open
jpmckinney opened this issue Feb 23, 2023 · 1 comment
Labels
cmd:indicators Relating to the indicators command robustness

Comments


jpmckinney commented Feb 23, 2023

For example, dedupe (as I remember) applies address normalization (for at least US addresses). If we follow the same approach, we'd need to implement appropriate normalization for different jurisdictions. This strategy uses equality tests, but allows for some address components to be missing (e.g. "Main" vs "Main St"). I know Roberto Rocha recently evaluated a few different strategies when merging Canadian political donation datasets.
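A minimal sketch of the "equality tests, but tolerate missing components" idea, in Python. This is not dedupe's actual implementation: the `STREET_TYPES` table, the `parse` function, and the `same_address` function are all hypothetical, and the parsing assumes a simple "number name type" street line. Real normalization would need jurisdiction-specific rules.

```python
import re

# Hypothetical abbreviation table; a real one would be jurisdiction-specific.
STREET_TYPES = {
    "st": "street", "street": "street",
    "ave": "avenue", "avenue": "avenue",
    "rd": "road", "road": "road",
}


def parse(address):
    """Split a simple 'number name [type]' street line into normalized components."""
    # Lowercase, replace punctuation with spaces, and split on whitespace.
    tokens = re.sub(r"[^\w\s]", " ", address.lower()).split()
    parts = {"number": None, "name": None, "type": None}
    if tokens and tokens[0].isdigit():
        parts["number"] = tokens.pop(0)
    if tokens and tokens[-1] in STREET_TYPES:
        parts["type"] = STREET_TYPES[tokens.pop()]
    if tokens:
        parts["name"] = " ".join(tokens)
    return parts


def same_address(a, b):
    """Compare component by component; a component missing on either side is skipped."""
    pa, pb = parse(a), parse(b)
    shared = [k for k in pa if pa[k] is not None and pb[k] is not None]
    # Require at least one shared component so two empty strings do not "match".
    return bool(shared) and all(pa[k] == pb[k] for k in shared)
```

With this sketch, `same_address("Main", "Main St")` is true because the street type is simply absent on one side, while `same_address("1 Main St", "100 Main St")` is false because the street numbers are both present and differ.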

I think naive fuzzy matching will yield too many false positives (e.g. 1 Main St, Podunk, New York, USA 12345 and 100 Main St, ... are very close typographically, but are not at all the same address).
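To illustrate the concern, here is a small sketch using Python's standard `difflib.SequenceMatcher` as a stand-in for "naive fuzzy matching" (the issue does not name a specific algorithm). It assumes the elided remainder of the second address is identical to the first; the `similarity` helper is hypothetical.

```python
from difflib import SequenceMatcher

# Assumption: the part of the second address elided in the comment above
# is identical to the first address.
a = "1 Main St, Podunk, New York, USA 12345"
b = "100 Main St, Podunk, New York, USA 12345"


def similarity(x, y):
    """Character-level similarity ratio in the range [0, 1]."""
    return SequenceMatcher(None, x, y).ratio()


score = similarity(a, b)
print(f"{score:.3f}")
```

The score comes out above 0.9 even though the two strings are different addresses, so a threshold on a character-level ratio alone would likely flag them as a match.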

The first implementation could just do simple equality.

The metadata for this indicator should include a measure of similarity (percentage or otherwise).

@jpmckinney jpmckinney changed the title NF044: More robust address matching prepare: NF044: More robust address matching May 18, 2023
@jpmckinney jpmckinney changed the title prepare: NF044: More robust address matching NF044: More robust address matching May 18, 2023
@jpmckinney
Member Author

The prepare command could perhaps do address normalization.

@jpmckinney jpmckinney added the cmd:indicators Relating to the indicators command label May 18, 2023
@jpmckinney jpmckinney changed the title NF044: More robust address matching R044: More robust address matching May 30, 2023