Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #302: Read address attributes from nominatim #303

Merged
merged 10 commits into from
Jun 7, 2020

Conversation

hbruch
Copy link
Collaborator

@hbruch hbruch commented Jan 24, 2018

With this PR, the address key/values are read from nominatim and if it contains a street attribute, this is copied to via street.put("name", addressStreet).

It fixes tests added with geocoders/geocoder-tester#38

@hbruch hbruch changed the title Fix #302: Read attributes from nominatim Fix #302: Read address attributes from nominatim Jan 24, 2018
@lonvia
Copy link
Collaborator

lonvia commented Jan 25, 2018

As I said in #302, this is not really a fix for Mannheim. Still, it's worth thinking about using the addr:* tag information now that Nominatim stores it properly. But if we go for that it should be a bit more complete.

A bit of background: there are two special addr tags: addr:street and addr:place. These are meant to create a connection between an OSM object and the house number in that address. If addr:street is there, then we have a standard address with street and housenumber. If addr:place is there then the house numbers are counted within some larger area. Normally, you would only use one or the other with an address (although there are, as always, exceptions. Look for 'conscription numbers').

Nominatim only uses these two addr: tags to find a 'base object' to each address and then computes the rest of the address parts by looking in which administrative areas the base object lies. The result is a kind of textual description of the location. Photon currently also uses only this information to create its response.

This location description is often the same as the postal address but not always. Here is where the other addr:* tags come into play. They normally should contain the real postal address of a place. So the content might be slightly different as to what Nominatim computes.

There are pros and cons to displaying postal address or location descriptions in the result. For photon you might actually end up with slightly better results using the addr:* tags because it does not show the entire hierarchy of admin boundaries and sometimes grabs the wrong one. The main disadvantage of the addr: tags is that in contrast to Nominatim's address information, there is no translation of place names.

To make a long story short: I think, if you start looking into addr:street, you should at least also look at addr:city, maybe addr:suburb, too. To get translations we might do something clever and use the content of the addr: tag to find the best match in Nominatim's address hierarchy. But I'm getting carried away.

@hbruch
Copy link
Collaborator Author

hbruch commented Jan 31, 2018

Thank you, @lonvia for your very valuable insights.

I think, if you start looking into addr:street, you should at least also look at addr:city, maybe addr:suburb, too.

I added suburb and neighbourhood as fields.

Currently, I'm just overwriting street, city, suburb, neighbourhood, postcode with the addr: attributes.
Some results I gathered from a test run around Mannheim with ~500.000 addresses:

Type Count
city name replacements (total) 11681
city replacements, formerly not set 22
city replacements, with Levenshtein <= 2 111
city replacements, with Levenshtein > 2 11548
street name replacements (total) 3977
street replacements, formerly not set 2106
street replacements, with Levenshtein <= 2 853
street replacements, with Levenshtein > 2 1018

Replacements with Levenshtein <= 2 usually are caused by typos (e.g Schloßplatz vs. Schlossplatz vs. Schlosssplatz), hyphenation diffs (e.g. Fränkisch-Crumbach vs Fränkisch Crumbach), case diffs (Am Großen Wald vs. Am großen Wald).

Replacements with Levenshtein > 2 are caused by completly different names (e.g. Tairnbach -> Mühlhausen) or in a few cases added/missing location appendices/prepositions (e.g. Hirschhorn (Neckar) vs. Hirschhorn, Pariser Weg vs. Am Pariser Weg) or different abbreviations (e.g. Sankt-Leoner-Straße vs. St. Leoner Straße).

Postal codes seem not to be replaced, I assume Nominatim is already using addr:postcode in case it exists(?).

To get translations we might do something clever and use the content of the addr: tag to find the best match in Nominatim's address hierarchy.

Currently, I do not yet use addr:* to pick the most probable address hierarchy match, and I'm not yet evaluating the addr:place tag. Instead of just replacing the default name with attr:* value, as I do now, another option would be to set them first and allow overwriting existing names with address hierarchy values, if the address hierarchy default name matches exactly.

@lonvia
Copy link
Collaborator

lonvia commented Feb 10, 2018

Postal codes seem not to be replaced, I assume Nominatim is already using addr:postcode in case it exists(?).

Yes, you can leave that out.

As for the other tags: addr:street and addr:city are uncontested and addr:suburb is in moderately wide use as well. Beyond that it depends on the country, so I'd leave it at that for the moment and remove addr:neighbourhood.

* Complete doc from nominatim address information.
*/
public void completeFromAddress() {
String addressStreet = address != null ? address.get("street") : null;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check address == null once at the begin of the function and return early.

this.neighbourhood = new HashMap<>();
}
setOrReplace(addressNeighbourhood, this.neighbourhood, "neighbourhood");
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code duplication can be avoided if you move the lookup into setOrReplace as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed some duplication but not as much as I would have liked as the assignment to the instance fields (this.locality...) must happen outside setOrReplace or am I overlooking something?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking along the lines of having a function which can be used like this:

this.locality = extractAddress(this.locality, address, "neighbourhood")

Nevermind for now. It is good as is.

@gopi-ar
Copy link

gopi-ar commented Jul 16, 2018

addr:suburb is in moderately wide use as well. Beyond that it depends on the country, so I'd leave it at that for the moment and remove addr:neighbourhood.

At-least with Indian addresses, this doesn't seem to be the case. A number of addresses on our street weren't returning suburb or neighbourhood with this PR. admin_level seems reliable to some extent and all the linked place_ids in nominatim seemed to return correct admin_levels and were also marked as boundary. I added the following logic (based on admin_level info from here to @hbruch 's code and our office address in Hyderabad now shows the actual locality & suburbs.

    public boolean isNeighbourhood() {
        if ("place".equals(osmKey) && "neighbourhood".equals(osmValue)) {
            return true;
        }
        // TODO need better logic for this, not sure if it applies everywhere
        if (adminLevel != null && adminLevel == 11 && "boundary".equals(osmKey) && "administrative".equals(osmValue)) {
            return true;
        }
        return false;
    }

    public boolean isSuburb() {
        if ("place".equals(osmKey) && "suburb".equals(osmValue)) {
            return true;
        }

        // TODO need better logic for this, not sure if it applies everywhere
        if (adminLevel != null && (adminLevel == 9 || adminLevel == 10) && "boundary".equals(osmKey) && "administrative".equals(osmValue)) {
            return true;
        }
        return false;
    }

If this logic makes sense, will send an updated PR.

@hbruch hbruch mentioned this pull request Oct 17, 2018
Copy link
Collaborator

@lonvia lonvia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that Nominatim produces more reliable results for suburb and neighbourhood, I'd lean toward merging this. The geocodejson standard defines 'district' and 'locality' fields, which I have used in Nominatim for those two levels in between street and city. Can we converge on this? That would mean renaming 'suburb' to 'district' and 'neighbourhood' to 'locality'.

}
// TODO admin?
return false;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the other one, you should now use the rank address and get better results for that:

suburb: [17, 21]
neighbourhood: [22, 25]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, that was misleading. I meant inclusive ranges here. So it should be:

return 22 <= rankAddress && rankAddress < 26;

and

return 17 <= rankAddress && rankAddress < 22;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed that one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stylistically you should be returning the condition directly, that is
...
return "place".equals(osmKey) && "suburb".equals(osmValue);
...

Further I would suggest, as a rule, always adding javadoc to new methods.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, GitHub doesn't update the view of the lines you have commented on. It actually already does what you suggest, @simonpoole.

And yes, I can add the javadoc.

if (log.isDebugEnabled()) {
log.debug("Replacing "+ field +" name '"+existingName+"' with '"+ name+ "' for osmId #" + osmId);
}
// TODO: do we need to add former name to context or better not, as it might have been wrong?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would do that. The address tag might have a typo and having the original improves the result.

- Rename suburb to district
- Rename neighbourhood to locality
- Keep original, pre-replacement name
- Use address rank to determine if something is a district/locality
@leonardehrenfried
Copy link
Contributor

@lonvia

I incorporated you review feedback. I'd love it if you gave this another review. Thanks!

if (log.isDebugEnabled()) {
log.debug("Replacing "+ field +" name '"+existingName+"' with '"+ name+ "' for osmId #" + osmId);
}
namesMap.put("formerName", existingName);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was more thinking of adding it to the context hashset. If you have a look at completePlace() function in NominatimConnector.java, you'll see that this is what happens when multiple city-like address parts show up.

@leonardehrenfried
Copy link
Contributor

@lonvia @simonpoole I've incorporated another round of review feedback. Please give it another look.

Copy link
Collaborator

@lonvia lonvia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for patiently addressing all the review comments. And apologies to @hbruch for taking two years for the review. ;) This is good to go now.

If you feel like continuing a bit in another PR: the last address bit that is still missing is county. The address ranks for it would be 5 <= address rank <= 9, the osm tag is addr:county.

this.neighbourhood = new HashMap<>();
}
setOrReplace(addressNeighbourhood, this.neighbourhood, "neighbourhood");
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking along the lines of having a function which can be used like this:

this.locality = extractAddress(this.locality, address, "neighbourhood")

Nevermind for now. It is good as is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants