-
Notifications
You must be signed in to change notification settings - Fork 292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #302: Read address attributes from nominatim #303
Conversation
As I said in #302, this is not really a fix for Mannheim. Still, it's worth thinking about using the addr:* tag information now that Nominatim stores it properly. But if we go for that it should be a bit more complete. A bit of background: there are two special addr tags: addr:street and addr:place. These are meant to create a connection between an OSM object and the house number in that address. If addr:street is there, then we have a standard address with street and housenumber. If addr:place is there then the house numbers are counted within some larger area. Normally, you would only use one or the other with an address (although there are, as always, exceptions. Look for 'conscription numbers'). Nominatim only uses these two addr: tags to find a 'base object' to each address and then computes the rest of the address parts by looking in which administrative areas the base object lies. The result is a kind of textual description of the location. Photon currently also uses only this information to create its response. This location description is often the same as the postal address but not always. Here is where the other addr:* tags come into play. They normally should contain the real postal address of a place. So the content might be slightly different as to what Nominatim computes. There are pros and cons to displaying postal address or location descriptions in the result. For photon you might actually end up with slightly better results using the addr:* tags because it does not show the entire hierarchy of admin boundaries and sometimes grabs the wrong one. The main disadvantage of the addr: tags is that in contrast to Nominatim's address information, there is no translation of place names. To make a long story short: I think, if you start looking into addr:street, you should at least also look at addr:city, maybe addr:suburb, too. To get translations we might do something clever and use the content of the addr: tag to find the best match in Nominatim's address hierarchy. But I'm getting carried away. |
Thank you, @lonvia for your very valuable insights.
I added suburb and neighbourhood as fields. Currently, I'm just overwriting street, city, suburb, neighbourhood, postcode with the addr: attributes.
Replacements with Levenshtein <= 2 usually are caused by typos (e.g Schloßplatz vs. Schlossplatz vs. Schlosssplatz), hyphenation diffs (e.g. Fränkisch-Crumbach vs Fränkisch Crumbach), case diffs (Am Großen Wald vs. Am großen Wald). Replacements with Levenshtein > 2 are caused by completly different names (e.g. Tairnbach -> Mühlhausen) or in a few cases added/missing location appendices/prepositions (e.g. Hirschhorn (Neckar) vs. Hirschhorn, Pariser Weg vs. Am Pariser Weg) or different abbreviations (e.g. Sankt-Leoner-Straße vs. St. Leoner Straße). Postal codes seem not to be replaced, I assume Nominatim is already using addr:postcode in case it exists(?).
Currently, I do not yet use addr:* to pick the most probable address hierarchy match, and I'm not yet evaluating the addr:place tag. Instead of just replacing the default name with attr:* value, as I do now, another option would be to set them first and allow overwriting existing names with address hierarchy values, if the address hierarchy default name matches exactly. |
Yes, you can leave that out. As for the other tags: addr:street and addr:city are uncontested and addr:suburb is in moderately wide use as well. Beyond that it depends on the country, so I'd leave it at that for the moment and remove addr:neighbourhood. |
* Complete doc from nominatim address information. | ||
*/ | ||
public void completeFromAddress() { | ||
String addressStreet = address != null ? address.get("street") : null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check address == null once at the begin of the function and return early.
this.neighbourhood = new HashMap<>(); | ||
} | ||
setOrReplace(addressNeighbourhood, this.neighbourhood, "neighbourhood"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code duplication can be avoided if you move the lookup into setOrReplace as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed some duplication but not as much as I would have liked as the assignment to the instance fields (this.locality
...) must happen outside setOrReplace
or am I overlooking something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking along the lines of having a function which can be used like this:
this.locality = extractAddress(this.locality, address, "neighbourhood")
Nevermind for now. It is good as is.
At-least with Indian addresses, this doesn't seem to be the case. A number of addresses on our street weren't returning suburb or neighbourhood with this PR.
If this logic makes sense, will send an updated PR. |
Signed-off-by: Holger Bruch <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that Nominatim produces more reliable results for suburb and neighbourhood, I'd lean toward merging this. The geocodejson standard defines 'district' and 'locality' fields, which I have used in Nominatim for those two levels in between street and city. Can we converge on this? That would mean renaming 'suburb' to 'district' and 'neighbourhood' to 'locality'.
} | ||
// TODO admin? | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the other one, you should now use the rank address and get better results for that:
suburb: [17, 21]
neighbourhood: [22, 25]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, that was misleading. I meant inclusive ranges here. So it should be:
return 22 <= rankAddress && rankAddress < 26;
and
return 17 <= rankAddress && rankAddress < 22;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed that one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stylistically you should be returning the condition directly, that is
...
return "place".equals(osmKey) && "suburb".equals(osmValue);
...
Further I would suggest, as a rule, always adding javadoc to new methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, GitHub doesn't update the view of the lines you have commented on. It actually already does what you suggest, @simonpoole.
And yes, I can add the javadoc.
if (log.isDebugEnabled()) { | ||
log.debug("Replacing "+ field +" name '"+existingName+"' with '"+ name+ "' for osmId #" + osmId); | ||
} | ||
// TODO: do we need to add former name to context or better not, as it might have been wrong? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would do that. The address tag might have a typo and having the original improves the result.
- Rename suburb to district - Rename neighbourhood to locality - Keep original, pre-replacement name - Use address rank to determine if something is a district/locality
I incorporated you review feedback. I'd love it if you gave this another review. Thanks! |
if (log.isDebugEnabled()) { | ||
log.debug("Replacing "+ field +" name '"+existingName+"' with '"+ name+ "' for osmId #" + osmId); | ||
} | ||
namesMap.put("formerName", existingName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was more thinking of adding it to the context hashset. If you have a look at completePlace()
function in NominatimConnector.java
, you'll see that this is what happens when multiple city-like address parts show up.
8aaaf33
to
cc8211a
Compare
@lonvia @simonpoole I've incorporated another round of review feedback. Please give it another look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for patiently addressing all the review comments. And apologies to @hbruch for taking two years for the review. ;) This is good to go now.
If you feel like continuing a bit in another PR: the last address bit that is still missing is county
. The address ranks for it would be 5 <= address rank <= 9
, the osm tag is addr:county
.
this.neighbourhood = new HashMap<>(); | ||
} | ||
setOrReplace(addressNeighbourhood, this.neighbourhood, "neighbourhood"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking along the lines of having a function which can be used like this:
this.locality = extractAddress(this.locality, address, "neighbourhood")
Nevermind for now. It is good as is.
With this PR, the address key/values are read from nominatim and if it contains a street attribute, this is copied to via
street.put("name", addressStreet)
.It fixes tests added with geocoders/geocoder-tester#38