Fix #302: Read address attributes from nominatim #303

hbruch · 2018-01-24T23:46:47Z

With this PR, the address key/values are read from Nominatim; if they contain a street attribute, it is copied to the street name via street.put("name", addressStreet).

It fixes tests added with geocoders/geocoder-tester#38

lonvia · 2018-01-25T19:36:45Z

As I said in #302, this is not really a fix for Mannheim. Still, it's worth thinking about using the addr:* tag information now that Nominatim stores it properly. But if we go for that it should be a bit more complete.

A bit of background: there are two special addr tags: addr:street and addr:place. These are meant to create a connection between an OSM object and the house number in that address. If addr:street is there, then we have a standard address with street and housenumber. If addr:place is there then the house numbers are counted within some larger area. Normally, you would only use one or the other with an address (although there are, as always, exceptions. Look for 'conscription numbers').

Nominatim only uses these two addr: tags to find a 'base object' for each address and then computes the rest of the address parts by looking at which administrative areas the base object lies in. The result is a kind of textual description of the location. Photon currently also uses only this information to create its response.

This location description is often the same as the postal address, but not always. This is where the other addr:* tags come into play. They should normally contain the real postal address of a place, so the content might differ slightly from what Nominatim computes.

There are pros and cons to displaying postal address or location descriptions in the result. For photon you might actually end up with slightly better results using the addr:* tags because it does not show the entire hierarchy of admin boundaries and sometimes grabs the wrong one. The main disadvantage of the addr: tags is that in contrast to Nominatim's address information, there is no translation of place names.

To make a long story short: I think, if you start looking into addr:street, you should at least also look at addr:city, maybe addr:suburb, too. To get translations we might do something clever and use the content of the addr: tag to find the best match in Nominatim's address hierarchy. But I'm getting carried away.

hbruch · 2018-01-31T10:04:31Z

Thank you, @lonvia for your very valuable insights.

> I think, if you start looking into addr:street, you should at least also look at addr:city, maybe addr:suburb, too.

I added suburb and neighbourhood as fields.

Currently, I'm just overwriting street, city, suburb, neighbourhood, postcode with the addr: attributes.
Some results I gathered from a test run around Mannheim with ~500,000 addresses:

Type	Count
city name replacements (total)	11681
city replacements, formerly not set	22
city replacements, with Levenshtein <= 2	111
city replacements, with Levenshtein > 2	11548
street name replacements (total)	3977
street replacements, formerly not set	2106
street replacements, with Levenshtein <= 2	853
street replacements, with Levenshtein > 2	1018

Replacements with Levenshtein <= 2 are usually caused by typos (e.g. Schloßplatz vs. Schlossplatz vs. Schlosssplatz), hyphenation differences (e.g. Fränkisch-Crumbach vs. Fränkisch Crumbach), or case differences (e.g. Am Großen Wald vs. Am großen Wald).

Replacements with Levenshtein > 2 are caused by completely different names (e.g. Tairnbach -> Mühlhausen) or, in a few cases, by added or missing location suffixes and prepositions (e.g. Hirschhorn (Neckar) vs. Hirschhorn, Pariser Weg vs. Am Pariser Weg) or by different abbreviations (e.g. Sankt-Leoner-Straße vs. St. Leoner Straße).
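The bucketing used in the table can be sketched roughly as follows. This is only an illustration of the classification, not the script that produced the numbers: the class ReplacementStats, the enum Kind and both method names are hypothetical, and the distance is the plain character-level Levenshtein distance.

```java
/** Hypothetical helper (not part of Photon) that buckets a name replacement like the table above. */
final class ReplacementStats {

    enum Kind { FORMERLY_NOT_SET, UNCHANGED, LEVENSHTEIN_LE_2, LEVENSHTEIN_GT_2 }

    /** Character-level Levenshtein distance, computed with two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning the empty prefix of a into b[0..j) needs j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into the empty string needs i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows so that prev always holds the last finished row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Compares the name computed by Nominatim with the value of the addr:* tag. */
    static Kind classify(String existingName, String addrName) {
        if (existingName == null || existingName.isEmpty()) {
            return Kind.FORMERLY_NOT_SET;
        }
        if (existingName.equals(addrName)) {
            return Kind.UNCHANGED; // nothing is replaced, so it is not counted
        }
        return levenshtein(existingName, addrName) <= 2 ? Kind.LEVENSHTEIN_LE_2 : Kind.LEVENSHTEIN_GT_2;
    }
}
```

With this definition "Pariser Weg" vs. "Am Pariser Weg" has distance 3 (three inserted characters), which is why such pairs end up in the "> 2" bucket even though they are clearly the same street.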

Postal codes seem not to be replaced, I assume Nominatim is already using addr:postcode in case it exists(?).

> To get translations we might do something clever and use the content of the addr: tag to find the best match in Nominatim's address hierarchy.

Currently, I do not yet use addr:* to pick the most probable address hierarchy match, and I'm not yet evaluating the addr:place tag. Instead of just replacing the default name with the addr:* value, as I do now, another option would be to set the addr:* values first and allow the address hierarchy values to overwrite them only if the hierarchy's default name matches exactly.
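A minimal sketch of that exact-match option, to make the idea concrete. Everything here is an assumption for illustration: AddressNameResolver and resolve are hypothetical names, and the name maps are assumed to use keys such as "name" and "name:en".

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Photon) for matching an addr:* value against the address hierarchy. */
final class AddressNameResolver {

    /**
     * Picks the name map for one address field.
     *
     * @param addrValue           value of the addr:* tag (e.g. addr:city), may be null
     * @param hierarchyCandidates name maps of the places in the computed address hierarchy
     * @param fallback            name map Photon would use without the addr:* tag, may be null
     */
    static Map<String, String> resolve(String addrValue,
                                       List<Map<String, String>> hierarchyCandidates,
                                       Map<String, String> fallback) {
        if (addrValue == null || addrValue.isEmpty()) {
            return fallback; // no addr:* tag, keep the computed value
        }
        for (Map<String, String> candidate : hierarchyCandidates) {
            if (addrValue.equals(candidate.get("name"))) {
                return candidate; // exact match on the default name: keep the translations
            }
        }
        // No match in the hierarchy: use the tag value as is, without translations.
        Map<String, String> onlyDefaultName = new HashMap<>();
        onlyDefaultName.put("name", addrValue);
        return onlyDefaultName;
    }
}
```

An exact comparison is deliberately strict. The typo and hyphenation cases from the table above would fall through to the untranslated branch, so a real implementation would probably need some normalisation first.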

lonvia · 2018-02-10T17:34:50Z

> Postal codes seem not to be replaced, I assume Nominatim is already using addr:postcode in case it exists(?).

Yes, you can leave that out.

As for the other tags: addr:street and addr:city are uncontested and addr:suburb is in moderately wide use as well. Beyond that it depends on the country, so I'd leave it at that for the moment and remove addr:neighbourhood.

lonvia · 2018-02-10T17:37:36Z

src/main/java/de/komoot/photon/PhotonDoc.java

+     * Complete doc from nominatim address information.
+     */
+    public void completeFromAddress() {
+        String addressStreet = address != null ? address.get("street") : null;


Check address == null once at the beginning of the function and return early.

lonvia · 2018-02-10T17:38:16Z

src/main/java/de/komoot/photon/PhotonDoc.java

+                this.neighbourhood = new HashMap<>();
+            }
+            setOrReplace(addressNeighbourhood, this.neighbourhood, "neighbourhood");
+        }


This code duplication can be avoided if you move the lookup into setOrReplace as well.

I've removed some duplication but not as much as I would have liked as the assignment to the instance fields (this.locality...) must happen outside setOrReplace or am I overlooking something?

I was thinking along the lines of having a function which can be used like this:

this.locality = extractAddress(this.locality, address, "neighbourhood")

Never mind for now. It is good as is.
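For reference, a function usable in the way suggested above could look roughly like this. It is a sketch only: the PR does not contain extractAddress, the class name AddressFields is made up, and the convention that the default name lives under the "name" key is taken from the street.put("name", addressStreet) call in the PR description.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper sketching the extractAddress idea from the review comment. */
final class AddressFields {

    /**
     * Returns the name map to assign to a field.
     *
     * @param current existing name map of the field, may be null
     * @param address address key/values read from Nominatim, may be null
     * @param key     address key to look up, e.g. "street" or "neighbourhood"
     * @return current, unchanged, if the address has no usable entry for key; otherwise
     *         current (or a new map if current is null) with "name" set to the address value
     */
    static Map<String, String> extractAddress(Map<String, String> current,
                                              Map<String, String> address,
                                              String key) {
        if (address == null) {
            return current;
        }
        String value = address.get(key);
        if (value == null || value.isEmpty()) {
            return current;
        }
        // Note: an existing map is modified in place, so its translations are kept.
        Map<String, String> result = current != null ? current : new HashMap<>();
        result.put("name", value);
        return result;
    }
}
```

Because the function returns the map instead of assigning a field, the null check, the lookup and the map creation all live in one place, and the caller only has to do the assignment.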

src/main/java/de/komoot/photon/nominatim/NominatimConnector.java

gopi-ar · 2018-07-16T06:43:20Z

> addr:suburb is in moderately wide use as well. Beyond that it depends on the country, so I'd leave it at that for the moment and remove addr:neighbourhood.

At least with Indian addresses, this doesn't seem to be the case. A number of addresses on our street weren't returning suburb or neighbourhood with this PR. admin_level seems reliable to some extent: all the linked place_ids in Nominatim returned correct admin_levels and were also marked as boundary. I added the following logic (based on admin_level info from here) to @hbruch's code, and our office address in Hyderabad now shows the actual locality and suburbs.

    public boolean isNeighbourhood() {
        if ("place".equals(osmKey) && "neighbourhood".equals(osmValue)) {
            return true;
        }
        // TODO need better logic for this, not sure if it applies everywhere
        if (adminLevel != null && adminLevel == 11 && "boundary".equals(osmKey) && "administrative".equals(osmValue)) {
            return true;
        }
        return false;
    }

    public boolean isSuburb() {
        if ("place".equals(osmKey) && "suburb".equals(osmValue)) {
            return true;
        }

        // TODO need better logic for this, not sure if it applies everywhere
        if (adminLevel != null && (adminLevel == 9 || adminLevel == 10) && "boundary".equals(osmKey) && "administrative".equals(osmValue)) {
            return true;
        }
        return false;
    }

If this logic makes sense, will send an updated PR.

lonvia

Now that Nominatim produces more reliable results for suburb and neighbourhood, I'd lean toward merging this. The geocodejson standard defines 'district' and 'locality' fields, which I have used in Nominatim for those two levels in between street and city. Can we converge on this? That would mean renaming 'suburb' to 'district' and 'neighbourhood' to 'locality'.

lonvia · 2020-05-25T20:53:42Z

src/main/java/de/komoot/photon/nominatim/model/AddressRow.java

+        }
+        // TODO admin?
+        return false;
+    }


Similar to the other one, you should now use the rank address and get better results for that:

suburb: [17, 21]
neighbourhood: [22, 25]

Sorry, that was misleading. I meant inclusive ranges here. So it should be:

return 22 <= rankAddress && rankAddress < 26;

and

return 17 <= rankAddress && rankAddress < 22;

Fixed that one.

Stylistically you should be returning the condition directly, that is
...
return "place".equals(osmKey) && "suburb".equals(osmValue);
...

Further I would suggest, as a rule, always adding javadoc to new methods.

Hmm, GitHub doesn't update the view of the lines you have commented on. It actually already does what you suggest, @simonpoole.

And yes, I can add the javadoc.
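Put together, the rank-based checks from this thread amount to something like the sketch below. AddressRowSketch is a stand-in class for illustration and not the real AddressRow. The method names follow the proposed rename (suburb to district, neighbourhood to locality), and the bounds are the inclusive ranges [17, 21] and [22, 25] given above.

```java
/** Illustrative stand-in for an address row that only knows its Nominatim address rank. */
final class AddressRowSketch {

    private final int rankAddress;

    AddressRowSketch(int rankAddress) {
        this.rankAddress = rankAddress;
    }

    /** True for suburb-like places, i.e. address ranks 17 to 21 inclusive. */
    boolean isDistrict() {
        return 17 <= rankAddress && rankAddress < 22;
    }

    /** True for neighbourhood-like places, i.e. address ranks 22 to 25 inclusive. */
    boolean isLocality() {
        return 22 <= rankAddress && rankAddress < 26;
    }
}
```

Both methods return the condition directly, and the half-open comparisons (< 22, < 26) express the inclusive upper bounds 21 and 25, so the two ranges neither overlap nor leave a gap.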

src/main/java/de/komoot/photon/nominatim/NominatimConnector.java

lonvia · 2020-05-25T21:00:30Z

src/main/java/de/komoot/photon/PhotonDoc.java

+            if (log.isDebugEnabled()) {
+                log.debug("Replacing "+ field +" name '"+existingName+"' with '"+ name+ "' for osmId #" + osmId);
+            }
+            // TODO: do we need to add former name to context or better not, as it might have been wrong?


I would do that. The address tag might have a typo and having the original improves the result.

leonardehrenfried · 2020-05-29T19:38:14Z

@lonvia

I incorporated your review feedback. I'd love it if you gave this another review. Thanks!

lonvia · 2020-05-30T08:49:44Z

src/main/java/de/komoot/photon/PhotonDoc.java

+            if (log.isDebugEnabled()) {
+                log.debug("Replacing "+ field +" name '"+existingName+"' with '"+ name+ "' for osmId #" + osmId);
+            }
+            namesMap.put("formerName", existingName);


I was thinking more of adding it to the context hashset. If you have a look at the completePlace() function in NominatimConnector.java, you'll see that this is what happens when multiple city-like address parts show up.
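One way to read this suggestion, as a sketch under assumptions rather than the code that was eventually merged: the context is modelled as a set of name maps, and ContextSketch and its members are hypothetical names used only for this example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative stand-in for a document that keeps replaced names searchable via its context. */
final class ContextSketch {

    /** Assumed shape of the context: a set of name maps of related places. */
    final Set<Map<String, String>> context = new HashSet<>();

    /** Sets the default name from an addr:* tag and remembers the name it replaces. */
    void setOrReplace(String name, Map<String, String> namesMap) {
        if (name == null) {
            return;
        }
        String existingName = namesMap.get("name");
        if (existingName != null && !existingName.equals(name)) {
            // Store a copy: namesMap is modified below, and changing a map that is
            // already inside a HashSet would corrupt the set's hashing.
            context.add(new HashMap<>(namesMap));
        }
        namesMap.put("name", name);
    }
}
```

Compared with putting the old value under a "formerName" key, this keeps the output fields clean while the original name can still contribute to matching, which helps when the addr:* tag itself contains the typo.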

leonardehrenfried · 2020-06-03T13:17:00Z

@lonvia @simonpoole I've incorporated another round of review feedback. Please give it another look.

lonvia

Thanks for patiently addressing all the review comments. And apologies to @hbruch for taking two years for the review. ;) This is good to go now.

If you feel like continuing a bit in another PR: the last address bit that is still missing is county. The address ranks for it would be 5 <= address rank <= 9, the osm tag is addr:county.

hbruch changed the title from "Fix #302: Read attributes from nominatim" to "Fix #302: Read address attributes from nominatim" on Jan 24, 2018

hbruch force-pushed the issue_302 branch from 5bec83e to 6845bbc Compare January 31, 2018 09:05

lonvia mentioned this pull request Feb 10, 2018

addr:place objects not found osm-search/Nominatim#912

Closed

lonvia reviewed Feb 10, 2018

hbruch mentioned this pull request Oct 17, 2018

Add suburb information #161

Closed

hbruch and others added 3 commits May 24, 2020 08:32

Read address attributes from nominatim

49c2125

Add neighbourhood and suburb

04d386e

Signed-off-by: Holger Bruch <[email protected]>

Early return if no address

48c3eff

hbruch force-pushed the issue_302 branch from 6845bbc to 48c3eff Compare May 25, 2020 04:40

lonvia reviewed May 25, 2020

Incorporate review feedback

10dafda

- Rename suburb to district
- Rename neighbourhood to locality
- Keep original, pre-replacement name
- Use address rank to determine if something is a district/locality

Incorporate more feedback

ae5cf03

lonvia reviewed May 30, 2020

Add existing name to context

cc8211a

leonardehrenfried force-pushed the issue_302 branch from 8aaaf33 to cc8211a Compare June 3, 2020 12:27

leonardehrenfried added 4 commits June 3, 2020 14:28

Add Javadoc

c4ecf1b

Fix typo

2e44f7a

Remove duplicate null checks

c662672

Remove duplication in PhotonDoc

5872237

lonvia approved these changes Jun 7, 2020

lonvia merged commit e6b5af7 into komoot:master Jun 7, 2020

leonardehrenfried mentioned this pull request Jun 10, 2020

Import and index county field #468

Merged

lonvia mentioned this pull request Dec 24, 2020

Problem with area=yes or highway=pedestrian? #253

Closed
