Rework location bias and query builder in general #580

lonvia · 2021-05-17T11:55:41Z

This is work in progress to improve the location bias for searches. As the unusually high boost for name queries was interfering with the location scoring, the query part itself needed to be reworked as well.

In detail:

- Location bias is switched to an exponential decay function, with the given distance in kilometers as the point where the weight drops to 0.8. To ensure that well-known places still show up in the results, a second weight function based on importance is added, and the maximum of the two functions is used.
- Introduce a new parameter 'zoom' for the location bias, which describes the radius within which the bias is effective as a function of the map zoom level. The precise relation is 0.25 km * 2^(18 - zoom). The 'location_bias_scale' parameter now describes how much weight is given to important places. Sensible values are between 0.0 and 1.0.
- The weight function for importance is switched to a simple linear decay function. This essentially means using importance as a direct weight factor, with the twist that it must never become exactly zero.
- Reintroduce a subquery against the name.ngram index. This ensures results similar to those of the previous heavy boost on name.raw.
- Add a subquery against collector.raw which gets a strong boost when the house number matches. Records with house numbers usually have no name (the street is not in the name indexes), so they get a huge malus compared to name matches. The boost offsets this malus.
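
To make the zoom relation concrete, here is a small Python sketch of the stated formula. The function name `bias_radius_km` is made up for illustration and is not part of Photon.

```python
def bias_radius_km(zoom: float) -> float:
    """Radius of the location bias in km: 0.25 km * 2^(18 - zoom)."""
    return 0.25 * 2.0 ** (18 - zoom)


if __name__ == "__main__":
    # Each step down in zoom doubles the radius.
    for zoom in (18, 16, 14, 12, 10):
        print(f"zoom {zoom:2d} -> {bias_radius_km(zoom):g} km")
```

Under this relation, zoom 18 corresponds to a radius of 0.25 km and zoom 12 to 16 km.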

The query structure is still not ideal but any further improvements require a change to the index mapping. I would like to defer that until after the next release.

With the modified weight functions we might even be able to get rid of painless. That would be a huge relief.

There is a global test instance (using last week's planet dump) with the changes running at https://pagan.lonvia.de/. Feedback would be very much appreciated.

@hbruch @leonardehrenfried

lonvia · 2021-05-25T12:06:24Z

Here are a couple of queries that still fail:

https://pagan.lonvia.de/api?debug=1&q=alta (term frequency (TF) messes with the result)
https://pagan.lonvia.de/api?q=Wien%20Lohwaggase%2010 (skipping a word ranks higher than typos)

Switch to a simple linear decay function for weighting with importance. It uses the importance as a direct weight factor. The only modification is to keep the value range above zero, so that multiplying by the weight never cancels out the score.

Location bias is switched to an exponential decay function, with the given distance in kilometers as the point where the weight drops to 0.8. To ensure that well-known places still show up in the results, a second weight function based on importance is added, and the maximum of the two functions is used.
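
The two weight functions and their combination might look roughly like the Python sketch below. This is not the code from the commits. The function names, the `floor` value, and the way `scale` is multiplied into the importance weight are all assumptions made for illustration. Only the 0.8 weight at the given distance, the non-zero importance weight, and taking the maximum come from the description.

```python
def location_weight(distance_km: float, radius_km: float) -> float:
    """Exponential decay: 1.0 at the bias centre, 0.8 at distance == radius_km."""
    return 0.8 ** (max(distance_km, 0.0) / radius_km)


def importance_weight(importance: float, floor: float = 0.01) -> float:
    """Linear in importance, shifted so the result never reaches zero.

    `floor` is a hypothetical value chosen for this sketch.
    """
    clamped = min(max(importance, 0.0), 1.0)
    return floor + (1.0 - floor) * clamped


def combined_weight(distance_km: float, radius_km: float,
                    importance: float, scale: float = 0.2) -> float:
    """Max of the two weights; multiplying by `scale` is an assumption."""
    return max(location_weight(distance_km, radius_km),
               scale * importance_weight(importance))
```

With these definitions a nearby place is scored mainly by its distance. A very important place far away still keeps a weight of up to `scale` instead of decaying towards zero.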

The new 'zoom' parameter for search indicates the approximate area the location bias should cover. For convenience, the area is computed from the usual map zoom levels. The 'location_bias_scale' parameter has slightly changed its meaning and now expresses how much influence importance should have when searching with location bias. Sensible values go from 0 to 1, with 0 meaning that importance has no influence at all and 1 that importance has approximately the same influence as location.
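
As an illustration of how a client might pass these parameters, here is a short Python sketch that only builds the request URL and sends nothing. The helper name `build_search_url` is hypothetical. The sketch assumes that the bias centre is given with the existing `lat` and `lon` parameters and that the public instance mentioned in this thread is the target.

```python
from urllib.parse import urlencode

BASE_URL = "https://photon.komoot.io/api"  # public instance named in this thread


def build_search_url(query, lat, lon, zoom=None, location_bias_scale=None):
    """Build a search URL; the bias parameters are only added when given."""
    params = {"q": query, "lat": lat, "lon": lon}
    if zoom is not None:
        params["zoom"] = zoom
    if location_bias_scale is not None:
        params["location_bias_scale"] = location_bias_scale
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("berlin", 52.52, 13.405, zoom=12, location_bias_scale=0.5))
```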

Get rid of the heavy boost for name queries. Instead, add a second check against the ngram index for names. Also introduce a special path for housenumber matches. Housenumber records cannot match against the name index, so put more emphasis on the collector score instead.
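
The shape of such a query could be sketched as an Elasticsearch-style bool query, written here as a plain Python dict. This is not the query that the builder actually emits. The field names `name.ngram` and `collector.raw` are taken from this thread, while the `housenumber` field, the boost value, and the overall nesting are placeholders chosen for illustration.

```python
import json


def build_query_sketch(text: str, housenumber_boost: float = 10.0) -> dict:
    """Return an Elasticsearch-style bool query as a plain dict.

    Field names come from this thread; the boost value and the
    'housenumber' field are placeholders for illustration.
    """
    return {
        "bool": {
            "should": [
                # Second check against the ngram index for names.
                {"match": {"name.ngram": {"query": text}}},
                # House number path: lean on the collector score instead.
                {
                    "bool": {
                        "filter": [{"match": {"housenumber": {"query": text}}}],
                        "must": [{"match": {"collector.raw": {"query": text}}}],
                        "boost": housenumber_boost,
                    }
                },
            ],
            "minimum_should_match": 1,
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_query_sketch("Wien Lohwaggase 10"), indent=2))
```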

lonvia · 2021-06-14T12:11:38Z

The latest commit fixes an issue where alternative names (alt_name, old_name, etc.) were not searchable anymore.

kenseii · 2021-06-16T05:58:14Z

@lonvia I think that with the rework of the query builder, searching by postcode no longer works. For example, /api?q=289-1606 on the #585 PR (which is based on this PR) returns results totally different from those of photon.komoot.io, which are quite accurate. I'm not sure if it's a bug or if support for postcode search is going to be dropped.

lonvia · 2021-06-16T07:44:03Z

Postcode search has never really worked because we don't import the artificial postcode from Nominatim, see #310. Current master gives you some random result that would happen to have the postcode (a river in your example) but not really a postcode result. So I don't really see that as a regression.

lonvia · 2021-06-16T08:04:44Z

I've merged this now and deployed it on https://photon.komoot.io. We'll see if there is feedback from the wider user community.

JinIgarashi mentioned this pull request May 24, 2021

add Japanese language support #563

Closed

This was referenced May 25, 2021

Higher weight to name field #193

Closed

Location Preference does not appear to be honored #198

Closed

leonardehrenfried mentioned this pull request Jun 3, 2021

Photon/nominatim upgrade plan stadtnavi/digitransit-ansible#43

Closed

lonvia added 9 commits June 5, 2021 10:13

remove query json check tests

61f3ca4

These tests rather check that the syntax is right and are as such of limited use. Given that they are very difficult to adapt to changes in the query builder, rather get rid of them.

document new zoom parameter

7a7304c

do not completely exclude results from other languages

520e6ce

use best fields on raw queries

2d2f4ea

This enables the use of different analysers.

define search analysers for raw indexes always in mappings

444ebe9

avoid 'default' language in name query parts

da45392

There is no explicit index for the name in the default language. It is included in any of the language-specific indexes, so query them, when default is requested.

lonvia force-pushed the location-bias-and-query-structure branch from 0b00fbd to da45392 Compare June 5, 2021 12:57

lonvia mentioned this pull request Jun 11, 2021

Synonym list + classification terms #585

Merged

permit alternative names again

912b83e

When requiring that a name or housenumber is present, then alternative names must be taken into account as well. The current DB schema only has a raw name index on these alternative names, so use that with a very low boost.

lonvia changed the title ~~[WIP] Rework location bias and query builder in general~~ Rework location bias and query builder in general Jun 16, 2021

lonvia merged commit 22d8cf0 into komoot:master Jun 16, 2021

lonvia deleted the location-bias-and-query-structure branch June 16, 2021 07:47

This was referenced Jun 16, 2021

Location bias is not working properly #430

Closed

Feature request: zoom level parameter #428

Closed

I'm trying to find Moscow, for example #309

Closed

Finetuning, adding house number decreases relevance significantly #187

Closed

karussell mentioned this pull request Jun 16, 2021

improve response precision based on IP graphhopper/geocoder-converter#65

Open

lonvia mentioned this pull request Jun 16, 2021

Wrong search result #579

Closed
