Correct our use of synonyms for ES6 #381
Hmm, very interesting. I don't think we could get away with doing our synonyms at query time because of autocomplete. I suspect it would also increase response time significantly and potentially change some behaviour, so it's not a simple thing to change. It looks like this bug exclusively affects multi-word synonyms, and we have relatively few of those?
I just looked and I couldn't find any multi-token synonyms listed in this repo. Any idea which synonym is causing this? [edit] There are multi-token synonyms in this repo after all! See https://github.com/pelias/schema/pull/388/files
Query-time synonyms can be useful if we change our synonyms often, but that's not really the case here.
ohhh! You know what? I was testing using one of our geocode.earth client configurations. They have multi-token synonyms, so it's still important to consider, but it won't affect "stock" pelias.
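To answer the "which synonym is causing this?" question for a given configuration, a small script can list the multi-token entries in a synonyms file. This is only a sketch: it assumes one rule per line with terms separated by `,` or `=>` and comments starting with `#`, and the function name and sample entries are hypothetical, not taken from the pelias/schema repo.

```python
import re


def multi_token_synonyms(lines):
    """Return (line_number, term) pairs for synonym terms containing whitespace."""
    found = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Skip blank lines and comments (assumed to start with '#').
        if not line or line.startswith("#"):
            continue
        # Treat both ',' and '=>' as separators between terms.
        for term in re.split(r",|=>", line):
            term = term.strip()
            # A term with more than one whitespace-separated word is multi-token.
            if len(term.split()) > 1:
                found.append((number, term))
    return found


# Hypothetical sample entries, for illustration only.
sample = [
    "# example synonyms",
    "street, st",
    "saint, st",
    "north west, nw",
]

print(multi_token_synonyms(sample))
```

Running the same function over each synonyms file in a client configuration would show whether that configuration is exposed to the multi-token case discussed here.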
This is an exploration of using the `synonym_graph` filter instead of the `synonym` filter. Quite a few integration tests fail, but they all look to be simple order changes of otherwise identical tokens. I didn't bother to fix them all because I'd first like to explore whether or not this change actually has any effect on query results or ES6 compatibility. Connects #381
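For readers unfamiliar with the two filters, the change being explored amounts to switching the filter `type` in the index settings. The sketch below builds such a settings fragment as a Python dict. The filter and analyzer names (`example_synonyms`, `example_search`) and the synonym rule are hypothetical and are not the names used in pelias/schema. Note also that Elastic's documentation describes `synonym_graph` as intended for search-time analyzers, which may matter given the autocomplete concern raised above.

```python
import json


def build_settings(synonyms, use_graph=True):
    """Build an illustrative analysis settings fragment.

    All filter and analyzer names here are hypothetical placeholders.
    """
    filter_type = "synonym_graph" if use_graph else "synonym"
    return {
        "analysis": {
            "filter": {
                "example_synonyms": {
                    "type": filter_type,
                    "synonyms": list(synonyms),
                }
            },
            "analyzer": {
                "example_search": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "example_synonyms"],
                }
            },
        }
    }


# Print the fragment as JSON, as it would appear in an index-creation body.
print(json.dumps(build_settings(["north west, nw"]), indent=2))
```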
okay, so I have tracked down a reproducible testcase: https://gist.github.com/missinglink/8f55271dcf4f5e7e8d0712b1f2c8d742
A simple way to trigger this error is with:
POST http://localhost:9200/pelias/_analyze
{
"analyzer": "peliasIndexOneEdgeGram",
"text": "set"
}
The synonym generation goes crazy:
{
"tokens": [
{
"token": "s",
"start_offset": 0,
"end_offset": 3,
"type": "word",
"position": 0
},
{
"token": "se",
"start_offset": 0,
"end_offset": 3,
"type": "word",
"position": 0
},
{
"token": "set",
"start_offset": 0,
"end_offset": 3,
"type": "word",
"position": 0
},
{
"token": "sep",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 0
},
{
"token": "sept",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 0
},
{
"token": "septi",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 0
},
{
"token": "septie",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 0
},
{
"token": "septiem",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 0
},
{
"token": "septiemb",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 0
},
{
"token": "septiembr",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 0
},
{
"token": "septiembre",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 0
},
{
"token": "setb",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 0
},
{
"token": "setbr",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 0
},
{
"token": "setbre",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 0
},
{
"token": "sepe",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 0
},
{
"token": "sepb",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 0
},
{
"token": "sepbr",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 0
},
{
"token": "sepbre",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 0
},
{
"token": "7",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 0
},
{
"token": "7b",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 0
},
{
"token": "7br",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 0
},
{
"token": "7bre",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 0
},
{
"token": "b",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 1
},
{
"token": "br",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 1
},
{
"token": "bre",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 1
},
{
"token": "7",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 1
},
{
"token": "7r",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 1
},
{
"token": "7re",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 1
},
{
"token": "r",
"start_offset": 1,
"end_offset": 3,
"type": "SYNONYM",
"position": 2
},
{
"token": "re",
"start_offset": 1,
"end_offset": 3,
"type": "SYNONYM",
"position": 2
},
{
"token": "7",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 2
},
{
"token": "s",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 2
},
{
"token": "se",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 2
},
{
"token": "sep",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 2
},
{
"token": "r",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 3
},
{
"token": "re",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 3
},
{
"token": "b",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 3
},
{
"token": "br",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 3
},
{
"token": "bre",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 3
}
]
}
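One way to inspect a token stream like this is to look for places where `start_offset` decreases from one token to the next. This is a sketch under an assumption: the issue does not quote the actual import error, so the check below assumes the failure comes from Lucene rejecting token offsets that go backwards. The function name is hypothetical. The sample uses four consecutive tokens copied from the output above.

```python
def backwards_offsets(tokens):
    """Return (previous, current) token pairs where start_offset decreases."""
    problems = []
    previous = None
    for token in tokens:
        # Flag any token whose start offset is smaller than its predecessor's.
        if previous is not None and token["start_offset"] < previous["start_offset"]:
            problems.append((previous["token"], token["token"]))
        previous = token
    return problems


# Four consecutive tokens copied from the _analyze output above.
sample = [
    {"token": "7re", "start_offset": 0, "end_offset": 3, "position": 1},
    {"token": "r", "start_offset": 1, "end_offset": 3, "position": 2},
    {"token": "re", "start_offset": 1, "end_offset": 3, "position": 2},
    {"token": "7", "start_offset": 0, "end_offset": 3, "position": 2},
]

print(backwards_offsets(sample))  # [('re', '7')]
```

The same function can be applied to the full `tokens` array of any `_analyze` response to compare the `synonym` and `synonym_graph` filters.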
While testing ES6 support, I ran into the following error while running an OSM import:
After some digging it appears this error is related to token position offsets created by the Synonym token filter.
There is a very interesting Elastic blog post from 2017 discussing the solution: the new Synonym graph token filter and how to use it to improve how synonym expansion works. We'll need to figure out what the right solution is here for ES6 support.
Overall, the biggest takeaway appears to be this:
Connects pelias/pelias#719