Support arbitrary languages #33

Closed
lonvia opened this issue May 12, 2014 · 11 comments

Comments

@lonvia
Collaborator

lonvia commented May 12, 2014

Photon should be able to search over all available language variants mapped in OSM (i.e. all name:* tags). Bonus points if it can improve search results by correctly guessing the language of the query and re-weighting the results accordingly.
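
A minimal sketch of that idea, for illustration only. It is not Photon's implementation (Photon delegates search to Elasticsearch); the helper names `name_variants`, `score_place` and `search`, the substring match, and the `boost` factor of 2.0 are all assumptions made up for this example. It matches a query against the plain `name` tag and every `name:*` tag, and ranks a place higher when the matching tag is in the guessed query language.

```python
def name_variants(tags):
    """Yield (language, value) pairs; the plain `name` tag gets language None."""
    for key, value in tags.items():
        if key == "name":
            yield None, value
        elif key.startswith("name:"):
            yield key[len("name:"):], value


def score_place(tags, query, query_lang=None, boost=2.0):
    """Return 0.0 for no match, 1.0 for a match, `boost` for a match
    in the guessed query language (hypothetical weighting)."""
    needle = query.casefold()
    best = 0.0
    for lang, value in name_variants(tags):
        if needle in value.casefold():
            in_query_lang = query_lang is not None and lang == query_lang
            best = max(best, boost if in_query_lang else 1.0)
    return best


def search(places, query, query_lang=None):
    """Return matching places, best score first (ties keep input order)."""
    scored = [(score_place(tags, query, query_lang), tags) for tags in places]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [tags for _, tags in hits]


places = [
    {"name": "Monaco"},
    {"name": "München", "name:it": "Monaco di Baviera", "name:ar": "ميونخ"},
]
italian_first = search(places, "monaco", query_lang="it")
arabic_hits = search(places, "ميونخ")
```

With `query_lang="it"`, the place whose Italian name matches is ranked ahead of the place that matches only on its untagged `name`. The Arabic query finds the second place through its `name:ar` tag, without any assumption about the script of the query.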

@tommedema

Note that Photon cannot assume that queries are entered in Western scripts; Chinese, Arabic or Thai, for example, should work fine as well (Nominatim supports this too).

@christophlingg
Member

How do you guess the user's language in Nominatim? By analyzing HTTP headers or the IP address?

@tommedema

Nominatim accepts an accept-language parameter. If that is not set, I believe it looks at the browser's header of accepted languages, but I didn't verify this.

@christophlingg christophlingg added this to the sprint #2 milestone May 12, 2014
@lonvia
Collaborator Author

lonvia commented May 12, 2014

For Nominatim: the accept-language parameter is checked first, then the HTTP Accept-Language header. But this is used only for selecting the right name:* tag to output. It is not used in any way to rank the query matches or results.
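
The order of precedence described here can be sketched roughly as follows. This is an illustration, not Nominatim's actual code: the function names are hypothetical, the parameter is assumed to use the same comma-separated format as the header, and the fallback from a regional code such as `en-GB` to `en` is an assumption of this example. Note that the result only selects which name to display; it plays no part in ranking.

```python
def parse_accept_language(value):
    """Parse an Accept-Language style string into language codes, best first."""
    weighted = []
    for position, part in enumerate(value.split(",")):
        part = part.strip()
        if not part:
            continue
        lang, _, params = part.partition(";")
        params = params.strip()
        quality = 1.0  # a missing q-value means full preference
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        weighted.append((-quality, position, lang.strip().lower()))
    return [lang for _, _, lang in sorted(weighted)]


def preferred_languages(param=None, header=None):
    """The explicit parameter wins; the HTTP header is only a fallback."""
    source = param if param else header
    return parse_accept_language(source) if source else []


def display_name(tags, languages):
    """Pick the name:* tag to output, falling back to the plain name tag."""
    for lang in languages:
        # Assumption: try the full code first, then its primary subtag.
        for candidate in (lang, lang.split("-")[0]):
            value = tags.get("name:" + candidate)
            if value:
                return value
    return tags.get("name")


tags = {"name": "München", "name:en": "Munich", "name:it": "Monaco di Baviera"}
from_header = display_name(tags, preferred_languages(header="it;q=0.5, en-GB;q=0.9"))
from_param = display_name(tags, preferred_languages(param="it", header="en"))
```

Here the header alone yields the English name, because `en-GB` carries the higher q-value, while an explicit `it` parameter overrides the header and yields the Italian name.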

@christophlingg
Member

Thanks @lonvia !

@karussell
Collaborator

Language detection could be done via a simple hack I made some years ago, or with other tools like those described here

@christophlingg
Member

sorry, wrong ticket ;-)

@karussell
Collaborator

My comment here is relevant to this ticket as well. I'll hopefully have time in the next few days to try that.

@karussell
Collaborator

karussell commented May 30, 2014

I am creating a language detector specifically for 'local' names like street names and POIs, i.e. the detector learns from the OpenStreetMap data itself. Look here

It is already good for German names (4% of German names are not detected as German), but French and English detection, for example, is really bad (30% error). I'll see how I can improve this.

Update 2020: there is a new tool https://github.com/pemistahl/lingua
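
To make the idea of a detector that learns from place names concrete, here is a toy sketch. It is not the detector linked above and not how lingua works; it is a generic character-trigram model with add-one smoothing, and the class name, method names and the tiny training lists are invented for this example. Real training data would be names taken from OSM and grouped by language.

```python
import math
from collections import Counter, defaultdict


def trigrams(text):
    """Split a name into overlapping, padded, lower-cased character trigrams."""
    padded = "  " + text.lower() + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class NameLanguageDetector:
    """Toy trigram classifier; illustrative only."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # language -> trigram counts
        self.vocabulary = set()             # all trigrams seen in training

    def train(self, lang, names):
        for name in names:
            grams = trigrams(name)
            self.counts[lang].update(grams)
            self.vocabulary.update(grams)

    def detect(self, name):
        """Return the language with the highest smoothed log-likelihood."""
        best_lang, best_score = None, -math.inf
        for lang, counter in self.counts.items():
            total = sum(counter.values()) + len(self.vocabulary)
            score = sum(math.log((counter[g] + 1) / total) for g in trigrams(name))
            if score > best_score:
                best_lang, best_score = lang, score
        return best_lang


detector = NameLanguageDetector()
detector.train("de", ["Hauptstraße", "Bahnhofstraße", "Schulstraße", "Kirchweg"])
detector.train("fr", ["Rue de la Gare", "Rue de l'Église",
                      "Avenue de la République", "Place de la Mairie"])
guess_de = detector.detect("Gartenstraße")
guess_fr = detector.detect("Rue de la Paix")
```

Short names carry few trigrams, and closely related languages share many of them, so a model like this would be expected to need far more training data than shown here.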

@christophlingg
Member

One of the biggest problems yet to be resolved for full multilingual support is storage size. The current Elasticsearch config lets the index size grow linearly with the number of languages (approx. 30 GB each).

I hope the cross_field approach mentioned previously can help.
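
As a back-of-the-envelope illustration of that linear growth, taking the approximate 30 GB per language at face value (the function name and the language lists are made up for this example):

```python
GB_PER_LANGUAGE = 30  # approximate figure quoted in this thread


def estimated_index_size_gb(languages, gb_per_language=GB_PER_LANGUAGE):
    """Estimate index size assuming it grows linearly with the language count."""
    return len(set(languages)) * gb_per_language


four_languages = estimated_index_size_gb(["en", "de", "fr", "it"])
ten_languages = estimated_index_size_gb(
    ["en", "de", "fr", "it", "ar", "zh", "th", "es", "ru", "ja"]
)
```

Under this assumption, four languages come to about 120 GB and ten languages to about 300 GB, which is why indexing every name:* tag separately does not scale.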

@harry-wilson

harry-wilson commented May 1, 2019

@lonvia, using Photon with customized data from a Nominatim database does not support any languages other than en, fr, de and it. How can we search for results in other languages, such as Arabic?
