-
Notifications
You must be signed in to change notification settings - Fork 284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Making languages dynamic with command arguments #125
Conversation
- Use -nominatim-import-languages command with comma separated languages to load those languages during import - Use -nominatim-import-languages command with comma seperated languages to use certain languages during run Example: -nominatim-import-languages en,de,nl
Use -nominatim-import-languages for importing and -supported-languages for runtime languages
- Filter doubles out if there are highways with the same name & postcode numbers when language is 'nl' (i.e. the same street has 6532RA and 6532BE, which should only be returned once to prevent clutter in autocomplete)
Very nice move @Svantulden! This really is a key point to make Photon more suitable. About the languages currently hard coded in the mappings, I think this is a bad thing, that we only made to make our life easier until now. What would be the best move for this, imho, is to make the Things will become a bit harder the day we'll have some settings specific by language (so for example an analyser specific), but certainly not a blocker I think. I'm not the Java guy of the team, so I will let others review the code in details. @christophlingg @karussell thoughts on this? |
@@ -67,8 +67,8 @@ | |||
"script_score": { | |||
"script": "dist = doc['coordinate'].distanceInKm(lat, lon); 0.5 + ( 1.5 / (1.0 + dist/40.0) )", | |||
"params": { | |||
"lat": ${lat}, | |||
"lon": ${lon} | |||
"lat": "${lat}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will cast floats to strings, not sure we really want to do that. What was your rationale here (apart from making the json a vaild one ;) )?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woops! That is an oversight on my part.. Did not realize it would be casted as string. Will revert this change
Very nice @Svantulden ! A lot of people will benefit from your changes. Just some weeks ago someone contacted me for Hebrew support. As for the mapping: I would also go for the solution @yohanboniface described. Until know I paid attention that our small python based scripts for prototyping can still use the same mapping file. Probably this won't be possible any more, but it should not be that bad. Great work! |
Thanks alot for reviewing my PR, @yohanboniface & @christophlingg ! Nice to hear that it is in the right direction for Photon. I will experiment some with using a json template for the |
Humm, this could work, but I don't think this is the best approach, given that we'll need a more grained workflow (create the lang collectors, add some copy_to here and there, add some lang fields…): imho, you should instead parse the json to have a java structure you can work on, and then dump it again to json (or is there an ES API that takes a JSONObject as settings instead of a json string? I don't know). |
- Should be submitted as separate PR
Hey @yohanboniface & @christophlingg, I did some work on the parsing of the (empty of lang specific code) It's in a pretty rough form now, I would appreciate some feedback on how we can make this smoother. How it works now:
This works pretty okay, but what I am worried about is that it is going to be a bit of work when Photon gets updated with a new or changed mappings scheme. Then you need to update all the hooks too. Also, the code seems a bit overly duplicated. Apart from that, I fixed the following:
|
@@ -16,6 +16,9 @@ | |||
|
|||
@Parameter(names = "-nominatim-import", description = "import nominatim database into photon (this will delete previous index)") | |||
private boolean nominatimImport = false; | |||
|
|||
@Parameter(names = "-using-languages", description = "languages nominatim importer should import and use at run-time, comma seperated (default: 'en,fr,de,it')") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about just -languages
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simple and to the point, much better 👍
I think the logic is good :) I'll let the Java dancers have a deeper look at the details of the implementation :) |
Thanks for reviewing! I reworked some of the code and factorized the adding of collectors (except for the default name adding, because that is a special case unfortunately). |
I tested it to the best of my capabilities, and all seems fine. Do you think this is ready to be merged? |
// get mappings as JSONObject | ||
mappingsString = IOUtils.toString(mappings); | ||
} catch(IOException e) { | ||
log.error("cannot setup index, elastic search config files not readable", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rethrow the exception here... otherwise photon would start with a broken mapping and it will break anyway with strange message that make it difficult to understand what's broken. I think it's a good approach to be strict here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, wouldn't it be bad practice to rethrow the exception? (see here)
I reworked some of the exception handling regarding the index creating (commit 1a50011), so that the program will not continue and properly throw with the stacktrace. The tests will also fail due to the exception throwing. I would like your opinion on this.
Hi @Svantulden, I added some comments, we are close to merge it into master! |
Thanks @christophlingg for your feedback! |
[Not finished, discussion] Making languages dynamic with command arguments
Heya!
Thanks for this amazing library! Photon is best suited for our geocoding autocomplete needs compared to Nominatim (which we still use for reverse-geocoding and details).
I made some changes to allow for dynamic languages in Photon. We specifically needed to add Dutch as a language, and while I went over the language-specific code in Photon, I noticed that it wouldn't be too hard to make it fully dynamic.
The way it works now is as follows:
To specify languages that should be imported during nominatim-import
Use the
nominatim-import-languages
command argument. i.e.java -jar photon-0.1.2.jar -nominatim-import -nominatim-import-languages nl,en -host localhost -port 5432 -database nominatim -user user -password password
for importing only the default, en and nl tags.
This could also be handy to keep Photon data as small as possible without languages that you are sure you are not going to use.
To specify languages for run-time
This could maybe be handy to block languages/enable languages for the request handler. Use the
supported-languages
command argument. i.e.java -jar photon-0.1.2.jar -supported-languages nl,en
to use the nl and en language tags (defaults to en when no lang is specified).
However...
When testing this functionality, I noticed that the order of results was different between
lang=en
andlang=nl
. After some debugging I noticed that it was the same on all the previous enabled languages(en,fr,it,de)
, but different on thenl
tag.I found out this was because I didn't update the
mappings.json
which is used on import, to specify collectors for languages. This should be dynamically changed too, when specifyingnominatim-import-languages
, which leaves this PR not finished.Thoughts?
Thoughts & discussions are very welcome, on the code I wrote and on how I/we could tackle this problem correctly.
Things to do:
supported-languages
has specified a language that doesn't exist in the dataPS. I also added specific Dutch postcode checking to this PR, which may help other Dutch users of Photon. This filters out double
highway
names who are part of the same street. In The Netherlands, a street can have different postcode letters (i.e. 6532RA and 6532BE could be the same street). The numbers however, stay the same. So to avoid clutter, I filter results that have the same postcode number AND name.