Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Making languages dynamic with command arguments #125

Merged
merged 24 commits into from
Dec 20, 2014

Conversation

Svantulden
Copy link
Contributor

Heya!

Thanks for this amazing library! Photon is best suited for our geocoding autocomplete needs compared to Nominatim (which we still use for reverse-geocoding and details).

I made some changes to allow for dynamic languages in Photon. We specifically needed to add Dutch as a language, and while I went over the language-specific code in Photon, I noticed that it wouldn't be too hard to make it fully dynamic.

The way it works now is as follows:

To specify languages that should be imported during nominatim-import

Use the nominatim-import-languages command argument. i.e.
java -jar photon-0.1.2.jar -nominatim-import -nominatim-import-languages nl,en -host localhost -port 5432 -database nominatim -user user -password password
for importing only the default, en and nl tags.

This could also be handy to keep Photon data as small as possible without languages that you are sure you are not going to use.

To specify languages for run-time

This could maybe be handy to block languages/enable languages for the request handler. Use the supported-languages command argument. i.e.
java -jar photon-0.1.2.jar -supported-languages nl,en
to use the nl and en language tags (defaults to en when no lang is specified).

However...
When testing this functionality, I noticed that the order of results was different between lang=en and lang=nl. After some debugging I noticed that it was the same on all the previous enabled languages (en,fr,it,de), but different on the nl tag.

I found out this was because I didn't update the mappings.json which is used on import, to specify collectors for languages. This should be dynamically changed too, when specifying nominatim-import-languages, which leaves this PR not finished.

Thoughts?

Thoughts & discussions are very welcome, on the code I wrote and on how I/we could tackle this problem correctly.

Things to do:

  • Make mappings.json dynamic somehow
  • Fully test the changes
  • How would we handle the problem when supported-languages has specified a language that doesn't exist in the data

PS. I also added specific Dutch postcode checking to this PR, which may help other Dutch users of Photon. This filters out double highway names who are part of the same street. In The Netherlands, a street can have different postcode letters (i.e. 6532RA and 6532BE could be the same street). The numbers however, stay the same. So to avoid clutter, I filter results that have the same postcode number AND name.

Sander added 9 commits December 8, 2014 09:55
- Use -nominatim-import-languages command with comma separated
languages to load those languages during import
- Use -nominatim-import-languages command with comma seperated
languages to use certain languages during run

Example: -nominatim-import-languages en,de,nl
Use -nominatim-import-languages for importing and -supported-languages
for runtime languages
- Filter doubles out if there are highways with the same name &
postcode numbers when language is 'nl' (i.e. the same street has 6532RA
and 6532BE, which should only be returned once to prevent clutter in
autocomplete)
@yohanboniface
Copy link
Collaborator

Very nice move @Svantulden! This really is a key point to make Photon more suitable.

About the languages currently hard coded in the mappings, I think this is a bad thing, that we only made to make our life easier until now.

What would be the best move for this, imho, is to make the mappings.json a template (renaming it accordingly, like mappings_template.json just to make it clear), then to load it in Java, and itering to add the de, fr, etc. properties where needed, according to the activated languages.

Things will become a bit harder the day we'll have some settings specific by language (so for example an analyser specific), but certainly not a blocker I think.

I'm not the Java guy of the team, so I will let others review the code in details.

@christophlingg @karussell thoughts on this?

@@ -67,8 +67,8 @@
"script_score": {
"script": "dist = doc['coordinate'].distanceInKm(lat, lon); 0.5 + ( 1.5 / (1.0 + dist/40.0) )",
"params": {
"lat": ${lat},
"lon": ${lon}
"lat": "${lat}",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will cast floats to strings, not sure we really want to do that. What was your rationale here (apart from making the json a vaild one ;) )?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woops! That is an oversight on my part.. Did not realize it would be casted as string. Will revert this change

@christophlingg
Copy link
Member

Very nice @Svantulden ! A lot of people will benefit from your changes. Just some weeks ago someone contacted me for Hebrew support.

As for the mapping: I would also go for the solution @yohanboniface described. Until know I paid attention that our small python based scripts for prototyping can still use the same mapping file. Probably this won't be possible any more, but it should not be that bad.

Great work!

@Svantulden
Copy link
Contributor Author

Thanks alot for reviewing my PR, @yohanboniface & @christophlingg ! Nice to hear that it is in the right direction for Photon.

I will experiment some with using a json template for the mappings.json. I don't have much experience with that, but maybe I can do it about the same as the query_location_bias.json.

@yohanboniface
Copy link
Collaborator

I don't have much experience with that, but maybe I can do it about the same as the
query_location_bias.json.

Humm, this could work, but I don't think this is the best approach, given that we'll need a more grained workflow (create the lang collectors, add some copy_to here and there, add some lang fields…): imho, you should instead parse the json to have a java structure you can work on, and then dump it again to json (or is there an ES API that takes a JSONObject as settings instead of a json string? I don't know).

@Svantulden
Copy link
Contributor Author

Hey @yohanboniface & @christophlingg, I did some work on the parsing of the (empty of lang specific code) mappings.json.

It's in a pretty rough form now, I would appreciate some feedback on how we can make this smoother. How it works now:

  • The mappings.json file is stripped of all lang-specific strings and objects
  • I iterate over the selected langs, and add JSONObjects to the corresponding fields that need language specific objects
  • I have three predefined json objects as strings with {lang} to be replaced for each lang
  • For the default property of name, we add the lang specific collector

This works pretty okay, but what I am worried about is that it is going to be a bit of work when Photon gets updated with a new or changed mappings scheme. Then you need to update all the hooks too. Also, the code seems a bit overly duplicated.

Apart from that, I fixed the following:

  • Reverted the NL postcode check
  • Removed some spaces
  • Unified the commandline param to using-languages. What do you think about this name?
  • Reverted the valid json

@@ -16,6 +16,9 @@

@Parameter(names = "-nominatim-import", description = "import nominatim database into photon (this will delete previous index)")
private boolean nominatimImport = false;

@Parameter(names = "-using-languages", description = "languages nominatim importer should import and use at run-time, comma seperated (default: 'en,fr,de,it')")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about just -languages?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simple and to the point, much better 👍

@yohanboniface
Copy link
Collaborator

I think the logic is good :)

I'll let the Java dancers have a deeper look at the details of the implementation :)

@Svantulden
Copy link
Contributor Author

Thanks for reviewing! I reworked some of the code and factorized the adding of collectors (except for the default name adding, because that is a special case unfortunately).

@Svantulden
Copy link
Contributor Author

I tested it to the best of my capabilities, and all seems fine. Do you think this is ready to be merged?

// get mappings as JSONObject
mappingsString = IOUtils.toString(mappings);
} catch(IOException e) {
log.error("cannot setup index, elastic search config files not readable", e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rethrow the exception here... otherwise photon would start with a broken mapping and it will break anyway with strange message that make it difficult to understand what's broken. I think it's a good approach to be strict here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, wouldn't it be bad practice to rethrow the exception? (see here)

I reworked some of the exception handling regarding the index creating (commit 1a50011), so that the program will not continue and properly throw with the stacktrace. The tests will also fail due to the exception throwing. I would like your opinion on this.

@christophlingg
Copy link
Member

Hi @Svantulden, I added some comments, we are close to merge it into master!

@Svantulden
Copy link
Contributor Author

Thanks @christophlingg for your feedback!

christophlingg added a commit that referenced this pull request Dec 20, 2014
[Not finished, discussion] Making languages dynamic with command arguments
@christophlingg christophlingg merged commit e1ab765 into komoot:master Dec 20, 2014
@Svantulden Svantulden changed the title [Not finished, discussion] Making languages dynamic with command arguments Making languages dynamic with command arguments Jul 3, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants