fix(deduplication): Deduplicate values in phrase field
#118 added support for removing duplicate values from the `name` field, but that logic was not applied to the `phrase` field.

Duplicate values do not affect whether a particular document matches a given query, but they _do_ affect scoring. In some cases, the boost from tokens matching twice because of duplicates over-ranks a result. In other cases, the penalty for longer fields under-ranks it.

To keep scoring as fair as possible (pending other issues such as pelias/openstreetmap#507), apply the existing deduplication to both the `name` and `phrase` fields.
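As a rough illustration of the idea, the sketch below applies one deduplication helper to both fields. It is not the code from this commit. The document shape (a dict with `name` and `phrase` keys, each mapping a language key to a list of strings) and the helper names `dedupe` and `dedupe_fields` are assumptions made for the example, and "duplicate" is taken to mean exact string equality.

```python
def dedupe(values):
    """Return values with exact duplicates removed, keeping first-seen order."""
    # dict preserves insertion order, so fromkeys drops repeats in place.
    return list(dict.fromkeys(values))


def dedupe_fields(doc, fields=("name", "phrase")):
    """Return a copy of doc with each listed field deduplicated per language key.

    The field names and the {language: [values]} layout are hypothetical.
    """
    result = dict(doc)
    for field in fields:
        if field in doc:
            result[field] = {
                lang: dedupe(values) for lang, values in doc[field].items()
            }
    return result


# Hypothetical document with the same duplicate in both fields.
doc = {
    "name": {"default": ["Main Street", "Main St", "Main Street"]},
    "phrase": {"default": ["Main Street", "Main St", "Main Street"]},
}

cleaned = dedupe_fields(doc)
print(cleaned["name"]["default"])
print(cleaned["phrase"]["default"])
```

Running the same helper over both fields keeps their token counts and lengths consistent, so neither field picks up an extra match boost or length penalty from repeated values.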