feat(mapping): Remove store mapping parameter
Background
==========

I always thought that it was important to use the `store` parameter to
specify whether a field should be stored, in addition to indexing, and
the default was to not store a field for later retrieval.

It turns out this isn't true, and that all fields are [copied to the _source](https://www.elastic.co/guide/en/elasticsearch/reference/2.4/mapping-store.html)
field by default.

Setting `"store": "yes"` is only needed if, in addition to getting a
field back as part of the `_source` (which contains _every_ field
in the document), we wanted to be able to retrieve a single field on its
own, without loading `_source`. Pelias doesn't currently do this; we
always ask Elasticsearch for the entire `_source` field.

In addition, Elasticsearch has a [source filtering](https://www.elastic.co/guide/en/elasticsearch/reference/2.4/search-request-source-filtering.html)
feature, so if we ever wanted to return only some of `_source` (which
might someday be the case with something like
pelias/api#1121), the only reason we would
want to bother with `"store": "yes"` is if the size of the `_source`
field was so prohibitive we didn't even want Elasticsearch to fetch all
of it from disk. That might be a concern some day, but not today.
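
As a rough illustration of source filtering, here is a minimal sketch of a search body that asks for only part of `_source`. The helper name `buildFilteredSearch` and the field names are assumptions made for the example, not code from Pelias; the array form of `_source` is the one described in the linked Elasticsearch documentation.

```javascript
// Hypothetical helper: wraps any query in a search body that returns only
// the listed `_source` fields. The field names used below are illustrative.
function buildFilteredSearch(query, fields) {
  return {
    query: query,
    // An array of field names (wildcards allowed) restricts which parts of
    // `_source` come back; no "store" mapping parameter is involved.
    _source: fields
  };
}

const body = buildFilteredSearch(
  { match: { 'name.default': 'portland' } },
  ['name.default', 'center_point']
);
console.log(JSON.stringify(body, null, 2));
```

Elasticsearch still reads the whole `_source` from disk in this case and trims it before responding, which is why `"store": "yes"` would only matter if `_source` itself became too large to load.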

Changes
==========

This PR removes all `"store": "yes"` parameters for all of our fields.

Effectively, we were storing a lot of fields on disk twice, which was
wasting space.
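
For anyone carrying a similar mapping, the effect of the change can be sketched as a small transformation. This is only an illustration: `stripStore` is a hypothetical helper and is not how this commit was produced (the partial mapping files were edited directly).

```javascript
// Hypothetical helper, not part of pelias/schema: returns a copy of a mapping
// object with every "store" key removed, at any nesting depth.
// Assumption: no field in the mapping is itself named "store"; such a field
// would be dropped as well.
function stripStore(node) {
  if (Array.isArray(node)) {
    return node.map(stripStore);
  }
  if (node === null || typeof node !== 'object') {
    return node; // strings, numbers, booleans pass through unchanged
  }
  const result = {};
  for (const key of Object.keys(node)) {
    if (key !== 'store') {
      result[key] = stripStore(node[key]);
    }
  }
  return result;
}

// Example input shaped like the old mappings/partial/admin.json
const admin = { type: 'string', analyzer: 'peliasAdmin', store: 'yes' };
console.log(stripStore(admin));
```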

In my testing of the Portland, Oregon Docker project, which has about
1.8 million documents, this change reduces the disk space usage from
551MB to 492MB, or about 10%!

_Sidenote:_ If there are other fields we _do_ want to keep out of the
`_source` field,
[`_source.exclude`](https://github.com/pelias/schema/blob/master/mappings/document.js#L158-L159) in our document mapping is how we can do it.

After this change, I'm now pretty confident we are doing the right thing
for all our fields when it comes to storing and analyzers, so this
closes #99
orangejulius committed Nov 2, 2018
1 parent 3d1d542 commit 162af3a
Showing 7 changed files with 683 additions and 1,362 deletions.
3 changes: 1 addition & 2 deletions mappings/partial/admin.json
@@ -1,5 +1,4 @@
 {
   "type": "string",
-  "analyzer": "peliasAdmin",
-  "store": "yes"
+  "analyzer": "peliasAdmin"
 }
3 changes: 1 addition & 2 deletions mappings/partial/boundingbox.json
@@ -1,5 +1,4 @@
 {
   "type": "string",
-  "index": "no",
-  "store": "yes"
+  "index": "no"
 }
3 changes: 1 addition & 2 deletions mappings/partial/literal.json
@@ -1,5 +1,4 @@
 {
   "type": "string",
-  "analyzer": "keyword",
-  "store": "yes"
+  "analyzer": "keyword"
 }
3 changes: 1 addition & 2 deletions mappings/partial/postalcode.json
@@ -1,5 +1,4 @@
 {
   "type": "string",
-  "analyzer": "peliasZip",
-  "store": "yes"
+  "analyzer": "peliasZip"
 }