Skip to content

Conversation

@jimczi
Copy link
Contributor

@jimczi jimczi commented Apr 11, 2017

This change adds an index setting to define how the documents should be sorted inside each Segment.
It allows any numeric, date, boolean or keyword field inside a mapping to be used to sort the index on disk.
It is not allowed to use a nested fields inside an index that defines an index sorting since nested fields relies on the original sort of the index.
This change does not add early termination capabilities in the search layer. This will be added in a follow up.

Relates #6720

@jimczi jimczi added :Core/Infra/Core Core issues without another label >feature review v6.0.0-alpha1 labels Apr 11, 2017
@jimczi jimczi requested a review from jpountz April 11, 2017 22:17
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we've been moving away from these Fields objects in general and just naming the constants or even using "sort", depending on the context.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why package private instead of private?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is also worth leaving a comment about how this is stored like this for easy reading from the settings. It looks funny to my java-accustomed eye.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would we have gotten here? Would they need to use the test plugin to set the version? I'm not sure this is worth checking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure either but this is how we would handle mixed cluster if we allow rolling upgrades for major releases ? I know it's not possible to have a mixed cluster with 5.x and 6.x nodes so maybe just paranoid statement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strings.EMPTY_ARRAY might be worth using here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be easier to read as

MultiValueMode mode = modes[i];
if (mode == null) {
  mode = reverse ? MultiValueMode.MAX : MultiValueMode.MIN;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the old method and put null all the places that don't use sorting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it. You suggest to change all the call to createEngine with an explicit null value ? What would that change ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I mean add @Nullable Sort indexSort to one of the old ctors and change all the call sites that don't need a sort to provide null. Or maybe a random one? I'm not sure about that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nested fields are not compatible with index sorting because they rely on the default doc_id sorting. An error will be thrown if index sorting is activated on an index that contains nested fields.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If type is going away maybe we don't want to advertise it here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually don't have this setting in these tests. If it isn't needed I'd drop it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it'd be nicer to do it on a field just so we don't rely on type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you sort on _id? That'd make the example pretty simple.

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a first quick pass to understand how things work. I'm wondering whether you considered configuring the index sort in the mappings rather than the settings?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/missing/reverse/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's use method references instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the if/else is not needed as the code in the if block would work in all cases?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we lowercase the modes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not specific to that PR, but we should create constants for _first and _last

@jimczi
Copy link
Contributor Author

jimczi commented Apr 13, 2017

Thanks @jpountz and @nik9000 for reviewing.

I'm wondering whether you considered configuring the index sort in the mappings rather than the settings?

I did but currently the mapping is per type and I did not find an easy way to define something at the mapping level rather than the type level. I am not saying we should not do it but it would require some non-trivial changes in how we treat mappings. Maybe we could revisit this when we remove _type entirely ? Defining the index sort in the settings felt natural to me so I followed that path, it requires some validation between the mapping and the settings but I think the change is not that big. WDYT ?

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

My previous comment about configuring the index sort in the mappings rather than in the settings is not practical. We might want to reconsider when types are gone, but for now I think settings are the way to go.

Can you please add experimental tags to this feature in the docs saying that we might change the way that the index sort is configured?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think saying that segments are ordered by doc id is a bit confusing, it rather works the other way: the ordering of documents inside a segment defines doc ids? Maybe just keep it to a minimum, eg. By default Lucene does not apply any sort..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/nested/Nested/ and maybe s/on the default doc_id sorting/on the assumption that nested documents are stored in contiguous doc ids, which can be broken by index sorting/?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/setting/settings/

jimczi added 6 commits April 19, 2017 13:24
This change adds an index setting to define how the documents should be sorted inside each Segment.
It allows any numeric, date, boolean or keyword field inside a mapping to be used to sort the index on
disk.
It is not allowed to use a `nested` fields inside an index that defines an index sorting since `nested` fields relies on the original sort of the index.
This change does not add early termination capabilities in the search layer. This will be added in a follow up.

Relates #6720
@jimczi jimczi merged commit f05af0a into elastic:master Apr 19, 2017
@jimczi jimczi deleted the feature/index_sorting branch April 19, 2017 12:36
@jimczi
Copy link
Contributor Author

jimczi commented Apr 19, 2017

Thanks @jpountz !

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Apr 19, 2017
* master:
  Add BucketMetricValue interface (elastic#24188)
  Enable index-time sorting (elastic#24055)
  Clarify elasticsearch user uid:gid mapping in Docker docs
  Update field-names-field.asciidoc (elastic#24178)
  ElectMasterService.hasEnoughMasterNodes should return false if no masters were found
  Remove Ubuntu 12.04 (elastic#24161)
  [Test] Add unit tests for InternalHDRPercentilesTests (elastic#24157)
  Replicate write failures (elastic#23314)
  Rename variable in translog simple commit test
  Strengthen translog commit with open view test
  Stronger check in translog prepare and commit test
  Fix translog prepare commit and commit test
  ingest-node.asciidoc - Clarify json processor (elastic#21876)
  Painless: more testing for script_stack (elastic#24168)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants