Enable index-time sorting #24055
I believe we've been moving away from these Fields objects in general and just naming the constants or even using "sort", depending on the context.
Why package private instead of private?
I think it is also worth leaving a comment about how this is stored like this for easy reading from the settings. It looks funny to my java-accustomed eye.
How would we have gotten here? Would they need to use the test plugin to set the version? I'm not sure this is worth checking.
Not sure either, but this is how we would handle a mixed cluster if we allowed rolling upgrades across major releases. I know it's not possible to have a mixed cluster with 5.x and 6.x nodes, so maybe this is just a paranoid check.
Strings.EMPTY_ARRAY might be worth using here.
This might be easier to read as:

    MultiValueMode mode = modes[i];
    if (mode == null) {
        mode = reverse ? MultiValueMode.MAX : MultiValueMode.MIN;
    }
Can we use the old method and put null all the places that don't use sorting?
I don't get it. Are you suggesting changing all the calls to createEngine to pass an explicit null value? What would that change?
Yeah, I mean add @Nullable Sort indexSort to one of the old ctors and change all the call sites that don't need a sort to provide null. Or maybe a random one? I'm not sure about that.
nested fields are not compatible with index sorting because they rely on the default doc_id sorting. An error will be thrown if index sorting is activated on an index that contains nested fields.
If type is going away maybe we don't want to advertise it here?
We usually don't have this setting in these tests. If it isn't needed I'd drop it.
Maybe it'd be nicer to do it on a field just so we don't rely on type.
Can you sort on _id? That'd make the example pretty simple.
jpountz
left a comment
I did a first quick pass to understand how things work. I'm wondering whether you considered configuring the index sort in the mappings rather than the settings?
s/missing/reverse/
let's use method references instead?
I think the if/else is not needed as the code in the if block would work in all cases?
should we lowercase the modes?
not specific to that PR, but we should create constants for _first and _last
Thanks @jpountz and @nik9000 for reviewing.
I did, but currently the mapping is per type and I did not find an easy way to define something at the mapping level rather than the type level. I am not saying we should not do it, but it would require some non-trivial changes in how we treat mappings. Maybe we could revisit this when we remove _type entirely? Defining the index sort in the settings felt natural to me, so I followed that path. It requires some validation between the mapping and the settings, but I think the change is not that big. WDYT?
jpountz
left a comment
LGTM.
My previous comment about configuring the index sort in the mappings rather than in the settings is not practical. We might want to reconsider when types are gone, but for now I think settings are the way to go.
Can you please add experimental tags to this feature in the docs saying that we might change the way that the index sort is configured?
I think saying that segments are ordered by doc id is a bit confusing. It rather works the other way: the ordering of documents inside a segment defines doc ids. Maybe just keep it to a minimum, e.g. "By default Lucene does not apply any sort."
s/nested/Nested/ and maybe s/on the default doc_id sorting/on the assumption that nested documents are stored in contiguous doc ids, which can be broken by index sorting/?
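The contiguity point can be illustrated with a small, self-contained sketch in plain Java. It uses no Lucene or Elasticsearch classes; `Doc`, `blocksAreContiguous`, and `sortByValue` are hypothetical names invented for this example, and the model (each document tagged with the block it belongs to) is a simplification of how nested documents are stored next to their parent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NestedBlockSketch {
    // Hypothetical stand-in: "block" names the parent/children group a document
    // belongs to, "value" is the field an index sort would order by.
    static final class Doc {
        final String block;
        final int value;
        Doc(String block, int value) { this.block = block; this.value = value; }
    }

    // True if every block occupies one uninterrupted run of positions.
    static boolean blocksAreContiguous(List<Doc> docs) {
        Set<String> finished = new HashSet<>();
        String current = null;
        for (Doc d : docs) {
            if (!d.block.equals(current)) {
                if (current != null) finished.add(current);
                if (finished.contains(d.block)) return false; // block resumed after a gap
                current = d.block;
            }
        }
        return true;
    }

    // Reorders the documents by value, as a sort applied to the segment would.
    static List<Doc> sortByValue(List<Doc> docs) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingInt((Doc d) -> d.value));
        return sorted;
    }

    public static void main(String[] args) {
        List<Doc> segment = Arrays.asList(
            new Doc("p1", 5), new Doc("p1", 1), new Doc("p1", 3),
            new Doc("p2", 2), new Doc("p2", 4));
        System.out.println(blocksAreContiguous(segment));              // true
        System.out.println(blocksAreContiguous(sortByValue(segment))); // false
    }
}
```

In this toy model the blocks interleave once the documents are reordered by value, which is the assumption the suggested wording says index sorting can break.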
s/setting/settings/
This change adds an index setting to define how the documents should be sorted inside each segment. It allows any numeric, date, boolean or keyword field inside a mapping to be used to sort the index on disk. `nested` fields are not allowed inside an index that defines an index sort, since `nested` fields rely on the original sort of the index. This change does not add early termination capabilities in the search layer; this will be added in a follow-up. Relates #6720
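As a rough illustration of what sorting documents inside a segment means, here is a minimal sketch in plain Java. It does not use the Elasticsearch or Lucene APIs and is not the implementation in this PR; `Doc`, `sortSegment`, and the choice to place documents without a value last are assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class IndexSortSketch {
    // Hypothetical stand-in for a document with one sortable numeric field.
    // A null timestamp models a document that has no value for the field.
    static final class Doc {
        final String id;
        final Long timestamp;
        Doc(String id, Long timestamp) { this.id = id; this.timestamp = timestamp; }
    }

    // Returns the ids in the order the documents would be laid out in the segment.
    static List<String> sortSegment(List<Doc> docs, boolean reverse) {
        Comparator<Long> byValue = Comparator.naturalOrder();
        if (reverse) {
            byValue = byValue.reversed();
        }
        // Assumption for this sketch: documents missing the field always sort last.
        Comparator<Doc> order =
            Comparator.comparing((Doc d) -> d.timestamp, Comparator.nullsLast(byValue));
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(order);
        List<String> ids = new ArrayList<>();
        for (Doc d : sorted) {
            ids.add(d.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
            new Doc("a", 30L), new Doc("b", 10L), new Doc("c", null), new Doc("d", 20L));
        System.out.println(sortSegment(docs, false)); // [b, d, a, c]
        System.out.println(sortSegment(docs, true));  // [a, d, b, c]
    }
}
```

The position of a document in the returned list plays the role of its doc id within the segment, so the sort decides the on-disk order rather than the order in which documents were indexed.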
Thanks @jpountz!
* master:
  Add BucketMetricValue interface (elastic#24188)
  Enable index-time sorting (elastic#24055)
  Clarify elasticsearch user uid:gid mapping in Docker docs
  Update field-names-field.asciidoc (elastic#24178)
  ElectMasterService.hasEnoughMasterNodes should return false if no masters were found
  Remove Ubuntu 12.04 (elastic#24161)
  [Test] Add unit tests for InternalHDRPercentilesTests (elastic#24157)
  Replicate write failures (elastic#23314)
  Rename variable in translog simple commit test
  Strengthen translog commit with open view test
  Stronger check in translog prepare and commit test
  Fix translog prepare commit and commit test
  ingest-node.asciidoc - Clarify json processor (elastic#21876)
  Painless: more testing for script_stack (elastic#24168)