Skip to content

Conversation

@markharwood
Copy link
Contributor

Allow searches to fail if they are missing results eg from a timeout or a failure to execute on at least one shard. The allow_partial_search_results flag is a new parameter on search headers and a cluster-level setting controls default behaviour of searches where no preference is passed in the request. In future versions we will change the cluster default setting to throw errors instead of returning partial results.

Closes #27435

@markharwood
Copy link
Contributor Author

jenkins test this

@markharwood markharwood force-pushed the fix/27435v6x branch 2 times, most recently from 0cfd745 to 8a65513 Compare January 4, 2018 10:04
@markharwood
Copy link
Contributor Author

jenkins test this

@markharwood markharwood force-pushed the fix/27435v6x branch 3 times, most recently from d3600a0 to a1500ef Compare January 8, 2018 18:15
@markharwood markharwood requested a review from colings86 January 15, 2018 16:58
Copy link
Contributor

@colings86 colings86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markharwood I left some comments, I have not looked at the bulk of the test changes in this round but will do in the next round of review

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry if this is going to cause intermittent failures on CI where the slaves can sometimes run quite slow so this might still not be long enough? I wonder if @bleskes has any thoughts on how we can avoid using a sleep here?

Maybe we can use assertBusy to call and check the routing table in the cluster state until it is in the expected state?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add similar tests that check updating the cluster settings have the correct effects?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder: we'll need to change this default value in 6.x when we backport. Maybe it would be better to set the default to true when we merge and backport and then have a follow up PR for master that changes the default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a PR for 6.x which means the docs are wrong here - default is true meaning partial results are allowed. We will be forward-porting and changing the default to false.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make this accept a boolean

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not need a Boolean to understand when an explicit choice hasn't been made by the client? It's when the value is null that we substitute for the cluster-level default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah - or did you know this and just mean to make the setters take a bool and keep the getters returning Boolean?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I meant just the setters

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make this a boolean

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with this being node scope is that we could potentially have different values on different nodes (if the user sets this in elasticsearch.yml) which could cause unexpected behaviour. The same is actually already true for the default timeout above but I wonder if we should raise this in an issue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, see my comment against the documentation about whether we should make this default to false in this PR to make the backport straightforward and then change the default in a follow up

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure there is a "Property.ClusterScope"? All cluster-level settings I see (e.g. RecoverySettings.INDICES_RECOVERY_MAX_BYTES_PER_SEC_SETTING ) come defined with Property.NodeScope.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NO unfortunately there is no Property.ClusterScope so I think it makes it more important to have the setting resolved on the coordinating node so that all shards use the same value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to make this a boolean? since there is always a value returned even if the user has not set it themselves?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could resolve the value to true or false earlier (by consulting the cluster setting), for example could we maybe resolve this value in the TransportSearch action so the value is resolved on the coordinating node and consistent for all shard requests? Then this could be a boolean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have code in the TransportSearchAction.executeSearch that resolves the setting but not all tests transits this section of code. Consequently I was seeing NPEs from some tests when an unresolved null SearchRequest Boolean was being assigned to a boolean as a deeper part of query execution. I changed the failing tests to explicitly set SearchRequest.allowPartialSearchResults but it felt like a hack addressing the symptoms rather than the cause. Perhaps there should be a more explicit query preparation stage where user-supplied requests are augmented by any cluster-level options e.g. SearchRequest.fillInMissingParamsWithDefaults(Settings).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think the code style is to avoid negating using !X and prefer X == false so its harder to miss the negation

@markharwood markharwood added :Search/Search Search-related issues that do not fall into other categories v7.0.0 v6.3.0 >feature >enhancement and removed >enhancement labels Jan 23, 2018
@markharwood markharwood force-pushed the fix/27435v6x branch 2 times, most recently from 30f0742 to f5c78fa Compare January 29, 2018 09:50
@markharwood
Copy link
Contributor Author

Ready for another review when you have time, @colings86

Copy link
Contributor

@colings86 colings86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I left a comment about reporting all the missing shards rather than just the first but happy for you to merge if you agree with that change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should gather the missing shards and report all the shards that were unavailable rather than just the first?

… setting = false.

When true will error if search either timeouts, has partial errors or has missing shards rather
than returning partial search results.

Closes elastic#27435
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>feature :Search/Search Search-related issues that do not fall into other categories v6.3.0 v7.0.0-beta1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants