-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Faster filtering for unavailable indices #78544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
IndicesAndAliasesResolver has a quicker path if the requested indices refer to all indices. However, it only consider empty or "_all" as referring to all indices, but not the "*" wildcard. This PR ensure "*" is handled as all indices as well for slightly better performance.
|
Pinging @elastic/es-security (Team:Security) |
|
|
||
| // check for all and return list of authorized indices | ||
| if (IndexNameExpressionResolver.isAllIndices(indicesList(indicesRequest.indices()))) { | ||
| if (isAllIndices(indicesRequest.indices())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For ~10K indices, this change makes the method about 1-2 ms faster (~12ms vs ~10ms). This mostly helps requests that are directly created for REST level requests, e.g. SearchRequest.
Currently, the index authorization for SearchRequest with ~10k indices takes about 22ms and its divided as:
- loadAuthorizedIndices - ~3ms
- buildIndicesAccessControl - ~7ms
- ResolveIndices (this method) - ~12ms
So a 2ms reduction is close to 10% performance gain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about making the change in IndexNameExpressionResolver.isAllIndices. But it has a test that specifically make sure false is returned for * ... Not sure whether that is still necessary. Anyway, limiting the change to where it is used is safer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is surprising to me that not converting an empty array to an empty list saves 2 ms, while building a List of 10k elements takes 10 ms.
Or instead of the array->list conversion, it is the fact that * is handled here rather than indexAbstractionResolver.resolveIndexAbstractions? They have the same code, so I don't understand it either.
I might be missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the confusion. I should have given more context in my initial comment. The time saving is observed for a specific field caps query during benckmarking tests:
GET /*/_field_caps?fields=*&ignore_unavailable=true&allow_no_indices=false
The option ignore_unavailable=true is critical here because that means the following code in the else branch will execute:
Lines 152 to 156 in 324996e
| if (indicesOptions.ignoreUnavailable()) { | |
| //out of all the explicit names (expanded from wildcards and original ones that were left untouched) | |
| //remove all the ones that the current user is not authorized for and ignore them | |
| replaced = replaced.stream().filter(authorizedIndices::contains).collect(Collectors.toList()); | |
| } |
If authorizedIndices is large, replaced is also large (because it is an expansion of *), this loop and collecting are where the performance differs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we replace that stream with something more efficient then?
Like:
if (indicesOptions.ignoreUnavailable()) {
//out of all the explicit names (expanded from wildcards and original ones that were left untouched)
//remove all the ones that the current user is not authorized for and ignore them
List<String> filtered = null;
for (ListIterator<String> itr = replaced.listIterator(); itr.hasNext();) {
final String index = itr.next();
if (authorizedIndices.contains(index) == false) {
if (filtered == null) {
filtered = new ArrayList(replaced.size()-1);
filtered.addAll( replaced.subList(0, itr.previousIndex()) );
}
} else {
if (filtered != null) {
filtered.add(index);
}
}
}
if (filtered != null) {
replaced = fitered
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a clever way to avoid list allocation. It does still step through replaced. So for the case of a single *, I think it is still worthwhile to skip it entirely.
Strictly speaking, if every item of the original indices indicesRequest.indices() contains wildcard, we can safely skip this block of code. But the value of skipping kinda decreases with increase in size of the original indices because it introduces its own loop. So maybe we can combine skipping for single original index and your faster loop to something like (note it uses isSimpleMatchPatterninstead ofisMatchAllPattern`):
var indices = indicesRequest.indices();
if (indicesOptions.ignoreUnavailable() && indices.length == 1 && Regex.isSimpleMatchPattern(indices[0])) {
// Tim's faster loop here
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the comment there mentions, the replaced collection contains authorized indices that match whatever wildcards from the request's index expression plus whatever concrete names the request had, which have been copied over when build replaced.
The code that you're working on improving now is iterating over the replaced collection, after it had been built, and removes the concrete names. But wouldn't it be better if the names were not added to the replaced collection in the first place?
This is what I ended up proposing in #76540 .
Please take a look, if it works like I tell it works, it should improve over your proposal still since it is avoiding iterating over replaced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @albertzaharovits ! I wasn't aware the existence of #76540 ! I think it is definitely a better solution than this PR. I think #76540 looks great by browsing through it. I need take a closer look tomorrow. But I think it is safe to close this one. Thanks again!
...security/src/main/java/org/elasticsearch/xpack/security/authz/IndicesAndAliasesResolver.java
Outdated
Show resolved
Hide resolved
| // check for all and return list of authorized indices | ||
| // if (isAllIndices(indicesRequest.indices())) { | ||
| if (IndexNameExpressionResolver.isAllIndices(indicesList(indicesRequest.indices()))) { | ||
| if (replaceWildcards) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you realized that the lone * index expression with replaceWildcards == false now behaves slightly different.
Before, Security would leave the * in-place and would leave core do its thing (fail in some way).
Now, we either fail in Security with index not found exception, or pass it to core as *, -*, which would also probably fail (again because request options say to not replace wildcards).
I'm not terribly concerned, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I didn't realise that. Thanks for raising it. I was relying on the tests to catch any inconsistent behaviour and it seems we have some test gap.
I did some more digging and found there are behaviour differences:
GET */_search?expand_wildcards=none&allow_no_indices=false->404Error- Existing ->
404Error - Proposed ->
404Error
- Existing ->
GET */_search?expand_wildcards=none&allow_no_indices=true->- Existing -> Empty search result
- Proposed ->
404Error
The difference is about how allow_no_indices is intepreted. I'd argue that the existing behaviour is not accurate because allow_no_indices implies "expansion" of either wildcards or aliases. But the request explicilty asks for no expansion. Hence the name * should be treated as a literal index name. In that case, the relevant option should be ignore_unavailable instead of allow_no_indices. That is the following request should return empty result
GET */_search?expand_wildcards=none&ignore_unavailable=true- Existing -> Empty result
- Proposed -> Empty result
Based on the following comment, it seems that core is not fully convinced with its own behaviour either.
Lines 218 to 223 in eeb09f0
| // If only one index is specified then whether we fail a request if an index is missing depends on the allow_no_indices | |
| // option. At some point we should change this, because there shouldn't be a reason why whether a single index | |
| // or multiple indices are specified yield different behaviour. | |
| final boolean failNoIndices = indexExpressions.length == 1 | |
| ? options.allowNoIndices() == false | |
| : options.ignoreUnavailable() == false; |
That said, behaviour change is what I tried to avoid and it is the reason why I bothered to add special handling for error message. I'll make some adjustments. Thanks again for noticing it.
...security/src/main/java/org/elasticsearch/xpack/security/authz/IndicesAndAliasesResolver.java
Outdated
Show resolved
Hide resolved
albertzaharovits
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I'm not sure this saves significant resources.
Maybe a few ifs, here and there, but then we'll have to go over all the names in the cluster state which dwarfs any savings, if any.
Am I missing something?
…ecurity/authz/IndicesAndAliasesResolver.java Co-authored-by: Albert Zaharovits <[email protected]>
As detailed in above replies, I think it is better to further limit the scope of this change to:
Therefore, I'd like to just guard the following loop Lines 152 to 156 in eeb09f0
with an additional test so it becomes This means the loop is not executed if the requested indices is a single |
I think we want That is, we need to (potentially) remove unavailable (strictly speaking, "inaccessible") indices if there were any concrete indices in the local indices list.
I suspect you'll find that the main performance issue is multiple re-allocations. You might expect that |
|
I incorporated Tim's suggestion as well as the skipping logic for a single wildcard pattern. This is ready for another look. |
|
|
||
| public void testResolveNoExpandIgnoreUnavailable() { | ||
| SearchRequest request = new SearchRequest("missing*"); | ||
| SearchRequest request = new SearchRequest(randomFrom("missing*", "*")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
argh .. I realised there is another edge case where we cannot safely skip filtering unavailable indices for a single *. If the cluster truly has no indices at all, * will be resolved into itself, i.e. a literal * and it should be removed from the resolved indices if ingoreUnavailable is requested (this test is updated to ensure that).
As a result, I decided to drop my original proposal for the special handling and just leave Tim's change for faster loop.
albertzaharovits
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please take a look at #76540 , maybe it achieves the same objective in a just slightly more efficient manner.
|
Closing in favor of #76540 |
IndicesAndAliasesResolver has a quicker path if the requested indices
refer to all indices. However, it only consider empty or "_all" as
referring to all indices, but not the "*" wildcard.
This PR ensure "*" is handled as all indices as well for slightly better
performance.