Faster filtering for unavailable indices #78544

ywangd · 2021-10-01T01:57:21Z

IndicesAndAliasesResolver has a quicker path if the requested indices
refer to all indices. However, it only considers empty or "_all" as
referring to all indices, but not the "*" wildcard.

This PR ensures "*" is handled as all indices as well for slightly better
performance.
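
A minimal sketch of the rule described above, assuming a local helper rather than the actual Elasticsearch code. The class and method below are hypothetical stand-ins; only the rule itself (empty, "_all", or a lone "*" means all indices) comes from the description.

```java
final class AllIndicesCheck {
    // Hypothetical helper: treats null/empty, "_all", and a lone "*" as "all indices".
    static boolean isAllIndices(String[] indices) {
        if (indices == null || indices.length == 0) {
            return true;
        }
        return indices.length == 1 && ("_all".equals(indices[0]) || "*".equals(indices[0]));
    }

    public static void main(String[] args) {
        System.out.println(isAllIndices(new String[] { "*" }));
        System.out.println(isAllIndices(new String[] { "logs-*" }));
    }
}
```

A pattern such as "logs-*" is deliberately not treated as "all indices" here, since it can match only a subset.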

elasticmachine · 2021-10-01T01:57:25Z

Pinging @elastic/es-security (Team:Security)

ywangd · 2021-10-01T02:04:31Z

...security/src/main/java/org/elasticsearch/xpack/security/authz/IndicesAndAliasesResolver.java


            // check for all and return list of authorized indices
-            if (IndexNameExpressionResolver.isAllIndices(indicesList(indicesRequest.indices()))) {
+            if (isAllIndices(indicesRequest.indices())) {


For ~10K indices, this change makes the method about 1-2 ms faster (~12ms vs ~10ms). This mostly helps requests that are directly created for REST level requests, e.g. SearchRequest.

Currently, the index authorization for SearchRequest with ~10k indices takes about 22ms and it is divided as follows:

loadAuthorizedIndices - ~3ms

buildIndicesAccessControl - ~7ms

ResolveIndices (this method) - ~12ms

So a 2ms reduction is close to 10% performance gain.

I thought about making the change in IndexNameExpressionResolver.isAllIndices. But it has a test that specifically makes sure false is returned for * ... Not sure whether that is still necessary. Anyway, limiting the change to where it is used is safer.

It is surprising to me that not converting an empty array to an empty list saves 2 ms, while building a List of 10k elements takes 10 ms.
Or instead of the array->list conversion, it is the fact that * is handled here rather than indexAbstractionResolver.resolveIndexAbstractions? They have the same code, so I don't understand it either.
I might be missing something.

Sorry for the confusion. I should have given more context in my initial comment. The time saving is observed for a specific field caps query during benchmarking tests:

GET /*/_field_caps?fields=*&ignore_unavailable=true&allow_no_indices=false

The option ignore_unavailable=true is critical here because that means the following code in the else branch will execute:

elasticsearch/x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authz/IndicesAndAliasesResolver.java

Lines 152 to 156 in 324996e

if (indicesOptions.ignoreUnavailable()) {
    //out of all the explicit names (expanded from wildcards and original ones that were left untouched)
    //remove all the ones that the current user is not authorized for and ignore them
    replaced = replaced.stream().filter(authorizedIndices::contains).collect(Collectors.toList());
}

If authorizedIndices is large, replaced is also large (because it is an expansion of *), this loop and collecting are where the performance differs.

Should we replace that stream with something more efficient then?
Like:

if (indicesOptions.ignoreUnavailable()) {
    //out of all the explicit names (expanded from wildcards and original ones that were left untouched)
    //remove all the ones that the current user is not authorized for and ignore them
    List<String> filtered = null;
    for (ListIterator<String> itr = replaced.listIterator(); itr.hasNext();) {
        final String index = itr.next();
        if (authorizedIndices.contains(index) == false) {
            if (filtered == null) {
                // first unauthorized name: copy everything seen before it
                filtered = new ArrayList<>(replaced.size() - 1);
                filtered.addAll(replaced.subList(0, itr.previousIndex()));
            }
        } else {
            if (filtered != null) {
                filtered.add(index);
            }
        }
    }
    if (filtered != null) {
        replaced = filtered;
    }
}

That's a clever way to avoid list allocation. It does still step through replaced. So for the case of a single *, I think it is still worthwhile to skip it entirely.

Strictly speaking, if every item of the original indices indicesRequest.indices() contains a wildcard, we can safely skip this block of code. But the value of skipping kinda decreases as the size of the original indices increases, because it introduces its own loop. So maybe we can combine skipping for a single original index and your faster loop into something like (note it uses isSimpleMatchPattern instead of isMatchAllPattern):

var indices = indicesRequest.indices();
if (indicesOptions.ignoreUnavailable() && indices.length == 1 && Regex.isSimpleMatchPattern(indices[0])) {
    // Tim's faster loop here
}

As the comment there mentions, the replaced collection contains authorized indices that match whatever wildcards appear in the request's index expression, plus whatever concrete names the request had, which have been copied over when building replaced.

The code that you're working on improving now iterates over the replaced collection, after it has been built, and removes the concrete names. But wouldn't it be better if the names were not added to the replaced collection in the first place?
This is what I ended up proposing in #76540 .

Please take a look. If it works the way I describe, it should still improve on your proposal, since it avoids iterating over replaced.

Thanks @albertzaharovits ! I wasn't aware of the existence of #76540 ! From browsing through it, I think it looks great and is definitely a better solution than this PR. I need to take a closer look tomorrow. But I think it is safe to close this one. Thanks again!

...security/src/main/java/org/elasticsearch/xpack/security/authz/IndicesAndAliasesResolver.java

albertzaharovits · 2021-10-05T18:27:27Z

...security/src/main/java/org/elasticsearch/xpack/security/authz/IndicesAndAliasesResolver.java

            // check for all and return list of authorized indices
+//            if (isAllIndices(indicesRequest.indices())) {
            if (IndexNameExpressionResolver.isAllIndices(indicesList(indicesRequest.indices()))) {
                if (replaceWildcards) {


I think you realized that the lone * index expression with replaceWildcards == false now behaves slightly differently.
Before, Security would leave the * in place and would let core do its thing (fail in some way).
Now, we either fail in Security with index not found exception, or pass it to core as *, -*, which would also probably fail (again because request options say to not replace wildcards).
I'm not terribly concerned, though.

No, I didn't realise that. Thanks for raising it. I was relying on the tests to catch any inconsistent behaviour and it seems we have some test gap.

I did some more digging and found there are behaviour differences:

GET */_search?expand_wildcards=none&allow_no_indices=false

Existing -> 404 Error

Proposed -> 404 Error

GET */_search?expand_wildcards=none&allow_no_indices=true

Existing -> Empty search result

Proposed -> 404 Error

The difference is about how allow_no_indices is interpreted. I'd argue that the existing behaviour is not accurate because allow_no_indices implies "expansion" of either wildcards or aliases. But the request explicitly asks for no expansion. Hence the name * should be treated as a literal index name. In that case, the relevant option should be ignore_unavailable instead of allow_no_indices. That is, the following request should return an empty result:

GET */_search?expand_wildcards=none&ignore_unavailable=true

Existing -> Empty result

Proposed -> Empty result

Based on the following comment, it seems that core is not fully convinced by its own behaviour either.

elasticsearch/server/src/main/java/org/elasticsearch/cluster/metadata/IndexNameExpressionResolver.java

Lines 218 to 223 in eeb09f0

// If only one index is specified then whether we fail a request if an index is missing depends on the allow_no_indices
// option. At some point we should change this, because there shouldn't be a reason why whether a single index
// or multiple indices are specified yield different behaviour.
final boolean failNoIndices = indexExpressions.length == 1
    ? options.allowNoIndices() == false
    : options.ignoreUnavailable() == false;

That said, a behaviour change is what I tried to avoid, and it is the reason why I bothered to add special handling for the error message. I'll make some adjustments. Thanks again for noticing it.
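
The quoted failNoIndices expression can be exercised in isolation. This is only a sketch of that one expression, with a hypothetical class name and plain parameters in place of the real IndicesOptions object; the actual resolver contains more logic around it.

```java
final class FailNoIndicesRule {
    // Mirrors the quoted expression: with exactly one index expression the
    // decision depends on allow_no_indices, otherwise on ignore_unavailable.
    static boolean failNoIndices(int expressionCount, boolean allowNoIndices, boolean ignoreUnavailable) {
        return expressionCount == 1 ? allowNoIndices == false : ignoreUnavailable == false;
    }

    public static void main(String[] args) {
        // Single expression: only allow_no_indices matters.
        System.out.println(failNoIndices(1, true, false));
        System.out.println(failNoIndices(1, false, true));
        // Two expressions: only ignore_unavailable matters.
        System.out.println(failNoIndices(2, true, false));
        System.out.println(failNoIndices(2, false, true));
    }
}
```

With a single expression, ignore_unavailable has no influence on the outcome, which is the asymmetry the code comment in core points out.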

...security/src/main/java/org/elasticsearch/xpack/security/authz/IndicesAndAliasesResolver.java

albertzaharovits

Overall I'm not sure this saves significant resources.
Maybe a few ifs, here and there, but then we'll have to go over all the names in the cluster state which dwarfs any savings, if any.
Am I missing something?

ywangd · 2021-10-06T05:06:36Z

Overall I'm not sure this saves significant resources. Maybe a few ifs, here and there, but then we'll have to go over all the names in the cluster state which dwarfs any savings, if any. Am I missing something?

As detailed in the replies above, I think it is better to further limit the scope of this change to:

Avoid behaviour change
Stay closer to where the performance gain is achieved

Therefore, I'd like to just guard the following loop

elasticsearch/x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authz/IndicesAndAliasesResolver.java

Lines 152 to 156 in eeb09f0

    
           if (indicesOptions.ignoreUnavailable()) { 
        
               //out of all the explicit names (expanded from wildcards and original ones that were left untouched) 
        
               //remove all the ones that the current user is not authorized for and ignore them 
        
               replaced = replaced.stream().filter(authorizedIndices::contains).collect(Collectors.toList()); 
        
           }

with an additional test so it becomes

if (indicesOptions.ignoreUnavailable() 
    && false == (replaceWildcards && Regex.isMatchAllPattern(indicesRequest.indices()[0]))) {
...
}

This means the loop is not executed if the requested indices is a single * and wildcard expansion is on. Please let me know what you think. Thanks!

tvernum · 2021-10-07T03:50:46Z

if (indicesOptions.ignoreUnavailable()
&& false == (replaceWildcards && Regex.isMatchAllPattern(indicesRequest.indices()[0]))) {
...
}

I think we want

if (indicesOptions.ignoreUnavailable() 
    && split.getLocal().stream().allMatch(Regex::isSimpleMatchPattern) == false)

That is, we need to (potentially) remove unavailable (strictly speaking, "inaccessible") indices if there were any concrete indices in the local indices list.
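
A rough sketch of that guard, under stated assumptions: isSimpleMatchPattern below is a local stand-in for Elasticsearch's Regex.isSimpleMatchPattern (assumed here to mean "contains a '*'"), and the class name, method name, and plain list parameter are hypothetical replacements for split.getLocal().

```java
import java.util.List;

final class FilterGuard {
    // Stand-in for Regex.isSimpleMatchPattern, assumed to mean "contains a '*'".
    static boolean isSimpleMatchPattern(String expression) {
        return expression.indexOf('*') != -1;
    }

    // Filtering is needed only when ignore_unavailable is set and at least one
    // requested local expression is a concrete name rather than a wildcard.
    static boolean needsFiltering(boolean ignoreUnavailable, List<String> localExpressions) {
        return ignoreUnavailable
            && localExpressions.stream().allMatch(FilterGuard::isSimpleMatchPattern) == false;
    }

    public static void main(String[] args) {
        System.out.println(needsFiltering(true, List.of("logs-*", "*")));
        System.out.println(needsFiltering(true, List.of("logs-*", "index1")));
    }
}
```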

That's a clever way to avoid list allocation. It does still step through replaced.

I suspect you'll find that the main performance issue is multiple re-allocations. You might expect that Collectors.toList generates a list with the correct initial size, but it doesn't - my guess (based on previous profiling) is that simply creating the new list to be the same initial size as the original list would actually give you most of the performance improvement you're seeing.
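
To illustrate the pre-sizing idea, here is a minimal sketch that filters into a list allocated up front with the size of the original. The class and method names are hypothetical, and authorizedIndices is assumed to be a Set for the sake of the example. Whether this recovers most of the gain would still need to be measured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class PresizedFilter {
    static List<String> filterAuthorized(List<String> replaced, Set<String> authorizedIndices) {
        // Allocate once with the upper bound on the result size, so the
        // backing array never has to grow while elements are added.
        List<String> filtered = new ArrayList<>(replaced.size());
        for (String index : replaced) {
            if (authorizedIndices.contains(index)) {
                filtered.add(index);
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        System.out.println(filterAuthorized(List.of("a", "b", "c"), Set.of("a", "c")));
    }
}
```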

ywangd · 2021-10-11T02:41:23Z

I incorporated Tim's suggestion as well as the skipping logic for a single wildcard pattern. This is ready for another look.

ywangd · 2021-10-11T03:38:36Z

...ity/src/test/java/org/elasticsearch/xpack/security/authz/IndicesAndAliasesResolverTests.java


    public void testResolveNoExpandIgnoreUnavailable() {
-        SearchRequest request = new SearchRequest("missing*");
+        SearchRequest request = new SearchRequest(randomFrom("missing*", "*"));


argh .. I realised there is another edge case where we cannot safely skip filtering unavailable indices for a single *. If the cluster truly has no indices at all, * will be resolved into itself, i.e. a literal *, and it should be removed from the resolved indices if ignoreUnavailable is requested (this test is updated to ensure that).

As a result, I decided to drop my original proposal for the special handling and just keep Tim's change for the faster loop.
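
The edge case can be reproduced with a small sketch. It assumes, as the comment above states, that with no indices in the cluster the resolved list still holds a literal "*"; the class name is hypothetical and authorizedIndices is modelled as a plain Set.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class EmptyClusterEdgeCase {
    // Same filtering step as in the resolver snippet quoted earlier.
    static List<String> filterUnavailable(List<String> replaced, Set<String> authorizedIndices) {
        return replaced.stream().filter(authorizedIndices::contains).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Assumption: "*" was left as a literal because nothing matched it.
        List<String> replaced = List.of("*");
        Set<String> authorizedIndices = Set.of();
        // The literal "*" is not an authorized index, so filtering removes it.
        System.out.println(filterUnavailable(replaced, authorizedIndices).isEmpty());
    }
}
```

Skipping the filter for a lone * would leave that literal in the list, which is why the skip was dropped.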

albertzaharovits

Please take a look at #76540 , maybe it achieves the same objective in a just slightly more efficient manner.

ywangd · 2021-10-13T12:24:53Z

Closing in favor of #76540

ywangd added >enhancement :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC v8.0.0 v7.16.0 labels Oct 1, 2021

ywangd requested review from albertzaharovits and tvernum October 1, 2021 01:57

elasticmachine added the Team:Security Meta label for security team label Oct 1, 2021

ywangd commented Oct 1, 2021

fix tests

c483e4d

albertzaharovits reviewed Oct 5, 2021

...security/src/main/java/org/elasticsearch/xpack/security/authz/IndicesAndAliasesResolver.java

albertzaharovits reviewed Oct 5, 2021

...security/src/main/java/org/elasticsearch/xpack/security/authz/IndicesAndAliasesResolver.java

albertzaharovits reviewed Oct 5, 2021

ywangd and others added 3 commits October 6, 2021 14:28

Update x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/s…

3b10fa2

…ecurity/authz/IndicesAndAliasesResolver.java Co-authored-by: Albert Zaharovits <[email protected]>

Merge remote-tracking branch 'origin/master' into match-all-wildcard-…

af276f1

…as-all

Merge remote-tracking branch 'origin/master' into match-all-wildcard-…

734a7ad

…as-all

ywangd added 3 commits October 11, 2021 13:23

Merge remote-tracking branch 'origin/master' into match-all-wildcard-…

b4237b4

…as-all

address feedback

f9a2c88

tweak

4268c27

ywangd requested a review from albertzaharovits October 11, 2021 02:38

extract method

90c1b00

ywangd changed the title ~~Treat * as all indices when resolving indices~~ Skip or faster filtering for unavailable indices Oct 11, 2021

ywangd changed the title ~~Skip or faster filtering for unavailable indices~~ Faster filtering for unavailable indices Oct 11, 2021

fix tests

69f56c6

ywangd commented Oct 11, 2021

albertzaharovits reviewed Oct 12, 2021

ywangd closed this Oct 13, 2021

jakelandis added v8.0.0-beta1 and removed v8.0.0 labels Oct 27, 2021
