Skip to content

Conversation

@ywangd
Copy link
Member

@ywangd ywangd commented Oct 1, 2021

IndicesAndAliasesResolver has a quicker path if the requested indices
refer to all indices. However, it only consider empty or "_all" as
referring to all indices, but not the "*" wildcard.

This PR ensure "*" is handled as all indices as well for slightly better
performance.

IndicesAndAliasesResolver has a quicker path if the requested indices
refer to all indices. However, it only consider empty or "_all" as
referring to all indices, but not the "*" wildcard.

This PR ensure "*" is handled as all indices as well for slightly better
performance.
@ywangd ywangd added >enhancement :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC v8.0.0 v7.16.0 labels Oct 1, 2021
@elasticmachine elasticmachine added the Team:Security Meta label for security team label Oct 1, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-security (Team:Security)


// check for all and return list of authorized indices
if (IndexNameExpressionResolver.isAllIndices(indicesList(indicesRequest.indices()))) {
if (isAllIndices(indicesRequest.indices())) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For ~10K indices, this change makes the method about 1-2 ms faster (~12ms vs ~10ms). This mostly helps requests that are directly created for REST level requests, e.g. SearchRequest.

Currently, the index authorization for SearchRequest with ~10k indices takes about 22ms and its divided as:

  • loadAuthorizedIndices - ~3ms
  • buildIndicesAccessControl - ~7ms
  • ResolveIndices (this method) - ~12ms

So a 2ms reduction is close to 10% performance gain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about making the change in IndexNameExpressionResolver.isAllIndices. But it has a test that specifically make sure false is returned for * ... Not sure whether that is still necessary. Anyway, limiting the change to where it is used is safer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is surprising to me that not converting an empty array to an empty list saves 2 ms, while building a List of 10k elements takes 10 ms.
Or instead of the array->list conversion, it is the fact that * is handled here rather than indexAbstractionResolver.resolveIndexAbstractions? They have the same code, so I don't understand it either.
I might be missing something.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusion. I should have given more context in my initial comment. The time saving is observed for a specific field caps query during benckmarking tests:

GET /*/_field_caps?fields=*&ignore_unavailable=true&allow_no_indices=false

The option ignore_unavailable=true is critical here because that means the following code in the else branch will execute:

if (indicesOptions.ignoreUnavailable()) {
//out of all the explicit names (expanded from wildcards and original ones that were left untouched)
//remove all the ones that the current user is not authorized for and ignore them
replaced = replaced.stream().filter(authorizedIndices::contains).collect(Collectors.toList());
}

If authorizedIndices is large, replaced is also large (because it is an expansion of *), this loop and collecting are where the performance differs.

Copy link
Contributor

@tvernum tvernum Oct 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we replace that stream with something more efficient then?
Like:

 if (indicesOptions.ignoreUnavailable()) { 
     //out of all the explicit names (expanded from wildcards and original ones that were left untouched) 
     //remove all the ones that the current user is not authorized for and ignore them 
     List<String> filtered = null;
     for (ListIterator<String> itr = replaced.listIterator(); itr.hasNext();) {
          final String index = itr.next();
          if (authorizedIndices.contains(index) == false) {
               if (filtered == null) {
                   filtered = new ArrayList(replaced.size()-1);
                   filtered.addAll( replaced.subList(0, itr.previousIndex()) );
               }
          } else {
               if (filtered != null) {
                   filtered.add(index);
               }
          }
    }
    if (filtered != null) {
        replaced = fitered
    }
 } 

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a clever way to avoid list allocation. It does still step through replaced. So for the case of a single *, I think it is still worthwhile to skip it entirely.

Strictly speaking, if every item of the original indices indicesRequest.indices() contains wildcard, we can safely skip this block of code. But the value of skipping kinda decreases with increase in size of the original indices because it introduces its own loop. So maybe we can combine skipping for single original index and your faster loop to something like (note it uses isSimpleMatchPatterninstead ofisMatchAllPattern`):

var indices = indicesRequest.indices();
if (indicesOptions.ignoreUnavailable() && indices.length == 1 && Regex.isSimpleMatchPattern(indices[0])) {
    // Tim's faster loop here
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the comment there mentions, the replaced collection contains authorized indices that match whatever wildcards from the request's index expression plus whatever concrete names the request had, which have been copied over when build replaced.

The code that you're working on improving now is iterating over the replaced collection, after it had been built, and removes the concrete names. But wouldn't it be better if the names were not added to the replaced collection in the first place?
This is what I ended up proposing in #76540 .

Please take a look, if it works like I tell it works, it should improve over your proposal still since it is avoiding iterating over replaced.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @albertzaharovits ! I wasn't aware the existence of #76540 ! I think it is definitely a better solution than this PR. I think #76540 looks great by browsing through it. I need take a closer look tomorrow. But I think it is safe to close this one. Thanks again!

// check for all and return list of authorized indices
// if (isAllIndices(indicesRequest.indices())) {
if (IndexNameExpressionResolver.isAllIndices(indicesList(indicesRequest.indices()))) {
if (replaceWildcards) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you realized that the lone * index expression with replaceWildcards == false now behaves slightly different.
Before, Security would leave the * in-place and would leave core do its thing (fail in some way).
Now, we either fail in Security with index not found exception, or pass it to core as *, -*, which would also probably fail (again because request options say to not replace wildcards).
I'm not terribly concerned, though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I didn't realise that. Thanks for raising it. I was relying on the tests to catch any inconsistent behaviour and it seems we have some test gap.

I did some more digging and found there are behaviour differences:

  1. GET */_search?expand_wildcards=none&allow_no_indices=false -> 404 Error
    • Existing -> 404 Error
    • Proposed -> 404 Error
  2. GET */_search?expand_wildcards=none&allow_no_indices=true ->
    • Existing -> Empty search result
    • Proposed -> 404 Error

The difference is about how allow_no_indices is intepreted. I'd argue that the existing behaviour is not accurate because allow_no_indices implies "expansion" of either wildcards or aliases. But the request explicilty asks for no expansion. Hence the name * should be treated as a literal index name. In that case, the relevant option should be ignore_unavailable instead of allow_no_indices. That is the following request should return empty result

  1. GET */_search?expand_wildcards=none&ignore_unavailable=true
    • Existing -> Empty result
    • Proposed -> Empty result

Based on the following comment, it seems that core is not fully convinced with its own behaviour either.

// If only one index is specified then whether we fail a request if an index is missing depends on the allow_no_indices
// option. At some point we should change this, because there shouldn't be a reason why whether a single index
// or multiple indices are specified yield different behaviour.
final boolean failNoIndices = indexExpressions.length == 1
? options.allowNoIndices() == false
: options.ignoreUnavailable() == false;

That said, behaviour change is what I tried to avoid and it is the reason why I bothered to add special handling for error message. I'll make some adjustments. Thanks again for noticing it.

Copy link
Contributor

@albertzaharovits albertzaharovits left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I'm not sure this saves significant resources.
Maybe a few ifs, here and there, but then we'll have to go over all the names in the cluster state which dwarfs any savings, if any.
Am I missing something?

@ywangd
Copy link
Member Author

ywangd commented Oct 6, 2021

Overall I'm not sure this saves significant resources. Maybe a few ifs, here and there, but then we'll have to go over all the names in the cluster state which dwarfs any savings, if any. Am I missing something?

As detailed in above replies, I think it is better to further limit the scope of this change to:

  1. Avoid behaviour change
  2. Stay closer to where the performance gain is achieved

Therefore, I'd like to just guard the following loop

if (indicesOptions.ignoreUnavailable()) {
//out of all the explicit names (expanded from wildcards and original ones that were left untouched)
//remove all the ones that the current user is not authorized for and ignore them
replaced = replaced.stream().filter(authorizedIndices::contains).collect(Collectors.toList());
}

with an additional test so it becomes

if (indicesOptions.ignoreUnavailable() 
    && false == (replaceWildcards && Regex.isMatchAllPattern(indicesRequest.indices()[0]))) {
...
}

This means the loop is not executed if the requested indices is a single * and wildcard expansion is on. Please let me know what you think. Thanks!

@tvernum
Copy link
Contributor

tvernum commented Oct 7, 2021

if (indicesOptions.ignoreUnavailable()
&& false == (replaceWildcards && Regex.isMatchAllPattern(indicesRequest.indices()[0]))) {
...
}

I think we want

if (indicesOptions.ignoreUnavailable() 
    && split.getLocal().stream().allMatch(Regex::isSimpleMatchPattern) == false)

That is, we need to (potentially) remove unavailable (strictly speaking, "inaccessible") indices if there were any concrete indices in the local indices list.

That's a clever way to avoid list allocation. It does still step through replaced.

I suspect you'll find that the main performance issue is multiple re-allocations. You might expect that Collectors.toList generates a list with the correct initial size, but it doesn't - my guess (based on previous profiling) is that simply creating the new list to be the same initial size as the original list would actually give you most of the performance improvement you're seeing.

@ywangd
Copy link
Member Author

ywangd commented Oct 11, 2021

I incorporated Tim's suggestion as well as the skipping logic for a single wildcard pattern. This is ready for another look.

@ywangd ywangd changed the title Treat * as all indices when resolving indices Skip or faster filtering for unavailable indices Oct 11, 2021
@ywangd ywangd changed the title Skip or faster filtering for unavailable indices Faster filtering for unavailable indices Oct 11, 2021

public void testResolveNoExpandIgnoreUnavailable() {
SearchRequest request = new SearchRequest("missing*");
SearchRequest request = new SearchRequest(randomFrom("missing*", "*"));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

argh .. I realised there is another edge case where we cannot safely skip filtering unavailable indices for a single *. If the cluster truly has no indices at all, * will be resolved into itself, i.e. a literal * and it should be removed from the resolved indices if ingoreUnavailable is requested (this test is updated to ensure that).

As a result, I decided to drop my original proposal for the special handling and just leave Tim's change for faster loop.

Copy link
Contributor

@albertzaharovits albertzaharovits left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take a look at #76540 , maybe it achieves the same objective in a just slightly more efficient manner.

@ywangd
Copy link
Member Author

ywangd commented Oct 13, 2021

Closing in favor of #76540

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>enhancement :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC Team:Security Meta label for security team v7.16.0 v8.0.0-beta1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants