Skip to content
Original file line number Diff line number Diff line change
Expand Up @@ -129,7 +129,7 @@ ResolvedIndices resolveIndicesAndAliases(String action, IndicesRequest indicesRe
final boolean replaceWildcards = indicesOptions.expandWildcardsOpen() || indicesOptions.expandWildcardsClosed();

// check for all and return list of authorized indices
if (IndexNameExpressionResolver.isAllIndices(indicesList(indicesRequest.indices()))) {
if (isAllIndices(indicesRequest.indices())) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For ~10K indices, this change makes the method about 1-2 ms faster (~12ms vs ~10ms). This mostly helps requests that are directly created for REST level requests, e.g. SearchRequest.

Currently, the index authorization for SearchRequest with ~10k indices takes about 22ms and its divided as:

  • loadAuthorizedIndices - ~3ms
  • buildIndicesAccessControl - ~7ms
  • ResolveIndices (this method) - ~12ms

So a 2ms reduction is close to 10% performance gain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about making the change in IndexNameExpressionResolver.isAllIndices. But it has a test that specifically make sure false is returned for * ... Not sure whether that is still necessary. Anyway, limiting the change to where it is used is safer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is surprising to me that not converting an empty array to an empty list saves 2 ms, while building a List of 10k elements takes 10 ms.
Or instead of the array->list conversion, it is the fact that * is handled here rather than indexAbstractionResolver.resolveIndexAbstractions? They have the same code, so I don't understand it either.
I might be missing something.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusion. I should have given more context in my initial comment. The time saving is observed for a specific field caps query during benckmarking tests:

GET /*/_field_caps?fields=*&ignore_unavailable=true&allow_no_indices=false

The option ignore_unavailable=true is critical here because that means the following code in the else branch will execute:

if (indicesOptions.ignoreUnavailable()) {
//out of all the explicit names (expanded from wildcards and original ones that were left untouched)
//remove all the ones that the current user is not authorized for and ignore them
replaced = replaced.stream().filter(authorizedIndices::contains).collect(Collectors.toList());
}

If authorizedIndices is large, replaced is also large (because it is an expansion of *), this loop and collecting are where the performance differs.

Copy link
Contributor

@tvernum tvernum Oct 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we replace that stream with something more efficient then?
Like:

 if (indicesOptions.ignoreUnavailable()) { 
     //out of all the explicit names (expanded from wildcards and original ones that were left untouched) 
     //remove all the ones that the current user is not authorized for and ignore them 
     List<String> filtered = null;
     for (ListIterator<String> itr = replaced.listIterator(); itr.hasNext();) {
          final String index = itr.next();
          if (authorizedIndices.contains(index) == false) {
               if (filtered == null) {
                   filtered = new ArrayList(replaced.size()-1);
                   filtered.addAll( replaced.subList(0, itr.previousIndex()) );
               }
          } else {
               if (filtered != null) {
                   filtered.add(index);
               }
          }
    }
    if (filtered != null) {
        replaced = fitered
    }
 } 

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a clever way to avoid list allocation. It does still step through replaced. So for the case of a single *, I think it is still worthwhile to skip it entirely.

Strictly speaking, if every item of the original indices indicesRequest.indices() contains wildcard, we can safely skip this block of code. But the value of skipping kinda decreases with increase in size of the original indices because it introduces its own loop. So maybe we can combine skipping for single original index and your faster loop to something like (note it uses isSimpleMatchPatterninstead ofisMatchAllPattern`):

var indices = indicesRequest.indices();
if (indicesOptions.ignoreUnavailable() && indices.length == 1 && Regex.isSimpleMatchPattern(indices[0])) {
    // Tim's faster loop here
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the comment there mentions, the replaced collection contains authorized indices that match whatever wildcards from the request's index expression plus whatever concrete names the request had, which have been copied over when build replaced.

The code that you're working on improving now is iterating over the replaced collection, after it had been built, and removes the concrete names. But wouldn't it be better if the names were not added to the replaced collection in the first place?
This is what I ended up proposing in #76540 .

Please take a look, if it works like I tell it works, it should improve over your proposal still since it is avoiding iterating over replaced.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @albertzaharovits ! I wasn't aware the existence of #76540 ! I think it is definitely a better solution than this PR. I think #76540 looks great by browsing through it. I need take a closer look tomorrow. But I think it is safe to close this one. Thanks again!

if (replaceWildcards) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you realized that the lone * index expression with replaceWildcards == false now behaves slightly different.
Before, Security would leave the * in-place and would leave core do its thing (fail in some way).
Now, we either fail in Security with index not found exception, or pass it to core as *, -*, which would also probably fail (again because request options say to not replace wildcards).
I'm not terribly concerned, though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I didn't realise that. Thanks for raising it. I was relying on the tests to catch any inconsistent behaviour and it seems we have some test gap.

I did some more digging and found there are behaviour differences:

  1. GET */_search?expand_wildcards=none&allow_no_indices=false -> 404 Error
    • Existing -> 404 Error
    • Proposed -> 404 Error
  2. GET */_search?expand_wildcards=none&allow_no_indices=true ->
    • Existing -> Empty search result
    • Proposed -> 404 Error

The difference is about how allow_no_indices is intepreted. I'd argue that the existing behaviour is not accurate because allow_no_indices implies "expansion" of either wildcards or aliases. But the request explicilty asks for no expansion. Hence the name * should be treated as a literal index name. In that case, the relevant option should be ignore_unavailable instead of allow_no_indices. That is the following request should return empty result

  1. GET */_search?expand_wildcards=none&ignore_unavailable=true
    • Existing -> Empty result
    • Proposed -> Empty result

Based on the following comment, it seems that core is not fully convinced with its own behaviour either.

// If only one index is specified then whether we fail a request if an index is missing depends on the allow_no_indices
// option. At some point we should change this, because there shouldn't be a reason why whether a single index
// or multiple indices are specified yield different behaviour.
final boolean failNoIndices = indexExpressions.length == 1
? options.allowNoIndices() == false
: options.ignoreUnavailable() == false;

That said, behaviour change is what I tried to avoid and it is the reason why I bothered to add special handling for error message. I'll make some adjustments. Thanks again for noticing it.

for (String authorizedIndex : authorizedIndices) {
if (IndexAbstractionResolver.isIndexVisible("*", authorizedIndex, indicesOptions, metadata, nameExpressionResolver,
Expand Down Expand Up @@ -244,6 +244,11 @@ ResolvedIndices resolveIndicesAndAliases(String action, IndicesRequest indicesRe
return resolvedIndicesBuilder.build();
}

static boolean isAllIndices(String[] names) {
return names == null || names.length == 0
|| (names.length == 1 && (Metadata.ALL.equals(names[0]) || Regex.isMatchAllPattern(names[0])));
}

/**
* Special handling of the value to authorize for a put mapping request. Dynamic put mapping
* requests use a concrete index, but we allow permissions to be defined on aliases so if the
Expand Down