diff --git a/docs/reference/index-modules.asciidoc b/docs/reference/index-modules.asciidoc index aa95bb8fa2e33..df4688db9d7f8 100644 --- a/docs/reference/index-modules.asciidoc +++ b/docs/reference/index-modules.asciidoc @@ -205,6 +205,7 @@ specific index module: The maximum number of terms that can be used in Terms Query. Defaults to `65536`. +[[index-max-regex-length]] `index.max_regex_length`:: The maximum length of regex that can be used in Regexp Query. diff --git a/docs/reference/query-dsl.asciidoc b/docs/reference/query-dsl.asciidoc index 74d22d6de411e..1a279101531c2 100644 --- a/docs/reference/query-dsl.asciidoc +++ b/docs/reference/query-dsl.asciidoc @@ -47,4 +47,6 @@ include::query-dsl/term-level-queries.asciidoc[] include::query-dsl/minimum-should-match.asciidoc[] -include::query-dsl/multi-term-rewrite.asciidoc[] \ No newline at end of file +include::query-dsl/multi-term-rewrite.asciidoc[] + +include::query-dsl/regexp-syntax.asciidoc[] \ No newline at end of file diff --git a/docs/reference/query-dsl/regexp-query.asciidoc b/docs/reference/query-dsl/regexp-query.asciidoc index 1df4107f6ef7f..1feed72d45b25 100644 --- a/docs/reference/query-dsl/regexp-query.asciidoc +++ b/docs/reference/query-dsl/regexp-query.asciidoc @@ -4,98 +4,86 @@ Regexp ++++ -The `regexp` query allows you to use regular expression term queries. -See <> for details of the supported regular expression language. -The "term queries" in that first sentence means that Elasticsearch will apply -the regexp to the terms produced by the tokenizer for that field, and not -to the original text of the field. +Returns documents that contain terms matching a +https://en.wikipedia.org/wiki/Regular_expression[regular expression]. -*Note*: The performance of a `regexp` query heavily depends on the -regular expression chosen. Matching everything like `.*` is very slow as -well as using lookaround regular expressions. 
If possible, you should -try to use a long prefix before your regular expression starts. Wildcard -matchers like `.*?+` will mostly lower performance. +A regular expression is a way to match patterns in data using placeholder +characters, called operators. For a list of operators supported by the +`regexp` query, see <>. -[source,js] --------------------------------------------------- -GET /_search -{ - "query": { - "regexp":{ - "name.first": "s.*y" - } - } -} --------------------------------------------------- -// CONSOLE +[[regexp-query-ex-request]] +==== Example request -Boosting is also supported +The following search returns documents where the `user` field contains any term +that begins with `k` and ends with `y`. The `.*` operators match any +characters of any length, including no characters. Matching +terms can include `ky`, `kay`, and `kimchy`. [source,js] --------------------------------------------------- +---- GET /_search { "query": { - "regexp":{ - "name.first":{ - "value":"s.*y", - "boost":1.2 + "regexp": { + "user": { + "value": "k.*y", + "flags" : "ALL", + "max_determinized_states": 10000, + "rewrite": "constant_score" } } } } --------------------------------------------------- +---- // CONSOLE -You can also use special flags -[source,js] --------------------------------------------------- -GET /_search -{ - "query": { - "regexp":{ - "name.first": { - "value": "s.*y", - "flags" : "INTERSECTION|COMPLEMENT|EMPTY" - } - } - } -} --------------------------------------------------- -// CONSOLE +[[regexp-top-level-params]] +==== Top-level parameters for `regexp` +``:: +(Required, object) Field you wish to search. -Possible flags are `ALL` (default), `ANYSTRING`, `COMPLEMENT`, -`EMPTY`, `INTERSECTION`, `INTERVAL`, or `NONE`. 
Please check the -http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/util/automaton/RegExp.html[Lucene -documentation] for their meaning +[[regexp-query-field-params]] +==== Parameters for `` +`value`:: +(Required, string) Regular expression for terms you wish to find in the provided +``. For a list of supported operators, see <>. ++ +-- +By default, regular expressions are limited to 1,000 characters. You can change +this limit using the <> +setting. -Regular expressions are dangerous because it's easy to accidentally -create an innocuous looking one that requires an exponential number of -internal determinized automaton states (and corresponding RAM and CPU) -for Lucene to execute. Lucene prevents these using the -`max_determinized_states` setting (defaults to 10000). You can raise -this limit to allow more complex regular expressions to execute. +[WARNING] +===== +The performance of the `regexp` query can vary based on the regular expression +provided. To improve performance, avoid using wildcard patterns, such as `.*` or +`.*?+`, without a prefix or suffix. +===== +-- -[source,js] --------------------------------------------------- -GET /_search -{ - "query": { - "regexp":{ - "name.first": { - "value": "s.*y", - "flags" : "INTERSECTION|COMPLEMENT|EMPTY", - "max_determinized_states": 20000 - } - } - } -} --------------------------------------------------- -// CONSOLE +`flags`:: +(Optional, string) Enables optional operators for the regular expression. For +valid values and more information, see <>. + +`max_determinized_states`:: ++ +-- +(Optional, integer) Maximum number of +https://en.wikipedia.org/wiki/Deterministic_finite_automaton[automaton states] +required for the query. Default is `10000`. + +{es} uses https://lucene.apache.org/core/[Apache Lucene] internally to parse +regular expressions. Lucene converts each regular expression to a finite +automaton containing a number of determinized states. 
-NOTE: By default the maximum length of regex string allowed in a Regexp Query -is limited to 1000. You can update the `index.max_regex_length` index setting -to bypass this limit. +You can use this parameter to prevent that conversion from unintentionally +consuming too many resources. You may need to increase this limit to run complex +regular expressions. +-- -include::regexp-syntax.asciidoc[] +`rewrite`:: +(Optional, string) Method used to rewrite the query. For valid values and more +information, see the <>. diff --git a/docs/reference/query-dsl/regexp-syntax.asciidoc b/docs/reference/query-dsl/regexp-syntax.asciidoc index 74094b0cab1b0..cd8e24661728a 100644 --- a/docs/reference/query-dsl/regexp-syntax.asciidoc +++ b/docs/reference/query-dsl/regexp-syntax.asciidoc @@ -1,286 +1,224 @@ [[regexp-syntax]] -==== Regular expression syntax +== Regular expression syntax -Regular expression queries are supported by the `regexp` and the `query_string` -queries. The Lucene regular expression engine -is not Perl-compatible but supports a smaller range of operators. +A https://en.wikipedia.org/wiki/Regular_expression[regular expression] is a way to +match patterns in data using placeholder characters, called operators. -[NOTE] -===== -We will not attempt to explain regular expressions, but -just explain the supported operators. -===== +{es} supports regular expressions in the following queries: -===== Standard operators +* <> +* <> -Anchoring:: -+ --- - -Most regular expression engines allow you to match any part of a string. -If you want the regexp pattern to start at the beginning of the string or -finish at the end of the string, then you have to _anchor_ it specifically, -using `^` to indicate the beginning or `$` to indicate the end. - -Lucene's patterns are always anchored. The pattern provided must match -the entire string. 
For string `"abcde"`: - - ab.* # match - abcd # no match - --- - -Allowed characters:: -+ --- +{es} uses https://lucene.apache.org/core/[Apache Lucene]'s regular expression +engine to parse these queries. -Any Unicode characters may be used in the pattern, but certain characters -are reserved and must be escaped. The standard reserved characters are: +[float] +[[regexp-reserved-characters]] +=== Reserved characters +Lucene's regular expression engine supports all Unicode characters. However, the +following characters are reserved as operators: .... . ? + * | { } [ ] ( ) " \ .... -If you enable optional features (see below) then these characters may -also be reserved: +Depending on the <> enabled, the +following characters may also be reserved: - # @ & < > ~ - -Any reserved character can be escaped with a backslash `"\*"` including -a literal backslash character: `"\\"` +.... +# @ & < > ~ +.... -Additionally, any characters (except double quotes) are interpreted literally -when surrounded by double quotes: +To use one of these characters literally, escape it with a preceding +backslash or surround it with double quotes. For example: - john"@smith.com" +.... +\@ # renders as a literal '@' +\\ # renders as a literal '\' +"john@smith.com" # renders as 'john@smith.com' +.... + +[float] +[[regexp-standard-operators]] +=== Standard operators --- +Lucene's regular expression engine does not use the +https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions[Perl +Compatible Regular Expressions (PCRE)] library, but it does support the +following standard operators. -Match any character:: +`.`:: + -- +Matches any character. For example: -The period `"."` can be used to represent any character. For string `"abcde"`: - - ab... # match - a.c.e # match - +.... +ab. # matches 'aba', 'abb', 'abz', etc. +.... -- -One-or-more:: +`?`:: + -- +Repeat the preceding character zero or one times. Often used to make the +preceding character optional. 
For example: -The plus sign `"+"` can be used to repeat the preceding shortest pattern -once or more times. For string `"aaabbb"`: - - a+b+ # match - aa+bb+ # match - a+.+ # match - aa+bbb+ # match - +.... +abc? # matches 'ab' and 'abc' +.... -- -Zero-or-more:: +`+`:: + -- +Repeat the preceding character one or more times. For example: -The asterisk `"*"` can be used to match the preceding shortest pattern -zero-or-more times. For string `"aaabbb`": - - a*b* # match - a*b*c* # match - .*bbb.* # match - aaa*bbb* # match - +.... +ab+ # matches 'ab', 'abb', 'abbb', etc. +.... -- -Zero-or-one:: +`*`:: + -- +Repeat the preceding character zero or more times. For example: -The question mark `"?"` makes the preceding shortest pattern optional. It -matches zero or one times. For string `"aaabbb"`: - - aaa?bbb? # match - aaaa?bbbb? # match - .....?.? # match - aa?bb? # no match - +.... +ab* # matches 'a', 'ab', 'abb', 'abbb', etc. +.... -- -Min-to-max:: +`{}`:: + -- +Minimum and maximum number of times the preceding character can repeat. For +example: -Curly brackets `"{}"` can be used to specify a minimum and (optionally) -a maximum number of times the preceding shortest pattern can repeat. The -allowed forms are: - - {5} # repeat exactly 5 times - {2,5} # repeat at least twice and at most 5 times - {2,} # repeat at least twice - -For string `"aaabbb"`: - - a{3}b{3} # match - a{2,4}b{2,4} # match - a{2,}b{2,} # match - .{3}.{3} # match - a{4}b{4} # no match - a{4,6}b{4,6} # no match - a{4,}b{4,} # no match - +.... +a{2} # matches 'aa' +a{2,4} # matches 'aa', 'aaa', and 'aaaa' +a{2,} # matches 'a' repeated two or more times +.... -- -Grouping:: +`|`:: + -- - -Parentheses `"()"` can be used to form sub-patterns. The quantity operators -listed above operate on the shortest previous pattern, which can be a group. -For string `"ababab"`: - - (ab)+ # match - ab(ab)+ # match - (..)+ # match - (...)+ # no match - (ab)* # match - abab(ab)? # match - ab(ab)? 
# no match - (ab){3} # match - (ab){1,2} # no match - +OR operator. The match will succeed if the longest pattern on either the left +side OR the right side matches. For example: +.... +abc|xyz # matches 'abc' and 'xyz' +.... -- -Alternation:: +`( … )`:: + -- +Forms a group. You can use a group to treat part of the expression as a single +character. For example: -The pipe symbol `"|"` acts as an OR operator. The match will succeed if -the pattern on either the left-hand side OR the right-hand side matches. -The alternation applies to the _longest pattern_, not the shortest. -For string `"aabb"`: - - aabb|bbaa # match - aacc|bb # no match - aa(cc|bb) # match - a+|b+ # no match - a+b+|b+a+ # match - a+(b|c)+ # match - +.... +abc(def)? # matches 'abc' and 'abcdef' but not 'abcd' +.... -- -Character classes:: +`[ … ]`:: + -- +Match one of the characters in the brackets. For example: -Ranges of potential characters may be represented as character classes -by enclosing them in square brackets `"[]"`. A leading `^` -negates the character class. The allowed forms are: - - [abc] # 'a' or 'b' or 'c' - [a-c] # 'a' or 'b' or 'c' - [-abc] # '-' or 'a' or 'b' or 'c' - [abc\-] # '-' or 'a' or 'b' or 'c' - [^abc] # any character except 'a' or 'b' or 'c' - [^a-c] # any character except 'a' or 'b' or 'c' - [^-abc] # any character except '-' or 'a' or 'b' or 'c' - [^abc\-] # any character except '-' or 'a' or 'b' or 'c' +.... +[abc] # matches 'a', 'b', or 'c' +.... -Note that the dash `"-"` indicates a range of characters, unless it is -the first character or if it is escaped with a backslash. +Inside the brackets, `-` indicates a range unless `-` is the first character or +escaped. For example: -For string `"abcd"`: +.... +[a-c] # matches 'a', 'b', or 'c' +[-abc] # '-' is first character. Matches '-', 'a', 'b', or 'c' +[abc\-] # Escapes '-'. Matches 'a', 'b', 'c', or '-' +.... 
- ab[cd]+ # match - [a-d]+ # match - [^a-d]+ # no match +A `^` before a character in the brackets negates the character or range. For +example: +.... +[^abc] # matches any character except 'a', 'b', or 'c' +[^a-c] # matches any character except 'a', 'b', or 'c' +[^-abc] # matches any character except '-', 'a', 'b', or 'c' +[^abc\-] # matches any character except 'a', 'b', 'c', or '-' +.... -- -===== Optional operators - -These operators are available by default as the `flags` parameter defaults to `ALL`. -Different flag combinations (concatenated with `"|"`) can be used to enable/disable -specific operators: +[float] +[[regexp-optional-operators]] +=== Optional operators - { - "regexp": { - "username": { - "value": "john~athon<1-5>", - "flags": "COMPLEMENT|INTERVAL" - } - } - } +You can use the `flags` parameter to enable more optional operators for +Lucene's regular expression engine. -Complement:: -+ --- - -The complement is probably the most useful option. The shortest pattern that -follows a tilde `"~"` is negated. For instance, `"ab~cd" means: +To enable multiple operators, use a `|` separator. For example, a `flags` value +of `COMPLEMENT|INTERVAL` enables the `COMPLEMENT` and `INTERVAL` operators. -* Starts with `a` -* Followed by `b` -* Followed by a string of any length that is anything but `c` -* Ends with `d` +[float] +==== Valid values -For the string `"abcdef"`: +`ALL` (Default):: +Enables all optional operators. - ab~df # match - ab~cf # match - ab~cdef # no match - a~(cb)def # match - a~(bc)def # no match - -Enabled with the `COMPLEMENT` or `ALL` flags. +`COMPLEMENT`:: ++ +-- +Enables the `~` operator. You can use `~` to negate the shortest following +pattern. For example: +.... +a~bc # matches 'adc' and 'aec' but not 'abc' +.... -- -Interval:: +`INTERVAL`:: + -- +Enables the `<>` operators. You can use `<>` to match a numeric range. For +example: -The interval option enables the use of numeric ranges, enclosed by angle -brackets `"<>"`. 
For string: `"foo80"`: - - foo<1-100> # match - foo<01-100> # match - foo<001-100> # no match - -Enabled with the `INTERVAL` or `ALL` flags. - - +.... +foo<1-100> # matches 'foo1', 'foo2' ... 'foo99', 'foo100' +foo<01-100> # matches 'foo01', 'foo02' ... 'foo99', 'foo100' +.... -- -Intersection:: +`INTERSECTION`:: + -- +Enables the `&` operator, which acts as an AND operator. The match will succeed +if patterns on both the left side AND the right side match. For example: -The ampersand `"&"` joins two patterns in a way that both of them have to -match. For string `"aaabbb"`: - - aaa.+&.+bbb # match - aaa&bbb # no match - -Using this feature usually means that you should rewrite your regular -expression. - -Enabled with the `INTERSECTION` or `ALL` flags. - +.... +aaa.+&.+bbb # matches 'aaabbb' +.... -- -Any string:: +`ANYSTRING`:: + -- +Enables the `@` operator. You can use `@` to match any entire +string. -The at sign `"@"` matches any string in its entirety. This could be combined -with the intersection and complement above to express ``everything except''. -For instance: +You can combine the `@` operator with the `&` and `~` operators to create +"everything except" logic. For example: - @&~(foo.+) # anything except string beginning with "foo" - -Enabled with the `ANYSTRING` or `ALL` flags. +.... +@&~(abc.+) # matches everything except terms beginning with 'abc' +.... -- + +[float] +[[regexp-unsupported-operators]] +=== Unsupported operators +Lucene's regular expression engine does not support anchor operators, such as +`^` (beginning of line) or `$` (end of line). To match a term, the regular +expression must match the entire string. 
\ No newline at end of file diff --git a/x-pack/docs/en/rest-api/security/role-mapping-resources.asciidoc b/x-pack/docs/en/rest-api/security/role-mapping-resources.asciidoc index 54581d4c72195..1b41d89db0bf1 100644 --- a/x-pack/docs/en/rest-api/security/role-mapping-resources.asciidoc +++ b/x-pack/docs/en/rest-api/security/role-mapping-resources.asciidoc @@ -49,7 +49,7 @@ The value specified in the field rule can be one of the following types: | Simple String | Exactly matches the provided value. | "esadmin" | Wildcard String | Matches the provided value using a wildcard. | "*,dc=example,dc=com" | Regular Expression | Matches the provided value using a - {ref}/query-dsl-regexp-query.html#regexp-syntax[Lucene regexp]. | "/.\*-admin[0-9]*/" + {ref}/regexp-syntax.html[Lucene regexp]. | "/.\*-admin[0-9]*/" | Number | Matches an equivalent numerical value. | 7 | Null | Matches a null or missing value. | null | Array | Tests each element in the array in diff --git a/x-pack/docs/en/security/auditing/output-logfile.asciidoc b/x-pack/docs/en/security/auditing/output-logfile.asciidoc index f5b1dbad79ae9..422d987fe343f 100644 --- a/x-pack/docs/en/security/auditing/output-logfile.asciidoc +++ b/x-pack/docs/en/security/auditing/output-logfile.asciidoc @@ -132,7 +132,7 @@ Please take time to review these policies whenever your system architecture chan A policy is a named set of filter rules. Each filter rule applies to a single event attribute, one of the `users`, `realms`, `roles` or `indices` attributes. The filter rule defines -a list of {ref}/query-dsl-regexp-query.html#regexp-syntax[Lucene regexp], *any* of which has to match the value of the audit +a list of {ref}/regexp-syntax.html[Lucene regexp], *any* of which has to match the value of the audit event attribute for the rule to match. A policy matches an event if *all* the rules comprising it match the event. An audit event is ignored, therefore not printed, if it matches *any* policy. All other