diff --git a/docs/reference/index-modules.asciidoc b/docs/reference/index-modules.asciidoc
index aa95bb8fa2e33..df4688db9d7f8 100644
--- a/docs/reference/index-modules.asciidoc
+++ b/docs/reference/index-modules.asciidoc
@@ -205,6 +205,7 @@ specific index module:
The maximum number of terms that can be used in Terms Query.
Defaults to `65536`.
+[[index-max-regex-length]]
`index.max_regex_length`::
The maximum length of regex that can be used in Regexp Query.
diff --git a/docs/reference/query-dsl.asciidoc b/docs/reference/query-dsl.asciidoc
index 74d22d6de411e..1a279101531c2 100644
--- a/docs/reference/query-dsl.asciidoc
+++ b/docs/reference/query-dsl.asciidoc
@@ -47,4 +47,6 @@ include::query-dsl/term-level-queries.asciidoc[]
include::query-dsl/minimum-should-match.asciidoc[]
-include::query-dsl/multi-term-rewrite.asciidoc[]
\ No newline at end of file
+include::query-dsl/multi-term-rewrite.asciidoc[]
+
+include::query-dsl/regexp-syntax.asciidoc[]
\ No newline at end of file
diff --git a/docs/reference/query-dsl/regexp-query.asciidoc b/docs/reference/query-dsl/regexp-query.asciidoc
index 1df4107f6ef7f..1feed72d45b25 100644
--- a/docs/reference/query-dsl/regexp-query.asciidoc
+++ b/docs/reference/query-dsl/regexp-query.asciidoc
@@ -4,98 +4,86 @@
Regexp
++++
-The `regexp` query allows you to use regular expression term queries.
-See <> for details of the supported regular expression language.
-The "term queries" in that first sentence means that Elasticsearch will apply
-the regexp to the terms produced by the tokenizer for that field, and not
-to the original text of the field.
+Returns documents that contain terms matching a
+https://en.wikipedia.org/wiki/Regular_expression[regular expression].
-*Note*: The performance of a `regexp` query heavily depends on the
-regular expression chosen. Matching everything like `.*` is very slow as
-well as using lookaround regular expressions. If possible, you should
-try to use a long prefix before your regular expression starts. Wildcard
-matchers like `.*?+` will mostly lower performance.
+A regular expression is a way to match patterns in data using placeholder
+characters, called operators. For a list of operators supported by the
+`regexp` query, see <<regexp-syntax>>.
-[source,js]
---------------------------------------------------
-GET /_search
-{
- "query": {
- "regexp":{
- "name.first": "s.*y"
- }
- }
-}
---------------------------------------------------
-// CONSOLE
+[[regexp-query-ex-request]]
+==== Example request
-Boosting is also supported
+The following search returns documents where the `user` field contains any term
+that begins with `k` and ends with `y`. The `.*` operators match any
+characters of any length, including no characters. Matching
+terms can include `ky`, `kay`, and `kimchy`.
[source,js]
---------------------------------------------------
+----
GET /_search
{
"query": {
- "regexp":{
- "name.first":{
- "value":"s.*y",
- "boost":1.2
+ "regexp": {
+ "user": {
+ "value": "k.*y",
+ "flags" : "ALL",
+ "max_determinized_states": 10000,
+ "rewrite": "constant_score"
}
}
}
}
---------------------------------------------------
+----
// CONSOLE
-You can also use special flags
-[source,js]
---------------------------------------------------
-GET /_search
-{
- "query": {
- "regexp":{
- "name.first": {
- "value": "s.*y",
- "flags" : "INTERSECTION|COMPLEMENT|EMPTY"
- }
- }
- }
-}
---------------------------------------------------
-// CONSOLE
+[[regexp-top-level-params]]
+==== Top-level parameters for `regexp`
+`<field>`::
+(Required, object) Field you wish to search.
-Possible flags are `ALL` (default), `ANYSTRING`, `COMPLEMENT`,
-`EMPTY`, `INTERSECTION`, `INTERVAL`, or `NONE`. Please check the
-http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/util/automaton/RegExp.html[Lucene
-documentation] for their meaning
+[[regexp-query-field-params]]
+==== Parameters for `<field>`
+`value`::
+(Required, string) Regular expression for terms you wish to find in the provided
+`<field>`. For a list of supported operators, see <<regexp-syntax>>.
++
+--
+By default, regular expressions are limited to 1,000 characters. You can change
+this limit using the <<index-max-regex-length,`index.max_regex_length`>>
+setting.
-Regular expressions are dangerous because it's easy to accidentally
-create an innocuous looking one that requires an exponential number of
-internal determinized automaton states (and corresponding RAM and CPU)
-for Lucene to execute. Lucene prevents these using the
-`max_determinized_states` setting (defaults to 10000). You can raise
-this limit to allow more complex regular expressions to execute.
+[WARNING]
+=====
+The performance of the `regexp` query can vary based on the regular expression
+provided. To improve performance, avoid using wildcard patterns, such as `.*` or
+`.*?+`, without a prefix or suffix.
+=====
+--
-[source,js]
---------------------------------------------------
-GET /_search
-{
- "query": {
- "regexp":{
- "name.first": {
- "value": "s.*y",
- "flags" : "INTERSECTION|COMPLEMENT|EMPTY",
- "max_determinized_states": 20000
- }
- }
- }
-}
---------------------------------------------------
-// CONSOLE
+`flags`::
+(Optional, string) Enables optional operators for the regular expression. For
+valid values and more information, see <<regexp-optional-operators>>.
+
+`max_determinized_states`::
++
+--
+(Optional, integer) Maximum number of
+https://en.wikipedia.org/wiki/Deterministic_finite_automaton[automaton states]
+required for the query. Default is `10000`.
+
+{es} uses https://lucene.apache.org/core/[Apache Lucene] internally to parse
+regular expressions. Lucene converts each regular expression to a finite
+automaton containing a number of determinized states.
-NOTE: By default the maximum length of regex string allowed in a Regexp Query
-is limited to 1000. You can update the `index.max_regex_length` index setting
-to bypass this limit.
+You can use this parameter to prevent that conversion from unintentionally
+consuming too many resources. You may need to increase this limit to run complex
+regular expressions.
+--
-include::regexp-syntax.asciidoc[]
+`rewrite`::
+(Optional, string) Method used to rewrite the query. For valid values and more
+information, see the <<query-dsl-multi-term-rewrite,`rewrite` parameter>>.
diff --git a/docs/reference/query-dsl/regexp-syntax.asciidoc b/docs/reference/query-dsl/regexp-syntax.asciidoc
index 74094b0cab1b0..cd8e24661728a 100644
--- a/docs/reference/query-dsl/regexp-syntax.asciidoc
+++ b/docs/reference/query-dsl/regexp-syntax.asciidoc
@@ -1,286 +1,224 @@
[[regexp-syntax]]
-==== Regular expression syntax
+== Regular expression syntax
-Regular expression queries are supported by the `regexp` and the `query_string`
-queries. The Lucene regular expression engine
-is not Perl-compatible but supports a smaller range of operators.
+A https://en.wikipedia.org/wiki/Regular_expression[regular expression] is a way to
+match patterns in data using placeholder characters, called operators.
-[NOTE]
-=====
-We will not attempt to explain regular expressions, but
-just explain the supported operators.
-=====
+{es} supports regular expressions in the following queries:
-===== Standard operators
+* <<query-dsl-regexp-query,`regexp`>>
+* <<query-dsl-query-string-query,`query_string`>>
-Anchoring::
-+
---
-
-Most regular expression engines allow you to match any part of a string.
-If you want the regexp pattern to start at the beginning of the string or
-finish at the end of the string, then you have to _anchor_ it specifically,
-using `^` to indicate the beginning or `$` to indicate the end.
-
-Lucene's patterns are always anchored. The pattern provided must match
-the entire string. For string `"abcde"`:
-
- ab.* # match
- abcd # no match
-
---
-
-Allowed characters::
-+
---
+{es} uses https://lucene.apache.org/core/[Apache Lucene]'s regular expression
+engine to parse these queries.
-Any Unicode characters may be used in the pattern, but certain characters
-are reserved and must be escaped. The standard reserved characters are:
+[float]
+[[regexp-reserved-characters]]
+=== Reserved characters
+Lucene's regular expression engine supports all Unicode characters. However, the
+following characters are reserved as operators:
....
. ? + * | { } [ ] ( ) " \
....
-If you enable optional features (see below) then these characters may
-also be reserved:
+Depending on the <<regexp-optional-operators,optional operators>> enabled, the
+following characters may also be reserved:
- # @ & < > ~
-
-Any reserved character can be escaped with a backslash `"\*"` including
-a literal backslash character: `"\\"`
+....
+# @ & < > ~
+....
-Additionally, any characters (except double quotes) are interpreted literally
-when surrounded by double quotes:
+To use one of these characters literally, escape it with a preceding
+backslash or surround it with double quotes. For example:
- john"@smith.com"
+....
+\@ # renders as a literal '@'
+\\ # renders as a literal '\'
+"john@smith.com" # renders as 'john@smith.com'
+....
+
+[float]
+[[regexp-standard-operators]]
+=== Standard operators
---
+Lucene's regular expression engine does not use the
+https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions[Perl
+Compatible Regular Expressions (PCRE)] library, but it does support the
+following standard operators.
-Match any character::
+`.`::
+
--
+Matches any character. For example:
-The period `"."` can be used to represent any character. For string `"abcde"`:
-
- ab... # match
- a.c.e # match
-
+....
+ab. # matches 'aba', 'abb', 'abz', etc.
+....
--
-One-or-more::
+`?`::
+
--
+Repeat the preceding character zero or one times. Often used to make the
+preceding character optional. For example:
-The plus sign `"+"` can be used to repeat the preceding shortest pattern
-once or more times. For string `"aaabbb"`:
-
- a+b+ # match
- aa+bb+ # match
- a+.+ # match
- aa+bbb+ # match
-
+....
+abc? # matches 'ab' and 'abc'
+....
--
-Zero-or-more::
+`+`::
+
--
+Repeat the preceding character one or more times. For example:
-The asterisk `"*"` can be used to match the preceding shortest pattern
-zero-or-more times. For string `"aaabbb`":
-
- a*b* # match
- a*b*c* # match
- .*bbb.* # match
- aaa*bbb* # match
-
+....
+ab+ # matches 'ab', 'abb', 'abbb', etc.
+....
--
-Zero-or-one::
+`*`::
+
--
+Repeat the preceding character zero or more times. For example:
-The question mark `"?"` makes the preceding shortest pattern optional. It
-matches zero or one times. For string `"aaabbb"`:
-
- aaa?bbb? # match
- aaaa?bbbb? # match
- .....?.? # match
- aa?bb? # no match
-
+....
+ab* # matches 'a', 'ab', 'abb', 'abbb', etc.
+....
--
-Min-to-max::
+`{}`::
+
--
+Minimum and maximum number of times the preceding character can repeat. For
+example:
-Curly brackets `"{}"` can be used to specify a minimum and (optionally)
-a maximum number of times the preceding shortest pattern can repeat. The
-allowed forms are:
-
- {5} # repeat exactly 5 times
- {2,5} # repeat at least twice and at most 5 times
- {2,} # repeat at least twice
-
-For string `"aaabbb"`:
-
- a{3}b{3} # match
- a{2,4}b{2,4} # match
- a{2,}b{2,} # match
- .{3}.{3} # match
- a{4}b{4} # no match
- a{4,6}b{4,6} # no match
- a{4,}b{4,} # no match
-
+....
+a{2} # matches 'aa'
+a{2,4} # matches 'aa', 'aaa', and 'aaaa'
+a{2,} # matches 'a' repeated two or more times
+....
--
-Grouping::
+`|`::
+
--
-
-Parentheses `"()"` can be used to form sub-patterns. The quantity operators
-listed above operate on the shortest previous pattern, which can be a group.
-For string `"ababab"`:
-
- (ab)+ # match
- ab(ab)+ # match
- (..)+ # match
- (...)+ # no match
- (ab)* # match
- abab(ab)? # match
- ab(ab)? # no match
- (ab){3} # match
- (ab){1,2} # no match
-
+OR operator. The match will succeed if the longest pattern on either the left
+side OR the right side matches. For example:
+....
+abc|xyz # matches 'abc' and 'xyz'
+....
--
-Alternation::
+`( … )`::
+
--
+Forms a group. You can use a group to treat part of the expression as a single
+character. For example:
-The pipe symbol `"|"` acts as an OR operator. The match will succeed if
-the pattern on either the left-hand side OR the right-hand side matches.
-The alternation applies to the _longest pattern_, not the shortest.
-For string `"aabb"`:
-
- aabb|bbaa # match
- aacc|bb # no match
- aa(cc|bb) # match
- a+|b+ # no match
- a+b+|b+a+ # match
- a+(b|c)+ # match
-
+....
+abc(def)? # matches 'abc' and 'abcdef' but not 'abcd'
+....
--
-Character classes::
+`[ … ]`::
+
--
+Match one of the characters in the brackets. For example:
-Ranges of potential characters may be represented as character classes
-by enclosing them in square brackets `"[]"`. A leading `^`
-negates the character class. The allowed forms are:
-
- [abc] # 'a' or 'b' or 'c'
- [a-c] # 'a' or 'b' or 'c'
- [-abc] # '-' or 'a' or 'b' or 'c'
- [abc\-] # '-' or 'a' or 'b' or 'c'
- [^abc] # any character except 'a' or 'b' or 'c'
- [^a-c] # any character except 'a' or 'b' or 'c'
- [^-abc] # any character except '-' or 'a' or 'b' or 'c'
- [^abc\-] # any character except '-' or 'a' or 'b' or 'c'
+....
+[abc] # matches 'a', 'b', or 'c'
+....
-Note that the dash `"-"` indicates a range of characters, unless it is
-the first character or if it is escaped with a backslash.
+Inside the brackets, `-` indicates a range unless `-` is the first character or
+escaped. For example:
-For string `"abcd"`:
+....
+[a-c] # matches 'a', 'b', or 'c'
+[-abc] # '-' is first character. Matches '-', 'a', 'b', or 'c'
+[abc\-] # Escapes '-'. Matches 'a', 'b', 'c', or '-'
+....
- ab[cd]+ # match
- [a-d]+ # match
- [^a-d]+ # no match
+A `^` before a character in the brackets negates the character or range. For
+example:
+....
+[^abc] # matches any character except 'a', 'b', or 'c'
+[^a-c] # matches any character except 'a', 'b', or 'c'
+[^-abc] # matches any character except '-', 'a', 'b', or 'c'
+[^abc\-] # matches any character except 'a', 'b', 'c', or '-'
+....
--
-===== Optional operators
-
-These operators are available by default as the `flags` parameter defaults to `ALL`.
-Different flag combinations (concatenated with `"|"`) can be used to enable/disable
-specific operators:
+[float]
+[[regexp-optional-operators]]
+=== Optional operators
- {
- "regexp": {
- "username": {
- "value": "john~athon<1-5>",
- "flags": "COMPLEMENT|INTERVAL"
- }
- }
- }
+You can use the `flags` parameter to enable more optional operators for
+Lucene's regular expression engine.
-Complement::
-+
---
-
-The complement is probably the most useful option. The shortest pattern that
-follows a tilde `"~"` is negated. For instance, `"ab~cd" means:
+To enable multiple operators, use a `|` separator. For example, a `flags` value
+of `COMPLEMENT|INTERVAL` enables the `COMPLEMENT` and `INTERVAL` operators.
-* Starts with `a`
-* Followed by `b`
-* Followed by a string of any length that is anything but `c`
-* Ends with `d`
+[float]
+==== Valid values
-For the string `"abcdef"`:
+`ALL` (Default)::
+Enables all optional operators.
- ab~df # match
- ab~cf # match
- ab~cdef # no match
- a~(cb)def # match
- a~(bc)def # no match
-
-Enabled with the `COMPLEMENT` or `ALL` flags.
+`COMPLEMENT`::
++
+--
+Enables the `~` operator. You can use `~` to negate the shortest following
+pattern. For example:
+....
+a~bc # matches 'adc' and 'aec' but not 'abc'
+....
--
-Interval::
+`INTERVAL`::
+
--
+Enables the `<>` operators. You can use `<>` to match a numeric range. For
+example:
-The interval option enables the use of numeric ranges, enclosed by angle
-brackets `"<>"`. For string: `"foo80"`:
-
- foo<1-100> # match
- foo<01-100> # match
- foo<001-100> # no match
-
-Enabled with the `INTERVAL` or `ALL` flags.
-
-
+....
+foo<1-100> # matches 'foo1', 'foo2' ... 'foo99', 'foo100'
+foo<01-100> # matches 'foo01', 'foo02' ... 'foo99', 'foo100'
+....
--
-Intersection::
+`INTERSECTION`::
+
--
+Enables the `&` operator, which acts as an AND operator. The match will succeed
+if patterns on both the left side AND the right side match. For example:
-The ampersand `"&"` joins two patterns in a way that both of them have to
-match. For string `"aaabbb"`:
-
- aaa.+&.+bbb # match
- aaa&bbb # no match
-
-Using this feature usually means that you should rewrite your regular
-expression.
-
-Enabled with the `INTERSECTION` or `ALL` flags.
-
+....
+aaa.+&.+bbb # matches 'aaabbb'
+....
--
-Any string::
+`ANYSTRING`::
+
--
+Enables the `@` operator. You can use `@` to match any entire
+string.
-The at sign `"@"` matches any string in its entirety. This could be combined
-with the intersection and complement above to express ``everything except''.
-For instance:
+You can combine the `@` operator with the `&` and `~` operators to create
+"everything except" logic. For example:
- @&~(foo.+) # anything except string beginning with "foo"
-
-Enabled with the `ANYSTRING` or `ALL` flags.
+....
+@&~(abc.+) # matches everything except terms beginning with 'abc'
+....
--
+
+[float]
+[[regexp-unsupported-operators]]
+=== Unsupported operators
+Lucene's regular expression engine does not support anchor operators, such as
+`^` (beginning of line) or `$` (end of line). To match a term, the regular
+expression must match the entire string.
\ No newline at end of file
diff --git a/x-pack/docs/en/rest-api/security/role-mapping-resources.asciidoc b/x-pack/docs/en/rest-api/security/role-mapping-resources.asciidoc
index 54581d4c72195..1b41d89db0bf1 100644
--- a/x-pack/docs/en/rest-api/security/role-mapping-resources.asciidoc
+++ b/x-pack/docs/en/rest-api/security/role-mapping-resources.asciidoc
@@ -49,7 +49,7 @@ The value specified in the field rule can be one of the following types:
| Simple String | Exactly matches the provided value. | "esadmin"
| Wildcard String | Matches the provided value using a wildcard. | "*,dc=example,dc=com"
| Regular Expression | Matches the provided value using a
- {ref}/query-dsl-regexp-query.html#regexp-syntax[Lucene regexp]. | "/.\*-admin[0-9]*/"
+ {ref}/regexp-syntax.html[Lucene regexp]. | "/.\*-admin[0-9]*/"
| Number | Matches an equivalent numerical value. | 7
| Null | Matches a null or missing value. | null
| Array | Tests each element in the array in
diff --git a/x-pack/docs/en/security/auditing/output-logfile.asciidoc b/x-pack/docs/en/security/auditing/output-logfile.asciidoc
index f5b1dbad79ae9..422d987fe343f 100644
--- a/x-pack/docs/en/security/auditing/output-logfile.asciidoc
+++ b/x-pack/docs/en/security/auditing/output-logfile.asciidoc
@@ -132,7 +132,7 @@ Please take time to review these policies whenever your system architecture chan
A policy is a named set of filter rules. Each filter rule applies to a single event attribute,
one of the `users`, `realms`, `roles` or `indices` attributes. The filter rule defines
-a list of {ref}/query-dsl-regexp-query.html#regexp-syntax[Lucene regexp], *any* of which has to match the value of the audit
+a list of {ref}/regexp-syntax.html[Lucene regexp], *any* of which has to match the value of the audit
event attribute for the rule to match.
A policy matches an event if *all* the rules comprising it match the event.
An audit event is ignored, therefore not printed, if it matches *any* policy. All other