Merged
2 changes: 1 addition & 1 deletion docs/opensearch/bool.md
@@ -1,6 +1,6 @@
---
layout: default
title: Boolean Queries
title: Boolean queries
parent: OpenSearch
nav_order: 11
---
26 changes: 12 additions & 14 deletions docs/opensearch/cluster.md
@@ -1,6 +1,6 @@
---
layout: default
title: Cluster Formation
title: Cluster formation
parent: OpenSearch
nav_order: 2
---
@@ -13,17 +13,17 @@ OpenSearch can operate as a single-node or multi-node cluster. The steps to conf

To create and deploy an OpenSearch cluster according to your requirements, it’s important to understand how node discovery and cluster formation work and what settings govern them.

There are many ways that you can design a cluster. The following illustration shows a basic architecture.
There are many ways to design a cluster. The following illustration shows a basic architecture:

![multi-node cluster architecture diagram](../../images/cluster.png)

This is a four-node cluster that has one dedicated master node, one dedicated coordinating node, and two data nodes that are master-eligible and also used for ingesting data.

The following table provides brief descriptions of the node types.
The following table provides brief descriptions of the node types:

Node type | Description | Best practices for production
:--- | :--- | :-- |
`Master` | Manages the overall operation of a cluster and keeps track of the cluster state. This includes creating and deleting indices, keeping track of the nodes that join and leave the cluster, checking the health of each node in the cluster (by running ping requests), and allocating shards to nodes. | Three dedicated master nodes in three different zones is the right approach for almost all production use cases. This makes sure your cluster never loses quorum. Two nodes will be idle for most of the time except when one node goes down or needs some maintenance.
`Master` | Manages the overall operation of a cluster and keeps track of the cluster state. This includes creating and deleting indices, keeping track of the nodes that join and leave the cluster, checking the health of each node in the cluster (by running ping requests), and allocating shards to nodes. | Three dedicated master nodes in three different zones is the right approach for almost all production use cases. This configuration ensures your cluster never loses quorum. Two nodes will be idle for most of the time except when one node goes down or needs some maintenance.
`Master-eligible` | Elects one node among them as the master node through a voting process. | For production clusters, make sure you have dedicated master nodes. The way to achieve a dedicated node type is to mark all other node types as false. In this case, you have to mark all the other nodes as not master-eligible.
`Data` | Stores and searches data. Performs all data-related operations (indexing, searching, aggregating) on local shards. These are the worker nodes of your cluster and need more disk space than any other node type. | As you add data nodes, keep them balanced between zones. For example, if you have three zones, add data nodes in multiples of three, one for each zone. We recommend using storage and RAM-heavy nodes.
`Ingest` | Preprocesses data before storing it in the cluster. Runs an ingest pipeline that transforms your data before adding it to an index. | If you plan to ingest a lot of data and run complex ingest pipelines, we recommend you use dedicated ingest nodes. You can also optionally offload your indexing from the data nodes so that your data nodes are used exclusively for searching and aggregating.
@@ -37,11 +37,9 @@ This page demonstrates how to work with the different node types. It assumes tha

## Prerequisites

Before you get started, you must install and configure OpenSearch on all of your nodes. For information about the available options, see [Install and Configure](../../install/).
Before you get started, you must install and configure OpenSearch on all of your nodes. For information about the available options, see [Install and configure OpenSearch](../../install/).

After you are done, use SSH to connect to each node, and then open the `config/opensearch.yml` file.

You can set all configurations for your cluster in this file.
After you're done, use SSH to connect to each node, then open the `config/opensearch.yml` file. You can set all configurations for your cluster in this file.

## Step 1: Name a cluster

@@ -132,7 +130,7 @@ node.ingest: false

## Step 3: Bind a cluster to specific IP addresses

`network.host` defines the IP address that's used to bind the node. By default, OpenSearch listens on a local host, which limits the cluster to a single node. You can also use `_local_` and `_site_` to bind to any loopback or site-local address, whether IPv4 or IPv6:
`network.host` defines the IP address used to bind the node. By default, OpenSearch listens on a local host, which limits the cluster to a single node. You can also use `_local_` and `_site_` to bind to any loopback or site-local address, whether IPv4 or IPv6:

```yml
network.host: [_local_, _site_]
@@ -154,7 +152,7 @@ Now that you've configured the network hosts, you need to configure the discover

Zen Discovery is the built-in, default mechanism that uses [unicast](https://en.wikipedia.org/wiki/Unicast) to find other nodes in the cluster.

You can generally just add all of your master-eligible nodes to the `discovery.seed_hosts` array. When a node starts up, it finds the other master-eligible nodes, determines which one is the master, and asks to join the cluster.
You can generally just add all your master-eligible nodes to the `discovery.seed_hosts` array. When a node starts up, it finds the other master-eligible nodes, determines which one is the master, and asks to join the cluster.

For example, for `opensearch-master` the line looks something like this:

@@ -165,7 +163,7 @@ discovery.seed_hosts: ["<private IP of opensearch-d1>", "<private IP of opensear

## Step 5: Start the cluster

After you set the configurations, start OpenSearch on all nodes.
After you set the configurations, start OpenSearch on all nodes:

```bash
sudo systemctl start opensearch.service
@@ -220,9 +218,9 @@ PUT _cluster/settings
}
```

You can either use `persistent` or `transient` settings. We recommend the `persistent` setting because it persists through a cluster reboot. Transient settings do not persist through a cluster reboot.
You can either use `persistent` or `transient` settings. We recommend the `persistent` setting because it persists through a cluster reboot. Transient settings don't persist through a cluster reboot.

Shard allocation awareness attempts to separate primary and replica shards across multiple zones. But, if only one zone is available (such as after a zone failure), OpenSearch allocates replica shards to the only remaining zone.
Shard allocation awareness attempts to separate primary and replica shards across multiple zones. However, if only one zone is available (such as after a zone failure), OpenSearch allocates replica shards to the only remaining zone.

Another option is to require that primary and replica shards are never allocated to the same zone. This is called forced awareness.

@@ -238,7 +236,7 @@ PUT _cluster/settings
}
```

Now, if a data node fails, forced awareness does not allocate the replicas to a node in the same zone. Instead, the cluster enters a yellow state and only allocates the replicas when nodes in another zone come online.
Now, if a data node fails, forced awareness doesn't allocate the replicas to a node in the same zone. Instead, the cluster enters a yellow state and only allocates the replicas when nodes in another zone come online.

In our two-zone architecture, we can use allocation awareness if `opensearch-d1` and `opensearch-d2` are less than 50% utilized, so that each of them has the storage capacity to allocate replicas in the same zone.
If that is not the case, and `opensearch-d1` and `opensearch-d2` do not have the capacity to contain all primary and replica shards, we can use forced awareness. This approach helps to make sure that, in the event of a failure, OpenSearch doesn't overload your last remaining zone and lock up your cluster due to lack of storage.
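
The 50% rule above can be written down as a small check. This is only a sketch of the reasoning in the text, not an OpenSearch API: the function name `choose_awareness_mode` and the idea of passing disk utilization as fractions are assumptions made for illustration.

```python
def choose_awareness_mode(utilization_d1, utilization_d2):
    """Pick an awareness mode for a two-zone cluster with one data node per zone.

    Both arguments are disk utilization fractions between 0.0 and 1.0
    (hypothetical inputs; OpenSearch does not expose this function).
    """
    # If both nodes are under 50% utilized, either one can absorb the
    # other's shards after a zone failure, so plain allocation awareness is safe.
    if utilization_d1 < 0.5 and utilization_d2 < 0.5:
        return "allocation awareness"
    # Otherwise, prefer forced awareness so the surviving zone is not overloaded.
    return "forced awareness"


print(choose_awareness_mode(0.40, 0.45))
print(choose_awareness_mode(0.40, 0.70))
```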
6 changes: 3 additions & 3 deletions docs/opensearch/full-text.md
@@ -1,6 +1,6 @@
---
layout: default
title: Full-Text Queries
title: Full-text queries
parent: OpenSearch
nav_order: 10
---
@@ -421,10 +421,10 @@ Option | Valid values | Description
`fuzziness` | `AUTO`, `0`, or a positive integer | The number of character edits (insert, delete, substitute) that it takes to change one word to another when determining whether a term matched a value. For example, the distance between `wined` and `wind` is 1. The default, `AUTO`, chooses a value based on the length of each term and is a good choice for most use cases.
`fuzzy_transpositions` | Boolean | Setting `fuzzy_transpositions` to true (default) adds swaps of adjacent characters to the insert, delete, and substitute operations of the `fuzziness` option. For example, the distance between `wind` and `wnid` is 1 if `fuzzy_transpositions` is true (swap "n" and "i") and 2 if it is false (delete "n", insert "n"). <br /><br />If `fuzzy_transpositions` is false, `rewind` and `wnid` have the same distance (2) from `wind`, despite the more human-centric opinion that `wnid` is an obvious typo. The default is a good choice for most use cases.
`lenient` | Boolean | Setting `lenient` to true lets you ignore data type mismatches between the query and the document field. For example, a query string of "8.2" could match a field of type `float`. The default is false.
`low_freq_operator` | `and, or` | The operator for low-frequency terms. The default is `or`. See [Common Terms](#common-terms) queries and `operator` in this table.
`low_freq_operator` | `and, or` | The operator for low-frequency terms. The default is `or`. See [Common terms](#common-terms) queries and `operator` in this table.
`max_determinized_states` | Positive integer | The maximum number of "[states](https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/util/automaton/Operations.html#DEFAULT_MAX_DETERMINIZED_STATES)" (a measure of complexity) that Lucene can create for query strings that contain regular expressions (e.g. `"query": "/wind.+?/"`). Larger numbers allow for queries that use more memory. The default is 10,000.
`max_expansions` | Positive integer | Fuzzy queries "expand to" a number of matching terms that are within the distance specified in `fuzziness`. Then OpenSearch tries to match those terms against its indices. `max_expansions` specifies the maximum number of terms that the fuzzy query expands to. The default is 50.
`minimum_should_match` | Positive or negative integer, positive or negative percentage, combination | If the query string contains multiple search terms and you used the `or` operator, the number of terms that need to match for the document to be considered a match. For example, if `minimum_should_match` is 2, "wind often rising" does not match "The Wind Rises." If `minimum_should_match` is 1, it matches. This option also has `low_freq` and `high_freq` properties for [Common Terms](#common-terms) queries.
`minimum_should_match` | Positive or negative integer, positive or negative percentage, combination | If the query string contains multiple search terms and you used the `or` operator, the number of terms that need to match for the document to be considered a match. For example, if `minimum_should_match` is 2, "wind often rising" does not match "The Wind Rises." If `minimum_should_match` is 1, it matches. This option also has `low_freq` and `high_freq` properties for [Common terms](#common-terms) queries.
`operator` | `or, and` | If the query string contains multiple search terms, whether all terms need to match (`and`) or only one term needs to match (`or`) for a document to be considered a match.
`phrase_slop` | `0` (default) or a positive integer | See `slop`.
`prefix_length` | `0` (default) or a positive integer | The number of leading characters that are not considered in fuzziness.
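
The distances quoted in the `fuzziness` and `fuzzy_transpositions` rows above can be reproduced with a short edit-distance function. This is a minimal sketch of the optimal string alignment distance, written for illustration; it is not the Lucene implementation, and the name `edit_distance` and its `transpositions` flag are assumptions.

```python
def edit_distance(a, b, transpositions=True):
    """Count the edits (insert, delete, substitute, optionally swap) from a to b."""
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete every character of a
    for j in range(cols):
        dist[0][j] = j  # insert every character of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete
                dist[i][j - 1] + 1,         # insert
                dist[i - 1][j - 1] + cost,  # substitute (or match)
            )
            # Optionally count a swap of two adjacent characters as one edit.
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[-1][-1]


print(edit_distance("wined", "wind"))
print(edit_distance("wind", "wnid", transpositions=True))
print(edit_distance("wind", "wnid", transpositions=False))
```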
4 changes: 2 additions & 2 deletions docs/opensearch/index-alias.md
@@ -1,11 +1,11 @@
---
layout: default
title: Index Aliases
title: Index aliases
parent: OpenSearch
nav_order: 4
---

# Index alias
# Index aliases

An alias is a virtual index name that can point to one or more indices.

16 changes: 7 additions & 9 deletions docs/opensearch/index-data.md
@@ -1,6 +1,6 @@
---
layout: default
title: Index Data
title: Index data
parent: OpenSearch
nav_order: 3
---
@@ -16,9 +16,9 @@ For situations in which new data arrives incrementally (for example, customer or

Before you can search data, you must *index* it. Indexing is the method by which search engines organize data for fast retrieval. The resulting structure is called, fittingly, an index.

In OpenSearch, the basic unit of data is a JSON *document*. Within an index, OpenSearch identifies each document using a unique *ID*.
In OpenSearch, the basic unit of data is a JSON *document*. Within an index, OpenSearch identifies each document using a unique ID.

A request to the index API looks like the following:
A request to the index API looks like this:

```json
PUT <index>/_doc/<id>
@@ -31,7 +31,6 @@ A request to the `_bulk` API looks a little different, because you specify the i
POST _bulk
{ "index": { "_index": "<index>", "_id": "<id>" } }
{ "A JSON": "document" }

```

Bulk data must conform to a specific format, which requires a newline character (`\n`) at the end of every line, including the last line. This is the basic format:
@@ -41,10 +40,9 @@ Action and metadata\n
Optional document\n
Action and metadata\n
Optional document\n

```

The document is optional, because `delete` actions do not require a document. The other actions (`index`, `create`, and `update`) all require a document. If you specifically want the action to fail if the document already exists, use the `create` action instead of the `index` action.
The document is optional, because `delete` actions don't require a document. The other actions (`index`, `create`, and `update`) all require a document. If you specifically want the action to fail if the document already exists, use the `create` action instead of the `index` action.
{: .note }
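
As a rough illustration of the bulk format described above, the following Python sketch builds a request body in which every line, including the last, ends with `\n`. The helper name `build_bulk_body` and the sample document are assumptions for the example; only the `index` action is shown, and sending the body to a cluster is left out.

```python
import json


def build_bulk_body(index, docs):
    """Serialize (doc_id, document) pairs into a newline-delimited _bulk body."""
    lines = []
    for doc_id, document in docs:
        # Action and metadata line, followed by the document itself.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(document))
    # Terminate every line, including the last one, with a newline character.
    return "".join(line + "\n" for line in lines)


body = build_bulk_body("movies", [("1", {"title": "Spirited Away"})])
print(body, end="")
```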

To index bulk data using the `curl` command, navigate to the folder where you have your file saved and run the following command:
@@ -55,14 +53,14 @@ curl -H "Content-Type: application/x-ndjson" -POST https://localhost:9200/data/_

If any one of the actions in the `_bulk` API fails, OpenSearch continues to execute the other actions. Examine the `items` array in the response to figure out what went wrong. The entries in the `items` array are in the same order as the actions specified in the request.

OpenSearch features automatic index creation when you add a document to an index that doesn't already exist. It also features automatic ID generation if you don't specify an ID in the request. This simple example automatically creates the movies index, indexes the document, and assigns it a unique ID:
OpenSearch automatically creates an index when you add a document to an index that doesn't already exist. It also automatically generates an ID if you don't specify an ID in the request. This simple example automatically creates the movies index, indexes the document, and assigns it a unique ID:

```json
POST movies/_doc
{ "title": "Spirited Away" }
```

Automatic ID generation has a clear downside: because the indexing request didn't specify a document ID, you can't easily update the document at a later time. Also, if you run this request 10 times, OpenSearch indexes this document as 10 different documents with unique IDs. To specify an ID of 1, use the following request, and note the use of PUT instead of POST:
Automatic ID generation has a clear downside: because the indexing request didn't specify a document ID, you can't easily update the document at a later time. Also, if you run this request 10 times, OpenSearch indexes this document as 10 different documents with unique IDs. To specify an ID of 1, use the following request (note the use of PUT instead of POST):

```json
PUT movies/_doc/1
@@ -83,7 +81,7 @@ PUT more-movies
OpenSearch indices have the following naming restrictions:

- All letters must be lowercase.
- Index names can't begin with `_` (underscore) or `-` (hyphen).
- Index names can't begin with underscores (`_`) or hyphens (`-`).
- Index names can't contain spaces, commas, or the following characters:

`:`, `"`, `*`, `+`, `/`, `\`, `|`, `?`, `#`, `>`, or `<`
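
The restrictions listed above can be checked on the client side before a request is sent. The sketch below covers only the rules shown here (the real rules may include more than this excerpt lists), and the function name `is_valid_index_name` and the rejection of empty names are assumptions.

```python
# Characters that the list above says an index name can't contain,
# plus spaces and commas.
FORBIDDEN_CHARS = set(' ,:"*+/\\|?#><')


def is_valid_index_name(name):
    """Return True if name satisfies the naming restrictions listed above."""
    if not name:
        return False  # assumption: an empty name is never valid
    if name != name.lower():
        return False  # all letters must be lowercase
    if name[0] in "_-":
        return False  # can't begin with an underscore or a hyphen
    return not any(ch in FORBIDDEN_CHARS for ch in name)


for candidate in ["more-movies", "_movies", "Movies", "my index"]:
    print(candidate, is_valid_index_name(candidate))
```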
23 changes: 10 additions & 13 deletions docs/opensearch/index-templates.md
@@ -1,17 +1,14 @@
---
layout: default
title: Index Templates
title: Index templates
parent: OpenSearch
nav_order: 5
---

# Index template
# Index templates

Index templates let you initialize new indices with predefined mappings and settings. For example, if you continuously index log data, you can define an index template so that all of these indices have the same number of shards and replicas.

OpenSearch switched from `_template` to `_index_template` in version 7.8. Use `_template` for older versions of OpenSearch.
{: .note }

---

#### Table of contents
@@ -21,7 +18,7 @@ OpenSearch switched from `_template` to `_index_template` in version 7.8. Use `_

---

## Create template
## Create a template

To create an index template, use a POST request:

@@ -110,7 +107,7 @@ GET logs-2020-01-01

Any additional indices that match this pattern---`logs-2020-01-02`, `logs-2020-01-03`, and so on---will inherit the same mappings and settings.

## Retrieve template
## Retrieve a template

To list all index templates:

@@ -148,7 +145,7 @@ You can create multiple index templates for your indices. If the index name matc

The settings from the more recently created index templates override the settings of older index templates. So, you can first define a few common settings in a generic template that can act as a catch-all and then add more specialized settings as required.

An even better approach is to explicitly specify template priority using the `priority` parameter. OpenSearch applies templates with lower priority numbers first and then overrides them with templates that have higher priority numbers.
An even better approach is to explicitly specify template priority using the `priority` parameter. OpenSearch applies templates with lower priority numbers first and then overrides them with templates with higher priority numbers.

For example, say you have the following two templates that both match the `logs-2020-01-02` index and there’s a conflict in the `number_of_shards` field:

@@ -188,19 +185,19 @@ PUT _index_template/template-02

Because `template-02` has a higher `priority` value, it takes precedence over `template-01`. The `logs-2020-01-02` index would have a `number_of_shards` value of 3.
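
The override order described above can be modeled in a few lines of Python. This is a sketch of the behavior as this page describes it, not OpenSearch's internal logic; the function name `merge_template_settings`, the priority values, and the shard count for `template-01` are hypothetical, because the template bodies are not shown in this excerpt.

```python
def merge_template_settings(templates):
    """Apply templates in ascending priority so higher priorities override lower ones."""
    merged = {}
    for template in sorted(templates, key=lambda t: t["priority"]):
        merged.update(template["settings"])
    return merged


# Hypothetical values: only the winning number_of_shards (3) comes from the text.
templates = [
    {"name": "template-02", "priority": 1, "settings": {"number_of_shards": 3}},
    {"name": "template-01", "priority": 0, "settings": {"number_of_shards": 2}},
]
print(merge_template_settings(templates))
```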

## Delete template
## Delete a template

You can delete an index template using its name, as shown in the following command:
You can delete an index template using its name:

```json
DELETE _index_template/daily_logs
```

## Index template options

You can specify the options shown in the following table:
You can specify the following template options:

Option | Type | Description | Required
:--- | :--- | :--- | :---
`priority` | `Number` | Specify the priority of the index template. | No
`create` | `Boolean` | Specify whether this index template should replace an existing one. | No
`priority` | `Number` | The priority of the index template. | No
`create` | `Boolean` | Whether this index template should replace an existing one. | No