[DOCS] Add introduction to Elasticsearch. #43075
Merged
Commits (4):

* 4ce2dc7 [DOCS] Add introduction to Elasticsearch. (debadair)
* c3eb934 [DOCS] Incorporated review comments. (debadair)
* e62e863 [DOCS] Minor edits to add an abbreviated title and cross refs. (debadair)
* 15add27 [DOCS] Added sizing tips & link to quantitative sizing video. (debadair)
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,244 @@ | ||||||
[[elasticsearch-intro]]
= You know, for search (and analysis)
[partintro]
--
{es} is the distributed search and analytics engine at the heart of
the {stack}. {ls} and {beats} facilitate collecting, aggregating, and
enriching your data and storing it in {es}. {kib} enables you to
interactively explore, visualize, and share insights into your data and to
manage and monitor the stack. {es} is where the indexing, search, and
analysis magic happens.
{es} provides real-time search and analytics for all types of data. Whether you
have structured or unstructured text, numerical data, or geospatial data,
{es} can efficiently store and index it in a way that supports fast searches.
You can go far beyond simple data retrieval and aggregate information to discover
trends and patterns in your data. And as your data and query volume grows, the
distributed nature of {es} enables your deployment to grow seamlessly right
along with it.
While not _every_ problem is a search problem, {es} offers speed and flexibility
to handle data in a wide variety of use cases:
* Add a search box to an app or website
* Store and analyze logs, metrics, and security event data
* Use machine learning to automatically model the behavior of your data in real
time
* Automate business workflows using {es} as a storage engine
* Manage, integrate, and analyze spatial information using {es} as a geographic
information system (GIS)
* Store and process genetic data using {es} as a bioinformatics research tool
We’re continually amazed by the novel ways people use search. Whether
your use case is similar to one of these, or you're using {es} to tackle a new
problem, the way you work with your data, documents, and indices in {es} is
the same.
--
[[documents-indices]]
== Data in: documents and indices
{es} is a distributed document store. Instead of storing information as rows of
columnar data, {es} stores complex data structures that have been serialized
as JSON documents. When you have multiple {es} nodes in a cluster, stored
documents are distributed across the cluster and can be accessed immediately
from any node.
When a document is stored, it is indexed and fully searchable in near
real-time--within 1 second. {es} uses a data structure called an
inverted index that supports very fast full-text searches. An inverted index
lists every unique word that appears in any document and identifies all of the
documents each word occurs in.
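A toy version makes the idea concrete. This is an illustrative sketch, not how {es} or Lucene actually store data on disk: each term maps to the set of document IDs it occurs in, so a term lookup immediately yields every matching document.

```python
from collections import defaultdict

# Toy inverted index (illustrative only): map each term to the set of
# document IDs that contain it.
docs = {
    1: "quick brown fox",
    2: "quick red fox",
    3: "slow brown bear",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

# Looking up a term returns every document it occurs in.
print(sorted(index["quick"]))   # [1, 2]
print(sorted(index["brown"]))   # [1, 3]
```

Real inverted indices also store positions and frequencies per term, which is what enables phrase queries and relevance scoring.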
An index can be thought of as an optimized collection of documents and each
document is a collection of fields, which are the key-value pairs that contain
your data. By default, {es} indexes all data in every field and each indexed
field has a dedicated, optimized data structure. For example, text fields are
stored in inverted indices, and numeric and geo fields are stored in BKD trees.
The ability to use the per-field data structures to assemble and return search
results is what makes {es} so fast.
{es} also has the ability to be schema-less, which means that documents can be
indexed without explicitly specifying how to handle each of the different fields
that might occur in a document. When dynamic mapping is enabled, {es}
automatically detects and adds new fields to the index. This default
behavior makes it easy to start indexing and exploring your data--just start
indexing documents and {es} will detect and map booleans, floating point and
integer values, dates, and strings to the appropriate {es} datatypes.
Ultimately, however, you know more about your data and how you want to use it
than {es} can. You can define rules to control dynamic mapping and use custom
mappings to take full control of how fields are stored and indexed.
Defining your own mappings enables you to:
* Distinguish between full-text string fields and exact value string fields
* Perform language-specific text analysis
* Optimize fields for partial matching
* Use custom date formats
* Use data types such as `geo_point` and `geo_shape` that cannot be automatically
detected
It’s often useful to index the same field in different ways for different
purposes. For example, you might want to index a string field as both a text
field for full-text search and as a keyword field for sorting or aggregating
your data. Or, you might choose to use more than one language analyzer to
process the contents of a string field that contains user input.
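A custom mapping along these lines could look like the following sketch, built here as a plain Python dict. The field names are illustrative, not from this document; the multi-field pattern (a `text` field with a `keyword` sub-field) is the standard way to index one string both for full-text search and for exact-value sorting.

```python
import json

# Hypothetical explicit mapping for an "employee" index (field names are
# made up for illustration). "last_name" is indexed twice: as analyzed
# text for full-text search, and as a keyword sub-field for sorting and
# aggregations. The date format and geo_point type are set explicitly
# because dynamic mapping could not infer them reliably.
mapping = {
    "mappings": {
        "properties": {
            "last_name": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword"}},
            },
            "hire_date": {"type": "date", "format": "yyyy-MM-dd"},
            "office": {"type": "geo_point"},
        }
    }
}

print(json.dumps(mapping, indent=2))
```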
The analysis chain that is applied to a full-text field during indexing is also
used at search time. When you query a full-text field, the query text undergoes
the same analysis before the terms are looked up in the index.
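The symmetry matters: if index-time and search-time analysis diverged, query terms would not line up with indexed terms. A toy analyzer (lowercasing plus whitespace tokenization, far simpler than a real {es} analyzer) shows the principle:

```python
# Toy analysis chain: lowercase, then split on whitespace. The key point
# from the text is that the SAME chain runs at index time and at query
# time, so the terms compared in the index line up exactly.
def analyze(text: str) -> list[str]:
    return text.lower().split()

indexed_terms = analyze("The Quick Brown Fox")
query_terms = analyze("QUICK fox")

# Every query term matches, despite the differing capitalization,
# because both sides were analyzed identically.
print(all(t in indexed_terms for t in query_terms))  # True
```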
[[search-analyze]]
== Information out: search and analyze
While you can use {es} as a document store and retrieve documents and their
metadata, the real power comes from being able to easily access the full suite
of search capabilities built on the Apache Lucene search engine library.
{es} provides a simple, coherent REST API for managing your cluster and indexing
and searching your data. For testing purposes, you can easily submit requests
directly from the command line or through the Developer Console in {kib}. From
your applications, you can use the
https://www.elastic.co/guide/en/elasticsearch/client/index.html[{es} client]
for your language of choice: Java, JavaScript, Go, .NET, PHP, Perl, Python,
or Ruby.
[float]
[[search-data]]
=== Searching your data
The {es} REST APIs support structured queries, full text queries, and the
ability to combine them into more complex queries. Structured queries are
similar to the types of queries you can construct in SQL. For example, you
could search the `gender` and `age` fields in your `employee` index and sort the
matches by the `hire_date` field. Full-text queries find all documents that
match the query string and return them sorted by _relevance_--how good a match
they are for your search terms.
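A combined query along the lines of the employee example could be sketched as the following Query DSL request body, built here as a Python dict. The `about` field and the literal values are assumptions for illustration; the `bool`, `match`, and `range` constructs and the `sort` clause are standard Query DSL.

```python
import json

# Sketch of a combined query: a full-text "match" clause scored for
# relevance, a structured "range" filter (similar to a SQL WHERE), and
# results sorted by hire_date. Field names and values are illustrative.
query = {
    "query": {
        "bool": {
            "must": [{"match": {"about": "search analytics"}}],
            "filter": [{"range": {"age": {"gte": 30}}}],
        }
    },
    "sort": [{"hire_date": {"order": "desc"}}],
}

print(json.dumps(query, indent=2))
```

Sent as the body of a search request against the `employee` index, this would return only documents matching both clauses, ordered by hire date.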
In addition to searching for individual terms, you can perform phrase searches,
similarity searches, prefix searches, and get autocomplete suggestions.
Have geospatial or other numerical data that you want to search? {es} indexes
non-textual data in highly optimized data structures that support
high-performance geo and numerical queries.
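For instance, a geo query might look like this sketch of a `geo_distance` request body (a standard Query DSL query type; the `location` field name and coordinates are assumptions for illustration):

```python
import json

# Hypothetical geo query: match documents whose "location" field
# (a geo_point) lies within 10 km of the given coordinate.
geo_query = {
    "query": {
        "geo_distance": {
            "distance": "10km",
            "location": {"lat": 52.37, "lon": 4.89},
        }
    }
}

print(json.dumps(geo_query, indent=2))
```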
You can access all of these search capabilities using {es}'s
comprehensive JSON-style query language (Query DSL). You can also
construct SQL-style queries to search and aggregate data natively inside
{es}, and JDBC and ODBC drivers enable a broad range of third-party
applications to interact with {es} via SQL.
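A SQL-style request body could be sketched as below; the table and column names are invented for illustration. Such a body is posted to the {es} SQL endpoint, which translates the statement into native queries and aggregations:

```python
import json

# Sketch of an {es} SQL request body (column/table names are made up).
# The familiar SELECT/WHERE/ORDER BY shape is translated by {es} into
# native search and aggregation operations.
sql_request = {
    "query": (
        "SELECT last_name, hire_date FROM employee "
        "WHERE age > 30 ORDER BY hire_date DESC LIMIT 10"
    )
}

print(json.dumps(sql_request))
```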
[float]
[[analyze-data]]
=== Analyzing your data
{es} aggregations enable you to build complex summaries of your data and gain
insight into key metrics, patterns, and trends. Instead of just finding the
proverbial “needle in a haystack”, aggregations enable you to answer questions
like:
* How many needles are in the haystack?
* What is the average length of the needles?
* What is the median length of the needles, broken down by manufacturer?
* How many needles were added to the haystack in each of the last six months?
You can also use aggregations to answer more subtle questions, such as:
* What are your most popular needle manufacturers?
* Are there any unusual or anomalous clumps of needles?
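The first set of needle questions maps naturally onto standard aggregation types. The following request-body sketch (field names are assumptions) pairs each question with an aggregation: `value_count`, `avg`, `percentiles` nested under `terms`, and `date_histogram`:

```python
import json

# Sketch of an aggregations-only request body for the "needle" questions.
# Field names (needle_id, length_mm, manufacturer, added_at) are made up.
aggs_request = {
    "size": 0,  # return aggregations only, no individual hits
    "aggs": {
        # How many needles are in the haystack?
        "needle_count": {"value_count": {"field": "needle_id"}},
        # What is the average length of the needles?
        "avg_length": {"avg": {"field": "length_mm"}},
        # Median length, broken down by manufacturer (50th percentile
        # nested inside a terms bucket per manufacturer).
        "median_length_by_maker": {
            "terms": {"field": "manufacturer"},
            "aggs": {
                "median": {
                    "percentiles": {"field": "length_mm", "percents": [50]}
                }
            },
        },
        # Needles added per month.
        "added_per_month": {
            "date_histogram": {
                "field": "added_at",
                "calendar_interval": "month",
            }
        },
    },
}

print(json.dumps(aggs_request, indent=2))
```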
Because aggregations leverage the same data structures used for search, they are
also very fast. This enables you to analyze and visualize your data in real time.
Your reports and dashboards update as your data changes so you can take action
based on the latest information.
What’s more, aggregations operate alongside search requests. You can search
documents, filter results, and perform analytics at the same time, on the same
data, in a single request. And because aggregations are calculated in the
context of a particular search, you’re not just displaying a count of all
four-star hotels, you’re displaying a count of the four-star hotels
that match your users' search criteria.
[float]
[[more-features]]
==== But wait, there’s more
Want to automate the analysis of your time-series data? You can use the machine
learning features to create accurate baselines of normal behavior in your data
and identify anomalous patterns. With machine learning, you can detect:
* Anomalies related to temporal deviations in values, counts, or frequencies
* Statistical rarity
* Unusual behaviors for a member of a population
And the best part? You can do this without having to specify algorithms, models,
or other data science-related configurations.
[[scalability]]
== Scalability and resilience: clusters, nodes, and shards
{es} is built to be always available and to scale with your needs. It does this
by being distributed by nature. You can add servers (nodes) to a cluster to
increase capacity, and {es} automatically distributes your data and query load
across all of the available nodes. There's no need to overhaul your application:
{es} knows how to balance multi-node clusters to provide scale and high
availability. The more nodes, the merrier.
How does this work? Under the covers, an {es} index is really just a logical
grouping of one or more physical shards, where each shard is actually a
self-contained index. By distributing the documents in an index across multiple
shards, and distributing those shards across multiple nodes, {es} can ensure
redundancy to protect against hardware failures and benefit from increased
query capacity as nodes are added to a cluster. As the cluster grows (or shrinks),
{es} automatically migrates shards to rebalance the cluster.
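The rule that assigns a document to a shard can be sketched as below. {es} routes by `hash(_routing) % number_of_primary_shards` (using a murmur3 hash of the routing value, which defaults to the document ID); this sketch substitutes Python's built-in `hash` purely for illustration:

```python
# Sketch of shard routing: shard = hash(routing) % number_of_primary_shards.
# {es} uses murmur3 on the routing value (default: the document ID);
# Python's hash() stands in here for illustration only.
def route(doc_id: str, num_primary_shards: int) -> int:
    return hash(doc_id) % num_primary_shards

# Each document maps to exactly one primary shard, and the mapping is
# stable -- it only changes if the primary shard count changes, which is
# why that count is fixed when an index is created.
shards_used = {route(f"doc-{i}", 3) for i in range(100)}
print(shards_used)  # some subset of {0, 1, 2}
```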
There are two types of shards: primaries and replicas. Each document in an index
belongs to one primary shard. A replica shard is a copy of a primary shard.
Replicas provide redundant copies of your data to protect against hardware
failure and provide increased capacity to serve read requests
like searching or retrieving a document.
The number of primary shards in an index is fixed at the time that an index is
created, but the number of replica shards can be changed at any time, without
interrupting indexing or query operations.
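In settings terms, that contrast could be sketched as follows (the index name in the comment is hypothetical; `number_of_shards` and `number_of_replicas` are the standard index settings):

```python
import json

# Primary count is set once, in the body used to create the index:
create_body = {
    "settings": {
        "number_of_shards": 3,      # fixed once the index is created
        "number_of_replicas": 1,    # can be changed at any time
    }
}

# Replica count can be raised later, e.g. via an update to the index
# settings (PUT /my-index/_settings), to scale reads without reindexing:
update_body = {"index": {"number_of_replicas": 2}}

print(json.dumps(create_body))
print(json.dumps(update_body))
```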
There are a number of performance considerations and trade-offs with respect
to shard size and the number of primary shards configured for an index. The more
shards, the more overhead there is simply in maintaining those indices. The
larger the shard size, the longer it takes to move shards around when {es}
needs to rebalance a cluster. Querying lots of small shards makes the processing
per shard faster, but more queries means more overhead, so querying a smaller
number of larger shards might be faster. In short...it depends. The best way
to determine the optimal configuration for your use case is through testing
with your own data and queries.
[float]
[[disaster-ccr]]
=== In case of disaster
For performance reasons, the nodes within a cluster need to be on the same
network. Balancing shards in a cluster across nodes in different data centers
simply takes too long. But high-availability architectures demand that you avoid
putting all of your eggs in one basket. In the event of a major outage in one
location, servers in another location need to be able to take over. Seamlessly.
The answer? Cross-cluster replication (CCR).
CCR provides a way to automatically synchronize indices from your primary cluster
to a secondary remote cluster that can serve as a hot backup. If the primary
cluster fails, the secondary cluster can take over. You can also use CCR to
create secondary clusters to serve read requests in geo-proximity to your users.
Cross-cluster replication is active-passive. The index on the primary cluster is
the active leader index and handles all write requests. Indices replicated to
secondary clusters are read-only followers.
[float]
[[admin]]
=== Care and feeding
As with any enterprise system, you need tools to secure, manage, and
monitor your {es} clusters. Security, monitoring, and administrative features
that are integrated into {es} enable you to use {kib} as a control center for
managing a cluster. Features like data rollups and index lifecycle
management help you intelligently manage your data over time.