fix build, titles, formatting
polyfractal committed May 30, 2014
1 parent 3f0c9b2 commit 6be6384
Showing 12 changed files with 46 additions and 47 deletions.
3 changes: 0 additions & 3 deletions 300_Aggregations/05_overview.asciidoc
@@ -1,7 +1,4 @@
-
== Elasticsearch offers more than just search
-
-
Up until this point, this book has been dedicated to search. With search,
we have a query and we wish to find a subset of documents which
match the query. We are looking for the proverbial needle(s) in the
14 changes: 7 additions & 7 deletions 300_Aggregations/15_concepts_buckets.asciidoc
@@ -1,5 +1,5 @@

-=== High-level concepts
+== High-level concepts

Like the query DSL, aggregations have a _composable_ syntax: independent units
of functionality can be mixed and matched to provide the custom behavior that
@@ -14,12 +14,12 @@ _Metrics_:: Statistics calculated on the documents in a bucket.
That's it! Every aggregation is simply a combination of one or more buckets
and zero or more metrics. To translate into rough SQL terms:

-[source]
-----
+[source,sql]
+--------------------------------------------------
SELECT COUNT(color) <1>
FROM table
GROUP BY color <2>
-----
+--------------------------------------------------
<1> `COUNT(color)` is equivalent to a metric
<2> `GROUP BY color` is equivalent to a bucket
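In the aggregation DSL, that SQL sketch translates roughly to a `terms` bucket
(an illustrative sketch; the index and field names are placeholders):

[source,js]
--------------------------------------------------
GET /table/_search?search_type=count
{
    "aggs" : {
        "colors" : {
            "terms" : { "field" : "color" }
        }
    }
}
--------------------------------------------------

Each bucket in the response carries a `doc_count`, which plays the role of
`COUNT(color)` in the SQL version.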

@@ -29,7 +29,7 @@ to `COUNT()`, `SUM()`, `MAX()`, etc

Let's dig into both of these concepts and see what they entail.

-==== Buckets
+=== Buckets

A bucket is simply a collection of documents that meet a certain criterion.

@@ -51,7 +51,7 @@ partition documents in many different ways (by hour, by most popular terms, by
age ranges, by geographical location, etc). But fundamentally they all operate
on the same principle: partitioning documents based on a criterion.

-==== Metrics
+=== Metrics

Buckets allow us to partition documents into useful subsets, but ultimately what
we want is some kind of _metric_ calculated on those documents in each bucket.
@@ -63,7 +63,7 @@ which are calculated using the document values. In practical terms, metrics all
you to calculate quantities such as the average salary, or the maximum sale price,
or the 95th percentile for query latency.

-==== Combining the two
+=== Combining the two

An aggregation is a combination of buckets and metrics. An aggregation may have
a single bucket, or a single metric, or one of each. It may even have multiple
4 changes: 2 additions & 2 deletions 300_Aggregations/20_basic_example.asciidoc
@@ -1,6 +1,6 @@
// This section feels like you're worrying too much about explaining the syntax, rather than the point of aggs. By this stage in the book, people should be used to the ES api, so I think we can assume more. I'd change the emphasis here and state that intention: we want to find out what the most popular colours are. To do that we'll use a "terms" agg, which counts up every term in the "color" field and returns the 10 most popular.
// Step two: Add a query, to show that the aggs are calculated live on the results from the user's query.
-=== Aggregation Test-drive
+== Aggregation Test-drive

We could spend the next few pages defining the various aggregations
and their syntax, but aggregations are truly best learned by example.
@@ -56,7 +56,7 @@ GET /cars/transactions/_search?search_type=count <1>

// Add the search_type=count thing as a sidebar, so it doesn't get in the way
<1> Because we don't care about search results, we are going to use the `count`
-<<search-type,`search_type`>, which will be faster.
+<<search-type,search_type>>, which will be faster.
<2> Aggregations are placed under the top-level `"aggs"` parameter (the longer `"aggregations"`
will also work if you prefer that)
<3> We then name the aggregation whatever we want -- "popular_colors" in this example
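Pieced together, the request the callouts describe looks roughly like the
following sketch (the body of the block is elided in this diff, so this is a
reconstruction from the callouts, not the verbatim snippet):

[source,js]
--------------------------------------------------
GET /cars/transactions/_search?search_type=count
{
    "aggs" : {
        "popular_colors" : {
            "terms" : {
              "field" : "color"
            }
        }
    }
}
--------------------------------------------------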
6 changes: 3 additions & 3 deletions 300_Aggregations/28_bucket_metric_list.asciidoc
@@ -1,6 +1,6 @@
// I'd limit this list to the metrics and rely on the obvious. You don't need to explain what min/max/avg etc are. Then say that we'll discusss these more interesting metrics in later chapters: cardinality, percentiles, significant terms. The buckets I'd mention under the relevant section, eg Histo & Range, etc

-=== Available Buckets and Metrics
+== Available Buckets and Metrics

There are a number of different buckets and metrics. The reference documentation
does a great job describing the various parameters and how they affect
@@ -9,7 +9,7 @@ link to the reference docs and provide a brief description. Skim the list
so that you know what is available, and check the reference docs when you need
exact parameters.

-==== Buckets
+=== Buckets

- http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-global-aggregation.html[Global]: includes all documents in your index
- http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filter-aggregation.html[Filter]: only includes documents that match
@@ -32,7 +32,7 @@ exact parameters.
- http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-geohashgrid-aggregation.html[Geohash Grid]: partitions documents according to
what geohash grid they fall into

-==== Metrics
+=== Metrics

- Individual statistics: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-min-aggregation.html[Min], http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-max-aggregation.html[Max], http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-avg-aggregation.html[Avg], http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-sum-aggregation.html[Sum]
- http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-stats-aggregation.html[Stats]: calculates min/mean/max/sum/count of documents in bucket
5 changes: 2 additions & 3 deletions 300_Aggregations/30_histogram.asciidoc
@@ -16,8 +16,7 @@ The histogram works by specifying an interval. If we were histogram'ing sale
prices, you might specify an interval of 20,000. This would create a new bucket
every $20,000. Documents are then sorted into buckets.

-Since you've already seen a few examples of aggregations, we'll go straight to a
-nested example. For our dashboard, we want a bar chart of car sale prices, but we
+For our dashboard, we want a bar chart of car sale prices, but we
also want to know the top selling make per price range. This is easily accomplished
using a `terms` bucket nested inside the `histogram`:
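The request body is elided in this diff; a sketch of it, with field names
assumed from the surrounding text, might look roughly like:

[source,js]
--------------------------------------------------
GET /cars/transactions/_search?search_type=count
{
   "aggs": {
      "price": {
         "histogram": {
            "field": "price",
            "interval": 20000
         },
         "aggs": {
            "make": {
               "terms": {
                  "field": "make",
                  "size": 1
               }
            }
         }
      }
   }
}
--------------------------------------------------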

@@ -54,7 +53,7 @@ top make per price range
As you can see, our query is built around the "price" aggregation, which contains
a `histogram` bucket. This bucket requires a numeric field to calculate
buckets on, and an interval size. The interval defines how "wide" each bucket
-is. An interval of 20000 means we will have ranges [0-20000, 20000-40000, etc]
+is. An interval of 20000 means we will have ranges `[0-20000, 20000-40000, ...]`

Next, we define a nested bucket inside of the histogram. This is a `terms` bucket
over the "make" field. There is also a new "size" parameter, which defines how
23 changes: 13 additions & 10 deletions 300_Aggregations/35_date_histogram.asciidoc
@@ -1,5 +1,5 @@

-=== Looking at time
+== Looking at time

If search is the most popular activity in Elasticsearch, building date
histograms must be the second most popular. Why would you want to use a date
@@ -30,14 +30,15 @@ Technically, yes. A regular `histogram` bucket will work with dates. However,
it is not calendar-aware. With the `date_histogram`, you can specify intervals
such as `1 month`, which knows that February is shorter than December. The
`date_histogram` also has the advantage of being able to work with timezones,
-such as displaying a graph in the timezone of the user rather than the server.
+which allows you to customize graphs to the timezone of the user, not the server.
The regular histogram will interpret dates as numbers, which means you must specify
intervals in terms of milliseconds. And the aggregation doesn't know about
calendar intervals, which makes it largely useless for dates.
****

-Our first example will build a simple line chart: how many cars were sold each month?
+Our first example will build a simple line chart to answer the question:
+how many cars were sold each month?

[source,js]
--------------------------------------------------
@@ -65,7 +66,8 @@ per month. This will give us the number of cars sold in each month. An additio
dates are simply represented as a numeric value. This tends to make UI designers
grumpy, however, so a prettier format can be specified using common date formatting.

-The response is both expected and a little surprising:
+The response is both expected and a little surprising (see if you can spot
+the "surprise"):

[source,js]
--------------------------------------------------
@@ -117,19 +119,20 @@ The response is both expected and a little surprising:
The aggregation is represented in full. As you can see, we have buckets
which represent months, a count of docs in each month, and our pretty "key_as_string".

-==== Returning empty buckets
+=== Returning empty buckets

Notice something odd about that last response?

-Yep, that's right. We are missing months! By default, the `date_histogram`
-and (`histogram` too, for that matter) only returns buckets which have a non-zero
+Yep, that's right. We are missing a few months! By default, the `date_histogram`
+(and `histogram` too) only returns buckets which have a non-zero
document count.

This means your histogram will be a minimal response. Often, this is not the
behavior you actually want. For many applications, you would like to dump the
response directly into a graphing library without doing any post-processing.

-There are two additional parameters we can set which will provide this behavior:
+Essentially, we want buckets even if they have a count of zero. There are two
+additional parameters we can set which will provide this behavior:

[source,js]
--------------------------------------------------
@@ -153,7 +156,7 @@ GET /cars/transactions/_search?search_type=count
--------------------------------------------------
// SENSE: 300_Aggregations/35_date_histogram.json
<1> This parameter forces empty buckets to be returned
-<2> While this parameter forces the entire year to be returned
+<2> This parameter forces the entire year to be returned

The two additional parameters will force the response to return all months in the
year, regardless of their doc count. The `min_doc_count` is very understandable:
@@ -171,7 +174,7 @@ minimum value or _after_ the maximum value.
The `extended_bounds` parameter does just that. Once you add those two settings,
you'll get a response that is easy to plug straight into your graphing libraries.
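Concretely, the two settings sit alongside the other `date_histogram`
parameters, roughly like this (a sketch; the field name and bounds dates are
assumed for illustration):

[source,js]
--------------------------------------------------
GET /cars/transactions/_search?search_type=count
{
   "aggs": {
      "sales": {
         "date_histogram": {
            "field": "sold",
            "interval": "month",
            "format": "yyyy-MM-dd",
            "min_doc_count" : 0,
            "extended_bounds" : {
                "min" : "2014-01-01",
                "max" : "2014-12-31"
            }
         }
      }
   }
}
--------------------------------------------------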

-==== Extended Example
+=== Extended Example

Just like we've seen a dozen times already, buckets can be nested in buckets for
more sophisticated behavior. For illustration, we'll build an aggregation
6 changes: 3 additions & 3 deletions 300_Aggregations/40_scope.asciidoc
@@ -1,5 +1,5 @@

-=== Scoping Aggregations
+== Scoping Aggregations

With all of the aggregation examples given so far, you may have noticed that we
omitted a `query` from the search request. The entire request was
@@ -136,11 +136,11 @@ by adding a search bar. This allows the user to search for terms and see all
of the graphs (which are powered by aggregations, and thus scoped to the query)
update in real-time. Try that with Hadoop!

-<TODO> Maybe add two screenshots of a Kibana dashboard that changes considerably
+//<TODO> Maybe add two screenshots of a Kibana dashboard that changes considerably
when the search changes?


-==== Global Bucket
+=== Global Bucket

You'll often want your aggregation to be scoped to your query. But sometimes
you'll want to search for some subset of data, but aggregate across _all_ of
8 changes: 4 additions & 4 deletions 300_Aggregations/45_filtering.asciidoc
@@ -1,11 +1,11 @@

-=== Filtering Aggregations
+== Filtering Queries and Aggregations

A natural extension to aggregation scoping is filtering. Because the aggregation
operates in the context of the query scope, any filter applied to the query
will also apply to the aggregation.

-==== Filtered Query
+=== Filtered Query
If we want to find all cars over $10,000 and also calculate the average price
for those cars, we can simply use a `filtered` query:
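The request body is elided in this diff; a minimal sketch of it (field names
assumed) combines a `filtered` query with an `avg` metric:

[source,js]
--------------------------------------------------
GET /cars/transactions/_search?search_type=count
{
    "query" : {
        "filtered": {
            "filter": {
                "range": {
                    "price": {
                        "gte": 10000
                    }
                }
            }
        }
    },
    "aggs" : {
        "single_avg_price": {
            "avg" : { "field" : "price" }
        }
    }
}
--------------------------------------------------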

@@ -36,7 +36,7 @@ query like we discussed in the last section. The query (which happens to includ
a filter) returns a certain subset of documents, and the aggregation operates
on those documents.

-==== Filter bucket
+=== Filter bucket

But what if you would like to filter just the aggregation results? Imagine we
are building the search page for our car dealership. We want to display
@@ -91,7 +91,7 @@ Since the `filter` bucket operates like any other bucket, you are free to nest
other buckets and metrics inside. All nested components will "inherit" the filter.
This allows you to filter selective portions of the aggregation as required.

-==== Post Filter
+=== Post Filter

So far, we have a way to filter both the search results and aggregations (a
`filtered` query), as well as filtering individual portions of the aggregation
18 changes: 6 additions & 12 deletions 300_Aggregations/50_sorting_ordering.asciidoc
@@ -1,5 +1,5 @@

-=== Sorting multi-value buckets
+== Sorting multi-value buckets

Multi-value buckets -- like the `terms`, `histogram` and `date_histogram` --
dynamically produce many buckets. How does Elasticsearch decide what order
@@ -12,7 +12,7 @@ criteria: price, population, frequency.
But sometimes you'll want to modify this sort order, and there are a few ways to
do it depending on the bucket.

-==== Intrinsic sorts
+=== Intrinsic sorts

These sort modes are "intrinsic" to the bucket; they operate on data that the bucket
generates such as `doc_count`. They share the same syntax but differ slightly
@@ -47,7 +47,7 @@ one of several values:
- `_key`: Sort by the numeric value of each bucket's key (conceptually similar to `_term`).
Works only with `histogram` and `date_histogram`
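For example, an intrinsic sort by ascending document count might look roughly
like this (a sketch; the field name is illustrative):

[source,js]
--------------------------------------------------
GET /cars/transactions/_search?search_type=count
{
    "aggs" : {
        "colors" : {
            "terms" : {
              "field" : "color",
              "order" : {
                "_count" : "asc"
              }
            }
        }
    }
}
--------------------------------------------------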

-==== Sorting by a metric
+=== Sorting by a metric

Often, you'll find yourself wanting to sort based on a metric's calculated value.
For our car sales analytics dashboard, we may want to build a bar chart of
@@ -86,14 +86,8 @@ the name of the metric. Some metrics, however, emit multiple values. The
`extended_stats` metric is a good example: it provides half a dozen individual
metrics.

-[INFO]
-.Applicable buckets
-====
-Metric-based sorting works with `terms`, `histogram` and `date_histogram`
-====
-
-If you want to sort on a multi-value metric, you just need to use the fully-qualified
-dot path:
+If you want to sort on a multi-value metric, you just need to use the
+dot-path to the metric of interest:

[source,js]
--------------------------------------------------
Expand Down Expand Up @@ -122,7 +116,7 @@ GET /cars/transactions/_search?search_type=count
In this example we are sorting on the variance of each bucket, so that colors
with the least variance in price will appear before those that have more variance.
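The dot-path order clause described here might look roughly like the following
sketch (the metric name `stats` and field names are assumed for illustration):

[source,js]
--------------------------------------------------
GET /cars/transactions/_search?search_type=count
{
    "aggs" : {
        "colors" : {
            "terms" : {
              "field" : "color",
              "order" : {
                "stats.variance" : "asc"
              }
            },
            "aggs": {
                "stats" : {
                    "extended_stats" : { "field" : "price" }
                }
            }
        }
    }
}
--------------------------------------------------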

-==== Sorting based on "deep" metrics
+=== Sorting based on "deep" metrics

In the prior examples, the metric was a direct child of the bucket. An average
price was calculated for each term. It is possible to sort on "deeper" metrics,
2 changes: 2 additions & 0 deletions 304_Approximate_Aggregations.asciidoc
@@ -1 +1,3 @@
+
+== Approximate Aggregations (todo)
TODO
2 changes: 2 additions & 0 deletions 305_Significant_Terms.asciidoc
@@ -1 +1,3 @@
+
+== Significant Terms (todo)
TODO
2 changes: 2 additions & 0 deletions 306_Practical_Considerations.asciidoc
@@ -1 +1,3 @@
+
+== Practical Considerations (todo)
TODO
