Add doc_count field mapper #58339
Closed: csoulios wants to merge 53 commits into elastic:feature/aggregate-metrics from csoulios:doc_count-field-mapper
Commits (53, all by csoulios):

- 4b5fab3 Initial version of doc_count field mapper
- cd515b3 added tests
- 655e112 Build fixes
- db13d83 Merge branch 'feature/aggregate-metrics' into doc_count-field-mapper
- 191d793 Added tests for doc_count fieldmapper
- 5f81bee doc count tests
- dab8219 Merge branch 'feature/aggregate-metrics' into doc_count-field-mapper
- ecdc603 Merge branch 'feature/aggregate-metrics' into doc_count-field-mapper
- 520ac9a Resolve conflicts after merge from master
- 676ffc6 Added yaml test for doc_count field type
- 7c7139c Merge branch 'feature/aggregate-metrics' into doc_count-field-mapper
- d3b9c45 Minor changes to test
- c36ecac Fix issue with not-registering field mapper
- 4dca391 Simplify terms agg test
- 912d943 Add doc_count provider in the buckets aggregator
- be46a00 Merge branch 'feature/aggregate-metrics' into doc_count-field-mapper
- c0f23ae Initialize doc_count provider once
- f7b43c1 Added tests for FieldBasedDocCountProvider
- 5e1b96a Added more tests to DocCountFieldMapper
- 80d832b Fixed NPE at AggregatorTestCase
- 1e8b472 Merge branch 'feature/aggregate-metrics' into doc_count-field-mapper
- e24d680 Updated branch to fix build after merge
- 74c727b Added validation for single doc_count field
- cd2c84d Added version skips to fix broken tests
- 91246eb Added documentation for doc_count
- 77aa346 Changes to address review comments:
- 39c43a0 Use _doc_count as Lucene field for doc count
- 8ca3fbc Minor change: field rename
- 83929cb Minor change to yml test.
- 848fc77 Merge branch 'feature/aggregate-metrics' into doc_count-field-mapper
- 0a1731d Fix errors from merge
- 82f092a Converted _doc_count to metadata field type
- ba92359 Throw an error if parsed value is not a number
- cb61366 Merge branch 'feature/aggregate-metrics' into doc_count-field-mapper
- 522c385 Make _doc_count field a metadata field
- df2a2eb Fixed broken tests
- 838436f Fix bug in low cardinality ordinal terms aggs
- 4a92c80 Update docs that _doc_count is a metadata field
- 5d6d037 Fix broken ML tests
- 0ff6fe1 Merge branch 'feature/aggregate-metrics' into doc_count-field-mapper
- 23e4b30 Fix errors after merge
- b258653 Addressed review comments
- f5ed1df Addressed reviewer comments
- 2fcdcf6 Merge branch 'feature/aggregate-metrics' into doc_count-field-mapper
- 4138d16 Added DocCountFieldTypeTests
- 5d38b7f Merge branch 'feature/aggregate-metrics' into doc_count-field-mapper
- 654847e Fix errors after merge
- 7b7ca43 Make composite agg respect _doc_count field
- ce44e87 Merge branch 'feature/aggregate-metrics' into doc_count-field-mapper
- 5621c44 Merge branch 'feature/aggregate-metrics' into doc_count-field-mapper
- 1d969a1 DocCountProvider rethrows IOException instead of swallowing it
- cb05034 Set familyTypeName of _doc_count to integer
- d7d80f4 Merge branch 'feature/aggregate-metrics' into doc_count-field-mapper
@@ -0,0 +1,118 @@
[[mapping-doc-count-field]]
=== `_doc_count` data type
++++
<titleabbrev>_doc_count</titleabbrev>
++++

Bucket aggregations always return a field named `doc_count` showing the number of documents that were aggregated and partitioned
in each bucket. Computing the value of `doc_count` is simple: `doc_count` is incremented by 1 for every document collected
in each bucket.

While this simple approach is effective when computing aggregations over individual documents, it fails to accurately represent
documents that store pre-aggregated data (such as `histogram` or `aggregate_metric_double` fields), because one summary field may
represent multiple documents.

To allow for correct computation of the number of documents when working with pre-aggregated data, we have introduced a
metadata field type named `_doc_count`. `_doc_count` must always be a positive integer representing the number of documents
aggregated in a single summary field.

When a `_doc_count` field is added to a document, all bucket aggregations respect its value and increment the bucket `doc_count`
by the value of the field. If a document does not contain any `_doc_count` field, `_doc_count = 1` is implied by default.

[IMPORTANT]
========
* A `_doc_count` field can only store a single positive integer per document. Nested arrays are not allowed.
* If a document contains no `_doc_count` fields, aggregators will increment by 1, which is the default behavior.
========

[[mapping-doc-count-field-example]]
==== Example

The following <<indices-create-index, create index>> API request creates a new index with the following field mappings:

* `my_histogram`, a `histogram` field used to store percentile data
* `my_text`, a `keyword` field used to store a title for the histogram

[source,console]
--------------------------------------------------
PUT my_index
{
  "mappings" : {
    "properties" : {
      "my_histogram" : {
        "type" : "histogram"
      },
      "my_text" : {
        "type" : "keyword"
      }
    }
  }
}
--------------------------------------------------

The following <<docs-index_,index>> API requests store pre-aggregated data for
two histograms: `histogram_1` and `histogram_2`.

[source,console]
--------------------------------------------------
PUT my_index/_doc/1
{
  "my_text" : "histogram_1",
  "my_histogram" : {
    "values" : [0.1, 0.2, 0.3, 0.4, 0.5],
    "counts" : [3, 7, 23, 12, 6]
  },
  "_doc_count": 45 <1>
}

PUT my_index/_doc/2
{
  "my_text" : "histogram_2",
  "my_histogram" : {
    "values" : [0.1, 0.25, 0.35, 0.4, 0.45, 0.5],
    "counts" : [8, 17, 8, 7, 6, 2]
  },
  "_doc_count": 62 <1>
}
--------------------------------------------------
<1> The `_doc_count` field must be a positive integer storing the number of documents aggregated to produce each histogram.

If we run the following <<search-aggregations-bucket-terms-aggregation, terms aggregation>> on `my_index`:

[source,console]
--------------------------------------------------
GET my_index/_search
{
  "aggs" : {
    "histogram_titles" : {
      "terms" : { "field" : "my_text" }
    }
  }
}
--------------------------------------------------

We get the following response:

[source,console-result]
--------------------------------------------------
{
  ...
  "aggregations" : {
    "histogram_titles" : {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets" : [
        {
          "key" : "histogram_2",
          "doc_count" : 62
        },
        {
          "key" : "histogram_1",
          "doc_count" : 45
        }
      ]
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[skip:test not setup]
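The counting rule documented above can be modelled in a few lines. This is a minimal single-shard Python sketch, not Elasticsearch code: `doc_weight` and `terms_buckets` are hypothetical names, and breaking `doc_count` ties by key ascending is an assumption of the sketch.

```python
from collections import Counter


def doc_weight(doc: dict) -> int:
    # A document without `_doc_count` counts as 1 (the documented default).
    return doc.get("_doc_count", 1)


def terms_buckets(docs: list, field: str) -> list:
    """Simplified single-shard terms aggregation over in-memory documents."""
    counts: Counter = Counter()
    for doc in docs:
        if field in doc:
            # Increment the bucket by the document's weight instead of by 1.
            counts[doc[field]] += doc_weight(doc)
    # Order by doc_count descending; ties broken by key ascending (assumed).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered]


# The two documents indexed in the example (histogram fields omitted).
docs = [
    {"my_text": "histogram_1", "_doc_count": 45},
    {"my_text": "histogram_2", "_doc_count": 62},
]
print(terms_buckets(docs, "my_text"))
# [{'key': 'histogram_2', 'doc_count': 62}, {'key': 'histogram_1', 'doc_count': 45}]
```

Without the `_doc_count` values, both buckets in this model would report `doc_count: 1`, which is the inaccuracy the field is meant to fix.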
...api-spec/src/main/resources/rest-api-spec/test/search.aggregation/370_doc_count_field.yml (150 additions, 0 deletions)
@@ -0,0 +1,150 @@
setup:
  - do:
      indices.create:
        index: test_1
        body:
          settings:
            number_of_replicas: 0
          mappings:
            properties:
              str:
                type: keyword
              number:
                type: integer

  - do:
      bulk:
        index: test_1
        refresh: true
        body:
          - '{"index": {}}'
          - '{"_doc_count": 10, "str": "abc", "number" : 500, "unmapped": "abc" }'
          - '{"index": {}}'
          - '{"_doc_count": 5, "str": "xyz", "number" : 100, "unmapped": "xyz" }'
          - '{"index": {}}'
          - '{"_doc_count": 7, "str": "foo", "number" : 100, "unmapped": "foo" }'
          - '{"index": {}}'
          - '{"_doc_count": 1, "str": "foo", "number" : 200, "unmapped": "foo" }'
          - '{"index": {}}'
          - '{"str": "abc", "number" : 500, "unmapped": "abc" }'

---
"Test numeric terms agg with doc_count":
  - skip:
      version: " - 7.99.99"
      reason: "Doc count fields are only implemented in 8.0"

  - do:
      search:
        rest_total_hits_as_int: true
        body: { "size" : 0, "aggs" : { "num_terms" : { "terms" : { "field" : "number" } } } }

  - match: { hits.total: 5 }
  - length: { aggregations.num_terms.buckets: 3 }
  - match: { aggregations.num_terms.buckets.0.key: 100 }
  - match: { aggregations.num_terms.buckets.0.doc_count: 12 }
  - match: { aggregations.num_terms.buckets.1.key: 500 }
  - match: { aggregations.num_terms.buckets.1.doc_count: 11 }
  - match: { aggregations.num_terms.buckets.2.key: 200 }
  - match: { aggregations.num_terms.buckets.2.doc_count: 1 }

---
"Test keyword terms agg with doc_count":
  - skip:
      version: " - 7.99.99"
      reason: "Doc count fields are only implemented in 8.0"
  - do:
      search:
        rest_total_hits_as_int: true
        body: { "size" : 0, "aggs" : { "str_terms" : { "terms" : { "field" : "str" } } } }

  - match: { hits.total: 5 }
  - length: { aggregations.str_terms.buckets: 3 }
  - match: { aggregations.str_terms.buckets.0.key: "abc" }
  - match: { aggregations.str_terms.buckets.0.doc_count: 11 }
  - match: { aggregations.str_terms.buckets.1.key: "foo" }
  - match: { aggregations.str_terms.buckets.1.doc_count: 8 }
  - match: { aggregations.str_terms.buckets.2.key: "xyz" }
  - match: { aggregations.str_terms.buckets.2.doc_count: 5 }

---
"Test unmapped string terms agg with doc_count":
  - skip:
      version: " - 7.99.99"
      reason: "Doc count fields are only implemented in 8.0"
  - do:
      bulk:
        index: test_2
        refresh: true
        body:
          - '{"index": {}}'
          - '{"_doc_count": 10, "str": "abc" }'
          - '{"index": {}}'
          - '{"str": "abc" }'
  - do:
      search:
        index: test_2
        rest_total_hits_as_int: true
        body: { "size" : 0, "aggs" : { "str_terms" : { "terms" : { "field" : "str.keyword" } } } }

  - match: { hits.total: 2 }
  - length: { aggregations.str_terms.buckets: 1 }
  - match: { aggregations.str_terms.buckets.0.key: "abc" }
  - match: { aggregations.str_terms.buckets.0.doc_count: 11 }

---
"Test composite str_terms agg with doc_count":
  - skip:
      version: " - 7.99.99"
      reason: "Doc count fields are only implemented in 8.0"
  - do:
      search:
        rest_total_hits_as_int: true
        body: {
            "size": 0,
            "aggs": {
              "composite_agg": {
                "composite": {
                  "sources": [ { "str_terms": { "terms": { "field": "str" } } } ]
                }
              }
            }
          }

  - match: { hits.total: 5 }
  - length: { aggregations.composite_agg.buckets: 3 }
  - match: { aggregations.composite_agg.buckets.0.key.str_terms: "abc" }
  - match: { aggregations.composite_agg.buckets.0.doc_count: 11 }
  - match: { aggregations.composite_agg.buckets.1.key.str_terms: "foo" }
  - match: { aggregations.composite_agg.buckets.1.doc_count: 8 }
  - match: { aggregations.composite_agg.buckets.2.key.str_terms: "xyz" }
  - match: { aggregations.composite_agg.buckets.2.doc_count: 5 }

---
"Test composite num_terms agg with doc_count":
  - skip:
      version: " - 7.99.99"
      reason: "Doc count fields are only implemented in 8.0"
  - do:
      search:
        rest_total_hits_as_int: true
        body: {
            "size": 0,
            "aggs": {
              "composite_agg": {
                "composite": {
                  "sources": [ { "num_terms": { "terms": { "field": "number" } } } ]
                }
              }
            }
          }

  - match: { hits.total: 5 }
  - length: { aggregations.composite_agg.buckets: 3 }
  - match: { aggregations.composite_agg.buckets.0.key.num_terms: 100 }
  - match: { aggregations.composite_agg.buckets.0.doc_count: 12 }
  - match: { aggregations.composite_agg.buckets.1.key.num_terms: 200 }
  - match: { aggregations.composite_agg.buckets.1.doc_count: 1 }
  - match: { aggregations.composite_agg.buckets.2.key.num_terms: 500 }
  - match: { aggregations.composite_agg.buckets.2.doc_count: 11 }
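The expected bucket counts in these tests follow from summing `_doc_count` per bucket, using 1 for the last `setup` document, which has no `_doc_count`. The Python sketch below recomputes them from the `setup` bulk request; `bucket_counts`, `terms_order` and `composite_order` are hypothetical helpers, the `unmapped` field is omitted, and the two orderings are assumptions chosen to match the assertions (`terms` by count descending, `composite` by key ascending). It also shows why `hits.total` stays 5: hits count indexed documents, while the bucket counts sum to 24.

```python
from collections import defaultdict

# Documents from the bulk request in `setup` ("unmapped" omitted).
DOCS = [
    {"_doc_count": 10, "str": "abc", "number": 500},
    {"_doc_count": 5, "str": "xyz", "number": 100},
    {"_doc_count": 7, "str": "foo", "number": 100},
    {"_doc_count": 1, "str": "foo", "number": 200},
    {"str": "abc", "number": 500},  # no _doc_count, so it counts as 1
]


def bucket_counts(docs, field):
    """Sum each document's _doc_count (default 1) into its bucket."""
    counts = defaultdict(int)
    for doc in docs:
        counts[doc[field]] += doc.get("_doc_count", 1)
    return dict(counts)


def terms_order(counts):
    # terms: doc_count descending, key ascending on ties
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def composite_order(counts):
    # composite: key ascending
    return sorted(counts.items())


for field in ("number", "str"):
    counts = bucket_counts(DOCS, field)
    print(field, "terms:", terms_order(counts))
    print(field, "composite:", composite_order(counts))
print("hits:", len(DOCS), "sum of doc counts:", sum(d.get("_doc_count", 1) for d in DOCS))
```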
Random thought while reading the restrictions: is it possible to define `_doc_count` as an object? We should forbid that as well if it isn't already, but I suspect the current restrictions prevent it from being an object too.
There is an assertion that the input is a `VALUE_NUMBER` in the `parseCreateField()` method
(elasticsearch/server/src/main/java/org/elasticsearch/index/mapper/DocCountFieldMapper.java, line 113 in 7b7ca43).
Is there anything else that should be added?
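The restrictions discussed in this thread (a single positive integer, no arrays or objects, an error for non-numeric values) can be expressed as a small validator. This is a hypothetical Python sketch over a JSON document source, not the Java `parseCreateField()` logic; the name `parse_doc_count`, the error messages, and the choice to reject non-integral numbers such as `2.5` are assumptions.

```python
import json


def parse_doc_count(source_json: str) -> int:
    """Hypothetical validation of `_doc_count` in a document source."""
    source = json.loads(source_json)
    if "_doc_count" not in source:
        return 1  # documented default when the field is absent
    value = source["_doc_count"]
    if isinstance(value, (dict, list)):
        # Objects and arrays are rejected (the reviewer's question).
        raise ValueError("[_doc_count] must be a single value, not an object or array")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("[_doc_count] must be an integer")
    if value <= 0:
        raise ValueError("[_doc_count] must be a positive integer")
    return value


print(parse_doc_count('{"str": "abc"}'))                    # 1
print(parse_doc_count('{"_doc_count": 10, "str": "abc"}'))  # 10
try:
    parse_doc_count('{"_doc_count": {"value": 3}}')
except ValueError as err:
    print(err)  # [_doc_count] must be a single value, not an object or array
```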