elasticsearch_indices_mappings_stats_fields metric doesn't appear to report accurate count #674

Open
hartfordfive opened this issue Jan 20, 2023 · 1 comment

@hartfordfive
Contributor

hartfordfive commented Jan 20, 2023

Problem Description

I've recently enabled the es.indices_mappings flag so that I can monitor and alert when index field limits are exceeded. Unfortunately, the number reported by the exporter does not appear to be accurate. I've confirmed this by indexing a sample document, observing the resulting response from Elasticsearch, and then checking the fields via the field capabilities API. The steps to reproduce this issue are outlined below.

Relevant Software Versions

  • Elasticsearch 8.4

Reproducing the problem

1. Create a test index

PUT /dummy-data-2023.01.20/

response:

{
	"acknowledged": true,
	"shards_acknowledged": true,
	"index": "dummy-data-2023.01.20"
}

2. Confirm index mapping is empty:

GET /dummy-data-2023.01.20/_mapping

response:

{
	"dummy-data-2023.01.20": {
		"mappings": {}
	}
}

and also via the field capabilities API:

GET /dummy-data-2023.01.20/_field_caps?fields=*&filters=-metadata

response:

{
	"indices": [
		"dummy-data-2023.01.20"
	],
	"fields": {}
}

3. Set the max number of fields in the index to 40

Note that, by default, each string field is dynamically mapped as a text type with a "keyword" sub-field defined via the "fields" property.

PUT /dummy-data-2023.01.20/_settings
{
	"index.mapping.total_fields.limit": 40
}

response:

{
	"acknowledged": true
}

and confirm the updated settings have been applied:

GET /dummy-data-2023.01.20/_settings

response:

{
	"dummy-data-2023.01.20": {
		"settings": {
			"index": {
				"routing": {
					"allocation": {
						"include": {
							"_tier_preference": "data_content"
						}
					}
				},
				"mapping": {
					"total_fields": {
						"limit": "40"
					}
				},
				"number_of_shards": "1",
				"provided_name": "dummy-data-2023.01.20",
				"creation_date": "1674226159773",
				"number_of_replicas": "1",
				"uuid": "pSGTiVyuTOWvb3H38VLrDA",
				"version": {
					"created": "8040399"
				}
			}
		}
	}
}

4. Index a document that has exactly the max number of fields

POST /dummy-data-2023.01.20/_doc
{
	"data": {
		"field1": "value",
		"field2": "value",
		"field3": "value",
		"field4": "value",
		"field5": "value",
		"field6": "value",
		"field7": "value",
		"field8": "value",
		"field9": 9,
		"field10": 10
	},
	"data2": {
		"field1": "value",
		"field2": "value",
		"field3": "value",
		"field4": "value",
		"field5": "value",
		"nested_field6": {
			"field1": "value",
			"field2": "value",
			"field3": "value",
			"field4": "value",
			"field5": 1222
		}
	}
}

Breakdown of the 40 fields:

  • The top-level fields data and data2, as well as data2.nested_field6, are objects and each count as 1 field (3 total).
  • The nested fields data.field1 to data.field8 each have a text type and a keyword type via the "fields" property (8 fields x 2 = 16 total).
  • The data.field9 and data.field10 fields are typed as integers, so each counts as only one field (2 total).
  • The nested fields data2.field1 to data2.field5 each have a text type and a keyword type via the "fields" property (5 fields x 2 = 10 total).
  • The nested fields data2.nested_field6.field1 to data2.nested_field6.field4 each have a text type and a keyword type via the "fields" property (4 fields x 2 = 8 total).
  • The field data2.nested_field6.field5 is typed as an integer and thus counts as one field (1 total).
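The arithmetic of this breakdown can be checked with a small Go sketch. The fieldGroup type and totalFields helper are hypothetical names used only for illustration; each row encodes one bullet above, with 2 fields per text field (text plus its keyword multi-field) and 1 for objects and numeric fields. Dropping the first row (the three objects) gives 37.

```go
package main

import "fmt"

// fieldGroup is one bullet of the breakdown: how many mapping entries of a
// kind exist and how many fields each entry contributes to the total.
type fieldGroup struct {
	desc     string
	entries  int
	perEntry int // 2 for text + keyword multi-field, 1 for objects and numbers
}

var breakdown = []fieldGroup{
	{"objects: data, data2, data2.nested_field6", 3, 1},
	{"data.field1..field8 (text + keyword)", 8, 2},
	{"data.field9, data.field10 (integer)", 2, 1},
	{"data2.field1..field5 (text + keyword)", 5, 2},
	{"data2.nested_field6.field1..field4 (text + keyword)", 4, 2},
	{"data2.nested_field6.field5 (integer)", 1, 1},
}

// totalFields sums entries * perEntry over all groups.
func totalFields(groups []fieldGroup) int {
	total := 0
	for _, g := range groups {
		total += g.entries * g.perEntry
	}
	return total
}

func main() {
	for _, g := range breakdown {
		fmt.Printf("%-55s %2d\n", g.desc, g.entries*g.perEntry)
	}
	fmt.Println("total:", totalFields(breakdown))                 // total: 40
	fmt.Println("without objects:", totalFields(breakdown[1:])) // without objects: 37
}
```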

5. View the updated index mapping

I've excluded the index mapping here for brevity and instead list the fields returned by the field capabilities API, which confirms that the index has exactly 40 fields:

GET /dummy-data-2023.01.20/_field_caps?fields=*&filters=-metadata

data
data.field1
data.field1.keyword
data.field2
data.field2.keyword
data.field3
data.field3.keyword
data.field4
data.field4.keyword
data.field5
data.field5.keyword
data.field6
data.field6.keyword
data.field7
data.field7.keyword
data.field8
data.field8.keyword
data.field9
data.field10
data2
data2.field1
data2.field1.keyword
data2.field2
data2.field2.keyword
data2.field3
data2.field3.keyword
data2.field4
data2.field4.keyword
data2.field5
data2.field5.keyword
data2.nested_field6
data2.nested_field6.field1
data2.nested_field6.field1.keyword
data2.nested_field6.field2
data2.nested_field6.field2.keyword
data2.nested_field6.field3
data2.nested_field6.field3.keyword
data2.nested_field6.field4
data2.nested_field6.field4.keyword
data2.nested_field6.field5
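The count above can also be taken programmatically from the raw response. This is a minimal Go sketch, assuming the response shape shown in step 2 (an "indices" array plus a "fields" object keyed by field name); fieldCapsResponse and countFieldCaps are hypothetical names, and the per-type capability details are deliberately left undecoded. As the Possible Solution section points out, this API merges fields across all requested indices, so the result is only a per-index count when a single index is queried.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// fieldCapsResponse decodes only what is needed to count fields; the
// per-field capability objects are kept as raw JSON.
type fieldCapsResponse struct {
	Indices []string                   `json:"indices"`
	Fields  map[string]json.RawMessage `json:"fields"`
}

// countFieldCaps returns the number of distinct field names in a
// _field_caps response body (metadata fields are excluded server-side
// when the request uses filters=-metadata).
func countFieldCaps(body []byte) (int, error) {
	var resp fieldCapsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return 0, err
	}
	return len(resp.Fields), nil
}

func main() {
	// Trimmed, illustrative body: real responses list all 40 fields and
	// carry more capability keys per type.
	body := []byte(`{
		"indices": ["dummy-data-2023.01.20"],
		"fields": {
			"data": {"object": {"type": "object"}},
			"data.field9": {"long": {"type": "long"}}
		}
	}`)
	n, err := countFieldCaps(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(n) // 2
}
```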

6. Attempt to index a document that exceeds the max number of fields by 1 new field

POST /dummy-data-2023.01.20/_doc
{
	"data": {
		"field1": "value",
		"field2": "value",
		"field3": "value",
		"field4": "value",
		"field5": "value",
		"field6": "value",
		"field7": "value",
		"field8": "value",
		"field9": 9,
		"field10": 10
	},
	"data2": {
		"field1": "value",
		"field2": "value",
		"field3": "value",
		"field4": "value",
		"field5": "value",
		"nested_field6": {
			"field1": "value",
			"field2": "value",
			"field3": "value",
			"field4": "value",
			"field5": 1222
		},
		"field7": 1.3456
	}
}

In this case, the new data2.field7 field triggers a mapping update, which fails:

{
	"error": {
		"root_cause": [
			{
				"type": "mapper_parsing_exception",
				"reason": "failed to parse"
			}
		],
		"type": "mapper_parsing_exception",
		"reason": "failed to parse",
		"caused_by": {
			"type": "illegal_argument_exception",
			"reason": "Limit of total fields [40] has been exceeded while adding new fields [1]"
		}
	},
	"status": 400
}

7. Observe the field count reported by the exporter for this index

Running the following PromQL query:

elasticsearch_indices_mappings_stats_fields{index="dummy-data-2023.01.20"}

shows a result of 37 fields:

elasticsearch_indices_mappings_stats_fields{
  index="dummy-data-2023.01.20", 
  instance="es-exporter.local:443", 
  job="es-exporter", 
  namespace="elasticsearch", 
  pod_name="es-exporter-271vfb4822-x2v23"
 } 37

We know this is incorrect because:

  1. We previously confirmed in step 5 that there were in fact 40 fields present in the mapping by using the field capabilities API.
  2. Attempting to add a single new field in step 6 caused a mapping error to be returned due to the limit being exceeded.

Possible Solution

I've started looking at the code to figure out a possible fix, and it seems to be a matter of updating the logic in the recursive function that counts the mapping fields. Another option would eventually be to switch over to the field capabilities API entirely, although that wouldn't be efficient in its current state: the API returns a single list of fields that isn't grouped by index, regardless of how many indices are specified. In the meantime, if anyone agrees that this would be a beneficial improvement later on, I suggest subscribing to this related Elastic GitHub issue, which outlines a possible API call that returns a list of unique fields for each index.

@hartfordfive
Contributor Author

I believe I've identified and resolved the issue. I've tested with a few sample indices and compared the results against the field capabilities API, and the field count seems to be accurate now. I'll follow up shortly with a pull request.
