GET field count per index API #68947

Leaf-Lin · 2021-02-12T00:44:29Z

Today, there's an API to get per index field or all fields:

GET <index>/_field_caps?fields=*

which returns:

{
  "indices" : [
    "my_index",
    "my_index2",
    "my_index3",
    ...
  ],
  "fields" : {
    "my_field1" : {
      "keyword" : {
        "type" : "keyword",
        "searchable" : true,
        "aggregatable" : true
      }
    },
    "my_field2" : {
      "keyword" : {
        "type" : "keyword",
        "searchable" : true,
        "aggregatable" : true
      }
    },
    ...
  }
}

But the output above is not broken down by index, so it's not easy to tell which index has a mapping explosion problem.

I could loop over my list of indices and run GET <index>/_field_caps?fields=* against each one to extract the field count per index, but again, this is not ideal.

It would be most useful if Elasticsearch had an out-of-the-box API that counts the number of fields broken down by index. The expected output would look like:

index_name      field_count
my_index        10
my_index2       20
my_index3       30

This can be part of the GET indices/stats or cat indices API.

Btw, Kibana might be able to take advantage of this API instead of doing its own aggregation/counting to show the index pattern field count:

Not sure if the following is the best script to extract the field count, but with some jq over the GET _mapping output, I am able to get the desired format:

curl -XGET https://localhost:9200/_mapping > mapping.json
jq -r '[to_entries[]| .key as $index|  [ .value.mappings|to_entries[]|{(.key): ([.value|..|.type?|select(.!=null)]|length)} ]|map(to_entries)| flatten| sort_by(.value)|from_entries as $types | {index_name: $index, index_field_count: ([$types|to_entries[].value]|add)}]|.[]|.|map(.)|@tsv ' mapping.json  | column -t  | sort
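For readers who would rather not decode the jq, here is a rough Python sketch of the same idea. It assumes a typeless GET _mapping response (index name -> "mappings" -> "properties") and counts only entries that declare a non-object type, descending into sub-objects and multi-fields. The function names and the sample mapping are made up for illustration, and the result is a leaf-field count, not necessarily the number Elasticsearch checks against its field limit.

```python
from typing import Any, Dict


def count_leaf_fields(properties: Dict[str, Any]) -> int:
    """Count entries that declare a non-object type, including multi-fields."""
    total = 0
    for field in properties.values():
        # Object and nested entries are containers, not leaf fields.
        if "type" in field and field["type"] not in ("object", "nested"):
            total += 1
        # Descend into sub-objects and into multi-fields.
        total += count_leaf_fields(field.get("properties", {}))
        total += count_leaf_fields(field.get("fields", {}))
    return total


def field_counts_by_index(mapping_response: Dict[str, Any]) -> Dict[str, int]:
    """Map each index name in a GET _mapping response to its leaf-field count."""
    return {
        index: count_leaf_fields(body.get("mappings", {}).get("properties", {}))
        for index, body in mapping_response.items()
    }


# Hypothetical sample standing in for the contents of mapping.json.
sample = {
    "my_index": {
        "mappings": {
            "properties": {
                "user": {
                    "properties": {
                        "name": {
                            "type": "text",
                            "fields": {"keyword": {"type": "keyword"}},
                        },
                        "age": {"type": "long"},
                    }
                },
                # A field that happens to be called "type".
                "type": {"type": "keyword"},
            }
        }
    },
    "my_index2": {"mappings": {}},
}

for index, count in sorted(field_counts_by_index(sample).items()):
    print(f"{index}\t{count}")
```

Walking "properties" and "fields" explicitly, rather than searching for every "type" key, avoids miscounting when a field is itself named "type".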


elasticmachine · 2021-02-12T00:45:00Z

Pinging @elastic/es-core-features (Team:Core/Features)

mbrunnert · 2022-08-08T23:45:04Z

This feature is becoming more valuable now that the documentation recommends sizing heap based on field counts

geekpete · 2022-11-09T00:26:32Z

Seems we have field counts with Field Usage Stats api, but it's not so consumable with some top level counts.

GET /*/_field_usage_stats

Output is per shard and recursive, so getting totals is still a manual exercise.
Potential for a feature request to add some extended parameter to do recursive field counts and show totals at each level.

And it only covers actively used fields, not all fields in the mapping? So that is something an extended version could also do.

Hythloday-zero · 2023-02-08T18:48:27Z

https://www.elastic.co/guide/en/kibana/master/data-views-api-get.html#data-views-api-get works to get the raw info similar to the Kibana UI screenshot, but it doesn't get us a fields.count value which would be extremely valuable.

shaigbdb · 2023-03-09T10:43:14Z

Seems we have field counts with Field Usage Stats api, but it's not so consumable with some top level counts.

GET /*/_field_usage_stats

This returns field usage information - I'm not seeing a field count in the output.

geekpete · 2023-03-09T21:20:40Z

Sorry I meant the response returns the lists of fields but counting them is something you need to do manually.

inqueue · 2023-06-02T14:01:11Z

Bump. Is there any chance of getting this in the near term? Elasticsearch has already imposed a field limit via index.mapping.total_fields.limit for quite some time and having field count information available would be very useful for understanding the usage, especially for dynamically mapped indices. One shouldn't have to be a JQ 🥷 or require Kibana to get the index field count!

lucabelluccini · 2024-11-06T17:56:07Z

Especially following the introduction of index.mapping.total_fields.ignore_dynamic_beyond_limit via #96235 (and it being used in 8.14+ on several index templates of Fleet), it is crucial to get the field count at runtime in order to eventually automate / warn if the field count is reaching the limit.

A poor man's way of counting fields might be gron mappings of index | grep '.type =' -c, but it seems Elasticsearch counts not only "leaf" fields but also parent fields as objects.
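To make the leaf-versus-object difference concrete, here is a small Python sketch that counts object mappers and runtime fields alongside leaf fields for one index's mappings. It is only an approximation written for this discussion, not Elasticsearch's actual counting logic: the function names are hypothetical, the treatment of multi-fields as leaf fields and of untyped entries as objects is an assumption, and metadata fields are not considered at all.

```python
from typing import Any, Dict, Tuple


def count_mappers(properties: Dict[str, Any]) -> Tuple[int, int]:
    """Return (leaf_count, object_count) for a "properties" block."""
    leaf = 0
    objects = 0
    for field in properties.values():
        # Assumption: an entry with sub-properties, or with no explicit type,
        # is an object mapper.
        is_object = "properties" in field or field.get("type", "object") in (
            "object",
            "nested",
        )
        if is_object:
            objects += 1
            sub_leaf, sub_objects = count_mappers(field.get("properties", {}))
            leaf += sub_leaf
            objects += sub_objects
        else:
            leaf += 1
        # Assumption: each multi-field counts as one more leaf field.
        multi_leaf, multi_objects = count_mappers(field.get("fields", {}))
        leaf += multi_leaf
        objects += multi_objects
    return leaf, objects


def approximate_total_fields(mappings: Dict[str, Any]) -> Dict[str, int]:
    """Break an index's mappings down into leaf, object and runtime counts."""
    leaf, objects = count_mappers(mappings.get("properties", {}))
    runtime = len(mappings.get("runtime", {}))
    return {
        "leaf": leaf,
        "object": objects,
        "runtime": runtime,
        "total": leaf + objects + runtime,
    }


# Hypothetical mappings for a single index.
sample_mappings = {
    "runtime": {"day_of_week": {"type": "keyword"}},
    "properties": {
        "host": {
            "properties": {
                "name": {"type": "keyword"},
                "ip": {"type": "ip"},
            }
        },
        "message": {
            "type": "text",
            "fields": {"keyword": {"type": "keyword"}},
        },
    },
}

print(approximate_total_fields(sample_mappings))
```

In this sample the leaf-only count is 4 while the combined count is 6, which is the kind of gap that makes a client-side count look lower than the one the limit is checked against.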

In addition, the count reported by a Kibana Data View is incorrect:

If I try to add a few fields to the index above, I get a rejection because the field limit (10000) has been exceeded.

 (status=400): {\"type\":\"document_parsing_exception\",\"reason\":\"[1:3999] failed to parse: Limit of total fields [10000] has been exceeded while adding new fields [19]\",\"caused_by\":{\"type\":\"illegal_argument_exception\",\"reason\":\"Limit of total fields [10000] has been exceeded while adding new fields [19]\"}},

FYI @flash1293 / @felixbarny

felixbarny · 2024-11-06T18:35:06Z

I agree that this would be very useful. I feel like we're re-implementing this on the client side in multiple places (and not always in the correct way). For example, in the dataset quality page. cc @achyutjhunjhunwala @gbamparop.

I think exposing this in some ES API would be relatively straightforward as we already have a method for this in MappingLookup:

elasticsearch/server/src/main/java/org/elasticsearch/index/mapper/MappingLookup.java

Lines 250 to 255 in 8db9181

    
               /** 
        
                * Returns the total number of fields defined in the mappings, including field mappers, object mappers as well as runtime fields. 
        
                */ 
        
               public long getTotalFieldsCount() { 
        
                   return totalFieldsCount; 
        
               }

VimCommando · 2024-11-07T17:33:13Z

On a related note I would love to see field counts, and breakdown by field type, in the <index>/_stats API. It would make alerting for mapping explosions trivial.
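As an illustration of how simple such an alert could be once a per-index count is available, here is a minimal Python sketch. It assumes the counts and any per-index overrides of index.mapping.total_fields.limit have already been collected by some other means; the function name, the 80% warning ratio and the sample numbers are all made up. The only fixed fact it relies on is that the setting defaults to 1000.

```python
from typing import Dict, List, Tuple

# Default value of index.mapping.total_fields.limit.
DEFAULT_TOTAL_FIELDS_LIMIT = 1000


def indices_near_limit(
    field_counts: Dict[str, int],
    limits: Dict[str, int],
    warn_ratio: float = 0.8,
) -> List[Tuple[str, int, int]]:
    """Return (index, count, limit) for indices at or above warn_ratio of their limit."""
    flagged = []
    for index, count in field_counts.items():
        # Fall back to the default when no override is known for this index.
        limit = limits.get(index, DEFAULT_TOTAL_FIELDS_LIMIT)
        if count >= warn_ratio * limit:
            flagged.append((index, count, limit))
    # Most saturated indices first.
    return sorted(flagged, key=lambda item: item[1] / item[2], reverse=True)


# Hypothetical inputs: counts per index and one overridden limit.
counts = {"my_index": 10, "my_index2": 950, "my_index3": 9990}
overrides = {"my_index3": 10000}

for index, count, limit in indices_near_limit(counts, overrides):
    print(f"{index}: {count}/{limit} fields")
```

The hard part today is producing the counts input in the first place, which is what this issue asks for.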

felixbarny · 2024-11-07T19:43:34Z

I've created a quick POC for adding the fields count to the index stats API: #116438

elasticsearchmachine · 2024-11-08T12:50:44Z

Pinging @elastic/es-data-management (Team:Data Management)

felixbarny · 2024-11-13T16:56:55Z

I discussed this with @dakrone. A challenge with adding this to the index stats API is that this API isn't available on serverless, as it's considered to be exposing too many low-level details. However, for this use case we'll want to have a user-accessible API.

We discussed adding this to get mappings. It has some technical challenges as get mappings is answered by the coordinating node, getting the unparsed mapping from cluster state. To call MappingLookup#getTotalFieldsCount, we would either need to delegate to the primary shard, or parse the mapping on the coordinating node. It'll also be a bit awkward to return a property for the field count from get mappings that's not allowed to be present in a put mappings request. It may break use cases where a client gets the mapping, modifies it, and updates it. Only adding the field count in the get mapping response when a verbose flag is enabled could mitigate the breaking aspect of it. The awkwardness is still there.

javanna · 2024-11-18T10:47:03Z

I can see how having clear field count per index is valuable, especially as this is reverse-engineered now in multiple places, and probably not aligned with the ES logic necessarily. I see the challenges with using index stats API, as well as get mappings. It sounds like this info could be added to field caps more easily, although it does not map strictly to the capabilities of fields.

felixbarny · 2024-11-18T11:13:09Z

I forgot to mention that we also discussed adding this to field caps. A challenge with that approach is that field caps isn't focused on a particular index, but on an index pattern. Even if you're looking at just a single data stream, it has multiple backing indices, where each may have a different number of fields. While we could add another section in the field_caps response that lists the number of fields for each matching index, it seems to go against the spirit of that API, which aggregates data for fields across multiple indices. For this, we probably want an API that's focussed on indices rather than fields.

Is the cat indices API available on serverless?

dakrone · 2024-11-18T21:59:48Z

Is the cat indices API available on serverless?

Yes, cat indices is available for Serverless (though it should be a human-invoked thing, and not used or relied upon programmatically).

felixbarny · 2024-11-28T13:08:47Z

The index stats still seems like the most appropriate place to add this. It's index-centric and already fetches stats from the shards rather than just answering on the coordinating node, which makes accessing MappingLookup#getTotalFieldsCount straightforward. The only downside is that it doesn't work for serverless. But we could think about allowing the index stats api for serverless but limit the index metrics that we're exposing. But I'm not sure if there's precedent for partially exposing an API in serverless.

lucabelluccini · 2025-01-27T12:44:58Z

This still makes it painful to investigate problems with ignore_dynamic_beyond_limit, introduced in elastic/kibana#178398.

Main reason is the field count in Kibana is way lower than the "count" done on Elasticsearch side.

felixbarny · 2025-01-27T13:08:53Z

Main reason is the field count in Kibana is way lower than the "count" done on Elasticsearch side.

Is that something we can fix on the Kibana side? I agree that ideally, there should be an API in ES to expose that information but in the meantime, we could fix the bug in Kibana so that it has the same logic to count the number of fields.

javanna · 2025-02-05T10:09:21Z

Could we clarify why Kibana needs to replicate field counting? I thought that the idea behind using ignore_dynamic_beyond_limit was to have fewer total fields limit issues, as it's a graceful behaviour compared to rejecting docs.

lucabelluccini · 2025-02-05T12:42:08Z

The usability problem with ignore_dynamic_beyond_limit and not having the real field count in general is:

- Kibana says the index has N fields, where N << limit
- ES ignores fields because the way ES counts fields is different from what is reported by Kibana. ES internally "counts" K fields, K > limit

Therefore, users and support have a hard time understanding why fields started to be ignored when Kibana says there are 740 fields in a given index/data view.

We would like to have the "real" count used by Elasticsearch which triggered ignoring fields.
This can help automation/alerts/ui hints to tell users: "this data view contains indices with ignored fields because those indices X,Y,Z exceeded the limit of dynamic fields".

On top of this (maybe we should file it separately), it would help to trigger a warning if a query uses a field that is ignored in one or more of the indices being searched.

javanna · 2025-02-05T15:30:48Z

Maybe a silly question, but I think I am missing why Kibana needs to be aware of how many fields there are in the mappings in the first place.

lucabelluccini · 2025-02-05T16:09:20Z

The original ask is to have an API on Elasticsearch side to expose the count.
It would be very useful for monitoring, automated alerts, analyzers - and troubleshooting.

As users typically interact and manage their cluster through Kibana, having the count of fields in Stack Management would be another good place to leverage the count and show this info there for troubleshooting purposes (self service).
Or even the "new" Data Set Quality app might leverage such info.

We have already had several users hit by a problem related to the introduction of ignore_dynamic_beyond_limit (as it wasn't well advertised in 8.14) who were unable to easily pinpoint which indices had exceeded the limit, short of having access to the client-side logs where we were getting the Limit of total fields [10000] has been exceeded while adding... message.

Another approach would be to have a boolean flag reporting if at least 1 field was ignored (due to limit or malformed).

In short, without focusing on the implementation, I think it would be nice to know:

- How many fields do I have in this index?
- Did we hit the field count limit in this index (with or without ignore_dynamic_beyond_limit)?
- Are there ignored fields in this index?
  - If yes, how many due to ignore_malformed?
  - If yes, how many due to ignore_dynamic_beyond_limit?
  - If yes, which fields have been ignored?
- How many documents have _ignored fields in this index?

javanna · 2025-02-06T09:04:07Z

I get the high-level ask, that is clear to me.

I don't follow the part where Kibana gets mentioned:

Main reason is the field count in Kibana is way lower than the "count" done on Elasticsearch side.

where does Kibana show the field count today and what does it do with it? Pardon my ignorance there. I do get that if the field count was exposed clearly from ES there would be no need to reverse engineer it outside of ES.

lucabelluccini · 2025-02-06T11:42:55Z

No problem and that's great to check as we need to be sure we're picking the correct solution for the problem(s).

where does Kibana show the field count today and what does it do with it? Pardon my ignorance there. I do get that if the field count was exposed clearly from ES there would be no need to reverse engineer it outside of ES.

Right now, Kibana shows the field count in data views, not really on specific indices

javanna · 2025-02-06T14:11:37Z

I understand, thanks. That should absolutely be read as "leaf fields", not including object fields indeed.

Leaf-Lin added >enhancement needs:triage Requires assignment of a team area label :Data Management/Stats Statistics tracking and retrieval APIs labels Feb 12, 2021

elasticmachine added the Team:Data Management Meta label for data/management team label Feb 12, 2021

Leaf-Lin mentioned this issue Feb 12, 2021

Getting index _field_caps?fields=* output elastic/support-diagnostics#469

Closed

jtibshirani removed the needs:triage Requires assignment of a team area label label Feb 19, 2021

hartfordfive mentioned this issue Jan 20, 2023

elasticsearch_indices_mappings_stats_fields metric doesn't appear to report accurate count prometheus-community/elasticsearch_exporter#674

Open

felixbarny linked a pull request Nov 7, 2024 that will close this issue

Add mapped fields count to index stats #116438

Draft

felixbarny removed the :Data Management/Stats Statistics tracking and retrieval APIs label Nov 8, 2024

elasticsearchmachine added needs:triage Requires assignment of a team area label and removed Team:Data Management Meta label for data/management team labels Nov 8, 2024

felixbarny added the :Data Management/Stats Statistics tracking and retrieval APIs label Nov 8, 2024

elasticsearchmachine added Team:Data Management Meta label for data/management team and removed needs:triage Requires assignment of a team area label labels Nov 8, 2024
