GET field count per index API #68947
Comments
Pinging @elastic/es-core-features (Team:Core/Features) |
This feature is becoming more valuable now that the documentation recommends sizing heap based on field counts |
It seems we can get field counts from the Field Usage Stats API, but the output isn't easy to consume without some top-level counts.
The output is per shard and recursive, so getting totals is still a manual exercise. And it only covers actively used fields, not all fields in the mapping? If so, that is something an extended version of the API could also provide. |
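To illustrate the manual exercise described above, here is a minimal Python sketch that collapses a per-shard field usage stats response into one count of distinct field names per index. The response shape (a top-level `_shards` key, then one key per index with a `shards` list whose entries carry `stats.fields`) is an assumption based on the documented example output, and `used_field_counts` and `sample_usage` are hypothetical names. It still counts only fields that were actually used, metadata fields included, so it is not a count of the mapping.

```python
from typing import Any, Dict


def used_field_counts(usage_response: Dict[str, Any]) -> Dict[str, int]:
    """Return the number of distinct field names reported per index."""
    counts: Dict[str, int] = {}
    for index_name, index_body in usage_response.items():
        if index_name == "_shards":  # top-level shard summary, not an index
            continue
        seen = set()
        # Each shard copy reports its own usage, so take the union of names.
        for shard in index_body.get("shards", []):
            seen.update(shard.get("stats", {}).get("fields", {}))
        counts[index_name] = len(seen)
    return counts


# Hand-written sample in the assumed response shape (not real cluster output).
sample_usage = {
    "_shards": {"total": 2, "successful": 2, "failed": 0},
    "my-index": {
        "shards": [
            {"stats": {"fields": {"_id": {"any": 1}, "message": {"any": 3}}}},
            {"stats": {"fields": {"_id": {"any": 2}, "host.name": {"any": 1}}}},
        ]
    },
}

print(used_field_counts(sample_usage))  # {'my-index': 3}
```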
https://www.elastic.co/guide/en/kibana/master/data-views-api-get.html#data-views-api-get works to get the raw info similar to the Kibana UI screenshot, but it doesn't get us a fields.count value which would be extremely valuable. |
This returns field usage information - I'm not seeing a field count in the output. |
Sorry I meant the response returns the lists of fields but counting them is something you need to do manually. |
Bump. Is there any chance of getting this in the near term? Elasticsearch has already imposed a field limit via |
Especially following the introduction of
A poor man's way of counting fields might be
In addition, the count reported by a Kibana Data View is incorrect: if I try to add a few fields to the index above, I get a rejection because the field limit (10000) is exceeded.
FYI @flash1293 / @felixbarny |
I agree that this would be very useful. I feel like we're re-implementing this on the client side in multiple places (and not always in the correct way). For example, in the dataset quality page. cc @achyutjhunjhunwala @gbamparop. I think exposing this in some ES API would be relatively straightforward as we already have a method for this in elasticsearch/server/src/main/java/org/elasticsearch/index/mapper/MappingLookup.java Lines 250 to 255 in 8db9181
|
On a related note I would love to see field counts, and breakdown by field type, in the |
I've created a quick POC for adding the fields count to the index stats API: #116438 |
Pinging @elastic/es-data-management (Team:Data Management) |
I discussed this with @dakrone. A challenge with adding this to the index stats API is that this API isn't available on serverless, as it's considered to be exposing too many low-level details. However, for this use case we'll want to have a user-accessible API. We discussed adding this to get mappings. It has some technical challenges as get mappings is answered by the coordinating node, getting the unparsed mapping from cluster state. To call |
I can see how having clear field count per index is valuable, especially as this is reverse-engineered now in multiple places, and probably not aligned with the ES logic necessarily. I see the challenges with using index stats API, as well as get mappings. It sounds like this info could be added to field caps more easily, although it does not map strictly to the capabilities of fields. |
I forgot to mention that we also discussed adding this to field caps. A challenge with that approach is that field caps isn't focused on a particular index, but on an index pattern. Even if you're looking at just a single data stream, it has multiple backing indices, where each may have a different number of fields. While we could add another section in the field_caps response that lists the number of fields for each matching index, it seems to go against the spirit of that API which aggregates data for fields across multiple indices. For this, we probably want an API that's focussed on indices rather than fields. Is the cat indices API available on serverless? |
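As a rough illustration of why field caps is an awkward fit, a per-index count today needs one field caps response per backing index, each counted separately on the client. The sketch below assumes each response has a `fields` object keyed by field name, then by type, with a `metadata_field` flag and `object`/`nested` entries for non-leaf fields. The function names and sample data are hypothetical, and the result is not guaranteed to match the count Elasticsearch enforces for the field limit.

```python
from typing import Any, Dict


def count_field_caps_fields(field_caps_response: Dict[str, Any]) -> int:
    """Count leaf, non-metadata fields in a single field caps response."""
    count = 0
    for caps_by_type in field_caps_response.get("fields", {}).values():
        # Skip metadata fields such as _id or _index.
        if any(caps.get("metadata_field") for caps in caps_by_type.values()):
            continue
        # Skip object and nested containers; only leaf fields are counted.
        if all(type_name in ("object", "nested") for type_name in caps_by_type):
            continue
        count += 1
    return count


def counts_by_index(responses_by_index: Dict[str, Dict[str, Any]]) -> Dict[str, int]:
    """Map each index name to its own field count (one response per index)."""
    return {
        index: count_field_caps_fields(response)
        for index, response in responses_by_index.items()
    }


# Hand-written responses for two backing indices (assumed shape, not real output).
responses = {
    ".ds-logs-000001": {"fields": {
        "_id": {"_id": {"type": "_id", "metadata_field": True}},
        "host": {"object": {"type": "object", "metadata_field": False}},
        "host.name": {"keyword": {"type": "keyword", "metadata_field": False}},
    }},
    ".ds-logs-000002": {"fields": {
        "_id": {"_id": {"type": "_id", "metadata_field": True}},
        "host": {"object": {"type": "object", "metadata_field": False}},
        "host.name": {"keyword": {"type": "keyword", "metadata_field": False}},
        "message": {"text": {"type": "text", "metadata_field": False}},
    }},
}

print(counts_by_index(responses))  # {'.ds-logs-000001': 1, '.ds-logs-000002': 2}
```

The two backing indices end up with different counts, which is exactly the information a single aggregated field caps response does not expose.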
Yes, cat indices is available for Serverless (though it should be a human-invoked thing, and not used or relied upon programmatically). |
The index stats still seems like the most appropriate place to add this. It's index-centric and already fetches stats from the shards rather than just answering on the coordinating node, which makes accessing |
This is still causing a lot of pain when investigating problems with
The main reason is that the field count in Kibana is way lower than the "count" done on the Elasticsearch side. |
Is that something we can fix on the Kibana side? I agree that ideally, there should be an API in ES to expose that information but in the meantime, we could fix the bug in Kibana so that it has the same logic to count the number of fields. |
Could we clarify why Kibana needs to replicate field counting? I thought that the idea behind using ignore_dynamic_beyond_limit was to have fewer total fields limit issues, as it's a graceful behaviour compared to rejecting docs. |
The usability problem with
Therefore, users and support have a hard time understanding why fields started to be ignored, as Kibana reports 740 fields in a given index/data view. We would like to have the "real" count used by Elasticsearch, the one that triggered ignoring fields. On top of this (maybe we should file it separately), it would help to trigger a warning if a query uses a field that is ignored in one or more of the indices being searched. |
Maybe a silly question, but I think I am missing why Kibana needs to be aware of how many fields there are in the mappings in the first place. |
The original ask is to have an API on the Elasticsearch side to expose the count. As users typically interact with and manage their cluster through Kibana, showing the field count in Stack Management would be another good use of it, for troubleshooting purposes (self-service). We have already had several users hit by a problem related to the introduction of
Another approach would be to have a boolean flag reporting whether at least one field was ignored (due to the limit or being malformed). In short, without focusing on the implementation, I think it would be nice to know:
|
I get the high-level ask, that is clear to me. I don't follow the part where Kibana gets mentioned:
where does Kibana show the field count today and what does it do with it? Pardon my ignorance there. I do get that if the field count was exposed clearly from ES there would be no need to reverse engineer it outside of ES. |
I understand, thanks. That should absolutely be read as "leaf fields", not including object fields indeed. |
Today, there's an API to get the fields of a single index or of all indices:
which returns:
But the output from the above isn't broken down by index, so it's not easy to troubleshoot which index has the mapping explosion problem.
I could loop over my list of indices and run the GET <index>/_field_caps?fields=* API for each one to extract the number of fields per index, but again, this is not ideal.
It would be most useful if Elasticsearch had an out-of-the-box API that counts the fields broken down by index. The expected output should look like:
This can be part of the GET indices/stats or cat indices API.
Btw, Kibana might be able to take advantage of the API instead of doing its own aggregation/counting to show the index pattern field count:

Not sure if the following is the best script to extract the field count, but with some jq over the GET _mapping output, I am able to get the desired format:
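Independently of the jq script, the same idea can be sketched in Python as a recursive walk over the `GET _mapping` output. This is only an illustration under stated assumptions: it expects the `{index: {"mappings": {"properties": ...}}}` response shape, reports leaf fields, object fields, and multi-fields separately, and ignores runtime fields. The function names and sample mapping are hypothetical, and the numbers are not guaranteed to match the count Elasticsearch enforces for the field limit.

```python
from typing import Any, Dict


def count_mapping_fields(properties: Dict[str, Any]) -> Dict[str, int]:
    """Count leaf fields, object fields, and multi-fields under a `properties` block."""
    counts = {"leaf": 0, "object": 0, "multi": 0}
    for field_def in properties.values():
        sub_properties = field_def.get("properties")
        is_object = sub_properties is not None or field_def.get("type") in ("object", "nested")
        if is_object:
            counts["object"] += 1
            # Recurse into the sub-fields of the object.
            nested = count_mapping_fields(sub_properties or {})
            for key in counts:
                counts[key] += nested[key]
        else:
            counts["leaf"] += 1
        # Multi-fields (e.g. a `keyword` sub-field of a `text` field).
        counts["multi"] += len(field_def.get("fields", {}))
    return counts


def field_counts_per_index(mapping_response: Dict[str, Any]) -> Dict[str, Dict[str, int]]:
    """Apply the count to every index in a GET _mapping response."""
    return {
        index: count_mapping_fields(body.get("mappings", {}).get("properties", {}))
        for index, body in mapping_response.items()
    }


# Hand-written mapping in the assumed response shape (not real cluster output).
sample_mapping = {
    "logs-1": {"mappings": {"properties": {
        "message": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
        "host": {"properties": {"name": {"type": "keyword"}, "ip": {"type": "ip"}}},
    }}},
    "logs-2": {"mappings": {}},
}

for index, counts in field_counts_per_index(sample_mapping).items():
    print(index, counts)
```

Keeping the three categories separate makes it easier to compare the result against whichever definition of "field count" a given tool uses.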