Control storing array source with index setting #112397
elasticsearchmachine merged 25 commits into elastic:main
Hi @kkrik-es, I've created a changelog YAML for you.
…ray' into synthetic-source/store-source-array
# Conflicts: # server/src/main/java/org/elasticsearch/index/mapper/MapperFeatures.java
run elasticsearch-ci/bwc-snapshots
Pinging @elastic/es-storage-engine (Team:StorageEngine)
public enum StoreSourceMode {
    DISABLED("disabled"), // No source recording
    ARRAYS("arrays"), // Store source for arrays of mapped fields
    ENABLED("enabled"); // Store source for both singletons and arrays of mapped fields
Not excited about the names here. Maybe DISABLED => OFF or NONE, ENABLED => FULL or ALL?
Maybe a naive question, but why do we need an enum here? For now at least it seems that just the index.mapping.store_array_source setting is sufficient?
The idea is that we can add the FULL option for some fields, in case we want to track their source even if they're singletons, e.g. for geo or for objects with deep structure like nested. This option will never be set at the index level as it would disable synthetic source.
I think that @jimczi also expressed the desire to generalize the option of storing the source of objects or fields verbatim.
Updated to none/arrays/all, ptal.
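A minimal sketch of what the renamed enum could look like, assuming the values `none`, `arrays`, and `all` mentioned above. The wrapper class `StoreSourceModeSketch`, the `label` field, and the `fromString` helper are illustrative assumptions, not the code merged in this PR.

```java
import java.util.Locale;

// Illustrative sketch only; everything except the NONE/ARRAYS/ALL values is an assumption.
final class StoreSourceModeSketch {

    enum StoreSourceMode {
        NONE("none"),     // No source recording
        ARRAYS("arrays"), // Store source for arrays of mapped fields
        ALL("all");       // Store source for both singletons and arrays of mapped fields

        private final String label;

        StoreSourceMode(String label) {
            this.label = label;
        }

        // Hypothetical helper: resolve a setting value, ignoring case.
        static StoreSourceMode fromString(String value) {
            String normalized = value.toLowerCase(Locale.ROOT);
            for (StoreSourceMode mode : values()) {
                if (mode.label.equals(normalized)) {
                    return mode;
                }
            }
            throw new IllegalArgumentException("Unknown store source mode [" + value + "]");
        }

        @Override
        public String toString() {
            return label;
        }
    }

    public static void main(String[] args) {
        System.out.println(StoreSourceMode.fromString("arrays"));
    }
}
```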
@@ -0,0 +1,732 @@
---
object param - object array:
Are these just moved yaml tests?
Yeah moved the ones with store_array_source and added some similar with the index setting.
…ray' into synthetic-source/store-source-array
// Setting StoreSourceMode to ENABLED at the index level is equivalent to disabling synthetic source, which is not desired.
// Since the only valid option is to track array source by default, we use a boolean index setting for it.
public static final Setting<Boolean> STORE_ARRAY_SOURCE_SETTING = Setting.boolSetting(
I am not really happy that the name will be different between index settings and mapper parameters (we have them named the same at least for ignore_malformed), but I see why it is done. Can we maybe make `all` invalid for the index setting but keep the same name?
Yeah that was my initial take here, too. An enum here is more flexible, we may want to have other values in the future..
Changed to match, @martijnvg wdyt?
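One way the suggestion above could be sketched: keep a single enum shared by the index setting and the mapper parameter, and reject `all` when the value comes from the index level, since storing everything there would amount to disabling synthetic source. The class `IndexLevelStoreSourceSketch` and the method `parseIndexLevel` are hypothetical names for illustration; the actual validation in the PR may differ.

```java
import java.util.Locale;

// Illustrative sketch only; class and method names are assumptions.
final class IndexLevelStoreSourceSketch {

    enum StoreSourceMode { NONE, ARRAYS, ALL }

    // Parses an index-level value and rejects ALL, which is only meant for per-field use.
    static StoreSourceMode parseIndexLevel(String value) {
        // valueOf throws IllegalArgumentException for unknown names.
        StoreSourceMode mode = StoreSourceMode.valueOf(value.toUpperCase(Locale.ROOT));
        if (mode == StoreSourceMode.ALL) {
            throw new IllegalArgumentException(
                "[all] is not allowed at the index level; it would be equivalent to disabling synthetic source"
            );
        }
        return mode;
    }

    public static void main(String[] args) {
        System.out.println(parseIndexLevel("arrays"));
        try {
            parseIndexLevel("all");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Sharing the enum keeps the setting and the parameter named the same, at the cost of one extra check at the index level.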
jimczi left a comment
Did we consider offering this option at the mapping level instead of applying it to the global index settings? We don't need this to be the default. It could be enabled at the object level within the mapping. We could also provide the option at the root object level, similar to how we handle subobjects.
Allowing this option for the entire mapping might not be ideal, especially if there are many objects. I believe the need to preserve array order is likely specific to certain parts of the mapping for particular use cases?
Another solution could be to require the field to be nested in order to preserve the ordering. Forcing users to use nested fields when they have multiple objects to differentiate feels more consistent to me. Otherwise, we're introducing a third method for handling arrays of objects, which would only apply to the _source field and be inconsistent with how the data is indexed.
We'll be adding the per-field option in follow-up PRs, hence the introduced enum. We want to include the index-level, catch-all option for […]
Iiuc, nested fields (ie […]
        parser = context.parser();
    }
    context = context.createNestedContext((NestedObjectMapper) context.parent());
} else if (context.storeSourceModeFromIndexSettings() == Mapper.StoreSourceMode.ALL && context.canAddIgnoredField()) {
I don't understand why this is needed for the singleton case. I thought that store array source would only apply to arrays, do we also need to store single objects?
Hm correct, this is not possible actually.. I'll revert it.
It will be added as a singleton option for per-field configuration - there are cases where we may want to store the source as-is, e.g. for geo or full object hierarchies. We also have the equivalent for nested - hijacking store_array_source:
Btw this is intended to replace store_array_source for objects too.
Early results from Rally show no significant increase in storage size (1%). Indexing took 15% longer, something to keep in mind.
martijnvg left a comment
Left two comments, otherwise LGTM
server/src/main/java/org/elasticsearch/index/mapper/Mapper.java
public static final Setting<StoreSourceMode> STORE_ARRAY_SOURCE_SETTING = Setting.enumSetting(
    StoreSourceMode.class,
    "index.mapping.store_source",
    StoreSourceMode.NONE,
Should the default be arrays if index.mode is logsdb?
I think we should change this in the logs template, so that it can be overridden for logsdb indexes, if not desired.
I see, I didn't know this was the plan to enable. This makes sense to me.
Introduce an index setting that forces storing the source of leaf field and object arrays in synthetic source mode. Nested objects are excluded as they already preserve ordering in synthetic source.
Next step is to introduce override params at the mapper level that will allow disabling the source, or storing the source for arrays (if not enabled at index level), or storing the source for both arrays and singletons. This will happen in follow-up changes, so that we can benchmark the impact of this change in parallel.
Related to #112012
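The planned interaction between the index setting and the follow-up mapper-level overrides could be sketched roughly as below. This is an assumption-laden illustration, not the implementation: `effectiveMode`, `shouldStoreSource`, and the class name are hypothetical, and treating `ALL` as applying to nested objects is a guess based on the discussion rather than something the description states.

```java
import java.util.Optional;

// Illustrative sketch only; all method and class names are assumptions.
final class EffectiveStoreSourceSketch {

    enum StoreSourceMode { NONE, ARRAYS, ALL }

    // A per-field override, when present, wins over the index-level setting.
    static StoreSourceMode effectiveMode(Optional<StoreSourceMode> fieldOverride, StoreSourceMode indexLevel) {
        return fieldOverride.orElse(indexLevel);
    }

    // Decides whether the original source of a value should be stored verbatim.
    static boolean shouldStoreSource(StoreSourceMode mode, boolean isArray, boolean isNested) {
        switch (mode) {
            case ALL:
                return true; // assumption: singletons and arrays alike
            case ARRAYS:
                // Nested objects already preserve ordering in synthetic source.
                return isArray && isNested == false;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        StoreSourceMode mode = effectiveMode(Optional.empty(), StoreSourceMode.ARRAYS);
        System.out.println(shouldStoreSource(mode, true, false));
    }
}
```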