Sparse doc values index for LogsDB @timestamp field #121018

Closed
salvatore-campagna wants to merge 13 commits into elastic:main from salvatore-campagna:feature/timestamp-sparse-index

@salvatore-campagna salvatore-campagna commented Jan 28, 2025

This PR introduces support for a sparse doc values index for the @timestamp field in DateFieldMapper when specific conditions are met:

  • The index mode is set to LOGSDB.
  • The field name is @timestamp and mapped as a date field.
  • The field is included in the primary sort configuration.
  • The field has doc values, and indexing is not explicitly disabled (that is, the mapping does not set index: false).

When all the conditions above hold, we:

  • Use the corresponding Sorted Set doc values format with indexing enabled (Lucene's DocValuesSkipIndexType.RANGE, currently the only available sparse index type).
  • Disable indexing of the @timestamp field, dropping the inverted index in favor of the sparse doc values index.
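
The conditions above can be sketched as a single predicate. This is a minimal, self-contained illustration and not the code in this PR: the class `TimestampSparseIndexSketch`, the method `useSparseIndex`, and the simplified `IndexMode` enum are hypothetical stand-ins for the real mapper logic. It also assumes that "included in the primary sort configuration" means the field appears anywhere in the index sort fields, and it models the `index` mapping parameter as a nullable `Boolean` where `null` means "not configured".

```java
import java.util.List;

// Illustrative sketch only: every type and name here is a hypothetical
// stand-in, not an Elasticsearch class.
final class TimestampSparseIndexSketch {

    // Simplified stand-in for the index mode setting.
    enum IndexMode { STANDARD, TIME_SERIES, LOGSDB }

    /**
     * Returns true when all four conditions from the description hold.
     *
     * @param mode         index mode of the index
     * @param sortFields   fields in the index sort configuration (assumed semantics)
     * @param fieldName    full name of the mapped field
     * @param fieldType    mapping type of the field, e.g. "date"
     * @param hasDocValues whether doc values are enabled for the field
     * @param indexParam   value of the `index` mapping parameter, or null if not set
     */
    static boolean useSparseIndex(
        IndexMode mode,
        List<String> sortFields,
        String fieldName,
        String fieldType,
        boolean hasDocValues,
        Boolean indexParam
    ) {
        // "index: false" set explicitly in the mapping.
        boolean indexExplicitlyDisabled = Boolean.FALSE.equals(indexParam);
        return mode == IndexMode.LOGSDB
            && "@timestamp".equals(fieldName)
            && "date".equals(fieldType)
            && sortFields.contains(fieldName)
            && hasDocValues
            && indexExplicitlyDisabled == false;
    }

    public static void main(String[] args) {
        List<String> sort = List.of("host.name", "@timestamp");
        // LogsDB index, default mapping for @timestamp: eligible.
        System.out.println(useSparseIndex(IndexMode.LOGSDB, sort, "@timestamp", "date", true, null));
        // Same mapping in a standard index: not eligible.
        System.out.println(useSparseIndex(IndexMode.STANDARD, sort, "@timestamp", "date", true, null));
    }
}
```

When the predicate holds, the mapper would write doc values with the skip index enabled and skip the inverted index; otherwise the field keeps its regular indexing behavior.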

Some queries might experience slower performance as a result of using a doc values sparse index instead of an inverted index.

Disabling the inverted index on the @timestamp field while enabling the sparse doc values index is expected to:

  • Reduce the storage footprint, by an amount that depends on the size of the inverted index relative to the sparse index.
  • Improve indexing throughput by reducing the amount of data written during segment flushes.

@salvatore-campagna changed the title from "feature: sparse doc values index on @timestamp" to "Sparse doc values index on @timestamp" on Jan 28, 2025
@salvatore-campagna changed the title from "Sparse doc values index on @timestamp" to "Sparse doc values index for LogsDB @timestamp field" on Jan 28, 2025

public static final IndexVersion INFERENCE_METADATA_FIELDS = def(9_005_00_0, Version.LUCENE_10_0_0);
public static final IndexVersion LOGSB_OPTIONAL_SORTING_ON_HOST_NAME = def(9_006_00_0, Version.LUCENE_10_0_0);
public static final IndexVersion SOURCE_MAPPER_MODE_ATTRIBUTE_NOOP = def(9_007_00_0, Version.LUCENE_10_0_0);
public static final IndexVersion TIMESTAMP_DOC_VALUES_SPARSE_INDEX = def(9_008_00_0, Version.LUCENE_10_0_0);

After #120741 is merged, I will need to bump this to 9_010_00_0, or to whatever the next free version is, depending on which other PRs are merged in the meantime.

public abstract class FieldMapper extends Mapper {
private static final Logger logger = LogManager.getLogger(FieldMapper.class);

public static final FeatureFlag DOC_VALUES_SPARSE_INDEX = new FeatureFlag("doc_values_sparse_index");

This is already done in #120741.

}

private boolean hasDocValuesSparseIndex(final String fullFieldName) {
return index.isConfigured() == false

Same logic as in #120741.

when(mockedParserContext.getIndexSettings()).thenReturn(
new IndexSettings(
IndexMetadata.builder("_na_")
.settings(Settings.builder().put(IndexMetadata.SETTING_VERSION_CREATED, IndexVersion.current()))

Adding the index version here is required because DataStreamTimestampFieldMapper needs it to validate that either the sparse index or the inverted index exists.
