Merged
20 changes: 20 additions & 0 deletions docs/reference/how-to/disk-usage.asciidoc
@@ -135,20 +135,40 @@ PUT index
--------------------------------------------------
// CONSOLE

[float]
=== Watch your shard size

Larger shards are going to be more efficient at storing data. While it is not possible to give a precise recommendation for how large shards should be, and very large shards come with drawbacks (for example, longer recovery times), it is generally possible to have shard sizes in the 20-30 GB range.

Member

I don't think we should specify a size here as it depends on too many factors. With fast replica recovery coming that is another mitigating factor (#22484) to one of the drawbacks that you mention.

Contributor Author

@jasontedor Makes sense. Do you think we should mention an upper range, e.g. 50 GB?

Member

It's a good question but I don't think so. We are still fighting the "30 GB" heap recommendation, too many people see that number and think it's the magical number where they should set their heap without enough consideration for all the factors involved. Instead, I think that the verbiage is good but we should avoid enshrining specific numbers.


[float]
=== Disable `_all`

The <<mapping-all-field,`_all`>> field indexes the value of all fields of a
document and can use significant space. If you never need to search against all
fields at the same time, it can be disabled.
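
As a minimal sketch of what disabling `_all` could look like at index creation
time: the index name `index` and the mapping type name `type` are placeholders,
and the request assumes a version of Elasticsearch in which the `_all` field
and mapping types still exist.

[source,js]
--------------------------------------------------
PUT index
{
  "mappings": {
    "type": {
      "_all": {
        "enabled": false
      }
    }
  }
}
--------------------------------------------------
// CONSOLE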

[float]
=== Disable `_source`

The <<mapping-source-field,`_source`>> field stores the original JSON body of the document. If you don’t need access to it, you can disable it. However, APIs that need access to `_source`, such as update and reindex, won’t work.
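
A hedged sketch of disabling `_source` in the mapping follows. As before,
`index` and `type` are placeholder names, and the typed mapping layout is an
assumption about the Elasticsearch version in use.

[source,js]
--------------------------------------------------
PUT index
{
  "mappings": {
    "type": {
      "_source": {
        "enabled": false
      }
    }
  }
}
--------------------------------------------------
// CONSOLE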

[float]
=== Use `best_compression`

The `_source` and stored fields can easily take a non-negligible amount of disk
space. They can be compressed more aggressively by using the `best_compression`
<<index-codec,codec>>.
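
For illustration, the codec can be chosen when the index is created. This is a
sketch with a placeholder index name; note that `index.codec` is a static
setting, so it is set at creation time or on a closed index rather than
changed on a live one.

[source,js]
--------------------------------------------------
PUT index
{
  "settings": {
    "index": {
      "codec": "best_compression"
    }
  }
}
--------------------------------------------------
// CONSOLE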

[float]
=== Force Merge

Elasticsearch stores data in segments. Segments make up a Lucene index - a shard in Elasticsearch. The <<indices-forcemerge,`_forcemerge` API>> can be used to reduce the number of segments per shard. In many cases, the number of segments can be reduced to one per shard by setting `max_num_segments=1`.

Member

Perhaps turn this around a bit: Elasticsearch stores data in shards. Shards are Lucene indices and are composed of segments. Segments are the actual files on disk, etc.

Contributor Author

Makes sense, how about: "Indices in Elasticsearch are stored in one or more shards. Each shard is a Lucene index and made up of one or more segments - the actual files on disk. Larger segments are more efficient for storing data. The <<indices-forcemerge,_forcemerge API>> can be [...]"

Member

That sounds good to me.
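
For reference, the force merge call described in the paragraph under review
can be sketched as follows. The index name `index` is a placeholder, and the
request is best issued against an index that no longer receives writes, since
merging down to a single segment is an expensive operation.

[source,js]
--------------------------------------------------
POST index/_forcemerge?max_num_segments=1
--------------------------------------------------
// CONSOLE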


[float]
=== Shrink Index

The <<indices-shrink-index,Shrink API>> allows you to reduce the number of shards in an index. Together with the Force Merge API above, this can significantly reduce the number of shards and segments of an index.
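
A rough sketch of a shrink follows, assuming the source index is first made
read-only and a copy of every shard is relocated to a single node. The names
`source_index`, `target_index` and `shrink_node_name` are hypothetical
placeholders, and the target's shard count must be a factor of the source's.

[source,js]
--------------------------------------------------
PUT source_index/_settings
{
  "settings": {
    "index.routing.allocation.require._name": "shrink_node_name",
    "index.blocks.write": true
  }
}

POST source_index/_shrink/target_index
{
  "settings": {
    "index.number_of_shards": 1
  }
}
--------------------------------------------------
// CONSOLE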

[float]
=== Use the smallest numeric type that is sufficient
