
Hot swappable path.data disks #18279

Closed · PhaedrusTheGreek opened this issue May 11, 2016 · 16 comments
Labels: :Core/Infra/Resiliency Keep running when everything is ok. Die quickly if things go horribly wrong. >enhancement resiliency

@PhaedrusTheGreek (Contributor)

When path.data spans multiple physical disks and one of those disks is removed, the system should recover automatically. Currently, search and/or indexing requests that hit the missing shards throw exceptions, and no allocation/recovery occurs. The only way to bring the data back online is to restart the node, or to reinsert the original disk with its existing data.

It would be great if Elasticsearch could:

  • Automatically recover when disks are removed
  • Automatically make use of a newly returned empty disk

Steps to Test / Reproduce:

  1. Set up path.data over 2 disks, and start 2 elasticsearch nodes locally
path.data: ["/Volumes/KINGSTON", "/Volumes/SDCARD"]
  2. Index some data over 5 shards.
index    shard prirep state   docs  store ip        node
test1003 4     r      STARTED    2 10.1kb 127.0.0.1 Jacqueline Falsworth
test1003 4     p      STARTED    2 10.1kb 127.0.0.1 Vindicator
test1003 3     r      STARTED    6 24.4kb 127.0.0.1 Jacqueline Falsworth
test1003 3     p      STARTED    6 24.5kb 127.0.0.1 Vindicator
test1003 1     r      STARTED   10 40.6kb 127.0.0.1 Jacqueline Falsworth
test1003 1     p      STARTED   10 45.5kb 127.0.0.1 Vindicator
test1003 2     r      STARTED    2 10.1kb 127.0.0.1 Jacqueline Falsworth
test1003 2     p      STARTED    2 10.1kb 127.0.0.1 Vindicator
test1003 0     r      STARTED    3 10.1kb 127.0.0.1 Jacqueline Falsworth
test1003 0     p      STARTED    3 10.1kb 127.0.0.1 Vindicator
  3. Remove the disk that contains most/all of the data.

Exceptions start to show in the logs:

[2016-05-11 11:50:18,961][DEBUG][action.admin.indices.stats] [Vindicator] [indices:monitor/stats] failed to execute operation for shard [[[test1003/01ABN7pTQDCoTa80WMdAvg]][0], node[AMr_NWrVSFCuNV-YCOfsVg], [P], s[STARTED], a[id=IMwYwgWrTLCZYa08WJRNvg]]
ElasticsearchException[failed to refresh store stats]; nested: NoSuchFileException[/Volumes/KINGSTON/elasticsearch/nodes/0/indices/01ABN7pTQDCoTa80WMdAvg/0/index];
    at org.elasticsearch.index.store.Store$StoreStatsCache.refresh(Store.java:1411)
    at org.elasticsearch.index.store.Store$StoreStatsCache.refresh(Store.java:1396)
    at org.elasticsearch.common.util.SingleObjectCache.getOrRefresh(SingleObjectCache.java:54)
    at org.elasticsearch.index.store.Store.stats(Store.java:321)
    at org.elasticsearch.index.shard.IndexShard.storeStats(IndexShard.java:632)
    at org.elasticsearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:137)
    at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:166)
    at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:47)
    at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.onShardOperation(TransportBroadcastByNodeAction.java:414)
    at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:393)
    at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:380)
    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33)
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:65)
    at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:468)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.file.NoSuchFileException: /Volumes/KINGSTON/elasticsearch/nodes/0/indices/01ABN7pTQDCoTa80WMdAvg/0/index
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:407)
    at java.nio.file.Files.newDirectoryStream(Files.java:457)
    at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:215)
    at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:234)
    at org.elasticsearch.index.store.FsDirectoryService$1.listAll(FsDirectoryService.java:135)
    at org.apache.lucene.store.FilterDirectory.listAll(FilterDirectory.java:57)
    at org.apache.lucene.store.FilterDirectory.listAll(FilterDirectory.java:57)
    at org.elasticsearch.index.store.Store$StoreStatsCache.estimateSize(Store.java:1417)
    at org.elasticsearch.index.store.Store$StoreStatsCache.refresh(Store.java:1409)
    ... 18 more
[2016-05-11 11:50:26,796][WARN ][monitor.fs               ] [Vindicator] Failed to fetch fs stats - returning empty instance

but _cat/shards shows everything is OK

index    shard prirep state   docs store ip        node
test1003 4     r      STARTED            127.0.0.1 Jacqueline Falsworth
test1003 4     p      STARTED            127.0.0.1 Vindicator
test1003 3     r      STARTED            127.0.0.1 Jacqueline Falsworth
test1003 3     p      STARTED            127.0.0.1 Vindicator
test1003 1     r      STARTED            127.0.0.1 Jacqueline Falsworth
test1003 1     p      STARTED            127.0.0.1 Vindicator
test1003 2     r      STARTED            127.0.0.1 Jacqueline Falsworth
test1003 2     p      STARTED            127.0.0.1 Vindicator
test1003 0     r      STARTED            127.0.0.1 Jacqueline Falsworth
test1003 0     p      STARTED            127.0.0.1 Vindicator
  4. Post a _refresh

No change

  5. Index some data
{
   "error": {
      "root_cause": [
         {
            "type": "index_failed_engine_exception",
            "reason": "Index failed for [test1003#AVSghrSCuf6DFWq498vy]",
            "index_uuid": "01ABN7pTQDCoTa80WMdAvg",
            "shard": "1",
            "index": "test1003"
         }
      ],
      "type": "index_failed_engine_exception",
      "reason": "Index failed for [test1003#AVSghrSCuf6DFWq498vy]",
      "index_uuid": "01ABN7pTQDCoTa80WMdAvg",
      "shard": "1",
      "index": "test1003",
      "caused_by": {
         "type": "i_o_exception",
         "reason": "Input/output error: NIOFSIndexInput(path=\"/Volumes/KINGSTON/elasticsearch/nodes/0/indices/01ABN7pTQDCoTa80WMdAvg/1/index/_a.cfs\") [slice=_a_Lucene50_0.tim]",
         "caused_by": {
            "type": "i_o_exception",
            "reason": "Input/output error"
         }
      }
   },
   "status": 500
}

Logs show an exception

[2016-05-11 11:52:26,911][DEBUG][action.admin.indices.stats] [Vindicator] [indices:monitor/stats] failed to execute operation for shard [[[test1003/01ABN7pTQDCoTa80WMdAvg]][0], node[AMr_NWrVSFCuNV-YCOfsVg], [P], s[STARTED], a[id=IMwYwgWrTLCZYa08WJRNvg]]
ElasticsearchException[failed to refresh store stats]; nested: NoSuchFileException[/Volumes/KINGSTON/elasticsearch/nodes/0/indices/01ABN7pTQDCoTa80WMdAvg/0/index];
    at org.elasticsearch.index.store.Store$StoreStatsCache.refresh(Store.java:1411)
    at org.elasticsearch.index.store.Store$StoreStatsCache.refresh(Store.java:1396)
    at org.elasticsearch.common.util.SingleObjectCache.getOrRefresh(SingleObjectCache.java:54)
    at org.elasticsearch.index.store.Store.stats(Store.java:321)
    at org.elasticsearch.index.shard.IndexShard.storeStats(IndexShard.java:632)
    at org.elasticsearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:137)
    at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:166)
    at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:47)
    at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.onShardOperation(TransportBroadcastByNodeAction.java:414)
    at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:393)
    at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:380)
    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33)
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:65)
    at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:468)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.file.NoSuchFileException: /Volumes/KINGSTON/elasticsearch/nodes/0/indices/01ABN7pTQDCoTa80WMdAvg/0/index
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:407)
    at java.nio.file.Files.newDirectoryStream(Files.java:457)
    at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:215)
    at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:234)
    at org.elasticsearch.index.store.FsDirectoryService$1.listAll(FsDirectoryService.java:135)
    at org.apache.lucene.store.FilterDirectory.listAll(FilterDirectory.java:57)
    at org.apache.lucene.store.FilterDirectory.listAll(FilterDirectory.java:57)
    at org.elasticsearch.index.store.Store$StoreStatsCache.estimateSize(Store.java:1417)
    at org.elasticsearch.index.store.Store$StoreStatsCache.refresh(Store.java:1409)
    ... 18 more

_cat/shards still shows all shards STARTED

index    shard prirep state   docs store ip        node
test1003 4     r      STARTED            127.0.0.1 Jacqueline Falsworth
test1003 4     p      STARTED            127.0.0.1 Vindicator
test1003 3     r      STARTED            127.0.0.1 Jacqueline Falsworth
test1003 3     p      STARTED            127.0.0.1 Vindicator
test1003 1     r      STARTED            127.0.0.1 Jacqueline Falsworth
test1003 1     p      STARTED            127.0.0.1 Vindicator
test1003 2     r      STARTED            127.0.0.1 Jacqueline Falsworth
test1003 2     p      STARTED            127.0.0.1 Vindicator
test1003 0     r      STARTED            127.0.0.1 Jacqueline Falsworth
test1003 0     p      STARTED            127.0.0.1 Vindicator
  6. Wait 5 minutes, then search some data:

No change

{
   "took": 15,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 3,
      "failed": 2,
      "failures": [
         {
            "shard": 0,
            "index": "test1003",
            "node": "AMr_NWrVSFCuNV-YCOfsVg",
            "reason": {
               "type": "i_o_exception",
               "reason": "Input/output error: NIOFSIndexInput(path=\"/Volumes/KINGSTON/elasticsearch/nodes/0/indices/01ABN7pTQDCoTa80WMdAvg/0/index/_0.cfs\") [slice=_0.fdt]",
               "caused_by": {
                  "type": "i_o_exception",
                  "reason": "Input/output error"
               }
            }
         },
         {
            "shard": 1,
            "index": "test1003",
            "node": "wK5mnEIaT82Wz3wdTAjv6Q",
            "reason": {
               "type": "i_o_exception",
               "reason": "Input/output error: NIOFSIndexInput(path=\"/Volumes/KINGSTON/elasticsearch/nodes/1/indices/01ABN7pTQDCoTa80WMdAvg/1/index/_2.cfs\") [slice=_2.fdt]",
               "caused_by": {
                  "type": "i_o_exception",
                  "reason": "Input/output error"
               }
            }
         }
      ]
   },
   "hits": {
      "total": 23,
      "max_score": 1,
      "hits": []
   }
}
index    shard prirep state   docs store ip        node
test1003 4     r      STARTED            127.0.0.1 Jacqueline Falsworth
test1003 4     p      STARTED            127.0.0.1 Vindicator
test1003 3     r      STARTED            127.0.0.1 Jacqueline Falsworth
test1003 3     p      STARTED            127.0.0.1 Vindicator
test1003 1     r      STARTED            127.0.0.1 Jacqueline Falsworth
test1003 1     p      STARTED            127.0.0.1 Vindicator
test1003 2     r      STARTED            127.0.0.1 Jacqueline Falsworth
test1003 2     p      STARTED            127.0.0.1 Vindicator
test1003 0     r      STARTED            127.0.0.1 Jacqueline Falsworth
test1003 0     p      STARTED            127.0.0.1 Vindicator
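
(For reference, the shard listings above come from the cat shards API; an equivalent request with the columns shown spelled out would be something like this, with host and port adjusted for your cluster:)

curl 'localhost:9200/_cat/shards/test1003?v&h=index,shard,prirep,state,docs,store,ip,node'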
@clintongormley clintongormley added :Distributed Indexing/Store Issues around managing unopened Lucene indices. If it touches Store.java, this is a likely label. discuss labels May 12, 2016
@clintongormley (Contributor)

Related to #18217

@clintongormley (Contributor)

While I think there may be improvements to be made when a disk dies, if you want hot swapping etc. I think you need a proper RAID setup or LVM.
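
For anyone weighing that option, a striped logical volume across two disks looks roughly like this (a sketch only; device names, stripe size, filesystem, and mount point are all illustrative):

# illustrative LVM striping across two disks, then a single path.data mount
pvcreate /dev/sdb /dev/sdc
vgcreate es_data /dev/sdb /dev/sdc
lvcreate --stripes 2 --stripesize 64 --extents 100%FREE --name data es_data
mkfs.ext4 /dev/es_data/data
mount /dev/es_data/data /var/lib/elasticsearch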

@s1monw (Contributor) commented May 13, 2016

I think we need to add some resiliency here:

  • we should check if we can write on the datapath before we allocate
  • we should fail the engine if we hit an IOException; in any case, it's really crazy that we don't do that already. There should not be any IOException here

I will take care of this
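
A minimal sketch of the kind of write check described in the first bullet could look like the following (illustrative only, not the actual Elasticsearch implementation; the class and method names are made up):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class DataPathProbe {
    /**
     * Hypothetical helper: returns true only if a small file can be created,
     * synced, and removed on the given data path. A false result would mean
     * the path should not receive new shard allocations.
     */
    static boolean isWritable(Path dataPath) {
        Path probe = dataPath.resolve(".write-probe");
        try {
            Files.write(probe, new byte[] {0},
                    StandardOpenOption.CREATE, StandardOpenOption.SYNC);
            Files.deleteIfExists(probe);
            return true;
        } catch (IOException e) {
            // disk removed, unmounted, or read-only: treat this path as failed
            return false;
        }
    }
}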

@s1monw (Contributor) commented May 13, 2016

Yeah, I am torn on the hot-swapping. I think we can potentially take things out of the loop internally, but if you are plugging in a new disk and expecting us to auto-detect that a data path is good again, I think you should restart the node instead?

@PhaedrusTheGreek (Contributor, Author)

Definitely, we don't want to introduce any resiliency issues. Some manual intervention makes sense, but restarting a node can sometimes take a long time. Should there be something like delayed allocation on marking a path.data as failed? There is also the case of something like NFS, where a network problem might make the drive appear to come and go.
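
For context, the node-level analogue of this already exists: re-allocation of shards that go unassigned when a node leaves can be delayed per index, so a path-level version would follow the same pattern (the timeout below is just an example value):

curl -XPUT 'localhost:9200/test1003/_settings' -d '{
  "settings": { "index.unassigned.node_left.delayed_timeout": "5m" }
}'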

@s1monw (Contributor) commented May 14, 2016

I think if you lose a disk you need to restart the node. I can certainly improve things along the lines of failing shards more quickly, but we shouldn't try to be fancy here. I think we should take the node out of the cluster somehow, but that's something that needs more thought.

@PhaedrusTheGreek (Contributor, Author)

Multiple disks on path.data offer some added benefit over RAID0, in that IO is spread out over all disks, theoretically matching RAID0 performance, while not causing a total volume failure on the loss of a single disk.

Restarting a node is much easier than re-building a logical volume, and much less data is lost, so either way we are ahead.

@evanvolgas commented Aug 10, 2016

I think if you lose a disk you need to restart the node. I can certainly improve things along the lines of failing shards more quickly, but we shouldn't try to be fancy here. I think we should take the node out of the cluster somehow, but that's something that needs more thought.

In general this makes sense, but it would be nice if you could apply something like a transient setting to tell that node that a disk has died and to temporarily stop trying to perform I/O on it. That would still require manual intervention, but it would let you apply a temporary hotfix if a node restart is not immediately feasible.
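
Something along these lines, purely as an illustration of the request (the setting name below does not exist; it is hypothetical):

# hypothetical: "node.data_path.excluded" is NOT a real setting, it only
# sketches the kind of manual, temporary exclusion being asked for here
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "node.data_path.excluded": "/Volumes/KINGSTON" }
}'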

@evanvolgas commented Oct 24, 2016

Had this issue come up again last night.

Our logging nodes have 4 SSDs. We've passed an array of paths to path.data in elasticsearch.yml.

Over the weekend, one of the file systems on one of the disks on one of the ES servers became corrupt. Over the next 12 hours, ES spewed 500GB of errors like the following into the logs, filling up the root partition and eventually alerting us (because we alert on disk usage, but at the time we didn't have alerts on ES log file size / growth):

[2016-10-22 00:00:04,017][WARN ][cluster.action.shard     ] [deliverability_master02-es02] [logstash-delivery-2016.10.14.09][0] received shard failed for target shard [[logstash-delivery-2016.10.14.09][0], node[J_Wws-cKQPKPJjIE7lEacw], relocating [IIKJ3BHGRlG0IYmZ3GLeNA], [R], v[8192], s[INITIALIZING], a[id=HwzksPLITruZz94vsNTMvg, rId=6DS2pI5FS3uih0a1yvRJFw], expected_shard_size[25697352067]], indexUUID [RL1zWoD6SN6_ZmpjPGM0Yw], message [failed to create shard], failure [ElasticsearchException[failed to create shard]; nested: NotSerializableExceptionWrapper[file_system_exception: /storage/sdd1/deliverability/nodes/0/indices/logstash-delivery-2016.10.14.09/0/_state: Input/output error]; ]
[logstash-delivery-2016.10.14.09][[logstash-delivery-2016.10.14.09][0]] ElasticsearchException[failed to create shard]; nested: NotSerializableExceptionWrapper[file_system_exception: /storage/sdd1/deliverability/nodes/0/indices/logstash-delivery-2016.10.14.09/0/_state: Input/output error];
    at org.elasticsearch.index.IndexService.createShard(IndexService.java:389)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyInitializingShard(IndicesClusterStateService.java:620)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewOrUpdatedShards(IndicesClusterStateService.java:520)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:177)
    at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:610)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: NotSerializableExceptionWrapper[file_system_exception: /storage/sdd1/deliverability/nodes/0/indices/logstash-delivery-2016.10.14.09/0/_state: Input/output error]
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:427)
    at java.nio.file.Files.newDirectoryStream(Files.java:457)
    at org.elasticsearch.gateway.MetaDataStateFormat.loadLatestState(MetaDataStateFormat.java:257)
    at org.elasticsearch.index.shard.ShardPath.loadShardPath(ShardPath.java:122)
    at org.elasticsearch.index.IndexService.createShard(IndexService.java:310)
    ... 10 more

There are 12 data nodes in this cluster, each with 4 SSDs, plus 3 dedicated masters; we run a replication factor of 2 using hourly indices with 2 primary shards.

During the time that this happened, Elasticsearch continued to place primary shards on the failed /storage/sdd1 drive. Writes to those primaries failed, and we were only alerted to the problem because the errors in the logs filled up the root disk (interesting to note as well that the cluster remained green the entire time, and none of our monitoring and alerting on /_cluster and /_nodes stats caught it, which is our fault, but still important to note).

As a result of Elasticsearch continuing to place primary shards on the failed disk, we lost half of the log data for 9 out of the 12 hours that this disk was unreachable (9 out of 12 times it attempted to place at least one of each hour's primary shards on the unreachable disk; the writes to the primary failed, and the primary was never moved elsewhere).

I suspect, although I did not dig into it or write a test case to prove it, that the process whereby Elasticsearch determines which nodes are eligible to get a write and which disk to write to once it gets there might also bias further writes towards the drive that failed. In our case, we had 9 data nodes that were eligible to accept writes, each having 4 eligible disks that had not exceeded any water marks or otherwise were unwritable. Over 12 hours, 9 of the 24 primary shards created were allocated to the node with the disk failure and it routed them to the unreachable disk. As a result of being unwritable for several hours, that disk also was less full than the other disks on the cluster. Again, I don't know that a disk failure like the one we had biases shard placement in favor of writing to the unreachable disk. But we did see an abnormally high number of shards placed on one machine, and on one disk on one machine.... abnormal enough to make me wonder if that wasn't just a coincidence.

All of which is to say: I think this issue is extremely important. I also think @s1monw is right to suggest that ensuring a file path is writable before placing a shard (especially a primary shard) will go a long way towards adding resiliency.
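
For what it's worth, the per-data-path numbers that are exposed today live under the fs section of node stats, which can be polled and alerted on (what thresholds make sense is up to the operator):

# real endpoint: per-path totals and free bytes are reported under fs.data[]
curl 'localhost:9200/_nodes/stats/fs?pretty'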

@bleskes (Contributor) commented Mar 13, 2018

#18279 (comment) describes two things that need to happen to resolve this issue. The first has been done in #16745. The second (failing the shard) is very easy; I opened #29008 to highlight it as an "adopt me" and low-hanging fruit. Closing this one as superseded by those two issues.

@bleskes bleskes closed this as completed Mar 13, 2018
@evanvolgas

@bleskes would you consider reopening this ticket as a high-hanging fruit, as per #29008 (comment)? Or, if you feel it should remain closed, can you share a bit more of your thinking about why? I don't feel like #16745 and #18279 (comment) are talking about the same thing.

@bleskes (Contributor) commented Mar 15, 2018

@EvanV I agree it's not the same thing. As the discussion above indicates, we feel adding hot-swappability at the path level would come at too high a price. Elasticsearch currently works at the level of a node: shard copies are spread across nodes, and if a shard fails the master will try to assign it to another node. We can do better there and start tracking failures per node so we can stop allocating to a bad one (we don't do that now), but adding another conceptual layer isn't worth it. LVM or RAID are much more mature solutions for that part. That said, a few things we can do came out of the discussion. One is done and the other is tracked by another issue, which is why I closed this one.
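
The manual version of that per-node step does exist today as allocation filtering: excluding a node tells the master not to allocate shards to it (and to move existing ones away), which can be used by hand while a node with a dead disk waits for a restart. The setting is real; the node name below is a placeholder:

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.exclude._name": "node_with_the_bad_disk" }
}'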

@evanvolgas commented Mar 15, 2018

Thank you for explaining. I see what you're saying.

I feel like this ticket shouldn't be called "Hot swappable data paths" and should instead be a bug report along the lines of "ES shouldn't allocate shards to dead disks." I think the latter is still true, albeit far more complicated, to your point. I also feel like the docs recommending multiple file paths should be caveated that RAID0 might be a better option, depending on your needs (I'm happy to submit an update to the docs along these lines, if you'd be open to accepting it).

You're definitely right that ES shouldn't be responsible for replacing RAID or LVM. Focusing on the issues you did makes sense as a better solution than what currently exists. Not to beat a dead horse, but I do feel that ES should be capable of not trying to allocate shards to dead disks. That is how I viewed this original issue, and it sounds like we both agree that #29008 doesn't quite cover that. Would you be open to adding an issue along the lines of "ES Shouldn't Allocate Shards to Dead Disks" and/or renaming this one and orienting its scope around that, not hot swappable disks?

@jasontedor jasontedor added :Core/Infra/Resiliency Keep running when everything is ok. Die quickly if things go horribly wrong. and removed :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. labels Mar 15, 2018
@elasticmachine (Collaborator)

Pinging @elastic/es-core-infra

@bleskes (Contributor) commented Mar 16, 2018

I also feel like the docs recommending multiple file paths should be caveated that RAID0 might be a better option, depending on your needs (I'm happy to submit an update to the docs along these lines, if you'd be open to accepting it).

Yes please, though I tried to find what you meant and couldn't.

Would you be open to adding an issue along the lines of "ES Shouldn't Allocate Shards to Dead Disks" and/or renaming this one and orienting its scope around that, not hot swappable disks?

I think this one #18417 covers it? If you agree, feel free to comment there.

@evanvolgas

Yes please, though I tried to find what you meant and couldn't.

I may be recalling incorrectly, or it may have been a blog post. In any event, I'll poke around and add a note to the docs on "things to watch out for" vis-à-vis multiple data paths.

#18417 does cover my concern, yes. Thanks for taking the time to explain your reasoning on this one. I wasn't following you at first, but it's very clear now what you're thinking and how you're breaking down the work on this task. Much appreciated.
