[migrations] Throw error if reindex task fails #26062

Merged

tylersmalley merged 3 commits into elastic:master from tylersmalley:reindex-failure on Nov 27, 2018

Conversation

@tylersmalley (Member) commented Nov 22, 2018

We are not correctly handling the reindex task failing. This is a problem in a scenario where there are missing shards, which means that some or all of the documents will be lost.

This PR is only to resolve the issue of possible data loss. We will work on subsequent PRs to address retry logic where possible.

Testing:

Start ES with a pre-6.5.0 Kibana index which requires a migration and has a missing segment file. This will result in a reindex failure of .kibana > .kibana_1. Alternatively, you can start ES using my data directory here.

Start Kibana, which will trigger the reindex; it should fail.
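The gist of the fix: after kicking off the reindex, poll the task until it completes and treat a reported error as fatal, rather than assuming completion means success. A minimal sketch of the idea, assuming a `callCluster`-style wrapper around the Elasticsearch Tasks API (names here are illustrative, not the exact code in `elastic_index.js`):

```js
// Illustrative sketch only, not the exact Kibana implementation.
// Polls the reindex task and throws if the completed task reports an
// error, so a failed reindex can no longer be mistaken for success.
async function waitForReindexTask(callCluster, taskId) {
  const result = await callCluster('tasks.get', { taskId });

  if (!result.completed) {
    // Still running: wait a moment, then poll again.
    await new Promise(resolve => setTimeout(resolve, 1000));
    return waitForReindexTask(callCluster, taskId);
  }

  if (result.error) {
    const { type, reason } = result.error;
    // Previously this error went unchecked, so missing shards could
    // silently drop some or all documents during the migration.
    throw new Error(`Re-index failed [${type}] ${reason}`);
  }
}
```

This matches the failure seen in the testing comment below, where the task completes with a search_phase_execution_exception and Kibana now exits instead of proceeding with an incomplete .kibana_1.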

Signed-off-by: Tyler Smalley <tyler.smalley@elastic.co>
@elasticmachine (Contributor) commented:

Pinging @elastic/kibana-operations

Signed-off-by: Tyler Smalley <tyler.smalley@elastic.co>

@elasticmachine (Contributor) commented:

💚 Build Succeeded

@tylersmalley tylersmalley merged commit 286c6a7 into elastic:master Nov 27, 2018
tylersmalley added a commit to tylersmalley/kibana that referenced this pull request Nov 27, 2018
Signed-off-by: Tyler Smalley <tyler.smalley@elastic.co>
tylersmalley added a commit to tylersmalley/kibana that referenced this pull request Nov 27, 2018
Signed-off-by: Tyler Smalley <tyler.smalley@elastic.co>
tylersmalley added a commit that referenced this pull request Nov 27, 2018
Signed-off-by: Tyler Smalley <tyler.smalley@elastic.co>
tylersmalley added a commit that referenced this pull request Nov 27, 2018
Signed-off-by: Tyler Smalley <tyler.smalley@elastic.co>
sebelga added a commit that referenced this pull request Nov 28, 2018
* [APM] Fix horizontal scrollbar being visible in windows 8.1 (#25988)

* [APM] Changed 'Response Time' to 'Duration' in transactions screens (#25990)

* translate InfraOps visualization component (Part 3) (#25213)

* translate InfraOps visualization component (Part 3 - part of folder components)

* update translation of Infra Ops visualization component (Part 3)

* update translation of Infra Ops visualization component (Part 3)

* change some ids and add pluralization

* update Infra Ops Part 3 - change some ids, change some intl.formatMessage() to <FormattedMessage> and directly wrap some classes by injectI18n()

* update Infra-III - add static to displayName

* [i18n] Translate Agg_types(part_3) (#26118)

* Translate agg_types - metrics

* Fix issues

* [ML] Aggregate anomalies table data using configured Kibana timezone (#26192)

* [ML] Aggregate anomalies table data using configured Kibana timezone

* [ML] Move dataFormatTz prop out of controller scope

* [ML] Fix alignment of filter icons in anomalies table (#26253)

* [ML] Fix alignment of filter icons in anomalies table

* [ML] Tweak y position of icons in expanded row of table

* translate sample data (#26069)

translate sample data

* [ML] Wrap controller initialization in assertions. (#26265)

- The controller tests introduced in #25382 had a flaw: if controller initialization failed and threw an error, that test suite wouldn't be able to clean up any stubs, so tests using the same stubs would report an error because the stubs couldn't be wrapped again.
- This PR wraps every controller initialization inside an assertion and catches those errors properly as part of the test.
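A hypothetical sketch of that pattern, assuming an AngularJS-style `$controller` service and a Jest-style `expect`; the controller name and locals are illustrative, not the actual ML test code:

```js
// Hypothetical sketch: surfacing controller-initialization errors as a
// failed assertion lets afterEach() cleanup (stub restoration) still
// run, instead of the thrown error aborting the suite mid-setup.
it('initializes the controller without throwing', () => {
  expect(() => {
    $controller('MlExampleController', { $scope });
  }).not.toThrow();
});
```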

* [APM] fixes #20145 by displaying span.context.http.url in the span details flyout (#26238)

* Fix spaces license check (#26270)

## Summary

Allows the public spaces API to work with a gold license

Resolves #26271

* Job Info button in Reporting Listing (#25421)

* Job Info button in Reporting Listing

* use lodash directly

* start of flyout use

* description list in flyout

* capitalize

* undefined guard

* expire info on close

* add jest test

* better at error handling + messaging

* Add description for vis types (#26243)

* [migrations] Throw error if reindex task fails (#26062)

Signed-off-by: Tyler Smalley <tyler.smalley@elastic.co>

* [Reporting] Better logging for waitForSelector failure (#25762)

* [Reporting] Better logging for waitForSelector failure

* break waitForSelector

* experimental changes

* cleanup/consistency

* fix some test report title strings

* test disable chromium

* roll back test code

* take out non-working phantom changes

* roll back disable chromium test

* allow logger to use curried tags

* temporarily re-do report failure-causing change for test

* replace newline with escaped for single log line

* undo test change

* remove obsolete test

* [kbn/pm] allow packages to define extra paths to clean (#26132)

I noticed some discussion about how kbn clean should probably clear out the `.eslintcache` file, since it doesn't handle changes in related modules (for things like the import plugin) and it would be nice if `yarn kbn clean` took care of the issue. I figured it's not a bad idea, but adding `.eslintcache` directly to `@kbn/pm` felt wrong, so instead I've added another config option that can go in the package.json file, `clean.extraPatterns`. This array of patterns is passed into `del()` so that it can include things like negation.

As the name suggests, this doesn't override the initial paths; it just adds some extras that will be checked and cleared when `yarn kbn clean` is run.
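For illustration, a package could opt in with something like this in its package.json (the package name is hypothetical; since the patterns go straight to `del()`, negation works too):

```json
{
  "name": "@kbn/example-package",
  "clean": {
    "extraPatterns": [
      ".eslintcache",
      "!target/keep-this"
    ]
  }
}
```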

* [config] fix logging.useUTC deprecation unset (#26053)

* Extend precommit hook script to support git GUI apps (#25883)

* feat(NA): extend support from precommit hook to git GUI apps.

* docs(NA): more descriptive error message.

* [DOCS] Clarify monitoring dependencies (#26229)

* apm: add ECS fields to index pattern (#26214)

* support standard license (#26294)

* [kbn-pm] update build

* [eslint] use disallow license header rule (#26309)

Fixes #26295

There are several places where we have accidentally added new license headers with linters but failed to remove old license headers manually. This prevents that by applying an inverted version of the license-header rule, which removes invalid license headers when files are moved.
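As a sketch of how such a pair of rules might be wired up in an .eslintrc.js (the rule names and option shapes here are assumptions, not verified against the repo):

```js
// Sketch only; rule names and options are assumptions. One rule
// requires the current license header, and the inverted rule flags
// any outdated header that was left behind.
const CURRENT_LICENSE_HEADER = '/* current license header ... */';
const OLD_LICENSE_HEADER = '/* outdated license header ... */';

module.exports = {
  rules: {
    '@kbn/eslint/require-license-header': ['error', { license: CURRENT_LICENSE_HEADER }],
    '@kbn/eslint/disallow-license-headers': ['error', { licenses: [OLD_LICENSE_HEADER] }],
  },
};
```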

* Bump node to 8.14.0 (#26313)

Signed-off-by: Tyler Smalley <tyler.smalley@elastic.co>

* Watch optimizer cache invalidation  (#24172)

* chore(NA): cherry-pick work from spencer on implementing the cache invalidation system and merging it with current master.

* feat(NA): add support for dlls bundle into the cache state invalidation system.

* chore(NA): merge with master.

* feat(NA): first working version for the watch cache.

* feat(NA): added logger, correct cache delete and removed last todos.

* feat(NA): remove some useless features for the time being.

* refact(NA): just pass the kibanaHapiServer.log function directly instead of an anonymous function that calls it.

* refact(NA): move everything to async.

* refact(NA): remove dll mentions.

* chore(NA): removed types/mkdirp as we don't use mkdirp in TypeScript.

@LeeDr commented Jan 11, 2019

Testing this on 6.6.0 BC3, I used the Elasticsearch data from the original description (on Debian 9 with Elasticsearch and Kibana running as a service).

Here the reindex fails and Kibana shuts down:

Jan 11 22:16:05 packer-virtualbox-iso-1518710300 kibana[5429]: {"type":"log","@timestamp":"2019-01-11T22:16:05Z","tags":["info","migrations"],"pid":5429,"message":"Creating index .kibana_2."}
Jan 11 22:16:06 packer-virtualbox-iso-1518710300 kibana[5429]: {"type":"log","@timestamp":"2019-01-11T22:16:06Z","tags":["info","migrations"],"pid":5429,"message":"Reindexing .kibana to .kibana_1"}
Jan 11 22:16:07 packer-virtualbox-iso-1518710300 kibana[5429]: {"type":"log","@timestamp":"2019-01-11T22:16:07Z","tags":["debug","root"],"pid":5429,"message":"shutting root down"}
Jan 11 22:16:07 packer-virtualbox-iso-1518710300 kibana[5429]: {"type":"log","@timestamp":"2019-01-11T22:16:07Z","tags":["fatal","root"],"pid":5429,"message":"Error: Re-index failed [search_phase_execution_exception] all shards failed :: {\"type\":\"search_phase_execution_exception\",\"reason\":\"all shards failed\",\"phase\":\"query\",\"grouped\":true,\"failed_shards\":[]}\n    at callCluster.then.result (/usr/share/kibana/src/server/saved_objects/migrations/core/elastic_index.js:283:23)\n    at tryCatcher (/usr/share/kibana/node_modules/bluebird/js/main/util.js:26:23)\n    at Promise._settlePromiseFromHandler (/usr/share/kibana/node_modules/bluebird/js/main/promise.js:503:31)\n    at Promise._settlePromiseAt (/usr/share/kibana/node_modules/bluebird/js/main/promise.js:577:18)\n    at Promise._settlePromises (/usr/share/kibana/node_modules/bluebird/js/main/promise.js:693:14)\n    at Async._drainQueue (/usr/share/kibana/node_modules/bluebird/js/main/async.js:123:16)\n    at Async._drainQueues (/usr/share/kibana/node_modules/bluebird/js/main/async.js:133:10)\n    at Immediate.Async.drainQueues [as _onImmediate] (/usr/share/kibana/node_modules/bluebird/js/main/async.js:15:14)\n    at runCallback (timers.js:705:18)\n    at tryOnImmediate (timers.js:676:5)\n    at processImmediate (timers.js:658:5)"}
Jan 11 22:16:07 packer-virtualbox-iso-1518710300 kibana[5429]: {"type":"log","@timestamp":"2019-01-11T22:16:07Z","tags":["debug","server"],"pid":5429,"message":"stopping server"}
Jan 11 22:16:07 packer-virtualbox-iso-1518710300 kibana[5429]: {"type":"log","@timestamp":"2019-01-11T22:16:07Z","tags":["debug","legacy-service"],"pid":5429,"message":"stopping legacy service"}
Jan 11 22:16:07 packer-virtualbox-iso-1518710300 kibana[5429]: {"type":"log","@timestamp":"2019-01-11T22:16:07Z","tags":["debug","plugins-service"],"pid":5429,"message":"Stopping plugins service"}
Jan 11 22:16:07 packer-virtualbox-iso-1518710300 kibana[5429]: {"type":"log","@timestamp":"2019-01-11T22:16:07Z","tags":["debug","http","server"],"pid":5429,"message":"stopping http server"}
Jan 11 22:16:07 packer-virtualbox-iso-1518710300 kibana[5429]: {"type":"log","@timestamp":"2019-01-11T22:16:07Z","tags":["debug","legacy-proxy"],"pid":5429,"message":"Event is being forwarded: close"}
Jan 11 22:16:07 packer-virtualbox-iso-1518710300 kibana[5429]:  FATAL  Error: Re-index failed [search_phase_execution_exception] all shards failed :: {"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[]}
Jan 11 22:16:07 packer-virtualbox-iso-1518710300 systemd[1]: kibana.service: Main process exited, code=exited, status=1/FAILURE
Jan 11 22:16:07 packer-virtualbox-iso-1518710300 systemd[1]: kibana.service: Unit entered failed state.
Jan 11 22:16:07 packer-virtualbox-iso-1518710300 systemd[1]: kibana.service: Failed with result 'exit-code'.
Jan 11 22:16:07 packer-virtualbox-iso-1518710300 systemd[1]: kibana.service: Service hold-off time over, scheduling restart.
Jan 11 22:16:07 packer-virtualbox-iso-1518710300 systemd[1]: Stopped Kibana.
Jan 11 22:16:07 packer-virtualbox-iso-1518710300 systemd[1]: Started Kibana.

But the service tries to start it again ^.

Then I get:

Jan 11 22:16:12 packer-virtualbox-iso-1518710300 kibana[5451]: {"type":"log","@timestamp":"2019-01-11T22:16:12Z","tags":["info","migrations"],"pid":5451,"message":"Creating index .kibana_2."}
Jan 11 22:16:12 packer-virtualbox-iso-1518710300 kibana[5451]: {"type":"log","@timestamp":"2019-01-11T22:16:12Z","tags":["warning","migrations"],"pid":5451,"message":"Another Kibana instance appears to be migrating the index. Waiting for that migration to complete. If no other Kibana instance is attempting migrations, you can get past this message by deleting index .kibana_2 and restarting Kibana."}

From this point on, the Kibana page only says "Kibana server is not ready yet".

It's up to the user to stop Kibana, fix the shards (hopefully from a snapshot), and restart it.
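For reference, the stranded target index from the failed migration can be removed with the standard Elasticsearch delete-index API before restarting Kibana (assuming Elasticsearch is listening on localhost:9200):

```sh
curl -XDELETE 'http://localhost:9200/.kibana_2'
```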
