Upgrade Assistant - Phase 2 - Reindexing #26368

@joshdover

As part of Phase 2 of #20890, we need to add a UI and state layer to allow users to reindex old indices (created before 6.x) in order to be compatible with 7.0.

Left to Implement

In first PR:

  • Add confirmation textbox for destructive changes
  • Handle conflicting index names
  • Design cleanup
  • Ensure indices are writable if reindexing fails
  • Handle pausing ML jobs when reindexing ML indices
  • Stop/start Watcher when reindexing .watches

In follow up PR(s):

Other nice-to-haves:


Details

This feature will be similar in flow to the upgrade assistant in 5.6 and will:

  • Make the old index read-only
  • Create new index with the same settings and mappings
  • Begin the reindexing using the Reindex API
  • Wait for reindex to finish
  • Alias old index name to point to new index and delete old index
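
As a rough sketch of the steps above, the Elasticsearch requests involved might look like the following. The `EsRequest` and `ReindexPlan` shapes and the `buildReindexPlan` helper are hypothetical names used only for illustration. Using `index.blocks.write` for the read-only step is an assumption, not a decision made in this issue.

```typescript
// Hypothetical request descriptor; not part of any Kibana or Elasticsearch client.
interface EsRequest {
  method: "GET" | "PUT" | "POST";
  path: string;
  body?: unknown;
}

// Hypothetical container for the requests behind each step of the flow.
interface ReindexPlan {
  makeReadOnly: EsRequest;
  createNewIndex: EsRequest;
  startReindex: EsRequest;
  checkTask: (taskId: string) => EsRequest;
  switchAlias: EsRequest;
}

function buildReindexPlan(
  oldIndex: string,
  newIndex: string,
  settings: object,
  mappings: object
): ReindexPlan {
  return {
    // 1. Block writes on the old index (assumption: a write block is sufficient).
    makeReadOnly: {
      method: "PUT",
      path: `/${oldIndex}/_settings`,
      body: { "index.blocks.write": true },
    },
    // 2. Create the new index with the (already filtered) settings and mappings.
    createNewIndex: {
      method: "PUT",
      path: `/${newIndex}`,
      body: { settings, mappings },
    },
    // 3. Start the reindex as a background task so the response returns a task ID.
    startReindex: {
      method: "POST",
      path: "/_reindex?wait_for_completion=false",
      body: { source: { index: oldIndex }, dest: { index: newIndex } },
    },
    // 4. Poll the Task API until the task reports that it has completed.
    checkTask: (taskId: string): EsRequest => ({
      method: "GET",
      path: `/_tasks/${taskId}`,
    }),
    // 5. Point the old name at the new index and delete the old index in one call.
    switchAlias: {
      method: "POST",
      path: "/_aliases",
      body: {
        actions: [
          { add: { index: newIndex, alias: oldIndex } },
          { remove_index: { index: oldIndex } },
        ],
      },
    },
  };
}
```

Doing the alias swap and the delete in a single `_aliases` request means there should be no window in which the old name resolves to nothing.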

One issue with this flow last time was persistence. Almost all of the logic was driven by client-side code, so the process stopped if you left the page. This time we want to persist the reindex process in a saved object and leverage the Task Manager (#24356) to poll Elasticsearch's Task API (naming is fun) for the status of the reindex task, resuming the flow once the reindex is done. We've decided to persist this state in saved objects that we will update using optimistic concurrency. We are going to break this work into two parts: first, get this working ONLY while the browser is on the page; then, if we have time, add a worker that can handle this in the background. We should also be able to offer a reindex progress indicator and the ability to abort or reset a reindex process.

Browser-driven iteration

For each reindex operation, we will create a saved object that acts as a state machine to track the steps of the reindex process. To update this object, we will use the version parameter in Elasticsearch to ensure that two browser tabs (or workers) cannot update the object simultaneously.
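
To illustrate the version check, here is a minimal in-memory sketch of optimistic concurrency. `InMemorySavedObjects`, `SavedObject`, and `ReindexOperation` are hypothetical stand-ins for illustration, not the actual saved objects client; the real check would be done by Elasticsearch.

```typescript
// Hypothetical attribute shape for a reindex operation.
interface ReindexOperation {
  indexName: string;
  status: string;
}

// Hypothetical versioned document, loosely modeled on a saved object.
interface SavedObject<T> {
  id: string;
  version: number;
  attributes: T;
}

class InMemorySavedObjects<T> {
  private docs: { [id: string]: SavedObject<T> } = {};

  create(id: string, attributes: T): SavedObject<T> {
    const doc: SavedObject<T> = { id, version: 1, attributes };
    this.docs[id] = doc;
    return doc;
  }

  // Apply the change only if the caller saw the latest version; otherwise reject it.
  update(id: string, expectedVersion: number, changes: Partial<T>): SavedObject<T> {
    const current = this.docs[id];
    if (!current) {
      throw new Error(`not found: ${id}`);
    }
    if (current.version !== expectedVersion) {
      throw new Error(
        `version conflict: expected ${expectedVersion}, found ${current.version}`
      );
    }
    const next: SavedObject<T> = {
      id,
      version: current.version + 1,
      attributes: Object.assign({}, current.attributes, changes),
    };
    this.docs[id] = next;
    return next;
  }
}
```

If two tabs read the same version and both try to advance the status, only the first write succeeds. The second tab gets a conflict and should re-read the object before deciding what to do next.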

Reindex flow:

  • User clicks "reindex", browser makes API call to server to begin reindexing for the given index.
  • Server creates a saved object with a status field to track this reindex operation, then begins the first steps of the reindex: set the old index to read-only, create the new index, start the reindex operation. At each step, we update the saved object's status field to advance the state machine.
  • While the browser tab is on the Upgrade Assistant page, the browser will continue to poll for known reindexes in progress.
  • Once the reindex has finished, the server will complete the reindex process: alias the new index, delete the old index, mark the reindex operation as completed.

If the user leaves the page while the browser is polling, the alias switchover will not complete until they return to the upgrade assistant.
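
The status transitions in the flow above could be modeled roughly as follows. The status names and the `nextStatus` helper are hypothetical; the only behavior taken from the flow is that the operation waits in the "reindex started" state until the reindex task reports completion.

```typescript
// Hypothetical status values for the saved object's status field.
type ReindexStatus =
  | "created"
  | "readOnly"
  | "newIndexCreated"
  | "reindexStarted"
  | "aliasSwitched"
  | "completed";

const STATUS_ORDER: ReindexStatus[] = [
  "created",
  "readOnly",
  "newIndexCreated",
  "reindexStarted",
  "aliasSwitched",
  "completed",
];

// Return the status the operation should move to on the next server-side step.
function nextStatus(current: ReindexStatus, reindexTaskDone: boolean): ReindexStatus {
  if (current === "completed") {
    return current; // terminal state
  }
  if (current === "reindexStarted" && !reindexTaskDone) {
    return current; // keep waiting; the browser will poll again
  }
  return STATUS_ORDER[STATUS_ORDER.indexOf(current) + 1];
}
```

Each poll from the browser would give the server a chance to call something like `nextStatus` and perform the matching step, which is why the alias switchover stalls when nobody is polling.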

Worker-driven iteration

Largely the same flow, but we will have an in-process worker on the server side that will look for in-progress reindex operations and continue to poll for their completion.

To reduce overhead from polling Elasticsearch, we could boot up this worker only if there are known reindexes in progress. This check would be done at startup and whenever a new reindex operation is started.

Potential problem:

  • kibana1 starts up, no reindex operations in progress, does not start worker.
  • kibana2 starts up, receives request to start reindex operation, starts worker.
  • kibana2 crashes before reindex is complete
  • kibana1 never starts worker, reindex operation is not shown as completed (and aliases not swapped over).

We could address this issue by either:

  • Polling for in-progress reindex operation saved objects on a regular but infrequent basis (say, every 5 minutes). If a new one is found, start polling its progress frequently (every 10s).
  • Polling for in-progress reindex operation saved objects whenever the user visits the Upgrade Assistant.
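
A sketch of the first option, assuming the worker picks its next delay from the number of in-progress operations it found on the last check. The constant and function names are hypothetical; the 5 minute and 10 second values are the ones suggested above.

```typescript
// Slow check for newly started reindex operations (every 5 minutes).
const IDLE_POLL_MS = 5 * 60 * 1000;
// Fast progress check while at least one reindex is running (every 10 seconds).
const ACTIVE_POLL_MS = 10 * 1000;

// Decide how long the worker should wait before its next check.
function nextPollDelayMs(inProgressCount: number): number {
  return inProgressCount > 0 ? ACTIVE_POLL_MS : IDLE_POLL_MS;
}

// Given the in-progress counts seen on successive checks, list the delays chosen.
function simulatePollDelays(inProgressCounts: number[]): number[] {
  return inProgressCounts.map((count) => nextPollDelayMs(count));
}
```

In the kibana1/kibana2 scenario, kibana1 would notice the orphaned operation within at most one idle interval and switch to the fast interval until it completes.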

Known Unknowns

  • Which settings should be copied from the original index to the new index? So far, I know these cannot be copied:
    • index.uuid
    • index.creation_date
    • index.version.created
    • index.version.upgraded
    • index.provided_name
    • index.blocks
    • index.legacy
  • Can we intelligently block the user from using this tool for large indices? If so, how do we decide this? Can ES's reindex API tell us whether or not this process should succeed?
  • UI Design
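
For the settings question above, one possible approach is to fetch the old index's settings in flat form and drop the keys known not to be copyable. This is only a sketch: `copyableSettings` is a hypothetical helper, it assumes flat keys such as `index.blocks.write`, and the blocked list is just the one from this issue, which may be incomplete.

```typescript
// Settings from this issue that cannot be copied to the new index.
const NON_COPYABLE_SETTINGS: string[] = [
  "index.uuid",
  "index.creation_date",
  "index.version.created",
  "index.version.upgraded",
  "index.provided_name",
  "index.blocks",
  "index.legacy",
];

// Return a copy of the flat settings without blocked keys or their sub-keys.
function copyableSettings(flat: { [key: string]: string }): { [key: string]: string } {
  const result: { [key: string]: string } = {};
  for (const key of Object.keys(flat)) {
    const blocked = NON_COPYABLE_SETTINGS.some(
      (prefix) => key === prefix || key.indexOf(prefix + ".") === 0
    );
    if (!blocked) {
      result[key] = flat[key];
    }
  }
  return result;
}
```

Matching on a prefix plus a dot removes nested keys like `index.blocks.write` without also removing unrelated settings that merely share leading characters.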

Possible Improvements

  • Should we offer an option to reindex many small indices in a single action (done in serial, not in parallel)?
