Skip to content

Commit

Permalink
md: Ensure resync is reported after it starts
Browse files Browse the repository at this point in the history
The 07layouts test in mdadm fails on some systems. The failure
presents itself as the backup file not being removed before the next
layout is grown into:

  mdadm: /dev/md0: cannot create backup file /tmp/md-test-backup:
      File exists

This is because the background mdadm process, which is responsible for
cleaning up this backup file gets into an infinite loop waiting for
the reshape to start. mdadm checks the mdstat file if a reshape is
going and, if it is not, it waits for an event on the file or times
out in 5 seconds. On faster machines, the reshape may complete before
the 5 seconds times out, and thus the background mdadm process loops
waiting for a reshape to start that has already occurred.

mdadm reads the mdstat file to start, but mdstat does not report that the
reshape has begun, even though it has indeed begun. So the mdstat_wait()
call (in mdadm) which polls on the mdstat file won't ever return until
timing out.

The reason mdstat reports the reshape has started is due to an issue
in status_resync(). recovery_active is subtracted from curr_resync which
will result in a value of zero for the first chunk of reshaped data, and
the resulting read will report no reshape in progress.

To fix this, if "resync - recovery_active" is an overloaded value, force
the value to be MD_RESYNC_ACTIVE so the code reports a resync in progress.

Signed-off-by: Logan Gunthorpe <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
  • Loading branch information
lsgunth authored and axboe committed Aug 2, 2022
1 parent eac58d0 commit b368856
Showing 1 changed file with 12 additions and 2 deletions.
14 changes: 12 additions & 2 deletions drivers/md/md.c
Original file line number Diff line number Diff line change
Expand Up @@ -8022,10 +8022,20 @@ static int status_resync(struct seq_file *seq, struct mddev *mddev)
if (test_bit(MD_RECOVERY_DONE, &mddev->recovery))
/* Still cleaning up */
resync = max_sectors;
} else if (resync > max_sectors)
} else if (resync > max_sectors) {
resync = max_sectors;
else
} else {
resync -= atomic_read(&mddev->recovery_active);
if (resync < MD_RESYNC_ACTIVE) {
/*
* Resync has started, but the subtraction has
* yielded one of the special values. Force it
* to active to ensure the status reports an
* active resync.
*/
resync = MD_RESYNC_ACTIVE;
}
}

if (resync == MD_RESYNC_NONE) {
if (test_bit(MD_RESYNCING_REMOTE, &mddev->recovery)) {
Expand Down

0 comments on commit b368856

Please sign in to comment.