Skip to content

Commit 6b7faad

Browse files
adam900710kdave
authored andcommitted
btrfs: Ensure we trim ranges across block group boundary
[BUG] When deleting large files (which cross block group boundary) with discard mount option, we find some btrfs_discard_extent() calls only trimmed part of its space, not the whole range: btrfs_discard_extent: type=0x1 start=19626196992 len=2144530432 trimmed=1073741824 ratio=50% type: bbio->map_type, in above case, it's SINGLE DATA. start: Logical address of this trim len: Logical length of this trim trimmed: Physically trimmed bytes ratio: trimmed / len Thus leaving some unused space not discarded. [CAUSE] When discard mount option is specified, after a transaction is fully committed (super block written to disk), we begin to cleanup pinned extents in the following call chain: btrfs_commit_transaction() |- btrfs_finish_extent_commit() |- find_first_extent_bit(unpin, 0, &start, &end, EXTENT_DIRTY); |- btrfs_discard_extent() However, pinned extents are recorded in an extent_io_tree, which can merge adjacent extent states. When a large file gets deleted and it has adjacent file extents across block group boundary, we will get a large merged range like this: |<--- BG1 --->|<--- BG2 --->| |//////|<-- Range to discard --->|/////| To discard that range, we have the following calls: btrfs_discard_extent() |- btrfs_map_block() | Returned bbio will end at BG1's end. As btrfs_map_block() | never returns result across block group boundary. |- btrfs_issuse_discard() Issue discard for each stripe. So we will only discard the range in BG1, not the remaining part in BG2. Furthermore, this bug is not that reliably observed, for above case, if there is no other extent in BG2, BG2 will be empty and btrfs will trim all space of BG2, covering up the bug. [FIX] - Allow __btrfs_map_block_for_discard() to modify @Length parameter btrfs_map_block() uses its @Length paramter to notify the caller how many bytes are mapped in current call. With __btrfs_map_block_for_discard() also modifing the @Length, btrfs_discard_extent() now understands when to do extra trim. - Call btrfs_map_block() in a loop until we hit the range end Since we now know how many bytes are mapped each time, we can iterate through each block group boundary and issue correct trim for each range. Reviewed-by: Filipe Manana <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Tested-by: Nikolay Borisov <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Qu Wenruo <[email protected]> Signed-off-by: David Sterba <[email protected]>
1 parent 2d97461 commit 6b7faad

File tree

2 files changed

+35
-12
lines changed

2 files changed

+35
-12
lines changed

fs/btrfs/extent-tree.c

Lines changed: 31 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -1306,8 +1306,10 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
13061306
int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
13071307
u64 num_bytes, u64 *actual_bytes)
13081308
{
1309-
int ret;
1309+
int ret = 0;
13101310
u64 discarded_bytes = 0;
1311+
u64 end = bytenr + num_bytes;
1312+
u64 cur = bytenr;
13111313
struct btrfs_bio *bbio = NULL;
13121314

13131315

@@ -1316,15 +1318,23 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
13161318
* associated to its stripes that don't go away while we are discarding.
13171319
*/
13181320
btrfs_bio_counter_inc_blocked(fs_info);
1319-
/* Tell the block device(s) that the sectors can be discarded */
1320-
ret = btrfs_map_block(fs_info, BTRFS_MAP_DISCARD, bytenr, &num_bytes,
1321-
&bbio, 0);
1322-
/* Error condition is -ENOMEM */
1323-
if (!ret) {
1324-
struct btrfs_bio_stripe *stripe = bbio->stripes;
1321+
while (cur < end) {
1322+
struct btrfs_bio_stripe *stripe;
13251323
int i;
13261324

1325+
num_bytes = end - cur;
1326+
/* Tell the block device(s) that the sectors can be discarded */
1327+
ret = btrfs_map_block(fs_info, BTRFS_MAP_DISCARD, cur,
1328+
&num_bytes, &bbio, 0);
1329+
/*
1330+
* Error can be -ENOMEM, -ENOENT (no such chunk mapping) or
1331+
* -EOPNOTSUPP. For any such error, @num_bytes is not updated,
1332+
* thus we can't continue anyway.
1333+
*/
1334+
if (ret < 0)
1335+
goto out;
13271336

1337+
stripe = bbio->stripes;
13281338
for (i = 0; i < bbio->num_stripes; i++, stripe++) {
13291339
u64 bytes;
13301340
struct request_queue *req_q;
@@ -1341,10 +1351,19 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
13411351
stripe->physical,
13421352
stripe->length,
13431353
&bytes);
1344-
if (!ret)
1354+
if (!ret) {
13451355
discarded_bytes += bytes;
1346-
else if (ret != -EOPNOTSUPP)
1347-
break; /* Logic errors or -ENOMEM, or -EIO but I don't know how that could happen JDM */
1356+
} else if (ret != -EOPNOTSUPP) {
1357+
/*
1358+
* Logic errors or -ENOMEM, or -EIO, but
1359+
* unlikely to happen.
1360+
*
1361+
* And since there are two loops, explicitly
1362+
* go to out to avoid confusion.
1363+
*/
1364+
btrfs_put_bbio(bbio);
1365+
goto out;
1366+
}
13481367

13491368
/*
13501369
* Just in case we get back EOPNOTSUPP for some reason,
@@ -1354,7 +1373,9 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
13541373
ret = 0;
13551374
}
13561375
btrfs_put_bbio(bbio);
1376+
cur += num_bytes;
13571377
}
1378+
out:
13581379
btrfs_bio_counter_dec(fs_info);
13591380

13601381
if (actual_bytes)

fs/btrfs/volumes.c

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -5374,12 +5374,13 @@ void btrfs_put_bbio(struct btrfs_bio *bbio)
53745374
* replace.
53755375
*/
53765376
static int __btrfs_map_block_for_discard(struct btrfs_fs_info *fs_info,
5377-
u64 logical, u64 length,
5377+
u64 logical, u64 *length_ret,
53785378
struct btrfs_bio **bbio_ret)
53795379
{
53805380
struct extent_map *em;
53815381
struct map_lookup *map;
53825382
struct btrfs_bio *bbio;
5383+
u64 length = *length_ret;
53835384
u64 offset;
53845385
u64 stripe_nr;
53855386
u64 stripe_nr_end;
@@ -5413,6 +5414,7 @@ static int __btrfs_map_block_for_discard(struct btrfs_fs_info *fs_info,
54135414

54145415
offset = logical - em->start;
54155416
length = min_t(u64, em->start + em->len - logical, length);
5417+
*length_ret = length;
54165418

54175419
stripe_len = map->stripe_len;
54185420
/*
@@ -5827,7 +5829,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
58275829

58285830
if (op == BTRFS_MAP_DISCARD)
58295831
return __btrfs_map_block_for_discard(fs_info, logical,
5830-
*length, bbio_ret);
5832+
length, bbio_ret);
58315833

58325834
ret = btrfs_get_io_geometry(fs_info, op, logical, *length, &geom);
58335835
if (ret < 0)

0 commit comments

Comments
 (0)