HDD performance decreasing a lot #298

Open
Massimo-B opened this issue Dec 10, 2024 · 27 comments

Comments

@Massimo-B

Hi, I'm running several single device btrfs from 1TB up to 4TB, most are SSD, NVMe, some are 1TB HDDs.
I use btrbk a lot to store snapshots to other btrfs. I'm running bees on all btrfs, especially on the bigger btrfs for deduping snapshots from different machines.

I plan to get one single 20TB HDD for NAS data, server data and also all the snapshots from other machines centralized on this big btrfs.

But while all non-rotating SSDs are fine, the performance of the rotating HDDs is decreasing seriously. Most probably due to fragmentation of extents? Usually I can measure fragmentation only per file, and I did not find any seriously high fragmentation. How could I analyze the overall fragmentation of the whole btrfs?

For instance:
My 4TB NVMe receiving snapshots currently has 441 subvolumes. btrfs filesystem usage -g says 1759GiB free, min 960GiB. Runs fine.
The 1TB HDD has 74 subvolumes, 200GiB free. But when running btrbk, transferring ~100MB takes 10 to 20 minutes and causes a lot of noisy IO. bees almost never finishes, even after days.

The planned 20TB HDD would be connected to a 24/7 thin-client with small CPU and only 8 GB of RAM. Due to the size there could be around 800 subvolumes over time... I doubt that will work if I'm already struggling with the small 1TB disk. I'm quite sure that bees is responsible for the degraded performance. But without bees I would never be able to dedupe snapshots received from different machines.

@kakra
Contributor

kakra commented Dec 10, 2024

On COW filesystems, it's usually only useful to measure free space fragmentation because you cannot avoid fragmenting files - it's a feature of COW to do exactly that. I'm not sure if free space fragmentation can be measured in btrfs, but it can be optimized by using tools like btrfs-balance-least-used with a parameter like -u80 to let it recombine chunks until each chunk is at least 80% full. It starts with the least used chunks, then works its way up to the usage limit or 100% (if you don't set a usage limit).

See: https://github.com/knorrie/python-btrfs (maybe this toolset has a utility to measure fragmentation)

Optimizing free space fragmentation should improve write performance because the HDD heads tend to move less. Remember: COW always writes data to unused space.

@Zygo
Owner

Zygo commented Dec 10, 2024

Is this with v0.10 or master? See #296. Things have improved a lot recently, particularly for handling snapshots and reducing fragmentation.

@Massimo-B
Author

Wow, highly interesting. Thanks for the huge amount of development @Zygo and testing by @kakra only 1 week after the initial comment and RC.
Going to try that soon on my old btrfs with many bees versions working over the years.

But already existing fragmentation would not be healed by new extent scan mode?

@kakra
Contributor

kakra commented Dec 10, 2024

If I understood correctly, bees will not decrease fragmentation, but due to the way the new extent scanner works, it avoids a lot of additional fragmentation that the old scanner introduced. It will still split extents and thus fragment some files but the impact is greatly reduced. To use it, switch to scan mode 4 (or remove the parameter from your script, it's the default).

@Massimo-B
Author

As I'm building bees on Gentoo with version (**)9999 taking the latest repository files, how can I see which version I currently have?
/usr/libexec/bees and /usr/sbin/beesd both don't know a --version option.

@Zygo
Owner

Zygo commented Dec 11, 2024

It's the first line of output:

# bees
bees version v0.10-33617-gab288ff84

It also appears in beesstats.txt:

Now:     2024-12-11-02-29-22
Uptime:  93379 seconds
Version: v0.10-33617-gab288ff84
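
A small sketch for pulling that line out of the stats file from a script. The helper name `bees_version_from_stats` is made up for this example, and the path to beesstats.txt is passed in as an argument because its location depends on how bees is set up (the `.beeshome` path in the usage comment is an assumption).

```shell
# Print the value of the "Version:" line from a beesstats.txt file.
# Hypothetical helper; the stats file path is supplied by the caller.
bees_version_from_stats() {
    awk '$1 == "Version:" { print $2; exit }' "$1"
}

# Usage (the path is an assumption, adjust it to your setup):
#   bees_version_from_stats /mnt/btrfs/.beeshome/beesstats.txt
```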

@kakra
Contributor

kakra commented Dec 11, 2024

As I'm building bees on Gentoo with version (**)9999 taking the latest repository files

I'm maintaining that ebuild. The 9999 version builds from git master. If it told you about the new scan mode at the start of the build, you're using the latest build.

Something like bees version v0.10-57-ge403398 if you run beesd --help, meaning it's 57 commits ahead of v0.10.
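
To illustrate how such a version string breaks down, here is a minimal sketch using plain shell parameter expansion. `describe_version` is a hypothetical helper, not part of bees, and it assumes the input has the TAG-COUNT-gHASH form shown above (optionally with a trailing -dirty); a bare tag without that suffix is not handled.

```shell
# Split a git-describe style version such as "v0.10-57-ge403398" into
# the base tag, the number of commits since that tag, and the commit id.
# Hypothetical helper for illustration only.
describe_version() {
    v=${1%-dirty}          # drop an optional "-dirty" suffix
    hash=${v##*-}          # last field: gHASH
    rest=${v%-*}           # everything before it: TAG-COUNT
    count=${rest##*-}      # commits since the tag
    tag=${rest%-*}         # the tag itself
    echo "tag=$tag commits_ahead=$count commit=${hash#g}"
}

describe_version v0.10-57-ge403398-dirty
```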

@Massimo-B
Author

Today going to start v0.10-57-ge403398-dirty on my problematic 1TB HDD. First tests on SSDs have been successful.

Before I would like to optimize the free space fragmentation:

btrfs-balance-least-used with a parameter like -u80

How would I do that?
btrfs balance start -dusage=80 -musage=80 /mnt/btrfs ?

Wouldn't it be
btrfs balance start -dusage=0 -musage=0 /mnt/btrfs
in order to defrag chunks that are 0% used which is the free space?

@Zygo
Owner

Zygo commented Dec 12, 2024

There is a utility in the python3-btrfs package called btrfs-balance-least-used which will target the lowest-usage block groups first, until the limit specified by -u is reached:
btrfs-balance-least-used -u 80

You can also use
btrfs balance start -dusage=80 /mnt/btrfs

Never use -musage or balance metadata when not converting to a different profile. It will delete existing metadata allocations and may allow the filesystem to run out of metadata space.

Generally it's better to run balance after dedupe, since the dedupe will leave behind a lot of random free space fragments. Balancing before dedupe will move the data around before the free space appears.

Balancing also requires bees to scan all the data again to keep the locations in the hash table up to date. Minimizing balance using the usage filter keeps the amount of rescanned data low.
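
If only plain btrfs balance is available, one way to roughly approximate the "lowest usage first" order is to raise the usage filter in steps. The sketch below only prints the commands so they can be reviewed before running anything. `balance_steps` is a hypothetical helper, the step size is an arbitrary choice, and this is not how btrfs-balance-least-used works internally; it just walks the same direction.

```shell
# Print one "btrfs balance start" command per usage step, lowest first,
# so nearly empty data block groups are relocated before fuller ones.
# Nothing is executed. Metadata is deliberately left alone (no -musage).
# Hypothetical helper: balance_steps MOUNTPOINT LIMIT [STEP]
balance_steps() {
    mnt=$1 limit=$2 step=${3:-10}
    u=$step
    while [ "$u" -lt "$limit" ]; do
        echo "btrfs balance start -dusage=$u $mnt"
        u=$((u + step))
    done
    # Always finish with the requested limit itself.
    echo "btrfs balance start -dusage=$limit $mnt"
}

balance_steps /mnt/btrfs 80 20
```

Every data block group that gets relocated also has to be rescanned by bees, so a lower limit keeps both the balance and the rescan smaller.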

@kakra
Contributor

kakra commented Dec 12, 2024

The knorrie version does a better job than plain btrfs balance because it frees least used chunks first, possibly filling other considered chunks to the max so it can skip those later in the process.

@Massimo-B, see above, I posted this:

See: https://github.com/knorrie/python-btrfs (maybe this toolset has a utility to measure fragmentation)

@Massimo-B
Author

Massimo-B commented Dec 17, 2024

You can also use btrfs balance start -dusage=80 /mnt/btrfs

Never use -musage or balance metadata when not converting to a different profile. It will delete existing metadata allocations and may allow the filesystem to run out of metadata space.

Too late, I already did
btrfs balance start -dusage=80 -musage=80 /mnt/btrfs
on both my SSD/NVMe btrfs and also the HDD.

My knowledge of the internals of btrfs is insufficient. I tried learning from #btrfs on Libera to avoid polluting tickets here with basic questions.

From what I understood, please correct me if I'm wrong: btrfs organizes data in chunks and extents. Chunks are the physical organisation, extents the logical one. Chunks are ~1GiB for data, extents are 4KiB up to several MiB. Chunks contain extents. Chunks are not necessarily contiguous and can be physically fragmented. Extents can be fragmented over chunks. Files can be fragmented over extents.

Now what does balance do? Reading man btrfs-balance, I'm confused as it first describes "block groups" for -d[<filters>]. Later it mentions "chunks" for -m[<filters>]. Is that the same thing, different wording?
At least on section FILTERS it mentioned block groups again for both data and metadata, so this would be worth a rewrite of the man-page.

Balance does spread chunks over devices. What it actually does on single devices is not that clear. Section EXAMPLES->MAKING BLOCK GROUP LAYOUT MORE COMPACT says it takes partially filled chunks and moves the extents to other chunks, then removes the empty chunks? That means the extents of a file are not changed, but extents are moved out of partially filled chunks to fill existing chunks to a higher usage? Does this defragment the fragmentation of extents over chunks?
Reading further, what does that mean about the usable space? It feels like balance leaves unused chunks, which then must be removed by -dusage=0? So after -dusage=50 and -dusage=85 I need another -dusage=0 to finish? Confusing.

So why do you say "never use -musage"? With -musage=50 or -musage=85, could the compacting done during balancing run out of free space before -musage=0 is able to free the empty chunks?

Generally, the balancing on HDD I need to get rid of free space fragmentation. Would I need balancing on SSD/NVMe at all?

What does the knorrie version do better? Is it just the order of processing because it sorts for least used chunks first? Some -dusage=10 and -dusage=20 would also only consider less used chunks and moves their extents to higher used chunks, maybe not sorted.

@lilydjwg

I need another -dusage=0 to finish?

No.

So why do you say "never use -musage"?

It makes metadata more compact, and thus makes it more likely that btrfs will want to allocate a new metadata block group. If that's not possible, you may be stuck, since deleting things requires updating metadata.

Would I need balancing on SSD/NVMe at all?

I recommend you balance when you have little unallocated space (but still a lot of unused space inside block groups), so when a new metadata block group is needed, you have that space.

My backup HDD performance also decreased quite a lot, but I think it's due to bees "deduplicated" shared data between snapshots and so unshared its metadata.

@kakra
Contributor

kakra commented Dec 17, 2024

What does the knorrie version do better? Is it just the order of processing because it sorts for least used chunks first?

It's the order, which helps the process finish faster with less IO pressure.

@Zygo
Owner

Zygo commented Dec 17, 2024

The loss of performance is concerning.

Dedupe will leave behind a lot of small free space holes and btrfs isn't great at filling small free space holes. Balance won't fix that--it allocates on...64K? (power-of-2? minimum-length?) boundaries, so the new block groups created during balance are full of tiny holes too. Regular writes do a second allocation pass that eventually fills in the smaller holes so allocation gets fast again, but I don't know of any way to speed that process up.

Unsharing snapshot metadata can also cause a slowdown--or a speedup, at random--because it churns all the metadata and redistributes it across the disk surface. That will also randomly get faster or slower over time with normal writes, as they'll move metadata pages that are modified by normal day-to-day writes closer together and reduce seek time.


A dev_extent is a contiguous region of physical space on one device. A chunk is one or more dev_extents organized in a RAID profile. A block group is a chunk with a free space counter. Block groups and chunks are very similar, and the terms are used interchangeably in cases where the presence or absence of the free space counter is not important.

An extent is a contiguous region of logical space in a block group. A file is a list of references to extents, indexed by offset from the beginning of the file. Extents are always contiguous--one fragment is one extent by definition. Extents are created when writing to a file, and can be smaller than the written region if the written region exceeds the maximum size for an extent, or if the largest free space available in any block group is smaller than the written region.

An extent cannot be modified once it is written. Extent reference lists in files can be modified--overwriting part of an extent will result in the reference to the old extent being shortened or split, and a new reference to the new extent will be inserted into the file's extent reference list. nodatacow files are an exception to this rule, but only if the extent that is overwritten is not shared. prealloc files have extent references that can be overwritten exactly once.

Balance moves extents from one block group to others, possibly creating new block groups in the process, or filling in empty space in existing block groups. Each block group is deleted after its contents have been relocated. The product of balance is the free space left behind by the deleted block groups, e.g.

  1. If you want more space for metadata, you balance data block groups, and hope more are deleted than created.
  2. If you want to expand one drive (replace a small one with a large one) in a 5-drive raid1 filesystem, you balance data block groups on devices other than the one that was expanded, creating equal amounts of free space on the non-expanded drives.
  3. If you want less space for metadata, you balance metadata block groups to delete them. Never do this unless you need to delete all the metadata block groups for some reason (e.g. removing a drive from a filesystem, converting to a different metadata profile, or to fix severe dev_extent fragmentation that results from btrfs-convert).

Almost every use case for balance requires a script to balance block groups in the optimal order for that use case. btrfs-balance-least-used is designed for example 1 above. There should be five or six such scripts by now for use cases like "expand one drive in N>2 disk raid1" or "defragment dev_extents after btrfs-convert", but nobody seems to get around to writing and publishing them (myself included).

Extents cannot cross block group boundaries. Balance cannot change the length of any extent. This means balance cannot necessarily fill block groups to 100%. Balance on a filesystem that is over about 90% full (the point where problems start varies from 84% to 99.5% depending on what's on the filesystem) can sometimes run out of space because of these limitations.

Defrag copies data to a new extent, and asks the allocator to allocate the largest extents possible (or exactly the size of the file, if that's smaller). This request may or may not be honored, and in the latter case, defrag will replace a large extent with multiple smaller ones.

@kakra
Contributor

kakra commented Dec 17, 2024

I think btrfs really needs a tool to walk all the extents (possibly file by file), check whether there's a chain of extents for that file which are not shared with other refs, and then rewrite that chain into one contiguous new extent - aka "snapshot-aware defrag". Then another pass would do something like "btrfs-balance-least-used" to fill the gaps that were created previously. Then, if needed, repeat (if the fs is quite full already, multiple passes are usually needed to bubble all the free space to the end of the storage). Though, due to chunked allocation, this works differently on btrfs.

This would be similar to the work that many modern defragmenters do on NTFS cloud images to move as little data as possible while leaving as few holes as possible: defragment, consolidate, repeat. This improves access through the backend storage because bigger holes result in more flash cell trimming, and fewer fragments result in fewer IO requests (especially if the backend storage is properly maintained). From my own experience, such a cycle on a regular basis improves IO a lot in cloud VMs. Well, at least for SSD. An HDD defragmenter gets much more performance back by doing proper spatial reallocation (to restore data locality of closely related data). This is hard to implement correctly, and almost impossible on btrfs (due to COW).

@Massimo-B
Author

It makes metadata more compact, and thus makes it more likely that btrfs will want to allocate a new metadata block group. If that's not possible, you may be stuck, since deleting things requires updating metadata.

If I ran into this situation, what could I do?

Again some basic questions for better understanding:
Looking at my 4TB NVMe:

# btrfs filesystem usage -g /mnt/usb/mobiledata |grep estimated
    Free (estimated):		        1756.10GiB	(min: 966.27GiB)

# btrfs filesystem usage /mnt/usb/mobiledata |grep Size
Data,single: Size:1.96TiB, Used:1.78TiB (91.20%)
Metadata,DUP: Size:71.00GiB, Used:46.08GiB (64.90%)
System,DUP: Size:32.00MiB, Used:240.00KiB (0.73%)

Data chunks are like a reservation on the filesystem and can't be used for others like Metadata? The Size - Used = Unused is the space that can't be used for other things? So compacting the partially filled data chunks and removing the empty chunks actually frees the unusable space and makes it possible to be used for other purposes?
Why can't that be done on the fly and always needs a balancing? Why don't other filesystems have that issue? It's like bad scaling between data and metadata, like saying the filesystem has free space but it's already reserved for data or metadata?

What is the usual way when initially filling a btrfs? Chunks are filled one by one, and if there is no chunk left, a new chunk is going to be created (allocated) from free space? Without balancing, partially filled or empty chunks will never be deleted, so chunk count is only increasing, never decreasing in normal usage without balancing?

So in order to have enough free space for new data, it does not matter if data is going to partially filled chunks or by allocating new chunks, as long as the allocation is not done for metadata and unused, or the other way round that data chunks have too much allocation but unused while metadata can't allocate new chunks?

In that context, what does Free (estimated) mean? The man page says:

approximate size of the remaining free space usable for data, including currently allocated space and estimating the usage of the unallocated space based on the block group profiles, the min is the lower bound of the estimate in case multiple profiles are present

So "Free" means Unallocated + Allocated but unused for Data only? But that never takes into account if Metadata could be blocked if no allocation is possible anymore?
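
The "Size - Used = Unused" figure asked about above can be computed directly from the `btrfs filesystem usage` lines quoted in this comment. This is a minimal sketch: `unused_per_type` is a hypothetical helper, and it assumes the line layout shown here (type, `Size:`, `Used:`) with TiB/GiB/MiB/KiB suffixes.

```shell
# Read "btrfs filesystem usage" output on stdin and print, per block
# group type, the allocated-but-unused space (Size - Used) in GiB.
# Hypothetical helper; assumes the "Type: Size:X, Used:Y" layout.
unused_per_type() {
    awk '
    function to_gib(s,    n, u) {
        n = s + 0                      # numeric prefix, e.g. 1.96
        u = s; gsub(/[0-9.]/, "", u)   # remaining unit, e.g. TiB
        if (u == "TiB") return n * 1024
        if (u == "GiB") return n
        if (u == "MiB") return n / 1024
        if (u == "KiB") return n / 1024 / 1024
        return n / 1024 / 1024 / 1024  # plain bytes
    }
    /Size:.*Used:/ {
        size = $2; sub(/^Size:/, "", size); sub(/,$/, "", size)
        used = $3; sub(/^Used:/, "", used)
        printf "%s %.2f GiB\n", $1, to_gib(size) - to_gib(used)
    }'
}

unused_per_type <<'EOF'
Data,single: Size:1.96TiB, Used:1.78TiB (91.20%)
Metadata,DUP: Size:71.00GiB, Used:46.08GiB (64.90%)
System,DUP: Size:32.00MiB, Used:240.00KiB (0.73%)
EOF
```

For the figures above this gives roughly 184 GiB of unused space inside data block groups and about 25 GiB inside metadata block groups. Because the tool rounds its output to two decimals, the result is only as precise as the input.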

@Zygo
Owner

Zygo commented Dec 18, 2024

Why can't that be done on the fly and always needs a balancing?

Balancing is how you do that. You can run balance with a utility like btrfs-balance-least-used -u 75 which lets you pick the times when the rearrangement of data is done, or you can have the kernel do it automatically as needed with echo 75 | tee /sys/fs/btrfs/*/allocation/data/bg_reclaim_threshold.
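
The glob in the one-liner above applies the threshold to every mounted btrfs. A sketch for setting it on one filesystem only, by UUID, could look like this. `set_reclaim_threshold` is a hypothetical helper; the optional third argument exists only so the function can be tried against a scratch directory instead of the real sysfs tree, and the range check is just a sanity check (the kernel does its own validation). Writing to the real file needs root.

```shell
# Write a reclaim threshold (percent) to the data bg_reclaim_threshold
# file of a single filesystem.
# Hypothetical helper: set_reclaim_threshold UUID PERCENT [SYSFS_ROOT]
set_reclaim_threshold() {
    uuid=$1 pct=$2 root=${3:-/sys/fs/btrfs}
    case $pct in
        ''|*[!0-9]*) echo "percent must be an integer" >&2; return 1 ;;
    esac
    [ "$pct" -le 100 ] || { echo "percent must be 0..100" >&2; return 1; }
    f=$root/$uuid/allocation/data/bg_reclaim_threshold
    [ -w "$f" ] || { echo "not writable: $f" >&2; return 1; }
    echo "$pct" > "$f"
}

# Usage (UUID is a placeholder, see "btrfs filesystem show"):
#   set_reclaim_threshold 01234567-89ab-cdef-0123-456789abcdef 75
```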

For "small" filesystems you can use -O mixed-bg to mkfs.btrfs which will eliminate the distinction between data and metadata block groups. The drawback is that metadata pages must be 4K in size and they make free space fragmentation worse because they're mixed with data block groups. mixed-bg isn't used with "large" filesystems because of the resulting performance issues. The size of "small" and "large" depends on your tolerance for low performance vs. having to manage separate data+metadata allocations, but the threshold is somewhere around 8 GiB.

What is the usual way when initially filling a btrfs? Chunks are filled one by one, and if there is no chunk left, a new chunk is going to be created (allocated) from free space? Without balancing, partially filled or empty chunks will never be deleted, so chunk count is only increasing, never decreasing in normal usage without balancing?

Yes. If you consistently use a filesystem the same way, the ratio of data to metadata stays relatively constant, so when the filesystem is finally filled up, the data to metadata ratio will be correct for your workload; however, if you change the way you use the filesystem (e.g. using snapshots with the relatime mount option or using dedupe for the first time), you may need to reclaim data space for metadata to expand.

Due to asymmetries between data and metadata, you usually want to reduce data allocations but never metadata allocations.

So in order to have enough free space for new data, it does not matter if data is going to partially filled chunks or by allocating new chunks, as long as the allocation is not done for metadata and unused, or the other way round that data chunks have too much allocation but unused while metadata can't allocate new chunks?

The bad case is when metadata grows to fill all existing allocated metadata space, so there is no unallocated space left to allocate more metadata. The kernel will try to force the filesystem to be read-only before that happens. In extreme cases, too much metadata is committed to the filesystem before the kernel detects the problem, and then the filesystem cannot be mounted read-write any more.

Generally, if metadata has been allocated in the past, the space was needed in order to handle some prior peak load of metadata usage. So it's not usually a good idea to reduce metadata allocation, even if there seems to be a lot of unused allocated space. Metadata balance should not be attempted if there is less than (3 + nr_disks) * 1G of free space in metadata allocations. If you have large snapshots, the limit for safe metadata balance may be significantly higher. These calculations are complex and the data required to make them is not easily available from the btrfs tools, so we generally recommend that users never balance metadata unless there's a major event like a change in the number of drives in the filesystem (and even then, only attempt it after ensuring that data block groups have been reclaimed).
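
The (3 + nr_disks) * 1G rule can be written down as a quick check. This is only a sketch of that one formula, with hypothetical helper names. Passing the check does not make a metadata balance safe, since the real limit can be much higher with large snapshots; failing it is a clear signal to leave metadata alone.

```shell
# Minimum free space (GiB) inside metadata allocations below which a
# metadata balance should not be attempted: (3 + nr_disks) * 1G.
metadata_balance_floor_gib() {
    echo $((3 + $1))
}

# Succeeds (exit 0) when metadata Size - Used is at least the floor.
# Hypothetical helper: metadata_balance_above_floor SIZE_GIB USED_GIB NR_DISKS
metadata_balance_above_floor() {
    floor=$(metadata_balance_floor_gib "$3")
    awk -v s="$1" -v u="$2" -v f="$floor" 'BEGIN { exit !((s - u) >= f) }'
}

# Figures from the 4TB NVMe earlier in the thread:
# Metadata Size 71.00GiB, Used 46.08GiB, one disk.
if metadata_balance_above_floor 71.00 46.08 1; then
    echo "above the $(metadata_balance_floor_gib 1) GiB floor"
else
    echo "below the floor: do not balance metadata"
fi
```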

Data allocation has no such restrictions. You can deallocate as much data as you like, and btrfs will either allocate more, or give an ENOSPC error when writing data, without forcing the filesystem read-only. You can't overwrite data when data block groups are full, but you can always delete data as long as there is free metadata space.

Partially filled data chunks are slower to write to, since the largest fragment sizes cannot be used when all of the free spaces are scattered and small. Balancing data helps with that by relocating data to consolidate free space into larger fragments.
There is no need to do the same thing for metadata: since all metadata blocks are the same size, fragmentation isn't possible.

In that context, what does Free (estimated) mean?

"Free" space is the sum of:

  1. Unused space in all existing data block group allocations
  2. The amount of data block group allocations that could be achieved from the unallocated space with the current raid profile. btrfs fi usage tries to give a minimum, while the kernel tries to give a maximum. The kernel and btrfs tools often get this part wrong, especially for devices of different sizes or mixed profiles during a profile conversion.

Note that metadata allocations are never considered free space, regardless of how much metadata space is unused within the allocations. Also note that actual usable space for data may be less than the estimate if more metadata space is needed for the new data (e.g. crc32c csums are 0.1% of the data size, so if you add 1 TiB of new data, you'll also add at least 1 GiB of metadata).
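
The csum figure can be checked with a line of arithmetic. The sketch assumes crc32c (4 bytes per checksum) and a 4 KiB block size, which gives a ratio of 1/1024, i.e. about 0.1%. `csum_bytes` is a hypothetical helper, and the result is a lower bound because it ignores the tree overhead around the checksums.

```shell
# crc32c stores a 4-byte checksum per 4 KiB data block (assumed block
# size), i.e. 1/1024 of the data size, roughly 0.1%.
# Hypothetical helper: csum_bytes DATA_BYTES -> checksum bytes
csum_bytes() {
    echo $(( $1 / 4096 * 4 ))
}

one_tib=$((1024 * 1024 * 1024 * 1024))
echo "csums for 1 TiB of data: $(csum_bytes "$one_tib") bytes"
```

For 1 TiB of data this comes to 1073741824 bytes, which is the 1 GiB mentioned above.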

@kakra
Contributor

kakra commented Dec 18, 2024

Data allocation has no such restrictions. You can deallocate as much data as you like, and btrfs will either allocate more, or give an ENOSPC error when writing data, without forcing the filesystem read-only. You can't overwrite data when data block groups are full, but you can always delete data as long as there is free metadata space.

But deleting data may not result in getting another free chunk that could then be allocated to metadata. That makes it even more important to run some sort of balance from time to time (or by automation).

But balance is a heavy operation. It doesn't always generate a lot of IO but it often keeps the FS busy with locks, leading to system stalls for different periods of time. Thus, I do this manually from time to time. But you may also want to run a cronjob to do this in less busy hours. E.g., my NAS does this some time during the day when I am at work, and it doesn't do this at night because my backups will run at this time.

@Massimo-B I still think all of this is a concern for you because you use btrfs as a backup vault, and try using bees to keep the vault as small as possible, because btrfs-send isn't good at preventing duplication. There are tools like borg or restic which are a lot better with this job on traditional file systems: less overhead from exploding fragmentation levels, better data density, better error recovery, better deduplication, less seeking, higher performance. But yes, you cannot directly browse the snapshots in the vault as a tree of files and directories (but you can use a fuse mount). However, I think this is what to use local short-lived rotating snapshots for.

Tools like btrfs-send have been created with replication in mind, not to create backups where you keep many of those snapshots.

@Massimo-B
Author

automatically as needed with echo 75 | tee /sys/fs/btrfs/*/allocation/data/bg_reclaim_threshold.
Interesting, I learned more about balancing and this auto-balancing from https://wiki.tnonline.net/w/Btrfs/Balance

The bad case is when metadata grows to fill all existing allocated metadata space, so there is no unallocated space left to allocate more metadata. The kernel will try to force the filesystem to be read-only before that happens.

But as long as there is free unallocated space, metadata chunks can still be created?
Just in case: if no metadata space is available anymore and the fs is read-only, would any repair be possible?

so we generally recommend that users never balance metadata unless there's a major event like a change in the number of drives in the filesystem (and even then, only attempt it after ensuring that data block groups have been reclaimed).

I did balance metadata already, my mistake, but I did not run into the read-only situation yet. How can I repair the harm of the metadata balancing now?

@Massimo-B
Author

@Massimo-B I still think all of this is a concern for you because you use btrfs as a backup vault, and try using bees to keep the vault as small as possible, because btrfs-send isn't good at preventing duplication. There are tools like borg or restic which are a lot better with this job on traditional file systems: less overhead from exploding fragmentation levels, better data density, better error recovery, better deduplication, less seeking, higher performance. But yes, you cannot directly browse the snapshots in the vault as a tree of files and directories (but you can use a fuse mount). However, I think this is what to use local short-lived rotating snapshots for.

I know and I keep that in mind. I'm going to do some cold backup with either borg or restic from my central btrfs soon.
But I'm not sure if that would entirely replace my mobile storage having all snapshots from all machines, and I'm working with that data and historical data. That currently is a 4TB USB-NVMe which is fast.
As long as borg/restic can also store all those snapshots and make it available via fuse-mount, it could replace my big mobile btrfs, maybe. I need to try it one day.

As btrfs on HDDs had quite bad performance after deduplication (might be better now after balancing and the new bees RC here...), the fuse mount onto a borg/restic might not be much slower.

Tools like btrfs-send have been created with replication in mind, not to create backups where you keep many of those snapshots.

But actually btrfs-send is as optimal as snapshots are. I only need the dedupe because different machines are contributing their snapshots to that central btrfs. The performance of btrfs-send actually depends on how compression and decompression are done. I'm often collecting snapshots from machines over the network. I did some benchmarks in the past using different zstd levels with and without adapt, with btrfs-send using --compressed-data or not... There were big differences, as my btrfs is already zstd-compressed and leaving the data compressed without re-compression had advantages. I'm going to do it again soon and publish the results from commands like:
ssh -i '/root/.ssh/id_ed25519' -o compression=no root@mb 'btrfs send -p '\''/mnt/archive/mb/home/home.20240801T071700+0200/'\'' '\''/mnt/archive/mb/home/home.20240805T071700+0200//'\'' | mbuffer -v 1 -q -m 5% | zstd -c -T0' | zstd -d -c -T0 | mbuffer -v 1 -m 5% | btrfs receive '/mnt/local/data/archive/mb/home/'

@lilydjwg

But as long as there is free unallocated space, metadata chunks can still be created?

Yes.

Just in case: if no metadata space is available anymore and the fs is read-only, would any repair be possible?

Prepare a large enough spare disk / partition, reboot into an environment that won't do anything to the filesystem automatically, mount it, and try btrfs device add new_device. If it succeeds, you can start deleting / balancing things.
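
As a rough outline, the sequence could look like the commands printed below. This is a sketch that only prints and runs nothing. `enospc_rescue_plan` is a hypothetical helper, /dev/sdX is a placeholder, the -dusage=50 value is an arbitrary starting point, and removing the spare device again at the end is an assumption about what you would want once the original device has unallocated space.

```shell
# Print (do not run) a possible command sequence for getting out of a
# metadata out-of-space situation by temporarily adding a device.
# Hypothetical helper: enospc_rescue_plan MOUNTPOINT SPARE_DEVICE
enospc_rescue_plan() {
    mnt=$1 spare=$2
    echo "btrfs device add $spare $mnt"
    # Free whole data block groups so metadata can allocate again.
    echo "btrfs balance start -dusage=50 $mnt"
    # Only once the original device has unallocated space again:
    echo "btrfs device remove $spare $mnt"
}

enospc_rescue_plan /mnt/btrfs /dev/sdX
```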

@Massimo-B
Author

@Massimo-B I still think all of this is a concern for you because you use btrfs as a backup vault, and try using bees to keep the vault as small as possible, because btrfs-send isn't good at preventing duplication. There are tools like borg or restic which are a lot better with this job...

And another advantage of differential btrfs-send: I was able to collect snapshots from remote machines via slow uplinks very efficiently. But I guess the other backup tools also have efficient ways to do that...

@Massimo-B
Author

btrfs-balance-least-used -u 80

You can also use
btrfs balance start -dusage=80 /mnt/btrfs

The knorrie version does a better job than plain btrfs balance because it frees least used chunks first, possibly filling other considered chunks to the max so it can skip those later in the process.

Just to be clear, after both actions, the result would be exactly the same? Just that knorrie's btrfs-balance-least-used is more efficient and faster?

@kakra
Contributor

kakra commented Jan 15, 2025

This depends on what you define as "the same"... It will result in the same chunk usage (none below 80%), but of course it will not result in the same layout because extents are processed in a different order. So an extent which ends up at the start of the hard disk with the knorrie version may be picked up much later with the native btrfs-balance version and then end up somewhere in the middle of the hard disk.

I'm not sure if that really matters. There's no way in btrfs to control data locality anyways (which could be important for HDD usage). The knorrie version frees chunks much faster.

@Massimo-B
Author

The knorrie version does a better job than plain btrfs balance because it frees least used chunks first, possibly filling other considered chunks to the max so it can skip those later in the process.

Then actually the knorrie version is a better replacement for btrfs balance in any case, and there is no reason to use btrfs balance for balancing data?

<offtopic, about Gentoo> dev-python/btrfs seems to be the wrong category as it contains useful executable tools like /usr/bin/btrfs-balance-least-used, I should file a ticket to move and rename it to something like sys-fs/knorrietools or whatever.

@kakra
Contributor

kakra commented Jan 16, 2025

seems to be the wrong category as it contains useful executable tools

I don't think so... It's mainly a btrfs API lib for python, but coincidentally, it also ships some useful tools that probably started as demo/test projects. Personally, I'm using btrfs-balance-least-used directly from the git checkout (symlinked from /usr/local/sbin to actually launch it without providing the path).

@kakra
Contributor

kakra commented Jan 16, 2025

Then actually the knorrie version is a better replacement for btrfs balance in any case, and there is no reason to use btrfs balance for balancing data?

For this use-case, yes. It won't work that well if you need to change the raid profile of all chunks because it ignores chunks which are full already. So there are valid reasons to use both tools.
