@@ -52,7 +52,6 @@ public class RegionSizeCalculator {
private final Map<byte[], Long> sizeMap = new TreeMap<>(Bytes.BYTES_COMPARATOR);

static final String ENABLE_REGIONSIZECALCULATOR = "hbase.regionsizecalculator.enable";
- private static final long MEGABYTE = 1024L * 1024L;

/**
* Computes size of each region for table and given column families.
@@ -82,8 +81,8 @@ private void init(RegionLocator regionLocator, Admin admin) throws IOException {
regionLocator.getName())) {

byte[] regionId = regionLoad.getRegionName();
- long regionSizeBytes =
-   ((long) regionLoad.getStoreFileSize().get(Size.Unit.MEGABYTE)) * MEGABYTE;
+ long regionSizeBytes = (long) regionLoad.getMemStoreSize().get(Size.Unit.BYTE)
Contributor:
Why did we use Unit.MEGABYTE and then multiply by 1024 * 1024 in the past? Can you find any related issues about this? Seems strange...

Contributor:
Seems it was like this in the first place...

Contributor Author:
Thanks for reviewing, Duo.

From my understanding, before we introduced the Size API we got the region size from getMemstoreSizeMB and getStoreFileSizeMB, so I guess this was just kept the same when the new Size API was applied?

Contributor:
OK, I checked the code on how we construct the store file size and memstore size; the default unit is MB, so it is useless to pass a unit smaller than MB here...
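(A minimal illustration of the point above, assuming the org.apache.hadoop.hbase.Size API and made-up numbers: a metric the server constructs at megabyte granularity carries no sub-MB precision, so requesting a finer unit only multiplies the same value out.)

import org.apache.hadoop.hbase.Size;

// The store file and memstore metrics are built with Unit.MEGABYTE on the server side,
// so converting the reported value to a smaller unit cannot recover finer precision.
Size storeFileSize = new Size(123, Size.Unit.MEGABYTE);
double asMb = storeFileSize.get(Size.Unit.MEGABYTE); // 123.0
double asBytes = storeFileSize.get(Size.Unit.BYTE);  // 128974848.0 == 123 * 1024 * 1024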

Contributor:
But anyway, I think the problem here is that we also need to account for the memstore size when calculating the region size?

Contributor Author (@frostruan, Feb 27, 2024):
Yes, otherwise the data in the memstore will be lost.

Member:
I think that Hadoop assumes these split sizes are in megabytes and so we follow suit.

To protect against loss of precision, when the byte-unit value is non-zero, we can apply a minimum of 1 MB. We should always add this minimum when running against an online cluster.

When running against a snapshot, I'm not sure. MR over snapshots instantiates the region in the mapper process -- I assume that also reads the WAL and populates a memstore. In that case, we need the 1 MB minimum here too. If not, we can permit the 0 to pass through and give the empty-split optimization a chance.
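(A rough sketch of the floor being suggested here, reusing the variable names from the surrounding diff; this is not part of the patch, just the idea spelled out.)

// Sketch only: floor a non-empty region at 1 MB so that megabyte-granular split
// sizing never turns a small-but-non-empty region into a zero-length split.
long memStoreBytes = (long) regionLoad.getMemStoreSize().get(Size.Unit.BYTE);
long storeFileBytes = (long) regionLoad.getStoreFileSize().get(Size.Unit.BYTE);
long regionSizeBytes = memStoreBytes + storeFileBytes;
if (regionSizeBytes > 0) {
  regionSizeBytes = Math.max(regionSizeBytes, 1024L * 1024L); // 1 MB minimum
}
sizeMap.put(regionId, regionSizeBytes);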

Contributor Author:
> To protect against loss of precision, when the byte-unit value is non-zero, we can apply a minimum of 1 MB. We should always add this minimum when running against an online cluster.

Agree. And I believe this problem has been solved in https://issues.apache.org/jira/browse/HBASE-26609

> When running against a snapshot, I'm not sure. MR over snapshots instantiates the region in the mapper process -- I assume that also reads the WAL and populates a memstore. In that case, we need the 1 MB minimum here too. If not, we can permit the 0 to pass through and give the empty-split optimization a chance.

The value of the snapshot input split length will always be 0.
https://github.com/apache/hbase/blob/rel/3.0.0-beta-1/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormatImpl.java#L185
I think maybe this should be increased to 1 MB too?

+   + (long) regionLoad.getStoreFileSize().get(Size.Unit.BYTE);

sizeMap.put(regionId, regionSizeBytes);

TestRegionSizeCalculator.java
@@ -59,14 +59,14 @@ public void testSimpleTestCase() throws Exception {

RegionLocator regionLocator = mockRegionLocator("region1", "region2", "region3");

Admin admin = mockAdmin(mockRegion("region1", 123), mockRegion("region3", 1232),
mockRegion("region2", 54321));
Admin admin = mockAdmin(mockRegion("region1", 123, 1), mockRegion("region3", 1232, 1),
mockRegion("region2", 54321, 1));

RegionSizeCalculator calculator = new RegionSizeCalculator(regionLocator, admin);

- assertEquals(123 * megabyte, calculator.getRegionSize(Bytes.toBytes("region1")));
- assertEquals(54321 * megabyte, calculator.getRegionSize(Bytes.toBytes("region2")));
- assertEquals(1232 * megabyte, calculator.getRegionSize(Bytes.toBytes("region3")));
+ assertEquals((123 + 1) * megabyte, calculator.getRegionSize(Bytes.toBytes("region1")));
+ assertEquals((54321 + 1) * megabyte, calculator.getRegionSize(Bytes.toBytes("region2")));
+ assertEquals((1232 + 1) * megabyte, calculator.getRegionSize(Bytes.toBytes("region3")));
// if regionCalculator does not know about a region, it should return 0
assertEquals(0 * megabyte, calculator.getRegionSize(Bytes.toBytes("otherTableRegion")));

@@ -82,11 +82,11 @@ public void testLargeRegion() throws Exception {

RegionLocator regionLocator = mockRegionLocator("largeRegion");

Admin admin = mockAdmin(mockRegion("largeRegion", Integer.MAX_VALUE));
Admin admin = mockAdmin(mockRegion("largeRegion", Integer.MAX_VALUE, 128));

RegionSizeCalculator calculator = new RegionSizeCalculator(regionLocator, admin);

- assertEquals(((long) Integer.MAX_VALUE) * megabyte,
+ assertEquals(((Integer.MAX_VALUE + 128L)) * megabyte,
calculator.getRegionSize(Bytes.toBytes("largeRegion")));
}

@@ -96,11 +96,11 @@ public void testDisabled() throws Exception {
String regionName = "cz.goout:/index.html";
RegionLocator table = mockRegionLocator(regionName);

- Admin admin = mockAdmin(mockRegion(regionName, 999));
+ Admin admin = mockAdmin(mockRegion(regionName, 999, 1));

// first request on enabled calculator
RegionSizeCalculator calculator = new RegionSizeCalculator(table, admin);
- assertEquals(999 * megabyte, calculator.getRegionSize(Bytes.toBytes(regionName)));
+ assertEquals((999 + 1) * megabyte, calculator.getRegionSize(Bytes.toBytes(regionName)));

// then disabled calculator.
configuration.setBoolean(RegionSizeCalculator.ENABLE_REGIONSIZECALCULATOR, false);
@@ -147,11 +147,12 @@ private Admin mockAdmin(RegionMetrics... regionLoadArray) throws Exception {
* Creates mock of region with given name and size.
* @param fileSizeMb number of megabytes occupied by region in file store in megabytes
*/
- private RegionMetrics mockRegion(String regionName, int fileSizeMb) {
+ private RegionMetrics mockRegion(String regionName, int fileSizeMb, int memStoreSizeMB) {
RegionMetrics region = Mockito.mock(RegionMetrics.class);
when(region.getRegionName()).thenReturn(Bytes.toBytes(regionName));
when(region.getNameAsString()).thenReturn(regionName);
when(region.getStoreFileSize()).thenReturn(new Size(fileSizeMb, Size.Unit.MEGABYTE));
+ when(region.getMemStoreSize()).thenReturn(new Size(memStoreSizeMB, Size.Unit.MEGABYTE));
return region;
}
}