HBASE-24286: HMaster won't become healthy after cloning or crea… #2113
@@ -71,7 +71,11 @@ private static void writeFsLayout(Path rootDir, Configuration conf) throws IOExc
     LOG.info("BOOTSTRAP: creating hbase:meta region");
     FileSystem fs = rootDir.getFileSystem(conf);
     Path tableDir = CommonFSUtils.getTableDir(rootDir, TableName.META_TABLE_NAME);
-    if (fs.exists(tableDir) && !fs.delete(tableDir, true)) {
+    boolean removeMeta = conf.getBoolean(HConstants.REMOVE_META_ON_RESTART,
Contributor
// Checking if meta needs initializing.
Contributor
Author
I will come back on this later tomorrow, but I agree with you that we should check explicitly how we define a partial bootstrap. Also, do you mean the case where the clusterID didn't get written to ZK, is it …
Contributor
I am not sure whether that can really be used; I need to check the code. We need a way to identify the fact that it's a cluster redeploy, not use some config to identify that. The HBase system should be smart enough, so I was just wondering whether we can use this to know that. Maybe not, need to see. My thinking is that we will make the feature of recreating a cluster on top of existing data a 1st-class feature for HBase itself.
Contributor
That would be great, let's find a good way to differentiate this case.
Contributor
Author
Sorry for the late reply. I have reread the code and come up with the following. First of all, a combination of looking at the WALs, Procedure WALs, and ZooKeeper data is required and is used to define …
Then, for this PR, if we only focus on the cloud use cases, the unknown servers and …
So we're fixing case 2b in this PR; I have come up with a prototype, and unit tests are running off this PR now. The proposed changes are …
Contributor
Sorry for the delay; I need to read through the analysis you put above. What you mention, that after recreating a cluster the start will hang because META does not get assigned, is correct. Can you please create a sub-issue for this case? i.e., knowing whether we are starting the HMaster after a recreate (create cluster over existing data).
Contributor
In the case of HMaster start and bootstrap, we create the clusterID, write it to the FS and then to ZK, and then create the META table FS layout. So on a cluster recreate we will see that the clusterID is there in the FS, and so is the META FS layout, but there is no clusterID in ZK. Yes, it seems we can use this as an indication of a cluster recreate over existing data. In HMaster start, this is something we need to check first of all and track. If this mode is true, later when (if) we do INIT_META_WRITE_FS_LAYOUT, we should not delete the META dir. As part of the bootstrap, when we write that proc to the MasterProcWAL, we can include this mode (boolean) info as well; it is a protobuf message anyway. So even if this HMaster got killed and restarted (at a point where the clusterId was written to ZK but the META FS layout part was not reached), we can use the info added as part of the bootstrap WAL entry and make sure NOT to delete the meta dir.
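To make that write ordering concrete, a startup-time check might look roughly like the sketch below. This is only an illustration of the idea being discussed here, not code from this PR; it assumes the existing HBase helpers FSUtils.getClusterId, ZKClusterId.readClusterIdZNode, and CommonFSUtils.getTableDir.

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.util.CommonFSUtils;
import org.apache.hadoop.hbase.util.FSUtils;
import org.apache.hadoop.hbase.zookeeper.ZKClusterId;
import org.apache.hadoop.hbase.zookeeper.ZKWatcher;

// Bootstrap order is: clusterId -> FS, clusterId -> ZK, then meta FS layout.
// Finding the first and the third without the second suggests a cluster
// recreated over existing data rather than a partially failed bootstrap.
static boolean looksLikeClusterRecreate(FileSystem fs, Path rootDir, ZKWatcher zk)
    throws Exception {
  boolean clusterIdOnFs = FSUtils.getClusterId(fs, rootDir) != null;
  boolean metaLayoutOnFs =
    fs.exists(CommonFSUtils.getTableDir(rootDir, TableName.META_TABLE_NAME));
  boolean clusterIdInZk = ZKClusterId.readClusterIdZNode(zk) != null;
  return clusterIdOnFs && metaLayoutOnFs && !clusterIdInZk;
}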
Contributor
Author
Sounds right to me. As you suggested, we'll put this PR on hold, pending the new sub-task. I will try to send another JIRA and PR out in a few days and refer back to the conversation we had here. Thanks again Anoop.
Member
Sorry for harping on an implementation detail: let's sideline meta and not delete please :).
This seems like a very reasonable starting point. Like Anoop points out, if we can be very sure that we will only trigger this case when we are absolutely sure we're in the "cloud recreate" situation, that will bring a lot of confidence.
Lazy Josh: did you get a new Jira created already for this?
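For illustration, "sidelining" here would mean renaming the meta directory out of the way rather than deleting it. A minimal sketch, assuming a hypothetical ".sidelined" directory name (it is not an HBase constant) and reusing fs, rootDir, and LOG from the hunk above:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.util.CommonFSUtils;
import org.apache.hadoop.hbase.util.EnvironmentEdgeManager;

// Instead of fs.delete(tableDir, true), move the existing meta table dir to a
// timestamped sideline location so an operator can recover it if the recreate
// detection turns out to be wrong. The ".sidelined" name is an assumption.
Path tableDir = CommonFSUtils.getTableDir(rootDir, TableName.META_TABLE_NAME);
if (fs.exists(tableDir)) {
  Path sidelined =
    new Path(rootDir, ".sidelined/meta-" + EnvironmentEdgeManager.currentTime());
  fs.mkdirs(sidelined.getParent());
  if (!fs.rename(tableDir, sidelined)) {
    LOG.warn("Could not sideline meta table dir {} to {}", tableDir, sidelined);
  }
}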
Contributor
Author
@joshelser the new JIRA is HBASE-24833 and the discussion is mainly on the new PR #2237. I may need to send an email to the dev@ list for a broader discussion about whether we should not depend on the data in ZooKeeper (which is what helps us avoid deleting the meta directory).
+      HConstants.DEFAULT_REMOVE_META_ON_RESTART);
+    // we use zookeeper data to tell if this is a partial created meta, if so we should delete
+    // and recreate the meta table.
+    if (removeMeta && fs.exists(tableDir) && !fs.delete(tableDir, true)) {
       LOG.warn("Can not delete partial created meta table, continue...");
     }
     // Bootstrapping, make sure blockcache is off. Else, one will be
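Taken together, the hunk gates the previously unconditional meta delete behind a new HConstants flag. A hedged usage sketch (the flag's string key lives in the new HConstants entry and is not visible in this hunk; the default presumably keeps the pre-patch delete behavior, which is how the test below uses it in its setup):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;

// Keep existing hbase:meta data when starting a cluster over old data;
// leaving the flag at DEFAULT_REMOVE_META_ON_RESTART retains the old
// behavior of deleting a partially created meta dir during bootstrap.
Configuration conf = HBaseConfiguration.create();
conf.setBoolean(HConstants.REMOVE_META_ON_RESTART, false);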
@@ -0,0 +1,226 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hadoop.hbase.master;

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import java.io.IOException;
import java.time.Duration;
import java.util.List;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.HBaseClassTestRule;
import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.MiniHBaseCluster;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RegionInfo;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.master.region.MasterRegionFactory;
import org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore;
import org.apache.hadoop.hbase.regionserver.HRegionServer;
import org.apache.hadoop.hbase.testclassification.LargeTests;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.CommonFSUtils;
import org.apache.hadoop.hbase.zookeeper.ZKUtil;

import org.junit.Before;
import org.junit.ClassRule;
import org.junit.Rule;
import org.junit.Test;
import org.junit.experimental.categories.Category;
import org.junit.rules.TestName;
/**
 * Test that store files in the data directory can be reused when the cluster fails over to a set
 * of new region servers with different hostnames, with or without the old WALs and ZooKeeper
 * znodes. Any HBase system table and user table should be assigned normally after the cluster
 * restart.
 */
@Category({ LargeTests.class })
public class TestRecreateCluster {
  @ClassRule
  public static final HBaseClassTestRule CLASS_RULE =
    HBaseClassTestRule.forClass(TestRecreateCluster.class);

  @Rule
  public TestName name = new TestName();

  private static final HBaseTestingUtility TEST_UTIL = new HBaseTestingUtility();
  private static final int NUM_RS = 3;
  private static final long TIMEOUT_MS = Duration.ofMinutes(2).toMillis();

  @Before
  public void setup() {
    TEST_UTIL.getConfiguration().setBoolean(HConstants.REMOVE_META_ON_RESTART, false);
  }

  @Test
  public void testRecreateCluster_UserTableDisabled() throws Exception {
    TEST_UTIL.startMiniCluster(NUM_RS);
    try {
      TableName tableName = TableName.valueOf("t1");
      prepareDataBeforeRecreate(TEST_UTIL, tableName);
      TEST_UTIL.getAdmin().disableTable(tableName);
      TEST_UTIL.waitTableDisabled(tableName.getName());
      restartHBaseCluster(true);
      TEST_UTIL.getAdmin().enableTable(tableName);
      validateDataAfterRecreate(TEST_UTIL, tableName);
    } finally {
      TEST_UTIL.shutdownMiniCluster();
    }
  }

  @Test
  public void testRecreateCluster_UserTableEnabled() throws Exception {
    validateRecreateClusterWithUserTableEnabled(true);
  }

  @Test
  public void testRecreateCluster_UserTableEnabled_WithoutCleanupWALsAndZNodes() throws Exception {
    TEST_UTIL.getConfiguration().setBoolean(HConstants.REMOVE_META_ON_RESTART,
      HConstants.DEFAULT_REMOVE_META_ON_RESTART);
    validateRecreateClusterWithUserTableEnabled(false);
  }

  private void validateRecreateClusterWithUserTableEnabled(boolean cleanupWALsAndZNodes)
      throws Exception {
    TEST_UTIL.startMiniCluster(NUM_RS);
    try {
      TableName tableName = TableName.valueOf("t1");
      prepareDataBeforeRecreate(TEST_UTIL, tableName);
      restartHBaseCluster(cleanupWALsAndZNodes);
      validateDataAfterRecreate(TEST_UTIL, tableName);
    } finally {
      TEST_UTIL.shutdownMiniCluster();
    }
  }

  private void restartHBaseCluster(boolean cleanUpWALsAndZNodes) throws Exception {
    // flush cache so that everything is on disk
    TEST_UTIL.getMiniHBaseCluster().flushcache();

    List<ServerName> oldServers =
      TEST_UTIL.getHBaseCluster().getMaster().getServerManager().getOnlineServersList();
    // make sure there are no procedures pending
    TEST_UTIL.waitFor(TIMEOUT_MS, () -> TEST_UTIL.getHBaseCluster().getMaster()
      .getProcedures().stream().allMatch(p -> p.isFinished()));
    // shutdown and delete data if needed
    Path walRootDirPath = TEST_UTIL.getMiniHBaseCluster().getMaster().getWALRootDir();
    Path rootDirPath = CommonFSUtils.getRootDir(TEST_UTIL.getConfiguration());
    TEST_UTIL.shutdownMiniHBaseCluster();

    if (cleanUpWALsAndZNodes) {
      TEST_UTIL.getDFSCluster().getFileSystem()
        .delete(new Path(rootDirPath, MasterRegionFactory.MASTER_STORE_DIR), true);
      TEST_UTIL.getDFSCluster().getFileSystem()
        .delete(new Path(walRootDirPath, MasterRegionFactory.MASTER_STORE_DIR), true);
      TEST_UTIL.getDFSCluster().getFileSystem()
        .delete(new Path(walRootDirPath, WALProcedureStore.MASTER_PROCEDURE_LOGDIR), true);

      TEST_UTIL.getDFSCluster().getFileSystem()
        .delete(new Path(walRootDirPath, HConstants.HREGION_LOGDIR_NAME), true);
      TEST_UTIL.getDFSCluster().getFileSystem()
        .delete(new Path(walRootDirPath, HConstants.HREGION_OLDLOGDIR_NAME), true);
      // delete all ZK data; we cannot keep it because it would record the meta region state as
      // open even though no InitMetaProcedure was submitted
      ZKUtil.deleteChildrenRecursively(TEST_UTIL.getZooKeeperWatcher(),
        TEST_UTIL.getZooKeeperWatcher().getZNodePaths().baseZNode);
      TEST_UTIL.shutdownMiniZKCluster();
      TEST_UTIL.startMiniZKCluster();
    }
    TEST_UTIL.restartHBaseCluster(NUM_RS);
    TEST_UTIL.waitFor(TIMEOUT_MS,
      () -> TEST_UTIL.getMiniHBaseCluster().getNumLiveRegionServers() == NUM_RS);

    // make sure we have a new set of region servers with different hostnames and ports
    List<ServerName> newServers =
      TEST_UTIL.getHBaseCluster().getMaster().getServerManager().getOnlineServersList();
    assertFalse(newServers.stream().anyMatch(oldServers::contains));
  }

  private void prepareDataBeforeRecreate(
      HBaseTestingUtility testUtil, TableName tableName) throws Exception {
    Table table = testUtil.createTable(tableName, "f");
    Put put = new Put(Bytes.toBytes("r1"));
    put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("c"), Bytes.toBytes("v"));
    table.put(put);

    ensureTableNotColocatedWithSystemTable(tableName, TableName.NAMESPACE_TABLE_NAME);
  }
  private void ensureTableNotColocatedWithSystemTable(TableName userTable, TableName systemTable)
      throws IOException, InterruptedException {
    MiniHBaseCluster hbaseCluster = TEST_UTIL.getHBaseCluster();
    assertTrue("Please start more than 1 regionserver",
      hbaseCluster.getRegionServerThreads().size() > 1);

    int userTableServerNum = getServerNumForTableWithOnlyOneRegion(userTable);
    int systemTableServerNum = getServerNumForTableWithOnlyOneRegion(systemTable);

    if (userTableServerNum != systemTableServerNum) {
      // no-op if the user table and the system table are already on different hosts
      return;
    }

    int destServerNum = (systemTableServerNum + 1) % NUM_RS;
    assertTrue(systemTableServerNum != destServerNum);

    HRegionServer systemTableServer = hbaseCluster.getRegionServer(systemTableServerNum);
    HRegionServer destServer = hbaseCluster.getRegionServer(destServerNum);
    assertTrue(!systemTableServer.equals(destServer));
    // make sure the dest server is live before moving the region
    hbaseCluster.waitForRegionServerToStart(destServer.getServerName().getHostname(),
      destServer.getServerName().getPort(), TIMEOUT_MS);
    // move the region of userTable to a region server not co-located with the system table
    TEST_UTIL.moveRegionAndWait(TEST_UTIL.getAdmin().getRegions(userTable).get(0),
      destServer.getServerName());
  }
  private int getServerNumForTableWithOnlyOneRegion(TableName tableName) throws IOException {
    List<RegionInfo> tableRegionInfos = TEST_UTIL.getAdmin().getRegions(tableName);
    assertEquals(1, tableRegionInfos.size());
    return TEST_UTIL.getHBaseCluster()
      .getServerWith(tableRegionInfos.get(0).getRegionName());
  }

  private void validateDataAfterRecreate(
      HBaseTestingUtility testUtil, TableName tableName) throws Exception {
    Table t1 = testUtil.getConnection().getTable(tableName);
    Get get = new Get(Bytes.toBytes("r1"));
    get.addColumn(Bytes.toBytes("f"), Bytes.toBytes("c"));
    Result result = t1.get(get);
    assertTrue(result.advance());
    Cell cell = result.current();
    assertEquals("v", Bytes.toString(cell.getValueArray(),
      cell.getValueOffset(), cell.getValueLength()));
    assertFalse(result.advance());
  }
}
Collapse these down into one method so we don't end up making 4 iterations over a list of (potentially) a lot of regions.
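The code this comment refers to is not shown in the hunks above, but the general shape of the suggestion is to fold several stream passes over the region list into a single traversal that tallies everything at once. A generic sketch of that idea (all names here are hypothetical, not from the PR):

import java.util.List;
import java.util.function.Function;

final class RegionStateCounts {
  int open, closed, transitioning, unknown;
}

// One pass instead of four: classify each region once and bump a counter,
// rather than re-walking the (potentially large) list once per state.
static <R> RegionStateCounts countRegionStates(List<R> regions, Function<R, String> stateOf) {
  RegionStateCounts counts = new RegionStateCounts();
  for (R region : regions) {
    switch (stateOf.apply(region)) {
      case "OPEN": counts.open++; break;
      case "CLOSED": counts.closed++; break;
      case "TRANSITIONING": counts.transitioning++; break;
      default: counts.unknown++; break;
    }
  }
  return counts;
}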