
Conversation

@taklwu
Contributor

@taklwu taklwu commented Jul 21, 2020

…ting a new cluster pointing at the same file system

HBase currently does not handle Unknown Servers automatically and requires
users to run hbck2 scheduleRecoveries when one sees unknown servers on
the HBase report UI.

This became a blocker for HBase 2 adoption, especially when a table wasn't
disabled before shutting down an HBase cluster on the cloud or in any dynamic
environment where hostnames may change frequently. Once the cluster restarts,
hbase:meta still holds the old hostnames/IPs from the previous cluster,
and those region servers become Unknown Servers that are never recycled.

Our fix is to trigger a repair immediately after the CatalogJanitor
detects any Unknown Servers, by submitting an HBCKServerCrashProcedure so
that regions on an Unknown Server can be reassigned to other online
servers.
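
A minimal sketch of the idea, with names assumed from the patch context rather than copied from the exact diff:

  // Sketch only: for each unknown server in the CatalogJanitor report, submit an
  // HBCKServerCrashProcedure so its regions get reassigned to online servers.
  private void submitHbckScpForUnknownServers(CatalogJanitor.Report lastReport) {
    lastReport.unknownServers.stream().forEach(pair -> {
      ServerName unknownServer = pair.getSecond();
      LOG.info("Submitting HBCKSCP for Unknown Region Server {}", unknownServer);
      // HBCKSCP can recover regions even when the master has no in-memory state
      // for the server, falling back to a scan of hbase:meta if needed.
      master.getMasterProcedureExecutor().submitProcedure(new HBCKServerCrashProcedure(
        master.getMasterProcedureExecutor().getEnvironment(), unknownServer,
        true /* shouldSplitWal */, false /* carryingMeta */));
    });
  }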

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 2m 20s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ branch-2 Compile Tests _
+1 💚 mvninstall 4m 12s branch-2 passed
+1 💚 checkstyle 1m 18s branch-2 passed
+1 💚 spotbugs 2m 10s branch-2 passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 38s the patch passed
+1 💚 checkstyle 1m 19s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 12m 41s Patch does not cause any errors with Hadoop 3.1.2 3.2.1.
+1 💚 spotbugs 2m 18s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 13s The patch does not generate ASF License warnings.
38m 3s
Subsystem Report/Notes
Docker Client=19.03.12 Server=19.03.12 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #2113
JIRA Issue HBASE-24286
Optional Tests dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle
uname Linux ab2dba034cd0 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / 5bb76bf
Max. process+thread count 84 (vs. ulimit of 12500)
modules C: hbase-server U: hbase-server
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/1/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

// mismatches with hbase:meta. in fact if HBCKSCP finds any in-memory region states,
// HBCKSCP is basically same as SCP.
lastReport.unknownServers.stream().forEach(regionInfoServerNamePair -> {
LOG.warn("Submitting HBCKSCP for Unknown Region Server {}",
Contributor

Should this be at INFO level? If it is at WARN, what can an operator do to prevent future warnings? If this is something that is going to happen as a matter of course, then INFO is more appropriate.

Contributor

+1, this is expected, the log is just for info purposes

Contributor Author

good point, I will change it.

@@ -0,0 +1,185 @@
/**
Contributor

nit: this should be a non-javadoc comment

Contributor Author

Fixed. Sorry, I copied this from other files, and I found many places in HBase have this header lol

TEST_UTIL.shutdownMiniHBaseCluster();
TEST_UTIL.getDFSCluster().getFileSystem().delete(
new Path(walRootPath.toString(), HConstants.HREGION_LOGDIR_NAME), true);
TEST_UTIL.getDFSCluster().getFileSystem().delete(
Contributor

Should we have some check before this that the cluster shut down cleanly and thus has no procedures pending?

If we are expressly trying to test that we can come up safely after having the master procedure WAL destroyed while things are in flight, then we should make that a test case.

Contributor Author

Refactored to wait until the procedures complete before deleting the WALs and master procedure WALs.

@joshelser
Member

I'm not sure if we want to discuss here or on #2114, but, copying from #2114 (review),

While I think the CatalogJanitor approach is probably an effective solution, I wonder if there's a "faster" solution we could do.

The main question is, when we don't have ZooKeeper telling us that a RegionServer has died, how can we be certain that a RegionServer won't "come back"? If we get into a situation where data was still hosted on a RegionServer we thought was dead, we would double-assign the region and that'd be a big bug.

Any thoughts on how to try to minimize the chance of us incorrectly marking a RegionServer as dead?

@z-york
Contributor

z-york commented Jul 21, 2020

I think the code is the same between the two, so let's discuss here since it seems all the info is here (at the moment).

I'm not sure if we want to discuss here or on #2114, but, copying from #2114 (review),

While I think the CatalogJanitor approach is probably an effective solution, I wonder if there's a "faster" solution we could do.

The main question is, when we don't have ZooKeeper telling us that a RegionServer has died, how can we be certain that a RegionServer won't "come back"? If we get into a situation where data was still hosted on a RegionServer we thought was dead, we would double-assign the region and that'd be a big bug.

Any thoughts on how to try to minimize the chance of us incorrectly marking a RegionServer as dead?

In what cases can a RS be marked as "unknown"? If we think this is a transient state, we can always add a ttl before reassigning (but that will add considerable recovery time).

@z-york
Contributor

z-york commented Jul 21, 2020

The main question is, when we don't have ZooKeeper telling us that a RegionServer has died, how can we be certain that a RegionServer won't "come back"? If we get into a situation where data was still hosted on a RegionServer we thought was dead, we would double-assign the region and that'd be a big bug.

If hbase:meta points to the correct RS, theoretically no writes should come to this RS, right? In that case, the effects shouldn't be too adverse... if that RS crashes, the region won't be reassigned because the HMaster/ZK didn't know about it.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 37s Docker mode activated.
-0 ⚠️ yetus 0m 6s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+1 💚 mvninstall 4m 15s branch-2 passed
+1 💚 compile 1m 4s branch-2 passed
+1 💚 shadedjars 5m 50s branch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 44s hbase-server in branch-2 failed.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 56s the patch passed
+1 💚 compile 1m 2s the patch passed
+1 💚 javac 1m 2s the patch passed
+1 💚 shadedjars 5m 46s patch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 41s hbase-server in the patch failed.
_ Other Tests _
-1 ❌ unit 127m 57s hbase-server in the patch failed.
154m 12s
Subsystem Report/Notes
Docker Client=19.03.12 Server=19.03.12 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #2113
JIRA Issue HBASE-24286
Optional Tests javac javadoc unit shadedjars compile
uname Linux 81fa51548e0e 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / 5bb76bf
Default Java 2020-01-14
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/1/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-server.txt
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/1/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-server.txt
unit https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/1/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/1/testReport/
Max. process+thread count 3845 (vs. ulimit of 12500)
modules C: hbase-server U: hbase-server
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/1/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f)
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@taklwu
Contributor Author

taklwu commented Jul 21, 2020

I need to double-check the failing unit tests; I will update here once I have the correct patch (something is hanging the region restarts).

Any thoughts on how to try to minimize the chance of us incorrectly marking a RegionServer as dead?

Correct me if I'm wrong: if an unknown server joins an HBase cluster, since it's neither online nor dead, the assignment manager should not consider its regions as online and serving requests. Or is this not true?

Basically, DEAD and UNKNOWN_SERVER are different states in branch-2 and the master branch: in ServerManager we only track onlineServers and deadServers, and I didn't find any transition by which an UNKNOWN_SERVER could be moved to DEAD or ONLINE.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 13s Docker mode activated.
-0 ⚠️ yetus 0m 7s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+1 💚 mvninstall 3m 55s branch-2 passed
+1 💚 compile 0m 59s branch-2 passed
+1 💚 shadedjars 5m 26s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 38s branch-2 passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 34s the patch passed
+1 💚 compile 0m 58s the patch passed
+1 💚 javac 0m 58s the patch passed
+1 💚 shadedjars 5m 29s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 36s the patch passed
_ Other Tests _
-1 ❌ unit 204m 11s hbase-server in the patch failed.
228m 56s
Subsystem Report/Notes
Docker Client=19.03.12 Server=19.03.12 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/1/artifact/yetus-jdk8-hadoop2-check/output/Dockerfile
GITHUB PR #2113
JIRA Issue HBASE-24286
Optional Tests javac javadoc unit shadedjars compile
uname Linux e2aad40cef88 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / 5bb76bf
Default Java 1.8.0_232
unit https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/1/artifact/yetus-jdk8-hadoop2-check/output/patch-unit-hbase-server.txt
Test Results https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/1/testReport/
Max. process+thread count 2869 (vs. ulimit of 12500)
modules C: hbase-server U: hbase-server
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/1/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f)
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@joshelser
Member

If hbase:meta points to the correct RS, theoretically, no writes should come to this RS, right?

No. A client which spans the lifetime of your cluster (prior to shutdown, destruction, and recreation) could potentially have a cached region location. This is why fencing (e.g. HDFS lease recovery) is super-important for us. A client could presumably continue to try to write to a RS who has gone haywire.

@z-york
Contributor

z-york commented Jul 21, 2020

If hbase:meta points to the correct RS, theoretically, no writes should come to this RS, right?

No. A client which spans the lifetime of your cluster (prior to shutdown, destruction, and recreation) could potentially have a cached region location. This is why fencing (e.g. HDFS lease recovery) is super-important for us. A client could presumably continue to try to write to a RS who has gone haywire.

Doesn't a SCP trigger log splitting (and therefore recoverLease) which would handle this case?

@joshelser
Member

Correct me if I'm wrong: if an unknown server joins an HBase cluster, since it's neither online nor dead, the assignment manager should not consider its regions as online and serving requests. Or is this not true?

So, I feel like you're asking a different question than what I was concerned about (I was concerned about making sure all "old" RegionServers are actually down before we reassign regions onto new servers). This worries me because we rely on SCP's which we are acknowledging are gone in this scenario. Do we just have to make an external "requirement" that the system stopping old hardware ensures all previous RegionServers are fully dead before proceeding with creating new ones that point at the same data?

To the question you asked, what is the definition of an "unknown" server in your case: a ServerName listed in meta as a region's assigned location which is not in the AssignmentManager's set of live RegionServers? If that's the case, yes, that's how I understand AM to work today -- the presence of an "unknown" server as an assignment indicates a failure in the system. That is, we lost a MasterProcWAL which had an SCP for a RS. I think that's why this is a "fix by hand" kind of scenario today.

Basically, DEAD and UNKNOWN_SERVER are different in branch-2 and master branch, which in ServerManager we only track onlineServers and deadServer and I didn't find any transition that a UNKNOWN_SERVER could be moved to DEAD or ONLINE.

Yup, that's the crux of it.

This has been a nagging problem in the back of my mind that turns my stomach. This is what I think the situation is (to try to come up with some common terminology):

  1. hbase:meta has assigned regions to a set of RegionServers rs1
  2. All hosts of rs1 are shutdown and destroyed (i.e. meta still contains references to them)
  3. A new set of RegionServers are created, rs2, which have completely unique hostnames to rs1
  4. All MasterProcWALs from the cluster with rs1 are lost.

For HBase's consistency/safety, before we start any RS in rs2, we want to make sure all RS in rs1 are completely "down". That is, no RS in rs1 can be allowed to accept any more writes. What I'm wondering is, can we do something within HBase (without relying on whomever is controlling that infrastructure) to allow RS in rs2 to start coming up? When can we be sure that an UNKNOWN_SERVER is actually dead? By definition, we don't know the state of it.

@joshelser
Member

Doesn't a SCP trigger log splitting (and therefore recoverLease) which would handle this case?

That's my point. We don't have the SCP because the proc wals were deleted. We normally do an SCP when we receive the RS ephemeral node deletion in ZK. Since we don't have either of these, we just have to be super sure that it's actually safe to submit that SCP. I agree with you that if we did submit an SCP, the system should recover.

This makes me wonder... do we have any analogous situations in a "normal" cluster (with hardware)? For example...

  1. I have a healthy cluster (1 master, many RS)
  2. I stop the master
  3. I kill one RS
    3a. I do not restart that RS
  4. I restart the master

Do we submit an SCP for that RS today? Or, only when the new instance of that RS is started? I think this is a comparable situation -- maybe there's something I've not considered that we can still pull "state" from (e.g. we store something in the proc wals)

@z-york
Contributor

z-york commented Jul 21, 2020

Ah, ignore my comment... I see that we aren't really running a SCP for those cases, we're just cleaning them up from meta?

@z-york
Contributor

z-york commented Jul 21, 2020

Also from my investigation, I found that this comment should be updated if we handle this in CatalogJanitor: https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/HBCKServerCrashProcedure.java#L53

@z-york
Contributor

z-york commented Jul 21, 2020

Doesn't a SCP trigger log splitting (and therefore recoverLease) which would handle this case?

That's my point. We don't have the SCP because the proc wals were deleted. We normally do an SCP when we receive the RS ephemeral node deletion in ZK. Since we don't have either of these, we just have to be super sure that it's actually safe to submit that SCP. I agree with you that if we did submit an SCP, the system should recover.

This makes me wonder... do we have any analogous situations in a "normal" cluster (with hardware)? For example...

1. I have a healthy cluster (1 master, many RS)

2. I stop the master

3. I kill one RS
   3a. I do not restart that RS

4. I restart the master

Do we submit an SCP for that RS today? Or, only when the new instance of that RS is started? I think this is a comparable situation -- maybe there's something I've not considered that we can still pull "state" from (e.g. we store something in the proc wals)

Wouldn't there still be a znode in this case? That would probably trigger a SCP. Maybe you would get that situation if you added a 3b. delete zNode/clear out ZK

@joshelser
Member

Wouldn't there still be a znode in this case? That would probably trigger a SCP. Maybe you would get that situation if you added a 3b. delete zNode/clear out ZK

Not sure, honestly :). I know the znodes that the Master watches for cluster membership are ephemeral (not persistent). However, maybe there is something else. I'd have to look. I can try to do this tmrw if you or Stephen don't trace through :)

@taklwu
Contributor Author

taklwu commented Jul 22, 2020

Thanks Josh, and honestly I didn't know the logic until now. Here are my findings for both situations you're concerned about:

first case

1. hbase:meta has assigned regions to a set of RegionServers rs1
2. All hosts of rs1 are shutdown and destroyed (i.e. meta still contains references to them)
3. A new set of RegionServers are created, rs2, which have completely unique hostnames to rs1
4. All MasterProcWALs from the cluster with rs1 are lost.

second case

1. I have a healthy cluster (1 master, many RS)
2. I stop the master
3. I kill one RS
   3a. I do not restart that RS
4. I restart the master

There are three key parts in the normal system for handling a region server that has gone away: the MasterProcWALs/MasterRegion, where SCPs for DEAD servers are tracked; the region server names present in the WAL directory, which identify possibly 'live' servers; and the region server znodes in ZooKeeper.

If the MasterProcWALs/MasterRegion still exist after a cluster restart, then when RegionServerTracker starts it figures out all online servers; for the possibly live servers that have no znode (with the same hostname after the restart?), it marks them dead and schedules SCPs for them, and it continues the SCPs for the servers that were already dead. That is the normal case.

2020-07-22 09:55:24,729 INFO  [master/localhost:0:becomeActiveMaster] master.RegionServerTracker(123): Starting RegionServerTracker; 0 have existing ServerCrashProcedures, 3 possibly 'live' servers, and 0 'splitting'.
2020-07-22 09:55:24,730 DEBUG [master/localhost:0:becomeActiveMaster] zookeeper.RecoverableZooKeeper(183): Node /hbase/draining/localhost,55572,1595436917066 already deleted, retry=false
2020-07-22 09:55:24,730 INFO  [master/localhost:0:becomeActiveMaster] master.ServerManager(585): Processing expiration of localhost,55572,1595436917066 on localhost,55667,1595436924374
2020-07-22 09:55:24,755 DEBUG [master/localhost:0:becomeActiveMaster] procedure2.ProcedureExecutor(1050): Stored pid=12, state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure localhost,55572,1595436917066, splitWal=true, meta=true

Then, in the case where the MasterProcWALs (or MasterRegion in branch-2.3+) are deleted but the ZK nodes are kept, even though there are no procedures to restore from the MasterProcWALs, we can still schedule an SCP for a previous host as long as its WAL directory is there. But if both the MasterProcWALs and the WALs are deleted, neither the first nor the second case operates normally.
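
Roughly, the detection described above amounts to the following sketch (hypothetical method, not the real RegionServerTracker code):

  // Sketch only: servers that left a WAL directory behind but have no live znode
  // are expired, which schedules a ServerCrashProcedure for each of them.
  void expireServersWithWalButNoZnode(Set<ServerName> serversWithWalDirs,
      Set<ServerName> liveServersFromZk, ServerManager serverManager) {
    for (ServerName sn : serversWithWalDirs) {
      if (!liveServersFromZk.contains(sn)) {
        serverManager.expireServer(sn); // schedules an SCP for the dead server
      }
    }
    // If both the WAL directories and the MasterProcWALs/MasterRegion are gone,
    // this never fires and the old servers stay as "Unknown Server" in hbase:meta.
  }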

The case we were originally trying to solve falls into the situation where the MasterProcWALs and the WALs are deleted after the cluster restarts: we have no WALs, no MasterProcWALs/MasterRegion, and no ZooKeeper data, only HFiles. Those servers then end up as unknown, and their regions cannot be reassigned.


About the unit test failure: now I'm hitting a strange issue. My tests work fine if I delete the WALs, the MasterProcWALs, and the ZK baseZNode on branch-2.2. However, with the same setup, branch-2.3+ and master hang during master initialization if the ZK baseZNode is deleted, with or without my changes. (What has changed in branch-2.3? I found MasterRegion, but I'm not sure why that would be related to ZK data; is it a bug?)

Interestingly, my fix works if I keep the baseZNode, so I'm trying to figure out the right way to clean up ZooKeeper so that it matches the cloud use case where the WALs on HDFS and the ZK data are also deleted when the HBase cluster is terminated.

@taklwu taklwu force-pushed the HBASE-24286-branch-2 branch 2 times, most recently from 6f23800 to 3245587 on July 22, 2020 21:51
@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 4m 3s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ branch-2 Compile Tests _
+0 🆗 mvndep 0m 16s Maven dependency ordering for branch
+1 💚 mvninstall 3m 32s branch-2 passed
+1 💚 checkstyle 1m 31s branch-2 passed
+1 💚 spotbugs 2m 40s branch-2 passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 14s Maven dependency ordering for patch
+1 💚 mvninstall 3m 14s the patch passed
+1 💚 checkstyle 1m 29s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 11m 32s Patch does not cause any errors with Hadoop 3.1.2 3.2.1.
+1 💚 spotbugs 2m 55s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 26s The patch does not generate ASF License warnings.
39m 19s
Subsystem Report/Notes
Docker Client=19.03.12 Server=19.03.12 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/2/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #2113
JIRA Issue HBASE-24286
Optional Tests dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle
uname Linux 95f531c7f9df 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / ce4e692
Max. process+thread count 94 (vs. ulimit of 12500)
modules C: hbase-common hbase-server U: .
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/2/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@taklwu
Contributor Author

taklwu commented Jul 22, 2020

While waiting for the unit test runs, I want to bring up two extra topics that may be followed up in new JIRA(s):

  1. We reverted a change from HBASE-24471 that always deletes an existing meta table if we're restarting on a fresh cluster with no WALs and no ZK data. I wonder if @Apache9 added this meta table removal for a special requirement on branch-2.3+; it is the major behavior change between branch-2.2 (which didn't delete meta if it exists) and branch-2.3+. Should we add a feature flag to enable this meta directory removal? IMO, migrating from a cluster with an existing meta table and other tables may fail and need HBCK to repair region states (pending completion of the unit test suite to prove our change is safe).

  2. With or without this PR, I found a potential "master cannot initialize" issue that could be a bug in a dynamic-hostname environment. If we only keep the ZK data and have no WALs, the recorded location of the meta table has the old hostname, and the master hangs waiting for the meta region to come online on that old host. It cannot come online, because an InitMetaProcedure cannot be submitted: the meta region is considered OPEN and is blocked by the condition if (rs != null && rs.isOffline()). Normally, if WALs exist, the missing server is expired and the meta region comes online after the SCP has handled that dead server. Is this behavior expected? Do you think we should support this corner case? (A rough sketch of the check involved follows the log below.)

### for case 2.
2020-07-22 13:16:05,802 INFO  [master/localhost:0:becomeActiveMaster] master.HMaster(1020): hbase:meta {1588230740 state=OPEN, ts=1595448965762, server=localhost,54945,1595448957980}
...
2020-07-22 15:04:33,802 WARN  [master/localhost:0:becomeActiveMaster] master.HMaster(1230): hbase:meta,,1.1588230740 is NOT online; state={1588230740 state=OPEN, ts=1595455438210, server=localhost,62506,1595455430742}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined.
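
For reference, a paraphrased sketch of the check involved (shape assumed from the startup path and the quoted condition, not copied verbatim):

  // Paraphrased: an InitMetaProcedure is only submitted when the recorded meta
  // region state is offline; here meta is recorded as OPEN on a host that no
  // longer exists, so the master waits in a "holding-pattern" instead.
  boolean maybeSubmitInitMeta(AssignmentManager assignmentManager) {
    RegionState rs = assignmentManager.getRegionStates()
        .getRegionState(RegionInfoBuilder.FIRST_META_REGIONINFO);
    if (rs != null && rs.isOffline()) {
      // submit InitMetaProcedure ...
      return true;
    }
    // Otherwise the master logs the warning above and waits for the meta region
    // to be onlined, which never happens without an SCP for the old server.
    return false;
  }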

@taklwu taklwu requested a review from Apache9 July 22, 2020 22:33
@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 29s Docker mode activated.
-0 ⚠️ yetus 0m 7s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+0 🆗 mvndep 0m 16s Maven dependency ordering for branch
+1 💚 mvninstall 4m 13s branch-2 passed
+1 💚 compile 1m 29s branch-2 passed
+1 💚 shadedjars 5m 48s branch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 19s hbase-common in branch-2 failed.
-0 ⚠️ javadoc 0m 42s hbase-server in branch-2 failed.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 21s Maven dependency ordering for patch
+1 💚 mvninstall 3m 53s the patch passed
+1 💚 compile 1m 28s the patch passed
+1 💚 javac 1m 28s the patch passed
+1 💚 shadedjars 5m 44s patch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 18s hbase-common in the patch failed.
-0 ⚠️ javadoc 0m 37s hbase-server in the patch failed.
_ Other Tests _
+1 💚 unit 1m 35s hbase-common in the patch passed.
-1 ❌ unit 125m 46s hbase-server in the patch failed.
156m 34s
Subsystem Report/Notes
Docker Client=19.03.12 Server=19.03.12 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #2113
JIRA Issue HBASE-24286
Optional Tests javac javadoc unit shadedjars compile
uname Linux 45f3be02e791 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / ce4e692
Default Java 2020-01-14
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/2/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-common.txt
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/2/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-server.txt
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/2/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-common.txt
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/2/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-server.txt
unit https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/2/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/2/testReport/
Max. process+thread count 4356 (vs. ulimit of 12500)
modules C: hbase-common hbase-server U: .
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/2/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f)
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@taklwu taklwu force-pushed the HBASE-24286-branch-2 branch from 3245587 to b34fe54 on July 23, 2020 01:26
@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 10m 24s Docker mode activated.
-0 ⚠️ yetus 0m 7s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+0 🆗 mvndep 0m 39s Maven dependency ordering for branch
+1 💚 mvninstall 5m 7s branch-2 passed
+1 💚 compile 1m 46s branch-2 passed
+1 💚 shadedjars 6m 34s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 1m 15s branch-2 passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 16s Maven dependency ordering for patch
+1 💚 mvninstall 4m 31s the patch passed
+1 💚 compile 1m 42s the patch passed
+1 💚 javac 1m 42s the patch passed
+1 💚 shadedjars 6m 57s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 1m 11s the patch passed
_ Other Tests _
+1 💚 unit 1m 53s hbase-common in the patch passed.
-1 ❌ unit 204m 40s hbase-server in the patch failed.
249m 12s
Subsystem Report/Notes
Docker Client=19.03.12 Server=19.03.12 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/2/artifact/yetus-jdk8-hadoop2-check/output/Dockerfile
GITHUB PR #2113
JIRA Issue HBASE-24286
Optional Tests javac javadoc unit shadedjars compile
uname Linux 43e4b1a50653 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / ce4e692
Default Java 1.8.0_232
unit https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/2/artifact/yetus-jdk8-hadoop2-check/output/patch-unit-hbase-server.txt
Test Results https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/2/testReport/
Max. process+thread count 2829 (vs. ulimit of 12500)
modules C: hbase-common hbase-server U: .
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/2/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f)
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@Apache9
Contributor

Apache9 commented Jul 23, 2020

I haven't looked at the patch yet, but what I want to say is that it is not a good idea to use HBCK machinery in automatic failure recovery. The design of HBCK is that the operations are dangerous, so only operators can perform them, manually, and the operators need to know the risk.

I will report back after reviewing the patch; I also need to learn what problem we want to solve.

Thanks.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 34s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ branch-2 Compile Tests _
+0 🆗 mvndep 0m 25s Maven dependency ordering for branch
+1 💚 mvninstall 3m 14s branch-2 passed
+1 💚 checkstyle 1m 30s branch-2 passed
+1 💚 spotbugs 2m 35s branch-2 passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 13s Maven dependency ordering for patch
+1 💚 mvninstall 3m 7s the patch passed
+1 💚 checkstyle 1m 29s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 11m 18s Patch does not cause any errors with Hadoop 3.1.2 3.2.1.
+1 💚 spotbugs 2m 59s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 25s The patch does not generate ASF License warnings.
35m 19s
Subsystem Report/Notes
Docker Client=19.03.12 Server=19.03.12 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/3/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #2113
JIRA Issue HBASE-24286
Optional Tests dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle
uname Linux acadf2fab521 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / ce4e692
Max. process+thread count 94 (vs. ulimit of 12500)
modules C: hbase-common hbase-server U: .
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/3/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@taklwu
Contributor Author

taklwu commented Jul 23, 2020

Thanks @Apache9, we can switch to using the standard SCP instead; for this case running an SCP should be the same. The only difference is that HBCKSCP may rescan the meta table (slower) if it cannot find any in-memory region states, and IMO that rescan would always be skipped here because the unknown servers are normally brought in by loadMeta().
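
A sketch of what that swap could look like, assuming the same CatalogJanitor context as the earlier snippet (lastReport, master, and LOG come from the surrounding code):

  // Expire the unknown server through ServerManager, which schedules a standard
  // ServerCrashProcedure instead of constructing an HBCKServerCrashProcedure.
  lastReport.unknownServers.stream().forEach(pair -> {
    ServerName unknownServer = pair.getSecond();
    LOG.info("Scheduling SCP for unknown server {}", unknownServer);
    master.getServerManager().expireServer(unknownServer);
  });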

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 36s Docker mode activated.
-0 ⚠️ yetus 0m 7s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+0 🆗 mvndep 0m 16s Maven dependency ordering for branch
+1 💚 mvninstall 3m 36s branch-2 passed
+1 💚 compile 1m 18s branch-2 passed
+1 💚 shadedjars 5m 7s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 1m 0s branch-2 passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 16s Maven dependency ordering for patch
+1 💚 mvninstall 3m 14s the patch passed
+1 💚 compile 1m 17s the patch passed
+1 💚 javac 1m 17s the patch passed
+1 💚 shadedjars 4m 59s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 56s the patch passed
_ Other Tests _
+1 💚 unit 1m 23s hbase-common in the patch passed.
+1 💚 unit 138m 25s hbase-server in the patch passed.
164m 54s
Subsystem Report/Notes
Docker Client=19.03.12 Server=19.03.12 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/3/artifact/yetus-jdk8-hadoop2-check/output/Dockerfile
GITHUB PR #2113
JIRA Issue HBASE-24286
Optional Tests javac javadoc unit shadedjars compile
uname Linux fa3cc9c7ad94 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / ce4e692
Default Java 1.8.0_232
Test Results https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/3/testReport/
Max. process+thread count 4227 (vs. ulimit of 12500)
modules C: hbase-common hbase-server U: .
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/3/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f)
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 12s Docker mode activated.
-0 ⚠️ yetus 0m 6s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+0 🆗 mvndep 0m 15s Maven dependency ordering for branch
+1 💚 mvninstall 4m 37s branch-2 passed
+1 💚 compile 1m 35s branch-2 passed
+1 💚 shadedjars 6m 25s branch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 18s hbase-common in branch-2 failed.
-0 ⚠️ javadoc 0m 41s hbase-server in branch-2 failed.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 19s Maven dependency ordering for patch
+1 💚 mvninstall 4m 25s the patch passed
+1 💚 compile 1m 35s the patch passed
+1 💚 javac 1m 35s the patch passed
+1 💚 shadedjars 6m 23s patch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 16s hbase-common in the patch failed.
-0 ⚠️ javadoc 0m 41s hbase-server in the patch failed.
_ Other Tests _
+1 💚 unit 1m 57s hbase-common in the patch passed.
+1 💚 unit 205m 8s hbase-server in the patch passed.
237m 54s
Subsystem Report/Notes
Docker Client=19.03.12 Server=19.03.12 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/3/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #2113
JIRA Issue HBASE-24286
Optional Tests javac javadoc unit shadedjars compile
uname Linux 72870c2f59a6 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / ce4e692
Default Java 2020-01-14
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/3/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-common.txt
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/3/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-server.txt
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/3/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-common.txt
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/3/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-server.txt
Test Results https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/3/testReport/
Max. process+thread count 2618 (vs. ulimit of 12500)
modules C: hbase-common hbase-server U: .
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/3/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f)
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@Apache9
Contributor

Apache9 commented Jul 23, 2020

You're right that the new set of region servers starts in a new environment, so they are identified as 'Unknown server'.

I think the old servers will be the unknown servers, not the new servers?

I do not think there is a guarantee that if you change HBase's internal filesystem layout, the cluster will still be functional. Even if it sometimes works, as you said it does on 1.4, that does not mean we will always keep this behavior in new versions.

For your scenario it is OK, as you can confirm that there is no data loss because you manually flushed all the data. But what if another user just configures the wrong WAL directory? In that case, if we schedule SCPs automatically, there will be data loss.

In general, in HBase we only know that something strange happened, but we do not know how we got into this situation, so it is not safe for us to just recover automatically. Only the end user knows, so I still prefer that we just provide tools for them to recover, rather than add some 'unsafe' options in HBase...

Thanks.

@anoopsjohn
Contributor

anoopsjohn commented Jul 23, 2020

Typically, you can just disable all the tables, restart the whole cluster on a new environment, and then enable all the tables.

Agree with Duo. That might be a better path for the use case where the cluster is deleted (after persisting all data) and later re-created pointing at the existing FS.

In general, in HBase, we only know that something strange happen, but we do not know how we come into this situation so it is not safe for us to just recover automatically

Ya, I also agree with this point.
In 2.x we store the table's disable/enable state in META, so we will have that info available in the newly recreated cluster as well. (The ZK data from the old cluster also won't be there in the new cluster, I believe.)

@z-york
Contributor

z-york commented Jul 23, 2020

So the use case here is starting a new cluster in the cloud where the HDFS (WAL) data of the previous cluster will not be available. One of the benefits of storing the data off the cluster (in our case, S3) is not having to replicate data (you just create a new cluster pointed at the same root directory). IMO, in this case we shouldn't need the WAL directories to exist just to tell us to reassign, and this is a valid use case.

I get that there is pushback against enabling this in the catalog cleaner, and I think that's fine. For this case it's a one-time issue, not something that periodically needs fixing (there might be other unknown-server cases that would require that, but those aren't blocking us at the moment). So maybe a one-time run to clean up old servers and schedule SCPs for them (which is what the code removed in HBASE-20708 actually did) makes the most sense? I understand that it was removed to simplify assignment, but the behavior is very different. In fact, it looks like we don't even try to read hbase:meta if it is found (without SCP/WAL) and simply delete the directory [1]. What problem is being solved by deleting it instead of trying to assign it if the data is there?

[1]

private static void writeFsLayout(Path rootDir, Configuration conf) throws IOException {
  LOG.info("BOOTSTRAP: creating hbase:meta region");
  FileSystem fs = rootDir.getFileSystem(conf);
  Path tableDir = CommonFSUtils.getTableDir(rootDir, TableName.META_TABLE_NAME);
  if (fs.exists(tableDir) && !fs.delete(tableDir, true)) {
    LOG.warn("Can not delete partial created meta table, continue...");
  }

For your scenario, it is OK as you can confirm that there is no data loss as you manually flushed all the data. But what if another user just configs the wrong WAL directory? In this case, if we schedule SCPs automatically, there will be data loss.

In this scenario, regardless of what we do, there will be data loss unless the correct WAL directory is specified (again). In fact, I don't believe you can change the WAL dir without restarting servers (and I don't think it works with a rolling restart either). I don't think this is a valid scenario for this issue.

@Apache9
Contributor

Apache9 commented Jul 24, 2020

In this scenario, regardless of what we do, there will be data loss unless the correct WAL directory is specified (again). In fact, I don't believe you can change the WAL dir without restarting servers (and I don't think it works with a rolling restart either). I don't think this is a valid scenario for this issue.

No. If we schedule the SCPs automatically, the data loss will be silent; otherwise the user will notice that no region is online and that the cluster is not in a good state, just the same as what you described here.

And again, this is not a normal operation in HBase; we do not expect that the WAL directories can be removed without an SCP. I wonder why our SCP can even pass without a WAL directory; we should hang there, I suppose. Only HBCKSCP can do the dangerous operation.

@Apache9
Contributor

Apache9 commented Jul 24, 2020

In general, if you touch the internals of HBase directly, it may lead to data loss, unexpected behavior, etc.

As I said above, the current design is to compare the WAL directories and the region server znodes on ZooKeeper to detect dead region servers when the master starts up. If you just remove the WAL directories, HBase will behave unexpectedly. Any addendum to solve the problem here should be considered a dangerous operation, which should only live in HBCK.

If you want to solve the problem automatically, you should find another way to detect dead region servers when the master starts up, so that we do not rely on the WAL directories. But I'm still a bit nervous about what an SCP should do when it notices there is no WAL directory for a dead region server. It is not the expected behavior in HBase...

@taklwu
Contributor Author

taklwu commented Jul 24, 2020

Thanks everyone. Let me try adding unknown servers that have neither a WAL directory nor a ZK node to the dead-server list, but only at HMaster startup, so such dead servers can be handled as usual. That limits the scope to master initialization time (although it should still be an SCP, if I understand correctly).
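
One possible shape of that proposal, only as a sketch with hypothetical inputs (reusing the set names from the earlier sketch):

  // At HMaster startup only: a server referenced by hbase:meta that has neither
  // a WAL directory nor a region-server znode, and is not online, is expired
  // like any other dead server.
  void expireMetaReferencedServersWithoutWalOrZnode(Set<ServerName> serversInMeta,
      Set<ServerName> serversWithWalDirs, Set<ServerName> liveServersFromZk,
      ServerManager serverManager) {
    for (ServerName sn : serversInMeta) {
      if (!serversWithWalDirs.contains(sn) && !liveServersFromZk.contains(sn)
          && !serverManager.isServerOnline(sn)) {
        serverManager.expireServer(sn); // handled as a usual dead server via SCP
      }
    }
  }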

@taklwu taklwu force-pushed the HBASE-24286-branch-2 branch 2 times, most recently from 95a6bac to 69a0be0 on July 25, 2020 06:11
@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 3m 59s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ branch-2 Compile Tests _
+1 💚 mvninstall 3m 41s branch-2 passed
+1 💚 checkstyle 1m 12s branch-2 passed
+1 💚 spotbugs 1m 57s branch-2 passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 14s the patch passed
+1 💚 checkstyle 1m 8s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 11m 28s Patch does not cause any errors with Hadoop 3.1.2 3.2.1.
+1 💚 spotbugs 2m 8s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 15s The patch does not generate ASF License warnings.
36m 24s
Subsystem Report/Notes
Docker Client=19.03.12 Server=19.03.12 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/4/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #2113
JIRA Issue HBASE-24286
Optional Tests dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle
uname Linux 3e0e99f07e9e 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / 6cb51cc
Max. process+thread count 94 (vs. ulimit of 12500)
modules C: hbase-server U: hbase-server
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/4/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

Path tableDir = CommonFSUtils.getTableDir(rootDir, TableName.META_TABLE_NAME);
if (fs.exists(tableDir) && !fs.delete(tableDir, true)) {
LOG.warn("Can not delete partial created meta table, continue...");
if (fs.exists(tableDir)) {
Contributor

This is a different issue from the unknown-RS issue.
In the cloud cluster-recreate case, the ZK data is not there, so there is no META region location. For the HM, this looks like a cluster bootstrap, so as part of it the master initializes meta and its FS layout.
HBASE-24471 added this extra code that deletes an existing META table FS path; it treats the directory as a partially created meta table left behind by some previous failed bootstrap. cc @Apache9
So this issue is caused by the ZK data not being in the new cluster, while the unknown-server issue is caused by the loss of the WALs (to be precise, the master proc WALs).

Contributor Author

As Zack and I highlighted in the comments above, we found the change came from HBASE-24471, and the behavior is a bit different from before (I agree that change solves the meta startup/bootstrap problem).

Let me send another update and add a feature flag to turn off deleting the partial meta (the default being to delete it). Then we can turn it off for the cloud use case where the ZK data has been deleted, until we come up with a better way to tell what is partial. (Sorry, I don't have a good way to validate whether meta is partial from a previous bootstrap; any suggestions?)

Contributor

@anoopsjohn anoopsjohn Jul 26, 2020

I agree that in the cloud cluster-recreate case this META FS delete is a bigger concern. For the other case of unknown servers, we can at least make sure the tables are disabled; but here I fear no such workaround is even possible. Thanks for pointing this out.
Now, taking a step back: all these decision points in HM startup regarding bootstrapping or AM work are based on the assumption that HBase is a continuously running cluster, i.e. the cluster comes up first and then data gets persisted to the FS over its run. In the cloud there is a very useful feature of deleting a cluster while keeping the data in a blob store, and later, when needed, recreating the cluster pointing at the existing data. Many of the decisions made at HM startup do not hold at that point (like the one above, which assumes that if there is no META location in ZK, the meta table was never online and has no data in it). So what we need is a way to know whether this cluster start is from existing data (a cluster recreate). All these decisions could be based on that check result(?), even the unknown-server handling, so that it happens ONLY in this special cloud case, and only once.
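
A rough illustration of the kind of check being suggested; this is hypothetical, not existing HBase behavior, and it assumes the static MetaTableLocator helper available on recent branch-2:

  // Hypothetical check: an existing meta table dir on the FS plus no meta
  // location in ZK suggests a recreated cluster pointing at pre-existing data,
  // not a fresh bootstrap.
  boolean isClusterRecreateFromExistingData(FileSystem fs, Path rootDir, ZKWatcher zkWatcher)
      throws IOException {
    boolean fsHasMetaTableDir =
      fs.exists(CommonFSUtils.getTableDir(rootDir, TableName.META_TABLE_NAME));
    boolean zkHasMetaLocation = MetaTableLocator.getMetaRegionLocation(zkWatcher) != null;
    return fsHasMetaTableDir && !zkHasMetaLocation;
  }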

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 33s Docker mode activated.
-0 ⚠️ yetus 0m 6s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+1 💚 mvninstall 3m 23s branch-2 passed
+1 💚 compile 0m 54s branch-2 passed
+1 💚 shadedjars 4m 58s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 36s branch-2 passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 18s the patch passed
+1 💚 compile 0m 56s the patch passed
+1 💚 javac 0m 56s the patch passed
+1 💚 shadedjars 4m 57s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 35s the patch passed
_ Other Tests _
-1 ❌ unit 143m 14s hbase-server in the patch failed.
165m 31s
Subsystem Report/Notes
Docker Client=19.03.12 Server=19.03.12 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/4/artifact/yetus-jdk8-hadoop2-check/output/Dockerfile
GITHUB PR #2113
JIRA Issue HBASE-24286
Optional Tests javac javadoc unit shadedjars compile
uname Linux f4fdda5689ee 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / 6cb51cc
Default Java 1.8.0_232
unit https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/4/artifact/yetus-jdk8-hadoop2-check/output/patch-unit-hbase-server.txt
Test Results https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/4/testReport/
Max. process+thread count 4261 (vs. ulimit of 12500)
modules C: hbase-server U: hbase-server
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/4/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f)
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 8s Docker mode activated.
-0 ⚠️ yetus 0m 6s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+1 💚 mvninstall 4m 38s branch-2 passed
+1 💚 compile 1m 12s branch-2 passed
+1 💚 shadedjars 6m 22s branch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 43s hbase-server in branch-2 failed.
_ Patch Compile Tests _
+1 💚 mvninstall 4m 23s the patch passed
+1 💚 compile 1m 8s the patch passed
+1 💚 javac 1m 8s the patch passed
+1 💚 shadedjars 6m 24s patch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 41s hbase-server in the patch failed.
_ Other Tests _
-1 ❌ unit 193m 51s hbase-server in the patch failed.
222m 20s
Subsystem Report/Notes
Docker Client=19.03.12 Server=19.03.12 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/4/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #2113
JIRA Issue HBASE-24286
Optional Tests javac javadoc unit shadedjars compile
uname Linux 4cde733da70e 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / 6cb51cc
Default Java 2020-01-14
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/4/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-server.txt
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/4/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-server.txt
unit https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/4/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/4/testReport/
Max. process+thread count 2818 (vs. ulimit of 12500)
modules C: hbase-server U: hbase-server
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/4/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f)
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 35s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ branch-2 Compile Tests _
+1 💚 mvninstall 3m 17s branch-2 passed
+1 💚 checkstyle 1m 8s branch-2 passed
+1 💚 spotbugs 1m 55s branch-2 passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 14s the patch passed
+1 💚 checkstyle 1m 9s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 11m 21s Patch does not cause any errors with Hadoop 3.1.2 3.2.1.
+1 💚 spotbugs 2m 4s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 13s The patch does not generate ASF License warnings.
32m 16s
Subsystem Report/Notes
Docker Client=19.03.12 Server=19.03.12 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/5/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #2113
JIRA Issue HBASE-24286
Optional Tests dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle
uname Linux 90fa8fe64874 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / 6cb51cc
Max. process+thread count 94 (vs. ulimit of 12500)
modules C: hbase-server U: hbase-server
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/5/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 11s Docker mode activated.
-0 ⚠️ yetus 0m 7s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+1 💚 mvninstall 4m 38s branch-2 passed
+1 💚 compile 1m 9s branch-2 passed
+1 💚 shadedjars 6m 23s branch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 42s hbase-server in branch-2 failed.
_ Patch Compile Tests _
+1 💚 mvninstall 4m 25s the patch passed
+1 💚 compile 1m 9s the patch passed
+1 💚 javac 1m 9s the patch passed
+1 💚 shadedjars 6m 23s patch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 40s hbase-server in the patch failed.
_ Other Tests _
-1 ❌ unit 191m 11s hbase-server in the patch failed.
219m 54s
Subsystem Report/Notes
Docker Client=19.03.12 Server=19.03.12 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/5/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #2113
JIRA Issue HBASE-24286
Optional Tests javac javadoc unit shadedjars compile
uname Linux dac4f37c3ab3 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / 6cb51cc
Default Java 2020-01-14
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/5/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-server.txt
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/5/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-server.txt
unit https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/5/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/5/testReport/
Max. process+thread count 2654 (vs. ulimit of 12500)
modules C: hbase-server U: hbase-server
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/5/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f)
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 19s Docker mode activated.
-0 ⚠️ yetus 0m 6s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+1 💚 mvninstall 3m 42s branch-2 passed
+1 💚 compile 1m 0s branch-2 passed
+1 💚 shadedjars 5m 32s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 40s branch-2 passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 39s the patch passed
+1 💚 compile 0m 58s the patch passed
+1 💚 javac 0m 58s the patch passed
+1 💚 shadedjars 5m 37s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 36s the patch passed
_ Other Tests _
-1 ❌ unit 205m 56s hbase-server in the patch failed.
230m 57s
Subsystem Report/Notes
Docker Client=19.03.12 Server=19.03.12 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/5/artifact/yetus-jdk8-hadoop2-check/output/Dockerfile
GITHUB PR #2113
JIRA Issue HBASE-24286
Optional Tests javac javadoc unit shadedjars compile
uname Linux 6c1f52f11b2c 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / 6cb51cc
Default Java 1.8.0_232
unit https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/5/artifact/yetus-jdk8-hadoop2-check/output/patch-unit-hbase-server.txt
Test Results https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/5/testReport/
Max. process+thread count 2841 (vs. ulimit of 12500)
modules C: hbase-server U: hbase-server
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/5/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f)
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

…ting a new cluster pointing at the same file system

HBase currently does not handle `Unknown Servers` automatically and requires
users to run hbck2 `scheduleRecoveries` when one sees unknown servers on
the HBase report UI.

This became an issue for HBase 2 adoption, especially when a table wasn't
disabled before shutting down an HBase cluster in the cloud, where WALs and
ZooKeeper data are removed and hostnames change frequently in such a dynamic
environment. Once the cluster restarts, hbase:meta still holds the old
hostnames/IPs of the region servers that ran in the previous cluster. Those
region servers become `Unknown Servers`, and the regions they hosted are
never reassigned automatically.

Our fix here is to trigger a repair immediately after hbase:meta is loaded
and online: find any non-offline regions of enabled tables that sit on
`Unknown Servers` so that they can be reassigned to other online servers.

- Also introduce a feature that skips the removal of the hbase:meta table
directory when InitMetaProcedure#writeFsLayout runs, especially if the
ZNode data is fresh but the hbase:meta table already exists
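
Roughly, the reassignment idea described above could look like the sketch below. This is an illustration only, not the actual patch: the HBase-internal calls (getRegionStates(), isOffline(), isServerKnownAndOnline()) are written from memory and should be treated as assumptions, and the real filter chain from the change appears in the diff excerpt further down the review.

import java.util.List;
import java.util.stream.Collectors;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.RegionInfo;
import org.apache.hadoop.hbase.master.HMaster;
import org.apache.hadoop.hbase.master.RegionState;
import org.apache.hadoop.hbase.master.ServerManager;

final class UnknownServerRepairSketch {
  private UnknownServerRepairSketch() {
  }

  /**
   * Collect regions that hbase:meta still places on servers the restarted cluster
   * has never seen. These are the candidates for reassignment.
   */
  static List<RegionInfo> regionsOnUnknownServers(HMaster master) {
    return master.getAssignmentManager().getRegionStates().getRegionStates().stream()
      // skip regions already marked offline; they are not stranded
      .filter(state -> !state.isOffline())
      // an enabled-table check belongs here as well; omitted because the exact
      // TableStateManager API differs across branches
      // keep only regions whose last known server is neither live nor a known dead server
      .filter(state -> {
        ServerName sn = state.getServerName();
        return sn != null && master.getServerManager().isServerKnownAndOnline(sn)
          == ServerManager.ServerLiveState.UNKNOWN;
      })
      .map(RegionState::getRegion)
      .collect(Collectors.toList());
  }
}

Once collected, the idea is to reassign these regions (or schedule recovery for their unknown servers) instead of leaving them stranded until an operator runs hbck2 scheduleRecoveries.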
@taklwu taklwu force-pushed the HBASE-24286-branch-2 branch from 69a0be0 to ded37e0 on July 26, 2020 06:23
@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 34s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ branch-2 Compile Tests _
+0 🆗 mvndep 0m 27s Maven dependency ordering for branch
+1 💚 mvninstall 3m 12s branch-2 passed
+1 💚 checkstyle 1m 33s branch-2 passed
+1 💚 spotbugs 2m 35s branch-2 passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 13s Maven dependency ordering for patch
+1 💚 mvninstall 3m 13s the patch passed
-0 ⚠️ checkstyle 1m 8s hbase-server: The patch generated 2 new + 113 unchanged - 2 fixed = 115 total (was 115)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 11m 26s Patch does not cause any errors with Hadoop 3.1.2 3.2.1.
+1 💚 spotbugs 2m 58s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 25s The patch does not generate ASF License warnings.
35m 41s
Subsystem Report/Notes
Docker Client=19.03.12 Server=19.03.12 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/6/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #2113
JIRA Issue HBASE-24286
Optional Tests dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle
uname Linux b5a79b310d1f 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / 6cb51cc
checkstyle https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/6/artifact/yetus-general-check/output/diff-checkstyle-hbase-server.txt
Max. process+thread count 94 (vs. ulimit of 12500)
modules C: hbase-common hbase-server U: .
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/6/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 8s Docker mode activated.
-0 ⚠️ yetus 0m 6s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+0 🆗 mvndep 0m 15s Maven dependency ordering for branch
+1 💚 mvninstall 4m 41s branch-2 passed
+1 💚 compile 1m 33s branch-2 passed
+1 💚 shadedjars 6m 27s branch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 18s hbase-common in branch-2 failed.
-0 ⚠️ javadoc 0m 40s hbase-server in branch-2 failed.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 17s Maven dependency ordering for patch
+1 💚 mvninstall 4m 25s the patch passed
+1 💚 compile 1m 36s the patch passed
+1 💚 javac 1m 36s the patch passed
+1 💚 shadedjars 6m 25s patch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 17s hbase-common in the patch failed.
-0 ⚠️ javadoc 0m 39s hbase-server in the patch failed.
_ Other Tests _
+1 💚 unit 1m 47s hbase-common in the patch passed.
-1 ❌ unit 191m 44s hbase-server in the patch failed.
224m 20s
Subsystem Report/Notes
Docker Client=19.03.12 Server=19.03.12 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/6/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #2113
JIRA Issue HBASE-24286
Optional Tests javac javadoc unit shadedjars compile
uname Linux 8a3751ac66f0 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / 6cb51cc
Default Java 2020-01-14
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/6/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-common.txt
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/6/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-server.txt
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/6/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-common.txt
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/6/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-server.txt
unit https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/6/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/6/testReport/
Max. process+thread count 2840 (vs. ulimit of 12500)
modules C: hbase-common hbase-server U: .
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/6/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f)
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 7s Docker mode activated.
-0 ⚠️ yetus 0m 5s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+0 🆗 mvndep 0m 27s Maven dependency ordering for branch
+1 💚 mvninstall 3m 39s branch-2 passed
+1 💚 compile 1m 18s branch-2 passed
+1 💚 shadedjars 5m 30s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 55s branch-2 passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 14s Maven dependency ordering for patch
+1 💚 mvninstall 3m 34s the patch passed
+1 💚 compile 1m 17s the patch passed
+1 💚 javac 1m 17s the patch passed
+1 💚 shadedjars 5m 25s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 54s the patch passed
_ Other Tests _
+1 💚 unit 1m 36s hbase-common in the patch passed.
+1 💚 unit 198m 57s hbase-server in the patch passed.
227m 0s
Subsystem Report/Notes
Docker Client=19.03.12 Server=19.03.12 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/6/artifact/yetus-jdk8-hadoop2-check/output/Dockerfile
GITHUB PR #2113
JIRA Issue HBASE-24286
Optional Tests javac javadoc unit shadedjars compile
uname Linux 5ebeb0416846 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / 6cb51cc
Default Java 1.8.0_232
Test Results https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/6/testReport/
Max. process+thread count 2680 (vs. ulimit of 12500)
modules C: hbase-common hbase-server U: .
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2113/6/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f)
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

FileSystem fs = rootDir.getFileSystem(conf);
Path tableDir = CommonFSUtils.getTableDir(rootDir, TableName.META_TABLE_NAME);
if (fs.exists(tableDir) && !fs.delete(tableDir, true)) {
boolean removeMeta = conf.getBoolean(HConstants.REMOVE_META_ON_RESTART,
Contributor

// Checking if meta needs initializing.
status.setStatus("Initializing meta table if this is a new deploy");
InitMetaProcedure initMetaProc = null;
// Print out state of hbase:meta on startup; helps debugging.
RegionState rs = this.assignmentManager.getRegionStates().
    getRegionState(RegionInfoBuilder.FIRST_META_REGIONINFO);
LOG.info("hbase:meta {}", rs);
if (rs != null && rs.isOffline()) {
  Optional<InitMetaProcedure> optProc = procedureExecutor.getProcedures().stream()
      .filter(p -> p instanceof InitMetaProcedure).map(o -> (InitMetaProcedure) o).findAny();
  initMetaProc = optProc.orElseGet(() -> {
    // schedule an init meta procedure if meta has not been deployed yet
    InitMetaProcedure temp = new InitMetaProcedure();
    procedureExecutor.submitProcedure(temp);
    return temp;
  });
}
So the checks here see that the META location is not in zk and conclude it is a new deploy. Here is what we need to tackle.
In the cloud redeploy case we will see a pattern where we have a clusterId in the FS but not in zk. Can this be used as an indicator? IMO we should detect, this way or another, that it is a redeploy over an existing dataset, and in all these places take that into account when deciding whether we need such bootstrap steps.
We should not be doing this via a config, IMO. In a cloud-based deploy, what if the first start fails and there is a genuine need for this bootstrap cleaning of the META FS dir?
The same applies to the unknown-server case. Let's identify the redeploy case clearly and only act then.
Can we please have that discussion, conclude on a solution, and then move forward?

Contributor Author

In the cloud redeploy case we will see a pattern where we have a clusterId in the FS but not in zk. Can this be used as an indicator?

I will come back on this tomorrow, but I agree with you that we should define explicitly what a partial bootstrap is and that a partial meta needs some cleanup.

Also, do you mean that if the clusterID didn't get written to ZK, the bootstrap is considered partial?

Contributor

@anoopsjohn anoopsjohn Jul 28, 2020

Also, do you mean that if the clusterID didn't get written to ZK, the bootstrap is considered partial?

I am not sure whether that can really be used; I need to check the code. We need a way to identify the fact that it's a cluster redeploy, not some config to identify that. The HBase system should be smart enough, so I was just wondering whether we can use this to know that. Maybe not; need to see. My thinking is that we should make recreating a cluster on top of existing data a first-class feature for HBase itself.

Contributor

Also, do you mean that if the clusterID didn't get written to ZK, the bootstrap is considered partial?

I am not sure whether that can really be used; I need to check the code. We need a way to identify the fact that it's a cluster redeploy, not some config to identify that. The HBase system should be smart enough, so I was just wondering whether we can use this to know that. Maybe not; need to see. My thinking is that we should make recreating a cluster on top of existing data a first-class feature for HBase itself.

That would be great; let's find a good way to differentiate this case.

Contributor Author

Sorry for the late reply; I have reread the code and come up with the following.

First of all, in the current logic a partial meta should mean that the procedure WAL of InitMetaProcedure did not succeed and INIT_META_ASSIGN_META was not completed. Currently, even if the meta table can be read and a table descriptor can be retrieved but the region is not assigned, it is still considered partial (correct me if I'm wrong). So, in short, a partial meta table cannot be defined by reading the tableinfo or the store files alone.

Further, a combination of the WALs, the procedure WALs, and the ZooKeeper data is required and is what defines a partial meta in the normal cases. But for the cloud use case, or other use cases where one of those inputs is missing, we need a different discussion. For example:

  1. Partial meta on a long-running HDFS cluster
    a. If we have WALs and ZK, meta can be reassigned normally.
    b. If we have WALs but no ZK, no new InitMetaProcedure is submitted (and none of its states is entered) because the old InitMetaProcedure is found in the procedure WAL; the old server is never handled by an SCP and the assignment manager does nothing, so the Master hangs and never finishes initialization. (This is a different problem from the cloud case.)
    c. If we have no WALs but have ZK, state=OPEN remains for hbase:meta when opening the existing meta region, and InitMetaProcedure is not submitted/entered either (see this section in HMaster); the Master hangs and never finishes initialization. (This is a different problem from the cloud case.)

Therefore, for this PR, if we only focus on the cloud use cases, handling unknown servers and partial meta becomes much simpler. For example, when running InitMetaProcedure, the clusterID in ZooKeeper (as Anoop suggested) can be used to tell whether this is a partial meta: a missing clusterID indicates the ZK data is fresh, and the region WALs and the procedure WAL of InitMetaProcedure may not exist. If the WALs and the procedure WAL do exist, we fall into the same failure as case 1b above (out of scope for this PR).

  2. Partial meta on cloud, without WALs and ZK
    a. If we resume in INIT_META_WRITE_FS_LAYOUT and continue, then ZK must have existed when the master restarted. Otherwise, for the case of WALs present but no ZK, we fall back to case 1b, which we don't handle within this PR.
    b. If there is no WAL and no ZK, an InitMetaProcedure is submitted and the procedure lands in INIT_META_WRITE_FS_LAYOUT.
    • During INIT_META_WRITE_FS_LAYOUT, if ZK data does not exist but a meta directory already exists, we should trust the directory and try to open it.
      • We only run the INIT_META_WRITE_FS_LAYOUT state when ZK data does not exist or when INIT_META_WRITE_FS_LAYOUT did not finish previously.

So we're fixing case 2b in this PR. I have put together the prototype and the unit tests are running off this PR now (TestClusterRestartFailoverSplitWithoutZk is failing on branch-2 even without our changes).

The proposed changes are:

  • Only perform region reassignment for regions on unknown servers when there are no procedure WALs, no region WALs, and no ZK data.
  • Do not recreate the meta table directory if the restarted InitMetaProcedure#INIT_META_WRITE_FS_LAYOUT step starts with no ZK data (or maybe no WALs as well); a rough sketch of this guard follows below.
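
A minimal sketch of that second bullet, assuming a zkIsFresh flag is computed elsewhere (how to compute it is exactly the clusterId discussion below). The method shape and the sideline path name are illustrative, not the committed change.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.util.CommonFSUtils;

final class WriteFsLayoutSketch {

  /**
   * Only wipe/recreate the hbase:meta directory on a genuinely new deploy. If ZK is
   * fresh but a meta directory already exists, keep it so a recreated cluster can
   * reuse the existing data instead of deleting it.
   */
  static void prepareMetaDir(Configuration conf, Path rootDir, boolean zkIsFresh)
      throws IOException {
    FileSystem fs = rootDir.getFileSystem(conf);
    Path metaDir = CommonFSUtils.getTableDir(rootDir, TableName.META_TABLE_NAME);
    if (fs.exists(metaDir)) {
      if (zkIsFresh) {
        // Cluster recreated over existing data: trust the directory and reuse it.
        return;
      }
      // Genuine partial bootstrap: sideline rather than delete, so nothing is lost.
      Path sideline = new Path(metaDir.getParent(),
        metaDir.getName() + "-sidelined-" + System.currentTimeMillis());
      if (!fs.rename(metaDir, sideline)) {
        throw new IOException("Failed to sideline " + metaDir);
      }
    }
    // ... continue with the normal INIT_META_WRITE_FS_LAYOUT directory creation ...
  }
}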

Contributor

Sorry for the delay; I need to read through the analysis you put above. What you mention about the start hanging after a cluster recreate, because META does not get assigned, is correct. Can you please create a sub-issue for this case, i.e. knowing whether we are starting the HMaster after a recreate (creating a cluster over existing data)?

Contributor

During HMaster start and bootstrap we create the ClusterID, write it to the FS and then to zk, and then create the META table FS layout. So in a cluster recreate we will see the clusterID in the FS, along with the META FS layout, but no clusterID in zk. It seems we can use this as the indication of a cluster recreate over existing data. On HMaster start, this is something we need to check first and track. If this mode is true, later when (if) we run INIT_META_WRITE_FS_LAYOUT, we should not delete the META dir. When we write that procedure to the MasterProcWAL as part of bootstrap, we can include this mode (a boolean) as well; it is a protobuf message anyway. So even if this HMaster gets killed and restarted (at a point where the clusterId was written to zk but the META FS layout part was not reached), we can use the info added to the bootstrap WAL entry and make sure NOT to delete the meta dir.
Can we do this part alone in a sub-task and provide a patch, please? This is a very key part, which is why it is better to fine-tune it with all the different test cases. Sounds good?
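
A hedged sketch of the redeploy check described above: a clusterId present on the filesystem but absent from ZooKeeper suggests a cluster recreated over existing data. FSUtils.getClusterId and ZKClusterId.readClusterIdZNode are the helpers as I recall them; treat the exact names and signatures as assumptions rather than confirmed API.

import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.ClusterId;
import org.apache.hadoop.hbase.util.FSUtils;
import org.apache.hadoop.hbase.zookeeper.ZKClusterId;
import org.apache.hadoop.hbase.zookeeper.ZKWatcher;
import org.apache.zookeeper.KeeperException;

final class RedeployDetectionSketch {

  /**
   * True when the root dir already carries a clusterId (hbase.id) but ZooKeeper does
   * not: that pattern is what a "create cluster over existing data" restart looks like.
   */
  static boolean isClusterRecreate(FileSystem fs, Path rootDir, ZKWatcher zk)
      throws IOException, KeeperException {
    ClusterId fsClusterId = FSUtils.getClusterId(fs, rootDir);  // assumed helper: reads hbase.id
    String zkClusterId = ZKClusterId.readClusterIdZNode(zk);    // assumed helper: reads the clusterId znode
    return fsClusterId != null && zkClusterId == null;
  }
}

If that boolean were then recorded in the bootstrap entry written to the MasterProcWAL, as suggested above, a master killed mid-bootstrap could still know on restart not to delete the meta dir.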

Contributor Author

Sounds right to me; as you suggested, we'll put this PR on hold and make it depend on the new sub-task. I will try to send another JIRA and PR out in a few days and refer to the conversation we had here.

Thanks again, Anoop

Member

we should not delete the META dir.

Sorry for harping on an implementation detail: let's sideline meta and not delete please :).

Can we do this part alone in a sub task and a provide a patch pls? This is very key part..

This seems like a very reasonable starting point. Like Anoop points out, if we can be very sure that we will only trigger this case when we are absolutely sure we're in the "cloud recreate" situation, that will bring a lot of confidence.

I will try to send another JIRA and PR out in a few days and refer to the conversation we discussed here.

Lazy Josh: did you get a new Jira created already for this?

Contributor Author

@joshelser the new JIRA is HBASE-24833 and the discussion is mainly on the new PR #2237. I may need to send an email to the dev@ list for a broader discussion on whether we should avoid depending on the data in ZooKeeper (that would help us prevent deleting the meta directory).

@joshelser
Member

If you want to solve the problem automatically, you should find another way to detect the dead region servers when the master starts up, so that we do not rely on the WAL directories. But I'm still a bit nervous about what SCP should do when it notices that there is no WAL directory for a dead region server. It is not the expected behavior in HBase...

Am catching up.. but I think Duo's comments here hit on a lot of the worry that I had. How can we be 100% certain that we don't hit this other code-path when we are not in the "cloud recreation" path? Is this better served by HBCK2-type automation which can be run when stuff is being "recreated"? Just thinking out loud.

Making a pass through the code changes you have so far, Stephen. Looks like some good reviews by Anoop already :)

Comment on lines +1453 to +1462
.filter(s -> !s.isOffline())
.filter(s -> isTableEnabled(s.getRegion().getTable()))
.filter(s -> !regionStates.isRegionInTransition(s.getRegion()))
.filter(s -> {
  ServerName serverName = regionStates.getRegionServerOfRegion(s.getRegion());
  if (serverName == null) {
    return false;
  }
  return master.getServerManager().isServerKnownAndOnline(serverName)
    .equals(ServerManager.ServerLiveState.UNKNOWN);
Member

Collapse these down into one method so we don't end up making 4 iterations over a list of (potentially) a lot of regions.
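
One way the four chained filters could collapse into a single predicate, per this suggestion. This is a sketch meant to sit in the same class as the excerpt above: regionStates, master, and isTableEnabled are the fields/helpers visible in that excerpt, and RegionState as the element type is an assumption.

private boolean isEnabledRegionStrandedOnUnknownServer(RegionState s) {
  // Cheap checks first, mirroring the order of the original filters.
  if (s.isOffline()
      || !isTableEnabled(s.getRegion().getTable())
      || regionStates.isRegionInTransition(s.getRegion())) {
    return false;
  }
  ServerName serverName = regionStates.getRegionServerOfRegion(s.getRegion());
  return serverName != null
      && master.getServerManager().isServerKnownAndOnline(serverName)
          == ServerManager.ServerLiveState.UNKNOWN;
}

// The stream then needs only a single filter pass:
// .filter(this::isEnabledRegionStrandedOnUnknownServer)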


@saintstack
Contributor

On @joshelser's good question about "I was concerned about making sure all 'old' RegionServers are actually down before we reassign regions onto new servers": SCP should probably call expire on the ServerName it is passed. It'd be redundant in most cases; I thought it did this already, but it does not. Queuing an SCP adds the server to the 'Dead Servers' list (I think -- check), so if it arrives at any time afterwards it will be told 'YouAreDead' and will shut itself down.

On the @Apache9 question:

And I prefer we just add a new operation in HBCK2, to scan for unknown servers and schedule SCP for them? Or maybe we already have one in place? @saintstack Can you recall sir? I'm not very familiar with all the operations in HBCK2.

Currently HBCK2 does not have special handling for 'Unknown Servers'. The 'HBCK Report' page that reports 'Unknown Servers' found by a CatalogJanitor run suggests:

The below are servers mentioned in the hbase:meta table that are no longer 'live' or known 'dead'. The server likely belongs to an older cluster epoch since replaced by a new instance because of a restart/crash. To clear 'Unknown Servers', run 'hbck2 scheduleRecoveries UNKNOWN_SERVERNAME'. This will schedule a ServerCrashProcedure. It will clear out 'Unknown Server' references and schedule reassigns of any Regions that were associated with this host. But first!, be sure the referenced Region is not currently stuck looping trying to OPEN. Does it show as a Region-In-Transition on the Master home page? Is it mentioned in the 'Procedures and Locks' Procedures list? If so, perhaps it stuck in a loop trying to OPEN but unable to because of a missing reference or file. Read the Master log looking for the most recent mentions of the associated Region name. Try and address any such complaint first. If successful, a side-effect should be the clean up of the 'Unknown Servers' list. It may take a while. OPENs are retried forever but the interval between retries grows. The 'Unknown Server' may be cleared because it is just the last RegionServer the Region was successfully opened on; on the next open, the 'Unknown Server' will be purged.

So, the 'fix' for 'Unknown Servers' as exercised by myself recently was to parse the 'HBCK Report' page to make a list of all 'Unknown Servers' and then script a call to 'hbck2 scheduleRecoveries' for each one. We should be able to do better than this -- either add handling of 'Unknown Servers' to the set of issues 'fixed' when we run 'hbck2 fixMeta' or as is done here, scheduling an SCP for any 'Unknown Server' found when CatalogJanitor runs.

On the latter auto-fix, there is understandable reluctance. I think this comes of 'Unknown Servers' being an ill-defined entity-type; the auto-fix can wait on the concept hardening.

I like this comment of @Apache9:

I do not think there is a guarantee that you can change the internal filesystem layout of HBase and the cluster will still be functional. Even if it sometimes works, as you said it does on 1.4, that does not mean we will always keep this behavior in new versions.

But there should be 'safe' means of attaining your ends @taklwu .

Perhaps of help is a little-known utility, the hbase.master.maintenance_mode config, which starts the Master in 'maintenance' mode (HBASE-21073): the Master comes up and assigns meta but nothing else, so you can ask the Master to make edits of state/procedures/meta. Perhaps you could script moving the cluster to the new location, starting the Master there in maintenance mode, editing meta (an SCP that doesn't assign?), then shutting it down, followed by a normal restart.

@joshelser
Member

Mentioning this here at Zach's recommendation: I'm trying to see if we can get an answer as to whether or not a default=false configuration option to automatically schedule SCPs when unknown servers are seen, as described in #2114, is acceptable.

I agree/acknowledge that other solutions to this also exist (like Stack nicely wrote up), but those would require a bit more automation to implement.

I don't want to bulldoze the issue, but this is an open wound for me that keeps getting more salt rubbed into it :)

@taklwu
Contributor Author

taklwu commented Sep 8, 2022

I'm closing it; there's not much movement we can make here. If anyone has a better solution or hits the issue again, please link this discussion from it.
