31 changes: 29 additions & 2 deletions hbase-tools/README.md
@@ -29,7 +29,7 @@ module are:

- RegionsMerger;
- MissingRegionDirsRepairTool;
- RegionsOnUnknownServersRecoverer;
## Setup
Make sure HBase tools jar is added to HBase classpath:
@@ -138,4 +138,31 @@ the affected regions, it copies the entire region dir to a
region hfiles to a `HBASE_ROOT_DIR/.missing_dirs_repair/TS/TBL_NAME/bulkload` dir, renaming these
files with the pattern `REGION_NAME-FILENAME`. For a given table, all affected regions would then
have all its files under same directory for bulkload. _MissingRegionDirsRepairTool_ then uses
_LoadIncrementalHFiles_ to load all files for a given table at once.
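The directory layout and renaming scheme described above can be sketched as follows. This is an illustrative sketch only: the class and method names are hypothetical, and only the `HBASE_ROOT_DIR/.missing_dirs_repair/TS/TBL_NAME/bulkload` layout and the `REGION_NAME-FILENAME` pattern come from the tool's description.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class BulkloadPathSketch {
  // Builds the bulkload target path described above:
  // HBASE_ROOT_DIR/.missing_dirs_repair/TS/TBL_NAME/bulkload/REGION_NAME-FILENAME
  static Path bulkloadTarget(String hbaseRootDir, long ts, String table,
      String regionName, String fileName) {
    return Paths.get(hbaseRootDir, ".missing_dirs_repair", String.valueOf(ts),
        table, "bulkload", regionName + "-" + fileName);
  }

  public static void main(String[] args) {
    // Hypothetical region and file names, for illustration only.
    System.out.println(bulkloadTarget("/hbase", 1600000000000L, "my_table",
        "abc123region", "hfile1"));
  }
}
```

Prefixing each hfile with its region name keeps files from different affected regions distinct once they are gathered under the single per-table bulkload directory.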

## RegionsOnUnknownServersRecoverer - Tool for recovering regions on "unknown servers"

_RegionsOnUnknownServersRecoverer_ parses the master log to identify "unknown servers"
holding regions. This condition may arise when recovering a previously destroyed cluster,
where the new Master/RegionServer names differ completely from the ones currently
stored in the meta table (see HBASE-24286).

```
NOTE: This tool is useful for clusters running hbase versions lower than 2.2.7, 2.3.5 and 2.4.7.
For any of these versions or higher, the HBCK2 'recoverUnknown' option is a much simpler solution.
```

### Usage

This tool requires the master logs path as a parameter. Assuming the classpath is properly set, it can be run as follows:

```
$ hbase org.apache.hbase.RegionsOnUnknownServersRecoverer PATH_TO_MASTER_LOGS [dryRun]
```

An optional second parameter enables dry run mode: pass `true` to only list the unknown servers
found, without scheduling any SCPs. Any other value (or omitting the parameter) will actually
schedule the SCPs.


### Implementation Details

_RegionsOnUnknownServersRecoverer_ parses the master log file, searching for specific messages
mentioning "unknown servers". Once "unknown servers" are found, it uses `HBCK2.scheduleRecoveries`
to submit SCPs (ServerCrashProcedures) for each of these "unknown servers".
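The log-parsing step above can be sketched in isolation. The two marker strings are the ones the tool matches on; the class name and the sample log line in `main` are hypothetical, for illustration only (each `unknown_server=` token is assumed to be followed by a server name terminated by `/`, which is what the split on `/` relies on).

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class UnknownServerLineParser {
  // Markers matched by the tool in the master log.
  static final String CATALOG_JANITOR = "CatalogJanitor: hole=";
  static final String UNKNOWN_SERVER = "unknown_server=";

  // Extracts the distinct server names mentioned after each "unknown_server=" marker.
  static Set<String> parse(String line) {
    Set<String> servers = new LinkedHashSet<>();
    if (line.contains(CATALOG_JANITOR)) {
      String[] parts = line.split(UNKNOWN_SERVER);
      // parts[0] is the text before the first marker; every later part
      // starts with a server name terminated by '/'.
      for (int i = 1; i < parts.length; i++) {
        servers.add(parts[i].split("/")[0]);
      }
    }
    return servers;
  }

  public static void main(String[] args) {
    // Hypothetical log line with two unknown servers.
    String line = "WARN CatalogJanitor: hole= ... "
        + "unknown_server=host1.example.com,16020,1600000000000/region-a "
        + "unknown_server=host2.example.com,16020,1600000000001/region-b";
    System.out.println(parse(line));
  }
}
```

Using a `Set` mirrors the tool's behavior of scheduling at most one SCP per unknown server, even when the same server appears on several log lines.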
@@ -0,0 +1,119 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hbase;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;


/**
* Tool for identifying Unknown Servers from master logs and schedule SCPs for each of those using
* HBCK2 'scheduleRecoveries' option. This is useful for clusters running hbase versions lower than
* 2.2.7, 2.3.5 and 2.4.7. For any of these versions or higher, use HBCK2 'recoverUnknown' option.
*/
public class RegionsOnUnknownServersRecoverer extends Configured implements Tool {

private static final Logger LOG =
LoggerFactory.getLogger(RegionsOnUnknownServersRecoverer.class.getName());

private static final String CATALOG_JANITOR = "CatalogJanitor: hole=";

private static final String UNKNOWN_SERVER = "unknown_server=";

private Configuration conf;

private Set<String> unknownServers = new HashSet<>();

private boolean dryRun = false;

public RegionsOnUnknownServersRecoverer(Configuration conf) {
this.conf = conf;
}

@Override
public int run(String[] args) throws Exception {
String logPath = null;
if (args.length >= 1 && args.length < 3) {
logPath = args[0];
if (args.length == 2) {
dryRun = Boolean.parseBoolean(args[1]);
}
} else {
LOG.error("Wrong number of arguments. "
+ "Arguments are: <PATH_TO_MASTER_LOGS> [dryRun]. Pass 'true' to enable dry run mode.");
return 1;
}
BufferedReader reader = null;
try (Connection conn = ConnectionFactory.createConnection(conf)) {
reader = new BufferedReader(new FileReader(new File(logPath)));
String line = null;
while ((line = reader.readLine()) != null) {
if (line.contains(CATALOG_JANITOR)) {
String[] servers = line.split(UNKNOWN_SERVER);
for (int i = 1; i < servers.length; i++) {
String server = servers[i].split("/")[0];
// Set.add returns true only for servers not seen before.
if (unknownServers.add(server)) {
LOG.info("Adding server {} to our list of servers that will have SCPs.", server);
}
}
}
}
if (dryRun) {
StringBuilder builder =
new StringBuilder("This is a dry run, no SCPs will be submitted. Found unknown servers:");
builder.append("\n");
unknownServers.forEach(s -> builder.append(s).append("\n"));
LOG.info(builder.toString());
} else {
HBCK2 hbck2 = new HBCK2(conf);
LOG.info("Submitting SCPs for the found unknown servers with "
+ "HBCK2 scheduleRecoveries option.");
hbck2.scheduleRecoveries(conn.getHbck(), unknownServers.toArray(new String[] {}));
}
} catch (Exception e) {
LOG.error("Recovering unknown servers failed:", e);
return 2;
} finally {
// reader may still be null if opening the connection or the file failed.
if (reader != null) {
reader.close();
}
}
return 0;
}

public static void main(String [] args) throws Exception {
Configuration conf = HBaseConfiguration.create();
int errCode = ToolRunner.run(new RegionsOnUnknownServersRecoverer(conf), args);
if (errCode != 0) {
System.exit(errCode);
}
}
}