Skip to content

Conversation

@sadanand48
Copy link
Contributor

@sadanand48 sadanand48 commented May 19, 2020

What changes were proposed in this pull request?

This tool parses rocksdb file for SCM and dumps specified table data. Also there is a list command which parses any db file and lists the columnFamilies(tables).

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-3622

How was this patch tested?

Manually Tested.

Here are the commands:
1.bin/ozone debug ldb --db=/tmp/ozone/data/metadata/scm.db/ list_column_families
default
validCerts
deletedBlocks
pipelines
revokedCerts
containers

2.bin/ozone debug ldb --db=/tmp/ozone/data/metadata/scm.db/ scan --column_family=containers

@avijayanhwx
Copy link
Contributor

avijayanhwx commented May 19, 2020

@sadanand48 Thanks for working on this useful tool.

There is an Ozone component called 'Recon' which is supposed to act as the "management" HQ for Ozone. Currently, it has access to SCM and OM metadata in the same format as they are in the respective components. The vision for Recon is to know what is happening in Ozone and understand what decisions are being taken by SCM, OM & DN. Please take a look at HDDS-1084 and HDDS-1996 for more information.

Since Recon already has the SCM DB, the same tool should also work against that. Can we add a flag to this CLI like --recon which allows the tool to read from the Recon server as well? That will be useful in understanding Recon's state.

@swagle
Copy link
Contributor

swagle commented May 19, 2020

@sadanand48 Thanks for working on this useful tool.

There is an Ozone component called 'Recon' which is supposed to act as the "management" HQ for Ozone. Currently, it has access to SCM and OM metadata in the same format as they are in the respective components. The vision for Recon is to know what is happening in Ozone and understand what decisions are being taken by SCM, OM & DN. Please take a look at HDDS-1084 and HDDS-1996 for more information.

Since Recon already has the SCM DB, the same tool should also work against that. Can we add a flag to this CLI like --recon which allows the tool to read from the Recon server as well? That will be useful in understanding Recon's state.

Why not add the code to a package in Recon?

@elek elek changed the title HDDS-3622.Implement rocksdb tool to parse scm db HDDS-3622. Implement rocksdb tool to parse scm db May 21, 2020
@elek
Copy link
Member

elek commented May 21, 2020

Thank you @sadanand48, I am very happy to have this feature.

Can we add a flag to this CLI like --recon which allows the tool to read from the Recon server as well?

I think it's a more generic question. How can I choose the appropriate db definition based on a given database / table? There can be multiple options:

  1. If the column families are unique, we can try to use just the name
  2. If not, we can try to identify the required db definition based on the name of the directory (scm.db, ...)

Copy link
Member

@elek elek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again to work on this. I really like it.

One more usability comment:

This patch introduces two subcommands:

ozone rdbparser scmdbparser and ozone rdbparser list.

As I wrote it earlier, I would prefer to have just one single command which can detect the required db schema based on name. I think we can follow the parameter / arguments convention of ldb.

What do you think about this:

ozone ldb --db=/data/.../scm.db --column_family=pipeline scan
ozone ldb --db=/data/..../scm.db list_column_families

For me it's more natural (for example parser seems to be redundant in the current name)

But to be honest: none of my comments are so important: I am happy to commit it in this form if we will improve it in the next few commits...

`ozone

return null;
}

private static List<byte[]> getColumnFamilyList(String dbPath)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit1: I am not sure if we need a separated method just to call this function.

List<byte[]> columnFamilies = RocksDB.listColumnFamilies(new Options(), parent.getDbPath);

Nit2: I tend to agree with Uncle Bob, that we don't need to include the type of a variable in the name of the variable:

Hungarian Notation was considered to be pretty important back in the Windows C API, when everything was an integer handle or a long pointer or a void pointer, or one of several implementations of “string” (with different uses and attributes). The compiler did not check types in those days, so the programmers needed a crutch to help them remember the types.

In modern languages we have much richer type systems, and the compilers remember and enforce the types. What’s more, there is a trend toward smaller classes and shorter functions so that people can usually see the point of declaration of each variable they’re using.

Java programmers don’t need type encoding. Objects are strongly typed, and editing environments have advanced such that they detect a type error long before you can run a compile! So nowadays HN and other forms of type encoding are simply impediments. They make it harder to change the name or type of a variable, function, or class. They make it harder to read the code. And they create the possibility that the encoding system will mislead the reader.

(In: Clean Code / Chapter 2. Meaningful name)

)
public class SCMDBParser implements Callable<Void> {

@CommandLine.Option(names = {"-table"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Can you please use --table. It's more standard (or use -t and --table as an alias)

@sadanand48
Copy link
Contributor Author

Thanks @elek for the comments. I have made the changes you suggested. Also, i am now choosing the DBDefinition according to the directory name. In this patch , i have done it only for scm. Will extend it for om in a follow up jira.

  1. If not, we can try to identify the required db definition based on the name of the directory (scm.db, ...)

@sadanand48
Copy link
Contributor Author

@elek Can you please review this ? I have made the changes you suggested.

Copy link
Member

@elek elek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, i am now choosing the DBDefinition according to the directory name. In this patch , i have done it only for scm. Will extend it for om in a follow up jira.

Sounds like a great plan.

Thanks for the update. (And sorry for the late re-check . I had a few "code writing" days) Will merge it soon.

I think it will be a very useful tool to debug problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants