Skip to content

Conversation

@symious
Copy link
Contributor

@symious symious commented Oct 2, 2022

What changes were proposed in this pull request?

This ticket is to add the metric of EstimatedKeyCount for SCM DB.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7282

How was this patch tested?

unit test.

checkTableStatus(statefulServiceConfigTable,
STATEFUL_SERVICE_CONFIG.getName());

mxBean = MBeans.register("SCMMetadataStore", "SCMMetadataStoreImpl",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this needs an "unRegister" in the stop() method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sodonnel Thank you for your reminder, updated the PR, please have a look.

}

@Override
public String getEstimatedKeyCount() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I typed this earlier, but forgot to post it. This means we are going to have a metric where the value is a command seperated list of key value pairs, eg:

metric -> tab1 : 10, tab2 : 15 ... etc.

Have we got any other metrics like this? Would it be better if we had a metric for each table directly, so they can be charted etc if needed? Ie

SCMEstimatedKeyCountTab1 -> 10
SCMEstimatedKeyCounttab2 -> 15
etc

You would need to have dynamic metric names for this, but there is an example of doing that in ReplicationManagerMetrics

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sodonnel Thank you for the advice. Updated the PR, please have a look.

public class SCMMetadataStoreImpl implements SCMMetadataStore {

public static final Set<DBColumnFamilyDefinition<?, ?>> COLUMN_FAMILIES =
new HashSet<>(Arrays.asList(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this extra map containing all the tables? They are already stored in the tableMap, so we can iterate it to list out all the tables.

My concern is that if someone later adds a new table, they can easily miss adding it to this map, and then the metric will not automatically appear.

If we drive it off the tableMap, then there is nothing special to do around metrics when adding a new table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be necessary, when I use the tableMap from the StoreImpl, the metrics class will invoke NullPointerException.

try {
return e.getKey() + " : " + e.getValue().getEstimatedKeyCount();
} catch (IOException ex) {
ex.printStackTrace();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we send this to the logger, rather than stderr?

count = scmMetadataStore.getTableMap().get(e.getKey())
.getEstimatedKeyCount();
} catch (IOException exception) {
// Ignore exception here.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably log exceptions here rather than totally ignore them.

"EstimatedKeyCount",
"Tracked estimated key count of all column families");

private static final Map<String, MetricsInfo> ESTIMATED_KEY_COUNT_METRICS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I understand - you would get a null pointer here, as this is a static definition. What about moving this to be an instance variable and build the Interns.info objects when this metrics object is created? That means we still have a cache of the names inside the object to avoid allocations each time the metrics are requested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Updated the PR, please have a look.

@sodonnel
Copy link
Contributor

sodonnel commented Oct 5, 2022

This change looks good now - thanks for taking on the changes. However there is one more thing I think we should change. We are calling getEstimatedRowCount on each RocksDB table twice, due to forming the concatenated string in getEstimatedKeyCountStr() and also the individual metrics. I guess the estimatedRowCount call is quite cheap in RocksDB, but it seems wasteful to call it twice.

Could you remove the getEstimatedKeyCountStr method, and build up the gson object as you iterate and create the individual metrics? That would move all the metrics logic into the metrics class, and also stop us calling into RocksDB twice per table.

@symious
Copy link
Contributor Author

symious commented Oct 5, 2022

@sodonnel Sure, updated the PR, please have a look.

Copy link
Contributor

@sodonnel sodonnel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, if we get a green CI run.

@sodonnel
Copy link
Contributor

sodonnel commented Oct 5, 2022

I think the test failiures are related to this change. I tried running TestContainerStorageManagerHA from intellij and it fails with:

2022-10-05 15:20:37,004 [Listener at 0.0.0.0/51308] INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:start(191)) - StorageContainerManager metrics system started
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000000011f0238c8, pid=82478, tid=0x0000000000001603
#
# JRE version: OpenJDK Runtime Environment (Zulu 8.60.0.21-CA-macos-aarch64) (8.0_322-b06) (build 1.8.0_322-b06)
# Java VM: OpenJDK 64-Bit Server VM (25.322-b06 mixed mode bsd-aarch64 compressed oops)
# Problematic frame:
# C  [librocksdbjni896780423911491940.jnilib+0x238c8]  Java_org_rocksdb_RocksDB_getLongProperty+0x6c
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/sodonnell/source/ozone2/hadoop-ozone/integration-test/hs_err_pid82478.log
#
# If you would like to submit a bug report, please visit:
#   http://www.azul.com/support/
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

Same error two times. I tried with the master branch and the tests run fine, so there is something related to this test which is causing the problem.

builder.tag(ESTIMATED_KEY_COUNT, gson.toJson(keyCountMap));
}

public static void unRegister() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does unRegister need to be synchronized too, as it can access instance while register is running? I'm not sure if it does or not, but might be safer to make it synchronized.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let me add it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants