HDDS-7199. Implement new mix workload Read/Write Freon command which meets specific test requirements #3754

DaveTeng0 · 2022-09-14T18:41:43Z

What changes were proposed in this pull request?

Measure read/write performance when there is a very large amount of metadata in RocksDB, using working sets of different sizes that are larger than the available cache.

Pure read
Pure write
Mixed workload, Read + Write

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7199

How was this patch tested?

Robot tests, manual tests in cluster.

DaveTeng0 · 2022-09-14T18:42:43Z

cc. @kerneltime @jojochuang @umamaheswararao @duongkame

kerneltime · 2022-09-20T22:17:09Z

cc @duongkame

…block-token generation

hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/OzoneClientKeyReadWriteOps.java

hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/RangeKeysGenerator.java

… get key details

hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/security/x509/SecurityConfig.java

duongkame · 2022-09-23T22:12:42Z

hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java

-        HDDS_BLOCK_TOKEN_ENABLED_DEFAULT);
+//    this.grpcBlockTokenEnabled = conf.getBoolean(HDDS_BLOCK_TOKEN_ENABLED,
+//        HDDS_BLOCK_TOKEN_ENABLED_DEFAULT);
+    this.grpcBlockTokenEnabled = false;


Please ensure this doesn't get in.

yes!! let me change it back!

DaveTeng0 · 2022-09-26T21:56:30Z

hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/OzoneClientKeyReadWriteOps.java

+    } else {
+      byte[] data = new byte[objectSizeInBytes];
+      try (OzoneInputStream inputStream = ozoneBucket.readKey(keyName)) {
+        inputStream.read(data);


@kerneltime Hey Ritesh, if I don't need the result of the read method, is there a way to suppress the build error raised by the GitHub workflow for this line (the build fails because the result of the read method is ignored)? Thanks!

You can add a @SuppressWarnings({"unused"}) to tell findbugs to make an exception here.
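An alternative to suppressing the warning is to consume the return value of `read`. The following is a minimal sketch, assuming the stream returned by `readKey` behaves like a standard `java.io.InputStream`; the class and method names (`ReadFullyExample`, `readFully`) are hypothetical and not part of this PR, and a `ByteArrayInputStream` stands in for the Ozone stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of the PR: reads until the buffer is full
// or the stream ends, so the result of read() is never ignored.
final class ReadFullyExample {
  private ReadFullyExample() { }

  /** Returns the number of bytes actually copied into the buffer. */
  static int readFully(InputStream in, byte[] buffer) throws IOException {
    int total = 0;
    while (total < buffer.length) {
      int n = in.read(buffer, total, buffer.length - total);
      if (n < 0) {
        break; // end of stream reached before the buffer was filled
      }
      total += n;
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    byte[] source = new byte[4096]; // stand-in for the stored object
    byte[] data = new byte[1024];   // stand-in for objectSizeInBytes
    try (InputStream in = new ByteArrayInputStream(source)) {
      int read = readFully(in, data);
      System.out.println("bytes read: " + read);
    }
  }
}
```

A single `read(byte[])` call may return fewer bytes than the buffer holds, so looping also makes the benchmark read the whole object rather than an arbitrary prefix.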

hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/OzoneClientKeyReadWriteOps.java

kerneltime · 2022-09-27T21:09:03Z

hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/OzoneClientKeyReadWriteOps.java

+
+  @CommandLine.Option(names = {"-r", "--range"},
+          description = "index range of read/write operations.",
+          defaultValue = "0")


We need to establish whether --range is optional. It could read contiguously until it hits a key that is not found and limit the range to that point. Or it could always be a required option.

Let's make it required for now! We could think about making it optional as an improvement later!

hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/OzoneClientKeyReadWriteOps.java

kerneltime · 2022-09-27T21:30:29Z

hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/OzoneClientKeyReadWriteOps.java

+  public String getKeyName(int clientIndex) {
+    int start, end;
+    // separate tasks evenly to each client
+    if (range < clientsCount) {


Please validate the expected behavior when clients >> range. It might make sense to cap the number of clients at <= range, so that when more clients are configured than the range, we only create enough clients for each client to read exactly one object. Or, to hammer the same keys from multiple clients, if range < clients, each client gets the entire range.

OK, I updated the logic a little bit!
Each thread has its own client, which holds a TCP connection to OM.

The range set here becomes the range of keys that each thread/client can read/write.

For example, if we set range = 10, start_index (index of key) = 20, threadNo. = 30, and we run the Freon test 5 times,
then only 5 clients process the read/write operations on key indexes 20 to 29 (range = 10).

If we set range = 10, start_index (index of key) = 20, threadNo. = 30 (clients = 30), and we run the Freon test 40 times,
then the first 30 tests are assigned to the 30 clients, each reading/writing key indexes 20 to 29 (range = 10).
The remaining 10 tests are picked up by whichever 10 clients finish their tasks first.

We can then use multiple nodes to split up the total range of keys the user wants to test.
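The two options raised in the review comment above (capping the number of clients at the range and splitting it evenly, versus giving every client the entire range) can be sketched roughly as follows. This is an illustration only, not the PR's `getKeyName` logic: `ClientKeyRanges`, `KeyRange`, `splitEvenly`, and `wholeRange` are hypothetical names, and the sketch assumes `range >= 0` and at least one client.

```java
// Hypothetical sketch of two ways to map clients onto a key index range.
final class ClientKeyRanges {
  private ClientKeyRanges() { }

  /** Half-open range of key indexes: [start, end). */
  static final class KeyRange {
    final int start;
    final int end;

    KeyRange(int start, int end) {
      this.start = start;
      this.end = end;
    }

    @Override
    public String toString() {
      return "[" + start + ", " + end + ")";
    }
  }

  /** Option A, step 1: never use more clients than there are keys. */
  static int effectiveClients(int clients, int range) {
    return Math.max(1, Math.min(clients, range));
  }

  /** Option A, step 2: split the range as evenly as possible. */
  static KeyRange splitEvenly(int clientIndex, int clients,
                              int startIndex, int range) {
    int n = effectiveClients(clients, range);
    if (clientIndex < 0 || clientIndex >= n) {
      throw new IllegalArgumentException(
          "client " + clientIndex + " is outside the cap of " + n);
    }
    int base = range / n;   // keys every client gets
    int extra = range % n;  // the first 'extra' clients get one more key
    int start = startIndex + clientIndex * base + Math.min(clientIndex, extra);
    int size = base + (clientIndex < extra ? 1 : 0);
    return new KeyRange(start, start + size);
  }

  /** Option B: every client works on the entire range. */
  static KeyRange wholeRange(int startIndex, int range) {
    return new KeyRange(startIndex, startIndex + range);
  }

  public static void main(String[] args) {
    // Numbers from the example above: range = 10, start_index = 20, 30 threads.
    System.out.println("capped clients: " + effectiveClients(30, 10));
    System.out.println("client 0 (split): " + splitEvenly(0, 30, 20, 10));
    System.out.println("any client (whole): " + wholeRange(20, 10));
  }
}
```

Option B matches the behavior described in the reply, where each client reads and writes key indexes 20 to 29; option A trades that contention for disjoint coverage.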

DaveTeng0 · 2022-09-28T01:40:11Z

Yes! The command can potentially read a key that doesn't exist in the cluster.
Currently the Freon command stops running and reports a failure, but let me think more about how to make this better!!

kerneltime · 2022-10-12T05:35:25Z

> Yes! The command can potentially read a key that doesn't exist in the cluster.
> Currently the Freon command stops running and reports a failure, but let me think more about how to make this better!!

Code can continue testing and add a metric at the end for successful reads vs. not-found reads. It is a valid test to look at the performance of OM when reporting that a key does not exist. Object stores can be bombarded with nonexistent keys, and the performance of the underlying storage when reporting keys that don't exist is important. LSM tree-based storage has to scan all levels before reporting a key as not found and in some ways represents the worst-case performance at scale.
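The suggestion above (keep running on a missing key and report successful versus not-found reads at the end) could look roughly like the sketch below. Everything here is an assumption for illustration: `ReadOutcomeCounter`, `KeyReader`, and `KeyMissingException` are hypothetical stand-ins, not Ozone or Freon APIs, and a real implementation would map whatever error the client raises for a missing key.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: count read outcomes instead of aborting the test.
final class ReadOutcomeCounter {

  /** Stand-in for the client's "key not found" error (an assumption). */
  static final class KeyMissingException extends Exception {
    KeyMissingException(String keyName) {
      super("key not found: " + keyName);
    }
  }

  /** Stand-in for the actual read operation (an assumption). */
  interface KeyReader {
    void read(String keyName) throws KeyMissingException;
  }

  private final AtomicLong found = new AtomicLong();
  private final AtomicLong notFound = new AtomicLong();

  /** Performs one read and records whether the key existed. */
  void readAndCount(KeyReader reader, String keyName) {
    try {
      reader.read(keyName);
      found.incrementAndGet();
    } catch (KeyMissingException e) {
      notFound.incrementAndGet(); // a valid outcome, so keep testing
    }
  }

  long found() {
    return found.get();
  }

  long notFound() {
    return notFound.get();
  }

  String summary() {
    return "successful reads=" + found() + ", not-found reads=" + notFound();
  }

  public static void main(String[] args) {
    Set<String> existing = new HashSet<>(Arrays.asList("key-0", "key-1", "key-2"));
    KeyReader reader = name -> {
      if (!existing.contains(name)) {
        throw new KeyMissingException(name);
      }
    };
    ReadOutcomeCounter counter = new ReadOutcomeCounter();
    for (int i = 0; i < 5; i++) {
      counter.readAndCount(reader, "key-" + i);
    }
    System.out.println(counter.summary());
  }
}
```

Atomic counters are used so that several client threads can share one instance; timing the two outcomes separately would be a natural extension.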

DaveTeng0 · 2022-10-19T21:48:35Z

> > Yes! The command can potentially read a key that doesn't exist in the cluster.
> > Currently the Freon command stops running and reports a failure, but let me think more about how to make this better!!

> Code can continue testing and add a metric at the end for successful reads vs. not-found reads. It is a valid test to look at the performance of OM when reporting that a key does not exist. Object stores can be bombarded with nonexistent keys, and the performance of the underlying storage when reporting keys that don't exist is important. LSM tree-based storage has to scan all levels before reporting a key as not found and in some ways represents the worst-case performance at scale.

Thanks Ritesh for the context!! I'll create a separate Jira for this! This definitely makes sense!!

duongkame · 2022-10-19T22:24:28Z

> Code can continue testing and add a metric at the end for successful reads vs. not-found reads. It is a valid test to look at the performance of OM when reporting that a key does not exist. Object stores can be bombarded with nonexistent keys, and the performance of the underlying storage when reporting keys that don't exist is important. LSM tree-based storage has to scan all levels before reporting a key as not found and in some ways represents the worst-case performance at scale.

Agree, reading nonexistent keys is a valid test case and the tool should support it deterministically. To do that, it has to know (on its own) which keys exist and which don't. Warp does that by having a pre-test phase in which Warp creates a set of keys (10K or so) and keeps the created keys in memory for the real read test.

We can also do the same for this tool, by maintaining a set of known keys that can be initialized by a pretest phase and grows with the write test.
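The idea of a known-key set that is seeded by a pretest phase and grows with the write test might be sketched as below. This is a simplified illustration under stated assumptions, not how Warp or Freon implement it: `KnownKeys` and its methods are hypothetical names, and "missing" only means a name this tool instance has not written itself.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of an in-memory registry of keys the tool has written.
final class KnownKeys {
  private final List<String> keys = new ArrayList<>(); // for random picks
  private final Set<String> index = new HashSet<>();   // for membership checks
  private long missingCounter;

  /** Called by the pretest phase and by every successful write. */
  synchronized void recordWritten(String keyName) {
    if (index.add(keyName)) {
      keys.add(keyName);
    }
  }

  synchronized boolean isKnown(String keyName) {
    return index.contains(keyName);
  }

  synchronized int size() {
    return keys.size();
  }

  /** Returns a key that this tool has written, for "key exists" reads. */
  synchronized String pickExisting() {
    if (keys.isEmpty()) {
      throw new IllegalStateException("no known keys; run the pretest phase first");
    }
    return keys.get(ThreadLocalRandom.current().nextInt(keys.size()));
  }

  /** Returns a name this tool has not written, for "key not found" reads. */
  synchronized String pickMissing() {
    String candidate;
    do {
      candidate = "missing-" + (missingCounter++);
    } while (index.contains(candidate));
    return candidate;
  }

  public static void main(String[] args) {
    KnownKeys known = new KnownKeys();
    for (int i = 0; i < 10; i++) {
      known.recordWritten("key-" + i); // pretest phase
    }
    System.out.println("existing: " + known.pickExisting());
    System.out.println("missing:  " + known.pickMissing());
  }
}
```

With such a registry the read test can choose the ratio of existing to nonexistent keys deliberately instead of discovering missing keys by accident.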

kaijchen · 2022-10-20T02:19:50Z

There are failures in CI, please make sure CI passes before merging @kerneltime.

…d which meets specific test requirements (#3754)" This reverts commit e45f9b8.

adoroszlai · 2022-10-20T02:31:54Z

Sorry, I had to revert this, because checkstyle and findbugs failures affect all other PRs. Please fix the failures and open a new PR.

DaveTeng0 · 2022-10-20T03:04:53Z

> > Code can continue testing and add a metric at the end for successful reads vs. not-found reads. It is a valid test to look at the performance of OM when reporting that a key does not exist. Object stores can be bombarded with nonexistent keys, and the performance of the underlying storage when reporting keys that don't exist is important. LSM tree-based storage has to scan all levels before reporting a key as not found and in some ways represents the worst-case performance at scale.

> Agree, reading nonexistent keys is a valid test case and the tool should support it deterministically. To do that, it has to know (on its own) which keys exist and which don't. Warp does that by having a pre-test phase in which Warp creates a set of keys (10K or so) and keeps the created keys in memory for the real read test.

> We can also do the same for this tool, by maintaining a set of known keys that can be initialized by a pretest phase and grows with the write test.

Sure!! I'll take a look at how Warp does it and create a Jira for it!

DaveTeng0 · 2022-10-20T03:05:21Z

> There are failures in CI, please make sure CI passes before merging @kerneltime.

Sorry!! I'll take a look!!

DaveTeng0 · 2022-10-20T03:06:22Z

> Sorry, I had to revert this, because checkstyle and findbugs failures affect all other PRs. Please fix the failures and open a new PR.

Sorry!! I'll take a look!! thanks Attila!

kerneltime · 2022-10-20T18:00:17Z

> Sorry, I had to revert this, because checkstyle and findbugs failures affect all other PRs. Please fix the failures and open a new PR.

Thank you @adoroszlai!
I should have checked

DaveTeng0 added 14 commits August 25, 2022 15:39

add Freon mix workload test command

44cb5ba

update

c026a4e

update mix workload test

202e36c

update mix workload Freon command

31188a2

Update mix workload Freon command class

a4383bf

fix compiling error

a16bb53

Add robot test for read/write keys operation

d8ceca7

remove testing Freon class:

8e69bc5

handle read/write key exception

2e8e03c

remove additional LOG variable

24b4e1f

Update mix workload Freon command

95442e8

Freon test ozone mix workload

d3f01ef

Remove debug logs

58c809c

Fix code style

68ee21a

Update robot test

f7f1dd6

DaveTeng0 changed the title ~~HDDS 7199: Implement new mix workload Read/Write Freon command which meets specific test requirements~~ HDDS-7199: Implement new mix workload Read/Write Freon command which meets specific test requirements Sep 14, 2022

sodonnel changed the title ~~HDDS-7199: Implement new mix workload Read/Write Freon command which meets specific test requirements~~ HDDS-7199. Implement new mix workload Read/Write Freon command which meets specific test requirements Sep 15, 2022

DaveTeng0 added 3 commits September 16, 2022 05:35

Add multi-clients support

244e855

Fix checkstyle error

25d6057

Add name for range keys generator

7cc0685

set md5 as default algorithm to calculate unsorted key name; disable …

4651529

…block-token generation

duongkame reviewed Sep 20, 2022


DaveTeng0 added 2 commits September 20, 2022 16:30

Comment out unused block-token import; ignore block-token unit tests

102cb3c

Update CLI option name;Use enum for read/write task-type;use proxy to…

0d259f9

… get key details

duongkame reviewed Sep 23, 2022


DaveTeng0 added 2 commits September 26, 2022 09:21

Revert distabled block-token generation

0db071e

Remove debug message

552946f

DaveTeng0 commented Sep 26, 2022
