HDDS-3925. SCM Pipeline DB should directly use UUID bytes for key rather than rely on proto serialization for key. #1197
Conversation
…es via the iterator.
Thank you for working on this, pifta. I have verified it works using docker based testing. Can we add a unit test to verify that removeFromDb actually removes the entry? I am OK with adding it through a follow-up JIRA.
…actions with the underlying RocksIterator and RocksDBTable.
Hi @avijayanhwx, thank you for the review. I have pushed the requested test, and a bit more. In the end, I added tests to verify the behaviour and interactions of RDBStoreIterator with the underlying RocksIterator and the RocksDBTable. I hope this sufficiently addresses the test request; let me know if you had something different in mind. As the TypedTable.TypedTableIterator class purely delegates to the raw RDBStoreIterator, I think it does not require many tests.
avijayanhwx
left a comment
LGTM +1. I will merge it after a clean CI.
hadoop-hdds/framework/src/test/java/org/apache/hadoop/hdds/utils/db/TestRDBStoreIterator.java
/retest
Thank you for the review @avijayanhwx. I hadn't seen the CI checks running, so I pushed the branch once more; let's see. All the failures seem irrelevant, especially since there was already a clean build before I added the new test.
Thank you for fixing this @fapifta. I have merged your patch.
…her than rely on proto serialization for key. (apache#1197)
What changes were proposed in this pull request?
As we recently learned, protobuf serialization is not guaranteed to produce the same byte array for the same message, and therefore the toByteArray method's return value should not be used for byte-array based comparison of protobuf serializations.
In SCM's RocksDB, we use the PipelineID as the key in the Pipeline table. The key is created from the byte array representation of the protobuf serialization of the PipelineID, which can lead to mismatches whenever we access the table by key; at the moment this can prevent us from deleting a Pipeline from the table if its byte array representation changes.
To avoid this situation, the way we create the table key from the PipelineID needs to change. This is one part of this PR, covered by the PipelineIDCodec related changes.
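A minimal sketch of a UUID-bytes based key codec, as the description suggests. The class and method names, and the big-endian most-significant-bits-first layout, are assumptions for illustration; the actual PipelineIDCodec in the patch may differ:

```java
import java.nio.ByteBuffer;
import java.util.UUID;

public final class PipelineIdKeySketch {

  // Encode a UUID as a fixed 16-byte key: most significant bits first.
  // Unlike a proto serialization, this mapping is deterministic.
  static byte[] toByteArray(UUID uuid) {
    return ByteBuffer.allocate(16)
        .putLong(uuid.getMostSignificantBits())
        .putLong(uuid.getLeastSignificantBits())
        .array();
  }

  // Decode the 16-byte key back into the same UUID.
  static UUID fromByteArray(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    return new UUID(buf.getLong(), buf.getLong());
  }
}
```

With such a codec, two encodings of the same PipelineID are always byte-for-byte equal, so RocksDB key lookups and deletes behave as expected.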
The other part is to clean up the old keys from the DB. Once the key serialization changes, the stored byte array for a given PipelineID will no longer match the one used in the delete call, so SCM would not be able to close and remove those Pipelines based on their ID. SCM could then end up with pipelines that are never cleaned up from the DB and that pollute the in-memory structures at startup until they are considered invalid.
To avoid this, SCMPipelineManager now checks whether deserializing the key yields the same PipelineID as the one stored in the value, which is a Pipeline object. If the two differ, we remove the old entry from the table and re-add it with the new key, so the pipelines are preserved and can later be deleted from RocksDB by SCM when appropriate.
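The migration pass described above can be sketched with an in-memory map standing in for the RocksDB table. The class name, the "proto:" old-key format, and the newKey layout are made up for illustration; this is not the actual SCMPipelineManager code:

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public final class KeyMigrationSketch {

  // New deterministic key: 16 raw UUID bytes, most significant bits first.
  static byte[] newKey(UUID id) {
    return ByteBuffer.allocate(16)
        .putLong(id.getMostSignificantBits())
        .putLong(id.getLeastSignificantBits())
        .array();
  }

  public static void main(String[] args) {
    // Stand-in for the pipeline table: key bytes -> pipeline ID (the "value").
    Map<ByteBuffer, UUID> table = new HashMap<>();
    UUID id = UUID.randomUUID();

    // An entry written with an old, proto-derived key that does not
    // match what newKey(id) would produce today.
    byte[] oldKey = ("proto:" + id).getBytes();
    table.put(ByteBuffer.wrap(oldKey), id);

    // Migration pass: if the key does not match the ID stored in the
    // value, drop the old entry and re-insert it under the new key.
    Map<ByteBuffer, UUID> migrated = new HashMap<>();
    for (Map.Entry<ByteBuffer, UUID> e : table.entrySet()) {
      ByteBuffer expected = ByteBuffer.wrap(newKey(e.getValue()));
      if (e.getKey().equals(expected)) {
        migrated.put(e.getKey(), e.getValue());   // already in new format
      } else {
        migrated.put(expected, e.getValue());     // rewrite under new key
      }
    }

    System.out.println(migrated.containsKey(ByteBuffer.wrap(newKey(id))));  // true
  }
}
```

After the pass, every entry is addressable (and deletable) via the deterministic key, while no pipeline data is lost.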
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-3925
How was this patch tested?
JUnit tests were added for the new code and to verify that key migration happens properly.