-
Notifications
You must be signed in to change notification settings - Fork 593
HDDS-5073. Use ReplicationConfig on client side #2136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
sodonnel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. This follows on nicely from #2096.
| try { | ||
| factor = ReplicationFactor.valueOf(Integer.parseInt(factorString)); | ||
| } catch (NumberFormatException ex) { | ||
| factor = ReplicationFactor.valueOf(factorString); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we throw an exception explicitly here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are chances that both of these calls will throw errors? Whats the expected type of the factorString ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general the expected values are ONE and THREE but this code adds support to parse ozone.replication which can be 1 and 3 today. Long-term it will enable to use 3:2 in the ozone.replication (but we also need to configure EC as default replicationType)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we throw an exception explicitly here?
Currently, IllegalArgumentException is thrown with the message:
Exception in thread "main" java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hdds.protocol.proto.HddsProtos.ReplicationFactor.asd
I think the type (IllegalArgumentException) is fine, but I modified the error message with throwing explicit exception:
Exception in thread "main" java.lang.IllegalArgumentException: Invalid RatisReplicationFactor 'xxx'. Please use ONE or THREE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update. At first, I thought ReplicationFactor is only for legacy configs and won't be used for EC schemas.
|
I saw there are some classes and tests still using the legacy replicationType and replicationFactor, for example, ObjectEndpiont.put(). |
|
|
||
| private void verifyNodeType(NodeType type) | ||
| throws InconsistentStorageStateException { | ||
| throws |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unintended changes ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, reverting it. Thanks for the notification.
| } | ||
| } | ||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unintended changes ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted, thanks.
| try { | ||
| factor = ReplicationFactor.valueOf(Integer.parseInt(factorString)); | ||
| } catch (NumberFormatException ex) { | ||
| factor = ReplicationFactor.valueOf(factorString); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are chances that both of these calls will throw errors? Whats the expected type of the factorString ?
| .get(OZONE_REPLICATION_TYPE, OZONE_REPLICATION_TYPE_DEFAULT)); | ||
| } | ||
|
|
||
| if (replication == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why, do we want to convert replication factor to a string ? What are the expected strings which we expect after EC implementation ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question.
I think we can start with d:p pattern where d is the number of data parts and p is the number of the parity part. But we can also extend it later with codec. At first, we can support 3:2 and 6:4 and 10:3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we want this config to be a string ? can cant we expose this as known set of enums ? cc @umamaheswararao @arp7
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question (thanks for it), and I think there are multiple parts.
-
This is only for
ozone shcommands not for s3 (where we have storage classes) or for o3fs/ofs (where we plan to configure it cluster wide or bucket level. -
Originally I had the same idea.
ReplicationConfigtoday can have multiple parameters. (For example3:2can be defined asnew ECReplicationConfig(3,2)). I thought that it might be better to accept only enums (or pre-defined strings) which are further defined by the administrators (or hard-coded by the developers):
For example:
| replication type | enum / group name | replication config |
|---|---|---|
| RATIS | ONE | `new RatisReplicationConfig(ONE) |
| RATIS | THREE | `new RatisReplicationConfig(THREE) |
| STANDALONE | ONE | `new StandaloneReplicationConfig(ONE) |
| EC | 3:2 |
new EcReplicationConfig(3:2) |
| EC | 6:4 |
new EcReplicationConfig(6:4) |
| EC | 10:3 |
new EcReplicationConfig(10:3) |
But when we add the codec to ECReplicationConfig (RS vs XOR) it will create more and more entries in this table.
- I think (at least at long term) administrators should have the flexibility to define new groups (shouldn't be everything hardcoded in the code)
- I think it's an additional complexity to manage all of these mappings.
As ozone sh is a low level cli tool, it seems to be easier to avoid this grouping and just enable to directly configure all possible parameters (especially as there is no limit on the SCM/OM side, we can handle different configuration without any problem)
- Compatibility
I also felt that this approach is a more natural extension to the current scheme. We can introduce a new abstraction level (with new cli parameters like --storage-type) where we accept names instead of low level parameters:
| name | replication type | replication config |
|---|---|---|
| REDUCED | RATIS | `new RatisReplicationConfig(ONE) |
| DEFAULT | RATIS | |
| STANDALONE | `new StandaloneReplicationConfig(ONE) | |
| EC_3_2 | EC | new EcReplicationConfig(3:2) |
But I am not sure if we need this abstraction level, especially at the beginning of the feature branch.
- Enum vs string
When you asked about enum, we can talk about two parts: do we need pre-defined groups AND/OR do we need to store in enum the groups?
I learned with the port changes that enum is very dangerous. We introduced a new service on DN and added a new enum value for the service discovery API and introduced a lot of backward compatibility problem (new enum values couldn't be deserialized with old clients). String seems to be more flexible (we can validate it after protobuf conversion).
This example doesn't fully apply here, as it's a full client-only change. (on protobuf this string is not sent), but I learned that enums (in some cases) can be harder
TLDR: This is why I think the string based approach is better here, but these are somewhat subjective points, please add your opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel keeping values as strings opens up the whole pandora box of validations to do.
Is there a list of terminal policies this config will translate to ? I mean with client to server communication, do we plan to use string to pass the code info ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mukul1987 Do you have any more questions or concerns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not comfortable with the idea of an open string-based config. ENUMS are not dangerous every time when we transfer them over the network.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the idea of an open string based config either. ENUMs are not inherently dangerous. Strings will be far worse.
The bigger issue is a lack of discipline in thinking through compatibility concerns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ENUMS are not dangerous every time when we transfer them over the network.
Thanks the comment @mukul1987 and @arp7
As we discussed during the EC meeting it's NOT about protobuf changes. It's a client side only change.
In general: we need multiple type of replication configs. For example:
new RatisReplicationConfig(ONE)
new RatisReplicationConfig(THREE)
new StandaloneReplicationConfig(ONE)
new StandaloneReplicationConfig(THREE)
new ECReplicationConfig(2,1,XOR)
new ECReplicationConfig(4,3,RS)
new ECReplicationConfig(6,4,RS)
new ECReplicationConfig(10,1,RS)
All of these configs are strongly typed both in Java code (OM/SCM) and protobuf. ONE/THREE are enums as before.
The only problem here is that Enum types / replication definitions depend on the replication config type. EC may need different enum values. First we need to choose the replication config based on the type (EC/RATIS/STANDLAONE) and after that we can ask the selected ReplicationConfig to parse the string to Enum. This is what we do in this patch.
It's a separated discussion if we need to validate the input string in ECReplicationConfig based on predefined enums or based on smart validation rules. Both can work, and please add your opinion to https://issues.apache.org/jira/browse/HDDS-5247
The bigger issue is a lack of discipline in thinking through compatibility concerns.
Thanks the feedback @arp7
Can you please share your compatibility concerns?
This is a client side change and as you can see none of the smoketests are changed, so we can assume it's fully compatible. ozone.replication is also changed, but it accepts both integer and string existing values work as before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 from me also with client-only change (i.e. no strings on the wire)
|
ping @mukul1987, @umamaheswararao |
|
@mukul1987 @umamaheswararao @arp7: Can we merge it? Do you have any more concerns? |
mukul1987
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
THe current patch looks ok, lets do a followup patch for the validation.
|
Thanks the review @sodonnel, @symious @mukul1987 @arp7 I am merging it now... |
…ing-upgrade-master-merge * upstream/master: (76 commits) HDDS-5280. Make XceiverClientManager creation when necessary in ContainerOperationClient (apache#2289) HDDS-5272. Make ozonefs.robot execution repeatable (apache#2280) HDDS-5123. Use the pre-created apache/ozone-testkrb5 image during secure acceptance tests (apache#2165) HDDS-4993. Add guardrail for reserved buffer size when DN reads a chunk (apache#2058) HDDS-4936. Change ozone groupId from org.apache.hadoop to org.apache.ozone (apache#2018) HDDS-4043. allow deletion from Trash directory without -skipTrash option (apache#2110) HDDS-4927. Determine over and under utilized datanodes in Container Balancer. (apache#2230) HDDS-5273. Handle unsecure cluster convert to secure cluster for SCM. (apache#2281) HDDS-5158. Add documentation for SCM HA Security. (apache#2205) HDDS-5275. Datanode Report Publisher publishes one extra report after DN shutdown (apache#2283) HDDS-5241. SCM UI should have leader/follower and Primordial SCM information (apache#2260) HDDS-5219. Limit number of bad volumes by dfs.datanode.failed.volumes.tolerated. (apache#2243) HDDS-5252. PipelinePlacementPolicy filter out datanodes with not enough space. (apache#2271) HDDS-5191. Increase default pvc storage size (apache#2219) HDDS-5073. Use ReplicationConfig on client side (apache#2136) HDDS-5250. Build integration tests with Maven cache (apache#2269) HDDS-5236. Require block token for more operations (apache#2254) HDDS-5266 Misspelt words in S3MultipartUploadCommitPartRequest.java line 202 (apache#2279) HDDS-5249. Race Condition between Full and Incremental Container Reports (apache#2268) HDDS-5142. Make generic streaming client/service for container re-replication, data read, scm/om snapshot download (apache#2256) ... Conflicts: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/protocol/StorageContainerLocationProtocol.java hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/protocolPB/StorageContainerLocationProtocolClientSideTranslatorPB.java hadoop-hdds/interface-admin/src/main/proto/ScmAdminProtocol.proto hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/MockNodeManager.java hadoop-ozone/dist/src/main/compose/testlib.sh hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestStorageContainerManager.java hadoop-ozone/interface-client/src/main/proto/OmClientProtocol.proto hadoop-ozone/ozone-manager/pom.xml hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/utils/OzoneManagerRatisUtils.java hadoop-ozone/s3gateway/pom.xml
What changes were proposed in this pull request?
HDDS-5011 introduced new ReplicationConfig with multiple implementations (Ratis/Standalone).
To make ozone sh client EC compatible we can use ReplicationConfig on the client side too: --replication parameter can be parsed from the string based on the ReplicationConfig. Today it can be THREE or ONE, but later ECReplicationConfig can support more sophisticated schemes (like 3:2)
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-5073
How was this patch tested?
With local docker-compose cluster + CI
No logic has been changed only the internal representation of the data.