Skip to content

Conversation

@elek
Copy link
Member

@elek elek commented Apr 9, 2021

What changes were proposed in this pull request?

HDDS-5011 introduced new ReplicationConfig with multiple implementations (Ratis/Standalone).

To make ozone sh client EC compatible we can use ReplicationConfig on the client side too: --replication parameter can be parsed from the string based on the ReplicationConfig. Today it can be THREE or ONE, but later ECReplicationConfig can support more sophisticated schemes (like 3:2)

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-5073

How was this patch tested?

With local docker-compose cluster + CI

No logic has been changed only the internal representation of the data.

@elek elek requested review from sodonnel and umamaheswararao April 9, 2021 09:09
Copy link
Contributor

@sodonnel sodonnel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. This follows on nicely from #2096.

try {
factor = ReplicationFactor.valueOf(Integer.parseInt(factorString));
} catch (NumberFormatException ex) {
factor = ReplicationFactor.valueOf(factorString);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we throw an exception explicitly here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are chances that both of these calls will throw errors? Whats the expected type of the factorString ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general the expected values are ONE and THREE but this code adds support to parse ozone.replication which can be 1 and 3 today. Long-term it will enable to use 3:2 in the ozone.replication (but we also need to configure EC as default replicationType)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we throw an exception explicitly here?

Currently, IllegalArgumentException is thrown with the message:

Exception in thread "main" java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hdds.protocol.proto.HddsProtos.ReplicationFactor.asd

I think the type (IllegalArgumentException) is fine, but I modified the error message with throwing explicit exception:

Exception in thread "main" java.lang.IllegalArgumentException: Invalid RatisReplicationFactor 'xxx'. Please use ONE or THREE

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. At first, I thought ReplicationFactor is only for legacy configs and won't be used for EC schemas.

@symious
Copy link
Contributor

symious commented Apr 24, 2021

I saw there are some classes and tests still using the legacy replicationType and replicationFactor, for example, ObjectEndpiont.put().
The source branch is named "client-only" and if the work of updating other classes hasn't started, maybe I can help with the work?


private void verifyNodeType(NodeType type)
throws InconsistentStorageStateException {
throws
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unintended changes ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, reverting it. Thanks for the notification.

}
}


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unintended changes ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted, thanks.

try {
factor = ReplicationFactor.valueOf(Integer.parseInt(factorString));
} catch (NumberFormatException ex) {
factor = ReplicationFactor.valueOf(factorString);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are chances that both of these calls will throw errors? Whats the expected type of the factorString ?

.get(OZONE_REPLICATION_TYPE, OZONE_REPLICATION_TYPE_DEFAULT));
}

if (replication == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why, do we want to convert replication factor to a string ? What are the expected strings which we expect after EC implementation ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question.

I think we can start with d:p pattern where d is the number of data parts and p is the number of the parity part. But we can also extend it later with codec. At first, we can support 3:2 and 6:4 and 10:3

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want this config to be a string ? can cant we expose this as known set of enums ? cc @umamaheswararao @arp7

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question (thanks for it), and I think there are multiple parts.

  1. This is only for ozone sh commands not for s3 (where we have storage classes) or for o3fs/ofs (where we plan to configure it cluster wide or bucket level.

  2. Originally I had the same idea. ReplicationConfig today can have multiple parameters. (For example 3:2 can be defined as new ECReplicationConfig(3,2)). I thought that it might be better to accept only enums (or pre-defined strings) which are further defined by the administrators (or hard-coded by the developers):

For example:

replication type enum / group name replication config
RATIS ONE `new RatisReplicationConfig(ONE)
RATIS THREE `new RatisReplicationConfig(THREE)
STANDALONE ONE `new StandaloneReplicationConfig(ONE)
EC 3:2 new EcReplicationConfig(3:2)
EC 6:4 new EcReplicationConfig(6:4)
EC 10:3 new EcReplicationConfig(10:3)

But when we add the codec to ECReplicationConfig (RS vs XOR) it will create more and more entries in this table.

  1. I think (at least at long term) administrators should have the flexibility to define new groups (shouldn't be everything hardcoded in the code)
  2. I think it's an additional complexity to manage all of these mappings.

As ozone sh is a low level cli tool, it seems to be easier to avoid this grouping and just enable to directly configure all possible parameters (especially as there is no limit on the SCM/OM side, we can handle different configuration without any problem)

  1. Compatibility

I also felt that this approach is a more natural extension to the current scheme. We can introduce a new abstraction level (with new cli parameters like --storage-type) where we accept names instead of low level parameters:

name replication type replication config
REDUCED RATIS `new RatisReplicationConfig(ONE)
DEFAULT RATIS
STANDALONE `new StandaloneReplicationConfig(ONE)
EC_3_2 EC new EcReplicationConfig(3:2)

But I am not sure if we need this abstraction level, especially at the beginning of the feature branch.

  1. Enum vs string

When you asked about enum, we can talk about two parts: do we need pre-defined groups AND/OR do we need to store in enum the groups?

I learned with the port changes that enum is very dangerous. We introduced a new service on DN and added a new enum value for the service discovery API and introduced a lot of backward compatibility problem (new enum values couldn't be deserialized with old clients). String seems to be more flexible (we can validate it after protobuf conversion).

This example doesn't fully apply here, as it's a full client-only change. (on protobuf this string is not sent), but I learned that enums (in some cases) can be harder

TLDR: This is why I think the string based approach is better here, but these are somewhat subjective points, please add your opinion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel keeping values as strings opens up the whole pandora box of validations to do.

Is there a list of terminal policies this config will translate to ? I mean with client to server communication, do we plan to use string to pass the code info ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mukul1987 Do you have any more questions or concerns?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not comfortable with the idea of an open string-based config. ENUMS are not dangerous every time when we transfer them over the network.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the idea of an open string based config either. ENUMs are not inherently dangerous. Strings will be far worse.

The bigger issue is a lack of discipline in thinking through compatibility concerns.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ENUMS are not dangerous every time when we transfer them over the network.

Thanks the comment @mukul1987 and @arp7

As we discussed during the EC meeting it's NOT about protobuf changes. It's a client side only change.

In general: we need multiple type of replication configs. For example:

new RatisReplicationConfig(ONE)
new RatisReplicationConfig(THREE)
new StandaloneReplicationConfig(ONE)
new StandaloneReplicationConfig(THREE)
new ECReplicationConfig(2,1,XOR)
new ECReplicationConfig(4,3,RS)
new ECReplicationConfig(6,4,RS)
new ECReplicationConfig(10,1,RS)

All of these configs are strongly typed both in Java code (OM/SCM) and protobuf. ONE/THREE are enums as before.

The only problem here is that Enum types / replication definitions depend on the replication config type. EC may need different enum values. First we need to choose the replication config based on the type (EC/RATIS/STANDLAONE) and after that we can ask the selected ReplicationConfig to parse the string to Enum. This is what we do in this patch.

It's a separated discussion if we need to validate the input string in ECReplicationConfig based on predefined enums or based on smart validation rules. Both can work, and please add your opinion to https://issues.apache.org/jira/browse/HDDS-5247

The bigger issue is a lack of discipline in thinking through compatibility concerns.

Thanks the feedback @arp7

Can you please share your compatibility concerns?

This is a client side change and as you can see none of the smoketests are changed, so we can assume it's fully compatible. ozone.replication is also changed, but it accepts both integer and string existing values work as before.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 from me also with client-only change (i.e. no strings on the wire)

@elek
Copy link
Member Author

elek commented May 17, 2021

ping @mukul1987, @umamaheswararao

@elek
Copy link
Member Author

elek commented May 21, 2021

@mukul1987 @umamaheswararao @arp7: Can we merge it? Do you have any more concerns?

Copy link
Contributor

@mukul1987 mukul1987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THe current patch looks ok, lets do a followup patch for the validation.

@elek
Copy link
Member Author

elek commented May 25, 2021

Thanks the review @sodonnel, @symious @mukul1987 @arp7

I am merging it now...

@elek elek merged commit 0b4779c into apache:master May 25, 2021
errose28 added a commit to errose28/ozone that referenced this pull request Jun 1, 2021
…ing-upgrade-master-merge

* upstream/master: (76 commits)
  HDDS-5280. Make XceiverClientManager creation when necessary in ContainerOperationClient (apache#2289)
  HDDS-5272. Make ozonefs.robot execution repeatable (apache#2280)
  HDDS-5123. Use the pre-created apache/ozone-testkrb5 image during secure acceptance tests (apache#2165)
  HDDS-4993. Add guardrail for reserved buffer size when DN reads a chunk (apache#2058)
  HDDS-4936. Change ozone groupId from org.apache.hadoop to org.apache.ozone (apache#2018)
  HDDS-4043. allow deletion from Trash directory without -skipTrash option (apache#2110)
  HDDS-4927. Determine over and under utilized datanodes in Container Balancer. (apache#2230)
  HDDS-5273. Handle unsecure cluster convert to secure cluster for SCM. (apache#2281)
  HDDS-5158. Add documentation for SCM HA Security. (apache#2205)
  HDDS-5275. Datanode Report Publisher publishes one extra report after DN shutdown (apache#2283)
  HDDS-5241. SCM UI should have leader/follower and Primordial SCM information (apache#2260)
  HDDS-5219. Limit number of bad volumes by dfs.datanode.failed.volumes.tolerated. (apache#2243)
  HDDS-5252. PipelinePlacementPolicy filter out datanodes with not enough space. (apache#2271)
  HDDS-5191. Increase default pvc storage size (apache#2219)
  HDDS-5073. Use ReplicationConfig on client side  (apache#2136)
  HDDS-5250. Build integration tests with Maven cache (apache#2269)
  HDDS-5236. Require block token for more operations (apache#2254)
  HDDS-5266 Misspelt words in S3MultipartUploadCommitPartRequest.java line 202 (apache#2279)
  HDDS-5249. Race Condition between Full and Incremental Container Reports (apache#2268)
  HDDS-5142. Make generic streaming client/service for container re-replication, data read, scm/om snapshot download (apache#2256)
  ...

Conflicts:
	hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/protocol/StorageContainerLocationProtocol.java
	hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/protocolPB/StorageContainerLocationProtocolClientSideTranslatorPB.java
	hadoop-hdds/interface-admin/src/main/proto/ScmAdminProtocol.proto
	hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
	hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/MockNodeManager.java
	hadoop-ozone/dist/src/main/compose/testlib.sh
	hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestStorageContainerManager.java
	hadoop-ozone/interface-client/src/main/proto/OmClientProtocol.proto
	hadoop-ozone/ozone-manager/pom.xml
	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/utils/OzoneManagerRatisUtils.java
	hadoop-ozone/s3gateway/pom.xml
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants