@xBis7 xBis7 commented Oct 14, 2022

What changes were proposed in this pull request?

To make the CLI and the ozone admin datanode subcommands more consistent, this PR adds a hostname parameter to the list info command and an address parameter, which accepts either an IP or a hostname, to the usage info command.

There is an HDFS flag, dfs.datanode.use.datanode.hostname, checked by the SCMNodeManager during startup. If it is true, we can get the datanode info by providing the hostname with the --ip parameter, while the IP is no longer available. By default the flag is false, in which case the IP is available but not the hostname.

If we add the --address parameter, then we can stop using the flag and clean up the code.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7329

How was this patch tested?

Modified the existing tests and they are all passing. Also added some unit tests for the cases where the hostname or IP changes. Tested manually in a docker cluster and in Kubernetes.

@adoroszlai adoroszlai left a comment

Thanks @xBis7 for working on this.

@adoroszlai adoroszlai requested a review from kerneltime October 15, 2022 19:31
@adoroszlai

@sokui @Xushaohong please review

sokui commented Oct 15, 2022

Hi @xBis7, we should not deprecate dfs.datanode.use.datanode.hostname or force it to always be false. This flag is also used by users who deploy Ozone to environments where IPs change, such as k8s. In those environments, the IPs of the Ozone components can change, so we should use hostnames for communication.

The following PR makes the datanodes adapt to k8s environments by letting them use the hostname. Please check it out: #3186. Thanks.

xBis7 commented Oct 16, 2022

Hi @sokui, I wasn't aware of this. I was told that this flag is not used by the community and that we can deprecate it. I looked at #3186 and you are right we need to keep the existing code as it is.

The goal here is to have a consistent CLI, so we keep the code as it is but we won't expose that behavior to the user. We need the --ip to accept only an IP address and add the --hostname parameter for accepting a hostname. We could create two new maps, one for mapping IP to UUID and one for mapping hostname to UUID and we could use those maps for accessing datanode info from the CLI. @adoroszlai @sokui What do you think about this approach?
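
A minimal sketch of the two-map idea, for illustration only. The class `DatanodeAddressIndex` and its method names are hypothetical and not part of Ozone; the sketch assumes each key maps to a set of UUIDs, since several datanodes may share an address.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical CLI-side index; an illustration, not the SCMNodeManager code. */
final class DatanodeAddressIndex {
  // One map per key type, so an IP lookup can never match a hostname.
  private final Map<String, Set<UUID>> ipToUuid = new ConcurrentHashMap<>();
  private final Map<String, Set<UUID>> hostnameToUuid = new ConcurrentHashMap<>();

  /** Records a datanode under both its IP and its hostname. */
  synchronized void register(String ip, String hostname, UUID uuid) {
    ipToUuid.computeIfAbsent(ip, k -> ConcurrentHashMap.newKeySet()).add(uuid);
    hostnameToUuid.computeIfAbsent(hostname, k -> ConcurrentHashMap.newKeySet()).add(uuid);
  }

  /** Lookup that would back a strict --ip option. */
  Set<UUID> findByIp(String ip) {
    return ipToUuid.getOrDefault(ip, Collections.emptySet());
  }

  /** Lookup that would back a strict --hostname option. */
  Set<UUID> findByHostname(String hostname) {
    return hostnameToUuid.getOrDefault(hostname, Collections.emptySet());
  }
}
```

With this layout, passing a hostname to the IP lookup simply returns an empty set, which is the strict behavior described above.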

@Xushaohong

The goal here is to have a consistent CLI, so we keep the code as it is but we won't expose that behavior to the user. We need the --ip to accept only an IP address and add the --hostname parameter for accepting a hostname. We could create two new maps, one for mapping IP to UUID and one for mapping hostname to UUID and we could use those maps for accessing datanode info from the CLI. @adoroszlai @sokui What do you think about this approach?

This sounds feasible, but one concern is how to make sure the IP - UUID - hostname mapping is always consistent. If the hostname changes at some DN, we may need the retry logic with the IP-UUID mapping and need to update the corresponding hostname.

xBis7 commented Oct 17, 2022

If the hostname changes at some DN, we may need the retry logic with the IP-UUID mapping and need to update the corresponding hostname.

@Xushaohong Can you provide an example in the code of how we are currently handling such a case?

xBis7 commented Oct 17, 2022

Moved everything back to how it was and added these two methods used only by the CLI. This achieves the wanted behavior and I think it doesn't affect the system in the cases mentioned above. Let me know what you all think.

@kerneltime

I think we need to revisit this PR. Ideally, the listing of nodes should work the same for IP and hostname and should not require opening up specific APIs in SCM protocol.

xBis7 commented Oct 18, 2022

@kerneltime Let me explain the purpose of this PR in more detail. The issue is that you can have either the IP or the hostname, and use only one of them to get the datanode info.

from master, with the default behavior

bash-4.2$ ozone admin datanode usageinfo --ip=172.21.0.5
Usage Information (1 Datanodes)

UUID         : 08c48611-7121-46a5-a047-129f573c78a6 
IP Address   : 172.21.0.5 (ozone_datanode_1.ozone_default) 
Capacity     : 182775984128 B (170.22 GB) 
Total Used   : 168680075264 B (157.10 GB) 
Total Used % : 92.29% 
Ozone Used   : 4096 B (4 KB) 
Ozone Used % : 0.00% 
Remaining    : 14095908864 B (13.13 GB) 
Remaining %  : 7.71% 

bash-4.2$ ozone admin datanode usageinfo --ip=ozone_datanode_1.ozone_default
Usage Information (0 Datanodes)

from master with dfs.datanode.use.datanode.hostname=true

bash-4.2$ ozone admin datanode usageinfo --ip=172.19.0.5
Usage Information (0 Datanodes)

bash-4.2$ ozone admin datanode usageinfo --ip=d416d3fec041
Usage Information (1 Datanodes)

UUID         : 809c1065-2594-4208-bc57-4442b71022e2 
IP Address   : 172.19.0.5 (d416d3fec041) 
Capacity     : 182775984128 B (170.22 GB) 
Total Used   : 168679792640 B (157.10 GB) 
Total Used % : 92.29% 
Ozone Used   : 4096 B (4 KB) 
Ozone Used % : 0.00% 
Remaining    : 14096191488 B (13.13 GB) 
Remaining %  : 7.71% 

We want to add a --hostname parameter and also make the CLI more consistent, meaning that --ip accepts only an IP and the new --hostname accepts only a hostname. If we just add the --hostname parameter, then in the second case above we will have a --hostname that accepts a hostname and an --ip that also accepts only a hostname.

As pointed out to me, we can't deprecate the flag or change the current logic in SCMNodeManager because it's necessary for environments like Kubernetes where the datanode can restart and end up with a new IP address. That's why I created specific APIs for the CLI.

The current code achieves the desired behavior.

from this branch with dfs.datanode.use.datanode.hostname=true

bash-4.2$ ozone admin datanode usageinfo --ip=172.22.0.5
Usage Information (1 Datanodes)

UUID         : 961511cf-38ec-412c-b805-381549ce03cf 
IP Address   : 172.22.0.5 
Hostname     : 16cbb539ec92 
Capacity     : 182775984128 B (170.22 GB) 
Total Used   : 168680542208 B (157.10 GB) 
Total Used % : 92.29% 
Ozone Used   : 4096 B (4 KB) 
Ozone Used % : 0.00% 
Remaining    : 14095441920 B (13.13 GB) 
Remaining %  : 7.71% 

bash-4.2$ ozone admin datanode usageinfo --ip=16cbb539ec92
Usage Information (0 Datanodes)

bash-4.2$ ozone admin datanode usageinfo --hostname=16cbb539ec92
Usage Information (1 Datanodes)

UUID         : 961511cf-38ec-412c-b805-381549ce03cf 
IP Address   : 172.22.0.5 
Hostname     : 16cbb539ec92 
Capacity     : 182775984128 B (170.22 GB) 
Total Used   : 169011408896 B (157.40 GB) 
Total Used % : 92.47% 
Ozone Used   : 4096 B (4 KB) 
Ozone Used % : 0.00% 
Remaining    : 13764575232 B (12.82 GB) 
Remaining %  : 7.53% 

@kerneltime

Let's discuss this in the next community sync @neils-dev

xBis7 commented Oct 24, 2022

@sokui @Xushaohong If this Map holds both IPs and Hostnames at all times, will this be an issue? In case we are in an ip changing environment and the ip doesn't work, hostname will still be available for the user. What do you think? Also, if you could provide a way for me to test it.

Xushaohong commented Oct 25, 2022

If this Map holds both IPs and Hostnames at all times, will this be an issue? In case we are in an ip changing environment and the ip doesn't work, hostname will still be available for the user. What do you think? Also, if you could provide a way for me to test it.

Currently, this map works fine in an IP-changing environment, since we use either the IP or the hostname, never both at the same time. This depends on the config dfs.datanode.use.datanode.hostname.
If the map holds both IPs and hostnames at all times, then, since it maps hostname / IP to the DNs, the same UUID will occur in the value sets of different keys, which could cause a sequence of unknown errors. This part of the structure would need to be designed if you want to do so.

A simple test environment can be found in the kubernetes directory,
e.g. ozone/hadoop-ozone/dist/target/ozone-1.3.0-SNAPSHOT/kubernetes
There is a README.md that explains how to set up an Ozone k8s cluster.
Kill/ Restart one DN pod could change the IP.

Copy link
Contributor

Please also be very careful. As I recall, in some of my tests datanodeDetails.getHostName() may return an IP instead of a hostname. Please test that your approach has no bugs in this case as well.

Copy link
Contributor

@sokui can you elaborate when you see an IP address being returned for hostname?

@sokui sokui Jan 12, 2023

@kerneltime it has been so long that I cannot remember. My testing environment was Kubernetes, and I think I saw this happen a couple of times when we deployed Ozone, but I am not sure what the exact conditions were.

@neils-dev

Thanks @sokui for your comments. On the errors you mention that may happen when using the map for both ip and hostnames mapping to UUIDs of DNs,

If the map holds both IPs and hostnames at all times, then, since it maps hostname / IP to the DNs, the same UUID will occur in the value sets of different keys, which could cause a sequence of unknown errors.

Can you give some detail on this? Where this can cause problems?

sokui commented Oct 26, 2022

Thanks @sokui for your comments. On the errors you mention that may happen when using the map for both ip and hostnames mapping to UUIDs of DNs,

If the map holds both IPs and hostnames at all times, then, since it maps hostname / IP to the DNs, the same UUID will occur in the value sets of different keys, which could cause a sequence of unknown errors.

Can you give some detail on this? Where this can cause problems?

@neils-dev, I did not mean that your solution will have problems in this case. I am just trying to remind you to be careful about it. When a variable/method is named hostname, people may assume it holds just the hostname and not an IP, but in some cases that is not true. So far in this PR, I do not think this is an issue.

xBis7 commented Oct 26, 2022

@Xushaohong

Kill/ Restart one DN pod could change the IP.

This works fine with the current approach. Any particular scenarios or test cases? Commands to execute? I'm referring to anything more complex than getting the datanode info.

@kerneltime

This looks much better, will complete my review this week.

@Xushaohong

Kill/ Restart one DN pod could change the IP.

This works fine with the current approach. Any particular scenarios or test cases? Commands to execute? I'm referring to anything more complex than getting the datanode info.

Maybe you could try on the metal environment and adjust the hostname manually which is the case for hostname change.
If the patch is able to handle the IP-changing and hostname-changing cases, that would be great~

sokui commented Oct 27, 2022

Kill/ Restart one DN pod could change the IP.

This works fine with the current approach. Any particular scenarios or test cases? Commands to execute? I'm referring to anything more complex than getting the datanode info.

Maybe you could try on the metal environment and adjust the hostname manually which is the case for hostname change. If the patch is able to handle the IP-changing and hostname-changing cases, that would be great~

For k8s, if you deploy the datanodes as a Deployment instead of a StatefulSet, then when you delete a node, it will restart with a different hostname, I think.

@kerneltime

Please add an end-to-end robot test, which should help close the loop for the feature.

@neils-dev neils-dev added the gr label Jan 26, 2023
@adoroszlai adoroszlai marked this pull request as draft November 21, 2023 07:23
@adoroszlai

@xBis7 I plan to finish this, are you OK with that?

xBis7 commented Nov 21, 2023

@adoroszlai Please, feel free to take this over.

As far as I can recall, these were the unresolved issues that blocked this ticket

  • We need to use hostnames with Kubernetes because it keeps changing Node IPs dynamically
  • It would be nice to migrate Ozone entirely to using hostnames
  • Ozone is too dependent on IPs for topology and therefore moving entirely to hostnames is impossible for now
  • Based on this flag dfs.datanode.use.datanode.hostname we can either have IPs available or hostnames. SCMNodeManager needs refactoring to have both of them available

@adoroszlai

As far as I can recall, these were the unresolved issues that blocked this ticket

Thanks @xBis7, that's very useful for the future. However, here I don't intend to address all hostname-related problems, just trying to wrap it up considering your latest comment:

Went back to using the flag for hostnames and now everything works and we still have the desired outcome, that the ip and hostname are both available at the client.

@adoroszlai adoroszlai marked this pull request as ready for review November 22, 2023 10:32
@adoroszlai adoroszlai requested review from dombizita, kaijchen, sadanand48, siddhantsangwan and sokui and removed request for sokui November 23, 2023 14:09
DatanodeDetails dn = nodeStateManager.getNode(datanodeDetails);
Preconditions.checkState(dn.getParent() != null);
addToDnsToUuidMap(dnsName, datanodeDetails.getUuid());
addToDnsToUuidMap(ipAddress, uuid);
Copy link
Contributor

Since these are synchronized methods, I think we should do both updates (ip & hostname) in a single method call.
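
A rough sketch of what a single-call update could look like. The class `AddressToUuidMap` and its methods are hypothetical names used only for illustration, not the code in this patch; the sketch assumes one shared map keyed by either IP or hostname.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.UUID;

/** Hypothetical stand-in for the address-to-UUID bookkeeping. */
final class AddressToUuidMap {
  private final Map<String, Set<UUID>> map = new HashMap<>();

  /** Registers the IP and the hostname under one lock acquisition. */
  synchronized void add(String ipAddress, String hostName, UUID uuid) {
    put(ipAddress, uuid);
    put(hostName, uuid);
  }

  /** Replaces both keys in one call, so synchronized readers never see a half-updated node. */
  synchronized void update(String oldIp, String newIp,
      String oldHost, String newHost, UUID uuid) {
    replace(oldIp, newIp, uuid);
    replace(oldHost, newHost, uuid);
  }

  /** Returns a copy of the UUIDs registered under the given IP or hostname. */
  synchronized Set<UUID> lookup(String address) {
    Set<UUID> found = map.get(address);
    return found == null ? new HashSet<>() : new HashSet<>(found);
  }

  private void put(String key, UUID uuid) {
    map.computeIfAbsent(key, k -> new HashSet<>()).add(uuid);
  }

  private void replace(String oldKey, String newKey, UUID uuid) {
    if (Objects.equals(oldKey, newKey)) {
      return; // nothing changed for this key
    }
    Set<UUID> old = map.get(oldKey);
    if (old != null) {
      old.remove(uuid);
      if (old.isEmpty()) {
        map.remove(oldKey); // drop stale keys so old addresses stop resolving
      }
    }
    put(newKey, uuid);
  }
}
```

The private helpers are not synchronized themselves; they rely on the public methods holding the lock, so each add or update takes the lock once instead of once per key.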

Copy link
Contributor

Thanks @sadanand48, updated.

}
updateDnsToUuidMap(oldDnsName, dnsName, datanodeDetails.getUuid());
updateDnsToUuidMap(oldIpAddress, ipAddress, uuid);
updateDnsToUuidMap(oldHostName, hostName, uuid);
Copy link
Contributor

Same comment as above

@sadanand48 sadanand48 left a comment

Thanks @adoroszlai for the patch update, LGTM

@adoroszlai

@xBis7 please let me know if you are OK with the latest update to be merged

xBis7 commented Nov 27, 2023

@adoroszlai Thanks for finishing this. I've gone through the changes. LGTM!

@adoroszlai adoroszlai merged commit 8c97e1e into apache:master Nov 27, 2023
@adoroszlai

Thanks @xBis7 for the patch, @kerneltime, @sadanand48, @sokui, @Xushaohong for the review.
