HDFS-17577. Add Support for CreateFlag.NO_LOCAL_WRITE in File Creation to Manage Disk Space and Network Load in Labeled YARN Nodes #6935
base: trunk
Conversation
… to Manage Disk Space and Network Load in Labeled YARN Nodes
💔 -1 overall
This message was automatically generated.
💔 -1 overall
This message was automatically generated.
Thanks @liangyu-1 for your report and PR. What about invoking the following interface and setting the flag to CreateFlag.IGNORE_CLIENT_LOCALITY?
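The flag the maintainer refers to can be passed through the existing `FileSystem.create` overload that accepts an `EnumSet<CreateFlag>`. A minimal sketch of that approach (the path and written bytes are illustrative, and it assumes a reachable HDFS cluster configured via `fs.defaultFS`):

```java
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class NonLocalWriteSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("/tmp/non-local-write-demo.txt"); // illustrative path

    // Request that the client's own locality be ignored when the
    // NameNode chooses the first replica's DataNode.
    EnumSet<CreateFlag> flags = EnumSet.of(
        CreateFlag.CREATE,
        CreateFlag.OVERWRITE,
        CreateFlag.IGNORE_CLIENT_LOCALITY);

    try (FSDataOutputStream out = fs.create(path,
        FsPermission.getFileDefault(), flags,
        conf.getInt("io.file.buffer.size", 4096),
        fs.getDefaultReplication(path),
        fs.getDefaultBlockSize(path),
        null /* no Progressable */)) {
      out.writeBytes("hello"); // illustrative payload
    }
  }
}
```

This works only for applications that call this overload directly; the later discussion in this thread is about engines that do not.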
@Hexiaoqiao thanks for your reply. I think it's a good idea, and I have resubmitted my code so that it invokes that interface and replaces the createFlag.
💔 -1 overall
This message was automatically generated.
💔 -1 overall
This message was automatically generated.
@liangyu-1 Can we add a corresponding unit test for this? We need to fix the checkstyle issue.
hi @slfan1989, I have just added a unit test on the result of file creation. DFSClient has no interface to set the address for a DataNode, so I can only add this unit test to verify that we successfully add the flag. Thanks
💔 -1 overall
This message was automatically generated.
Sorry for the late response; I didn't understand what this PR wants to do or why. When I said "What about to invoke the following interface and set flag to CreateFlag.IGNORE_CLIENT_LOCALITY", I meant that the current implementation already supports skipping locality when writing data to HDFS. Please check again. Thanks.
@Hexiaoqiao Thanks for your reply, I think I have understood your suggestion. You mean that I can invoke that interface and add the flag in that call. But in my scenario I am using Flink's FileSystem API, and I have read the source code of the Flink API: it calls the create function directly without passing any create flags. I think this will also happen in most computation engines, because most engines use that function directly. But with my PR, I can solve the problem by just adding a Hadoop configuration, which is much more convenient.
💔 -1 overall
This message was automatically generated.
🎊 +1 overall
This message was automatically generated.
Got it. But I am sorry to disagree with your opinion. There is a flexible interface; if the upstream system does not invoke it, we should push the upstream system to update. On the other hand, if we configure this as the PR does, it will affect everything that runs with this client, which may not be expected. In one word, I suggest proposing and submitting a PR on the Flink side. Thanks again.
@Hexiaoqiao, this does not only happen in Flink but also in other engines like Spark. If I only submit a PR on the Flink side, the other engines' APIs (like Spark, Spark Structured Streaming) will not be able to use this feature, and we would need to rebuild the whole computation project whenever we choose a new computation engine.
That isn't our concern. We provide an interface to do things; if those engines want to leverage that functionality, they can do it that way. The fact that those engines can't update their code, or that there are multiple clients, doesn't justify changing the Hadoop-side code.
Description of PR
As described in HDFS-17577
I am currently using Apache Flink to write files into Hadoop. The Flink application runs on a labeled YARN queue. During operation, it has been observed that the local disks on these labeled nodes get filled up quickly, and the network load is significantly high. This issue arises because Hadoop prioritizes writing files to the local node first, and the number of these labeled nodes is quite limited.
The current behavior leads to inefficient disk space utilization and high network traffic on these few labeled nodes, which could affect the performance and reliability of the application. As shown in the picture, the host I circled has an average net_bytes_sent rate of 1.2 GB/s while the others are at just 50 MB/s; this imbalance in network and disk usage nearly brought down the whole cluster.

Implementation:
I add a configuration, dfs.client.write.no_local_write, to support CreateFlag.NO_LOCAL_WRITE during the file creation process in Hadoop's file system APIs. This provides flexibility for applications like Flink running in labeled queues to opt for non-local writes when necessary.
How was this patch tested?
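With the proposed change, a client would enable the behavior through configuration alone. A sketch of the hdfs-site.xml (or client-side) entry, assuming the property name proposed in this PR — note this property exists only in this patch, not in released Hadoop:

```xml
<!-- Proposed by HDFS-17577 / PR #6935; not part of released Hadoop. -->
<property>
  <name>dfs.client.write.no_local_write</name>
  <value>true</value>
  <description>
    When true, the DFS client adds CreateFlag.NO_LOCAL_WRITE on file
    creation, so the NameNode avoids placing the first replica on the
    DataNode co-located with the writing client.
  </description>
</property>
```

This is what lets engines such as Flink or Spark pick up the behavior without any code change on their side, which is the crux of the disagreement in the conversation above.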
I rebuilt the whole hadoop-hdfs-client module and then tested it using Flink on the labeled YARN queue. The distribution of disk storage across the nodes in the cluster is more even, and the network load has also improved.
For code changes:
If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?