I ran a benchmark comparing upload performance to Cosmos between AzureDLFile.write() and the write API of the native libhdfs.so. The results show a significant gap: writing the same amount of data to Cosmos takes Azure Data Lake Store more than twice as long as HDFS. I also checked the network throughput: with HDFS we can push about 4 Gb/s, while with ADL the throughput only reaches about 1.3 Gb/s.
In my testing I used multiple threads to write the data; each thread creates its own file and writes data into it. Increasing the thread count and the buffer size did not improve the performance.
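For reference, the write pattern I am benchmarking looks roughly like the sketch below. The credentials, store name, thread count, and payload sizes are placeholders rather than the exact values from my run; the structure (one file per thread, streamed through AzureDLFile.write()) is the relevant part.

```python
import time
from concurrent.futures import ThreadPoolExecutor

from azure.datalake.store import core, lib

# Placeholder credentials and store name -- not the real values from my benchmark.
TENANT_ID = "<tenant-id>"
CLIENT_ID = "<client-id>"
CLIENT_SECRET = "<client-secret>"
STORE_NAME = "<adls-store>"

NUM_THREADS = 8                     # illustrative; I varied this without improvement
CHUNK = b"x" * (4 * 1024 * 1024)    # 4 MiB passed to each write() call
CHUNKS_PER_FILE = 256               # ~1 GiB per file

token = lib.auth(tenant_id=TENANT_ID, client_id=CLIENT_ID, client_secret=CLIENT_SECRET)
adl = core.AzureDLFileSystem(token, store_name=STORE_NAME)

def write_one_file(i):
    # Each thread streams into its own file via AzureDLFile.write().
    with adl.open(f"/benchmark/file_{i}.bin", "wb") as f:
        for _ in range(CHUNKS_PER_FILE):
            f.write(CHUNK)

start = time.time()
with ThreadPoolExecutor(max_workers=NUM_THREADS) as pool:
    list(pool.map(write_one_file, range(NUM_THREADS)))
elapsed = time.time() - start

total_bytes = NUM_THREADS * CHUNKS_PER_FILE * len(CHUNK)
print(f"wrote {total_bytes / 2**30:.1f} GiB in {elapsed:.1f} s "
      f"({total_bytes * 8 / elapsed / 1e9:.2f} Gb/s)")
```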
My questions are:
Is this performance gap expected, given that Azure Data Lake Store is accessed through a REST API?
Is there any advanced API or parameter I can try to improve the throughput? For my scenario we have to use the streaming write API to upload the data (see the sketch after these questions for the one knob I have tried).
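The only write-side knob I found in the SDK is the blocksize argument of AzureDLFileSystem.open(), which, as I understand it, controls how much data AzureDLFile buffers before flushing a block over the REST API. A minimal sketch of that attempt, reusing the adl object and constants from the snippet above; the 64 MiB value is only an example:

```python
# Larger blocksize should mean fewer, larger REST calls per file.
# 64 MiB is an arbitrary example value, not a documented recommendation.
with adl.open("/benchmark/file_big_block.bin", "wb", blocksize=64 * 2**20) as f:
    for _ in range(CHUNKS_PER_FILE):
        f.write(CHUNK)
```

I am aware that multithread.ADLUploader can push a local file with many parallel chunks, but as noted above, my scenario requires the streaming write API rather than uploading existing local files.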
Environment summary
SDK Version: What version of the SDK are you using? (pip show azure-datalake-store)
Answer here: The latest.
Python Version: What Python version are you using? Is it 64-bit or 32-bit?
Answer here: Python 3.6.9, 64-bit
OS Version: What OS and version are you using?
Answer here: Ubuntu 18.04