TruncatedOutput: only 1495314753 of 2121389892 bytes read #554
Anyone have any idea?
It looks, as you say, like a network problem. The connection between your machine and the B2 cloud server deteriorated to the point of failure and the CLI reported it (as it should). A future version of the B2 CLI will do more to automatically recover from this type of issue.
Thanks for having a look and confirming. Currently, this issue is completely preventing me from using the command line tool, since I can't download most of my files. (Uploads work fine, so I'm pretty baffled.) Is there a workaround to keep the connection alive, or to download parts of a file instead?
I can see from the stacktrace that you are using a relatively new version of the CLI, which has the parallel transferer (which I implemented) enabled by default for large files, and your file is large. An obvious workaround would be to split the file into several small files (using 7zip with a very low compression?) and reassemble it upon restore. It's not ideal, but maybe it will work for you? Actually, I have observed an issue like the one you report here on my workstation during testing of the parallel transferer. In my case it was caused by the VirtualBox "NAT" network driver, which is known to cause massive issues when performance gets reasonably high. If you are using the VirtualBox "NAT" driver, please try to switch to "Bridged" - it resolved the problem instantly in my case (and it improved performance significantly). Alternatively (since there is no configurability for the parallel transferer parameters in the current version), you can try to revert to b2 CLI version 1.3.6, which always used just one thread to download files, regardless of their size. It may be slower, but more reliable in your case.
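A minimal sketch of the split-and-reassemble workaround on a Unix-like shell (file names and chunk size are placeholders; on Windows, 7zip's split-into-volumes option achieves the same thing):

```sh
# Split a large file into 500 MB chunks before uploading
split -b 500M backup.img backup.img.part_

# ...upload the chunks, later download them individually, then reassemble:
cat backup.img.part_* > backup.img
```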
I would love to try this. Unfortunately, I'm not at all familiar with GitHub (sacrilegious, I know) so I'm just looking up what I need to do exactly. I don't expect you to tutor me on how to actually use this site, but if there happens to be a command line you could tell me that would install CLI 1.3.6 off the bat, I'd love to hear it. Otherwise, I'll continue my GitHub crash course. Here's where I'm up to:
@Desto7 and if you'd like to install a version that you have checked out locally, then:
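For reference, a sketch of both install paths (the exact release tag name is an assumption; check `git tag` in the repository):

```sh
# Install the 1.3.6 release directly from PyPI
pip install b2==1.3.6

# Or install from a local checkout of the repository
git clone https://github.com/Backblaze/B2_Command_Line_Tool.git
cd B2_Command_Line_Tool
git checkout v1.3.6   # assumed tag name
pip install .
```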
Going to 1.3.6 has fixed my downloading issue. Brilliant!! Thank you so much! For those curious, 1.3.6 is slower, taking 14 minutes for a 2.1 GB file at a very fluctuating bitrate, as opposed to 8 minutes at my max bitrate when using the latest build. For my purpose, half speed is fine, so thank you again!
@Desto7 could you tell me a little bit more about your environment? Is it a VM or an IoT device, what network is it on, etc.?
It is a W10 computer on a home network, nothing too special.
I'm having the same issue occasionally on large (>1 GB) files. Also on W10, although I run this stuff through WSL. Not sure how helpful this will be, but it's one more data point.
I'm looking into this.
I'm getting this error as well, and it stops consistently at around the 40 GB download mark for me.
I fixed it in Backblaze/b2-sdk-python#32
@ppolewicz I know this has been closed for over a year, but downloading a 104 GB file kept failing for me and using 1.3.6 fixed the issue.
@rtrainer can you please try with b2 CLI v2.0? Quite a few things have been rewritten there; it should be correct and faster than 1.3.6.
I used the latest release, which is v2.0.2.
@rtrainer just to be clear: downloading a 104 GB file kept failing for you with CLI v2.0.2, then you switched to 1.3.6 and it worked fine?
That is correct. During the download there were a couple of timeouts, but it kept going until somewhere between 75 GB and 100 GB, when it would fail. 1.3.6 took longer but worked perfectly with no timeouts. I tried running it on Windows and Ubuntu 18.04.
@rtrainer this did not show up in my tests, so clearly there is a difference in the environment. Could you please say a bit more about your environment, specifically everything you can say about your network connection (and any usage of it during the download process), what type of storage device you are writing to (and if anything else is writing to it), the age of that device and the amount of remaining free space, the filesystem type - anything you can think of will help me narrow down the cause. (I know of one potential cause, but it seems that in your case it may be something different.) Also, I'd like to ask what behavior you would like to see in a perfect world - should the download process retry for a really long time (say, a day) if that's necessary because of a horrible connection? Currently the number of attempts is limited (to 5 per fragment, I believe, which is subject to change) and we might want to change that. Finding a solution that you'd be happy with would be a nice starting point.
My internet connection is 1 Gb FIOS fiber to my router. I run enterprise-grade equipment in my home network. All of my switches are connected with 1 Gb fiber interconnects and I have a 1 Gb connection to my laptop. My laptop has 64 GB RAM, an i7-7700K processor and multiple 1 TB Samsung SSD storage devices. I watched my network after the first couple of failures and saw nothing to make me believe there was a problem with it. I would like to see an option for the number of threads for a download and the number of retries, and maybe also the timeout value. This would give me some tools to maybe work around the problem. I am happy to do whatever testing you would like me to do to help you understand what is going on and hopefully solve this.
Rather than giving you the tools to manually configure the program so that it doesn't crash on you, I'd like to come up with something that will automatically configure itself for you (so if running on 8 threads causes problems, the number of threads should be decreased until a single thread remains or the problem disappears). The program needs to know what your exit criterion is, though (because otherwise we could just set infinite retries and it would eventually complete - but that's infeasible for many use cases). Can it be a timeout for the entire (sync) operation?
Backblaze/b2-sdk-python#32 improved this a little bit, but the fix is not complete - a sync operation can create N*10 threads for downloads, which can cause thread starvation and eventually a timeout. A proper threadpool must be introduced.
@tteggelit this might have been caused by network conditions, where something that supervises your connection (anti-botnet, anti-DDoS or something like that) might really not like you overloading the network with a lot of connections. In your case, however, you only have a 1 GbE connection, so assuming no overhead it should be capable of transferring about 125 MB/s. The speed of the B2 CLI is 610 MB/s (on Python 3.11 on Linux), so you will easily saturate the network with just a single thread (on low latency - on higher latency you can use more). Running more threads than your network can handle can lead to broken transfers. The documentation of
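To illustrate keeping concurrency below what the link can handle, a hedged sketch using the sync `--threads` option (bucket name and paths are placeholders):

```sh
# Limit the sync operation to a single download thread
b2 sync --threads 1 b2://my-bucket/backups /local/backups
```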
@Lusitaniae TruncatedOutput is usually a consequence of an irreparable error that happened before it; it just indicates that the download has failed, retries have also failed, and we ended up with a file that is not fully downloaded, so we are going to stop the sync operation completely. I think we are going to change this behavior in B2 CLI 4.0 to allow users to keep recovery attempts going for a set period of time rather than 5 attempts per fragment, but if a fragment fails to download 5 times in a row, it's a problem and we should find out what it is. Are there other exceptions in the logs that you haven't shown? The performance counters show that it's mainly the network you are waiting for. This is expected behavior - it just means that your storage device is not too slow (23 min waiting for the network, 38 s waiting for the drive).
@ppolewicz Blaming the network doesn't quite explain why the upload - which I assume uses the same default thread concurrency - worked fine, and why the vast majority of the downloaded files (like 97%) worked the first time, but 5 subsequent attempts at these specific 150 files kept failing. I have Speedtest results being gathered from my network, and while I can certainly see an impact on the bandwidth available for the Speedtest results during these transactions (you can see the initial upload impact in the green and then the subsequent download impacts in the yellow), it's by no means saturating or causing starvation. In fact, the single thread
I don't have any access to the server infrastructure, but as far as I understand, the storage unit on which a group of your files is kept might have been undergoing critical fragment reconstruction, during which the performance of that storage unit would be degraded. This is not a problem specific to Backblaze B2 but to any storage based on erasure coding - B2 uses 17+3, so if a couple of drives of the same "stripe" die, it becomes pretty important to try to recover the missing fragments, and maybe (this is pure speculation as I've never seen the server code) the restore operation is being prioritized over user traffic, which in your case might have shown up as broken downloads. These things come to an end though - the data is reconstructed and the cluster performance recovers. In order to confirm my suspicion you can try to download it now and see if it works. My guess is that when you run it today, it'll work fine.
Given the volume of my downloads, each sync run takes about 1h. After multiple attempts in a day, across different days, the result was that the b2 CLI was "reliably failing" to complete a full sync successfully. I've moved the script to use s5cmd and it has been working so far. Also, from discussing with the Backblaze team, it looks like my account is assigned to the smallest cluster they have in the EU region (and I'm the heaviest user there), and previously they asked me to slow down traffic as it was causing too much load on their systems.
@ppolewicz That wouldn't explain why, when I deleted the offending files and then re-uploaded them (successfully the first time) with
@Lusitaniae a small cluster has a good chance of being impacted by a rebuild (which takes some time). It might be that it finished rebuilding just as you were switching from the CLI to s5cmd, and that is why it works now. @tteggelit the default of 10 might not be the best setting; it's just something that was decided years ago. We'll be releasing CLI v4 soon, which will allow us to change the defaults, and this might be one of the changes we'll need to make (we'll also change the retry policy to be even more bulletproof). Thank you for your suggestion.
I had the same issue as others noted. It's not clear why single-threading seems to work, but perhaps the CLI should retry failed parallel downloads in single-threaded mode?
@yeonsy can you please try to reproduce it again with logs enabled?
@ppolewicz I have a log output from one of the failed runs but I don't have the time to properly redact it at the moment. If you email me at [email removed] I can mail you the log.
Ok, sent you an email.
Just wanted to add a data point - I'm moving a Windows Server box from a Hetzner box in Estonia to a new box at a different DC in Prague, and trying to transfer 45TB between the two over SMB is unusable. I've specifically created a new Backblaze account using the EU region for latency and speed, and both boxes are like 1 or 2 hops away from the Backblaze DC.
Both of these machines are high-performance servers, in reliable data centers, on 10 Gbps unmetered ports. They are directly connected to the Internet with live IPv4 addresses. There is no NAT or firewall or anything interfering here - and the box in Prague is a brand new installation, so there's zero chance something could be awry there. I'm running the latest version of b2 as installed by pip (4.0.1). I then kicked off a new pass with

As an extra experiment, I installed 1.3.6 in a Python venv and ran it with 40 threads. Using a version this old is honestly a little nuts, but it works at roughly the same speed (even a little slower) than the

Finally, I ran the current version with 40 threads, but I included

There is DEFINITELY something wrong with the multiple streams per file downloading, unfortunately. I'm not sure if it's just on large files or what the deal is, but given that 1.3.6 works perfectly, and restricting download streams to 1 works perfectly, it's safe to say that this functionality isn't working as expected.

Don't worry @ppolewicz - as you pointed out, even stable open-source software isn't inherently bug free. I'm confident you'll figure this out, and I for one appreciate all the effort you put into this piece of software!
@fluffypony this can, unfortunately, be caused by a number of reasons, and I cannot address it without figuring out what the reason is. Your story gives me important insight, thank you very, very much for taking the time to write it down. I think the code of ParallelDownloader is correct. At one point I was able to pull ~40 Gbps using a single instance of the b2 CLI running under a special interpreter, but that was on a low-latency network. Ideally the code could figure out what the situation is and automatically adjust the thread count and the streams per file; ideally we'd also be able to use a (non-existent as of yet) resource on the server to continue a broken download. Now, what happened in your case is hard to say - I would really love to take a look at the logs of the failing operation to see why it stopped and what the problem was. Is it possible for you to re-trigger the error with logs enabled?
Sure thing - I have re-run it with
@fluffypony you can send it to me; and yes, email from the commit author field will be fine.
Sent! I used 100 threads to try to induce it into failing as much as possible 😆
Received 38 MB of logs. Thank you. I will review them tomorrow.
I am using the latest b2 client, and I consistently get problems downloading large files (100GB+) using I tried using Any recommendations for alternatives? I'm currently exploring using rclone sync or simply mounting with s3fs and then rsyncing from there.
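For what it's worth, a hedged sketch of the rclone route (the remote name, bucket, and paths are placeholders; rclone's native B2 backend is configured interactively with an application key):

```sh
# One-time interactive setup of a B2 remote, e.g. named "myb2"
rclone config

# Copy a prefix down with a bounded number of parallel transfers
rclone copy myb2:my-bucket/large-files /local/dest --transfers 4
```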
Please share logs so we can confirm if it is the same bug. Hopefully an extra data point will make it easier to debug.
I removed locations from the logs. This happened when trying to sync a folder with two files, 970 GB and 3.4 GB (using the default 10 threads). Even the smaller file failed. As I said, I see this all the time across many clusters. b2 is the only service where I'm consistently encountering issues when downloading files. There are no issues with uploads. Setting threads to 1 works most of the time, but it is super slow.
This only happens if an individual part is retried 20 times without success. What you are seeing at the end is a sync file failure, which just states that the file couldn't be fully downloaded, but there are at least 20 exceptions prior to that (probably many more) stating the real failure reason.
That is the only thing I see in the console. Where can I see the full log? Also, is it possible to manually control the number of retries, backoff time, etc.? Also, there are no issues with uploading large files. Why would downloading them be so different? Our clusters are located in different places across North America, and usually we don't have any issues downloading from GCS or AWS S3. But the failure rate downloading with Do you have a recommendation on alternative ways we can download from b2 buckets? I've just started exploring
@yury-tokpanov see https://github.com/Backblaze/B2_Command_Line_Tool?tab=readme-ov-file#detailed-logs B2 buckets can have S3 interface compatibility, so you can try that. Let us know if that works, though at this point I'm pretty sure it's a storage device or a network issue, because you are failing with 1 thread and there is a hardcoded limit of 20 retries per file. Or maybe it's a bug - if you can find out from the logs what is causing it, we'll fix it - it's just not possible to fix that one without being able to reproduce it or see the logs. I'm sure you understand. As far as I know, the fastest S3 downloader out there is s5cmd.
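A sketch of the s5cmd route against the bucket's S3-compatible endpoint (the endpoint hostname, bucket, and prefix are placeholders; the real S3 endpoint for a bucket is shown in the B2 web UI, and a B2 application key pair serves as the S3 credentials):

```sh
# Use a B2 application key as S3 credentials
export AWS_ACCESS_KEY_ID='<applicationKeyId>'
export AWS_SECRET_ACCESS_KEY='<applicationKey>'

# Download a whole prefix in parallel via the S3-compatible endpoint
s5cmd --endpoint-url https://s3.us-west-000.backblazeb2.com cp 's3://my-bucket/prefix/*' ./local-dir/
```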
@fluffypony @yury-tokpanov I'm happy to say the new B2 CLI 4.1.0 release fixes the reported problem by properly retrying in case of middle-of-the-data-stream errors; hence it will work correctly even when the network is congested due to multiple concurrent connections.
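For anyone following along, upgrading a pip-based install should be enough to pick up the fix (a sketch, assuming the CLI was installed with pip as in the earlier comments):

```sh
pip install --upgrade 'b2>=4.1.0'
```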
Amazing - thank you so much for all your effort with this!
Thank you! We are going to test it!
I'm able to use "b2 download-file-by-name" to download small files, but when I target a 2.1 GB file, it crashes out randomly midway through. (I have run the command at least 10 times over the course of two days. Each time it crashed after having downloaded in the range of 1.4 - 2.0 GB out of 2.1 GB).
Reading through the issues page, it seemed that "b2 sync" is recommended. Same issue remains though, crashing out at about 1.7 GB.
Since no one else appears to have this rather fundamental problem, I suspect it's related to my region/ISP/home network. Still, any help would be appreciated. I have attached a --debugLog file and pasted a typical command line response here.
Thanks in advance
b2_cli.log
CMD:
Output: