
V2 - Copy failed: Error: Unknown system error (2.00.01) #318

Closed
wsegatto opened this issue Jan 22, 2021 · 35 comments

@wsegatto

wsegatto commented Jan 22, 2021

Describe the bug
Copy operation from the transcode folder to the output folder fails constantly.

Copy failed: Error: Unknown system error -139412464: Unknown system error -139412464, copyfile '/home/pi/tmp/Transcode/DARK-TdarrCacheFile-7CKwlGW4b.mkv' -> '/home/pi/tmp/Output/DARK-TdarrCacheFile-meuGljGGn.mkv'

To Reproduce
In my case, I just have to convert any file.

Expected behavior
Files should be moved to the Output folder. If the operation fails (e.g. drive not mounted), a retry/skip button should be available. Already-transcoded files in the queue awaiting copy should be saved, so that if I restart the server the copy continues rather than the entire transcode starting again. Logs should be clear about the error.

Screenshots
[screenshot attached]

Desktop:

  • OS: Raspberry Pi OS ARMv7 32-bit, latest

Additional context
I'm running Tdarr 1.99.09.

@HaveAGitGat
Owner

The staging section has been saved on server restart since .10, and a retry copy button was added in .11. Worth trying .14.

@wsegatto
Author

wsegatto commented Jan 23, 2021

Thank you for the info! I'll give it a try once I'm able to open .14; I have opened #319 for this.

@wsegatto
Author

Just had my first successful transcode with V2: the file was copied to the output folder and the transcode folder was cleared in the process.
Seems like the issue is solved on Debian Buster (ARMv7)!

Will try now with a CIFS mount in my Samba share.

@wsegatto
Author

OK, so the issue continues, but I have an insight here.
I monitored the remote folder while Tdarr was copying to the Output folder. The file was fully copied but then got deleted, and Tdarr said the copy failed. There was no info in the logs, only:

Error: Unknown system error -138642310: Unknown system error -138642310, copyfile in Tdarr.

Could you please check what exactly runs after the copy? Maybe the renaming is failing. If you provide me the code, I can run it directly in the console to see what error I get.

@jasonsansone

I am having a similar but slightly different error. I didn't open a new issue in case it is a duplicate.

Encoding and copying works on the master container, which has all modules (Server, WebUI, Node). My slave containers are all receiving errors when attempting to copy from their local transcode cache location back to the storage directory after a successful encode. I am using LXC containers on Proxmox with CephFS for the cluster storage backend, so paths are all the same. The transcode cache is always /opt/scratch, which is local to the LXC container and backed by an NVMe storage cluster. This all worked fine on v1, but that was before remote nodes existed in v2. I can manually copy the file from the scratch directory to the storage cluster. I have checked permissions on all containers. Proxmox nodes are linked by 2x 10GbE fiber in LACP, so the network is not a bottleneck.

Is the transcode cache supposed to be on a network-accessible location and not local? The new instructions hint that is the case. If so, what is the point of having a "cache" directory? The network share will be the same speed and open to the same issues for both the cache and the final destination. Shouldn't the flow be to write to a local drive while reading from the network share, verify the transcode succeeded, move the local cached transcode to the remote share, delete the original file, and rename the transcode to the original filename? That appears to be exactly what happens (and what I want to happen) when using the master node with the Server module. It only fails on "remotes".

Error: ENOENT: no such file or directory, stat '/opt/scratch/redactedfilename-TdarrCacheFile-xrB4X3mdq.mkv'

Screen Shot 2021-01-25 at 8 34 17 PM

@wsegatto
Author

I had that issue with a Samba share and it turned out to be a mount issue. Are the files in the remote directory showing as owned by root when you ls -la in the folder? I solved it by adding these options when mounting the directory: uid=1000,gid=1000,dir_mode=0777,file_mode=0777.
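
For reference, my mount ends up looking roughly like this (the share name, mount point and username are placeholders; the options are the part that mattered):

sudo mount -t cifs //nas/media /home/pi/tmp/Output \
  -o username=pi,uid=1000,gid=1000,dir_mode=0777,file_mode=0777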

By cache you mean the transcode cache? Well, if multiple passes have to be made on the file, they happen in this temp area, and only at the end are the files copied to the Output folder and removed from the cache. The transcode folder should be local to the node as far as I know.

@wsegatto
Author

wsegatto commented Jan 26, 2021

OK, so the issue continues, but I have an insight here.
I monitored the remote folder while Tdarr was copying to the Output folder. The file was fully copied but then got deleted, and Tdarr said the copy failed. There was no info in the logs, only:

Error: Unknown system error -138642310: Unknown system error -138642310, copyfile in Tdarr.

Could you please check what exactly runs after the copy? Maybe the renaming is failing. If you provide me the code, I can run it directly in the console to see what error I get.

@HaveAGitGat, when you get the chance.

@wsegatto
Author

wsegatto commented Jan 27, 2021

This is what's happening in my output mount (file size is 2.7GB):

Tdarr.Copy.mp4

@wsegatto
Author

It seems the issue only happens for files larger than 2GB; it's not related to NFS or CIFS.
This was supposed to be fixed in Node.js v10, but the issue occurs with v12 as well.

nodejs/node#30085

@wsegatto
Author

This seems related: it seems copyFileSync throws errors if the file size in bytes exceeds the maximum integer value, overflowing it and turning negative.
https://gitmemory.com/issue/nodejs/node/30085/545466474

const fs = require("fs");

fs.copyFileSync("./2GB.bin", "./2GB-copySync.bin");

fs.copyFile(
  "./2GB.bin",
  "./2GB-copyASync.bin",
  console.log
);

fs.promises
  .copyFile("./2GB.bin", "./2GB-copyPromise.bin")
  .then(console.log, console.error);
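
To illustrate the overflow idea (just a sketch; only the 2^31 boundary is exact, and the 2.7GB figure is roughly my file's size):

// A byte count above 2^31-1 no longer fits in a signed 32-bit integer,
// and forcing it into one wraps it negative, like the error numbers above.
const INT32_MAX = 2 ** 31 - 1;                 // 2147483647
const fileSize = Math.round(2.7 * 1024 ** 3);  // ~2.7 GiB
console.log(fileSize > INT32_MAX);             // true
console.log(fileSize | 0);                     // negative after the 32-bit wrap-around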

@supersnellehenk

Just had a brief conversation on Discord with @wsegatto; as noted in his OS details, he's running a 32-bit system. These systems don't really support stuff over 2 GB in size, so there will probably need to be a significant rewrite (I don't know the code, so I can't really tell) to make it work on 32-bit systems. I think something like reading the first 2 GB of the file, then reading each subsequent 2 GB at an offset (see the sketch below).

The Node issue can be ignored, as this isn't a Node issue but rather a max-integer issue due to 32-bit.
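
As a rough sketch of that idea (made-up paths, not Tdarr's actual code), streaming the copy in chunks means no single offset ever has to fit in a 32-bit integer:

const fs = require('fs');
const { pipeline } = require('stream');

// Hypothetical paths for illustration
const src = '/home/pi/tmp/Transcode/input.mkv';
const dest = '/home/pi/tmp/Output/output.mkv';

// pipeline handles backpressure and forwards errors from either stream
pipeline(
  fs.createReadStream(src),
  fs.createWriteStream(dest),
  (err) => {
    if (err) console.error('copy failed:', err);
    else console.log('copy complete');
  }
);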

@jasonsansone

The problem still exists that the functionality for moving files appears to be handled by the server and not the node, thus a node can not utilize a transcode cache location that isn’t accessible to the server.

@supersnellehenk

The problem still exists that the functionality for moving files appears to be handled by the server and not the node, thus a node can not utilize a transcode cache location that isn’t accessible to the server.

That's not related to this issue, but it's by design. It's easier on the network, plus other nodes can access the same files, so three independent nodes can finish a single file: for example, node 1 runs plugin 1, node 1 picks up a different file to run a plugin on, node 2 runs plugin 2, and so on. It's easier on the network because it doesn't hammer the full network bandwidth once all the plugins are done, but gradually streams the file back into the cache at the same rate as the transcode.

@jasonsansone

jasonsansone commented Jan 28, 2021

Haha, your network maybe. I have 20GbE. But understood.

edit: even for a 1GbE LAN, that's still 125 MB/s, which exceeds most consumer hard drives' mixed R/W speeds. I'm not sure the network would be that much of a bottleneck, if any; storage usually is the constraint. 10GbE is obviously 1250 MB/s, which is going to exceed anything short of flash-based storage. Exceeding 1.2 GB/s on spinning rust is an impressive metric.
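
Quick math behind those figures (decimal units, ignoring protocol overhead):

// Convert link speed in Gbit/s to MB/s
const toMBps = (gbitPerSec) => (gbitPerSec * 1000) / 8;
console.log(toMBps(1));  // 125 MB/s on 1GbE
console.log(toMBps(10)); // 1250 MB/s on 10GbE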

@wsegatto
Author

wsegatto commented Jan 28, 2021

Thanks for your help, guys. My point here is that the file IS copied but then gets deleted. The issue doesn't seem to be the physical copy itself, but rather the method of copying.

In any case, I'll try with my other RPi, which is 64-bit, to see how it behaves.

EDIT: Yeah, I don't think I'll be able to try 64-bit now. The updater never worked on my Debian Buster:

root@raspberrypi:/opt/Tdarr# ./Tdarr_Updater 
./Tdarr_Updater: error while loading shared libraries: libatomic.so.1: cannot open shared object file: No such file or directory

root@raspberrypi:/opt/Tdarr# apt install libatomic1
Reading package lists... Done
Building dependency tree       
Reading state information... Done
libatomic1 is already the newest version (8.3.0-6).
0 upgraded, 0 newly installed, 0 to remove and 1 not upgraded.

root@raspberrypi:/opt/Tdarr# node -v
v15.7.0

root@raspberrypi:/opt/Tdarr# npm -v
7.4.3

@jasonsansone

I don't have anything ARM-based, but I can test anything else that is x86_64. It's easy to spin up a VM of Debian or anything else *nix.

@wsegatto
Author

I have manually pointed node to the libatomic library, but then I get another error. :(

pi@raspberrypi:/opt/Tdarr $ LD_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu

pi@raspberrypi:/opt/Tdarr $ export LD_LIBRARY_PATH

pi@raspberrypi:/opt/Tdarr $ ./Tdarr_Updater 
./Tdarr_Updater: error while loading shared libraries: libatomic.so.1: wrong ELF class: ELFCLASS64

Why is this hard? :(

@HaveAGitGat
Owner

Tdarr uses fs-extra for this:

const fs = require('fs-extra')

const filePath = 'C:/test'
const newFilePath = 'C:/test_new'

fs.copy(filePath, newFilePath)
  .then(() => console.log('success!'))
  .catch(err => console.error(err))

You could try running the above script with a 2GB+ test file. It seems no error is being given, but the new file is being automatically deleted by fs-extra when it fails. Tdarr then sees it as successful and deletes the cache file.
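
If you don't have a 2GB+ file handy, something like this should create one (hypothetical path; the file is sparse, but only the size matters for this test):

const fs = require('fs');

// Create an empty ~2.5 GiB file by extending it to the target size
const testPath = './2.5GB-test.bin';
const fd = fs.openSync(testPath, 'w');
fs.ftruncateSync(fd, Math.floor(2.5 * 1024 ** 3));
fs.closeSync(fd);
console.log('created', testPath, fs.statSync(testPath).size, 'bytes');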

@wsegatto
Author

wsegatto commented Jan 28, 2021

@HaveAGitGat I have executed the script you mentioned and got a success message. The file was 2.7GB and was completely copied to the output folder with no errors at all.

> fs.copy(filePath, newFilePath).then(() => console.log('success!')).catch(err => console.error(err))
Promise { <pending> }
> success!

It seems no error is being given, but the new file is being automatically deleted by fs-extra when it fails. Tdarr then sees it as successful and deletes the cache file.

That is not what happens. Tdarr is not seeing it as successful and is not deleting the cache file. It throws the error I'm reporting in this issue: the file stays in the transcode folder and the output file is deleted.

I'm trying with Tdarr again. FYI: I'm running 2.00.00 for all 3 modules since Server 2.00.01 is not yet available for ARM (I believe the packages are inconsistent, since 2.00.01 is available for Node and WebUI).

@wsegatto wsegatto changed the title V2 Preview - Copy failed: Error: Unknown system error V2 - Copy failed: Error: Unknown system error (2.00.00) Jan 28, 2021
@wsegatto
Author

Nope, copy failed. The script works with no issues, @HaveAGitGat, but inside Tdarr the copy fails with errors consistently for files over 2GB.

[screenshot attached]

@jasonsansone

Thanks for your help, guys. My point here is that the file IS copied but then gets deleted. The issue doesn't seem to be the physical copy itself, but rather the method of copying.

In any case, I'll try with my other RPi, which is 64-bit, to see how it behaves.

EDIT: Yeah, I don't think I'll be able to try 64-bit now. The updater never worked on my Debian Buster:

root@raspberrypi:/opt/Tdarr# ./Tdarr_Updater 
./Tdarr_Updater: error while loading shared libraries: libatomic.so.1: cannot open shared object file: No such file or directory

root@raspberrypi:/opt/Tdarr# apt install libatomic1
Reading package lists... Done
Building dependency tree       
Reading state information... Done
libatomic1 is already the newest version (8.3.0-6).
0 upgraded, 0 newly installed, 0 to remove and 1 not upgraded.

root@raspberrypi:/opt/Tdarr# node -v
v15.7.0

root@raspberrypi:/opt/Tdarr# npm -v
7.4.3

Tested 2.00.01 on Debian Buster. Everything works fine. However, that is amd64 arch, not ARM. Whatever the issues are, they are confined to ARM arch.

@wsegatto
Author

Thanks for the test. 2.00.01 is still not fully available on ARM; the packages are inconsistent (the latest Server version is still 2.00.00).

[2021-01-28T10:56:19.047] [INFO] Tdarr_Updater - Tdarr_Node | Current version: 2.00.01 | Required version: 2.00.01
[2021-01-28T10:56:19.056] [INFO] Tdarr_Updater - Tdarr_Node | Up to date! Version: 2.00.01!
[2021-01-28T10:56:19.058] [INFO] Tdarr_Updater - 
[2021-01-28T10:56:19.059] [INFO] Tdarr_Updater - 
[2021-01-28T10:56:19.061] [INFO] Tdarr_Updater - Tdarr_Server | Current version: 2.00.00 | Required version: 2.00.01
[2021-01-28T10:56:19.062] [WARN] Tdarr_Updater - Tdarr_Server | Not the required version! Updating tDownloading 0.00 KB/s:>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>(100%)
[2021-01-28T10:56:21.637] [ERROR] Tdarr_Updater - Tdarr_Server | No module for version 2.00.01 on linux_arm
[2021-01-28T10:56:21.638] [ERROR] Tdarr_Updater - Tdarr_Server | Error downloading!
[2021-01-28T10:56:21.639] [INFO] Tdarr_Updater - 
[2021-01-28T10:56:21.640] [INFO] Tdarr_Updater - 
[2021-01-28T10:56:21.641] [INFO] Tdarr_Updater - Tdarr_WebUI | Current version: 2.00.01 | Required version: 2.00.01
[2021-01-28T10:56:21.642] [INFO] Tdarr_Updater - Tdarr_WebUI | Up to date! Version: 2.00.01!
[2021-01-28T10:56:21.644] [INFO] Tdarr_Updater - 
[2021-01-28T10:56:21.644] [INFO] Tdarr_Updater - Finished!

@HaveAGitGat
Owner

@wsegatto sorry about that, it should now be up.

@wsegatto
Author

I don't know how Node works - whether it uses my local libraries or the libraries bundled at compile time - but I can say for sure that all copy methods seem to work when executed directly with node.

> fs.copy(filePath, newFilePath).then(() => console.log('success!')).catch(err => console.error(err))
Promise { <pending> }
> success!

> try {
...   fs.copySync(filePath, newFilePath)
...   console.log('success!')
... } catch (err) {
...   console.error(err)
... }
success!
 
> try {
...   fs.copyFileSync(filePath, newFilePath)
...   console.log('success!')
... } catch (err) {
...   console.error(err)
... }

success!

@wsegatto
Author

wsegatto commented Jan 28, 2021

The issue continues in v2.00.01. I guess Tdarr is not for ARM; I'm about to give up. The updater doesn't even start on ARMv8 64-bit with Debian Buster, and on 32-bit there are copy issues.

I even tried downloading the latest fs-extra and dropping it into the node_modules folder, but nothing changed.
I can only imagine there is a problem within Tdarr.

Is there a way to debug this? With no logs it's just a wild guess here.

@jasonsansone

I realized I can emulate ARM on Proxmox. What build / OS are you using? I can try to make a test VM. I am happy to spin up whatever we need to help debug.

@wsegatto
Author

I realized I can emulate ARM on Proxmox. What build / OS are you using? I can try to make a test VM. I am happy to spin up whatever we need to help debug.

Thank you so much.
I'm running Raspberry Pi OS 32-bit, latest version, which is Debian Buster (armv7/armhf).

@wsegatto
Author

The test is simple. Just set up a library with source/transcode/output folders, then use a plugin like Tdarr_Plugin_a9hd_FFMPEG_Transcode_Specific_Audio_Stream_Codecs or
Tdarr_Plugin_MC93_Migz4CleanSubs. Use an input file larger than 2GB with some margin to spare, like 2.5GB.
The copy from the transcode folder to the output folder will fail. It doesn't have to be a remote folder; local folders also have this problem.

@wsegatto wsegatto changed the title V2 - Copy failed: Error: Unknown system error (2.00.00) V2 - Copy failed: Error: Unknown system error (2.00.01) Jan 29, 2021
@HaveAGitGat
Owner

I’ll have another look at this today, thanks

@HaveAGitGat
Owner

This should now be fixed in 2.00.02, just released. Working fine with a 4GB file on my Pi.

@wsegatto
Author

Hi everyone!

Thank you so much for the help. I can confirm the issue no longer occurs in 2.00.02.
@HaveAGitGat, for clarity, would you mind sharing what you changed?

[screenshot attached]

Thanks! You just got a new Patreon.

@HaveAGitGat
Owner

@wsegatto I didn't do too much investigating, but it's an issue with fs-extra when it's packaged up. I tried some other file copy libraries, and cp-file seems to do the job.
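
For reference, the cp-file usage is roughly this (a sketch based on the CommonJS releases of cp-file around this time, not the exact Tdarr code; paths are placeholders):

const cpFile = require('cp-file');

// Promise-based copy of a single file
cpFile('/home/pi/tmp/Transcode/input.mkv', '/home/pi/tmp/Output/output.mkv')
  .then(() => console.log('copy complete'))
  .catch((err) => console.error('copy failed:', err));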

@HaveAGitGat
Owner

Glad it finally works :)

@wsegatto
Author

Thank you so much! May this also serve as a pointer for the many other projects facing the same issue. :)
I'm running transcodes for my entire library now. Thank you!

@marshalleq

For posterity: 32-bit file systems have been dealing with files larger than 2GB for decades; being 32-bit should have nothing to do with this issue.
