Performance on large collections #89
Comments
Probably yes. That's not easy to implement currently. Either the file finder should be moved to a different process, or the file finder code should be converted to an iterator. This iterator should not do any slow operations, because the main process controls the child processes and should respond instantly. Also some complications for I can mark it as an enhancement, but I can't tell when I'll implement it. It looks useful, though.
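A minimal sketch of the iterator idea, written in Python for illustration rather than in the project's own code. The function name `iter_files` is hypothetical and this is not how mt-aws-glacier's file finder is implemented; it only shows a traversal that yields one path at a time instead of building the whole list first.

```python
import os
from typing import Iterator


def iter_files(root: str) -> Iterator[str]:
    """Lazily yield regular-file paths under root, one at a time.

    Hypothetical helper: only a stack of directories still to visit
    is kept in memory, not the full file list.
    """
    stack = [root]
    while stack:
        current = stack.pop()
        # os.scandir is itself lazy, so each step does a small amount of work.
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    yield entry.path
```

A caller can pull a few paths at a time (for example with `next()`), which is what lets a controlling process stay responsive between steps.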
See also this part of the docs:
Also, try using
and limiting by file filtering helps if your journal is so huge that it does not fit into memory (filtering saves memory in this case).
Hi, I +1 this enhancement. I have 800k files here, and it takes 45 minutes just to list them. Given my unstable internet connection, that is almost a no-go for using mt-aws-glacier in this case (restarting the process takes another 45 minutes :/ ).
In case of network errors, mt-aws-glacier should retry the request (at least 100 times), so it should keep working without a restart. If it crashes instead of retrying, please report a bug.
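As an illustration of retrying a single request instead of restarting the whole run, here is a minimal Python sketch. The helper `with_retries`, the fixed delay, and the choice to retry on `OSError` are assumptions made for the example; only the limit of 100 attempts comes from the comment above. It is not mt-aws-glacier's actual retry logic.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(task: Callable[[], T], max_attempts: int = 100,
                 delay_seconds: float = 1.0) -> T:
    """Run task, retrying on network-style errors (hypothetical helper).

    The attempt counter is local to this call, so each task has its
    own retry budget.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except OSError:  # ConnectionError and TimeoutError are subclasses
            if attempt == max_attempts:
                raise  # budget exhausted: surface the last error
            time.sleep(delay_seconds)
    raise ValueError("max_attempts must be at least 1")
```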
Hi @vsespb, agreed. FYI, I managed to have some requests fail 100+ times, which causes other "tasks" to stop (see the logs below). It looks like the counter is global and not per upload task? I don't get why some workers are always out while others work (I will try to decrease concurrency from 4 to 1). I'll also check why I get an unexpected SIGINT signal on the process despite running it using
Thanks,
no, it's per task!
not sure. that's interesting. never saw this. some workers (PIDs) or some files?
nohup should prevent SIGHUP, not SIGINT
yes, you need to decrease concurrency and part size. see docs: https://github.com/vsespb/mt-aws-glacier#limitations I believe HTTP 408 is a problem on Amazon's side (I reproduced it using cURL), but they don't seem to care https://forums.aws.amazon.com/thread.jspa?messageID=511490
Thanks for the support @vsespb. Your ticket about HTTP 408 is interesting, I'll try some tweaking too. I have attached the full log for your information, but I definitely need to decrease my concurrency and part size first to ensure better success.
Cheers,
I'm trying to back up a large file collection, about two million files. Why does mt-aws-glacier not upload files as it finds them on disk, and instead insist on creating a list of every file in the target directory first?
I see it reports it found new files every 1000 of them, but I don't see any uploads starting:
I can't reasonably wait until it loads a list of all 2 million files into memory first. I could use filters to select a subset of files at first (hopefully it doesn't load files not covered by /root/reference), but it would be so much better if it just checked a file, checked its status in the index, started uploading, and proceeded further whenever there's a free spot on the upload queue.
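Roughly, the flow I have in mind could be sketched like this in Python. All names here (`upload_as_found`, the `journal` set, the `upload` callback) are made up for illustration and are not mt-aws-glacier's API; a real implementation would use processes and retries rather than this simplified thread pool.

```python
import queue
import threading
from typing import Callable, Iterable, List, Set, Tuple


def upload_as_found(paths: Iterable[str], journal: Set[str],
                    upload: Callable[[str], None],
                    workers: int = 4, queue_size: int = 16
                    ) -> Tuple[int, List[str]]:
    """Queue files for upload while the directory scan is still running.

    Returns (number of files queued, list of paths whose upload failed).
    """
    pending: queue.Queue = queue.Queue(maxsize=queue_size)
    stop = object()  # sentinel that tells a worker to exit
    failed: List[str] = []

    def worker() -> None:
        while True:
            item = pending.get()
            if item is stop:
                break
            try:
                upload(item)
            except Exception:
                failed.append(item)  # a real tool would retry here

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()

    queued = 0
    for path in paths:        # paths may be a lazy iterator over the disk
        if path in journal:   # already recorded in the index: skip it
            continue
        pending.put(path)     # blocks until there is a free spot in the queue
        queued += 1

    for _ in threads:
        pending.put(stop)
    for t in threads:
        t.join()
    return queued, failed
```

Because `pending.put` blocks when the queue is full, the scan only runs ahead of the uploads by at most `queue_size` files, so memory use stays bounded no matter how many files are on disk.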
What do you think?