How to update only newer file? #42

Open
higimo opened this issue Feb 13, 2018 · 15 comments · May be fixed by #100

Comments

@higimo

higimo commented Feb 13, 2018

I have to upload 1 GB on every deploy, but only 2 files have changed. How can I upload only the newer files?

@simonh1000
Owner

simonh1000 commented May 5, 2018

Sorry, but that's not supported at present. I'd find it useful too, but have no timeline to develop it.

A number of people have had a go at this now:

  1. @TomFreudenberg (Incremental Update based on a folder-hash file #117) based on storing a file of MD5s each time ftp-deploy is invoked
  2. @limikael (added newFilesOnly #151) uses a more reliable date comparison
  3. @paganaye (implement updateNewerFiles option #141) who proposes storing a file on the destination (essentially of name and date)
  4. me ([wip] Upload newer #100) based on file size and date comparison

I'm not excited about creating extra files (state) and storing them on people's computers, but adding the MD5 is perhaps the gold standard for comparing. However, MD5ing a 1 GB file (the kind of case that motivated some of the PR contributors) is expensive in other ways.
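To make the cost trade-off concrete, here is a minimal sketch of hashing a file with Node's built-in `crypto` module. The helper name `md5OfFileSync` and its chunk size are assumptions made for this illustration; they are not part of ftp-deploy or of any of the PRs. Reading in fixed-size chunks keeps memory use bounded, but every byte of the file still has to be read from disk, which is where the expense for a 1 GB file comes from.

```javascript
const crypto = require('crypto');
const fs = require('fs');

// Hypothetical helper (not an ftp-deploy API): MD5 of a file, read in
// fixed-size chunks so memory use stays bounded for very large files.
function md5OfFileSync(filePath, chunkSize = 1024 * 1024) {
  const hash = crypto.createHash('md5');
  const buffer = Buffer.alloc(chunkSize);
  const fd = fs.openSync(filePath, 'r');
  try {
    let bytesRead;
    // readSync returns 0 once the end of the file is reached.
    while ((bytesRead = fs.readSync(fd, buffer, 0, chunkSize, null)) > 0) {
      hash.update(buffer.subarray(0, bytesRead));
    }
  } finally {
    fs.closeSync(fd);
  }
  return hash.digest('hex');
}
```

The digest only helps if there is something to compare it with, which is why the hash-based proposals also need a stored list of previous hashes.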

I think my preference is for solution 2, which does work well in the PHP use case. It may not be perfect but is perhaps good enough, not least as so many developers now enjoy git-style deploys anyway.

Would welcome thoughts

@simonh1000
Owner

@higimo if you have time to test this branch https://github.com/simonh1000/ftp-deploy/tree/upload-newer, that would be much appreciated

@gabrielnvg

> @higimo if you have time to test this branch https://github.com/simonh1000/ftp-deploy/tree/upload-newer, that would be much appreciated

I tested, and in my case, I generate a build folder every time before deploying.
Therefore, my files will always have a newer date.

Maybe it would be a good idea to add an option for choosing the comparison: update only if the file is newer, only if the size is different, or only if both are true (in my case, I would choose "only if the size is different").
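As a rough illustration of that suggestion, the three choices could be expressed as a single decision function. Everything here is hypothetical: the function name `shouldUpload`, the mode names `'newer'`, `'size'` and `'both'`, and the shape of the file descriptions are assumptions for this sketch, not options that ftp-deploy or the linked branch provides.

```javascript
// Hypothetical comparison modes, named for this sketch only.
const MODES = ['newer', 'size', 'both'];

// local and remote are plain objects: { size: number, mtimeMs: number }.
// remote is null when the file does not exist on the server yet.
function shouldUpload(local, remote, mode) {
  if (!MODES.includes(mode)) throw new Error(`unknown mode: ${mode}`);
  if (remote === null) return true; // nothing to compare against
  const isNewer = local.mtimeMs > remote.mtimeMs;
  const sizeDiffers = local.size !== remote.size;
  if (mode === 'newer') return isNewer;
  if (mode === 'size') return sizeDiffers;
  return isNewer && sizeDiffers; // 'both'
}

// A freshly rebuilt file: newer timestamp, identical size.
const rebuilt = { size: 100, mtimeMs: 2000 };
const onServer = { size: 100, mtimeMs: 1000 };
console.log(shouldUpload(rebuilt, onServer, 'newer')); // true
console.log(shouldUpload(rebuilt, onServer, 'size'));  // false
```

Note the limitation of the size-only mode: an edit that leaves the byte count unchanged would be skipped.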

By the way, nice job on this project!
It saved me a lot of time :)

@mhadaily

mhadaily commented Dec 1, 2020

I have a proposal on this: what if we create a hash table of the files and upload it as something like a .state.json? Every time we want to deploy, we compare the stored hashes with the current ones and upload only the files that differ. That makes sure we upload only when there is actually a difference, and it could work in any environment.

What do you think?
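A minimal sketch of that idea, under stated assumptions: the manifest is a plain object mapping relative paths to content hashes, SHA-256 is chosen arbitrarily, and the names `buildManifest` and `changedPaths` are invented for this example. How the previous manifest (the proposed .state.json) is downloaded from and uploaded to the server is left out.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

// Walk a directory and return { "relative/path": "<sha256 hex>" }.
function buildManifest(rootDir, dir = rootDir, manifest = {}) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      buildManifest(rootDir, fullPath, manifest);
    } else if (entry.isFile()) {
      // Normalise separators so manifests match across operating systems.
      const rel = path.relative(rootDir, fullPath).split(path.sep).join('/');
      manifest[rel] = crypto
        .createHash('sha256')
        .update(fs.readFileSync(fullPath))
        .digest('hex');
    }
  }
  return manifest;
}

// Paths whose hash is missing from, or different in, the previous manifest.
function changedPaths(localManifest, previousManifest) {
  return Object.keys(localManifest)
    .filter((p) => localManifest[p] !== previousManifest[p])
    .sort();
}
```

After a successful deploy, `JSON.stringify(buildManifest(dir))` would be what gets stored as the state file. Because the comparison is on content, a rebuild that only changes timestamps would produce an empty list of changed paths.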

@simonh1000
Owner

I'm reluctant to do that @mhadaily because it creates an extra piece of state that could get out of sync. I know that I have sometimes switched to an FTP client to upload a bugfix to a specific file (perhaps when I was away from my main machine), so the state.json you propose could end up being inaccurate. The solution in the PR works for e.g. PHP sites - I think (that's why it needs testing) - but not, as @gabrielnvg notes, for projects that compile the distributed assets.

There do seem to be ways that work, but I have not tried to look at the source code of e.g. the grunt FTP client I used back in the day, which seemed to work quite nicely.

@TomFreudenberg

Hi, maybe check out this approach, which is working well on our side: #117

@simonh1000
Owner

@gabrielnvg is there any solution to your case - your build process probably also creates cache busting assets that differ by name even if the content has not changed?

@gabrielnvg

> @gabrielnvg is there any solution to your case - your build process probably also creates cache busting assets that differ by name even if the content has not changed?

If I remember correctly, on that project (3 years ago haha) the build files didn't have name generation.
So, since they had the same names, they were all replaced.

@limikael

Seems like we will never be able to agree on the one solution to rule them all. Perhaps add several of the methods that now exist as PRs and make it possible to select the comparison method with a flag?

FTP file upload isn't exactly cutting edge and the sexiest kind of project to work on, so I can understand that this is a bit stuck. It is still quite useful from time to time for many people, however. I'm unemployed and would have time to volunteer to take it on, but would like to ask for a donation in that case, I'm currently sadly in hustler lifestyle mode: https://www.buymeacoffee.com/limikael

What I would implement then would be a flag to select either #117 or #151. Should there be more methods?

(why am I suitable for the job? one of the implementations is mine)

@TomFreudenberg

Hi all,

I will just respond that PR #117 has been working on our side without any issues for years. We also sometimes use git deployments. Since git is also based on some type of separate info database (hashes) and checks the differences against remote repositories, I still believe in a file or dictionary with some kind of data about the files. Date/time and size are not suitable for detecting differences in many cases. There must be some kind of hash or key to make sure which data / files we are talking about. Given that FTP won't let us call anything on the server, I think that is the only suitable way.

Just my 2 cents

Tom

@limikael

I agree, hash is more reliable... But it requires that the file with hashes is already there on the server... If you are in a situation where you have set it up that way from the beginning, then it will be there, and everything will be fine... One might even argue that this is the majority of cases. However, in my case, there was no file there because no one had used this command before. I was tasked with updating an already existing project with a lot of files. Also, I was the only one on the team using this command; other people used regular FTP clients, and I didn't have the authority to tell them to do otherwise. In this case, it was better for me to use file date. So this is why I suggest having a flag where one can switch, using #117 where it makes sense and #151 where it makes sense. @TomFreudenberg do you think having such a switch will cause problems or confusion?

@simonh1000
Owner

simonh1000 commented May 21, 2023 via email

@TomFreudenberg

TomFreudenberg commented May 21, 2023

Hey guys, I do not see any problem in having both options.

Just to make it clear:

> But it requires that the file with hashes is already there on the server.

No, it does not have to be there. If the file is not there, that is treated the same as all files being missing or different. Another option is to delete all files on the remote in that case.

  1. Check if remote hash file exists
  2. If yes - proceed
  3. If no - copy all files to remote
  4. Last, there might be a cleanup process: look for all files on the remote that are not in the hash file and delete them. That would create a consistent copy on the remote side.

Step 4 might be optional, or partial based on a remote path regex.

P.S.: 1. to 4. is our current workflow
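The four steps above can be sketched as a pure planning function. This is only one reading of the workflow, not the code from #117: the name `planDeploy`, its parameters, and the interpretation of step 4 (delete remote files that are absent from the new hash file, optionally restricted by a path regex) are all assumptions. Fetching the remote hash file and directory listing over FTP is out of scope here.

```javascript
// localManifest: { path: hash } for the files about to be deployed.
// remoteManifest: the same shape, or null when no hash file exists remotely.
// remoteListing: paths currently on the server (the caller should leave the
//   hash file itself out of this list so it is never scheduled for deletion).
// cleanupPattern: optional non-global RegExp limiting which paths may be deleted.
function planDeploy(localManifest, remoteManifest, remoteListing = [], cleanupPattern = null) {
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  const localPaths = Object.keys(localManifest).sort();

  // Steps 1-3: with no remote hash file, every file counts as different.
  const upload = remoteManifest === null
    ? localPaths
    : localPaths.filter((p) => localManifest[p] !== remoteManifest[p]);

  // Step 4: remote files that are not part of the new deployment.
  const remove = remoteListing
    .filter((p) => !has(localManifest, p))
    .filter((p) => cleanupPattern === null || cleanupPattern.test(p))
    .sort();

  return { upload, remove };
}

// First deploy to a server that has no hash file yet.
const plan = planDeploy({ 'a.html': 'h1', 'b.html': 'h2' }, null, ['old.html', 'a.html']);
console.log(plan); // { upload: [ 'a.html', 'b.html' ], remove: [ 'old.html' ] }
```

Keeping the plan separate from the transfer makes it easy to print a dry run before anything is deleted on the server.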

@limikael

limikael commented May 22, 2023

Great!

> No, it does not have to be there. If the file is not there, that is treated the same as all files being missing or different.

So let me also clarify that I understand this. I also approve of your process and workflow. However, in the project where I used this software I didn't have the authority to tell people to use such a workflow, even if I had wanted to. Think about the scenario where I would have used it, and I would have updated the hash file, but other people in the project would have used other FTP clients. There is a potential scenario then where someone else updates a file, but they don't update the hash file. The project also included a lot of huge files, e.g. movies and such. My job was to update just a few .html files. In this particular case, file date seemed like the best option. So yeah! Being able to select the algorithm seems like a good solution!

I also see that #117 relies on a config key fileFolderHashSums, whereas #151 relies on newFilesOnly. So it seems that even with the current implementations there shouldn't be a conflict. @simonh1000 is anything preventing you from just going ahead and merging both solutions?

@TomFreudenberg

> Think about the scenario where I would have used it, and I would have updated the hash file, but other people in the project would have used other FTP clients. There is a potential scenario then where someone else updates a file, but they don't update the hash file.

OK, got it


6 participants