[Enhancement] Allow checking for alternate filenames to prevent changing extractor.*.filename resulting in redownloading everything #1673

Open
Scripter17 opened this issue Jul 5, 2021 · 4 comments

Comments

@Scripter17
Contributor

Sometimes people change the filename format their files are saved with and don't want to rename all the existing files and/or redownload everything.

So if your config starts as

{
	"extractor":{
		"twitter":{
			"filename":"{author[name]}-{tweet_id}-{num}.{extension}"
		}
	}
}

then changes to

{
	"extractor":{
		"twitter":{
			"filename":"twitter-{author[name]}-{tweet_id}-{num}.{extension}",
			"duplicate-check-better-name-pending":["{author[name]}-{tweet_id}-{num}.{extension}"]
		}
	}
}

you won't redownload all the images. Granted you'll need to check the directory but maybe you could do ["twitter/{author}/..."]
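
A minimal sketch of the check being proposed, in Python. This is not gallery-dl's actual code: `find_existing` and the list of alternate templates are hypothetical names used for illustration, and it assumes the filename templates can be rendered with plain `str.format`, which handles simple fields like `{author[name]}` but not gallery-dl's full format-string syntax.

```python
from pathlib import Path


def find_existing(directory, metadata, templates):
    """Return the first file in `directory` matching any filename template.

    `templates` holds the current format first, then the alternates.
    Hypothetical helper; plain str.format stands in for gallery-dl's formatter.
    """
    for template in templates:
        candidate = Path(directory) / template.format(**metadata)
        if candidate.is_file():
            return candidate
    return None


metadata = {
    "author": {"name": "torproject"},
    "tweet_id": 1410634591598047233,
    "num": 1,
    "extension": "jpg",
}
templates = [
    "twitter-{author[name]}-{tweet_id}-{num}.{extension}",  # current format
    "{author[name]}-{tweet_id}-{num}.{extension}",          # old format
]
```

If `find_existing(directory, metadata, templates)` returns a path, the download would be skipped; if it returns `None`, the file would be downloaded under the current format.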

Ideally you'd save all the metadata and use that to rename the files (and hope to god you didn't mess it up again), but if you're lazy and/or don't have the metadata this'd be useful

Actually while I'm at it, can we get a mass-renamer option for if we have the metadata? Something like --mass-rename [oldconfig.json] newconfig.json ./gallery-dl?

@Hrxn
Contributor

Hrxn commented Jul 5, 2021

Uh.. this sounds horribly complicated. Maybe I'm just misunderstanding something.

But please, whatever the details, I strongly suggest using the archive file feature in any case.

The whole point of the archive is to avoid any redownloads, regardless of any changes to the directory structure, filename settings, or the file data itself.

For metadata (depends significantly on what gets provided by the site, of course), simply use the metadata postprocessor.

You can set this at the base level in your config file, like in the example.

        "postprocessors": [
            {
                "name"       : "zip",
                "compression": "store",
                "extension"  : "cbz",
                "filter"     : "extension not in ('zip', 'rar')",
                "whitelist"  : ["mangadex", "exhentai", "nhentai"]
            },
            {
                "name": "metadata",
                "whitelist": ["danbooru", "yandere", "sankaku"],
                "mode": "custom"
            }
        ],

Turn on the metadata postprocessor for specific sites using the "whitelist": option.

Additional options for the metadata postprocessor begin here.
(metadata.<option-name>)

Might be of interest for you: content-format

To format the metadata to whatever you like..

There's one more thing:
You can also add your own custom metadata, the option is called keywords.

Most underrated feature, if I may say so 😄

Actually while I'm at it, can we get a mass-renamer option for if we have the metadata?

Not sure what you are trying to do exactly, but there are already dozens of decent mass-renamer programs out there.

@Scripter17
Contributor Author

Basically if I try to download [this tweet] with the following config, it'll first check if gallery-dl/twitter/torproject/torproject-1410634591598047233-1.jpg exists and, if it does, just not download it again

{
	"extractor":{
		"twitter":{
			"filename":"twitter-{author[name]}-{tweet_id}-{num}.{extension}",
			"duplicate-check-better-name-pending":["{author[name]}-{tweet_id}-{num}.{extension}"]
		}
	}
}

The mass-renamer would look through every metadata JSON file in a gallery-dl folder then rename/move it and its corresponding image(s) to match the new config rules. So it'd rename gallery-dl/twitter/torproject/torproject-1410634591598047233-1.jpg to gallery-dl/twitter/torproject/twitter-torproject-1410634591598047233-1.jpg by just reading the keywords in the JSON file.
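
A rough sketch of what such a mass-renamer could look like in Python. `mass_rename` is a hypothetical helper, not an existing gallery-dl option. It assumes each metadata file is named `<download filename>.json` and sits next to its download, and that the new filename format can be rendered with plain `str.format`.

```python
import json
from pathlib import Path


def mass_rename(root, new_template):
    """Rename each downloaded file (and its metadata file) under `root`.

    Assumes every metadata file is named "<download filename>.json" and
    sits next to the download. Hypothetical helper, not a gallery-dl API.
    """
    renamed = []
    # sorted() materializes the listing before any file gets renamed
    for meta_path in sorted(Path(root).rglob("*.json")):
        file_path = meta_path.with_suffix("")  # strip the trailing ".json"
        if not file_path.is_file():
            continue  # metadata without a matching download
        with meta_path.open(encoding="utf-8") as fp:
            keywords = json.load(fp)
        new_path = file_path.with_name(new_template.format(**keywords))
        if new_path == file_path or new_path.exists():
            continue  # already renamed, or the target name is taken
        file_path.rename(new_path)
        meta_path.rename(new_path.with_name(new_path.name + ".json"))
        renamed.append((file_path, new_path))
    return renamed
```

Calling `mass_rename("./gallery-dl", "twitter-{author[name]}-{tweet_id}-{num}.{extension}")` would then rename files in place within their current directories. Moving files between directories is left out of this sketch.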

Personally I find these much more convenient and user-friendly than archives, other renaming programs, and/or making your own renamer (as I've done and do not recommend)

Come to think of it these could be merged into one option

@github-account1111

That sounds like something you could accomplish with a script or like Hrxn said one of the bulk-renaming programs. If you use an archive file, you're free to rename the downloads however you want. You can even move or delete them, they won't be redownloaded (with skip:true or abort:N). I also imagine it's way faster to check against the archive db than comparing file names.
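
To illustrate why an archive lookup doesn't care about filenames, here's a rough sketch of an archive-style check using Python's `sqlite3`. The table layout, the helper names, and the entry string used in the usage note are assumptions for illustration, not necessarily what gallery-dl stores.

```python
import sqlite3


def open_archive(path=":memory:"):
    """Open an archive database keyed by an ID string, not by filename."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS archive (entry TEXT PRIMARY KEY)")
    return db


def already_downloaded(db, entry):
    """True if `entry` has been recorded, wherever the file lives now."""
    row = db.execute(
        "SELECT 1 FROM archive WHERE entry = ?", (entry,)
    ).fetchone()
    return row is not None


def record(db, entry):
    """Remember `entry` so later runs can skip it."""
    db.execute("INSERT OR IGNORE INTO archive (entry) VALUES (?)", (entry,))
    db.commit()
```

After `record(db, "twitter1410634591598047233_1")`, a later `already_downloaded(db, "twitter1410634591598047233_1")` is a single primary-key lookup, and renaming, moving, or deleting the file on disk doesn't change the result.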

@QPUNeptune

QPUNeptune commented Apr 22, 2022

Isn't there a way to add another column to the archive database that stores the filename, so when it sees that the file is already downloaded but the filename is different (by comparing the filename stored in the archive with the newly generated one), it just renames the file already on the computer and also stores the new name in the archive?
Edit: I don't even think there's a need to compare, just to store the generated filename so the program knows which file to rename. Basically:

  1. Try to download
  2. Is on archive
  3. Generate new filename
  4. Renames the file before it saves the newly generated filename on archive
  5. Saves the filename on archive

The compare isn't necessary because without it the file would just be renamed to itself; comparing would only add complexity.
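
The steps above could be sketched roughly like this in Python with `sqlite3`. The extra `path` column and the `skip_or_rename` helper are hypothetical; gallery-dl's archive is not known to store filenames, which is the point of the suggestion.

```python
import sqlite3
from pathlib import Path


def open_archive(path=":memory:"):
    """Open an archive that also remembers where each file was saved."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS archive (entry TEXT PRIMARY KEY, path TEXT)"
    )
    return db


def skip_or_rename(db, entry, new_path):
    """Return True if `entry` is archived, renaming its file to `new_path`."""
    row = db.execute(
        "SELECT path FROM archive WHERE entry = ?", (entry,)
    ).fetchone()
    if row is None:
        return False  # step 2 fails: not archived, so the caller downloads
    old_path, new_path = Path(row[0]), Path(new_path)  # step 3: new filename
    if old_path != new_path and old_path.is_file():
        new_path.parent.mkdir(parents=True, exist_ok=True)
        old_path.rename(new_path)  # step 4: rename the existing file
    db.execute(
        "UPDATE archive SET path = ? WHERE entry = ?", (str(new_path), entry)
    )  # step 5: store the new filename
    db.commit()
    return True
```

The `old_path != new_path` guard in this sketch is only there to skip a pointless rename call; the `is_file()` check covers files the user has moved or deleted by hand.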
