tags in filename for sankaku or other booru host #94

Closed
wankio opened this issue Jul 11, 2018 · 25 comments

Comments


wankio commented Jul 11, 2018

1. host_id_rawfilename
Can it be changed to host_id_tags? I don't see an option for this in the config file, and filenames are already limited to 255 characters.

2. Does it have a link history to avoid downloading duplicate files, like ripme?

3. Can it have a filename format like the software below?
https://github.com/Nandaka/DanbooruDownloader

Thanks 👍


mikf commented Jul 11, 2018

1. host_id_rawfilename
Can it be changed to host_id_tags? I don't see an option for this in the config file, and filenames are already limited to 255 characters.
...
3. Can it have a filename format like the software below?

You can configure the output filename and directory with the extractor.filename and extractor.directory options. To change the filename format for sankaku to "host_id_tags", you would put something like this in your config file:

{
  "extractor": {
    "sankaku": {
      "filename": "{category}_{id}_{tags}.{extension}"
    }
  }
}

2. Does it have a link history to avoid downloading duplicate files, like ripme?

gallery-dl skips downloads for files that already exist, and there is also the archive option (also available via the --download-archive command-line switch).
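
For example, a minimal config sketch that enables an archive file for sankaku (the archive path is just a placeholder; put it wherever you like):

{
  "extractor": {
    "sankaku": {
      "archive": "./gallery-dl/archive-sankaku.sqlite3"
    }
  }
}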


wankio commented Jul 11, 2018

Oh thanks, I will try that :)

Update: in config.json

"sankaku":
{
    "username": null,
    "password": null,
    "wait-min": 2.5,
    "wait-max": 5.0,
    "filename": "{category}_{id}_{tags}.{extension}"
},

it gives Errno 22 Invalid argument


mikf commented Jul 11, 2018

The config snippet you posted looks fine and should work.

Could you post the whole output when you run gallery-dl with the --verbose option? It would be helpful to know where exactly this exception occurs.


wankio commented Jul 11, 2018

I:\DOWNLOADS\Command tools>gallery-dl https://chan.sankakucomplex.com/?tags=chan_co --verbose
[gallery-dl][debug] Version 1.4.2
[gallery-dl][debug] Python 3.4.4 - Windows-10-10.0.17134
[gallery-dl][debug] requests 2.19.1 - urllib3 1.23
[gallery-dl][debug] Starting DownloadJob for 'https://chan.sankakucomplex.com/?tags=chan_co'
[sankaku][debug] Using SankakuTagExtractor for 'https://chan.sankakucomplex.com/?tags=chan_co'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): chan.sankakucomplex.com:443
[urllib3.connectionpool][debug] https://chan.sankakucomplex.com:443 "GET /?tags=chan_co&page=1 HTTP/1.1" 200 None
[urllib3.connectionpool][debug] https://chan.sankakucomplex.com:443 "GET /post/show/7024858 HTTP/1.1" 200 None
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): cs.sankakucomplex.com:443
[urllib3.connectionpool][debug] https://cs.sankakucomplex.com:443 "GET /data/64/bf/64bf0aa8829e737468e9a0a229ad0166.jpg?e=1531388877&m=Y6qa7KMsjcFbb6NBDTI6pQ HTTP/1.1" 200 695172
  .\gallery-dl\Chan.Sankaku\chan_co\7024858_2018-07-11 03... hair, white bikini, white gloves, white swimsuit, wink.jpg
[sankaku][error] Unable to download data: [Errno 22] Invalid argument: '\\\\?\\I:\\DOWNLOADS\\Command tools\\gallery-dl\\Chan.Sankaku\\chan_co\\7024858_2018-07-11 03_45_fate (series), fate_grand order, bb (fate), chan co, simple background, 1_1 aspect ratio, 1girl, asymmetrical hair, bangs, bikini, black choker, breasts, choker, clavicle, cleavage, ;d, eyebrows visible through hair, female, front-tie bikini, front-tie top, gloves, hair ornament, hair ribbon, hand on hip, hand up, large breasts, long hair, long sleeves, looking at viewer, megane, navel, one eye closed, open mouth, pointer, ponytail, purple eyes, purple hair, red ribbon, ribbon, rimless eyewear, side ponytail, side-tie bikini, smile, solo, star, swimsuit, tied hair, very long hair, white bikini, white gloves, white swimsuit, wink.jpg.part'

My new config

"filename": "{id}_{created_at}_{tags}.{extension}",
            "directory":["Chan.Sankaku","{search_tags}"],
            "archive": "./gallery-dl/archive-chan.sankaku.sqlite3"


mikf commented Jul 11, 2018

OK, that filename is way too long (670 characters) and there is currently, as also noted in #92, no way to prevent that.

I guess overly long filenames could just be cut short to fit into the 255-character limit, but a more configurable approach (like string slicing for format-string replacement fields) would be nice as well. I'll think of something ...

And, by the way: Python on Linux reports overly long filenames as OSError: [Errno 36] File name too long, so I wasn't quite sure how this error came to be. On Windows, it turns out, you get either [Errno 2] No such file or directory or [Errno 22] Invalid argument instead.


wankio commented Jul 11, 2018

That's what I was thinking: the filename gets too long because we can't limit how many tags are added to it. Anyway, thanks :)

And can it support these formats too?

- %provider%          = Provider Name
- %id%                = Image ID
- %tags%              = Image Tags
- %rating%            = Image Rating
- %md5%               = MD5 Hash
- %artist%            = Artist Tag
- %copyright%         = Copyright Tag
- %character%         = Character Tag
- %circle%            = Circle Tag, yande.re extension
- %faults%            = Faults Tag, yande.re extension
- %originalFilename%  = Original Filename
- %searchtag%         = Search Tag


mikf commented Jul 11, 2018

All of these fields are already available, but under different names.

  • %provider% -> {category}
  • %id% -> {id}
  • %tags% -> {tags} (or {tag_string} on danbooru)
  • %rating% -> {rating}
  • %originalFilename% -> {name}.{extension}

and so on. The exact names depend on the booru board in question, as gallery-dl is just using the API responses without much modification. Take a look at the output with -K to get a complete list of replacement field names.

To enable {tags_artist}, {tags_character} and so on, you need to set extractor.*.tags to true.
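
For example, you could first dump the available fields for one of your searches with -K and then build a filename from the tag categories. A sketch (the filename format itself is just an illustration, not a recommendation):

gallery-dl -K "https://chan.sankakucomplex.com/?tags=chan_co"

{
  "extractor": {
    "sankaku": {
      "tags": true,
      "filename": "{category}_{id}_{tags_artist}.{extension}"
    }
  }
}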


wankio commented Jul 12, 2018

  1. So after you add the option to prevent long filenames, I just need to add tags: true in the extractor {sankaku..} section to enable artist/character?

  2. Can gallery-dl use a search_tags value like [tags]+date:<=yyyy.mm.dd?
    After 1000 results are downloaded you can't download any more, so you need to add +date:<=yyyy.mm.dd after the tag to get more than 1000 results. The yyyy.mm.dd is created_at, I think.

Compared with DanbooruDownloader and the others, I think gallery-dl is better:
1 - low memory usage (I think because it uses only one thread instead of multiple threads)
2 - archive (skips already-downloaded IDs) (ripme has it, but DanbooruDownloader and the others don't)
3 - batch download from a pastebin (ripme can rip from the clipboard, DanbooruDownloader can't)

mikf added a commit that referenced this issue Jul 14, 2018

mikf commented Jul 14, 2018

So after you add the option to prevent long filenames, I just need to add tags: true in the extractor {sankaku..} section to enable artist/character?

Yes, but it would be easier to enable this option for all boorus by just setting extractor.tags to true. Otherwise you would have to enable it for each site individually, i.e. extractor.sankaku.tags, extractor.gelbooru.tags, and so on.

Concerning filename lengths: you can now (since 8fe9056) slice values in format strings.
{tags[:200]} would limit it to 200 characters max - everything after that will be cut off.

Can gallery-dl use a search_tags value like [tags]+date:<=yyyy.mm.dd?
After 1000 results are downloaded you can't download any more, so you need to add +date:<=yyyy.mm.dd after the tag to get more than 1000 results. The yyyy.mm.dd is created_at, I think.

It can, but that's not necessary if you want to go past 1000 results / page 50. You don't even need to provide username and password if you want to go past page 25. Being logged in only lets you use more than 5 tags at once and allows you to jump to higher page numbers faster (with --range 800-, for example)
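
For example, a config sketch that combines the global tags option with a sliced tags field (the 200-character cap is arbitrary; adjust it to your filesystem):

{
  "extractor": {
    "tags": true,
    "sankaku": {
      "filename": "{id}_{created_at}_{tags[:200]}.{extension}",
      "directory": ["Chan.Sankaku", "{search_tags}"]
    }
  }
}

And to jump ahead in a large search:

gallery-dl --range 800- "https://chan.sankakucomplex.com/?tags=chan_co"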


ghost commented Jul 21, 2018

[danbooru][error] An unexpected error occurred: AttributeError - 'list' object has no attribute 'startswith'.

Edit: this is my first post here. Am I doing it right?


mikf commented Jul 21, 2018

You should open a new issue, post the URL in question and, if possible, the complete error output with --verbose.


wankio commented Jul 22, 2018

OK, I will test it soon :)


wankio commented Jul 29, 2018

Yes, but it would be easier to enable this option for all boorus by just setting extractor.tags to true. Otherwise you would have to enable it for each site individually, i.e. extractor.sankaku.tags, extractor.gelbooru.tags, and so on.

Concerning filename lengths: you can now (since 8fe9056) slice values in format strings.
{tags[:200]} would limit it to 200 characters max - everything after that will be cut off.

     "sankaku":
        {
            "username": null,
            "password": null,
            "wait-min": 2.5,
            "wait-max": 5.0,
            "filename": "{tags_artist}_{tags[:200]}_{id}_{created_at}_.{extension}",
            "directory":["Chan.Sankaku","{search_tags}"],
            "tags": true      
        },

[sankaku][error] Applying filename format string failed: TypeError: string indices must be integers

Even when I don't set {tags}, gallery-dl still only sets the filename to {id}_{created_at}_.{extension} instead of {tags_artist}_{id}_{created_at}_.{extension}

It can, but that's not necessary if you want to go past 1000 results / page 50. You don't even need to provide username and password if you want to go past page 25. Being logged in only lets you use more than 5 tags at once and allows you to jump to higher page numbers faster (with --range 800-, for example)

So if I input tags that have more than 1000 results, will it keep downloading until there is nothing left to download?


mikf commented Jul 29, 2018

[sankaku][error] Applying filename format string failed: TypeError: string indices must be integers

Even when I don't set {tags}, gallery-dl still only sets the filename to {id}_{created_at}_.{extension} instead of {tags_artist}_{id}_{created_at}_.{extension}

You are using version 1.4.2 and not the latest git snapshot. The {tags[:200]} slicing and the tags option for sankaku haven't been "officially" released yet. Do a pip install --upgrade https://github.com/mikf/gallery-dl/archive/master.zip and try again.

So if I input tags that have more than 1000 results, will it keep downloading until there is nothing left to download?

Yes, it only stops after downloading all search results, but you can set a custom upper limit with, again, the --range option.
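
For example, using the search URL from earlier in this thread, this would stop after the first 1000 results:

gallery-dl --range 1-1000 "https://chan.sankakucomplex.com/?tags=chan_co"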


wankio commented Jul 29, 2018

Oh nice, thank you. I installed the Python version and it worked :)

I also just tested hosting a local file and using r:link to batch download, and wow, it works too :)

mikf added a commit that referenced this issue Jul 29, 2018
The L option allows for the contents of a format field to be replaced
with <replacement> if its length is greater than <maxlen>.

Example:
{f:L5/too long/} -> "foo"      (if "f" is "foo")
                 -> "too long" (if "f" is "foobar")

(#92) (#94)

mikf commented Jul 29, 2018

hosting a local file and using r:link to batch download

  -i, --input-file FILE     Download URLs found in FILE

And to quote myself from the other issue:
You can now use the L format specifier to set a replacement if the format field value is too long. For example {tags:L100/too many tags/} (e0dd8df).
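
So a filename format along these lines should keep the tag list from overflowing the 255-character limit (a sketch; the 100-character cap and the replacement text are arbitrary):

{
  "extractor": {
    "sankaku": {
      "filename": "{id}_{created_at}_{tags:L100/too many tags/}.{extension}"
    }
  }
}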

mikf closed this as completed Jul 29, 2018

wankio commented Jul 29, 2018

Thanks, so I need to update gallery-dl again?


mikf commented Jul 29, 2018

Only if you want to use the L format specifier feature.


wankio commented Aug 1, 2018

  • Oh, today it stopped working after 3 hours... no error, it just stopped downloading. The command window is still processing, but it hasn't downloaded any new link in 3 hours (I checked the website, still no error).

  • And with the archive option in the sankaku extractor, why does it feel so slow to check already-downloaded links? wait-min/max are 2/5, but sometimes it waits 8-10 or maybe 20+ seconds just to check files.


mikf commented Aug 1, 2018

Oh, today it stopped working after 3 hours... no error, it just stopped downloading.

Hmm, there is a slim possibility that an HTTP request "gets stuck" and the client waits forever for a reply from the remote server. Some HTTP requests sent by gallery-dl don't have a timeout, for some reason, so it probably happened with one of those. Fixing this should be easy. In the meantime: Ctrl+C and try again.

why does it feel so slow to check already-downloaded links

Because it has to get the download URL and metadata before it can check whether a file has already been downloaded (same as youtube-dl). It doesn't help that Sankaku itself is incredibly slow: you have to wait 2-5 seconds before each HTTP request (to avoid 429 Too Many Requests errors) and then wait for the request itself to finish, which might take another 5 seconds.

When using sankaku, you should really use the --range command-line option when necessary, as it allows the extractor to quickly jump ahead. gallery-dl --range 250- URL... will immediately jump to image no. 250 and start from there.


wankio commented Aug 2, 2018

Yeah... it's easy to fix with the --range option you told me about.

Being logged in only lets you use more than 5 tags at once and allows you to jump to higher page numbers faster (with --range 800-, for example)

5 tags at once, you mean 5 tags combined, like ?tags=dynasty_warriors brown_hair china_dress female shoes, right?

Because it has to get the download URL and metadata before it can check whether a file has already been downloaded (same as youtube-dl). It doesn't help that Sankaku itself is incredibly slow: you have to wait 2-5 seconds before each HTTP request (to avoid 429 Too Many Requests errors) and then wait for the request itself to finish, which might take another 5 seconds.

Is it normal that it sometimes waits 15-20 seconds?

When using sankaku, you should really use the --range command-line option when necessary, as it allows the extractor to quickly jump ahead. gallery-dl --range 250- URL... will immediately jump to image no. 250 and start from there.

So I need to count the downloaded files and compare with the tag's total result count to know the exact range to put in, right?

It should have a feature to stop once it reaches already-downloaded files (so it only downloads newer pictures and stops as soon as it hits a file that's already there, if the extractor archive option is enabled).


mikf commented Aug 2, 2018

Yeah... it's easy to fix with the --range option you told me about.

That is not what I meant. I wanted to say "It's easy for me to add a timeout to regular HTTP requests, so it doesn't get stuck anymore" -> 68d6033

5 tags at once, you mean 5 tags combined, like ?tags=dynasty_warriors brown_hair china_dress female shoes, right?

Right.

Is it normal that it sometimes waits 15-20 seconds?

Not really, no. It might be the case that the wait-min/-max default values are too low and you are getting 429 Too Many Requests responses from sankaku. In that case gallery-dl retries the original request after waiting for a bit, but it can take quite a while until sankaku sends a normal response.

You can enable verbose output (-v) to see what goes on behind the scenes. If you encounter anything 429 related, increase wait-min/-max until this doesn't happen anymore.
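
For example, something like this in the sankaku section of your config should space the requests out more (the exact values are a guess; keep raising them until the 429s stop):

"sankaku":
{
    "wait-min": 5.0,
    "wait-max": 10.0
}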

So I need to count the downloaded files and compare with the tag's total result count to know the exact range to put in, right?

Your computer can count the downloaded files for you, and you don't need the exact range anyway; the start index is enough.

--range 200-300 will download anything from 200 to 300, but you can omit the end index (--range 200-) to download from 200 to the end, or the start index (--range -300) to download up to 300.
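
If you want a rough count of what is already on disk, something like this from a Windows command prompt should do it (a sketch, assuming the gallery-dl\Chan.Sankaku\chan_co layout from your log):

dir /b /a-d "gallery-dl\Chan.Sankaku\chan_co" | find /c /v ""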

It should have a feature to stop once it reaches already-downloaded files (so it only downloads newer pictures and stops as soon as it hits a file that's already there, if the extractor archive option is enabled).

  --abort-on-skip           Abort extractor run if a file download would
                            normally be skipped, i.e. if a file with the same
                            filename already exists

or the extractor.skip option
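
In config form, something along these lines (a sketch; the "abort" value here assumes that is what the skip option accepts for this behavior):

{
  "extractor": {
    "sankaku": {
      "skip": "abort"
    }
  }
}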


wankio commented Aug 5, 2018

thank you

  • Can we have an option to download the sample if the original file's dimensions are too big, depending on width or height?
  • Some files are 9000-10000px wide; if we could limit the maximum width (3000, maybe), it would download the sample instead.


mikf commented Aug 5, 2018

Not going to happen.
You can download the original and then down-sample it yourself, or ignore it with --filter.
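
For example, to skip anything wider than 3000 pixels (a sketch; this assumes the post metadata exposes a width field, which you can check with -K):

gallery-dl --filter "width <= 3000" "https://chan.sankakucomplex.com/?tags=chan_co"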

You should also open a new issue if you want to suggest a new feature. This one here is closed for a reason.


wankio commented Aug 5, 2018

OK, thanks.
