Document how to use as library #642
Comments
Setting config options should be done via the functions in config.py. For example:
from gallery_dl import config, job
config.load() # load default config files
config.set(("extractor",), "base-directory", "/tmp/")
config.set(("extractor", "imgur"), "filename", "{id}{title:?_//}.{extension}")
for url in urls:
    job.DownloadJob(url).run()
|
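To see how the path tuple and key in the config.set() calls above map onto a nested configuration, here is a minimal stand-alone sketch. CONFIG and set_option are hypothetical stand-ins written for illustration, not gallery-dl's code; the assumed behavior (one nested dict level per path element) is inferred from the nested 'output' dictionary printed further down in this thread.

```python
# Hypothetical stand-in for a nested configuration store (not gallery-dl's own code).
CONFIG = {}


def set_option(path, key, value, conf=CONFIG):
    """Walk `path`, creating one dict level per element, then store `key`."""
    for name in path:
        conf = conf.setdefault(name, {})
    conf[key] = value


# Same arguments as the config.set() calls in the comment above.
set_option(("extractor",), "base-directory", "/tmp/")
set_option(("extractor", "imgur"), "filename", "{id}{title:?_//}.{extension}")

print(CONFIG)
```

Under this assumption, an option set on ("extractor", "imgur") sits one level below options set on ("extractor",), which matches the layout of the JSON config files.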
Thank you @mikf, it helped a lot. For others reading this issue: to find out which options you need to set, use the two config examples (1 and 2) together with the options description. Here are some options I've set:
config.set(('extractor',), "archive", '~/.gallery-dl/archive.sql')
config.set(('extractor',), "base-directory", '~/downloads')
config.set(('extractor', 'deviantart'), "image-range", '1-10')
config.set(('extractor', 'deviantart'), "flat", False)
config.set(('extractor', 'deviantart'), "metadata", True)
config.set(
    ('extractor',),
    'postprocessors',
    [
        {
            "name": "metadata",
            "mode": "json",
        }
    ]
)
I'm still unable to configure the output. What am I doing wrong?
config.set(('output',), 'mode', 'terminal')
config.set(
    ('output',),
    'log',
    {
        "level": "info",
        "format": {
            "debug": "\u001b[0;37m{name}: {message}\u001b[0m",
            "info": "\u001b[1;37m{name}: {message}\u001b[0m",
            "warning": "\u001b[1;33m{name}: {message}\u001b[0m",
            "error": "\u001b[1;31m{name}: {message}\u001b[0m"
        }
    },
)
config.set(
    ('output',),
    'logfile',
    {
        "path": "log.txt",
        "mode": "w",
        "level": "debug"
    },
)
config.set(
    ('output',),
    "unsupportedfile",
    {
        "path": "unsupported.txt",
        "mode": "a",
        "format": "{asctime} {message}",
        "format-date": "%Y-%m-%d-%H-%M-%S"
    },
)
It produces the following:
'output': {'log': {'format': {'debug': '\x1b[0;37m{name}: {message}\x1b[0m',
                              'error': '\x1b[1;31m{name}: {message}\x1b[0m',
                              'info': '\x1b[1;37m{name}: {message}\x1b[0m',
                              'warning': '\x1b[1;33m{name}: {message}\x1b[0m'},
                   'level': 'info'},
           'logfile': {'level': 'debug', 'mode': 'w', 'path': 'log.txt'},
           'mode': 'auto',
           'unsupportedfile': {'format': '{asctime} {message}',
                               'format-date': '%Y-%m-%d-%H-%M-%S',
                               'mode': 'a',
                               'path': 'unsupported.txt'}}}
Which is similar to the config example, but neither
Thanks |
All logging output is done via Python's logging module. You can use that to configure and attach your own handlers to the root logger. Take a look at the functions in gallery_dl's output module. For example:
import logging
from gallery_dl import output
# initialize logging and set up the logging handler for stderr
output.initialize_logging(logging.INFO)
# apply config options to the stderr handler and create the file handler
output.configure_logging(logging.INFO)
# create the unsupported-file handler
output.setup_logging_handler("unsupportedfile", fmt="{message}")
|
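As a rough illustration of attaching your own handler to the root logger, the following sketch uses only the standard library. It captures formatted log lines in an in-memory buffer, using the same "{"-style placeholders as the format strings in this thread. The logger name "example" and the message text are made up for the demonstration.

```python
import io
import logging

# Collect log output in memory instead of writing to stderr or a file.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setLevel(logging.INFO)
# style="{" enables "{name}"-style placeholders in the format string.
handler.setFormatter(logging.Formatter("[{levelname}] {name}: {message}", style="{"))

root = logging.getLogger()
root.setLevel(logging.INFO)
root.addHandler(handler)

# Any logger that propagates to the root logger now reaches the handler.
logging.getLogger("example").info("download finished")

captured = buffer.getvalue()
print(captured, end="")
```

Because the handler sits on the root logger, it should see messages from every named logger that propagates, whichever library created that logger.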
@mikf would you accept a PR documenting how to do this? |
@rpdelaney Sure. I'd be happy about any sort of contribution, especially documentation. Let me know if you need anything or if I should explain how certain things (are supposed to) work. |
😅 I tried understanding this without success. For a split async moment, I simply use |
config example
{
    "output": {
        "log": { "level": "debug" },
        "#": "write logging messages to a separate file",
        "logfile": { "path": "/home/user/log.log", "mode": "a", "level": "debug" },
        "#": "write unrecognized URLs to a separate file",
        "unsupportedfile": { "path": "/home/user/unsupported.log", "mode": "a" }
    }
}
import logging
from gallery_dl import config, output
from gallery_dl.exception import NoExtractorError
from gallery_dl.extractor.common import get_soup
from gallery_dl.job import DataJob
# load config before setting up logging
config.load()
# initialize logging and set up the logging handler for stderr
output.initialize_logging(logging.DEBUG)
# apply config options to stderr handler and create file handler
output.configure_logging(logging.DEBUG)
# create unsupported-file handler
output.setup_logging_handler("unsupportedfile", fmt="{message}")
url = 'https://www.reddit.com/r/Hololive/comments/rcqpgr/'
job = DataJob(url)
job.run()
# process `job.data`
If you want to suppress the printed output:
import os
with open(os.devnull, "w") as f:
    job.file = f
    job.run()
|
Does anyone of you know how to output all urls, like with the -g flag, but as a Python list and not on the stdout? Any ideas? |
Have you tried the last code example? After job.run(), you can get the output from job.data. |
@53845714nF just copy the class
from gallery_dl.job import Job

class UrlJob(Job):
    def __init__(self, url, parent=None):
        Job.__init__(self, url, parent)
        self.urls = []

    def handle_url(self, url, _):
        self.urls.append(url)
Accessing URLs afterwards is then just
>>> j = UrlJob("imgur.com/asdqwe")
>>> j.run()
0
>>> j.urls
['https://i.imgur.com/asdqw.jpg']
|
@mikf Awesome, works for me. 😘 And thanks for your work, it is a great program. Hint: |
I've managed to create a job that acts like a Python generator. It is useful when you need to extract a large number of posts, especially from Pixiv or E-Hentai, because the job yields each post right after it is extracted and groups images by post. Essentially, you can just iterate over posts and their URLs.
Implementation
from itertools import groupby
from operator import itemgetter
from gallery_dl.extractor.message import Message
from gallery_dl.job import Job
from gallery_dl.util import build_duration_func
from gallery_dl.exception import StopExtraction
# https://stackoverflow.com/questions/12775449/group-an-iterable-by-a-predicate-in-python
def igroup(iterable, isstart):
    """
    Turn [header, data1, data2, header, data3, data4, data5, header, header, ...]
    into [
        (header, [data1, data2]),
        (header, [data3, data4, data5]),
        (header, []),
        (header, []),
        ...
    ]
    """
    def key(item, count=[False]):
        if isstart(item):
            count[0] = not count[0]  # start new group
        return count[0]

    for xs in map(itemgetter(1), groupby(iterable, key)):
        header = next(xs)
        yield header, xs
class GeneratorJob(Job):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.dispatched = False

    def _run(self):
        extractor = self.extractor
        sleep = build_duration_func(extractor.config("sleep-extractor"))
        if sleep:
            extractor.sleep(sleep(), "extractor")
        try:
            for msg in extractor:
                self.dispatch(msg)
                if self.dispatched:
                    yield msg
                    self.dispatched = False
        except StopExtraction:
            pass

    def run(self):
        message_generator = self._run()
        for post_mes, url_mess in igroup(
            message_generator, lambda msg: msg[0] == Message.Directory
        ):
            post = post_mes[1]
            urls = map(lambda mes: (mes[1], mes[2]), url_mess)
            yield (post, urls)

    def handle_url(self, url, kwdict):
        self.dispatched = True
Example usage
for post_dict, image_infos in GeneratorJob("https://www.pixiv.net/en/users/3143520/illustrations").run():
    print(post_dict)
    # Note: you must completely consume image_infos each time.
    for image_url, image_dict in image_infos:
        print(image_url)
        print(image_dict)
    print()
The example URL is SFW.
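To make the grouping step concrete, here is a small stand-alone sketch that runs without gallery-dl. The message tuples and example.org URLs are made up, and this igroup is a simplified variant of the one above: it collects each group into a list, so callers do not have to consume one group before moving on to the next.

```python
from itertools import groupby


def igroup(iterable, isstart):
    """Yield (header, [items...]) pairs; a new group starts at each header."""
    state = [False]

    def key(item):
        if isstart(item):
            state[0] = not state[0]  # flip the key so groupby starts a new group
        return state[0]

    for _, group in groupby(iterable, key):
        header = next(group)
        yield header, list(group)


# Made-up stand-ins for extractor messages: ("directory", metadata) or ("url", address).
messages = [
    ("directory", {"id": 1}),
    ("url", "https://example.org/1a.jpg"),
    ("url", "https://example.org/1b.jpg"),
    ("directory", {"id": 2}),
    ("directory", {"id": 3}),
    ("url", "https://example.org/3a.jpg"),
]

posts = [
    (header[1], [item[1] for item in items])
    for header, items in igroup(messages, lambda msg: msg[0] == "directory")
]
print(posts)
```

Collecting each group into a list costs some memory per post, but it removes the ordering constraint mentioned in the usage note.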
|
I've done a lot of playing with gallery_dl over the past few days, but I have hit a snag. I'm trying to set up a custom function that runs whenever there's an error or log message I don't like. yt_dlp lets me do this by adding a custom logger, and I've tried a couple dozen ways of reading the log output from gallery_dl the same way. I don't want to write this to a file, which makes the current output options irrelevant. If anyone has experience with this, I'd love to see an example of how to do this properly. I appreciate it. I'll post the code I have later, but I realized it's quite impolite to ask for help without contributing something. I've written a series of config changes that run at the beginning of my code.
import gallery_dl

gallery_dl.config.load()
# Set global config settings for GalleryDL Temporarily
gallery_dl.config.set(('extractor',), "archive", '/imgur/archive/imgur.sql')
gallery_dl.config.set(('extractor',), "base-directory", '/imgur/archive')
gallery_dl.config.set(('extractor',), "sleep", 1 )
gallery_dl.config.set(('extractor',), "http-timeout", 5 )
# Set Direct link extractor settings
gallery_dl.config.set(('extractor', 'directlink'), "archive", '/imgur/archive/imgur.sql')
gallery_dl.config.set(('extractor', 'directlink'), "archive-prefix", 'imgur.com, ')
gallery_dl.config.set(('extractor', 'directlink'), "archive-format", 'image, {filename}')
gallery_dl.config.set(('extractor', 'directlink'), "archive-pragma", [ "journal_mode=WAL", "synchronous=NORMAL" ] )
gallery_dl.config.set(('extractor', 'directlink'), "base-directory", '/imgur/archive/')
gallery_dl.config.set(('extractor', 'directlink'), "directory", [ "image" ] )
gallery_dl.config.set(('extractor', 'directlink'), "filename", '{filename}.{extension!l}')
gallery_dl.config.set(('extractor', 'directlink'), "sleep", 1 )
# Set Imgur Extractor Settings
gallery_dl.config.set(('extractor', 'imgur'), "archive", '/imgur/archive/imgur.sql')
gallery_dl.config.set(('extractor', 'imgur'), "archive-prefix", 'imgur.com, ')
gallery_dl.config.set(('extractor', 'imgur'), "archive-format", '{subcategory}, {id}')
gallery_dl.config.set(('extractor', 'imgur'), "archive-pragma", [ "journal_mode=WAL", "synchronous=NORMAL" ] )
gallery_dl.config.set(('extractor', 'imgur'), "base-directory", '/imgur/archive/')
gallery_dl.config.set(('extractor', 'imgur'), "filename", '{id|filename}.{extension!l}')
# Set other Imgur Extractor Settings
gallery_dl.config.set(('extractor', 'imgur'), "image", { "directory" : [ "image" ] })
gallery_dl.config.set(('extractor', 'imgur'), "album", { "directory" : [ "album", "{album['id']}" ] })
gallery_dl.config.set(('extractor', 'imgur'), "favorite", { "directory" : [ "favorite" ] })
gallery_dl.config.set(('extractor', 'imgur'), "gallery", { "directory" : [ "gallery" ] })
gallery_dl.config.set(('extractor', 'imgur'), "search", { "directory" : [ "search" ] })
gallery_dl.config.set(('extractor', 'imgur'), "subreddit", { "directory" : [ "subreddit" ] })
gallery_dl.config.set(('extractor', 'imgur'), "tag", { "directory" : [ "tag" ] })
gallery_dl.config.set(('extractor', 'imgur'), "user", { "directory" : [ "user" ] })
# Set Downloader Settings
gallery_dl.config.set(('downloader',), 'mtime', True)
# Set postprocessor settings globally
gallery_dl.config.set(('extractor',),
    'postprocessors',
    [
        {
            "name": "metadata",
            "mode": "json",
            "extension": "json",
            "extension-format": "{extension!l}.json",
            "event": "file",
            "mtime": True
        }
    ])
|
I've kept hammering at it. This started from a short snippet on Stack Overflow that does in fact work in its entirety with a logger of
I was looking at output.py and realized gallery-dl creates a basic logger object of
I may be mistaken in how that is actually returned or created; in that case I apologize.
import logging
import gallery_dl

class MyLogger(logging.Handler):
    #def handle(*args):
    def emit(*args):
        print('Custom Handler')
        for item in args:
            print(item)

gallery_dl.output.initialize_logging(logging.INFO)
gallery_dl.output.configure_logging(logging.INFO)
logging.getLogger('gallery-dl').addHandler(MyLogger(logging.INFO))
gallery_dl.job.DownloadJob("imgur.com/8372ne").run()
I've included a junk command at the end that will fail, as an example of something I want to run a custom handler on within my embedded library use. |
I've tried grabbing stderr with StringIO, with no success. I came up with this after digging around the code.
class MyLogger(logging.Handler):
    #def handle(*args):
    def emit(*args, **kwags):
        print('--- args ---')
        for item in args:
            print(item)
        print('---')

# --- main ---
logger = logging.getLogger('imgur')
logger.setLevel(logging.ERROR)
my_logger = MyLogger(logging.ERROR)
logger.addHandler(my_logger)
gallery_dl.job.DownloadJob("imgur.com/8372ne").run()
However, the results I get, compared with what's printed to stdout/stderr, are completely useless.
There's no info as to the type of error, just that there's an error. |
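One likely reason the printed items look uninformative is that printing a LogRecord object shows its repr rather than the formatted text. The sketch below, standard library only, shows a handler whose emit() takes (self, record) and passes the rendered message to a callback. CallbackHandler, the callback, and the error text are hypothetical names chosen for illustration; the logger name 'imgur' is taken from the snippet above.

```python
import logging


class CallbackHandler(logging.Handler):
    """Pass each record's logger name, level name, and rendered text to a callback."""

    def __init__(self, callback, level=logging.ERROR):
        super().__init__(level)
        self.callback = callback

    def emit(self, record):
        # getMessage() merges the format string with its arguments.
        self.callback(record.name, record.levelname, record.getMessage())


seen = []
handler = CallbackHandler(lambda name, level, text: seen.append((name, level, text)))
# Attached to the root logger so records from any propagating logger arrive here.
logging.getLogger().addHandler(handler)

logging.getLogger("imgur").error("HTTP %d while fetching %s", 404, "example")
logging.getLogger("imgur").warning("below the handler's level, so not passed on")

print(seen)
```

From the callback you can branch on the level or on the text and trigger whatever custom behavior the embedding program needs.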
@pink-red's code wasn't working for me (I only tried version 1.25.5 - Git HEAD: 915d868). Modifying it like this seems to work for my purposes:
from gallery_dl.extractor.message import Message
from gallery_dl.job import Job
from gallery_dl.util import build_duration_func
from gallery_dl.exception import StopExtraction
class GeneratorJob(Job):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.dispatched = False
        self.visited = set()
        self.status = 0

    def message_generator(self):
        extractor = self.extractor
        sleep = build_duration_func(extractor.config("sleep-extractor"))
        if sleep:
            extractor.sleep(sleep(), "extractor")
        try:
            for msg in extractor:
                self.dispatch(msg)
                if self.dispatched:
                    yield msg
                    self.dispatched = False
        except StopExtraction:
            pass

    def run(self):
        for msg in self.message_generator():
            ident, url, kwdict = msg
            if ident == Message.Url:
                yield (msg[1], msg[2])
            elif ident == Message.Queue:
                if url in self.visited:
                    continue
                self.visited.add(url)
                cls = kwdict.get("_extractor")
                if cls:
                    extr = cls.from_url(url)
                else:
                    extr = self.extractor.find(url)
                if extr:
                    job = self.__class__(extr, self)
                    for webpath, info in job.run():
                        yield (webpath, info)
            else:
                raise TypeError

    def handle_url(self, url, kwdict):
        self.dispatched = True

    def handle_queue(self, url, kwdict):
        self.dispatched = True
|
Is it just me or is the library not usable in Jupyter Notebook? Importing gallery_dl fails with:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[2], line 1
----> 1 import gallery_dl
File ~/Downloads/New Folder/.venv/lib/python3.11/site-packages/gallery_dl/__init__.py:11
9 import sys
10 import logging
---> 11 from . import version, config, option, output, extractor, job, util, exception
13 __author__ = "Mike Fährmann"
14 __copyright__ = "Copyright 2014-2023 Mike Fährmann"
File ~/Downloads/New Folder/.venv/lib/python3.11/site-packages/gallery_dl/option.py:14
12 import logging
13 import sys
---> 14 from . import job, util, version
17 class ConfigAction(argparse.Action):
18 """Set argparse results as config values"""
File ~/Downloads/New Folder/.venv/lib/python3.11/site-packages/gallery_dl/job.py:15
13 import collections
14 from . import extractor, downloader, postprocessor
---> 15 from . import config, text, util, path, formatter, output, exception, version
16 from .extractor.message import Message
17 from .output import stdout_write
File ~/Downloads/New Folder/.venv/lib/python3.11/site-packages/gallery_dl/output.py:257
253 sys.stderr.write(s)
254 sys.stderr.flush()
--> 257 if sys.stdout.line_buffering:
258 def stdout_write(s):
259 sys.stdout.write(s)
AttributeError: 'OutStream' object has no attribute 'line_buffering'
Importing in a normal
(Note: I am using Python 3.11.6, ipykernel 6.26.0 and gallery-dl 1.26.1) |
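The traceback shows output.py reading sys.stdout.line_buffering at import time, an attribute that the notebook's OutStream object does not provide. A common defensive pattern for this kind of check is getattr() with a default. The sketch below demonstrates the pattern on a stand-in stream. NotebookLikeStream and make_writer are hypothetical names, and this is neither gallery-dl's code nor necessarily how the library handles the problem.

```python
class NotebookLikeStream:
    """Hypothetical stand-in for a stream object without `line_buffering`."""

    def __init__(self):
        self.chunks = []
        self.flushes = 0

    def write(self, text):
        self.chunks.append(text)
        return len(text)

    def flush(self):
        self.flushes += 1


def make_writer(stream):
    """Return a write function; flush manually unless the stream is line-buffered."""
    # getattr() with a default avoids AttributeError on streams lacking the attribute.
    if getattr(stream, "line_buffering", False):
        def write(text):
            stream.write(text)
    else:
        def write(text):
            stream.write(text)
            stream.flush()
    return write


stream = NotebookLikeStream()
write = make_writer(stream)
write("hello\n")
print(stream.chunks, stream.flushes)
```

Until the import itself works in a notebook, running the gallery-dl calls from a separate script is one way to sidestep the issue.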
I'm using gallery-dl for downloading Instagram reels. Thanks to @mikf I managed to get the list of links, but how could I also get the shortcode from each reel?
but for some reason it doesn't get the first reel's shortcode |
File shortcodes ( |
What do you mean? The code above does get me all the necessary information except for the first reel. |
I am using the script and am having a problem with checking whether a file has already been downloaded.
from gallery_dl import config, job
config.set((), 'base-directory', './')
config.set(('extractor', 'instagram'), 'filename', '{date:%Y_%m_%d_%H_%M_%S}_{post_id}_{num}.{extension}')
config.set(('extractor', 'instagram'), 'archive', '.archives/{category}.sqlte3')
config.set(('extractor', 'instagram'), 'archive-format', '{date:%Y_%m_%d_%H_%M_%S}_{post_id}_{num}')
config.set(('extractor', 'instagram', 'posts'), 'directory', ['instagram', '{username}', 'Posts'])
job.DownloadJob('https://www.instagram.com/username').run() |
Did you check the directory you set? What are you expecting to happen? If you're looking for logging output, check mikf's post and rachmadaniHaryono's post about configuring logging from further up if you haven't already. |
I am expecting that in the next runs, it will not overwrite the files it has previously downloaded. I will refer to the source you suggested. Thank you for spending your time |
Yes, your assumptions should be correct so far. |
Yeah, it worked. I tried deleting all downloaded files and it won't download them again. But it printed the output as |
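The behavior described here (a later run skips anything already recorded in the archive, even if the files were deleted) can be illustrated with a small SQLite-backed sketch. DownloadArchive, download_all, and the table layout are assumptions made for this illustration and are not gallery-dl's implementation; the entry strings stand in for values built from an 'archive-format' string.

```python
import sqlite3


class DownloadArchive:
    """Hypothetical archive: remembers entry IDs so later runs can skip them."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS archive (entry TEXT PRIMARY KEY)"
        )

    def check(self, entry):
        row = self.conn.execute(
            "SELECT 1 FROM archive WHERE entry = ? LIMIT 1", (entry,)
        ).fetchone()
        return row is not None

    def add(self, entry):
        self.conn.execute(
            "INSERT OR IGNORE INTO archive (entry) VALUES (?)", (entry,)
        )
        self.conn.commit()


def download_all(entries, archive):
    """Pretend to download; return (downloaded, skipped) entry lists."""
    downloaded, skipped = [], []
    for entry in entries:
        if archive.check(entry):
            skipped.append(entry)
        else:
            archive.add(entry)  # record the entry once it has been handled
            downloaded.append(entry)
    return downloaded, skipped


archive = DownloadArchive()
first_run = download_all(["post_1", "post_2"], archive)
second_run = download_all(["post_1", "post_2", "post_3"], archive)
print(first_run, second_run)
```

Note that the skip decision depends only on the archive entry, not on whether the file still exists on disk, which matches what was observed after deleting the downloads.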
I want to filter by specific dates in my code. I know there is a way to do it on the command line with --filter |
I also want to filter by specific dates in my code. I know there is a way to do it on the command line with --filter |
Use this set |
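On the command line, --filter takes a Python expression that is evaluated against each file's metadata. My understanding is that the matching config key is "image-filter" under "extractor", but treat that as an assumption and check the options description. The stand-alone sketch below only illustrates the idea of evaluating such an expression against a metadata dict; build_filter and the sample metadata are hypothetical, and eval() should never be used on untrusted input.

```python
from datetime import datetime


def build_filter(expression):
    """Compile a Python expression and return a predicate over metadata dicts."""
    code = compile(expression, "<filter>", "eval")

    def predicate(metadata):
        # Metadata keys become variable names; `datetime` is provided for comparisons.
        namespace = {"datetime": datetime}
        namespace.update(metadata)
        return bool(eval(code, {"__builtins__": {}}, namespace))

    return predicate


# Made-up metadata for two files.
files = [
    {"id": 1, "date": datetime(2022, 12, 31)},
    {"id": 2, "date": datetime(2023, 5, 1)},
]

keep = build_filter("date >= datetime(2023, 1, 1)")
kept_ids = [meta["id"] for meta in files if keep(meta)]
print(kept_ids)
```

With a real extractor, the same expression string would be passed as the option value rather than evaluated by hand.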
Are only the images filtered here? Or does this also reduce the number of requests to Twitter, for example? |
|
Created draft of documentation in the Wiki: Embedding gallery‐dl from another Python script |
Hmm, looks like I moved it and the wiki doesn't update. New link: https://github.com/mikf/gallery-dl/wiki/Developer-Instructions#embedding |
Thanks for the document, but the first thing I saw in this doc was "youtube-dl". Kinda funny haha. |
Thanks for catching that; it's now been corrected. Let me know if you see anything else that can be improved! |
Hi, I intend to use gallery-dl as a library for a program to periodically fetch the selected sources.
I already do it with youtube-dl as it's documented in their docs.
Is there a simple way to do the following with gallery-dl?
I've seen in this issue that you can process one url:
But it doesn't download the file, nor does it work with gallery links such as https://www.deviantart.com/{{ user }}
Thank you