
[Reddit][Ask] Specifying output format for Reddit extractor #551

Closed
TheGlassEyedVillian opened this issue Dec 30, 2019 · 9 comments


@TheGlassEyedVillian

TheGlassEyedVillian commented Dec 30, 2019

I'm trying to figure out how to configure "filename" for the Reddit extractor in the config file.
I've tried the --list-keywords option, but it doesn't return any filename options, which in this case should appear under "Keywords for filenames".
The same approach works for Pixiv and Imgur.

Is there any way to set this up for Reddit?
In this case I'd like to save the posts with filenames in the following format:
{id}_{title}.{extension}

I've tried this format for Reddit, based on the following output:

```
gallery-dl --list-keywords https://www.reddit.com/r/sadcats/comments/e6ua3c/_/

Keywords for --chapter-filter:

all_awardings[]
allow_live_comments
  False
approved_at_utc
  None
approved_by
  None
archived
  False
author
  SkyeDrakon
author_flair_background_color
  None
author_flair_css_class
  None
author_flair_richtext[]
author_flair_template_id
  None
author_flair_text
  None
author_flair_text_color
  None
author_flair_type
  text
author_fullname
  t2_4jqvxkac
author_patreon_flair
  False
author_premium
  False
awarders[]
banned_at_utc
  None
banned_by
  None
can_gild
  True
can_mod_post
  False
category
  None
clicked
  False
content_categories
  None
contest_mode
  False
created
  1575638923.0
created_utc
  1575610123.0
discussion_type
  None
distinguished
  None
domain
  i.redd.it
downs
  0
edited
  False
gilded
  0
hidden
  False
hide_score
  False
id
  e6ua3c
is_crosspostable
  True
is_meta
  False
is_original_content
  False
is_reddit_media_domain
  True
is_robot_indexable
  True
is_self
  False
is_video
  False
likes
  None
link_flair_background_color
  
link_flair_css_class
  None
link_flair_richtext[]
link_flair_text
  None
link_flair_text_color
  dark
link_flair_type
  text
locked
  False
media
  None
media_only
  False
mod_note
  None
mod_reason_by
  None
mod_reason_title
  None
mod_reports[]
name
  t3_e6ua3c
no_follow
  False
num_comments
  2
num_crossposts
  0
num_duplicates
  0
num_reports
  None
over_18
  False
parent_whitelist_status
  None
permalink
  /r/sadcats/comments/e6ua3c/_/
pinned
  False
post_hint
  image
preview[enabled]
  True
preview[images][][id]
  eMEM0YGeIHBDFXoN5m5aafOjDBGAvPp78llLcTDoawU
preview[images][][resolutions][][height]
  93
preview[images][][resolutions][][url]
  https://preview.redd.it/k3f7x5rb8y241.jpg?width=108&crop=smart&auto=webp&s=e3882000c70dc2dfb0c9c9c8f478f54975c64945
preview[images][][resolutions][][width]
  108
preview[images][][source][height]
  556
preview[images][][source][url]
  https://preview.redd.it/k3f7x5rb8y241.jpg?auto=webp&s=61f726f380df20140b5ecc439bb322e5570bfa63
preview[images][][source][width]
  640
pwls
  None
quarantine
  False
removal_reason
  None
removed_by
  None
removed_by_category
  None
report_reasons
  None
saved
  False
score
  155
secure_media
  None
selftext
  
selftext_html
  None
send_replies
  True
spoiler
  False
steward_reports[]
stickied
  False
subreddit
  sadcats
subreddit_id
  t5_2qr26
subreddit_name_prefixed
  r/sadcats
subreddit_subscribers
  84176
subreddit_type
  public
suggested_sort
  None
thumbnail
  https://b.thumbs.redditmedia.com/Eogt4WDbD_UVsEEWi3Ro9uVmgQ_LTysf6LYXfjSVwJk.jpg
thumbnail_height
  121
thumbnail_width
  140
title
  :(
total_awards_received
  0
ups
  155
upvote_ratio
  0.98
url
  https://i.redd.it/k3f7x5rb8y241.jpg
user_reports[]
view_count
  None
visited
  False
whitelist_status
  None
wls
  None
```

But this results in the filename being `None_None.jpg`.

How can I configure it so that my output file is named `e6ua3c_:(.jpg`?

Please ignore the illegal filename in this case; it's just an example.
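gallery-dl filename patterns use Python's format-string field syntax, so plain `str.format()` gives a rough idea of how the requested pattern would expand once the submission's keywords are available to it. This is a minimal sketch, not gallery-dl's own formatter: the `id` and `title` values are copied from the keyword listing above, while `extension` is an assumption, since that listing has no such field (it is taken from the i.redd.it URL).

```python
# Sample submission metadata: "id" and "title" come from the
# --list-keywords output above; "extension" is assumed ("jpg"),
# based on the i.redd.it URL in that listing.
submission = {
    "id": "e6ua3c",
    "title": ":(",
    "extension": "jpg",
}

# The filename pattern the question asks for.
pattern = "{id}_{title}.{extension}"

filename = pattern.format(**submission)
print(filename)  # e6ua3c_:(.jpg
```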

Thanks for reading through

@wankio
Contributor

wankio commented Dec 30, 2019

As far as I know, metadata for Reddit is broken.

@mikf
Owner

mikf commented Jan 2, 2020

… maybe not outright broken, just not implemented.

Reddit extractors are more or less only "URL fetchers" for submissions and comments. These URLs will get handled by other extractors according to their configuration, but there is currently no good way of using the Reddit metadata itself to build filenames.
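To illustrate why the pattern from the question came out as `None_None.jpg`, here is a toy model of the "URL fetcher" behaviour described above. It is a hypothetical sketch, not gallery-dl's actual code: `reddit_extract`, `child_metadata`, and `DefaultingDict` are made-up names, and it assumes that keywords missing from the child extractor's metadata are replaced with `None` when the filename is built.

```python
class DefaultingDict(dict):
    """Return None for missing keys (assumed stand-in for the default value)."""

    def __missing__(self, key):
        return None


def reddit_extract(submission):
    """Hypothetical parent extractor: yields only the submission's URL.

    The submission's own metadata (id, title, ...) is not passed on.
    """
    yield submission["url"]


def child_metadata(url):
    """Hypothetical child extractor for a direct file URL.

    It only knows what can be derived from the URL itself.
    """
    name = url.rsplit("/", 1)[-1]  # e.g. "k3f7x5rb8y241.jpg"
    stem, _, extension = name.rpartition(".")
    return {"filename": stem, "extension": extension}


submission = {
    "id": "e6ua3c",
    "title": ":(",
    "url": "https://i.redd.it/k3f7x5rb8y241.jpg",
}
pattern = "{id}_{title}.{extension}"

# The pattern is formatted against the child's metadata, which has no
# "id" or "title", so both fall back to None.
names = [
    pattern.format_map(DefaultingDict(child_metadata(url)))
    for url in reddit_extract(submission)
]
print(names)  # ['None_None.jpg']
```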

You could use something like `gallery-dl -d SUBREDDIT reddit.com/r/SUBREDDIT` to put all media files from one subreddit into its own directory, but that's about it.
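If several subreddits should each get their own directory this way, the command above can simply be repeated per subreddit. A small sketch that only builds the argument lists; the subreddit names are placeholders, and actually running the commands (e.g. via `subprocess.run`) is left commented out.

```python
import shlex


def subreddit_command(subreddit):
    """Build a `gallery-dl -d SUBREDDIT reddit.com/r/SUBREDDIT` call."""
    return ["gallery-dl", "-d", subreddit, f"reddit.com/r/{subreddit}"]


# Placeholder subreddit names, used only for illustration.
subreddits = ["sadcats", "lavaporn"]
commands = [subreddit_command(name) for name in subreddits]

for cmd in commands:
    print(shlex.join(cmd))
    # To actually download, run each command instead, e.g.:
    # subprocess.run(cmd, check=True)
```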

@Hrxn
Contributor

Hrxn commented Jan 2, 2020

> You could use something like `gallery-dl -d SUBREDDIT reddit.com/r/SUBREDDIT` to put all media files from one subreddit into its own directory, but that's about it.

That's what I've been doing with Reddit as well, whenever I've used it so far.
And now that I think about it, something like this would probably be a good default setting here.
Or maybe something like `{base-dir}/reddit/SUBREDDIT`.
I don't know about you, but I think that would be the behaviour most people would expect in this case. Feel free to disagree and give counterpoints, obviously.

I'm not sure about the proper implementation, e.g. always running something along the lines of `-d SUBREDDIT` implicitly, and I don't know what @mikf's plans are regarding changes to the Reddit extractor, or any other "architectural" changes in general.

@mikf
Owner

mikf commented Jan 2, 2020

I'm thinking of

  1. Handling Reddit-hosted files (i.redd.it, v.redd.it) "natively", so it becomes possible to use the "reddit" section in your config file to define proper directory and file names for those, and
  2. Implicitly changing the base directory for spawned child extractors, so any files downloaded by those end up inside the parent's directory. Something like:
reddit
└── lavaporn
    ├── e6bt0n Was told you guys would enjoy this..jpg
    ├── gfycat
    │   ├── gfycat_DazzlingAgitatedDromaeosaur_Wonder of Science …
    │   └── gfycat_SoggyRepulsiveAustraliancurlew_Close up of magma strands splitting apart.mp4
    └── imgur
    │   └── imgur_aosmMbD_Lava flow from the Pacaya volcano, with Agua, Acat…
…
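The layout proposed in point 2 amounts to joining a child extractor's directory below its parent's. A minimal sketch with `pathlib`, assuming the parent directory is `reddit/<subreddit>` and the child uses its own category (e.g. `gfycat`) as a subdirectory; `build_path` is a hypothetical helper, not part of gallery-dl.

```python
from pathlib import PurePosixPath


def build_path(parent_dirs, child_dirs, filename):
    """Place a file under the parent's directory, then the child's."""
    return PurePosixPath(*parent_dirs, *child_dirs, filename)


parent_dirs = ["reddit", "lavaporn"]  # assumed parent directory segments

# A file hosted on Reddit itself: no child extractor involved.
native = build_path(parent_dirs, [],
                    "e6bt0n Was told you guys would enjoy this..jpg")

# A file handled by a child extractor with category "gfycat".
child = build_path(parent_dirs, ["gfycat"],
                   "gfycat_SoggyRepulsiveAustraliancurlew_"
                   "Close up of magma strands splitting apart.mp4")

print(native)
print(child)
```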

Does this sound OK?

@Hrxn
Contributor

Hrxn commented Jan 3, 2020

Yes, even better, in my opinion. I'd vote in favor of it.

@kattjevfel
Contributor

As long as it can be opted out of. I quite like the way it is now, tbh.

@Hrxn
Contributor

Hrxn commented Jan 3, 2020

@kattus, if I may ask, how do you sort stuff coming from a specific subreddit, then?

@kattjevfel
Contributor

kattjevfel commented Jan 4, 2020

> @kattus, if I may ask, how do you sort stuff coming from a specific subreddit, then?

I don't; my primary use for the Reddit feature is downloading "saved" content.

Though I must add that my main problem with the above suggestion is forcing filenames. I may have spoken a bit hastily; I suppose in the end it means I'll just have to force even more custom filenames. Lovely.

@mikf
Owner

mikf commented Jan 31, 2020

Filenames for images and videos hosted on Reddit ([iv].redd.it) now use their submission's metadata and can be customized with `extractor.reddit.filename`.

Putting results from child extractors into their parent's directory can be enabled by setting `parent-directory` to `true` for the parent extractor, e.g. `reddit`.
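Putting both settings together, a config file that answers the original question might look like the output of this sketch. It assumes the usual gallery-dl config layout (`extractor`, then `reddit`, then the option) and simply serializes a Python dict to JSON; where the file lives depends on your setup. Per the comment above, the filename pattern applies to Reddit-hosted files, while child extractors keep their own filenames.

```python
import json

# Options under extractor.reddit, as named in the comment above.
config = {
    "extractor": {
        "reddit": {
            # Filename pattern from the original question
            # (applies to i.redd.it / v.redd.it files).
            "filename": "{id}_{title}.{extension}",
            # Put files from child extractors (imgur, gfycat, ...)
            # into the Reddit extractor's directory.
            "parent-directory": True,
        }
    }
}

print(json.dumps(config, indent=4))
```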

@kattus, you can restore the old behavior by setting

  • `filename` to `"{filename}.{extension}"` and
  • `directory` to `["{category}"]`
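To opt out, those two overrides can be merged into an existing config. A sketch, assuming the config is already loaded as a Python dict (e.g. with `json.load`); `restore_old_reddit_behavior` is a hypothetical helper name, and the `pixiv` entry is just an unrelated example setting that should survive the merge.

```python
import json


def restore_old_reddit_behavior(config):
    """Set the Reddit filename/directory options to the old behavior."""
    reddit = config.setdefault("extractor", {}).setdefault("reddit", {})
    reddit["filename"] = "{filename}.{extension}"
    reddit["directory"] = ["{category}"]
    return config


# An existing config with an unrelated setting that should be preserved.
config = {"extractor": {"pixiv": {"filename": "{id}.{extension}"}}}
restore_old_reddit_behavior(config)
print(json.dumps(config, indent=4))
```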


5 participants