[Gelbooru] Missing Favorites Extractor #3704

Closed
9696neko opened this issue Feb 26, 2023 · 6 comments

@9696neko

Hello!

When I run gallery-dl "https://gelbooru.com/index.php?page=favorites&s=view&id=1111111" --write-metadata -o skip=true where 1111111 is my user ID, I get:
gallery-dl: Unsupported URL 'https://gelbooru.com/index.php?page=favorites&s=view&id=1111111'

From looking at gallery-dl --list-extractors I can discern only "Pool, Post, and Tag" are defined for Gelbooru.

Reading the code in gallery_dl/extractor/, it looks like favorites extractors are defined for GelbooruV01 and V02, but the main gelbooru.py file has none. Could one be added there, probably reusing V01/V02's?

I would like to request this Favorites addition to the Gelbooru Extractors. Thanks!

@mikf
Owner

mikf commented Feb 26, 2023

Favorites listings like https://gelbooru.com/index.php?page=favorites&s=view&id=1111111 seem to require login / redirect to the login page.

Would it be OK to internally use the results from https://gelbooru.com/index.php?page=post&s=list&tags=fav:1111111, which for some reason does not require a login, or are its results somehow different?
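
As a rough illustration of that idea, the favorites URL could be mapped to the public tag-search URL like this. This is only a sketch: favorites_to_tag_search is a hypothetical helper name, not part of gallery-dl, and it only rewrites the URL without fetching anything.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def favorites_to_tag_search(url):
    """Rewrite a favorites URL into the public fav:<id> tag-search URL.

    Hypothetical helper for illustration only; not gallery-dl code.
    """
    parts = urlparse(url)
    query = parse_qs(parts.query)
    if query.get("page") != ["favorites"] or "id" not in query:
        raise ValueError("not a favorites URL: " + url)
    user_id = query["id"][0]
    # Keep ':' unescaped so the result matches the fav:<id> form above.
    new_query = urlencode(
        {"page": "post", "s": "list", "tags": "fav:" + user_id}, safe=":")
    return parts._replace(query=new_query).geturl()


rewritten = favorites_to_tag_search(
    "https://gelbooru.com/index.php?page=favorites&s=view&id=1111111")
print(rewritten)
```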

@9696neko
Author

The results from your link format are publicly accessible, but they differ from logged-in favorites in two ways:

  1. [Important] They are sorted by descending image ID, whereas the logged-in view is chronological (newest added to oldest). This would affect people who do subsequent update runs in which gallery-dl aborts after, say, five duplicates instead of scanning every favorite.
  2. By default, Gelbooru hides fringe results behind a setting called "Display all site content," which can be changed without an account. If it is not set, some images will be excluded whether logged in or not.

So I believe it is not OK, unless this is the only method we can rely on, because of the difference in sort order.
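
To make the first point concrete, here is a small simulation of an update run that stops after five consecutive duplicates. It is a sketch with made-up post IDs; download_until_duplicates is a hypothetical stand-in for the abort behavior, not gallery-dl's actual code.

```python
def download_until_duplicates(post_ids, archive, abort_after=5):
    """Return the IDs that would be downloaded, stopping once
    `abort_after` consecutive already-downloaded IDs are seen."""
    new, streak = [], 0
    for post_id in post_ids:
        if post_id in archive:
            streak += 1
            if streak >= abort_after:
                break
        else:
            streak = 0
            new.append(post_id)
    return new


# Posts 10..60 were downloaded in an earlier run (made-up IDs).
archive = {10, 20, 30, 40, 50, 60}

# Since then the user favorited new post 70 and old post 5.
by_favorite_time = [5, 70, 60, 50, 40, 30, 20, 10]  # newest added first
by_descending_id = [70, 60, 50, 40, 30, 20, 10, 5]  # fav:<id> search order

found_chronological = download_until_duplicates(by_favorite_time, archive)
found_by_id = download_until_duplicates(by_descending_id, archive)
print(found_chronological, found_by_id)
```

With the chronological listing both new favorites are reached before the run stops. With the ID-sorted listing the old post 5 sits at the very end, so the run aborts before reaching it.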

mikf added a commit that referenced this issue Mar 1, 2023
requires logged in cookies to work
@mikf mikf closed this as completed Mar 1, 2023
@9696neko
Author

9696neko commented Mar 2, 2023

Great stuff. Thank you very much!

I will wait for next release before testing it.

@9696neko
Author

9696neko commented Mar 8, 2023

I ended up testing this out and it works! Build: 1.25.0-dev

Some nitpicks:

  • It initially found no images when I used a Gelbooru API key, but then I remembered it needs login cookies. Maybe the error message could mention this?
  • I did notice you maybe forgot to update --list-extractors with this new supported format?

mikf added a commit that referenced this issue Mar 15, 2023
and add docstring so it shows up in --list-extractors
mikf added a commit that referenced this issue Mar 15, 2023
@mikf
Owner

mikf commented Mar 15, 2023

I found a way to access favorites without cookies, API key, or any form of authentication: dcb8af6.

What I haven't found is a way to have them returned in descending order; &order=desc has no effect. The current code has to do a bit extra to reverse them, and --range does not work as expected.
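
One generic way to reverse an ascending-only paginated listing is to walk the pages from last to first and reverse each page. The following is a sketch of that technique under assumed conditions (the total count is known up front, pages are zero-indexed); it is not the code from dcb8af6, and iter_newest_first, fetch_page, total and per_page are hypothetical names.

```python
def iter_newest_first(fetch_page, total, per_page):
    """Yield items newest-first from a source that only serves
    oldest-first pages. `fetch_page(n)` returns page n as a list."""
    if total <= 0:
        return
    last_page = (total - 1) // per_page
    for page in range(last_page, -1, -1):
        # Each page is ascending, so reverse it before yielding.
        yield from reversed(fetch_page(page))


# Fake data source standing in for the remote listing: 7 favorites,
# oldest first, served 3 per page.
favorites = [1, 2, 3, 4, 5, 6, 7]


def fake_fetch(page, per_page=3):
    return favorites[page * per_page:(page + 1) * per_page]


newest_first = list(iter_newest_first(fake_fetch, len(favorites), 3))
print(newest_first)
```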


--list-extractors skips extractors without docstring, which was the case for the favorite extractor. Fixed in b756dc1.
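
The effect of a missing docstring can be reproduced in a few lines. This is a simplified sketch of the filtering described above, with hypothetical class and function names; it does not reproduce how gallery-dl implements --list-extractors.

```python
class TagExtractor:
    """Extractor for images from tag searches"""


class FavoriteExtractor:
    # No docstring, so __doc__ is None and the class gets skipped.
    pass


def listed_extractors(extractor_classes):
    """Return only the classes that have a non-empty docstring."""
    return [cls for cls in extractor_classes if cls.__doc__]


listed_names = [
    cls.__name__
    for cls in listed_extractors([TagExtractor, FavoriteExtractor])
]
print(listed_names)
```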

@9696neko
Author

Very nice work.

I originally tried to find the source code for *booru's (to grab the GET params API) but had no luck.
From looking at the commits, you reverse the favorites (which I guess are returned in chronologically ascending order) before querying. If so, the current code seems fine as a workaround until we find another way.

Regarding --range, you may want to disable it for this extractor unless we can figure something out. I think unexpected results may be worse than nothing at all, because people may mistakenly think their range flag was successful.

Sorry I can't contribute more because my Python skills are rusty.
