[tumblr] OAuth upgrades #65

Closed
Hrxn opened this issue Jan 7, 2018 · 10 comments

Comments

@Hrxn
Contributor

Hrxn commented Jan 7, 2018

One important difference when using Tumblr with OAuth 1.0a authentication:

Try this Tumblr blog, for example: https://embedded-demos.tumblr.com/

It doesn't work with gallery-dl. Open in an incognito tab:

[Screenshot 1]
You'll end here: https://www.tumblr.com/login_required/embedded-demos

Open while signed in to Tumblr, and you will see it. But only displayed alongside your Tumblr Dashboard.

It's this "ingenious" feature:
[Screenshot 2]

I hate it. Ridiculous waste of screen space. You can't use /tagged. And you can't use /archive, which is essential, in my opinion.
(If anyone of you knows how this still works, please let me know!)

But here is the interesting part:

You can try it for yourself, with the Tumblr Web Console. Just check the Blog Info, for example.
https://api.tumblr.com/console/calls/blog/info

[Screenshot 3] vs. [Screenshot 4]

You can observe the same results with
https://api.tumblr.com/console/calls/blog/posts

And you actually get the "posts": [ { ... array in your response! (jq '.response.posts', for example)
Great news!
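
For anyone without jq at hand, here is a rough Python equivalent of that filter, as a minimal sketch. The sample payload is made up and only mirrors the `response.posts` nesting shown above, and `extract_posts` is a hypothetical helper name.

```python
import json


def extract_posts(payload):
    """Return the list under response.posts, or an empty list if it is absent."""
    return payload.get("response", {}).get("posts", [])


# Made-up sample data, shaped like the "posts": [ { ... array mentioned above.
sample = json.loads("""
{
  "meta": {"status": 200, "msg": "OK"},
  "response": {"posts": [{"id": 1, "type": "photo"}, {"id": 2, "type": "text"}]}
}
""")

posts = extract_posts(sample)
print(len(posts), [post["type"] for post in posts])
```

A payload without a `response` key (for example an error reply) simply yields an empty list here instead of raising.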

All of this has one additional benefit:

https://www.tumblr.com/docs/en/api/v2#user-methods

OAuth is required for /user/likes

/user/likes — Retrieve a User's Likes

Use this method to retrieve the liked posts that match the OAuth credentials submitted with the request.

Responses

  • liked_posts - Array - An array of post objects (posts liked by the user)
  • liked_count - Number - Total number of liked posts

This might be a useful additional feature, i.e. likes subcategory/subextractor, together with the already existing user, post and tag.
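
A rough sketch of how such a likes extractor could walk through the results, using only the two response fields listed above. It assumes the endpoint can be paged with an offset and a limit; `fetch_page` is a hypothetical caller-supplied function that would perform the signed request, and the data below is fake.

```python
def iter_likes(fetch_page, limit=20):
    """Yield liked posts page by page until liked_count is reached.

    fetch_page(offset, limit) is a hypothetical function that performs the
    signed /user/likes request and returns the "response" object.
    """
    offset = 0
    while True:
        response = fetch_page(offset, limit)
        posts = response.get("liked_posts", [])
        if not posts:
            return
        yield from posts
        offset += len(posts)
        if offset >= response.get("liked_count", 0):
            return


# Stand-in for the real API call, backed by made-up data.
FAKE_LIKES = [{"id": n} for n in range(45)]


def fake_fetch(offset, limit):
    return {
        "liked_posts": FAKE_LIKES[offset:offset + limit],
        "liked_count": len(FAKE_LIKES),
    }


likes = list(iter_likes(fake_fetch))
print(len(likes))
```

Passing the request function in keeps the paging logic independent of how the OAuth session is actually set up.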

@mikf
Owner

mikf commented Jan 12, 2018

OAuth support and likes extractor done.
I decided to use /blog/….tumblr.com/likes instead of /user/likes, since the latter only provides liked posts of the OAuth-authenticated user and doesn't have a "real" URL to be associated with, whereas the former is theoretically usable for any Tumblr blog (example.tumblr.com/likes). It just so happens that a lot of Tumblr users have made their likes private.

@mikf mikf closed this as completed Jan 12, 2018
@Hrxn
Contributor Author

Hrxn commented Jan 13, 2018

Hey, that's really great. Thanks a lot!

I've already updated gallery-dl (of course), and will try a run with a "hidden" Tumblr blog later today.
Will report any results here.

It's too bad that I can't think of any way to really check the results. This has bothered me for a while now, but I can't wrap my head around it, if it's possible at all.
A pure picture blog (where one post = one picture) would be nice, but I've never seen one so far that is also hidden. Maybe later...

@Hrxn
Contributor Author

Hrxn commented Jan 13, 2018

Good idea with the likes extraction, by the way. Private likes are private, obviously, that was not in question. I just thought it is useful to use the "like" feature and then be able to easily get all your own likes, similar to maybe your own favorites on DeviantArt or something. But other public likes as well, even better 😄

@Hrxn
Contributor Author

Hrxn commented Jan 15, 2018

@mikf One question regarding the OAuth procedure...

    token = extractor.config("access-token")
    token_secret = extractor.config("access-token-secret")
    if token and token_secret:
        self.session = util.OAuthSession(
            extractor.session,
            self.api_key, api_secret, token, token_secret)
        self.api_key = None
    else:
        self.session = extractor.session

If I read this code correctly, gallery-dl checks the config for "access-token" and "access-token-secret", and if present switches to authentication via OAuth. Otherwise, it would only use the API Key, right?
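
To spell out my reading as a standalone sketch: this uses a plain dict instead of gallery-dl's config, and `select_auth` is a made-up name, not something from gallery-dl.

```python
def select_auth(config, default_key, default_secret):
    """Decide between OAuth-signed and API-key-only requests."""
    api_key = config.get("api-key", default_key)
    api_secret = config.get("api-secret", default_secret)
    token = config.get("access-token")
    token_secret = config.get("access-token-secret")

    if token and token_secret:
        # All four values are needed to sign requests on the user's behalf.
        return {"mode": "oauth",
                "credentials": (api_key, api_secret, token, token_secret)}
    # No user tokens: fall back to public access with the API key only.
    return {"mode": "api-key", "credentials": (api_key,)}


print(select_auth({}, "KEY", "SECRET")["mode"])
print(select_auth({"access-token": "t", "access-token-secret": "s"},
                  "KEY", "SECRET")["mode"])
```

Only having both token values switches the mode; a lone "access-token" without its secret still ends up in the API-key branch.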

(BTW, the other OAuth extractors (DeviantArt, Flickr, Reddit) are doing basically the same, or is it something different?)

And the API Key is always = OAuth Consumer Key. Yay for convenience.

What I don't get (and maybe I'm just reading the Tumblr API docs wrong somehow): what about the API Secret (or the OAuth Consumer Secret, as it's called there)?

Because they always seem to mention the full set of four values:
Consumer Key, Consumer Secret, Token, Token Secret

And when doing the gallery-dl oauth:tumblr, requesting and confirming tokens, you'll get the Access Token and Access Token Secret (for 'extractor.tumblr.access-token' and 'extractor.tumblr.access-token-secret').

    """Minimal interface for the Tumblr API v2"""
    API_KEY = "O3hU2tMi5e4Qs5t3vezEi6L0qRORJ5y9oUpSGsrWu8iA3UCc3B"
    API_SECRET = "sFdsK3PDdP2QpYMRAoq0oDnw0sFS24XigXmdfnaeNZpJpqAn03"

    def __init__(self, extractor):
        self.api_key = extractor.config("api-key", self.API_KEY)
        api_secret = extractor.config("api-secret", self.API_SECRET)

This tries to retrieve "api-key" and "api-secret" from the config, but when is the api-secret really used? I mean, we have both Key and Secret at the beginning of this example, as the predefined default values. Is there any reasoning for that? To not set both Key and Secret in the config?

@mikf
Owner

mikf commented Jan 15, 2018

All four sites work the same way. If the required user credentials are provided (refresh-token for DeviantArt and Reddit; access-token and access-token-secret for Flickr and Tumblr), API requests are issued on the user's behalf, which requires API key + secret plus said user credentials. Otherwise they fall back to "public" API access, which only needs an API key and maybe an API secret. OAuth2 calls this "client credentials grant". Tumblr and Flickr, using OAuth1.0a, don't support this directly and let you use just your API key instead.

The API secret (and access token secret) in OAuth1.0 is used to generate an oauth_signature value to sign a request and basically tell the API servers that said request has been issued by an authorized client (see OAuthSession.sign(), specifically line 496).
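
To illustrate where the two secrets enter, here is a minimal sketch of the OAuth 1.0 HMAC-SHA1 signing scheme using only the standard library. This is not the code from OAuthSession.sign(); the function names and all credential values are placeholders made up for the example.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def percent_encode(value):
    """Percent-encode as OAuth 1.0 requires (only unreserved characters stay)."""
    return quote(str(value), safe="-._~")


def sign_request(method, url, params, consumer_secret, token_secret=""):
    """Compute an HMAC-SHA1 oauth_signature for the given request."""
    # 1. Normalize parameters: encode, sort, join as key=value pairs.
    encoded = sorted((percent_encode(k), percent_encode(v))
                     for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in encoded)
    # 2. Base string: METHOD & encoded URL & encoded parameter string.
    base_string = "&".join([method.upper(), percent_encode(url),
                            percent_encode(param_string)])
    # 3. Signing key: API (consumer) secret & access token secret.
    key = f"{percent_encode(consumer_secret)}&{percent_encode(token_secret)}"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


URL = "https://api.tumblr.com/v2/user/likes"
params = {
    "oauth_consumer_key": "placeholder-api-key",
    "oauth_token": "placeholder-access-token",
    "oauth_nonce": "abc123",
    "oauth_timestamp": "1516000000",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_version": "1.0",
}

sig_default = sign_request("GET", URL, params, "default-secret", "token-secret")
sig_custom = sign_request("GET", URL, params, "custom-secret", "token-secret")
print(sig_default != sig_custom)
```

Since the API secret is part of the signing key, changing it changes the signature even when the request and the token secret stay the same.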

The API key and secret values are read from the config to allow users to specify their own API credentials in case gallery-dl's default values become invalid or something in that regard. Retrieving the API secret is actually fairly pointless, at least for right now, as this would only be useful in conjunction with user-credentials retrieved through oauth:tumblr while using a custom API key+secret, which has not been implemented yet.

@Hrxn
Contributor Author

Hrxn commented Jan 15, 2018

Uh, okay. Not sure if I really understand that correctly right now.

> Retrieving the API secret is actually fairly pointless, at least for right now, as this would only be useful in conjunction with user-credentials retrieved through oauth:tumblr while using a custom API key+secret [..]

Isn't that the point? API Key + API Secret in conjunction with user-credentials? I mean, depends on what exactly is meant with being pointless here. Maybe I am misunderstanding the exact API limitations?
Depends on what exactly happens when doing API requests on the user's behalf, I guess? Some authorized client, in essence, but without custom API Key and API Secret, the requests are always being made by the same client? Isn't that increasing the risk of running into API limitations rather quickly?

@mikf
Owner

mikf commented Jan 16, 2018

I called it "pointless" because it wasn't properly implemented on gallery-dl's side and wouldn't have worked as it should have.

oauth:tumblr would only get a token-pair associated with gallery-dl's default api-key and -secret. Using these tokens together with a custom api-key and -secret generates an invalid oauth_signature value which gets rejected by Tumblr's API servers.

91ed147 allows the authorization step (gallery-dl oauth:tumblr) to use your custom api-key and -secret. What you should do to get this to work:

  • set api-key and api-secret to your own custom values
  • get a new pair of access-token and access-token-secret by doing the oauth:tumblr step again
  • put these two into your config file as well
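
After these steps, the Tumblr section of the config should hold all four values. A rough sketch of that section, printed as JSON from Python: the nesting is an assumption derived from the dotted option names (extractor.tumblr.access-token and so on), and every value is a placeholder.

```python
import json

# Placeholder values; replace them with your own application's credentials
# and the tokens printed by the oauth:tumblr step.
config = {
    "extractor": {
        "tumblr": {
            "api-key": "YOUR_API_KEY",
            "api-secret": "YOUR_API_SECRET",
            "access-token": "TOKEN_FROM_OAUTH_STEP",
            "access-token-secret": "TOKEN_SECRET_FROM_OAUTH_STEP",
        }
    }
}

print(json.dumps(config, indent=4))
```

The important part is that the token pair and the api-key/-secret pair belong together; tokens obtained with one key will not validate with another.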

And regarding API limitations: Tumblr allows for 1000 requests per hour and 5000 requests per day per application, which is always the "gallery-dl Application" registered on Tumblr (unless you set your own api-key and -secret). There haven't been any complaints about rate limits as of yet, so I guess this is fine for now.

@Hrxn
Contributor Author

Hrxn commented Jan 16, 2018

> oauth:flickr would only get a token-pair associated with gallery-dl's default api-key and -secret. Using these tokens together with a custom api-key and -secret generates an invalid oauth_signature value which gets rejected by Tumblr's API servers.

Why oauth:flickr? Or did you mean tumblr?

> 91ed147 allows the authorization step (gallery-dl oauth:tumblr) to use your custom api-key and -secret. What you should do to get this to work:

Great, thanks. Will try that later. I have my api-key and api-secret ready, and I need new tokens again, as expected.

> And regarding API limitations: Tumblr allows for 1000 requests per hour and 5000 requests per day per application, which is always the "gallery-dl Application" registered on Tumblr (unless you set your own api-key and -secret). There haven't been any complaints about rate limits as of yet, so I guess this is fine for now.

Yes, I think now we're getting to the gist of it, because that's the reason why I've asked initially.
Requests per hour / requests per day are counted per application, which means the "gallery-dl Application" by default. So this is the really interesting question: how does the Tumblr API count requests, by Access Token & Secret or by API Key & Secret? Because, and please correct me if I am wrong here, the predefined values that currently exist in tumblr.py are basically the identity of the "gallery-dl Application", and as such it can be identified by the Tumblr API. If they really limit only per application and disregard Access Token & Secret for counting, I think this is a barrier we'll run into pretty quickly, because all users of gallery-dl simultaneously use the "gallery-dl Application" unless they use their own API Key & Secret, which was only just made possible by the latest commit, right?
If I remember correctly, the posts API endpoint returns 50 posts per request right now (which is also the allowed maximum, if I'm not mistaken). Considering that many blogs on Tumblr easily have 20,000 to 40,000 posts (sometimes even more; I've seen some with 50k, 60k, and even 80k), those limits seem rather low to me. Taking just one blog with 40,000 posts as an example, we would have to make 800 requests for the posts alone, so this would be our minimum. And that would be just one blog. And one user of gallery-dl...
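
The back-of-the-envelope numbers, as a small sketch. The 50 posts per request is my recollection from above, not a verified figure, and the limits are the ones quoted from the previous comment.

```python
import math

POSTS_PER_REQUEST = 50     # per-request maximum as recalled above (unverified)
REQUESTS_PER_HOUR = 1000   # limits as quoted above
REQUESTS_PER_DAY = 5000


def requests_needed(post_count, per_request=POSTS_PER_REQUEST):
    """Minimum number of API requests to page through all posts of a blog."""
    return math.ceil(post_count / per_request)


for posts in (20_000, 40_000, 80_000):
    needed = requests_needed(posts)
    blogs_per_day = REQUESTS_PER_DAY // needed
    over_hourly = needed > REQUESTS_PER_HOUR
    print(f"{posts} posts: {needed} requests, "
          f"{blogs_per_day} such blogs per day, "
          f"exceeds hourly limit: {over_hourly}")
```

So the shared daily budget would cover only a handful of large blogs across all users combined, before counting any other requests.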

@mikf
Owner

mikf commented Jan 17, 2018

Ah, yes, I meant tumblr. Sorry about that.

I'm pretty sure the rate limit is applied per API key and all users of gallery-dl can only issue 5000 API requests per day in total when using the default key. Seems to be enough at the moment and there is always the option to apply for rate limit removal. I doubt this is going to have much success and I really don't know how I should convince them, but who knows.

Another option would be to go the same route as ripme and have multiple API Keys and choose one at random for every extractor run. This wouldn't work very well with the whole OAuth and Access Token thing, but you could require all users who want to use an Access Token to also supply their own API Key.
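
A minimal sketch of that idea, not existing gallery-dl or ripme code. The function name, the plain dict config, and the key pool are all hypothetical.

```python
import random


def pick_api_key(config, key_pool, rng=random):
    """Choose the API key for one extractor run."""
    custom_key = config.get("api-key")
    has_token = bool(config.get("access-token")
                     and config.get("access-token-secret"))
    if has_token and not custom_key:
        # Tokens are tied to the key they were issued for, so a randomly
        # chosen key would not match them.
        raise ValueError("an access token requires your own api-key")
    return custom_key or rng.choice(key_pool)


pool = ["key-a", "key-b", "key-c"]  # made-up pool of default keys
print(pick_api_key({}, pool) in pool)
print(pick_api_key({"api-key": "my-own-key"}, pool))
```

Users without tokens get spread over the pool, while anyone using an Access Token always signs with the key it belongs to.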

@Hrxn
Contributor Author

Hrxn commented Jan 20, 2018

@mikf Uh, before I forget it completely..

Okay, I used some old API Key & Secret of mine. When doing the OAuth authorization request, you can see the "application" it originates from, and in this case, it was no longer "gallery-dl Application".
Some old cryptic name I've used instead..
Tried that with some "hidden" blogs, and everything seems good, so, can confirm it's working.
👍
