Twitter cursor scroll broken, can no longer retrieve more than a handful of tweets at a time #1170

Closed
God-damnit-all opened this issue Dec 10, 2020 · 26 comments
@God-damnit-all
Contributor

God-damnit-all commented Dec 10, 2020

This error occurs whenever the cursor=scroll parameter is added to the extraction URL:

[twitter][error] 403 Forbidden ("This request requires a matching csrf cookie and header.")

@God-damnit-all God-damnit-all changed the title from "Twitter cursor scroll for search results seems to be broken" to "[URGENT] Twitter cursor scroll broken, can no longer retrieve more than a handful of tweets at a time" Dec 10, 2020
@God-damnit-all
Contributor Author

This seems to be inconsistent, so I'm starting to think it's some new way of throttling requests...

@biznizz

biznizz commented Dec 11, 2020

This is happening to me too, even when I put my username and password into my config instead of using my exported cookies.

@mikf mikf closed this as completed in a00b60f Dec 11, 2020
@mikf
Owner

mikf commented Dec 11, 2020

This should only have been a problem when using the built-in login functionality. Exported cookies with a new ct0 cookie or anonymous requests would still have worked.
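
For anyone who wants to check that an exported cookies.txt actually contains a ct0 cookie, here is a minimal sketch using only the Python standard library. The helper name `find_cookie` is made up for this example, and it assumes the export is in the Netscape cookies.txt format that browser extensions usually produce.

```python
from http.cookiejar import MozillaCookieJar
from typing import Optional


def find_cookie(path: str, name: str = "ct0") -> Optional[str]:
    """Return the value of a twitter.com cookie from a Netscape cookies.txt file.

    Hypothetical helper for illustration; not part of gallery-dl.
    """
    jar = MozillaCookieJar(path)
    # Keep session cookies and already-expired cookies so nothing is hidden.
    jar.load(ignore_discard=True, ignore_expires=True)
    for cookie in jar:
        if cookie.name == name and cookie.domain.endswith("twitter.com"):
            return cookie.value
    return None
```

If `find_cookie("twitter.txt")` returns `None`, the export has no ct0 cookie and re-exporting after logging in again would be the first thing to try.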

By the way, someone on Gitter recently posted this link. Apparently Twitter is removing its nojs interface on the 15th, which might break any new login attempts by gallery-dl (cached sessions should keep working though).

@mikf mikf reopened this Dec 11, 2020
@mikf mikf pinned this issue Dec 11, 2020
@mikf mikf added the fixed label Dec 11, 2020
@mikf mikf changed the title from "[URGENT] Twitter cursor scroll broken, can no longer retrieve more than a handful of tweets at a time" to "Twitter cursor scroll broken, can no longer retrieve more than a handful of tweets at a time" Dec 11, 2020
@God-damnit-all
Contributor Author

@mikf So just to clarify, after the 15th, we just export a cookie file for gallery-dl to use and everything will be fine?

@mikf
Owner

mikf commented Dec 11, 2020

Either that, or you keep using username and password like before and gallery-dl will keep using its cached Twitter login session (*). Only new logins will most likely not work.

Me figuring out how to do a login on the regular login page would also be a solution, but, just like the last time I tried, I still don't know how to get a valid authenticity_token value.

(*) it's possible to extend the "lifetime" of a cached Twitter session by at least 4 years by modifying the expires timestamp in the cache db. Twitter session cookies are valid for 5 years, but gallery-dl renews them after one year, just to be safe.
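
A rough sketch of what that timestamp change could look like with Python's sqlite3 module. The table name `data`, the columns `key` and `expires`, and the `%twitter%` key pattern are assumptions about the cache layout, not something confirmed in this thread, so inspect the schema of your own cache file and back it up before running anything like this.

```python
import sqlite3

# Four years in seconds, matching the "at least 4 years" mentioned above.
FOUR_YEARS = 4 * 365 * 24 * 60 * 60


def extend_expiry(db: sqlite3.Connection,
                  key_pattern: str = "%twitter%",
                  extra_seconds: int = FOUR_YEARS) -> int:
    """Push the expiry of matching cache rows further into the future.

    Assumes a table `data` with `key` and `expires` columns (unverified).
    Returns the number of rows that were updated.
    """
    cursor = db.execute(
        "UPDATE data SET expires = expires + ? WHERE key LIKE ?",
        (extra_seconds, key_pattern),
    )
    db.commit()
    return cursor.rowcount
```

Usage would be along the lines of `extend_expiry(sqlite3.connect("/path/to/cache.db"))`, with the path taken from the `cache.file` setting in your config.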

@hellupline
Contributor

I use cookies with /media , and I am experiencing the same issue

@God-damnit-all
Contributor Author

I use cookies with /media , and I am experiencing the same issue

It's been fixed, you'll need to use the dev version or wait for a stable release update.

@hellupline
Contributor

hellupline commented Dec 12, 2020

It's been fixed, you'll need to use the dev version or wait for a stable release update.

Sorry to be annoying about this, but is it on the master branch? I just installed from master, extracted fresh cookies from twitter.com, and I'm still getting this issue.

Also, I noticed this behaviour is random: sometimes I get the first "batch", sometimes I get denied right on the first request.

┌─[✔]─[1:47]─[hellupline/stuff_06/hellupline]
└──$ python3 -m pip install --user --upgrade https://github.com/mikf/gallery-dl/archive/master.tar.gz
Collecting https://github.com/mikf/gallery-dl/archive/master.tar.gz
  Using cached https://github.com/mikf/gallery-dl/archive/master.tar.gz
Requirement already satisfied: requests>=2.11.0 in /usr/lib/python3.8/site-packages (from gallery-dl==1.16.0.dev0) (2.24.0)
Requirement already satisfied: chardet>=3.0.2 in /usr/lib/python3.8/site-packages (from requests>=2.11.0->gallery-dl==1.16.0.dev0) (3.0.4)
Requirement already satisfied: idna>=2.5 in /usr/lib/python3.8/site-packages (from requests>=2.11.0->gallery-dl==1.16.0.dev0) (2.10)
Requirement already satisfied: urllib3>=1.21.1 in /home/hellupline/.local/lib/python3.8/site-packages (from requests>=2.11.0->gallery-dl==1.16.0.dev0) (1.25.11)
┌─[✔]─[1:47]─[hellupline/stuff_06/hellupline]
└──$ gallery-dl --cookie twitter.txt -o skip=true https://twitter.com/owakita_  
[twitter][error] 403 Forbidden ("This request requires a matching csrf cookie and header.")
┌─[✗]─[1:47]─[hellupline/stuff_06/hellupline]

config file I used:

{
    "cache": {"file": "/run/mount/hellupline/stuff_06/hellupline/data-hoarder/_scripts/db/gallery_dl-cache.db"},
    "extractor": {
        "base-directory": "/run/mount/hellupline/stuff_06/hellupline/data-hoarder/",
        "archive": "/run/mount/hellupline/stuff_06/hellupline/data-hoarder/_scripts/db/gallery_dl.db",
        "cookies": "/run/mount/hellupline/stuff_06/hellupline/data-hoarder/_scripts/cookies-twitter.txt",
        "skip": "abort:3",
        "user-agent": "Python:gallery-dl:0.8.4 (by /u/mikf1)",
        "twitter": {
            "directory": ["{category}", "{subcategory}", "{user[name]}", "{tweet_id}"],
            "filename": "{tweet_id}_{num:>05}.{extension}",
            "archive-format":"{tweet_id}_{retweet_id}_{num}",
            "retweets": false,
            "replies": true,
            "quoted": true,
            "videos": true,
            "twitpic": true
        },
        "postprocessors": [{"name": "metadata", "mode": "json"}]
    },
    "downloader": {
        "http": {
            "adjust-extensions": true,
            "mtime": true,
            "rate": null,
            "retries": 4,
            "timeout": 30.0,
            "verify": true,
            "enabled": "true"
        },
        "part": true,
        "part-directory": null
    },
    "output": {
        "mode": "auto",
        "progress": true,
        "shorten": false,
        "log": "[{name}][{levelname}] {message}",
        "unsupportedfile": {
            "path": "/run/mount/hellupline/stuff_06/hellupline/data-hoarder/_scripts/logs/gallery_dl-unsupported-file.log",
            "mode": "a"
        },
        "logfile": {
            "path": "/run/mount/hellupline/stuff_06/hellupline/data-hoarder/_scripts/logs/gallery_dl.log",
            "mode": "a",
            "level": "debug"
        }
    }
}

@God-damnit-all
Contributor Author

I'm assuming you cleared all your cookies related to twitter in your browser and then logged back in and re-exported them?

@hellupline
Contributor

I'm assuming you cleared all your cookies related to twitter in your browser and then logged back in and re-exported them?

Yes, I did.

My guess is: because they are deprecating the old no-js site, they are rolling out an update which requires something else, using some kind of blue-green deployment. That's why I get an error, but sometimes I get the expected response.

Also, I am getting those errors on both /<username> and /<username>/media.

@biznizz

biznizz commented Dec 12, 2020

Huh. I just downloaded the latest dev build and reset my twitter extractor back to using a cookies.txt and stopped getting the error.

At least, I haven't re-encountered it yet.

@God-damnit-all
Contributor Author

God-damnit-all commented Dec 12, 2020

@mikf There's something very wrong and I'm not sure if this change is related. Even though in my account settings, it's set to display sensitive content and to not hide it in searches, I'm no longer getting those tweets during extraction unless I remove my login information from my config file.

Switching to a cookie file does not help.

EDIT: Yeah, this change must be related, since I've looked through other accounts I've extracted just before this change and they downloaded everything just fine. It's just anything that was downloaded after this change.

@mikf
Owner

mikf commented Dec 12, 2020

Even though in my account settings, it's set to display sensitive content and to not hide it in searches

Your search settings (not your account settings) look like this?
Because before unticking those boxes just now, I had the same behavior as you described.
screenshot

And I'm fairly certain a00b60f has nothing to do with this. All it does is update the x-csrf-token header when necessary, which is only used to detect and prevent Cross-Site Request Forgery.
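
To illustrate the "matching csrf cookie and header" requirement from the error message, here is a minimal sketch. It is not the code from a00b60f. The function name `sync_csrf_token` and the choice to generate a random token when no ct0 cookie exists are assumptions made for this example.

```python
import secrets


def sync_csrf_token(cookies: dict, headers: dict) -> str:
    """Make the x-csrf-token header match the ct0 cookie.

    Hypothetical illustration: an existing ct0 cookie is reused instead of
    being overwritten; a random one is generated only when none is present.
    """
    token = cookies.get("ct0")
    if not token:
        # Assumption: any random hex string is acceptable as a fresh token.
        token = secrets.token_hex(16)
        cookies["ct0"] = token
    headers["x-csrf-token"] = token
    return token
```

The point of the sketch is only that both values have to be identical on every request; if the server hands out a new ct0 cookie mid-session, the header needs to follow it.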

@God-damnit-all
Contributor Author

Even though in my account settings, it's set to display sensitive content and to not hide it in searches

Your search settings (not your account settings) look like this?
Because before unticking those boxes just now, I had the same behavior as you described.
screenshot

And I'm fairly certain a00b60f has nothing to do with this. All it does is update the x-csrf-token header when necessary, which is only used to detect and prevent Cross-Site Request Forgery.

Yes, they look like that. And this is happening when retrieving just a normal link, e.g. twitter.com/username. I haven't actually tested whether it's affecting search results yet. And I do have this enabled.

image

@mikf
Owner

mikf commented Dec 12, 2020

So for example for this Tweet you don't get any images?

$ gallery-dl -u USERNAME https://twitter.com/i/web/status/1337783924555010053
/tmp/twitter/IllustOgre/1337783924555010053_1.jpg
/tmp/twitter/IllustOgre/1337783924555010053_2.jpg

or from the account?

$ gallery-dl -u USERNAME https://twitter.com/IllustOgre
/tmp/twitter/IllustOgre/1337808490211864577_1.jpg
/tmp/twitter/IllustOgre/1337808490211864577_2.jpg
/tmp/twitter/IllustOgre/1337783924555010053_1.jpg  # from the tweet above
/tmp/twitter/IllustOgre/1337783924555010053_2.jpg
/tmp/twitter/IllustOgre/1337764924982849536_1.jpg
/tmp/twitter/IllustOgre/1337717486234468352_1.jpg
...

@God-damnit-all
Contributor Author

Hang on a second. There's something strange going on.

@mikf
Owner

mikf commented Dec 12, 2020

1337808490211864577 is a retweet of 1337783924555010053, which is why 1337783924555010053 got downloaded twice with retweets=original.

But the important thing is 1337783924555010053 is considered sensitive content.

@God-damnit-all
Contributor Author

Yeah, I realized that and I deleted my reply. It's not happening for that account, but it's happening for a different one that I'm skittish about posting here. I'm trying to get to the bottom of this.

@God-damnit-all
Contributor Author

God-damnit-all commented Dec 12, 2020

Fuck me, that isn't it either, I forgot to revert my config. This is driving me crazy. Do you have an email? The account I have for you to test is a bit embarrassing.

@mikf
Owner

mikf commented Dec 12, 2020

https://github.com/mikf/gallery-dl/blob/master/gallery_dl/__init__.py#L15

... and I was just about done reverting that commit

The only Twitter related changes since 1.15.3 are

and none of them are responsible, it seems. Maybe a Twitter internal change?

@God-damnit-all
Contributor Author

... and I was just about done reverting that commit

I'm very sorry for that. I forgot to hit 'save' on my test configuration and when I reverted that commit I thought I had fixed it. I'm a bit frazzled since I'm now worried that this has been a problem for a long time and I'm not sure how much of what I've downloaded has been affected.

I emailed you.

@mikf
Owner

mikf commented Dec 12, 2020

Ok, I can see what you mean.
I also rolled back to v1.15.0 and it has the same behavior there. Weird.

@God-damnit-all
Contributor Author

Ok, I can see what you mean.
I also rolled back to v1.15.0 and it has the same behavior there. Weird.

It doesn't really make any sense why it's behaving this way. I really hope this hasn't been affecting search results too.

@God-damnit-all
Contributor Author

Ok, I can see what you mean.
I also rolled back to v1.15.0 and it has the same behavior there. Weird.

Could it be that it only affects tweets that are part of reply chains? (Even if the artist is replying to themselves.)

@mikf
Owner

mikf commented Dec 13, 2020

@ImportTaste maybe open another, different issue for this, since this is definitely worth investigating.

@hellupline @biznizz v1.16.0 with the fix is out.
Sorry for the misinformation about exported cookies working with the old version. I forgot the old code would simply overwrite any ct0 cookie.

As for installing the dev version of gallery-dl: the old instructions (pip install --upgrade) don't work anymore(?) if you'd already installed a dev build before. I've updated them in the README to: pip install -U -I --no-deps --no-cache-dir

@mikf mikf closed this as completed Dec 13, 2020
@hellupline
Contributor

Can confirm, my previous confusion was because I was already using a dev build.

I solved it using the following (and was going to send this here, but you already solved that, blazing fast mikf):
python3 -m pip install --no-cache-dir --user --upgrade --force-reinstall https://github.com/mikf/gallery-dl/archive/master.tar.gz

@mikf mikf unpinned this issue Dec 17, 2020
mikf referenced this issue Dec 29, 2020
When logged in, some entries returned by Twitter's API are so called
'homeConversation's (they would be regular tweet entries otherwise.)

Those weren't picked up before and resulted in missing files compared
to accessing a timeline as guest.

('/media' timelines and search results were not affected)
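
A simplified sketch of the idea in that commit message. The entry layout, the `entryId` key, and the `tweet-` / `homeConversation-` prefixes are assumptions chosen for illustration; they are not taken from the commit itself.

```python
from typing import Iterable, Iterator

# Assumed entryId prefixes; only "homeConversation" is named in the commit message.
TWEET_PREFIXES = ("tweet-", "homeConversation-")


def iter_tweet_entry_ids(entries: Iterable[dict]) -> Iterator[str]:
    """Yield IDs of timeline entries that should be treated as tweets.

    Hypothetical helper: home-conversation entries are accepted alongside
    regular tweet entries instead of being silently skipped.
    """
    for entry in entries:
        entry_id = entry.get("entryId", "")
        if entry_id.startswith(TWEET_PREFIXES):
            yield entry_id
```

With a filter that only accepted the regular tweet prefix, the second kind of entry would be dropped, which matches the "missing files compared to accessing a timeline as guest" symptom described above.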