twitter bug - gy-dl v1.24.2 skips some posts omitting unique images, gives a reason "quoted tweet" #3439

Closed
lesfe opened this issue Dec 21, 2022 · 10 comments · Fixed by #3455

@lesfe

lesfe commented Dec 21, 2022

> gallery-dl --option output.skip=false -v https://twitter.com/xinzoruo/ 1>ok.log 2>&1

[gallery-dl][debug] Version 1.24.2
[gallery-dl][debug] Python REDACTED - Linux-REDACTED
[gallery-dl][debug] requests REDACTED - urllib3 REDACTED 
[gallery-dl][debug] Configuration Files []
[gallery-dl][debug] Starting DownloadJob for 'https://twitter.com/xinzoruo/'
[twitter][debug] Using TwitterTimelineExtractor for 'https://twitter.com/xinzoruo/'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): twitter.com:443
[urllib3.connectionpool][debug] https://twitter.com:443 "GET /i/api/graphql/7mjxD3-C6BxitPMVQ6w0-Q/UserByScreenName?variables=%7B%22screen_name%22%3A%22xinzoruo%22%2C%22withSafetyModeUserFields%22%3Atrue%2C%22withSuperFollowsUserFields%22%3Atrue%7D HTTP/1.1" 200 1019
[urllib3.connectionpool][debug] https://twitter.com:443 "GET /i/api/graphql/nRybED9kRbN-TOWioHq1ng/UserMedia?variables=%7B%22userId%22%3A%221507705239507329034%22%2C%22count%22%3A100%2C%22includePromotedContent%22%3Afalse%2C%22withSuperFollowsUserFields%22%3Atrue%2C%22withBirdwatchPivots%22%3Afalse%2C%22withDownvotePerspective%22%3Afalse%2C%22withReactionsMetadata%22%3Afalse%2C%22withReactionsPerspective%22%3Afalse%2C%22withSuperFollowsTweetFields%22%3Atrue%2C%22withClientEventToken%22%3Afalse%2C%22withBirdwatchNotes%22%3Afalse%2C%22withVoice%22%3Atrue%2C%22withV2Timeline%22%3Afalse%2C%22__fs_interactive_text%22%3Afalse%2C%22__fs_dont_mention_me_view_api_enabled%22%3Afalse%7D HTTP/1.1" 200 61572
[urllib3.connectionpool][debug] https://twitter.com:443 "GET /i/api/graphql/nRybED9kRbN-TOWioHq1ng/UserMedia?variables=%7B%22userId%22%3A%221507705239507329034%22%2C%22count%22%3A100%2C%22includePromotedContent%22%3Afalse%2C%22withSuperFollowsUserFields%22%3Atrue%2C%22withBirdwatchPivots%22%3Afalse%2C%22withDownvotePerspective%22%3Afalse%2C%22withReactionsMetadata%22%3Afalse%2C%22withReactionsPerspective%22%3Afalse%2C%22withSuperFollowsTweetFields%22%3Atrue%2C%22withClientEventToken%22%3Afalse%2C%22withBirdwatchNotes%22%3Afalse%2C%22withVoice%22%3Atrue%2C%22withV2Timeline%22%3Afalse%2C%22__fs_interactive_text%22%3Afalse%2C%22__fs_dont_mention_me_view_api_enabled%22%3Afalse%2C%22cursor%22%3A%22HBaMgL39hOeCpysAAA%3D%3D%22%7D HTTP/1.1" 200 63048
[twitter][debug] Skipping 1523267334663667713 (quoted tweet)
[urllib3.connectionpool][debug] https://twitter.com:443 "GET /i/api/graphql/nRybED9kRbN-TOWioHq1ng/UserMedia?variables=%7B%22userId%22%3A%221507705239507329034%22%2C%22count%22%3A100%2C%22includePromotedContent%22%3Afalse%2C%22withSuperFollowsUserFields%22%3Atrue%2C%22withBirdwatchPivots%22%3Afalse%2C%22withDownvotePerspective%22%3Afalse%2C%22withReactionsMetadata%22%3Afalse%2C%22withReactionsPerspective%22%3Afalse%2C%22withSuperFollowsTweetFields%22%3Atrue%2C%22withClientEventToken%22%3Afalse%2C%22withBirdwatchNotes%22%3Afalse%2C%22withVoice%22%3Atrue%2C%22withV2Timeline%22%3Afalse%2C%22__fs_interactive_text%22%3Afalse%2C%22__fs_dont_mention_me_view_api_enabled%22%3Afalse%2C%22cursor%22%3A%22HBaCgKPRhdr2sSoAAA%3D%3D%22%7D HTTP/1.1" 200 23460
[urllib3.connectionpool][debug] https://twitter.com:443 "GET /i/api/graphql/nRybED9kRbN-TOWioHq1ng/UserMedia?variables=%7B%22userId%22%3A%221507705239507329034%22%2C%22count%22%3A100%2C%22includePromotedContent%22%3Afalse%2C%22withSuperFollowsUserFields%22%3Atrue%2C%22withBirdwatchPivots%22%3Afalse%2C%22withDownvotePerspective%22%3Afalse%2C%22withReactionsMetadata%22%3Afalse%2C%22withReactionsPerspective%22%3Afalse%2C%22withSuperFollowsTweetFields%22%3Atrue%2C%22withClientEventToken%22%3Afalse%2C%22withBirdwatchNotes%22%3Afalse%2C%22withVoice%22%3Atrue%2C%22withV2Timeline%22%3Afalse%2C%22__fs_interactive_text%22%3Afalse%2C%22__fs_dont_mention_me_view_api_enabled%22%3Afalse%2C%22cursor%22%3A%22HBaCwKiBo5jWjCoAAA%3D%3D%22%7D HTTP/1.1" 200 300
[urllib3.connectionpool][debug] https://twitter.com:443 "GET /i/api/2/search/adaptive.json?include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&include_can_media_tag=1&include_ext_has_nft_avatar=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_ext_alt_text=true&include_quote_count=true&include_reply_count=1&tweet_mode=extended&include_entities=true&include_user_entities=true&include_ext_media_color=true&include_ext_media_availability=true&include_ext_sensitive_media_warning=true&send_error_codes=true&simple_quoted_tweet=true&count=100&ext=mediaStats%2ChighlightedLabel%2ChasNftAvatar%2CvoiceInfo%2CsuperFollowMetadata&q=from%3Axinzoruo+max_id%3A1516776707532853249+filter%3Alinks&tweet_search_mode=live&query_source=typed_query&pc=1&spelling_corrections=1 HTTP/1.1" 200 2943
[urllib3.connectionpool][debug] https://twitter.com:443 "GET /i/api/2/search/adaptive.json?include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&include_can_media_tag=1&include_ext_has_nft_avatar=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_ext_alt_text=true&include_quote_count=true&include_reply_count=1&tweet_mode=extended&include_entities=true&include_user_entities=true&include_ext_media_color=true&include_ext_media_availability=true&include_ext_sensitive_media_warning=true&send_error_codes=true&simple_quoted_tweet=true&count=100&cursor=scroll%3AthGAVUV0VFVBaCwKiBo5jWjCoWgsCogaOY1owqEnEV7Ih6FYCJehgHREVGQVVMVDUBFQAVAAA%3D&ext=mediaStats%2ChighlightedLabel%2ChasNftAvatar%2CvoiceInfo%2CsuperFollowMetadata&q=from%3Axinzoruo+max_id%3A1516776707532853249+filter%3Alinks&tweet_search_mode=live&query_source=typed_query&pc=1&spelling_corrections=1 HTTP/1.1" 200 692

The post 1523267334663667713 was skipped even though it contains unique images, which is undesirable.

https://twitter.com/xinzoruo/status/1523267334663667713
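To find every post dropped this way in a longer run, one option is to scan the saved log for skip messages. The following is a minimal Python sketch, assuming the message format matches the "[twitter][debug] Skipping <id> (<reason>)" line in the log above; the helper names skipped_tweets and status_url are hypothetical and not part of gallery-dl.

```python
import re

# Pattern based only on the debug line seen in this log (an assumption):
#   [twitter][debug] Skipping 1523267334663667713 (quoted tweet)
SKIP_RE = re.compile(r"\[twitter\]\[debug\] Skipping (\d+) \(([^)]+)\)")


def skipped_tweets(log_text):
    """Return (tweet_id, reason) pairs for every skip message in a log."""
    return [(m.group(1), m.group(2)) for m in SKIP_RE.finditer(log_text)]


def status_url(user, tweet_id):
    # twitter.com status URL in the same form as the link above
    return f"https://twitter.com/{user}/status/{tweet_id}"
```

Run against the log above (for example skipped_tweets(open("ok.log", encoding="utf-8").read())), this would return a single pair: 1523267334663667713 with the reason "quoted tweet".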

@rautamiekka
Contributor

rautamiekka commented Dec 22, 2022

> [gallery-dl][debug] Python REDACTED - Linux-REDACTED
> [gallery-dl][debug] requests REDACTED - urllib3 REDACTED

Those are some of the most pointless redactions I've ever seen, even hurting in some cases.

@lesfe
Author

lesfe commented Dec 22, 2022

Scrolling down xinzoruo's timeline to around that post, this is what I see:

May 7 https://twitter.com/xinzoruo/status/1522904349050949632
May 8 https://twitter.com/xinzoruo/status/1523267334663667713

(Just mentioning this since it might be a clue to what's "quoting" what and why.)

> [gallery-dl][debug] Python REDACTED - Linux-REDACTED
> [gallery-dl][debug] requests REDACTED - urllib3 REDACTED
>
> Those are some of the most pointless redactions I've ever seen, even hurting in some cases.

I'd like my friends to remain ignorant of the fact that I download images of nude anime chicks from the internet.

@lesfe lesfe changed the title twitter extractor bug - gy-dl v1.24.2 skips some posts omitting unique images, gives a reason "quoted tweet" twitter bug - gy-dl v1.24.2 skips some posts omitting unique images, gives a reason "quoted tweet" Dec 22, 2022
@rautamiekka
Contributor

> I'd like my friends to remain ignorant of the fact that I download images of nude anime chicks from the internet.

None of their business. You do what you want, and downloading nude anime chicks is the least of problems. You'd do well to drop such idiots and stop being a doormat.

@ClosedPort22
Contributor

The skip is not due to output.skip. It's because you did not enable extractor.twitter.quoted, so -o quoted=true should work. If an image was skipped due to being considered a duplicate, you'd see a "# path/to/image.jpg" line in the output.
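Instead of passing -o quoted=true on every run, the option can also live in a configuration file; the "Configuration Files []" line in the log shows that none was loaded here. On Linux, one of the locations gallery-dl checks is ~/.config/gallery-dl/config.json (verify the full list in the configuration docs for your version). Below is a minimal Python sketch that sets extractor.twitter.quoted to true in a JSON config, assuming the nested extractor/twitter/quoted layout that the dotted option name implies. The helper names are hypothetical.

```python
import json
from pathlib import Path


def enable_twitter_quoted(config):
    """Set extractor.twitter.quoted = true in a config dict, keeping other keys."""
    extractor = config.setdefault("extractor", {})
    twitter = extractor.setdefault("twitter", {})
    twitter["quoted"] = True
    return config


def update_config_file(path):
    """Load a JSON config (or start empty), enable the option, and write it back."""
    path = Path(path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    enable_twitter_quoted(config)
    path.parent.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return config
```

Note that update_config_file rewrites the whole file, so any existing formatting is replaced by json.dumps output, while existing keys are preserved.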

@lesfe
Author

lesfe commented Dec 22, 2022

> The skip is not due to output.skip. It's because you did not enable extractor.twitter.quoted, so -o quoted=true should work. If an image was skipped due to being considered a duplicate, you'd see a "# path/to/image.jpg" line in the output.

I used output.skip to make the log more concise.

It's nice that there's a setting I can enable to get the desired result in this particular case, but I'm reporting this issue at all because I believe there is a problem with gallery-dl: it doesn't do what I expect it to do. When I type gallery-dl https://twitter.com/xinzoruo, I expect gallery-dl to pull all of the illustrations that xinzoruo uploaded to Twitter and put them in a folder on my hard drive. It doesn't. So you tell me whether extractor.twitter.quoted should be set to true by default, whether something else should be done, or whether nothing should be done because this is a Twitter quirk that's too hard to deal with.

@Hrxn
Copy link
Contributor

Hrxn commented Dec 23, 2022

What should be done, in the future, by everyone, as a first step...

  1. Visit this page: https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst
  2. Ctrl+F, and search for .twitter, to quickly check options for this extractor, and their defaults
  3. ??
  4. Profit
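Step 2 can also be scripted. A minimal Python sketch, assuming you have saved a local copy of docs/configuration.rst and that each option there is a reST section title (such as extractor.twitter.quoted) underlined with dashes; that layout is an assumption about the file, not something stated in this thread. The function name is hypothetical.

```python
def twitter_option_names(rst_text):
    """List section titles starting with 'extractor.twitter.'.

    A title is recognised when the next line consists only of '-' characters
    (assumed reST underline style for option headings).
    """
    lines = rst_text.splitlines()
    names = []
    for line, underline in zip(lines, lines[1:]):
        title = line.strip()
        rule = underline.strip()
        if title.startswith("extractor.twitter.") and rule and set(rule) == {"-"}:
            names.append(title)
    return names
```

For example, twitter_option_names(open("configuration.rst", encoding="utf-8").read()) would list the Twitter option headings, whose sections then show each option's default.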

@ClosedPort22
Copy link
Contributor

> I'm reporting this issue at all because I believe that there is a problem with gallery-dl, because it doesn't do what I expect it to do - when I type gallery-dl https://twitter.com/xinzoruo, I expect gallery-dl to pull all of the illustrations that xinzoruo uploaded to twitter and put them in a folder on my hard drive.

This behavior is documented. I wouldn't say it's a problem when there's documentation for it. Not sure why quoted is set to false by default, but it could be because otherwise gallery-dl would fetch media from other users' tweets.

@a84r7a3rga76fg
Copy link

What does 1>ok.log 2>&1 do?

@rautamiekka
Contributor

> 1>ok.log 2>&1

"Send stdout (short for standard output) into ok.log, and send stderr (short for standard error) to wherever stdout is going", so both streams end up in ok.log. In other words: save all of gallery-dl's output, including the [debug] log lines, into ok.log.

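The same redirection can be done from Python, which may be handy when scripting gallery-dl runs. This minimal sketch uses the standard subprocess module: stdout=log sends standard output to the file, and stderr=subprocess.STDOUT routes standard error to the same place, mirroring 1>ok.log 2>&1. The helper name run_logged is hypothetical.

```python
import subprocess


def run_logged(cmd, log_path):
    """Run cmd with both stdout and stderr written to log_path.

    Python counterpart of the shell redirection: cmd 1>log_path 2>&1
    """
    with open(log_path, "w", encoding="utf-8") as log:
        # stderr=subprocess.STDOUT sends stderr wherever stdout goes
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

For example, run_logged(["gallery-dl", "-v", "https://twitter.com/xinzoruo/"], "ok.log") corresponds to the command in the original report. In the shell itself, the order of the two redirections matters: 2>&1 1>ok.log would leave stderr on the terminal, because 2>&1 copies stdout's destination at the moment it is processed.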
@ClosedPort22
Copy link
Contributor

ClosedPort22 commented Dec 24, 2022

I've just checked the source code again, and it appears to me that 1523267334663667713 actually refers to the original tweet which the author themselves quoted. Although the quoted tweet triggered gallery-dl's skipping mechanism, it should be possible to fetch the original tweet as you go further down the timeline.

EDIT: Hm, I couldn't find the metadata of the original tweets in my downloads. Perhaps this has something to do with 1d14928?

EDIT 2: This definitely has something to do with 1d14928. I rearranged the if statements a little bit, and now it should correctly fetch the original tweet when quoted=false. I'll open a PR if further tests show no regressions.

ClosedPort22 added a commit to ClosedPort22/gallery-dl that referenced this issue Dec 24, 2022
Do not consider a tweet seen before applying 'retweet', 'quote' and
'reply' checks. Otherwise the original tweets will also be skipped if
the "derivative" tweets and the original tweets are from the same user.
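The ordering problem described in the commit message can be illustrated with a simplified Python sketch. This is not gallery-dl's actual extractor code: the Entry type, the two filter functions, and the labels "quote" and "original" are hypothetical, and only the quote check is modelled (the commit also covers the retweet and reply checks). The timeline is modelled as the author's quoting tweet, then the original embedded inside it as a quote, then the same original on its own further down.

```python
from typing import Iterable, List, NamedTuple


class Entry(NamedTuple):
    tweet_id: str
    is_quoted: bool  # True if this copy appears embedded inside another tweet


def filter_buggy(timeline: Iterable[Entry], quoted: bool = False) -> List[str]:
    """Old order: mark a tweet as seen before the quote check."""
    seen = set()
    kept = []
    for entry in timeline:
        if entry.tweet_id in seen:
            continue
        seen.add(entry.tweet_id)  # recorded even if it is skipped just below
        if entry.is_quoted and not quoted:
            continue  # analogue of "Skipping ... (quoted tweet)"
        kept.append(entry.tweet_id)
    return kept


def filter_fixed(timeline: Iterable[Entry], quoted: bool = False) -> List[str]:
    """New order: apply the quote check first, then the seen check."""
    seen = set()
    kept = []
    for entry in timeline:
        if entry.is_quoted and not quoted:
            continue  # skipped, but NOT recorded as seen
        if entry.tweet_id in seen:
            continue
        seen.add(entry.tweet_id)
        kept.append(entry.tweet_id)
    return kept


timeline = [
    Entry("quote", False),     # the author's quoting tweet
    Entry("original", True),   # the original, embedded as a quote
    Entry("original", False),  # the same original, further down the timeline
]
print(filter_buggy(timeline))  # ['quote']
print(filter_fixed(timeline))  # ['quote', 'original']
```

In the buggy order, the embedded copy marks "original" as seen before being skipped, so the standalone copy is later discarded as a repeat. Checking the skip conditions first means a tweet is only recorded as seen once it is actually kept.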
@mikf mikf added the bug label Jan 6, 2023
@mikf mikf closed this as completed in #3455 Jan 6, 2023

6 participants