Twitter – emit current tweet metadata even with "retweets":"original"; also fix quoted retweets saving #2549

Open
aleksusklim opened this issue May 2, 2022 · 0 comments


I'm writing a local Twitter gallery viewer script. Currently I use this configuration in gallery-dl.conf:

```
"twitter":
{
  "username": null,
  "password": null,
  "cards": true,
  "conversations": true,
  "pinned": true,
  "syndication": true,
  "quoted": true,
  "replies": true,
  "retweets": "original",
  "text-tweets": true,
  "twitpic": false,
  "users": "timeline",
  "videos": true,
  "postprocessors": [{
      "name": "metadata",
      "event": "post",
      "filename": "{tweet_id}.json",
      "directory": "tweets"
  }]
}
```

And this command line (gallery-dl version 1.21.2):

```
gallery-dl.exe --cookies cookies-twitter-com.txt "https://twitter.com/USERNAME"
```

The problem is that with "retweets":"original", gallery-dl writes retweet metadata ONLY to the original author's folder (along with the actual media), where retweet_id and tweet_id are the same number. It does not store any .json for those retweets in the timeline I am downloading.

If I use "retweets":true instead, I get all .json metadata with distinct tweet_id and retweet_id, but unfortunately all media is then downloaded into the current timeline's folder (named by the retweet's own unique tweet_id rather than the shared retweet_id), which defeats deduplication when several users heavily retweet each other.
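On the viewer side, the deduplication problem comes down to which ID owns the media. A minimal sketch, assuming metadata records are loaded as dicts and that non-retweets have retweet_id set to 0 or missing (an assumption, not confirmed here); dedup_key and group_by_original are hypothetical helper names. Requires Python 3.9+ for the built-in generic annotations.

```python
from collections import defaultdict


def dedup_key(meta: dict) -> int:
    """Return the ID of the tweet that actually owns the media.

    For a retweet, retweet_id points at the original tweet. For an
    ordinary tweet it is assumed to be 0 or missing, so fall back to
    the record's own tweet_id.
    """
    return meta.get("retweet_id") or meta["tweet_id"]


def group_by_original(records: list[dict]) -> dict[int, list[dict]]:
    """Group metadata records so every retweet of the same original
    tweet lands in one bucket, i.e. one shared copy of the media."""
    groups: dict[int, list[dict]] = defaultdict(list)
    for meta in records:
        groups[dedup_key(meta)].append(meta)
    return dict(groups)
```

For example, two users retweeting tweet 100 (records with tweet_id 201 and 202, both with retweet_id 100) end up in a single group under key 100, which is exactly the sharing that per-tweet_id file names lose.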

I found a workaround: invoke gallery-dl twice, the second time switching "retweets" to true but disabling the actual media downloads with `-o retweets=true --no-download`.
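The two-pass workaround can be scripted. A minimal sketch, assuming the gallery-dl executable is on PATH and the config above is picked up automatically; build_passes and run_passes are hypothetical helper names, and the only gallery-dl flags used are the ones already shown in this issue (--cookies, -o, --no-download).

```python
import subprocess


def build_passes(url: str, cookies: str, exe: str = "gallery-dl") -> list[list[str]]:
    """Return argument lists for the two gallery-dl runs.

    Pass 1 uses the config as-is ("retweets": "original"), so media is
    stored once under the original author. Pass 2 overrides retweets to
    true and skips downloads, so only the metadata post-processor writes
    the timeline's retweet .json files.
    """
    first = [exe, "--cookies", cookies, url]
    second = [exe, "--cookies", cookies, "-o", "retweets=true", "--no-download", url]
    return [first, second]


def run_passes(url: str, cookies: str, exe: str = "gallery-dl") -> None:
    """Run both passes in order."""
    for args in build_passes(url, cookies, exe):
        # check=True raises CalledProcessError if a pass exits non-zero
        subprocess.run(args, check=True)
```

Usage on Windows would be `run_passes("https://twitter.com/USERNAME", "cookies-twitter-com.txt", exe="gallery-dl.exe")`. Passing a list instead of a shell string avoids quoting problems with the URL.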

This seems to do what I want: it stores metadata-only files for direct (non-quote) retweets, where retweet_id points to the original tweet and "author" contains extended info about the original author (which differs from "user").

But I found another inconsistency with quoted retweets (those with added text): even with "retweets":"original", they are saved into the current timeline's folder rather than into the original author's folder!

The media file's metadata has "tweet_id" of the quoted (original) tweet, "user" set to the current timeline, "author" set to the original author, "content" of the quoted tweet (not the added text), and "quote_by" pointing to the actual quoting tweet. The metadata for that quoting tweet contains its own "tweet_id", "user" and "author" both equal to the current timeline, "content" holding the added text, and "quote_id" pointing back to the quoted tweet that has the media.
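For a viewer script, the field combinations described so far are enough to tell the record types apart. A minimal sketch, assuming each .json file is loaded as a dict and that absent fields mean 0 (an assumption); classify and its return labels are hypothetical, not part of gallery-dl.

```python
def classify(meta: dict) -> str:
    """Classify one Twitter metadata record by the fields described
    in this issue. Missing fields are treated as 0 (assumption)."""
    tweet_id = meta["tweet_id"]
    retweet_id = meta.get("retweet_id", 0)
    if meta.get("quote_by"):
        # media file of the quoted tweet, saved under the quoting timeline
        return "quoted"
    if meta.get("quote_id"):
        # the quoting tweet itself; its content is the added text
        return "quoting"
    if retweet_id and retweet_id != tweet_id:
        # direct retweet as emitted with "retweets": true
        return "retweet"
    if retweet_id and retweet_id == tweet_id:
        # direct retweet as stored under "retweets": "original"
        return "retweet-original"
    return "tweet"
```

quote_by is checked first so that the quoted media record is never mistaken for a plain tweet, since its tweet_id belongs to the original author.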

I believe something is wrong here. From my point of view, quoted and direct retweets shouldn't differ this much and should be handled the same way: if I want to store ALL media in the current user's folder ("retweets":true), then it should save all quoted and direct retweets there; if I want to put ALL retweets in their authors' folders, then it should do so for both quoted and direct retweets as well.

My suggestions:

  1. Add another value for the "retweets" option to preserve backwards compatibility, for example "retweets":"all_original", while "retweets":"original" keeps its current behavior.
  2. Emit metadata for direct retweets in "all_original" mode (or even in "original" mode, if you consider the current behavior a bug), just as with "retweets":true.
  3. Handle quoted retweets the same way as direct retweets when "all_original" is set: save media and metadata to the author's folder only and drop the quote_by field.
  4. In the quoting tweet's metadata (the one with quote_id), store the "author" object of the original tweet (the one with quote_by), just as for direct retweets.
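The folder routing behind these suggestions can be summarized as a small decision function. This is a hypothetical sketch of the proposal, not gallery-dl's implementation: target_folder, the kind labels, and the "all_original" value are all illustrative, and the "original" branch encodes the behavior reported above.

```python
from typing import Union


def target_folder(kind: str, timeline_user: str, author: str,
                  mode: Union[bool, str]) -> str:
    """Return whose folder receives the media for one tweet.

    kind: "tweet", "retweet" (direct) or "quoted" (media of a quoted tweet)
    mode: True, "original", or the proposed "all_original".
    Disabled retweets (False) are not covered by this sketch.
    """
    if kind == "tweet" or mode is True:
        return timeline_user
    if mode == "original":
        # reported behavior: only direct retweets move to the author
        return author if kind == "retweet" else timeline_user
    if mode == "all_original":
        # proposal: direct and quoted retweets both go to the author
        return author
    raise ValueError(f"unsupported mode: {mode!r}")
```

Under this sketch the only difference between "original" and "all_original" is the quoted case, which is the inconsistency the issue describes.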