
[Twitter] Download all tweets to .jsons, not tweets with media #2588

Closed
deepsummersee opened this issue May 14, 2022 · 4 comments

Comments

@deepsummersee

Is there a way to download an entire Twitter profile's tweets (including replies) as .json files, similar to what the --write-metadata flag does, but not exclusive to only media tweets? I'm referring to something similar to what a site like https://www.vicinitas.io/free-tools/download-user-tweets does.

@aleksusklim

"text-tweets": true,
Refer to my issue #2549.
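A minimal config sketch of the idea, hedged as an assumption: `text-tweets` lives under the documented `extractor.twitter` section, and a `metadata` postprocessor with the `post` event writes one JSON per tweet (the postprocessor shape matches the snippet quoted later in this thread):

```json
{
    "extractor": {
        "twitter": {
            "text-tweets": true,
            "postprocessors": [{
                "name": "metadata",
                "event": "post",
                "filename": "{tweet_id}.json"
            }]
        }
    }
}
```

With `text-tweets` enabled, tweets without media still generate a metadata event, so they end up in the JSON output as well.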

@deepsummersee
Author

I followed the config syntax in your issue and it works like a charm. For the record, I had to specify the config file location with the -c option despite it being in my %APPDATA%\gallery-dl\, but no issues otherwise. Thanks.

@aleksusklim

I also wanted to know the width and height of each image/video in the metadata. The only reasonable way I could achieve this was by using temporary files just to hold the media dimensions in their names:

  "postprocessors": [{
      "name": "metadata",
      "event": "post",
      "filename": "{tweet_id}.json",
      "directory": "tweet"
  },{
      "name": "metadata",
      "event": "prepare",
      "filename": "{tweet_id}_{num}.{extension}.{width}x{height}.tmp",
      "directory": "meta",
      "mode": "custom",
      "content-format": " "
  }]

It writes empty files (they actually contain a single whitespace character) whose sole purpose is to be parsed later, as part of a directory-traversal step, by mapping their names back to the actual images there. I couldn't find a way to output each image's metadata into the JSON of the post itself; gallery-dl can output media dimensions only to separate JSON files.
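The traversal step described above can be sketched in Python. The regex below is a hypothetical helper that assumes file names follow the postprocessor's template `{tweet_id}_{num}.{extension}.{width}x{height}.tmp`:

```python
import re
from pathlib import Path

# Pattern matching the filename template
# "{tweet_id}_{num}.{extension}.{width}x{height}.tmp"
TMP_NAME = re.compile(
    r"^(?P<tweet_id>\d+)_(?P<num>\d+)\.(?P<extension>\w+)"
    r"\.(?P<width>\d+)x(?P<height>\d+)\.tmp$"
)

def parse_tmp_name(name: str):
    """Extract the media dimensions encoded in a .tmp file name."""
    m = TMP_NAME.match(name)
    if not m:
        return None
    info = m.groupdict()
    for key in ("num", "width", "height"):
        info[key] = int(info[key])
    return info

def scan_meta_dir(directory: str) -> dict:
    """Map (tweet_id, num) -> (width, height) for every .tmp file found."""
    result = {}
    for path in Path(directory).glob("*.tmp"):
        info = parse_tmp_name(path.name)
        if info:
            result[(info["tweet_id"], info["num"])] = (info["width"], info["height"])
    return result
```

Since the dimensions are carried entirely by the file name, the scan never has to open the files, which keeps the traversal fast.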

Since each .json is around 3 KB, while the default NTFS cluster size is 4 KB, there is no point in compressing them whatsoever. But actually, outputting a single byte to each .tmp file (instead of an entire JSON) results in them taking 0 bytes of allocated disk space: instead of a 4 KB cluster, NTFS keeps such tiny files resident in the MFT.

Finally, the slowest part of my script turned out to be not the directory traversal (with width/height parsing) but opening and reading all of the posts' JSONs. So I made a caching mechanism, storing everything I need from all of these small JSONs in one big JSON (filtering out fields I'm not interested in). That was a huge performance tweak: each subsequent scan reads only the JSONs that weren't cached before, and drops cache entries for JSONs that no longer exist.

@allendema
Contributor

@aleksusklim

opening and reading all of posts' jsons

Create one .jsonl file with a valid JSON object on every line; you can tweak the fields:
#3137 (comment)

That config (which you can adapt) also addresses your question in #2624 (comment):

can gallery-dl optionally append to metadata file, instead of rewriting it?
