Commit 05764f3

fix: rate limit of GraphQL search api
fix: reset poll count to 0 in online api
update: archive script
update: readme about archive script

1 parent 7ad0d56 commit 05764f3

6 files changed (+376 -276 lines)

README.MD (+6 -6)
```diff
@@ -76,10 +76,10 @@ This repository included `core/crawler/api/scripts`, frontend repository is [her

 ### Archiver

-* archive userinfo, nearly all tweets (excluding **replies**) and nearly **ALL MEDIA** (including avatar and banner) via the search API
-* TODO: Spaces, broadcasts, mixed media (e.g. an image and a video in the same tweet)
+* archive userinfo, most tweets (including **replies**), nearly **ALL MEDIA** (including avatar and banner) via the search API, plus `Following` and `Followers`
+* TODO: Spaces, broadcasts (ffmpeg command)
 * **PLEASE PRECHECK WHETHER THE ACCOUNT HAS BEEN SEARCH-BANNED**
-* **DO NOT EXECUTE `init.sh` UNTIL YOU BACK UP ALL ARCHIVED DATA IN FOLDER './twitter_archiver/'**, it will clean all archived data
+* Read more in [archiver/README.md](https://github.com/BANKA2017/twitter-monitor/tree/node/apps/archiver)

 ### CloudFlare Workers

@@ -215,10 +215,10 @@ Those are Chinese articles

 ### Archiver

-* Back up the account's user info, nearly all tweets (excluding replies) and nearly all media files (including the current avatar and banner) via the **search API**
-* TODO: back up Spaces from the last 30 days, broadcasts (not sure how yet), and mixed media (roughly: two or more videos, or images and videos together, in the same tweet)
+* Back up the account's user info, most tweets (including replies), media files (including the current avatar and banner), `Following` and `Followers` via the **search API**
+* TODO: back up Spaces and broadcasts (generate ffmpeg commands)
 * **Before use, check whether the account to be archived has been search-banned**
-* **Do not run `init.sh` until you have backed up the contents of the './twitter_archiver/' folder**, as it will clear that folder
+* For usage, read [archiver/README.md](https://github.com/BANKA2017/twitter-monitor/tree/node/apps/archiver)

 ### CloudFlare Workers

```
apps/archiver/README.md (new file, +72)

Archiver
---

## ⚠ WARNING

- We cannot guarantee that these features will remain available.
- When archiving someone else's Twitter account, please obtain their permission first.
- The structure of the generated data is still being adjusted, so the current output may not display in the viewer.

## Known issues

- Unable to crawl most retweets.
- Unable to crawl tweets marked as sensitive content (TODO: logging in should solve this).
- Unable to crawl copyrighted media files in some regions.
- Some videos are corrupted; this is expected. Downloading the corresponding m3u8 playlist instead yields a lower-quality version.
- Unable to crawl tweets from protected, banned or deleted users.
- After logging in, the rate limit will be tracked per account rather than per guest token (TODO: not implemented yet).

## Features

- User info (excluding the authors of quoted tweets).
- Tweets and replies, searched anonymously (most retweets are not included).
- Polls.
- Avatar, banner, photos and videos.
- Following and followers lists (optional).
- Raw data is kept for future use.

## TODO

- Spaces and broadcasts with ffmpeg
- Login by **COOKIE**
- Incremental updates of tweets and the followers/following lists

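The ffmpeg TODO above (and the m3u8 fallback under Known issues) comes down to turning an HLS playlist URL into an ffmpeg command. A minimal sketch, assuming you already have the `.m3u8` URL; `m3u8_to_mp4_cmd` is a hypothetical helper, not part of the repository, and the project may end up generating different ffmpeg options.

```shell
# Hypothetical helper (not part of the repository): print, without running,
# an ffmpeg command that saves an HLS (.m3u8) stream as an MP4 file.
# "-c copy" remuxes the existing audio/video streams instead of re-encoding.
# Arguments: $1 = playlist URL, $2 = output file.
# Note: the naive single-quoting breaks if an argument contains a single quote.
m3u8_to_mp4_cmd() {
  printf "ffmpeg -i '%s' -c copy '%s'\n" "$1" "$2"
}

# Usage: review the printed command, then run it.
m3u8_to_mp4_cmd 'https://example.com/playlist.m3u8' space.mp4
# prints: ffmpeg -i 'https://example.com/playlist.m3u8' -c copy 'space.mp4'
```

Printing instead of executing keeps the step reviewable, which matches the "generate ffmpeg commands" wording of the TODO.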
## Init

- Execute the command:

```shell
# bash
bash init.sh <screen_name> # e.g. 'twitter'
# or PowerShell
.\init.ps1 <screen_name>
```

A folder named after `<screen_name>` will be created. If that folder already exists, you will be prompted to delete or rename it.

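Since `init.sh` asks you to delete or rename an existing `<screen_name>` folder, one way to avoid losing archived data is to move the folder aside before running it. A minimal sketch, assuming the folder lives in the current directory; `backup_archive_dir` and the `.bak.<timestamp>` suffix are hypothetical, not part of the repository.

```shell
# Hypothetical helper (not part of the repository): move an existing archive
# folder aside with a timestamp suffix instead of deleting it, so that
# init.sh can create a fresh folder. Prints the backup path, if any.
backup_archive_dir() {
  dir="$1"
  if [ -d "$dir" ]; then
    backup="${dir}.bak.$(date +%Y%m%d%H%M%S)"
    # Refuse to overwrite (or nest into) an existing backup from the same second.
    if [ -e "$backup" ]; then
      printf 'backup target already exists: %s\n' "$backup" >&2
      return 1
    fi
    mv -- "$dir" "$backup" || return 1
    printf '%s\n' "$backup"
  fi
}

# Usage:
#   backup_archive_dir twitter && bash init.sh twitter
```

If the folder does not exist, the helper prints nothing and succeeds, so it can be chained in front of `init.sh` unconditionally.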
## Run

### Crawler

```shell
node archive.mjs [OPTION]
```

|Parameter|Required|Description|
|:--|:--|:--|
|--all|Optional|All data (UserInfo, Tweets, Following, Followers)|
|--followers|Optional|Get Followers|
|--following|Optional|Get Following|
|--media|Optional|Get Media|
|--skip_\<key\>|Optional|Skip the corresponding job. `<key>` is a key of `argvList`: `user_info_and_tweets`, `followers`, `following` or `media`.|

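The `--skip_<key>` flag only accepts the `argvList` keys listed in the table, so a small wrapper can reject typos before the crawler starts. A sketch under stated assumptions: `archive_cmd` is hypothetical, and whether `--all` combined with `--skip_<key>` behaves exactly this way should be checked against `archive.mjs`.

```shell
# Hypothetical wrapper (not part of the repository): build an archive.mjs
# command line that requests --all and adds one --skip_<key> flag per
# argument. Keys are checked against the argvList keys from the table.
archive_cmd() {
  cmd="node archive.mjs --all"
  for key in "$@"; do
    case "$key" in
      user_info_and_tweets|followers|following|media)
        cmd="$cmd --skip_$key" ;;
      *)
        printf 'unknown key: %s\n' "$key" >&2
        return 1 ;;
    esac
  done
  printf '%s\n' "$cmd"
}

# Usage: print the command first, then run it.
archive_cmd followers following
# prints: node archive.mjs --all --skip_followers --skip_following
#   sh -c "$(archive_cmd followers following)"
```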
### Retry media

```shell
node retryMedia.mjs
```

Attempts to re-download images that failed during crawling (of little use at the moment).

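Failed media downloads are often transient, so re-running a command a few times with increasing pauses is one common pattern. This is an illustration only: `retry` and `RETRY_DELAY` are hypothetical, and the loop helps only if `retryMedia.mjs` exits with a non-zero status when images still fail, which is an assumption.

```shell
# Hypothetical helper (not part of the repository): run a command up to
# <attempts> times, doubling the pause after each failure.
# RETRY_DELAY sets the first pause in seconds (default 1).
# Usage: retry <attempts> <command> [args...]
retry() {
  attempts="$1"; shift
  delay="${RETRY_DELAY:-1}"
  n=1
  while :; do
    "$@" && return 0
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    n=$((n + 1))
  done
}

# Usage:
#   retry 3 node retryMedia.mjs
```

With the default delay, three attempts pause 1 s and then 2 s between runs.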
## View

The front-end project is currently under development; once it is ready, it may be available at <https://github.com/BANKA2017/twitter-archive-viewer>.