[Weibo] "快转" and retweets not properly recognized #3874

Open
gq20110204 opened this issue Apr 4, 2023 · 2 comments

gq20110204 commented Apr 4, 2023

I don't speak English, so I used machine translation, but I don't know if the meaning is correct.

English Ver.

Issues 1 and 2 both occur with "retweets=false" set; the option has no effect on them.

  1. "快转" (fast retweet) videos are not recognized correctly (previously reported in [Bug] Weibo "快转" 'fast retweet' treated as a post by the retweeting user #2825)
    A fast retweet is unusual: its structure is almost identical to that of an original post.
    Sample: https://weibo.com/ajax/statuses/mymblog?uid=2553930725&page=8&feature=0
    Search the response for "快转" to find it.
  2. Some retweeted videos are not excluded
    I found a temporary workaround:
    Modify extractor/weibo.py, line 64:

    # Exclude fast retweets (they carry an "ai_play_picture_type" key in "comment_manage_info")
    if "ai_play_picture_type" not in status["comment_manage_info"]:
        # Add the missing check for regular retweets
        if "retweeted_status" not in status:
            self._extract_status(status, files)

  3. If a user has many posts, posts from earlier years are not downloaded (after about 100 pages/200 posts)
    However, downloading from the video and album links separately retrieves everything.
  4. A problem introduced by a recent version (a display problem only; it does not affect normal downloads)
    Some videos in older Weibo posts were published as external links rather than as Weibo videos, but they are not correctly excluded.
    Sample: https://weibo.com/ajax/statuses/mymblog?uid=5391508520&page=10&feature=0
    Search for "h5_url": unlike in current Weibo posts, it links to a third-party website, and the video quality field is empty.
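
For readers who want to try the workaround from item 2 outside of the extractor, here is a minimal standalone sketch of the same check. It is not the maintainer's fix. The keys "comment_manage_info", "ai_play_picture_type" and "retweeted_status" come from the report above; the function name should_extract and its retweets parameter are hypothetical, and reading the nested key with .get() (so a post without "comment_manage_info" does not raise KeyError) is an added assumption.

```python
def should_extract(status, retweets=False):
    """Decide whether a Weibo status dict should be downloaded.

    Hypothetical helper mirroring the reporter's workaround; it is not
    part of gallery-dl.
    """
    if retweets:
        # Retweets were requested, so nothing is filtered out.
        return True

    # Reporter's observation: fast retweets ("快转") carry an
    # "ai_play_picture_type" key inside "comment_manage_info".
    # Assumption: the outer key may be missing or null, hence .get().
    manage_info = status.get("comment_manage_info") or {}
    if "ai_play_picture_type" in manage_info:
        return False

    # Regular retweets embed the original post under "retweeted_status".
    if "retweeted_status" in status:
        return False

    return True
```

A caller would keep only the statuses for which should_extract(status) is true before handing them to the download step. Whether "ai_play_picture_type" marks every fast retweet is the reporter's observation and has not been verified here.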

@gq20110204 gq20110204 changed the title [Weibo] Fast retweet and retweets not properly recognized [Weibo] "快转" and retweets not properly recognized Apr 4, 2023
@mikf mikf added the site:bug label Mar 6, 2024
mikf added a commit that referenced this issue Mar 7, 2024
- handle 快转 retweets
- disable 'retweets' by default
- skip all retweet media when 'retweets' are disabled
- extract all retweet media when 'retweets' is set to "original"
mikf (Owner) commented Mar 7, 2024

  1. Fixed in ace16f0.

  2. Same.

  3. See Weibo download incomplete #4168

    "But if you use the video and album links separately, you can download the whole thing."

    You could use the include option to download from both tabtype=video and tabtype=album with https://weibo.com/USER as input URL.

  4. Haven't looked into this yet.

gq20110204 (Author) commented

English Ver.

Thank you for fixing the issue.

I found a new issue while using tabtype=video:
the video list can include videos that were not posted by the blogger themselves, and these are not excluded.

Sample: https://weibo.com/u/6989546088?tabtype=video
Sample video: 新西兰,一个孤独与梦幻的国家!
Weibo ID: 4690184971224486
API request sample: https://weibo.com/ajax/profile/getprofilevideolist?uid=6989546088&cursor=0
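
One possible way to exclude such videos is to compare each entry's author with the profile being downloaded. The sketch below only illustrates that idea and makes a loud assumption: that every entry in the video list is a status dict with a "user" object holding the poster's "id". That layout is not shown in this thread, and the function names are hypothetical.

```python
def posted_by_profile_owner(status, uid):
    """Return True if `status` was posted by the account with ID `uid`.

    Assumption (not confirmed in this thread): the poster's ID is
    stored at status["user"]["id"].
    """
    user = status.get("user") or {}
    # Compare as strings, since an ID may arrive as an int or a str.
    return str(user.get("id")) == str(uid)


def filter_own_videos(statuses, uid):
    """Keep only the statuses posted by the profile owner."""
    return [s for s in statuses if posted_by_profile_owner(s, uid)]
```

With the sample above, uid would be 6989546088, taken from the profile URL. If the real response stores the author elsewhere, only posted_by_profile_owner would need to change.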
