What exactly is q_type=2 in the request parameters of the thread and reply endpoints? #123

Closed
n0099 opened this issue May 19, 2023 · 25 comments
Labels: discussion

n0099 commented May 19, 2023

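  1. On May 26, 2022 (v2.6.1, c5c208f#diff-4d330e17cd513d344a73ce66cfe7682bf66be5294b077a46f38fd4c9d61fa2dcL822) you removed q_type=2.

  2. Over the last 16 days I have noticed that the following threads, with https://tieba.baidu.com/p/6616695318 as a representative, return 0 replies when certain pages of their reply list are requested. In the past this usually meant that the requested page did not exist yet or that the thread had been deleted.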
    https://tieba.baidu.com/p/3611123694
    https://tieba.baidu.com/p/4944438028
    https://tieba.baidu.com/p/5207410447
    https://tieba.baidu.com/p/5214988494
    https://tieba.baidu.com/p/5261166918
    https://tieba.baidu.com/p/6616695318
    https://tieba.baidu.com/p/6993372330
    https://tieba.baidu.com/p/7096366852
    https://tieba.baidu.com/p/7230997267
    https://tieba.baidu.com/p/7816662822
    https://tieba.baidu.com/p/7832611927
    https://tieba.baidu.com/p/7943900568
    https://tieba.baidu.com/p/7950129599
    https://tieba.baidu.com/p/8215005930
    https://tieba.baidu.com/p/8227942444
    https://tieba.baidu.com/p/8235105996
    https://tieba.baidu.com/p/8342345319
    https://tieba.baidu.com/p/8355529720
    https://tieba.baidu.com/p/8394353126
    https://tieba.baidu.com/p/8402224919
    https://tieba.baidu.com/p/8408045722
    https://tieba.baidu.com/p/8409361546
    https://tieba.baidu.com/p/8412021878
    https://tieba.baidu.com/p/8416955449
    https://tieba.baidu.com/p/8416972007
    https://tieba.baidu.com/p/8417800020
    https://tieba.baidu.com/p/8417882008
    https://tieba.baidu.com/p/8418007671
    https://tieba.baidu.com/p/8418615605
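    However, checking the Tieba web pages by hand, all of these threads do have the corresponding reply-list pages, and most of them have a large number of replies (flood threads, 水楼, being typical).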

  3. After changing https://github.com/n0099/TiebaMonitor/blob/e7d7240aebeee74d04f7b5d4748af69dff3ed5b0/client_tester.php#L41 to

    -            'pn' => $_GET['pn']
    +            'pn' => $_GET['pn'],
    +            'rn' => 30,
    +            'q_type' => 2

    in order to mimic https://github.com/n0099/TiebaMonitor/blob/91e617a4940f222b0d9d4299c93146b84e2301d7/c%23/crawler/src/Tieba/Crawl/Crawler/ReplyCrawler.cs#L31, the problem can be reproduced:

    $ php-cgi client_tester_2.php client_version=12.26.1.0 type=replies tid=6616695318 pn=1 | sed -n '3,$p' | jq '.post_list | length'
    0

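    whereas removing the q_type=2 parameter gives the expected 30 replies.
    Note that the parameter name qtype has the same effect (perhaps the validation layer of the Tieba API normalizes parameter names, or there really are two aliases).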

  4. I also noticed the following in the v吧 forum tables of my tbm production database:

    SELECT T.tid, T.replyCount, R.replyCount, SR.replyCount, R.replyCount + SR.replyCount,
           CAST(T.replyCount AS SIGNED) - (R.replyCount + SR.replyCount) AS diff
    FROM tbmc_f97650_thread AS T
        JOIN (SELECT COUNT(*) AS replyCount, tid FROM tbmc_f97650_reply GROUP BY tid) AS R
        JOIN (SELECT COUNT(*) AS replyCount, tid FROM tbmc_f97650_subReply GROUP BY tid) AS SR
    ON T.tid = R.tid AND T.tid = SR.tid AND T.replyCount != (R.replyCount + SR.replyCount)
    ORDER BY diff DESC;

    image
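    For threads such as https://tieba.baidu.com/p/7258962480, the reply + sub-reply count reported by Tieba (as returned by the forum front-page thread list endpoint) is far larger than the number of replies + sub-replies actually crawled and stored in the tables. A difference of a few is normally expected, because as far as I remember deleting a reply or sub-reply did not decrease the count shown in the forum thread list (so back then the counter was increment-only).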

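  5. So my guess is that q_type=2 is some parameter value that reduces the number of replies returned by the reply-list endpoint. Since the reduction mostly hits flood threads, whose replies are largely repetitive, it may also be related to host/guest state (主客态).

  6. The forum front-page thread list endpoint also takes this parameter value. Does it likewise reduce how often threads with duplicate or similar first-floor (1L) content appear?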
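The count comparison in item 4 can be sketched in Python with an in-memory SQLite database. This is only an illustration under assumptions: the table and column names (thread, reply, sub_reply, reply_count) are simplified stand-ins for the tbmc_f97650_* tables, and all rows are made up. Unlike the inner joins in the query above, it uses LEFT JOIN so that threads with no stored sub-replies are still compared.

```python
import sqlite3


def count_diffs(conn):
    """Return (tid, reported, stored, diff) for threads whose counts disagree."""
    return conn.execute("""
        SELECT t.tid,
               t.reply_count AS reported,
               COALESCE(r.n, 0) + COALESCE(sr.n, 0) AS stored,
               t.reply_count - (COALESCE(r.n, 0) + COALESCE(sr.n, 0)) AS diff
        FROM thread AS t
        LEFT JOIN (SELECT tid, COUNT(*) AS n FROM reply GROUP BY tid) AS r
               ON r.tid = t.tid
        LEFT JOIN (SELECT tid, COUNT(*) AS n FROM sub_reply GROUP BY tid) AS sr
               ON sr.tid = t.tid
        WHERE t.reply_count != COALESCE(r.n, 0) + COALESCE(sr.n, 0)
        ORDER BY diff DESC
    """).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thread (tid INTEGER PRIMARY KEY, reply_count INTEGER);
    CREATE TABLE reply (pid INTEGER PRIMARY KEY, tid INTEGER);
    CREATE TABLE sub_reply (spid INTEGER PRIMARY KEY, tid INTEGER);
    -- made-up rows: thread 1 is consistent, threads 2 and 3 are not
    INSERT INTO thread VALUES (1, 5), (2, 4), (3, 40);
    INSERT INTO reply VALUES (10, 1), (11, 1), (12, 1), (20, 2), (21, 2), (30, 3);
    INSERT INTO sub_reply VALUES (100, 1), (101, 1);
""")
rows = count_diffs(conn)
```

A large positive diff on a thread that still exists is the pattern described above; small diffs are expected if the reported counter never decreases.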

n0099 added a commit to n0099/tbclient.protobuf that referenced this issue May 19, 2023
n0099 (Author) commented May 19, 2023

  1. A quick search for q_type = \d+; in https://github.dev/n0099/tbclient.protobuf/tree/tbclient/proto shows that the following endpoints all have a parameter named q_type:
    proto/CommonReq.proto:
      43:     int32 q_type = 40;
    proto/ActivityPage/DataReq.proto:
      12:     int32 q_type = 8;
    proto/AgreeMe/DataReq.proto:
      11:     int32 q_type = 7;
    proto/ExcPbPage/DataReq.proto:
      9:     uint64 q_type = 5;
    proto/FrsPage/DataReq.proto:
      22:     int32 q_type = 14;
    proto/GetForumsFromForumClass/DataReq.proto:
      9:     uint64 q_type = 5;
    proto/GetMyPost/DataReq.proto:
      12:     int32 q_type = 8;
    proto/GetTopicRelateThread/DataReq.proto:
      14:     int32 q_type = 10;
    proto/Hottopic/DataReq.proto:
      11:     int32 q_type = 7;
    proto/ItemPage/DataReq.proto:
      10:     int32 q_type = 6;
    proto/PbPage/DataReq.proto:
      25:     int32 q_type = 17;
    proto/Personal/DataReq.proto:
      10:     uint32 q_type = 6;
    proto/Personalized/DataReq.proto:
      17:     int32 q_type = 11;
    proto/Profile/DataReq.proto:
      16:     uint32 q_type = 12;
    proto/ReplyMe/DataReq.proto:
      11:     int32 q_type = 7;
    proto/StarTrends/DataReq.proto:
      9:     uint32 q_type = 5;
    proto/ThreadList/DataReq.proto:
      15:     uint32 q_type = 7;
    proto/UserPost/DataReq.proto:
      36:     int32 q_type = 32;
    proto/ZoneRight/DataReq.proto:
      8:     uint32 req_type = 4;
    
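    Taking the reply-list endpoint as an example, the q_type parameter already appears in 6.2.2, the first client version to introduce the protobuf API (it may actually be older, but I can't be bothered to git blame https://github.com/n0099/open-tbclient/tree/src line by line to trace the history of the JSON API): https://github.com/n0099/tbclient.protobuf/blame/tbclient/proto/PbPage/DataReq.proto#LL25C14-L25C14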
    image
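The search above can be sketched in Python. One detail worth noting: the last entry in the listing, proto/ZoneRight/DataReq.proto, is actually req_type, which matches only because the pattern is unanchored. The sketch below compares the pattern as typed with a word-boundary variant. The helper name find_q_type and the abbreviated proto snippets are illustrative assumptions, not the real file contents.

```python
import re

# The pattern as typed into the search box, and a stricter variant that
# requires a word boundary before "q_type" so that "req_type" is skipped.
LOOSE = re.compile(r"q_type = \d+;")
STRICT = re.compile(r"\bq_type = \d+;")


def find_q_type(proto_sources, pattern):
    """Map path -> [(line number, stripped line)] for lines matching pattern."""
    hits = {}
    for path, text in proto_sources.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.setdefault(path, []).append((lineno, line.strip()))
    return hits


# Abbreviated stand-ins for two of the files listed above.
sample = {
    "proto/PbPage/DataReq.proto": "message DataReq {\n    int32 q_type = 17;\n}\n",
    "proto/ZoneRight/DataReq.proto": "message DataReq {\n    uint32 req_type = 4;\n}\n",
}
loose_hits = find_q_type(sample, LOOSE)
strict_hits = find_q_type(sample, STRICT)
```

With the stricter pattern only the PbPage file is reported, so ZoneRight probably should not be counted among the endpoints that take q_type.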

n0099 added a commit to n0099/open-tbm that referenced this issue May 19, 2023
…Crawler.GetRequestsForPage()`

@ c#/crawler
n0099 added a commit to n0099/open-tbm that referenced this issue May 19, 2023
…Crawler.GetRequestsForPage()`

$ git submodule update --remote
@ c#/crawler
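lumina37 (Owner) commented

I'll look into this when I have some free time in two or three days. If it can reduce the number of failed requests, then q_type is quite useful after all.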

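n0099 (Author) commented May 19, 2023

If it can reduce the number of failed requests, then q_type is quite useful after all.

  1. Did you actually read the whole thing? What I am describing throughout is that, right now, any request carrying q_type=2 returns 0 replies.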

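n0099 (Author) commented May 19, 2023

  1. I now think the cause of 1. and 2. is that the Tieba server has removed the q_type and is_fold_comment_req parameters from the reply-list endpoint. Whenever either one is supplied, regardless of its value or of _client_version, the server returns 110001 未知错误 (unknown error), and a response such as {"error_code":110001,"logid":"2786573351","ctime":"0","error_msg":"未知错误","server_time":2,"time":1684496786} of course contains 0 replies.

    This also explains why

    2. Note that the parameter name qtype has the same effect

    because the qtype parameter never existed in the first place, so it naturally gets 110001 as well.

    That leaves 3. unexplained. I cannot go back to the time before Tieba started returning 110001 for requests carrying q_type and/or is_fold_comment_req to test whether this parameter really reduces the number of replies, so what I want to know now is why you removed q_type=2 in c5c208f around this time last year.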
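Since an error response and a genuinely empty page both yield `.post_list | length` of 0 in jq, a crawler may want to tell them apart before logging "0 replies". A minimal sketch, assuming the JSON shape shown in this thread (error_code at the top level, post_list absent on error); the function name classify_reply_page is made up for illustration:

```python
import json


def classify_reply_page(raw):
    """Classify a reply-list JSON response as 'error', 'empty' or 'ok'."""
    data = json.loads(raw)
    # error_code is compared as an integer whether it arrives as 0 or "0".
    code = int(data.get("error_code", 0))
    if code != 0:
        return ("error", code)
    posts = data.get("post_list") or []
    if not posts:
        return ("empty", 0)
    return ("ok", len(posts))


error_response = (
    '{"error_code":110001,"logid":"2786573351","ctime":"0",'
    '"error_msg":"未知错误","server_time":2,"time":1684496786}'
)
result = classify_reply_page(error_response)
```

Here result is ("error", 110001) rather than an empty page, which is the distinction that was lost when only the length of post_list was inspected.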

lumina37 (Owner) commented

At the time I just figured that removing it would not hurt and would save a tiny bit of bandwidth, so I removed it.

n0099 (Author) commented May 19, 2023

  1. According to

    $ find -name '*.log' -exec bash -c 'echo -n {} "" & grep -o "Reply list is empty" {} | wc -l' \; | sort
    ./20230504.log 477
    ./20230505.log 603
    ./20230506.log 467
    ./20230507.log 81
    ./20230508.log 1200
    ./20230509.log 0
    ./20230510.log 0
    ./20230511.log 162
    ./20230512.log 747
    ./20230513.log 332
    ./20230514.log 729
    ./20230515.log 774
    ./20230516.log 1001
    ./20230517.log 575
    ./20230518.log 51606
    ./latest.log 198839
    $ find -name '*.log' -exec bash -c 'echo -n {} "" & grep -o -P "Reply list is empty.*tid\":6616695318" {} | wc -l' \; | sort
    ./20230504.log 0
    ./20230505.log 2
    ./20230506.log 0
    ./20230507.log 0
    ./20230508.log 0
    ./20230509.log 0
    ./20230510.log 0
    ./20230511.log 0
    ./20230512.log 2
    ./20230513.log 0
    ./20230514.log 0
    ./20230515.log 0
    ./20230516.log 0
    ./20230517.log 0
    ./20230518.log 50615
    ./latest.log 182003

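    8. go back to the time before Tieba started returning 110001 for requests carrying q_type and/or is_fold_comment_req

    For the is_fold_comment_req parameter, that point in time is roughly 2023-05-18 18:55:32.

    And according to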

    $ find -name '*.log' -exec bash -c 'echo -n {} "" & grep -o -P "HTTP 429.*tid\":6616695318" {} | wc -l' \; | sort
    ./20230504.log 4761
    ./20230505.log 4690
    ./20230506.log 5796
    ./20230507.log 4903
    ./20230508.log 3787
    ./20230509.log 1299
    ./20230510.log 0
    ./20230511.log 5615
    ./20230512.log 1105
    ./20230513.log 0
    ./20230514.log 5422
    ./20230515.log 6196
    ./20230516.log 6014
    ./20230517.log 6686
    ./20230518.log 6686
    ./latest.log 6156
    $ find -name '*.log' -exec bash -c 'echo -n {} "" & grep -o -P "HTTP 429" {} | wc -l' \; | sort
    ./20230504.log 6878
    ./20230505.log 5534
    ./20230506.log 7499
    ./20230507.log 6330
    ./20230508.log 5851
    ./20230509.log 1428
    ./20230510.log 0
    ./20230511.log 6669
    ./20230512.log 1277
    ./20230513.log 0
    ./20230514.log 7355
    ./20230515.log 8136
    ./20230516.log 7831
    ./20230517.log 8828
    ./20230518.log 8489
    ./latest.log 7206

    the Tieba server may have started returning 110001 for the q_type parameter even earlier.
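The per-file counts above come from find plus grep -o and wc -l. The same tally can be sketched in Python, which also avoids the `echo ... & grep` construct (the `&` runs echo in the background, so the pairing of file name and count relies on timing). The log line format in the demo is an assumption; only the matched substrings come from the commands above.

```python
import re
import tempfile
from pathlib import Path


def count_matches(log_dir, pattern):
    """Return {file name: number of regex matches} for each *.log under log_dir."""
    regex = re.compile(pattern)
    counts = {}
    for path in sorted(Path(log_dir).rglob("*.log")):
        with path.open(encoding="utf-8", errors="replace") as fh:
            # Like `grep -o | wc -l`: count every match, not every matching line.
            counts[path.name] = sum(len(regex.findall(line)) for line in fh)
    return counts


# Demo with made-up log lines in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "20230517.log").write_text(
        'Reply list is empty {"tid":1}\n', encoding="utf-8")
    Path(tmp, "20230518.log").write_text(
        'Reply list is empty {"tid":6616695318}\n'
        'HTTP 429 {"tid":6616695318}\n' * 2, encoding="utf-8")
    demo_counts = count_matches(tmp, r'Reply list is empty.*tid":6616695318')
```

Running it once per pattern ("Reply list is empty", "HTTP 429", with or without the tid suffix) would reproduce the four tables above from the same log directory.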

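n0099 (Author) commented May 19, 2023

At the time I just figured that removing it would not hurt

  1. So back then you tested on threads with many similar or duplicate replies, such as flood threads, and concluded that neither the presence nor the value of q_type changes the number of replies returned by the reply-list endpoint or the actual pids?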

n0099 (Author) commented May 19, 2023

  1. About folded replies themselves:
    As the code comment at https://github.com/n0099/TiebaMonitor/blob/91e617a4940f222b0d9d4299c93146b84e2301d7/c%23/crawler/src/Tieba/Crawl/Crawler/ReplyCrawler.cs#L38-L40 says:

    // as of client version 12.12.1.0 (not including), folded replies won't be include in response:
    // https://github.com/n0099/TiebaMonitor/commit/b8e7d2645e456271f52457f56500aaedaf28a010#diff-cf67f7f9e82d44aa5be8f85cd24946e5bb7829ca7940c9d056bb1e3849b8f981R32
    // so we have to manually requesting the folded replies by appending the returned request tasks

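    In April last year you pointed out that if the supplied _client_version parameter is >12.12.1.0, the returned reply list does not contain folded replies.
    However, at some unknown point the Tieba server changed this again, and it now always includes folded replies regardless of _client_version.
    Take floor 17857 (pid 123808965427) of tid 5261166918 as an example: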
    image

    $ php-cgi client_tester_2.php client_version=12.40.1.1 type=replies tid=5261166918 pn=440 | sed -n '3,$p' | jq '.post_list | map(select(.is_fold!=0))'
    [
      {
        "need_log": 0,
        "show_squared": 0,
        "dynamic_url": "",
        "id": 123808965427,
        "title": "回复:【公告】最近被系统封禁的小伙伴看过来~",
        "floor": 17857,
        "time": 1548402945,
        "content": [
          {
            "type": 0,
            "text": "申请解封_"
          },
          {
            "type": 4,
            "text": "@双手战江山",
            "uid": 1353229638
          },
          {
            "type": 0,
            "text": " \n是否已经在解封帐号自动通道交过申请:是\n申诉理由:我一直遵守百度有关的管理规定,每天只是签到、正常回复吧友,17年id被盗了,申诉回来发现被人拿去刷广告,然后被封了,因为我的失误导致我的id被人盗取,是我的问题,我一定长个教训,认真遵守贴吧规则!"
          }
        ],
        "is_post_visible": 0,
        "is_fold": 2,
        "fold_tip": "查看本楼内容",
        "is_top_agree_post": 0,
        "is_ntitle": 0,
        "bimg_url": "",
        "agree": {
          "has_agree": 0,
          "agree_type": 0,
          "disagree_num": 0,
          "diff_agree_num": 0,
          "agree_num": 0
        },
        "sub_post_number": 0,
        "ios_bimg_format": "",
        "author_id": 3447893
      },
      {
        "is_fold": 6,
        "show_squared": 0,
        "id": 123888465600,
        "floor": 17858,
        "bimg_url": "",
        "author_id": 4030490853,
        "fold_tip": "",
        "time": 1548864463,
        "sub_post_number": 0,
        "ios_bimg_format": "",
        "dynamic_url": "",
        "is_ntitle": 0,
        "agree": {
          "diff_agree_num": 0,
          "agree_num": 0,
          "has_agree": 0,
          "agree_type": 0,
          "disagree_num": 0
        },
        "is_post_visible": 0,
        "need_log": 0,
        "is_top_agree_post": 0,
        "title": "回复:【公告】最近被系统封禁的小伙伴看过来~",
        "content": [
          {
            "type": 0,
            "text": "麻烦帮我解封一下,IDAppello4"
          }
        ]
      }
    ]

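    So the response not only contains pid 123808965427 but also shows that pid 123888465600 on the next floor, 17858, has is_fold!=0 as well (although the web page does not show it as folded, possibly because its fold_tip is ""), and the value of _client_version does not affect the result.

  2. Furthermore, .has_fold_comment=1, which was previously used to decide whether the current page contains folded replies and therefore whether an extra request with is_fold_comment_req=1 is needed, actually indicates whether the whole thread (not the requested page) contains folded replies. For example, page 3 of tid 5261166918 has no reply with is_fold!=0 (although pages 1 and 2 do):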

    $ php-cgi client_tester_2.php client_version=12.40.1.1 type=replies tid=5261166918 pn=3 | sed -n '3,$p' | jq '.post_list | map(select(.is_fold!=0))'
    []

    and yet .has_fold_comment=1 is still present:

    $ php-cgi client_tester_2.php client_version=12.40.1.1 type=replies tid=5261166918 pn=3 | grep -o -P '.{5}has_fold_comment.{5}'
    "}},"has_fold_comment":1,"

n0099 added a commit to n0099/tbclient.protobuf that referenced this issue May 19, 2023
n0099 added a commit to n0099/open-tbm that referenced this issue May 19, 2023
…iotieba#123 (comment) @ `ReplyCrawler.GetRequestsForPage()`

@ c#/crawler
$ git submodule update --remote
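The folded-reply check from the comment above (the jq filter `map(select(.is_fold!=0))`) can be sketched in Python. Because has_fold_comment appears to describe the whole thread, the sketch decides per page from the posts themselves. The function names are hypothetical, and the sample pages keep only the id, floor and is_fold fields; posts marked as made up do not come from the responses quoted here.

```python
def folded_posts(page):
    """Return the posts on this page whose is_fold is non-zero."""
    return [p for p in page.get("post_list", []) if int(p.get("is_fold", 0)) != 0]


def page_has_folded(page):
    """Decide from the page's own posts, not the thread-wide has_fold_comment flag."""
    return bool(folded_posts(page))


page_440 = {
    "has_fold_comment": 1,
    "post_list": [
        {"id": 123808965427, "floor": 17857, "is_fold": 2},
        {"id": 123888465600, "floor": 17858, "is_fold": 6},
        {"id": 1, "floor": 17859, "is_fold": 0},  # made-up unfolded post
    ],
}
page_3 = {
    "has_fold_comment": 1,
    "post_list": [{"id": 2, "floor": 61, "is_fold": 0}],  # made-up unfolded post
}
folded_ids = [p["id"] for p in folded_posts(page_440)]
```

For page_3 the flag is 1 but page_has_folded returns False, which mirrors the pn=3 observation above.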
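lumina37 (Owner) commented

At a conservative estimate I won't be able to start experimenting until the day after tomorrow; work is pushing hard.

lumina37 (Owner) commented

Adding q_type=2 doesn't seem to make any difference on my side, though I only tested a few arbitrary page numbers over ws and http.

lumina37 (Owner) commented

Also, I plan to clean up all the folding-related code in the next version; it doesn't seem to be of any use now.

lumina37 added the discussion label May 24, 2023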
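n0099 (Author) commented May 24, 2023

Adding q_type=2 doesn't seem to make any difference on my side, though I only tested a few arbitrary page numbers over ws and http.

Doesn't adding it immediately return 110001 未知错误 (unknown error)?

Also, I plan to clean up all the folding-related code in the next version; it doesn't seem to be of any use now.

c999767
The main point is that older replies still remain folded. On the web, for example, this page https://tieba.baidu.com/p/5214988494?pn=90 is folded in its entirety, and as of May 2023 the client API behaves the same as the web: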

$ php-cgi client_tester_2.php client_version=12.40.1.1 type=replies tid=5214988494 pn=90 | sed -n '3,$p'
{"error_code":0,"logid":"3458261427","server_time":22343,"ctime":"0","time":1684933058,"error_msg":""}
$ php-cgi client_tester_2.php client_version=8.8.8.8 type=replies tid=5214988494 pn=90 | sed -n '3,$p'
{"error_msg":"","error_code":0,"logid":"3469067793","ctime":"0","server_time":77771,"time":1684933069}

What is this, error_code 0 with an empty error_msg?

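n0099 (Author) commented May 24, 2023

this page tieba.baidu.com/p/5214988494?pn=90 is folded in its entirety

Oh, I mixed up two things: the policies deployed back in 2018, shown as "本楼包含部分广告等违规内容的回复,已经被屏蔽,暂不可见" (this floor contains replies with ads or other violating content; they have been blocked and are temporarily invisible) and "因楼主发布了广告等违规内容,回复此楼不顶帖哦~" (the thread author posted ads or other violating content, so replying here will not bump the thread), and the fold feature for replies that came later.
In tid 5214988494, some pids are in the "本楼包含部分广告等违规内容的回复,已经被屏蔽,暂不可见" state, while the whole tid is in the "因楼主发布了广告等违规内容,回复此楼不顶帖哦~" state.

as of May 2023 the client API behaves the same as the web:

Since the web cannot see that page, it is reasonable that the client cannot see it either. This has nothing to do with the fold feature deployed later.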

n0099 (Author) commented May 24, 2023

Back to folded replies themselves

Also, I plan to clean up all the folding-related code in the next version; it doesn't seem to be of any use now.

eqs


image

fid count isFold
97650 10059415 0
27497591 9689164 0
17019292 8061329 0
3255599 8014598 0
2265748 2253963 0
6087183 552604 0
27546680 526483 0
898666 412969 0
27278534 127199 0
228500 71653 0
19871743 59650 0
23546288 39789 0
78579 29892 0
4734432 19771 0
2265748 7467 1
2265748 4392 6
2265748 3179 2
2265748 1457 5
17019292 1399 6
898666 1223 2
898666 1013 6
25459979 951 0
898666 690 1
3255599 518 6
97650 501 6
27497591 226 6
23546288 223 1
898666 217 5
228500 194 1
78579 170 5
78579 165 1
27278534 121 6
6087183 74 5
4734432 67 1
19871743 60 1
78579 54 2
6087183 50 6
4734432 49 2
228500 45 5
19871743 44 2
228500 34 6
27546680 34 6
6087183 26 1
23546288 20 5
23546288 17 2
228500 13 2
23546288 12 6
19871743 8 6
19871743 8 5
4734432 5 6
6087183 4 2
17019292 4 5
78579 3 6
4734432 2 5

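Folded replies are so rare that I no longer want to archive isFold at all. Even though I rely on MySQL's storage optimization in which the NULL flags of 8 nullable columns share 1 byte, I would still rather either split the column out into its own table (a one-to-zeroOrOne relationship, aka optional vertical partitioning, so that not even the 1 bit per row indicating IS NULL is needed) or physically drop it.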


And according to
image

month count
2013-09 1
2015-10 2
2016-01 1
2016-03 10
2016-04 2
2016-05 2
2016-06 4
2016-07 5
2016-08 4
2016-09 4
2016-10 4
2016-11 6
2016-12 10
2017-01 2
2017-02 6
2017-03 7
2017-04 3
2017-05 8
2017-06 27
2017-07 20
2017-08 27
2017-09 27
2017-10 6
2017-11 19
2017-12 31
2018-01 135
2018-02 135
2018-03 163
2018-04 492
2018-05 712
2018-06 741
2018-07 830
2018-08 908
2018-09 593
2018-10 731
2018-11 882
2018-12 1180
2019-01 1927
2019-02 4716
2019-03 4968
2019-04 1214
2019-05 200
2019-06 191
2019-07 2
2019-08 5
2019-09 1
2019-11 3
2020-01 1
2020-02 2
2020-03 8
2020-04 4
2020-05 9
2020-06 23
2020-07 23
2020-08 38
2020-09 8
2020-10 150
2020-11 20
2020-12 18
2021-01 5
2021-02 19
2021-03 22
2021-04 19
2021-05 27
2021-06 39
2021-07 326
2021-08 473
2021-09 363
2021-10 384
2021-11 191
2021-12 99
2022-01 74
2022-02 47
2022-03 52
2022-04 58
2022-05 149
2022-06 31
2022-07 35
2022-08 33
2022-09 35
2022-10 29
2022-11 7

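This shows that the fold policy was mainly applied during two periods, January 2018 to June 2019 and October 2020 to November 2022, and appears to have been abandoned entirely after November 2022. A very small number of replies from 2013, 2015, 2016 and 2017 were also dug up and folded.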
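The query behind the monthly table is only available as an image, so the following is a sketch under assumptions rather than the query that was actually run: it buckets folded replies by the month of their time field (a Unix timestamp), using UTC. The sample values are the (id, time, is_fold) triples from the JSON responses quoted in this thread.

```python
from collections import Counter
from datetime import datetime, timezone


def fold_counts_by_month(posts):
    """Count folded posts per 'YYYY-MM' (UTC) of their `time` field."""
    months = Counter()
    for post in posts:
        if post.get("is_fold", 0) != 0:
            dt = datetime.fromtimestamp(post["time"], tz=timezone.utc)
            months[dt.strftime("%Y-%m")] += 1
    return dict(sorted(months.items()))


posts = [
    {"id": 123808965427, "time": 1548402945, "is_fold": 2},
    {"id": 123888465600, "time": 1548864463, "is_fold": 6},
    {"id": 125402098933, "time": 1556710174, "is_fold": 2},
    {"id": 124427670134, "time": 1551833529, "is_fold": 2},
    {"id": 123477137834, "time": 1546339376, "is_fold": 2},
]
by_month = fold_counts_by_month(posts)
```

All five sample replies land in early 2019, inside the first of the two periods described above. Using Beijing time instead of UTC could shift replies posted near a month boundary.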


Then, using
image

tid count
6062186860 2046
6088034801 143
4543639472 112
6086253737 86
4815622632 85
5914682306 80
5858698299 78
5520459893 76
5711344015 75
5189989245 72
5915436528 61
3656202350 58
5261166918 47
5351619411 46
5740013136 45
6043743451 41
5779968474 40
5868793321 39
5821704822 38
5669513911 38
5752693575 35
5403243984 35
6013990333 35
5935524343 34
5909514989 34
6008316815 33
2873064865 33
5787951911 31
5969997180 31
6039003901 31
5835224454 31
5850314440 30
6011085619 29
6094908103 28
5639543100 28
5601339291 27
5782690990 27
5963083986 26
5697375636 26
5699283119 25
5764243124 25
5688918493 25
7515778827 25
5214963020 24
5815143613 24
5666442583 24
5653565852 24
6899475217 24
5816852083 23
5915438375 23
5214988494 23
5527213844 23
6063219615 23
5780907376 23
5826722519 23
5972646681 22
6054076740 21
5995620951 21
5883187495 21
5709571952 21
5917411915 21
5675184663 21
5916063885 20
6038728815 20
5784064024 20
5475249986 20
5765980569 20
5785530233 20
5845684538 20
5854089729 20
...

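Alas, nothing lasts forever. Unfortunately the first few tids have long since turned into 404 on Tieba; all one can say is that the Simplified Chinese internet has no memory.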


Take these 3 pids with is_fold=2 as examples:
https://tieba.baidu.com/p/6116591812?pid=125402098933#125402098933
image

$ php-cgi client_tester_2.php client_version=8.8.8.8 type=replies tid=6116591812 pn=10 | sed -n '3,$p' | jq '.post_list[] | select(.is_fold!=0)'
{
  "id": 125402098933,
  "floor": 325,
  "is_ntitle": 0,
  "need_log": 0,
  "is_top_agree_post": 0,
  "show_squared": 0,
  "agree": {
    "has_agree": 0,
    "agree_type": 0,
    "disagree_num": 0,
    "diff_agree_num": 1,
    "agree_num": 1
  },
  "is_fold": 2,
  "time": 1556710174,
  "sub_post_number": 1,
  "bimg_url": "",
  "ios_bimg_format": "",
  "author_id": 166890595,
  "title": "回复:重温b站刚买的凉宫,团长的性格真是恶劣啊",
  "content": [
    {
      "text": "image_emoticon25",
      "c": "滑稽",
      "type": 2
    },
    {
      "type": 0,
      "text": "听歌就完事了 God knows我TM吹爆"
    }
  ],
  "is_post_visible": 0,
  "fold_tip": "查看本楼内容",
  "dynamic_url": ""
}

https://tieba.baidu.com/p/4543639472?pid=124427670134#124427670134
image

$ php-cgi client_tester_2.php client_version=12.46.1.1 type=replies tid=4543639472 pn=198 | sed -n '3,$p' | jq '.post_list[] | select(.is_fold!=0)'
{
  "is_ntitle": 0,
  "ios_bimg_format": "",
  "agree": {
    "has_agree": 0,
    "agree_type": 0,
    "disagree_num": 0,
    "diff_agree_num": 1,
    "agree_num": 1
  },
  "is_top_agree_post": 0,
  "title": "回复:【吧务贴】大b吧新举报申诉楼",
  "content": [
    {
      "link": "http://tieba.baidu.com/p/6055681352?share=9105&fr=share",
      "type": 1,
      "text": "http://tieba.baidu.com/p/6055681352?share=9105&fr=share"
    },
    {
      "type": 0,
      "text": " 22楼 理由 :人身攻击"
    }
  ],
  "sub_post_number": 1,
  "dynamic_url": "",
  "floor": 10892,
  "show_squared": 0,
  "need_log": 0,
  "time": 1551833529,
  "bimg_url": "",
  "author_id": 866443627,
  "is_post_visible": 0,
  "is_fold": 2,
  "fold_tip": "查看本楼内容",
  "id": 124427670134
}

https://tieba.baidu.com/p/4543639472?pid=123477137834#123477137834
image

$ php-cgi client_tester_2.php client_version=12.46.1.1 type=replies tid=4543639472 pn=194 | sed -n '3,$p' | jq '.post_list[] | select(.is_fold!=0)'
{
  "is_ntitle": 0,
  "sub_post_number": 0,
  "fold_tip": "查看本楼内容",
  "show_squared": 0,
  "id": 123477137834,
  "floor": 10624,
  "content": [
    {
      "type": 1,
      "text": "https://tieba.baidu.com/p/5994337462?pn=1#123476598598l",
      "link": "https://tieba.baidu.com/p/5994337462?pn=1#123476598598l"
    },
    {
      "text": " 强行引战2d3d谁上限高 而且说什么在哪贴哪贴看见什么 又强行来一贴弄二手屎 外带楼主是个3d学生 看不起2d画风 我说2d神韵被其嘲讽 完全没有讨论的态度 就是捧1踩1 ",
      "type": 0
    },
    {
      "type": 4,
      "text": "@被羊追杀",
      "uid": 412694632
    },
    {
      "text": " ",
      "type": 0
    },
    {
      "type": 4,
      "text": "@二十分好",
      "uid": 1059382505
    }
  ],
  "author_id": 734843822,
  "is_post_visible": 0,
  "is_fold": 2,
  "dynamic_url": "",
  "title": "回复:【吧务贴】大b吧新举报申诉楼",
  "bimg_url": "",
  "ios_bimg_format": "",
  "is_top_agree_post": 0,
  "time": 1546339376,
  "agree": {
    "has_agree": 0,
    "agree_type": 0,
    "disagree_num": 0,
    "diff_agree_num": 0,
    "agree_num": 0
  },
  "need_log": 0
}

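This once again confirms #123 (comment):

  1. However, at some unknown point the Tieba server changed this again, and it now always includes folded replies regardless of _client_version.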

lumina37 (Owner) commented

Then it really is of no use, and removing it was the right call.

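n0099 (Author) commented May 25, 2023

Adding q_type=2 doesn't seem to make any difference on my side, though I only tested a few arbitrary page numbers over ws and http.

Doesn't adding it immediately return 110001 未知错误 (unknown error)?

So, back to the title of this issue: what exactly is q_type=2? Did last year's client send it in its requests? Why do I get 110001 as soon as I include it while you do not?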

lumina37 (Owner) commented

I don't know. Including q_type does not give me 110001 either.

lumina37 (Owner) commented

How about I create a branch for you to test with?

n0099 (Author) commented May 25, 2023

What happens if you try the following on your side

wget https://raw.githubusercontent.com/n0099/TiebaMonitor/v1/client_tester.php
patch client_tester.php << 'EOF'
41c41,43
<             'pn' => $_GET['pn']
---
>             'pn' => $_GET['pn'],
>             'rn' => 30,
>             'q_type' => 2
EOF
php-cgi client_tester.php client_version=12.46.1.1 type=replies tid=4543639472 pn=194 | sed -n '3,$p'

(that is, the client API that carries JSON over HTTP, as opposed to protobuf-encoded binary over HTTP, or JSON or protobuf over ws)?
And also:

$ nslookup c.tieba.baidu.com
Server:         168.63.129.16
Address:        168.63.129.16#53

Non-authoritative answer:
c.tieba.baidu.com       canonical name = c.n.shifen.com.
Name:   c.n.shifen.com
Address: 103.235.46.139
Name:   c.n.shifen.com
Address: 103.235.46.140

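lumina37 (Owner) commented

I recommend using tiebac.baidu.com. Since version 12.x every occurrence of c.tieba.baidu.com has been replaced with that host, so it is probably the more robust of the two.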

PS C:\WINDOWS\system32> nslookup c.tieba.baidu.com
服务器:  UnKnown
Address:  fe80::d6da:21ff:fe05:576f

非权威应答:
名称:    c.n.shifen.com
Addresses:  39.156.66.138
          223.109.81.34
          223.109.81.35
          112.34.111.194
          183.232.231.118
Aliases:  c.tieba.baidu.com

PS C:\WINDOWS\system32> nslookup tiebac.baidu.com
服务器:  UnKnown
Address:  fe80::d6da:21ff:fe05:576f

非权威应答:
名称:    tiebacchunwan.n.shifen.com
Address:  183.232.231.118
Aliases:  tiebac.baidu.com

Also, this machine does not have PHP; installing it will have to wait until tomorrow at the earliest.

lumina37 reopened this May 25, 2023
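n0099 (Author) commented May 25, 2023

I recommend using tiebac.baidu.com

I used to think this was yet another Tieba reverse-proxy domain produced by the Tieba admins misconfiguring nginx and DNS once again, like the classics: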
https://jump.bdimg.com
https://jump2.bdimg.com
https://wefan.baidu.com
https://nani.baidu.com


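Since version 12.x every occurrence of c.tieba.baidu.com has been replaced with that host, so it is probably the more robust of the two.

https://ping.chinaz.com/c.tieba.baidu.com
https://ping.chinaz.com/tiebac.baidu.com
Both resolve to the same Baidu Cloud Acceleration IP ranges (many of the IPs are even identical), and from abroad both resolve to the HK nodes 103.235.46.140 and 103.235.46.139.
#82 (comment)

Also, the rps limit of c.tieba.baidu.com depends on which CDN IP you hit. Baidu's mainland CDN IPs have a higher limit, around 20~30 rps, whereas the HK node allows only 10 rps (DNS resolves every non-mainland IP to that one HK node; you can of course edit hosts by hand to use another node's IP, but the overseas routes of Baidu's mainland CDN nodes are terrible).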
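To stay under a per-IP limit such as the roughly 10 rps quoted for the HK node, a crawler can space out its requests. The following is a minimal sketch of a fixed-interval limiter, not the limiter used by either project; the class name and the injectable clock and sleep parameters are assumptions made so the behaviour can be checked without real waiting.

```python
import time


class MinIntervalLimiter:
    """Spaces out calls so that at most `rps` requests start per second."""

    def __init__(self, rps, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rps
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest start time of the next request

    def wait(self):
        """Block until the next request may start; return the seconds slept."""
        now = self.clock()
        start = now if self.next_allowed is None else max(now, self.next_allowed)
        delay = start - now
        if delay > 0:
            self.sleep(delay)
        self.next_allowed = start + self.interval
        return delay
```

A crawler would call limiter.wait() before each request, for example with MinIntervalLimiter(10) when resolving to the HK node. Requests that still receive HTTP 429 would need a separate retry or backoff policy.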


#105 (comment)

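  1. Also, since the endpoints under c.tieba.baidu.com do not support HTTP/2, your earlier change in 3886edf and 7b7bba9, which switched the URL scheme of a bunch of endpoints to https, actually makes things slower (although it is reasonable for Tieba to require https on endpoints that carry the BDUSS cookie). In my earlier curl tests http took about 300 ms and https about 600 ms. If they ever support HTTP/2, reusing the existing connection would make it possible to compress the pile of repeated cookies in the response headers (no matter how minimal your request headers are, with or without BDUSS, they return a bunch of cookies used for telemetry and tracing).

Fact check: as of May 2023, the client API endpoints under the domain tiebac.baidu.com still support neither HTTP/2 nor HTTP/3, but the Tieba admins have finally configured the SSL certificate correctly (it is matched through the *.baidu.com wildcard SAN, whereas for c.tieba.baidu.com the certificate was set to the baidu.com one).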
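Why the *.baidu.com wildcard covers tiebac.baidu.com but could not cover c.tieba.baidu.com comes down to the rule that a wildcard stands for exactly one left-most label. A simplified sketch of that rule follows; it ignores partial-label wildcards and internationalized names, and the function name san_matches is made up for illustration.

```python
def san_matches(hostname, pattern):
    """Simplified DNS-ID match: a leading '*' covers exactly one left-most label."""
    host = hostname.lower().rstrip(".").split(".")
    pat = pattern.lower().rstrip(".").split(".")
    if len(host) != len(pat):
        # A wildcard never spans more than one label, so lengths must agree.
        return False
    if pat[0] == "*":
        return host[1:] == pat[1:]
    return host == pat


one_label = san_matches("tiebac.baidu.com", "*.baidu.com")
two_labels = san_matches("c.tieba.baidu.com", "*.baidu.com")
```

Here one_label is True and two_labels is False, which would be consistent with the remark above about the c.tieba.baidu.com certificate, although this thread does not show that host's TLS handshake.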

  • tiebac.baidu.com
$ curl --http2 -o /dev/null -v 'http://tiebac.baidu.com/c/f/frs/page?cmd=301001'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 103.235.46.140:80...
* Connected to tiebac.baidu.com (103.235.46.140) port 80 (#0)
> GET /c/f/frs/page?cmd=301001 HTTP/1.1
> Host: tiebac.baidu.com
> User-Agent: curl/7.81.0
> Accept: */*
> Connection: Upgrade, HTTP2-Settings
> Upgrade: h2c
> HTTP2-Settings: AAMAAABkAAQCAAAAAAIAAAAA
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Connection: keep-alive
< Content-Type: application/x-javascript;charset=utf-8
< Date: Fri, 26 May 2023 01:36:31 GMT
< P3p: CP=" OTI DSP COR IVA OUR IND COM "
< Server: Apache
< Set-Cookie: BAIDUID=122B07DB8917816357E8EB3BEBF732C3:FG=1; expires=Sat, 25-May-24 01:36:31 GMT; max-age=31536000; path=/; domain=.baidu.com; version=1
< Tracecode: 21919588212336534282052609
< Vary: Accept-Encoding
< X-Xss-Protection: 1; mode=block
< Transfer-Encoding: chunked
<
{ [126 bytes data]
100   115    0   115    0     0    282      0 --:--:-- --:--:-- --:--:--   282
* Connection #0 to host tiebac.baidu.com left intact

It supports https:

$ curl --http2 -o /dev/null -v 'https://tiebac.baidu.com/c/f/frs/page?cmd=301001'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 103.235.46.140:443...
* Connected to tiebac.baidu.com (103.235.46.140) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.2 (IN), TLS header, Certificate Status (22):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [102 bytes data]
* TLSv1.2 (IN), TLS header, Certificate Status (22):
{ [5 bytes data]
* TLSv1.2 (IN), TLS handshake, Certificate (11):
{ [4810 bytes data]
* TLSv1.2 (IN), TLS header, Certificate Status (22):
{ [5 bytes data]
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
{ [300 bytes data]
* TLSv1.2 (IN), TLS header, Certificate Status (22):
{ [5 bytes data]
* TLSv1.2 (IN), TLS handshake, Server finished (14):
{ [4 bytes data]
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
} [5 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
} [37 bytes data]
* TLSv1.2 (OUT), TLS header, Finished (20):
} [5 bytes data]
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
} [5 bytes data]
* TLSv1.2 (OUT), TLS handshake, Finished (20):
} [16 bytes data]
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* TLSv1.2 (IN), TLS header, Finished (20):
{ [5 bytes data]
* TLSv1.2 (IN), TLS header, Certificate Status (22):
{ [5 bytes data]
* TLSv1.2 (IN), TLS handshake, Finished (20):
{ [16 bytes data]
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: C=CN; ST=beijing; L=beijing; OU=service operation department; O=Beijing Baidu Netcom Science Technology Co., Ltd; CN=baidu.com
*  start date: Jul  5 05:16:02 2022 GMT
*  expire date: Aug  6 05:16:01 2023 GMT
*  subjectAltName: host "tiebac.baidu.com" matched cert's "*.baidu.com"
*  issuer: C=BE; O=GlobalSign nv-sa; CN=GlobalSign RSA OV SSL CA 2018
*  SSL certificate verify ok.
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
} [5 bytes data]
> GET /c/f/frs/page?cmd=301001 HTTP/1.1
> Host: tiebac.baidu.com
> User-Agent: curl/7.81.0
> Accept: */*
>
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Connection: keep-alive
< Content-Type: application/x-javascript;charset=utf-8
< Date: Fri, 26 May 2023 01:36:41 GMT
< P3p: CP=" OTI DSP COR IVA OUR IND COM "
< Server: Apache
< Set-Cookie: BAIDUID=1584F17A1889A6DD496DCB35E490A178:FG=1; expires=Sat, 25-May-24 01:36:41 GMT; max-age=31536000; path=/; domain=.baidu.com; version=1
< Tracecode: 22011617440335634186052609
< Vary: Accept-Encoding
< X-Xss-Protection: 1; mode=block
< Transfer-Encoding: chunked
<
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
100   115    0   115    0     0    384      0 --:--:-- --:--:-- --:--:--   383
* Connection #0 to host tiebac.baidu.com left intact

The detailed x509 certificate information shows that DNS:*.baidu.com in the SAN matches tiebac.baidu.com exactly:

$ openssl s_client -connect tiebac.baidu.com:443 -prexit | openssl x509 -noout -text
depth=2 OU = GlobalSign Root CA - R3, O = GlobalSign, CN = GlobalSign
verify error:num=20:unable to get local issuer certificate
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            44:17:ce:86:ef:82:ec:69:21:cc:6f:68
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C = BE, O = GlobalSign nv-sa, CN = GlobalSign RSA OV SSL CA 2018
        Validity
            Not Before: Jul  5 05:16:02 2022 GMT
            Not After : Aug  6 05:16:01 2023 GMT
        Subject: C = CN, ST = beijing, L = beijing, OU = service operation department, O = "Beijing Baidu Netcom Science Technology Co., Ltd", CN = baidu.com
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                RSA Public-Key: (2048 bit)
                Modulus:
                    00:aa:2f:cc:41:8d:25:ae:83:e9:f4:27:c4:00:b3:
                    39:6f:0e:98:2a:55:7d:07:e5:80:49:82:fa:d3:d3:
                    85:98:b5:df:7b:6f:bb:02:dd:ed:78:e4:0c:07:2b:
                    9e:1e:86:4b:f6:6a:86:58:d7:57:6f:21:59:11:d8:
                    6f:96:6e:d2:de:36:28:f6:b4:e3:ce:95:32:29:00:
                    c1:65:8e:69:b0:00:fe:52:37:f4:88:3f:8b:6d:0f:
                    bb:f0:ec:c5:c0:31:ef:ad:b5:0c:06:66:ad:be:dc:
                    43:13:c4:66:b0:5d:cf:56:53:e2:d1:96:82:1c:06:
                    bb:9b:5f:ed:60:8d:d2:ed:f3:d2:50:ee:bb:cd:b2:
                    36:97:c8:ce:7b:d2:4b:b7:5c:b4:88:ca:37:6e:8b:
                    ce:f9:96:fd:b4:f5:47:b5:20:77:bb:fc:a8:9d:81:
                    b2:6c:f8:c7:09:6a:dd:22:6e:83:3f:a7:53:df:f1:
                    da:2f:29:6b:22:c3:e9:1d:65:e8:c5:a0:ba:13:4e:
                    16:3f:03:93:f0:a5:59:8a:1a:80:e8:27:7d:49:23:
                    df:d1:f9:4b:97:b7:01:c4:19:f5:f1:c5:ff:91:33:
                    d0:a1:74:c6:ee:d4:cf:f6:38:0c:ed:bd:5e:aa:44:
                    fb:88:f7:7b:99:70:76:34:55:7e:55:d2:0f:9e:bf:
                    94:93
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            Authority Information Access:
                CA Issuers - URI:http://secure.globalsign.com/cacert/gsrsaovsslca2018.crt
                OCSP - URI:http://ocsp.globalsign.com/gsrsaovsslca2018

            X509v3 Certificate Policies:
                Policy: 1.3.6.1.4.1.4146.1.20
                  CPS: https://www.globalsign.com/repository/
                Policy: 2.23.140.1.2.2

            X509v3 Basic Constraints:
                CA:FALSE
            X509v3 CRL Distribution Points:

                Full Name:
                  URI:http://crl.globalsign.com/gsrsaovsslca2018.crl

            X509v3 Subject Alternative Name:
                DNS:baidu.com, DNS:baifubao.com, DNS:www.baidu.cn, DNS:www.baidu.com.cn, DNS:mct.y.nuomi.com, DNS:apollo.auto, DNS:dwz.cn, DNS:*.baidu.com, DNS:*.baifubao.com, DNS:*.baidustatic.com, DNS:*.bdstatic.com, DNS:*.bdimg.com, DNS:*.hao123.com, DNS:*.nuomi.com, DNS:*.chuanke.com, DNS:*.trustgo.com, DNS:*.bce.baidu.com, DNS:*.eyun.baidu.com, DNS:*.map.baidu.com, DNS:*.mbd.baidu.com, DNS:*.fanyi.baidu.com, DNS:*.baidubce.com, DNS:*.mipcdn.com, DNS:*.news.baidu.com, DNS:*.baidupcs.com, DNS:*.aipage.com, DNS:*.aipage.cn, DNS:*.bcehost.com, DNS:*.safe.baidu.com, DNS:*.im.baidu.com, DNS:*.baiducontent.com, DNS:*.dlnel.com, DNS:*.dlnel.org, DNS:*.dueros.baidu.com, DNS:*.su.baidu.com, DNS:*.91.com, DNS:*.hao123.baidu.com, DNS:*.apollo.auto, DNS:*.xueshu.baidu.com, DNS:*.bj.baidubce.com, DNS:*.gz.baidubce.com, DNS:*.smartapps.cn, DNS:*.bdtjrcv.com, DNS:*.hao222.com, DNS:*.haokan.com, DNS:*.pae.baidu.com, DNS:*.vd.bdstatic.com, DNS:*.cloud.baidu.com, DNS:click.hm.baidu.com, DNS:log.hm.baidu.com, DNS:cm.pos.baidu.com, DNS:wn.pos.baidu.com, DNS:update.pan.baidu.com
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Authority Key Identifier:
                keyid:F8:EF:7F:F2:CD:78:67:A8:DE:6F:8F:24:8D:88:F1:87:03:02:B3:EB

            X509v3 Subject Key Identifier:
                3B:70:2D:3D:E8:19:05:00:47:12:02:EF:81:18:D3:41:08:E5:16:52
            CT Precertificate SCTs:
                Signed Certificate Timestamp:
                    Version   : v1 (0x0)
                    Log ID    : E8:3E:D0:DA:3E:F5:06:35:32:E7:57:28:BC:89:6B:C9:
                                03:D3:CB:D1:11:6B:EC:EB:69:E1:77:7D:6D:06:BD:6E
                    Timestamp : Jul  5 05:16:03.185 2022 GMT
                    Extensions: none
                    Signature : ecdsa-with-SHA256
                                30:46:02:21:00:80:26:42:7D:3D:91:C5:D7:88:E3:A7:
                                9A:50:98:EA:95:1E:05:9C:28:94:1E:FC:2C:C2:61:8B:
                                54:D6:34:6E:B2:02:21:00:82:9C:76:69:D1:35:26:45:
                                19:5D:30:34:6F:64:12:77:99:AB:73:6D:72:F9:0A:16:
                                28:8D:73:83:95:F4:75:DE
                Signed Certificate Timestamp:
                    Version   : v1 (0x0)
                    Log ID    : 6F:53:76:AC:31:F0:31:19:D8:99:00:A4:51:15:FF:77:
                                15:1C:11:D9:02:C1:00:29:06:8D:B2:08:9A:37:D9:13
                    Timestamp : Jul  5 05:16:03.150 2022 GMT
                    Extensions: none
                    Signature : ecdsa-with-SHA256
                                30:46:02:21:00:E6:15:95:37:0A:4C:E5:38:0D:9E:29:
                                90:1C:BD:0F:0A:35:BF:B0:C5:42:B8:30:DC:0A:37:08:
                                9F:C4:3E:CA:FA:02:21:00:EE:51:EC:28:04:AF:33:88:
                                14:FE:F7:EA:16:07:83:D9:35:C9:FC:E3:99:4C:E8:29:
                                B2:C0:0E:C3:91:B3:F3:81
                Signed Certificate Timestamp:
                    Version   : v1 (0x0)
                    Log ID    : 55:81:D4:C2:16:90:36:01:4A:EA:0B:9B:57:3C:53:F0:
                                C0:E4:38:78:70:25:08:17:2F:A3:AA:1D:07:13:D3:0C
                    Timestamp : Jul  5 05:16:03.196 2022 GMT
                    Extensions: none
                    Signature : ecdsa-with-SHA256
                                30:46:02:21:00:9F:06:1B:00:03:2D:AD:F9:B7:90:29:
                                B4:F7:18:4E:5F:06:C3:C6:7F:02:91:3B:58:B1:7F:69:
                                AD:34:F1:3B:AF:02:21:00:9E:7F:3A:A2:91:38:47:1F:
                                74:29:71:16:A1:8D:54:BA:E2:B7:03:77:34:0C:C8:4D:
                                90:B4:FB:49:63:3E:B9:12
    Signature Algorithm: sha256WithRSAEncryption
         63:21:07:23:47:06:eb:b3:7c:77:6c:df:bc:55:12:b9:f1:5e:
         6a:04:60:16:be:d0:0b:18:9c:94:0c:a8:82:08:25:0d:26:fb:
         dd:cb:fc:8c:27:d9:0c:fa:4a:b6:31:b6:67:f0:26:2c:0d:96:
         96:39:65:3f:d9:a1:ee:de:9c:10:4d:54:e1:c8:d6:a9:0e:77:
         db:00:e2:37:e3:3f:b4:9c:31:4f:ac:74:d3:22:12:53:36:d0:
         ef:18:07:2d:8e:d0:e6:91:b2:6c:4a:5e:39:53:14:58:4e:d1:
         50:04:c9:83:7e:0d:7b:15:96:87:11:d7:5d:4a:17:ac:aa:9f:
         84:e3:a8:24:9d:d6:17:77:26:8c:9f:7a:7b:18:da:39:2f:77:
         f7:2b:c7:23:b8:97:6f:c3:d1:72:4c:7e:fc:c6:0d:cc:73:38:
         19:81:fb:e7:c1:7a:e8:b9:1d:3a:05:dc:36:04:9b:f1:f0:e1:
         a6:47:a0:30:4f:55:90:6c:da:cf:9e:b2:76:12:11:a1:5c:b6:
         61:8d:15:a4:68:65:9a:57:2f:7a:6e:a3:1f:f5:b4:92:5a:3c:
         df:71:0a:cd:57:d4:d0:15:36:7e:ba:d5:03:25:27:45:b4:60:
         cd:2e:02:c1:0f:0a:e7:41:6f:58:69:20:9e:ad:47:52:1a:b5:
         e6:e5:8d:1d
  • c.tieba.baidu.com
$ curl --http2 -o /dev/null -v 'http://c.tieba.baidu.com/c/f/frs/page?cmd=301001'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 103.235.46.140:80...
* Connected to c.tieba.baidu.com (103.235.46.140) port 80 (#0)
> GET /c/f/frs/page?cmd=301001 HTTP/1.1
> Host: c.tieba.baidu.com
> User-Agent: curl/7.81.0
> Accept: */*
> Connection: Upgrade, HTTP2-Settings
> Upgrade: h2c
> HTTP2-Settings: AAMAAABkAAQCAAAAAAIAAAAA
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Connection: keep-alive
< Content-Type: application/x-javascript;charset=utf-8
< Date: Fri, 26 May 2023 01:37:20 GMT
< P3p: CP=" OTI DSP COR IVA OUR IND COM "
< Server: Apache
< Set-Cookie: BAIDUID=D83AFCB77D0614ADCB36703E6AF4BA91:FG=1; expires=Sat, 25-May-24 01:37:20 GMT; max-age=31536000; path=/; domain=.baidu.com; version=1
< Tracecode: 22409589002778115338052609
< Vary: Accept-Encoding
< X-Xss-Protection: 1; mode=block
< Transfer-Encoding: chunked
<
{ [126 bytes data]
100   115    0   115    0     0    760      0 --:--:-- --:--:-- --:--:--   761
* Connection #0 to host c.tieba.baidu.com left intact

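HTTPS is not supported on this host (you can of course bypass the check with -k/--insecure, but that only adds to your traffic fingerprint):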

$ curl --http2 -o /dev/null -v 'https://c.tieba.baidu.com/c/f/frs/page?cmd=301001'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 103.235.46.140:443...
* Connected to c.tieba.baidu.com (103.235.46.140) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.2 (IN), TLS header, Certificate Status (22):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [102 bytes data]
* TLSv1.2 (IN), TLS header, Certificate Status (22):
{ [5 bytes data]
* TLSv1.2 (IN), TLS handshake, Certificate (11):
{ [4810 bytes data]
* TLSv1.2 (IN), TLS header, Certificate Status (22):
{ [5 bytes data]
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
{ [300 bytes data]
* TLSv1.2 (IN), TLS header, Certificate Status (22):
{ [5 bytes data]
* TLSv1.2 (IN), TLS handshake, Server finished (14):
{ [4 bytes data]
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
} [5 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
} [37 bytes data]
* TLSv1.2 (OUT), TLS header, Finished (20):
} [5 bytes data]
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
} [5 bytes data]
* TLSv1.2 (OUT), TLS handshake, Finished (20):
} [16 bytes data]
* TLSv1.2 (IN), TLS header, Finished (20):
{ [5 bytes data]
* TLSv1.2 (IN), TLS header, Certificate Status (22):
{ [5 bytes data]
* TLSv1.2 (IN), TLS handshake, Finished (20):
{ [16 bytes data]
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: C=CN; ST=beijing; L=beijing; OU=service operation department; O=Beijing Baidu Netcom Science Technology Co., Ltd; CN=baidu.com
*  start date: Jul  5 05:16:02 2022 GMT
*  expire date: Aug  6 05:16:01 2023 GMT
*  subjectAltName does not match c.tieba.baidu.com
* SSL: no alternative certificate subject name matches target host name 'c.tieba.baidu.com'
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
* Closing connection 0
* TLSv1.2 (OUT), TLS header, Unknown (21):
} [5 bytes data]
* TLSv1.2 (OUT), TLS alert, close notify (256):
} [2 bytes data]
curl: (60) SSL: no alternative certificate subject name matches target host name 'c.tieba.baidu.com'
More details here: https://curl.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

The detailed X.509 certificate information shows that no SAN entry mentions tieba at all:

$ openssl s_client -connect c.tieba.baidu.com:443 -prexit | openssl x509 -noout -text
depth=2 OU = GlobalSign Root CA - R3, O = GlobalSign, CN = GlobalSign
verify error:num=20:unable to get local issuer certificate
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            44:17:ce:86:ef:82:ec:69:21:cc:6f:68
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C = BE, O = GlobalSign nv-sa, CN = GlobalSign RSA OV SSL CA 2018
        Validity
            Not Before: Jul  5 05:16:02 2022 GMT
            Not After : Aug  6 05:16:01 2023 GMT
        Subject: C = CN, ST = beijing, L = beijing, OU = service operation department, O = "Beijing Baidu Netcom Science Technology Co., Ltd", CN = baidu.com
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                RSA Public-Key: (2048 bit)
                Modulus:
                    00:aa:2f:cc:41:8d:25:ae:83:e9:f4:27:c4:00:b3:
                    39:6f:0e:98:2a:55:7d:07:e5:80:49:82:fa:d3:d3:
                    85:98:b5:df:7b:6f:bb:02:dd:ed:78:e4:0c:07:2b:
                    9e:1e:86:4b:f6:6a:86:58:d7:57:6f:21:59:11:d8:
                    6f:96:6e:d2:de:36:28:f6:b4:e3:ce:95:32:29:00:
                    c1:65:8e:69:b0:00:fe:52:37:f4:88:3f:8b:6d:0f:
                    bb:f0:ec:c5:c0:31:ef:ad:b5:0c:06:66:ad:be:dc:
                    43:13:c4:66:b0:5d:cf:56:53:e2:d1:96:82:1c:06:
                    bb:9b:5f:ed:60:8d:d2:ed:f3:d2:50:ee:bb:cd:b2:
                    36:97:c8:ce:7b:d2:4b:b7:5c:b4:88:ca:37:6e:8b:
                    ce:f9:96:fd:b4:f5:47:b5:20:77:bb:fc:a8:9d:81:
                    b2:6c:f8:c7:09:6a:dd:22:6e:83:3f:a7:53:df:f1:
                    da:2f:29:6b:22:c3:e9:1d:65:e8:c5:a0:ba:13:4e:
                    16:3f:03:93:f0:a5:59:8a:1a:80:e8:27:7d:49:23:
                    df:d1:f9:4b:97:b7:01:c4:19:f5:f1:c5:ff:91:33:
                    d0:a1:74:c6:ee:d4:cf:f6:38:0c:ed:bd:5e:aa:44:
                    fb:88:f7:7b:99:70:76:34:55:7e:55:d2:0f:9e:bf:
                    94:93
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            Authority Information Access:
                CA Issuers - URI:http://secure.globalsign.com/cacert/gsrsaovsslca2018.crt
                OCSP - URI:http://ocsp.globalsign.com/gsrsaovsslca2018

            X509v3 Certificate Policies:
                Policy: 1.3.6.1.4.1.4146.1.20
                  CPS: https://www.globalsign.com/repository/
                Policy: 2.23.140.1.2.2

            X509v3 Basic Constraints:
                CA:FALSE
            X509v3 CRL Distribution Points:

                Full Name:
                  URI:http://crl.globalsign.com/gsrsaovsslca2018.crl

            X509v3 Subject Alternative Name:
                DNS:baidu.com, DNS:baifubao.com, DNS:www.baidu.cn, DNS:www.baidu.com.cn, DNS:mct.y.nuomi.com, DNS:apollo.auto, DNS:dwz.cn, DNS:*.baidu.com, DNS:*.baifubao.com, DNS:*.baidustatic.com, DNS:*.bdstatic.com, DNS:*.bdimg.com, DNS:*.hao123.com, DNS:*.nuomi.com, DNS:*.chuanke.com, DNS:*.trustgo.com, DNS:*.bce.baidu.com, DNS:*.eyun.baidu.com, DNS:*.map.baidu.com, DNS:*.mbd.baidu.com, DNS:*.fanyi.baidu.com, DNS:*.baidubce.com, DNS:*.mipcdn.com, DNS:*.news.baidu.com, DNS:*.baidupcs.com, DNS:*.aipage.com, DNS:*.aipage.cn, DNS:*.bcehost.com, DNS:*.safe.baidu.com, DNS:*.im.baidu.com, DNS:*.baiducontent.com, DNS:*.dlnel.com, DNS:*.dlnel.org, DNS:*.dueros.baidu.com, DNS:*.su.baidu.com, DNS:*.91.com, DNS:*.hao123.baidu.com, DNS:*.apollo.auto, DNS:*.xueshu.baidu.com, DNS:*.bj.baidubce.com, DNS:*.gz.baidubce.com, DNS:*.smartapps.cn, DNS:*.bdtjrcv.com, DNS:*.hao222.com, DNS:*.haokan.com, DNS:*.pae.baidu.com, DNS:*.vd.bdstatic.com, DNS:*.cloud.baidu.com, DNS:click.hm.baidu.com, DNS:log.hm.baidu.com, DNS:cm.pos.baidu.com, DNS:wn.pos.baidu.com, DNS:update.pan.baidu.com
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Authority Key Identifier:
                keyid:F8:EF:7F:F2:CD:78:67:A8:DE:6F:8F:24:8D:88:F1:87:03:02:B3:EB

            X509v3 Subject Key Identifier:
                3B:70:2D:3D:E8:19:05:00:47:12:02:EF:81:18:D3:41:08:E5:16:52
            CT Precertificate SCTs:
                Signed Certificate Timestamp:
                    Version   : v1 (0x0)
                    Log ID    : E8:3E:D0:DA:3E:F5:06:35:32:E7:57:28:BC:89:6B:C9:
                                03:D3:CB:D1:11:6B:EC:EB:69:E1:77:7D:6D:06:BD:6E
                    Timestamp : Jul  5 05:16:03.185 2022 GMT
                    Extensions: none
                    Signature : ecdsa-with-SHA256
                                30:46:02:21:00:80:26:42:7D:3D:91:C5:D7:88:E3:A7:
                                9A:50:98:EA:95:1E:05:9C:28:94:1E:FC:2C:C2:61:8B:
                                54:D6:34:6E:B2:02:21:00:82:9C:76:69:D1:35:26:45:
                                19:5D:30:34:6F:64:12:77:99:AB:73:6D:72:F9:0A:16:
                                28:8D:73:83:95:F4:75:DE
                Signed Certificate Timestamp:
                    Version   : v1 (0x0)
                    Log ID    : 6F:53:76:AC:31:F0:31:19:D8:99:00:A4:51:15:FF:77:
                                15:1C:11:D9:02:C1:00:29:06:8D:B2:08:9A:37:D9:13
                    Timestamp : Jul  5 05:16:03.150 2022 GMT
                    Extensions: none
                    Signature : ecdsa-with-SHA256
                                30:46:02:21:00:E6:15:95:37:0A:4C:E5:38:0D:9E:29:
                                90:1C:BD:0F:0A:35:BF:B0:C5:42:B8:30:DC:0A:37:08:
                                9F:C4:3E:CA:FA:02:21:00:EE:51:EC:28:04:AF:33:88:
                                14:FE:F7:EA:16:07:83:D9:35:C9:FC:E3:99:4C:E8:29:
                                B2:C0:0E:C3:91:B3:F3:81
                Signed Certificate Timestamp:
                    Version   : v1 (0x0)
                    Log ID    : 55:81:D4:C2:16:90:36:01:4A:EA:0B:9B:57:3C:53:F0:
                                C0:E4:38:78:70:25:08:17:2F:A3:AA:1D:07:13:D3:0C
                    Timestamp : Jul  5 05:16:03.196 2022 GMT
                    Extensions: none
                    Signature : ecdsa-with-SHA256
                                30:46:02:21:00:9F:06:1B:00:03:2D:AD:F9:B7:90:29:
                                B4:F7:18:4E:5F:06:C3:C6:7F:02:91:3B:58:B1:7F:69:
                                AD:34:F1:3B:AF:02:21:00:9E:7F:3A:A2:91:38:47:1F:
                                74:29:71:16:A1:8D:54:BA:E2:B7:03:77:34:0C:C8:4D:
                                90:B4:FB:49:63:3E:B9:12
    Signature Algorithm: sha256WithRSAEncryption
         63:21:07:23:47:06:eb:b3:7c:77:6c:df:bc:55:12:b9:f1:5e:
         6a:04:60:16:be:d0:0b:18:9c:94:0c:a8:82:08:25:0d:26:fb:
         dd:cb:fc:8c:27:d9:0c:fa:4a:b6:31:b6:67:f0:26:2c:0d:96:
         96:39:65:3f:d9:a1:ee:de:9c:10:4d:54:e1:c8:d6:a9:0e:77:
         db:00:e2:37:e3:3f:b4:9c:31:4f:ac:74:d3:22:12:53:36:d0:
         ef:18:07:2d:8e:d0:e6:91:b2:6c:4a:5e:39:53:14:58:4e:d1:
         50:04:c9:83:7e:0d:7b:15:96:87:11:d7:5d:4a:17:ac:aa:9f:
         84:e3:a8:24:9d:d6:17:77:26:8c:9f:7a:7b:18:da:39:2f:77:
         f7:2b:c7:23:b8:97:6f:c3:d1:72:4c:7e:fc:c6:0d:cc:73:38:
         19:81:fb:e7:c1:7a:e8:b9:1d:3a:05:dc:36:04:9b:f1:f0:e1:
         a6:47:a0:30:4f:55:90:6c:da:cf:9e:b2:76:12:11:a1:5c:b6:
         61:8d:15:a4:68:65:9a:57:2f:7a:6e:a3:1f:f5:b4:92:5a:3c:
         df:71:0a:cd:57:d4:d0:15:36:7e:ba:d5:03:25:27:45:b4:60:
         cd:2e:02:c1:0f:0a:e7:41:6f:58:69:20:9e:ad:47:52:1a:b5:
         e6:e5:8d:1d

Moreover, the two hosts serve exactly the same certificate; the diff prints no differences:

$ diff <(openssl s_client -connect tiebac.baidu.com:443 <<< 'Q' | openssl x509 -noout -text) <(openssl s_client -connect c.tieba.baidu.com:443 -prexit <<< 'Q' | openssl x509 -noout -text)
depth=2 OU = GlobalSign Root CA - R3, O = GlobalSign, CN = GlobalSign
verify error:num=20:unable to get local issuer certificate
depth=2 OU = GlobalSign Root CA - R3, O = GlobalSign, CN = GlobalSign
verify error:num=20:unable to get local issuer certificate
DONE
DONE
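
The same comparison can be sketched in Python by hashing the DER form of each leaf certificate. This is only an illustrative sketch: the helper names `der_sha256` and `same_leaf_cert` are made up for this example and are not part of any project discussed here.

```python
import hashlib
import ssl


def der_sha256(pem: str) -> str:
    """Return the SHA-256 fingerprint of a PEM certificate's DER encoding."""
    return hashlib.sha256(ssl.PEM_cert_to_DER_cert(pem)).hexdigest()


def same_leaf_cert(host_a: str, host_b: str, port: int = 443) -> bool:
    """Fetch the leaf certificate from both hosts and compare fingerprints.

    ssl.get_server_certificate() does not validate the certificate unless
    ca_certs is given, so the hostname mismatch on one host does not abort
    the fetch.
    """
    pem_a = ssl.get_server_certificate((host_a, port))
    pem_b = ssl.get_server_certificate((host_b, port))
    return der_sha256(pem_a) == der_sha256(pem_b)
```

Calling `same_leaf_cert("tiebac.baidu.com", "c.tieba.baidu.com")` needs network access, and the result depends on whatever certificates the servers present at that moment.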

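Put plainly, the Tieba admins cannot be bothered (or lack the permission, since this certificate is shared by so many Baidu services hosted on Baidu Cloud Acceleration) to add one more SAN entry, DNS:*.tieba.baidu.com, each time this certificate is issued or renewed by GlobalSign, a certain Western CA. It is a throwaway SSL certificate replaced about once a year (valid for roughly 13 months, Jul 5 05:16:02 2022 GMT to Aug 6 05:16:01 2023 GMT), used across a huge number of Baidu services and matching "a hundred thousand or even ninety thousand" wildcard SANs. So when the 12.x client had to switch its API to HTTPS to satisfy the MLPS (等保) requirements of Baidu's and the government's cybersecurity departments, the only workaround was to move to tiebac.baidu.com, which happens to match the SAN entry DNS:*.baidu.com, whereas c.tieba.baidu.com matches nothing in the list.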
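
Why `tiebac.baidu.com` passes and `c.tieba.baidu.com` fails can be sketched with a simplified matcher in which a leading `*` stands for exactly one DNS label. The function name `san_matches` and the three-entry `SANS` excerpt are illustrative assumptions; real TLS libraries apply further rules (partial-label wildcards, IDNA, IP addresses) that this sketch ignores.

```python
def san_matches(hostname: str, pattern: str) -> bool:
    """Return True if hostname matches a SAN dNSName pattern.

    Simplified rule: a leading "*" label matches exactly one label,
    and every other label must match exactly (case-insensitively).
    """
    host_labels = hostname.lower().rstrip(".").split(".")
    pat_labels = pattern.lower().rstrip(".").split(".")
    if len(host_labels) != len(pat_labels):
        return False
    if pat_labels[0] == "*":
        # The wildcard covers the left-most label only.
        return host_labels[1:] == pat_labels[1:]
    return host_labels == pat_labels


# A short excerpt of the SAN list from the certificate dump.
SANS = ["baidu.com", "*.baidu.com", "*.map.baidu.com"]

for host in ("tiebac.baidu.com", "c.tieba.baidu.com"):
    print(host, any(san_matches(host, p) for p in SANS))
```

Under this rule `*.baidu.com` covers only three-label names, so a four-label name such as `c.tieba.baidu.com` would need its own `*.tieba.baidu.com` entry.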
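Presumably, in the eyes of Mr. Yang Bowen @yangbowen (grassroots elite of 四叶 infosec, Shanghai aristocrat, honorary FSF/EFF member), this seriously violates the must-read HTTPS best practices for admins laid down by the CA/B Forum, which created the CT mechanism. After all, merely using the wildcard SAN feature, which I myself used before Mr. 仏皇irol (head of the exiled 四叶, intermediate expert in PLT theory, CS master's graduate) personally ran sudo apt purge nginx on March 16, 2023, already seriously violates IETF RFC 2818 and RFC 6125, to say nothing of the chaotic management inside Baidu behind all this and the resulting lack of security awareness (even if someone at Baidu notices, they have neither the motivation nor the authority to change a single line of configuration).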

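Even from the far side of Baidu Cloud Acceleration's CDN nodes I can feel how fierce the political tug-of-war between Baidu's internal departments over permissions must be, just as a self-described Baidu employee on V2EX revealed: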

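Sigh, a former Baidu employee here to share. Baidu has its own peculiar M and T management tracks: each department has two leaders, an M who only manages and a senior T who leads on the technical side. This was meant to separate responsibilities, but in the end a department of 5 people has 2 giving orders and 3 doing the work, with the two leaders fighting over credit and passing the buck to each other. Under pressure from targets, the M never cares about long-term gains, only the numbers in front of them. On top of that, Baidu rotates its Ms between departments: an M spends a year or two in one department and then moves on, so even fewer people care about long-term gains.
The end result is that whatever makes the current metrics look good gets done: if revenue falls short, add ads, split up membership features, then run a promotion. It is always the same three moves, and once they have been played it is off to the next place. For us ordinary developers this turns into endless pointless overtime and all kinds of requirements scheduled backwards from a fixed deadline. Working until 10 p.m. every day is nothing special, and when the feature finally ships, users tear it to pieces.
Also, Robin (肉饼) is focused on his AI. Departments like ours that have nothing to do with AI are all peripheral businesses: we work the most overtime, bring in the most money, and get the lowest performance ratings, and nobody cares at all how things go in the long run.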

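I reckon that even by the time Tieba shuts down there is no hope of the client API supporting HTTP/2 or HTTP/3, which are much faster than HTTP/1.1.
Yet tbm's history over the years shows that Tieba will not shut down that easily: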

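"A few years ago, chatting with n9, I said that building third-party tools has no future" is still more sustainable than continuing, across the various remnants of the exiled 四叶, the one-way feedback communication I have kept up with you all for the past several years, which would eventually drive both you and me mad.
At the time I also pointed out that, at my roadmap-less hobbyist development pace (2017 to the present), I would probably not finish the features I want tbm to have before Tieba shuts down. Fact check: as of May 2023, on the eve of Tieba's 20th anniversary, Tieba still has not shut down. Instead, every site on the 四叶 server vanished completely with a single sudo apt purge nginx from Mr. irol, which shows that 四叶重工 is less sustainable than Baidu Tieba.
To digital archaeologists such as those on r/datahoarder this is a classic paradox: the entity archiving a big internet company's service may not outlive that service or company, so years later one can only return to the zh-Hans internet information cocoon and ask the still-living big company for the bits one uploaded.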

V2EX users predicted this long ago: when Baidu truly heads for its death, it will go without a sound.

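A few years ago, after the Wei Zexi incident came out, the whole internet condemned Baidu and feelings ran high; everyone thought Baidu was finished, yet it was not. Only in recent years, once the major content platforms had truly carved up Baidu's traffic, did even the voices attacking Baidu become rare, and nobody cares about Baidu any more. Sadder than being condemned by the whole internet is being forgotten by it and sinking without a sound.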

It is just like the classic parable that a person dies three times, and only being forgotten (幻想入) is true death: https://www.zhihu.com/question/267352010


Installing it will have to wait until tomorrow at the earliest

Presumably you are back home by now from the 2023 Guangzhou forum moderators' conference: https://tieba.baidu.com/p/8426862419?pid=147679303211#147679303211

sudo sh -c 'add-apt-repository ppa:ondrej/php && apt update && apt install php8.2 php8.2-curl php8.2-cgi'

@yangbowen

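Presumably, in the eyes of Mr. Yang Bowen @yangbowen (grassroots elite of 四叶 infosec, Shanghai aristocrat, honorary FSF/EFF member), this seriously violates the must-read HTTPS best practices for admins laid down by the CA/B Forum, which created the CT mechanism. After all, merely using the wildcard SAN feature, which I myself used before Mr. 仏皇irol (head of the exiled 四叶, intermediate expert in PLT theory, CS master's graduate) personally ran sudo apt purge nginx on March 16, 2023, already seriously violates IETF RFC 2818 and RFC 6125, to say nothing of the chaotic management inside Baidu behind all this and the resulting lack of security awareness (even if someone at Baidu notices, they have neither the motivation nor the authority to change a single line of configuration).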

I actually do not see what is wrong with this practice.

n0099 added a commit to n0099/open-tbm that referenced this issue May 26, 2023
…hreadCrawler` and `ThreadLateCrawlerAndSaver` @ ClientRequester.cs

* replace the `HttpClient.BaseAddress` from `http://c.tieba.baidu.com` to `http://tiebac.baidu.com` in favor of lumina37/aiotieba#123 (comment) @ `EntryPoint.ConfigureServices()`
+ const field `LegacyEndPointUrl` to allow method `GetRequestsForPage()` and the one in derived class `ThreadArchiveCrawler` to use the original domain @ ThreadCrawler.cs
@ crawler

+ private field `_logger` and static field `ExtractMalformedExifDateTimeRegex`
+ method `ParseExifDateTimeOrNull()` to handle malformed EXIF date time string
@ MetadataConsumer.cs

* change the type of fields `Exif.(Create|Modify)Date` from `string` to `DateTime` @ ImageMetadata.cs
* now will log different message when the `ImageInReply.ExpectedByteSize==0` @ `ImageRequester.GetImageBytes()`
@ imagePipeline
@ c#
@lumina37
Owner

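Debian 11 does not ship add-apt-repository, so I cannot be bothered to install PHP. That said, residential IPs seem to hit 110001 more easily than cloud providers' IPs.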

@n0099
Author

n0099 commented May 27, 2023

residential IPs seem to hit 110001 more easily than cloud providers' IPs

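Are the admins of Baidu Cloud Acceleration and Tieba really giving preferential treatment to crawlers running on IPs in cloud-provider ASNs?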

@n0099
Author

n0099 commented Aug 26, 2024

From abroad it also always resolves to the HK nodes 103.235.46.140 and 103.235.46.139

These have been in use since at least 2016: https://v2ex.com/t/298794#r_3458895 @BANKA2017 https://blog.nest.moe/posts/visit-tieba-from-tencent-cloud-server-in-hong-kong Hackl0us/GeoIP2-CN#41
At some point recently they changed: c.tieba. now resolves to 45.113.194.42 and tiebac. to 45.113.194.190
