What exactly is q_type=2 in the request parameters of the thread and reply APIs?
#123
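I'll look into this again in two or three days when I have time. If it can reduce the number of failed requests, then q_type would be quite useful after all.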
At the time I just figured removing it wouldn't affect anything, and it would save a tiny bit of bandwidth, so I removed it.
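Conservatively, I won't be able to start experimenting until the day after tomorrow; work is pressing hard.

On my side, adding

Also, I plan to clean up all the fold-related code in the next release; it doesn't seem to serve any purpose now.

Adding it doesn't directly
c999767
what

Oh, I mixed it up with the one deployed back in 2018.
Since that page can't be seen on the web client, it's reasonable that the app client can't see it either; this and the later-deployed

Back to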
fid | count | isFold |
---|---|---|
97650 | 10059415 | 0 |
27497591 | 9689164 | 0 |
17019292 | 8061329 | 0 |
3255599 | 8014598 | 0 |
2265748 | 2253963 | 0 |
6087183 | 552604 | 0 |
27546680 | 526483 | 0 |
898666 | 412969 | 0 |
27278534 | 127199 | 0 |
228500 | 71653 | 0 |
19871743 | 59650 | 0 |
23546288 | 39789 | 0 |
78579 | 29892 | 0 |
4734432 | 19771 | 0 |
2265748 | 7467 | 1 |
2265748 | 4392 | 6 |
2265748 | 3179 | 2 |
2265748 | 1457 | 5 |
17019292 | 1399 | 6 |
898666 | 1223 | 2 |
898666 | 1013 | 6 |
25459979 | 951 | 0 |
898666 | 690 | 1 |
3255599 | 518 | 6 |
97650 | 501 | 6 |
27497591 | 226 | 6 |
23546288 | 223 | 1 |
898666 | 217 | 5 |
228500 | 194 | 1 |
78579 | 170 | 5 |
78579 | 165 | 1 |
27278534 | 121 | 6 |
6087183 | 74 | 5 |
4734432 | 67 | 1 |
19871743 | 60 | 1 |
78579 | 54 | 2 |
6087183 | 50 | 6 |
4734432 | 49 | 2 |
228500 | 45 | 5 |
19871743 | 44 | 2 |
228500 | 34 | 6 |
27546680 | 34 | 6 |
6087183 | 26 | 1 |
23546288 | 20 | 5 |
23546288 | 17 | 2 |
228500 | 13 | 2 |
23546288 | 12 | 6 |
19871743 | 8 | 6 |
19871743 | 8 | 5 |
4734432 | 5 | 6 |
6087183 | 4 | 2 |
17019292 | 4 | 5 |
78579 | 3 | 6 |
4734432 | 2 | 5 |
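It's so rare that I no longer even want to archive this isFold. After all, even with MySQL's NULL-bitmap storage optimization (1 bit per nullable column, so 8 NULLs fit in 1 byte), I'd still rather either split it out into a separate table (a one-to-zero-or-one relationship, aka optional vertical partitioning, so that not even the 1 bit per row indicating IS NULL is needed) or physically delete it.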
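A minimal sketch of the optional vertical partitioning idea, using Python's built-in sqlite3 as a stand-in for MySQL: the rare is_fold value lives in a separate one-to-zero-or-one side table keyed by pid, so replies that were never folded store nothing for it, and a LEFT JOIN with COALESCE reads a missing row back as 0. The table and column names (reply, reply_fold, pid, is_fold) and the sample rows are hypothetical, not tbm's actual schema.

```python
import sqlite3

# Hypothetical schema for illustration only; not tbm's actual tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE reply (
        pid INTEGER PRIMARY KEY,
        tid INTEGER NOT NULL
    );
    -- One-to-zero-or-one side table: a row exists only for folded replies,
    -- so unfolded replies spend no space (not even a NULL bit) on is_fold.
    CREATE TABLE reply_fold (
        pid INTEGER PRIMARY KEY REFERENCES reply (pid),
        is_fold INTEGER NOT NULL CHECK (is_fold <> 0)
    );
    """
)

# Made-up pids/tids: reply 1 is folded (is_fold=2), reply 2 is not.
conn.executemany("INSERT INTO reply (pid, tid) VALUES (?, ?)", [(1, 100), (2, 100)])
conn.execute("INSERT INTO reply_fold (pid, is_fold) VALUES (?, ?)", (1, 2))
conn.commit()


def fold_state(pid: int) -> int:
    """Return is_fold for a reply, reading a missing side-table row as 0."""
    row = conn.execute(
        """
        SELECT COALESCE(f.is_fold, 0)
        FROM reply AS r
        LEFT JOIN reply_fold AS f ON f.pid = r.pid
        WHERE r.pid = ?
        """,
        (pid,),
    ).fetchone()
    if row is None:
        raise KeyError(f"no reply with pid {pid}")
    return row[0]


print(fold_state(1), fold_state(2))
```

Physically deleting the column is the other option mentioned; the side table only pays off if the folded state is still worth keeping for the few rows that have it.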
month | count |
---|---|
2013-09 | 1 |
2015-10 | 2 |
2016-01 | 1 |
2016-03 | 10 |
2016-04 | 2 |
2016-05 | 2 |
2016-06 | 4 |
2016-07 | 5 |
2016-08 | 4 |
2016-09 | 4 |
2016-10 | 4 |
2016-11 | 6 |
2016-12 | 10 |
2017-01 | 2 |
2017-02 | 6 |
2017-03 | 7 |
2017-04 | 3 |
2017-05 | 8 |
2017-06 | 27 |
2017-07 | 20 |
2017-08 | 27 |
2017-09 | 27 |
2017-10 | 6 |
2017-11 | 19 |
2017-12 | 31 |
2018-01 | 135 |
2018-02 | 135 |
2018-03 | 163 |
2018-04 | 492 |
2018-05 | 712 |
2018-06 | 741 |
2018-07 | 830 |
2018-08 | 908 |
2018-09 | 593 |
2018-10 | 731 |
2018-11 | 882 |
2018-12 | 1180 |
2019-01 | 1927 |
2019-02 | 4716 |
2019-03 | 4968 |
2019-04 | 1214 |
2019-05 | 200 |
2019-06 | 191 |
2019-07 | 2 |
2019-08 | 5 |
2019-09 | 1 |
2019-11 | 3 |
2020-01 | 1 |
2020-02 | 2 |
2020-03 | 8 |
2020-04 | 4 |
2020-05 | 9 |
2020-06 | 23 |
2020-07 | 23 |
2020-08 | 38 |
2020-09 | 8 |
2020-10 | 150 |
2020-11 | 20 |
2020-12 | 18 |
2021-01 | 5 |
2021-02 | 19 |
2021-03 | 22 |
2021-04 | 19 |
2021-05 | 27 |
2021-06 | 39 |
2021-07 | 326 |
2021-08 | 473 |
2021-09 | 363 |
2021-10 | 384 |
2021-11 | 191 |
2021-12 | 99 |
2022-01 | 74 |
2022-02 | 47 |
2022-03 | 52 |
2022-04 | 58 |
2022-05 | 149 |
2022-06 | 31 |
2022-07 | 35 |
2022-08 | 33 |
2022-09 | 35 |
2022-10 | 29 |
2022-11 | 7 |
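So the policy of folding replies was mainly applied in two periods, January 2018 to June 2019 and October 2020 to November 2022, and appears to have been retired entirely after November 2022. A very small number of replies from 2013, 2015, 2016 and 2017 were also dug up and folded.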
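The monthly table can be thought of as a group-by over the post time of folded replies. The sketch below shows that bucketing for the three is_fold=2 replies quoted further down in this thread, and checks them against the two periods stated above. Bucketing by China Standard Time (UTC+8) and the helper names are assumptions; the query that actually produced the table is not shown in the thread.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: months are bucketed in China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8))

# The two periods described above, as inclusive "YYYY-MM" bounds.
FOLD_PERIODS = [("2018-01", "2019-06"), ("2020-10", "2022-11")]


def month_of(unix_time: int) -> str:
    return datetime.fromtimestamp(unix_time, CST).strftime("%Y-%m")


def folded_per_month(posts: list[dict]) -> Counter:
    """Count replies with a non-zero is_fold per month of their post time."""
    return Counter(month_of(p["time"]) for p in posts if p.get("is_fold", 0) != 0)


def in_fold_periods(month: str) -> bool:
    # "YYYY-MM" strings sort chronologically, so plain string comparison works.
    return any(start <= month <= end for start, end in FOLD_PERIODS)


# id, time and is_fold of the three is_fold=2 replies quoted later in this thread.
examples = [
    {"id": 125402098933, "time": 1556710174, "is_fold": 2},
    {"id": 124427670134, "time": 1551833529, "is_fold": 2},
    {"id": 123477137834, "time": 1546339376, "is_fold": 2},
]
per_month = folded_per_month(examples)
print(sorted(per_month.items()))
print(all(in_fold_periods(m) for m in per_month))
```

All three examples fall inside the first period, which is consistent with the table.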
tid | count |
---|---|
6062186860 | 2046 |
6088034801 | 143 |
4543639472 | 112 |
6086253737 | 86 |
4815622632 | 85 |
5914682306 | 80 |
5858698299 | 78 |
5520459893 | 76 |
5711344015 | 75 |
5189989245 | 72 |
5915436528 | 61 |
3656202350 | 58 |
5261166918 | 47 |
5351619411 | 46 |
5740013136 | 45 |
6043743451 | 41 |
5779968474 | 40 |
5868793321 | 39 |
5821704822 | 38 |
5669513911 | 38 |
5752693575 | 35 |
5403243984 | 35 |
6013990333 | 35 |
5935524343 | 34 |
5909514989 | 34 |
6008316815 | 33 |
2873064865 | 33 |
5787951911 | 31 |
5969997180 | 31 |
6039003901 | 31 |
5835224454 | 31 |
5850314440 | 30 |
6011085619 | 29 |
6094908103 | 28 |
5639543100 | 28 |
5601339291 | 27 |
5782690990 | 27 |
5963083986 | 26 |
5697375636 | 26 |
5699283119 | 25 |
5764243124 | 25 |
5688918493 | 25 |
7515778827 | 25 |
5214963020 | 24 |
5815143613 | 24 |
5666442583 | 24 |
5653565852 | 24 |
6899475217 | 24 |
5816852083 | 23 |
5915438375 | 23 |
5214988494 | 23 |
5527213844 | 23 |
6063219615 | 23 |
5780907376 | 23 |
5826722519 | 23 |
5972646681 | 22 |
6054076740 | 21 |
5995620951 | 21 |
5883187495 | 21 |
5709571952 | 21 |
5917411915 | 21 |
5675184663 | 21 |
5916063885 | 20 |
6038728815 | 20 |
5784064024 | 20 |
5475249986 | 20 |
5765980569 | 20 |
5785530233 | 20 |
5845684538 | 20 |
5854089729 | 20 |
... |
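Yet, as the saying goes, who since ancient times has escaped death? Unfortunately, the first few tids have long since gone 404 on Tieba; one can only say the Simplified Chinese internet has no memory.
Take these 3 pids with is_fold=2 as examples: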
https://tieba.baidu.com/p/6116591812?pid=125402098933#125402098933
$ php-cgi client_tester_2.php client_version=8.8.8.8 type=replies tid=6116591812 pn=10 | sed -n '3,$p' | jq '.post_list[] | select(.is_fold!=0)'
{
"id": 125402098933,
"floor": 325,
"is_ntitle": 0,
"need_log": 0,
"is_top_agree_post": 0,
"show_squared": 0,
"agree": {
"has_agree": 0,
"agree_type": 0,
"disagree_num": 0,
"diff_agree_num": 1,
"agree_num": 1
},
"is_fold": 2,
"time": 1556710174,
"sub_post_number": 1,
"bimg_url": "",
"ios_bimg_format": "",
"author_id": 166890595,
"title": "回复:重温b站刚买的凉宫,团长的性格真是恶劣啊",
"content": [
{
"text": "image_emoticon25",
"c": "滑稽",
"type": 2
},
{
"type": 0,
"text": "听歌就完事了 God knows我TM吹爆"
}
],
"is_post_visible": 0,
"fold_tip": "查看本楼内容",
"dynamic_url": ""
}
https://tieba.baidu.com/p/4543639472?pid=124427670134#124427670134
$ php-cgi client_tester_2.php client_version=12.46.1.1 type=replies tid=4543639472 pn=198 | sed -n '3,$p' | jq '.post_list[] | select(.is_fold!=0)'
{
"is_ntitle": 0,
"ios_bimg_format": "",
"agree": {
"has_agree": 0,
"agree_type": 0,
"disagree_num": 0,
"diff_agree_num": 1,
"agree_num": 1
},
"is_top_agree_post": 0,
"title": "回复:【吧务贴】大b吧新举报申诉楼",
"content": [
{
"link": "http://tieba.baidu.com/p/6055681352?share=9105&fr=share",
"type": 1,
"text": "http://tieba.baidu.com/p/6055681352?share=9105&fr=share"
},
{
"type": 0,
"text": " 22楼 理由 :人身攻击"
}
],
"sub_post_number": 1,
"dynamic_url": "",
"floor": 10892,
"show_squared": 0,
"need_log": 0,
"time": 1551833529,
"bimg_url": "",
"author_id": 866443627,
"is_post_visible": 0,
"is_fold": 2,
"fold_tip": "查看本楼内容",
"id": 124427670134
}
https://tieba.baidu.com/p/4543639472?pid=123477137834#123477137834
$ php-cgi client_tester_2.php client_version=12.46.1.1 type=replies tid=4543639472 pn=194 | sed -n '3,$p' | jq '.post_list[] | select(.is_fold!=0)'
{
"is_ntitle": 0,
"sub_post_number": 0,
"fold_tip": "查看本楼内容",
"show_squared": 0,
"id": 123477137834,
"floor": 10624,
"content": [
{
"type": 1,
"text": "https://tieba.baidu.com/p/5994337462?pn=1#123476598598l",
"link": "https://tieba.baidu.com/p/5994337462?pn=1#123476598598l"
},
{
"text": " 强行引战2d3d谁上限高 而且说什么在哪贴哪贴看见什么 又强行来一贴弄二手屎 外带楼主是个3d学生 看不起2d画风 我说2d神韵被其嘲讽 完全没有讨论的态度 就是捧1踩1 ",
"type": 0
},
{
"type": 4,
"text": "@被羊追杀",
"uid": 412694632
},
{
"text": " ",
"type": 0
},
{
"type": 4,
"text": "@二十分好",
"uid": 1059382505
}
],
"author_id": 734843822,
"is_post_visible": 0,
"is_fold": 2,
"dynamic_url": "",
"title": "回复:【吧务贴】大b吧新举报申诉楼",
"bimg_url": "",
"ios_bimg_format": "",
"is_top_agree_post": 0,
"time": 1546339376,
"agree": {
"has_agree": 0,
"agree_type": 0,
"disagree_num": 0,
"diff_agree_num": 0,
"agree_num": 0
},
"need_log": 0
}
This again confirms #123 (comment):
- However, at some point the Tieba server changed it again to ignore _client_version and always include folded replies.
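Since folded replies are now always returned, a crawler that wants to tell them apart has to filter on the client side. This sketch reproduces the `select(.is_fold!=0)` part of the jq filter used above in Python; it assumes the response body has already been fetched as a JSON string with a post_list array like the outputs above, and `folded_replies` is a hypothetical helper name.

```python
import json


def folded_replies(response_body: str) -> list[dict]:
    """Return the entries of post_list whose is_fold differs from 0."""
    data = json.loads(response_body)
    # Like jq's `.is_fold != 0`, a missing field (None here, null in jq) is also
    # "not equal to 0", so such posts are kept as jq would keep them.
    return [post for post in data.get("post_list", []) if post.get("is_fold") != 0]


# Trimmed-down response: the first entry reuses fields of the reply quoted above,
# the second is a made-up unfolded reply.
sample = json.dumps(
    {
        "post_list": [
            {"id": 124427670134, "floor": 10892, "is_fold": 2},
            {"id": 1, "floor": 10893, "is_fold": 0},
        ]
    }
)

for post in folded_replies(sample):
    print(post["id"], post["floor"], post["is_fold"])
```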
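Then it really is useless, and removing it was the right call.

So, back to the title of this issue: what exactly is

I'm not sure about that; I don't get 110001 even with q_type included.

How about I cut a branch for you to test?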
Try it on your end:
wget https://raw.githubusercontent.com/n0099/TiebaMonitor/v1/client_tester.php
patch client_tester.php << 'EOF'
41c41,43
< 'pn' => $_GET['pn']
---
> 'pn' => $_GET['pn'],
> 'rn' => 30,
> 'q_type' => 2
EOF
php-cgi client_tester.php client_version=12.46.1.1 type=replies tid=4543639472 pn=194 | sed -n '3,$p'
(i.e.
$ nslookup c.tieba.baidu.com
Server: 168.63.129.16
Address: 168.63.129.16#53
Non-authoritative answer:
c.tieba.baidu.com canonical name = c.n.shifen.com.
Name: c.n.shifen.com
Address: 103.235.46.139
Name: c.n.shifen.com
Address: 103.235.46.140

I recommend using
PS C:\WINDOWS\system32> nslookup c.tieba.baidu.com
Server: UnKnown
Address: fe80::d6da:21ff:fe05:576f
Non-authoritative answer:
Name: c.n.shifen.com
Addresses: 39.156.66.138
223.109.81.34
223.109.81.35
112.34.111.194
183.232.231.118
Aliases: c.tieba.baidu.com
PS C:\WINDOWS\system32> nslookup tiebac.baidu.com
Server: UnKnown
Address: fe80::d6da:21ff:fe05:576f
Non-authoritative answer:
Name: tiebacchunwan.n.shifen.com
Address: 183.232.231.118
Aliases: tiebac.baidu.com

Also, this machine doesn't have PHP; installing it will have to wait until tomorrow at least.
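I had previously assumed this was yet another Tieba reverse-proxy domain created by the Tieba admins misconfiguring nginx and DNS again, like the classic
https://ping.chinaz.com/c.tieba.baidu.com
Fact check: as of May 2023, the domain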
$ curl --http2 -o /dev/null -v 'http://tiebac.baidu.com/c/f/frs/page?cmd=301001'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 103.235.46.140:80...
* Connected to tiebac.baidu.com (103.235.46.140) port 80 (#0)
> GET /c/f/frs/page?cmd=301001 HTTP/1.1
> Host: tiebac.baidu.com
> User-Agent: curl/7.81.0
> Accept: */*
> Connection: Upgrade, HTTP2-Settings
> Upgrade: h2c
> HTTP2-Settings: AAMAAABkAAQCAAAAAAIAAAAA
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Connection: keep-alive
< Content-Type: application/x-javascript;charset=utf-8
< Date: Fri, 26 May 2023 01:36:31 GMT
< P3p: CP=" OTI DSP COR IVA OUR IND COM "
< Server: Apache
< Set-Cookie: BAIDUID=122B07DB8917816357E8EB3BEBF732C3:FG=1; expires=Sat, 25-May-24 01:36:31 GMT; max-age=31536000; path=/; domain=.baidu.com; version=1
< Tracecode: 21919588212336534282052609
< Vary: Accept-Encoding
< X-Xss-Protection: 1; mode=block
< Transfer-Encoding: chunked
<
{ [126 bytes data]
100 115 0 115 0 0 282 0 --:--:-- --:--:-- --:--:-- 282
* Connection #0 to host tiebac.baidu.com left intact

HTTPS is supported:
$ curl --http2 -o /dev/null -v 'https://tiebac.baidu.com/c/f/frs/page?cmd=301001'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 103.235.46.140:443...
* Connected to tiebac.baidu.com (103.235.46.140) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* CAfile: /etc/ssl/certs/ca-certificates.crt
* CApath: /etc/ssl/certs
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.2 (IN), TLS header, Certificate Status (22):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [102 bytes data]
* TLSv1.2 (IN), TLS header, Certificate Status (22):
{ [5 bytes data]
* TLSv1.2 (IN), TLS handshake, Certificate (11):
{ [4810 bytes data]
* TLSv1.2 (IN), TLS header, Certificate Status (22):
{ [5 bytes data]
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
{ [300 bytes data]
* TLSv1.2 (IN), TLS header, Certificate Status (22):
{ [5 bytes data]
* TLSv1.2 (IN), TLS handshake, Server finished (14):
{ [4 bytes data]
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
} [5 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
} [37 bytes data]
* TLSv1.2 (OUT), TLS header, Finished (20):
} [5 bytes data]
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
} [5 bytes data]
* TLSv1.2 (OUT), TLS handshake, Finished (20):
} [16 bytes data]
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* TLSv1.2 (IN), TLS header, Finished (20):
{ [5 bytes data]
* TLSv1.2 (IN), TLS header, Certificate Status (22):
{ [5 bytes data]
* TLSv1.2 (IN), TLS handshake, Finished (20):
{ [16 bytes data]
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: C=CN; ST=beijing; L=beijing; OU=service operation department; O=Beijing Baidu Netcom Science Technology Co., Ltd; CN=baidu.com
* start date: Jul 5 05:16:02 2022 GMT
* expire date: Aug 6 05:16:01 2023 GMT
* subjectAltName: host "tiebac.baidu.com" matched cert's "*.baidu.com"
* issuer: C=BE; O=GlobalSign nv-sa; CN=GlobalSign RSA OV SSL CA 2018
* SSL certificate verify ok.
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
} [5 bytes data]
> GET /c/f/frs/page?cmd=301001 HTTP/1.1
> Host: tiebac.baidu.com
> User-Agent: curl/7.81.0
> Accept: */*
>
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Connection: keep-alive
< Content-Type: application/x-javascript;charset=utf-8
< Date: Fri, 26 May 2023 01:36:41 GMT
< P3p: CP=" OTI DSP COR IVA OUR IND COM "
< Server: Apache
< Set-Cookie: BAIDUID=1584F17A1889A6DD496DCB35E490A178:FG=1; expires=Sat, 25-May-24 01:36:41 GMT; max-age=31536000; path=/; domain=.baidu.com; version=1
< Tracecode: 22011617440335634186052609
< Vary: Accept-Encoding
< X-Xss-Protection: 1; mode=block
< Transfer-Encoding: chunked
<
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
100 115 0 115 0 0 384 0 --:--:-- --:--:-- --:--:-- 383
* Connection #0 to host tiebac.baidu.com left intact

The detailed x509 certificate information shows that the SAN includes
$ openssl s_client -connect tiebac.baidu.com:443 -prexit | openssl x509 -noout -text
depth=2 OU = GlobalSign Root CA - R3, O = GlobalSign, CN = GlobalSign
verify error:num=20:unable to get local issuer certificate
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
44:17:ce:86:ef:82:ec:69:21:cc:6f:68
Signature Algorithm: sha256WithRSAEncryption
Issuer: C = BE, O = GlobalSign nv-sa, CN = GlobalSign RSA OV SSL CA 2018
Validity
Not Before: Jul 5 05:16:02 2022 GMT
Not After : Aug 6 05:16:01 2023 GMT
Subject: C = CN, ST = beijing, L = beijing, OU = service operation department, O = "Beijing Baidu Netcom Science Technology Co., Ltd", CN = baidu.com
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
RSA Public-Key: (2048 bit)
Modulus:
00:aa:2f:cc:41:8d:25:ae:83:e9:f4:27:c4:00:b3:
39:6f:0e:98:2a:55:7d:07:e5:80:49:82:fa:d3:d3:
85:98:b5:df:7b:6f:bb:02:dd:ed:78:e4:0c:07:2b:
9e:1e:86:4b:f6:6a:86:58:d7:57:6f:21:59:11:d8:
6f:96:6e:d2:de:36:28:f6:b4:e3:ce:95:32:29:00:
c1:65:8e:69:b0:00:fe:52:37:f4:88:3f:8b:6d:0f:
bb:f0:ec:c5:c0:31:ef:ad:b5:0c:06:66:ad:be:dc:
43:13:c4:66:b0:5d:cf:56:53:e2:d1:96:82:1c:06:
bb:9b:5f:ed:60:8d:d2:ed:f3:d2:50:ee:bb:cd:b2:
36:97:c8:ce:7b:d2:4b:b7:5c:b4:88:ca:37:6e:8b:
ce:f9:96:fd:b4:f5:47:b5:20:77:bb:fc:a8:9d:81:
b2:6c:f8:c7:09:6a:dd:22:6e:83:3f:a7:53:df:f1:
da:2f:29:6b:22:c3:e9:1d:65:e8:c5:a0:ba:13:4e:
16:3f:03:93:f0:a5:59:8a:1a:80:e8:27:7d:49:23:
df:d1:f9:4b:97:b7:01:c4:19:f5:f1:c5:ff:91:33:
d0:a1:74:c6:ee:d4:cf:f6:38:0c:ed:bd:5e:aa:44:
fb:88:f7:7b:99:70:76:34:55:7e:55:d2:0f:9e:bf:
94:93
Exponent: 65537 (0x10001)
X509v3 extensions:
X509v3 Key Usage: critical
Digital Signature, Key Encipherment
Authority Information Access:
CA Issuers - URI:http://secure.globalsign.com/cacert/gsrsaovsslca2018.crt
OCSP - URI:http://ocsp.globalsign.com/gsrsaovsslca2018
X509v3 Certificate Policies:
Policy: 1.3.6.1.4.1.4146.1.20
CPS: https://www.globalsign.com/repository/
Policy: 2.23.140.1.2.2
X509v3 Basic Constraints:
CA:FALSE
X509v3 CRL Distribution Points:
Full Name:
URI:http://crl.globalsign.com/gsrsaovsslca2018.crl
X509v3 Subject Alternative Name:
DNS:baidu.com, DNS:baifubao.com, DNS:www.baidu.cn, DNS:www.baidu.com.cn, DNS:mct.y.nuomi.com, DNS:apollo.auto, DNS:dwz.cn, DNS:*.baidu.com, DNS:*.baifubao.com, DNS:*.baidustatic.com, DNS:*.bdstatic.com, DNS:*.bdimg.com, DNS:*.hao123.com, DNS:*.nuomi.com, DNS:*.chuanke.com, DNS:*.trustgo.com, DNS:*.bce.baidu.com, DNS:*.eyun.baidu.com, DNS:*.map.baidu.com, DNS:*.mbd.baidu.com, DNS:*.fanyi.baidu.com, DNS:*.baidubce.com, DNS:*.mipcdn.com, DNS:*.news.baidu.com, DNS:*.baidupcs.com, DNS:*.aipage.com, DNS:*.aipage.cn, DNS:*.bcehost.com, DNS:*.safe.baidu.com, DNS:*.im.baidu.com, DNS:*.baiducontent.com, DNS:*.dlnel.com, DNS:*.dlnel.org, DNS:*.dueros.baidu.com, DNS:*.su.baidu.com, DNS:*.91.com, DNS:*.hao123.baidu.com, DNS:*.apollo.auto, DNS:*.xueshu.baidu.com, DNS:*.bj.baidubce.com, DNS:*.gz.baidubce.com, DNS:*.smartapps.cn, DNS:*.bdtjrcv.com, DNS:*.hao222.com, DNS:*.haokan.com, DNS:*.pae.baidu.com, DNS:*.vd.bdstatic.com, DNS:*.cloud.baidu.com, DNS:click.hm.baidu.com, DNS:log.hm.baidu.com, DNS:cm.pos.baidu.com, DNS:wn.pos.baidu.com, DNS:update.pan.baidu.com
X509v3 Extended Key Usage:
TLS Web Server Authentication, TLS Web Client Authentication
X509v3 Authority Key Identifier:
keyid:F8:EF:7F:F2:CD:78:67:A8:DE:6F:8F:24:8D:88:F1:87:03:02:B3:EB
X509v3 Subject Key Identifier:
3B:70:2D:3D:E8:19:05:00:47:12:02:EF:81:18:D3:41:08:E5:16:52
CT Precertificate SCTs:
Signed Certificate Timestamp:
Version : v1 (0x0)
Log ID : E8:3E:D0:DA:3E:F5:06:35:32:E7:57:28:BC:89:6B:C9:
03:D3:CB:D1:11:6B:EC:EB:69:E1:77:7D:6D:06:BD:6E
Timestamp : Jul 5 05:16:03.185 2022 GMT
Extensions: none
Signature : ecdsa-with-SHA256
30:46:02:21:00:80:26:42:7D:3D:91:C5:D7:88:E3:A7:
9A:50:98:EA:95:1E:05:9C:28:94:1E:FC:2C:C2:61:8B:
54:D6:34:6E:B2:02:21:00:82:9C:76:69:D1:35:26:45:
19:5D:30:34:6F:64:12:77:99:AB:73:6D:72:F9:0A:16:
28:8D:73:83:95:F4:75:DE
Signed Certificate Timestamp:
Version : v1 (0x0)
Log ID : 6F:53:76:AC:31:F0:31:19:D8:99:00:A4:51:15:FF:77:
15:1C:11:D9:02:C1:00:29:06:8D:B2:08:9A:37:D9:13
Timestamp : Jul 5 05:16:03.150 2022 GMT
Extensions: none
Signature : ecdsa-with-SHA256
30:46:02:21:00:E6:15:95:37:0A:4C:E5:38:0D:9E:29:
90:1C:BD:0F:0A:35:BF:B0:C5:42:B8:30:DC:0A:37:08:
9F:C4:3E:CA:FA:02:21:00:EE:51:EC:28:04:AF:33:88:
14:FE:F7:EA:16:07:83:D9:35:C9:FC:E3:99:4C:E8:29:
B2:C0:0E:C3:91:B3:F3:81
Signed Certificate Timestamp:
Version : v1 (0x0)
Log ID : 55:81:D4:C2:16:90:36:01:4A:EA:0B:9B:57:3C:53:F0:
C0:E4:38:78:70:25:08:17:2F:A3:AA:1D:07:13:D3:0C
Timestamp : Jul 5 05:16:03.196 2022 GMT
Extensions: none
Signature : ecdsa-with-SHA256
30:46:02:21:00:9F:06:1B:00:03:2D:AD:F9:B7:90:29:
B4:F7:18:4E:5F:06:C3:C6:7F:02:91:3B:58:B1:7F:69:
AD:34:F1:3B:AF:02:21:00:9E:7F:3A:A2:91:38:47:1F:
74:29:71:16:A1:8D:54:BA:E2:B7:03:77:34:0C:C8:4D:
90:B4:FB:49:63:3E:B9:12
Signature Algorithm: sha256WithRSAEncryption
63:21:07:23:47:06:eb:b3:7c:77:6c:df:bc:55:12:b9:f1:5e:
6a:04:60:16:be:d0:0b:18:9c:94:0c:a8:82:08:25:0d:26:fb:
dd:cb:fc:8c:27:d9:0c:fa:4a:b6:31:b6:67:f0:26:2c:0d:96:
96:39:65:3f:d9:a1:ee:de:9c:10:4d:54:e1:c8:d6:a9:0e:77:
db:00:e2:37:e3:3f:b4:9c:31:4f:ac:74:d3:22:12:53:36:d0:
ef:18:07:2d:8e:d0:e6:91:b2:6c:4a:5e:39:53:14:58:4e:d1:
50:04:c9:83:7e:0d:7b:15:96:87:11:d7:5d:4a:17:ac:aa:9f:
84:e3:a8:24:9d:d6:17:77:26:8c:9f:7a:7b:18:da:39:2f:77:
f7:2b:c7:23:b8:97:6f:c3:d1:72:4c:7e:fc:c6:0d:cc:73:38:
19:81:fb:e7:c1:7a:e8:b9:1d:3a:05:dc:36:04:9b:f1:f0:e1:
a6:47:a0:30:4f:55:90:6c:da:cf:9e:b2:76:12:11:a1:5c:b6:
61:8d:15:a4:68:65:9a:57:2f:7a:6e:a3:1f:f5:b4:92:5a:3c:
df:71:0a:cd:57:d4:d0:15:36:7e:ba:d5:03:25:27:45:b4:60:
cd:2e:02:c1:0f:0a:e7:41:6f:58:69:20:9e:ad:47:52:1a:b5:
e6:e5:8d:1d
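Whether a host validates against this certificate comes down to wildcard matching: under RFC 6125, a `*` in a DNS SAN stands for exactly one left-most label, so `*.baidu.com` covers `tiebac.baidu.com` but not `c.tieba.baidu.com`, which would need an entry such as `*.tieba.baidu.com` that the SAN list above lacks. The `matches_san` helper below is a simplified sketch of that rule, not curl's or OpenSSL's actual implementation (it ignores partial-label wildcards, IDNs and IP SANs).

```python
def matches_san(hostname: str, san: str) -> bool:
    """Simplified RFC 6125-style check of a single DNS SAN entry against a hostname."""
    host_labels = hostname.lower().rstrip(".").split(".")
    san_labels = san.lower().rstrip(".").split(".")
    # The wildcard stands for exactly one label, so label counts must agree;
    # this is also why "*.baidu.com" does not cover the bare "baidu.com".
    if len(host_labels) != len(san_labels):
        return False
    if san_labels[0] == "*":
        # Only a whole left-most wildcard label is handled; the rest must match exactly.
        return host_labels[0] != "" and host_labels[1:] == san_labels[1:]
    return host_labels == san_labels


# A few DNS entries copied from the Subject Alternative Name list above.
SANS = ["baidu.com", "*.baidu.com", "*.bdstatic.com", "*.map.baidu.com"]

for host in ("tiebac.baidu.com", "c.tieba.baidu.com"):
    matched = [san for san in SANS if matches_san(host, san)]
    print(host, "->", matched or "no matching SAN")
```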
$ curl --http2 -o /dev/null -v 'http://c.tieba.baidu.com/c/f/frs/page?cmd=301001'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 103.235.46.140:80...
* Connected to c.tieba.baidu.com (103.235.46.140) port 80 (#0)
> GET /c/f/frs/page?cmd=301001 HTTP/1.1
> Host: c.tieba.baidu.com
> User-Agent: curl/7.81.0
> Accept: */*
> Connection: Upgrade, HTTP2-Settings
> Upgrade: h2c
> HTTP2-Settings: AAMAAABkAAQCAAAAAAIAAAAA
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Connection: keep-alive
< Content-Type: application/x-javascript;charset=utf-8
< Date: Fri, 26 May 2023 01:37:20 GMT
< P3p: CP=" OTI DSP COR IVA OUR IND COM "
< Server: Apache
< Set-Cookie: BAIDUID=D83AFCB77D0614ADCB36703E6AF4BA91:FG=1; expires=Sat, 25-May-24 01:37:20 GMT; max-age=31536000; path=/; domain=.baidu.com; version=1
< Tracecode: 22409589002778115338052609
< Vary: Accept-Encoding
< X-Xss-Protection: 1; mode=block
< Transfer-Encoding: chunked
<
{ [126 bytes data]
100 115 0 115 0 0 760 0 --:--:-- --:--:-- --:--:-- 761
* Connection #0 to host c.tieba.baidu.com left intact

HTTPS is not supported (of course, you can
$ curl --http2 -o /dev/null -v 'https://c.tieba.baidu.com/c/f/frs/page?cmd=301001'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 103.235.46.140:443...
* Connected to c.tieba.baidu.com (103.235.46.140) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* CAfile: /etc/ssl/certs/ca-certificates.crt
* CApath: /etc/ssl/certs
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.2 (IN), TLS header, Certificate Status (22):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [102 bytes data]
* TLSv1.2 (IN), TLS header, Certificate Status (22):
{ [5 bytes data]
* TLSv1.2 (IN), TLS handshake, Certificate (11):
{ [4810 bytes data]
* TLSv1.2 (IN), TLS header, Certificate Status (22):
{ [5 bytes data]
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
{ [300 bytes data]
* TLSv1.2 (IN), TLS header, Certificate Status (22):
{ [5 bytes data]
* TLSv1.2 (IN), TLS handshake, Server finished (14):
{ [4 bytes data]
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
} [5 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
} [37 bytes data]
* TLSv1.2 (OUT), TLS header, Finished (20):
} [5 bytes data]
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
} [5 bytes data]
* TLSv1.2 (OUT), TLS handshake, Finished (20):
} [16 bytes data]
* TLSv1.2 (IN), TLS header, Finished (20):
{ [5 bytes data]
* TLSv1.2 (IN), TLS header, Certificate Status (22):
{ [5 bytes data]
* TLSv1.2 (IN), TLS handshake, Finished (20):
{ [16 bytes data]
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: C=CN; ST=beijing; L=beijing; OU=service operation department; O=Beijing Baidu Netcom Science Technology Co., Ltd; CN=baidu.com
* start date: Jul 5 05:16:02 2022 GMT
* expire date: Aug 6 05:16:01 2023 GMT
* subjectAltName does not match c.tieba.baidu.com
* SSL: no alternative certificate subject name matches target host name 'c.tieba.baidu.com'
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
* Closing connection 0
* TLSv1.2 (OUT), TLS header, Unknown (21):
} [5 bytes data]
* TLSv1.2 (OUT), TLS alert, close notify (256):
} [2 bytes data]
curl: (60) SSL: no alternative certificate subject name matches target host name 'c.tieba.baidu.com'
More details here: https://curl.se/docs/sslcerts.html
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

The detailed x509 certificate information shows that the SAN contains no
$ openssl s_client -connect c.tieba.baidu.com:443 -prexit | openssl x509 -noout -text
depth=2 OU = GlobalSign Root CA - R3, O = GlobalSign, CN = GlobalSign
verify error:num=20:unable to get local issuer certificate
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
44:17:ce:86:ef:82:ec:69:21:cc:6f:68
Signature Algorithm: sha256WithRSAEncryption
Issuer: C = BE, O = GlobalSign nv-sa, CN = GlobalSign RSA OV SSL CA 2018
Validity
Not Before: Jul 5 05:16:02 2022 GMT
Not After : Aug 6 05:16:01 2023 GMT
Subject: C = CN, ST = beijing, L = beijing, OU = service operation department, O = "Beijing Baidu Netcom Science Technology Co., Ltd", CN = baidu.com
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
RSA Public-Key: (2048 bit)
Modulus:
00:aa:2f:cc:41:8d:25:ae:83:e9:f4:27:c4:00:b3:
39:6f:0e:98:2a:55:7d:07:e5:80:49:82:fa:d3:d3:
85:98:b5:df:7b:6f:bb:02:dd:ed:78:e4:0c:07:2b:
9e:1e:86:4b:f6:6a:86:58:d7:57:6f:21:59:11:d8:
6f:96:6e:d2:de:36:28:f6:b4:e3:ce:95:32:29:00:
c1:65:8e:69:b0:00:fe:52:37:f4:88:3f:8b:6d:0f:
bb:f0:ec:c5:c0:31:ef:ad:b5:0c:06:66:ad:be:dc:
43:13:c4:66:b0:5d:cf:56:53:e2:d1:96:82:1c:06:
bb:9b:5f:ed:60:8d:d2:ed:f3:d2:50:ee:bb:cd:b2:
36:97:c8:ce:7b:d2:4b:b7:5c:b4:88:ca:37:6e:8b:
ce:f9:96:fd:b4:f5:47:b5:20:77:bb:fc:a8:9d:81:
b2:6c:f8:c7:09:6a:dd:22:6e:83:3f:a7:53:df:f1:
da:2f:29:6b:22:c3:e9:1d:65:e8:c5:a0:ba:13:4e:
16:3f:03:93:f0:a5:59:8a:1a:80:e8:27:7d:49:23:
df:d1:f9:4b:97:b7:01:c4:19:f5:f1:c5:ff:91:33:
d0:a1:74:c6:ee:d4:cf:f6:38:0c:ed:bd:5e:aa:44:
fb:88:f7:7b:99:70:76:34:55:7e:55:d2:0f:9e:bf:
94:93
Exponent: 65537 (0x10001)
X509v3 extensions:
X509v3 Key Usage: critical
Digital Signature, Key Encipherment
Authority Information Access:
CA Issuers - URI:http://secure.globalsign.com/cacert/gsrsaovsslca2018.crt
OCSP - URI:http://ocsp.globalsign.com/gsrsaovsslca2018
X509v3 Certificate Policies:
Policy: 1.3.6.1.4.1.4146.1.20
CPS: https://www.globalsign.com/repository/
Policy: 2.23.140.1.2.2
X509v3 Basic Constraints:
CA:FALSE
X509v3 CRL Distribution Points:
Full Name:
URI:http://crl.globalsign.com/gsrsaovsslca2018.crl
X509v3 Subject Alternative Name:
DNS:baidu.com, DNS:baifubao.com, DNS:www.baidu.cn, DNS:www.baidu.com.cn, DNS:mct.y.nuomi.com, DNS:apollo.auto, DNS:dwz.cn, DNS:*.baidu.com, DNS:*.baifubao.com, DNS:*.baidustatic.com, DNS:*.bdstatic.com, DNS:*.bdimg.com, DNS:*.hao123.com, DNS:*.nuomi.com, DNS:*.chuanke.com, DNS:*.trustgo.com, DNS:*.bce.baidu.com, DNS:*.eyun.baidu.com, DNS:*.map.baidu.com, DNS:*.mbd.baidu.com, DNS:*.fanyi.baidu.com, DNS:*.baidubce.com, DNS:*.mipcdn.com, DNS:*.news.baidu.com, DNS:*.baidupcs.com, DNS:*.aipage.com, DNS:*.aipage.cn, DNS:*.bcehost.com, DNS:*.safe.baidu.com, DNS:*.im.baidu.com, DNS:*.baiducontent.com, DNS:*.dlnel.com, DNS:*.dlnel.org, DNS:*.dueros.baidu.com, DNS:*.su.baidu.com, DNS:*.91.com, DNS:*.hao123.baidu.com, DNS:*.apollo.auto, DNS:*.xueshu.baidu.com, DNS:*.bj.baidubce.com, DNS:*.gz.baidubce.com, DNS:*.smartapps.cn, DNS:*.bdtjrcv.com, DNS:*.hao222.com, DNS:*.haokan.com, DNS:*.pae.baidu.com, DNS:*.vd.bdstatic.com, DNS:*.cloud.baidu.com, DNS:click.hm.baidu.com, DNS:log.hm.baidu.com, DNS:cm.pos.baidu.com, DNS:wn.pos.baidu.com, DNS:update.pan.baidu.com
X509v3 Extended Key Usage:
TLS Web Server Authentication, TLS Web Client Authentication
X509v3 Authority Key Identifier:
keyid:F8:EF:7F:F2:CD:78:67:A8:DE:6F:8F:24:8D:88:F1:87:03:02:B3:EB
X509v3 Subject Key Identifier:
3B:70:2D:3D:E8:19:05:00:47:12:02:EF:81:18:D3:41:08:E5:16:52
CT Precertificate SCTs:
Signed Certificate Timestamp:
Version : v1 (0x0)
Log ID : E8:3E:D0:DA:3E:F5:06:35:32:E7:57:28:BC:89:6B:C9:
03:D3:CB:D1:11:6B:EC:EB:69:E1:77:7D:6D:06:BD:6E
Timestamp : Jul 5 05:16:03.185 2022 GMT
Extensions: none
Signature : ecdsa-with-SHA256
30:46:02:21:00:80:26:42:7D:3D:91:C5:D7:88:E3:A7:
9A:50:98:EA:95:1E:05:9C:28:94:1E:FC:2C:C2:61:8B:
54:D6:34:6E:B2:02:21:00:82:9C:76:69:D1:35:26:45:
19:5D:30:34:6F:64:12:77:99:AB:73:6D:72:F9:0A:16:
28:8D:73:83:95:F4:75:DE
Signed Certificate Timestamp:
Version : v1 (0x0)
Log ID : 6F:53:76:AC:31:F0:31:19:D8:99:00:A4:51:15:FF:77:
15:1C:11:D9:02:C1:00:29:06:8D:B2:08:9A:37:D9:13
Timestamp : Jul 5 05:16:03.150 2022 GMT
Extensions: none
Signature : ecdsa-with-SHA256
30:46:02:21:00:E6:15:95:37:0A:4C:E5:38:0D:9E:29:
90:1C:BD:0F:0A:35:BF:B0:C5:42:B8:30:DC:0A:37:08:
9F:C4:3E:CA:FA:02:21:00:EE:51:EC:28:04:AF:33:88:
14:FE:F7:EA:16:07:83:D9:35:C9:FC:E3:99:4C:E8:29:
B2:C0:0E:C3:91:B3:F3:81
Signed Certificate Timestamp:
Version : v1 (0x0)
Log ID : 55:81:D4:C2:16:90:36:01:4A:EA:0B:9B:57:3C:53:F0:
C0:E4:38:78:70:25:08:17:2F:A3:AA:1D:07:13:D3:0C
Timestamp : Jul 5 05:16:03.196 2022 GMT
Extensions: none
Signature : ecdsa-with-SHA256
30:46:02:21:00:9F:06:1B:00:03:2D:AD:F9:B7:90:29:
B4:F7:18:4E:5F:06:C3:C6:7F:02:91:3B:58:B1:7F:69:
AD:34:F1:3B:AF:02:21:00:9E:7F:3A:A2:91:38:47:1F:
74:29:71:16:A1:8D:54:BA:E2:B7:03:77:34:0C:C8:4D:
90:B4:FB:49:63:3E:B9:12
Signature Algorithm: sha256WithRSAEncryption
63:21:07:23:47:06:eb:b3:7c:77:6c:df:bc:55:12:b9:f1:5e:
6a:04:60:16:be:d0:0b:18:9c:94:0c:a8:82:08:25:0d:26:fb:
dd:cb:fc:8c:27:d9:0c:fa:4a:b6:31:b6:67:f0:26:2c:0d:96:
96:39:65:3f:d9:a1:ee:de:9c:10:4d:54:e1:c8:d6:a9:0e:77:
db:00:e2:37:e3:3f:b4:9c:31:4f:ac:74:d3:22:12:53:36:d0:
ef:18:07:2d:8e:d0:e6:91:b2:6c:4a:5e:39:53:14:58:4e:d1:
50:04:c9:83:7e:0d:7b:15:96:87:11:d7:5d:4a:17:ac:aa:9f:
84:e3:a8:24:9d:d6:17:77:26:8c:9f:7a:7b:18:da:39:2f:77:
f7:2b:c7:23:b8:97:6f:c3:d1:72:4c:7e:fc:c6:0d:cc:73:38:
19:81:fb:e7:c1:7a:e8:b9:1d:3a:05:dc:36:04:9b:f1:f0:e1:
a6:47:a0:30:4f:55:90:6c:da:cf:9e:b2:76:12:11:a1:5c:b6:
61:8d:15:a4:68:65:9a:57:2f:7a:6e:a3:1f:f5:b4:92:5a:3c:
df:71:0a:cd:57:d4:d0:15:36:7e:ba:d5:03:25:27:45:b4:60:
cd:2e:02:c1:0f:0a:e7:41:6f:58:69:20:9e:ad:47:52:1a:b5:
e6:e5:8d:1d

Moreover, the two certificates are essentially identical:
$ diff <(openssl s_client -connect tiebac.baidu.com:443 <<< 'Q' | openssl x509 -noout -text) <(openssl s_client -connect c.tieba.baidu.com:443 -prexit <<< 'Q' | openssl x509 -noout -text)
depth=2 OU = GlobalSign Root CA - R3, O = GlobalSign, CN = GlobalSign
verify error:num=20:unable to get local issuer certificate
depth=2 OU = GlobalSign Root CA - R3, O = GlobalSign, CN = GlobalSign
verify error:num=20:unable to get local issuer certificate
DONE
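DONE

Put bluntly, the Tieba admins were simply too lazy (or lacked the permission, since this certificate is used by far too many Baidu services hosted on Baidu Cloud Acceleration) to go and give this annually replaced certificate, which serves a huge number of Baidu services and matches "a hundred thousand, or even ninety thousand" wildcard SANs (valid for 397 days, about 13 months),
Even through Baidu Cloud Acceleration's CDN nodes I can feel the top-tier political tug-of-war between Baidu's internal permission-management departments, just as a self-proclaimed Baidu insider on V2EX revealed:
I reckon that even by the time Tieba shuts down, there's no point hoping the client API will support HTTP/2 or HTTP/3, which are much faster than HTTP/1.1.
And V2EX users predicted this long ago:
Just like the classic fable of

I presume you've already attended the 2023 Guangzhou forum moderators' conference and come home: https://tieba.baidu.com/p/8426862419?pid=147679303211#147679303211
sudo sh -c 'add-apt-repository ppa:ondrej/php && apt update && apt install php8.2 php8.2-curl php8.2-cgi'

I actually don't see what's wrong with this approach.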
…hreadCrawler` and `ThreadLateCrawlerAndSaver` @ ClientRequester.cs * replace the `HttpClient.BaseAddress` from `http://c.tieba.baidu.com` to `http://tiebac.baidu.com` in favor of lumina37/aiotieba#123 (comment) @ `EntryPoint.ConfigureServices()` + const field `LegacyEndPointUrl` to allow method `GetRequestsForPage()` and the one in derived class `ThreadArchiveCrawler` to use the original domain @ ThreadCrawler.cs @ crawler + private field `_logger` and static field `ExtractMalformedExifDateTimeRegex` + method `ParseExifDateTimeOrNull()` to handle malformed EXIF date time string @ MetadataConsumer.cs * change the type of fields `Exif.(Create|Modify)Date` from `string` to `DateTime` @ ImageMetadata.cs * now will log different message when the `ImageInReply.ExpectedByteSize==0` @ `ImageRequester.GetImageBytes()` @ imagePipeline @ c#
Debian 11 doesn't ship add-apt-repository, so I couldn't be bothered to install PHP. That said, home IPs seem to hit 110001 more easily than cloud-provider IPs.

Do Baidu Cloud Acceleration and the Tieba admins actually give preferential treatment to crawlers running on cloud-provider ASN IPs?

It has been in use since at least 2016: https://v2ex.com/t/298794#r_3458895
@BANKA2017's https://blog.nest.moe/posts/visit-tieba-from-tencent-cloud-server-in-hong-kong
Hackl0us/GeoIP2-CN#41
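You removed q_type=2 on May 26, 2022, in v2.6.1 (c5c208f#diff-4d330e17cd513d344a73ce66cfe7682bf66be5294b077a46f38fd4c9d61fa2dcL822).
I noticed that over the last 16 days, requesting certain pages of the thread reply list for the following threads (typified by https://tieba.baidu.com/p/6616695318) returned 0 replies, which in the past usually meant that the requested page did not exist yet or that the thread had been deleted:
https://tieba.baidu.com/p/3611123694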
https://tieba.baidu.com/p/4944438028
https://tieba.baidu.com/p/5207410447
https://tieba.baidu.com/p/5214988494
https://tieba.baidu.com/p/5261166918
https://tieba.baidu.com/p/6616695318
https://tieba.baidu.com/p/6993372330
https://tieba.baidu.com/p/7096366852
https://tieba.baidu.com/p/7230997267
https://tieba.baidu.com/p/7816662822
https://tieba.baidu.com/p/7832611927
https://tieba.baidu.com/p/7943900568
https://tieba.baidu.com/p/7950129599
https://tieba.baidu.com/p/8215005930
https://tieba.baidu.com/p/8227942444
https://tieba.baidu.com/p/8235105996
https://tieba.baidu.com/p/8342345319
https://tieba.baidu.com/p/8355529720
https://tieba.baidu.com/p/8394353126
https://tieba.baidu.com/p/8402224919
https://tieba.baidu.com/p/8408045722
https://tieba.baidu.com/p/8409361546
https://tieba.baidu.com/p/8412021878
https://tieba.baidu.com/p/8416955449
https://tieba.baidu.com/p/8416972007
https://tieba.baidu.com/p/8417800020
https://tieba.baidu.com/p/8417882008
https://tieba.baidu.com/p/8418007671
https://tieba.baidu.com/p/8418615605
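However, by visiting the Tieba web client I manually verified that the corresponding reply-list pages exist for all of these threads, and most of them have a large number of replies (typically 水楼 "water" threads, i.e. high-volume chit-chat threads). Modifying https://github.com/n0099/TiebaMonitor/blob/e7d7240aebeee74d04f7b5d4748af69dff3ed5b0/client_tester.php#L41 to
in order to mimic https://github.com/n0099/TiebaMonitor/blob/91e617a4940f222b0d9d4299c93146b84e2301d7/c%23/crawler/src/Tieba/Crawl/Crawler/ReplyCrawler.cs#L31 reproduces the problem.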
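To narrow down which pages are affected, each page can be requested twice, with and without `q_type=2`, flagging pages where only the `q_type=2` variant comes back empty (both empty still points to a missing page or deleted thread, as before). This sketch keeps the HTTP layer abstract: `fetch_reply_count` is a hypothetical callable you would back with `client_tester.php` or the crawler's own requester, `rn=30` mirrors the patch earlier in this thread, and `fake_fetch` is a made-up offline stand-in.

```python
from typing import Callable, Iterable

# Hypothetical callable: (tid, pn, extra_params) -> number of entries in post_list.
FetchReplyCount = Callable[[int, int, dict], int]


def find_q_type_gaps(
    fetch_reply_count: FetchReplyCount, pages: Iterable[tuple[int, int]]
) -> list[tuple[int, int, int]]:
    """Return (tid, pn, baseline_count) for pages that come back empty only with q_type=2."""
    gaps = []
    for tid, pn in pages:
        with_q_type = fetch_reply_count(tid, pn, {"rn": 30, "q_type": 2})
        baseline = fetch_reply_count(tid, pn, {"rn": 30})
        # Both empty would rather mean a missing page or deleted thread.
        if with_q_type == 0 and baseline > 0:
            gaps.append((tid, pn, baseline))
    return gaps


# Offline stand-in with made-up behaviour: page 2 is empty only when q_type=2 is sent.
def fake_fetch(tid: int, pn: int, params: dict) -> int:
    if params.get("q_type") == 2 and pn == 2:
        return 0
    return 30


print(find_q_type_gaps(fake_fetch, [(1, 1), (1, 2)]))
```

Since the alias `qtype` reportedly behaves the same way, a real run would probably want to test both parameter names.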
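Whereas after removing the q_type=2 parameter, the expected 30 replies are returned. Notably, the parameter name qtype has the same effect (does the Tieba API's validation layer normalize parameter names? Or perhaps there really are two aliases). I also noticed that the v吧 table in my tbm production database contains threads such as
https://tieba.baidu.com/p/7258962480, whose reply + sub-reply count reported by Tieba (as returned by the forum front-page thread list API) is far larger than the number of replies + sub-replies actually crawled and stored in the table. A gap of a few is normally expected, because as far as I remember, deleting a reply or sub-reply used not to decrease the thread's reply + sub-reply count in the forum front-page thread list (i.e. it was an increment-only counter back then).
So my guess is that q_type=2 is a parameter value that somehow reduces the number of replies returned by the thread reply-list API. Given that the reduction mainly hits threads like water threads with lots of duplicated replies, this may also be related to the owner/guest view state (主客态, i.e. content visible only to its author versus to other viewers). Also, the forum front-page thread list API accepts this parameter value as well; does it likewise reduce the appearance of threads whose first-floor (1L) content is duplicated or similar?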
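The discrepancy check described above (the reply + sub-reply count Tieba reports in the forum thread list versus what was actually crawled and stored) can be sketched as a comparison with a small tolerance for deleted posts that an increment-only counter would still include. The field names, the tolerance of 10 and the sample numbers are made up for illustration and are not tbm's schema or data.

```python
from dataclasses import dataclass


@dataclass
class ThreadCounts:
    tid: int
    reported: int            # reply + sub-reply count returned by the thread list API
    stored_replies: int      # replies actually crawled and stored
    stored_sub_replies: int  # sub-replies actually crawled and stored


def suspicious_threads(threads: list[ThreadCounts], tolerance: int = 10) -> list[tuple[int, int]]:
    """Return (tid, missing) for threads whose stored total falls short by more than tolerance."""
    result = []
    for t in threads:
        missing = t.reported - (t.stored_replies + t.stored_sub_replies)
        # A small positive gap is expected: deleted posts may still be counted.
        if missing > tolerance:
            result.append((t.tid, missing))
    return result


# Made-up numbers: thread 1 is within tolerance, thread 2 is far off.
threads = [
    ThreadCounts(tid=1, reported=100, stored_replies=90, stored_sub_replies=5),
    ThreadCounts(tid=2, reported=5000, stored_replies=3000, stored_sub_replies=500),
]
print(suspicious_threads(threads))
```

Threads flagged this way would be natural candidates for re-crawling with and without q_type=2.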