Request Retry param for get_*** #82
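Retry based on whether the result is empty? A retry parameter is not an elegant solution; exception hooks are a better one. Designing such a feature would probably take at least a week.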
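Below is a minimal, library-agnostic sketch of the exception-hook idea. Each function name maps to a handler that receives the error and decides whether to swallow it or raise something the caller can act on. `exc_handlers`, `call_with_hook` and `default_handler` are hypothetical names chosen for illustration. This is not aiotieba's actual implementation.

```python
from typing import Any, Awaitable, Callable, Dict

# Hypothetical registry: function name -> handler invoked when that call fails.
ExcHandler = Callable[[Exception], None]
exc_handlers: Dict[str, ExcHandler] = {}


def default_handler(err: Exception) -> None:
    # Default behaviour: swallow the error (a real library might log it here).
    pass


async def call_with_hook(name: str, func: Callable[..., Awaitable[Any]], *args: Any) -> Any:
    """Await func(*args); on failure, pass the exception to the hook for `name`.

    If the hook raises (for example a custom "retry me" exception), that
    exception propagates to the caller. Otherwise None is returned,
    mirroring "log and ignore".
    """
    try:
        return await func(*args)
    except Exception as err:
        exc_handlers.get(name, default_handler)(err)
        return None
```

With this shape, retry policy stays in user code. A hook translates a 429 into a distinct exception, and the caller decides how often and how long to retry.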
[Pasted response body: the HTML of Tieba's generic error page, titled "贴吧404".]
I've never seen rate limiting done with HTTP 403.
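Also, I wrote something for this. It dynamically probes the RPS limit and keeps the outgoing RPS fluctuating around the probed ceiling. Requests therefore go out at as high a rate as possible without constantly hitting the limit. The same mechanism discovers the new ceiling if Tieba's CDN operators suddenly change the RPS limit: https://github.com/n0099/TiebaMonitor/blob/290c43ccf9054481d23b7bbc7ab6e6db54d6a38a/crawler/src/Tieba/ClientRequesterTcs.cs#L14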
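The linked C# code is the commenter's own implementation, and the sketch below does not reproduce it. Instead, it shows a common related technique in Python: AIMD (additive increase, multiplicative decrease). The rate creeps up after each success and is cut sharply after a 429, so it hovers near whatever limit the server currently enforces. `AdaptiveRateLimiter` and its default parameter values are assumptions for illustration.

```python
import asyncio
import time


class AdaptiveRateLimiter:
    """AIMD rate controller: probe upward slowly, back off sharply on 429.

    Hypothetical sketch; not the algorithm used in the linked TiebaMonitor code.
    """

    def __init__(self, initial_rps: float = 10.0, min_rps: float = 1.0,
                 max_rps: float = 1000.0, increase: float = 1.0,
                 decrease_factor: float = 0.5) -> None:
        self.rps = initial_rps
        self.min_rps = min_rps
        self.max_rps = max_rps
        self.increase = increase
        self.decrease_factor = decrease_factor
        self._next_slot = time.monotonic()

    async def acquire(self) -> None:
        # Reserve the next send slot, spacing requests 1/rps seconds apart.
        # No await happens before the slot is reserved, so concurrent tasks
        # on one event loop cannot interleave here.
        now = time.monotonic()
        wait = self._next_slot - now
        self._next_slot = max(now, self._next_slot) + 1.0 / self.rps
        if wait > 0:
            await asyncio.sleep(wait)

    def on_success(self) -> None:
        # Additive increase: creep toward the (unknown) server-side limit.
        self.rps = min(self.max_rps, self.rps + self.increase)

    def on_rate_limited(self) -> None:
        # Multiplicative decrease: drop well below the limit just hit.
        self.rps = max(self.min_rps, self.rps * self.decrease_factor)
```

Each request would first call `await limiter.acquire()`. Afterwards it calls `on_rate_limited()` if the response was a 429 and `on_success()` otherwise. If the server's limit changes, the rate converges to the new ceiling without any configuration change.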
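Sorry for making you reply so much. I'm on a mainland China IP.
For now I've worked around the rate-limit retry problem by hacking the library's internals directly. I may have misremembered the error code.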
There is already an implemented solution: register a handler in exc_handlers. The handler can raise an exception of a different type, such as a custom one:

import asyncio

import aiotieba as tb


class NeedRetry(RuntimeError):
    pass


def exce_handler(err: tb.HTTPStatusError):
    # Turn an HTTP 429 into a distinct exception the caller can catch
    if isinstance(err, tb.HTTPStatusError):
        if err.code == 429:
            raise NeedRetry("need retry")


async def main():
    async with tb.Client('default') as client:
        # Register the handler for client.get_fid
        tb.client.exc_handlers[client.get_fid] = exce_handler
        # Try at most 3 times
        for i in range(1, 4):
            try:
                await client.get_fid('天堂鸡汤')
            except NeedRetry:
                tb.LOG().debug(f"retry for the {i} time")
                continue
            else:
                break


asyncio.run(main())

Output log:
The global error handler, a classic C programmer's favorite.
What feature I need
get_threads
get_posts
get_comment
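When threads are crawled asynchronously at too high a rate, these three functions fail with HTTP 429 errors. Could you add a retry parameter so that the request is fetched again, instead of the error being raised, logged and then ignored? Thanks.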
https://github.com/Starry-OvO/aiotieba/blob/4fba4b58c4b1e98c11e198f832f62827dab5b539/aiotieba/client/__init__.py#L678
https://github.com/Starry-OvO/aiotieba/blob/4fba4b58c4b1e98c11e198f832f62827dab5b539/aiotieba/client/__init__.py#L718
https://github.com/Starry-OvO/aiotieba/blob/4fba4b58c4b1e98c11e198f832f62827dab5b539/aiotieba/client/__init__.py#L773
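Retry behaviour can also be had without a dedicated parameter by wrapping the call in a generic helper. The sketch below assumes the failing call raises some exception on a 429, for instance a custom `NeedRetry` raised from an exception hook. `retry_async` and its defaults are hypothetical and not part of aiotieba.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def retry_async(
    func: Callable[[], Awaitable[T]],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
) -> T:
    """Await func(), retrying on the given exceptions with exponential backoff.

    Between tries it sleeps a random time in [0, base_delay * 2**k], capped at
    max_delay, and re-raises the last error once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return await func()
        except retry_on:
            if attempt == attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter spreads out retries from many concurrent tasks.
            await asyncio.sleep(random.uniform(0, delay))
    raise ValueError("attempts must be >= 1")
```

One possible call is `await retry_async(lambda: client.get_fid('天堂鸡汤'), retry_on=(NeedRetry,))`. This assumes a hook has been registered so that the call raises `NeedRetry` on a 429. Jittered backoff keeps many concurrent crawler tasks from retrying in lockstep and immediately tripping the limit again.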
What scenario I want to use this feature in
Quickly crawling all the threads in a forum (吧)
...