# Rotate-Captcha-Crack

[中文](README_zh-cn.md) | English

Predict the rotation angle of a given image with a CNN. Can be used to crack rotate-captchas.

Test result:

![test_result](https://user-images.githubusercontent.com/48282276/224320691-a8eefd23-392b-4580-a729-7869fa237eaa.png)

Three kinds of models are implemented, as shown below.

| Name        | Backbone          | Loss                       | Cross-Domain Loss (less is better) | Size (MB) |
| ----------- | ----------------- | -------------------------- | ---------------------------------- | --------- |
| RotNet      | ResNet50          | CrossEntropy               | 1.1548°                            | 92.7      |
| RotNetR     | RegNetY 3.2GFLOPs | CrossEntropy               | 1.2825°                            | 69.8      |
| RCCNet_v0_5 | RegNetY 3.2GFLOPs | MSE with Cosine-Correction | 42.7774°                           | 68.7      |

`RotNet` is a PyTorch implementation of [`d4nst/RotNet`](https://github.com/d4nst/RotNet/blob/master/train/train_street_view.py). `RotNetR` builds on `RotNet`, replacing the backbone and reducing the number of classes to 180. Its average prediction error is `1.2825°`, obtained after 64 epochs of training (2 hours) on the [Google Street View](https://www.crcv.ucf.edu/data/GMCP_Geolocalization/) dataset. At present `RCCNet_v0_5` performs far worse than the other two, so I suggest using `RotNetR`.

About the cross-domain test: [Google Street View](https://www.crcv.ucf.edu/data/GMCP_Geolocalization/) and [Landscape-Dataset](https://github.com/yuweiming70/Landscape-Dataset) are used for training, and captcha images from Baidu for testing (special thanks to @xiangbei1997).

The captcha image used in the demo above comes from [RotateCaptchaBreak](https://github.com/chencchen/RotateCaptchaBreak/tree/master/data/baiduCaptcha).
## Try it!

### Prepare

+ A GPU supporting CUDA 10+ (VRAM >= 4 GB if you plan to train)

+ Python >= 3.8, < 3.11

+ PyTorch >= 1.11

+ Clone the repository and install all required dependencies

```shell
git clone --depth=1 https://github.com/Starry-OvO/rotate-captcha-crack.git
cd ./rotate-captcha-crack
pip install .
```

**Don't** miss the `.` after `install`

+ Or, if you prefer `venv`

```shell
git clone --depth=1 https://github.com/Starry-OvO/rotate-captcha-crack.git
python -m venv ./rotate-captcha-crack --system-site-packages
cd ./rotate-captcha-crack
# Activate the venv with the script matching your shell,
# e.g. ./Scripts/Activate.ps1 (Windows) or `source ./bin/activate` (POSIX)
python -m pip install -U pip
pip install .
```
### Download the Pretrained Models

Download the zip files from [Release](https://github.com/Starry-OvO/rotate-captcha-crack/releases) and unzip them into the `./models` dir.

The directory structure will look like `./models/RCCNet_v0_5/230228_20_07_25_000/best.pth`

The model names will change frequently while the project is still in beta. So, if any `FileNotFoundError` occurs, first try rolling back to the corresponding tag with git.

### Test the Rotation Effect on a Single Captcha Image

If your system has no GUI, try changing the debugging behavior from showing images to saving them.

```bash
python test_captcha.py
```

### Use the HTTP Server

+ Install the extra dependencies

```shell
pip install aiohttp httpx[cli]
```

+ Launch the server

```shell
python server.py
```

+ In another shell, send an image

```shell
httpx -m POST http://127.0.0.1:4396 -f img ./test.jpg
```
## Train Your Own Model

### Prepare Datasets

+ This project uses [Google Street View](https://www.crcv.ucf.edu/data/GMCP_Geolocalization/) and [Landscape-Dataset](https://github.com/yuweiming70/Landscape-Dataset) for training. You can also collect some photos yourself and put them all in one directory; there is no size or shape requirement.

+ Modify the `dataset_root` variable in `train.py` so that it points to the directory containing your images.

+ No manual labeling is required. The dataset performs all the cropping, rotation and resizing automatically as each image is loaded.
### Train

```bash
python train_RotNetR.py
```

### Validate the Model on Test Set

```bash
python test_RotNetR.py
```
## Details of Design

Most existing rotate-captcha cracking methods are based on [`d4nst/RotNet`](https://github.com/d4nst/RotNet), with `ResNet50` as the backbone. `RotNet` treats angle prediction as a classification task with 360 classes, and uses `CrossEntropy` to compute the loss.

Yet `CrossEntropy` assigns a sizeable metric distance of roughly $358°$ between $1°$ and $359°$, which clearly defies common sense: the distance should be a small value like $2°$. Meanwhile, the [`angle_error_regression`](https://github.com/d4nst/RotNet/blob/a56ea59818bbdd76d4dd8d83b8bbbaae6a802310/utils.py#L30-L36) loss given by [d4nst/RotNet](https://github.com/d4nst/RotNet) performs poorly, because its gradient in the presence of outliers leads to non-convergence. You can easily see this in the comparison of loss functions later in this section.
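The common-sense metric here is the cyclic angular distance. A small illustrative helper (not part of this repository) makes the point concrete:

```python
def cyclic_angle_error(pred_deg: float, true_deg: float) -> float:
    """Smallest rotation (in degrees, range 0..180) separating two angles."""
    diff = abs(pred_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)
```

For example, `cyclic_angle_error(1, 359)` is `2`, not `358`, which is exactly the behavior a plain 360-class `CrossEntropy` fails to capture.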
My regression loss `RotationLoss` is based on `MSELoss`, with an extra cosine-correction term that shrinks the metric distance between the ground truth and the ground truth $\pm k*360°$.

$$ \mathcal{L}(dist) = {dist}^{2} + \lambda_{cos} (1 - \cos(2\pi*{dist})) $$
Why `MSELoss` here? Because the labels generated by the self-supervised procedure are guaranteed to contain no outliers, the loss design does not need to be robust to them; and `MSELoss` does not break the differentiability of the loss function.

The loss function is differentiable and *almost* convex over the entire $\mathbb{R}$. Why *almost*? Because a local minimum appears around $predict = \pm 1$ when $\lambda_{cos} \gt 0.25$.
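That shape can be checked numerically with a minimal pure-Python sketch (distances are normalized so that $1.0$ equals $360°$; this is an illustration of the formula above, not the repository's `RotationLoss` implementation, and the grid-scan helper is hypothetical):

```python
import math


def rotation_loss(dist: float, lambda_cos: float) -> float:
    """L(dist) = dist^2 + lambda_cos * (1 - cos(2*pi*dist)); dist of 1.0 == 360 degrees."""
    return dist ** 2 + lambda_cos * (1.0 - math.cos(2.0 * math.pi * dist))


def interior_local_minima(lambda_cos: float, lo=0.3, hi=1.5, steps=1200):
    """Grid points strictly lower than both neighbors, i.e. local minima on [lo, hi]."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [rotation_loss(x, lambda_cos) for x in xs]
    return [xs[i] for i in range(1, steps) if ys[i] < ys[i - 1] and ys[i] < ys[i + 1]]
```

With a small weight such as `lambda_cos=0.1` the scan finds no interior local minimum, while `lambda_cos=0.5` produces one near $dist \approx 0.9$, matching the *almost convex* claim.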
Finally, an intuitive comparison of `RotationLoss` and `angle_error_regression`:

![loss](https://user-images.githubusercontent.com/48282276/223087577-fe054521-36c4-4665-9132-2ca7dd2270f8.png)

## Related Articles

[吾爱破解 - A brief chat about rotate-captcha attack and defense](https://www.52pojie.cn/thread-1754224-1-1.html)