Commit

docs: update docs
lumina37 committed Mar 27, 2023
1 parent d8e87c7 commit d9a726f
Showing 4 changed files with 250 additions and 268 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -3,7 +3,7 @@
 !.gitignore
 !LICENSE
 !README.md
-!README_EN.md
+!README_zh-cn.md
 
 !pyproject.toml
 
267 changes: 131 additions & 136 deletions README.md
@@ -1,136 +1,131 @@
# Rotate-Captcha-Crack

中文 | [English](README_EN.md)

CNN预测图片旋转角度,可用于破解百度旋转验证码

测试效果:

![test_result](https://user-images.githubusercontent.com/48282276/224320691-a8eefd23-392b-4580-a729-7869fa237eaa.png)

本仓库实现了三类模型:

| 名称 | Backbone | 损失函数 | 跨域测试误差(越小越好) | 大小(MB) |
| ----------- | ----------------- | ------------ | ------------------------ | ---------- |
| RotNet | ResNet50 | 交叉熵 | 1.1548° | 92.7 |
| RotNetR | RegNetY 3.2GFLOPs | 交叉熵 | 1.2825° | 69.8 |
| RCCNet_v0_5 | RegNetY 3.2GFLOPs | MSE+余弦修正 | 42.7774° | 68.7 |

`RotNet`是[`d4nst/RotNet`](https://github.com/d4nst/RotNet/blob/master/train/train_street_view.py)的PyTorch实现。`RotNetR`仅在`RotNet`的基础上替换了backbone,并将分类数减少至180。其在[谷歌街景数据集](https://www.crcv.ucf.edu/data/GMCP_Geolocalization/)上训练64个epoch(耗时2小时)得到的平均预测误差为`1.2825°`。目前`RCCNet_v0_5`效果较差,推荐使用`RotNetR`。

跨域测试使用[谷歌街景](https://www.crcv.ucf.edu/data/GMCP_Geolocalization/)/[Landscape-Dataset](https://github.com/yuweiming70/Landscape-Dataset)作为训练集,百度验证码作为测试集(特别鸣谢@xiangbei1997)

演示用到的百度验证码图片来自[RotateCaptchaBreak](https://github.com/chencchen/RotateCaptchaBreak/tree/master/data/baiduCaptcha)

## 体验已有模型

### 准备环境

+ 支持CUDA10+的GPU(如需训练则显存还需要不少于4G)

+ 确保你的`Python`版本`>=3.8 <3.11`

+ 确保你的`PyTorch`版本`>=1.11`

+ 拉取代码并安装依赖库

```shell
git clone --depth=1 https://github.com/Starry-OvO/rotate-captcha-crack.git
cd ./rotate-captcha-crack
pip install .
```

注意不要漏了`install`后面那个`.`

+ 或者,使用虚拟环境

```shell
git clone --depth=1 https://github.com/Starry-OvO/rotate-captcha-crack.git
python -m venv ./rotate-captcha-crack --system-site-packages
cd ./rotate-captcha-crack
# 根据你的Shell类型挑选一个合适的脚本激活虚拟环境 例如./Scripts/Activate.ps1
python -m pip install -U pip
pip install .
```

### 下载预训练模型

下载[Release](https://github.com/Starry-OvO/rotate-captcha-crack/releases)中的压缩包并解压到`./models`文件夹下

文件目录结构类似`./models/RCCNet_v0_5/230228_20_07_25_000/best.pth`

本项目仍处于beta阶段,模型名称会频繁发生变更,因此出现任何`FileNotFoundError`请先尝试用git回退到对应的tag

### 输入一个验证码图像并查看旋转效果

如果你的系统没有GUI,尝试把debug方法从显示图像改成保存图像

```bash
python test_captcha.py
```

### 使用http服务端

+ 安装额外依赖

```shell
pip install aiohttp httpx[cli]
```

+ 运行服务端

```shell
python server.py
```

+ 另开一命令行窗口发送图像

```shell
httpx -m POST http://127.0.0.1:4396 -f img ./test.jpg
```

## 训练新模型

### 准备数据集

+ 我这里直接扒的[谷歌街景](https://www.crcv.ucf.edu/data/GMCP_Geolocalization/)和[Landscape-Dataset](https://github.com/yuweiming70/Landscape-Dataset),你也可以自己收集一些风景照并放到一个文件夹里,图像没有尺寸要求

+ 在`train.py`里配置`dataset_root`变量指向装有图片的文件夹

+ 不需要手动标注,dataset会在读取图片的同时自动完成矩形裁剪、缩放旋转等工作

### 训练

```bash
python train_RotNetR.py
```

### 在测试集上验证模型

```bash
python test_RotNetR.py
```

## 设计细节

现有的旋图验证码破解方法大多基于[`d4nst/RotNet`](https://github.com/d4nst/RotNet),其backbone为`ResNet50`,将角度预测视作360分类问题,并计算交叉熵损失。

`RotNet`中使用的交叉熵损失会令 $1°$ 和 $359°$ 之间的度量距离接近一个类似 $358°$ 的较大值,这显然是一个违背常识的结果。它们之间的度量距离应当是一个类似 $2°$ 的极小值。

同时,[d4nst/RotNet](https://github.com/d4nst/RotNet)给出的[`angle_error_regression`](https://github.com/d4nst/RotNet/blob/a56ea59818bbdd76d4dd8d83b8bbbaae6a802310/utils.py#L30-L36)损失函数效果较差。这是因为该损失函数在应对离群值时的梯度将导致不收敛的结果,你可以在后续的损失函数比较中轻松理解这一点。

本人设计的回归损失函数`RotationLoss`在`MSELoss`的基础上加了个余弦约束项,来缩小真实值的 $±k*360°$ 偏移与真实值之间的度量距离。

$$ \mathcal{L}(dist) = {dist}^{2} + \lambda_{cos} (1 - \cos(2\pi*{dist})) $$

为什么这里使用`MSELoss`,因为自监督学习生成的`label`可以保证不含有任何离群值,因此损失函数设计不需要考虑离群值的问题,同时`MSELoss`不破坏损失函数的可导性。

该损失函数在整个实数域可导且几乎为凸,为什么说是几乎,因为当 $\lambda_{cos} \gt 0.25$ 时在 $predict = \pm 1$ 的地方会出现局部极小值。

最后直观比较一下`RotationLoss`和`angle_error_regression`的函数图像。

![loss](https://user-images.githubusercontent.com/48282276/223087577-fe054521-36c4-4665-9132-2ca7dd2270f8.png)

## 相关文章

[吾爱破解 - 简单聊聊旋转验证码攻防](https://www.52pojie.cn/thread-1754224-1-1.html)
# Rotate-Captcha-Crack

[中文](README_zh-cn.md) | English

Predict the rotation angle of a given image with a CNN. It can be used for cracking rotate-captchas.

Test result:

![test_result](https://user-images.githubusercontent.com/48282276/224320691-a8eefd23-392b-4580-a729-7869fa237eaa.png)

Three kinds of models are implemented, as shown below.

| Name        | Backbone          | Loss                       | Cross-Domain Error (lower is better) | Size (MB) |
| ----------- | ----------------- | -------------------------- | ------------------------------------ | --------- |
| RotNet | ResNet50 | CrossEntropy | 1.1548° | 92.7 |
| RotNetR | RegNetY 3.2GFLOPs | CrossEntropy | 1.2825° | 69.8 |
| RCCNet_v0_5 | RegNetY 3.2GFLOPs | MSE with Cosine-Correction | 42.7774° | 68.7 |

`RotNet` is a PyTorch implementation of [`d4nst/RotNet`](https://github.com/d4nst/RotNet/blob/master/train/train_street_view.py). `RotNetR` is based on `RotNet`: it swaps in a new backbone and reduces the number of classes to 180. Its average prediction error is `1.2825°`, obtained after 64 epochs of training (about 2 hours) on the [Google Street View](https://www.crcv.ucf.edu/data/GMCP_Geolocalization/) dataset. Presently `RCCNet_v0_5` performs far worse than the other two, so I suggest using `RotNetR`.
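
With 180 classes covering 360°, each class spans 2°. The sketch below shows one plausible angle-to-class encoding implied by that setup; the function names and the bin-center decoding are illustrative assumptions, not code from this repository.

```python
# Hypothetical angle <-> class mapping for a 180-class rotation classifier.
# Each class covers 2 degrees; this mirrors the "180 classes" described
# above but is NOT taken from the repository's source.

CLASS_NUM = 180
DEG_PER_CLASS = 360 / CLASS_NUM  # 2.0 degrees per class

def angle_to_class(angle_deg: float) -> int:
    """Quantize an angle to one of 180 class indices (wraps past 360)."""
    return int(angle_deg / DEG_PER_CLASS) % CLASS_NUM

def class_to_angle(idx: int) -> float:
    """Map a class index back to the center of its 2-degree bin."""
    return (idx + 0.5) * DEG_PER_CLASS
```

Round-tripping an angle through such an encoding loses at most 1°, which is why fine-grained classification can reach errors on the order of the `1.2825°` reported above.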

About the cross-domain test: [Google Street View](https://www.crcv.ucf.edu/data/GMCP_Geolocalization/) and [Landscape-Dataset](https://github.com/yuweiming70/Landscape-Dataset) are used for training, and captcha images from Baidu for testing (special thanks to @xiangbei1997)

The captcha images used in the demo above come from [RotateCaptchaBreak](https://github.com/chencchen/RotateCaptchaBreak/tree/master/data/baiduCaptcha)

## Try it!

### Prepare

+ A GPU supporting CUDA 10+ (at least 4 GB of VRAM if you plan to train)

+ Python >= 3.8, < 3.11

+ PyTorch >= 1.11

+ Clone the repository and install all required dependencies

```shell
git clone --depth=1 https://github.com/Starry-OvO/rotate-captcha-crack.git
cd ./rotate-captcha-crack
pip install .
```

**DON'T** miss the `.` after `install`

+ Or, if you prefer `venv`

```shell
git clone --depth=1 https://github.com/Starry-OvO/rotate-captcha-crack.git
python -m venv ./rotate-captcha-crack --system-site-packages
cd ./rotate-captcha-crack
# Choose the proper script to activate the venv according to your shell type, e.g. ./Scripts/Activate.ps1
python -m pip install -U pip
pip install .
```

### Download the Pretrained Models

Download the zip files from [Release](https://github.com/Starry-OvO/rotate-captcha-crack/releases) and unzip them into the `./models` directory.

The directory structure should look like `./models/RCCNet_v0_5/230228_20_07_25_000/best.pth`

The model names will change frequently while the project is still in beta, so if any `FileNotFoundError` occurs, first try rolling back to the corresponding tag with git.
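
Because run directories are timestamped (`230228_20_07_25_000` encodes a date and time), the newest checkpoint can be located programmatically instead of hard-coding a path. A small helper sketched for this layout, not part of the repository:

```python
from pathlib import Path

def find_latest_checkpoint(models_dir: str, model_name: str = "RCCNet_v0_5") -> Path:
    """Return the newest best.pth under models_dir/model_name.

    Timestamped run directories like 230228_20_07_25_000 sort
    lexicographically in chronological order, so a plain sort works.
    """
    runs = sorted(p for p in (Path(models_dir) / model_name).iterdir() if p.is_dir())
    if not runs:
        raise FileNotFoundError(f"no runs under {models_dir}/{model_name}")
    return runs[-1] / "best.pth"
```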

### Test the Rotation Effect by a Single Captcha Picture

If your system has no GUI, try changing the debug behavior from showing images to saving them.

```bash
python test_captcha.py
```

### Use HTTP Server

+ Install extra dependencies

```shell
pip install aiohttp httpx[cli]
```

+ Launch server

```shell
python server.py
```

+ In another shell, send an image

```shell
httpx -m POST http://127.0.0.1:4396 -f img ./test.jpg
```
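
If you'd rather call the server from Python than from the `httpx` CLI, the sketch below builds the same `img` multipart field using only the standard library. The endpoint and field name follow the command above; the server's response format is not specified in this README, so the raw body is returned as text.

```python
import urllib.request
import uuid

def encode_multipart(field: str, filename: str, data: bytes):
    """Encode a single file as a multipart/form-data body.

    Returns (body, content_type) suitable for an HTTP POST.
    """
    boundary = uuid.uuid4().hex
    head = (
        "--{b}\r\n"
        'Content-Disposition: form-data; name="{f}"; filename="{n}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).format(b=boundary, f=field, n=filename).encode()
    tail = "\r\n--{b}--\r\n".format(b=boundary).encode()
    return head + data + tail, "multipart/form-data; boundary=" + boundary

def send_captcha(path: str, url: str = "http://127.0.0.1:4396") -> str:
    """POST an image to the local rotate-captcha server, return the raw reply."""
    with open(path, "rb") as f:
        body, ctype = encode_multipart("img", path, f.read())
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": ctype}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()
```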

## Train Your Own Model

### Prepare Datasets

+ For this project I'm using [Google Street View](https://www.crcv.ucf.edu/data/GMCP_Geolocalization/) and [Landscape-Dataset](https://github.com/yuweiming70/Landscape-Dataset) for training. You can also collect some photos yourself and put them in one directory; there is no size or shape requirement.

+ Modify the `dataset_root` variable in `train.py` so that it points to the directory containing your images.

+ No manual labeling is required. The dataset performs all the cropping, rotation, and resizing automatically right after each image is loaded.
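
The labels come for free because the dataset rotates each image itself: it draws a random angle, rotates the image by it, and uses that angle as the training target. A minimal sketch of the label bookkeeping (the actual dataset class also crops and resizes; names here are illustrative):

```python
import random

def make_rotation_target(rng: random.Random):
    """Draw a random rotation and derive the training target from it.

    Returns (angle_deg, regression_label), where the label is the angle
    normalized to [0, 1); the image itself would be rotated by angle_deg
    before being fed to the network.
    """
    angle_deg = rng.uniform(0.0, 360.0)
    return angle_deg, angle_deg / 360.0
```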

### Train

```bash
python train_RotNetR.py
```

### Validate the Model on Test Set

```bash
python test_RotNetR.py
```

## Details of Design

Most rotate-captcha cracking methods are based on [`d4nst/RotNet`](https://github.com/d4nst/RotNet), with `ResNet50` as the backbone. `RotNet` treats angle prediction as a classification task with 360 classes and uses cross-entropy to compute the loss.

Yet cross-entropy assigns a sizeable metric distance of about $358°$ between $1°$ and $359°$, which clearly defies common sense: it should be a small value like $2°$. Meanwhile, the [`angle_error_regression`](https://github.com/d4nst/RotNet/blob/a56ea59818bbdd76d4dd8d83b8bbbaae6a802310/utils.py#L30-L36) loss given by [d4nst/RotNet](https://github.com/d4nst/RotNet) is less effective, because its gradient on outliers leads to non-convergence. You can see this easily in the comparison of loss functions below.
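
A proper circular metric makes the mismatch concrete. The helper below (an illustration, not repository code) measures the shortest rotation between two angles, which is what the loss ought to approximate:

```python
def angular_dist(a_deg: float, b_deg: float) -> float:
    """Shortest rotation in degrees between two angles, in [0, 180]."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)

# On the circle, 1 deg and 359 deg are only 2 deg apart, not 358 deg.
```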

My regression loss function `RotationLoss` is based on `MSELoss`, with an extra cosine-correction term to shrink the metric distance between the ground truth and its $\pm k \cdot 360°$ offsets.

$$ \mathcal{L}(dist) = {dist}^{2} + \lambda_{cos} (1 - \cos(2\pi*{dist})) $$

Why `MSELoss` here? Because the labels generated by the self-supervised procedure are guaranteed to contain no outliers, the loss design does not need to account for them. Also, `MSELoss` does not break the differentiability of the loss function.

The loss function is differentiable and *almost* convex over the entire domain $\mathbb{R}$. Why *almost*? Because a local minimum appears near $predict = \pm 1$ when $\lambda_{cos} \gt 0.25$.
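
These claims are easy to check numerically. The sketch below (pure Python, not the repository's implementation) evaluates $\mathcal{L}$ on a grid: with a small $\lambda_{cos}$ the loss rises monotonically away from zero, while $\lambda_{cos} = 0.5 \gt 0.25$ produces an interior local minimum near $dist = 1$.

```python
import math

def rotation_loss(dist: float, lambda_cos: float) -> float:
    """L(dist) = dist^2 + lambda_cos * (1 - cos(2*pi*dist))."""
    return dist * dist + lambda_cos * (1.0 - math.cos(2.0 * math.pi * dist))

def interior_local_minima(lambda_cos: float, lo=0.05, hi=1.0, steps=2000):
    """Grid points strictly inside (lo, hi) whose loss is below both neighbors."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [rotation_loss(x, lambda_cos) for x in xs]
    return [xs[i] for i in range(1, steps)
            if ys[i] < ys[i - 1] and ys[i] < ys[i + 1]]
```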

Finally, let's compare the curves of the two loss functions:

![loss](https://user-images.githubusercontent.com/48282276/223087577-fe054521-36c4-4665-9132-2ca7dd2270f8.png)