PythonSpiderNotes/Captcha1 at master · lining0806/PythonSpiderNotes

ReadMe.md

本项目采用Tesseract V3.01版本(V3.02版本在训练时有改动，多shapeclustering过程)

Tesseract用法：

配置环境变量TESSDATA_PREFIX =“D:\Tesseract-ocr\”，即tessdata的目录，在源码中会到这个路径下查找相应的字库文件用来识别。
命令格式：
tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]
只识别成数字
tesseract imagename outputbase -l eng digits
解决empty page!!
-psm N

7 = Treat the image as a single text line
tesseract imagename outputbase -l eng -psm 7
configfile 参数值为tessdata\configs 和 tessdata\tessconfigs 目录下的文件名：
tesseract imagename outputbase -l eng nobatch

验证码识别项目使用方法1：

将下载的图片放到./pic目录下，

验证码图片名称：get_random.jpg
价格图片名称：get_price_img.png
命令格式：

验证码图片识别：python tess_test.py ./pic/get_random.jpg
价格图片识别：python tess_test.py ./pic/get_price_img.png

打印出识别的结果

若要将结果存在临时文本文件temp.txt中，则修改pytessr_pro.py中代码"cleanup_scratch_flag = True"改为"cleanup_scratch_flag = False"