CV Suite Development Special Campaign - Return per-character coordinates from text recognition #10515
Thanks for your contribution!
Great work. Could you add the run commands and the output? I'll take a closer look tomorrow.
ppocr/postprocess/rec_postprocess.py
Outdated
@@ -64,10 +64,55 @@ def pred_reverse(self, pred):

         return ''.join(pred_re[::-1])

-    def add_special_char(self, dict_character):
+    def add_special_char(self, text, dict_character):
This extra parameter doesn't seem to be used?
Oh, that was a mistake; this function isn't used.
For English document recovery, first download the inference models:
cd PaddleOCR/ppstructure
# download model
mkdir inference && cd inference
# Download the detection model of the ultra-lightweight English PP-OCRv3 model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar && tar xf en_PP-OCRv3_det_infer.tar
# Download the recognition model of the ultra-lightweight English PP-OCRv3 model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar && tar xf en_PP-OCRv3_rec_infer.tar
# Download the ultra-lightweight English table structure model (SLANet) and unzip it
wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/en_ppstructure_mobile_v2.0_SLANet_infer.tar
tar xf en_ppstructure_mobile_v2.0_SLANet_infer.tar
# Download the layout model of publaynet dataset and unzip it
wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar
tar xf picodet_lcnet_x1_0_fgd_layout_infer.tar
cd ..
Then run inference from the /ppstructure/ directory with the following command:
python predict_system.py \
--image_dir=./docs/table/1.png \
--det_model_dir=inference/en_PP-OCRv3_det_infer \
--rec_model_dir=inference/en_PP-OCRv3_rec_infer \
--rec_char_dict_path=../ppocr/utils/en_dict.txt \
--table_model_dir=inference/en_ppstructure_mobile_v2.0_SLANet_infer \
--table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \
--layout_model_dir=inference/picodet_lcnet_x1_0_fgd_layout_infer \
--layout_dict_path=../ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt \
--vis_font_path=../doc/fonts/simfang.ttf \
--recovery=True \
--output=../output/ \
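--return_word_box=True
The visualized inference result is saved to ../output/structure/1/show_0.jpg, as shown in the figure below:
For Chinese document recovery, first download the inference models:
cd PaddleOCR/ppstructure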
# download model
cd inference
# Download the detection model of the ultra-lightweight Chinese PP-OCRv3 model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar && tar xf ch_PP-OCRv3_det_infer.tar
# Download the recognition model of the ultra-lightweight Chinese PP-OCRv3 model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar && tar xf ch_PP-OCRv3_rec_infer.tar
# Download the ultra-lightweight Chinese table structure model (SLANet) and unzip it
wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar
tar xf ch_ppstructure_mobile_v2.0_SLANet_infer.tar
# Download the layout model of CDLA dataset and unzip it
wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla_infer.tar
tar xf picodet_lcnet_x1_0_fgd_layout_cdla_infer.tar
cd ..
Upload the test image "2.png" below to the ./docs/table/ directory, then run inference from the /ppstructure/ directory with the following command:
python predict_system.py \
--image_dir=./docs/table/2.png \
--det_model_dir=inference/ch_PP-OCRv3_det_infer \
--rec_model_dir=inference/ch_PP-OCRv3_rec_infer \
--rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt \
--table_model_dir=inference/ch_ppstructure_mobile_v2.0_SLANet_infer \
--table_char_dict_path=../ppocr/utils/dict/table_structure_dict_ch.txt \
--layout_model_dir=inference/picodet_lcnet_x1_0_fgd_layout_cdla_infer \
--layout_dict_path=../ppocr/utils/dict/layout_dict/layout_cdla_dict.txt \
--vis_font_path=../doc/fonts/chinese_cht.ttf \
--recovery=True \
--output=../output/ \
--return_word_box=True
The visualized inference result is saved to ../output/structure/2/show_0.jpg, as shown in the figure below:
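- I left a few comments; please check whether the suggested changes are worth making to keep the code concise.
- I verified the results and found no problems.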
tools/infer/predict_rec.py
Outdated
if self.postprocess_params['name'] == 'CTCLabelDecode':
    rec_result = self.postprocess_op(preds, return_word_box=self.return_word_box)
    ino_list = list(range(beg_img_no, end_img_no))
    for rec_idx, rec in enumerate(rec_result):
        ino = ino_list[rec_idx]
        h, w = img_list[indices[ino]].shape[0:2]
        wh_ratio = w * 1.0 / h
        rec[2][0] = rec[2][0]*(wh_ratio/max_wh_ratio)
else:
    rec_result = self.postprocess_op(preds)
How about adding this to the call function of the CTCLabelDecode postprocess?
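This part is hard to move in there, because the calculation needs the current image's aspect ratio, and that information is only available at the predict_rec.py level.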
         if isinstance(preds, tuple) or isinstance(preds, list):
             preds = preds[-1]
         if isinstance(preds, paddle.Tensor):
             preds = preds.numpy()
         preds_idx = preds.argmax(axis=2)
         preds_prob = preds.max(axis=2)
-        text = self.decode(preds_idx, preds_prob, is_remove_duplicate=True)
+        text = self.decode(preds_idx, preds_prob, is_remove_duplicate=True, return_word_box=return_word_box)
         if label is None:
             return text
         label = self.decode(label)
I think the code from the prediction script could go here, since every CTCLabelDecode call happens in the rec postprocess.
If it goes here, the image aspect ratio would probably have to be passed in as a parameter too, haha.
Could this be passed in via kwargs?
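To make the kwargs suggestion concrete, here is a minimal sketch of how the per-image aspect ratios might travel into the postprocess call. `CTCDecodeSketch` is a toy stand-in written for this illustration, not the PaddleOCR `CTCLabelDecode` class, and the keyword names `wh_ratio_list` and `max_wh_ratio` are assumptions rather than the names the PR finally used.

```python
import numpy as np


class CTCDecodeSketch:
    """Toy stand-in for a CTC post-processor (hypothetical, not PaddleOCR's class)."""

    def __init__(self, charset):
        # Index 0 is reserved for the CTC blank token.
        self.charset = ["<blank>"] + list(charset)

    def decode(self, idx_row):
        """Greedy CTC decode: drop repeats and blanks, remember source columns."""
        chars, cols = [], []
        prev = 0
        for col, idx in enumerate(idx_row):
            if idx != 0 and idx != prev:
                chars.append(self.charset[idx])
                cols.append(col)
            prev = idx
        return "".join(chars), cols

    def __call__(self, preds, return_word_box=False, **kwargs):
        preds_idx = preds.argmax(axis=2)
        results = []
        for i, row in enumerate(preds_idx):
            text, cols = self.decode(row)
            if not return_word_box:
                results.append((text,))
                continue
            # Number of time steps; shrink it for images that were padded
            # to the widest aspect ratio in the batch.
            col_num = float(len(row))
            wh_ratio_list = kwargs.get("wh_ratio_list")  # assumed kwarg name
            max_wh_ratio = kwargs.get("max_wh_ratio")    # assumed kwarg name
            if wh_ratio_list is not None and max_wh_ratio:
                col_num *= wh_ratio_list[i] / max_wh_ratio
            results.append((text, [col_num, cols]))
        return results


# One image, six time steps, three classes (blank, 'a', 'b').
idx = np.array([[1, 1, 0, 2, 2, 0]])
preds = np.eye(3)[idx]  # one-hot scores with shape (1, 6, 3)
post = CTCDecodeSketch("ab")
out = post(preds, return_word_box=True, wh_ratio_list=[2.0], max_wh_ratio=4.0)
print(out)
```

With this shape, the caller in predict_rec.py would only need to collect the ratios while it builds the batch, and the rescaling of the column count would live next to the decoding logic.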
ppstructure/predict_system.py
Outdated
rec_word_info = rec_res[2]
col_num, word_list, word_col_list, state_list = rec_word_info
box = box.tolist()
bbox_x_start = box[0][0]
bbox_x_end = box[1][0]
bbox_y_start = box[0][1]
bbox_y_end = box[2][1]

cell_width = (bbox_x_end - bbox_x_start)/col_num

word_box_list = []
word_box_content_list = []
cn_width_list = []
cn_col_list = []
for word, word_col, state in zip(word_list, word_col_list, state_list):
    if state == 'cn':
        if len(word_col) != 1:
            char_seq_length = (word_col[-1] - word_col[0] + 1) * cell_width
            char_width = char_seq_length/(len(word_col)-1)
            cn_width_list.append(char_width)
        cn_col_list += word_col
        word_box_content_list += word
    else:
        cell_x_start = bbox_x_start + int(word_col[0] * cell_width)
        cell_x_end = bbox_x_start + int((word_col[-1]+1) * cell_width)
        cell = ((cell_x_start, bbox_y_start), (cell_x_end, bbox_y_start), (cell_x_end, bbox_y_end), (cell_x_start, bbox_y_end))
        word_box_list.append(cell)
        word_box_content_list.append("".join(word))
if len(cn_col_list) != 0:
    if len(cn_width_list) != 0:
        avg_char_width = np.mean(cn_width_list)
    else:
        avg_char_width = (bbox_x_end - bbox_x_start)/len(rec_str)
    for center_idx in cn_col_list:
        center_x = (center_idx+0.5)*cell_width
        cell_x_start = max(int(center_x - avg_char_width/2), 0) + bbox_x_start
        cell_x_end = min(int(center_x + avg_char_width/2), bbox_x_end-bbox_x_start) + bbox_x_start
        cell = ((cell_x_start, bbox_y_start), (cell_x_end, bbox_y_start), (cell_x_end, bbox_y_end), (cell_x_start, bbox_y_end))
        word_box_list.append(cell)
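I suggest extracting this part into a function in ppstructure's utility to improve readability. Also, please add some comments to the code, for example noting that this part converts the recognition result into character-level position and content information.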
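As an illustration of that suggestion, the snippet above could be wrapped roughly as follows. The function name `word_boxes_from_rec` and its signature are hypothetical and are not the helper that was eventually merged. The final sort by x coordinate is also an addition of this sketch, on the assumption that the text line runs left to right.

```python
import numpy as np


def word_boxes_from_rec(box, rec_str, rec_word_info):
    """Convert one recognition result into word/character boxes and contents.

    box: four (x, y) corners of the text line, clockwise from top-left.
    rec_word_info: (col_num, word_list, word_col_list, state_list),
        unpacked the same way as in the reviewed snippet.
    """
    col_num, word_list, word_col_list, state_list = rec_word_info
    x0, x1 = box[0][0], box[1][0]
    y0, y1 = box[0][1], box[2][1]
    cell_width = (x1 - x0) / col_num  # pixel width of one CTC time step

    def make_cell(xs, xe):
        return ((xs, y0), (xe, y0), (xe, y1), (xs, y1))

    boxes, contents = [], []
    cn_widths, cn_cols = [], []
    for word, cols, state in zip(word_list, word_col_list, state_list):
        if state == "cn":
            # Chinese run: estimate character width from column spacing,
            # and emit one box per character after the loop.
            if len(cols) != 1:
                seq_len = (cols[-1] - cols[0] + 1) * cell_width
                cn_widths.append(seq_len / (len(cols) - 1))
            cn_cols += cols
            contents += word
        else:
            # Non-Chinese word: one box spanning its first to last column.
            xs = x0 + int(cols[0] * cell_width)
            xe = x0 + int((cols[-1] + 1) * cell_width)
            boxes.append(make_cell(xs, xe))
            contents.append("".join(word))
    if cn_cols:
        avg_w = np.mean(cn_widths) if cn_widths else (x1 - x0) / len(rec_str)
        for c in cn_cols:
            center = (c + 0.5) * cell_width
            xs = max(int(center - avg_w / 2), 0) + x0
            xe = min(int(center + avg_w / 2), x1 - x0) + x0
            boxes.append(make_cell(xs, xe))
    # Assumption of this sketch: sorting by left edge restores reading order
    # when Chinese and non-Chinese runs are mixed in one line.
    boxes.sort(key=lambda b: b[0][0])
    return contents, boxes


line_box = [[0, 0], [100, 0], [100, 20], [0, 20]]
info = (10, [["h", "i"], ["y", "o"]], [[0, 1], [5, 6]], ["en", "en"])
contents, boxes = word_boxes_from_rec(line_box, "hi yo", info)
print(contents, boxes)
```

Keeping the geometry in one function with a docstring also gives a natural place for the comment the reviewer asked for, namely that this step turns the recognition output into character-level positions and contents.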
bd477ac to 92696e3
LGTM
* modification of return word box
* update_implements
* Update rec_postprocess.py
* Update utility.py
* Update recognition_en.md (#10059) ic15_dict.txt only have 36 digits
* Update ocr_rec.h (#9469) It is enough to include preprocess_op.h, we do not need to include ocr_cls.h.
* Add an explanatory comment for num_classes (#10073) The value assigned to Architecture.Backbone.num_classes in ser_vi_layoutxlm_xfund_zh.yml is also set on Loss.num_classes. Because BIO tagging is used, if the dictionary contains n fields (including other), the number of classes is 2n-1; if the dictionary contains n fields (excluding other), the number of classes is 2n+1.
* Update algorithm_overview_en.md (#9747) Fix links to super-resolution algorithm docs
* Improve the docs `deploy/hubserving/readme.md` and `doc/doc_ch/models_list.md` (#9110)
* Update readme.md
* Update readme.md
* Update readme.md
* Update models_list.md
* trim trailling spaces @ `deploy/hubserving/readme_en.md`
* `s/shell/bash/` @ `deploy/hubserving/readme_en.md`
* Update `deploy/hubserving/readme_en.md` to sync with `deploy/hubserving/readme.md`
* Update deploy/hubserving/readme_en.md to sync with `deploy/hubserving/readme.md`
* Update deploy/hubserving/readme_en.md to sync with `deploy/hubserving/readme.md`
* Update `doc/doc_en/models_list_en.md` to sync with `doc/doc_ch/models_list_en.md`
* using Grammarly to weak `deploy/hubserving/readme_en.md`
* using Grammarly to tweak `doc/doc_en/models_list_en.md`
* `ocr_system` module will return with values of field `confidence`
* Update README_CN.md
* Fix the incorrect reference path for image-to-Base64 conversion in the test service. (#8334)
* Update application.md
* [Doc] Fix 404 link. (#10318)
* Update PP-OCRv3_det_train.md
* Update knowledge_distillation.md
* Update config.md
* Fix fitz camelCase deprecation and .PDF not being recognized as pdf file (#10181)
* Fix fitz camelCase deprecation and .PDF not being recognized as pdf file
* refactor get_image_file_list function
* Update customize.md (#10325)
* Update FAQ.md (#10345)
* Update FAQ.md (#10349)
* Don't break overall processing on a bad image (#10216)
* Add preprocessing common to OCR tasks (#10217) Add preprocessing to options
* [MLU] add mlu device for infer (#10249)
* Create newfeature.md
* Update newfeature.md
* remove unused imported module, so can avoid PyInstaller packaged binary's start-time not found module error. (#10502)
* CV Suite Development Special Campaign - Return per-character coordinates from text recognition (#10515)
* modification of return word box
* update_implements
* Update rec_postprocess.py
* Update utility.py
* Update README_ch.md
* revert README_ch.md update
* Fixed Layout recovery README file (#10493) Co-authored-by: Shubham Chambhare
* update_doc
* bugfix
---------
Co-authored-by: ChuongLoc
Co-authored-by: Wang Xin
Co-authored-by: tanjh
Co-authored-by: Louis Maddox
Co-authored-by: n0099
Co-authored-by: zhenliang li
Co-authored-by: itasli
Co-authored-by: UserUnknownFactor
Co-authored-by: PeiyuLau
Co-authored-by: kerneltravel
Co-authored-by: ToddBear
Co-authored-by: Ligoml
Co-authored-by: Shubham Chambhare
Co-authored-by: Shubham Chambhare
Co-authored-by: andyj
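Is there C++ support?
Do these per-character coordinates work with the output of recognition models that use a transformer decoder, such as Parseq?
Could C++ be supported? Unlike Python, the C++ output has no position info, so there is no way to give each character's position inside the detection box.
I added some code by roughly following the Python version. It barely works and is not very accurate.
How did you implement it in C++? Could you share your approach?
It seems punctuation is not included? Is there any plan to add it?
Is there a way to return a box for everything recognized in the document? The punctuation mentioned above is one such case. @ToddBear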