world4jason/text_detector

Text Detector for OCR

This text detector performs text localization for OCR. It uses the RetinaNet architecture and applies the techniques from TextBoxes++.

Train

SynthText

[raw data & tfrecord](https://drive.google.com/drive/folders/1Nj07w3DEL95R3qaIJl8qv6Z9pRb2H405?usp=sharing)

```
cd text_detector/sample/SynthText
python3 train.py --train_dataset="/path/to/tfrecord/"
```

balloon from Mask_RCNN

[raw data & tfrecord](https://drive.google.com/drive/folders/1lUrDCWLtj2oL78SRIgwgwtIl1iA6CuHT?usp=sharing)

```
cd text_detector/sample/balloon
python3 train.py --train_dataset="/path/to/tfrecord/"
```

TextBoxes++

  • Uses the SSD structure; a vertical offset is added when generating bbox proposals.
  • Same structure as TextBoxes, with additional offsets for the quad box.
  • 4-d anchor box offset (xywh) -> (4+8)-d anchor box offset (xywh + x0y0x1y1x2y2x3y3)
  • Last conv: 3x5 kernel, to give a receptive field suited to the quad box

RetinaNet

  • Simple one-stage object detector with good performance
  • FPN (Feature Pyramid Network) makes features from multiple levels available.
  • Output: 1-d score + 4-d anchor box offset
  • cls loss = focal loss, loc loss = smooth L1 loss
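
The offset and loss bullets above can be made concrete with a small NumPy sketch. It is an illustration, not the repository's code: the function names (`encode_offsets`, `binary_focal_loss`, `smooth_l1`), the corner ordering, the absence of SSD-style variance scaling, and the `alpha`/`gamma` defaults are all assumptions. The quad part follows the usual TextBoxes++ idea of regressing each GT corner relative to the matching anchor corner.

```python
import numpy as np


def encode_offsets(anchor_xywh, gt_xywh, gt_quad):
    """Build a (4+8)-d regression target for one matched anchor.

    anchor_xywh, gt_xywh: (cx, cy, w, h).
    gt_quad: (x0, y0, ..., x3, y3), assumed ordered top-left, top-right,
    bottom-right, bottom-left.
    """
    ax, ay, aw, ah = anchor_xywh
    gx, gy, gw, gh = gt_xywh
    # 4-d rectangle part: center shift scaled by anchor size, log size ratio.
    rect = [(gx - ax) / aw, (gy - ay) / ah, np.log(gw / aw), np.log(gh / ah)]
    # Corners of the anchor, in the same order as the GT quad.
    corners = np.array([
        [ax - aw / 2, ay - ah / 2],
        [ax + aw / 2, ay - ah / 2],
        [ax + aw / 2, ay + ah / 2],
        [ax - aw / 2, ay + ah / 2],
    ])
    # 8-d quad part: each GT corner relative to the matching anchor corner.
    quad = np.asarray(gt_quad, dtype=float).reshape(4, 2) - corners
    quad = quad / np.array([aw, ah])
    return np.concatenate([rect, quad.ravel()])


def binary_focal_loss(logits, labels, alpha=0.25, gamma=2.0):
    """Summed binary focal loss; anchors with a negative label are skipped."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    keep = labels >= 0
    p = 1.0 / (1.0 + np.exp(-logits[keep]))
    y = labels[keep]
    # p_t is the probability the model gives to the true class.
    p_t = np.where(y == 1, p, 1.0 - p)
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    loss = -alpha_t * (1.0 - p_t) ** gamma * np.log(np.clip(p_t, 1e-7, 1.0))
    return loss.sum()


def smooth_l1(pred, target):
    """Summed smooth L1 loss: quadratic below 1, linear above."""
    diff = np.abs(np.asarray(pred, dtype=float) - np.asarray(target, dtype=float))
    return np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5).sum()
```

An anchor that coincides exactly with its GT box and quad yields a 12-d target of zeros, which is a quick sanity check. In practice both losses are usually normalized by the number of positive anchors; that step is left out here.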

Encode

  1. Define anchor boxes for each grid.
  2. Obtain the IoU between the GT box and the anchor box.
  3. Assign each anchor box to the GT box with which it has the largest IoU.
  4. Label each anchor by that IoU: IoU > 0.5: text (label = 1) / 0.4 < IoU < 0.5: ignore (label = -1) / IoU < 0.4: non-text (label = 0).
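
The encoding steps above can be sketched in NumPy as follows. This is a minimal illustration, not the repository's implementation: `iou_matrix` and `assign_labels` are hypothetical names, boxes are assumed to be axis-aligned `(x1, y1, x2, y2)` rectangles with positive area, and treating an IoU of exactly 0.5 as text and exactly 0.4 as ignore is an assumption, since the list does not say how the boundaries are handled.

```python
import numpy as np


def iou_matrix(anchors, gt_boxes):
    """Pairwise IoU between (N, 4) anchors and (M, 4) GT boxes (x1, y1, x2, y2)."""
    anchors = np.asarray(anchors, dtype=float)
    gt_boxes = np.asarray(gt_boxes, dtype=float)
    # Intersection rectangle for every anchor/GT pair via broadcasting.
    top_left = np.maximum(anchors[:, None, :2], gt_boxes[None, :, :2])
    bottom_right = np.minimum(anchors[:, None, 2:], gt_boxes[None, :, 2:])
    inter = np.prod(np.clip(bottom_right - top_left, 0, None), axis=2)
    area_a = np.prod(anchors[:, 2:] - anchors[:, :2], axis=1)
    area_g = np.prod(gt_boxes[:, 2:] - gt_boxes[:, :2], axis=1)
    return inter / (area_a[:, None] + area_g[None, :] - inter)


def assign_labels(anchors, gt_boxes, pos_thresh=0.5, neg_thresh=0.4):
    """Return (labels, matched_gt): 1 = text, 0 = non-text, -1 = ignore."""
    iou = iou_matrix(anchors, gt_boxes)
    matched_gt = iou.argmax(axis=1)   # step 3: GT box with the largest IoU
    best_iou = iou.max(axis=1)
    labels = np.full(len(best_iou), -1, dtype=int)  # ignore by default
    labels[best_iou < neg_thresh] = 0
    labels[best_iou >= pos_thresh] = 1
    return labels, matched_gt


anchors = [[0, 0, 10, 10], [0, 0, 10, 6], [20, 20, 30, 30], [0, 0, 10, 4.5]]
gt_boxes = [[0, 0, 10, 10]]
labels, matched_gt = assign_labels(anchors, gt_boxes)
print(labels)  # [ 1  1  0 -1]
```

The four anchors have IoUs of 1.0, 0.6, 0.0, and 0.45 with the single GT box, so they come out as text, text, non-text, and ignore respectively.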

Todo list:

  • Training
    • Training Code
    • Model Save
    • Step Decay Learning Rate
    • Multiple GPU
  • Make Data
    • Make SynthText tfrecord
    • Make ICDAR13 tfrecord
    • Make ICDAR15 tfrecord
    • Make toy dataset(balloon) from Mask_RCNN
  • Network
    • ResNet50, ResNet101
    • Feature Pyramid Network
    • Task Specific Network
    • Trainable BatchNorm (?
    • Freeze BatchNorm (?
    • GroupNorm
    • (binary) focal loss
    • Slim Backbone pretrained weight
  • Utils
    • Add vertical offset
    • Validation inference image visualization using TensorBoard
    • Add augmentation
    • Add evaluation code (mAP) ==> Unstable
    • QUAD version NMS (numpy version)
    • Combine the two NMS methods as the paper describes
    • Visualization

Environment

  • os : Ubuntu 16.04.4 LTS
  • GPU : NVIDIA GTX 1080 Ti (11GB)
  • Python : 3.6.6
  • Tensorflow : 1.4.0
  • Polygon
