
Fine-tuned model performs worse than base model on Arabic numerals (possible catastrophic forgetting) #1421

Description

Abstract

I've been working on fine-tuning EasyOCR's recognizer model specifically for computer-rendered Arabic numerals (٠١٢٣٤٥٦٧٨٩). After training, I compared the fine-tuned model against the original base model, but unfortunately the results were noticeably worse.

During training, the model quickly reached 100% accuracy on the training set. I initially assumed this was expected due to the simplicity and narrow scope of the task (only Arabic numerals). The model also showed reasonable performance on the validation set.

However, when evaluating the fine-tuned model locally on the test data, I observed significantly higher Character Error Rate (CER) and Word Error Rate (WER) compared to the base model. This suggests the model may have overfitted or suffered from catastrophic forgetting.

I'm wondering if this is a known limitation, or if there are recommended practices for fine-tuning the recognizer on narrowly scoped datasets like numerals without degrading overall performance. Any guidance would be appreciated.
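
For reference, the local evaluation loads the fine-tuned weights as an EasyOCR custom recognition network, roughly along these lines (the network name, directories, and image path below are placeholders, not the exact paths I used):

import easyocr

# Load the fine-tuned recognizer as a custom recognition network.
# 'arabic_numerals' is a placeholder name: it must match the <name>.pth,
# <name>.py and <name>.yaml files placed in the directories below.
reader = easyocr.Reader(
    ['ar'],
    recog_network='arabic_numerals',
    model_storage_directory='./saved_models',
    user_network_directory='./user_network',
    gpu=True,
)

# Recognition on a single test image (placeholder path); detail=0 returns only the text.
print(reader.readtext('test/sample_0001.png', detail=0))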


Process Details

Experiment 1 Configuration

number: '١٢٣٤٥٦٧٨٩٠'
symbol: ""
lang_char: ''
experiment_name: 'arabic_numerals_1'
train_data: 'all_data/train'
valid_data: 'all_data/validation'
manualSeed: 1111
workers: 1
batch_size: 1 #32
num_iter: 10000
valInterval: 100
saved_model: '/kaggle/input/pretrained-weights/arabic.pth' #'saved_models/en_filtered/iter_300000.pth'
FT: True
optim: False # default is Adadelta
lr: 0.1
beta1: 0.9
rho: 0.95
eps: 0.00000001
grad_clip: 5
# Data processing
select_data: 'train' # this is dataset folder in train_data
batch_ratio: '1' 
total_data_usage_ratio: 1.0
batch_max_length: 50 
imgH: 64
imgW: 600
rgb: False
contrast_adjust: False
sensitive: True
PAD: True
contrast_adjust: 0.0
data_filtering_off: False
# Model Architecture
Transformation: 'None'
FeatureExtraction: 'ResNet'
SequenceModeling: 'BiLSTM'
Prediction: 'CTC'
num_fiducial: 20
input_channel: 1
output_channel: 512
hidden_size: 512
decode: 'greedy'
new_prediction: True
freeze_FeatureFxtraction: False
freeze_SequenceModeling: False
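
As I understand the trainer, the character set is built by concatenating number, symbol and lang_char, and new_prediction: True rebuilds the final CTC layer for that alphabet, so the prediction head starts from fresh weights rather than from the head inside arabic.pth. Conceptually (a sketch, not the trainer's actual code):

import torch.nn as nn

number = '١٢٣٤٥٦٧٨٩٠'
symbol = ''
lang_char = ''
character = number + symbol + lang_char  # only the ten Arabic-Indic digits

# With new_prediction: True the CTC head is re-initialized for this alphabet:
# roughly a new linear layer mapping hidden_size to len(character) + 1 (the +1 is the CTC blank).
hidden_size = 512
new_ctc_head = nn.Linear(hidden_size, len(character) + 1)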

Experiment 2 Configuration (only the settings changed from Experiment 1)

batch_size: 4 #32
optim: 'adam' # default is Adadelta
lr: 0.0001
freeze_FeatureFxtraction: True
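
For reference, freeze_FeatureFxtraction: True should amount to disabling gradients on the CNN backbone, so only the BiLSTM and the prediction head are updated by Adam. A minimal sketch, assuming the trainer's Model exposes a FeatureExtraction sub-module as in the deep-text-recognition-benchmark architecture:

def freeze_feature_extraction(model):
    """Stop gradient updates for the ResNet backbone (conceptual sketch)."""
    for param in model.FeatureExtraction.parameters():
        param.requires_grad = False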

Dataset

The dataset is divided as follows (with augmentation applied):

  • Training set: 4,500 images
  • Validation set: 250 images
  • Test set: 250 images

Results

  • Base Model:
      CER (Character Error Rate): 0.7181
      WER (Word Error Rate):      0.8640
  • Fine-Tuned Model:
      CER (Character Error Rate): 0.9933
      WER (Word Error Rate):      1.0040
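
For clarity, the CER and WER above follow the usual edit-distance definitions; roughly the following computation (small details such as normalization aside):

def levenshtein(a, b):
    """Edit distance between two sequences (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb))) # substitution
        prev = curr
    return prev[-1]

def cer(refs, hyps):
    """Total character edits divided by total reference characters."""
    return sum(levenshtein(r, h) for r, h in zip(refs, hyps)) / sum(len(r) for r in refs)

def wer(refs, hyps):
    """Same idea over whitespace-separated tokens; can exceed 1.0 when there are insertions."""
    return (sum(levenshtein(r.split(), h.split()) for r, h in zip(refs, hyps))
            / sum(len(r.split()) for r in refs))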

Dependencies

I installed the requirements.txt provided in the trainer repo, but for torch and CUDA I used the following versions:
pip install torch==2.2.1+cu121 torchvision==0.17.1+cu121 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu121
