
Fine-tuned model performs worse than base model on Arabic numerals (possible catastrophic forgetting) #1421

Description

Abstract

I've been working on fine-tuning EasyOCR's recognizer model specifically for computer-rendered Arabic numerals (٠١٢٣٤٥٦٧٨٩). After training, I compared the fine-tuned model against the original base model, but unfortunately the results were noticeably worse.

During training, the model quickly reached 100% accuracy on the training set. I initially assumed this was expected due to the simplicity and narrow scope of the task (only Arabic numerals). The model also showed reasonable performance on the validation set.

However, when evaluating the fine-tuned model locally on the test data, I observed significantly higher Character Error Rate (CER) and Word Error Rate (WER) compared to the base model. This suggests the model may have overfitted or suffered from catastrophic forgetting.

I'm wondering if this is a known limitation, or if there are recommended practices for fine-tuning the recognizer on narrowly scoped datasets like numerals without degrading overall performance. Any guidance would be appreciated.
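
For reference, the local evaluation loads the fine-tuned weights as an EasyOCR custom recognition network, roughly along these lines (the network name, directories, and image path below are placeholders, not the exact paths I used):

import easyocr

# Load the fine-tuned recognizer as a custom recognition network.
# 'arabic_numerals' is a placeholder name: it must match the <name>.pth,
# <name>.py and <name>.yaml files placed in the directories below.
reader = easyocr.Reader(
    ['ar'],
    recog_network='arabic_numerals',
    model_storage_directory='./saved_models',
    user_network_directory='./user_network',
    gpu=True,
)

# Recognition on a single test image (placeholder path); detail=0 returns only the text.
print(reader.readtext('test/sample_0001.png', detail=0))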


Process Details

Experiment 1 Configuration

number: '١٢٣٤٥٦٧٨٩٠'
symbol: ""
lang_char: ''
experiment_name: 'arabic_numerals_1'
train_data: 'all_data/train'
valid_data: 'all_data/validation'
manualSeed: 1111
workers: 1
batch_size: 1 #32
num_iter: 10000
valInterval: 100
saved_model: '/kaggle/input/pretrained-weights/arabic.pth' #'saved_models/en_filtered/iter_300000.pth'
FT: True
optim: False # default is Adadelta
lr: 0.1
beta1: 0.9
rho: 0.95
eps: 0.00000001
grad_clip: 5
# Data processing
select_data: 'train' # this is dataset folder in train_data
batch_ratio: '1' 
total_data_usage_ratio: 1.0
batch_max_length: 50 
imgH: 64
imgW: 600
rgb: False
contrast_adjust: False
sensitive: True
PAD: True
contrast_adjust: 0.0
data_filtering_off: False
# Model Architecture
Transformation: 'None'
FeatureExtraction: 'ResNet'
SequenceModeling: 'BiLSTM'
Prediction: 'CTC'
num_fiducial: 20
input_channel: 1
output_channel: 512
hidden_size: 512
decode: 'greedy'
new_prediction: True
freeze_FeatureFxtraction: False
freeze_SequenceModeling: False
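
As I understand the trainer, the character set is built by concatenating number, symbol and lang_char, and new_prediction: True rebuilds the final CTC layer for that alphabet, so the prediction head starts from fresh weights rather than from the head inside arabic.pth. Conceptually (a sketch, not the trainer's actual code):

import torch.nn as nn

number = '١٢٣٤٥٦٧٨٩٠'
symbol = ''
lang_char = ''
character = number + symbol + lang_char  # only the ten Arabic-Indic digits

# With new_prediction: True the CTC head is re-initialized for this alphabet:
# roughly a new linear layer mapping hidden_size to len(character) + 1 (the +1 is the CTC blank).
hidden_size = 512
new_ctc_head = nn.Linear(hidden_size, len(character) + 1)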

Experiment 2 Configuration (only the settings changed from Experiment 1)

batch_size: 4 #32
optim: 'adam' # default is Adadelta
lr: 0.0001
freeze_FeatureFxtraction: True
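
For reference, freeze_FeatureFxtraction: True should amount to disabling gradients on the CNN backbone, so only the BiLSTM and the prediction head are updated by Adam. A minimal sketch, assuming the trainer's Model exposes a FeatureExtraction sub-module as in the deep-text-recognition-benchmark architecture:

def freeze_feature_extraction(model):
    """Stop gradient updates for the ResNet backbone (conceptual sketch)."""
    for param in model.FeatureExtraction.parameters():
        param.requires_grad = False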

Dataset

The dataset is divided as follows (with augmentation applied):

  • Training set: 4,500 images
  • Validation set: 250 images
  • Test set: 250 images

Results

  • Base Model:
      CER (Character Error Rate): 0.7181
      WER (Word Error Rate):      0.8640
  • Fine-Tuned Model:
      CER (Character Error Rate): 0.9933
      WER (Word Error Rate):      1.0040
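
For clarity, the CER and WER above follow the usual edit-distance definitions; roughly the following computation (small details such as normalization aside):

def levenshtein(a, b):
    """Edit distance between two sequences (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb))) # substitution
        prev = curr
    return prev[-1]

def cer(refs, hyps):
    """Total character edits divided by total reference characters."""
    return sum(levenshtein(r, h) for r, h in zip(refs, hyps)) / sum(len(r) for r in refs)

def wer(refs, hyps):
    """Same idea over whitespace-separated tokens; can exceed 1.0 when there are insertions."""
    return (sum(levenshtein(r.split(), h.split()) for r, h in zip(refs, hyps))
            / sum(len(r.split()) for r in refs))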

Dependencies

I installed the requirements.txt provided in the trainer repo, but for torch and CUDA I used the following versions:
pip install torch==2.2.1+cu121 torchvision==0.17.1+cu121 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu121
