
How we can use Baal for NER using huggingface #262

Open
ayushkm2799 opened this issue Jun 14, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@ayushkm2799

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

ayushkm2799 added the enhancement (New feature or request) label Jun 14, 2023
@Dref360 (Member) commented Jun 14, 2023

Hello,

I'm not super familiar with NER, but Baal should be able to handle it easily.

Here is a script that loads a NER model and outputs predictions. We use MC-Dropout to get the predictions and BALD to estimate the uncertainty.

Note that to compute the uncertainty, we need to swap the axes so that the class probabilities sit on axis 1: [batch, probabilities, sequence, MC-Dropout iterations].

from datasets import load_dataset
from transformers import pipeline

from baal.active.heuristics import BALD
from baal.bayesian.dropout import patch_module
from baal.transformers_trainer_wrapper import BaalTransformersTrainer

dataset = load_dataset("conll2003")

# Name the instance differently so it does not shadow the `pipeline` factory.
ner_pipeline = pipeline('ner', model='issifuamajeed/distilbert-base-uncased-finetuned-ner')
tokenizer = ner_pipeline.tokenizer

# Apply MC-Dropout: keep dropout layers active at inference time.
model = patch_module(ner_pipeline.model)
trainer = BaalTransformersTrainer(model=model)


def preprocess(example):
    # For simplicity we re-join the pre-split words; for proper word/label
    # alignment you would typically pass the word list directly with
    # `is_split_into_words=True` instead.
    return tokenizer(" ".join(example['tokens']), max_length=50,
                     truncation=True, padding='max_length')


tokenized_dataset = dataset.map(preprocess)

# Shape: [batch_size, num_tokens, probabilities, iterations]
predictions = trainer.predict_on_dataset(tokenized_dataset['test'], iterations=10)

# BALD expects the class probabilities on axis 1:
# [batch_size, probabilities, num_tokens, iterations]
next_to_label = BALD(reduction='sum')(predictions.swapaxes(1, 2))
uncertainties = BALD().get_uncertainties(predictions.swapaxes(1, 2))
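
As a hypothetical illustration (not part of the script above), the ranked indexes can be used to inspect the most uncertain test example:

# `next_to_label` is ranked most-uncertain first, while `uncertainties`
# keeps the original dataset order.
most_uncertain = int(next_to_label[0])
print(tokenized_dataset['test'][most_uncertain]['tokens'])
print(uncertainties[most_uncertain])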

I hope this helps!

@ayushkm2799 (Author)

Hi @Dref360,

So next_to_label contains a list of indexes. Are these in sorted order, i.e., if the list contains [4, 5, 1, 2], does index 4 have the highest uncertainty?

How can we use this in ActiveLearningLoop?

@Dref360 (Member) commented Jun 15, 2023

Hello,

Yes, those indexes are sorted: the first one has the highest uncertainty.

As for ActiveLearningLoop, I had to make some modifications to Baal for it to work on NER. I opened #263 to fix that.

I edited my code above to showcase a complete active learning experiment on NER. If you pull the branch and run this gist, it should work as expected.
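
For reference, the wiring looks roughly like this (a sketch based on the gist, not verbatim: the initial 100 random labels and query_size=100 are assumptions, and the trainer would also need train_dataset=al_dataset):

from baal.active import ActiveLearningDataset
from baal.active.active_loop import ActiveLearningLoop

# Wrap the training split so Baal can track labelled vs. unlabelled samples.
al_dataset = ActiveLearningDataset(tokenized_dataset['train'])
al_dataset.label_randomly(100)  # seed the labelled pool (assumption)

# On each step(), the loop runs MC-Dropout predictions on the unlabelled
# pool and labels the most uncertain samples according to BALD.
loop = ActiveLearningLoop(dataset=al_dataset,
                          get_probabilities=trainer.predict_on_dataset,
                          heuristic=BALD(reduction='sum'),
                          query_size=100,  # assumption
                          iterations=10)   # forwarded to predict_on_dataset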

@ayushkm2799 (Author)

Hi @Dref360,

for _ in range(2):
    print(f"Active learning: labelled={al_dataset.n_labelled} unlabelled={al_dataset.n_unlabelled}")
    trainer.train()
    trainer.load_state_dict(init_weights)
    trainer.lr_scheduler = None
    trainer.evaluate()
    loop.step()

If we add this piece of code, does it mean we train the model on the same 100 labelled samples again and again while adding 100 samples from the unlabelled data each time? Or does it mean that in the first iteration we add 100 unlabelled samples to the training set, so in the next step the model trains on 200 samples and then picks the next 100?

@Dref360 (Member) commented Jun 29, 2023

Oh, I made a mistake in the gist, now updated: we load the initial weights before training, so each step retrains from scratch on the current labelled pool.

for _ in range(2):
    # Reset to the initial weights so each step retrains from scratch.
    trainer.load_state_dict(init_weights)
    print(f"Active learning: labelled={al_dataset.n_labelled} unlabelled={al_dataset.n_unlabelled}")
    trainer.train()
    # Clear the LR scheduler so it is rebuilt on the next train() call.
    trainer.lr_scheduler = None
    trainer.evaluate()
    # Label the most uncertain samples from the unlabelled pool.
    loop.step()

To answer your question: the first step trains on 100 samples and labels 100 more; the next step trains on 200, labels 100 more, and so on.
