The final report for this assignment can be found here.
conll_notebook.ipynb contains the training of all models trained on the CoNLL-2003 dataset.
squad_notebook.ipynb contains the training of all models trained on the SQuAD 1.1 dataset.
conll_data.ipynb contains all the data preprocessing described in the report for the CoNLL-2003 dataset.
squad_data.ipynb contains all the data preprocessing described in the report for the SQuAD 1.1 dataset.
models.py contains the code for the different models used. The weights of the trained models can be found in the models folder.
To use them, you'll have to load the weights of the embedding layer:
import pickle
import torch
# Load idx2token and token2idx files which are dictionaries.
with open('path/to/idx2token.pkl', 'rb') as file:
    idx2token = pickle.load(file)
with open('path/to/token2idx.pkl', 'rb') as file:
    token2idx = pickle.load(file)
# Load embedding weights.
embedding_weights = torch.load('path/to/weights')
The following is a guide to the settings used for the different models:
lstm_baseline.pt
# Non-MoE baseline model.
model = BaselineModel(vocab_size=len(idx2token),
                      embedding_dim=50,
                      hidden_dim=128,
                      intermediate_expert_dim=128,
                      output_dim=9,  # This argument is only there for the CoNLL models.
                      model_state_dict=embedding_weights,
                      )
moe_baseline.pt
model = MoEModel(vocab_size=len(idx2token),
                 embedding_dim=50,
                 hidden_dim=128,
                 intermediate_expert_dim=128,
                 output_dim=9,  # This argument is only there for the CoNLL models.
                 model_state_dict=embedding_weights,
                 router='top_k',
                 num_experts=8,
                 top_k=2,
                 )
moe_top_k_mask.pt and moe_importance_mask.pt
model = MoEModel(vocab_size=len(idx2token),
                 embedding_dim=50,
                 hidden_dim=128,
                 intermediate_expert_dim=128,
                 output_dim=9,  # This argument is only there for the CoNLL models.
                 model_state_dict=embedding_weights,
                 router='top_k_mask',
                 num_experts=8,
                 top_k=2,
                 )
moe_noisy.pt and moe_noisy_load.pt
model = MoEModel(vocab_size=len(idx2token),
                 embedding_dim=50,
                 hidden_dim=128,
                 intermediate_expert_dim=128,
                 output_dim=9,  # This argument is only there for the CoNLL models.
                 model_state_dict=embedding_weights,
                 router='noisy_top_k',
                 num_experts=8,
                 top_k=2,
                 )
Create a model variable using the guide above and run the following script to load the trained weights:
import torch
weights = torch.load('path/to/weights')['state_dict']
model.load_state_dict(weights)
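For example, for the CoNLL lstm_baseline.pt checkpoint the complete loading procedure looks roughly like this (a minimal sketch: all paths are placeholders, and it assumes BaselineModel can be imported from models.py):

import pickle
import torch
from models import BaselineModel

# Placeholder paths: point these at the actual files in the repository.
with open('path/to/idx2token.pkl', 'rb') as file:
    idx2token = pickle.load(file)
embedding_weights = torch.load('path/to/conll.glove.6B.50d.pt')

# Settings taken from the lstm_baseline.pt entry in the guide above.
model = BaselineModel(vocab_size=len(idx2token),
                      embedding_dim=50,
                      hidden_dim=128,
                      intermediate_expert_dim=128,
                      output_dim=9,  # CoNLL models only.
                      model_state_dict=embedding_weights)

# Load the trained weights and switch to evaluation mode.
model.load_state_dict(torch.load('path/to/lstm_baseline.pt')['state_dict'])
model.eval()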
idx2token.pkl and token2idx.pkl are the index-to-token and token-to-index mappings, respectively.
conll.glove.6B.50d.pt contains the embedding weights used for the CoNLL models.
idx2token.pkl and token2idx.pkl are the index-to-token and token-to-index mappings, respectively.
squad.glove.6B.50d.pt contains the embedding weights used for the SQuAD models.
train_data.csv contains the training data in CSV format.
dev_data.csv contains the validation data in CSV format.
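If you only want to inspect the CSV splits, they can be read directly (a minimal sketch: the paths are placeholders, and the exact column layout is whatever the data-preprocessing notebooks produce):

import pandas as pd

# Placeholder paths; the column layout comes from conll_data.ipynb / squad_data.ipynb.
train_df = pd.read_csv('path/to/train_data.csv')
dev_df = pd.read_csv('path/to/dev_data.csv')
print(train_df.head())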
You can load the embeddings in the following way:
import pickle
import torch
from torch import nn

# Load idx2token and token2idx files, which are dictionaries.
with open('path/to/idx2token.pkl', 'rb') as file:
    idx2token = pickle.load(file)
with open('path/to/token2idx.pkl', 'rb') as file:
    token2idx = pickle.load(file)

# Load embedding weights.
weights = torch.load('path/to/weights')
vocab_size = len(idx2token)
embedding = nn.Embedding(vocab_size,
                         embedding_dim=50,
                         padding_idx=0)
embedding.load_state_dict(weights)
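As a quick usage check, continuing from the snippet above (mapping unknown tokens to the padding index 0 is an assumption here, not necessarily what the tokenizer does):

# Map a few tokens to indices and look up their 50-dimensional GloVe vectors.
tokens = ['the', 'cat', 'sat']
indices = torch.tensor([[token2idx.get(tok, 0) for tok in tokens]])  # shape (1, 3)
vectors = embedding(indices)  # shape (1, 3, 50)
print(vectors.shape)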
moe.py contains the mixture-of-experts implementation with the different routers. It also has the code for the load loss.
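As a rough illustration of what a top-k router does (a hypothetical sketch with made-up names, not the actual classes or API in moe.py):

from torch import nn
import torch.nn.functional as F

class TopKRouterSketch(nn.Module):
    """Hypothetical top-k router: scores every expert per input and keeps the k best."""
    def __init__(self, hidden_dim, num_experts=8, top_k=2):
        super().__init__()
        self.gate = nn.Linear(hidden_dim, num_experts)
        self.top_k = top_k

    def forward(self, x):
        # x: (batch, hidden_dim) -> one probability per expert.
        probs = F.softmax(self.gate(x), dim=-1)
        # Keep the k largest probabilities and renormalise them to sum to 1.
        top_vals, top_idx = probs.topk(self.top_k, dim=-1)
        weights = top_vals / top_vals.sum(dim=-1, keepdim=True)
        # weights: mixing coefficients for the selected experts; top_idx: which experts to run.
        return weights, top_idx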
tokenizer.py contains the code for the custom tokenizer class I wrote for both datasets.
dataset.py contains the PyTorch dataset classes for both the SQuAD and CoNLL datasets.