Mixture of Experts

Mixture of Experts Layer

Report

The final report for this assignment can be found here.

Notebooks

conll_notebook.ipynb contains the training runs for all models trained on the CoNLL-2003 dataset.
squad_notebook.ipynb contains the training runs for all models trained on the SQuAD 1.1 dataset.
conll_data.ipynb contains the data preprocessing described in the report for the CoNLL-2003 dataset.
squad_data.ipynb contains the data preprocessing described in the report for the SQuAD 1.1 dataset.

Models

models.py contains the code for the different models used. The weights of the models can be found in the models folder. To use them, first load the vocabulary mappings and the weights of the embedding layer:

import pickle
import torch

# Load idx2token and token2idx files which are dictionaries.
with open('path/to/idx2token.pkl', 'rb') as file:
    idx2token = pickle.load(file)
with open('path/to/token2idx.pkl', 'rb') as file:
    token2idx = pickle.load(file)

# Load embedding weights.
embedding_weights = torch.load('path/to/weights')

Model Settings

The following settings were used for each saved model:

lstm_baseline.pt

# Non-MoE model
BaselineModel(vocab_size=len(idx2token),
              embedding_dim=50,
              hidden_dim=128,
              intermediate_expert_dim=128,
              output_dim=9,  # This argument is only there for the CoNLL models.
              model_state_dict=embedding_weights,
              )

moe_baseline.pt

MoEModel(vocab_size=len(idx2token),
         embedding_dim=50,
         hidden_dim=128,
         intermediate_expert_dim=128,
         output_dim=9,  # This argument is only there for the CoNLL models.
         model_state_dict=embedding_weights,
         router='top_k',
         num_experts=8,
         top_k=2,
         )

moe_top_k_mask.pt and moe_importance_mask.pt

MoEModel(vocab_size=len(idx2token),
         embedding_dim=50,
         hidden_dim=128,
         intermediate_expert_dim=128,
         output_dim=9,  # This argument is only there for the CoNLL models.
         model_state_dict=embedding_weights,
         router='top_k_mask',
         num_experts=8,
         top_k=2,
         )

moe_noisy.pt and moe_noisy_load.pt

MoEModel(vocab_size=len(idx2token),
         embedding_dim=50,
         hidden_dim=128,
         intermediate_expert_dim=128,
         output_dim=9,  # This argument is only there for the CoNLL models.
         model_state_dict=embedding_weights,
         router='noisy_top_k',
         num_experts=8,
         top_k=2,
         )

Load Weights

Create a model variable using the settings guide above, then run the following script:

import torch

weights = torch.load('path/to/weights')['state_dict']
model.load_state_dict(weights)
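
When the checkpoint was saved on a GPU machine and is loaded on a CPU-only one, passing `map_location` to `torch.load` avoids a device error. The following is a minimal sketch under that assumption; `load_checkpoint` is a hypothetical helper that is not part of this repository, and the `nn.Linear` modules only stand in for the classes in models.py.

```python
import os
import tempfile

import torch
import torch.nn as nn


def load_checkpoint(model: nn.Module, path: str) -> nn.Module:
    """Hypothetical helper: load a {'state_dict': ...} checkpoint onto the CPU."""
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["state_dict"])
    model.eval()  # switch to inference mode (disables dropout and similar layers)
    return model


# Toy round trip with stand-in modules (not the models from models.py).
source = nn.Linear(4, 2)
target = nn.Linear(4, 2)
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy.pt")
    # Save in the same layout the script above expects: a dict with a 'state_dict' key.
    torch.save({"state_dict": source.state_dict()}, path)
    load_checkpoint(target, path)
```

With a real checkpoint, the call would presumably look like `load_checkpoint(model, 'path/to/weights')`, where `model` was built from the settings guide.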

Assets

CoNLL-2003:

idx2token.pkl and token2idx.pkl are the index-to-token and token-to-index mappings, respectively.
conll.glove.6B.50d.pt contains the embedding weights used for the CoNLL models.

SQuAD 1.1:

idx2token.pkl and token2idx.pkl are the index-to-token and token-to-index mappings, respectively.
squad.glove.6B.50d.pt contains the embedding weights used for the SQuAD models.
train_data.csv contains the training data in CSV format.
dev_data.csv contains the validation data in CSV format.

Load Embeddings

You can load the embeddings as follows:

import pickle
import torch
import torch.nn as nn

# Load idx2token and token2idx files which are dictionaries.
with open('path/to/idx2token.pkl', 'rb') as file:
    idx2token = pickle.load(file)
with open('path/to/token2idx.pkl', 'rb') as file:
    token2idx = pickle.load(file)
# Load embedding weights.
weights = torch.load('path/to/weights')

vocab_size = len(idx2token)

embedding = nn.Embedding(vocab_size,
                        embedding_dim=50,
                        padding_idx=0)
embedding.load_state_dict(weights)
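
Because the weights go through `load_state_dict`, the saved file presumably holds a state dict with a single `weight` entry of shape `(vocab_size, 50)`; this is an inference from the script above, not something the files were checked for. The toy sketch below uses a made-up six-token vocabulary to show the lookup on a padded batch and what happens when `vocab_size` disagrees with the saved matrix.

```python
import torch
import torch.nn as nn

# Toy stand-in for the saved embedding file: a state dict with one "weight" entry.
vocab_size, embedding_dim = 6, 50
toy_weights = {"weight": torch.randn(vocab_size, embedding_dim)}
# load_state_dict overwrites every row, so the padding row is only zero
# if the saved matrix stores zeros there.
toy_weights["weight"][0] = 0.0

embedding = nn.Embedding(vocab_size, embedding_dim=embedding_dim, padding_idx=0)
embedding.load_state_dict(toy_weights)

# Look up a padded batch of token indices: (batch, seq_len) -> (batch, seq_len, 50).
batch = torch.tensor([[3, 1, 4, 0], [2, 5, 0, 0]])
vectors = embedding(batch)

# A vocabulary size that disagrees with the saved matrix raises a RuntimeError.
try:
    nn.Embedding(vocab_size + 1, embedding_dim, padding_idx=0).load_state_dict(toy_weights)
    size_mismatch_detected = False
except RuntimeError:
    size_mismatch_detected = True
```

The practical takeaway is that `len(idx2token)` must match the number of rows in the saved matrix, so the mappings and the weights file should come from the same dataset folder.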

Mixture of Experts

moe.py contains the mixture-of-experts implementation with the different routers. It also has the code for the load loss.
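
For readers unfamiliar with top-k routing, here is a minimal sketch of the general idea, using the `num_experts=8` and `top_k=2` values from the settings guide. It is not the code in moe.py: `TopKRouterSketch` and `cv_squared` are illustrative names, and the balance penalty shown is the squared coefficient of variation of per-expert gate mass, which may differ from the load loss implemented in this repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKRouterSketch(nn.Module):
    """Illustrative top-k router; not the class defined in moe.py."""

    def __init__(self, hidden_dim: int, num_experts: int, top_k: int):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(hidden_dim, num_experts, bias=False)

    def forward(self, x: torch.Tensor):
        logits = self.gate(x)  # (num_tokens, num_experts)
        top_vals, top_idx = logits.topk(self.top_k, dim=-1)
        # Normalise over the selected experts only; all other experts get weight 0.
        top_weights = F.softmax(top_vals, dim=-1)
        weights = torch.zeros_like(logits).scatter(-1, top_idx, top_weights)
        return weights, top_idx


def cv_squared(values: torch.Tensor, eps: float = 1e-10) -> torch.Tensor:
    """Squared coefficient of variation; 0 when all experts receive equal mass."""
    values = values.float()
    return values.var() / (values.mean() ** 2 + eps)


torch.manual_seed(0)
router = TopKRouterSketch(hidden_dim=128, num_experts=8, top_k=2)
tokens = torch.randn(4, 128)           # 4 token representations
weights, chosen = router(tokens)       # dense gate weights and chosen expert ids
importance = weights.sum(dim=0)        # total gate mass per expert
balance_penalty = cv_squared(importance)
```

Each row of `weights` has exactly two non-zero entries that sum to 1, so an MoE layer can combine the outputs of the two chosen experts as a weighted sum. Adding a penalty such as `balance_penalty` to the training loss discourages the router from sending most tokens to the same few experts.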

Tokenizer

tokenizer.py contains the code for the custom tokenizer class I wrote for both datasets.
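
As a rough illustration of how the `token2idx` and `idx2token` mappings are typically used, the sketch below encodes a token list to padded indices and decodes it again. It does not reproduce the class in tokenizer.py: the `encode` and `decode` functions, the toy vocabulary, and the `<pad>` and `<unk>` entries are assumptions made for this example. Only the use of index 0 for padding follows from `padding_idx=0` in the embedding code.

```python
from typing import Dict, List

# Toy mappings; the real ones are loaded from token2idx.pkl and idx2token.pkl.
# The "<pad>" and "<unk>" entries are assumptions for this sketch.
token2idx: Dict[str, int] = {"<pad>": 0, "<unk>": 1, "the": 2, "cat": 3, "sat": 4}
idx2token: Dict[int, str] = {idx: tok for tok, idx in token2idx.items()}


def encode(tokens: List[str], max_len: int) -> List[int]:
    """Map tokens to indices, truncate to max_len, and right-pad with index 0."""
    ids = [token2idx.get(tok, token2idx["<unk>"]) for tok in tokens[:max_len]]
    return ids + [token2idx["<pad>"]] * (max_len - len(ids))


def decode(ids: List[int]) -> List[str]:
    """Map indices back to tokens, dropping padding positions."""
    return [idx2token[i] for i in ids if i != token2idx["<pad>"]]


ids = encode(["the", "dog", "sat"], max_len=5)
tokens = decode(ids)
```

Here the out-of-vocabulary token "dog" falls back to the unknown index, and the two trailing positions are filled with the padding index.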

Datasets

dataset.py contains the PyTorch dataset classes for both the SQuAD and CoNLL datasets.
