Mixture of Experts

Mixture of Experts Layer

Report

The final report for this assignment can be found here.

Notebooks

Models

models.py contains the code for the different models used. The model weights can be found in the models folder. To use them, first load the vocabulary files and the weights of the embedding layer:

import pickle
import torch

# Load idx2token and token2idx files which are dictionaries.
with open('path/to/idx2token.pkl', 'rb') as file:
    idx2token = pickle.load(file)
with open('path/to/token2idx.pkl', 'rb') as file:
    token2idx = pickle.load(file)

# Load embedding weights.
embedding_weights = torch.load('path/to/weights')

Model Settings

The settings used for each saved model are listed below:

  • lstm_baseline.pt
# Non-moe model
BaselineModel(vocab_size = len(idx2token),
              embedding_dim = 50,
              hidden_dim = 128,
              intermediate_expert_dim = 128,
              output_dim = 9,  # This argument is only there for the CoNLL models.
              model_state_dict = embedding_weights,
             )
  • moe_baseline.pt
MoEModel(vocab_size = len(idx2token),
         embedding_dim = 50,
         hidden_dim = 128,
         intermediate_expert_dim = 128,
         output_dim = 9, # This argument is only there for the CoNLL models.
         model_state_dict = embedding_weights,
         router = 'top_k',
         num_experts = 8,
         top_k = 2,
        )
  • moe_top_k_mask.pt and moe_importance_mask.pt
MoEModel(vocab_size = len(idx2token),
         embedding_dim = 50,
         hidden_dim = 128,
         intermediate_expert_dim = 128,
         output_dim = 9, # This argument is only there for the CoNLL models.
         model_state_dict = embedding_weights,
         router = 'top_k_mask',
         num_experts = 8,
         top_k = 2,
        )
  • moe_noisy.pt and moe_noisy_load.pt
MoEModel(vocab_size = len(idx2token),
         embedding_dim = 50,
         hidden_dim = 128,
         intermediate_expert_dim = 128,
         output_dim = 9, # This argument is only there for the CoNLL models.
         model_state_dict = embedding_weights,
         router = 'noisy_top_k',
         num_experts = 8,
         top_k = 2,
        )

Load Weights

Create the model using the guide above, then run the following script:

import torch

weights = torch.load('path/to/weights')['state_dict']
model.load_state_dict(weights)
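The script above expects the checkpoint to be a dictionary with a 'state_dict' key. The following is a minimal round-trip sketch of that layout, not code from this repository: nn.Linear stands in for BaselineModel or MoEModel, and the temporary file path is made up for illustration. Passing map_location="cpu" is an optional extra that helps when a checkpoint saved on a GPU is loaded on a CPU-only machine.

```python
import os
import tempfile

import torch
from torch import nn

# Stand-in for BaselineModel / MoEModel (assumption for illustration only).
model = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkpoint.pt")  # hypothetical path
    # Save in the same layout the README script reads: {'state_dict': ...}.
    torch.save({"state_dict": model.state_dict()}, path)

    # map_location="cpu" remaps tensors saved on another device to the CPU.
    checkpoint = torch.load(path, map_location="cpu")

restored = nn.Linear(4, 2)
# load_state_dict reports any missing or unexpected keys in its return value.
result = restored.load_state_dict(checkpoint["state_dict"])
restored.eval()  # switch to inference mode before evaluating
```

If result.missing_keys or result.unexpected_keys is non-empty, the constructor arguments probably do not match the settings the checkpoint was trained with.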

Assets

CoNLL-2003:

SQuAD 1.1:

Load Embeddings

You can load the embeddings in the following way.

import pickle
import torch
from torch import nn

# Load idx2token and token2idx files which are dictionaries.
with open('path/to/idx2token.pkl', 'rb') as file:
    idx2token = pickle.load(file)
with open('path/to/token2idx.pkl', 'rb') as file:
    token2idx = pickle.load(file)
# Load embedding weights.
weights = torch.load('path/to/weights')

vocab_size = len(idx2token)

embedding = nn.Embedding(vocab_size,
                        embedding_dim=50,
                        padding_idx=0)
embedding.load_state_dict(weights)
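Once the embedding layer is loaded, tokens are looked up through token2idx. The sketch below shows the idea with a tiny made-up vocabulary and randomly initialised weights in place of the real files. The `<pad>` and `<unk>` entries and the `encode` helper are assumptions for illustration, not names taken from this repository.

```python
import torch
from torch import nn

# Hypothetical toy vocabulary standing in for the pickled token2idx file.
token2idx = {"<pad>": 0, "<unk>": 1, "the": 2, "cat": 3}


def encode(tokens, token2idx, unk_token="<unk>"):
    """Map tokens to ids, falling back to the unknown-token id."""
    unk_id = token2idx[unk_token]
    return torch.tensor([token2idx.get(t, unk_id) for t in tokens], dtype=torch.long)


# Same layer shape as above, but with random weights instead of loaded ones.
embedding = nn.Embedding(len(token2idx), embedding_dim=50, padding_idx=0)

ids = encode(["the", "cat", "sat"], token2idx)  # "sat" is out of vocabulary
vectors = embedding(ids)                        # shape: (3, 50)
```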

Mixture of Experts

moe.py contains the mixture-of-experts implementation along with the different routers. It also contains the code for the load loss.
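To make the routing idea concrete, here is a minimal sketch of top-k gating with the num_experts = 8, top_k = 2 settings listed earlier. It is not the code in moe.py: the `TopKRouter` class, its linear gate, and the choice to apply softmax over only the selected logits are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F
from torch import nn


class TopKRouter(nn.Module):
    """Hypothetical top-k gate: keep the k largest logits per token."""

    def __init__(self, hidden_dim, num_experts, top_k):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(hidden_dim, num_experts, bias=False)

    def forward(self, x):
        logits = self.gate(x)                                # (tokens, num_experts)
        top_vals, top_idx = logits.topk(self.top_k, dim=-1)  # (tokens, top_k)
        # Softmax over the selected logits only; all other experts get weight 0.
        weights = torch.zeros_like(logits).scatter(
            -1, top_idx, F.softmax(top_vals, dim=-1)
        )
        return weights, top_idx


torch.manual_seed(0)
router = TopKRouter(hidden_dim=128, num_experts=8, top_k=2)
weights, top_idx = router(torch.randn(4, 128))  # 4 tokens, 8 experts
```

Each row of weights has exactly two non-zero entries that sum to 1, so each token's output is a weighted sum of just two experts' outputs.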

Tokenizer

tokenizer.py contains the custom tokenizer class I wrote for both datasets.
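For readers unfamiliar with the token2idx / idx2token pair used throughout this README, the sketch below shows one way such a vocabulary can be built and used. It is not the class in tokenizer.py: `SimpleTokenizer`, the lowercased whitespace splitting, and the `<pad>` / `<unk>` tokens are all assumptions for illustration.

```python
class SimpleTokenizer:
    """Hypothetical whitespace tokenizer with token2idx / idx2token maps."""

    def __init__(self, texts, pad_token="<pad>", unk_token="<unk>"):
        self.pad_token, self.unk_token = pad_token, unk_token
        vocab = [pad_token, unk_token]  # pad gets id 0, unk gets id 1
        seen = set(vocab)
        for text in texts:
            for token in text.lower().split():
                if token not in seen:
                    seen.add(token)
                    vocab.append(token)
        self.idx2token = dict(enumerate(vocab))
        self.token2idx = {token: idx for idx, token in self.idx2token.items()}

    def encode(self, text, max_len=None):
        """Convert text to ids; optionally truncate or pad to max_len."""
        unk = self.token2idx[self.unk_token]
        ids = [self.token2idx.get(t, unk) for t in text.lower().split()]
        if max_len is not None:
            pad = self.token2idx[self.pad_token]
            ids = ids[:max_len] + [pad] * (max_len - len(ids))
        return ids

    def decode(self, ids):
        """Convert ids back to text, dropping padding."""
        pad = self.token2idx[self.pad_token]
        return " ".join(self.idx2token[i] for i in ids if i != pad)


tokenizer = SimpleTokenizer(["the cat sat", "the dog ran"])
ids = tokenizer.encode("the bird sat", max_len=5)
```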

Datasets

dataset.py contains the PyTorch dataset classes for both the SQuAD and CoNLL datasets.
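As a rough picture of what a PyTorch dataset class for a tagging task like CoNLL involves, here is a minimal sketch. It is not the code in dataset.py: `TaggingDataset`, the fixed-length padding, and the use of id 0 as padding for both tokens and tags are assumptions for illustration.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class TaggingDataset(Dataset):
    """Hypothetical CoNLL-style dataset of token ids and tag ids."""

    def __init__(self, sentences, tags, max_len, pad_id=0):
        self.sentences, self.tags = sentences, tags
        self.max_len, self.pad_id = max_len, pad_id

    def __len__(self):
        return len(self.sentences)

    def _pad(self, ids):
        # Truncate to max_len, then right-pad so every item has equal length.
        ids = ids[: self.max_len]
        padded = ids + [self.pad_id] * (self.max_len - len(ids))
        return torch.tensor(padded, dtype=torch.long)

    def __getitem__(self, idx):
        return self._pad(self.sentences[idx]), self._pad(self.tags[idx])


dataset = TaggingDataset(
    sentences=[[2, 3, 4], [2, 5]],  # made-up token ids
    tags=[[1, 2, 1], [1, 3]],       # made-up tag ids
    max_len=4,
)
loader = DataLoader(dataset, batch_size=2)
tokens, labels = next(iter(loader))  # each of shape (2, 4)
```

Padding inside the dataset keeps the default collate function usable; padding per batch in a custom collate_fn is a common alternative when sentence lengths vary widely.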