Softmax-CPR

CPR refers to "C"ontext Partition, "P"ointer Network, and "R"eranker Partition.

[Figure: Softmax-CEPR]

Introduction

The softmax bottleneck (Chang and McCallum, 2022) sometimes prevents language models from predicting the desired distribution, and pointer networks can break this bottleneck efficiently. Based on this finding, we propose the context/encoder partition, which simplifies the pointer network, and the reranker partition, which accelerates word-by-word rerankers. Combining these softmax alternatives, softmax-CPR is significantly better and more efficient in GPT-2 than the mixture of softmax (MoS), a state-of-the-art softmax alternative. In summarization experiments, without significantly slowing training or testing, softmax-CEPR applied to T5-Small improves the FactCC score by 2 points on the CNN/DM and XSUM datasets and improves MAUVE scores by around 30% on the paragraph-level BookSum dataset. Please see our paper for more details: https://arxiv.org/abs/2305.12289.
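To make the idea concrete, below is a minimal, illustrative PyTorch sketch of the context and reranker partitions, not the repository's actual implementation. The tensor sizes, the projection layers proj_c and proj_r, and the example word ids are all hypothetical placeholders:

import torch

V, d, k = 1000, 64, 20                   # vocab size, hidden size, reranker top-k (hypothetical)
E = torch.randn(V, d)                    # output word embeddings
h = torch.randn(d)                       # decoder hidden state at one position
proj_c = torch.nn.Linear(d, d)           # hypothetical context-partition projection
proj_r = torch.nn.Linear(d, d)           # hypothetical reranker-partition projection
context_ids = torch.tensor([3, 17, 42])  # ids of words that appeared in the context

logits = E @ h                           # standard softmax logits over the full vocabulary

# Context partition: recompute the logits of context words with a
# partition-specific hidden state, so the model can promote or demote copying.
logits[context_ids] = E[context_ids] @ proj_c(h)

# Reranker partition: recompute the logits of the current top-k candidates,
# acting as an efficient word-by-word reranker.
topk_ids = logits.topk(k).indices
logits[topk_ids] = E[topk_ids] @ proj_r(h)

probs = torch.softmax(logits, dim=-1)    # final next-word distribution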

How to Run

For GPT2-related LM experiments

  1. Put your text data into ./data (see a small example in ./data/openwebtext_2017_18_small).
  2. Run the Python script src/LM/preprocessing/prepare_gpt2_id_corpus_from_raw.py (changing the file paths if necessary) to preprocess your text data.
  3. Run the script ./bin/LM/main.sh (changing the Python path, data paths, or other configurations if necessary) to train the model.
  4. Compare the validation differences in the log files (see the example commands after this list).
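For example, assuming you keep the repository's default paths (adjust them to your setup):

python src/LM/preprocessing/prepare_gpt2_id_corpus_from_raw.py
bash ./bin/LM/main.sh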

For summarization experiments

  1. Run the script ./bin/summarization/main.sh (changing the Python path, data paths, or other configurations if necessary) to train the model, as shown below.
  2. Compare the validation differences in the log files.
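For example (the log path below is hypothetical; pick any location you like):

bash ./bin/summarization/main.sh | tee ./logs/summarization.log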

If using conda, you can get this to work as follows:

conda create -n rerankLM python=3.8
conda activate rerankLM
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch (change 10.2 to match your CUDA version; see the official website https://pytorch.org/)
conda install -c conda-forge matplotlib
conda install -c conda-forge spacy
python -m spacy download en_core_web_sm
conda install pandas
conda install nltk

git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .

pip install datasets
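To verify the installation, a quick sanity check like the following can help (the printed versions will differ depending on what you installed):

python -c "import torch; import transformers; import datasets; print(torch.__version__, transformers.__version__, datasets.__version__)"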

Citation

@inproceedings{chang2023revisiting,
  title={Revisiting the Architectures like Pointer Networks to Efficiently Improve the Next Word Distribution, Summarization Factuality, and Beyond},
  abbr={ACL Findings},
  author={Haw-Shiuan Chang* and Zonghai Yao* and Alolika Gon and Hong Yu and Andrew McCallum},
  booktitle={Findings of the Association for Computational Linguistics: ACL 2023 (Findings of ACL)},
  year={2023},
}
