# OpenLTG-MLM

OpenLTG-MLM is a code repository for open-ended text generation with bidirectional pre-trained language models, such as the BERT family (e.g., BERT and RoBERTa), through a non-autoregressive generation paradigm.

The main advantages of our work (accepted at ACL 2023!) are enhancing the diversity of generated text and improving the generation speed for long text. We aim to bring renewed attention to bidirectional attention models, as they still hold potential for text generation!
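
For readers new to the non-autoregressive, masked-language-model view of generation, the snippet below is a minimal conceptual sketch of iterative mask-predict decoding with an off-the-shelf Hugging Face RoBERTa checkpoint. It is not the repository's Fairseq implementation; the model name, target length, and iteration count are placeholder choices.

```python
# Conceptual sketch of iterative mask-predict decoding with a bidirectional MLM.
# NOT the repository's Fairseq implementation; model name and hyperparameters
# below are placeholders used only to illustrate the non-autoregressive paradigm.
import torch
from transformers import RobertaTokenizer, RobertaForMaskedLM

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base").eval()

prompt = "The old lighthouse keeper saw something strange in the waves."
target_len = 32        # number of tokens to generate (placeholder value)
num_iterations = 10    # refinement steps (placeholder value)

# Start from a fully masked target appended to the prompt.
prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids[0]
mask_id = tokenizer.mask_token_id
ids = torch.cat([prompt_ids,
                 torch.full((target_len,), mask_id, dtype=torch.long)]).unsqueeze(0)

with torch.no_grad():
    for step in range(num_iterations):
        logits = model(ids).logits[0, -target_len:]   # scores for the target slots
        probs = logits.softmax(-1)
        conf, pred = probs.max(-1)
        # Keep the most confident predictions, re-mask the rest (mask-predict style).
        num_keep = int(target_len * (step + 1) / num_iterations)
        keep = conf.topk(num_keep).indices
        new_target = torch.full((target_len,), mask_id, dtype=torch.long)
        new_target[keep] = pred[keep]
        ids[0, -target_len:] = new_target

print(tokenizer.decode(ids[0, -target_len:], skip_special_tokens=True))
```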

## 🧩 Installation

The codebase relies on Fairseq and PyTorch. As of March 12, 2024, Fairseq version 0.12.2 has been verified to be compatible.

```bash
pip install fairseq
```
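
A quick sanity check of the environment (a minimal sketch, assuming a standard PyTorch install alongside Fairseq):

```python
# Minimal environment check: confirm Fairseq and PyTorch import and print their
# versions (0.12.2 is the Fairseq version verified above).
import fairseq
import torch

print("fairseq:", fairseq.__version__)
print("torch:", torch.__version__)
```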

## 🧱 Reproducibility

We conducted experiments on the open-domain WritingPrompts task and report results on datasets of different sizes.

| Dataset | Test Set Size |
| --- | --- |
| WritingPrompts (Slim) | 26k |
| WritingPrompts (download, .tar.bz2) | 272k |
| WritingPromptsX | 587k |

## 🎮 Inference

We provide two generation modes, Direct Generation and Recursive Span Generation (a conceptual sketch of the difference follows the note below):

- DirectGen generates the target text directly in its entirety.
- RecSpanGen generates the target text recursively by specifying the number of spans.

> Recursive span generation helps our model remain competitive in scenarios involving longer text generation.
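
The control-flow difference between the two modes can be sketched as follows. This is pure-Python pseudocode; `fill_masks` is a hypothetical placeholder for one round of masked-LM prediction, and the actual entry points are the scripts shown below.

```python
# Conceptual contrast between the two generation modes. `fill_masks` stands in
# for one round of masked-LM prediction and is a hypothetical placeholder here.

def direct_gen(prompt, target_len, fill_masks):
    """DirectGen: predict the entire target sequence in one shot."""
    canvas = ["<mask>"] * target_len
    return fill_masks(prompt, canvas)

def rec_span_gen(prompt, target_len, num_spans, fill_masks):
    """RecSpanGen: grow the target one span at a time, conditioning each
    new span on the prompt plus everything generated so far."""
    span_len = target_len // num_spans
    generated = []
    for _ in range(num_spans):
        canvas = ["<mask>"] * span_len
        span = fill_masks(prompt + " " + " ".join(generated), canvas)
        generated.extend(span)
    return generated
```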

Sampling parameters (illustrative sketches of these components appear after the run command below):

- `-DSWAttn` (Dynamic Sliding Window Attention) helps the attention mechanism focus on crucial information within a broader local context, preventing interference from distant noise.
- `-NSamping` (Nucleus Sampling) helps mitigate text degeneration in language models on open-domain tasks.
- `-LTD` (Linear Temperature Decay) ensures the model maintains high-quality outputs throughout the iterative refinement process.
```bash
bash openltg_mlm/scripts/tasks/xsum/run_inf.sh
```
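
The sampling components above are standard enough to sketch in isolation. The snippet below is an illustrative PyTorch rendition; the window size, `p`, and temperature schedule are placeholder values, and the repository's actual flags and implementations may differ.

```python
# Illustrative PyTorch sketches of the sampling components above; parameter
# values are placeholders and the repository's implementations may differ.
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean attention mask letting each position attend only to neighbors
    within +/- `window` positions (a local attention window)."""
    pos = torch.arange(seq_len)
    return (pos[None, :] - pos[:, None]).abs() <= window   # (seq_len, seq_len)

def nucleus_sample(logits: torch.Tensor, p: float = 0.9, temperature: float = 1.0) -> torch.Tensor:
    """Nucleus (top-p) sampling from a single vocabulary distribution."""
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_idx = probs.sort(descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)
    # Drop tokens whose preceding cumulative mass already exceeds p,
    # i.e. keep the smallest set of tokens covering probability p.
    remove = (cumulative - sorted_probs) > p
    keep_idx = sorted_idx[~remove]
    keep_probs = sorted_probs[~remove]
    keep_probs = keep_probs / keep_probs.sum()
    return keep_idx[torch.multinomial(keep_probs, 1)]

def linear_temperature(step: int, total_steps: int, t_start: float = 1.0, t_end: float = 0.1) -> float:
    """Linearly decay the sampling temperature over the refinement iterations,
    so early iterations explore and later iterations commit to confident tokens."""
    frac = step / max(total_steps - 1, 1)
    return t_start + (t_end - t_start) * frac
```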

## 🚀 Training

The maximum encoding length of the RoBERTa model can be extended with `--hierarchical-pos` to support scenarios longer than 1k tokens.
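
One common way to realize such a hierarchical position encoding is to factor a long position into a (segment, offset) pair and sum two smaller embedding tables. The sketch below only illustrates that idea; it is not the repository's exact implementation of `--hierarchical-pos`, and the base length of 512 and the dimensions are placeholders.

```python
# Illustrative sketch of a hierarchical position embedding: a position p beyond
# the base limit is factored into (p // base, p % base), and the two embeddings
# are summed, so long sequences reuse small embedding tables.
# This only illustrates the idea behind --hierarchical-pos; the repository's
# implementation may differ.
import torch
import torch.nn as nn

class HierarchicalPositionEmbedding(nn.Module):
    def __init__(self, base: int = 512, dim: int = 768, max_segments: int = 8):
        super().__init__()
        self.base = base
        self.offset_emb = nn.Embedding(base, dim)            # fine-grained position within a segment
        self.segment_emb = nn.Embedding(max_segments, dim)   # coarse-grained segment index

    def forward(self, positions: torch.Tensor) -> torch.Tensor:
        # positions: (batch, seq_len) absolute positions, possibly > base
        segment = positions // self.base
        offset = positions % self.base
        return self.segment_emb(segment) + self.offset_emb(offset)

# Example: encode positions up to 4k with tables of size 512 and 8.
emb = HierarchicalPositionEmbedding()
pos = torch.arange(4096).unsqueeze(0)   # (1, 4096)
print(emb(pos).shape)                   # torch.Size([1, 4096, 768])
```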

```bash
# Prepare data
bash openltg_mlm/scripts/process/xsum/binarize.sh
# DirectGen
bash openltg_mlm/scripts/tasks/xsum/run_train.sh
# or RecSpanGen
# bash openltg_mlm/scripts/tasks/xsum/run_rec_train.sh
```

## 🧷 Citing

```bibtex
@inproceedings{liang-etal-2023-open,
    title = "Open-ended Long Text Generation via Masked Language Modeling",
    author = "Liang, Xiaobo  and
      Tang, Zecheng  and
      Li, Juntao  and
      Zhang, Min",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.13",
    doi = "10.18653/v1/2023.acl-long.13",
    pages = "223--241",
}
```