Resources on ChatGPT and Large Language Models

Collection of papers and related works for Large Language Models (ChatGPT, GPT-3, Codex etc.).

Contributors

This repository is maintained by the following contributors.

Organizers: Guilin Qi (漆桂林), Xiaofang Qi (戚晓芳)
Paper Collectors: Zafar Ali, Sheng Bi (毕胜), Yongrui Chen (陈永锐), Zizhuo Chen (陈孜卓), Xinbang Dai (戴鑫邦), Huan Gao (高桓), Nan Hu (胡楠), Shilong Hu (胡世龙), Jingqi Kang (康婧淇), Jiaqi Li (李嘉琦), Dehai Min (闵德海), Guilin Qi (漆桂林), Yiming Tan (谭亦鸣), Tongtong Wu (吴桐桐), Songlin Zhai (翟松林), Shenyu Zhang (张沈昱), Yuxin Zhang (张裕欣)
Maintainers: Runzhe Wang (王润哲), Shenyu Zhang (张沈昱)

The automation script of this repo is powered by Auto-Bibfile. If you'd like to contribute to this repo, please modify bibtex.bib or related_works.json and regenerate README.md using python scripts/run.py.

This page categorizes the literature by publication venue.

Papers

Outline

Hyperlinks

AAAI

Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models,
by Yubin Wang, Xinyang Jiang, De Cheng, Dongsheng Li and Cairong Zhao
Mitigating Large Language Model Hallucinations via Autonomous Knowledge Graph-Based Retrofitting,
by Xinyan Guan, Yanjiang Liu, Hongyu Lin, Yaojie Lu, Ben He, Xianpei Han and Le Sun
Is a Large Language Model a Good Annotator for Event Extraction?,
by Ruirui Chen, Chengwei Qin, Weifeng Jiang and Dongkyu Choi
Code-Style In-Context Learning for Knowledge-Based Question Answering,
by Zhijie Nie, Richong Zhang, Zhongyuan Wang and Xudong Liu
Visual Chain-of-Thought Prompting for Knowledge-Based Visual Reasoning,
by Zhenfang Chen, Qinhong Zhou, Yikang Shen, Yining Hong, Zhiqing Sun, Dan Gutfreund and Chuang Gan
Can Large Language Models Understand Real-World Complex Instructions?,
by Qianyu He, Jie Zeng, Wenhao Huang, Lina Chen, Jin Xiao, Qianxi He, Xunzhe Zhou, Jiaqing Liang et al.
Beyond Entities: A Large-Scale Multi-Modal Knowledge Graph with Triplet Fact Grounding,
by Jingping Liu, Mingchuan Zhang, Weichen Li, Chao Wang, Shuang Li, Haiyun Jiang, Sihang Jiang, Yanghua Xiao et al.
EcomGPT: Instruction-Tuning Large Language Models with Chain-of-Task Tasks for E-commerce,
by Yangning Li, Shirong Ma, Xiaobin Wang, Shen Huang, Chengyue Jiang, Haitao Zheng, Pengjun Xie, Fei Huang et al.
Fusing Task-Oriented and Open-Domain Dialogues in Conversational Agents,
by Tom Young, Frank Xing, Vlad Pandelea, Jinjie Ni and Erik Cambria
Selecting Optimal Context Sentences for Event-Event Relation Extraction,
by Hieu Man, Nghia Trung Ngo, Linh Ngo Van and Thien Huu Nguyen
Commonsense Knowledge Reasoning and Generation with Pre-trained Language Models: A Survey,
by Prajjwal Bhargava and Vincent Ng
DialogBERT: Discourse-Aware Response Generation via Learning to Recover and Rank Utterances,
by Xiaodong Gu, Kang Min Yoo and Jung-Woo Ha
UBAR: Towards Fully End-to-End Task-Oriented Dialog System with GPT-2,
by Yunyi Yang, Yunhao Li and Xiaojun Quan
Parsing as Pretraining,
by David Vilares, Michalina Strzyz, Anders Søgaard and Carlos Gómez-Rodríguez
Unsupervised Deep Learning via Affinity Diffusion,
by Jiabo Huang, Qi Dong, Shaogang Gong and Xiatian Zhu
Cross-Lingual Natural Language Generation via Pre-Training,
by Zewen Chi, Li Dong, Furu Wei, Wenhui Wang, Xian-Ling Mao and Heyan Huang
Improved Knowledge Distillation via Teacher Assistant,
by Seyed-Iman Mirzadeh, Mehrdad Farajtabar, Ang Li, Nir Levine, Akihiro Matsukawa and Hassan Ghasemzadeh
Towards Hands-Free Visual Dialog Interactive Recommendation,
by Tong Yu, Yilin Shen and Hongxia Jin
ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding,
by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Hao Tian, Hua Wu and Haifeng Wang
In order to extract the lexical, syntactic and semantic information from training corpora, we propose a continual pre-training framework named ERNIE 2.0, which incrementally builds pre-training tasks and then learns pre-trained models on these constructed tasks via continual multi-task learning.

ACL

Few-shot Transfer Learning for Knowledge Base Question Answering: Fusing Supervised Models with In-Context Learning,
by Mayur Patidar, Riya Sawhney, Avinash Kumar Singh, Biswajit Chatterjee, Mausam and Indrajit Bhattacharya
LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models,
by Mihir Parmar, Nisarg Patel, Neeraj Varshney, Mutsumi Nakamura, Man Luo, Santosh Mashetty, Arindam Mitra and Chitta Baral
Prompting Language Models for Linguistic Structure,
by Terra Blevins, Hila Gonen and Luke Zettlemoyer
The Web Can Be Your Oyster for Improving Language Models,
by Junyi Li, Tianyi Tang, Wayne Xin Zhao, Jingyuan Wang, Jian-Yun Nie and Ji-Rong Wen
Small Pre-trained Language Models Can be Fine-tuned as Large Models via Over-Parameterization,
by Ze-Feng Gao, Kun Zhou, Peiyu Liu, Wayne Xin Zhao and Ji-Rong Wen
Unified Demonstration Retriever for In-Context Learning,
by Xiaonan Li, Kai Lv, Hang Yan, Tianyang Lin, Wei Zhu, Yuan Ni, Guotong Xie, Xiaoling Wang et al.
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models,
by Lei Wang, Wanyu Xu, Yihuai Lan, Zhiqiang Hu, Yunshi Lan, Roy Ka-Wei Lee and Ee-Peng Lim
Causality-aware Concept Extraction based on Knowledge-guided Prompting,
by Siyu Yuan, Deqing Yang, Jinxi Liu, Shuyu Tian, Jiaqing Liang, Yanghua Xiao and Rui Xie
Revisiting Relation Extraction in the era of Large Language Models,
by Somin Wadhwa, Silvio Amir and Byron C. Wallace
Learning In-context Learning for Named Entity Recognition,
by Jiawei Chen, Yaojie Lu, Hongyu Lin, Jie Lou, Wei Jia, Dai Dai, Hua Wu, Boxi Cao et al.
WebIE: Faithful and Robust Information Extraction on the Web,
by Chenxi Whitehouse, Clara Vania, Alham Fikri Aji, Christos Christodoulopoulos and Andrea Pierleoni
Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark,
by Jason Hoelscher-Obermaier, Julia Persson, Esben Kran, Ioannis Konstas and Fazl Barez
Language Model Analysis for Ontology Subsumption Inference,
by Yuan He, Jiaoyan Chen, Ernesto Jiménez-Ruiz, Hang Dong and Ian Horrocks
BertNet: Harvesting Knowledge Graphs with Arbitrary Relations from Pretrained Language Models,
by Shibo Hao, Bowen Tan, Kaiwen Tang, Bin Ni, Xiyan Shao, Hengzhe Zhang, Eric P. Xing and Zhiting Hu
Text Augmented Open Knowledge Graph Completion via Pre-Trained Language Models,
by Pengcheng Jiang, Shivam Agarwal, Bowen Jin, Xuan Wang, Jimeng Sun and Jiawei Han
Aligning Instruction Tasks Unlocks Large Language Models as Zero-Shot Relation Extractors,
by Kai Zhang, Bernal Jimenez Gutierrez and Yu Su
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes,
by Cheng-Yu Hsieh, Chun-Liang Li, Chih-Kuan Yeh, Hootan Nakhost, Yasuhisa Fujii, Alex Ratner, Ranjay Krishna, Chen-Yu Lee et al.
A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets,
by Md. Tahmid Rahman Laskar, M. Saiful Bari, Mizanur Rahman, Md Amran Hossen Bhuiyan, Shafiq Joty and Jimmy X. Huang
Say What You Mean! Large Language Models Speak Too Positively about Negative Commonsense Knowledge,
by Jiangjie Chen, Wei Shi, Ziquan Fu, Sijie Cheng, Lei Li and Yanghua Xiao
Chain of Thought Prompting Elicits Knowledge Augmentation,
by Dingjun Wu, Jing Zhang and Xinmei Huang
Extracting Multi-valued Relations from Language Models,
by Sneha Singhania, Simon Razniewski and Gerhard Weikum
CodeIE: Large Code Generation Models are Better Few-Shot Information Extractors,
by Peng Li, Tianxiang Sun, Qiong Tang, Hang Yan, Yuanbin Wu, Xuanjing Huang and Xipeng Qiu
Meta-learning via Language Model In-context Tuning,
by Yanda Chen, Ruiqi Zhong, Sheng Zha, George Karypis and He He
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity,
by Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel and Pontus Stenetorp
(1) This work demonstrates that few-shot prompts suffer from order sensitivity, in that for the same prompt the order in which samples are provided can make a difference to model performance.
(2) This work introduces a probing method which constructs an artificial development set by language models themselves to alleviate the order sensitivity problem.
An Information-theoretic Approach to Prompt Engineering Without Ground Truth Labels,
by Taylor Sorensen, Joshua Robinson, Christopher Michael Rytting, Alexander Glenn Shaw, Kyle Jeffrey Rogers, Alexia Pauline Delorey, Mahmoud Khalil, Nancy Fulda et al.
Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models,
by Robert L. Logan IV, Ivana Balazevic, Eric Wallace, Fabio Petroni, Sameer Singh and Sebastian Riedel
Adversarial Soft Prompt Tuning for Cross-Domain Sentiment Analysis,
by Hui Wu and Xiaodong Shi
Fine-Grained Controllable Text Generation Using Non-Residual Prompting,
by Fredrik Carlsson, Joey Öhman, Fangyu Liu, Severine Verlinden, Joakim Nivre and Magnus Sahlgren
MSP: Multi-Stage Prompting for Making Pre-trained Language Models Better Translators,
by Zhixing Tan, Xiangwen Zhang, Shuo Wang and Yang Liu
Noisy Channel Language Model Prompting for Few-Shot Text Classification,
by Sewon Min, Mike Lewis, Hannaneh Hajishirzi and Luke Zettlemoyer
SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer,
by Tu Vu, Brian Lester, Noah Constant, Rami Al-Rfou' and Daniel Cer
ELLE: Efficient Lifelong Pre-training for Emerging Data,
by Yujia Qin, Jiajie Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun and Jie Zhou
UniXcoder: Unified Cross-Modal Pre-training for Code Representation,
by Daya Guo, Shuai Lu, Nan Duan, Yanlin Wang, Ming Zhou and Jian Yin
Continual Pre-training of Language Models for Math Problem Understanding with Syntax-Aware Memory Network,
by Zheng Gong, Kun Zhou, Xin Zhao, Jing Sha, Shijin Wang and Ji-Rong Wen
Improving Supervised Drug-Protein Relation Extraction with Distantly Supervised Models,
by Naoki Iinuma, Makoto Miwa and Yutaka Sasaki
Comparing Encoder-Only and Encoder-Decoder Transformers for Relation Extraction from Biomedical Texts: An Empirical Study on Ten Benchmark Datasets,
by Mourad Sarrouti, Carson Tao and Yoann Mamy Randriamihaja
Can Prompt Probe Pretrained Language Models? Understanding the Invisible Risks from a Causal View,
by Boxi Cao, Hongyu Lin, Xianpei Han, Fangchao Liu and Le Sun
Sequence-to-Sequence Knowledge Graph Completion and Question Answering,
by Apoorv Saxena, Adrian Kochsiek and Rainer Gemulla
Dict-BERT: Enhancing Language Model Pre-training with Dictionary,
by Wenhao Yu, Chenguang Zhu, Yuwei Fang, Donghan Yu, Shuohang Wang, Yichong Xu, Michael Zeng and Meng Jiang
Finding Structural Knowledge in Multimodal-BERT,
by Victor Milewski, Miryam de Lhoneux and Marie-Francine Moens
Reframing Instructional Prompts to GPTk's Language,
by Daniel Khashabi, Chitta Baral, Yejin Choi and Hannaneh Hajishirzi
Generated Knowledge Prompting for Commonsense Reasoning,
by Jiacheng Liu, Alisa Liu, Ximing Lu, Sean Welleck, Peter West, Ronan Le Bras, Yejin Choi and Hannaneh Hajishirzi
Domain Knowledge Transferring for Pre-trained Language Model via Calibrated Activation Boundary Distillation,
by Dongha Choi, Hongseok Choi and Hyunju Lee
Controllable Open-ended Question Generation with A New Question Type Ontology,
by Shuyang Cao and Lu Wang
PRAL: A Tailored Pre-Training Model for Task-Oriented Dialog Generation,
by Jing Gu, Qingyang Wu, Chongruo Wu, Weiyan Shi and Zhou Yu
DYPLOC: Dynamic Planning of Content Using Mixed Language Models for Text Generation,
by Xinyu Hua, Ashwin Sreevatsa and Lu Wang
Latent Reasoning for Low-Resource Question Generation,
by Xinting Huang, Jianzhong Qi, Yu Sun and Rui Zhang
JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs,
by Pei Ke, Haozhe Ji, Yu Ran, Xin Cui, Liwei Wang, Linfeng Song, Xiaoyan Zhu and Minlie Huang
TextBox: A Unified, Modularized, and Extensible Framework for Text Generation,
by Junyi Li, Tianyi Tang, Gaole He, Jinhao Jiang, Xiaoxuan Hu, Puzhao Xie, Zhipeng Chen, Zhuohao Yu et al.
Few-shot Knowledge Graph-to-Text Generation with Pretrained Language Models,
by Junyi Li, Tianyi Tang, Wayne Xin Zhao, Zhicheng Wei, Nicholas Jing Yuan and Ji-Rong Wen
Prefix-Tuning: Optimizing Continuous Prompts for Generation,
by Xiang Lisa Li and Percy Liang
GLGE: A New General Language Generation Evaluation Benchmark,
by Dayiheng Liu, Yu Yan, Yeyun Gong, Weizhen Qi, Hang Zhang, Jian Jiao, Weizhu Chen, Jie Fu et al.
VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation,
by Fuli Luo, Wei Wang, Jiahao Liu, Yijia Liu, Bin Bi, Songfang Huang, Fei Huang and Luo Si
ZmBART: An Unsupervised Cross-lingual Transfer Framework for Language Generation,
by Kaushal Kumar Maurya, Maunendra Sankar Desarkar, Yoshinobu Kano and Kumari Deepshikha
A Plug-and-Play Method for Controlled Text Generation,
by Damian Pascual, Beni Egressy, Clara Meister, Ryan Cotterell and Roger Wattenhofer
Towards Table-to-Text Generation with Numerical Reasoning,
by Lya Hulliyyatus Suadaa, Hidetaka Kamigaito, Kotaro Funakoshi, Manabu Okumura and Hiroya Takamura
Structure-Aware Pre-Training for Table-to-Text Generation,
by Xinyu Xing and Xiaojun Wan
AugNLG: Few-shot Natural Language Generation using Self-trained Data Augmentation,
by Xinnuo Xu, Guoyin Wang, Young-Bum Kim and Sungjin Lee
DeepRapper: Neural Rap Generation with Rhyme and Rhythm Modeling,
by Lanqing Xue, Kaitao Song, Duocai Wu, Xu Tan, Nevin L. Zhang, Tao Qin, Wei-Qiang Zhang and Tie-Yan Liu
FastSeq: Make Sequence Generation Faster,
by Yu Yan, Fei Hu, Jiusheng Chen, Nikhil Bhendawade, Ting Ye, Yeyun Gong, Nan Duan, Desheng Cui et al.
Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains,
by Yunzhi Yao, Shaohan Huang, Wenhui Wang, Li Dong and Furu Wei
Taming Pre-trained Language Models with N-gram Representations for Low-Resource Domain Adaptation,
by Shizhe Diao, Ruijia Xu, Hongjin Su, Yilei Jiang, Yan Song and Tong Zhang
K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters,
by Ruize Wang, Duyu Tang, Nan Duan, Zhongyu Wei, Xuanjing Huang, Jianshu Ji, Guihong Cao, Daxin Jiang et al.
We propose K-Adapter, a framework that keeps the original parameters of the pre-trained model fixed and supports the development of versatile knowledge-infused models.
Parameter-Efficient Transfer Learning with Diff Pruning,
by Demi Guo, Alexander Rush and Yoon Kim
The approach learns a task-specific “diff” vector that extends the original pretrained parameters. As the number of tasks increases, diff pruning remains parameter-efficient, as it requires storing only a small diff vector for each task.
Refining Sample Embeddings with Relation Prototypes to Enhance Continual Relation Extraction,
by Li Cui, Deqing Yang, Jiaxin Yu, Chengwei Hu, Jiayang Cheng, Jingjie Yi and Yanghua Xiao
To fully utilize memorized samples, this paper employs relation prototypes to extract useful information about each relation.
On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation,
by Ruidan He, Linlin Liu, Hai Ye, Qingyu Tan, Bosheng Ding, Liying Cheng, Jiawei Low, Lidong Bing et al.
This work first shows that adapter-based tuning mitigates forgetting better than fine-tuning, since it yields representations that deviate less from those generated by the initial PrLM. Effectiveness: (1) it tends to outperform fine-tuning on both low-resource and cross-lingual tasks; (2) it demonstrates higher stability under different learning rates compared to fine-tuning.
Rational LAMOL: A Rationale-based Lifelong Learning Framework,
by Kasidis Kanwatchara, Thanapapas Horsuwan, Piyawat Lertvittayakumjorn, Boonserm Kijsirikul and Peerapon Vateekul
Rational LAMOL enhances LAMOL, a recent LL model, by applying critical freezing guided by human rationales. When the human rationales are not available, we propose exploiting unsupervised generated rationales as substitutions.
Do Language Models Perform Generalizable Commonsense Inference?,
by Peifeng Wang, Filip Ilievski, Muhao Chen and Xiang Ren
Mention Flags (MF): Constraining Transformer-based Text Generators,
by Yufei Wang, Ian D. Wood, Stephen Wan, Mark Dras and Mark Johnson
Prompting Contrastive Explanations for Commonsense Reasoning Tasks,
by Bhargavi Paranjape, Julian Michael, Marjan Ghazvininejad, Hannaneh Hajishirzi and Luke Zettlemoyer
Knowledgeable or Educated Guess? Revisiting Language Models as Knowledge Bases,
by Boxi Cao, Hongyu Lin, Xianpei Han, Le Sun, Lingyong Yan, Meng Liao, Tong Xue and Jin Xu
Leveraging Type Descriptions for Zero-shot Named Entity Recognition and Classification,
by Rami Aly, Andreas Vlachos and Ryan McDonald
PLATO: Pre-trained Dialogue Generation Model with Discrete Latent Variable,
by Siqi Bao, Huang He, Fan Wang, Hua Wu and Haifeng Wang
Distilling Knowledge Learned in BERT for Text Generation,
by Yen-Chun Chen, Zhe Gan, Yu Cheng, Jingzhou Liu and Jingjing Liu
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension,
by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov and Luke Zettlemoyer
Rigid Formats Controlled Text Generation,
by Piji Li, Haisong Zhang, Xiaojiang Liu and Shuming Shi
GPT-too: A Language-Model-First Approach for AMR-to-Text Generation,
by Manuel Mager, Ramón Fernandez Astudillo, Tahira Naseem, Md. Arafat Sultan, Young-Suk Lee, Radu Florian and Salim Roukos
DIALOGPT: Large-Scale Generative Pre-training for Conversational Response Generation,
by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu et al.
Integrating Multimodal Information in Large Pretrained Transformers,
by Wasifur Rahman, Md. Kamrul Hasan, Sangwu Lee, AmirAli Bagher Zadeh, Chengfeng Mao, Louis-Philippe Morency and Mohammed E. Hoque
End-to-End Neural Pipeline for Goal-Oriented Dialogue Systems using GPT-2,
by DongHoon Ham, Jeong-Gwan Lee, Youngsoo Jang and Kee-Eung Kim
Pretrained Transformers Improve Out-of-Distribution Robustness,
by Dan Hendrycks, Xiaoyuan Liu, Eric Wallace, Adam Dziedzic, Rishabh Krishnan and Dawn Song
Large-Scale Transfer Learning for Natural Language Generation,
by Sergey Golovanov, Rauf Kurbanov, Sergey I. Nikolenko, Kyryl Truskovskyi, Alexander Tselousov and Thomas Wolf
Exploring Pre-trained Language Models for Event Extraction and Generation,
by Sen Yang, Dawei Feng, Linbo Qiao, Zhigang Kan and Dongsheng Li
Barack's Wife Hillary: Using Knowledge Graphs for Fact-Aware Language Modeling,
by Robert L. Logan IV, Nelson F. Liu, Matthew E. Peters, Matt Gardner and Sameer Singh

ACL

Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models,
by Anonymous Submission
Unleashing the Power of Large Language Models in Zero-shot Relation Extraction via Self-Prompting,
by Anonymous Submission

ACL Findings

Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models,
by Robert L. Logan IV, Ivana Balazevic, Eric Wallace, Fabio Petroni, Sameer Singh and Sebastian Riedel
ELLE: Efficient Lifelong Pre-training for Emerging Data,
by Yujia Qin, Jiajie Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun and Jie Zhou
Dict-BERT: Enhancing Language Model Pre-training with Dictionary,
by Wenhao Yu, Chenguang Zhu, Yuwei Fang, Donghan Yu, Shuohang Wang, Yichong Xu, Michael Zeng and Meng Jiang
Reframing Instructional Prompts to GPTk's Language,
by Daniel Khashabi, Chitta Baral, Yejin Choi and Hannaneh Hajishirzi
Latent Reasoning for Low-Resource Question Generation,
by Xinting Huang, Jianzhong Qi, Yu Sun and Rui Zhang
JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs,
by Pei Ke, Haozhe Ji, Yu Ran, Xin Cui, Liwei Wang, Linfeng Song, Xiaoyan Zhu and Minlie Huang
Few-shot Knowledge Graph-to-Text Generation with Pretrained Language Models,
by Junyi Li, Tianyi Tang, Wayne Xin Zhao, Zhicheng Wei, Nicholas Jing Yuan and Ji-Rong Wen
ZmBART: An Unsupervised Cross-lingual Transfer Framework for Language Generation,
by Kaushal Kumar Maurya, Maunendra Sankar Desarkar, Yoshinobu Kano and Kumari Deepshikha
A Plug-and-Play Method for Controlled Text Generation,
by Damian Pascual, Beni Egressy, Clara Meister, Ryan Cotterell and Roger Wattenhofer
Structure-Aware Pre-Training for Table-to-Text Generation,
by Xinyu Xing and Xiaojun Wan
Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains,
by Yunzhi Yao, Shaohan Huang, Wenhui Wang, Li Dong and Furu Wei
K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters,
by Ruize Wang, Duyu Tang, Nan Duan, Zhongyu Wei, Xuanjing Huang, Jianshu Ji, Guihong Cao, Daxin Jiang et al.
We propose K-Adapter, a framework that keeps the original parameters of the pre-trained model fixed and supports the development of versatile knowledge-infused models.
Do Language Models Perform Generalizable Commonsense Inference?,
by Peifeng Wang, Filip Ilievski, Muhao Chen and Xiang Ren

AIIoT

Graph Attention Neural Network Distributed Model Training,
by Armin Esmaeilzadeh, Mina Esmail Zadeh Nojoo Kambar and Maryam Heidari

ASE

AST-Probe: Recovering abstract syntax trees from hidden representations of pre-trained language models,
by José Antonio Hernández López, Martin Weyssow, Jesús Sánchez Cuadrado and Houari A. Sahraoui
CoditT5: Pretraining for Source Code and Natural Language Editing,
by Jiyang Zhang, Sheena Panthaplackel, Pengyu Nie, Junyi Jessy Li and Milos Gligoric
Compressing Pre-trained Models of Code into 3 MB,
by Jieke Shi, Zhou Yang, Bowen Xu, Hong Jin Kang and David Lo
What do pre-trained code models know about code?,
by Anjan Karmakar and Romain Robbes
Multi-task Learning based Pre-trained Language Model for Code Completion,
by Fang Liu, Ge Li, Yunfei Zhao and Zhi Jin

Applied Sciences

A model with iterative trials for correcting logic errors in source code,
by Taku Matsumoto, Yutaka Watanobe and Keita Nakamura

AutoML

Meta-Adapters: Parameter Efficient Few-shot Fine-tuning through Meta-Learning,
by Trapit Bansal, Salaheddin Alzubi, Tong Wang, Jay-Yoon Lee and Andrew McCallum

CHI

Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm,
by Laria Reynolds and Kyle McDonell

CIKM

Mitigating Biases in Student Performance Prediction via Attention-Based Personalized Federated Learning,
by Yun-Wei Chu, Seyyedali Hosseinalipour, Elizabeth Tenorio, Laura M. Cruz Castro, Kerrie A. Douglas, Andrew Lan and Christopher G. Brinton
SPOT: Knowledge-Enhanced Language Representations for Information Extraction,
by Jiacheng Li, Yannis Katsis, Tyler Baldwin, Ho-Cheol Kim, Andrew Bartko, Julian J. McAuley and Chun-Nan Hsu
Knowledge-Enhanced Personalized Review Generation with Capsule Graph Neural Network,
by Junyi Li, Siqing Li, Wayne Xin Zhao, Gaole He, Zhicheng Wei, Nicholas Jing Yuan and Ji-Rong Wen

COLING

Improving Recall of Large Language Models: A Model Collaboration Approach for Relational Triple Extraction,
by Zepeng Ding, Wenhao Huang, Jiaqing Liang, Yanghua Xiao and Deqing Yang
Does GPT-3 Generate Empathetic Dialogues? A Novel In-Context Example Selection Method and Automatic Evaluation Metric for Empathetic Dialogue Generation,
by Young-Jun Lee, Chae-Gyun Lim and Ho-Jin Choi
Event Causality Identification via Derivative Prompt Joint Learning,
by Shirong Shen, Heng Zhou, Tongtong Wu and Guilin Qi
Are Visual-Linguistic Models Commonsense Knowledge Bases?,
by Hsiu-Yu Yang and Carina Silberer
A Domain Knowledge Enhanced Pre-Trained Language Model for Vertical Search: Case Study on Medicinal Products,
by Kesong Liu, Jianhui Jiang and Feifei Lyu
TableGPT: Few-shot Table-to-Text Generation with Table Structure Reconstruction and Content Matching,
by Heng Gong, Yawei Sun, Xiaocheng Feng, Bing Qin, Wei Bi, Xiaojiang Liu and Ting Liu
Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity,
by Hamza Harkous, Isabel Groves and Amir Saffari
Distill and Replay for Continual Language Learning,
by Jingyuan Sun, Shaonan Wang, Jiajun Zhang and Chengqing Zong
This paper proposes a distill-and-replay method (DnR) that follows the setting of LAMOL. As a distillation-based method, DnR can also incrementally compress the model size while still outperforming most of the baselines.

CVPR

Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles,
by Shuquan Ye, Yujia Xie, Dongdong Chen, Yichong Xu, Lu Yuan, Chenguang Zhu and Jing Liao
Learning to Prompt for Continual Learning,
by Zifeng Wang, Zizhao Zhang, Chen-Yu Lee, Han Zhang, Ruoxi Sun, Xiaoqi Ren, Guolong Su, Vincent Perot et al.
Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning,
by Liangqiong Qu, Yuyin Zhou, Paul Pu Liang, Yingda Xia, Feifei Wang, Ehsan Adeli, Li Fei-Fei and Daniel L. Rubin
CLIP-Event: Connecting Text and Images with Event Structures,
by Manling Li, Ruochen Xu, Shuohang Wang, Luowei Zhou, Xudong Lin, Chenguang Zhu, Michael Zeng, Heng Ji et al.
Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling,
by Jie Lei, Linjie Li, Luowei Zhou, Zhe Gan, Tamara L. Berg, Mohit Bansal and Jingjing Liu
Regularizing Class-Wise Predictions via Self-Knowledge Distillation,
by Sukmin Yun, Jongjin Park, Kimin Lee and Jinwoo Shin
Relational Knowledge Distillation,
by Wonpyo Park, Dongju Kim, Yan Lu and Minsu Cho
Learning to detect unseen object classes by between-class attribute transfer,
by Christoph H. Lampert, Hannes Nickisch and Stefan Harmeling

EACL

Crawling The Internal Knowledge-Base of Language Models,
by Roi Cohen, Mor Geva, Jonathan Berant and Amir Globerson
Methods for Measuring, Updating, and Visualizing Factual Beliefs in Language Models,
by Peter Hase, Mona T. Diab, Asli Celikyilmaz, Xian Li, Zornitsa Kozareva, Veselin Stoyanov, Mohit Bansal and Srinivasan Iyer
Penguins Don't Fly: Reasoning about Generics through Instantiations and Exceptions,
by Emily Allaway, Jena D. Hwang, Chandra Bhagavatula, Kathleen R. McKeown, Doug Downey and Yejin Choi
Analyzing the Forgetting Problem in Pretrain-Finetuning of Open-domain Dialogue Response Models,
by Tianxing He, Jun Liu, Kyunghyun Cho, Myle Ott, Bing Liu, James R. Glass and Fuchun Peng
Our major finding is that after standard finetuning, the model forgets some of the important language generation skills acquired during large-scale pretraining. We propose an intuitive finetuning strategy named “mix-review”: for each finetuning epoch, we mix the target dialogue data with a random subset of the pretraining data (mix_ratio is 4, decay is 0.9).
Lifelong Knowledge-Enriched Social Event Representation Learning,
by Prashanth Vijayaraghavan and Deb Roy
This paper proposes a rehearsal-based method, Domain-Representative Episodic Memory Replay (DR-EMR), for lifelong event representation learning with embedding alignment and external social commonsense knowledge.
Language Models as Knowledge Bases: On Entity Representations, Storage Capacity, and Paraphrased Queries,
by Benjamin Heinzerling and Kentaro Inui

EACL

Crawling the Internal Knowledge-Base of Language Models,
by Roi Cohen, Mor Geva, Jonathan Berant and Amir Globerson
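This paper proposes a method for extracting a structured knowledge graph from a language model. It uses specially designed prompts to control precision and recall during extraction. An evaluation on GPT-3 shows high-precision results.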

ECCV

Federated Visual Classification with Real-World Data Distribution,
by Tzu-Ming Harry Hsu, Hang Qi and Matthew Brown

ECIR

Consistency and Coherency Enhanced Story Generation,
by Wei Wang, Piji Li and Hai-Tao Zheng

ECML

Augmenting Open-Domain Event Detection with Synthetic Data from GPT-2,
by Amir Pouran Ben Veyseh, Minh Van Nguyen, Bonan Min and Thien Huu Nguyen

EDBT

Distributed Training of Knowledge Graph Embedding Models using Ray,
by Nasrullah Sheikh, Xiao Qin, Yaniv Gur and Berthold Reinwald

EMNLP

Is ChatGPT a Good Causal Reasoner? A Comprehensive Evaluation,
by Jinglong Gao, Xiao Ding, Bing Qin and Ting Liu
Graph Meets LLM: A Novel Approach to Collaborative Filtering for Robust Conversational Understanding,
by Zheng Chen, Ziyan Jiang, Fan Yang, Eunah Cho, Xing Fan, Xiaojiang Huang, Yanbin Lu and Aram Galstyan
Instruct and Extract: Instruction Tuning for On-Demand Information Extraction,
by Yizhu Jiao, Ming Zhong, Sha Li, Ruining Zhao, Siru Ouyang, Heng Ji and Jiawei Han
Empirical Study of Zero-Shot NER with ChatGPT,
by Tingyu Xie, Qi Li, Jian Zhang, Yan Zhang, Zuozhu Liu and Hongwei Wang
Evaluating the Knowledge Base Completion Potential of GPT,
by Blerta Veseli, Simon Razniewski, Jan-Christoph Kalo and Gerhard Weikum
Large Language Model Is Not a Good Few-shot Information Extractor, but a Good Reranker for Hard Samples!,
by Yubo Ma, Yixin Cao, Yong Hong and Aixin Sun
Chain of Thought with Explicit Evidence Reasoning for Few-shot Relation Extraction,
by Xilai Ma, Jing Li and Min Zhang
Guideline Learning for In-Context Information Extraction,
by Chaoxu Pang, Yixuan Cao, Qiang Ding and Ping Luo
Give Me the Facts! A Survey on Factual Knowledge Probing in Pre-trained Language Models,
by Paul Youssef, Osman Alperen Koras, Meijie Li, Jörg Schlötterer and Christin Seifert
KICGPT: Large Language Model with Knowledge in Context for Knowledge Graph Completion,
by Yanbin Wei, Qiushi Huang, Yu Zhang and James T. Kwok
Self-prompted Chain-of-Thought on Large Language Models for Open-domain Multi-hop Reasoning,
by Jinyuan Wang, Junlong Li and Hai Zhao
Leveraging Structured Information for Explainable Multi-hop Question Answering and Reasoning,
by Ruosen Li and Xinya Du
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks,
by Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi, Amirreza Mirzaei, Atharva Naik, Arjun Ashok, Arut Selvan Dhanasekaran et al.
Iteratively Prompt Pre-trained Language Models for Chain of Thought,
by Boshi Wang, Xiang Deng and Huan Sun
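(1) This paper proposes an iterative prompt-tuning method. The authors argue that soft prompts should be context-aware, i.e., different prompt vectors should be used at different steps of autoregressive decoding.
(2) BERT is used to generate prompts for a PLM with an encoder-decoder architecture: at each decoding step, BERT generates a new set of prompt vectors from the context of the previous steps and provides them to the PLM, which generates new context, and the process repeats.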
Active Example Selection for In-Context Learning,
by Yiming Zhang, Shi Feng and Chenhao Tan
(1) This paper revisits the effect of example selection (re-ordering & calibration) for ICL, observing that a large variance across sets of demonstration examples still exists.
(2) This paper applies reinforcement learning (Q-Learning) to optimize example selection by formulating the task as a sequential decision-making problem, which is appropriate for example selection from unlabeled datasets.
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?,
by Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi and Luke Zettlemoyer
Detect-Localize-Repair: A Unified Framework for Learning to Debug with CodeT5,
by Nghi Bui, Yue Wang and Steven C. H. Hoi
Generative Knowledge Graph Construction: A Review,
by Hongbin Ye, Ningyu Zhang, Hui Chen and Huajun Chen
Learning Cross-Task Dependencies for Joint Extraction of Entities, Events, Event Arguments, and Relations,
by Minh Van Nguyen, Bonan Min, Franck Dernoncourt and Thien Nguyen
Multilingual SubEvent Relation Extraction: A Novel Dataset and Structure Induction Method,
by Viet Dac Lai, Hieu Man, Linh Ngo Van, Franck Dernoncourt and Thien Nguyen
LILA: A Unified Benchmark for Mathematical Reasoning,
by Swaroop Mishra, Matthew Finlayson, Pan Lu, Leonard Tang, Sean Welleck, Chitta Baral, Tanmay Rajpurohit, Oyvind Tafjord et al.
Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations,
by Jaehun Jung, Lianhui Qin, Sean Welleck, Faeze Brahman, Chandra Bhagavatula, Ronan Le Bras and Yejin Choi
UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression,
by Jiaqi Chen, Tong Li, Jinghui Qin, Pan Lu, Liang Lin, Chongyu Chen and Xiaodan Liang
Knowledge Prompting in Pre-trained Language Model for Natural Language Understanding,
by Jianing Wang, Wenkang Huang, Minghui Qiu, Qiuhui Shi, Hongbin Wang, Xiang Li and Ming Gao
Large language models are few-shot clinical information extractors,
by Monica Agrawal, Stefan Hegselmann, Hunter Lang, Yoon Kim and David A. Sontag
Snapshot-Guided Domain Adaptation for ELECTRA,
by Daixuan Cheng, Shaohan Huang, Jianfeng Liu, Yuefeng Zhan, Hao Sun, Furu Wei, Denvy Deng and Qi Zhang
Thinking about GPT-3 In-Context Learning for Biomedical IE? Think Again,
by Bernal Jimenez Gutierrez, Nikolas McNeal, Clayton Washington, You Chen, Lang Li, Huan Sun and Yu Su
VarMAE: Pre-training of Variational Masked Autoencoder for Domain-adaptive Language Understanding,
by Dou Hu, Xiaolong Hou, Xiyang Du, Mengyuan Zhou, Lianxin Jiang, Yang Mo and Xiaofeng Shi
Efficient Large Scale Language Modeling with Mixtures of Experts,
by Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, Sam Shleifer, Xi Victoria Lin, Jingfei Du et al.
TemporalWiki: A Lifelong Benchmark for Training and Evaluating Ever-Evolving Language Models,
by Joel Jang, Seonghyeon Ye, Changho Lee, Sohee Yang, Joongbo Shin, Janghoon Han, Gyeonghun Kim and Minjoon Seo
CN-AutoMIC: Distilling Chinese Commonsense Knowledge from Pretrained Language Models,
by Chenhao Wang, Jiachun Li, Yubo Chen, Kang Liu and Jun Zhao
Training Language Models with Memory Augmentation,
by Zexuan Zhong, Tao Lei and Danqi Chen
Calibrating Factual Knowledge in Pretrained Language Models,
by Qingxiu Dong, Damai Dai, Yifan Song, Jingjing Xu, Zhifang Sui and Lei Li
Can Language Models Serve as Temporal Knowledge Bases?,
by Ruilin Zhao, Feng Zhao, Guandong Xu, Sixiao Zhang and Hai Jin
Rainier: Reinforced Knowledge Introspector for Commonsense Question Answering,
by Jiacheng Liu, Skyler Hallinan, Ximing Lu, Pengfei He, Sean Welleck, Hannaneh Hajishirzi and Yejin Choi
GeoMLAMA: Geo-Diverse Commonsense Probing on Multilingual Pre-Trained Language Models,
by Da Yin, Hritik Bansal, Masoud Monajatipoor, Liunian Harold Li and Kai-Wei Chang
RobustLR: A Diagnostic Benchmark for Evaluating Logical Robustness of Deductive Reasoners,
by Soumya Sanyal, Zeyi Liao and Xiang Ren
Towards Unified Prompt Tuning for Few-shot Text Classification,
by Jianing Wang, Chengyu Wang, Fuli Luo, Chuanqi Tan, Minghui Qiu, Fei Yang, Qiuhui Shi, Songfang Huang et al.
Are Hard Examples also Harder to Explain? A Study with Human and Model-Generated Explanations,
by Swarnadeep Saha, Peter Hase, Nazneen Rajani and Mohit Bansal
ZeroGen: Efficient Zero-shot Learning via Dataset Generation,
by Jiacheng Ye, Jiahui Gao, Qintong Li, Hang Xu, Jiangtao Feng, Zhiyong Wu, Tao Yu and Lingpeng Kong
Towards Robust NLG Bias Evaluation with Syntactically-diverse Prompts,
by Arshiya Aggarwal, Jiao Sun and Nanyun Peng
LogicNMR: Probing the Non-monotonic Reasoning Ability of Pre-trained Language Models,
by Yeliang Xiu, Zhanhao Xiao and Yongmei Liu
FewshotQA: A simple framework for few-shot learning of question answering tasks using pre-trained text-to-text models,
by Rakesh Chada and Pradeep Natarajan
The Power of Scale for Parameter-Efficient Prompt Tuning,
by Brian Lester, Rami Al-Rfou and Noah Constant
Smelting Gold and Silver for Improved Multilingual AMR-to-Text Generation,
by Leonardo F. R. Ribeiro, Jonas Pfeiffer, Yue Zhang and Iryna Gurevych
A Three-Stage Learning Framework for Low-Resource Knowledge-Grounded Dialogue Generation,
by Shilei Liu, Xiaofeng Zhao, Bochao Li, Feiliang Ren, Longhui Zhang and Shujuan Yin
Structural Adapters in Pretrained Language Models for AMR-to-Text Generation,
by Leonardo F. R. Ribeiro, Yue Zhang and Iryna Gurevych
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation,
by Yue Wang, Weishi Wang, Shafiq R. Joty and Steven C. H. Hoi
Dialogue State Tracking with a Language Model using Schema-Driven Prompting,
by Chia-Hsuan Lee, Hao Cheng and Mari Ostendorf
Salience-Aware Event Chain Modeling for Narrative Understanding,
by Xiyang Zhang, Muhao Chen and Jonathan May
Want To Reduce Labeling Cost? GPT-3 Can Help,
by Shuohang Wang, Yang Liu, Yichong Xu, Chenguang Zhu and Michael Zeng
Learn Continually, Generalize Rapidly: Lifelong Knowledge Accumulation for Few-shot Learning,
by Xisen Jin, Bill Yuchen Lin, Mohammad Rostami and Xiang Ren
We present a new learning setup, Continual Learning of Few-Shot Learners, to address challenges of both learning settings in a unified setup, with a hyper-network for task-specific adapter generation.
Domain-Lifelong Learning for Dialogue State Tracking via Knowledge Preservation Networks,
by Qingbin Liu, Pengfei Cao, Cao Liu, Jiansong Chen, Xunliang Cai, Fan Yang, Shizhu He, Kang Liu et al.
This paper explores Domain-Lifelong Learning for Dialogue State Tracking (DLL-DST) and proposes the Knowledge Preservation Network, which consists of multi-prototype enhanced retrospection and multi-strategy knowledge distillation, to address expression diversity and combinatorial explosion in the DLL-DST task.
CLASSIC: Continual and Contrastive Learning of Aspect Sentiment Classification Tasks,
by Zixuan Ke, Bing Liu, Hu Xu and Lei Shu
The key novelty is a contrastive continual learning method that enables both knowledge transfer across tasks and knowledge distillation from old tasks to the new task, which eliminates the need for task ids in testing.
Lifelong Explainer for Lifelong Learners,
by Xuelin Situ, Sameen Maruf, Ingrid Zukerman, Cecile Paris and Gholamreza Haffari
We propose a novel Lifelong Explanation approach that continuously trains a student explainer under the supervision of a teacher – an arbitrary explanation algorithm – on different tasks undertaken in LL. We also leverage the Experience Replay mechanism to prevent catastrophic forgetting in the student explainer.
A Unified Speaker Adaptation Approach for ASR,
by Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq R. Joty, Eng Siong Chng and Bin Ma
Prefix-based user identifier, Continual ASR / Architecture Search / Network Pruning.
GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation,
by Kang Min Yoo, Dongju Park, Jaewook Kang, Sang-Woo Lee and Woo-Myoung Park
Editing Factual Knowledge in Language Models,
by Nicola De Cao, Wilker Aziz and Ivan Titov
Relational World Knowledge Representation in Contextual Language Models: A Review,
by Tara Safavi and Danai Koutra
RICA: Evaluating Robust Inference Capabilities Based on Commonsense Axioms,
by Pei Zhou, Rahul Khanna, Seyeon Lee, Bill Yuchen Lin, Daniel Ho, Jay Pujara and Xiang Ren
Can Language Models be Biomedical Knowledge Bases?,
by Mujeen Sung, Jinhyuk Lee, Sean S. Yi, Minji Jeon, Sungdong Kim and Jaewoo Kang
Transformer Feed-Forward Layers Are Key-Value Memories,
by Mor Geva, Roei Schuster, Jonathan Berant and Omer Levy
KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation,
by Wenhu Chen, Yu Su, Xifeng Yan and William Yang Wang
Logic2Text: High-Fidelity Natural Language Generation from Logical Forms,
by Zhiyu Chen, Wenhu Chen, Hanwen Zha, Xiyou Zhou, Yunkai Zhang, Sairam Sundaresan and William Yang Wang
Reformulating Unsupervised Style Transfer as Paraphrase Generation,
by Kalpesh Krishna, John Wieting and Mohit Iyyer
Few-shot Natural Language Generation for Task-Oriented Dialog,
by Baolin Peng, Chenguang Zhu, Chunyuan Li, Xiujun Li, Jinchao Li, Michael Zeng and Jianfeng Gao
PlotMachines: Outline-Conditioned Generation with Dynamic Plot State Tracking,
by Hannah Rashkin, Asli Celikyilmaz, Yejin Choi and Jianfeng Gao
T3: Tree-Autoencoder Constrained Adversarial Text Generation for Targeted Attack,
by Boxin Wang, Hengzhi Pei, Boyuan Pan, Qian Chen, Shuohang Wang and Bo Li
MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models,
by Peng Xu, Mostofa Patwary, Mohammad Shoeybi, Raul Puri, Pascale Fung, Anima Anandkumar and Bryan Catanzaro
StyleDGPT: Stylized Response Generation with Pre-trained Language Models,
by Ze Yang, Wei Wu, Can Xu, Xinnian Liang, Jiaqi Bai, Liran Wang, Wei Wang and Zhoujun Li
Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision,
by Hao Tan and Mohit Bansal
SeqMix: Augmenting Active Sequence Labeling via Sequence Mixup,
by Rongzhi Zhang, Yue Yu and Chao Zhang
Joint Constrained Learning for Event-Event Relation Extraction,
by Haoyu Wang, Muhao Chen, Hongming Zhang and Dan Roth
Revisiting Pre-Trained Models for Chinese Natural Language Processing,
by Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Shijin Wang and Guoping Hu
Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting,
by Sanyuan Chen, Yutai Hou, Yiming Cui, Wanxiang Che, Ting Liu and Xiangzhan Yu
We propose a recall and learn mechanism, which adopts the idea of multi-task learning and jointly learns pretraining tasks and downstream tasks. Specifically, we introduce a Pretraining Simulation mechanism to recall the knowledge from pretraining tasks without data, and an Objective Shifting mechanism to focus the learning on downstream tasks gradually.
Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning,
by Zhaojiang Lin, Andrea Madotto and Pascale Fung
Proposes an adapter-based method for continual learning in text generation. One insight is that a frozen PLM can be applied effectively in continual learning.
An Empirical Investigation Towards Efficient Multi-Domain Language Model Pre-training,
by Kristjan Arumae, Qing Sun and Parminder Bhatia
We find that elastic weight consolidation provides the best overall scores, yielding only a 0.33% drop in performance across seven generic tasks while remaining competitive in bio-medical tasks.
Visually Grounded Continual Learning of Compositional Phrases,
by Xisen Jin, Junyi Du, Arka Sadhu, Ram Nevatia and Xiang Ren
A novel continual learning setting and a new benchmark for continual caption generation, evaluated with existing rehearsal-based methods.
Incremental Event Detection via Knowledge Consolidation Networks,
by Pengfei Cao, Yubo Chen, Jun Zhao and Taifeng Wang
Proposing a hybrid continual learning method for event detection, combining experience replay and Knowledge Distillation, focusing on (1) semantic ambiguity in NLP and (2) data imbalance between memory and current task.
A Multi-Task Incremental Learning Framework with Category Name Embedding for Aspect-Category Sentiment Analysis,
by Zehui Dai, Cheng Peng, Huajie Chen and Yadong Ding
Utilizing BERT for sentence and category encoding, preserving category encoding to prevent catastrophic forgetting.
Efficient Meta Lifelong-Learning with Limited Memory,
by Zirui Wang, Sanket Vaibhav Mehta, Barnabas Poczos and Jaime Carbonell
A meta learning-enhanced version of MbPA (NeurIPS19), sharing the continual setting as well. Figure 1 is interesting.
Lifelong Language Knowledge Distillation,
by Yung-Sung Chuang, Shang-Yu Su and Yun-Nung Chen
Proposes a knowledge-distillation-enhanced LLL method, based on the LAMOL (ICLR 2020) model, for continual learning; evaluated on text generation and text classification.
AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts,
by Taylor Shin, Yasaman Razeghi, Robert L. Logan IV, Eric Wallace and Sameer Singh
CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning,
by Bill Yuchen Lin, Wangchunshu Zhou, Ming Shen, Pei Zhou, Chandra Bhagavatula, Yejin Choi and Xiang Ren
Dialogue Response Ranking Training with Large-Scale Human Feedback Data,
by Xiang Gao, Yizhe Zhang, Michel Galley, Chris Brockett and Bill Dolan
Thinking Like a Skeptic: Defeasible Inference in Natural Language,
by Rachel Rudinger, Vered Shwartz, Jena D. Hwang, Chandra Bhagavatula, Maxwell Forbes, Ronan Le Bras, Noah A. Smith and Yejin Choi
Improving Neural Story Generation by Targeted Common Sense Grounding,
by Huanru Henry Mao, Bodhisattwa Prasad Majumder, Julian J. McAuley and Garrison W. Cottrell
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks,
by Nils Reimers and Iryna Gurevych
Language Models as Knowledge Bases?,
by Fabio Petroni, Tim Rocktäschel, Sebastian Riedel, Patrick S. H. Lewis, Anton Bakhtin, Yuxiang Wu and Alexander H. Miller

EMNLP Findings

Detect-Localize-Repair: A Unified Framework for Learning to Debug with CodeT5,
by Nghi Bui, Yue Wang and Steven C. H. Hoi
Multilingual SubEvent Relation Extraction: A Novel Dataset and Structure Induction Method,
by Viet Dac Lai, Hieu Man, Linh Ngo Van, Franck Dernoncourt and Thien Nguyen
Snapshot-Guided Domain Adaptation for ELECTRA,
by Daixuan Cheng, Shaohan Huang, Jianfeng Liu, Yuefeng Zhan, Hao Sun, Furu Wei, Denvy Deng and Qi Zhang
Thinking about GPT-3 In-Context Learning for Biomedical IE? Think Again,
by Bernal Jimenez Gutierrez, Nikolas McNeal, Clayton Washington, You Chen, Lang Li, Huan Sun and Yu Su
VarMAE: Pre-training of Variational Masked Autoencoder for Domain-adaptive Language Understanding,
by Dou Hu, Xiaolong Hou, Xiyang Du, Mengyuan Zhou, Lianxin Jiang, Yang Mo and Xiaofeng Shi
Calibrating Factual Knowledge in Pretrained Language Models,
by Qingxiu Dong, Damai Dai, Yifan Song, Jingjing Xu, Zhifang Sui and Lei Li
Can Language Models Serve as Temporal Knowledge Bases?,
by Ruilin Zhao, Feng Zhao, Guandong Xu, Sixiao Zhang and Hai Jin
Towards Unified Prompt Tuning for Few-shot Text Classification,
by Jianing Wang, Chengyu Wang, Fuli Luo, Chuanqi Tan, Minghui Qiu, Fei Yang, Qiuhui Shi, Songfang Huang et al.
Want To Reduce Labeling Cost? GPT-3 Can Help,
by Shuohang Wang, Yang Liu, Yichong Xu, Chenguang Zhu and Michael Zeng
Learn Continually, Generalize Rapidly: Lifelong Knowledge Accumulation for Few-shot Learning,
by Xisen Jin, Bill Yuchen Lin, Mohammad Rostami and Xiang Ren
We present a new learning setup, Continual Learning of Few-Shot Learners, to address challenges of both learning settings in a unified setup, with a hyper-network for task-specific adapter generation.
GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation,
by Kang Min Yoo, Dongju Park, Jaewook Kang, Sang-Woo Lee and Woo-Myoung Park
Logic2Text: High-Fidelity Natural Language Generation from Logical Forms,
by Zhiyu Chen, Wenhu Chen, Hanwen Zha, Xiyou Zhou, Yunkai Zhang, Sairam Sundaresan and William Yang Wang
Few-shot Natural Language Generation for Task-Oriented Dialog,
by Baolin Peng, Chenguang Zhu, Chunyuan Li, Xiujun Li, Jinchao Li, Michael Zeng and Jianfeng Gao
StyleDGPT: Stylized Response Generation with Pre-trained Language Models,
by Ze Yang, Wei Wu, Can Xu, Xinnian Liang, Jiaqi Bai, Liran Wang, Wei Wang and Zhoujun Li
Revisiting Pre-Trained Models for Chinese Natural Language Processing,
by Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Shijin Wang and Guoping Hu
Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning,
by Zhaojiang Lin, Andrea Madotto and Pascale Fung
Proposes an adapter-based method for continual learning in text generation. One insight is that a frozen PLM can be applied effectively in continual learning.
CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning,
by Bill Yuchen Lin, Wangchunshu Zhou, Ming Shen, Pei Zhou, Chandra Bhagavatula, Yejin Choi and Xiang Ren
Thinking Like a Skeptic: Defeasible Inference in Natural Language,
by Rachel Rudinger, Vered Shwartz, Jena D. Hwang, Chandra Bhagavatula, Maxwell Forbes, Ronan Le Bras, Noah A. Smith and Yejin Choi

Euro-Par

Elastic Deep Learning Using Knowledge Distillation with Heterogeneous Computing Resources,
by Daxiang Dong, Ji Liu, Xi Wang, Weibao Gong, An Qin, Xingjian Li, Dianhai Yu, Patrick Valduriez et al.

EvoMUSART

Towards the Generation of Musical Explanations with GPT-3,
by Stephen James Krol, Maria Teresa Llano and Jon McCormack

FCST 计算机科学与探索

Review of Knowledge-Enhanced Pre-trained Language Models,
by Yi Han, Linbo Qiao, Dongsheng Li and Xiangke Liao

FLPI

Collaborative Fairness in Federated Learning,
by Lingjuan Lyu, Xinyi Xu, Qian Wang and Han Yu

FSE

AUGER: automatically generating review comments with pre-training models,
by Lingwei Li, Li Yang, Huaxi Jiang, Jun Yan, Tiejian Luo, Zihan Hua, Geng Liang and Chun Zuo
Automating code review activities by large-scale pre-training,
by Zhiyu Li, Shuai Lu, Daya Guo, Nan Duan, Shailesh Jannu, Grant Jenks, Deep Majumder, Jared Green et al.
Diet code is healthy: simplifying programs for pre-trained models of code,
by Zhaowei Zhang, Hongyu Zhang, Beijun Shen and Xiaodong Gu
NatGen: generative pre-training by "naturalizing" source code,
by Saikat Chakraborty, Toufique Ahmed, Yangruibo Ding, Premkumar T. Devanbu and Baishakhi Ray
IntelliCode compose: Code Generation using transformer,
by Alexey Svyatkovskiy, Shao Kun Deng, Shengyu Fu and Neel Sundaresan

IA3

DistDGL: Distributed Graph Neural Network Training for Billion-Scale Graphs,
by Da Zheng, Chao Ma, Minjie Wang, Jinjing Zhou, Qidong Su, Xiang Song, Quan Gan, Zheng Zhang et al.

ICCV

Knowledge Distillation via Route Constrained Optimization,
by Xiao Jin, Baoyun Peng, Yichao Wu, Yu Liu, Jiaheng Liu, Ding Liang, Junjie Yan and Xiaolin Hu
VideoBERT: A Joint Model for Video and Language Representation Learning,
by Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy and Cordelia Schmid

ICDCS

GRACE: A Compressed Communication Framework for Distributed Machine Learning,
by Hang Xu, Chen-Yu Ho, Ahmed M. Abdelmoniem, Aritra Dutta, El Houcine Bergou, Konstantinos Karatsenidis, Marco Canini and Panos Kalnis

ICER

Automatic Generation of Programming Exercises and Code Explanations Using Large Language Models,
by Sami Sarsa, Paul Denny, Arto Hellas and Juho Leinonen

ICLR

GoLLIE: Annotation Guidelines improve Zero-Shot Information-Extraction,
by Oscar Sainz, Iker García-Ferrero, Rodrigo Agerri, Oier Lopez de Lacalle, German Rigau and Eneko Agirre
Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning,
by Jiahui Gao, Renjie Pi, Yong Lin, Hang Xu, Jiacheng Ye, Zhiyong Wu, Weizhong Zhang, Xiaodan Liang et al.
Continual Pre-training of Language Models,
by Zixuan Ke, Yijia Shao, Haowei Lin, Tatsuya Konishi, Gyuhak Kim and Bing Liu
Language models are multilingual chain-of-thought reasoners,
by Freda Shi, Mirac Suzgun, Markus Freitag, Xuezhi Wang, Suraj Srivats, Soroush Vosoughi, Hyung Won Chung, Yi Tay et al.
Dataless Knowledge Fusion by Merging Weights of Language Models,
by Xisen Jin, Xiang Ren, Daniel Preotiuc-Pietro and Pengxiang Cheng
Complexity-Based Prompting for Multi-step Reasoning,
by Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark and Tushar Khot
Finetuned Language Models are Zero-Shot Learners,
by Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai et al.
An Explanation of In-context Learning as Implicit Bayesian Inference,
by Sang Michael Xie, Aditi Raghunathan, Percy Liang and Tengyu Ma
LFPT5: A Unified Framework for Lifelong Few-shot Language Learning Based on Prompt Tuning of T5,
by Chengwei Qin and Shafiq Joty
We define a challenging yet practical problem as Lifelong Few-shot Language Learning and propose a unified framework for it based on prompt tuning of T5.
Towards Continual Knowledge Learning of Language Models,
by Joel Jang, Seonghyeon Ye, Sohee Yang, Joongbo Shin, Janghoon Han, Gyeonghun Kim, Stanley Jungkyu Choi and Minjoon Seo
We propose a novel continual learning formulation named Continual Knowledge Learning which allows large language models to constantly obtain new and updated knowledge while mitigating forgetting of previously learned time-invariant knowledge.
Pretrained Language Model in Continual Learning: A Comparative Study,
by Tongtong Wu, Massimo Caccia, Zhuang Li, Yuan-Fang Li, Guilin Qi and Gholamreza Haffari
To explore the layer-wise property of pretrained language models in continual learning, we thoroughly compare the continual learning performance over the combination of 5 PLMs and 4 veins of CL methods on 3 benchmarks in 2 typical incremental settings.
Fast Model Editing at Scale,
by Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn and Christopher D. Manning
P-Adapters: Robustly Extracting Factual Information from Language Models with Diverse Prompts,
by Benjamin Newman, Prafulla Kumar Choubey and Nazneen Rajani
GreaseLM: Graph REASoning Enhanced Language Models,
by Xikun Zhang, Antoine Bosselut, Michihiro Yasunaga, Hongyu Ren, Percy Liang, Christopher D. Manning and Jure Leskovec
A Distributional Approach to Controlled Text Generation,
by Muhammad Khalifa, Hady Elsahar and Marc Dymetman
GraphCodeBERT: Pre-training Code Representations with Data Flow,
by Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long Zhou, Nan Duan et al.
Combining Ensembles and Data Augmentation Can Harm Your Calibration,
by Yeming Wen, Ghassen Jerfel, Rafael Muller, Michael W. Dusenberry, Jasper Snoek, Balaji Lakshminarayanan and Dustin Tran
Pre-training Text-to-Text Transformers for Concept-centric Common Sense,
by Wangchunshu Zhou, Dong-Ho Lee, Ravi Kiran Selvam, Seyeon Lee and Xiang Ren
Plug and Play Language Models: A Simple Approach to Controlled Text Generation,
by Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, Jason Yosinski and Rosanne Liu
BERTScore: Evaluating Text Generation with BERT,
by Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger and Yoav Artzi
VL-BERT: Pre-training of Generic Visual-Linguistic Representations,
by Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei and Jifeng Dai
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators,
by Kevin Clark, Minh-Thang Luong, Quoc V. Le and Christopher D. Manning
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks,
by Jonathan Frankle and Michael Carbin
Generating Wikipedia by Summarizing Long Sequences,
by Peter J. Liu, Mohammad Saleh, Etienne Pot, Ben Goodrich, Ryan Sepassi, Lukasz Kaiser and Noam Shazeer
Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer,
by Sergey Zagoruyko and Nikos Komodakis
Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results,
by Antti Tarvainen and Harri Valpola

ICML

Large Language Models Struggle to Learn Long-Tail Knowledge,
by Nikhil Kandpal, Haikang Deng, Adam Roberts, Eric Wallace and Colin Raffel
Can Neural Network Memorization Be Localized?,
by Pratyush Maini, Michael Curtis Mozer, Hanie Sedghi, Zachary Chase Lipton, J. Zico Kolter and Chiyuan Zhang
Improved logical reasoning of language models via differentiable symbolic programming,
by Hanlin Zhang, Ziyang Li, Jiani Huang, Mayur Naik and Eric Xing
The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention,
by Kazuki Irie, Róbert Csordás and Jürgen Schmidhuber
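(1) A very interesting paper. It revisits the primal and dual forms of a neural network (NN) linear layer Y=WX (bias b omitted); the two forms are exactly equivalent;
(2) The dual form shows that the output of an NN linear layer trained by backpropagation is essentially a linear combination of the error signals e_t the layer received during training, where the weights are computed by comparing the test query x with each training input. It follows that if the test input x is orthogonal to the training inputs, the parameter updates obtained by gradient descent have no effect at all on that sample x.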
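The primal/dual equivalence described in this note can be checked numerically. The following is a minimal sketch, not the paper's code: it assumes plain SGD on a squared-error loss, folds the learning rate and sign into the error signal e_t, and uses hypothetical helper names (`train`, `primal`, `dual`). Under those assumptions the trained weights are W = W_0 + Σ_t e_t x_tᵀ, so W x = W_0 x + Σ_t (x_t · x) e_t.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def train(W0, data, lr=0.1):
    """Run SGD on 0.5 * ||W x - target||^2 and log every (x_t, e_t) pair."""
    W = [row[:] for row in W0]
    log = []
    for x_t, target in data:
        y_t = matvec(W, x_t)
        # Scaled error signal: e_t = -lr * dL/dy_t (an assumption of this sketch).
        e_t = [-lr * (y - t) for y, t in zip(y_t, target)]
        # Rank-one update W += e_t x_t^T.
        for i, e in enumerate(e_t):
            for j, xj in enumerate(x_t):
                W[i][j] += e * xj
        log.append((x_t, e_t))
    return W, log


def primal(W, x):
    """Primal form: use the trained weights directly."""
    return matvec(W, x)


def dual(W0, log, x):
    """Dual form: initial output plus error signals weighted by x_t . x."""
    y = matvec(W0, x)
    for x_t, e_t in log:
        weight = dot(x_t, x)
        y = [yi + weight * e for yi, e in zip(y, e_t)]
    return y


W0 = [[0.5, -0.2, 0.4], [0.1, 0.3, -0.6]]
data = [
    ([1.0, 0.0, 0.0], [1.0, 0.0]),
    ([1.0, 2.0, 0.0], [0.0, 1.0]),
    ([0.0, 1.0, 0.0], [1.0, 1.0]),
]
W, log = train(W0, data)

query = [0.3, -0.7, 0.0]
orthogonal_query = [0.0, 0.0, 1.0]  # orthogonal to every training input

print(primal(W, query), dual(W0, log, query))
print(primal(W, orthogonal_query), primal(W0, orthogonal_query))
```

For `orthogonal_query` every weight x_t · x is zero, so the trained layer gives the same output as the untrained one, which is the orthogonality observation in point (2).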
StreamingQA: A Benchmark for Adaptation to New Knowledge over Time in Question Answering Models,
by Adam Liska, Tomás Kociský, Elena Gribovskaya, Tayfun Terzi, Eren Sezener, Devang Agrawal, Cyprien de Masson d'Autume, Tim Scholtes et al.
Memory-Based Model Editing at Scale,
by Eric Mitchell, Charles Lin, Antoine Bosselut, Christopher D. Manning and Chelsea Finn
Improving Language Models by Retrieving from Trillions of Tokens,
by Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau et al.
Ditto: Fair and Robust Federated Learning Through Personalization,
by Tian Li, Shengyuan Hu, Ahmad Beirami and Virginia Smith
MASS: Masked Sequence to Sequence Pre-training for Language Generation,
by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu and Tie-Yan Liu
Born-Again Neural Networks,
by Tommaso Furlanello, Zachary Chase Lipton, Michael Tschannen, Laurent Itti and Anima Anandkumar

ICPADS

Load Balancing Optimization for Transformer in Distributed Environment,
by Delu Ma, Zhou Lei, Shengbo Chen and Peng Wang

ICSE

Towards JavaScript program repair with Generative Pre-trained Transformer (GPT-2),
by Márk Lajkó, Viktor Csuvik and László Vidács
Bridging Pre-trained Models and Downstream Tasks for Source Code Understanding,
by Deze Wang, Zhouyang Jia, Shanshan Li, Yue Yu, Yun Xiong, Wei Dong and Xiangke Liao
Jigsaw: Large Language Models meet Program Synthesis,
by Naman Jain, Skanda Vaidyanath, Arun Shankar Iyer, Nagarajan Natarajan, Suresh Parthasarathy, Sriram K. Rajamani and Rahul Sharma
Natural Attack for Pre-trained Models of Code,
by Zhou Yang, Jieke Shi, Junda He and David Lo
Using Pre-Trained Models to Boost Code Review Automation,
by Rosalia Tufano, Simone Masiero, Antonio Mastropaolo, Luca Pascarella, Denys Poshyvanyk and Gabriele Bavota
What Do They Capture? - A Structural Analysis of Pre-Trained Language Models for Source Code,
by Yao Wan, Wei Zhao, Hongyu Zhang, Yulei Sui, Guandong Xu and Hai Jin
Fast Changeset-based Bug Localization with BERT,
by Agnieszka Ciborowska and Kostadin Damevski
Traceability Transformed: Generating more Accurate Links with Pre-Trained BERT Models,
by Jinfeng Lin, Yalin Liu, Qingkai Zeng, Meng Jiang and Jane Cleland-Huang

ICSME

Sentiment analysis for software engineering: How far can pre-trained transformer models go?,
by Ting Zhang, Bowen Xu, Ferdian Thung, Stefanus Agus Haryono, David Lo and Lingxiao Jiang

IJCAI

Meta-Learning Based Knowledge Extrapolation for Knowledge Graphs in the Federated Setting,
by Mingyang Chen, Wen Zhang, Zhen Yao, Xiangnan Chen, Mengxiao Ding, Fei Huang and Huajun Chen
Deep Learning Meets Software Engineering: A Survey on Pre-Trained Models of Source Code,
by Changan Niu, Chuanyi Li, Bin Luo and Vincent Ng
Federated Learning with Fair Averaging,
by Zheng Wang, Xiaoliang Fan, Jianzhong Qi, Chenglu Wen, Cheng Wang and Rongshan Yu

IJRR

Learning reward functions from diverse sources of human feedback: Optimally integrating demonstrations and preferences,
by Erdem Biyik, Dylan P. Losey, Malayandi Palan, Nicholas C. Landolfi, Gleb Shevchuk and Dorsa Sadigh

ISSTA

An extensive study on pre-trained models for program understanding and generation,
by Zhengran Zeng, Hanzhuo Tan, Haotian Zhang, Jing Li, Yuqun Zhang and Lingming Zhang
CIRCLE: continual repair across programming languages,
by Wei Yuan, Quanjun Zhang, Tieke He, Chunrong Fang, Nguyen Quoc Viet Hung, Xiaodong Hao and Hongzhi Yin

ISoLA

Measuring Convergence Inertia: Online Learning in Self-adaptive Systems with Context Shifts,
by Elvin Alberts and Ilias Gerostathopoulos

JAIR

Towards Continual Reinforcement Learning: A Review and Perspectives,
by Khimya Khetarpal, Matthew Riemer, Irina Rish and Doina Precup

JIS

Fairness and accuracy in horizontal federated learning,
by Wei Huang, Tianrui Li, Dexian Wang, Shengdong Du, Junbo Zhang and Tianqiang Huang

JKSUCIS

The survey: Text generation models in deep learning,
by Touseef Iqbal and Shaima Qureshi

JMLR

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer,
by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li et al.

KDD

All in One: Multi-Task Prompting for Graph Neural Networks,
by Xiangguo Sun, Hong Cheng, Jia Li, Bo Liu and Jihong Guan
JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding,
by Wayne Xin Zhao, Kun Zhou, Zheng Gong, Beichen Zhang, Yuanhang Zhou, Jing Sha, Zhigang Chen, Shijin Wang et al.
Mask and Reason: Pre-Training Knowledge Graph Transformers for Complex Logical Queries,
by Xiao Liu, Shiyu Zhao, Kai Su, Yukuo Cen, Jiezhong Qiu, Mengdi Zhang, Wei Wu, Yuxiao Dong et al.
GPPT: Graph Pre-training and Prompt Tuning to Generalize Graph Neural Networks,
by Mingchen Sun, Kaixiong Zhou, Xin He, Ying Wang and Xin Wang

KIS

From distributed machine learning to federated learning: a survey,
by Ji Liu, Jizhou Huang, Yang Zhou, Xuhong Li, Shilei Ji, Haoyi Xiong and Dejing Dou

MM

Pre-training Graph Transformer with Multimodal Side Information for Recommendation,
by Yong Liu, Susen Yang, Chenyi Lei, Guoxin Wang, Haihong Tang, Juyong Zhang, Aixin Sun and Chunyan Miao

MSR

Applying CodeBERT for Automated Program Repair of Java Simple Bugs,
by Ehsan Mashhadi and Hadi Hemmati

NAACL

Prompting Few-shot Multi-hop Question Generation via Comprehending Type-aware Semantics,
by Zefeng Lin, Weidong Chen, Yan Song and Yongdong Zhang
Do Prompt-Based Models Really Understand the Meaning of Their Prompts?,
by Albert Webson and Ellie Pavlick
MetaICL: Learning to Learn In Context,
by Sewon Min, Mike Lewis, Luke Zettlemoyer and Hannaneh Hajishirzi
MetaICL proposes a supervised meta-training framework to enable LMs to more effectively learn a new task in context. In MetaICL, each meta-training example includes several training examples from one task that will be presented together as a single sequence to the LM, and the prediction of the final example is used to calculate the loss.
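The way a MetaICL meta-training example is assembled can be sketched as follows. This is an illustrative sketch only: whitespace tokenization and the function name `build_meta_example` are assumptions, and a real setup would use the LM's tokenizer and pass the mask to the loss function.

```python
def build_meta_example(demos, final_input, final_output):
    """Concatenate k (input, output) demos from one task with a final input,
    and mark only the final output tokens as contributing to the loss."""
    tokens, loss_mask = [], []
    for x, y in demos:
        for tok in (x + " " + y).split():
            tokens.append(tok)
            loss_mask.append(0)  # demonstrations serve as context only
    for tok in final_input.split():
        tokens.append(tok)
        loss_mask.append(0)  # the final input is also context
    for tok in final_output.split():
        tokens.append(tok)
        loss_mask.append(1)  # loss is computed on the final prediction
    return tokens, loss_mask


demos = [("great film", "positive"), ("dull plot", "negative")]
tokens, loss_mask = build_meta_example(demos, "lovely acting", "positive")
print(list(zip(tokens, loss_mask)))
```

Masking the demonstrations keeps them in the sequence as conditioning context while the gradient comes only from the prediction for the final example.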
Improving In-Context Few-Shot Learning via Self-Supervised Training,
by Mingda Chen, Jingfei Du, Ramakanth Pasunuru, Todor Mihaylov, Srini Iyer, Veselin Stoyanov and Zornitsa Kozareva
This paper proposes to use self-supervision (MLM, NSP, CL, etc.) between pre-training and downstream usage to teach the LM to perform in-context learning. Analysis reveals that:
(1) the benefit of self-supervised training depends on the amount of training data,
(2) semantic similarity between training and evaluation tasks matters,
(3) adding training objectives without diversity does not help,
(4) model performance improves when choosing similar templates for both self-supervised and downstream tasks,
(5) self-supervised tasks and human-annotated datasets are complementary,
(6) self-supervised-trained models are better at following task instructions.
Learning To Retrieve Prompts for In-Context Learning,
by Ohad Rubin, Jonathan Herzig and Jonathan Berant
This paper proposes a method to retrieve good contexts for in-context learning. Specifically, the method
(1) uses an unsupervised retriever (BM25/SBERT) to obtain a set of context candidates,
(2) passes the candidates to a scoring model (GPT-Neo/GPT-J/GPT-3/Codex) and selects the top/bottom k as positive/negative examples,
(3) uses the examples to train a dense retriever (BERT-based).
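Steps (1) and (2) above can be sketched with toy stand-ins. Everything here is an assumption made for illustration: token overlap replaces BM25/SBERT, a hard-coded score table replaces the scoring LM, and the function names are hypothetical. Step (3), training the dense retriever on the resulting positives and negatives, is not shown.

```python
def overlap_retrieve(query, pool, n):
    """Stand-in for the unsupervised retriever: rank the pool by the number
    of lowercase tokens shared with the query and keep the top n."""
    q = set(query.lower().split())
    return sorted(pool, key=lambda c: -len(q & set(c.lower().split())))[:n]


def label_candidates(candidates, score_fn, k):
    """Score each candidate and return (top-k positives, bottom-k negatives).
    Choose k <= len(candidates) // 2 so the two sets cannot overlap."""
    ranked = sorted(candidates, key=score_fn, reverse=True)
    return ranked[:k], ranked[-k:]


pool = [
    "what is the capital of france",
    "what is the capital of spain",
    "who wrote hamlet",
    "what is the largest city of france",
    "how tall is mount everest",
]
query = "what is the capital of italy"
candidates = overlap_retrieve(query, pool, n=3)

# Hypothetical scores standing in for the scoring LM's judgement of how
# helpful each candidate is as a prompt for this query.
toy_scores = {
    "what is the capital of france": 0.7,
    "what is the capital of spain": 0.9,
    "what is the largest city of france": 0.2,
}
positives, negatives = label_candidates(candidates, toy_scores.get, k=1)
print(positives, negatives)
```

The (query, positive, negative) triples produced this way are the kind of supervision a contrastively trained dense retriever would consume.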
Lifelong Pretraining: Continually Adapting Language Models to Emerging Corpora,
by Xisen Jin, Dejiao Zhang, Henghui Zhu, Wei Xiao, Shang-Wen Li, Xiaokai Wei, Andrew O. Arnold and Xiang Ren
Pretrained Models for Multilingual Federated Learning,
by Orion Weller, Marc Marone, Vladimir Braverman, Dawn J. Lawrie and Benjamin Van Durme
CODE-MVP: Learning to Represent Source Code from Multiple Views with Contrastive Pre-Training,
by Xin Wang, Yasheng Wang, Yao Wan, Jiawei Wang, Pingyi Zhou, Li Li, Hao Wu and Jin Liu
Word-Label Alignment for Event Detection: A New Perspective via Optimal Transport,
by Amir Pouran Ben Veyseh and Thien Huu Nguyen
What Makes Good In-Context Examples for GPT-3?,
by Jiachang Liu, Dinghan Shen, Yizhe Zhang, Bill Dolan, Lawrence Carin and Weizhu Chen
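(1) Explores which demonstration examples help GPT-3 in in-context learning;
(2) Encodes the examples with RoBERTa and computes the vector distance (Euclidean distance) between each demonstration and the test example, finding that demonstrations closer to the test example yield better results.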
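Distance-based demonstration selection of the kind this note describes can be sketched as below. The two-dimensional vectors are made-up placeholders for sentence embeddings (the note says RoBERTa is used as the encoder), and `select_demonstrations` is a hypothetical helper name.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def select_demonstrations(test_vec, pool, k):
    """Return the k pool entries whose embeddings are closest to test_vec."""
    ranked = sorted(pool, key=lambda item: euclidean(item["vec"], test_vec))
    return ranked[:k]


# Toy pool: "vec" values are placeholders, not real encoder outputs.
pool = [
    {"text": "The movie was wonderful.", "label": "positive", "vec": [0.9, 0.1]},
    {"text": "Terrible service.", "label": "negative", "vec": [0.1, 0.9]},
    {"text": "I loved the soundtrack.", "label": "positive", "vec": [0.8, 0.2]},
]
test_vec = [0.88, 0.1]

chosen = select_demonstrations(test_vec, pool, k=2)
for demo in chosen:
    print(demo["text"], "->", demo["label"])
```

The selected demonstrations would then be placed in the prompt ahead of the test example.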
MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation,
by Simiao Zuo, Qingru Zhang, Chen Liang, Pengcheng He, Tuo Zhao and Weizhu Chen
Symbolic Knowledge Distillation: from General Language Models to Commonsense Models,
by Peter West, Chandra Bhagavatula, Jack Hessel, Jena D. Hwang, Liwei Jiang, Ronan Le Bras, Ximing Lu, Sean Welleck et al.
Ask what's missing and what's useful: Improving Clarification Question Generation using Global Knowledge,
by Bodhisattwa Prasad Majumder, Sudha Rao, Michel Galley and Julian J. McAuley
Progressive Generation of Long Text with Pretrained Language Models,
by Bowen Tan, Zichao Yang, Maruan Al-Shedivat, Eric P. Xing and Zhiting Hu
A Simple and Efficient Multi-Task Learning Approach for Conditioned Dialogue Generation,
by Yan Zeng and Jian-Yun Nie
Unified Pre-training for Program Understanding and Generation,
by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray and Kai-Wei Chang
Action-Based Conversations Dataset: A Corpus for Building More In-Depth Task-Oriented Dialogue Systems,
by Derek Chen, Howard Chen, Yi Yang, Alexander Lin and Zhou Yu
Fine-grained Post-training for Improving Retrieval-based Dialogue Systems,
by Janghoon Han, Taesuk Hong, Byoungjae Kim, Youngjoong Ko and Jungyun Seo
Improving Biomedical Pretrained Language Models with Knowledge,
by Zheng Yuan, Yijia Liu, Chuanqi Tan, Songfang Huang and Fei Huang
Towards Continual Learning for Multilingual Machine Translation via Vocabulary Substitution,
by Xavier Garcia, Noah Constant, Ankur Parikh and Orhan Firat
Introduces the catastrophic forgetting problem in incremental multilingual translation and uses vocabulary substitution to alleviate it.
Continual Learning for Text Classification with Information Disentanglement Based Regularization,
by Yufan Huang, Yanzhe Zhang, Jiaao Chen, Xuezhi Wang and Diyi Yang
Proposes a regularization-based method for continual text classification, with next sentence prediction and task-id prediction as auxiliary tasks.
Incremental Few-shot Text Classification with Multi-round New Classes: Formulation, Dataset and System,
by Congying Xia, Wenpeng Yin, Yihao Feng and Philip Yu
Proposes a new setting and a corresponding benchmark for incremental few-shot text classification, modeling continual text classification as textual entailment.
Hyperparameter-free Continuous Learning for Domain Classification in Natural Language Understanding,
by Ting Hua, Yilin Shen, Changsheng Zhao, Yen-Chang Hsu and Hongxia Jin
Inspired by EWC, proposes a hyperparameter-free (Fisher information-based) sampling method for memory replay.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,
by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova

NAACL

Can Knowledge Graphs Reduce Hallucinations in LLMs? : A Survey,
by Garima Agrawal, Tharindu Kumarage, Zeyad Alghami and Huan Liu

NeurIPS

GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph,
by Xin Li, Dongze Lian, Zhihe Lu, Jiawang Bai, Zhibo Chen and Xinchao Wang
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks,
by Wenhai Wang, Zhe Chen, Xiaokang Chen, Jiannan Wu, Xizhou Zhu, Gang Zeng, Ping Luo, Tong Lu et al.
Visual Instruction Tuning,
by Haotian Liu, Chunyuan Li, Qingyang Wu and Yong Jae Lee
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning,
by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung et al.
GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction,
by Rui Yang, Lin Song, Yanwei Li, Sijie Zhao, Yixiao Ge, Xiu Li and Ying Shan
LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark,
by Zhenfei Yin, Jiong Wang, Jianjian Cao, Zhelun Shi, Dingning Liu, Mukai Li, Xiaoshui Huang, Zhiyong Wang et al.
Meta-in-context learning in large language models,
by Julian Coda-Forno, Marcel Binz, Zeynep Akata, Matt M. Botvinick, Jane X. Wang and Eric Schulz
DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models,
by Ge Zheng, Bin Yang, Jiajin Tang, Hong-Yu Zhou and Sibei Yang
Dissecting Chain-of-Thought: Compositionality through In-Context Filtering and Learning,
by Yingcong Li, Kartik Sreenivasan, Angeliki Giannou, Dimitris Papailiopoulos and Samet Oymak
Schema-learning and rebinding as mechanisms of in-context learning and emergence,
by Sivaramakrishnan Swaminathan, Antoine Dedieu, Rajkumar Vasudeva Raju, Murray Shanahan, Miguel Lázaro-Gredilla and Dileep George
Sparse Structure Search for Delta Tuning,
by Shengding Hu, Zhen Zhang, Ning Ding, Yadao Wang, Yasheng Wang, Zhiyuan Liu and Maosong Sun
STaR: Self-taught reasoner bootstrapping reasoning with reasoning,
by Eric Zelikman, Jesse Mu, Noah D. Goodman and Yuhuai Tony Wu
Locating and editing factual associations in GPT,
by Kevin Meng, David Bau, Alex J. Andonian and Yonatan Belinkov
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation,
by Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin B. Clement, Dawn Drain et al.
Revisiting the Calibration of Modern Neural Networks,
by Matthias Minderer, Josip Djolonga, Rob Romijnders, Frances Hubis, Xiaohua Zhai, Neil Houlsby, Dustin Tran and Mario Lucic
Soft Calibration Objectives for Neural Networks,
by Archit Karandikar, Nicholas Cain, Dustin Tran, Balaji Lakshminarayanan, Jonathon Shlens, Michael C. Mozer and Becca Roelofs
Achieving Forgetting Prevention and Knowledge Transfer in Continual Learning,
by Zixuan Ke, Bing Liu, Nianzu Ma, Hu Xu and Lei Shu
NeurIPS 2021. The key component of CTR is the CL-plugin inserted into BERT. A CL-plugin is a capsule network with a new transfer routing mechanism that encourages knowledge transfer among tasks and isolates task-specific knowledge to avoid forgetting.
Language Models are Few-Shot Learners,
by Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam et al.
Large-Scale Adversarial Training for Vision-and-Language Representation Learning,
by Zhe Gan, Yen-Chun Chen, Linjie Li, Chen Zhu, Yu Cheng and Jingjing Liu
A Simple Language Model for Task-Oriented Dialogue,
by Ehsan Hosseini-Asl, Bryan McCann, Chien-Sheng Wu, Semih Yavuz and Richard Socher
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks,
by Patrick S. H. Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis et al.
Learning to summarize with human feedback,
by Nisan Stiennon, Long Ouyang, Jeffrey Wu, Daniel M. Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei et al.
Unified Language Model Pre-training for Natural Language Understanding and Generation,
by Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou et al.
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks,
by Jiasen Lu, Dhruv Batra, Devi Parikh and Stefan Lee
Episodic Memory in Lifelong Language Learning,
by Cyprien de Masson d'Autume, Sebastian Ruder, Lingpeng Kong and Dani Yogatama
MbPA++. This paper proposes using memory (a fixed memory network) in lifelong learning to prevent catastrophic forgetting through experience replay and local adaptation.
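The two uses of memory named above, replay and local adaptation, can be sketched with a small buffer. This is a rough illustration under assumptions, not the MbPA++ implementation: `EpisodicMemory` and its methods are hypothetical, keys are hand-written vectors, and no model training is shown.

```python
import math
import random
from typing import List, Sequence, Tuple

Example = Tuple[Tuple[float, ...], str]  # (key vector, label)


class EpisodicMemory:
    """Stores seen examples for later replay and nearest-neighbour retrieval."""

    def __init__(self, seed: int = 0) -> None:
        self._items: List[Example] = []
        self._rng = random.Random(seed)

    def write(self, key: Sequence[float], label: str) -> None:
        self._items.append((tuple(key), label))

    def sample(self, n: int) -> List[Example]:
        """Experience replay: draw stored examples to mix into training."""
        return self._rng.sample(self._items, min(n, len(self._items)))

    def nearest(self, query: Sequence[float], k: int) -> List[Example]:
        """Local adaptation: fetch the k stored examples closest to the query."""
        ranked = sorted(self._items, key=lambda item: math.dist(query, item[0]))
        return ranked[:k]


memory = EpisodicMemory()
memory.write((0.1, 0.0), "task_a")
memory.write((0.9, 0.8), "task_b")
memory.write((0.2, 0.1), "task_a")

replay_batch = memory.sample(2)
neighbours = memory.nearest((0.0, 0.0), k=2)
```

In a full system the replay batch would be interleaved with new-task batches, and the retrieved neighbours would drive a few gradient steps at inference time.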
Deep Reinforcement Learning from Human Preferences,
by Paul F. Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg and Dario Amodei
Zero-shot Learning with Semantic Output Codes,
by Mark Palatucci, Dean Pomerleau, Geoffrey E. Hinton and Tom M. Mitchell

NeurIPS

News Summarization and Evaluation in the Era of GPT-3,
by Tanya Goyal, Junyi Jessy Li and Greg Durrett
Deep Bidirectional Language-Knowledge Graph Pretraining,
by Michihiro Yasunaga, Antoine Bosselut, Hongyu Ren, Xikun Zhang, Christopher D. Manning, Percy Liang and Jure Leskovec
Unsupervised Representation Learning from Pre-trained Diffusion Probabilistic Models,
by Zijian Zhang, Zhou Zhao and Zhijie Lin
The unreliability of explanations in few-shot prompting for textual reasoning,
by Xi Ye and Greg Durrett

OSDI

Ray: A Distributed Framework for Emerging AI Applications,
by Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, Zongheng Yang et al.

OpenAI

GPT-4 Technical Report,
by OpenAI
GPT-4 System Card,
by OpenAI
Language Models are Unsupervised Multitask Learners,
by Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever
Improving language understanding by generative pre-training,
by Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever et al.

SIGIR

Are Graph Augmentations Necessary?: Simple Graph Contrastive Learning for Recommendation,
by Junliang Yu, Hongzhi Yin, Xin Xia, Tong Chen, Lizhen Cui and Quoc Viet Hung Nguyen
Rethinking Reinforcement Learning for Recommendation: A Prompt Perspective,
by Xin Xin, Tiago Pimentel, Alexandros Karatzoglou, Pengjie Ren, Konstantina Christakopoulou and Zhaochun Ren
Knowledge-based Review Generation by Coherence Enhanced Text Planning,
by Junyi Li, Wayne Xin Zhao, Zhicheng Wei, Nicholas Jing Yuan and Ji-Rong Wen
DSGPT: Domain-Specific Generative Pre-Training of Transformers for Text Generation in E-commerce Title and Review Summarization,
by Xueying Zhang, Yunjiang Jiang, Yue Shang, Zhaomeng Cheng, Chi Zhang, Xiaochuan Fan, Yun Xiao and Bo Long

T-PAMI

A Continual Learning Survey: Defying Forgetting in Classification Tasks,
by Matthias De Lange, Rahaf Aljundi, Marc Masana, Sarah Parisot, Xu Jia, Ales Leonardis, Gregory G. Slabaugh and Tinne Tuytelaars

TACL

Time-Aware Language Models as Temporal Knowledge Bases,
by Bhuwan Dhingra, Jeremy R. Cole, Julian Martin Eisenschlos, Daniel Gillick, Jacob Eisenstein and William W. Cohen
Pretraining the Noisy Channel Model for Task-Oriented Dialogue,
by Qi Liu, Lei Yu, Laura Rimell and Phil Blunsom
Measuring and Improving Consistency in Pretrained Language Models,
by Yanai Elazar, Nora Kassner, Shauli Ravfogel, Abhilasha Ravichander, Eduard H. Hovy, Hinrich Schütze and Yoav Goldberg
A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation,
by Jian Guan, Fei Huang, Minlie Huang, Zhihao Zhao and Xiaoyan Zhu
Leveraging Pre-trained Checkpoints for Sequence Generation Tasks,
by Sascha Rothe, Shashi Narayan and Aliaksei Severyn
A Primer in BERTology: What We Know About How BERT Works,
by Anna Rogers, Olga Kovaleva and Anna Rumshisky

TIST

FedBERT: When Federated Learning Meets Pre-training,
by Yuanyishu Tian, Yao Wan, Lingjuan Lyu, Dezhong Yao, Hai Jin and Lichao Sun

TKDE

Give Us the Facts: Enhancing Large Language Models with Knowledge Graphs for Fact-aware Language Modeling,
by Linyao Yang, Hongyang Chen, Zhao Li, Xiao Ding and Xindong Wu
A Survey on Knowledge-Enhanced Pre-trained Language Models,
by Chaoqi Zhen, Yanlei Shang, Xiangyu Liu, Yifei Li, Yong Chen and Dell Zhang
A Survey on Knowledge Graph-Based Recommender Systems,
by Qingyu Guo, Fuzhen Zhuang, Chuan Qin, Hengshu Zhu, Xing Xie, Hui Xiong and Qing He

TNSE

Federated Learning Meets Multi-Objective Optimization,
by Zeou Hu, Kiarash Shaloudegi, Guojun Zhang and Yaoliang Yu

TOIS

Disentangled Representations Learning for Multi-Target Cross-Domain Recommendation,
by Xiaobo Guo, Shaoshuai Li, Naicheng Guo, Jiangxia Cao, Xiaolei Liu, Qiongxu Ma, Runsheng Gan and Yunan Zhao

VLDB

Selective Data Acquisition in the Wild for Model Charging,
by Chengliang Chai, Jiabin Liu, Nan Tang, Guoliang Li and Yuyu Luo
PyTorch Distributed: Experiences on Accelerating Data Parallel Training,
by Shen Li, Yanli Zhao, Rohan Varma, Omkar Salpekar, Pieter Noordhuis, Teng Li, Adam Paszke, Jeff Smith et al.

WASA

Multi-view Pre-trained Model for Code Vulnerability Identification,
by Xuxiang Jiang, Yinhao Xiao, Jun Wang and Wei Zhang

WWW

Ontology-enhanced Prompt-tuning for Few-shot Learning,
by Hongbin Ye, Ningyu Zhang, Shumin Deng, Xiang Chen, Hui Chen, Feiyu Xiong, Xi Chen and Huajun Chen
Slot Self-Attentive Dialogue State Tracking,
by Fanghua Ye, Jarana Manotumruksa, Qiang Zhang, Shenghui Li and Emine Yilmaz