Publications
Preprint
Pre-trained language models (PLMs) have achieved notable success in natural language generation (NLG) tasks. To date, most PLMs are pre-trained in an unsupervised manner on large-scale general corpora. Meanwhile, a growing number of models pre-trained with labeled data have shown superior performance compared to unsupervised models. Motivated by the success of supervised pre-training, we propose Multi-task superVised Pre-training (MVP) for natural language generation. To pre-train the text generation model MVP, we collect a labeled pre-training corpus from 45 datasets over seven generation tasks. For each task, we further pre-train task-specific soft prompts to stimulate the model's capacity for performing that task. Extensive experiments demonstrate the effectiveness of our supervised pre-training on a number of NLG tasks, and our general methods achieve state-of-the-art performance on 12 of 17 datasets.
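Task-specific soft prompts can be pictured as small sets of learnable vectors prepended to the model input for each generation task. The minimal PyTorch sketch below illustrates this general idea only; the class, its parameters, and the HuggingFace-style backbone signature are illustrative assumptions, not the released MVP code.

```python
# Illustrative sketch of task-specific soft prompts for a seq2seq PLM
# (names and the backbone interface are assumptions, not the MVP release).
import torch
import torch.nn as nn

class SoftPromptedSeq2Seq(nn.Module):
    def __init__(self, backbone, hidden_size, tasks, prompt_len=100):
        super().__init__()
        self.backbone = backbone  # assumed HuggingFace-style encoder-decoder PLM
        # One learnable prompt per generation task (e.g., summarization, data-to-text).
        self.prompts = nn.ParameterDict({
            task: nn.Parameter(torch.randn(prompt_len, hidden_size) * 0.02)
            for task in tasks
        })

    def forward(self, task, input_embeds, attention_mask, labels):
        batch = input_embeds.size(0)
        prompt = self.prompts[task].unsqueeze(0).expand(batch, -1, -1)
        # Prepend the task prompt to the token embeddings and extend the mask.
        inputs = torch.cat([prompt, input_embeds], dim=1)
        prompt_mask = attention_mask.new_ones(batch, prompt.size(1))
        mask = torch.cat([prompt_mask, attention_mask], dim=1)
        return self.backbone(inputs_embeds=inputs, attention_mask=mask, labels=labels)
```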
Text generation aims to produce plausible and readable text in human language from input data. The resurgence of deep learning has greatly advanced this field through neural generation models, especially the paradigm of pre-trained language models (PLMs). Grounding text generation in PLMs is seen as a promising direction in both academia and industry. In this survey, we present the recent advances achieved in the topic of PLMs for text generation. In detail, we begin by introducing three key points of applying PLMs to text generation: 1) how to encode the input data as representations that preserve input semantics and can be fused into PLMs; 2) how to design a universal and performant architecture of PLMs to serve as generation models; and 3) how to optimize PLMs given the reference text and ensure that the generated text satisfies special text properties. We then identify several challenges and future directions within each key point. Next, we present a summary of various useful resources and typical text generation applications that work with PLMs. Finally, we conclude and summarize the contributions of this survey.
WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training
Yuqi Huo, Manli Zhang, Guangzhen Liu, Haoyu Lu, Yizhao Gao, Guoxing Yang, Jingyuan Wen,
Heng Zhang, Baogui Xu,
Weihao Zheng, Zongzheng Xi, Yueqian Yang, Anwen Hu, Jinming Zhao, Ruichen Li,
Yida Zhao, Liang Zhang,
Yuqing Song, Xin Hong, Wanqing Cui, Danyang Hou, Yingyan Li, Junyi Li,
Peiyu Liu, Zheng Gong,
Chuhao Jin, Yuchong Sun, Shizhe Chen, Zhiwu Lu*, Zhicheng Dou, Qin Jin,
Yanyan Lan, Wayne Xin Zhao,
Ruihua Song*, Ji-Rong Wen*
pdf / code
Multi-modal pre-training models have been intensively explored to bridge vision and language in recent years. However, most of them explicitly model the cross-modal interaction between image-text pairs by assuming that there exists a strong semantic correlation between the text and image modalities. Since this strong assumption is often invalid in real-world scenarios, we choose to implicitly model the cross-modal correlation for large-scale multi-modal pre-training, which is the focus of the Chinese project 'WenLan' led by our team. Specifically, under a weak correlation assumption over image-text pairs, we propose a two-tower pre-training model called BriVL within the cross-modal contrastive learning framework. Unlike OpenAI CLIP, which adopts a simple contrastive learning method, we devise a more advanced algorithm by adapting the recent MoCo method to the cross-modal scenario. By building a large queue-based dictionary, BriVL can incorporate more negative samples under limited GPU resources. We further construct a large Chinese multi-source image-text dataset called RUC-CAS-WenLan for pre-training our BriVL model. Extensive experiments demonstrate that the pre-trained BriVL model outperforms both UNITER and OpenAI CLIP on various downstream tasks.
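The queue-based dictionary amounts to an InfoNCE-style contrastive loss whose negatives come from a FIFO memory of embeddings produced in earlier batches. The sketch below illustrates this cross-modal variant in PyTorch; the tensor names and the single-direction loss are simplifying assumptions, not the BriVL implementation.

```python
# Illustrative sketch of MoCo-style, queue-based cross-modal contrastive learning.
import torch
import torch.nn.functional as F

def contrastive_loss_with_queue(img_emb, txt_emb, queue, temperature=0.07):
    """img_emb, txt_emb: (B, D) L2-normalized embeddings from the two towers.
    queue: (K, D) L2-normalized text embeddings from previous batches (negatives)."""
    pos = (img_emb * txt_emb).sum(dim=1, keepdim=True)   # (B, 1) positive logits
    neg = img_emb @ queue.t()                             # (B, K) negative logits
    logits = torch.cat([pos, neg], dim=1) / temperature
    # The positive pair sits at index 0 of each row.
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)

def update_queue(queue, txt_emb):
    """FIFO update: enqueue the current text embeddings, dequeue the oldest."""
    return torch.cat([txt_emb.detach(), queue], dim=0)[: queue.size(0)]
```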
2022
We consider the text generation task under the approach of pre-trained language models (PLMs). Typically, an auto-regressive (AR) paradigm is adopted to generate text in a token-by-token manner. Despite the many advantages of AR generation, it is widely criticized for its inefficient inference. Therefore, non-autoregressive (NAR) models have been proposed to generate all target tokens simultaneously. However, NAR models usually generate text of lower quality due to the absence of token dependency in the output text. In this paper, we propose ELMER, an efficient and effective PLM for NAR text generation, to explicitly model token dependency during NAR generation. By leveraging the early exit technique, ELMER enables token generation at different layers according to prediction confidence (a more confident token exits at a lower layer). Besides, we propose a novel Layer Permutation Language Modeling objective to pre-train ELMER by permuting the exit layer for each token in a sequence. Experiments on three text generation tasks show that ELMER significantly outperforms NAR models and further narrows the performance gap with AR PLMs (e.g., 29.92 ROUGE-L for ELMER vs. 30.61 for BART on XSum) while achieving over 10x inference speedup.
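The early-exit mechanism can be pictured as follows: every decoder layer is followed by the language-model head, and a token stops being re-predicted once its confidence passes a threshold. The sketch below is a simplified illustration of this idea (in particular, it only freezes predicted tokens, not their hidden states); the layer and head interfaces are assumptions, not the released ELMER code.

```python
# Simplified sketch of confidence-based early exit for non-autoregressive decoding.
import torch

@torch.no_grad()
def early_exit_decode(layers, lm_head, hidden, threshold=0.9):
    """hidden: (B, T, D) initial decoder states; each token exits at the first
    layer whose prediction confidence exceeds `threshold`."""
    B, T, _ = hidden.shape
    tokens = torch.full((B, T), -1, dtype=torch.long, device=hidden.device)
    exited = torch.zeros(B, T, dtype=torch.bool, device=hidden.device)
    for layer in layers:
        hidden = layer(hidden)                    # update all positions in parallel
        probs = lm_head(hidden).softmax(dim=-1)   # (B, T, V) per-token distributions
        conf, pred = probs.max(dim=-1)
        exit_now = (conf >= threshold) & ~exited
        tokens[exit_now] = pred[exit_now]         # freeze confident tokens at this layer
        exited |= exit_now
    tokens[~exited] = pred[~exited]               # remaining tokens exit at the last layer
    return tokens
```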
TextBox 2.0: A Text Generation Library with Pre-trained Language Models
Tianyi Tang†,
Junyi Li†,
Zhipeng Chen†,
Yiwen Hu,
Zhuohao Yu,
Wenxun Dai,
Wayne Xin Zhao*,
Jian-Yun Nie,
Ji-Rong Wen
The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022, System Demonstration
pdf / code
To facilitate research on text generation, this paper presents a comprehensive, unified, and standardized library, TextBox 2.0, focusing on the use of pre-trained language models (PLMs). To be comprehensive, our library considers 13 common text generation tasks and their corresponding 83 datasets, and incorporates 36 PLMs covering general, translation, dialogue, controllable, distilled, Chinese, and lightweight PLMs. We also implement 4 efficient training strategies and provide 4 generation objectives for pre-training new PLMs from scratch. To be unified and standardized, we carefully design the interfaces along the research pipeline (from data loading to training and evaluation), ensuring that each step can be conducted in a unified, standard way. Though comprehensive and powerful, our library is simple to use, through either a friendly Python API or the command line. In addition, we perform extensive experiments to validate the effectiveness of our library and provide useful methods for analyzing the generated results.
Recently, pre-trained language models (PLMs) have achieved remarkable success in language generation. To leverage the rich knowledge encoded by PLMs, a simple yet powerful mechanism is to use prompts, in the form of either discrete tokens or continuous embeddings. In existing studies, manual prompts are time-consuming and require domain expertise, while continuous prompts are typically independent of the input. To address this issue, we propose a novel continuous prompting approach, called Context-Tuning, for fine-tuning PLMs on natural language generation. First, the prompts are derived from the input text so that they can elicit useful knowledge from PLMs for generation; we refer to such prompts as contextualized prompts. Second, to further enhance the relevance of the generated text to the input, we utilize continuous inverse prompting to refine the process of natural language generation by modeling an inverse generation process from output to input. Moreover, we propose a lightweight variant of context-tuning that fine-tunes only 0.4% of the parameters while retaining good performance.
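A contextualized prompt can be obtained by mapping a pooled representation of the input into a sequence of continuous prompt vectors, as in the minimal sketch below; the module and its pooling and projection choices are illustrative assumptions, not the released context-tuning code.

```python
# Illustrative sketch of contextualized prompts: continuous prompt vectors
# derived from the input text rather than shared across all inputs.
import torch
import torch.nn as nn

class ContextualPromptGenerator(nn.Module):
    def __init__(self, hidden_size, prompt_len=20):
        super().__init__()
        self.prompt_len = prompt_len
        # Maps one context vector to `prompt_len` prompt vectors.
        self.proj = nn.Linear(hidden_size, prompt_len * hidden_size)

    def forward(self, input_states, attention_mask):
        """input_states: (B, T, D) encoder states of the input text."""
        # Mean-pool the input token states into one context vector per example.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (input_states * mask).sum(1) / mask.sum(1).clamp(min=1e-6)
        prompts = self.proj(pooled).view(-1, self.prompt_len, input_states.size(-1))
        return prompts  # prepended to the PLM input embeddings, as in soft prompting
```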
As the Transformer architecture has evolved, pre-trained models have advanced at a breakneck pace in recent years. They have come to dominate the mainstream techniques in natural language processing (NLP) and computer vision (CV). How to adapt pre-training to the field of Vision-and-Language (V-L) learning and improve downstream task performance has become a focus of multimodal learning. In this paper, we review the recent progress in Vision-Language Pre-Trained Models (VL-PTMs). As the core content, we first briefly introduce several ways to encode raw images and texts into single-modal embeddings before pre-training. We then dive into the mainstream architectures of VL-PTMs for modeling the interaction between text and image representations. We further present widely used pre-training tasks and then introduce some common downstream tasks. We finally conclude this paper and present some promising research directions. Our survey aims to provide researchers with a synthesis of and pointers to related research.
Pre-trained language models (PLMs) have come to dominate the majority of NLP tasks. However, little research has been conducted on systematically evaluating the language abilities of PLMs. In this paper, we present a large-scale empirical study on the general language ability evaluation of PLMs (ElitePLM). In our study, we design four evaluation dimensions, i.e., memory, comprehension, reasoning, and composition, to measure ten widely used PLMs within five categories. Our empirical results demonstrate that: (1) PLMs with varying training objectives and strategies are good at different ability tests; (2) fine-tuning PLMs on downstream tasks is usually sensitive to data size and distribution; (3) PLMs have excellent transferability between similar tasks. Moreover, our prediction results can be reused as an open resource for deeper and more fine-grained analysis of PLMs' language abilities. This paper can guide future work in choosing, applying, and designing PLMs for specific tasks.
Pre-trained language models (PLMs) have made remarkable progress on text generation tasks via fine-tuning. However, it is difficult to fine-tune PLMs in data-scarce situations, and it is therefore non-trivial to develop a general and lightweight model that can adapt to various text generation tasks based on PLMs. Prompt-based learning offers a potential solution, but there are two major challenges in applying prompt-based methods to data-scarce text generation tasks in a transferable setting. First, it is difficult to effectively transfer prompts to new tasks. Second, it is important to design an effective transfer strategy that considers both task- and instance-level information. To address these issues, we propose PTG, a novel prompt-based transfer learning approach for text generation. PTG learns a set of source prompts for various source generation tasks and then transfers these prompts to target generation tasks through an adaptive attention mechanism that considers both task- and instance-level information. In extensive experiments, PTG yields competitive or better results than fine-tuning methods. We will release our source prompts as an open-source library, which can be extended and reused to improve new generation tasks in future research.
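The adaptive transfer step can be pictured as attention from a task- and instance-level query over a bank of learned source prompts. The sketch below illustrates this with a single attention step; the shapes and the mean-pooled keys are assumptions, not the PTG implementation.

```python
# Illustrative sketch of composing a target-task prompt by attending over
# source prompts learned on other generation tasks.
import torch
import torch.nn.functional as F

def compose_target_prompt(query, source_prompts):
    """query: (B, D) task- and instance-level representation of the target input.
    source_prompts: (S, L, D) prompts learned on S source generation tasks.
    Returns a (B, L, D) prompt as an attention-weighted mixture of source prompts."""
    keys = source_prompts.mean(dim=1)                  # (S, D) one key per source prompt
    scores = query @ keys.t() / keys.size(-1) ** 0.5   # (B, S) scaled dot-product scores
    weights = F.softmax(scores, dim=-1)                # (B, S) attention weights
    return torch.einsum('bs,sld->bld', weights, source_prompts)
```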
2021
In this paper, we study the task of generating long and coherent text. In the literature, Generative Adversarial Network (GAN) based methods have been one of the mainstream approaches to generic text generation. We aim to improve two aspects of GAN-based methods in generic text generation, namely long-sequence optimization and semantic coherence enhancement. For this purpose, we propose a novel Multi-Level Generative Adversarial Network (MLGAN) for long and coherent text generation. Our approach explicitly models the text generation process at three different levels, namely paragraph-, sentence-, and word-level generation. At the top two levels, we generate continuous paragraph vectors and sentence vectors as semantic sketches to plan the entire content, while at the bottom level we generate discrete word tokens to realize the sentences. Furthermore, we utilize a conditional GAN architecture to enhance inter-sentence coherence by injecting paragraph vectors into sentence vector generation. Extensive experimental results have demonstrated the effectiveness of the proposed model.
TextBox: A Unified, Modularized, and Extensible Framework for Text Generation
Junyi Li†,
Tianyi Tang†,
Gaole He,
Jinhao Jiang,
Xiaoxuan Hu,
Puzhao Xie,
Zhipeng Chen,
Zhuohao Yu,
Wayne Xin Zhao*,
Ji-Rong Wen
The 59th Annual Meeting of the Association for Computational Linguistics (ACL), 2021, System Demonstration
pdf / code
We release an open library, called TextBox, which provides a unified, modularized, and extensible text generation framework. TextBox aims to support a broad set of text generation tasks and models. In TextBox, we implement several text generation models on benchmark datasets, covering the categories of VAEs, GANs, pre-trained language models, etc. Meanwhile, our library maintains sufficient modularity and extensibility by properly decomposing the model architecture, inference, and learning process into highly reusable modules, which makes it easy to incorporate new models into our framework. It is especially suitable for researchers and practitioners who want to efficiently reproduce baseline models and develop new models. TextBox is implemented based on PyTorch and released under the Apache License 2.0 at https://github.com/RUCAIBox/TextBox.
This paper studies how to automatically generate a natural language text that describes facts in a knowledge graph (KG). Considering the few-shot setting, we leverage the excellent capacities of pre-trained language models (PLMs) in language understanding and generation. We introduce three major technical contributions, namely representation alignment for bridging the semantic gap between KG encodings and PLMs, relation-biased KG linearization for deriving better input representations, and multi-task learning for learning the correspondence between KG and text. Extensive experiments on three benchmarks have demonstrated the effectiveness of our model on the KG-to-text generation task. In particular, our model outperforms existing systems in most few-shot settings.
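In its simplest form, KG linearization turns each (head, relation, tail) triple into a marked-up token sequence that a PLM can consume. The sketch below shows a generic linearizer; the special tokens are assumptions, and the relation-biased ordering described in the paper is not reproduced here.

```python
# Generic sketch of linearizing KG triples into a text sequence for a PLM.
def linearize_triples(triples):
    """triples: list of (head, relation, tail) strings."""
    parts = []
    for head, relation, tail in triples:
        # Special markers (<H>, <R>, <T>) are illustrative, not the paper's exact format.
        parts.append(f"<H> {head} <R> {relation} <T> {tail}")
    return " ".join(parts)

# Example usage:
# linearize_triples([("Alan Turing", "field", "computer science")])
# -> "<H> Alan Turing <R> field <T> computer science"
```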
Text generation has become one of the most important yet challenging tasks in natural language processing (NLP). The resurgence of deep learning has greatly advanced this field through neural generation models, especially the paradigm of pre-trained language models (PLMs). In this paper, we present an overview of the major advances achieved in the topic of PLMs for text generation. As preliminaries, we present the general task definition and briefly describe the mainstream architectures of PLMs. As the core content, we discuss how to adapt existing PLMs to model different input data and satisfy desired properties in the generated text. We further summarize several important fine-tuning strategies for text generation. Finally, we present several future directions and conclude this paper. Our survey aims to provide text generation researchers with a synthesis of and pointers to related research.
As a natural language generation task, it is challenging to generate informative and coherent review text. To enhance the informativeness of the generated text, existing solutions typically learn to copy entities or triples from knowledge graphs (KGs). However, they lack an overall consideration of how to select and arrange the incorporated knowledge, which tends to cause text incoherence. To address this issue, we focus on improving the entity-centric coherence of generated reviews by leveraging the semantic structure of KGs. In this paper, we propose a novel Coherence Enhanced Text Planning model (CETP) based on knowledge graphs to improve both the global and local coherence of review generation. The proposed model learns a two-level text plan for generating a document: (1) the document plan is modeled as an ordered sequence of sentence plans, and (2) each sentence plan is modeled as an entity-based subgraph of the KG. Local coherence is naturally enforced by KG subgraphs through intra-sentence correlations between entities. For global coherence, we design a hierarchical self-attentive architecture with both subgraph- and node-level attention to enhance the correlations between subgraphs. To our knowledge, we are the first to utilize a KG-based text planning model to enhance text coherence for review generation. Extensive experiments on three datasets confirm the effectiveness of our model in improving the content coherence of generated texts.
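The hierarchical self-attentive architecture can be sketched as node-level attention that pools each subgraph into a vector, followed by subgraph-level self-attention that relates the sentence plans to one another. The PyTorch sketch below illustrates this idea only; module names and shapes are assumptions, not the CETP implementation.

```python
# Illustrative sketch of hierarchical (node-level then subgraph-level) attention.
import torch
import torch.nn as nn

class HierarchicalPlanEncoder(nn.Module):
    def __init__(self, hidden_size, num_heads=4):
        super().__init__()
        self.node_attn = nn.Linear(hidden_size, 1)  # node-level attention scores
        self.subgraph_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)

    def forward(self, node_states):
        """node_states: (B, S, N, D) entity states for S subgraphs of N nodes each."""
        scores = self.node_attn(node_states).softmax(dim=2)    # (B, S, N, 1)
        subgraphs = (scores * node_states).sum(dim=2)          # (B, S, D) subgraph vectors
        # Subgraph-level self-attention models inter-sentence (global) coherence.
        planned, _ = self.subgraph_attn(subgraphs, subgraphs, subgraphs)
        return planned
```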
2020
Personalized review generation (PRG) aims to automatically produce review text reflecting user preferences, which is a challenging natural language generation task. Most previous studies do not explicitly model the factual descriptions of products and tend to generate uninformative content. Moreover, they mainly focus on word-level generation and cannot accurately reflect more abstract user preferences across multiple aspects. To address these issues, we propose a novel knowledge-enhanced PRG model based on capsule graph neural networks (Caps-GNN). We first construct a heterogeneous knowledge graph (HKG) to utilize rich item attributes, and adopt Caps-GNN to learn graph capsules that encode the underlying characteristics of the HKG. Our generation process contains two major steps, namely aspect sequence generation and sentence generation. First, based on the graph capsules, we adaptively learn aspect capsules for inferring the aspect sequence. Then, conditioned on the inferred aspect label, we design a graph-based copy mechanism to generate sentences by incorporating related entities or words from the HKG. To our knowledge, we are the first to utilize a knowledge graph for the PRG task. The incorporated KG information is able to enhance user preferences at both the aspect and word levels. Extensive experiments on three real-world datasets have demonstrated the effectiveness of our model on the PRG task.
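A graph-based copy mechanism resembles a pointer network: at each decoding step, the model mixes a vocabulary distribution with a distribution over graph entities, weighted by a copy gate. The sketch below is a generic pointer-style illustration; tensor names and shapes are assumptions, not the paper's exact model.

```python
# Generic sketch of mixing vocabulary generation with copying entity words from a graph.
import torch
import torch.nn.functional as F

def copy_or_generate(vocab_logits, copy_scores, entity_token_ids, p_copy):
    """vocab_logits: (B, V) decoder logits over the vocabulary.
    copy_scores: (B, E) scores over E candidate graph entities.
    entity_token_ids: (B, E) vocabulary ids of those entities.
    p_copy: (B, 1) copy-gate probability."""
    gen_dist = F.softmax(vocab_logits, dim=-1) * (1 - p_copy)
    copy_dist = F.softmax(copy_scores, dim=-1) * p_copy
    # Scatter copy probabilities onto the corresponding vocabulary positions.
    return gen_dist.scatter_add(1, entity_token_ids, copy_dist)
```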
The task of Knowledge Graph Completion (KGC) aims to automatically infer missing facts in a Knowledge Graph (KG). In this paper, we take a new perspective that aims to leverage rich user-item interaction data (user interaction data for short) to improve the KGC task. Our work is inspired by the observation that many KG entities correspond to online items in application systems. However, the two kinds of data sources have very different intrinsic characteristics, and a simple fusion strategy is likely to hurt the original representation performance. To address this challenge, we propose a novel adversarial learning approach that leverages user interaction data for the KGC task. Our generator is isolated from the user interaction data and improves itself according to feedback from the discriminator. The discriminator takes the useful information learned from the user interaction data as input and gradually enhances its evaluation capacity in order to identify the fake samples produced by the generator. To discover users' implicit entity preferences, we design an elaborate collaborative learning algorithm based on graph neural networks, which is jointly optimized with the discriminator. Such an approach effectively alleviates the issues of data heterogeneity and semantic complexity for the KGC task. Extensive experiments on three real-world datasets have demonstrated the effectiveness of our approach on the KGC task.
2019
Generating long and informative review text is a challenging natural language generation task. Previous work focuses on word-level generation, neglecting the importance of topical and syntactic characteristics of natural languages. In this paper, we propose a novel review generation model by characterizing an elaborately designed aspect-aware coarse-to-fine generation process. First, we model aspect transitions to capture the overall content flow. Then, to generate a sentence, an aspect-aware sketch is predicted using an aspect-aware decoder. Finally, another decoder fills in the semantic slots by generating the corresponding words. Our approach is able to jointly utilize aspect semantics, syntactic sketches, and context information. Extensive experimental results have demonstrated the effectiveness of the proposed model.
* Corresponding author
† Equal contribution