Publications
* Corresponding author
† Equal contribution
Preprint
A Survey of Large Language Models
Wayne Xin Zhao,
Kun Zhou†,
Junyi Li†,
Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu,
Jian-Yun Nie,
Ji-Rong Wen
pdf / code
Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge
to develop capable AI algorithms for comprehending and grasping a language. As a major approach, language modeling has been widely
studied for language understanding and generation in the past two decades, evolving from statistical language models to neural
language models. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale
corpora, showing strong capabilities in solving various NLP tasks. Since researchers have found that model scaling can lead to
performance improvement, they further study the scaling effect by increasing the model size to an even larger size. Interestingly,
when the parameter scale exceeds a certain level, these enlarged language models not only achieve a significant performance
improvement but also exhibit special abilities that are not present in small-scale language models. To discriminate the
difference in parameter scale, the research community has coined the term large language models (LLMs) for PLMs of significant
size. Recently, research on LLMs has been largely advanced by both academia and industry, and a remarkable milestone is the
launch of ChatGPT, which has attracted widespread attention from society. The technical evolution of LLMs has been making an
important impact on the entire AI community and would revolutionize the way we develop and use AI algorithms. In this
survey, we review the recent advances of LLMs by introducing the background, key findings, and mainstream techniques. In particular,
we focus on four major aspects of LLMs, namely pre-training, adaptation tuning, utilization, and capacity evaluation. Besides,
we also summarize the available resources for developing LLMs and discuss the remaining issues for future directions.
Adapting general large language models (LLMs) to specialized domains presents great challenges due to varied data distributions. This adaptation
typically requires continual pre-training on massive domain-specific corpora to facilitate knowledge memorization, followed by training to
apply this knowledge following human instructions and preferences. However, this method may result in inefficient knowledge memorization due
to a lack of awareness of knowledge utilization and imposes substantial demands on LLMs to simultaneously learn knowledge utilization and
format alignment with limited training samples. To facilitate the domain adaptation of LLMs, we revise this process and propose a new domain
adaptation framework including domain knowledge learning and general format alignment, called Mix-CPT. Specifically, we first conduct a
knowledge mixture continual pre-training that concurrently focuses on knowledge memorization and utilization, allowing for mutual reinforcement.
To avoid catastrophic forgetting during the continual pre-training process, we further incorporate a logit swap self-distillation constraint.
Subsequently, leveraging the knowledge and capabilities acquired during continual pre-training, we efficiently perform instruction tuning and
alignment with a few general training samples to achieve format alignment. Extensive experiments demonstrate that our proposed Mix-CPT framework
can simultaneously improve the task-solving capabilities of LLMs on the target and general domains compared to the traditional adaptation methods.
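As an illustration of the forgetting constraint, below is a minimal sketch of a generic self-distillation regularizer against a frozen copy of the model. It is a plain KL term standing in for, not reproducing, the paper's logit swap formulation; the function names and sizes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Generic self-distillation constraint for continual pre-training: keep the
# updated model's next-token distribution close to a frozen copy of the original
# model on domain batches, so new knowledge does not erase old abilities.
# A plain KL term, a stand-in for (not a reproduction of) Mix-CPT's logit swap variant.

def self_distillation_loss(student_logits, frozen_logits, tau: float = 1.0):
    # logits: (batch, seq_len, vocab) from the trainable and the frozen model
    teacher = F.softmax(frozen_logits / tau, dim=-1)
    student = F.log_softmax(student_logits / tau, dim=-1)
    return F.kl_div(student, teacher, reduction="batchmean") * tau ** 2

# Toy check with random logits; in practice both come from forward passes over
# the same batch, and the total loss is lm_loss + alpha * self_distillation_loss.
print(self_distillation_loss(torch.randn(2, 8, 1000), torch.randn(2, 8, 1000)).item())
```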
2024
Transformer-based large language models (LLMs) typically have a limited context window, resulting in significant performance degradation when processing
text beyond the length of the context window. Many approaches have been proposed to extend the context window and achieve length extrapolation of
LLMs, but there is still a lack of in-depth interpretation of these approaches. In this study, we explore the positional information within and beyond
the context window for deciphering the underlying mechanism of LLMs. By using a mean-based decomposition method, we disentangle positional vectors
from hidden states of LLMs and analyze their formation and effect on attention. Furthermore, when texts exceed the context window, we analyze the
change of positional vectors in two settings, i.e., direct extrapolation and context window extension. Based on our findings, we design two training-free
context window extension methods, positional vector replacement and attention window extension. Experimental results show that our methods can
effectively extend the context window length.
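To make the decomposition concrete, here is a minimal sketch assuming hidden states from one layer have been collected over many input texts; the function name and shapes are illustrative, not the paper's released code.

```python
import numpy as np

def decompose_positional_vectors(hidden_states: np.ndarray):
    """Mean-based decomposition sketch: averaging hidden states of one layer
    over many different input texts cancels content-specific signal, leaving a
    vector per position; the residual is treated as the semantic component."""
    # hidden_states: (num_sequences, seq_len, hidden_dim) collected at one layer
    positional = hidden_states.mean(axis=0)                 # (seq_len, hidden_dim)
    semantic = hidden_states - positional[None, :, :]       # per-sequence residual
    return positional, semantic

# Toy usage with random "hidden states" standing in for real model activations.
h = np.random.randn(128, 32, 64)
pos_vecs, content = decompose_positional_vectors(h)
print(pos_vecs.shape, content.shape)   # (32, 64) (128, 32, 64)
```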
Hallucination detection is a challenging task for large language models (LLMs), and existing studies heavily rely on powerful closed-source LLMs such as
GPT-4. In this paper, we propose an autonomous LLM-based agent framework, called HaluAgent, which enables relatively smaller LLMs (e.g. Baichuan2-Chat 7B)
to actively select suitable tools for detecting multiple hallucination types such as text, code, and mathematical expressions. In HaluAgent, we integrate
the LLM with a multi-functional toolbox and design a fine-grained three-stage detection framework along with a memory mechanism. To facilitate the effectiveness
of HaluAgent, we leverage existing Chinese and English datasets to synthesize detection trajectories for fine-tuning, which endows HaluAgent with the
capability for bilingual hallucination detection. Extensive experiments demonstrate that, using only 2K samples for tuning LLMs, HaluAgent can perform
hallucination detection on various types of tasks and datasets, achieving performance comparable to or even higher than GPT-4 without tool enhancements
on both in-domain and out-of-domain datasets.
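A control-flow sketch of such a tool-selecting detection agent is given below; the llm() call and the tool set are placeholders rather than the released HaluAgent implementation.

```python
# llm() and the tool set below are placeholders to be replaced with a real model
# and real verification tools; nothing here is the released HaluAgent code.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in a (small, open) chat model here")

TOOLS = {
    "search": lambda claim: llm(f"Find evidence about: {claim}"),
    "code": lambda claim: llm(f"Write and run code to check: {claim}"),
    "calculator": lambda claim: llm(f"Check the arithmetic in: {claim}"),
}

def detect(answer: str) -> str:
    memory = []                                                    # shared across stages
    claims = [s.strip() for s in answer.split(".") if s.strip()]   # stage 1: segment
    for claim in claims:
        tool = llm(f"Choose one of {list(TOOLS)} to verify: {claim}").strip()  # stage 2: pick a tool
        evidence = TOOLS.get(tool, TOOLS["search"])(claim)
        verdict = llm(f"Claim: {claim}\nEvidence: {evidence}\nHallucinated? yes/no")
        memory.append((claim, tool, verdict))
    # stage 3: reflect over the per-claim memory for an overall judgement
    return llm(f"Per-claim verdicts: {memory}\nIs the whole answer hallucinated? yes/no")
```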
Considering the limited internal parametric knowledge, retrieval-augmented generation (RAG) has been widely used to extend the knowledge scope of large
language models (LLMs). Despite the extensive efforts on RAG research, in existing methods, LLMs cannot precisely assess the relevance of retrieved
documents, thus likely leading to misleading or even incorrect utilization of external knowledge (i.e., retrieved documents). To address this issue,
in this paper, we propose REAR, a RElevance-Aware Retrieval-augmented approach for open-domain question answering (QA). As the key motivation, we aim
to enhance the self-awareness of source relevance for LLMs, so as to adaptively utilize external knowledge in RAG systems. Specifically, we develop a
new architecture for LLM-based RAG systems by incorporating a specially designed rank head that precisely assesses the relevance of retrieved documents.
Furthermore, we propose an improved training method based on bi-granularity relevance fusion and noise-resistant training. By combining the improvements
in both architecture and training, our proposed REAR can better utilize external knowledge by effectively perceiving the relevance of retrieved documents.
Experiments on four open-domain QA tasks show that REAR significantly outperforms a number of previous competitive RAG approaches.
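For illustration, here is a minimal sketch of the idea of a relevance-scoring ("rank") head over a decoder's hidden states; the dimensions, pooling, and placement are assumptions, not REAR's released architecture.

```python
import torch
import torch.nn as nn

class RankHead(nn.Module):
    """Score the relevance of a retrieved document from the hidden states over
    that document: pool, then project to a scalar."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.scorer = nn.Linear(hidden_size, 1)

    def forward(self, doc_hidden: torch.Tensor) -> torch.Tensor:
        # doc_hidden: (num_docs, seq_len, hidden) states over each retrieved document
        pooled = doc_hidden.mean(dim=1)             # (num_docs, hidden)
        return self.scorer(pooled).squeeze(-1)      # one relevance score per document

head = RankHead(hidden_size=768)
scores = head(torch.randn(4, 128, 768))             # 4 retrieved documents
weights = torch.softmax(scores, dim=0)              # e.g. weight documents before answering
print(weights)
```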
In the era of large language models (LLMs), hallucination (i.e., the tendency to generate factually incorrect content) poses a great challenge to
trustworthy and reliable deployment of LLMs in real-world applications. To tackle the LLM hallucination, three key questions should be well
studied: how to detect hallucinations (detection), why do LLMs hallucinate (source), and what can be done to mitigate them (mitigation). To
address these challenges, this work presents a systematic empirical study on LLM hallucination, focused on the three aspects of hallucination
detection, source, and mitigation. Specifically, we construct a new hallucination benchmark, HaluEval 2.0, and design a simple yet effective detection
method for LLM hallucination. Furthermore, we zoom into the different training or utilization stages of LLMs and extensively analyze the potential
factors that lead to the LLM hallucination. Finally, we implement and examine a series of widely used techniques to mitigate the hallucinations in
LLMs. Our work has led to several important findings to understand the hallucination origin and mitigate the hallucinations in LLMs.
Chain-of-Thought (CoT) prompting can largely enhance the reasoning capabilities of large language models (LLMs), establishing itself as a primary approach to solve complex reasoning tasks. Existing CoT synthesis approaches usually focus on simpler reasoning tasks and result in low-quality and inconsistent CoT prompts.
In response to this challenge, we present an empirical investigation of CoT prompting and introduce CoTGenius, a novel framework designed for the automatic generation of superior CoT prompts. CoTGenius is developed based on three major evolution strategies, i.e., complicate, diversify, and specify, alongside two filtering mechanisms: evolutionary success judgement and correctness verification. We further employ CoTGenius to create an extensive CoT dataset, and subsequently fine-tune the Llama 2-Chat 7B and 13B models on this dataset. We call the resulting models ChainLM.
To deal with the cumulative error issue in reasoning steps, we propose a step-level debating method, wherein multiple debaters discuss each reasoning step to arrive at the correct answer. Extensive experiments demonstrate that our ChainLM models exhibit enhanced proficiency in addressing a spectrum of complex reasoning problems compared to existing models.
In addition, we conduct an in-depth analysis of the impact of data categories within CoTGenius on the model performance.
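As a rough illustration of the evolve-and-filter loop, the sketch below applies one evolution strategy and the two filters to a seed question; the prompts and the llm() call are placeholders, not CoTGenius's released prompts.

```python
import random

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in any chat model here")

STRATEGIES = {
    "complicate": "Rewrite the question so it needs one more reasoning step.",
    "diversify": "Rewrite the question in a different scenario or domain.",
    "specify": "Rewrite the question with more concrete numbers and constraints.",
}

def evolve_cot_sample(question: str):
    strategy = random.choice(list(STRATEGIES))
    new_q = llm(f"{STRATEGIES[strategy]}\nQuestion: {question}")
    new_cot = llm(f"Answer step by step:\n{new_q}")
    # Filter 1: evolutionary success judgement -- did the rewrite really evolve the question?
    if llm(f"Is '{new_q}' a meaningful evolution of '{question}'? yes/no").lower().startswith("n"):
        return None
    # Filter 2: correctness verification -- is the new chain of thought sound?
    if llm(f"Check this reasoning for errors:\n{new_cot}\nCorrect? yes/no").lower().startswith("n"):
        return None
    return {"question": new_q, "cot": new_cot, "strategy": strategy}
```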
Large language models (LLMs) have achieved impressive proficiency on NLP tasks of normal length. Recently, multiple studies have committed to extending the context length and enhancing the long text modeling capabilities of LLMs.
To comprehensively evaluate the long context ability of LLMs, we propose BAMBOO, a multi-task long context benchmark. BAMBOO has been designed with four principles: comprehensive capacity evaluation, avoidance of data contamination, accurate automatic evaluation, and different length levels.
It consists of 10 datasets from 5 different long text understanding tasks, i.e., question answering, hallucination detection, text sorting, language modeling, and code completion, to cover various domains and core capacities of LLMs.
We conduct experiments with five widely-used long-context models and further discuss five key questions for long text research.
In the end, we discuss problems of current long-context models and point out future directions for enhancing long text modeling capacities.
Text Generation aims to produce plausible and readable text in human language from input data. The resurgence of deep learning has greatly advanced this field,
in particular, with the help of neural generation models based on pre-trained language models (PLMs). Text generation based on PLMs is viewed as
a promising approach in both academia and industry. In this paper, we provide a survey on the utilization of PLMs in text generation. We begin
by introducing two key aspects of applying PLMs to text generation: 1) how to design an effective PLM to serve as the generation model; and
2) how to effectively optimize PLMs given the reference text so that the generated texts satisfy special text properties. Then, we
show the major challenges that have arisen in these aspects, as well as possible solutions for them. We also include a summary of various
useful resources and typical text generation applications based on PLMs. Finally, we highlight the future research directions which will further
improve these PLMs for text generation. This comprehensive survey is intended to help researchers interested in text generation problems to learn
the core concepts, the main techniques and the latest developments in this area based on PLMs.
2023
Large language models (LLMs), such as ChatGPT, are prone to generate hallucinations, i.e., content that conflicts with
the source or cannot be verified by the factual knowledge. To understand what types of content and to which extent
LLMs are apt to hallucinate, we introduce the Hallucination Evaluation for Large Language Models (HaluEval)
benchmark, a large collection of generated and human-annotated hallucinated samples for evaluating the performance
of LLMs in recognizing hallucination. To generate these samples, we propose a ChatGPT-based two-step framework,
i.e., sampling-then-filtering. Besides, we also hire some human labelers to annotate the hallucinations in
ChatGPT responses. The empirical results suggest that ChatGPT is likely to generate hallucinated content in
specific topics by fabricating unverifiable information (i.e., in about 11.4% of user queries). Moreover, existing
LLMs face great challenges in recognizing the hallucinations in texts. Nevertheless, our experiments also show that
hallucination recognition can be improved by providing external knowledge or adding reasoning steps.
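A schematic sketch of a sampling-then-filtering generator is shown below; the chat() call and prompt wording are placeholders, not the exact framework used to build HaluEval.

```python
def chat(prompt: str) -> str:
    raise NotImplementedError("plug in a ChatGPT-style API call here")

def make_hallucinated_sample(question: str, right_answer: str, n: int = 4) -> str:
    # Step 1: sampling -- generate several answers that sound right but are wrong.
    candidates = [
        chat(f"Question: {question}\nWrite a plausible but factually WRONG answer.")
        for _ in range(n)
    ]
    # Step 2: filtering -- keep the candidate hardest to tell apart from the truth.
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(candidates))
    choice = chat(
        f"Question: {question}\nCorrect answer: {right_answer}\n"
        f"Wrong candidates:\n{numbered}\nReturn only the number of the most plausible one."
    )
    return candidates[int(choice.strip())]
```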
People often imagine relevant scenes to aid in the writing process. In this work, we aim to utilize visual
information for composition in the same manner as humans. We propose a method, LIVE, that makes pre-trained
language models (PLMs) Learn to Imagine for Visually-augmented natural language gEneration. First, we imagine
the scene based on the text: we use a diffusion model to synthesize high-quality images conditioned on the
input and output texts. Second, we determine whether the text can evoke the imagination in a posterior manner.
Finally, our imagination is dynamic, and we conduct scene imagination for each sentence, rather than imagining
a scene for an entire paragraph. Technically, we propose a novel plug-and-play fusion layer to obtain
visually-augmented textual representations for each text. Our vision-text fusion layer is compatible with
Transformer-based architectures. We have conducted extensive experiments on four generation tasks using
BART and T5, and the automatic results and human evaluation demonstrate the effectiveness of our proposed method.
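To illustrate the kind of plug-and-play fusion layer described above, here is a minimal cross-attention sketch in which text states attend to image features behind a gate; the dimensions and gating are assumptions, not LIVE's released code.

```python
import torch
import torch.nn as nn

class VisionTextFusion(nn.Module):
    """Plug-and-play fusion sketch: text hidden states cross-attend to image
    features of the imagined scene; a gate controls how much visual context
    each token absorbs, so the layer can sit inside a frozen Transformer block."""
    def __init__(self, d_text: int, d_img: int, n_heads: int = 8):
        super().__init__()
        self.img_proj = nn.Linear(d_img, d_text)
        self.cross_attn = nn.MultiheadAttention(d_text, n_heads, batch_first=True)
        self.gate = nn.Linear(2 * d_text, 1)

    def forward(self, text_h: torch.Tensor, img_feats: torch.Tensor) -> torch.Tensor:
        # text_h: (B, T, d_text); img_feats: (B, P, d_img) patch features per sentence
        img = self.img_proj(img_feats)
        visual_ctx, _ = self.cross_attn(text_h, img, img)
        g = torch.sigmoid(self.gate(torch.cat([text_h, visual_ctx], dim=-1)))
        return text_h + g * visual_ctx

fused = VisionTextFusion(d_text=768, d_img=1024)(torch.randn(2, 16, 768), torch.randn(2, 49, 1024))
print(fused.shape)   # torch.Size([2, 16, 768])
```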
Pre-trained language models (PLMs) have achieved notable success in natural language generation (NLG) tasks. Up to now, most of
the PLMs are pre-trained in an unsupervised manner using large-scale general corpora. Meanwhile, an increasing number
of models pre-trained with less labeled data showcase superior performance compared to unsupervised models. Motivated by the
success of supervised pre-training, we propose Multi-task superVised Pre-training (MVP) for natural language generation.
For pre-training the text generation model MVP, we collect a labeled pre-training corpus from 45 datasets over seven generation
tasks. For each task, we further pre-train task-specific soft prompts to stimulate the model's capacity for performing that task.
Extensive experiments have demonstrated the effectiveness of our supervised pre-training in a number of NLG tasks, and our
general methods achieve state-of-the-art performance on 12 of 17 datasets.
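As an illustration of task-specific soft prompts, the sketch below prepends a small set of learnable vectors, one set per task, to the input embeddings; the sizes and task names are illustrative, not MVP's.

```python
import torch
import torch.nn as nn

class TaskSoftPrompts(nn.Module):
    """One small set of learnable prompt vectors per task, prepended to the
    input embeddings of a shared generation model."""
    def __init__(self, tasks, prompt_len: int, d_model: int):
        super().__init__()
        self.prompts = nn.ParameterDict({
            t: nn.Parameter(torch.randn(prompt_len, d_model) * 0.02) for t in tasks
        })

    def forward(self, input_embeds: torch.Tensor, task: str) -> torch.Tensor:
        # input_embeds: (batch, seq_len, d_model)
        prompt = self.prompts[task].unsqueeze(0).expand(input_embeds.size(0), -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

sp = TaskSoftPrompts(["summarization", "dialogue"], prompt_len=20, d_model=768)
print(sp(torch.randn(4, 50, 768), task="summarization").shape)   # torch.Size([4, 70, 768])
```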
In this paper, we propose a novel language model guided captioning approach, Lamoc, for knowledge-based visual
question answering~(VQA). Our approach employs the generated captions by a captioning model as the context
of an answer prediction model, which is a Pre-Trained Language model~(PLM). As the major contribution, we
leverage the guidance and feedback of the prediction model to improve the capability of the captioning model.
In this way, the captioning model can become aware of the task goal and information need from the PLM.
To develop our approach, we design two specific training stages, where the first stage adapts the captioning
model to the prediction model (selecting more suitable caption propositions for training) and the second stage
tunes the captioning model according to the task goal (learning from feedback of the PLM). Extensive experiments
demonstrate the effectiveness of the proposed approach on the knowledge-based VQA task. Specifically,
on the challenging A-OKVQA dataset, Lamoc outperforms several competitive zero-shot methods and even achieves
comparable results to a fine-tuned VLP model.
Pretrained language models (PLMs) encode a large amount of world knowledge. However, as such knowledge is frozen at the
time of model training, the models become static and limited by the training data at that time.
In order to further improve the capacity of PLMs for knowledge-intensive tasks, we consider augmenting PLMs with
the large-scale web using a search engine. Unlike previous augmentation sources (e.g., the Wikipedia data dump), the web
provides broader, more comprehensive and constantly updated information. In this paper, we present a web-augmented
PLM -- UniWeb, which is trained over 16 knowledge-intensive tasks in a unified text-to-text format. Instead of
simply using the retrieved contents from the web, our approach makes two major improvements. Firstly, we propose
an adaptive search engine assisted learning method that can self-evaluate the confidence level of PLM’s predictions,
and adaptively determine when to refer to the web for more data, which can avoid useless or noisy augmentation
from the web. Secondly, we design a pre-training task, i.e., continual knowledge learning, based on salient span
prediction, to reduce the discrepancy between the encoded and retrieved knowledge. Experiments on a wide
range of knowledge-intensive tasks show that our model significantly outperforms previous retrieval-augmented methods.
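A schematic sketch of confidence-gated web augmentation is shown below; the answer_with_scores() and web_search() functions and the threshold are placeholders, not UniWeb's implementation.

```python
import math

def answer_with_scores(question: str, context: str = ""):
    """Return (answer, per-token log-probabilities) from the PLM. Placeholder."""
    raise NotImplementedError

def web_search(query: str, k: int = 5) -> str:
    """Return the top-k snippets from a search engine. Placeholder."""
    raise NotImplementedError

def answer(question: str, threshold: float = 0.6) -> str:
    ans, logprobs = answer_with_scores(question)
    confidence = math.exp(sum(logprobs) / max(len(logprobs), 1))   # mean token probability
    if confidence >= threshold:
        return ans                        # trust the model's parametric knowledge
    snippets = web_search(question)       # otherwise consult the web and answer again
    ans, _ = answer_with_scores(question, context=snippets)
    return ans
```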
2022
We consider the text generation task under the approach of pre-trained language models (PLMs). Typically, an auto-regressive (AR)
paradigm is adopted for generating texts in a token-by-token manner. Despite many advantages of AR generation, it has been
widely blamed for its inefficient inference. Therefore, non-autoregressive (NAR) models are proposed to generate all target
tokens simultaneously. However, NAR models usually generate texts of lower quality due to the absence of token dependency
in the output text. In this paper, we propose ELMER: an efficient and effective PLM for NAR text generation to explicitly
model the token dependency during NAR generation. By leveraging the early exit technique, ELMER enables token generation
at different layers, according to the prediction confidence (a more confident token will exit at a lower layer). Besides,
we propose a novel Layer Permutation Language Modeling to pre-train ELMER by permuting the exit layer for each
token in sequences. Experiments on three text generation tasks show that ELMER significantly outperforms NAR models and
further narrows the performance gap with AR PLMs (e.g., ELMER (29.92) vs. BART (30.61) ROUGE-L on XSUM) while achieving
over 10x inference speedups.
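To illustrate confidence-based early exit in a non-autoregressive decoder, here is a minimal sketch in which every layer has an output head and each position keeps the first sufficiently confident prediction; the use of plain encoder layers and all sizes are simplifying assumptions, not ELMER's released architecture.

```python
import torch
import torch.nn as nn

class EarlyExitDecoder(nn.Module):
    """Every layer has its own output head; each target position keeps the
    prediction of the first layer whose confidence clears a threshold, and all
    positions are decoded in parallel (non-autoregressively)."""
    def __init__(self, n_layers=4, d_model=256, n_heads=4, vocab=1000, tau=0.9):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True) for _ in range(n_layers)
        )
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab) for _ in range(n_layers))
        self.tau = tau

    @torch.no_grad()
    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, tgt_len, d_model)
        tokens = torch.full(x.shape[:2], -1, dtype=torch.long, device=x.device)
        for layer, head in zip(self.layers, self.heads):
            x = layer(x)
            conf, pred = head(x).softmax(-1).max(-1)
            exit_now = (conf >= self.tau) & (tokens < 0)   # confident and still undecided
            tokens[exit_now] = pred[exit_now]
        return torch.where(tokens < 0, pred, tokens)       # leftovers exit at the last layer

print(EarlyExitDecoder()(torch.randn(2, 10, 256)).shape)   # torch.Size([2, 10])
```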
TextBox 2.0: A Text Generation Library with Pre-trained Language Models
Tianyi Tang†,
Junyi Li†,
Zhipeng Chen†,
Yiwen Hu,
Zhuohao Yu,
Wenxun Dai,
Wayne Xin Zhao*,
Jian-Yun Nie,
Ji-Rong Wen
The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022, System Demonstration
pdf / code
To facilitate research on text generation, this paper presents a comprehensive, unified, and standardized library, TextBox 2.0,
focusing on the use of pre-trained language models (PLMs). To be comprehensive, our library considers 13 common text
generation tasks and their corresponding 83 datasets and incorporates 36 PLMs covering general, translation, dialogue,
controllable, distilled, Chinese, and lightweight PLMs. We also implement 4 efficient training strategies and provide 4
generation objectives for pre-training new PLMs from scratch. To be unified and standardized, we carefully design the
interfaces along the research pipeline (from data loading to training and evaluation), ensuring that each step can be
conducted in a unified, standard way. Though comprehensive and powerful, our library is rather simple to use,
either through a friendly Python API or the command line. Besides, we perform extensive experiments to validate the effectiveness
of our library and provide useful methods to analyze the generated results.
Recently, pretrained language models (PLMs) have achieved exceptional success in language generation. To leverage the rich knowledge encoded
by PLMs, a simple yet powerful mechanism is to use prompts, in the form of either discrete tokens or continuous embeddings.
In existing studies, manual prompts are time-consuming and require domain expertise, while continuous prompts are typically
independent of the inputs. To address this issue, we propose a novel continuous prompting approach, called Context-Tuning,
to fine-tune PLMs for natural language generation. Firstly, the prompts are derived from the input text, so that they
can elicit useful knowledge from PLMs for generation. We refer to such prompts as contextualized prompts. Secondly, to further
enhance the relevance of the generated text to the inputs, we utilize continuous inverse prompting to refine the process of
natural language generation by modeling an inverse generation process from output to input. Moreover, we propose a lightweight
variant of context-tuning that fine-tunes only 0.4% of the parameters while retaining strong performance.
As the Transformer architecture evolves, pre-trained models have advanced at a breakneck pace in recent years. They have dominated the mainstream
techniques in natural language processing (NLP) and computer vision (CV). How to adapt pre-training to the field of
Vision-and-Language (V-L) learning and improve downstream task performance has become a focus of multimodal learning.
In this paper, we review the recent progress in Vision-Language Pre-Trained Models (VL-PTMs). As the core content,
we first briefly introduce several ways to encode raw images and texts to single-modal embeddings before pre-training.
Then, we dive into the mainstream architectures of VL-PTMs in modeling the interaction between text and image
representations. We further present widely-used pre-training tasks, and then we introduce some common downstream tasks.
We finally conclude this paper and present some promising research directions. Our survey aims to provide researchers with
a synthesis of and pointer to related research.
Pretrained language models (PLMs) have dominated the majority of NLP tasks. However, little research has been conducted on
systematically evaluating the language abilities of PLMs. In this paper, we present a large-scale empirical study on
general language ability evaluation of PLMs (ElitePLM). In our study, we design four evaluation dimensions, i.e., memory,
comprehension, reasoning, and composition, to measure ten widely-used PLMs within five categories. Our empirical results
demonstrate that: (1) PLMs with varying training objectives and strategies are good at different ability tests; (2) fine-tuning
PLMs in downstream tasks is usually sensitive to the data size and distribution; (3) PLMs have excellent transferability between
similar tasks. Moreover, the prediction results of the evaluated PLMs can be reused as an open resource for analyzing PLMs'
language abilities in more depth and at finer granularity. This paper can guide future work in choosing, applying, and designing PLMs for specific tasks.
Pretrained language models (PLMs) have made remarkable progress in text generation tasks via fine-tuning. However, it is difficult
to fine-tune PLMs in a data-scarce situation. Therefore, it is non-trivial to develop a general and lightweight model that
can adapt to various text generation tasks based on PLMs. Prompt-based learning offers a potential solution. There are two
major challenges for applying prompt-based methods to data-scarce text generation tasks in a transferable setting. First,
it is difficult to effectively transfer prompts for new tasks. Second, it is important to design an effective transfer
strategy that considers both task- and instance-level information. To address these issues, we propose a novel prompt-based
transfer learning approach for text generation called PTG. PTG learns a set of source prompts for various source generation
tasks and then transfers these prompts to perform target generation tasks through an adaptive attention mechanism
considering both task- and instance-level information. In extensive experiments, PTG yields competitive or better
results than fine-tuning methods. We will release our source prompts as an open-source library, which can be extended or reused
to improve new generation tasks in future research.
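For illustration, the sketch below builds a target prompt as an attention-weighted mixture of a pool of source-task prompts, with the query taken from an instance representation; the dimensions and pooling are illustrative, not PTG's released design.

```python
import torch
import torch.nn as nn

class PromptTransfer(nn.Module):
    """Build a prompt for a target instance as an attention-weighted mixture of
    a pool of learned source-task prompts; the query comes from the instance."""
    def __init__(self, n_source: int, prompt_len: int, d_model: int):
        super().__init__()
        self.source_prompts = nn.Parameter(torch.randn(n_source, prompt_len, d_model) * 0.02)
        self.query_proj = nn.Linear(d_model, d_model)

    def forward(self, instance_repr: torch.Tensor) -> torch.Tensor:
        # instance_repr: (batch, d_model), e.g. a mean-pooled encoding of the input
        q = self.query_proj(instance_repr)                                  # (B, d)
        keys = self.source_prompts.mean(dim=1)                              # (n_source, d)
        attn = torch.softmax(q @ keys.t() / keys.size(-1) ** 0.5, dim=-1)   # (B, n_source)
        return torch.einsum("bs,spd->bpd", attn, self.source_prompts)       # (B, prompt_len, d)

pt = PromptTransfer(n_source=6, prompt_len=10, d_model=768)
print(pt(torch.randn(4, 768)).shape)   # torch.Size([4, 10, 768])
```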
2021
In this paper, we study the task of generating long and coherent text. In the literature, Generative Adversarial Nets (GAN) based methods
have been one of the mainstream approaches to generic text generation. We aim to improve two aspects of GAN-based methods in generic
text generation, namely long sequence optimization and semantic coherence enhancement. For this purpose, we propose a novel Multi-Level
Generative Adversarial Networks (MLGAN) for long and coherent text generation. Our approach explicitly models the text generation
process at three different levels, namely paragraph-, sentence- and word-level generation. At the top two levels, we generate
continuous paragraph vectors and sentence vectors as semantic sketches to plan the entire content, while at the bottom
level we generate discrete word tokens to realize the sentences. Furthermore, we utilize a conditional GAN architecture to
enhance the inter-sentence coherence by injecting paragraph vectors into sentence vector generation. Extensive experimental results
have demonstrated the effectiveness of the proposed model.
TextBox: A Unified, Modularized, and Extensible Framework for Text Generation
Junyi Li†,
Tianyi Tang†,
Gaole He,
Jinhao Jiang,
Xiaoxuan Hu,
Puzhao Xie,
Zhipeng Chen,
Zhuohao Yu,
Wayne Xin Zhao*,
Ji-Rong Wen
The 59th Annual Meeting of the Association for Computational Linguistics (ACL), 2021, System Demonstration
pdf / code
We release an open library, called TextBox, which provides a unified, modularized, and extensible text generation framework. TextBox aims
to support a broad set of text generation tasks and models. In TextBox, we implement several text generation models on benchmark
datasets, covering the categories of VAE, GAN, pre-trained language models, etc. Meanwhile, our library maintains sufficient
modularity and extensibility by properly decomposing the model architecture, inference, and learning process into highly reusable modules,
which allows new models to be easily incorporated into our framework. It is especially suitable for researchers and practitioners to
efficiently reproduce baseline models and develop new models. TextBox is implemented based on PyTorch, and released under
Apache License 2.0 at the link https://github.com/RUCAIBox/TextBox.
This paper studies how to automatically generate natural language text that describes facts in a knowledge graph (KG).
Considering a few-shot setting, we leverage the excellent capacities of pretrained language models (PLMs) in
language understanding and generation. We introduce three major technical contributions, namely representation
alignment for bridging the semantic gap between KG encodings and PLMs, relation-biased KG linearization for
deriving better input representations, and multitask learning for learning the correspondence between KG and text.
Extensive experiments on three benchmarks have demonstrated the effectiveness of our model on the KG-to-text generation task.
In particular, our model outperforms existing systems on most few-shot settings.
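As a small illustration of KG linearization, the sketch below groups triples by relation before flattening them into a tagged input sequence; the special tokens and grouping are illustrative assumptions, not the paper's exact relation-biased scheme.

```python
from collections import defaultdict

def linearize(triples):
    """Flatten a KG subgraph into a tagged input sequence for a PLM, grouping
    triples by relation so related facts stay adjacent."""
    by_relation = defaultdict(list)
    for head, relation, tail in triples:
        by_relation[relation].append((head, tail))
    parts = []
    for relation, pairs in by_relation.items():
        for head, tail in pairs:
            parts.append(f"<H> {head} <R> {relation} <T> {tail}")
    return " ".join(parts)

triples = [("Alan Turing", "field", "computer science"),
           ("Alan Turing", "born in", "London"),
           ("London", "country", "United Kingdom")]
print(linearize(triples))
```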
Text generation has become one of the most important yet challenging tasks in natural language processing (NLP). The resurgence of deep learning
has greatly advanced this field through neural generation models, especially the paradigm of pretrained language models (PLMs). In this paper,
we present an overview of the major advances achieved in the topic of PLMs for text generation. As the preliminaries, we present the general task
definition and briefly describe the mainstream architectures of PLMs. As the core content, we discuss how to adapt existing PLMs to model different
input data and satisfy the properties in the generated text. We further summarize several important fine-tuning strategies for text generation.
Finally, we present several future directions and conclude this paper. Our survey aims to provide text generation researchers with a synthesis of
and pointer to related research.
Generating informative and coherent review text is a challenging natural language generation task. In order to enhance
the informativeness of the generated text, existing solutions typically learn to copy entities or triples from knowledge graphs (KGs).
However, they lack overall consideration of how to select and arrange the
incorporated knowledge, which tends to cause text incoherence.
To address the above issue, we focus on improving entity-centric
coherence of the generated reviews by leveraging the semantic structure of KGs. In this paper, we propose a novel Coherence Enhanced
Text Planning model (CETP) based on knowledge graphs (KGs) to
improve both global and local coherence for review generation. The
proposed model learns a two-level text plan for generating a document: (1) the document plan is modeled as a sequence of sentence
plans in order, and (2) the sentence plan is modeled as an entity-based subgraph from the KG. Local coherence can be naturally enforced
by KG subgraphs through intra-sentence correlations between entities. For global coherence, we design a hierarchical self-attentive
architecture with both subgraph- and node-level attention to enhance the correlations between subgraphs. To our knowledge, we
are the first to utilize a KG-based text planning model to enhance
text coherence for review generation. Extensive experiments on
three datasets confirm the effectiveness of our model on improving
the content coherence of generated texts.
2020
Personalized review generation (PRG) aims to automatically produce review text reflecting user preference, which is a challenging
natural language generation task. Most previous studies do not
explicitly model factual descriptions of products, tending to generate
uninformative content. Moreover, they mainly focus on word-level
generation, but cannot accurately reflect more abstractive user
preference in multiple aspects.
To address the above issues, we propose a novel knowledge-enhanced PRG model based on capsule graph neural networks (Caps-GNN). We first construct a heterogeneous knowledge graph (HKG)
for utilizing rich item attributes. We adopt Caps-GNN to learn
graph capsules for encoding underlying characteristics from the
HKG. Our generation process contains two major steps, namely
aspect sequence generation and sentence generation. First, based
on graph capsules, we adaptively learn aspect capsules for inferring the aspect sequence. Then, conditioned on the inferred aspect
label, we design a graph-based copy mechanism to generate sentences by incorporating related entities or words from HKG. To
our knowledge, we are the first to utilize knowledge graph for the
PRG task. The incorporated KG information is able to enhance user
preference at both aspect and word levels. Extensive experiments
on three real-world datasets have demonstrated the effectiveness
of our model on the PRG task.
The task of Knowledge Graph Completion (KGC) aims to automatically infer missing fact information in a Knowledge Graph (KG). In this paper, we take a new perspective that aims to leverage rich
user-item interaction data (user interaction data for short) for improving the KGC task. Our work is inspired by the observation that
many KG entities correspond to online items in application systems.
However, the two kinds of data sources have very different intrinsic
characteristics, and a simple fusion strategy is likely to hurt the
original representation performance.
To address this challenge, we propose a novel adversarial learning approach for leveraging user interaction data for the KGC task.
Our generator is isolated from user interaction data, and improves
itself according to the feedback from the discriminator. The discriminator takes the learned useful information from user interaction
data as input, and gradually enhances the evaluation capacity in
order to identify the fake samples generated by the generator. To
discover the implicit entity preferences of users, we design an elaborate
collaborative learning algorithm based on graph neural networks,
which will be jointly optimized with the discriminator. Such an
approach is effective to alleviate the issues about data heterogeneity
and semantic complexity for the KGC task. Extensive experiments
on three real-world datasets have demonstrated the effectiveness
of our approach on the KGC task.
2019
Generating long and informative review text
is a challenging natural language generation
task. Previous work focuses on word-level
generation, neglecting the importance of topical and syntactic characteristics from natural
languages. In this paper, we propose a novel
review generation model by characterizing an
elaborately designed aspect-aware coarse-to-fine generation process. First, we model the
aspect transitions to capture the overall content flow. Then, to generate a sentence,
an aspect-aware sketch is predicted using an aspect-aware decoder. Finally, another decoder
fills in the semantic slots by generating corresponding words. Our approach is able to jointly
utilize aspect semantics, syntactic sketches, and context information. Extensive experimental
results have demonstrated the effectiveness of the proposed model.