Publications
Preprint
Pre-trained language models (PLMs) have achieved notable success in natural language generation (NLG) tasks. To date, most PLMs are pre-trained in an unsupervised manner on large-scale general corpora. Meanwhile, a growing number of models pre-trained with labeled data have shown superior performance compared to unsupervised models. Motivated by the success of supervised pre-training, we propose Multi-task superVised Pre-training (MVP) for natural language generation. To pre-train the text generation model MVP, we collect a labeled pre-training corpus from 45 datasets over seven generation tasks. For each task, we further pre-train task-specific soft prompts to stimulate the model's capacity for performing that task. Extensive experiments demonstrate the effectiveness of our supervised pre-training on a number of NLG tasks, and our general methods achieve state-of-the-art performance on 12 of 17 datasets.
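Task-specific soft prompts can be pictured as small sets of learnable vectors prepended to the model input for each generation task. The minimal PyTorch sketch below illustrates this general idea only; the class, its parameters, and the HuggingFace-style backbone signature are illustrative assumptions, not the released MVP code.

```python
# Illustrative sketch of task-specific soft prompts for a seq2seq PLM
# (names and the backbone interface are assumptions, not the MVP release).
import torch
import torch.nn as nn

class SoftPromptedSeq2Seq(nn.Module):
    def __init__(self, backbone, hidden_size, tasks, prompt_len=100):
        super().__init__()
        self.backbone = backbone  # assumed HuggingFace-style encoder-decoder PLM
        # One learnable prompt per generation task (e.g., summarization, data-to-text).
        self.prompts = nn.ParameterDict({
            task: nn.Parameter(torch.randn(prompt_len, hidden_size) * 0.02)
            for task in tasks
        })

    def forward(self, task, input_embeds, attention_mask, labels):
        batch = input_embeds.size(0)
        prompt = self.prompts[task].unsqueeze(0).expand(batch, -1, -1)
        # Prepend the task prompt to the token embeddings and extend the mask.
        inputs = torch.cat([prompt, input_embeds], dim=1)
        prompt_mask = attention_mask.new_ones(batch, prompt.size(1))
        mask = torch.cat([prompt_mask, attention_mask], dim=1)
        return self.backbone(inputs_embeds=inputs, attention_mask=mask, labels=labels)
```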
Text generation aims to produce plausible and readable text in human language from input data. The resurgence of deep learning has greatly advanced this field through neural generation models, especially the paradigm of pre-trained language models (PLMs). Grounding text generation in PLMs is seen as a promising direction in both academia and industry. In this survey, we present the recent advances achieved in the topic of PLMs for text generation. In detail, we begin by introducing three key points of applying PLMs to text generation: 1) how to encode the input data as representations that preserve input semantics and can be fused into PLMs; 2) how to design a universal and performant architecture of PLMs to serve as generation models; and 3) how to optimize PLMs given the reference text and ensure that the generated text satisfies special text properties. We then identify several challenges and future directions within each key point. Next, we present a summary of various useful resources and typical text generation applications that work with PLMs. Finally, we conclude and summarize the contributions of this survey.
WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training
Yuqi Huo, Manli Zhang, Guangzhen Liu, Haoyu Lu, Yizhao Gao, Guoxing Yang, Jingyuan Wen,
Heng Zhang, Baogui Xu,
Weihao Zheng, Zongzheng Xi, Yueqian Yang, Anwen Hu, Jinming Zhao, Ruichen Li,
Yida Zhao, Liang Zhang,
Yuqing Song, Xin Hong, Wanqing Cui, Danyang Hou, Yingyan Li, Junyi Li,
Peiyu Liu, Zheng Gong,
Chuhao Jin, Yuchong Sun, Shizhe Chen, Zhiwu Lu*, Zhicheng Dou, Qin Jin,
Yanyan Lan, Wayne Xin Zhao,
Ruihua Song*, Ji-Rong Wen*
pdf / code
Multi-modal pre-training models have been intensively explored to bridge vision and language in recent years. However, most of them explicitly model the cross-modal interaction between image-text pairs by assuming that there exists a strong semantic correlation between the text and image modalities. Since this strong assumption is often invalid in real-world scenarios, we choose to implicitly model the cross-modal correlation for large-scale multi-modal pre-training, which is the focus of the Chinese project 'WenLan' led by our team. Specifically, under a weak correlation assumption over image-text pairs, we propose a two-tower pre-training model called BriVL within the cross-modal contrastive learning framework. Unlike OpenAI CLIP, which adopts a simple contrastive learning method, we devise a more advanced algorithm by adapting the recent MoCo method to the cross-modal scenario. By building a large queue-based dictionary, BriVL can incorporate more negative samples under limited GPU resources. We further construct a large Chinese multi-source image-text dataset called RUC-CAS-WenLan for pre-training our BriVL model. Extensive experiments demonstrate that the pre-trained BriVL model outperforms both UNITER and OpenAI CLIP on various downstream tasks.
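The queue-based dictionary amounts to an InfoNCE-style contrastive loss whose negatives come from a FIFO memory of embeddings produced in earlier batches. The sketch below illustrates this cross-modal variant in PyTorch; the tensor names and the single-direction loss are simplifying assumptions, not the BriVL implementation.

```python
# Illustrative sketch of MoCo-style, queue-based cross-modal contrastive learning.
import torch
import torch.nn.functional as F

def contrastive_loss_with_queue(img_emb, txt_emb, queue, temperature=0.07):
    """img_emb, txt_emb: (B, D) L2-normalized embeddings from the two towers.
    queue: (K, D) L2-normalized text embeddings from previous batches (negatives)."""
    pos = (img_emb * txt_emb).sum(dim=1, keepdim=True)   # (B, 1) positive logits
    neg = img_emb @ queue.t()                             # (B, K) negative logits
    logits = torch.cat([pos, neg], dim=1) / temperature
    # The positive pair sits at index 0 of each row.
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)

def update_queue(queue, txt_emb):
    """FIFO update: enqueue the current text embeddings, dequeue the oldest."""
    return torch.cat([txt_emb.detach(), queue], dim=0)[: queue.size(0)]
```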
2022
We consider the text generation task under the approach of pre-trained language models (PLMs). Typically, an auto-regressive (AR) paradigm is adopted to generate text in a token-by-token manner. Despite the many advantages of AR generation, it is widely criticized for its inefficient inference. Therefore, non-autoregressive (NAR) models have been proposed to generate all target tokens simultaneously. However, NAR models usually generate text of lower quality due to the absence of token dependency in the output text. In this paper, we propose ELMER, an efficient and effective PLM for NAR text generation, to explicitly model token dependency during NAR generation. By leveraging the early exit technique, ELMER enables token generation at different layers according to prediction confidence (a more confident token exits at a lower layer). Besides, we propose a novel Layer Permutation Language Modeling objective to pre-train ELMER by permuting the exit layer for each token in a sequence. Experiments on three text generation tasks show that ELMER significantly outperforms NAR models and further narrows the performance gap with AR PLMs (e.g., 29.92 ROUGE-L for ELMER vs. 30.61 for BART on XSum) while achieving over 10x inference speedup.
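The early-exit mechanism can be pictured as follows: every decoder layer is followed by the language-model head, and a token stops being re-predicted once its confidence passes a threshold. The sketch below is a simplified illustration of this idea (in particular, it only freezes predicted tokens, not their hidden states); the layer and head interfaces are assumptions, not the released ELMER code.

```python
# Simplified sketch of confidence-based early exit for non-autoregressive decoding.
import torch

@torch.no_grad()
def early_exit_decode(layers, lm_head, hidden, threshold=0.9):
    """hidden: (B, T, D) initial decoder states; each token exits at the first
    layer whose prediction confidence exceeds `threshold`."""
    B, T, _ = hidden.shape
    tokens = torch.full((B, T), -1, dtype=torch.long, device=hidden.device)
    exited = torch.zeros(B, T, dtype=torch.bool, device=hidden.device)
    for layer in layers:
        hidden = layer(hidden)                    # update all positions in parallel
        probs = lm_head(hidden).softmax(dim=-1)   # (B, T, V) per-token distributions
        conf, pred = probs.max(dim=-1)
        exit_now = (conf >= threshold) & ~exited
        tokens[exit_now] = pred[exit_now]         # freeze confident tokens at this layer
        exited |= exit_now
    tokens[~exited] = pred[~exited]               # remaining tokens exit at the last layer
    return tokens
```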
TextBox 2.0: A Text Generation Library with Pre-trained Language Models
Tianyi Tang†,
Junyi Li†,
Zhipeng Chen†,
Yiwen Hu,
Zhuohao Yu,
Wenxun Dai,
Wayne Xin Zhao*,
Jian-Yun Nie,
Ji-Rong Wen
The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022, System Demonstration
pdf / code
To facilitate research on text generation, this paper presents a comprehensive, unified, and standardized library, TextBox 2.0, focusing on the use of pre-trained language models (PLMs). To be comprehensive, our library considers 13 common text generation tasks and their corresponding 83 datasets, and incorporates 36 PLMs covering general, translation, dialogue, controllable, distilled, Chinese, and lightweight PLMs. We also implement 4 efficient training strategies and provide 4 generation objectives for pre-training new PLMs from scratch. To be unified and standardized, we carefully design the interfaces along the research pipeline (from data loading to training and evaluation), ensuring that each step can be conducted in a unified, standard way. Though comprehensive and powerful, our library is simple to use, through either a friendly Python API or the command line. In addition, we perform extensive experiments to validate the effectiveness of our library and provide useful methods for analyzing the generated results.
Recently, pre-trained language models (PLMs) have achieved remarkable success in language generation. To leverage the rich knowledge encoded by PLMs, a simple yet powerful mechanism is to use prompts, in the form of either discrete tokens or continuous embeddings. In existing studies, manual prompts are time-consuming and require domain expertise, while continuous prompts are typically independent of the input. To address this issue, we propose a novel continuous prompting approach, called Context-Tuning, for fine-tuning PLMs on natural language generation. First, the prompts are derived from the input text so that they can elicit useful knowledge from PLMs for generation; we refer to such prompts as contextualized prompts. Second, to further enhance the relevance of the generated text to the input, we utilize continuous inverse prompting to refine the process of natural language generation by modeling an inverse generation process from output to input. Moreover, we propose a lightweight variant of context-tuning that fine-tunes only 0.4% of the parameters while retaining good performance.
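A contextualized prompt can be obtained by mapping a pooled representation of the input into a sequence of continuous prompt vectors, as in the minimal sketch below; the module and its pooling and projection choices are illustrative assumptions, not the released context-tuning code.

```python
# Illustrative sketch of contextualized prompts: continuous prompt vectors
# derived from the input text rather than shared across all inputs.
import torch
import torch.nn as nn

class ContextualPromptGenerator(nn.Module):
    def __init__(self, hidden_size, prompt_len=20):
        super().__init__()
        self.prompt_len = prompt_len
        # Maps one context vector to `prompt_len` prompt vectors.
        self.proj = nn.Linear(hidden_size, prompt_len * hidden_size)

    def forward(self, input_states, attention_mask):
        """input_states: (B, T, D) encoder states of the input text."""
        # Mean-pool the input token states into one context vector per example.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (input_states * mask).sum(1) / mask.sum(1).clamp(min=1e-6)
        prompts = self.proj(pooled).view(-1, self.prompt_len, input_states.size(-1))
        return prompts  # prepended to the PLM input embeddings, as in soft prompting
```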
As the Transformer architecture has evolved, pre-trained models have advanced at a breakneck pace in recent years. They have come to dominate the mainstream techniques in natural language processing (NLP) and computer vision (CV). How to adapt pre-training to the field of Vision-and-Language (V-L) learning and improve downstream task performance has become a focus of multimodal learning. In this paper, we review the recent progress in Vision-Language Pre-Trained Models (VL-PTMs). As the core content, we first briefly introduce several ways to encode raw images and texts into single-modal embeddings before pre-training. We then dive into the mainstream architectures of VL-PTMs for modeling the interaction between text and image representations. We further present widely used pre-training tasks and then introduce some common downstream tasks. We finally conclude this paper and present some promising research directions. Our survey aims to provide researchers with a synthesis of and pointers to related research.
Pre-trained language models (PLMs) have come to dominate the majority of NLP tasks. However, little research has been conducted on systematically evaluating the language abilities of PLMs. In this paper, we present a large-scale empirical study on the general language ability evaluation of PLMs (ElitePLM). In our study, we design four evaluation dimensions, i.e., memory, comprehension, reasoning, and composition, to measure ten widely used PLMs within five categories. Our empirical results demonstrate that: (1) PLMs with varying training objectives and strategies are good at different ability tests; (2) fine-tuning PLMs on downstream tasks is usually sensitive to data size and distribution; (3) PLMs have excellent transferability between similar tasks. Moreover, our prediction results can be reused as an open resource for deeper and more fine-grained analysis of PLMs' language abilities. This paper can guide future work in choosing, applying, and designing PLMs for specific tasks.
Pre-trained language models (PLMs) have made remarkable progress on text generation tasks via fine-tuning. However, it is difficult to fine-tune PLMs in data-scarce situations, and it is therefore non-trivial to develop a general and lightweight model that can adapt to various text generation tasks based on PLMs. Prompt-based learning offers a potential solution, but there are two major challenges in applying prompt-based methods to data-scarce text generation tasks in a transferable setting. First, it is difficult to effectively transfer prompts to new tasks. Second, it is important to design an effective transfer strategy that considers both task- and instance-level information. To address these issues, we propose PTG, a novel prompt-based transfer learning approach for text generation. PTG learns a set of source prompts for various source generation tasks and then transfers these prompts to target generation tasks through an adaptive attention mechanism that considers both task- and instance-level information. In extensive experiments, PTG yields competitive or better results than fine-tuning methods. We will release our source prompts as an open-source library, which can be extended and reused to improve new generation tasks in future research.
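The adaptive transfer step can be pictured as attention from a task- and instance-level query over a bank of learned source prompts. The sketch below illustrates this with a single attention step; the shapes and the mean-pooled keys are assumptions, not the PTG implementation.

```python
# Illustrative sketch of composing a target-task prompt by attending over
# source prompts learned on other generation tasks.
import torch
import torch.nn.functional as F

def compose_target_prompt(query, source_prompts):
    """query: (B, D) task- and instance-level representation of the target input.
    source_prompts: (S, L, D) prompts learned on S source generation tasks.
    Returns a (B, L, D) prompt as an attention-weighted mixture of source prompts."""
    keys = source_prompts.mean(dim=1)                  # (S, D) one key per source prompt
    scores = query @ keys.t() / keys.size(-1) ** 0.5   # (B, S) scaled dot-product scores
    weights = F.softmax(scores, dim=-1)                # (B, S) attention weights
    return torch.einsum('bs,sld->bld', weights, source_prompts)
```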
2021
In this paper, we study the task of generating long and coherent text. In the literature, Generative Adversarial Network (GAN) based methods have been one of the mainstream approaches to generic text generation. We aim to improve two aspects of GAN-based methods in generic text generation, namely long-sequence optimization and semantic coherence enhancement. For this purpose, we propose a novel Multi-Level Generative Adversarial Network (MLGAN) for long and coherent text generation. Our approach explicitly models the text generation process at three different levels, namely paragraph-, sentence-, and word-level generation. At the top two levels, we generate continuous paragraph vectors and sentence vectors as semantic sketches to plan the entire content, while at the bottom level we generate discrete word tokens to realize the sentences. Furthermore, we utilize a conditional GAN architecture to enhance inter-sentence coherence by injecting paragraph vectors into sentence vector generation. Extensive experimental results have demonstrated the effectiveness of the proposed model.
TextBox: A Unified, Modularized, and Extensible Framework for Text Generation
Junyi Li†,
Tianyi Tang†,
Gaole He,
Jinhao Jiang,
Xiaoxuan Hu,
Puzhao Xie,
Zhipeng Chen,
Zhuohao Yu,
Wayne Xin Zhao*,
Ji-Rong Wen
The 59th Annual Meeting of the Association for Computational Linguistics (ACL), 2021, System Demonstration
pdf / code
We release an open library, called TextBox, which provides a unified, modularized, and extensible text generation framework. TextBox aims to support a broad set of text generation tasks and models. In TextBox, we implement several text generation models on benchmark datasets, covering the categories of VAEs, GANs, pre-trained language models, etc. Meanwhile, our library maintains sufficient modularity and extensibility by properly decomposing the model architecture, inference, and learning process into highly reusable modules, which makes it easy to incorporate new models into our framework. It is especially suitable for researchers and practitioners who want to efficiently reproduce baseline models and develop new models. TextBox is implemented based on PyTorch and released under the Apache License 2.0 at https://github.com/RUCAIBox/TextBox.
This paper studies how to automatically generate a natural language text that describes facts in a knowledge graph (KG). Considering the few-shot setting, we leverage the excellent capacities of pre-trained language models (PLMs) in language understanding and generation. We introduce three major technical contributions, namely representation alignment for bridging the semantic gap between KG encodings and PLMs, relation-biased KG linearization for deriving better input representations, and multi-task learning for learning the correspondence between KG and text. Extensive experiments on three benchmarks have demonstrated the effectiveness of our model on the KG-to-text generation task. In particular, our model outperforms existing systems in most few-shot settings.
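In its simplest form, KG linearization turns each (head, relation, tail) triple into a marked-up token sequence that a PLM can consume. The sketch below shows a generic linearizer; the special tokens are assumptions, and the relation-biased ordering described in the paper is not reproduced here.

```python
# Generic sketch of linearizing KG triples into a text sequence for a PLM.
def linearize_triples(triples):
    """triples: list of (head, relation, tail) strings."""
    parts = []
    for head, relation, tail in triples:
        # Special markers (<H>, <R>, <T>) are illustrative, not the paper's exact format.
        parts.append(f"<H> {head} <R> {relation} <T> {tail}")
    return " ".join(parts)

# Example usage:
# linearize_triples([("Alan Turing", "field", "computer science")])
# -> "<H> Alan Turing <R> field <T> computer science"
```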
Text generation has become one of the most important yet challenging tasks in natural language processing (NLP). The resurgence of deep learning has greatly advanced this field through neural generation models, especially the paradigm of pre-trained language models (PLMs). In this paper, we present an overview of the major advances achieved in the topic of PLMs for text generation. As preliminaries, we present the general task definition and briefly describe the mainstream architectures of PLMs. As the core content, we discuss how to adapt existing PLMs to model different input data and satisfy desired properties in the generated text. We further summarize several important fine-tuning strategies for text generation. Finally, we present several future directions and conclude this paper. Our survey aims to provide text generation researchers with a synthesis of and pointers to related research.
As a natural language generation task, it is challenging to generate informative and coherent review text. To enhance the informativeness of the generated text, existing solutions typically learn to copy entities or triples from knowledge graphs (KGs). However, they lack an overall consideration of how to select and arrange the incorporated knowledge, which tends to cause text incoherence. To address this issue, we focus on improving the entity-centric coherence of generated reviews by leveraging the semantic structure of KGs. In this paper, we propose a novel Coherence Enhanced Text Planning model (CETP) based on knowledge graphs to improve both the global and local coherence of review generation. The proposed model learns a two-level text plan for generating a document: (1) the document plan is modeled as an ordered sequence of sentence plans, and (2) each sentence plan is modeled as an entity-based subgraph of the KG. Local coherence is naturally enforced by KG subgraphs through intra-sentence correlations between entities. For global coherence, we design a hierarchical self-attentive architecture with both subgraph- and node-level attention to enhance the correlations between subgraphs. To our knowledge, we are the first to utilize a KG-based text planning model to enhance text coherence for review generation. Extensive experiments on three datasets confirm the effectiveness of our model in improving the content coherence of generated texts.
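The hierarchical self-attentive architecture can be sketched as node-level attention that pools each subgraph into a vector, followed by subgraph-level self-attention that relates the sentence plans to one another. The PyTorch sketch below illustrates this idea only; module names and shapes are assumptions, not the CETP implementation.

```python
# Illustrative sketch of hierarchical (node-level then subgraph-level) attention.
import torch
import torch.nn as nn

class HierarchicalPlanEncoder(nn.Module):
    def __init__(self, hidden_size, num_heads=4):
        super().__init__()
        self.node_attn = nn.Linear(hidden_size, 1)  # node-level attention scores
        self.subgraph_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)

    def forward(self, node_states):
        """node_states: (B, S, N, D) entity states for S subgraphs of N nodes each."""
        scores = self.node_attn(node_states).softmax(dim=2)    # (B, S, N, 1)
        subgraphs = (scores * node_states).sum(dim=2)          # (B, S, D) subgraph vectors
        # Subgraph-level self-attention models inter-sentence (global) coherence.
        planned, _ = self.subgraph_attn(subgraphs, subgraphs, subgraphs)
        return planned
```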
2020
Personalized review generation (PRG) aims to automatically produce review text reflecting user preferences, which is a challenging natural language generation task. Most previous studies do not explicitly model the factual descriptions of products and tend to generate uninformative content. Moreover, they mainly focus on word-level generation and cannot accurately reflect more abstract user preferences across multiple aspects. To address these issues, we propose a novel knowledge-enhanced PRG model based on capsule graph neural networks (Caps-GNN). We first construct a heterogeneous knowledge graph (HKG) to utilize rich item attributes, and adopt Caps-GNN to learn graph capsules that encode the underlying characteristics of the HKG. Our generation process contains two major steps, namely aspect sequence generation and sentence generation. First, based on the graph capsules, we adaptively learn aspect capsules for inferring the aspect sequence. Then, conditioned on the inferred aspect label, we design a graph-based copy mechanism to generate sentences by incorporating related entities or words from the HKG. To our knowledge, we are the first to utilize a knowledge graph for the PRG task. The incorporated KG information is able to enhance user preferences at both the aspect and word levels. Extensive experiments on three real-world datasets have demonstrated the effectiveness of our model on the PRG task.
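A graph-based copy mechanism resembles a pointer network: at each decoding step, the model mixes a vocabulary distribution with a distribution over graph entities, weighted by a copy gate. The sketch below is a generic pointer-style illustration; tensor names and shapes are assumptions, not the paper's exact model.

```python
# Generic sketch of mixing vocabulary generation with copying entity words from a graph.
import torch
import torch.nn.functional as F

def copy_or_generate(vocab_logits, copy_scores, entity_token_ids, p_copy):
    """vocab_logits: (B, V) decoder logits over the vocabulary.
    copy_scores: (B, E) scores over E candidate graph entities.
    entity_token_ids: (B, E) vocabulary ids of those entities.
    p_copy: (B, 1) copy-gate probability."""
    gen_dist = F.softmax(vocab_logits, dim=-1) * (1 - p_copy)
    copy_dist = F.softmax(copy_scores, dim=-1) * p_copy
    # Scatter copy probabilities onto the corresponding vocabulary positions.
    return gen_dist.scatter_add(1, entity_token_ids, copy_dist)
```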
The task of Knowledge Graph Completion (KGC) aims to automatically infer missing facts in a Knowledge Graph (KG). In this paper, we take a new perspective that aims to leverage rich user-item interaction data (user interaction data for short) to improve the KGC task. Our work is inspired by the observation that many KG entities correspond to online items in application systems. However, the two kinds of data sources have very different intrinsic characteristics, and a simple fusion strategy is likely to hurt the original representation performance. To address this challenge, we propose a novel adversarial learning approach that leverages user interaction data for the KGC task. Our generator is isolated from the user interaction data and improves itself according to feedback from the discriminator. The discriminator takes the useful information learned from the user interaction data as input and gradually enhances its evaluation capacity in order to identify the fake samples produced by the generator. To discover users' implicit entity preferences, we design an elaborate collaborative learning algorithm based on graph neural networks, which is jointly optimized with the discriminator. Such an approach effectively alleviates the issues of data heterogeneity and semantic complexity for the KGC task. Extensive experiments on three real-world datasets have demonstrated the effectiveness of our approach on the KGC task.
2019
Generating long and informative review text is a challenging natural language generation task. Previous work focuses on word-level generation, neglecting the importance of topical and syntactic characteristics of natural languages. In this paper, we propose a novel review generation model by characterizing an elaborately designed aspect-aware coarse-to-fine generation process. First, we model aspect transitions to capture the overall content flow. Then, to generate a sentence, an aspect-aware sketch is predicted using an aspect-aware decoder. Finally, another decoder fills in the semantic slots by generating the corresponding words. Our approach is able to jointly utilize aspect semantics, syntactic sketches, and context information. Extensive experimental results have demonstrated the effectiveness of the proposed model.
* Corresponding author
† Equal contribution