Yunqi Hong


yunqihong@ucla.edu

I am a second-year PhD student in the Computer Science Department at UCLA, advised by Prof. Cho-Jui Hsieh.

My research focuses on post-training and self-improvement of large language and multimodal models. I am currently working on LLM reinforcement learning, reward modeling, and text-to-image generation. I am particularly interested in understanding and mitigating failure modes in optimization, such as reward hacking, distribution shift, and evaluation bias, and in designing scalable methods that improve model performance using limited or unlabeled data. Previously, I explored topics in LLM automatic prompt optimization, model interpretability, scalable graph adversarial attacks, graph representation learning, and recommender systems.

I also collaborate with Prof. Neil Y.C. Lin on interdisciplinary projects, including applying LLM-driven methods to biomedical research.

🙌 I’m actively looking for research internships for Summer 2026. Feel free to reach out if you have a relevant opportunity.

news

Jan 06, 2026 Our new paper Understanding Reward Hacking in Text-to-Image Reinforcement Learning is out, uncovering how reward design leads to artifact exploitation in T2I RL; code is available on GitHub.
Sep 29, 2025 Check out our paper IRIS: Intrinsic Reward Image Synthesis, showing how RL with intrinsic rewards alone can improve text-to-image generation.
Sep 18, 2025 Our paper on boosting fine-grained zero-shot performance of MLLMs with unlabeled data has been accepted at NeurIPS 2025.

selected publications

  1. Preprint
    Understanding Reward Hacking in Text-to-Image Reinforcement Learning
    Yunqi Hong, Kuei-Chun Kao, Hengguang Zhou, and Cho-Jui Hsieh
    arXiv preprint arXiv:2601.03468, 2026
  2. Preprint
    IRIS: Intrinsic Reward Image Synthesis
    Yihang Chen, Yuanhao Ban, Yunqi Hong, and Cho-Jui Hsieh
    arXiv preprint arXiv:2509.25562, 2025
  3. Preprint
    When Distance Distracts: Representation Distance Bias in BT-Loss for Reward Models
    Tong Xie, Andrew Bai, Yuanhao Ban, Yunqi Hong, Haoyu Li, and Cho-Jui Hsieh
    arXiv preprint arXiv:2512.06343, 2025
  4. NeurIPS 2025
    Unlabeled Data Improves Fine-Grained Image Zero-shot Classification with Multimodal LLMs
    Yunqi Hong, Sohyun An, Andrew Bai, Neil Y.C. Lin, and Cho-Jui Hsieh
    Advances in Neural Information Processing Systems, 2025
  5. Preprint
    Adaptive Diagnostic Reasoning Framework for Pathology with Multimodal Large Language Models
    Yunqi Hong, Johnson Kao, Liam Edwards, Nein-Tzu Liu, Chung-Yen Huang, Alex Oliveira-Kowaleski, Cho-Jui Hsieh, and 1 more author
    arXiv preprint arXiv:2511.12008, 2025
  6. EMNLP 2025
    QG-CoC: Question-Guided Chain-of-Captions for Large Multimodal Models
    Kuei-Chun Kao, Hsu Tzu-Yin, Yunqi Hong, Ruochen Wang, and Cho-Jui Hsieh
    In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025