Yunqi Hong

yunqihong@ucla.edu

I am entering the third year of my PhD in the Computer Science Department at UCLA, advised by Prof. Cho-Jui Hsieh.

My research focuses on improving the reasoning ability, reliability, and alignment of large language and multimodal models. I am particularly interested in post-training methods, including supervised fine-tuning (SFT), reinforcement learning (RL), reward modeling, and data curation, with an emphasis on learning under imperfect supervision.

My current work investigates how foundation models can acquire stronger reasoning capabilities when high-quality supervision is limited, noisy, or unavailable. A central theme of my research is understanding how different forms of supervision, including unlabeled data, domain knowledge, model-generated feedback, and learned reward signals, can be leveraged to improve model performance and reasoning abilities. I am particularly interested in developing scalable post-training methods that remain effective under imperfect supervision while improving model reliability and robustness.

Previously, I worked on text-to-image RL, unsupervised prompt optimization for fine-grained image classification, model attribution, scalable graph adversarial attacks, graph representation learning, and recommender systems.

I also collaborate with Prof. Neil Y.C. Lin on interdisciplinary applications of foundation models in biomedicine, including medical image analysis and drug synergy prediction with LLMs. These projects explore how reasoning models can integrate scientific knowledge and generate useful predictions in domains where expert annotations and reasoning trajectories are often scarce.

news

Jun 15, 2026 I’m very happy to join Google as a student researcher for summer 2026.
Jun 09, 2026 See our new paper A Unifying Lens on Supervised Fine-Tuning Through Target Distribution Design.
Jun 06, 2026 We presented our paper Understanding Reward Hacking in Text-to-Image Reinforcement Learning at CVPR 2026. It shows how different rewards lead to exploitation in T2I RL and presents mitigation methods; code is available on GitHub.
Apr 30, 2026 Our paper When Distance Distracts: Representation Distance Bias in BT-Loss for Reward Models has been accepted to ICML 2026.
Sep 18, 2025 Our paper on boosting fine-grained zero-shot performance of MLLMs with unlabeled data has been accepted at NeurIPS 2025.

selected publications

  1. Preprint
    A Unifying Lens on Supervised Fine-Tuning Through Target Distribution Design
    Tong Xie, Yuanhao Ban, Yunqi Hong, Sohyun An, Yihang Chen, and Cho-Jui Hsieh
    arXiv preprint arXiv:2606.11189, 2026
  2. CVPR 2026
    Understanding Reward Hacking in Text-to-Image Reinforcement Learning
    Yunqi Hong, Kuei-Chun Kao, Hengguang Zhou, and Cho-Jui Hsieh
    IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2026
  3. ICML 2026
    When Distance Distracts: Representation Distance Bias in BT-Loss for Reward Models
    Tong Xie, Andrew Bai, Yuanhao Ban, Yunqi Hong, Haoyu Li, and Cho-Jui Hsieh
    International Conference on Machine Learning, 2026
  4. NeurIPS 2025
    Unlabeled Data Improves Fine-Grained Image Zero-shot Classification with Multimodal LLMs
    Yunqi Hong, Sohyun An, Andrew Bai, Neil YC Lin, and Cho-Jui Hsieh
    Advances in Neural Information Processing Systems, 2025
  5. Preprint
    IRIS: Intrinsic Reward Image Synthesis
    Yihang Chen, Yuanhao Ban, Yunqi Hong, and Cho-Jui Hsieh
    arXiv preprint arXiv:2509.25562, 2025
  6. Commun. Med.
    Adaptive Diagnostic Reasoning Framework for Pathology with Multimodal Large Language Models
    Yunqi Hong, Kuei-Chun Kao, Liam Edwards, Nein-Tzu Liu, Chung-Yen Huang, Alex Oliveira-Kowaleski, Cho-Jui Hsieh, and 1 more author
    Communications Medicine, 2026
  7. EMNLP 2025
    QG-CoC: Question-Guided Chain-of-Captions for Large Multimodal Models
    Kuei-Chun Kao, Tzu-Yin Hsu, Yunqi Hong, Ruochen Wang, and Cho-Jui Hsieh
    In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025