Publications
2025
- [NeurIPS 2025] Unlabeled Data Improves Fine-Grained Image Zero-shot Classification with Multimodal LLMs. Yunqi Hong, Sohyun An, Andrew Bai, Neil YC Lin, and 1 more author. Advances in Neural Information Processing Systems, 2025.
Despite Multimodal Large Language Models (MLLMs) showing promising results on general zero-shot image classification tasks, fine-grained image classification remains challenging. It demands precise attention to the subtle visual details that distinguish visually similar subcategories, details that MLLMs may easily overlook without explicit guidance. To address this, we introduce AutoSEP, an iterative self-supervised prompt learning framework designed to enhance MLLM fine-grained classification capabilities in a fully unsupervised manner. Our core idea is to leverage unlabeled data to learn a description prompt that guides MLLMs in identifying crucial discriminative features within an image, thereby boosting classification accuracy. We develop an automatic self-enhancing prompt learning framework, AutoSEP, that iteratively improves the description prompt using unlabeled data, guided by an instance-level classification scoring function. AutoSEP requires only black-box access to MLLMs, eliminating the need for any training or fine-tuning. We evaluate our approach on multiple fine-grained classification datasets, where it consistently outperforms other unsupervised baselines, demonstrating the effectiveness of our self-supervised optimization framework. Notably, AutoSEP improves accuracy by 13 percent on average over standard zero-shot classification and by 5 percent over the best-performing baselines.
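A minimal sketch of the iterative loop described above, assuming the black-box MLLM is exposed as a plain callable `ask(prompt, image) -> str` and using a self-consistency proxy in place of the paper's instance-level classification scoring function (both are illustrative assumptions, not the AutoSEP implementation):

```python
import random
from typing import Callable, List

def instance_score(ask: Callable[[str, str], str],
                   desc_prompt: str,
                   images: List[str],
                   classes: List[str]) -> float:
    """Proxy score: describe each unlabeled image with the current prompt, then
    check whether classifying from the description agrees with classifying the
    raw image directly (no labels required)."""
    agree = 0
    for img in images:
        desc = ask(desc_prompt, img)
        from_desc = ask(f"Given this description, answer with one of {classes}: {desc}", "")
        from_img = ask(f"Answer with one of {classes}.", img)
        agree += int(from_desc.strip() == from_img.strip())
    return agree / max(len(images), 1)

def autosep_loop(ask, seed_prompt: str, images: List[str], classes: List[str],
                 iters: int = 5, num_candidates: int = 4) -> str:
    """Iteratively refine the description prompt using only unlabeled images
    and black-box MLLM calls."""
    best_prompt = seed_prompt
    best_score = instance_score(ask, best_prompt, images, classes)
    for _ in range(iters):
        # Ask the MLLM itself to propose refined description prompts.
        proposals = [ask("Rewrite this prompt so it focuses on subtle, discriminative "
                         f"visual details useful for telling {classes} apart: {best_prompt}", "")
                     for _ in range(num_candidates)]
        for cand in proposals:
            batch = random.sample(images, min(8, len(images)))
            score = instance_score(ask, cand, batch, classes)
            if score > best_score:
                best_prompt, best_score = cand, score
    return best_prompt
```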
- [Preprint] IRIS: Intrinsic Reward Image Synthesis. Yihang Chen, Yuanhao Ban, Yunqi Hong, and Cho-Jui Hsieh. arXiv preprint arXiv:2509.25562, 2025.
Despite the success of Reinforcement Learning from Human Feedback (RLHF) in language reasoning, its application to autoregressive Text-to-Image (T2I) generation is often constrained by the limited availability of human preference data. This paper explores how an autoregressive T2I model can learn from internal signals without relying on external rewards or labeled data. Contrary to recent findings in text generation, we show that maximizing self-uncertainty, rather than self-certainty, improves image generation. We observe that this is because autoregressive T2I models with low uncertainty tend to generate simple and uniform images that are less aligned with human preferences. Based on these observations, we propose IRIS (Intrinsic Reward Image Synthesis), the first framework to improve autoregressive T2I models with reinforcement learning using only an intrinsic reward. Empirical results demonstrate that applying IRIS to autoregressive T2I models achieves performance that is competitive with, or superior to, training with external rewards.
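The intrinsic reward here is the model's own uncertainty over the image tokens it generates. The sketch below illustrates one way to realize that idea with token-level entropy and a plain REINFORCE update; the exact reward definition, baseline, and RL algorithm are illustrative assumptions rather than the IRIS recipe.

```python
import torch
import torch.nn.functional as F

def intrinsic_reward(logits: torch.Tensor) -> torch.Tensor:
    """logits: (batch, seq_len, vocab) over generated image tokens.
    Reward each sequence by its mean token-level entropy, so higher
    self-uncertainty yields a higher reward (per the paper's finding)."""
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * F.log_softmax(logits, dim=-1)).sum(-1)  # (batch, seq_len)
    return entropy.mean(dim=-1)                                 # (batch,)

def reinforce_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    """Plain REINFORCE with the intrinsic reward as the return and a batch-mean
    baseline; `tokens` are the sampled image tokens, shape (batch, seq_len)."""
    reward = intrinsic_reward(logits).detach()
    advantage = reward - reward.mean()
    logp = torch.distributions.Categorical(logits=logits).log_prob(tokens)  # (batch, seq_len)
    return -(advantage.unsqueeze(-1) * logp).mean()
```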
2023
- EGALA: Efficient Gradient Approximation for Large-scale Graph Adversarial Attack. Yunqi Hong and Cho-Jui Hsieh. 2023.
Graph Neural Networks (GNNs) have emerged as powerful tools for graph representation learning. However, their vulnerability to adversarial attacks underscores the importance of a deeper understanding of graph adversarial attack techniques. Existing attack methods have demonstrated that it is possible to deteriorate the predictions of GNNs by injecting a small number of edges, but they often suffer from poor scalability due to the need to compute and store gradients for a quadratic number of entries in the adjacency matrix. In this paper, we propose EGALA, a novel approach for conducting large-scale graph adversarial attacks. By showing that the derivative of a linear graph neural network can be approximated by the inner product of two matrices, EGALA leverages efficient Approximate Nearest Neighbor Search (ANNS) techniques to identify entries with dominant gradients in sublinear time, offering superior attack capabilities, reduced memory and time consumption, and enhanced scalability. We conduct comprehensive experiments across various datasets to demonstrate the outstanding performance of our method compared with state-of-the-art approaches.
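The computational idea, as the abstract states it, is that the attack gradient with respect to the adjacency matrix factorizes into two matrices, so its dominant entries can be found with (approximate) maximum-inner-product search instead of forming the dense n x n gradient. The toy sketch below shows the factorized view with an exact top-k scan; in EGALA this scan would be replaced by an ANNS/MIPS index to reach sublinear query time. The helper name and the random factors are assumptions for illustration only.

```python
import numpy as np

def top_gradient_entries(U: np.ndarray, V: np.ndarray, k: int):
    """Exact search for the k entries (i, j) with the largest score U[i] @ V[j].
    EGALA avoids materializing the dense n x n matrix by answering the same
    query with an approximate nearest-neighbor (MIPS) index; the exact scan
    here is only for clarity."""
    scores = U @ V.T
    flat = np.argpartition(scores.ravel(), -k)[-k:]
    rows, cols = np.unravel_index(flat, scores.shape)
    return sorted(zip(rows.tolist(), cols.tolist()),
                  key=lambda ij: -scores[ij[0], ij[1]])

# Toy usage: random factors standing in for the linearized GNN gradient.
rng = np.random.default_rng(0)
U, V = rng.normal(size=(100, 16)), rng.normal(size=(100, 16))
print(top_gradient_entries(U, V, k=5))
```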
- Enhanced Sequential Recommendation with Self-Attention and Graph Collaborative Features. Yunqi Hong and Wei Ye. In 2023 IEEE International Conference on Data Mining Workshops (ICDMW), 2023.
Recommender systems have gained significant popularity in recent years, with research focusing on three main branches: sequential recommendation, graph-based recommendation, and review-based recommendation. These branches leverage different data modalities to provide personalized recommendations. However, each individual modality has its own biases and limitations, making it challenging to fully capture user preferences and item relations, and naive modality combination methods often face compatibility issues that lead to suboptimal performance. In this work, we propose a comprehensive model called RSSG (Review-enhanced Sequential recommendation with Self-attention and Graph collaborative features) to address this challenge. Our model tightly integrates all three data modalities to provide more accurate and personalized recommendations. We utilize a language model to capture textual representations of users and items, as well as user-item attention weights. With shared user and item representations, we employ a time-interval aware self-attention mechanism to capture long-term user behavior patterns, and leverage graph attention networks to capture collaborative information between users and items. Finally, we fuse the sequential and collaborative features to make predictions. We conduct extensive experiments on various datasets to evaluate the performance of our proposed method. The experimental results demonstrate significant performance gains compared to state-of-the-art methods. Our approach effectively leverages the strengths of different data modalities, enabling more accurate and personalized recommendations for users.
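As a rough picture of the fusion step described above, the sketch below concatenates a sequential user representation (e.g., from time-interval-aware self-attention) with a collaborative one (e.g., from a graph attention network) and scores the result against item embeddings. The module name, dimensions, and the simple linear fusion are illustrative assumptions, not the RSSG architecture.

```python
import torch
import torch.nn as nn

class SeqGraphFusion(nn.Module):
    def __init__(self, d_seq: int, d_graph: int, d_item: int):
        super().__init__()
        # Fuse the sequential and graph-collaborative user features.
        self.proj = nn.Linear(d_seq + d_graph, d_item)

    def forward(self, seq_repr, graph_repr, item_emb):
        # seq_repr: (batch, d_seq), graph_repr: (batch, d_graph), item_emb: (num_items, d_item)
        user = self.proj(torch.cat([seq_repr, graph_repr], dim=-1))  # (batch, d_item)
        return user @ item_emb.T                                     # recommendation scores

# Toy usage with random tensors standing in for the learned representations.
fusion = SeqGraphFusion(64, 32, 48)
scores = fusion(torch.randn(4, 64), torch.randn(4, 32), torch.randn(100, 48))
print(scores.shape)  # torch.Size([4, 100])
```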
2022
- Graph Neural Diffusion Networks for Semi-supervised Learning. Wei Ye, Zexi Huang, Yunqi Hong, and Ambuj Singh. arXiv preprint arXiv:2201.09698, 2022.
Graph Convolutional Networks (GCNs) are a pioneering model for graph-based semi-supervised learning. However, GCNs do not perform well on sparsely labeled graphs: the two-layer version cannot effectively propagate label information across the whole graph structure (the under-smoothing problem), while the deep version over-smooths and is hard to train (the over-smoothing problem). To solve these two issues, we propose a new graph neural network called GND-Nets (Graph Neural Diffusion Networks) that exploits the local and global neighborhood information of a vertex in a single layer. Using a shallow network mitigates the over-smoothing problem, while exploiting local and global neighborhood information mitigates the under-smoothing problem. The local and global neighborhood information of a vertex is utilized through a new graph diffusion method called neural diffusions, which integrate neural networks into conventional linear and nonlinear graph diffusions. The adoption of neural networks makes neural diffusions adaptable to different datasets. Extensive experiments on various sparsely labeled graphs verify the effectiveness and efficiency of GND-Nets compared to state-of-the-art approaches.
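A hedged sketch of the neural-diffusion idea: propagate features for several steps of a normalized adjacency matrix and let a small learned layer decide how to weight the steps, so a single shallow layer sees both local and global neighborhoods. The layer shapes and the particular mixing layer are assumptions for illustration, not the GND-Nets code.

```python
import torch
import torch.nn as nn

class NeuralDiffusion(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, K: int = 8):
        super().__init__()
        self.K = K
        # Learnable combination of the K+1 diffusion steps (replacing the fixed
        # coefficients of linear diffusions such as personalized PageRank).
        self.mix = nn.Linear(K + 1, 1, bias=False)
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, A_hat: torch.Tensor, X: torch.Tensor) -> torch.Tensor:
        steps, H = [X], X
        for _ in range(self.K):
            H = A_hat @ H               # one diffusion step with the normalized adjacency
            steps.append(H)
        S = torch.stack(steps, dim=-1)  # (n, in_dim, K+1)
        Z = self.mix(S).squeeze(-1)     # learned mixture over local-to-global neighborhoods
        return self.lin(Z)

# Toy usage: a 5-node graph with a row-normalized adjacency matrix.
A = torch.rand(5, 5)
A_hat = A / A.sum(dim=1, keepdim=True)
out = NeuralDiffusion(in_dim=3, out_dim=2)(A_hat, torch.randn(5, 3))
print(out.shape)  # torch.Size([5, 2])
```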