Haobo Yuan

PhD Student @ UC Merced


I am a PhD student at UC Merced, working under the supervision of Prof. Ming-Hsuan Yang. I have had the privilege of working closely with Dr. Xiangtai Li during my research. I earned my B.E. degree with honors from the Hongyi Honors College in 2020 and my M.S. degree from the School of Computer Science in 2023, both in computer science and technology at Wuhan University. During my master’s studies, I was fortunate to be supervised by Prof. Lefei Zhang. I also spent one year as a research associate at NTU Singapore, working with Prof. Chen Change Loy. During my PhD studies, I have been a student researcher at Google DeepMind, Mountain View (2025) and a research scientist intern at ByteDance, San Jose (2026).

Currently, my research focuses on advancing multi-modal large language models, visual reasoning, and image/video generation.

news

Feb 20, 2026 SAMTok has been accepted by CVPR 2026.
Jan 05, 2026 Starting my internship at ByteDance (San Jose).
May 19, 2025 Starting my internship at Google DeepMind (MTV-CE).
Jan 23, 2025 PCM and RAP-SAM got accepted by AAAI 2025 and ICLR 2025, respectively.
Dec 16, 2024 I am now visiting the University of Tokyo.
Aug 15, 2024 Starting my PhD journey at UC Merced.
Jul 01, 2024 Open-Vocabulary SAM has been accepted by ECCV 2024.
Feb 27, 2024 OMG-Seg got accepted by CVPR 2024.
Jan 30, 2024 One paper (survey about open-vocabulary learning) got accepted by TPAMI.
Aug 16, 2023 Day 1 @ MMLab, NTU. My new voyage begins. 🚢

selected publications

  1. arXiv
    Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
    Haobo Yuan, Xiangtai Li, Tao Zhang, Zilong Huang, Shilin Xu, Shunping Ji, Yunhai Tong, Lu Qi, Jiashi Feng, and Ming-Hsuan Yang
    arXiv preprint, 2025
  2. arXiv
    Visual Reasoning Tracer: Object-Level Grounded Reasoning Benchmark
    Haobo Yuan, Yueyi Sun, Yanwei Li, Tao Zhang, Xueqing Deng, Henghui Ding, Lu Qi, Anran Wang, Xiangtai Li, and Ming-Hsuan Yang
    arXiv preprint, 2025
  3. CVPR 26
    SAMTok: Representing Any Mask with Two Words
    Yikang Zhou, Tao Zhang, Dengxian Gong, Yuanzheng Wu, Ye Tian, Haochen Wang, Haobo Yuan, Jiacong Wang, Lu Qi, Hao Fei, Anran Wang, Zhuochen Wang, Yujing Wang, Cheng Chen, Shunping Ji, and Xiangtai Li
    In CVPR, 2026
  4. arXiv
    DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World
    Xiangtai Li, Tao Zhang, Yanwei Li, Haobo Yuan, Shihao Chen, Yikang Zhou, Jiahao Meng, Yueyi Sun, Shilin Xu, Lu Qi, Tianheng Cheng, Yi Lin, Zilong Huang, Wenhao Huang, Jiashi Feng, and Guang Shi
    arXiv preprint arXiv:2506.24102, 2025
  5. arXiv
    An Empirical Study of GPT-4o Image Generation Capabilities
    Sixiang Chen, Jinbin Bai, Zhuoran Zhao, Tian Ye, Qingyu Shi, Donghao Zhou, Wenhao Chai, Xin Lin, Jianzong Wu, Chao Tang, Shilin Xu, Tao Zhang, Haobo Yuan, Yikang Zhou, Wei Chow, Linfeng Li, Xiangtai Li, Lei Zhu, and Lu Qi
    arXiv preprint arXiv:2504.05979, 2025
  6. On Path to Multimodal Generalist: General-Level and General-Bench
    Hao Fei, Yuan Zhou, Juncheng Li, Xiangtai Li, Qingshan Xu, Bobo Li, Shengqiong Wu, Yaoting Wang, Junbao Zhou, Jiahao Meng, Qingyu Shi, Zhiyuan Zhou, Liangtao Shi, Minghe Gao, Daoan Zhang, Zhiqi Ge, Siliang Tang, Kaihang Pan, Yaobo Ye, Haobo Yuan, Tao Zhang, Weiming Wu, Tianjie Ju, Zixiang Meng, Shilin Xu, Liyu Jia, Wentao Hu, Meng Luo, Jiebo Luo, Tat-Seng Chua, Shuicheng Yan, and Hanwang Zhang
    In ICML, 2025
  7. Point Cloud Mamba: Point Cloud Learning via State Space Model
    Tao Zhang, Xiangtai Li, Haobo Yuan, Shunping Ji, and Shuicheng Yan
    In AAAI, 2025
  8. RAP-SAM: Towards Real-Time All-Purpose Segment Anything
    Shilin Xu, Haobo Yuan, Qingyu Shi, Lu Qi, Jingbo Wang, Yibo Yang, Yining Li, Kai Chen, Yunhai Tong, Bernard Ghanem, Xiangtai Li, and Ming-Hsuan Yang
    In ICLR, 2025
  9. Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively
    Haobo Yuan, Xiangtai Li, Chong Zhou, Yining Li, Kai Chen, and Chen Change Loy
    In ECCV, 2024
  10. OMG-Seg: Is One Model Good Enough For All Segmentation?
    Xiangtai Li, Haobo Yuan, Wei Li, Henghui Ding, Size Wu, Wenwei Zhang, Yining Li, Kai Chen, and Chen Change Loy
    In CVPR, 2024
  11. OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding
    Tao Zhang, Xiangtai Li, Hao Fei, Haobo Yuan, Shengqiong Wu, Shunping Ji, Chen Change Loy, and Shuicheng Yan
    In NeurIPS, 2024
  12. Transformer-based Visual Segmentation: A Survey
    Xiangtai Li, Henghui Ding, Haobo Yuan, Wenwei Zhang, Jiangmiao Pang, Guangliang Cheng, Kai Chen, Ziwei Liu, and Chen Change Loy
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024
  13. Towards Open Vocabulary Learning: A Survey
    Jianzong Wu, Xiangtai Li, Shilin Xu, Haobo Yuan, Henghui Ding, Yibo Yang, Xia Li, Jiangning Zhang, Yunhai Tong, Xudong Jiang, Bernard Ghanem, and Dacheng Tao
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024
  14. PanopticPartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation
    Xiangtai Li, Shilin Xu, Yibo Yang, Haobo Yuan, Guangliang Cheng, Yunhai Tong, Zhouchen Lin, Ming-Hsuan Yang, and Dacheng Tao
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024
  15. Tube-Link: A Flexible Cross Tube Baseline for Universal Video Segmentation
    Xiangtai Li, Haobo Yuan, Wenwei Zhang, Guangliang Cheng, Jiangmiao Pang, and Chen Change Loy
    In ICCV, 2023
  16. TIP
    Monocular Road Planar Parallax Estimation
    Haobo Yuan, Teng Chen, Wei Sui, Jiafeng Xie, Lefei Zhang, Yuan Li, and Qian Zhang
    IEEE Transactions on Image Processing, 2023
  17. Neural Collapse Inspired Feature-Classifier Alignment for Few-Shot Class-Incremental Learning
    Yibo Yang, Haobo Yuan, Xiangtai Li, Zhouchen Lin, Philip Torr, and Dacheng Tao
    In ICLR, 2023
  18. PolyphonicFormer: Unified Query Learning for Depth-aware Video Panoptic Segmentation
    Haobo Yuan, Xiangtai Li, Yibo Yang, Guangliang Cheng, Jing Zhang, Yunhai Tong, Lefei Zhang, and Dacheng Tao
    In ECCV, 2022
    Winning method of the ICCV-2021 SemKITTI-DVPS Challenge.
  19. Multi-Task Learning with Multi-query Transformer for Dense Prediction
    Yangyang Xu, Xiangtai Li, Haobo Yuan, Yibo Yang, and Lefei Zhang
    IEEE Transactions on Circuits and Systems for Video Technology, 2023
  20. Towards Theoretically Inspired Neural Initialization Optimization
    Yibo Yang, Hong Wang, Haobo Yuan, and Zhouchen Lin
    In NeurIPS, 2022
  21. BOSSA: a decentralized system for proofs of data retrievability and replication
    Dian Chen, Haobo Yuan, Shengshan Hu, Qian Wang, and Cong Wang
    IEEE Transactions on Parallel and Distributed Systems, 2021
  22. arXiv
    Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model
    Haobo Yuan, Xiangtai Li, Lu Qi, Tao Zhang, Ming-Hsuan Yang, Shuicheng Yan, and Chen Change Loy
    arXiv preprint, 2024
  23. arXiv
    LLAVADI: What Matters For Multimodal Large Language Models Distillation
    Shilin Xu, Xiangtai Li, Haobo Yuan, Lu Qi, Yunhai Tong, and Ming-Hsuan Yang
    arXiv preprint, 2024
  24. arXiv
    Neural Collapse Terminus: A Unified Solution for Class Incremental Learning and Its Variants
    Yibo Yang, Haobo Yuan, Xiangtai Li, Jianlong Wu, Lefei Zhang, Zhouchen Lin, Philip Torr, Dacheng Tao, and Bernard Ghanem
    arXiv preprint, 2023