Bingyu Li (李炳煜) - Academic Homepage


Researching Multimodal Intelligence for Open-World Visual Understanding

Welcome to my academic homepage. My work focuses on multimodal large language models, vision-language learning, and open-vocabulary visual understanding across diverse real-world scenarios such as remote sensing and underwater environments.

Multimodal LLMs, Vision-Language Models, Open-Vocabulary Segmentation, Remote Sensing Vision, Underwater Vision

About Me

I am a Ph.D. student at the University of Science and Technology of China (USTC), supervised by Prof. Xuelong Li.

My research focuses on applying multimodal large language models and vision-language models to visual tasks across diverse scenes. I am particularly interested in open-vocabulary segmentation, multimodal reasoning, and domain-oriented visual intelligence.

Research Topics

I work at the intersection of multimodal learning, visual understanding, and domain-specific intelligence.

Multimodal LLMs

Vision-language models, multimodal reasoning, and foundation models for general-purpose visual intelligence.

Computer Vision

Open-vocabulary segmentation, semantic understanding, instance segmentation, and video understanding.

Domain Applications

Remote sensing vision, underwater vision, and robust multimodal perception in challenging environments.

News

2026.03

Four papers were accepted to CVPR 2026 (2 first-author, 1 second-author, and 1 fourth-author paper)! 🎉

2025.11

One paper was accepted to AAAI 2026 (Oral)! 🎉

2025.10

Awarded the National Scholarship for Graduate Students (研究生国家奖学金). 🎖️

2025.04

StitchFusion was accepted to ACM MM 2025 (Oral).

2024.09

Started my Ph.D. journey at USTC.

Research Highlights

The full publication list is available on Google Scholar.

Multi-Visual Modality

ACM MM 2025 StitchFusion

StitchFusion: Weaving Any Visual Modalities to Enhance Multimodal Semantic Segmentation

Bingyu Li, D Zhang, Z Zhao, J Gao, X Li

Code

Paper

We propose a novel framework that seamlessly integrates arbitrary visual modalities to improve multimodal semantic segmentation.

Pattern Recognition 2025 U3M

U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation

Bingyu Li, D Zhang, Z Zhao, J Gao, X Li

Code

Paper

We develop an unbiased multiscale modal fusion framework for multimodal semantic segmentation.

Vision-Language and Multimodal Large Language Models

AAAI 2026

Exploring Efficient Open-Vocabulary Segmentation in Remote Sensing

Bingyu Li, H Dong, D Zhang, Z Zhao, J Gao, X Li

Code

Paper

We investigate efficient open-vocabulary segmentation approaches tailored to remote sensing imagery.

CVPR 2026

MARIS: Marine Open-Vocabulary Instance Segmentation

Bingyu Li, F Wang, D Zhang, Z Zhao, J Gao, X Li

Code

Paper

This work introduces MARIS, a benchmark and method for open-vocabulary instance segmentation in marine environments.

CVPR 2026

Exploring the Underwater World Segmentation without Extra Training

Bingyu Li, T Huo, D Zhang, Z Zhao, J Gao, X Li

Code

Paper

We explore training-free segmentation methods for underwater scenes, enabling effective transfer without additional supervision.

CVPR 2026

Boosting Quantitative and Spatial Awareness for Zero-Shot Object Counting

D Zhang, Bingyu Li, F Wang, Z Zhao, J Gao

Code

Paper

We enhance zero-shot object counting by improving both quantitative reasoning and spatial awareness.

arXiv 2025 FGAseg

FGAseg: Fine-Grained Pixel-Text Alignment for Open-Vocabulary Semantic Segmentation

Bingyu Li, D Zhang, Z Zhao, J Gao, X Li

Code

Paper

We propose a fine-grained pixel-text alignment framework for open-vocabulary semantic segmentation.

Honors and Awards

2025, National Scholarship for Graduate Students | 研究生国家奖学金

Academic Service

Reviewer — Journals

TGRS
Pattern Recognition (PR)
More journals in related areas

Reviewer — Conferences

CVPR
NeurIPS
ICLR
Other major conferences