AlignCAT: Visual-Linguistic Alignment of Category and Attribute for Weakly Supervised Visual Grounding
Published in ACM International Conference on Multimedia (ACMMM), 2025
AlignCAT introduces a query-based semantic matching framework for weakly supervised visual grounding. It combines coarse-grained category alignment with fine-grained attribute alignment to strengthen visual-linguistic reasoning, and it achieves state-of-the-art performance on the RefCOCO, RefCOCO+, and RefCOCOg benchmarks.
Recommended citation: Yidan Wang, Chenyi Zhuang, Wutao Liu, Pan Gao, Nicu Sebe. (2025). "AlignCAT: Visual-Linguistic Alignment of Category and Attribute for Weakly Supervised Visual Grounding." ACMMM 2025.
