Guangzhi Sun is qualified to endorse.

Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models

Guangzhi Sun: Is registered as an author of this paper.
Can endorse for cs.AI, cs.CL, cs.CV, cs.LG, cs.SD, eess.AS, eess.IV, stat.ML.

Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma and Chao Zhang are not registered as owners of this paper.