Ludan Ruan is qualified to endorse.

Accommodating Audio Modality in CLIP for Multimodal Processing

Ludan Ruan: Is registered as an author of this paper.
Can endorse for cs.CV.

Anwen Hu, Yuqing Song, Liang Zhang, Sipeng Zheng and Qin ** are not registered as owners of this paper.