Shi**g Si and Jianzong Wang are qualified to endorse.

Speech2Video: Cross-Modal Distillation for Speech to Video Generation

Shi**g Si: Is registered as an author of this paper.
Can endorse for cs.AI, cs.CL, cs.SD, stat.ML.
Jianzong Wang: Is registered as an author of this paper.
Can endorse for cs.AI, cs.CL, cs.CV, cs.LG, cs.SD, eess.AS.

Xiaoyang Qu, Ning Cheng, Wenqi Wei, Xinghua Zhu and **g Xiao are not registered as owners of this paper.