Showing 1–8 of 8 results for author: Su, K

Searching in archive eess.

  1. arXiv:2309.04960  [pdf, other]

    eess.IV cs.CV

    SdCT-GAN: Reconstructing CT from Biplanar X-Rays with Self-driven Generative Adversarial Networks

    Authors: Shuangqin Cheng, Qingliang Chen, Qiyi Zhang, Ming Li, Yamuhanmode Alike, Kaile Su, Pengcheng Wen

    Abstract: Computed Tomography (CT) is a medical imaging modality that can generate more informative 3D images than 2D X-rays. However, this advantage comes at the expense of more radiation exposure, higher costs, and longer acquisition time. Hence, the reconstruction of 3D CT images using a limited number of 2D X-rays has gained significant importance as an economical alternative. Nevertheless, existing met…

    Submitted 10 September, 2023; originally announced September 2023.

  2. arXiv:2305.06594  [pdf, other]

    cs.SD cs.CV cs.LG cs.MM eess.AS

    V2Meow: Meowing to the Visual Beat via Video-to-Music Generation

    Authors: Kun Su, Judith Yue Li, Qingqing Huang, Dima Kuzmin, Joonseok Lee, Chris Donahue, Fei Sha, Aren Jansen, Yu Wang, Mauro Verzetti, Timo I. Denk

    Abstract: Video-to-music generation demands both a temporally localized high-quality listening experience and globally aligned video-acoustic signatures. While recent music generation models excel at the former through advanced audio codecs, the exploration of video-acoustic signatures has been confined to specific visual scenarios. In contrast, our research confronts the challenge of learning globally alig…

    Submitted 22 February, 2024; v1 submitted 11 May, 2023; originally announced May 2023.

    Comments: accepted at AAAI 2024, music samples available at https://tinyurl.com/v2meow

  3. arXiv:2303.16897  [pdf, other]

    cs.CV cs.LG cs.SD eess.AS

    Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos

    Authors: Kun Su, Kaizhi Qian, Eli Shlizerman, Antonio Torralba, Chuang Gan

    Abstract: Modeling sounds emitted from physical object interactions is critical for immersive perceptual experiences in real and virtual worlds. Traditional methods of impact sound synthesis use physics simulation to obtain a set of physics parameters that could represent and synthesize the sound. However, they require fine details of both the object geometries and impact locations, which are rarely availab…

    Submitted 8 July, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

    Comments: CVPR 2023. Project page: https://sukun1045.github.io/video-physics-sound-diffusion/

  4. arXiv:2206.01369  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Incremental Learning Meets Transfer Learning: Application to Multi-site Prostate MRI Segmentation

    Authors: Chenyu You, Jinlin Xiang, Kun Su, Xiaoran Zhang, Siyuan Dong, John Onofrey, Lawrence Staib, James S. Duncan

    Abstract: Many medical datasets have recently been created for medical image segmentation tasks, and it is natural to question whether we can use them to sequentially train a single model that (1) performs better on all these datasets, and (2) generalizes well and transfers better to the unknown target site domain. Prior works have achieved this goal by jointly training one model on multi-site datasets, whi…

    Submitted 30 July, 2022; v1 submitted 2 June, 2022; originally announced June 2022.

  5. arXiv:2012.03478  [pdf, other]

    cs.SD cs.CV eess.AS

    Multi-Instrumentalist Net: Unsupervised Generation of Music from Body Movements

    Authors: Kun Su, Xiulong Liu, Eli Shlizerman

    Abstract: We propose a novel system that takes as an input body movements of a musician playing a musical instrument and generates music in an unsupervised setting. Learning to generate multi-instrumental music from videos without labeling the instruments is a challenging problem. To achieve the transformation, we built a pipeline named 'Multi-instrumentalistNet' (MI Net). At its base, the pipeline learns a…

    Submitted 7 December, 2020; originally announced December 2020.

    Comments: Please see associated video at https://www.youtube.com/watch?v=yo5OZKBbBh4

  6. arXiv:2006.14348  [pdf, other]

    cs.CV cs.LG cs.MM cs.SD eess.AS eess.IV

    Audeo: Audio Generation for a Silent Performance Video

    Authors: Kun Su, Xiulong Liu, Eli Shlizerman

    Abstract: We present a novel system that gets as an input video frames of a musician playing the piano and generates the music for that video. Generation of music from visual cues is a challenging problem and it is not clear whether it is an attainable goal at all. Our main aim in this work is to explore the plausibility of such a transformation and to identify cues and components able to carry the associat…

    Submitted 22 June, 2020; originally announced June 2020.

    Comments: Please see associated video at https://www.youtube.com/watch?v=8rS3VgjG7_c

    Journal ref: Advances in neural information processing 2020

  7. arXiv:1911.12409  [pdf, other]

    cs.CV cs.LG eess.IV

    PREDICT & CLUSTER: Unsupervised Skeleton Based Action Recognition

    Authors: Kun Su, Xiulong Liu, Eli Shlizerman

    Abstract: We propose a novel system for unsupervised skeleton-based action recognition. Given inputs of body keypoints sequences obtained during various movements, our system associates the sequences with actions. Our system is based on an encoder-decoder recurrent neural network, where the encoder learns a separable feature representation within its hidden states formed by training the model to perform pre…

    Submitted 27 November, 2019; originally announced November 2019.

    Comments: See video at: https://www.youtube.com/watch?v=-dcCFUBRmwE

  8. arXiv:1905.12176  [pdf, other]

    cs.LG eess.SP q-bio.NC stat.ML

    Clustering and Recognition of Spatiotemporal Features through Interpretable Embedding of Sequence to Sequence Recurrent Neural Networks

    Authors: Kun Su, Eli Shlizerman

    Abstract: Encoder-decoder recurrent neural network models (RNN Seq2Seq) have achieved great success in ubiquitous areas of computation and applications. It was shown to be successful in modeling data with both temporal and spatial dependencies for translation or prediction tasks. In this study, we propose an embedding approach to visualize and interpret the representation of data by these models. Furthermor…

    Submitted 31 January, 2020; v1 submitted 28 May, 2019; originally announced May 2019.