
Showing 1–18 of 18 results for author: Chang, J R

  1. arXiv:2405.13226 [pdf, other]

    cs.CL cs.LG

    Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum

    Authors: Hadi Pouransari, Chun-Liang Li, Jen-Hao Rick Chang, Pavan Kumar Anasosalu Vasu, Cem Koc, Vaishaal Shankar, Oncel Tuzel

    Abstract: Large language models (LLMs) are commonly trained on datasets consisting of fixed-length token sequences. These datasets are created by randomly concatenating documents of various lengths and then chunking them into sequences of a predetermined target length. However, this method of concatenation can lead to cross-document attention within a sequence, which is neither a desirable learning signal n…

    Submitted 21 May, 2024; originally announced May 2024.

  2. arXiv:2311.18168 [pdf, other]

    cs.CV cs.LG eess.AS

    Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications

    Authors: Karren D. Yang, Anurag Ranjan, Jen-Hao Rick Chang, Raviteja Vemulapalli, Oncel Tuzel

    Abstract: We consider the task of animating 3D facial geometry from a speech signal. Existing works are primarily deterministic, focusing on learning a one-to-one mapping from speech signals to 3D face meshes on small datasets with limited speakers. While these models can achieve high-quality lip articulation for speakers in the training set, they are unable to capture the full and diverse distribution of 3D f…

    Submitted 29 November, 2023; originally announced November 2023.

  3. arXiv:2311.17910 [pdf, other]

    cs.CV cs.GR

    HUGS: Human Gaussian Splats

    Authors: Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel, Oncel Tuzel, Anurag Ranjan

    Abstract: Recent advances in neural rendering have improved both training and rendering times by orders of magnitude. While these methods demonstrate state-of-the-art quality and speed, they are designed for photogrammetry of static scenes and do not generalize well to freely moving humans in the environment. In this work, we introduce Human Gaussian Splats (HUGS) that represents an animatable human togethe…

    Submitted 29 November, 2023; originally announced November 2023.

  4. arXiv:2310.15130 [pdf, other]

    cs.SD cs.CV eess.AS

    Novel-View Acoustic Synthesis from 3D Reconstructed Rooms

    Authors: Byeongjoo Ahn, Karren Yang, Brian Hamilton, Jonathan Sheaffer, Anurag Ranjan, Miguel Sarabia, Oncel Tuzel, Jen-Hao Rick Chang

    Abstract: We investigate the benefit of combining blind audio recordings with 3D scene information for novel-view acoustic synthesis. Given audio recordings from 2-4 microphones and the 3D geometry and material of a scene containing multiple unknown sound sources, we estimate the sound anywhere in the scene. We identify the main challenges of novel-view acoustic synthesis as sound source localization, separ…

    Submitted 23 October, 2023; originally announced October 2023.

  5. arXiv:2310.03015 [pdf, other]

    cs.CV

    Efficient-3DiM: Learning a Generalizable Single-image Novel-view Synthesizer in One Day

    Authors: Yifan Jiang, Hao Tang, Jen-Hao Rick Chang, Liangchen Song, Zhangyang Wang, Liangliang Cao

    Abstract: The task of novel view synthesis aims to generate unseen perspectives of an object or scene from a limited set of input images. Nevertheless, synthesizing novel views from a single image still remains a significant challenge in the realm of computer vision. Previous approaches tackle this problem by adopting mesh prediction, multi-plane image construction, or more advanced techniques such as neura…

    Submitted 4 October, 2023; originally announced October 2023.

  6. arXiv:2309.10707 [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Corpus Synthesis for Zero-shot ASR domain Adaptation using Large Language Models

    Authors: Hsuan Su, Ting-Yao Hu, Hema Swetha Koppula, Raviteja Vemulapalli, Jen-Hao Rick Chang, Karren Yang, Gautam Varma Mantena, Oncel Tuzel

    Abstract: While Automatic Speech Recognition (ASR) systems are widely used in many real-world applications, they often do not generalize well to new domains and need to be finetuned on data from these domains. However, target-domain data usually are not readily available in many scenarios. In this paper, we propose a new strategy for adapting ASR models to new target domains without any text or speech from…

    Submitted 18 September, 2023; originally announced September 2023.

  7. arXiv:2304.12390 [pdf, other]

    cs.CV cs.GR

    Pointersect: Neural Rendering with Cloud-Ray Intersection

    Authors: Jen-Hao Rick Chang, Wei-Yu Chen, Anurag Ranjan, Kwang Moo Yi, Oncel Tuzel

    Abstract: We propose a novel method that renders point clouds as if they are surfaces. The proposed method is differentiable and requires no scene-specific optimization. This unique capability enables, out-of-the-box, surface normal estimation, rendering room-scale point clouds, inverse rendering, and ray tracing with global illumination. Unlike existing work that focuses on converting point clouds to other…

    Submitted 24 April, 2023; originally announced April 2023.

    Comments: CVPR 2023

  8. arXiv:2303.15437 [pdf, other]

    cs.CV

    FaceLit: Neural 3D Relightable Faces

    Authors: Anurag Ranjan, Kwang Moo Yi, Jen-Hao Rick Chang, Oncel Tuzel

    Abstract: We propose a generative framework, FaceLit, capable of generating a 3D face that can be rendered at various user-defined lighting conditions and views, learned purely from 2D images in-the-wild without any manual annotation. Unlike existing works that require careful capture setup or human labor, we rely on off-the-shelf pose and illumination estimators. With these estimates, we incorporate the Ph…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  9. arXiv:2303.14885 [pdf, other]

    eess.AS cs.LG cs.SD

    Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis

    Authors: Karren Yang, Ting-Yao Hu, Jen-Hao Rick Chang, Hema Swetha Koppula, Oncel Tuzel

    Abstract: Adapting generic speech recognition models to specific individuals is a challenging problem due to the scarcity of personalized data. Recent works have proposed boosting the amount of training data using personalized text-to-speech synthesis. Here, we ask two fundamental questions about this strategy: when is synthetic data effective for personalization, and why is it effective in those cases? To…

    Submitted 26 March, 2023; originally announced March 2023.

    Comments: ICASSP 2023

  10. arXiv:2112.14159 [pdf, other]

    cs.CV cs.AI cs.LG

    Skin feature point tracking using deep feature encodings

    Authors: Jose Ramon Chang, Torbjörn E. M. Nordling

    Abstract: Facial feature tracking is a key component of imaging ballistocardiography (BCG) where accurate quantification of the displacement of facial keypoints is needed for good heart rate estimation. Skin feature tracking enables video-based quantification of motor degradation in Parkinson's disease. Traditional computer vision algorithms include Scale Invariant Feature Transform (SIFT), Speeded-Up Robus…

    Submitted 4 December, 2022; v1 submitted 28 December, 2021; originally announced December 2021.

  11. arXiv:2110.11479 [pdf, other]

    eess.AS cs.LG cs.SD

    Synt++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition

    Authors: Ting-Yao Hu, Mohammadreza Armandpour, Ashish Shrivastava, Jen-Hao Rick Chang, Hema Koppula, Oncel Tuzel

    Abstract: With recent advances in speech synthesis, synthetic data is becoming a viable alternative to real data for training speech recognition models. However, machine learning with synthetic data is not trivial due to the gap between the synthetic and the real data distributions. Synthetic datasets may contain artifacts that do not exist in real data such as structured noise, content errors, or unrealist…

    Submitted 21 October, 2021; originally announced October 2021.

  12. arXiv:2110.07040 [pdf, other]

    cs.CV cs.LG

    Data Incubation -- Synthesizing Missing Data for Handwriting Recognition

    Authors: Jen-Hao Rick Chang, Martin Bresler, Youssouf Chherawala, Adrien Delaye, Thomas Deselaers, Ryan Dixon, Oncel Tuzel

    Abstract: In this paper, we demonstrate how a generative model can be used to build a better recognizer through the control of content and style. We are building an online handwriting recognizer from a modest amount of training samples. By training our controllable handwriting synthesizer on the same data, we can synthesize handwriting with previously underrepresented content (e.g., URLs and email addresses…

    Submitted 13 October, 2021; originally announced October 2021.

  13. arXiv:2110.03860 [pdf, other]

    cs.CV cs.LG

    Token Pooling in Vision Transformers

    Authors: Dmitrii Marin, Jen-Hao Rick Chang, Anurag Ranjan, Anish Prabhu, Mohammad Rastegari, Oncel Tuzel

    Abstract: Despite the recent success in many applications, the high computational requirements of vision transformers limit their use in resource-constrained settings. While many existing methods improve the quadratic complexity of attention, in most vision transformers, self-attention is not the major computation bottleneck, e.g., more than 80% of the computation is spent on fully-connected layers. To impr…

    Submitted 11 October, 2021; v1 submitted 7 October, 2021; originally announced October 2021.

    Journal ref: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2023

  14. arXiv:2110.02891 [pdf, other]

    cs.LG cs.SD eess.AS

    Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models

    Authors: Jen-Hao Rick Chang, Ashish Shrivastava, Hema Swetha Koppula, Xiaoshuai Zhang, Oncel Tuzel

    Abstract: Controllable generative sequence models with the capability to extract and replicate the style of specific examples enable many applications, including narrating audiobooks in different voices, auto-completing and auto-correcting written handwriting, and generating missing training samples for downstream recognition tasks. However, under an unsupervised-style setting, typical training algorithms f…

    Submitted 30 June, 2022; v1 submitted 6 October, 2021; originally announced October 2021.

    Comments: ICML 2022

  15. arXiv:2011.01156 [pdf, other]

    cs.LG stat.ML

    SapAugment: Learning A Sample Adaptive Policy for Data Augmentation

    Authors: Ting-Yao Hu, Ashish Shrivastava, Jen-Hao Rick Chang, Hema Koppula, Stefan Braun, Kyuyeon Hwang, Ozlem Kalinli, Oncel Tuzel

    Abstract: Data augmentation methods usually apply the same augmentation (or a mix of them) to all the training samples. For example, to perturb data with noise, the noise is sampled from a Normal distribution with a fixed standard deviation, for all samples. We hypothesize that a hard sample with high training loss already provides strong training signal to update the model parameters and should be perturbe…

    Submitted 15 February, 2021; v1 submitted 2 November, 2020; originally announced November 2020.

    Comments: Accepted at ICASSP 2021

  16. arXiv:2005.00946 [pdf, other]

    eess.IV cs.CV physics.optics

    Towards Occlusion-Aware Multifocal Displays

    Authors: Jen-Hao Rick Chang, Anat Levin, B. V. K. Vijaya Kumar, Aswin C. Sankaranarayanan

    Abstract: The human visual system uses numerous cues for depth perception, including disparity, accommodation, motion parallax and occlusion. It is incumbent upon virtual-reality displays to satisfy these cues to provide an immersive user experience. Multifocal displays, one of the classic approaches to satisfy the accommodation cue, place virtual content at multiple focal planes, each at a different depth.…

    Submitted 2 May, 2020; originally announced May 2020.

    Comments: SIGGRAPH 2020

  17. Towards Multifocal Displays with Dense Focal Stacks

    Authors: Jen-Hao Rick Chang, B. V. K. Vijaya Kumar, Aswin C. Sankaranarayanan

    Abstract: We present a virtual reality display that is capable of generating a dense collection of depth/focal planes. This is achieved by driving a focus-tunable lens to sweep a range of focal lengths at a high frequency and, subsequently, tracking the focal length precisely at microsecond time resolutions using an optical module. Precise tracking of the focal length, coupled with a high-speed display, ena…

    Submitted 22 September, 2018; v1 submitted 27 May, 2018; originally announced May 2018.

  18. arXiv:1503.02457 [pdf]

    cond-mat.supr-con cond-mat.str-el

    Gate-tuned Superconductor-Insulator transition in (Li,Fe)OHFeSe

    Authors: B. Lei, Z. J. Xiang, X. F. Lu, N. Z. Wang, J. R. Chang, C. Shang, X. G. Luo, T. Wu, Z. Sun, X. H. Chen

    Abstract: The antiferromagnetic (AFM) insulator-superconductor transition has always been a center of interest in the underlying physics of unconventional superconductors. The quantum phase transition between Mott insulator with AFM and superconductor can be induced by doping charge carriers in high-Tc cuprate superconductors. For the best characterized organic superconductors of κ-(BEDT-TTF)2X (X=anion), a…

    Submitted 9 March, 2015; originally announced March 2015.

    Comments: 20 pages, 4 figures, supplementary information is not uploaded

    Journal ref: Phys. Rev. B 93, 060501 (2016)