Showing 1–50 of 63 results for author: Cong, J

  1. arXiv:2406.09606

    cs.LG cs.AI cs.AR

    Cross-Modality Program Representation Learning for Electronic Design Automation with High-Level Synthesis

    Authors: Zongyue Qin, Yunsheng Bai, Atefeh Sohrabizadeh, Zijian Ding, Ziniu Hu, Yizhou Sun, Jason Cong

    Abstract: In recent years, domain-specific accelerators (DSAs) have gained popularity for applications such as deep learning and autonomous driving. To facilitate DSA designs, programmers use high-level synthesis (HLS) to compile a high-level description written in C/C++ into a design with low-level hardware description languages that eventually synthesize DSAs on circuits. However, creating a high-quality…

    Submitted 27 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: 14 pages, 8 figures. arXiv admin note: text overlap with arXiv:2305.10838

  2. arXiv:2406.02430

    eess.AS cs.SD

    Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

    Authors: Philip Anastassiou, Jiawei Chen, Jitong Chen, Yuanzhe Chen, Zhuo Chen, Ziyi Chen, Jian Cong, Lelai Deng, Chuang Ding, Lu Gao, Mingqing Gong, Peisong Huang, Qingqing Huang, Zhiying Huang, Yuanyuan Huo, Dongya Jia, Chumin Li, Feiya Li, Hui Li, Jiaxin Li, Xiaoyang Li, Xingxing Li, Lin Liu, Shouda Liu, Sichao Liu, et al. (21 additional authors not shown)

    Abstract: We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and sub…

    Submitted 4 June, 2024; originally announced June 2024.

  3. arXiv:2405.18371

    quant-ph cs.AR cs.DC

    ML-QLS: Multilevel Quantum Layout Synthesis

    Authors: Wan-Hsuan Lin, Jason Cong

    Abstract: Quantum Layout Synthesis (QLS) plays a crucial role in optimizing quantum circuit execution on physical quantum devices. As we enter the era where quantum computers have hundreds of qubits, we are faced with scalability issues using optimal approaches and degrading heuristic methods' performance due to the lack of global optimization. To this end, we introduce a hybrid design that obtains the much…

    Submitted 31 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  4. arXiv:2405.15095

    cs.ET quant-ph

    Compilation for Dynamically Field-Programmable Qubit Arrays with Efficient and Provably Near-Optimal Scheduling

    Authors: Daniel Bochen Tan, Wan-Hsuan Lin, Jason Cong

    Abstract: Dynamically field-programmable qubit arrays based on neutral atoms have high fidelity and highly parallel gates for quantum computing. However, it is challenging for compilers to fully leverage the novel flexibility offered by such hardware while respecting its various constraints. In this study, we break down the compilation for this architecture into three tasks: scheduling, placement, and routi…

    Submitted 23 May, 2024; originally announced May 2024.

  5. arXiv:2405.12304

    cs.AR

    Automatic Hardware Pragma Insertion in High-Level Synthesis: A Non-Linear Programming Approach

    Authors: Stéphane Pouget, Louis-Noël Pouchet, Jason Cong

    Abstract: High-Level Synthesis enables the rapid prototyping of hardware accelerators, by combining a high-level description of the functional behavior of a kernel with a set of micro-architecture optimizations as inputs. Such optimizations can be described by inserting pragmas e.g. pipelining and replication of units, or even higher level transformations for HLS such as automatic data caching using the AMD…

    Submitted 30 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  6. arXiv:2405.06067

    cs.CL cs.LG

    HMT: Hierarchical Memory Transformer for Long Context Language Processing

    Authors: Zifan He, Zongyue Qin, Neha Prakriya, Yizhou Sun, Jason Cong

    Abstract: Transformer-based large language models (LLM) have been widely used in language processing applications. However, most of them restrict the context window that permits the model to attend to every token in the inputs. Previous works in recurrent models can memorize past tokens to enable unlimited context and maintain effectiveness. However, they have "flat" memory architectures, which have limitat…

    Submitted 14 May, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

  7. arXiv:2405.03058

    cs.SE cs.PL

    Enhancing High-Level Synthesis with Automated Pragma Insertion and Code Transformation Framework

    Authors: Stéphane Pouget, Louis-Noël Pouchet, Jason Cong

    Abstract: High-level synthesis, source-to-source compilers, and various Design Space Exploration techniques for pragma insertion have significantly improved the Quality of Results of generated designs. These tools offer benefits such as reduced development time and enhanced performance. However, achieving high-quality results often requires additional manual code transformations and tiling selections, which…

    Submitted 21 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

  8. arXiv:2403.07262

    cs.LG cs.AI

    A2PO: Towards Effective Offline Reinforcement Learning from an Advantage-aware Perspective

    Authors: Yunpeng Qing, Shunyu Liu, Jingyuan Cong, Kaixuan Chen, Yihe Zhou, Mingli Song

    Abstract: Offline reinforcement learning endeavors to leverage offline datasets to craft effective agent policy without online interaction, which imposes proper conservative constraints with the support of behavior policies to tackle the out-of-distribution problem. However, existing works often suffer from the constraint conflict issue when offline datasets are collected from multiple behavior policies, i.…

    Submitted 30 May, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  9. arXiv:2401.13807

    cs.ET quant-ph

    Depth-Optimal Addressing of 2D Qubit Array with 1D Controls Based on Exact Binary Matrix Factorization

    Authors: Daniel Bochen Tan, Shuohao Ping, Jason Cong

    Abstract: Reducing control complexity is essential for achieving large-scale quantum computing. However, reducing control knobs may compromise the ability to independently address each qubit. Recent progress in neutral atom-based platforms suggests that rectangular (row-column) addressing may strike a balance between control granularity and flexibility for 2D qubit arrays. This scheme allows addressing qubi…

    Submitted 22 March, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

  10. arXiv:2401.01009

    cs.IT quant-ph

    Quantum State Preparation Using an Exact CNOT Synthesis Formulation

    Authors: Hanyu Wang, Bochen Tan, Jason Cong, Giovanni De Micheli

    Abstract: Minimizing the use of CNOT gates in quantum state preparation is a crucial step in quantum compilation, as they introduce coupling constraints and more noise than single-qubit gates. Reducing the number of CNOT gates can lead to more efficient and accurate quantum computations. However, the lack of compatibility to model superposition and entanglement challenges the scalability and optimality of C…

    Submitted 1 January, 2024; originally announced January 2024.

    Comments: 6 pages, 7 figures

  11. arXiv:2311.16190

    quant-ph cs.AR cs.ET

    Q-Pilot: Field Programmable Qubit Array Compilation with Flying Ancillas

    Authors: Hanrui Wang, Daniel Bochen Tan, Pengyu Liu, Yilian Liu, Jiaqi Gu, Jason Cong, Song Han

    Abstract: Neutral atom arrays have become a promising platform for quantum computing, especially the field programmable qubit array (FPQA) endowed with the unique capability of atom movement. This feature allows dynamic alterations in qubit connectivity during runtime, which can reduce the cost of executing long-range gates and improve parallelism. However, this added flexibility introduces new challenges i…

    Submitted 6 May, 2024; v1 submitted 25 November, 2023; originally announced November 2023.

    Comments: 10 pages, 16 figures; Published as a conference paper at DAC 2024

  12. arXiv:2311.15123

    quant-ph cs.AR cs.DC

    Atomique: A Quantum Compiler for Reconfigurable Neutral Atom Arrays

    Authors: Hanrui Wang, Pengyu Liu, Daniel Bochen Tan, Yilian Liu, Jiaqi Gu, David Z. Pan, Jason Cong, Umut A. Acar, Song Han

    Abstract: The neutral atom array has gained prominence in quantum computing for its scalability and operation fidelity. Previous works focus on fixed atom arrays (FAAs) that require extensive SWAP operations for long-range interactions. This work explores a novel architecture reconfigurable atom arrays (RAAs), also known as field programmable qubit arrays (FPQAs), which allows for coherent atom movements du…

    Submitted 2 May, 2024; v1 submitted 25 November, 2023; originally announced November 2023.

    Comments: 17 pages, 26 figures; Published as a conference paper at ISCA 2024

  13. arXiv:2311.10189

    cs.DC cs.AR

    TAPA-CS: Enabling Scalable Accelerator Design on Distributed HBM-FPGAs

    Authors: Neha Prakriya, Yuze Chi, Suhail Basalama, Linghao Song, Jason Cong

    Abstract: Despite the increasing adoption of Field-Programmable Gate Arrays (FPGAs) in compute clouds, there remains a significant gap in programming tools and abstractions which can leverage network-connected, cloud-scale, multi-die FPGAs to generate accelerators with high frequency and throughput. To this end, we propose TAPA-CS, a task-parallel dataflow programming framework which automatically partition…

    Submitted 1 February, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

  14. arXiv:2310.04004

    cs.SD eess.AS

    U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning

    Authors: Tao Li, Zhichao Wang, Xinfa Zhu, Jian Cong, Qiao Tian, Yuping Wang, Lei Xie

    Abstract: Zero-shot speaker cloning aims to synthesize speech for any target speaker unseen during TTS system building, given only a single speech reference of the speaker at hand. Although more practical in real applications, the current zero-shot methods still produce speech with undesirable naturalness and speaker similarity. Moreover, endowing the target speaker with arbitrary speaking styles in the zer…

    Submitted 6 October, 2023; originally announced October 2023.

  15. arXiv:2310.01342

    cs.IT eess.SP

    Near-field Integrated Sensing and Communication: Opportunities and Challenges

    Authors: Jiayi Cong, Changsheng You, Jiapeng Li, Li Chen, Beixiong Zheng, Yuanwei Liu, Wen Wu, Yi Gong, Shi Jin, Rui Zhang

    Abstract: With the extremely large-scale array (XL-array) deployed in future wireless systems, wireless communication and sensing are expected to operate in the radiative near-field region, which needs to be characterized by the spherical rather than planar wavefronts. Unlike most existing works that considered far-field integrated sensing and communication (ISAC), we study in this article the new near-field…

    Submitted 17 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: This work is submitted to IEEE for possible publication

  16. arXiv:2309.00883

    cs.SD eess.AS

    DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin

    Authors: Tao Li, Chenxu Hu, Jian Cong, Xinfa Zhu, Jingbei Li, Qiao Tian, Yuping Wang, Lei Xie

    Abstract: While the performance of cross-lingual TTS based on monolingual corpora has been significantly improved recently, generating cross-lingual speech still suffers from the foreign accent problem, leading to limited naturalness. Besides, current cross-lingual methods ignore modeling emotion, which is indispensable paralinguistic information in speech delivery. In this paper, we propose DiCLET-TTS, a D…

    Submitted 2 September, 2023; originally announced September 2023.

    Comments: accepted by TASLP

  17. arXiv:2306.14052

    cs.LG cs.AR cs.DC

    A Survey on Graph Neural Network Acceleration: Algorithms, Systems, and Customized Hardware

    Authors: Shichang Zhang, Atefeh Sohrabizadeh, Cheng Wan, Zijie Huang, Ziniu Hu, Yewen Wang, Yingyan Lin, Jason Cong, Yizhou Sun

    Abstract: Graph neural networks (GNNs) are emerging for machine learning research on graph-structured data. GNNs achieve state-of-the-art performance on many tasks, but they face scalability challenges when it comes to real-world applications that have numerous data and strict latency requirements. Many studies have been conducted on how to accelerate GNNs in an effort to address these challenges. These acc…

    Submitted 24 June, 2023; originally announced June 2023.

  18. arXiv:2306.08232

    cs.LG cs.AI

    Curricular Subgoals for Inverse Reinforcement Learning

    Authors: Shunyu Liu, Yunpeng Qing, Shuqi Xu, Hongyan Wu, Jiangtao Zhang, Jingyuan Cong, Tianhao Chen, Yunfu Liu, Mingli Song

    Abstract: Inverse Reinforcement Learning (IRL) aims to reconstruct the reward function from expert demonstrations to facilitate policy learning, and has demonstrated its remarkable success in imitation learning. To promote expert-like behavior, existing IRL methods mainly focus on learning global reward functions to minimize the trajectory difference between the imitator and the expert. However, these globa…

    Submitted 14 June, 2023; originally announced June 2023.

  19. Compiling Quantum Circuits for Dynamically Field-Programmable Neutral Atoms Array Processors

    Authors: Daniel Bochen Tan, Dolev Bluvstein, Mikhail D. Lukin, Jason Cong

    Abstract: Dynamically field-programmable qubit arrays (DPQA) have recently emerged as a promising platform for quantum information processing. In DPQA, atomic qubits are selectively loaded into arrays of optical traps that can be reconfigured during the computation itself. Leveraging qubit transport and parallel, entangling quantum operations, different pairs of qubits, even those initially far away, can be…

    Submitted 1 July, 2024; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: Version accepted by Quantum. 21 pages, 9 figures, 7 tables. An extended abstract was presented at the 41st International Conference on Computer-Aided Design (ICCAD '22)

    Journal ref: Quantum 8, 1281 (2024)

  20. Locality and Utilization in Placement Suboptimality

    Authors: Jason Cong, Michalis Romesis, Joseph R. Shinnerl, Kenton Sze, Min Xie

    Abstract: The mixed-size placement benchmarks described in this book chapter directly address several of the shortcomings in previously published suboptimality benchmarks. Two new sets of placement examples are constructed, one targeting the role of nonlocal nets in suboptimality, and another targeting the role of white space and large variations in module sizes. The first set, PEKO-MC, is a set of standard…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: Version of Record: https://link.springer.com/book/10.1007/978-0-387-68739-1 Pre-print of Chapter 2 from the following work: Gi-Joon Nam and Jason Cong, Modern Circuit Placement Best Practices and Results, 2007, Springer, reproduced with permission of Springer Science+Business Media, LLC. The final authenticated version is available online at: http://dx.doi.org/10.1007/978-0-387-68739-1

    ACM Class: B.7.2; B.7.3; B.8.2; G.4; J.6

    Journal ref: Chapter 2 from Gi-Joon Nam and Jason Cong, Modern Circuit Placement Best Practices and Results, 2007, Springer, reproduced with permission of Springer Science+Business Media, LLC

  21. arXiv:2305.10838   

    cs.LG cs.PL

    ProgSG: Cross-Modality Representation Learning for Programs in Electronic Design Automation

    Authors: Yunsheng Bai, Atefeh Sohrabizadeh, Zongyue Qin, Ziniu Hu, Yizhou Sun, Jason Cong

    Abstract: Recent years have witnessed the growing popularity of domain-specific accelerators (DSAs), such as Google's TPUs, for accelerating various applications such as deep learning, search, autonomous driving, etc. To facilitate DSA designs, high-level synthesis (HLS) is used, which allows a developer to compile a high-level description in the form of software code in C and C++ into a design in low-level…

    Submitted 2 June, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: Requires further polishing

  22. arXiv:2302.11082

    cs.CV

    BB-GCN: A Bi-modal Bridged Graph Convolutional Network for Multi-label Chest X-Ray Recognition

    Authors: Guoli Wang, Pingping Wang, Jinyu Cong, Kunmeng Liu, Benzheng Wei

    Abstract: Multi-label chest X-ray (CXR) recognition involves simultaneously diagnosing and identifying multiple labels for different pathologies. Since pathological labels have rich information about their relationship to each other, modeling the co-occurrence dependencies between pathological labels is essential to improve recognition performance. However, previous methods rely on state variable coding and…

    Submitted 21 February, 2023; originally announced February 2023.

    Comments: under Computers in Biology and Medicine submission

  23. arXiv:2301.02359

    cs.AR

    CHARM: Composing Heterogeneous Accelerators for Matrix Multiply on Versal ACAP Architecture

    Authors: Jinming Zhuang, Jason Lau, Hanchen Ye, Zhuoping Yang, Yubo Du, Jack Lo, Kristof Denolf, Stephen Neuendorffer, Alex Jones, Jingtong Hu, Deming Chen, Jason Cong, Peipei Zhou

    Abstract: Dense matrix multiply (MM) serves as one of the most heavily used kernels in deep learning applications. To cope with the high computation demands of these applications, heterogeneous architectures featuring both FPGA and dedicated ASIC accelerators have emerged as promising platforms. For example, the AMD/Xilinx Versal ACAP architecture combines general-purpose CPU cores and programmable logic wi…

    Submitted 5 January, 2023; originally announced January 2023.

  24. arXiv:2211.01087

    cs.SD eess.AS

    DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by time-frequency domain supervision from DSP

    Authors: Kun Song, Yongmao Zhang, Yi Lei, Jian Cong, Hanzhao Li, Lei Xie, Gang He, Jinfeng Bai

    Abstract: Recent development of neural vocoders based on the generative adversarial neural network (GAN) has shown obvious advantages of generating raw waveform conditioned on mel-spectrogram with fast inference speed and lightweight networks. Whereas, it is still challenging to train a universal neural vocoder that can synthesize high-fidelity speech from various scenarios with unseen speakers, languages,…

    Submitted 28 May, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: Accepted to ICASSP 2023

  25. arXiv:2210.17349

    cs.SD eess.AS

    Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS

    Authors: Kun Song, Jian Cong, Xinsheng Wang, Yongmao Zhang, Lei Xie, Ning Jiang, Haiying Wu

    Abstract: In current two-stage neural text-to-speech (TTS) paradigm, it is ideal to have a universal neural vocoder, once trained, which is robust to imperfect mel-spectrogram predicted from the acoustic model. To this end, we propose Robust MelGAN vocoder by solving the original multi-band MelGAN's metallic sound problem and increasing its generalization ability. Specifically, we introduce a fine-grained n…

    Submitted 2 November, 2022; v1 submitted 31 October, 2022; originally announced October 2022.

    Comments: Accepted by ISCSLP 2022

  26. Callipepla: Stream Centric Instruction Set and Mixed Precision for Accelerating Conjugate Gradient Solver

    Authors: Linghao Song, Licheng Guo, Suhail Basalama, Yuze Chi, Robert F. Lucas, Jason Cong

    Abstract: The continued growth in the processing power of FPGAs coupled with high bandwidth memories (HBM), makes systems like the Xilinx U280 credible platforms for linear solvers which often dominate the run time of scientific and engineering applications. In this paper, we present Callipepla, an accelerator for a preconditioned conjugate gradient linear solver (CG). FPGA acceleration of CG faces three ch…

    Submitted 29 December, 2022; v1 submitted 28 September, 2022; originally announced September 2022.

    Comments: To appear in FPGA 2023

  27. arXiv:2209.02951

    cs.AR cs.PL

    Democratizing Domain-Specific Computing

    Authors: Yuze Chi, Weikang Qiao, Atefeh Sohrabizadeh, Jie Wang, Jason Cong

    Abstract: In the past few years, domain-specific accelerators (DSAs), such as Google's Tensor Processing Units, have shown to offer significant performance and energy efficiency over general-purpose CPUs. An important question is whether typical software developers can design and implement their own customized DSAs, with affordability and efficiency, to accelerate their applications. This article presents o…

    Submitted 7 September, 2022; originally announced September 2022.

    Comments: To be published in CACM'22

  28. arXiv:2209.02663

    cs.AR cs.DC cs.PF cs.PL

    TAPA: A Scalable Task-Parallel Dataflow Programming Framework for Modern FPGAs with Co-Optimization of HLS and Physical Design

    Authors: Licheng Guo, Yuze Chi, Jason Lau, Linghao Song, Xingyu Tian, Moazin Khatti, Weikang Qiao, Jie Wang, Ecenur Ustun, Zhenman Fang, Zhiru Zhang, Jason Cong

    Abstract: In this paper, we propose TAPA, an end-to-end framework that compiles a C++ task-parallel dataflow program into a high-frequency FPGA accelerator. Compared to existing solutions, TAPA has two major advantages. First, TAPA provides a set of convenient APIs that allow users to easily express flexible and complex inter-task communication structures. Second, TAPA adopts a coarse-grained floorplanning…

    Submitted 6 September, 2022; originally announced September 2022.

  29. arXiv:2207.14482

    cs.AR quant-ph

    Domain-Specific Quantum Architecture Optimization

    Authors: Wan-Hsuan Lin, Bochen Tan, Murphy Yuezhen Niu, Jason Kimko, Jason Cong

    Abstract: With the steady progress in quantum computing over recent years, roadmaps for upscaling quantum processors have relied heavily on the targeted qubit architectures. So far, similarly to the early age of classical computing, these designs have been crafted by human experts. These general-purpose architectures, however, leave room for customization and optimization, especially when targeting popular…

    Submitted 29 July, 2022; originally announced July 2022.

  30. arXiv:2207.01832

    cs.SD eess.AS

    Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion

    Authors: Yi Lei, Shan Yang, Jian Cong, Lei Xie, Dan Su

    Abstract: The zero-shot scenario for speech generation aims at synthesizing a novel unseen voice with only one utterance of the target speaker. Although the challenges of adapting new voices in zero-shot scenario exist in both stages -- acoustic modeling and vocoder, previous works usually consider the problem from only one stage. In this paper, we extend our previous Glow-WaveGAN to Glow-WaveGAN 2, aiming…

    Submitted 5 July, 2022; originally announced July 2022.

  31. arXiv:2206.00208

    cs.SD eess.AS

    AdaVITS: Tiny VITS for Low Computing Resource Speaker Adaptation

    Authors: Kun Song, Heyang Xue, Xinsheng Wang, Jian Cong, Yongmao Zhang, Lei Xie, Bing Yang, Xiong Zhang, Dan Su

    Abstract: Speaker adaptation in text-to-speech synthesis (TTS) is to finetune a pre-trained TTS model to adapt to new target speakers with limited data. While much effort has been conducted towards this task, seldom work has been performed for low computational resource scenarios due to the challenges raised by the requirement of the lightweight model and less computational complexity. In this paper, a tiny…

    Submitted 2 November, 2022; v1 submitted 31 May, 2022; originally announced June 2022.

    Comments: Accepted by ISCSLP 2022

  32. arXiv:2205.07991

    cs.AR cs.DC

    TopSort: A High-Performance Two-Phase Sorting Accelerator Optimized on HBM-based FPGAs

    Authors: Weikang Qiao, Licheng Guo, Zhenman Fang, Mau-Chung Frank Chang, Jason Cong

    Abstract: The emergence of high-bandwidth memory (HBM) brings new opportunities to boost the performance of sorting acceleration on FPGAs, which was conventionally bounded by the available off-chip memory bandwidth. However, it is nontrivial for designers to fully utilize this immense bandwidth. First, the existing sorter designs cannot be directly scaled at the increasing rate of available off-chip bandwid…

    Submitted 16 May, 2022; originally announced May 2022.

  33. arXiv:2205.04421

    eess.AS cs.AI cs.CL cs.LG cs.SD

    NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality

    Authors: Xu Tan, Jiawei Chen, Haohe Liu, Jian Cong, Chen Zhang, Yanqing Liu, Xi Wang, Yichong Leng, Yuanhao Yi, Lei He, Frank Soong, Tao Qin, Sheng Zhao, Tie-Yan Liu

    Abstract: Text to speech (TTS) has made rapid progress in both academia and industry in recent years. Some questions naturally arise that whether a TTS system can achieve human-level quality, how to define/judge that quality and how to achieve it. In this paper, we answer these questions by first defining the human-level quality based on the statistical significance of subjective measure and introducing app…

    Submitted 10 May, 2022; v1 submitted 9 May, 2022; originally announced May 2022.

    Comments: 19 pages, 3 figures, 8 tables

  34. arXiv:2111.14252

    cs.AR

    Search for Optimal Systolic Arrays: A Comprehensive Automated Exploration Framework and Lessons Learned

    Authors: Jie Wang, Jason Cong

    Abstract: Systolic arrays have been widely used for accelerating HPC and deep learning applications. There is a plethora of previous works on the performance tuning of systolic arrays, but usually based on a number of oversimplified assumptions (e.g., only considering divisors for loop tiling, pruning based on off-chip data communication) to reduce the design space. In this paper, we present a comprehensi…

    Submitted 28 November, 2021; originally announced November 2021.

  35. arXiv:2111.12555

    cs.AR cs.DC

    Serpens: A High Bandwidth Memory Based Accelerator for General-Purpose Sparse Matrix-Vector Multiplication

    Authors: Linghao Song, Yuze Chi, Licheng Guo, Jason Cong

    Abstract: Sparse matrix-vector multiplication (SpMV) multiplies a sparse matrix with a dense vector. SpMV plays a crucial role in many applications, from graph analytics to deep learning. The random memory accesses of the sparse matrix make accelerator design challenging. However, high bandwidth memory (HBM) based FPGAs are a good fit for designing accelerators for SpMV. In this paper, we present Serpens, a…

    Submitted 9 May, 2022; v1 submitted 24 November, 2021; originally announced November 2021.

    Comments: To appear in DAC'22

  36. arXiv:2111.08848

    cs.AR cs.LG

    Enabling Automated FPGA Accelerator Optimization Using Graph Neural Networks

    Authors: Atefeh Sohrabizadeh, Yunsheng Bai, Yizhou Sun, Jason Cong

    Abstract: High-level synthesis (HLS) has freed the computer architects from developing their designs in a very low-level language and needing to exactly specify how the data should be transferred in register-level. With the help of HLS, the hardware designers must describe only a high-level behavioral flow of the design. Despite this, it still can take weeks to develop a high-performance architecture mainly…

    Submitted 21 November, 2021; v1 submitted 16 November, 2021; originally announced November 2021.

    Comments: 12 pages

  37. arXiv:2111.05936

    cs.LG cs.AR cs.DC

    SPA-GCN: Efficient and Flexible GCN Accelerator with an Application for Graph Similarity Computation

    Authors: Atefeh Sohrabizadeh, Yuze Chi, Jason Cong

    Abstract: While there have been many studies on hardware acceleration for deep learning on images, there has been a rather limited focus on accelerating deep learning applications involving graphs. The unique characteristics of graphs, such as the irregular memory access and dynamic parallelism, impose several challenges when the algorithm is mapped to a CPU or GPU. To address these challenges while exploit…

    Submitted 10 November, 2021; originally announced November 2021.

    Comments: 12 pages

  38. arXiv:2110.08813

    eess.AS cs.SD

    VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis

    Authors: Yongmao Zhang, Jian Cong, Heyang Xue, Lei Xie, Pengcheng Zhu, Mengxiao Bi

    Abstract: In this paper, we propose VISinger, a complete end-to-end high-quality singing voice synthesis (SVS) system that directly generates audio waveform from lyrics and musical score. Our approach is inspired by VITS, which adopts VAE-based posterior encoder augmented with normalizing flow-based prior encoder and adversarial decoder to realize complete end-to-end speech generation. VISinger follows the…

    Submitted 24 February, 2022; v1 submitted 17 October, 2021; originally announced October 2021.

    Comments: 5 pages, ICASSP 2022

  39. arXiv:2110.04280

    cs.LG cs.AR cs.DC cs.PF

    Pyxis: An Open-Source Performance Dataset of Sparse Accelerators

    Authors: Linghao Song, Yuze Chi, Jason Cong

    Abstract: Specialized accelerators provide gains of performance and efficiency in specific domains of applications. Sparse data structures or/and representations exist in a wide range of applications. However, it is challenging to design accelerators for sparse applications because no architecture or performance-level analytic models are able to fully capture the spectrum of the sparse data. Accelerator res…

    Submitted 21 February, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

    Comments: To appear in ICASSP'22

  40. arXiv:2109.11081

    cs.AR cs.DC

    Sextans: A Streaming Accelerator for General-Purpose Sparse-Matrix Dense-Matrix Multiplication

    Authors: Linghao Song, Yuze Chi, Atefeh Sohrabizadeh, Young-kyu Choi, Jason Lau, Jason Cong

    Abstract: Sparse-Matrix Dense-Matrix multiplication (SpMM) is the key operator for a wide range of applications, including scientific computing, graph processing, and deep learning. Architecting accelerators for SpMM is faced with three challenges - (1) the random memory accessing and unbalanced load in processing because of random distribution of elements in sparse matrices, (2) inefficient data handling o…

    Submitted 12 January, 2022; v1 submitted 22 September, 2021; originally announced September 2021.

    Comments: To appear in FPGA'22

  41. Optimal Qubit Mapping with Simultaneous Gate Absorption

    Authors: Bochen Tan, Jason Cong

    Abstract: Before quantum error correction (QEC) is achieved, quantum computers focus on noisy intermediate-scale quantum (NISQ) applications. Compared to the well-known quantum algorithms requiring QEC, like Shor's or Grover's algorithm, NISQ applications have different structures and properties to exploit in compilation. A key step in compilation is mapping the qubits in the program to physical qubits on a…

    Submitted 14 September, 2021; originally announced September 2021.

    Comments: 8 pages, 8 figures, to appear in ICCAD'21

  42. Scaling Up Hardware Accelerator Verification using A-QED with Functional Decomposition

    Authors: Saranyu Chattopadhyay, Florian Lonsing, Luca Piccolboni, Deepraj Soni, Peng Wei, Xiaofan Zhang, Yuan Zhou, Luca Carloni, Deming Chen, Jason Cong, Ramesh Karri, Zhiru Zhang, Caroline Trippel, Clark Barrett, Subhasish Mitra

    Abstract: Hardware accelerators (HAs) are essential building blocks for fast and energy-efficient computing systems. Accelerator Quick Error Detection (A-QED) is a recent formal technique which uses Bounded Model Checking for pre-silicon verification of HAs. A-QED checks an HA for self-consistency, i.e., whether identical inputs within a sequence of operations always produce the same output. Under modest as… ▽ More

    Submitted 17 August, 2021; v1 submitted 13 August, 2021; originally announced August 2021.

    Comments: preprint of a paper to appear at FMCAD 2021, including appendix

  43. arXiv:2106.10831  [pdf, other]

    eess.AS cs.SD

    Glow-WaveGAN: Learning Speech Representations from GAN-based Variational Auto-Encoder For High Fidelity Flow-based Speech Synthesis

    Authors: Jian Cong, Shan Yang, Lei Xie, Dan Su

    Abstract: Current two-stage TTS framework typically integrates an acoustic model with a vocoder -- the acoustic model predicts a low resolution intermediate representation such as Mel-spectrum while the vocoder generates waveform from the intermediate representation. Although the intermediate representation is served as a bridge, there still exists critical mismatch between the acoustic model and the vocode… ▽ More

    Submitted 21 June, 2021; v1 submitted 20 June, 2021; originally announced June 2021.

    Comments: Accepted to INTERSPEECH 2021

  44. arXiv:2106.10828  [pdf, other]

    eess.AS cs.SD

    Controllable Context-aware Conversational Speech Synthesis

    Authors: Jian Cong, Shan Yang, Na Hu, Guangzhi Li, Lei Xie, Dan Su

    Abstract: In spoken conversations, spontaneous behaviors like filled pause and prolongations always happen. Conversational partner tends to align features of their speech with their interlocutor which is known as entrainment. To produce human-like conversations, we propose a unified controllable spontaneous conversational speech synthesis framework to model the above two phenomena. Specifically, we use expl… ▽ More

    Submitted 20 June, 2021; originally announced June 2021.

    Comments: Accepted to INTERSPEECH 2021

  45. arXiv:2105.01892  [pdf, other]

    cs.AR cs.PF

    TENET: A Framework for Modeling Tensor Dataflow Based on Relation-centric Notation

    Authors: Liqiang Lu, Naiqing Guan, Yuyue Wang, Liancheng Jia, Zizhang Luo, Jieming Yin, Jason Cong, Yun Liang

    Abstract: Accelerating tensor applications on spatial architectures provides high performance and energy-efficiency, but requires accurate performance models for evaluating various dataflow alternatives. Such modeling relies on the notation of tensor dataflow and the formulation of performance metrics. Recent proposed compute-centric and data-centric notations describe the dataflow using imperative directiv… ▽ More

    Submitted 5 May, 2021; originally announced May 2021.

  46. arXiv:2011.07446  [pdf, ps, other]

    cs.IT

    Joint Placement Optimization and RNC in UAV-based Wireless Multicast Networks

    Authors: Xianzhen Guo, Bin Li, Jiayi Cong, Ruonan Zhang

    Abstract: Random network coding (RNC) is an efficient coding scheme to improve the performance of the broadband networks, especially for multimedia applications which are popular in 5G network. However, it is a challenging work to transmit the real time media data because of the time limitation and wide band requirement. Moreover, the topology of the network changes due to users' movement, causing huge chan… ▽ More

    Submitted 9 December, 2020; v1 submitted 14 November, 2020; originally announced November 2020.

  47. arXiv:2010.06075  [pdf, other]

    cs.AR

    When HLS Meets FPGA HBM: Benchmarking and Bandwidth Optimization

    Authors: Young-kyu Choi, Yuze Chi, Jie Wang, Licheng Guo, Jason Cong

    Abstract: With the recent release of High Bandwidth Memory (HBM) based FPGA boards, developers can now exploit unprecedented external memory bandwidth. This allows more memory-bounded applications to benefit from FPGA acceleration. However, we found that it is not easy to fully utilize the available bandwidth when developing some applications with high-level synthesis (HLS) tools. This is due to the limitat… ▽ More

    Submitted 12 October, 2020; originally announced October 2020.

  48. arXiv:2009.14381  [pdf, other]

    cs.AR cs.PL

    AutoDSE: Enabling Software Programmers to Design Efficient FPGA Accelerators

    Authors: Atefeh Sohrabizadeh, Cody Hao Yu, Min Gao, Jason Cong

    Abstract: Adopting FPGA as an accelerator in datacenters is becoming mainstream for customized computing, but the fact that FPGAs are hard to program creates a steep learning curve for software programmers. Even with the help of high-level synthesis (HLS), accelerator designers still have to manually perform code reconstruction and cumbersome parameter tuning to achieve the optimal performance. While many l… ▽ More

    Submitted 31 August, 2021; v1 submitted 29 September, 2020; originally announced September 2020.

    Comments: 25 pages

  49. arXiv:2009.11389  [pdf, other]

    cs.AR

    Extending High-Level Synthesis for Task-Parallel Programs

    Authors: Yuze Chi, Licheng Guo, Jason Lau, Young-kyu Choi, Jie Wang, Jason Cong

    Abstract: C/C++/OpenCL-based high-level synthesis (HLS) becomes more and more popular for field-programmable gate array (FPGA) accelerators in many application domains in recent years, thanks to its competitive quality of results (QoR) and short development cycles compared with the traditional register-transfer level design approach. Yet, limited by the sequential C semantics, it remains challenging to adop… ▽ More

    Submitted 5 May, 2021; v1 submitted 23 September, 2020; originally announced September 2020.

  50. arXiv:2008.04265  [pdf, other]

    eess.AS cs.SD

    Data Efficient Voice Cloning from Noisy Samples with Domain Adversarial Training

    Authors: Jian Cong, Shan Yang, Lei Xie, Guoqiao Yu, Guanglu Wan

    Abstract: Data efficient voice cloning aims at synthesizing target speaker's voice with only a few enrollment samples at hand. To this end, speaker adaptation and speaker encoding are two typical methods based on base model trained from multiple speakers. The former uses a small set of target speaker data to transfer the multi-speaker model to target speaker's voice through direct model update, while in the… ▽ More

    Submitted 10 August, 2020; v1 submitted 10 August, 2020; originally announced August 2020.

    Comments: Accepted to INTERSPEECH 2020