
arXiv:2406.19002

Coded Cooperative Networks for Semi-Decentralized Federated Learning

Authors: Shudi Weng, Ming Xiao, Mikael Skoglund

Abstract: To enhance straggler resilience in federated learning (FL) systems, a semi-decentralized approach that enables collaboration between clients has recently been proposed. Unlike existing semi-decentralized schemes, which adaptively adjust the collaboration weights according to the network topology, this letter proposes a deterministic coded network that leverages wireless diversity for semi-decentralized FL without requiring prior information about the entire network. Furthermore, theoretical analyses of the outage and the convergence rate of the proposed scheme are provided. Finally, comprehensive simulations demonstrate the superiority of the proposed method over benchmark methods.

Submitted 27 June, 2024; originally announced June 2024.
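The abstract above does not spell out the coding scheme, so the sketch below only illustrates the general diversity argument behind client cooperation, under a simple model of our own: every client-to-server uplink and every client-to-client link is erased independently with probability p, and a client's update reaches the server if its own uplink succeeds or if at least one of k neighbours both receives the update and delivers it. The function names are hypothetical, and this is not the paper's coded network or its outage analysis.

```python
import random

def outage_closed_form(p: float, k: int) -> float:
    """Probability that a client's update never reaches the server.

    The direct uplink fails with probability p; each of the k relay paths fails
    if either hop (client->relay or relay->server) fails: 1 - (1 - p)**2.
    All links are assumed independent.
    """
    relay_fail = 1.0 - (1.0 - p) ** 2
    return p * relay_fail ** k

def outage_monte_carlo(p: float, k: int, trials: int = 200_000, seed: int = 0) -> float:
    """Estimate the same outage probability by simulating independent link erasures."""
    rng = random.Random(seed)
    outages = 0
    for _ in range(trials):
        direct_ok = rng.random() >= p
        # A relay path works only if both of its hops succeed.
        relayed_ok = any(rng.random() >= p and rng.random() >= p for _ in range(k))
        if not (direct_ok or relayed_ok):
            outages += 1
    return outages / trials

print(outage_closed_form(0.2, 0))  # no cooperation: outage equals p
print(outage_closed_form(0.2, 2))  # two cooperating neighbours
print(outage_monte_carlo(0.2, 2))  # should be close to the closed form
```

Under this toy model each extra cooperating neighbour multiplies the outage probability by 1 - (1 - p)^2, which is one way to picture the "wireless diversity" the abstract refers to.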

arXiv:2406.15734

RankAdaptor: Hierarchical Dynamic Low-Rank Adaptation for Structural Pruned LLMs

Authors: Changhai Zhou, Shijie Han, Shiyang Zhang, Shichao Weng, Zekai Liu, Cheng **

Abstract: The efficient compression of large language models (LLMs) is becoming increasingly popular. However, recovering the accuracy of compressed LLMs remains a major challenge. Structural pruning with standard Low-Rank Adaptation (LoRA) is a common technique in current LLM compression. Because structural pruning modifies the model architecture unevenly, recovery via standard LoRA with a fixed rank yields suboptimal performance on various downstream tasks. To address this problem, we introduce RankAdaptor, an efficient fine-tuning method with hierarchical dynamic rank scheduling for pruned LLMs. We develop an end-to-end automatic optimization flow that uses a lightweight performance model to determine the different ranks during fine-tuning. Comprehensive experiments on popular benchmarks show that RankAdaptor consistently outperforms standard LoRA with structural pruning across different pruning settings. Without increasing the number of trainable parameters, RankAdaptor further narrows the accuracy gap between the recovered pruned model and the original model compared with standard LoRA.

Submitted 22 June, 2024; originally announced June 2024.
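As background for the abstract above: a LoRA adapter on a frozen d_out × d_in weight W adds a low-rank update (alpha / r) · B A, with B of shape d_out × r and A of shape r × d_in, so it trains r · (d_in + d_out) parameters per layer. The NumPy sketch below, using hypothetical layer shapes and rank schedules, shows why per-layer ranks can change "without increasing the trainable parameters": a non-uniform allocation can match the budget of a fixed rank. It does not reproduce RankAdaptor's performance model or its scheduling.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Frozen linear layer plus a LoRA update: x W^T + (alpha / r) x A^T B^T.

    W: (d_out, d_in) frozen weight; A: (r, d_in) and B: (d_out, r) trainable.
    """
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

def lora_param_count(shapes, ranks):
    """Trainable LoRA parameters: r * (d_in + d_out) for each adapted layer."""
    return sum(r * (d_in + d_out) for (d_out, d_in), r in zip(shapes, ranks))

# Hypothetical example: four adapted layers of equal shape (d_out, d_in).
shapes = [(256, 256)] * 4
uniform_ranks = [8, 8, 8, 8]      # standard LoRA: one fixed rank everywhere
scheduled_ranks = [4, 12, 10, 6]  # hypothetical per-layer schedule, same total rank

print(lora_param_count(shapes, uniform_ranks))
print(lora_param_count(shapes, scheduled_ranks))
```

With B initialised to zeros, as is usual for LoRA, the adapted layer starts out identical to the frozen one, so fine-tuning begins from the pruned model's behaviour.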

arXiv:2405.05001

HMANet: Hybrid Multi-Axis Aggregation Network for Image Super-Resolution

Authors: Shu-Chuan Chu, Zhi-Chao Dou, Jeng-Shyang Pan, Shaowei Weng, Junbao Li

Abstract: Transformer-based methods have demonstrated excellent performance on super-resolution visual tasks, surpassing conventional convolutional neural networks. However, existing work typically restricts self-attention computation to non-overlapping windows to save computational cost. As a result, Transformer-based networks can only use input information from a limited spatial range. Therefore, this paper proposes a novel Hybrid Multi-Axis Aggregation network (HMA) to better exploit potential feature information. HMA is constructed by stacking Residual Hybrid Transformer Blocks (RHTB) and Grid Attention Blocks (GAB). On the one hand, RHTB combines channel attention and self-attention to enhance non-local feature fusion and produce more attractive visual results. On the other hand, GAB is used for cross-domain information interaction to jointly model similar features and obtain a larger perceptual field. In addition, a novel pre-training method is designed for the super-resolution task to further enhance the model's representation capability, and the effectiveness of the proposed model is validated through extensive experiments. The experimental results show that HMA outperforms state-of-the-art methods on the benchmark dataset. We provide code and models at https://github.com/korouuuuu/HMA.

Submitted 8 May, 2024; originally announced May 2024.

Comments: 12 pages, 10 figures, conference
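To make the windowing trade-off in the abstract above concrete: non-overlapping window attention lets each token attend only to tokens inside its own w × w block, whereas a grid (dilated) partition, as used in multi-axis designs such as MaxViT, groups tokens spaced H/g apart so that each group spans the whole image. The NumPy sketch below shows both partitions on a feature map whose channels are simply the pixel coordinates; it is a generic illustration, and HMA's actual Grid Attention Block may be defined differently.

```python
import numpy as np

def window_partition(x, w):
    """Split (H, W, C) into non-overlapping w x w windows -> (num_windows, w*w, C)."""
    H, W, C = x.shape
    x = x.reshape(H // w, w, W // w, w, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, w * w, C)

def grid_partition(x, g):
    """Split (H, W, C) on a g x g grid -> groups whose tokens are H//g, W//g apart."""
    H, W, C = x.shape
    x = x.reshape(g, H // g, g, W // g, C).transpose(1, 3, 0, 2, 4)
    return x.reshape(-1, g * g, C)

# Feature map whose two channels are (row, col), so we can see where tokens come from.
coords = np.stack(np.meshgrid(np.arange(8), np.arange(8), indexing="ij"), axis=-1)
local_groups = window_partition(coords, 4)
global_groups = grid_partition(coords, 4)
print(sorted(set(local_groups[0][:, 0].tolist())))   # rows of a local 4x4 window
print(sorted(set(global_groups[0][:, 0].tolist())))  # rows spread across the image
```

Both partitions give 16 tokens per group, so attention cost per group is the same; only the spatial reach of each group differs.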

arXiv:2403.06536

Multi-Scale Implicit Transformer with Re-parameterize for Arbitrary-Scale Super-Resolution

Authors: **chen Zhu, Mingjian Zhang, Ling Zheng, Shizhuang Weng

Abstract: Recently, methods based on implicit neural representations have shown excellent capabilities for arbitrary-scale super-resolution (ASSR). Although these methods represent the features of an image by generating latent codes, these latent codes are difficult to adapt to different magnification factors, which seriously affects their performance. To address this, we design the Multi-Scale Implicit Transformer (MSIT), consisting of a Multi-Scale Neural Operator (MSNO) and Multi-Scale Self-Attention (MSSA). MSNO obtains multi-scale latent codes through feature enhancement, multi-scale characteristics extraction, and multi-scale characteristics merging. MSSA further enhances the multi-scale characteristics of the latent codes, resulting in better performance. Furthermore, to improve the performance of the network, we propose the Re-Interaction Module (RIM), combined with a cumulative training strategy, to improve the diversity of the information learned by the network. We are the first to systematically introduce multi-scale characteristics into ASSR. Extensive experiments validate the effectiveness of MSIT, and our method achieves state-of-the-art performance on arbitrary-scale super-resolution tasks.

Submitted 11 March, 2024; originally announced March 2024.

Comments: Super-resolution, Arbitrary-Scale Super-Resolution, Multi-Scale, Transformer
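For readers unfamiliar with implicit-representation ASSR, as discussed in the abstract above: methods in the LIIF family map each output pixel centre, for any scale factor, to continuous coordinates, look up the nearest low-resolution latent code, and decode that code together with the relative offset. The NumPy sketch below shows only that query step on a toy latent grid; the decoder is omitted, the function names are hypothetical, and none of MSIT's multi-scale components (MSNO, MSSA, RIM) are modelled.

```python
import numpy as np

def pixel_centers(n):
    """Centres of n equal cells covering [-1, 1]."""
    return -1 + (2 * np.arange(n) + 1) / n

def query_latents(latent, out_h, out_w):
    """Nearest-latent lookup for an arbitrary output size (LIIF-style query step).

    latent: (h, w, C) low-resolution feature grid.
    Returns codes of shape (out_h, out_w, C) and offsets of shape (out_h, out_w, 2),
    where each offset is the query position relative to the chosen LR cell centre,
    measured in LR-pixel units (so it lies in [-0.5, 0.5]).
    """
    h, w, _ = latent.shape
    qy, qx = pixel_centers(out_h), pixel_centers(out_w)
    # Index of the LR cell that contains each query coordinate.
    iy = np.clip(np.floor((qy + 1) / 2 * h).astype(int), 0, h - 1)
    ix = np.clip(np.floor((qx + 1) / 2 * w).astype(int), 0, w - 1)
    codes = latent[iy[:, None], ix[None, :]]
    # Offsets from the LR cell centres, rescaled to LR-pixel units.
    dy = (qy - pixel_centers(h)[iy]) * h / 2
    dx = (qx - pixel_centers(w)[ix]) * w / 2
    offsets = np.stack(np.meshgrid(dy, dx, indexing="ij"), axis=-1)
    return codes, offsets

# Hypothetical usage: a 4x4 latent grid queried at a non-integer scale of 2.5.
latent = np.random.default_rng(0).normal(size=(4, 4, 8))
codes, offsets = query_latents(latent, 10, 10)
```

Because the offset is an explicit decoder input, the same latent grid can serve any magnification factor; the abstract's point is that a single-scale latent code adapts poorly across such factors.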

arXiv:2402.18147

A Lightweight Low-Light Image Enhancement Network via Channel Prior and Gamma Correction

Authors: Shyang-En Weng, Shaou-Gang Miaou, Ricky Christanto

Abstract: Human vision relies heavily on available ambient light to perceive objects. Low-light scenes pose two distinct challenges: information loss due to insufficient illumination and undesirable brightness shifts. Low-light image enhancement (LLIE) refers to image enhancement technology tailored to this scenario. We introduce CPGA-Net, an innovative LLIE network that combines dark/bright channel priors and gamma correction via deep learning and integrates features inspired by the Atmospheric Scattering Model and the Retinex Theory. This approach combines traditional and deep learning methodologies within a simple yet efficient architectural framework that focuses on essential feature extraction. The resulting CPGA-Net is a lightweight network with only 0.025 million parameters and an inference time of 0.030 seconds, yet it achieves superior performance over existing LLIE methods on both objective and subjective evaluation criteria. Furthermore, we use knowledge distillation with explainable factors to obtain an efficient version with 0.018 million parameters and an inference time of 0.006 seconds. The proposed approaches contribute new ideas to LLIE and have practical applications in challenging low-light scenarios.

Submitted 28 February, 2024; originally announced February 2024.

Comments: Preprint of an article submitted for consideration in International Journal of Pattern Recognition and Artificial Intelligence © 2024 copyright World Scientific Publishing Company, https://www.worldscientific.com/worldscinet/ijprai
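The two classical ingredients named in the abstract above are simple to state. The dark (bright) channel of an image is the per-pixel minimum (maximum) over colour channels, followed by a minimum (maximum) filter over a local patch, as in the dark channel prior of He et al.; gamma correction maps a normalized intensity I to I^(1/γ), which brightens the image when γ > 1. The NumPy sketch below computes both; the patch size and γ are arbitrary illustrative choices, and how CPGA-Net learns or combines these quantities is not reproduced.

```python
import numpy as np

def _local_filter(img2d, patch, reduce):
    """Apply a min or max filter over a patch x patch neighbourhood (edge-padded, odd patch)."""
    r = patch // 2
    padded = np.pad(img2d, r, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (patch, patch))
    return reduce(windows, axis=(-2, -1))

def dark_channel(img, patch=15):
    """Min over RGB, then min over a local patch. img: (H, W, 3) in [0, 1]."""
    return _local_filter(img.min(axis=2), patch, np.min)

def bright_channel(img, patch=15):
    """Max over RGB, then max over a local patch."""
    return _local_filter(img.max(axis=2), patch, np.max)

def gamma_correct(img, gamma):
    """I -> I**(1/gamma); gamma > 1 brightens dark regions."""
    return np.clip(img, 0.0, 1.0) ** (1.0 / gamma)

# Hypothetical usage on a random "low-light" image.
img = 0.2 * np.random.default_rng(0).random((20, 30, 3))
dc, bc = dark_channel(img, 5), bright_channel(img, 5)
brighter = gamma_correct(img, 2.2)
```

`sliding_window_view` requires NumPy 1.20 or later; it builds a view rather than copying every patch.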

arXiv:2402.12184

Colorizing Monochromatic Radiance Fields

Authors: Yean Cheng, Renjie Wan, Shuchen Weng, Chengxuan Zhu, Yakun Chang, Boxin Shi

Abstract: Although Neural Radiance Fields (NeRF) can produce colorful 3D representations of the world from a set of 2D images, this ability is lost when only monochromatic images are provided. Since color is necessary for representing the world, reproducing color from monochromatic radiance fields becomes crucial. To achieve this goal, instead of manipulating the monochromatic radiance fields directly, we treat the problem as a representation-prediction task in the Lab color space. After first constructing the luminance and density representations from monochromatic images, our prediction stage recreates the color representation using an image colorization module. We then reproduce a colorful implicit model through the representations of luminance, density, and color. Extensive experiments validate the effectiveness of our approach. Project page: https://liquidammonia.github.io/color-nerf.

Submitted 19 February, 2024; originally announced February 2024.
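The Lab formulation in the abstract above works because CIELAB lightness L* depends only on luminance: a monochromatic capture already determines L*, leaving only the chromatic a* and b* channels to predict. The sketch below converts a grey value in [0, 1] to L* using the standard sRGB decoding and the CIE lightness formula (D65 white, relative luminance Y_n = 1), assuming the monochrome images are sRGB-encoded; the function names are hypothetical and the paper's colorization module is not modelled.

```python
def srgb_to_linear(v: float) -> float:
    """Inverse sRGB transfer function for a value in [0, 1]."""
    return v / 12.92 if v <= 0.04045 else ((v + 0.055) / 1.055) ** 2.4

def lightness_from_gray(v: float) -> float:
    """CIELAB L* (0 to 100) for a grey sRGB value v in [0, 1].

    For a grey pixel R = G = B, so relative luminance Y equals the linear value
    (the sRGB luminance weights 0.2126, 0.7152, 0.0722 sum to 1).
    """
    y = srgb_to_linear(v)
    delta = 6 / 29
    f = y ** (1 / 3) if y > delta ** 3 else y / (3 * delta ** 2) + 4 / 29
    return 116 * f - 16

for v in (0.0, 0.5, 1.0):
    print(v, round(lightness_from_gray(v), 2))
```

In a pipeline of this shape, the predicted a* and b* would be combined with this L* and converted back to RGB for rendering.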

arXiv:2402.11874

Language-guided Image Reflection Separation

Authors: Haofeng Zhong, Yuchen Hong, Shuchen Weng, **xiu Liang, Boxin Shi

Abstract: This paper studies the problem of language-guided reflection separation, which aims to address the ill-posed reflection separation problem by introducing language descriptions that provide layer content. We propose a unified framework to solve this problem, which leverages the cross-attention mechanism with contrastive learning strategies to construct the correspondence between language descriptions and image layers. A gated network design and a randomized training strategy are employed to tackle the recognizable layer ambiguity. The effectiveness of the proposed method is validated by its significant performance advantage over existing reflection separation methods in both quantitative and qualitative comparisons.

Submitted 4 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.
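Contrastive alignment between descriptions and image content, as mentioned in the abstract above, is commonly trained with a symmetric InfoNCE (CLIP-style) objective: matched description/layer pairs lie on the diagonal of a similarity matrix and are pushed above mismatched pairs. The NumPy sketch below computes that loss for a batch of hypothetical embeddings; the temperature value and function names are assumptions, and the paper's cross-attention, gated network, and exact loss may differ.

```python
import numpy as np

def _log_softmax(z, axis):
    """Numerically stable log-softmax along one axis."""
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def info_nce(text_emb, layer_emb, temperature=0.07):
    """Symmetric InfoNCE loss; row i of each (N, D) input is a matched pair."""
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = layer_emb / np.linalg.norm(layer_emb, axis=1, keepdims=True)
    logits = t @ v.T / temperature          # (N, N) scaled cosine similarities
    idx = np.arange(len(t))
    loss_t2v = -_log_softmax(logits, axis=1)[idx, idx].mean()  # text -> layer
    loss_v2t = -_log_softmax(logits, axis=0)[idx, idx].mean()  # layer -> text
    return 0.5 * (loss_t2v + loss_v2t)

# Hypothetical usage: well-aligned pairs should give a lower loss than shuffled ones.
rng = np.random.default_rng(0)
text = rng.normal(size=(8, 16))
aligned_loss = info_nce(text, text + 0.01 * rng.normal(size=(8, 16)))
shuffled_loss = info_nce(text, text[::-1].copy())
```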

arXiv:2311.10746

EIT: Earnest Insight Toolkit for Evaluating Students' Earnestness in Interactive Lecture Participation Exercises

Authors: Mihran Miroyan, Shiny Weng, Rahul Shah, Lisa Yan, Narges Norouzi

Abstract: In today's rapidly evolving educational landscape, traditional modes of passive information delivery are giving way to transformative pedagogical approaches that prioritize active student engagement. In large-scale hybrid classrooms, the challenge lies in fostering meaningful and active interaction between students and course content. This study examines the significance of measuring students' earnestness during interactive lecture participation exercises. By analyzing students' responses to interactive lecture poll questions, establishing a clear rubric for evaluating earnestness, and conducting a comprehensive assessment, we introduce EIT (Earnest Insight Toolkit), a tool designed to assess students' engagement in interactive lecture participation exercises, particularly in large-scale hybrid classrooms. With EIT, our objective is to give educators a means of identifying at-risk students, to strengthen intervention and support strategies, and to measure students' engagement with course content.

Submitted 31 October, 2023; originally announced November 2023.

arXiv:2309.05786

doi 10.1145/3616195.3616209

Stringesthesia: Dynamically Shifting Musical Agency Between Audience and Performer Based on Trust in an Interactive and Improvised Performance

Authors: Torin Hopkins, Emily Doherty, Netta Ofer, Suibi Che Chuan Weng, Peter Gyrory, Chad Tobin, Leanne Hirshfield, Ellen Yi-Luen Do

Abstract: This paper introduces Stringesthesia, an interactive and improvised performance paradigm. Stringesthesia uses real-time neuroimaging to connect performers and audiences, enabling direct access to the performer's mental state and determining audience participation during the performance. Functional near-infrared spectroscopy (fNIRS), a noninvasive neuroimaging tool, was used to assess the metabolic activity of brain areas collectively associated with a metric we call trust. A visualization of the real-time measurement of the performer's level of trust was projected behind the performer and used to dynamically restrict or promote audience participation. Throughout the paper we discuss prior work that heavily influenced our design, conceptual and methodological issues with using fNIRS technology, the system architecture, and feedback from the audience and performer.

Submitted 11 September, 2023; originally announced September 2023.

Journal ref: Audio Mostly 2023, Edinburgh, UK

arXiv:2309.00842

DualStream: Spatially Sharing Selves and Surroundings using Mobile Devices and Augmented Reality

Authors: Rishi Vanukuru, Suibi Che-Chuan Weng, Krithik Ranjan, Torin Hopkins, Amy Banic, Mark D. Gross, Ellen Yi-Luen Do

Abstract: In-person human interaction relies on our spatial perception of each other and our surroundings. Current remote communication tools partially address each of these aspects. Video calls convey real user representations but without spatial interactions. Augmented and Virtual Reality (AR/VR) experiences are immersive and spatial but often use virtual environments and characters instead of real-life representations. Bridging these gaps, we introduce DualStream, a system for synchronous mobile AR remote communication that captures, streams, and displays spatial representations of users and their surroundings. DualStream supports transitions between user and environment representations with different levels of visuospatial fidelity, as well as the creation of persistent shared spaces using environment snapshots. We demonstrate how DualStream can enable spatial communication in real-world contexts, and support the creation of blended spaces for collaboration. A formative evaluation of DualStream revealed that users valued the ability to interact spatially and move between representations, and could see DualStream fitting into their own remote communication practices in the near future. Drawing from these findings, we discuss new opportunities for designing more widely accessible spatial communication tools, centered around the mobile phone.

Submitted 2 September, 2023; originally announced September 2023.

Comments: 10 pages, 4 figures, 1 table; To appear in the proceedings of the IEEE International Symposium on Mixed and Augmented Reality (ISMAR) 2023

arXiv:2305.15217

L-CAD: Language-based Colorization with Any-level Descriptions using Diffusion Priors

Authors: Zheng Chang, Shuchen Weng, Peixuan Zhang, Yu Li, Si Li, Boxin Shi

Abstract: Language-based colorization produces plausible and visually pleasing colors under the guidance of user-friendly natural language descriptions. Previous methods implicitly assume that users provide comprehensive color descriptions for most of the objects in the image, which leads to suboptimal performance. In this paper, we propose a unified model to perform language-based colorization with any-level descriptions. We leverage a pretrained cross-modality generative model for its robust language understanding and rich color priors to handle the inherent ambiguity of any-level descriptions. We further design modules that align with the input conditions to preserve local spatial structures and prevent the ghosting effect. With the proposed novel sampling strategy, our model achieves instance-aware colorization in diverse and complex scenarios. Extensive experimental results demonstrate that our model effectively handles any-level descriptions and outperforms both language-based and automatic colorization methods. The code and pretrained models are available at: https://github.com/changzheng123/L-CAD.

Submitted 23 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

arXiv:2305.11403

Efficient Mixed Transformer for Single Image Super-Resolution

Authors: Ling Zheng, **chen Zhu, **peng Shi, Shizhuang Weng

Abstract: Recently, Transformer-based methods have achieved impressive results in single image super-resolution (SISR). However, the lack of a locality mechanism and their high complexity limit their application in super-resolution (SR). To solve these problems, in this study we propose a new method, the Efficient Mixed Transformer (EMT). Specifically, we propose the Mixed Transformer Block (MTB), consisting of multiple consecutive transformer layers, in some of which a Pixel Mixer (PM) replaces Self-Attention (SA). PM enhances local knowledge aggregation with pixel-shifting operations. At the same time, PM introduces no additional complexity, because it has no parameters and requires no floating-point operations. Moreover, we employ striped-window self-attention (SWSA), which exploits image anisotropy to model global dependencies efficiently. Experimental results show that EMT outperforms existing methods on the benchmark dataset and achieves state-of-the-art performance. The code is available at https://github.com/Fried-Rice-Lab/FriedRiceLab.

Submitted 19 June, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

Comments: Super-resolution, Long-range attention, Transformer, Locality
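The abstract above describes the Pixel Mixer only as parameter-free "pixel shifting operations". A common way to build such an operator, sketched below in NumPy as an assumption rather than EMT's exact definition, is to split the channels into groups and shift each group by one pixel in a different direction (with zero fill), so that a following channel-mixing layer sees each pixel's neighbours. The operator has no learned parameters and performs only memory moves; the function names are hypothetical.

```python
import numpy as np

def shift(x, dy, dx):
    """Shift an (H, W, C) array by (dy, dx) pixels, filling vacated cells with zeros."""
    out = np.zeros_like(x)
    H, W, _ = x.shape
    ys_src = slice(max(-dy, 0), H - max(dy, 0))
    ys_dst = slice(max(dy, 0), H - max(-dy, 0))
    xs_src = slice(max(-dx, 0), W - max(dx, 0))
    xs_dst = slice(max(dx, 0), W - max(-dx, 0))
    out[ys_dst, xs_dst] = x[ys_src, xs_src]
    return out

def pixel_mix(x, directions=((0, 1), (0, -1), (1, 0), (-1, 0))):
    """Shift each channel group one pixel in its own direction (right, left, down, up).

    Channels beyond an even split are left unshifted. No parameters, no arithmetic.
    """
    C = x.shape[2]
    g = C // len(directions)
    out = x.copy()
    for i, (dy, dx) in enumerate(directions):
        group = slice(i * g, (i + 1) * g)
        out[..., group] = shift(x[..., group], dy, dx)
    return out

# Hypothetical usage: a single bright pixel spreads to its four neighbours, one per group.
x = np.zeros((5, 5, 4))
x[2, 2, :] = 1.0
y = pixel_mix(x)
```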

arXiv:2301.09869

Image Super-Resolution using Efficient Striped Window Transformer

Authors: **peng Shi, Hui Li, Tianle Liu, Yulong Liu, Mingjian Zhang, **chen Zhu, Ling Zheng, Shizhuang Weng

Abstract: Transformers have achieved remarkable results in single-image super-resolution (SR). However, the challenge of balancing model performance and complexity has hindered their application in lightweight SR (LSR). To tackle this challenge, we propose an efficient striped window transformer (ESWT). We revisit the normalization layer in the transformer and design a concise and efficient transformer structure to build the ESWT. Furthermore, we introduce a striped window mechanism to model long-range dependencies more efficiently. To fully exploit the potential of the ESWT, we propose a novel flexible window training strategy that improves the performance of the ESWT without additional cost. Extensive experiments show that ESWT outperforms state-of-the-art LSR transformers and achieves a better trade-off between model performance and complexity. ESWT requires fewer parameters and FLOPs, offers faster inference, and consumes less memory, making it a promising solution for LSR.

Submitted 14 March, 2023; v1 submitted 24 January, 2023; originally announced January 2023.

Comments: SOTA lightweight super-resolution transformer. 8 pages, 9 figures and 6 tables. The code is available at https://github.com/Fried-Rice-Lab/FriedRiceLab
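A striped window, in the sense used in the abstract above, is an elongated rectangular attention window (for example short × long) that alternates with its transpose, so tokens reach far along one axis at a time while each window keeps the same token count as a square one. The NumPy sketch below partitions a feature map into horizontal and vertical stripes; the 6 × 24 size is a hypothetical choice, and the sketch covers only the partition that self-attention would then run within, not ESWT's normalization changes or its flexible window training strategy.

```python
import numpy as np

def rect_window_partition(x, wh, ww):
    """Split (H, W, C) into non-overlapping wh x ww windows -> (num_windows, wh*ww, C)."""
    H, W, C = x.shape
    assert H % wh == 0 and W % ww == 0, "pad the feature map to a multiple of the window"
    x = x.reshape(H // wh, wh, W // ww, ww, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, wh * ww, C)

def striped_partitions(x, long=24, short=6):
    """Horizontal (short x long) and vertical (long x short) stripes, alternated across layers."""
    return rect_window_partition(x, short, long), rect_window_partition(x, long, short)

# Feature map whose channels are (row, col), to show what each stripe covers.
coords = np.stack(np.meshgrid(np.arange(48), np.arange(48), indexing="ij"), axis=-1)
horizontal, vertical = striped_partitions(coords)
```

A square 12 × 12 window also holds 144 tokens, but it reaches only 12 pixels along each axis, whereas each stripe reaches 24 pixels along its long axis.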

arXiv:2301.00062

FIPS Compliant Quantum Secure Communication using Quantum Permutation Pad

Authors: Alex He, Dafu Lou, Eric She, Shangjie Guo, Hareesh Watson, Sibyl Weng, Maria Perepechaenko, Rand Kuang

Abstract: Quantum computing has been developing rapidly since Shor's algorithm was proposed in 1994, and quantum computing farms are now available as multi-cloud services. One provider, IBM, has presented a roadmap showing that its Kookaburra system, with over 4158 qubits, will be available in 2025. For the standardization of Post-Quantum Cryptography (PQC), the National Institute of Standards and Technology (NIST) recently announced the first candidates for standardization: one key encapsulation mechanism (KEM), Kyber, and three digital signature algorithms. NIST has also issued a new call for quantum-safe digital signature algorithms, due June 1, 2023. This timeline suggests that a FIPS-certified quantum-safe TLS protocol will take a long time to arrive. However, the "steal now, crack later" tactic requires protecting data against future quantum threat actors today. NIST recommended a hybrid mode of TLS 1.3 with extensions that support PQC. The hybrid mode works in certain cases, but FIPS certification of the hybridized cryptomodule might still be required. This paper proposes a nested mode that enables the TLS 1.3 protocol to carry quantum-safe data; it can be made available today and is FIPS compliant. We discuss the performance impact of nested TLS 1.3 with PQC on both the handshake phase and the symmetric encryption phase. The major performance impact of the nested mode lies in symmetric data encryption with AES. To overcome this performance reduction, we suggest using quantum encryption with a quantum permutation pad for data encryption, which reduces performance by less than 10 percent.

Submitted 28 December, 2023; v1 submitted 30 December, 2022; originally announced January 2023.

Comments: 6 pages, 3 figures, to be submitted for a conference
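The abstract above does not detail the quantum permutation pad (QPP), so the following is only a simplified classical illustration of the general idea, with hypothetical names: a pad of n random permutations of the 256 byte values is derived from a shared key, each plaintext byte is mapped through one of the permutations (selected here in round-robin order, an assumption), and decryption applies the inverse permutations. It is not the authors' construction, omits the randomized selection and key management a real QPP scheme uses, and must not be used as a cipher.

```python
import random

def make_pad(key: int, n: int = 64):
    """Derive n random permutations of 0..255 from a shared key (illustrative only)."""
    rng = random.Random(key)
    pad = []
    for _ in range(n):
        perm = list(range(256))
        rng.shuffle(perm)
        pad.append(perm)
    return pad

def invert(perm):
    """Inverse permutation: inv[perm[i]] = i."""
    inv = [0] * len(perm)
    for i, p in enumerate(perm):
        inv[p] = i
    return inv

def qpp_encrypt(data: bytes, pad) -> bytes:
    # Byte i is permuted by pad[i % len(pad)]; each byte costs one table lookup.
    return bytes(pad[i % len(pad)][b] for i, b in enumerate(data))

def qpp_decrypt(data: bytes, pad) -> bytes:
    inverses = [invert(p) for p in pad]
    return bytes(inverses[i % len(pad)][b] for i, b in enumerate(data))

pad = make_pad(key=42)
ciphertext = qpp_encrypt(b"steal now, crack later", pad)
plaintext = qpp_decrypt(ciphertext, pad)
```

Because each byte is transformed by a single table lookup, permutation-based encryption of this kind is cheap, which is consistent with the abstract's low-overhead claim; the under-10-percent figure itself comes from the paper, not from this sketch.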

arXiv:2210.00198

Closed cap condition under the cap construction algorithm

Authors: Mercedes Sandu, Shuyi Weng, Jade Zhang

Abstract: Every polygon $P$ can be paired with a cap polygon $\hat P$ such that $P$ and $\hat P$ serve as two parts of the boundary surface of a polyhedron $V$. Pairs of vertices on $P$ and $\hat P$ are identified successively to become vertices of $V$. In this paper, we study the cap construction that imposes equal angular defects at these pairings. We exhibit a linear relation that arises from the cap construction algorithm, which in turn demonstrates an abundance of polygons that satisfy the closed cap condition, that is, polygons that can successfully undergo the cap construction process.

Submitted 11 June, 2023; v1 submitted 1 October, 2022; originally announced October 2022.

Comments: 13 pages, 8 figures, accepted by Involve
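The angular defect at a polyhedron vertex, the quantity the cap construction in the abstract above equalizes at paired vertices, is 2π minus the sum of the face angles meeting there; by Descartes' theorem, the defects of a convex polyhedron sum to 4π. The short Python check below computes defects from face angles for a cube and a regular tetrahedron; it illustrates the quantity only, not the cap construction algorithm itself.

```python
import math

def angular_defect(face_angles):
    """2*pi minus the sum of the face angles (in radians) meeting at a vertex."""
    return 2 * math.pi - sum(face_angles)

cube_vertex = angular_defect([math.pi / 2] * 3)    # three squares meet at each vertex
tetra_vertex = angular_defect([math.pi / 3] * 3)   # three equilateral triangles
cube_total = 8 * cube_vertex                       # Descartes: total defect is 4*pi
tetra_total = 4 * tetra_vertex

print(cube_vertex / math.pi, tetra_vertex / math.pi)  # defects as multiples of pi
```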

arXiv:2103.05767

ZYELL-NCTU NetTraffic-1.0: A Large-Scale Dataset for Real-World Network Anomaly Detection

Authors: Lei Chen, Shao-En Weng, Chu-Jun Peng, Hong-Han Shuai, Wen-Huang Cheng

Abstract: Network security has long been an active research topic. One critical issue is improving the anomaly detection capability of intrusion detection systems (IDSs), such as firewalls. However, existing network anomaly datasets are out of date (i.e., collected many years ago) or IP-anonymized, so their data characteristics differ from those of today's networks. Therefore, this work introduces a new, large-scale, real-world dataset, ZYELL-NCTU NetTraffic-1.0, which is collected from the raw output of firewalls in a real network, with the objective of advancing network security research.

Submitted 8 March, 2021; originally announced March 2021.

Comments: 2 pages, 3 tables, 1 figure

arXiv:2006.01189

An Effective Contextual Language Modeling Framework for Speech Summarization with Augmented Features

Authors: Shi-Yan Weng, Tien-Hong Lo, Berlin Chen

Abstract: Tremendous amounts of multimedia associated with speech information are driving an urgent need to develop efficient and effective automatic summarization methods. To this end, we have seen rapid progress in applying supervised deep neural network-based methods to extractive speech summarization. More recently, the Bidirectional Encoder Representations from Transformers (BERT) model was proposed and has achieved record-breaking success on many natural language processing (NLP) tasks, such as question answering and language understanding. In view of this, in this paper we contextualize and enhance the state-of-the-art BERT-based model for speech summarization; our contributions are at least threefold. First, we explore incorporating confidence scores into sentence representations to see whether this helps alleviate the negative effects of imperfect automatic speech recognition (ASR). Second, we augment the sentence embeddings obtained from BERT with extra structural and linguistic features, such as sentence position and inverse document frequency (IDF) statistics. Finally, we validate the effectiveness of our proposed method on a benchmark dataset, in comparison with several classic and celebrated speech summarization methods.

Submitted 1 June, 2020; originally announced June 2020.

Comments: Accepted by EUSIPCO 2020
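The feature augmentation described in the abstract above can be pictured as concatenating extra scalars onto each sentence embedding. The sketch below, with hypothetical names and zero vectors standing in for BERT embeddings, appends an ASR confidence score, a normalized sentence position, and the mean inverse document frequency of the sentence's words, using the common definition idf(w) = log(N / df(w)); the paper's exact features and how it fuses them may differ.

```python
import math
import numpy as np

def idf_table(documents):
    """idf(w) = log(N / df(w)) over a list of tokenized documents."""
    n = len(documents)
    df = {}
    for doc in documents:
        for w in set(doc):
            df[w] = df.get(w, 0) + 1
    return {w: math.log(n / c) for w, c in df.items()}

def augment(embeddings, sentences, confidences, idf):
    """Append [ASR confidence, relative position, mean IDF] to each sentence embedding."""
    rows = []
    k = len(sentences)
    for i, (emb, sent, conf) in enumerate(zip(embeddings, sentences, confidences)):
        mean_idf = float(np.mean([idf.get(w, 0.0) for w in sent])) if sent else 0.0
        position = i / max(k - 1, 1)  # 0 for the first sentence, 1 for the last
        rows.append(np.concatenate([emb, [conf, position, mean_idf]]))
    return np.stack(rows)

# Hypothetical usage: three recognized sentences, 4-dim placeholder embeddings.
sentences = [["the", "cat", "sat"], ["the", "dog", "ran"], ["a", "cat", "ran"]]
idf = idf_table(sentences)
features = augment(np.zeros((3, 4)), sentences, [0.9, 0.8, 0.7], idf)
```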

arXiv:2005.08440

An Effective End-to-End Modeling Approach for Mispronunciation Detection

Authors: Tien-Hong Lo, Shi-Yan Weng, Hsiu-Jui Chang, Berlin Chen

Abstract: Recently, end-to-end (E2E) automatic speech recognition (ASR) systems have garnered tremendous attention because of their great success and unified modeling paradigm in comparison with conventional hybrid DNN-HMM ASR systems. Despite the widespread adoption of E2E modeling frameworks for ASR, there is still a dearth of work investigating E2E frameworks for computer-assisted pronunciation training (CAPT), particularly for mispronunciation detection (MD). In response, we first present a novel use of the hybrid CTC-Attention approach for the MD task, taking advantage of the strengths of both CTC and the attention-based model while avoiding the need for phone-level forced alignment. Second, we perform input augmentation with text prompt information to make the resulting E2E model more tailored to the MD task. In addition, we adopt two MD decision methods that work with the proposed framework: 1) decision-making based on a recognition confidence measure, or 2) decision-making based simply on speech recognition results. A series of Mandarin MD experiments demonstrates that our approach not only simplifies the processing pipeline of existing hybrid DNN-HMM systems but also brings systematic and substantial performance improvements. Furthermore, input augmentation with text prompts seems to hold excellent promise for the E2E-based MD approach.

Submitted 17 May, 2020; originally announced May 2020.

Comments: Submitted to Interspeech 2020

arXiv:2005.08433 [pdf, other]

The NTNU System at the Interspeech 2020 Non-Native Children's Speech ASR Challenge

Authors: Tien-Hong Lo, Fu-An Chao, Shi-Yan Weng, Berlin Chen

Abstract: This paper describes the NTNU ASR system participating in the Interspeech 2020 Non-Native Children's Speech ASR Challenge supported by the SIG-CHILD group of ISCA. This ASR shared task is made much more challenging by the coexisting diversity of non-native and children's speaking characteristics. In the closed-track evaluation setting, all participants were restricted to developing their systems solely on the speech and text corpora provided by the organizer. To work around this under-resourced issue, we built our ASR system on top of CNN-TDNNF-based acoustic models, while harnessing the synergistic power of various data augmentation strategies, including both utterance- and word-level speed perturbation and spectrogram augmentation, alongside a simple yet effective data-cleansing approach. All variants of our ASR system employed an RNN-based language model, trained solely on the text dataset released by the organizer, to rescore the first-pass recognition hypotheses. Our system with the best configuration came in second place with a word error rate (WER) of 17.59%, while the top-performing system, the second runner-up, and the official baseline achieved 15.67%, 18.71%, and 35.09%, respectively.
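The reported figures are word error rates, conventionally computed as the word-level edit distance (substitutions + deletions + insertions) between the recognition output and the reference transcript, divided by the number of reference words. A minimal sketch of that standard definition follows; the challenge's official scoring tool may apply additional text normalization that is not modeled here.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Rolling single-row Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j - 1] + (r != h),  # match / substitution
                         prev[j] + 1,             # deletion
                         cur[j - 1] + 1)          # insertion
        prev = cur
    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b c d", "a x c")` is 0.5 (one substitution and one deletion over four reference words); a WER of 17.59% corresponds to roughly 17.6 word errors per 100 reference words.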

Submitted 2 June, 2020; v1 submitted 17 May, 2020; originally announced May 2020.

Comments: Submitted to Interspeech 2020 Special Session: Shared Task on Automatic Speech Recognition for Non-Native Children's Speech

arXiv:1908.00966 [pdf, other]

doi 10.1016/j.artmed.2020.101806

Mixed-Integer Optimization Approach to Learning Association Rules for Unplanned ICU Transfer

Authors: Chun-An Chou, Qingtao Cao, Shao-Jen Weng, Che-Hung Tsai

Abstract: After admission to the emergency department (ED), patients with critical illnesses may be transferred to the intensive care unit (ICU) because of unexpected clinical deterioration. Identifying such unplanned ICU transfers is urgently needed for physicians to achieve two goals: improving critical care quality and preventing mortality. A priority task is to understand the crucial rationale behind the diagnosis results of individual patients during their stay in the ED, which helps prepare for an early transfer to the ICU. Most existing prediction studies were based on univariate analysis or multiple logistic regression and provide one-size-fits-all results. However, patient conditions vary from case to case and may not be accurately assessed by a single judgment. In this study, we present a new decision tool using a mathematical optimization approach that aims to automatically discover rules associating diagnostic features with a high-risk outcome (i.e., unplanned transfer) in different deterioration scenarios. We consider four mutually exclusive patient subgroups at a suburban teaching hospital, based on the principal reasons for ED visits: infections, cardiovascular/respiratory diseases, gastrointestinal diseases, and neurological/other diseases. The analysis results reveal significant rules associated with the unplanned-transfer outcome for each subgroup and show prediction accuracy comparable to that of state-of-the-art machine learning methods, while providing easy-to-interpret symptom-outcome information.

Submitted 2 August, 2019; originally announced August 2019.

Journal ref: Artificial Intelligence in Medicine, 2020

arXiv:1805.07740 [pdf, other]

STS Classification with Dual-stream CNN

Authors: Shuchen Weng, Wenbo Li, Yi Zhang, Siwei Lyu

Abstract: The structured time series (STS) classification problem requires modeling interwoven spatiotemporal dependency. Most previous STS classification methods model the spatial and temporal dependencies independently. Due to the complexity of STS data, we argue that a desirable STS classification method should be a holistic framework that is as adaptive and flexible as possible. This motivates us to design a deep neural network with these merits. Inspired by the dual-stream hypothesis in neuroscience, we propose a novel dual-stream framework for modeling the interwoven spatiotemporal dependency, and develop a convolutional neural network within this framework that aims to achieve high adaptability and flexibility in STS configurations from various perspectives, i.e., sequential order, dependency range, and features. The proposed architecture is highly modularized and scalable, making it easy to adapt to specific tasks. The effectiveness of our model is demonstrated through experiments on synthetic data as well as benchmark datasets for skeleton-based activity recognition.

Submitted 20 May, 2018; originally announced May 2018.

arXiv:1611.00692 [pdf]

Towards Automatic Resource Bound Analysis for OCaml

Authors: Jan Hoffmann, Ankush Das, Shu-Chun Weng

Abstract: This article presents a resource analysis system for OCaml programs. This system automatically derives worst-case resource bounds for higher-order polymorphic programs with user-defined inductive types. The technique is parametric in the resource and can derive bounds for time, memory allocations and energy usage. The derived bounds are multivariate resource polynomials which are functions of different size parameters that depend on the standard OCaml types. Bound inference is fully automatic and is reduced to a linear optimization problem that is passed to an off-the-shelf LP solver. Technically, the analysis system is based on a novel multivariate automatic amortized resource analysis (AARA). It builds on existing work on linear AARA for higher-order programs with user-defined inductive types and on multivariate AARA for first-order programs with built-in lists and binary trees. For the first time, it is possible to automatically derive polynomial bounds for higher-order functions and polynomial bounds that depend on user-defined inductive types. Moreover, the analysis handles programs with side effects and even outperforms the linear bound inference of previous systems. At the same time, it preserves the expressivity and efficiency of existing AARA techniques. The practicality of the analysis system is demonstrated with an implementation and integration with Inria's OCaml compiler.
The implementation is used to automatically derive resource bounds for 411 functions and 6018 lines of code derived from OCaml libraries, the CompCert compiler, and implementations of textbook algorithms. In a case study, the system infers bounds on the number of queries that are sent by OCaml programs to DynamoDB, a commercial NoSQL cloud database service.

Submitted 2 November, 2016; originally announced November 2016.

Comments: 74 pages, technical report, short version accepted at POPL 2017

arXiv:1511.04519 [pdf, ps, other]

doi 10.1145/2593069.2593160

MATEX: A Distributed Framework for Transient Simulation of Power Distribution Networks

Authors: Hao Zhuang, Shih-Hung Weng, Jeng-Hau Lin, Chung-Kuan Cheng

Abstract: We propose MATEX, a distributed framework for transient simulation of power distribution networks (PDNs). MATEX utilizes a matrix exponential kernel with Krylov subspace approximations to solve the differential equations of linear circuits. First, the whole simulation task is divided into subtasks based on decompositions of current sources, in order to reduce the computational overhead. Then these subtasks are distributed to different computing nodes and processed in parallel. Within each node, after the matrix factorization at the beginning of the simulation, the adaptive time-stepping solver runs without extra matrix re-factorizations. MATEX overcomes the stiffness problem that hindered previous matrix exponential-based circuit simulators by using the rational Krylov subspace method, which permits larger step sizes with smaller Krylov subspace dimensions and greatly accelerates the whole computation. MATEX outperforms both traditional fixed and adaptive time-stepping methods, e.g., achieving a speedup of around 13X over the trapezoidal framework with a fixed time step on the IBM power grid benchmarks.

Submitted 14 November, 2015; originally announced November 2015.

Comments: ACM/IEEE DAC 2014. arXiv admin note: substantial text overlap with arXiv:1505.06699

arXiv:1507.06711 [pdf, other]

The SYSU System for the Interspeech 2015 Automatic Speaker Verification Spoofing and Countermeasures Challenge

Authors: Shitao Weng, Shushan Chen, Lei Yu, Xuewei Wu, Weicheng Cai, Zhi Liu, Ming Li

Abstract: Many existing speaker verification systems are reported to be vulnerable to different spoofing attacks, for example speaker-adapted speech synthesis, voice conversion, and playback. To detect these spoofed speech signals as a countermeasure, we propose a score-level fusion approach with several different i-vector subsystems. We show that acoustic-level Mel-frequency cepstral coefficient (MFCC) features, phase-level modified group delay cepstral coefficients (MGDCC), and phonetic-level phoneme posterior probability (PPP) tandem features are effective for the countermeasure. Furthermore, feature-level fusion of these features before i-vector modeling also enhances the performance. A polynomial kernel support vector machine is adopted as the supervised classifier. To enhance the generalizability of the countermeasure, we also adopt cosine similarity and PLDA scoring as one-class classification methods. Combining the proposed i-vector subsystems with the OpenSMILE baseline, which covers acoustic and prosodic information, further improves the final performance. The proposed fusion system achieves 0.29% and 3.26% EER on the development and test sets, respectively, of the database provided by the INTERSPEECH 2015 automatic speaker verification spoofing and countermeasures challenge.
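Score-level fusion and the EER metric can be illustrated in a few lines. The sketch assumes a weighted linear fusion of per-trial subsystem scores (the weights are hypothetical; in practice they would be tuned on development data) and estimates the EER by sweeping a decision threshold over the observed scores, which is a coarse approximation of the usual ROC-based computation. It does not reproduce the paper's exact fusion configuration.

```python
def fuse_scores(subsystem_scores, weights):
    """Weighted linear score-level fusion.

    subsystem_scores[k][t] is subsystem k's score for trial t.
    """
    if len(subsystem_scores) != len(weights):
        raise ValueError("need one weight per subsystem")
    n_trials = len(subsystem_scores[0])
    return [sum(w * s[t] for w, s in zip(weights, subsystem_scores))
            for t in range(n_trials)]


def equal_error_rate(genuine, spoof):
    """Approximate EER by sweeping the threshold over all observed scores.

    A trial is accepted as genuine if its score >= threshold.
    FRR: fraction of genuine trials rejected; FAR: fraction of spoofed trials accepted.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    best_gap, best_eer = float("inf"), None
    for thr in sorted(set(genuine) | set(spoof)):
        frr = sum(s < thr for s in genuine) / len(genuine)
        far = sum(s >= thr for s in spoof) / len(spoof)
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer
```

With genuine scores `[0.4, 0.6, 0.8, 0.9]` and spoof scores `[0.1, 0.3, 0.5, 0.7]`, the threshold 0.6 rejects one of four genuine trials and accepts one of four spoofed trials, so `equal_error_rate` returns 0.25.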

Submitted 29 July, 2015; v1 submitted 23 July, 2015; originally announced July 2015.

Comments: 5 pages, 1 figure

arXiv:1505.06699 [pdf, ps, other]

doi 10.1109/TCAD.2016.2523908

Simulation Algorithms with Exponential Integration for Time-Domain Analysis of Large-Scale Power Delivery Networks

Authors: Hao Zhuang, Wenjian Yu, Shih-Hung Weng, Ilgweon Kang, Jeng-Hau Lin, Xiang Zhang, Ryan Coutts, Chung-Kuan Cheng

Abstract: We design an algorithmic framework using matrix exponentials for time-domain simulation of power delivery networks (PDNs). Our framework can reuse factorized matrices to simulate the large-scale linear PDN system with variable step sizes. In contrast, conventional PDN simulation solvers have to use a fixed step size in order to reuse the factorized matrices generated by the expensive matrix decomposition. Based on the proposed exponential integration framework, we design a PDN solver, R-MATEX, with flexible time-stepping capability. The key operation, the matrix exponential and vector product (MEVP), is computed by the rational Krylov subspace method. To further improve the runtime, we also propose a distributed computing framework, DR-MATEX. DR-MATEX reduces the Krylov subspace generations caused by frequent breakpoints from a large number of current sources during simulation. By virtue of the superposition property of linear systems and the scaling invariance property of Krylov subspaces, DR-MATEX can divide the whole simulation task into subtasks based on the alignments of breakpoints among those sources. The subtasks are processed in parallel at different computing nodes without any communication during the transient simulation. The final result is obtained by summing the partial results from all the computing nodes after they finish their assigned subtasks. Therefore, our computation model belongs to the category known as the embarrassingly parallel model.
Experimental results show that R-MATEX and DR-MATEX can achieve runtime speedups of up to around 14.4X and 98.0X, respectively, over a traditional trapezoidal-integration-based solver with a fixed time step.
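The superposition argument behind DR-MATEX can be checked numerically on a toy linear system x'(t) = A x(t) + B u(t). The sketch below is illustrative only: it uses a fixed-step trapezoidal integrator for each subtask rather than R-MATEX's exponential integrator, the matrices and pulse-shaped current sources are made up, and the grouping of sources is arbitrary rather than based on breakpoint alignment. Because the integrator is linear in the initial state and the inputs, the sum of the subtask trajectories matches the full simulation.

```python
import numpy as np


def trapezoidal(A, B, u, x0, h, steps):
    """Fixed-step trapezoidal integration of x'(t) = A x(t) + B u(t)."""
    n = A.shape[0]
    I = np.eye(n)
    # One factorization of (I - h/2 A) is reused at every step; an explicit
    # inverse stands in for a sparse LU factorization in this small sketch.
    lhs_inv = np.linalg.inv(I - 0.5 * h * A)
    rhs = I + 0.5 * h * A
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for k in range(steps):
        b = 0.5 * h * (B @ (u(k * h) + u((k + 1) * h)))
        x = lhs_inv @ (rhs @ x + b)
        traj.append(x.copy())
    return np.array(traj)


def pulse_sources(pulses):
    """Input vector u(t) for current sources given as (start, end, amplitude) pulses."""
    def u(t):
        return np.array([amp if start <= t < end else 0.0 for start, end, amp in pulses])
    return u


def simulate_by_superposition(A, B, pulses, x0, h, steps, groups):
    """Split the sources into groups, simulate each group independently, and sum.

    Each group is an independent subtask (it could run on its own node);
    the initial condition is assigned to the first subtask only.
    """
    total = None
    for g, group in enumerate(groups):
        # Sources outside this group are switched off (zero-amplitude pulse).
        sub_pulses = [p if i in group else (0.0, 0.0, 0.0) for i, p in enumerate(pulses)]
        x0_g = x0 if g == 0 else np.zeros_like(x0)
        part = trapezoidal(A, B, pulse_sources(sub_pulses), x0_g, h, steps)
        total = part if total is None else total + part
    return total
```

Because the subtasks share nothing but the system matrices, each call to `trapezoidal` inside the loop could run on a separate node, with only the final summation requiring communication.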

Submitted 1 February, 2016; v1 submitted 25 May, 2015; originally announced May 2015.

Comments: Accepted by IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)

arXiv:1309.5333 [pdf, ps, other]

Power Grid Simulation using Matrix Exponential Method with Rational Krylov Subspaces

Authors: Hao Zhuang, Shih-Hung Weng, Chung-Kuan Cheng

Abstract: One widely adopted power grid simulation methodology is to factorize the matrix once and perform only backward/forward substitutions with a deliberately chosen step size throughout the simulation. Since the required simulation time is usually long in power grid design, the costly factorization is amortized. However, such a fixed step size cannot exploit larger step sizes for the low-frequency response of the power grid to speed up the simulation. In this work, we utilize the matrix exponential method with the rational Krylov subspace approximation to enable adaptive step sizes in power grid simulation. The kernel operation in our method demands only one factorization plus backward/forward substitutions. Moreover, the rational Krylov subspace approximation can relax the stiffness constraint of previous works. The low cost of adaptivity in our method lets it exploit the long low-frequency response of a power grid and significantly accelerate the simulation. The experimental results show that our method achieves up to 18X speedup over the trapezoidal method with a fixed step size.
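The kernel operation, approximating exp(hA)v in a rational Krylov subspace built from a single factorization, can be sketched with the shift-and-invert variant for a linear system x' = Ax, so that x(t + h) = exp(hA) x(t). This is a minimal NumPy sketch under stated assumptions, not the paper's implementation: the shift parameter `gamma`, the basis size `m`, and the helper names are hypothetical choices, an explicit inverse stands in for the one-time sparse LU factorization, and the small projected exponential uses a scaling-and-squaring Taylor series.

```python
import numpy as np


def expm_taylor(M, terms=30):
    """exp(M) for a small dense matrix via scaling and squaring of a Taylor series."""
    norm = np.linalg.norm(M, 1)
    # Scale so that ||M / 2**s||_1 <= 1/2, then square the result s times.
    s = int(np.ceil(np.log2(norm))) + 1 if norm > 1.0 else 0
    X = M / (2.0 ** s)
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms + 1):
        term = term @ X / k
        E = E + term
    for _ in range(s):
        E = E @ E
    return E


def si_krylov_expv(A, v, h, m, gamma):
    """Approximate exp(h*A) @ v in the shift-and-invert Krylov subspace
    span{v, Bv, ..., B^(m-1) v} with B = (I - gamma*A)^(-1)."""
    n = A.shape[0]
    beta = np.linalg.norm(v)
    if beta == 0.0:
        return np.zeros(n)
    # Stand-in for the single (sparse LU) factorization of I - gamma*A; every
    # product with B below would be one forward/backward substitution.
    B = np.linalg.inv(np.eye(n) - gamma * A)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v / beta
    k = m  # basis size actually used
    for j in range(m):
        w = B @ V[:, j]
        for _ in range(2):  # Arnoldi: Gram-Schmidt with one reorthogonalization pass
            for i in range(j + 1):
                c = V[:, i] @ w
                H[i, j] += c
                w = w - c * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:  # invariant subspace reached: stop early
            k = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    Hk = H[:k, :k]
    # V^T B V = Hk, so A is approximated on the subspace by (I - Hk^{-1}) / gamma.
    Ak = (np.eye(k) - np.linalg.inv(Hk)) / gamma
    e1 = np.zeros(k)
    e1[0] = 1.0
    return beta * (V[:, :k] @ (expm_taylor(h * Ak) @ e1))
```

Because `V` and `Hk` depend only on `v`, `A`, and `gamma`, trying a different step size `h` reuses them and recomputes only the small k×k exponential; this is one way to see why the approach supports adaptive step sizes without re-factorization.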

Submitted 14 October, 2013; v1 submitted 20 September, 2013; originally announced September 2013.

arXiv:1005.3450 [pdf, ps, other]

Efficient System-Enforced Deterministic Parallelism

Authors: Amittai Aviram, Shu-Chun Weng, Sen Hu, Bryan Ford

Abstract: Deterministic execution offers many benefits for debugging, fault tolerance, and security. Running parallel programs deterministically is usually difficult and costly, however, especially if we desire system-enforced determinism, which ensures precise repeatability of arbitrarily buggy or malicious software. Determinator is a novel operating system that enforces determinism on both multithreaded and multi-process computations. Determinator's kernel provides only single-threaded, "shared-nothing" address spaces interacting via deterministic synchronization. An untrusted user-level runtime uses distributed computing techniques to emulate familiar abstractions such as Unix processes, file systems, and shared-memory multithreading. The system runs parallel applications deterministically both on multicore PCs and across nodes in a cluster. Coarse-grained parallel benchmarks perform and scale comparably to, and sometimes better than, conventional systems, though determinism is costly for fine-grained parallel applications.

Submitted 19 May, 2010; originally announced May 2010.

Comments: 14 pages, 12 figures, 3 tables

Showing 1–27 of 27 results for author: Weng, S