-
The Role of Electric Grid Research in Addressing Climate Change
Authors:
Le Xie,
Subir Majumder,
Tong Huang,
Qian Zhang,
Ping Chang,
David J. Hill,
Mohammad Shahidehpour
Abstract:
Addressing the urgency of climate change necessitates a coordinated and inclusive effort from all relevant stakeholders. Critical to this effort is the modeling, analysis, control, and integration of technological innovations within the electric energy system, which plays a crucial role in scaling up climate change solutions. This perspective article presents a set of research challenges and opportunities in the area of electric power systems that would be crucial in accelerating Gigaton-level decarbonization. Furthermore, it highlights institutional challenges associated with developing market mechanisms and regulatory architectures, ensuring that incentives are aligned for stakeholders to effectively implement the technological solutions on a large scale.
Submitted 25 June, 2024;
originally announced June 2024.
-
Impact of Climate Simulation Resolutions on Future Energy System Reliability Assessment: A Texas Case Study
Authors:
Xiangtian Zheng,
Le Xie,
Kiyeob Lee,
Dan Fu,
Jiahan Wu,
Ping Chang
Abstract:
The reliability of energy systems is strongly influenced by the prevailing climate conditions. With the increasing prevalence of renewable energy sources, the interdependence between energy and climate systems has become even stronger. This study examines the impact of different spatial resolutions in climate modeling on energy grid reliability assessment, with the Texas interconnection between 2033 and 2043 serving as a pilot case study. Our preliminary findings indicate that while low-resolution climate simulations can provide a rough estimate of system reliability, high-resolution simulations can provide a more informative assessment of low-adequacy extreme events. Furthermore, both high- and low-resolution assessments suggest the need to prepare for severe blackout events in winter due to extremely low temperatures.
Submitted 5 May, 2023;
originally announced May 2023.
-
A CTC Alignment-based Non-autoregressive Transformer for End-to-end Automatic Speech Recognition
Authors:
Ruchao Fan,
Wei Chu,
Peng Chang,
Abeer Alwan
Abstract:
Recently, end-to-end models have been widely used in automatic speech recognition (ASR) systems. Two of the most representative approaches are connectionist temporal classification (CTC) and attention-based encoder-decoder (AED) models. Autoregressive transformers, variants of AED, adopt an autoregressive mechanism for token generation and thus are relatively slow during inference. In this paper, we present a comprehensive study of a CTC Alignment-based Single-Step Non-Autoregressive Transformer (CASS-NAT) for end-to-end ASR. In CASS-NAT, word embeddings in the autoregressive transformer (AT) are substituted with token-level acoustic embeddings (TAE) that are extracted from encoder outputs with the acoustical boundary information offered by the CTC alignment. TAE can be obtained in parallel, resulting in a parallel generation of output tokens. During training, Viterbi-alignment is used for TAE generation, and multiple training strategies are further explored to improve the word error rate (WER) performance. During inference, an error-based alignment sampling method is investigated in depth to reduce the alignment mismatch in the training and testing processes. Experimental results show that the CASS-NAT has a WER that is close to AT on various ASR tasks, while providing a ~24x inference speedup. With and without self-supervised learning, we achieve new state-of-the-art results for non-autoregressive models on several datasets. We also analyze the behavior of the CASS-NAT decoder to explain why it can perform similarly to AT. We find that TAEs have similar functionality to word embeddings for grammatical structures, which might indicate the possibility of learning some semantic information from TAEs without a language model.
Submitted 15 April, 2023;
originally announced April 2023.
-
Facial Image Reconstruction from Functional Magnetic Resonance Imaging via GAN Inversion with Improved Attribute Consistency
Authors:
Pei-Chun Chang,
Yan-Yu Tien,
Chia-Lin Chen,
Li-Fen Chen,
Yong-Sheng Chen,
Hui-Ling Chan
Abstract:
Neuroscience studies have revealed that the brain encodes visual content and embeds information in neural activity. Recently, deep learning techniques have facilitated attempts to address visual reconstructions by mapping brain activity to image stimuli using generative adversarial networks (GANs). However, none of these studies have considered the semantic meaning of latent code in image space. Omitting semantic information could potentially limit the performance. In this study, we propose a new framework to reconstruct facial images from functional Magnetic Resonance Imaging (fMRI) data. With this framework, the GAN inversion is first applied to train an image encoder to extract latent codes in image space, which are then bridged to fMRI data using linear transformation. Following the attributes identified from fMRI data using an attribute classifier, the direction in which to manipulate attributes is decided and the attribute manipulator adjusts the latent code to improve the consistency between the seen image and the reconstructed image. Our experimental results suggest that the proposed framework accomplishes two goals: (1) reconstructing clear facial images from fMRI data and (2) maintaining the consistency of semantic characteristics.
Submitted 3 July, 2022;
originally announced July 2022.
-
Transformer-Based Multi-Aspect Multi-Granularity Non-Native English Speaker Pronunciation Assessment
Authors:
Yuan Gong,
Ziyi Chen,
Iek-Heng Chu,
Peng Chang,
James Glass
Abstract:
Automatic pronunciation assessment is an important technology to help self-directed language learners. While pronunciation quality has multiple aspects including accuracy, fluency, completeness, and prosody, previous efforts typically only model one aspect (e.g., accuracy) at one granularity (e.g., at the phoneme-level). In this work, we explore modeling multi-aspect pronunciation assessment at multiple granularities. Specifically, we train a Goodness Of Pronunciation feature-based Transformer (GOPT) with multi-task learning. Experiments show that GOPT achieves the best results on speechocean762 with a public automatic speech recognition (ASR) acoustic model trained on Librispeech.
Submitted 6 May, 2022;
originally announced May 2022.
-
A 1.5GS/s 8b Pipelined-SAR ADC with Output Level Shifting Settling Technique in 14nm CMOS
Authors:
Yuanming Zhu,
Shengchang Cai,
Shiva Kiran,
Yang-Hang Fan,
Po-Hsuan Chang,
Sebastian Hoyos,
Samuel Palermo
Abstract:
A single-channel 1.5GS/s 8-bit pipelined-SAR ADC utilizes a novel output level shifting (OLS) settling technique to reduce the power and enable low-voltage operation of the dynamic residue amplifier. The ADC consists of a 4-bit first stage and a 5-bit second stage, with 1-bit redundancy to relax the offset, gain, and settling requirements of the first stage. Employing the OLS technique allows for an inter-stage gain of ~4 from the dynamic residue amplifier with a settling time that is only 28% of that of a conventional CML amplifier. The ADC's conversion speed is further improved with the use of parallel comparators in the two asynchronous stages. Fabricated in a 14nm FinFET technology, the ADC occupies a 0.0013 mm² core area and operates with a 0.8V supply. An ENOB of 6.6 bits is achieved at Nyquist while consuming 2.4mW, resulting in an FOM of 16.7fJ/conv.-step.
Submitted 20 August, 2022; v1 submitted 8 January, 2022;
originally announced January 2022.
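A note on the figure of merit quoted in the pipelined-SAR ADC abstract above: the unit fJ/conv.-step suggests the Walden figure of merit, FOM = P / (2^ENOB · f_s). The sketch below assumes that definition and assumes f_s is the full 1.5 GS/s sample rate; the function name `walden_fom` is illustrative and not taken from the paper.

```python
def walden_fom(power_w: float, enob_bits: float, sample_rate_hz: float) -> float:
    """Walden figure of merit in joules per conversion step.

    Assumed definition: FOM = P / (2**ENOB * f_s).
    """
    return power_w / (2.0 ** enob_bits * sample_rate_hz)


# Values as reported in the abstract: 2.4 mW, 6.6-bit ENOB, 1.5 GS/s.
fom_j = walden_fom(power_w=2.4e-3, enob_bits=6.6, sample_rate_hz=1.5e9)
fom_fj = fom_j * 1e15  # joules -> femtojoules
print(f"{fom_fj:.1f} fJ/conv.-step")
```

With the rounded values from the abstract this gives roughly 16.5 fJ/conv.-step, close to but not identical to the reported 16.7 fJ/conv.-step; rounding of the published ENOB and power figures is one possible explanation for the gap.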
-
MS-SincResNet: Joint learning of 1D and 2D kernels using multi-scale SincNet and ResNet for music genre classification
Authors:
Pei-Chun Chang,
Yong-Sheng Chen,
Chang-Hsing Lee
Abstract:
In this study, we propose a new end-to-end convolutional neural network, called MS-SincResNet, for music genre classification. MS-SincResNet appends 1D multi-scale SincNet (MS-SincNet) to 2D ResNet as the first convolutional layer in an attempt to jointly learn 1D kernels and 2D kernels during the training stage. First, an input music signal is divided into a number of fixed-duration (3 seconds in this study) music clips, and the raw waveform of each music clip is fed into the 1D MS-SincNet filter learning module to obtain three-channel 2D representations. The learned representations carry rich timbral, harmonic, and percussive characteristics compared with spectrograms, harmonic spectrograms, percussive spectrograms and Mel-spectrograms. ResNet is then used to extract discriminative embeddings from these 2D representations. The spatial pyramid pooling (SPP) module is further used to enhance the feature discriminability, in terms of both time and frequency aspects, to obtain the classification label of each music clip. Finally, the voting strategy is applied to summarize the classification results from all 3-second music clips. In our experimental results, we demonstrate that the proposed MS-SincResNet outperforms the baseline SincNet and many well-known hand-crafted features. Considering individual 2D representations, MS-SincResNet also yields results competitive with the state-of-the-art methods on the GTZAN dataset and the ISMIR2004 dataset. The code is available at https://github.com/PeiChunChang/MS-SincResNet
Submitted 18 September, 2021;
originally announced September 2021.
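The clip-and-vote step described in the MS-SincResNet abstract above can be sketched as follows. This is a minimal illustration and not the authors' code (which is at the repository linked in the abstract): the abstract does not say which voting rule is used, so plain majority voting is an assumption here, as is dropping a trailing clip shorter than 3 seconds. The names `split_into_clips` and `majority_vote` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def split_into_clips(samples: Sequence[float], sample_rate: int,
                     clip_seconds: float = 3.0) -> List[Sequence[float]]:
    """Cut a waveform into non-overlapping fixed-duration clips.

    Assumption: a trailing remainder shorter than one clip is discarded.
    """
    clip_len = int(sample_rate * clip_seconds)
    last_start = len(samples) - clip_len
    return [samples[i:i + clip_len] for i in range(0, last_start + 1, clip_len)]


def majority_vote(clip_labels: Sequence[str]) -> str:
    """Return the most frequent per-clip label (assumed voting rule)."""
    if not clip_labels:
        raise ValueError("at least one clip label is required")
    return Counter(clip_labels).most_common(1)[0][0]


# Toy usage: a 10-sample "signal" at 1 Hz yields three 3-second clips.
clips = split_into_clips(list(range(10)), sample_rate=1)
track_label = majority_vote(["rock", "jazz", "rock"])
```

In a real pipeline each clip would be passed through the network to obtain its label before voting; here the per-clip labels are supplied by hand.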
-
An Improved Single Step Non-autoregressive Transformer for Automatic Speech Recognition
Authors:
Ruchao Fan,
Wei Chu,
Peng Chang,
Jing Xiao,
Abeer Alwan
Abstract:
Non-autoregressive mechanisms can significantly decrease inference time for speech transformers, especially when the single step variant is applied. Previous work on CTC alignment-based single step non-autoregressive transformer (CASS-NAT) has shown a large real time factor (RTF) improvement over autoregressive transformers (AT). In this work, we propose several methods to improve the accuracy of the end-to-end CASS-NAT, followed by performance analyses. First, convolution augmented self-attention blocks are applied to both the encoder and decoder modules. Second, we propose to expand the trigger mask (acoustic boundary) for each token to increase the robustness of CTC alignments. In addition, iterated loss functions are used to enhance the gradient update of low-layer parameters. Without using an external language model, the WERs of the improved CASS-NAT, when using the three methods, are 3.1%/7.2% on Librispeech test clean/other sets and the CER is 5.4% on the Aishell1 test set, achieving a 7%~21% relative WER/CER improvement. For the analyses, we plot attention weight distributions in the decoders to visualize the relationships between token-level acoustic embeddings. When the acoustic embeddings are visualized, we find that they have a similar behavior to word embeddings, which explains why the improved CASS-NAT performs similarly to AT.
Submitted 21 July, 2021; v1 submitted 17 June, 2021;
originally announced June 2021.
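The 7%~21% relative improvement quoted in the abstract above can be checked with a short calculation. The abstract does not state its baseline; the sketch assumes the baseline is the original CASS-NAT result listed in this same page (3.8%/9.1% WER on Librispeech test clean/other and 5.8% CER on Aishell1). The function name `relative_improvement` is illustrative.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Relative error-rate reduction in percent: 100 * (baseline - improved) / baseline."""
    return 100.0 * (baseline - improved) / baseline


# (assumed baseline, improved) error rates in percent.
results = {
    "Librispeech test-clean WER": (3.8, 3.1),
    "Librispeech test-other WER": (9.1, 7.2),
    "Aishell1 test CER": (5.8, 5.4),
}

improvements = {name: relative_improvement(b, i) for name, (b, i) in results.items()}
for name, value in improvements.items():
    print(f"{name}: {value:.1f}% relative improvement")
```

Under that assumed baseline the three figures come out at about 18%, 21%, and 7%, which is consistent with the range stated in the abstract.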
-
CASS-NAT: CTC Alignment-based Single Step Non-autoregressive Transformer for Speech Recognition
Authors:
Ruchao Fan,
Wei Chu,
Peng Chang,
Jing Xiao
Abstract:
We propose a CTC alignment-based single step non-autoregressive transformer (CASS-NAT) for speech recognition. Specifically, the CTC alignment contains the information of (a) the number of tokens for decoder input, and (b) the time span of acoustics for each token. This information is used to extract an acoustic representation for each token in parallel, referred to as a token-level acoustic embedding, which substitutes for the word embedding in the autoregressive transformer (AT) to achieve parallel generation in the decoder. During inference, an error-based alignment sampling method is proposed and applied to the CTC output space, reducing the WER while retaining the parallelism. Experimental results show that the proposed method achieves WERs of 3.8%/9.1% on the Librispeech test clean/other datasets without an external LM, and a CER of 5.8% on the Aishell1 Mandarin corpus. Compared to the AT baseline, the CASS-NAT has a performance reduction on WER, but is 51.2x faster in terms of RTF. When decoding with an oracle CTC alignment, the lower bound of WER without LM reaches 2.3% on the test-clean set, indicating the potential of the proposed method.
Submitted 11 February, 2021; v1 submitted 27 October, 2020;
originally announced October 2020.
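The two pieces of information the CASS-NAT abstract above attributes to a CTC alignment, the number of tokens and the time span of each token, can be read off a frame-level alignment as sketched below. This is a simplified illustration and not the paper's implementation: it assumes blank index 0, applies the standard CTC collapsing rule (a token starts where a non-blank label differs from the previous frame), and counts only the non-blank frames as a token's span, whereas the paper's acoustic boundary may be defined differently. The name `alignment_to_spans` is hypothetical.

```python
from typing import List, Tuple

BLANK = 0  # assumed index of the CTC blank label


def alignment_to_spans(alignment: List[int],
                       blank: int = BLANK) -> List[Tuple[int, int, int]]:
    """Collapse a frame-level CTC alignment into (token, start, end) spans.

    `start` is inclusive and `end` is exclusive, both in frames.
    Repeated labels extend the current token; a blank between two
    identical labels separates them into two tokens.
    """
    spans: List[List[int]] = []
    prev = blank
    for t, label in enumerate(alignment):
        if label != blank:
            if label != prev:
                spans.append([label, t, t + 1])  # a new token begins here
            else:
                spans[-1][2] = t + 1             # same token continues
        prev = label
    return [(tok, start, end) for tok, start, end in spans]


# Toy alignment over 8 frames: blank, 3, 3, blank, 3, 5, 5, blank.
spans = alignment_to_spans([0, 3, 3, 0, 3, 5, 5, 0])
num_tokens = len(spans)  # length of the decoder input
```

For the toy alignment this yields three tokens with spans (3, 1, 3), (3, 4, 5), and (5, 5, 7); because each span depends only on the alignment, per-token acoustic features could then be gathered for all tokens at once rather than one after another.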
-
Demonstration of multivariate photonics: blind dimensionality reduction with analog integrated photonics
Authors:
Alexander N. Tait,
Philip Y. Ma,
Thomas Ferreira de Lima,
Eric C. Blow,
Matthew P. Chang,
Mitchell A. Nahmias,
Bhavin J. Shastri,
Paul R. Prucnal
Abstract:
Multi-antenna radio front-ends generate a multi-dimensional flood of information, most of which is partially redundant. Redundancy is eliminated by dimensionality reduction, but contemporary digital processing techniques face harsh fundamental tradeoffs when implementing this class of functions. These tradeoffs can be broken in the analog domain, in which the performance of optical technologies greatly exceeds that of electronic counterparts. Here, we present concepts, methods, and a first demonstration of multivariate photonics: a combination of integrated photonic hardware, analog dimensionality reduction, and blind algorithmic techniques. We experimentally demonstrate 2-channel, 1.0 GHz principal component analysis in a photonic weight bank using recently proposed algorithms for synthesizing the multivariate properties of signals to which the receiver is blind. Novel methods are introduced for controlling blindness conditions in a laboratory context. This work provides a foundation for further research in multivariate photonic information processing, which is poised to play a role in future generations of wireless technology.
Submitted 10 February, 2019;
originally announced March 2019.