Search | arXiv e-print repository

arXiv:2404.14424 [pdf]

Ultrafast Vibrational Control of Hybrid Perovskite Devices Reveals the Influence of the Organic Cation on Electronic Dynamics

Authors: Nathaniel P. Gallop, Dmitry R. Maslennikov, Katelyn P. Goetz, Zhenbang Dai, Aaron M. Schankler, Woongmo Sung, Satoshi Nihonyanagi, Tahei Tahara, Maryna Bodnarchuk, Maksym Kovalenko, Yana Vaynzof, Andrew M. Rappe, Artem A. Bakulin

Abstract: Vibrational control (VC) of photochemistry through the optical stimulation of structural dynamics is a nascent concept only recently demonstrated for model molecules in solution. Extending VC to state-of-the-art materials may lead to new applications and improved performance for optoelectronic devices. Metal halide perovskites are promising targets for VC due to their mechanical softness and the rich array of vibrational motions of both their inorganic and organic sublattices. Here, we demonstrate the ultrafast VC of FAPbBr3 perovskite solar cells via intramolecular vibrations of the formamidinium cation using spectroscopic techniques based on vibrationally promoted electronic resonance. The observed short (~300 fs) time window of VC highlights the fast dynamics of coupling between the cation and inorganic sublattice. First-principles modelling reveals that this coupling is mediated by hydrogen bonds that modulate both lead halide lattice and electronic states. Cation dynamics modulating this coupling may suppress non-radiative recombination in perovskites, leading to photovoltaics with reduced voltage losses.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2401.10315 [pdf, other]

Joint Processing and Transmission Energy Optimization for ISAC in Cell-Free Massive MIMO with URLLC

Authors: Zinat Behdad, Özlem Tuğfe Demir, Ki Won Sung, Cicek Cavdar

Abstract: In this paper, we explore the concept of integrated sensing and communication (ISAC) within a downlink cell-free massive MIMO (multiple-input multiple-output) system featuring multi-static sensing and users requiring ultra-reliable low-latency communications (URLLC). Our focus involves the formulation of two non-convex algorithms that jointly solve power and blocklength allocation for end-to-end (E2E) minimization. The objectives are to jointly minimize sensing/communication processing and transmission energy consumption, while simultaneously meeting the requirements for sensing and URLLC. To address the inherent non-convexity of these optimization problems, we utilize techniques such as the Feasible Point Pursuit - Successive Convex Approximation (FPP-SCA), Concave-Convex Programming (CCP), and fractional programming. We conduct a comparative analysis of the performance of these algorithms in ISAC scenarios and against a URLLC-only scenario where sensing is not integrated. Our numerical results highlight the superior performance of the E2E energy minimization algorithm, especially in scenarios without sensing capability. Additionally, our study underscores the increasing prominence of energy consumption associated with sensing processing tasks as the number of sensing receive access points rises. Furthermore, the results emphasize that a higher sensing signal-to-interference-plus-noise ratio threshold is associated with an escalation in E2E energy consumption, thereby narrowing the performance gap between the two proposed algorithms.

Submitted 18 January, 2024; originally announced January 2024.

Comments: 13 pages, 8 figures. arXiv admin note: text overlap with arXiv:2401.10133

arXiv:2401.10133 [pdf, ps, other]

Interplay between Sensing and Communication in Cell-Free Massive MIMO with URLLC Users

Authors: Zinat Behdad, Özlem Tuğfe Demir, Ki Won Sung, Cicek Cavdar

Abstract: This paper studies integrated sensing and communication (ISAC) in the downlink of a cell-free massive multiple-input multiple-output (MIMO) system with multi-static sensing and ultra-reliable low-latency communication (URLLC) users. We propose a successive convex approximation-based power allocation algorithm that maximizes energy efficiency while satisfying the sensing and URLLC requirements. In addition, we provide a new definition for network availability, which accounts for both sensing and URLLC requirements. The impact of blocklength, sensing requirement, and required reliability as a function of decoding error probability on network availability and energy efficiency is investigated. The proposed power allocation algorithm is compared to a communication-centric approach where only the URLLC requirement is considered. It is shown that the URLLC-only approach is incapable of meeting sensing requirements, while the proposed ISAC algorithm fulfills both sensing and URLLC requirements, albeit with an associated increase in energy consumption. This increment can be reduced by up to 75% by utilizing additional symbols for sensing. It is also demonstrated that larger blocklengths enhance network availability and offer greater robustness against stringent reliability requirements.

Submitted 18 January, 2024; originally announced January 2024.

Comments: 6 pages, 3 figures

arXiv:2311.05161 [pdf, other]

Enhancing Computation Efficiency in Large Language Models through Weight and Activation Quantization

Authors: Jangwhan Lee, Minsoo Kim, Seungcheol Baek, Seok Joong Hwang, Wonyong Sung, Jungwook Choi

Abstract: Large Language Models (LLMs) are proficient in natural language processing tasks, but their deployment is often restricted by extensive parameter sizes and computational demands. This paper focuses on post-training quantization (PTQ) in LLMs, specifically 4-bit weight and 8-bit activation (W4A8) quantization, to enhance computational efficiency -- a topic less explored compared to weight-only quantization. We present two innovative techniques: activation-quantization-aware scaling (AQAS) and sequence-length-aware calibration (SLAC) to enhance PTQ by considering the combined effects on weights and activations and aligning calibration sequence lengths to target tasks. Moreover, we introduce dINT, a hybrid data format combining integer and denormal representations, to address the underflow issue in W4A8 quantization, where small values are rounded to zero. Through rigorous evaluations of LLMs, including OPT and LLaMA, we demonstrate that our techniques significantly boost task accuracies to levels comparable with full-precision models. By developing arithmetic units compatible with dINT, we further confirm that our methods yield a 2$\times$ hardware efficiency improvement compared to an 8-bit integer MAC unit.

Submitted 9 November, 2023; originally announced November 2023.

Comments: EMNLP 2023 Main Conference
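
The underflow issue mentioned in the abstract above can be illustrated with a minimal sketch. It assumes plain symmetric per-tensor uniform quantization, which is a generic textbook scheme and not necessarily the paper's setup; it does not implement dINT, AQAS, or SLAC. The function name `quantize_symmetric` and the sample values are hypothetical.

```python
import numpy as np

def quantize_symmetric(x, num_bits):
    """Uniform symmetric quantization with one per-tensor scale (assumed scheme)."""
    qmax = 2 ** (num_bits - 1) - 1        # 7 for 4-bit, 127 for 8-bit
    scale = np.max(np.abs(x)) / qmax      # step size is set by the largest magnitude
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale                      # dequantized values

# Hypothetical tensor with one large value and several small ones.
x = np.array([1.0, 0.5, 0.05, 0.01, -0.02])
x4 = quantize_symmetric(x, 4)
x8 = quantize_symmetric(x, 8)

# Count nonzero inputs that were rounded to zero (underflow).
underflow_4bit = int(np.sum((x4 == 0) & (x != 0)))
underflow_8bit = int(np.sum((x8 == 0) & (x != 0)))
print(underflow_4bit, underflow_8bit)
```

With these sample values, the three smallest entries fall below half a 4-bit step and collapse to zero, while the 8-bit grid keeps all of them nonzero.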

arXiv:2308.06744 [pdf, other]

Token-Scaled Logit Distillation for Ternary Weight Generative Language Models

Authors: Minsoo Kim, Sihwa Lee, Janghwan Lee, Sukjin Hong, Du-Seong Chang, Wonyong Sung, Jungwook Choi

Abstract: Generative Language Models (GLMs) have shown impressive performance in tasks such as text generation, understanding, and reasoning. However, the large model size poses challenges for practical deployment. To solve this problem, Quantization-Aware Training (QAT) has become increasingly popular. However, current QAT methods for generative models have resulted in a noticeable loss of accuracy. To counteract this issue, we propose a novel knowledge distillation method specifically designed for GLMs. Our method, called token-scaled logit distillation, prevents overfitting and provides superior learning from the teacher model and ground truth. This research marks the first evaluation of ternary weight quantization-aware training of large-scale GLMs with less than 1.0 degradation in perplexity and achieves enhanced accuracy in tasks like common-sense QA and arithmetic reasoning as well as natural language understanding. Our code is available at https://github.com/aiha-lab/TSLD.

Submitted 2 December, 2023; v1 submitted 13 August, 2023; originally announced August 2023.

Comments: NeurIPS 2023 Camera Ready

arXiv:2305.12523 [pdf, ps, other]

Multi-Static Target Detection and Power Allocation for Integrated Sensing and Communication in Cell-Free Massive MIMO

Authors: Zinat Behdad, Özlem Tuğfe Demir, Ki Won Sung, Emil Björnson, Cicek Cavdar

Abstract: This paper studies an integrated sensing and communication (ISAC) system within a centralized cell-free massive MIMO (multiple-input multiple-output) network for target detection. ISAC transmit access points serve the user equipments in the downlink and optionally steer a beam toward the target in a multi-static sensing framework. A maximum a posteriori ratio test detector is developed for target detection in the presence of clutter, so-called target-free signals. Additionally, sensing spectral efficiency (SE) is introduced as a key metric, capturing the impact of resource utilization in ISAC. A power allocation algorithm is proposed to maximize the sensing signal-to-interference-plus-noise ratio while ensuring minimum communication requirements. Two ISAC configurations are studied: utilizing existing communication beams for sensing and using additional sensing beams. The proposed algorithm's efficiency is investigated in realistic and idealistic scenarios, corresponding to the presence and absence of the target-free channels, respectively. Despite performance degradation in the presence of target-free channels, the proposed algorithm outperforms the interference-unaware benchmark, leveraging clutter statistics. Comparisons with a fully communication-centric algorithm reveal superior performance in both cluttered and clutter-free environments. The incorporation of an extra sensing beam enhances detection performance for lower radar cross-section variances. Moreover, the results demonstrate the effectiveness of the integrated operation of sensing and communication compared to an orthogonal resource-sharing approach.

Submitted 27 March, 2024; v1 submitted 21 May, 2023; originally announced May 2023.

Comments: 16 pages, 7 figures

arXiv:2302.12709 [pdf, other]

Sleep Model -- A Sequence Model for Predicting the Next Sleep Stage

Authors: Iksoo Choi, Wonyong Sung

Abstract: As sleep disorders are becoming more prevalent, there is an urgent need to classify sleep stages in a less disturbing way. In particular, sleep-stage classification using simple sensors, such as single-channel electroencephalography (EEG), electrooculography (EOG), electromyography (EMG), or electrocardiography (ECG), has gained substantial interest. In this study, we proposed a sleep model that predicts the next sleep stage and used it to improve sleep classification accuracy. The sleep models were built using sleep-sequence data and employed either statistical $n$-gram or deep neural network-based models. We developed beam-search decoding to combine the information from the sensor and the sleep models. Furthermore, we evaluated the performance of the $n$-gram and long short-term memory (LSTM) recurrent neural network (RNN)-based sleep models and demonstrated the improvement of sleep-stage classification using an EOG sensor. The developed sleep models significantly improved the accuracy of sleep-stage classification, particularly in the absence of an EEG sensor.

Submitted 17 February, 2023; originally announced February 2023.

arXiv:2302.11812 [pdf, other]

Teacher Intervention: Improving Convergence of Quantization Aware Training for Ultra-Low Precision Transformers

Authors: Minsoo Kim, Kyuhong Shim, Seongmin Park, Wonyong Sung, Jungwook Choi

Abstract: Pre-trained Transformer models such as BERT have shown great success in a wide range of applications, but at the cost of substantial increases in model complexity. Quantization-aware training (QAT) is a promising method to lower the implementation cost and energy consumption. However, aggressive quantization below 2-bit causes considerable accuracy degradation due to unstable convergence, especially when the downstream dataset is not abundant. This work proposes a proactive knowledge distillation method called Teacher Intervention (TI) for fast converging QAT of ultra-low precision pre-trained Transformers. TI intervenes in layer-wise signal propagation with the intact signal from the teacher to remove the interference of propagated quantization errors, smoothing the loss surface of QAT and expediting the convergence. Furthermore, we propose a gradual intervention mechanism to stabilize the recovery of subsections of Transformer layers from quantization. The proposed schemes enable fast convergence of QAT and improve the model accuracy regardless of the diverse characteristics of downstream fine-tuning tasks. We demonstrate that TI consistently achieves superior accuracy with significantly fewer fine-tuning iterations on well-known Transformers of natural language processing as well as computer vision compared to the state-of-the-art QAT methods.

Submitted 23 February, 2023; originally announced February 2023.

Comments: Accepted to EACL 2023 (main conference)

arXiv:2301.12444 [pdf, other]

Exploring Attention Map Reuse for Efficient Transformer Neural Networks

Authors: Kyuhong Shim, Jungwook Choi, Wonyong Sung

Abstract: Transformer-based deep neural networks have achieved great success in various sequence applications due to their powerful ability to model long-range dependency. The key module of Transformer is self-attention (SA), which extracts features from the entire sequence regardless of the distance between positions. Although SA helps Transformer perform particularly well on long-range tasks, SA requires quadratic computation and memory complexity with the input sequence length. Recently, attention map reuse, which groups multiple SA layers to share one attention map, has been proposed and achieved significant speedup for speech recognition models. In this paper, we provide a comprehensive study on attention map reuse focusing on its ability to accelerate inference. We compare the method with other SA compression techniques and conduct a breakdown analysis of its advantages for a long sequence. We demonstrate the effectiveness of attention map reuse by measuring the latency on both CPU and GPU platforms.

Submitted 29 January, 2023; originally announced January 2023.

arXiv:2212.14149 [pdf, other]

Macro-block dropout for improved regularization in training end-to-end speech recognition models

Authors: Chanwoo Kim, Sathish Indurti, Jinhwan Park, Wonyong Sung

Abstract: This paper proposes a new regularization algorithm referred to as macro-block dropout. The overfitting issue has been a difficult problem in training large neural network models. The dropout technique has proven to be simple yet very effective for regularization by preventing complex co-adaptations during training. In our work, we define a macro-block that contains a large number of units from the input to a Recurrent Neural Network (RNN). Rather than applying dropout to each unit, we apply random dropout to each macro-block. This algorithm has the effect of applying different dropout rates for each layer even if we keep a constant average dropout rate, which has better regularization effects. In our experiments using Recurrent Neural Network-Transducer (RNN-T), this algorithm shows relative Word Error Rate (WER) improvements of 4.30% and 6.13% over the conventional dropout on LibriSpeech test-clean and test-other. With an Attention-based Encoder-Decoder (AED) model, this algorithm shows relative WER improvements of 4.36% and 5.85% over the conventional dropout on the same test sets.

Submitted 28 December, 2022; originally announced December 2022.

Comments: Accepted for presentation at The 2022 IEEE Spoken Language Technology Workshop (SLT 2022)
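
The idea of dropping whole macro-blocks rather than individual units, as described in the abstract above, can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' implementation: the contiguous equal-sized partition of the feature axis, the per-example mask, and the inverted-dropout rescaling by `1 / (1 - drop_rate)` are all assumptions, and the name `macro_block_dropout` is hypothetical.

```python
import numpy as np

def macro_block_dropout(x, num_blocks, drop_rate, rng):
    """Drop whole contiguous blocks of the feature axis instead of single units."""
    batch, features = x.shape
    assert features % num_blocks == 0, "features must split evenly into blocks"
    block_size = features // num_blocks
    # One keep/drop decision per (example, block), not per unit.
    keep = rng.random((batch, num_blocks)) >= drop_rate
    # Expand each block decision to cover all units in that block.
    mask = np.repeat(keep, block_size, axis=1)
    # Inverted-dropout rescaling (assumption) keeps the expected value unchanged.
    return x * mask / (1.0 - drop_rate)

rng = np.random.default_rng(0)
x = np.ones((2, 8))
y = macro_block_dropout(x, num_blocks=4, drop_rate=0.5, rng=rng)
print(y)
```

Every unit inside a block shares the same fate, so each row of `y` consists of runs of two identical values, either 0.0 or 2.0.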

arXiv:2210.00367 [pdf, other]

A Comparison of Transformer, Convolutional, and Recurrent Neural Networks on Phoneme Recognition

Authors: Kyuhong Shim, Wonyong Sung

Abstract: Phoneme recognition is a very important part of speech recognition that requires the ability to extract phonetic features from multiple frames. In this paper, we compare and analyze CNN, RNN, Transformer, and Conformer models using phoneme recognition. For CNN, the ContextNet model is used for the experiments. First, we compare the accuracy of various architectures under different constraints, such as the receptive field length, parameter size, and layer depth. Second, we interpret the performance difference of these models, especially when the observable sequence length varies. Our analyses show that Transformer and Conformer models benefit from the long-range accessibility of self-attention through input frames.

Submitted 1 October, 2022; originally announced October 2022.

arXiv:2209.01864 [pdf, ps, other]

Power Allocation for Joint Communication and Sensing in Cell-Free Massive MIMO

Authors: Zinat Behdad, Özlem Tuğfe Demir, Ki Won Sung, Emil Björnson, Cicek Cavdar

Abstract: This paper studies a joint communication and sensing (JCAS) system with downlink communication and multi-static sensing for single-target detection in a cloud radio access network architecture. A centralized operation of cell-free massive MIMO is considered for communication and sensing purposes. The JCAS transmit access points (APs) jointly serve the user equipments (UEs) and optionally steer a beam towards the target. A maximum a posteriori ratio test detector is derived to detect the target using signals received at distributed APs. We propose a power allocation algorithm to maximize the sensing signal-to-noise ratio under the condition that a minimal signal-to-interference-plus-noise ratio value for each UE is guaranteed. Numerical results show that, compared to the fully communication-centric power allocation, the detection probability under a certain false alarm probability can be increased significantly by the proposed algorithm for both JCAS setups: i) using additional sensing symbols or ii) using only existing communication symbols.

Submitted 5 September, 2022; originally announced September 2022.

Comments: 6 pages, 5 figures, to be presented at the IEEE GLOBECOM 2022 conference

arXiv:2206.06978 [pdf, ps, other]

Low-Latency MAC Design for Pairwise Random Networks

Authors: Irshad A. Meer, Woong-Hee Lee, Mustafa Ozger, Cicek Cavdar, Ki Won Sung

Abstract: The feasibility of using unlicensed spectrum for ultra reliable low latency communications (URLLC) is still an open question for beyond-5G wireless networks. Low-latency access to the channel and efficient spectrum sharing among multiple users are the main requirements for exploiting unlicensed spectrum for URLLC. Listen-before-talk and back-off procedures, implemented to avoid collisions in channel access, hinder low-latency communication. In this paper, we propose a novel low-latency medium access control (MAC) scheme based on collision resolution for a pairwise random wireless network. We use geometric sequence decomposition for collision resolution among the competing users. This enables the system to tackle collisions and thus removes the need for carrier sensing and back-off procedures. This saves time in obtaining access to the channel and improves the efficiency of the system. We implement our approach in a synchronized time-slotted system and show that it yields significant improvement over existing MAC schemes.

Submitted 22 May, 2022; originally announced June 2022.

Comments: Accepted in IEEE VTC Spring 2022

arXiv:2203.10252 [pdf, ps, other]

Similarity and Content-based Phonetic Self Attention for Speech Recognition

Authors: Kyuhong Shim, Wonyong Sung

Abstract: Transformer-based speech recognition models have achieved great success due to the self-attention (SA) mechanism that utilizes every frame in the feature extraction process. Especially, SA heads in lower layers capture various phonetic characteristics by the query-key dot product, which is designed to compute the pairwise relationship between frames. In this paper, we propose a variant of SA to extract more representative phonetic features. The proposed phonetic self-attention (phSA) is composed of two different types of phonetic attention; one is similarity-based and the other is content-based. In short, similarity-based attention captures the correlation between frames while content-based attention only considers each frame without being affected by other frames. We identify which parts of the original dot product equation are related to two different attention patterns and improve each part with simple modifications. Our experiments on phoneme classification and speech recognition show that replacing SA with phSA for lower layers improves the recognition performance without increasing the latency and the parameter size.

Submitted 11 July, 2022; v1 submitted 19 March, 2022; originally announced March 2022.

Comments: Accepted for INTERSPEECH 2022

arXiv:2203.03583 [pdf, ps, other]

Korean Tokenization for Beam Search Rescoring in Speech Recognition

Authors: Kyuhong Shim, Hyewon Bae, Wonyong Sung

Abstract: The performance of automatic speech recognition (ASR) models can be greatly improved by proper beam-search decoding with an external language model (LM). There has been an increasing interest in Korean speech recognition, but not many studies have focused on the decoding procedure. In this paper, we propose a Korean tokenization method for the neural network-based LM used for Korean ASR. Although the common approach is to use the same tokenization method for the external LM as for the ASR model, we show that it may not be the best choice for Korean. We propose a new tokenization method that inserts a special token, SkipTC, when there is no trailing consonant in a Korean syllable. By utilizing the proposed SkipTC token, the input sequence for the LM becomes very regularly patterned so that the LM can better learn the linguistic characteristics. Our experiments show that the proposed approach achieves a lower word error rate compared to the same LM model without SkipTC. In addition, we are the first to report the ASR performance for the recently introduced large-scale 7,600h Korean speech dataset.

Submitted 28 March, 2022; v1 submitted 22 February, 2022; originally announced March 2022.

Comments: Submitted to INTERSPEECH 2022

arXiv:2110.03252 [pdf, other]

Layer-wise Pruning of Transformer Attention Heads for Efficient Language Modeling

Authors: Kyuhong Shim, Iksoo Choi, Wonyong Sung, Jungwook Choi

Abstract: While Transformer-based models have shown impressive language modeling performance, the large computation cost is often prohibitive for practical use. Attention head pruning, which removes unnecessary attention heads in the multihead attention, is a promising technique to solve this problem. However, it does not evenly reduce the overall load because the heavy feedforward module is not affected by head pruning. In this paper, we apply layer-wise attention head pruning on All-attention Transformer so that the entire computation and the number of parameters can be reduced proportionally to the number of pruned heads. While the architecture has the potential to fully utilize head pruning, we propose three training methods that are especially helpful to minimize performance degradation and stabilize the pruning process. Our pruned model shows consistently lower perplexity within a comparable parameter size than Transformer-XL on WikiText-103 language modeling benchmark.

Submitted 7 October, 2021; originally announced October 2021.

arXiv:2108.13039

An Interpretable Web-based Glioblastoma Multiforme Prognosis Prediction Tool using Random Forest Model

Authors: Yeseul Kim, Kyung Hwan Kim, Junyoung Park, Hong In Yoon, Wonmo Sung

Abstract: We propose predictive models that estimate GBM patients' health status one year after treatment (classification task) and predict the long-term prognosis of GBM patients at an individual level (survival task). We used the clinical profiles of 467 GBM patients, each consisting of 13 features and two follow-up dates. As baselines for the random forest classifier (RFC) and the random survival forest model (RSF), we introduced a generalized linear model (GLM) and a support vector machine (SVM), and a Cox proportional hazards model (COX) and an accelerated failure time model (AFT), respectively. After preprocessing and fixing a stratified 5-fold data set in advance, we generated the best-performing model for each model type using a recursive feature elimination process. In total, 10, 4, and 13 features were extracted for the best-performing one-year survival/progression status RFC models and the RSF model via the recursive feature elimination process. In the classification task, the AUROC of the best-performing RFC was 0.6990 (for one-year survival status classification) and 0.7076 (for one-year progression classification), while that of the second-best baseline models (GLM in both cases) was 0.6691 and 0.6997, respectively. In the survival task, the highest C-index of 0.7157 and the lowest IBS of 0.1038 came from the best-performing RSF model, while those of the second-best baseline models were 0.6556 and 0.1139, respectively. A simplified linear correlation (extracted from LIME and virtual patient group analysis) between each feature and the prognosis of GBM patients was consistent with proven medical knowledge. Our machine learning models suggest that the top three prognostic factors for GBM patient survival were MGMT gene promoter, the extent of resection, and age. To the best of our knowledge, this is the first study to introduce interpretable GBM prognosis prediction models that are consistent with medical knowledge.

Submitted 8 September, 2021; v1 submitted 30 August, 2021; originally announced August 2021.

Comments: This version of preprint has some issues regarding methods and results

arXiv:2104.04215 [pdf, other]

doi 10.1109/LWC.2021.3123314

Sparse Channel Estimation in Wideband Systems with Geometric Sequence Decomposition

Authors: Woong-Hee Lee, Ki Won Sung

Abstract: The sparsity of multipaths in the wideband channel has motivated the use of compressed sensing for channel estimation. In this letter, we propose a different approach to sparse channel estimation. We exploit the fact that $L$ taps of channel impulse response in time domain constitute a non-orthogonal superposition of $L$ geometric sequences in frequency domain. This converts the channel estimation problem into the extraction of the parameters of geometric sequences. Numerical results show that the proposed scheme is superior to existing algorithms in high signal-to-noise ratio (SNR) and large bandwidth conditions.

Submitted 24 October, 2021; v1 submitted 9 April, 2021; originally announced April 2021.

arXiv:2101.07937 [pdf, other]

doi 10.1109/LCOMM.2021.3091800

Noise Learning Based Denoising Autoencoder

Authors: Woong-Hee Lee, Mustafa Ozger, Ursula Challita, Ki Won Sung

Abstract: This letter introduces a new denoiser that modifies the structure of denoising autoencoder (DAE), namely noise learning based DAE (nlDAE). The proposed nlDAE learns the noise of the input data. Then, the denoising is performed by subtracting the regenerated noise from the noisy input. Hence, nlDAE is more effective than DAE when the noise is simpler to regenerate than the original data. To validate the performance of nlDAE, we provide three case studies: signal restoration, symbol demodulation, and precise localization. Numerical results suggest that nlDAE requires smaller latent space dimension and smaller training dataset compared to DAE.

Submitted 21 June, 2021; v1 submitted 19 January, 2021; originally announced January 2021.

arXiv:2009.14502 [pdf, other]

Stochastic Precision Ensemble: Self-Knowledge Distillation for Quantized Deep Neural Networks

Authors: Yoonho Boo, Sungho Shin, Jungwook Choi, Wonyong Sung

Abstract: The quantization of deep neural networks (QDNNs) has been actively studied for deployment in edge devices. Recent studies employ the knowledge distillation (KD) method to improve the performance of quantized networks. In this study, we propose stochastic precision ensemble training for QDNNs (SPEQ). SPEQ is a knowledge distillation training scheme; however, the teacher is formed by sharing the model parameters of the student network. We obtain the soft labels of the teacher by changing the bit precision of the activation stochastically at each layer of the forward-pass computation. The student model is trained with these soft labels to reduce the activation quantization noise. The cosine similarity loss is employed, instead of the KL-divergence, for KD training. As the teacher model changes continuously by random bit-precision assignment, it exploits the effect of stochastic ensemble KD. SPEQ outperforms the existing quantization training methods in various tasks, such as image classification, question-answering, and transfer learning without the need for cumbersome teacher networks.

Submitted 30 September, 2020; originally announced September 2020.

arXiv:2009.02479 [pdf, other]

S-SGD: Symmetrical Stochastic Gradient Descent with Weight Noise Injection for Reaching Flat Minima

Authors: Wonyong Sung, Iksoo Choi, Jinhwan Park, Seokhyun Choi, Sungho Shin

Abstract: The stochastic gradient descent (SGD) method is most widely used for deep neural network (DNN) training. However, the method does not always converge to a flat minimum of the loss surface that can demonstrate high generalization capability. Weight noise injection has been extensively studied for finding flat minima using the SGD method. We devise a new weight-noise injection-based SGD method that adds symmetrical noises to the DNN weights. The training with symmetrical noise evaluates the loss surface at two adjacent points, by which convergence to sharp minima can be avoided. Fixed-magnitude symmetric noises are added to minimize training instability. The proposed method is compared with the conventional SGD method and previous weight-noise injection algorithms using convolutional neural networks for image classification. Particularly, performance improvements in large batch training are demonstrated. This method shows superior performance compared with conventional SGD and weight-noise injection methods regardless of the batch-size and learning rate scheduling algorithms.

Submitted 5 September, 2020; originally announced September 2020.

arXiv:2007.09102 [pdf, other]

Breaking Moravec's Paradox: Visual-Based Distribution in Smart Fashion Retail

Authors: Shin Woong Sung, Hyunsuk Baek, Hyeonjun Sim, Eun Hie Kim, Hyunwoo Hwangbo, Young Jae Jang

Abstract: In this paper, we report an industry-academia collaborative study on the distribution method of fashion products using an artificial intelligence (AI) technique combined with an optimization method. To meet the current fashion trend of short product lifetimes and an increasing variety of styles, the company produces limited volumes of a large variety of styles. However, due to the limited volume of each style, some styles may not be distributed to some off-line stores. As a result, this high-variety, low-volume strategy presents another challenge to distribution managers. We collaborated with KOLON F/C, one of the largest fashion business units in South Korea, to develop models and an algorithm to optimally distribute the products to the stores based on the visual images of the products. The team developed a deep learning model that effectively represents the styles of clothes based on their visual image. Moreover, the team created an optimization model that effectively determines the product mix for each store based on the image representation of clothes. In the past, computers were only considered to be useful for conducting logical calculations, and visual perception and cognition were considered to be difficult computational tasks. The proposed approach is significant in that it uses both AI (perception and cognition) and mathematical optimization (logical calculation) to address a practical supply chain problem, which is why the study was called "Breaking Moravec's Paradox."

Submitted 9 July, 2020; originally announced July 2020.

Comments: 10 pages, 19 figures, The fifth international workshop on fashion and KDD, KDD 2020

arXiv:2006.01561 [pdf, other]

Studying The Effect of MIL Pooling Filters on MIL Tasks

Authors: Mustafa Umit Oner, Jared Marc Song Kye-Jet, Hwee Kuan Lee, Wing-Kin Sung

Abstract: There are different multiple instance learning (MIL) pooling filters used in MIL models. In this paper, we study the effect of different MIL pooling filters on the performance of MIL models in real world MIL tasks. We designed a neural network based MIL framework with 5 different MIL pooling filters: `max', `mean', `attention', `distribution' and `distribution with attention'. We also formulated 5 different MIL tasks on a real world lymph node metastases dataset. We found that the performance of our framework in a task is different for different filters. We also observed that the performances of the five pooling filters are also different from task to task. Hence, the selection of a correct MIL pooling filter for each MIL task is crucial for better performance. Furthermore, we noticed that models with `distribution' and `distribution with attention' pooling filters consistently perform well in almost all of the tasks. We attribute this phenomena to the amount of information captured by `distribution' based pooling filters. While point estimate based pooling filters, like `max' and `mean', produce point estimates of distributions, `distribution' based pooling filters capture the full information in distributions. Lastly, we compared the performance of our neural network model with `distribution' pooling filter with the performance of the best MIL methods in the literature on classical MIL datasets and our model outperformed the others.

Submitted 2 June, 2020; originally announced June 2020.

Comments: 16 pages

arXiv:2006.00530 [pdf, other]

Quantized Neural Networks: Characterization and Holistic Optimization

Authors: Yoonho Boo, Sungho Shin, Wonyong Sung

Abstract: Quantized deep neural networks (QDNNs) are necessary for low-power, high throughput, and embedded applications. Previous studies mostly focused on developing optimization methods for the quantization of given models. However, quantization sensitivity depends on the model architecture. Therefore, the model selection needs to be a part of the QDNN design process. Also, the characteristics of weight and activation quantization are quite different. This study proposes a holistic approach for the optimization of QDNNs, which contains QDNN training methods as well as quantization-friendly architecture design. Synthesized data is used to visualize the effects of weight and activation quantization. The results indicate that deeper models are more prone to activation quantization, while wider models improve the resiliency to both weight and activation quantization. This study can provide insight into better optimization of QDNNs.

Submitted 31 May, 2020; originally announced June 2020.

arXiv:2002.00343 [pdf, other]

SQWA: Stochastic Quantized Weight Averaging for Improving the Generalization Capability of Low-Precision Deep Neural Networks

Authors: Sungho Shin, Yoonho Boo, Wonyong Sung

Abstract: Designing a deep neural network (DNN) with good generalization capability is a complex process especially when the weights are severely quantized. Model averaging is a promising approach for achieving the good generalization capability of DNNs, especially when the loss surface for training contains many sharp minima. We present a new quantized neural network optimization approach, stochastic quantized weight averaging (SQWA), to design low-precision DNNs with good generalization capability using model averaging. The proposed approach includes (1) floating-point model training, (2) direct quantization of weights, (3) capturing multiple low-precision models during retraining with cyclical learning rates, (4) averaging the captured models, and (5) re-quantizing the averaged model and fine-tuning it with low-learning rates. Additionally, we present a loss-visualization technique on the quantized weight domain to clearly elucidate the behavior of the proposed method. Visualization results indicate that a quantized DNN (QDNN) optimized with the proposed approach is located near the center of the flat minimum in the loss surface. With SQWA training, we achieved state-of-the-art results for 2-bit QDNNs on CIFAR-100 and ImageNet datasets. Although we only employed a uniform quantization scheme for the sake of implementation in VLSI or low-precision neural processing units, the performance achieved exceeded those of previous studies employing non-uniform quantization.

Submitted 2 February, 2020; originally announced February 2020.

arXiv:1910.14412 [pdf, other]

doi 10.1109/TCOMM.2020.3028876

Geometric Sequence Decomposition with $k$-simplexes Transform

Authors: Woong-Hee Lee, Jong-Ho Lee, Ki Won Sung

Abstract: This paper presents a computationally efficient technique for decomposing non-orthogonally superposed $k$ geometric sequences. The method, which is named as geometric sequence decomposition with $k$-simplexes transform (GSD-ST), is based on the concept of transforming an observed sequence to multiple $k$-simplexes in a virtual $k$-dimensional space and correlating the volumes of the transformed simplexes. Hence, GSD-ST turns the problem of decomposing $k$ geometric sequences into one of solving a $k$-th order polynomial equation. Our technique has significance for wireless communications because sampled points of a radio wave comprise a geometric sequence. This implies that GSD-ST is capable of demodulating randomly combined radio waves, thereby eliminating the effect of interference. To exemplify the potential of GSD-ST, we propose a new radio access scheme, namely non-orthogonal interference-free radio access (No-INFRA). Herein, GSD-ST enables the collision-free reception of uncoordinated access requests. Numerical results show that No-INFRA effectively resolves the colliding access requests when the interference is dominant.

Submitted 6 August, 2020; v1 submitted 31 October, 2019; originally announced October 2019.

arXiv:1909.01688 [pdf, other]

Knowledge distillation for optimization of quantized deep neural networks

Authors: Sungho Shin, Yoonho Boo, Wonyong Sung

Abstract: Knowledge distillation (KD) is a very popular method for model size reduction. Recently, the technique is exploited for quantized deep neural networks (QDNNs) training as a way to restore the performance sacrificed by word-length reduction. KD, however, employs additional hyper-parameters, such as temperature, coefficient, and the size of teacher network for QDNN training. We analyze the effect of these hyper-parameters for QDNN optimization with KD. We find that these hyper-parameters are inter-related, and also introduce a simple and effective technique that reduces \textit{coefficient} during training. With KD employing the proposed hyper-parameters, we achieve the test accuracy of 92.7% and 67.0% on Resnet20 with 2-bit ternary weights for CIFAR-10 and CIFAR-100 data sets, respectively.

Submitted 23 October, 2019; v1 submitted 4 September, 2019; originally announced September 2019.

arXiv:1906.07647 [pdf, other]

Weakly Supervised Clustering by Exploiting Unique Class Count

Authors: Mustafa Umit Oner, Hwee Kuan Lee, Wing-Kin Sung

Abstract: A weakly supervised learning based clustering framework is proposed in this paper. As the core of this framework, we introduce a novel multiple instance learning task based on a bag level label called unique class count ($ucc$), which is the number of unique classes among all instances inside the bag. In this task, no annotations on individual instances inside the bag are needed during training of the models. We mathematically prove that with a perfect $ucc$ classifier, perfect clustering of individual instances inside the bags is possible even when no annotations on individual instances are given during training. We have constructed a neural network based $ucc$ classifier and experimentally shown that the clustering performance of our framework with our weakly supervised $ucc$ classifier is comparable to that of fully supervised learning models where labels for all instances are known. Furthermore, we have tested the applicability of our framework to a real world task of semantic segmentation of breast cancer metastases in histological lymph node sections and shown that the performance of our weakly supervised framework is comparable to the performance of a fully supervised Unet model.

Submitted 25 January, 2020; v1 submitted 18 June, 2019; originally announced June 2019.

Comments: Published as a conference paper at ICLR 2020

arXiv:1902.04178 [pdf]

doi 10.1109/AIKE.2018.00055

Stochastic Reinforcement Learning

Authors: Nikki Lijing Kuang, Clement H. C. Leung, Vienne W. K. Sung

Abstract: In reinforcement learning episodes, the rewards and punishments are often non-deterministic, and there are invariably stochastic elements governing the underlying situation. Such stochastic elements are often numerous and cannot be known in advance, and they have a tendency to obscure the underlying rewards and punishments patterns. Indeed, if stochastic elements were absent, the same outcome would occur every time and the learning problems involved could be greatly simplified. In addition, in most practical situations, the cost of an observation to receive either a reward or punishment can be significant, and one would wish to arrive at the correct learning conclusion by incurring minimum cost. In this paper, we present a stochastic approach to reinforcement learning which explicitly models the variability present in the learning environment and the cost of observation. Criteria and rules for learning success are quantitatively analyzed, and probabilities of exceeding the observation cost bounds are also obtained.

Submitted 11 February, 2019; originally announced February 2019.

Comments: AIKE 2018

arXiv:1901.07647 [pdf, other]

Understanding Geometry of Encoder-Decoder CNNs

Authors: Jong Chul Ye, Woon Kyoung Sung

Abstract: Encoder-decoder networks using convolutional neural network (CNN) architecture have been extensively used in deep learning literatures thanks to its excellent performance for various inverse problems. However, it is still difficult to obtain coherent geometric view why such an architecture gives the desired performance. Inspired by recent theoretical understanding on generalizability, expressivity and optimization landscape of neural networks, as well as the theory of convolutional framelets, here we provide a unified theoretical framework that leads to a better understanding of geometry of encoder-decoder CNNs. Our unified mathematical framework shows that encoder-decoder CNN architecture is closely related to nonlinear basis representation using combinatorial convolution frames, whose expressibility increases exponentially with the network depth. We also demonstrate the importance of skipped connection in terms of expressibility, and optimization landscape.

Submitted 7 May, 2019; v1 submitted 22 January, 2019; originally announced January 2019.

Comments: Accepted to ICML 2019

arXiv:1812.00819 [pdf, other]

Fast and Reliable Initial Access with Random Beamforming for mmWave Networks

Authors: Yanpeng Yang, Hossein S. Ghadikolaei, Carlo Fischione, Marina Petrova, Ki Won Sung

Abstract: Millimeter-wave (mmWave) communications rely on directional transmissions to overcome severe path loss. Nevertheless, the use of narrow beams complicates the initial access procedure and increase the latency as the transmitter and receiver beams should be aligned for a proper link establishment. In this paper, we investigate the feasibility of random beamforming for the cell-search phase of initial access. We develop a stochastic geometry framework to analyze the performance in terms of detection failure probability and expected latency of initial access as well as total data transmission. Meanwhile, we compare our scheme with the widely used exhaustive search and iterative search schemes, in both control plane and data plane. Our numerical results show that, compared to the other two schemes, random beamforming can substantially reduce the latency of initial access with comparable failure probability in dense networks. We show that the gain of the random beamforming is more prominent in light traffics and low-latency services. Our work demonstrates that developing complex cell-discovery algorithms may be unnecessary in dense mmWave networks and thus shed new lights on mmWave network design.

Submitted 30 November, 2018; originally announced December 2018.

Comments: 29 pages, 7 figures. arXiv admin note: text overlap with arXiv:1802.06450

arXiv:1811.01532 [pdf, other]

Workload-aware Automatic Parallelization for Multi-GPU DNN Training

Authors: Sungho Shin, Youngmin Jo, Jungwook Choi, Swagath Venkataramani, Vijayalakshmi Srinivasan, Wonyong Sung

Abstract: Deep neural networks (DNNs) have emerged as successful solutions for variety of artificial intelligence applications, but their very large and deep models impose high computational requirements during training. Multi-GPU parallelization is a popular option to accelerate demanding computations in DNN training, but most state-of-the-art multi-GPU deep learning frameworks not only require users to have an in-depth understanding of the implementation of the frameworks themselves, but also apply parallelization in a straight-forward way without optimizing GPU utilization. In this work, we propose a workload-aware auto-parallelization framework (WAP) for DNN training, where the work is automatically distributed to multiple GPUs based on the workload characteristics. We evaluate WAP using TensorFlow with popular DNN benchmarks (AlexNet and VGG-16), and show competitive training throughput compared with the state-of-the-art frameworks, and also demonstrate that WAP automatically optimizes GPU assignment based on the workload's compute requirements, thereby improving energy efficiency.

Submitted 6 February, 2019; v1 submitted 5 November, 2018; originally announced November 2018.

Comments: This paper is accepted in ICASSP2019

arXiv:1803.11389 [pdf, other]

Single Stream Parallelization of Recurrent Neural Networks for Low Power and Fast Inference

Authors: Wonyong Sung, Jinhwan Park

Abstract: As neural network algorithms show high performance in many applications, their efficient inference on mobile and embedded systems are of great interests. When a single stream recurrent neural network (RNN) is executed for a personal user in embedded systems, it demands a large amount of DRAM accesses because the network size is usually much bigger than the cache size and the weights of an RNN are used only once at each time step. We overcome this problem by parallelizing the algorithm and executing it multiple time steps at a time. This approach also reduces the power consumption by lowering the number of DRAM accesses. QRNN (Quasi Recurrent Neural Networks) and SRU (Simple Recurrent Unit) based recurrent neural networks are used for implementation. The experiments for SRU showed about 300% and 930% of speed-up when the numbers of multi time steps are 4 and 16, respectively, in an ARM CPU based system.

Submitted 30 March, 2018; originally announced March 2018.

Comments: Submitted to International Conference on Embedded Computer Systems: Architectures, MOdeling and Simulation (SAMOS) 2018

arXiv:1802.06450 [pdf, other]

Reducing Initial Cell-search Latency in mmWave Networks

Authors: Yanpeng Yang, Hossein S. Ghadikolaei, Carlo Fischione, Marina Petrova, Ki Won Sung

Abstract: Millimeter-wave (mmWave) networks rely on directional transmissions, in both control plane and data plane, to overcome severe path-loss. Nevertheless, the use of narrow beams complicates the initial cell-search procedure where we lack sufficient information for beamforming. In this paper, we investigate the feasibility of random beamforming for cell-search. We develop a stochastic geometry framework to analyze the performance in terms of failure probability and expected latency of cell-search. Meanwhile, we compare our results with the naive, but heavily used, exhaustive search scheme. Numerical results show that, for a given discovery failure probability, random beamforming can substantially reduce the latency of exhaustive search, especially in dense networks. Our work demonstrates that developing complex cell-discovery algorithms may be unnecessary in dense mmWave networks and thus shed new lights on mmWave system design.

Submitted 18 February, 2018; originally announced February 2018.

Comments: 6 pages, 5 figures, accepted by mmSys workshop at Infocom 2018

arXiv:1712.10082 [pdf, other]

Application of Convolutional Neural Network to Predict Airfoil Lift Coefficient

Authors: Yao Zhang, Woong-Je Sung, Dimitri Mavris

Abstract: The adaptability of the convolutional neural network (CNN) technique for aerodynamic meta-modeling tasks is probed in this work. The primary objective is to develop suitable CNN architecture for variable flow conditions and object geometry, in addition to identifying a sufficient data preparation process. Multiple CNN structures were trained to learn the lift coefficients of the airfoils with a variety of shapes in multiple flow Mach numbers, Reynolds numbers, and diverse angles of attack. This is conducted to illustrate the concept of the technique. A multi-layered perceptron (MLP) is also used for the training sets. The MLP results are compared with that of the CNN results. The newly proposed meta-modeling concept has been found to be comparable with the MLP in learning capability; and more importantly, our CNN model exhibits a competitive prediction accuracy with minimal constraints in a geometric representation.

Submitted 16 January, 2018; v1 submitted 28 December, 2017; originally announced December 2017.

arXiv:1712.00540 [pdf, other]

Millimeter-Wave Interference Avoidance via Building-Aware Associations

Authors: Jeemin Kim, Jihong Park, Seunghwan Kim, Seong-Lyun Kim, Ki Won Sung, Kwang Soon Kim

Abstract: Signal occlusion by building blockages is a double-edged sword for the performance of millimeter-wave (mmW) communication networks. Buildings may dominantly attenuate the useful signals, especially when mmW base stations (BSs) are sparsely deployed compared to the building density. In the opposite BS deployment, buildings can block the undesired interference. To enjoy only the benefit, we propose a building-aware association scheme that adjusts the directional BS association bias of the user equipments (UEs), based on a given building density and the concentration of UE locations around the buildings. The association of each BS can thereby be biased: (i) toward the UEs located against buildings for avoiding interference to other UEs; or (ii) toward the UEs providing their maximum reference signal received powers (RSRPs). The proposed association scheme is optimized to maximize the downlink average data rate derived by stochastic geometry. Its effectiveness is validated by simulation using real building statistics.

Submitted 1 December, 2017; originally announced December 2017.

Comments: This work has been submitted to the IEEE for possible publication

arXiv:1710.04361 [pdf, ps, other]

Linear Programming Bounds for Distributed Storage Codes

Authors: Ali Tebbi, Terence H. Chan, Chi Wan Sung

Abstract: A major issue of locally repairable codes is their robustness. If a local repair group is not able to perform the repair process, the repair cost increases. Therefore, it is critical for a locally repairable code to have multiple repair groups. In this paper we consider robust locally repairable coding schemes which guarantee that there exist multiple distinct (not necessarily disjoint) alternative local repair groups for any single failure such that the failed node can still be repaired locally even if some of the repair groups are not available. We use linear programming techniques to establish upper bounds on the code size of these codes. We also provide two examples of robust locally repairable codes that are optimal regarding our linear programming bound. Furthermore, we address the update efficiency problem of distributed data storage networks. Any modification of the stored data will result in updating the content of the storage nodes. Therefore, it is essential to minimise the number of nodes which need to be updated by any change in the stored data. We characterise the properties of update-efficient storage codes and establish necessary conditions for the existence of update-efficient locally repairable storage codes.

Submitted 6 April, 2019; v1 submitted 12 October, 2017; originally announced October 2017.

arXiv:1709.08928 [pdf, ps, other]

Multi-Rack Distributed Data Storage Networks

Authors: Ali Tebbi, Terence H. Chan, Chi Wan Sung

Abstract: The majority of works in distributed storage networks assume a simple network model with a collection of identical storage nodes with the same communication cost between the nodes. In this paper, we consider a realistic multi-rack distributed data storage network and present a code design framework for this model. Considering the cheaper data transmission within the racks, our code construction method is able to locally repair node failures within a rack by using only the surviving nodes in the same rack. However, in the case of severe failure patterns, when the information content of the surviving nodes is not sufficient to repair the failures, other racks will participate in the repair process. By employing the criteria of our multi-rack storage code, we establish a linear programming bound on the size of the code in order to maximize the code rate.

Submitted 7 March, 2019; v1 submitted 26 September, 2017; originally announced September 2017.

arXiv:1709.06281 [pdf, ps, other]

Coded Caching in Partially Cooperative D2D Communication Networks

Authors: Ali Tebbi, Chi Wan Sung

Abstract: Backhaul traffic is becoming a major concern in wireless and cellular networks (e.g., 4G-LTE and 5G) with the increasing demand for online video streaming. Caching popular content in the cache memory of the network users (e.g., mobile devices) is an effective technique to reduce the traffic during the networks' peak time. However, due to the dynamic nature of these networks, users' privacy settings, or energy limitations, some users may not be available or may not intend to participate during the caching procedures. In this paper, we propose caching schemes for device-to-device communication networks where a group of users show selfish characteristics. The selfish users, along with the non-selfish users, will cache the popular content, but will not share their useful cache content with the other users to satisfy a user request. We show that our proposed schemes are able to satisfy arbitrary user requests under partial cooperation of the network users.

Submitted 19 September, 2017; originally announced September 2017.

arXiv:1707.03684 [pdf, other]

Structured Sparse Ternary Weight Coding of Deep Neural Networks for Efficient Hardware Implementations

Authors: Yoonho Boo, Wonyong Sung

Abstract: Deep neural networks (DNNs) usually demand a large number of operations for real-time inference. In particular, fully-connected layers contain a large number of weights, and thus usually need many off-chip memory accesses for inference. We propose a weight compression method for deep neural networks, which allows values of +1 or -1 only at predetermined positions of the weights so that decoding using a table can be conducted easily. For example, the structured sparse (8,2) coding allows at most two non-zero values among eight weights. This method not only enables multiplication-free DNN implementations but also compresses the weight storage by up to 32x compared to floating-point networks. Weight distribution normalization and gradual pruning techniques are applied to mitigate the performance degradation. The experiments are conducted with fully-connected deep neural networks and convolutional neural networks.

Submitted 1 July, 2017; originally announced July 2017.

Comments: This paper is accepted in SIPS 2017

arXiv:1707.01996 [pdf, ps, other]

Capacity of Wireless Distributed Storage Systems with Broadcast Repair

Authors: ** Hu, Chi Wan Sung, Terence H. Chan

Abstract: In wireless distributed storage systems, storage nodes are connected by wireless channels, which are broadcast in nature. This paper exploits this unique feature to design an efficient repair mechanism, called broadcast repair, for wireless distributed storage systems in the presence of multiple-node failures. Due to the broadcast nature of wireless transmission, we advocate a new measure on repair performance called repair-transmission bandwidth. In contrast to repair bandwidth, which measures the average number of packets downloaded by a newcomer to replace a failed node, repair-transmission bandwidth measures the average number of packets transmitted by helper nodes per failed node. A fundamental study on the storage capacity of wireless distributed storage systems with broadcast repair is conducted by modeling the storage system as a multicast network and analyzing the minimum cut of the corresponding information flow graph. The fundamental tradeoff between storage efficiency and repair-transmission bandwidth is also obtained for functional repair. The performance of broadcast repair is compared both analytically and numerically with that of cooperative repair, the basic repair method for wired distributed storage systems with multiple-node failures. While cooperative repair is based on the idea of allowing newcomers to exchange packets, broadcast repair is based on the idea of allowing a helper to broadcast packets to all newcomers simultaneously. We show that broadcast repair outperforms cooperative repair, offering a better tradeoff between storage efficiency and repair-transmission bandwidth.

Submitted 6 July, 2017; originally announced July 2017.

Comments: 28 pages, 7 figures

arXiv:1705.10548 [pdf, other]

A Faster Construction of Greedy Consensus Trees

Authors: Paweł Gawrychowski, Gad M. Landau, Wing-Kin Sung, Oren Weimann

Abstract: A consensus tree is a phylogenetic tree that captures the similarity between a set of conflicting phylogenetic trees. The problem of computing a consensus tree is a major step in phylogenetic tree reconstruction. It also finds applications in predicting a species tree from a set of gene trees. This paper focuses on two of the most well-known and widely used consensus tree methods: the greedy consensus tree and the frequency difference consensus tree. Given $k$ conflicting trees each with $n$ leaves, the previous fastest algorithms for these problems were $O(k n^2)$ for the greedy consensus tree [J. ACM 2016] and $\tilde O(\min \{ k n^2, k^2n\})$ for the frequency difference consensus tree [ACM TCBB 2016]. We improve these running times to $\tilde O(k n^{1.5})$ and $\tilde O(k n)$ respectively.

Submitted 4 July, 2017; v1 submitted 30 May, 2017; originally announced May 2017.

arXiv:1705.04022 [pdf, ps, other]

Faster algorithms for 1-mappability of a sequence

Authors: Mai Alzamel, Panagiotis Charalampopoulos, Costas S. Iliopoulos, Solon P. Pissis, Jakub Radoszewski, Wing-Kin Sung

Abstract: In the k-mappability problem, we are given a string x of length n and integers m and k, and we are asked to count, for each length-m factor y of x, the number of other factors of length m of x that are at Hamming distance at most k from y. We focus here on the version of the problem where k = 1. The fastest known algorithm for k = 1 requires time O(mn log n/ log log n) and space O(n). We present two algorithms that require worst-case time O(mn) and O(n log^2 n), respectively, and space O(n), thus greatly improving the state of the art. Moreover, we present an algorithm that requires average-case time and space O(n) for integer alphabets if m = Ω(log n/ log σ), where σ is the alphabet size.

Submitted 11 May, 2017; originally announced May 2017.
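
The problem statement in the abstract above can be made concrete with a brute-force sketch. This is not one of the paper's algorithms: it is a naive quadratic-time baseline written only to illustrate the definition. The function name `k_mappability_naive` is hypothetical, and the sketch assumes that "other factors" means factors starting at a different position, so an identical factor occurring elsewhere is counted.

```python
def k_mappability_naive(x: str, m: int, k: int) -> list[int]:
    """For each length-m factor of x, count the factors at other positions
    whose Hamming distance from it is at most k.

    Naive O(n^2 * m) illustration of the problem definition only;
    not the algorithm from the paper.
    """
    n = len(x)
    # All length-m factors (substrings) of x, indexed by start position.
    factors = [x[i:i + m] for i in range(n - m + 1)]
    counts = []
    for i, y in enumerate(factors):
        count = 0
        for j, z in enumerate(factors):
            if i == j:
                continue  # assumption: "other" means a different start position
            # Hamming distance: number of mismatching aligned characters.
            distance = sum(a != b for a, b in zip(y, z))
            if distance <= k:
                count += 1
        counts.append(count)
    return counts


if __name__ == "__main__":
    # Factors of "abab" with m = 2 are "ab", "ba", "ab".
    print(k_mappability_naive("abab", 2, 1))
```

For x = "abab", m = 2, k = 1, the two occurrences of "ab" match each other (distance 0), while "ba" is at distance 2 from both, so the result is [1, 0, 1].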

arXiv:1702.08171 [pdf, ps, other]

Fixed-point optimization of deep neural networks with adaptive step size retraining

Authors: Sungho Shin, Yoonho Boo, Wonyong Sung

Abstract: Fixed-point optimization of deep neural networks plays an important role in hardware-based design and low-power implementations. Many deep neural networks show fairly good performance even with 2- or 3-bit precision when quantized weights are fine-tuned by retraining. We propose an improved fixed-point optimization algorithm that estimates the quantization step size dynamically during the retraining. In addition, a gradual quantization scheme is also tested, which sequentially applies fixed-point optimizations from high- to low-precision. The experiments are conducted for feed-forward deep neural networks (FFDNNs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs).

Submitted 27 February, 2017; originally announced February 2017.

Comments: This paper is accepted in ICASSP 2017

arXiv:1702.06045 [pdf, other]

Joint Transmission with Dummy Symbols for Dynamic TDD in Ultra-Dense Deployments

Authors: Haris Celik, Ki Won Sung

Abstract: Dynamic time-division duplexing (TDD) is considered a promising solution to deal with fast-varying traffic often found in ultra-densely deployed networks. At the same time, it generates more interference which may degrade the performance of some user equipment (UE). When base station (BS) utilization is low, some BSs may not have a UE to serve. Rather than going into sleep mode, the idle BSs can help nearby UEs using joint transmission. To deal with BS-to-BS interference, we propose using joint transmission with dummy symbols where uplink BSs serving uplink UEs participate in the precoding. Since BSs are not aware of the uplink symbols beforehand, any symbols with zero power can be transmitted instead to null the BS-to-BS interference. Numerical results show significant performance gains for uplink and downlink at low and medium utilization. By varying the number of participating uplink BSs in the precoding, we also show that it is possible to successfully trade performance in the two directions.

Submitted 27 April, 2017; v1 submitted 20 February, 2017; originally announced February 2017.

arXiv:1701.04066 [pdf, other]

Cooperative Transmissions in Ultra-Dense Networks under a Bounded Dual-Slope Path Loss Model

Authors: Yanpeng Yang, Ki Won Sung, Jihong Park, Seong-Lyun Kim, Kwang Soon Kim

Abstract: In an ultra-dense network (UDN) where there are more base stations (BSs) than active users, it is possible that many BSs are instantaneously left idle. Thus, how to utilize these dormant BSs by means of cooperative transmission is an interesting question. In this paper, we investigate the performance of a UDN with two types of cooperation schemes: non-coherent joint transmission (JT) without channel state information (CSI) and coherent JT with full CSI knowledge. We consider a bounded dual-slope path loss model to describe UDN environments where a user has several BSs in the near-field and the rest in the far-field. Numerical results show that non-coherent JT cannot improve the user spectral efficiency (SE) due to the simultaneous increment in signal and interference powers. For coherent JT, the achievable SE gain depends on the range of near-field, the relative densities of BSs and users, and the CSI accuracy. Finally, we assess the energy efficiency (EE) of cooperation in UDN. Despite its extra energy consumption, cooperation can still improve EE under certain conditions.

Submitted 5 May, 2017; v1 submitted 15 January, 2017; originally announced January 2017.

Comments: 6 pages, 8 figures, to appear in EuCNC 2017

arXiv:1701.04065 [pdf, other]

On the Asymptotic Behavior of Ultra-Densification under a Bounded Dual-Slope Path Loss Model

Authors: Yanpeng Yang, Jihong Park, Ki Won Sung

Abstract: In this paper, we investigate the impact of network densification on the performance in terms of downlink signal-to-interference (SIR) coverage probability and network area spectral efficiency (ASE). A sophisticated bounded dual-slope path loss model and practical user equipment (UE) densities are incorporated in the analysis, which have never been jointly considered before. By using stochastic geometry, we derive an integral expression along with closed-form bounds of the coverage probability and ASE, validated by simulation results. Through these, we provide the asymptotic behavior of ultra-densification. The coverage probability and ASE have non-zero convergence in asymptotic regions unless UE density goes to infinity (full load). Meanwhile, the effect of UE density on the coverage probability is analyzed. The coverage probability will reveal a U-shape for large UE densities due to interference falling into the near-field, but it will keep increasing for low UE densities. Furthermore, our results indicate that the performance is overestimated without applying the bounded dual-slope path loss model. The derived expressions and results in this work pave the way for future network provisioning.

Submitted 10 April, 2017; v1 submitted 15 January, 2017; originally announced January 2017.

Comments: 7 pages, 4 figures, to appear in European Wireless 2017

arXiv:1611.06342 [pdf, other]

Quantized neural network design under weight capacity constraint

Authors: Sungho Shin, Kyuyeon Hwang, Wonyong Sung

Abstract: The complexity of deep neural network algorithms for hardware implementation can be lowered either by scaling the number of units or by reducing the word-length of weights. Both approaches, however, can be accompanied by performance degradation, although much research has been conducted to relieve this problem. Thus, it is an important question which of the two, network size scaling or weight quantization, is more effective for hardware optimization. For this study, the performances of fully-connected deep neural networks (FCDNNs) and convolutional neural networks (CNNs) are evaluated while changing the network complexity and the word-length of weights. Based on these experiments, we present the effective compression ratio (ECR) to guide the trade-off between the network size and the precision of weights when the hardware resource is limited.

Submitted 19 November, 2016; originally announced November 2016.

Comments: This paper is accepted at NIPS 2016 workshop on Efficient Methods for Deep Neural Networks (EMDNN). arXiv admin note: text overlap with arXiv:1511.06488

arXiv:1610.09639 [pdf, other]

Compact Deep Convolutional Neural Networks With Coarse Pruning

Authors: Sajid Anwar, Wonyong Sung

Abstract: The learning capability of a neural network improves with increasing depth at higher computational costs. Wider layers with dense kernel connectivity patterns further increase this cost and may hinder real-time inference. We propose feature map and kernel level pruning for reducing the computational complexity of a deep convolutional neural network. Pruning feature maps reduces the width of a layer and hence does not need any sparse representation. Further, kernel pruning converts the dense connectivity pattern into a sparse one. Due to their coarse nature, these pruning granularities can be exploited by GPUs and VLSI based implementations. We propose a simple and generic strategy to choose the least adversarial pruning masks for both granularities. The pruned networks are retrained, which compensates for the loss in accuracy. We obtain the best pruning ratios when we prune a network with both granularities. Experiments with the CIFAR-10 dataset show that more than 85% sparsity can be induced in the convolution layers with less than 1% increase in the misclassification rate of the baseline network.

Submitted 30 October, 2016; originally announced October 2016.

arXiv:1610.00552 [pdf, other]

FPGA-Based Low-Power Speech Recognition with Recurrent Neural Networks

Authors: Minjae Lee, Kyuyeon Hwang, **hwan Park, Sungwook Choi, Sungho Shin, Wonyong Sung

Abstract: In this paper, a neural network based real-time speech recognition (SR) system is developed using an FPGA for very low-power operation. The implemented system employs two recurrent neural networks (RNNs); one is a speech-to-character RNN for acoustic modeling (AM) and the other is for character-level language modeling (LM). The system also employs a statistical word-level LM to improve the recognition accuracy. The results of the AM, the character-level LM, and the word-level LM are combined using a fairly simple N-best search algorithm instead of the hidden Markov model (HMM) based network. The RNNs are implemented using massively parallel processing elements (PEs) for low latency and high throughput. The weights are quantized to 6 bits to store all of them in the on-chip memory of an FPGA. The proposed algorithm is implemented on a Xilinx XC7Z045, and the system can operate much faster than real-time.

Submitted 30 September, 2016; originally announced October 2016.

Comments: Accepted to SiPS 2016

Showing 1–50 of 103 results for author: Sung, W