Parameter-Efficient Instance-Adaptive Neural Video Compression

Authors: Hyunmo Yang, Seungjun Oh, Eunbyung Park

Abstract: Learning-based Neural Video Codecs (NVCs) have emerged as a compelling alternative to the standard video codecs, demonstrating promising performance, and simple and easily maintainable pipelines. However, NVCs often fall short of compression performance and occasionally exhibit poor generalization capability due to their inference-only compression scheme and their dependence on training data. The instance-adaptive video compression techniques have recently been suggested as a viable solution, fine-tuning the encoder or decoder networks for a particular test instance video. However, fine-tuning all the model parameters incurs high computational costs, increases the bitrates, and often leads to unstable training. In this work, we propose a parameter-efficient instance-adaptive video compression framework. Inspired by the remarkable success of parameter-efficient fine-tuning on large-scale neural network models, we propose to use a lightweight adapter module that can be easily attached to the pretrained NVCs and fine-tuned for test video sequences. The resulting algorithm significantly improves compression performance and reduces the encoding time compared to the existing instance-adaptive video compression algorithms. Furthermore, the suggested fine-tuning method enhances the robustness of the training process, allowing for the proposed method to be widely used in many practical settings.
We conducted extensive experiments on various standard benchmark datasets, including UVG, MCL-JCV, and HEVC sequences, and the experimental results have shown a significant improvement in rate-distortion (RD) curves (up to 5 dB PSNR improvements) and BD rates compared to the baseline NVCs. Our code is available at https://github.com/ohsngjun/PEVC.
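
To make the "parameter-efficient" idea concrete, the following is a rough sketch of the parameter arithmetic only. It assumes a LoRA-style low-rank adapter on a single dense layer; the abstract does not say which adapter design the authors use, and the function names and layer sizes here are hypothetical.

```python
def full_finetune_params(d_in, d_out):
    """Weights updated when a whole d_out x d_in dense layer is fine-tuned."""
    return d_in * d_out


def low_rank_adapter_params(d_in, d_out, rank):
    """Weights in a rank-`rank` update B @ A (assumed adapter form, not the paper's).

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


if __name__ == "__main__":
    full = full_finetune_params(256, 256)
    adapter = low_rank_adapter_params(256, 256, rank=4)
    print(full, adapter, full // adapter)
```

Under these assumptions, only the adapter weights would need to be trained and signalled per video, which is the kind of saving the abstract attributes to its approach.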

Submitted 11 June, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

Comments: 23 pages, 13 figures

arXiv:2404.15374 [pdf, other]

Minimum Description Feature Selection for Complexity Reduction in Machine Learning-based Wireless Positioning

Authors: Myeung Suk Oh, Anindya Bijoy Das, Taejoon Kim, David J. Love, Christopher G. Brinton

Abstract: Recently, deep learning approaches have provided solutions to difficult problems in wireless positioning (WP). Although these WP algorithms have attained excellent and consistent performance against complex channel environments, the computational complexity coming from processing high-dimensional features can be prohibitive for mobile applications. In this work, we design a novel positioning neural network (P-NN) that utilizes the minimum description features to substantially reduce the complexity of deep learning-based WP. P-NN's feature selection strategy is based on maximum power measurements and their temporal locations to convey information needed to conduct WP. We improve P-NN's learning ability by intelligently processing two different types of inputs: sparse image and measurement matrices. Specifically, we implement a self-attention layer to reinforce the training ability of our network. We also develop a technique to adapt feature space size, optimizing over the expected information gain and the classification capability quantified with information-theoretic measures on signal bin selection. Numerical results show that P-NN achieves a significant advantage in performance-complexity tradeoff over deep learning baselines that leverage the full power delay profile (PDP). In particular, we find that P-NN achieves a large improvement in performance for low SNR, as unnecessary measurements are discarded in our minimum description features.
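
The feature selection described above (maximum power measurements and their temporal locations) can be illustrated with a small sketch. This is an assumed reading of the abstract, not the authors' implementation: `max_power_features` is a hypothetical helper, the PDP is a plain list of per-bin powers, and the feature count `k` is fixed by hand rather than chosen adaptively as in the paper.

```python
def max_power_features(pdp, k):
    """Return the k strongest PDP bins as (bin_index, power) pairs, ordered by bin index."""
    if not 0 < k <= len(pdp):
        raise ValueError("k must be between 1 and len(pdp)")
    # Rank bin indices by power, strongest first, and keep the top k.
    strongest = sorted(range(len(pdp)), key=lambda i: pdp[i], reverse=True)[:k]
    # Report them in temporal order so each bin index doubles as a location feature.
    return [(i, pdp[i]) for i in sorted(strongest)]


if __name__ == "__main__":
    pdp = [0.1, 0.9, 0.05, 0.4, 0.7]
    print(max_power_features(pdp, 2))
```

Keeping only `k` pairs instead of the full profile is what shrinks the network input; weak bins, which are mostly noise at low SNR, are dropped.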

Submitted 21 April, 2024; originally announced April 2024.

Comments: This paper has been accepted for publication in IEEE Journal on Selected Areas in Communications. arXiv admin note: text overlap with arXiv:2402.09580

arXiv:2404.04096 [pdf, other]

Machine Learning-Aided Cooperative Localization under Dense Urban Environment

Authors: Hoon Lee, Hong Ki Kim, Seung Hyun Oh, Sang Hyun Lee

Abstract: Future wireless network technology provides automobiles with the connectivity feature to consolidate the concept of vehicular networks that collaborate on conducting cooperative driving tasks. The full potential of connected vehicles, which promises road safety and quality driving experience, can be leveraged if machine learning models guarantee the robustness in performing core functions including localization and controls. Location awareness, in particular, lends itself to the deployment of location-specific services and the improvement of the operation performance. The localization entails direct communication to the network infrastructure, and the resulting centralized positioning solutions readily become intractable as the network scales up. As an alternative to the centralized solutions, this article addresses decentralized principle of vehicular localization reinforced by machine learning techniques in dense urban environments with frequent inaccessibility to reliable measurement. As such, the collaboration of multiple vehicles enhances the positioning performance of machine learning approaches. A virtual testbed is developed to validate this machine learning model for real-map vehicular networks. Numerical results demonstrate universal feasibility of cooperative localization, in particular, for dense urban area configurations.

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2402.17127 [pdf, other]

Experimental Study: Enhancing Voice Spoofing Detection Models with wav2vec 2.0

Authors: Taein Kang, Soyul Han, Sunmook Choi, Jae** Seo, Sanghyeok Chung, Seungeun Lee, Seungsang Oh, Il-Youp Kwak

Abstract: Conventional spoofing detection systems have heavily relied on the use of handcrafted features derived from speech data. However, a notable shift has recently emerged towards the direct utilization of raw speech waveforms, as demonstrated by methods like SincNet filters. This shift underscores the demand for more sophisticated audio sample features. Moreover, the success of deep learning models, particularly those utilizing large pretrained wav2vec 2.0 as a featurization front-end, highlights the importance of refined feature encoders. In response, this research assessed the representational capability of wav2vec 2.0 as an audio feature extractor, modifying the size of its pretrained Transformer layers through two key adjustments: (1) selecting a subset of layers starting from the leftmost one and (2) fine-tuning a portion of the selected layers from the rightmost one. We complemented this analysis with five spoofing detection back-end models, with a primary focus on AASIST, enabling us to pinpoint the optimal configuration for the selection and fine-tuning process. In contrast to conventional handcrafted features, our investigation identified several spoofing detection systems that achieve state-of-the-art performance in the ASVspoof 2019 LA dataset. This comprehensive exploration offers valuable insights into feature selection strategies, advancing the field of spoofing detection.
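
The two adjustments in the abstract can be sketched as a simple layer plan. This is an illustration under assumptions: "leftmost" is read as the layer closest to the input, `layer_plan` is a hypothetical helper, and the layer counts in the usage example are arbitrary rather than taken from the paper.

```python
def layer_plan(total_layers, num_selected, num_finetuned):
    """Keep the first `num_selected` layers; mark the last `num_finetuned` of them trainable."""
    if not 0 <= num_finetuned <= num_selected <= total_layers:
        raise ValueError("need 0 <= num_finetuned <= num_selected <= total_layers")
    first_trainable = num_selected - num_finetuned
    # Layers before `first_trainable` stay frozen; the rest are fine-tuned.
    return [
        {"layer": i, "trainable": i >= first_trainable}
        for i in range(num_selected)
    ]


if __name__ == "__main__":
    for entry in layer_plan(total_layers=12, num_selected=6, num_finetuned=2):
        print(entry)
```

In a real model the plan would translate into truncating the Transformer stack and toggling gradient updates per layer; how that is done depends on the framework and is not specified in the abstract.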

Submitted 26 February, 2024; originally announced February 2024.

Comments: 5 pages

MSC Class: 00A71 ACM Class: I.2.6

arXiv:2402.09580 [pdf, other]

Complexity Reduction in Machine Learning-Based Wireless Positioning: Minimum Description Features

Authors: Myeung Suk Oh, Anindya Bijoy Das, Taejoon Kim, David J. Love, Christopher G. Brinton

Abstract: A recent line of research has been investigating deep learning approaches to wireless positioning (WP). Although these WP algorithms have demonstrated high accuracy and robust performance against diverse channel conditions, they also have a major drawback: they require processing high-dimensional features, which can be prohibitive for mobile applications. In this work, we design a positioning neural network (P-NN) that substantially reduces the complexity of deep learning-based WP through carefully crafted minimum description features. Our feature selection is based on maximum power measurements and their temporal locations to convey information needed to conduct WP. We also develop a novel methodology for adaptively selecting the size of feature space, which optimizes over balancing the expected amount of useful information and classification capability, quantified using information-theoretic measures on the signal bin selection. Numerical results show that P-NN achieves a significant advantage in performance-complexity tradeoff over deep learning baselines that leverage the full power delay profile (PDP).

Submitted 14 February, 2024; originally announced February 2024.

Comments: This paper has been accepted in IEEE International Conference on Communications (ICC) 2024

arXiv:2401.03690 [pdf]

So You Want to Image Myelin Using MRI: Magnetic Susceptibility Source Separation for Myelin Imaging

Authors: Jongho Lee, Sooyeon Ji, Se-Hong Oh

Abstract: In MRI, researchers have long endeavored to effectively visualize myelin distribution in the brain, a pursuit with significant implications for both scientific research and clinical applications. Over time, various methods such as myelin water imaging, magnetization transfer imaging, and relaxometric imaging have been developed, each carrying distinct advantages and limitations. Recently, an innovative technique named magnetic susceptibility source separation has emerged, introducing a novel surrogate biomarker for myelin in the form of a diamagnetic susceptibility map. This paper comprehensively reviews this cutting-edge method, providing the fundamental concepts of magnetic susceptibility, susceptibility imaging, and the validation of the diamagnetic susceptibility map as a myelin biomarker that indirectly measures myelin content. Additionally, the paper explores essential aspects of data acquisition and processing, offering practical insights for readers. A comparison with established myelin imaging methods is also presented, and both current and prospective clinical and scientific applications are discussed to provide a holistic understanding of the technique. This work aims to serve as a foundational resource for newcomers entering this dynamic and rapidly expanding field.

Submitted 28 March, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

Comments: Accepted to Magnetic Resonance in Medical Sciences

arXiv:2312.10880 [pdf, other]

Sharable Clothoid-based Continuous Motion Planning for Connected Automated Vehicles

Authors: Sanghoon Oh, Qi Chen, H. Eric Tseng, Gaurav Pandey, Gabor Orosz

Abstract: A continuous motion planning method for connected automated vehicles is considered for generating feasible trajectories in real-time using three consecutive clothoids. The proposed method reduces path planning to a small set of nonlinear algebraic equations such that the generated path can be efficiently checked for feasibility and collision. After path planning, velocity planning is executed while maintaining a parallel simple structure. Key strengths of this framework include its interpretability, shareability, and ability to specify boundary conditions. Its interpretability and shareability stem from the succinct representation of the resulting local motion plan using a handful of physically meaningful parameters. Vehicles may share these parameters via V2X communication so that the recipients can precisely reconstruct the planned trajectory of the senders and respond accordingly. The proposed local planner guarantees the satisfaction of boundary conditions, thus ensuring seamless integration with a wide array of higher-level global motion planners. The tunable nature of the method enables tailoring the local plans to specific maneuvers like turns at intersections, lane changes, and U-turns.
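
As background on why a clothoid path can be shared with a handful of numbers: a clothoid's curvature varies linearly with arc length, so a start pose, an initial curvature, a sharpness, and a length determine the whole segment. The sketch below reconstructs one segment from such parameters by numerical integration. The parameter names and the midpoint-rule integration are assumptions for illustration; the paper's three-clothoid formulation and its algebraic equations are not reproduced here.

```python
import math


def clothoid_points(x0, y0, heading0, curvature0, sharpness, length, steps=1000):
    """Sample one clothoid segment.

    curvature(s) = curvature0 + sharpness * s
    heading(s)   = heading0 + curvature0 * s + 0.5 * sharpness * s**2
    Position is obtained by integrating (cos heading, sin heading) with the midpoint rule.
    """
    ds = length / steps
    x, y = x0, y0
    points = [(x, y)]
    for i in range(steps):
        s_mid = (i + 0.5) * ds
        heading = heading0 + curvature0 * s_mid + 0.5 * sharpness * s_mid ** 2
        x += math.cos(heading) * ds
        y += math.sin(heading) * ds
        points.append((x, y))
    return points


if __name__ == "__main__":
    # Zero sharpness and unit curvature give a quarter of a unit circle.
    end_x, end_y = clothoid_points(0.0, 0.0, 0.0, 1.0, 0.0, math.pi / 2)[-1]
    print(round(end_x, 4), round(end_y, 4))
```

A receiver holding the same six numbers can rerun this reconstruction and recover the sender's path, which is the shareability property the abstract highlights.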

Submitted 17 December, 2023; originally announced December 2023.

Comments: 14 pages, 14 figures

arXiv:2312.01638 [pdf, other]

J-Net: Improved U-Net for Terahertz Image Super-Resolution

Authors: Woon-Ha Yeo, Seung-Hwan Jung, Seung Jae Oh, Inhee Maeng, Eui Su Lee, Han-Cheol Ryu

Abstract: Terahertz (THz) waves are electromagnetic waves in the 0.1 to 10 THz frequency range, and THz imaging is utilized in a range of applications, including security inspections, biomedical fields, and the non-destructive examination of materials. However, THz images have low resolution due to the long wavelength of THz waves. Therefore, improving the resolution of THz images is one of the current hot research topics. We propose a novel network architecture called J-Net, which is an improved version of U-Net, to address THz image super-resolution. It employs the simple baseline blocks which can extract low-resolution (LR) image features and learn the mapping of LR images to high-resolution (HR) images efficiently. All training was conducted using the DIV2K+Flickr2K dataset, and we employed the peak signal-to-noise ratio (PSNR) for quantitative comparison. In our comparisons with other THz image super-resolution methods, J-Net achieved a PSNR of 32.52 dB, surpassing other techniques by more than 1 dB. J-Net also demonstrates superior performance on real THz images compared to other methods. Experiments show that the proposed J-Net achieves better PSNR and visual improvement compared with other THz image super-resolution methods.
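
For readers interpreting the dB figures, PSNR is conventionally computed as 10 log10(MAX^2 / MSE). The sketch below follows that standard definition; the function name, the flat pixel lists, and the 8-bit peak value of 255 are assumptions for illustration and say nothing about the authors' evaluation code.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    print(psnr([10, 20, 30, 40], [11, 19, 30, 42]))
```

Because the scale is logarithmic, a 1 dB gain corresponds to reducing the mean squared error by a factor of about 1.26.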

Submitted 4 December, 2023; originally announced December 2023.

arXiv:2307.04292 [pdf, other]

A Demand-Driven Perspective on Generative Audio AI

Authors: Sangshin Oh, Minsung Kang, Hyeongi Moon, Keunwoo Choi, Ben Sangbae Chon

Abstract: To achieve successful deployment of AI research, it is crucial to understand the demands of the industry. In this paper, we present the results of a survey conducted with professional audio engineers, in order to determine research priorities and define various research tasks. We also summarize the current challenges in audio quality and controllability based on the survey. Our analysis emphasizes that the availability of datasets is currently the main bottleneck for achieving high-quality audio generation. Finally, we suggest potential solutions for some revealed issues with empirical evidence.

Submitted 9 July, 2023; originally announced July 2023.

Comments: 10 pages, 7 figures

arXiv:2306.12789 [pdf]

Russian assimilatory palatalization is incomplete neutralization

Authors: Se** Oh, Jason A. Shaw, Karthik Durvasula, Alexei Kochotov

Abstract: Incomplete neutralization refers to phonetic traces of underlying contrasts in phonologically neutralizing contexts. The present study examines one such context: Russian assimilatory palatalization in C+j sequences. Russian contrasts plain and palatalized consonants, with the plain consonants having a secondary articulation involving retraction of the tongue dorsum (velarization/uvularization). However, Russian also has stop-glide sequences that form near-minimal pairs with palatalized stops. In the environment preceding palatal glides, the contrast between palatalized and plain consonants is neutralized, due to the palatalization of the plain stop (assimilatory palatalization). The purpose of the study is to explore whether the neutralization is complete. To do so, we conducted an electromagnetic articulography (EMA) experiment examining temporal coordination and the spatial position of the tongue body in underlyingly palatalized consonants and those derived from assimilatory palatalization. Articulatory results from four native speakers of Russian revealed that gestures in both conditions are coordinated as complex segments, i.e., they are palatalized consonants. However, there are differences across conditions consistent with the residual presence of a tongue dorsum retraction gesture in the plain obstruents.
We conclude that neutralization of the plain-palatal contrast in Russian is incomplete; consonants in the assimilatory palatalization condition exhibit inter-gestural coordination characteristic of palatalized consonants along with residual evidence of an underlying tongue dorsum retraction (velarization/uvularization) gesture.

Submitted 22 June, 2023; originally announced June 2023.

arXiv:2306.10841 [pdf, other]

Blockchain-Enabled Federated Learning: A Reference Architecture Design, Implementation, and Verification

Authors: Eunsu Goh, Dae-Yeol Kim, Kwangkee Lee, Suyeong Oh, Jong-Eui Chae, Do-Yup Kim

Abstract: This paper presents a novel reference architecture for blockchain-enabled federated learning (BCFL), a state-of-the-art approach that amalgamates the strengths of federated learning and blockchain technology. We define smart contract functions, stakeholders and their roles, and the use of interplanetary file system (IPFS) as key components of BCFL and conduct a comprehensive analysis. In traditional centralized federated learning, the selection of local nodes and the collection of learning results for each round are merged under the control of a central server. In contrast, in BCFL, all these processes are monitored and managed via smart contracts. Additionally, we propose an extension architecture to support both cross-device and cross-silo federated learning scenarios. Furthermore, we implement and verify the architecture in a practical real-world Ethereum development environment. Our BCFL reference architecture provides significant flexibility and extensibility, accommodating the integration of various additional elements, as per specific requirements and use cases, thereby rendering it an adaptable solution for a wide range of BCFL applications. As a prominent example of extensibility, decentralized identifiers (DIDs) have been employed as an authentication method to introduce practical utilization within BCFL. This study not only bridges a crucial gap between research and practical deployment but also lays a solid foundation for future explorations in the realm of BCFL.
The pivotal contribution of this study is the successful implementation and verification of a realistic BCFL reference architecture. We intend to make the source code publicly accessible shortly, fostering further advancements and adaptations within the community.

Submitted 22 November, 2023; v1 submitted 19 June, 2023; originally announced June 2023.

Comments: 14 pages, 15 figures, 3 tables

MSC Class: 68T01 (Primary) 68M14; 94A60 (Secondary) ACM Class: I.2.6; I.2.11

arXiv:2306.09807 [pdf, other]

FALL-E: A Foley Sound Synthesis Model and Strategies

Authors: Minsung Kang, Sangshin Oh, Hyeongi Moon, Kyungyun Lee, Ben Sangbae Chon

Abstract: This paper introduces FALL-E, a foley synthesis system and its training/inference strategies. The FALL-E model employs a cascaded approach comprising low-resolution spectrogram generation, spectrogram super-resolution, and a vocoder. We trained every sound-related model from scratch using our extensive datasets, and utilized a pre-trained language model. We conditioned the model with dataset-specific texts, enabling it to learn sound quality and recording environment based on text input. Moreover, we leveraged external language models to improve text descriptions of our datasets and performed prompt engineering for quality, coherence, and diversity. FALL-E was evaluated by an objective measure as well as listening tests in the DCASE 2023 challenge Task 7. The submission achieved the second place on average, while achieving the best score for diversity, second place for audio quality, and third place for class fitness.

Submitted 10 August, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

Comments: 5 pages, 3 figures

arXiv:2305.00384 [pdf, other]

doi 10.1109/TVT.2023.3279833

Dynamic and Robust Sensor Selection Strategies for Wireless Positioning with TOA/RSS Measurement

Authors: Myeung Suk Oh, Seyyedali Hosseinalipour, Taejoon Kim, David J. Love, James V. Krogmeier, Christopher G. Brinton

Abstract: Emerging wireless applications are requiring ever more accurate location-positioning from sensor measurements. In this paper, we develop sensor selection strategies for 3D wireless positioning based on time of arrival (TOA) and received signal strength (RSS) measurements to handle two distinct scenarios: (i) known approximated target location, for which we conduct dynamic sensor selection to minimize the positioning error; and (ii) unknown approximated target location, in which the worst-case positioning error is minimized via robust sensor selection. We derive expressions for the Cramér-Rao lower bound (CRLB) as a performance metric to quantify the positioning accuracy resulting from selected sensors. For dynamic sensor selection, two greedy selection strategies are proposed, each of which exploits properties revealed in the derived CRLB expressions. These selection strategies are shown to strike an efficient balance between computational complexity and performance suboptimality. For robust sensor selection, we show that the conventional convex relaxation approach leads to instability, and then develop three algorithms based on (i) iterative convex optimization (ICO), (ii) difference of convex functions programming (DCP), and (iii) discrete monotonic optimization (DMO). Each of these strategies exhibits a different tradeoff between computational complexity and optimality guarantee.
Simulation results show that the proposed sensor selection strategies provide significant improvements in terms of accuracy and/or complexity compared to existing sensor selection methods.

Submitted 30 April, 2023; originally announced May 2023.

Comments: This paper has been accepted to IEEE Transactions on Vehicular Technology for future publication

arXiv:2304.12200 [pdf, other]

SplitAMC: Split Learning for Robust Automatic Modulation Classification

Authors: Jihoon Park, Seungeun Oh, Seong-Lyun Kim

Abstract: Automatic modulation classification (AMC) is a technology that identifies a modulation scheme without prior signal information and plays a vital role in various applications, including cognitive radio and link adaptation. With the development of deep learning (DL), DL-based AMC methods have emerged, while most of them focus on reducing computational complexity in a centralized structure. This centralized learning-based AMC (CentAMC) violates data privacy because client-side raw data is transmitted directly. Federated learning-based AMC (FedeAMC) can bypass this issue by exchanging model parameters, but causes large resultant latency and client-side computational load. Moreover, both CentAMC and FedeAMC are vulnerable to large-scale noise occurring in the wireless channel between the client and the server. To this end, we develop a novel AMC method based on a split learning (SL) framework, coined SplitAMC, that can achieve high accuracy even in poor channel conditions, while guaranteeing data privacy and low latency. In SplitAMC, each client is protected from data privacy leakage by exchanging smashed data and its gradient instead of raw data, and has robustness to noise with the help of the high scale of smashed data. Numerical evaluations validate that SplitAMC outperforms CentAMC and FedeAMC in terms of accuracy for all SNRs as well as latency.

Submitted 17 April, 2023; originally announced April 2023.

Comments: to be presented at IEEE VTC2023-Spring

arXiv:2304.06237 [pdf, other]

Deep learning based ECG segmentation for delineation of diverse arrhythmias

Authors: Chankyu Joung, Mi** Kim, Tae** Paik, Seong-Ho Kong, Seung-Young Oh, Won Kyeong Jeon, Jae-hu Jeon, Joong-Sik Hong, Wan-Joong Kim, Woong Kook, Myung-** Cha, Otto van Koert

Abstract: Accurate delineation of key waveforms in an ECG is a critical initial step in extracting relevant features to support the diagnosis and treatment of heart conditions. Although deep learning based methods using a segmentation model to locate the P, QRS, and T waves have shown promising results, their ability to handle signals exhibiting arrhythmia remains unclear. This study builds on existing research by introducing a U-Net-like segmentation model for ECG delineation, with a particular focus on diverse arrhythmias. For this purpose, we curate an internal dataset containing waveform boundary annotations for various arrhythmia types to train and validate our model. Our key contributions include identifying segmentation model failures in different arrhythmia types, developing a robust model using a diverse training set, achieving comparable performance on benchmark datasets, and introducing a classification guided strategy to reduce false P wave predictions for specific arrhythmias. This study advances deep learning based ECG delineation in the context of arrhythmias and highlights its challenges.

Submitted 6 September, 2023; v1 submitted 12 April, 2023; originally announced April 2023.

arXiv:2301.04774 [pdf, other]

doi 10.1109/JSAC.2023.3336154

A Decentralized Pilot Assignment Algorithm for Scalable O-RAN Cell-Free Massive MIMO

Authors: Myeung Suk Oh, Anindya Bijoy Das, Seyyedali Hosseinalipour, Taejoon Kim, David J. Love, Christopher G. Brinton

Abstract: Radio access networks (RANs) in monolithic architectures have limited adaptability to supporting different network scenarios. Recently, open-RAN (O-RAN) techniques have begun adding enormous flexibility to RAN implementations. O-RAN is a natural architectural fit for cell-free massive multiple-input multiple-output (CFmMIMO) systems, where many geographically-distributed access points (APs) are employed to achieve ubiquitous coverage and enhanced user performance. In this paper, we address the decentralized pilot assignment (PA) problem for scalable O-RAN-based CFmMIMO systems. We propose a low-complexity PA scheme using a multi-agent deep reinforcement learning (MA-DRL) framework in which multiple learning agents perform distributed learning over the O-RAN communication architecture to suppress pilot contamination. Our approach does not require prior channel knowledge but instead relies on real-time interactions made with the environment during the learning procedure. In addition, we design a codebook search (CS) scheme that exploits the decentralization of our O-RAN CFmMIMO architecture, where different codebook sets can be utilized to further improve PA performance without any significant additional complexities. Numerical evaluations verify that our proposed scheme provides substantial computational scalability advantages and improvements in channel estimation performance compared to the state-of-the-art.

Submitted 1 April, 2024; v1 submitted 11 January, 2023; originally announced January 2023.

Comments: The journal version of this paper is published in IEEE Journal on Selected Areas in Communications

arXiv:2212.07939 [pdf, other]

RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis

Authors: Shinhyeok Oh, HyeongRae Noh, Yoonseok Hong, Insoo Oh

Abstract: With the advent of deep learning, a huge number of text-to-speech (TTS) models which produce human-like speech have emerged. Recently, by introducing syntactic and semantic information with respect to the input text, various approaches have been proposed to enrich the naturalness and expressiveness of TTS models. Although these strategies showed impressive results, they still have some limitations in utilizing language information. First, most approaches only use graph networks to utilize syntactic and semantic information without considering linguistic features. Second, most previous works do not explicitly consider adjacent words when encoding syntactic and semantic information, even though adjacent words are usually meaningful when encoding the current word. To address these issues, we propose Relation-aware Word Encoding Network (RWEN), which effectively incorporates syntactic and semantic information based on two modules (i.e., Semantic-level Relation Encoding and Adjacent Word Relation Encoding). Experimental results show substantial improvements compared to previous works.

Submitted 15 December, 2022; originally announced December 2022.

Comments: Accepted to AAAI 2023

arXiv:2209.06305 [pdf]

Ptychographic lens-less polarization microscopy

Authors: Jeongsoo Kim, Seungri Song, Bora Kim, Mirae Park, Seung Jae Oh, Daesuk Kim, Barry Cense, Yong-Min Huh, Joo Yong Lee, Chulmin Joo

Abstract: Birefringence, an inherent characteristic of optically anisotropic materials, is widely utilized in various imaging applications ranging from material characterizations to clinical diagnosis. Polarized light microscopy enables high-resolution, high-contrast imaging of optically anisotropic specimens, but it is associated with mechanical rotations of polarizer/analyzer and relatively complex optical designs. Here, we present a novel form of polarization-sensitive microscopy capable of birefringence imaging of transparent objects without an optical lens and any moving parts. Our method exploits an optical mask-modulated polarization image sensor and single-input-state LED illumination design to obtain complex and birefringence images of the object via ptychographic phase retrieval. Using a camera with a pixel resolution of 3.45 um, the method achieves birefringence imaging with a half-pitch resolution of 2.46 um over a 59.74 mm^2 field-of-view, which corresponds to a space-bandwidth product of 9.9 megapixels. We demonstrate the high-resolution, large-area birefringence imaging capability of our method by presenting the birefringence images of various anisotropic objects, including a birefringent resolution target, liquid crystal polymer depolarizer, monosodium urate crystal, and excised mouse eye and heart tissues.

Submitted 13 September, 2022; originally announced September 2022.

Comments: 18 pages, 10 figures, author names corrected
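
The space-bandwidth product quoted in the abstract above can be checked with a few lines of arithmetic. This is only a sketch: it assumes that one resolvable element occupies a square whose side equals the half-pitch resolution, which is a common convention but is not stated in the listing itself. The variable names are illustrative.

```python
# Values quoted in the abstract above.
field_of_view_mm2 = 59.74   # imaged area in mm^2
half_pitch_um = 2.46        # half-pitch resolution in micrometres

# Assumption: each resolvable element is a half_pitch x half_pitch square.
UM2_PER_MM2 = 1_000_000     # 1 mm^2 = 10^6 um^2
field_of_view_um2 = field_of_view_mm2 * UM2_PER_MM2
space_bandwidth_product = field_of_view_um2 / half_pitch_um ** 2

print(f"{space_bandwidth_product / 1e6:.1f} megapixels")
```

Under that assumption the result rounds to the 9.9 megapixels reported by the authors.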

arXiv:2207.10760 [pdf, ps, other]

A Proposal for Foley Sound Synthesis Challenge

Authors: Keunwoo Choi, Sangshin Oh, Minsung Kang, Brian McFee

Abstract: "Foley" refers to sound effects that are added to multimedia during post-production to enhance its perceived acoustic properties, e.g., by simulating the sounds of footsteps, ambient environmental sounds, or visible objects on the screen. While foley is traditionally produced by foley artists, there is increasing interest in automatic or machine-assisted techniques building upon recent advances in sound synthesis and generative models. To foster more participation in this growing research area, we propose a challenge for automatic foley synthesis. Through case studies on successful previous challenges in audio and machine learning, we set the goals of the proposed challenge: rigorous, unified, and efficient evaluation of different foley synthesis systems, with an overarching goal of drawing active participation from the research community. We outline the details and design considerations of a foley sound synthesis challenge, including task definition, dataset requirements, and evaluation criteria.

Submitted 21 July, 2022; originally announced July 2022.

arXiv:2207.10324 [pdf, other]

Enhancing Generative Networks for Chest Anomaly Localization through Automatic Registration-Based Unpaired-to-Pseudo-Paired Training Data Translation

Authors: Kyungsu Kim, Seong Je Oh, Chae Yeon Lim, Ju Hwan Lee, Tae Uk Kim, Myung Jin Chung

Abstract: Image translation based on a generative adversarial network (GAN-IT) is a promising method for the precise localization of abnormal regions in chest X-ray images (AL-CXR) even without pixel-level annotation. However, heterogeneous unpaired datasets hinder existing methods from extracting key features and distinguishing normal from abnormal cases, resulting in inaccurate and unstable AL-CXR. To address this problem, we propose an improved two-stage GAN-IT involving registration and data augmentation. For the first stage, we introduce an advanced deep-learning-based registration technique that virtually and reasonably converts unpaired data into paired data for learning registration maps, by sequentially utilizing linear-based global and uniform coordinate transformation and AI-based non-linear coordinate fine-tuning. This approach enables independent and complex coordinate transformation of each detailed location of the lung while recognizing the entire lung structure, thereby achieving higher registration performance while resolving inherent artifacts caused by unpaired conditions. For the second stage, we apply data augmentation to diversify anomaly locations by swapping the left and right lung regions on the uniform registered frames, further improving the performance by alleviating imbalance in data distribution showing left and right lung lesions. The proposed method is model agnostic and shows consistent AL-CXR performance improvement in representative AI models. Therefore, we believe GAN-IT for AL-CXR can be clinically implemented by using our basis framework, even if learning data are scarce or pixel-level disease annotation is difficult.

Submitted 15 June, 2024; v1 submitted 21 July, 2022; originally announced July 2022.

arXiv:2206.14984 [pdf, other]

TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis using ranking support vector machine with variational autoencoder

Authors: Eunwoo Song, Ryuichi Yamamoto, Ohsung Kwon, Chan-Ho Song, Min-Jae Hwang, Suhyeon Oh, Hyun-Wook Yoon, Jin-Seob Kim, Jae-Min Kim

Abstract: Recent advances in synthetic speech quality have enabled us to train text-to-speech (TTS) systems by using synthetic corpora. However, merely increasing the amount of synthetic data is not always advantageous for improving training efficiency. Our aim in this study is to selectively choose synthetic data that are beneficial to the training process. In the proposed method, we first adopt a variational autoencoder whose posterior distribution is utilized to extract latent features representing acoustic similarity between the recorded and synthetic corpora. By using those learned features, we then train a ranking support vector machine (RankSVM) that is well known for effectively ranking relative attributes among binary classes. By setting the recorded and synthetic ones as two opposite classes, RankSVM is used to determine how acoustically similar the synthesized speech is to the recorded data. Then, synthetic TTS data, whose distribution is close to the recorded data, are selected from large-scale synthetic corpora. By using these data for retraining the TTS model, the synthetic quality can be significantly improved. Objective and subjective evaluation results show the superiority of the proposed method over the conventional methods.

Submitted 29 June, 2022; originally announced June 2022.

Comments: Accepted to the conference of INTERSPEECH 2022

arXiv:2206.13504 [pdf, other]

AI-based computer-aided diagnostic system of chest digital tomography synthesis: Demonstrating comparative advantage with X-ray-based AI systems

Authors: Kyung-Su Kim, Ju Hwan Lee, Seong Je Oh, Myung Jin Chung

Abstract: Compared with chest X-ray (CXR) imaging, which is a single image projected from the front of the patient, chest digital tomosynthesis (CDTS) imaging can be more advantageous for lung lesion detection because it acquires multiple images projected from multiple angles of the patient. Various clinical comparative analysis and verification studies have been reported to demonstrate this, but there were no artificial intelligence (AI)-based comparative analysis studies. Existing AI-based computer-aided detection (CAD) systems for lung lesion diagnosis have been developed mainly based on CXR images; however, CAD based on CDTS, which uses multi-angle images of patients in various directions, has not been proposed and verified for its usefulness compared to CXR-based counterparts. This study develops and tests a CDTS-based AI CAD system to detect lung lesions to demonstrate performance improvements compared to CXR-based AI CAD. We used multiple projection images as input for the CDTS-based AI model and a single-projection image as input for the CXR-based AI model to fairly compare and evaluate the performance between models. The proposed CDTS-based AI CAD system yielded sensitivities of 0.782 and 0.785 and accuracies of 0.895 and 0.837 for the performance of detecting tuberculosis and pneumonia, respectively, against normal subjects. These results show higher performance than sensitivities of 0.728 and 0.698 and accuracies of 0.874 and 0.826 for detecting tuberculosis and pneumonia through the CXR-based AI CAD, which only uses a single projection image in the frontal direction. We found that CDTS-based AI CAD improved the sensitivity for tuberculosis and pneumonia by 5.4 and 8.7 percentage points, respectively, compared to CXR-based AI CAD without loss of accuracy. Therefore, we comparatively prove that CDTS-based AI CAD technology can improve performance more than CXR, enhancing the clinical applicability of CDTS.

Submitted 18 June, 2022; originally announced June 2022.

Comments: Kyung-Su Kim, Ju Hwan Lee, and Seong Je Oh have contributed equally to this work as co-first authors. Kyung-Su Kim and Myung Jin Chung have contributed equally to this work as co-corresponding authors
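
The sensitivity gains reported in the abstract above follow directly from the quoted sensitivities. The sketch below recomputes them, using only the four numbers given in the abstract; the dictionary names are illustrative. It shows that the 5.4 and 8.7 figures are absolute differences (percentage points), and also prints the corresponding relative gains for comparison.

```python
# Sensitivities quoted in the abstract above.
cdts_sensitivity = {"tuberculosis": 0.782, "pneumonia": 0.785}
cxr_sensitivity = {"tuberculosis": 0.728, "pneumonia": 0.698}

# Absolute gain in percentage points: (CDTS - CXR) * 100.
absolute_gain_points = {
    disease: round((cdts_sensitivity[disease] - cxr_sensitivity[disease]) * 100, 1)
    for disease in cdts_sensitivity
}

# Relative gain in percent of the CXR baseline, shown for comparison only.
relative_gain_percent = {
    disease: round(
        (cdts_sensitivity[disease] - cxr_sensitivity[disease])
        / cxr_sensitivity[disease] * 100,
        1,
    )
    for disease in cdts_sensitivity
}

print(absolute_gain_points)
print(relative_gain_percent)
```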

arXiv:2206.13385 [pdf, other]

3D unsupervised anomaly detection and localization through virtual multi-view projection and reconstruction: Clinical validation on low-dose chest computed tomography

Authors: Kyung-Su Kim, Seong Je Oh, Ju Hwan Lee, Myung Jin Chung

Abstract: Computer-aided diagnosis for low-dose computed tomography (CT) based on deep learning has recently attracted attention as a first-line automatic testing tool because of its high accuracy and low radiation exposure. However, existing methods rely on supervised learning, imposing an additional burden on doctors for collecting disease data or annotating spatial labels for network training, consequently hindering their implementation. We propose a method based on a deep neural network for computer-aided diagnosis called virtual multi-view projection and reconstruction for unsupervised anomaly detection. Presumably, this is the first method that only requires data from healthy patients for training to identify three-dimensional (3D) regions containing any anomalies. The method has three key components. Unlike existing computer-aided diagnosis tools that use conventional CT slices as the network input, our method 1) improves the recognition of 3D lung structures by virtually projecting an extracted 3D lung region to obtain two-dimensional (2D) images from diverse views to serve as network inputs, 2) accommodates the input diversity gain for accurate anomaly detection, and 3) achieves 3D anomaly/disease localization through a novel 3D map restoration method using multiple 2D anomaly maps. The proposed method based on unsupervised learning improves the patient-level anomaly detection by 10% (area under the curve, 0.959) compared with a gold standard based on supervised learning (area under the curve, 0.848), and it localizes the anomaly region with 93% accuracy, demonstrating its high performance.

Submitted 18 June, 2022; originally announced June 2022.

Comments: Kyung-Su Kim and Seong Je Oh have contributed equally to this work as co-first authors. Kyung-Su Kim and Myung Jin Chung have contributed equally to this work as co-corresponding authors

arXiv:2205.04104 [pdf, other]

ReCAB-VAE: Gumbel-Softmax Variational Inference Based on Analytic Divergence

Authors: Sangshin Oh, Seyun Um, Hong-Goo Kang

Abstract: The Gumbel-softmax distribution, or Concrete distribution, is often used to relax the discrete characteristics of a categorical distribution and enable back-propagation through differentiable reparameterization. Although it reliably yields low variance gradients, it still relies on a stochastic sampling process for optimization. In this work, we present a relaxed categorical analytic bound (ReCAB), a novel divergence-like metric which corresponds to the upper bound of the Kullback-Leibler divergence (KLD) of a relaxed categorical distribution. The proposed metric is easy to implement because it has a closed form solution, and empirical results show that it is close to the actual KLD. Along with this new metric, we propose a relaxed categorical analytic bound variational autoencoder (ReCAB-VAE) that successfully models both continuous and relaxed discrete latent representations. We implement an emotional text-to-speech synthesis system based on the proposed framework, and show that the proposed system flexibly and stably controls emotion expressions with better speech quality compared to baselines that use stochastic estimation or categorical distribution approximation.

Submitted 9 May, 2022; originally announced May 2022.
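
For readers unfamiliar with the stochastic sampling step that the abstract above contrasts with its analytic bound, the following is a minimal sketch of standard Gumbel-softmax sampling. It is not the ReCAB metric, whose closed form is not given in this listing. The function name, the example logits, and the temperature value are illustrative assumptions.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature, rng):
    """Draw one relaxed one-hot sample from the Gumbel-softmax distribution."""
    perturbed = []
    for logit in logits:
        # Gumbel(0, 1) noise via inverse transform: g = -log(-log(u)).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep u strictly in (0, 1)
        gumbel_noise = -math.log(-math.log(u))
        perturbed.append((logit + gumbel_noise) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(perturbed)
    exponentials = [math.exp(value - peak) for value in perturbed]
    total = sum(exponentials)
    return [value / total for value in exponentials]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = gumbel_softmax_sample([1.0, 0.5, -1.0], temperature=0.5, rng=rng)
print(sample)  # three non-negative weights that sum to 1
```

Lower temperatures push the sample toward a one-hot vector, while higher temperatures make it closer to uniform. Each call draws fresh noise, which is the stochasticity the abstract refers to.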

arXiv:2202.04328 [pdf, other]

CAU_KU team's submission to ADD 2022 Challenge task 1: Low-quality fake audio detection through frequency feature masking

Authors: Il-Youp Kwak, Sunmook Choi, Jonghoon Yang, Yerin Lee, Seungsang Oh

Abstract: This technical report describes Chung-Ang University and Korea University (CAU_KU) team's model participating in the Audio Deep Synthesis Detection (ADD) 2022 Challenge, track 1: Low-quality fake audio detection. For track 1, we propose a frequency feature masking (FFM) augmentation technique to deal with a low-quality audio environment. We applied FFM and mixup augmentation on five spectrogram-based deep neural network architectures that performed well for spoofing detection using mel-spectrogram and constant Q transform (CQT) features. Our best submission achieved an EER of 23.8%, ranking 3rd on track 1.

Submitted 9 February, 2022; originally announced February 2022.
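
The abstract above does not spell out how frequency feature masking works. As a rough illustration of the general idea of masking frequency content in a spectrogram for augmentation, the sketch below zeroes one randomly placed band of frequency bins. Everything here is an assumption for illustration: the function name, the single-band policy, the zero fill value, and the list-of-rows layout are hypothetical and may differ from the authors' FFM.

```python
import random


def mask_frequency_band(spectrogram, max_width, rng, fill_value=0.0):
    """Return a copy of `spectrogram` with one random band of frequency rows masked.

    `spectrogram` is a list of frequency bins, each a list of values over time.
    """
    n_bins = len(spectrogram)
    width = rng.randint(0, min(max_width, n_bins))  # band height in bins (may be 0)
    start = rng.randint(0, n_bins - width)          # lowest masked bin index
    masked = [row[:] for row in spectrogram]        # copy so the input stays intact
    for bin_index in range(start, start + width):
        masked[bin_index] = [fill_value] * len(masked[bin_index])
    return masked, (start, width)


# Toy input: 8 frequency bins by 4 time frames, all ones.
spectrogram = [[1.0] * 4 for _ in range(8)]
masked, band = mask_frequency_band(spectrogram, max_width=3, rng=random.Random(0))
print(band)
```

Returning the band position alongside the masked copy makes it easy to log or visualise which frequencies were hidden from the model during training.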

arXiv:2112.00931 [pdf, other]

Antenna Selection in Polarization Reconfigurable MIMO (PR-MIMO) Communication Systems

Authors: Paul S. Oh, Sean S. Kwon, Andreas F. Molisch

Abstract: Adaptation of a wireless system to the polarization state of the propagation channel can improve reliability and throughput. This paper in particular considers polarization reconfigurable multiple input multiple output (PR-MIMO) systems, where both transmitter and receiver can change the (linear) polarization orientation at each element of their antenna arrays. We first introduce joint polarization pre-post coding to maximize bounds on the capacity and the maximum eigenvalue of the channel matrix. For this we first derive approximate closed form equations of optimal polarization vectors at one link end, and then use iterative joint polarization pre-post coding to pursue joint optimal polarization vectors at both link ends. Next we investigate the combination of PR-MIMO with hybrid antenna selection / maximum ratio transmission (PR-HS/MRT), which can achieve a remarkable improvement of channel capacity and symbol error rate (SER). Further, two novel schemes of element wise and global polarization reconfiguration are presented for PR-HS/MRT. Comprehensive simulation results indicate that the proposed schemes provide 3 to 5 dB SNR gain in PR-MIMO spatial multiplexing and approximately 3 dB SNR gain in PR-HS/MRT, with concomitant improvements of channel capacity and SER.

Submitted 2 April, 2024; v1 submitted 1 December, 2021; originally announced December 2021.

arXiv:2106.04165 [pdf, other]

Neural Hybrid Automata: Learning Dynamics with Multiple Modes and Stochastic Transitions

Authors: Michael Poli, Stefano Massaroli, Luca Scimeca, Seong Joon Oh, Sanghyuk Chun, Atsushi Yamashita, Hajime Asama, Jinkyoo Park, Animesh Garg

Abstract: Effective control and prediction of dynamical systems often require appropriate handling of continuous-time and discrete, event-triggered processes. Stochastic hybrid systems (SHSs), common across engineering domains, provide a formalism for dynamical systems subject to discrete, possibly stochastic, state jumps and multi-modal continuous-time flows. Despite the versatility and importance of SHSs across applications, a general procedure for the explicit learning of both discrete events and multi-mode continuous dynamics remains an open problem. This work introduces Neural Hybrid Automata (NHAs), a recipe for learning SHS dynamics without a priori knowledge on the number of modes and inter-modal transition dynamics. NHAs provide a systematic inference method based on normalizing flows, neural differential equations and self-supervision. We showcase NHAs on several tasks, including mode recovery and flow learning in systems with stochastic transitions, and end-to-end learning of hierarchical robot controllers.

Submitted 8 June, 2021; originally announced June 2021.

arXiv:2104.10781 [pdf, other]

NTIRE 2021 Challenge on Quality Enhancement of Compressed Video: Methods and Results

Authors: Ren Yang, Radu Timofte, Jing Liu, Yi Xu, Xinjian Zhang, Minyi Zhao, Shuigeng Zhou, Kelvin C. K. Chan, Shangchen Zhou, Xiangyu Xu, Chen Change Loy, Xin Li, Fanglong Liu, He Zheng, Lielin Jiang, Qi Zhang, Dongliang He, Fu Li, Qingqing Dang, Yibin Huang, Matteo Maggioni, Zhongqian Fu, Shuai Xiao, Cheng li, Thomas Tanay, et al. (47 additional authors not shown)

Abstract: This paper reviews the first NTIRE challenge on quality enhancement of compressed video, with a focus on the proposed methods and results. In this challenge, the new Large-scale Diverse Video (LDV) dataset is employed. The challenge has three tracks. Tracks 1 and 2 aim at enhancing the videos compressed by HEVC at a fixed QP, while Track 3 is designed for enhancing the videos compressed by x265 at a fixed bit-rate. In addition, Tracks 1 and 3 target improving the fidelity (PSNR), while Track 2 targets enhancing the perceptual quality. The three tracks attracted 482 registrations in total. In the test phase, 12 teams, 8 teams and 11 teams submitted the final results of Tracks 1, 2 and 3, respectively. The proposed methods and solutions gauge the state-of-the-art of video quality enhancement. The homepage of the challenge: https://github.com/RenYang-home/NTIRE21_VEnh

Submitted 31 August, 2022; v1 submitted 21 April, 2021; originally announced April 2021.

Comments: Corrected the MOS values in Table 2, and corrected some minor typos

arXiv:2102.02463 [pdf]

DIFFnet: Diffusion parameter mapping network generalized for input diffusion gradient schemes and bvalues

Authors: Juhung Park, Woojin Jung, Eun-Jung Choi, Se-Hong Oh, Dongmyung Shin, Hongjun An, Jongho Lee

Abstract: In MRI, deep neural networks have been proposed to reconstruct diffusion model parameters. However, the inputs of the networks were designed for a specific diffusion gradient scheme (i.e., diffusion gradient directions and numbers) and a specific b-value that are the same as the training data. In this study, a new deep neural network, referred to as DIFFnet, is developed to function as a generalized reconstruction tool of the diffusion-weighted signals for various gradient schemes and b-values. For generalization, diffusion signals are normalized in a q-space and then projected and quantized, producing a matrix (Qmatrix) as an input for the network. To demonstrate the validity of this approach, DIFFnet is evaluated for diffusion tensor imaging (DIFFnetDTI) and for neurite orientation dispersion and density imaging (DIFFnetNODDI). In each model, two datasets with different gradient schemes and b-values are tested. The results demonstrate accurate reconstruction of the diffusion parameters at substantially reduced processing time (approximately 8.7 times and 2240 times faster processing time than conventional methods in DTI and NODDI, respectively; less than 4% mean normalized root-mean-square errors (NRMSE) in DTI and less than 8% in NODDI). The generalization capability of the networks was further validated using reduced numbers of diffusion signals from the datasets. Different from previously proposed deep neural networks, DIFFnet does not require any specific gradient scheme and b-value for its input. As a result, it can be adopted as an online reconstruction tool for various complex diffusion imaging.

Submitted 4 February, 2021; originally announced February 2021.

arXiv:2101.10300 [pdf, other]

doi 10.1109/ICC42927.2021.9500671

Channel Estimation via Successive Denoising in MIMO OFDM Systems: A Reinforcement Learning Approach

Authors: Myeung Suk Oh, Seyyedali Hosseinalipour, Taejoon Kim, Christopher G. Brinton, David J. Love

Abstract: In general, reliable communication via multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) requires accurate channel estimation at the receiver. The existing literature largely focuses on denoising methods for channel estimation that depend on either (i) channel analysis in the time-domain with prior channel knowledge or (ii) supervised learning techniques which require large pre-labeled datasets for training. To address these limitations, we present a frequency-domain denoising method based on a reinforcement learning framework that does not need a priori channel knowledge and pre-labeled data. Our methodology includes a new successive channel denoising process based on channel curvature computation, for which we obtain a channel curvature magnitude threshold to identify unreliable channel estimates. Based on this process, we formulate the denoising mechanism as a Markov decision process, where we define the actions through a geometry-based channel estimation update, and the reward function based on a policy that reduces mean squared error (MSE). We then resort to Q-learning to update the channel estimates. Numerical results verify that our denoising algorithm can successfully mitigate noise in channel estimates. In particular, our algorithm provides a significant improvement over the practical least squares (LS) estimation method and provides performance that approaches that of the ideal linear minimum mean square error (LMMSE) estimation with perfect knowledge of channel statistics.

Submitted 27 March, 2024; v1 submitted 25 January, 2021; originally announced January 2021.

Comments: This paper has been published in the proceedings of 2021 IEEE International Conference on Communications (ICC)
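
The abstract above relies on Q-learning over a Markov decision process. As a reminder of the generic tabular update rule, here is a minimal sketch. The state labels, action names, reward, learning rate, and discount factor are hypothetical placeholders; the paper's geometry-based actions and MSE-driven reward are not reproduced here.

```python
from collections import defaultdict


def q_learning_update(q_table, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step and return the updated Q(state, action)."""
    # Greedy estimate of the value of the next state.
    best_next = max(q_table[(next_state, a)] for a in actions)
    # Temporal-difference target and in-place update.
    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])
    return q_table[(state, action)]


# Hypothetical toy problem: two actions that nudge an estimate up or down.
q_table = defaultdict(float)            # unseen (state, action) pairs start at 0
actions = ["raise", "lower"]
new_value = q_learning_update(q_table, "s0", "raise", reward=1.0,
                              next_state="s1", actions=actions)
print(new_value)
```

With an all-zero table, the first update moves Q(s0, raise) a fraction `alpha` of the way toward the observed reward.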

arXiv:2012.02859 [pdf, other]

Idle speed control with low-complexity offset-free explicit model predictive control in presence of system delay

Authors: Sang Hwan Son, Se-Kyu Oh, Byung Jun Park, Min Jun Song, Jong Min Lee

Abstract: The requirement for continual improvement of idle speed control (ISC) performance is increasing due to the stringent regulation on emission and fuel economy these days. In this regard, a low-complexity offset-free explicit model predictive control (EMPC) with constraint horizon is designed to regulate the idle speed under unmeasured disturbance in presence of system delay with rigorous formulation. Particularly, we developed a high-fidelity 4-stroke gasoline-direct injected spark-ignited engine model based on first-principles and test vehicle driving data, and designed a model predictive ISC system. To handle the delay from intake to torque production, we constructed a control-oriented model with delay augmentation. To reject the influence of torque loss, we implemented the offset-free MPC scheme with disturbance model and estimator. Moreover, to deal with the limited capacity assigned for the controller in the engine control unit and the short sampling instant of the engine system, we formulated a low-complexity multiparametric quadratic program with constraint horizon in presence of system delay in state and input variables, and obtained an explicit solution map. To demonstrate the performance of the designed controller, a series of closed-loop simulations were performed. The developed explicit controller showed proper ISC performance in presence of torque loss and system delay.

Submitted 13 December, 2020; v1 submitted 4 December, 2020; originally announced December 2020.

arXiv:2011.08061 [pdf, other]

FRDet: Balanced and Lightweight Object Detector based on Fire-Residual Modules for Embedded Processor of Autonomous Driving

Authors: Seontaek Oh, Ji-Hwan You, Young-Keun Kim

Abstract: For deployment on an embedded processor for autonomous driving, the object detection network should satisfy all of the accuracy, real-time inference, and light model size requirements. Conventional deep CNN-based detectors aim for high accuracy, making their model size heavy for an embedded system with limited memory space. In contrast, lightweight object detectors are greatly compressed but at a significant sacrifice of accuracy. Therefore, we propose FRDet, a lightweight one-stage object detector that is balanced to satisfy all the constraints of accuracy, model size, and real-time processing on an embedded GPU processor for autonomous driving applications. Our network aims to maximize the compression of the model while achieving or surpassing YOLOv3 level of accuracy. This paper proposes the Fire-Residual (FR) module to design a lightweight network with low accuracy loss by adapting fire modules with residual skip connections. In addition, the Gaussian uncertainty modeling of the bounding box is applied to further enhance the localization accuracy. Experiments on the KITTI dataset showed that FRDet reduced the memory size by 50.8% but achieved higher accuracy by 1.12% mAP compared to YOLOv3. Moreover, the real-time detection speed reached 31.3 FPS on an embedded GPU board (NVIDIA Xavier). The proposed network achieved higher compression with comparable accuracy compared to other deep CNN object detectors while showing higher accuracy than the lightweight detector baselines. Therefore, the proposed FRDet is a well-balanced and efficient object detector for practical application in autonomous driving that satisfies all the criteria of accuracy, real-time inference, and light model size.

Submitted 16 November, 2020; originally announced November 2020.

arXiv:2011.07673 [pdf, other]

Spatiotemporal Characteristics of Ride-sourcing Operation in Urban Area

Authors: Simon Oh, Daniel Kondor, Ravi Seshadri, Meng Zhou, Diem-Trinh Le, Moshe Ben-Akiva

Abstract: The emergence of ride-sourcing platforms has brought an innovative alternative in transportation, radically changed travel behaviors, and suggested new directions for transportation planners and operators. This paper provides an exploratory analysis on the operations of a ride-sourcing service using large-scale data on service performance. Observations over multiple days in Singapore suggest reproducible demand patterns and provide empirical estimates of fleet operations over time and space. During peak periods, we observe significant increases in the service rate along with surge price multipliers. We perform an in-depth analysis of fleet utilization rates and are able to explain daily patterns based on drivers' behavior by involving the number of shifts, shift duration, and shift start and end time choices. We also evaluate metrics of user experience, namely waiting and travel time distribution, and explain our empirical findings with distance metrics from driver trajectory analysis and congestion patterns. Our results of empirical observations on actual service in Singapore can help to understand the spatiotemporal characteristics of ride-sourcing services and provide important insights for transportation planning and operations.

Submitted 15 November, 2020; originally announced November 2020.

Comments: 18 pages, 11 figures, 5 tables

arXiv:2010.11585 [pdf]

doi 10.3390/futuretransp1030034

A simulation-based evaluation of a Cargo-Hitching service for E-commerce using mobility-on-demand vehicles

Authors: Andre Alho, Takanori Sakai, Simon Oh, Cheng Cheng, Ravi Seshadri, Wen Han Chong, Yusuke Hara, Julia Caravias, Lynette Cheah, Moshe Ben-Akiva

Abstract: Time-sensitive parcel deliveries, shipments requested for delivery in a day or less, are an increasingly important research subject. It is challenging to deal with these deliveries from a carrier perspective since it entails additional planning constraints, preventing an efficient consolidation of deliveries which is possible when demand is well known in advance. Furthermore, such time-sensitive deliveries are requested to a wider spatial scope than retail centers, including homes and offices. Therefore, an increase in such deliveries is considered to exacerbate negative externalities such as congestion and emissions. One of the solutions is to leverage spare capacity in passenger transport modes. This concept is often denominated as cargo-hitching. While there are various possible system designs, it is crucial that such a solution does not deteriorate the quality of service of passenger trips. This research aims to evaluate the use of Mobility-On-Demand services to perform same-day parcel deliveries. For this purpose, we use SimMobility, a high-resolution agent-based simulation platform of passenger and freight flows, applied in Singapore. E-commerce demand carrier data are used to characterize simulated parcel delivery demand. Operational scenarios that aim to minimize the adverse effect of fulfilling deliveries with Mobility-On-Demand vehicles on Mobility-On-Demand passenger flows (fulfillment, wait and travel times) are explored. Results indicate that the Mobility-On-Demand services have potential to fulfill a considerable amount of parcel deliveries and decrease freight vehicle traffic and total vehicle-kilometers-travelled without compromising the quality of Mobility-On-Demand for passenger travel.

Submitted 22 October, 2020; originally announced October 2020.

Comments: 19 pages, 4 tables, 7 figures. Submitted to Transportation (Springer)

Journal ref: Future Transp. 2021, 1, 639-656

arXiv:2010.10282 [pdf, other]

User-Number Threshold-based Base Station On/Off Control for Maximizing Coverage Probability

Authors: Jung-Hoon Noh, Seong-Jun Oh

Abstract: In this study, we investigate the operation of user-number threshold-based base station (BS) on/off control, in which the BS turns off when the number of active users is less than a specific threshold value. This paper presents a space-based analysis of the BS on/off control system to which a stochastic geometric approach is applied. In particular, we derive the approximated closed-form expression of the coverage probability of a homogeneous network (HomNet) with the user-number threshold-based on/off control. Moreover, the optimal user-number threshold for maximizing the coverage probability is analytically derived. In addition to HomNet, we also derive the overall coverage probability and the optimal user-number thresholds for a heterogeneous network (HetNet). The results show that HetNet, the analysis of which seems intractable, can be analyzed in the form of a linear combination of HomNets with weighted densities. In addition, the optimal user-number threshold of each tier is obtained independently of other tiers. The modeling and analysis presented in this paper are not limited to the case of user-number threshold-based on/off control, but are also applicable to other novel on/off controls with minor modifications. Finally, by comparing with the simulated results, the theoretical contributions of this study are validated.

Submitted 20 October, 2020; originally announced October 2020.

arXiv:2008.10584 [pdf, other]

Accurate Alignment Inspection System for Low-resolution Automotive and Mobility LiDAR

Authors: Seontake Oh, Ji-Hwan You, Azim Eskandarian, Young-Keun Kim

Abstract: A misalignment of LiDAR as low as a few degrees could cause a significant error in obstacle detection and mapping that could cause safety and quality issues. In this paper, an accurate inspection system is proposed for estimating a LiDAR alignment error after sensor attachment on a mobility system such as a vehicle or robot. The proposed method uses only a single target board at the fixed position to estimate the three orientations (roll, tilt, and yaw) and the horizontal position of the LiDAR attachment with sub-degree and millimeter level accuracy. After the proposed preprocessing steps, the feature beam points that are the closest to each target corner are extracted and used to calculate the sensor attachment pose with respect to the target board frame using a nonlinear optimization method and with a low computational cost. The performance of the proposed method is evaluated using a test bench that can control the reference yaw and horizontal translation of LiDAR within ranges of 3 degrees and 30 millimeters, respectively. The experimental results for a low-resolution 16 channel LiDAR (Velodyne VLP-16) confirmed that misalignment could be estimated with accuracy within 0.2 degrees and 4 mm. The high accuracy and simplicity of the proposed system make it practical for large-scale industrial applications such as automobile or robot manufacturing processes that inspect the sensor attachment for safety quality control.

Submitted 24 August, 2020; originally announced August 2020.

arXiv:2008.10542 [pdf, other]

Automatic LiDAR Extrinsic Calibration System using Photodetector and Planar Board for Large-scale Applications

Authors: Ji-Hwan You, Seon Taek Oh, Jae-Eun Park, Azim Eskandarian, Young-Keun Kim

Abstract: This paper presents a novel automatic calibration system to estimate the extrinsic parameters of LiDAR mounted on a mobile platform for sensor misalignment inspection in the large-scale production of highly automated vehicles. To obtain subdegree and subcentimeter accuracy levels of extrinsic calibration, this study proposed a new concept of a target board with embedded photodetector arrays, named the PD-target system, to find the precise position of the correspondence laser beams on the target surface. Furthermore, the proposed system requires only the simple design of the target board at the fixed pose in a close range to be readily applicable in the automobile manufacturing environment. The experimental evaluation of the proposed system on low-resolution LiDAR showed that the LiDAR offset pose can be estimated within 0.1 degree and 3 mm levels of precision. The high accuracy and simplicity of the proposed calibration system make it practical for large-scale applications for the reliability and safety of autonomous systems.

Submitted 24 August, 2020; originally announced August 2020.

Comments: prepost for IEEE journal

arXiv:2007.09597 [pdf]

doi 10.1016/j.neuroimage.2020.117432

DeepResp: Deep learning solution for respiration-induced B0 fluctuation artifacts in multi-slice GRE

Authors: Hongjun An, Hyeong-Geol Shin, Sooyoen Ji, Woojin Jung, Sehong Oh, Dongmyung Shin, Juhyung Park, Jongho Lee

Abstract: Respiration-induced B$_0$ fluctuation corrupts MRI images by inducing phase errors in k-space. A few approaches such as navigator have been proposed to correct for the artifacts at the expense of sequence modification. In this study, a new deep learning method, which is referred to as DeepResp, is proposed for reducing the respiration artifacts in multi-slice gradient echo (GRE) images. DeepResp is designed to extract the respiration-induced phase errors from a complex image using deep neural networks. Then, the network-generated phase errors are applied to the k-space data, creating an artifact-corrected image. For network training, the computer-simulated images were generated using artifact-free images and respiration data. When evaluated, both simulated images and in-vivo images of two different breathing conditions (deep breathing and natural breathing) show improvements (simulation: normalized root-mean-square error (NRMSE) from 7.8% to 1.3%; structural similarity (SSIM) from 0.88 to 0.99; ghost-to-signal-ratio (GSR) from 7.9% to 0.6%; deep breathing: NRMSE from 13.9% to 5.8%; SSIM from 0.86 to 0.95; GSR from 20.2% to 5.7%; natural breathing: NRMSE from 5.2% to 4.0%; SSIM from 0.94 to 0.97; GSR from 5.7% to 2.8%). Our approach does not require any modification of the sequence or additional hardware, and may therefore find useful applications. Furthermore, the deep neural networks extract respiration-induced phase errors, which is more interpretable and reliable than results of end-to-end trained networks.

Submitted 19 July, 2020; originally announced July 2020.

Comments: 19 pages

arXiv:2003.09171 [pdf, other]

DMV: Visual Object Tracking via Part-level Dense Memory and Voting-based Retrieval

Authors: Gunhee Nam, Seoung Wug Oh, Joon-Young Lee, Seon Joo Kim

Abstract: We propose a novel memory-based tracker via part-level dense memory and voting-based retrieval, called DMV. Since deep learning techniques have been introduced to the tracking field, Siamese trackers have attracted many researchers due to the balance between speed and accuracy. However, most of them are based on a single template matching, which limits the performance as it restricts the accessible information to the initial target features. In this paper, we relieve this limitation by maintaining an external memory that saves the tracking record. Part-level retrieval from the memory also liberates the information from the template and allows our tracker to better handle the challenges such as appearance changes and occlusions. By updating the memory during tracking, the representative power for the target object can be enhanced without online learning. We also propose a novel voting mechanism for the memory reading to filter out unreliable information in the memory. We comprehensively evaluate our tracker on OTB-100, TrackingNet, GOT-10k, LaSOT, and UAV123, which show that our method yields comparable results to the state-of-the-art methods.

Submitted 20 March, 2020; originally announced March 2020.

Comments: 19 pages, 9 figures

arXiv:2003.09124 [pdf, other]

Learning the Loss Functions in a Discriminative Space for Video Restoration

Authors: Younghyun Jo, Jaeyeon Kang, Seoung Wug Oh, Seonghyeon Nam, Peter Vajda, Seon Joo Kim

Abstract: With more advanced deep network architectures and learning schemes such as GANs, the performance of video restoration algorithms has greatly improved recently. Meanwhile, the loss functions for optimizing deep neural networks remain relatively unchanged. To this end, we propose a new framework for building effective loss functions by learning a discriminative space specific to a video restoration task. Our framework is similar to GANs in that we iteratively train two networks - a generator and a loss network. The generator learns to restore videos in a supervised fashion, by following ground truth features through the feature matching in the discriminative space learned by the loss network. In addition, we also introduce a new relation loss in order to maintain the temporal consistency in output videos. Experiments on video superresolution and deblurring show that our method generates visually more pleasing videos with better quantitative perceptual metric values than the other state-of-the-art methods.

Submitted 20 March, 2020; originally announced March 2020.

Comments: 24 pages

arXiv:1912.09015 [pdf]

doi 10.1109/TMI.2020.3018508

Deep Reinforcement Learning Designed Shinnar-Le Roux RF Pulse using Root-Flipping: DeepRF_SLR

Authors: Dongmyung Shin, Sooyeon Ji, Doohee Lee, Jieun Lee, Se-Hong Oh, Jongho Lee

Abstract: A novel approach of applying deep reinforcement learning to an RF pulse design is introduced. This method, which is referred to as DeepRF_SLR, is designed to minimize the peak amplitude or, equivalently, minimize the pulse duration of a multiband refocusing pulse generated by the Shinnar-Le Roux (SLR) algorithm. In the method, the root pattern of SLR polynomial, which determines the RF pulse shape, is optimized by iterative applications of deep reinforcement learning and greedy tree search. When tested for the designs of the multiband factors of three and seven RFs, DeepRF_SLR demonstrated improved performance compared to conventional methods, generating shorter duration RF pulses in shorter computational time. In the experiments, the RF pulse from DeepRF_SLR produced a slice profile similar to the minimum-phase SLR RF pulse and the profiles matched those of the computer simulation. Our approach suggests a new way of designing an RF by applying a machine learning algorithm, demonstrating a machine-designed MRI sequence.

Submitted 1 September, 2020; v1 submitted 18 December, 2019; originally announced December 2019.

Comments: Accepted at IEEE transactions on Medical Imaging (https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9174664)

arXiv:1911.04069 [pdf, other]

Generative Autoregressive Networks for 3D Dancing Move Synthesis from Music

Authors: Hyemin Ahn, Jaehun Kim, Kihyun Kim, Songhwai Oh

Abstract: This paper proposes a framework which is able to generate a sequence of three-dimensional human dance poses for a given piece of music. The proposed framework consists of three components: a music feature encoder, a pose generator, and a music genre classifier. We focus on integrating these components for generating a realistic 3D human dancing move from music, which can be applied to artificial agents and humanoid robots. The trained dance pose generator, which is a generative autoregressive model, is able to synthesize a dance sequence longer than 5,000 pose frames. Experimental results of generated dance sequences from various songs show how the proposed method generates human-like dancing moves to a given piece of music. In addition, a generated 3D dance sequence is applied to a humanoid robot, showing that the proposed framework can make a robot dance just by listening to music.

Submitted 10 November, 2019; originally announced November 2019.

Comments: 8 pages, 10 figures

arXiv:1911.03038 [pdf, other]

Turbo Autoencoder: Deep learning based channel codes for point-to-point communication channels

Authors: Yihan Jiang, Hyeji Kim, Himanshu Asnani, Sreeram Kannan, Sewoong Oh, Pramod Viswanath

Abstract: Designing codes that combat the noise in a communication medium has remained a significant area of research in information theory as well as wireless communications. Asymptotically optimal channel codes have been developed by mathematicians for communicating under canonical models after over 60 years of research. On the other hand, in many non-canonical channel settings, optimal codes do not exist and the codes designed for canonical models are adapted via heuristics to these channels and are thus not guaranteed to be optimal. In this work, we make significant progress on this problem by designing a fully end-to-end jointly trained neural encoder and decoder, namely, Turbo Autoencoder (TurboAE), with the following contributions: ($a$) under moderate block lengths, TurboAE approaches state-of-the-art performance under canonical channels; ($b$) moreover, TurboAE outperforms the state-of-the-art codes under non-canonical settings in terms of reliability. TurboAE shows that the development of channel coding design can be automated via deep learning, with near-optimal performance.

Submitted 7 November, 2019; originally announced November 2019.

arXiv:1911.01635 [pdf, other]

Emotional speech synthesis with rich and granularized control

Authors: Se-Yun Um, Sangshin Oh, Kyungguen Byun, Inseon Jang, Chunghyun Ahn, Hong-Goo Kang

Abstract: This paper proposes an effective emotion control method for an end-to-end text-to-speech (TTS) system. To flexibly control the distinct characteristic of a target emotion category, it is essential to determine embedding vectors representing the TTS input. We introduce an inter-to-intra emotional distance ratio algorithm to the embedding vectors that can minimize the distance to the target emotion category while maximizing its distance to the other emotion categories. To further enhance the expressiveness of a target speech, we also introduce an effective interpolation technique that enables the intensity of a target emotion to be gradually changed to that of neutral speech. Subjective evaluation results in terms of emotional expressiveness and controllability show the superiority of the proposed algorithm to the conventional methods.

Submitted 5 November, 2019; v1 submitted 5 November, 2019; originally announced November 2019.

Comments: Submitted to ICASSP 2020

arXiv:1908.05895 [pdf, other]

Distilling On-Device Intelligence at the Network Edge

Authors: Jihong Park, Shiqiang Wang, Anis Elgabli, Seungeun Oh, Eunjeong Jeong, Han Cha, Hyesung Kim, Seong-Lyun Kim, Mehdi Bennis

Abstract: Devices at the edge of wireless networks are the last mile data sources for machine learning (ML). As opposed to traditional ready-made public datasets, these user-generated private datasets reflect the freshest local environments in real time. They are thus indispensable for enabling mission-critical intelligent systems, ranging from fog radio access networks (RANs) to driverless cars and e-Health wearables. This article focuses on how to distill high-quality on-device ML models using fog computing, from such user-generated private data dispersed across wirelessly connected devices. To this end, we introduce communication-efficient and privacy-preserving distributed ML frameworks, termed fog ML (FML), wherein on-device ML models are trained by exchanging model parameters, model outputs, and surrogate data. We then present advanced FML frameworks addressing wireless RAN characteristics, limited on-device resources, and imbalanced data distributions. Our study suggests that the full potential of FML can be reached by co-designing communication and distributed ML operations while accounting for heterogeneous hardware specifications, data characteristics, and user requirements.

Submitted 16 August, 2019; originally announced August 2019.

Comments: 7 pages, 6 figures; This work has been submitted to the IEEE for possible publication

arXiv:1907.09707 [pdf, other]

RRNet: Repetition-Reduction Network for Energy Efficient Decoder of Depth Estimation

Authors: Sangyun Oh, Hye-Jin S. Kim, Jongeun Lee, Junmo Kim

Abstract: We introduce Repetition-Reduction network (RRNet) for resource-constrained depth estimation, offering significantly improved efficiency in terms of computation, memory and energy consumption. The proposed method is based on repetition-reduction (RR) blocks. The RR blocks consist of the set of repeated convolutions and the residual connection layer that take the place of the pointwise reduction layer with linear connection to the decoder. RRNet helps reduce memory usage and power consumption in the residual connections to the decoder layers. RRNet consumes approximately 3.84 times less energy and 3.06 times less memory and is approximately 2.21 times faster, without increasing the demand on hardware resource relative to the baseline network (Godard et al, CVPR'17), outperforming current state-of-the-art lightweight architectures such as SqueezeNet, ShuffleNet, MobileNet and PyDNet.

Submitted 31 July, 2019; v1 submitted 23 July, 2019; originally announced July 2019.

Comments: 9 pages, 5 figures

arXiv:1907.06426 [pdf, other]

Multi-hop Federated Private Data Augmentation with Sample Compression

Authors: Eunjeong Jeong, Seungeun Oh, Jihong Park, Hyesung Kim, Mehdi Bennis, Seong-Lyun Kim

Abstract: On-device machine learning (ML) has brought about the accessibility to a tremendous amount of data from the users while keeping their local data private instead of storing it in a central entity. However, for privacy guarantee, it is inevitable at each device to compensate for the quality of data or learning performance, especially when it has a non-IID training dataset. In this paper, we propose a data augmentation framework using a generative model: multi-hop federated augmentation with sample compression (MultFAug). A multi-hop protocol speeds up the end-to-end over-the-air transmission of seed samples by enhancing the transport capacity. The relaying devices guarantee stronger privacy preservation as well since the origin of each seed sample is hidden in those participants. For further privatization on the individual sample level, the devices compress their data samples. The devices sparsify their data samples prior to transmissions to reduce the sample size, which impacts the communication payload. This preprocessing also strengthens the privacy of each sample, which corresponds to the input perturbation for preserving sample privacy. The numerical evaluations show that the proposed framework significantly improves privacy guarantee, transmission delay, and local training performance with adjustment to the number of hops and compression rate.

Submitted 15 July, 2019; originally announced July 2019.

Comments: to be presented at the 28th International Joint Conference on Artificial Intelligence (IJCAI-19), 1st International Workshop on Federated Machine Learning for User Privacy and Data Confidentiality (FML'19), Macao, China

arXiv:1903.02295 [pdf, other]

DeepTurbo: Deep Turbo Decoder

Authors: Yihan Jiang, Hyeji Kim, Himanshu Asnani, Sreeram Kannan, Sewoong Oh, Pramod Viswanath

Abstract: Present-day communication systems routinely use codes that approach the channel capacity when coupled with a computationally efficient decoder. However, the decoder is typically designed for the Gaussian noise channel and is known to be sub-optimal for non-Gaussian noise distribution. Deep learning methods offer a new approach for designing decoders that can be trained and tailored for arbitrary channel statistics. We focus on Turbo codes and propose DeepTurbo, a novel deep learning based architecture for Turbo decoding. The standard Turbo decoder (Turbo) iteratively applies the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm with an interleaver in the middle. A neural architecture for Turbo decoding, termed NeuralBCJR, was proposed recently. There, the key idea is to create a module that imitates the BCJR algorithm using supervised learning, and to use the interleaver architecture along with this module, which is then fine-tuned using end-to-end training. However, knowledge of the BCJR algorithm is required to design such an architecture, which also constrains the resulting learned decoder. Here we remedy this requirement and propose a fully end-to-end trained neural decoder - Deep Turbo Decoder (DeepTurbo). With novel learnable decoder structure and training methodology, DeepTurbo reveals superior performance under both AWGN and non-AWGN settings as compared to the other two decoders - Turbo and NeuralBCJR. Furthermore, among all the three, DeepTurbo exhibits the lowest error floor.

Submitted 24 April, 2019; v1 submitted 6 March, 2019; originally announced March 2019.

arXiv:1811.12707 [pdf, other]

LEARN Codes: Inventing Low-latency Codes via Recurrent Neural Networks

Authors: Yihan Jiang, Hyeji Kim, Himanshu Asnani, Sreeram Kannan, Sewoong Oh, Pramod Viswanath

Abstract: Designing channel codes under low-latency constraints is one of the most demanding requirements in 5G standards. However, a sharp characterization of the performance of traditional codes is available only in the large block-length limit. Guided by such asymptotic analysis, code designs require large block lengths as well as latency to achieve the desired error rate. Tail-biting convolutional codes and other recent state-of-the-art short block codes, while promising reduced latency, are neither robust to channel-mismatch nor adaptive to varying channel conditions. When the codes designed for one channel (e.g., Additive White Gaussian Noise (AWGN) channel) are used for another (e.g., non-AWGN channels), heuristics are necessary to achieve non-trivial performance. In this paper, we first propose an end-to-end learned neural code, obtained by jointly designing a Recurrent Neural Network (RNN) based encoder and decoder. This code outperforms canonical convolutional code under block settings. We then leverage this experience to propose a new class of codes under low-latency constraints, which we call Low-latency Efficient Adaptive Robust Neural (LEARN) codes. These codes outperform state-of-the-art low-latency codes and exhibit robustness and adaptivity properties. LEARN codes show the potential to design new versatile and universal codes for future communications via tools of modern deep learning coupled with communication engineering insights.

Submitted 24 July, 2020; v1 submitted 30 November, 2018; originally announced November 2018.

arXiv:1811.02182 [pdf, ps, other]

doi 10.1109/LSP.2018.2880285

Unpaired Speech Enhancement by Acoustic and Adversarial Supervision for Speech Recognition

Authors: Geonmin Kim, Hwaran Lee, Bo-Kyeong Kim, Sang-Hoon Oh, Soo-Young Lee

Abstract: Many speech enhancement methods try to learn the relationship between noisy and clean speech, obtained using an acoustic room simulator. We point out several limitations of enhancement methods relying on clean speech targets; the goal of this work is proposing an alternative learning algorithm, called acoustic and adversarial supervision (AAS). AAS makes the enhanced output both maximizing the likelihood of transcription on the pre-trained acoustic model and having general characteristics of clean speech, which improve generalization on unseen noisy speeches. We employ the connectionist temporal classification and the unpaired conditional boundary equilibrium generative adversarial network as the loss function of AAS. AAS is tested on two datasets including additive noise without and with reverberation, Librispeech + DEMAND and CHiME-4. By visualizing the enhanced speech with different loss combinations, we demonstrate the role of each supervision. AAS achieves a lower word error rate than other state-of-the-art methods using the clean speech target in both datasets.

Submitted 6 November, 2018; originally announced November 2018.

Comments: To be published in IEEE Signal Processing Letters

Showing 1–50 of 50 results for author: Oh, S