Search | arXiv e-print repository

Acoustic Feature Mixup for Balanced Multi-aspect Pronunciation Assessment

Authors: Hee** Do, Wonjun Lee, Gary Geunbae Lee

Abstract: In automated pronunciation assessment, recent emphasis progressively lies on evaluating multiple aspects to provide enriched feedback. However, acquiring multi-aspect-score labeled data for non-native language learners' speech poses challenges; moreover, it often leads to score-imbalanced distributions. In this paper, we propose two Acoustic Feature Mixup strategies, linearly and non-linearly inte… ▽ More In automated pronunciation assessment, recent emphasis progressively lies on evaluating multiple aspects to provide enriched feedback. However, acquiring multi-aspect-score labeled data for non-native language learners' speech poses challenges; moreover, it often leads to score-imbalanced distributions. In this paper, we propose two Acoustic Feature Mixup strategies, linearly and non-linearly interpolating with the in-batch averaged feature, to address data scarcity and score-label imbalances. Primarily using goodness-of-pronunciation as an acoustic feature, we tailor mixup designs to suit pronunciation assessment. Further, we integrate fine-grained error-rate features by comparing speech recognition results with the original answer phonemes, giving direct hints for mispronunciation. Effective mixing of the acoustic features notably enhances overall scoring performances on the speechocean762 dataset, and detailed analysis highlights our potential to predict unseen distortions. △ Less

Submitted 21 June, 2024; originally announced June 2024.

Comments: Interspeech 2024

arXiv:2404.15533 [pdf, other]

Designing, simulating, and performing the 100-AV field test for the CIRCLES consortium: Methodology and Implementation of the Largest mobile traffic control experiment to date

Authors: Mostafa Ameli, Sean Mcquade, Jonathan W. Lee, Matthew Bunting, Matthew Nice, Han Wang, William Barbour, Ryan Weightman, Chris Denaro, Ryan Delorenzo, Sharon Hornstein, Jon F. Davis, Dan Timsit, Riley Wagner, Rita Xu, Malaika Mahmood, Mikail Mahmood, Maria Laura Delle Monache, Benjamin Seibold, Daniel B. Work, Jonathan Sprinkle, Benedetto Piccoli, Alexandre M. Bayen

Abstract: Previous controlled experiments on single-lane ring roads have shown that a single partially autonomous vehicle (AV) can effectively mitigate traffic waves. This naturally prompts the question of how these findings can be generalized to field operational, high-density traffic conditions. To address this question, the Congestion Impacts Reduction via CAV-in-the-loop Lagrangian Energy Smoothing (CIR… ▽ More Previous controlled experiments on single-lane ring roads have shown that a single partially autonomous vehicle (AV) can effectively mitigate traffic waves. This naturally prompts the question of how these findings can be generalized to field operational, high-density traffic conditions. To address this question, the Congestion Impacts Reduction via CAV-in-the-loop Lagrangian Energy Smoothing (CIRCLES) Consortium conducted MegaVanderTest (MVT), a live traffic control experiment involving 100 vehicles near Nashville, TN, USA. This article is a tutorial for develo** analytical and simulation-based tools essential for designing and executing a live traffic control experiment like the MVT. It presents an overview of the proposed roadmap and various procedures used in designing, monitoring, and conducting the MVT, which is the largest mobile traffic control experiment at the time. The design process is aimed at evaluating the impact of the CIRCLES AVs on surrounding traffic. The article discusses the agent-based traffic simulation framework created for this evaluation. A novel methodological framework is introduced to calibrate this microsimulation, aiming to accurately capture traffic dynamics and assess the impact of adding 100 vehicles to existing traffic. The calibration model's effectiveness is verified using data from a six-mile section of Nashville's I-24 highway. The results indicate that the proposed model establishes an effective feedback loop between the optimizer and the simulator, thereby calibrating flow and speed with different spatiotemporal characteristics to minimize the error between simulated and real-world data. Finally, We simulate AVs in multiple scenarios to assess their effect on traffic congestion. This evaluation validates the AV routes, thereby contributing to the execution of a safe and successful live traffic control experiment via AVs. △ Less

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.02135 [pdf]

Enhancing Ship Classification in Optical Satellite Imagery: Integrating Convolutional Block Attention Module with ResNet for Improved Performance

Authors: Ryan Donghan Kwon, Gangjoo Robin Nam, Jisoo Tak, Junseob Shin, Hyerin Cha, Yeom Hyeok, Seung Won Lee

Abstract: This study presents an advanced Convolutional Neural Network (CNN) architecture for ship classification from optical satellite imagery, significantly enhancing performance through the integration of the Convolutional Block Attention Module (CBAM) and additional architectural innovations. Building upon the foundational ResNet50 model, we first incorporated a standard CBAM to direct the model's focu… ▽ More This study presents an advanced Convolutional Neural Network (CNN) architecture for ship classification from optical satellite imagery, significantly enhancing performance through the integration of the Convolutional Block Attention Module (CBAM) and additional architectural innovations. Building upon the foundational ResNet50 model, we first incorporated a standard CBAM to direct the model's focus towards more informative features, achieving an accuracy of 87% compared to the baseline ResNet50's 85%. Further augmentations involved multi-scale feature integration, depthwise separable convolutions, and dilated convolutions, culminating in the Enhanced ResNet Model with Improved CBAM. This model demonstrated a remarkable accuracy of 95%, with precision, recall, and f1-scores all witnessing substantial improvements across various ship classes. The bulk carrier and oil tanker classes, in particular, showcased nearly perfect precision and recall rates, underscoring the model's enhanced capability in accurately identifying and classifying ships. Attention heatmap analyses further validated the improved model's efficacy, revealing a more focused attention on relevant ship features, regardless of background complexities. These findings underscore the potential of integrating attention mechanisms and architectural innovations in CNNs for high-resolution satellite imagery classification. The study navigates through the challenges of class imbalance and computational costs, proposing future directions towards scalability and adaptability in new or rare ship type recognition. This research lays a groundwork for the application of advanced deep learning techniques in the domain of remote sensing, offering insights into scalable and efficient satellite image classification. △ Less

Submitted 8 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

arXiv:2402.17050 [pdf, other]

Reinforcement Learning Based Oscillation Dampening: Scaling up Single-Agent RL algorithms to a 100 AV highway field operational test

Authors: Kathy Jang, Nathan Lichtlé, Eugene Vinitsky, Adit Shah, Matthew Bunting, Matthew Nice, Benedetto Piccoli, Benjamin Seibold, Daniel B. Work, Maria Laura Delle Monache, Jonathan Sprinkle, Jonathan W. Lee, Alexandre M. Bayen

Abstract: In this article, we explore the technical details of the reinforcement learning (RL) algorithms that were deployed in the largest field test of automated vehicles designed to smooth traffic flow in history as of 2023, uncovering the challenges and breakthroughs that come with develo** RL controllers for automated vehicles. We delve into the fundamental concepts behind RL algorithms and their app… ▽ More In this article, we explore the technical details of the reinforcement learning (RL) algorithms that were deployed in the largest field test of automated vehicles designed to smooth traffic flow in history as of 2023, uncovering the challenges and breakthroughs that come with develo** RL controllers for automated vehicles. We delve into the fundamental concepts behind RL algorithms and their application in the context of self-driving cars, discussing the developmental process from simulation to deployment in detail, from designing simulators to reward function sha**. We present the results in both simulation and deployment, discussing the flow-smoothing benefits of the RL controller. From understanding the basics of Markov decision processes to exploring advanced techniques such as deep RL, our article offers a comprehensive overview and deep dive of the theoretical foundations and practical implementations driving this rapidly evolving field. We also showcase real-world case studies and alternative research projects that highlight the impact of RL controllers in revolutionizing autonomous driving. From tackling complex urban environments to dealing with unpredictable traffic scenarios, these intelligent controllers are pushing the boundaries of what automated vehicles can achieve. Furthermore, we examine the safety considerations and hardware-focused technical details surrounding deployment of RL controllers into automated vehicles. As these algorithms learn and evolve through interactions with the environment, ensuring their behavior aligns with safety standards becomes crucial. We explore the methodologies and frameworks being developed to address these challenges, emphasizing the importance of building reliable control systems for automated vehicles. △ Less

Submitted 14 May, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.17043 [pdf, other]

Traffic Control via Connected and Automated Vehicles: An Open-Road Field Experiment with 100 CAVs

Authors: Jonathan W. Lee, Han Wang, Kathy Jang, Amaury Hayat, Matthew Bunting, Arwa Alanqary, William Barbour, Zhe Fu, Xiaoqian Gong, George Gunter, Sharon Hornstein, Abdul Rahman Kreidieh, Nathan Lichtlé, Matthew W. Nice, William A. Richardson, Adit Shah, Eugene Vinitsky, Fangyu Wu, Shengquan Xiang, Sulaiman Almatrudi, Fahd Althukair, Rahul Bhadani, Joy Carpio, Raphael Chekroun, Eric Cheng , et al. (39 additional authors not shown)

Abstract: The CIRCLES project aims to reduce instabilities in traffic flow, which are naturally occurring phenomena due to human driving behavior. These "phantom jams" or "stop-and-go waves,"are a significant source of wasted energy. Toward this goal, the CIRCLES project designed a control system referred to as the MegaController by the CIRCLES team, that could be deployed in real traffic. Our field experim… ▽ More The CIRCLES project aims to reduce instabilities in traffic flow, which are naturally occurring phenomena due to human driving behavior. These "phantom jams" or "stop-and-go waves,"are a significant source of wasted energy. Toward this goal, the CIRCLES project designed a control system referred to as the MegaController by the CIRCLES team, that could be deployed in real traffic. Our field experiment leveraged a heterogeneous fleet of 100 longitudinally-controlled vehicles as Lagrangian traffic actuators, each of which ran a controller with the architecture described in this paper. The MegaController is a hierarchical control architecture, which consists of two main layers. The upper layer is called Speed Planner, and is a centralized optimal control algorithm. It assigns speed targets to the vehicles, conveyed through the LTE cellular network. The lower layer is a control layer, running on each vehicle. It performs local actuation by overriding the stock adaptive cruise controller, using the stock on-board sensors. The Speed Planner ingests live data feeds provided by third parties, as well as data from our own control vehicles, and uses both to perform the speed assignment. The architecture of the speed planner allows for modular use of standard control techniques, such as optimal control, model predictive control, kernel methods and others, including Deep RL, model predictive control and explicit controllers. Depending on the vehicle architecture, all onboard sensing data can be accessed by the local controllers, or only some. Control inputs vary across different automakers, with inputs ranging from torque or acceleration requests for some cars, and electronic selection of ACC set points in others. The proposed architecture allows for the combination of all possible settings proposed above. Most configurations were tested throughout the ramp up to the MegaVandertest. △ Less

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2401.11567 [pdf, other]

Deterministic Multi-stage Constellation Reconfiguration Using Integer Linear Programming and Sequential Decision-Making Methods

Authors: Hang Woon Lee, David O. Williams Rogers, Brycen D. Pearl, Hao Chen, Koki Ho

Abstract: In this paper, we address the problem of reconfiguring Earth observation satellite constellation systems through multiple stages. The Multi-stage Constellation Reconfiguration Problem (MCRP) aims to maximize the total observation rewards obtained by covering a set of targets of interest through the active manipulation of the orbits and relative phasing of constituent satellites. In this paper, we… ▽ More In this paper, we address the problem of reconfiguring Earth observation satellite constellation systems through multiple stages. The Multi-stage Constellation Reconfiguration Problem (MCRP) aims to maximize the total observation rewards obtained by covering a set of targets of interest through the active manipulation of the orbits and relative phasing of constituent satellites. In this paper, we consider deterministic problem settings in which the targets of interest are known a priori. We propose a novel integer linear programming formulation for MCRP, capable of obtaining provably optimal solutions. To overcome computational intractability due to the combinatorial explosion in solving large-scale instances, we introduce two computationally efficient sequential decision-making methods based on the principles of a myopic policy and a rolling horizon procedure. The computational experiments demonstrate that the devised sequential decision-making approaches yield high-quality solutions with improved computational efficiency over the baseline MCRP. Finally, a case study using Hurricane Harvey data showcases the advantages of multi-stage constellation reconfiguration over single-stage and no-reconfiguration scenarios. △ Less

Submitted 30 April, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

Comments: 39 pages, 13 figures, submitted to the Journal of Spacecraft and Rockets

arXiv:2401.09666 [pdf, other]

Traffic Smoothing Controllers for Autonomous Vehicles Using Deep Reinforcement Learning and Real-World Trajectory Data

Authors: Nathan Lichtlé, Kathy Jang, Adit Shah, Eugene Vinitsky, Jonathan W. Lee, Alexandre M. Bayen

Abstract: Designing traffic-smoothing cruise controllers that can be deployed onto autonomous vehicles is a key step towards improving traffic flow, reducing congestion, and enhancing fuel efficiency in mixed autonomy traffic. We bypass the common issue of having to carefully fine-tune a large traffic microsimulator by leveraging real-world trajectory data from the I-24 highway in Tennessee, replayed in a o… ▽ More Designing traffic-smoothing cruise controllers that can be deployed onto autonomous vehicles is a key step towards improving traffic flow, reducing congestion, and enhancing fuel efficiency in mixed autonomy traffic. We bypass the common issue of having to carefully fine-tune a large traffic microsimulator by leveraging real-world trajectory data from the I-24 highway in Tennessee, replayed in a one-lane simulation. Using standard deep reinforcement learning methods, we train energy-reducing wave-smoothing policies. As an input to the agent, we observe the speed and distance of only the vehicle in front, which are local states readily available on most recent vehicles, as well as non-local observations about the downstream state of the traffic. We show that at a low 4% autonomous vehicle penetration rate, we achieve significant fuel savings of over 15% on trajectories exhibiting many stop-and-go waves. Finally, we analyze the smoothing effect of the controllers and demonstrate robustness to adding lane-changing into the simulation as well as the removal of downstream information. △ Less

Submitted 17 January, 2024; originally announced January 2024.

Comments: Accepted to be published as part of the 26th IEEE International Conference on Intelligent Transportation Systems (ITSC) 2023, Bilbao, Spain, September 24-28, 2023

arXiv:2401.03850 [pdf, other]

Inverse Nonlinearity Compensation of Hyperelastic Deformation in Dielectric Elastomer for Acoustic Actuation

Authors: ** Woo Lee, Gwang Seok An, Jeong-Yun Sun, Kyogu Lee

Abstract: This paper delves into the analysis of nonlinear deformation induced by dielectric actuation in pre-stressed ideal dielectric elastomers. It formulates a nonlinear ordinary differential equation governing this deformation based on the hyperelastic model under dielectric stress. Through numerical integration and neural network approximations, the relationship between voltage and stretch is establis… ▽ More This paper delves into the analysis of nonlinear deformation induced by dielectric actuation in pre-stressed ideal dielectric elastomers. It formulates a nonlinear ordinary differential equation governing this deformation based on the hyperelastic model under dielectric stress. Through numerical integration and neural network approximations, the relationship between voltage and stretch is established. Neural networks are employed to approximate solutions for voltage-to-stretch and stretch-to-voltage transformations obtained via an explicit Runge-Kutta method. The effectiveness of these approximations is demonstrated by leveraging them for compensating nonlinearity through the wavesha** of the input signal. The comparative analysis highlights the superior accuracy of the approximated solutions over baseline methods, resulting in minimized harmonic distortions when utilizing dielectric elastomers as acoustic actuators. This study underscores the efficacy of the proposed approach in mitigating nonlinearities and enhancing the performance of dielectric elastomers in acoustic actuation applications. △ Less

Submitted 8 January, 2024; originally announced January 2024.

arXiv:2312.17344 [pdf, other]

Recursive Self-Composite Approach Towards Structural Understanding of Boolean Network

Authors: Jongrae Kim, Woojeong Lee, Kwang-Hyun Cho

Abstract: Boolean networks have been widely used in many areas of science and engineering to represent various dynamical behaviour. In systems biology, they became useful tools to study the dynamical characteristics of large-scale biomolecular networks and there have been a number of studies to develop efficient ways of finding steady states or cycles of Boolean network models. On the other hand, there has… ▽ More Boolean networks have been widely used in many areas of science and engineering to represent various dynamical behaviour. In systems biology, they became useful tools to study the dynamical characteristics of large-scale biomolecular networks and there have been a number of studies to develop efficient ways of finding steady states or cycles of Boolean network models. On the other hand, there has been little attention to analyzing the dynamic properties of the network structure itself. Here, we present a systematic way to study such properties by introducing a recursive self-composite of the logic update rules. Of note, we found that all Boolean update rules actually have repeated logic structures underneath. This repeated nature of Boolean networks reveals interesting algebraic properties embedded in the networks. We found that each converged logic leads to the same states, called kernel states. As a result, the longest-length period of states cycle turns out to be equal to the number of converged logics in the logic cycle. Based on this, we propose a lea** and filling algorithm to avoid any possible large string explosions during the self-composition procedures. Finally, we demonstrate how the proposed approach can be used to reveal interesting hidden properties using Boolean network examples of a simple network with a long feedback structure, a T-cell receptor network and a cancer network. △ Less

Submitted 28 December, 2023; originally announced December 2023.

Comments: 9 pages, 3 figures

arXiv:2312.09576 [pdf, other]

SegRap2023: A Benchmark of Organs-at-Risk and Gross Tumor Volume Segmentation for Radiotherapy Planning of Nasopharyngeal Carcinoma

Authors: Xiangde Luo, Jia Fu, Yunxin Zhong, Shuolin Liu, Bing Han, Mehdi Astaraki, Simone Bendazzoli, Iuliana Toma-Dasu, Yiwen Ye, Ziyang Chen, Yong Xia, Yanzhou Su, ** Ye, Junjun He, Zhaohu Xing, Hongqiu Wang, Lei Zhu, Kaixiang Yang, Xin Fang, Zhiwei Wang, Chan Woong Lee, Sang Joon Park, Jaehee Chun, Constantin Ulrich, Klaus H. Maier-Hein , et al. (17 additional authors not shown)

Abstract: Radiation therapy is a primary and effective NasoPharyngeal Carcinoma (NPC) treatment strategy. The precise delineation of Gross Tumor Volumes (GTVs) and Organs-At-Risk (OARs) is crucial in radiation treatment, directly impacting patient prognosis. Previously, the delineation of GTVs and OARs was performed by experienced radiation oncologists. Recently, deep learning has achieved promising results… ▽ More Radiation therapy is a primary and effective NasoPharyngeal Carcinoma (NPC) treatment strategy. The precise delineation of Gross Tumor Volumes (GTVs) and Organs-At-Risk (OARs) is crucial in radiation treatment, directly impacting patient prognosis. Previously, the delineation of GTVs and OARs was performed by experienced radiation oncologists. Recently, deep learning has achieved promising results in many medical image segmentation tasks. However, for NPC OARs and GTVs segmentation, few public datasets are available for model development and evaluation. To alleviate this problem, the SegRap2023 challenge was organized in conjunction with MICCAI2023 and presented a large-scale benchmark for OAR and GTV segmentation with 400 Computed Tomography (CT) scans from 200 NPC patients, each with a pair of pre-aligned non-contrast and contrast-enhanced CT scans. The challenge's goal was to segment 45 OARs and 2 GTVs from the paired CT scans. In this paper, we detail the challenge and analyze the solutions of all participants. The average Dice similarity coefficient scores for all submissions ranged from 76.68\% to 86.70\%, and 70.42\% to 73.44\% for OARs and GTVs, respectively. We conclude that the segmentation of large-size OARs is well-addressed, and more efforts are needed for GTVs and small-size or thin-structure OARs. The benchmark will remain publicly available here: https://segrap2023.grand-challenge.org △ Less

Submitted 15 December, 2023; originally announced December 2023.

Comments: A challenge report of SegRap2023 (organized in conjunction with MICCAI2023)

arXiv:2312.03312 [pdf, other]

Optimizing Two-Pass Cross-Lingual Transfer Learning: Phoneme Recognition and Phoneme to Grapheme Translation

Authors: Wonjun Lee, Gary Geunbae Lee, Yunsu Kim

Abstract: This research optimizes two-pass cross-lingual transfer learning in low-resource languages by enhancing phoneme recognition and phoneme-to-grapheme translation models. Our approach optimizes these two stages to improve speech recognition across languages. We optimize phoneme vocabulary coverage by merging phonemes based on shared articulatory characteristics, thus improving recognition accuracy. A… ▽ More This research optimizes two-pass cross-lingual transfer learning in low-resource languages by enhancing phoneme recognition and phoneme-to-grapheme translation models. Our approach optimizes these two stages to improve speech recognition across languages. We optimize phoneme vocabulary coverage by merging phonemes based on shared articulatory characteristics, thus improving recognition accuracy. Additionally, we introduce a global phoneme noise generator for realistic ASR noise during phoneme-to-grapheme training to reduce error propagation. Experiments on the CommonVoice 12.0 dataset show significant reductions in Word Error Rate (WER) for low-resource languages, highlighting the effectiveness of our approach. This research contributes to the advancements of two-pass ASR systems in low-resource languages, offering the potential for improved cross-lingual transfer learning. △ Less

Submitted 6 December, 2023; originally announced December 2023.

Comments: 8 pages, ASRU 2023 Accepted

arXiv:2312.01842 [pdf, other]

Exploring the Viability of Synthetic Audio Data for Audio-Based Dialogue State Tracking

Authors: Jihyun Lee, Ye** Jeon, Wonjun Lee, Yunsu Kim, Gary Geunbae Lee

Abstract: Dialogue state tracking plays a crucial role in extracting information in task-oriented dialogue systems. However, preceding research are limited to textual modalities, primarily due to the shortage of authentic human audio datasets. We address this by investigating synthetic audio data for audio-based DST. To this end, we develop cascading and end-to-end models, train them with our synthetic audi… ▽ More Dialogue state tracking plays a crucial role in extracting information in task-oriented dialogue systems. However, preceding research are limited to textual modalities, primarily due to the shortage of authentic human audio datasets. We address this by investigating synthetic audio data for audio-based DST. To this end, we develop cascading and end-to-end models, train them with our synthetic audio dataset, and test them on actual human speech data. To facilitate evaluation tailored to audio modalities, we introduce a novel PhonemeF1 to capture pronunciation similarity. Experimental results showed that models trained solely on synthetic datasets can generalize their performance to human voice data. By eliminating the dependency on human speech data collection, these insights pave the way for significant practical advancements in audio-based DST. Data and code are available at https://github.com/JihyunLee1/E2E-DST. △ Less

Submitted 4 December, 2023; originally announced December 2023.

Comments: Accepted in ASRU 2023

arXiv:2311.18539 [pdf, other]

Bridging Both Worlds in Semantics and Time: Domain Knowledge Based Analysis and Correlation of Industrial Process Attacks

Authors: Moses Ike, Kandy Phan, Anwesh Badapanda, Matthew Landen, Keaton Sadoski, Wanda Guo, Asfahan Shah, Saman Zonouz, Wenke Lee

Abstract: Modern industrial control systems (ICS) attacks infect supervisory control and data acquisition (SCADA) hosts to stealthily alter industrial processes, causing damage. To detect attacks with low false alarms, recent work detects attacks in both SCADA and process data. Unfortunately, this led to the same problem - disjointed (false) alerts, due to the semantic and time gap in SCADA and process beha… ▽ More Modern industrial control systems (ICS) attacks infect supervisory control and data acquisition (SCADA) hosts to stealthily alter industrial processes, causing damage. To detect attacks with low false alarms, recent work detects attacks in both SCADA and process data. Unfortunately, this led to the same problem - disjointed (false) alerts, due to the semantic and time gap in SCADA and process behavior, i.e., SCADA execution does not map to process dynamics nor evolve at similar time scales. We propose BRIDGE to analyze and correlate SCADA and industrial process attacks using domain knowledge to bridge their unique semantic and time evolution. This enables operators to tie malicious SCADA operations to their adverse process effects, which reduces false alarms and improves attack understanding. BRIDGE (i) identifies process constraints violations in SCADA by measuring actuation dependencies in SCADA process-control, and (ii) detects malicious SCADA effects in processes via a physics-informed neural network that embeds generic knowledge of inertial process dynamics. BRIDGE then dynamically aligns both analysis (i and ii) in a time-window that adjusts their time evolution based on process inertial delays. We applied BRIDGE to 11 diverse real-world industrial processes, and adaptive attacks inspired by past events. BRIDGE correlated 98.3% of attacks with 0.8% false positives (FP), compared to 78.3% detection accuracy and 13.7% FP of recent work. △ Less

Submitted 3 December, 2023; v1 submitted 30 November, 2023; originally announced November 2023.

arXiv:2311.18505 [pdf, other]

String Sound Synthesizer on GPU-accelerated Finite Difference Scheme

Authors: ** Woo Lee, Min Jun Choi, Kyogu Lee

Abstract: This paper introduces a nonlinear string sound synthesizer, based on a finite difference simulation of the dynamic behavior of strings under various excitations. The presented synthesizer features a versatile string simulation engine capable of stochastic parameterization, encompassing fundamental frequency modulation, stiffness, tension, frequency-dependent loss, and excitation control. This open… ▽ More This paper introduces a nonlinear string sound synthesizer, based on a finite difference simulation of the dynamic behavior of strings under various excitations. The presented synthesizer features a versatile string simulation engine capable of stochastic parameterization, encompassing fundamental frequency modulation, stiffness, tension, frequency-dependent loss, and excitation control. This open-source physical model simulator not only benefits the audio signal processing community but also contributes to the burgeoning field of neural network-based audio synthesis by serving as a novel dataset construction tool. Implemented in PyTorch, this synthesizer offers flexibility, facilitating both CPU and GPU utilization, thereby enhancing its applicability as a simulator. GPU utilization expedites computation by parallelizing operations across spatial and batch dimensions, further enhancing its utility as a data generator. △ Less

Submitted 8 January, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

Comments: To be appeared in ICASSP 2024

arXiv:2311.10430 [pdf, other]

Deep Residual CNN for Multi-Class Chest Infection Diagnosis

Authors: Ryan Donghan Kwon, Dohyun Lim, Yoonha Lee, Seung Won Lee

Abstract: The advent of deep learning has significantly propelled the capabilities of automated medical image diagnosis, providing valuable tools and resources in the realm of healthcare and medical diagnostics. This research delves into the development and evaluation of a Deep Residual Convolutional Neural Network (CNN) for the multi-class diagnosis of chest infections, utilizing chest X-ray images. The im… ▽ More The advent of deep learning has significantly propelled the capabilities of automated medical image diagnosis, providing valuable tools and resources in the realm of healthcare and medical diagnostics. This research delves into the development and evaluation of a Deep Residual Convolutional Neural Network (CNN) for the multi-class diagnosis of chest infections, utilizing chest X-ray images. The implemented model, trained and validated on a dataset amalgamated from diverse sources, demonstrated a robust overall accuracy of 93%. However, nuanced disparities in performance across different classes, particularly Fibrosis, underscored the complexity and challenges inherent in automated medical image diagnosis. The insights derived pave the way for future research, focusing on enhancing the model's proficiency in classifying conditions that present more subtle and nuanced visual features in the images, as well as optimizing and refining the model architecture and training process. This paper provides a comprehensive exploration into the development, implementation, and evaluation of the model, offering insights and directions for future research and development in the field. △ Less

Submitted 17 November, 2023; originally announced November 2023.

arXiv:2310.18151 [pdf, other]

Traffic smoothing using explicit local controllers

Authors: Amaury Hayat, Arwa Alanqary, Rahul Bhadani, Christopher Denaro, Ryan J. Weightman, Shengquan Xiang, Jonathan W. Lee, Matthew Bunting, Anish Gollakota, Matthew W. Nice, Derek Gloudemans, Gergely Zachar, Jon F. Davis, Maria Laura Delle Monache, Benjamin Seibold, Alexandre M. Bayen, Jonathan Sprinkle, Daniel B. Work, Benedetto Piccoli

Abstract: The dissipation of stop-and-go waves attracted recent attention as a traffic management problem, which can be efficiently addressed by automated driving. As part of the 100 automated vehicles experiment named MegaVanderTest, feedback controls were used to induce strong dissipation via velocity smoothing. More precisely, a single vehicle driving differently in one of the four lanes of I-24 in the N… ▽ More The dissipation of stop-and-go waves attracted recent attention as a traffic management problem, which can be efficiently addressed by automated driving. As part of the 100 automated vehicles experiment named MegaVanderTest, feedback controls were used to induce strong dissipation via velocity smoothing. More precisely, a single vehicle driving differently in one of the four lanes of I-24 in the Nashville area was able to regularize the velocity profile by reducing oscillations in time and velocity differences among vehicles. Quantitative measures of this effect were possible due to the innovative I-24 MOTION system capable of monitoring the traffic conditions for all vehicles on the roadway. This paper presents the control design, the technological aspects involved in its deployment, and, finally, the results achieved by the experiment. △ Less

Submitted 27 October, 2023; originally announced October 2023.

Comments: 21 pages, 1 Table , 9 figures

MSC Class: 93D15; 93D21; 93-05; 34H05; ACM Class: H.2.2

arXiv:2310.06297 [pdf, other]

Reducing Detailed Vehicle Energy Dynamics to Physics-Like Models

Authors: Nour Khoudari, Sulaiman Almatrudi, Rabie Ramadan, Joy Carpio, Mengsha Yao, Kenneth Butts, Alexandre M. Bayen, Jonathan W. Lee, Benjamin Seibold

Abstract: The energy demand of vehicles, particularly in unsteady drive cycles, is affected by complex dynamics internal to the engine and other powertrain components. Yet, in many applications, particularly macroscopic traffic flow modeling and optimization, structurally simple approximations to the complex vehicle dynamics are needed that nevertheless reproduce the correct effective energy behavior. This… ▽ More The energy demand of vehicles, particularly in unsteady drive cycles, is affected by complex dynamics internal to the engine and other powertrain components. Yet, in many applications, particularly macroscopic traffic flow modeling and optimization, structurally simple approximations to the complex vehicle dynamics are needed that nevertheless reproduce the correct effective energy behavior. This work presents a systematic model reduction pipeline that starts from complex vehicle models based on the Autonomie software and derives a hierarchy of simplified models that are fast to evaluate, easy to disseminate in open-source frameworks, and compatible with optimization frameworks. The pipeline, based on a virtual chassis dynamometer and subsequent approximation strategies, is reproducible and is applied to six different vehicle classes to produce concrete explicit energy models that represent an average vehicle in each class and leverage the accuracy and validation work of the Autonomie software. △ Less

Submitted 10 October, 2023; originally announced October 2023.

Comments: 40 pages, 9 figures

arXiv:2309.04755 [pdf, other]

Towards Real-time Training of Physics-informed Neural Networks: Applications in Ultrafast Ultrasound Blood Flow Imaging

Authors: Haotian Guan, **** Dong, Wei-Ning Lee

Abstract: Physics-informed Neural Network (PINN) is one of the most preeminent solvers of Navier-Stokes equations, which are widely used as the governing equation of blood flow. However, current approaches, relying on full Navier-Stokes equations, are impractical for ultrafast Doppler ultrasound, the state-of-the-art technique for depiction of complex blood flow dynamics \emph{in vivo} through acquired thou… ▽ More Physics-informed Neural Network (PINN) is one of the most preeminent solvers of Navier-Stokes equations, which are widely used as the governing equation of blood flow. However, current approaches, relying on full Navier-Stokes equations, are impractical for ultrafast Doppler ultrasound, the state-of-the-art technique for depiction of complex blood flow dynamics \emph{in vivo} through acquired thousands of frames (or, timestamps) per second. In this article, we first propose a novel training framework of PINN for solving Navier-Stokes equations by discretizing Navier-Stokes equations into steady state and sequentially solving steady-state Navier-Stokes equations with transfer learning. The novel training framework is coined as SeqPINN. Upon the success of SeqPINN, we adopt the idea of averaged constant stochastic gradient descent (SGD) as initialization and propose a parallel training scheme for all timestamps. To ensure an initialization that generalizes well, we borrow the concept of Stochastic Weight Averaging Gaussian to perform uncertainty estimation as an indicator of generalizability of the initialization. This algorithm, named SP-PINN, further expedites training of PINN while achieving comparable accuracy with SeqPINN. Finite-element simulations and \emph{in vitro} phantoms of single-branch and trifurcate blood vessels are used to evaluate the performance of SeqPINN and SP-PINN. Results show that both SeqPINN and SP-PINN are manyfold faster than the original design of PINN, while respectively achieving Root Mean Square Errors (RMSEs) of 1.01 cm/s and 1.26 cm/s on the straight vessel and 1.91 cm/s and 2.56 cm/s on the trifurcate blood vessel when recovering blood flow velocities. △ Less

Submitted 9 September, 2023; originally announced September 2023.

arXiv:2309.00995 [pdf]

doi 10.1088/1361-6560/acd236

Constrained CycleGAN for Effective Generation of Ultrasound Sector Images of Improved Spatial Resolution

Authors: Xiaofei Sun, He Li, Wei-Ning Lee

Abstract: Objective. A phased or a curvilinear array produces ultrasound (US) images with a sector field of view (FOV), which inherently exhibits spatially-varying image resolution with inferior quality in the far zone and towards the two sides azimuthally. Sector US images with improved spatial resolutions are favorable for accurate quantitative analysis of large and dynamic organs, such as the heart. Ther… ▽ More Objective. A phased or a curvilinear array produces ultrasound (US) images with a sector field of view (FOV), which inherently exhibits spatially-varying image resolution with inferior quality in the far zone and towards the two sides azimuthally. Sector US images with improved spatial resolutions are favorable for accurate quantitative analysis of large and dynamic organs, such as the heart. Therefore, this study aims to translate US images with spatially-varying resolution to ones with less spatially-varying resolution. CycleGAN has been a prominent choice for unpaired medical image translation; however, it neither guarantees structural consistency nor preserves backscattering patterns between input and generated images for unpaired US images. Approach. To circumvent this limitation, we propose a constrained CycleGAN (CCycleGAN), which directly performs US image generation with unpaired images acquired by different ultrasound array probes. In addition to conventional adversarial and cycle-consistency losses of CycleGAN, CCycleGAN introduces an identical loss and a correlation coefficient loss based on intrinsic US backscattered signal properties to constrain structural consistency and backscattering patterns, respectively. Instead of post-processed B-mode images, CCycleGAN uses envelope data directly obtained from beamformed radio-frequency signals without any other non-linear postprocessing. Main Results. In vitro phantom results demonstrate that CCycleGAN successfully generates images with improved spatial resolution as well as higher peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) compared with benchmarks. Significance. CCycleGAN-generated US images of the in vivo human beating heart further facilitate higher quality heart wall motion estimation than benchmarks-generated ones, particularly in deep regions. △ Less

Submitted 2 September, 2023; originally announced September 2023.

Journal ref: Physics in Medicine & Biology 2023

arXiv:2308.15791 [pdf, other]

Neural Video Compression with Temporal Layer-Adaptive Hierarchical B-frame Coding

Authors: Yeongwoong Kim, Suyong Bahk, Seungeon Kim, Won Hee Lee, Dokwan Oh, Hui Yong Kim

Abstract: Neural video compression (NVC) is a rapidly evolving video coding research area, with some models achieving superior coding efficiency compared to the latest video coding standard Versatile Video Coding (VVC). In conventional video coding standards, the hierarchical B-frame coding, which utilizes a bidirectional prediction structure for higher compression, had been well-studied and exploited. In N… ▽ More Neural video compression (NVC) is a rapidly evolving video coding research area, with some models achieving superior coding efficiency compared to the latest video coding standard Versatile Video Coding (VVC). In conventional video coding standards, the hierarchical B-frame coding, which utilizes a bidirectional prediction structure for higher compression, had been well-studied and exploited. In NVC, however, limited research has investigated the hierarchical B scheme. In this paper, we propose an NVC model exploiting hierarchical B-frame coding with temporal layer-adaptive optimization. We first extend an existing unidirectional NVC model to a bidirectional model, which achieves -21.13% BD-rate gain over the unidirectional baseline model. However, this model faces challenges when applied to sequences with complex or large motions, leading to performance degradation. To address this, we introduce temporal layer-adaptive optimization, incorporating methods such as temporal layer-adaptive quality scaling (TAQS) and temporal layer-adaptive latent scaling (TALS). The final model with the proposed methods achieves an impressive BD-rate gain of -39.86% against the baseline. It also resolves the challenges in sequences with large or complex motions with up to -49.13% more BD-rate gains than the simple bidirectional extension. This improvement is attributed to the allocation of more bits to lower temporal layers, thereby enhancing overall reconstruction quality with smaller bits. Since our method has little dependency on a specific NVC model architecture, it can serve as a general tool for extending unidirectional NVC models to the ones with hierarchical B-frame coding. △ Less

Submitted 5 September, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

arXiv:2305.00676 [pdf, other]

doi 10.1109/LRA.2023.3318190

Learning Terrain-Aware Kinodynamic Model for Autonomous Off-Road Rally Driving With Model Predictive Path Integral Control

Authors: Ho** Lee, Taekyung Kim, Jungwi Mun, Wonsuk Lee

Abstract: High-speed autonomous driving in off-road environments has immense potential for various applications, but it also presents challenges due to the complexity of vehicle-terrain interactions. In such environments, it is crucial for the vehicle to predict its motion and adjust its controls proactively in response to environmental changes, such as variations in terrain elevation. To this end, we propo… ▽ More High-speed autonomous driving in off-road environments has immense potential for various applications, but it also presents challenges due to the complexity of vehicle-terrain interactions. In such environments, it is crucial for the vehicle to predict its motion and adjust its controls proactively in response to environmental changes, such as variations in terrain elevation. To this end, we propose a method for learning terrain-aware kinodynamic model which is conditioned on both proprioceptive and exteroceptive information. The proposed model generates reliable predictions of 6-degree-of-freedom motion and can even estimate contact interactions without requiring ground truth force data during training. This enables the design of a safe and robust model predictive controller through appropriate cost function design which penalizes sampled trajectories with unstable motion, unsafe interactions, and high levels of uncertainty derived from the model. We demonstrate the effectiveness of our approach through experiments on a simulated off-road track, showing that our proposed model-controller pair outperforms the baseline and ensures robust high-speed driving performance without control failure. △ Less

Submitted 22 September, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

Comments: Accepted to IEEE Robotics and Automation Letters (and ICRA 2024). Our video can be found at https://youtu.be/VXf_prNQnJo Project page : https://sites.google.com/view/terrainawarekinodyn

Journal ref: IEEE Robotics and Automation Letters, 2023

arXiv:2303.03093 [pdf, other]

A Miniaturised Camera-based Multi-Modal Tactile Sensor

Authors: Kaspar Althoefer, Yonggen Ling, Wanlin Li, Xinyuan Qian, Wang Wei Lee, Peng Qi

Abstract: In conjunction with huge recent progress in camera and computer vision technology, camera-based sensors have increasingly shown considerable promise in relation to tactile sensing. In comparison to competing technologies (be they resistive, capacitive or magnetic based), they offer super-high-resolution, while suffering from fewer wiring problems. The human tactile system is composed of various ty… ▽ More In conjunction with huge recent progress in camera and computer vision technology, camera-based sensors have increasingly shown considerable promise in relation to tactile sensing. In comparison to competing technologies (be they resistive, capacitive or magnetic based), they offer super-high-resolution, while suffering from fewer wiring problems. The human tactile system is composed of various types of mechanoreceptors, each able to perceive and process distinct information such as force, pressure, texture, etc. Camera-based tactile sensors such as GelSight mainly focus on high-resolution geometric sensing on a flat surface, and their force measurement capabilities are limited by the hysteresis and non-linearity of the silicone material. In this paper, we present a miniaturised dome-shaped camera-based tactile sensor that allows accurate force and tactile sensing in a single coherent system. The key novelty of the sensor design is as follows. First, we demonstrate how to build a smooth silicone hemispheric sensing medium with uniform markers on its curved surface. Second, we enhance the illumination of the rounded silicone with diffused LEDs. Third, we construct a force-sensitive mechanical structure in a compact form factor with usage of springs to accurately perceive forces. Our multi-modal sensor is able to acquire tactile information from multi-axis forces, local force distribution, and contact geometry, all in real-time. We apply an end-to-end deep learning method to process all the information. △ Less

Submitted 6 March, 2023; originally announced March 2023.

arXiv:2303.00882 [pdf, other]

X-Ray2EM: Uncertainty-Aware Cross-Modality Image Reconstruction from X-Ray to Electron Microscopy in Connectomics

Authors: Yicong Li, Yaron Meirovitch, Aaron T. Kuan, Jasper S. Phelps, Alexandra Pacureanu, Wei-Chung Allen Lee, Nir Shavit, Lu Mi

Abstract: Comprehensive, synapse-resolution imaging of the brain will be crucial for understanding neuronal computations and function. In connectomics, this has been the sole purview of volume electron microscopy (EM), which entails an excruciatingly difficult process because it requires cutting tissue into many thin, fragile slices that then need to be imaged, aligned, and reconstructed. Unlike EM, hard X-… ▽ More Comprehensive, synapse-resolution imaging of the brain will be crucial for understanding neuronal computations and function. In connectomics, this has been the sole purview of volume electron microscopy (EM), which entails an excruciatingly difficult process because it requires cutting tissue into many thin, fragile slices that then need to be imaged, aligned, and reconstructed. Unlike EM, hard X-ray imaging is compatible with thick tissues, eliminating the need for thin sectioning, and delivering fast acquisition, intrinsic alignment, and isotropic resolution. Unfortunately, current state-of-the-art X-ray microscopy provides much lower resolution, to the extent that segmenting membranes is very challenging. We propose an uncertainty-aware 3D reconstruction model that translates X-ray images to EM-like images with enhanced membrane segmentation quality, showing its potential for develo** simpler, faster, and more accurate X-ray based connectomics pipelines. △ Less

Submitted 1 March, 2023; originally announced March 2023.

Comments: Accepted by ISBI 2023 conference. Supplementary material is available in this arXiv version

arXiv:2302.07453 [pdf, other]

Cooperative Driving for Speed Harmonization in Mixed-Traffic Environments

Authors: Zhe Fu, Abdul Rahman Kreidieh, Han Wang, Jonathan W. Lee, Maria Laura Delle Monache, Alexandre M. Bayen

Abstract: Autonomous driving systems present promising methods for congestion mitigation in mixed autonomy traffic control settings. In particular, when coupled with even modest traffic state estimates, such systems can plan and coordinate the behaviors of automated vehicles (AVs) in response to observed downstream events, thereby inhibiting the continued propagation of congestion. In this paper, we present… ▽ More Autonomous driving systems present promising methods for congestion mitigation in mixed autonomy traffic control settings. In particular, when coupled with even modest traffic state estimates, such systems can plan and coordinate the behaviors of automated vehicles (AVs) in response to observed downstream events, thereby inhibiting the continued propagation of congestion. In this paper, we present a two-layer control strategy in which the upper layer proposes the desired speeds that predictively react to the downstream state of traffic, and the lower layer maintains safe and reasonable headways with leading vehicles. This method is demonstrated to achieve an average of over 15% energy savings within simulations of congested events observed in Interstate 24 with only 4% AV penetration, while restricting negative externalities imposed on traveling times and mobility. The proposed strategy that served as part of the "speed planner" was deployed on 100 AVs in a massive traffic experiment conducted on Nashville's I-24 in November 2022. △ Less

Submitted 3 June, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

Comments: Accepted by IEEE IV 2023

arXiv:2212.05662 [pdf, other]

Optimal Planning of Hybrid Energy Storage Systems using Curtailed Renewable Energy through Deep Reinforcement Learning

Authors: Dongju Kang, Doeun Kang, Sumin Hwangbo, Haider Niaz, Won Bo Lee, J. Jay Liu, Jonggeol Na

Abstract: Energy management systems (EMS) are becoming increasingly important in order to utilize the continuously growing curtailed renewable energy. Promising energy storage systems (ESS), such as batteries and green hydrogen should be employed to maximize the efficiency of energy stakeholders. However, optimal decision-making, i.e., planning the leveraging between different strategies, is confronted with… ▽ More Energy management systems (EMS) are becoming increasingly important in order to utilize the continuously growing curtailed renewable energy. Promising energy storage systems (ESS), such as batteries and green hydrogen should be employed to maximize the efficiency of energy stakeholders. However, optimal decision-making, i.e., planning the leveraging between different strategies, is confronted with the complexity and uncertainties of large-scale problems. Here, we propose a sophisticated deep reinforcement learning (DRL) methodology with a policy-based algorithm to realize the real-time optimal ESS planning under the curtailed renewable energy uncertainty. A quantitative performance comparison proved that the DRL agent outperforms the scenario-based stochastic optimization (SO) algorithm, even with a wide action and observation space. Owing to the uncertainty rejection capability of the DRL, we could confirm a robust performance, under a large uncertainty of the curtailed renewable energy, with a maximizing net profit and stable system. Action-map** was performed for visually assessing the action taken by the DRL agent according to the state. The corresponding results confirmed that the DRL agent learns the way like what a human expert would do, suggesting reliable application of the proposed methodology. △ Less

Submitted 11 December, 2022; originally announced December 2022.

Comments: 30 pages, 8 figures

arXiv:2211.00878 [pdf, other]

Neural Fourier Shift for Binaural Speech Rendering

Authors: ** Woo Lee, Kyogu Lee

Abstract: We present a neural network for rendering binaural speech from given monaural audio, position, and orientation of the source. Most of the previous works have focused on synthesizing binaural speeches by conditioning the positions and orientations in the feature space of convolutional neural networks. These synthesis approaches are powerful in estimating the target binaural speeches even for in-the… ▽ More We present a neural network for rendering binaural speech from given monaural audio, position, and orientation of the source. Most of the previous works have focused on synthesizing binaural speeches by conditioning the positions and orientations in the feature space of convolutional neural networks. These synthesis approaches are powerful in estimating the target binaural speeches even for in-the-wild data but are difficult to generalize for rendering the audio from out-of-distribution domains. To alleviate this, we propose Neural Fourier Shift (NFS), a novel network architecture that enables binaural speech rendering in the Fourier space. Specifically, utilizing a geometric time delay based on the distance between the source and the receiver, NFS is trained to predict the delays and scales of various early reflections. NFS is efficient in both memory and computational cost, is interpretable, and operates independently of the source domain by its design. Experimental results show that NFS performs comparable to the previous studies on the benchmark dataset, even with its 25 times lighter memory and 6 times fewer calculations. △ Less

Submitted 1 May, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

Comments: Accepted by ICASSP 2023

arXiv:2206.06978 [pdf, ps, other]

Low-Latency MAC Design for Pairwise Random Networks

Authors: Irshad A. Meer, Woong-Hee Lee, Mustafa Ozger, Cicek Cavdar, Ki Won Sung

Abstract: Feasibility of using unlicensed spectrum for ultra reliable low latency communications (URLLC) is still a question for beyond 5G wireless networks. Low latency access to the channel and efficiently sharing spectrum among the multiple users are the main requirements for exploiting unlicensed spectrum for URLLC. Listen before talk and back-off procedures implemented to avoid the collisions in channe… ▽ More Feasibility of using unlicensed spectrum for ultra reliable low latency communications (URLLC) is still a question for beyond 5G wireless networks. Low latency access to the channel and efficiently sharing spectrum among the multiple users are the main requirements for exploiting unlicensed spectrum for URLLC. Listen before talk and back-off procedures implemented to avoid the collisions in channel access hinder the low latency communication. In this paper, we propose a novel low-latency medium access control (MAC) scheme based on the collision resolution for a pairwise random wireless network. We use geometric sequence decomposition for collision resolution among the competing users. This enables the system to tackle collisions and thus removing the need for carrier sensing and back-off procedures. This saves time in obtaining access to the channel and improves the efficiency of the system. We implement our approach in the synchronized time slotted system and show that it yields significant improvement over existing MAC schemes. △ Less

Submitted 22 May, 2022; originally announced June 2022.

Comments: Accepted in IEEE VTC Spring 2022

arXiv:2206.02910 [pdf, other]

Regional Constellation Reconfiguration Problem: Integer Linear Programming Formulation and Lagrangian Heuristic Method

Authors: Hang Woon Lee, Koki Ho

Abstract: A group of satellites, with either homogeneous or heterogeneous orbital characteristics and/or hardware specifications, can undertake a reconfiguration process due to variations in operations pertaining to Earth observation missions. This paper investigates the problem of optimizing a satellite constellation reconfiguration process against two competing mission objectives: (i) the maximization of… ▽ More A group of satellites, with either homogeneous or heterogeneous orbital characteristics and/or hardware specifications, can undertake a reconfiguration process due to variations in operations pertaining to Earth observation missions. This paper investigates the problem of optimizing a satellite constellation reconfiguration process against two competing mission objectives: (i) the maximization of the total coverage reward and (ii) the minimization of the total cost of the transfer. The decision variables for the reconfiguration process include the design of the new configuration and the assignment of satellites from one configuration to another. We present a novel bi-objective integer linear programming formulation that combines constellation design and transfer problems. The formulation lends itself to the use of generic mixed-integer linear programming (MILP) methods such as the branch-and-bound algorithm for the computation of provably-optimal solutions; however, these approaches become computationally prohibitive even for moderately-sized instances. In response to this challenge, this paper proposes a Lagrangian relaxation-based heuristic method that leverages the assignment problem structure embedded in the problem. The results from the computational experiments attest to the near-optimality of the Lagrangian heuristic solutions and a significant improvement in the computational runtime compared to a commercial MILP solver. △ Less

Submitted 2 July, 2023; v1 submitted 6 June, 2022; originally announced June 2022.

Comments: 41 pages, 10 figures, accepted for publication at Journal of Spacecraft and Rockets; pre-print

arXiv:2204.02639 [pdf, other]

doi 10.21437/Interspeech.2022-11460

Representation Selective Self-distillation and wav2vec 2.0 Feature Exploration for Spoof-aware Speaker Verification

Authors: ** Woo Lee, Eungbeom Kim, Junghyun Koo, Kyogu Lee

Abstract: Text-to-speech and voice conversion studies are constantly improving to the extent where they can produce synthetic speech almost indistinguishable from bona fide human speech. In this regard, the importance of countermeasures (CM) against synthetic voice attacks of the automatic speaker verification (ASV) systems emerges. Nonetheless, most end-to-end spoofing detection networks are black-box syst… ▽ More Text-to-speech and voice conversion studies are constantly improving to the extent where they can produce synthetic speech almost indistinguishable from bona fide human speech. In this regard, the importance of countermeasures (CM) against synthetic voice attacks of the automatic speaker verification (ASV) systems emerges. Nonetheless, most end-to-end spoofing detection networks are black-box systems, and the answer to what is an effective representation for finding artifacts remains veiled. In this paper, we examine which feature space can effectively represent synthetic artifacts using wav2vec 2.0, and study which architecture can effectively utilize the space. Our study allows us to analyze which attribute of speech signals is advantageous for the CM systems. The proposed CM system achieved 0.31% equal error rate (EER) on ASVspoof 2019 LA evaluation set for the spoof detection task. We further propose a simple yet effective spoofing aware speaker verification (SASV) method, which takes advantage of the disentangled representations from our countermeasure system. Evaluation performed with the SASV Challenge 2022 database show 1.08% of SASV EER. Quantitative analysis shows that using the explored feature space of wav2vec 2.0 advantages both spoofing CM and SASV. △ Less

Submitted 2 July, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

Comments: Accepted to be published in the Proceedings of Interspeech 2022

arXiv:2204.02637 [pdf, other]

Global HRTF Interpolation via Learned Affine Transformation of Hyper-conditioned Features

Authors: ** Woo Lee, Sungho Lee, Kyogu Lee

Abstract: Estimating Head-Related Transfer Functions (HRTFs) of arbitrary source points is essential in immersive binaural audio rendering. Computing each individual's HRTFs is challenging, as traditional approaches require expensive time and computational resources, while modern data-driven approaches are data-hungry. Especially for the data-driven approaches, existing HRTF datasets differ in spatial sampl… ▽ More Estimating Head-Related Transfer Functions (HRTFs) of arbitrary source points is essential in immersive binaural audio rendering. Computing each individual's HRTFs is challenging, as traditional approaches require expensive time and computational resources, while modern data-driven approaches are data-hungry. Especially for the data-driven approaches, existing HRTF datasets differ in spatial sampling distributions of source positions, posing a major problem when generalizing the method across multiple datasets. To alleviate this, we propose a deep learning method based on a novel conditioning architecture. The proposed method can predict an HRTF of any position by interpolating the HRTFs of known distributions. Experimental results show that the proposed architecture improves the model's generalizability across datasets with various coordinate systems. Additional demonstrations show that the model robustly reconstructs the target HRTFs from the spatially downsampled HRTFs in both quantitative and perceptual measures. △ Less

Submitted 3 November, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

Comments: Submitted to ICASSP 2023

arXiv:2203.11799 [pdf, other]

AP-BSN: Self-Supervised Denoising for Real-World Images via Asymmetric PD and Blind-Spot Network

Authors: Wooseok Lee, Sanghyun Son, Kyoung Mu Lee

Abstract: Blind-spot network (BSN) and its variants have made significant advances in self-supervised denoising. Nevertheless, they are still bound to synthetic noisy inputs due to less practical assumptions like pixel-wise independent noise. Hence, it is challenging to deal with spatially correlated real-world noise using self-supervised BSN. Recently, pixel-shuffle downsampling (PD) has been proposed to r… ▽ More Blind-spot network (BSN) and its variants have made significant advances in self-supervised denoising. Nevertheless, they are still bound to synthetic noisy inputs due to less practical assumptions like pixel-wise independent noise. Hence, it is challenging to deal with spatially correlated real-world noise using self-supervised BSN. Recently, pixel-shuffle downsampling (PD) has been proposed to remove the spatial correlation of real-world noise. However, it is not trivial to integrate PD and BSN directly, which prevents the fully self-supervised denoising model on real-world images. We propose an Asymmetric PD (AP) to address this issue, which introduces different PD stride factors for training and inference. We systematically demonstrate that the proposed AP can resolve inherent trade-offs caused by specific PD stride factors and make BSN applicable to practical scenarios. To this end, we develop AP-BSN, a state-of-the-art self-supervised denoising method for real-world sRGB images. We further propose random-replacing refinement, which significantly improves the performance of our AP-BSN without any additional parameters. Extensive studies demonstrate that our method outperforms the other self-supervised and even unpaired denoising methods by a large margin, without using any additional knowledge, e.g., noise level, regarding the underlying unknown noise. △ Less

Submitted 24 March, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

Comments: Accepted to CVPR2022

arXiv:2203.04294 [pdf, other]

NaviAirway: a Bronchiole-sensitive Deep Learning-based Airway Segmentation Pipeline

Authors: Andong Wang, Terence Chi Chun Tam, Ho Ming Poon, Kun-Chang Yu, Wei-Ning Lee

Abstract: Airway segmentation is essential for chest CT image analysis. Different from natural image segmentation, which pursues high pixel-wise accuracy, airway segmentation focuses on topology. The task is challenging not only because of its complex tree-like structure but also the severe pixel imbalance among airway branches of different generations. To tackle the problems, we present a NaviAirway method… ▽ More Airway segmentation is essential for chest CT image analysis. Different from natural image segmentation, which pursues high pixel-wise accuracy, airway segmentation focuses on topology. The task is challenging not only because of its complex tree-like structure but also the severe pixel imbalance among airway branches of different generations. To tackle the problems, we present a NaviAirway method which consists of a bronchiole-sensitive loss function for airway topology preservation and an iterative training strategy for accurate model learning across different airway generations. To supplement the features of airway branches learned by the model, we distill the knowledge from numerous unlabeled chest CT images in a teacher-student manner. Experimental results show that NaviAirway outperforms existing methods, particularly in the identification of higher-generation bronchioles and robustness to new CT scans. Moreover, NaviAirway is general enough to be combined with different backbone models to significantly improve their performance. NaviAirway can generate an airway roadmap for Navigation Bronchoscopy and can also be applied to other scenarios when segmenting fine and long tubular structures in biomedical images. The code is publicly available on https://github.com/AntonotnaWang/NaviAirway. △ Less

Submitted 16 June, 2023; v1 submitted 7 March, 2022; originally announced March 2022.

arXiv:2202.09533 [pdf, other]

C2N: Practical Generative Noise Modeling for Real-World Denoising

Authors: Geonwoon Jang, Wooseok Lee, Sanghyun Son, Kyoung Mu Lee

Abstract: Learning-based image denoising methods have been bounded to situations where well-aligned noisy and clean images are given, or samples are synthesized from predetermined noise models, e.g., Gaussian. While recent generative noise modeling methods aim to simulate the unknown distribution of real-world noise, several limitations still exist. In a practical scenario, a noise generator should learn to… ▽ More Learning-based image denoising methods have been bounded to situations where well-aligned noisy and clean images are given, or samples are synthesized from predetermined noise models, e.g., Gaussian. While recent generative noise modeling methods aim to simulate the unknown distribution of real-world noise, several limitations still exist. In a practical scenario, a noise generator should learn to simulate the general and complex noise distribution without using paired noisy and clean images. However, since existing methods are constructed on the unrealistic assumption of real-world noise, they tend to generate implausible patterns and cannot express complicated noise maps. Therefore, we introduce a Clean-to-Noisy image generation framework, namely C2N, to imitate complex real-world noise without using any paired examples. We construct the noise generator in C2N accordingly with each component of real-world noise characteristics to express a wide range of noise accurately. Combined with our C2N, conventional denoising CNNs can be trained to outperform existing unsupervised methods on challenging real-world benchmarks by a large margin. △ Less

Submitted 19 February, 2022; originally announced February 2022.

Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 2350-2359

arXiv:2202.01784 [pdf, other]

Robust Audio Anomaly Detection

Authors: Wo Jae Lee, Karim Helwani, Arvindh Krishnaswamy, Srikanth Tenneti

Abstract: We propose an outlier robust multivariate time series model which can be used for detecting previously unseen anomalous sounds based on noisy training data. The presented approach doesn't assume the presence of labeled anomalies in the training dataset and uses a novel deep neural network architecture to learn the temporal dynamics of the multivariate time series at multiple resolutions while bein… ▽ More We propose an outlier robust multivariate time series model which can be used for detecting previously unseen anomalous sounds based on noisy training data. The presented approach doesn't assume the presence of labeled anomalies in the training dataset and uses a novel deep neural network architecture to learn the temporal dynamics of the multivariate time series at multiple resolutions while being robust to contaminations in the training dataset. The temporal dynamics are modeled using recurrent layers augmented with attention mechanism. These recurrent layers are built on top of convolutional layers allowing the network to extract features at multiple resolutions. The output of the network is an outlier robust probability density function modeling the conditional probability of future samples given the time series history. State-of-the-art approaches using other multiresolution architectures are contrasted with our proposed approach. We validate our solution using publicly available machine sound datasets. We demonstrate the effectiveness of our approach in anomaly detection by comparing against several state-of-the-art models. △ Less

Submitted 3 February, 2022; originally announced February 2022.

Comments: Accepted paper at RobustML Workshop@ICLR 2021

Journal ref: RobustML Workshop - ICLR 2021

arXiv:2201.08321 [pdf, other]

doi 10.1109/LRA.2022.3184769

TOAST: Trajectory Optimization and Simultaneous Tracking using Shared Neural Network Dynamics

Authors: Taekyung Kim, Ho** Lee, Seongil Hong, Wonsuk Lee

Abstract: Neural networks have been increasingly employed in Model Predictive Controller (MPC) to control nonlinear dynamic systems. However, MPC still poses a problem that an achievable update rate is insufficient to cope with model uncertainty and external disturbances. In this paper, we present a novel control scheme that can design an optimal tracking controller using the neural network dynamics of the… ▽ More Neural networks have been increasingly employed in Model Predictive Controller (MPC) to control nonlinear dynamic systems. However, MPC still poses a problem that an achievable update rate is insufficient to cope with model uncertainty and external disturbances. In this paper, we present a novel control scheme that can design an optimal tracking controller using the neural network dynamics of the MPC, making it possible to be applied as a plug-and-play extension for any existing model-based feedforward controller. We also describe how our method handles a neural network containing history information, which does not follow a general form of dynamics. The proposed method is evaluated by its performance in classical control benchmarks with external disturbances. We also extend our control framework to be applied in an aggressive autonomous driving task with unknown friction. In all experiments, our method outperformed the compared methods by a large margin. Our controller also showed low control chattering levels, demonstrating that our feedback controller does not interfere with the optimal command of MPC. △ Less

Submitted 14 July, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

Comments: Accepted to IEEE Robotics and Automation Letters (and IROS 2022). Our video can be found at https://youtu.be/YQG0yHE5jWw

Journal ref: IEEE Robotics and Automation Letters, 2022

arXiv:2201.00229 [pdf, other]

Understanding Energy Efficiency and Interference Tolerance in Millimeter Wave Receivers

Authors: Panagiotis Skrimponis, Seongjoon Kang, Abbas Khalili, Wonho Lee, Navid Hosseinzadeh, Marco Mezzavilla, Elza Erkip, Mark J. W. Rodwell, James F. Buckwalter, Sundeep Rangan

Abstract: Power consumption is a key challenge in millimeter wave (mmWave) receiver front-ends, due to the need to support high dimensional antenna arrays at wide bandwidths. Recently, there has been considerable work in develo** low-power front-ends, often based on low-resolution ADCs and low-power mixers. A critical but less studied consequence of such designs is the relatively low-dynamic range which i… ▽ More Power consumption is a key challenge in millimeter wave (mmWave) receiver front-ends, due to the need to support high dimensional antenna arrays at wide bandwidths. Recently, there has been considerable work in develo** low-power front-ends, often based on low-resolution ADCs and low-power mixers. A critical but less studied consequence of such designs is the relatively low-dynamic range which in turn exposes the receiver to adjacent carrier interference and blockers. This paper provides a general mathematical framework for analyzing the performance of mmWave front-ends in the presence of out-of-band interference. The goal is to elucidate the fundamental trade-off of power consumption, interference tolerance and in-band performance. The analysis is combined with detailed network simulations in cellular systems with multiple carriers, as well as detailed circuit simulations of key components at 140 GHz. The analysis reveals critical bottlenecks for low-power interference robustness and suggests designs enhancements for use in practical systems. △ Less

Submitted 1 January, 2022; originally announced January 2022.

Comments: Appeared at the Asilomar Conference on Signals, Systems, and Computers 2021

arXiv:2111.00187 [pdf]

Echopype: A Python library for interoperable and scalable processing of water column sonar data for biological information

Authors: Wu-Jung Lee, Emilio Mayorga, Landung Setiawan, Valentina Staneva

Abstract: High-frequency sonar systems deployed on a wide array of ocean observing platforms are creating a deluge of water column sonar data at an unprecedented speed from all corners of the ocean. Efficient and integrative analysis of these data, either across different sonar instruments or with other oceanographic datasets, holds the key to understanding the response of marine ecosystems to the rapidly c… ▽ More High-frequency sonar systems deployed on a wide array of ocean observing platforms are creating a deluge of water column sonar data at an unprecedented speed from all corners of the ocean. Efficient and integrative analysis of these data, either across different sonar instruments or with other oceanographic datasets, holds the key to understanding the response of marine ecosystems to the rapidly changing climate. Here we present Echopype, an open-source Python software library designed to address this need. By standardizing water column sonar data from diverse instruments following a community convention and utilizing the widely embraced netCDF data model to encode sonar data as labeled, multi-dimensional arrays, Echopype facilitates intuitive, user-friendly exploration and use of sonar data in an instrument-agnostic manner. By leveraging existing open-source Python libraries optimized for distributed computing, Echopype directly enables computational interoperability and scalability in both local and cloud computing environments. Echopype's modularized package structure further provides a conceptually unified implementation framework for expanding its support for additional instrument raw data formats and incorporating new data analysis functionalities. We envision the continued development of Echopype as a catalyst for making information derived from water column sonar data an integrated component of regional and global ocean observation strategies. △ Less

Submitted 12 March, 2024; v1 submitted 30 October, 2021; originally announced November 2021.

Comments: Fix erroneous annotations in use case example flowchart

arXiv:2105.06887 [pdf]

A Frequency Domain Constraint for Synthetic and Real X-ray Image Super Resolution

Authors: Qing Ma, Jae Chul Koh, WonSook Lee

Abstract: Synthetic X-ray images are simulated X-ray images projected from CT data. High-quality synthetic X-ray images can facilitate various applications such as surgical image guidance systems and VR training simulations. However, it is difficult to produce high-quality arbitrary view synthetic X-ray images in real-time due to different CT slice thickness, high computational cost, and the complexity of a… ▽ More Synthetic X-ray images are simulated X-ray images projected from CT data. High-quality synthetic X-ray images can facilitate various applications such as surgical image guidance systems and VR training simulations. However, it is difficult to produce high-quality arbitrary view synthetic X-ray images in real-time due to different CT slice thickness, high computational cost, and the complexity of algorithms. Our goal is to generate high-resolution synthetic X-ray images in real-time by upsampling low-resolution images with deep learning-based super-resolution methods. Reference-based Super Resolution (RefSR) has been well studied in recent years and has shown higher performance than traditional Single Image Super-Resolution (SISR). It can produce fine details by utilizing the reference image but still inevitably generates some artifacts and noise. In this paper, we introduce frequency domain loss as a constraint to further improve the quality of the RefSR results with fine details and without obvious artifacts. To the best of our knowledge, this is the first paper utilizing the frequency domain for the loss functions in the field of super-resolution. We achieved good results in evaluating our method on both synthetic and real X-ray image datasets. △ Less

Submitted 10 August, 2021; v1 submitted 14 May, 2021; originally announced May 2021.

arXiv:2104.11421 [pdf, other]

A Framework for Recognizing and Estimating Human Concentration Levels

Authors: Woodo Lee, Jakyung Koo, Nokyung Park, Pilgu Kang, Jeakwon Shim

Abstract: One of the major tasks in online education is to estimate the concentration levels of each student. Previous studies have a limitation of classifying the levels using discrete states only. The purpose of this paper is to estimate the subtle levels as specified states by using the minimum amount of body movement data. This is done by a framework composed of a Deep Neural Network and Kalman Filter.… ▽ More One of the major tasks in online education is to estimate the concentration levels of each student. Previous studies have a limitation of classifying the levels using discrete states only. The purpose of this paper is to estimate the subtle levels as specified states by using the minimum amount of body movement data. This is done by a framework composed of a Deep Neural Network and Kalman Filter. Using this framework, we successfully extracted the concentration levels, which can be used to aid lecturers and expand to other areas. △ Less

Submitted 23 April, 2021; originally announced April 2021.

arXiv:2104.11267 [pdf, other]

Integrated Framework of Vehicle Dynamics, Instabilities, Energy Models, and Sparse Flow Smoothing Controllers

Authors: Jonathan W. Lee, George Gunter, Rabie Ramadan, Sulaiman Almatrudi, Paige Arnold, John Aquino, William Barbour, Rahul Bhadani, Joy Carpio, Fang-Chieh Chou, Marsalis Gibson, Xiaoqian Gong, Amaury Hayat, Nour Khoudari, Abdul Rahman Kreidieh, Maya Kumar, Nathan Lichtlé, Sean McQuade, Brian Nguyen, Megan Ross, Sydney Truong, Eugene Vinitsky, Yibo Zhao, Jonathan Sprinkle, Benedetto Piccoli , et al. (3 additional authors not shown)

Abstract: This work presents an integrated framework of: vehicle dynamics models, with a particular attention to instabilities and traffic waves; vehicle energy models, with particular attention to accurate energy values for strongly unsteady driving profiles; and sparse Lagrangian controls via automated vehicles, with a focus on controls that can be executed via existing technology such as adaptive cruise… ▽ More This work presents an integrated framework of: vehicle dynamics models, with a particular attention to instabilities and traffic waves; vehicle energy models, with particular attention to accurate energy values for strongly unsteady driving profiles; and sparse Lagrangian controls via automated vehicles, with a focus on controls that can be executed via existing technology such as adaptive cruise control systems. This framework serves as a key building block in develo** control strategies for human-in-the-loop traffic flow smoothing on real highways. In this contribution, we outline the fundamental merits of integrating vehicle dynamics and energy modeling into a single framework, and we demonstrate the energy impact of sparse flow smoothing controllers via simulation results. △ Less

Submitted 22 April, 2021; originally announced April 2021.

arXiv:2104.04215 [pdf, other]

doi 10.1109/LWC.2021.3123314

Sparse Channel Estimation in Wideband Systems with Geometric Sequence Decomposition

Authors: Woong-Hee Lee, Ki Won Sung

Abstract: The sparsity of multipaths in the wideband channel has motivated the use of compressed sensing for channel estimation. In this letter, we propose a different approach to sparse channel estimation. We exploit the fact that $L$ taps of channel impulse response in time domain constitute a non-orthogonal superposition of $L$ geometric sequences in frequency domain. This converts the channel estimation… ▽ More The sparsity of multipaths in the wideband channel has motivated the use of compressed sensing for channel estimation. In this letter, we propose a different approach to sparse channel estimation. We exploit the fact that $L$ taps of channel impulse response in time domain constitute a non-orthogonal superposition of $L$ geometric sequences in frequency domain. This converts the channel estimation problem into the extraction of the parameters of geometric sequences. Numerical results show that the proposed scheme is superior to existing algorithms in high signal-to-noise ratio (SNR) and large bandwidth conditions. △ Less

Submitted 24 October, 2021; v1 submitted 9 April, 2021; originally announced April 2021.

arXiv:2101.07937 [pdf, other]

doi 10.1109/LCOMM.2021.3091800

Noise Learning Based Denoising Autoencoder

Authors: Woong-Hee Lee, Mustafa Ozger, Ursula Challita, Ki Won Sung

Abstract: This letter introduces a new denoiser that modifies the structure of denoising autoencoder (DAE), namely noise learning based DAE (nlDAE). The proposed nlDAE learns the noise of the input data. Then, the denoising is performed by subtracting the regenerated noise from the noisy input. Hence, nlDAE is more effective than DAE when the noise is simpler to regenerate than the original data. To validat… ▽ More This letter introduces a new denoiser that modifies the structure of denoising autoencoder (DAE), namely noise learning based DAE (nlDAE). The proposed nlDAE learns the noise of the input data. Then, the denoising is performed by subtracting the regenerated noise from the noisy input. Hence, nlDAE is more effective than DAE when the noise is simpler to regenerate than the original data. To validate the performance of nlDAE, we provide three case studies: signal restoration, symbol demodulation, and precise localization. Numerical results suggest that nlDAE requires smaller latent space dimension and smaller training dataset compared to DAE. △ Less

Submitted 21 June, 2021; v1 submitted 19 January, 2021; originally announced January 2021.

arXiv:2010.01799 [pdf, other]

Understanding Catastrophic Overfitting in Single-step Adversarial Training

Authors: Hoki Kim, Woo** Lee, Jaewook Lee

Abstract: Although fast adversarial training has demonstrated both robustness and efficiency, the problem of "catastrophic overfitting" has been observed. This is a phenomenon in which, during single-step adversarial training, the robust accuracy against projected gradient descent (PGD) suddenly decreases to 0% after a few epochs, whereas the robust accuracy against fast gradient sign method (FGSM) increase… ▽ More Although fast adversarial training has demonstrated both robustness and efficiency, the problem of "catastrophic overfitting" has been observed. This is a phenomenon in which, during single-step adversarial training, the robust accuracy against projected gradient descent (PGD) suddenly decreases to 0% after a few epochs, whereas the robust accuracy against fast gradient sign method (FGSM) increases to 100%. In this paper, we demonstrate that catastrophic overfitting is very closely related to the characteristic of single-step adversarial training which uses only adversarial examples with the maximum perturbation, and not all adversarial examples in the adversarial direction, which leads to decision boundary distortion and a highly curved loss surface. Based on this observation, we propose a simple method that not only prevents catastrophic overfitting, but also overrides the belief that it is difficult to prevent multi-step adversarial attacks with single-step adversarial training. △ Less

Submitted 15 December, 2020; v1 submitted 5 October, 2020; originally announced October 2020.

Comments: Accepted to AAAI 2021. Preprint

arXiv:2008.06146 [pdf]

End-to-End Trainable Self-Attentive Shallow Network for Text-Independent Speaker Verification

Authors: Hyeonmook Park, Jungbae Park, Sang Wan Lee

Abstract: Generalized end-to-end (GE2E) model is widely used in speaker verification (SV) fields due to its expandability and generality regardless of specific languages. However, the long-short term memory (LSTM) based on GE2E has two limitations: First, the embedding of GE2E suffers from vanishing gradient, which leads to performance degradation for very long input sequences. Secondly, utterances are not… ▽ More Generalized end-to-end (GE2E) model is widely used in speaker verification (SV) fields due to its expandability and generality regardless of specific languages. However, the long-short term memory (LSTM) based on GE2E has two limitations: First, the embedding of GE2E suffers from vanishing gradient, which leads to performance degradation for very long input sequences. Secondly, utterances are not represented as a properly fixed dimensional vector. In this paper, to overcome issues mentioned above, we propose a novel framework for SV, end-to-end trainable self-attentive shallow network (SASN), incorporating a time-delay neural network (TDNN) and a self-attentive pooling mechanism based on the self-attentive x-vector system during an utterance embedding phase. We demonstrate that the proposed model is highly efficient, and provides more accurate speaker verification than GE2E. For VCTK dataset, with just less than half the size of GE2E, the proposed model showed significant performance improvement over GE2E of about 63%, 67%, and 85% in EER (Equal error rate), DCF (Detection cost function), and AUC (Area under the curve), respectively. Notably, when the input length becomes longer, the DCF score improvement of the proposed model is about 17 times greater than that of GE2E. △ Less

Submitted 13 August, 2020; originally announced August 2020.

Comments: 5 pages, 3 figures, 3 tables

arXiv:2007.11653 [pdf]

Darwin's Neural Network: AI-based Strategies for Rapid and Scalable Cell and Coronavirus Screening

Authors: Sang Won Lee, Yueh-Ting Chiu, Philip Brudnicki, Audrey M. Bischoff, Angus Jelinek, Jenny Zijun Wang, Danielle R. Bogdanowicz, Andrew F. Laine, Jia Guo, Helen H. Lu

Abstract: Recent advances in the interdisciplinary scientific field of machine perception, computer vision, and biomedical engineering underpin a collection of machine learning algorithms with a remarkable ability to decipher the contents of microscope and nanoscope images. Machine learning algorithms are transforming the interpretation and analysis of microscope and nanoscope imaging data through use in co… ▽ More Recent advances in the interdisciplinary scientific field of machine perception, computer vision, and biomedical engineering underpin a collection of machine learning algorithms with a remarkable ability to decipher the contents of microscope and nanoscope images. Machine learning algorithms are transforming the interpretation and analysis of microscope and nanoscope imaging data through use in conjunction with biological imaging modalities. These advances are enabling researchers to carry out real-time experiments that were previously thought to be computationally impossible. Here we adapt the theory of survival of the fittest in the field of computer vision and machine perception to introduce a new framework of multi-class instance segmentation deep learning, Darwin's Neural Network (DNN), to carry out morphometric analysis and classification of COVID19 and MERS-CoV collected in vivo and of multiple mammalian cell types in vitro. △ Less

Submitted 22 July, 2020; originally announced July 2020.

Comments: 19 pages, 7 figures

ACM Class: I.5.0

arXiv:2007.02906 [pdf, other]

doi 10.1121/10.0002670

Compact representation of temporal processes in echosounder time series via matrix decomposition

Authors: Wu-Jung Lee, Valentina Staneva

Abstract: The recent explosion in the availability of echosounder data from diverse ocean platforms has created unprecedented opportunities to observe the marine ecosystems at broad scales. However, the critical lack of methods capable of automatically discovering and summarizing prominent spatio-temporal echogram structures has limited the effective and wider use of these rich datasets. To address this cha… ▽ More The recent explosion in the availability of echosounder data from diverse ocean platforms has created unprecedented opportunities to observe the marine ecosystems at broad scales. However, the critical lack of methods capable of automatically discovering and summarizing prominent spatio-temporal echogram structures has limited the effective and wider use of these rich datasets. To address this challenge, we develop a data-driven methodology based on matrix decomposition that builds compact representation of long-term echosounder time series using intrinsic features in the data. In a two-stage approach, we first remove noisy outliers from the data by Principal Component Pursuit, then employ a temporally smooth Nonnegative Matrix Factorization to automatically discover a small number of distinct daily echogram patterns, whose time-varying linear combination (activation) reconstructs the dominant echogram structures. This low-rank representation provides biological information that is more tractable and interpretable than the original data, and is suitable for visualization and systematic analysis with other ocean variables. Unlike existing methods that rely on fixed, handcrafted rules, our unsupervised machine learning approach is well-suited for extracting information from data collected from unfamiliar or rapidly changing ecosystems. This work forms the basis for constructing robust time series analytics for large-scale, acoustics-based biological observation in the ocean. △ Less

Submitted 30 November, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

arXiv:2005.08701 [pdf, other]

doi 10.1007/s40042-021-00056-8

Machine learning for the diagnosis of early stage diabetes using temporal glucose profiles

Authors: Woo Seok Lee, Junghyo Jo, Taegeun Song

Abstract: Machine learning shows remarkable success for recognizing patterns in data. Here we apply the machine learning (ML) for the diagnosis of early stage diabetes, which is known as a challenging task in medicine. Blood glucose levels are tightly regulated by two counter-regulatory hormones, insulin and glucagon, and the failure of the glucose homeostasis leads to the common metabolic disease, diabetes… ▽ More Machine learning shows remarkable success for recognizing patterns in data. Here we apply the machine learning (ML) for the diagnosis of early stage diabetes, which is known as a challenging task in medicine. Blood glucose levels are tightly regulated by two counter-regulatory hormones, insulin and glucagon, and the failure of the glucose homeostasis leads to the common metabolic disease, diabetes mellitus. It is a chronic disease that has a long latent period the complicates detection of the disease at an early stage. The vast majority of diabetics result from that diminished effectiveness of insulin action. The insulin resistance must modify the temporal profile of blood glucose. Thus we propose to use ML to detect the subtle change in the temporal pattern of glucose concentration. Time series data of blood glucose with sufficient resolution is currently unavailable, so we confirm the proposal using synthetic data of glucose profiles produced by a biophysical model that considers the glucose regulation and hormone action. Multi-layered perceptrons, convolutional neural networks, and recurrent neural networks all identified the degree of insulin resistance with high accuracy above $85\%$. △ Less

Submitted 18 May, 2020; originally announced May 2020.

Comments: 4 pages, 2 figure

arXiv:2005.02144 [pdf, other]

Potential, Challenges and Future Directions for Deep Learning in Prognostics and Health Management Applications

Authors: Olga Fink, Qin Wang, Markus Svensén, Pierre Dersin, Wan-Jui Lee, Melanie Ducoffe

Abstract: Deep learning applications have been thriving over the last decade in many different domains, including computer vision and natural language understanding. The drivers for the vibrant development of deep learning have been the availability of abundant data, breakthroughs of algorithms and the advancements in hardware. Despite the fact that complex industrial assets have been extensively monitored… ▽ More Deep learning applications have been thriving over the last decade in many different domains, including computer vision and natural language understanding. The drivers for the vibrant development of deep learning have been the availability of abundant data, breakthroughs of algorithms and the advancements in hardware. Despite the fact that complex industrial assets have been extensively monitored and large amounts of condition monitoring signals have been collected, the application of deep learning approaches for detecting, diagnosing and predicting faults of complex industrial assets has been limited. The current paper provides a thorough evaluation of the current developments, drivers, challenges, potential solutions and future research needs in the field of deep learning applied to Prognostics and Health Management (PHM) applications. △ Less

Submitted 5 May, 2020; originally announced May 2020.

Comments: To appear in Engineering Applications of Artificial Intelligence

arXiv:2004.12786 [pdf, other]

A Cascaded Learning Strategy for Robust COVID-19 Pneumonia Chest X-Ray Screening

Authors: Chun-Fu Yeh, Hsien-Tzu Cheng, Andy Wei, Hsin-Ming Chen, Po-Chen Kuo, Keng-Chi Liu, Mong-Chi Ko, Ray-Jade Chen, Po-Chang Lee, Jen-Hsiang Chuang, Chi-Mai Chen, Yi-Chang Chen, Wen-Jeng Lee, Ning Chien, Jo-Yu Chen, Yu-Sen Huang, Yu-Chien Chang, Yu-Cheng Huang, Nai-Kuan Chou, Kuan-Hua Chao, Yi-Chin Tu, Yeun-Chung Chang, Tyng-Luh Liu

Abstract: We introduce a comprehensive screening platform for the COVID-19 (a.k.a., SARS-CoV-2) pneumonia. The proposed AI-based system works on chest x-ray (CXR) images to predict whether a patient is infected with the COVID-19 disease. Although the recent international joint effort on making the availability of all sorts of open data, the public collection of CXR images is still relatively small for relia… ▽ More We introduce a comprehensive screening platform for the COVID-19 (a.k.a., SARS-CoV-2) pneumonia. The proposed AI-based system works on chest x-ray (CXR) images to predict whether a patient is infected with the COVID-19 disease. Although the recent international joint effort on making the availability of all sorts of open data, the public collection of CXR images is still relatively small for reliably training a deep neural network (DNN) to carry out COVID-19 prediction. To better address such inefficiency, we design a cascaded learning strategy to improve both the sensitivity and the specificity of the resulting DNN classification model. Our approach leverages a large CXR image dataset of non-COVID-19 pneumonia to generalize the original well-trained classification model via a cascaded learning scheme. The resulting screening system is shown to achieve good classification performance on the expanded dataset, including those newly added COVID-19 CXR images. △ Less

Submitted 30 April, 2020; v1 submitted 24 April, 2020; originally announced April 2020.

Comments: 14 pages, 6 figures

arXiv:2004.11138 [pdf, other]

doi 10.1145/3425780

The Creation and Detection of Deepfakes: A Survey

Authors: Yisroel Mirsky, Wenke Lee

Abstract: Generative deep learning algorithms have progressed to a point where it is difficult to tell the difference between what is real and what is fake. In 2018, it was discovered how easy it is to use this technology for unethical and malicious applications, such as the spread of misinformation, impersonation of political leaders, and the defamation of innocent individuals. Since then, these `deepfakes… ▽ More Generative deep learning algorithms have progressed to a point where it is difficult to tell the difference between what is real and what is fake. In 2018, it was discovered how easy it is to use this technology for unethical and malicious applications, such as the spread of misinformation, impersonation of political leaders, and the defamation of innocent individuals. Since then, these `deepfakes' have advanced significantly. In this paper, we explore the creation and detection of deepfakes and provide an in-depth view of how these architectures work. The purpose of this survey is to provide the reader with a deeper understanding of (1) how deepfakes are created and detected, (2) the current trends and advancements in this domain, (3) the shortcomings of the current defense solutions, and (4) the areas which require further research and attention. △ Less

Submitted 13 September, 2020; v1 submitted 23 April, 2020; originally announced April 2020.

Journal ref: ACM Computing Surveys (CSUR), 2020, preprint

Showing 1–50 of 75 results for author: Lee, W