
PretVM: Predictable, Efficient Virtual Machine for Real-Time Concurrency

Authors: Shaokai Lin, Erling Jellum, Mirco Theile, Tassilo Tanneberger, Binqi Sun, Chadlia Jerad, Ruomu Xu, Guangyu Feng, Christian Menard, Marten Lohstroh, Jeronimo Castrillon, Sanjit Seshia, Edward Lee

Abstract: This paper introduces the Precision-Timed Virtual Machine (PretVM), an intermediate platform facilitating the execution of quasi-static schedules compiled from a subset of programs written in the Lingua Franca (LF) coordination language. The subset consists of those programs that in principle should have statically verifiable and predictable timing behavior. The PretVM provides a schedule with well-defined worst-case timing bounds. The PretVM provides a clean separation between application logic and coordination logic, yielding more analyzable program executions. Experiments compare the PretVM against the default (more dynamic) LF scheduler and show that it delivers time-accurate deterministic execution.

Submitted 25 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

arXiv:2312.02669 [pdf, other]

Deep-learning-driven end-to-end metalens imaging

Authors: Joonhyuk Seo, Jaegang Jo, Joohoon Kim, Joonho Kang, Chanik Kang, Seongwon Moon, Eunji Lee, Jehyeong Hong, Junsuk Rho, Haejun Chung

Abstract: Recent advances in metasurface lenses (metalenses) have shown great potential for opening a new era in compact imaging, photography, light detection and ranging (LiDAR), and virtual reality/augmented reality (VR/AR) applications. However, the fundamental trade-off between broadband focusing efficiency and operating bandwidth limits the performance of broadband metalenses, resulting in chromatic aberration, angular aberration, and a relatively low efficiency. In this study, a deep-learning-based image restoration framework is proposed to overcome these limitations and realize end-to-end metalens imaging, thereby achieving aberration-free full-color imaging for mass-produced metalenses with 10-mm diameter. Neural-network-assisted metalens imaging achieved a high resolution comparable to that of the ground truth image.

Submitted 10 May, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

Comments: 17 pages, 7 figures, 1 table

arXiv:2312.01638 [pdf, other]

J-Net: Improved U-Net for Terahertz Image Super-Resolution

Authors: Woon-Ha Yeo, Seung-Hwan Jung, Seung Jae Oh, Inhee Maeng, Eui Su Lee, Han-Cheol Ryu

Abstract: Terahertz (THz) waves are electromagnetic waves in the 0.1 to 10 THz frequency range, and THz imaging is utilized in a range of applications, including security inspections, biomedical fields, and the non-destructive examination of materials. However, THz images have low resolution due to the long wavelength of THz waves. Therefore, improving the resolution of THz images is one of the current hot research topics. We propose a novel network architecture called J-Net, which is an improved version of U-Net, to solve THz image super-resolution. It employs simple baseline blocks which can extract low-resolution (LR) image features and learn the mapping of LR images to high-resolution (HR) images efficiently. All training was conducted using the DIV2K+Flickr2K dataset, and we employed the peak signal-to-noise ratio (PSNR) for quantitative comparison. In our comparisons with other THz image super-resolution methods, J-Net achieved a PSNR of 32.52 dB, surpassing other techniques by more than 1 dB. J-Net also demonstrates superior performance on real THz images compared to other methods. Experiments show that the proposed J-Net achieves better PSNR and visual improvement compared with other THz image super-resolution methods.

Submitted 4 December, 2023; originally announced December 2023.

arXiv:2312.01285 [pdf, other]

A Literature Review on the Smart Wheelchair Systems

Authors: Yane Kim, Bharath Velamala, Youngseo Choi, Yu** Kim, Hyunkin Kim, Nishad Kulkarni, Eung-Joo Lee

Abstract: This study offers an in-depth analysis of smart wheelchair (SW) systems, charting their progression from early developments to future innovations. It delves into various Brain-Computer Interface (BCI) systems, including mu rhythm, event-related potential, and steady-state visual evoked potential. The paper addresses challenges in signal categorization, proposing the sparse Bayesian extreme learning machine as an innovative solution. Additionally, it explores the integration of emotional states in BCI systems, the application of alternative control methods such as EMG-based systems, and the deployment of intelligent adaptive interfaces utilizing recurrent quantum neural networks. The study also covers advancements in autonomous navigation, assistance, and mapping, emphasizing their importance in SW systems. The human aspect of SW interaction receives considerable attention, specifically in terms of privacy, physiological factors, and the refinement of control mechanisms. The paper acknowledges the commercial challenges faced, like the limitations of indoor usage and the necessity for user training. For future applications, the research explores the potential of autonomous systems adept at adapting to changing environments and user needs. This exploration includes reinforcement learning and various control methods, such as eye and voice control, to improve adaptability and interaction. The potential integration with smart home technologies, including advanced features such as robotic arms, is also considered, aiming to further enhance user accessibility and independence. Ultimately, this study seeks to provide a thorough overview of SW systems, presenting extensive research to detail their historical evolution, current state, and future prospects.

Submitted 3 December, 2023; originally announced December 2023.

arXiv:2309.13753 [pdf, other]

Policy Stitching: Learning Transferable Robot Policies

Authors: **cheng Jian, Easop Lee, Zachary Bell, Michael M. Zavlanos, Boyuan Chen

Abstract: Training robots with reinforcement learning (RL) typically involves heavy interactions with the environment, and the acquired skills are often sensitive to changes in task environments and robot kinematics. Transfer RL aims to leverage previous knowledge to accelerate learning of new tasks or new body configurations. However, existing methods struggle to generalize to novel robot-task combinations and scale to realistic tasks due to complex architecture design or strong regularization that limits the capacity of the learned policy. We propose Policy Stitching, a novel framework that facilitates robot transfer learning for novel combinations of robots and tasks. Our key idea is to apply modular policy design and align the latent representations between the modular interfaces. Our method allows direct stitching of the robot and task modules trained separately to form a new policy for fast adaptation. Our simulated and real-world experiments on various 3D manipulation tasks demonstrate the superior zero-shot and few-shot transfer learning performances of our method. Our project website is at: http://generalroboticslab.com/PolicyStitching/ .

Submitted 24 September, 2023; originally announced September 2023.

Comments: CoRL 2023

arXiv:2304.03295 [pdf, other]

Automatic Detection of Reactions to Music via Earable Sensing

Authors: Euihyoek Lee, Chulhong Min, Jeaseung Lee, ** Yu, Seungwoo Kang

Abstract: We present GrooveMeter, a novel system that automatically detects vocal and motion reactions to music via earable sensing and supports music engagement-aware applications. To this end, we use smart earbuds as sensing devices, which are already widely used for music listening, and devise reaction detection techniques by leveraging an inertial measurement unit (IMU) and a microphone on earbuds. To explore reactions in daily music-listening situations, we collect the first kind of dataset, MusicReactionSet, containing 926-minute-long IMU and audio data with 30 participants. With the dataset, we discover a set of unique challenges in detecting music listening reactions accurately and robustly using audio and motion sensing. We devise sophisticated processing pipelines to make reaction detection accurate and efficient. We present a comprehensive evaluation to examine the performance of reaction detection and system cost. It shows that GrooveMeter achieves the macro F1 scores of 0.89 for vocal reaction and 0.81 for motion reaction with leave-one-subject-out cross-validation. More importantly, GrooveMeter shows higher accuracy and robustness compared to alternative methods. We also show that our filtering approach reduces 50% or more of the energy overhead. Finally, we demonstrate the potential use cases through a case study.

Submitted 6 April, 2023; originally announced April 2023.

arXiv:2303.13567 [pdf]

AI Models Close to your Chest: Robust Federated Learning Strategies for Multi-site CT

Authors: Edward H. Lee, Brendan Kelly, Emre Altinmakas, Hakan Dogan, Maryam Mohammadzadeh, Errol Colak, Steve Fu, Olivia Choudhury, Ujjwal Ratan, Felipe Kitamura, Hernan Chaves, Jimmy Zheng, Mourad Said, Eduardo Reis, Jaekwang Lim, Patricia Yokoo, Courtney Mitchell, Golnaz Houshmand, Marzyeh Ghassemi, Ronan Killeen, Wendy Qiu, Joel Hayden, Farnaz Rafiee, Chad Klochko, Nicholas Bevins, et al. (5 additional authors not shown)

Abstract: While it is well known that population differences from genetics, sex, race, and environmental factors contribute to disease, AI studies in medicine have largely focused on locoregional patient cohorts with less diverse data sources. Such limitation stems from barriers to large-scale data share and ethical concerns over data privacy. Federated learning (FL) is one potential pathway for AI development that enables learning across hospitals without data share. In this study, we show the results of various FL strategies on one of the largest and most diverse COVID-19 chest CT datasets: 21 participating hospitals across five continents that comprise >10,000 patients with >1 million images. We also propose an FL strategy that leverages synthetically generated data to overcome class and size imbalances. We also describe the sources of data heterogeneity in the context of FL, and show how even among the correctly labeled populations, disparities can arise due to these biases.

Submitted 13 April, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

arXiv:2303.10770 [pdf, other]

doi 10.1002/aisy.202400265

RN-Net: Reservoir Nodes-Enabled Neuromorphic Vision Sensing Network

Authors: Sangmin Yoo, Eric Yeu-Jer Lee, Ziyu Wang, Xinxin Wang, Wei D. Lu

Abstract: Event-based cameras are inspired by the sparse and asynchronous spike representation of the biological visual system. However, processing the event data requires either using expensive feature descriptors to transform spikes into frames, or using spiking neural networks that are expensive to train. In this work, we propose a neural network architecture, Reservoir Nodes-enabled neuromorphic vision sensing Network (RN-Net), based on simple convolution layers integrated with dynamic temporal encoding reservoirs for local and global spatiotemporal feature detection with low hardware and training costs. The RN-Net allows efficient processing of asynchronous temporal features, and achieves the highest accuracy of 99.2% for DVS128 Gesture reported to date, and one of the highest accuracy of 67.5% for DVS Lip dataset at a much smaller network size. By leveraging the internal device and circuit dynamics, asynchronous temporal feature encoding can be implemented at very low hardware cost without preprocessing and dedicated memory and arithmetic units. The use of simple DNN blocks and standard backpropagation-based training rules further reduces implementation costs.

Submitted 24 May, 2024; v1 submitted 19 March, 2023; originally announced March 2023.

Comments: 12 pages, 5 figures, 4 tables

arXiv:2303.00795 [pdf, other]

Improved Segmentation of Deep Sulci in Cortical Gray Matter Using a Deep Learning Framework Incorporating Laplace's Equation

Authors: Sadhana Ravikumar, Ranjit Ittyerah, Sydney Lim, Long Xie, Sandhitsu Das, Pulkit Khandelwal, Laura E. M. Wisse, Madigan L. Bedard, John L. Robinson, Terry Schuck, Murray Grossman, John Q. Trojanowski, Edward B. Lee, M. Dylan Tisdall, Karthik Prabhakaran, John A. Detre, David J. Irwin, Winifred Trotman, Gabor Mizsei, Emilio Artacho-Pérula, Maria Mercedes Iñiguez de Onzono Martin, Maria del Mar Arroyo Jiménez, Monica Muñoz, Francisco Javier Molina Romero, Maria del Pilar Marcos Rabal, et al. (7 additional authors not shown)

Abstract: When developing tools for automated cortical segmentation, the ability to produce topologically correct segmentations is important in order to compute geometrically valid morphometry measures. In practice, accurate cortical segmentation is challenged by image artifacts and the highly convoluted anatomy of the cortex itself. To address this, we propose a novel deep learning-based cortical segmentation method in which prior knowledge about the geometry of the cortex is incorporated into the network during the training process. We design a loss function which uses the theory of Laplace's equation applied to the cortex to locally penalize unresolved boundaries between tightly folded sulci. Using an ex vivo MRI dataset of human medial temporal lobe specimens, we demonstrate that our approach outperforms baseline segmentation networks, both quantitatively and qualitatively.

Submitted 3 March, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

Comments: Accepted at the 28th biennial international conference on Information Processing in Medical Imaging (IPMI 2023)

arXiv:2303.00091 [pdf, other]

Improving Medical Speech-to-Text Accuracy with Vision-Language Pre-training Model

Authors: Jaeyoung Huh, Sangjoon Park, Jeong Eun Lee, Jong Chul Ye

Abstract: Automatic Speech Recognition (ASR) is a technology that converts spoken words into text, facilitating interaction between humans and machines. One of the most common applications of ASR is Speech-To-Text (STT) technology, which simplifies user workflows by transcribing spoken words into text. In the medical field, STT has the potential to significantly reduce the workload of clinicians who rely on typists to transcribe their voice recordings. However, developing an STT model for the medical domain is challenging due to the lack of sufficient speech and text datasets. To address this issue, we propose a medical-domain text correction method that modifies the output text of a general STT system using the Vision Language Pre-training (VLP) method. VLP combines textual and visual information to correct text based on image knowledge. Our extensive experiments demonstrate that the proposed method offers quantitatively and clinically significant improvements in STT performance in the medical field. We further show that multi-modal understanding of image and text information outperforms single-modal understanding using only text information.

Submitted 27 February, 2023; originally announced March 2023.

arXiv:2301.03027 [pdf, other]

Annealed Score-Based Diffusion Model for MR Motion Artifact Reduction

Authors: Gyutaek Oh, Jeong Eun Lee, Jong Chul Ye

Abstract: Motion artifact reduction is one of the important research topics in MR imaging, as the motion artifact degrades image quality and makes diagnosis difficult. Recently, many deep learning approaches have been studied for motion artifact reduction. Unfortunately, most existing models are trained in a supervised manner, requiring paired motion-corrupted and motion-free images, or are based on a strict motion-corruption model, which limits their use for real-world situations. To address this issue, here we present an annealed score-based diffusion model for MRI motion artifact reduction. Specifically, we train a score-based model using only motion-free images, and then motion artifacts are removed by applying forward and reverse diffusion processes repeatedly to gradually impose a low-frequency data consistency. Experimental results verify that the proposed method successfully reduces both simulated and in vivo motion artifacts, outperforming the state-of-the-art deep learning methods.

Submitted 8 January, 2023; originally announced January 2023.

arXiv:2210.13576 [pdf, ps, other]

Spectral Clustering-aware Learning of Embeddings for Speaker Diarisation

Authors: Evonne P. C. Lee, Guangzhi Sun, Chao Zhang, Philip C. Woodland

Abstract: In speaker diarisation, speaker embedding extraction models often suffer from the mismatch between their training loss functions and the speaker clustering method. In this paper, we propose the method of spectral clustering-aware learning of embeddings (SCALE) to address the mismatch. Specifically, besides an angular prototypical (AP) loss, SCALE uses a novel affinity matrix loss which directly minimises the error between the affinity matrix estimated from speaker embeddings and the reference. SCALE also includes p-percentile thresholding and Gaussian blur as two important hyper-parameters for spectral clustering in training. Experiments on the AMI dataset showed that speaker embeddings obtained with SCALE achieved over 50% relative speaker error rate reductions using oracle segmentation, and over 30% relative diarisation error rate reductions using automatic segmentation when compared to a strong baseline with the AP-loss-based speaker embeddings.

Submitted 14 March, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

Comments: To appear in ICASSP 2023, 5 pages

arXiv:2208.05140 [pdf, other]

Self-supervised Multi-modal Training from Uncurated Image and Reports Enables Zero-shot Oversight Artificial Intelligence in Radiology

Authors: Sangjoon Park, Eun Sun Lee, Kyung Sook Shin, Jeong Eun Lee, Jong Chul Ye

Abstract: Oversight AI is an emerging concept in radiology where the AI forms a symbiosis with radiologists by continuously supporting radiologists in their decision-making. Recent advances in vision-language models shed light on the long-standing problems of oversight AI by understanding both visual and textual concepts and their semantic correspondences. However, there have been limited successes in the application of vision-language models in the medical domain, as the current vision-language models and learning strategies for photographic images and captions call for a web-scale data corpus of image and text pairs, which is often not feasible in the medical domain. To address this, here we present a model dubbed Medical Cross-attention Vision-Language model (Medical X-VL), leveraging the key components to be tailored for the medical domain. Our Medical X-VL model is based on the following components: self-supervised uni-modal models in the medical domain and a fusion encoder to bridge them, momentum distillation, sentence-wise contrastive learning for medical reports, and sentence similarity-adjusted hard negative mining. We experimentally demonstrated that our model enables various zero-shot tasks for oversight AI, ranging from zero-shot classification to zero-shot error correction. Our model outperformed the current state-of-the-art models on two different medical image databases, suggesting the novel clinical usage of our oversight AI model for monitoring human errors. Our method was especially successful in the data-limited setting, which is frequently encountered in clinics, suggesting potential widespread applicability in the medical domain.

Submitted 12 April, 2023; v1 submitted 10 August, 2022; originally announced August 2022.

arXiv:2203.12621 [pdf, other]

MR Image Denoising and Super-Resolution Using Regularized Reverse Diffusion

Authors: Hyung** Chung, Eun Sun Lee, Jong Chul Ye

Abstract: Patient scans from MRI often suffer from noise, which hampers the diagnostic capability of such images. As a method to mitigate such artifact, denoising is largely studied both within the medical imaging community and beyond the community as a general subject. However, recent deep neural network-based approaches mostly rely on the minimum mean squared error (MMSE) estimates, which tend to produce a blurred output. Moreover, such models suffer when deployed in real-world situations: out-of-distribution data, and complex noise distributions that deviate from the usual parametric noise models. In this work, we propose a new denoising method based on score-based reverse diffusion sampling, which overcomes all the aforementioned drawbacks. Our network, trained only with coronal knee scans, excels even on out-of-distribution in vivo liver MRI data, contaminated with a complex mixture of noise. Furthermore, we propose a method to enhance the resolution of the denoised image with the same network. With extensive experiments, we show that our method establishes state-of-the-art performance, while having desirable properties which prior MMSE denoisers did not have: flexibly choosing the extent of denoising, and quantifying uncertainty.

Submitted 23 March, 2022; originally announced March 2022.

arXiv:2202.06052 [pdf, other]

Learning by Doing: Controlling a Dynamical System using Causality, Control, and Reinforcement Learning

Authors: Sebastian Weichwald, Søren Wengel Mogensen, Tabitha Edith Lee, Dominik Baumann, Oliver Kroemer, Isabelle Guyon, Sebastian Trimpe, Jonas Peters, Niklas Pfister

Abstract: Questions in causality, control, and reinforcement learning go beyond the classical machine learning task of prediction under i.i.d. observations. Instead, these fields consider the problem of learning how to actively perturb a system to achieve a certain effect on a response variable. Arguably, they have complementary views on the problem: In control, one usually aims to first identify the system by excitation strategies to then apply model-based design techniques to control the system. In (non-model-based) reinforcement learning, one directly optimizes a reward. In causality, one focus is on identifiability of causal structure. We believe that combining the different views might create synergies and this competition is meant as a first step toward such synergies. The participants had access to observational and (offline) interventional data generated by dynamical systems. Track CHEM considers an open-loop problem in which a single impulse at the beginning of the dynamics can be set, while Track ROBO considers a closed-loop problem in which control variables can be set at each time step. The goal in both tracks is to infer controls that drive the system to a desired state. Code is open-sourced ( https://github.com/LearningByDoingCompetition/learningbydoing-comp ) to reproduce the winning solutions of the competition and to facilitate trying out new methods on the competition tasks.

Submitted 12 February, 2022; originally announced February 2022.

Comments: https://learningbydoingcompetition.github.io/

arXiv:2202.02685 [pdf]

Blind source separation of baseband RF communication signals using mixed-signal matrix multiplication circuit

Authors: Bindu Madhavan, Edward Lee, Joshua Zusman, Anthony F. J. Levi

Abstract: An 8 x 8 mixed-signal matrix multiplier architecture based on 64 hybrid capacitor-resistor multiplying digital to analogue converters implemented in a 65 nm CMOS technology was developed for the application of blind source separation of baseband RF signals. The integrated circuit has 13-bit resolution for each matrix weight and achieves a measured dynamic range of > 62 dB with a bandwidth of > 15 MHz and typical power dissipation of < 30 mW per matrix row. Separation of single-tone signal is measured to be better than 57 dBc.

Submitted 5 February, 2022; originally announced February 2022.

arXiv:2112.02896 [pdf, other]

Tunable Image Quality Control of 3-D Ultrasound using Switchable CycleGAN

Authors: Jaeyoung Huh, Shujaat Khan, Sung** Choi, Dongkuk Shin, Eun Sun Lee, Jong Chul Ye

Abstract: In contrast to 2-D ultrasound (US) for uniaxial plane imaging, a 3-D US imaging system can visualize a volume along three axial planes. This allows for a full view of the anatomy, which is useful for gynecological (GYN) and obstetrical (OB) applications. Unfortunately, the 3-D US has an inherent limitation in resolution compared to the 2-D US. In the case of 3-D US with a 3-D mechanical probe, for example, the image quality is comparable along the beam direction, but significant deterioration in image quality is often observed in the other two axial image planes. To address this, here we propose a novel unsupervised deep learning approach to improve 3-D US image quality. In particular, using unmatched high-quality 2-D US images as a reference, we trained a recently proposed switchable CycleGAN architecture so that every mapping plane in 3-D US can learn the image quality of 2-D US images. Thanks to the switchable architecture, our network can also provide real-time control of image enhancement level based on user preference, which is ideal for a user-centric scanner setup. Extensive experiments with clinical evaluation confirm that our method offers significantly improved image quality as well as user-friendly flexibility.

Submitted 6 December, 2021; originally announced December 2021.

arXiv:2110.07711 [pdf, other]

Gray Matter Segmentation in Ultra High Resolution 7 Tesla ex vivo T2w MRI of Human Brain Hemispheres

Authors: Pulkit Khandelwal, Shokufeh Sadaghiani, Michael Tran Duong, Sadhana Ravikumar, Sydney Lim, Sanaz Arezoumandan, Claire Peterson, Eunice Chung, Madigan Bedard, Noah Capp, Ranjit Ittyerah, Elyse Migdal, Grace Choi, Emily Kopp, Bridget Loja, Eusha Hasan, Jiacheng Li, Karthik Prabhakaran, Gabor Mizsei, Marianna Gabrielyan, Theresa Schuck, John Robinson, Daniel Ohm, Edward Lee, John Q. Trojanowski, et al. (8 additional authors not shown)

Abstract: Ex vivo MRI of the brain provides remarkable advantages over in vivo MRI for visualizing and characterizing detailed neuroanatomy. However, automated cortical segmentation methods in ex vivo MRI are not well developed, primarily due to limited availability of labeled datasets, and heterogeneity in scanner hardware and acquisition protocols. In this work, we present a high resolution 7 Tesla dataset of 32 ex vivo human brain specimens. We benchmark the cortical mantle segmentation performance of nine neural network architectures, trained and evaluated using manually-segmented 3D patches sampled from specific cortical regions, and show excellent generalizing capabilities across whole brain hemispheres in different specimens, and also on unseen images acquired at different magnetic field strengths and imaging sequences. Finally, we provide cortical thickness measurements across key regions in 3D ex vivo human brain images. Our code and processed datasets are publicly available at https://github.com/Pulkit-Khandelwal/picsl-ex-vivo-segmentation.

Submitted 3 March, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

Comments: Ex vivo analysis framework (work in progress 2022 at the University of Pennsylvania)

arXiv:2109.14956 [pdf]

Comparative Validation of Machine Learning Algorithms for Surgical Workflow and Skill Analysis with the HeiChole Benchmark

Authors: Martin Wagner, Beat-Peter Müller-Stich, Anna Kisilenko, Duc Tran, Patrick Heger, Lars Mündermann, David M Lubotsky, Benjamin Müller, Tornike Davitashvili, Manuela Capek, Annika Reinke, Tong Yu, Armine Vardazaryan, Chinedu Innocent Nwoye, Nicolas Padoy, Xinyang Liu, Eung-Joo Lee, Constantin Disch, Hans Meine, Tong Xia, Fucang Jia, Satoshi Kondo, Wolfgang Reiter, Yueming Jin, Yonghao Long, et al. (16 additional authors not shown)

Abstract: PURPOSE: Surgical workflow and skill analysis are key technologies for the next generation of cognitive surgical assistance systems. These systems could increase the safety of the operation through context-sensitive warnings and semi-autonomous robotic assistance or improve training of surgeons via data-driven feedback. In surgical workflow analysis up to 91% average precision has been reported for phase recognition on an open data single-center dataset. In this work we investigated the generalizability of phase recognition algorithms in a multi-center setting including more difficult recognition tasks such as surgical action and surgical skill. METHODS: To achieve this goal, a dataset with 33 laparoscopic cholecystectomy videos from three surgical centers with a total operation time of 22 hours was created. Labels included annotation of seven surgical phases with 250 phase transitions, 5514 occurrences of four surgical actions, 6980 occurrences of 21 surgical instruments from seven instrument categories and 495 skill classifications in five skill dimensions. The dataset was used in the 2019 Endoscopic Vision challenge, sub-challenge for surgical workflow and skill analysis. Here, 12 teams submitted their machine learning algorithms for recognition of phase, action, instrument and/or skill assessment. RESULTS: F1-scores were achieved for phase recognition between 23.9% and 67.7% (n=9 teams), for instrument presence detection between 38.5% and 63.8% (n=8 teams), but for action recognition only between 21.8% and 23.3% (n=5 teams). The average absolute error for skill assessment was 0.78 (n=1 team). CONCLUSION: Surgical workflow and skill analysis are promising technologies to support the surgical team, but are not solved yet, as shown by our comparison of algorithms. This novel benchmark can be used for comparable evaluation and validation of future work.

Submitted 30 September, 2021; originally announced September 2021.

arXiv:2106.10437 [pdf, other]

One-to-many Approach for Improving Super-Resolution

Authors: Sieun Park, Eunho Lee

Abstract: Recently, there have been discussions on the ill-posed nature of super-resolution, in that multiple possible reconstructions exist for a given low-resolution image. Using normalizing flows, SRFlow [23] achieves state-of-the-art perceptual quality by learning the distribution of the output instead of a deterministic output to one estimate. In this paper, we adapt the concepts of SRFlow to improve GAN-based super-resolution by properly implementing the one-to-many property. We modify the generator to estimate a distribution as a mapping from random noise. We improve the content loss that hampers the perceptual training objectives. We also propose additional training techniques to further enhance the perceptual quality of generated images. Using our proposed methods, we were able to improve the performance of ESRGAN [1] in x4 perceptual SR and achieve the state-of-the-art LPIPS score in x16 perceptual extreme SR by applying our methods to RFB-ESRGAN [21].

Submitted 18 August, 2021; v1 submitted 19 June, 2021; originally announced June 2021.

arXiv:2104.00123 [pdf, other]

Generalized Reinforcement Learning for Building Control using Behavioral Cloning

Authors: Zachary E. Lee, K. Max Zhang

Abstract: Advanced building control methods such as model predictive control (MPC) offer significant potential benefits to both consumers and grid operators, but the high computational requirements have acted as barriers to more widespread adoption. Local control computation requires installation of expensive computational hardware, while cloud computing introduces data security and privacy concerns. In this paper, we drastically reduce the local computational requirements of advanced building control through a reinforcement learning (RL)-based approach called Behavioral Cloning, which represents the MPC policy as a neural network that can be locally implemented and quickly computed on a low-cost programmable logic controller. While previous RL and approximate MPC methods must be specifically trained for each building, our key improvement is that our controller can generalize to many buildings, electricity rates, and thermostat setpoint schedules without additional, effort-intensive retraining. To provide this versatility, we have adapted the traditional Behavioral Cloning approach through (1) a constraint-informed parameter grouping (CIPG) method that provides a more efficient representation of the training data; (2) an MPC-Guided training data generation method using the DAgger algorithm that improves stability and constraint satisfaction; and (3) a new deep learning model-structure called reverse-time recurrent neural networks (RT-RNN) that allows future information to flow backward in time to more effectively interpret the temporal information in disturbance predictions. The result is an easy-to-deploy, generalized behavioral clone of MPC that can be implemented on a programmable logic controller and requires little building-specific controller tuning, reducing the effort and costs associated with implementing smart residential heat pump control.

Submitted 31 March, 2021; originally announced April 2021.

Comments: submitted to Applied Energy 2021

arXiv:2102.12209 [pdf]

Designing zonal-based flexible bus services under stochastic demand

Authors: Enoch Lee, Xuekai Cen, Hong K. Lo, Ka Fai Ng

Abstract: In this paper, we develop a zonal-based flexible bus service (ZBFBS) by considering stochastic variations in both the spatial (origin-destination, or OD) distribution and the volume of passenger demand. Service requests are grouped by zonal OD pairs and number of passengers per request, and aggregated into demand categories which follow certain probability distributions. A two-stage stochastic program is formulated to minimize the expected operating cost of ZBFBS, in which the zonal visit sequences of vehicles are determined in Stage-1, whereas in Stage-2, service requests are assigned to either regular routes determined in Stage-1 or ad hoc services that incur additional costs. Demand volume reliability and detour time reliability are introduced to ensure quality of the services and separate the problem into two phases for efficient solutions. In phase-1, given the reliability requirements, we minimize the cost of operating the regular services. In phase-2, we optimize the passenger assignment to vehicles to minimize the expected ad hoc service cost. The reliabilities are then optimized by a gradient-based approach to minimize the sum of the regular service operating cost and expected ad hoc service cost. We conduct numerical studies on vehicle capacity, detour time limit and demand volume to demonstrate the potential of ZBFBS, and apply the model to Chengdu, China, based on real data to illustrate its applicability.

Submitted 31 October, 2021; v1 submitted 24 February, 2021; originally announced February 2021.

Comments: 42 pages, 12 figures, manuscript accepted by Transportation Science

arXiv:2011.06337 [pdf, other]

Unsupervised MR Motion Artifact Deep Learning using Outlier-Rejecting Bootstrap Aggregation

Authors: Gyutaek Oh, Jeong Eun Lee, Jong Chul Ye

Abstract: Recently, deep learning approaches for MR motion artifact correction have been extensively studied. Although these approaches have shown high performance and reduced computational complexity compared to classical methods, most of them require supervised training using paired artifact-free and artifact-corrupted images, which may prohibit their use in many important clinical applications. For example, transient severe motion (TSM) due to acute transient dyspnea in Gd-EOB-DTPA-enhanced MR is difficult to control and model for paired data generation. To address this issue, here we propose a novel unsupervised deep learning scheme through outlier-rejecting bootstrap subsampling and aggregation. This is inspired by the observation that motions usually cause sparse k-space outliers in the phase encoding direction, so k-space subsampling along the phase encoding direction can remove some outliers and the aggregation step can further improve the results from the reconstruction network. Our method does not require any paired data because the training step only requires artifact-free images. Furthermore, to address the smoothing from potential bias to the artifact-free images, the network is trained in an unsupervised manner using optimal transport driven cycleGAN. We verify that our method can be applied for artifact correction from simulated motion as well as real motion from TSM successfully, outperforming existing state-of-the-art deep learning methods.

Submitted 12 November, 2020; originally announced November 2020.

arXiv:2007.06786 [pdf, other]

Meta-rPPG: Remote Heart Rate Estimation Using a Transductive Meta-Learner

Authors: Eugene Lee, Evan Chen, Chen-Yi Lee

Abstract: Remote heart rate estimation is the measurement of heart rate without any physical contact with the subject and is accomplished using remote photoplethysmography (rPPG) in this work. rPPG signals are usually collected using a video camera with a limitation of being sensitive to multiple contributing factors, e.g. variation in skin tone, lighting condition and facial structure. End-to-end supervised learning approaches perform well when training data is abundant, covering a distribution that doesn't deviate too much from the distribution of testing data or during deployment. To cope with the unforeseeable distributional changes during deployment, we propose a transductive meta-learner that takes unlabeled samples during testing (deployment) for a self-supervised weight adjustment (also known as transductive inference), providing fast adaptation to the distributional changes. Using this approach, we achieve state-of-the-art performance on MAHNOB-HCI and UBFC-rPPG.

Submitted 13 July, 2020; originally announced July 2020.

Comments: 26 pages, 10 figures, accepted by European Conference on Computer Vision (ECCV) 2020

arXiv:1907.10491 [pdf, ps, other]

Alternative Intersection Designs with Connected and Automated Vehicle

Authors: Zijia Zhong, Earl E. Lee

Abstract: Alternative intersection designs (AIDs) can improve the performance of an intersection by not only reducing the number of signal phases but also changing the configuration of the conflicting points by re-routing traffic. However, AID studies have rarely been extended to Connected and Automated Vehicles (CAV), which are expected to revolutionize our transportation system. In this study, we investigate the potential benefits of CAV to two AIDs: the diverging diamond interchange (DDI) and the restricted crossing U-turn intersection. The potential enhancements of AID, CAV, and the combination of both are quantified via microscopic traffic simulation. We found that CAV is able to positively contribute to the performance of an intersection. However, converting an existing conventional diamond interchange (CDI) to a diverging one is a more effective way according to the simulation results. DDI improves the throughput of a CDI by 950 vehicles per hour, a near 20% improvement; whereas with full penetration of CAV, the throughput of a CDI is increased only by 300 vehicles per hour. A similar trend is observed in the average delay per vehicle as well. Furthermore, we assess the impact of driver confusion, a concern for deploying AIDs, on the traffic flow. According to the ANOVA test, the negative impacts of driver confusion are of statistical significance.

Submitted 21 July, 2019; originally announced July 2019.

Comments: 6 pages, 6 figures, 2019 IEEE 2nd Connected and Automated Vehicles Symposium. arXiv admin note: text overlap with arXiv:1811.03074

arXiv:1906.07330 [pdf, other]

Boosting CNN beyond Label in Inverse Problems

Authors: Eunju Cha, Jaeduck Jang, Junho Lee, Eunha Lee, Jong Chul Ye

Abstract: Convolutional neural networks (CNN) have been extensively used for inverse problems. However, their prediction error for unseen test data is difficult to estimate a priori since the neural networks are trained using only selected data and their architectures are largely considered a blackbox. This poses a fundamental challenge to neural networks for unsupervised learning or improvement beyond the label. In this paper, we show that the recent unsupervised learning methods such as Noise2Noise, Stein's unbiased risk estimator (SURE)-based denoiser, and Noise2Void are closely related to each other in their formulation of an unbiased estimator of the prediction error, but each of them is associated with its own limitations. Based on these observations, we provide a novel boosting estimator for the prediction error. In particular, by employing combinatorial convolutional frame representation of encoder-decoder CNN and synergistically combining it with the batch normalization, we provide a closed-form formulation for the unbiased estimator of the prediction error that can be minimized for neural network training beyond the label. Experimental results show that the resulting algorithm, which we call Noise2Boosting, provides consistent improvement in various inverse problems under both supervised and unsupervised learning settings.

Submitted 17 June, 2019; originally announced June 2019.

arXiv:1811.03074 [pdf, other]

Unconventional Arterial Intersection Designs under Connected and Automated Vehicle Environment: A Survey

Authors: Zijia Zhong, Mark M. Nejad, Earl E. Lee II

Abstract: Signalized intersections are major sources of traffic delay and collision within the modern transportation system. Conventional signal optimization has revealed its limitation in improving the mobility and safety of an intersection. Unconventional arterial intersection designs (UAIDs) are able to improve the performance of an intersection by reducing phases of a signal cycle. Furthermore, they can fundamentally alter the number and the nature of the conflicting points. However, the driver's confusion, as a result of the unconventional geometric designs, remains one of the major barriers for the widespread adoption of UAIDs. Connected and Automated Vehicle (CAV) technology has the potential to overcome this barrier by eliminating the driver's confusion of a UAID. Therefore, UAIDs can play a significant role in transportation networks in the near future. In this paper, we surveyed UAID studies and implementations. In addition, we present an overview of intersection control schemes with the emergence of CAV and highlight the opportunities that arise for UAIDs with the CAV technology. It is believed that the benefits gained from deploying UAIDs in conjunction with CAV are significant during the initial rollout of CAV under low market penetration.

Submitted 2 August, 2019; v1 submitted 7 November, 2018; originally announced November 2018.

arXiv:1511.02279 [pdf, other]

doi 10.1109/IoTDI.2015.33

Control Improvisation with Probabilistic Temporal Specifications

Authors: Ilge Akkaya, Daniel J. Fremont, Rafael Valle, Alexandre Donzé, Edward A. Lee, Sanjit A. Seshia

Abstract: We consider the problem of generating randomized control sequences for complex networked systems typically actuated by human agents. Our approach leverages a concept known as control improvisation, which is based on a combination of data-driven learning and controller synthesis from formal specifications. We learn from existing data a generative model (for instance, an explicit-duration hidden Markov model, or EDHMM) and then supervise this model in order to guarantee that the generated sequences satisfy some desirable specifications given in Probabilistic Computation Tree Logic (PCTL). We present an implementation of our approach and apply it to the problem of mimicking the use of lighting appliances in a residential unit, with potential applications to home security and resource management. We present experimental results showing that our approach produces realistic control sequences, similar to recorded data based on human actuation, while satisfying suitable formal requirements.

Submitted 29 February, 2016; v1 submitted 6 November, 2015; originally announced November 2015.

Comments: to appear in Proceedings of the 1st IEEE Conference on Internet-of-Things Design and Implementation (IoTDI'16)

arXiv:1307.3722 [pdf, other]

Numerical LTL Synthesis for Cyber-Physical Systems

Authors: Chih-Hong Cheng, Edward A. Lee

Abstract: Cyber-physical systems (CPS) are systems that interact with the physical world via sensors and actuators. In such a system, the reading of a sensor represents a measure of a physical quantity, and sensor values are often real numbers ranging over bounded intervals. The implementation of control laws is based on nonlinear numerical computations over the received sensor values. Synthesizing controllers fulfilling features within CPS brings a huge challenge to the research community in formal methods, as most of the works in automatic controller synthesis (LTL synthesis) are restricted to specifications having a few discrete inputs within the Boolean domain. In this report, we present a novel approach that addresses the above challenge to synthesize controllers for CPS. Our core methodology, called numerical LTL synthesis, extends LTL synthesis by using inputs or outputs in real numbers and by allowing predicates of polynomial constraints to be defined within an LTL formula as specification. The synthesis algorithm is based on an interplay between an LTL synthesis engine which handles the pseudo-Boolean structure, together with a nonlinear constraint validity checker which tests the (in)feasibility of a (counter-)strategy. The methodology is integrated within the CPS research framework Ptolemy II via the development of an LTL synthesis module G4LTL and a validity checker JBernstein. Although we only target the theory of nonlinear real arithmetic, the use of pseudo-Boolean synthesis framework also allows an easy extension to embed a richer set of theories, making the technique applicable to a much broader audience.

Submitted 14 July, 2013; originally announced July 2013.

Comments: 10 pages; work-in-progress report

Showing 1–29 of 29 results for author: Lee, E