Search | arXiv e-print repository

arXiv:2406.07103 [pdf, other]

MR-RawNet: Speaker verification system with multiple temporal resolutions for variable duration utterances using raw waveforms

Authors: Seung-bin Kim, Chan-yeong Lim, Jungwoo Heo, Ju-ho Kim, Hyun-seo Shin, Kyo-Won Koo, Ha-** Yu

Abstract: In speaker verification systems, the utilization of short utterances presents a persistent challenge, leading to performance degradation primarily due to insufficient phonetic information to characterize the speakers. To overcome this obstacle, we propose a novel structure, MR-RawNet, designed to enhance the robustness of speaker verification systems against variable duration utterances using raw… ▽ More In speaker verification systems, the utilization of short utterances presents a persistent challenge, leading to performance degradation primarily due to insufficient phonetic information to characterize the speakers. To overcome this obstacle, we propose a novel structure, MR-RawNet, designed to enhance the robustness of speaker verification systems against variable duration utterances using raw waveforms. The MR-RawNet extracts time-frequency representations from raw waveforms via a multi-resolution feature extractor that optimally adjusts both temporal and spectral resolutions simultaneously. Furthermore, we apply a multi-resolution attention block that focuses on diverse and extensive temporal contexts, ensuring robustness against changes in utterance length. The experimental results, conducted on VoxCeleb1 dataset, demonstrate that the MR-RawNet exhibits superior performance in handling utterances of variable duration compared to other raw waveform-based systems. △ Less

Submitted 11 June, 2024; originally announced June 2024.

Comments: 5 pages, accepted by Interspeech 2024

arXiv:2404.07021 [pdf, other]

A 4x32Gb/s 1.8pJ/bit Collaborative Baud-Rate CDR with Background Eye-Climbing Algorithm and Low-Power Global Clock Distribution

Authors: Jihee Kim, Jia Park, Jiwon Shin, Hanseok Kim, Kahyun Kim, Haengbeom Shin, Ha-Jung Park, Woo-Seok Choi

Abstract: This paper presents design techniques for an energy-efficient multi-lane receiver (RX) with baud-rate clock and data recovery (CDR), which is essential for high-throughput low-latency communication in high-performance computing systems. The proposed low-power global clock distribution not only significantly reduces power consumption across multi-lane RXs but is capable of compensating for the freq… ▽ More This paper presents design techniques for an energy-efficient multi-lane receiver (RX) with baud-rate clock and data recovery (CDR), which is essential for high-throughput low-latency communication in high-performance computing systems. The proposed low-power global clock distribution not only significantly reduces power consumption across multi-lane RXs but is capable of compensating for the frequency offset without any phase interpolators. To this end, a fractional divider controlled by CDR is placed close to the global phase locked loop. Moreover, in order to address the sub-optimal lock point of conventional baud-rate phase detectors, the proposed CDR employs a background eye-climbing algorithm, which optimizes the sampling phase and maximizes the vertical eye margin (VEM). Fabricated in a 28nm CMOS process, the proposed 4x32Gb/s RX shows a low integrated fractional spur of -40.4dBc at a 2500ppm frequency offset. Furthermore, it improves bit-error-rate performance by increasing the VEM by 17%. The entire RX achieves the energy efficiency of 1.8pJ/bit with the aggregate data rate of 128Gb/s. △ Less

Submitted 22 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.05119 [pdf, other]

A 0.65-pJ/bit 3.6-TB/s/mm I/O Interface with XTalk Minimizing Affine Signaling for Next-Generation HBM with High Interconnect Density

Authors: Hyunjun Park, Jiwon Shin, Hanseok Kim, Jihee Kim, Haengbeom Shin, Taehoon Kim, Jung-Hun Park, Woo-Seok Choi

Abstract: This paper presents an I/O interface with Xtalk Minimizing Affine Signaling (XMAS), which is designed to support high-speed data transmission in die-to-die communication over silicon interposers or similar high-density interconnects susceptible to crosstalk. The operating principles of XMAS are elucidated through rigorous analyses, and its advantages over existing signaling are validated through n… ▽ More This paper presents an I/O interface with Xtalk Minimizing Affine Signaling (XMAS), which is designed to support high-speed data transmission in die-to-die communication over silicon interposers or similar high-density interconnects susceptible to crosstalk. The operating principles of XMAS are elucidated through rigorous analyses, and its advantages over existing signaling are validated through numerical experiments. XMAS not only demonstrates exceptional crosstalk removing capabilities but also exhibits robustness against noise, especially simultaneous switching noise. Fabricated in a 28-nm CMOS process, the prototype XMAS transceiver achieves an edge density of 3.6TB/s/mm and an energy efficiency of 0.65pJ/b. Compared to the single-ended signaling, the crosstalk-induced peak-to-peak jitter of the received eye with XMAS is reduced by 75% at 10GS/s/pin data rate, and the horizontal eye opening extends to 0.2UI at a bit error rate < 10$^{-12}$. △ Less

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2403.09179 [pdf, other]

Synchronisation-Oriented Design Approach for Adaptive Control

Authors: Namhoon Cho, Seokwon Lee, Hyo-Sang Shin

Abstract: This study presents a synchronisation-oriented perspective towards adaptive control which views model-referenced adaptation as synchronisation between actual and virtual dynamic systems. In the context of adaptation, model reference adaptive control methods make the state response of the actual plant follow a reference model. In the context of synchronisation, consensus methods involving diffusive… ▽ More This study presents a synchronisation-oriented perspective towards adaptive control which views model-referenced adaptation as synchronisation between actual and virtual dynamic systems. In the context of adaptation, model reference adaptive control methods make the state response of the actual plant follow a reference model. In the context of synchronisation, consensus methods involving diffusive coupling induce a collective behaviour across multiple agents. We draw from the understanding about the two time-scale nature of synchronisation motivated by the study of blended dynamics. The synchronisation-oriented approach consists in the design of a coupling input to achieve desired closed-loop error dynamics followed by the input allocation process to shape the collective behaviour. We suggest that synchronisation can be a reasonable design principle allowing a more holistic and systematic approach to the design of adaptive control systems for improved transient characteristics. Most notably, the proposed approach enables not only constructive derivation but also substantial generalisation of the previously developed closed-loop reference model adaptive control method. Practical significance of the proposed generalisation lies at the capability to improve the transient response characteristics and mitigate the unwanted peaking phenomenon at the same time. △ Less

Submitted 14 March, 2024; originally announced March 2024.

Comments: 34 pages, 8 figures, extended version for a manuscript submitted to Automatica

arXiv:2402.09440 [pdf, ps, other]

doi 10.1109/TCOMM.2023.3308150

Extreme Learning Machine-based Channel Estimation in IRS-Assisted Multi-User ISAC System

Authors: Yu Liu, Ibrahim Al-Nahhal, Octavia A. Dobre, Fanggang Wang, Hyundong Shin

Abstract: Multi-user integrated sensing and communication (ISAC) assisted by intelligent reflecting surface (IRS) has been recently investigated to provide a high spectral and energy efficiency transmission. This paper proposes a practical channel estimation approach for the first time to an IRS-assisted multiuser ISAC system. The estimation problem in such a system is challenging since the sensing and comm… ▽ More Multi-user integrated sensing and communication (ISAC) assisted by intelligent reflecting surface (IRS) has been recently investigated to provide a high spectral and energy efficiency transmission. This paper proposes a practical channel estimation approach for the first time to an IRS-assisted multiuser ISAC system. The estimation problem in such a system is challenging since the sensing and communication (SAC) signals interfere with each other, and the passive IRS lacks signal processing ability. A two-stage approach is proposed to transfer the overall estimation problem into sub-ones, successively including the direct and reflected channels estimation. Based on this scheme, the ISAC base station (BS) estimates all the SAC channels associated with the target and uplink users, while each downlink user estimates the downlink communication channels individually. Considering a low-cost demand of the ISAC BS and downlink users, the proposed two-stage approach is realized by an efficient neural network (NN) framework that contains two different extreme learning machine (ELM) structures to estimate the above SAC channels. Moreover, two types of input-output pairs to train the ELMs are carefully devised, which impact the estimation accuracy and computational complexity under different system parameters. Simulation results reveal a substantial performance improvement achieved by the proposed ELM-based approach over the least-squares and NN-based benchmarks, with reduced training complexity and faster training speed. △ Less

Submitted 7 April, 2024; v1 submitted 29 January, 2024; originally announced February 2024.

Journal ref: Published in IEEE Transactions on Communications, vol. 71, no. 12, pp. 6993-7007, Dec. 2023

arXiv:2312.10672 [pdf, other]

Automatic Optimisation of Normalised Neural Networks

Authors: Namhoon Cho, Hyo-Sang Shin

Abstract: We propose automatic optimisation methods considering the geometry of matrix manifold for the normalised parameters of neural networks. Layerwise weight normalisation with respect to Frobenius norm is utilised to bound the Lipschitz constant and to enhance gradient reliability so that the trained networks are suitable for control applications. Our approach first initialises the network and normali… ▽ More We propose automatic optimisation methods considering the geometry of matrix manifold for the normalised parameters of neural networks. Layerwise weight normalisation with respect to Frobenius norm is utilised to bound the Lipschitz constant and to enhance gradient reliability so that the trained networks are suitable for control applications. Our approach first initialises the network and normalises the data with respect to the $\ell^{2}$-$\ell^{2}$ gain of the initialised network. Then, the proposed algorithms take the update structure based on the exponential map on high-dimensional spheres. Given an update direction such as that of the negative Riemannian gradient, we propose two different ways to determine the stepsize for descent. The first algorithm utilises automatic differentiation of the objective function along the update curve defined on the combined manifold of spheres. The directional second-order derivative information can be utilised without requiring explicit construction of the Hessian. The second algorithm utilises the majorisation-minimisation framework via architecture-aware majorisation for neural networks. With these new developments, the proposed methods avoid manual tuning and scheduling of the learning rate, thus providing an automated pipeline for optimizing normalised neural networks. △ Less

Submitted 17 December, 2023; originally announced December 2023.

Comments: 13 pages, 2 figures, submitted to 2024 L4DC

arXiv:2312.09456 [pdf, other]

Pioneering EEG Motor Imagery Classification Through Counterfactual Analysis

Authors: Kang Yin, Hye-Bin Shin, Hee-Dong Kim, Seong-Whan Lee

Abstract: The application of counterfactual explanation (CE) techniques in the realm of electroencephalography (EEG) classification has been relatively infrequent in contemporary research. In this study, we attempt to introduce and explore a novel non-generative approach to CE, specifically tailored for the analysis of EEG signals. This innovative approach assesses the model's decision-making process by str… ▽ More The application of counterfactual explanation (CE) techniques in the realm of electroencephalography (EEG) classification has been relatively infrequent in contemporary research. In this study, we attempt to introduce and explore a novel non-generative approach to CE, specifically tailored for the analysis of EEG signals. This innovative approach assesses the model's decision-making process by strategically swap** patches derived from time-frequency analyses. By meticulously examining the variations and nuances introduced in the classification outcomes through this method, we aim to derive insights that can enhance interpretability. The empirical results obtained from our experimental investigations serve not only to validate the efficacy of our proposed approach but also to reinforce human confidence in the model's predictive capabilities. Consequently, these findings underscore the significance and potential value of conducting further, more extensive research in this promising direction. △ Less

Submitted 10 November, 2023; originally announced December 2023.

arXiv:2312.05828 [pdf, other]

Sparse Multitask Learning for Efficient Neural Representation of Motor Imagery and Execution

Authors: Hye-Bin Shin, Kang Yin, Seong-Whan Lee

Abstract: In the quest for efficient neural network models for neural data interpretation and user intent classification in brain-computer interfaces (BCIs), learning meaningful sparse representations of the underlying neural subspaces is crucial. The present study introduces a sparse multitask learning framework for motor imagery (MI) and motor execution (ME) tasks, inspired by the natural partitioning of… ▽ More In the quest for efficient neural network models for neural data interpretation and user intent classification in brain-computer interfaces (BCIs), learning meaningful sparse representations of the underlying neural subspaces is crucial. The present study introduces a sparse multitask learning framework for motor imagery (MI) and motor execution (ME) tasks, inspired by the natural partitioning of associated neural subspaces observed in the human brain. Given a dual-task CNN model for MI-ME classification, we apply a saliency-based sparsification approach to prune superfluous connections and reinforce those that show high importance in both tasks. Through our approach, we seek to elucidate the distinct and common neural ensembles associated with each task, employing principled sparsification techniques to eliminate redundant connections and boost the fidelity of neural signal decoding. Our results indicate that this tailored sparsity can mitigate the overfitting problem and improve the test performance with small amount of data, suggesting a viable path forward for computationally efficient and robust BCI systems. △ Less

Submitted 10 December, 2023; originally announced December 2023.

arXiv:2312.01689 [pdf, other]

Fast and accurate sparse-view CBCT reconstruction using meta-learned neural attenuation field and hash-encoding regularization

Authors: Heejun Shin, Taehee Kim, Jongho Lee, Se Young Chun, Seungryung Cho, Dongmyung Shin

Abstract: Cone beam computed tomography (CBCT) is an emerging medical imaging technique to visualize the internal anatomical structures of patients. During a CBCT scan, several projection images of different angles or views are collectively utilized to reconstruct a tomographic image. However, reducing the number of projections in a CBCT scan while preserving the quality of a reconstructed image is challeng… ▽ More Cone beam computed tomography (CBCT) is an emerging medical imaging technique to visualize the internal anatomical structures of patients. During a CBCT scan, several projection images of different angles or views are collectively utilized to reconstruct a tomographic image. However, reducing the number of projections in a CBCT scan while preserving the quality of a reconstructed image is challenging due to the nature of an ill-posed inverse problem. Recently, a neural attenuation field (NAF) method was proposed by adopting a neural radiance field algorithm as a new way for CBCT reconstruction, demonstrating fast and promising results using only 50 views. However, decreasing the number of projections is still preferable to reduce potential radiation exposure, and a faster reconstruction time is required considering a typical scan time. In this work, we propose a fast and accurate sparse-view CBCT reconstruction (FACT) method to provide better reconstruction quality and faster optimization speed in the minimal number of view acquisitions ($<$ 50 views). In the FACT method, we meta-trained a neural network and a hash-encoder using a few scans (= 15), and a new regularization technique is utilized to reconstruct the details of an anatomical structure. In conclusion, we have shown that the FACT method produced better, and faster reconstruction results over the other conventional algorithms based on CBCT scans of different body parts (chest, head, and abdomen) and CT vendors (Siemens, Phillips, and GE). △ Less

Submitted 16 January, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

arXiv:2311.04468 [pdf]

A human brain atlas of chi-separation for normative iron and myelin distributions

Authors: Kyeongseon Min, Beomseok Sohn, Woo Jung Kim, Chae Jung Park, Soohwa Song, Dong Hoon Shin, Kyung Won Chang, Na-Young Shin, Minjun Kim, Hyeong-Geol Shin, Phil Hyu Lee, Jongho Lee

Abstract: Iron and myelin are primary susceptibility sources in the human brain. These substances are essential for healthy brain, and their abnormalities are often related to various neurological disorders. Recently, an advanced susceptibility map** technique, which is referred to as chi-separation, has been proposed, successfully disentangling paramagnetic iron from diamagnetic myelin. This method opene… ▽ More Iron and myelin are primary susceptibility sources in the human brain. These substances are essential for healthy brain, and their abnormalities are often related to various neurological disorders. Recently, an advanced susceptibility map** technique, which is referred to as chi-separation, has been proposed, successfully disentangling paramagnetic iron from diamagnetic myelin. This method opened a potential for generating high resolution iron and myelin maps in the brain. Utilizing this technique, this study constructs a normative chi-separation atlas from 106 healthy human brains. The resulting atlas provides detailed anatomical structures associated with the distributions of iron and myelin, clearly delineating subcortical nuclei, thalamic nuclei, and white matter fiber bundles. Additionally, susceptibility values in a number of regions of interest are reported along with age-dependent changes. This atlas may have direct applications such as localization of subcortical structures for deep brain stimulation or high-intensity focused ultrasound and also serve as a valuable resource for future research. △ Less

Submitted 2 April, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

Comments: 19 pages, 9 figures

arXiv:2309.08320 [pdf, other]

Diff-SV: A Unified Hierarchical Framework for Noise-Robust Speaker Verification Using Score-Based Diffusion Probabilistic Models

Authors: Ju-ho Kim, Jungwoo Heo, Hyun-seo Shin, Chan-yeong Lim, Ha-** Yu

Abstract: Background noise considerably reduces the accuracy and reliability of speaker verification (SV) systems. These challenges can be addressed using a speech enhancement system as a front-end module. Recently, diffusion probabilistic models (DPMs) have exhibited remarkable noise-compensation capabilities in the speech enhancement domain. Building on this success, we propose Diff-SV, a noise-robust SV… ▽ More Background noise considerably reduces the accuracy and reliability of speaker verification (SV) systems. These challenges can be addressed using a speech enhancement system as a front-end module. Recently, diffusion probabilistic models (DPMs) have exhibited remarkable noise-compensation capabilities in the speech enhancement domain. Building on this success, we propose Diff-SV, a noise-robust SV framework that leverages DPM. Diff-SV unifies a DPM-based speech enhancement system with a speaker embedding extractor, and yields a discriminative and noise-tolerable speaker representation through a hierarchical structure. The proposed model was evaluated under both in-domain and out-of-domain noisy conditions using the VoxCeleb1 test set, an external noise source, and the VOiCES corpus. The obtained experimental results demonstrate that Diff-SV achieves state-of-the-art performance, outperforming recently proposed noise-robust SV systems. △ Less

Submitted 13 December, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

Comments: 5 pages, 2 figures, accepted for ICASSP 2024

arXiv:2309.08208 [pdf, other]

HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods

Authors: Hyun-seo Shin, Jungwoo Heo, Ju-ho Kim, Chan-yeong Lim, Wonbin Kim, Ha-** Yu

Abstract: Audio deepfake detection (ADD) is the task of detecting spoofing attacks generated by text-to-speech or voice conversion systems. Spoofing evidence, which helps to distinguish between spoofed and bona-fide utterances, might exist either locally or globally in the input features. To capture these, the Conformer, which consists of Transformers and CNN, possesses a suitable structure. However, since… ▽ More Audio deepfake detection (ADD) is the task of detecting spoofing attacks generated by text-to-speech or voice conversion systems. Spoofing evidence, which helps to distinguish between spoofed and bona-fide utterances, might exist either locally or globally in the input features. To capture these, the Conformer, which consists of Transformers and CNN, possesses a suitable structure. However, since the Conformer was designed for sequence-to-sequence tasks, its direct application to ADD tasks may be sub-optimal. To tackle this limitation, we propose HM-Conformer by adopting two components: (1) Hierarchical pooling method progressively reducing the sequence length to eliminate duplicated information (2) Multi-level classification token aggregation method utilizing classification tokens to gather information from different blocks. Owing to these components, HM-Conformer can efficiently detect spoofing evidence by processing various sequence lengths and aggregating them. In experimental results on the ASVspoof 2021 Deepfake dataset, HM-Conformer achieved a 15.71% EER, showing competitive performance compared to recent systems. △ Less

Submitted 15 September, 2023; originally announced September 2023.

Comments: Submitted to 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024)

arXiv:2308.09803 [pdf, other]

Liquid Crystal-Based RIS for VLC Transmitters: Performance Analysis, Challenges, and Opportunities

Authors: Sylvester Aboagye, Telex M. N. Ngatched, Alain R. Ndjiongue, Octavia A. Dobre, Hyundong Shin

Abstract: This article presents a novel approach of using reconfigurable intelligent surfaces (RISs) in the transmitter of indoor visible light communication (VLC) systems to enhance data rate uniformity and maintain adequate illumination. In this approach, a liquid crystal (LC)-based RIS is placed in front of the LED arrays of the transmitter to form an LC-based RIS-enabled VLC transmitter. This RIS-enable… ▽ More This article presents a novel approach of using reconfigurable intelligent surfaces (RISs) in the transmitter of indoor visible light communication (VLC) systems to enhance data rate uniformity and maintain adequate illumination. In this approach, a liquid crystal (LC)-based RIS is placed in front of the LED arrays of the transmitter to form an LC-based RIS-enabled VLC transmitter. This RIS-enabled transmitter is able to perform new functions such as transmit light steering and amplification and demonstrates very high data rate and illumination performance when compared with traditional VLC transmitters with circular and distributed LED arrays and the more recent angle diversity transmitter. Simulation results reveal the strong potential of LC-based RIS-aided transmitters in satisfying the joint illumination and communication needs of indoor VLC systems and positions VLC as a critical essential block for next generation communication networks. Several challenging and exciting issues related to the realization of such transmitters are discussed. △ Less

Submitted 18 August, 2023; originally announced August 2023.

Comments: This paper has been accepted for publication in IEEE

arXiv:2307.10628 [pdf, other]

PAS: Partial Additive Speech Data Augmentation Method for Noise Robust Speaker Verification

Authors: Wonbin Kim, Hyun-seo Shin, Ju-ho Kim, Jungwoo Heo, Chan-yeong Lim, Ha-** Yu

Abstract: Background noise reduces speech intelligibility and quality, making speaker verification (SV) in noisy environments a challenging task. To improve the noise robustness of SV systems, additive noise data augmentation method has been commonly used. In this paper, we propose a new additive noise method, partial additive speech (PAS), which aims to train SV systems to be less affected by noisy environ… ▽ More Background noise reduces speech intelligibility and quality, making speaker verification (SV) in noisy environments a challenging task. To improve the noise robustness of SV systems, additive noise data augmentation method has been commonly used. In this paper, we propose a new additive noise method, partial additive speech (PAS), which aims to train SV systems to be less affected by noisy environments. The experimental results demonstrate that PAS outperforms traditional additive noise in terms of equal error rates (EER), with relative improvements of 4.64% and 5.01% observed in SE-ResNet34 and ECAPA-TDNN. We also show the effectiveness of proposed method by analyzing attention modules and visualizing speaker embeddings. △ Less

Submitted 20 July, 2023; originally announced July 2023.

Comments: 5 pages, 2 figures, 1 table, accepted to CKAIA2023 as a conference paper

arXiv:2306.11474 [pdf, other]

A Passivity-Based Method for Accelerated Convex Optimisation

Authors: Namhoon Cho, Hyo-Sang Shin

Abstract: This study presents a constructive methodology for designing accelerated convex optimisation algorithms in continuous-time domain. The two key enablers are the classical concept of passivity in control theory and the time-dependent change of variables that maps the output of the internal dynamic system to the optimisation variables. The Lyapunov function associated with the optimisation dynamics i… ▽ More This study presents a constructive methodology for designing accelerated convex optimisation algorithms in continuous-time domain. The two key enablers are the classical concept of passivity in control theory and the time-dependent change of variables that maps the output of the internal dynamic system to the optimisation variables. The Lyapunov function associated with the optimisation dynamics is obtained as a natural consequence of specifying the internal dynamics that drives the state evolution as a passive linear time-invariant system. The passivity-based methodology provides a general framework that has the flexibility to generate convex optimisation algorithms with the guarantee of different convergence rate bounds on the objective function value. The same principle applies to the design of online parameter update algorithms for adaptive control by re-defining the output of internal dynamics to allow for the feedback interconnection with tracking error dynamics. △ Less

Submitted 20 June, 2023; originally announced June 2023.

Comments: 10 pages, 1 figure

arXiv:2305.17394 [pdf, other]

One-Step Knowledge Distillation and Fine-Tuning in Using Large Pre-Trained Self-Supervised Learning Models for Speaker Verification

Authors: Jungwoo Heo, Chan-yeong Lim, Ju-ho Kim, Hyun-seo Shin, Ha-** Yu

Abstract: The application of speech self-supervised learning (SSL) models has achieved remarkable performance in speaker verification (SV). However, there is a computational cost hurdle in employing them, which makes development and deployment difficult. Several studies have simply compressed SSL models through knowledge distillation (KD) without considering the target task. Consequently, these methods coul… ▽ More The application of speech self-supervised learning (SSL) models has achieved remarkable performance in speaker verification (SV). However, there is a computational cost hurdle in employing them, which makes development and deployment difficult. Several studies have simply compressed SSL models through knowledge distillation (KD) without considering the target task. Consequently, these methods could not extract SV-tailored features. This paper suggests One-Step Knowledge Distillation and Fine-Tuning (OS-KDFT), which incorporates KD and fine-tuning (FT). We optimize a student model for SV during KD training to avert the distillation of inappropriate information for the SV. OS-KDFT could downsize Wav2Vec 2.0 based ECAPA-TDNN size by approximately 76.2%, and reduce the SSL model's inference time by 79% while presenting an EER of 0.98%. The proposed OS-KDFT is validated across VoxCeleb1 and VoxCeleb2 datasets and W2V2 and HuBERT SSL models. Experiments are available on our GitHub. △ Less

Submitted 7 June, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

Comments: ISCA INTERSPEECH 2023

arXiv:2303.06593 [pdf, other]

Domain-Knowledge-Aided Airborne Ground Moving Targets Tracking

Authors: Jianduo Chai, Shaoming He, Hyo-Sang Shin

Abstract: This paper investigates the problem of traffic surveillance using an unmanned aerial vehicle (UAV) and proposes a domain-knowledge-aided airborne ground moving targets tracking algorithm. To improve the accuracy of multiple targets tracking, the proposed algorithm incorporates domain knowledge into the joint probabilistic data association (JPDA) filter as state constraints. The domain knowledge co… ▽ More This paper investigates the problem of traffic surveillance using an unmanned aerial vehicle (UAV) and proposes a domain-knowledge-aided airborne ground moving targets tracking algorithm. To improve the accuracy of multiple targets tracking, the proposed algorithm incorporates domain knowledge into the joint probabilistic data association (JPDA) filter as state constraints. The domain knowledge considered in this paper includes both road information extracted from a given map and local traffic regulations. Conventional track update method is modified to enhance the capability of recognition of temporarily track loss. A variable structure multiple model (VS-MM) method is developed to assign the road segment to a given target. The effectiveness of proposed algorithm is demonstrated through extensive numerical simulations. △ Less

Submitted 12 March, 2023; originally announced March 2023.

arXiv:2212.13887 [pdf, other]

Calibration-Free Driver Drowsiness Classification based on Manifold-Level Augmentation

Authors: Dong-Young Kim, Dong-Kyun Han, Hye-Bin Shin

Abstract: Drowsiness reduces concentration and increases response time, which causes fatal road accidents. Monitoring drivers' drowsiness levels by electroencephalogram (EEG) and taking action may prevent road accidents. EEG signals effectively monitor the driver's mental state as they can monitor brain dynamics. However, calibration is required in advance because EEG signals vary between and within subject… ▽ More Drowsiness reduces concentration and increases response time, which causes fatal road accidents. Monitoring drivers' drowsiness levels by electroencephalogram (EEG) and taking action may prevent road accidents. EEG signals effectively monitor the driver's mental state as they can monitor brain dynamics. However, calibration is required in advance because EEG signals vary between and within subjects. Because of the inconvenience, calibration has reduced the accessibility of the brain-computer interface (BCI). Develo** a generalized classification model is similar to domain generalization, which overcomes the domain shift problem. Especially data augmentation is frequently used. This paper proposes a calibration-free framework for driver drowsiness state classification using manifold-level augmentation. This framework increases the diversity of source domains by utilizing features. We experimented with various augmentation methods to improve the generalization performance. Based on the results of the experiments, we found that deeper models with smaller kernel sizes improved generalizability. In addition, applying an augmentation at the manifold-level resulted in an outstanding improvement. The framework demonstrated the capability for calibration-free BCI. △ Less

Submitted 14 December, 2022; originally announced December 2022.

Comments: Submitted to 2023 11th IEEE International Winter Conference on Brain-Computer Interface

arXiv:2211.02227 [pdf, other]

Integrated Parameter-Efficient Tuning for General-Purpose Audio Models

Authors: Ju-ho Kim, Jungwoo Heo, Hyun-seo Shin, Chan-yeong Lim, Ha-** Yu

Abstract: The advent of hyper-scale and general-purpose pre-trained models is shifting the paradigm of building task-specific models for target tasks. In the field of audio research, task-agnostic pre-trained models with high transferability and adaptability have achieved state-of-the-art performances through fine-tuning for downstream tasks. Nevertheless, re-training all the parameters of these massive mod… ▽ More The advent of hyper-scale and general-purpose pre-trained models is shifting the paradigm of building task-specific models for target tasks. In the field of audio research, task-agnostic pre-trained models with high transferability and adaptability have achieved state-of-the-art performances through fine-tuning for downstream tasks. Nevertheless, re-training all the parameters of these massive models entails an enormous amount of time and cost, along with a huge carbon footprint. To overcome these limitations, the present study explores and applies efficient transfer learning methods in the audio domain. We also propose an integrated parameter-efficient tuning (IPET) framework by aggregating the embedding prompt (a prompt-based learning approach), and the adapter (an effective transfer learning method). We demonstrate the efficacy of the proposed framework using two backbone pre-trained audio models with different characteristics: the audio spectrogram transformer and wav2vec 2.0. The proposed IPET framework exhibits remarkable performance compared to fine-tuning method with fewer trainable parameters in four downstream tasks: sound event classification, music genre classification, keyword spotting, and speaker verification. Furthermore, the authors identify and analyze the shortcomings of the IPET framework, providing lessons and research directions for parameter efficient tuning in the audio domain. △ Less

Submitted 1 March, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

Comments: 5 pages, 3 figures

arXiv:2211.01599 [pdf, other]

Convolution channel separation and frequency sub-bands aggregation for music genre classification

Authors: Jungwoo Heo, Hyun-seo Shin, Ju-ho Kim, Chan-yeong Lim, Ha-** Yu

Abstract: In music, short-term features such as pitch and tempo constitute long-term semantic features such as melody and narrative. A music genre classification (MGC) system should be able to analyze these features. In this research, we propose a novel framework that can extract and aggregate both short- and long-term features hierarchically. Our framework is based on ECAPA-TDNN, where all the layers that… ▽ More In music, short-term features such as pitch and tempo constitute long-term semantic features such as melody and narrative. A music genre classification (MGC) system should be able to analyze these features. In this research, we propose a novel framework that can extract and aggregate both short- and long-term features hierarchically. Our framework is based on ECAPA-TDNN, where all the layers that extract short-term features are affected by the layers that extract long-term features because of the back-propagation training. To prevent the distortion of short-term features, we devised the convolution channel separation technique that separates short-term features from long-term feature extraction paths. To extract more diverse features from our framework, we incorporated the frequency sub-bands aggregation method, which divides the input spectrogram along frequency bandwidths and processes each segment. We evaluated our framework using the Melon Playlist dataset which is a large-scale dataset containing 600 times more data than GTZAN which is a widely used dataset in MGC studies. As the result, our framework achieved 70.4% accuracy, which was improved by 16.9% compared to a conventional framework. △ Less

Submitted 3 November, 2022; originally announced November 2022.

arXiv:2209.03698 [pdf, other]

doi 10.2514/1.G007234

Incremental Correction in Dynamic Systems Modelled with Neural Networks for Constraint Satisfaction

Authors: Namhoon Cho, Hyo-Sang Shin, Antonios Tsourdos, Davide Amato

Abstract: This study presents incremental correction methods for refining neural network parameters or control functions entering into a continuous-time dynamic system to achieve improved solution accuracy in satisfying the interim point constraints placed on the performance output variables. The proposed approach is to linearise the dynamics around the baseline values of its arguments, and then to solve fo… ▽ More This study presents incremental correction methods for refining neural network parameters or control functions entering into a continuous-time dynamic system to achieve improved solution accuracy in satisfying the interim point constraints placed on the performance output variables. The proposed approach is to linearise the dynamics around the baseline values of its arguments, and then to solve for the corrective input required to transfer the perturbed trajectory to precisely known or desired values at specific time points, i.e., the interim points. Depending on the type of decision variables to adjust, parameter correction and control function correction methods are developed. These incremental correction methods can be utilised as a means to compensate for the prediction errors of pre-trained neural networks in real-time applications where high accuracy of the prediction of dynamical systems at prescribed time points is imperative. In this regard, the online update approach can be useful for enhancing overall targeting accuracy of finite-horizon control subject to point constraints using a neural policy. Numerical example demonstrates the effectiveness of the proposed approach in an application to a powered descent problem at Mars. △ Less

Submitted 8 September, 2022; originally announced September 2022.

Comments: 32 pages, submitted to AIAA Journal of Guidance, Control, and Dynamics

arXiv:2208.07552 [pdf]

Coil2Coil: Self-supervised MR image denoising using phased-array coil images

Authors: Juhyung Park, Dongwon Park, Hyeong-Geol Shin, Eun-Jung Choi, Hongjun An, Minjun Kim, Dongmyung Shin, Se Young Chun, Jongho Lee

Abstract: Denoising of magnetic resonance images is beneficial in improving the quality of low signal-to-noise ratio images. Recently, denoising using deep neural networks has demonstrated promising results. Most of these networks, however, utilize supervised learning, which requires large training images of noise-corrupted and clean image pairs. Obtaining training images, particularly clean images, is expe… ▽ More Denoising of magnetic resonance images is beneficial in improving the quality of low signal-to-noise ratio images. Recently, denoising using deep neural networks has demonstrated promising results. Most of these networks, however, utilize supervised learning, which requires large training images of noise-corrupted and clean image pairs. Obtaining training images, particularly clean images, is expensive and time-consuming. Hence, methods such as Noise2Noise (N2N) that require only pairs of noise-corrupted images have been developed to reduce the burden of obtaining training datasets. In this study, we propose a new self-supervised denoising method, Coil2Coil (C2C), that does not require the acquisition of clean images or paired noise-corrupted images for training. Instead, the method utilizes multichannel data from phased-array coils to generate training images. First, it divides and combines multichannel coil images into two images, one for input and the other for label. Then, they are processed to impose noise independence and sensitivity normalization such that they can be used for the training images of N2N. For inference, the method inputs a coil-combined image (e.g., DICOM image), enabling a wide application of the method. When evaluated using synthetic noise-added images, C2C shows the best performance against several self-supervised methods, reporting comparable outcomes to supervised methods. When testing the DICOM images, C2C successfully denoised real noise without showing structure-dependent residuals in the error maps. Because of the significant advantage of not requiring additional scans for clean or paired images, the method can be easily utilized for various clinical applications. △ Less

Submitted 16 August, 2022; originally announced August 2022.

Comments: 9 pages, 5figures

arXiv:2206.15400 [pdf, other]

Learning Audio-Text Agreement for Open-vocabulary Keyword Spotting

Authors: Hyeon-Kyeong Shin, Hyewon Han, Doyeon Kim, Soo-Whan Chung, Hong-Goo Kang

Abstract: In this paper, we propose a novel end-to-end user-defined keyword spotting method that utilizes linguistically corresponding patterns between speech and text sequences. Unlike previous approaches requiring speech keyword enrollment, our method compares input queries with an enrolled text keyword sequence. To place the audio and text representations within a common latent space, we adopt an attenti… ▽ More In this paper, we propose a novel end-to-end user-defined keyword spotting method that utilizes linguistically corresponding patterns between speech and text sequences. Unlike previous approaches requiring speech keyword enrollment, our method compares input queries with an enrolled text keyword sequence. To place the audio and text representations within a common latent space, we adopt an attention-based cross-modal matching approach that is trained in an end-to-end manner with monotonic matching loss and keyword classification loss. We also utilize a de-noising loss for the acoustic embedding network to improve robustness in noisy environments. Additionally, we introduce the LibriPhrase dataset, a new short-phrase dataset based on LibriSpeech for efficiently training keyword spotting models. Our proposed method achieves competitive results on various evaluation sets compared to other single-modal and cross-modal baselines. △ Less

Submitted 1 July, 2022; v1 submitted 30 June, 2022; originally announced June 2022.

Comments: Accepted to Interspeech 2022

arXiv:2206.13807 [pdf, other]

doi 10.21437/Interspeech.2022-602

Two Methods for Spoofing-Aware Speaker Verification: Multi-Layer Perceptron Score Fusion Model and Integrated Embedding Projector

Authors: Jungwoo Heo, Ju-ho Kim, Hyun-seo Shin

Abstract: The use of deep neural networks (DNN) has dramatically elevated the performance of automatic speaker verification (ASV) over the last decade. However, ASV systems can be easily neutralized by spoofing attacks. Therefore, the Spoofing-Aware Speaker Verification (SASV) challenge is designed and held to promote development of systems that can perform ASV considering spoofing attacks by integrating AS… ▽ More The use of deep neural networks (DNN) has dramatically elevated the performance of automatic speaker verification (ASV) over the last decade. However, ASV systems can be easily neutralized by spoofing attacks. Therefore, the Spoofing-Aware Speaker Verification (SASV) challenge is designed and held to promote development of systems that can perform ASV considering spoofing attacks by integrating ASV and spoofing countermeasure (CM) systems. In this paper, we propose two back-end systems: multi-layer perceptron score fusion model (MSFM) and integrated embedding projector (IEP). The MSFM, score fusion back-end system, derived SASV score utilizing ASV and CM scores and embeddings. On the other hand,IEP combines ASV and CM embeddings into SASV embedding and calculates final SASV score based on the cosine similarity. We effectively integrated ASV and CM systems through proposed MSFM and IEP and achieved the SASV equal error rates 0.56%, 1.32% on the official evaluation trials of the SASV 2022 challenge. △ Less

Submitted 28 June, 2022; originally announced June 2022.

Comments: 5 pages, 4 figures, 5 tables, accepted to 2022 Interspeech as a conference paper

Journal ref: Proc. Interspeech 2022

arXiv:2203.16557 [pdf, other]

COSMOS: Cross-Modality Unsupervised Domain Adaptation for 3D Medical Image Segmentation based on Target-aware Domain Translation and Iterative Self-Training

Authors: Hyungseob Shin, Hyeongyu Kim, Sewon Kim, Yohan Jun, Taejoon Eo, Dosik Hwang

Abstract: Recent advances in deep learning-based medical image segmentation studies achieve nearly human-level performance when in fully supervised condition. However, acquiring pixel-level expert annotations is extremely expensive and laborious in medical imaging fields. Unsupervised domain adaptation can alleviate this problem, which makes it possible to use annotated data in one imaging modality to train… ▽ More Recent advances in deep learning-based medical image segmentation studies achieve nearly human-level performance when in fully supervised condition. However, acquiring pixel-level expert annotations is extremely expensive and laborious in medical imaging fields. Unsupervised domain adaptation can alleviate this problem, which makes it possible to use annotated data in one imaging modality to train a network that can successfully perform segmentation on target imaging modality with no labels. In this work, we propose a self-training based unsupervised domain adaptation framework for 3D medical image segmentation named COSMOS and validate it with automatic segmentation of Vestibular Schwannoma (VS) and cochlea on high-resolution T2 Magnetic Resonance Images (MRI). Our target-aware contrast conversion network translates source domain annotated T1 MRI to pseudo T2 MRI to enable segmentation training on target domain, while preserving important anatomical features of interest in the converted images. Iterative self-training is followed to incorporate unlabeled data to training and incrementally improve the quality of pseudo-labels, thereby leading to improved performance of segmentation. COSMOS won the 1\textsuperscript{st} place in the Cross-Modality Domain Adaptation (crossMoDA) challenge held in conjunction with the 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021). It achieves mean Dice score and Average Symmetric Surface Distance of 0.871(0.063) and 0.437(0.270) for VS, and 0.842(0.020) and 0.152(0.030) for cochlea. △ Less

Submitted 19 December, 2023; v1 submitted 30 March, 2022; originally announced March 2022.

Comments: 10 pages, 6 figures, MICCAI 2021 Cross-Modality Domain Adaptation (crossMoDA) Challenge

arXiv:2203.02720 [pdf, ps, other]

Bayesian Learning Approach to Model Predictive Control

Authors: Namhoon Cho, Seokwon Lee, Hyo-Sang Shin, Antonios Tsourdos

Abstract: This study presents a Bayesian learning perspective towards model predictive control algorithms. High-level frameworks have been developed separately in the earlier studies on Bayesian learning and sampling-based model predictive control. On one hand, the Bayesian learning rule provides a general framework capable of generating various machine learning algorithms as special instances. On the other… ▽ More This study presents a Bayesian learning perspective towards model predictive control algorithms. High-level frameworks have been developed separately in the earlier studies on Bayesian learning and sampling-based model predictive control. On one hand, the Bayesian learning rule provides a general framework capable of generating various machine learning algorithms as special instances. On the other hand, the dynamic mirror descent model predictive control framework is capable of diversifying sample-rollout-based control algorithms. However, connections between the two frameworks have still not been fully appreciated in the context of stochastic optimal control. This study combines the Bayesian learning rule point of view into the model predictive control setting by taking inspirations from the view of understanding model predictive controller as an online learner. The selection of posterior class and natural gradient approximation for the variational formulation governs diversification of model predictive control algorithms in the Bayesian learning approach to model predictive control. This alternative viewpoint complements the dynamic mirror descent framework through streamlining the explanation of design choices. △ Less

Submitted 11 March, 2022; v1 submitted 5 March, 2022; originally announced March 2022.

Comments: 6 pages, IEEE L-CSS with CDC

arXiv:2202.11918 [pdf, other]

Phase Continuity: Learning Derivatives of Phase Spectrum for Speech Enhancement

Authors: Doyeon Kim, Hyewon Han, Hyeon-Kyeong Shin, Soo-Whan Chung, Hong-Goo Kang

Abstract: Modern neural speech enhancement models usually include various forms of phase information in their training loss terms, either explicitly or implicitly. However, these loss terms are typically designed to reduce the distortion of phase spectrum values at specific frequencies, which ensures they do not significantly affect the quality of the enhanced speech. In this paper, we propose an effective… ▽ More Modern neural speech enhancement models usually include various forms of phase information in their training loss terms, either explicitly or implicitly. However, these loss terms are typically designed to reduce the distortion of phase spectrum values at specific frequencies, which ensures they do not significantly affect the quality of the enhanced speech. In this paper, we propose an effective phase reconstruction strategy for neural speech enhancement that can operate in noisy environments. Specifically, we introduce a phase continuity loss that considers relative phase variations across the time and frequency axes. By including this phase continuity loss in a state-of-the-art neural speech enhancement system trained with reconstruction loss and a number of magnitude spectral losses, we show that our proposed method further improves the quality of enhanced speech signals over the baseline, especially when training is done jointly with a magnitude spectrum loss. △ Less

Submitted 24 February, 2022; originally announced February 2022.

Comments: Accepted by ICASSP 2022

arXiv:2201.06262 [pdf, other]

Optimisation of Structured Neural Controller Based on Continuous-Time Policy Gradient

Authors: Namhoon Cho, Hyo-Sang Shin

Abstract: This study presents a policy optimisation framework for structured nonlinear control of continuous-time (deterministic) dynamic systems. The proposed approach prescribes a structure for the controller based on relevant scientific knowledge (such as Lyapunov stability theory or domain experiences) while considering the tunable elements inside the given structure as the point of parametrisation with… ▽ More This study presents a policy optimisation framework for structured nonlinear control of continuous-time (deterministic) dynamic systems. The proposed approach prescribes a structure for the controller based on relevant scientific knowledge (such as Lyapunov stability theory or domain experiences) while considering the tunable elements inside the given structure as the point of parametrisation with neural networks. To optimise a cost represented as a function of the neural network weights, the proposed approach utilises the continuous-time policy gradient method based on adjoint sensitivity analysis as a means for correct and performant computation of cost gradient. This enables combining the stability, robustness, and physical interpretability of an analytically-derived structure for the feedback controller with the representational flexibility and optimised resulting performance provided by machine learning techniques. Such a hybrid paradigm for fixed-structure control synthesis is particularly useful for optimising adaptive nonlinear controllers to achieve improved performance in online operation, an area where the existing theory prevails the design of structure while lacking clear analytical understandings about tuning of the gains and the uncertainty model basis functions that govern the performance characteristics. Numerical experiments on aerospace applications illustrate the utility of the structured nonlinear controller optimisation framework. △ Less

Submitted 26 June, 2022; v1 submitted 17 January, 2022; originally announced January 2022.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2201.02831 [pdf, other]

doi 10.1016/j.media.2022.102628

CrossMoDA 2021 challenge: Benchmark of Cross-Modality Domain Adaptation techniques for Vestibular Schwannoma and Cochlea Segmentation

Authors: Reuben Dorent, Aaron Kujawa, Marina Ivory, Spyridon Bakas, Nicola Rieke, Samuel Joutard, Ben Glocker, Jorge Cardoso, Marc Modat, Kayhan Batmanghelich, Arseniy Belkov, Maria Baldeon Calisto, Jae Won Choi, Benoit M. Dawant, Hexin Dong, Sergio Escalera, Yubo Fan, Lasse Hansen, Mattias P. Heinrich, Smriti Joshi, Victoriya Kashtanova, Hyeon Gyu Kim, Satoshi Kondo, Christian N. Kruse, Susana K. Lai-Yuen , et al. (15 additional authors not shown)

Abstract: Domain Adaptation (DA) has recently raised strong interests in the medical imaging community. While a large variety of DA techniques has been proposed for image segmentation, most of these techniques have been validated either on private datasets or on small publicly available datasets. Moreover, these datasets mostly addressed single-class problems. To tackle these limitations, the Cross-Modality… ▽ More Domain Adaptation (DA) has recently raised strong interests in the medical imaging community. While a large variety of DA techniques has been proposed for image segmentation, most of these techniques have been validated either on private datasets or on small publicly available datasets. Moreover, these datasets mostly addressed single-class problems. To tackle these limitations, the Cross-Modality Domain Adaptation (crossMoDA) challenge was organised in conjunction with the 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021). CrossMoDA is the first large and multi-class benchmark for unsupervised cross-modality DA. The challenge's goal is to segment two key brain structures involved in the follow-up and treatment planning of vestibular schwannoma (VS): the VS and the cochleas. Currently, the diagnosis and surveillance in patients with VS are performed using contrast-enhanced T1 (ceT1) MRI. However, there is growing interest in using non-contrast sequences such as high-resolution T2 (hrT2) MRI. Therefore, we created an unsupervised cross-modality segmentation benchmark. The training set provides annotated ceT1 (N=105) and unpaired non-annotated hrT2 (N=105). The aim was to automatically perform unilateral VS and bilateral cochlea segmentation on hrT2 as provided in the testing set (N=137). A total of 16 teams submitted their algorithm for the evaluation phase. The level of performance reached by the top-performing teams is strikingly high (best median Dice - VS:88.4%; Cochleas:85.7%) and close to full supervision (median Dice - VS:92.5%; Cochleas:87.7%). All top-performing methods made use of an image-to-image translation approach to transform the source-domain images into pseudo-target-domain images. A segmentation network was then trained using these generated images and the manual annotations provided for the source image. △ Less

Submitted 14 December, 2022; v1 submitted 8 January, 2022; originally announced January 2022.

Comments: In Medical Image Analysis

arXiv:2112.07123 [pdf, other]

Recognition of Tactile-related EEG Signals Generated by Self-touch

Authors: Myoung-Ki Kim, Jeong-Hyun Cho, Hye-Bin Shin

Abstract: Touch is the first sense among human senses. Not only that, but it is also one of the most important senses that are indispensable. However, compared to sight and hearing, it is often neglected. In particular, since humans use the tactile sense of the skin to recognize and manipulate objects, without tactile sensation, it is very difficult to recognize or skillfully manipulate objects. In addition… ▽ More Touch is the first sense among human senses. Not only that, but it is also one of the most important senses that are indispensable. However, compared to sight and hearing, it is often neglected. In particular, since humans use the tactile sense of the skin to recognize and manipulate objects, without tactile sensation, it is very difficult to recognize or skillfully manipulate objects. In addition, the importance and interest of haptic technology related to touch are increasing with the development of technologies such as VR and AR in recent years. So far, the focus is only on haptic technology based on mechanical devices. Especially, there are not many studies on tactile sensation in the field of brain-computer interface based on EEG. There have been some studies that measured the surface roughness of artificial structures in relation to EEG-based tactile sensation. However, most studies have used passive contact methods in which the object moves, while the human subject remains still. Additionally, there have been no EEG-based tactile studies of active skin touch. In reality, we directly move our hands to feel the sense of touch. Therefore, as a preliminary study for our future research, we collected EEG signals for tactile sensation upon skin touch based on active touch and compared and analyzed differences in brain changes during touch and movement tasks. Through time-frequency analysis and statistical analysis, significant differences in power changes in alpha, beta, gamma, and high-gamma regions were observed. In addition, major spatial differences were observed in the sensory-motor region of the brain. △ Less

Submitted 13 December, 2021; originally announced December 2021.

Comments: Submitted to 2022 10th IEEE International Winter Conference on Brain-Computer Interface

arXiv:2110.11954 [pdf, other]

Variational Probabilistic Multi-Hypothesis Tracking

Authors: Shuoyuan Xu, Hyo-Sang Shin, Antonios Tsourdos

Abstract: This paper proposes a novel multi-target tracking (MTT) algorithm for scenarios with arbitrary numbers of measurements per target. We propose the variational probabilistic multi-hypothesis tracking (VPMHT) algorithm based on the variational Bayesian expectation-maximisation (VBEM) algorithm to resolve the MTT problem in the classic PMHT algorithm. With the introduction of variational inference, th… ▽ More This paper proposes a novel multi-target tracking (MTT) algorithm for scenarios with arbitrary numbers of measurements per target. We propose the variational probabilistic multi-hypothesis tracking (VPMHT) algorithm based on the variational Bayesian expectation-maximisation (VBEM) algorithm to resolve the MTT problem in the classic PMHT algorithm. With the introduction of variational inference, the proposed VPMHT handles track-loss much better than the conventional probabilistic multi-hypothesis tracking (PMHT) while preserving a similar or even better tracking accuracy. Extensive numerical simulations are conducted to demonstrate the effectiveness of the proposed algorithm. △ Less

Submitted 25 October, 2021; originally announced October 2021.

arXiv:2110.07821 [pdf, other]

Gait-based Frailty Assessment using Image Representation of IMU Signals and Deep CNN

Authors: Muhammad Zeeshan Arshad, Dawoon Jung, Mina Park, Hyungeun Shin, **wook Kim, Kyung-Ryoul Mun

Abstract: Frailty is a common and critical condition in elderly adults, which may lead to further deterioration of health. However, difficulties and complexities exist in traditional frailty assessments based on activity-related questionnaires. These can be overcome by monitoring the effects of frailty on the gait. In this paper, it is shown that by encoding gait signals as images, deep learning-based model… ▽ More Frailty is a common and critical condition in elderly adults, which may lead to further deterioration of health. However, difficulties and complexities exist in traditional frailty assessments based on activity-related questionnaires. These can be overcome by monitoring the effects of frailty on the gait. In this paper, it is shown that by encoding gait signals as images, deep learning-based models can be utilized for the classification of gait type. Two deep learning models (a) SS-CNN, based on single stride input images, and (b) MS-CNN, based on 3 consecutive strides were proposed. It was shown that MS-CNN performs best with an accuracy of 85.1\%, while SS-CNN achieved an accuracy of 77.3\%. This is because MS-CNN can observe more features corresponding to stride-to-stride variations which is one of the key symptoms of frailty. Gait signals were encoded as images using STFT, CWT, and GAF. While the MS-CNN model using GAF images achieved the best overall accuracy and precision, CWT has a slightly better recall. This study demonstrates how image encoded gait data can be used to exploit the full potential of deep learning CNN models for the assessment of frailty. △ Less

Submitted 14 October, 2021; originally announced October 2021.

Comments: Accepted in 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2021)

ACM Class: I.2

arXiv:2109.10674 [pdf]

Self-Training Based Unsupervised Cross-Modality Domain Adaptation for Vestibular Schwannoma and Cochlea Segmentation

Authors: Hyungseob Shin, Hyeongyu Kim, Sewon Kim, Yohan Jun, Taejoon Eo, Dosik Hwang

Abstract: With the advances of deep learning, many medical image segmentation studies achieve human-level performance when in fully supervised condition. However, it is extremely expensive to acquire annotation on every data in medical fields, especially on magnetic resonance images (MRI) that comprise many different contrasts. Unsupervised methods can alleviate this problem; however, the performance drop i… ▽ More With the advances of deep learning, many medical image segmentation studies achieve human-level performance when in fully supervised condition. However, it is extremely expensive to acquire annotation on every data in medical fields, especially on magnetic resonance images (MRI) that comprise many different contrasts. Unsupervised methods can alleviate this problem; however, the performance drop is inevitable compared to fully supervised methods. In this work, we propose a self-training based unsupervised-learning framework that performs automatic segmentation of Vestibular Schwannoma (VS) and cochlea on high-resolution T2 scans. Our method consists of 4 main stages: 1) VS-preserving contrast conversion from contrast-enhanced T1 scan to high-resolution T2 scan, 2) training segmentation on generated T2 scans with annotations on T1 scans, and 3) Inferring pseudo-labels on non-annotated real T2 scans, and 4) boosting the generalizability of VS and cochlea segmentation by training with combined data (i.e., real T2 scans with pseudo-labels and generated T2 scans with true annotations). Our method showed mean Dice score and Average Symmetric Surface Distance (ASSD) of 0.8570 (0.0705) and 0.4970 (0.3391) for VS, 0.8446 (0.0211) and 0.1513 (0.0314) for Cochlea on CrossMoDA2021 challenge validation phase leaderboard, outperforming most other approaches. △ Less

Submitted 22 September, 2021; originally announced September 2021.

Comments: 6 pages, 5 figures, MICCAI 2021 Cross-Modality Domain Adaptation for Medical Image Segmentation Challenge

arXiv:2108.12587 [pdf]

BUDA-SAGE with self-supervised denoising enables fast, distortion-free, high-resolution T2, T2*, para- and dia-magnetic susceptibility map**

Authors: Zi**g Zhang, Long Wang, Jae** Cho, Congyu Liao, Hyeong-Geol Shin, Xiaozhi Cao, Jongho Lee, **min Xu, Tao Zhang, Huihui Ye, Kawin Setsompop, Huafeng Liu, Berkin Bilgic

Abstract: To rapidly obtain high resolution T2, T2* and quantitative susceptibility map** (QSM) source separation maps with whole-brain coverage and high geometric fidelity. We propose Blip Up-Down Acquisition for Spin And Gradient Echo imaging (BUDA-SAGE), an efficient echo-planar imaging (EPI) sequence for quantitative map**. The acquisition includes multiple T2*-, T2'- and T2-weighted contrasts. We a… ▽ More To rapidly obtain high resolution T2, T2* and quantitative susceptibility map** (QSM) source separation maps with whole-brain coverage and high geometric fidelity. We propose Blip Up-Down Acquisition for Spin And Gradient Echo imaging (BUDA-SAGE), an efficient echo-planar imaging (EPI) sequence for quantitative map**. The acquisition includes multiple T2*-, T2'- and T2-weighted contrasts. We alternate the phase-encoding polarities across the interleaved shots in this multi-shot navigator-free acquisition. A field map estimated from interim reconstructions was incorporated into the joint multi-shot EPI reconstruction with a structured low rank constraint to eliminate geometric distortion. A self-supervised MR-Self2Self (MR-S2S) neural network (NN) was utilized to perform denoising after BUDA reconstruction to boost SNR. Employing Slider encoding allowed us to reach 1 mm isotropic resolution by performing super-resolution reconstruction on BUDA-SAGE volumes acquired with 2 mm slice thickness. Quantitative T2 and T2* maps were obtained using Bloch dictionary matching on the reconstructed echoes. QSM was estimated using nonlinear dipole inversion (NDI) on the gradient echoes. Starting from the estimated R2 and R2* maps, R2' information was derived and used in source separation QSM reconstruction, which provided additional para- and dia-magnetic susceptibility maps. In vivo results demonstrate the ability of BUDA-SAGE to provide whole-brain, distortion-free, high-resolution multi-contrast images and quantitative T2 and T2* maps, as well as yielding para- and dia-magnetic susceptibility maps. Derived quantitative maps showed comparable values to conventional map** methods in phantom and in vivo measurements. BUDA-SAGE acquisition with self-supervised denoising and Slider encoding enabled rapid, distortion-free, whole-brain T2, T2* map** at 1 mm3 isotropic resolution in 90 seconds. △ Less

Submitted 9 September, 2021; v1 submitted 28 August, 2021; originally announced August 2021.

arXiv:2012.06318 [pdf, ps, other]

doi 10.1109/TMI.2021.3075856

Results of the 2020 fastMRI Challenge for Machine Learning MR Image Reconstruction

Authors: Matthew J. Muckley, Bruno Riemenschneider, Alireza Radmanesh, Sunwoo Kim, Geunu Jeong, **gyu Ko, Yohan Jun, Hyungseob Shin, Dosik Hwang, Mahmoud Mostapha, Simon Arberet, Dominik Nickel, Zaccharie Ramzi, Philippe Ciuciu, Jean-Luc Starck, Jonas Teuwen, Dimitrios Karkalousos, Chao** Zhang, Anuroop Sriram, Zhengnan Huang, Nafissa Yakubova, Yvonne Lui, Florian Knoll

Abstract: Accelerating MRI scans is one of the principal outstanding problems in the MRI research community. Towards this goal, we hosted the second fastMRI competition targeted towards reconstructing MR images with subsampled k-space data. We provided participants with data from 7,299 clinical brain scans (de-identified via a HIPAA-compliant procedure by NYU Langone Health), holding back the fully-sampled… ▽ More Accelerating MRI scans is one of the principal outstanding problems in the MRI research community. Towards this goal, we hosted the second fastMRI competition targeted towards reconstructing MR images with subsampled k-space data. We provided participants with data from 7,299 clinical brain scans (de-identified via a HIPAA-compliant procedure by NYU Langone Health), holding back the fully-sampled data from 894 of these scans for challenge evaluation purposes. In contrast to the 2019 challenge, we focused our radiologist evaluations on pathological assessment in brain images. We also debuted a new Transfer track that required participants to submit models evaluated on MRI scanners from outside the training set. We received 19 submissions from eight different groups. Results showed one team scoring best in both SSIM scores and qualitative radiologist evaluations. We also performed analysis on alternative metrics to mitigate the effects of background noise and collected feedback from the participants to inform future challenges. Lastly, we identify common failure modes across the submissions, highlighting areas of need for future research in the MRI reconstruction community. △ Less

Submitted 3 May, 2021; v1 submitted 9 December, 2020; originally announced December 2020.

Comments: M. J. Muckley and B. Riemenschneider contributed equally to this work. This updates to version accepted in IEEE Transactions on Medical Imaging. It includes a rewrite of Section II.E as well as minor changes and corrections

arXiv:2008.04396 [pdf, other]

GANDALF: Generative Adversarial Networks with Discriminator-Adaptive Loss Fine-tuning for Alzheimer's Disease Diagnosis from MRI

Authors: Hoo-Chang Shin, Alvin Ihsani, Ziyue Xu, Swetha Mandava, Sharath Turuvekere Sreenivas, Christopher Forster, Jiook Cha, Alzheimer's Disease Neuroimaging Initiative

Abstract: Positron Emission Tomography (PET) is now regarded as the gold standard for the diagnosis of Alzheimer's Disease (AD). However, PET imaging can be prohibitive in terms of cost and planning, and is also among the imaging techniques with the highest dosage of radiation. Magnetic Resonance Imaging (MRI), in contrast, is more widely available and provides more flexibility when setting the desired imag… ▽ More Positron Emission Tomography (PET) is now regarded as the gold standard for the diagnosis of Alzheimer's Disease (AD). However, PET imaging can be prohibitive in terms of cost and planning, and is also among the imaging techniques with the highest dosage of radiation. Magnetic Resonance Imaging (MRI), in contrast, is more widely available and provides more flexibility when setting the desired image resolution. Unfortunately, the diagnosis of AD using MRI is difficult due to the very subtle physiological differences between healthy and AD subjects visible on MRI. As a result, many attempts have been made to synthesize PET images from MR images using generative adversarial networks (GANs) in the interest of enabling the diagnosis of AD from MR. Existing work on PET synthesis from MRI has largely focused on Conditional GANs, where MR images are used to generate PET images and subsequently used for AD diagnosis. There is no end-to-end training goal. This paper proposes an alternative approach to the aforementioned, where AD diagnosis is incorporated in the GAN training objective to achieve the best AD classification performance. Different GAN lossesare fine-tuned based on the discriminator performance, and the overall training is stabilized. The proposed network architecture and training regime show state-of-the-art performance for three- and four- class AD classification tasks. △ Less

Submitted 10 August, 2020; originally announced August 2020.

Comments: Accepted for publication at the MICCAI 2020 conference

arXiv:2008.04393 [pdf, other]

GANBERT: Generative Adversarial Networks with Bidirectional Encoder Representations from Transformers for MRI to PET synthesis

Authors: Hoo-Chang Shin, Alvin Ihsani, Swetha Mandava, Sharath Turuvekere Sreenivas, Christopher Forster, Jiook Cha, Alzheimer's Disease Neuroimaging Initiative

Abstract: Synthesizing medical images, such as PET, is a challenging task due to the fact that the intensity range is much wider and denser than those in photographs and digital renderings and are often heavily biased toward zero. Above all, intensity values in PET have absolute significance, and are used to compute parameters that are reproducible across the population. Yet, usually much manual adjustment… ▽ More Synthesizing medical images, such as PET, is a challenging task due to the fact that the intensity range is much wider and denser than those in photographs and digital renderings and are often heavily biased toward zero. Above all, intensity values in PET have absolute significance, and are used to compute parameters that are reproducible across the population. Yet, usually much manual adjustment has to be made in pre-/post- processing when synthesizing PET images, because its intensity ranges can vary a lot, e.g., between -100 to 1000 in floating point values. To overcome these challenges, we adopt the Bidirectional Encoder Representations from Transformers (BERT) algorithm that has had great success in natural language processing (NLP), where wide-range floating point intensity values are represented as integers ranging between 0 to 10000 that resemble a dictionary of natural language vocabularies. BERT is then trained to predict a proportion of masked values images, where its "next sentence prediction (NSP)" acts as GAN discriminator. Our proposed approach, is able to generate PET images from MRI images in wide intensity range, with no manual adjustments in pre-/post- processing. It is a method that can scale and ready to deploy. △ Less

Submitted 10 August, 2020; originally announced August 2020.

arXiv:2007.09614 [pdf]

A geometric approach to separate the effects of magnetic susceptibility and chemical shift/exchange in a phantom with isotropic magnetic susceptibility

Authors: Hyunsung Eun, Hwihun Jeong, **gu Lee, Hyeong-geol Shin, Jongho Lee

Abstract: Purpose: To separate the effects of magnetic susceptibility and chemical shift/exchange in a phantom with isotropic magnetic susceptibility. To generate a chemical shift/exchange-corrected quantitative susceptibility map** (QSM) result. Theory and Methods: Magnetic susceptibility and chemical shift/exchange are the properties of a material. Both are known to induce the resonance frequency shif… ▽ More Purpose: To separate the effects of magnetic susceptibility and chemical shift/exchange in a phantom with isotropic magnetic susceptibility. To generate a chemical shift/exchange-corrected quantitative susceptibility map** (QSM) result. Theory and Methods: Magnetic susceptibility and chemical shift/exchange are the properties of a material. Both are known to induce the resonance frequency shift in MRI. In current QSM, the susceptibility is reconstructed from the frequency shift, ignoring the contribution of the chemical shift/exchange. In this work, a simple geometric approach, which averages the frequency shift maps from three orthogonal B0 directions to generate a chemical shift/exchange map, is developed using the fact that the average nullifies the (isotropic) susceptibility effects. The resulting chemical shift/exchange map is subtracted from the total frequency shift, producing a frequency shift map solely from susceptibility. Finally, this frequency shift map is reconstructed to a susceptibility map using a QSM algorithm. The proposed method is validated in numerical simulations and applied to phantom experiments with olive oil, bovine serum albumin, ferritin, and iron oxide solutions. Results: Both simulations and experiments confirm that the method successfully separates the contributions of the susceptibility and chemical shift/exchange, reporting the susceptibility and chemical shift/exchange of olive oil (susceptibility: 0.62 ppm, chemical shift: -3.60 ppm), bovine serum albumin (susceptibility: -0.059 ppm, chemical shift: 0.008 ppm), ferritin (susceptibility: 0.125 ppm, chemical shift: -0.005 ppm), and iron oxide (susceptibility: 0.30 ppm, chemical shift: -0.039 ppm) solutions. Conclusion: The proposed method successfully separates the susceptibility and chemical shift/exchange in phantoms with isotropic magnetic susceptibility. △ Less

Submitted 19 July, 2020; originally announced July 2020.

Comments: Magnetic Resonance in Medicine 2020 Accepted

arXiv:2007.09597 [pdf]

doi 10.1016/j.neuroimage.2020.117432

DeepResp: Deep learning solution for respiration-induced B0 fluctuation artifacts in multi-slice GRE

Authors: Hongjun An, Hyeong-Geol Shin, Sooyoen Ji, Woo** Jung, Sehong Oh, Dongmyung Shin, Juhyung Park, Jongho Lee

Abstract: Respiration-induced B$_0$ fluctuation corrupts MRI images by inducing phase errors in k-space. A few approaches such as navigator have been proposed to correct for the artifacts at the expense of sequence modification. In this study, a new deep learning method, which is referred to as DeepResp, is proposed for reducing the respiration-artifacts in multi-slice gradient echo (GRE) images. DeepResp i… ▽ More Respiration-induced B$_0$ fluctuation corrupts MRI images by inducing phase errors in k-space. A few approaches such as navigator have been proposed to correct for the artifacts at the expense of sequence modification. In this study, a new deep learning method, which is referred to as DeepResp, is proposed for reducing the respiration-artifacts in multi-slice gradient echo (GRE) images. DeepResp is designed to extract the respiration-induced phase errors from a complex image using deep neural networks. Then, the network-generated phase errors are applied to the k-space data, creating an artifact-corrected image. For network training, the computer-simulated images were generated using artifact-free images and respiration data. When evaluated, both simulated images and in-vivo images of two different breathing conditions (deep breathing and natural breathing) show improvements (simulation: normalized root-mean-square error (NRMSE) from 7.8% to 1.3%; structural similarity (SSIM) from 0.88 to 0.99; ghost-to-signal-ratio (GSR) from 7.9% to 0.6%; deep breathing: NRMSE from 13.9% to 5.8%; SSIM from 0.86 to 0.95; GSR 20.2% to 5.7%; natural breathing: NRMSE from 5.2% to 4.0%; SSIM from 0.94 to 0.97; GSR 5.7% to 2.8%). Our approach does not require any modification of the sequence or additional hardware, and may therefore find useful applications. Furthermore, the deep neural networks extract respiration-induced phase errors, which is more interpretable and reliable than results of end-to-end trained networks. △ Less

Submitted 19 July, 2020; originally announced July 2020.

Comments: 19 pages

arXiv:2005.03778 [pdf, other]

LGSVL Simulator: A High Fidelity Simulator for Autonomous Driving

Authors: Guodong Rong, Byung Hyun Shin, Hadi Tabatabaee, Qiang Lu, Steve Lemke, Mārtiņš Možeiko, Eric Boise, Geehoon Uhm, Mark Gerow, Shalin Mehta, Eugene Agafonov, Tae Hyung Kim, Eric Sterner, Keunhae Ushiroda, Michael Reyes, Dmitry Zelenkovsky, Seonman Kim

Abstract: Testing autonomous driving algorithms on real autonomous vehicles is extremely costly and many researchers and developers in the field cannot afford a real car and the corresponding sensors. Although several free and open-source autonomous driving stacks, such as Autoware and Apollo are available, choices of open-source simulators to use with them are limited. In this paper, we introduce the LGSVL… ▽ More Testing autonomous driving algorithms on real autonomous vehicles is extremely costly and many researchers and developers in the field cannot afford a real car and the corresponding sensors. Although several free and open-source autonomous driving stacks, such as Autoware and Apollo are available, choices of open-source simulators to use with them are limited. In this paper, we introduce the LGSVL Simulator which is a high fidelity simulator for autonomous driving. The simulator engine provides end-to-end, full-stack simulation which is ready to be hooked up to Autoware and Apollo. In addition, simulator tools are provided with the core simulation engine which allow users to easily customize sensors, create new types of controllable objects, replace some modules in the core simulator, and create digital twins of particular environments. △ Less

Submitted 21 June, 2020; v1 submitted 7 May, 2020; originally announced May 2020.

Comments: 6 pages, 7 figures, ITSC 2020

arXiv:2004.03910 [pdf, ps, other]

A New Exponential Forgetting Algorithm for Recursive Least-Squares Parameter Estimation

Authors: Hyo-Sang Shin, Hae-In Lee

Abstract: This paper develops a new exponential forgetting algorithm that can prevent so-called the estimator windup problem, while retaining fast convergence speed. To investigate the properties of the proposed forgetting algorithm, boundedness of the covariance matrix is first analysed and compared with various exponential and directional forgetting algorithms. Then, stability of the estimation error with… ▽ More This paper develops a new exponential forgetting algorithm that can prevent so-called the estimator windup problem, while retaining fast convergence speed. To investigate the properties of the proposed forgetting algorithm, boundedness of the covariance matrix is first analysed and compared with various exponential and directional forgetting algorithms. Then, stability of the estimation error with and without the persistent excitation condition is theoretically analysed in comparison with the existing benchmark algorithms. Numerical simulations on wing rock motion validate the analysis results. △ Less

Submitted 8 April, 2020; originally announced April 2020.

arXiv:2002.12467 [pdf, other]

doi 10.1109/REDUAS47371.2019.8999695

Improving Learning Effectiveness For Object Detection and Classification in Cluttered Backgrounds

Authors: Vinorth Varatharasan, Hyo-Sang Shin, Antonios Tsourdos, Nick Colosimo

Abstract: Usually, Neural Networks models are trained with a large dataset of images in homogeneous backgrounds. The issue is that the performance of the network models trained could be significantly degraded in a complex and heterogeneous environment. To mitigate the issue, this paper develops a framework that permits to autonomously generate a training dataset in heterogeneous cluttered backgrounds. It is… ▽ More Usually, Neural Networks models are trained with a large dataset of images in homogeneous backgrounds. The issue is that the performance of the network models trained could be significantly degraded in a complex and heterogeneous environment. To mitigate the issue, this paper develops a framework that permits to autonomously generate a training dataset in heterogeneous cluttered backgrounds. It is clear that the learning effectiveness of the proposed framework should be improved in complex and heterogeneous environments, compared with the ones with the typical dataset. In our framework, a state-of-the-art image segmentation technique called DeepLab is used to extract objects of interest from a picture and Chroma-key technique is then used to merge the extracted objects of interest into specific heterogeneous backgrounds. The performance of the proposed framework is investigated through empirical tests and compared with that of the model trained with the COCO dataset. The results show that the proposed framework outperforms the model compared. This implies that the learning effectiveness of the framework developed is superior to the models with the typical dataset. △ Less

Submitted 27 February, 2020; originally announced February 2020.

Comments: IEEE RED-UAS 2019 Conference

arXiv:2001.08956 [pdf, ps, other]

Online Resource Procurement and Allocation in a Hybrid Edge-Cloud Computing System

Authors: Thinh Quang Dinh, Ben Liang, Tony Q. S. Quek, Hyundong Shin

Abstract: By acquiring cloud-like capacities at the edge of a network, edge computing is expected to significantly improve user experience. In this paper, we formulate a hybrid edge-cloud computing system where an edge device with limited local resources can rent more from a cloud node and perform resource allocation to serve its users. The resource procurement and allocation decisions depend not only on th… ▽ More By acquiring cloud-like capacities at the edge of a network, edge computing is expected to significantly improve user experience. In this paper, we formulate a hybrid edge-cloud computing system where an edge device with limited local resources can rent more from a cloud node and perform resource allocation to serve its users. The resource procurement and allocation decisions depend not only on the cloud's multiple rental options but also on the edge's local processing cost and capacity. We first propose an offline algorithm whose decisions are made with full information of future demand. Then, an online algorithm is proposed where the edge node makes irrevocable decisions in each timeslot without future information of demand. We show that both algorithms have constant performance bounds from the offline optimum. Numerical results acquired with Google cluster-usage traces indicate that the cost of the edge node can be substantially reduced by using the proposed algorithms, up to $80\%$ in comparison with baseline algorithms. We also observe how the cloud's pricing structure and edge's local cost influence the procurement decisions. △ Less

Submitted 24 January, 2020; originally announced January 2020.

arXiv:1911.03461 [pdf, other]

AIM 2019 Challenge on Image Demoireing: Methods and Results

Authors: Shanxin Yuan, Radu Timofte, Gregory Slabaugh, Ales Leonardis, Bolun Zheng, Xin Ye, Xiang Tian, Yaowu Chen, Xi Cheng, Zhenyong Fu, Jian Yang, Ming Hong, Wenying Lin, Wen** Yang, Yanyun Qu, Hong-Kyu Shin, Joon-Yeon Kim, Sung-Jea Ko, Hang Dong, Yu Guo, Jie Wang, Xuan Ding, Zongyan Han, Sourya Dipta Das, Kuldeep Purohit , et al. (3 additional authors not shown)

Abstract: This paper reviews the first-ever image demoireing challenge that was part of the Advances in Image Manipulation (AIM) workshop, held in conjunction with ICCV 2019. This paper describes the challenge, and focuses on the proposed solutions and their results. Demoireing is a difficult task of removing moire patterns from an image to reveal an underlying clean image. A new dataset, called LCDMoire wa… ▽ More This paper reviews the first-ever image demoireing challenge that was part of the Advances in Image Manipulation (AIM) workshop, held in conjunction with ICCV 2019. This paper describes the challenge, and focuses on the proposed solutions and their results. Demoireing is a difficult task of removing moire patterns from an image to reveal an underlying clean image. A new dataset, called LCDMoire was created for this challenge, and consists of 10,200 synthetically generated image pairs (moire and clean ground truth). The challenge was divided into 2 tracks. Track 1 targeted fidelity, measuring the ability of demoire methods to obtain a moire-free image compared with the ground truth, while Track 2 examined the perceptual quality of demoire methods. The tracks had 60 and 39 registered participants, respectively. A total of eight teams competed in the final testing phase. The entries span the current the state-of-the-art in the image demoireing problem. △ Less

Submitted 8 November, 2019; originally announced November 2019.

Comments: arXiv admin note: text overlap with arXiv:1911.02498

arXiv:1908.06884 [pdf, other]

A Domain-Knowledge-Aided Deep Reinforcement Learning Approach for Flight Control Design

Authors: Hyo-Sang Shin, Shaoming He, Antonios Tsourdos

Abstract: This paper aims to examine the potential of using the emerging deep reinforcement learning techniques in flight control. Instead of learning from scratch, we suggest to leverage domain knowledge available in learning to improve learning efficiency and generalisability. More specifically, the proposed approach fixes the autopilot structure as typical three-loop autopilot and deep reinforcement lear… ▽ More This paper aims to examine the potential of using the emerging deep reinforcement learning techniques in flight control. Instead of learning from scratch, we suggest to leverage domain knowledge available in learning to improve learning efficiency and generalisability. More specifically, the proposed approach fixes the autopilot structure as typical three-loop autopilot and deep reinforcement learning is utilised to learn the autopilot gains. To solve the flight control problem, we then formulate a Markovian decision process with a proper reward function that enable the application of reinforcement learning theory. Another type of domain knowledge is exploited for defining the reward function, by sha** reference inputs in consideration of important control objectives and using the shaped reference inputs in the reward function. The state-of-the-art deep deterministic policy gradient algorithm is utilised to learn an action policy that maps the observed states to the autopilot gains. Extensive empirical numerical simulations are performed to validate the proposed computational control algorithm. △ Less

Submitted 11 November, 2020; v1 submitted 19 August, 2019; originally announced August 2019.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:1907.03728 [pdf, other]

Correlation via synthesis: end-to-end nodule image generation and radiogenomic map learning based on generative adversarial network

Authors: Ziyue Xu, Xiaosong Wang, Hoo-Chang Shin, Dong Yang, Holger Roth, Fausto Milletari, Ling Zhang, Daguang Xu

Abstract: Radiogenomic map linking image features and gene expression profiles is useful for noninvasively identifying molecular properties of a particular type of disease. Conventionally, such map is produced in three separate steps: 1) gene-clustering to "metagenes", 2) image feature extraction, and 3) statistical correlation between metagenes and image features. Each step is independently performed and r… ▽ More Radiogenomic map linking image features and gene expression profiles is useful for noninvasively identifying molecular properties of a particular type of disease. Conventionally, such map is produced in three separate steps: 1) gene-clustering to "metagenes", 2) image feature extraction, and 3) statistical correlation between metagenes and image features. Each step is independently performed and relies on arbitrary measurements. In this work, we investigate the potential of an end-to-end method fusing gene data with image features to generate synthetic image and learn radiogenomic map simultaneously. To achieve this goal, we develop a generative adversarial network (GAN) conditioned on both background images and gene expression profiles, synthesizing the corresponding image. Image and gene features are fused at different scales to ensure the realism and quality of the synthesized image. We tested our method on non-small cell lung cancer (NSCLC) dataset. Results demonstrate that the proposed method produces realistic synthetic images, and provides a promising way to find gene-image relationship in a holistic end-to-end manner. △ Less

Submitted 8 July, 2019; originally announced July 2019.

arXiv:1904.12522 [pdf]

Artificial neural network for myelin water imaging

Authors: Jieun Lee, Doohee Lee, Joon Yul Choi, Dongmyung Shin, Hyeong-Geol Shin, Jongho Lee

Abstract: Purpose: To demonstrate the application of artificial-neural-network (ANN) for real-time processing of myelin water imaging (MWI). Methods: Three neural networks, ANN-IMWF, ANN-IGMT2, and ANN-II, were developed to generate MWI. ANN-IMWF and ANN-IGMT2 were designed to output myelin water fraction (MWF) and geometric mean T2 (GMT2,IEW), respectively whereas ANN-II generates a T2 distribution. For th… ▽ More Purpose: To demonstrate the application of artificial-neural-network (ANN) for real-time processing of myelin water imaging (MWI). Methods: Three neural networks, ANN-IMWF, ANN-IGMT2, and ANN-II, were developed to generate MWI. ANN-IMWF and ANN-IGMT2 were designed to output myelin water fraction (MWF) and geometric mean T2 (GMT2,IEW), respectively whereas ANN-II generates a T2 distribution. For the networks, gradient and spin echo data from 18 healthy controls (HC) and 26 multiple sclerosis patients (MS) were utilized. Among them, 10 HC and 12 MS had the same scan parameters and were used for training (6 HC and 6 MS), validation (1 HC and 1 MS), and test sets (3 HC and 5 HC). The remaining data had different scan parameters and were applied to exam the effects of the scan parameters. The network results were compared with those of conventional MWI in the white matter mask and regions of interest (ROI). Results: The networks produced highly accurate results, showing averaged normalized root-mean-squared error under 3% for MWF and 0.4% for GMT2,IEW in the white matter mask of the test set. In the ROI analysis, the differences between ANNs and conventional MWI were less than 0.1% in MWF and 0.1 ms in GMT2,IEW (no statistical difference and R2 > 0.97). Datasets with different scan parameters showed increased errors. The average processing time was 0.68 sec in ANNs, gaining 11,702 times acceleration in the computational speed (conventional MWI: 7,958 sec). Conclusion: The proposed neural networks demonstrate the feasibility of real-time processing for MWI with high accuracy. △ Less

Submitted 19 September, 2019; v1 submitted 29 April, 2019; originally announced April 2019.

Comments: 15 pages

arXiv:1903.11531 [pdf, ps, other]

Sample Greedy Gossip for Distributed Network-Wide Average Computation

Authors: Hyo-Sang Shin, Shaoming He, Antonios Tsourdos

Abstract: This paper investigates the problem of distributed network-wide averaging and proposes a new greedy gossip algorithm. Instead of finding the optimal path of each node in a greedy manner, the proposed approach utilises a suboptimal communication path by performing greedy selection among randomly selected active local nodes. Theoretical analysis on convergence speed is also performed to investigate… ▽ More This paper investigates the problem of distributed network-wide averaging and proposes a new greedy gossip algorithm. Instead of finding the optimal path of each node in a greedy manner, the proposed approach utilises a suboptimal communication path by performing greedy selection among randomly selected active local nodes. Theoretical analysis on convergence speed is also performed to investigate the characteristics of the proposed algorithm. The main feature of the new algorithm is that it provides great flexibility and well balance between communication cost and convergence performance introduced by the stochastic sampling strategy. Extensive numerical simulations are performed to validate the analytic findings. △ Less

Submitted 19 August, 2019; v1 submitted 27 March, 2019; originally announced March 2019.

arXiv:1809.00758 [pdf]

End-to-end Multimodal Emotion and Gender Recognition with Dynamic Joint Loss Weights

Authors: Myungsu Chae, Tae-Ho Kim, Young Hoon Shin, June-Woo Kim, Soo-Young Lee

Abstract: Multi-task learning is a method for improving the generalizability of multiple tasks. In order to perform multiple classification tasks with one neural network model, the losses of each task should be combined. Previous studies have mostly focused on multiple prediction tasks using joint loss with static weights for training models, choosing the weights between tasks without making sufficient cons… ▽ More Multi-task learning is a method for improving the generalizability of multiple tasks. In order to perform multiple classification tasks with one neural network model, the losses of each task should be combined. Previous studies have mostly focused on multiple prediction tasks using joint loss with static weights for training models, choosing the weights between tasks without making sufficient considerations by setting them uniformly or empirically. In this study, we propose a method to calculate joint loss using dynamic weights to improve the total performance, instead of the individual performance, of tasks. We apply this method to design an end-to-end multimodal emotion and gender recognition model using audio and video data. This approach provides proper weights for the loss of each task when the training process ends. In our experiments, emotion and gender recognition with the proposed method yielded a lower joint loss, which is computed as the negative log-likelihood, than using static weights for joint loss. Moreover, our proposed model has better generalizability than other models. To the best of our knowledge, this research is the first to demonstrate the strength of using dynamic weights for joint loss for maximizing overall performance in emotion and gender recognition tasks. △ Less

Submitted 2 October, 2018; v1 submitted 3 September, 2018; originally announced September 2018.

Comments: IROS 2018 Workshop on Crossmodal Learning for Intelligent Robotics

MSC Class: 68T05

arXiv:1510.03062 [pdf]

GPS Receiver with Enhanced User Positioning Time

Authors: Seung-Hyun Yoon, Ji-Woon Jung, Su-Bong Kim, Hyun-Chang Shin, Jae-Hyang Lee, Kyu-Yun Lee, Hyo-Sun Shim

Abstract: This paper introduces a Global Positioning System (GPS) Receiver that locates user's position instantly. Recently, many mobile devices require location information to add user position into their contents, and some applications require quick positioning when the device is initially switched on. In order to reduce the time to fix user's position, we propose the Instant-On GPS receiver system which… ▽ More This paper introduces a Global Positioning System (GPS) Receiver that locates user's position instantly. Recently, many mobile devices require location information to add user position into their contents, and some applications require quick positioning when the device is initially switched on. In order to reduce the time to fix user's position, we propose the Instant-On GPS receiver system which is implemented on an ARM based FPGA board, and operates under a very low power mode. We've developed a repeated sleep mode by periods to control the GPS receiver's main power in order to achieve reduced power consumption. By using a high resolution Real Time Clock (RTC), we can estimate frame sync timing without receiving the current frame sync preamble data from a satellite when GPS turns back on. However, the navigation solution needs to be calculated once in advance. The performance results of the proposed GPS receiver in both real world and simulation environment are presented. △ Less

Submitted 11 October, 2015; originally announced October 2015.

Comments: Proc. International Symposium on GPS/GNSS 2008

Showing 1–50 of 51 results for author: Shin, H