Search | arXiv e-print repository

Cyclic 2.5D Perceptual Loss for Cross-Modal 3D Image Synthesis: T1 MRI to Tau-PET

Authors: Symac Kim, Junho Moon, Haejun Chung, Ikbeom Jang

Abstract: Alzheimer's Disease (AD) is the most common form of dementia, characterised by cognitive decline and biomarkers such as tau-proteins. Tau-positron emission tomography (tau-PET), which employs a radiotracer to selectively bind, detect, and visualise tau protein aggregates within the brain, is valuable for early AD diagnosis but is less accessible due to high costs, limited availability, and its invasive nature. Image synthesis with neural networks enables the generation of tau-PET images from more accessible T1-weighted magnetic resonance imaging (MRI) images. To ensure high-quality image synthesis, we propose a cyclic 2.5D perceptual loss combined with mean squared error and structural similarity index measure (SSIM) losses. The cyclic 2.5D perceptual loss sequentially calculates the axial 2D average perceptual loss for a specified number of epochs, followed by the coronal and sagittal planes for the same number of epochs. This sequence is cyclically performed, with intervals reducing as the cycles repeat. We conduct supervised synthesis of tau-PET images from T1w MRI images using 516 paired T1w MRI and tau-PET 3D images from the ADNI database. For the collected data, we perform preprocessing, including intensity standardisation for tau-PET images from each manufacturer. The proposed loss, applied to generative 3D U-Net and its variants, outperformed those with 2.5D and 3D perceptual losses in SSIM and peak signal-to-noise ratio (PSNR). In addition, adding the cyclic 2.5D perceptual loss to the original losses of GAN-based image synthesis models such as CycleGAN and Pix2Pix improves SSIM and PSNR by at least 2% and 3%, respectively. Furthermore, by-manufacturer PET standardisation helps the models synthesise higher-quality images than min-max PET normalisation.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 24 pages, 5 figures
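
The epoch schedule described in the abstract above (axial, then coronal, then sagittal, with the per-plane interval shrinking on each cycle) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not state how the interval shrinks, so the multiplicative `decay` rule, the floor of one epoch, and the names `cyclic_plane_schedule`, `initial_interval`, and `decay` are all assumptions made for this example.

```python
from typing import List

# Plane order taken from the abstract: axial first, then coronal, then sagittal.
PLANES = ("axial", "coronal", "sagittal")


def cyclic_plane_schedule(
    total_epochs: int, initial_interval: int, decay: float = 0.5
) -> List[str]:
    """Return the plane whose 2D perceptual loss would be used at each epoch."""
    if total_epochs < 0 or initial_interval < 1 or not 0 < decay <= 1:
        raise ValueError("invalid schedule parameters")
    schedule: List[str] = []
    interval = initial_interval
    while len(schedule) < total_epochs:
        # One cycle: each plane is used for `interval` consecutive epochs.
        for plane in PLANES:
            schedule.extend([plane] * interval)
        # ASSUMPTION: the interval shrinks multiplicatively, never below 1 epoch.
        interval = max(1, int(interval * decay))
    # The last cycle may overshoot, so trim to the requested length.
    return schedule[:total_epochs]


if __name__ == "__main__":
    for epoch, plane in enumerate(cyclic_plane_schedule(10, 2)):
        print(epoch, plane)
```

In a training loop, the returned plane name would select which slicing direction feeds the 2D perceptual network at that epoch; the MSE and SSIM terms mentioned in the abstract would be added regardless of the plane.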

arXiv:2404.00791 [pdf, other]

doi 10.1109/ICASSP48485.2024.10446067

Personalized Neural Speech Codec

Authors: Inseon Jang, Haici Yang, Wootaek Lim, Seungkwon Beack, Minje Kim

Abstract: In this paper, we propose a personalized neural speech codec, envisioning that personalization can reduce the model complexity or improve perceptual speech quality. Despite the common usage of speech codecs where only a single talker is involved on each side of the communication, personalizing a codec for the specific user has rarely been explored in the literature. First, we assume speakers can be grouped into smaller subsets based on their perceptual similarity. Then, we also postulate that a group-specific codec can focus on the group's speech characteristics to improve its perceptual quality and computational efficiency. To this end, we first develop a Siamese network that learns the speaker embeddings from the LibriSpeech dataset, which are then grouped into underlying speaker clusters. Finally, we retrain the LPCNet-based speech codec baselines on each of the speaker clusters. Subjective listening tests show that the proposed personalization scheme introduces model compression while maintaining speech quality. In other words, with the same model complexity, personalized codecs produce better speech quality.

Submitted 31 March, 2024; originally announced April 2024.

Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024, pp. 991-995

arXiv:2312.06902 [pdf, other]

Perseus: Removing Energy Bloat from Large Model Training

Authors: Jae-Won Chung, Yile Gu, Insu Jang, Luoxi Meng, Nikhil Bansal, Mosharaf Chowdhury

Abstract: Training large AI models on numerous GPUs consumes a massive amount of energy. We observe that not all energy consumed during training directly contributes to end-to-end training throughput, and a significant portion can be removed without slowing down training, which we call energy bloat. In this work, we identify two independent sources of energy bloat in large model training, intrinsic and extrinsic, and propose Perseus, a unified optimization framework that mitigates both. Perseus obtains the "iteration time-energy" Pareto frontier of any large model training job using an efficient iterative graph cut-based algorithm and schedules energy consumption of its forward and backward computations across time to remove intrinsic and extrinsic energy bloat. Evaluation on large models like GPT-3 and Bloom shows that Perseus reduces energy consumption of large model training by up to 30%, enabling savings otherwise unobtainable before.

Submitted 11 December, 2023; originally announced December 2023.

Comments: Open-source at https://ml.energy/zeus/perseus/
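
To make the "iteration time-energy" Pareto frontier mentioned in the abstract above concrete, the sketch below filters a set of candidate (time, energy) points down to the non-dominated ones. It is only an illustration of what a Pareto frontier is: it is not Perseus's graph cut-based algorithm, and the function name `pareto_frontier` and the sample numbers are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (iteration time, energy); lower is better for both


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Keep the points that no other point beats on both time and energy."""
    frontier: List[Point] = []
    best_energy = float("inf")
    # Sort by time (ties broken by energy), then sweep from fastest to slowest.
    for time, energy in sorted(points):
        # A slower point is only worth keeping if it uses strictly less energy
        # than every faster point seen so far.
        if energy < best_energy:
            frontier.append((time, energy))
            best_energy = energy
    return frontier


if __name__ == "__main__":
    # Hypothetical measurements, e.g. from running at different GPU frequencies.
    candidates = [(1.0, 10.0), (1.2, 8.0), (1.1, 11.0), (1.5, 8.0), (2.0, 5.0)]
    print(pareto_frontier(candidates))
```

Each point on the resulting frontier is a trade-off that cannot be improved on one axis without giving something up on the other, which is why a scheduler can pick among frontier points without wasting energy.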

arXiv:2311.08330 [pdf, other]

Generative De-Quantization for Neural Speech Codec via Latent Diffusion

Authors: Haici Yang, Inseon Jang, Minje Kim

Abstract: In low-bitrate speech coding, end-to-end speech coding networks aim to learn compact yet expressive features and a powerful decoder in a single network. Such a challenging problem results in an unwelcome increase in complexity and inferior speech quality. In this paper, we propose to separate the representation learning and information reconstruction tasks. We leverage an end-to-end codec for learning low-dimensional discrete tokens and employ a latent diffusion model to de-quantize coded features into a high-dimensional continuous space, relieving the decoder's burden of de-quantizing and upsampling. To mitigate the issue of over-smooth generation, we introduce midway-infilling with less noise reduction and stronger conditioning. In ablation studies, we investigate the hyperparameters for midway-infilling and latent diffusion space with different dimensions. Subjective listening tests show that our model outperforms the state-of-the-art at two low bitrates, 1.5 and 3 kbps. Codes and samples of this work are available on our webpage.

Submitted 15 November, 2023; v1 submitted 14 November, 2023; originally announced November 2023.

Comments: Submitted to ICASSP 2024

arXiv:2309.08125 [pdf, other]

doi 10.1145/3600006.3613152

Oobleck: Resilient Distributed Training of Large Models Using Pipeline Templates

Authors: Insu Jang, Zhenning Yang, Zhen Zhang, Xin Jin, Mosharaf Chowdhury

Abstract: Oobleck enables resilient distributed training of large DNN models with guaranteed fault tolerance. It takes a planning-execution co-design approach, where it first generates a set of heterogeneous pipeline templates and instantiates at least $f+1$ logically equivalent pipeline replicas to tolerate any $f$ simultaneous failures. During execution, it relies on already-replicated model states across the replicas to provide fast recovery. Oobleck provably guarantees that some combination of the initially created pipeline templates can be used to cover all available resources after $f$ or fewer simultaneous failures, thereby avoiding resource idling at all times. Evaluation on large DNN models with billions of parameters shows that Oobleck provides consistently high throughput, and it outperforms state-of-the-art fault tolerance solutions like Bamboo and Varuna by up to $29.6\times$.

Submitted 7 November, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

Comments: SOSP'23 | Camera-ready + figures and numbers are corrected

arXiv:2306.03379 [pdf, other]

OptimShare: A Unified Framework for Privacy Preserving Data Sharing -- Towards the Practical Utility of Data with Privacy

Authors: M. A. P. Chamikara, Seung Ick Jang, Ian Oppermann, Dongxi Liu, Musotto Roberto, Sushmita Ruj, Arindam Pal, Meisam Mohammady, Seyit Camtepe, Sylvia Young, Chris Dorrian, Nasir David

Abstract: Tabular data sharing serves as a common method for data exchange. However, sharing sensitive information without adequate privacy protection can compromise individual privacy. Thus, ensuring privacy-preserving data sharing is crucial. Differential privacy (DP) is regarded as the gold standard in data privacy. Despite this, current DP methods tend to generate privacy-preserving tabular datasets that often suffer from limited practical utility due to heavy perturbation and disregard for the tables' utility dynamics. Besides, there has not been much research on selective attribute release, particularly in the context of controlled partially perturbed data sharing. This has significant implications for scenarios such as cross-agency data sharing in real-world situations. We introduce OptimShare: a utility-focused, multi-criteria solution designed to perturb input datasets selectively optimized for specific real-world applications. OptimShare combines the principles of differential privacy, fuzzy logic, and probability theory to establish an integrated tool for privacy-preserving data sharing. Empirical assessments confirm that OptimShare successfully strikes a balance between better data utility and robust privacy, effectively serving various real-world problem scenarios.

Submitted 5 June, 2023; originally announced June 2023.

arXiv:2304.09507 [pdf, other]

Self-supervised Image Denoising with Downsampled Invariance Loss and Conditional Blind-Spot Network

Authors: Yeong Il Jang, Keuntek Lee, Gu Yong Park, Seyun Kim, Nam Ik Cho

Abstract: There have been many image denoisers using deep neural networks, which outperform conventional model-based methods by large margins. Recently, self-supervised methods have attracted attention because constructing a large real noise dataset for supervised training is an enormous burden. The most representative self-supervised denoisers are based on blind-spot networks, which exclude the receptive field's center pixel. However, excluding any input pixel is abandoning some information, especially when the input pixel at the corresponding output position is excluded. In addition, a standard blind-spot network fails to reduce real camera noise due to the pixel-wise correlation of noise, though it successfully removes independently distributed synthetic noise. Hence, to realize a more practical denoiser, we propose a novel self-supervised training framework that can remove real noise. For this, we derive the theoretic upper bound of a supervised loss where the network is guided by the downsampled blinded output. Also, we design a conditional blind-spot network (C-BSN), which selectively controls the blindness of the network to use the center pixel information. Furthermore, we exploit a random subsampler to decorrelate noise spatially, making the C-BSN free of visual artifacts that were often seen in downsample-based methods. Extensive experiments show that the proposed C-BSN achieves state-of-the-art performance on real-world datasets as a self-supervised denoiser and shows qualitatively pleasing results without any post-processing or refinement.

Submitted 28 July, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

Comments: Accepted to ICCV 2023

arXiv:2304.09471 [pdf, other]

Enhancing Multi-Camera People Tracking with Anchor-Guided Clustering and Spatio-Temporal Consistency ID Re-Assignment

Authors: Hsiang-Wei Huang, Cheng-Yen Yang, Zhongyu Jiang, Pyong-Kun Kim, Kyoungoh Lee, Kwangju Kim, Samartha Ramkumar, Chaitanya Mullapudi, In-Su Jang, Chung-I Huang, Jenq-Neng Hwang

Abstract: Multi-camera multiple people tracking has become an increasingly important area of research due to the growing demand for accurate and efficient indoor people tracking systems, particularly in settings such as retail, healthcare centers, and transit hubs. We propose a novel multi-camera multiple people tracking method that uses anchor-guided clustering for cross-camera re-identification and spatio-temporal consistency for geometry-based cross-camera ID reassigning. Our approach aims to improve the accuracy of tracking by identifying key features that are unique to every individual and utilizing the overlap of views between cameras to predict accurate trajectories without needing the actual camera parameters. The method has demonstrated robustness and effectiveness in handling both synthetic and real-world data. The proposed method is evaluated on the CVPR AI City Challenge 2023 dataset, achieving an IDF1 of 95.36% with the first-place ranking in the challenge. The code is available at: https://github.com/ipl-uw/AIC23_Track1_UWIPL_ETRI.

Submitted 17 June, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

arXiv:2303.17719 [pdf, other]

Why is the winner the best?

Authors: Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Sharib Ali, Vincent Andrearczyk, Marc Aubreville, Ujjwal Baid, Spyridon Bakas, Niranjan Balu, Sophia Bano, Jorge Bernal, Sebastian Bodenstedt, Alessandro Casella, Veronika Cheplygina, Marie Daum, Marleen de Bruijne, Adrien Depeursinge, Reuben Dorent, Jan Egger, David G. Ellis, Sandy Engelhardt, Melanie Ganz, et al. (100 additional authors not shown)

Abstract: International benchmarking competitions have become fundamental for the comparative performance assessment of image analysis methods. However, little attention has been given to investigating what can be learnt from these competitions. Do they really generate scientific progress? What are common and successful participation strategies? What makes a solution superior to a competing method? To address this gap in the literature, we performed a multi-center study with all 80 competitions that were conducted in the scope of IEEE ISBI 2021 and MICCAI 2021. Statistical analyses performed based on comprehensive descriptions of the submitted algorithms linked to their rank as well as the underlying participation strategies revealed common characteristics of winning solutions. These typically include the use of multi-task learning (63%) and/or multi-stage pipelines (61%), and a focus on augmentation (100%), image preprocessing (97%), data curation (79%), and postprocessing (66%). The "typical" lead of a winning team is a computer scientist with a doctoral degree, five years of experience in biomedical image analysis, and four years of experience in deep learning. Two core general development strategies stood out for highly-ranked teams: the reflection of the metrics in the method design and the focus on analyzing and handling failure cases. According to the organizers, 43% of the winning algorithms exceeded the state of the art but only 11% completely solved the respective domain problem. The insights of our study could help researchers (1) improve algorithm development strategies when approaching new problems, and (2) focus on open research questions revealed by this work.

Submitted 30 March, 2023; originally announced March 2023.

Comments: accepted to CVPR 2023

arXiv:2303.08005 [pdf, other]

Native Multi-Band Audio Coding within Hyper-Autoencoded Reconstruction Propagation Networks

Authors: Darius Petermann, Inseon Jang, Minje Kim

Abstract: Spectral sub-bands do not portray the same perceptual relevance. In audio coding, it is therefore desirable to have independent control over each of the constituent bands so that bitrate assignment and signal reconstruction can be achieved efficiently. In this work, we present a novel neural audio coding network that natively supports a multi-band coding paradigm. Our model extends the idea of com… ▽ More Spectral sub-bands do not portray the same perceptual relevance. In audio coding, it is therefore desirable to have independent control over each of the constituent bands so that bitrate assignment and signal reconstruction can be achieved efficiently. In this work, we present a novel neural audio coding network that natively supports a multi-band coding paradigm. Our model extends the idea of compressed skip connections in the U-Net-based codec, allowing for independent control over both core and high band-specific reconstructions and bit allocation. Our system reconstructs the full-band signal mainly from the condensed core-band code, therefore exploiting and showcasing its bandwidth extension capabilities to its fullest. Meanwhile, the low-bitrate high-band code helps the high-band reconstruction similarly to MPEG audio codecs' spectral bandwidth replication. MUSHRA tests show that the proposed model not only improves the quality of the core band by explicitly assigning more bits to it but retains a good quality in the high-band as well. △ Less

Submitted 14 March, 2023; originally announced March 2023.

Comments: Accepted to ICASSP 2023. For resources and examples, see https://saige.sice.indiana.edu/research-projects/HARP-Net/

arXiv:2212.08568 [pdf, other]

Biomedical image analysis competitions: The state of current participation practice

Authors: Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Patrick Godau, Veronika Cheplygina, Michal Kozubek, Sharib Ali, Anubha Gupta, Jan Kybic, Alison Noble, Carlos Ortiz de Solórzano, Samiksha Pachade, Caroline Petitjean, Daniel Sage, Donglai Wei, Elizabeth Wilden, Deepak Alapatt, Vincent Andrearczyk, Ujjwal Baid, Spyridon Bakas, Niranjan Balu, Sophia Bano, et al. (331 additional authors not shown)

Abstract: The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.

Submitted 12 September, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

arXiv:2211.08715 [pdf, other]

Conditional variational autoencoder to improve neural audio synthesis for polyphonic music sound

Authors: Seokjin Lee, Minhan Kim, Seunghyeon Shin, Daeho Lee, Inseon Jang, Wootaek Lim

Abstract: Deep generative models for audio synthesis have recently been significantly improved. However, the task of modeling raw waveforms remains a difficult problem, especially for audio waveforms and music signals. Recently, the realtime audio variational autoencoder (RAVE) method was developed for high-quality audio waveform synthesis. The RAVE method is based on the variational autoencoder and utilizes a two-stage training strategy. Unfortunately, the RAVE model is limited in reproducing wide-pitch polyphonic music sound. Therefore, to enhance the reconstruction performance, we adopt the pitch activation data as auxiliary information to the RAVE model. To handle the auxiliary information, we propose an enhanced RAVE model with a conditional variational autoencoder structure and an additional fully-connected layer. To evaluate the proposed structure, we conducted a listening experiment based on multiple stimulus tests with hidden references and an anchor (MUSHRA) with the MAESTRO. The obtained results indicate that the proposed model exhibits a more significant performance and stability improvement than the conventional RAVE model.

Submitted 16 November, 2022; originally announced November 2022.

Comments: 5 pages, 6 figures

arXiv:2210.05150 [pdf, other]

DHRL: A Graph-Based Approach for Long-Horizon and Sparse Hierarchical Reinforcement Learning

Authors: Seungjae Lee, Jigang Kim, Inkyu Jang, H. Jin Kim

Abstract: Hierarchical Reinforcement Learning (HRL) has made notable progress in complex control tasks by leveraging temporal abstraction. However, previous HRL algorithms often suffer from serious data inefficiency as environments get large. The extended components, i.e., goal space and length of episodes, impose a burden on either one or both high-level and low-level policies since both levels share the total horizon of the episode. In this paper, we present a method of Decoupling Horizons Using a Graph in Hierarchical Reinforcement Learning (DHRL) which can alleviate this problem by decoupling the horizons of high-level and low-level policies and bridging the gap between the length of both horizons using a graph. DHRL provides a freely stretchable high-level action interval, which facilitates longer temporal abstraction and faster training in complex tasks. Our method outperforms state-of-the-art HRL algorithms in typical HRL environments. Moreover, DHRL achieves long and complex locomotion and manipulation tasks.

Submitted 19 November, 2022; v1 submitted 11 October, 2022; originally announced October 2022.

Comments: Accepted to NeurIPS 2022 (Selected as Oral)

arXiv:2209.09447 [pdf, other]

Decentralized Deadlock-free Trajectory Planning for Quadrotor Swarm in Obstacle-rich Environments -- Extended version

Authors: Jungwon Park, Inkyu Jang, H. Jin Kim

Abstract: This paper presents a decentralized multi-agent trajectory planning (MATP) algorithm that is guaranteed to generate a safe, deadlock-free trajectory in an obstacle-rich environment under a limited communication range. The proposed algorithm utilizes a grid-based multi-agent path planning (MAPP) algorithm for deadlock resolution, and we introduce the subgoal optimization method to make the agent converge to the waypoint generated from the MAPP without deadlock. In addition, the proposed algorithm ensures the feasibility of the optimization problem and collision avoidance by adopting a linear safe corridor (LSC). We verify that the proposed algorithm does not cause a deadlock in both random forests and dense mazes regardless of communication range, and it outperforms our previous work in flight time and distance. We validate the proposed algorithm through a hardware demonstration with ten quadrotors.

Submitted 1 May, 2023; v1 submitted 19 September, 2022; originally announced September 2022.

Comments: 11 pages, extended version of conference version

arXiv:2205.12429 [pdf, other]

Interaction of a priori Anatomic Knowledge with Self-Supervised Contrastive Learning in Cardiac Magnetic Resonance Imaging

Authors: Makiya Nakashima, Inyeop Jang, Ramesh Basnet, Mitchel Benovoy, W. H. Wilson Tang, Christopher Nguyen, Deborah Kwon, Tae Hyun Hwang, David Chen

Abstract: Training deep learning models on cardiac magnetic resonance imaging (CMR) can be a challenge due to the small amount of expert-generated labels and the inherent complexity of the data source. Self-supervised contrastive learning (SSCL) has recently been shown to boost performance in several medical imaging tasks. However, it is unclear how much the pre-trained representation reflects the primary organ of interest compared to spurious surrounding tissue. In this work, we evaluate the optimal method of incorporating prior knowledge of anatomy into an SSCL training paradigm. Specifically, we evaluate using a segmentation network to explicitly localize the heart in CMR images, followed by SSCL pretraining in multiple diagnostic tasks. We find that using a priori knowledge of anatomy can greatly improve the downstream diagnostic performance. Furthermore, SSCL pre-training with in-domain data generally improved downstream performance and produced more human-like saliency compared to end-to-end training and ImageNet pre-trained networks. However, introducing anatomic knowledge to pre-training generally does not have significant impact.

Submitted 24 May, 2022; originally announced May 2022.

Comments: Under review at Machine Learning in Healthcare

arXiv:2204.03214 [pdf, other]

Transformer-Based Language Models for Software Vulnerability Detection

Authors: Chandra Thapa, Seung Ick Jang, Muhammad Ejaz Ahmed, Seyit Camtepe, Josef Pieprzyk, Surya Nepal

Abstract: The large transformer-based language models demonstrate excellent performance in natural language processing. By considering the transferability of the knowledge gained by these models in one domain to other related domains, and the closeness of natural languages to high-level programming languages, such as C/C++, this work studies how to leverage (large) transformer-based language models in detecting software vulnerabilities and how good these models are for vulnerability detection tasks. In this regard, firstly, a systematic (cohesive) framework that details source code translation, model preparation, and inference is presented. Then, an empirical analysis is performed with software vulnerability datasets with C/C++ source codes having multiple vulnerabilities corresponding to the library function call, pointer usage, array usage, and arithmetic expression. Our empirical results demonstrate the good performance of the language models in vulnerability detection. Moreover, these language models have better performance metrics, such as F1-score, than the contemporary models, namely bidirectional long short-term memory and bidirectional gated recurrent unit. Experimenting with the language models is always challenging due to the requirement of computing resources, platforms, libraries, and dependencies. Thus, this paper also analyses the popular platforms for efficiently fine-tuning these models and presents recommendations for choosing a platform.

Submitted 5 September, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

Comments: 16 pages

arXiv:2202.04823 [pdf, other]

Decreasing Annotation Burden of Pairwise Comparisons with Human-in-the-Loop Sorting: Application in Medical Image Artifact Rating

Authors: Ikbeom Jang, Garrison Danley, Ken Chang, Jayashree Kalpathy-Cramer

Abstract: Ranking by pairwise comparisons has shown improved reliability over ordinal classification. However, as the annotations of pairwise comparisons scale quadratically, this becomes less practical when the dataset is large. We propose a method for reducing the number of pairwise comparisons required to rank by a quantitative metric, demonstrating the effectiveness of the approach in ranking medical images by image quality in this proof of concept study. Using the medical image annotation software that we developed, we actively subsample pairwise comparisons using a sorting algorithm with a human rater in the loop. We find that this method substantially reduces the number of comparisons required for a full ordinal ranking without compromising inter-rater reliability when compared to pairwise comparisons without sorting.

Submitted 9 February, 2022; originally announced February 2022.

Comments: 5 pages, 2 figures, NeurIPS Data-Centric AI Workshop 2021

ACM Class: I.2.1

arXiv:2112.06417 [pdf, other]

LC-FDNet: Learned Lossless Image Compression with Frequency Decomposition Network

Authors: Hochang Rhee, Yeong Il Jang, Seyun Kim, Nam Ik Cho

Abstract: Recent learning-based lossless image compression methods encode an image in the unit of subimages and achieve comparable performances to conventional non-learning algorithms. However, these methods do not consider the performance drop in the high-frequency region, giving equal consideration to the low and high-frequency areas. In this paper, we propose a new lossless image compression method that proceeds the encoding in a coarse-to-fine manner to separate and process low and high-frequency regions differently. We initially compress the low-frequency components and then use them as additional input for encoding the remaining high-frequency region. The low-frequency components act as a strong prior in this case, which leads to improved estimation in the high-frequency area. In addition, we design the frequency decomposition process to be adaptive to color channel, spatial location, and image characteristics. As a result, our method derives an image-specific optimal ratio of low/high-frequency components. Experiments show that the proposed method achieves state-of-the-art performance for benchmark high-resolution datasets.

Submitted 12 December, 2021; originally announced December 2021.

arXiv:2112.01629 [pdf, ps, other]

Engineering AI Tools for Systematic and Scalable Quality Assessment in Magnetic Resonance Imaging

Authors: Yukai Zou, Ikbeom Jang

Abstract: A desire to achieve large medical imaging datasets keeps increasing as machine learning algorithms, parallel computing, and hardware technology evolve. Accordingly, there is a growing demand in pooling data from multiple clinical and academic institutes to enable large-scale clinical or translational research studies. Magnetic resonance imaging (MRI) is a frequently used, non-invasive imaging modality. However, constructing a big MRI data repository has multiple challenges related to privacy, data size, DICOM format, logistics, and non-standardized images. Not only building the data repository is difficult, but using data pooled from the repository is also challenging, due to heterogeneity in image acquisition, reconstruction, and processing pipelines across MRI vendors and imaging sites. This position paper describes challenges in constructing a large MRI data repository and using data downloaded from such data repositories in various aspects. To help address the challenges, the paper proposes introducing a quality assessment pipeline, with considerations and general design principles.

Submitted 2 December, 2021; originally announced December 2021.

Comments: 6 pages, 2 figures, NeurIPS Data-Centric AI Workshop 2021 (Virtual)

ACM Class: I.2.0

arXiv:2110.14565 [pdf, other]

DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations

Authors: Fei Deng, Ingook Jang, Sungjin Ahn

Abstract: Top-performing Model-Based Reinforcement Learning (MBRL) agents, such as Dreamer, learn the world model by reconstructing the image observations. Hence, they often fail to discard task-irrelevant details and struggle to handle visual distractions. To address this issue, previous work has proposed to contrastively learn the world model, but the performance tends to be inferior in the absence of distractions. In this paper, we seek to enhance robustness to distractions for MBRL agents. Specifically, we consider incorporating prototypical representations, which have yielded more accurate and robust results than contrastive approaches in computer vision. However, it remains elusive how prototypical representations can benefit temporal dynamics learning in MBRL, since they treat each image independently without capturing temporal structures. To this end, we propose to learn the prototypes from the recurrent states of the world model, thereby distilling temporal structures from past observations and actions into the prototypes. The resulting model, DreamerPro, successfully combines Dreamer with prototypes, making large performance gains on the DeepMind Control suite both in the standard setting and when there are complex background distractions. Code available at https://github.com/fdeng18/dreamer-pro .

Submitted 27 October, 2021; originally announced October 2021.

arXiv:2107.06484 [pdf, other]

Robust and Recursively Feasible Real-Time Trajectory Planning in Unknown Environments

Authors: Inkyu Jang, Dongjae Lee, Seungjae Lee, H. Jin Kim

Abstract: Motion planners for mobile robots in unknown environments face the challenge of simultaneously maintaining both robustness against unmodeled uncertainties and persistent feasibility of the trajectory-finding problem. That is, while dealing with uncertainties, a motion planner must update its trajectory, adapting to the newly revealed environment in real-time; failing to do so may involve unsafe circumstances. Many existing planning algorithms guarantee these by maintaining the clearance needed to perform an emergency brake, which is itself a robust and persistently feasible maneuver. However, such maneuvers are not applicable for systems in which braking is impossible or risky, such as fixed-wing aircraft. To that end, we propose a real-time robust planner that recursively guarantees persistent feasibility without any need of braking. The planner ensures robustness against bounded uncertainties and persistent feasibility by constructing a loop of sequentially composed funnels, starting from the receding horizon local trajectory's forward reachable set. We implement the proposed algorithm for a robotic car tracking a speed-fixed reference trajectory. The experiment results show that the proposed algorithm can be run at faster than 16 Hz, while successfully keeping the system away from entering any dead-end, to maintain safety and feasibility.

Submitted 14 July, 2021; originally announced July 2021.

Comments: 8 pages, 11 figures, 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) accepted

arXiv:2107.02366 [pdf, other]

Real-Time Motion Planning of a Hydraulic Excavator using Trajectory Optimization and Model Predictive Control

Authors: Dongjae Lee, Inkyu Jang, Jeonghyun Byun, Hoseong Seo, H. Jin Kim

Abstract: Automation of excavation tasks requires real-time trajectory planning satisfying various constraints. To guarantee both constraint feasibility and real-time trajectory re-plannability, we present an integrated framework for real-time optimization-based trajectory planning of a hydraulic excavator. The proposed framework is composed of two main modules: a global planner and a real-time local planner. The global planner computes the entire global trajectory considering excavation volume and energy minimization while the local counterpart tracks the global trajectory in a receding horizon manner, satisfying dynamic feasibility, physical constraints, and disturbance-awareness. We validate the proposed planning algorithm in a simulation environment where two types of operations are conducted in the presence of emulated disturbance from hydraulic friction and soil-bucket interaction: shallow and deep excavation. The optimized global trajectories are obtained in an order of a second, which is tracked by the local planner at faster than 30 Hz. To the best of our knowledge, this work presents the first real-time motion planning framework that satisfies constraints of a hydraulic excavator, such as force/torque, power, cylinder displacement, and flow rate limits.

Submitted 7 July, 2021; v1 submitted 5 July, 2021; originally announced July 2021.

Comments: 8 pages, 8 figures, 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) accepted

arXiv:2107.00353 [pdf, other]

Stability and Robustness Analysis of Plug-Pulling using an Aerial Manipulator

Authors: Jeonghyun Byun, Dongjae Lee, Hoseong Seo, Inkyu Jang, Jeongjun Choi, H. Jin Kim

Abstract: In this paper, an autonomous aerial manipulation task of pulling a plug out of an electric socket is conducted, where maintaining the stability and robustness is challenging due to sudden disappearance of a large interaction force. The abrupt change in the dynamical model before and after the separation of the plug can cause destabilization or mission failure. To accomplish aerial plug-pulling, we employ the concept of hybrid automata to divide the task into three operative modes, i.e., wire-pulling, stabilizing, and free-flight. Also, a strategy for trajectory generation and a design of disturbance-observer-based controllers for each operative mode are presented. Furthermore, the theory of hybrid automata is used to prove the stability and robustness during the mode transition. We validate the proposed trajectory generation and control method by an actual wire-pulling experiment with a multirotor-based aerial manipulator.

Submitted 5 July, 2021; v1 submitted 1 July, 2021; originally announced July 2021.

Comments: to be presented in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 2021

arXiv:2105.11681 [pdf, other]

Deep Neural Networks and End-to-End Learning for Audio Compression

Authors: Daniela N. Rim, Inseon Jang, Heeyoul Choi

Abstract: Recent achievements in end-to-end deep learning have encouraged the exploration of tasks dealing with highly structured data with unified deep network models. Having such models for compressing audio signals has been challenging since it requires discrete representations that are not easy to train with end-to-end backpropagation. In this paper, we present an end-to-end deep learning approach that combines recurrent neural networks (RNNs) within the training strategy of variational autoencoders (VAEs) with a binary representation of the latent space. We apply a reparametrization trick for the Bernoulli distribution for the discrete representations, which allows smooth backpropagation. In addition, our approach allows the separation of the encoder and decoder, which is necessary for compression tasks. To our best knowledge, this is the first end-to-end learning for a single audio compression model with RNNs, and our model achieves a Signal to Distortion Ratio (SDR) of 20.54.

Submitted 13 July, 2021; v1 submitted 25 May, 2021; originally announced May 2021.

arXiv:2002.11326 [pdf, other]

Fail-safe Flight of a Fully-Actuated Quadcopter in a Single Motor Failure

Authors: Seung Jae Lee, Inkyu Jang, H. Jin Kim

Abstract: In this paper, we introduce a new quadcopter fail-safe flight solution that can perform the same four controllable degrees-of-freedom flight as a regular multirotor even when a single thruster fails. The new solution employs a novel multirotor platform known as the T3-Multirotor and utilizes a distinctive strategy of actively controlling the center of gravity position to restore the nominal flight performance. A dedicated control structure is introduced, along with a detailed analysis of the dynamic characteristics of the platform that change during emergency flights. Experimental results are provided to validate the feasibility of the proposed fail-safe flight strategy.

Submitted 26 February, 2020; originally announced February 2020.

Comments: 8 pages, 8 figures

arXiv:1911.01635 [pdf, other]

Emotional speech synthesis with rich and granularized control

Authors: Se-Yun Um, Sangshin Oh, Kyungguen Byun, Inseon Jang, Chunghyun Ahn, Hong-Goo Kang

Abstract: This paper proposes an effective emotion control method for an end-to-end text-to-speech (TTS) system. To flexibly control the distinct characteristic of a target emotion category, it is essential to determine embedding vectors representing the TTS input. We introduce an inter-to-intra emotional distance ratio algorithm to the embedding vectors that can minimize the distance to the target emotion category while maximizing its distance to the other emotion categories. To further enhance the expressiveness of a target speech, we also introduce an effective interpolation technique that enables the intensity of a target emotion to be gradually changed to that of neutral speech. Subjective evaluation results in terms of emotional expressiveness and controllability show the superiority of the proposed algorithm to the conventional methods.

Submitted 5 November, 2019; v1 submitted 5 November, 2019; originally announced November 2019.

Comments: Submitted to ICASSP 2020

arXiv:1903.10064 [pdf, other]

Omnipotent Virtual Giant for Remote Human-Swarm Interaction

Authors: Inmo Jang, Junyan Hu, Farshad Arvin, Joaquin Carrasco, Barry Lennox

Abstract: This paper proposes an intuitive human-swarm interaction framework inspired by our childhood memory in which we interacted with living ants by changing their positions and environments as if we were omnipotent relative to the ants. In virtual reality, analogously, we can be a super-powered virtual giant who can supervise a swarm of mobile robots in a vast and remote environment by flying over or resizing the world and coordinate them by picking and placing a robot or creating virtual walls. This work implements this idea by using Virtual Reality along with Leap Motion, which is then validated by proof-of-concept experiments using real and virtual mobile robots in mixed reality. We conduct a usability analysis to quantify the effectiveness of the overall system as well as the individual interfaces proposed in this work. The results revealed that the proposed method is intuitive and feasible for interaction with swarm robots, but may require appropriate training for the new end-user interface device.

Submitted 1 April, 2019; v1 submitted 24 March, 2019; originally announced March 2019.

Comments: Submitted to IROS2019. The full demo video is available in https://youtu.be/LOIJPFM8YRA

arXiv:1801.05463 [pdf]

doi 10.1007/s00158-018-2101-5

Deep learning for determining a near-optimal topological design without any iteration

Authors: Yonggyun Yu, Taeil Hur, Jaeho Jung, In Gwun Jang

Abstract: In this study, we propose a novel deep learning-based method to predict an optimized structure for a given boundary condition and optimization setting without using any iterative scheme. For this purpose, first, using open-source topology optimization code, datasets of the optimized structures paired with the corresponding information on boundary conditions and optimization settings are generated at low (32 x 32) and high (128 x 128) resolutions. To construct the artificial neural network for the proposed method, a convolutional neural network (CNN)-based encoder and decoder network is trained using the training dataset generated at low resolution. Then, as a two-stage refinement, the conditional generative adversarial network (cGAN) is trained with the optimized structures paired at both low and high resolutions, and is connected to the trained CNN-based encoder and decoder network. The performance evaluation results of the integrated network demonstrate that the proposed method can determine a near-optimal structure in terms of pixel values and compliance with negligible computational time.

Submitted 22 September, 2018; v1 submitted 13 January, 2018; originally announced January 2018.

Comments: 27 pages, 11 figures, The paper is accepted in the Structural and Multidisciplinary Optimization journal, Springer

arXiv:1711.06871 [pdf, other]

doi 10.1109/TRO.2018.2858292

Anonymous Hedonic Game for Task Allocation in a Large-Scale Multiple Agent System

Authors: Inmo Jang, Hyo-Sang Shin, Antonios Tsourdos

Abstract: This paper proposes a novel game-theoretical autonomous decision-making framework to address a task allocation problem for a swarm of multiple agents. We consider cooperation of self-interested agents, and show that our proposed decentralized algorithm guarantees convergence of agents with social inhibition to a Nash stable partition (i.e., social agreement) within polynomial time. The algorithm is simple and executable based on local interactions with neighbor agents under a strongly-connected communication network and even in asynchronous environments. We analytically present a mathematical formulation for computing the lower bound of suboptimality of the solution, and additionally show that 50% of suboptimality can be at least guaranteed if social utilities are non-decreasing functions with respect to the number of co-working agents. The results of numerical experiments confirm that the proposed framework is scalable, fast adaptable against dynamical environments, and robust even in a realistic situation.

Submitted 24 July, 2018; v1 submitted 18 November, 2017; originally announced November 2017.

Comments: Accepted by IEEE Transactions on Robotics (on 22 May 2018)

Journal ref: Published in IEEE Transactions on Robotics, 2018

arXiv:1711.06869 [pdf, other]

doi 10.1007/s11721-018-0160-2

Bio-Inspired Local Information-Based Control for Probabilistic Swarm Distribution Guidance

Authors: Inmo Jang, Hyo-Sang Shin, Antonios Tsourdos

Abstract: This paper addresses a task allocation problem for a large-scale robotic swarm, namely swarm distribution guidance problem. Unlike most of the existing frameworks handling this problem, the proposed framework suggests utilising local information available to generate its time-varying stochastic policies. As each agent requires only local consistency on information with neighbouring agents, rather than the global consistency, the proposed framework offers various advantages, e.g., a shorter timescale for using new information and potential to incorporate an asynchronous decision-making process. We perform theoretical analysis on the properties of the proposed framework. From the analysis, it is proved that the framework can guarantee the convergence to the desired density distribution even using local information while maintaining advantages of global-information-based approaches. The design requirements for these advantages are explicitly listed in this paper. This paper also provides specific examples of how to implement the framework developed. The results of numerical experiments confirm the effectiveness and comparability of the proposed framework, compared with the global-information-based framework.

Submitted 18 November, 2017; originally announced November 2017.

Comments: Submitted to IEEE Transactions on Robotics

Journal ref: Published in Swarm Intelligence, 2018

Showing 1–30 of 30 results for author: Jang, I