Modality-agnostic Domain Generalizable Medical Image Segmentation by Multi-Frequency in Multi-Scale Attention

Authors: Ju-Hyeon Nam, Nur Suriza Syazwany, Su Jung Kim, Sang-Chul Lee

Abstract: Generalizability in deep neural networks plays a pivotal role in medical image segmentation. However, deep learning-based medical image analyses tend to overlook the importance of frequency variance, which is a critical element for achieving a model that is both modality-agnostic and domain-generalizable. Additionally, various models fail to account for the potential information loss that can arise from multi-task learning under deep supervision, a factor that can impair the model's representation ability. To address these challenges, we propose a Modality-agnostic Domain Generalizable Network (MADGNet) for medical image segmentation, which comprises two key components: a Multi-Frequency in Multi-Scale Attention (MFMSA) block and an Ensemble Sub-Decoding Module (E-SDM). The MFMSA block refines the process of spatial feature extraction, particularly in capturing boundary features, by incorporating multi-frequency and multi-scale features, thereby offering informative cues for tissue outline and anatomical structures. Moreover, we propose E-SDM to mitigate information loss in multi-task learning with deep supervision, especially during substantial upsampling from low resolution. We evaluate the segmentation performance of MADGNet across six modalities and fifteen datasets. Through extensive experiments, we demonstrate that MADGNet consistently outperforms state-of-the-art models across various modalities, showcasing superior segmentation performance. This affirms MADGNet as a robust solution for medical image segmentation that excels in diverse imaging scenarios. Our MADGNet code is available on GitHub.

Submitted 10 May, 2024; originally announced May 2024.

Comments: Accepted in Computer Vision and Pattern Recognition (CVPR) 2024

arXiv:2308.10269 [pdf, other]

Domain Reduction Strategy for Non Line of Sight Imaging

Authors: Hyunbo Shim, In Cho, Daekyu Kwon, Seon Joo Kim

Abstract: This paper presents a novel optimization-based method for non-line-of-sight (NLOS) imaging that aims to reconstruct hidden scenes under various setups. Our method is built upon the observation that photons returning from each point in hidden volumes can be independently computed if the interactions between hidden surfaces are trivially ignored. We model the generalized light propagation function to accurately represent the transients as a linear combination of these functions. Moreover, our proposed method includes a domain reduction procedure to exclude empty areas of the hidden volumes from the set of propagation functions, thereby improving computational efficiency of the optimization. We demonstrate the effectiveness of the method in various NLOS scenarios, including non-planar relay wall, sparse scanning patterns, confocal and non-confocal, and surface geometry reconstruction. Experiments conducted on both synthetic and real-world data clearly support the superiority and the efficiency of the proposed method in general NLOS scenarios.

Submitted 20 August, 2023; originally announced August 2023.

arXiv:2301.06244 [pdf, other]

Haptic Transparency and Interaction Force Control for a Lower-Limb Exoskeleton

Authors: Emek Barış Küçüktabak, Yue Wen, Sangjoon J. Kim, Matthew Short, Daniel Ludvig, Levi Hargrove, Eric Perreault, Kevin Lynch, Jose Pons

Abstract: Controlling the interaction forces between a human and an exoskeleton is crucial for providing transparency or adjusting assistance or resistance levels. However, it is an open problem to control the interaction forces of lower-limb exoskeletons designed for unrestricted overground walking. For these types of exoskeletons, it is challenging to implement force/torque sensors at every contact between the user and the exoskeleton for direct force measurement. Moreover, it is important to compensate for the exoskeleton's whole-body gravitational and dynamical forces, especially for heavy lower-limb exoskeletons. Previous works either simplified the dynamic model by treating the legs as independent double pendulums, or they did not close the loop with interaction force feedback. The proposed whole-exoskeleton closed-loop compensation (WECC) method calculates the interaction torques during the complete gait cycle by using whole-body dynamics and joint torque measurements on a hip-knee exoskeleton. Furthermore, it uses a constrained optimization scheme to track desired interaction torques in a closed loop while considering physical and safety constraints. We evaluated the haptic transparency and dynamic interaction torque tracking of WECC control on three subjects. We also compared the performance of WECC with a controller based on a simplified dynamic model and a passive version of the exoskeleton. The WECC controller results in a consistently low absolute interaction torque error during the whole gait cycle for both zero and nonzero desired interaction torques. In contrast, the simplified controller yields poor performance in tracking desired interaction torques during the stance phase.

Submitted 22 January, 2024; v1 submitted 15 January, 2023; originally announced January 2023.

Comments: 19 pages, 13 figures. Accepted for publication in the IEEE Transactions on Robotics (T-RO)

arXiv:2211.09385 [pdf, other]

ComMU: Dataset for Combinatorial Music Generation

Authors: Lee Hyun, Taehyun Kim, Hyolim Kang, Minjoo Ki, Hyeonchan Hwang, Kwanho Park, Sharang Han, Seon Joo Kim

Abstract: Commercial adoption of automatic music composition requires the capability of generating diverse and high-quality music suitable for the desired context (e.g., music for romantic movies, action games, restaurants, etc.). In this paper, we introduce combinatorial music generation, a new task to create varying background music based on given conditions. Combinatorial music generation creates short samples of music with rich musical metadata, and combines them to produce a complete piece of music. In addition, we introduce ComMU, the first symbolic music dataset consisting of short music samples and their corresponding 12 musical metadata for combinatorial music generation. Notable properties of ComMU are that (1) the dataset is manually constructed by professional composers with an objective guideline that induces regularity, and (2) it has 12 musical metadata that embrace composers' intentions. Our results show that we can generate diverse high-quality music only with metadata, and that our unique metadata such as track-role and extended chord quality improve the capacity of the automatic composition. We highly recommend watching our video before reading the paper (https://pozalabs.github.io/ComMU).

Submitted 17 November, 2022; originally announced November 2022.

Comments: 19 pages, 12 figures

arXiv:2102.05790 [pdf]

doi 10.1038/s41565-021-00967-4

Nonlocal metasurfaces for spectrally decoupled wavefront manipulation and eye tracking

Authors: Jung-Hwan Song, Jorik van de Groep, Soo Jin Kim, Mark L. Brongersma

Abstract: Metasurface-based optical elements typically manipulate light waves by imparting space-variant changes in the amplitude and phase with a dense array of scattering nanostructures. The highly-localized and low optical-quality-factor (Q) modes of nanostructures are beneficial for wavefront-shaping as they afford quasi-local control over the electromagnetic fields. However, many emerging imaging, sensing, communication, display, and non-linear optics applications instead require flat, high-Q optical elements that provide notable energy storage and a much higher degree of spectral control over the wavefront. Here, we demonstrate high-Q, nonlocal metasurfaces with atomically-thin metasurface elements that offer notably enhanced light-matter interaction and fully-decoupled optical functions at different wavelengths. We illustrate a possible use of such a flat optic in eye tracking for eye-wear. Here, a metasurface patterned on a regular pair of eye-glasses provides an unperturbed view of the world across the visible spectrum and redirects near-infrared light to a camera to allow imaging of the eye.

Submitted 10 February, 2021; originally announced February 2021.

arXiv:2009.05210 [pdf]

A 6.3-Nanowatt-per-Channel 96-Channel Neural Spike Processor for a Movement-Intention-Decoding Brain-Computer-Interface Implant

Authors: Zhewei Jiang, Jiangyi Li, Pavan K. Chundi, Sung Justin Kim, Minhao Yang, Joonseong Kang, Seungchul Jung, Sang Joon Kim, Mingoo Seok

Abstract: This paper presents microwatt end-to-end neural signal processing hardware for deployment-stage real-time upper-limb movement intent decoding. This module features intercellular spike detection, sorting, and decoding operations for a 96-channel prosthetic implant. We design the algorithms for those operations to achieve minimal computational complexity while matching or advancing the accuracy of state-of-the-art Brain-Computer-Interface sorting and movement decoding. Based on those algorithms, we devise the architecture of the neural signal processing hardware with a focus on hardware reuse and event-driven operation. The design achieves among the highest levels of integration, reducing the wireless data rate by more than four orders of magnitude. The chip prototype in 180-nm high-VTH achieves the lowest power dissipation of 0.61 uW for 96 channels, 21X lower than the prior art at comparable or better accuracy, even with integration of kinematic state estimation computation.

Submitted 10 September, 2020; originally announced September 2020.

arXiv:2005.01056 [pdf, other]

NTIRE 2020 Challenge on Perceptual Extreme Super-Resolution: Methods and Results

Authors: Kai Zhang, Shuhang Gu, Radu Timofte, Taizhang Shang, Qiuju Dai, Shengchen Zhu, Tong Yang, Yandong Guo, Younghyun Jo, Sejong Yang, Seon Joo Kim, Lin Zha, Jiande Jiang, Xinbo Gao, Wen Lu, Jing Liu, Kwangjin Yoon, Taegyun Jeon, Kazutoshi Akita, Takeru Ooba, Norimichi Ukita, Zhipeng Luo, Yuehan Yao, Zhenyu Xu, Dongliang He, et al. (38 additional authors not shown)

Abstract: This paper reviews the NTIRE 2020 challenge on perceptual extreme super-resolution with a focus on proposed solutions and results. The challenge task was to super-resolve an input image with a magnification factor of 16 based on a set of prior examples of low and corresponding high resolution images. The goal is to obtain a network design capable of producing high resolution results with the best perceptual quality and similar to the ground truth. The track had 280 registered participants, and 19 teams submitted the final results. They gauge the state-of-the-art in single image super-resolution.

Submitted 3 May, 2020; originally announced May 2020.

Comments: CVPRW 2020

arXiv:2003.09171 [pdf, other]

DMV: Visual Object Tracking via Part-level Dense Memory and Voting-based Retrieval

Authors: Gunhee Nam, Seoung Wug Oh, Joon-Young Lee, Seon Joo Kim

Abstract: We propose a novel memory-based tracker via part-level dense memory and voting-based retrieval, called DMV. Since deep learning techniques have been introduced to the tracking field, Siamese trackers have attracted many researchers due to the balance between speed and accuracy. However, most of them are based on a single template matching, which limits the performance as it restricts the accessible information to the initial target features. In this paper, we relieve this limitation by maintaining an external memory that saves the tracking record. Part-level retrieval from the memory also liberates the information from the template and allows our tracker to better handle challenges such as appearance changes and occlusions. By updating the memory during tracking, the representative power for the target object can be enhanced without online learning. We also propose a novel voting mechanism for the memory reading to filter out unreliable information in the memory. We comprehensively evaluate our tracker on OTB-100, TrackingNet, GOT-10k, LaSOT, and UAV123, which show that our method yields comparable results to the state-of-the-art methods.

Submitted 20 March, 2020; originally announced March 2020.

Comments: 19 pages, 9 figures

arXiv:2003.09124 [pdf, other]

Learning the Loss Functions in a Discriminative Space for Video Restoration

Authors: Younghyun Jo, Jaeyeon Kang, Seoung Wug Oh, Seonghyeon Nam, Peter Vajda, Seon Joo Kim

Abstract: With more advanced deep network architectures and learning schemes such as GANs, the performance of video restoration algorithms has greatly improved recently. Meanwhile, the loss functions for optimizing deep neural networks remain relatively unchanged. To this end, we propose a new framework for building effective loss functions by learning a discriminative space specific to a video restoration task. Our framework is similar to GANs in that we iteratively train two networks: a generator and a loss network. The generator learns to restore videos in a supervised fashion, by following ground truth features through the feature matching in the discriminative space learned by the loss network. In addition, we introduce a new relation loss in order to maintain the temporal consistency in output videos. Experiments on video super-resolution and deblurring show that our method generates visually more pleasing videos with better quantitative perceptual metric values than the other state-of-the-art methods.

Submitted 20 March, 2020; originally announced March 2020.

Comments: 24 pages

Showing 1–9 of 9 results for author: Kim, S J