Search | arXiv e-print repository

arXiv:2406.14308 [pdf, other]

FIESTA: Fourier-Based Semantic Augmentation with Uncertainty Guidance for Enhanced Domain Generalizability in Medical Image Segmentation

Authors: Kwanseok Oh, Eun** Jeon, Da-Woon Heo, Yooseung Shin, Heung-Il Suk

Abstract: Single-source domain generalization (SDG) in medical image segmentation (MIS) aims to generalize a model using data from only one source domain to segment data from an unseen target domain. Despite substantial advances in SDG with data augmentation, existing methods often fail to fully consider the details and uncertain areas prevalent in MIS, leading to mis-segmentation. This paper proposes a Fou… ▽ More Single-source domain generalization (SDG) in medical image segmentation (MIS) aims to generalize a model using data from only one source domain to segment data from an unseen target domain. Despite substantial advances in SDG with data augmentation, existing methods often fail to fully consider the details and uncertain areas prevalent in MIS, leading to mis-segmentation. This paper proposes a Fourier-based semantic augmentation method called FIESTA using uncertainty guidance to enhance the fundamental goals of MIS in an SDG context by manipulating the amplitude and phase components in the frequency domain. The proposed Fourier augmentative transformer addresses semantic amplitude modulation based on meaningful angular points to induce pertinent variations and harnesses the phase spectrum to ensure structural coherence. Moreover, FIESTA employs epistemic uncertainty to fine-tune the augmentation process, improving the ability of the model to adapt to diverse augmented data and concentrate on areas with higher ambiguity. Extensive experiments across three cross-domain scenarios demonstrate that FIESTA surpasses recent state-of-the-art SDG approaches in segmentation performance and significantly contributes to boosting the applicability of the model in medical imaging modalities. △ Less

Submitted 20 June, 2024; originally announced June 2024.

Comments: 40 pages, 7 figures, 5 tables

arXiv:2403.13941 [pdf, ps, other]

Sensory Glove-Based Surgical Robot User Interface

Authors: Leonardo Borgioli, Ki-Hwan Oh, Alberto Mangano, Alvaro Ducas, Luciano Ambrosini, Federico Pinto, Paula A Lopez, Jessica Cassiani, Milos Zefran, Liaohai Chen, Pier Cristoforo Giulianotti

Abstract: Robotic surgery has reached a high level of maturity and has become an integral part of standard surgical care. However, existing surgeon consoles are bulky and take up valuable space in the operating room, present challenges for surgical team coordination, and their proprietary nature makes it difficult to take advantage of recent technological advances, especially in virtual and augmented realit… ▽ More Robotic surgery has reached a high level of maturity and has become an integral part of standard surgical care. However, existing surgeon consoles are bulky and take up valuable space in the operating room, present challenges for surgical team coordination, and their proprietary nature makes it difficult to take advantage of recent technological advances, especially in virtual and augmented reality. One potential area for further improvement is the integration of modern sensory gloves into robotic platforms, allowing surgeons to control robotic arms directly with their hand movements intuitively. We propose one such system that combines an HTC Vive tracker, a Manus Meta Prime 3 XR sensory glove, and God Vision wireless smart glasses. The system controls one arm of a da Vinci surgical robot. In addition to moving the arm, the surgeon can use fingers to control the end-effector of the surgical instrument. Hand gestures are used to implement clutching and similar functions. In particular, we introduce clutching of the instrument orientation, a functionality not available in the da Vinci system. The vibrotactile elements of the glove are used to provide feedback to the user when gesture commands are invoked. A preliminary evaluation of the system shows that it has excellent tracking accuracy and allows surgeons to efficiently perform common surgical training tasks with minimal practice with the new interface; this suggests that the interface is highly intuitive. The proposed system is inexpensive, allows rapid prototy**, and opens opportunities for further innovations in the design of surgical robot interfaces. △ Less

Submitted 20 March, 2024; originally announced March 2024.

Comments: 6 pages, 5 figures, 7 tables, submitted to International Conference on Intelligent Robots and Systems (IROS)2024

arXiv:2402.19237 [pdf, ps, other]

Context-based Interpretable Spatio-Temporal Graph Convolutional Network for Human Motion Forecasting

Authors: Edgar Medina, Leyong Loh, Namrata Gurung, Kyung Hun Oh, Niels Heller

Abstract: Human motion prediction is still an open problem extremely important for autonomous driving and safety applications. Due to the complex spatiotemporal relation of motion sequences, this remains a challenging problem not only for movement prediction but also to perform a preliminary interpretation of the joint connections. In this work, we present a Context-based Interpretable Spatio-Temporal Graph… ▽ More Human motion prediction is still an open problem extremely important for autonomous driving and safety applications. Due to the complex spatiotemporal relation of motion sequences, this remains a challenging problem not only for movement prediction but also to perform a preliminary interpretation of the joint connections. In this work, we present a Context-based Interpretable Spatio-Temporal Graph Convolutional Network (CIST-GCN), as an efficient 3D human pose forecasting model based on GCNs that encompasses specific layers, aiding model interpretability and providing information that might be useful when analyzing motion distribution and body behavior. Our architecture extracts meaningful information from pose sequences, aggregates displacements and accelerations into the input model, and finally predicts the output displacements. Extensive experiments on Human 3.6M, AMASS, 3DPW, and ExPI datasets demonstrate that CIST-GCN outperforms previous methods in human motion prediction and robustness. Since the idea of enhancing interpretability for motion prediction has its merits, we showcase experiments towards it and provide preliminary evaluations of such insights here. available code: https://github.com/QualityMinds/cistgcn △ Less

Submitted 21 February, 2024; originally announced February 2024.

Comments: 10 pages, 6 figures

arXiv:2402.09025 [pdf, other]

SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks

Authors: Jiwon Song, Kyungseok Oh, Taesu Kim, Hyungjun Kim, Yulhwa Kim, Jae-Joon Kim

Abstract: Large language models (LLMs) have proven to be highly effective across various natural language processing tasks. However, their large number of parameters poses significant challenges for practical deployment. Pruning, a technique aimed at reducing the size and complexity of LLMs, offers a potential solution by removing redundant components from the network. Despite the promise of pruning, existi… ▽ More Large language models (LLMs) have proven to be highly effective across various natural language processing tasks. However, their large number of parameters poses significant challenges for practical deployment. Pruning, a technique aimed at reducing the size and complexity of LLMs, offers a potential solution by removing redundant components from the network. Despite the promise of pruning, existing methods often struggle to achieve substantial end-to-end LLM inference speedup. In this paper, we introduce SLEB, a novel approach designed to streamline LLMs by eliminating redundant transformer blocks. We choose the transformer block as the fundamental unit for pruning, because LLMs exhibit block-level redundancy with high similarity between the outputs of neighboring blocks. This choice allows us to effectively enhance the processing speed of LLMs. Our experimental results demonstrate that SLEB outperforms previous LLM pruning methods in accelerating LLM inference while also maintaining superior perplexity and accuracy, making SLEB as a promising technique for enhancing the efficiency of LLMs. The code is available at: https://github.com/jiwonsong-dev/SLEB. △ Less

Submitted 11 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

arXiv:2402.08409 [pdf, other]

Transferring Ultrahigh-Field Representations for Intensity-Guided Brain Segmentation of Low-Field Magnetic Resonance Imaging

Authors: Kwanseok Oh, Jieun Lee, Da-Woon Heo, Dinggang Shen, Heung-Il Suk

Abstract: Ultrahigh-field (UHF) magnetic resonance imaging (MRI), i.e., 7T MRI, provides superior anatomical details of internal brain structures owing to its enhanced signal-to-noise ratio and susceptibility-induced contrast. However, the widespread use of 7T MRI is limited by its high cost and lower accessibility compared to low-field (LF) MRI. This study proposes a deep-learning framework that systematic… ▽ More Ultrahigh-field (UHF) magnetic resonance imaging (MRI), i.e., 7T MRI, provides superior anatomical details of internal brain structures owing to its enhanced signal-to-noise ratio and susceptibility-induced contrast. However, the widespread use of 7T MRI is limited by its high cost and lower accessibility compared to low-field (LF) MRI. This study proposes a deep-learning framework that systematically fuses the input LF magnetic resonance feature representations with the inferred 7T-like feature representations for brain image segmentation tasks in a 7T-absent environment. Specifically, our adaptive fusion module aggregates 7T-like features derived from the LF image by a pre-trained network and then refines them to be effectively assimilable UHF guidance into LF image features. Using intensity-guided features obtained from such aggregation and assimilation, segmentation models can recognize subtle structural representations that are usually difficult to recognize when relying only on LF features. Beyond such advantages, this strategy can seamlessly be utilized by modulating the contrast of LF features in alignment with UHF guidance, even when employing arbitrary segmentation models. Exhaustive experiments demonstrated that the proposed method significantly outperformed all baseline models on both brain tissue and whole-brain segmentation tasks; further, it exhibited remarkable adaptability and scalability by successfully integrating diverse segmentation models and tasks. These improvements were not only quantifiable but also visible in the superlative visual quality of segmentation masks. △ Less

Submitted 13 February, 2024; originally announced February 2024.

Comments: 32 pages, 9 figures, and 5 tables

arXiv:2312.01183 [pdf, other]

Comprehensive Robotic Cholecystectomy Dataset (CRCD): Integrating Kinematics, Pedal Signals, and Endoscopic Videos

Authors: Ki-Hwan Oh, Leonardo Borgioli, Alberto Mangano, Valentina Valle, Marco Di Pangrazio, Francesco Toti, Gioia Pozza, Luciano Ambrosini, Alvaro Ducas, Milos Zefran, Liaohai Chen, Pier Cristoforo Giulianotti

Abstract: In recent years, the potential applications of machine learning to Minimally Invasive Surgery (MIS) have spurred interest in data sets that can be used to develop data-driven tools. This paper introduces a novel dataset recorded during ex vivo pseudo-cholecystectomy procedures on pig livers, utilizing the da Vinci Research Kit (dVRK). Unlike current datasets, ours bridges a critical gap by offerin… ▽ More In recent years, the potential applications of machine learning to Minimally Invasive Surgery (MIS) have spurred interest in data sets that can be used to develop data-driven tools. This paper introduces a novel dataset recorded during ex vivo pseudo-cholecystectomy procedures on pig livers, utilizing the da Vinci Research Kit (dVRK). Unlike current datasets, ours bridges a critical gap by offering not only full kinematic data but also capturing all pedal inputs used during the procedure and providing a time-stamped record of the endoscope's movements. Contributed by seven surgeons, this data set introduces a new dimension to surgical robotics research, allowing the creation of advanced models for automating console functionalities. Our work addresses the existing limitation of incomplete recordings and imprecise kinematic data, common in other datasets. By introducing two models, dedicated to predicting clutch usage and camera activation, we highlight the dataset's potential for advancing automation in surgical robotics. The comparison of methodologies and time windows provides insights into the models' boundaries and limitations. △ Less

Submitted 6 April, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

Comments: 6 pages, 8 figures, 5 tables. Accepted for presentation at the 2024 International Symposium on Medical Robotics

arXiv:2311.04250 [pdf, other]

Unifying Structure and Language Semantic for Efficient Contrastive Knowledge Graph Completion with Structured Entity Anchors

Authors: Sang-Hyun Je, Wontae Choi, Kwang** Oh

Abstract: The goal of knowledge graph completion (KGC) is to predict missing links in a KG using trained facts that are already known. In recent, pre-trained language model (PLM) based methods that utilize both textual and structural information are emerging, but their performances lag behind state-of-the-art (SOTA) structure-based methods or some methods lose their inductive inference capabilities in the p… ▽ More The goal of knowledge graph completion (KGC) is to predict missing links in a KG using trained facts that are already known. In recent, pre-trained language model (PLM) based methods that utilize both textual and structural information are emerging, but their performances lag behind state-of-the-art (SOTA) structure-based methods or some methods lose their inductive inference capabilities in the process of fusing structure embedding to text encoder. In this paper, we propose a novel method to effectively unify structure information and language semantics without losing the power of inductive reasoning. We adopt entity anchors and these anchors and textual description of KG elements are fed together into the PLM-based encoder to learn unified representations. In addition, the proposed method utilizes additional random negative samples which can be reused in the each mini-batch during contrastive learning to learn a generalized entity representations. We verify the effectiveness of the our proposed method through various experiments and analysis. The experimental results on standard benchmark widely used in link prediction task show that the proposed model outperforms existing the SOTA KGC models. Especially, our method show the largest performance improvement on FB15K-237, which is competitive to the SOTA of structure-based KGC methods. △ Less

Submitted 7 November, 2023; originally announced November 2023.

arXiv:2310.09669 [pdf, other]

A Framework For Automated Dissection Along Tissue Boundary

Authors: Ki-Hwan Oh, Leonardo Borgioli, Milos Zefran, Liaohai Chen, Pier Cristoforo Giulianotti

Abstract: Robotic surgery promises enhanced precision and adaptability over traditional surgical methods. It also offers the possibility of automating surgical interventions, resulting in reduced stress on the surgeon, better surgical outcomes, and lower costs. Cholecystectomy, the removal of the gallbladder, serves as an ideal model procedure for automation due to its distinct and well-contrasted anatomica… ▽ More Robotic surgery promises enhanced precision and adaptability over traditional surgical methods. It also offers the possibility of automating surgical interventions, resulting in reduced stress on the surgeon, better surgical outcomes, and lower costs. Cholecystectomy, the removal of the gallbladder, serves as an ideal model procedure for automation due to its distinct and well-contrasted anatomical features between the gallbladder and liver, along with standardized surgical maneuvers. Dissection is a frequently used subtask in cholecystectomy where the surgeon delivers the energy on the hook to detach the gallbladder from the liver. Hence, dissection along tissue boundaries is a good candidate for surgical automation. For the da Vinci surgical robot to perform the same procedure as a surgeon automatically, it needs to have the ability to (1) recognize and distinguish between the two different tissues (e.g. the liver and the gallbladder), (2) understand where the boundary between the two tissues is located in the 3D workspace, (3) locate the instrument tip relative to the boundary in the 3D space using visual feedback, and (4) move the instrument along the boundary. This paper presents a novel framework that addresses these challenges through AI-assisted image processing and vision-based robot control. We also present the ex-vivo evaluation of the automated procedure on chicken and pork liver specimens that demonstrates the effectiveness of the proposed framework. △ Less

Submitted 27 February, 2024; v1 submitted 14 October, 2023; originally announced October 2023.

Comments: 7 pages, 7 figures, 7 tables, submitted to 2024 International Conference on Biomedical Robotics and Biomechatronics

arXiv:2310.08598 [pdf, other]

Domain Generalization for Medical Image Analysis: A Survey

Authors: Jee Seok Yoon, Kwanseok Oh, Yooseung Shin, Maciej A. Mazurowski, Heung-Il Suk

Abstract: Medical image analysis (MedIA) has become an essential tool in medicine and healthcare, aiding in disease diagnosis, prognosis, and treatment planning, and recent successes in deep learning (DL) have made significant contributions to its advances. However, deploying DL models for MedIA in real-world situations remains challenging due to their failure to generalize across the distributional gap bet… ▽ More Medical image analysis (MedIA) has become an essential tool in medicine and healthcare, aiding in disease diagnosis, prognosis, and treatment planning, and recent successes in deep learning (DL) have made significant contributions to its advances. However, deploying DL models for MedIA in real-world situations remains challenging due to their failure to generalize across the distributional gap between training and testing samples - a problem known as domain shift. Researchers have dedicated their efforts to develo** various DL methods to adapt and perform robustly on unknown and out-of-distribution data distributions. This paper comprehensively reviews domain generalization studies specifically tailored for MedIA. We provide a holistic view of how domain generalization techniques interact within the broader MedIA system, going beyond methodologies to consider the operational implications on the entire MedIA workflow. Specifically, we categorize domain generalization methods into data-level, feature-level, model-level, and analysis-level methods. We show how those methods can be used in various stages of the MedIA workflow with DL equipped from data acquisition to model prediction and analysis. Furthermore, we critically analyze the strengths and weaknesses of various methods, unveiling future research opportunities. △ Less

Submitted 15 February, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

arXiv:2310.03457 [pdf, other]

A Quantitatively Interpretable Model for Alzheimer's Disease Prediction Using Deep Counterfactuals

Authors: Kwanseok Oh, Da-Woon Heo, Ahmad Wisnu Mulyadi, Wonsik Jung, Eunsong Kang, Kun Ho Lee, Heung-Il Suk

Abstract: Deep learning (DL) for predicting Alzheimer's disease (AD) has provided timely intervention in disease progression yet still demands attentive interpretability to explain how their DL models make definitive decisions. Recently, counterfactual reasoning has gained increasing attention in medical research because of its ability to provide a refined visual explanatory map. However, such visual explan… ▽ More Deep learning (DL) for predicting Alzheimer's disease (AD) has provided timely intervention in disease progression yet still demands attentive interpretability to explain how their DL models make definitive decisions. Recently, counterfactual reasoning has gained increasing attention in medical research because of its ability to provide a refined visual explanatory map. However, such visual explanatory maps based on visual inspection alone are insufficient unless we intuitively demonstrate their medical or neuroscientific validity via quantitative features. In this study, we synthesize the counterfactual-labeled structural MRIs using our proposed framework and transform it into a gray matter density map to measure its volumetric changes over the parcellated region of interest (ROI). We also devised a lightweight linear classifier to boost the effectiveness of constructed ROIs, promoted quantitative interpretation, and achieved comparable predictive performance to DL methods. Throughout this, our framework produces an ``AD-relatedness index'' for each ROI and offers an intuitive understanding of brain status for an individual patient and across patient groups with respect to AD progression. △ Less

Submitted 5 October, 2023; originally announced October 2023.

Comments: 15 pages, 5 figures, 4 tables

arXiv:2308.09177 [pdf, other]

doi 10.1145/3577190.3614174

Recognizing Intent in Collaborative Manipulation

Authors: Zhanibek Rysbek, Ki Hwan Oh, Milos Zefran

Abstract: Collaborative manipulation is inherently multimodal, with haptic communication playing a central role. When performed by humans, it involves back-and-forth force exchanges between the participants through which they resolve possible conflicts and determine their roles. Much of the existing work on collaborative human-robot manipulation assumes that the robot follows the human. But for a robot to m… ▽ More Collaborative manipulation is inherently multimodal, with haptic communication playing a central role. When performed by humans, it involves back-and-forth force exchanges between the participants through which they resolve possible conflicts and determine their roles. Much of the existing work on collaborative human-robot manipulation assumes that the robot follows the human. But for a robot to match the performance of a human partner it needs to be able to take initiative and lead when appropriate. To achieve such human-like performance, the robot needs to have the ability to (1) determine the intent of the human, (2) clearly express its own intent, and (3) choose its actions so that the dyad reaches consensus. This work proposes a framework for recognizing human intent in collaborative manipulation tasks using force exchanges. Grounded in a dataset collected during a human study, we introduce a set of features that can be computed from the measured signals and report the results of a classifier trained on our collected human-human interaction data. Two metrics are used to evaluate the intent recognizer: overall accuracy and the ability to correctly identify transitions. The proposed recognizer shows robustness against the variations in the partner's actions and the confounding effects due to the variability in grasp forces and dynamic effects of walking. The results demonstrate that the proposed recognizer is well-suited for implementation in a physical interaction control scheme. △ Less

Submitted 17 August, 2023; originally announced August 2023.

arXiv:2307.13677 [pdf, other]

doi 10.1145/3590140.3592850

Smartpick: Workload Prediction for Serverless-enabled Scalable Data Analytics Systems

Authors: Anshuman Das Mohapatra, Kwangsung Oh

Abstract: Many data analytic systems have adopted a newly emerging compute resource, serverless (SL), to handle data analytics queries in a timely and cost-efficient manner, i.e., serverless data analytics. While these systems can start processing queries quickly thanks to the agility and scalability of SL, they may encounter performance- and cost-bottlenecks based on workloads due to SL's worse performance… ▽ More Many data analytic systems have adopted a newly emerging compute resource, serverless (SL), to handle data analytics queries in a timely and cost-efficient manner, i.e., serverless data analytics. While these systems can start processing queries quickly thanks to the agility and scalability of SL, they may encounter performance- and cost-bottlenecks based on workloads due to SL's worse performance and more expensive cost than traditional compute resources, e.g., virtual machine (VM). In this project, we introduce Smartpick, a SL-enabled scalable data analytics system that exploits SL and VM together to realize composite benefits, i.e., agility from SL and better performance with reduced cost from VM. Smartpick uses a machine learning prediction scheme, decision-tree based Random Forest with Bayesian Optimizer, to determine SL and VM configurations, i.e., how many SL and VM instances for queries, that meet cost-performance goals. Smartpick offers a knob for applications to allow them to explore a richer cost-performance tradeoff space opened by exploiting SL and VM together. To maximize the benefits of SL, Smartpick supports a simple but strong mechanism, called relay-instances. Smartpick also supports event-driven prediction model retraining to deal with workload dynamics. A Smartpick prototype was implemented on Spark and deployed on live test-beds, Amazon AWS and Google Cloud Platform. Evaluation results indicate 97.05% and 83.49% prediction accuracies respectively with up to 50% cost reduction as opposed to the baselines. The results also confirm that Smartpick allows data analytics applications to navigate the richer cost-performance tradeoff space efficiently and to handle workload dynamics effectively and automatically. △ Less

Submitted 25 July, 2023; originally announced July 2023.

Comments: This paper is accepted for publication at the 24th ACM/IFIP International Middleware Conference (Middleware '23)

arXiv:2307.10204 [pdf, ps, other]

An IPW-based Unbiased Ranking Metric in Two-sided Markets

Authors: Keisho Oh, Naoki Nishimura, Minje Sung, Ken Kobayashi, Kazuhide Nakata

Abstract: In modern recommendation systems, unbiased learning-to-rank (LTR) is crucial for prioritizing items from biased implicit user feedback, such as click data. Several techniques, such as Inverse Propensity Weighting (IPW), have been proposed for single-sided markets. However, less attention has been paid to two-sided markets, such as job platforms or dating services, where successful conversions requ… ▽ More In modern recommendation systems, unbiased learning-to-rank (LTR) is crucial for prioritizing items from biased implicit user feedback, such as click data. Several techniques, such as Inverse Propensity Weighting (IPW), have been proposed for single-sided markets. However, less attention has been paid to two-sided markets, such as job platforms or dating services, where successful conversions require matching preferences from both users. This paper addresses the complex interaction of biases between users in two-sided markets and proposes a tailored LTR approach. We first present a formulation of feedback mechanisms in two-sided matching platforms and point out that their implicit feedback may include position bias from both user groups. On the basis of this observation, we extend the IPW estimator and propose a new estimator, named two-sided IPW, to address the position bases in two-sided markets. We prove that the proposed estimator satisfies the unbiasedness for the ground-truth ranking metric. We conducted numerical experiments on real-world two-sided platforms and demonstrated the effectiveness of our proposed method in terms of both precision and robustness. Our experiments showed that our method outperformed baselines especially when handling rare items, which are less frequently observed in the training data. △ Less

Submitted 13 July, 2023; originally announced July 2023.

arXiv:2304.12288 [pdf, other]

Robots Taking Initiative in Collaborative Object Manipulation: Lessons from Physical Human-Human Interaction

Authors: Zhanibek Rysbek, Ki Hwan Oh, Afagh Mehri Shervedani, Timotej Klemencic, Milos Zefran, Barbara Di Eugenio

Abstract: Physical Human-Human Interaction (pHHI) involves the use of multiple sensory modalities. Studies of communication through spoken utterances and gestures are well established, but communication through force signals is not well understood. In this paper, we focus on investigating the mechanisms employed by humans during the negotiation through force signals, and how the robot can communicate task g… ▽ More Physical Human-Human Interaction (pHHI) involves the use of multiple sensory modalities. Studies of communication through spoken utterances and gestures are well established, but communication through force signals is not well understood. In this paper, we focus on investigating the mechanisms employed by humans during the negotiation through force signals, and how the robot can communicate task goals, comprehend human intent, and take the lead as needed. To achieve this, we formulate a task that requires active force communication and propose a taxonomy that extends existing literature. Also, we conducted a study to observe how humans behave during collaborative manipulation tasks. An important contribution of this work is the novel features based on force-kinematic signals that demonstrate predictive power to recognize symbolic human intent. Further, we show the feasibility of develo** a real-time intent classifier based on the novel features and speculate the role it plays in high-level robot controllers for physical Human-Robot Interaction (pHRI). This work provides important steps to achieve more human-like fluid interaction in physical co-manipulation tasks that are applicable and not limited to humanoid, assistive robots, and human-in-the-loop automation. △ Less

Submitted 29 July, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

arXiv:2304.08878 [pdf, other]

Deep Collective Knowledge Distillation

Authors: Jihyeon Seo, Kyusam Oh, Chanho Min, Yongkeun Yun, Sungwoo Cho

Abstract: Many existing studies on knowledge distillation have focused on methods in which a student model mimics a teacher model well. Simply imitating the teacher's knowledge, however, is not sufficient for the student to surpass that of the teacher. We explore a method to harness the knowledge of other students to complement the knowledge of the teacher. We propose deep collective knowledge distill… ▽ More Many existing studies on knowledge distillation have focused on methods in which a student model mimics a teacher model well. Simply imitating the teacher's knowledge, however, is not sufficient for the student to surpass that of the teacher. We explore a method to harness the knowledge of other students to complement the knowledge of the teacher. We propose deep collective knowledge distillation for model compression, called DCKD, which is a method for training student models with rich information to acquire knowledge from not only their teacher model but also other student models. The knowledge collected from several student models consists of a wealth of information about the correlation between classes. Our DCKD considers how to increase the correlation knowledge of classes during training. Our novel method enables us to create better performing student models for collecting knowledge. This simple yet powerful method achieves state-of-the-art performances in many experiments. For example, for ImageNet, ResNet18 trained with DCKD achieves 72.27\%, which outperforms the pretrained ResNet18 by 2.52\%. For CIFAR-100, the student model of ShuffleNetV1 with DCKD achieves 6.55\% higher top-1 accuracy than the pretrained ShuffleNetV1. △ Less

Submitted 18 April, 2023; originally announced April 2023.

arXiv:2212.10425 [pdf, other]

Evaluating Multimodal Interaction of Robots Assisting Older Adults

Authors: Afagh Mehri Shervedani, Ki-Hwan Oh, Bahareh Abbasi, Natawut Monaikul, Zhanibek Rysbek, Barbara Di Eugenio, Milos Zefran

Abstract: We outline our work on evaluating robots that assist older adults by engaging with them through multiple modalities that include physical interaction. Our thesis is that to increase the effectiveness of assistive robots: 1) robots need to understand and effect multimodal actions, 2) robots should not only react to the human, they need to take the initiative and lead the task when it is necessary.… ▽ More We outline our work on evaluating robots that assist older adults by engaging with them through multiple modalities that include physical interaction. Our thesis is that to increase the effectiveness of assistive robots: 1) robots need to understand and effect multimodal actions, 2) robots should not only react to the human, they need to take the initiative and lead the task when it is necessary. We start by briefly introducing our proposed framework for multimodal interaction and then describe two different experiments with the actual robots. In the first experiment, a Baxter robot helps a human find and locate an object using the Multimodal Interaction Manager (MIM) framework. In the second experiment, a NAO robot is used in the same task, however, the roles of the robot and the human are reversed. We discuss the evaluation methods that were used in these experiments, including different metrics employed to characterize the performance of the robot in each case. We conclude by providing our perspective on the challenges and opportunities for the evaluation of assistive robots for older adults in realistic settings. △ Less

Submitted 20 December, 2022; originally announced December 2022.

arXiv:2207.13223 [pdf, other]

XADLiME: eXplainable Alzheimer's Disease Likelihood Map Estimation via Clinically-guided Prototype Learning

Authors: Ahmad Wisnu Mulyadi, Wonsik Jung, Kwanseok Oh, Jee Seok Yoon, Heung-Il Suk

Abstract: Diagnosing Alzheimer's disease (AD) involves a deliberate diagnostic process owing to its innate traits of irreversibility with subtle and gradual progression. These characteristics make AD biomarker identification from structural brain imaging (e.g., structural MRI) scans quite challenging. Furthermore, there is a high possibility of getting entangled with normal aging. We propose a novel deep-le… ▽ More Diagnosing Alzheimer's disease (AD) involves a deliberate diagnostic process owing to its innate traits of irreversibility with subtle and gradual progression. These characteristics make AD biomarker identification from structural brain imaging (e.g., structural MRI) scans quite challenging. Furthermore, there is a high possibility of getting entangled with normal aging. We propose a novel deep-learning approach through eXplainable AD Likelihood Map Estimation (XADLiME) for AD progression modeling over 3D sMRIs using clinically-guided prototype learning. Specifically, we establish a set of topologically-aware prototypes onto the clusters of latent clinical features, uncovering an AD spectrum manifold. We then measure the similarities between latent clinical features and well-established prototypes, estimating a "pseudo" likelihood map. By considering this pseudo map as an enriched reference, we employ an estimating network to estimate the AD likelihood map over a 3D sMRI scan. Additionally, we promote the explainability of such a likelihood map by revealing a comprehensible overview from two perspectives: clinical and morphological. During the inference, this estimated likelihood map served as a substitute over unseen sMRI scans for effectively conducting the downstream task while providing thorough explainable states. △ Less

Submitted 26 July, 2022; originally announced July 2022.

arXiv:2207.00208 [pdf, other]

doi 10.1145/3511808.3557067

e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce

Authors: Wonyoung Shin, Jonghun Park, Taekang Woo, Yongwoo Cho, Kwang** Oh, Hwanjun Song

Abstract: Understanding vision and language representations of product content is vital for search and recommendation applications in e-commerce. As a backbone for online shop** platforms and inspired by the recent success in representation learning research, we propose a contrastive learning framework that aligns language and visual models using unlabeled raw product text and images. We present technique… ▽ More Understanding vision and language representations of product content is vital for search and recommendation applications in e-commerce. As a backbone for online shop** platforms and inspired by the recent success in representation learning research, we propose a contrastive learning framework that aligns language and visual models using unlabeled raw product text and images. We present techniques we used to train large-scale representation learning models and share solutions that address domain-specific challenges. We study the performance using our pre-trained model as backbones for diverse downstream tasks, including category classification, attribute extraction, product matching, product clustering, and adult product recognition. Experimental results show that our proposed method outperforms the baseline in each downstream task regarding both single modality and multiple modalities. △ Less

Submitted 22 August, 2022; v1 submitted 1 July, 2022; originally announced July 2022.

Comments: Accepted to CIKM 2022

arXiv:2108.09451 [pdf, other]

Learn-Explain-Reinforce: Counterfactual Reasoning and Its Guidance to Reinforce an Alzheimer's Disease Diagnosis Model

Authors: Kwanseok Oh, Jee Seok Yoon, Heung-Il Suk

Abstract: Existing studies on disease diagnostic models focus either on diagnostic model learning for performance improvement or on the visual explanation of a trained diagnostic model. We propose a novel learn-explain-reinforce (LEAR) framework that unifies diagnostic model learning, visual explanation generation (explanation unit), and trained diagnostic model reinforcement (reinforcement unit) guided by… ▽ More Existing studies on disease diagnostic models focus either on diagnostic model learning for performance improvement or on the visual explanation of a trained diagnostic model. We propose a novel learn-explain-reinforce (LEAR) framework that unifies diagnostic model learning, visual explanation generation (explanation unit), and trained diagnostic model reinforcement (reinforcement unit) guided by the visual explanation. For the visual explanation, we generate a counterfactual map that transforms an input sample to be identified as an intended target label. For example, a counterfactual map can localize hypothetical abnormalities within a normal brain image that may cause it to be diagnosed with Alzheimer's disease (AD). We believe that the generated counterfactual maps represent data-driven and model-induced knowledge about a target task, i.e., AD diagnosis using structural MRI, which can be a vital source of information to reinforce the generalization of the trained diagnostic model. To this end, we devise an attention-based feature refinement module with the guidance of the counterfactual maps. The explanation and reinforcement units are reciprocal and can be operated iteratively. Our proposed approach was validated via qualitative and quantitative analysis on the ADNI dataset. Its comprehensibility and fidelity were demonstrated through ablation studies and comparisons with existing methods. △ Less

Submitted 21 August, 2021; originally announced August 2021.

Comments: 14 pages, 9 figures

arXiv:2106.11756 [pdf, other]

Trinity: A No-Code AI platform for complex spatial datasets

Authors: C. V. Krishnakumar Iyer, Feili Hou, Henry Wang, Yonghong Wang, Kay Oh, Swetava Ganguli, Vipul Pandey

Abstract: We present a no-code Artificial Intelligence (AI) platform called Trinity with the main design goal of enabling both machine learning researchers and non-technical geospatial domain experts to experiment with domain-specific signals and datasets for solving a variety of complex problems on their own. This versatility to solve diverse problems is achieved by transforming complex Spatio-temporal dat… ▽ More We present a no-code Artificial Intelligence (AI) platform called Trinity with the main design goal of enabling both machine learning researchers and non-technical geospatial domain experts to experiment with domain-specific signals and datasets for solving a variety of complex problems on their own. This versatility to solve diverse problems is achieved by transforming complex Spatio-temporal datasets to make them consumable by standard deep learning models, in this case, Convolutional Neural Networks (CNNs), and giving the ability to formulate disparate problems in a standard way, eg. semantic segmentation. With an intuitive user interface, a feature store that hosts derivatives of complex feature engineering, a deep learning kernel, and a scalable data processing mechanism, Trinity provides a powerful platform for domain experts to share the stage with scientists and engineers in solving business-critical problems. It enables quick prototy**, rapid experimentation and reduces the time to production by standardizing model building and deployment. In this paper, we present our motivation behind Trinity and its design along with showcasing sample applications to motivate the idea of lowering the bar to using AI. △ Less

Submitted 1 July, 2021; v1 submitted 21 June, 2021; originally announced June 2021.

Comments: 12 pages

arXiv:2106.05542 [pdf]

doi 10.1109/ICPR48806.2021.9412928

DUET: Detection Utilizing Enhancement for Text in Scanned or Captured Documents

Authors: Eun-Soo Jung, HyeongGwan Son, Kyusam Oh, Yongkeun Yun, Soonhwan Kwon, Min Soo Kim

Abstract: We present a novel deep neural model for text detection in document images. For robust text detection in noisy scanned documents, the advantages of multi-task learning are adopted by adding an auxiliary task of text enhancement. Namely, our proposed model is designed to perform noise reduction and text region enhancement as well as text detection. Moreover, we enrich the training data for the mode… ▽ More We present a novel deep neural model for text detection in document images. For robust text detection in noisy scanned documents, the advantages of multi-task learning are adopted by adding an auxiliary task of text enhancement. Namely, our proposed model is designed to perform noise reduction and text region enhancement as well as text detection. Moreover, we enrich the training data for the model with synthesized document images that are fully labeled for text detection and enhancement, thus overcome the insufficiency of labeled document image data. For the effective exploitation of the synthetic and real data, the training process is separated in two phases. The first phase is training only synthetic data in a fully-supervised manner. Then real data with only detection labels are added in the second phase. The enhancement task for the real data is weakly-supervised with information from their detection labels. Our methods are demonstrated in a real document dataset with performances exceeding those of other text detection methods. Moreover, ablations are conducted and the results confirm the effectiveness of the synthetic data, auxiliary task, and weak-supervision. Whereas the existing text detection studies mostly focus on the text in scenes, our proposed method is optimized to the applications for the text in scanned documents. △ Less

Submitted 10 June, 2021; originally announced June 2021.

Journal ref: 2020 25th International Conference on Pattern Recognition (ICPR)

arXiv:2105.03386 [pdf, other]

Topology and Routing Problems: The Circular Frame

Authors: Rak-Kyeong Seong, Chanho Min, Sang-Hoon Han, Jaeho Yang, Seungwoo Nam, Kyusam Oh

Abstract: In this work, we solve the problem of finding non-intersecting paths between points on a plane with a new approach by borrowing ideas from geometric topology, in particular, from the study of polygonal schema in mathematics. We use a topological transformation on the 2-dimensional planar routing environment that simplifies the routing problem into a problem of connecting points on a circle with st… ▽ More In this work, we solve the problem of finding non-intersecting paths between points on a plane with a new approach by borrowing ideas from geometric topology, in particular, from the study of polygonal schema in mathematics. We use a topological transformation on the 2-dimensional planar routing environment that simplifies the routing problem into a problem of connecting points on a circle with straight line segments that do not intersect in the interior of the circle. These points are either the points that need to be connected by non-intersecting paths or special `reference' points that parametrize the topology of the original environment prior to the transformation. When all the necessary points on the circle are fully connected, the transformation is reversed such that the line segments combine to become the non-intersecting paths that connect the start and end points in the original environment. We interpret the transformed environment in which the routing problem is solved as a new data structure where any routing problem can be solved efficiently. We perform experiments and show that the routing time and success rate of the new routing algorithm outperforms the ones for the A*-algorithm. △ Less

Submitted 7 May, 2021; originally announced May 2021.

Comments: 15 pages, 10 figures

arXiv:2011.12429 [pdf]

Fully Automated Mitral Inflow Doppler Analysis Using Deep Learning

Authors: Mohamed Y. Elwazir, Zeynettin Akkus, Didem Oguz, Jae K. Oh

Abstract: Echocardiography (echo) is an indispensable tool in a cardiologist's diagnostic armamentarium. To date, almost all echocardiographic parameters require time-consuming manual labeling and measurements by an experienced echocardiographer and exhibit significant variability, owing to the noisy and artifact-laden nature of echo images. For example, mitral inflow (MI) Doppler is used to assess left ven… ▽ More Echocardiography (echo) is an indispensable tool in a cardiologist's diagnostic armamentarium. To date, almost all echocardiographic parameters require time-consuming manual labeling and measurements by an experienced echocardiographer and exhibit significant variability, owing to the noisy and artifact-laden nature of echo images. For example, mitral inflow (MI) Doppler is used to assess left ventricular (LV) diastolic function, which is of paramount clinical importance to distinguish between different cardiac diseases. In the current work we present a fully automated workflow which leverages deep learning to a) label MI Doppler images acquired in an echo study, b) detect the envelope of MI Doppler signal, c) extract early and late filing (E and A wave) flow velocities and E-wave deceleration time from the envelope. We trained a variety of convolutional neural networks (CNN) models on 5544 images of 140 patients for predicting 24 image classes including MI Doppler images and obtained overall accuracy of 0.97 on 1737 images of 40 patients. Automated E and A wave velocity showed excellent correlation (Pearson R 0.99 and 0.98 respectively) and Bland Altman agreement (mean difference 0.06 and 0.05 m/s respectively and SD 0.03 for both) with the operator measurements. Deceleration time also showed good but lower correlation (Pearson R 0.82) and Bland-Altman agreement (mean difference: 34.1ms, SD: 30.9ms). These results demonstrate feasibility of Doppler echocardiography measurement automation and the promise of a fully automated echocardiography measurement package. △ Less

Submitted 24 November, 2020; originally announced November 2020.

Journal ref: IEEE BIBE 2020 Proceedings

arXiv:2011.10381 [pdf, other]

Born Identity Network: Multi-way Counterfactual Map Generation to Explain a Classifier's Decision

Authors: Kwanseok Oh, Jee Seok Yoon, Heung-Il Suk

Abstract: There exists an apparent negative correlation between performance and interpretability of deep learning models. In an effort to reduce this negative correlation, we propose a Born Identity Network (BIN), which is a post-hoc approach for producing multi-way counterfactual maps. A counterfactual map transforms an input sample to be conditioned and classified as a target label, which is similar to ho… ▽ More There exists an apparent negative correlation between performance and interpretability of deep learning models. In an effort to reduce this negative correlation, we propose a Born Identity Network (BIN), which is a post-hoc approach for producing multi-way counterfactual maps. A counterfactual map transforms an input sample to be conditioned and classified as a target label, which is similar to how humans process knowledge through counterfactual thinking. For example, a counterfactual map can localize hypothetical abnormalities from a normal brain image that may cause it to be diagnosed with a disease. Specifically, our proposed BIN consists of two core components: Counterfactual Map Generator and Target Attribution Network. The Counterfactual Map Generator is a variation of conditional GAN which can synthesize a counterfactual map conditioned on an arbitrary target label. The Target Attribution Network provides adequate assistance for generating synthesized maps by conditioning a target label into the Counterfactual Map Generator. We have validated our proposed BIN in qualitative and quantitative analysis on MNIST, 3D Shapes, and ADNI datasets, and showed the comprehensibility and fidelity of our method from various ablation studies. △ Less

Submitted 8 April, 2021; v1 submitted 20 November, 2020; originally announced November 2020.

Comments: 17 pages, 10 figures

arXiv:2011.06769 [pdf]

Toward the Fully Physics-Informed Echo State Network -- an ODE Approximator Based on Recurrent Artificial Neurons

Authors: Dong Keun Oh

Abstract: Inspired by recent theoretical arguments, physics-informed echo state network (ESN) is discussed on the attempt to train a reservoir model absolutely in physics-informed manner. As the plainest work on such a purpose, an ODE (ordinary differential equation) approximator is designed to replicate the solution in sequence with respect to the recurrent evaluations. On the principal invariance of diffe… ▽ More Inspired by recent theoretical arguments, physics-informed echo state network (ESN) is discussed on the attempt to train a reservoir model absolutely in physics-informed manner. As the plainest work on such a purpose, an ODE (ordinary differential equation) approximator is designed to replicate the solution in sequence with respect to the recurrent evaluations. On the principal invariance of differential equations, the constraint in recurrence just takes shape to secure a proper regression method for the ESN-based ODE approximator. After then, the actual training process is established on the idea of two-pass strategy for regression. Aiming at the fully physics-informed reservoir model, a couple of nonlinear dynamical problems are demonstrated as the computations obtained from the proposed method in this study. △ Less

Submitted 13 November, 2020; originally announced November 2020.

Comments: 30 pages, 12 figures, research paper

MSC Class: 68T27; 65L99 ACM Class: I.2.8; J.2

arXiv:1807.11720 [pdf, other]

doi 10.1109/ACCESS.2019.2963055

Regional Multi-scale Approach for Visually Pleasing Explanations of Deep Neural Networks

Authors: Dasom Seo, Kanghan Oh, Il-Seok Oh

Abstract: Recently, many methods to interpret and visualize deep neural network predictions have been proposed and significant progress has been made. However, a more class-discriminative and visually pleasing explanation is required. Thus, this paper proposes a region-based approach that estimates feature importance in terms of appropriately segmented regions. By fusing the saliency maps generated from mul… ▽ More Recently, many methods to interpret and visualize deep neural network predictions have been proposed and significant progress has been made. However, a more class-discriminative and visually pleasing explanation is required. Thus, this paper proposes a region-based approach that estimates feature importance in terms of appropriately segmented regions. By fusing the saliency maps generated from multi-scale segmentations, a more class-discriminative and visually pleasing map is obtained. We incorporate this regional multi-scale concept into a prediction difference method that is model-agnostic. An input image is segmented in several scales using the super-pixel method, and exclusion of a region is simulated by sampling a normal distribution constructed using the boundary prior. The experimental results demonstrate that the regional multi-scale method produces much more class-discriminative and visually pleasing saliency maps. △ Less

Submitted 1 August, 2018; v1 submitted 31 July, 2018; originally announced July 2018.

Comments: 9 pages, 5 figures, submitted on NIPS 2018

arXiv:1501.01725 [pdf]

Load-Modulated Single-RF MIMO Transmission for Spatially Multiplexed QAM Signals

Authors: Seung-Eun Hong, Kyoung-Sub Oh

Abstract: Today, MIMO has become an indispensable scheme for providing significant spectral efficiency in wireless communication and for future wireless system, recently, it goes to two extremes: massive MIMO and single-RF MIMO. This paper, which is put in the latter, utilizes load-modulated arrays with only reactance loads for single-RF transmission of spatially multiplexed QAM signals. To alleviate the ne… ▽ More Today, MIMO has become an indispensable scheme for providing significant spectral efficiency in wireless communication and for future wireless system, recently, it goes to two extremes: massive MIMO and single-RF MIMO. This paper, which is put in the latter, utilizes load-modulated arrays with only reactance loads for single-RF transmission of spatially multiplexed QAM signals. To alleviate the need for iterative processes while considering mutual coupling in the compact antenna, we present a novel design methodology for the loading network, which enables the exact computation of the three reactance loads per antenna element and also the perfect matching to the source with the opportunity to select appropriate analog tunable loads. We verify the design methodology by comparing the calculated values for some key parameters with the values from the circuit simulation. In addition, as an evaluation of the proposed architecture, we perform the bit error rate (BER) comparison which shows that our scheme with ideal loading is comparable to the conventional MIMO. △ Less

Submitted 7 January, 2015; originally announced January 2015.

Comments: 5 pages with 2-column format

Showing 1–27 of 27 results for author: Oh, K