-
ManWav: The First Manchu ASR Model
Authors:
Jean Seo,
Minha Kang,
Sungjoo Byun,
Sangah Lee
Abstract:
This study addresses the widening gap in Automatic Speech Recognition (ASR) research between high-resource and extremely low-resource languages, with a particular focus on Manchu, a critically endangered language. Manchu exemplifies the challenges faced by marginalized linguistic communities in accessing state-of-the-art technologies. In a pioneering effort, we introduce the first-ever Manchu ASR model, ManWav, leveraging Wav2Vec2-XLSR-53. The results of the first Manchu ASR model are promising, especially when trained with our augmented data. Wav2Vec2-XLSR-53 fine-tuned with augmented data demonstrates a 0.02 drop in CER and 0.13 drop in WER compared to the same base model fine-tuned with original data.
Submitted 19 June, 2024;
originally announced June 2024.
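The CER and WER drops quoted in the ManWav abstract are edit-distance metrics. The following is a minimal sketch of how such metrics are commonly computed, assuming whitespace tokenization for words; it is not the authors' evaluation code, and the function names `edit_distance`, `wer`, and `cer` are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from the reference
                            curr[j - 1] + 1,      # insertion into the reference
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()  # assumption: whitespace tokenization
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Under this definition, a 0.13 drop in WER means 13 fewer word-level errors per 100 reference words.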
-
Real-time Digital RF Emulation -- II: A Near Memory Custom Accelerator
Authors:
Mandovi Mukherjee,
Xiangyu Mao,
Nael Rahman,
Coleman DeLude,
Joe Driscoll,
Sudarshan Sharma,
Payman Behnam,
Uday Kamal,
Jongseok Woo,
Daehyun Kim,
Sharjeel Khan,
Jianming Tong,
Jamin Seo,
Prachi Sinha,
Madhavan Swaminathan,
Tushar Krishna,
Santosh Pande,
Justin Romberg,
Saibal Mukhopadhyay
Abstract:
A near memory hardware accelerator, based on a novel direct path computational model, for real-time emulation of radio frequency systems is demonstrated. Our evaluation of hardware performance uses both application-specific integrated circuits (ASIC) and field programmable gate arrays (FPGA) methodologies: 1). The ASIC testchip implementation, using TSMC 28nm CMOS, leverages distributed autonomous control to extract concurrency in compute as well as low latency. It achieves a $518$ MHz per channel bandwidth in a prototype $4$-node system. The maximum emulation range supported in this paradigm is $9.5$ km with $0.24$ $μ$s of per-sample emulation latency. 2). The FPGA-based implementation, evaluated on a Xilinx ZCU104 board, demonstrates a $9$-node test case (two Transmitters, one Receiver, and $6$ passive reflectors) with an emulation range of $1.13$ km to $27.3$ km at $215$ MHz bandwidth.
Submitted 12 June, 2024;
originally announced June 2024.
-
RAQ-VAE: Rate-Adaptive Vector-Quantized Variational Autoencoder
Authors:
Jiwan Seo,
Joonhyuk Kang
Abstract:
Vector Quantized Variational AutoEncoder (VQ-VAE) is an established technique in machine learning for learning discrete representations across various modalities. However, its scalability and applicability are limited by the need to retrain the model to adjust the codebook for different data or model scales. We introduce the Rate-Adaptive VQ-VAE (RAQ-VAE) framework, which addresses this challenge with two novel codebook representation methods: a model-based approach using a clustering-based technique on an existing well-trained VQ-VAE model, and a data-driven approach utilizing a sequence-to-sequence (Seq2Seq) model for variable-rate codebook generation. Our experiments demonstrate that RAQ-VAE achieves effective reconstruction performance across multiple rates, often outperforming conventional fixed-rate VQ-VAE models. This work enhances the adaptability and performance of VQ-VAEs, with broad applications in data reconstruction, generation, and computer vision tasks.
Submitted 23 May, 2024;
originally announced May 2024.
-
Experimental Study: Enhancing Voice Spoofing Detection Models with wav2vec 2.0
Authors:
Taein Kang,
Soyul Han,
Sunmook Choi,
Jaejin Seo,
Sanghyeok Chung,
Seungeun Lee,
Seungsang Oh,
Il-Youp Kwak
Abstract:
Conventional spoofing detection systems have heavily relied on the use of handcrafted features derived from speech data. However, a notable shift has recently emerged towards the direct utilization of raw speech waveforms, as demonstrated by methods like SincNet filters. This shift underscores the demand for more sophisticated audio sample features. Moreover, the success of deep learning models, particularly those utilizing large pretrained wav2vec 2.0 as a featurization front-end, highlights the importance of refined feature encoders. In response, this research assessed the representational capability of wav2vec 2.0 as an audio feature extractor, modifying the size of its pretrained Transformer layers through two key adjustments: (1) selecting a subset of layers starting from the leftmost one and (2) fine-tuning a portion of the selected layers from the rightmost one. We complemented this analysis with five spoofing detection back-end models, with a primary focus on AASIST, enabling us to pinpoint the optimal configuration for the selection and fine-tuning process. In contrast to conventional handcrafted features, our investigation identified several spoofing detection systems that achieve state-of-the-art performance in the ASVspoof 2019 LA dataset. This comprehensive exploration offers valuable insights into feature selection strategies, advancing the field of spoofing detection.
Submitted 26 February, 2024;
originally announced February 2024.
-
A Comparison Between Lie Group- and Lie Algebra- Based Potential Functions for Geometric Impedance Control
Authors:
Joohwan Seo,
Nikhil Potu Surya Prakash,
Jongeun Choi,
Roberto Horowitz
Abstract:
In this paper, a comparison analysis between geometric impedance controls (GICs) derived from two different potential functions on SE(3) for robotic manipulators is presented. The first potential function is defined on the Lie group, utilizing the Frobenius norm of the configuration error matrix. The second potential function is defined utilizing the Lie algebra, i.e., log-map of the configuration error. Using a differential geometric approach, the detailed derivation of the distance metric and potential function on SE(3) is introduced. The GIC laws are respectively derived from the two potential functions, followed by extensive comparison analyses. In the qualitative analysis, the properties of the error function and control laws are analyzed, while the performances of the controllers are quantitatively compared using numerical simulation.
Submitted 23 January, 2024;
originally announced January 2024.
-
IR-UWB Radar-Based Contactless Silent Speech Recognition of Vowels, Consonants, Words, and Phrases
Authors:
Sunghwa Lee,
Younghoon Shin,
Myungjong Kim,
Jiwon Seo
Abstract:
Several sensing techniques have been proposed for silent speech recognition (SSR); however, many of these methods require invasive processes or sensor attachment to the skin using adhesive tape or glue, rendering them unsuitable for frequent use in daily life. By contrast, impulse radio ultra-wideband (IR-UWB) radar can operate without physical contact with users' articulators and related body parts, offering several advantages for SSR. These advantages include high range resolution, high penetrability, low power consumption, robustness to external light or sound interference, and the ability to be embedded in space-constrained handheld devices. This study demonstrated IR-UWB radar-based contactless SSR using four types of speech stimuli (vowels, consonants, words, and phrases). To achieve this, a novel speech feature extraction algorithm specifically designed for IR-UWB radar-based SSR is proposed. Each speech stimulus is recognized by applying a classification algorithm to the extracted speech features. Two different algorithms, multidimensional dynamic time warping (MD-DTW) and deep neural network-hidden Markov model (DNN-HMM), were compared for the classification task. Additionally, a favorable radar antenna position, either in front of the user's lips or below the user's chin, was determined to achieve higher recognition accuracy. Experimental results demonstrated the efficacy of the proposed speech feature extraction algorithm combined with DNN-HMM for classifying vowels, consonants, words, and phrases. Notably, this study represents the first demonstration of phoneme-level SSR using contactless radar.
Submitted 15 December, 2023;
originally announced December 2023.
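One of the two classifiers compared in the abstract above is multidimensional dynamic time warping. The sketch below shows one common form of it: template matching where the local cost is the Euclidean distance between whole feature vectors. It is an illustration only. The abstract does not specify the authors' MD-DTW variant, features, or templates, so `md_dtw`, `classify`, and the template dictionary are all hypothetical.

```python
import math


def md_dtw(seq_a, seq_b):
    """DTW alignment cost between two sequences of equal-dimension feature vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] is the best cumulative cost aligning seq_a[:i] with seq_b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Assumption: Euclidean distance across all feature dimensions at once.
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # advance seq_a only
                                     cost[i][j - 1],      # advance seq_b only
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def classify(query, templates):
    """Return the label of the template with the lowest DTW cost to the query."""
    return min(templates, key=lambda label: md_dtw(query, templates[label]))
```

Because the warping path may repeat frames, two utterances of the same stimulus spoken at different speeds can still align with low cost.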
-
Deep-learning-driven end-to-end metalens imaging
Authors:
Joonhyuk Seo,
Jaegang Jo,
Joohoon Kim,
Joonho Kang,
Chanik Kang,
Seongwon Moon,
Eunji Lee,
Jehyeong Hong,
Junsuk Rho,
Haejun Chung
Abstract:
Recent advances in metasurface lenses (metalenses) have shown great potential for opening a new era in compact imaging, photography, light detection and ranging (LiDAR), and virtual reality/augmented reality (VR/AR) applications. However, the fundamental trade-off between broadband focusing efficiency and operating bandwidth limits the performance of broadband metalenses, resulting in chromatic aberration, angular aberration, and a relatively low efficiency. In this study, a deep-learning-based image restoration framework is proposed to overcome these limitations and realize end-to-end metalens imaging, thereby achieving aberration-free full-color imaging for mass-produced metalenses with 10-mm diameter. Neural-network-assisted metalens imaging achieved a high resolution comparable to that of the ground truth image.
Submitted 10 May, 2024; v1 submitted 5 December, 2023;
originally announced December 2023.
-
Clustering Techniques for Stable Linear Dynamical Systems with applications to Hard Disk Drives
Authors:
Nikhil Potu Surya Prakash,
Joohwan Seo,
Jongeun Choi,
Roberto Horowitz
Abstract:
In Robust Control and Data Driven Robust Control design methodologies, multiple plant transfer functions or a family of transfer functions are considered and a common controller is designed such that all the plants that fall into this family are stabilized. Though the plants are stabilized, the controller might be sub-optimal for each of the plants when the variations in the plants are large. This paper presents a way of clustering stable linear dynamical systems for the design of robust controllers within each of the clusters such that the controllers are optimal for each of the clusters. First a k-medoids algorithm for hard clustering will be presented for stable Linear Time Invariant (LTI) systems and then a Gaussian Mixture Models (GMM) clustering for a special class of LTI systems, common for Hard Disk Drive plants, will be presented.
Submitted 16 November, 2023;
originally announced November 2023.
-
Contact-rich SE(3)-Equivariant Robot Manipulation Task Learning via Geometric Impedance Control
Authors:
Joohwan Seo,
Nikhil Potu Surya Prakash,
Xiang Zhang,
Changhao Wang,
Jongeun Choi,
Masayoshi Tomizuka,
Roberto Horowitz
Abstract:
This paper presents a differential geometric control approach that leverages SE(3) group invariance and equivariance to increase transferability in learning robot manipulation tasks that involve interaction with the environment. Specifically, we employ a control law and a learning representation framework that remain invariant under arbitrary SE(3) transformations of the manipulation task definition. Furthermore, the control law and learning representation framework are shown to be SE(3) equivariant when represented relative to the spatial frame. The proposed approach is based on utilizing a recently presented geometric impedance control (GIC) combined with a learning variable impedance control framework, where the gain scheduling policy is trained in a supervised learning fashion from expert demonstrations. A geometrically consistent error vector (GCEV) is fed to a neural network to achieve a gain scheduling policy that remains invariant to arbitrary translation and rotations. A comparison of our proposed control and learning framework with a well-known Cartesian space learning impedance control, equipped with a Cartesian error vector-based gain scheduling policy, confirms the significantly superior learning transferability of our proposed approach. A hardware implementation on a peg-in-hole task is conducted to validate the learning transferability and feasibility of the proposed approach.
Submitted 18 December, 2023; v1 submitted 28 August, 2023;
originally announced August 2023.
-
Reachable Set-based Path Planning for Automated Vertical Parking System
Authors:
In Hyuk Oh,
Ju Won Seo,
Jin Sung Kim,
Chung Choo Chung
Abstract:
This paper proposes a local path planning method with a reachable set for Automated vertical Parking Systems (APS). First, given a parking lot layout with a goal position, we define an intermediate pose for the APS to accomplish reverse parking with a single maneuver, i.e., without changing the gear shift. Then, we introduce a reachable set which is a set of points consisting of the grid points of all possible intermediate poses. Once the APS approaches the goal position, it must select an intermediate pose in the reachable set. A minimization problem was formulated and solved to choose the intermediate pose. We tested various scenarios with different parking lot conditions. We used the Hybrid-A* algorithm for the global path planning to move the vehicle from the starting pose to the intermediate pose and utilized clothoid-based local path planning to move from the intermediate pose to the goal pose. Additionally, we designed a controller to follow the generated path and validated its tracking performance. It was confirmed that the root-mean-square tracking error was bounded within 0.06 m for the lateral position and within 0.01 rad for orientation.
Submitted 11 August, 2023;
originally announced August 2023.
-
Classification Method of Road Surface Condition and Type with LiDAR Using Spatiotemporal Information
Authors:
Ju Won Seo,
Jin Sung Kim,
Chung Choo Chung
Abstract:
This paper proposes a spatiotemporal architecture with a deep neural network (DNN) for classifying road surface conditions and types using LiDAR. It is known that LiDAR provides information on the reflectivity and number of point clouds depending on a road surface. Thus, this paper utilizes the information to classify the road surface. We divided the front road area into four subregions. First, we constructed feature vectors using each subregion's reflectivity, number of point clouds, and in-vehicle information. Second, the DNN classifies road surface conditions and types for each subregion. Finally, the output of the DNN feeds into the spatiotemporal process to make the final classification reflecting vehicle speed and probability given by the outcomes of softmax functions of the DNN output layer. To validate the effectiveness of the proposed method, we performed a comparative study with five other algorithms. With the proposed DNN, we obtained the highest accuracy of 98.0% and 98.6% for two subregions near the vehicle. In addition, we implemented the proposed method on the Jetson TX2 board to confirm that it is applicable in real-time.
Submitted 11 August, 2023;
originally announced August 2023.
-
Low-Cost GNSS Simulators with Wireless Clock Synchronization for Indoor Positioning
Authors:
Woohyun Kim,
Jiwon Seo
Abstract:
In regions where global navigation satellite systems (GNSS) signals are unavailable, such as underground areas and tunnels, GNSS simulators can be deployed for transmitting simulated GNSS signals. Then, a GNSS receiver in the simulator coverage outputs the position based on the received GNSS signals (e.g., Global Positioning System (GPS) L1 signals in this study) transmitted by the corresponding simulator. This approach provides periodic position updates to GNSS users while deploying a small number of simulators without modifying the hardware and software of user receivers. However, the simulator clock should be synchronized to the GNSS satellite clock to generate almost identical signals to the live-sky GNSS signals, which is necessary for seamless indoor and outdoor positioning handover. The conventional clock synchronization method based on the wired connection between each simulator and an outdoor GNSS antenna causes practical difficulty and increases the cost of deploying the simulators. This study proposes a wireless clock synchronization method based on a private time server and time delay calibration. Additionally, we derived the constraints for determining the optimal simulator coverage and separation between adjacent simulators. The positioning performance of the proposed GPS simulator-based indoor positioning system was demonstrated in the underground testbed for a driving vehicle with a GPS receiver and a pedestrian with a smartphone. The average position errors were 3.7 m for the vehicle and 9.6 m for the pedestrian during the field tests with successful indoor and outdoor positioning handovers. Since those errors are within the coverage of each deployed simulator, it is confirmed that the proposed system with wireless clock synchronization can effectively provide periodic position updates to users where live-sky GNSS signals are unavailable.
Submitted 1 June, 2023;
originally announced June 2023.
-
Room Impulse Response Estimation in a Multiple Source Environment
Authors:
Kyungyun Lee,
Jeonghun Seo,
Keunwoo Choi,
Sangmoon Lee,
Ben Sangbae Chon
Abstract:
In real-world acoustic scenarios, there often are multiple sound sources present in a room. These sources are situated in various locations and produce sounds that reach the listener from multiple directions. The presence of multiple sources in a room creates new challenges in estimating the room impulse response (RIR) as each source has a unique RIR, dependent on its location and orientation. Therefore, issues of determining which RIR should be predicted and how to predict it arise, when the input signal is a mixture of multiple reverberated sources. To address these, we propose a new task of predicting a "representative" RIR for a room in a multiple source environment and present a training method to achieve this goal. In contrast to the model trained in a single source environment, our method shows robust performance, regardless of the number of sources in the environment.
Submitted 25 May, 2023;
originally announced May 2023.
-
Machine-Learning-Based Classification of GPS Signal Reception Conditions Using a Dual-Polarized Antenna in Urban Areas
Authors:
Sanghyun Kim,
Jiwon Seo
Abstract:
In urban areas, dense buildings frequently block and reflect global positioning system (GPS) signals, resulting in the reception of a few visible satellites with many multipath signals. This is a significant problem that results in unreliable positioning in urban areas. If a signal reception condition from a certain satellite can be detected, the positioning performance can be improved by excluding or de-weighting the multipath contaminated satellite signal. Thus, we developed a machine-learning-based method of classifying GPS signal reception conditions using a dual-polarized antenna. We employed a decision tree algorithm for classification using three features, one of which can be obtained only from a dual-polarized antenna. A machine-learning model was trained using GPS signals collected from various locations. When the features extracted from the GPS raw signal are input, the generated machine-learning model outputs one of the three signal reception conditions: non-line-of-sight (NLOS) only, line-of-sight (LOS) only, or LOS+NLOS. Multiple testing datasets were used to analyze the classification accuracy, which was then compared with an existing method using dual single-polarized antennas. Consequently, when the testing dataset was collected at different locations from the training dataset, a classification accuracy of 64.47% was obtained, which was slightly higher than the accuracy of the existing method using dual single-polarized antennas. Therefore, the dual-polarized antenna solution is more beneficial than the dual single-polarized antenna solution because it has a more compact form factor and its performance is similar to that of the other solution.
Submitted 6 May, 2023;
originally announced May 2023.
-
Performance Comparison of Numerical Optimization Algorithms for RSS-TOA-Based Target Localization
Authors:
Halim Lee,
Jiwon Seo
Abstract:
The maximum likelihood (ML) estimator can be applied to localize a target mobile device using the RSS and TOA. However, the ML estimator for the RSS-TOA-based target localization problem is nonconvex and nonlinear, having no analytical solution. Therefore, the ML estimator should be solved numerically, unless it is relaxed into a convex or linear form. This study investigates the target localization performance and computational complexity of numerical methods for solving an ML estimator. The three widely used numerical methods are: grid search, gradient descent, and particle swarm optimization. In the experimental evaluation, the grid search yielded the lowest target localization root-mean-squared error; however, the 95th percentile error of the grid search was larger than those of the other two algorithms. The average code computation time of the grid search was extremely large compared with those of the other two algorithms, and gradient descent exhibited the lowest computation time.
Submitted 23 April, 2023;
originally announced April 2023.
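Grid search, one of the three numerical methods named in the abstract above, can be sketched as follows. The measurement model here is an assumption made for illustration and is not taken from the paper: TOA is treated as a range measurement with Gaussian noise, and RSS follows a log-distance path-loss model with Gaussian noise in dB. The parameters `p0`, `path_loss_exp`, `sigma_d`, and `sigma_p`, and both function names, are hypothetical.

```python
import math


def neg_log_likelihood(pos, anchors, ranges, rss,
                       p0=-40.0, path_loss_exp=2.0, sigma_d=1.0, sigma_p=4.0):
    """Negative log-likelihood (up to constants) of a candidate 2-D position."""
    total = 0.0
    for (ax, ay), r, p in zip(anchors, ranges, rss):
        # Clamp the distance to avoid log10(0) when pos coincides with an anchor.
        d = max(math.hypot(pos[0] - ax, pos[1] - ay), 1e-6)
        predicted_rss = p0 - 10.0 * path_loss_exp * math.log10(d)
        total += ((r - d) / sigma_d) ** 2              # TOA-derived range residual
        total += ((p - predicted_rss) / sigma_p) ** 2  # RSS residual in dB
    return total


def grid_search(anchors, ranges, rss, x_min, x_max, y_min, y_max, step):
    """Evaluate every grid point and return the one with the lowest cost."""
    nx = int(round((x_max - x_min) / step)) + 1
    ny = int(round((y_max - y_min) / step)) + 1
    best_pos, best_cost = None, float("inf")
    for ix in range(nx):
        for iy in range(ny):
            pos = (x_min + ix * step, y_min + iy * step)
            cost = neg_log_likelihood(pos, anchors, ranges, rss)
            if cost < best_cost:
                best_pos, best_cost = pos, cost
    return best_pos
```

The cost of this approach grows with the number of grid points, which is consistent with the abstract's observation that grid search had by far the largest computation time.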
-
Data-Driven Track Following Control for Dual Stage-Actuator Hard Disk Drives
Authors:
Nikhil Potu Surya Prakash,
Joohwan Seo,
Alexander Rose,
Roberto Horowitz
Abstract:
In this paper, we present a frequency domain data-driven feedback control design methodology for the design of tracking controllers for hard disk drives with a two-stage actuator as a part of the open invited track 'Benchmark Problem on Control System Design of Hard Disk Drive with a Dual-Stage Actuator' in the IFAC World Congress 2023 (Yokohama, Japan). The benchmark models are Compared to the traditional controller design, we improve robustness and avoid model mismatch by using multiple frequency response plant measurements directly instead of plant models. Disturbance rejection and corresponding error minimization is posed as an H2 norm minimization problem with H infinity and H2 norm constraints. H infinity norm constraints are used to shape the closed loop transfer functions and ensure closed loop stability and H2 norm constraints are used to constrain and/or minimize the variance of relevant.
Submitted 3 April, 2023;
originally announced April 2023.
-
Nonlinear ill-posed problem in low-dose dental cone-beam computed tomography
Authors:
Hyoung Suk Park,
Chang Min Hyun,
Jin Keun Seo
Abstract:
This paper describes the mathematical structure of the ill-posed nonlinear inverse problem of low-dose dental cone-beam computed tomography (CBCT) and explains the advantages of a deep learning-based approach to the reconstruction of computed tomography images over conventional regularization methods. This paper explains the underlying reasons why dental CBCT is more ill-posed than standard computed tomography. Despite this severe ill-posedness, the demand for dental CBCT systems is rapidly growing because of their cost competitiveness and low radiation dose. We then describe the limitations of existing methods in the accurate restoration of the morphological structures of teeth using dental CBCT data severely damaged by metal implants. We further discuss the usefulness of panoramic images generated from CBCT data for accurate tooth segmentation. We also discuss the possibility of utilizing radiation-free intra-oral scan data as prior information in CBCT image reconstruction to compensate for the damage to data caused by metal implants.
Submitted 2 March, 2023;
originally announced March 2023.
-
Performance Evaluation and Hybrid Application of the Greedy and Predictive UAV Trajectory Optimization Methods for Localizing a Target Mobile Device
Authors:
Halim Lee,
Jiwon Seo
Abstract:
This study investigates unmanned aerial vehicle (UAV) trajectory planning strategies for localizing a target mobile device in emergency situations. The global navigation satellite system (GNSS)-based accurate position information of a target mobile device in an emergency may not be always available to first responders. For example, 1) GNSS positioning accuracy may be degraded in harsh signal envir…
▽ More
This study investigates unmanned aerial vehicle (UAV) trajectory planning strategies for localizing a target mobile device in emergency situations. The global navigation satellite system (GNSS)-based accurate position information of a target mobile device in an emergency may not be always available to first responders. For example, 1) GNSS positioning accuracy may be degraded in harsh signal environments and 2) in countries where emergency positioning service is not mandatory, some mobile devices may not report their locations. Under the cases mentioned above, one way to find the target mobile device is to use UAVs. Dispatched UAVs may search the target directly on the emergency site by measuring the strength of the signal (e.g., LTE wireless communication signal) from the target mobile device. To accurately localize the target mobile device in the shortest time possible, UAVs should fly in the most efficient way possible. The two popular trajectory optimization strategies of UAVs are greedy and predictive approaches. However, the research on localization performances of the two approaches has been evaluated only under favorable settings (i.e., under good UAV geometries and small received signal strength (RSS) errors); more realistic scenarios still remain unexplored. In this study, we compare the localization performance of the greedy and predictive approaches under realistic RSS errors (i.e., up to 6 dB according to the ITU-R channel model).
Submitted 22 February, 2023;
originally announced February 2023.
-
Geometric Impedance Control on SE(3) for Robotic Manipulators
Authors:
Joohwan Seo,
Nikhil Potu Surya Prakash,
Alexander Rose,
Jongeun Choi,
Roberto Horowitz
Abstract:
After its introduction, impedance control has been utilized as a primary control scheme for robotic manipulation tasks that involve interaction with unknown environments. While impedance control has been extensively studied, the geometric structure of SE(3) for the robotic manipulator itself and its use in formulating a robotic task has not been adequately addressed. In this paper, we propose a differential geometric approach to impedance control. Given a left-invariant error metric in SE(3), the corresponding error vectors in position and velocity are first derived. We then propose the impedance control schemes that adequately account for the geometric structure of the manipulator in SE(3) based on a left-invariant potential function. The closed-loop stabilities for the proposed control schemes are verified using Lyapunov function-based analysis. The proposed control design clearly outperformed a conventional impedance control approach when tracking challenging trajectory profiles.
Submitted 18 December, 2023; v1 submitted 15 November, 2022;
originally announced November 2022.
-
Evaluation of RF Fingerprinting-Aided RSS-Based Target Localization for Emergency Response
Authors:
Halim Lee,
Taewon Kang,
Suhui Jeong,
Jiwon Seo
Abstract:
Target localization is essential for emergency dispatching situations. Maximum likelihood estimation (MLE) methods are widely used to estimate the target position based on the received signal strength measurements. However, the performance of MLE solvers is significantly affected by the initialization (i.e., initial guess of the solution or solution search space). To address this, a previous study proposed the semidefinite programming (SDP)-based MLE initialization. However, the performance of the SDP-based initialization technique is largely affected by the shadowing variance and geometric diversity between the target and receivers. In this study, a radio frequency (RF) fingerprinting-based MLE initialization is proposed. Further, a maximum likelihood problem for target localization combining RF fingerprinting is formulated. In the three test environments of open space, urban, and indoor, the proposed RF fingerprinting-aided target localization method showed a performance improvement of up to 63.31% and an average of 39.13%, compared to the MLE algorithm initialized with SDP. Furthermore, unlike the SDP-MLE method, the proposed method was not significantly affected by the poor geometry between the target and receivers in our experiments.
Submitted 18 June, 2022;
originally announced June 2022.
-
Seamless Accurate Positioning in Deep Urban Area based on Mode Switching Between DGNSS and Multipath Mitigation Positioning
Authors:
Yongjun Lee,
Yoola Hwang,
Jae Young Ahn,
Jiwon Seo,
Byungwoon Park
Abstract:
Multipath and non-line-of-sight (NLOS) signals are the major causes of poor accuracy of a global navigation satellite system (GNSS) in urban areas. Despite the wide usage of the GNSS in populated urban areas, it is difficult to suggest a generalized method because multipath errors are user-specific errors that cannot be eliminated by the DGNSS or a real-time kinematic technique. This paper introduces a real-time multipath estimation and mitigation technique, which considers compensation for the time offset between constellations. It also presents a mode-switching algorithm between the DGNSS and multipath mitigating mode and shows that this technique can be effectively utilized for automobiles in a deep urban environment without any help from sensors other than GNSS. The availability is improved from 64% to 100% and the error RMS is reduced from 11.1 m to 1.2 m on Teheran-ro, Seoul, Korea. Because this method does not require prior information or additional sensor implementation for high-positioning performance in deep urban areas, it is expected to gain wide usage in not only the automotive industry but also future intelligent transportation systems.
Submitted 9 June, 2022;
originally announced June 2022.
-
Urban Road Safety Prediction: A Satellite Navigation Perspective
Authors:
Halim Lee,
Jiwon Seo,
Zaher M. Kassas
Abstract:
Predicting the safety of urban roads for navigation via global navigation satellite systems (GNSS) signals is considered. To ensure safe driving of automated vehicles, the vehicle must plan its trajectory to avoid navigating on unsafe roads (e.g., icy conditions, construction zones, narrow streets, etc.). Such information can be derived from the roads' physical properties, vehicle's capabilities, and weather conditions. From a GNSS-based navigation perspective, the reliability of GNSS signals in different locales, which is heavily dependent on the road layout within the surrounding environment, is crucial to ensure safe automated driving. An urban road environment surrounded by tall objects can significantly degrade the accuracy and availability of GNSS signals. This article proposes an approach to predict the reliability of GNSS-based navigation to ensure safe urban navigation. Satellite navigation reliability at a given location and time on a road is determined based on the probabilistic position error bound of the vehicle-mounted GNSS receiver. A metric for GNSS reliability for ground vehicles is suggested, and a method to predict the conservative probabilistic error bound of the GNSS navigation solution is proposed. A satellite navigation reliability map is generated for various navigation applications. As a case study, the reliability map is used in the proposed optimization problem formulation for automated ground vehicle safety-constrained path planning.
Submitted 21 June, 2022; v1 submitted 6 June, 2022;
originally announced June 2022.
-
AI can evolve without labels: self-evolving vision transformer for chest X-ray diagnosis through knowledge distillation
Authors:
Sangjoon Park,
Gwanghyun Kim,
Yujin Oh,
Joon Beom Seo,
Sang Min Lee,
Jin Hwan Kim,
Sungjun Moon,
Jae-Kwang Lim,
Chang Min Park,
Jong Chul Ye
Abstract:
Although deep learning-based computer-aided diagnosis systems have recently achieved expert-level performance, developing a robust deep learning model requires large, high-quality data with manual annotation, which is expensive to obtain. This situation poses the problem that the chest x-rays collected annually in hospitals cannot be used due to the lack of manual labeling by experts, especially in deprived areas. To address this, here we present a novel deep learning framework that uses knowledge distillation through self-supervised learning and self-training, which shows that the performance of the original model trained with a small number of labels can be gradually improved with more unlabeled data. Experimental results show that the proposed framework maintains impressive robustness against a real-world environment and has general applicability to several diagnostic tasks such as tuberculosis, pneumothorax, and COVID-19. Notably, we demonstrated that our model performs even better than those trained with the same amount of labeled data. The proposed framework has a great potential for medical imaging, where plenty of data is accumulated every year, but ground truth annotations are expensive to obtain.
Submitted 13 February, 2022;
originally announced February 2022.
-
Metal Artifact Reduction with Intra-Oral Scan Data for 3D Low Dose Maxillofacial CBCT Modeling
Authors:
Chang Min Hyun,
Taigyntuya Bayaraa,
Hye Sun Yun,
Tae Jun Jang,
Hyoung Suk Park,
Jin Keun Seo
Abstract:
Low-dose dental cone beam computed tomography (CBCT) has been increasingly used for maxillofacial modeling. However, the presence of metallic inserts, such as implants, crowns, and dental filling, causes severe streaking and shading artifacts in a CBCT image and loss of the morphological structures of the teeth, which consequently prevents accurate segmentation of bones. A two-stage metal artifact reduction method is proposed for accurate 3D low-dose maxillofacial CBCT modeling, where a key idea is to utilize explicit tooth shape prior information from intra-oral scan data whose acquisition does not require any extra radiation exposure. In the first stage, an image-to-image deep learning network is employed to mitigate metal-related artifacts. To improve the learning ability, the proposed network is designed to take advantage of the intra-oral scan data as side-inputs and perform multi-task learning of auxiliary tooth segmentation. In the second stage, a 3D maxillofacial model is constructed by segmenting the bones from the dental CBCT image corrected in the first stage. For accurate bone segmentation, weighted thresholding is applied, wherein the weighting region is determined depending on the geometry of the intra-oral scan data. Because acquiring a paired training dataset of metal-artifact-free and metal artifact-affected dental CBCT images is challenging in clinical practice, an automatic method of generating a realistic dataset according to the CBCT physics model is introduced. Numerical simulations and clinical experiments show the feasibility of the proposed method, which takes advantage of tooth surface information from intra-oral scan data in 3D low dose maxillofacial CBCT modeling.
Submitted 7 February, 2022;
originally announced February 2022.
-
Fully automatic integration of dental CBCT images and full-arch intraoral impressions with stitching error correction via individual tooth segmentation and identification
Authors:
Tae Jun Jang,
Hye Sun Yun,
Chang Min Hyun,
Jong-Eun Kim,
Sang-Hwy Lee,
Jin Keun Seo
Abstract:
We present a fully automated method of integrating intraoral scan (IOS) and dental cone-beam computerized tomography (CBCT) images into one image by complementing each image's weaknesses. Dental CBCT alone may not be able to delineate precise details of the tooth surface due to limited image resolution and various CBCT artifacts, including metal-induced artifacts. IOS is very accurate for the scanning of narrow areas, but it produces cumulative stitching errors during full-arch scanning. The proposed method is intended not only to compensate for the low quality of CBCT-derived tooth surfaces with IOS, but also to correct the cumulative stitching errors of IOS across the entire dental arch. Moreover, the integration provides both the gingival structure of IOS and the tooth roots of CBCT in one image. The proposed fully automated method consists of four parts: (i) individual tooth segmentation and identification module for IOS data (TSIM-IOS); (ii) individual tooth segmentation and identification module for CBCT data (TSIM-CBCT); (iii) global-to-local tooth registration between IOS and CBCT; and (iv) stitching error correction of full-arch IOS. The experimental results show that the proposed method achieved landmark and surface distance errors of 112.4 $μ$m and 301.7 $μ$m, respectively.
Submitted 2 March, 2023; v1 submitted 3 December, 2021;
originally announced December 2021.
-
Optimal Parameter Inflation to Enhance the Availability of Single-Frequency GBAS for Intelligent Air Transportation
Authors:
Halim Lee,
Sam Pullen,
Jiyun Lee,
Byungwoon Park,
Moonseok Yoon,
Jiwon Seo
Abstract:
Ground-based Augmentation System (GBAS) augments Global Navigation Satellite Systems (GNSS) to support the precision approach and landing of aircraft. To guarantee integrity, existing single-frequency GBAS utilizes position-domain geometry screening to eliminate potentially unsafe satellite geometries by inflating one or more broadcast GBAS parameters. However, GBAS availability can be drastically impacted in low-latitude regions where severe ionospheric conditions have been observed. Thus, we developed a novel geometry-screening algorithm in this study to improve GBAS availability in low-latitude regions. Simulations demonstrate that the proposed method can provide 5-8 percentage point availability enhancement of GBAS at Galeão airport near Rio de Janeiro, Brazil, compared to existing methods.
Submitted 23 April, 2022; v1 submitted 2 November, 2021;
originally announced November 2021.
-
SFOL DME Pulse Shaping Through Digital Predistortion for High-Accuracy DME
Authors:
Sunghwa Lee,
Euiho Kim,
Jiwon Seo
Abstract:
The Stretched-FrOnt-Leg (SFOL) pulse is a high-accuracy distance measuring equipment (DME) pulse developed to support alternative positioning and navigation for aircraft during global navigation satellite system outages. To facilitate the use of the SFOL pulse, it is best to use legacy DMEs that are already deployed to transmit the SFOL pulse, rather than the current Gaussian pulse, through software changes only. When attempting to transmit the SFOL pulse in legacy DMEs, the greatest challenge is the pulse shape distortion caused by the pulse-shaping circuits and power amplifiers in the transmission unit such that the original SFOL pulse shape is no longer preserved. This letter proposes an inverse-learning-based DME digital predistortion method and presents successfully transmitted SFOL pulses from a testbed based on a commercial legacy DME that was designed to transmit Gaussian pulses.
Submitted 1 November, 2021;
originally announced November 2021.
-
First Demonstration of the Korean eLoran Accuracy in a Narrow Waterway Using Improved ASF Maps
Authors:
Woohyun Kim,
Pyo-Woong Son,
Sul Gee Park,
Sang Hyun Park,
Jiwon Seo
Abstract:
The vulnerabilities of global navigation satellite systems (GNSSs) to radio frequency jamming and spoofing have attracted significant research attention. In particular, the large-scale jamming incidents that occurred in South Korea substantiate the practical importance of implementing a complementary navigation system. This letter briefly summarizes the efforts of South Korea to deploy an enhanced long-range navigation (eLoran) system, which is a terrestrial low-frequency radio navigation system that can complement GNSSs. After four years of research and development, the Korean eLoran testbed system was recently deployed and has been operational since June 1, 2021. Although its initial performance at sea is satisfactory, navigation through a narrow waterway is still challenging because a complete survey of the additional secondary factor (ASF), which is the largest source of error for eLoran, is practically difficult in a narrow waterway. This letter proposes an alternative way to survey the ASF in a narrow waterway and improve the ASF map generation methods. Moreover, the performance of the proposed approach was validated experimentally.
Submitted 28 September, 2021; v1 submitted 18 September, 2021;
originally announced September 2021.
-
Enhanced Accuracy Simulator for a Future Korean Nationwide eLoran System
Authors:
Joon Hyo Rhee,
Sanghyun Kim,
Pyo-Woong Son,
Jiwon Seo
Abstract:
The Global Positioning System (GPS) has become the most widely used positioning, navigation, and timing system. However, the vulnerability of GPS to radio frequency interference has attracted significant attention. After experiencing several incidents of intentional high-power GPS jamming trials by North Korea, South Korea decided to deploy the enhanced long-range navigation (eLoran) system, which is a high-power terrestrial radio-navigation system that can complement GPS. As the first phase of the South Korean eLoran program, an eLoran testbed system was recently developed and declared operational on June 1, 2021. Once its operational performance is determined to be satisfactory, South Korea plans to move to the second phase of the program, which is a nationwide eLoran system. For the optimal deployment of additional eLoran transmitters in a nationwide system, it is necessary to properly simulate the expected positioning accuracy of the said future system. In this study, we propose enhanced eLoran accuracy simulation methods based on a land cover map and transmitter jitter estimation. Using actual measurements over the country, the simulation accuracy of the proposed methods was confirmed to be approximately 10%-91% better than that of the existing Loran (i.e., Loran-C and eLoran) positioning accuracy simulators depending on the test locations.
Submitted 12 August, 2021;
originally announced August 2021.
-
Barcode Method for Generative Model Evaluation driven by Topological Data Analysis
Authors:
Ryoungwoo Jang,
Minjee Kim,
Da-in Eun,
Kyungjin Cho,
Jiyeon Seo,
Namkug Kim
Abstract:
Evaluating the performance of generative models in image synthesis is a challenging task. Although the Fréchet Inception Distance is a widely accepted evaluation metric, it integrates different aspects (e.g., fidelity and diversity) of synthesized images into a single score and assumes the normality of embedded vectors. Recent methods such as precision-and-recall and its variants such as density-and-coverage have been developed to separate fidelity and diversity based on k-nearest neighborhood methods. In this study, we propose an algorithm named barcode, which is inspired by the topological data analysis and is almost free of assumption and hyperparameter selections. In extensive experiments on real-world datasets as well as theoretical approach on high-dimensional normal samples, it was found that the 'usual' normality assumption of embedded vectors has several drawbacks. The experimental results demonstrate that barcode outperforms other methods in evaluating fidelity and diversity of GAN outputs. Official codes can be found in https://github.com/minjeekim00/Barcode.
Submitted 3 June, 2021;
originally announced June 2021.
-
Vision Transformer using Low-level Chest X-ray Feature Corpus for COVID-19 Diagnosis and Severity Quantification
Authors:
Sangjoon Park,
Gwanghyun Kim,
Yujin Oh,
Joon Beom Seo,
Sang Min Lee,
Jin Hwan Kim,
Sungjun Moon,
Jae-Kwang Lim,
Jong Chul Ye
Abstract:
Developing a robust algorithm to diagnose and quantify the severity of COVID-19 using Chest X-ray (CXR) requires a large number of well-curated COVID-19 datasets, which are difficult to collect under the global COVID-19 pandemic. On the other hand, CXR data with other findings are abundant. This situation is ideally suited for the Vision Transformer (ViT) architecture, where a lot of unlabeled data can be used through structural modeling by the self-attention mechanism. However, the use of existing ViT is not optimal, since feature embedding through direct patch flattening or ResNet backbone in the standard ViT is not intended for CXR. To address this problem, here we propose a novel Vision Transformer that utilizes low-level CXR feature corpus obtained from a backbone network that extracts common CXR findings. Specifically, the backbone network is first trained with large public datasets to detect common abnormal findings such as consolidation, opacity, edema, etc. Then, the embedded features from the backbone network are used as corpora for a Transformer model for the diagnosis and the severity quantification of COVID-19. We evaluate our model on various external test datasets from totally different institutions to evaluate the generalization capability. The experimental results confirm that our model can achieve state-of-the-art performance in both diagnosis and severity quantification tasks with superior generalization capability, which are sine qua non of widespread deployment.
Submitted 15 April, 2021;
originally announced April 2021.
-
RA-BNN: Constructing Robust & Accurate Binary Neural Network to Simultaneously Defend Adversarial Bit-Flip Attack and Improve Accuracy
Authors:
Adnan Siraj Rakin,
Li Yang,
Jingtao Li,
Fan Yao,
Chaitali Chakrabarti,
Yu Cao,
Jae-sun Seo,
Deliang Fan
Abstract:
Recently developed adversarial weight attack, a.k.a. bit-flip attack (BFA), has shown enormous success in compromising Deep Neural Network (DNN) performance with an extremely small amount of model parameter perturbation. To defend against this threat, we propose RA-BNN that adopts a complete binary (i.e., for both weights and activation) neural network (BNN) to significantly improve DNN model robustness (defined as the number of bit-flips required to degrade the accuracy to as low as a random guess). However, such an aggressive low bit-width model suffers from poor clean (i.e., no attack) inference accuracy. To counter this, we propose a novel and efficient two-stage network growing method, named Early-Growth. It selectively grows the channel size of each BNN layer based on channel-wise binary masks training with Gumbel-Sigmoid function. Apart from recovering the inference accuracy, our RA-BNN after growing also shows significantly higher resistance to BFA. Our evaluation of the CIFAR-10 dataset shows that the proposed RA-BNN can improve the clean model accuracy by ~2-8 %, compared with a baseline BNN, while simultaneously improving the resistance to BFA by more than 125 x. Moreover, on ImageNet, with a sufficiently large (e.g., 5,000) amount of bit-flips, the baseline BNN accuracy drops to 4.3 % from 51.9 %, while our RA-BNN accuracy only drops to 37.1 % from 60.9 % (9 % clean accuracy improvement).
Submitted 22 March, 2021;
originally announced March 2021.
-
Severity Quantification and Lesion Localization of COVID-19 on CXR using Vision Transformer
Authors:
Gwanghyun Kim,
Sangjoon Park,
Yujin Oh,
Joon Beom Seo,
Sang Min Lee,
Jin Hwan Kim,
Sungjun Moon,
Jae-Kwang Lim,
Jong Chul Ye
Abstract:
Under the global pandemic of COVID-19, building an automated framework that quantifies the severity of COVID-19 and localizes the relevant lesion on chest X-ray images has become increasingly important. Although pixel-level lesion severity labels, e.g. lesion segmentation, can be the most excellent target to build a robust model, collecting enough data with such labels is difficult due to time and labor-intensive annotation tasks. Instead, array-based severity labeling that assigns integer scores on six subdivisions of lungs can be an alternative choice enabling the quick labeling. Several groups proposed deep learning algorithms that quantify the severity of COVID-19 using the array-based COVID-19 labels and localize the lesions with explainability maps. To further improve the accuracy and interpretability, here we propose a novel Vision Transformer tailored for both quantification of the severity and clinically applicable localization of the COVID-19 related lesions. Our model is trained in a weakly-supervised manner to generate the full probability maps from weak array-based labels. Furthermore, a novel progressive self-training method enables us to build a model with a small labeled dataset. The quantitative and qualitative analysis on the external testset demonstrates that our method shows comparable performance with radiologists for both tasks with stability in a real-world application.
Submitted 11 March, 2021;
originally announced March 2021.
-
Vision Transformer for COVID-19 CXR Diagnosis using Chest X-ray Feature Corpus
Authors:
Sangjoon Park,
Gwanghyun Kim,
Yujin Oh,
Joon Beom Seo,
Sang Min Lee,
Jin Hwan Kim,
Sungjun Moon,
Jae-Kwang Lim,
Jong Chul Ye
Abstract:
Under the global COVID-19 crisis, developing a robust diagnosis algorithm for COVID-19 using CXR is hampered by the lack of a well-curated COVID-19 data set, although CXR data with other diseases are abundant. This situation is suitable for the vision transformer architecture, which can exploit the abundant unlabeled data using pre-training. However, the direct use of an existing vision transformer that uses the corpus generated by the ResNet is not optimal for correct feature embedding. To mitigate this problem, we propose a novel vision Transformer by using the low-level CXR feature corpus that is obtained to extract the abnormal CXR features. Specifically, the backbone network is trained using large public datasets to obtain the abnormal features in routine diagnosis such as consolidation, ground-glass opacity (GGO), etc. Then, the embedded features from the backbone network are used as corpus for vision transformer training. We examine our model on various external test datasets acquired from totally different institutions to assess the generalization ability. Our experiments demonstrate that our method achieved state-of-the-art performance and has better generalization capability, which are crucial for widespread deployment.
Submitted 11 March, 2021;
originally announced March 2021.
-
Automated 3D cephalometric landmark identification using computerized tomography
Authors:
Hye Sun Yun,
Chang Min Hyun,
Seong Hyeon Baek,
Sang-Hwy Lee,
Jin Keun Seo
Abstract:
Identification of 3D cephalometric landmarks that serve as proxy to the shape of human skull is the fundamental step in cephalometric analysis. Since manual landmarking from 3D computed tomography (CT) images is a cumbersome task even for the trained experts, automatic 3D landmark detection system is in a great need. Recently, automatic landmarking of 2D cephalograms using deep learning (DL) has achieved great success, but 3D landmarking for more than 80 landmarks has not yet reached a satisfactory level, because of the factors hindering machine learning such as the high dimensionality of the input data and limited amount of training data due to ethical restrictions on the use of medical data. This paper presents a semi-supervised DL method for 3D landmarking that takes advantage of anonymized landmark dataset with paired CT data being removed. The proposed method first detects a small number of easy-to-find reference landmarks, then uses them to provide a rough estimation of the entire landmarks by utilizing the low dimensional representation learned by variational autoencoder (VAE). Anonymized landmark dataset is used for training the VAE. Finally, coarse-to-fine detection is applied to the small bounding box provided by rough estimation, using separate strategies suitable for mandible and cranium. For mandibular landmarks, patch-based 3D CNN is applied to the segmented image of the mandible (separated from the maxilla), in order to capture 3D morphological features of mandible associated with the landmarks. We detect 6 landmarks around the condyle all at once, instead of one by one, because they are closely related to each other. For cranial landmarks, we again use VAE-based latent representation for more accurate annotation. In our experiment, the proposed method achieved an averaged 3D point-to-point error of 2.91 mm for 90 landmarks only with 15 paired training data.
△ Less
Submitted 16 December, 2020;
originally announced January 2021.
-
Single-Antenna-Based GPS Antijamming Method Exploiting Polarization Diversity
Authors:
Kwansik Park,
Jiwon Seo
Abstract:
The vulnerability of Global Positioning System (GPS) receivers to jammers is a major concern owing to the extremely weak received signal power of GPS. Research has been conducted on a variety of antenna array techniques to be used as countermeasures to GPS jammers, and their antijamming performance is known to be greater than that of single-antenna methods. However, the application of antenna arrays remains limited because of their size, cost, and computational complexity. This study proposes and experimentally validates a novel space-time-polarization domain adaptive processing for a single-element dual-polarized antenna (STPAPS) by focusing on the polarization diversity of a dual-polarized antenna. The mathematical models of arbitrarily polarized signals received by a dual-polarized antenna are derived, and an appropriate constraint matrix for dual-polarized-antenna-based GPS antijamming is suggested. To reduce the computational complexity of the constraint matrix approach, the eigenvector constraint design scheme is adopted. The performance of STPAPS is quantitatively and qualitatively evaluated through experiments as follows. 1) The carrier-to-noise-density ratio (C/N0) of STPAPS under synthetic jamming is demonstrated to be higher than that of the previous minimum mean squared error (MMSE) or minimum variance distortionless response (MVDR) based dual-polarized antenna methods. 2) The strengths and weaknesses of STPAPS are qualitatively compared with those of the previous single-element dual-polarized antenna methods that are not based on the MMSE or MVDR algorithms. 3) The characteristics of STPAPS (in terms of the directions and polarizations of the GPS and jamming signals) are compared with those of the conventional two-element single-polarized antenna array method, which has the same degree of freedom as that of STPAPS.
Submitted 18 November, 2020; v1 submitted 12 October, 2020;
originally announced October 2020.
-
Integrity-Based Path Planning Strategy for Urban Autonomous Vehicular Navigation Using GPS and Cellular Signals
Authors:
Halim Lee,
Jiwon Seo,
Zaher M. Kassas
Abstract:
An integrity-based path planning strategy for autonomous ground vehicle (AGV) navigation in urban environments is developed. The vehicle is assumed to navigate by utilizing cellular long-term evolution (LTE) signals in addition to Global Positioning System (GPS) signals. Given a desired destination, an optimal path is calculated, which minimizes a cost function that considers both the horizontal protection level (HPL) and travel distance. The constraints are that (i) the ratio of nodes with faulty signals to the total nodes be lower than a maximum allowable ratio and (ii) the HPLs along each candidate path be lower than the horizontal alert limit (HAL). To predict the faults and HPL before the vehicle is driven, GPS and LTE pseudoranges along the candidate paths are generated utilizing commercial ray-tracing software and three-dimensional (3D) terrain and building maps. Simulated pseudoranges inform the path planning algorithm about potential biases due to reflections from buildings in urban environments. Simulation results are presented showing that the optimal path produced by the proposed path planning strategy has the minimum average HPL among the candidate paths.
Submitted 12 October, 2020; v1 submitted 8 October, 2020;
originally announced October 2020.
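The cost-and-constraint structure described in this abstract can be pictured with a small shortest-path sketch. This is only an illustration under stated assumptions, not the authors' planner: the function name `plan_path`, the additive cost `distance + weight * HPL`, the `weight` parameter, and the toy road graph are all hypothetical, and the fault-ratio constraint (i) is omitted for brevity. Only constraint (ii), that every node on the path has an HPL below the HAL, is enforced.

```python
import heapq

def plan_path(edges, hpl, start, goal, hal, weight=1.0):
    """Dijkstra search over a road graph (illustrative only).

    edges: dict mapping node -> list of (neighbor, distance_m)
    hpl:   dict mapping node -> predicted horizontal protection level (m)
    Nodes whose HPL is not below `hal` are treated as unusable.
    The cost of entering a node is distance + weight * HPL (an assumed form).
    """
    if hpl[start] >= hal or hpl[goal] >= hal:
        return None
    best = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        if node == goal:
            break
        for nbr, dist in edges.get(node, []):
            if hpl[nbr] >= hal:
                continue  # violates the HAL constraint
            new_cost = cost + dist + weight * hpl[nbr]
            if new_cost < best.get(nbr, float("inf")):
                best[nbr] = new_cost
                prev[nbr] = node
                heapq.heappush(heap, (new_cost, nbr))
    if goal not in best:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], best[goal]

# Hypothetical toy graph: two routes from A to D, via B (short, poor HPL)
# or via C (longer, good HPL).
edges = {"A": [("B", 100.0), ("C", 120.0)], "B": [("D", 100.0)], "C": [("D", 120.0)]}
hpl = {"A": 5.0, "B": 30.0, "C": 8.0, "D": 5.0}

print(plan_path(edges, hpl, "A", "D", hal=50.0))  # B is allowed
print(plan_path(edges, hpl, "A", "D", hal=25.0))  # B exceeds the HAL
```

In this toy example, tightening the HAL or increasing `weight` pushes the planner onto the longer route with the lower predicted HPL.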
-
RSS-based LTE Base Station Localization Using Single Receiver in Environment with Unknown Path-Loss Exponent
Authors:
Suhui Jeong,
Halim Lee,
Taewon Kang,
Jiwon Seo
Abstract:
With the increasing demand for location-based services, localization technology research has recently intensified. Received signal strength (RSS)-based localization has the advantage of simplicity. However, as RSS-based localization requires the path-loss model parameters, it is difficult to use in places where those parameters are unknown. In prior research, a transmitter localization algorithm with multiple stationary receivers was proposed for use under unknown path-loss exponent (PLE) conditions. However, if a mobile receiver is utilized, localization would be possible with a single receiver alone. In this paper, we suggest a method of RSS-based LTE base station (BS) localization with a single mobile receiver when the PLE is unknown. We also propose an efficient mobile-receiver movement method to improve the PLE estimation and BS localization accuracy. Simulation results demonstrate the performance of the proposed methods.
Submitted 24 September, 2020;
originally announced September 2020.
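To make the idea of localizing a transmitter while the PLE is unknown more concrete, here is a minimal sketch that assumes the common log-distance model RSS = p0 - 10 n log10(d). It is not the estimator from the paper: the grid search, the function names `fit_p0_and_ple` and `locate_bs`, and the noiseless synthetic data are all assumptions made for illustration. For each candidate BS position the model is linear in (p0, n), so those two parameters are fitted in closed form and the candidate with the smallest residual is kept.

```python
import math

def fit_p0_and_ple(bs_xy, rx_points, rss_dbm):
    """Least-squares fit of rss = p0 - 10*n*log10(d) for one candidate BS position.

    Returns (p0, n, sum_squared_residuals), or None if the fit is degenerate.
    """
    dists = [math.hypot(x - bs_xy[0], y - bs_xy[1]) for x, y in rx_points]
    if min(dists) < 1e-9:
        return None  # candidate coincides with a receiver position
    xs = [-10.0 * math.log10(d) for d in dists]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(rss_dbm) / len(rss_dbm)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx < 1e-12:
        return None  # all distances equal, PLE not observable
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, rss_dbm))
    n = sxy / sxx
    p0 = y_mean - n * x_mean
    sse = sum((y - (p0 + n * x)) ** 2 for x, y in zip(xs, rss_dbm))
    return p0, n, sse

def locate_bs(rx_points, rss_dbm, grid):
    """Return (position, p0, n, sse) for the grid candidate with the smallest residual."""
    best = None
    for cand in grid:
        fit = fit_p0_and_ple(cand, rx_points, rss_dbm)
        if fit is None:
            continue
        p0, n, sse = fit
        if best is None or sse < best[3]:
            best = (cand, p0, n, sse)
    return best

# Synthetic, noiseless example: a single receiver moving along an L-shaped track.
rx_points = [(0, 0), (10, 0), (20, 0), (30, 0), (40, 0), (40, 10), (40, 20), (40, 30)]
true_bs, true_p0, true_n = (15.0, 25.0), -30.0, 3.0
rss_dbm = [
    true_p0 - 10.0 * true_n * math.log10(math.hypot(x - true_bs[0], y - true_bs[1]))
    for x, y in rx_points
]
grid = [(float(x), float(y)) for x in range(-50, 51, 5) for y in range(-50, 51, 5)]

est_bs, est_p0, est_n, est_sse = locate_bs(rx_points, rss_dbm, grid)
print(est_bs, round(est_p0, 3), round(est_n, 3))
```

The L-shaped track is used because measurements taken along a single straight line cannot distinguish a transmitter from its mirror image across that line, which is one reason the receiver's movement pattern matters.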
-
A Preliminary Study of Machine-Learning-Based Ranging with LTE Channel Impulse Response in Multipath Environment
Authors:
Halim Lee,
Jiwon Seo
Abstract:
Alternative navigation technology to global navigation satellite systems (GNSSs) is required for unmanned ground vehicles (UGVs) in multipath environments (such as urban areas). In urban areas, long-term evolution (LTE) signals can be received ubiquitously at high power without any additional infrastructure. We present a machine learning approach to estimate the range between the LTE base station and UGV based on the LTE channel impulse response (CIR). The CIR, which includes information of signal attenuation from the channel, was extracted from the LTE physical layer using a software-defined radio (SDR). We designed a convolutional neural network (CNN) that estimates ranges with the CIR as input. The proposed method demonstrated better ranging performance than a received signal strength indicator (RSSI)-based method during our field test.
Submitted 24 September, 2020;
originally announced September 2020.
-
Neural Network-Based Ranging with LTE Channel Impulse Response for Localization in Indoor Environments
Authors:
Halim Lee,
Ali A. Abdallah,
Jongmin Park,
Jiwon Seo,
Zaher M. Kassas
Abstract:
A neural network (NN)-based approach for indoor localization via cellular long-term evolution (LTE) signals is proposed. The approach estimates, from the channel impulse response (CIR), the range between an LTE eNodeB and a receiver. A software-defined radio (SDR) extracts the CIR, which is fed to a long short-term memory model (LSTM) recurrent neural network (RNN) to estimate the range. Experimental results are presented comparing the proposed approach against a baseline RNN without LSTM. The results show a receiver navigating for 100 m in an indoor environment, while receiving signals from one LTE eNodeB. The ranging root-mean squared error (RMSE) and ranging maximum error along the receiver's trajectory were reduced from 13.11 m and 55.68 m, respectively, in the baseline RNN to 9.02 m and 27.40 m, respectively, with the proposed RNN-LSTM.
Submitted 24 September, 2020;
originally announced September 2020.
-
Effect of Outlier Removal from Temporal ASF Corrections on Multichain Loran Positioning Accuracy
Authors:
Jongmin Park,
Pyo-Woong Son,
Woohyun Kim,
Joon Hyo Rhee,
Jiwon Seo
Abstract:
The widely used global navigation satellite systems (GNSSs) are vulnerable to radio frequency interference (RFI). Long-range navigation (Loran), a terrestrial navigation system, can compensate for this weakness; however, it suffers from low positioning accuracy, and studies are under way to improve its positioning performance. One such study has proposed the multichain Loran positioning method that uses the signals of transmitting stations belonging to different chains. Although the multichain Loran positioning performance is superior to the performance of conventional methods, the additional secondary factor (ASF) can still degrade its positioning accuracy. To mitigate the effects of temporal ASF, which is one of the ASF components, it is necessary to obtain temporal correction data from a nearby reference station at a known location. In this study, an experiment is performed to verify the effect of removing the outliers in the temporal correction data on the multichain Loran positioning accuracy.
Submitted 24 September, 2020;
originally announced September 2020.
-
Effects of Initial Attitude Estimation Errors on Loosely Coupled Smartphone GPS/IMU Integration System
Authors:
Kwansik Park,
Woohyun Kim,
Jiwon Seo
Abstract:
Global Positioning System (GPS) and inertial measurement unit (IMU) sensors are commonly integrated using the extended Kalman filter (EKF), for achieving better navigation performance. However, because of nonlinearity, the performance of the EKF is affected by the initial state estimation errors, and the navigation solutions, including the attitude, diverge rapidly as the initial errors increase. This paper analyzes the data obtained from an outdoor experiment, and investigates the effect of the initial errors on the attitude estimation performance using EKF, which is used in loosely coupled low-cost smartphone GPS/IMU sensors.
Submitted 24 September, 2020;
originally announced September 2020.
-
Development of Record and Management Software for GPS/Loran Measurements
Authors:
Woohyun Kim,
Pyo-Woong Son,
Joon Hyo Rhee,
Jiwon Seo
Abstract:
In this paper, a software implementation that records Global Positioning System (GPS) and long-range navigation (Loran) measurement data output from an integrated GPS/Loran receiver and organizes them based on time is proposed. The purpose of the developed software is to collect measurements from multiple Loran transmitter chains for performance analysis of navigation methods using Loran, and to organize the data based on time to make it easy to use them. In addition, GPS measurements are also collected and managed as ground truth data for performance analysis. The implemented software consists of three modules: recording, classification, and conversion. The recording module records raw text data streamed from the receiver, and the classification module classifies the recorded text data according to the message format. The conversion module parses the classified text data, sorts GPS and Loran measurements based on timestamp, and outputs them according to the software platform of the user to analyze the measurements. Each module of the software runs automatically without user intervention. The functionality of the implemented software was verified using GPS and Loran measurements collected over 24 h from an actual integrated GPS/Loran receiver.
Submitted 24 September, 2020;
originally announced September 2020.
-
Practical Simplified Indoor Multiwall Path-Loss Model
Authors:
Taewon Kang,
Jiwon Seo
Abstract:
Over the past few decades, attempts have been made to build a suitable channel prediction model to optimize radio transmission systems. In indoor radio system applications, it is particularly essential to predict the path loss due to the blockage of the signal. This paper proposes a multiwall path-loss propagation model for an indoor environment, operating at a transmission frequency of 2.45 GHz in the industrial, scientific, and medical (ISM) radio band. The effect of the number of walls traversed along the radio propagation path is considered in the model. To propose the model, previous work on well-known indoor path-loss models is discussed. Then, the path loss produced by the intervening walls in the propagation path is measured, and the terms representing the loss factors in the theoretical path-loss model are modified. The analyzed results of the path-loss factors acquired at 2.45 GHz are presented. The proposed path-loss model simplifies the loss factor term with an admissible assumption about the indoor environment and predicts the path-loss factor accurately.
Submitted 24 September, 2020;
originally announced September 2020.
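As a rough illustration of the kind of model this abstract refers to, the sketch below evaluates a generic multiwall path-loss form: a free-space reference loss at 1 m, a log-distance term, and a sum of per-wall losses. It is not the simplified model proposed in the paper. The function names, the path-loss exponent `n`, and the 3 dB wall-loss values are placeholder assumptions; only the free-space formula 20 log10(4πdf/c) is standard.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0

def free_space_loss_db(d_m, freq_hz):
    """Free-space path loss in dB: 20*log10(4*pi*d*f/c)."""
    return 20.0 * math.log10(4.0 * math.pi * d_m * freq_hz / SPEED_OF_LIGHT_M_S)

def multiwall_path_loss_db(d_m, wall_losses_db, freq_hz=2.45e9, n=2.0, d0_m=1.0):
    """Generic multiwall path loss in dB (illustrative, not the paper's fitted model).

    d_m:            transmitter-receiver distance in metres
    wall_losses_db: one loss value per wall crossed by the direct path (assumed values)
    n:              path-loss exponent (assumed)
    """
    if d_m <= 0 or d0_m <= 0:
        raise ValueError("distances must be positive")
    reference = free_space_loss_db(d0_m, freq_hz)
    return reference + 10.0 * n * math.log10(d_m / d0_m) + sum(wall_losses_db)

# Example: 10 m link at 2.45 GHz crossing two walls of 3 dB each (placeholder values).
loss_db = multiwall_path_loss_db(10.0, [3.0, 3.0])
print(round(loss_db, 2))
```

With these placeholder values, the reference loss at 1 m is about 40.2 dB, the distance term adds 20 dB, and the two walls add 6 dB.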
-
Threat from being Social: Vulnerability Analysis of Social Network Coupled Smart Grid
Authors:
Tianyi Pan,
Subhankar Mishra,
Lan N. Nguyen,
Gunhee Lee,
Jungmin Kang,
Jungtaek Seo,
My T. Thai
Abstract:
Social Networks (SNs) have been gradually applied by utility companies as an addition to the smart grid and have proved helpful in smoothing load curves and reducing energy usage. However, SNs also bring new threats to the smart grid: misinformation in SNs may cause smart grid users to alter their demand, resulting in transmission line overloading and in turn leading to catastrophic impact on the grid. In this paper, we discuss the interdependency in the social network coupled smart grid and focus on its vulnerability. That is, how much can the smart grid be damaged when misinformation related to it diffuses in SNs? To analytically study the problem, we propose the Misinformation Attack Problem in Social-Smart Grid (MAPSS), which identifies the top critical nodes in the SN, such that the smart grid can be greatly damaged when misinformation propagates from those nodes. This problem is challenging as we have to incorporate the complexity of the two networks concurrently. Nevertheless, we propose a technique that, when evaluating node criticality, explicitly takes into account information diffusion in the SN together with power flow balance and cascading failure in the smart grid in an integrated manner, based on which we propose various strategies for selecting the most critical nodes. Also, we introduce controlled load shedding as a protection strategy to reduce the impact of cascading failure. The effectiveness of our algorithms is demonstrated by experiments on IEEE bus test cases as well as the Pegase data set.
Submitted 15 May, 2020;
originally announced May 2020.
-
CAFENet: Class-Agnostic Few-Shot Edge Detection Network
Authors:
Young-Hyun Park,
Jun Seo,
Jaekyun Moon
Abstract:
We tackle a novel few-shot learning challenge, which we call few-shot semantic edge detection, aiming to localize crisp boundaries of novel categories using only a few labeled samples. We also present a Class-Agnostic Few-shot Edge detection Network (CAFENet) based on a meta-learning strategy. CAFENet employs a small-scale semantic segmentation module to compensate for the lack of semantic information in edge labels. The predicted segmentation mask is used to generate an attention map that highlights the target object region and makes the decoder module concentrate on that region. We also propose a new regularization method based on multi-split matching. In meta-training, the metric-learning problem with high-dimensional vectors is divided into small subproblems with low-dimensional sub-vectors. Since there is no existing dataset for few-shot semantic edge detection, we construct two new datasets, FSE-1000 and SBD-$5^i$, and evaluate the performance of the proposed CAFENet on them. Extensive simulation results confirm the performance merits of the techniques adopted in CAFENet.
Submitted 18 March, 2020;
originally announced March 2020.
-
Deep Learning-Based Solvability of Underdetermined Inverse Problems in Medical Imaging
Authors:
Chang Min Hyun,
Seong Hyeon Baek,
Mingyu Lee,
Sung Min Lee,
Jin Keun Seo
Abstract:
Recently, with the significant developments in deep learning techniques, solving underdetermined inverse problems has become one of the major concerns in the medical imaging domain. Typical examples include undersampled magnetic resonance imaging, interior tomography, and sparse-view computed tomography, where deep learning techniques have achieved excellent performance. Although deep learning methods appear to overcome the limitations of existing mathematical methods when handling various underdetermined problems, there is a lack of rigorous mathematical foundations that would allow us to elucidate the reasons for the remarkable performance of deep learning methods. This study focuses on learning the causal relationship regarding the structure of the training data suitable for deep learning, to solve highly underdetermined inverse problems. We observe that a majority of the problems of solving underdetermined linear systems in medical imaging are highly non-linear. Furthermore, we analyze whether a desired reconstruction map is learnable from the training data and the underdetermined system.
Submitted 25 June, 2020; v1 submitted 6 January, 2020;
originally announced January 2020.
-
Automatic Compiler Based FPGA Accelerator for CNN Training
Authors:
Shreyas Kolala Venkataramanaiah,
Yufei Ma,
Shihui Yin,
Eriko Nurvithadhi,
Aravind Dasu,
Yu Cao,
Jae-sun Seo
Abstract:
Training of convolutional neural networks (CNNs) on embedded platforms to support on-device learning has recently become increasingly important. Designing flexible training hardware is much more challenging than designing inference hardware, due to design complexity and large computation/memory requirements. In this work, we present an automatic compiler-based FPGA accelerator with 16-bit fixed-point precision for complete CNN training, including Forward Pass (FP), Backward Pass (BP) and Weight Update (WU). We implemented an optimized RTL library to perform training-specific tasks and developed an RTL compiler to automatically generate FPGA-synthesizable RTL based on user-defined constraints. We present a new cyclic weight storage/access scheme for on-chip BRAM and off-chip DRAM to efficiently implement non-transpose and transpose operations during the FP and BP phases, respectively. Representative CNNs for the CIFAR-10 dataset are implemented and trained on an Intel Stratix 10-GX FPGA using the proposed hardware architecture, demonstrating up to 479 GOPS performance.
Submitted 15 August, 2019;
originally announced August 2019.
-
Framelet Pooling Aided Deep Learning Network : The Method to Process High Dimensional Medical Data
Authors:
Chang Min Hyun,
Kang Cheol Kim,
Hyun Cheol Cho,
Jae Kyu Choi,
Jin Keun Seo
Abstract:
Machine learning-based analysis of medical images often faces several hurdles, such as the lack of training data, the curse of dimensionality, and generalization issues. One of the main difficulties is the computational cost of dealing with input data in the form of large matrices that represent medical images. The purpose of this paper is to introduce a framelet-pooling aided deep learning method for mitigating the computational burden caused by large dimensionality. By transforming high-dimensional data into low-dimensional components with filter banks while preserving detailed information, the proposed method aims to significantly reduce the complexity of the neural network and the computational cost of the learning process. Various experiments show that our method is comparable to the standard unreduced learning method, while reducing the computational burden by decomposing large-sized learning tasks into several small-scale learning tasks.
Submitted 25 July, 2019;
originally announced July 2019.
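The idea of splitting a large input into several smaller components with a filter bank, without losing detail, can be illustrated with the simplest perfect-reconstruction case, a 2x2 Haar decomposition. This is an assumed stand-in, not the framelet filters used in the paper, and the function names `haar_pool` and `haar_unpool` are hypothetical. Each sub-band has half the height and width of the input, and together the four sub-bands allow the input to be recovered exactly.

```python
def haar_pool(image):
    """Split an even-sized 2D list into four half-resolution Haar sub-bands.

    Returns (ll, lh, hl, hh): one low-pass band and three detail bands.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, rows, 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for c in range(0, cols, 2):
            a, b = image[r][c], image[r][c + 1]
            p, q = image[r + 1][c], image[r + 1][c + 1]
            row_ll.append((a + b + p + q) / 2.0)  # local average (scaled)
            row_lh.append((a - b + p - q) / 2.0)  # horizontal difference
            row_hl.append((a + b - p - q) / 2.0)  # vertical difference
            row_hh.append((a - b - p + q) / 2.0)  # diagonal difference
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, lh, hl, hh

def haar_unpool(ll, lh, hl, hh):
    """Invert haar_pool and rebuild the full-resolution image."""
    image = []
    for r in range(len(ll)):
        top, bottom = [], []
        for c in range(len(ll[0])):
            s, h, v, d = ll[r][c], lh[r][c], hl[r][c], hh[r][c]
            top.extend([(s + h + v + d) / 2.0, (s - h + v - d) / 2.0])
            bottom.extend([(s + h - v - d) / 2.0, (s - h - v + d) / 2.0])
        image.append(top)
        image.append(bottom)
    return image

# A 4x4 toy "image" becomes four 2x2 components that a smaller network could process.
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
bands = haar_pool(image)
restored = haar_unpool(*bands)
print(bands[0], restored == image)
```

Unlike max or average pooling, which discard the detail bands, this kind of decomposition keeps them as separate low-dimensional inputs.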
-
Visualizing Uncertainty and Saliency Maps of Deep Convolutional Neural Networks for Medical Imaging Applications
Authors:
Jae Duk Seo
Abstract:
Deep learning models are now used in many different industries. While safety is not a critical issue in certain domains, in the medical field it is a huge concern. Not only do we want the models to generalize well, but we also want to know the model's confidence in its decision and which features matter the most. Our team aims to develop a full pipeline that displays not only the uncertainty of the model's decision but also a saliency map showing which sets of pixels of the input image contribute most to the predictions.
Submitted 5 July, 2019;
originally announced July 2019.