
Quantifying Task Priority for Multi-Task Optimization

Abstract: The goal of multi-task learning is to learn diverse tasks within a single unified network. As each task has its own unique objective function, conflicts emerge during training, resulting in negative transfer among them. Earlier research identified these conflicting gradients in shared parameters between tasks and attempted to realign them in the same direction. However, we prove that such optimization strategies lead to sub-optimal Pareto solutions due to their inability to accurately determine the individual contributions of each parameter across various tasks. In this paper, we propose the concept of task priority to evaluate parameter contributions across different tasks. To learn task priority, we identify the type of connections related to links between parameters influenced by task-specific losses during backpropagation. The strength of connections is gauged by the magnitude of parameters to determine task priority. Based on these, we present a new method named connection strength-based optimization for multi-task learning which consists of two phases. The first phase learns the task priority within the network, while the second phase modifies the gradients while upholding this priority. This ultimately leads to finding new Pareto optimal solutions for multiple tasks. Through extensive experiments, we show that our approach greatly enhances multi-task performance in comparison to earlier gradient manipulation methods.
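For readers unfamiliar with the "realign conflicting gradients" baselines that this abstract argues against, the sketch below shows a PCGrad-style projection: when two task gradients on shared parameters have a negative dot product, the conflicting component is removed. This illustrates the earlier family of methods, not the proposed connection strength-based optimization; the task names and gradient values are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_conflicting(g_i, g_j):
    """PCGrad-style step: if g_i conflicts with g_j (negative dot product),
    subtract the component of g_i that points against g_j."""
    d = dot(g_i, g_j)
    if d >= 0:
        return list(g_i)  # no conflict: leave the gradient unchanged
    scale = d / dot(g_j, g_j)
    return [x - scale * y for x, y in zip(g_i, g_j)]


# Hypothetical gradients of two task losses w.r.t. the same shared parameters.
g_seg = [1.0, -1.0]
g_depth = [0.0, 1.0]
print(project_conflicting(g_seg, g_depth))  # [1.0, 0.0]: now orthogonal to g_depth
```

After projection the two gradients no longer oppose each other, but, as the abstract notes, this kind of realignment does not account for which task each shared parameter actually serves.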

Submitted 5 June, 2024; originally announced June 2024.

Journal ref: CVPR 2024

arXiv:2406.02541 [pdf, other]

Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting

Authors: Inkyu Shin, Qihang Yu, Xiaohui Shen, In So Kweon, Kuk-** Yoon, Liang-Chieh Chen

Abstract: Recent advancements in zero-shot video diffusion models have shown promise for text-driven video editing, but challenges remain in achieving high temporal consistency. To address this, we introduce Video-3DGS, a 3D Gaussian Splatting (3DGS)-based video refiner designed to enhance temporal consistency in zero-shot video editors. Our approach utilizes a two-stage 3D Gaussian optimizing process tailored for editing dynamic monocular videos. In the first stage, Video-3DGS employs an improved version of COLMAP, referred to as MC-COLMAP, which processes original videos using a Masked and Clipped approach. For each video clip, MC-COLMAP generates the point clouds for dynamic foreground objects and complex backgrounds. These point clouds are utilized to initialize two sets of 3D Gaussians (Frg-3DGS and Bkg-3DGS) aiming to represent foreground and background views. Both foreground and background views are then merged with a 2D learnable parameter map to reconstruct full views. In the second stage, we leverage the reconstruction ability developed in the first stage to impose temporal constraints on the video diffusion model. To demonstrate the efficacy of Video-3DGS on both stages, we conduct extensive experiments across two related tasks: Video Reconstruction and Video Editing. Trained for 3k iterations, Video-3DGS significantly improves video reconstruction quality (+3 dB and +7 dB PSNR) and training efficiency (1.9× and 4.5× faster) over NeRF-based and 3DGS-based state-of-the-art methods, respectively, on the DAVIS dataset. Moreover, it enhances video editing by ensuring temporal consistency across 58 dynamic monocular videos.
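The abstract says foreground and background renderings are "merged with a 2D learnable parameter map" but does not give the merge rule. A minimal sketch, assuming a per-pixel convex blend whose weight is the sigmoid of an unconstrained learnable logit (a common choice, not confirmed by the source), could look like this; images are grayscale nested lists to keep it dependency-free.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def merge_views(fg, bg, logits):
    """Assumed blend: out = m * fg + (1 - m) * bg with m = sigmoid(logit).
    fg, bg and logits are H x W nested lists of floats; the logits play the
    role of the hypothetical learnable 2D parameter map."""
    out = []
    for fg_row, bg_row, l_row in zip(fg, bg, logits):
        row = []
        for f, b, l in zip(fg_row, bg_row, l_row):
            m = sigmoid(l)
            row.append(m * f + (1.0 - m) * b)
        out.append(row)
    return out


fg = [[1.0, 1.0]]
bg = [[0.0, 0.0]]
logits = [[0.0, 20.0]]  # left pixel: even mix; right pixel: almost pure foreground
print(merge_views(fg, bg, logits))
```

Parameterizing the map through a sigmoid keeps every blend weight in (0, 1) while letting gradient descent update the raw logits freely.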

Submitted 5 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

Comments: Project page at https://video-3dgs-project.github.io/

arXiv:2405.19469 [pdf, other]

Constraining Inflation with the BICEP/Keck CMB Polarization Experiments

Authors: The BICEP/Keck Collaboration: P. A. R. Ade, Z. Ahmed, M. Amiri, D. Barkats, R. Basu Thakur, C. A. Bischoff, D. Beck, J. J. Bock, H. Boenish, V. Buza, J. R. Cheshire IV, J. Connors, J. Cornelison, M. Crumrine, A. Cukierman, E. V. Denison, M. Dierickx, L. Duband, M. Eiben, B. Elwood, S. Fatigoni, J. P. Filippini, M. Gao, et al. (63 additional authors not shown)

Abstract: The BICEP/Keck (BK) series of cosmic microwave background (CMB) polarization experiments has, over the past decade and a half, produced a series of field-leading constraints on cosmic inflation via measurements of the "B-mode" polarization of the CMB. Primordial B modes are directly tied to the amplitude of primordial gravitational waves (PGW), their strength parameterized by the tensor-to-scalar ratio, r, and thus the energy scale of inflation. Having set the most sensitive constraints to date on r, σ(r) = 0.009 (r_{0.05} < 0.036 at 95% C.L.), using data through the 2018 observing season ("BK18"), the BICEP/Keck program has continued to improve its dataset in the years since. We give a brief overview of the BK program and the "BK18" result before discussing the program's ongoing efforts, including the deployment and performance of the Keck Array's successor instrument, BICEP Array, improvements to data processing and internal consistency testing, new techniques such as delensing, and how those will ultimately allow BK to reach σ(r) ≲ 0.003 using data through the 2027 observing season.
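The step from r to "the energy scale of inflation" follows from standard single-field slow-roll relations: the tensor power is P_t = 2H²/(π² M_pl²), r = P_t / A_s, and H² = V / (3 M_pl²), which together give V^{1/4} = (3π² A_s r / 2)^{1/4} M_pl. The sketch below evaluates this, assuming a scalar amplitude A_s ≈ 2.1 × 10⁻⁹ and the reduced Planck mass; both constants are inputs chosen here for illustration, not values taken from the BK analysis.

```python
import math

M_PL_GEV = 2.435e18  # reduced Planck mass in GeV
A_S = 2.1e-9         # assumed scalar power-spectrum amplitude (approximate)


def inflation_energy_scale(r, a_s=A_S):
    """Slow-roll estimate V^(1/4) = (3 pi^2 A_s r / 2)^(1/4) * M_pl, in GeV."""
    return (1.5 * math.pi ** 2 * a_s * r) ** 0.25 * M_PL_GEV


for r in (0.036, 0.01):
    print(f"r = {r}: V^(1/4) ~ {inflation_energy_scale(r):.2e} GeV")
```

Because V^{1/4} scales only as r^{1/4}, even large improvements in the r constraint move the inferred energy scale by modest factors (roughly 1.4 × 10¹⁶ GeV at the BK18 upper limit r = 0.036).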

Submitted 29 May, 2024; originally announced May 2024.

Comments: 9 pages, 5 figures. Contribution to the 2024 Cosmology session of the 58th Rencontres de Moriond

arXiv:2404.16687 [pdf, other]

NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, et al. (89 additional authors not shown)

Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. The challenge addresses a major problem in the field of image and video processing, namely Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Content (AIGC). The challenge is divided into an image track and a video track. The image track uses AIGIQA-20K, which contains 20,000 AI-Generated Images (AIGIs) generated by 15 popular generative models. The image track had a total of 318 registered participants. A total of 1,646 submissions were received in the development phase and 221 in the test phase. Finally, 16 participating teams submitted their models and fact sheets. The video track uses T2VQA-DB, which contains 10,000 AI-Generated Videos (AIGVs) generated by 9 popular Text-to-Video (T2V) models. A total of 196 participants registered for the video track. A total of 991 submissions were received in the development phase and 185 in the test phase. Finally, 12 participating teams submitted their models and fact sheets. Some methods achieved better results than the baseline methods, and the winning methods in both tracks demonstrated superior prediction performance on AIGC.

Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.16484 [pdf, other]

Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, **shan Pan, Jiangxin Dong, **hui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi **, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, et al. (50 additional authors not shown)

Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF codec instead of JPEG. All the proposed methods improve PSNR fidelity over Lanczos interpolation and process images in under 10 ms. Out of the 160 participants, 25 teams submitted their code and models. The solutions present novel designs tailored for memory efficiency and runtime on edge devices. This survey describes the best solutions for real-time SR of compressed high-resolution images.
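Fidelity in this challenge is reported as PSNR relative to a Lanczos baseline. The sketch below shows the standard PSNR definition on flat lists of 8-bit pixel values and checks the 4x geometry, assuming "540p" means a 960 × 540 (16:9) frame; the pixel values are made up for illustration and the challenge's exact evaluation protocol (color space, cropping) is not reproduced here.

```python
import math


def psnr(reference, estimate, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


# Assumed geometry: a 4x upscale of 960x540 ("540p") gives 3840x2160 ("4K UHD").
assert (960 * 4, 540 * 4) == (3840, 2160)

ref = [100, 120, 140, 160]
print(psnr(ref, [101, 119, 141, 159]))  # MSE = 1, so PSNR = 20*log10(255) ~ 48.13 dB
```

Since PSNR is logarithmic in MSE, each 3 dB gain corresponds to roughly halving the mean squared error.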

Submitted 25 April, 2024; originally announced April 2024.

Comments: CVPR 2024, AI for Streaming (AIS) Workshop

arXiv:2404.05218 [pdf, other]

Multi-agent Long-term 3D Human Pose Forecasting via Interaction-aware Trajectory Conditioning

Authors: Jaewoo Jeong, Daehee Park, Kuk-** Yoon

Abstract: Human pose forecasting garners attention for its diverse applications. However, challenges in modeling the multi-modal nature of human motion and intricate interactions among agents persist, particularly with longer timescales and more agents. In this paper, we propose an interaction-aware trajectory-conditioned long-term multi-agent human pose forecasting model, utilizing a coarse-to-fine prediction approach: multi-modal global trajectories are initially forecasted, followed by respective local pose forecasts conditioned on each mode. In doing so, our Trajectory2Pose model introduces a graph-based agent-wise interaction module for a reciprocal forecast of local motion-conditioned global trajectory and trajectory-conditioned local pose. Our model effectively handles the multi-modality of human motion and the complexity of long-term multi-agent interactions, improving performance in complex environments. Furthermore, we address the lack of long-term (6s+) multi-agent (5+) datasets by constructing a new dataset from real-world images and 2D annotations, enabling a comprehensive evaluation of our proposed model. State-of-the-art prediction performance on both complex and simpler datasets confirms the generalized effectiveness of our method. The code is available at https://github.com/Jaewoo97/T2P.

Submitted 8 April, 2024; originally announced April 2024.

Comments: 2024 CVPR Highlight

arXiv:2403.19985 [pdf, other]

Stable Surface Regularization for Fast Few-Shot NeRF

Authors: Byeongin Joung, Byeong-Uk Lee, Jaesung Choe, Ukcheol Shin, Minjun Kang, Taeyeop Lee, In So Kweon, Kuk-** Yoon

Abstract: This paper proposes an algorithm for synthesizing novel views in a few-shot setup. The main concept is a stable surface regularization technique called Annealing Signed Distance Function (ASDF), which anneals the surface in a coarse-to-fine manner to accelerate convergence. We observe that the Eikonal loss, a widely known geometric regularizer, requires a dense training signal to shape the different level sets of the SDF, leading to low-fidelity results under few-shot training. In contrast, the proposed surface regularization successfully reconstructs scenes and produces high-fidelity geometry with stable training. Our method is further accelerated by utilizing a grid representation and monocular geometric priors. Finally, the proposed approach is up to 45 times faster than existing few-shot novel view synthesis methods, and it produces comparable results on the ScanNet and NeRF-Real datasets.
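The Eikonal loss mentioned above penalizes deviations of the SDF gradient norm from 1, i.e. the mean of (‖∇f(x)‖ − 1)² over sampled points. The sketch below computes it with central finite differences on an analytic sphere SDF; it illustrates the standard regularizer the authors compare against, not ASDF, and the sample points are arbitrary. Because the loss only constrains the sampled points, it gives little shaping signal where samples are sparse, which is the few-shot weakness the abstract describes.

```python
import math


def sphere_sdf(p, radius=1.0):
    """Exact signed distance to a sphere of the given radius at the origin."""
    return math.sqrt(sum(c * c for c in p)) - radius


def grad_fd(f, p, h=1e-5):
    """Central-difference gradient of a scalar field f at point p."""
    g = []
    for i in range(len(p)):
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        g.append((f(plus) - f(minus)) / (2 * h))
    return g


def eikonal_loss(f, points):
    """Mean of (||grad f|| - 1)^2; near zero when f behaves like a distance field."""
    total = 0.0
    for p in points:
        norm = math.sqrt(sum(c * c for c in grad_fd(f, p)))
        total += (norm - 1.0) ** 2
    return total / len(points)


pts = [[0.5, 0.2, 0.1], [1.5, -0.3, 0.8], [-2.0, 1.0, 0.0]]
print(eikonal_loss(sphere_sdf, pts))                     # ~0: a true SDF
print(eikonal_loss(lambda p: 2.0 * sphere_sdf(p), pts))  # ~1: gradient norm is 2
```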

Submitted 29 March, 2024; originally announced March 2024.

Comments: 3DV 2024

arXiv:2403.10052 [pdf, other]

T4P: Test-Time Training of Trajectory Prediction via Masked Autoencoder and Actor-specific Token Memory

Authors: Daehee Park, Jaeseok Jeong, Sung-Hoon Yoon, Jaewoo Jeong, Kuk-** Yoon

Abstract: Trajectory prediction is a challenging problem that requires considering interactions among multiple actors and the surrounding environment. While data-driven approaches have been used to address this complex problem, they suffer from unreliable predictions under distribution shifts during test time. Accordingly, several online learning methods have been proposed using regression loss from the ground truth of observed data leveraging the auto-labeling nature of trajectory prediction task. We mainly tackle the following two issues. First, previous works underfit and overfit as they only optimize the last layer of the motion decoder. To this end, we employ the masked autoencoder (MAE) for representation learning to encourage complex interaction modeling in shifted test distribution for updating deeper layers. Second, utilizing the sequential nature of driving data, we propose an actor-specific token memory that enables the test-time learning of actor-wise motion characteristics. Our proposed method has been validated across various challenging cross-dataset distribution shift scenarios including nuScenes, Lyft, Waymo, and Interaction. Our method surpasses the performance of existing state-of-the-art online learning methods in terms of both prediction accuracy and computational efficiency. The code is available at https://github.com/daeheepark/T4P.
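The MAE idea referenced above hides part of the input and trains the model to reconstruct it, with the loss computed only on the hidden parts. A minimal sketch for a single 2D trajectory, assuming masking at the timestep level with a fixed ratio (T4P's actual masking scheme and tokenization are not described in the abstract), might look like this; `mask_trajectory` and `reconstruction_loss` are hypothetical helper names.

```python
import random


def mask_trajectory(traj, mask_ratio=0.5, seed=None):
    """Randomly hide a fraction of timesteps. Returns the visible (t, xy) pairs
    fed to the encoder and the sorted masked indices the decoder must rebuild."""
    rng = random.Random(seed)
    n = len(traj)
    n_mask = int(round(mask_ratio * n))
    masked = sorted(rng.sample(range(n), n_mask))
    masked_set = set(masked)
    visible = [(t, xy) for t, xy in enumerate(traj) if t not in masked_set]
    return visible, masked


def reconstruction_loss(pred, traj, masked):
    """Squared error averaged over masked timesteps only (MAE-style objective)."""
    err = 0.0
    for t in masked:
        (px, py), (gx, gy) = pred[t], traj[t]
        err += (px - gx) ** 2 + (py - gy) ** 2
    return err / max(len(masked), 1)


traj = [(float(t), 0.5 * t) for t in range(10)]  # toy straight-line trajectory
visible, masked = mask_trajectory(traj, mask_ratio=0.5, seed=0)
print(len(visible), len(masked))  # 5 5
```

Because the reconstruction target comes from the observed trajectory itself, this objective needs no extra labels at test time, which is what makes it usable for test-time training.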

Submitted 15 March, 2024; originally announced March 2024.

Comments: CVPR 2024

arXiv:2402.14269 [pdf, other]

Optimal Mechanism in a Dynamic Stochastic Knapsack Environment

Authors: Jihyeok Jung, Chan-Oi Song, Deok-Joo Lee, Kiho Yoon

Abstract: This study introduces an optimal mechanism in a dynamic stochastic knapsack environment. The model features a single seller who has a fixed quantity of a perfectly divisible item. Impatient buyers with a piecewise linear utility function arrive randomly and report two-dimensional private information: marginal value and demanded quantity. We derive a revenue-maximizing dynamic mechanism in a finite discrete time framework that satisfies incentive compatibility, individual rationality, and feasibility conditions. It is achieved by characterizing buyers' utility and deriving the Bellman equation. Moreover, we propose the essential penalty scheme for incentive compatibility, as well as the allocation and payment policies. Lastly, we propose algorithms to approximate the optimal policy, based on the Monte Carlo simulation-based regression method and reinforcement learning.
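To make the role of the Bellman equation concrete, the sketch below solves a much simpler full-information version of the problem: the seller observes each arriving buyer's marginal value and demand, so no incentive constraints or penalties are needed. It assumes capacity is discretized into integer units, at most one buyer arrives per period with probability `p_arrive`, and buyer types come from a known finite list; all of these are simplifying assumptions for illustration, not the paper's mechanism.

```python
from functools import lru_cache


def optimal_revenue(T, Q, p_arrive, types):
    """Full-information Bellman recursion for a finite-horizon stochastic knapsack.

    V(t, q) = (1 - p) V(t+1, q) + p * sum_types prob * max_k [v*k + V(t+1, q-k)],
    with k ranging over 0..min(demand, q) units sold to the arriving buyer.
    types: list of (probability, value_per_unit, demand) tuples (hypothetical).
    """
    @lru_cache(maxsize=None)
    def V(t, q):
        if t == T or q == 0:
            return 0.0  # horizon reached or nothing left to sell
        no_arrival = V(t + 1, q)
        with_arrival = 0.0
        for prob, v, d in types:
            best = max(v * k + V(t + 1, q - k) for k in range(min(d, q) + 1))
            with_arrival += prob * best
        return (1 - p_arrive) * no_arrival + p_arrive * with_arrival

    return V(0, Q)


print(optimal_revenue(T=2, Q=2, p_arrive=1.0, types=[(1.0, 3.0, 1)]))  # 6.0
```

In the paper's setting, buyers' values and demands are private, so the allocation and payment rules must additionally make truthful reporting optimal; the recursion above only gives the full-information benchmark.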

Submitted 21 February, 2024; originally announced February 2024.

Comments: 8 pages, 1 figure; presented at the 38th AAAI Conference on Artificial Intelligence

arXiv:2402.11837 [pdf, other]

Self-Guided Robust Graph Structure Refinement

Authors: Yeonjun In, Kanghoon Yoon, Kibum Kim, Kijung Shin, Chanyoung Park

Abstract: Recent studies have revealed that GNNs are vulnerable to adversarial attacks. To defend against such attacks, robust graph structure refinement (GSR) methods aim at minimizing the effect of adversarial edges based on node features, graph structure, or external information. However, we have discovered that existing GSR methods are limited by narrow assumptions, such as assuming clean node features, moderate structural attacks, and the availability of external clean graphs, which restricts their applicability in real-world scenarios. In this paper, we propose a self-guided GSR framework (SG-GSR), which utilizes a clean sub-graph found within the given attacked graph itself. Furthermore, we propose a novel graph augmentation and a group-training strategy to handle the two technical challenges in the clean sub-graph extraction: 1) loss of structural information, and 2) imbalanced node degree distribution. Extensive experiments demonstrate the effectiveness of SG-GSR under various scenarios including non-targeted attacks, targeted attacks, feature attacks, e-commerce fraud, and noisy node labels. Our code is available at https://github.com/yeonjun-in/torch-SG-GSR.
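The "clean node features" assumption criticized above is easiest to see in a feature-based refinement heuristic: drop edges whose endpoints have dissimilar features. The sketch below implements that generic heuristic (cosine similarity with a hypothetical threshold of 0.5); it is not SG-GSR, and it shows why such methods break when an attacker perturbs the features themselves, since the similarity scores then become unreliable.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def prune_edges(edges, features, threshold=0.5):
    """Keep only edges whose endpoint features are at least `threshold` similar."""
    return [(u, v) for u, v in edges if cosine(features[u], features[v]) >= threshold]


features = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0]}
edges = [(0, 1), (0, 2), (1, 2)]
print(prune_edges(edges, features))  # [(0, 1)]
```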

Submitted 2 March, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: This paper has been accepted by TheWebConf 2024 (Oral Presentation)

arXiv:2401.09786 [pdf, other]

Adaptive Self-training Framework for Fine-grained Scene Graph Generation

Authors: Kibum Kim, Kanghoon Yoon, Yeonjun In, **young Moon, Donghyun Kim, Chanyoung Park

Abstract: Scene graph generation (SGG) models have suffered from inherent problems regarding the benchmark datasets such as the long-tailed predicate distribution and missing annotation problems. In this work, we aim to alleviate the long-tailed problem of SGG by utilizing unannotated triplets. To this end, we introduce a Self-Training framework for SGG (ST-SGG) that assigns pseudo-labels for unannotated triplets based on which the SGG models are trained. While there has been significant progress in self-training for image recognition, designing a self-training framework for the SGG task is more challenging due to its inherent nature such as the semantic ambiguity and the long-tailed distribution of predicate classes. Hence, we propose a novel pseudo-labeling technique for SGG, called Class-specific Adaptive Thresholding with Momentum (CATM), which is a model-agnostic framework that can be applied to any existing SGG model. Furthermore, we devise a graph structure learner (GSL) that is beneficial when adopting our proposed self-training framework to the state-of-the-art message-passing neural network (MPNN)-based SGG models. Our extensive experiments verify the effectiveness of ST-SGG on various SGG models, particularly in enhancing the performance on fine-grained predicate classes.
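The general idea behind class-specific adaptive thresholds with momentum can be sketched as follows: each predicate class keeps its own confidence threshold, updated as an exponential moving average of observed confidences, and an unannotated triplet receives a pseudo-label only if its confidence clears the threshold of its predicted class. The update rule below is a hypothetical simplification for illustration; CATM's actual rule is defined in the paper and may differ.

```python
def update_thresholds(thresholds, batch, momentum=0.9):
    """Hypothetical EMA update: for each (predicted_class, confidence) pair,
    move that class's threshold toward the observed confidence."""
    for cls, conf in batch:
        thresholds[cls] = momentum * thresholds[cls] + (1 - momentum) * conf
    return thresholds


def pseudo_label(predictions, thresholds):
    """Assign a pseudo-label only when confidence reaches that class's own threshold."""
    return [cls if conf >= thresholds[cls] else None for cls, conf in predictions]


thresholds = {"on": 0.5, "riding": 0.5}
update_thresholds(thresholds, [("on", 0.9)])  # frequent class: threshold rises to ~0.54
print(pseudo_label([("on", 0.6), ("riding", 0.4)], thresholds))  # ['on', None]
```

Keeping a separate threshold per class lets rare predicates, whose confidences tend to be lower, avoid being filtered out by a single global cutoff tuned to frequent ones.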

Submitted 18 January, 2024; originally announced January 2024.

Comments: 9 pages; ICLR 2024

arXiv:2312.15906 [pdf, other]

Improving Transferability for Cross-domain Trajectory Prediction via Neural Stochastic Differential Equation

Authors: Daehee Park, Jaewoo Jeong, Kuk-** Yoon

Abstract: Multi-agent trajectory prediction is crucial for various practical applications, spurring the construction of many large-scale trajectory datasets, including vehicles and pedestrians. However, discrepancies exist among datasets due to external factors and data acquisition strategies. External factors include geographical differences and driving styles, while data acquisition strategies include data acquisition rate, history/prediction length, and detector/tracker error. Consequently, the proficient performance of models trained on large-scale datasets has limited transferability on other small-size datasets, bounding the utilization of existing large-scale datasets. To address this limitation, we propose a method based on continuous and stochastic representations of Neural Stochastic Differential Equations (NSDE) for alleviating discrepancies due to data acquisition strategy. We utilize the benefits of continuous representation for handling arbitrary time steps and the use of stochastic representation for handling detector/tracker errors. Additionally, we propose a dataset-specific diffusion network and its training framework to handle dataset-specific detection/tracking errors. The effectiveness of our method is validated against state-of-the-art trajectory prediction models on the popular benchmark datasets: nuScenes, Argoverse, Lyft, INTERACTION, and Waymo Open Motion Dataset (WOMD). Performance gains across various source and target dataset configurations show the general competence of our approach in addressing cross-dataset discrepancies.
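The benefit of a continuous-time stochastic representation is that the state can be evaluated on any time grid, regular or not. The sketch below integrates a scalar SDE dx = f(x, t) dt + g(x, t) dW with the Euler–Maruyama scheme on an arbitrary grid; the drift and diffusion functions stand in for the learned networks and are assumptions for illustration, not the paper's model or solver.

```python
import math
import random


def euler_maruyama(x0, drift, diffusion, t_grid, seed=0):
    """Integrate dx = drift(x, t) dt + diffusion(x, t) dW on an arbitrary,
    possibly irregular, increasing time grid. Returns the state at each grid time."""
    rng = random.Random(seed)
    xs = [x0]
    x = x0
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        dt = t1 - t0
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x = x + drift(x, t0) * dt + diffusion(x, t0) * dw
        xs.append(x)
    return xs


# Deterministic check: with zero diffusion, dx = -x dt gives x(1) = exp(-1).
grid = [i / 10000 for i in range(10001)]
xs = euler_maruyama(1.0, lambda x, t: -x, lambda x, t: 0.0, grid)

# The same solver runs on an irregular grid, e.g. mixed sampling rates.
noisy = euler_maruyama(1.0, lambda x, t: -x, lambda x, t: 0.1, [0.0, 0.1, 0.35, 0.4, 1.0])
```

The diffusion term is where tracker or detector noise could be modeled; setting it to zero recovers an ordinary (neural) ODE.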

Submitted 26 December, 2023; originally announced December 2023.

Comments: AAAI24

arXiv:2312.13947 [pdf, other]

PhysRFANet: Physics-Guided Neural Network for Real-Time Prediction of Thermal Effect During Radiofrequency Ablation Treatment

Authors: Minwoo Shin, Minjee Seo, Seonaeng Cho, Juil Park, Joon Ho Kwon, Deukhee Lee, Kyungho Yoon

Abstract: Radiofrequency ablation (RFA) is a widely used minimally invasive technique for ablating solid tumors. Achieving precise personalized treatment necessitates feedback information on in situ thermal effects induced by the RFA procedure. While computer simulation facilitates the prediction of electrical and thermal phenomena associated with RFA, its practical implementation in clinical settings is hindered by high computational demands. In this paper, we propose a physics-guided neural network model, named PhysRFANet, to enable real-time prediction of thermal effect during RFA treatment. The networks, designed for predicting temperature distribution and the corresponding ablation lesion, were trained using biophysical computational models that integrated electrostatics, bio-heat transfer, and cell necrosis, alongside magnetic resonance (MR) images of breast cancer patients. Validation of the computational model was performed through experiments on ex vivo bovine liver tissue. Our model demonstrated a 96% Dice score in predicting the lesion volume and an RMSE of 0.4854 for temperature distribution when tested with foreseen tumor images. Notably, even with unforeseen images, it achieved a 93% Dice score for the ablation lesion and an RMSE of 0.6783 for temperature distribution. All networks were capable of inferring results within 10 ms. The presented technique, applied to optimize the placement of the electrode for a specific target region, holds significant promise in enhancing the safety and efficacy of RFA treatments.
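The two reported metrics have simple definitions: the Dice score measures overlap between predicted and reference lesion masks, 2|A ∩ B| / (|A| + |B|), and RMSE measures the average temperature error. The sketch below computes both on flat lists; the abstract does not state the temperature units or any normalization behind its RMSE values, so none is assumed here.

```python
import math


def dice(pred_mask, true_mask):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks given as flat 0/1 lists."""
    inter = sum(p * t for p, t in zip(pred_mask, true_mask))
    total = sum(pred_mask) + sum(true_mask)
    return 1.0 if total == 0 else 2.0 * inter / total  # two empty masks agree


def rmse(pred, target):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


print(dice([1, 1, 0, 0], [1, 0, 0, 0]))  # 2*1 / (2 + 1) ~ 0.667
print(rmse([1.0, 2.0], [1.0, 4.0]))      # sqrt(4 / 2) ~ 1.414
```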

Submitted 21 December, 2023; originally announced December 2023.

arXiv:2311.12630 [pdf, other]

Hierarchical Joint Graph Learning and Multivariate Time Series Forecasting

Authors: Juhyeon Kim, Hyungeun Lee, Seungwon Yu, Ung Hwang, Wooyul Jung, Miseon Park, Kijung Yoon

Abstract: Multivariate time series is prevalent in many scientific and industrial domains. Modeling multivariate signals is challenging due to their long-range temporal dependencies and intricate interactions--both direct and indirect. To confront these complexities, we introduce a method of representing multivariate signals as nodes in a graph with edges indicating interdependency between them. Specifically, we leverage graph neural networks (GNN) and attention mechanisms to efficiently learn the underlying relationships within the time series data. Moreover, we suggest employing hierarchical signal decompositions running over the graphs to capture multiple spatial dependencies. The effectiveness of our proposed model is evaluated across various real-world benchmark datasets designed for long-term forecasting tasks. The results consistently showcase the superiority of our model, achieving an average 23% reduction in mean squared error (MSE) compared to existing models.
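One common way to "learn the underlying relationships" with attention is to give each variable (graph node) an embedding and set edge weights by a softmax over scaled dot products, producing a row-stochastic adjacency matrix. The sketch below shows that construction; the embeddings are hypothetical inputs, and the paper's exact attention and graph-learning design is not specified in the abstract.

```python
import math


def attention_adjacency(embeddings):
    """Row-stochastic adjacency A[i][j] = softmax_j(e_i . e_j / sqrt(d))."""
    d = len(embeddings[0])
    adj = []
    for ei in embeddings:
        scores = [sum(a * b for a, b in zip(ei, ej)) / math.sqrt(d) for ej in embeddings]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        adj.append([e / z for e in exps])
    return adj


# Hypothetical node embeddings for three signals: the first two are similar.
emb = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
A = attention_adjacency(emb)
```

Each row of `A` sums to 1, so it can be used directly as weights for message passing from a node's neighbors in a GNN layer.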

Submitted 30 November, 2023; v1 submitted 21 November, 2023; originally announced November 2023.

Comments: Temporal Graph Learning Workshop @ NeurIPS 2023, New Orleans, United States

arXiv:2311.06834 [pdf, other]

Osteoporosis Prediction from Hand and Wrist X-rays using Image Segmentation and Self-Supervised Learning

Authors: Hyungeun Lee, Ung Hwang, Seungwon Yu, Chang-Hun Lee, Kijung Yoon

Abstract: Osteoporosis is a widespread and chronic metabolic bone disease that often remains undiagnosed and untreated due to limited access to bone mineral density (BMD) tests like Dual-energy X-ray absorptiometry (DXA). In response to this challenge, current advancements are pivoting towards detecting osteoporosis by examining alternative indicators from peripheral bone areas, with the goal of increasing screening rates without added expenses or time. In this paper, we present a method to predict osteoporosis using hand and wrist X-ray images, which are both widely accessible and affordable, though their link to DXA-based data is not thoroughly explored. Initially, our method segments the ulna, radius, and metacarpal bones using a foundational model for image segmentation. Then, we use a self-supervised learning approach to extract meaningful representations without the need for explicit labels, and move on to classify osteoporosis in a supervised manner. Our method is evaluated on a dataset with 192 individuals, cross-referencing their verified osteoporosis conditions against the standard DXA test. With a notable classification score (AUC=0.83), our model represents a pioneering effort in leveraging vision-based techniques for osteoporosis identification from the peripheral skeleton sites.
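The reported AUC of 0.83 can be read as a ranking probability: the chance that a randomly chosen osteoporosis case receives a higher score than a randomly chosen non-case. The sketch below computes AUC directly from that definition (the Mann–Whitney form, with ties counted as one half); the scores and labels are made up for illustration.

```python
def roc_auc(scores, labels):
    """AUC = P(score of random positive > score of random negative), ties count 1/2.
    O(n_pos * n_neg), which is fine for cohorts of a few hundred individuals."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 3 of 4 pos/neg pairs ranked correctly: 0.75
```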

Submitted 12 November, 2023; originally announced November 2023.

Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2023, December 10th, 2023, New Orleans, United States, 10 pages

arXiv:2310.10404 [pdf, other]

LLM4SGG: Large Language Models for Weakly Supervised Scene Graph Generation

Authors: Kibum Kim, Kanghoon Yoon, Jaehyeong Jeon, Yeonjun In, **young Moon, Donghyun Kim, Chanyoung Park

Abstract: Weakly-Supervised Scene Graph Generation (WSSGG) research has recently emerged as an alternative to the fully-supervised approach that heavily relies on costly annotations. In this regard, studies on WSSGG have utilized image captions to obtain unlocalized triplets while primarily focusing on grounding the unlocalized triplets over image regions. However, they have overlooked the two issues involved in the triplet formation process from the captions: 1) Semantic over-simplification issue arises when extracting triplets from captions, where fine-grained predicates in captions are undesirably converted into coarse-grained predicates, resulting in a long-tailed predicate distribution, and 2) Low-density scene graph issue arises when aligning the triplets in the caption with entity/predicate classes of interest, where many triplets are discarded and not used in training, leading to insufficient supervision. To tackle the two issues, we propose a new approach, i.e., Large Language Model for weakly-supervised SGG (LLM4SGG), where we mitigate the two issues by leveraging the LLM's in-depth understanding of language and reasoning ability during the extraction of triplets from captions and alignment of entity/predicate classes with target data. To further engage the LLM in these processes, we adopt the idea of Chain-of-Thought and the in-context few-shot learning strategy. To validate the effectiveness of LLM4SGG, we conduct extensive experiments on Visual Genome and GQA datasets, showing significant improvements in both Recall@K and mean Recall@K compared to the state-of-the-art WSSGG methods. A further appeal is that LLM4SGG is data-efficient, enabling effective model training with a small number of training images.

Submitted 18 April, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

Comments: 8 pages; CVPR 2024
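The abstract reports both Recall@K and mean Recall@K. As a rough illustration of how the two metrics differ, the sketch below computes both under a simplifying assumption: a predicted triplet counts as a hit only if its (subject, predicate, object) labels exactly match a ground-truth triplet, ignoring the box-overlap matching used in standard SGG evaluation. The function names are hypothetical and not taken from the LLM4SGG code.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Triplet = Tuple[str, str, str]  # (subject, predicate, object); boxes are ignored in this sketch


def recall_at_k(gt: Sequence[Sequence[Triplet]],
                preds: Sequence[Sequence[Triplet]], k: int) -> float:
    """Mean over images of (# ground-truth triplets found in the top-k predictions) / (# ground truth)."""
    per_image: List[float] = []
    for gt_i, pred_i in zip(gt, preds):
        if not gt_i:
            continue  # images without ground-truth triplets do not contribute
        top_k = set(pred_i[:k])  # predictions are assumed to be sorted by confidence
        hits = sum(1 for t in gt_i if t in top_k)
        per_image.append(hits / len(gt_i))
    return sum(per_image) / len(per_image) if per_image else 0.0


def mean_recall_at_k(gt: Sequence[Sequence[Triplet]],
                     preds: Sequence[Sequence[Triplet]], k: int) -> float:
    """Recall@K computed separately for each predicate class, then averaged over classes."""
    per_class: Dict[str, List[float]] = defaultdict(list)
    for gt_i, pred_i in zip(gt, preds):
        top_k = set(pred_i[:k])
        by_predicate: Dict[str, List[Triplet]] = defaultdict(list)
        for t in gt_i:
            by_predicate[t[1]].append(t)
        for predicate, triplets in by_predicate.items():
            hits = sum(1 for t in triplets if t in top_k)
            per_class[predicate].append(hits / len(triplets))
    class_recalls = [sum(v) / len(v) for v in per_class.values()]
    return sum(class_recalls) / len(class_recalls) if class_recalls else 0.0


# Toy example: the frequent predicate "on" dominates Recall@K, while mean Recall@K
# weights the rare predicate "riding" equally.
gt = [[("man", "on", "horse"), ("dog", "on", "grass"), ("man", "riding", "horse")]]
preds = [[("man", "on", "horse"), ("dog", "on", "grass"), ("man", "riding", "horse")]]
print(recall_at_k(gt, preds, k=2))       # 0.6666666666666666
print(mean_recall_at_k(gt, preds, k=2))  # 0.5
```

Averaging per predicate class is what makes mean Recall@K sensitive to the long-tailed predicate distribution the abstract describes.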

arXiv:2309.00760 [pdf, other]

Spatial Regression With Multiplicative Errors, and Its Application With Lidar Measurements

Authors: Hojun You, Wei-Ying Wu, Chae Young Lim, Kyubaek Yoon, Jongeun Choi

Abstract: Multiplicative errors in addition to spatially referenced observations often arise in geodetic applications, particularly in surface estimation with light detection and ranging (LiDAR) measurements. However, spatial regression involving multiplicative errors remains relatively unexplored in such applications. In this regard, we present a penalized modified least squares estimator to handle the complexities of a multiplicative error structure while identifying significant variables in spatially dependent observations for surface estimation. The proposed estimator can also be applied to classical additive error spatial regression. By establishing asymptotic properties of the proposed estimator under increasing domain asymptotics with stochastic sampling design, we provide a rigorous foundation for its effectiveness. A comprehensive simulation study confirms the superior performance of our proposed estimator in accurately estimating and selecting parameters, outperforming existing approaches. To demonstrate its real-world applicability, we employ our proposed method, along with other alternative techniques, to estimate a rotational landslide surface using LiDAR measurements. The results highlight the efficacy and potential of our approach in tackling complex spatial regression problems involving multiplicative errors.

Submitted 1 September, 2023; originally announced September 2023.

arXiv:2308.11669 [pdf, other]

doi 10.1145/3583780.3615249

Class Label-aware Graph Anomaly Detection

Authors: Junghoon Kim, Yeonjun In, Kanghoon Yoon, Junmo Lee, Chanyoung Park

Abstract: Unsupervised graph anomaly detection (GAD) methods assume the lack of anomaly labels, i.e., whether a node is anomalous or not. One common observation we made from previous unsupervised methods is that they assume the absence not only of such anomaly labels but also of class labels (i.e., the class a node belongs to, as used in general node classification tasks). In this work, we study the utility of class labels for unsupervised GAD; in particular, how they enhance the detection of structural anomalies. To this end, we propose a Class Label-aware Graph Anomaly Detection framework (CLAD) that utilizes a limited amount of labeled nodes to enhance the performance of unsupervised GAD. Extensive experiments on ten datasets demonstrate the superior performance of CLAD in comparison to existing unsupervised GAD methods, even in the absence of ground-truth class label information. The source code for CLAD is available at https://github.com/jhkim611/CLAD.

Submitted 22 August, 2023; originally announced August 2023.

Comments: CIKM 2023 (short paper)

arXiv:2308.11608 [pdf, other]

doi 10.1103/PhysRevD.108.122005

A Measurement of Gravitational Lensing of the Cosmic Microwave Background Using SPT-3G 2018 Data

Authors: Z. Pan, F. Bianchini, W. L. K. Wu, P. A. R. Ade, Z. Ahmed, E. Anderes, A. J. Anderson, B. Ansarinejad, M. Archipley, K. Aylor, L. Balkenhol, P. S. Barry, R. Basu Thakur, K. Benabed, A. N. Bender, B. A. Benson, L. E. Bleem, F. R. Bouchet, L. Bryant, K. Byrum, E. Camphuis, J. E. Carlstrom, F. W. Carter, T. W. Cecil, C. L. Chang , et al. (111 additional authors not shown)

Abstract: We present a measurement of gravitational lensing over 1500 deg$^2$ of the Southern sky using SPT-3G temperature data at 95 and 150 GHz taken in 2018. The lensing amplitude relative to a fiducial Planck 2018 $Λ$CDM cosmology is found to be $1.020\pm0.060$, excluding instrumental and astrophysical systematic uncertainties. We conduct extensive systematic and null tests to check the robustness of the lensing measurements, and report a minimum-variance combined lensing power spectrum over angular multipoles of $50<L<2000$, which we use to constrain cosmological models. When analyzed alone and jointly with primary cosmic microwave background (CMB) spectra within the $Λ$CDM model, our lensing amplitude measurements are consistent with measurements from SPT-SZ, SPTpol, ACT, and Planck. Incorporating loose priors on the baryon density and other parameters including uncertainties on a foreground bias template, we obtain a $1σ$ constraint on $σ_8 Ω_{\rm m}^{0.25}=0.595 \pm 0.026$ using the SPT-3G 2018 lensing data alone, where $σ_8$ is a common measure of the amplitude of structure today and $Ω_{\rm m}$ is the matter density parameter. Combining SPT-3G 2018 lensing measurements with baryon acoustic oscillation (BAO) data, we derive parameter constraints of $σ_8 = 0.810 \pm 0.033$, $S_8 \equiv σ_8(Ω_{\rm m}/0.3)^{0.5}= 0.836 \pm 0.039$, and Hubble constant $H_0 =68.8^{+1.3}_{-1.6}$ km s$^{-1}$ Mpc$^{-1}$.
Using CMB anisotropy and lensing measurements from SPT-3G only, we provide independent constraints on the spatial curvature of $Ω_{K} = 0.014^{+0.023}_{-0.026}$ (95% C.L.) and the dark energy density of $Ω_Λ= 0.722^{+0.031}_{-0.026}$ (68% C.L.). When combining SPT-3G lensing data with SPT-3G CMB anisotropy and BAO data, we find an upper limit on the sum of the neutrino masses of $\sum m_ν< 0.30$ eV (95% C.L.).

Submitted 29 January, 2024; v1 submitted 22 August, 2023; originally announced August 2023.

Comments: Bandpower and likelihood data available at https://pole.uchicago.edu/public/data/spt3g_2018_lensing/

Journal ref: Physical Review D 108.12 (2023): 122005
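The abstract defines $S_8 \equiv σ_8(Ω_{\rm m}/0.3)^{0.5}$. The small helper below evaluates that definition and inverts it to see which matter density the quoted central values of $σ_8$ and $S_8$ imply together. This is only a back-of-the-envelope consistency check on central values (the posteriors are correlated and no uncertainties are propagated); the implied $Ω_{\rm m}$ is arithmetic, not a constraint reported in the abstract, and the function names are hypothetical.

```python
def s8(sigma8: float, omega_m: float, alpha: float = 0.5) -> float:
    """S_8 = sigma_8 * (Omega_m / 0.3)**alpha, with alpha = 0.5 as defined in the abstract."""
    return sigma8 * (omega_m / 0.3) ** alpha


def implied_omega_m(sigma8: float, s8_value: float, alpha: float = 0.5) -> float:
    """Invert the S_8 definition for Omega_m (central values only; no error propagation)."""
    return 0.3 * (s8_value / sigma8) ** (1.0 / alpha)


# Central values quoted for SPT-3G 2018 lensing + BAO.
om = implied_omega_m(0.810, 0.836)
print(round(om, 3))             # 0.32
print(round(s8(0.810, om), 3))  # 0.836
```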

arXiv:2308.09383 [pdf, other]

Label-Free Event-based Object Recognition via Joint Learning with Image Reconstruction from Events

Authors: Hoonhee Cho, Hyeonseong Kim, Yujeong Chae, Kuk-Jin Yoon

Abstract: Recognizing objects from sparse and noisy events becomes extremely difficult when paired images and category labels do not exist. In this paper, we study label-free event-based object recognition where category labels and paired images are not available. To this end, we propose a joint formulation of object recognition and image reconstruction in a complementary manner. Our method first reconstructs images from events and performs object recognition through Contrastive Language-Image Pre-training (CLIP), enabling better recognition through a rich context of images. Since the category information is essential in reconstructing images, we propose category-guided attraction loss and category-agnostic repulsion loss to bridge the textual features of predicted categories and the visual features of reconstructed images using CLIP. Moreover, we introduce a reliable data sampling strategy and local-global reconstruction consistency to boost joint learning of the two tasks. To enhance the accuracy of prediction and quality of reconstruction, we also propose a prototype-based approach using unpaired images. Extensive experiments demonstrate the superiority of our method and its extensibility for zero-shot object recognition. Our project code is available at https://github.com/Chohoonhee/Ev-LaFOR.

Submitted 18 August, 2023; originally announced August 2023.

Comments: Accepted to ICCV 2023 (Oral)
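The recognition step relies on CLIP, which scores an image against one text prompt per category by cosine similarity. The NumPy sketch below shows that generic zero-shot scoring rule only; it assumes the image and text embeddings have already been produced (random vectors stand in for them here), and it does not reproduce the paper's attraction/repulsion losses or its reconstruction network. The temperature of 0.01 mirrors CLIP's usual logit scale of 100 and is an assumption.

```python
import numpy as np


def zero_shot_probs(image_feats: np.ndarray, text_feats: np.ndarray,
                    temperature: float = 0.01) -> np.ndarray:
    """CLIP-style zero-shot classification.

    image_feats: (num_images, d) image embeddings (e.g. of reconstructed images).
    text_feats:  (num_classes, d) embeddings of one text prompt per category.
    Returns (num_images, num_classes) class probabilities.
    """
    img = image_feats / np.linalg.norm(image_feats, axis=-1, keepdims=True)
    txt = text_feats / np.linalg.norm(text_feats, axis=-1, keepdims=True)
    logits = img @ txt.T / temperature             # scaled cosine similarities
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability; softmax is unchanged
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)


# Toy usage with random vectors standing in for CLIP embeddings.
rng = np.random.default_rng(0)
text_feats = rng.normal(size=(5, 512))                              # 5 hypothetical categories
image_feats = text_feats[[2, 4]] + 0.1 * rng.normal(size=(2, 512))
print(zero_shot_probs(image_feats, text_feats).argmax(axis=-1))    # [2 4]
```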

arXiv:2308.05046 [pdf, other]

RadGraph2: Modeling Disease Progression in Radiology Reports via Hierarchical Information Extraction

Authors: Sameer Khanna, Adam Dejl, Kibo Yoon, Quoc Hung Truong, Hanh Duong, Agustina Saenz, Pranav Rajpurkar

Abstract: We present RadGraph2, a novel dataset for extracting information from radiology reports that focuses on capturing changes in disease state and device placement over time. We introduce a hierarchical schema that organizes entities based on their relationships and show that using this hierarchy during training improves the performance of an information extraction model. Specifically, we propose a modification to the DyGIE++ framework, resulting in our model HGIE, which outperforms previous models in entity and relation extraction tasks. We demonstrate that RadGraph2 enables models to capture a wider variety of findings and perform better at relation extraction compared to those trained on the original RadGraph dataset. Our work provides the foundation for developing automated systems that can track disease progression over time and develop information extraction models that leverage the natural hierarchy of labels in the medical domain.

Submitted 9 August, 2023; originally announced August 2023.

Comments: Accepted at Machine Learning for Healthcare 2023

arXiv:2308.02126 [pdf, other]

Cognitive TransFuser: Semantics-guided Transformer-based Sensor Fusion for Improved Waypoint Prediction

Authors: Hwan-Soo Choi, Jongoh Jeong, Young Hoo Cho, Kuk-Jin Yoon, Jong-Hwan Kim

Abstract: Sensor fusion approaches for intelligent self-driving agents remain key to driving scene understanding given visual global contexts acquired from input sensors. Specifically, for the local waypoint prediction task, single-modality networks are still limited by their strong dependency on the sensitivity of the input sensor, and recent works therefore promote fusing multiple sensors at the feature level. While it is well known that multiple data modalities encourage mutual contextual exchange, such fusion requires global 3D scene understanding in real time with minimal computation when deployed in practical driving scenarios, thereby placing greater significance on the training strategy given a limited number of practically usable sensors. In this light, we exploit carefully selected auxiliary tasks that are highly correlated with the target task of interest (e.g., traffic light recognition and semantic segmentation) by fusing auxiliary task features and also using auxiliary heads for waypoint prediction based on imitation learning. Our RGB-LIDAR-based multi-task feature fusion network, coined Cognitive TransFuser, augments and exceeds the baseline network by a significant margin for safer and more complete road navigation in the CARLA simulator. We validate the proposed network on the Town05 Short and Town05 Long benchmarks through extensive experiments, achieving real-time inference at up to 44.2 FPS.

Submitted 31 January, 2024; v1 submitted 3 August, 2023; originally announced August 2023.

Comments: Accepted to RiTA 2023

arXiv:2307.11910 [pdf, other]

Constraining white dwarf mass and magnetic field strength of a new intermediate polar through X-ray observations

Authors: Benjamin Vermette, Ciro Salcedo, Kaya Mori, Julian Gerber, Kyung Duk Yoon, Gabriel Bridges, Charles J. Hailey, Frank Haberl, Jaesub Hong, Jonathan Grindlay, Gabriele Ponti, Gavin Ramsay

Abstract: We report a broad-band analysis of a Galactic X-ray source, CXOGBS J174517.0-321356 (J1745), with a 614-second periodicity. Chandra discovered the source in the direction of the Galactic Bulge. Gong (2022) proposed that J1745 was either an intermediate polar (IP) with a mass of ~1 $M_{\odot}$, or an ultra-compact X-ray binary (UCXB). By jointly fitting XMM-Newton and NuSTAR spectra, we rule out a UCXB origin. We have developed a physically realistic model that considers a finite magnetospheric radius, X-ray absorption from the pre-shock region, and reflection from the white dwarf (WD) surface to determine the IP properties, especially its WD mass. To assess systematic errors on the WD mass measurement, we consider a broad range of specific accretion rates ($\dot{m}$ = 0.6-44 g cm$^{-2}$ s$^{-1}$) based on the uncertain source distance (d = 3-8 kpc) and fractional accretion area (f = 0.001-0.025). Our model properly implements the fitted accretion column height in the X-ray reflection model and accounts for the underestimated mass accretion rate due to the (unobserved) soft X-ray blackbody and cyclotron cooling emissions. We found that the lowest accretion rate of $\dot{m}$ = 0.6 g cm$^{-2}$ s$^{-1}$, which corresponds to the nearest source distance and maximum f value, yields a WD mass of $(0.92\pm0.08) M_{\odot}$. However, if the accretion rate is $\dot{m} \gtrsim 3$ g cm$^{-2}$ s$^{-1}$, the WD mass is robustly measured to be $(0.81\pm0.06) M_{\odot}$, nearly independent of $\dot{m}$. The derived WD mass range is consistent with the mean WD mass of nearby IPs.
Assuming spin equilibrium between the WD and accretion disk, we constrained the WD magnetic field to $B \gtrsim 7$ MG, indicating that it could be a highly magnetized IP. Our analysis presents the most comprehensive methodology for constraining the WD mass and B-field of an IP by consolidating the effects of cyclotron cooling, finite magnetospheric radius, and accretion column height.

Submitted 25 July, 2023; v1 submitted 21 July, 2023; originally announced July 2023.

Comments: 19 pages, 6 figures, Accepted to ApJ
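Spin equilibrium is commonly expressed as the magnetospheric radius being comparable to the corotation radius, where a Keplerian orbit has the same period as the WD spin, $R_{\rm co} = (G M P^2 / 4\pi^2)^{1/3}$. The sketch below evaluates only this corotation radius, assuming the 614 s periodicity is the WD spin period and using the $0.81\,M_{\odot}$ mass quoted above. Turning it into a B-field limit additionally needs a magnetospheric-radius prescription (depending on B, the accretion rate and model-specific coefficients) that the abstract does not give, so that step is deliberately left out.

```python
import math

G_CGS = 6.674e-8     # gravitational constant [cm^3 g^-1 s^-2]
M_SUN_G = 1.989e33   # solar mass [g]


def corotation_radius_cm(mass_msun: float, spin_period_s: float) -> float:
    """Radius at which the Keplerian period equals the WD spin period:
    R_co = (G M P^2 / (4 pi^2))^(1/3), in cm."""
    gm = G_CGS * mass_msun * M_SUN_G
    return (gm * spin_period_s ** 2 / (4.0 * math.pi ** 2)) ** (1.0 / 3.0)


# 614 s periodicity, WD mass favoured at higher specific accretion rates.
r_co = corotation_radius_cm(0.81, 614.0)
print(f"{r_co:.2e} cm")  # 1.01e+10 cm
```

Because $R_{\rm co} \propto P^{2/3}$, a longer spin period pushes the equilibrium magnetosphere outward, which for a given accretion rate calls for a stronger field.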

arXiv:2306.13854 [pdf, other]

Similarity Preserving Adversarial Graph Contrastive Learning

Authors: Yeonjun In, Kanghoon Yoon, Chanyoung Park

Abstract: Recent works demonstrate that GNN models are vulnerable to adversarial attacks, which refer to imperceptible perturbations on the graph structure and node features. Among various GNN models, graph contrastive learning (GCL) based methods specifically suffer from adversarial attacks due to their inherent design that highly depends on the self-supervision signals derived from the original graph, which, however, already contains noise when the graph is attacked. To achieve adversarial robustness against such attacks, existing methods apply adversarial training (AT) to the GCL framework, treating the attacked graph as an augmentation under the GCL framework. However, we find that existing adversarially trained GCL methods achieve robustness at the expense of not being able to preserve the node feature similarity. In this paper, we propose a similarity-preserving adversarial graph contrastive learning (SP-AGCL) framework that contrasts the clean graph with two auxiliary views of different properties (i.e., the node similarity-preserving view and the adversarial view). Extensive experiments demonstrate that SP-AGCL achieves competitive performance on several downstream tasks, and show its effectiveness in various scenarios, e.g., a network with adversarial attacks, noisy labels, and heterophilous neighbors. Our code is available at https://github.com/yeonjun-in/torch-SP-AGCL.

Submitted 24 June, 2023; originally announced June 2023.

Comments: 9 pages; KDD'23
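Graph contrastive learning typically trains with an InfoNCE-style loss in which each node's embedding in one view must be closer to the same node's embedding in another view than to all other nodes. The NumPy sketch below shows that generic node-level loss and a hypothetical objective that contrasts the clean view with two auxiliary views by summing two such terms. It is an illustration of the contrastive setup the abstract describes, assuming InfoNCE with cosine similarity and temperature `tau`; it is not the SP-AGCL loss or its view construction.

```python
import numpy as np


def info_nce(z1: np.ndarray, z2: np.ndarray, tau: float = 0.5) -> float:
    """Node-level InfoNCE between two views of the same graph.

    Row i of z1 and row i of z2 embed the same node (positive pair);
    every other row of z2 acts as a negative for row i of z1.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                    # (n, n) scaled cosine similarities
    sim -= sim.max(axis=1, keepdims=True)    # stability; log-softmax is unchanged
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def two_view_loss(z_clean, z_similarity_view, z_adversarial_view, tau=0.5):
    """Hypothetical objective: contrast the clean view with each auxiliary view and sum the terms."""
    return info_nce(z_clean, z_similarity_view, tau) + info_nce(z_clean, z_adversarial_view, tau)


rng = np.random.default_rng(0)
z = rng.normal(size=(50, 16))
aligned = info_nce(z, z + 0.01 * rng.normal(size=z.shape))   # matching nodes stay aligned
mismatched = info_nce(z, np.roll(z, 1, axis=0))               # every positive pair is wrong
print(aligned < mismatched)  # True
```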

arXiv:2306.04940 [pdf, other]

LayerAct: Advanced activation mechanism utilizing layer-direction normalization for CNNs with BatchNorm

Authors: Kihyuk Yoon, Chiehyeon Lim

Abstract: In this work, we propose a novel activation mechanism aimed at establishing layer-level activation (LayerAct) functions for CNNs with BatchNorm. These functions are designed to be more noise-robust than existing element-level activation functions by reducing the layer-level fluctuation of the activation outputs due to shifts in the inputs. Moreover, the LayerAct functions achieve this noise-robustness independent of the activation's saturation state, which limits the activation output space and complicates efficient training. We present an analysis and experiments demonstrating that LayerAct functions exhibit superior noise-robustness compared to element-level activation functions, and empirically show that these functions have a zero-like mean activation. Experimental results with three clean and three out-of-distribution benchmark datasets for image classification tasks show that LayerAct functions excel in handling noisy datasets, outperforming element-level activation functions, while the performance on clean datasets is also superior in most cases.

Submitted 4 February, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

Comments: 10 pages, 3 figures, 3 tables except appendix

MSC Class: 68T07 (Primary) 68T45 (Secondary)
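A minimal NumPy sketch of the idea, assuming a SiLU-like form in which the raw input passes through unchanged and only the gating scale is computed from a layer-direction-normalized copy of the input (mean and variance taken over all channels and spatial positions of each sample). The names `layer_normalize`, `la_gate` and `la_silu` are hypothetical, and the paper's exact functions may differ. The check at the end illustrates the shift robustness the abstract claims: adding a constant to every activation of a layer leaves the layer-level gate unchanged, whereas an element-level gate such as SiLU's sigmoid(x) changes.

```python
import numpy as np


def layer_normalize(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize each sample over all non-batch axes (channels and spatial positions)."""
    axes = tuple(range(1, x.ndim))
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def la_gate(x: np.ndarray) -> np.ndarray:
    """Layer-level gate: sigmoid of the layer-normalized input."""
    return 1.0 / (1.0 + np.exp(-layer_normalize(x)))


def la_silu(x: np.ndarray) -> np.ndarray:
    """Hypothetical LayerAct-style SiLU: the raw input scaled by the layer-level gate."""
    return x * la_gate(x)


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8, 4, 4))  # (batch, channels, height, width)
shift = 3.0
print(np.allclose(la_gate(x), la_gate(x + shift)))                            # True
print(np.allclose(1 / (1 + np.exp(-x)), 1 / (1 + np.exp(-(x + shift)))))      # False
```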

arXiv:2305.18451 [pdf, other]

Shift-Robust Molecular Relational Learning with Causal Substructure

Authors: Namkyeong Lee, Kanghoon Yoon, Gyoung S. Na, Sein Kim, Chanyoung Park

Abstract: Recently, molecular relational learning, whose goal is to predict the interaction behavior between molecular pairs, has attracted a surge of interest in molecular sciences due to its wide range of applications. In this work, we propose CMRL, which is robust to the distributional shift in molecular relational learning by detecting the core substructure that is causally related to chemical reactions. To do so, we first assume a causal relationship based on the domain knowledge of molecular sciences and construct a structural causal model (SCM) that reveals the relationship between variables. Based on the SCM, we introduce a novel conditional intervention framework whose intervention is conditioned on the paired molecule. With the conditional intervention framework, our model successfully learns from the causal substructure and alleviates the confounding effect of shortcut substructures that are spuriously correlated to chemical reactions. Extensive experiments on various tasks with real-world and synthetic datasets demonstrate the superiority of CMRL over state-of-the-art baseline models. Our code is available at https://github.com/Namkyeong/CMRL.

Submitted 20 July, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

Comments: KDD 2023

arXiv:2305.14715 [pdf, other]

Leveraging Future Relationship Reasoning for Vehicle Trajectory Prediction

Authors: Daehee Park, Hobin Ryu, Yunseo Yang, Jegyeong Cho, Jiwon Kim, Kuk-Jin Yoon

Abstract: Understanding the interaction between multiple agents is crucial for realistic vehicle trajectory prediction. Existing methods have attempted to infer the interaction from the observed past trajectories of agents using pooling, attention, or graph-based methods, which rely on a deterministic approach. However, these methods can fail under complex road structures, as they cannot predict various interactions that may occur in the future. In this paper, we propose a novel approach that uses lane information to predict a stochastic future relationship among agents. To obtain a coarse future motion of agents, our method first predicts the probability of lane-level waypoint occupancy of vehicles. We then utilize the temporal probability of passing adjacent lanes for each agent pair, assuming that agents passing adjacent lanes will interact strongly. We also model the interaction using a probabilistic distribution, which allows for multiple possible future interactions. The distribution is learned from the posterior distribution of interaction obtained from ground truth future trajectories. We validate our method on popular trajectory prediction datasets: nuScenes and Argoverse. The results show that the proposed method brings remarkable performance gain in prediction accuracy, and achieves state-of-the-art performance on the long-term prediction benchmark dataset.

Submitted 24 May, 2023; originally announced May 2023.

Comments: ICLR 2023
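The assumption that agents passing adjacent lanes will interact strongly can be illustrated with a small sketch. Suppose each agent has, for every future timestep, a probability distribution over lanes (the lane-level occupancy the method predicts), and treat the two agents' occupancies as independent. Then the probability that agents i and j are on the same or adjacent lanes at step t is $\sum_{a,b} p_i[t,a]\,A[a,b]\,p_j[t,b]$. The function name, the independence assumption and the inclusion of the diagonal in the adjacency matrix are illustrative choices, not details taken from the paper.

```python
import numpy as np


def adjacent_lane_probability(p_i: np.ndarray, p_j: np.ndarray,
                              adjacency: np.ndarray) -> np.ndarray:
    """Per-timestep probability that two agents occupy adjacent (or identical) lanes.

    p_i, p_j:  (T, L) lane-occupancy probabilities for each future timestep.
    adjacency: (L, L) 0/1 matrix; 1 where two lanes are adjacent (diagonal = same lane).
    Returns a length-T array; independence between the two agents is assumed.
    """
    return np.einsum("ta,ab,tb->t", p_i, adjacency, p_j)


# Three parallel lanes 0-1-2: lane 1 touches both neighbours, lanes 0 and 2 do not touch.
A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=float)
p_i = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
p_j = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
print(adjacent_lane_probability(p_i, p_j, A))  # [1. 0.]
```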

arXiv:2304.04694 [pdf, other]

Video-kMaX: A Simple Unified Approach for Online and Near-Online Video Panoptic Segmentation

Authors: Inkyu Shin, Dahun Kim, Qihang Yu, Jun Xie, Hong-Seok Kim, Bradley Green, In So Kweon, Kuk-Jin Yoon, Liang-Chieh Chen

Abstract: Video Panoptic Segmentation (VPS) aims to achieve comprehensive pixel-level scene understanding by segmenting all pixels and associating objects in a video. Current solutions can be categorized into online and near-online approaches. Having evolved over time, each category has its own specialized designs, making it nontrivial to adapt models between different categories. To alleviate the discrepancy, in this work, we propose a unified approach for online and near-online VPS. The meta architecture of the proposed Video-kMaX consists of two components: a within-clip segmenter (for clip-level segmentation) and a cross-clip associater (for association beyond clips). We propose clip-kMaX (clip k-means mask transformer) and HiLA-MB (Hierarchical Location-Aware Memory Buffer) to instantiate the segmenter and associater, respectively. Our general formulation includes the online scenario as a special case by adopting a clip length of one. Without bells and whistles, Video-kMaX sets a new state-of-the-art on KITTI-STEP and VIPSeg for video panoptic segmentation, and VSPW for video semantic segmentation. Code will be made publicly available.

Submitted 10 April, 2023; originally announced April 2023.
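Video-kMaX builds its segmenter on the k-means mask transformer. The NumPy sketch below shows one k-means cross-attention update in the style described for kMaX-DeepLab: each pixel is hard-assigned to its highest-affinity cluster center (an argmax over the cluster axis in place of the usual softmax over pixels), and each center is updated with the sum of the value vectors of its assigned pixels. Treating the pixel axis as all T×H×W positions of a clip is an assumption about how a clip-level variant could flatten its input, and the projection matrices are random placeholders.

```python
import numpy as np


def kmeans_cross_attention(centers, pixel_feats, w_q, w_k, w_v):
    """One k-means cross-attention step.

    centers:       (N, D) cluster-center (object query) embeddings.
    pixel_feats:   (P, D) pixel features; for a clip, P could be T*H*W flattened positions.
    w_q, w_k, w_v: (D, D) projection matrices.
    Returns updated centers (N, D) and the one-hot assignment matrix (N, P).
    """
    q = centers @ w_q
    k = pixel_feats @ w_k
    v = pixel_feats @ w_v
    affinity = q @ k.T  # (N, P)
    assignment = np.zeros_like(affinity)
    # hard assignment: each pixel (column) goes to its highest-affinity center
    assignment[affinity.argmax(axis=0), np.arange(affinity.shape[1])] = 1.0
    return centers + assignment @ v, assignment


rng = np.random.default_rng(0)
D, N, T, H, W = 16, 4, 2, 8, 8
centers = rng.normal(size=(N, D))
clip_pixels = rng.normal(size=(T * H * W, D))  # clip flattened over time and space
w_q, w_k, w_v = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3))
new_centers, assignment = kmeans_cross_attention(centers, clip_pixels, w_q, w_k, w_v)
print(new_centers.shape, assignment.sum(axis=0).min(), assignment.sum(axis=0).max())  # (4, 16) 1.0 1.0
```

The hard argmax makes each pixel belong to exactly one cluster, which is what gives the update its k-means flavour.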

arXiv:2303.16730 [pdf, other]

TTA-COPE: Test-Time Adaptation for Category-Level Object Pose Estimation

Authors: Taeyeop Lee, Jonathan Tremblay, Valts Blukis, Bowen Wen, Byeong-Uk Lee, Inkyu Shin, Stan Birchfield, In So Kweon, Kuk-Jin Yoon

Abstract: Test-time adaptation methods have been gaining attention recently as a practical solution for addressing source-to-target domain gaps by gradually updating the model without requiring labels on the target data. In this paper, we propose a method of test-time adaptation for category-level object pose estimation called TTA-COPE. We design a pose ensemble approach with a self-training loss using pose-aware confidence. Unlike previous unsupervised domain adaptation methods for category-level object pose estimation, our approach processes the test data in a sequential, online manner, and it does not require access to the source domain at runtime. Extensive experimental results demonstrate that the proposed pose ensemble and the self-training loss improve category-level object pose performance during test time under both semi-supervised and unsupervised settings. Project page: https://taeyeop.com/ttacope

Submitted 29 March, 2023; originally announced March 2023.

Comments: Accepted to CVPR 2023, Project page: https://taeyeop.com/ttacope

arXiv:2303.05026 [pdf, other]

SSL^2: Self-Supervised Learning meets Semi-Supervised Learning: Multiple Sclerosis Segmentation in 7T-MRI from large-scale 3T-MRI

Authors: Jiacheng Wang, Hao Li, Han Liu, Dewei Hu, Daiwei Lu, Kee** Yoon, Kelsey Barter, Francesca Bagnato, Ipek Oguz

Abstract: Automated segmentation of multiple sclerosis (MS) lesions from MRI scans is important to quantify disease progression. In recent years, convolutional neural networks (CNNs) have shown top performance for this task when a large amount of labeled data is available. However, the accuracy of CNNs suffers when dealing with few and/or sparsely labeled datasets. A potential solution is to leverage the information available in large public datasets in conjunction with a target dataset which only has limited labeled data. In this paper, we propose a training framework, SSL2 (self-supervised-semi-supervised), for multi-modality MS lesion segmentation with limited supervision. We adopt self-supervised learning to leverage the knowledge from large public 3T datasets to tackle the limitations of a small 7T target dataset. To leverage the information from unlabeled 7T data, we also evaluate state-of-the-art semi-supervised methods for other limited annotation settings, such as small labeled training size and sparse annotations. We use the shifted-window (Swin) transformer as our backbone network. The effectiveness of self-supervised and semi-supervised training strategies is evaluated on our in-house 7T MRI dataset. The results indicate that each strategy improves lesion segmentation in both the limited-training-data and sparse-labeling scenarios. The combined overall framework further improves the performance substantially compared to either of its components alone.
Our proposed framework thus provides a promising solution for future data/label-hungry 7T MS studies.

Submitted 8 March, 2023; originally announced March 2023.

Comments: Accepted at the International Society for Optics and Photonics - Medical Imaging (SPIE-MI) 2023

arXiv:2212.00443 [pdf, other]

Unbiased Heterogeneous Scene Graph Generation with Relation-aware Message Passing Neural Network

Authors: Kanghoon Yoon, Kibum Kim, Jinyoung Moon, Chanyoung Park

Abstract: Recent scene graph generation (SGG) frameworks have focused on learning complex relationships among multiple objects in an image. Thanks to the nature of the message passing neural network (MPNN), which models high-order interactions between objects and their neighboring objects, MPNNs are the dominant representation learning modules for SGG. However, existing MPNN-based frameworks treat the scene graph as a homogeneous graph, which restricts the context-awareness of visual relations between objects. That is, they overlook the fact that the relations tend to be highly dependent on the objects with which the relations are associated. In this paper, we propose an unbiased heterogeneous scene graph generation (HetSGG) framework that captures relation-aware context using message passing neural networks. We devise a novel message passing layer, called relation-aware message passing neural network (RMP), that aggregates the contextual information of an image considering the predicate type between objects. Our extensive evaluations demonstrate that HetSGG outperforms state-of-the-art methods, especially on tail predicate classes.

Submitted 6 July, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

Comments: 9 pages; AAAI 2023
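The contrast between homogeneous and relation-aware message passing can be seen in a small sketch in which each edge's message is transformed by a weight matrix chosen by its relation (predicate) type. The NumPy code below is a generic R-GCN-style layer with per-type mean aggregation, a self-connection and a ReLU. It only illustrates the general idea of type-specific messages; it is not the paper's RMP layer, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

import numpy as np

Edge = Tuple[int, int, str]  # (source node, target node, relation/predicate type)


def relation_aware_layer(h: np.ndarray, edges: List[Edge],
                         rel_weights: Dict[str, np.ndarray],
                         w_self: np.ndarray) -> np.ndarray:
    """One R-GCN-style layer with a separate weight matrix per relation type.

    Messages of each type are mean-aggregated per target node, added to a
    self-connection term, and passed through a ReLU.
    """
    out = h @ w_self  # self-connection (a new array, safe to update in place)
    counts: Dict[Tuple[int, str], int] = {}
    for _, dst, rel in edges:
        counts[(dst, rel)] = counts.get((dst, rel), 0) + 1
    for src, dst, rel in edges:
        out[dst] += (h[src] @ rel_weights[rel]) / counts[(dst, rel)]
    return np.maximum(out, 0.0)


h = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
rel_weights = {"on": np.eye(2), "near": 2.0 * np.eye(2)}
edges = [(0, 2, "on"), (1, 2, "near")]
# Node 2 becomes [1. 2.]; nodes 0 and 1 have no incoming edges and stay at zero.
print(relation_aware_layer(h, edges, rel_weights, np.zeros((2, 2))))
```

In a homogeneous layer both edges would share one weight matrix, so node 2 could not tell an "on" neighbour from a "near" one.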

arXiv:2211.11432 [pdf, other]

MATE: Masked Autoencoders are Online 3D Test-Time Learners

Authors: M. Jehanzeb Mirza, Inkyu Shin, Wei Lin, Andreas Schriebl, Kunyang Sun, Jaesung Choe, Horst Possegger, Mateusz Kozinski, In So Kweon, Kun-** Yoon, Horst Bischof

Abstract: Our MATE is the first Test-Time-Training (TTT) method designed for 3D data, which makes deep networks trained for point cloud classification robust to distribution shifts occurring in test data. Like existing TTT methods from the 2D image domain, MATE also leverages test data for adaptation. Its test-time objective is that of a Masked Autoencoder: a large portion of each test point cloud is removed before it is fed to the network, tasked with reconstructing the full point cloud. Once the network is updated, it is used to classify the point cloud. We test MATE on several 3D object classification datasets and show that it significantly improves robustness of deep networks to several types of corruptions commonly occurring in 3D point clouds. We show that MATE is very efficient in terms of the fraction of points it needs for the adaptation. It can effectively adapt given as few as 5% of tokens of each test sample, making it extremely lightweight. Our experiments show that MATE also achieves competitive performance by adapting sparsely on the test data, which further reduces its computational overhead, making it ideal for real-time applications.

Submitted 20 March, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

Comments: Code is available at this repository: https://github.com/jmiemirza/MATE
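
The masked-autoencoder objective in the MATE abstract begins by removing a large portion of each test point cloud before the network sees it. A minimal sketch of that masking step follows. It assumes the point cloud has already been grouped into tokens and that masking is uniformly random at a fixed ratio; `mask_tokens` and the 0.9 ratio used below are hypothetical illustration choices, not MATE's code or settings.

```python
import random


def mask_tokens(tokens, mask_ratio, seed=None):
    """Randomly split tokens into (visible, masked) lists.

    Hypothetical helper: tokens stand in for local point patches. The encoder
    would see only `visible`, and a decoder would be trained to reconstruct
    the points in `masked`.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    indices = list(range(len(tokens)))
    rng.shuffle(indices)
    num_masked = int(round(mask_ratio * len(tokens)))
    masked_idx = set(indices[:num_masked])
    # Preserve the original token order within each subset.
    visible = [t for i, t in enumerate(tokens) if i not in masked_idx]
    masked = [t for i, t in enumerate(tokens) if i in masked_idx]
    return visible, masked
```

For example, with 64 tokens and a ratio of 0.9, the encoder would receive 6 tokens and the remaining 58 would be reconstruction targets.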

arXiv:2211.05910 [pdf, other]

Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report

Authors: Andrey Ignatov, Radu Timofte, Maurizio Denna, Abdel Younes, Ganzorig Gankhuyag, **gang Huh, Myeong Kyun Kim, Kihwan Yoon, Hyeon-Cheol Moon, Seungho Lee, Yoonsik Choe, **woo Jeong, Sungjei Kim, Maciej Smyl, Tomasz Latkowski, Pawel Kubik, Michal Sokolski, Yujie Ma, Jiahao Chao, Zhou Zhou, Hongfan Gao, Zhengfeng Yang, Zhenbing Zeng, Zhengyang Zhuge, Chenghua Li, et al. (71 additional authors not shown)

Abstract: Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints. In this Mobile AI challenge, we address this problem and propose the participants to design an efficient quantized image super-resolution solution that can demonstrate a real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to do a high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating an up to 60 FPS rate when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.

Submitted 7 November, 2022; originally announced November 2022.

Comments: arXiv admin note: text overlap with arXiv:2105.07825, arXiv:2105.08826, arXiv:2211.04470, arXiv:2211.03885, arXiv:2211.05256
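
The challenge above required INT8 models for an NPU that accelerates quantized networks. As a generic illustration of what INT8 quantization involves, the sketch below uses symmetric per-tensor quantization: scale = max|x| / 127, q = clamp(round(x / scale), -127, 127), and x is approximately q · scale. This scheme is an assumption chosen for clarity. It is not any participant's method and not the Synaptics toolchain.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization (generic illustration).

    Returns (list of ints in [-127, 127], scale).
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # An all-zero tensor gets scale 1.0 so that division is safe.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q, scale):
    """Map INT8 codes back to approximate real values."""
    return [qi * scale for qi in q]
```

Under this scheme, each value's round-trip error is at most half a quantization step (scale / 2). That error is what quantization-aware training and calibration try to keep from hurting output quality.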

arXiv:2210.13550 [pdf, other]

doi 10.1007/s10463-023-00895-1

Regularized Nonlinear Regression with Dependent Errors and its Application to a Biomechanical Model

Authors: Hojun You, Kyubaek Yoon, Wei-Ying Wu, Jongeun Choi, Chae Young Lim

Abstract: A biomechanical model often requires parameter estimation and selection in a known but complicated nonlinear function. Motivated by observing that data from a head-neck position tracking system, one of biomechanical models, show multiplicative time dependent errors, we develop a modified penalized weighted least squares estimator. The proposed method can be also applied to a model with non-zero mean time dependent additive errors. Asymptotic properties of the proposed estimator are investigated under mild conditions on a weight matrix and the error process. A simulation study demonstrates that the proposed estimation works well in both parameter estimation and selection with time dependent error. The analysis and comparison with an existing method for head-neck position tracking data show better performance of the proposed method in terms of the variance accounted for (VAF).

Submitted 11 October, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

Comments: The article revised in overall

Journal ref: Annals of the Institute of Statistical Mathematics, 2024
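
A penalized weighted least squares estimator minimizes a weighted residual sum of squares plus a sparsity penalty that drives parameter selection. The sketch below only evaluates an objective of that general form, $(y - f(\theta))^\top W (y - f(\theta)) + \lambda \sum_j |\theta_j|$. It does not reproduce the paper's estimator. The L1 penalty, the exponential-decay model, and the name `pwls_objective` are all illustrative assumptions, and the paper's penalty and weight matrix may differ.

```python
import math


def pwls_objective(theta, x, y, model, weight, lam):
    """Penalized weighted least squares objective (illustrative sketch).

    theta: parameter vector (list of floats)
    x, y: inputs and observed responses
    model: callable model(x_i, theta) -> predicted response
    weight: n x n weight matrix W (list of rows); off-diagonal entries let the
        criterion account for correlated, time-dependent errors
    lam: level of the assumed L1 penalty lam * sum(|theta_j|)
    """
    r = [yi - model(xi, theta) for xi, yi in zip(x, y)]
    n = len(r)
    quad = sum(r[i] * weight[i][j] * r[j] for i in range(n) for j in range(n))
    penalty = lam * sum(abs(t) for t in theta)
    return quad + penalty


def exp_decay(xi, theta):
    """Example nonlinear model a * exp(-b * x); a stand-in, not the biomechanical model."""
    a, b = theta
    return a * math.exp(-b * xi)
```

An estimator would minimize this objective over `theta` with a numerical optimizer. The penalty shrinks some parameters exactly to zero, which is how estimation and selection happen together.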

arXiv:2210.12126 [pdf, other]

One-Shot Neural Fields for 3D Object Understanding

Authors: Valts Blukis, Taeyeop Lee, Jonathan Tremblay, Bowen Wen, In So Kweon, Kuk-** Yoon, Dieter Fox, Stan Birchfield

Abstract: We present a unified and compact scene representation for robotics, where each object in the scene is depicted by a latent code capturing geometry and appearance. This representation can be decoded for various tasks such as novel view rendering, 3D reconstruction (e.g. recovering depth, point clouds, or voxel maps), collision checking, and stable grasp prediction. We build our representation from a single RGB input image at test time by leveraging recent advances in Neural Radiance Fields (NeRF) that learn category-level priors on large multiview datasets, then fine-tune on novel objects from one or few views. We expand the NeRF model for additional grasp outputs and explore ways to leverage this representation for robotics. At test-time, we build the representation from a single RGB input image observing the scene from only one viewpoint. We find that the recovered representation allows rendering from novel views, including of occluded object parts, and also for predicting successful stable grasps. Grasp poses can be directly decoded from our latent representation with an implicit grasp decoder. We experimented in both simulation and real world and demonstrated the capability for robust robotic grasping using such compact representation. Website: https://nerfgrasp.github.io

Submitted 8 August, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

Comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop (CVPRW) on XRNeRF: Advances in NeRF for the Metaverse 2023

arXiv:2210.08038 [pdf, other]

doi 10.3847/1538-4357/acc85c

BICEP / Keck XVII: Line of Sight Distortion Analysis: Estimates of Gravitational Lensing, Anisotropic Cosmic Birefringence, Patchy Reionization, and Systematic Errors

Authors: BICEP/Keck Collaboration, P. A. R. Ade, Z. Ahmed, M. Amiri, D. Barkats, R. Basu Thakur, D. Beck, C. A. Bischoff, J. J. Bock, H. Boenish, E. Bullock, V. Buza, J. R. Cheshire IV, J. Connors, J. Cornelison, M. Crumrine, A. Cukierman, E. V. Denison, M. Dierickx, L. Duband, M. Eiben, S. Fatigoni, J. P. Filippini, S. Fliescher, et al. (70 additional authors not shown)

Abstract: We present estimates of line-of-sight distortion fields derived from the 95 GHz and 150 GHz data taken by BICEP2, BICEP3, and Keck Array up to the 2018 observing season, leading to cosmological constraints and a study of instrumental and astrophysical systematics. Cosmological constraints are derived from three of the distortion fields concerning gravitational lensing from large-scale structure, polarization rotation from magnetic fields or an axion-like field, and the screening effect of patchy reionization. We measure an amplitude of the lensing power spectrum $A_L^{\phi\phi}=0.95 \pm 0.20$. We constrain polarization rotation, expressed as the coupling constant of a Chern-Simons electromagnetic term $g_{a\gamma} \leq 2.6 \times 10^{-2}/H_I$, where $H_I$ is the inflationary Hubble parameter, and an amplitude of primordial magnetic fields smoothed over 1 Mpc $B_{1\text{Mpc}} \leq 6.6 \;\text{nG}$ at 95 GHz. We constrain the root mean square of optical-depth fluctuations in a simple "crinkly surface" model of patchy reionization, finding $A^{\tau}<0.19$ ($2\sigma$) for the coherence scale of $L_c=100$. We show that all of the distortion fields of the 95 GHz and 150 GHz polarization maps are consistent with simulations including lensed-$\Lambda$CDM, dust, and noise, with no evidence for instrumental systematics. In some cases, the EB and TB quadratic estimators presented here are more sensitive than our previous map-based null tests at identifying and rejecting spurious B-modes that might arise from instrumental effects.
Finally, we verify that the standard deprojection filtering in the BICEP/Keck data processing is effective at removing temperature to polarization leakage.

Submitted 5 June, 2023; v1 submitted 14 October, 2022; originally announced October 2022.

Comments: 34 pages, 19 figures, accepted for publication in The Astrophysical Journal

Journal ref: ApJ (2023) 949 43

arXiv:2210.05684 [pdf, ps, other]

doi 10.3847/1538-4357/acb64c

BICEP / Keck XVI: Characterizing Dust Polarization through Correlations with Neutral Hydrogen

Authors: BICEP/Keck Collaboration, P. A. R. Ade, Z. Ahmed, M. Amiri, D. Barkats, R. Basu Thakur, D. Beck, C. A. Bischoff, J. J. Bock, H. Boenish, E. Bullock, V. Buza, J. R. Cheshire IV, S. E. Clark, J. Connors, J. Cornelison, M. Crumrine, A. Cukierman, E. V. Denison, M. Dierickx, L. Duband, M. Eiben, S. Fatigoni, J. P. Filippini, et al. (71 additional authors not shown)

Abstract: We characterize Galactic dust filaments by correlating BICEP/Keck and Planck data with polarization templates based on neutral hydrogen (H I) observations. Dust polarization is important for both our understanding of astrophysical processes in the interstellar medium (ISM) and the search for primordial gravitational waves in the cosmic microwave background (CMB). In the diffuse ISM, H I is strongly correlated with the dust and partly organized into filaments that are aligned with the local magnetic field. We analyze the deep BICEP/Keck data at 95, 150, and 220 GHz, over the low-column-density region of sky where BICEP/Keck has set the best limits on primordial gravitational waves. We separate the H I emission into distinct velocity components and detect dust polarization correlated with the local Galactic H I but not with the H I associated with Magellanic Stream I. We present a robust, multifrequency detection of polarized dust emission correlated with the filamentary H I morphology template down to 95 GHz. For assessing its utility for foreground cleaning, we report that the H I morphology template correlates in B modes at a $\sim$10-65$\%$ level over the multipole range $20 < \ell < 200$ with the BICEP/Keck maps, which contain contributions from dust, CMB, and noise components. We measure the spectral index of the filamentary dust component spectral energy distribution to be $\beta = 1.54 \pm 0.13$.
We find no evidence for decorrelation in this region between the filaments and the rest of the dust field or from the inclusion of dust associated with the intermediate velocity H I. Finally, we explore the morphological parameter space in the H I-based filamentary model.

Submitted 13 March, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

Comments: 27 pages, 12 figures

Journal ref: ApJ 945 72 (2023)

arXiv:2209.06589 [pdf, other]

Towards Better Generalization with Flexible Representation of Multi-Module Graph Neural Networks

Authors: Hyungeun Lee, Kijung Yoon

Abstract: Graph neural networks (GNNs) have become compelling models designed to perform learning and inference on graph-structured data. However, little work has been done to understand the fundamental limitations of GNNs for scaling to larger graphs and generalizing to out-of-distribution (OOD) inputs. In this paper, we use a random graph generator to systematically investigate how the graph size and structural properties affect the predictive performance of GNNs. We present specific evidence that the average node degree is a key feature in determining whether GNNs can generalize to unseen graphs, and that the use of multiple node update functions can improve the generalization performance of GNNs when dealing with graphs of multimodal degree distributions. Accordingly, we propose a multi-module GNN framework that allows the network to adapt flexibly to new graphs by generalizing a single canonical nonlinear transformation over aggregated inputs. Our results show that the multi-module GNNs improve the OOD generalization on a variety of inference tasks in the direction of diverse structural features.

Submitted 26 October, 2023; v1 submitted 14 September, 2022; originally announced September 2022.

Journal ref: Transactions on Machine Learning Research (TMLR) 2023

arXiv:2208.10205 [pdf, other]

doi 10.1145/3511808.3557381

LTE4G: Long-Tail Experts for Graph Neural Networks

Authors: Sukwon Yun, Kibum Kim, Kanghoon Yoon, Chanyoung Park

Abstract: Existing Graph Neural Networks (GNNs) usually assume a balanced situation where both the class distribution and the node degree distribution are balanced. However, in real-world situations, we often encounter cases where a few classes (i.e., head class) dominate other classes (i.e., tail class) as well as in the node degree perspective, and thus naively applying existing GNNs eventually fall short of generalizing to the tail cases. Although recent studies proposed methods to handle long-tail situations on graphs, they only focus on either the class long-tailedness or the degree long-tailedness. In this paper, we propose a novel framework for training GNNs, called Long-Tail Experts for Graphs (LTE4G), which jointly considers the class long-tailedness, and the degree long-tailedness for node classification. The core idea is to assign an expert GNN model to each subset of nodes that are split in a balanced manner considering both the class and degree long-tailedness. After having trained an expert for each balanced subset, we adopt knowledge distillation to obtain two class-wise students, i.e., Head class student and Tail class student, each of which is responsible for classifying nodes in the head classes and tail classes, respectively. We demonstrate that LTE4G outperforms a wide range of state-of-the-art methods in node classification evaluated on both manual and natural imbalanced graphs. The source code of LTE4G can be found at https://github.com/SukwonYun/LTE4G.

Submitted 8 September, 2022; v1 submitted 22 August, 2022; originally announced August 2022.

Comments: Accepted by CIKM 2022, the code is available at https://github.com/SukwonYun/LTE4G
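
LTE4G assigns an expert GNN to each subset of nodes split by both class and degree long-tailedness. The sketch below shows one plausible way to form such subsets. It assumes head classes are the most frequent ones and that "high degree" means degree above a fixed threshold, which yields four (class group, degree group) subsets. The function name, `head_fraction`, and `degree_threshold` are hypothetical. See the LTE4G repository linked above for the actual splitting procedure.

```python
from collections import Counter


def split_nodes(labels, degrees, head_fraction=0.2, degree_threshold=5):
    """Partition nodes by head/tail class and high/low degree (hypothetical sketch).

    labels: dict node -> class label
    degrees: dict node -> node degree
    head_fraction: fraction of the most frequent classes treated as head classes
    degree_threshold: nodes with degree > threshold count as high-degree
    Returns dict ('head'|'tail', 'high'|'low') -> sorted list of nodes.
    """
    counts = Counter(labels.values())
    ranked = [c for c, _ in counts.most_common()]
    num_head = max(1, int(len(ranked) * head_fraction))
    head_classes = set(ranked[:num_head])
    groups = {(c, d): [] for c in ("head", "tail") for d in ("high", "low")}
    for node, label in labels.items():
        cls = "head" if label in head_classes else "tail"
        deg = "high" if degrees[node] > degree_threshold else "low"
        groups[(cls, deg)].append(node)
    return {k: sorted(v) for k, v in groups.items()}
```

In the framework the abstract describes, one expert would then be trained per subset, and the experts' knowledge would be distilled into a head-class student and a tail-class student.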

arXiv:2208.02755 [pdf, other]

Thermal Testing for Cryogenic CMB Instrument Optical Design

Authors: D. C. Goldfinger, P. A. R. Ade, Z. Ahmed, M. Amiri, D. Barkats, R. Basu Thakur, D. Beck, C. A. Bischoff, J. J. Bock, V. Buza, J. Cheshire, J. Connors, J. Cornelison, M. Crumrine, A. J. Cukierman, E. V. Denison, M. I. Dierickx, L. Duband, M. Eiben, S. Fatigoni, J. P. Filippini, C. Giannakopoulos, N. Goeckner-Wald, J. Grayson, P. K. Grimes, et al. (61 additional authors not shown)

Abstract: Observations of the Cosmic Microwave Background rely on cryogenic instrumentation with cold detectors, readout, and optics providing the low noise performance and instrumental stability required to make more sensitive measurements. It is therefore critical to optimize all aspects of the cryogenic design to achieve the necessary performance, with low temperature components and acceptable system cooling requirements. In particular, we will focus on our use of thermal filters and cold optics, which reduce the thermal load passed along to the cryogenic stages. To test their performance, we have made a series of in situ measurements while integrating the third receiver for the BICEP Array telescope. In addition to characterizing the behavior of this receiver, these measurements continue to refine the models that are being used to inform design choices being made for future instruments.

Submitted 4 August, 2022; originally announced August 2022.

Comments: 9 pages, 8 figures, Proceedings of SPIE 2022

arXiv:2208.01080 [pdf, other]

2022 Upgrade and Improved Low Frequency Camera Sensitivity for CMB Observation at the South Pole

Authors: A. Soliman, P. A. R. Ade, Z. Ahmed, M. Amiri, D. Barkats, R. Basu Thakur, C. A. Bischoff, D. Beck, J. J. Bock, V. Buza, J. Cheshire, J. Connors, J. Cornelison, M. Crumrine, A. J. Cukierman, E. V. Denison, M. I. Dierickx, L. Duband, M. Eiben, S. Fatigoni, J. P. Filippini, C. Giannakopoulos, N. Goeckner-Wald, D. C. Goldfinger, J. Grayson, et al. (61 additional authors not shown)

Abstract: Constraining the Galactic foregrounds with multi-frequency Cosmic Microwave Background (CMB) observations is an essential step towards ultimately reaching the sensitivity to measure primordial gravitational waves (PGWs), the sign of inflation after the Big-Bang that would be imprinted on the CMB. The BICEP Array telescope is a set of multi-frequency cameras designed to constrain the energy scale of inflation through CMB B-mode searches while also controlling the polarized galactic foregrounds. The lowest frequency BICEP Array receiver (BA1) has been observing from the South Pole since 2020 and provides 30 GHz and 40 GHz data to characterize the Galactic synchrotron in our CMB maps. In this paper, we present the design of the BA1 detectors and the full optical characterization of the camera including the on-sky performance at the South Pole. The paper also introduces the design challenges during the first observing season including the effect of out-of-band photons on detectors performance. It also describes the tests done to diagnose that effect and the new upgrade to minimize these photons, as well as installing more dichroic detectors during the 2022 deployment season to improve the BA1 sensitivity. We finally report background noise measurements of the detectors with the goal of having photon noise dominated detectors in both optical channels. BA1 achieves an improvement in mapping speed compared to the previous deployment season.

Submitted 1 August, 2022; originally announced August 2022.

Comments: Proceedings of SPIE Astronomical Telescopes + Instrumentation 2022 (AS22)

arXiv:2207.14796 [pdf, other]

Improved Polarization Calibration of the BICEP3 CMB Polarimeter at the South Pole

Authors: J. Cornelison, C. Vergès, P. A. R. Ade, Z. Ahmed, M. Amiri, D. Barkats, R. Basu Thakur, D. Beck, C. A. Bischoff, J. J. Bock, V. Buza, J. R. Cheshire IV, J. Connors, M. Crumrine, A. J. Cukierman, E. V. Denison, M. I. Dierickx, L. Duband, M. Eiben, S. Fatigoni, J. P. Filippini, C. Giannakopoulos, N. Goeckner-Wald, D. C. Goldfinger, J. Grayson, et al. (61 additional authors not shown)

Abstract: The BICEP3 Polarimeter is a small aperture, refracting telescope, dedicated to the observation of the Cosmic Microwave Background (CMB) at 95GHz. It is designed to target degree angular scale polarization patterns, in particular the very-much-sought-after primordial B-mode signal, which is a unique signature of cosmic inflation. The polarized signal from the sky is reconstructed by differencing co-localized, orthogonally polarized superconducting Transition Edge Sensor (TES) bolometers. In this work, we present absolute measurements of the polarization response of the detectors for more than $\sim 800$ functioning detector pairs of the BICEP3 experiment, out of a total of $\sim 1000$. We use a specifically designed Rotating Polarized Source (RPS) to measure the polarization response at multiple source and telescope boresight rotation angles, to fully map the response over 360 degrees. We present here polarization properties extracted from on-site calibration data taken in January 2022. A similar calibration campaign was performed in 2018, but we found that our constraint was dominated by systematics on the level of $\sim0.5^\circ$. After a number of improvements to the calibration set-up, we are now able to report a significantly lower level of systematic contamination. In the future, such precise measurements will be used to constrain physics beyond the standard cosmological model, namely cosmic birefringence.

Submitted 25 August, 2022; v1 submitted 29 July, 2022; originally announced July 2022.

Comments: Submitted to: SPIE Astronomical Telescopes + Instrumentation (AS22)
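
The BICEP3 abstract states that the polarized sky signal is reconstructed by differencing co-located, orthogonally polarized detectors. Consider an idealized model with equal gains, perfect polarization efficiency, and no cross-polar leakage (all assumptions). A detector at angle $\psi$ then measures $I + Q\cos 2\psi + U\sin 2\psi$, and its orthogonal partner measures the same with the sign of the polarized terms flipped. The half-sum therefore recovers $I$, and the half-difference recovers $Q\cos 2\psi + U\sin 2\psi$. The sketch is illustrative only and is not the BICEP/Keck analysis pipeline. Measuring the real, non-ideal response of each detector is exactly what the calibration described above is for.

```python
import math


def detector_response(i, q, u, psi):
    """Ideal polarization-sensitive detector at angle psi (radians)."""
    return i + q * math.cos(2 * psi) + u * math.sin(2 * psi)


def pair_sum_diff(i, q, u, psi):
    """Half-sum and half-difference of an ideal orthogonal detector pair.

    Detector A is at psi, detector B at psi + 90 degrees. The half-sum isolates
    total intensity I; the half-difference isolates Q cos(2 psi) + U sin(2 psi),
    so unpolarized signal cancels in the difference.
    """
    a = detector_response(i, q, u, psi)
    b = detector_response(i, q, u, psi + math.pi / 2)
    return (a + b) / 2, (a - b) / 2
```

At $\psi = 0$ the pair difference measures $Q$, and at $\psi = 45^\circ$ it measures $U$. This is why the orientations of the detector pairs must be known precisely.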

arXiv:2206.08971 [pdf, other]

Capture the Flag for Team Construction in Cybersecurity

Authors: Sang-Yoon Chang, Kay Yoon, Simeon Wuthier, Kelei Zhang

Abstract: Team collaboration among individuals with diverse sets of expertise and skills is essential for solving complex problems. As part of an interdisciplinary effort, we studied the effects of Capture the Flag (CTF) game, a popular and engaging education/training tool in cybersecurity and engineering, in enhancing team construction and collaboration. We developed a framework to incorporate CTF as part of a computer-human process for expertise recognition and role assignment and evaluated and tested its effectiveness through a study with cybersecurity students enrolled in a Virtual Teams course. In our computer-human process framework, the post-CTF algorithm using the CTF outcomes assembles the team (assigning individuals to teams) and provides the initial role assignments, which then gets updated by human-based team discussions. This paper shares our insights, design choices/rationales, and analyses of our CTF-incorporated computer-human process framework. The students' evaluations revealed that the computer-human process framework was helpful in learning about their team members' backgrounds and expertise and assigning roles accordingly made a positive impact on the learning outcomes for the team collaboration skills in the course. This experience report showcases the utility of CTF as a tool for expertise recognition and role assignments in teams and highlights the complementary roles of CTF-based and discussion-based processes for an effective team collaboration among engineering students.

Submitted 17 June, 2022; originally announced June 2022.

Comments: 7 pages, including 6 figures and 1 table

arXiv:2205.15609 [pdf, other]

Bag of Tricks for Domain Adaptive Multi-Object Tracking

Authors: Minseok Seo, Jeongwon Ryu, Kwang** Yoon

Abstract: In this paper, SIA_Track is presented which is developed by a research team from SI Analytics. The proposed method was built from pre-existing detector and tracker under the tracking-by-detection paradigm. The tracker we used is an online tracker that merely links newly received detections with existing tracks. The core part of our method is training procedure of the object detector where synthetic and unlabeled real data were only used for training. To maximize the performance on real data, we first propose to use pseudo-labeling that generates imperfect labels for real data using a model trained with synthetic dataset. After that model soups scheme was applied to aggregate weights produced during iterative pseudo-labeling. Besides, cross-domain mixed sampling also helped to increase detection performance on real data. Our method, SIA_Track, takes the first place on MOTSynth2MOT17 track at BMTT 2022 challenge. The code is available on https://github.com/SIAnalytics/BMTT2022_SIA_track.

Submitted 31 May, 2022; originally announced May 2022.

Comments: This technical paper contains a brief overview of the proposed method, SIA_Track, which wins the MOTSynth2MOT17 track at BMTT 2022 challenge
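
SIA_Track applies a model soups scheme to aggregate weights produced during iterative pseudo-labeling. The simplest variant of that idea is a uniform soup, which averages corresponding parameters across checkpoints of the same architecture. It is sketched below with parameters stored as plain dicts of lists. The abstract does not say whether SIA_Track uses a uniform or a greedy soup, and `uniform_soup` is a hypothetical helper, not code from the linked repository.

```python
def uniform_soup(checkpoints):
    """Average corresponding parameters across checkpoints (uniform model soup).

    checkpoints: non-empty list of dicts mapping parameter name -> list of floats.
    All checkpoints must share parameter names and shapes (same architecture,
    typically fine-tuned from a common initialization).
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    keys = checkpoints[0].keys()
    for ckpt in checkpoints[1:]:
        if ckpt.keys() != keys:
            raise ValueError("checkpoints have different parameter names")
    for name in keys:
        if len({len(c[name]) for c in checkpoints}) != 1:
            raise ValueError(f"shape mismatch for parameter {name!r}")
    n = len(checkpoints)
    return {
        name: [sum(vals) / n for vals in zip(*(c[name] for c in checkpoints))]
        for name in keys
    }
```

Averaging weights instead of ensembling predictions keeps inference cost equal to that of a single model. This only works well when the checkpoints lie in a shared low-loss region, as successive pseudo-labeling rounds from one model plausibly do.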

arXiv:2205.11751 [pdf]

doi 10.34133/2021/9854040

Video Capsule Endoscopy and Ingestible Electronics: Emerging Trends in Sensors, Circuits, Materials, Telemetry, Optics, and Rapid Reading Software

Authors: Dylan Miley, Leonardo Bertoncello Machado, Calvin Condo, Albert E. Jergens, Kyoung-** Yoon, Santosh Pandey

Abstract: Real-time monitoring of the gastrointestinal tract in a safe and comfortable manner is valuable for the diagnosis and therapy of many diseases. Within this realm, our review captures the trends in ingestible capsule systems with a focus on hardware and software technologies used for capsule endoscopy and remote patient monitoring. We introduce the structure and functions of the gastrointestinal tract, and the FDA guidelines for ingestible wireless telemetric medical devices. We survey the advanced features incorporated in ingestible capsule systems, such as microrobotics, closed-loop feedback, physiological sensing, nerve stimulation, sampling and delivery, panoramic imaging with adaptive frame rates, and rapid reading software. Examples of experimental and commercialized capsule systems are presented with descriptions of their sensors, devices, and circuits for gastrointestinal health monitoring. We also show the recent research in biocompatible materials and batteries, edible electronics, and alternative energy sources for ingestible capsule systems. The results from clinical studies are discussed for the assessment of key performance indicators related to the safety and effectiveness of ingestible capsule procedures. Lastly, the present challenges and outlook are summarized with respect to the risks to health, clinical testing and approval process, and technology adoption by patients and clinicians.

Submitted 23 May, 2022; originally announced May 2022.

Journal ref: Advanced Devices & Instrumentation, 2021

arXiv:2205.08668 [pdf, other]

Learning Monocular Depth Estimation via Selective Distillation of Stereo Knowledge

Authors: Kyeongseob Song, Kuk-** Yoon

Abstract: Monocular depth estimation has been extensively explored based on deep learning, yet its accuracy and generalization ability still lag far behind the stereo-based methods. To tackle this, a few recent studies have proposed to supervise the monocular depth estimation network by distilling disparity maps as proxy ground-truths. However, these studies naively distill the stereo knowledge without considering the comparative advantages of stereo-based and monocular depth estimation methods. In this paper, we propose to selectively distill the disparity maps for more reliable proxy supervision. Specifically, we first design a decoder (MaskDecoder) that learns two binary masks which are trained to choose optimally between the proxy disparity maps and the estimated depth maps for each pixel. The learned masks are then fed to another decoder (DepthDecoder) to enforce the estimated depths to learn from only the masked area in the proxy disparity maps. Additionally, a Teacher-Student module is designed to transfer the geometric knowledge of the StereoNet to the MonoNet. Extensive experiments validate our methods achieve state-of-the-art performance for self- and proxy-supervised monocular depth estimation on the KITTI dataset, even surpassing some of the semi-supervised methods.

Submitted 17 May, 2022; originally announced May 2022.

arXiv:2204.12667 [pdf, other]

MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation

Authors: Inkyu Shin, Yi-Hsuan Tsai, Bingbing Zhuang, Samuel Schulter, Buyu Liu, Sparsh Garg, In So Kweon, Kuk-Jin Yoon

Abstract: Test-time adaptation approaches have recently emerged as a practical solution for handling domain shift without access to the source domain data. In this paper, we propose and explore a new multi-modal extension of test-time adaptation for 3D semantic segmentation. We find that directly applying existing methods usually results in performance instability at test time because the multi-modal input is not considered jointly. To design a framework that can take full advantage of multi-modality, where each modality provides regularized self-supervisory signals to the other modalities, we propose two complementary modules, operating within and across the modalities. First, Intra-modal Pseudo-label Generation (Intra-PG) is introduced to obtain reliable pseudo-labels within each modality by aggregating information from two models that are both pre-trained on source data but updated with target data at different paces. Second, Inter-modal Pseudo-label Refinement (Inter-PR) adaptively selects the more reliable pseudo-labels from different modalities based on a proposed consistency scheme. Experiments demonstrate that our regularized pseudo-labels produce stable self-learning signals in numerous multi-modal test-time adaptation scenarios for 3D semantic segmentation. Visit our project website at https://www.nec-labs.com/~mas/MM-TTA.

Submitted 26 April, 2022; originally announced April 2022.

Comments: CVPR 2022

arXiv:2204.01795 [pdf, other]

Lightweight HDR Camera ISP for Robust Perception in Dynamic Illumination Conditions via Fourier Adversarial Networks

Authors: Pranjay Shyam, Sandeep Singh Sengar, Kuk-Jin Yoon, Kyung-Soo Kim

Abstract: The limited dynamic range of commercial compact camera sensors results in an inaccurate representation of scenes with varying illumination conditions, adversely affecting image quality and subsequently limiting the performance of the underlying image processing algorithms. Current state-of-the-art (SoTA) convolutional neural networks (CNNs) are developed as post-processing techniques to independently recover under- and over-exposed images. However, when applied to images containing real-world degradations such as glare, high beams, and color bleeding with varying noise intensity, these algorithms amplify the degradations, further reducing image quality. To overcome these limitations, we propose a lightweight two-stage image enhancement algorithm that sequentially balances illumination and removes noise, using frequency priors for structural guidance. Furthermore, to ensure realistic image quality, we leverage the relationship between the frequency- and spatial-domain properties of an image and propose a Fourier spectrum-based adversarial framework (AFNet) for consistent image enhancement under varying illumination conditions. While current formulations of image enhancement are envisioned as post-processing techniques, we examine whether such an algorithm could be extended to integrate the functionality of the Image Signal Processing (ISP) pipeline within the camera sensor, benefiting from RAW sensor data and a lightweight CNN architecture. Based on quantitative and qualitative evaluations, we also examine the practicality and effects of image enhancement techniques on the performance of common perception tasks, such as object detection and semantic segmentation, in varying illumination conditions.

Submitted 14 May, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

Comments: Accepted in BMVC 2021

arXiv:2203.16556 [pdf, other]

doi 10.58027/3q8k-ew90

The Latest Constraints on Inflationary B-modes from the BICEP/Keck Telescopes

Authors: BICEP/Keck Collaboration, P. A. R. Ade, Z. Ahmed, M. Amiri, D. Barkats, R. Basu Thakur, D. Beck, C. Bischoff, J. J. Bock, H. Boenish, E. Bullock, V. Buza, J. R. Cheshire IV, J. Connors, J. Cornelison, M. Crumrine, A. Cukierman, E. V. Denison, M. Dierickx, L. Duband, M. Eiben, S. Fatigoni, J. P. Filippini, S. Fliescher, et al. (71 additional authors not shown)

Abstract: For the past decade, the BICEP/Keck collaboration has been operating a series of telescopes at the Amundsen-Scott South Pole Station, measuring the degree-scale $B$-mode polarization imprinted in the Cosmic Microwave Background (CMB) by primordial gravitational waves (PGWs). These telescopes are compact refracting polarimeters mapping about 2% of the sky, observing at a broad range of frequencies to account for the polarized foreground from Galactic synchrotron and thermal dust emission. Our latest publication, "BK18", uses the data collected up to the 2018 observing season, in conjunction with the publicly available WMAP and Planck data, to constrain the tensor-to-scalar ratio $r$. In particular, it includes (1) the 3-year BICEP3 data, which constitute the deepest CMB polarization map to date, at the foreground-minimum frequency of 95 GHz; and (2) the Keck 220 GHz map, which has a higher signal-to-noise ratio on the dust foreground than the Planck 353 GHz map. We fit the auto- and cross-spectra of these maps to a multicomponent likelihood model ($\Lambda$CDM+dust+synchrotron+noise+$r$) and find it to be an adequate description of the data at the current noise level. The likelihood analysis yields $\sigma(r)=0.009$. The inference of $r$ from our baseline model is tightened to $r_{0.05}=0.014^{+0.010}_{-0.011}$ and $r_{0.05}<0.036$ at 95% confidence, meaning that the BICEP/Keck $B$-mode data form the most powerful existing dataset for constraining PGWs. The upcoming BICEP Array telescope is projected to reach $\sigma(r) \lesssim 0.003$ using data up to 2027.

Submitted 30 March, 2022; originally announced March 2022.

Comments: 8 pages, 6 figures, contribution to the 2022 Cosmology session of the 56th Rencontres de Moriond

arXiv:2202.13144 [pdf, other]

DGSS : Domain Generalized Semantic Segmentation using Iterative Style Mining and Latent Representation Alignment

Authors: Pranjay Shyam, Antyanta Bangunharcana, Kuk-Jin Yoon, Kyung-Soo Kim

Abstract: Semantic segmentation algorithms require access to well-annotated datasets captured under diverse illumination conditions to ensure consistent performance. However, poor visibility under varying illumination conditions makes labeling laborious and error-prone. Alternatively, using synthetic samples to train segmentation algorithms has gained interest, with the drawback of a domain gap that results in sub-optimal performance. While current state-of-the-art (SoTA) methods have proposed different mechanisms to bridge the domain gap, they still perform poorly in low-illumination conditions, with an average performance drop of 10.7 mIoU. In this paper, we focus on single-source domain generalization to overcome the domain gap and propose a two-step framework wherein we first identify an adversarial style that maximizes the domain gap between stylized and source images. Subsequently, these stylized images are used to categorically align features such that features belonging to the same class are clustered together in latent space, irrespective of the domain gap. Furthermore, to increase intra-class variance during training, we propose a style mixing mechanism wherein the same objects from different styles are mixed to construct a new training image. This framework allows us to achieve a domain-generalized semantic segmentation algorithm with consistent performance, without prior information about the target domain, while relying on a single source. Based on extensive experiments, we match SoTA performance on SYNTHIA $\to$ Cityscapes and GTAV $\to$ Cityscapes, while setting a new SoTA on the GTAV $\to$ Dark Zurich and GTAV $\to$ Night Driving benchmarks without retraining.

Submitted 26 February, 2022; originally announced February 2022.

Showing 1–50 of 202 results for author: Yoon, K