Search | arXiv e-print repository

Speed-up of Data Analysis with Kernel Trick in Encrypted Domain

Authors: Joon Soo Yoo, Baek Kyung Song, Tae Min Ahn, Ji Won Heo, Ji Won Yoon

Abstract: Homomorphic encryption (HE) is pivotal for secure computation on encrypted data, crucial in privacy-preserving data analysis. However, efficiently processing high-dimensional data in HE, especially for machine learning and statistical (ML/STAT) algorithms, poses a challenge. In this paper, we present an effective acceleration method using the kernel method for HE schemes, enhancing time performance in ML/STAT algorithms within encrypted domains. This technique, independent of underlying HE mechanisms and complementing existing optimizations, notably reduces costly HE multiplications, offering near constant time complexity relative to data dimension. Aimed at accessibility, this method is tailored for data scientists and developers with limited cryptography background, facilitating advanced data analysis in secure environments.

Submitted 14 June, 2024; originally announced June 2024.

Comments: Submitted as a preprint
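
Illustrative note on the entry above: the abstract's claim of near constant cost relative to data dimension rests on the standard kernel identity that pairwise quantities can be read off a precomputed Gram matrix. The following is a minimal plaintext sketch of that identity only. It is not the authors' implementation and involves no encryption; the function names and sample data are hypothetical, and the assumption that the Gram matrix is available before the dimension-independent step is ours.

```python
# Plaintext sketch of the kernel identity; no HE library is used here.
# All names below are hypothetical and chosen for illustration.

def gram_matrix(points):
    """Linear-kernel Gram matrix K[i][j] = <x_i, x_j>.

    This is the only step whose cost grows with the data dimension d.
    """
    return [[sum(a * b for a, b in zip(p, q)) for q in points] for p in points]


def squared_distance_from_gram(K, i, j):
    """||x_i - x_j||^2 = K[i][i] - 2*K[i][j] + K[j][j].

    Uses three Gram entries, so the cost does not depend on d.
    """
    return K[i][i] - 2 * K[i][j] + K[j][j]


def squared_distance_direct(p, q):
    """Reference computation that touches all d coordinates."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


points = [
    [1.0, 2.0, 3.0, 4.0],
    [0.0, 1.0, 0.0, 1.0],
    [2.0, 2.0, 2.0, 2.0],
]
K = gram_matrix(points)

# Both routes agree for every pair of points.
for i in range(len(points)):
    for j in range(len(points)):
        direct = squared_distance_direct(points[i], points[j])
        via_gram = squared_distance_from_gram(K, i, j)
        assert abs(direct - via_gram) < 1e-9
```

In an encrypted setting the Gram-based route would replace d coordinate-wise multiplications per pair with a fixed number of operations on Gram entries; how the paper realizes this inside an HE scheme is not stated in the abstract.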

arXiv:2406.07103 [pdf, other]

MR-RawNet: Speaker verification system with multiple temporal resolutions for variable duration utterances using raw waveforms

Authors: Seung-bin Kim, Chan-yeong Lim, Jungwoo Heo, Ju-ho Kim, Hyun-seo Shin, Kyo-Won Koo, Ha-Jin Yu

Abstract: In speaker verification systems, the utilization of short utterances presents a persistent challenge, leading to performance degradation primarily due to insufficient phonetic information to characterize the speakers. To overcome this obstacle, we propose a novel structure, MR-RawNet, designed to enhance the robustness of speaker verification systems against variable duration utterances using raw waveforms. The MR-RawNet extracts time-frequency representations from raw waveforms via a multi-resolution feature extractor that optimally adjusts both temporal and spectral resolutions simultaneously. Furthermore, we apply a multi-resolution attention block that focuses on diverse and extensive temporal contexts, ensuring robustness against changes in utterance length. The experimental results, conducted on VoxCeleb1 dataset, demonstrate that the MR-RawNet exhibits superior performance in handling utterances of variable duration compared to other raw waveform-based systems.

Submitted 11 June, 2024; originally announced June 2024.

Comments: 5 pages, accepted by Interspeech 2024

arXiv:2406.02968 [pdf, other]

Adversarial Generation of Hierarchical Gaussians for 3D Generative Model

Authors: Sangeek Hyun, Jae-Pil Heo

Abstract: Most advances in 3D Generative Adversarial Networks (3D GANs) largely depend on ray casting-based volume rendering, which incurs demanding rendering costs. One promising alternative is rasterization-based 3D Gaussian Splatting (3D-GS), providing a much faster rendering speed and explicit 3D representation. In this paper, we exploit Gaussian as a 3D representation for 3D GANs by leveraging its efficient and explicit characteristics. However, in an adversarial framework, we observe that a naïve generator architecture suffers from training instability and lacks the capability to adjust the scale of Gaussians. This leads to model divergence and visual artifacts due to the absence of proper guidance for initialized positions of Gaussians and densification to manage their scales adaptively. To address these issues, we introduce a generator architecture with a hierarchical multi-scale Gaussian representation that effectively regularizes the position and scale of generated Gaussians. Specifically, we design a hierarchy of Gaussians where finer-level Gaussians are parameterized by their coarser-level counterparts; the position of finer-level Gaussians would be located near their coarser-level counterparts, and the scale would monotonically decrease as the level becomes finer, modeling both coarse and fine details of the 3D scene. Experimental results demonstrate that ours achieves a significantly faster rendering speed (x100) compared to state-of-the-art 3D consistent GANs with comparable 3D generation capability. Project page: https://hse1032.github.io/gsgan.

Submitted 5 June, 2024; originally announced June 2024.

Comments: Project page: https://hse1032.github.io/gsgan

arXiv:2406.00410 [pdf, other]

Posterior Label Smoothing for Node Classification

Authors: Jaeseung Heo, Moonjeong Park, Dongwoo Kim

Abstract: Soft labels can improve the generalization of a neural network classifier in many domains, such as image classification. Despite its success, the current literature has overlooked the efficiency of label smoothing in node classification with graph-structured data. In this work, we propose a simple yet effective label smoothing for the transductive node classification task. We design the soft label to encapsulate the local context of the target node through the neighborhood label distribution. We apply the smoothing method for seven baseline models to show its effectiveness. The label smoothing methods improve the classification accuracy in 10 node classification datasets in most cases. In the following analysis, we find that incorporating global label statistics in posterior computation is the key to the success of label smoothing. Further investigation reveals that the soft labels mitigate overfitting during training, leading to better generalization performance.

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2405.05426 [pdf, other]

ATLS: Automated Trailer Loading for Surface Vessels

Authors: Amer Abughaida, Meet Gandhi, Jun Heo, Vaishnav Tadiparthi, Yosuke Sakamoto, Joohyun Woo, Sangjae Bae

Abstract: Automated docking technologies of marine boats have been enlightened by an increasing number of literature. This paper contributes to the literature by proposing a mathematical framework that automates "trailer loading" in the presence of wind disturbances, which is unexplored despite its importance to boat owners. The comprehensive pipeline of localization, system identification, and trajectory optimization is structured, followed by several techniques to improve performance reliability. The performance of the proposed method was demonstrated with a commercial pontoon boat in Michigan, in 2023, securing a success rate of 80% in the presence of perception errors and wind disturbance. This result indicates the strong potential of the proposed pipeline, effectively accommodating the wind effect.

Submitted 8 May, 2024; originally announced May 2024.

Comments: To be presented at IEEE Intelligent Vehicles Symposium (IV 2024)

arXiv:2404.15395 [pdf, other]

doi 10.3847/1538-3881/ad32c8

Planet Hunters NGTS: New Planet Candidates from a Citizen Science Search of the Next Generation Transit Survey Public Data

Authors: Sean M. O'Brien, Megan E. Schwamb, Samuel Gill, Christopher A. Watson, Matthew R. Burleigh, Alicia Kendall, David R. Anderson, José I. Vines, James S. Jenkins, Douglas R. Alves, Laura Trouille, Solène Ulmer-Moll, Edward M. Bryant, Ioannis Apergis, Matthew P. Battley, Daniel Bayliss, Nora L. Eisner, Edward Gillen, Michael R. Goad, Maximilian N. Günther, Beth A. Henderson, Jeong-Eun Heo, David G. Jackson, Chris Lintott, James McCormac , et al. (13 additional authors not shown)

Abstract: We present the results from the first two years of the Planet Hunters NGTS citizen science project, which searches for transiting planet candidates in data from the Next Generation Transit Survey (NGTS) by enlisting the help of members of the general public. Over 8,000 registered volunteers reviewed 138,198 light curves from the NGTS Public Data Releases 1 and 2. We utilize a user weighting scheme to combine the classifications of multiple users to identify the most promising planet candidates not initially discovered by the NGTS team. We highlight the five most interesting planet candidates detected through this search, which are all candidate short-period giant planets. This includes the TIC-165227846 system that, if confirmed, would be the lowest-mass star to host a close-in giant planet. We assess the detection efficiency of the project by determining the number of confirmed planets from the NASA Exoplanet Archive and TESS Objects of Interest (TOIs) successfully recovered by this search and find that 74% of confirmed planets and 63% of TOIs detected by NGTS are recovered by the Planet Hunters NGTS project. The identification of new planet candidates shows that the citizen science approach can provide a complementary method to the detection of exoplanets with ground-based surveys such as NGTS.

Submitted 23 April, 2024; originally announced April 2024.

Comments: 42 pages, 20 figures, 17 tables. To be published in AJ

Journal ref: AJ 167 (2024) 238

arXiv:2404.04911 [pdf, other]

Comparative Study of Quantum-Circuit Scalability in a Financial Problem

Authors: Jaewoong Heo, Moonjoo Lee

Abstract: Quantum computers are extensively used in solving financial problems. Quantum amplitude estimation, an algorithm that aims to estimate the amplitude of a given quantum state, can be utilized to determine the expectation value of bonds, following the logic introduced in quantum risk analysis. As the number of evaluation qubits increases, the outcome expectation value becomes more precise. This augmentation in qubits, however, also leads to a varied escalation in circuit complexity, contingent upon the type of quantum computing device. By analyzing the number of two-qubit gates in the superconducting circuit and ion-trap quantum system, this study shows that the native gates and connectivity nature of the ion-trap system lead to less complicated quantum circuits. Across a range of experiments conducted with one to nineteen qubits, the examination reveals that the ion-trap system exhibits a two to three factor reduction in the number of required two-qubit gates when compared to the superconducting circuit system.

Submitted 7 April, 2024; originally announced April 2024.

Comments: 19 pages, 10 figures

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2404.01411 [pdf, other]

Reptile trapezoids

Authors: ** Heo

Abstract: A geometric figure is a reptile if it can be dissected into at least two similar copies congruent to each other. We prove that if a trapezoid is a reptile and not a parallelogram, then the length of each base is a linear combination of the lengths of its legs with rational coefficients. We then rule out isosceles trapezoids and right trapezoids which are not reptile. In particular, we prove that, up to similarity, there are at most six reptile right trapezoids, not a parallelogram, whose acute internal angle is a rational multiple of $π$. Finally, we present a rep-25 right trapezoid that is not a parallelogram and is not similar to any of the known reptile trapezoids.

Submitted 4 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

Comments: 42 pages, 20 figures, sections 1 and 3 improved, references added for section 3, author's affiliation added

MSC Class: 05B45 (Primary) 52C20 (Secondary)

arXiv:2403.13548 [pdf, other]

Diversity-aware Channel Pruning for StyleGAN Compression

Authors: Jiwoo Chung, Sangeek Hyun, Sang-Heon Shim, Jae-Pil Heo

Abstract: StyleGAN has shown remarkable performance in unconditional image generation. However, its high computational cost poses a significant challenge for practical applications. Although recent efforts have been made to compress StyleGAN while preserving its performance, existing compressed models still lag behind the original model, particularly in terms of sample diversity. To overcome this, we propose a novel channel pruning method that leverages varying sensitivities of channels to latent vectors, which is a key factor in sample diversity. Specifically, by assessing channel importance based on their sensitivities to latent vector perturbations, our method enhances the diversity of samples in the compressed model. Since our method solely focuses on the channel pruning stage, it has complementary benefits with prior training schemes without additional training cost. Extensive experiments demonstrate that our method significantly enhances sample diversity across various datasets. Moreover, in terms of FID scores, our method not only surpasses state-of-the-art by a large margin but also achieves comparable scores with only half training iterations.

Submitted 20 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR 2024. Project page: https://jiwoogit.github.io/DCP-GAN_site

arXiv:2403.10543 [pdf, other]

Mitigating Oversmoothing Through Reverse Process of GNNs for Heterophilic Graphs

Authors: MoonJeong Park, Jaeseung Heo, Dongwoo Kim

Abstract: Graph Neural Network (GNN) resembles the diffusion process, leading to the over-smoothing of learned representations when stacking many layers. Hence, the reverse process of message passing can produce the distinguishable node representations by inverting the forward message propagation. The distinguishable representations can help us to better classify neighboring nodes with different labels, such as in heterophilic graphs. In this work, we apply the design principle of the reverse process to the three variants of the GNNs. Through the experiments on heterophilic graph data, where adjacent nodes need to have different representations for successful classification, we show that the reverse process significantly improves the prediction performance in many cases. Additional analysis reveals that the reverse mechanism can mitigate the over-smoothing over hundreds of layers. Our code is available at https://github.com/ml-postech/reverse-gnn.

Submitted 11 June, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

Comments: Accepted by ICML 2024

arXiv:2403.07907 [pdf]

Reflection of Federal Data Protection Standards on Cloud Governance

Authors: Olga Dye, Justin Heo, Ebru Celikel Cankaya

Abstract: As demand for more storage and processing power increases rapidly, cloud services in general are becoming more ubiquitous and popular. This, in turn, is increasing the need for developing highly sophisticated mechanisms and governance to reduce data breach risks in cloud-based infrastructures. Our research focuses on cloud governance by harmoniously combining multiple data security measures with legislative authority. We present legal aspects aimed at the prevention of data breaches, as well as the technical requirements regarding the implementation of data protection mechanisms. Specifically, we discuss primary authority and technical frameworks addressing least privilege in correlation with its application in Amazon Web Services (AWS), one of the major Cloud Service Providers (CSPs) on the market at present.

Submitted 26 February, 2024; originally announced March 2024.

ACM Class: F.2.2, I.2.7

arXiv:2401.11899 [pdf, other]

Efficiency in random allocation with ordinal rules

Authors: Samson Alva, Eun Jeong Heo, Vikram Manjunath

Abstract: We study ordinal rules for allocating indivisible goods via lottery. Ordinality requires a rule to consider only how agents rank degenerate lotteries and may be necessitated by cognitive, informational, or as we show, incentive constraints. The limited responsiveness of ordinal rules to agents' preferences means that they can only satisfy welfare properties based on first order stochastic dominance, which is incomplete. We define a new efficiency concept for ordinal rules. While ordinality and efficiency together are incompatible with the usual notions of fairness and somewhat limit randomization, they do leave room for a rich class of rules. We demonstrate this through a characterization of all ordinal, efficient, strategy-proof, non-bossy, boundedly invariant, and neutral rules.

Submitted 23 January, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

arXiv:2401.01259 [pdf, other]

Do Concept Bottleneck Models Obey Locality?

Authors: Naveen Raman, Mateo Espinosa Zarlenga, Juyeon Heo, Mateja Jamnik

Abstract: Concept-based methods explain model predictions using human-understandable concepts. These models require accurate concept predictors, yet the faithfulness of existing concept predictors to their underlying concepts is unclear. In this paper, we investigate the faithfulness of Concept Bottleneck Models (CBMs), a popular family of concept-based architectures, by looking at whether they respect "localities" in datasets. Localities involve using only relevant features when predicting a concept's value. When localities are not considered, concepts may be predicted based on spuriously correlated features, degrading performance and robustness. This work examines how CBM predictions change when perturbing model inputs, and reveals that CBMs may not capture localities, even when independent concepts are localised to non-overlapping feature subsets. Our empirical and theoretical results demonstrate that datasets with correlated concepts may lead to accurate but uninterpretable models that fail to learn localities. Overall, we find that CBM interpretability is fragile, as CBMs occasionally rely upon spurious features, necessitating further research into the robustness of concept predictors.

Submitted 28 May, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

Comments: Previous version accepted at the NeurIPS 23 XAI in Action Workshop

arXiv:2312.17526 [pdf, other]

Noise-free Optimization in Early Training Steps for Image Super-Resolution

Authors: MinKyu Lee, Jae-Pil Heo

Abstract: Recent deep-learning-based single image super-resolution (SISR) methods have shown impressive performance whereas typical methods train their networks by minimizing the pixel-wise distance with respect to a given high-resolution (HR) image. However, despite the basic training scheme being the predominant choice, its use in the context of ill-posed inverse problems has not been thoroughly investigated. In this work, we aim to provide a better comprehension of the underlying constituent by decomposing target HR images into two subcomponents: (1) the optimal centroid which is the expectation over multiple potential HR images, and (2) the inherent noise defined as the residual between the HR image and the centroid. Our findings show that the current training scheme cannot capture the ill-posed nature of SISR and becomes vulnerable to the inherent noise term, especially during early training steps. To tackle this issue, we propose a novel optimization method that can effectively remove the inherent noise term in the early steps of vanilla training by estimating the optimal centroid and directly optimizing toward the estimation. Experimental results show that the proposed method can effectively enhance the stability of vanilla training, leading to overall performance gain. Codes are available at github.com/2minkyulee/ECO.

Submitted 29 December, 2023; originally announced December 2023.

Comments: Accepted to AAAI 2024. Codes are available at github.com/2minkyulee/ECO

arXiv:2312.16580 [pdf, other]

VLCounter: Text-aware Visual Representation for Zero-Shot Object Counting

Authors: Seunggu Kang, WonJun Moon, Euiyeon Kim, Jae-Pil Heo

Abstract: Zero-Shot Object Counting (ZSOC) aims to count referred instances of arbitrary classes in a query image without human-annotated exemplars. To deal with ZSOC, preceding studies proposed a two-stage pipeline: discovering exemplars and counting. However, there remains a challenge of vulnerability to error propagation of the sequentially designed two-stage process. In this work, a one-stage baseline, Visual-Language Baseline (VLBase), exploring the implicit association of the semantic-patch embeddings of CLIP is proposed. Subsequently, the extension of VLBase to Visual-language Counter (VLCounter) is achieved by incorporating three modules devised to tailor VLBase for object counting. First, Semantic-conditioned Prompt Tuning (SPT) is introduced within the image encoder to acquire target-highlighted representations. Second, Learnable Affine Transformation (LAT) is employed to translate the semantic-patch similarity map to be appropriate for the counting task. Lastly, the layer-wise encoded features are transferred to the decoder through Segment-aware Skip Connection (SaSC) to keep the generalization capability for unseen classes. Through extensive experiments on FSC147, CARPK, and PUCPR+, the benefits of the end-to-end framework, VLCounter, are demonstrated.

Submitted 30 December, 2023; v1 submitted 27 December, 2023; originally announced December 2023.

Comments: Accepted to AAAI 2024. Code is available at https://github.com/Seunggu0305/VLCounter

arXiv:2312.15894 [pdf, other]

Task-Disruptive Background Suppression for Few-Shot Segmentation

Authors: Suho Park, SuBeen Lee, Sangeek Hyun, Hyun Seok Seong, Jae-Pil Heo

Abstract: Few-shot segmentation aims to accurately segment novel target objects within query images using only a limited number of annotated support images. The recent works exploit support background as well as its foreground to precisely compute the dense correlations between query and support. However, they overlook the characteristics of the background that generally contains various types of objects. In this paper, we highlight this characteristic of background which can bring problematic cases as follows: (1) when the query and support backgrounds are dissimilar and (2) when objects in the support background are similar to the target object in the query. Without any consideration of the above cases, adopting the entire support background leads to a misprediction of the query foreground as background. To address this issue, we propose Task-disruptive Background Suppression (TBS), a module to suppress those disruptive support background features based on two spatial-wise scores: query-relevant and target-relevant scores. The former aims to mitigate the impact of unshared features solely existing in the support background, while the latter aims to reduce the influence of target-similar support background features. Based on these two scores, we define a query background relevant score that captures the similarity between the backgrounds of the query and the support, and utilize it to scale support background features to adaptively restrict the impact of disruptive support backgrounds. Our proposed method achieves state-of-the-art performance on PASCAL-5 and COCO-20 datasets on 1-shot segmentation. Our official code is available at github.com/SuhoPark0706/TBSNet.

Submitted 26 December, 2023; originally announced December 2023.

arXiv:2312.15861 [pdf, other]

Towards Squeezing-Averse Virtual Try-On via Sequential Deformation

Authors: Sang-Heon Shim, Jiwoo Chung, Jae-Pil Heo

Abstract: In this paper, we first investigate a visual quality degradation problem observed in recent high-resolution virtual try-on approach. The tendency is empirically found that the textures of clothes are squeezed at the sleeve, as visualized in the upper row of Fig.1(a). A main reason for the issue arises from a gradient conflict between two popular losses, the Total Variation (TV) and adversarial losses. Specifically, the TV loss aims to disconnect boundaries between the sleeve and torso in a warped clothing mask, whereas the adversarial loss aims to combine between them. Such contrary objectives feedback the misaligned gradients to a cascaded appearance flow estimation, resulting in undesirable squeezing artifacts. To reduce this, we propose a Sequential Deformation (SD-VITON) that disentangles the appearance flow prediction layers into TV objective-dominant (TVOB) layers and a task-coexistence (TACO) layer. Specifically, we coarsely fit the clothes onto a human body via the TVOB layers, and then keep on refining via the TACO layer. In addition, the bottom row of Fig.1(a) shows a different type of squeezing artifacts around the waist. To address it, we further propose that we first warp the clothes into a tucked-out shirts style, and then partially erase the texture from the warped clothes without hurting the smoothness of the appearance flows. Experimental results show that our SD-VITON successfully resolves both types of artifacts and outperforms the baseline methods. Source code will be available at https://github.com/SHShim0513/SD-VITON.

Submitted 25 December, 2023; originally announced December 2023.

Comments: Accepted to AAAI 2024

arXiv:2312.12100 [pdf, other]

VITA: 'Carefully Chosen and Weighted Less' Is Better in Medication Recommendation

Authors: Taeri Kim, Jiho Heo, Hongil Kim, Kijung Shin, Sang-Wook Kim

Abstract: We address the medication recommendation problem, which aims to recommend effective medications for a patient's current visit by utilizing information (e.g., diagnoses and procedures) given at the patient's current and past visits. While there exist a number of recommender systems designed for this problem, we point out that they struggle to accurately capture the relation (specifically, the degree of relevance) between the current and each of the past visits for the patient when obtaining her current health status, which is the basis for recommending medications. To address this limitation, we propose a novel medication recommendation framework, named VITA, based on the following two novel ideas: (1) relevant-Visit selectIon; (2) Target-aware Attention. Through extensive experiments using real-world datasets, we demonstrate the superiority of VITA (specifically, up to 5.56% higher accuracy, in terms of Jaccard, than the best competitor) and the effectiveness of its two core ideas. The code is available at https://github.com/jhheo0123/VITA.

Submitted 19 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI 2024

arXiv:2312.09008 [pdf, other]

Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer

Authors: Jiwoo Chung, Sangeek Hyun, Jae-Pil Heo

Abstract: Despite the impressive generative capabilities of diffusion models, existing diffusion model-based style transfer methods either require inference-stage optimization (e.g. fine-tuning or textual inversion of style), which is time-consuming, or fail to leverage the generative ability of large-scale diffusion models. To address these issues, we introduce a novel artistic style transfer method based on a pre-trained large-scale diffusion model without any optimization. Specifically, we manipulate the features of self-attention layers in the way the cross-attention mechanism works: in the generation process, we substitute the key and value of the content with those of the style image. This approach provides several desirable characteristics for style transfer, including 1) preservation of content by transferring similar styles into similar image patches and 2) transfer of style based on similarity of local texture (e.g. edge) between content and style images. Furthermore, we introduce query preservation and attention temperature scaling to mitigate the issue of disruption of original content, and initial latent Adaptive Instance Normalization (AdaIN) to deal with disharmonious color (failure to transfer the colors of style). Our experimental results demonstrate that our proposed method surpasses state-of-the-art methods in both conventional and diffusion-based style transfer baselines.

Submitted 20 March, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: Accepted to CVPR 2024. Project page: https://jiwoogit.github.io/StyleID_site
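
The key/value substitution described in the Style Injection abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `attention`, the toy matrices, and the choice to apply the temperature as a multiplier on the attention logits are all assumptions made for illustration. Queries come from the content features, while keys and values come from the style features.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, temperature=1.0):
    """Scaled dot-product attention; `temperature` multiplies the logits
    (an assumed form of attention temperature scaling)."""
    dim = len(queries[0])
    out = []
    for q in queries:
        logits = [
            temperature * sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(logits)
        out.append(
            [sum(w * v[j] for w, v in zip(weights, values))
             for j in range(len(values[0]))]
        )
    return out


# Hypothetical toy features: two content patches, two style patches.
content_q = [[1.0, 0.0], [0.0, 1.0]]
style_k = [[1.0, 0.0], [0.0, 1.0]]
style_v = [[10.0, 0.0], [0.0, 10.0]]

# Content supplies the queries; style supplies keys and values.
stylized = attention(content_q, style_k, style_v, temperature=1.5)
```

Each output row is a weighted mix of style values, with the weights determined by how similar a content query is to each style key; a larger temperature sharpens those weights.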

arXiv:2312.08063 [pdf, other]

Estimation of Concept Explanations Should be Uncertainty Aware

Authors: Vihari Piratla, Juyeon Heo, Katherine M. Collins, Sukriti Singh, Adrian Weller

Abstract: Model explanations can be valuable for interpreting and debugging predictive models. We study a specific kind called Concept Explanations, where the goal is to interpret a model using human-understandable concepts. Although popular for their easy interpretation, concept explanations are known to be noisy. We begin our work by identifying various sources of uncertainty in the estimation pipeline that lead to such noise. We then propose an uncertainty-aware Bayesian estimation method to address these issues, which readily improved the quality of explanations. We demonstrate with theoretical analysis and empirical evaluation that explanations computed by our method are robust to train-time choices while also being label-efficient. Further, our method proved capable of recovering relevant concepts amongst a bank of thousands, in an evaluation with real datasets and off-the-shelf models, demonstrating its scalability. We believe the improved quality of uncertainty-aware concept explanations makes them a strong candidate for more reliable model interpretation. We release our code at https://github.com/vps-anonconfs/uace.

Submitted 5 April, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

arXiv:2312.03465 [pdf, other]

Quantum-secured single-pixel imaging under general spoofing attacks

Authors: Jaesung Heo, Taek Jeong, Nam Hun Park, Yonggi Jo

Abstract: In this paper, we introduce a quantum-secured single-pixel imaging (QS-SPI) technique designed to withstand spoofing attacks, wherein adversaries attempt to deceive imaging systems with fake signals. Unlike previous quantum-secured protocols that impose a threshold error rate limiting their operation, even with the existence of true signals, our approach not only identifies spoofing attacks but also facilitates the reconstruction of a true image. Our method involves the analysis of a specific mode correlation of a photon-pair, which is independent of the mode used for image construction, to check security. Through this analysis, we can identify both the image region targeted by the attack and the type of spoofing attack, enabling reconstruction of the true image. A proof-of-principle demonstration employing polarization-correlation of a photon-pair is provided, showcasing successful image reconstruction even under the condition of spoofing signals 2000 times stronger than the true signals. We expect our approach to be applied to quantum-secured signal processing such as quantum target detection or ranging.

Submitted 5 February, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

Comments: 9 pages, 6 figures

arXiv:2311.08835 [pdf, other]

Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding

Authors: WonJun Moon, Sangeek Hyun, SuBeen Lee, Jae-Pil Heo

Abstract: Video Temporal Grounding aims to identify specific moments or highlights from a video corresponding to textual descriptions. Typical approaches in temporal grounding treat all video clips equally during the encoding process regardless of their semantic relevance with the text query. Therefore, we propose Correlation-Guided DEtection TRansformer (CG-DETR), which explores how to provide clues for query-associated video clips within the cross-modal attention. First, we design an adaptive cross-attention with dummy tokens. Dummy tokens conditioned by text query take portions of the attention weights, preventing irrelevant video clips from being represented by the text query. Yet, not all words equally inherit the text query's correlation to video clips. Thus, we further guide the cross-attention map by inferring the fine-grained correlation between video clips and words. We enable this by learning a joint embedding space for high-level concepts, i.e., moment and sentence level, and inferring the clip-word correlation. Lastly, we exploit the moment-specific characteristics and combine them with the context of each video to form a moment-adaptive saliency detector. By exploiting the degrees of text engagement in each video clip, it precisely measures the highlightness of each clip. CG-DETR achieves state-of-the-art results on various benchmarks for temporal grounding.

Submitted 30 March, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

Comments: 34 pages, 16 figures, 13 tables, Code is available at https://github.com/wjun0830/CGDETR

arXiv:2311.06243 [pdf, other]

Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization

Authors: Weiyang Liu, Zeju Qiu, Yao Feng, Yuliang Xiu, Yuxuan Xue, Longhui Yu, Haiwen Feng, Zhen Liu, Juyeon Heo, Songyou Peng, Yandong Wen, Michael J. Black, Adrian Weller, Bernhard Schölkopf

Abstract: Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream tasks is increasingly important. In this paper, we study a principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream task adaptation. Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters due to the high dimensionality of orthogonal matrices. To address this, we start by examining OFT from an information transmission perspective, and then identify a few key desiderata that enable better parameter-efficiency. Inspired by how the Cooley-Tukey fast Fourier transform algorithm enables efficient information transmission, we propose an efficient orthogonal parameterization using butterfly structures. We apply this parameterization to OFT, creating a novel parameter-efficient finetuning method, called Orthogonal Butterfly (BOFT). By subsuming OFT as a special case, BOFT introduces a generalized orthogonal finetuning framework. Finally, we conduct an extensive empirical study of adapting large vision transformers, large language models, and text-to-image diffusion models to various downstream tasks in vision and language.

Submitted 28 April, 2024; v1 submitted 10 November, 2023; originally announced November 2023.

Comments: ICLR 2024 (v2: 34 pages, 19 figures)
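
To make the butterfly idea in the abstract above concrete, here is a simplified sketch of an orthogonal transform built from FFT-style butterfly stages. It is an illustration under stated assumptions, not BOFT itself: each stage pairs indices `i` and `i + stride` and applies a 2x2 rotation, and the use of plain rotation angles, the function name `butterfly_rotate`, and the toy values are all hypothetical choices. The dimension is assumed to be a power of two.

```python
import math


def butterfly_rotate(vector, angles):
    """Apply log2(d) butterfly stages of 2x2 rotations to `vector`.

    angles[level][k] is the rotation angle of the k-th index pair at that
    stage. Every stage is orthogonal, so the composition is orthogonal too.
    """
    d = len(vector)
    x = list(vector)
    stride, level = 1, 0
    while stride < d:
        for start in range(0, d, 2 * stride):
            for offset in range(stride):
                i, j = start + offset, start + offset + stride
                theta = angles[level][start // 2 + offset]
                c, s = math.cos(theta), math.sin(theta)
                x[i], x[j] = c * x[i] - s * x[j], s * x[i] + c * x[j]
        stride *= 2
        level += 1
    return x


d = 8  # must be a power of two in this sketch
levels = int(math.log2(d))
angles = [[0.1 * (level + 1) * (k + 1) for k in range(d // 2)]
          for level in range(levels)]
vec = [1.0, -2.0, 0.5, 3.0, 0.0, 1.5, -1.0, 2.0]
rotated = butterfly_rotate(vec, angles)

# Parameter counts: butterfly rotations versus a general d x d rotation.
butterfly_params = (d // 2) * levels
dense_params = d * (d - 1) // 2
```

In this toy setting the butterfly form uses (d/2) log2(d) angles, compared with d(d-1)/2 degrees of freedom for an unconstrained rotation, which is the kind of saving the butterfly structure is meant to provide.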

arXiv:2310.17024 [pdf, other]

SPLUS J142445.34-254247.1: An R-Process Enhanced, Actinide-Boost, Extremely Metal-Poor star observed with GHOST

Authors: Vinicius M. Placco, Felipe Almeida-Fernandes, Erika M. Holmbeck, Ian U. Roederer, Mohammad K. Mardini, Christian R. Hayes, Kim Venn, Kristin Chiboucas, Emily Deibert, Roberto Gamen, Jeong-Eun Heo, Miji Jeong, Venu Kalari, Eder Martioli, Siyi Xu, Ruben Diaz, Manuel Gomez-Jimenez, David Henderson, Pablo Prado, Carlos Quiroz, Roque Ruiz-Carmona, Chris Simpson, Cristian Urrutia, Alan W. McConnachie, John Pazder, et al. (11 additional authors not shown)

Abstract: We report on the chemo-dynamical analysis of SPLUS J142445.34-254247.1, an extremely metal-poor halo star enhanced in elements formed by the rapid neutron-capture process. This star was first selected as a metal-poor candidate from its narrow-band S-PLUS photometry and followed up spectroscopically in medium-resolution with Gemini South/GMOS, which confirmed its low-metallicity status. High-resolution spectroscopy was gathered with GHOST at Gemini South, allowing for the determination of chemical abundances for 36 elements, from carbon to thorium. At [Fe/H]=-3.39, SPLUS J1424-2542 is one of the lowest metallicity stars with measured Th and has the highest logeps(Th/Eu) observed to date, making it part of the "actinide-boost" category of r-process enhanced stars. The analysis presented here suggests that the gas cloud from which SPLUS J1424-2542 was formed must have been enriched by at least two progenitor populations. The light-element (Z<=30) abundance pattern is consistent with the yields from a supernova explosion of metal-free stars with 11.3-13.4 Msun, and the heavy-element (Z>=38) abundance pattern can be reproduced by the yields from a neutron star merger (1.66Msun and 1.27Msun) event. A kinematical analysis also reveals that SPLUS J1424-2542 is a low-mass, old halo star with a likely in-situ origin, not associated with any known early merger events in the Milky Way.

Submitted 25 October, 2023; originally announced October 2023.

Comments: 26 pages, 11 figures, accepted for publication on ApJ

arXiv:2310.14208 [pdf]

doi 10.1007/s40042-023-00729-6

Electrical conductivity enhancement of epitaxially grown TiN thin films

Authors: Yeong Gwang Khim, Beomjin Park, Jin Eun Heo, Young Hun Khim, Young Rok Khim, Minsun Gu, Tae Gyu Rhee, Seo Hyoung Chang, Moonsup Han, Young Jun Chang

Abstract: Titanium nitride (TiN) presents superior electrical conductivity with mechanical and chemical stability and compatibility with the semiconductor fabrication process. Here, we fabricated epitaxial and polycrystalline TiN (111) thin films on MgO (111), sapphire (001), and mica substrates by DC sputtering at 640 °C and room temperature, respectively. The epitaxial films show less surface oxidation than the polycrystalline ones grown at room temperature. The epitaxial films show drastically reduced resistivity (~30 micro-ohm-cm), much smaller than that of the polycrystalline films. Temperature-dependent resistivity measurements show a nearly monotonic temperature slope down to low temperature. These results demonstrate that high-temperature growth of TiN thin films leads to significant enhancement of electrical conductivity, promising for durable and scalable electrode applications.

Submitted 22 October, 2023; originally announced October 2023.

Comments: 14 pages, 3 figures

Journal ref: Journal of the Korean Physical Society 82, 486 (2023)

arXiv:2310.03075 [pdf, other]

doi 10.1093/mnras/stad3673

Probing the early Milky Way with GHOST spectra of an extremely metal-poor star in the Galactic disk

Authors: Anya Dovgal, Kim A. Venn, Federico Sestito, Christian R. Hayes, Alan W. McConnachie, Julio F. Navarro, Vinicius M. Placco, Else Starkenburg, Nicolas F. Martin, John S. Pazder, Kristin Chiboucas, Emily Deibert, Roberto Gamen, Jeong-Eun Heo, Venu M. Kalari, Eder Martioli, Siyi Xu, Ruben Diaz, Manuel Gomez-Jiminez, David Henderson, Pablo Prado, Carlos Quiroz, J. Gordon Robertson, Roque Ruiz-Carmona, Chris Simpson, et al. (9 additional authors not shown)

Abstract: Pristine_183.6849+04.8619 (P1836849) is an extremely metal-poor ([Fe/H]$=-3.3\pm0.1$) star on a prograde orbit confined to the Galactic disk. Such stars are rare and may have their origins in protogalactic fragments that formed the early Milky Way, in low mass satellites accreted later, or in situ formation in the Galactic plane. Here we present a chemo-dynamical analysis of the spectral features between $3700-11000$Å from a high-resolution spectrum taken during Science Verification of the new Gemini High-resolution Optical SpecTrograph (GHOST). Spectral features for many chemical elements are analysed (Mg, Al, Si, Ca, Sc, Ti, Cr, Mn, Fe, Ni), and valuable upper limits are determined for others (C, Na, Sr, Ba). This main sequence star exhibits several rare chemical signatures, including (i) extremely low metallicity for a star in the Galactic disk, (ii) very low abundances of the light $α$-elements (Na, Mg, Si) compared to other metal-poor stars, and (iii) unusually large abundances of Cr and Mn, where [Cr, Mn/Fe]$_{\rm NLTE}>+0.5$. A comparison to theoretical yields from supernova models suggests that two low mass Population III objects (one 10 M$_\odot$ supernova and one 17 M$_\odot$ hypernova) can reproduce the abundance pattern well (reduced $χ^2<1$). When this star is compared to other extremely metal-poor stars on quasi-circular, prograde planar orbits, differences in both chemistry and kinematics imply there is little evidence for a common origin. The unique chemistry of P1836849 is discussed in terms of the earliest stages in the formation of the Milky Way.

Submitted 26 November, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

Comments: 16 pages, 10 figures, 6 tables. Accepted by MNRAS November 22; Revisions include comparisons to more EMP stars, results unchanged

Journal ref: MNRAS 527 (2024) 7810-7824

arXiv:2309.15531 [pdf, other]

Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models

Authors: Jung Hwan Heo, Jeonghoon Kim, Beomseok Kwon, Byeongwook Kim, Se Jung Kwon, Dongsoo Lee

Abstract: Large Language Models (LLMs) have recently demonstrated remarkable success across various tasks. However, efficiently serving LLMs has been a challenge due to the large memory bottleneck, specifically in small batch inference settings (e.g. mobile devices). Weight-only quantization can be a promising approach, but sub-4 bit quantization remains a challenge due to large-magnitude activation outliers. To mitigate the undesirable outlier effect, we first propose per-IC quantization, a simple yet effective method that creates quantization groups within each input channel (IC) rather than the conventional per-output-channel (per-OC). Our method is motivated by the observation that activation outliers affect the input dimension of the weight matrix, so similarly grouping the weights in the IC direction can isolate outliers within a group. We also find that activation outliers do not dictate quantization difficulty, and inherent weight sensitivities also exist. With per-IC quantization as a new outlier-friendly scheme, we propose Adaptive Dimensions (AdaDim), a versatile quantization framework that can adapt to various weight sensitivity patterns. We demonstrate the effectiveness of AdaDim by augmenting prior methods such as Round-To-Nearest and GPTQ, showing significant improvements across various language modeling benchmarks for both base (up to +4.7% on MMLU) and instruction-tuned (up to +10% on HumanEval) LLMs. Code is available at https://github.com/johnheo/adadim-llm

Submitted 24 March, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

Comments: ICLR 2024. 19 pages, 11 figures, 10 tables
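
The difference between per-OC and per-IC grouping in the abstract above comes down to the direction in which weights share a quantization scale. The sketch below is a toy illustration only, not AdaDim: it uses plain round-to-nearest with one group per whole row or column, and the function names and the example matrix are hypothetical. The matrix is laid out as `[out_channels][in_channels]`, and one input channel is given much larger weights purely to make the effect visible (the paper's own motivation is activation outliers).

```python
def rtn_group(values, bits=3):
    """Round-to-nearest asymmetric quantization of one group,
    returned in dequantized (float) form."""
    lo, hi = min(values), max(values)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [lo + round((v - lo) / scale) * scale for v in values]


def quantize(weight, direction, bits=3):
    """weight is a list of rows with shape [out_channels][in_channels]."""
    if direction == "per_oc":   # one group per output channel (row)
        return [rtn_group(row, bits) for row in weight]
    if direction == "per_ic":   # one group per input channel (column)
        q_cols = [rtn_group(list(col), bits) for col in zip(*weight)]
        return [list(row) for row in zip(*q_cols)]
    raise ValueError(f"unknown direction: {direction}")


def column_error(original, quantized, columns):
    """Sum of absolute errors restricted to the given input channels."""
    return sum(
        abs(o_row[c] - q_row[c])
        for o_row, q_row in zip(original, quantized)
        for c in columns
    )


# Toy weights: input channel 1 is far larger than channels 0 and 2.
weight = [[0.10, 8.0, -0.20],
          [0.30, -7.5, 0.05],
          [-0.15, 9.0, 0.20]]

err_oc = column_error(weight, quantize(weight, "per_oc"), columns=[0, 2])
err_ic = column_error(weight, quantize(weight, "per_ic"), columns=[0, 2])
```

With per-OC groups, every row's scale is stretched by the large channel, so the small weights in channels 0 and 2 are rounded coarsely; with per-IC groups, the large channel is confined to its own group and the other channels keep a fine step size.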

arXiv:2309.11772 [pdf, other]

Active Learning for a Recursive Non-Additive Emulator for Multi-Fidelity Computer Experiments

Authors: Junoh Heo, Chih-Li Sung

Abstract: Computer simulations have become essential for analyzing complex systems, but high-fidelity simulations often come with significant computational costs. To tackle this challenge, multi-fidelity computer experiments have emerged as a promising approach that leverages both low-fidelity and high-fidelity simulations, enhancing both the accuracy and efficiency of the analysis. In this paper, we introduce a new and flexible statistical model, the Recursive Non-Additive (RNA) emulator, that integrates the data from multi-fidelity computer experiments. Unlike conventional multi-fidelity emulation approaches that rely on an additive auto-regressive structure, the proposed RNA emulator recursively captures the relationships between multi-fidelity data using Gaussian process priors without making the additive assumption, allowing the model to accommodate more complex data patterns. Importantly, we derive the posterior predictive mean and variance of the emulator, which can be efficiently computed in a closed-form manner, leading to significant improvements in computational efficiency. Additionally, based on this emulator, we introduce four active learning strategies that optimize the balance between accuracy and simulation costs to guide the selection of the fidelity level and input locations for the next simulation run. We demonstrate the effectiveness of the proposed approach in a suite of synthetic examples and a real-world problem. An R package RNAmf for the proposed methodology is provided on CRAN.

Submitted 4 June, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

Comments: 37 pages for the paper including references, 17 pages for supplementary

arXiv:2309.08320 [pdf, other]

Diff-SV: A Unified Hierarchical Framework for Noise-Robust Speaker Verification Using Score-Based Diffusion Probabilistic Models

Authors: Ju-ho Kim, Jungwoo Heo, Hyun-seo Shin, Chan-yeong Lim, Ha-Jin Yu

Abstract: Background noise considerably reduces the accuracy and reliability of speaker verification (SV) systems. These challenges can be addressed using a speech enhancement system as a front-end module. Recently, diffusion probabilistic models (DPMs) have exhibited remarkable noise-compensation capabilities in the speech enhancement domain. Building on this success, we propose Diff-SV, a noise-robust SV framework that leverages DPM. Diff-SV unifies a DPM-based speech enhancement system with a speaker embedding extractor, and yields a discriminative and noise-tolerable speaker representation through a hierarchical structure. The proposed model was evaluated under both in-domain and out-of-domain noisy conditions using the VoxCeleb1 test set, an external noise source, and the VOiCES corpus. The obtained experimental results demonstrate that Diff-SV achieves state-of-the-art performance, outperforming recently proposed noise-robust SV systems.

Submitted 13 December, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

Comments: 5 pages, 2 figures, accepted for ICASSP 2024

arXiv:2309.08208 [pdf, other]

HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods

Authors: Hyun-seo Shin, Jungwoo Heo, Ju-ho Kim, Chan-yeong Lim, Wonbin Kim, Ha-Jin Yu

Abstract: Audio deepfake detection (ADD) is the task of detecting spoofing attacks generated by text-to-speech or voice conversion systems. Spoofing evidence, which helps to distinguish between spoofed and bona-fide utterances, might exist either locally or globally in the input features. To capture these, the Conformer, which consists of Transformers and CNN, possesses a suitable structure. However, since the Conformer was designed for sequence-to-sequence tasks, its direct application to ADD tasks may be sub-optimal. To tackle this limitation, we propose HM-Conformer by adopting two components: (1) a hierarchical pooling method that progressively reduces the sequence length to eliminate duplicated information, and (2) a multi-level classification token aggregation method that utilizes classification tokens to gather information from different blocks. Owing to these components, HM-Conformer can efficiently detect spoofing evidence by processing various sequence lengths and aggregating them. In experimental results on the ASVspoof 2021 Deepfake dataset, HM-Conformer achieved a 15.71% EER, showing competitive performance compared to recent systems.

Submitted 15 September, 2023; originally announced September 2023.

Comments: Submitted to 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024)

arXiv:2309.04549 [pdf, other]

doi 10.1109/SEC54971.2022.00036

Poster: Making Edge-assisted LiDAR Perceptions Robust to Lossy Point Cloud Compression

Authors: Jin Heo, Gregorie Phillips, Per-Erik Brodin, Ada Gavrilovska

Abstract: Real-time light detection and ranging (LiDAR) perceptions, e.g., 3D object detection and simultaneous localization and mapping, are computationally intensive for mobile devices with limited resources and are often offloaded to the edge. Offloading LiDAR perceptions requires compressing the raw sensor data, and lossy compression is used for efficiently reducing the data volume. Lossy compression degrades the quality of LiDAR point clouds, and the perception performance decreases as a consequence. In this work, we present an interpolation algorithm improving the quality of a LiDAR point cloud to mitigate the perception performance loss due to lossy compression. The algorithm targets the range image (RI) representation of a point cloud and interpolates points at the RI based on depth gradients. Compared to existing image interpolation algorithms, our algorithm shows a better qualitative result when the point cloud is reconstructed from the interpolated RI. With the preliminary results, we also describe the next steps of the current work.

Submitted 8 September, 2023; originally announced September 2023.

Comments: extended abstract of 2 pages, 2 figures, 1 table

arXiv:2309.04548 [pdf, other]

doi 10.1145/3453142.3491408

Poster: Enabling Flexible Edge-assisted XR

Authors: Jin Heo, Ketan Bhardwaj, Ada Gavrilovska

Abstract: Extended reality (XR) is touted as the next frontier of the digital future. XR includes all immersive technologies of augmented reality (AR), virtual reality (VR), and mixed reality (MR). XR applications obtain the real-world context of the user from an underlying system, and provide rich, immersive, and interactive virtual experiences based on the user's context in real-time. XR systems process streams of data from device sensors, and provide functionalities including perceptions and graphics required by the applications. These processing steps are computationally intensive, and the challenge is that they must be performed within the strict latency requirements of XR. This poses limitations on the possible XR experiences that can be supported on mobile devices with limited computing resources. In this XR context, edge computing is an effective approach to address this problem for mobile users. The edge is located closer to the end users and enables processing and storing data near them. In addition, the development of high bandwidth and low latency network technologies such as 5G facilitates the application of edge computing for latency-critical use cases [4, 11]. This work presents an XR system for enabling flexible edge-assisted XR.

Submitted 8 September, 2023; originally announced September 2023.

Comments: extended abstract of 2 pages, 1 figure, 2 tables

arXiv:2308.10570 [pdf, other]

Self-Feedback DETR for Temporal Action Detection

Authors: Jihwan Kim, Miso Lee, Jae-Pil Heo

Abstract: Temporal Action Detection (TAD) is challenging but fundamental for real-world video applications. Recently, DETR-based models have been devised for TAD but have not performed well yet. In this paper, we point out a problem in the self-attention of DETR for TAD: the attention modules focus on a few key elements, which we call the temporal collapse problem. It degrades the capability of the encoder and decoder since their self-attention modules play no role. To solve the problem, we propose a novel framework, Self-DETR, which utilizes cross-attention maps of the decoder to reactivate self-attention modules. We recover the relationship between encoder features by simple matrix multiplication of the cross-attention map and its transpose. Likewise, we also get the information within decoder queries. By guiding collapsed self-attention maps with the calculated guidance map, we resolve the temporal collapse of self-attention modules in the encoder and decoder. Our extensive experiments demonstrate that Self-DETR resolves the temporal collapse problem by keeping high diversity of attention over all layers.

Submitted 21 August, 2023; originally announced August 2023.

Comments: Accepted to ICCV 2023

arXiv:2308.00093 [pdf, other]

Task-Oriented Channel Attention for Fine-Grained Few-Shot Classification

Authors: SuBeen Lee, WonJun Moon, Hyun Seok Seong, Jae-Pil Heo

Abstract: The difficulty of the fine-grained image classification mainly comes from a shared overall appearance across classes. Thus, recognizing discriminative details, such as eyes and beaks for birds, is a key in the task. However, this is particularly challenging when training data is limited. To address this, we propose Task Discrepancy Maximization (TDM), a task-oriented channel attention method tailored for fine-grained few-shot classification with two novel modules Support Attention Module (SAM) and Query Attention Module (QAM). SAM highlights channels encoding class-wise discriminative features, while QAM assigns higher weights to object-relevant channels of the query. Based on these submodules, TDM produces task-adaptive features by focusing on channels encoding class-discriminative details and possessed by the query at the same time, for accurate class-sensitive similarity measure between support and query instances. While TDM influences high-level feature maps by task-adaptive calibration of channel-wise importance, we further introduce Instance Attention Module (IAM) operating in intermediate layers of feature extractors to instance-wisely highlight object-relevant channels, by extending QAM. The merits of TDM and IAM and their complementary benefits are experimentally validated in fine-grained few-shot classification tasks. Moreover, IAM is also shown to be effective in coarse-grained and cross-domain few-shot classifications.

Submitted 28 July, 2023; originally announced August 2023.

Comments: arXiv admin note: text overlap with arXiv:2207.01376

arXiv:2307.15574 [pdf, other]

doi 10.1145/3587819.3590966

FleXR: A System Enabling Flexibly Distributed Extended Reality

Authors: ** Heo, Ketan Bhardwaj, Ada Gavrilovska

Abstract: Extended reality (XR) applications require computationally demanding functionalities with low end-to-end latency and high throughput. To enable XR on commodity devices, a number of distributed systems solutions enable offloading of XR workloads on remote servers. However, they make a priori decisions regarding the offloaded functionalities based on assumptions about operating factors, and their benefits are restricted to specific deployment contexts. To realize the benefits of offloading in various distributed environments, we present a distributed stream processing system, FleXR, which is specialized for real-time and interactive workloads and enables flexible distributions of XR functionalities. In building FleXR, we identified and resolved several issues of presenting XR functionalities as distributed pipelines. FleXR provides a framework for flexible distribution of XR pipelines while streamlining development and deployment phases. We evaluate FleXR with three XR use cases in four different distribution scenarios. In the results, the best-case distribution scenario shows up to 50% less end-to-end latency and 3.9x pipeline throughput compared to alternatives.

Submitted 28 July, 2023; originally announced July 2023.

Comments: 11 pages, 11 figures, conference paper

Journal ref: In Proceedings of the 14th Conference on ACM Multimedia Systems (pp. 1-13) June, 2023

arXiv:2307.15005 [pdf, other]

doi 10.1109/SEC54971.2022.00012

FLiCR: A Fast and Lightweight LiDAR Point Cloud Compression Based on Lossy RI

Authors: ** Heo, Christopher Phillips, Ada Gavrilovska

Abstract: Light detection and ranging (LiDAR) sensors are becoming available on modern mobile devices and provide a 3D sensing capability. This new capability is beneficial for perceptions in various use cases, but it is challenging for resource-constrained mobile devices to use the perceptions in real-time because of their high computational complexity. In this context, edge computing can be used to enable LiDAR online perceptions, but offloading the perceptions on the edge server requires a low-latency, lightweight, and efficient compression due to the large volume of LiDAR point clouds data. This paper presents FLiCR, a fast and lightweight LiDAR point cloud compression method for enabling edge-assisted online perceptions. FLiCR is based on range images (RI) as an intermediate representation (IR), and dictionary coding for compressing RIs. FLiCR achieves its benefits by leveraging lossy RIs, and we show the efficiency of bytestream compression is largely improved with quantization and subsampling. In addition, we identify the limitation of current quality metrics for presenting the entropy of a point cloud, and introduce a new metric that reflects both point-wise and entropy-wise qualities for lossy IRs. The evaluation results show FLiCR is more suitable for edge-assisted real-time perceptions than the existing LiDAR compressions, and we demonstrate the effectiveness of our compression and metric with the evaluations on 3D object detection and LiDAR SLAM.

Submitted 27 July, 2023; originally announced July 2023.

Comments: 12 pages, 11 figures, conference paper

Journal ref: In 2022 IEEE/ACM 7th Symposium on Edge Computing (SEC) (pp. 54-67). IEEE 2022

arXiv:2307.10628 [pdf, other]

PAS: Partial Additive Speech Data Augmentation Method for Noise Robust Speaker Verification

Authors: Wonbin Kim, Hyun-seo Shin, Ju-ho Kim, Jungwoo Heo, Chan-yeong Lim, Ha-** Yu

Abstract: Background noise reduces speech intelligibility and quality, making speaker verification (SV) in noisy environments a challenging task. To improve the noise robustness of SV systems, additive noise data augmentation method has been commonly used. In this paper, we propose a new additive noise method, partial additive speech (PAS), which aims to train SV systems to be less affected by noisy environments. The experimental results demonstrate that PAS outperforms traditional additive noise in terms of equal error rates (EER), with relative improvements of 4.64% and 5.01% observed in SE-ResNet34 and ECAPA-TDNN. We also show the effectiveness of proposed method by analyzing attention modules and visualizing speaker embeddings.

Submitted 20 July, 2023; originally announced July 2023.

Comments: 5 pages, 2 figures, 1 table, accepted to CKAIA2023 as a conference paper

arXiv:2306.14861 [pdf, other]

Leveraging Task Structures for Improved Identifiability in Neural Network Representations

Authors: Wenlin Chen, Julien Horwood, Juyeon Heo, José Miguel Hernández-Lobato

Abstract: This work extends the theory of identifiability in supervised learning by considering the consequences of having access to a distribution of tasks. In such cases, we show that identifiability is achievable even in the case of regression, extending prior work restricted to linear identifiability in the single-task classification case. Furthermore, we show that the existence of a task distribution which defines a conditional prior over latent factors reduces the equivalence class for identifiability to permutations and scaling, a much stronger and more useful result than linear identifiability. When we further assume a causal structure over these tasks, our approach enables simple maximum marginal likelihood optimization together with downstream applicability to causal representation learning. Empirically, we validate that our model outperforms more general unsupervised models in recovering canonical representations for both synthetic and real-world molecular data.

Submitted 29 September, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

Comments: 18 pages, 4 figures, 5 tables, 1 algorithm

arXiv:2306.01310 [pdf, other]

EPIC: Graph Augmentation with Edit Path Interpolation via Learnable Cost

Authors: Jaeseung Heo, Seungbeom Lee, Sungsoo Ahn, Dongwoo Kim

Abstract: Data augmentation plays a critical role in improving model performance across various domains, but it becomes challenging with graph data due to their complex and irregular structure. To address this issue, we propose EPIC (Edit Path Interpolation via learnable Cost), a novel interpolation-based method for augmenting graph datasets. To interpolate between two graphs lying in an irregular domain, EPIC leverages the concept of graph edit distance, constructing an edit path that represents the transformation process between two graphs via edit operations. Moreover, our method introduces a context-sensitive cost model that accounts for the importance of specific edit operations formulated through a learning framework. This allows for a more nuanced transformation process, where the edit distance is not merely count-based but reflects meaningful graph attributes. With randomly sampled graphs from the edit path, we enrich the training set to enhance the generalization capability of classification models. Experimental evaluations across several benchmark datasets demonstrate that our approach outperforms existing augmentation techniques in many tasks.

Submitted 4 June, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

arXiv:2305.17394 [pdf, other]

One-Step Knowledge Distillation and Fine-Tuning in Using Large Pre-Trained Self-Supervised Learning Models for Speaker Verification

Authors: Jungwoo Heo, Chan-yeong Lim, Ju-ho Kim, Hyun-seo Shin, Ha-** Yu

Abstract: The application of speech self-supervised learning (SSL) models has achieved remarkable performance in speaker verification (SV). However, there is a computational cost hurdle in employing them, which makes development and deployment difficult. Several studies have simply compressed SSL models through knowledge distillation (KD) without considering the target task. Consequently, these methods could not extract SV-tailored features. This paper suggests One-Step Knowledge Distillation and Fine-Tuning (OS-KDFT), which incorporates KD and fine-tuning (FT). We optimize a student model for SV during KD training to avert the distillation of inappropriate information for the SV. OS-KDFT could downsize Wav2Vec 2.0 based ECAPA-TDNN size by approximately 76.2%, and reduce the SSL model's inference time by 79% while presenting an EER of 0.98%. The proposed OS-KDFT is validated across VoxCeleb1 and VoxCeleb2 datasets and W2V2 and HuBERT SSL models. Experiments are available on our GitHub.

Submitted 7 June, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

Comments: ISCA INTERSPEECH 2023

arXiv:2305.04526 [pdf, other]

CrAFT: Compression-Aware Fine-Tuning for Efficient Visual Task Adaptation

Authors: Jung Hwan Heo, Seyedarmin Azizi, Arash Fayyazi, Massoud Pedram

Abstract: Transfer learning has become a popular task adaptation method in the era of foundation models. However, many foundation models require large storage and computing resources, which makes off-the-shelf deployment impractical. Post-training compression techniques such as pruning and quantization can help lower deployment costs. Unfortunately, the resulting performance degradation limits the usability and benefits of such techniques. To close this performance gap, we propose CrAFT, a simple fine-tuning framework that enables effective post-training network compression. In CrAFT, users simply employ the default fine-tuning schedule along with sharpness minimization objective, simultaneously facilitating task adaptation and compression-friendliness. Contrary to the conventional sharpness minimization techniques, which are applied during pretraining, the CrAFT approach adds negligible training overhead as fine-tuning is done in under a couple of minutes or hours with a single GPU. The effectiveness of CrAFT, which is a general-purpose tool that can significantly boost one-shot pruning and post-training quantization, is demonstrated on both convolution-based and attention-based vision foundation models on a variety of target tasks. The code will be made publicly available.

Submitted 8 July, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

Comments: Preprint

arXiv:2303.15014 [pdf, other]

Leveraging Hidden Positives for Unsupervised Semantic Segmentation

Authors: Hyun Seok Seong, WonJun Moon, SuBeen Lee, Jae-Pil Heo

Abstract: Dramatic demand for manpower to label pixel-level annotations triggered the advent of unsupervised semantic segmentation. Although the recent work employing the vision transformer (ViT) backbone shows exceptional performance, there is still a lack of consideration for task-specific training guidance and local semantic consistency. To tackle these issues, we leverage contrastive learning by excavating hidden positives to learn rich semantic relationships and ensure semantic consistency in local regions. Specifically, we first discover two types of global hidden positives, task-agnostic and task-specific ones for each anchor based on the feature similarities defined by a fixed pre-trained backbone and a segmentation head-in-training, respectively. A gradual increase in the contribution of the latter induces the model to capture task-specific semantic features. In addition, we introduce a gradient propagation strategy to learn semantic consistency between adjacent patches, under the inherent premise that nearby patches are highly likely to possess the same semantics. Specifically, we add the loss propagating to local hidden positives, semantically similar nearby patches, in proportion to the predefined similarity scores. With these training schemes, our proposed method achieves new state-of-the-art (SOTA) results in COCO-stuff, Cityscapes, and Potsdam-3 datasets. Our code is available at: https://github.com/hynnsk/HP.

Submitted 27 March, 2023; originally announced March 2023.

Comments: Accepted to CVPR 2023

arXiv:2303.13874 [pdf, other]

Query-Dependent Video Representation for Moment Retrieval and Highlight Detection

Authors: WonJun Moon, Sangeek Hyun, SangUk Park, Dongchan Park, Jae-Pil Heo

Abstract: Recently, video moment retrieval and highlight detection (MR/HD) are being spotlighted as the demand for video understanding is drastically increased. The key objective of MR/HD is to localize the moment and estimate clip-wise accordance level, i.e., saliency score, to the given text query. Although the recent transformer-based models brought some advances, we found that these methods do not fully exploit the information of a given query. For example, the relevance between text query and video contents is sometimes neglected when predicting the moment and its saliency. To tackle this issue, we introduce Query-Dependent DETR (QD-DETR), a detection transformer tailored for MR/HD. As we observe the insignificant role of a given query in transformer architectures, our encoding module starts with cross-attention layers to explicitly inject the context of text query into video representation. Then, to enhance the model's capability of exploiting the query information, we manipulate the video-query pairs to produce irrelevant pairs. Such negative (irrelevant) video-query pairs are trained to yield low saliency scores, which in turn, encourages the model to estimate precise accordance between query-video pairs. Lastly, we present an input-adaptive saliency predictor which adaptively defines the criterion of saliency scores for the given video-query pairs. Our extensive studies verify the importance of building the query-dependent representation for MR/HD. Specifically, QD-DETR outperforms state-of-the-art methods on QVHighlights, TVSum, and Charades-STA datasets. Codes are available at github.com/wjun0830/QD-DETR.

Submitted 24 March, 2023; originally announced March 2023.

Comments: Accepted to CVPR 2023. Code is available at https://github.com/wjun0830/QD-DETR

arXiv:2303.06419 [pdf, other]

Use Perturbations when Learning from Explanations

Authors: Juyeon Heo, Vihari Piratla, Matthew Wicker, Adrian Weller

Abstract: Machine learning from explanations (MLX) is an approach to learning that uses human-provided explanations of relevant or irrelevant features for each input to ensure that model predictions are right for the right reasons. Existing MLX approaches rely on local model interpretation methods and require strong model smoothing to align model and human explanations, leading to sub-optimal performance. We recast MLX as a robustness problem, where human explanations specify a lower dimensional manifold from which perturbations can be drawn, and show both theoretically and empirically how this approach alleviates the need for strong model smoothing. We consider various approaches to achieving robustness, leading to improved performance over prior MLX methods. Finally, we show how to combine robustness with an earlier MLX method, yielding state-of-the-art results on both synthetic and real-world benchmarks.

Submitted 1 December, 2023; v1 submitted 11 March, 2023; originally announced March 2023.

Comments: NeurIPS 2023; https://github.com/vihari/robust_mlx

arXiv:2303.02331 [pdf, other]

Training-Free Acceleration of ViTs with Delayed Spatial Merging

Authors: Jung Hwan Heo, Seyedarmin Azizi, Arash Fayyazi, Massoud Pedram

Abstract: Token merging has emerged as a new paradigm that can accelerate the inference of Vision Transformers (ViTs) without any retraining or fine-tuning. To push the frontier of training-free acceleration in ViTs, we improve token merging by adding the perspectives of 1) activation outliers and 2) hierarchical representations. Through a careful analysis of the attention behavior in ViTs, we characterize a delayed onset of the convergent attention phenomenon, which makes token merging undesirable in the bottom blocks of ViTs. Moreover, we augment token merging with a hierarchical processing scheme to capture multi-scale redundancy between visual tokens. Combining these two insights, we build a unified inference framework called DSM: Delayed Spatial Merging. We extensively evaluate DSM on various ViT model scales (Tiny to Huge) and tasks (ImageNet-1k and transfer learning), achieving up to 1.8$\times$ FLOP reduction and 1.6$\times$ throughput speedup at a negligible loss while being two orders of magnitude faster than existing methods.

Submitted 1 July, 2024; v1 submitted 4 March, 2023; originally announced March 2023.

Comments: ICML 2024 ES-FoMo Workshop

arXiv:2302.04280 [pdf, other]

doi 10.1093/mnrasl/slad018

Keck, Gemini, and Palomar 200-inch visible photometry of red and very-red Neptunian Trojans

Authors: B. T. Bolin, C. Fremling, A. Morbidelli, K. S. Noll, J. van Roestel, E. K. Deibert, M. Delbo, G. Gimeno, J. -E. Heo, C. M. Lisse, T. Seccull, H. Suh

Abstract: Neptunian Trojans (NTs), trans-Neptunian objects in 1:1 mean-motion resonance with Neptune, are generally thought to have been captured from the original trans-Neptunian protoplanetary disk into co-orbital resonance with the ice giant during its outward migration. It is possible, therefore, that the colour distribution of NTs is a constraint on the location of any colour transition zones that may have been present in the disk. In support of this possible test, we obtained $g$, $r$, and $i$-band observations of 18 NTs, more than doubling the sample of NTs with known visible colours to 31 objects. Out of the combined sample, we found $\approx$4 objects with $g$-$i$ colours of $>$1.2 mags placing them in the very red (VR) category as typically defined. We find, without taking observational selection effects into account, that the NT $g$-$i$ colour distribution is statistically distinct from other trans-Neptunian dynamical classes. The optical colours of Jovian Trojans and NTs are shown to be less similar than previously claimed with additional VR NTs. The presence of VR objects among the NTs may suggest that the location of the red to VR colour transition zone in the protoplanetary disk was interior to 30-35 au.

Submitted 8 February, 2023; originally announced February 2023.

Comments: 8 pages, 3 figures, 3 tables, accepted for publication in MNRAS:Letters

arXiv:2212.08568 [pdf, other]

Biomedical image analysis competitions: The state of current participation practice

Authors: Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Patrick Godau, Veronika Cheplygina, Michal Kozubek, Sharib Ali, Anubha Gupta, Jan Kybic, Alison Noble, Carlos Ortiz de Solórzano, Samiksha Pachade, Caroline Petitjean, Daniel Sage, Donglai Wei, Elizabeth Wilden, Deepak Alapatt, Vincent Andrearczyk, Ujjwal Baid, Spyridon Bakas, Niranjan Balu, Sophia Bano , et al. (331 additional authors not shown)

Abstract: The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.

Submitted 12 September, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

arXiv:2212.08507 [pdf, other]

Robust Explanation Constraints for Neural Networks

Authors: Matthew Wicker, Juyeon Heo, Luca Costabello, Adrian Weller

Abstract: Post-hoc explanation methods are used with the intent of providing insights about neural networks and are sometimes said to help engender trust in their outputs. However, popular explanations methods have been found to be fragile to minor perturbations of input features or model parameters. Relying on constraint relaxation techniques from non-convex optimization, we develop a method that upper-bounds the largest change an adversary can make to a gradient-based explanation via bounded manipulation of either the input features or model parameters. By propagating a compact input or parameter set as symbolic intervals through the forwards and backwards computations of the neural network we can formally certify the robustness of gradient-based explanations. Our bounds are differentiable, hence we can incorporate provable explanation robustness into neural network training. Empirically, our method surpasses the robustness provided by previous heuristic approaches. We find that our training method is the only method able to learn neural networks with certificates of explanation robustness across all six datasets tested.

Submitted 16 December, 2022; originally announced December 2022.

Comments: 23 pages, 12 figures

arXiv:2211.15900 [pdf, other]

Towards More Robust Interpretation via Local Gradient Alignment

Authors: Sunghwan Joo, Seokhyeon Jeong, Juyeon Heo, Adrian Weller, Taesup Moon

Abstract: Neural network interpretation methods, particularly feature attribution methods, are known to be fragile with respect to adversarial input perturbations. To address this, several methods for enhancing the local smoothness of the gradient while training have been proposed for attaining \textit{robust} feature attributions. However, the lack of considering the normalization of the attributions, which is essential in their visualizations, has been an obstacle to understanding and improving the robustness of feature attribution methods. In this paper, we provide new insights by taking such normalization into account. First, we show that for every non-negative homogeneous neural network, a naive $\ell_2$-robust criterion for gradients is \textit{not} normalization invariant, which means that two functions with the same normalized gradient can have different values. Second, we formulate a normalization invariant cosine distance-based criterion and derive its upper bound, which gives insight for why simply minimizing the Hessian norm at the input, as has been done in previous work, is not sufficient for attaining robust feature attribution. Finally, we propose to combine both $\ell_2$ and cosine distance-based criteria as regularization terms to leverage the advantages of both in aligning the local gradient. As a result, we experimentally show that models trained with our method produce much more robust interpretations on CIFAR-10 and ImageNet-100 without significantly hurting the accuracy, compared to the recent baselines. To the best of our knowledge, this is the first work to verify the robustness of interpretation on a larger-scale dataset beyond CIFAR-10, thanks to the computational efficiency of our method.

Submitted 7 December, 2022; v1 submitted 28 November, 2022; originally announced November 2022.

Comments: 22 pages (9 pages in paper, 13 pages in Appendix), 9 figures, 6 tables Accepted in AAAI 23 (Association for the Advancement of Artificial Intelligence)

Showing 1–50 of 116 results for author: Heo, J