Search | arXiv e-print repository

Dielectric Fano Nanoantennas for Enabling Sub-Nanosecond Lifetimes in NV-based Single Photon Emitters

Authors: Shu An, Dmitry Kalashnikov, Wenqiao Shi, Zackaria Mahfoud, Ah Bian Chew, Yan Liu, **g Wu, Di Zhu, Weibo Gao, Cheng-Wei Qiu, Victor Leong, Zhaogang Dong

Abstract: Solid-state quantum emitters are essential sources of single photons, and enhancing their emission rates is of paramount importance for applications in quantum communications, computing, and metrology. One approach is to couple quantum emitters with resonant photonic nanostructures, where the emission rate is enhanced due to the Purcell effect. Dielectric nanoantennas are promising as they provide… ▽ More Solid-state quantum emitters are essential sources of single photons, and enhancing their emission rates is of paramount importance for applications in quantum communications, computing, and metrology. One approach is to couple quantum emitters with resonant photonic nanostructures, where the emission rate is enhanced due to the Purcell effect. Dielectric nanoantennas are promising as they provide strong emission enhancement compared to plasmonic ones, which suffer from high Ohmic loss. Here, we designed and fabricated a dielectric Fano resonator based on a pair of silicon (Si) ellipses and a disk, which supports the mode hybridization between quasi-bound-states-in-the-continuum (quasi-BIC) and Mie resonance. We demonstrated the performance of the developed resonant system by interfacing it with single photon emitters (SPEs) based on nitrogen-vacancy (NV-) centers in nanodiamonds (NDs). We observed that the interfaced emitters have a Purcell enhancement factor of ~10, with sub-ns emission lifetime and a polarization contrast of 9. Our results indicate a promising method for develo** efficient and compact single-photon sources for integrated quantum photonics applications. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: 20 pages, 4 figures

arXiv:2407.02536 [pdf, other]

doi 10.4230/LIPIcs.GIScience.2023.3

Reducing False Discoveries in Statistically-Significant Regional-Colocation Mining: A Summary of Results

Authors: Subhankar Ghosh, Jayant Gupta, Arun Sharma, Shuai An, Shashi Shekhar

Abstract: Given a set \emph{S} of spatial feature types, its feature instances, a study area, and a neighbor relationship, the goal is to find pairs $<$a region ($r_{g}$), a subset \emph{C} of \emph{S}$>$ such that \emph{C} is a statistically significant regional-colocation pattern in $r_{g}$. This problem is important for applications in various domains including ecology, economics, and sociology. The prob… ▽ More Given a set \emph{S} of spatial feature types, its feature instances, a study area, and a neighbor relationship, the goal is to find pairs $<$a region ($r_{g}$), a subset \emph{C} of \emph{S}$>$ such that \emph{C} is a statistically significant regional-colocation pattern in $r_{g}$. This problem is important for applications in various domains including ecology, economics, and sociology. The problem is computationally challenging due to the exponential number of regional colocation patterns and candidate regions. Previously, we proposed a miner \cite{10.1145/3557989.3566158} that finds statistically significant regional colocation patterns. However, the numerous simultaneous statistical inferences raise the risk of false discoveries (also known as the multiple comparisons problem) and carry a high computational cost. We propose a novel algorithm, namely, multiple comparisons regional colocation miner (MultComp-RCM) which uses a Bonferroni correction. Theoretical analysis, experimental evaluation, and case study results show that the proposed method reduces both the false discovery rate and computational cost. △ Less

Submitted 1 July, 2024; originally announced July 2024.

ACM Class: E.m; F.2; E.1; H.3; I.5; J.0

arXiv:2407.00256 [pdf, other]

One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts

Authors: Ruochen Wang, Sohyun An, Minhao Cheng, Tianyi Zhou, Sung Ju Hwang, Cho-Jui Hsieh

Abstract: Large Language Models (LLMs) exhibit strong generalization capabilities to novel tasks when prompted with language instructions and in-context demos. Since this ability sensitively depends on the quality of prompts, various methods have been explored to automate the instruction design. While these methods demonstrated promising results, they also restricted the searched prompt to one instruction.… ▽ More Large Language Models (LLMs) exhibit strong generalization capabilities to novel tasks when prompted with language instructions and in-context demos. Since this ability sensitively depends on the quality of prompts, various methods have been explored to automate the instruction design. While these methods demonstrated promising results, they also restricted the searched prompt to one instruction. Such simplification significantly limits their capacity, as a single demo-free instruction might not be able to cover the entire complex problem space of the targeted task. To alleviate this issue, we adopt the Mixture-of-Expert paradigm and divide the problem space into a set of sub-regions; Each sub-region is governed by a specialized expert, equipped with both an instruction and a set of demos. A two-phase process is developed to construct the specialized expert for each region: (1) demo assignment: Inspired by the theoretical connection between in-context learning and kernel regression, we group demos into experts based on their semantic similarity; (2) instruction assignment: A region-based joint search of an instruction per expert complements the demos assigned to it, yielding a synergistic effect. The resulting method, codenamed Mixture-of-Prompts (MoP), achieves an average win rate of 81% against prior arts across several major benchmarks. △ Less

Submitted 28 June, 2024; originally announced July 2024.

Comments: ICML 2024. code available at https://github.com/ruocwang/mixture-of-prompts

MSC Class: 68T01

Journal ref: Proceedings of the 41st International Conference on Machine Learning (ICML), Vienna, Austria, 2024

arXiv:2406.17424 [pdf, other]

Sparse Outerstring Graphs Have Logarithmic Treewidth

Authors: Shinwoo An, Eun** Oh, Jie Xue

Abstract: An outerstring graph is the intersection graph of curves lying inside a disk with one endpoint on the boundary of the disk. We show that an outerstring graph with $n$ vertices has treewidth $O(α\log n)$, where $α$ denotes the arboricity of the graph, with an almost matching lower bound of $Ω(α\log (n/α))$. As a corollary, we show that a $t$-biclique-free outerstring graph has treewidth… ▽ More An outerstring graph is the intersection graph of curves lying inside a disk with one endpoint on the boundary of the disk. We show that an outerstring graph with $n$ vertices has treewidth $O(α\log n)$, where $α$ denotes the arboricity of the graph, with an almost matching lower bound of $Ω(α\log (n/α))$. As a corollary, we show that a $t$-biclique-free outerstring graph has treewidth $O(t(\log t)\log n)$. This leads to polynomial-time algorithms for most of the central NP-complete problems such as \textsc{Independent Set}, \textsc{Vertex Cover}, \textsc{Dominating Set}, \textsc{Feedback Vertex Set}, \textsc{Coloring} for sparse outerstring graphs. Also, we can obtain subexponential-time (exact, parameterized, and approximation) algorithms for various NP-complete problems such as \textsc{Vertex Cover}, \textsc{Feedback Vertex Set} and \textsc{Cycle Packing} for (not necessarily sparse) outerstring graphs. △ Less

Submitted 25 June, 2024; originally announced June 2024.

Comments: 17pages, In ESA'24

arXiv:2406.16003 [pdf]

Unidirectional Chiral Emission via Twisted Bi-layer Metasurfaces

Authors: Dmitrii Gromyko, Shu An, Sergey Gorelik, Jiahui Xu, Li Jun Lim, Henry Yit Loong Lee, Febiana Tjiptoharsono, Zhi-Kuang Tan, Cheng-Wei Qiu, Zhaogang Dong, Lin Wu

Abstract: Controlling and channelling light emissions from unpolarized quantum dots into specific directions with chiral polarization remains a key challenge in modern photonics. Stacked metasurface designs offer a potential compact solution for chirality and directionality engineering. However, experimental observations of directional chiral radiation from resonant metasurfaces with quantum emitters remain… ▽ More Controlling and channelling light emissions from unpolarized quantum dots into specific directions with chiral polarization remains a key challenge in modern photonics. Stacked metasurface designs offer a potential compact solution for chirality and directionality engineering. However, experimental observations of directional chiral radiation from resonant metasurfaces with quantum emitters remain obscure. In this paper, we present experimental observations of unidirectional chiral emission from a twisted bi-layer metasurface via multi-dimensional control, including twist angle, interlayer distance, and lateral displacement between the top and bottom layers, as enabled by doublet alignment lithography (DAL). First, maintaining alignment, the metasurface demonstrates a resonant intrinsic optical chirality with near-unity circular dichroism of 0.94 and reflectance difference of 74%, where a high circular dichroism greater than 0.9 persists across a wide range of angles from -11 to 11 degrees. Second, engineered lateral displacement induces a unidirectional chiral resonance, resulting in unidirectional chiral emission from the quantum dots deposited onto the metasurface. Our bi-layer metasurfaces offer a universal compact platform for efficient radiation manipulation over a wide angular range, promising potential applications in miniaturized lasers, grating couplers, and chiral nanoantennas. △ Less

Submitted 22 June, 2024; originally announced June 2024.

Comments: 16 pages, 4 figures

arXiv:2406.12430 [pdf, other]

PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers

Authors: Myeonghwa Lee, Seonho An, Min-Soo Kim

Abstract: In this paper, we conduct a study to utilize LLMs as a solution for decision making that requires complex data analysis. We define Decision QA as the task of answering the best decision, $d_{best}$, for a decision-making question $Q$, business rules $R$ and a database $D$. Since there is no benchmark that can examine Decision QA, we propose Decision QA benchmark, DQA. It has two scenarios, Locatin… ▽ More In this paper, we conduct a study to utilize LLMs as a solution for decision making that requires complex data analysis. We define Decision QA as the task of answering the best decision, $d_{best}$, for a decision-making question $Q$, business rules $R$ and a database $D$. Since there is no benchmark that can examine Decision QA, we propose Decision QA benchmark, DQA. It has two scenarios, Locating and Building, constructed from two video games (Europa Universalis IV and Victoria 3) that have almost the same goal as Decision QA. To address Decision QA effectively, we also propose a new RAG technique called the iterative plan-then-retrieval augmented generation (PlanRAG). Our PlanRAG-based LM generates the plan for decision making as the first step, and the retriever generates the queries for data analysis as the second step. The proposed method outperforms the state-of-the-art iterative RAG method by 15.8% in the Locating scenario and by 7.4% in the Building scenario, respectively. We release our code and benchmark at https://github.com/myeon9h/PlanRAG. △ Less

Submitted 18 June, 2024; originally announced June 2024.

Comments: NAACL 2024

ACM Class: I.2.7

arXiv:2406.10957 [pdf, other]

Eliminating Biased Length Reliance of Direct Preference Optimization via Down-Sampled KL Divergence

Authors: Junru Lu, Jiazheng Li, Siyu An, Meng Zhao, Yulan He, Di Yin, Xing Sun

Abstract: Direct Preference Optimization (DPO) has emerged as a prominent algorithm for the direct and robust alignment of Large Language Models (LLMs) with human preferences, offering a more straightforward alternative to the complex Reinforcement Learning from Human Feedback (RLHF). Despite its promising efficacy, DPO faces a notable drawback: "verbosity", a common over-optimization phenomenon also observ… ▽ More Direct Preference Optimization (DPO) has emerged as a prominent algorithm for the direct and robust alignment of Large Language Models (LLMs) with human preferences, offering a more straightforward alternative to the complex Reinforcement Learning from Human Feedback (RLHF). Despite its promising efficacy, DPO faces a notable drawback: "verbosity", a common over-optimization phenomenon also observed in RLHF. While previous studies mainly attributed verbosity to biased labels within the data, we propose that the issue also stems from an inherent algorithmic length reliance in DPO. Specifically, we suggest that the discrepancy between sequence-level Kullback-Leibler (KL) divergences between chosen and rejected sequences, used in DPO, results in overestimated or underestimated rewards due to varying token lengths. Empirically, we utilize datasets with different label lengths to demonstrate the presence of biased rewards. We then introduce an effective downsampling approach, named SamPO, to eliminate potential length reliance. Our experimental evaluations, conducted across three LLMs of varying scales and a diverse array of conditional and open-ended benchmarks, highlight the efficacy of SamPO in mitigating verbosity, achieving improvements of 5% to 12% over DPO through debaised rewards. Our codes can be accessed at: https://github.com/LuJunru/SamPO/. △ Less

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.10744

Technique Report of CVPR 2024 PBDL Challenges

Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Jose Alvarez, Coert van Gemeren, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Sheng** Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou , et al. (77 additional authors not shown)

Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a… ▽ More The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, and medium properties from images. In recent years, deep learning has shown promising improvements for various vision tasks, and when combined with physics-based vision, these approaches can enhance the robustness and accuracy of vision systems. This technical report summarizes the outcomes of the Physics-Based Vision Meets Deep Learning (PBDL) 2024 challenge, held in CVPR 2024 workshop. The challenge consisted of eight tracks, focusing on Low-Light Enhancement and Detection as well as High Dynamic Range (HDR) Imaging. This report details the objectives, methodologies, and results of each track, highlighting the top-performing solutions and their innovative approaches. △ Less

Submitted 27 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

Comments: The author list and contents need to be verified by all authors

arXiv:2406.06379 [pdf, other]

FinVerse: An Autonomous Agent System for Versatile Financial Analysis

Authors: Siyu An, Qin Li, Junru Lu, Di Yin, Xing Sun

Abstract: With the significant advancements in cognitive intelligence driven by LLMs, autonomous agent systems have attracted extensive attention. Despite this growing interest, the development of stable and efficient agent systems poses substantial practical challenges. In this paper, we introduce FinVerse, a meticulously crafted agent system designed for a broad range of financial topics. FinVerse integra… ▽ More With the significant advancements in cognitive intelligence driven by LLMs, autonomous agent systems have attracted extensive attention. Despite this growing interest, the development of stable and efficient agent systems poses substantial practical challenges. In this paper, we introduce FinVerse, a meticulously crafted agent system designed for a broad range of financial topics. FinVerse integrates over 600 financial APIs, enabling access to more accurate and extensive financial information compared to generalist agents. To enhance financial information processing capabilities, FinVerse is equipped with an embedded code interpreter, enabling the execution of complex data analysis tasks with precision and efficiency. Our work includes an empirical comparison of several LLMs in driving FinVerse. Specifically, we propose our own scheme for training LLMs using SFT to optimize LLM performance within FinVerse. Recognizing the scarcity of specialized datasets to build LLMs for agents, we have constructed a dataset and plan to make it open-source, providing a valuable resource for peer application developers. The demo video has been released on YouTube at https://www.youtube.com/watch?v=sk8L9_Wv7J4 △ Less

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.04867 [pdf, other]

Deep learning for precipitation nowcasting: A survey from the perspective of time series forecasting

Authors: Sojung An, Tae-** Oh, Eunha Sohn, Donghyun Kim

Abstract: Deep learning-based time series forecasting has dominated the short-term precipitation forecasting field with the help of its ability to estimate motion flow in high-resolution datasets. The growing interest in precipitation nowcasting offers substantial opportunities for the advancement of current forecasting technologies. Nevertheless, there has been a scarcity of in-depth surveys of time series… ▽ More Deep learning-based time series forecasting has dominated the short-term precipitation forecasting field with the help of its ability to estimate motion flow in high-resolution datasets. The growing interest in precipitation nowcasting offers substantial opportunities for the advancement of current forecasting technologies. Nevertheless, there has been a scarcity of in-depth surveys of time series precipitation forecasting using deep learning. Thus, this paper systemically reviews recent progress in time series precipitation forecasting models. Specifically, we investigate the following key points within background components, covering: i) preprocessing, ii) objective functions, and iii) evaluation metrics. We then categorize forecasting models into \textit{recursive} and \textit{multiple} strategies based on their approaches to predict future frames, investigate the impacts of models using the strategies, and performance assessments. Finally, we evaluate current deep learning-based models for precipitation forecasting on a public benchmark, discuss their limitations and challenges, and present some promising research directions. Our contribution lies in providing insights for a better understanding of time series precipitation forecasting and in aiding the development of robust AI solutions for the future. △ Less

Submitted 13 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

Comments: 21 pages, 7 figures, 5 tables

arXiv:2405.20602 [pdf, other]

Masked Language Modeling Becomes Conditional Density Estimation for Tabular Data Synthesis

Authors: Seunghwan An, Gyeongdong Woo, Jaesung Lim, ChangHyun Kim, Sungchul Hong, Jong-June Jeon

Abstract: In this paper, our goal is to generate synthetic data for heterogeneous (mixed-type) tabular datasets with high machine learning utility (MLu). Given that the MLu performance relies on accurately approximating the conditional distributions, we focus on devising a synthetic data generation method based on conditional distribution estimation. We propose a novel synthetic data generation method, MaCo… ▽ More In this paper, our goal is to generate synthetic data for heterogeneous (mixed-type) tabular datasets with high machine learning utility (MLu). Given that the MLu performance relies on accurately approximating the conditional distributions, we focus on devising a synthetic data generation method based on conditional distribution estimation. We propose a novel synthetic data generation method, MaCoDE, by redefining the multi-class classification task of Masked Language Modeling (MLM) as histogram-based non-parametric conditional density estimation. Our proposed method enables estimating conditional densities across arbitrary combinations of target and conditional variables. Furthermore, we demonstrate that our proposed method bridges the theoretical gap between distributional learning and MLM. To validate the effectiveness of our proposed model, we conduct synthetic data generation experiments on 10 real-world datasets. Given the analogy between predicting masked input tokens in MLM and missing data imputation, we also evaluate the performance of multiple imputations on incomplete datasets with various missing data mechanisms. Moreover, our proposed model offers the advantage of enabling adjustments to data privacy levels without requiring re-training. △ Less

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.19757 [pdf, other]

Improving SMOTE via Fusing Conditional VAE for Data-adaptive Noise Filtering

Authors: Sungchul Hong, Seunghwan An, Jong-June Jeon

Abstract: Recent advances in a generative neural network model extend the development of data augmentation methods. However, the augmentation methods based on the modern generative models fail to achieve notable performance for class imbalance data compared to the conventional model, the SMOTE. We investigate the problem of the generative model for imbalanced classification and introduce a framework to enha… ▽ More Recent advances in a generative neural network model extend the development of data augmentation methods. However, the augmentation methods based on the modern generative models fail to achieve notable performance for class imbalance data compared to the conventional model, the SMOTE. We investigate the problem of the generative model for imbalanced classification and introduce a framework to enhance the SMOTE algorithm using Variational Autoencoders (VAE). Our approach systematically quantifies the density of data points in a low-dimensional latent space using the VAE, simultaneously incorporating information on class labels and classification difficulty. Then, the data points potentially degrading the augmentation are systematically excluded, and the neighboring observations are directly augmented on the data space. Empirical studies on several imbalanced datasets represent that this simple process innovatively improves the conventional SMOTE algorithm over the deep learning models. Consequently, we conclude that the selection of minority data and the interpolation in the data space are beneficial for imbalanced classification problems with a relatively small number of data points. △ Less

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.19346 [pdf, other]

Subject-Adaptive Transfer Learning Using Resting State EEG Signals for Cross-Subject EEG Motor Imagery Classification

Authors: Sion An, Myeongkyun Kang, Soopil Kim, Philip Chikontwe, Li Shen, Sang Hyun Park

Abstract: Electroencephalography (EEG) motor imagery (MI) classification is a fundamental, yet challenging task due to the variation of signals between individuals i.e., inter-subject variability. Previous approaches try to mitigate this using task-specific (TS) EEG signals from the target subject in training. However, recording TS EEG signals requires time and limits its applicability in various fields. In… ▽ More Electroencephalography (EEG) motor imagery (MI) classification is a fundamental, yet challenging task due to the variation of signals between individuals i.e., inter-subject variability. Previous approaches try to mitigate this using task-specific (TS) EEG signals from the target subject in training. However, recording TS EEG signals requires time and limits its applicability in various fields. In contrast, resting state (RS) EEG signals are a viable alternative due to ease of acquisition with rich subject information. In this paper, we propose a novel subject-adaptive transfer learning strategy that utilizes RS EEG signals to adapt models on unseen subject data. Specifically, we disentangle extracted features into task- and subject-dependent features and use them to calibrate RS EEG signals for obtaining task information while preserving subject characteristics. The calibrated signals are then used to adapt the model to the target subject, enabling the model to simulate processing TS EEG signals of the target subject. The proposed method achieves state-of-the-art accuracy on three public benchmarks, demonstrating the effectiveness of our method in cross-subject EEG MI classification. Our findings highlight the potential of leveraging RS EEG signals to advance practical brain-computer interface systems. △ Less

Submitted 17 May, 2024; originally announced May 2024.

Comments: Early Accepted at MICCAI 2024

arXiv:2405.16013 [pdf, other]

Convergence Behavior of an Adversarial Weak Supervision Method

Authors: Steven An, Sanjoy Dasgupta

Abstract: Labeling data via rules-of-thumb and minimal label supervision is central to Weak Supervision, a paradigm subsuming subareas of machine learning such as crowdsourced learning and semi-supervised ensemble learning. By using this labeled data to train modern machine learning methods, the cost of acquiring large amounts of hand labeled data can be ameliorated. Approaches to combining the rules-of-thu… ▽ More Labeling data via rules-of-thumb and minimal label supervision is central to Weak Supervision, a paradigm subsuming subareas of machine learning such as crowdsourced learning and semi-supervised ensemble learning. By using this labeled data to train modern machine learning methods, the cost of acquiring large amounts of hand labeled data can be ameliorated. Approaches to combining the rules-of-thumb falls into two camps, reflecting different ideologies of statistical estimation. The most common approach, exemplified by the Dawid-Skene model, is based on probabilistic modeling. The other, developed in the work of Balsubramani-Freund and others, is adversarial and game-theoretic. We provide a variety of statistical results for the adversarial approach under log-loss: we characterize the form of the solution, relate it to logistic regression, demonstrate consistency, and give rates of convergence. On the other hand, we find that probabilistic approaches for the same model class can fail to be consistent. Experimental results are provided to corroborate the theoretical results. △ Less

Submitted 24 May, 2024; originally announced May 2024.

Comments: 49 pages, 16 figures, to be published in UAI 2024

arXiv:2405.03439 [pdf, other]

Anomalous Inverse Spin Hall Effect (AISHE) due to Unconventional Spin Currents in Ferromagnetic Films with Tailored Interfacial Magnetic Anisotropy

Authors: Soumik Aon, Harekrishna Bhunia, Pratap Kumar Pal, Abu Bakkar Miah, Dhananjaya Mahapatra, Anjan Barman, Partha Mitra

Abstract: A single layer ferromagnetic film magnetized in the plane of an ac current flow, exhibits a characteristic Hall voltage with harmonic and second harmonic components, which is attributed to the presence of spin currents with polarization non-collinear with the magnetization. A set of 30 nm thick permalloy (Py) films used in this study are deposited at an oblique angle with respect to the substrate… ▽ More A single layer ferromagnetic film magnetized in the plane of an ac current flow, exhibits a characteristic Hall voltage with harmonic and second harmonic components, which is attributed to the presence of spin currents with polarization non-collinear with the magnetization. A set of 30 nm thick permalloy (Py) films used in this study are deposited at an oblique angle with respect to the substrate plane which induces an in-plane easy axis in the magnetization of the initial nucleating layers of the films which is distinct from the overall bulk magnetic properties of the film. This unusual magnetic texture provides a platform for the direct detection of inverse spin Hall effect in Hall bar shaped macroscopic devices at room temperatures which we denote as Anomalous Inverse Spin Hall Effect (AISHE). Control samples fabricated by normal deposition of permalloy with slow rotation of substrate shows significant reduction of the harmonic Hall signal that further substantiates the model. The analysis of the second harmonic Hall signal corroborates the presence of spin-orbit torque arising from the unconventional spin-currents in the single-layer ferromagnets. △ Less

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2405.02955 [pdf, other]

Minimizing Kinetic Inductance in Tantalum-Based Superconducting Coplanar Waveguide Resonators for Alleviating Frequency Fluctuation Issues

Authors: Dengfeng Li, **g**g Hu, Yuan Li, Shuoming An

Abstract: Advancements in the fabrication of superconducting quantum devices have highlighted tantalum as a promising material, owing to its low surface oxidation microwave loss at low temperatures. However, tantalum films exhibit significantly larger kinetic inductances compared to materials such as aluminum or niobium. Given the inevitable variations in film thickness, this increased kinetic inductance le… ▽ More Advancements in the fabrication of superconducting quantum devices have highlighted tantalum as a promising material, owing to its low surface oxidation microwave loss at low temperatures. However, tantalum films exhibit significantly larger kinetic inductances compared to materials such as aluminum or niobium. Given the inevitable variations in film thickness, this increased kinetic inductance leads to considerable, uncontrolled frequency variances and shifts in components like superconducting coplanar waveguide (SCPW) resonators. Achieving high precision in resonator frequencies is crucial, particularly when multiple resonators share a common Purcell filter with limited bandwidth in superconducting quantum information processors. Here, we tackle this challenge from both fabrication and design perspectives, achieving a reduction in resonator frequency fluctuation by a factor of more than 100. Concurrently, the internal quality factor of the SCPW resonator remains at high level. Our findings open up new avenues for the enhanced utilization of tantalum in large-scale superconducting chips. △ Less

Submitted 5 May, 2024; originally announced May 2024.

arXiv:2404.17273 [pdf, other]

doi 10.1016/j.ipm.2024.103716

3SHNet: Boosting Image-Sentence Retrieval via Visual Semantic-Spatial Self-Highlighting

Authors: Xuri Ge, Songpei Xu, Fuhai Chen, Jie Wang, Guoxin Wang, Shan An, Joemon M. Jose

Abstract: In this paper, we propose a novel visual Semantic-Spatial Self-Highlighting Network (termed 3SHNet) for high-precision, high-efficiency and high-generalization image-sentence retrieval. 3SHNet highlights the salient identification of prominent objects and their spatial locations within the visual modality, thus allowing the integration of visual semantics-spatial interactions and maintaining indep… ▽ More In this paper, we propose a novel visual Semantic-Spatial Self-Highlighting Network (termed 3SHNet) for high-precision, high-efficiency and high-generalization image-sentence retrieval. 3SHNet highlights the salient identification of prominent objects and their spatial locations within the visual modality, thus allowing the integration of visual semantics-spatial interactions and maintaining independence between two modalities. This integration effectively combines object regions with the corresponding semantic and position layouts derived from segmentation to enhance the visual representation. And the modality-independence guarantees efficiency and generalization. Additionally, 3SHNet utilizes the structured contextual visual scene information from segmentation to conduct the local (region-based) or global (grid-based) guidance and achieve accurate hybrid-level retrieval. Extensive experiments conducted on MS-COCO and Flickr30K benchmarks substantiate the superior performances, inference efficiency and generalization of the proposed 3SHNet when juxtaposed with contemporary state-of-the-art methodologies. Specifically, on the larger MS-COCO 5K test set, we achieve 16.3%, 24.8%, and 18.3% improvements in terms of rSum score, respectively, compared with the state-of-the-art methods using different image representations, while maintaining optimal retrieval efficiency. Moreover, our performance on cross-dataset generalization improves by 18.6%. Data and code are available at https://github.com/XuriGe1995/3SHNet. △ Less

Submitted 26 April, 2024; originally announced April 2024.

Comments: Accepted Information Processing and Management (IP&M), 10 pages, 9 figures and 8 tables

Journal ref: Information Processing & Management, Volume 61, Issue 4, July 2024, 103716

arXiv:2404.16898 [pdf, other]

How to Parameterize Asymmetric Quantization Ranges for Quantization-Aware Training

Authors: Jaeseong You, Minseop Park, Kyunggeun Lee, Seokjun An, Chirag Patel, Markus Nage

Abstract: This paper investigates three different parameterizations of asymmetric uniform quantization for quantization-aware training: (1) scale and offset, (2) minimum and maximum, and (3) beta and gamma. We perform a comprehensive comparative analysis of these parameterizations' influence on quantization-aware training, using both controlled experiments and real-world large language models. Our particula… ▽ More This paper investigates three different parameterizations of asymmetric uniform quantization for quantization-aware training: (1) scale and offset, (2) minimum and maximum, and (3) beta and gamma. We perform a comprehensive comparative analysis of these parameterizations' influence on quantization-aware training, using both controlled experiments and real-world large language models. Our particular focus is on their changing behavior in response to critical training hyperparameters, bit width and learning rate. Based on our investigation, we propose best practices to stabilize and accelerate quantization-aware training with learnable asymmetric quantization ranges. △ Less

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.16811 [pdf, other]

Make Your LLM Fully Utilize the Context

Authors: Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou

Abstract: While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, known as the lost-in-the-middle challenge. We hypothesize that it stems from insufficient explicit supervision during the long-context training, which fails to emphasize that any position in a long context can hold crucial information. Based on t… ▽ More While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, known as the lost-in-the-middle challenge. We hypothesize that it stems from insufficient explicit supervision during the long-context training, which fails to emphasize that any position in a long context can hold crucial information. Based on this intuition, our study presents information-intensive (IN2) training, a purely data-driven solution to overcome lost-in-the-middle. Specifically, IN2 training leverages a synthesized long-context question-answer dataset, where the answer requires (1) fine-grained information awareness on a short segment (~128 tokens) within a synthesized long context (4K-32K tokens), and (2) the integration and reasoning of information from two or more short segments. Through applying this information-intensive training on Mistral-7B, we present FILM-7B (FILl-in-the-Middle). To thoroughly assess the ability of FILM-7B for utilizing long contexts, we design three probing tasks that encompass various context styles (document, code, and structured-data context) and information retrieval patterns (forward, backward, and bi-directional retrieval). The probing results demonstrate that FILM-7B can robustly retrieve information from different positions in its 32K context window. Beyond these probing tasks, FILM-7B significantly improves the performance on real-world long-context tasks (e.g., 23.5->26.9 F1 score on NarrativeQA), while maintaining a comparable performance on short-context tasks (e.g., 59.3->59.2 accuracy on MMLU). Github Link: https://github.com/microsoft/FILM. △ Less

Submitted 26 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

Comments: 19 pages, 7 figures, 3 tables, 9 examples

arXiv:2404.05680 [pdf, other]

SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation

Authors: Heyuan Li, Ce Chen, Tianhao Shi, Yuda Qiu, Sizhe An, Guanying Chen, Xiaoguang Han

Abstract: While recent advances in 3D-aware Generative Adversarial Networks (GANs) have aided the development of near-frontal view human face synthesis, the challenge of comprehensively synthesizing a full 3D head viewable from all angles still persists. Although PanoHead proves the possibilities of using a large-scale dataset with images of both frontal and back views for full-head synthesis, it often caus… ▽ More While recent advances in 3D-aware Generative Adversarial Networks (GANs) have aided the development of near-frontal view human face synthesis, the challenge of comprehensively synthesizing a full 3D head viewable from all angles still persists. Although PanoHead proves the possibilities of using a large-scale dataset with images of both frontal and back views for full-head synthesis, it often causes artifacts for back views. Based on our in-depth analysis, we found the reasons are mainly twofold. First, from network architecture perspective, we found each plane in the utilized tri-plane/tri-grid representation space tends to confuse the features from both sides, causing "mirroring" artifacts (e.g., the glasses appear in the back). Second, from data supervision aspect, we found that existing discriminator training in 3D GANs mainly focuses on the quality of the rendered image itself, and does not care much about its plausibility with the perspective from which it was rendered. This makes it possible to generate "face" in non-frontal views, due to its easiness to fool the discriminator. In response, we propose SphereHead, a novel tri-plane representation in the spherical coordinate system that fits the human head's geometric characteristics and efficiently mitigates many of the generated artifacts. We further introduce a view-image consistency loss for the discriminator to emphasize the correspondence of the camera parameters and the images. The combination of these efforts results in visually superior outcomes with significantly fewer artifacts. Our code and dataset are publicly available at https://lhyfst.github.io/spherehead. △ Less

Submitted 8 April, 2024; originally announced April 2024.

Comments: project page: https://lhyfst.github.io/spherehead

arXiv:2404.03934 [pdf, other]

Direct Electrical Detection of Spin Chemical Potential Due to Spin Hall Effect in $β$-Tungsten and Platinum Using a Pair of Ferromagnetic and Normal Metal Voltage probes

Authors: Soumik Aon, Abu Bakkar Miah, Arpita Mandal, Harekrishna Bhunia, Dhananjaya Mahapatra, Partha Mitra

Abstract: The phenomenon of Spin Hall Effect (SHE) generates a pure spin current transverse to an applied current in materials with strong spin-orbit coupling, although not detectable through conventional electrical measurement. An intuitive Hall effect like measurement configuration is implemented to directly measure pure spin chemical potential of the accumulated spins at the edges of heavy metal (HM) cha… ▽ More The phenomenon of Spin Hall Effect (SHE) generates a pure spin current transverse to an applied current in materials with strong spin-orbit coupling, although not detectable through conventional electrical measurement. An intuitive Hall effect like measurement configuration is implemented to directly measure pure spin chemical potential of the accumulated spins at the edges of heavy metal (HM) channels that generates large SHE. A pair of transverse linearly aligned voltage probes in placed in ohmic contact with the top surface of HM , one being a ferromagnetic metal (FM) with non-zero spin polarization and other is the reference metal (RM) with zero polarization of carriers. This combination of FM/RM electrodes is shown to induce an additional voltage proportional to a spin accumulation potential, which is anti symmetric with respect to opposite orientations of FM controlled by a 2D vector magnet. Proof of concept of the measurement scheme is verified by comparing the signs of voltages for HM channels of Tungsten (W) and Platinum (Pt) which are known to generate opposite spin accumulation under similar conditions of applied current. The same devices are also able to detect the reciprocal effect, inverse spin Hall effect (ISHE) by swap** the current and voltage leads and the results are consistent with reciprocity principle. Further, exploiting a characteristic feature of W thin film deposition, a series of devices were fabricated with W resistivity varying over a wide range of 10 - 750 $μΩ$-cm and the calculated spin Hall resistivity exhibits a pronounced power law dependence on resistivity. Our measurement scheme combined with almost two decades of HM resistivity variation provides the ideal platform required to test the underlying microscopic mechanism responsible for SHE/ISHE. △ Less

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seong** Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han , et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t… ▽ More We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in develo** their sovereign LLMs. △ Less

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2403.17188 [pdf, other]

LOTUS: Evasive and Resilient Backdoor Attacks through Sub-Partitioning

Authors: Siyuan Cheng, Guanhong Tao, Yingqi Liu, Guangyu Shen, Shengwei An, Shiwei Feng, Xiangzhe Xu, Kaiyuan Zhang, Shiqing Ma, Xiangyu Zhang

Abstract: Backdoor attack poses a significant security threat to Deep Learning applications. Existing attacks are often not evasive to established backdoor detection techniques. This susceptibility primarily stems from the fact that these attacks typically leverage a universal trigger pattern or transformation function, such that the trigger can cause misclassification for any input. In response to this, re… ▽ More Backdoor attack poses a significant security threat to Deep Learning applications. Existing attacks are often not evasive to established backdoor detection techniques. This susceptibility primarily stems from the fact that these attacks typically leverage a universal trigger pattern or transformation function, such that the trigger can cause misclassification for any input. In response to this, recent papers have introduced attacks using sample-specific invisible triggers crafted through special transformation functions. While these approaches manage to evade detection to some extent, they reveal vulnerability to existing backdoor mitigation techniques. To address and enhance both evasiveness and resilience, we introduce a novel backdoor attack LOTUS. Specifically, it leverages a secret function to separate samples in the victim class into a set of partitions and applies unique triggers to different partitions. Furthermore, LOTUS incorporates an effective trigger focusing mechanism, ensuring only the trigger corresponding to the partition can induce the backdoor behavior. Extensive experimental results show that LOTUS can achieve high attack success rate across 4 datasets and 7 model structures, and effectively evading 13 backdoor detection and mitigation techniques. The code is available at https://github.com/Megum1/LOTUS. △ Less

Submitted 25 March, 2024; originally announced March 2024.

Comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024)

arXiv:2403.11426 [pdf, other]

ETH-Tight Algorithm for Cycle Packing on Unit Disk Graphs

Authors: Shinwoo An, Eun** Oh

Abstract: In this paper, we consider the Cycle Packing problem on unit disk graphs defined as follows. Given a unit disk graph G with n vertices and an integer k, the goal is to find a set of $k$ vertex-disjoint cycles of G if it exists. Our algorithm runs in time $2^{O(\sqrt k)}n^{O(1)}$. This improves the $2^{O(\sqrt k\log k)}n^{O(1)}$-time algorithm by Fomin et al. [SODA 2012, ICALP 2017]. Moreover, our… ▽ More In this paper, we consider the Cycle Packing problem on unit disk graphs defined as follows. Given a unit disk graph G with n vertices and an integer k, the goal is to find a set of $k$ vertex-disjoint cycles of G if it exists. Our algorithm runs in time $2^{O(\sqrt k)}n^{O(1)}$. This improves the $2^{O(\sqrt k\log k)}n^{O(1)}$-time algorithm by Fomin et al. [SODA 2012, ICALP 2017]. Moreover, our algorithm is optimal assuming the exponential-time hypothesis. △ Less

Submitted 17 March, 2024; originally announced March 2024.

Comments: In SoCG'24

arXiv:2403.10141 [pdf, other]

Anisotropic magneto-photothermal voltage in Sb2Te3 topological insulator thin films

Authors: Subhadip Manna, Sambhu G Nath, Samrat Roy, Soumik Aon, Sayani Pal, Kanav Sharma, Dhananjaya Mahapatra, Partha Mitra, Sourin Das, Bipul Pal, Chiranjib Mitra

Abstract: We studied longitudinal and Hall photothermal voltages under a planar magnetic field scan in epitaxial thin films of the Topological Insulator (TI) Sb2Te3, grown using pulsed laser deposition (PLD). Unlike prior research that utilised polarised light-induced photocurrent to investigate the TI, our study introduces advancements based on unpolarized light-induced local heating. This method yields a… ▽ More We studied longitudinal and Hall photothermal voltages under a planar magnetic field scan in epitaxial thin films of the Topological Insulator (TI) Sb2Te3, grown using pulsed laser deposition (PLD). Unlike prior research that utilised polarised light-induced photocurrent to investigate the TI, our study introduces advancements based on unpolarized light-induced local heating. This method yields a thermoelectric response exhibiting a direct signature of strong spin-orbit coupling. Our analysis reveals three distinct contributions when fitting the photothermal voltage data to the angular dependence of the planar magnetic field. The interaction between the applied magnetic field and the thermal gradient on the bulk band orbitals enables the differentiation between the ordinary Nernst effect from the out-of-plane thermal gradient and an extraordinary magneto-thermal contribution from the planar thermal gradient. The fitting of our data to theoretical models indicates that these effects primarily arise from the bulk states of the TI rather than the surface states. These findings highlight PLD-grown epitaxial topological insulator thin films as promising candidates for optoelectronic devices, including sensors and actuators. Such devices offer controllable responses through position-dependent, non-invasive local heating via focused incident light and variations in the applied magnetic field direction. △ Less

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2402.19431 [pdf, other]

doi 10.1145/3643916.3644403

Compositional API Recommendation for Library-Oriented Code Generation

Authors: Zexiong Ma, Shengnan An, Bing Xie, Zeqi Lin

Abstract: Large language models (LLMs) have achieved exceptional performance in code generation. However, the performance remains unsatisfactory in generating library-oriented code, especially for the libraries not present in the training data of LLMs. Previous work utilizes API recommendation technology to help LLMs use libraries: it retrieves APIs related to the user requirements, then leverages them as c… ▽ More Large language models (LLMs) have achieved exceptional performance in code generation. However, the performance remains unsatisfactory in generating library-oriented code, especially for the libraries not present in the training data of LLMs. Previous work utilizes API recommendation technology to help LLMs use libraries: it retrieves APIs related to the user requirements, then leverages them as context to prompt LLMs. However, developmental requirements can be coarse-grained, requiring a combination of multiple fine-grained APIs. This granularity inconsistency makes API recommendation a challenging task. To address this, we propose CAPIR (Compositional API Recommendation), which adopts a "divide-and-conquer" strategy to recommend APIs for coarse-grained requirements. Specifically, CAPIR employs an LLM-based Decomposer to break down a coarse-grained task description into several detailed subtasks. Then, CAPIR applies an embedding-based Retriever to identify relevant APIs corresponding to each subtask. Moreover, CAPIR leverages an LLM-based Reranker to filter out redundant APIs and provides the final recommendation. To facilitate the evaluation of API recommendation methods on coarse-grained requirements, we present two challenging benchmarks, RAPID (Recommend APIs based on Documentation) and LOCG (Library-Oriented Code Generation). Experimental results on these benchmarks, demonstrate the effectiveness of CAPIR in comparison to existing baselines. Specifically, on RAPID's Torchdata-AR dataset, compared to the state-of-the-art API recommendation approach, CAPIR improves recall@5 from 18.7% to 43.2% and precision@5 from 15.5% to 37.1%. On LOCG's Torchdata-Code dataset, compared to code generation without API recommendation, CAPIR improves pass@100 from 16.0% to 28.0%. △ Less

Submitted 29 February, 2024; originally announced February 2024.

Journal ref: 32nd IEEE/ACM International Conference on Program Comprehension (ICPC 2024), Apr 2024, Lisboa, Portugal

arXiv:2402.11811 [pdf, other]

FIPO: Free-form Instruction-oriented Prompt Optimization with Preference Dataset and Modular Fine-tuning Schema

Authors: Junru Lu, Siyu An, Min Zhang, Yulan He, Di Yin, Xing Sun

Abstract: When the quality of naive prompts is carefully optimized by human experts, the task performance of large language models (LLMs) can be significantly improved. However, expert-based prompt optimizations are expensive. Herein, some works have proposed Automatic Prompt Optimization (APO), to optimize naive prompts according to task outputs of given in-box testing models, with the help of advanced LLM… ▽ More When the quality of naive prompts is carefully optimized by human experts, the task performance of large language models (LLMs) can be significantly improved. However, expert-based prompt optimizations are expensive. Herein, some works have proposed Automatic Prompt Optimization (APO), to optimize naive prompts according to task outputs of given in-box testing models, with the help of advanced LLMs (e.g., GPT-4) in an ad-hoc way. Although effective, existing schemes suffer from poor generalization ability and privacy risk. To this end, we collect the first large-scale Prompt Optimization Preference dataset (POP), fine-tune offline local LLM-based optimizers, then fairly test with various downstream models. Our method allows accurate optimization of the core task instruction part within the naive prompt in a model-agnostic manner, and thus is named Free-from Instruction-oriented Prompt Optimization (FIPO). In specific, FIPO uses a modular APO template that dynamically integrate the naive task instruction, optional instruction responses, and optional ground truth to produce finely optimized prompts. The POP dataset is meticulously constructed using advanced LLMs, undergoing rigorous cross-validation by human experts and analytical models. Leveraging insights from the data with Tulu2 models and diverse fine-tuning strategies, we validate the efficacy of FIPO framework across five public benchmarks and three testing models. Check codes and data here: https://github.com/LuJunru/FIPO_Project. △ Less

Submitted 16 June, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

arXiv:2402.05467 [pdf, other]

Rapid Optimization for Jailbreaking LLMs via Subconscious Exploitation and Echopraxia

Authors: Guangyu Shen, Siyuan Cheng, Kaiyuan Zhang, Guanhong Tao, Shengwei An, Lu Yan, Zhuo Zhang, Shiqing Ma, Xiangyu Zhang

Abstract: Large Language Models (LLMs) have become prevalent across diverse sectors, transforming human life with their extraordinary reasoning and comprehension abilities. As they find increased use in sensitive tasks, safety concerns have gained widespread attention. Extensive efforts have been dedicated to aligning LLMs with human moral principles to ensure their safe deployment. Despite their potential,… ▽ More Large Language Models (LLMs) have become prevalent across diverse sectors, transforming human life with their extraordinary reasoning and comprehension abilities. As they find increased use in sensitive tasks, safety concerns have gained widespread attention. Extensive efforts have been dedicated to aligning LLMs with human moral principles to ensure their safe deployment. Despite their potential, recent research indicates aligned LLMs are prone to specialized jailbreaking prompts that bypass safety measures to elicit violent and harmful content. The intrinsic discrete nature and substantial scale of contemporary LLMs pose significant challenges in automatically generating diverse, efficient, and potent jailbreaking prompts, representing a continuous obstacle. In this paper, we introduce RIPPLE (Rapid Optimization via Subconscious Exploitation and Echopraxia), a novel optimization-based method inspired by two psychological concepts: subconsciousness and echopraxia, which describe the processes of the mind that occur without conscious awareness and the involuntary mimicry of actions, respectively. Evaluations across 6 open-source LLMs and 4 commercial LLM APIs show RIPPLE achieves an average Attack Success Rate of 91.5\%, outperforming five current methods by up to 47.0\% with an 8x reduction in overhead. Furthermore, it displays significant transferability and stealth, successfully evading established detection mechanisms. The code of our work is available at \url{https://github.com/SolidShen/RIPPLE_official/tree/official} △ Less

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2401.07571 [pdf, other]

A Bi-Pyramid Multimodal Fusion Method for the Diagnosis of Bipolar Disorders

Authors: Guoxin Wang, Sheng Shi, Shan An, Fengmei Fan, Wenshu Ge, Qi Wang, Feng Yu, Zhiren Wang

Abstract: Previous research on the diagnosis of Bipolar disorder has mainly focused on resting-state functional magnetic resonance imaging. However, their accuracy can not meet the requirements of clinical diagnosis. Efficient multimodal fusion strategies have great potential for applications in multimodal data and can further improve the performance of medical diagnosis models. In this work, we utilize bot… ▽ More Previous research on the diagnosis of Bipolar disorder has mainly focused on resting-state functional magnetic resonance imaging. However, their accuracy can not meet the requirements of clinical diagnosis. Efficient multimodal fusion strategies have great potential for applications in multimodal data and can further improve the performance of medical diagnosis models. In this work, we utilize both sMRI and fMRI data and propose a novel multimodal diagnosis model for bipolar disorder. The proposed Patch Pyramid Feature Extraction Module extracts sMRI features, and the spatio-temporal pyramid structure extracts the fMRI features. Finally, they are fused by a fusion module to output diagnosis results with a classifier. Extensive experiments show that our proposed method outperforms others in balanced accuracy from 0.657 to 0.732 on the OpenfMRI dataset, and achieves the state of the art. △ Less

Submitted 15 January, 2024; originally announced January 2024.

Comments: Accepted by IEEE ICASSP 2024

arXiv:2401.03850 [pdf, other]

Inverse Nonlinearity Compensation of Hyperelastic Deformation in Dielectric Elastomer for Acoustic Actuation

Authors: ** Woo Lee, Gwang Seok An, Jeong-Yun Sun, Kyogu Lee

Abstract: This paper delves into the analysis of nonlinear deformation induced by dielectric actuation in pre-stressed ideal dielectric elastomers. It formulates a nonlinear ordinary differential equation governing this deformation based on the hyperelastic model under dielectric stress. Through numerical integration and neural network approximations, the relationship between voltage and stretch is establis… ▽ More This paper delves into the analysis of nonlinear deformation induced by dielectric actuation in pre-stressed ideal dielectric elastomers. It formulates a nonlinear ordinary differential equation governing this deformation based on the hyperelastic model under dielectric stress. Through numerical integration and neural network approximations, the relationship between voltage and stretch is established. Neural networks are employed to approximate solutions for voltage-to-stretch and stretch-to-voltage transformations obtained via an explicit Runge-Kutta method. The effectiveness of these approximations is demonstrated by leveraging them for compensating nonlinearity through the wavesha** of the input signal. The comparative analysis highlights the superior accuracy of the approximated solutions over baseline methods, resulting in minimized harmonic distortions when utilizing dielectric elastomers as acoustic actuators. This study underscores the efficacy of the proposed approach in mitigating nonlinearities and enhancing the performance of dielectric elastomers in acoustic actuation applications. △ Less

Submitted 8 January, 2024; originally announced January 2024.

arXiv:2401.01457 [pdf, other]

Flip Graphs on Self-Complementary Ideals of Chain Products

Authors: Serena An, Holden Mui

Abstract: In this paper, we introduce a flip operation on self-complementary ideals of chain product posets and study the resulting flip graphs. We give asymptotics for the number of vertices in these graphs, compute their diameters, and give bounds for their radii. We also define similar flip operations on self-complementary ideals of the chain product $[2r]\times [2r]\times [2r]$ satisfying additional sym… ▽ More In this paper, we introduce a flip operation on self-complementary ideals of chain product posets and study the resulting flip graphs. We give asymptotics for the number of vertices in these graphs, compute their diameters, and give bounds for their radii. We also define similar flip operations on self-complementary ideals of the chain product $[2r]\times [2r]\times [2r]$ satisfying additional symmetries, and we achieve similar results for the resulting flip graphs. △ Less

Submitted 3 January, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

Comments: 39 pages, 32 figures

arXiv:2401.00496 [pdf, other]

SAR-RARP50: Segmentation of surgical instrumentation and Action Recognition on Robot-Assisted Radical Prostatectomy Challenge

Authors: Dimitrios Psychogyios, Emanuele Colleoni, Beatrice Van Amsterdam, Chih-Yang Li, Shu-Yu Huang, Yuchong Li, Fucang Jia, Baosheng Zou, Guotai Wang, Yang Liu, Maxence Boels, Jiayu Huo, Rachel Sparks, Prokar Dasgupta, Alejandro Granados, Sebastien Ourselin, Mengya Xu, An Wang, Yanan Wu, Long Bai, Hongliang Ren, Atsushi Yamada, Yuriko Harai, Yuto Ishikawa, Kazuyuki Hayashi , et al. (25 additional authors not shown)

Abstract: Surgical tool segmentation and action recognition are fundamental building blocks in many computer-assisted intervention applications, ranging from surgical skills assessment to decision support systems. Nowadays, learning-based action recognition and segmentation approaches outperform classical methods, relying, however, on large, annotated datasets. Furthermore, action recognition and tool segme… ▽ More Surgical tool segmentation and action recognition are fundamental building blocks in many computer-assisted intervention applications, ranging from surgical skills assessment to decision support systems. Nowadays, learning-based action recognition and segmentation approaches outperform classical methods, relying, however, on large, annotated datasets. Furthermore, action recognition and tool segmentation algorithms are often trained and make predictions in isolation from each other, without exploiting potential cross-task relationships. With the EndoVis 2022 SAR-RARP50 challenge, we release the first multimodal, publicly available, in-vivo, dataset for surgical action recognition and semantic instrumentation segmentation, containing 50 suturing video segments of Robotic Assisted Radical Prostatectomy (RARP). The aim of the challenge is twofold. First, to enable researchers to leverage the scale of the provided dataset and develop robust and highly accurate single-task action recognition and tool segmentation approaches in the surgical domain. Second, to further explore the potential of multitask-based learning approaches and determine their comparative advantage against their single-task counterparts. A total of 12 teams participated in the challenge, contributing 7 action recognition methods, 9 instrument segmentation techniques, and 4 multitask approaches that integrated both action recognition and instrument segmentation. The complete SAR-RARP50 dataset is available at: https://rdr.ucl.ac.uk/projects/SARRARP50_Segmentation_of_surgical_instrumentation_and_Action_Recognition_on_Robot-Assisted_Radical_Prostatectomy_Challenge/191091 △ Less

Submitted 23 January, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

arXiv:2312.14492 [pdf, other]

Context Enhanced Transformer for Single Image Object Detection

Authors: Seungjun An, Seonghoon Park, Gyeongnyeon Kim, Jeongyeol Baek, Byeongwon Lee, Seungryong Kim

Abstract: With the increasing importance of video data in real-world applications, there is a rising need for efficient object detection methods that utilize temporal information. While existing video object detection (VOD) techniques employ various strategies to address this challenge, they typically depend on locally adjacent frames or randomly sampled images within a clip. Although recent Transformer-bas… ▽ More With the increasing importance of video data in real-world applications, there is a rising need for efficient object detection methods that utilize temporal information. While existing video object detection (VOD) techniques employ various strategies to address this challenge, they typically depend on locally adjacent frames or randomly sampled images within a clip. Although recent Transformer-based VOD methods have shown promising results, their reliance on multiple inputs and additional network complexity to incorporate temporal information limits their practical applicability. In this paper, we propose a novel approach to single image object detection, called Context Enhanced TRansformer (CETR), by incorporating temporal context into DETR using a newly designed memory module. To efficiently store temporal information, we construct a class-wise memory that collects contextual information across data. Additionally, we present a classification-based sampling technique to selectively utilize the relevant memory for the current image. In the testing, We introduce a test-time memory adaptation method that updates individual memory functions by considering the test distribution. Experiments with CityCam and ImageNet VID datasets exhibit the efficiency of the framework on various video systems. The project page and code will be made available at: https://ku-cvlab.github.io/CETR. △ Less

Submitted 26 December, 2023; v1 submitted 22 December, 2023; originally announced December 2023.

Comments: Project page: https://ku-cvlab.github.io/CETR

arXiv:2312.13783 [pdf, other]

Few Shot Part Segmentation Reveals Compositional Logic for Industrial Anomaly Detection

Authors: Soopil Kim, Sion An, Philip Chikontwe, Myeongkyun Kang, Ehsan Adeli, Kilian M. Pohl, Sang Hyun Park

Abstract: Logical anomalies (LA) refer to data violating underlying logical constraints e.g., the quantity, arrangement, or composition of components within an image. Detecting accurately such anomalies requires models to reason about various component types through segmentation. However, curation of pixel-level annotations for semantic segmentation is both time-consuming and expensive. Although there are s… ▽ More Logical anomalies (LA) refer to data violating underlying logical constraints e.g., the quantity, arrangement, or composition of components within an image. Detecting accurately such anomalies requires models to reason about various component types through segmentation. However, curation of pixel-level annotations for semantic segmentation is both time-consuming and expensive. Although there are some prior few-shot or unsupervised co-part segmentation algorithms, they often fail on images with industrial object. These images have components with similar textures and shapes, and a precise differentiation proves challenging. In this study, we introduce a novel component segmentation model for LA detection that leverages a few labeled samples and unlabeled images sharing logical constraints. To ensure consistent segmentation across unlabeled images, we employ a histogram matching loss in conjunction with an entropy loss. As segmentation predictions play a crucial role, we propose to enhance both local and global sample validity detection by capturing key aspects from visual semantics via three memory banks: class histograms, component composition embeddings and patch-level representations. For effective LA detection, we propose an adaptive scaling strategy to standardize anomaly scores from different memory banks in inference. Extensive experiments on the public benchmark MVTec LOCO AD reveal our method achieves 98.1% AUROC in LA detection vs. 89.6% from competing methods. △ Less

Submitted 15 April, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: Accepted in AAAI2024

arXiv:2312.06871 [pdf, other]

Using Analytics on Student Created Data to Content Validate Pedagogical Tools

Authors: John Kos, Kenneth Eaton, Sareen Zhang, Rahul Dass, Stephen Buckley, Sungeun An, Ashok Goel

Abstract: Conceptual and simulation models can function as useful pedagogical tools, however it is important to categorize different outcomes when evaluating them in order to more meaningfully interpret results. VERA is a ecology-based conceptual modeling software that enables users to simulate interactions between biotics and abiotics in an ecosystem, allowing users to form and then verify hypothesis throu… ▽ More Conceptual and simulation models can function as useful pedagogical tools, however it is important to categorize different outcomes when evaluating them in order to more meaningfully interpret results. VERA is a ecology-based conceptual modeling software that enables users to simulate interactions between biotics and abiotics in an ecosystem, allowing users to form and then verify hypothesis through observing a time series of the species populations. In this paper, we classify this time series into common patterns found in the domain of ecological modeling through two methods, hierarchical clustering and curve fitting, illustrating a general methodology for showing content validity when combining different pedagogical tools. When applied to a diverse sample of 263 models containing 971 time series collected from three different VERA user categories: a Georgia Tech (GATECH), North Georgia Technical College (NGTC), and ``Self Directed Learners'', results showed agreement between both classification methods on 89.38\% of the sample curves in the test set. This serves as a good indication that our methodology for determining content validity was successful. △ Less

Submitted 11 December, 2023; originally announced December 2023.

Comments: 16 pages, preprint

arXiv:2312.06405 [pdf, other]

Optimizing Resonator Frequency Stability in Flip-Chip Architectures: A Novel Experimental Design Approach

Authors: Yuan Li, Tianhui Wang, **g**g Hu, Dengfeng Li, Shuoming An

Abstract: In multi-qubit superconducting systems utilizing flip-chip technology, achieving high accuracy in resonator frequencies is of paramount importance, particularly when multiple resonators share a common Purcell filter with restricted bandwidth. Nevertheless, variations in inter-chip spacing can considerably influence these frequencies. To tackle this issue, we present and experimentally validate the… ▽ More In multi-qubit superconducting systems utilizing flip-chip technology, achieving high accuracy in resonator frequencies is of paramount importance, particularly when multiple resonators share a common Purcell filter with restricted bandwidth. Nevertheless, variations in inter-chip spacing can considerably influence these frequencies. To tackle this issue, we present and experimentally validate the effectiveness of a resonator design. In our design, we etch portions of the metal on the bottom chip that faces the resonator structure on the top chip. This enhanced design substantially improves frequency stability by a factor of over 3.5 compared to the non-optimized design, as evaluated by the root mean square error of a linear fitting of the observed frequency distribution, which is intended to be linear. This advancement is crucial for successful scale-up and achievement of high-fidelity quantum operations. △ Less

Submitted 11 December, 2023; originally announced December 2023.

arXiv:2312.03307 [pdf, other]

Balanced Marginal and Joint Distributional Learning via Mixture Cramer-Wold Distance

Authors: Seunghwan An, Sungchul Hong, Jong-June Jeon

Abstract: In the process of training a generative model, it becomes essential to measure the discrepancy between two high-dimensional probability distributions: the generative distribution and the ground-truth distribution of the observed dataset. Recently, there has been growing interest in an approach that involves slicing high-dimensional distributions, with the Cramer-Wold distance emerging as a promisi… ▽ More In the process of training a generative model, it becomes essential to measure the discrepancy between two high-dimensional probability distributions: the generative distribution and the ground-truth distribution of the observed dataset. Recently, there has been growing interest in an approach that involves slicing high-dimensional distributions, with the Cramer-Wold distance emerging as a promising method. However, we have identified that the Cramer-Wold distance primarily focuses on joint distributional learning, whereas understanding marginal distributional patterns is crucial for effective synthetic data generation. In this paper, we introduce a novel measure of dissimilarity, the mixture Cramer-Wold distance. This measure enables us to capture both marginal and joint distributional information simultaneously, as it incorporates a mixture measure with point masses on standard basis vectors. Building upon the mixture Cramer-Wold distance, we propose a new generative model called CWDAE (Cramer-Wold Distributional AutoEncoder), which shows remarkable performance in generating synthetic data when applied to real tabular datasets. Furthermore, our model offers the flexibility to adjust the level of data privacy with ease. △ Less

Submitted 6 December, 2023; originally announced December 2023.

arXiv:2312.01729 [pdf, other]

EdgeConvFormer: Dynamic Graph CNN and Transformer based Anomaly Detection in Multivariate Time Series

Authors: Jie Liu, Qilin Li, Senjian An, Bradley Ezard, Ling Li

Abstract: Transformer-based models for anomaly detection in multivariate time series can benefit from the self-attention mechanism due to its advantage in modeling long-term dependencies. However, Transformer-based anomaly detection models have problems such as a large amount of data being required for training, standard positional encoding is not suitable for multivariate time series data, and the interdep… ▽ More Transformer-based models for anomaly detection in multivariate time series can benefit from the self-attention mechanism due to its advantage in modeling long-term dependencies. However, Transformer-based anomaly detection models have problems such as a large amount of data being required for training, standard positional encoding is not suitable for multivariate time series data, and the interdependence between time series is not considered. To address these limitations, we propose a novel anomaly detection method, named EdgeConvFormer, which integrates Time2vec embedding, stacked dynamic graph CNN, and Transformer to extract global and local spatial-time information. This design of EdgeConvFormer empowers it with decomposition capacities for complex time series, progressive spatiotemporal correlation discovery between time series, and representation aggregation of multi-scale features. Experiments demonstrate that EdgeConvFormer can learn the spatial-temporal correlations from multivariate time series data and achieve better anomaly detection performance than the state-of-the-art approaches on many real-world datasets of different scales. △ Less

Submitted 4 December, 2023; originally announced December 2023.

arXiv:2312.00050 [pdf, other]

Elijah: Eliminating Backdoors Injected in Diffusion Models via Distribution Shift

Authors: Shengwei An, Sheng-Yen Chou, Kaiyuan Zhang, Qiuling Xu, Guanhong Tao, Guangyu Shen, Siyuan Cheng, Shiqing Ma, Pin-Yu Chen, Tsung-Yi Ho, Xiangyu Zhang

Abstract: Diffusion models (DM) have become state-of-the-art generative models because of their capability to generate high-quality images from noises without adversarial training. However, they are vulnerable to backdoor attacks as reported by recent studies. When a data input (e.g., some Gaussian noise) is stamped with a trigger (e.g., a white patch), the backdoored model always generates the target image… ▽ More Diffusion models (DM) have become state-of-the-art generative models because of their capability to generate high-quality images from noises without adversarial training. However, they are vulnerable to backdoor attacks as reported by recent studies. When a data input (e.g., some Gaussian noise) is stamped with a trigger (e.g., a white patch), the backdoored model always generates the target image (e.g., an improper photo). However, effective defense strategies to mitigate backdoors from DMs are underexplored. To bridge this gap, we propose the first backdoor detection and removal framework for DMs. We evaluate our framework Elijah on hundreds of DMs of 3 types including DDPM, NCSN and LDM, with 13 samplers against 3 existing backdoor attacks. Extensive experiments show that our approach can have close to 100% detection accuracy and reduce the backdoor effects to close to zero without significantly sacrificing the model utility. △ Less

Submitted 4 February, 2024; v1 submitted 27 November, 2023; originally announced December 2023.

Comments: AAAI 2024

arXiv:2311.03665 [pdf, other]

Faster Algorithms for Cycle Hitting Problems on Disk Graphs

Authors: Shinwoo An, Kyung** Cho, Eun** Oh

Abstract: In this paper, we consider three hitting problems on a disk intersection graph: Triangle Hitting Set, Feedback Vertex Set, and Odd Cycle Transversal. Given a disk intersection graph $G$, our goal is to compute a set of vertices hitting all triangles, all cycles, or all odd cycles, respectively. Our algorithms run in time $2^{\tilde O(k^{4/5})}n^{O(1)}$, $2^{\tilde O(k^{9/10})}n^{O(1)}$, and… ▽ More In this paper, we consider three hitting problems on a disk intersection graph: Triangle Hitting Set, Feedback Vertex Set, and Odd Cycle Transversal. Given a disk intersection graph $G$, our goal is to compute a set of vertices hitting all triangles, all cycles, or all odd cycles, respectively. Our algorithms run in time $2^{\tilde O(k^{4/5})}n^{O(1)}$, $2^{\tilde O(k^{9/10})}n^{O(1)}$, and $2^{\tilde O(k^{19/20})}n^{O(1)}$, respectively, where $n$ denotes the number of vertices of $G$. These do not require a geometric representation of a disk graph. If a geometric representation of a disk graph is given as input, we can solve these problems more efficiently. In this way, we improve the algorithms for those three problem by Lokshtanov et al. [SODA 2022]. △ Less

Submitted 6 November, 2023; originally announced November 2023.

Comments: WADS 2023

arXiv:2310.20689 [pdf, other]

Learning From Mistakes Makes LLM Better Reasoner

Authors: Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou, Weizhu Chen

Abstract: Large language models (LLMs) recently exhibited remarkable reasoning capabilities on solving math problems. To further improve their reasoning capabilities, this work explores whether LLMs can LEarn from MistAkes (LEMA), akin to the human learning process. Consider a human student who failed to solve a math problem, he will learn from what mistake he has made and how to correct it. Mimicking this… ▽ More Large language models (LLMs) recently exhibited remarkable reasoning capabilities on solving math problems. To further improve their reasoning capabilities, this work explores whether LLMs can LEarn from MistAkes (LEMA), akin to the human learning process. Consider a human student who failed to solve a math problem, he will learn from what mistake he has made and how to correct it. Mimicking this error-driven learning process, LEMA incorporates mistake-correction data pairs during fine-tuning LLMs. Specifically, we first collect inaccurate reasoning paths from various LLMs, and then employ GPT-4 as a ''corrector'' to identify the mistake step, explain the reason for the mistake, correct the mistake and generate the final answer. In addition, we apply a correction-centric evolution strategy that effectively expands the question set for generating correction data. Experiments across various LLMs and reasoning tasks show that LEMA effectively improves CoT-alone fine-tuning. Our further ablations shed light on the non-homogeneous effectiveness between CoT data and correction data. These results suggest a significant potential for LLMs to improve through learning from their mistakes. Our code, models and prompts are publicly available at https://github.com/microsoft/LEMA. △ Less

Submitted 29 March, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

Comments: 23 pages, 13 figures, 6 tables

arXiv:2310.20187 [pdf, other]

Self-Supervised Pre-Training for Precipitation Post-Processor

Authors: Sojung An, Junha Lee, Jiyeon Jang, Inchae Na, Wooyeon Park, Sujeong You

Abstract: Obtaining a sufficient forecast lead time for local precipitation is essential in preventing hazardous weather events. Global warming-induced climate change increases the challenge of accurately predicting severe precipitation events, such as heavy rainfall. In this paper, we propose a deep learning-based precipitation post-processor for numerical weather prediction (NWP) models. The precipitation… ▽ More Obtaining a sufficient forecast lead time for local precipitation is essential in preventing hazardous weather events. Global warming-induced climate change increases the challenge of accurately predicting severe precipitation events, such as heavy rainfall. In this paper, we propose a deep learning-based precipitation post-processor for numerical weather prediction (NWP) models. The precipitation post-processor consists of (i) employing self-supervised pre-training, where the parameters of the encoder are pre-trained on the reconstruction of the masked variables of the atmospheric physics domain; and (ii) conducting transfer learning on precipitation segmentation tasks (the target domain) from the pre-trained encoder. In addition, we introduced a heuristic labeling approach to effectively train class-imbalanced datasets. Our experiments on precipitation correction for regional NWP show that the proposed method outperforms other approaches. △ Less

Submitted 19 February, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

Comments: 7 pages, 3 figures, 1 table, accepted to NeurIPS 2023 Workshop on Tackling Climate Change with Machine Learning at [this http URL](https://www.climatechange.ai/papers/neurips2023/18)

arXiv:2310.16374 [pdf, other]

Joint Distributional Learning via Cramer-Wold Distance

Authors: Seunghwan An, Jong-June Jeon

Abstract: The assumption of conditional independence among observed variables, primarily used in the Variational Autoencoder (VAE) decoder modeling, has limitations when dealing with high-dimensional datasets or complex correlation structures among observed variables. To address this issue, we introduced the Cramer-Wold distance regularization, which can be computed in a closed-form, to facilitate joint dis… ▽ More The assumption of conditional independence among observed variables, primarily used in the Variational Autoencoder (VAE) decoder modeling, has limitations when dealing with high-dimensional datasets or complex correlation structures among observed variables. To address this issue, we introduced the Cramer-Wold distance regularization, which can be computed in a closed-form, to facilitate joint distributional learning for high-dimensional datasets. Additionally, we introduced a two-step learning method to enable flexible prior modeling and improve the alignment between the aggregated posterior and the prior distribution. Furthermore, we provide theoretical distinctions from existing methods within this category. To evaluate the synthetic data generation performance of our proposed approach, we conducted experiments on high-dimensional datasets with multiple categorical variables. Given that many readily available datasets and data science applications involve such datasets, our experiments demonstrate the effectiveness of our proposed methodology. △ Less

Submitted 25 October, 2023; originally announced October 2023.

arXiv:2310.15179 [pdf, other]

Reducing Uncertainty in Sea-level Rise Prediction: A Spatial-variability-aware Approach

Authors: Subhankar Ghosh, Shuai An, Arun Sharma, Jayant Gupta, Shashi Shekhar, Aneesh Subramanian

Abstract: Given multi-model ensemble climate projections, the goal is to accurately and reliably predict future sea-level rise while lowering the uncertainty. This problem is important because sea-level rise affects millions of people in coastal communities and beyond due to climate change's impacts on polar ice sheets and the ocean. This problem is challenging due to spatial variability and unknowns such a… ▽ More Given multi-model ensemble climate projections, the goal is to accurately and reliably predict future sea-level rise while lowering the uncertainty. This problem is important because sea-level rise affects millions of people in coastal communities and beyond due to climate change's impacts on polar ice sheets and the ocean. This problem is challenging due to spatial variability and unknowns such as possible tip** points (e.g., collapse of Greenland or West Antarctic ice-shelf), climate feedback loops (e.g., clouds, permafrost thawing), future policy decisions, and human actions. Most existing climate modeling approaches use the same set of weights globally, during either regression or deep learning to combine different climate projections. Such approaches are inadequate when different regions require different weighting schemes for accurate and reliable sea-level rise predictions. This paper proposes a zonal regression model which addresses spatial variability and model inter-dependency. Experimental results show more reliable predictions using the weights learned via this approach on a regional scale. △ Less

Submitted 18 October, 2023; originally announced October 2023.

Comments: 6 pages, 5 figures, I-GUIDE 2023 conference

ACM Class: J.2; I.2.m; I.2.6; I.2.1; I.2

arXiv:2310.11650 [pdf, other]

VKIE: The Application of Key Information Extraction on Video Text

Authors: Siyu An, Ye Liu, Haoyuan Peng, Di Yin

Abstract: Extracting structured information from videos is critical for numerous downstream applications in the industry. In this paper, we define a significant task of extracting hierarchical key information from visual texts on videos. To fulfill this task, we decouple it into four subtasks and introduce two implementation solutions called PipVKIE and UniVKIE. PipVKIE sequentially completes the four subta… ▽ More Extracting structured information from videos is critical for numerous downstream applications in the industry. In this paper, we define a significant task of extracting hierarchical key information from visual texts on videos. To fulfill this task, we decouple it into four subtasks and introduce two implementation solutions called PipVKIE and UniVKIE. PipVKIE sequentially completes the four subtasks in continuous stages, while UniVKIE is improved by unifying all the subtasks into one backbone. Both PipVKIE and UniVKIE leverage multimodal information from vision, text, and coordinates for feature representation. Extensive experiments on one well-defined dataset demonstrate that our solutions can achieve remarkable performance and efficient inference speed. △ Less

Submitted 9 January, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

arXiv:2310.06011 [pdf, other]

On the root cause of the host `mass-step' in the Hubble residuals of type Ia supernovae

Authors: Chul Chung, Suk-** Yoon, Seunghyun Park, Seunghyeon An, Junhyuk Son, Hyejeon Cho, Young-Wook Lee

Abstract: It is well established that the Hubble residuals of type Ia supernovae (SNe Ia) show the luminosity step with respect to their host galaxy stellar masses. This `mass-step' is taken as an additional correction factor for the SN Ia luminosity standardization. Here we investigate the root cause of the mass-step and propose that the bimodal nature of the host $age$ distribution is responsible for the… ▽ More It is well established that the Hubble residuals of type Ia supernovae (SNe Ia) show the luminosity step with respect to their host galaxy stellar masses. This `mass-step' is taken as an additional correction factor for the SN Ia luminosity standardization. Here we investigate the root cause of the mass-step and propose that the bimodal nature of the host $age$ distribution is responsible for the step. In particular, by using the empirical $nonlinear$ mass-to-age relation of local galaxies, we convert the mass function of SN Ia hosts to their age distribution. We find that the age distribution shows clear bimodality: a younger ($<$ 6 Gyr) group with lower mass ($\sim 10^{9.5}{\rm M}_{\rm sun}$) and an older ($>$ 6 Gyr) group with higher mass ($\sim 10^{10.5}{\rm M}_{\rm sun}$). On the Hubble residual versus host mass plane, the two groups create the mass-step at $\sim 10^{10}{\rm M}_{\rm sun}$. This leads us to conclude that the host galaxy mass-step can be attributed to the bimodal age distribution in relation to a nonlinear relation between galaxy mass and age. We suggest that the mass-step is another manifestation of the old `red sequence' and the young `blue cloud' observed in the galactic color--magnitude diagram. △ Less

Submitted 9 October, 2023; originally announced October 2023.

Comments: Accepted for publication in ApJ, 10 pages, 5 figures, 1 table

arXiv:2310.02690 [pdf, other]

Multi-Dimension-Embedding-Aware Modality Fusion Transformer for Psychiatric Disorder Clasification

Authors: Guoxin Wang, Xuyang Cao, Shan An, Fengmei Fan, Chao Zhang, **song Wang, Feng Yu, Zhiren Wang

Abstract: Deep learning approaches, together with neuroimaging techniques, play an important role in psychiatric disorders classification. Previous studies on psychiatric disorders diagnosis mainly focus on using functional connectivity matrices of resting-state functional magnetic resonance imaging (rs-fMRI) as input, which still needs to fully utilize the rich temporal information of the time series of rs… ▽ More Deep learning approaches, together with neuroimaging techniques, play an important role in psychiatric disorders classification. Previous studies on psychiatric disorders diagnosis mainly focus on using functional connectivity matrices of resting-state functional magnetic resonance imaging (rs-fMRI) as input, which still needs to fully utilize the rich temporal information of the time series of rs-fMRI data. In this work, we proposed a multi-dimension-embedding-aware modality fusion transformer (MFFormer) for schizophrenia and bipolar disorder classification using rs-fMRI and T1 weighted structural MRI (T1w sMRI). Concretely, to fully utilize the temporal information of rs-fMRI and spatial information of sMRI, we constructed a deep learning architecture that takes as input 2D time series of rs-fMRI and 3D volumes T1w. Furthermore, to promote intra-modality attention and information fusion across different modalities, a fusion transformer module (FTM) is designed through extensive self-attention of hybrid feature maps of multi-modality. In addition, a dimension-up and dimension-down strategy is suggested to properly align feature maps of multi-dimensional from different modalities. Experimental results on our private and public OpenfMRI datasets show that our proposed MFFormer performs better than that using a single modality or multi-modality MRI on schizophrenia and bipolar disorder diagnosis. △ Less

Submitted 4 October, 2023; originally announced October 2023.

arXiv:2309.13450 [pdf]

Conducting A/B Experiments with a Scalable Architecture

Authors: Andrew Hornback, Sungeun An, Scott Bunin, Stephen Buckley, John Kos, Ashok Goel

Abstract: A/B experiments are commonly used in research to compare the effects of changing one or more variables in two different experimental groups - a control group and a treatment group. While the benefits of using A/B experiments are widely known and accepted, there is less agreement on a principled approach to creating software infrastructure systems to assist in rapidly conducting such experiments. W… ▽ More A/B experiments are commonly used in research to compare the effects of changing one or more variables in two different experimental groups - a control group and a treatment group. While the benefits of using A/B experiments are widely known and accepted, there is less agreement on a principled approach to creating software infrastructure systems to assist in rapidly conducting such experiments. We propose a four-principle approach for develo** a software architecture to support A/B experiments that is domain agnostic and can help alleviate some of the resource constraints currently needed to successfully implement these experiments: the software architecture (i) must retain the typical properties of A/B experiments, (ii) capture problem solving activities and outcomes, (iii) allow researchers to understand the behavior and outcomes of participants in the experiment, and (iv) must enable automated analysis. We successfully developed a software system to encapsulate these principles and implement it in a real-world A/B experiment. △ Less

Submitted 23 September, 2023; originally announced September 2023.

arXiv:2309.05590 [pdf, other]

Temporal Action Localization with Enhanced Instant Discriminability

Authors: Dingfeng Shi, Qiong Cao, Yujie Zhong, Shan An, Jian Cheng, Haogang Zhu, Dacheng Tao

Abstract: Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video. The unclear boundaries of actions in videos often result in imprecise predictions of action boundaries by existing methods. To resolve this issue, we propose a one-stage framework named TriDet. First, we propose a Trident-head to model the action boundary via an estimated… ▽ More Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video. The unclear boundaries of actions in videos often result in imprecise predictions of action boundaries by existing methods. To resolve this issue, we propose a one-stage framework named TriDet. First, we propose a Trident-head to model the action boundary via an estimated relative probability distribution around the boundary. Then, we analyze the rank-loss problem (i.e. instant discriminability deterioration) in transformer-based methods and propose an efficient scalable-granularity perception (SGP) layer to mitigate this issue. To further push the limit of instant discriminability in the video backbone, we leverage the strong representation capability of pretrained large models and investigate their performance on TAD. Last, considering the adequate spatial-temporal context for classification, we design a decoupled feature pyramid network with separate feature pyramids to incorporate rich spatial context from the large model for localization. Experimental results demonstrate the robustness of TriDet and its state-of-the-art performance on multiple TAD datasets, including hierarchical (multilabel) TAD datasets. △ Less

Submitted 11 September, 2023; originally announced September 2023.

Comments: An extended version of the CVPR paper arXiv:2303.07347, submitted to IJCV

arXiv:2308.15779 [pdf, other]

Exploring GaN crystallographic orientation disparity and its origin on bare and partly graphene-covered $m$-plane sapphire substrates

Authors: Hyunkyu Lee, Hyeonoh Jo, Jae Hun Kim, Jongwoo Ha, Su Young An, Jaewu Choi, Chinkyo Kim

Abstract: The crystallographic orientation of 3D materials grown over 2D material-covered substrates is one of the critical factors in discerning the true growth mechanism among competing possibilities, including remote epitaxy, van der Waals epitaxy, and pinhole-seeded lateral epitaxy also known as thru-hole epitaxy. However, definitive identification demands meticulous investigation to accurately interpre… ▽ More The crystallographic orientation of 3D materials grown over 2D material-covered substrates is one of the critical factors in discerning the true growth mechanism among competing possibilities, including remote epitaxy, van der Waals epitaxy, and pinhole-seeded lateral epitaxy also known as thru-hole epitaxy. However, definitive identification demands meticulous investigation to accurately interpret experimentally observed crystallographic orientations, as misinterpretation can lead to mistaken conclusions regarding the underlying growth mechanism. In this study, we demonstrate that GaN domains exhibit orientation disparities when grown on both bare and partly graphene-covered $m$-plane sapphire substrates. Comprehensive measurements of crystallographic orientation unambiguously reveal that GaN domains adopt (100) and (103) orientations even when grown under identical growth conditions on bare and partly graphene-covered $m$-plane sapphire substrates, respectively. Particularly, high-resolution transmission electron microscopy unequivocally establishes that GaN grown over partly graphene-covered $m$-plane sapphire substrates started to nucleate on the exposed sapphire surface. Our research elucidates that crystallographic orientation disparities can arise even from thru-hole epitaxy, challenging the commonly accepted notion that such disparities cannot be attributed to thru-hole epitaxy when grown under identical growth conditions. △ Less

Submitted 30 August, 2023; originally announced August 2023.

Comments: 15 pages, 5 figures

Showing 1–50 of 213 results for author: An, S