-
Local to Global: Learning Dynamics and Effect of Initialization for Transformers
Authors:
Ashok Vardhan Makkuva,
Marco Bondaschi,
Chanakya Ekbote,
Adway Girish,
Alliot Nagle,
Hyeji Kim,
Michael Gastpar
Abstract:
In recent years, transformer-based models have revolutionized deep learning, particularly in sequence modeling. To better understand this phenomenon, there is a growing interest in using Markov input processes to study transformers. However, our current understanding in this regard remains limited, with many fundamental questions about how transformers learn Markov chains still unanswered. In this paper, we address this by focusing on first-order Markov chains and single-layer transformers, providing a comprehensive characterization of the learning dynamics in this context. Specifically, we prove that transformer parameters trained on next-token prediction loss can converge to either global or local minima, contingent on the initialization and the Markovian data properties, and we characterize the precise conditions under which this occurs. To the best of our knowledge, this is the first result of its kind highlighting the role of initialization. We further demonstrate that our theoretical findings are corroborated by empirical evidence. Based on these insights, we provide guidelines for the initialization of transformer parameters and demonstrate their effectiveness. Finally, we outline several open problems in this arena. Code is available at: https://github.com/Bond1995/Markov.
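For intuition about the two loss levels at stake, here is a small Python sketch (not the authors' code, which lives in the linked repository). It assumes a binary first-order chain with switching probabilities p = P(1 | 0) and q = P(0 | 1); this notation is ours. The lowest achievable expected next-token cross-entropy is the chain's entropy rate, whereas a predictor that ignores the previous token and always outputs the stationary distribution pays the larger marginal entropy. The sketch does not model a transformer or reproduce the paper's characterization of which minimum training reaches.

```python
import math
import random


def binary_entropy(x):
    """Binary entropy h(x) in nats, with h(0) = h(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log(x) - (1 - x) * math.log(1 - x)


def stationary(p, q):
    """Stationary distribution (pi0, pi1) of the chain with P(1|0) = p, P(0|1) = q."""
    return q / (p + q), p / (p + q)


def markov_loss(p, q):
    """Expected cross-entropy of the true conditional predictor (the entropy rate)."""
    pi0, pi1 = stationary(p, q)
    return pi0 * binary_entropy(p) + pi1 * binary_entropy(q)


def unigram_loss(p, q):
    """Expected cross-entropy of a predictor that ignores context and outputs pi."""
    _, pi1 = stationary(p, q)
    return binary_entropy(pi1)


def empirical_loss(p, q, n=200_000, seed=0):
    """Monte Carlo estimate of the true predictor's loss on one sampled sequence."""
    rng = random.Random(seed)
    _, pi1 = stationary(p, q)
    x = 1 if rng.random() < pi1 else 0  # start from the stationary distribution
    total = 0.0
    for _ in range(n):
        p_one = p if x == 0 else 1 - q  # P(next token = 1 | current token)
        nxt = 1 if rng.random() < p_one else 0
        total += -math.log(p_one if nxt == 1 else 1 - p_one)
        x = nxt
    return total / n


if __name__ == "__main__":
    p, q = 0.2, 0.3
    print(markov_loss(p, q), unigram_loss(p, q), empirical_loss(p, q))
```

The two losses coincide exactly when p + q = 1, i.e., when the next token does not depend on the current one, so the gap between them is a rough measure of how much a model stands to gain from using context.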
Submitted 27 June, 2024; v1 submitted 5 June, 2024;
originally announced June 2024.
-
Language Model Can Do Knowledge Tracing: Simple but Effective Method to Integrate Language Model and Knowledge Tracing Task
Authors:
Unggi Lee,
Jiyeong Bae,
Dohee Kim,
Sookbun Lee,
Jaekwon Park,
Taekyung Ahn,
Gunho Lee,
Damji Stratton,
Hyeoncheol Kim
Abstract:
Knowledge Tracing (KT) is a critical task in online learning for modeling student knowledge over time. Despite the success of deep learning-based KT models, which rely on sequences of numbers as data, most existing approaches fail to leverage the rich semantic information in the text of questions and concepts. This paper proposes Language model-based Knowledge Tracing (LKT), a novel framework that integrates pre-trained language models (PLMs) with KT methods. By leveraging the power of language models to capture semantic representations, LKT effectively incorporates textual information and significantly outperforms previous KT models on large benchmark datasets. Moreover, we demonstrate that LKT can effectively address the cold-start problem in KT by leveraging the semantic knowledge captured by PLMs. LKT is also more interpretable than traditional KT models because it uses text-rich data. We applied the local interpretable model-agnostic explanations (LIME) technique and analyzed attention scores to further interpret the model's performance. Our work highlights the potential of integrating PLMs with KT and paves the way for future research in the KT domain.
Submitted 9 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
Slow and Steady Wins the Race: Maintaining Plasticity with Hare and Tortoise Networks
Authors:
Hojoon Lee,
Hyeonseo Cho,
Hyunseung Kim,
Donghu Kim,
Dugki Min,
Jaegul Choo,
Clare Lyle
Abstract:
This study investigates the loss of generalization ability in neural networks, revisiting warm-starting experiments from Ash & Adams. Our empirical analysis reveals that common methods designed to enhance plasticity by maintaining trainability provide limited benefits to generalization. While reinitializing the network can be effective, it also risks losing valuable prior knowledge. To this end, we introduce Hare & Tortoise, inspired by the brain's complementary learning system. Hare & Tortoise consists of two components: the Hare network, which rapidly adapts to new information analogously to the hippocampus, and the Tortoise network, which gradually integrates knowledge akin to the neocortex. By periodically reinitializing the Hare network to the Tortoise's weights, our method preserves plasticity while retaining general knowledge. Hare & Tortoise can effectively maintain the network's ability to generalize, which improves advanced reinforcement learning algorithms on the Atari-100k benchmark. The code is available at https://github.com/dojeon-ai/hare-tortoise.
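A minimal sketch of the reset schedule described above, written with NumPy on a toy quadratic objective. Two details are our assumptions rather than taken from the abstract: the Tortoise integrates the Hare's weights through an exponential moving average (EMA), and the hyperparameters (`lr`, `ema_decay`, `reset_interval`) are illustrative placeholders. See the linked repository for the actual implementation.

```python
import numpy as np


class HareTortoise:
    """Toy sketch: fast 'Hare' weights, slow 'Tortoise' weights, periodic reset."""

    def __init__(self, dim, lr=0.1, ema_decay=0.99, reset_interval=100, seed=0):
        rng = np.random.default_rng(seed)
        self.hare = rng.normal(size=dim)   # fast learner
        self.tortoise = self.hare.copy()   # slow learner, same starting point
        self.lr = lr
        self.ema_decay = ema_decay
        self.reset_interval = reset_interval
        self.steps = 0

    def step(self, grad_fn):
        # Hare: ordinary gradient step, adapts quickly to the current objective.
        self.hare = self.hare - self.lr * grad_fn(self.hare)
        # Tortoise: slowly integrates the Hare's weights (EMA is an assumption).
        self.tortoise = self.ema_decay * self.tortoise + (1.0 - self.ema_decay) * self.hare
        self.steps += 1
        # Periodically reinitialize the Hare to the Tortoise's weights.
        if self.steps % self.reset_interval == 0:
            self.hare = self.tortoise.copy()


if __name__ == "__main__":
    model = HareTortoise(dim=3, reset_interval=50)
    # Non-stationary toy task: the target of the loss 0.5 * ||w - target||^2 shifts halfway.
    for target in (np.array([1.0, -2.0, 0.5]), np.array([-1.0, 0.0, 2.0])):
        for _ in range(1000):
            model.step(lambda w: w - target)
        print(np.round(model.tortoise, 3))
```

In this sketch the Hare restarts from the Tortoise rather than from a fresh random initialization, so a reset keeps what the slow weights have accumulated; this mirrors the trade-off the abstract describes between reinitialization and retaining prior knowledge.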
Submitted 1 June, 2024;
originally announced June 2024.
-
Applying Fine-Tuned LLMs for Reducing Data Needs in Load Profile Analysis
Authors:
Yi Hu,
Hyeon** Kim,
Kai Ye,
Ning Lu
Abstract:
This paper presents a novel method for utilizing fine-tuned Large Language Models (LLMs) to minimize data requirements in load profile analysis, demonstrated through the restoration of missing data in power system load profiles. A two-stage fine-tuning strategy is proposed to adapt a pre-trained LLM, namely GPT-3.5, for missing data restoration tasks. Through empirical evaluation, we demonstrate the effectiveness of the fine-tuned model in accurately restoring missing data, achieving performance comparable to state-of-the-art purpose-built models such as BERT-PIN. Key findings include the importance of prompt engineering and the optimal utilization of fine-tuning samples, highlighting the efficiency of few-shot learning in transferring knowledge from general user cases to specific target users. Furthermore, the proposed approach demonstrates notable cost-effectiveness and time efficiency compared to training models from scratch, making it a practical solution for scenarios with limited data availability and computing resources. This research has significant potential for application to other power system load profile analysis tasks. Consequently, it advances the use of LLMs in power system analytics, offering promising implications for enhancing the resilience and efficiency of power distribution systems.
Submitted 2 June, 2024;
originally announced June 2024.
-
Identifying Sample Size and Accuracy and Precision of the Estimators in Case-Crossover Designs with Distributed Lags of Heteroskedastic Time-Varying Continuous Exposures Measured with Simple or Complex Error
Authors:
Honghyok Kim
Abstract:
An understanding of sample size, statistical power, and the accuracy and precision of the estimator in epidemiological research can facilitate power and bias analyses. However, such understanding can become complicated for several reasons. First, exposures that vary spatiotemporally may be heteroskedastic. Second, distributed lags of exposures may be used to identify critical exposure time-windows. Third, exposure measurement error may exist, impacting the accuracy and/or precision of the estimator and consequently affecting sample size and statistical power. Fourth, research may rely on different study designs, so understanding may differ by design. For example, case-crossover designs, a form of matched case-control design, are used to estimate health effects of short-term exposures. To address these gaps, I developed approximation equations for sample size and for estimates of the estimators and their standard errors, including polynomials for non-linear effect estimation. Using air pollution exposure estimates, I examined the approximations with statistical simulations. Overall, sample size and the accuracy and precision of the estimators can be approximated from external information about validation, without validation data in hand. For distributed lags, the approximations may perform well if residual confounding due to covariate measurement errors is not severe. This condition may be difficult to identify without validation data, so validation research is recommended when identifying critical exposure time-windows.
Submitted 18 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
Measure-Observe-Remeasure: An Interactive Paradigm for Differentially-Private Exploratory Analysis
Authors:
Priyanka Nanayakkara,
Hyeok Kim,
Yifan Wu,
Ali Sarvghad,
Narges Mahyar,
Gerome Miklau,
Jessica Hullman
Abstract:
Differential privacy (DP) has the potential to enable privacy-preserving analysis on sensitive data, but requires analysts to judiciously spend a limited ``privacy loss budget'' $ε$ across queries. Analysts conducting exploratory analyses do not, however, know all queries in advance and seldom have DP expertise. Thus, they are limited in their ability to specify $ε$ allotments across queries prior to an analysis. To support analysts in spending $ε$ efficiently, we propose a new interactive analysis paradigm, Measure-Observe-Remeasure, where analysts ``measure'' the database with a limited amount of $ε$, observe estimates and their errors, and remeasure with more $ε$ as needed.
We instantiate the paradigm in an interactive visualization interface that allows analysts to spend increasing amounts of $ε$ under a total budget. To observe how analysts interact with the Measure-Observe-Remeasure paradigm via the interface, we conduct a user study that compares the utility of the $ε$ allocations participants make, and of the findings they draw from sensitive data, with the allocations and findings expected of a rational agent facing the same decision task. We find that participants are able to use the workflow relatively successfully, including using budget allocation strategies that capture over half of the available utility stemming from $ε$ allocation. Their loss in performance relative to a rational agent appears to be driven more by their inability to access information and report it than by their allocation of $ε$.
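To make the measure, observe, remeasure loop concrete, here is a small sketch for a single counting query. Its mechanics are our assumptions, not a description of the authors' interface: each measurement uses the Laplace mechanism with noise scale sensitivity/ε, budget spending follows basic sequential composition (the ε values simply add up), and repeated noisy answers are merged by inverse-variance weighting so that the reported error shrinks as more ε is spent. The class name `MeasureObserveRemeasure` and its methods are hypothetical.

```python
import numpy as np


class MeasureObserveRemeasure:
    """Toy budget tracker for one counting query answered under epsilon-DP."""

    def __init__(self, true_count, total_epsilon, sensitivity=1.0, seed=0):
        self._true_count = true_count   # hidden from the analyst in a real system
        self.remaining = total_epsilon
        self.sensitivity = sensitivity
        self._rng = np.random.default_rng(seed)
        self._measurements = []         # list of (noisy answer, noise variance)

    def measure(self, epsilon):
        """Spend `epsilon` on one Laplace-noised answer, then return the updated estimate."""
        if epsilon <= 0 or epsilon > self.remaining + 1e-12:
            raise ValueError("epsilon must be positive and within the remaining budget")
        self.remaining -= epsilon       # sequential composition: epsilons add up
        scale = self.sensitivity / epsilon
        noisy = self._true_count + self._rng.laplace(0.0, scale)
        self._measurements.append((noisy, 2.0 * scale ** 2))  # Var[Laplace(b)] = 2 b^2
        return self.observe()

    def observe(self):
        """Inverse-variance-weighted estimate and its standard error."""
        if not self._measurements:
            raise ValueError("no measurements yet")
        weights = [1.0 / var for _, var in self._measurements]
        estimate = sum(w * x for w, (x, _) in zip(weights, self._measurements)) / sum(weights)
        return estimate, (1.0 / sum(weights)) ** 0.5


if __name__ == "__main__":
    query = MeasureObserveRemeasure(true_count=500, total_epsilon=1.0)
    print(query.measure(0.1))   # cheap first look: large standard error
    print(query.measure(0.4))   # remeasure with more epsilon: smaller error
    print(query.remaining)
```

An analyst could, for instance, spend 0.1 of a total budget of 1.0, look at the reported standard error, and only then decide whether a further 0.4 is worth spending on this query or better saved for another one.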
Submitted 4 June, 2024;
originally announced June 2024.
-
CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models
Authors:
Junho Kim,
Hyunjun Kim,
Yeonju Kim,
Yong Man Ro
Abstract:
Large Multi-modal Models (LMMs) have recently demonstrated remarkable abilities in visual context understanding and coherent response generation. However, alongside these advancements, the issue of hallucinations has emerged as a significant challenge, producing erroneous responses that are unrelated to the visual contents. In this paper, we introduce a novel contrastive-based decoding method, COuntering DEscription Contrastive Decoding (CODE), which leverages self-generated descriptions as contrasting references during the decoding phase of LMMs to address hallucination issues. CODE utilizes comprehensive descriptions generated by the model itself as a visual counterpart to correct and improve response alignment with actual visual content. By dynamically adjusting the information flow and distribution of next-token predictions in the LMM's vocabulary, CODE enhances the coherence and informativeness of generated responses. Extensive experiments demonstrate that our method significantly reduces hallucinations and improves cross-modal consistency across various benchmarks and cutting-edge LMMs. Our method provides a simple yet effective decoding strategy that can be integrated into existing LMM frameworks without additional training.
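The general mechanics of contrastive decoding can be shown in a few lines of NumPy. This is a common contrastive-decoding rule, not CODE's exact formulation (the abstract says CODE also adjusts the information flow dynamically, which is not modelled here): the next-token log-probabilities from the image-conditioned pass are amplified relative to those from a reference pass, which in CODE's setting would be conditioned on the model's self-generated description, and tokens that are implausible under the image-conditioned pass are masked out. The function name and the weights `alpha` and `plausibility` are hypothetical and illustrative.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D array of logits."""
    shifted = logits - logits.max()
    return shifted - np.log(np.exp(shifted).sum())


def contrastive_next_token(logits_visual, logits_reference, alpha=1.0, plausibility=0.1):
    """Pick the next token by contrasting a visual pass against a reference pass."""
    lp_vis = log_softmax(np.asarray(logits_visual, dtype=float))
    lp_ref = log_softmax(np.asarray(logits_reference, dtype=float))
    # Favor tokens the visual pass supports more strongly than the reference pass.
    scores = (1 + alpha) * lp_vis - alpha * lp_ref
    # Plausibility constraint: keep only tokens with p_vis >= plausibility * max p_vis.
    cutoff = np.log(plausibility) + lp_vis.max()
    scores[lp_vis < cutoff] = -np.inf
    return int(np.argmax(scores))


if __name__ == "__main__":
    vis = [2.0, 1.9, -5.0]   # toy logits from the image-conditioned pass
    ref = [2.0, 0.0, 5.0]    # toy logits from the reference pass
    print(int(np.argmax(vis)), contrastive_next_token(vis, ref))  # prints: 0 1
```

With these toy logits, plain greedy decoding of the visual pass picks token 0, while the contrastive rule picks token 1, because token 0 is comparatively more likely under the reference pass; token 2 is excluded by the plausibility mask even though the reference pass strongly prefers it.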
Submitted 3 June, 2024;
originally announced June 2024.
-
General relativistic self-gravitating equilibrium disks around rotating neutron stars
Authors:
Yoonsoo Kim,
**ho Kim,
Hee Il Kim,
Hyung Mok Lee
Abstract:
In modeling a relativistic disk around a compact object, the self-gravity of the disk is often neglected, although it needs to be incorporated for more accurate descriptions in several circumstances. Extending the Komatsu-Eriguchi-Hachisu self-consistent field method, we present numerical models of a rapidly rotating neutron star with a self-gravitating disk in stationary equilibrium. In particular, our approach allows us to obtain numerical solutions involving a massive disk with the rest mass $O(10^{-1})-O(10^0) M_\odot$ closely attached to a rotating neutron star. We also assess the impact of self-gravity on the internal structure of the disk and the neutron star. These axisymmetric, stationary solutions can be employed for simulations involving the neutron star-disk system in the context of high-energy transients and gravitational wave emissions.
Submitted 2 June, 2024;
originally announced June 2024.
-
Expanding the Attack Scenarios of SAE J1939: A Comprehensive Analysis of Established and Novel Vulnerabilities in Transport Protocol
Authors:
Hwejae Lee,
Hyosun Lee,
Saehee Jun,
Huy Kang Kim
Abstract:
Following the enactment of the UN Regulation, substantial efforts have been directed toward implementing intrusion detection and prevention systems (IDPSs) and vulnerability analysis in Controller Area Network (CAN). However, the Society of Automotive Engineers (SAE) J1939 protocol, despite its extensive application in camping cars and commercial vehicles, has seen limited vulnerability identification, which raises significant safety concerns in the event of security breaches. In this research, we explore and demonstrate attack techniques specific to the SAE J1939 communication protocol. We introduce 14 attack scenarios: seven identified in previous research and seven novel scenarios uncovered through our study. To verify the feasibility of these scenarios, we leverage a sophisticated testbed that facilitates real-time communication and the simulation of attacks. Our testing confirms the successful execution of 11 scenarios, underscoring their imminent threat to commercial vehicle operations. Some attacks will be difficult to detect because they inject only a single message. These results highlight unique vulnerabilities within the SAE J1939 protocol, indicating that the automotive cybersecurity community needs to address the identified risks.
Submitted 2 June, 2024;
originally announced June 2024.
-
A Blueprint Architecture of Compound AI Systems for Enterprise
Authors:
Eser Kandogan,
Sajjadur Rahman,
Nikita Bhutani,
Dan Zhang,
Rafael Li Chen,
Kushan Mitra,
Sairam Gurajada,
Pouya Pezeshkpour,
Hayate Iso,
Yanlin Feng,
Hannah Kim,
Chen Shen,
** Wang,
Estevam Hruschka
Abstract:
Large Language Models (LLMs) have showcased remarkable capabilities surpassing conventional NLP challenges, creating opportunities for production use cases. Towards this goal, there is a notable shift to building compound AI systems, wherein LLMs are integrated into an expansive software infrastructure with many components like models, retrievers, databases and tools. In this paper, we introduce a blueprint architecture for compound AI systems to operate in enterprise settings cost-effectively and feasibly. Our proposed architecture aims for seamless integration with existing compute and data infrastructure, with ``stream'' serving as the key orchestration concept to coordinate data and instructions among agents and other components. Task and data planners, respectively, break down, map, and optimize tasks and data to available agents and data sources defined in respective registries, given production constraints such as accuracy and latency.
Submitted 1 June, 2024;
originally announced June 2024.
-
Do's and Don'ts: Learning Desirable Skills with Instruction Videos
Authors:
Hyunseung Kim,
Byungkun Lee,
Hojoon Lee,
Dongyoon Hwang,
Donghu Kim,
Jaegul Choo
Abstract:
Unsupervised skill discovery is a learning paradigm that aims to acquire diverse behaviors without explicit rewards. However, it faces challenges in learning complex behaviors and often leads to learning unsafe or undesirable behaviors. For instance, in various continuous control tasks, current unsupervised skill discovery methods succeed in learning basic locomotion like standing but struggle with learning more complex movements such as walking and running. Moreover, they may acquire unsafe behaviors like tripping and rolling or navigate to undesirable locations such as pitfalls or hazardous areas. In response, we present DoDont (Do's and Don'ts), an instruction-based skill discovery algorithm composed of two stages. First, in an instruction learning stage, DoDont leverages action-free instruction videos to train an instruction network to distinguish desirable transitions from undesirable ones. Then, in the skill learning stage, the instruction network adjusts the reward function of the skill discovery algorithm to weight the desired behaviors. Specifically, we integrate the instruction network into a distance-maximizing skill discovery algorithm, where the instruction network serves as the distance function. Empirically, with fewer than 8 instruction videos, DoDont effectively learns desirable behaviors and avoids undesirable ones across complex continuous control tasks. Code and videos are available at https://mynsng.github.io/dodont/
Submitted 1 June, 2024;
originally announced June 2024.
-
Mirror Symmetry and Level-rank Duality for 3d $\mathcal{N} = 4$ Rank 0 SCFTs
Authors:
Thomas Creutzig,
Niklas Garner,
Heeyeon Kim
Abstract:
We introduce a family of 3d $\mathcal{N} = 4$ superconformal field theories that have zero-dimensional Coulomb and Higgs branches and propose that the rational vertex operator algebras $W^{\text{min}}_{k - \scriptstyle{\frac{1}{2}}}(\mathfrak{sp}_{2N})$ and $L_{k}(\mathfrak{osp}_{1|2N})$ model the modular tensor categories of line operators in their topological $A$ and $B$ twists, respectively. Our analysis indicates that the action of 3d mirror symmetry on this family of theories is related to a novel level-rank duality and leads to several conjectural $q$-series identities of independent interest.
Submitted 31 May, 2024;
originally announced June 2024.
-
KU-DMIS at EHRSQL 2024: Generating SQL query via question templatization in EHR
Authors:
Hajung Kim,
Chanhwi Kim,
Hoonick Lee,
Kyochul Jang,
Jiwoo Lee,
Kyungjae Lee,
Gangwoo Kim,
Jaewoo Kang
Abstract:
Transforming natural language questions into SQL queries is crucial for precise data retrieval from electronic health record (EHR) databases. A significant challenge in this process is detecting and rejecting unanswerable questions that request information beyond the database's scope or exceed the system's capabilities. In this paper, we introduce a novel text-to-SQL framework that robustly handles out-of-domain questions and verifies the generated queries with query execution. Our framework begins by standardizing the structure of questions into a templated format. We use a powerful large language model (LLM), fine-tuned GPT-3.5 with detailed prompts involving the table schemas of the EHR database system. Our experimental results demonstrate the effectiveness of our framework on the EHRSQL-2024 benchmark, a shared task in the ClinicalNLP workshop. Although a straightforward fine-tuning of GPT shows promising results on the development set, it struggles with the out-of-domain questions in the test set. With our framework, we improve our system's adaptability and achieve competitive performance on the official leaderboard of the EHRSQL-2024 challenge.
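One ingredient the abstract names, verifying generated queries by executing them, can be sketched with Python's built-in `sqlite3` module. The function `verify_or_abstain`, its abstention policy (reject anything that is not a single SELECT statement, raises a database error, or returns no rows) and the toy `patients` table are our assumptions for illustration; the authors' actual checks and the EHRSQL schema may differ.

```python
import sqlite3


def verify_or_abstain(conn, candidate_sql):
    """Return the query result if the generated SQL passes an execution check,
    otherwise None, meaning the system abstains (treats the question as unanswerable)."""
    if not candidate_sql or not candidate_sql.lstrip().lower().startswith("select"):
        return None  # only read-only SELECT queries are considered
    try:
        rows = conn.execute(candidate_sql).fetchall()
    except (sqlite3.Error, sqlite3.Warning):
        return None  # syntax errors, unknown tables/columns, multiple statements
    return rows or None  # an empty result is treated as "cannot answer" (assumption)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (subject_id INTEGER, gender TEXT)")  # toy schema
    conn.executemany("INSERT INTO patients VALUES (?, ?)", [(1, "F"), (2, "M")])
    print(verify_or_abstain(conn, "SELECT COUNT(*) FROM patients WHERE gender = 'F'"))
    print(verify_or_abstain(conn, "SELECT * FROM admissions"))  # no such table
```

Treating an empty result as a reason to abstain is a deliberately conservative choice; a real system would likely need to distinguish a legitimately empty answer from a query that targets the wrong tables or values.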
Submitted 19 June, 2024; v1 submitted 21 May, 2024;
originally announced June 2024.
-
Open Ko-LLM Leaderboard: Evaluating Large Language Models in Korean with Ko-H5 Benchmark
Authors:
Chanjun Park,
Hyeonwoo Kim,
Dahyun Kim,
Seonghwan Cho,
Sanghoon Kim,
Sukyung Lee,
Yungi Kim,
Hwalsuk Lee
Abstract:
This paper introduces the Open Ko-LLM Leaderboard and the Ko-H5 Benchmark as vital tools for evaluating Large Language Models (LLMs) in Korean. Incorporating private test sets while mirroring the English Open LLM Leaderboard, we establish a robust evaluation framework that has been well integrated into the Korean LLM community. We perform data leakage analysis that shows the benefit of private test sets along with a correlation study within the Ko-H5 benchmark and temporal analyses of the Ko-H5 score. Moreover, we present empirical support for the need to expand beyond set benchmarks. We hope the Open Ko-LLM Leaderboard sets a precedent for expanding LLM evaluation to foster more linguistic diversity.
Submitted 30 May, 2024;
originally announced May 2024.
-
Knockout: A simple way to handle missing inputs
Authors:
Minh Nguyen,
Batuhan K. Karaman,
Heejong Kim,
Alan Q. Wang,
Fengbei Liu,
Mert R. Sabuncu
Abstract:
Deep learning models can extract predictive and actionable information from complex inputs. The richer the inputs, the better these models usually perform. However, models that leverage rich inputs (e.g., multi-modality) can be difficult to deploy widely, because some inputs may be missing at inference. Current popular solutions to this problem include marginalization, imputation, and training multiple models. Marginalization can obtain calibrated predictions, but it is computationally costly and therefore only feasible for low-dimensional inputs. Imputation may result in inaccurate predictions because it employs point estimates for missing variables and does not work well for high-dimensional inputs (e.g., images). Training multiple models whereby each model takes different subsets of inputs can work well but requires knowing missing input patterns in advance. Furthermore, training and retaining multiple models can be costly. We propose an efficient way to learn both the conditional distribution using full inputs and the marginal distributions. Our method, Knockout, randomly replaces input features with appropriate placeholder values during training. We provide a theoretical justification of Knockout and show that it can be viewed as an implicit marginalization strategy. We evaluate Knockout in a wide range of simulations and real-world datasets and show that it can offer strong empirical performance.
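The core training-time operation described above can be sketched with NumPy. This is a simplified reading, not the authors' code: each feature of each training example is knocked out independently with probability `p` and overwritten by a per-feature placeholder; the function name `knockout`, the placeholder values (e.g., a constant outside the feature's observed range) and `p` are assumptions chosen for illustration. At inference, a genuinely missing feature would be filled with the same placeholder, so the model sees it in the form it was trained on.

```python
import numpy as np


def knockout(batch, placeholders, p=0.25, rng=None):
    """Randomly replace features with placeholder values (training-time augmentation).

    batch:        array of shape (n_samples, n_features)
    placeholders: array of shape (n_features,), one placeholder per feature
    Returns the augmented batch and the boolean mask of knocked-out entries.
    """
    rng = np.random.default_rng() if rng is None else rng
    batch = np.asarray(batch, dtype=float)
    mask = rng.random(batch.shape) < p                # independent knockout per entry
    augmented = np.where(mask, placeholders, batch)   # placeholders broadcast across rows
    return augmented, mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.arange(12, dtype=float).reshape(4, 3)
    placeholders = np.full(3, -1.0)   # hypothetical out-of-range placeholder
    x_aug, mask = knockout(x, placeholders, p=0.5, rng=rng)
    print(x_aug)
```

For multi-modal inputs, the same idea could be applied to whole groups of features, such as an entire modality, rather than to individual columns, so that the knockout patterns seen in training resemble the missingness patterns expected at deployment.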
Submitted 3 June, 2024; v1 submitted 30 May, 2024;
originally announced May 2024.
-
Evaluating the Effectiveness and Robustness of Visual Similarity-based Phishing Detection Models
Authors:
Fujiao Ji,
Kiho Lee,
Hyungjoon Koo,
Wenhao You,
Eui** Choo,
Hyoungshick Kim,
Doowon Kim
Abstract:
Phishing attacks pose a significant threat to Internet users, with cybercriminals elaborately replicating the visual appearance of legitimate websites to deceive victims. Visual similarity-based detection systems have emerged as an effective countermeasure, but their effectiveness and robustness in real-world scenarios have remained unexplored. In this paper, we comprehensively scrutinize and evaluate state-of-the-art visual similarity-based anti-phishing models using a large-scale dataset of 450K real-world phishing websites. Our analysis reveals that while certain models maintain high accuracy, others exhibit notably lower performance than results on curated datasets, highlighting the importance of real-world evaluation. In addition, we observe the real-world tactic of manipulating visual components that phishing attackers employ to circumvent the detection systems. To assess the robustness of existing models against adversarial attacks, we apply visible and perturbation-based manipulations to website logos, which adversaries typically target. We then evaluate the models' robustness in handling these adversarial samples. Our findings reveal vulnerabilities in several models, emphasizing the need for more robust visual similarity techniques capable of withstanding sophisticated evasion attempts. We provide actionable insights for enhancing the security of phishing defense systems, encouraging proactive actions. To the best of our knowledge, this work represents the first large-scale, systematic evaluation of visual similarity-based models for phishing detection in real-world settings, necessitating the development of more effective and robust defenses.
Submitted 29 May, 2024;
originally announced May 2024.
-
Robust Optimization in Protein Fitness Landscapes Using Reinforcement Learning in Latent Space
Authors:
Minji Lee,
Luiz Felipe Vecchietti,
Hyunkyu Jung,
Hyun Joo Ro,
Meeyoung Cha,
Ho Min Kim
Abstract:
Proteins are complex molecules responsible for different functions in nature. Enhancing the functionality of proteins and cellular fitness can significantly impact various industries. However, protein optimization using computational methods remains challenging, especially when starting from low-fitness sequences. We propose LatProtRL, an optimization method to efficiently traverse a latent space learned by an encoder-decoder leveraging a large protein language model. To escape local optima, our optimization is modeled as a Markov decision process using reinforcement learning acting directly in latent space. We evaluate our approach on two important fitness optimization tasks, demonstrating its ability to achieve comparable or superior fitness over baseline methods. Our findings and in vitro evaluation show that the generated sequences can reach high-fitness regions, suggesting a substantial potential of LatProtRL in lab-in-the-loop scenarios.
Submitted 29 May, 2024;
originally announced May 2024.
-
I See You: Teacher Analytics with GPT-4 Vision-Powered Observational Assessment
Authors:
Unggi Lee,
Yeil Jeong,
Junbo Koh,
Gyuri Byun,
Yunseo Lee,
Hyunwoong Lee,
Seunmin Eun,
Jewoong Moon,
Cheolil Lim,
Hyeoncheol Kim
Abstract:
This preliminary study explores the integration of GPT-4 Vision (GPT-4V) technology into teacher analytics, focusing on its applicability in observational assessment to enhance reflective teaching practice. This research is grounded in developing a Video-based Automatic Assessment System (VidAAS) empowered by GPT-4V. Our approach aims to revolutionize teachers' assessment of students' practices by leveraging Generative Artificial Intelligence (GenAI) to offer detailed insights into classroom dynamics. Our research methodology encompasses a comprehensive literature review, prototype development of the VidAAS, and usability testing with in-service teachers. The study findings provide future research avenues for VidAAS design, implementation, and integration in teacher analytics, underscoring the potential of GPT-4V to provide real-time, scalable feedback and a deeper understanding of the classroom.
Submitted 30 May, 2024; v1 submitted 28 May, 2024;
originally announced May 2024.
-
Classifying 2D topological phases: mapping ground states to string-nets
Authors:
Isaac H. Kim,
Daniel Ranard
Abstract:
We prove the conjectured classification of topological phases in two spatial dimensions with gappable boundary, in a simplified setting. Two gapped ground states of lattice Hamiltonians are in the same quantum phase of matter, or topological phase, if they can be connected by a constant-depth quantum circuit. It is conjectured that the Levin-Wen string-net models exhaust all possible gapped phases with gappable boundary, and these phases are labeled by unitary modular tensor categories. We prove this under the assumption that every phase has a representative state with zero correlation length satisfying the entanglement bootstrap axioms, or a strict form of area law. Our main technical development is to transform these states into string-net states using constant-depth quantum circuits.
Submitted 27 May, 2024;
originally announced May 2024.
-
Are Self-Attentions Effective for Time Series Forecasting?
Authors:
Dongbin Kim,
**seong Park,
Jaewook Lee,
Hoki Kim
Abstract:
Time series forecasting is crucial for applications across multiple domains and various scenarios. Although Transformer models have dramatically shifted the landscape of forecasting, their effectiveness remains debated. Recent findings have indicated that simpler linear models might outperform complex Transformer-based approaches, highlighting the potential for more streamlined architectures. In this paper, we shift focus from the overall architecture of the Transformer to the effectiveness of self-attentions for time series forecasting. To this end, we introduce a new architecture, Cross-Attention-only Time Series transformer (CATS), that rethinks the traditional Transformer framework by eliminating self-attention and leveraging cross-attention mechanisms instead. By using future horizon-dependent parameters as queries and enhanced parameter sharing, our model not only improves long-term forecasting accuracy but also reduces the number of parameters and memory usage. Extensive experiments across various datasets demonstrate that our model achieves superior performance with the lowest mean squared error and uses fewer parameters compared to existing models.
Submitted 27 May, 2024;
originally announced May 2024.
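A minimal sketch of the cross-attention-only idea, assuming a single head, no learned projections, and toy numbers: each forecast horizon gets its own query vector (in CATS these are learnable, horizon-dependent parameters), while keys and values come from embedded input patches. The input patches never attend to each other, so there is no self-attention. The function and variable names are hypothetical and not taken from the CATS code.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    queries: one vector per forecast horizon (hypothetical learnable parameters)
    keys, values: one vector per embedded input patch
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output is a convex combination of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Hypothetical toy setup: 3 input patches embedded in 2 dimensions,
# and 2 forecast horizons, each with its own query vector.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
horizon_queries = [[2.0, 0.0], [0.0, 2.0]]
forecast_features = cross_attention(horizon_queries, patches, patches)
print(forecast_features)
```

Because the number of queries equals the number of horizons, the attention cost grows with (horizons × patches) rather than (patches × patches), which is one plausible reading of why dropping self-attention can save memory.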
-
Exploring the Enigma of Neural Dynamics Through A Scattering-Transform Mixer Landscape for Riemannian Manifold
Authors:
Tingting Dan,
Ziquan Wei,
Won Hwa Kim,
Guorong Wu
Abstract:
The human brain is a complex inter-wired system from which spontaneous functional fluctuations emerge. In spite of tremendous success in the field of experimental neuroscience, a system-level understanding of how brain anatomy supports various neural activities remains elusive. Capitalizing on the unprecedented amount of neuroimaging data, we present a physics-informed deep model to uncover the coupling mechanism between brain structure and function through the lens of data geometry that is rooted in the widespread wiring topology of connections between distant brain regions. Since deciphering the puzzle of self-organized patterns in functional fluctuations is the gateway to understanding the emergence of cognition and behavior, we devise a geometric deep model to uncover manifold mapping functions that characterize the intrinsic feature representations of evolving functional fluctuations on the Riemannian manifold. In lieu of learning unconstrained mapping functions, we introduce a set of graph-harmonic scattering transforms to impose the brain-wide geometry on top of manifold mapping functions, which allows us to cast manifold-based deep learning into an architecture reminiscent of MLP-Mixer (from computer vision) for the Riemannian manifold. As a proof-of-concept approach, we explore a neural-manifold perspective to understand the relationship between (static) brain structure and (dynamic) function, challenging the prevailing notion in cognitive neuroscience by proposing that neural activities are essentially excited by brain-wide oscillation waves living on the geometry of human connectomes, instead of being confined to focal areas.
Submitted 25 May, 2024;
originally announced May 2024.
-
Modes of Analyzing Disinformation Narratives With AI/ML/Text Mining to Assist in Mitigating the Weaponization of Social Media
Authors:
Andy Skumanich,
Han Kyul Kim
Abstract:
This paper highlights the developing need for quantitative modes for capturing and monitoring malicious communication in social media. There has been a deliberate "weaponization" of messaging through the use of social networks, including by politically oriented entities, both state-sponsored and privately run. The article identifies a use of AI/ML characterization of generalized "mal-info," a broad term which includes deliberate malicious narratives similar to hate speech, which adversely impact society. A key point of the discussion is that this mal-info will dramatically increase in volume, and it will become essential for sharable quantifying tools to provide support for human expert intervention. Despite attempts to introduce moderation on major platforms like Facebook and X/Twitter, there are now established alternative social networks that offer completely unmoderated spaces. The paper presents an introduction to these platforms and the initial results of a qualitative and semi-quantitative analysis of characteristic mal-info posts. The authors perform a rudimentary text mining function for a preliminary characterization in order to evaluate the modes for better-automated monitoring. The study examines several inflammatory terms using text analysis and, importantly, discusses the use of generative algorithms by one political agent in particular, providing some examples of the potential risks to society. The latter is of grave concern, and monitoring tools must be established. This paper presents a preliminary step toward selecting relevant sources and setting a foundation for characterizing the mal-info, which must be monitored. The AI/ML methods provide a means for semi-quantitative signature capture. The impending use of "mal-GenAI" is presented.
Submitted 24 May, 2024;
originally announced May 2024.
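The abstract mentions a rudimentary text-mining pass over several inflammatory terms. A minimal sketch of such a pass, assuming a hypothetical watch-list and a crude lowercase tokenizer; the terms and posts below are placeholders, not data or methods from the paper.

```python
import re
from collections import Counter

# Hypothetical watch-list: the paper's actual terms are not listed in the abstract.
WATCH_TERMS = {"traitor", "invasion", "enemy"}

def tokenize(text):
    """Lowercase and split into alphabetic tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def term_counts(posts, terms=WATCH_TERMS):
    """Count occurrences of watch-list terms across all posts."""
    counts = Counter()
    for post in posts:
        counts.update(tok for tok in tokenize(post) if tok in terms)
    return counts

def flag_rate(posts, terms=WATCH_TERMS):
    """Fraction of posts containing at least one watch-list term."""
    if not posts:
        return 0.0
    flagged = sum(1 for post in posts if any(tok in terms for tok in tokenize(post)))
    return flagged / len(posts)

# Placeholder posts for illustration only.
posts = [
    "The enemy is at the gates, this is an invasion!",
    "Lovely weather for a picnic today.",
    "Every traitor will be named. Enemy of the people.",
]
print(term_counts(posts))  # enemy: 2, invasion: 1, traitor: 1
print(flag_rate(posts))
```

Simple counts like these only surface candidate posts; the abstract itself stresses that such tools are meant to support, not replace, human expert review.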
-
Interfacially enhanced superconductivity in Fe(Te,Se)/Bi4Te3 heterostructures
Authors:
An-Hsi Chen,
Qiangsheng Lu,
Eitan Hershkovitz,
Miguel L. Crespillo,
Alessandro R. Mazza,
Tyler Smith,
T. Zac Ward,
Gyula Eres,
Shornam Gandhi,
Meer Muhtasim Mahfuz,
Vitalii Starchenko,
Khalid Hattar,
Joon Sue Lee,
Honggyu Kim,
Robert G. Moore,
Matthew Brahlek
Abstract:
Realizing topological superconductivity by integrating high-transition-temperature ($T_C$) superconductors with topological insulators can open new paths for quantum computing applications. Here, we report a new approach for increasing the superconducting transition temperature ($T_{C}^{onset}$) by interfacing the unconventional superconductor Fe(Te,Se) with the topological insulator Bi-Te system in the low-Se doping regime, near where superconductivity vanishes in the bulk. The critical finding is that the $T_{C}^{onset}$ of Fe(Te,Se) increases from nominally non-superconducting to as high as 12.5 K when $Bi_2Te_3$ is replaced with the topological phase $Bi_4Te_3$. Interfacing Fe(Te,Se) with $Bi_4Te_3$ is also found to be critical for stabilizing superconductivity in monolayer films, where $T_{C}^{onset}$ can be as high as 6 K. Measurements of the electronic and crystalline structure of the $Bi_4Te_3$ layer reveal that a large electron transfer, epitaxial strain, and novel chemical reduction processes are critical factors for the enhancement of superconductivity. This novel route for enhancing $T_C$ in an important epitaxial system provides new insight into the nature of interfacial superconductivity and a platform to identify and utilize new electronic phases.
Submitted 24 May, 2024;
originally announced May 2024.
-
Constraints for electron-capture decays mimicking production of axion-like particles in nuclei
Authors:
Aagrah Agnihotri,
Jouni Suhonen,
Hong Joo Kim
Abstract:
We give, for the first time, theoretical estimates of ground-state-to-ground-state (GS-to-GS) electron-capture (EC) branch decay rates of $^{44}$Ti, $^{57}$Co, and $^{139}$Ce. The nuclear-structure calculations have been done exploiting the nuclear shell model (NSM) with well-established Hamiltonians and an advanced theory of $β$ decay. In the absence of experimental measurements of these GS-to-GS branches, these estimates are of utmost importance for terrestrial searches of axion-like particles (ALPs). Predictions are made for EC-decay rates of 2$^{nd}$-forbidden unique (FU) and 2$^{nd}$-forbidden non-unique (FNU) EC transitions that can potentially mimic nuclear axion production in experiments designed to detect ALPs in nuclear environments.
Submitted 24 May, 2024;
originally announced May 2024.
-
Time evolution of two-point function in expanding universes
Authors:
Chanyong Park,
Hanse Kim,
Kyungchan Cho
Abstract:
By applying the braneworld model, we holographically investigate the time evolution of two-point functions in expanding universes. To describe FLRW cosmologies, we consider a $p$-brane gas geometry. When we take the boundary of a $p$-brane gas geometry as the braneworld we are living in, $p$-branes appear as $(p-1)$-dimensional objects to an observer living in the braneworld. These objects lead to an FLRW cosmology, which is related to the radial motion of the braneworld in the $p$-brane gas geometry. Utilizing this $p$-brane gas geometry, we investigate the time-dependent two-point functions of FLRW cosmologies. For $p=4$, we reproduce the known two-point functions of dS space, and for $p=2$, which corresponds to an expanding universe with uniformly distributed cosmic strings, we find a new analytic two-point function. Moreover, we show that the two-point function is suppressed in the late-time era with a specific power that is associated with the equation-of-state parameter of the uniformly distributed matter.
Submitted 17 June, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Quantized geodesic lengths for Teichmüller spaces: algebraic aspects
Authors:
Hyun Kyu Kim
Abstract:
In the 1980s, H. Verlinde suggested constructing a quantization of Teichmüller spaces and using it to construct spaces of conformal blocks for the Liouville conformal field theory. This suggestion led to a mathematical formulation by Fock in the 1990s, called the modular functor conjecture, based on the Chekhov-Fock quantum Teichmüller theory. In the 2000s, Teschner combined the Chekhov-Fock version and the Kashaev version of quantum Teichmüller theory to construct a solution to a modified form of the conjecture. We embark on a direct approach to the conjecture based on the Chekhov-Fock(-Goncharov) theory. We construct quantized trace-of-monodromy along simple loops via Bonahon and Wong's quantum trace maps, developed in the 2010s, and investigate their algebraic structures, which will eventually lead to the construction and properties of quantized geodesic length operators. We show that a special recursion relation used by Teschner is satisfied by the quantized trace-of-monodromy, and that the quantized trace-of-monodromy for disjoint loops commute in a certain strong sense.
Submitted 23 May, 2024;
originally announced May 2024.
-
The integration of heterogeneous resources in the CMS Submission Infrastructure for the LHC Run 3 and beyond
Authors:
Antonio Perez-Calero Yzquierdo,
Marco Mascheroni,
Edita Kizinevic,
Farrukh Aftab Khan,
Hyunwoo Kim,
Maria Acosta Flechas,
Nikos Tsipinakis,
Saqib Haleem
Abstract:
While the computing landscape supporting LHC experiments is currently dominated by x86 processors at WLCG sites, this configuration will evolve in the coming years. LHC collaborations will increasingly employ HPC and Cloud facilities to process the vast amounts of data expected during LHC Run 3 and the future HL-LHC phase. These facilities often feature diverse compute resources, including alternative CPU architectures like ARM and IBM Power, as well as a variety of GPU specifications. Using these heterogeneous resources efficiently is thus essential for the LHC collaborations to reach their future scientific goals. The Submission Infrastructure (SI) is a central element in CMS Computing, enabling resource acquisition and exploitation by CMS data processing, simulation and analysis tasks. The SI must therefore be adapted to ensure access to, and optimal utilization of, this heterogeneous compute capacity. Some steps in this evolution have already been taken, as CMS is currently opportunistically using a small pool of GPU slots provided mainly at the CMS WLCG sites. Additionally, Power9 processors have been validated for CMS production at the Marconi-100 cluster at CINECA. This note will describe the updated capabilities of the SI to continue ensuring the efficient allocation and use of computing resources by CMS, despite their increasing diversity. The next steps towards full integration and support of heterogeneous resources according to CMS needs will also be reported.
Submitted 23 May, 2024;
originally announced May 2024.
-
Adoption of a token-based authentication model for the CMS Submission Infrastructure
Authors:
Antonio Perez-Calero Yzquierdo,
Marco Mascheroni,
Edita Kizinevic,
Farrukh Aftab Khan,
Hyunwoo Kim,
Maria Acosta Flechas,
Nikos Tsipinakis,
Saqib Haleem,
Frank Wurthwein
Abstract:
The CMS Submission Infrastructure (SI) is the main computing resource provisioning system for CMS workloads. A number of HTCondor pools are employed to manage this infrastructure, which aggregates geographically distributed resources from the WLCG and other providers. Historically, the model of authentication among the diverse components of this infrastructure has relied on the Grid Security Infrastructure (GSI), based on identities and X509 certificates. In contrast, commonly used modern authentication standards are based on capabilities and tokens. The WLCG has identified this trend and aims at a transparent replacement of GSI for all its workload management, data transfer and storage access operations, to be completed during the current LHC Run 3. As part of this effort, and within the context of CMS computing, the Submission Infrastructure group is in the process of phasing out the GSI part of its authentication layers, in favor of IDTokens and Scitokens. The use of tokens is already well integrated into the HTCondor Software Suite, which has allowed us to fully migrate the authentication between internal components of SI. Additionally, recent versions of the HTCondor-CE support tokens as well, enabling CMS resource requests to Grid sites employing this CE technology to be granted by means of token exchange. After a rollout campaign to sites, successfully completed by the third quarter of 2022, all HTCondor CEs in use by CMS are already receiving Scitoken-based pilot jobs. On the ARC CE side, a parallel campaign was launched to foster the adoption of the REST interface at CMS sites (required to enable token-based job submission via HTCondor-G), which is nearing completion as well. In this contribution, the newly adopted authentication model will be described. We will then report on the migration status and final steps towards complete GSI phase out in the CMS SI.
Submitted 23 May, 2024;
originally announced May 2024.
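Capability tokens such as SciTokens are JSON Web Tokens (JWTs): three base64url-encoded segments (header, payload, signature), where the payload carries claims such as `iss` (issuer), `aud` (audience), `exp` (expiry) and `scope` (granted capabilities). The sketch below uses only the Python standard library to build an unsigned toy token and check whether it grants a given capability. The issuer, audience, and scope strings are illustrative assumptions in the style of WLCG compute scopes, not CMS configuration, and a real consumer such as an HTCondor-CE must verify the signature against the issuer's public keys, which this sketch deliberately skips.

```python
import base64
import json
import time

def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)

def make_toy_token(claims: dict) -> str:
    """Build an UNSIGNED JWT-shaped string for illustration only."""
    header = {"alg": "none", "typ": "JWT"}
    return ".".join([
        b64url_encode(json.dumps(header).encode()),
        b64url_encode(json.dumps(claims).encode()),
        "",  # empty signature: never accept such a token in practice
    ])

def read_claims(token: str) -> dict:
    """Decode the payload WITHOUT verifying the signature."""
    _header, payload, _signature = token.split(".")
    return json.loads(b64url_decode(payload))

def grants(claims: dict, capability: str, audience: str, now: float) -> bool:
    """True if the token is unexpired, addressed to `audience`, and carries `capability`."""
    scopes = claims.get("scope", "").split()
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    return claims.get("exp", 0) > now and audience in audiences and capability in scopes

now = time.time()
token = make_toy_token({
    "iss": "https://issuer.example.org",      # hypothetical issuer
    "aud": "https://ce.example.org:9619",     # hypothetical CE endpoint
    "scope": "compute.create compute.read",   # assumed WLCG-style compute scopes
    "exp": now + 3600,
})
claims = read_claims(token)
print(grants(claims, "compute.create", "https://ce.example.org:9619", now))
```

The contrast with GSI is visible in `grants`: the decision depends on what the token allows (its scopes and audience), not on who the bearer is.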
-
Repurposing of the Run 2 CMS High Level Trigger Infrastructure as a Cloud Resource for Offline Computing
Authors:
Marco Mascheroni,
Antonio Perez-Calero Yzquierdo,
Edita Kizinevic,
Farrukh Aftab Khan,
Hyunwoo Kim,
Maria Acosta Flechas,
Nikos Tsipinakis,
Saqib Haleem,
Damiele Spiga,
Christoph Wissing,
Frank Wurthwein
Abstract:
The former CMS Run 2 High Level Trigger (HLT) farm is one of the largest contributors to CMS compute resources, providing about 25k job slots for offline computing. This CPU farm was initially employed as an opportunistic resource, exploited during inter-fill periods, in the LHC Run 2. Since then, it has become a nearly transparent extension of the CMS capacity at CERN, being located on-site at the LHC interaction point 5 (P5), where the CMS detector is installed. This resource has been configured to support the execution of critical CMS tasks, such as prompt detector data reconstruction. It can therefore be used in combination with the dedicated Tier 0 capacity at CERN, in order to process and absorb peaks in the stream of data coming from the CMS detector. The initial configuration for this resource, based on statically configured VMs, provided the required level of functionality. However, regular operations of this cluster revealed certain limitations compared to the resource provisioning and use model employed in the case of WLCG sites. A new configuration, based on a vacuum-like model, has been implemented for this resource in order to solve the detected shortcomings. This paper reports on this redeployment work on the permanent cloud to enhance support for CMS offline computing, comparing the former and new models' respective functionalities, along with the commissioning effort for the new setup.
Submitted 23 May, 2024;
originally announced May 2024.
-
HPC resources for CMS offline computing: An integration and scalability challenge for the Submission Infrastructure
Authors:
Antonio Perez-Calero Yzquierdo,
Marco Mascheroni,
Edita Kizinevic,
Farrukh Aftab Khan,
Hyunwoo Kim,
Maria Acosta Flechas,
Nikos Tsipinakis,
Saqib Haleem
Abstract:
The computing resource needs of LHC experiments are expected to continue growing significantly during Run 3 and into the HL-LHC era. The landscape of available resources will also evolve, as High Performance Computing (HPC) and Cloud resources will provide a comparable, or even dominant, fraction of the total compute capacity. The future years present a challenge for the experiments' resource provisioning models, both in terms of scalability and increasing complexity. The CMS Submission Infrastructure (SI) provisions computing resources for CMS workflows. This infrastructure is built on a set of federated HTCondor pools, currently aggregating 400k CPU cores distributed worldwide and supporting the simultaneous execution of over 200k computing tasks. Incorporating HPC resources into CMS computing represents firstly an integration challenge, as HPC centers are much more diverse compared to Grid sites. Secondly, evolving the present SI, dimensioned to harness the current CMS computing capacity, to reach the resource scales required for the HL-LHC phase, while maintaining global flexibility and efficiency, will represent an additional challenge for the SI. To preventively address future potential scalability limits, the SI team regularly runs tests to explore the maximum reach of our infrastructure. In this note, the integration of HPC resources into CMS offline computing is summarized, the potential concerns for the SI derived from the increased scale of operations are described, and the most recent results of scalability tests on the CMS SI are reported.
Submitted 23 May, 2024;
originally announced May 2024.
-
Mitigating Quantization Errors Due to Activation Spikes in GLU-Based LLMs
Authors:
Jaewoo Yang,
Hayun Kim,
Younghoon Kim
Abstract:
Modern large language models (LLMs) have established state-of-the-art performance through architectural improvements, but still require significant computational cost for inference. In an effort to reduce the inference cost, post-training quantization (PTQ) has become a popular approach, quantizing weights and activations to lower precision, such as INT8. In this paper, we reveal the challenges of activation quantization in GLU variants, which are widely used in the feed-forward networks (FFNs) of modern LLMs, such as the LLaMA family. The problem is that severe local quantization errors, caused by excessive activation magnitudes in GLU variants, significantly degrade the performance of the quantized LLM. We denote these activations as activation spikes. Our further observations reveal a systematic pattern of activation spikes: 1) The activation spikes occur in the FFN of specific layers, particularly in the early and late layers, 2) The activation spikes are dedicated to a couple of tokens, rather than being shared across a sequence. Based on our observations, we propose two empirical methods, Quantization-free Module (QFeM) and Quantization-free Prefix (QFeP), to isolate the activation spikes during quantization. Our extensive experiments validate the effectiveness of the proposed methods for activation quantization, especially with coarse-grained schemes, in the latest LLMs with GLU variants, including LLaMA-2/3, Mistral, Mixtral, SOLAR, and Gemma. In particular, our methods enhance current alleviation techniques (e.g., SmoothQuant) that fail to control the activation spikes. Code is available at https://github.com/onnoo/activation-spikes.
Submitted 23 May, 2024;
originally announced May 2024.
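Why a single activation spike hurts coarse-grained (per-tensor) INT8 quantization can be seen with symmetric absmax quantization: the scale is set by the largest magnitude, so one outlier coarsens the grid for every other value in the tensor. The sketch below uses made-up activation values and compares the reconstruction error of the ordinary values when the scale is computed with and without the spike. Excluding the spike from the quantized tensor is loosely in the spirit of isolating spikes, but this is not the paper's QFeM or QFeP implementation.

```python
def quantize_dequantize(values, scale):
    """Symmetric INT8: round to the nearest step, clamp to [-127, 127], map back to float."""
    out = []
    for v in values:
        q = max(-127, min(127, round(v / scale)))
        out.append(q * scale)
    return out

def absmax_scale(values):
    """Per-tensor scale chosen so the largest magnitude maps to 127."""
    return max(abs(v) for v in values) / 127

def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Hypothetical activations: ordinary values plus one spike.
ordinary = [0.12, -0.53, 0.91, -1.07, 0.33, 0.05, -0.76, 1.24]
spike = 850.0

# Scale including the spike: step size ~6.7, so every ordinary value rounds to 0.
scale_with_spike = absmax_scale(ordinary + [spike])
err_with_spike = mean_abs_error(ordinary, quantize_dequantize(ordinary, scale_with_spike))

# Scale with the spike excluded (e.g. the spike kept in higher precision instead).
scale_without_spike = absmax_scale(ordinary)
err_without_spike = mean_abs_error(ordinary, quantize_dequantize(ordinary, scale_without_spike))

print(f"error with spike in scale: {err_with_spike:.4f}")
print(f"error with spike excluded: {err_without_spike:.4f}")
```

With the spike included, all eight ordinary values collapse to zero, which matches the abstract's description of severe local quantization errors; finer-grained (e.g. per-token) scales would confine the damage to the spiking token.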
-
High-precision spectroscopy of $^{20}$O benchmarking ab-initio calculations in light nuclei
Authors:
I. Zanon,
E. Clément,
A. Goasduff,
J. Menéndez,
T. Miyagi,
M. Assié,
M. Ciemała,
F. Flavigny,
A. Lemasson,
A. Matta,
D. Ramos,
M. Rejmund,
L. Achouri,
D. Ackermann,
D. Barrientos,
D. Beaumel,
G. Benzoni,
A. J. Boston,
H. C. Boston,
S. Bottoni,
A. Bracco,
D. Brugnara,
G. de France,
N. de Sereville,
F. Delaunay
, et al. (56 additional authors not shown)
Abstract:
The excited states of unstable $^{20}$O were investigated via $γ$-ray spectroscopy following the $^{19}$O$(d,p)^{20}$O reaction at 8 $A$MeV. By exploiting the Doppler Shift Attenuation Method, the lifetimes of the 2$^+_2$ and 3$^+_1$ states were firmly established. From the $γ$-ray branching ratios and E2/M1 mixing ratios for transitions deexciting the 2$^+_2$ and 3$^+_1$ states, the B(E2) and B(M1) values were determined. Various chiral effective field theory Hamiltonians, describing the nuclear properties beyond ground states, along with a standard USDB interaction, were compared with the experimentally obtained data. Such a comparison for a large set of $γ$-ray transition probabilities with valence-space in-medium similarity renormalization group ab-initio calculations was performed for the first time in a nucleus far from stability. It was shown that ab-initio approaches using chiral EFT forces are challenged by detailed high-precision spectroscopic properties of nuclei. The reduced transition probabilities were found to be a very constraining test of the performance of the ab-initio models.
Submitted 23 May, 2024;
originally announced May 2024.
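How a lifetime, a branching ratio and an E2/M1 mixing ratio $δ$ combine into B(E2) and B(M1) values can be sketched with the standard long-wavelength rate relations $\lambda(E2) \approx 1.223\times10^{9}\,E_γ^{5}\,B(E2)$ s$^{-1}$ ($E_γ$ in MeV, B(E2) in $e^2$fm$^4$) and $\lambda(M1) \approx 1.779\times10^{13}\,E_γ^{3}\,B(M1)$ s$^{-1}$ (B(M1) in $μ_N^2$). The partial rate of a transition is its branching ratio divided by the level lifetime, and the E2 share of that rate is $δ^2/(1+δ^2)$. Internal conversion is neglected, and the input numbers are placeholders, not the measured $^{20}$O values.

```python
def reduced_transition_probabilities(tau_s, branching, delta, e_gamma_mev):
    """Split one transition's partial rate into E2 and M1 parts and convert to B values.

    tau_s:        level lifetime in seconds
    branching:    fraction of the level's decays that go through this transition
    delta:        E2/M1 multipole mixing ratio
    e_gamma_mev:  transition energy in MeV
    Returns (B(E2) in e^2 fm^4, B(M1) in mu_N^2); internal conversion is neglected.
    """
    partial_rate = branching / tau_s          # decay rate through this transition (1/s)
    e2_fraction = delta**2 / (1 + delta**2)   # share of the intensity carried by E2
    m1_fraction = 1 / (1 + delta**2)          # share carried by M1
    b_e2 = e2_fraction * partial_rate / (1.223e9 * e_gamma_mev**5)
    b_m1 = m1_fraction * partial_rate / (1.779e13 * e_gamma_mev**3)
    return b_e2, b_m1

# Placeholder inputs (not the 20O measurements): 200 fs lifetime,
# 60% branch, mixing ratio 0.3, 2.0 MeV transition.
b_e2, b_m1 = reduced_transition_probabilities(200e-15, 0.60, 0.3, 2.0)
print(f"B(E2) = {b_e2:.2f} e^2 fm^4, B(M1) = {b_m1:.4f} mu_N^2")
```

The strong $E_γ^5$ and $E_γ^3$ dependences are why precise lifetimes alone are not enough: the branching and mixing ratios must be measured too before the B values become the stringent test of the ab-initio calculations described above.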
-
Room-temperature waveguide-integrated photodetector using bolometric effect for mid-infrared spectroscopy applications
Authors:
Joonsup Shim,
**ha Lim,
Inki Kim,
Jaeyong Jeong,
Bong Ho Kim,
Seong Kwang Kim,
Dae-Myeong Geum,
SangHyeon Kim
Abstract:
Waveguide-integrated mid-infrared (MIR) photodetectors are pivotal components for developing molecular spectroscopy applications, leveraging mature photonic integrated circuit (PIC) technologies. Despite various strategies, critical challenges still remain in achieving broadband photoresponse, cooling-free operation, and large-scale complementary-metal-oxide-semiconductor (CMOS)-compatible manufacturability. To leap beyond these limitations, the bolometric effect - a thermal detection mechanism - is introduced into the waveguide platform. More importantly, we pursue a free-carrier absorption (FCA) process in germanium (Ge) to create an efficient light-absorbing medium, providing a pragmatic solution for full coverage of the MIR spectrum without incorporating exotic materials into CMOS. Here, we present an uncooled waveguide-integrated photodetector based on a Ge-on-insulator (Ge-OI) PIC architecture, exploiting the bolometric effect combined with FCA. Notably, our device exhibits a broadband responsivity of ~12 mA/W across 4030-4360 nm (and potentially beyond), challenging the state of the art, while achieving a noise-equivalent power of 3.4x10^-9 W/Hz^0.5 at 4180 nm. We further demonstrate label-free sensing of carbon dioxide using our integrated photodetector and sensing waveguide on a single chip. This approach to room-temperature waveguide-integrated MIR photodetection, harnessing bolometry with FCA in Ge, not only facilitates the realization of fully integrated lab-on-a-chip systems with wavelength flexibility but also provides a blueprint for MIR PICs with CMOS-foundry-compatibility.
Submitted 23 May, 2024;
originally announced May 2024.
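The responsivity and noise-equivalent power (NEP) quoted above are linked by the standard photodetector relation NEP = i_n / R, where i_n is the noise current spectral density (A/Hz^0.5) and R the responsivity (A/W). The abstract does not report i_n, so the sketch below only back-computes an implied value, assuming the ~12 mA/W broadband responsivity also applies at 4180 nm. That assumption is ours, and the resulting number is illustrative, not a measured figure.

```python
def noise_equivalent_power(noise_current_density, responsivity):
    """NEP in W/Hz^0.5 from noise current density (A/Hz^0.5) and responsivity (A/W)."""
    if responsivity <= 0:
        raise ValueError("responsivity must be positive")
    return noise_current_density / responsivity


def implied_noise_current(nep, responsivity):
    """Invert NEP = i_n / R to get the noise current density i_n (A/Hz^0.5)."""
    return nep * responsivity


# Illustrative only: assumes R ~ 12 mA/W at 4180 nm (not stated in the abstract).
i_n = implied_noise_current(3.4e-9, 12e-3)
print(f"implied noise current density: {i_n:.2e} A/Hz^0.5")  # prints 4.08e-11
```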
-
BrainMorph: A Foundational Keypoint Model for Robust and Flexible Brain MRI Registration
Authors:
Alan Q. Wang,
Rachit Saluja,
Heejong Kim,
Xinzi He,
Adrian Dalca,
Mert R. Sabuncu
Abstract:
We present a keypoint-based foundation model for general-purpose brain MRI registration, based on the recently proposed KeyMorph framework. Our model, called BrainMorph, serves as a tool that supports multi-modal, pairwise, and scalable groupwise registration. BrainMorph is trained on a massive dataset of over 100,000 3D volumes, skull-stripped and non-skull-stripped, from nearly 16,000 unique healthy and diseased subjects. BrainMorph is robust to large misalignments, interpretable via interrogating automatically extracted keypoints, and enables rapid and controllable generation of many plausible transformations with different alignment types and different degrees of nonlinearity at test time. We demonstrate the superiority of BrainMorph in solving 3D rigid, affine, and nonlinear registration on a variety of multi-modal brain MRI scans of healthy and diseased subjects, in both the pairwise and groupwise settings. In particular, we show registration accuracy and speeds that surpass current state-of-the-art methods, especially in the context of large initial misalignments and large group settings. All code and models are available at https://github.com/alanqrwang/brainmorph.
Submitted 24 May, 2024; v1 submitted 22 May, 2024;
originally announced May 2024.
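In keypoint-based registration, once corresponding keypoints have been extracted from the moving and fixed volumes, an affine alignment can be obtained in closed form by least squares. The sketch below shows only that generic fitting step with NumPy, assuming the keypoint correspondences are already given; the keypoint detector network, the nonlinear transformations, and the groupwise mode are out of scope, and the function names are hypothetical rather than BrainMorph's API.

```python
import numpy as np


def fit_affine_from_keypoints(moving, fixed):
    """Least-squares 3D affine transform mapping moving keypoints onto fixed ones.

    moving, fixed: (N, 3) arrays with corresponding rows, N >= 4.
    Returns a (3, 4) matrix [A | t] such that fixed ~= moving @ A.T + t.
    """
    moving = np.asarray(moving, dtype=float)
    fixed = np.asarray(fixed, dtype=float)
    # Homogeneous coordinates: a column of ones carries the translation term.
    X = np.hstack([moving, np.ones((moving.shape[0], 1))])  # (N, 4)
    # Solve X @ M = fixed for M of shape (4, 3) in the least-squares sense.
    M, *_ = np.linalg.lstsq(X, fixed, rcond=None)
    return M.T  # (3, 4)


def apply_affine(params, points):
    """Apply a (3, 4) affine [A | t] to (N, 3) points."""
    points = np.asarray(points, dtype=float)
    return points @ params[:, :3].T + params[:, 3]


# Synthetic check with a known transform (hypothetical values).
rng = np.random.default_rng(0)
true_A = np.array([[1.1, 0.05, 0.0], [0.0, 0.9, 0.1], [0.02, 0.0, 1.0]])
true_t = np.array([5.0, -3.0, 2.0])
moving = rng.uniform(-50, 50, size=(32, 3))
fixed = moving @ true_A.T + true_t
params = fit_affine_from_keypoints(moving, fixed)
```

Because the fit is closed-form, it stays differentiable and cheap, which is one reason keypoint-based pipelines can be fast at test time.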
-
Scene Graph Generation Strategy with Co-occurrence Knowledge and Learnable Term Frequency
Authors:
Hyeong** Kim,
Sangwon Kim,
Dasom Ahn,
Jong Taek Lee,
Byoung Chul Ko
Abstract:
Scene graph generation (SGG) is an important task in image understanding because it represents the relationships between objects in an image as a graph structure, making it possible to understand the semantic relationships between objects intuitively. Previous SGG studies used message-passing neural networks (MPNNs) to update features, which can effectively reflect information about surrounding objects. However, these studies have failed to reflect the co-occurrence of objects during SGG. In addition, they addressed the long-tail problem of the training dataset only from the perspectives of sampling and learning methods. To address these two problems, we propose CooK, which reflects the Co-occurrence Knowledge between objects, and the learnable term frequency-inverse document frequency (TF-l-IDF) to solve the long-tail problem. We applied the proposed model to the SGG benchmark dataset, and the results showed a performance improvement of up to 3.8% compared with existing state-of-the-art models in the SGGen subtask. The results also indicate that the proposed method generalizes, yielding consistent performance improvements across all MPNN models.
Submitted 21 May, 2024;
originally announced May 2024.
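TF-l-IDF builds on ordinary TF-IDF. As background, the sketch below applies plain (non-learnable) smoothed TF-IDF to predicate labels, treating each image as a "document" and each predicate as a "term", so that frequent head predicates such as "on" receive lower weights than rare tail predicates. This is the classic formulation, not the paper's learnable variant, and the toy annotations are hypothetical.

```python
import math
from collections import Counter


def predicate_idf(images):
    """Smoothed inverse document frequency of each predicate label.

    images: list of lists; each inner list holds the predicate labels
    annotated in one image (treated as one document).
    """
    n_images = len(images)
    doc_freq = Counter()
    for predicates in images:
        doc_freq.update(set(predicates))  # count each predicate once per image
    # log((1 + N) / (1 + df)) + 1 keeps every weight positive.
    return {p: math.log((1 + n_images) / (1 + df)) + 1 for p, df in doc_freq.items()}


def tf_idf(predicates, idf):
    """TF-IDF weight of each predicate within a single image."""
    counts = Counter(predicates)
    total = len(predicates)
    return {p: (c / total) * idf[p] for p, c in counts.items()}


# Hypothetical toy annotations with a long-tailed predicate distribution.
images = [
    ["on", "on", "has", "wearing"],
    ["on", "has"],
    ["on", "parked on"],
    ["on", "has", "wearing"],
]
idf = predicate_idf(images)
weights = tf_idf(images[2], idf)
```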
-
Strongly-Consistent Distributed Discrete-event Systems
Authors:
Peter Donovan,
Erling Jellum,
Byeonggil Jun,
Hokeun Kim,
Edward A. Lee,
Shaokai Lin,
Marten Lohstroh,
Anirudh Rengarajan
Abstract:
Discrete-event (DE) systems are concurrent programs in which components communicate via tagged events, with tags drawn from a totally ordered set. Reactors are an emerging model of computation based on DE and realized in the open-source coordination language Lingua Franca. Distributed DE (DDE) systems are DE systems in which the components (reactors) communicate over networks. Prior work has required that, in DDE systems with cycles, each cycle contain at least one logical delay, which increments the tag of events. Such delays, however, are not required by the elegant fixed-point semantics of DE. The only requirement is that the program be constructive, meaning it is free of causality cycles. This paper gives a way to coordinate the execution of DDE systems that can execute any constructive program, even one with zero-delay cycles. It provides a formal model that exposes exactly the information that must be shared across networks for such execution to be possible. Furthermore, it describes a concrete implementation that extends the coordination mechanisms in Lingua Franca.
Submitted 20 May, 2024;
originally announced May 2024.
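The abstract relies on tags being drawn from a totally ordered set and on logical delays that increment tags. A minimal sketch, assuming superdense-time tags made of a logical time and a microstep (the tag structure Lingua Franca uses), shows how such tags can be ordered and how a delay produces a strictly later tag. The `Tag` class and `delayed` function are hypothetical illustrations, not Lingua Franca runtime code, and they do not implement the paper's coordination scheme for zero-delay cycles.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Tag:
    """Superdense-time tag: logical time plus a microstep.

    order=True compares fields lexicographically (time first, then
    microstep), giving the total order that DE semantics relies on.
    """
    time: int  # logical time, e.g. in nanoseconds
    microstep: int = 0


def delayed(tag: Tag, delay: int) -> Tag:
    """Tag of an event passed through a logical delay of `delay` time units.

    In this sketch a positive delay advances logical time and resets the
    microstep, while a zero delay still yields a strictly later tag by
    bumping the microstep.
    """
    if delay < 0:
        raise ValueError("logical delays must be non-negative")
    if delay == 0:
        return Tag(tag.time, tag.microstep + 1)
    return Tag(tag.time + delay, 0)


t0 = Tag(100)
t1 = delayed(t0, 0)   # same logical time, next microstep
t2 = delayed(t1, 5)   # later logical time, microstep reset
```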
-
Systematic Review on Healthcare Systems Engineering utilizing ChatGPT
Authors:
Jungwoo Kim,
Ji-Su Lee,
Huijae Kim,
Taesik Lee
Abstract:
This paper presents an analytical framework for conducting academic reviews in the field of Healthcare Systems Engineering, employing ChatGPT, a state-of-the-art tool among recent language models. We utilized 9,809 abstract paragraphs from conference presentations to systematically review the field. The framework comprises distinct analytical processes, each employing tailored prompts and the systematic use of the ChatGPT API. Through this framework, we organized the target field into 11 topic categories and conducted a comprehensive analysis covering quantitative yearly trends and detailed sub-categories. This effort explores the potential for leveraging ChatGPT to alleviate the burden of academic reviews. Furthermore, it provides valuable insights into the dynamic landscape of Healthcare Systems Engineering research.
Submitted 20 May, 2024;
originally announced May 2024.
-
Interior Harnack inequality and Hölder estimates for linearized Monge-Ampère equations in divergence form with drift
Authors:
Young Ho Kim
Abstract:
In this paper, we study interior estimates for solutions to linearized Monge-Ampère equations in divergence form with drift terms and the right-hand side containing the divergence of a bounded vector field. Equations of this type appear in the study of semigeostrophic equations in meteorology and the solvability of singular Abreu equations in the calculus of variations with a convexity constraint. We prove an interior Harnack inequality and Hölder estimates for solutions to equations of this type in two dimensions, and under an integrability assumption on the Hessian matrix of the Monge-Ampère potential in higher dimensions. Our results extend those of Le (Analysis of Monge-Ampère equations, Graduate Studies in Mathematics, vol. 240, American Mathematical Society, 2024) to equations with drift terms.
Submitted 19 May, 2024;
originally announced May 2024.
-
Attention to Quantum Complexity
Authors:
Hye** Kim,
Yiqing Zhou,
Yichen Xu,
Kaarthik Varma,
Amir H. Karamlou,
Ilan T. Rosen,
Jesse C. Hoke,
Chao Wan,
** Peng Zhou,
William D. Oliver,
Yuri D. Lensky,
Kilian Q. Weinberger,
Eun-Ah Kim
Abstract:
The imminent era of error-corrected quantum computing urgently demands robust methods to characterize complex quantum states, even from limited and noisy measurements. We introduce the Quantum Attention Network (QuAN), a versatile classical AI framework leveraging the power of attention mechanisms specifically tailored to address the unique challenges of learning quantum complexity. Inspired by large language models, QuAN treats measurement snapshots as tokens while respecting their permutation invariance. Combined with a novel parameter-efficient mini-set self-attention block (MSSAB), this data structure enables QuAN to access high-order moments of the bit-string distribution and preferentially attend to less noisy snapshots. We rigorously test QuAN across three distinct quantum simulation settings: the driven hard-core Bose-Hubbard model, random quantum circuits, and the toric code under coherent and incoherent noise. QuAN directly learns the growth in entanglement and state complexity from experimentally obtained computational basis measurements. In particular, it learns the growth in complexity of random circuit data with increasing depth from noisy experimental data. Taken to a regime inaccessible by existing theory, QuAN unveils the complete phase diagram for noisy toric code data as a function of both noise types. This breakthrough highlights the transformative potential of using purposefully designed AI-driven solutions to assist quantum hardware.
Submitted 19 May, 2024;
originally announced May 2024.
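QuAN treats measurement snapshots as tokens while respecting their permutation invariance. A minimal sketch of why attention suits this: without positional encodings, self-attention is permutation-equivariant, so mean-pooling its outputs gives a representation that does not depend on snapshot order. This uses plain single-head scaled dot-product attention with random weights and hypothetical dimensions; it is not the paper's mini-set self-attention block (MSSAB).

```python
import numpy as np


def self_attention(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention, no positional encoding."""
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def encode_snapshot_set(snapshots, Wq, Wk, Wv):
    """Embed a set of bit-string snapshots into one order-independent vector.

    Each snapshot (a row of 0/1 outcomes) is one token; attention is
    permutation-equivariant, and mean pooling over tokens makes the
    result permutation-invariant.
    """
    tokens = 2.0 * np.asarray(snapshots, dtype=float) - 1.0  # bits {0,1} -> {-1,+1}
    return self_attention(tokens, Wq, Wk, Wv).mean(axis=0)


rng = np.random.default_rng(1)
n_qubits, d = 6, 8  # hypothetical sizes
Wq, Wk, Wv = (rng.normal(size=(n_qubits, d)) for _ in range(3))
snapshots = rng.integers(0, 2, size=(10, n_qubits))
shuffled = snapshots[rng.permutation(len(snapshots))]
z1 = encode_snapshot_set(snapshots, Wq, Wk, Wv)
z2 = encode_snapshot_set(shuffled, Wq, Wk, Wv)
```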
-
Search for Two-Body $B$ Meson Decays to $Λ^{0}$ and $Ω^{(*)0}_{c}$
Authors:
Belle Collaboration,
V. Savinov,
I. Adachi,
J. K. Ahn,
H. Aihara,
D. M. Asner,
H. Atmacan,
R. Ayad,
Sw. Banerjee,
J. Bennett,
M. Bessner,
V. Bhardwaj,
D. Biswas,
A. Bobrov,
D. Bodrov,
J. Borah,
M. Bračko,
P. Branchini,
T. E. Browder,
A. Budano,
D. Červenkov,
M. -C. Chang,
P. Chang,
B. G. Cheon,
K. Cho
, et al. (124 additional authors not shown)
Abstract:
We report the results of the first search for Standard Model and baryon-number-violating two-body decays of the neutral $B$ mesons to $Λ^{0}$ and $Ω^{(*)0}_c$ using 711~${\rm fb^{-1}}$ of data collected at the $Υ(4S)$ resonance with the Belle detector at the KEKB asymmetric-energy $e^+ e^-$ collider. We observe no evidence of signal from any such decays and set 95\% confidence-level upper limits on the products of $B^0$ and $\bar{B}^0$ branching fractions for these two-body decays with $\mathcal{B}(Ω_{c}^{0} \to π^+ Ω^-)$ in the range between 9.5~$\times 10^{-8}$ and 31.2~$\times 10^{-8}$.
Submitted 18 May, 2024;
originally announced May 2024.
-
Harnessing moiré ferroelectricity to modulate semiconductor monolayer light emission
Authors:
Dong Seob Kim,
Chengxin Xiao,
Roy C. Dominguez,
Zhida Liu,
Hamza Abudayyeh,
Kyoungpyo Lee,
Rigo Mayorga-Luna,
Hyunsue Kim,
Kenji Watanabe,
Takashi Taniguchi,
Chih-Kang Shih,
Yoichi Miyahara,
Wang Yao,
Xiaoqin Li
Abstract:
Ferroelectricity has been recently discovered in stacked or twisted van der Waals (vdW) moiré systems. The ability to produce large arrays of size-tunable domains and to integrate them with diverse functional materials makes these systems an enticing platform for developing multifunctional devices. Here, we show that ferroelectric polar domains formed in a twisted hexagonal boron nitride (t-hBN) substrate can modulate light emission from an adjacent semiconductor monolayer. The abrupt change in electrostatic potential across the domains produces an in-plane electric field (E-field) and leads to a remarkably large exciton Stark shift in the adjacent MoSe$_2$ monolayer, previously observable only in \textit{p-n} junctions created with advanced e-beam lithography. Both the spectrum and spatial pattern of the light emission from the monolayer are periodically modulated by the remote moiré potential imposed by the t-hBN substrate. We further observe a characteristic hysteresis behavior in the light emission as an electric gate erases and restores the domains. Our findings chart an exciting pathway for integrating nanometer-scale moiré ferroelectric domains with various optically active functional layers, paving the way for advanced nanophotonic applications.
Submitted 17 May, 2024;
originally announced May 2024.
-
Out-of-time-order asymptotic observables are quasi-isomorphic to time-ordered amplitudes
Authors:
Leron Borsten,
David Simon Henrik Jonsson,
Hyungrok Kim
Abstract:
Asymptotic observables in quantum field theory beyond the familiar $S$-matrix have recently attracted much interest, for instance in the context of gravity waveforms. Such observables can be understood in terms of Schwinger-Keldysh-type 'amplitudes' computed by a set of modified Feynman rules involving cut internal legs and external legs labelled by time-folds.
In parallel, a homotopy-algebraic understanding of perturbative quantum field theory has emerged in recent years. In particular, passing through homotopy transfer, the $S$-matrix of a perturbative quantum field theory can be understood as the minimal model of an associated (quantum) $L_\infty$-algebra.
Here we bring these two developments together. In particular, we show that Schwinger-Keldysh amplitudes are naturally encoded in an $L_\infty$-algebra, similar to ordinary scattering amplitudes. As before, they are computed via homotopy transfer, but using deformation-retract data that are not canonical (in contrast to the conventional $S$-matrix). We further show that the $L_\infty$-algebras encoding Schwinger-Keldysh amplitudes and ordinary amplitudes are quasi-isomorphic (meaning, in a suitable sense, equivalent). This entails a set of recursion relations that enable one to compute Schwinger-Keldysh amplitudes in terms of ordinary amplitudes or vice versa.
Submitted 17 May, 2024;
originally announced May 2024.
-
Charged rotating wormholes: charge without charge
Authors:
Hyeong-Chan Kim,
Sung-Won Kim,
Bum-Hoon Lee,
Wonwoo Lee
Abstract:
We present a family of charged rotating wormhole solutions to the Einstein-Maxwell equations, supported by anisotropic matter fields. We first revisit the charged static cases and analyze the conditions under which the solution represents a wormhole geometry. The rotating geometry is obtained by applying the Newman-Janis algorithm to the static geometry. We present the solutions to the Maxwell equations in detail. We believe that our wormhole geometry offers a geometric realization of the concept of 'charge without charge'.
Submitted 16 May, 2024;
originally announced May 2024.
-
Highly Tunable Ru-dimer Molecular Orbital State in 6H-perovskite Ba$_3$MRu$_2$O$_9$
Authors:
Bo Yuan,
Beom Hyun Kim,
Qiang Chen,
Daniel Dobrowolski,
Monika Azmanska,
G. M. Luke,
Shiyu Fan,
Valentina Bisogni,
Jonathan Pelliciari,
J. P. Clancy
Abstract:
Molecular orbital (MO) systems with clusters of heavy transition metal (TM) ions are one of the most important classes of model materials for studying the interplay between local physics and effects of itinerancy. Despite a large number of candidates identified in the family of 4d TM materials, an understanding of their physics in terms of competing \textit{microscopic} energy scales is still missing. We bridge this gap by reporting the first resonant inelastic X-ray scattering (RIXS) measurement on a well-known series of Ru dimer systems with a 6H-perovskite structure, Ba$_3$MRu$_2$O$_9$ (M$^{3+}$=In$^{3+}$, Y$^{3+}$, La$^{3+}$). Our RIXS measurements reveal an extremely fragile MO state in these Ru dimer compounds, evidenced by an abrupt change in the RIXS spectrum accompanying a tiny change in the local structure tuned by the M-site ion. By modelling the RIXS spectra, we attribute the enhanced electronic instability in Ba$_3$MRu$_2$O$_9$ to the combined effect of a large hopping and a small spin-orbit coupling in the Ru dimers. The unique combination of energy scales uncovered in the present study makes Ru MO systems ideal model systems for studying quantum phase transitions with molecular orbitals.
Submitted 15 May, 2024;
originally announced May 2024.
-
Charge-Transfer Hyperbolic Polaritons in $α$-MoO$_3$/graphene heterostructures
Authors:
J. Shen,
M. Chen,
V. Korostelev,
H. Kim,
P. Fathi-Hafshejani,
M. Mahjouri-Samani,
K. Klyukin,
G-H. Lee,
S. Dai
Abstract:
Charge transfer is a fundamental interface process that can be harnessed for light detection, photovoltaics, and photosynthesis. Recently, charge transfer was exploited in nanophotonics to alter plasmon polaritons by involving additional non-polaritonic materials to activate the charge transfer. Yet, direct charge transfer between polaritonic materials has not been demonstrated. We report direct charge transfer in pure polaritonic van der Waals (vdW) heterostructures of $α$-MoO$_3$/graphene. We extracted a Fermi energy of 0.6 eV for graphene by infrared nano-imaging of charge-transfer hyperbolic polaritons in the vdW heterostructure. This unusually high Fermi energy is attributed to the charge transfer between graphene and $α$-MoO$_3$. Moreover, we have observed charge-transfer hyperbolic polaritons in multiple energy-momentum dispersion branches with a wavelength elongation of up to 150%. With support from DFT calculations, we find that the charge transfer between graphene and $α$-MoO$_3$, which is absent in mechanically assembled vdW heterostructures, arises from the relatively pristine heterointerface preserved in the epitaxially grown vdW heterostructure. The direct charge transfer and charge-transfer hyperbolic polaritons demonstrated in our work hold great promise for developing nano-optical circuits, computational devices, communication systems, and light and energy manipulation devices.
Submitted 14 May, 2024;
originally announced May 2024.
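To put the extracted 0.6 eV Fermi energy in perspective, monolayer graphene's linear dispersion gives E_F = ħ v_F sqrt(π n), so n = E_F² / (π ħ² v_F²). The sketch below converts E_F into a carrier density, assuming the commonly quoted Fermi velocity v_F ≈ 1×10^6 m/s. The abstract does not report a carrier density, so the resulting number (about 2.6×10^13 cm^-2) is an illustrative estimate under that assumption only.

```python
import math

HBAR_EV_S = 6.582119569e-16  # reduced Planck constant in eV*s


def graphene_carrier_density(fermi_energy_ev, fermi_velocity=1.0e6):
    """Carrier density (cm^-2) from E_F = hbar * v_F * sqrt(pi * n).

    fermi_velocity is in m/s; 1e6 m/s is an assumed, commonly quoted value.
    """
    hbar_vf = HBAR_EV_S * fermi_velocity  # eV*m
    n_per_m2 = (fermi_energy_ev / hbar_vf) ** 2 / math.pi
    return n_per_m2 * 1e-4  # m^-2 -> cm^-2


n = graphene_carrier_density(0.6)
print(f"n ≈ {n:.1e} cm^-2")  # prints n ≈ 2.6e+13 cm^-2
```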
-
A Unification of Exchangeability and Continuous Exposure and Confounder Measurement Errors: Probabilistic Exchangeability
Authors:
Honghyok Kim
Abstract:
Exchangeability concerning a continuous exposure, X, implies no confounding bias when identifying average exposure effects of X, AEE(X). When X is measured with error (Xep), two challenges arise in identifying AEE(X). Firstly, exchangeability regarding Xep does not equal exchangeability regarding X. Secondly, the non-differential error assumption (NDEA) could be overly stringent in practice. To address these challenges, this article proposes unifying exchangeability and exposure and confounder measurement errors with three novel concepts. The first, Probabilistic Exchangeability (PE), states that the outcomes of those with Xep=e are probabilistically exchangeable with the outcomes of those truly exposed to X=eT. The relationship between AEE(Xep) and AEE(X) in risk difference and ratio scales is mathematically expressed as a probabilistic certainty, termed exchangeability probability (Pe). Squared Pe (Pe^2) quantifies the extent to which AEE(Xep) differs from AEE(X) due to exposure measurement error through mechanisms not akin to confounding mechanisms. The coefficient of determination (R^2) in the regression of Xep against X may sometimes be sufficient to measure Pe^2. The second concept, Emergent Pseudo Confounding (EPC), describes the bias introduced by exposure measurement error through mechanisms akin to confounding mechanisms. PE requires controlling for EPC, which is weaker than NDEA. The third, Emergent Confounding, describes when bias due to confounder measurement error arises. Adjustment for E(P)C can be performed like confounding adjustment. This paper provides maximum insight into when AEE(Xep) is an appropriate surrogate of AEE(X) and how to measure the difference between these two. Differential errors could be addressed and may not compromise causal inference.
Submitted 30 May, 2024; v1 submitted 13 May, 2024;
originally announced May 2024.
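The abstract notes that the R^2 from regressing Xep against X may sometimes suffice to measure Pe^2. The sketch below only shows how that R^2 behaves in one hypothetical setting: classical additive, non-differential error Xep = X + U, where the population R^2 equals the reliability ratio var(X) / (var(X) + var(U)). It computes R^2, not Pe itself, and the simulation parameters are assumptions chosen for illustration.

```python
import numpy as np


def r_squared(x, y):
    """Coefficient of determination from an OLS fit (with intercept) of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return 1.0 - residuals.var() / y.var()


rng = np.random.default_rng(42)
n = 200_000
x_true = rng.normal(0.0, 1.0, size=n)          # true exposure X, variance 1
x_err = x_true + rng.normal(0.0, 0.5, size=n)  # Xep with classical error U, variance 0.25
r2 = r_squared(x_true, x_err)                  # regression of Xep against X
reliability = 1.0 / (1.0 + 0.25)               # var(X) / (var(X) + var(U)) = 0.8
```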
-
A replica theory for the dynamic glass transition of hardspheres with continuous polydispersity
Authors:
Hyonggi Kim,
Atsushi Ikeda
Abstract:
Glassy soft matter is often continuously polydisperse, meaning that the sizes or other properties of the constituent particles are continuously distributed. However, most microscopic theories of the glass transition focus on monodisperse particles. Here, we developed a replica theory for the dynamic glass transition of continuously polydisperse hard spheres. We focused on the limit of infinite spatial dimension, where replica theory becomes exact. In this theory, the cage size $A$, which plays the role of an order parameter, turns out to depend on the particle size $σ$, and thus the effective free energy, the so-called Franz-Parisi potential, is a functional of $A(σ)$. We applied this theory to two fundamental systems: a nearly monodisperse system and a system with an exponential size distribution. We found that dynamic decoupling occurs in both cases: a critical particle size $σ^{\ast}$ emerges, and larger particles with $σ\geq σ^{\ast}$ vitrify, while smaller particles with $σ< σ^{\ast}$ remain mobile. Moreover, the cage size $A(σ)$ exhibits critical behavior at $σ\simeq σ^{\ast}$, originating from a spinodal instability of $σ^{\ast}$-sized particles. We discuss the implications of these results for finite-dimensional systems.
Submitted 12 May, 2024;
originally announced May 2024.
-
Metasurfaces with Full Control over Asymmetric Transmission of Light
Authors:
Hyeonhee Kim,
Joonkyo Jung,
Jonghwa Shin
Abstract:
The study of optical systems with asymmetric responses has grown significantly due to their broad application potential in various fields. In particular, Janus metasurfaces are notable for their ability to control light asymmetrically at the pixel level within thin films. However, previous demonstrations were restricted to the partial control of asymmetric transmission for a limited set of input polarizations, focusing primarily on scalar functionalities. Here, we introduce optical metasurfaces consisting of bi-layer silicon nanostructures that can achieve a fully generalized form of asymmetric transmission for any possible input polarization. Their designs owe much to our theoretical model of asymmetric optical transmission in reciprocal systems that explicates the relationship between front- and back-side Jones matrices for general cases, revealing a fundamental correlation between the polarization-direction channels of opposing sides of incidence. To practically circumvent this constraint, we propose a method to partition transmission space, enabling the realization of four distinct vector functionalities within the target volume. As a proof of concept, we experimentally demonstrate polarization-direction-multiplexed Janus vector holograms generating four vector holograms. When integrated with computational vector polarizer arrays, this approach facilitates optical encryption with a high level of obscurity. We anticipate that our mathematical framework and novel material systems for generalized asymmetric transmission may pave the way for applications such as optical computations, sensing, and imaging.
Submitted 10 May, 2024;
originally announced May 2024.
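For context on the front- and back-side Jones matrices discussed above: in one common coordinate convention (the coordinate frame mirrored for backward incidence), a reciprocal structure with forward Jones matrix [[A, B], [C, D]] has backward Jones matrix [[A, -C], [-B, D]]. The sketch below uses that textbook relation to show that linear-polarization asymmetric transmission equals |C|^2 - |B|^2. It is general background, not the authors' theoretical model, and the matrix entries are hypothetical.

```python
import numpy as np


def backward_jones(T_forward):
    """Backward-incidence Jones matrix of a reciprocal structure.

    Assumes the convention in which the coordinate frame is mirrored for
    backward incidence: [[A, B], [C, D]] -> [[A, -C], [-B, D]].
    """
    (A, B), (C, D) = np.asarray(T_forward, dtype=complex)
    return np.array([[A, -C], [-B, D]])


def transmittance(T, e_in):
    """Total transmitted intensity for a normalized input Jones vector."""
    e_out = T @ np.asarray(e_in, dtype=complex)
    return float(np.real(np.vdot(e_out, e_out)))


T_f = np.array([[0.6, 0.1], [0.5j, 0.4]])  # hypothetical forward Jones matrix
T_b = backward_jones(T_f)
x_pol = np.array([1.0, 0.0])
y_pol = np.array([0.0, 1.0])
delta_x = transmittance(T_f, x_pol) - transmittance(T_b, x_pol)  # |C|^2 - |B|^2
delta_y = transmittance(T_f, y_pol) - transmittance(T_b, y_pol)  # |B|^2 - |C|^2
```

Because the mapping is an involution, applying it twice recovers the forward matrix, which reflects the constraint that the two sides of a reciprocal device cannot be designed independently.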
-
New observational recipes for measuring dynamical state of galaxy clusters
Authors:
Hyowon Kim,
Rory Smith,
Jongwan Ko,
Jong-Ho Shinn,
Kyungwon Chun,
Jihye Shin,
Jaewon Yoo
Abstract:
During cluster assembly, a cluster's virialization process leaves behind signatures that can provide information on its dynamical state. However, no clear consensus yet exists on the best way to measure it. Therefore, we attempt to derive improved recipes for classifying the dynamical state of clusters in observations using cosmological simulations. Cluster halo mass and their subhalos' mass are used to $10^{14}M_{\odot} h^{-1}$ and $10^{10}M_{\odot} h^{-1}$ to calculate five independent dynamical state indicators. We experiment with recipes that combine two to four indicators to detect specific merger stages, such as recent and ancient mergers. These recipes are made by plotting merging clusters and a control sample of relaxed clusters in a multi-indicator parameter space, and then applying a rotation matrix method to derive the best way to separate mergers from the control sample. The success of each recipe is quantified using the success rate and the overlap percentage of the merger and control histograms along the newly rotated $x$-axis. This provides us with recipes using different numbers of combined indicators and for different merger stages. Across the recipes, the stellar mass gap and center offset are the most and second-most dominant indicators, and using more indicators improves the effectiveness of the recipe. When applied to observations, our results show good agreement with literature values of cluster dynamical state.
Submitted 10 May, 2024;
originally announced May 2024.
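The rotation-matrix method described above can be sketched for two indicators: rotate the indicator plane by an angle, project both samples onto the new x-axis, and keep the angle that minimizes the overlap between the merger and control histograms. In this sketch the overlap is defined as the summed bin-wise minimum of the two normalized histograms, the angle is found by grid search, and the Gaussian samples are synthetic. All of these are assumptions for illustration, not the authors' pipeline, and the success-rate metric is omitted.

```python
import numpy as np


def rotate(points, theta):
    """Rotate 2D points (N, 2) counter-clockwise by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return points @ R.T


def histogram_overlap(a, b, bins=30):
    """Overlap fraction of two normalized 1D histograms on shared bins."""
    edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=bins)
    ha, _ = np.histogram(a, bins=edges)
    hb, _ = np.histogram(b, bins=edges)
    return np.minimum(ha / ha.sum(), hb / hb.sum()).sum()


def best_rotation(mergers, relaxed, n_angles=180):
    """Angle whose rotated x-axis minimizes merger/control histogram overlap."""
    angles = np.linspace(0.0, np.pi, n_angles, endpoint=False)
    overlaps = [
        histogram_overlap(rotate(mergers, t)[:, 0], rotate(relaxed, t)[:, 0])
        for t in angles
    ]
    i = int(np.argmin(overlaps))
    return angles[i], overlaps[i]


rng = np.random.default_rng(7)
# Hypothetical standardized indicator pairs (e.g. stellar mass gap, center offset).
relaxed = rng.normal([0.0, 0.0], 0.5, size=(500, 2))
mergers = rng.normal([1.5, 1.5], 0.5, size=(500, 2))
theta, overlap = best_rotation(mergers, relaxed)
```

With more indicators, the same idea generalizes to searching over rotations in a higher-dimensional space, which is consistent with the abstract's observation that combining indicators can improve separation.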
-
Towards comprehensive coverage of chemical space: Quantum mechanical properties of 836k constitutional and conformational closed shell neutral isomers consisting of HCNOFSiPSClBr
Authors:
Danish Khan,
Anouar Benali,
Scott Y. H. Kim,
Guido Falk von Rudorff,
O. Anatole von Lilienfeld
Abstract:
The Vector-QM24 (VQM24) dataset attempts to more comprehensively cover all possible neutral closed-shell small organic and inorganic molecules and their conformers at a state-of-the-art level of theory. We have used density functional theory ($ω$B97X-D3/cc-pVDZ) to optimize 577k conformational isomers corresponding to 258k constitutional isomers. The included isomers contain up to five heavy (non-hydrogen) atoms drawn from the $p$-block elements C, N, O, F, Si, P, S, Cl, Br. Single-point diffusion quantum Monte Carlo (DMC@PBE0(ccECP/cc-pVQZ)) energies are reported for the subset of the lowest conformers of 10,793 molecules with up to 4 heavy atoms. This dataset has been systematically generated by considering all combinatorially possible stoichiometries, and graphs (according to Lewis rules as implemented in the {\tt SURGE} package), along with all stable conformers identified by GFN2-xTB. Apart from graphs, geometries, rotational constants, and vibrational normal modes, VQM24 includes internal, atomization, electron-electron repulsion, exchange correlation, dispersion, vibrational frequency, Gibbs free, enthalpy, ZPV, and molecular orbital energies, as well as entropy and heat capacities. Electronic properties include multipole moments (dipole, quadrupole, octupole, hexadecapole), electrostatic potentials at nuclei (alchemical potential), Mulliken charges, and molecular wavefunctions. VQM24 represents a highly accurate and unbiased dataset of molecules, ideal for testing and training transferable, scalable, and generative ML models of real quantum systems.
Submitted 13 May, 2024; v1 submitted 9 May, 2024;
originally announced May 2024.
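The dataset is described as generated from all combinatorially possible stoichiometries with up to five heavy atoms from nine elements. The sketch below enumerates only the heavy-atom compositions (multisets of element symbols) that such a scheme starts from, and checks the count against the closed form C(n + k - 1, k). Hydrogen saturation, valence and Lewis-rule filtering (done with SURGE in the paper), and conformer search are deliberately omitted, so the count is of candidate compositions, not of molecules in VQM24.

```python
from itertools import combinations_with_replacement
from math import comb

HEAVY_ELEMENTS = ("C", "N", "O", "F", "Si", "P", "S", "Cl", "Br")


def heavy_atom_compositions(max_heavy=5, elements=HEAVY_ELEMENTS):
    """Yield all multisets of 1..max_heavy heavy atoms.

    Hydrogen counts, valence rules, and graph enumeration are omitted;
    this only lists candidate heavy-atom compositions.
    """
    for k in range(1, max_heavy + 1):
        yield from combinations_with_replacement(elements, k)


compositions = list(heavy_atom_compositions())
print(len(compositions))  # prints 2001
# Closed form: C(n + k - 1, k) multisets of size k from n elements, summed over k.
closed_form = sum(comb(len(HEAVY_ELEMENTS) + k - 1, k) for k in range(1, 6))
```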